WO2016100460A1 - Systems and methods for source localization and separation - Google Patents

Systems and methods for source localization and separation

Info

Publication number
WO2016100460A1
WO2016100460A1 (PCT/US2015/066012)
Authority
WO
WIPO (PCT)
Prior art keywords
tensor
acoustic
doa
source
matrix
Prior art date
Application number
PCT/US2015/066012
Other languages
English (en)
Inventor
Johannes TRAA
Noah Daniel STEIN
David Wingate
Original Assignee
Analog Devices, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Analog Devices, Inc.
Publication of WO2016100460A1 (patent/WO2016100460A1/fr)

Links

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • G01S3/8006Multi-channel systems specially adapted for direction-finding, i.e. having a single aerial system capable of giving simultaneous indications of the directions of different signals

Definitions

  • the present invention relates to the field of signal processing, and in particular to source localization and/or separation.
  • an acoustic sensor acquires an acoustic signal that has contributions from a plurality of different acoustic sources, where, as used herein, the term “contribution of an acoustic source” refers to at least a portion of an acoustic signal generated by a particular acoustic source, typically the portion being a portion of a particular frequency or a range of frequencies, at a particular time or range of times.
  • when an acoustic source is, e.g., a person speaking, there will be multiple contributions, i.e. there will be acoustic signals of different frequencies generated at different times by such a “source.”
  • in a process generally referred to as “source separation,” various digital signal processing techniques are used to recover the original component signals attributable to different sources from a combined signal acquired by the acoustic sensor (i.e. from the acquired signal that has a combination of contributions from different sources). A process of performing source separation without any prior information about the sources or the mixing process is generally referred to as blind source separation (BSS).
  • source separation can often be improved by processing acoustic signals acquired by multiple acoustic sensors, arranged e.g. in a sensor array, e.g. a microphone array. In such scenarios, each acoustic sensor acquires a corresponding signal that includes contributions from the different acoustic sources.
  • “source localization” refers to a process of determining the location of an acoustic source relative to the acoustic sensor(s), typically in terms of its Direction of Arrival (DOA).
  • DOA: Direction of Arrival
  • Sound source localization and separation is used in many applications, including, for example, signal enhancement and noise cancellation for phones or hearing aids, speech recognition, home automation, and voice user interface in the car or home.
  • various source separation techniques use DOA in order to recover signals attributable to one or more of the individual sources.
  • source localization typically precedes, or may be considered a part of, source separation.
  • many well-known source separation approaches use beamforming, i.e. signal processing techniques used to control the directionality of the reception of a signal, by employing arrays of acoustic sensors that aim to improve directional gain of the sensor array(s) by increasing the gain in the direction of a source of interest (e.g. a speaker) and decreasing the gain in the direction of interferences and noise.
  • beamforming techniques use information about the DOA of the source and, therefore, are preceded by source localization.
  • SRP: Steered Response Power
  • each beamformer in a family of beamformers focuses on a specific direction.
  • SRP localization can be used with a Generalized Cross-Correlation (GCC) function, e.g. with a Phase Transform (PHAT) weighting (the SRP-PHAT approach).
  • GCC: Generalized Cross-Correlation
  • PHAT: Phase Transform
  • a different known method for finding DOAs uses eigenanalysis of the data correlation matrix.
  • a Multiple Signal Classification (MUSIC) algorithm uses this method to identify signal and noise subspaces and form a MUSIC pseudospectrum that contains peaks at the source DOAs.
  • the MUSIC pseudospectrum plots direction on the x-axis and the likelihood of that direction being the source of a sound on the y-axis, and thus is a function over the space of directions which indicates where sources are likely to be.
  • another known method includes modeling observed data vectors as zero-mean Gaussian random variables and using an Expectation-Maximization (EM) algorithm to learn the sources’ covariance parameters.
  • the sources can be separated using multichannel Wiener filtering.
  • multichannel Wiener filtering can be used to separate source signals from background noise.
  • multichannel Wiener filtering can be used to separate speech signals from each other.
  • the output of the multichannel Wiener filter includes multiple sources, together with a correlation matrix that describes how the channels are correlated. The multichannel Wiener filter reconstructs source vectors directly.
  • a more effective and efficient method for localizing and separating signals involves interpreting the SRP function as a probability distribution and maximizing it as a function of the source DOAs.
  • MoSRP: a mixture of single-source SRPs
  • MultSRP: an SRP that explicitly models the presence of multiple sources
  • some advantages of the second (MultSRP) method include simultaneous localization of each of the multiple sources and explicit modeling of interference between sources.
  • Time-Frequency (TF) masking is used to isolate TF bins, described in greater detail below, that correspond to directional signals of interest, thereby merging the localization, separation and Wiener post-filtering steps into one unified approach.
  • an improved type of Wiener filter may be used for estimating a weight for each of multiple TF bins for each of multiple sources.
  • the weight estimates for each time-frequency bin can be used to determine which bins contain source energy and which bins do not contain source energy. Bins which do not contain source energy may still contain energy from, for example, noise.
  • a Wiener filter coefficient is estimated, where the Wiener filter coefficient corresponds to the probability that any of the directional sources are present.
  • one aspect of the present disclosure provides a method for identifying a first direction of arrival of sound waves (i.e. acoustic signals) from a first acoustic source and a second direction of arrival of sound waves from a second acoustic source.
  • the method includes receiving, at a microphone array, acoustic signals including a combination of the sound waves from the first and second acoustic sources, converting the received acoustic signals from a time domain to a time-frequency domain, processing the converted acoustic signals to determine an estimated first angle representing the first direction of arrival and an estimated second angle representing the second direction of arrival, and updating the estimated first and second angles.
  • the processing includes localizing, separating and Wiener post-filtering the converted acoustic signals using time-frequency weighting and outputting a time-frequency weighted signal for estimating the first and second angles.
  • converting the received acoustic signals from a time domain to a time-frequency domain includes using a short time Fourier transform.
  • the method includes combining the time-frequency weighted signal with the converted acoustic signals to generate a correlation matrix.
  • updating the estimated first and second angles comprises utilizing the correlation matrix and the estimated first and second angles and outputting updated estimated first and second angles.
  • processing the converted acoustic signals to determine the estimated first and second angles includes decomposing the converted acoustic signals to identify signals from each of the first and second acoustic sources by accounting for interference between the first and second acoustic sources in forming the acoustic signals.
  • processing the converted acoustic signals and updating the first and second estimated angles includes iteratively decomposing the converted acoustic signals to simultaneously determine the first and second directions of arrival.
  • processing the converted acoustic signals includes processing using steered response power localization.
  • the method further includes using an inverse STFT to convert the processed converted acoustic signals back into the time domain and separating the sound waves from the first acoustic source from the sound waves from the second acoustic source.
  • aspects of the present disclosure may be embodied in various manners – e.g. as a method, a system, a computer program product, or a computer-readable storage medium. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.
  • aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s), preferably non-transitory, having computer readable program code embodied, e.g., stored, thereon.
  • a computer program may, for example, be downloaded (updated) to the existing devices and systems (e.g. to the existing radar or sonar receivers and/or their controllers, etc.) or be stored upon manufacturing of these devices and systems.
  • FIGURE 1 is a diagram illustrating an audio processor receiving signals from multiple sources, according to some embodiments of the disclosure
  • FIGURE 2 is a diagram illustrating a method for identifying a first direction of arrival of sound waves from a first acoustic source and a second direction of arrival of sound waves from a second acoustic source, according to some embodiments of the disclosure;
  • FIGURE 3 is one diagram illustrating a method for separating and localizing signals, according to some embodiments of the disclosure.
  • FIGURE 4 is a diagram illustrating two data vectors from two sources and the combination of the two data vectors, according to some embodiments of the disclosure
  • FIGURE 5A is a diagram illustrating single-source likelihood over DOAs, according to some embodiments of the disclosure.
  • FIGURE 5B is a diagram illustrating a multi-source SRP likelihood for a data mixture of two sources over a joint space of all DOA pairs, according to some embodiments of the disclosure;
  • FIGURE 6 is another diagram illustrating a MultSRP method for separating and localizing signals, according to some embodiments of the disclosure.
  • FIGURE 1 is a diagram 100 illustrating an audio processor 102 receiving signals from first 104a, second 104b, and third 104n sources, according to some embodiments of the disclosure.
  • the audio processor 102 includes a microphone array 106, a direction finding module 108, a source separating module 110, and an audio processing module 112.
  • the microphone array 106 receives (i.e. acquires) a combined sound, referred to in the following as “ambient sound,” including the signals from the first 104a, second 104b and third 104n sources.
  • in some embodiments, the ambient sound includes signals from more than three sources; in general, any number of sources may be present.
  • the microphone array 106 may include one or more acoustic sensors, arranged e.g. in a sensor array, each sensor of the array configured to acquire an ambient sound (i.e., each acoustic sensor acquires a corresponding signal).
  • the sensors may be provided relatively close to one another, e.g. less than 2 centimeters (cm) apart, preferably less than 1 cm apart.
  • in other embodiments, the sensors may be arranged separated by distances that are much smaller, on the order of e.g. 1 millimeter (mm), or about 300 times smaller than a typical sound wavelength on which beamforming techniques, used e.g. for source localization, operate.
  • the sensors may be provided at larger distances with respect to one another.
  • embodiments of the present disclosure may consider the plurality of signals acquired by an array of acoustic sensors as a single signal, possibly by combining the individual acquired signals into a single signal as is appropriate for a particular implementation. Therefore, in the following, when an “acquired signal” is discussed in a singular form, then, unless otherwise specified, it is to be understood that the signal may comprise several signals acquired by different sensors of the microphone array 106.
  • a characteristic could e.g. be a quantity indicative of a magnitude of the acquired signal.
  • a characteristic is “spectral” in that it is computed for a particular frequency or a range of frequencies.
  • a characteristic is “time ⁇ dependent” in that it may have different values at different times.
  • in some embodiments, such a characteristic may be obtained using a Short Time Fourier Transform (STFT), computed as follows.
  • STFT: Short Time Fourier Transform
  • An acquired signal is functionally divided into overlapping blocks, referred to herein as “frames.”
  • frames may be of a duration of 64 milliseconds (ms) and be overlapping by e.g. 48 ms.
  • the portion of the acquired signal within a frame is then multiplied with a window function (i.e. a window function is applied to the frames) to smooth the edges.
  • a “window function” (also known as a tapering or apodization function) refers to a mathematical function that has values equal to or close to zero outside of a particular interval.
  • typically, the window functions used are non-negative smooth “bell-shaped” curves, though rectangular, triangular, and other functions can be used.
  • a function that is constant inside the interval and zero elsewhere is called a “rectangular window,” referring to the shape of its graphical representation.
  • a transformation function, such as e.g. a Fast Fourier Transform (FFT), is then applied to each windowed frame to compute its frequency decomposition.
  • the frequency decomposition of all of the frames may be arranged in a matrix where frames and frequency are indexed (in the following, frames are described to be indexed by “t” and frequencies are described to be indexed by “f”).
  • each element of such a matrix, indexed by (f, t), comprises a complex value resulting from the application of the transformation function and is referred to herein as a “time-frequency (TF) bin” or simply “bin.”
  • TF: time-frequency
  • the term “bin” may be viewed as indicative of the fact that such a matrix may be considered as comprising a plurality of bins into which the signal’s energy is distributed.
  • time-frequency bins come into play in BSS algorithms in that separation of a particular acoustic signal of interest (i.e. an acoustic signal generated by a particular source of interest) from the total signal acquired by an acoustic sensor may be achieved by identifying which bins correspond to the signal of interest, i.e. when and at which frequencies the signal of interest is active. Once such bins are identified, the total acquired signal may be masked by zeroing out the undesired time-frequency bins; such an approach is called a “hard mask.” Applying a so-called “soft mask” is also possible, the soft mask scaling the magnitude of each bin by some amount. An inverse transformation function (e.g. an inverse STFT) may then be applied to obtain the desired separated signal of interest in the time domain.
  • masking in the frequency domain corresponds to applying a time-varying frequency-selective filter in the time domain.
  • the desired separated signal of interest may then be selectively processed for various purposes.
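  • As a concrete illustration of the TF processing and masking just described (a minimal sketch assuming a 16 kHz sampling rate and a Hann window, rather than the claimed implementation):

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000                       # assumed sample rate (Hz)
nperseg = int(0.064 * fs)        # 64 ms frames, as in the text
noverlap = int(0.048 * fs)       # overlapping by 48 ms

x = np.random.randn(fs)          # placeholder 1-second acquired signal
# window the frames ("bell-shaped" Hann window) and decompose with an FFT
f, t, X = stft(x, fs=fs, window="hann", nperseg=nperseg, noverlap=noverlap)

# a soft mask scales each (f, t) bin by a value in [0, 1];
# a hard mask would contain only 0s and 1s (placeholder mask here)
W = np.random.rand(*X.shape)
_, x_sep = istft(W * X, fs=fs, window="hann", nperseg=nperseg, noverlap=noverlap)
```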
  • each source 104a, 104b, 104n has a distinct location, and the signal from each source 104a, 104b, 104n arrives at the microphone array 106 at an angle relative to its source location. Based on this angle, for each signal, the audio processor 102 estimates a direction-of-arrival (DOA).
  • DOA: direction-of-arrival
  • each source 104a, 104b, 104n has a DOA 114a, 114b, 114n.
  • the first source 104a has a first DOA 114a
  • the second source 104b has a second DOA 114b
  • the third source 104n has a third DOA 114n.
  • the microphone array 106 is coupled to the direction finding module 108, and the signals received at the microphone array 106 are transmitted to the direction finding module 108.
  • the direction finding module 108 estimates the DOAs 114a, 114b, and 114n associated with source signals 104a, 104b, and 104n, as described in greater detail below.
  • the direction finding module 108 is coupled to a separation masking module 110, where the signals corresponding to the various sources 104a, 104b, 104n are separated from each other and from background noise which may be present.
  • the direction finding module 108 and the separation masking module 110 are each also coupled to a further audio processing module 112, where further processing of the acoustic signals occurs.
  • the further audio processing may depend on the application, and may include, for example, enhancing one or more speech signals, and filtering out constant noise or repetitive sounds.
  • MVDR: Minimum Variance Distortionless Response
  • LCMV: Linearly-Constrained Minimum-Variance
  • MUSIC: Multiple Signal Classification
  • Delay-and-Sum (DS) beamforming involves adding a time delay to the signal recorded from each microphone that cancels out the delay caused by the extra travel time that it took for the signal to reach the microphone (as opposed to microphones that were closer to the signal source). Summing the resulting in-phase signals enhances the signal.
  • This beamforming method can be used to estimate DOA by testing various time delays, since the delay that correlates with the correct DOA will amplify the signal, while incorrect time delays destructively interfere with the signal.
  • the DS beamforming method focuses on the time domain to estimate DOA, and it is inaccurate in noisy environments.
  • Delay-and-Sum (DS) beamforming involves fractional delays in the frequency domain.
  • the received signals are processed by measuring the fractional delays in the signals, weighting each channel by a complex coefficient, and adding up the results.
  • DS beamforming is used in processing received signals in the single source model described below.
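  • A minimal frequency-domain delay-and-sum sketch (assuming a far-field model and a hypothetical three-microphone geometry, not the specific single source model referenced above): each channel is weighted by a complex coefficient undoing its fractional delay for a candidate DOA, the channels are summed, and the candidate DOA maximizing output power is kept:

```python
import numpy as np

c = 343.0                                                  # speed of sound (m/s)
mics = np.array([[0.0, 0.0], [0.02, 0.0], [0.04, 0.0]])   # assumed 2-D positions (m)

def ds_power(X_f, freq, theta):
    """Output power of a DS beamformer steered to unit direction theta.
    X_f: (M, T) STFT coefficients of all channels at one frequency."""
    delays = mics @ theta / c                # per-mic propagation delays
    a = np.exp(-2j * np.pi * freq * delays)  # steering vector (phase shifts)
    y = (a.conj() / len(a)) @ X_f            # weight channels and sum
    return np.mean(np.abs(y) ** 2)

# scan candidate DOAs on the unit circle and keep the most powerful one
angles = np.linspace(0, 2 * np.pi, 180, endpoint=False)
X_f = np.random.randn(3, 100) + 1j * np.random.randn(3, 100)  # placeholder data
powers = [ds_power(X_f, 1000.0, np.array([np.cos(a), np.sin(a)])) for a in angles]
theta_hat = angles[int(np.argmax(powers))]
```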
  • MVDR beamforming is similar to DS beamforming, but takes into account statistical noise correlations between the channels.
  • a Fourier transform can be used to transform the time domain signal into the time ⁇ frequency plane by converting time delays between sensors into phase shifts.
  • MVDR beamforming provides good noise suppression by minimizing the output power of the array while not distorting signals from the primary DOA, but its weights are defined by a matrix inversion, and it is therefore computationally intensive.
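  • In textbook form (not a formula quoted from this application), the MVDR weights for one frequency band are w = R⁻¹a / (aᴴR⁻¹a); the matrix inversion responsible for the computational cost is the solve below:

```python
import numpy as np

def mvdr_weights(R_f, a_f):
    """w = R^{-1} a / (a^H R^{-1} a) for one frequency band.
    R_f: (M, M) noise/data covariance; a_f: (M,) steering vector."""
    Ra = np.linalg.solve(R_f, a_f)   # R^{-1} a, the costly inversion step
    return Ra / (a_f.conj() @ Ra)    # normalize so w^H a = 1 (no distortion)
```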
  • the MVDR and DS beamformers are generalized to the multi-source case via a multiply-constrained optimization problem, whose solution is the Linearly-Constrained Minimum-Variance (LCMV) beamformer.
  • the weight vector can be used to determine how to weight the channels in the time-frequency plane to preserve energy from desired directions and suppress energy from other directions (see the LCMV sketch below).
  • the beamformers discussed above can be used to estimate the coefficients when the source DOA(s) Θ are already known. Thus, in systems using the LCMV, MVDR and DS beamforming methods described above, the source DOA(s) are determined first and then beamforming is performed.
  • the source DOA(s) may be determined using, for example, Steered Response Power Localization as described below.
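  • A sketch of the classical LCMV solution w = R⁻¹A(AᴴR⁻¹A)⁻¹g for one frequency band, with an assumed constraint vector g (e.g. 1 for the source to preserve, 0 for directions to null); the LCMV beamforming equation (2) mentioned below is assumed to be of this form:

```python
import numpy as np

def lcmv_weights(R_f, A_f, g):
    """w = R^{-1} A (A^H R^{-1} A)^{-1} g for one frequency band.
    A_f: (M, K) steering matrix; g: (K,) constraint response vector."""
    RA = np.linalg.solve(R_f, A_f)                     # R^{-1} A
    return RA @ np.linalg.solve(A_f.conj().T @ RA, g)  # enforce A^H w = g
```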
  • the MUSIC beamformer is a subspace method based on an eigenanalysis of the covariance matrix.
  • the MUSIC beamformer requires an eigendecomposition. Additionally, MUSIC is based on the assumption that the subspace that the signals lie in is low-dimensional, with the number of sources known.
  • the MUSIC beamformer decomposes a covariance matrix representing the signal and noise of the received signal.
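  • A sketch of the MUSIC pseudospectrum for one frequency band, under the stated assumption that the number of sources K is known so that the noise subspace can be split off by an eigendecomposition:

```python
import numpy as np

def music_pseudospectrum(R_f, steering, K):
    """MUSIC pseudospectrum for one frequency band.
    R_f: (M, M) data covariance; steering: (D, M) candidate steering vectors,
    one row a(theta) per look direction; K: assumed number of sources."""
    eigvals, eigvecs = np.linalg.eigh(R_f)   # eigenvalues in ascending order
    En = eigvecs[:, : R_f.shape[0] - K]      # noise subspace: M-K smallest
    proj = steering.conj() @ En              # a(theta)^H E_n for each direction
    # peaks where steering vectors are (nearly) orthogonal to the noise subspace
    return 1.0 / np.sum(np.abs(proj) ** 2, axis=1)
```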
  • Steered Response Power (SRP) localization is used to estimate the source DOA(s) Θ.
  • SRP localization is used to estimate DOAs by discretizing the direction space.
  • the source DOAs Θ estimated by SRP localization can be input into the LCMV beamforming equation (2) above.
  • SRP localization identifies DOAs Θ by searching for peaks in the output power of a single-source beamformer.
  • a more accurate and effective approach is to scan all DOA sets Θ using an LCMV beamformer and locate the peak output power.
  • however, this is computationally inefficient and too time-consuming for real-time feedback, since discretizing the DOA search space into D look directions results in D^K candidate DOA sets Θ to be scanned (where K is the number of sources present), as illustrated in the sketch below.
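  • To make the D^K cost concrete (an illustrative count only, with hypothetical values of D and K):

```python
from itertools import islice, product

D, K = 180, 3                 # 180 look directions, 3 sources (assumed values)
print(D ** K)                 # 5832000 candidate DOA sets to evaluate
candidates = product(range(D), repeat=K)
for theta_set in islice(candidates, 5):
    pass  # an exhaustive scan would evaluate LCMV output power for every set
```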
  • instead, the multi-source SRP function is modeled as a continuous likelihood function parametrized by Θ, and the likelihood function is maximized to identify the source DOAs.
  • FIGURE 2 is a diagram illustrating a method 200 for identifying a first direction of arrival of sound waves from a first acoustic source and a second direction of arrival of sound waves from a second acoustic source.
  • the method includes, at step 202, receiving, at a microphone array, acoustic signals including the sound waves from the first and second acoustic sources.
  • the received acoustic signals, now represented by electrical signals generated by the microphone array, are converted from a time domain to a time-frequency domain.
  • the converted acoustic signals are processed to determine an estimated first angle representing the first direction of arrival and an estimated second angle representing the second direction of arrival. Processing includes localizing, separating and Wiener post-filtering the converted acoustic signals using time-frequency weighting and outputting a time-frequency weighted signal for estimating the first and second angles.
  • the estimated first and second angles are updated.
  • the likelihood of the first and second angles is determined jointly, as a single unit, from the mixed signals received at the microphone array, rather than by maximizing the likelihood of each of the first and second angles separately.
  • FIGURE 3 is a diagram illustrating a method 300 for separating and localizing signals, according to some embodiments of the disclosure.
  • the method 300 is an iterative approach in which a probabilistic SRP model is combined with time-frequency masking to perform blind source separation and localization in the presence of non-stationary interference.
  • the method has an iterative loop 306 including a Time-Frequency (TF) weighting step 308, a correlation matrices step 310, and a direction of arrival (DOA) update step 312.
  • TF: Time-Frequency
  • DOA: direction of arrival
  • the method 300 begins with receiving input acoustic signals x 302 acquired by different microphone elements of the microphone array 106.
  • each acquired signal 302 may, and typically will, include contributions from multiple sources 104a-104n, and a goal of source separation is to distinguish these individual contributions on a per-source basis.
  • the acquired input acoustic signals 302 are processed through an STFT 304 to transform the signals from the time domain to the time ⁇ frequency plane.
  • the output X from the STFT 304 is input to the TF weighting step 308 and to the correlation matrices step 310.
  • the TF weighting step 308 uses TF masking to isolate TF bins that correspond to selected directional signals.
  • some directional signals are identified as being directional signals of interest, and the corresponding TF bins are isolated. Identifying the directional signal or signals of interest may include separating identified signals, and selecting one (or more) of the separated signals.
  • the selected signal corresponds to a speech signal, and it may be the speech of a particular speaker.
  • the selected directional signals are identified based on peaks in output power.
  • the TF weighting step 308 receives the output signals from the STFT step 304 as well as a DOA set Θ (DOA matrix) from the DOA update step 312, and computes TF weights for the bins.
  • the output weights W from the TF weighting step 308 are input into the correlation matrices step 310.
  • the correlation matrices step 310 combines the TF weighted input and data output from the STFT 304.
  • the correlation matrices step 310 uses the inputs to derive correlation matrices as described in greater detail below with respect to equations 15 and 16, and outputs an updated correlation matrix R to the DOA update step 312.
  • the DOA update step revises the set of DOAs Θ based on the input correlation matrix R, and outputs the updated DOAs Θ to the TF weighting step 308.
  • an output set of DOAs Θ indicating the localization results is output from the DOA update step 312 to a final separation step 314.
  • the separation step 314 also receives the STFT-processed data X as input.
  • the set of DOAs Θ is used to separate out the signals in the data X and to generate an STFT matrix for each source.
  • the STFT matrices are processed with an inverse STFT at step 316, which transforms each one into a time domain signal.
  • the time domain signals 318 output from step 316 are localized, separated and post-filtered output signals.
  • the method 300 is performed using the following equations.
  • a first method for maximizing the SRP as a function of the source DOAs uses an SRP that explicitly models the presence of multiple sources (MultSRP).
  • identifying the DOAs involves maximizing a likelihood function, where:
  • x denotes the STFT coefficients of the data from the microphone array, and
  • θ1 and θ2 are the estimated DOA angles.
  • a Gaussian likelihood is formed for the observed data vectors x_ft (see the reconstruction below), where:
  • σ_f^2 is the variance of the background noise at frequency f,
  • I is the identity matrix,
  • A_f is the steering matrix, including the observed mixing vectors a_f as elements, and
  • s_ft is a vector of complex source coefficients for a time-frequency bin, with one component for each source.
  • the expectation E[s_ft] can be approximated with a least squares estimate, as in the reconstruction below.
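  • A plausible reconstruction of these relationships from the definitions just given (observed data modeled as the steering matrix applied to the source coefficients plus isotropic Gaussian noise, a least squares pseudo-inverse for E[s_ft], and the projector of equation (8) built from A_f; all assumed rather than quoted):

```latex
x_{ft} = A_f s_{ft} + \varepsilon_{ft}, \qquad
x_{ft} \sim \mathcal{N}\!\left(A_f s_{ft},\, \sigma_f^2 I\right), \qquad
\hat{s}_{ft} = \left(A_f^{\mathsf{H}} A_f\right)^{-1} A_f^{\mathsf{H}} x_{ft}, \qquad
B_f = A_f \left(A_f^{\mathsf{H}} A_f\right)^{-1} A_f^{\mathsf{H}}.
```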
  • FIGURE 4 is a diagram 400 illustrating first 402 and second 404 data vectors from first and second sources and the combination 406 of the two data vectors.
  • the diagram 400 illustrates the additivity of the first 402 and second 404 data vectors. As illustrated in FIGURES 5A and 5B, due to interference between the first 402 and second 404 data vectors, a spurious peak in the single-source likelihood is present between the true DOAs. If a single-source likelihood is calculated for the superposition of the first 402 and second 404 data vectors, the single-source likelihood will indicate the likelihood of a single source at the combination data vector 406. This is illustrated in FIGURE 5A, which is a diagram 500 illustrating single-source likelihood over DOAs, according to some embodiments of the disclosure. In particular, the diagram 500 shows the single-source likelihood with a peak indicating a DOA of around 1.3 to 1.4 radians. Thus, the single-source likelihood equation estimates a single source positioned between the first and second sources, rather than the two separate sources.
  • FIGURE 5B is a diagram 550 illustrating a multi-source SRP likelihood for a data mixture of two sources over a joint space of all DOA pairs, according to some embodiments of the disclosure.
  • the data shown in FIGURE 5B is derived using equation (10) above, which estimates the first and second sources as having DOAs at 0.56 radians and 2.26 radians on the unit circle.
  • a second method for maximizing the SRP as a function of the source DOAs uses a mixture of single-source SRPs (MoSRP).
  • μ is the step size, and
  • normalize(·) is a function that normalizes the gradient, which appears in the parenthesized update term.
  • the gradient indicates which direction corresponds with an improvement in DOA estimates.
  • the step size μ determines how far to move in the indicated direction.
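  • Combining the step size and normalized gradient, the update of equation (11) plausibly takes the following projected-gradient form (a reconstruction consistent with the column normalization of step 628 below; the symbols μ and normalize(·) are assumed notation):

```latex
\Theta_{:k} \;\leftarrow\; \operatorname{normalize}\!\left(\Theta_{:k} + \mu\, G_{:k}\right),
\qquad
\operatorname{normalize}(v) = \frac{v}{\lVert v \rVert_2}.
```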
  • the maximum likelihood can be estimated for both the one ⁇ source model and the multiple source model.
  • the gradient can be computed in closed form, where m is the matrix of microphone positions (a matrix in which the columns are the positions of the individual microphones).
  • the single source model can be used when multiple sources are present by modeling the presence of the other sources at each time t with hidden variables z ft that capture which source is active at any selected time.
  • EM: Expectation-Maximization
  • source-specific correlation matrices are defined in terms of the posterior probabilities of the z_ft’s (equations (15) and (16)).
  • equations (13)-(16) show one way to use the single source method of equation (12) for multiple sources.
  • equation (17) can be used for localization of multiple sources.
  • in the E step, soft TF weights are determined, and in the M step, each source’s DOA is optimized.
  • the EM method alternates between estimating localization (DOA) parameters and estimating separation (TF mask) parameters.
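  • A skeletal sketch of that alternation for the 2-D far-field case (illustrative only: the steer function is supplied by the caller, the E step uses normalized beamformer output power in place of the exact posteriors of equations (13)-(16), and the M step uses a crude finite-difference, sign-normalized gradient in place of equation (12)):

```python
import numpy as np

def em_localize(X, steer, thetas, mu=0.05, eps=1e-4, n_iter=20):
    """X: (F, T, M) STFT tensor; steer(theta) -> (F, M) steering vectors
    for one DOA angle; thetas: (K,) array of current DOA angles."""
    thetas = np.array(thetas, dtype=float)
    K = len(thetas)

    def srp(theta, w):
        a = steer(theta)                          # (F, M)
        y = np.einsum("fm,ftm->ft", a.conj(), X)  # beamformer output
        return np.sum(w * np.abs(y) ** 2)         # masked output power

    for _ in range(n_iter):
        # E step: soft TF weights from per-source beamformer output power
        P = np.stack([np.abs(np.einsum("fm,ftm->ft", steer(th).conj(), X)) ** 2
                      for th in thetas], axis=-1)
        W = P / (P.sum(axis=-1, keepdims=True) + 1e-12)
        # M step: move each DOA uphill on its own masked SRP objective
        for k in range(K):
            g = (srp(thetas[k] + eps, W[..., k])
                 - srp(thetas[k] - eps, W[..., k])) / (2 * eps)
            thetas[k] += mu * np.sign(g)          # crude normalized step
    return thetas, W
```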
  • the gradient in the multiple source case is given in equation (17).
  • this multiple source case takes cross-talk into account while avoiding the complexity of the EM algorithm.
  • the weights can be approximated using equations (21) and (22).
  • interleaving the Wiener masking with DOA optimization improves localization accuracy in the presence of ambient noise.
  • the quantity of equation (15) can be estimated by multiplying the posteriors with the Wiener filter weights.
  • the sources can be separated by applying TF masks with weights. In various examples, this may be done in one or more of step 308 and step 314 of the method 300.
  • for this, equation (23) can be used.
  • the source coefficients are recovered with LCMV beamforming.
  • the variance is related to the hardness of the mask, such that as the variance moves to zero, the mask becomes binary.
  • the masks can be applied to the corresponding components of the STFT X and followed with a Wiener masking step to suppress non-speech interference and reduce the presence of masking artifacts.
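  • Equations (19)-(22) are referenced but not spelled out here; one natural Wiener-style choice (an assumption, not the patented formula) weights each bin by the fraction of its energy explained by the directional subspace, via the projector tensor B described below:

```python
import numpy as np

def wiener_weights(X, B, eps=1e-12):
    """X: (F, T, M) STFT tensor; B: (M, M, F) per-frequency projector matrices.
    Returns (F, T) weights in [0, 1]: directional energy / total energy."""
    # project each data vector X[f, t, :] onto the directional subspace
    Xdir = np.einsum("mnf,ftn->ftm", B, X)
    num = np.sum(np.abs(Xdir) ** 2, axis=-1)
    den = np.sum(np.abs(X) ** 2, axis=-1) + eps
    return num / den
```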
  • FIGURE 6 is a diagram illustrating a method 600 for separating and localizing signals, according to some embodiments of the disclosure.
  • the method 600 may be considered as a summary, or an alternative representation, of the method 300 described above. Therefore, in the interests of brevity, some steps illustrated in method 600 refer to steps illustrated in method 300 in order to not repeat all of the details of their descriptions.
  • the method 600 includes a stage 610 that may be referred to as a preprocessing stage, a stage 620 that may be referred to as an optimization stage, and a stage 630 that may be referred to as a source separation stage.
  • the preprocessing stage 610 may include steps 612, 614, 616, and 618.
  • in step 612, acoustic signals are captured by the microphone array 106, as described above with reference to 302.
  • in step 614, an STFT is applied to the captured signals x_m in order to convert the captured signals into the TF domain, resulting in complex-valued matrices.
  • in step 616, correlation matrices are initialized by estimating a correlation matrix for each frequency (see the preprocessing sketch below).
  • in step 618, the DOA parameter matrix Θ0 is initialized, with one column per acoustic source k (k being an integer between 1 and n for the acoustic sources 104 illustrated in FIGURE 1).
  • step 618 may be carried out in different manners, including e.g. SRP localization described above.
  • the initialized DOA matrix Θ0 is provided to the optimization stage 620.
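  • A sketch of the preprocessing stage 610 under assumed parameters (random unit-norm DOA initialization in step 618; as noted above, SRP localization could be used instead):

```python
import numpy as np
from scipy.signal import stft

def preprocess(x, fs, K, nperseg=1024, noverlap=768):
    """x: (M, N) time-domain microphone signals -> (X, R, Theta0)."""
    # step 614: STFT of every channel, rearranged into an (F, T, M) TF tensor
    _, _, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)  # (M, F, T)
    X = np.transpose(Z, (1, 2, 0))
    # step 616: per-frequency correlation R_f = (1/T) sum_t x_ft x_ft^H
    R = np.einsum("ftm,ftn->mnf", X, X.conj()) / X.shape[1]
    # step 618: initialize the DOA matrix with K random unit-norm columns
    Theta0 = np.random.randn(3, K)
    Theta0 /= np.linalg.norm(Theta0, axis=0)
    return X, R, Theta0
```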
  • the optimization stage 620 may include steps 622, 624, 626, and 628, which may be iteratively repeated for a number of iterations Imax, in order to improve the estimate of the DOA matrix Θ (i.e. in order to improve DOA estimates for the different acoustic sources 104).
  • the number of iterations Imax may be determined by various stopping conditions. For example, in some embodiments, the maximum number of iterations may be pre-defined, while, in other embodiments, iterations may be performed until a certain condition is met, such as e.g. a predefined threshold on the improvement of the likelihood value described below.
  • in step 622, for each frequency, a steering matrix A_f, described above with reference to equation (5) and subsequent equations, is computed (a reconstruction is given below), where:
  • ℓ_f is the frequency in Hertz of the f-th frequency band, and
  • c is the speed of sound
  • a projector matrix may then be computed as shown above with the equation (8).
  • Steering matrices A and projection matrices B may then be, optionally, provided to step 624.
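  • From the quantities just defined (frequency ℓ_f, speed of sound c, microphone position matrix m with columns m:m, and DOA columns Θ:k), the far-field steering entries plausibly take the form (a reconstruction, not the verbatim equation):

```latex
A_f[m, k] \;=\; \exp\!\left(-\,\frac{2\pi i\,\ell_f}{c}\; m_{:m}^{\mathsf{T}}\,\Theta_{:k}\right).
```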
  • in step 624, if the Wiener masking described above is used, new correlation matrices are re-estimated as described above with reference to equations (19)-(20).
  • in some embodiments, equations (20) and (19) for re-estimating the new correlation matrices may be re-written as equations (31) and (32).
  • in step 626, a DOA gradient matrix G may be computed.
  • Equation (33) is an exemplary explicit equation for the gradient given in equation (17) above.
  • the gradient matrix G is provided to step 628, where the DOA matrix Θ is adjusted as described with reference to equation (11) above.
  • in particular, the DOA matrix is adjusted by taking a step along the gradient, and
  • the columns of the DOA matrix may then be normalized to unit length, as in the sketch below.
  • step 628 implements the gradient procedure, given an appropriate gradient as in equations (11) and (12) above.
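  • A compact sketch of one pass through steps 622-628, using the reconstructed steering form above; reestimate_correlations and doa_gradient are hypothetical helper names standing in for equations (31)-(32) and (33), whose exact forms are given in the application:

```python
import numpy as np

def optimize_doas(X, R, Theta, mic_pos, freqs, c=343.0, mu=0.1, n_iter=50):
    """X: (F, T, M) STFT tensor; R: (M, M, F); Theta: (3, K); mic_pos: (3, M)."""
    for _ in range(n_iter):
        # step 622: steering tensor A (M, K, F) and projectors B_f = A(A^H A)^{-1}A^H
        phase = np.einsum("im,ik->mk", mic_pos, Theta)[:, :, None]
        A = np.exp(-2j * np.pi * freqs[None, None, :] / c * phase)
        B = np.stack([Af @ np.linalg.solve(Af.conj().T @ Af, Af.conj().T)
                      for Af in np.moveaxis(A, -1, 0)], axis=-1)
        R = reestimate_correlations(X, B, R)    # step 624 (Wiener masking)
        G = doa_gradient(R, A, mic_pos, freqs)  # step 626
        Theta = Theta + mu * G                  # step 628: gradient step...
        Theta /= np.linalg.norm(Theta, axis=0)  # ...then unit-normalize each column
    return Theta
```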
  • Step 624 may be performed as a part of 308 and 310 described above, while step 628 corresponds to 312 described above.
  • the updated DOA matrix Θ is then provided to the source separation, as illustrated in FIGURE 6 with Θ provided to the separation stage 630 and as illustrated in FIGURE 3 with an arrow from 306 to the final separation step 314.
  • the source separation stage 630 may include steps 632 and 634. Following the iterative procedure described above, any number of methods may be used to enhance/separate the directional signals, all of which methods are within the scope of the present disclosure.
  • each source 104 may be isolated by estimating TF masks and applying them to the STFT X.
  • the sources can be separated by applying TF masks with weights, which could be done in one or more of step 308 and step 314 of the method 300, using equation (23) provided above, with estimates of the source coefficients provided by K LCMV beamformers, each designated to isolate a single source while blocking out, or at least substantially suppressing, the others. In one embodiment, this may be implemented as sketched below.
  • the variance controls the hardness of the mask such that, as the variance approaches zero, the mask becomes binary, assigning each TF bin entirely to a single source.
  • in some embodiments, these masks are applied to any single captured signal (i.e. to any signal captured by one of the acoustic sensors of the microphone array 106) and the result is inverted to the time domain using an inverse STFT, as described above with reference to 316.
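  • One way to realize such masks (a sketch under assumptions: per-source coefficient estimates ŝ from the K LCMV beamformers, combined in a softmax-style competition whose variance parameter controls hardness, consistent with the binary limit described above):

```python
import numpy as np

def tf_masks(s_hat, var):
    """s_hat: (F, T, K) beamformed source estimates -> (F, T, K) soft masks.
    As var -> 0 the softmax sharpens and each bin goes to a single source."""
    logits = np.abs(s_hat) ** 2 / var
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    W = np.exp(logits)
    return W / W.sum(axis=-1, keepdims=True)
```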
  • the method 600 is presented for the case of an SRP that explicitly models the presence of multiple sources, i.e. method 600 is a MultSRP method.
  • a method for the mixture of single-source SRPs would include steps analogous to those illustrated in FIGURE 6, with the main difference residing in the gradients of the two methods, in particular in how the correlation information is used (i.e. the difference between MultSRP and MoSRP is in re-computing the correlation matrices as is done in step 624 described above).
  • step 624 would involve including posterior probability weights in re-computing the correlation matrices as in equation (15). Gradients for the MoSRP method are given in equation (12).
  • third rank tensors are represented with capital letters (e.g. X), while individual elements of a tensor are denoted with X ijk , where “ijk” represents indices corresponding to those most appropriate for the tensor.
  • sub-matrices of the third rank tensors (i.e. second rank tensors, also referred to as matrices) are denoted as, for example, X::k, which indicates that, in this example, only the third index of the third rank tensor X is specified.
  • sub-vectors (i.e. first rank tensors) derived from the corresponding higher rank tensors are similarly denoted as, for example, X:jk, indicating that e.g. only the second and third indices of the third rank tensor X are specified.
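  • In NumPy terms, this notation corresponds directly to array slicing (illustrative shapes only):

```python
import numpy as np

X = np.zeros((4, 5, 3), dtype=complex)  # a third rank tensor, indices (i, j, k)
x_ijk = X[0, 1, 2]   # single element X_ijk
M_k   = X[:, :, 2]   # sub-matrix X_::k (only the third index fixed)
v_jk  = X[:, 1, 2]   # sub-vector X_:jk (second and third indices fixed)
```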
  • source localization refers to determining a DOA of an acoustic signal generated by an acoustic source k of K acoustic sources 104-1 through 104-K, the DOA indicating a DOA of the acoustic signal at a microphone array 106 comprising M microphones.
  • each of K and M may be an integer equal to or greater than 2.
  • M is typically an integer on the order of 5, but, of course, in various implementations the number of microphones may be different.
  • K is typically an integer in the range [2, 4]. Since in a typical deployment scenario it is often not possible to know for sure how many acoustic sources are present, the value of K (i.e. the number of acoustic sources being modeled) is estimated or selected based on various considerations that a person of ordinary skill in the art would readily recognize, such as e.g. the likely number of acoustic sources, an estimate based on a source-counting algorithm, or prior knowledge.
  • a source localization method may include steps of: a) determining a time-frequency (TF) tensor (X) of FxTxM dimensions, where F is an integer indicating the number of frequency components f and T is an integer indicating the number of time frames t (each of F, T, and M being an integer equal to or greater than 2, where F may be on the order of 500 and T may be on the order of 100), the TF tensor comprising a TF representation, e.g. an STFT, of each of M digitized signal streams x, each digitized stream corresponding to a combined acoustic signal captured by one of the M microphones of the microphone array;
  • each element Xftm of the tensor X (f being an integer from the set {1, ..., F}, t being an integer from the set {1, ..., T}, and m being an integer from the set {1, ..., M}) is configured to comprise a complex value indicative of the measured magnitude and phase of a portion of a digitized stream x corresponding to a frequency component f at a time frame t for a microphone m;
  • b) initializing a DOA tensor (Θ), the DOA tensor being of dimensions 3xK (i.e. it is a second order tensor, or a matrix) and comprising estimated DOA information for each of the K acoustic sources, where each element Θik of the DOA tensor (i being an integer from the set {1, 2, 3}, k being an integer from the set {1, ..., K}) is configured to comprise a real value indicative of the orientation of a particular acoustic source k with respect to the microphone array (in a 3-dimensional space around the microphone array 106) in dimension i (the columns Θ:k of Θ are vectors of length 1);
  • c) based on values of the TF tensor, determining a correlation tensor (R) of MxMxF dimensions, where each element Rm1m2f of the correlation tensor (m1 and m2 each being an integer from the set {1, ..., M} and f being an integer from the set {1, ..., F}) is configured to comprise a complex value indicative of the estimated correlation between a portion of the digitized stream x as acquired by microphone m1 and a portion of the digitized stream x as acquired by microphone m2 for a particular frequency component f;
  • “localizable” sources, i.e. sources for which it is possible to determine orientation with respect to the microphone array; in other words, directional sources; in other words, sources that may be approximated as point sources for which it is possible to identify their location; e.g. ambient noise coming from all different directions would not be a localizable source;
  • each element Bm1m2f of the projector tensor (m1 and m2 both being integers from the set {1, ..., M} and f being an integer from the set {1, ..., F}) is configured to comprise a complex value indicative of a set (subspace) of data vectors X_ft: that correspond to signals originating from the estimated orientations in Θ at frequency component f (the product B::f · X_ft: results in a vector that approximates the directional components in the signal at time t and frequency f);
  • each element Gik of the DOA gradient tensor (i being an integer from the set {1, 2, 3}, k being an integer from the set {1, ..., K}) is configured to comprise a real value indicative of an estimated change in the orientation estimates of an acoustic source k, i.e. an estimated change in the DOA matrix Θ that is necessary to improve the source orientation estimates;
  • determining the DOA of an acoustic source k is based on a column Θ:k of the DOA tensor, i.e. a DOA vector for any source k is then obtained from the column Θ:k of the DOA matrix.
  • the source localization method summarized above could further include steps e’) and e’’) to be iterated together with steps d)-g), steps e’) and e’’) being as follows:
  • each element Wftk of the weight tensor is configured to comprise a real value between 0 and 1 indicative of the degree to which acoustic source k is active in the (f,t)-th bin of the TF tensor X (i.e. indicating a percentage of energy in the (f,t)-th bin, for each of the M microphones, that is attributable to the acoustic signal generated by the acoustic source k), and
  • the one or more predefined criteria may include a predefined threshold value indicating improvement, e.g. percentage improvement, of a likelihood value indicating how well the estimated orientations in Θ explain the observed data given the assumed data model (see equation (9)).
  • Example 1 provides a method for determining a direction of arrival (DOA) of an acoustic signal generated by an acoustic source k of K acoustic sources, the DOA indicating a DOA of the acoustic signal at a microphone array including M microphones, each of K and M being an integer equal to or greater than 2, the method including: a) determining a time-frequency (TF) tensor of FxTxM dimensions, where F is an integer indicating a number of frequency components f and T is an integer indicating a number of time frames t, the TF tensor including a TF representation of each of M digitized signal streams x, each digitized stream corresponding to a combined acoustic signal captured by one of M microphones of the microphone array; b) initializing a DOA matrix of dimensions 3xK, the DOA matrix including estimated DOA information for each of the K acoustic sources; c) based on values of the TF tensor, computing a correlation tensor of dimensions MxMxF; d) based on values of the DOA matrix, computing a steering tensor of dimensions MxKxF; e) based on values of the steering tensor, computing a projector tensor of dimensions MxMxF; f) based on values of the correlation tensor and values of the projector tensor, computing a DOA gradient matrix of dimensions 3xK; g) updating the DOA matrix based on values of the DOA gradient matrix; iterating steps d)-g); and determining the DOA of the acoustic signal generated by the acoustic source k based on a column of the DOA matrix.
  • Example 2 provides the method according to Example 1, where each element X ftm of the TF tensor is configured to include a complex value indicative of measured magnitude and phase of a portion of a digitized stream x corresponding to a frequency component f at a time frame t for a microphone m.
  • Example 3 provides the method according to Examples 1 or 2, where each element Θik of the DOA matrix is configured to include a real value indicative of orientation of the acoustic source k with respect to the microphone array in dimension i.
  • Example 4 provides the method according to any one of the preceding Examples, where each element R m1m2f of the correlation tensor is configured to include a complex value indicative of correlation between a portion of the digitized stream x as acquired by microphone m1 and a portion of the digitized stream x as acquired by microphone m2 for a particular frequency component f.
  • Example 5 provides the method according to any one of the preceding Examples, where each element A mkf of the steering tensor is configured to include a complex value indicative of a magnitude and a phase response of a microphone m to an acoustic source k at a frequency component f.
  • Example 6 provides the method according to any one of the preceding Examples, where each element B m1m2f of the projector tensor is configured to include a complex value indicative of a set of data vectors X ft: that correspond to localizable signals with steering matrix A ::f at a frequency component f.
  • Example 7 provides the method according to any one of the preceding Examples, where each element G ik of the DOA gradient matrix is configured to include a real value indicative of an estimated change in the DOA tensor for improving orientation estimate of the acoustic source k.
  • Example 8 provides the method according to any one of the preceding Examples, further including: e’) based on values of the projector tensor and values of the TF tensor, computing a TF weight tensor of dimensions FxTxK, where each element Wftk of the TF weight tensor is configured to include a real value between 0 and 1 indicative of the degree to which the acoustic source k is active in the (f,t)-th bin of the TF tensor; and e’’) re-computing the correlation tensor based on values of the TF weight tensor, steps e’) and e’’) being iterated together with steps d)-g).
  • Example 9 provides the method according to Example 8, where computing the TF weight tensor includes using a Wiener mask.
  • Example 10 provides the method according to Example 8, where computing the TF weight tensor includes using a Wiener mask and defining source-specific correlation matrices in terms of posterior probabilities.
  • Example 11 provides the method according to any one of the preceding Examples, where the iterations are performed until one or more predefined criteria are met.
  • Example 12 provides a method for identifying a first direction of arrival of sound waves from a first acoustic source and a second direction of arrival of sound waves from a second acoustic source, the method including: receiving, at a microphone array, acoustic signals including the sound waves from the first and second acoustic sources; converting the received acoustic signals from a time domain to a time-frequency domain; processing the converted acoustic signals to determine an estimated first angle representing the first direction of arrival and an estimated second angle representing the second direction of arrival; and updating the estimated first and second angles; where processing includes localizing, separating and Wiener post-filtering the converted acoustic signals using time-frequency weighting and outputting a time-frequency weighted signal for estimating the first and second angles.
  • Example 13 provides the method according to Example 12, further including combining the time-frequency weighted signal with the converted acoustic signals to generate a correlation matrix.
  • Example 14 provides the method according to Example 13, where updating the estimated first and second angles includes utilizing the correlation matrix and the estimated first and second angles and outputting updated estimated first and second angles.
  • Example 15 provides the method according to Example 12, where converting the received acoustic signals from a time domain to a time-frequency domain includes using a short time Fourier transform.
  • Example 16 provides the method according to Example 12, where processing the converted acoustic signals to determine the estimated first and second angles includes decomposing the converted acoustic signals to identify signals from each of the first and second acoustic sources by accounting for interference between the first and second acoustic sources in forming the acoustic signals.
  • Example 17 provides the method according to Example 12, where processing the converted acoustic signals and updating the first and second estimated angles includes iteratively decomposing the converted acoustic signals to simultaneously determine the first and second directions of arrival.
  • Example 18 provides the method according to Example 12, where processing the converted acoustic signals includes processing using steered response power localization.
  • Example 19 provides the method according to Example 12, further including using an inverse STFT to convert the processed converted acoustic signals back into the time domain and separating the sound waves from the first acoustic source from the sound waves from the second acoustic source.
  • Example 20 provides a system comprising means for implementing the method according to any one of the preceding Examples.
  • Example 21 provides a data structure for assisting implementation of the method according to any one of the preceding Examples.
  • Example 22 provides a system for determining a DOA of an acoustic signal generated by an acoustic source k of K acoustic sources, the DOA indicating a DOA of the acoustic signal at a microphone array comprising M microphones, each of K and M being an integer equal to or greater than 2, the system including at least one memory element configured to store computer executable instructions, and at least one processor coupled to the at least one memory element and configured, when executing the instructions, to carry out the method according to any one of Examples 1-11.
  • Example 23 provides one or more non-transitory tangible media encoding logic that include instructions for execution that, when executed by a processor, are operable to perform operations for determining a DOA of an acoustic signal generated by an acoustic source k of K acoustic sources, the DOA indicating a DOA of the acoustic signal at a microphone array comprising M microphones, each of K and M being an integer equal to or greater than 2, the operations comprising operations of the method according to any one of Examples 1-11.
  • Example 24 provides a system for identifying a first direction of arrival of sound waves from a first acoustic source and a second direction of arrival of sound waves from a second acoustic source, the system including at least one memory element configured to store computer executable instructions, and at least one processor coupled to the at least one memory element and configured, when executing the instructions, to carry out the method according to any one of Examples 12-19.
  • Example 25 provides one or more non-transitory tangible media encoding logic that include instructions for execution that, when executed by a processor, are operable to perform operations for identifying a first direction of arrival of sound waves from a first acoustic source and a second direction of arrival of sound waves from a second acoustic source, the operations comprising operations of the method according to any one of Examples 12-19.
  • any number of electrical circuits used to implement the systems and methods of the FIGURES may be implemented on a board of an associated electronic device.
  • the board can be a general circuit board that can hold various components of the internal electronic system of the electronic device and,
  • the board can provide the electrical connections by which the other components of the system can communicate electrically.
  • any suitable processors (inclusive of digital signal processors, microprocessors, supporting chipsets, etc.), computer-readable non-transitory memory elements, etc. can be suitably coupled to the board based on particular configuration needs, processing demands, computer designs, etc.
  • other components such as external storage, additional sensors, controllers for audio/video display, and peripheral devices may be attached to the board as plug-in cards, via cables, or integrated into the board itself.
  • the functionalities described herein may be implemented in emulation form as software or firmware running within one or more configurable (e.g., programmable) elements arranged in a structure that supports these functions.
  • the software or firmware providing the emulation may be provided on a non-transitory computer-readable storage medium comprising instructions to allow a processor to carry out those functionalities.
  • the systems and methods of the FIGURES may be implemented as stand-alone modules (e.g., a device with associated components and circuitry configured to perform a specific application or function) or implemented as plug-in modules into application specific hardware of electronic devices.
  • SOC: system on chip
  • an SOC represents an IC that integrates components of a computer or other electronic system into a single chip. It may contain digital, analog, mixed-signal, and often radio frequency functions, all of which may be provided on a single chip substrate.
  • MCM: multi-chip module
  • the identification, localization and separation functionalities may be implemented in one or more silicon cores in Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), and other semiconductor chips.
  • the features discussed herein can be applicable to medical systems, scientific instrumentation, wireless and wired communications, radar, industrial process control, audio and video equipment, current sensing, instrumentation (which can be highly precise), and other digital-processing-based systems.
  • certain embodiments discussed above can be provisioned in digital signal processing technologies for medical imaging, patient monitoring, medical instrumentation, and home healthcare. This could include pulmonary monitors, accelerometers, heart rate monitors, pacemakers, etc. Other applications can involve automotive technologies for safety systems (e.g., stability control systems, driver assistance systems, braking systems, infotainment and interior applications of any kind). Furthermore, powertrain systems (for example, in hybrid and electric vehicles) can use high-precision data conversion products in battery monitoring, control systems, reporting controls, maintenance activities, etc.
  • the teachings of the present disclosure can be applicable in the industrial markets that include process control systems that help drive productivity, energy efficiency, and reliability.
  • the teachings of the signal processing circuits discussed above can be used for image processing, auto focus, and image stabilization (e.g., for digital still cameras, camcorders, etc.).
  • other consumer applications can include audio and video processors for home theater systems, DVD recorders, and high-definition televisions.
  • Yet other consumer applications can involve advanced touch screen controllers (e.g., for any type of portable media device).
  • such technologies could readily be part of smartphones, tablets, security systems, PCs, gaming technologies, virtual reality, simulation training, etc.
  • references to various features (e.g., elements, structures, modules, components, steps, operations, characteristics, etc.) are intended to mean that any such features are included in one or more embodiments of the present disclosure, but may or may not necessarily be combined in the same embodiments.
  • the activities discussed above may be applicable to a system that can include any suitable circuitry (dividers, capacitors, resistors, inductors, ADCs, DFFs, logic gates, software, hardware, links, etc.) and that can be part of any type of computer, which can further include other related components.
  • the system can include means for clocking data from the digital core onto a first data output of a macro using a first clock, the first clock being a macro clock; means for clocking the data from the first data output of the macro into the physical interface using a second clock, the second clock being a physical interface clock; means for clocking a first reset signal from the digital core onto a reset output of the macro using the macro clock, the first reset signal output used as a second reset signal; means for sampling the second reset signal using a third clock, which provides a clock rate greater than the rate of the second clock, to generate a sampled reset signal; and means for resetting the second clock to a predetermined state in the physical interface in response to a transition of the sampled reset signal.
  • the ‘means for’ in these instances can include (but is not limited to) using any suitable component discussed herein, along with any suitable software, circuitry, hub, computer code, logic, algorithms, hardware, controller, interface, link, bus, communication pathway, etc.
  • The system includes memory that further comprises machine-readable instructions that, when executed, cause the system to perform any of the activities discussed above.
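For the clock-domain hand-off described in the "means for" item above, the following purely behavioral Python sketch may help visualize the sequence: a reset launched on the macro clock is sampled by a faster third clock, and a transition of the sampled reset forces the physical-interface clock to a predetermined state. All names and the toy timing here are invented for illustration; an actual design would realize this in hardware logic, not software.

    def simulate_reset_handoff(num_steps=32):
        """Toy behavioral model; one step = one period of the fast third clock."""
        # Reset pulse launched from the digital core on the macro clock.
        macro_reset = [1 if 8 <= step < 16 else 0 for step in range(num_steps)]
        sampled_reset = 0
        phy_clock_state = "running"
        events = []
        for step in range(num_steps):
            previous = sampled_reset
            sampled_reset = macro_reset[step]  # the faster third clock samples the reset
            if previous != sampled_reset:      # a transition of the sampled reset...
                # ...drives the physical-interface clock to a predetermined state,
                # releasing it again when the sampled reset deasserts.
                phy_clock_state = "predetermined" if sampled_reset else "running"
                events.append((step, phy_clock_state))
        return events

    print(simulate_reset_handoff())  # [(8, 'predetermined'), (16, 'running')]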

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A method for identifying the direction of arrival of sound waves from first and second acoustic sources is disclosed. The method includes receiving, at a microphone array, acoustic signals comprising the sound waves from the first and second acoustic sources; converting the received acoustic signals from a time domain to a time-frequency domain; processing the converted acoustic signals to determine a first estimated angle representing the first direction of arrival and a second estimated angle representing the second direction of arrival; and updating the first and second estimated angles, the processing comprising localization, separation, and Wiener post-filtering of the converted acoustic signals using time-frequency weighting.
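To make the chain named in the abstract easier to follow, here is a minimal Python sketch of one way such a pipeline can be wired together: an STFT takes the signals into the time-frequency domain, inter-microphone phase differences yield per-bin angle estimates, a two-cluster update refines the two DOA angles, and a Wiener-like time-frequency weighting separates the sources. This is an illustration under simplifying assumptions (a two-microphone array with spacing d, free-field propagation, spatial aliasing ignored), not the patented method; the function name separate_two_sources and all parameter values are invented for this sketch.

    import numpy as np
    from scipy.signal import stft, istft

    def separate_two_sources(x1, x2, fs, d=0.05, c=343.0, n_iter=10):
        """Toy two-source DOA estimation and separation (hypothetical sketch)."""
        # 1) Time domain -> time-frequency domain.
        f, t, X1 = stft(x1, fs, nperseg=512)
        _, _, X2 = stft(x2, fs, nperseg=512)
        # 2) Inter-microphone phase difference -> per-bin angle estimate
        #    (free field, mic spacing d; spatial aliasing ignored in this toy).
        phase = np.angle(X2 * np.conj(X1))
        freqs = np.outer(f, np.ones(len(t)))
        with np.errstate(divide="ignore", invalid="ignore"):
            sin_th = c * phase / (2.0 * np.pi * freqs * d)
        theta = np.arcsin(np.clip(np.nan_to_num(sin_th), -1.0, 1.0))
        # 3) Update the two estimated angles (a simple 2-means on the angles).
        th1, th2 = -0.5, 0.5
        for _ in range(n_iter):
            m1 = np.abs(theta - th1) <= np.abs(theta - th2)  # bins nearer source 1
            th1, th2 = theta[m1].mean(), theta[~m1].mean()
        # 4) Wiener-like post-filter built from mask-weighted T-F powers.
        p1 = (np.abs(X1) * m1) ** 2
        p2 = (np.abs(X1) * (~m1)) ** 2
        w1 = p1 / (p1 + p2 + 1e-12)
        _, s1 = istft(w1 * X1, fs)
        _, s2 = istft((1.0 - w1) * X1, fs)
        return np.degrees(th1), np.degrees(th2), s1, s2

For a stereo recording x of shape (2, n) sampled at fs, separate_two_sources(x[0], x[1], fs) would return the two estimated angles in degrees and the two separated waveforms.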
PCT/US2015/066012 2014-12-18 2015-12-16 Systems and methods for source localization and separation WO2016100460A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201462093903P 2014-12-18 2014-12-18
US62/093,903 2014-12-18

Publications (1)

Publication Number Publication Date
WO2016100460A1 true WO2016100460A1 (fr) 2016-06-23

Family

ID=56127517

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/066012 WO2016100460A1 (fr) 2014-12-18 2015-12-16 Systems and methods for source localization and separation

Country Status (1)

Country Link
WO (1) WO2016100460A1 (fr)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107132503A (zh) * 2017-03-23 2017-09-05 Harbin Engineering University Acoustic vector circular array broadband coherent source direction estimation method based on vector singular value decomposition
WO2018056214A1 (fr) * 2016-09-23 2018-03-29 JFE Steel Corporation Ultrasonic wave source azimuth orienting device, and method of analyzing superimposed image
CN109239665A (zh) * 2018-07-10 2019-01-18 Peking University Shenzhen Graduate School Multi-sound-source continuous localization method and apparatus based on signal subspace similarity spectrum and particle filter
CN109856593A (zh) * 2018-12-21 2019-06-07 Nanjing University of Science and Technology Miniature intelligent array acoustic sensor for sound source direction finding and direction-finding method thereof
CN110876100A (zh) * 2018-08-29 2020-03-10 Beijing Canaan Creative Information Technology Co., Ltd. Sound source orientation method and system
CN111060875A (zh) * 2019-12-12 2020-04-24 Beijing SoundAI Technology Co., Ltd. Method and apparatus for acquiring device relative position information, and storage medium
CN111133511A (zh) * 2017-07-19 2020-05-08 AudioTelligence Limited Sound source separation system
WO2020060519A3 (fr) * 2018-09-17 2020-06-04 Aselsan Elektronik Sanayi Ve Ticaret Anonim Şirketi Joint source localization and separation method for acoustic sources
CN111724801A (zh) * 2020-06-22 2020-09-29 Beijing Xiaomi Pinecone Electronics Co., Ltd. Audio signal processing method and apparatus, and storage medium
CN112106069A (zh) * 2018-06-13 2020-12-18 HRL Laboratories, LLC Streaming data tensor analysis using blind source separation
WO2021013346A1 (fr) * 2019-07-24 2021-01-28 Huawei Technologies Co., Ltd. Apparatus for determining spatial positions of multiple audio sources
CN113138367A (zh) * 2020-01-20 2021-07-20 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences Target localization method and apparatus, electronic device, and storage medium
US11134348B2 (en) 2017-10-31 2021-09-28 Widex A/S Method of operating a hearing aid system and a hearing aid system
WO2022042864A1 (fr) 2020-08-31 2022-03-03 Proactivaudio Gmbh Method and apparatus for measuring directions of arrival of multiple sound sources
CN114460541A (zh) * 2022-02-10 2022-05-10 State Grid Shanghai Electric Power Company Power equipment noise source localization method and apparatus, and sound source localization device
CN115015830A (zh) * 2022-06-01 2022-09-06 Beijing Zhongan Intelligent Information Technology Co., Ltd. Underwater acoustic signal processing algorithm based on machine learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140023199A1 (en) * 2012-07-23 2014-01-23 Qsound Labs, Inc. Noise reduction using direction-of-arrival information
US20140192999A1 (en) * 2013-01-08 2014-07-10 Stmicroelectronics S.R.L. Method and apparatus for localization of an acoustic source and acoustic beamforming
US20140226838A1 (en) * 2013-02-13 2014-08-14 Analog Devices, Inc. Signal source separation

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018056214A1 (fr) * 2016-09-23 2018-03-29 JFE Steel Corporation Ultrasonic wave source azimuth orienting device, and method of analyzing superimposed image
JPWO2018056214A1 (ja) * 2016-09-23 2018-09-20 JFE Steel Corporation Ultrasonic source azimuth orienting device and method of analyzing superimposed images
CN107132503B (zh) * 2017-03-23 2019-09-27 Harbin Engineering University Acoustic vector circular array broadband coherent source direction estimation method based on vector singular value decomposition
CN107132503A (zh) * 2017-03-23 2017-09-05 Harbin Engineering University Acoustic vector circular array broadband coherent source direction estimation method based on vector singular value decomposition
CN111133511B (zh) * 2017-07-19 2023-10-27 AudioTelligence Limited Sound source separation system
CN111133511A (zh) * 2017-07-19 2020-05-08 AudioTelligence Limited Sound source separation system
US11134348B2 (en) 2017-10-31 2021-09-28 Widex A/S Method of operating a hearing aid system and a hearing aid system
US11218814B2 (en) 2017-10-31 2022-01-04 Widex A/S Method of operating a hearing aid system and a hearing aid system
US11146897B2 (en) 2017-10-31 2021-10-12 Widex A/S Method of operating a hearing aid system and a hearing aid system
CN112106069A (zh) * 2018-06-13 2020-12-18 HRL Laboratories, LLC Streaming data tensor analysis using blind source separation
CN109239665A (zh) * 2018-07-10 2019-01-18 Peking University Shenzhen Graduate School Multi-sound-source continuous localization method and apparatus based on signal subspace similarity spectrum and particle filter
CN109239665B (zh) * 2018-07-10 2022-04-15 Peking University Shenzhen Graduate School Multi-sound-source continuous localization method and apparatus based on signal subspace similarity spectrum and particle filter
CN110876100A (zh) * 2018-08-29 2020-03-10 Beijing Canaan Creative Information Technology Co., Ltd. Sound source orientation method and system
CN110876100B (zh) * 2018-08-29 2022-12-09 Canaan Bright Sight (Beijing) Technology Co., Ltd. Sound source orientation method and system
US11482239B2 (en) 2018-09-17 2022-10-25 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Joint source localization and separation method for acoustic sources
WO2020060519A3 (fr) * 2018-09-17 2020-06-04 Aselsan Elektronik Sanayi Ve Ticaret Anonim Şirketi Joint source localization and separation method for acoustic sources
CN109856593B (zh) * 2018-12-21 2023-01-03 Nanjing University of Science and Technology Miniature intelligent array acoustic sensor for sound source direction finding and direction-finding method thereof
CN109856593A (zh) * 2018-12-21 2019-06-07 Nanjing University of Science and Technology Miniature intelligent array acoustic sensor for sound source direction finding and direction-finding method thereof
WO2021013346A1 (fr) * 2019-07-24 2021-01-28 Huawei Technologies Co., Ltd. Apparatus for determining spatial positions of multiple audio sources
US11921198B2 (en) 2019-07-24 2024-03-05 Huawei Technologies Co., Ltd. Apparatus for determining spatial positions of multiple audio sources
CN111060875A (zh) * 2019-12-12 2020-04-24 Beijing SoundAI Technology Co., Ltd. Method and apparatus for acquiring device relative position information, and storage medium
CN111060875B (zh) * 2019-12-12 2022-07-15 Beijing SoundAI Technology Co., Ltd. Method and apparatus for acquiring device relative position information, and storage medium
CN113138367A (zh) * 2020-01-20 2021-07-20 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences Target localization method and apparatus, electronic device, and storage medium
CN111724801A (zh) * 2020-06-22 2020-09-29 Beijing Xiaomi Pinecone Electronics Co., Ltd. Audio signal processing method and apparatus, and storage medium
WO2022042864A1 (fr) 2020-08-31 2022-03-03 Proactivaudio Gmbh Method and apparatus for measuring directions of arrival of multiple sound sources
CN114460541A (zh) * 2022-02-10 2022-05-10 State Grid Shanghai Electric Power Company Power equipment noise source localization method and apparatus, and sound source localization device
CN115015830A (zh) * 2022-06-01 2022-09-06 Beijing Zhongan Intelligent Information Technology Co., Ltd. Underwater acoustic signal processing algorithm based on machine learning

Similar Documents

Publication Publication Date Title
WO2016100460A1 (fr) Systems and methods for source localization and separation
Gannot et al. A consolidated perspective on multimicrophone speech enhancement and source separation
US10123113B2 (en) Selective audio source enhancement
US9706298B2 (en) Method and apparatus for localization of an acoustic source and acoustic beamforming
US10192568B2 (en) Audio source separation with linear combination and orthogonality characteristics for spatial parameters
US9099096B2 (en) Source separation by independent component analysis with moving constraint
US20130297296A1 (en) Source separation by independent component analysis in conjunction with source direction information
US20130294611A1 Source separation by independent component analysis in conjunction with optimization of acoustic echo cancellation
US20170140771A1 (en) Information processing apparatus, information processing method, and computer program product
US20160073198A1 (en) Spatial audio apparatus
JP6363213B2 (ja) Signal processing apparatus, method, and computer program for removing reverberation from a number of input audio signals
CN113113034A Multi-source tracking and voice activity detection for planar microphone arrays
Koldovský et al. Spatial source subtraction based on incomplete measurements of relative transfer function
CN110088835B Blind source separation using similarity measure
Nesta et al. Convolutive underdetermined source separation through weighted interleaved ICA and spatio-temporal source correlation
US10718742B2 (en) Hypothesis-based estimation of source signals from mixtures
EP2912660A1 Method for determining a dictionary of base components from an audio signal
JP2022135451A (ja) Acoustic processing device, acoustic processing method, and program
Salvati et al. Power method for robust diagonal unloading localization beamforming
GB2510650A (en) Sound source separation based on a Binary Activation model
JP5406866B2 (ja) Sound source separation device, method thereof, and program
Girin et al. Audio source separation into the wild
Nesta et al. Unsupervised spatial dictionary learning for sparse underdetermined multichannel source separation
Li et al. Low complex accurate multi-source RTF estimation
Fakhry et al. Underdetermined source detection and separation using a normalized multichannel spatial dictionary

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15870955

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15870955

Country of ref document: EP

Kind code of ref document: A1