EP2988302A1

EP2988302A1 - System and method for separation of sound sources in a three-dimensional space

Info

Publication number: EP2988302A1
Application number: EP14461562.2A
Authority: EP
Inventors: Jacek Paczkowski; Krzysztof Kramek; Tomasz Nalewa
Original assignee: Patents Factory Ltd Sp zoo
Current assignee: Patents Factory Ltd Sp zoo
Priority date: 2014-08-21
Filing date: 2014-08-21
Publication date: 2016-02-24

Abstract

A method for sound source separation using a linear microphone array, the method comprising the steps of: calculating distance of each microphone of the microphone array to a known location of a target that is to be sampled; for each microphone calculating a delay with which sound reaches the given microphone from a given location in space; identifying a microphone having the lowest value of the delay and subtracting this value from values of delays of all other microphones; calculating a sound sample for a given location in space by adding sound samples from all microphones while taking into account the respective delays.

Description

The present invention relates to a system and method for separation of sound sources in a three-dimensional space.
Prior art publication US 7047189 B2, entitled "Sound source separation using convolutional mixing and a priori sound source knowledge", discloses sound source separation, without permutation, using convolutional mixing independent component analysis based on a priori knowledge of the target sound source. The target sound source can be a human speaker. The reconstruction filters used in the sound source separation take into account the a priori knowledge of the target sound source, such as an estimate of the spectra of the target sound source. The filters may be generally constructed based on a speech recognition system. Matching the words of the dictionary of the speech recognition system to a reconstructed signal indicates whether proper separation has occurred. More specifically, the filters may be constructed based on a vector quantization codebook of vectors representing typical sound source patterns. Matching the vectors of the codebook to a reconstructed signal indicates whether proper separation has occurred. The vectors may be linear prediction vectors, among others.
Publication US 20110075860 A1, entitled "Sound source separation and display method, and system thereof", discloses that a measurement system using a microphone array, which is a combination of a plurality of microphones, is widely used to identify and visualize the incoming directions of sound and the sound sources. The measurement system can be configured with only a single microphone array, or can also use several reference signal sensors such as a microphone and a vibration pickup. A microphone array by itself is used to equally evaluate sound sources lying in the intended direction of the microphone array. For example, a microphone array of planar shape is intended to analyze sound sources in the front direction. A spherical microphone array is intended to analyze sound sources in all directions around the sphere. If target sounds have high sound pressure levels and show sufficient S/N ratios with respect to other background noise, the locations of the sound sources or the incoming directions can be analyzed without a reference signal. Digital signal processing can be applied for mechanical determination.
The aim of the present invention is an improved, more accurate and more resource-efficient system and method for separation of sound sources in a three-dimensional space.

SUMMARY AND OBJECTS OF THE INVENTION

An object of the present invention is a method for sound source separation using a linear microphone array, the method comprising the steps of: calculating distance of each microphone of the microphone array to a known location of a target that is to be sampled; for each microphone calculating a delay with which sound reaches the given microphone from a given location in space; identifying a microphone having the lowest value of the delay and subtracting this value from all values of delays of all microphones; calculating a sound sample for a given location in space by adding sound samples from all microphones while taking into account the respective delays.
Preferably, the step of calculating distance is executed according to the formula of: $dl = \sqrt{{(x_{i} - x_{t})}^{2} + {(y_{i} - y_{t})}^{2} + {(z_{i} - z_{t})}^{2}}$

wherein the i index denotes a microphone and t index denotes the target. x_i, y_i and z_i mean x, y and z coordinates of the i-th microphone and x_t, y_t and z_t mean x, y and z coordinates of the target.
Preferably, the step of calculating the delay is executed according to the formula of: $dt = \frac{dl}{Vs}$

wherein Vs is the speed of sound. $dt2 = dt \cdot Fs$

wherein Fs is a sampling frequency.
Preferably, the step of calculating a sound sample for a given location in space is executed according to the formula of: $M_{t} = \sum_{i = 1}^{17} M_{i, t + dt2_{i}}$

wherein Mt is a sound sample, i is the microphone index and t is a sample index for a reference microphone.
Preferably, the linear microphone array comprises a plurality of microphones and the microphones are located in at least two groups of at least two microphones, wherein each group has a different spacing of the respective microphones.
Preferably, there are five groups of microphones each comprising at least two microphones wherein spacing of respective microphones in groups is such that in a subsequent group the spacing is twice of that of the preceding group.
Preferably, there are five groups of microphones, wherein the first group comprises seventeen microphones, while the remaining four groups comprise eight microphones each.
Preferably, the linear microphone is an arrangement that comprises three linear microphone arrays according to the present invention, wherein first ends of all three microphone arrays, comprising the same arrangement of microphones, are in proximity or adjacent to each other; and the separate microphone arrays are positioned in different planes in three-dimensional space.
Preferably, the other ends of the microphone arrays linearly extend on X, Y and Z axis respectively.
Another object of the present invention is a computer program comprising program code means for performing all the steps of the computer-implemented method according to the present invention when said program is run on a computer.
Another object of the present invention is a computer readable medium storing computer-executable instructions performing all the steps of the computer-implemented method according to the present invention when executed on a computer.
These and other objects of the invention presented herein are accomplished by providing a system and method for separation of sound from selected place in a three-dimensional space. Further details and features of the present invention, its nature and various advantages will become more apparent from the following detailed description of the preferred embodiments shown in a drawing, in which:

Fig. 1 shows a microphone array;
Figs. 2A-B depict a microphone array system;
Fig. 3 presents a diagram of the method according to the present invention;
Fig. 4 presents a diagram of the system according to the present invention;
Fig. 5 shows an installation of the system in a room;
Fig. 6A presents exemplary acoustic signals;
Fig. 6B presents signals received by the microphones; and
Fig. 6C depicts the original signals and signals separated using the method according to the present invention.

NOTATION AND NOMENCLATURE

Some portions of the detailed description which follows are presented in terms of data processing procedures, steps or other symbolic representations of operations on data bits that can be performed on computer memory. Therefore, a computer executes such logical steps thus requiring physical manipulations of physical quantities.
Usually these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. For reasons of common usage, these signals are referred to as bits, packets, messages, values, elements, symbols, characters, terms, numbers, or the like.
Additionally, all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Terms such as "processing" or "creating" or "transferring" or "executing" or "determining" or "detecting" or "obtaining" or "selecting" or "calculating" or "generating" or the like, refer to the action and processes of a computer system that manipulates and transforms data represented as physical (electronic) quantities within the computer's registers and memories into other data similarly represented as physical quantities within the memories or registers or other such information storage.
A computer-readable (storage) medium, such as referred to herein, typically may be non-transitory and/or comprise a non-transitory device. In this context, a non-transitory storage medium may include a device that may be tangible, meaning that the device has a concrete physical form, although the device may change its physical state. Thus, for example, non-transitory refers to a device remaining tangible despite a change in state.

DESCRIPTION OF EMBODIMENTS

A microphone array according to the present invention comprises, as shown in Fig. 1, a supporting body 101 and microphones 102A-L spaced apart along a line, wherein the microphones are located in, for example, two groups 103A-C of, for example, two microphones each, and each group can have a different spacing of the respective microphones.
It is to be noted however that for facilitating sound separation it is sufficient that the microphones of a microphone array are linearly placed and spaced from each other. In the broadest possible embodiment, the microphones may be spaced by equal distances or may be spaced by irregular, different distances.
The microphones 102 are preferably located on a straight line such that a first group of microphones comprises microphones spaced by, for example, 6.25 mm, the second group comprises microphones spaced by, for example, 12.5 mm, the third group comprises microphones spaced by, for example, 25 mm, the fourth group comprises microphones spaced by, for example, 50 mm and the fifth group comprises microphones spaced by, for example, 100 mm. Therefore, there are five groups, each comprising at least two microphones, wherein the spacing of the respective microphones is such that in each subsequent group the spacing is, for example, twice that of the preceding group.
Preferably, the first group comprises 17 microphones, while the remaining four groups comprise eight microphones each. This number is a preferred arrangement as shown by experiments and evaluation of response curve at different numbers of microphones in arrays.
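The preferred geometry described above can be sketched in a few lines of Python. This is only an illustration: the function name `linear_array_offsets` and its parameters are hypothetical, and the assumption that the groups are laid end to end, each starting from the last microphone of the preceding group, is not stated in the description.

```python
def linear_array_offsets(first_spacing_m=0.00625, first_count=17,
                         other_count=8, group_count=5):
    """Return microphone offsets (in metres) along the axis of one array."""
    # First group: 17 microphones spaced by 6.25 mm, starting at offset 0.
    offsets = [i * first_spacing_m for i in range(first_count)]
    spacing = first_spacing_m
    for _ in range(group_count - 1):
        spacing *= 2.0       # each subsequent group doubles the spacing
        start = offsets[-1]  # assumption: groups are laid end to end
        offsets.extend(start + k * spacing for k in range(1, other_count + 1))
    return offsets


offsets = linear_array_offsets()
print(len(offsets))           # number of microphones in the array
print(round(offsets[-1], 4))  # offset of the last microphone, in metres
```

Under these assumptions the array holds 49 microphones and spans about 1.6 m.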
A single linear microphone array according to the present invention allows a good quality of separation to be obtained. In order to obtain good quality independent of the placement of the sound source relative to the microphone array, it is necessary to apply at least three microphone arrays.
The microphone arrays must be spaced for example by 90 degrees wherein first ends of all microphone arrays (comprising the same arrangement of microphones) are in proximity or adjacent to a virtual center of a circle as shown in Fig. 2A. Fig. 2A shows a view in a single plane but the separate microphone arrays must be positioned in different planes in 3D space. Preferably, the other ends of microphone arrays linearly extend on X, Y and Z axis respectively (for example forming three edges of a cube as shown in Fig. 2B). Such a microphone system may be located in a corner of a room near the ceiling.
Such a microphone system is able to obtain good quality of separation of each sound source independent of the placement of the sound sources relative to the three linear arrays.
It is to be noted, however, that for facilitating sound separation it is sufficient that a single microphone array is used in the broadest possible embodiment. However, the configuration of the embodiment depicted in Fig. 3 will provide improved sound separation quality. Such an embodiment is preferred but not mandatory.
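As a rough illustration of the three-array arrangement, the sketch below places one copy of a linear array along each of the X, Y and Z axes. The function name `three_axis_arrangement` and the `gap_m` parameter (a small hypothetical distance between the common corner and the first microphone of each array) are assumptions made for this example only.

```python
def three_axis_arrangement(offsets_m, gap_m=0.01):
    """Place one copy of a linear array along each of the X, Y and Z axes.

    offsets_m: microphone offsets along a single array, in metres.
    gap_m: assumed distance from the shared corner to the first microphone.
    """
    along = [gap_m + d for d in offsets_m]
    return {
        "x": [(d, 0.0, 0.0) for d in along],
        "y": [(0.0, d, 0.0) for d in along],
        "z": [(0.0, 0.0, d) for d in along],
    }


# Three microphones per array, just to keep the example short.
arrays = three_axis_arrangement([0.0, 0.00625, 0.0125])
for axis, mics in arrays.items():
    print(axis, mics)
```

The first ends of the three arrays then sit close to the origin, which corresponds to the corner of the cube in Fig. 2B.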
Fig. 3 presents a diagram of the method according to the present invention. The method starts at step 301 from calculating distance of each microphone of the microphone array to a known location of a target that is to be sampled: $dl = \sqrt{{(x_{i} - x_{t})}^{2} + {(y_{i} - y_{t})}^{2} + {(z_{i} - z_{t})}^{2}}$

wherein the i index denotes a microphone and the t index denotes the target; x_i, y_i and z_i mean the x, y and z coordinates of microphone i, and x_t, y_t and z_t mean the x, y and z coordinates of the target.
Subsequently, at step 302, a delay dt is calculated with which sound reaches each of the microphones from the given location in space, that is, over the distance dl: $dt = \frac{dl}{Vs}$
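Step 301 is a plain Euclidean distance. A minimal Python sketch, with the hypothetical helper name `distance_to_target` and coordinates assumed to be given in metres:

```python
import math


def distance_to_target(mic, target):
    """Distance dl between microphone i and the target (step 301)."""
    xi, yi, zi = mic
    xt, yt, zt = target
    return math.sqrt((xi - xt) ** 2 + (yi - yt) ** 2 + (zi - zt) ** 2)


dl = distance_to_target((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
print(dl)  # 5.0
```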

wherein Vs is the speed of sound. $dt2 = dt \cdot Fs$

wherein Fs is a sampling frequency.
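The two formulas of step 302 can be written as follows. The numeric values for Vs (343 m/s) and Fs (48 kHz) are assumptions chosen for illustration; the description does not fix them.

```python
SPEED_OF_SOUND_M_S = 343.0        # assumed value of Vs
SAMPLING_FREQUENCY_HZ = 48_000.0  # assumed value of Fs


def delay_in_samples(dl_m, vs=SPEED_OF_SOUND_M_S, fs=SAMPLING_FREQUENCY_HZ):
    """Return dt2, the propagation delay over dl expressed in samples."""
    dt = dl_m / vs  # delay in seconds
    dt2 = dt * fs   # the same delay in samples, possibly fractional
    return dt2


dt2 = delay_in_samples(3.43)
print(dt2)  # about 480 samples for a 3.43 m path
```

The result is generally fractional; how it is rounded to a whole sample index is not specified in the description.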
Next, at step 303, there is identified a microphone having the lowest value of dt2 and this value is subtracted from all values of dt2. Thus dt2 values will identify differences in time of arrival of signal from a given location in space to all microphones.
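Step 303 amounts to subtracting the minimum from every delay. A minimal sketch, with the hypothetical function name `relative_delays`:

```python
def relative_delays(dt2_values):
    """Subtract the smallest delay so the closest microphone gets 0 (step 303)."""
    smallest = min(dt2_values)
    return [d - smallest for d in dt2_values]


rel = relative_delays([482.0, 480.0, 491.5])
print(rel)  # [2.0, 0.0, 11.5]
```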
Subsequently, at step 304, there is calculated a sound sample for a given location in space by adding sound samples from all microphones while taking into account the respective delays. Adding sounds means adding sound samples (in terms of their values) from the respective microphones, wherein a sample from the microphone closest to the target is taken without delay, whereas samples from the remaining microphones are taken with the dt2 delay (corrected by the delay of the microphone closest to the target).
For the microphone closest to the given location in space the delay equals 0 and for the remaining microphones it is derived from the difference in distance from the given location in space with respect to the closest microphone: $M_{t} = \sum_{i = 1}^{17} M_{i, t + dt2_{i}}$
wherein Mt is a sound sample, i is the microphone index and t is a sample index for a reference microphone. The dt2_i values may be different for different microphones.
For i-th microphone, as a sum input, there is taken into account a sample delayed by dt2_i samples with respect to the microphone closest to the target.
As a result there is obtained a set of sound samples. This is an equivalent of a directional microphone directed at a given location in space. The sound from this point will be amplified while the sounds from other locations will be attenuated.
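The summation of step 304 can be sketched as below. This is an illustration under stated assumptions rather than the implementation of the invention: the name `delay_and_sum` is hypothetical, the relative delays are rounded to the nearest whole sample (the description does not say how fractional delays are handled), and the output is truncated to the range in which every microphone still has a sample.

```python
def delay_and_sum(channels, relative_delays_in_samples):
    """Compute M_t as the sum over i of M_{i, t + dt2_i} (step 304).

    channels: one list of samples per microphone.
    relative_delays_in_samples: dt2_i values after step 303 (closest mic = 0).
    """
    # Assumption: fractional delays are rounded to whole samples.
    shifts = [int(round(d)) for d in relative_delays_in_samples]
    # Only produce outputs for which every channel has a sample available.
    n_out = min(len(ch) - s for ch, s in zip(channels, shifts))
    if n_out <= 0:
        return []
    return [sum(ch[t + s] for ch, s in zip(channels, shifts))
            for t in range(n_out)]


# The same impulse arrives two samples later at the second microphone.
near = [0, 1, 0, 0, 0]
far = [0, 0, 0, 1, 0]
out = delay_and_sum([near, far], [0.0, 2.0])
print(out)  # [0, 2, 0]
```

The two copies of the impulse line up and add, which is the amplification of the selected location described above.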
In Fig. 6A there are presented exemplary acoustic signals: 601, a 1 kHz signal positioned at a -45° angle with respect to the center of the linear microphone array, and 602, a 2.2 kHz signal positioned at a +45° angle with respect to the center of the linear microphone array. The X axis denotes sample number and the Y axis denotes signal amplitude.
Fig. 6B presents signals received by the microphones. In this arrangement 16 microphones are used; 603 is a signal from the first microphone, 604 is a signal from the 8th microphone and 605 is a signal from the 16th microphone.
Fig. 6C depicts the original signals 606-607 and signals separated 608-609 using the method according to the present invention.
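The kind of behaviour shown in Figs. 6A-6C can be approximated with a small simulation. The sketch below does not reproduce the data of the figures. It assumes 16 microphones uniformly spaced by 50 mm, a single ideal 1 kHz sine source without attenuation, Vs = 343 m/s, Fs = 48 kHz and delays rounded to whole samples; all names in it are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
SAMPLE_RATE = 48_000.0  # Hz, assumed


def steering_delays(mics, point):
    """Steps 301-303: whole-sample delays relative to the closest microphone."""
    raw = [math.dist(m, point) / SPEED_OF_SOUND * SAMPLE_RATE for m in mics]
    nearest = min(raw)
    return [int(round(d - nearest)) for d in raw]


def simulate(mics, source, freq_hz, n_samples):
    """Ideal sine source: each microphone hears it after its propagation time."""
    return [
        [math.sin(2.0 * math.pi * freq_hz
                  * (n / SAMPLE_RATE - math.dist(m, source) / SPEED_OF_SOUND))
         for n in range(n_samples)]
        for m in mics
    ]


def focus(channels, shifts):
    """Step 304: add the samples of all microphones at their relative delays."""
    n_out = min(len(ch) - s for ch, s in zip(channels, shifts))
    return [sum(ch[t + s] for ch, s in zip(channels, shifts))
            for t in range(n_out)]


mics = [(0.05 * i, 0.0, 0.0) for i in range(16)]  # assumed uniform spacing
centre_x = 0.375
source = (centre_x - 2.0, 2.0, 0.0)  # at -45 degrees from the array centre
other = (centre_x + 2.0, 2.0, 0.0)   # at +45 degrees, no source here

channels = simulate(mics, source, 1000.0, 960)
on_target = focus(channels, steering_delays(mics, source))
off_target = focus(channels, steering_delays(mics, other))

focused_peak = max(abs(v) for v in on_target)
off_peak = max(abs(v) for v in off_target)
print(focused_peak, off_peak)
```

When the array is focused on the source position the 16 signals add almost coherently, while focusing on the other position gives a much smaller peak.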
The method according to the present invention allows for separation of sound from a given location in space from other sounds. Three microphone arrays installed, for example, as shown in Fig. 5 increase separation quality because some of the microphones are always in a good position with respect to a monitored location. Of course, the number of microphones and the number of groups of microphones may be different than in the presented example.
Fig. 4 presents a diagram of the system according to the present invention. The system comprises the microphone array arrangement 402 shown in Fig. 2A-B and an appropriate sampling module 403 managed by a controller 405.
The system may be realized using dedicated components or custom made FPGA or ASIC circuits. The system comprises a data bus 401 communicatively coupled to a memory 404. Additionally, other components of the system are communicatively coupled to the system bus 401 so that they may be managed by the controller 405.
The memory 404 may store computer program or programs executed by the controller 405 in order to execute steps of the method according to the present invention.
Therefore, the controller 405 is configured to execute the steps of the method described with reference to Fig. 3.
The present invention results in a useful sound separation that may for example be used in surveillance systems. Such results are concrete and tangible thus not abstract. Therefore, the invention provides a useful, concrete and tangible result.
According to the present invention, data acquired by different microphones are processed within a dedicated machine. Hence, the machine or transformation test is fulfilled and the invention is not abstract.
It can be easily recognized, by one skilled in the art, that the aforementioned method for separation of sound sources in a three-dimensional space may be performed and/or controlled by one or more computer programs. Such computer programs are typically executed by utilizing the computing resources in a computing device. Applications are stored on a non-transitory medium. An example of a non-transitory medium is a non-volatile memory, for example a flash memory, or a volatile memory, for example RAM. The computer instructions are executed by a processor. These memories are exemplary recording media for storing computer programs comprising computer-executable instructions performing all the steps of the computer-implemented method according to the technical concept presented herein.
While the invention presented herein has been depicted, described, and has been defined with reference to particular preferred embodiments, such references and examples of implementation in the foregoing specification do not imply any limitation on the invention. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader scope of the technical concept. The presented preferred embodiments are exemplary only, and are not exhaustive of the scope of the technical concept presented herein.
Accordingly, the scope of protection is not limited to the preferred embodiments described in the specification, but is only limited by the claims that follow.

Claims

A method for sound source separation using a linear microphone array, the method being characterized in that it comprises the steps of:
• calculating distance (301) of each microphone of the microphone array to a known location of a target that is to be sampled;

• for each microphone calculating a delay (302) with which sound reaches the given microphone from a given location in space;

• identifying (303) a microphone having the lowest value of the delay and subtracting this value from all values of delays of all microphones;

• calculating (304) a sound sample for a given location in space by adding sound samples from all microphones while taking into account the respective delays.
The method according to claim 1 characterized in that the step of calculating distance (301) is executed according to the formula of: $dl = \sqrt{{(x_{i} - x_{t})}^{2} + {(y_{i} - y_{t})}^{2} + {(z_{i} - z_{t})}^{2}}$

wherein the i index denotes a microphone and t index denotes the target. x_i, y_i and z_i mean x, y and z coordinates of the i-th microphone and x_t, y_t and z_t mean x, y and z coordinates of the target.
The method according to claim 1 characterized in that the step of calculating the delay (302) is executed according to the formula of: $dt = \frac{dl}{Vs}$

wherein Vs is the speed of sound. $dt2 = dt \cdot Fs$

wherein Fs is a sampling frequency.
The method according to claim 1 characterized in that the step of calculating (304) a sound sample for a given location in space is executed according to the formula of: $M_{t} = \sum_{i = 1}^{17} M_{i, t + dt2_{i}}$

wherein Mt is a sound sample, i is the microphone index and t is a sample index for a reference microphone.
The method according to claim 1 characterized in that the linear microphone comprises a plurality of microphones and the microphones are located in at least two groups (103A-C) of at least two microphones whereas each group has a different spacing of the respective microphones.
The method according to claim 5 characterized in that there are five groups of microphones each comprising at least two microphones wherein spacing of respective microphones in groups is such that in a subsequent group the spacing is twice of that of the preceding group.
The method according to claim 5 characterized in that there are five groups of microphones and that the first group comprises seventeen microphones, while the remaining four groups comprise eight microphones each.
The method according to claim 5 characterized in that the linear microphone is an arrangement that comprises three linear microphone arrays according to claim 5, wherein first ends of all three microphone arrays, comprising the same arrangement of microphones, are in proximity or adjacent to each other; and the separate microphone arrays are positioned in different planes in three-dimensional space.
The method according to claim 8 characterized in that the other ends of the microphone arrays linearly extend on X, Y and Z axis respectively.
A computer program comprising program code means for performing all the steps of the computer-implemented method according to claim 1 when said program is run on a computer.
A computer readable medium storing computer-executable instructions performing all the steps of the computer-implemented method according to claim 1 when executed on a computer.
A system for sound source localization comprising
• a microphone array;

• a data bus (401) communicatively coupling components of the system;

• a memory (404) for storing data;

• a controller (405);

• a sampling module (403);
the system being characterized in that it comprises:
• the microphone array system (402) is a linear microphone array;

• whereas the controller (405) is configured to control the sampling module (403) and to execute all steps of the method according to claim 1.