CN110007276B - Sound source positioning method and system - Google Patents
- Publication number: CN110007276B (application CN201910312565.6A)
- Authority
- CN
- China
- Legal status: Active
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/20—Position of source determined by a plurality of spaced direction-finders
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a sound source positioning method and system. The method first applies windowing and framing to the sound source voice signals acquired by a quaternary microphone array, then detects effective frame signals, and calculates, for the screened effective frames, a generalized spectrum subtraction modified phase transformation function fused with quadratic correlation. In order to further improve the time delay precision, the time delay value is calculated from the quadratic-correlation-fused average generalized spectrum subtraction modified phase transformation function. Finally, the sound source direction is estimated according to the geometric position of the microphone array and the calculated time delay values, thereby improving the precision of sound source positioning.
Description
Technical Field
The present invention relates to the field of sound source localization, and in particular, to a sound source localization method and system.
Background
Sound source localization has become a research hotspot in the field of voice signal processing, with wide application in video conferencing, intelligent robots, intelligent video surveillance systems, and other fields. With traditional localization algorithms, positioning accuracy drops sharply in severe environments with a low signal-to-noise ratio and a long reverberation time.
Disclosure of Invention
The invention aims to provide a sound source positioning method and a sound source positioning system so as to improve the accuracy of sound source positioning.
In order to achieve the purpose, the invention provides the following scheme:
the invention provides a sound source positioning method, which comprises the following steps:
acquiring four sound source voice signals by adopting a quaternary microphone array; the quaternary microphone array comprises four microphones, and each microphone collects one path of sound source voice signals;
performing synchronous framing processing on the four paths of sound source voice signals to obtain a frame signal set, wherein each frame signal in the frame signal set comprises four paths of frame signals, namely a first path of frame signal, a second path of frame signal, a third path of frame signal and a fourth path of frame signal;
judging the validity of each frame signal in the frame signal set to obtain a valid frame signal subset;
acquiring, according to the effective frame signal subset, the quadratic-correlation-fused average generalized spectrum subtraction modified phase transformation function of any two paths of effective frame signals;
acquiring the time point corresponding to the maximum peak value of the quadratic-correlation-fused average generalized spectrum subtraction modified phase transformation function of any two paths of effective frame signals, so as to obtain the time delay value of any two paths of microphone sound source signals;
and determining the direction position of the sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals.
Optionally, the synchronous framing processing performed on the four paths of sound source voice signals to obtain a frame signal set specifically includes:
carrying out synchronous windowing and framing processing on the four paths of sound source voice signals with a window function to obtain frame signals x_ij(n), where n denotes the nth sampling point, n = 1, 2, ..., N, and x_ij(n) denotes the jth path of the ith frame signal, j = 1, 2, 3, 4;
all the frame signals are combined into a set of frame signals.
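As an illustrative sketch of the windowing-and-framing step (not the patent's exact procedure): the Hamming window, frame length 256, and frame shift 128 are taken from the experimental settings reported later in this document, and the function and variable names are placeholders of my own.

```python
import numpy as np

def frame_signals(channels, frame_len=256, hop=128):
    """Split each of the four microphone channels into overlapping,
    Hamming-windowed frames.

    channels: list of four 1-D numpy arrays (one per microphone).
    Returns an array of shape (num_frames, 4, frame_len)."""
    win = np.hamming(frame_len)
    n = min(len(c) for c in channels)
    num_frames = 1 + (n - frame_len) // hop
    frames = np.empty((num_frames, len(channels), frame_len))
    for j, x in enumerate(channels):
        for i in range(num_frames):
            frames[i, j] = x[i * hop : i * hop + frame_len] * win
    return frames
```

The frames are windowed synchronously, so sample n of frame i refers to the same instant on all four paths, which is what the later cross-correlation steps require.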
Optionally, the determining the validity of each frame signal in the frame signal set to obtain a valid frame signal subset specifically includes:
using the short-time frame energy formula E_ij = Σ_{n=1}^{N} x_ij²(n) to calculate the short-time frame energy of the jth path of frame signal of the ith frame signal; wherein E_ij represents the short-time frame energy of the jth path of frame signal of the ith frame signal, n represents the nth sampling point, and n = 1, 2, ..., N;
Judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is greater than a first preset threshold value or not to obtain a first judgment result;
if the first judgment result indicates that the short-time frame energy is not greater than the first preset threshold, increasing the value of i by 1, and returning to the step of calculating the short-time frame energy of the jth path of frame signal of the ith frame signal;
if the first judgment result indicates that the short-time frame energy is greater than the first preset threshold, setting the ith frame signal as a starting point, and increasing the value of i by 1;
using the zero-crossing rate formula Z_ij = (1/2) Σ_{n=2}^{N} |sgn(x_ij(n)) − sgn(x_ij(n−1))| to calculate the zero-crossing rate of the jth path of frame signal of the ith frame signal, where sgn(·) is the sign function;
judging whether the zero crossing rate is greater than a second preset threshold value or not to obtain a second judgment result;
if the second judgment result indicates that the zero-crossing rate is greater than the second preset threshold, setting the mark T_ij of the jth path of frame signal of the ith frame signal to 1;
if the second judgment result indicates that the zero-crossing rate is not greater than the second preset threshold, setting the mark T_ij of the jth path of frame signal of the ith frame signal to 0;
using the formula SS (i) ═ Ti1&&Ti2&&Ti3&&Ti4Calculating the total state value SS (i) of the marks of the four paths of frame signals of the ith frame signal; wherein, Ti1、Ti2、Ti3And Ti4Marks respectively representing the 1 st path, the 2 nd path, the 3 rd path and the 4 th path of the ith frame signal;
judging whether the total state value SS (i) is equal to 1 or not to obtain a third judgment result;
if the third judgment result indicates that SS (i) is equal to 1, setting the ith signal frame as an effective signal frame;
judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is smaller than a third preset threshold value or not to obtain a fourth judgment result;
if the fourth judgment result shows that the short-time frame energy of the jth frame signal of the ith frame signal is smaller than the third preset threshold, setting the ith signal frame as the termination point of the voice signal to obtain an effective frame signal subset;
if the fourth judgment result indicates that the short-time frame energy of the jth path of frame signal of the ith frame signal is not less than the third preset threshold, increasing the value of i by 1, and returning to the step of calculating the zero-crossing rate of the jth path of frame signal of the ith frame signal.
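The energy-plus-zero-crossing validity test above can be sketched as follows. This is a minimal illustration assuming the standard definitions of short-time energy and zero-crossing rate; the threshold values and names are placeholders, not figures from the patent.

```python
import numpy as np

def short_time_energy(frame):
    # E = sum of squared samples of one windowed frame (standard definition)
    return float(np.sum(frame ** 2))

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ (standard definition)
    return float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))

def frame_is_valid(frame_4ch, energy_thresh, zcr_thresh):
    """frame_4ch: array of shape (4, frame_len), the four paths of one frame.
    A frame counts as valid only when every path passes both tests, i.e. the
    logical AND SS(i) = T_i1 && T_i2 && T_i3 && T_i4 described above."""
    marks = [short_time_energy(f) > energy_thresh
             and zero_crossing_rate(f) > zcr_thresh
             for f in frame_4ch]
    return all(marks)
```

Requiring all four paths to agree keeps a frame out of the effective subset whenever any single microphone saw only silence or noise.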
Optionally, the obtaining, according to the effective frame signal subset, an average generalized spectrum subtraction correction phase transformation function with which any two effective frame signals are fused with quadratic correlation specifically includes:
calculating the quadratic correlation of any two paths of frame signals of each effective frame signal according to the effective frame signal subset;
calculating the power spectrum of each path of frame signal of each effective frame signal according to the effective frame signal subset;
acquiring a noise masking function of each path of frame signal of each effective frame signal according to the power spectrum of each path of frame signal:
wherein z_pq(ω) represents the noise masking function of the qth path of frame signal of the pth effective frame signal, X_pq(ω) represents the power spectrum of the qth path of frame signal of the pth effective frame signal, q = 1, 2, 3, 4, N(ω) represents the noise power spectrum, α represents a first coefficient, and β represents a second coefficient;
acquiring generalized spectrum subtraction correction phase transformation function of any two paths of frame signals of each effective frame signal fused with quadratic correlation according to the noise masking function of each path of frame signals of each effective frame signal and the quadratic correlation of any two paths of frame signals:
wherein φ_ls_p(ω) represents the quadratic-correlation-fused generalized spectrum subtraction modified phase transformation function of the lth path of frame signal and the sth path of frame signal of the pth effective frame signal, l = 1, 2, 3, 4, s = 1, 2, 3, 4, l ≠ s;
X_pl(ω) and X_ps(ω) respectively represent the power spectrum of the lth path of frame signal and the power spectrum of the sth path of frame signal of the pth effective frame signal, and ρ represents a third coefficient;
according to the generalized spectrum subtraction correction phase transformation function of the fusion quadratic correlation of any two paths of frame signals of each effective frame signal, obtaining the average generalized spectrum subtraction correction phase transformation function of the fusion quadratic correlation of any two paths of effective frame signals:
wherein φ̄_ls(ω) represents the quadratic-correlation-fused average generalized spectrum subtraction modified phase transformation function of the lth path and the sth path of effective frame signals, and P represents the number of effective frame signals in the effective frame signal subset.
Optionally, the determining the direction position of the sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals specifically includes:
according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, calculating the azimuth angle θ of the sound source relative to the coordinate origin with the corresponding formula;
according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, calculating the pitch angle φ of the sound source relative to the coordinate origin with the corresponding formula;
where c is the sound velocity, d is the distance from each microphone element to the coordinate origin, τ_12 represents the time delay value of the 1st and 2nd paths of microphone sound source signals, τ_13 represents the time delay value of the 1st and 3rd paths, and τ_14 represents the time delay value of the 1st and 4th paths.
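The angle formulas themselves are not reproduced in this text, so the following is an illustrative reconstruction only: under a far-field (plane-wave) assumption and one common sign convention for the TDOAs, the cross-array geometry yields closed-form angles. The sign convention, constants, and function names below are assumptions for illustration, not the patent's exact formulas.

```python
import numpy as np

C = 343.0   # assumed speed of sound, m/s
D = 0.25    # element-to-origin distance, m (matches the 25 cm array used later)

def delays_from_direction(theta, phi):
    """Far-field TDOAs (tau_12, tau_13, tau_14) for the cross array
    m1(D,0,0), m2(0,D,0), m3(-D,0,0), m4(0,-D,0); theta is the azimuth,
    phi the pitch angle measured from the z-axis. The sign convention
    tau_1k = (m_k - m_1)·u / c is an assumption."""
    u = np.array([np.sin(phi) * np.cos(theta),
                  np.sin(phi) * np.sin(theta),
                  np.cos(phi)])
    mics = np.array([[D, 0, 0], [0, D, 0], [-D, 0, 0], [0, -D, 0]])
    t = mics @ u / C
    return t[1] - t[0], t[2] - t[0], t[3] - t[0]

def direction_from_delays(tau12, tau13, tau14):
    # Invert tau13 = -2 D sin(phi) cos(theta) / c and
    #        tau12 - tau14 = 2 D sin(phi) sin(theta) / c
    theta = np.arctan2(tau12 - tau14, -tau13)
    sin_phi = C * np.hypot(tau12 - tau14, tau13) / (2 * D)
    phi = np.arcsin(np.clip(sin_phi, -1.0, 1.0))
    return float(theta), float(phi)
```

The forward model and the inversion are mutually consistent, so the pair can at least be checked round-trip even though the patent's own derivation goes through the spherical-coordinate relations instead.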
Optionally, before the synchronous framing processing is performed on the four paths of sound source voice signals to obtain a frame signal set, the method further includes:
carrying out voice enhancement processing on each path of sound source voice signal to obtain a signal subjected to voice enhancement processing;
performing band-pass filtering processing on the signal subjected to the voice enhancement processing to obtain a signal subjected to the band-pass filtering processing;
and denoising the signal subjected to the band-pass filtering by using a wavelet threshold to obtain a preprocessed sound source voice signal.
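The band-pass stage of this preprocessing chain can be sketched as below. The patent does not give the pass band or filter design: the 300–3400 Hz speech band and the 4th-order Butterworth design are assumptions, and the speech enhancement and wavelet-threshold denoising stages are omitted here.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_speech(x, fs=8000, low=300.0, high=3400.0, order=4):
    """Zero-phase band-pass filtering of one path of the (already
    speech-enhanced) signal. Pass band and order are assumed values."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)   # filtfilt avoids adding phase delay
```

Zero-phase filtering (`filtfilt`) is used so that the filter itself does not shift the inter-channel delays that the later steps estimate.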
A sound source localization system, comprising:
the sound source voice signal acquisition module is used for acquiring four paths of sound source voice signals by adopting a quaternary microphone array; the quaternary microphone array comprises four microphones, and each microphone collects one path of sound source voice signals;
a framing module, configured to perform synchronous framing processing on the four paths of sound source voice signals to obtain a frame signal set, wherein each frame signal in the frame signal set includes four paths of frame signals, namely a first path of frame signal, a second path of frame signal, a third path of frame signal, and a fourth path of frame signal;
the effective frame signal subset acquisition module is used for judging the effectiveness of each frame signal in the frame signal set to obtain an effective frame signal subset;
the fusion quadratic correlation average generalized spectrum subtraction correction phase transformation function acquisition module is used for acquiring the fusion quadratic correlation average generalized spectrum subtraction correction phase transformation function of any two paths of effective frame signals according to the effective frame signal subsets;
the time delay value calculation module is used for acquiring a time point corresponding to a maximum peak value of a modified phase transformation function fused with a quadratic correlation average generalized spectrum in any two paths of effective frame signals to obtain time delay values of any two paths of microphone sound source signals;
and the direction position determining module is used for determining the direction position of the sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals.
Optionally, the framing module specifically includes:
a framing sub-module, configured to carry out synchronous windowing and framing processing on the four paths of sound source voice signals with a window function to obtain frame signals x_ij(n), where n denotes the nth sampling point, n = 1, 2, ..., N, and x_ij(n) denotes the jth path of the ith frame signal, j = 1, 2, 3, 4;
and the synthesis submodule is used for synthesizing all the frame signals into a frame signal set.
Optionally, the valid frame signal subset obtaining module specifically includes:
a short-time frame energy calculation submodule, configured to use the short-time frame energy formula E_ij = Σ_{n=1}^{N} x_ij²(n) to calculate the short-time frame energy of the jth path of frame signal of the ith frame signal; wherein E_ij represents the short-time frame energy of the jth path of frame signal of the ith frame signal, n represents the nth sampling point, and n = 1, 2, ..., N;
The first judgment submodule is used for judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is greater than a first preset threshold value or not to obtain a first judgment result;
a first judgment result processing submodule, configured to, if the first judgment result indicates that the short-time frame energy is not greater than the first preset threshold, increase the value of i by 1 and call the short-time frame energy calculation submodule to recalculate the short-time frame energy of the jth path of frame signal of the ith frame signal; and, if the first judgment result indicates that the short-time frame energy is greater than the first preset threshold, set the ith frame signal as a starting point and increase the value of i by 1;
a zero-crossing rate calculation submodule, configured to use the zero-crossing rate formula Z_ij = (1/2) Σ_{n=2}^{N} |sgn(x_ij(n)) − sgn(x_ij(n−1))| to calculate the zero-crossing rate of the jth path of frame signal of the ith frame signal;
the second judgment submodule is used for judging whether the zero crossing rate is greater than a second preset threshold value or not to obtain a second judgment result;
a second judgment result processing submodule, configured to set the mark T_ij of the jth path of frame signal of the ith frame signal to 1 if the second judgment result indicates that the zero-crossing rate is greater than the second preset threshold, and to set the mark T_ij to 0 if the second judgment result indicates that the zero-crossing rate is not greater than the second preset threshold;
a total state value SS(i) calculation submodule, configured to use the formula SS(i) = T_i1 && T_i2 && T_i3 && T_i4 to calculate the total state value SS(i) of the marks of the four paths of frame signals of the ith frame signal; wherein T_i1, T_i2, T_i3 and T_i4 respectively represent the marks of the 1st, 2nd, 3rd and 4th paths of the ith frame signal;
a third judging submodule, configured to judge whether the total state value ss (i) is equal to 1, to obtain a third judgment result;
a third result processing sub-module, configured to set an ith signal frame as an effective signal frame if the third determination result indicates that ss (i) is equal to 1;
the fourth judgment submodule is used for judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is smaller than a third preset threshold value or not to obtain a fourth judgment result;
a fourth judgment result processing submodule, configured to set the ith signal frame as the termination point of the voice signal to obtain an effective frame signal subset if the fourth judgment result indicates that the short-time frame energy of the jth path of frame signal of the ith frame signal is smaller than the third preset threshold; and, if the fourth judgment result indicates that the short-time frame energy of the jth path of frame signal of the ith frame signal is not less than the third preset threshold, increase the value of i by 1 and call the zero-crossing rate calculation submodule to recalculate the zero-crossing rate of the jth path of frame signal of the ith frame signal.
Optionally, the module for obtaining the fused quadratic correlation average generalized spectrum subtraction correction phase transformation function specifically includes:
the secondary correlation calculation submodule is used for calculating the secondary correlation of any two paths of frame signals of each effective frame signal according to the effective frame signal subset;
a power spectrum calculation submodule, configured to calculate the power spectrum of each path of frame signal of each effective frame signal according to the effective frame signal subset;
the noise masking function obtaining submodule is used for obtaining the noise masking function of each path of frame signal of each effective frame signal according to the power spectrum of each path of frame signal:
wherein z_pq(ω) represents the noise masking function of the qth path of frame signal of the pth effective frame signal, X_pq(ω) represents the power spectrum of the qth path of frame signal of the pth effective frame signal, q = 1, 2, 3, 4, N(ω) represents the noise power spectrum, α represents a first coefficient, and β represents a second coefficient;
the generalized spectrum subtraction correction phase transformation function fused secondary correlation obtaining sub-module is used for obtaining the generalized spectrum subtraction correction phase transformation function fused secondary correlation of any two paths of frame signals of each effective frame signal according to the noise masking function of each path of frame signals of each effective frame signal and the secondary correlation of any two paths of frame signals:
wherein φ_ls_p(ω) represents the quadratic-correlation-fused generalized spectrum subtraction modified phase transformation function of the lth path of frame signal and the sth path of frame signal of the pth effective frame signal, l = 1, 2, 3, 4, s = 1, 2, 3, 4, l ≠ s;
X_pl(ω) and X_ps(ω) respectively represent the power spectrum of the lth path of frame signal and the power spectrum of the sth path of frame signal of the pth effective frame signal, and ρ represents a third coefficient;
the quadratic correlation fused average generalized spectrum subtraction correction phase transformation function obtaining sub-module is used for fusing quadratic correlation generalized spectrum subtraction correction phase transformation functions according to any two paths of frame signals of each effective frame signal to obtain quadratic correlation fused average generalized spectrum subtraction correction phase transformation functions of any two paths of effective frame signals:
wherein φ̄_ls(ω) represents the quadratic-correlation-fused average generalized spectrum subtraction modified phase transformation function of the lth path and the sth path of effective frame signals, and P represents the number of effective frame signals in the effective frame signal subset.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a sound source positioning method and a sound source positioning system. The sound source positioning method firstly obtains color sound source voice signal windowing framing for a quaternary microphone array, then detects signal effective frame signals, and calculates and fuses quadratic correlation generalized spectrum subtraction correction phase transformation functions for the screened effective frame signals. In order to further improve the time delay precision, the average generalized spectrum fused with quadratic correlation is adopted to subtract the modified phase transformation function to calculate the time delay value. And finally, estimating the sound source direction according to the geometric position of the microphone array and the calculated time delay value, thereby improving the precision of sound source positioning.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
Fig. 1 is a flowchart of a sound source positioning method according to the present invention;
FIG. 2 is a model diagram of the quaternary microphone array provided by the present invention;
FIG. 3 is a graph comparing the accuracy of delay estimation for different algorithms at each frame in a -5 dB noise environment according to the present invention;
FIG. 4 is a graph comparing accuracy of delay estimation for different algorithms at each frame in a 5dB noise environment according to the present invention;
FIG. 5 is a graph comparing accuracy of delay estimation for different algorithms at each frame under an environment with a reverberation time of 750ms and a noise of 5dB provided by the present invention;
FIG. 6 is a diagram of an acquisition card according to the present invention;
FIG. 7 is a pictorial view of a microphone provided in accordance with the present invention;
fig. 8 is a physical diagram of a quaternary microphone array provided by the present invention;
fig. 9 is a block diagram of a sound source localization system according to the present invention.
Detailed Description
The invention aims to provide a sound source positioning method and a sound source positioning system so as to improve the accuracy of sound source positioning.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Example 1
The embodiment 1 of the invention provides a sound source positioning method.
As shown in fig. 1, the sound source localization method includes the steps of:
Example 2
Example 2 of the present invention provides a preferred embodiment of a sound source localization method, but the implementation of the present invention is not limited to the embodiment defined in example 2 of the present invention.
The quaternary microphone array in step 101 is shown in fig. 2, where the coordinates of the four microphones are m1(d,0,0), m2(0,d,0), m3(-d,0,0), and m4(0,-d,0), d being the distance from each microphone element to the origin.
After four sound source voice signals are obtained, carrying out voice enhancement processing on each sound source voice signal to obtain a signal subjected to voice enhancement processing; performing band-pass filtering processing on the signal subjected to the voice enhancement processing to obtain a signal subjected to the band-pass filtering processing; and denoising the signal subjected to the band-pass filtering by using a wavelet threshold to obtain a preprocessed sound source voice signal.
wherein z_pq(ω) represents the noise masking function of the qth path of frame signal of the pth effective frame signal, X_pq(ω) represents the power spectrum of the qth path of frame signal of the pth effective frame signal, q = 1, 2, 3, 4, N(ω) represents the noise power spectrum, α represents a first coefficient, and β represents a second coefficient. The quadratic-correlation-fused generalized spectrum subtraction modified phase transformation function of any two paths of frame signals of each effective frame signal is then acquired according to the noise masking function of each path of frame signal and the quadratic correlation of any two paths of frame signals:
wherein φ_ls_p(ω) represents the quadratic-correlation-fused generalized spectrum subtraction modified phase transformation function of the lth path of frame signal and the sth path of frame signal of the pth effective frame signal, l = 1, 2, 3, 4, s = 1, 2, 3, 4, l ≠ s;
X_pl(ω) and X_ps(ω) respectively represent the power spectrum of the lth path of frame signal and the power spectrum of the sth path of frame signal of the pth effective frame signal, and ρ represents a third coefficient. According to the quadratic-correlation-fused generalized spectrum subtraction modified phase transformation function of any two paths of frame signals of each effective frame signal, the quadratic-correlation-fused average generalized spectrum subtraction modified phase transformation function of any two paths of effective frame signals is obtained:
wherein φ̄_ls(ω) represents the quadratic-correlation-fused average generalized spectrum subtraction modified phase transformation function of the lth path and the sth path of effective frame signals, and P represents the number of effective frame signals in the effective frame signal subset.
According to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, the azimuth angle θ of the sound source relative to the coordinate origin is calculated with the corresponding formula.
According to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, the pitch angle φ of the sound source relative to the coordinate origin is calculated with the corresponding formula.
Here c is the sound velocity, d is the distance from each microphone element to the origin of coordinates, τ_12 represents the time delay value of the 1st and 2nd paths of microphone sound source signals, τ_13 represents the time delay value of the 1st and 3rd paths, and τ_14 represents the time delay value of the 1st and 4th paths. Specifically, according to the geometric relationship of the quaternary microphone array (coordinates m1(d,0,0), m2(0,d,0), m3(-d,0,0), m4(0,-d,0)), the spherical-coordinate relation x² + y² + z² = r², the two-point distance formula, and the velocity formula are combined to solve for the azimuth angle θ and the pitch angle φ.
In order to illustrate the effect of the sound source localization method of the present invention, simulation comparisons were performed under different signal-to-noise ratios (SNR) and reverberation environments. As can be seen from figs. 3 and 4, under a medium noise environment (SNR = 5 dB), the accuracy of the time delay values estimated by the phase transform (PHAT) algorithm is far inferior to the modified cross-power spectrum phase (MCPSP) algorithm and the proposed average generalized spectrum subtraction modified phase transform (APHAT) method, while APHAT is significantly better than MCPSP; under a strong noise environment (SNR = -5 dB), PHAT performance drops sharply, and only MCPSP and APHAT still maintain good performance. As can be seen from fig. 5, under conditions where both strong reverberation and strong noise exist (T60 = 750 ms and SNR = 5 dB), the APHAT algorithm achieves better delay accuracy than the PHAT and MCPSP algorithms. Taken together, these comparisons verify that the APHAT algorithm has better robustness to noise and reverberation.
In order to further explain the effect of the sound source positioning method, the method is used for building and carrying out real environment experiments, a multi-channel data acquisition card Q801 of Beijing Acoustic science technology Limited (SKC) is used for recording sound source signals, as shown in fig. 6, an array support and a microphone MP40 in a quaternary microphone array are both products of SKC manufacturers, as shown in fig. 7 and 8.
The experiments were all completed in a 7.2 m × 6 m × 3.2 m room with both doors and windows closed. A certain amount of background noise and reverberation existed in the room, including the computer host fan, reflections from tables and chairs, and other artificial interference. The sound source was a girl's voice saying "I go to Beijing", a segment of speech recorded in the actual environment. The sampling rate of the signal is 8 kHz, the frame length is 256, the frame shift is 128, and a Hamming window is applied. The coordinates of the quaternary microphone array are: m1(25cm,0,0), m2(0,25cm,0), m3(-25cm,0,0), m4(0,-25cm,0), and the array is placed at a height of 70 cm above the ground. For the comparison of experimental data in the actual environment, the sound source positions in Table 1 below were selected, and 10 groups of data were collected at each position. Taking the PHAT and MCPSP algorithms as references, the performance of the APHAT algorithm is compared with the other two through experimental analysis. Table 1 shows the sound source localization results of the improved APHAT algorithm, PHAT, and MCPSP against the actual positions in the actual environment; the localization error results are shown in Table 2, and the root mean square errors of localization are shown in Table 3:
TABLE 1
Serial number | S(x,y,z) | (r,θ,φ) | PHAT | MCPSP | APHAT |
1 | (1,1,0.76) | (1.6,45°,61.6°) | (45°,58.6°) | (45°,58.6°) | (45°,57.3°) |
2 | (2,1,0.76) | (2.36,26.6°,71.2°) | (26.6°,74.6°) | (26.6°,74.6°) | (26.6°,71.9°) |
3 | (2,2,0.76) | (2.93,45°,75°) | (45°,77.4°) | (45°,77.4°) | (45°,74.1°) |
4 | (-2,1,0.76) | (2.36,-26.6°,71.2°) | (-26.6°,74.6°) | (-26.6°,74.6°) | (-26.6°,71.9°) |
5 | (-2,2,0.76) | (2.93,-45°,75°) | (-45°,77.4°) | (-45°,77.4°) | (-45°,74.1°) |
6 | (1.2,0.6,0.76) | (1.54,26.6°,60.4°) | (37.9°,79.5°) | (29.1°,62.6°) | (29.1°,61.1°) |
7 | (-2.4,2.4,0.76) | (3.48,-45°,77.4°) | (-45°,77.4°) | (-45°,77.4°) | (-45°,74.1°) |
8 | (1.5,1.2,0.76) | (2.07,38.7°,68.5°) | (41.2°,66.5°) | (37.9°,79.5°) | (41.2°,64.6°) |
9 | (1.8,1.2,0.76) | (2.29,33.7°,70.6°) | (0,234°) | (37.9°,79.5°) | (37.9°,75.7°) |
10 | (2,1.2,0.76) | (2.45,31°,71.9°) | (-36.9°,59.6°) | (30.1°,90.2°) | (31°,82.4°) |
11 | (1.2,0,0.76) | (1.42,0°,57.7°) | (0,59.6°) | (0,59.6°) | (0,58.2°) |
12 | (0,1.8,0.76) | (1.95,90°,67.1°) | (0,234°) | (-90°,71.6°) | (90°,69.2°) |
13 | (1.2,2.4,0.76) | (2.79,63.4°,74.2°) | (63.4°,74.6°) | (63.4°,74.6°) | (63.4°,71.9°) |
14 | (0,1.2,0.76) | (1.42,90°,57.7°) | (-90°,59.6°) | (-90°,59.6°) | (90°,58.2°) |
15 | (-1.2,0,0.76) | (1.42,180°,57.7°) | (180,234°) | (180,59.6°) | (180°,58.7°) |
16 | (0,-1.2,0.76) | (1.42,-90°,57.7°) | (90°,59.6°) | (90°,59.6°) | (-90°,58.2°) |
17 | (-0.6,-1.2,0.76) | (1.54,63.4°,60.4°) | (60.9°,62.6°) | (60.9°,62.6°) | (60.9°,61.1°) |
18 | (0.6,-1.2,0.76) | (1.54,-63.4°,60.4°) | (-63.4°,74.6°) | (-68.2°,68.3°) | (-68.2°,66.3°) |
TABLE 2
TABLE 3
 | PHAT | MCPSP | APHAT |
Azimuth angle θ RMSE (°) | 59.4 | 68.6 | 1.6 |
Pitch angle φ RMSE (°) | 63.2 | 4.6 | 2.7 |
From the comparative analysis of the experimental results in tables 1 and 2, it can be seen that in the actual environment the PHAT algorithm estimates the azimuth and pitch angles unstably and with large errors, and the MCPSP algorithm produces direction-reversed azimuth estimates when the sound source lies on the X axis or Y axis of the system coordinates. The positioning performance of the APHAT algorithm is stable and its positioning precision is high: as can be seen from table 3, its azimuth root mean square error (RMSE) is 1.6° and its pitch RMSE is 2.7°. The angular deviation of the APHAT algorithm is within an acceptable range and its accuracy is comparatively high, which verifies the effectiveness of the algorithm proposed herein.
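For reference, the root mean square errors reported in table 3 follow the usual definition. A minimal sketch (the sample values below are hypothetical, not taken from the tables):

```python
import math

def rmse(estimates, truths):
    """Root mean square error between estimated and true angles, in degrees."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths))
                     / len(estimates))

est_azimuth  = [45.0, 26.6, 44.0]   # hypothetical azimuth estimates
true_azimuth = [45.0, 26.6, 45.0]   # corresponding ground truth
print(round(rmse(est_azimuth, true_azimuth), 3))  # 0.577
```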
Example 3
Embodiment 3 of the present invention provides a sound source localization system.
As shown in fig. 9, the present invention provides a sound source localization system including: a sound source voice signal acquisition module 901, configured to acquire four channels of sound source voice signals using a quaternary microphone array, where the quaternary microphone array comprises four microphones and each microphone collects one channel of sound source voice signal; a framing module 902, configured to perform synchronous framing on the four channels of sound source voice signals to obtain a frame signal set, where each frame signal in the frame signal set includes four channels of frame signals, namely a first, a second, a third and a fourth channel of frame signal; an effective frame signal subset obtaining module 903, configured to judge the validity of each frame signal in the frame signal set to obtain an effective frame signal subset; a fused quadratic-correlation average generalized spectrum subtraction modified phase transform function obtaining module 904, configured to obtain the fused quadratic-correlation average generalized spectrum subtraction modified phase transform function of any two effective frame signals according to the effective frame signal subset; a time delay value calculation module 905, configured to acquire the time point corresponding to the maximum peak value of the fused quadratic-correlation average generalized spectrum subtraction modified phase transform function of any two effective frame signals, to obtain the time delay values of any two microphone sound source signals; and a direction position determining module 906, configured to determine the directional position of the sound source according to the geometric position of the quaternary microphone array and the time delay values of any two microphone sound source signals.
Example 4
Embodiment 4 of the present invention provides a preferred implementation of the sound source localization system.
The framing module 902 specifically includes: a framing sub-module, configured to apply a window function w(n) to perform synchronous windowing and framing on the four sound source voice signals to obtain frame signals xij(n), where n denotes the nth sampling point, n = 1, 2, …, N, and xij(n) denotes the jth channel of the ith frame signal, j = 1, 2, 3, 4; and a synthesis sub-module, configured to combine all frame signals into a frame signal set.
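As a minimal illustration of the windowing and framing step above, assuming the experiment's parameters (8 kHz sampling rate, frame length 256, frame shift 128, Hamming window); the function and signal names are hypothetical:

```python
import numpy as np

def frame_signal(x, frame_len=256, frame_shift=128):
    """Split one channel into overlapping Hamming-windowed frames."""
    win = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    return np.stack([x[k * frame_shift : k * frame_shift + frame_len] * win
                     for k in range(n_frames)])

fs = 8000                      # sampling rate used in the experiment
x = np.random.randn(fs)        # one second of placeholder signal
frames = frame_signal(x)       # one windowed frame per row
```

In the system, this framing would be applied synchronously to all four microphone channels.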
The effective frame signal subset obtaining module 903 specifically includes: a short-time frame energy calculation sub-module, configured to calculate the short-time frame energy of the jth frame signal of the ith frame signal, wherein Eij denotes the short-time frame energy of the jth frame signal of the ith frame signal, n denotes the nth sampling point, and n = 1, 2, …, N; a first judgment sub-module, configured to judge whether the short-time frame energy of the jth frame signal of the ith frame signal is greater than a first preset threshold, to obtain a first judgment result; a first judgment result processing sub-module, configured to: if the first judgment result indicates that the short-time frame energy is not greater than the first preset threshold, increase the value of i by 1 and call the short-time frame energy calculation sub-module again; if the first judgment result indicates that the short-time frame energy is greater than the first preset threshold, set the ith frame signal as a starting point and increase the value of i by 1; a zero-crossing rate calculation sub-module, configured to calculate the zero-crossing rate of the jth frame signal of the ith frame signal; a second judgment sub-module, configured to judge whether the zero-crossing rate is greater than a second preset threshold, to obtain a second judgment result; a second judgment result processing sub-module, configured to set the mark Tij of the jth frame signal of the ith frame signal to 1 if the second judgment result indicates that the zero-crossing rate is greater than the second preset threshold, and to set Tij to 0 if it is not greater;
a total state value SS(i) calculation sub-module, configured to calculate the total state value SS(i) = Ti1 && Ti2 && Ti3 && Ti4 of the marks of the four frame signals of the ith frame signal, wherein Ti1, Ti2, Ti3 and Ti4 respectively denote the marks of the 1st, 2nd, 3rd and 4th frame signals of the ith frame signal; a third judgment sub-module, configured to judge whether the total state value SS(i) equals 1, to obtain a third judgment result; a third result processing sub-module, configured to set the ith signal frame as an effective signal frame if the third judgment result indicates that SS(i) equals 1; a fourth judgment sub-module, configured to judge whether the short-time frame energy of the jth frame signal of the ith frame signal is smaller than a third preset threshold, to obtain a fourth judgment result; a fourth judgment result processing sub-module, configured to: if the fourth judgment result indicates that the short-time frame energy of the jth frame signal of the ith frame signal is smaller than the third preset threshold, set the ith signal frame as the termination point of the voice signal to obtain the effective frame signal subset; if the fourth judgment result indicates that it is not smaller than the third preset threshold, increase the value of i by 1 and call the zero-crossing rate calculation sub-module again.
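The dual-gate detection above (short-time energy, zero-crossing rate, and SS(i) as the logical AND of the four channel marks) can be sketched as follows. The thresholds and names are hypothetical, and the patent's exact formulas, given as images, are not reproduced:

```python
import numpy as np

def short_time_energy(frame):
    # E_ij: sum over n of x_ij(n)^2
    return float(np.sum(frame ** 2))

def zero_crossing_rate(frame):
    # Half the number of sign changes across the frame
    s = np.sign(frame)
    s[s == 0] = 1.0
    return 0.5 * float(np.sum(np.abs(np.diff(s))))

def channel_mark(frame, energy_thresh, zcr_thresh):
    """T_ij: 1 when the frame passes both the energy and ZCR gates."""
    return int(short_time_energy(frame) > energy_thresh
               and zero_crossing_rate(frame) > zcr_thresh)

def total_state(frames_4ch, energy_thresh, zcr_thresh):
    """SS(i) = T_i1 && T_i2 && T_i3 && T_i4 over the four channels."""
    return int(all(channel_mark(f, energy_thresh, zcr_thresh)
                   for f in frames_4ch))

n = np.arange(256)
voiced = np.sin(2 * np.pi * 0.05 * n)   # energetic, frequently crossing zero
silent = np.zeros(256)
```

A frame is retained as effective only when all four channels pass both gates, which is what SS(i) = 1 expresses.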
The fused quadratic-correlation average generalized spectrum subtraction modified phase transform function obtaining module 904 specifically includes: a quadratic correlation calculation sub-module, configured to calculate, according to the effective frame signal subset and combining autocorrelation and cross-correlation, the quadratic correlation of any two frame signals of each effective frame signal; a power spectrum calculation sub-module, configured to calculate, according to the effective frame signal subset, the power spectrum of each frame signal of each effective frame signal; a noise masking function obtaining sub-module, configured to obtain the noise masking function of each frame signal of each effective frame signal according to the power spectrum of each frame signal:

wherein zpq(ω) denotes the noise masking function of the qth frame signal of the pth effective frame signal, Xpq(ω) denotes the power spectrum of the qth frame signal of the pth effective frame signal, q = 1, 2, 3, 4, N(ω) denotes the noise power spectrum, α denotes a first coefficient, and β denotes a second coefficient; a fused quadratic-correlation generalized spectrum subtraction modified phase transform function obtaining sub-module, configured to obtain the fused quadratic-correlation generalized spectrum subtraction modified phase transform function of any two frame signals of each effective frame signal according to the noise masking function of each frame signal and the quadratic correlation of any two frame signals:

wherein φls_p(ω) denotes the fused quadratic-correlation generalized spectrum subtraction modified phase transform function of the lth frame signal and the sth frame signal of the pth effective frame signal, where l = 1, 2, 3, 4, s = 1, 2, 3, 4, and l ≠ s, Xpl(ω) and Xps(ω) denote the power spectrum of the lth frame signal and the power spectrum of the sth frame signal of the pth effective frame signal respectively, and ρ denotes a third coefficient; a fused quadratic-correlation average generalized spectrum subtraction modified phase transform function obtaining sub-module, configured to obtain the fused quadratic-correlation average generalized spectrum subtraction modified phase transform function of any two effective frame signals according to the fused quadratic-correlation generalized spectrum subtraction modified phase transform functions of any two frame signals of each effective frame signal:

wherein the resulting function denotes the fused quadratic-correlation average generalized spectrum subtraction modified phase transform function of the lth effective frame signal and the sth effective frame signal, and P denotes the number of effective frame signals in the effective frame signal subset.
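The delay-estimation step that follows this module can be illustrated with a plain GCC-PHAT sketch. This is a hedged stand-in: the patent's fused quadratic correlation and spectral-subtraction weighting (coefficients α, β, ρ) are not reproduced here, and all names are hypothetical:

```python
import numpy as np

def gcc_phat_lag(x1, x2):
    """Estimate the integer-sample lag of x1 relative to x2 with PHAT weighting."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12        # PHAT whitening of the cross spectrum
    r = np.fft.irfft(cross, n)
    # Reorder so lags run from -(len(x2)-1) to len(x1)-1
    r = np.concatenate((r[-(len(x2) - 1):], r[: len(x1)]))
    return int(np.argmax(np.abs(r))) - (len(x2) - 1)

rng = np.random.default_rng(0)
sig = rng.standard_normal(1200)
x1 = sig[:1024]
x2 = sig[5:1029]              # x2 leads x1 by 5 samples
lag = gcc_phat_lag(x1, x2)
```

At the experiment's 8 kHz sampling rate, the corresponding time delay value would be lag / 8000 seconds; the patent's fused weighting sharpens the correlation peak this argmax searches for.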
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a sound source positioning method and a sound source positioning system. The sound source positioning method firstly obtains color sound source voice signal windowing framing for a quaternary microphone array, then detects signal effective frame signals, and calculates and fuses quadratic correlation generalized spectrum subtraction correction phase transformation functions for the screened effective frame signals. In order to further improve the time delay precision, the average generalized spectrum fused with quadratic correlation is adopted to subtract the modified phase transformation function to calculate the time delay value. And finally, estimating the sound source direction according to the geometric position of the microphone array and the calculated time delay value, thereby improving the precision of sound source positioning.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principle and implementation of the present invention are explained herein using specific examples; the above description of the embodiments is only intended to help understand the method of the present invention and its core idea. The described embodiments are only a part of the embodiments of the present invention, not all of them, and all other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Claims (8)
1. A sound source localization method, characterized by comprising the steps of:
acquiring four sound source voice signals by adopting a quaternary microphone array; the quaternary microphone array comprises four microphones, and each microphone collects one path of sound source voice signals;
performing synchronous framing on the four sound source voice signals to obtain a frame signal set, wherein each frame signal in the frame signal set comprises four channels of frame signals, namely a first, a second, a third and a fourth channel of frame signal;
judging the validity of each frame signal in the frame signal set to obtain a valid frame signal subset;
acquiring an average generalized spectrum subtraction correction phase transformation function fused with secondary correlation of any two paths of effective frame signals according to the effective frame signal subset;
acquiring the time point corresponding to the maximum peak value of the fused quadratic-correlation average generalized spectrum subtraction modified phase transform function of any two effective frame signals, to obtain the time delay values of any two microphone sound source signals;
determining the direction position of a sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals;
the obtaining of the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two effective frame signals according to the effective frame signal subset specifically includes:
according to the effective frame signal subset, combining autocorrelation and cross correlation, and calculating the quadratic correlation of any two paths of frame signals of each effective frame signal;
calculating the power spectrum of each path of frame signal of each effective frame signal according to the effective frame signal subset;
acquiring a noise masking function of each path of frame signal of each effective frame signal according to the power spectrum of each path of frame signal:
wherein zpq(ω) denotes the noise masking function of the qth frame signal of the pth effective frame signal, Xpq(ω) denotes the power spectrum of the qth frame signal of the pth effective frame signal, q = 1, 2, 3, 4, N(ω) denotes the noise power spectrum, α denotes a first coefficient, and β denotes a second coefficient;
acquiring generalized spectrum subtraction correction phase transformation function of any two paths of frame signals of each effective frame signal fused with quadratic correlation according to the noise masking function of each path of frame signals of each effective frame signal and the quadratic correlation of any two paths of frame signals:
wherein φls_p(ω) denotes the fused quadratic-correlation generalized spectrum subtraction modified phase transform function of the lth frame signal and the sth frame signal of the pth effective frame signal, where l = 1, 2, 3, 4, s = 1, 2, 3, 4, and l ≠ s, Xpl(ω) and Xps(ω) denote the power spectrum of the lth frame signal and the power spectrum of the sth frame signal of the pth effective frame signal respectively, and ρ denotes a third coefficient;
according to the generalized spectrum subtraction correction phase transformation function of the fusion quadratic correlation of any two paths of frame signals of each effective frame signal, obtaining the average generalized spectrum subtraction correction phase transformation function of the fusion quadratic correlation of any two paths of effective frame signals:
wherein the resulting function denotes the fused quadratic-correlation average generalized spectrum subtraction modified phase transform function of the lth effective frame signal and the sth effective frame signal, and P denotes the number of effective frame signals in the effective frame signal subset.
2. The method according to claim 1, wherein the step of synchronously framing the four sound source voice signals to obtain a frame signal set comprises:
applying a window function w(n) to perform synchronous windowing and framing on the four sound source voice signals to obtain frame signals xij(n), wherein n denotes the nth sampling point, n = 1, 2, …, N, and xij(n) denotes the jth channel of the ith frame signal, j = 1, 2, 3, 4;
all the frame signals are combined into a set of frame signals.
3. The method according to claim 1, wherein the determining validity of each frame signal in the frame signal set to obtain a valid frame signal subset comprises:
calculating the short-time frame energy of the jth frame signal of the ith frame signal, wherein Eij denotes the short-time frame energy of the jth frame signal of the ith frame signal, n denotes the nth sampling point, and n = 1, 2, …, N;
Judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is greater than a first preset threshold value or not to obtain a first judgment result;
if the first judgment result shows that the short-time frame energy is not greater than the first preset threshold, increasing the value of i by 1 and returning to the step of calculating the short-time frame energy of the jth frame signal of the ith frame signal;
if the first judgment result shows that the short-time frame energy is greater than the first preset threshold, setting the ith frame signal as a starting point and increasing the value of i by 1;
calculating the zero-crossing rate of the jth frame signal of the ith frame signal;
judging whether the zero crossing rate is greater than a second preset threshold value or not to obtain a second judgment result;
if the second judgment result shows that the zero-crossing rate is greater than the second preset threshold, setting the mark Tij of the jth frame signal of the ith frame signal to 1;
if the second judgment result shows that the zero-crossing rate is not greater than the second preset threshold, setting the mark Tij of the jth frame signal of the ith frame signal to 0;
calculating the total state value SS(i) = Ti1 && Ti2 && Ti3 && Ti4 of the marks of the four frame signals of the ith frame signal, wherein Ti1, Ti2, Ti3 and Ti4 respectively denote the marks of the 1st, 2nd, 3rd and 4th frame signals of the ith frame signal;
judging whether the total state value SS (i) is equal to 1 or not to obtain a third judgment result;
if the third judgment result indicates that SS (i) is equal to 1, setting the ith signal frame as an effective signal frame;
judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is smaller than a third preset threshold value or not to obtain a fourth judgment result;
if the fourth judgment result shows that the short-time frame energy of the jth frame signal of the ith frame signal is smaller than the third preset threshold, setting the ith signal frame as the termination point of the voice signal to obtain an effective frame signal subset;
if the fourth judgment result shows that the short-time frame energy of the jth frame signal of the ith frame signal is not smaller than the third preset threshold, increasing the value of i by 1 and returning to the step of calculating the zero-crossing rate of the jth frame signal of the ith frame signal.
4. The sound source localization method according to claim 1, wherein the determining a directional position of a sound source according to the geometric position of the quaternary microphone array and the time delay values of any two microphone sound source signals specifically comprises:
according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, a formula is utilizedCalculating azimuth angle theta of sound source to coordinate origin
According to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, a formula is utilizedCalculating azimuth angle and pitch angle of sound source to origin of coordinates
Where c is the sound velocity, d is the distance from the microphone element to the origin of coordinates, τ12Representing the time delay value, tau, of the 1 st and 2 nd microphone source signals13Representing the time delay value, tau, of the 1 st path microphone source signal and the 3 rd path microphone source signal14Which represents the delay values of the 1 st path microphone source signal and the 4 th path microphone source signal.
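The patent gives its closed-form direction expressions as images. The sketch below is a hedged far-field reconstruction for the stated cross geometry m1(d, 0, 0), m2(0, d, 0), m3(−d, 0, 0), m4(0, −d, 0); the inversion and all names are assumptions, checked only against the forward model included here:

```python
import math

C = 343.0   # assumed speed of sound, m/s
D = 0.25    # microphone distance to the origin, m (25 cm, as in the experiment)

def forward_delays(theta, phi):
    """Far-field TDOAs (tau12, tau13, tau14) for a source at azimuth theta, pitch phi."""
    u = (math.cos(theta) * math.sin(phi), math.sin(theta) * math.sin(phi))
    mics = [(D, 0.0), (0.0, D), (-D, 0.0), (0.0, -D)]
    t = [-(mx * u[0] + my * u[1]) / C for mx, my in mics]
    return t[0] - t[1], t[0] - t[2], t[0] - t[3]

def direction_from_delays(tau12, tau13, tau14):
    """Invert the far-field model for azimuth theta and pitch phi (radians).
    tau12 is redundant for this geometry (tau12 = c_term - s_term) and
    could serve as a consistency check; it is kept for the claim's signature."""
    c_term = tau13 / 2.0            # = -(D/C) * cos(theta) * sin(phi)
    s_term = tau14 - tau13 / 2.0    # = -(D/C) * sin(theta) * sin(phi)
    theta = math.atan2(-s_term, -c_term)
    phi = math.asin(min(1.0, (C / D) * math.hypot(s_term, c_term)))
    return theta, phi

theta_est, phi_est = direction_from_delays(*forward_delays(math.radians(30),
                                                           math.radians(60)))
```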
5. The sound source localization method according to claim 1, wherein the synchronous framing of the four sound source voice signals to obtain a frame signal set further comprises:
carrying out voice enhancement processing on each path of sound source voice signal to obtain a signal subjected to voice enhancement processing;
performing band-pass filtering processing on the signal subjected to the voice enhancement processing to obtain a signal subjected to the band-pass filtering processing;
and denoising the signal subjected to the band-pass filtering by using a wavelet threshold to obtain a preprocessed sound source voice signal.
6. A sound source localization system, comprising:
the sound source voice signal acquisition module is used for acquiring four paths of sound source voice signals by adopting a quaternary microphone array; the quaternary microphone array comprises four microphones, and each microphone collects one path of sound source voice signals;
a framing module, configured to perform synchronous framing on the four channels of sound source voice signals to obtain a frame signal set, where each frame signal in the frame signal set includes four channels of frame signals, namely a first, a second, a third and a fourth channel of frame signal;
the effective frame signal subset acquisition module is used for judging the effectiveness of each frame signal in the frame signal set to obtain an effective frame signal subset;
the fusion quadratic correlation average generalized spectrum subtraction correction phase transformation function acquisition module is used for acquiring the fusion quadratic correlation average generalized spectrum subtraction correction phase transformation function of any two paths of effective frame signals according to the effective frame signal subsets;
the time delay value calculation module is used for acquiring the time point corresponding to the maximum peak value of the fused quadratic-correlation average generalized spectrum subtraction modified phase transform function of any two effective frame signals, to obtain the time delay values of any two microphone sound source signals;
the direction position determining module is used for determining the direction position of a sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals;
the module for obtaining the average generalized spectrum subtraction correction phase transformation function fused with the quadratic correlation specifically comprises:
the secondary correlation calculation submodule is used for combining the autocorrelation and the cross correlation according to the effective frame signal subset and calculating the secondary correlation of any two paths of frame signals of each effective frame signal;
the power spectrum calculation submodule is used for calculating the power spectrum of each path of frame signal of each effective frame signal according to the effective signal subset;
the noise masking function obtaining submodule is used for obtaining the noise masking function of each path of frame signal of each effective frame signal according to the power spectrum of each path of frame signal:
wherein zpq(ω) denotes the noise masking function of the qth frame signal of the pth effective frame signal, Xpq(ω) denotes the power spectrum of the qth frame signal of the pth effective frame signal, q = 1, 2, 3, 4, N(ω) denotes the noise power spectrum, α denotes a first coefficient, and β denotes a second coefficient;
the generalized spectrum subtraction correction phase transformation function fused secondary correlation obtaining sub-module is used for obtaining the generalized spectrum subtraction correction phase transformation function fused secondary correlation of any two paths of frame signals of each effective frame signal according to the noise masking function of each path of frame signals of each effective frame signal and the secondary correlation of any two paths of frame signals:
wherein φls_p(ω) denotes the fused quadratic-correlation generalized spectrum subtraction modified phase transform function of the lth frame signal and the sth frame signal of the pth effective frame signal, where l = 1, 2, 3, 4, s = 1, 2, 3, 4, and l ≠ s, Xpl(ω) and Xps(ω) denote the power spectrum of the lth frame signal and the power spectrum of the sth frame signal of the pth effective frame signal respectively, and ρ denotes a third coefficient;
the quadratic correlation fused average generalized spectrum subtraction correction phase transformation function obtaining sub-module is used for fusing quadratic correlation generalized spectrum subtraction correction phase transformation functions according to any two paths of frame signals of each effective frame signal to obtain quadratic correlation fused average generalized spectrum subtraction correction phase transformation functions of any two paths of effective frame signals:
wherein the resulting function denotes the fused quadratic-correlation average generalized spectrum subtraction modified phase transform function of the lth effective frame signal and the sth effective frame signal, and P denotes the number of effective frame signals in the effective frame signal subset.
7. The sound source positioning system according to claim 6, wherein the framing module specifically includes:
a framing sub-module, configured to apply a window function w(n) to perform synchronous windowing and framing on the four sound source voice signals to obtain frame signals xij(n), wherein n denotes the nth sampling point, n = 1, 2, …, N, and xij(n) denotes the jth channel of the ith frame signal, j = 1, 2, 3, 4;
and the synthesis submodule is used for synthesizing all the frame signals into a frame signal set.
8. The sound source localization system according to claim 6, wherein the valid frame signal subset acquisition module specifically comprises:
a short-time frame energy calculation sub-module, configured to calculate the short-time frame energy of the jth frame signal of the ith frame signal, wherein Eij denotes the short-time frame energy of the jth frame signal of the ith frame signal, n denotes the nth sampling point, and n = 1, 2, …, N;
The first judgment submodule is used for judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is greater than a first preset threshold value or not to obtain a first judgment result;
a first judgment result processing sub-module, configured to: if the first judgment result indicates that the short-time frame energy is not greater than the first preset threshold, increase the value of i by 1 and call the short-time frame energy calculation sub-module again; if the first judgment result indicates that the short-time frame energy is greater than the first preset threshold, set the ith frame signal as a starting point and increase the value of i by 1;
a zero-crossing rate calculation sub-module, configured to calculate the zero-crossing rate of the jth frame signal of the ith frame signal;
the second judgment submodule is used for judging whether the zero crossing rate is greater than a second preset threshold value or not to obtain a second judgment result;
a second judgment result processing sub-module, configured to set the mark Tij of the jth frame signal of the ith frame signal to 1 if the second judgment result indicates that the zero-crossing rate is greater than the second preset threshold, and to set Tij to 0 if it is not greater;
a total state value SS(i) calculation sub-module, configured to calculate the total state value SS(i) = Ti1 && Ti2 && Ti3 && Ti4 of the marks of the four frame signals of the ith frame signal, wherein Ti1, Ti2, Ti3 and Ti4 respectively denote the marks of the 1st, 2nd, 3rd and 4th frame signals of the ith frame signal;
a third judging submodule, configured to judge whether the total state value ss (i) is equal to 1, to obtain a third judgment result;
a third result processing sub-module, configured to set an ith signal frame as an effective signal frame if the third determination result indicates that ss (i) is equal to 1;
the fourth judgment submodule is used for judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is smaller than a third preset threshold value or not to obtain a fourth judgment result;
a fourth judgment result processing sub-module, configured to: if the fourth judgment result indicates that the short-time frame energy of the jth frame signal of the ith frame signal is smaller than the third preset threshold, set the ith signal frame as the termination point of the voice signal to obtain the effective frame signal subset; if the fourth judgment result indicates that it is not smaller than the third preset threshold, increase the value of i by 1 and call the zero-crossing rate calculation sub-module again.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910312565.6A CN110007276B (en) | 2019-04-18 | 2019-04-18 | Sound source positioning method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110007276A CN110007276A (en) | 2019-07-12 |
CN110007276B (en) | 2021-01-12
Family
ID=67172766
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910312565.6A Active CN110007276B (en) | 2019-04-18 | 2019-04-18 | Sound source positioning method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110007276B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110706717B (en) * | 2019-09-06 | 2021-11-09 | 西安合谱声学科技有限公司 | Microphone array panel-based human voice detection orientation method |
CN110703198B (en) * | 2019-10-22 | 2022-03-22 | 哈尔滨工程大学 | Quaternary cross array envelope spectrum estimation method based on frequency selection |
CN112924937B (en) * | 2021-01-25 | 2024-06-04 | 桂林电子科技大学 | Positioning device and method for two-dimensional plane bursty sound source |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102110441A (en) * | 2010-12-22 | 2011-06-29 | 中国科学院声学研究所 | Method for generating sound masking signal based on time reversal |
CN102707262A (en) * | 2012-06-20 | 2012-10-03 | 太仓博天网络科技有限公司 | Sound localization system based on microphone array |
CN103235287A (en) * | 2013-04-17 | 2013-08-07 | 华北电力大学(保定) | Sound source localization camera shooting tracking device |
KR20130114437A (en) * | 2012-04-09 | 2013-10-17 | 주식회사 센서웨이 | The time delay estimation method based on cross-correlation and apparatus thereof |
CN103607361A (en) * | 2013-06-05 | 2014-02-26 | 西安电子科技大学 | Time frequency overlap signal parameter estimation method under Alpha stable distribution noise |
EP2543037B1 (en) * | 2010-03-29 | 2014-03-05 | Fraunhofer Gesellschaft zur Förderung der angewandten Wissenschaft E.V. | A spatial audio processor and a method for providing spatial parameters based on an acoustic input signal |
CN107102296A (en) * | 2017-04-27 | 2017-08-29 | 大连理工大学 | Sound source localization system based on a distributed microphone array
CN108198568A (en) * | 2017-12-26 | 2018-06-22 | 太原理工大学 | Method and system for multiple sound source localization
US20180359563A1 (en) * | 2017-06-12 | 2018-12-13 | Ryo Tanaka | Method for accurately calculating the direction of arrival of sound at a microphone array |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101901602B (en) * | 2010-07-09 | 2012-09-05 | 中国科学院声学研究所 | Method for reducing noise by using hearing threshold of impaired hearing |
US9081083B1 (en) * | 2011-06-27 | 2015-07-14 | Amazon Technologies, Inc. | Estimation of time delay of arrival |
FR2992765A1 (en) * | 2012-06-27 | 2014-01-03 | France Telecom | LOW COMPLEXITY COUPLING ESTIMATION |
CN104076331B (en) * | 2014-06-18 | 2016-04-13 | 南京信息工程大学 | Sound source localization method for a seven-element microphone array
CN104991573A (en) * | 2015-06-25 | 2015-10-21 | 北京品创汇通科技有限公司 | Locating and tracking method and apparatus based on sound source array |
CN106098077B (en) * | 2016-07-28 | 2023-05-05 | 浙江诺尔康神经电子科技股份有限公司 | Artificial cochlea speech processing system and method with noise reduction function |
CN106226739A (en) * | 2016-07-29 | 2016-12-14 | 太原理工大学 | Dual sound source localization method fusing sub-band analysis
US20180074163A1 (en) * | 2016-09-08 | 2018-03-15 | Nanjing Avatarmind Robot Technology Co., Ltd. | Method and system for positioning sound source by robot |
CN107644650B (en) * | 2017-09-29 | 2020-06-05 | 山东大学 | Improved sound source positioning method based on progressive serial orthogonalization blind source separation algorithm and implementation system thereof |
CN108333575B (en) * | 2018-02-02 | 2020-10-20 | 浙江大学 | Gaussian prior and interval constraint based time delay filtering method for mobile sound source |
Also Published As
Publication number | Publication date |
---|---|
CN110007276A (en) | 2019-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107102296B (en) | Sound source positioning system based on distributed microphone array | |
CN110007276B (en) | Sound source positioning method and system | |
WO2020042708A1 (en) | Time-frequency masking and deep neural network-based sound source direction estimation method | |
CN104076331B (en) | Sound source localization method for a seven-element microphone array | |
CN105068048B (en) | Distributed microphone array sound localization method based on spatial sparsity | |
CN109490822B (en) | Voice DOA estimation method based on ResNet | |
CN104142492B (en) | SRP-PHAT multi-source spatial localization method | |
CN110515038B (en) | Self-adaptive passive positioning device based on unmanned aerial vehicle-array and implementation method | |
CN105388459B (en) | The robust sound source space-location method of distributed microphone array network | |
CN110488223A (en) | A sound source localization method | |
CN111474521B (en) | Sound source positioning method based on microphone array in multipath environment | |
Ajdler et al. | Acoustic source localization in distributed sensor networks | |
CN107219512B (en) | Sound source positioning method based on sound transfer function | |
CN105204001A (en) | Sound source positioning method and system | |
Pang et al. | Multitask learning of time-frequency CNN for sound source localization | |
CN111239687A (en) | Sound source positioning method and system based on deep neural network | |
CN111798869B (en) | Sound source positioning method based on double microphone arrays | |
CN111273215B (en) | Channel inconsistency error correction direction finding method of channel state information | |
CN110534126A (en) | Sound source localization and speech enhancement method and system based on fixed beamforming | |
CN109188362A (en) | Microphone array sound source localization signal processing method | |
CN107167770A (en) | Microphone array sound source localization device under reverberation conditions | |
CN111965596A (en) | Low-complexity single-anchor node positioning method and device based on joint parameter estimation | |
Chen et al. | A supplement to multidimensional scaling framework for mobile location: A unified view | |
CN115267671A (en) | Distributed voice interaction terminal equipment and sound source positioning method and device thereof | |
Dang et al. | A feature-based data association method for multiple acoustic source localization in a distributed microphone array |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||