CN110007276B - Sound source positioning method and system - Google Patents

Info

Publication number: CN110007276B (application CN201910312565.6A; earlier publication CN110007276A)
Legal status: Active
Inventors: 黄丽霞, 张雪英, 王杰, 李凤莲, 陈桂军
Assignee: Taiyuan University of Technology
Original language: Chinese (zh)
Classification: G01S 5/18 (position-fixing using ultrasonic, sonic, or infrasonic waves), G01S 5/20 (position of source determined by a plurality of spaced direction-finders)
Abstract

The invention discloses a sound source positioning method and a sound source positioning system. The method first performs windowing and framing on the sound source voice signals acquired by a quaternary microphone array, then detects the valid frame signals, and calculates the generalized spectrum subtraction correction phase transformation function fused with quadratic correlation for the screened valid frame signals. To further improve the time delay accuracy, the time delay value is calculated from the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation. Finally, the sound source direction is estimated from the geometric position of the microphone array and the calculated time delay values, thereby improving the accuracy of sound source positioning.

Description

Sound source positioning method and system
Technical Field
The present invention relates to the field of sound source localization, and in particular, to a sound source localization method and system.
Background
Sound source localization has become a research hotspot in the field of voice signal processing and has wide application in video conferencing, intelligent robots, intelligent video surveillance systems and the like. In traditional positioning algorithms, the positioning accuracy drops sharply in severe environments with low signal-to-noise ratio and long reverberation time.
Disclosure of Invention
The invention aims to provide a sound source positioning method and a sound source positioning system so as to improve the accuracy of sound source positioning.
In order to achieve the purpose, the invention provides the following scheme:
the invention provides a sound source positioning method, which comprises the following steps:
acquiring four sound source voice signals by adopting a quaternary microphone array; the quaternary microphone array comprises four microphones, and each microphone collects one path of sound source voice signals;
performing synchronous framing processing on the four paths of sound source voice signals to obtain a frame signal set, wherein each frame signal in the frame signal set comprises four paths of frame signals, which are respectively a first path of frame signal, a second path of frame signal, a third path of frame signal and a fourth path of frame signal;
judging the validity of each frame signal in the frame signal set to obtain a valid frame signal subset;
acquiring an average generalized spectrum subtraction correction phase transformation function fused with secondary correlation of any two paths of effective frame signals according to the effective frame signal subset;
acquiring the time point corresponding to the maximum peak value of the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two paths of effective frame signals, to obtain the time delay value of any two paths of microphone sound source signals;
and determining the direction position of the sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals.
Optionally, the synchronous framing processing is performed on the four sound source voice signals to obtain a frame signal set, which specifically includes:
using a window function w(n) (the expression for w(n) appears only as an image in the original), carrying out synchronous windowing and framing processing on the four sound source voice signals to obtain frame signals x_ij(n), where n denotes the nth sampling point, n = 1, 2, …, N, and x_ij(n) denotes the jth path of the ith frame signal, j = 1, 2, 3, 4;
all the frame signals are combined into a set of frame signals.
Optionally, the determining the validity of each frame signal in the frame signal set to obtain a valid frame signal subset specifically includes:
using the formula

    E_ij = Σ_{n=1}^{N} x_ij²(n)

calculating the short-time frame energy of the jth path of frame signal of the ith frame signal; wherein E_ij represents the short-time frame energy of the jth path of frame signal of the ith frame signal, n denotes the nth sampling point, and n = 1, 2, …, N;
Judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is greater than a first preset threshold value or not to obtain a first judgment result;
if the first judgment result shows that the short-time frame energy is not greater than the first preset threshold, increasing the value of i by 1 and returning to the step of calculating the short-time frame energy of the jth path of frame signal of the ith frame signal;
if the first judgment result shows that the short-time frame energy is greater than the first preset threshold, setting the ith frame signal as a starting point, and increasing the value of i by 1;
using the formula

    Z_ij = (1/2) Σ_{n=2}^{N} | sgn[x_ij(n)] − sgn[x_ij(n−1)] |

calculating the zero-crossing rate of the jth path of frame signal of the ith frame signal; wherein sgn[x] = 1 when x ≥ 0 and sgn[x] = −1 when x < 0;
judging whether the zero crossing rate is greater than a second preset threshold value or not to obtain a second judgment result;
if the second judgment result shows that the zero crossing rate is greater than the second preset threshold, setting the mark T_ij of the jth path of frame signal of the ith frame signal to 1;
if the second judgment result shows that the zero crossing rate is not greater than the second preset threshold, setting the mark T_ij of the jth path of frame signal of the ith frame signal to 0;
using the formula SS(i) = T_i1 && T_i2 && T_i3 && T_i4, calculating the total state value SS(i) of the marks of the four paths of frame signals of the ith frame signal; wherein T_i1, T_i2, T_i3 and T_i4 respectively represent the marks of the 1st path, the 2nd path, the 3rd path and the 4th path of the ith frame signal;
judging whether the total state value SS (i) is equal to 1 or not to obtain a third judgment result;
if the third judgment result indicates that SS (i) is equal to 1, setting the ith signal frame as an effective signal frame;
judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is smaller than a third preset threshold value or not to obtain a fourth judgment result;
if the fourth judgment result shows that the short-time frame energy of the jth frame signal of the ith frame signal is smaller than the third preset threshold, setting the ith signal frame as the termination point of the voice signal to obtain an effective frame signal subset;
if the fourth judgment result shows that the short-time frame energy of the jth path of frame signal of the ith frame signal is not less than the third preset threshold, increasing the value of i by 1 and returning to the step of calculating the zero crossing rate of the jth path of frame signal of the ith frame signal.
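A compact sketch of the double-threshold validity check described above. The threshold values below are illustrative (the patent leaves them as preset parameters), and the start/end-point bookkeeping is simplified to a per-frame mask: a frame is valid only when all four channel marks are 1, mirroring SS(i) = T_i1 && T_i2 && T_i3 && T_i4.

```python
import numpy as np

def short_time_energy(frame):
    # E_ij = sum of squared samples of one channel of one frame
    return float(np.sum(frame ** 2))

def zero_crossing_rate(frame):
    # Z_ij = 0.5 * sum |sgn(x(n)) - sgn(x(n-1))|, with sgn taken as +1 for x >= 0
    s = np.where(frame >= 0, 1, -1)
    return 0.5 * float(np.sum(np.abs(np.diff(s))))

def valid_frame_mask(frames, energy_thresh, zcr_thresh):
    """frames: (num_frames, 4, frame_len). Returns a boolean mask of the
    frames whose four channels all exceed both thresholds."""
    mask = []
    for frame in frames:
        marks = [short_time_energy(ch) > energy_thresh
                 and zero_crossing_rate(ch) > zcr_thresh for ch in frame]
        mask.append(all(marks))  # logical AND of the four marks T_ij
    return np.array(mask)
```

Requiring all four channels to pass keeps a frame only when every microphone captured usable speech, which is what makes the later pairwise correlations meaningful.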
Optionally, the obtaining, according to the effective frame signal subset, an average generalized spectrum subtraction correction phase transformation function with which any two effective frame signals are fused with quadratic correlation specifically includes:
calculating the quadratic correlation of any two paths of frame signals of each effective frame signal according to the effective frame signal subset;
calculating the power spectrum of each path of frame signal of each effective frame signal according to the effective frame signal subset;
acquiring a noise masking function of each path of frame signal of each effective frame signal according to the power spectrum of each path of frame signal:
    z_pq(ω) = X_pq(ω) − αN(ω),  if X_pq(ω) − αN(ω) > βN(ω)
    z_pq(ω) = βN(ω),            otherwise

wherein z_pq(ω) represents the noise masking function of the qth path of frame signal of the pth effective frame signal, X_pq(ω) represents the power spectrum of the qth path of frame signal of the pth effective frame signal, q = 1, 2, 3, 4, N(ω) is the noise power spectrum, α represents a first coefficient, and β represents a second coefficient;
acquiring generalized spectrum subtraction correction phase transformation function of any two paths of frame signals of each effective frame signal fused with quadratic correlation according to the noise masking function of each path of frame signals of each effective frame signal and the quadratic correlation of any two paths of frame signals:
Φ_ls_p(ω) — the explicit expression appears only as an image in the original;

wherein Φ_ls_p(ω) represents the generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the lth path of frame signal and the sth path of frame signal of the pth effective frame signal, where l = 1, 2, 3, 4, s = 1, 2, 3, 4, l ≠ s; X_pl(ω) and X_ps(ω) respectively represent the power spectrum of the lth path of frame signal and the power spectrum of the sth path of frame signal of the pth effective frame signal, and ρ represents a third coefficient;
according to the generalized spectrum subtraction correction phase transformation function of the fusion quadratic correlation of any two paths of frame signals of each effective frame signal, obtaining the average generalized spectrum subtraction correction phase transformation function of the fusion quadratic correlation of any two paths of effective frame signals:
    Φ̄_ls(ω) = (1/P) Σ_{p=1}^{P} Φ_ls_p(ω)

wherein Φ̄_ls(ω) represents the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the lth path of effective frame signal and the sth path of effective frame signal, and P represents the number of effective frame signals in the effective frame signal subset.
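The time delay is then read off as the lag of the maximum peak of the averaged weighted cross-correlation. The patent's fused quadratic-correlation spectral-subtraction weighting is given only as an image in the source, so the sketch below substitutes the plain PHAT weighting, averaged over the valid frames, purely to illustrate the frame-averaging and peak-picking steps:

```python
import numpy as np

def delay_from_frames(frames_a, frames_b, fs):
    """Average a PHAT-weighted cross-spectrum over valid frames, then take the
    lag of the maximum peak of its inverse transform as the delay in seconds.
    Positive result means channel b lags channel a. The PHAT weighting here is
    a stand-in for the patent's fused weighting function."""
    frame_len = frames_a.shape[1]
    n = 2 * frame_len                            # zero-pad for linear correlation
    acc = np.zeros(n // 2 + 1, dtype=complex)
    for a, b in zip(frames_a, frames_b):
        cross = np.conj(np.fft.rfft(a, n)) * np.fft.rfft(b, n)
        acc += cross / (np.abs(cross) + 1e-12)   # PHAT: keep phase only
    cc = np.fft.fftshift(np.fft.irfft(acc, n))   # zero lag moved to index n//2
    return (np.argmax(np.abs(cc)) - n // 2) / fs
```

Averaging the weighted cross-spectrum over many valid frames before the inverse transform is what sharpens the peak relative to a single-frame estimate, which is the motivation for the patent's averaged function.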
Optionally, the determining the direction position of the sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals specifically includes:
according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, calculating the azimuth angle θ of the sound source relative to the coordinate origin and the pitch angle φ of the sound source relative to the coordinate origin (the closed-form expressions for θ and φ appear only as images in the original);
where c is the sound velocity, d is the distance from a microphone element to the coordinate origin, τ12 represents the time delay value of the 1st path and the 2nd path of microphone sound source signals, τ13 represents the time delay value of the 1st path and the 3rd path of microphone sound source signals, and τ14 represents the time delay value of the 1st path and the 4th path of microphone sound source signals.
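The closed-form angle expressions are image-only in the source, but the same geometry can be solved directly. The sketch below assumes far-field propagation and the sign convention that τ_1k > 0 when the wave reaches microphone k before microphone 1; under those assumptions the direction cosines follow from the microphone coordinates of fig. 2, so this is a geometric reconstruction rather than the patent's exact formulas.

```python
import numpy as np

def doa_from_delays(tau12, tau13, tau14, d, c=343.0):
    """Far-field direction of arrival for mics m1(d,0,0), m2(0,d,0),
    m3(-d,0,0), m4(0,-d,0). Solves u . (m_k - m_1) = c * tau_1k for the unit
    vector u pointing toward the source; returns (azimuth, pitch) in degrees."""
    ux = -c * tau13 / (2.0 * d)            # from m3 - m1 = (-2d, 0, 0)
    uy = c * (tau12 - tau14) / (2.0 * d)   # average of the m2 and m4 equations
    uz = np.sqrt(max(0.0, 1.0 - ux * ux - uy * uy))
    return np.degrees(np.arctan2(uy, ux)), np.degrees(np.arcsin(uz))
```

For example, delays generated from a source at azimuth 30° and pitch 40° are inverted back to the same angles; using τ12 and τ14 together for the y component also averages out independent delay-estimation noise.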
Optionally, before the synchronous framing processing is performed on the four sound source voice signals to obtain the frame signal set, the method further includes:
carrying out voice enhancement processing on each path of sound source voice signal to obtain a signal subjected to voice enhancement processing;
performing band-pass filtering processing on the signal subjected to the voice enhancement processing to obtain a signal subjected to the band-pass filtering processing;
and denoising the signal subjected to the band-pass filtering by using a wavelet threshold to obtain a preprocessed sound source voice signal.
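The wavelet-threshold denoising step can be illustrated with a minimal one-level Haar transform and soft thresholding. The patent does not specify the wavelet, decomposition depth, or threshold rule, so all of those are assumptions here; a real implementation would typically use a multi-level DWT (e.g. via PyWavelets) after the enhancement and band-pass stages.

```python
import numpy as np

def soft_threshold(x, t):
    # shrink coefficients toward zero by t; magnitudes below t become 0
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def haar_denoise(x, t):
    """One-level Haar wavelet soft-threshold denoising (even-length x).
    Perfectly reconstructs x when t = 0."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail coefficients
    d = soft_threshold(d, t)                 # suppress small (noisy) details
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2.0)         # inverse transform
    y[1::2] = (a - d) / np.sqrt(2.0)
    return y
```

Soft thresholding the detail coefficients removes low-amplitude wideband noise while leaving the approximation (the speech envelope) intact, which is the intent of the denoising step above.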
A sound source localization system, comprising:
the sound source voice signal acquisition module is used for acquiring four paths of sound source voice signals by adopting a quaternary microphone array; the quaternary microphone array comprises four microphones, and each microphone collects one path of sound source voice signals;
a framing module, configured to perform synchronous framing processing on four channels of the sound source voice signals to obtain a frame signal set, where each frame signal in the frame signal set includes four channels of frame signals, namely a first channel of frame signal, a second channel of frame signal, a third channel of frame signal, and a fourth channel of frame signal;
the effective frame signal subset acquisition module is used for judging the effectiveness of each frame signal in the frame signal set to obtain an effective frame signal subset;
the fusion quadratic correlation average generalized spectrum subtraction correction phase transformation function acquisition module is used for acquiring the fusion quadratic correlation average generalized spectrum subtraction correction phase transformation function of any two paths of effective frame signals according to the effective frame signal subsets;
the time delay value calculation module is used for acquiring the time point corresponding to the maximum peak value of the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two paths of effective frame signals, to obtain the time delay values of any two paths of microphone sound source signals;
and the direction position determining module is used for determining the direction position of the sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals.
Optionally, the framing module specifically includes:
a framing sub-module, configured to use a window function w(n) (the expression for w(n) appears only as an image in the original) to carry out synchronous windowing and framing processing on the four sound source voice signals to obtain frame signals x_ij(n), where n denotes the nth sampling point, n = 1, 2, …, N, and x_ij(n) denotes the jth path of the ith frame signal, j = 1, 2, 3, 4;
and the synthesis submodule is used for synthesizing all the frame signals into a frame signal set.
Optionally, the valid frame signal subset obtaining module specifically includes:
a short-time frame energy calculation submodule, configured to use the formula

    E_ij = Σ_{n=1}^{N} x_ij²(n)

to calculate the short-time frame energy of the jth path of frame signal of the ith frame signal; wherein E_ij represents the short-time frame energy of the jth path of frame signal of the ith frame signal, n denotes the nth sampling point, and n = 1, 2, …, N;
The first judgment submodule is used for judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is greater than a first preset threshold value or not to obtain a first judgment result;
a first judgment result processing submodule, configured to, if the first judgment result shows that the short-time frame energy is not greater than the first preset threshold, increase the value of i by 1 and call the short-time frame energy calculation submodule; and if the first judgment result shows that the short-time frame energy is greater than the first preset threshold, set the ith frame signal as a starting point and increase the value of i by 1;
a zero-crossing rate calculation submodule, configured to use the formula

    Z_ij = (1/2) Σ_{n=2}^{N} | sgn[x_ij(n)] − sgn[x_ij(n−1)] |

to calculate the zero crossing rate of the jth path of frame signal of the ith frame signal; wherein sgn[x] = 1 when x ≥ 0 and sgn[x] = −1 when x < 0;
the second judgment submodule is used for judging whether the zero crossing rate is greater than a second preset threshold value or not to obtain a second judgment result;
a second judgment result processing submodule, configured to, if the second judgment result indicates that the zero crossing rate is greater than the second preset threshold, set the mark T_ij of the jth path of frame signal of the ith frame signal to 1; and if the second judgment result indicates that the zero crossing rate is not greater than the second preset threshold, set the mark T_ij of the jth path of frame signal of the ith frame signal to 0;
a total state value SS(i) calculation submodule, configured to use the formula SS(i) = T_i1 && T_i2 && T_i3 && T_i4 to calculate the total state value SS(i) of the marks of the four paths of frame signals of the ith frame signal; wherein T_i1, T_i2, T_i3 and T_i4 respectively represent the marks of the 1st path, the 2nd path, the 3rd path and the 4th path of the ith frame signal;
a third judging submodule, configured to judge whether the total state value ss (i) is equal to 1, to obtain a third judgment result;
a third result processing sub-module, configured to set an ith signal frame as an effective signal frame if the third determination result indicates that ss (i) is equal to 1;
the fourth judgment submodule is used for judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is smaller than a third preset threshold value or not to obtain a fourth judgment result;
a fourth judgment result processing submodule, configured to, if the fourth judgment result indicates that the short-time frame energy of the jth path of frame signal of the ith frame signal is smaller than the third preset threshold, set the ith signal frame as the termination point of the voice signal to obtain the effective frame signal subset; and if the fourth judgment result indicates that the short-time frame energy of the jth path of frame signal of the ith frame signal is not less than the third preset threshold, increase the value of i by 1 and call the zero-crossing rate calculation submodule to calculate the zero crossing rate of the jth path of frame signal of the ith frame signal.
Optionally, the module for obtaining the fused quadratic correlation average generalized spectrum subtraction correction phase transformation function specifically includes:
the secondary correlation calculation submodule is used for calculating the secondary correlation of any two paths of frame signals of each effective frame signal according to the effective frame signal subset;
the power spectrum calculation submodule is used for calculating the power spectrum of each path of frame signal of each effective frame signal according to the effective frame signal subset;
the noise masking function obtaining submodule is used for obtaining the noise masking function of each path of frame signal of each effective frame signal according to the power spectrum of each path of frame signal:
    z_pq(ω) = X_pq(ω) − αN(ω),  if X_pq(ω) − αN(ω) > βN(ω)
    z_pq(ω) = βN(ω),            otherwise

wherein z_pq(ω) represents the noise masking function of the qth path of frame signal of the pth effective frame signal, X_pq(ω) represents the power spectrum of the qth path of frame signal of the pth effective frame signal, q = 1, 2, 3, 4, N(ω) is the noise power spectrum, α represents a first coefficient, and β represents a second coefficient;
the generalized spectrum subtraction correction phase transformation function fused secondary correlation obtaining sub-module is used for obtaining the generalized spectrum subtraction correction phase transformation function fused secondary correlation of any two paths of frame signals of each effective frame signal according to the noise masking function of each path of frame signals of each effective frame signal and the secondary correlation of any two paths of frame signals:
Φ_ls_p(ω) — the explicit expression appears only as an image in the original;

wherein Φ_ls_p(ω) represents the generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the lth path of frame signal and the sth path of frame signal of the pth effective frame signal, where l = 1, 2, 3, 4, s = 1, 2, 3, 4, l ≠ s; X_pl(ω) and X_ps(ω) respectively represent the power spectrum of the lth path of frame signal and the power spectrum of the sth path of frame signal of the pth effective frame signal, and ρ represents a third coefficient;
the quadratic correlation fused average generalized spectrum subtraction correction phase transformation function obtaining sub-module is used for fusing quadratic correlation generalized spectrum subtraction correction phase transformation functions according to any two paths of frame signals of each effective frame signal to obtain quadratic correlation fused average generalized spectrum subtraction correction phase transformation functions of any two paths of effective frame signals:
    Φ̄_ls(ω) = (1/P) Σ_{p=1}^{P} Φ_ls_p(ω)

wherein Φ̄_ls(ω) represents the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the lth path of effective frame signal and the sth path of effective frame signal, and P represents the number of effective frame signals in the effective frame signal subset.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a sound source positioning method and a sound source positioning system. The sound source positioning method firstly obtains color sound source voice signal windowing framing for a quaternary microphone array, then detects signal effective frame signals, and calculates and fuses quadratic correlation generalized spectrum subtraction correction phase transformation functions for the screened effective frame signals. In order to further improve the time delay precision, the average generalized spectrum fused with quadratic correlation is adopted to subtract the modified phase transformation function to calculate the time delay value. And finally, estimating the sound source direction according to the geometric position of the microphone array and the calculated time delay value, thereby improving the precision of sound source positioning.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
Fig. 1 is a flowchart of a sound source positioning method according to the present invention;
FIG. 2 is a model diagram of the quaternary microphone array provided by the present invention;
FIG. 3 is a graph comparing the accuracy of delay estimation of different algorithms at each frame in a −5 dB noise environment, provided by the present invention;
FIG. 4 is a graph comparing the accuracy of delay estimation of different algorithms at each frame in a 5 dB noise environment, provided by the present invention;
FIG. 5 is a graph comparing the accuracy of delay estimation of different algorithms at each frame in an environment with a reverberation time of 750 ms and 5 dB noise, provided by the present invention;
FIG. 6 is a diagram of an acquisition card according to the present invention;
FIG. 7 is a pictorial view of a microphone provided in accordance with the present invention;
fig. 8 is a physical diagram of a quaternary microphone array provided by the present invention;
fig. 9 is a block diagram of a sound source localization system according to the present invention.
Detailed Description
The invention aims to provide a sound source positioning method and a sound source positioning system so as to improve the accuracy of sound source positioning.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Example 1
The embodiment 1 of the invention provides a sound source positioning method.
As shown in fig. 1, the sound source localization method includes the steps of:
step 101, acquiring four sound source voice signals by adopting a quaternary microphone array; the quaternary microphone array comprises four microphones, and each microphone collects one path of sound source voice signals; 102, synchronously framing four sound source voice signals to obtain a frame signal set, wherein each frame signal in the signal frame set comprises four frame signals which are a first frame signal, a second frame signal, a third frame signal and a fourth frame signal respectively; 103, judging the validity of each frame signal in the frame signal set to obtain a valid frame signal subset; 104, acquiring an average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two paths of effective frame signals according to the effective frame signal subsets; step 105, acquiring a time point corresponding to a maximum peak value of any two paths of microphone sound source signals by fusing any two paths of effective frame signals with a quadratic correlation average generalized spectrum subtraction correction phase transformation function; and 106, determining the direction position of the sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals.
Example 2
Example 2 of the present invention provides a preferred embodiment of a sound source localization method, but the implementation of the present invention is not limited to the embodiment defined in example 2 of the present invention.
The quaternary microphone array in step 101 is shown in fig. 2, where the coordinates of the four microphones are m1(d,0,0), m2(0,d,0), m3(−d,0,0) and m4(0,−d,0), and d is the distance from a microphone element to the coordinate origin.
After four sound source voice signals are obtained, carrying out voice enhancement processing on each sound source voice signal to obtain a signal subjected to voice enhancement processing; performing band-pass filtering processing on the signal subjected to the voice enhancement processing to obtain a signal subjected to the band-pass filtering processing; and denoising the signal subjected to the band-pass filtering by using a wavelet threshold to obtain a preprocessed sound source voice signal.
Step 102, performing synchronous framing processing on the four sound source voice signals to obtain a frame signal set, specifically includes: using a window function w(n) (the expression for w(n) appears only as an image in the original), carrying out synchronous windowing and framing processing on the four sound source voice signals to obtain frame signals x_ij(n), where n denotes the nth sampling point, n = 1, 2, …, N, and x_ij(n) denotes the jth path of the ith frame signal, j = 1, 2, 3, 4; all the frame signals are combined into the frame signal set.
Step 103, judging the validity of each frame signal in the frame signal set to obtain a valid frame signal subset, specifically including: using formulas
Figure BDA0002031981220000102
Calculating the short time of the jth frame signal of the ith frame signalFrame energy; wherein E isijShort-time frame energy of a jth frame signal of an ith frame signal is represented, N represents an nth sampling point, and N is 1, 2. Judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is greater than a first preset threshold value or not to obtain a first judgment result; if the first judgment result shows that the short-time frame energy is not greater than the first preset threshold, increasing the value of i by 1, and returning to the step of utilizing the formula
Figure BDA0002031981220000103
Calculating short-time frame energy of a jth frame signal of the ith frame signal; if the first judgment result shows that the short-time array energy is larger than the first preset threshold, setting the ith frame signal as a starting point, and increasing the value of i by 1; using formulas
Figure BDA0002031981220000111
Calculating the zero crossing rate of the jth frame signal of the ith frame signal; wherein the content of the first and second substances,
Figure BDA0002031981220000112
judging whether the zero crossing rate is greater than a second preset threshold value or not to obtain a second judgment result; if the second judgment result shows that the zero crossing rate is greater than the second preset threshold, marking T of the jth path of frame signal of the ith frame signalijIs set to 1; if the obtained judgment result shows that the zero crossing rate is not greater than the second preset threshold value, marking T of the jth path of frame signal of the ith frame signalijSet to 0; using the formula SS (i) ═ Ti1&&Ti2&&Ti3&&Ti4Calculating the total state value SS (i) of the marks of the four paths of frame signals of the ith frame signal; wherein, Ti1、Ti2、Ti3And Ti4Marks respectively representing the 1 st path, the 2 nd path, the 3 rd path and the 4 th path of the ith frame signal; judging whether the total state value SS (i) is equal to 1 or not to obtain a third judgment result; if the third judgment result indicates that SS (i) is equal to 1, setting the ith signal frame as an effective signal frame; judging the jth frame of the ith frame signalWhether the short-time frame energy of the signal is smaller than a third preset threshold value or not is judged to obtain a fourth judgment result; if the fourth judgment result shows that the short-time frame energy of the jth frame signal of the ith frame signal is smaller than the third preset threshold, setting the ith signal frame as the termination point of the voice signal to obtain an effective frame signal subset; if the fourth judgment result shows that the short-time frame energy of the jth frame signal of the ith frame signal is not less than the third preset threshold, increasing the value of i by 1, and returning to the step of using the formula
ZCRij = (1/2) Σn=2..N |sgn(xij(n)) − sgn(xij(n−1))|
to calculate the zero crossing rate of the jth path frame signal of the ith frame signal.
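For illustration, the endpoint-detection step above (short-time frame energy, zero crossing rate, and the four-channel AND of marks SS(i)) can be sketched as follows. This is a minimal sketch, not the patent's implementation: the threshold values and all function names are hypothetical, and the sign convention for sgn is assumed.

```python
import numpy as np

def sgn(x):
    # sign convention assumed: +1 for x >= 0, -1 otherwise
    return np.where(np.asarray(x, float) >= 0, 1.0, -1.0)

def frame_energy(frame):
    # short-time frame energy: sum of squared samples
    return float(np.sum(np.asarray(frame, float) ** 2))

def zero_crossing_rate(frame):
    # ZCR = 0.5 * sum_n |sgn(x[n]) - sgn(x[n-1])|
    s = sgn(frame)
    return 0.5 * float(np.sum(np.abs(s[1:] - s[:-1])))

def total_state(frames4, zcr_thresh):
    # SS(i) = T_i1 && T_i2 && T_i3 && T_i4: the i-th frame is marked
    # effective only if all four channel frames pass the ZCR test
    marks = [1 if zero_crossing_rate(f) > zcr_thresh else 0 for f in frames4]
    return int(all(marks))
```

In use, a frame index i would first be accepted as a speech starting point by the energy test, then confirmed by `total_state` across the four channels.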
Step 104, obtaining an average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two effective frame signals according to the effective frame signal subset, specifically including: calculating the quadratic correlation of any two paths of frame signals of each effective frame signal according to the effective frame signal subset; calculating the power spectrum of each path of frame signal of each effective frame signal according to the effective frame signal subset; acquiring a noise masking function of each path of frame signal of each effective frame signal according to the power spectrum of each path of frame signal:
zpq(ω) = Xpq(ω) − α·N(ω), when Xpq(ω) − α·N(ω) > β·N(ω); zpq(ω) = β·N(ω), otherwise
wherein zpq(ω) represents the noise masking function of the qth path frame signal of the pth effective frame signal, Xpq(ω) represents the power spectrum of the qth path frame signal of the pth effective frame signal, q = 1,2,3,4, N(ω) represents the noise power spectrum, α represents a first coefficient, and β represents a second coefficient; acquiring the generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two paths of frame signals of each effective frame signal according to the noise masking function of each path of frame signal of each effective frame signal and the quadratic correlation of any two paths of frame signals:
Figure BDA0002031981220000121
wherein Φls_p(ω) represents the generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the lth path frame signal and the sth path frame signal of the pth effective frame signal, l = 1,2,3,4, s = 1,2,3,4, l ≠ s,
Figure BDA0002031981220000122
Xpl(ω) and Xps(ω) respectively represent the power spectrum of the lth path frame signal and the power spectrum of the sth path frame signal of the pth effective frame signal, and ρ represents a third coefficient; according to the generalized spectrum subtraction correction phase transformation functions fused with quadratic correlation of any two paths of frame signals of each effective frame signal, obtaining the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two paths of effective frame signals:
Φ̄ls(ω) = (1/P) Σp=1..P Φls_p(ω)
wherein Φ̄ls(ω) represents the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the lth path effective frame signal and the sth path effective frame signal, and P represents the number of effective frame signals in the effective frame signal subset.
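As a rough illustration of two ingredients of this step, the sketch below implements a Berouti-style generalized spectral subtraction mask and the averaging of per-frame weighting functions over the P effective frames. The exact masking function in the patent's figure images is not recoverable, so the piecewise form, the `alpha`/`beta` defaults, and all names here are assumptions, not the patent's definitions.

```python
import numpy as np

def noise_mask(X, N, alpha=4.0, beta=0.01):
    # assumed generalized spectral subtraction masking:
    # z(w) = X(w) - alpha*N(w) where the result stays above the
    # spectral floor beta*N(w); otherwise the floor beta*N(w) is used
    X, N = np.asarray(X, float), np.asarray(N, float)
    sub = X - alpha * N
    return np.where(sub > beta * N, sub, beta * N)

def average_weighting(phi_per_frame):
    # average the per-frame weighting functions over the P valid frames:
    # phi_bar(w) = (1/P) * sum_{p=1..P} phi_p(w)
    return np.mean(np.stack(phi_per_frame), axis=0)
```

The over-subtraction factor `alpha` trades residual noise against speech distortion, and the floor `beta` prevents negative power-spectrum bins; both would be tuned per environment.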
Step 105, determining the direction position of the sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, specifically comprising:
according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, a formula is utilized
θ = arctan((τ14 − τ12)/τ13)
to calculate the azimuth angle θ of the sound source relative to the origin of coordinates;
According to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, a formula is utilized
φ = arcsin((c/(2d))·√(τ13² + (τ14 − τ12)²))
to calculate the pitch angle φ of the sound source relative to the origin of coordinates;
wherein c is the sound velocity, d is the distance from each microphone element to the origin of coordinates, τ12 represents the time delay value of the 1st path and 2nd path microphone sound source signals, τ13 represents the time delay value of the 1st path and 3rd path microphone sound source signals, and τ14 represents the time delay value of the 1st path and 4th path microphone sound source signals. Specifically, according to the geometric position relationship of the quaternary microphone array (the coordinates of the quaternary array microphones being m1(d,0,0), m2(0,d,0), m3(−d,0,0), m4(0,−d,0)), the spherical coordinate relation x² + y² + z² = r², the formula for the distance between two points
rk = √((x − xk)² + (y − yk)² + (z − zk)²), k = 1,2,3,4,
and the velocity relation
r1 − rk = c·τ1k
are combined to solve for the azimuth angle
θ = arctan((τ14 − τ12)/τ13)
and the pitch angle
φ = arcsin((c/(2d))·√(τ13² + (τ14 − τ12)²)).
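Under a far-field assumption, the geometry above can be checked numerically. The sketch below is not the patent's formula but a self-consistent reconstruction: it assumes the TDOA sign convention τ1k = tk − t1 (flip signs for the opposite convention), the pitch angle measured from the z axis, and hypothetical function names.

```python
import numpy as np

def mics(d):
    # quaternary cross array: m1(d,0,0), m2(0,d,0), m3(-d,0,0), m4(0,-d,0)
    return {1: np.array([d, 0.0, 0.0]), 2: np.array([0.0, d, 0.0]),
            3: np.array([-d, 0.0, 0.0]), 4: np.array([0.0, -d, 0.0])}

def tdoa_from_direction(theta_deg, phi_deg, d, c=343.0):
    # synthetic far-field TDOAs tau_1k = t_k - t_1 for a source at
    # azimuth theta and pitch phi (measured from the z axis)
    th, ph = np.radians(theta_deg), np.radians(phi_deg)
    u = np.array([np.sin(ph) * np.cos(th), np.sin(ph) * np.sin(th), np.cos(ph)])
    m = mics(d)
    tau = lambda k: float(np.dot(m[1] - m[k], u)) / c
    return tau(2), tau(3), tau(4)

def doa_from_tdoa(tau12, tau13, tau14, d, c=343.0):
    # azimuth: tan(theta) = (tau14 - tau12) / tau13
    theta = np.degrees(np.arctan2(tau14 - tau12, tau13))
    # pitch: sin(phi) = (c / 2d) * sqrt(tau13^2 + (tau14 - tau12)^2)
    s = (c / (2.0 * d)) * np.hypot(tau13, tau14 - tau12)
    phi = np.degrees(np.arcsin(np.clip(s, -1.0, 1.0)))
    return theta, phi
```

Under this convention, a round trip such as `doa_from_tdoa(*tdoa_from_direction(45, 60, 0.25), 0.25)` recovers the original (45°, 60°).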
In order to illustrate the effect of the sound source localization method of the present invention, analog simulation comparisons were performed under different signal-to-noise ratios and reverberation environments. As can be seen from fig. 3 and 4, under a medium noise environment (SNR (signal-to-noise ratio) = 5dB), the accuracy of the time delay value estimated by the phase transformation (PHAT) algorithm is far inferior to that of the modified cross-power spectrum phase (MCPSP) algorithm and the proposed average generalized spectral subtraction modified phase transformation (APHAT) method, while APHAT is significantly better than the MCPSP method; under a strong noise environment (SNR = −5dB), the PHAT performance drops sharply, and only MCPSP and APHAT still maintain good performance. As can be seen from fig. 5, under environmental conditions where both strong reverberation and strong noise exist (T60 = 750ms and SNR = −5dB), the APHAT algorithm has better time delay accuracy than the PHAT and MCPSP algorithms. The above comparative analysis verifies that the APHAT algorithm has better robustness to noise and reverberation.
In order to further explain the effect of the sound source positioning method, a real-environment experiment platform was built. A multi-channel data acquisition card Q801 from Beijing Acoustic Science Technology Limited (SKC) was used to record the sound source signals, as shown in fig. 6; the array support and the MP40 microphones of the quaternary microphone array are also products of the SKC manufacturer, as shown in fig. 7 and 8.
The experiments were all completed in a 7.2m × 6m × 3.2m room with both doors and windows closed. Certain background noise and reverberation exist in the room, including the noise of a computer host fan, sound reflection from tables and chairs, and other man-made interference. The sound source is a female voice saying "I go to Beijing", a section of speech recorded in the actual environment. The sampling rate of the signal is 8kHz, the frame length is 256, the frame shift is 128, and a Hamming window is applied. The coordinates of the quaternary microphone array are respectively: m1(25cm,0,0), m2(0,25cm,0), m3(−25cm,0,0), m4(0,−25cm,0), and the microphone array is arranged at a height of 70cm above the ground. For the comparison of experimental data in an actual environment, the sound source positions in table 1 below were selected, and 10 groups of data were collected at each sound source position. The invention takes the PHAT and MCPSP algorithms as references and compares the performance of the APHAT algorithm with that of the other two algorithms through experimental analysis. Table 1 shows the comparison of the localization results of the improved algorithm APHAT, PHAT and MCPSP against the actual sound source positions in the actual environment; the localization error results are shown in table 2, and the root mean square errors of localization are shown in table 3: TABLE 1
Serial number S(x,y,z) (r,θ,φ) PHAT MCPSP APHAT
1 (1,1,0.76) (1.6,45°,61.6°) (45°,58.6°) (45°,58.6°) (45°,57.3°)
2 (2,1,0.76) (2.36,26.6°,71.2°) (26.6°,74.6°) (26.6°,74.6°) (26.6°,71.9°)
3 (2,2,0.76) (2.93,45°,75°) (45°,77.4°) (45°,77.4°) (45°,74.1°)
4 (-2,1,0.76) (2.36,-26.6°,71.2°) (-26.6°,74.6°) (-26.6°,74.6°) (-26.6°,71.9°)
5 (-2,2,0.76) (2.93,-45°,75°) (-45°,77.4°) (-45°,77.4°) (-45°,74.1°)
6 (1.2,0.6,0.76) (1.54,26.6°,60.4°) (37.9°,79.5°) (29.1°,62.6°) (29.1°,61.1°)
7 (-2.4,2.4,0.76) (3.48,-45°,77.4°) (-45°,77.4°) (-45°,77.4°) (-45°,74.1°)
8 (1.5,1.2,0.76) (2.07,38.7°,68.5°) (41.2°,66.5°) (37.9°,79.5°) (41.2°,64.6°)
9 (1.8,1.2,0.76) (2.29,33.7°,70.6°) (0,234°) (37.9°,79.5°) (37.9°,75.7°)
10 (2,1.2,0.76) (2.45,31°,71.9°) (-36.9°,59.6°) (30.1°,90.2°) (31°,82.4°)
11 (1.2,0,0.76) (1.42,0°,57.7°) (0,59.6°) (0,59.6°) (0,58.2°)
12 (0,1.8,0.76) (1.95,90°,67.1°) (0,234°) (-90°,71.6°) (90°,69.2°)
13 (1.2,2.4,0.76) (2.79,63.4°,74.2°) (63.4°,74.6°) (63.4°,74.6°) (63.4°,71.9°)
14 (0,1.2,0.76) (1.42,90°,57.7°) (-90°,59.6°) (-90°,59.6°) (90°,58.2°)
15 (-1.2,0,0.76) (1.42,180°,57.7°) (180,234°) (180,59.6°) (180°,58.7°)
16 (0,-1.2,0.76) (1.42,-90°,57.7°) (90°,59.6°) (90°,59.6°) (-90°,58.2°)
17 (-0.6,-1.2,0.76) (1.54,63.4°,60.4°) (60.9°,62.6°) (60.9°,62.6°) (60.9°,61.1°)
18 (0.6,-1.2,0.76) (1.54,-63.4°,60.4°) (-63.4°,74.6°) (-68.2°,68.3°) (-68.2°,66.3°)
TABLE 2
Figure BDA0002031981220000151
TABLE 3
PHAT MCPSP APHAT
Azimuth angle θ RMSE 59.4 68.6 1.6
Pitch angle φ RMSE 63.2 4.6 2.7
From the comparative analysis of the experimental results in tables 1 and 2, it can be seen that: in the actual environment, the performance of the PHAT algorithm in estimating the azimuth angle and the pitch angle is unstable, with large errors; the MCPSP algorithm estimates the azimuth with direction-reversed errors when the sound source lies on the X axis or Y axis of the coordinate system; the positioning performance of the APHAT algorithm is stable and its positioning precision is high. As can be seen from table 3, the root mean square error (RMSE) of the azimuth angle of the APHAT algorithm is 1.6 and the RMSE of its pitch angle is 2.7; the angular deviation error of the APHAT algorithm is basically within an acceptable range, and its accuracy is relatively high. This also verifies the effective performance of the algorithm proposed herein.
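The root-mean-square errors quoted in table 3 can be computed from the per-position estimates in the usual way. A small sketch (the function name is hypothetical, and the wrapping of angle differences into (−180°, 180°] is an added convenience, not stated in the patent):

```python
import numpy as np

def angle_rmse(estimates_deg, truth_deg):
    # RMSE of angle estimates in degrees, wrapping each difference
    # into (-180, 180] so that e.g. 359 deg vs 0 deg counts as -1 deg
    e = np.asarray(estimates_deg, float) - np.asarray(truth_deg, float)
    e = (e + 180.0) % 360.0 - 180.0
    return float(np.sqrt(np.mean(e ** 2)))
```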
Example 3
Embodiment 3 of the present invention provides a sound source localization system.
As shown in fig. 9, the present invention provides a sound source localization system including: a sound source voice signal acquisition module 901, configured to acquire four paths of sound source voice signals by using a quaternary microphone array, the quaternary microphone array comprising four microphones, each microphone collecting one path of sound source voice signals; a framing module 902, configured to perform synchronous framing processing on the four paths of sound source voice signals to obtain a frame signal set, where each frame signal in the frame signal set includes four paths of frame signals, namely a first path frame signal, a second path frame signal, a third path frame signal and a fourth path frame signal; an effective frame signal subset obtaining module 903, configured to determine the validity of each frame signal in the frame signal set to obtain an effective frame signal subset; an average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation obtaining module 904, configured to obtain the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two paths of effective frame signals according to the effective frame signal subset; a time delay value calculation module 905, configured to acquire the time point corresponding to the maximum peak value of the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two paths of effective frame signals, to obtain the time delay values of any two paths of microphone sound source signals; and a direction position determining module 906, configured to determine the direction position of the sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals.
Example 4
Example 4 of the present invention provides a preferred implementation of a sound source localization system.
The framing module 902 specifically includes: a framing sub-module for applying a window function
w(n) = 0.54 − 0.46·cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1
Carrying out synchronous windowing and framing processing on the four paths of sound source voice signals to obtain frame signals xij(n), wherein n denotes the nth sampling point, n = 1,2,...,N, and xij(n) denotes the jth path frame signal of the ith frame signal, j = 1,2,3,4; and a synthesis submodule for synthesizing all the frame signals into a frame signal set.
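A minimal sketch of one channel's windowed framing with the parameters quoted in the experiments (frame length 256, frame shift 128, Hamming window). The four-channel synchronization is elided and the function name is hypothetical:

```python
import numpy as np

def frame_signal(x, frame_len=256, frame_shift=128):
    # split one channel into overlapping frames and apply a Hamming window
    x = np.asarray(x, float)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(frame_len) / (frame_len - 1))
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        start = i * frame_shift
        frames[i] = x[start:start + frame_len] * w
    return frames
```

For the quaternary array, the same frame boundaries would be applied to all four channels so that frame i of each channel covers the same time span.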
The valid frame signal subset obtaining module 903 specifically includes: short time frame energy calculation submodule for utilizing formula
Eij = Σn=1..N xij²(n)
Calculating the short-time frame energy of the jth path frame signal of the ith frame signal; wherein Eij represents the short-time frame energy of the jth path frame signal of the ith frame signal, and n denotes the nth sampling point, n = 1,2,...,N. The first judgment submodule is used for judging whether the short-time frame energy of the jth path frame signal of the ith frame signal is greater than a first preset threshold value or not to obtain a first judgment result; a first judgment result processing submodule, configured to, if the first judgment result shows that the short-time frame energy is not greater than the first preset threshold, increase the value of i by 1, call the short-time frame energy calculation submodule, and execute the step of utilizing the formula
Eij = Σn=1..N xij²(n)
Calculating short-time frame energy of a jth frame signal of the ith frame signal; if the first judgment result shows that the short-time array energy is larger than the first preset threshold, setting the ith frame signal as a starting point, and increasing the value of i by 1; a zero-crossing rate calculation submodule for utilizing the formula
ZCRij = (1/2) Σn=2..N |sgn(xij(n)) − sgn(xij(n−1))|
Calculating the zero crossing rate of the jth path frame signal of the ith frame signal; wherein
sgn(x) = 1 when x ≥ 0; sgn(x) = −1 when x < 0
the second judgment submodule is used for judging whether the zero crossing rate is greater than a second preset threshold value or not to obtain a second judgment result; a second judgment result processing submodule, configured to, if the second judgment result indicates that the zero crossing rate is greater than the second preset threshold, set the mark Tij of the jth path frame signal of the ith frame signal to 1, and, if the second judgment result shows that the zero crossing rate is not greater than the second preset threshold, set the mark Tij of the jth path frame signal of the ith frame signal to 0;
a total state value SS(i) calculation submodule for using the formula SS(i) = Ti1 && Ti2 && Ti3 && Ti4 to calculate the total state value SS(i) of the marks of the four paths of frame signals of the ith frame signal; wherein Ti1, Ti2, Ti3 and Ti4 respectively represent the marks of the 1st, 2nd, 3rd and 4th path frame signals of the ith frame signal; a third judging submodule, configured to judge whether the total state value SS(i) is equal to 1, to obtain a third judgment result; a third result processing submodule, configured to set the ith signal frame as an effective signal frame if the third judgment result indicates that SS(i) is equal to 1; a fourth judgment submodule for judging whether the short-time frame energy of the jth path frame signal of the ith frame signal is smaller than a third preset threshold value or not to obtain a fourth judgment result; a fourth judgment result processing submodule, configured to set the ith signal frame as the termination point of the voice signal to obtain an effective frame signal subset if the fourth judgment result indicates that the short-time frame energy of the jth path frame signal of the ith frame signal is smaller than the third preset threshold, and, if the fourth judgment result shows that the short-time frame energy of the jth path frame signal of the ith frame signal is not less than the third preset threshold, to increase the value of i by 1, call the zero crossing rate calculation submodule, and execute the step of utilizing the formula
ZCRij = (1/2) Σn=2..N |sgn(xij(n)) − sgn(xij(n−1))|
"calculating the zero crossing rate of the jth path frame signal of the ith frame signal".
The module 904 for obtaining the mean generalized spectrum subtraction correction phase transformation function with fused quadratic correlations specifically includes: the secondary correlation calculation submodule is used for calculating the secondary correlation of any two paths of frame signals of each effective frame signal according to the effective frame signal subset; the power spectrum calculation submodule is used for calculating the power spectrum of each path of frame signal of each effective frame signal according to the effective signal subset; the noise masking function obtaining submodule is used for obtaining the noise masking function of each path of frame signal of each effective frame signal according to the power spectrum of each path of frame signal:
zpq(ω) = Xpq(ω) − α·N(ω), when Xpq(ω) − α·N(ω) > β·N(ω); zpq(ω) = β·N(ω), otherwise
wherein zpq(ω) represents the noise masking function of the qth path frame signal of the pth effective frame signal, Xpq(ω) represents the power spectrum of the qth path frame signal of the pth effective frame signal, q = 1,2,3,4, N(ω) represents the noise power spectrum, α represents a first coefficient, and β represents a second coefficient; the generalized spectrum subtraction correction phase transformation function fused with quadratic correlation obtaining submodule is used for obtaining the generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two paths of frame signals of each effective frame signal according to the noise masking function of each path of frame signal of each effective frame signal and the quadratic correlation of any two paths of frame signals:
Figure BDA0002031981220000182
wherein Φls_p(ω) represents the generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the lth path frame signal and the sth path frame signal of the pth effective frame signal, l = 1,2,3,4, s = 1,2,3,4, l ≠ s,
Figure BDA0002031981220000183
Xpl(ω) and Xps(ω) respectively represent the power spectrum of the lth path frame signal and the power spectrum of the sth path frame signal of the pth effective frame signal, and ρ represents a third coefficient; the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation obtaining submodule is used for obtaining the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two paths of effective frame signals according to the generalized spectrum subtraction correction phase transformation functions fused with quadratic correlation of any two paths of frame signals of each effective frame signal:
Φ̄ls(ω) = (1/P) Σp=1..P Φls_p(ω)
wherein Φ̄ls(ω) represents the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the lth path effective frame signal and the sth path effective frame signal, and P represents the number of effective frame signals in the effective frame signal subset.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a sound source positioning method and a sound source positioning system. The sound source positioning method first acquires four paths of sound source voice signals with a quaternary microphone array and performs windowed framing, then detects the effective frame signals, and calculates the generalized spectrum subtraction correction phase transformation function fused with quadratic correlation for the screened effective frame signals. In order to further improve the time delay precision, the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation is adopted to calculate the time delay values. Finally, the sound source direction is estimated according to the geometric position of the microphone array and the calculated time delay values, thereby improving the precision of sound source positioning.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principle and the implementation manner of the present invention are explained by applying specific examples, the above description of the embodiments is only used to help understanding the method of the present invention and the core idea thereof, the described embodiments are only a part of the embodiments of the present invention, not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts belong to the protection scope of the present invention.

Claims (8)

1. A sound source localization method, characterized by comprising the steps of:
acquiring four sound source voice signals by adopting a quaternary microphone array; the quaternary microphone array comprises four microphones, and each microphone collects one path of sound source voice signals;
performing synchronous framing processing on the four paths of sound source voice signals to obtain a frame signal set, wherein each frame signal in the frame signal set comprises four paths of frame signals, which are respectively a first path frame signal, a second path frame signal, a third path frame signal and a fourth path frame signal;
judging the validity of each frame signal in the frame signal set to obtain a valid frame signal subset;
acquiring an average generalized spectrum subtraction correction phase transformation function fused with secondary correlation of any two paths of effective frame signals according to the effective frame signal subset;
acquiring the time point corresponding to the maximum peak value of the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two paths of effective frame signals, to obtain the time delay values of any two paths of microphone sound source signals;
determining the direction position of a sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals;
the obtaining of the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two effective frame signals according to the effective frame signal subset specifically includes:
according to the effective frame signal subset, combining autocorrelation and cross correlation, and calculating the quadratic correlation of any two paths of frame signals of each effective frame signal;
calculating the power spectrum of each path of frame signal of each effective frame signal according to the effective frame signal subset;
acquiring a noise masking function of each path of frame signal of each effective frame signal according to the power spectrum of each path of frame signal:
zpq(ω) = Xpq(ω) − α·N(ω), when Xpq(ω) − α·N(ω) > β·N(ω); zpq(ω) = β·N(ω), otherwise
wherein zpq(ω) represents the noise masking function of the qth path frame signal of the pth effective frame signal, Xpq(ω) represents the power spectrum of the qth path frame signal of the pth effective frame signal, q = 1,2,3,4, N(ω) represents the noise power spectrum, α represents a first coefficient, and β represents a second coefficient;
acquiring generalized spectrum subtraction correction phase transformation function of any two paths of frame signals of each effective frame signal fused with quadratic correlation according to the noise masking function of each path of frame signals of each effective frame signal and the quadratic correlation of any two paths of frame signals:
Figure FDA0002769786880000021
wherein Φls_p(ω) represents the generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the lth path frame signal and the sth path frame signal of the pth effective frame signal, l = 1,2,3,4, s = 1,2,3,4, l ≠ s,
Figure FDA0002769786880000022
Figure FDA0002769786880000023
Xpl(ω) and Xps(ω) respectively represent the power spectrum of the lth path frame signal and the power spectrum of the sth path frame signal of the pth effective frame signal, and ρ represents a third coefficient;
according to the generalized spectrum subtraction correction phase transformation function of the fusion quadratic correlation of any two paths of frame signals of each effective frame signal, obtaining the average generalized spectrum subtraction correction phase transformation function of the fusion quadratic correlation of any two paths of effective frame signals:
Φ̄ls(ω) = (1/P) Σp=1..P Φls_p(ω)
wherein Φ̄ls(ω) represents the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the lth path effective frame signal and the sth path effective frame signal, and P represents the number of effective frame signals in the effective frame signal subset.
2. The method according to claim 1, wherein the step of synchronously framing the four sound source voice signals to obtain a frame signal set comprises:
using window functions
w(n) = 0.54 − 0.46·cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1
Carrying out synchronous windowing and framing processing on the four paths of sound source voice signals to obtain frame signals xij(n), wherein n denotes the nth sampling point, n = 1,2,...,N, and xij(n) denotes the jth path frame signal of the ith frame signal, j = 1,2,3,4;
all the frame signals are combined into a set of frame signals.
3. The method according to claim 1, wherein the determining validity of each frame signal in the frame signal set to obtain a valid frame signal subset comprises:
using formulas
Eij = Σn=1..N xij²(n)
Calculating the short-time frame energy of the jth path frame signal of the ith frame signal; wherein Eij represents the short-time frame energy of the jth path frame signal of the ith frame signal, and n denotes the nth sampling point, n = 1,2,...,N;
Judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is greater than a first preset threshold value or not to obtain a first judgment result;
if the first judgment result shows that the short-time frame energy is not greater than the first preset threshold, increasing the value of i by 1, and returning to the step of utilizing the formula
Eij = Σn=1..N xij²(n)
Calculating short-time frame energy of a jth frame signal of the ith frame signal;
if the first judgment result shows that the short-time frame energy is larger than the first preset threshold, setting the ith frame signal as a starting point, and increasing the value of i by 1;
using formulas
ZCRij = (1/2) Σn=2..N |sgn(xij(n)) − sgn(xij(n−1))|
Calculating the zero crossing rate of the jth path frame signal of the ith frame signal; wherein
sgn(x) = 1 when x ≥ 0; sgn(x) = −1 when x < 0
judging whether the zero crossing rate is greater than a second preset threshold value or not to obtain a second judgment result;
if the second judgment result shows that the zero crossing rate is greater than the second preset threshold, setting the mark Tij of the jth path frame signal of the ith frame signal to 1;
if the second judgment result shows that the zero crossing rate is not greater than the second preset threshold, setting the mark Tij of the jth path frame signal of the ith frame signal to 0;
using the formula SS(i) = Ti1 && Ti2 && Ti3 && Ti4 to calculate the total state value SS(i) of the marks of the four paths of frame signals of the ith frame signal; wherein Ti1, Ti2, Ti3 and Ti4 respectively represent the marks of the 1st, 2nd, 3rd and 4th path frame signals of the ith frame signal;
judging whether the total state value SS (i) is equal to 1 or not to obtain a third judgment result;
if the third judgment result indicates that SS (i) is equal to 1, setting the ith signal frame as an effective signal frame;
judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is smaller than a third preset threshold value or not to obtain a fourth judgment result;
if the fourth judgment result shows that the short-time frame energy of the jth frame signal of the ith frame signal is smaller than the third preset threshold, setting the ith signal frame as the termination point of the voice signal to obtain an effective frame signal subset;
if the fourth judgment result shows that the short-time frame energy of the jth frame signal of the ith frame signal is not less than the third preset threshold, increasing the value of i by 1, and returning to the step of using the formula
ZCRij = (1/2) Σn=2..N |sgn(xij(n)) − sgn(xij(n−1))|
to calculate the zero crossing rate of the jth path frame signal of the ith frame signal.
4. The sound source localization method according to claim 1, wherein the determining a directional position of a sound source according to the geometric position of the quaternary microphone array and the time delay values of any two microphone sound source signals specifically comprises:
according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, a formula is utilized
θ = arctan((τ14 − τ12)/τ13)
to calculate the azimuth angle θ of the sound source relative to the origin of coordinates;
According to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals, a formula is utilized
φ = arcsin((c/(2d))·√(τ13² + (τ14 − τ12)²))
to calculate the pitch angle φ of the sound source relative to the origin of coordinates;
wherein c is the sound velocity, d is the distance from each microphone element to the origin of coordinates, τ12 represents the time delay value of the 1st path and 2nd path microphone sound source signals, τ13 represents the time delay value of the 1st path and 3rd path microphone sound source signals, and τ14 represents the time delay value of the 1st path and 4th path microphone sound source signals.
5. The sound source localization method according to claim 1, wherein the synchronous framing of the four sound source voice signals to obtain a frame signal set further comprises:
carrying out voice enhancement processing on each path of sound source voice signal to obtain a signal subjected to voice enhancement processing;
performing band-pass filtering processing on the signal subjected to the voice enhancement processing to obtain a signal subjected to the band-pass filtering processing;
and denoising the signal subjected to the band-pass filtering by using a wavelet threshold to obtain a preprocessed sound source voice signal.
6. A sound source localization system, comprising:
the sound source voice signal acquisition module is used for acquiring four paths of sound source voice signals by adopting a quaternary microphone array; the quaternary microphone array comprises four microphones, and each microphone collects one path of sound source voice signals;
a framing module, configured to perform synchronous framing processing on the four channels of sound source voice signals to obtain a frame signal set, wherein each frame signal in the frame signal set includes four channels of frame signals, namely a first channel frame signal, a second channel frame signal, a third channel frame signal, and a fourth channel frame signal;
the effective frame signal subset acquisition module is used for judging the effectiveness of each frame signal in the frame signal set to obtain an effective frame signal subset;
the fusion quadratic correlation average generalized spectrum subtraction correction phase transformation function acquisition module is used for acquiring the fusion quadratic correlation average generalized spectrum subtraction correction phase transformation function of any two paths of effective frame signals according to the effective frame signal subsets;
the time delay value calculation module is used for acquiring the time point corresponding to the maximum peak value of the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of any two channels of effective frame signals, to obtain the time delay values of any two channels of microphone sound source signals;
the direction position determining module is used for determining the direction position of a sound source according to the geometric position of the quaternary microphone array and the time delay values of any two paths of microphone sound source signals;
the module for obtaining the average generalized spectrum subtraction correction phase transformation function fused with the quadratic correlation specifically comprises:
the secondary correlation calculation submodule is used for combining the autocorrelation and the cross correlation according to the effective frame signal subset and calculating the secondary correlation of any two paths of frame signals of each effective frame signal;
the power spectrum calculation submodule is used for calculating the power spectrum of each channel frame signal of each effective frame signal according to the effective frame signal subset;
the noise masking function obtaining submodule is used for obtaining the noise masking function of each path of frame signal of each effective frame signal according to the power spectrum of each path of frame signal:
Figure FDA0002769786880000061
wherein zpq(ω) denotes the noise masking function of the q-th channel frame signal of the p-th effective frame signal, Xpq(ω) denotes the power spectrum of the q-th channel frame signal of the p-th effective frame signal, q = 1, 2, 3, 4, N(ω) denotes the noise power spectrum, α denotes a first coefficient, and β denotes a second coefficient;
the generalized spectrum subtraction correction phase transformation function fused secondary correlation obtaining sub-module is used for obtaining the generalized spectrum subtraction correction phase transformation function fused secondary correlation of any two paths of frame signals of each effective frame signal according to the noise masking function of each path of frame signals of each effective frame signal and the secondary correlation of any two paths of frame signals:
Figure FDA0002769786880000062
wherein Φls_p(ω) denotes the generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the l-th channel frame signal and the s-th channel frame signal of the p-th effective frame signal, where l = 1, 2, 3, 4, s = 1, 2, 3, 4, and l ≠ s,
Figure FDA0002769786880000063
Figure FDA0002769786880000064
Xpl(ω) and Xps(ω) respectively denote the power spectrum of the l-th channel frame signal and the power spectrum of the s-th channel frame signal of the p-th effective frame signal, and ρ denotes a third coefficient;
the quadratic correlation fused average generalized spectrum subtraction correction phase transformation function obtaining sub-module is used for fusing quadratic correlation generalized spectrum subtraction correction phase transformation functions according to any two paths of frame signals of each effective frame signal to obtain quadratic correlation fused average generalized spectrum subtraction correction phase transformation functions of any two paths of effective frame signals:
Figure FDA0002769786880000065
wherein
Figure FDA0002769786880000066
denotes the average generalized spectrum subtraction correction phase transformation function fused with quadratic correlation of the l-th channel effective frame signal and the s-th channel effective frame signal, and P denotes the number of effective frame signals in the effective frame signal subset.
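The patent's delay estimator augments generalized cross-correlation with a quadratic correlation and a generalized spectral subtraction noise mask (the zpq, α, β, ρ terms above), whose exact weightings appear only in the image formulas. The sketch below therefore shows only the plain GCC-PHAT backbone that this method builds on: cross-power spectrum, phase-transform weighting, inverse transform, peak pick. Function names are assumptions:

```python
import numpy as np

def gcc_phat(x1, x2, fs):
    """Estimate the delay of x2 relative to x1 (in seconds) via GCC-PHAT."""
    n = len(x1) + len(x2)                 # zero-pad to avoid wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    G = X2 * np.conj(X1)                  # cross-power spectrum
    G /= np.maximum(np.abs(G), 1e-12)     # PHAT weighting: keep phase only
    cc = np.fft.irfft(G, n=n)
    cc = np.roll(cc, n // 2)              # move zero lag to the center
    lag = int(np.argmax(np.abs(cc))) - n // 2
    return lag / fs                       # positive: x2 lags x1
```

The patent's fused method replaces the pure 1/|X2·X1*| PHAT weight with a mask built from the quadratic correlation and the spectral-subtraction noise estimate, which sharpens the peak under noise; the peak-picking step is the same.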
7. The sound source localization system according to claim 6, wherein the framing module specifically comprises:
a framing sub-module for applying a window function
Figure FDA0002769786880000071
carrying out synchronous windowing and framing processing on the four channels of sound source voice signals to obtain frame signals xij(n), wherein n denotes the n-th sampling point, n = 1, 2, …, N; xij(n) denotes the j-th channel frame signal of the i-th frame signal, j = 1, 2, 3, 4;
and the synthesis submodule is used for synthesizing all the frame signals into a frame signal set.
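The claim's window function survives only as an image; a common choice for this kind of pipeline is a Hamming window. Below is a hedged sketch of windowed framing for one channel, with frame length and hop as assumed parameters:

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split one channel into overlapping Hamming-windowed frames xij(n).
    Returns an array of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    w = np.hamming(frame_len)
    return np.stack([x[i * hop : i * hop + frame_len] * w
                     for i in range(n_frames)])
```

Applying this with identical parameters to all four channels keeps the framing synchronous, so frame i of every channel covers the same time span, as the claim requires.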
8. The sound source localization system according to claim 6, wherein the valid frame signal subset acquisition module specifically comprises:
a short-time frame energy calculation submodule, configured to use the formula
Figure FDA0002769786880000072
calculating the short-time frame energy of the j-th channel frame signal of the i-th frame signal; wherein Eij denotes the short-time frame energy of the j-th channel frame signal of the i-th frame signal, n denotes the n-th sampling point, and n = 1, 2, …, N;
The first judgment submodule is used for judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is greater than a first preset threshold value or not to obtain a first judgment result;
a first judgment result processing submodule, configured to: if the first judgment result indicates that the short-time frame energy is not greater than the first preset threshold, increase the value of i by 1, call the short-time frame energy calculation submodule, and re-execute the step of calculating the short-time frame energy of the j-th channel frame signal of the i-th frame signal; if the first judgment result indicates that the short-time frame energy is greater than the first preset threshold, set the i-th frame signal as the starting point and increase the value of i by 1;
a zero-crossing rate calculation submodule for utilizing the formula
Figure FDA0002769786880000074
calculating the zero-crossing rate of the j-th channel frame signal of the i-th frame signal, wherein
Figure FDA0002769786880000075
the second judgment submodule is used for judging whether the zero crossing rate is greater than a second preset threshold value or not to obtain a second judgment result;
a second judgment result processing submodule, configured to set the flag Tij of the j-th channel frame signal of the i-th frame signal to 1 if the second judgment result indicates that the zero-crossing rate is greater than the second preset threshold, and to set Tij to 0 if the second judgment result indicates that the zero-crossing rate is not greater than the second preset threshold;
a total state value SS(i) calculation submodule, configured to calculate the total state value SS(i) of the flags of the four channels of frame signals of the i-th frame signal using the formula SS(i) = Ti1 && Ti2 && Ti3 && Ti4, wherein Ti1, Ti2, Ti3 and Ti4 respectively denote the flags of the 1st, 2nd, 3rd and 4th channel frame signals of the i-th frame signal;
a third judging submodule, configured to judge whether the total state value ss (i) is equal to 1, to obtain a third judgment result;
a third result processing sub-module, configured to set an ith signal frame as an effective signal frame if the third determination result indicates that ss (i) is equal to 1;
the fourth judgment submodule is used for judging whether the short-time frame energy of the jth path of frame signal of the ith frame signal is smaller than a third preset threshold value or not to obtain a fourth judgment result;
a fourth judgment result processing submodule, configured to: if the fourth judgment result indicates that the short-time frame energy of the j-th channel frame signal of the i-th frame signal is smaller than the third preset threshold, set the i-th frame signal as the termination point of the voice signal to obtain the effective frame signal subset; if the fourth judgment result indicates that the short-time frame energy of the j-th channel frame signal of the i-th frame signal is not smaller than the third preset threshold, increase the value of i by 1, call the zero-crossing rate calculation submodule, and re-execute the step of calculating the zero-crossing rate of the j-th channel frame signal of the i-th frame signal.
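Claim 8 describes a double-threshold endpoint detector: short-time frame energy gates the start and termination points, the zero-crossing rate sets per-channel flags Tij, and SS(i) ANDs the four flags. The sketch below computes the two features and the AND step; it collapses the claim's start/termination state machine into a single per-frame check, and the thresholds are assumed tuning parameters:

```python
import numpy as np

def short_time_energy(frame):
    """Eij: sum of squared samples over the frame."""
    return float(np.sum(frame ** 2))

def zero_crossing_rate(frame):
    """Number of sign changes between consecutive samples (sgn(0) = +1)."""
    s = np.where(frame >= 0, 1, -1)
    return int(np.sum(np.abs(np.diff(s)) // 2))

def frame_is_valid(frames4, e_thr, z_thr):
    """SS(i) = Ti1 && Ti2 && Ti3 && Ti4 across the four channel frames.
    Simplified: each flag requires both features above their thresholds."""
    return all(short_time_energy(f) > e_thr and zero_crossing_rate(f) > z_thr
               for f in frames4)
```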
CN201910312565.6A 2019-04-18 2019-04-18 Sound source positioning method and system Active CN110007276B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910312565.6A CN110007276B (en) 2019-04-18 2019-04-18 Sound source positioning method and system


Publications (2)

Publication Number Publication Date
CN110007276A CN110007276A (en) 2019-07-12
CN110007276B true CN110007276B (en) 2021-01-12

Family

ID=67172766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910312565.6A Active CN110007276B (en) 2019-04-18 2019-04-18 Sound source positioning method and system

Country Status (1)

Country Link
CN (1) CN110007276B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110706717B (en) * 2019-09-06 2021-11-09 西安合谱声学科技有限公司 Microphone array panel-based human voice detection orientation method
CN110703198B (en) * 2019-10-22 2022-03-22 哈尔滨工程大学 Quaternary cross array envelope spectrum estimation method based on frequency selection
CN112924937B (en) * 2021-01-25 2024-06-04 桂林电子科技大学 Positioning device and method for two-dimensional plane bursty sound source

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102110441A (en) * 2010-12-22 2011-06-29 中国科学院声学研究所 Method for generating sound masking signal based on time reversal
CN102707262A (en) * 2012-06-20 2012-10-03 太仓博天网络科技有限公司 Sound localization system based on microphone array
CN103235287A (en) * 2013-04-17 2013-08-07 华北电力大学(保定) Sound source localization camera shooting tracking device
KR20130114437A (en) * 2012-04-09 2013-10-17 주식회사 센서웨이 The time delay estimation method based on cross-correlation and apparatus thereof
CN103607361A (en) * 2013-06-05 2014-02-26 西安电子科技大学 Time frequency overlap signal parameter estimation method under Alpha stable distribution noise
EP2543037B1 (en) * 2010-03-29 2014-03-05 Fraunhofer Gesellschaft zur Förderung der angewandten Wissenschaft E.V. A spatial audio processor and a method for providing spatial parameters based on an acoustic input signal
CN107102296A (en) * 2017-04-27 2017-08-29 大连理工大学 A kind of sonic location system based on distributed microphone array
CN108198568A (en) * 2017-12-26 2018-06-22 太原理工大学 A kind of method and system of more auditory localizations
US20180359563A1 (en) * 2017-06-12 2018-12-13 Ryo Tanaka Method for accurately calculating the direction of arrival of sound at a microphone array

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101901602B (en) * 2010-07-09 2012-09-05 中国科学院声学研究所 Method for reducing noise by using hearing threshold of impaired hearing
US9081083B1 (en) * 2011-06-27 2015-07-14 Amazon Technologies, Inc. Estimation of time delay of arrival
FR2992765A1 (en) * 2012-06-27 2014-01-03 France Telecom LOW COMPLEXITY COUPLING ESTIMATION
CN104076331B (en) * 2014-06-18 2016-04-13 南京信息工程大学 A kind of sound localization method of seven yuan of microphone arrays
CN104991573A (en) * 2015-06-25 2015-10-21 北京品创汇通科技有限公司 Locating and tracking method and apparatus based on sound source array
CN106098077B (en) * 2016-07-28 2023-05-05 浙江诺尔康神经电子科技股份有限公司 Artificial cochlea speech processing system and method with noise reduction function
CN106226739A (en) * 2016-07-29 2016-12-14 太原理工大学 Merge the double sound source localization method of Substrip analysis
US20180074163A1 (en) * 2016-09-08 2018-03-15 Nanjing Avatarmind Robot Technology Co., Ltd. Method and system for positioning sound source by robot
CN107644650B (en) * 2017-09-29 2020-06-05 山东大学 Improved sound source positioning method based on progressive serial orthogonalization blind source separation algorithm and implementation system thereof
CN108333575B (en) * 2018-02-02 2020-10-20 浙江大学 Gaussian prior and interval constraint based time delay filtering method for mobile sound source



Similar Documents

Publication Publication Date Title
CN107102296B (en) Sound source positioning system based on distributed microphone array
CN110007276B (en) Sound source positioning method and system
WO2020042708A1 (en) Time-frequency masking and deep neural network-based sound source direction estimation method
CN104076331B (en) A kind of sound localization method of seven yuan of microphone arrays
CN105068048B (en) Distributed microphone array sound localization method based on spatial sparsity
CN109490822B (en) Voice DOA estimation method based on ResNet
CN104142492B (en) A kind of SRP PHAT multi-source space-location methods
CN110515038B (en) Self-adaptive passive positioning device based on unmanned aerial vehicle-array and implementation method
CN105388459B (en) The robust sound source space-location method of distributed microphone array network
CN110488223A (en) A kind of sound localization method
CN111474521B (en) Sound source positioning method based on microphone array in multipath environment
Ajdler et al. Acoustic source localization in distributed sensor networks
CN107219512B (en) Sound source positioning method based on sound transfer function
CN105204001A (en) Sound source positioning method and system
Pang et al. Multitask learning of time-frequency CNN for sound source localization
CN111239687A (en) Sound source positioning method and system based on deep neural network
CN111798869B (en) Sound source positioning method based on double microphone arrays
CN111273215B (en) Channel inconsistency error correction direction finding method of channel state information
CN110534126A (en) A kind of auditory localization and sound enhancement method and system based on fixed beam formation
CN109188362A (en) A kind of microphone array auditory localization signal processing method
CN107167770A (en) A kind of microphone array sound source locating device under the conditions of reverberation
CN111965596A (en) Low-complexity single-anchor node positioning method and device based on joint parameter estimation
Chen et al. A supplement to multidimensional scaling framework for mobile location: A unified view
CN115267671A (en) Distributed voice interaction terminal equipment and sound source positioning method and device thereof
Dang et al. A feature-based data association method for multiple acoustic source localization in a distributed microphone array

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant