CN111239686A - Dual-channel sound source positioning method based on deep learning - Google Patents
Dual-channel sound source positioning method based on deep learning
- Publication number
- CN111239686A
- Authority
- CN
- China
- Prior art keywords
- channel
- direction information
- time
- frequency domain
- phase
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/20—Position of source determined by a plurality of spaced direction-finders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Remote Sensing (AREA)
- Radar, Positioning & Navigation (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a dual-channel sound source positioning method based on deep learning. The method comprises: framing, windowing and Fourier-transforming the microphone pickup data of the left and right channels to obtain the time-frequency domain pickup signals of the first and second channels; using deep learning to estimate the phase sensitive masking, whose theoretical value is computed from each time-frequency domain pickup signal and its corresponding time-frequency domain direct sound signal; using the phase sensitive masking to guide the estimation of the sound source direction information and to calculate the accuracy of that estimate; using deep learning to obtain enhanced direction information from the estimated direction information and its estimation accuracy; constructing a weighted histogram from the enhanced direction information and the estimation accuracy; and finally selecting the direction corresponding to the peak of the histogram as the sound source direction. The method estimates the direction of the sound source from data picked up by a dual-channel microphone, makes full use of the generalization capability of the neural network, and is more robust to noisy, reverberant environments.
Description
Technical Field
The invention relates to the technical field of sound source positioning, in particular to a dual-channel sound source positioning method based on deep learning.
Background
Currently, the sound source localization technology mainly estimates the azimuth of a sound source from data containing background noise and reverberation picked up by a microphone array, so as to obtain better performance in the aspects of sound source separation, sound source tracking and the like. In the sound source localization technique using azimuth as output, the azimuth of the sound source can be estimated by using the orthogonality of the signal space and the noise space, but the performance of such algorithms is obviously reduced when reverberation exists. By utilizing deep learning, the robustness of the algorithm in the presence of noise and reverberation can be better improved. Most sound source localization algorithms based on deep learning regard sound source localization as a classification problem, and utilize neural networks to estimate the location of sound sources from partitioned areas. The positioning accuracy of the algorithm is related to region division, and when the requirement of positioning accuracy changes, the neural network needs to be retrained.
Disclosure of Invention
The invention aims to solve the defects of the existing sound source positioning technology.
In order to achieve the purpose, the invention discloses a dual-channel sound source positioning method based on deep learning, which comprises the following steps:
respectively performing framing, windowing and Fourier transform on the microphone picked data of each channel to obtain a time-frequency domain picked signal of each channel; the double-channel time-frequency domain signal comprises the information of the position of the sound source;
combining the logarithmic power spectrum of the time-frequency domain pickup signal of the first channel and the phase difference between the channels to obtain the input characteristic of the first channel; combining the logarithmic power spectrum of the time-frequency domain pickup signal of the second channel and the phase difference between the channels to obtain the input characteristic of the second channel;
calculating a theoretical phase sensitive masking value of the first channel by using the time-frequency domain pickup signal of the first channel and the time-frequency domain direct sound signal corresponding to it; calculating a theoretical phase sensitive masking value of the second channel by using the time-frequency domain pickup signal of the second channel and the time-frequency domain direct sound signal corresponding to it;
training a neural network by using the input characteristics of each channel and the corresponding theoretical phase sensitivity masking to obtain an estimation model of the phase sensitivity masking;
taking the input characteristics of the first channel as the input of the estimation model, and outputting the phase sensitive masking estimation value of the first channel; taking the input characteristics of the second channel as the input of the estimation model, and outputting the phase sensitive masking estimation value of the second channel;
calculating a voice covariance matrix by using the picked-up signal of each channel time-frequency domain and the phase sensitivity masking estimation value of each channel time-frequency domain;
carrying out eigenvalue decomposition on the voice covariance matrix to obtain its principal eigenvector as the steering vector of the sound source;
taking the phase angle difference of the two elements of the steering vector as direction information;
calculating the estimation accuracy of the direction information of each time frequency point by using the two-channel phase sensitive masking estimation value;
calculating an ideal phase difference of data picked up by two microphones by using the time difference of sound sources reaching the microphones as target direction information;
training a neural network by using the direction information, the direction information estimation accuracy and the target direction information to obtain a direction information enhancement model;
taking the direction information and the direction information estimation accuracy as the input of the direction information enhancement model, and outputting the enhanced direction information;
calculating a sound source direction at each time-frequency point using the enhanced direction information;
constructing a weighted statistical histogram from the direction information and its estimation accuracy at all time-frequency points; and
selecting the direction with the largest statistical result in the weighted histogram as the sound source direction.
Preferably, the specific steps of framing, windowing and fourier transforming the microphone picked data of each channel respectively are as follows:
taking 512 sampling points for each channel as one frame, and padding the frame to 512 points if it is shorter; then windowing each frame with a Blackman window; and finally performing a Fourier transform on each frame.
Preferably, the per-channel input characteristics are:
where n is the index of the data frame, m is the index of the channel, and the two feature components are the log-amplitude spectrum of the time-frequency domain signal of the mth channel and the phase difference of the time-frequency domain signal of the mth channel.
Preferably, the per-channel phase sensitive mask is:
where f is the index of the frequency band, theta is the phase of the time-frequency domain signal of the data picked up by the microphone, X is the time-frequency domain signal of the microphone pickup data, and the remaining two terms are the time-frequency domain signal of the direct sound and its phase.
Preferably, the step of training the neural network by using the input features of each channel and the theoretical phase sensitivity masking corresponding thereto to obtain the estimation model of the phase sensitivity masking includes:
the neural network is a three-layer long short-term memory (LSTM) network, and each layer has 512 nodes. The theoretical value of the phase sensitive masking is taken as the training target of the neural network, and the mean square error between the estimated and theoretical phase sensitive masking values is reduced iteratively.
Preferably, the speech covariance matrix is:
Preferably, eigenvalue decomposition is performed on the speech covariance matrix, and its principal eigenvector is taken as the steering vector of the sound source:
Preferably, the direction information is:
Preferably, the accuracy of the direction information estimation is:
Preferably, the ideal phase difference is:
where the two time terms are the times taken for the sound to travel from the sound source to the 1st and 2nd microphones, and $f_s$ is the sampling rate of the pickup signal.
Preferably, the direction information estimation accuracy and the target direction information are used for training the neural network to obtain a direction information enhancement model, specifically:
the neural network is a fully-connected neural network with three layers, and each layer is provided with 2048 nodes. The input characteristics of the neural network are a splicing vector of sine value and cosine value of the direction information and estimation accuracy of the direction information, and specifically the method comprises the following steps:
$I_n = [\sin\theta_{n,0}, \ldots, \sin\theta_{n,F-1}, \cos\theta_{n,0}, \ldots, \cos\theta_{n,F-1}, W_{n,0}, \ldots, W_{n,F-1}]$
the estimation target of the neural network is target direction information, specifically:
the mean square error of the enhanced direction information and the target direction information is continuously reduced through iteration.
Preferably, the enhanced direction information is:
where the enhanced direction information is computed from the output value of the enhancement model.
Preferably, the sound source direction calculated at each time-frequency point is:
where c is the speed of sound and d is the microphone spacing.
Preferably, when the weighted histogram is constructed, the weight of each time-frequency point is $W_{n,f}$.
The invention has the following advantages: 1) the phase sensitive masking is estimated from both spatial and spectral information, which yields a more accurate estimate of the direction information; 2) the neural network enhances the estimated direction information, which improves the performance of the positioning method in noisy, reverberant environments; 3) estimating the final sound source direction with a weighted histogram reduces the influence of silent segments on positioning accuracy. By including sufficient noise types and source directions in the training data, the generalization capability of the deep neural network can be fully exploited and the robustness of the model improved, so that sound sources can be positioned in noisy, reverberant environments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a dual-channel sound source localization method based on deep learning.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a dual-channel sound source localization method based on deep learning. As shown in fig. 1, the method includes:
Step S101: Framing, windowing and Fourier transformation are performed separately on the microphone pickup data of the left channel and the right channel to obtain a time-frequency domain pickup signal for each channel. The dual-channel time-frequency domain signal contains information about the position of the sound source.
In one embodiment, 512 sampling points are taken as one frame for each channel, and the frame is padded to 512 points if it is shorter; each frame is then windowed with a Blackman window; finally, a Fourier transform is performed on each frame to obtain the time-frequency domain pickup signal of each channel.
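The framing, windowing and transform of step S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `stft_frames` and the `hop` parameter are hypothetical, the text does not state a frame shift (non-overlapping frames are assumed), and padding with zeros is an assumption.

```python
import numpy as np

FRAME_LEN = 512  # samples per frame, as stated in the text


def stft_frames(x, frame_len=FRAME_LEN, hop=None):
    """Split a 1-D signal into frames, apply a Blackman window, FFT each frame.

    `hop` is an assumption: the text gives no frame shift, so frames do not
    overlap by default.
    """
    x = np.asarray(x, dtype=float)
    hop = frame_len if hop is None else hop
    n_frames = max(1, int(np.ceil((len(x) - frame_len) / hop)) + 1)
    # Pad with zeros so that the last frame is complete (zero fill is assumed).
    padded_len = (n_frames - 1) * hop + frame_len
    x = np.pad(x, (0, padded_len - len(x)))
    window = np.blackman(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # One-sided spectrum: shape (n_frames, frame_len // 2 + 1).
    return np.fft.rfft(frames * window, axis=1)
```

Calling `stft_frames` once per channel gives the two time-frequency domain pickup signals used by the later steps.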
Step S102: combining the logarithmic power spectrum of the time-frequency domain pickup signal of the left channel and the phase difference between the channels to obtain the input characteristic of the first channel; and combining the logarithmic power spectrum of the time-frequency domain pickup signal of the right channel and the phase difference between the channels to obtain the input characteristic of the second channel.
Specifically, in the per-channel input characteristics, n is the index of the data frame, m is the index of the channel, and the two feature components are the log-amplitude spectrum of the time-frequency domain signal of the mth channel and the phase difference of the time-frequency domain signal of the mth channel.
Step S103: A theoretical phase sensitive masking value of the first channel is calculated from the time-frequency domain pickup signal of the first channel and the time-frequency domain direct sound signal corresponding to it; a theoretical phase sensitive masking value of the second channel is calculated from the time-frequency domain pickup signal of the second channel and the time-frequency domain direct sound signal corresponding to it.
Specifically, in the per-channel phase sensitive mask, f is the index of the frequency band, theta is the phase of the time-frequency domain signal of the data picked up by the microphone, X is the time-frequency domain signal of the microphone pickup data, and the remaining two terms are the time-frequency domain signal of the direct sound and its phase.
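One way to assemble the input characteristics of step S102 is sketched below. The function name `input_features` is hypothetical. The text mentions both a logarithmic power spectrum and a log-amplitude spectrum; the sketch uses the log power and assumes that both channels share the same inter-channel phase difference, which the text does not confirm.

```python
import numpy as np


def input_features(X_left, X_right, eps=1e-8):
    """Return one feature matrix per channel, shape (n_frames, 2 * n_bins).

    X_left, X_right: complex STFTs of shape (n_frames, n_bins).
    """
    # Inter-channel phase difference, wrapped to (-pi, pi].
    ipd = np.angle(X_left * np.conj(X_right))
    feats = []
    for X in (X_left, X_right):
        log_power = np.log(np.abs(X) ** 2 + eps)  # eps avoids log(0)
        feats.append(np.concatenate([log_power, ipd], axis=1))
    return feats[0], feats[1]
```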
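The formula for the mask is not reproduced in this text. As an illustration only, the sketch below uses the definition of the phase-sensitive mask that is common in the speech enhancement literature, the magnitude ratio of direct sound to pickup signal multiplied by the cosine of their phase difference; whether the patent uses exactly this form (or a truncated variant) is an assumption, and the function name is hypothetical.

```python
import numpy as np


def phase_sensitive_mask(S, X, eps=1e-8):
    """Theoretical mask for one channel: |S| / |X| * cos(theta_S - theta_X).

    S: complex STFT of the direct sound; X: complex STFT of the pickup signal.
    """
    ratio = np.abs(S) / (np.abs(X) + eps)  # eps avoids division by zero
    return ratio * np.cos(np.angle(S) - np.angle(X))
```

With this definition the mask is close to 1 where the pickup signal is dominated by direct sound and falls toward 0 (or below) where noise and reverberation dominate.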
Step S104: The neural network is trained using the input characteristics of each channel and the corresponding theoretical phase sensitive masking to obtain an estimation model of the phase sensitive masking.
In one embodiment, the neural network is a three-layer long short-term memory (LSTM) network with 512 nodes in each layer. The theoretical value of the phase sensitive masking is taken as the training target of the neural network, and the mean square error between the estimated and theoretical phase sensitive masking values is reduced iteratively.
Step S105: The input characteristics of the first channel are taken as the input of the estimation model, which outputs the phase sensitive masking estimation value of the first channel; the input characteristics of the second channel are taken as the input of the estimation model, which outputs the phase sensitive masking estimation value of the second channel.
step S106: a speech covariance matrix is calculated using each channel time-frequency domain picked-up signal and each channel time-frequency domain phase sensitive masking estimate together.
step S107: and carrying out eigenvalue decomposition on the voice covariance matrix to obtain a main eigenvector of the voice covariance matrix as a guide vector of the sound source.
step S108: and taking the phase angle difference of the two elements of the guide vector as direction information.
step S109: and calculating the estimation accuracy of the direction information of each time frequency point by using the two-channel phase sensitive masking estimation value.
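Steps S106 to S109 can be sketched together as follows. The formulas for the covariance matrix and the accuracy are not reproduced in this text, so several choices here are assumptions: the accuracy W is taken as the product of the two masks clipped to [0, 1], the covariance at each frame is accumulated over a hypothetical `context` of neighbouring frames, and the function name `direction_info` is illustrative.

```python
import numpy as np


def direction_info(X1, X2, M1, M2, context=2):
    """Mask-weighted covariance -> principal eigenvector -> phase difference.

    X1, X2: complex STFTs, shape (n_frames, n_bins).
    M1, M2: mask estimates of the same shape.
    Returns (theta, W), both of shape (n_frames, n_bins).
    """
    W = np.clip(M1, 0.0, 1.0) * np.clip(M2, 0.0, 1.0)    # assumed accuracy measure
    X = np.stack([X1, X2], axis=-1)                       # (n_frames, n_bins, 2)
    outer = X[..., :, None] * np.conj(X[..., None, :])    # 2x2 outer product per point
    weighted = W[..., None, None] * outer
    n_frames = X.shape[0]
    theta = np.empty(W.shape)
    for n in range(n_frames):
        lo, hi = max(0, n - context), min(n_frames, n + context + 1)
        cov = weighted[lo:hi].sum(axis=0)                 # (n_bins, 2, 2), Hermitian
        _, vecs = np.linalg.eigh(cov)                     # eigenvalues in ascending order
        v = vecs[..., -1]                                 # principal eigenvector per bin
        theta[n] = np.angle(v[:, 0] * np.conj(v[:, 1]))   # phase(elem 0) - phase(elem 1)
    return theta, W
```

Taking the phase of `v[:, 0] * conj(v[:, 1])` makes the result independent of the arbitrary complex scaling that an eigenvector carries.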
Step S110: The ideal phase difference of the data picked up by the two microphones is calculated from the time difference of arrival of the sound at the microphones and is used as the target direction information.
Specifically, in the target direction information, the two time terms are the times taken for the sound to travel from the sound source to the 1st and 2nd microphones, and $f_s$ is the sampling rate of the pickup signal.
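A possible form of the ideal phase difference is sketched below. The exact formula is not reproduced in this text; the sketch assumes that each frequency bin k of an `n_fft`-point transform sits at k * fs / n_fft Hz, that the phase difference is that of channel 1 minus channel 2, and that the result is wrapped to (-pi, pi]. The function name and the `n_fft` parameter are hypothetical.

```python
import numpy as np


def ideal_phase_difference(tau1, tau2, fs, n_fft=512):
    """Ideal inter-channel phase difference per frequency bin.

    tau1, tau2: propagation times (seconds) from the source to microphones 1 and 2.
    fs: sampling rate in Hz.
    """
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft   # bin centre frequencies in Hz
    phase = 2.0 * np.pi * freqs * (tau2 - tau1)      # phase(ch 1) - phase(ch 2)
    return np.angle(np.exp(1j * phase))              # wrap to (-pi, pi]
```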
Step S111: The neural network is trained using the direction information, the direction information estimation accuracy and the target direction information to obtain a direction information enhancement model.
In one embodiment, the neural network is a fully-connected neural network with three layers, each layer having 2048 nodes.
Specifically, the input features of the neural network are a splicing vector of a sine value and a cosine value of the direction information and estimation accuracy of the direction information:
$I_n = [\sin\theta_{n,0}, \ldots, \sin\theta_{n,F-1}, \cos\theta_{n,0}, \ldots, \cos\theta_{n,F-1}, W_{n,0}, \ldots, W_{n,F-1}]$
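The spliced input vector for one frame can be built directly from its definition. The function name `enhancement_input` is hypothetical; the ordering of the three blocks follows the expression above.

```python
import numpy as np


def enhancement_input(theta_n, W_n):
    """Build I_n = [sin(theta), cos(theta), W] for one frame (length 3 * F)."""
    theta_n = np.asarray(theta_n, dtype=float)
    W_n = np.asarray(W_n, dtype=float)
    return np.concatenate([np.sin(theta_n), np.cos(theta_n), W_n])
```

Encoding each phase as a sine and cosine pair avoids the discontinuity that a raw angle has at the wrap-around between -pi and pi.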
Specifically, the estimation target of the neural network is the target direction information:
The mean square error between the enhanced direction information and the target direction information is reduced iteratively.
Step S112: The direction information and the direction information estimation accuracy are taken as the input of the direction information enhancement model, which outputs the enhanced direction information.
Specifically, the enhanced direction information is computed from the output value of the enhancement model.
Step S113: the sound source direction is calculated at each time-frequency point using the enhanced direction information.
Specifically, in the sound source direction calculated at each time-frequency point, c is the speed of sound and d is the microphone spacing.
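The mapping from enhanced phase difference to direction is not reproduced in this text. The sketch below assumes a far-field source and an angle measured from the broadside of the microphone pair, so that the phase difference equals 2*pi*f*d*sin(angle)/c; the patent may use a different angle convention. The function name, the default speed of sound and the `n_fft` parameter are hypothetical.

```python
import numpy as np


def phase_to_angle(phase, fs, d, c=343.0, n_fft=512):
    """Map phase differences (..., n_bins) to angles in degrees.

    fs: sampling rate in Hz; d: microphone spacing in metres; c: speed of sound in m/s.
    The DC bin carries no direction information and yields NaN.
    """
    phase = np.asarray(phase, dtype=float)
    freqs = np.arange(phase.shape[-1]) * fs / n_fft
    s = np.full(phase.shape, np.nan)
    s[..., 1:] = c * phase[..., 1:] / (2.0 * np.pi * freqs[1:] * d)
    s = np.clip(s, -1.0, 1.0)   # guard against |sin| > 1 caused by noisy phases
    return np.degrees(np.arcsin(s))
```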
Step S114: A weighted statistical histogram is constructed from the direction information and its estimation accuracy at all time-frequency points.
Specifically, when the weighted histogram is constructed, the weight of each time-frequency point is $W_{n,f}$.
Step S115: The direction with the largest statistical result in the weighted histogram is selected as the sound source direction.
The embodiment of the invention provides a dual-channel sound source positioning method based on deep learning, which estimates the phase sensitive masking from both spatial and spectral information, estimates the direction information under the guidance of the phase sensitive masking, enhances the direction information with a neural network, and finally determines the sound source direction from a weighted statistical histogram. By including sufficient noise types and source directions in the training data, the generalization capability of the deep neural network can be fully exploited and the robustness of the model improved, so that the direction of the sound source can be estimated in noisy, reverberant environments.
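Steps S114 and S115 amount to a weighted vote over all time-frequency points. A minimal sketch, assuming angles in degrees between -90 and 90 and a hypothetical bin width of 5 degrees (the text does not state the histogram resolution):

```python
import numpy as np


def localize(angles_deg, weights, bin_width=5.0):
    """Return the centre of the histogram bin with the largest total weight."""
    angles = np.asarray(angles_deg, dtype=float).ravel()
    w = np.asarray(weights, dtype=float).ravel()
    valid = np.isfinite(angles)   # drop points without a direction, e.g. the DC bin
    edges = np.arange(-90.0, 90.0 + bin_width, bin_width)
    hist, edges = np.histogram(angles[valid], bins=edges, weights=w[valid])
    peak = int(np.argmax(hist))
    return 0.5 * (edges[peak] + edges[peak + 1])
```

For example, three low-weight points near 10 degrees lose to a single high-weight point near 40 degrees, which is how unreliable points such as silent segments are kept from dominating the result.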
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A dual-channel sound source localization method based on deep learning, characterized by comprising the following steps:
respectively performing framing, windowing and Fourier transformation on the microphone pickup data of the left channel and the right channel to obtain time-frequency domain pickup signals of the first channel and the second channel; the double-channel time-frequency domain signal comprises information of the position of a sound source;
combining the logarithmic power spectrum of the time-frequency domain pickup signal of the first channel and the phase difference between the channels to obtain the input characteristic of the first channel; combining the logarithmic power spectrum of the time-frequency domain pickup signal of the second channel and the phase difference between the channels to obtain the input characteristic of the second channel;
calculating a theoretical phase sensitive masking value of the first channel by using the time-frequency domain pickup signal of the first channel and the time-frequency domain direct sound signal corresponding to it; calculating a theoretical phase sensitive masking value of the second channel by using the time-frequency domain pickup signal of the second channel and the time-frequency domain direct sound signal corresponding to it;
training a neural network by using the input characteristics of each channel and the corresponding theoretical phase sensitivity masking to obtain an estimation model of the phase sensitivity masking;
taking the input characteristics of the first channel as the input of the estimation model, and outputting the phase sensitive masking estimation value of the first channel; taking the input characteristics of the second channel as the input of the estimation model, and outputting the phase sensitive masking estimation value of the second channel;
calculating a voice covariance matrix by using the picked-up signal of each channel and the phase sensitive masking estimation value of each channel;
performing eigenvalue decomposition on the voice covariance matrix to obtain its principal eigenvector as the steering vector of the sound source;
taking the phase angle difference of the two elements of the steering vector as direction information;
calculating the estimation accuracy of the direction information of each time frequency point by using the two-channel phase sensitive masking estimation value;
calculating an ideal phase difference of data picked up by two microphones by using the time difference of sound sources reaching the microphones as target direction information;
training a neural network by using the direction information, the direction information estimation accuracy and the target direction information to obtain a direction information enhancement model;
taking the direction information and the direction information estimation accuracy as the input of the direction information enhancement model, and outputting the enhanced direction information;
calculating a sound source direction at each time-frequency point using the enhanced direction information;
constructing a weighted statistical histogram from the direction information and its estimation accuracy at all time-frequency points; and
selecting the direction with the largest statistical result in the weighted histogram as the sound source direction.
2. The method of claim 1, wherein the step of performing framing, windowing and fourier transform on the microphone picked data of each channel respectively comprises:
taking 512 sampling points for each channel as one frame, and padding the frame to 512 points if it is shorter; then windowing each frame with a Blackman window; and finally performing a Fourier transform on each frame.
3. The method of claim 1, wherein the per-channel input features are:
where n is the index of the data frame, m is the index of the channel, and the two feature components are the log-amplitude spectrum of the time-frequency domain signal of the mth channel and the phase difference of the time-frequency domain signal of the mth channel;
the per-channel phase sensitive mask is:
where f is the index of the frequency band, theta is the phase of the time-frequency domain signal of the data picked up by the microphone, X is the time-frequency domain signal of the microphone pickup data, and the remaining two terms are the time-frequency domain signal of the direct sound and its phase.
4. The method according to claim 1, wherein the step of training the neural network using the input features of each channel and the theoretical phase-sensitive mask corresponding thereto to obtain the estimation model of the phase-sensitive mask comprises:
the neural network is a three-layer long short-term memory (LSTM) network, and each layer has 512 nodes; the theoretical value of the phase sensitive masking is taken as the training target of the neural network, and the mean square error between the estimated and theoretical phase sensitive masking values is reduced iteratively; the estimated values of the per-channel phase sensitive masking are:
5. The method of claim 1, wherein the speech covariance matrix is:
the eigenvalue decomposition is performed on the voice covariance matrix to obtain its principal eigenvector as the steering vector of the sound source:
the direction information is as follows:
the accuracy of the direction information estimation is as follows:
7. The method according to claim 1, wherein the training of the neural network using the direction information, the direction information estimation accuracy and the target direction information yields a direction information enhancement model, specifically:
the neural network is a three-layer fully-connected neural network, and each layer is provided with 2048 nodes; the input characteristics of the neural network are a splicing vector of sine value and cosine value of the direction information and estimation accuracy of the direction information, and specifically the method comprises the following steps:
$I_n = [\sin\theta_{n,0}, \ldots, \sin\theta_{n,F-1}, \cos\theta_{n,0}, \ldots, \cos\theta_{n,F-1}, W_{n,0}, \ldots, W_{n,F-1}]$
the estimation target of the neural network is target direction information, specifically:
the mean square error of the enhanced direction information and the target direction information is continuously reduced through iteration;
the enhanced direction information is as follows:
9. The method of claim 1, wherein the weighted histogram is constructed such that each time-frequency point has a weight of $W_{n,f}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010099231.8A CN111239686B (en) | 2020-02-18 | 2020-02-18 | Dual-channel sound source positioning method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111239686A (en) | 2020-06-05 |
CN111239686B (en) | 2021-12-21 |
Family
ID=70874955
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010099231.8A (Active) CN111239686B (en) | 2020-02-18 | 2020-02-18 | Dual-channel sound source positioning method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111239686B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103886858A (en) * | 2014-03-11 | 2014-06-25 | 中国科学院信息工程研究所 | Sound masking signal generating method and system |
CN107703486A (en) * | 2017-08-23 | 2018-02-16 | 南京邮电大学 | A kind of auditory localization algorithm based on convolutional neural networks CNN |
CN109975762A (en) * | 2017-12-28 | 2019-07-05 | 中国科学院声学研究所 | A kind of underwater sound source localization method |
CN109839612A (en) * | 2018-08-31 | 2019-06-04 | 大象声科(深圳)科技有限公司 | Sound source direction estimation method based on time-frequency masking and deep neural network |
CN109448751A (en) * | 2018-12-29 | 2019-03-08 | 中国科学院声学研究所 | A kind of ears sound enhancement method based on deep learning |
CN110517705A (en) * | 2019-08-29 | 2019-11-29 | 北京大学深圳研究生院 | A kind of binaural sound sources localization method and system based on deep neural network and convolutional neural networks |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022012629A1 (en) * | 2020-07-17 | 2022-01-20 | 华为技术有限公司 | Method and apparatus for estimating time delay of stereo audio signal |
CN112269158A (en) * | 2020-10-14 | 2021-01-26 | 南京南大电子智慧型服务机器人研究院有限公司 | Method for positioning voice source by utilizing microphone array based on UNET structure |
CN113476041A (en) * | 2021-06-21 | 2021-10-08 | 苏州大学附属第一医院 | Speech perception capability test method and system for children using artificial cochlea |
CN113476041B (en) * | 2021-06-21 | 2023-09-19 | 苏州大学附属第一医院 | Speech perception capability test method and system for artificial cochlea using children |
CN113643714A (en) * | 2021-10-14 | 2021-11-12 | 阿里巴巴达摩院(杭州)科技有限公司 | Audio processing method, device, storage medium and computer program |
CN113643714B (en) * | 2021-10-14 | 2022-02-18 | 阿里巴巴达摩院(杭州)科技有限公司 | Audio processing method, device, storage medium and computer program |
Also Published As
Publication number | Publication date |
---|---|
CN111239686B (en) | 2021-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111239686B (en) | Dual-channel sound source positioning method based on deep learning | |
US10901063B2 (en) | Localization algorithm for sound sources with known statistics | |
CN110133596B (en) | Array sound source positioning method based on frequency point signal-to-noise ratio and bias soft decision | |
CN109597022A (en) | The operation of sound bearing angle, the method, apparatus and equipment for positioning target audio | |
CN107172018A (en) | The vocal print cryptosecurity control method and system of activation type under common background noise | |
CN105652243B (en) | Multichannel group sparse linear predicts delay time estimation method | |
CN110534126B (en) | Sound source positioning and voice enhancement method and system based on fixed beam forming | |
CN108549052A (en) | Circular harmonic domain pseudo sound intensity sound source localization method with joint time-frequency-spatial domain weighting | |
CN114171041A (en) | Voice noise reduction method, device and equipment based on environment detection and storage medium | |
CN106019230B (en) | A kind of sound localization method based on i-vector Speaker Identification | |
CN111798869B (en) | Sound source positioning method based on double microphone arrays | |
CN103901400B (en) | A kind of based on delay compensation and ears conforming binaural sound source of sound localization method | |
Pertilä et al. | Multichannel source activity detection, localization, and tracking | |
CN116559778B (en) | Vehicle whistle positioning method and system based on deep learning | |
CN114664288A (en) | Voice recognition method, device, equipment and storage medium | |
CN111179959B (en) | Competitive speaker number estimation method and system based on speaker embedding space | |
CN111060867A (en) | Directional microphone microarray direction of arrival estimation method | |
CN117169812A (en) | Sound source positioning method based on deep learning and beam forming | |
CN111929638A (en) | Voice direction of arrival estimation method and device | |
CN113345421B (en) | Multi-channel far-field target voice recognition method based on angle spectrum characteristics | |
CN109239665B (en) | Multi-sound-source continuous positioning method and device based on signal subspace similarity spectrum and particle filter | |
Wang et al. | A robust doa estimation method for a linear microphone array under reverberant and noisy environments | |
KR101022457B1 (en) | Method to combine CASA and soft mask for single-channel speech separation | |
Tiantian et al. | Underwater Acoustic Sensing with Rational Orthogonal Wavelet Pulse and Auditory Frequency Cepstral Coefficient-Based Feature Extraction | |
Zhou et al. | Multi-source wideband DOA estimation method by frequency focusing and error weighting |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||