CN111239686B - Dual-channel sound source positioning method based on deep learning

Info

Publication number: CN111239686B (grant); CN111239686A (application)
Application number: CN202010099231.8A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: channel, time, direction information, frequency domain, phase
Inventors: 李军锋, 程龙彪, 夏日升, 颜永红
Current and original assignee: Institute of Acoustics CAS
Filing date: 2020-02-18
Publication date: 2021-12-21
Legal status: Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S 5/18 Position-fixing using ultrasonic, sonic, or infrasonic waves
    • G01S 5/20 Position of source determined by a plurality of spaced direction-finders
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods


Abstract

The invention discloses a dual-channel sound source localization method based on deep learning, which comprises the following steps: performing framing, windowing and Fourier transformation on the microphone pickup data of the left and right channels respectively to obtain the time-frequency domain pickup signals of the first and second channels; estimating a phase-sensitive mask from each time-frequency domain pickup signal and its corresponding time-frequency domain direct sound signal by deep learning; using the phase-sensitive mask both to guide the estimation of sound source direction information and to calculate the accuracy of that estimation; obtaining an enhanced direction information value from the estimated direction information and its estimation accuracy by deep learning; constructing a weighted histogram from the enhanced direction information and the direction information estimation accuracy; and finally selecting the direction corresponding to the peak of the histogram as the sound source direction. The method estimates the direction of the sound source from the data picked up by a dual-channel microphone pair, makes full use of the generalization capability of neural networks, and is robust in noisy and reverberant environments.

Description

Dual-channel sound source positioning method based on deep learning
Technical Field
The invention relates to the technical field of sound source positioning, in particular to a dual-channel sound source positioning method based on deep learning.
Background
Sound source localization technology estimates the azimuth of a sound source from data, containing background noise and reverberation, picked up by a microphone array; an accurate azimuth estimate in turn benefits tasks such as sound source separation and sound source tracking. Among localization techniques that output an azimuth directly, the azimuth can be estimated by exploiting the orthogonality of the signal and noise subspaces, but the performance of such algorithms degrades markedly in the presence of reverberation. Deep learning can improve the robustness of localization in noisy and reverberant conditions. However, most deep-learning-based localization algorithms treat sound source localization as a classification problem and use a neural network to assign the source to one of a set of pre-divided regions. The localization accuracy of such algorithms is tied to the region division, and the neural network must be retrained whenever the required localization accuracy changes.
Disclosure of Invention
The invention aims to overcome the defects of the existing sound source positioning technology.
In order to achieve the purpose, the invention discloses a dual-channel sound source positioning method based on deep learning, which comprises the following steps:
respectively performing framing, windowing and Fourier transform on the microphone pickup data of each channel to obtain a time-frequency domain pickup signal of each channel; the dual-channel time-frequency domain signals comprise the information of the position of the sound source;
combining the logarithmic power spectrum of the time-frequency domain pickup signal of the first channel and the phase difference between the channels to obtain the input characteristic of the first channel; combining the logarithmic power spectrum of the time-frequency domain pickup signal of the second channel and the phase difference between the channels to obtain the input characteristic of the second channel;
calculating to obtain a phase sensitivity masking estimation value of the first channel by using the time-frequency domain pickup signal of the first channel and the time-frequency domain direct sound signal corresponding to the time-frequency domain pickup signal; calculating to obtain a phase sensitivity masking estimation value of the second channel by using the time-frequency domain pickup signal of the second channel and the time-frequency domain direct sound signal corresponding to the time-frequency domain pickup signal;
training a neural network by using the input characteristics of each channel and the corresponding theoretical phase sensitivity masking to obtain an estimation model of the phase sensitivity masking;
taking the input characteristics of the first channel as the input of the estimation model, and outputting the phase sensitivity masking estimation value of the first channel; taking the input characteristics of the second channel as the input of the same estimation model, and outputting the estimated value of the phase sensitive masking of the second channel;
calculating a speech covariance matrix by using the time-frequency domain pickup signal of each channel and the phase sensitivity masking estimation value of each channel;
carrying out eigenvalue decomposition on the speech covariance matrix to obtain its principal eigenvector as the steering vector of the sound source;
taking the phase angle difference of the two elements of the steering vector as direction information;
calculating the estimation accuracy of the direction information of each time frequency point by using the two-channel phase sensitive masking estimation value;
calculating, as target direction information, the ideal phase difference of the data picked up by the two microphones by using the time difference of the sound source reaching the microphones;
training a neural network by using the direction information, the direction information estimation accuracy and the target direction information to obtain a direction information enhancement model;
taking the direction information and the direction information estimation accuracy as the input of the direction information enhancement model, and outputting enhanced direction information;
calculating a sound source direction at each time-frequency point using the enhanced direction information;
constructing a weighted statistical histogram by using the enhanced direction information and the direction information estimation accuracy at all the time-frequency points;
and selecting the direction with the largest statistical result as the sound source direction by using the weighted histogram.
Preferably, the specific steps of framing, windowing and Fourier transforming the microphone pickup data of each channel are as follows:
taking 512 sampling points of each channel as one frame signal, and zero-padding the frame to 512 points if it is shorter; then windowing each frame of signal, the window function being a Blackman window; and finally carrying out a Fourier transform on each frame of signal.
Preferably, the per-channel input characteristics are:
$$\mathbf{I}_n^m=\left[\log\left|\mathbf{X}_n^m\right|,\ \Delta\boldsymbol{\phi}_n\right]$$

where $n$ is the index of the data frame, $m$ is the index of the channel, $\log\left|\mathbf{X}_n^m\right|$ is the log-amplitude spectrum of the time-frequency domain signal of the $m$-th channel, and $\Delta\boldsymbol{\phi}_n$, with $\Delta\phi_{n,f}=\angle X_{n,f}^{1}-\angle X_{n,f}^{2}$, is the inter-channel phase difference of the time-frequency domain signals.
Preferably, the per-channel phase sensitive mask is:
$$M_{n,f}^m=\frac{\left|S_{n,f}^m\right|}{\left|X_{n,f}^m\right|}\cos\left(\theta_{n,f}^m-\theta_{S,n,f}^m\right)$$

where $f$ is the index of the frequency band, $\theta_{n,f}^m$ is the phase of the time-frequency domain signal of the data picked up by the microphone, $\theta_{S,n,f}^m$ is the phase of the time-frequency domain signal of the direct sound data, $S_{n,f}^m$ is the time-frequency domain signal of the direct sound, and $X_{n,f}^m$ is the time-frequency domain signal of the microphone pickup data.
Preferably, the step of training the neural network by using the input features of each channel and the theoretical phase sensitivity masking corresponding thereto to obtain the estimation model of the phase sensitivity masking includes:
the neural network is a three-layer long-term memory network, and each layer is provided with 512 nodes. And taking the phase sensitive masking theoretical value as a training target of the neural network, and continuously reducing the mean square error of the phase sensitive masking estimated value and the phase sensitive masking theoretical value through iteration.
Preferably, the estimates of the per-channel phase sensitive masking are:
$$\hat{\mathbf{M}}_n^m=\mathcal{F}\left(\mathbf{I}_n^m\right)$$

where $\mathcal{F}(\cdot)$ denotes the trained estimation model.
preferably, the speech covariance matrix is:
$$\boldsymbol{\Phi}_{n,f}=\hat{M}_{n,f}\,\mathbf{x}_{n,f}\mathbf{x}_{n,f}^{\mathrm{H}},\qquad \hat{M}_{n,f}=\hat{M}_{n,f}^{1}\hat{M}_{n,f}^{2},\qquad \mathbf{x}_{n,f}=\left[X_{n,f}^{1},\ X_{n,f}^{2}\right]^{\mathrm{T}}$$

where $\mathbf{x}_{n,f}$ is the two-channel time-frequency domain pickup signal and $(\cdot)^{\mathrm{H}}$ denotes the conjugate transpose.
preferably, the eigenvalue decomposition is performed on the speech covariance matrix, and the acquisition of the principal eigenvector thereof as the steering vector of the sound source is:
$$\mathbf{v}_{n,f}=\mathcal{P}\left\{\boldsymbol{\Phi}_{n,f}\right\}$$

where $\mathcal{P}\{\cdot\}$ denotes taking the principal eigenvector.
preferably, the direction information is:
$$\theta_{n,f}=\angle v_{n,f}(1)-\angle v_{n,f}(2)$$
preferably, the accuracy of the direction information estimation is:
$$W_{n,f}=\hat{M}_{n,f}^{1}\,\hat{M}_{n,f}^{2}$$
preferably, the ideal phase difference is:
$$\tilde{\theta}_{n,f}=\frac{2\pi f f_s\left(\tau^{1}-\tau^{2}\right)}{N}$$

where $\tau^{1}$ and $\tau^{2}$ are the times taken for the sound source to reach the 1st and 2nd microphones, $f_s$ is the sampling rate of the pickup signal, and $N$ is the frame length.
Preferably, the direction information, the direction information estimation accuracy and the target direction information are used to train the neural network to obtain a direction information enhancement model, specifically:
the neural network is a fully-connected neural network with three layers, and each layer is provided with 2048 nodes. The input feature of the neural network is the concatenation of the sine and cosine values of the direction information with the direction information estimation accuracy, specifically:
$$\mathbf{I}_n=\left[\sin\theta_{n,0},\ldots,\sin\theta_{n,F-1},\ \cos\theta_{n,0},\ldots,\cos\theta_{n,F-1},\ W_{n,0},\ldots,W_{n,F-1}\right]$$
the estimation target of the neural network is target direction information, specifically:
$$\mathbf{O}_n=\left[\sin\tilde{\theta}_{n,0},\ldots,\sin\tilde{\theta}_{n,F-1},\ \cos\tilde{\theta}_{n,0},\ldots,\cos\tilde{\theta}_{n,F-1}\right]$$
the mean square error of the enhanced direction information and the target direction information is continuously reduced through iteration.
Preferably, the enhanced direction information is:
$$\hat{\theta}_{n,f}=\arctan\frac{\hat{s}_{n,f}}{\hat{c}_{n,f}}$$

where $\hat{s}_{n,f}$ and $\hat{c}_{n,f}$ are the sine and cosine output values of the enhancement model.
Preferably, the sound source direction calculated at each time-frequency point is:
$$\varphi_{n,f}=\arccos\left(\frac{c\,N\,\hat{\theta}_{n,f}}{2\pi f f_s d}\right)$$

where $c$ is the sound propagation velocity, $d$ is the microphone spacing, and $N$ is the frame length.
Preferably, the weighted histogram is constructed such that each time-frequency point has a weight of $W_{n,f}$.
Preferably, the direction of the largest statistical result is:
$$\varphi^{*}=\arg\max_{\varphi} H(\varphi)$$

where $H(\varphi)$ is the weighted statistical histogram of the sound source directions at all time-frequency points.
the invention has the advantages that: 1) phase sensitive masking is estimated through spatial information and spectral information, so that more accurate direction information estimation is obtained; 2) the neural network is utilized to enhance the estimated direction information, so that the performance of the positioning method in a noise reverberation environment is improved; 3) by estimating the final sound source orientation using the weighted histogram, the influence of the silence segments on the sound source localization accuracy can be reduced. By containing enough noise types and orientations in the training data, the generalization capability of the deep neural network can be fully utilized, the robustness of the model is improved, and the purpose of sound source positioning in a noise reverberation environment is achieved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a dual-channel sound source localization method based on deep learning.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a dual-channel sound source localization method based on deep learning. As shown in fig. 1, the method includes:
step S101: and respectively performing framing, windowing and Fourier transformation on the microphone picked data of the left channel and the right channel to obtain a time-frequency domain picked signal of each channel. The dual-channel time-frequency domain signal contains information of the position of the sound source.
In one embodiment, 512 sampling points are taken as one frame for each channel, and a frame shorter than 512 points is zero-padded to 512 points; each frame is then windowed with a Blackman window; finally, a Fourier transform is applied to each frame to obtain the time-frequency domain pickup signal of each channel.
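For concreteness, the following is a minimal numpy sketch of this framing step (not part of the patent); the 256-sample hop size and the one-sided FFT are assumptions, since the embodiment fixes only the 512-point frame length and the Blackman window.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """Step S101 for one channel: framing, Blackman windowing, and FFT.
    The hop size and one-sided FFT are assumptions."""
    n_frames = int(np.ceil(max(len(x) - frame_len, 0) / hop)) + 1
    # Zero-pad the tail so the last frame also has 512 samples.
    x = np.pad(x, (0, n_frames * hop + frame_len - len(x)))
    win = np.blackman(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)       # (n_frames, 257) complex
```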
Step S102: combining the logarithmic power spectrum of the time-frequency domain pickup signal of the left channel and the phase difference between the channels to obtain the input characteristic of the first channel; and combining the logarithmic power spectrum of the time-frequency domain pickup signal of the right channel and the phase difference between the channels to obtain the input characteristic of the second channel.
Specifically, the per-channel input characteristics are:
$$\mathbf{I}_n^m=\left[\log\left|\mathbf{X}_n^m\right|,\ \Delta\boldsymbol{\phi}_n\right]$$

where $n$ is the index of the data frame, $m$ is the index of the channel, $\log\left|\mathbf{X}_n^m\right|$ is the log-amplitude spectrum of the time-frequency domain signal of the $m$-th channel, and $\Delta\boldsymbol{\phi}_n$, with $\Delta\phi_{n,f}=\angle X_{n,f}^{1}-\angle X_{n,f}^{2}$, is the inter-channel phase difference of the time-frequency domain signals.
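A minimal sketch of this feature construction under the notation above; the concatenation order of the log-amplitude spectrum and the inter-channel phase difference is an assumption.

```python
import numpy as np

def input_features(X1, X2, m, eps=1e-8):
    """Step S102: input feature of channel m, the log-amplitude spectrum
    of that channel concatenated with the inter-channel phase difference.
    X1, X2 are the (n_frames, F) complex STFTs of the two channels."""
    Xm = X1 if m == 1 else X2
    log_amp = np.log(np.abs(Xm) + eps)       # log-amplitude spectrum
    ipd = np.angle(X1) - np.angle(X2)        # inter-channel phase difference
    return np.concatenate([log_amp, ipd], axis=1)   # (n_frames, 2F)
```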
Step S103: calculating to obtain a phase sensitivity masking estimation value of the first channel by using the time-frequency domain pickup signal of the first channel and the time-frequency domain direct sound signal corresponding to the time-frequency domain pickup signal; and calculating to obtain a phase sensitivity masking estimation value of the second channel by using the time-frequency domain pickup signal of the second channel and the time-frequency domain direct sound signal corresponding to the time-frequency domain pickup signal.
Specifically, the per-channel phase sensitive mask is:
$$M_{n,f}^m=\frac{\left|S_{n,f}^m\right|}{\left|X_{n,f}^m\right|}\cos\left(\theta_{n,f}^m-\theta_{S,n,f}^m\right)$$

where $f$ is the index of the frequency band, $\theta_{n,f}^m$ is the phase of the time-frequency domain signal of the data picked up by the microphone, $\theta_{S,n,f}^m$ is the phase of the time-frequency domain signal of the direct sound data, $S_{n,f}^m$ is the time-frequency domain signal of the direct sound, and $X_{n,f}^m$ is the time-frequency domain signal of the microphone pickup data.
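A direct transcription of the phase-sensitive mask into numpy; the small constant guarding the division and the clipping to [0, 1] are practical assumptions rather than steps stated in the patent.

```python
import numpy as np

def phase_sensitive_mask(X, S, eps=1e-8):
    """Step S103: phase-sensitive mask from the pickup STFT X and the
    matching direct-sound STFT S (both (n_frames, F) complex arrays)."""
    psm = (np.abs(S) / (np.abs(X) + eps)) * np.cos(np.angle(X) - np.angle(S))
    # Clipping to [0, 1] is common practice when the mask is a network
    # target; the patent does not state it.
    return np.clip(psm, 0.0, 1.0)
```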
Step S104: and training the neural network by using the input characteristics of each channel and the corresponding theoretical phase sensitivity masking to obtain an estimation model of the phase sensitivity masking.
In one embodiment, the neural network is a three-layer long short-term memory (LSTM) network with 512 nodes in each layer. The phase sensitive masking theoretical value is taken as the training target of the neural network, and the mean square error between the phase sensitive masking estimated value and the phase sensitive masking theoretical value is continuously reduced through iteration.
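A PyTorch sketch of such an estimation model; the patent fixes the three-layer LSTM with 512 nodes per layer and the MSE criterion, while the linear output layer and the sigmoid are assumptions.

```python
import torch
import torch.nn as nn

class PSMEstimator(nn.Module):
    """Three-layer LSTM, 512 nodes per layer, as in the embodiment.
    Maps the per-frame input feature (log-amplitude spectrum plus
    inter-channel phase difference, 2F dims) to an F-dim mask estimate."""

    def __init__(self, feat_dim: int, n_freq: int):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, 512, num_layers=3, batch_first=True)
        self.out = nn.Linear(512, n_freq)     # output layer: assumption

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(feats)               # feats: (batch, frames, feat_dim)
        return torch.sigmoid(self.out(h))

# Training target: the theoretical phase-sensitive mask, with the MSE
# reduced iteratively, e.g.
#   loss = nn.functional.mse_loss(model(feats), psm_theoretical)
```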
Step S105: taking the input characteristics of the first channel as the input of the estimation model, and outputting the phase sensitivity masking estimation value of the first channel; and taking the input characteristics of the second channel as the input of the same estimation model, and outputting the phase sensitivity masking estimation value of the second channel.
Specifically, the estimates of per-channel phase sensitive masking are:
$$\hat{\mathbf{M}}_n^m=\mathcal{F}\left(\mathbf{I}_n^m\right)$$

where $\mathcal{F}(\cdot)$ denotes the trained estimation model.
step S106: a speech covariance matrix is calculated using each channel time-frequency domain picked-up signal and each channel time-frequency domain phase sensitive masking estimate together.
Specifically, the speech covariance matrix is:
$$\boldsymbol{\Phi}_{n,f}=\hat{M}_{n,f}\,\mathbf{x}_{n,f}\mathbf{x}_{n,f}^{\mathrm{H}},\qquad \hat{M}_{n,f}=\hat{M}_{n,f}^{1}\hat{M}_{n,f}^{2},\qquad \mathbf{x}_{n,f}=\left[X_{n,f}^{1},\ X_{n,f}^{2}\right]^{\mathrm{T}}$$

where $\mathbf{x}_{n,f}$ is the two-channel time-frequency domain pickup signal and $(\cdot)^{\mathrm{H}}$ denotes the conjugate transpose.
step S107: and carrying out eigenvalue decomposition on the voice covariance matrix to obtain a main eigenvector of the voice covariance matrix as a guide vector of the sound source.
Specifically, the steering vector is:
$$\mathbf{v}_{n,f}=\mathcal{P}\left\{\boldsymbol{\Phi}_{n,f}\right\}$$

where $\mathcal{P}\{\cdot\}$ denotes taking the principal eigenvector.
step S108: and taking the phase angle difference of the two elements of the guide vector as direction information.
Specifically, the direction information is:
$$\theta_{n,f}=\angle v_{n,f}(1)-\angle v_{n,f}(2)$$
step S109: and calculating the estimation accuracy of the direction information of each time frequency point by using the two-channel phase sensitive masking estimation value.
Specifically, the accuracy of the direction information estimation is:
$$W_{n,f}=\hat{M}_{n,f}^{1}\,\hat{M}_{n,f}^{2}$$
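A numpy sketch covering steps S106 to S109 under the reconstruction above; treating each time-frequency point's covariance independently and using the product of the two mask estimates both as the covariance weight and as the accuracy W are assumptions. In practice the covariance would typically be smoothed or averaged over neighbouring frames, otherwise the rank-one per-point matrix makes the principal eigenvector coincide with the snapshot itself.

```python
import numpy as np

def direction_info(X1, X2, M1, M2):
    """Steps S106-S109: mask-weighted speech covariance per time-frequency
    point, principal eigenvector as steering vector, phase angle difference
    of its two elements as direction information, mask product as W."""
    n_frames, n_freq = X1.shape
    W = M1 * M2                                    # accuracy weight (assumption)
    theta = np.zeros((n_frames, n_freq))
    for t in range(n_frames):
        for f in range(n_freq):
            x = np.array([X1[t, f], X2[t, f]])     # two-channel snapshot
            phi = W[t, f] * np.outer(x, x.conj())  # speech covariance matrix
            _, vecs = np.linalg.eigh(phi)          # Hermitian eigendecomposition
            v = vecs[:, -1]                        # principal eigenvector
            theta[t, f] = np.angle(v[0]) - np.angle(v[1])
    return theta, W
```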
step S110: the ideal phase difference of the picked-up data of the two microphones is calculated as target direction information by using the time difference of the sound source reaching the microphones.
Specifically, the target direction information is:
$$\tilde{\theta}_{n,f}=\frac{2\pi f f_s\left(\tau^{1}-\tau^{2}\right)}{N}$$

where $\tau^{1}$ and $\tau^{2}$ are the times taken for the sound source to reach the 1st and 2nd microphones, $f_s$ is the sampling rate of the pickup signal, and $N$ is the frame length.
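A one-function sketch of the target phase difference; the one-sided bin indexing matching the 512-point frame is an assumption.

```python
import numpy as np

def target_ipd(tau1, tau2, fs, n_fft=512):
    """Step S110: ideal inter-channel phase difference per frequency bin,
    from the times of arrival tau1, tau2 (in seconds) at the two
    microphones; fs is the sampling rate, n_fft the 512-point frame."""
    f = np.arange(n_fft // 2 + 1)            # one-sided bin indices
    return 2.0 * np.pi * f * fs * (tau1 - tau2) / n_fft
```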
Step S111: training the neural network by using the direction information, the direction information estimation accuracy and the target direction information to obtain a direction information enhancement model.
In one embodiment, the neural network is a fully-connected neural network with three layers, each layer having 2048 nodes.
Specifically, the input feature of the neural network is the concatenation of the sine and cosine values of the direction information with the direction information estimation accuracy:
$$\mathbf{I}_n=\left[\sin\theta_{n,0},\ldots,\sin\theta_{n,F-1},\ \cos\theta_{n,0},\ldots,\cos\theta_{n,F-1},\ W_{n,0},\ldots,W_{n,F-1}\right]$$
specifically, the estimated target of the neural network is target direction information:
$$\mathbf{O}_n=\left[\sin\tilde{\theta}_{n,0},\ldots,\sin\tilde{\theta}_{n,F-1},\ \cos\tilde{\theta}_{n,0},\ldots,\cos\tilde{\theta}_{n,F-1}\right]$$
the mean square error of the enhanced direction information and the target direction information is continuously reduced through iteration.
Step S112: taking the direction information and the direction information estimation accuracy as the input of the direction information enhancement model, and outputting enhanced direction information.
Specifically, the enhanced direction information is:
$$\hat{\theta}_{n,f}=\arctan\frac{\hat{s}_{n,f}}{\hat{c}_{n,f}}$$

where $\hat{s}_{n,f}$ and $\hat{c}_{n,f}$ are the sine and cosine output values of the enhancement model.
Step S113: the sound source direction is calculated at each time-frequency point using the enhanced direction information.
Specifically, the sound source direction calculated at each time-frequency point is:
$$\varphi_{n,f}=\arccos\left(\frac{c\,N\,\hat{\theta}_{n,f}}{2\pi f f_s d}\right)$$

where $c$ is the sound propagation velocity, $d$ is the microphone spacing, and $N$ is the frame length.
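A numpy sketch of this conversion; skipping frequency bin 0 and clipping the arccos argument to [-1, 1] are practical safeguards, not steps from the patent.

```python
import numpy as np

def doa_per_bin(theta_hat, fs, d, c=343.0, n_fft=512):
    """Step S113: per-bin source angle from the enhanced phase difference,
    inverting delta_phi = 2*pi*f*fs*d*cos(angle) / (c*n_fft).
    theta_hat: (n_frames, F) enhanced direction information."""
    f = np.arange(1, n_fft // 2 + 1)         # skip bin 0 (division by zero)
    arg = theta_hat[:, 1:] * c * n_fft / (2.0 * np.pi * f * fs * d)
    return np.degrees(np.arccos(np.clip(arg, -1.0, 1.0)))  # (n_frames, F-1)
```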
Step S114: constructing a weighted statistical histogram by using the enhanced direction information and the direction information estimation accuracy at all the time-frequency points.
Specifically, when the weighted histogram is constructed, the weight of each time-frequency point is $W_{n,f}$.
Step S115: selecting the direction with the largest statistical result as the sound source direction by using the weighted histogram.
Specifically, the direction of the maximum statistical result is:
$$\varphi^{*}=\arg\max_{\varphi} H(\varphi)$$

where $H(\varphi)$ is the weighted statistical histogram of the sound source directions at all time-frequency points.
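A numpy sketch of the weighted histogram and peak picking; the one-degree bin width over [0, 180] degrees is an assumption.

```python
import numpy as np

def localize(phi_deg, W, n_bins=180):
    """Steps S114-S115: weighted statistical histogram over all per-bin
    angles, with W[n, f] as the weight of each time-frequency point;
    the centre of the peak bin is returned as the source direction.
    phi_deg: (n_frames, F-1) angles from doa_per_bin; W: (n_frames, F)."""
    hist, edges = np.histogram(phi_deg.ravel(), bins=n_bins,
                               range=(0.0, 180.0),
                               weights=W[:, 1:].ravel())   # drop bin-0 weight
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])
```

Chaining the sketches, `localize(doa_per_bin(theta_hat, fs, d), W)` would produce the final azimuth estimate.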
the embodiment of the invention provides a two-channel sound source positioning method based on deep learning, which estimates phase-sensitive masking by simultaneously utilizing spatial information and spectral information, estimates direction information by taking the phase-sensitive masking as guidance, enhances the direction information through a neural network, and finally determines the final sound source position through a weighted statistical histogram. By containing enough noise types and orientations in the training data, the generalization capability of the deep neural network can be fully utilized, the robustness of the model is improved, and the purpose of estimating the orientation of the sound source in the noise reverberation environment is achieved.
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A dual-channel sound source positioning method based on deep learning is characterized by comprising the following steps:
respectively performing framing, windowing and Fourier transformation on the microphone pickup data of the left channel and the right channel to obtain time-frequency domain pickup signals of the first channel and the second channel; the dual-channel time-frequency domain signals comprise information of the position of a sound source;
combining the logarithmic power spectrum of the time-frequency domain pickup signal of the first channel and the phase difference between the channels to obtain the input characteristic of the first channel; combining the logarithmic power spectrum of the time-frequency domain pickup signal of the second channel and the phase difference between the channels to obtain the input characteristic of the second channel;
calculating to obtain a phase sensitivity masking estimation value of the first channel by using the time-frequency domain pickup signal of the first channel and the time-frequency domain direct sound signal corresponding to the time-frequency domain pickup signal; calculating to obtain a phase sensitivity masking estimation value of the second channel by using the time-frequency domain pickup signal of the second channel and the time-frequency domain direct sound signal corresponding to the time-frequency domain pickup signal;
training a neural network by using the input characteristics of each channel and the corresponding theoretical phase sensitivity masking to obtain an estimation model of the phase sensitivity masking;
taking the input characteristics of the first channel as the input of an estimation model, and outputting a phase sensitive masking estimation value of the first channel; taking the input characteristics of the second channel as the input of the estimation model, and outputting an estimated value of the phase sensitive masking of the second channel;
calculating a voice covariance matrix by using the picked-up signal of each channel and the phase sensitivity masking estimated value of each channel;
performing eigenvalue decomposition on the voice covariance matrix to obtain a main eigenvector of the voice covariance matrix as a guide vector of a sound source;
taking the phase angle difference of the two elements of the guide vector as direction information;
calculating the estimation accuracy of the direction information of each time frequency point by using the two-channel phase sensitive masking estimation value;
calculating, as target direction information, the ideal phase difference of the data picked up by the two microphones by using the time difference of the sound source reaching the microphones;
training a neural network by using the direction information, the direction information estimation accuracy and the target direction information to obtain a direction information enhancement model;
taking the direction information and the direction information estimation accuracy as the input of the direction information enhancement model, and outputting enhanced direction information;
calculating a sound source direction at each time-frequency point using the enhanced direction information;
constructing a weighted statistical histogram by using the enhanced direction information and the direction information estimation accuracy at all time-frequency points;
and selecting the direction with the largest statistical result as the sound source direction by utilizing the weighted histogram.
2. The method of claim 1, wherein the step of performing framing, windowing and Fourier transform on the microphone pickup data of each channel respectively comprises:
taking 512 sampling points of each channel as one frame signal, and zero-padding the frame to 512 points if it is shorter; then windowing each frame of signal, the window function being a Blackman window; and finally carrying out a Fourier transform on each frame of signal.
3. The method of claim 1, wherein the per-channel input features are:
$$\mathbf{I}_n^m=\left[\log\left|\mathbf{X}_n^m\right|,\ \Delta\boldsymbol{\phi}_n\right]$$

where $n$ is the index of the data frame, $m$ is the index of the channel, $\log\left|\mathbf{X}_n^m\right|$ is the log-amplitude spectrum of the time-frequency domain signal of the $m$-th channel, and $\Delta\boldsymbol{\phi}_n$, with $\Delta\phi_{n,f}=\angle X_{n,f}^{1}-\angle X_{n,f}^{2}$, is the inter-channel phase difference of the time-frequency domain signals;
the per-channel phase sensitive mask is:
$$M_{n,f}^m=\frac{\left|S_{n,f}^m\right|}{\left|X_{n,f}^m\right|}\cos\left(\theta_{n,f}^m-\theta_{S,n,f}^m\right)$$

where $f$ is the index of the frequency band, $\theta_{n,f}^m$ is the phase of the time-frequency domain signal of the data picked up by the microphone, $\theta_{S,n,f}^m$ is the phase of the time-frequency domain signal of the direct sound data, $S_{n,f}^m$ is the time-frequency domain signal of the direct sound, and $X_{n,f}^m$ is the time-frequency domain signal of the microphone pickup data.
4. The method according to claim 1, wherein the step of training the neural network using the input features of each channel and the theoretical phase-sensitive mask corresponding thereto to obtain the estimation model of the phase-sensitive mask comprises:
the neural network is a three-layer long short-term memory (LSTM) network, and each layer is provided with 512 nodes; the phase sensitive masking theoretical value is taken as the training target of the neural network, and the mean square error between the phase sensitive masking estimated value and the phase sensitive masking theoretical value is continuously reduced through iteration; the estimated value of the phase sensitive masking of each channel is:

$$\hat{\mathbf{M}}_n^m=\mathcal{F}\left(\mathbf{I}_n^m\right)$$

where $\mathcal{F}(\cdot)$ denotes the trained estimation model.
5. The method of claim 1, wherein the weighted histogram is constructed such that each time-frequency point has a weight of $W_{n,f}$.
6. The method of claim 1, wherein the direction of the largest statistical result is:
$$\varphi^{*}=\arg\max_{\varphi} H(\varphi)$$

where $H(\varphi)$ is the weighted statistical histogram of the sound source directions at all time-frequency points.
CN202010099231.8A 2020-02-18 2020-02-18 Dual-channel sound source positioning method based on deep learning Active CN111239686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010099231.8A CN111239686B (en) 2020-02-18 2020-02-18 Dual-channel sound source positioning method based on deep learning


Publications (2)

Publication Number Publication Date
CN111239686A (en) 2020-06-05
CN111239686B (en) 2021-12-21

Family

ID=70874955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010099231.8A Active CN111239686B (en) 2020-02-18 2020-02-18 Dual-channel sound source positioning method based on deep learning

Country Status (1)

Country Link
CN (1) CN111239686B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113948098A (en) * 2020-07-17 2022-01-18 华为技术有限公司 Stereo audio signal time delay estimation method and device
CN112269158B (en) * 2020-10-14 2022-09-16 南京南大电子智慧型服务机器人研究院有限公司 Method for positioning voice source by utilizing microphone array based on UNET structure
CN113476041B (en) * 2021-06-21 2023-09-19 苏州大学附属第一医院 Speech perception capability test method and system for artificial cochlea using children
CN113643714B (en) * 2021-10-14 2022-02-18 阿里巴巴达摩院(杭州)科技有限公司 Audio processing method, device, storage medium and computer program

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886858A (en) * 2014-03-11 2014-06-25 中国科学院信息工程研究所 Sound masking signal generating method and system
CN107703486A (en) * 2017-08-23 2018-02-16 南京邮电大学 A kind of auditory localization algorithm based on convolutional neural networks CNN
CN109448751A (en) * 2018-12-29 2019-03-08 中国科学院声学研究所 A kind of ears sound enhancement method based on deep learning
CN109839612A * 2018-08-31 2019-06-04 大象声科(深圳)科技有限公司 Sound source direction estimation method based on time-frequency masking and deep neural network
CN109975762A (en) * 2017-12-28 2019-07-05 中国科学院声学研究所 A kind of underwater sound source localization method
CN110517705A (en) * 2019-08-29 2019-11-29 北京大学深圳研究生院 A kind of binaural sound sources localization method and system based on deep neural network and convolutional neural networks


Also Published As

Publication number Publication date
CN111239686A (en) 2020-06-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant