CN104599679A - Speech signal based focus covariance matrix construction method and device - Google Patents


Info

Publication number
CN104599679A
Authority
CN
China
Prior art keywords
matrix
covariance matrix
focusing
voice signal
sampling frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510052368.7A
Other languages
Chinese (zh)
Inventor
陈喆
殷福亮
张梦晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201510052368.7A priority Critical patent/CN104599679A/en
Publication of CN104599679A publication Critical patent/CN104599679A/en
Priority to PCT/CN2015/082571 priority patent/WO2016119388A1/en
Pending legal-status Critical Current

Links

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)

Abstract

The invention discloses a speech-signal-based method and device for constructing a focused covariance matrix. The method includes: determining the sampling frequency points used by a microphone array when collecting a speech signal; for any one of the determined sampling frequency points, calculating the first covariance matrix of the speech signal collected at that frequency point, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix, and taking the product of the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix as the focused covariance matrix of the speech signal collected at that frequency point; and taking the sum of the focused covariance matrices calculated at all the sampling frequency points as the focused covariance matrix of the speech signal. Because this construction does not require predicting the incidence angle of the sound source, and such predictions inevitably contain errors, the accuracy of the constructed focused covariance matrix is improved.

Description

Method and device for constructing a focused covariance matrix based on a speech signal
Technical field
The present invention relates to the field of speech processing technology, and in particular to a method and device for constructing a focused covariance matrix based on a speech signal.
Background technology
Compared with a single microphone, a microphone array can exploit not only the time-domain and frequency-domain information of a sound source but also its spatial information. It therefore offers strong interference resistance and flexible deployment, performs well on problems such as sound source localization, speech enhancement, and speech recognition, and is now widely used in audio/video conferencing systems, vehicle-mounted systems, hearing aids, human-computer interaction systems, robotics, security monitoring, military reconnaissance, and other fields.
In microphone-array-based speech processing, the number of sound sources often needs to be known in order to achieve good performance; if the number of sources is unknown, or the assumed number is too large or too small, the accuracy of the processing results for the speech captured by the array degrades.
To improve that accuracy, methods for estimating the number of sound sources have been proposed. Estimating the source count requires constructing a focused covariance matrix. Existing constructions, however, must first predict the incidence angle of each sound source, build the focused covariance matrix from the predicted angles, and then estimate the number of sources; if the predicted incidence angle has a large error, the accuracy of the resulting focused covariance matrix is low.
Summary of the invention
Embodiments of the present invention provide a method and device for constructing a focused covariance matrix based on a speech signal, to overcome the low accuracy of focused covariance matrices constructed by prior-art methods.
The specific technical solutions provided by the embodiments of the present invention are as follows:
In a first aspect, a method for constructing a focused covariance matrix based on a speech signal is provided, comprising:
determining the sampling frequency points used by a microphone array when collecting a speech signal;
for any one of the determined sampling frequency points, calculating the first covariance matrix of the speech signal collected at that sampling frequency point, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix, and taking the product of the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix as the focused covariance matrix of the speech signal collected at that sampling frequency point;
taking the sum of the focused covariance matrices calculated at all the sampling frequency points as the focused covariance matrix of the speech signal collected by the microphone array.
With reference to the first aspect, in a first possible implementation, calculating the first covariance matrix specifically comprises:
calculating the first covariance matrix as follows:

R̂(k) = (1/P) · Σ_{i=1}^{P} X_i(k) X_i^H(k),  k = 0, 1, …, N−1

where R̂(k) denotes the first covariance matrix; k denotes the sampling frequency point; P denotes the number of frames of the speech signal collected by the microphone array; X_i(k) denotes the discrete Fourier transform (DFT) value of the microphone array at frame i and sampling frequency point k; X_i^H(k) denotes the conjugate transpose of X_i(k); and N denotes the number of sampling frequency points contained in a frame, which is the same for all frames.
With reference to the first aspect or its first possible implementation, in a second possible implementation, before calculating the focusing transform matrix, the method further comprises:
determining a focusing frequency point among the sampling frequency points used by the microphone array when collecting the speech signal;
calculating the second covariance matrix of the speech signal collected by the microphone array at the focusing frequency point;
and calculating the focusing transform matrix specifically comprises:
performing eigenvalue decomposition on the first covariance matrix to obtain the first eigenvector matrix, and taking the conjugate transpose of the first eigenvector matrix;
performing eigenvalue decomposition on the second covariance matrix to obtain the second eigenvector matrix;
taking the product of the conjugate transpose of the first eigenvector matrix and the second eigenvector matrix as the focusing transform matrix.
With reference to the second possible implementation of the first aspect, in a third possible implementation, calculating the second covariance matrix specifically comprises:
calculating the second covariance matrix as follows:

R̂(k₀) = (1/P) · Σ_{i=1}^{P} X_i(k₀) X_i^H(k₀)

where R̂(k₀) denotes the second covariance matrix; k₀ denotes the focusing frequency point; P denotes the number of frames of the speech signal collected by the microphone array; X_i(k₀) denotes the DFT value of the microphone array at frame i and the focusing frequency point; and X_i^H(k₀) denotes the conjugate transpose of X_i(k₀).
With reference to the second or third possible implementation of the first aspect, in a fourth possible implementation, performing eigenvalue decomposition on the first covariance matrix specifically comprises:
performing the eigenvalue decomposition as follows:

R̂(k) = U(k) Λ U^H(k)

where R̂(k) denotes the first covariance matrix; U(k) denotes the first eigenvector matrix; Λ denotes the diagonal matrix formed by arranging the eigenvalues of R̂(k) in descending order; and U^H(k) denotes the conjugate transpose of U(k).
With reference to the second to fourth possible implementations of the first aspect, in a fifth possible implementation, performing eigenvalue decomposition on the second covariance matrix specifically comprises:
performing the eigenvalue decomposition as follows:

R̂(k₀) = U(k₀) Λ₀ U^H(k₀)

where R̂(k₀) denotes the second covariance matrix; U(k₀) denotes the second eigenvector matrix; Λ₀ denotes the diagonal matrix formed by arranging the eigenvalues of R̂(k₀) in descending order; and U^H(k₀) denotes the conjugate transpose of U(k₀).
With reference to the first to fifth possible implementations of the first aspect, in a sixth possible implementation, X_i(k) has the following form:

X_i(k) = [X_{i1}(k), X_{i2}(k), …, X_{iL}(k)]^T,  i = 0, 1, 2, …, P−1

where X_{i1}(k) denotes the DFT value of the 1st array element of the microphone array at frame i and sampling frequency point k; X_{i2}(k) denotes the DFT value of the 2nd array element at frame i and sampling frequency point k; X_{iL}(k) denotes the DFT value of the L-th array element at frame i and sampling frequency point k; and L is the number of array elements in the microphone array.
In a second aspect, a device for constructing a focused covariance matrix based on a speech signal is provided, comprising:
a determining unit, configured to determine the sampling frequency points used by a microphone array when collecting a speech signal;
a first computing unit, configured to: for any one of the determined sampling frequency points, calculate the first covariance matrix of the speech signal collected at that sampling frequency point, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix, and take the product of the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix as the focused covariance matrix of the speech signal collected at that sampling frequency point;
a second computing unit, configured to take the sum of the focused covariance matrices calculated at all the sampling frequency points as the focused covariance matrix of the speech signal collected by the microphone array.
With reference to the second aspect, in a first possible implementation, when calculating the first covariance matrix, the first computing unit is specifically configured to calculate:

R̂(k) = (1/P) · Σ_{i=1}^{P} X_i(k) X_i^H(k),  k = 0, 1, …, N−1

where R̂(k) denotes the first covariance matrix; k denotes the sampling frequency point; P denotes the number of frames of the speech signal collected by the microphone array; X_i(k) denotes the discrete Fourier transform (DFT) value of the microphone array at frame i and sampling frequency point k; X_i^H(k) denotes the conjugate transpose of X_i(k); and N denotes the number of sampling frequency points contained in a frame, which is the same for all frames.
With reference to the second aspect or its first possible implementation, in a second possible implementation, the determining unit is further configured to determine a focusing frequency point among the sampling frequency points used by the microphone array when collecting the speech signal;
the first computing unit is further configured to calculate the second covariance matrix of the speech signal collected by the microphone array at the focusing frequency point;
and when calculating the focusing transform matrix, the first computing unit is specifically configured to:
perform eigenvalue decomposition on the first covariance matrix to obtain the first eigenvector matrix, and take the conjugate transpose of the first eigenvector matrix;
perform eigenvalue decomposition on the second covariance matrix to obtain the second eigenvector matrix;
take the product of the conjugate transpose of the first eigenvector matrix and the second eigenvector matrix as the focusing transform matrix.
With reference to the second possible implementation of the second aspect, in a third possible implementation, when calculating the second covariance matrix, the first computing unit is specifically configured to calculate:

R̂(k₀) = (1/P) · Σ_{i=1}^{P} X_i(k₀) X_i^H(k₀)

where R̂(k₀) denotes the second covariance matrix; k₀ denotes the focusing frequency point; P denotes the number of frames of the speech signal collected by the microphone array; X_i(k₀) denotes the DFT value of the microphone array at frame i and the focusing frequency point; and X_i^H(k₀) denotes the conjugate transpose of X_i(k₀).
With reference to the second or third possible implementation of the second aspect, in a fourth possible implementation, when performing eigenvalue decomposition on the first covariance matrix, the first computing unit is specifically configured to decompose:

R̂(k) = U(k) Λ U^H(k)

where R̂(k) denotes the first covariance matrix; U(k) denotes the first eigenvector matrix; Λ denotes the diagonal matrix formed by arranging the eigenvalues of R̂(k) in descending order; and U^H(k) denotes the conjugate transpose of U(k).
With reference to the second to fourth possible implementations of the second aspect, in a fifth possible implementation, when performing eigenvalue decomposition on the second covariance matrix, the first computing unit is specifically configured to decompose:

R̂(k₀) = U(k₀) Λ₀ U^H(k₀)

where R̂(k₀) denotes the second covariance matrix; U(k₀) denotes the second eigenvector matrix; Λ₀ denotes the diagonal matrix formed by arranging the eigenvalues of R̂(k₀) in descending order; and U^H(k₀) denotes the conjugate transpose of U(k₀).
With reference to the first to fifth possible implementations of the second aspect, in a sixth possible implementation, X_i(k) has the following form:

X_i(k) = [X_{i1}(k), X_{i2}(k), …, X_{iL}(k)]^T,  i = 0, 1, 2, …, P−1

where X_{i1}(k) denotes the DFT value of the 1st array element of the microphone array at frame i and sampling frequency point k; X_{i2}(k) denotes the DFT value of the 2nd array element at frame i and sampling frequency point k; X_{iL}(k) denotes the DFT value of the L-th array element at frame i and sampling frequency point k; and L is the number of array elements in the microphone array.
The beneficial effects of the present invention are as follows:
The main idea of constructing a focused covariance matrix based on a speech signal in the embodiments of the present invention is: determine the sampling frequency points used by the microphone array when collecting the speech signal; for any one of those sampling frequency points, calculate the first covariance matrix of the speech signal collected at that frequency point, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix, and take the product of the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix as the focused covariance matrix at that frequency point; then take the sum of the focused covariance matrices calculated at all sampling frequency points as the focused covariance matrix of the speech signal. In this scheme, constructing the focused covariance matrix does not require predicting the incidence angle of the sound source, and such predictions always carry errors; the scheme provided by the embodiments therefore improves the accuracy of the constructed focused covariance matrix.
Brief Description of the Drawings
Fig. 1A is a flowchart of constructing a focused covariance matrix based on a speech signal in an embodiment of the present invention;
Fig. 1B is a schematic diagram of the frame shift in an embodiment of the present invention;
Fig. 1C is one comparison between the source-count estimation provided by an embodiment of the present invention and that of CSM-GDE;
Fig. 1D is another comparison between the source-count estimation provided by an embodiment of the present invention and that of CSM-GDE;
Fig. 2 is an embodiment of constructing a focused covariance matrix based on a speech signal in an embodiment of the present invention;
Fig. 3A is a schematic structural diagram of a device for constructing a focused covariance matrix based on a speech signal in an embodiment of the present invention;
Fig. 3B is another schematic structural diagram of a device for constructing a focused covariance matrix based on a speech signal in an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein are only used to illustrate and explain the present invention and are not intended to limit it; moreover, the embodiments of the application and the features of the embodiments may be combined with each other provided they do not conflict.
As shown in Fig. 1A, the flow of constructing a focused covariance matrix based on a speech signal in an embodiment of the present invention is as follows:
Step 100: determine the sampling frequency points used by the microphone array when collecting the speech signal;
Step 110: for any one of the determined sampling frequency points, calculate the first covariance matrix of the speech signal collected at that frequency point, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix, and take the product of the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix as the focused covariance matrix of the speech signal collected at that frequency point;
Step 120: take the sum of the focused covariance matrices calculated at all sampling frequency points as the focused covariance matrix of the speech signal collected by the microphone array.
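The three steps above can be sketched end-to-end with NumPy. This is a hedged illustration, not the patent's reference implementation: the array sizes and the focusing bin k0 are arbitrary choices, the data is random, and the multiplication order T(k) = U(k0) U^H(k) is one reading of the focusing-transform product described later in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
L, P, N = 4, 8, 16          # mics, frames, frequency bins (illustrative sizes)
# X[p, k, :] stands in for the DFT snapshot X_p(k) of the array at frame p, bin k.
X = rng.standard_normal((P, N, L)) + 1j * rng.standard_normal((P, N, L))
k0 = N // 2                  # assumed focusing frequency bin

# Step 110a: first covariance matrix R(k) = (1/P) sum_p X_p(k) X_p(k)^H per bin.
R = np.einsum('pki,pkj->kij', X, X.conj()) / P        # shape (N, L, L)

def eigvecs_desc(M):
    _, U = np.linalg.eigh(M)          # eigh returns ascending eigenvalues
    return U[:, ::-1]                 # reorder columns to descending order

# Steps 110b and 120: focusing transform per bin, then sum the focused matrices.
U0 = eigvecs_desc(R[k0])
R_foc = np.zeros((L, L), dtype=complex)
for k in range(N):
    T = U0 @ eigvecs_desc(R[k]).conj().T   # T(k) = U(k0) U^H(k), unitary
    R_foc += T @ R[k] @ T.conj().T

print(np.allclose(R_foc, R_foc.conj().T))  # True: the focused matrix is Hermitian
```

Because each T(k) is unitary, each term T(k) R(k) T^H(k) keeps the eigenvalues of R(k), which is the point of focusing before summing across bins.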
In an embodiment of the present invention, to improve the accuracy of the constructed focused covariance matrix, after the speech signal collected by the microphone array at a sampling frequency point is obtained, and before the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix are calculated, the following operation is also performed:
pre-emphasis is applied to the collected speech signal.
In that case, calculating the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix can optionally proceed as follows:
apply pre-emphasis to the speech signal collected at the sampling frequency point;
calculate the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix from the pre-emphasized speech signal.
In an embodiment of the present invention, pre-emphasis can optionally be applied to the speech signal as follows:

x̂(k) = x(k) − a·x(k−1),  k = 0, 1, 2, …, N−1   (formula one)

where x̂(k) is the pre-emphasized value of the speech signal at the k-th sample; x(k) is the speech signal collected at the k-th sample; x(k−1) is the speech signal collected at the (k−1)-th sample; N is the number of samples; and a is the pre-emphasis factor, optionally a = 0.9375.
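Formula one can be sketched in a few lines of NumPy. One assumption not stated in the text: the sample before the first, x(−1), is taken as 0 so the first output sample is unchanged.

```python
import numpy as np

def preemphasize(x, a=0.9375):
    """Apply x_hat(k) = x(k) - a * x(k-1), with x(-1) assumed to be 0."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    out[1:] -= a * x[:-1]     # subtract the scaled previous sample
    return out

x = np.array([1.0, 1.0, 1.0, 1.0])
print(preemphasize(x))        # [1.     0.0625 0.0625 0.0625]
```

With the suggested a = 0.9375, a constant (low-frequency) signal is almost cancelled, which is the intended high-frequency boost of pre-emphasis.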
Optionally, the form of X_i(k) is as shown in formula two:

X_i(k) = [X_{i1}(k), X_{i2}(k), …, X_{iL}(k)]^T,  i = 0, 1, 2, …, P−1   (formula two)

where X_{i1}(k) denotes the DFT value of the 1st array element of the microphone array at frame i and sampling frequency point k; X_{i2}(k) denotes the DFT value of the 2nd array element at frame i and sampling frequency point k; …; X_{iL}(k) denotes the DFT value of the L-th array element at frame i and sampling frequency point k; L is the number of array elements in the microphone array; and P is the number of frames of the speech signal collected by the microphone array.
In an embodiment of the present invention, to improve the accuracy of the constructed focused covariance matrix, after the speech signal collected by the microphone array at a sampling frequency point is obtained, and before the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix are calculated, the following operation is also performed:
the collected speech signal is divided into frames.
When calculating the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix, one can optionally:
divide the speech signal collected at the sampling frequency point into frames;
calculate the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix from the framed speech signal.
In an embodiment of the present invention, framing is performed with overlap, that is, adjacent frames overlap; the overlapping part is called the frame shift. Optionally, the frame shift is chosen to be half the frame length. Overlapped framing is shown in Fig. 1B.
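The overlapped framing described above, with the frame shift equal to half the frame length, can be sketched as follows; the frame length is a free parameter, and any trailing samples that do not fill a whole frame are simply dropped (an assumption, since the text does not say how the tail is handled).

```python
import numpy as np

def frame_signal(x, frame_len):
    """Split x into overlapping frames; frame shift = half the frame length."""
    shift = frame_len // 2
    n_frames = 1 + (len(x) - frame_len) // shift
    return np.stack([x[i * shift : i * shift + frame_len] for i in range(n_frames)])

frames = frame_signal(np.arange(10), frame_len=4)
print(frames)
# [[0 1 2 3]
#  [2 3 4 5]
#  [4 5 6 7]
#  [6 7 8 9]]
```

Each sample (away from the edges) appears in exactly two frames, which is the 50% overlap shown in Fig. 1B.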
In an embodiment of the present invention, to further improve the accuracy of the constructed focused covariance matrix, after the received speech signal has been divided into frames, a window is applied to the framed speech signal.
Windowing the framed speech signal can be done as follows:
multiply the framed speech signal by a Hamming window function w(k). Optionally, the Hamming window function w(k) is as shown in formula three:

w(k) = 0.54 − 0.46·cos(π(2k+1)/N),  k = 0, 1, …, N−1   (formula three)

where k is the sample index within a frame and N is the number of samples in any frame; every frame contains the same number of samples.
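The windowing step can be sketched as below. Note the argument π(2k+1)/N follows our reconstruction of formula three from the garbled original, which differs slightly from the more common Hamming form 0.54 − 0.46·cos(2πk/(N−1)); both give a symmetric taper.

```python
import numpy as np

def hamming_window(N):
    """Window of formula three: w(k) = 0.54 - 0.46 * cos(pi * (2k + 1) / N)."""
    k = np.arange(N)
    return 0.54 - 0.46 * np.cos(np.pi * (2 * k + 1) / N)

def window_frame(frame):
    """Multiply one frame point-wise by the window."""
    return frame * hamming_window(len(frame))

w = hamming_window(8)
print(np.allclose(w, w[::-1]))   # True: the window is symmetric
```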
In practice, part of the signal collected by the microphone array may be speech from the target object and part may be signals from non-target objects. For example, in a meeting, the noise present before the speaker starts talking comes from non-target objects; once the speaker starts talking, the microphone array collects the speech signal of the target object. A focused covariance matrix constructed from the speech of the target object is more accurate. Therefore, in an embodiment of the present invention, after the speech signal collected by the microphone array is obtained, and before the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix are calculated, the following operations are also performed:
calculate the energy of the speech signal collected in each frame at the sampling frequency point;
determine the frames whose energy reaches a preset energy threshold;
then, when calculating the first covariance matrix, the focusing transform matrix, and the conjugate transpose of the focusing transform matrix, one can optionally:
calculate them from the speech signal collected at the sampling frequency point within the determined frames.
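The frame-selection step above can be sketched as follows: keep only frames whose energy reaches a preset threshold, so the covariance is built from frames that actually contain the target speech. The energy measure (sum of squared magnitudes) and the threshold value are assumptions; the text leaves both open.

```python
import numpy as np

def select_frames(frames, energy_threshold):
    """Keep frames whose energy reaches the preset threshold."""
    energies = np.sum(np.abs(frames) ** 2, axis=1)    # per-frame energy
    return frames[energies >= energy_threshold]

frames = np.array([[0.01, 0.02, 0.01],    # low-energy (noise-only) frame
                   [0.80, -0.90, 0.70]])  # high-energy (speech) frame
kept = select_frames(frames, energy_threshold=0.5)
print(kept.shape)   # (1, 3): only the speech frame survives
```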
In an embodiment of the present invention, there are several ways to calculate the first covariance matrix; optionally, it is calculated as follows:

R̂(k) = (1/P) · Σ_{i=1}^{P} X_i(k) X_i^H(k),  k = 0, 1, …, N−1   (formula four)

where R̂(k) denotes the first covariance matrix; k denotes the sampling frequency point; P denotes the number of frames of the speech signal collected by the microphone array; X_i(k) denotes the DFT (Discrete Fourier Transform) value of the microphone array at frame i and sampling frequency point k; X_i^H(k) denotes the conjugate transpose of X_i(k); and N denotes the number of sampling frequency points contained in a frame, which is the same for all frames.
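A direct transcription of formula four for a single frequency bin, with the P frame snapshots stacked as rows of a (P, L) array, might look like this:

```python
import numpy as np

def covariance_at_bin(X_k):
    """Formula four at one bin: R(k) = (1/P) sum_i X_i(k) X_i(k)^H.

    X_k: complex array of shape (P, L), row i holding the snapshot X_i(k).
    """
    P = X_k.shape[0]
    return X_k.T @ X_k.conj() / P     # (L, L); entry (m, n) = mean of X_im conj(X_in)

# Single-frame check: the result reduces to the outer product x x^H.
x = np.array([[1 + 1j, 2 + 0j]])
R = covariance_at_bin(x)
print(R)   # [[2.+0.j 2.+2.j]
           #  [2.-2.j 4.+0.j]]
```

The result is Hermitian and positive semidefinite by construction, as a covariance estimate should be.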
In an embodiment of the present invention, before the focusing transform matrix is calculated, the following operations are also performed:
determine a focusing frequency point among the sampling frequency points used by the microphone array when collecting the speech signal;
calculate the second covariance matrix of the speech signal collected by the microphone array at the focusing frequency point.
The focusing transform matrix can then optionally be calculated as follows:
perform eigenvalue decomposition on the first covariance matrix to obtain the first eigenvector matrix, and take its conjugate transpose;
perform eigenvalue decomposition on the second covariance matrix to obtain the second eigenvector matrix;
take the product of the conjugate transpose of the first eigenvector matrix and the second eigenvector matrix as the focusing transform matrix.
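The focusing-transform steps above can be sketched as follows. The multiplication order T(k) = U(k0) U^H(k) is our reading of "the product of the conjugate transpose of the first eigenvector matrix and the second eigenvector matrix"; the text lists the factors without fixing the order.

```python
import numpy as np

def eigvecs_descending(R):
    """Eigenvector matrix of a Hermitian R, columns in descending eigenvalue order."""
    _, U = np.linalg.eigh(R)    # eigh yields ascending eigenvalues
    return U[:, ::-1]

def focusing_transform(R_k, R_k0):
    """T(k) = U(k0) U^H(k) from the first (R_k) and second (R_k0) covariances."""
    return eigvecs_descending(R_k0) @ eigvecs_descending(R_k).conj().T

# Hermitian positive-semidefinite test matrices standing in for the two covariances.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
T = focusing_transform(A @ A.conj().T, B @ B.conj().T)
print(np.allclose(T @ T.conj().T, np.eye(3)))   # True: T is unitary
```

Since T(k) is a product of two unitary eigenvector matrices, it is itself unitary, so the focused term T(k) R̂(k) T^H(k) preserves the eigenvalues of R̂(k).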
In an embodiment of the present invention, the second covariance matrix can optionally be calculated as follows:

R̂(k₀) = (1/P) · Σ_{i=1}^{P} X_i(k₀) X_i^H(k₀)   (formula five)

where R̂(k₀) denotes the second covariance matrix; k₀ denotes the focusing frequency point; P denotes the number of frames of the speech signal collected by the microphone array; X_i(k₀) denotes the DFT value of the microphone array at frame i and the focusing frequency point; and X_i^H(k₀) denotes the conjugate transpose of X_i(k₀).
In an embodiment of the present invention, the eigenvalue decomposition of the first covariance matrix can optionally proceed as follows:

R̂(k) = U(k) Λ U^H(k)   (formula six)

where R̂(k) denotes the first covariance matrix; U(k) denotes the first eigenvector matrix; Λ denotes the diagonal matrix formed by arranging the eigenvalues of R̂(k) in descending order; and U^H(k) denotes the conjugate transpose of U(k).
In the embodiment of the present invention, when performing eigenvalue decomposition on the second covariance matrix, optionally, the following way may be used:
Perform eigenvalue decomposition on the second covariance matrix as:
\hat{R}(k_0) = U(k_0) \Lambda_0 U^H(k_0) (formula seven)
Wherein, \hat{R}(k_0) denotes the second covariance matrix, U(k_0) denotes the second eigenvector matrix, \Lambda_0 denotes the diagonal matrix formed by arranging the eigenvalues in descending order, and U^H(k_0) denotes the conjugate transpose matrix of U(k_0).
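Formulas six and seven require the eigenvalues arranged in descending order. One detail worth a sketch: NumPy's Hermitian eigensolver returns them in ascending order, so both the eigenvalues and the corresponding eigenvector columns must be reversed (names here are illustrative):

```python
import numpy as np

def eig_descending(R):
    """Eigendecomposition R = U diag(w) U^H with w sorted in descending order.

    R must be Hermitian, which a covariance matrix is. numpy.linalg.eigh
    returns ascending eigenvalues, so both the eigenvalues and the matching
    eigenvector columns are reversed here.
    """
    w, U = np.linalg.eigh(R)
    return w[::-1], U[:, ::-1]
```

For a covariance matrix, `eigh` is both faster and numerically safer than the general `eig`; its fixed ascending order is the only reason for the reversal step.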
In the embodiment of the present invention, optionally, the form of X_i(k) is as shown in formula two. In the embodiment of the present invention, after the focusing covariance matrix is calculated, the number of sound sources can be calculated according to the obtained focusing covariance matrix; when doing so, optionally, the following way may be used:
The Gerschgorin disk criterion is adopted to calculate the number of sound sources according to the obtained focusing covariance matrix. For example: in an indoor environment, the room size is 10m × 10m × 3m, and the eight vertex coordinates are (0,0,0), (0,10,0), (0,10,2.5), (0,0,2.5), (10,0,0), (10,10,0), (10,10,2.5) and (10,0,2.5). A uniform linear array composed of 10 microphones is laid out between the points (2,4,1.3) and (2,4.9,1.3); the inter-element spacing is 0.1m, and each array element is an isotropic omnidirectional microphone. The positions of the 6 speakers are (8,1,1.3), (8,2.6,1.3), (8,4.2,1.3), (8,5.8,1.3), (8,7.4,1.3) and (8,9,1.3), and the background noise is assumed to be white Gaussian noise. The image model is used to simulate the propagation of the speakers' speech to the microphone array, the voice signal is sampled at an 8kHz sampling frequency, and the signal received by the microphone array is obtained. The wall reflection coefficient is γ = 0.8 and the number of iterations is 20. The speakers' voice signals last long enough that a different data segment is taken in each experiment; 50 tests are carried out, and the detection probability is as follows:
detection probability = (number of tests in which the number of sound sources is correctly detected) / (total number of tests) (formula eight)
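The Gerschgorin disk estimation step is only named in the text, not spelled out. Below is a sketch under the assumption that it follows the common GDE formulation (eigendecompose the leading principal submatrix of the focusing covariance matrix, take the transformed last column as the Gerschgorin disk radii, and compare each radius against a D-weighted mean radius); the function and variable names are illustrative, and the exact criterion used by the patent may differ:

```python
import numpy as np

def gde_source_count(R, D=0.7):
    """Sketch of a Gerschgorin disk estimator (GDE) for the source number,
    applied to an M x M focusing covariance matrix R.

    Assumed formulation: diagonalize the leading (M-1) x (M-1) block of R;
    the magnitudes of the transformed last column are the Gerschgorin
    radii, and the source count is the number of disks whose radius
    exceeds the D-weighted average radius.
    """
    M = R.shape[0]
    R1 = R[:-1, :-1]                  # leading principal submatrix
    r = R[:-1, -1]                    # last column without the corner entry
    _, U1 = np.linalg.eigh(R1)
    radii = np.abs(U1.conj().T @ r)   # Gerschgorin radii of transformed R
    radii = np.sort(radii)[::-1]      # largest disks first
    avg = D * radii.sum() / (M - 1)
    # GDE criterion: disks with radius above the weighted average
    # correspond to signal sources
    return int(np.sum(radii - avg > 0))
```

The parameter D plays the role of D(K) = 0.7 in the experiments below: a larger D shrinks all disks relative to the threshold and makes the detector more conservative.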
If the actual number of speakers is 2, any frame contains 128 sampling frequencies, the number of frames is 100, the parameter in the Gerschgorin disk criterion is D(K) = 0.7, and the signal-to-noise ratio (SNR) changes from −5dB to 5dB with a step of 1dB, then the detection probability versus SNR of the method of constructing the focusing covariance matrix provided by the embodiment of the present invention and of the existing CSM (Coherent Signal Subspace Method)-GDE (Gerschgorin Disk Estimator) method is contrasted as shown in Figure 1C. It can be seen from Figure 1C that the detection probability of the CSM-GDE method reaches 0.9 when the SNR is 0dB and reaches 1 when the SNR is 4dB. When the SNR is less than 0dB, the correct detection probability of the scheme provided by the present invention is distinctly higher than that of the CSM-GDE method; the detection probability reaches 0.9 when the SNR is −3dB, and the correct detection probability reaches 1 at −3dB.
If the actual number of speakers is 2, the SNR is 10dB, any frame contains 128 sampling frequencies, and the number of frames changes from 5 to 70 with a step of 5, then the detection probability versus the number of frames of the method of constructing the focusing covariance matrix provided by the embodiment of the present invention and of the existing CSM-GDE method is contrasted as shown in Figure 1D. It can be seen from Figure 1D that the detection probability of the CSM-GDE method reaches 0.9 when the number of frames is 40 and reaches 1 when the number of frames is 65. When the number of frames is less than 50, the detection probability of the present scheme is distinctly higher than that of the CSM-GDE method; the detection probability reaches 0.9 when the number of frames is 25 and reaches 1 when the number of frames is 50.
Table 1 gives a performance comparison, for different numbers of speakers, between the method of calculating the number of sound sources from the focusing covariance matrix constructed according to the present scheme and the CSM-GDE method. In this experiment, the SNR is 10dB, the subframe length is 128 points, and the number of frames is 100. As shown in Table 1, when the actual number of speakers is 2 or 3, both methods reach a detection probability of 1; when the actual number of speakers is greater than 3, the detection probability gradually declines as the number of speakers increases, and for the same number of speakers the method based on the focusing covariance matrix constructed according to the present scheme has a higher detection probability than the CSM-GDE method.
Table 1 Detection probability versus the actual number of speakers

Actual number of speakers    2      3      4      5      6
CSM-GDE                      1      1      0.94   0.84   0.66
The present scheme           1      1      0.98   0.90   0.72
In the embodiment of the present invention, adopting the Gerschgorin disk criterion to calculate the number of sound sources according to the obtained focusing covariance matrix is a relatively common way in the art, and is not described in detail here.
In order to understand the embodiment of the present invention better, a specific application scenario is given below, and the process of constructing the focusing covariance matrix based on the voice signal is described in further detail, as shown in Figure 2:
Step 200: determine that the sampling frequencies adopted when the microphone array collects the voice signal are 100 in number: sampling frequency 0, sampling frequency 1, sampling frequency 2, ..., sampling frequency 99;
Step 210: for sampling frequency 0, calculate the first covariance matrix for sampling frequency 0;
Step 220: determine the focusing frequency among the 100 sampling frequencies;
Step 230: calculate the second covariance matrix of the voice signal collected by the microphone array at the focusing frequency;
Step 240: perform eigenvalue decomposition on the first covariance matrix to obtain the first eigenvector matrix, and perform conjugate transposition on the first eigenvector matrix to obtain the conjugate transpose matrix of the first eigenvector matrix;
Step 250: perform eigenvalue decomposition on the second covariance matrix to obtain the second eigenvector matrix;
Step 260: take the product of the conjugate transpose matrix of the first eigenvector matrix and the second eigenvector matrix as the focusing transform matrix, and perform conjugate transposition on the focusing transform matrix to obtain the conjugate transpose matrix of the focusing transform matrix;
Step 270: take the product of the focusing transform matrix, the first covariance matrix and the conjugate transpose matrix of the focusing transform matrix as the focusing covariance matrix of the voice signal collected at sampling frequency 0;
Step 280: calculate the focusing covariance matrices for the other sampling frequencies in the same way as the focusing covariance matrix for sampling frequency 0 is calculated, and take the sum of the focusing covariance matrices for all sampling frequencies as the focusing covariance matrix of the voice signal collected by the microphone array.
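The steps above can be sketched end-to-end in NumPy. This is a non-authoritative illustration: the (frames, microphones, bins) array layout and the factor ordering T(k) = U(k0) U^H(k) for the focusing transform are assumptions (the text lists the two factors of the product without fixing the notation):

```python
import numpy as np

def focusing_covariance(X, k0):
    """Construct the focusing covariance matrix following the step 200-280 flow.

    X : complex array (P frames, L mics, N bins) of per-frame DFT values.
    k0: index of the chosen focusing bin.
    Returns sum_k T(k) R(k) T(k)^H with T(k) = U(k0) U(k)^H, where U(.)
    are the eigenvector matrices of the per-bin and focusing covariances.
    """
    P, L, N = X.shape

    def cov(k):                        # per-bin covariance estimate
        S = X[:, :, k]
        return S.T @ S.conj() / P

    def eigvecs(R):                    # eigenvectors, eigenvalues descending
        _, U = np.linalg.eigh(R)
        return U[:, ::-1]

    U0 = eigvecs(cov(k0))              # second eigenvector matrix, U(k0)
    R_foc = np.zeros((L, L), dtype=complex)
    for k in range(N):
        Rk = cov(k)                    # first covariance matrix at bin k
        T = U0 @ eigvecs(Rk).conj().T  # focusing transform matrix T(k)
        R_foc += T @ Rk @ T.conj().T   # step 270, summed over bins (step 280)
    return R_foc
```

Because each T(k) is unitary, every focused term keeps the eigenvalue mass of its R(k); the summation then aligns all bins into one matrix suitable for the source-number estimation described earlier.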
Based on the technical scheme of the above related method, and referring to Figure 3A, the embodiment of the present invention provides a device for constructing a focusing covariance matrix based on a voice signal. The device comprises a determining unit 30, a first computing unit 31 and a second computing unit 32, wherein:
the determining unit 30 is configured to determine the sampling frequencies adopted by the microphone array when collecting the voice signal;
the first computing unit 31 is configured to, for any one of the determined sampling frequencies, calculate the first covariance matrix and the focusing transform matrix of the voice signal collected at that sampling frequency, as well as the conjugate transpose matrix of the focusing transform matrix, and take the product of the focusing transform matrix, the first covariance matrix and the conjugate transpose matrix of the focusing transform matrix as the focusing covariance matrix of the voice signal collected at that sampling frequency;
the second computing unit 32 is configured to take the sum of the calculated focusing covariance matrices of the voice signal collected at each sampling frequency as the focusing covariance matrix of the voice signal collected by the microphone array.
Optionally, when calculating the first covariance matrix, the first computing unit 31 is specifically configured to:
calculate the first covariance matrix as:
\hat{R}(k) = \frac{1}{P} \sum_{i=1}^{P} X_i(k) X_i^H(k), \quad k = 0, 1, \ldots, N-1
Wherein, \hat{R}(k) denotes the first covariance matrix, k denotes any sampling frequency, P denotes the number of frames of the voice signal collected by the microphone array, X_i(k) denotes the discrete Fourier transform (DFT) value of the microphone array at any frame and any sampling frequency, X_i^H(k) denotes the conjugate transpose matrix of X_i(k), and N denotes the number of sampling frequencies contained in any frame; any two different frames contain the same number of sampling frequencies.
Further, the determining unit 30 is also configured to determine the focusing frequency among the sampling frequencies adopted by the microphone array when collecting the voice signal;
the first computing unit 31 is also configured to calculate the second covariance matrix of the voice signal collected by the microphone array at the focusing frequency;
when calculating the focusing transform matrix, the first computing unit 31 is specifically configured to:
perform eigenvalue decomposition on the first covariance matrix to obtain the first eigenvector matrix, and perform conjugate transposition on the first eigenvector matrix to obtain the conjugate transpose matrix of the first eigenvector matrix;
perform eigenvalue decomposition on the second covariance matrix to obtain the second eigenvector matrix;
take the product of the conjugate transpose matrix of the first eigenvector matrix and the second eigenvector matrix as the focusing transform matrix.
Optionally, when calculating the second covariance matrix, the first computing unit 31 is specifically configured to:
calculate the second covariance matrix as:
\hat{R}(k_0) = \frac{1}{P} \sum_{i=1}^{P} X_i(k_0) X_i^H(k_0)
Wherein, \hat{R}(k_0) denotes the second covariance matrix, k_0 denotes the focusing frequency, P denotes the number of frames of the voice signal collected by the microphone array, X_i(k_0) denotes the DFT value of the microphone array at any frame and the focusing frequency, and X_i^H(k_0) denotes the conjugate transpose matrix of X_i(k_0).
Optionally, when performing eigenvalue decomposition on the first covariance matrix, the first computing unit 31 is specifically configured to:
perform eigenvalue decomposition on the first covariance matrix as:
\hat{R}(k) = U(k) \Lambda U^H(k)
Wherein, \hat{R}(k) denotes the first covariance matrix, U(k) denotes the first eigenvector matrix, \Lambda denotes the diagonal matrix formed by arranging the eigenvalues in descending order, and U^H(k) denotes the conjugate transpose matrix of U(k).
Optionally, when performing eigenvalue decomposition on the second covariance matrix, the first computing unit 31 is specifically configured to:
perform eigenvalue decomposition on the second covariance matrix as:
\hat{R}(k_0) = U(k_0) \Lambda_0 U^H(k_0)
Wherein, \hat{R}(k_0) denotes the second covariance matrix, U(k_0) denotes the second eigenvector matrix, \Lambda_0 denotes the diagonal matrix formed by arranging the eigenvalues in descending order, and U^H(k_0) denotes the conjugate transpose matrix of U(k_0).
Optionally, the form of X_i(k) is as follows:
X_i(k) = [X_{i1}(k), X_{i2}(k), ..., X_{iL}(k)]^T, i = 0, 1, 2, ..., P-1
Wherein, X_{i1}(k) denotes the DFT value of the 1st array element of the microphone array at the i-th frame and the k-th sampling frequency, X_{i2}(k) denotes the DFT value of the 2nd array element of the microphone array at the i-th frame and the k-th sampling frequency, ..., X_{iL}(k) denotes the DFT value of the L-th array element of the microphone array at the i-th frame and the k-th sampling frequency, and L is the number of array elements that the microphone array comprises.
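The snapshot vector X_i(k) is simply the stack of per-microphone DFT values for frame i. A sketch of building the full set of snapshot vectors from multichannel time-domain frames; the (P, L, N) layout and the function name are illustrative assumptions:

```python
import numpy as np

def dft_snapshots(frames):
    """Build the X_i(k) snapshot vectors from time-domain frames.

    frames: real array (P, L, N) -- P frames, L microphones, N samples
            per frame (N is also the number of frequency bins).
    Returns a complex array (P, L, N) whose [i, :, k] slice is
    X_i(k) = [X_i1(k), ..., X_iL(k)]^T.
    """
    # N-point DFT along the time axis of every microphone channel
    return np.fft.fft(frames, axis=-1)
```

For the scenario above (100 frames, 10 microphones, 128-point subframes), the result has shape (100, 10, 128), and `X[i, :, k]` feeds directly into the covariance estimates of formulas one and five.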
As shown in Figure 3B, another structural schematic of the device for constructing a focusing covariance matrix based on a voice signal provided by the embodiment of the present invention comprises at least one processor 301, a communication bus 302, a memory 303 and at least one communication interface 304.
The communication bus 302 is used to realize the connection and communication between the above components, and the communication interface 304 is used to connect and communicate with external devices.
The memory 303 is used to store executable program code; by executing this program code, the processor 301 is used to:
determine the sampling frequencies adopted by the microphone array when collecting the voice signal;
for any one of the determined sampling frequencies, calculate the first covariance matrix and the focusing transform matrix of the voice signal collected at that sampling frequency, as well as the conjugate transpose matrix of the focusing transform matrix, and take the product of the focusing transform matrix, the first covariance matrix and the conjugate transpose matrix of the focusing transform matrix as the focusing covariance matrix of the voice signal collected at that sampling frequency;
take the sum of the calculated focusing covariance matrices of the voice signal collected at each sampling frequency as the focusing covariance matrix of the voice signal collected by the microphone array.
Optionally, when calculating the first covariance matrix, the processor 301 is specifically configured to:
calculate the first covariance matrix as:
\hat{R}(k) = \frac{1}{P} \sum_{i=1}^{P} X_i(k) X_i^H(k), \quad k = 0, 1, \ldots, N-1
Wherein, \hat{R}(k) denotes the first covariance matrix, k denotes any sampling frequency, P denotes the number of frames of the voice signal collected by the microphone array, X_i(k) denotes the discrete Fourier transform (DFT) value of the microphone array at any frame and any sampling frequency, X_i^H(k) denotes the conjugate transpose matrix of X_i(k), and N denotes the number of sampling frequencies contained in any frame; any two different frames contain the same number of sampling frequencies.
Further, before calculating the focusing transform matrix, the processor 301 also:
determines the focusing frequency among the sampling frequencies adopted by the microphone array when collecting the voice signal;
calculates the second covariance matrix of the voice signal collected by the microphone array at the focusing frequency;
and calculating the focusing transform matrix specifically comprises:
performing eigenvalue decomposition on the first covariance matrix to obtain the first eigenvector matrix, and performing conjugate transposition on the first eigenvector matrix to obtain the conjugate transpose matrix of the first eigenvector matrix;
performing eigenvalue decomposition on the second covariance matrix to obtain the second eigenvector matrix;
taking the product of the conjugate transpose matrix of the first eigenvector matrix and the second eigenvector matrix as the focusing transform matrix.
Optionally, when calculating the second covariance matrix, the processor 301 is specifically configured to:
calculate the second covariance matrix as:
\hat{R}(k_0) = \frac{1}{P} \sum_{i=1}^{P} X_i(k_0) X_i^H(k_0)
Wherein, \hat{R}(k_0) denotes the second covariance matrix, k_0 denotes the focusing frequency, P denotes the number of frames of the voice signal collected by the microphone array, X_i(k_0) denotes the DFT value of the microphone array at any frame and the focusing frequency, and X_i^H(k_0) denotes the conjugate transpose matrix of X_i(k_0).
Optionally, when performing eigenvalue decomposition on the first covariance matrix, the processor 301 is specifically configured to:
perform eigenvalue decomposition on the first covariance matrix as:
\hat{R}(k) = U(k) \Lambda U^H(k)
Wherein, \hat{R}(k) denotes the first covariance matrix, U(k) denotes the first eigenvector matrix, \Lambda denotes the diagonal matrix formed by arranging the eigenvalues in descending order, and U^H(k) denotes the conjugate transpose matrix of U(k).
Optionally, when performing eigenvalue decomposition on the second covariance matrix, the processor 301 is specifically configured to:
perform eigenvalue decomposition on the second covariance matrix as:
\hat{R}(k_0) = U(k_0) \Lambda_0 U^H(k_0)
Wherein, \hat{R}(k_0) denotes the second covariance matrix, U(k_0) denotes the second eigenvector matrix, \Lambda_0 denotes the diagonal matrix formed by arranging the eigenvalues in descending order, and U^H(k_0) denotes the conjugate transpose matrix of U(k_0).
In the embodiment of the present invention, optionally, the form of X_i(k) is as follows:
X_i(k) = [X_{i1}(k), X_{i2}(k), ..., X_{iL}(k)]^T, i = 0, 1, 2, ..., P-1
Wherein, X_{i1}(k) denotes the DFT value of the 1st array element of the microphone array at the i-th frame and the k-th sampling frequency, X_{i2}(k) denotes the DFT value of the 2nd array element of the microphone array at the i-th frame and the k-th sampling frequency, ..., X_{iL}(k) denotes the DFT value of the L-th array element of the microphone array at the i-th frame and the k-th sampling frequency, and L is the number of array elements that the microphone array comprises.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or other programmable data processing device to work in a specific way, so that the instructions stored in the computer-readable memory produce a manufacture comprising an instruction device, the instruction device realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a sequence of operation steps is executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the preferred embodiments of the present invention have been described, those skilled in the art, once they learn of the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments of the present invention. Thus, if these changes and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to cover these changes and variations.

Claims (14)

1. A method for constructing a focusing covariance matrix based on a voice signal, characterized by comprising:
determining the sampling frequencies adopted by a microphone array when collecting the voice signal;
for any one of the determined sampling frequencies, calculating a first covariance matrix and a focusing transform matrix of the voice signal collected at said sampling frequency, as well as a conjugate transpose matrix of said focusing transform matrix, and taking the product of said focusing transform matrix, said first covariance matrix and the conjugate transpose matrix of said focusing transform matrix as the focusing covariance matrix of the voice signal collected at said sampling frequency;
taking the sum of the calculated focusing covariance matrices of the voice signal collected at each sampling frequency as the focusing covariance matrix of the voice signal collected by said microphone array.
2. The method according to claim 1, characterized in that calculating said first covariance matrix specifically comprises:
calculating said first covariance matrix as:
\hat{R}(k) = \frac{1}{P} \sum_{i=1}^{P} X_i(k) X_i^H(k), \quad k = 0, 1, \ldots, N-1
wherein said \hat{R}(k) denotes said first covariance matrix, said k denotes said sampling frequency, said P denotes the number of frames of said voice signal collected by said microphone array, said X_i(k) denotes the discrete Fourier transform (DFT) value of said microphone array at any frame and said sampling frequency, said X_i^H(k) denotes the conjugate transpose matrix of said X_i(k), and said N denotes the number of sampling frequencies contained in any frame, the number of sampling frequencies contained in any two different frames being identical.
3. The method according to claim 1 or 2, characterized in that, before calculating said focusing transform matrix, the method further comprises:
determining the focusing frequency among the sampling frequencies adopted by said microphone array when collecting the voice signal;
calculating a second covariance matrix of the voice signal collected by said microphone array at said focusing frequency;
and that calculating said focusing transform matrix specifically comprises:
performing eigenvalue decomposition on said first covariance matrix to obtain a first eigenvector matrix, and performing conjugate transposition on said first eigenvector matrix to obtain the conjugate transpose matrix of said first eigenvector matrix;
performing eigenvalue decomposition on said second covariance matrix to obtain a second eigenvector matrix;
taking the product of the conjugate transpose matrix of said first eigenvector matrix and said second eigenvector matrix as said focusing transform matrix.
4. The method according to claim 3, characterized in that calculating said second covariance matrix specifically comprises:
calculating said second covariance matrix as:
\hat{R}(k_0) = \frac{1}{P} \sum_{i=1}^{P} X_i(k_0) X_i^H(k_0)
wherein said \hat{R}(k_0) denotes said second covariance matrix, said k_0 denotes said focusing frequency, said P denotes the number of frames of said voice signal collected by said microphone array, said X_i(k_0) denotes the DFT value of said microphone array at any frame and said focusing frequency, and said X_i^H(k_0) denotes the conjugate transpose matrix of said X_i(k_0).
5. The method according to claim 3 or 4, characterized in that performing eigenvalue decomposition on said first covariance matrix specifically comprises:
performing eigenvalue decomposition on said first covariance matrix as:
\hat{R}(k) = U(k) \Lambda U^H(k)
wherein said \hat{R}(k) denotes said first covariance matrix, said U(k) denotes said first eigenvector matrix, said \Lambda denotes the diagonal matrix formed by arranging said eigenvalues in descending order, and said U^H(k) denotes the conjugate transpose matrix of said U(k).
6. The method according to any one of claims 3-5, characterized in that performing eigenvalue decomposition on said second covariance matrix specifically comprises:
performing eigenvalue decomposition on said second covariance matrix as:
\hat{R}(k_0) = U(k_0) \Lambda_0 U^H(k_0)
wherein said \hat{R}(k_0) denotes said second covariance matrix, said U(k_0) denotes said second eigenvector matrix, said \Lambda_0 denotes the diagonal matrix formed by arranging said eigenvalues in descending order, and said U^H(k_0) denotes the conjugate transpose matrix of said U(k_0).
7. The method according to any one of claims 2-6, characterized in that the form of said X_i(k) is as follows:
X_i(k) = [X_{i1}(k), X_{i2}(k), ..., X_{iL}(k)]^T, i = 0, 1, 2, ..., P-1
wherein X_{i1}(k) denotes the DFT value of the 1st array element of said microphone array at the i-th frame and the k-th sampling frequency, X_{i2}(k) denotes the DFT value of the 2nd array element of said microphone array at the i-th frame and the k-th sampling frequency, ..., X_{iL}(k) denotes the DFT value of the L-th array element of said microphone array at the i-th frame and the k-th sampling frequency, and said L is the number of array elements comprised in said microphone array.
8. A device for constructing a focusing covariance matrix based on a voice signal, characterized by comprising:
a determining unit, configured to determine the sampling frequencies adopted by a microphone array when collecting the voice signal;
a first computing unit, configured to, for any one of the determined sampling frequencies, calculate a first covariance matrix and a focusing transform matrix of the voice signal collected at said sampling frequency, as well as a conjugate transpose matrix of said focusing transform matrix, and take the product of said focusing transform matrix, said first covariance matrix and the conjugate transpose matrix of said focusing transform matrix as the focusing covariance matrix of the voice signal collected at said sampling frequency;
a second computing unit, configured to take the sum of the calculated focusing covariance matrices of the voice signal collected at each sampling frequency as the focusing covariance matrix of the voice signal collected by said microphone array.
9. The device according to claim 8, characterized in that, when calculating said first covariance matrix, said first computing unit is specifically configured to:
calculate said first covariance matrix as:
\hat{R}(k) = \frac{1}{P} \sum_{i=1}^{P} X_i(k) X_i^H(k), \quad k = 0, 1, \ldots, N-1
wherein said \hat{R}(k) denotes said first covariance matrix, said k denotes said sampling frequency, said P denotes the number of frames of said voice signal collected by said microphone array, said X_i(k) denotes the discrete Fourier transform (DFT) value of said microphone array at any frame and said sampling frequency, said X_i^H(k) denotes the conjugate transpose matrix of said X_i(k), and said N denotes the number of sampling frequencies contained in any frame, the number of sampling frequencies contained in any two different frames being identical.
10. The device according to claim 8 or 9, characterized in that said determining unit is also configured to determine the focusing frequency among the sampling frequencies adopted by said microphone array when collecting the voice signal;
said first computing unit is also configured to calculate a second covariance matrix of the voice signal collected by said microphone array at said focusing frequency;
and, when calculating said focusing transform matrix, said first computing unit is specifically configured to:
perform eigenvalue decomposition on said first covariance matrix to obtain a first eigenvector matrix, and perform conjugate transposition on said first eigenvector matrix to obtain the conjugate transpose matrix of said first eigenvector matrix;
perform eigenvalue decomposition on said second covariance matrix to obtain a second eigenvector matrix;
take the product of the conjugate transpose matrix of said first eigenvector matrix and said second eigenvector matrix as said focusing transform matrix.
11. The device according to claim 10, characterized in that, when calculating said second covariance matrix, said first computing unit is specifically configured to:
calculate said second covariance matrix as:
\hat{R}(k_0) = \frac{1}{P} \sum_{i=1}^{P} X_i(k_0) X_i^H(k_0)
wherein said \hat{R}(k_0) denotes said second covariance matrix, said k_0 denotes said focusing frequency, said P denotes the number of frames of said voice signal collected by said microphone array, said X_i(k_0) denotes the DFT value of said microphone array at any frame and said focusing frequency, and said X_i^H(k_0) denotes the conjugate transpose matrix of said X_i(k_0).
12. The device according to claim 10 or 11, characterized in that, when performing eigenvalue decomposition on said first covariance matrix, said first computing unit is specifically configured to:
perform eigenvalue decomposition on said first covariance matrix as:
\hat{R}(k) = U(k) \Lambda U^H(k)
wherein said \hat{R}(k) denotes said first covariance matrix, said U(k) denotes said first eigenvector matrix, said \Lambda denotes the diagonal matrix formed by arranging said eigenvalues in descending order, and said U^H(k) denotes the conjugate transpose matrix of said U(k).
13. The device according to any one of claims 10-12, characterized in that, when performing eigenvalue decomposition on said second covariance matrix, said first computing unit is specifically configured to:
perform eigenvalue decomposition on said second covariance matrix as:
\hat{R}(k_0) = U(k_0) \Lambda_0 U^H(k_0)
wherein said \hat{R}(k_0) denotes said second covariance matrix, said U(k_0) denotes said second eigenvector matrix, said \Lambda_0 denotes the diagonal matrix formed by arranging said eigenvalues in descending order, and said U^H(k_0) denotes the conjugate transpose matrix of said U(k_0).
14. The device according to any one of claims 9-13, characterized in that the form of said X_i(k) is as follows:
X_i(k) = [X_{i1}(k), X_{i2}(k), ..., X_{iL}(k)]^T, i = 0, 1, 2, ..., P-1
wherein X_{i1}(k) denotes the DFT value of the 1st array element of said microphone array at the i-th frame and the k-th sampling frequency, X_{i2}(k) denotes the DFT value of the 2nd array element of said microphone array at the i-th frame and the k-th sampling frequency, ..., X_{iL}(k) denotes the DFT value of the L-th array element of said microphone array at the i-th frame and the k-th sampling frequency, and said L is the number of array elements comprised in said microphone array.
CN201510052368.7A 2015-01-30 2015-01-30 Speech signal based focus covariance matrix construction method and device Pending CN104599679A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510052368.7A CN104599679A (en) 2015-01-30 2015-01-30 Speech signal based focus covariance matrix construction method and device
PCT/CN2015/082571 WO2016119388A1 (en) 2015-01-30 2015-06-26 Method and device for constructing focus covariance matrix on the basis of voice signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510052368.7A CN104599679A (en) 2015-01-30 2015-01-30 Speech signal based focus covariance matrix construction method and device

Publications (1)

Publication Number Publication Date
CN104599679A true CN104599679A (en) 2015-05-06

Family

ID=53125412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510052368.7A Pending CN104599679A (en) 2015-01-30 2015-01-30 Speech signal based focus covariance matrix construction method and device

Country Status (2)

Country Link
CN (1) CN104599679A (en)
WO (1) WO2016119388A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016119388A1 (en) * 2015-01-30 2016-08-04 华为技术有限公司 Method and device for constructing focus covariance matrix on the basis of voice signal
CN108538306A (en) * 2017-12-29 2018-09-14 北京声智科技有限公司 Improve the method and device of speech ciphering equipment DOA estimations
CN110992977A (en) * 2019-12-03 2020-04-10 北京声智科技有限公司 Method and device for extracting target sound source

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN110501727B (en) * 2019-08-13 2023-10-20 中国航空工业集团公司西安飞行自动控制研究所 Satellite navigation anti-interference method based on space-frequency adaptive filtering
CN111696570B (en) * 2020-08-17 2020-11-24 北京声智科技有限公司 Voice signal processing method, device, equipment and storage medium
CN113409804A (en) * 2020-12-22 2021-09-17 声耕智能科技(西安)研究院有限公司 Multichannel frequency domain speech enhancement algorithm based on variable-span generalized subspace

Citations (4)

Publication number Priority date Publication date Assignee Title
US20040220800A1 (en) * 2003-05-02 2004-11-04 Samsung Electronics Co., Ltd Microphone array method and system, and speech recognition method and system using the same
CN102568493A (en) * 2012-02-24 2012-07-11 大连理工大学 Underdetermined blind source separation (UBSS) method based on maximum matrix diagonal rate
CN102664666A (en) * 2012-04-09 2012-09-12 电子科技大学 Efficient robust self-adapting beam forming method of broadband
CN104166120A (en) * 2014-07-04 2014-11-26 哈尔滨工程大学 Acoustic vector circular matrix steady broadband MVDR orientation estimation method

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN102621527B (en) * 2012-03-20 2014-06-11 哈尔滨工程大学 Broad band coherent source azimuth estimating method based on data reconstruction
CN104599679A (en) * 2015-01-30 2015-05-06 华为技术有限公司 Speech signal based focus covariance matrix construction method and device

Also Published As

Publication number Publication date
WO2016119388A1 (en) 2016-08-04

Similar Documents

Publication Publication Date Title
CN104599679A (en) Speech signal based focus covariance matrix construction method and device
CN110600017B (en) Training method of voice processing model, voice recognition method, system and device
CN103308889B (en) Passive sound source two-dimensional DOA (direction of arrival) estimation method under complex environment
US20180262832A1 (en) Sound Signal Processing Apparatus and Method for Enhancing a Sound Signal
CN101430882B (en) Method and apparatus for restraining wind noise
CN103346845B (en) Based on blind frequency spectrum sensing method and the device of fast Fourier transform
Dorfan et al. Tree-based recursive expectation-maximization algorithm for localization of acoustic sources
CN105301563B (en) A kind of double sound source localization method that least square method is converted based on consistent focusing
CN102707262A (en) Sound localization system based on microphone array
CN103871420B (en) The signal processing method of microphone array and device
CN111856402B (en) Signal processing method and device, storage medium and electronic device
CN113593548B (en) Method and device for waking up intelligent equipment, storage medium and electronic device
CN105609112A (en) Sound source positioning method and apparatus and time delay estimation method and apparatus
CN112802486B (en) Noise suppression method and device and electronic equipment
CN115267671A (en) Distributed voice interaction terminal equipment and sound source positioning method and device thereof
JPWO2018003158A1 (en) Correlation function generation device, correlation function generation method, correlation function generation program and wave source direction estimation device
CN102568473A (en) Method and device for recording voice signals
CN104410762A (en) Steady echo cancellation method in hand free cell phone conversation system
CN104424954B (en) noise estimation method and device
CN109282819B (en) Ultra-wideband positioning method based on distributed hybrid filtering
CN116631438A (en) Width learning and secondary correlation sound source positioning method based on minimum p norm
CN105676167B (en) A kind of robust monolingual sound source DOA method of estimation converted based on acoustics vector sensor and bispectrum
CN113948101A (en) Noise suppression method and device based on spatial discrimination detection
CN113035174A (en) Voice recognition processing method, device, equipment and system
CN111354341A (en) Voice awakening method and device, processor, sound box and television

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150506