CN107274887A - Secondary speaker feature extraction method based on the fused feature MGFCC - Google Patents

Secondary speaker feature extraction method based on the fused feature MGFCC

Info

Publication number
CN107274887A
CN107274887A
Authority
CN
China
Prior art keywords
speaker
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710322792.8A
Other languages
Chinese (zh)
Inventor
张毅
王可佳
颜博
乐聪聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN201710322792.8A priority Critical patent/CN107274887A/en
Publication of CN107274887A publication Critical patent/CN107274887A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L 17/02 Speaker identification or verification: preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
    • G10L 19/02 Speech or audio signal analysis-synthesis for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L 19/26 Pre-filtering or post-filtering
    • G10L 25/24 Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L 25/45 Speech or voice analysis techniques characterised by the type of analysis window
    • G06F 2218/08 Aspects of pattern recognition specially adapted for signal processing: feature extraction


Abstract

The invention discloses a secondary speaker feature extraction method based on the fused feature MGFCC. The method comprises the steps of: S1, processing the speaker's speech with a Mel filter bank to obtain MFCC features; S2, simultaneously processing the speaker's speech with a Gammatone filter bank to obtain GFCC features; S3, computing the per-dimension feature discrimination of the two feature types in a noisy environment; S4, counting, for each dimension of the two feature types, the number of times it attains the maximum F_R value; S5, fusing the features according to the difference in those maximum counts under the noise background; S6, differencing and recombining the fused feature to obtain the secondary extracted feature. The present invention can extract features that characterize the speaker more comprehensively.

Description

Secondary speaker feature extraction method based on the fused feature MGFCC
Technical field
The present invention relates to the field of speaker recognition technology, and more particularly to a secondary speaker feature extraction method based on the fused feature MGFCC.
Background art
After a series of pre-processing steps, feature extraction must be performed on the speaker's speech to produce a sequence of mathematical vectors that serves as the input to the training and recognition stages of a speaker recognition system. The quality of the extracted features is therefore critical to the training of the speaker recognition model and the determination of its parameters, and affects the design and performance of the entire system.
The choice of speaker features directly affects the performance of the subsequent speaker recognition system and is the foundation on which such a system is built. For a speaker recognition system deployed in a practical scenario, the choice of feature parameters must consider not only the recognition rate but also the stability and robustness of the overall system. Extracting optimal speaker feature parameters is therefore a particularly important stage of the whole speaker recognition system, and also one of the difficult problems in speech signal processing, with a direct impact on speaker recognition performance.
Summary of the invention
To improve the recognition rate of a speaker recognition system in noisy environments, the present invention studies speaker feature extraction from a bionic perspective, based on the auditory characteristics of the human ear. A Gammatone filter bank and a Mel filter bank, both modeled on human auditory characteristics, are first selected to simulate the human cochlea; the features are then fused according to the discrimination of the Mel-frequency cepstral coefficients and the Gammatone-frequency cepstral coefficients in a noisy environment, yielding a speaker fusion feature MGFCC based on human auditory characteristics.
To achieve these goals, the present invention adopts the following technical scheme: a secondary speaker feature extraction method based on the fused feature MGFCC, characterized by comprising the following steps:
S1: process the speaker's speech signal with a Mel filter bank to obtain MFCC features;
S2: simultaneously process the speaker's speech signal with a Gammatone filter bank to obtain GFCC features;
S3: compute the per-dimension feature discrimination F_R of the MFCC features and the GFCC features in a noisy environment, respectively;
S4: for each dimension of the MFCC features and the GFCC features, count the number of times it attains the maximum feature discrimination;
S5: fuse the features according to the maximum-discrimination counts of the two feature types under the noise background obtained in step S4;
S6: difference and recombine the fused feature obtained in step S5 to obtain the secondary extracted feature.
The MFCC feature extraction method of step S1 is:
S11: Pre-emphasis of the speaker's speech signal: the signal is processed with a digital filter whose Z-domain transfer function is H(z) = 1 - 0.95z^{-1};
S12: Framing and windowing of the signal produced by step S11, where each frame contains N samples and the window function is w(n); the windowed speech signal s_w(n) is:
s_w(n) = y(n) w(n)
where y(n) is the pre-emphasized signal and 0 ≤ n ≤ N;
The window function is a Hamming window, chosen for its wide main lobe and low side lobes:
w(n) = \begin{cases} 0.54 - 0.46\cos\!\left[\frac{2\pi n}{N-1}\right], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}
S13: Fast Fourier transform: the signal produced by S12 undergoes a fast Fourier transform and is converted from the time domain to the frequency domain, giving the linear speech spectrum X(k):
X(k) = \sum_{n=0}^{N-1} s_w(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S14: The line energy of the data after each frame's fast Fourier transform is computed: E(k) = |X(k)|^2;
S15: The logarithm of each Mel filter's output is taken, giving the log spectrum S(m):
S(m) = \ln\!\left( \sum_{k=0}^{N-1} |X(k)|^2 H_m(k) \right), \quad 0 \le m < M
where H_m(k) denotes the frequency response of the m-th Mel filter and M denotes the number of Mel filters;
S16: A discrete cosine transform is applied to the log spectrum S(m) to obtain the MFCC feature; the n-th dimension C(n) is:
C(n) = \sum_{m=1}^{M-1} S(m) \cos\!\left[ \frac{\pi n (m + 1/2)}{M} \right], \quad 0 \le n < M
The GFCC feature extraction method of step S2 is:
S21: After pre-processing, the speaker's speech signal s(n) is converted into the time-domain signal x(n), and the discrete power spectrum L(k) is obtained by fast Fourier transform:
L(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S22: The square of the above discrete power spectrum L(k) is taken to obtain the speech energy spectrum, which is then filtered with the Gammatone filter bank;
S23: The output of each filter is exponentially compressed, yielding a set of energy spectra s_1, s_2, s_3, ..., s_M:
s_m = \sum_{k=1}^{N} \left[ L(k)^2 H_m(k) \right]^{e(f)}
where e(f) is the exponential compression value, M is the number of filter channels, 1 ≤ m ≤ M, and H_m(k) denotes the frequency response of the m-th Gammatone filter; throughout the present invention, H_m(k) denotes a filter's frequency response.
S24: A DCT is applied to the compressed energy spectrum to obtain the GFCC features:
C_{GFCC}(j) = \sqrt{\frac{2}{M}} \sum_{m=1}^{M} s_m \cos\!\left[ \frac{\pi j (m - 1/2)}{M} \right], \quad j = 1, 2, 3, \ldots, L
where L is the dimension of the feature parameters, C_{GFCC}(j) is the GFCC feature parameter of dimension j, and M is the number of filters.
The feature discrimination F_R is the ratio of a feature's between-class variance to its within-class variance:
F_R = \frac{ \sum_{i=1}^{H} (\mu_i - \mu)^2 }{ \frac{1}{K} \sum_{i=1}^{H} \sum_{j=1}^{K} \left( x_i^j - \mu_i \right)^2 }, \quad i = 1, 2, 3, \ldots, H, \; j = 1, 2, 3, \ldots, K
\mu_i = \frac{1}{K} \sum_{j=1}^{K} x_i^j, \qquad \mu = \frac{1}{H} \sum_{i=1}^{H} \mu_i
where μ is the feature mean over all speakers, x_i^j is the feature value of the j-th frame of the i-th speaker, μ_i is the feature mean of the i-th speaker, H is the total number of speakers, and K is the number of speech frames per speaker.
The feature recombination yields the secondary extracted feature F_MGFCC according to the following formula:
F\_MGFCC = \begin{cases} MGFCC_i, & i = 0, 1, 2, \ldots, P-1 \\ MGFCC\_D_{(i-p)}, & i = p, p+1, p+2, \ldots, 2p-1 \end{cases}
where MGFCC_i denotes the fused feature, MGFCC_D_{(i-p)} denotes the differenced fused feature, and P is the feature order.
In summary, by adopting the above technical scheme, the beneficial effects of the invention are: under identical experimental conditions, the secondary feature extraction algorithm based on the fused feature MGFCC retains a good recognition rate under complex noise and exhibits strong robustness; it can further extract hidden speaker characteristics and substantially improve the performance of a speaker recognition system.
Brief description of the drawings
Fig. 1 is a flow chart of the extraction of the MFCC feature parameters;
Fig. 2 is a flow chart of the extraction of the GFCC feature parameters;
Fig. 3 is a flow chart of the secondary speaker feature extraction based on the fused feature MGFCC.
Embodiment
The technical scheme in the embodiments of the present invention is described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
Fig. 1 is a flow chart of the extraction of the MFCC feature parameters, Fig. 2 is a flow chart of the extraction of the GFCC feature parameters, and Fig. 3 is a flow chart of the secondary speaker feature extraction based on the fused feature MGFCC. As shown in the figures, the present invention provides a secondary speaker feature extraction method based on the fused feature MGFCC, comprising the following steps:
S1: process the speaker's speech with a Mel filter bank to obtain MFCC features;
S2: simultaneously process the speaker's speech with a Gammatone filter bank to obtain GFCC features;
S3: compute the per-dimension feature discrimination of the two feature types in a noisy environment;
S4: for each dimension of the two feature types, count the number of times it attains the maximum F_R value;
S5: fuse the features according to the difference in the maximum-F_R counts of the two feature types under the noise background;
S6: difference and recombine the fused feature to obtain the secondary extracted feature.
The secondary speaker feature extraction method based on the fused feature MGFCC of step S1 includes the extraction of the MFCC features; its steps are (see Fig. 1):
S11: Pre-emphasis. The vocal cords and lips introduce a certain low-frequency influence during phonation. To boost the high-frequency part of the speech signal and thereby flatten its spectrum, pre-emphasis can be applied, usually with a digital filter; the Z-domain transfer function used here is:
H(z) = 1 - 0.95z^{-1}
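Purely as an illustration, a minimal NumPy sketch of this pre-emphasis step might look as follows; the function name and signature are our own, not part of the patent:

```python
import numpy as np

def pre_emphasis(x: np.ndarray, alpha: float = 0.95) -> np.ndarray:
    """Apply y[n] = x[n] - alpha * x[n-1], i.e. H(z) = 1 - alpha * z^(-1)."""
    return np.append(x[0], x[1:] - alpha * x[:-1])
```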
S12: Framing and windowing. Because speech is short-term stationary, the signal can be divided into frames, each containing N samples; N is usually 256, corresponding to roughly 30 ms. Because of the edge effect of a speech frame, its two ends change sharply, so the pre-emphasized signal must be windowed. With the window function defined as w(n), the windowed speech signal s_w(n) is:
s_w(n) = y(n) w(n)
where y(n) is the pre-emphasized signal and 0 ≤ n ≤ N.
S13: To better express the speaker's characteristics, a Hamming window is usually chosen, since its main lobe is wide and its side lobes are low:
w(n) = \begin{cases} 0.54 - 0.46\cos\!\left[\frac{2\pi n}{N-1}\right], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}
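A short sketch of the framing and windowing steps. The patent does not specify a frame shift; the 128-sample hop (50% overlap) below is an assumption:

```python
import numpy as np

def frame_and_window(y: np.ndarray, n: int = 256, hop: int = 128) -> np.ndarray:
    """Split the pre-emphasized signal y into frames of n samples and apply
    the Hamming window w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    n_frames = 1 + (len(y) - n) // hop
    frames = np.stack([y[i * hop : i * hop + n] for i in range(n_frames)])
    return frames * np.hamming(n)  # s_w(n) = y(n) * w(n), frame by frame
```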
S14: Fast Fourier transform (FFT). After pre-processing, each frame of the speech signal undergoes the corresponding FFT and is transformed from the time domain to the frequency domain, giving the linear speech spectrum X(k):
X(k) = \sum_{n=0}^{N-1} s_w(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S15: Line energy. The line energy of the data after each frame's FFT is computed:
E(k) = |X(k)|^2
S16: To better improve the recognition performance of the speaker recognition system, the logarithm of each filter's output is usually taken, giving the log spectrum S(m):
S(m) = \ln\!\left( \sum_{k=0}^{N-1} |X(k)|^2 H_m(k) \right), \quad 0 \le m < M
where H_m(k) denotes the frequency response of the m-th Mel filter and M denotes the number of Mel filters.
S17: Discrete cosine transform (DCT). A DCT is applied to the log spectrum S(m) to obtain the Mel feature MFCC; the n-th dimension C(n) is:
C(n) = \sum_{m=1}^{M-1} S(m) \cos\!\left[ \frac{\pi n (m + 1/2)}{M} \right], \quad 0 \le n < M
where M denotes the number of filters. A sketch of steps S14 to S17 is given below.
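The following sketch strings steps S14 to S17 together. The triangular Mel filterbank construction is a standard design that the patent does not spell out, and the sample rate, FFT size, and small log floor are assumptions of ours:

```python
import numpy as np

def mel_filterbank(n_filters: int = 24, n_fft: int = 256, sr: int = 8000) -> np.ndarray:
    """Triangular Mel filterbank H_m(k) over the n_fft // 2 + 1 FFT bins."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv(np.linspace(0.0, mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    H = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        H[i - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        H[i - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    return H

def mfcc(frames: np.ndarray, H: np.ndarray, n_ceps: int = 24) -> np.ndarray:
    """Steps S14-S17: FFT -> line energy -> log Mel spectrum -> DCT."""
    X = np.fft.rfft(frames, axis=1)      # S14: linear spectrum X(k)
    E = np.abs(X) ** 2                   # S15: line energy E(k) = |X(k)|^2
    S = np.log(E @ H.T + 1e-12)          # S16: log spectrum S(m); floor avoids log(0)
    M = H.shape[0]
    m = np.arange(M)
    # S17: DCT, C(n) = sum_m S(m) * cos(pi * n * (m + 1/2) / M)
    return np.stack([(S * np.cos(np.pi * n * (m + 0.5) / M)).sum(axis=1)
                     for n in range(n_ceps)], axis=1)
```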
The described secondary speaker feature extraction method based on the fused feature MGFCC includes the extraction of the GFCC features; its steps are (see Fig. 2, and the sketch after step S24):
S21: After pre-processing, the speech signal s(n) is converted into the time-domain signal x(n), and the discrete power spectrum L(k) is obtained by fast Fourier transform.
S22: The square of the above power spectrum L(k) is taken to obtain the speech energy spectrum, which is then filtered with the Gammatone filter bank.
S23: To further improve the noise immunity of the speaker recognition system, the output of each filter is exponentially compressed, yielding a set of energy spectra s_1, s_2, s_3, ..., s_M:
s_m = \sum_{k=1}^{N} \left[ L(k)^2 H_m(k) \right]^{e(f)}
where e(f) is the exponential compression value and M is the number of filter channels.
S24: A DCT is applied to the compressed energy spectrum to obtain the GFCC features:
C_{GFCC}(j) = \sqrt{\frac{2}{M}} \sum_{m=1}^{M} s_m \cos\!\left[ \frac{\pi j (m - 1/2)}{M} \right], \quad j = 1, 2, \ldots, L
where L is the dimension of the feature parameters.
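A corresponding sketch of steps S21 to S24. It assumes a precomputed Gammatone filterbank magnitude response H_gt with one row per channel (the patent does not fix the filter design; fourth-order gammatone filters on an ERB scale would be a typical choice), and the compression exponent value of 0.33 is likewise our assumption:

```python
import numpy as np

def gfcc(frames: np.ndarray, H_gt: np.ndarray, e_f: float = 0.33,
         n_ceps: int = 24) -> np.ndarray:
    """Steps S21-S24: FFT -> energy spectrum -> Gammatone filtering with
    per-bin exponential compression -> DCT."""
    L = np.fft.rfft(frames, axis=1)   # S21: spectrum L(k)
    energy = np.abs(L) ** 2           # S22: energy spectrum |L(k)|^2
    # S23: s_m = sum_k [L(k)^2 * H_m(k)]^e(f), compression applied per bin
    s = np.sum((energy[:, None, :] * H_gt[None, :, :]) ** e_f, axis=2)
    M = H_gt.shape[0]
    m = np.arange(1, M + 1)
    # S24: C_GFCC(j) = sqrt(2/M) * sum_m s_m * cos(pi * j * (m - 1/2) / M)
    return np.stack([np.sqrt(2.0 / M) *
                     (s * np.cos(np.pi * j * (m - 0.5) / M)).sum(axis=1)
                     for j in range(1, n_ceps + 1)], axis=1)
```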
For the described secondary speaker feature extraction method based on the fused feature MGFCC, the per-dimension feature discrimination of the MFCC and GFCC features in a noisy environment is computed as follows:
F_R is the ratio of a feature's between-class variance to its within-class variance. This discrimination measure can be used to judge the noise adaptability of a speaker feature: the F_R value of the feature is computed under different environments and used to analyse its noise robustness. F_R is defined as:
F_R = \frac{ \sum_{i=1}^{H} (\mu_i - \mu)^2 }{ \frac{1}{K} \sum_{i=1}^{H} \sum_{j=1}^{K} \left( x_i^j - \mu_i \right)^2 }, \qquad \mu_i = \frac{1}{K} \sum_{j=1}^{K} x_i^j, \quad \mu = \frac{1}{H} \sum_{i=1}^{H} \mu_i
where μ is the feature mean over all speakers, x_i^j is the feature value of the j-th frame of the i-th speaker, μ_i is the feature mean of the i-th speaker, H is the total number of speakers, and K is the number of speech frames per speaker.
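As a sketch of this computation (the bookkeeping, with one frames-by-dimensions matrix per speaker, is our own framing), the per-dimension F_R could be obtained as follows:

```python
import numpy as np

def feature_discrimination(features: list) -> np.ndarray:
    """Per-dimension F_R: between-class variance over within-class variance.
    `features` holds one (K, D) array of K frames x D dimensions per speaker."""
    mu_i = np.stack([f.mean(axis=0) for f in features])  # per-speaker means
    mu = mu_i.mean(axis=0)                               # global mean over speakers
    between = np.sum((mu_i - mu) ** 2, axis=0)
    K = features[0].shape[0]
    within = sum(np.sum((f - m) ** 2, axis=0)
                 for f, m in zip(features, mu_i)) / K
    return between / within  # F_R, one value per feature dimension
```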
For the described secondary speaker feature extraction method based on the fused feature MGFCC, the count of how often each dimension of the two features attains the maximum F_R value is obtained as follows:
Features A and B are extracted for the speakers in a self-built speech corpus, and for each dimension of both features, the number of times P_max that the maximum F_R value occurs in the feature matrix is counted.
For the described secondary speaker feature extraction method based on the fused feature MGFCC, the features are fused according to the difference in their maximum-F_R counts under the noise background in the following steps (a sketch of the per-dimension selection follows the list):
S51: 24-dimensional MFCC and GFCC feature parameters are extracted from the noisy speech signal;
S52: According to the formula in step S3, the F_R value of each feature dimension is computed under factory noise and white noise at signal-to-noise ratios of 5 dB, 10 dB and 15 dB;
S53: According to the formula in step S4, the number of times each dimension attains the maximum F_R value is counted, and the features are fused dimension by dimension according to which of MFCC and GFCC attains the maximum more often, yielding the 24-dimensional fused feature MGFCC.
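A minimal sketch of the per-dimension selection in step S53. The tie-breaking rule (preferring MFCC on equal counts) is our own assumption, as the patent does not state one:

```python
import numpy as np

def fuse_mgfcc(mfcc_feats: np.ndarray, gfcc_feats: np.ndarray,
               mfcc_wins: np.ndarray, gfcc_wins: np.ndarray) -> np.ndarray:
    """Step S53: for each of the 24 dimensions, keep the feature type whose
    F_R was maximal most often across the tested noise conditions; the
    per-dimension counts are passed in as mfcc_wins and gfcc_wins."""
    take_mfcc = mfcc_wins >= gfcc_wins  # tie -> MFCC (assumption)
    return np.where(take_mfcc[None, :], mfcc_feats, gfcc_feats)
```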
For the described secondary speaker feature extraction method based on the fused feature MGFCC, the fused feature MGFCC is differenced and recombined into the secondary extracted feature as follows:
S61: Differencing of the fused feature:
The purpose of differencing a feature is to describe the continuous dynamic trajectory of the corresponding feature vectors. To make better use of the fused feature MGFCC in expressing the speaker, the MGFCC feature is therefore differenced here to obtain its continuous dynamic trajectory:
Feature\_D(j)_i = Feature(j)_i - Feature(j-1)_i, \quad 0 \le i \le P, \; 1 \le j \le R
where Feature is the original feature vector sequence (here MGFCC), Feature_D is the first difference of the original feature vectors, P is the feature order, and R is the number of feature vectors.
S62: Recombination of the fused feature:
The formula in step S61 yields several groups of feature vectors, namely Feature and Feature_D, which are now recombined. Because different speaker characteristics correspond to different speech-feature difference vectors, the vectors are weighted and recombined in a specific ratio so as to better describe the speaker's personal information. A new group of speaker feature parameters F_MGFCC is obtained according to the following formula:
F\_MGFCC = \begin{cases} MGFCC_i, & i = 0, 1, 2, \ldots, P-1 \\ MGFCC\_D_{(i-p)}, & i = p, p+1, p+2, \ldots, 2p-1 \end{cases}
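A sketch of steps S61 and S62. Since the "specific ratio" of the weighted recombination is not given in the published text, the sketch simply concatenates the static and differenced parts with equal weight:

```python
import numpy as np

def secondary_feature(mgfcc: np.ndarray) -> np.ndarray:
    """S61: first difference Feature_D(j) = Feature(j) - Feature(j-1);
    S62: concatenate static and dynamic parts into F_MGFCC (2P dims/frame)."""
    delta = np.diff(mgfcc, axis=0)                   # S61, frames 1..R-1
    static = mgfcc[1:]                               # align with the differences
    return np.concatenate([static, delta], axis=1)   # [MGFCC_i | MGFCC_D_(i-p)]
```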
The foregoing describes only the preferred embodiments of the present invention and is not intended to limit the invention. It is apparent that those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (6)

1. A secondary speaker feature extraction method based on the fused feature MGFCC, characterized by comprising the following steps:
S1: process the speaker's speech signal with a Mel filter bank to obtain MFCC features;
S2: simultaneously process the speaker's speech signal with a Gammatone filter bank to obtain GFCC features;
S3: compute the per-dimension feature discrimination F_R of the MFCC features and the GFCC features in a noisy environment, respectively;
S4: for each dimension of the MFCC features and the GFCC features, count the number of times it attains the maximum feature discrimination;
S5: fuse the features according to the maximum-discrimination counts of the two feature types under the noise background counted in step S4;
S6: difference and recombine the fused feature obtained in step S5 to obtain the secondary extracted feature.
2. The secondary speaker feature extraction method based on the fused feature MGFCC according to claim 1, characterized in that the MFCC feature extraction method in step S1 is:
S11: pre-emphasis of the speaker's speech signal: the signal is processed with a digital filter whose Z-domain transfer function is H(z) = 1 - 0.95z^{-1};
S12: framing and windowing of the signal produced by step S11, where each frame contains N samples and the window function is w(n); the windowed speech signal s_w(n) is:
s_w(n) = y(n) w(n)
where y(n) is the pre-emphasized signal and 0 ≤ n ≤ N;
S13: fast Fourier transform: the signal produced by S12 undergoes a fast Fourier transform and is converted from the time domain to the frequency domain, giving the linear speech spectrum X(k):
X(k) = \sum_{n=0}^{N-1} s_w(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S14: the line energy of the data after each frame's fast Fourier transform is computed: E(k) = |X(k)|^2;
S15: the logarithm of each Mel filter's output is taken, giving the log spectrum S(m):
S(m) = \ln\!\left( \sum_{k=0}^{N-1} |X(k)|^2 H_m(k) \right), \quad 0 \le m < M
where H_m(k) denotes the frequency response of the m-th Mel filter and M denotes the number of Mel filters;
S16: a discrete cosine transform is applied to the log spectrum S(m) to obtain the MFCC feature; the n-th dimension C(n) is:
C(n) = \sum_{m=1}^{M-1} S(m) \cos\!\left[ \frac{\pi n (m + 1/2)}{M} \right], \quad 0 \le n < M.
3. The secondary speaker feature extraction method based on the fused feature MGFCC according to claim 2, characterized in that the window function is a Hamming window, whose main lobe is wide and whose side lobes are low:
w(n) = \begin{cases} 0.54 - 0.46 \cos\!\left[ \frac{2\pi n}{N-1} \right], & 0 \le n \le N-1 \\ 0, & \text{otherwise.} \end{cases}
4. The secondary speaker feature extraction method based on the fused feature MGFCC according to claim 1, characterized in that the GFCC feature extraction method in step S2 is:
S21: after pre-processing, the speaker's speech signal s(n) is converted into the time-domain signal x(n), and the discrete power spectrum L(k) is obtained by fast Fourier transform:
L(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S22: the square of the above discrete power spectrum L(k) is taken to obtain the speech energy spectrum, which is then filtered with the Gammatone filter bank;
S23: the output of each filter is exponentially compressed, yielding a set of energy spectra s_1, s_2, s_3, ..., s_M:
s_m = \sum_{k=1}^{N} \left[ L(k)^2 H_m(k) \right]^{e(f)}
where e(f) is the exponential compression value, M is the number of filters, 1 ≤ m ≤ M, and H_m(k) denotes the frequency response of the m-th Gammatone filter;
S24: a DCT is applied to the compressed energy spectrum to obtain the GFCC features:
C_{GFCC}(j) = \sqrt{\frac{2}{M}} \sum_{m=1}^{M} s_m \cos\!\left[ \frac{\pi j (m - 1/2)}{M} \right], \quad j = 1, 2, 3, \ldots, L
where L is the dimension of the feature parameters, C_{GFCC}(j) denotes the GFCC features, and M denotes the number of filters.
5. The secondary speaker feature extraction method based on the fused feature MGFCC according to claim 1, characterized in that the feature discrimination F_R is the ratio of a feature's between-class variance to its within-class variance:
F_R = \frac{ \sum_{i=1}^{H} (\mu_i - \mu)^2 }{ \frac{1}{K} \sum_{i=1}^{H} \sum_{j=1}^{K} \left( x_i^j - \mu_i \right)^2 }, \quad i = 1, 2, 3, \ldots, H, \; j = 1, 2, 3, \ldots, K
\mu_i = \frac{1}{K} \sum_{j=1}^{K} x_i^j, \qquad \mu = \frac{1}{H} \sum_{i=1}^{H} \mu_i
where μ is the feature mean over all speakers, x_i^j is the feature value of the j-th frame of the i-th speaker, μ_i is the feature mean of the i-th speaker, H is the total number of speakers, and K is the number of speech frames per speaker.
6. The secondary speaker feature extraction method based on the fused feature MGFCC according to claim 1, characterized in that the feature recombination yields the secondary extracted feature F_MGFCC according to the following formula:
F\_MGFCC = \begin{cases} MGFCC_i, & i = 0, 1, 2, \ldots, P-1 \\ MGFCC\_D_{(i-p)}, & i = p, p+1, p+2, \ldots, 2p-1 \end{cases}
where MGFCC_i denotes the fused feature, MGFCC_D_{(i-p)} denotes the differenced fused feature, and P is the feature order.
CN201710322792.8A 2017-05-09 2017-05-09 Secondary speaker feature extraction method based on the fused feature MGFCC Pending CN107274887A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710322792.8A CN107274887A (en) 2017-05-09 2017-05-09 Secondary speaker feature extraction method based on the fused feature MGFCC


Publications (1)

Publication Number Publication Date
CN107274887A true CN107274887A (en) 2017-10-20

Family

ID=60073910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710322792.8A Pending CN107274887A (en) 2017-05-09 2017-05-09 Secondary speaker feature extraction method based on the fused feature MGFCC

Country Status (1)

Country Link
CN (1) CN107274887A (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104900229A (en) * 2015-05-25 2015-09-09 桂林电子科技大学信息科技学院 Method for extracting mixed characteristic parameters of voice signals

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张来洪 et al.: "A speech quality assessment algorithm based on dynamic distortion measurement of perceptual features", 《自动化技术与应用》 (Techniques of Automation and Applications) *
方琦军: "Research on speaker recognition technology based on VQ and HMM", 《中国优秀硕士学位论文全文数据库信息技术辑》 (China Master's Theses Full-text Database, Information Technology) *
罗元 et al.: "A new robust voiceprint feature extraction and fusion method", 《计算机科学》 (Computer Science) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108109233A (en) * 2017-12-14 2018-06-01 华南理工大学 Multilevel security protection system based on human biological information
CN109003364A (en) * 2018-07-04 2018-12-14 深圳市益鑫智能科技有限公司 Home entrance-guard monitoring system based on speech recognition
CN109147818A (en) * 2018-10-30 2019-01-04 Oppo广东移动通信有限公司 Acoustic feature extraction method, device, storage medium and terminal device
CN110363148A (en) * 2019-07-16 2019-10-22 中用科技有限公司 Method for fused face and voiceprint feature verification
CN111145736A (en) * 2019-12-09 2020-05-12 华为技术有限公司 Speech recognition method and related device
CN111145736B (en) * 2019-12-09 2022-10-04 华为技术有限公司 Speech recognition method and related device
CN111755012A (en) * 2020-06-24 2020-10-09 湖北工业大学 Robust speaker recognition method based on deep-layer feature fusion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171020