CN107274887A - Secondary speaker feature extraction method based on the fused feature MGFCC - Google Patents

Secondary speaker feature extraction method based on the fused feature MGFCC

Info

Publication number
CN107274887A
CN107274887A
Authority
CN
China
Prior art keywords
speaker
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710322792.8A
Other languages
Chinese (zh)
Inventor
张毅
王可佳
颜博
乐聪聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN201710322792.8A priority Critical patent/CN107274887A/en
Publication of CN107274887A publication Critical patent/CN107274887A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L 17/02 Speaker identification or verification: preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
    • G10L 19/02 Speech or audio signal analysis-synthesis for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L 19/26 Pre-filtering or post-filtering
    • G10L 25/24 Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L 25/45 Speech or voice analysis techniques characterised by the type of analysis window
    • G06F 2218/08 Aspects of pattern recognition specially adapted for signal processing: feature extraction


Abstract

The invention discloses a secondary speaker feature extraction method based on the fused feature MGFCC. The method comprises the steps of: S1, processing the speaker's speech with a Mel filter bank to obtain MFCC features; S2, simultaneously processing the speaker's speech with a Gammatone filter bank to obtain GFCC features; S3, computing the per-dimension feature discrimination of the two feature types in a noisy environment; S4, counting, for each dimension of the two feature types, the number of times it attains the maximum F_R value; S5, fusing the features according to the difference in those maximum counts under the noise background; S6, differencing and recombining the fused feature to obtain the secondary extracted feature. The present invention can extract features that characterize the speaker more comprehensively.

Description

Secondary speaker feature extraction method based on the fused feature MGFCC
Technical field
The present invention relates to the field of speaker recognition technology, and more particularly to a secondary speaker feature extraction method based on the fused feature MGFCC.
Background art
After a series of pre-processing steps, feature extraction must be performed on the speaker's speech to produce a sequence of mathematical vectors that serves as the input to the training and recognition stages of a speaker recognition system. The quality of the extracted features is therefore critical to the training of the speaker recognition model and the determination of its parameters, and affects the design and performance of the entire system.
The choice of speaker features directly affects the performance of the subsequent speaker recognition system and is the foundation on which such a system is built. For a speaker recognition system deployed in a practical scenario, the choice of feature parameters must consider not only the recognition rate but also the stability and robustness of the overall system. Extracting optimal speaker feature parameters is therefore a particularly important stage of the whole speaker recognition system, and also one of the difficult problems in speech signal processing, with a direct impact on speaker recognition performance.
Summary of the invention
To improve the recognition rate of a speaker recognition system in noisy environments, the present invention studies speaker feature extraction from a bionic perspective, based on the auditory characteristics of the human ear. A Gammatone filter bank and a Mel filter bank, both modeled on human auditory characteristics, are first selected to simulate the human cochlea; the features are then fused according to the discrimination of the Mel-frequency cepstral coefficients and the Gammatone-frequency cepstral coefficients in a noisy environment, yielding a speaker fusion feature MGFCC based on human auditory characteristics.
To achieve these goals, the present invention adopts the following technical scheme: a secondary speaker feature extraction method based on the fused feature MGFCC, characterized by comprising the following steps:
S1: process the speaker's speech signal with a Mel filter bank to obtain MFCC features;
S2: simultaneously process the speaker's speech signal with a Gammatone filter bank to obtain GFCC features;
S3: compute the per-dimension feature discrimination F_R of the MFCC features and the GFCC features in a noisy environment, respectively;
S4: for each dimension of the MFCC features and the GFCC features, count the number of times it attains the maximum feature discrimination;
S5: fuse the features according to the maximum-discrimination counts of the two feature types under the noise background obtained in step S4;
S6: difference and recombine the fused feature obtained in step S5 to obtain the secondary extracted feature.
The MFCC feature extraction method of step S1 is:
S11: Pre-emphasis of the speaker's speech signal: the signal is processed with a digital filter whose Z-domain transfer function is H(z) = 1 - 0.95z^{-1};
S12: Framing and windowing of the signal produced by step S11, where each frame contains N samples and the window function is w(n); the windowed speech signal s_w(n) is:
s_w(n) = y(n) w(n)
where y(n) is the pre-emphasized signal and 0 ≤ n ≤ N;
The window function is a Hamming window, chosen for its wide main lobe and low side lobes:
w(n) = \begin{cases} 0.54 - 0.46\cos\!\left[\frac{2\pi n}{N-1}\right], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}
S13: Fast Fourier transform: the signal produced by S12 undergoes a fast Fourier transform and is converted from the time domain to the frequency domain, giving the linear speech spectrum X(k):
X(k) = \sum_{n=0}^{N-1} s_w(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S14: The line energy of the data after each frame's fast Fourier transform is computed: E(k) = |X(k)|^2;
S15: The logarithm of each Mel filter's output is taken, giving the log spectrum S(m):
S(m) = \ln\!\left( \sum_{k=0}^{N-1} |X(k)|^2 H_m(k) \right), \quad 0 \le m < M
where H_m(k) denotes the frequency response of the m-th Mel filter and M denotes the number of Mel filters;
S16: A discrete cosine transform is applied to the log spectrum S(m) to obtain the MFCC feature; the n-th dimension C(n) is:
C(n) = \sum_{m=1}^{M-1} S(m) \cos\!\left[ \frac{\pi n (m + 1/2)}{M} \right], \quad 0 \le n < M
The GFCC feature extraction method of step S2 is:
S21: After pre-processing, the speaker's speech signal s(n) is converted into the time-domain signal x(n), and the discrete power spectrum L(k) is obtained by fast Fourier transform:
L(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S22: The square of the above discrete power spectrum L(k) is taken to obtain the speech energy spectrum, which is then filtered with the Gammatone filter bank;
S23: The output of each filter is exponentially compressed, yielding a set of energy spectra s_1, s_2, s_3, ..., s_M:
s_m = \sum_{k=1}^{N} \left[ L(k)^2 H_m(k) \right]^{e(f)}
where e(f) is the exponential compression value, M is the number of filter channels, 1 ≤ m ≤ M, and H_m(k) denotes the frequency response of the m-th Gammatone filter; throughout the present invention, H_m(k) denotes a filter's frequency response.
S24: A DCT is applied to the compressed energy spectrum to obtain the GFCC features:
C_{GFCC}(j) = \sqrt{\frac{2}{M}} \sum_{m=1}^{M} s_m \cos\!\left[ \frac{\pi j (m - 1/2)}{M} \right], \quad j = 1, 2, 3, \ldots, L
where L is the dimension of the feature parameters, C_{GFCC}(j) is the GFCC feature parameter of dimension j, and M is the number of filters.
The feature discrimination F_R is the ratio of a feature's between-class variance to its within-class variance:
F_R = \frac{ \sum_{i=1}^{H} (\mu_i - \mu)^2 }{ \frac{1}{K} \sum_{i=1}^{H} \sum_{j=1}^{K} \left( x_i^j - \mu_i \right)^2 }, \quad i = 1, 2, 3, \ldots, H, \; j = 1, 2, 3, \ldots, K
\mu_i = \frac{1}{K} \sum_{j=1}^{K} x_i^j, \qquad \mu = \frac{1}{H} \sum_{i=1}^{H} \mu_i
where μ is the feature mean over all speakers, x_i^j is the feature value of the j-th frame of the i-th speaker, μ_i is the feature mean of the i-th speaker, H is the total number of speakers, and K is the number of speech frames per speaker.
The feature recombination yields the secondary extracted feature F_MGFCC according to the following formula:
F\_MGFCC = \begin{cases} MGFCC_i, & i = 0, 1, 2, \ldots, P-1 \\ MGFCC\_D_{(i-p)}, & i = p, p+1, p+2, \ldots, 2p-1 \end{cases}
where MGFCC_i denotes the fused feature, MGFCC_D_{(i-p)} denotes the differenced fused feature, and P is the feature order.
In summary, by adopting the above technical scheme, the beneficial effects of the invention are: under identical experimental conditions, the secondary feature extraction algorithm based on the fused feature MGFCC retains a good recognition rate under complex noise and exhibits strong robustness; it can further extract hidden speaker characteristics and substantially improve the performance of a speaker recognition system.
Brief description of the drawings
Fig. 1 is a flow chart of the extraction of the MFCC feature parameters;
Fig. 2 is a flow chart of the extraction of the GFCC feature parameters;
Fig. 3 is a flow chart of the secondary speaker feature extraction based on the fused feature MGFCC.
Embodiment
The technical scheme in the embodiments of the present invention is described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
Fig. 1 is a flow chart of the extraction of the MFCC feature parameters, Fig. 2 is a flow chart of the extraction of the GFCC feature parameters, and Fig. 3 is a flow chart of the secondary speaker feature extraction based on the fused feature MGFCC. As shown in the figures, the present invention provides a secondary speaker feature extraction method based on the fused feature MGFCC, comprising the following steps:
S1: process the speaker's speech with a Mel filter bank to obtain MFCC features;
S2: simultaneously process the speaker's speech with a Gammatone filter bank to obtain GFCC features;
S3: compute the per-dimension feature discrimination of the two feature types in a noisy environment;
S4: for each dimension of the two feature types, count the number of times it attains the maximum F_R value;
S5: fuse the features according to the difference in the maximum-F_R counts of the two feature types under the noise background;
S6: difference and recombine the fused feature to obtain the secondary extracted feature.
The secondary speaker feature extraction method based on the fused feature MGFCC of step S1 includes the extraction of the MFCC features; its steps are (see Fig. 1):
S11: Pre-emphasis. The vocal cords and lips introduce a certain low-frequency influence during phonation. To boost the high-frequency part of the speech signal and thereby flatten its spectrum, pre-emphasis can be applied, usually with a digital filter; the Z-domain transfer function used here is:
H(z) = 1 - 0.95z^{-1}
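Purely as an illustration, a minimal NumPy sketch of this pre-emphasis step might look as follows; the function name and signature are our own, not part of the patent:

```python
import numpy as np

def pre_emphasis(x: np.ndarray, alpha: float = 0.95) -> np.ndarray:
    """Apply y[n] = x[n] - alpha * x[n-1], i.e. H(z) = 1 - alpha * z^(-1)."""
    return np.append(x[0], x[1:] - alpha * x[:-1])
```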
S12: Framing and windowing. Because speech is short-term stationary, the signal can be divided into frames, each containing N samples; N is usually 256, corresponding to roughly 30 ms. Because of the edge effect of a speech frame, its two ends change sharply, so the pre-emphasized signal must be windowed. With the window function defined as w(n), the windowed speech signal s_w(n) is:
s_w(n) = y(n) w(n)
where y(n) is the pre-emphasized signal and 0 ≤ n ≤ N.
S13: To better express the speaker's characteristics, a Hamming window is usually chosen, since its main lobe is wide and its side lobes are low:
w(n) = \begin{cases} 0.54 - 0.46\cos\!\left[\frac{2\pi n}{N-1}\right], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}
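A short sketch of the framing and windowing steps. The patent does not specify a frame shift; the 128-sample hop (50% overlap) below is an assumption:

```python
import numpy as np

def frame_and_window(y: np.ndarray, n: int = 256, hop: int = 128) -> np.ndarray:
    """Split the pre-emphasized signal y into frames of n samples and apply
    the Hamming window w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    n_frames = 1 + (len(y) - n) // hop
    frames = np.stack([y[i * hop : i * hop + n] for i in range(n_frames)])
    return frames * np.hamming(n)  # s_w(n) = y(n) * w(n), frame by frame
```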
S14: Fast Fourier transform (FFT). After pre-processing, each frame of the speech signal undergoes the corresponding FFT and is transformed from the time domain to the frequency domain, giving the linear speech spectrum X(k):
X(k) = \sum_{n=0}^{N-1} s_w(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S15: Line energy. The line energy of the data after each frame's FFT is computed:
E(k) = |X(k)|^2
S16: To better improve the recognition performance of the speaker recognition system, the logarithm of each filter's output is usually taken, giving the log spectrum S(m):
S(m) = \ln\!\left( \sum_{k=0}^{N-1} |X(k)|^2 H_m(k) \right), \quad 0 \le m < M
where H_m(k) denotes the frequency response of the m-th Mel filter and M denotes the number of Mel filters.
S17: Discrete cosine transform (DCT). A DCT is applied to the log spectrum S(m) to obtain the Mel feature MFCC; the n-th dimension C(n) is:
C(n) = \sum_{m=1}^{M-1} S(m) \cos\!\left[ \frac{\pi n (m + 1/2)}{M} \right], \quad 0 \le n < M
where M denotes the number of filters. A sketch of steps S14 to S17 is given below.
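The following sketch strings steps S14 to S17 together. The triangular Mel filterbank construction is a standard design that the patent does not spell out, and the sample rate, FFT size, and small log floor are assumptions of ours:

```python
import numpy as np

def mel_filterbank(n_filters: int = 24, n_fft: int = 256, sr: int = 8000) -> np.ndarray:
    """Triangular Mel filterbank H_m(k) over the n_fft // 2 + 1 FFT bins."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv(np.linspace(0.0, mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    H = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        H[i - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        H[i - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    return H

def mfcc(frames: np.ndarray, H: np.ndarray, n_ceps: int = 24) -> np.ndarray:
    """Steps S14-S17: FFT -> line energy -> log Mel spectrum -> DCT."""
    X = np.fft.rfft(frames, axis=1)      # S14: linear spectrum X(k)
    E = np.abs(X) ** 2                   # S15: line energy E(k) = |X(k)|^2
    S = np.log(E @ H.T + 1e-12)          # S16: log spectrum S(m); floor avoids log(0)
    M = H.shape[0]
    m = np.arange(M)
    # S17: DCT, C(n) = sum_m S(m) * cos(pi * n * (m + 1/2) / M)
    return np.stack([(S * np.cos(np.pi * n * (m + 0.5) / M)).sum(axis=1)
                     for n in range(n_ceps)], axis=1)
```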
The described secondary speaker feature extraction method based on the fused feature MGFCC includes the extraction of the GFCC features; its steps are (see Fig. 2, and the sketch after step S24):
S21: After pre-processing, the speech signal s(n) is converted into the time-domain signal x(n), and the discrete power spectrum L(k) is obtained by fast Fourier transform.
S22: The square of the above power spectrum L(k) is taken to obtain the speech energy spectrum, which is then filtered with the Gammatone filter bank.
S23: To further improve the noise immunity of the speaker recognition system, the output of each filter is exponentially compressed, yielding a set of energy spectra s_1, s_2, s_3, ..., s_M:
s_m = \sum_{k=1}^{N} \left[ L(k)^2 H_m(k) \right]^{e(f)}
where e(f) is the exponential compression value and M is the number of filter channels.
S24: A DCT is applied to the compressed energy spectrum to obtain the GFCC features:
C_{GFCC}(j) = \sqrt{\frac{2}{M}} \sum_{m=1}^{M} s_m \cos\!\left[ \frac{\pi j (m - 1/2)}{M} \right], \quad j = 1, 2, \ldots, L
where L is the dimension of the feature parameters.
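A corresponding sketch of steps S21 to S24. It assumes a precomputed Gammatone filterbank magnitude response H_gt with one row per channel (the patent does not fix the filter design; fourth-order gammatone filters on an ERB scale would be a typical choice), and the compression exponent value of 0.33 is likewise our assumption:

```python
import numpy as np

def gfcc(frames: np.ndarray, H_gt: np.ndarray, e_f: float = 0.33,
         n_ceps: int = 24) -> np.ndarray:
    """Steps S21-S24: FFT -> energy spectrum -> Gammatone filtering with
    per-bin exponential compression -> DCT."""
    L = np.fft.rfft(frames, axis=1)   # S21: spectrum L(k)
    energy = np.abs(L) ** 2           # S22: energy spectrum |L(k)|^2
    # S23: s_m = sum_k [L(k)^2 * H_m(k)]^e(f), compression applied per bin
    s = np.sum((energy[:, None, :] * H_gt[None, :, :]) ** e_f, axis=2)
    M = H_gt.shape[0]
    m = np.arange(1, M + 1)
    # S24: C_GFCC(j) = sqrt(2/M) * sum_m s_m * cos(pi * j * (m - 1/2) / M)
    return np.stack([np.sqrt(2.0 / M) *
                     (s * np.cos(np.pi * j * (m - 0.5) / M)).sum(axis=1)
                     for j in range(1, n_ceps + 1)], axis=1)
```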
For the described secondary speaker feature extraction method based on the fused feature MGFCC, the per-dimension feature discrimination of the MFCC and GFCC features in a noisy environment is computed as follows:
F_R is the ratio of a feature's between-class variance to its within-class variance. This discrimination measure can be used to judge the noise adaptability of a speaker feature: the F_R value of the feature is computed under different environments and used to analyse its noise robustness. F_R is defined as:
F_R = \frac{ \sum_{i=1}^{H} (\mu_i - \mu)^2 }{ \frac{1}{K} \sum_{i=1}^{H} \sum_{j=1}^{K} \left( x_i^j - \mu_i \right)^2 }, \qquad \mu_i = \frac{1}{K} \sum_{j=1}^{K} x_i^j, \quad \mu = \frac{1}{H} \sum_{i=1}^{H} \mu_i
where μ is the feature mean over all speakers, x_i^j is the feature value of the j-th frame of the i-th speaker, μ_i is the feature mean of the i-th speaker, H is the total number of speakers, and K is the number of speech frames per speaker.
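As a sketch of this computation (the bookkeeping, with one frames-by-dimensions matrix per speaker, is our own framing), the per-dimension F_R could be obtained as follows:

```python
import numpy as np

def feature_discrimination(features: list) -> np.ndarray:
    """Per-dimension F_R: between-class variance over within-class variance.
    `features` holds one (K, D) array of K frames x D dimensions per speaker."""
    mu_i = np.stack([f.mean(axis=0) for f in features])  # per-speaker means
    mu = mu_i.mean(axis=0)                               # global mean over speakers
    between = np.sum((mu_i - mu) ** 2, axis=0)
    K = features[0].shape[0]
    within = sum(np.sum((f - m) ** 2, axis=0)
                 for f, m in zip(features, mu_i)) / K
    return between / within  # F_R, one value per feature dimension
```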
For the described secondary speaker feature extraction method based on the fused feature MGFCC, the count of how often each dimension of the two features attains the maximum F_R value is obtained as follows:
Features A and B are extracted for the speakers in a self-built speech corpus, and for each dimension of both features, the number of times P_max that the maximum F_R value occurs in the feature matrix is counted.
For the described secondary speaker feature extraction method based on the fused feature MGFCC, the features are fused according to the difference in their maximum-F_R counts under the noise background in the following steps (a sketch of the per-dimension selection follows the list):
S51: 24-dimensional MFCC and GFCC feature parameters are extracted from the noisy speech signal;
S52: According to the formula in step S3, the F_R value of each feature dimension is computed under factory noise and white noise at signal-to-noise ratios of 5 dB, 10 dB and 15 dB;
S53: According to the formula in step S4, the number of times each dimension attains the maximum F_R value is counted, and the features are fused dimension by dimension according to which of MFCC and GFCC attains the maximum more often, yielding the 24-dimensional fused feature MGFCC.
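A minimal sketch of the per-dimension selection in step S53. The tie-breaking rule (preferring MFCC on equal counts) is our own assumption, as the patent does not state one:

```python
import numpy as np

def fuse_mgfcc(mfcc_feats: np.ndarray, gfcc_feats: np.ndarray,
               mfcc_wins: np.ndarray, gfcc_wins: np.ndarray) -> np.ndarray:
    """Step S53: for each of the 24 dimensions, keep the feature type whose
    F_R was maximal most often across the tested noise conditions; the
    per-dimension counts are passed in as mfcc_wins and gfcc_wins."""
    take_mfcc = mfcc_wins >= gfcc_wins  # tie -> MFCC (assumption)
    return np.where(take_mfcc[None, :], mfcc_feats, gfcc_feats)
```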
For the described secondary speaker feature extraction method based on the fused feature MGFCC, the fused feature MGFCC is differenced and recombined into the secondary extracted feature as follows:
S61: Differencing of the fused feature:
The purpose of differencing a feature is to describe the continuous dynamic trajectory of the corresponding feature vectors. To make better use of the fused feature MGFCC in expressing the speaker, the MGFCC feature is therefore differenced here to obtain its continuous dynamic trajectory:
Feature\_D(j)_i = Feature(j)_i - Feature(j-1)_i, \quad 0 \le i \le P, \; 1 \le j \le R
where Feature is the original feature vector sequence (here MGFCC), Feature_D is the first difference of the original feature vectors, P is the feature order, and R is the number of feature vectors.
S62: Recombination of the fused feature:
The formula in step S61 yields several groups of feature vectors, namely Feature and Feature_D, which are now recombined. Because different speaker characteristics correspond to different speech-feature difference vectors, the vectors are weighted and recombined in a specific ratio so as to better describe the speaker's personal information. A new group of speaker feature parameters F_MGFCC is obtained according to the following formula:
F\_MGFCC = \begin{cases} MGFCC_i, & i = 0, 1, 2, \ldots, P-1 \\ MGFCC\_D_{(i-p)}, & i = p, p+1, p+2, \ldots, 2p-1 \end{cases}
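A sketch of steps S61 and S62. Since the "specific ratio" of the weighted recombination is not given in the published text, the sketch simply concatenates the static and differenced parts with equal weight:

```python
import numpy as np

def secondary_feature(mgfcc: np.ndarray) -> np.ndarray:
    """S61: first difference Feature_D(j) = Feature(j) - Feature(j-1);
    S62: concatenate static and dynamic parts into F_MGFCC (2P dims/frame)."""
    delta = np.diff(mgfcc, axis=0)                   # S61, frames 1..R-1
    static = mgfcc[1:]                               # align with the differences
    return np.concatenate([static, delta], axis=1)   # [MGFCC_i | MGFCC_D_(i-p)]
```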
The foregoing describes only the preferred embodiments of the present invention and is not intended to limit the invention. It is apparent that those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (6)

1. A secondary speaker feature extraction method based on the fused feature MGFCC, characterized by comprising the following steps:
S1: process the speaker's speech signal with a Mel filter bank to obtain MFCC features;
S2: simultaneously process the speaker's speech signal with a Gammatone filter bank to obtain GFCC features;
S3: compute the per-dimension feature discrimination F_R of the MFCC features and the GFCC features in a noisy environment, respectively;
S4: for each dimension of the MFCC features and the GFCC features, count the number of times it attains the maximum feature discrimination;
S5: fuse the features according to the maximum-discrimination counts of the two feature types under the noise background counted in step S4;
S6: difference and recombine the fused feature obtained in step S5 to obtain the secondary extracted feature.
2. The secondary speaker feature extraction method based on the fused feature MGFCC according to claim 1, characterized in that the MFCC feature extraction method in step S1 is:
S11: pre-emphasis of the speaker's speech signal: the signal is processed with a digital filter whose Z-domain transfer function is H(z) = 1 - 0.95z^{-1};
S12: framing and windowing of the signal produced by step S11, where each frame contains N samples and the window function is w(n); the windowed speech signal s_w(n) is:
s_w(n) = y(n) w(n)
where y(n) is the pre-emphasized signal and 0 ≤ n ≤ N;
S13: fast Fourier transform: the signal produced by S12 undergoes a fast Fourier transform and is converted from the time domain to the frequency domain, giving the linear speech spectrum X(k):
X(k) = \sum_{n=0}^{N-1} s_w(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S14: the line energy of the data after each frame's fast Fourier transform is computed: E(k) = |X(k)|^2;
S15: the logarithm of each Mel filter's output is taken, giving the log spectrum S(m):
S(m) = \ln\!\left( \sum_{k=0}^{N-1} |X(k)|^2 H_m(k) \right), \quad 0 \le m < M
where H_m(k) denotes the frequency response of the m-th Mel filter and M denotes the number of Mel filters;
S16: a discrete cosine transform is applied to the log spectrum S(m) to obtain the MFCC feature; the n-th dimension C(n) is:
C(n) = \sum_{m=1}^{M-1} S(m) \cos\!\left[ \frac{\pi n (m + 1/2)}{M} \right], \quad 0 \le n < M.
3. The secondary speaker feature extraction method based on the fused feature MGFCC according to claim 2, characterized in that the window function is a Hamming window, whose main lobe is wide and whose side lobes are low:
w(n) = \begin{cases} 0.54 - 0.46 \cos\!\left[ \frac{2\pi n}{N-1} \right], & 0 \le n \le N-1 \\ 0, & \text{otherwise.} \end{cases}
4. The secondary speaker feature extraction method based on the fused feature MGFCC according to claim 1, characterized in that the GFCC feature extraction method in step S2 is:
S21: after pre-processing, the speaker's speech signal s(n) is converted into the time-domain signal x(n), and the discrete power spectrum L(k) is obtained by fast Fourier transform:
L(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S22: the square of the above discrete power spectrum L(k) is taken to obtain the speech energy spectrum, which is then filtered with the Gammatone filter bank;
S23: the output of each filter is exponentially compressed, yielding a set of energy spectra s_1, s_2, s_3, ..., s_M:
s_m = \sum_{k=1}^{N} \left[ L(k)^2 H_m(k) \right]^{e(f)}
where e(f) is the exponential compression value, M is the number of filters, 1 ≤ m ≤ M, and H_m(k) denotes the frequency response of the m-th Gammatone filter;
S24: a DCT is applied to the compressed energy spectrum to obtain the GFCC features:
C_{GFCC}(j) = \sqrt{\frac{2}{M}} \sum_{m=1}^{M} s_m \cos\!\left[ \frac{\pi j (m - 1/2)}{M} \right], \quad j = 1, 2, 3, \ldots, L
where L is the dimension of the feature parameters, C_{GFCC}(j) denotes the GFCC features, and M denotes the number of filters.
5. The secondary speaker feature extraction method based on the fused feature MGFCC according to claim 1, characterized in that the feature discrimination F_R is the ratio of a feature's between-class variance to its within-class variance:
F_R = \frac{ \sum_{i=1}^{H} (\mu_i - \mu)^2 }{ \frac{1}{K} \sum_{i=1}^{H} \sum_{j=1}^{K} \left( x_i^j - \mu_i \right)^2 }, \quad i = 1, 2, 3, \ldots, H, \; j = 1, 2, 3, \ldots, K
\mu_i = \frac{1}{K} \sum_{j=1}^{K} x_i^j, \qquad \mu = \frac{1}{H} \sum_{i=1}^{H} \mu_i
where μ is the feature mean over all speakers, x_i^j is the feature value of the j-th frame of the i-th speaker, μ_i is the feature mean of the i-th speaker, H is the total number of speakers, and K is the number of speech frames per speaker.
6. The secondary speaker feature extraction method based on the fused feature MGFCC according to claim 1, characterized in that the feature recombination yields the secondary extracted feature F_MGFCC according to the following formula:
F\_MGFCC = \begin{cases} MGFCC_i, & i = 0, 1, 2, \ldots, P-1 \\ MGFCC\_D_{(i-p)}, & i = p, p+1, p+2, \ldots, 2p-1 \end{cases}
where MGFCC_i denotes the fused feature, MGFCC_D_{(i-p)} denotes the differenced fused feature, and P is the feature order.
CN201710322792.8A 2017-05-09 2017-05-09 Secondary speaker feature extraction method based on the fused feature MGFCC Pending CN107274887A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710322792.8A CN107274887A (en) 2017-05-09 2017-05-09 Secondary speaker feature extraction method based on the fused feature MGFCC


Publications (1)

Publication Number Publication Date
CN107274887A true CN107274887A (en) 2017-10-20

Family

ID=60073910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710322792.8A Pending CN107274887A (en) 2017-05-09 2017-05-09 Secondary speaker feature extraction method based on the fused feature MGFCC

Country Status (1)

Country Link
CN (1) CN107274887A (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104900229A (en) * 2015-05-25 2015-09-09 桂林电子科技大学信息科技学院 Method for extracting mixed characteristic parameters of voice signals

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张来洪 et al.: "A speech quality assessment algorithm based on dynamic distortion measurement of perceptual features", 《自动化技术与应用》 (Techniques of Automation and Applications) *
方琦军: "Research on speaker recognition technology based on VQ and HMM", 《中国优秀硕士学位论文全文数据库信息技术辑》 (China Master's Theses Full-text Database, Information Technology) *
罗元 et al.: "A new robust voiceprint feature extraction and fusion method", 《计算机科学》 (Computer Science) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108109233A (en) * 2017-12-14 2018-06-01 华南理工大学 Multilevel security protection system based on human biological information
CN109003364A (en) * 2018-07-04 2018-12-14 深圳市益鑫智能科技有限公司 Home entrance-guard monitoring system based on speech recognition
CN109147818A (en) * 2018-10-30 2019-01-04 Oppo广东移动通信有限公司 Acoustic feature extraction method, device, storage medium and terminal device
CN110363148A (en) * 2019-07-16 2019-10-22 中用科技有限公司 Method for fused face and voiceprint feature verification
CN111145736A (en) * 2019-12-09 2020-05-12 华为技术有限公司 Speech recognition method and related device
CN111145736B (en) * 2019-12-09 2022-10-04 华为技术有限公司 Speech recognition method and related device
CN111755012A (en) * 2020-06-24 2020-10-09 湖北工业大学 Robust speaker recognition method based on deep-layer feature fusion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171020