CN103956165A - Method for improving audio classification accuracy through mixed component clustering Fisher scoring algorithm - Google Patents

Method for improving audio classification accuracy through mixed component clustering Fisher scoring algorithm

Info

Publication number
CN103956165A
Authority
CN
China
Prior art keywords
fisher
classification accuracy
mixed components
cgmm
accuracy rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410194236.3A
Other languages
Chinese (zh)
Inventor
王荣燕
李海军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dezhou University
Original Assignee
Dezhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for improving audio classification accuracy using a mixed-component-clustering Fisher scoring algorithm. The method comprises: uniting the GMMs (Gaussian mixture models) of all classes; merging Gaussian components into a single Gaussian; forming a CGMM; applying the Fisher transform to the CGMM; and computing Fisher scores to obtain equal-length features. The method combines the advantages of generative and discriminative models: it can describe the discriminative characteristics between classes and distinguish fine details, and it still achieves high classification accuracy when the segments from which features are extracted are short. With this method, the classification accuracy over six sound classes can reach 77 percent.

Description

Method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method
Technical field
The invention belongs to the field of applications of the Fisher scoring method, and in particular relates to a method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method.
Background
The Fisher score is derived from a generative model: it attempts to extract more information from a single generative model than just its output probability. The purpose of the Fisher score transform is to analyze how the score depends on the model and which parts of the model matter in determining that score, thereby revealing information about the internal representation of the data. From this perspective, it is natural to compare two data points through the directions in which they would stretch the model parameters: the score functions of the two points are regarded as functions of the parameters, and their gradients are compared. If the two gradients are similar, the two data points adapt the model in the same way; that is, given the current parameter settings, the points are similar because they require similar modifications to the parameters. This is the idea behind the Fisher score. The traditional transform computes the mean and variance of the per-frame features within an audio segment and uses them as the new equal-length feature of that segment; this approach works well on some classification problems. However, existing methods tend to overlook details that distinguish some classes, their performance degrades when the audio segments are short, and their processing efficiency is low.
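The gradient view described in the background can be written compactly. The notation below ($X = \{x_1, \dots, x_T\}$ for the frames of a segment, $\theta$ for the model parameters, $w_k$, $\mu_k$, $\Sigma_k$ for the weight, mean and covariance of component $k$, and $\gamma_k$ for the component posterior) is assumed here for illustration and does not come from the patent text. The second line shows the gradient with respect to the component means of a GMM as one standard instance, with frames treated as independent.

```latex
U_X = \nabla_\theta \log p(X \mid \theta), \qquad
p(x \mid \theta) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)

\frac{\partial}{\partial \mu_k} \log p(X \mid \theta)
  = \sum_{t=1}^{T} \gamma_k(x_t) \, \Sigma_k^{-1} (x_t - \mu_k), \qquad
\gamma_k(x_t) = \frac{w_k \, \mathcal{N}(x_t \mid \mu_k, \Sigma_k)}
                     {\sum_{j=1}^{K} w_j \, \mathcal{N}(x_t \mid \mu_j, \Sigma_j)}
```

The length of $U_X$ depends only on the number of model parameters, not on the number of frames $T$, which is why segments of different durations map to features of equal length.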
It is therefore necessary to provide a method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method.
Summary of the invention
The object of the present invention is to provide a method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method, intended to solve the problems that existing methods tend to overlook details that distinguish some classes, that their performance degrades when the audio segments are short, and that their processing efficiency is low. The present invention is achieved as follows.
Essential technical solution of the method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method:
The method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method comprises:
Step 1: unite the GMMs of all classes;
Step 2: merge Gaussian components into a single Gaussian;
Step 3: form the CGMM model;
Step 4: apply the Fisher transform to the CGMM;
Step 5: compute Fisher scores to obtain equal-length features.
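Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `DiagGMM` container and the `unite_gmms` function are hypothetical names, covariances are assumed diagonal, and the re-weighting rule (divide each class's weights by the number of classes, i.e. equal class priors) is an assumption, since the source only requires that the weights of the united model sum to 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiagGMM:
    """A GMM with diagonal covariances (an assumption made for this sketch)."""
    weights: List[float]          # one weight per component, summing to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component


def unite_gmms(class_gmms: List[DiagGMM]) -> DiagGMM:
    """Concatenate the components of all class GMMs, in class order, into one model."""
    n_classes = len(class_gmms)
    weights: List[float] = []
    means: List[List[float]] = []
    variances: List[List[float]] = []
    for gmm in class_gmms:
        # Assumed re-weighting: equal class priors, so each class keeps
        # 1 / n_classes of the total probability mass.
        weights.extend(w / n_classes for w in gmm.weights)
        means.extend(list(m) for m in gmm.means)
        variances.extend(list(v) for v in gmm.variances)
    return DiagGMM(weights, means, variances)


# Two toy one-dimensional class models with 2 and 3 components.
class_a = DiagGMM([0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]])
class_b = DiagGMM([0.2, 0.3, 0.5], [[4.0], [5.0], [6.0]], [[1.0], [2.0], [1.0]])
ugmm = unite_gmms([class_a, class_b])
print(len(ugmm.weights), sum(ugmm.weights))
```

The united model has as many components as the class models combined (five in the toy example), which matches the description of the UGMM in the source.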
Secondary technical solution of the method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method:
Further, in step 1, a GMM is trained for each class and the GMMs are united: the Gaussian components of each model are arranged in order and combined into a new model, and the weights of the Gaussian components are redistributed so that all weights sum to 1. The resulting model is the UGMM, whose number of mixture components is the sum of the numbers of mixture components of the per-class models. The Gaussian components of the UGMM are then clustered;
Further, in step 2, highly similar Gaussian components are grouped into a cluster, and the Gaussian components in the cluster are merged into a single Gaussian, which serves as the representative component of that cluster;
Further, in step 3, the representative Gaussians of all clusters are concatenated to form a new GMM; this model is the CGMM;
Further, in step 4, the features obtained by applying the Fisher transform to the CGMM not only have reduced dimensionality but also have part of the redundant information removed, so they better express the differences between classes. The similarity between Gaussian components depends on the distance metric used. To better measure this similarity, several distance metrics commonly used in machine learning are adopted: the Euclidean distance, the Mahalanobis distance, the Bhattacharyya distance and the K_L2 distance. Because distances between Gaussian mixture models need not be computed, the class divergence distance is not used;
Further, in step 5, Fisher scores are computed for all samples based on the CGMM to obtain new equal-length features, and the resulting equal-length features are used to train a multi-class support vector machine classifier.
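Steps 2 and 3 can be sketched as follows, using the Bhattacharyya distance, one of the listed metrics, to measure similarity. Several choices here are assumptions, because the source does not specify them: components are diagonal-covariance Gaussians stored as `(weight, mean, variance)` tuples, clusters are formed by greedily merging the closest pair until a target count remains, and a merged Gaussian is obtained by matching the first two moments of the pair. The function names are hypothetical.

```python
import math
from typing import List, Tuple

# (weight, mean vector, diagonal variance vector)
Component = Tuple[float, List[float], List[float]]


def bhattacharyya(a: Component, b: Component) -> float:
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    _, mu_a, var_a = a
    _, mu_b, var_b = b
    dist = 0.0
    for ma, mb, va, vb in zip(mu_a, mu_b, var_a, var_b):
        v = 0.5 * (va + vb)                             # averaged variance
        dist += 0.125 * (ma - mb) ** 2 / v              # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(va * vb))  # variance-mismatch term
    return dist


def merge(a: Component, b: Component) -> Component:
    """Merge two weighted Gaussians into one by matching mean and variance."""
    wa, mu_a, var_a = a
    wb, mu_b, var_b = b
    w = wa + wb
    mu = [(wa * ma + wb * mb) / w for ma, mb in zip(mu_a, mu_b)]
    var = [
        (wa * (va + (ma - m) ** 2) + wb * (vb + (mb - m) ** 2)) / w
        for ma, mb, va, vb, m in zip(mu_a, mu_b, var_a, var_b, mu)
    ]
    return (w, mu, var)


def cluster_components(components: List[Component], n_clusters: int) -> List[Component]:
    """Greedily merge the closest pair until n_clusters (>= 1) representatives remain."""
    comps = list(components)
    while len(comps) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(comps)) for j in range(i + 1, len(comps))),
            key=lambda pair: bhattacharyya(comps[pair[0]], comps[pair[1]]),
        )
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps


# Two nearly identical components and one distant component.
ugmm = [(0.25, [0.0], [1.0]), (0.25, [0.1], [1.0]), (0.5, [10.0], [1.0])]
cgmm = cluster_components(ugmm, n_clusters=2)
print(cgmm)
```

Because merging adds the weights of the merged components, the representatives returned by `cluster_components` still sum to 1 and can be used directly as the components of the new model.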
The method provided by the invention unites the GMMs of all classes, merges Gaussian components into a single Gaussian, forms the CGMM model, applies the Fisher transform to the CGMM, and computes Fisher scores to obtain equal-length features. As a result, the method captures the details that distinguish classes well, its performance does not degrade when the audio segments are short, and processing efficiency is improved. The invention combines the advantages of generative and discriminative models: it can describe the discriminative features between classes and can also distinguish details well; in particular, it still achieves high classification accuracy when the segments from which features are extracted are short. In the experiments of the present invention, the data set consists of about 1500 files downloaded from the audio retrieval website (http://www.findsounds.com) launched by the U.S. company Comparisonics and from other websites. It covers six semantic classes: cow calls, bell sounds, dog barks, horse calls, frog calls and laughter, all of which are non-speech sound types. Each file is an isolated audio segment, with lengths ranging from under 1 s to 1 min. The downloaded audio formats include wav, mp3, au, aif and aiff; all files were converted to a unified wav format with an 8 kHz sampling rate, 16-bit samples, a single channel and PCM encoding.
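A quick way to confirm that a converted file matches the target format described above is to read its header with Python's standard `wave` module. This sketch only checks a file; it does not perform the conversion from mp3, au, aif or aiff, which would need a separate tool that the source does not name. The function name `is_target_wav` is hypothetical.

```python
import wave


def is_target_wav(path: str) -> bool:
    """Return True if the file is an 8 kHz, 16-bit, mono, uncompressed PCM WAV."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getframerate() == 8000
            and wf.getsampwidth() == 2      # 2 bytes per sample = 16 bit
            and wf.getnchannels() == 1      # single channel
            and wf.getcomptype() == "NONE"  # the wave module reports PCM as "NONE"
        )
```

Running such a check over the whole data set before feature extraction helps ensure that every segment is framed at the same sampling rate.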
The experimental results show that an SVM classifier taking the statistical features of all frames within a segment as input achieves an average classification accuracy of 66.29% over the six sound classes, whereas the mixed-component-clustering scoring method proposed in this application achieves an average accuracy of at least 77.11%, an improvement of 10.82 percentage points over the SVM classifier. The CGMM-SVM algorithm proposed in this application can even reach an average accuracy of 82.04%, an improvement of 15.75 percentage points over the SVM classifier.
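The reported improvements are differences in percentage points between the stated accuracies, which can be checked directly. The variable names are illustrative; the three accuracy values are the ones given in the text.

```python
baseline = 66.29      # SVM on per-segment frame statistics (%)
proposed_min = 77.11  # lowest accuracy of the proposed scoring method (%)
cgmm_svm = 82.04      # best accuracy of the CGMM-SVM algorithm (%)

# Differences in percentage points, rounded to two decimals.
gain_min = round(proposed_min - baseline, 2)
gain_best = round(cgmm_svm - baseline, 2)
print(gain_min, gain_best)  # prints: 10.82 15.75
```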
Brief description of the drawings
Fig. 1 is a flowchart of the method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The application principle of the invention is further described below with reference to the drawing and specific embodiments.
Essential technical solution of the method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method:
As shown in Fig. 1, the method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method comprises:
S101: unite the GMMs of all classes;
S102: merge Gaussian components into a single Gaussian;
S103: form the CGMM model;
S104: apply the Fisher transform to the CGMM;
S105: compute Fisher scores to obtain equal-length features.
Secondary technical solution of the method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method:
Further, in S101, a GMM is trained for each class and the GMMs are united: the Gaussian components of each model are arranged in order and combined into a new model, and the weights of the Gaussian components are redistributed so that all weights sum to 1. The resulting model is the UGMM, whose number of mixture components is the sum of the numbers of mixture components of the per-class models. The Gaussian components of the UGMM are then clustered;
Further, in S102, highly similar Gaussian components are grouped into a cluster, and the Gaussian components in the cluster are merged into a single Gaussian, which serves as the representative component of that cluster;
Further, in S103, the representative Gaussians of all clusters are concatenated to form a new GMM; this model is the CGMM;
Further, in S104, the features obtained by applying the Fisher transform to the CGMM not only have reduced dimensionality but also have part of the redundant information removed, so they better express the differences between classes. The similarity between Gaussian components depends on the distance metric used. To better measure this similarity, several distance metrics commonly used in machine learning are adopted: the Euclidean distance, the Mahalanobis distance, the Bhattacharyya distance and the K_L2 distance. Because distances between Gaussian mixture models need not be computed, the class divergence distance is not used;
Further, in S105, Fisher scores are computed for all samples based on the CGMM to obtain new equal-length features, and the resulting equal-length features are used to train a multi-class support vector machine classifier.
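The last step, mapping a variable-length segment to an equal-length feature, can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: covariances are assumed diagonal, only the gradient with respect to the component means is computed, and the gradient is averaged over frames so that segment length does not scale the feature. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def fisher_score_means(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> List[float]:
    """Frame-averaged gradient of the log-likelihood w.r.t. every component mean.

    The result always has len(weights) * dim entries, whatever the frame count.
    """
    n_comp, dim = len(weights), len(means[0])
    score = [0.0] * (n_comp * dim)
    for x in frames:
        # Posterior of each component for this frame, computed in the log domain.
        logs = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(logs)
        post = [math.exp(value - top) for value in logs]
        total = sum(post)
        for k in range(n_comp):
            gamma = post[k] / total
            for d in range(dim):
                score[k * dim + d] += gamma * (x[d] - means[k][d]) / variances[k][d]
    return [s / len(frames) for s in score]


# A toy two-component model; segments of 2 and 5 frames give vectors of equal length.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
short_seg = [[0.1, -0.2], [2.9, 3.1]]
long_seg = [[0.0, 0.1], [0.2, 0.0], [3.0, 2.8], [3.1, 3.0], [2.9, 3.2]]
print(len(fisher_score_means(short_seg, weights, means, variances)))
print(len(fisher_score_means(long_seg, weights, means, variances)))
```

The vectors produced this way, one per segment, are the kind of equal-length input that a multi-class support vector machine can then be trained on.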
The method of the present invention unites the GMMs of all classes, merges Gaussian components into a single Gaussian, forms the CGMM model, applies the Fisher transform to the CGMM, and computes Fisher scores to obtain equal-length features. As a result, the method captures the details that distinguish classes well, its performance does not degrade when the audio segments are short, and processing efficiency is improved.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

1. A method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method, characterized in that the method comprises:
Step 1: unite the GMMs of all classes;
Step 2: merge Gaussian components into a single Gaussian;
Step 3: form the CGMM model;
Step 4: apply the Fisher transform to the CGMM;
Step 5: compute Fisher scores to obtain equal-length features.
2. The method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method according to claim 1, characterized in that, in step 1, a GMM is trained for each class and the GMMs are united; the Gaussian components of each model are arranged in order and combined into a new model, and the weights of the Gaussian components are redistributed so that all weights sum to 1; the resulting model is the UGMM, whose number of mixture components is the sum of the numbers of mixture components of the per-class models, and the Gaussian components of the UGMM are clustered.
3. The method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method according to claim 1, characterized in that, in step 2, highly similar Gaussian components are grouped into a cluster, and the Gaussian components in the cluster are merged into a single Gaussian, which serves as the representative component of that cluster.
4. The method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method according to claim 1, characterized in that, in step 3, the representative Gaussians of all clusters are concatenated to form a new GMM, and this model is the CGMM.
5. The method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method according to claim 1, characterized in that, in step 4, the features obtained by applying the Fisher transform to the CGMM not only have reduced dimensionality but also have part of the redundant information removed, so they better express the differences between classes; the similarity between Gaussian components depends on the distance metric used, and to measure this similarity, distance metrics commonly used in machine learning are adopted, comprising: the Euclidean distance, the Mahalanobis distance, the Bhattacharyya distance and the K_L2 distance.
6. The method for improving audio classification accuracy using the mixed-component-clustering Fisher scoring method according to claim 1, characterized in that, in step 5, Fisher scores are computed for all samples based on the CGMM to obtain new equal-length features, and the resulting equal-length features are used to train a multi-class support vector machine classifier.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410194236.3A CN103956165A (en) 2014-05-09 2014-05-09 Method for improving audio classification accuracy through mixed component clustering Fisher scoring algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410194236.3A CN103956165A (en) 2014-05-09 2014-05-09 Method for improving audio classification accuracy through mixed component clustering Fisher scoring algorithm

Publications (1)

Publication Number Publication Date
CN103956165A true CN103956165A (en) 2014-07-30

Family

ID=51333431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410194236.3A Pending CN103956165A (en) 2014-05-09 2014-05-09 Method for improving audio classification accuracy through mixed component clustering Fisher scoring algorithm

Country Status (1)

Country Link
CN (1) CN103956165A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104269169A (en) * 2014-09-09 2015-01-07 山东师范大学 Classifying method for aliasing audio events
CN110085209A (en) * 2019-04-11 2019-08-02 广州多益网络股份有限公司 A kind of tone color screening technique and device
CN112465768A (en) * 2020-11-25 2021-03-09 公安部物证鉴定中心 Blind detection method and system for splicing and tampering of digital images

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102568477A (en) * 2010-12-29 2012-07-11 盛乐信息技术(上海)有限公司 Semi-supervised pronunciation model modeling system and method
CN103544963A (en) * 2013-11-07 2014-01-29 东南大学 Voice emotion recognition method based on core semi-supervised discrimination and analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Rongyan (王荣燕): "Research on Key Problems in Complex Audio Classification" (复杂音频分类中的关键问题研究), doctoral dissertation *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104269169A (en) * 2014-09-09 2015-01-07 山东师范大学 Classifying method for aliasing audio events
CN104269169B (en) * 2014-09-09 2017-04-12 山东师范大学 Classifying method for aliasing audio events
CN110085209A (en) * 2019-04-11 2019-08-02 广州多益网络股份有限公司 A kind of tone color screening technique and device
CN110085209B (en) * 2019-04-11 2021-07-23 广州多益网络股份有限公司 Tone screening method and device
CN112465768A (en) * 2020-11-25 2021-03-09 公安部物证鉴定中心 Blind detection method and system for splicing and tampering of digital images

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140730