CN106934414A - A progressive ensemble classification method for data with noisy labels - Google Patents

A progressive ensemble classification method for data with noisy labels

Info

Publication number
CN106934414A
CN106934414A (application CN201710081412.6A)
Authority
CN
China
Prior art keywords
classifier
sample
branch
sigma
bootstrap
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710081412.6A
Other languages
Chinese (zh)
Inventor
余志文
赵卓雄
王大兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN201710081412.6A
Publication of CN106934414A
Legal status: Pending


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a progressive ensemble classification method for data with noisy labels, comprising the following steps: input training samples and test samples; sample the training set in the sample dimension with the bootstrap method to obtain B bootstrap branches; train a classifier on each of the B bootstrap branches with the LDA method; create a new, empty ensemble classifier set Γ(P) and add a first classifier selected from the generated classifiers to Γ(P); progressively choose, from the remaining classifiers, those that satisfy the selection condition and add them to Γ(P) until the number of chosen classifiers reaches a preset number G, then stop selecting; output the ensemble classifier set together with the weight of each chosen classifier branch; classify the test samples to obtain the final prediction. By studying the sample dimension and the attribute dimension at the same time, the invention achieves good classification performance on data sets with noisy labels.

Description

A progressive ensemble classification method for data with noisy labels
Technical field
The invention belongs to the field of computer machine learning, and more particularly relates to a progressive ensemble classification method for data with noisy labels.
Background technology
Ensemble learning, as an important branch of machine learning, is applied in fields such as data mining, intelligent transportation systems, bioinformatics and pattern recognition, and has attracted the attention of more and more researchers. Compared with a single classifier, an ensemble learning method can integrate multiple classifiers trained under different conditions into one unified classifier. Such an ensemble classifier is characterized by stability, robustness and high accuracy. In short, owing to this outstanding performance, ensemble classifiers have been used successfully in numerous fields.
However, traditional ensemble learning methods mainly study the sample dimension and the attribute dimension separately rather than as a whole. For example, the Bagging algorithm studies only the sample dimension, while the random subspace algorithm studies only the attribute dimension. A method that considers only the sample dimension or only the attribute dimension is not sufficient to build a powerful ensemble classifier or to handle samples with noise: in some data sets the characteristic patterns reside in certain attribute dimensions, but in other data sets the same feature patterns do not play the same role.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art and to provide a progressive ensemble classification method for data with noisy labels which studies the sample dimension and the attribute dimension simultaneously and achieves good classification performance on data sets with noisy labels.
A progressive ensemble classification method for data with noisy labels comprises the following steps:
S1, input training samples and test samples;
S2, sample the training set in the sample dimension with the bootstrap method to obtain B bootstrap branches;
S3, train a classifier on each of the B bootstrap branches with the LDA linear discriminant analysis method, generating one classifier per branch;
S4, create a new ensemble classifier set Γ(P), initialized as empty, and add a first classifier selected from the classifiers generated in step S3 to Γ(P);
S5, progressive classifier selection: progressively choose further well-performing classifiers from the remaining ones and add them as branches to Γ(P), until the number of chosen branches reaches the preset branch number G of the ensemble classifier set, then stop selecting; output the ensemble classifier set together with the weight of each classifier branch;
S6, classify the test samples with the ensemble classifier set and the corresponding branch weights to obtain the final prediction.
Preferably, the concrete steps of step S1 are: input a data set with noisy labels to be classified and test it with 5-fold cross validation, specifically:
First experiment: the 1st part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$. The training data set is $P_r = \{(p_1, y_1), (p_2, y_2), \dots, (p_l, y_l)\}$, where $l$ is the number of training samples, $p_i$ ($i \in \{1, \dots, l\}$) is a training sample and $y_i$ its sample label, and each $p_i$ has $d$ attribute dimensions.
Second experiment: the 2nd part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$.
And so on; 5 experiments are carried out in total.
Preferably, in step S2 the training data set $P_r$ is sampled in the sample dimension with the bootstrap method:
Sampling is done with replacement. The sampling rate is determined by the uniform random variable $\tau_1 \in [0,1]$, so that each draw selects $\hat{l} = \lceil \tau_1 \cdot l \rceil$ training samples. Random sampling proceeds one sample at a time via the training-sample index; the index of each draw is
$$m = \lceil \tau_2 \cdot l \rceil$$
where $m$ is the index of the selected sample and $\tau_2 \in [0,1]$ is a uniform random variable. In each experiment, under one sampling rate, the draw is repeated $B$ times, each of the $B$ draws selecting $\hat{l}$ training samples; this yields $B$ training sample sets, i.e. generates the $B$ bootstrap branches $O_1, \dots, O_B$.
Preferably, the concrete steps of training the classifiers in step S3 are: each bootstrap branch serves as a training set of its own, and the LDA algorithm generates the corresponding classifier $\chi_b$. The objective function of LDA is
$$\Xi_b = \sum_{k=1}^{K} \Upsilon(y^b \mid k)\, \Lambda(k \mid p^b)$$
where $\Xi_b$ denotes the objective function; $K$ is the total number of labels; $\Lambda(k \mid p^b)$ is the prior probability function of label $k$ for a sample $p^b$ in bootstrap branch $O_b$; and $\Upsilon(y^b \mid k)$ is the loss function of the classification result, $k$ being the true label and $y^b$ the predicted label, with $\Upsilon(y^b \mid k) = 0$ when the sample is correctly classified and $\Upsilon(y^b \mid k) = 1$ otherwise.
$\Lambda(k \mid p^b)$ is computed as
$$\Lambda(k \mid p^b) = \frac{\Lambda(p^b \mid k)\, \Lambda(k)}{\Lambda(p^b)}, \qquad \Lambda(p^b \mid k) = \frac{1}{(2\pi\,|\Sigma_k|)^{1/2}}\, e^{-\frac{1}{2}(p^b - \mu_k^b)^T \Sigma_k^{-1} (p^b - \mu_k^b)}$$
where $\mu_k^b$ and $\Sigma_k$ are the mean and covariance matrix of each label $k$ in bootstrap branch $O_b$; $|\Sigma_k|$ and $\Sigma_k^{-1}$ are the determinant and inverse of $\Sigma_k$; $\Lambda(p^b)$ is a normalizing constant; and $\Lambda(k)$ is the ratio of the number of training samples of class $k$ to the total number of samples in branch $O_b$.
Preferably, the concrete steps of step S4 are:
S4-1: create the new ensemble classifier set $\Gamma(P)$, initialized as empty.
S4-2: initialize the weights of all samples as $\omega_i^1 = 1/l$.
S4-3: compute the accuracy $\xi_j$ ($j \in \{1, \dots, B\}$) of each bootstrap branch classifier and choose the most accurate one as the first selected classifier:
$$\chi_1 = \arg\max_{\chi_j \in \hat{\Gamma}} \xi_j$$
S4-4: compute the weighted sum error of the samples misclassified by classifier $\chi_1$:
$$\epsilon_1 = \sum_i \omega_i^1\, \Theta(\chi_1(P), y, i)$$
where the error function is $\Theta(\chi(P), y, i) = 1$ if $\chi(p_i) \neq y_i$ and $0$ otherwise, $i \in \{1, \dots, l\}$; $\chi(p_i)$ denotes the classification result of classifier $\chi$ for sample $p_i$.
S4-5: compute the weight $\theta_1$ of classifier $\chi_1$:
$$\theta_1 = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_1}{\epsilon_1}\right)$$
S4-6: add classifier $\chi_1$ to the ensemble classifier set $\Gamma(P)$:
$$\Gamma_1(P) = \theta_1 \chi_1$$
S4-7: update the weights of all training samples as
$$\omega_i^2 = \omega_i^1\, e^{-y_i \theta_1 \chi_1(p_i)}$$
The weights are normalized, so that $\sum_{i=1}^{l} \omega_i^2 = 1$.
Preferably, the concrete steps of step S5 are:
S5-1: compute for each remaining classifier $\chi_j^g$ the first integrated loss function $\Pi_1(\chi_j^g)$, where $g \in \{1, \dots, G\}$ is the current iteration index:
$$\Pi_1(\chi_j^g) = \beta_1 \xi_j + \beta_2\, \varphi(O_j, O_h)$$
where $\xi_j$ is the accuracy of classifier $\chi_j^g$ after the training-sample weight adjustment; the classifier distance function $\varphi(O_j, O_h)$ measures the similarity of bootstraps $O_j$ and $O_h$, $O_j$ being the bootstrap branch of classifier $\chi_j$ and $O_h$ ranging over the bootstrap branches of all classifiers in the already obtained classifier set; $\beta_1$ and $\beta_2$ set the proportion of the two terms, with $\beta_1 + \beta_2 = 1$.
Compute the first integrated loss function $\Pi_1(\chi_j^g)$ of each remaining classifier and sort the values. Compute the second integrated loss function $\Pi_2(\Gamma)$:
$$\Pi_2(\Gamma) = \sum_{i=1}^{l} |y_i - \Gamma(p_i)|, \qquad \Gamma(p_i) = \arg\max_c \sum_{h=1}^{g} \theta_h \cdot 1\{\chi_h(p_i) = c\}$$
where $c$ is a sample label and $\chi_h$ is the $h$-th classifier in the already obtained ensemble classifier set $\Gamma_{g-1}(P)$.
Starting from the classifier with the largest first integrated loss $\Pi_1(\chi_j^g)$, compare: if
$$\Pi_2(\Gamma_{g-1}(P)) < \Pi_2(\Gamma_{g-1}(P) + \theta_g \chi_j^g)$$
holds, consider the next classifier; as soon as the inequality fails, the current classifier becomes the next one added to the ensemble classifier set $\Gamma(P)$.
S5-2: after the new classifier branch is added, compute the weighted sum error of the samples misclassified by the new branch of the ensemble classifier:
$$\epsilon_g = \sum_i \omega_i^g\, \Theta(\chi_j^g(P), y, i)$$
where $g$ is the current iteration index and $|\Gamma(P)|$ denotes the number of branches of the target set $\Gamma(P)$; then update the weight of the newly added classifier:
$$\theta_j^g = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_g}{\epsilon_g}\right)$$
S5-3: add the newest classifier to the chosen set, generating the newest ensemble classifier set:
$$\Gamma_g(P) = \Gamma_{g-1}(P) + \theta_j^g \chi_j^g$$
Update the weights of all training samples on the basis of the new ensemble classifier:
$$\omega_i^{g+1} = \omega_i^g\, e^{-y_i \theta_j^g \chi_j^g(p_i)}$$
with the updated weights normalized so that $\sum_{i=1}^{l} \omega_i^{g+1} = 1$.
S5-4: continue executing steps S5-1 to S5-3 until the number of chosen branches reaches the preset branch number $G$, then stop iterating and output the chosen ensemble classifier set $\Gamma_G$ and the corresponding weights.
Further, the classifier distance function $\varphi(O_j, O_h)$ in step S5-1 is computed as follows: bootstraps $O_j$ and $O_h$ can be regarded as two Gaussian mixture distributions, denoted $\Omega_j$ and $\Omega_h$. For the two Gaussian mixture models, the components $\Phi_{k_1}^j$ carry weights $\pi_{k_1}^j$, the components $\Phi_{k_2}^h$ carry weights $\pi_{k_2}^h$, and $K_1$ and $K_2$ are the numbers of components of $\Omega_j$ and $\Omega_h$ respectively:
$$\varphi(O_j, O_h) = \frac{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h\, \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)}{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h}$$
$$\psi(\Phi_{k_1}^j, \Phi_{k_2}^h) = \frac{1}{8} (\mu_{k_1}^j - \mu_{k_2}^h)^T \left(\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right)^{-1} (\mu_{k_1}^j - \mu_{k_2}^h) + \frac{1}{2} \ln \frac{\left|\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right|}{\sqrt{|\Sigma_{k_1}^j|\, |\Sigma_{k_2}^h|}}$$
where $\psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$ denotes the Bhattacharyya distance between the two Gaussian distributions $\Phi_{k_1}^j$ and $\Phi_{k_2}^h$, and $\mu_{k_1}^j$, $\Sigma_{k_1}^j$ and $\mu_{k_2}^h$, $\Sigma_{k_2}^h$ are their mean vectors and covariance matrices respectively.
Preferably, the specific method of step S6 is:
After the computation of each classifier branch, the prediction label of each branch for a sample is obtained; the prediction labels are then combined by weighted voting to obtain the final prediction $y^*$:
$$y^* = \arg\max_c \sum_{g=1}^{G} \theta_g \cdot 1\{\hat{y}_i^g = c\}$$
where $\hat{y}_i^g$ denotes the prediction of the $g$-th classifier $\chi_g$ in the ensemble classifier set for the $i$-th sample $p_i$, $c \in \{0, 1, \dots, k-1\}$ is a specific sample label, and $k$ is the total number of classes.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The present invention studies the sample dimension and the attribute dimension at the same time. It is aimed mainly at the classification of real-life data sets with noisy labels and solves this common classification problem well.
2. The present invention proposes a progressive ensemble framework that obtains good ensemble results with fewer ensemble branches, improving the effectiveness of the ensemble.
3. The present invention proposes a classifier selection algorithm based on different similarity computations for selecting better classifiers, thereby constituting an effective ensemble classification algorithm.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the embodiment;
Fig. 2 shows the experimental results of different classifiers.
Specific embodiments
The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
A progressive ensemble classification method for data with noisy labels comprises the following steps:
S1, input training samples and test samples;
S2, sample the training set in the sample dimension with the bootstrap method to obtain B bootstrap branches;
S3, train a classifier on each of the B bootstrap branches with the LDA linear discriminant analysis method, generating one classifier per branch;
S4, create a new ensemble classifier set Γ(P), initialized as empty, and add a first classifier selected from the classifiers generated in step S3 to Γ(P);
S5, progressive classifier selection: progressively choose further well-performing classifiers from the remaining ones and add them as branches to Γ(P), until the number of chosen branches reaches the preset branch number G of the ensemble classifier set, then stop selecting; output the ensemble classifier set together with the weight of each classifier branch;
S6, classify the test samples with the ensemble classifier set and the corresponding branch weights to obtain the final prediction.
The method of this embodiment is described in further detail below with reference to Fig. 1.
Step 1: input training samples and test samples.
Input a data set with noisy labels to be classified. Each data set has attribute dimensions and sample dimensions: each row is a sample and each column an attribute dimension, and each sample has a label. Divide the data set evenly into 5 parts and test with 5-fold cross validation. Specifically:
First experiment: the 1st part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$. The training data set is $P_r = \{(p_1, y_1), (p_2, y_2), \dots, (p_l, y_l)\}$, where $l$ is the number of training samples, $p_i$ ($i \in \{1, \dots, l\}$) is a training sample and $y_i \in \{-1, 1\}$ is the sample label (one label represents one class; the method extends to multi-class problems). Each $p_i$ has $d$ attribute dimensions.
Second experiment: the 2nd part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$. And so on; 5 experiments are carried out in total.
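As a concrete illustration of this split (not part of the patent text), the following minimal sketch uses scikit-learn's KFold; the loader name load_noisy_dataset is a hypothetical placeholder, since the patent only assumes some labelled data set with noisy labels:

```python
import numpy as np
from sklearn.model_selection import KFold

# P: (n_samples, d) attribute matrix; y: (possibly noisy) labels.
# load_noisy_dataset is a hypothetical helper, not part of the patent.
P, y = load_noisy_dataset()

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(P), start=1):
    P_r, y_r = P[train_idx], y[train_idx]   # training set P_r (4 parts)
    P_e, y_e = P[test_idx], y[test_idx]     # test set P_e (1 part)
    # steps 2-6 run on (P_r, y_r) and are evaluated on (P_e, y_e)
```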
Step 2: sample the training data set $P_r$ in the sample dimension with the bootstrap method.
Sampling is done with replacement. The sampling rate is determined by the uniform random variable $\tau_1 \in [0,1]$, so that each draw selects $\hat{l} = \lceil \tau_1 \cdot l \rceil$ training samples. The method samples randomly, one sample at a time, via the training-sample index; the index of each draw is
$$m = \lceil \tau_2 \cdot l \rceil$$
where $m$ is the index of the selected sample and $\tau_2 \in [0,1]$ is a uniform random variable. There are 5 experiments in total, each with a single sampling rate; in each experiment, under that sampling rate, the draw is repeated $B$ times, each of the $B$ draws selecting $\hat{l}$ training samples, which yields $B$ training sample sets.
From the training samples selected in step 2, the $B$ bootstrap branches $O_1, \dots, O_B$ are generated. Because this step uses this sampling method, only a fraction of the noisy training samples is drawn into each branch, which improves the method's robustness to noisy data.
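The sampling step can be sketched as follows. The uniform variables τ1 (sampling rate) and τ2 (per-draw index) follow the description above; the ceiling form ceil(τ1·l) reconstructs a formula not reproduced in the source, so treat it as an assumption:

```python
import numpy as np

def bootstrap_branches(P_r, y_r, B, rng):
    """Step 2 sketch: draw B bootstrap branches by sampling with replacement."""
    l = len(P_r)
    tau_1 = rng.uniform(0.0, 1.0)             # sampling rate for this experiment
    n_draw = max(1, int(np.ceil(tau_1 * l)))  # assumed count: ceil(tau_1 * l)
    branches = []
    for _ in range(B):
        # index draws m = ceil(tau_2 * l) with tau_2 ~ U[0,1]; equivalent to
        # a uniform integer draw over the l training-sample indices (0-based)
        idx = rng.integers(0, l, size=n_draw)
        branches.append((P_r[idx], y_r[idx]))
    return branches

# usage (B chosen freely here):
# branches = bootstrap_branches(P_r, y_r, B=20, rng=np.random.default_rng(0))
```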
Step 3: train classifiers with the linear discriminant analysis (LDA) algorithm.
Each of the above bootstrap branches serves as a training set of its own, and the LDA algorithm generates the corresponding classifier $\chi_b$. LDA is used because it is a dimension-reduction algorithm: it simultaneously reduces noise and removes redundant attributes, thereby achieving ensemble behaviour in the attribute dimension and improving classification. The objective function of LDA is
$$\Xi_b = \sum_{k=1}^{K} \Upsilon(y^b \mid k)\, \Lambda(k \mid p^b)$$
where $\Xi_b$ denotes the objective function; $K$ is the total number of labels; $\Lambda(k \mid p^b)$ is the prior probability function of label $k$ for a sample $p^b$ in bootstrap branch $O_b$; and $\Upsilon(y^b \mid k)$ is the loss function of the classification result, $k$ being the true label and $y^b$ the predicted label, with $\Upsilon(y^b \mid k) = 0$ when the sample is correctly classified and $\Upsilon(y^b \mid k) = 1$ otherwise.
$\Lambda(k \mid p^b)$ is computed as
$$\Lambda(k \mid p^b) = \frac{\Lambda(p^b \mid k)\, \Lambda(k)}{\Lambda(p^b)}, \qquad \Lambda(p^b \mid k) = \frac{1}{(2\pi\,|\Sigma_k|)^{1/2}}\, e^{-\frac{1}{2}(p^b - \mu_k^b)^T \Sigma_k^{-1} (p^b - \mu_k^b)}$$
where $\mu_k^b$ and $\Sigma_k$ are the mean and covariance matrix of each label $k$ in bootstrap branch $O_b$; $|\Sigma_k|$ and $\Sigma_k^{-1}$ are the determinant and inverse of $\Sigma_k$; $\Lambda(p^b)$ is a normalizing constant; and $\Lambda(k)$ is the ratio of the number of training samples of class $k$ to the total number of samples in branch $O_b$.
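A minimal sketch of step 3, using scikit-learn's LinearDiscriminantAnalysis as a stand-in for the Bayes-rule LDA objective above (both assume Gaussian class-conditional densities):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_branch_classifiers(branches):
    """Step 3 sketch: fit one LDA classifier chi_b per bootstrap branch O_b."""
    classifiers = []
    for P_b, y_b in branches:
        # LDA classifier; its discriminant projection also acts as
        # attribute-dimension reduction, which is why the patent chooses it
        clf = LinearDiscriminantAnalysis()
        clf.fit(P_b, y_b)
        classifiers.append(clf)
    return classifiers
```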
Step 4: select the first classifier.
4.1 Create the new ensemble classifier set $\Gamma(P)$, initialized as empty.
4.2 Initialize the weights of all samples as $\omega_i^1 = 1/l$.
4.3 Compute the accuracy $\xi_j$ ($j \in \{1, \dots, B\}$) of each bootstrap branch classifier and choose the most accurate one as the first selected classifier:
$$\chi_1 = \arg\max_{\chi_j \in \hat{\Gamma}} \xi_j$$
4.4 Compute the weighted sum error of the samples misclassified by the first selected classifier:
$$\epsilon_1 = \sum_i \omega_i^1\, \Theta(\chi_1(P), y, i)$$
where the error function is $\Theta(\chi(P), y, i) = 1$ if $\chi(p_i) \neq y_i$ and $0$ otherwise, $i \in \{1, \dots, l\}$; $\chi(p_i)$ denotes the classification result of classifier $\chi$ for sample $p_i$ and is $1$ or $-1$.
4.5 Compute the weight $\theta_1$ of the first selected classifier $\chi_1$:
$$\theta_1 = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_1}{\epsilon_1}\right)$$
4.6 Add the first selected classifier to the ensemble classifier set $\Gamma(P)$:
$$\Gamma_1(P) = \theta_1 \chi_1$$
4.7 Update the weights of all training samples as
$$\omega_i^2 = \omega_i^1\, e^{-y_i \theta_1 \chi_1(p_i)}$$
The weights are normalized, so that $\sum_{i=1}^{l} \omega_i^2 = 1$.
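The selection of the first classifier maps directly onto a few lines of NumPy; the sketch below assumes binary labels in {-1, +1} and a weighted error strictly between 0 and 1:

```python
import numpy as np

def select_first_classifier(classifiers, P_r, y_r):
    """Step 4 sketch: uniform sample weights, pick the most accurate branch
    classifier, compute theta_1 and reweight the training samples."""
    l = len(y_r)
    w = np.full(l, 1.0 / l)                        # omega_i^1 = 1/l
    accs = [clf.score(P_r, y_r) for clf in classifiers]
    j1 = int(np.argmax(accs))                      # chi_1 = argmax_j xi_j
    pred = classifiers[j1].predict(P_r)
    eps1 = float(w[pred != y_r].sum())             # weighted sum error, in (0, 1)
    theta1 = 0.5 * np.log((1.0 - eps1) / eps1)     # classifier weight theta_1
    w = w * np.exp(-y_r * theta1 * pred)           # omega_i^2 update
    w = w / w.sum()                                # renormalise so sum(w) = 1
    return j1, theta1, w
```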
Step 5: progressive classifier selection.
5.1 This step builds on step 4 and progressively chooses further well-performing classifier branches for integration. The progressive selection method computes, for each remaining branch $\chi_j^g$ (the branches not yet selected into $\Gamma(P)$), the integrated loss function $\Pi_1(\chi_j^g)$, defined as
$$\Pi_1(\chi_j^g) = \beta_1 \xi_j + \beta_2\, \varphi(O_j, O_h)$$
where $\xi_j$ is the accuracy of branch $\chi_j^g$ after the sample-weight adjustment; the classifier distance function $\varphi(O_j, O_h)$ measures the similarity of bootstraps $O_j$ and $O_h$ and is mainly used to compute the correlation between the branch about to be added and the set of already chosen branches. $O_j$ is the bootstrap branch of classifier $\chi_j$, and $O_h$ ranges over the bootstrap branches of the classifier set obtained in previous iterations. $\beta_1$ and $\beta_2$ set the proportion of the two terms, with $\beta_1 + \beta_2 = 1$.
Specifically, bootstraps $O_j$ and $O_h$ can be regarded as two Gaussian mixture models (GMMs), denoted $\Omega_j$ and $\Omega_h$. For the two GMMs, the components $\Phi_{k_1}^j$ carry weights $\pi_{k_1}^j$, the components $\Phi_{k_2}^h$ carry weights $\pi_{k_2}^h$, and $K_1$ and $K_2$ are the numbers of components of $\Omega_j$ and $\Omega_h$ respectively; $\psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$ denotes the Bhattacharyya distance between two Gaussian components, whose means and covariance matrices are $\mu$ and $\Sigma$ (the concrete definitions of $\varphi$ and $\psi$ are given below).
In general, the definition of the classifier loss function $\Pi_1(\chi)$ must consider two aspects: a) the weighted sample distribution; b) the diversity of the different bootstraps under different similarity computations.
First compute the value of the classifier loss function $\Pi_1(\chi_j^g)$ of every remaining branch not yet in the ensemble and sort the values. Starting from the branch with the largest $\Pi_1(\chi_j^g)$, check whether
$$\Pi_2(\Gamma_{g-1}(P)) < \Pi_2(\Gamma_{g-1}(P) + \theta_g \chi_j^g)$$
holds; if it holds, consider the next branch, and as soon as it fails, the current classifier becomes the next one added to the ensemble classifier set $\Gamma(P)$, where
$$\Pi_2(\Gamma) = \sum_{i=1}^{l} |y_i - \Gamma(p_i)|, \qquad \Gamma(p_i) = \arg\max_c \sum_{h=1}^{g} \theta_h \cdot 1\{\chi_h(p_i) = c\}$$
Here $c \in \{-1, 1\}$ is the set of sample labels (true labels), and $\chi_h$ is the $h$-th linear discriminant analysis classifier in the already obtained ensemble classifier set $\Gamma_{g-1}(P)$.
The integrated loss function $\Pi_2(\Gamma)$ decides which classifier is added to the final set; its meaning is to exclude branches whose addition would reduce the classification accuracy. This determines the next classifier added to the ensemble classifier set $\Gamma(P)$.
5.2 After the new classifier branch is added, compute the weighted sum error of the samples misclassified by the new branch of the ensemble classifier:
$$\epsilon_g = \sum_i \omega_i^g\, \Theta(\chi_j^g(P), y, i)$$
where $g \in \{1, \dots, G\}$ is the current iteration index and $|\Gamma(P)|$ denotes the number of branches of the target set $\Gamma(P)$. Then the weight of the newly added classifier is updated as
$$\theta_j^g = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_g}{\epsilon_g}\right)$$
5.3 The newest classifier is added to the set chosen in the previous step, generating the newest ensemble classifier set:
$$\Gamma_g(P) = \Gamma_{g-1}(P) + \theta_j^g \chi_j^g$$
The sample weights are updated on the basis of the new ensemble classifier as
$$\omega_i^{g+1} = \omega_i^g\, e^{-y_i \theta_j^g \chi_j^g(p_i)}$$
with the updated weights normalized so that $\sum_{i=1}^{l} \omega_i^{g+1} = 1$.
5.4 Continue executing steps 5.1 to 5.3 until the number of chosen branches reaches the preset branch number $G$, then stop iterating and output the chosen ensemble classifier set $\Gamma_G$ and the corresponding weights. A code sketch of this selection loop follows below.
In step 5.1 there are different ways to define the classifier distance function $\varphi(O_j, O_h)$. Bootstraps $O_j$ and $O_h$ can be regarded as two Gaussian mixture models (GMMs), denoted $\Omega_j$ and $\Omega_h$. The parameters of the GMMs can be initialized with the k-means algorithm, and the optimal parameter values obtained with the Expectation-Maximization (EM) algorithm.
For the two GMMs, the components $\Phi_{k_1}^j$ carry weights $\pi_{k_1}^j$, the components $\Phi_{k_2}^h$ carry weights $\pi_{k_2}^h$, and $K_1$ and $K_2$ are the numbers of components of $\Omega_j$ and $\Omega_h$ respectively. The following methods can be used to compute the classifier distance function $\varphi(O_j, O_h)$:
1. $\varphi_1(O_j, O_h)$ is defined as the smallest distance between two Gaussian components:
$$\varphi_1(O_j, O_h) = \min_{k_1, k_2} \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$$
where $\psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$ denotes the Bhattacharyya distance between the two Gaussian distributions $\Phi_{k_1}^j$ and $\Phi_{k_2}^h$, with mean vectors and covariance matrices $\mu_{k_1}^j$, $\Sigma_{k_1}^j$ and $\mu_{k_2}^h$, $\Sigma_{k_2}^h$:
$$\psi(\Phi_{k_1}^j, \Phi_{k_2}^h) = \frac{1}{8} (\mu_{k_1}^j - \mu_{k_2}^h)^T \left(\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right)^{-1} (\mu_{k_1}^j - \mu_{k_2}^h) + \frac{1}{2} \ln \frac{\left|\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right|}{\sqrt{|\Sigma_{k_1}^j|\, |\Sigma_{k_2}^h|}}$$
2. $\varphi_2(O_j, O_h)$ is defined as the largest distance between two Gaussian components:
$$\varphi_2(O_j, O_h) = \max_{k_1, k_2} \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$$
3. $\varphi_3(O_j, O_h)$ is defined as the pairwise average similarity:
$$\varphi_3(O_j, O_h) = \frac{1}{K_1 K_2} \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$$
4. $\varphi_4(O_j, O_h)$ is defined as the weighted average similarity:
$$\varphi_4(O_j, O_h) = \frac{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h\, \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)}{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h}$$
The main advantage of the $\varphi_4(O_j, O_h)$ definition is that it adds weights and can therefore compute the similarity of different branches properly. Experiments also show that the fourth method is optimal, so the classification method of this embodiment uses this definition.
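A sketch of φ4 with scikit-learn's GaussianMixture (k-means initialization and EM fitting, as described above). The number of mixture components is an assumption, since the patent does not fix K1 and K2; each branch argument is the (samples, labels) pair produced by the bootstrap sketch:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance psi between two Gaussian components."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov)
                         / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def gmm_distance(branch_j, branch_h, n_components=3, seed=0):
    """phi_4 sketch: weight-weighted average pairwise Bhattacharyya distance
    between the components of the two branch GMMs. n_components=3 is an
    assumed choice for K_1 = K_2."""
    gj = GaussianMixture(n_components, init_params='kmeans',
                         random_state=seed).fit(branch_j[0])
    gh = GaussianMixture(n_components, init_params='kmeans',
                         random_state=seed).fit(branch_h[0])
    num = den = 0.0
    for pj, mj, cj in zip(gj.weights_, gj.means_, gj.covariances_):
        for ph, mh, ch in zip(gh.weights_, gh.means_, gh.covariances_):
            num += pj * ph * bhattacharyya(mj, cj, mh, ch)
            den += pj * ph
    return num / den
```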
To summarize: the training samples are input; the Bagging sampling method and the LDA dimension-reduction algorithm remove noisy sample dimensions and attribute dimensions; a series of bootstrap sub-branches and the sample members of each sub-branch are generated; classifiers are selected with a progressive selection algorithm based on a classifier-specific cost function and an ensemble cost function, iterating to obtain the weight of each branch; and the branch results are aggregated with weighted voting to obtain the final classification result of the ensemble classifier. The accuracy of the method of this embodiment is analysed further as follows:
The test set $P_e$ split off in step 1 serves as the input data of the classifiers in the attribute dimension (in each data set there are attribute dimensions and sample dimensions: each row is a sample and each column an attribute dimension). After the computation of each classifier branch, the prediction label of each branch for the sample is obtained.
For the prediction labels of the above steps, a weighted vote is needed to obtain the final prediction. Denote by $\hat{y}^g = \chi_g(P_e)$ the prediction labels of the $g$-th classifier $\chi_g$ of the ensemble classifier set for all samples, and by $\hat{y}_i^g$ its prediction label for the $i$-th sample $p_i$; $c \in \{0, 1, \dots, k-1\}$ is a specific class label and $k$ the total number of classes.
The weighted vote is carried out according to the following formula, giving the final prediction $y^*$:
$$y^* = \arg\max_c \sum_{g=1}^{G} \theta_g \cdot 1\{\hat{y}_i^g = c\}$$
In the experiments, the labels produced by the method are compared with those of the original samples, and the corresponding classification accuracy (AC) is computed as
$$AC = \frac{1}{|P_S|} \sum_{p_i \in P_S} 1\{\hat{y}_i = y_i\}$$
where $P_S$ denotes the test set, $|P_S|$ the number of test samples in $P_S$, $\hat{y}_i$ the prediction label of the progressive ensemble classification method for sample $p_i$, and $y_i$ the true label of the sample. In the specific experiments every result is computed over 10 runs and the average value is used as the final classification accuracy; the 5-fold cross validation mainly serves to reduce the influence of randomness.
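A sketch of the weighted vote and the accuracy computation of step 6; classes_ comes from the scikit-learn classifiers used in the earlier sketches:

```python
import numpy as np

def predict_weighted_vote(classifiers, chosen, thetas, P_e):
    """Step 6 sketch: for each test sample, the label c maximising the sum
    of theta_g over the chosen branches predicting c wins the vote."""
    labels = np.unique(np.concatenate(
        [classifiers[j].classes_ for j in chosen]))
    scores = np.zeros((len(P_e), len(labels)))
    for j, th in zip(chosen, thetas):
        pred = classifiers[j].predict(P_e)
        for c_idx, c in enumerate(labels):
            scores[:, c_idx] += th * (pred == c)  # weighted votes for label c
    return labels[np.argmax(scores, axis=1)]

# classification accuracy AC on the test fold:
# y_star = predict_weighted_vote(classifiers, chosen, thetas, P_e)
# ac = float(np.mean(y_star == y_e))
```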
Fig. 2 shows the experimental results of the different classifiers; the bold-italic entries mark the most accurate method on each data set. The results show that the method proposed in this embodiment obtains good classification results on different data sets, and the final results show that the method can effectively solve the classification problem for data with noisy labels.
The above embodiment is a preferred implementation of the present invention, but the implementation of the present invention is not limited by the above embodiment; any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.

Claims (9)

1. A progressive ensemble classification method for data with noisy labels, characterized by comprising the following steps:
S1, input training samples and test samples;
S2, sample the training set in the sample dimension with the bootstrap method to obtain B bootstrap branches;
S3, train a classifier on each of the B bootstrap branches with the LDA linear discriminant analysis method, generating one classifier per branch;
S4, create a new ensemble classifier set Γ(P), initialized as empty, and add a first classifier selected from the classifiers generated in step S3 to Γ(P);
S5, progressive classifier selection: progressively choose further classifiers that satisfy the selection condition from the remaining ones and add them as branches to Γ(P), until the number of chosen branches reaches the preset branch number G of the ensemble classifier set, then stop selecting; output the ensemble classifier set together with the weight of each classifier branch;
S6, classify the test samples with the ensemble classifier set and the corresponding branch weights to obtain the final prediction.
2. The progressive ensemble classification method according to claim 1, characterized in that the concrete steps of step S1 are: input a data set with noisy labels to be classified and select the training data set $P_r = \{(p_1, y_1), (p_2, y_2), \dots, (p_l, y_l)\}$, where $l$ is the number of training samples, $p_i$ ($i \in \{1, \dots, l\}$) is a training sample and $y_i$ its sample label, and each $p_i$ has $d$ attribute dimensions.
3. The progressive ensemble classification method according to claim 1, characterized in that the experiments use 5-fold cross validation, specifically:
First experiment: the 1st part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$; the training data set is $P_r = \{(p_1, y_1), (p_2, y_2), \dots, (p_l, y_l)\}$, where $l$ is the number of training samples, $p_i$ ($i \in \{1, \dots, l\}$) is a training sample and $y_i$ its label, and each $p_i$ has $d$ attribute dimensions;
Second experiment: the 2nd part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$;
and so on; 5 experiments are carried out in total.
4. The progressive ensemble classification method according to claim 2, characterized in that in step S2 the training data set $P_r$ is sampled in the sample dimension with the bootstrap method:
sampling is done with replacement, the sampling rate being determined by the uniform random variable $\tau_1 \in [0,1]$, so that each draw selects $\hat{l} = \lceil \tau_1 \cdot l \rceil$ training samples; random sampling proceeds one sample at a time via the training-sample index, the index of each draw being
$$m = \lceil \tau_2 \cdot l \rceil$$
where $m$ is the index of the selected sample and $\tau_2 \in [0,1]$ is a uniform random variable; in each experiment, under one sampling rate, the draw is repeated B times, each of the B draws selecting $\hat{l}$ training samples, yielding B training sample sets, i.e. generating the B bootstrap branches $O_1, \dots, O_B$.
5. The progressive ensemble classification method according to claim 4, characterized in that the concrete steps of training the classifiers in step 3 are: each bootstrap branch serves as a training set of its own, and the LDA algorithm generates the corresponding classifier $\chi_b$; the objective function of LDA is
$$\Xi_b = \sum_{k=1}^{K} \Upsilon(y^b \mid k)\, \Lambda(k \mid p^b)$$
where $\Xi_b$ denotes the objective function; $K$ is the total number of labels; $\Lambda(k \mid p^b)$ is the prior probability function of label $k$ for a sample $p^b$ in bootstrap branch $O_b$; and $\Upsilon(y^b \mid k)$ is the loss function of the classification result, $k$ being the true label and $y^b$ the predicted label, with $\Upsilon(y^b \mid k) = 0$ when the sample is correctly classified and $\Upsilon(y^b \mid k) = 1$ otherwise;
$\Lambda(k \mid p^b)$ is computed as
$$\Lambda(k \mid p^b) = \frac{\Lambda(p^b \mid k)\, \Lambda(k)}{\Lambda(p^b)}$$
$$\Lambda(p^b \mid k) = \frac{1}{(2\pi\,|\Sigma_k|)^{1/2}}\, e^{-\frac{1}{2}(p^b - \mu_k^b)^T \Sigma_k^{-1} (p^b - \mu_k^b)}$$
where $\mu_k^b$ and $\Sigma_k$ are the mean and covariance matrix of each label $k$ in bootstrap branch $O_b$; $|\Sigma_k|$ and $\Sigma_k^{-1}$ are the determinant and inverse of $\Sigma_k$; $\Lambda(p^b)$ is a normalizing constant; and $\Lambda(k)$ is the ratio of the number of training samples of class $k$ to the total number of samples in branch $O_b$.
6. The progressive ensemble classification method according to claim 2, characterized in that the concrete steps of step S4 are:
S4-1: create the new ensemble classifier set $\Gamma(P)$, initialized as empty;
S4-2: initialize the weights of all samples as $\omega_i^1 = 1/l$;
S4-3: compute the accuracy $\xi_j$ ($j \in \{1, \dots, B\}$) of each bootstrap branch classifier and choose the most accurate one as the first selected classifier:
$$\chi_1 = \arg\max_{\chi_j \in \hat{\Gamma}} \xi_j$$
S4-4: compute the weighted sum error of the samples misclassified by classifier $\chi_1$:
$$\epsilon_1 = \sum_i \omega_i^1\, \Theta(\chi_1(P), y, i)$$
where the error function is $\Theta(\chi(P), y, i) = 1$ if $\chi(p_i) \neq y_i$ and $0$ otherwise, $i \in \{1, \dots, l\}$; $\chi(p_i)$ denotes the classification result of classifier $\chi$ for sample $p_i$;
S4-5: compute the weight $\theta_1$ of classifier $\chi_1$:
$$\theta_1 = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_1}{\epsilon_1}\right)$$
S4-6: add classifier $\chi_1$ to the ensemble classifier set $\Gamma(P)$:
$$\Gamma_1(P) = \theta_1 \chi_1$$
S4-7: update the weights of all training samples as
$$\omega_i^2 = \omega_i^1\, e^{-y_i \theta_1 \chi_1(p_i)}$$
the weights being normalized so that
$$\sum_{i=1}^{l} \omega_i^2 = 1$$
7. The progressive ensemble classification method according to claim 6, characterized in that the concrete steps of step S5 are:
S5-1: compute for each remaining classifier $\chi_j^g$ the first integrated loss function $\Pi_1(\chi_j^g)$, where $g \in \{1, \dots, G\}$ is the current iteration index:
$$\Pi_1(\chi_j^g) = \beta_1 \xi_j + \beta_2\, \varphi(O_j, O_h)$$
where $\xi_j$ is the accuracy of classifier $\chi_j^g$ after the training-sample weight adjustment; the classifier distance function $\varphi(O_j, O_h)$ measures the similarity of bootstraps $O_j$ and $O_h$, $O_j$ being the bootstrap branch of classifier $\chi_j$ and $O_h$ ranging over the bootstrap branches of all classifiers in the already obtained classifier set; $\beta_1$ and $\beta_2$ set the proportion of the two terms, with $\beta_1 + \beta_2 = 1$;
compute the first integrated loss function $\Pi_1(\chi_j^g)$ of each remaining classifier and sort the values; compute the second integrated loss function $\Pi_2(\Gamma)$:
$$\Pi_2(\Gamma) = \sum_{i=1}^{l} |y_i - \Gamma(p_i)|$$
$$\Gamma(p_i) = \arg\max_c \sum_{h=1}^{g} \theta_h \cdot 1\{\chi_h(p_i) = c\}$$
where $c$ is a sample label and $\chi_h$ is the $h$-th classifier in the already obtained ensemble classifier set $\Gamma_{g-1}(P)$;
starting from the classifier with the largest first integrated loss $\Pi_1(\chi_j^g)$, compare: if
$$\Pi_2(\Gamma_{g-1}(P)) < \Pi_2(\Gamma_{g-1}(P) + \theta_g \chi_j^g)$$
holds, consider the next classifier; as soon as the inequality fails, the current classifier becomes the next one added to the ensemble classifier set $\Gamma(P)$;
S5-2: after the new classifier branch is added, compute the weighted sum error of the samples misclassified by the new branch of the ensemble classifier:
$$\epsilon_g = \sum_i \omega_i^g\, \Theta(\chi_j^g(P), y, i)$$
where $|\Gamma(P)|$ denotes the number of branches of the target set $\Gamma(P)$; then update the weight of the newly added classifier:
$$\theta_j^g = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_g}{\epsilon_g}\right)$$
S5-3: add the newest classifier to the chosen set, generating the newest ensemble classifier set:
$$\Gamma_g(P) = \Gamma_{g-1}(P) + \theta_j^g \chi_j^g$$
and update the weights of all training samples on the basis of the new ensemble classifier:
$$\omega_i^{g+1} = \omega_i^g\, e^{-y_i \theta_j^g \chi_j^g(p_i)}$$
with the updated weights normalized so that
$$\sum_{i=1}^{l} \omega_i^{g+1} = 1$$
S5-4: continue executing steps S5-1 to S5-3 until the number of chosen branches reaches the preset branch number $G$, then stop iterating and output the chosen ensemble classifier set $\Gamma_G$ and the corresponding weights.
8. The progressive ensemble classification method according to claim 7, characterized in that the classifier distance function $\varphi(O_j, O_h)$ in step S5-1 is computed as follows: bootstraps $O_j$ and $O_h$ can be regarded as two Gaussian mixture distributions, denoted $\Omega_j$ and $\Omega_h$; for the two Gaussian mixture models, the components $\Phi_{k_1}^j$ carry weights $\pi_{k_1}^j$, the components $\Phi_{k_2}^h$ carry weights $\pi_{k_2}^h$, and $K_1$ and $K_2$ are the numbers of components of $\Omega_j$ and $\Omega_h$ respectively:
$$\varphi(O_j, O_h) = \frac{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h\, \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)}{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h}$$
$$\psi(\Phi_{k_1}^j, \Phi_{k_2}^h) = \frac{1}{8} (\mu_{k_1}^j - \mu_{k_2}^h)^T \left(\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right)^{-1} (\mu_{k_1}^j - \mu_{k_2}^h) + \frac{1}{2} \ln \frac{\left|\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right|}{\sqrt{|\Sigma_{k_1}^j|\, |\Sigma_{k_2}^h|}}$$
where $\psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$ denotes the Bhattacharyya distance between the two Gaussian distributions $\Phi_{k_1}^j$ and $\Phi_{k_2}^h$, and $\mu_{k_1}^j$, $\Sigma_{k_1}^j$ and $\mu_{k_2}^h$, $\Sigma_{k_2}^h$ are their mean vectors and covariance matrices respectively.
9. The progressive ensemble classification method according to claim 7, characterized in that the specific method of step S6 is:
after the computation of each classifier branch, the prediction label of each branch for the sample is obtained; the prediction labels are combined by weighted voting to obtain the final prediction $y^*$:
$$y^* = \arg\max_c \sum_{g=1}^{G} \theta_g \cdot 1\{\hat{y}_i^g = c\}$$
where $\hat{y}_i^g$ denotes the prediction of the $g$-th classifier $\chi_g$ in the ensemble classifier set for the $i$-th sample $p_i$, $c \in \{0, 1, \dots, k-1\}$ is a specific sample label, and $k$ is the total number of classes.
CN201710081412.6A 2017-02-15 2017-02-15 A progressive ensemble classification method for data with noisy labels Pending CN106934414A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710081412.6A CN106934414A (en) 2017-02-15 2017-02-15 A progressive ensemble classification method for data with noisy labels

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710081412.6A CN106934414A (en) 2017-02-15 2017-02-15 A progressive ensemble classification method for data with noisy labels

Publications (1)

Publication Number Publication Date
CN106934414A true CN106934414A (en) 2017-07-07

Family

ID=59423237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710081412.6A Pending CN106934414A (en) A progressive ensemble classification method for data with noisy labels

Country Status (1)

Country Link
CN (1) CN106934414A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451101A (en) * 2017-07-21 2017-12-08 Jiangnan University A hierarchical ensemble Gaussian process regression soft-sensor modeling method
CN107451101B (en) * 2017-07-21 2020-06-09 江南大学 Method for predicting concentration of butane at bottom of debutanizer by hierarchical integrated Gaussian process regression soft measurement modeling
CN108021941A (en) * 2017-11-30 2018-05-11 Sichuan University Drug-induced hepatotoxicity prediction method and device
CN108021941B (en) * 2017-11-30 2020-08-28 四川大学 Method and device for predicting drug hepatotoxicity

Similar Documents

Publication Publication Date Title
Karthika et al. A Naïve Bayesian classifier for educational qualification
Buscema et al. Training with input selection and testing (TWIST) algorithm: a significant advance in pattern recognition performance of machine learning
CN100585617C Classifier-ensemble-based face recognition system and method
CN108090510A An ensemble learning method and device based on interval optimization
CN103473556B Hierarchical SVM classification method based on rejection subspace
CN106228183A A semi-supervised learning classification method and device
CN103927550B A handwritten numeral recognition method and system
CN106126972A A hierarchical multi-label classification method for protein function prediction
Dehuri et al. A hybrid genetic based functional link artificial neural network with a statistical comparison of classifiers over multiple datasets
CN105760888A (en) Neighborhood rough set ensemble learning method based on attribute clustering
CN109165672A An ensemble classification method based on incremental learning
Sushil et al. Rule induction for global explanation of trained models
CN106326843A (en) Face recognition method
CN104966106A Biological age step-by-step prediction method based on support vector machine
Patacsil Survival analysis approach for early prediction of student dropout using enrollment student data and ensemble models
CN106934414A A progressive ensemble classification method for data with noisy labels
CN114049527A (en) Self-knowledge distillation method and system based on online cooperation and fusion
Kumar et al. Analysis of feature selection and data mining techniques to predict student academic performance
CN116306785A (en) Student performance prediction method of convolution long-short term network based on attention mechanism
Hastarimasuci et al. Variable Selection to Determine Majors of Student using K-Nearest Neighbor and Naïve Bayes Classifier Algorithm
Ntoutsi et al. A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees
CN109858543A Image memorability prediction method based on low-rank sparse representation and relation inference
CN114997175A Sentiment analysis method based on domain-adversarial training
US20220188647A1 (en) Model learning apparatus, data analysis apparatus, model learning method and program
Gaber et al. Optimisation of ensemble classifiers using genetic algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170707)