CN106934414A - A progressive ensemble classification method for data with noisy labels - Google Patents

A progressive ensemble classification method for data with noisy labels

Info

Publication number
CN106934414A
CN106934414A (application CN201710081412.6A)
Authority
CN
China
Prior art keywords
classifier
sample
branch
sigma
bootstrap
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710081412.6A
Other languages
Chinese (zh)
Inventor
余志文
赵卓雄
王大兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN201710081412.6A
Publication of CN106934414A
Legal status: Pending


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a progressive ensemble classification method for data with noisy labels, comprising the following steps: input training samples and test samples; sample the training set in the sample dimension with the bootstrap method to obtain B bootstrap branches; train a classifier on each of the B bootstrap branches with the LDA method; create a new, empty ensemble classifier set Γ(P) and add a first classifier selected from the generated classifiers to Γ(P); progressively choose, from the remaining classifiers, those that satisfy the selection condition and add them to Γ(P) until the number of chosen classifiers reaches a preset number G, then stop selecting; output the ensemble classifier set together with the weight of each chosen classifier branch; classify the test samples to obtain the final prediction. By studying the sample dimension and the attribute dimension at the same time, the invention achieves good classification performance on data sets with noisy labels.

Description

A progressive ensemble classification method for data with noisy labels
Technical field
The invention belongs to the field of computer machine learning, and more particularly relates to a progressive ensemble classification method for data with noisy labels.
Background technology
Ensemble learning, as an important branch of machine learning, is applied in fields such as data mining, intelligent transportation systems, bioinformatics and pattern recognition, and has attracted the attention of more and more researchers. Compared with a single classifier, an ensemble learning method can integrate multiple classifiers trained under different conditions into one unified classifier. Such an ensemble classifier is characterized by stability, robustness and high accuracy. In short, owing to this outstanding performance, ensemble classifiers have been used successfully in numerous fields.
However, traditional ensemble learning methods mainly study the sample dimension and the attribute dimension separately rather than as a whole. For example, the Bagging algorithm studies only the sample dimension, while the random subspace algorithm studies only the attribute dimension. A method that considers only the sample dimension or only the attribute dimension is not sufficient to build a powerful ensemble classifier or to handle samples with noise: in some data sets the characteristic patterns reside in certain attribute dimensions, but in other data sets the same feature patterns do not play the same role.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art and to provide a progressive ensemble classification method for data with noisy labels which studies the sample dimension and the attribute dimension simultaneously and achieves good classification performance on data sets with noisy labels.
A progressive ensemble classification method for data with noisy labels comprises the following steps:
S1, input training samples and test samples;
S2, sample the training set in the sample dimension with the bootstrap method to obtain B bootstrap branches;
S3, train a classifier on each of the B bootstrap branches with the LDA linear discriminant analysis method, generating one classifier per branch;
S4, create a new ensemble classifier set Γ(P), initialized as empty, and add a first classifier selected from the classifiers generated in step S3 to Γ(P);
S5, progressive classifier selection: progressively choose further well-performing classifiers from the remaining ones and add them as branches to Γ(P), until the number of chosen branches reaches the preset branch number G of the ensemble classifier set, then stop selecting; output the ensemble classifier set together with the weight of each classifier branch;
S6, classify the test samples with the ensemble classifier set and the corresponding branch weights to obtain the final prediction.
Preferably, the concrete steps of step S1 are: input a data set with noisy labels to be classified and test it with 5-fold cross validation, specifically:
First experiment: the 1st part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$. The training data set is $P_r = \{(p_1, y_1), (p_2, y_2), \dots, (p_l, y_l)\}$, where $l$ is the number of training samples, $p_i$ ($i \in \{1, \dots, l\}$) is a training sample and $y_i$ its sample label, and each $p_i$ has $d$ attribute dimensions.
Second experiment: the 2nd part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$.
And so on; 5 experiments are carried out in total.
Preferably, in step S2 the training data set $P_r$ is sampled in the sample dimension with the bootstrap method:
Sampling is done with replacement. The sampling rate is determined by the uniform random variable $\tau_1 \in [0,1]$, so that each draw selects $\hat{l} = \lceil \tau_1 \cdot l \rceil$ training samples. Random sampling proceeds one sample at a time via the training-sample index; the index of each draw is
$$m = \lceil \tau_2 \cdot l \rceil$$
where $m$ is the index of the selected sample and $\tau_2 \in [0,1]$ is a uniform random variable. In each experiment, under one sampling rate, the draw is repeated $B$ times, each of the $B$ draws selecting $\hat{l}$ training samples; this yields $B$ training sample sets, i.e. generates the $B$ bootstrap branches $O_1, \dots, O_B$.
Preferably, the concrete steps of training the classifiers in step S3 are: each bootstrap branch serves as a training set of its own, and the LDA algorithm generates the corresponding classifier $\chi_b$. The objective function of LDA is
$$\Xi_b = \sum_{k=1}^{K} \Upsilon(y^b \mid k)\, \Lambda(k \mid p^b)$$
where $\Xi_b$ denotes the objective function; $K$ is the total number of labels; $\Lambda(k \mid p^b)$ is the prior probability function of label $k$ for a sample $p^b$ in bootstrap branch $O_b$; and $\Upsilon(y^b \mid k)$ is the loss function of the classification result, $k$ being the true label and $y^b$ the predicted label, with $\Upsilon(y^b \mid k) = 0$ when the sample is correctly classified and $\Upsilon(y^b \mid k) = 1$ otherwise.
$\Lambda(k \mid p^b)$ is computed as
$$\Lambda(k \mid p^b) = \frac{\Lambda(p^b \mid k)\, \Lambda(k)}{\Lambda(p^b)}, \qquad \Lambda(p^b \mid k) = \frac{1}{(2\pi\,|\Sigma_k|)^{1/2}}\, e^{-\frac{1}{2}(p^b - \mu_k^b)^T \Sigma_k^{-1} (p^b - \mu_k^b)}$$
where $\mu_k^b$ and $\Sigma_k$ are the mean and covariance matrix of each label $k$ in bootstrap branch $O_b$; $|\Sigma_k|$ and $\Sigma_k^{-1}$ are the determinant and inverse of $\Sigma_k$; $\Lambda(p^b)$ is a normalizing constant; and $\Lambda(k)$ is the ratio of the number of training samples of class $k$ to the total number of samples in branch $O_b$.
Preferably, the concrete steps of step S4 are:
S4-1: create the new ensemble classifier set $\Gamma(P)$, initialized as empty.
S4-2: initialize the weights of all samples as $\omega_i^1 = 1/l$.
S4-3: compute the accuracy $\xi_j$ ($j \in \{1, \dots, B\}$) of each bootstrap branch classifier and choose the most accurate one as the first selected classifier:
$$\chi_1 = \arg\max_{\chi_j \in \hat{\Gamma}} \xi_j$$
S4-4: compute the weighted sum error of the samples misclassified by classifier $\chi_1$:
$$\epsilon_1 = \sum_i \omega_i^1\, \Theta(\chi_1(P), y, i)$$
where the error function is $\Theta(\chi(P), y, i) = 1$ if $\chi(p_i) \neq y_i$ and $0$ otherwise, $i \in \{1, \dots, l\}$; $\chi(p_i)$ denotes the classification result of classifier $\chi$ for sample $p_i$.
S4-5: compute the weight $\theta_1$ of classifier $\chi_1$:
$$\theta_1 = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_1}{\epsilon_1}\right)$$
S4-6: add classifier $\chi_1$ to the ensemble classifier set $\Gamma(P)$:
$$\Gamma_1(P) = \theta_1 \chi_1$$
S4-7: update the weights of all training samples as
$$\omega_i^2 = \omega_i^1\, e^{-y_i \theta_1 \chi_1(p_i)}$$
The weights are normalized, so that $\sum_{i=1}^{l} \omega_i^2 = 1$.
Preferably, the concrete steps of step S5 are:
S5-1: compute for each remaining classifier $\chi_j^g$ the first integrated loss function $\Pi_1(\chi_j^g)$, where $g \in \{1, \dots, G\}$ is the current iteration index:
$$\Pi_1(\chi_j^g) = \beta_1 \xi_j + \beta_2\, \varphi(O_j, O_h)$$
where $\xi_j$ is the accuracy of classifier $\chi_j^g$ after the training-sample weight adjustment; the classifier distance function $\varphi(O_j, O_h)$ measures the similarity of bootstraps $O_j$ and $O_h$, $O_j$ being the bootstrap branch of classifier $\chi_j$ and $O_h$ ranging over the bootstrap branches of all classifiers in the already obtained classifier set; $\beta_1$ and $\beta_2$ set the proportion of the two terms, with $\beta_1 + \beta_2 = 1$.
Compute the first integrated loss function $\Pi_1(\chi_j^g)$ of each remaining classifier and sort the values. Compute the second integrated loss function $\Pi_2(\Gamma)$:
$$\Pi_2(\Gamma) = \sum_{i=1}^{l} |y_i - \Gamma(p_i)|, \qquad \Gamma(p_i) = \arg\max_c \sum_{h=1}^{g} \theta_h \cdot 1\{\chi_h(p_i) = c\}$$
where $c$ is a sample label and $\chi_h$ is the $h$-th classifier in the already obtained ensemble classifier set $\Gamma_{g-1}(P)$.
Starting from the classifier with the largest first integrated loss $\Pi_1(\chi_j^g)$, compare: if
$$\Pi_2(\Gamma_{g-1}(P)) < \Pi_2(\Gamma_{g-1}(P) + \theta_g \chi_j^g)$$
holds, consider the next classifier; as soon as the inequality fails, the current classifier becomes the next one added to the ensemble classifier set $\Gamma(P)$.
S5-2: after the new classifier branch is added, compute the weighted sum error of the samples misclassified by the new branch of the ensemble classifier:
$$\epsilon_g = \sum_i \omega_i^g\, \Theta(\chi_j^g(P), y, i)$$
where $g$ is the current iteration index and $|\Gamma(P)|$ denotes the number of branches of the target set $\Gamma(P)$; then update the weight of the newly added classifier:
$$\theta_j^g = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_g}{\epsilon_g}\right)$$
S5-3: add the newest classifier to the chosen set, generating the newest ensemble classifier set:
$$\Gamma_g(P) = \Gamma_{g-1}(P) + \theta_j^g \chi_j^g$$
Update the weights of all training samples on the basis of the new ensemble classifier:
$$\omega_i^{g+1} = \omega_i^g\, e^{-y_i \theta_j^g \chi_j^g(p_i)}$$
with the updated weights normalized so that $\sum_{i=1}^{l} \omega_i^{g+1} = 1$.
S5-4: continue executing steps S5-1 to S5-3 until the number of chosen branches reaches the preset branch number $G$, then stop iterating and output the chosen ensemble classifier set $\Gamma_G$ and the corresponding weights.
Further, the classifier distance function $\varphi(O_j, O_h)$ in step S5-1 is computed as follows: bootstraps $O_j$ and $O_h$ can be regarded as two Gaussian mixture distributions, denoted $\Omega_j$ and $\Omega_h$. For the two Gaussian mixture models, the components $\Phi_{k_1}^j$ carry weights $\pi_{k_1}^j$, the components $\Phi_{k_2}^h$ carry weights $\pi_{k_2}^h$, and $K_1$ and $K_2$ are the numbers of components of $\Omega_j$ and $\Omega_h$ respectively:
$$\varphi(O_j, O_h) = \frac{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h\, \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)}{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h}$$
$$\psi(\Phi_{k_1}^j, \Phi_{k_2}^h) = \frac{1}{8} (\mu_{k_1}^j - \mu_{k_2}^h)^T \left(\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right)^{-1} (\mu_{k_1}^j - \mu_{k_2}^h) + \frac{1}{2} \ln \frac{\left|\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right|}{\sqrt{|\Sigma_{k_1}^j|\, |\Sigma_{k_2}^h|}}$$
where $\psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$ denotes the Bhattacharyya distance between the two Gaussian distributions $\Phi_{k_1}^j$ and $\Phi_{k_2}^h$, and $\mu_{k_1}^j$, $\Sigma_{k_1}^j$ and $\mu_{k_2}^h$, $\Sigma_{k_2}^h$ are their mean vectors and covariance matrices respectively.
Preferably, the specific method of step S6 is:
After the computation of each classifier branch, the prediction label of each branch for a sample is obtained; the prediction labels are then combined by weighted voting to obtain the final prediction $y^*$:
$$y^* = \arg\max_c \sum_{g=1}^{G} \theta_g \cdot 1\{\hat{y}_i^g = c\}$$
where $\hat{y}_i^g$ denotes the prediction of the $g$-th classifier $\chi_g$ in the ensemble classifier set for the $i$-th sample $p_i$, $c \in \{0, 1, \dots, k-1\}$ is a specific sample label, and $k$ is the total number of classes.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The present invention studies the sample dimension and the attribute dimension at the same time. It is aimed mainly at the classification of real-life data sets with noisy labels and solves this common classification problem well.
2. The present invention proposes a progressive ensemble framework that obtains good ensemble results with fewer ensemble branches, improving the effectiveness of the ensemble.
3. The present invention proposes a classifier selection algorithm based on different similarity computations for selecting better classifiers, thereby constituting an effective ensemble classification algorithm.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the embodiment;
Fig. 2 shows the experimental results of different classifiers.
Specific embodiments
The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
A progressive ensemble classification method for data with noisy labels comprises the following steps:
S1, input training samples and test samples;
S2, sample the training set in the sample dimension with the bootstrap method to obtain B bootstrap branches;
S3, train a classifier on each of the B bootstrap branches with the LDA linear discriminant analysis method, generating one classifier per branch;
S4, create a new ensemble classifier set Γ(P), initialized as empty, and add a first classifier selected from the classifiers generated in step S3 to Γ(P);
S5, progressive classifier selection: progressively choose further well-performing classifiers from the remaining ones and add them as branches to Γ(P), until the number of chosen branches reaches the preset branch number G of the ensemble classifier set, then stop selecting; output the ensemble classifier set together with the weight of each classifier branch;
S6, classify the test samples with the ensemble classifier set and the corresponding branch weights to obtain the final prediction.
The method of this embodiment is described in further detail below with reference to Fig. 1.
Step 1: input training samples and test samples.
Input a data set with noisy labels to be classified. Each data set has attribute dimensions and sample dimensions: each row is a sample and each column an attribute dimension, and each sample has a label. Divide the data set evenly into 5 parts and test with 5-fold cross validation. Specifically:
First experiment: the 1st part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$. The training data set is $P_r = \{(p_1, y_1), (p_2, y_2), \dots, (p_l, y_l)\}$, where $l$ is the number of training samples, $p_i$ ($i \in \{1, \dots, l\}$) is a training sample and $y_i \in \{-1, 1\}$ is the sample label (one label represents one class; the method extends to multi-class problems). Each $p_i$ has $d$ attribute dimensions.
Second experiment: the 2nd part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$. And so on; 5 experiments are carried out in total.
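As a concrete illustration of this split (not part of the patent text), the following minimal sketch uses scikit-learn's KFold; the loader name load_noisy_dataset is a hypothetical placeholder, since the patent only assumes some labelled data set with noisy labels:

```python
import numpy as np
from sklearn.model_selection import KFold

# P: (n_samples, d) attribute matrix; y: (possibly noisy) labels.
# load_noisy_dataset is a hypothetical helper, not part of the patent.
P, y = load_noisy_dataset()

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(P), start=1):
    P_r, y_r = P[train_idx], y[train_idx]   # training set P_r (4 parts)
    P_e, y_e = P[test_idx], y[test_idx]     # test set P_e (1 part)
    # steps 2-6 run on (P_r, y_r) and are evaluated on (P_e, y_e)
```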
Step 2: sample the training data set $P_r$ in the sample dimension with the bootstrap method.
Sampling is done with replacement. The sampling rate is determined by the uniform random variable $\tau_1 \in [0,1]$, so that each draw selects $\hat{l} = \lceil \tau_1 \cdot l \rceil$ training samples. The method samples randomly, one sample at a time, via the training-sample index; the index of each draw is
$$m = \lceil \tau_2 \cdot l \rceil$$
where $m$ is the index of the selected sample and $\tau_2 \in [0,1]$ is a uniform random variable. There are 5 experiments in total, each with a single sampling rate; in each experiment, under that sampling rate, the draw is repeated $B$ times, each of the $B$ draws selecting $\hat{l}$ training samples, which yields $B$ training sample sets.
From the training samples selected in step 2, the $B$ bootstrap branches $O_1, \dots, O_B$ are generated. Because this step uses this sampling method, only a fraction of the noisy training samples is drawn into each branch, which improves the method's robustness to noisy data.
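The sampling step can be sketched as follows. The uniform variables τ1 (sampling rate) and τ2 (per-draw index) follow the description above; the ceiling form ceil(τ1·l) reconstructs a formula not reproduced in the source, so treat it as an assumption:

```python
import numpy as np

def bootstrap_branches(P_r, y_r, B, rng):
    """Step 2 sketch: draw B bootstrap branches by sampling with replacement."""
    l = len(P_r)
    tau_1 = rng.uniform(0.0, 1.0)             # sampling rate for this experiment
    n_draw = max(1, int(np.ceil(tau_1 * l)))  # assumed count: ceil(tau_1 * l)
    branches = []
    for _ in range(B):
        # index draws m = ceil(tau_2 * l) with tau_2 ~ U[0,1]; equivalent to
        # a uniform integer draw over the l training-sample indices (0-based)
        idx = rng.integers(0, l, size=n_draw)
        branches.append((P_r[idx], y_r[idx]))
    return branches

# usage (B chosen freely here):
# branches = bootstrap_branches(P_r, y_r, B=20, rng=np.random.default_rng(0))
```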
Step 3: train classifiers with the linear discriminant analysis (LDA) algorithm.
Each of the above bootstrap branches serves as a training set of its own, and the LDA algorithm generates the corresponding classifier $\chi_b$. LDA is used because it is a dimension-reduction algorithm: it simultaneously reduces noise and removes redundant attributes, thereby achieving ensemble behaviour in the attribute dimension and improving classification. The objective function of LDA is
$$\Xi_b = \sum_{k=1}^{K} \Upsilon(y^b \mid k)\, \Lambda(k \mid p^b)$$
where $\Xi_b$ denotes the objective function; $K$ is the total number of labels; $\Lambda(k \mid p^b)$ is the prior probability function of label $k$ for a sample $p^b$ in bootstrap branch $O_b$; and $\Upsilon(y^b \mid k)$ is the loss function of the classification result, $k$ being the true label and $y^b$ the predicted label, with $\Upsilon(y^b \mid k) = 0$ when the sample is correctly classified and $\Upsilon(y^b \mid k) = 1$ otherwise.
$\Lambda(k \mid p^b)$ is computed as
$$\Lambda(k \mid p^b) = \frac{\Lambda(p^b \mid k)\, \Lambda(k)}{\Lambda(p^b)}, \qquad \Lambda(p^b \mid k) = \frac{1}{(2\pi\,|\Sigma_k|)^{1/2}}\, e^{-\frac{1}{2}(p^b - \mu_k^b)^T \Sigma_k^{-1} (p^b - \mu_k^b)}$$
where $\mu_k^b$ and $\Sigma_k$ are the mean and covariance matrix of each label $k$ in bootstrap branch $O_b$; $|\Sigma_k|$ and $\Sigma_k^{-1}$ are the determinant and inverse of $\Sigma_k$; $\Lambda(p^b)$ is a normalizing constant; and $\Lambda(k)$ is the ratio of the number of training samples of class $k$ to the total number of samples in branch $O_b$.
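A minimal sketch of step 3, using scikit-learn's LinearDiscriminantAnalysis as a stand-in for the Bayes-rule LDA objective above (both assume Gaussian class-conditional densities):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_branch_classifiers(branches):
    """Step 3 sketch: fit one LDA classifier chi_b per bootstrap branch O_b."""
    classifiers = []
    for P_b, y_b in branches:
        # LDA classifier; its discriminant projection also acts as
        # attribute-dimension reduction, which is why the patent chooses it
        clf = LinearDiscriminantAnalysis()
        clf.fit(P_b, y_b)
        classifiers.append(clf)
    return classifiers
```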
Step 4: select the first classifier.
4.1 Create the new ensemble classifier set $\Gamma(P)$, initialized as empty.
4.2 Initialize the weights of all samples as $\omega_i^1 = 1/l$.
4.3 Compute the accuracy $\xi_j$ ($j \in \{1, \dots, B\}$) of each bootstrap branch classifier and choose the most accurate one as the first selected classifier:
$$\chi_1 = \arg\max_{\chi_j \in \hat{\Gamma}} \xi_j$$
4.4 Compute the weighted sum error of the samples misclassified by the first selected classifier:
$$\epsilon_1 = \sum_i \omega_i^1\, \Theta(\chi_1(P), y, i)$$
where the error function is $\Theta(\chi(P), y, i) = 1$ if $\chi(p_i) \neq y_i$ and $0$ otherwise, $i \in \{1, \dots, l\}$; $\chi(p_i)$ denotes the classification result of classifier $\chi$ for sample $p_i$ and is $1$ or $-1$.
4.5 Compute the weight $\theta_1$ of the first selected classifier $\chi_1$:
$$\theta_1 = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_1}{\epsilon_1}\right)$$
4.6 Add the first selected classifier to the ensemble classifier set $\Gamma(P)$:
$$\Gamma_1(P) = \theta_1 \chi_1$$
4.7 Update the weights of all training samples as
$$\omega_i^2 = \omega_i^1\, e^{-y_i \theta_1 \chi_1(p_i)}$$
The weights are normalized, so that $\sum_{i=1}^{l} \omega_i^2 = 1$.
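The selection of the first classifier maps directly onto a few lines of NumPy; the sketch below assumes binary labels in {-1, +1} and a weighted error strictly between 0 and 1:

```python
import numpy as np

def select_first_classifier(classifiers, P_r, y_r):
    """Step 4 sketch: uniform sample weights, pick the most accurate branch
    classifier, compute theta_1 and reweight the training samples."""
    l = len(y_r)
    w = np.full(l, 1.0 / l)                        # omega_i^1 = 1/l
    accs = [clf.score(P_r, y_r) for clf in classifiers]
    j1 = int(np.argmax(accs))                      # chi_1 = argmax_j xi_j
    pred = classifiers[j1].predict(P_r)
    eps1 = float(w[pred != y_r].sum())             # weighted sum error, in (0, 1)
    theta1 = 0.5 * np.log((1.0 - eps1) / eps1)     # classifier weight theta_1
    w = w * np.exp(-y_r * theta1 * pred)           # omega_i^2 update
    w = w / w.sum()                                # renormalise so sum(w) = 1
    return j1, theta1, w
```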
Step 5: progressive classifier selection.
5.1 This step builds on step 4 and progressively chooses further well-performing classifier branches for integration. The progressive selection method computes, for each remaining branch $\chi_j^g$ (the branches not yet selected into $\Gamma(P)$), the integrated loss function $\Pi_1(\chi_j^g)$, defined as
$$\Pi_1(\chi_j^g) = \beta_1 \xi_j + \beta_2\, \varphi(O_j, O_h)$$
where $\xi_j$ is the accuracy of branch $\chi_j^g$ after the sample-weight adjustment; the classifier distance function $\varphi(O_j, O_h)$ measures the similarity of bootstraps $O_j$ and $O_h$ and is mainly used to compute the correlation between the branch about to be added and the set of already chosen branches. $O_j$ is the bootstrap branch of classifier $\chi_j$, and $O_h$ ranges over the bootstrap branches of the classifier set obtained in previous iterations. $\beta_1$ and $\beta_2$ set the proportion of the two terms, with $\beta_1 + \beta_2 = 1$.
Specifically, bootstraps $O_j$ and $O_h$ can be regarded as two Gaussian mixture models (GMMs), denoted $\Omega_j$ and $\Omega_h$. For the two GMMs, the components $\Phi_{k_1}^j$ carry weights $\pi_{k_1}^j$, the components $\Phi_{k_2}^h$ carry weights $\pi_{k_2}^h$, and $K_1$ and $K_2$ are the numbers of components of $\Omega_j$ and $\Omega_h$ respectively; $\psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$ denotes the Bhattacharyya distance between two Gaussian components, whose means and covariance matrices are $\mu$ and $\Sigma$ (the concrete definitions of $\varphi$ and $\psi$ are given below).
In general, the definition of the classifier loss function $\Pi_1(\chi)$ must consider two aspects: a) the weighted sample distribution; b) the diversity of the different bootstraps under different similarity computations.
First compute the value of the classifier loss function $\Pi_1(\chi_j^g)$ of every remaining branch not yet in the ensemble and sort the values. Starting from the branch with the largest $\Pi_1(\chi_j^g)$, check whether
$$\Pi_2(\Gamma_{g-1}(P)) < \Pi_2(\Gamma_{g-1}(P) + \theta_g \chi_j^g)$$
holds; if it holds, consider the next branch, and as soon as it fails, the current classifier becomes the next one added to the ensemble classifier set $\Gamma(P)$, where
$$\Pi_2(\Gamma) = \sum_{i=1}^{l} |y_i - \Gamma(p_i)|, \qquad \Gamma(p_i) = \arg\max_c \sum_{h=1}^{g} \theta_h \cdot 1\{\chi_h(p_i) = c\}$$
Here $c \in \{-1, 1\}$ is the set of sample labels (true labels), and $\chi_h$ is the $h$-th linear discriminant analysis classifier in the already obtained ensemble classifier set $\Gamma_{g-1}(P)$.
The integrated loss function $\Pi_2(\Gamma)$ decides which classifier is added to the final set; its meaning is to exclude branches whose addition would reduce the classification accuracy. This determines the next classifier added to the ensemble classifier set $\Gamma(P)$.
5.2 After the new classifier branch is added, compute the weighted sum error of the samples misclassified by the new branch of the ensemble classifier:
$$\epsilon_g = \sum_i \omega_i^g\, \Theta(\chi_j^g(P), y, i)$$
where $g \in \{1, \dots, G\}$ is the current iteration index and $|\Gamma(P)|$ denotes the number of branches of the target set $\Gamma(P)$. Then the weight of the newly added classifier is updated as
$$\theta_j^g = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_g}{\epsilon_g}\right)$$
5.3 The newest classifier is added to the set chosen in the previous step, generating the newest ensemble classifier set:
$$\Gamma_g(P) = \Gamma_{g-1}(P) + \theta_j^g \chi_j^g$$
The sample weights are updated on the basis of the new ensemble classifier as
$$\omega_i^{g+1} = \omega_i^g\, e^{-y_i \theta_j^g \chi_j^g(p_i)}$$
with the updated weights normalized so that $\sum_{i=1}^{l} \omega_i^{g+1} = 1$.
5.4 Continue executing steps 5.1 to 5.3 until the number of chosen branches reaches the preset branch number $G$, then stop iterating and output the chosen ensemble classifier set $\Gamma_G$ and the corresponding weights. A code sketch of this selection loop follows below.
In step 5.1 there are different ways to define the classifier distance function $\varphi(O_j, O_h)$. Bootstraps $O_j$ and $O_h$ can be regarded as two Gaussian mixture models (GMMs), denoted $\Omega_j$ and $\Omega_h$. The parameters of the GMMs can be initialized with the k-means algorithm, and the optimal parameter values obtained with the Expectation-Maximization (EM) algorithm.
For the two GMMs, the components $\Phi_{k_1}^j$ carry weights $\pi_{k_1}^j$, the components $\Phi_{k_2}^h$ carry weights $\pi_{k_2}^h$, and $K_1$ and $K_2$ are the numbers of components of $\Omega_j$ and $\Omega_h$ respectively. The following methods can be used to compute the classifier distance function $\varphi(O_j, O_h)$:
1. $\varphi_1(O_j, O_h)$ is defined as the smallest distance between two Gaussian components:
$$\varphi_1(O_j, O_h) = \min_{k_1, k_2} \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$$
where $\psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$ denotes the Bhattacharyya distance between the two Gaussian distributions $\Phi_{k_1}^j$ and $\Phi_{k_2}^h$, with mean vectors and covariance matrices $\mu_{k_1}^j$, $\Sigma_{k_1}^j$ and $\mu_{k_2}^h$, $\Sigma_{k_2}^h$:
$$\psi(\Phi_{k_1}^j, \Phi_{k_2}^h) = \frac{1}{8} (\mu_{k_1}^j - \mu_{k_2}^h)^T \left(\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right)^{-1} (\mu_{k_1}^j - \mu_{k_2}^h) + \frac{1}{2} \ln \frac{\left|\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right|}{\sqrt{|\Sigma_{k_1}^j|\, |\Sigma_{k_2}^h|}}$$
2. $\varphi_2(O_j, O_h)$ is defined as the largest distance between two Gaussian components:
$$\varphi_2(O_j, O_h) = \max_{k_1, k_2} \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$$
3. $\varphi_3(O_j, O_h)$ is defined as the pairwise average similarity:
$$\varphi_3(O_j, O_h) = \frac{1}{K_1 K_2} \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$$
4. $\varphi_4(O_j, O_h)$ is defined as the weighted average similarity:
$$\varphi_4(O_j, O_h) = \frac{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h\, \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)}{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h}$$
The main advantage of the $\varphi_4(O_j, O_h)$ definition is that it adds weights and can therefore compute the similarity of different branches properly. Experiments also show that the fourth method is optimal, so the classification method of this embodiment uses this definition.
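A sketch of φ4 with scikit-learn's GaussianMixture (k-means initialization and EM fitting, as described above). The number of mixture components is an assumption, since the patent does not fix K1 and K2; each branch argument is the (samples, labels) pair produced by the bootstrap sketch:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance psi between two Gaussian components."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov)
                         / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def gmm_distance(branch_j, branch_h, n_components=3, seed=0):
    """phi_4 sketch: weight-weighted average pairwise Bhattacharyya distance
    between the components of the two branch GMMs. n_components=3 is an
    assumed choice for K_1 = K_2."""
    gj = GaussianMixture(n_components, init_params='kmeans',
                         random_state=seed).fit(branch_j[0])
    gh = GaussianMixture(n_components, init_params='kmeans',
                         random_state=seed).fit(branch_h[0])
    num = den = 0.0
    for pj, mj, cj in zip(gj.weights_, gj.means_, gj.covariances_):
        for ph, mh, ch in zip(gh.weights_, gh.means_, gh.covariances_):
            num += pj * ph * bhattacharyya(mj, cj, mh, ch)
            den += pj * ph
    return num / den
```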
To summarize: the training samples are input; the Bagging sampling method and the LDA dimension-reduction algorithm remove noisy sample dimensions and attribute dimensions; a series of bootstrap sub-branches and the sample members of each sub-branch are generated; classifiers are selected with a progressive selection algorithm based on a classifier-specific cost function and an ensemble cost function, iterating to obtain the weight of each branch; and the branch results are aggregated with weighted voting to obtain the final classification result of the ensemble classifier. The accuracy of the method of this embodiment is analysed further as follows:
The test set $P_e$ split off in step 1 serves as the input data of the classifiers in the attribute dimension (in each data set there are attribute dimensions and sample dimensions: each row is a sample and each column an attribute dimension). After the computation of each classifier branch, the prediction label of each branch for the sample is obtained.
For the prediction labels of the above steps, a weighted vote is needed to obtain the final prediction. Denote by $\hat{y}^g = \chi_g(P_e)$ the prediction labels of the $g$-th classifier $\chi_g$ of the ensemble classifier set for all samples, and by $\hat{y}_i^g$ its prediction label for the $i$-th sample $p_i$; $c \in \{0, 1, \dots, k-1\}$ is a specific class label and $k$ the total number of classes.
The weighted vote is carried out according to the following formula, giving the final prediction $y^*$:
$$y^* = \arg\max_c \sum_{g=1}^{G} \theta_g \cdot 1\{\hat{y}_i^g = c\}$$
In the experiments, the labels produced by the method are compared with those of the original samples, and the corresponding classification accuracy (AC) is computed as
$$AC = \frac{1}{|P_S|} \sum_{p_i \in P_S} 1\{\hat{y}_i = y_i\}$$
where $P_S$ denotes the test set, $|P_S|$ the number of test samples in $P_S$, $\hat{y}_i$ the prediction label of the progressive ensemble classification method for sample $p_i$, and $y_i$ the true label of the sample. In the specific experiments every result is computed over 10 runs and the average value is used as the final classification accuracy; the 5-fold cross validation mainly serves to reduce the influence of randomness.
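A sketch of the weighted vote and the accuracy computation of step 6; classes_ comes from the scikit-learn classifiers used in the earlier sketches:

```python
import numpy as np

def predict_weighted_vote(classifiers, chosen, thetas, P_e):
    """Step 6 sketch: for each test sample, the label c maximising the sum
    of theta_g over the chosen branches predicting c wins the vote."""
    labels = np.unique(np.concatenate(
        [classifiers[j].classes_ for j in chosen]))
    scores = np.zeros((len(P_e), len(labels)))
    for j, th in zip(chosen, thetas):
        pred = classifiers[j].predict(P_e)
        for c_idx, c in enumerate(labels):
            scores[:, c_idx] += th * (pred == c)  # weighted votes for label c
    return labels[np.argmax(scores, axis=1)]

# classification accuracy AC on the test fold:
# y_star = predict_weighted_vote(classifiers, chosen, thetas, P_e)
# ac = float(np.mean(y_star == y_e))
```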
Fig. 2 shows the experimental results of the different classifiers; the bold-italic entries mark the most accurate method on each data set. The results show that the method proposed in this embodiment obtains good classification results on different data sets, and the final results show that the method can effectively solve the classification problem for data with noisy labels.
The above embodiment is a preferred implementation of the present invention, but the implementation of the present invention is not limited by the above embodiment; any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.

Claims (9)

1. A progressive ensemble classification method for data with noisy labels, characterized by comprising the following steps:
S1, input training samples and test samples;
S2, sample the training set in the sample dimension with the bootstrap method to obtain B bootstrap branches;
S3, train a classifier on each of the B bootstrap branches with the LDA linear discriminant analysis method, generating one classifier per branch;
S4, create a new ensemble classifier set Γ(P), initialized as empty, and add a first classifier selected from the classifiers generated in step S3 to Γ(P);
S5, progressive classifier selection: progressively choose further classifiers that satisfy the selection condition from the remaining ones and add them as branches to Γ(P), until the number of chosen branches reaches the preset branch number G of the ensemble classifier set, then stop selecting; output the ensemble classifier set together with the weight of each classifier branch;
S6, classify the test samples with the ensemble classifier set and the corresponding branch weights to obtain the final prediction.
2. The progressive ensemble classification method according to claim 1, characterized in that the concrete steps of step S1 are: input a data set with noisy labels to be classified and select the training data set $P_r = \{(p_1, y_1), (p_2, y_2), \dots, (p_l, y_l)\}$, where $l$ is the number of training samples, $p_i$ ($i \in \{1, \dots, l\}$) is a training sample and $y_i$ its sample label, and each $p_i$ has $d$ attribute dimensions.
3. The progressive ensemble classification method according to claim 1, characterized in that the experiments use 5-fold cross validation, specifically:
First experiment: the 1st part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$; the training data set is $P_r = \{(p_1, y_1), (p_2, y_2), \dots, (p_l, y_l)\}$, where $l$ is the number of training samples, $p_i$ ($i \in \{1, \dots, l\}$) is a training sample and $y_i$ its label, and each $p_i$ has $d$ attribute dimensions;
Second experiment: the 2nd part serves as the test data set $P_e$ and the remaining 4 parts as the training data set $P_r$;
and so on; 5 experiments are carried out in total.
4. The progressive ensemble classification method according to claim 2, characterized in that in step S2 the training data set $P_r$ is sampled in the sample dimension with the bootstrap method:
sampling is done with replacement, the sampling rate being determined by the uniform random variable $\tau_1 \in [0,1]$, so that each draw selects $\hat{l} = \lceil \tau_1 \cdot l \rceil$ training samples; random sampling proceeds one sample at a time via the training-sample index, the index of each draw being
$$m = \lceil \tau_2 \cdot l \rceil$$
where $m$ is the index of the selected sample and $\tau_2 \in [0,1]$ is a uniform random variable; in each experiment, under one sampling rate, the draw is repeated B times, each of the B draws selecting $\hat{l}$ training samples, yielding B training sample sets, i.e. generating the B bootstrap branches $O_1, \dots, O_B$.
5. The progressive ensemble classification method according to claim 4, characterized in that the concrete steps of training the classifiers in step 3 are: each bootstrap branch serves as a training set of its own, and the LDA algorithm generates the corresponding classifier $\chi_b$; the objective function of LDA is
$$\Xi_b = \sum_{k=1}^{K} \Upsilon(y^b \mid k)\, \Lambda(k \mid p^b)$$
where $\Xi_b$ denotes the objective function; $K$ is the total number of labels; $\Lambda(k \mid p^b)$ is the prior probability function of label $k$ for a sample $p^b$ in bootstrap branch $O_b$; and $\Upsilon(y^b \mid k)$ is the loss function of the classification result, $k$ being the true label and $y^b$ the predicted label, with $\Upsilon(y^b \mid k) = 0$ when the sample is correctly classified and $\Upsilon(y^b \mid k) = 1$ otherwise;
$\Lambda(k \mid p^b)$ is computed as
$$\Lambda(k \mid p^b) = \frac{\Lambda(p^b \mid k)\, \Lambda(k)}{\Lambda(p^b)}$$
$$\Lambda(p^b \mid k) = \frac{1}{(2\pi\,|\Sigma_k|)^{1/2}}\, e^{-\frac{1}{2}(p^b - \mu_k^b)^T \Sigma_k^{-1} (p^b - \mu_k^b)}$$
where $\mu_k^b$ and $\Sigma_k$ are the mean and covariance matrix of each label $k$ in bootstrap branch $O_b$; $|\Sigma_k|$ and $\Sigma_k^{-1}$ are the determinant and inverse of $\Sigma_k$; $\Lambda(p^b)$ is a normalizing constant; and $\Lambda(k)$ is the ratio of the number of training samples of class $k$ to the total number of samples in branch $O_b$.
6. The progressive ensemble classification method according to claim 2, characterized in that the concrete steps of step S4 are:
S4-1: create the new ensemble classifier set $\Gamma(P)$, initialized as empty;
S4-2: initialize the weights of all samples as $\omega_i^1 = 1/l$;
S4-3: compute the accuracy $\xi_j$ ($j \in \{1, \dots, B\}$) of each bootstrap branch classifier and choose the most accurate one as the first selected classifier:
$$\chi_1 = \arg\max_{\chi_j \in \hat{\Gamma}} \xi_j$$
S4-4: compute the weighted sum error of the samples misclassified by classifier $\chi_1$:
$$\epsilon_1 = \sum_i \omega_i^1\, \Theta(\chi_1(P), y, i)$$
where the error function is $\Theta(\chi(P), y, i) = 1$ if $\chi(p_i) \neq y_i$ and $0$ otherwise, $i \in \{1, \dots, l\}$; $\chi(p_i)$ denotes the classification result of classifier $\chi$ for sample $p_i$;
S4-5: compute the weight $\theta_1$ of classifier $\chi_1$:
$$\theta_1 = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_1}{\epsilon_1}\right)$$
S4-6: add classifier $\chi_1$ to the ensemble classifier set $\Gamma(P)$:
$$\Gamma_1(P) = \theta_1 \chi_1$$
S4-7: update the weights of all training samples as
$$\omega_i^2 = \omega_i^1\, e^{-y_i \theta_1 \chi_1(p_i)}$$
the weights being normalized so that
$$\sum_{i=1}^{l} \omega_i^2 = 1$$
7. The progressive ensemble classification method according to claim 6, characterized in that the concrete steps of step S5 are:
S5-1: compute for each remaining classifier $\chi_j^g$ the first integrated loss function $\Pi_1(\chi_j^g)$, where $g \in \{1, \dots, G\}$ is the current iteration index:
$$\Pi_1(\chi_j^g) = \beta_1 \xi_j + \beta_2\, \varphi(O_j, O_h)$$
where $\xi_j$ is the accuracy of classifier $\chi_j^g$ after the training-sample weight adjustment; the classifier distance function $\varphi(O_j, O_h)$ measures the similarity of bootstraps $O_j$ and $O_h$, $O_j$ being the bootstrap branch of classifier $\chi_j$ and $O_h$ ranging over the bootstrap branches of all classifiers in the already obtained classifier set; $\beta_1$ and $\beta_2$ set the proportion of the two terms, with $\beta_1 + \beta_2 = 1$;
compute the first integrated loss function $\Pi_1(\chi_j^g)$ of each remaining classifier and sort the values; compute the second integrated loss function $\Pi_2(\Gamma)$:
$$\Pi_2(\Gamma) = \sum_{i=1}^{l} |y_i - \Gamma(p_i)|$$
$$\Gamma(p_i) = \arg\max_c \sum_{h=1}^{g} \theta_h \cdot 1\{\chi_h(p_i) = c\}$$
where $c$ is a sample label and $\chi_h$ is the $h$-th classifier in the already obtained ensemble classifier set $\Gamma_{g-1}(P)$;
starting from the classifier with the largest first integrated loss $\Pi_1(\chi_j^g)$, compare: if
$$\Pi_2(\Gamma_{g-1}(P)) < \Pi_2(\Gamma_{g-1}(P) + \theta_g \chi_j^g)$$
holds, consider the next classifier; as soon as the inequality fails, the current classifier becomes the next one added to the ensemble classifier set $\Gamma(P)$;
S5-2: after the new classifier branch is added, compute the weighted sum error of the samples misclassified by the new branch of the ensemble classifier:
$$\epsilon_g = \sum_i \omega_i^g\, \Theta(\chi_j^g(P), y, i)$$
where $|\Gamma(P)|$ denotes the number of branches of the target set $\Gamma(P)$; then update the weight of the newly added classifier:
$$\theta_j^g = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_g}{\epsilon_g}\right)$$
S5-3: add the newest classifier to the chosen set, generating the newest ensemble classifier set:
$$\Gamma_g(P) = \Gamma_{g-1}(P) + \theta_j^g \chi_j^g$$
and update the weights of all training samples on the basis of the new ensemble classifier:
$$\omega_i^{g+1} = \omega_i^g\, e^{-y_i \theta_j^g \chi_j^g(p_i)}$$
with the updated weights normalized so that
$$\sum_{i=1}^{l} \omega_i^{g+1} = 1$$
S5-4: continue executing steps S5-1 to S5-3 until the number of chosen branches reaches the preset branch number $G$, then stop iterating and output the chosen ensemble classifier set $\Gamma_G$ and the corresponding weights.
8. The progressive ensemble classification method according to claim 7, characterized in that the classifier distance function $\varphi(O_j, O_h)$ in step S5-1 is computed as follows: bootstraps $O_j$ and $O_h$ can be regarded as two Gaussian mixture distributions, denoted $\Omega_j$ and $\Omega_h$; for the two Gaussian mixture models, the components $\Phi_{k_1}^j$ carry weights $\pi_{k_1}^j$, the components $\Phi_{k_2}^h$ carry weights $\pi_{k_2}^h$, and $K_1$ and $K_2$ are the numbers of components of $\Omega_j$ and $\Omega_h$ respectively:
$$\varphi(O_j, O_h) = \frac{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h\, \psi(\Phi_{k_1}^j, \Phi_{k_2}^h)}{\sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \pi_{k_1}^j \pi_{k_2}^h}$$
$$\psi(\Phi_{k_1}^j, \Phi_{k_2}^h) = \frac{1}{8} (\mu_{k_1}^j - \mu_{k_2}^h)^T \left(\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right)^{-1} (\mu_{k_1}^j - \mu_{k_2}^h) + \frac{1}{2} \ln \frac{\left|\frac{\Sigma_{k_1}^j + \Sigma_{k_2}^h}{2}\right|}{\sqrt{|\Sigma_{k_1}^j|\, |\Sigma_{k_2}^h|}}$$
where $\psi(\Phi_{k_1}^j, \Phi_{k_2}^h)$ denotes the Bhattacharyya distance between the two Gaussian distributions $\Phi_{k_1}^j$ and $\Phi_{k_2}^h$, and $\mu_{k_1}^j$, $\Sigma_{k_1}^j$ and $\mu_{k_2}^h$, $\Sigma_{k_2}^h$ are their mean vectors and covariance matrices respectively.
9. The progressive ensemble classification method according to claim 7, characterized in that the specific method of step S6 is:
after the computation of each classifier branch, the prediction label of each branch for the sample is obtained; the prediction labels are combined by weighted voting to obtain the final prediction $y^*$:
$$y^* = \arg\max_c \sum_{g=1}^{G} \theta_g \cdot 1\{\hat{y}_i^g = c\}$$
where $\hat{y}_i^g$ denotes the prediction of the $g$-th classifier $\chi_g$ in the ensemble classifier set for the $i$-th sample $p_i$, $c \in \{0, 1, \dots, k-1\}$ is a specific sample label, and $k$ is the total number of classes.
CN201710081412.6A 2017-02-15 2017-02-15 A progressive ensemble classification method for data with noisy labels Pending CN106934414A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710081412.6A CN106934414A (en) 2017-02-15 2017-02-15 A progressive ensemble classification method for data with noisy labels

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710081412.6A CN106934414A (en) 2017-02-15 2017-02-15 A progressive ensemble classification method for data with noisy labels

Publications (1)

Publication Number Publication Date
CN106934414A true CN106934414A (en) 2017-07-07

Family

ID=59423237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710081412.6A Pending CN106934414A (en) A progressive ensemble classification method for data with noisy labels

Country Status (1)

Country Link
CN (1) CN106934414A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451101A (en) * 2017-07-21 2017-12-08 Jiangnan University A hierarchical ensemble Gaussian process regression soft-sensor modeling method
CN107451101B (en) * 2017-07-21 2020-06-09 江南大学 Method for predicting concentration of butane at bottom of debutanizer by hierarchical integrated Gaussian process regression soft measurement modeling
CN108021941A (en) * 2017-11-30 2018-05-11 Sichuan University Drug-induced hepatotoxicity prediction method and device
CN108021941B (en) * 2017-11-30 2020-08-28 四川大学 Method and device for predicting drug hepatotoxicity

Similar Documents

Publication Publication Date Title
Karthika et al. A Naïve Bayesian classifier for educational qualification
Buscema et al. Training with input selection and testing (TWIST) algorithm: a significant advance in pattern recognition performance of machine learning
CN100585617C Classifier-ensemble-based face recognition system and method
CN108090510A An ensemble learning method and device based on interval optimization
CN103473556B Hierarchical SVM classification method based on rejection subspace
CN106228183A A semi-supervised learning classification method and device
CN103927550B A handwritten numeral recognition method and system
CN106126972A A hierarchical multi-label classification method for protein function prediction
Dehuri et al. A hybrid genetic based functional link artificial neural network with a statistical comparison of classifiers over multiple datasets
CN105760888A (en) Neighborhood rough set ensemble learning method based on attribute clustering
CN109165672A An ensemble classification method based on incremental learning
Sushil et al. Rule induction for global explanation of trained models
CN106326843A (en) Face recognition method
CN104966106A Biological age step-by-step prediction method based on support vector machine
Patacsil Survival analysis approach for early prediction of student dropout using enrollment student data and ensemble models
CN106934414A A progressive ensemble classification method for data with noisy labels
CN114049527A (en) Self-knowledge distillation method and system based on online cooperation and fusion
Kumar et al. Analysis of feature selection and data mining techniques to predict student academic performance
CN116306785A (en) Student performance prediction method of convolution long-short term network based on attention mechanism
Hastarimasuci et al. Variable Selection to Determine Majors of Student using K-Nearest Neighbor and Naïve Bayes Classifier Algorithm
Ntoutsi et al. A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees
CN109858543A Image memorability prediction method based on low-rank sparse representation and relation inference
CN114997175A Sentiment analysis method based on domain-adversarial training
US20220188647A1 (en) Model learning apparatus, data analysis apparatus, model learning method and program
Gaber et al. Optimisation of ensemble classifiers using genetic algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170707)