CN106599906A - Multiple kernel learning classification method based on noise probability function - Google Patents

Multiple kernel learning classification method based on noise probability function

Info

Publication number
CN106599906A
CN106599906A (application CN201611052894.4A)
Authority
CN
China
Prior art keywords
noise probability
noise
sigma
probability function
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611052894.4A
Other languages
Chinese (zh)
Inventor
武德安
冯杰
吴磊
陈鹏
冯江远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Gkhb Information Technology Co ltd
University of Electronic Science and Technology of China
Original Assignee
Chengdu Gkhb Information Technology Co ltd
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Gkhb Information Technology Co ltd, University of Electronic Science and Technology of China filed Critical Chengdu Gkhb Information Technology Co ltd
Priority to CN201611052894.4A priority Critical patent/CN106599906A/en
Publication of CN106599906A publication Critical patent/CN106599906A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multiple kernel learning classification method based on a noise probability function. The method comprises the following steps: calculation of the noise probability function; selection of a base classifier f_t*(x) and calculation of the corresponding coefficient a_t* in each round of iteration; and update of the weights. The method is suitable for classifying data sets polluted by noise. Its advantages are that it does not require solving a complex optimization problem, its amount of computation is small compared with conventional multiple kernel learning methods, it effectively solves the noise-sensitivity problem of conventional multiple kernel boosting learning, and it has better robustness.

Description

Multiple kernel learning classification method based on a noise probability function
Technical field
The present invention relates to a multiple kernel learning classification method based on a noise probability function, and belongs to the field of data mining technology.
Background technology
The linear support vector machine (SVM) was proposed by Cortes and Vapnik. As research on SVMs deepened, they penetrated into numerous areas of machine learning, such as pattern classification, regression estimation, and probability density estimation. The SVM has achieved great success, but it belongs to single kernel learning and therefore has certain limitations.
In the field of machine learning, multiple kernel learning has received increasing attention because, compared with single kernel learning, it can better cope with huge sample feature sets, heterogeneous information, irregular multidimensional data, and uneven distribution of data in high-dimensional feature spaces.
In recent years a variety of effective multiple kernel learning theories and methods have appeared. In 2004, Lanckriet, Bartlett et al. proposed a learning method based on semidefinite programming, and in the same year Bach, Jordan et al. proposed an optimization method based on quadratically constrained quadratic programming. In 2006, Sonnenburg, Rätsch et al. proposed a learning method based on semi-infinite linear programming, and in the same year Smola, Rätsch et al. proposed a learning method based on hyperkernels. In 2007, Rakotomamonjy et al. proposed the simple multiple kernel learning method (SimpleMKL), and in 2011 Tao Jianwen and Wang Shitong proposed a multiple kernel local learning-based domain adaptation method.
The above methods have achieved some success in different application fields, but these traditional multiple kernel learning methods require solving a complex optimization problem, involve a large amount of computation, and converge with difficulty. In 2012, Hao Xia and Steven Hoi proposed MKBoost, an algorithmic framework for ensemble multiple kernel learning; their experimental results show that the algorithm greatly reduces the amount of computation while retaining high accuracy, but the introduction of boosting ideas also brings sensitivity to noise.
Summary of the invention
The purpose of the present invention is to solve the above problems by providing a multiple kernel learning classification method based on a noise probability function. The method does not require solving a complex optimization problem, involves little computation, and effectively solves the problem of sensitivity to noise.
The present invention achieves the above purpose through the following technical solution:
A multiple kernel learning classification method based on a noise probability function, comprising the following steps:
(1) calculation of the noise probability function;
(2) selection of the base classifier f_t*(x) and calculation of the corresponding coefficient a_t* in each round of iteration;
(3) update of the weights.
Preferably, in step (1), the noise probability function is calculated as follows, wherein

$$\bar{u} = \sum_{i=1}^{N} u_{KNN}(x_i, y_i) / N,$$

Z_i is the set of the K nearest neighbors of the sample (x_i, y_i), f(x) is the base classifier, y_j is the true class, u_KNN(x_i, y_i) is the noise detection result, $\bar{u}$ is the mean value of u_KNN(x_i, y_i) under the base classifier f(x), and λ is a manually set parameter. The more misclassified samples the set Z_i contains, the larger the probability that the sample (x_i, y_i) is noise.
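The k-nearest-neighbor noise detection described above can be sketched in code. Since the source does not reproduce the image of the noise probability formula itself, the sketch below computes u_KNN(x_i, y_i) as the fraction of the K nearest neighbors whose labels disagree with y_i, computes the mean ū, and squashes the centered value through a logistic function with scale λ; the function name and the logistic form are illustrative assumptions, not the patent's exact formula.

```python
import numpy as np

def noise_probability(X, y, K=7, lam=8.6):
    """Per-sample noise probability from K-nearest-neighbor label
    disagreement (a sketch; the exact functional form in the patent is
    not reproduced in the source, so a logistic squashing with scale
    lam is assumed here)."""
    N = len(y)
    # Pairwise Euclidean distances (the metric used in the experiments)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbor
    u = np.empty(N)
    for i in range(N):
        Z_i = np.argsort(d[i])[:K]       # indices of the K nearest neighbors
        u[i] = np.mean(y[Z_i] != y[i])   # fraction of disagreeing neighbors
    u_bar = u.mean()                     # \bar{u} = sum_i u_KNN(x_i, y_i) / N
    # Assumed squashing: more disagreement than average -> higher noise prob.
    return 1.0 / (1.0 + np.exp(-lam * (u - u_bar)))
```

A mislabeled point sitting inside a cluster of the opposite class receives a disagreement fraction near 1 and hence a noise probability close to 1, matching the statement that more misclassified neighbors imply a larger noise probability.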
In step (2), a loss function L(y, f(x)) is defined based on the noise probability function. The loss function is minimized; the base classifier f_t*(x) in the t-th round of iteration is then selected, and its corresponding coefficient a_t* calculated, as

$$f_t^*(x) = \arg\min_{f(x)} \sum_{f(x_i) \neq y_i} \left( w_{1i}^t - w_{2i}^t \right),$$

$$a_t^* = \frac{1}{2}\log\left(\frac{\sum_{f(x_i)\neq y_i} w_{2i}^t + \sum_{f(x_i)=y_i} w_{1i}^t}{\sum_{f(x_i)\neq y_i} w_{1i}^t + \sum_{f(x_i)=y_i} w_{2i}^t}\right),$$

where w_{1i}^t and w_{2i}^t are per-sample weights and F_{t-1}(x_i) denotes the combined classifier obtained after the (t-1)-th round of iteration.
In step (3), the sample noise probabilities under the M kernel functions are used to initialize the coefficients w_{1i}^1 and w_{2i}^1 related to the selection of the base classifier, and the sample weights

$$D_i^1 = \frac{w_{1i}^1 + w_{2i}^1}{\sum_{i=1}^{N}\left(w_{1i}^1 + w_{2i}^1\right)}.$$

Given the data of the t-th round of iteration, the weights are updated as follows:

$$D_i^{t+1} = \left(w_{1i}^t + w_{2i}^t\right) \Big/ \sum_{i=1}^{N}\left(w_{1i}^t + w_{2i}^t\right),$$

$$w_{2i}^{t+1} = w_{2i}^t \exp\!\left(y_i a_t^* f_t^*(x_i)\right),$$

$$w_{1i}^{t+1} = w_{1i}^t \exp\!\left(-y_i a_t^* f_t^*(x_i)\right).$$
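One round of the boosting procedure in steps (2) and (3) can be sketched as follows, using the selection rule, the coefficient a_t*, and the weight updates stated in claim 2. Candidate base classifiers are represented simply as prediction vectors, and the weights w1 and w2 (whose initialization from the noise probabilities is not fully reproduced in the source) are taken as given inputs; all names are illustrative.

```python
import numpy as np

def boosting_round(preds_list, y, w1, w2):
    """One round of the noise-probability boosting step (a sketch).

    preds_list : candidate base-classifier prediction vectors in {-1, +1}
    y          : true labels in {-1, +1}
    w1, w2     : per-sample weight vectors carried across rounds
    Returns the index of the selected base classifier, its coefficient
    a_t*, and the updated (w1, w2, D).
    """
    # f_t* = argmin over candidates of sum_{f(x_i) != y_i} (w1_i - w2_i)
    scores = [np.sum((w1 - w2)[p != y]) for p in preds_list]
    best = int(np.argmin(scores))
    p = preds_list[best]
    err = p != y
    # a_t* = 1/2 log( (sum_err w2 + sum_ok w1) / (sum_err w1 + sum_ok w2) )
    num = w2[err].sum() + w1[~err].sum()
    den = w1[err].sum() + w2[~err].sum()
    a = 0.5 * np.log(num / den)
    # Weight updates as given in claim 2
    w1_new = w1 * np.exp(-y * a * p)
    w2_new = w2 * np.exp(y * a * p)
    D = (w1_new + w2_new) / np.sum(w1_new + w2_new)
    return best, a, w1_new, w2_new, D
```

Note that an error-free candidate achieves the minimum possible selection score of zero and receives a positive coefficient, as in standard boosting.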
The beneficial effects of the present invention are as follows: the multiple kernel learning classification method based on a noise probability function is suitable for classifying data sets polluted by noise. Its advantages are that it does not require solving a complex optimization problem, its amount of computation is smaller than that of traditional multiple kernel learning methods, it effectively solves the noise-sensitivity problem of traditional multiple kernel boosting learning, and its robustness is better.
Specific embodiment
The invention will be further described below with reference to an embodiment:
The multiple kernel learning classification method based on a noise probability function of the present invention comprises the following steps:
(1) calculation of the noise probability function;
(2) selection of the base classifier f_t*(x) and calculation of the corresponding coefficient a_t* in each round of iteration;
(3) update of the weights.
In step (1), the noise probability function is calculated as follows, wherein

$$\bar{u} = \sum_{i=1}^{N} u_{KNN}(x_i, y_i) / N,$$

Z_i is the set of the K nearest neighbors of the sample (x_i, y_i), f(x) is the base classifier, y_j is the true class, u_KNN(x_i, y_i) is the noise detection result, $\bar{u}$ is the mean value of u_KNN(x_i, y_i) under the base classifier f(x), and λ is a manually set parameter. The more misclassified samples the set Z_i contains, the larger the probability that the sample (x_i, y_i) is noise.
In step (2), a loss function L(y, f(x)) is defined based on the noise probability function and minimized; the base classifier f_t*(x) in the t-th round of iteration is then selected, and its corresponding coefficient a_t* calculated, as

$$f_t^*(x) = \arg\min_{f(x)} \sum_{f(x_i) \neq y_i} \left( w_{1i}^t - w_{2i}^t \right),$$

$$a_t^* = \frac{1}{2}\log\left(\frac{\sum_{f(x_i)\neq y_i} w_{2i}^t + \sum_{f(x_i)=y_i} w_{1i}^t}{\sum_{f(x_i)\neq y_i} w_{1i}^t + \sum_{f(x_i)=y_i} w_{2i}^t}\right),$$

where w_{1i}^t and w_{2i}^t are per-sample weights and F_{t-1}(x_i) denotes the combined classifier obtained after the (t-1)-th round of iteration.
In step (3), the sample noise probabilities under the M kernel functions are used to initialize the coefficients w_{1i}^1 and w_{2i}^1 related to the selection of the base classifier, and the sample weights

$$D_i^1 = \frac{w_{1i}^1 + w_{2i}^1}{\sum_{i=1}^{N}\left(w_{1i}^1 + w_{2i}^1\right)}.$$

Given the data of the t-th round of iteration, the weights are updated as follows:

$$D_i^{t+1} = \left(w_{1i}^t + w_{2i}^t\right) \Big/ \sum_{i=1}^{N}\left(w_{1i}^t + w_{2i}^t\right),$$

$$w_{2i}^{t+1} = w_{2i}^t \exp\!\left(y_i a_t^* f_t^*(x_i)\right),$$

$$w_{1i}^{t+1} = w_{1i}^t \exp\!\left(-y_i a_t^* f_t^*(x_i)\right).$$
Embodiment:
In order to verify the correctness and effectiveness of this method, we conducted experiments on 6 UCI data sets. For each data set, 8 kernel functions (5 Gaussian kernel functions and 3 polynomial kernel functions) were used. The data sets are listed in Table 1 below:
Table 1: Information of the UCI data sets
Datasets Samples Features Classes
Balance-scale 567 4 2
Breast-cancer 569 32 2
Ionosphere 351 34 2
Blood-transfusion 748 5 2
Diabetic-retinopathy 1151 20 2
Pima-indians 768 8 2
In the experiments, the class labels of 10%, 20%, and 30% of the training samples were randomly changed to obtain training sets with different noise levels. In the sample noise probability calculation, K = 7 was used in the K-nearest-neighbor (KNN) method, Euclidean distance was used as the distance metric, and λ = 8.6 in the noise probability function. Under each noise level, the experiment was repeated 30 times on each data set; the experimental results are the averages of the 30 runs, as shown in Table 2:
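The construction of the noisy training sets described above, i.e. randomly flipping a given fraction of the binary class labels, can be sketched as follows; the helper name and the use of a seeded NumPy generator are illustrative assumptions.

```python
import numpy as np

def add_label_noise(y, noise_level, seed=0):
    """Return a copy of the binary labels y (in {-1, +1}) with a given
    fraction of randomly chosen entries flipped, as in the experiments
    (10%, 20%, 30% noise levels)."""
    rng = np.random.default_rng(seed)
    y_noisy = np.asarray(y).copy()
    n_flip = int(round(noise_level * len(y_noisy)))
    idx = rng.choice(len(y_noisy), size=n_flip, replace=False)
    y_noisy[idx] = -y_noisy[idx]   # flip the selected labels
    return y_noisy
```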
As shown in Table 2, in the noise-free case the test errors of the three algorithms are comparable. At a noise level of 10%, the MKB_NP algorithm performs better than the other two algorithms on the three data sets Balance-scale, Ionosphere, and Pima-indians; at a noise level of 20%, the new algorithm achieves the lowest test error on the three data sets Balance-scale, Blood-transfusion, and Diabetic-retinopathy; at a noise level of 30%, the MKB_NP algorithm performs best on the four data sets Balance-scale, Breast-cancer, Blood-transfusion, and Pima-indians.
In summary, MKB_NP, the method of the present invention, outperforms the MKB_D1 and MKB_D2 algorithms on the 6 data sets; when classifying data with higher noise levels, it is less sensitive to noisy data, has smaller training error, and shows better robustness.
The above embodiment is a preferred embodiment of the present invention and is not a limitation of the technical solution of the present invention. Any technical solution that can be realized on the basis of the above embodiment without creative work shall be regarded as falling within the scope of protection of the present patent.

Claims (2)

1. A multiple kernel learning classification method based on a noise probability function, characterized in that it comprises the following steps:
(1) calculation of the noise probability function;
(2) selection of the base classifier f_t*(x) and calculation of the corresponding coefficient a_t* in each round of iteration;
(3) update of the weights.
2. The multiple kernel learning classification method based on a noise probability function according to claim 1, characterized in that: in step (1), the noise probability function is calculated as follows, wherein

$$\bar{u} = \sum_{i=1}^{N} u_{KNN}(x_i, y_i) / N,$$

Z_i is the set of the K nearest neighbors of the sample (x_i, y_i), f(x) is the base classifier, y_j is the true class, u_KNN(x_i, y_i) is the noise detection result, $\bar{u}$ is the mean value of u_KNN(x_i, y_i) under the base classifier f(x), and λ is a manually set parameter. The more misclassified samples the set Z_i contains, the larger the probability that the sample (x_i, y_i) is noise.
In step (2), a loss function L(y, f(x)) is defined based on the noise probability function and minimized; the base classifier f_t*(x) in the t-th round of iteration is then selected, and its corresponding coefficient a_t* calculated, as

$$f_t^*(x) = \arg\min_{f(x)} \sum_{f(x_i) \neq y_i} \left( w_{1i}^t - w_{2i}^t \right),$$

$$a_t^* = \frac{1}{2}\log\left(\frac{\sum_{f(x_i)\neq y_i} w_{2i}^t + \sum_{f(x_i)=y_i} w_{1i}^t}{\sum_{f(x_i)\neq y_i} w_{1i}^t + \sum_{f(x_i)=y_i} w_{2i}^t}\right),$$

where F_{t-1}(x_i) denotes the combined classifier obtained after the (t-1)-th round of iteration.
In step (3), the sample noise probabilities under the M kernel functions are used to initialize the coefficients w_{1i}^1 and w_{2i}^1 related to the selection of the base classifier, and the sample weights

$$D_i^1 = \frac{w_{1i}^1 + w_{2i}^1}{\sum_{i=1}^{N}\left(w_{1i}^1 + w_{2i}^1\right)}.$$

Given the data of the t-th round of iteration, the weights are updated as follows:

$$D_i^{t+1} = \left(w_{1i}^t + w_{2i}^t\right) \Big/ \sum_{i=1}^{N}\left(w_{1i}^t + w_{2i}^t\right),$$

$$w_{2i}^{t+1} = w_{2i}^t \exp\!\left(y_i a_t^* f_t^*(x_i)\right),$$

$$w_{1i}^{t+1} = w_{1i}^t \exp\!\left(-y_i a_t^* f_t^*(x_i)\right).$$
CN201611052894.4A 2016-11-25 2016-11-25 Multiple kernel learning classification method based on noise probability function Pending CN106599906A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611052894.4A CN106599906A (en) 2016-11-25 2016-11-25 Multiple kernel learning classification method based on noise probability function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611052894.4A CN106599906A (en) 2016-11-25 2016-11-25 Multiple kernel learning classification method based on noise probability function

Publications (1)

Publication Number Publication Date
CN106599906A true CN106599906A (en) 2017-04-26

Family

ID=58593335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611052894.4A Pending CN106599906A (en) 2016-11-25 2016-11-25 Multiple kernel learning classification method based on noise probability function

Country Status (1)

Country Link
CN (1) CN106599906A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480715A (en) * 2017-08-10 2017-12-15 合肥工业大学 The method for building up and system of the transmission device failure predication model of hydroforming equipment
CN109359677A (en) * 2018-10-09 2019-02-19 中国石油大学(华东) A kind of resistance to online kernel-based learning method of classifying of making an uproar more
CN109359677B (en) * 2018-10-09 2021-11-23 中国石油大学(华东) Noise-resistant online multi-classification kernel learning algorithm


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20170426