CN106599906A - Multiple kernel learning classification method based on noise probability function - Google Patents
- Publication number
- CN106599906A (application CN201611052894.4A)
- Authority
- CN
- China
- Prior art keywords
- noise probability
- noise
- sigma
- probability function
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a multiple kernel learning classification method based on a noise probability function. The method comprises the following steps: calculation of the noise probability function; selection of a base classifier f_t*(x) and calculation of its corresponding coefficient a_t* in each iteration round; and updating of the sample weights. The method is a classification algorithm suited to data sets polluted by noise. Its advantages are that it does not require solving a complex optimization problem, its computational cost is smaller than that of conventional multiple kernel learning methods, it effectively solves the noise-sensitivity problem of conventional multiple kernel boosting learning, and it is more robust.
Description
Technical field
The present invention relates to a multiple kernel learning classification method based on a noise probability function, and belongs to the field of data mining technology.
Background art

The linear support vector machine (SVM) was proposed by Cortes and Vapnik. As research on SVMs deepened, they penetrated numerous areas of machine learning, such as pattern classification, regression estimation, and density estimation. The SVM has achieved great success, but it belongs to single kernel learning and therefore has certain limitations.

In the machine learning field, multiple kernel learning has attracted growing attention because, compared with single kernel learning, it can better cope with large sample feature sets, heterogeneous information, irregular multidimensional data, and uneven sample distributions in high-dimensional feature spaces.
Various effective multiple kernel learning theories and methods have appeared in recent years. In 2004, Lanckriet, Bartlett et al. proposed a learning method based on semidefinite programming, and in the same year Bach, Jordan et al. proposed an optimization method based on quadratically constrained quadratic programming. In 2006, Sonnenburg, Rätsch et al. proposed a learning method based on semi-infinite linear programming, and in the same year Smola, Rätsch et al. proposed a learning method based on hyperkernels. In 2007, Rakotomamonjy et al. proposed the simple multiple kernel learning method (SimpleMKL), and in 2011 Tao Jianwen and Wang Shitong proposed a local learning-based domain adaptation method for multiple kernels.
These methods have achieved a measure of success in different application fields, but such traditional multiple kernel learning methods require solving a complex optimization problem, are computationally expensive, and converge with difficulty. In 2012, Hao Xia and Steven Hoi proposed MKBoost, an algorithmic framework for ensemble multiple kernel learning; their experiments showed that it greatly reduces the amount of computation while retaining high accuracy, but the introduction of the boosting idea also brings sensitivity to noise.
The content of the invention
The purpose of the present invention is that and provide a kind of multinuclear based on noise probability function in order to solve the above problems
Sorting technique is practised, the method does not spend the optimization problem of solving complexity, and amount of calculation is little, and efficiently solves to noise-sensitive
Problem.
The present invention achieves the above purpose through the following technical solution:
A multiple kernel learning classification method based on a noise probability function comprises the following steps:

(1) calculation of the noise probability function;

(2) selection of the base classifier f_t*(x) and calculation of its corresponding coefficient a_t* in each iteration round;

(3) updating of the sample weights.
Preferably, in step (1), the noise probability function is calculated as follows:

where:

In the formulas, Z_i is the set of the K nearest neighbors of sample (x_i, y_i), f(x) is a base classifier, y_j is the true class label, u_KNN(x_i, y_i) is the noise detection result, ū_KNN(x_i, y_i) is the mean of u_KNN(x_i, y_i) over the base classifiers f(x), and λ is a manually set parameter. The more misclassified samples the set Z_i contains, the larger the probability that sample (x_i, y_i) is noise.
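The noise probability formulas themselves appear only as images in the source, so as a hedged sketch of the stated idea (the disagreement rate within the K-nearest-neighbor set Z_i, squashed by the parameter λ) one might write:

```python
import numpy as np

def noise_probability(X, y, K=7, lam=8.6):
    """Hedged sketch of a KNN-based noise probability.

    The patent's exact formulas are not reproduced; this follows the
    stated idea only: the more samples in the K-nearest-neighbor set
    Z_i that disagree with y_i, the larger the probability that
    (x_i, y_i) is noise.  The logistic squashing with lambda is an
    assumption, not the patent's formula.
    """
    n = X.shape[0]
    p = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)       # Euclidean distance, as in the embodiment
        nbrs = np.argsort(d)[1:K + 1]              # K nearest neighbors of (x_i, y_i), excluding i
        u = np.mean(y[nbrs] != y[i])               # disagreement rate within Z_i
        p[i] = 1.0 / (1.0 + np.exp(-lam * (u - 0.5)))  # assumed squashing with parameter lambda
    return p
```

The squashing centered at 0.5 is an illustrative choice; the patent's u_KNN and its mean under the base classifiers f(x) may differ.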
In step (2), the following loss function is defined based on the noise probability function:

The loss function L(y, f(x)) is minimized; the base classifier f_t*(x) of the t-th iteration round is then selected as follows and its corresponding coefficient a_t* is computed:

where:

In the formulas, F_{t-1}(x_i) denotes the combined classifier obtained after (t-1) iteration rounds.
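The selection and coefficient formulas are given only as images in the source; the sketch below therefore uses the standard MKBoost/AdaBoost criterion (minimum weighted error, coefficient ½·ln((1−ε)/ε)) as an assumed stand-in for the patent's noise-probability-weighted rule:

```python
import numpy as np

def select_base_classifier(preds, y, w):
    """Pick, from M candidate base classifiers (one per kernel), the one
    minimizing the weighted training error, and compute its coefficient.

    preds: (M, n) array of +/-1 predictions of the M candidates
    y:     (n,) array of +/-1 true labels
    w:     (n,) normalized sample weights

    This is the plain boosting criterion; the patent's own selection
    additionally involves the noise probability function, whose formula
    is not reproduced here.
    """
    errs = np.array([np.sum(w[p != y]) for p in preds])  # weighted error of each candidate
    t = int(np.argmin(errs))                             # best base classifier f_t*
    eps = min(max(errs[t], 1e-12), 1 - 1e-12)            # clamp to avoid log(0) / division by zero
    alpha = 0.5 * np.log((1 - eps) / eps)                # AdaBoost-style coefficient a_t*
    return t, alpha
```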
In step (3), the sample noise probabilities under the M kernel functions are used to initialize, as follows, the coefficients related to the selection of the base classifiers and the sample weights.

Given the data of the t-th iteration round, the weights are updated as follows:
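The weight-update formula is likewise given as an image; a plausible baseline, assuming the standard exponential boosting update, is:

```python
import numpy as np

def update_weights(w, y, pred, alpha):
    """AdaBoost-style weight update for the t-th round: increase the
    weight of misclassified samples, decrease that of correctly
    classified ones, then renormalize.  The patent's exact update (which
    its formula images show modulated by the noise probability) is not
    reproduced; this is an assumed baseline.

    y, pred are +/-1 labels and predictions; w are the current weights.
    """
    w = w * np.exp(-alpha * y * pred)  # y * pred is +1 if correct, -1 if wrong
    return w / w.sum()                 # renormalize to a distribution
```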
The beneficial effects of the present invention are as follows:

The multiple kernel learning classification method based on a noise probability function of the present invention is a classification algorithm suited to data sets polluted by noise. Its advantages are that it does not require solving a complex optimization problem, its computational cost is lower than that of traditional multiple kernel learning methods, it effectively solves the sensitivity to noise of traditional multiple kernel boosting learning, and its robustness is better.
Specific embodiments

The invention is further described below with reference to an embodiment:
The multiple kernel learning classification method based on a noise probability function of the present invention comprises the following steps:

(1) calculation of the noise probability function;

(2) selection of the base classifier f_t*(x) and calculation of its corresponding coefficient a_t* in each iteration round;

(3) updating of the sample weights.
Wherein, in step (1), the noise probability function is calculated as follows:

where:

In the formulas, Z_i is the set of the K nearest neighbors of sample (x_i, y_i), f(x) is a base classifier, y_j is the true class label, u_KNN(x_i, y_i) is the noise detection result, ū_KNN(x_i, y_i) is the mean of u_KNN(x_i, y_i) over the base classifiers f(x), and λ is a manually set parameter. The more misclassified samples the set Z_i contains, the larger the probability that sample (x_i, y_i) is noise.

In step (2), the following loss function is defined based on the noise probability function:

The loss function L(y, f(x)) is minimized; the base classifier f_t*(x) of the t-th iteration round is then selected as follows and its corresponding coefficient a_t* is computed:

where:

In the formulas, F_{t-1}(x_i) denotes the combined classifier obtained after (t-1) iteration rounds.

In step (3), the sample noise probabilities under the M kernel functions are used to initialize, as follows, the coefficients related to the selection of the base classifiers and the sample weights.

Given the data of the t-th iteration round, the weights are updated as follows:
Embodiment:

To verify the correctness and effectiveness of the method, we tested it on 6 UCI data sets. For each data set, 8 kernel functions were used (5 Gaussian kernel functions and 3 polynomial kernel functions). The data sets are listed in Table 1 below:
Table 1. Information of the UCI data sets

Datasets | Samples | Features | Classes |
---|---|---|---|
Balance-scale | 567 | 4 | 2 |
Breast-cancer | 569 | 32 | 2 |
Ionosphere | 351 | 34 | 2 |
Blood-transfusion | 748 | 5 | 2 |
Diabetic-retinopathy | 1151 | 20 | 2 |
Pima-indians | 768 | 8 | 2 |
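The 8-kernel bank described above (5 Gaussian kernels, 3 polynomial kernels) can be sketched as follows; the bandwidths and degrees are illustrative assumptions, since the patent does not list them:

```python
import numpy as np

def make_kernel_bank():
    """Build a bank of 8 kernel functions: 5 Gaussian + 3 polynomial.

    The gamma values and polynomial degrees below are assumptions for
    illustration; the patent specifies only the counts of each type.
    """
    gammas = [2.0 ** k for k in (-2, -1, 0, 1, 2)]   # assumed Gaussian bandwidths
    degrees = [1, 2, 3]                              # assumed polynomial degrees

    def gaussian(g):
        # k(x, z) = exp(-g * ||x - z||^2), computed for all pairs of rows
        return lambda X, Z: np.exp(
            -g * np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=2))

    def poly(d):
        # k(x, z) = (x . z + 1)^d
        return lambda X, Z: (X @ Z.T + 1.0) ** d

    return [gaussian(g) for g in gammas] + [poly(d) for d in degrees]
```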
At each noise level, the experiment was repeated 30 times on each data set, and the reported result is the mean of the 30 runs, as shown in Table 2.

To obtain training sets with different noise levels, the class attribute values of 10%, 20% and 30% of the training samples were randomly changed. In the sample noise probability calculation, K = 7 in the K-nearest-neighbor (KNN) search, Euclidean distance was used as the distance metric, and λ = 8.6 in the noise probability function.
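The noise injection used to build the 10%, 20% and 30% training sets (randomly changing class attribute values) can be sketched as:

```python
import numpy as np

def inject_label_noise(y, level, rng=None):
    """Flip the class labels of a given fraction of training samples,
    as in the experiment above (noise levels 0.1, 0.2, 0.3).

    Binary +/-1 labels are assumed here for illustration; the patent
    does not specify its label encoding.
    """
    rng = np.random.default_rng(rng)
    y = y.copy()
    n_flip = int(round(level * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)  # distinct samples to corrupt
    y[idx] = -y[idx]                                      # flip the class attribute value
    return y
```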
As Table 2 shows, in the noise-free case the test errors of the three algorithms are comparable. At a 10% noise level, the MKB_NP algorithm outperforms the other two algorithms on the Balance-scale, Ionosphere and Pima-indians data sets. At a 20% noise level, the new algorithm achieves the lowest test error on the Balance-scale, Blood-transfusion and Diabetic-retinopathy data sets. At a 30% noise level, MKB_NP performs best on the four data sets Balance-scale, Breast-cancer, Blood-transfusion and Pima-indians.

In summary, MKB_NP, the method of the invention, performs better on the 6 data sets than the MKB_D1 and MKB_D2 algorithms; when classifying data with higher noise levels it is less sensitive to the noisy data, its training error is smaller, and its robustness is better.
The above embodiment is a preferred embodiment of the present invention and is not a restriction of its technical solution; any technical solution that can be realized on the basis of the above embodiment without creative work shall be regarded as falling within the scope of protection of this patent.
Claims (2)

1. A multiple kernel learning classification method based on a noise probability function, characterized by comprising the following steps:

(1) calculation of the noise probability function;

(2) selection of the base classifier f_t*(x) and calculation of its corresponding coefficient a_t* in each iteration round;

(3) updating of the sample weights.

2. The multiple kernel learning classification method based on a noise probability function according to claim 1, characterized in that:

In step (1), the noise probability function is calculated as follows:

where:

In the formulas, Z_i is the set of the K nearest neighbors of sample (x_i, y_i), f(x) is a base classifier, y_j is the true class label, u_KNN(x_i, y_i) is the noise detection result, ū_KNN(x_i, y_i) is the mean of u_KNN(x_i, y_i) over the base classifiers f(x), and λ is a manually set parameter; the more misclassified samples the set Z_i contains, the larger the probability that sample (x_i, y_i) is noise.

In step (2), the following loss function is defined based on the noise probability function:

The loss function L(y, f(x)) is minimized; the base classifier f_t*(x) of the t-th iteration round is then selected as follows and its corresponding coefficient a_t* is computed:

where:

In the formulas, F_{t-1}(x_i) denotes the combined classifier obtained after (t-1) iteration rounds.

In step (3), the sample noise probabilities under the M kernel functions are used to initialize, as follows, the coefficients related to the selection of the base classifiers and the sample weights.

Given the data of the t-th iteration round, the weights are updated as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611052894.4A CN106599906A (en) | 2016-11-25 | 2016-11-25 | Multiple kernel learning classification method based on noise probability function |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106599906A true CN106599906A (en) | 2017-04-26 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480715A (en) * | 2017-08-10 | 2017-12-15 | 合肥工业大学 | The method for building up and system of the transmission device failure predication model of hydroforming equipment |
CN109359677A (en) * | 2018-10-09 | 2019-02-19 | 中国石油大学(华东) | A kind of resistance to online kernel-based learning method of classifying of making an uproar more |
CN109359677B (en) * | 2018-10-09 | 2021-11-23 | 中国石油大学(华东) | Noise-resistant online multi-classification kernel learning algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zisselman et al. | Deep residual flow for out of distribution detection | |
Xu et al. | Raise a child in large language model: Towards effective and generalizable fine-tuning | |
US11100283B2 (en) | Method for detecting deceptive e-commerce reviews based on sentiment-topic joint probability | |
Doove et al. | Recursive partitioning for missing data imputation in the presence of interaction effects | |
Ji et al. | Ranking-based classification of heterogeneous information networks | |
Yu et al. | Latent semantic analysis for text categorization using neural network | |
Křížek et al. | Improving stability of feature selection methods | |
US11574240B2 (en) | Categorization for a global taxonomy | |
Taghavi et al. | Channel-optimized quantum error correction | |
CN105005589A (en) | Text classification method and text classification device | |
De Amorim | Constrained clustering with minkowski weighted k-means | |
CN104252456A (en) | Method, device and system for weight estimation | |
CN106326390A (en) | Recommendation method based on collaborative filtering | |
CN106599906A (en) | Multiple kernel learning classification method based on noise probability function | |
CN107729589A (en) | A kind of method of the quick calculating SRAM failure probabilities based on more starting point importance sampling technologies | |
Seo et al. | Reliable knowledge graph path representation learning | |
Tzacheva et al. | Support confidence and utility of action rules triggered by meta-actions | |
Zulehner et al. | Approximation of quantum states using decision diagrams | |
Badapanda et al. | Agriculture data visualization and analysis using data mining techniques: application of unsupervised machine learning | |
CN103942318A (en) | Parallel AP propagating XML big data clustering integration method | |
Cai et al. | Robust fuzzy relational classifier incorporating the soft class labels | |
Liu et al. | Identification of drainage patterns using a graph convolutional neural network | |
CN107193916A (en) | Method and system are recommended in a kind of personalized variation inquiry | |
Baitharu et al. | Comparison of Kernel selection for support vector machines using diabetes dataset | |
Kohns et al. | Decoupling shrinkage and selection for the Bayesian quantile regression |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170426 |