CN113792748B - Method and device for creating diabetic retinopathy classifier based on feature extraction and double support vector machines


Info

Publication number: CN113792748B
Application number: CN202111366311.6A
Authority: CN (China)
Prior art keywords: sample, data, hyperplane, support vector, training
Legal status: Active (listed status is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN113792748A
Inventor
王天棋
高慧
孙艺
王洲洋
刘传昌
高宇航
龙中武
徐懿
Current Assignee: Beijing University of Posts and Telecommunications
Original Assignee: Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202111366311.6A
Publication of application: CN113792748A
Application granted; publication of grant: CN113792748B

Classifications

    • G06F18/214 Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2411 Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G16H30/40 ICT specially adapted for processing medical images, e.g. editing
    • G16H50/50 ICT specially adapted for medical diagnosis; simulation or modelling of medical disorders


Abstract

The invention provides a method and a device for establishing a diabetic retinopathy classifier based on feature extraction and a double support vector machine. The method performs feature extraction on sample data and optimizes the data for different tasks. By introducing membership degrees, the membership of each sample point to the training set is controlled by weighting, and slack variables are introduced to balance the constraint conditions of isolated points or noise points in the training data, reducing the errors such points cause. Further, cost is controlled by weighting: within a cost-sensitive learning framework, costs are introduced into the fuzzy support vector machine through weights, reducing the error of expressing the data-imbalance problem with an equation. Finally, two independent, non-parallel hyperplanes are generated, each brought close to one of the two classes while kept far from the other, so that large-scale classification problems can be handled without additional external optimizers.

Description

Method and device for establishing diabetic retinopathy classifier based on feature extraction and double support vector machines
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a device for creating a diabetic retinopathy classifier based on feature extraction and a double-support-vector machine.
Background
The traditional Support Vector Machine (SVM) algorithm has been successfully applied in fields such as image classification, bioinformatics, and text classification. In general, to solve a classification problem, a mathematical model is constructed from a set of training examples to predict the unknown class labels of test examples, so that the prediction results best reflect the generalization ability of the classification algorithm.
When the number of instances representing one class is much smaller than that of the other classes, the sample set becomes imbalanced; classification of imbalanced data is also referred to as learning under a skewed class distribution. In real classification tasks, the class with the fewest samples is usually the one researchers care most about, and it is also the class most prone to skewed distribution. Such data are characterized by small sample size, high inter-class overlap, and small class separation. These data not only degrade classifier performance and increase the computational complexity of the algorithm, but also cause redundant data to grow rapidly and noise and mislabeled data to increase, so that the imbalanced classification condition recurs and an inertial cycle forms in the classification process.
Disclosure of Invention
The embodiment of the invention provides a method and a device for creating a diabetic retinopathy classifier based on feature extraction and a double support vector machine, which are used for eliminating or mitigating one or more defects in the prior art and solving the problems of outliers and poor training effect caused by some classes having few instances.
The technical scheme of the invention is as follows:
the invention provides a method for creating a diabetic retinopathy classifier based on feature extraction and a double-support-vector machine, which comprises the following steps:
acquiring a training sample set, wherein the training sample set comprises a set number of classes, each class comprises a plurality of sample data, and each sample is provided with a class label; the sample data at least comprises images of retinal microaneurysms, bleeding spots (hemorrhages), hard exudates, cotton-wool spots, venous beading, intraretinal microvascular abnormalities, and macular edema states, with the corresponding state marked as the label;
performing feature extraction on each sample data based on vector feature selection or matrix feature selection, wherein the vector-based feature selection adopts a LASSO feature selection method, and the matrix-based feature selection adopts an Lr-norm or p-norm based feature selection method;
introducing a fuzzy support vector machine, controlling the membership degree of each sample point to the training set by weighting, and fuzzifying each sample data to reduce the membership degree of isolated points and noise points in the training sample set relative to the classes to which they belong; meanwhile, slack (relaxation) variables are introduced into the fuzzy support vector machine, and penalty factors are introduced based on a cost-sensitive learning framework;
generating two independent, non-parallel hyperplanes based on the structure of the fuzzy support vector machine using the feature-extracted training sample set, each hyperplane being made close to one of the two classes while far from the other, so as to create a dual classifier.
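The LASSO-based vector feature selection named in the steps above can be sketched as follows. This is a plain-NumPy illustration using iterative soft-thresholding (ISTA) rather than any particular library, and the toy data are invented; it shows the property the method relies on: the L1 penalty drives the weights of uninformative features exactly to zero, so the surviving indices form the selected feature subset.

```python
import numpy as np

def lasso_ista(X, y, lam=0.1, step=None, n_iter=500):
    """LASSO via iterative soft-thresholding (ISTA).

    Minimizes (1/(2n)) * ||X w - y||^2 + lam * ||w||_1.
    Features whose weight shrinks to exactly zero are discarded.
    """
    n, d = X.shape
    if step is None:
        # inverse of the Lipschitz constant of the smooth part's gradient
        step = n / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = w - step * grad
        # soft-thresholding: the proximal operator of the L1 penalty
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
true_w = np.zeros(10)
true_w[[0, 3]] = [2.0, -1.5]          # only features 0 and 3 are informative
y = X @ true_w + 0.01 * rng.standard_normal(200)

w = lasso_ista(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-3)
print(selected)   # indices of the features that survive the L1 penalty
```

In the method's pipeline, the selected indices would be used to restrict each sample vector to its informative components before training the classifier.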
In some embodiments, a first set proportion of sample data in the training sample set is used to construct the dual classifier, and the sample data in the remainder of the training sample set is used to detect the accuracy of the classifier.
The invention has the beneficial effects that:
in the method and device for establishing the diabetic retinopathy classifier based on feature extraction and double support vector machines, the method performs feature extraction on sample data and optimizes the data for different tasks; by introducing membership degrees it controls the membership of each sample point to the training set by weighting, and by introducing slack variables it balances the constraint conditions of isolated points or noise points in the training data, reducing the errors such points cause. Further, cost is controlled by weighting: within a cost-sensitive learning framework, costs are introduced into the fuzzy support vector machine through weights, reducing the error of expressing the data-imbalance problem with an equation. Finally, two independent, non-parallel hyperplanes are generated, each brought close to one of the two classes while kept far from the other, so that large-scale classification problems can be handled without additional external optimizers.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flow diagram of a method for creating a diabetic retinopathy classifier based on feature extraction and a double support vector machine according to an embodiment of the present invention;
FIG. 2(a) is a graph of the effect of ordered regression based on the margin-sum strategy;
FIG. 2(b) is a diagram of two example incremental support vector machines;
FIG. 3 is a graph of the TSVM test results on the data set Australian under noisy and noise-free conditions;
FIG. 4 is a graph of the test results of the classifier of the present invention on the data set Australian under noisy and noise-free conditions;
FIG. 5 is a graph of the TSVM test results on the data set Blood under noisy and noise-free conditions;
FIG. 6 is a graph of the test results of the classifier of the present invention on the data set Blood under noisy and noise-free conditions;
FIG. 7 is a graph of the TSVM test results on the data set Heart under noisy and noise-free conditions;
FIG. 8 is a graph of the test results of the classifier of the present invention on the data set Heart under noisy and noise-free conditions;
FIG. 9 is an original image of diabetic retinopathy;
FIG. 10 is a hard-exudate feature map obtained by feature extraction from FIG. 9;
FIG. 11 is a microaneurysm feature map obtained by feature extraction from FIG. 9;
FIG. 12 is a hemorrhage (bleeding spot) feature map obtained by feature extraction from FIG. 9;
FIG. 13 is a graph comparing the accuracy of TSVM and the improved classifier of the present invention in classifying diabetic retinopathy images under noisy conditions;
FIG. 14 is a heat map of ozone level correlations;
FIG. 15 is a graph comparing the accuracy of TSVM and the improved classifier of the present invention in ozone level binary detection under noisy conditions.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
It is also noted herein that the term "coupled," if not specifically stated, may refer herein to not only a direct connection, but also an indirect connection in which an intermediate is present.
For the degradation of classifier performance caused by data imbalance, where some classes have far fewer sample instances than others, some methods balance the class distribution by resampling and constructing a new data space, giving this priority in a preprocessing stage to reduce the influence of imbalanced data on the result. Such methods avoid modifying the learning algorithm, but they lack weight assignment, so the newly constructed result is often less than ideal. Other methods adopt cost-sensitive learning, assigning weights to different data samples according to importance, but they ignore the relationship between samples and their membership in clusters. Still other methods employ ensemble learning, combining the outputs of several base learners into a new classifier; although this improves the learning effect, overall computational performance is often neglected.
Most of the above methods are based on local considerations; although they solve problems within a certain range, abnormal values still occur in many cases. The presence of outliers poses serious problems for classification algorithms dealing with imbalanced data. Training suffers because, in the imbalance problem, minority data points may contain few samples while the weights of clustered minority points may be high, producing outliers with high weights. A squared-loss function can be used to address this, yielding a linear equation for each of the two systems; however, if two minority clusters are extremely similar, the two non-parallel hyperplanes derived from the two system equations become extremely similar and are likely to fall within the range represented by one system's linear equation. The ability to classify through features and through the construction of views and hyperplanes is thereby weakened, data noise gradually increases, and subjective data-entry errors inevitably arise. Meanwhile, the information labeling each sample is also seriously insufficient, producing still more severe abnormal values in the data; the performance of the classifier gradually declines, forming an obstructive cycle.
The invention provides a classifier creation method based on feature extraction and a double support vector machine, comprising the following steps S101 to S104:
step S101: the method comprises the steps of obtaining a training sample set, wherein the training sample set comprises a set number of categories, each category comprises a plurality of sample data, and each sample is provided with a category label.
Step S102: and performing feature extraction on each sample data based on vector feature selection or matrix feature selection.
Step S103: introducing a fuzzy support vector machine, controlling the membership degree of each sample point to the training set by weighting, and fuzzifying each sample data to reduce the membership degree of isolated points and noise points in the training sample set relative to the classes to which they belong; meanwhile, slack (relaxation) variables are introduced into the fuzzy support vector machine, and penalty factors are introduced based on a cost-sensitive learning framework.
Step S104: two independent and nonparallel hyperplanes are generated based on the structure of the fuzzy support vector machine by utilizing the training sample set after feature extraction, and each hyperplane is close to one of two categories and is far away from the other category at the same time so as to create a double classifier.
In some embodiments, the sample data of the first set proportion in the training sample set is used for constructing a dual classifier, and the sample data of the rest part of the training sample set is used for detecting the precision of the classifier.
Specifically, the method for creating a diabetic retinopathy classifier based on feature extraction and a double support vector machine is described below; the improved support vector machine obtained by the present invention is denoted PTSVM:
1. principle of classifier creation
1.1 Analysis of ordered regression

For an ordinal regression problem with K classes, the classes are represented as consecutive integers {1, 2, ..., K} carrying the known ranking information. Let n_j denote the number of training samples of the j-th class, j = 1, ..., K, and write the i-th training sample of class j as x_i^j ∈ X, where X ⊆ R^d is the input space. FIG. 2(a) shows the effect of ordered regression based on the margin-sum strategy: the discriminant hyperplanes w·φ(x) = b_j are topologically parallel. When the support vectors lie on the boundaries between adjacent classes, maximizing the margins allows the parallel hyperplanes to be discriminated. FIG. 2(b) shows two example incremental support vector machines; it can be seen from the figure that if a newly added sample lies between the two hyperplanes, the classifier needs to be adjusted.

Suppose that training generates a classifier with w as the weight vector and b_j as the threshold coefficients, i.e. K−1 parallel topological discriminant hyperplanes w·φ(x) − b_j = 0, where b_1 ≤ b_2 ≤ ... ≤ b_{K−1} and b_j is the discrimination threshold of the j-th hyperplane. With b_0 = −∞ and b_K = +∞, the decision function is:

    f(x) = min { j ∈ {1, ..., K} : w·φ(x) − b_j < 0 }    (1)

The margin of the j-th discriminant hyperplane, representing the shortest distance of the nearest sample from this hyperplane, is obtained as the class difference between the j-th discriminant hyperplane and the (j−1)-th or (j+1)-th class. Based on the margin-sum strategy, the sum of all margins is maximized subject to the intermediate constraints:

    b_{j−1} ≤ w·φ(x_i^j) ≤ b_j,    i = 1, ..., n_j,  j = 1, ..., K    (2)

to obtain the soft-margin problem:

    min_{w, b, ξ, ξ*}  (1/2)||w||^2 + C Σ_{j=1}^{K−1} Σ_i (ξ_i^j + ξ*_i^{j+1})    (3)

so that

    w·φ(x_i^j) − b_j ≤ −1 + ξ_i^j    (4)
    w·φ(x_i^{j+1}) − b_j ≥ 1 − ξ*_i^{j+1}    (5)
    ξ_i^j ≥ 0    (6)
    ξ*_i^{j+1} ≥ 0    (7)
    b_1 ≤ b_2 ≤ ... ≤ b_{K−1}    (8)

where the training samples x_i^j are mapped to a high-dimensional Reproducing Kernel Hilbert Space (RKHS) by the transformation function φ(·), with kernel function:

    K(x, x′) = ⟨φ(x), φ(x′)⟩    (9)

⟨·, ·⟩ denotes the inner product in the RKHS. In addition, ξ_i^j is a non-negative slack variable measuring the degree of misclassification of the data x_i^j, and the parameter C controls the trade-off between the error on the training samples and margin maximization.
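The threshold-based decision rule of this section can be sketched as follows. The function name and the example thresholds are illustrative; the rule assumes scores w·φ(x) are compared against ascending thresholds b_1 ≤ ... ≤ b_{K−1}, with the last class open-ended.

```python
import numpy as np

def ordinal_predict(scores, thresholds):
    """Assign each score f(x) = w·x the smallest class j with f(x) < b_j.

    thresholds must be sorted ascending; class labels are 1..K where
    K = len(thresholds) + 1 (the K-th class has implicit threshold +inf).
    """
    b = np.concatenate([np.asarray(thresholds, dtype=float), [np.inf]])
    scores = np.atleast_1d(scores)
    # searchsorted(side='right') finds the first threshold strictly
    # greater than each score, which is exactly min{j : score < b_j}
    return np.searchsorted(b, scores, side='right') + 1

print(ordinal_predict([-1.0, 1.0, 3.0], [0.0, 2.0]))  # → [1 2 3]
```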
1.2 Introduction of slack variables

Combining the analysis of 1.1, slack variables are introduced. In dividing the hyperplane, handling the many isolated points and noise points involves a large number of constraints, usually of different types, so these points lie in regions of variable range. A slack variable (residual variable) can be introduced when normalizing a linear or nonlinear program: if it equals 0, the original state holds; if it is greater than zero, the constraint is relaxed. When judging a constraint condition, the inequality is converted to an equality by adding (or subtracting) a new non-negative variable (the slack variable) on its left side, and the initial coefficient of the slack variable in the objective function is zero.

The optimal representation of the hyperplane classification is:

    min_{w, b, ξ}  (1/2)||w||^2 + C Σ_{i=1}^{l} s_i ξ_i    (10)

where C is the penalty factor, ξ_i is the slack variable, s_i denotes the membership degree of the sample, w is the normal vector of the hyperplane, and l is the size of the fuzzy-data training set.
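The effect of the slack variables can be seen numerically: at the optimum, each slack equals the hinge violation max(0, 1 − y_i(w·x_i + b)), which is zero for points classified outside the margin. A minimal sketch with an assumed (hypothetical) hyperplane:

```python
import numpy as np

# Hypothetical separating hyperplane w·x + b = 0
w = np.array([1.0, -1.0])
b = 0.0

X = np.array([[2.0, 0.0],    # far on the correct side: no slack needed
              [0.4, 0.0],    # inside the margin: 0 < xi < 1
              [-1.0, 0.0]])  # misclassified point: xi > 1
y = np.array([1.0, 1.0, 1.0])

margins = y * (X @ w + b)            # functional margins y_i (w·x_i + b)
xi = np.maximum(0.0, 1.0 - margins)  # slack: zero when the constraint holds
print(xi)  # slack values: 0, 0.6, 2
```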
1.3 Introduction of the fuzzy support vector machine

In the traditional support vector machine algorithm, the hyperplane can be divided by effectively training every sample point. However, data sets collected in practice often contain isolated points or noise points differing only slightly from the training samples; these seriously affect the structure of the classifier, cause overfitting, and reduce the classifier's generalization performance. To reduce the serious interference of such outlying isolated points or noise points, a fuzzy support vector machine is introduced in combination with a membership function. The membership degree of each sample point to the training set is controlled by weighting, and the membership degrees of isolated and noise points are made significantly lower than those of other points, so the total error contributed by these points is significantly reduced.

The sample data are fuzzified during classification, the actually collected data set is transformed in matrix form, and the training set representing the fuzzy data is:

    S = { (x_i, y_i, s_i), i = 1, ..., l }    (11)

where x_i ∈ R^n, y_i ∈ {−1, +1}, and σ ≤ s_i ≤ 1 for some σ > 0. The optimal representation of the hyperplane classification is then:

    min_{w, b, ξ}  (1/2)||w||^2 + C Σ_{i=1}^{l} s_i ξ_i
    s.t.  y_i (w·x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, ..., l    (12)

where C is the penalty factor, ξ_i is the slack variable, s_i denotes the membership degree of the sample, w is the normal vector of the hyperplane, and l is the size of the fuzzy-data training set.
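The text leaves the membership function itself unspecified. A common choice in the fuzzy-SVM literature (shown here as an assumption, not as the inventors' exact rule) grades membership by distance to the class center, so isolated points receive memberships near zero:

```python
import numpy as np

def class_center_membership(X, delta=1e-6):
    """Membership s_i = 1 - ||x_i - center|| / (r_max + delta), in (0, 1].

    The farthest point from the class mean gets membership close to zero,
    which down-weights its slack term in the fuzzy-SVM objective (12).
    """
    center = X.mean(axis=0)
    d = np.linalg.norm(X - center, axis=1)
    return 1.0 - d / (d.max() + delta)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))
X = np.vstack([X, [[8.0, 8.0]]])   # inject one obvious outlier at index 50
s = class_center_membership(X)
print(s.min(), s.argmin())  # the outlier receives the smallest membership
```

In the method above, each class would be processed separately and the resulting s_i used as the per-sample weights in equation (12).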
1.3 Introducing cost control to account for sample skew

In the training phase on actual data samples, the noise and outlier nature of isolated points largely leads to an imbalanced data set, but in most cases it is unreasonable to express this with an equation. Therefore, on the basis of 1.3, the cost is controlled by weighting: using a cost-sensitive learning framework, costs are introduced into the fuzzy support vector machine through weights, reducing the error of expressing the data-imbalance problem with an equation. Positive-class and negative-class penalty factors C+ and C− represent the different costs, i.e. the different degrees of importance attached to misclassification of the data set, and the optimal hyperplane is found by solving the following problem.

Let X be the training data set of data points in n-dimensional space, whose column vectors represent the input patterns; the i-th input pattern is x_i ∈ R^n and the i-th output pattern is y_i ∈ {+1, −1}, the positive and negative class labels respectively. Let I denote the sample index set, and let I+ and I− denote the positive-class and negative-class sample index sets respectively.

The optimal representation of the hyperplane is then:

    min_{w, b, ξ}  (1/2)||w||^2 + C+ Σ_{i∈I+} s_i ξ_i + C− Σ_{i∈I−} s_i ξ_i
    s.t.  y_i (w·x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i ∈ I    (13)

where C+ and C− are the penalty factors, w is the hyperplane normal vector, and ξ_i is the slack variable.

An adaptive regression method is used to evaluate the performance of each weight coefficient, each of which corresponds to a weight value representing the loss caused by isolated or noise points during calculation. Using the penalty factor as this weight, a sample tending toward the limit is gradually driven to be classified into the correct hyperplane, and the degree of correction to the initial sample selects the correct sample population over successive iterations, as expressed in equation (14). Combining this penalty-factor weighting with equation (13) yields equation (15), in which the hyperplane normal vector appears explicitly. From equations (10) and (15), eliminating the hyperplane normal vector yields equation (16); further eliminating the slack variables yields equation (17); and eliminating the penalty factor yields equation (18).
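Equation (13) weights each sample's slack by a class-dependent penalty factor times its membership degree. The sketch below combines the two into per-sample weights; setting the penalties inversely proportional to class frequency is a common heuristic assumed here for illustration, not a rule taken from the text.

```python
import numpy as np

def cost_sensitive_weights(y, s):
    """Per-sample slack weights C_{+/-} * s_i in the style of eq. (13).

    Penalty factors are set inversely proportional to class frequency so
    the minority class incurs the larger misclassification cost.
    """
    y = np.asarray(y)
    n = len(y)
    n_pos, n_neg = (y == 1).sum(), (y == -1).sum()
    C_pos, C_neg = n / (2.0 * n_pos), n / (2.0 * n_neg)
    return np.where(y == 1, C_pos, C_neg) * np.asarray(s, dtype=float)

y = np.array([1, 1, -1, -1, -1, -1, -1, -1])   # 2 positive vs 6 negative
s = np.ones(8)                                  # uniform membership degrees
w = cost_sensitive_weights(y, s)
print(w)  # minority samples weighted 2.0, majority samples 8/12
```

In practice, s would come from the membership function of section 1.3 rather than being uniform, so an isolated minority point is up-weighted by its class cost but down-weighted by its low membership.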
1.4 improved double support vector machine
The purpose of the dual support vector machine is to generate two independent and non-parallel hyperplanes. The distance between the hyperplane and the target class is different, and the control modes of relaxation variables, cost control and fuzzification provided in 1.2 are combined to break through the mode that all data are controlled by a constraint force, so that the problem of division of two secondary planning related classes with small scale and small change is solved, and the speed and the performance of the algorithm are improved.
If the binary classification problem is solved in an n-dimensional real space, the training data set is represented as:

[equation (19): image in original]

Let the matrix A collect the samples of the positive class, let the matrix B collect the samples of the negative class, and let the associated kernel matrix be defined over them. The two separating hyperplanes are represented as:

[equation (20): image in original]
[equation (21): image in original]
[equation (22): image in original]

In addition, an ordered input over the data set and a variance-like matrix are defined for use below.
To improve the marginal benefit and minimize the structural risk, a regularization term is introduced into the formulation, and the optimization of the hyperplane is then expressed as:

[equation (23): image in original]

where the additional terms shown are the regularization term. The optimal hyperplane is then found by solving:

[equation (24): image in original]

Eliminating the balance parameters gives:

[equation (25): image in original]
[equation (26): image in original]

in which the remaining unknown is the weight vector; eliminating the weight vector gives:

[equation (27): image in original]
[equation (28): image in original]
2. Principle of the algorithm
2.1 Linear classification
For the binary classification problem, a linear loss function is introduced, and the original problem of the linear loss projection dual-support vector machine can be expressed as follows:
[equation (29): image in original]

subject to:

[equation (30): image in original]
[equation (31): image in original]

and:

[equation (32): image in original]

subject to:

[equation (33): image in original]
[equation (34): image in original]

where the first symbol is a positive parameter and the second is the relaxation variable. The optimal values of the empirical risk can be derived from equations (17) and (18). Because these optimal values may be negative, and unbounded problems may occur, a weighted linear loss function with a weighting vector is introduced, both to balance the influence of each point on the projected class mean and to bring in the concept of a rough set. The WLPTSVM formulation is then given as follows.
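The effect of the weighting vector on a linear loss can be illustrated with a small sketch. The down-weighting values below are assumptions for illustration, not the weights produced by the patent's equations (41) and (42):

```python
import numpy as np

def weighted_linear_loss(residuals, weights):
    # Linear (not squared) loss, scaled per sample by a weight vector;
    # down-weighting keeps far-away outliers from dominating the objective.
    return float(np.sum(weights * residuals))

res = np.array([0.5, 0.2, 10.0])     # last sample is an outlier
uniform = np.ones(3)                  # unweighted linear loss
rough = np.array([1.0, 1.0, 0.1])     # rough-set-style down-weighting (assumed)
```

With the down-weighted scheme the outlier contributes 1.0 instead of 10.0 to the objective, so it no longer dominates the fit.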
[equation (35): image in original]

subject to:

[equation (36): image in original]
[equation (37): image in original]

and:

[equation (38): image in original]

subject to:

[equation (39): image in original]
[equation (40): image in original]

where the weighting vectors:

[equation (41): image in original]
[equation (42): image in original]

are determined by the following equations:

[equation (43): image in original]
[equation (44): image in original]

in which the remaining quantities are parameters.
Before solving problems (35) and (38), the combined problem (35) + (43) is explained geometrically; the combined problem (38) + (44) is similar. In equation (35), the first term of the objective function controls the model complexity in order to find the optimal projection direction. The second term minimizes the empirical risk by minimizing the intra-class variance of the projected samples of the class itself, while keeping the projected samples of the other class as scattered as possible. In addition, the weight vector balances the effect of each point on the projected class mean. During training, the control of the empirical risk must remain compatible with the whole process. From this viewpoint, problems (35) and (38) are superior to those in the PTSVM.
To facilitate verification of the algorithm, the above problems can be solved by the following approximation. Considering problem (35) and substituting the equality constraints into the objective function yields:

[equation (45): image in original]

Let:

[equation (46): image in original]
[equation (47): image in original]

Then equation (45) is converted into:

[equation (48): image in original]

Setting the gradient of (48) with respect to w1 to zero gives:

[equation (49): image in original]

The solution of QPP (35) is then obtained from the following system of linear equations:

[equation (50): image in original]
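The "substitute the constraints, set the gradient to zero, solve a linear system" route used for (45) through (50) can be sketched generically. The matrices H and G and the scalar c below stand in for the patent's quantities, which are not reproduced here:

```python
import numpy as np

def solve_plane(H, G, c, eps=1e-6):
    """Closed-form solve of a TSVM-style subproblem (sketch).

    Minimizes ||H z||^2 / 2 + c * e^T (G z) over z. Setting the gradient
    to zero gives the normal equations (H^T H + eps*I) z = -c * G^T e;
    eps regularizes the inverse when H^T H is near-singular.
    """
    n = H.shape[1]
    e = np.ones(G.shape[0])
    return np.linalg.solve(H.T @ H + eps * np.eye(n), -c * (G.T @ e))

z = solve_plane(np.eye(2), np.array([[1.0, 0.0]]), 1.0, eps=0.0)
```

This is why no external quadratic programming solver is needed: each plane comes from one linear solve.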
Considering problem (38) and substituting the equality constraints into the objective function yields:

[equation (51): image in original]
[equation (52): image in original]

Let:

[equation (53): image in original]

which converts equation (51) into:

[equation (54): image in original]

Setting the gradient of (54) to zero gives:

[equation (55): image in original]

The solution of QPP (38) is then obtained from the following system of linear equations:

[equation (56): image in original]
To find suitable weighting vectors as defined in (55) and (56), together with approximate solutions of the combined problems (52) + (55) and (54) + (56), a two-step weight-setting method is constructed. In outline, the first step solves the linear loss function problems (25) and (26) and finds the corresponding slack values; the second step uses those values to compute the weighting vectors, which are then used to find the solutions of problems (29) and (32), taken as the required approximate solutions. The detailed algorithm is as follows:
step 2.1.1: given training input matrices A and B, let
Figure 948373DEST_PATH_IMAGE162
Figure 564163DEST_PATH_IMAGE163
With appropriate penalty parameters
Figure 185637DEST_PATH_IMAGE164
In formulae (38) and (44)
Figure 836061DEST_PATH_IMAGE165
And
Figure 117744DEST_PATH_IMAGE166
step 2.1.2: from formulae (35) and (38)
Figure 486409DEST_PATH_IMAGE165
And
Figure 645995DEST_PATH_IMAGE167
calculating a relaxation variable
Figure 213242DEST_PATH_IMAGE168
And
Figure 42658DEST_PATH_IMAGE169
then obtained from the formulae (43) and (44)
Figure 23252DEST_PATH_IMAGE170
And
Figure 861895DEST_PATH_IMAGE171
wherein
Figure 345966DEST_PATH_IMAGE172
Figure 346283DEST_PATH_IMAGE173
Step 2.1.3: by using
Figure 50059DEST_PATH_IMAGE170
and
Figure 692393DEST_PATH_IMAGE171
Finding solutions of equations (38) and (44)
Figure 296550DEST_PATH_IMAGE174
And
Figure 733348DEST_PATH_IMAGE175
step 2.1.4: the decision is constructed as:
Figure 422955DEST_PATH_IMAGE176
(57)
wherein the content of the first and second substances,
Figure 603400DEST_PATH_IMAGE177
is an absolute value.
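The mapping in step 2.1.2 from first-pass slacks to second-pass weights can be sketched as follows. The exponential form is an assumption made for illustration, since the patent's exact map, equations (43) and (44), is not reproduced here:

```python
import numpy as np

def slack_weights(slacks, sigma=1.0):
    # Two-step weighting (sketch): samples with larger slack, i.e. fitted
    # worse in the first pass, receive smaller weights in the second pass,
    # so likely outliers influence the final planes less.
    return np.exp(-np.asarray(slacks, dtype=float) / sigma)

w = slack_weights([0.0, 1.0, 5.0])
```

A perfectly fitted sample keeps full weight 1.0, while a sample with a large slack is nearly ignored in the second solve.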
2.2 Nonlinear classification
For the nonlinear classification problem, first define the augmented data matrices and select an appropriate kernel function, then project the weighted linear loss onto the twin support vector machine. The original problem of the nonlinear version is represented as follows:

[equation (58): image in original]

subject to:

[equation (59): image in original]
[equation (60): image in original]

and:

[equation (61): image in original]

subject to:

[equation (62): image in original]
[equation (63): image in original]

where the weighting vectors are determined by equations (41) and (42).
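One common choice for the kernel function K in the nonlinear version is the Gaussian (RBF) kernel; the patent does not fix a particular kernel here, so this is a generic sketch of building the kernel matrix:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2),
    # computed from the expansion ||a||^2 + ||b||^2 - 2 a.b.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

K = rbf_kernel(np.array([[0.0], [1.0]]), np.array([[0.0], [1.0]]))
```

The kernel matrix replaces the raw inner products, so the same linear solves yield nonlinear separating surfaces.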
The derivation is similar to the linear case. Under the stated assumptions, the solutions of problems (58) and (61) are obtained as follows:

[equation (64): image in original]
[equation (65): image in original]

where the indicated matrix is an identity matrix, and the auxiliary matrices are defined by:

[equation (66): image in original]
[equation (67): image in original]
[equation (68): image in original]
[equation (69): image in original]
The specific algorithm is as follows:
Step 2.2.1: Given the training input matrices A and B, set the initial quantities, choose appropriate penalty parameters for formulae (64) and (65), and select a kernel function K.
Step 2.2.2: From formulae (58) and (61), calculate the relaxation variables, then obtain the weighting vectors from formulae (43) and (44).
Step 2.2.3: Using the weighting vectors, find the solutions of equations (64) and (65).
Step 2.2.4: Construct the decision function as:

[equation (70): image in original]

where the vertical bars denote the absolute value.
3. Comparison of the improved support vector machine of the present invention with a conventional twin support vector machine
The twin support vector machine (TSVM) is an effective classifier, especially for binary data, and is defined by the squared norm distance in its objective function. Norm distances are well known to be susceptible to outliers, which can lead to errors. The improved algorithm of the present invention raises the accuracy of the test results both with and without noise, and is robust in multi-classification problems.
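The sensitivity of a squared-norm objective to outliers can be seen numerically: under a squared loss, a single far point's share of the objective is far larger than under a linear loss. A minimal sketch, with distance values invented for illustration:

```python
import numpy as np

# Three well-behaved distances and one outlier. The squared-norm objective
# used by the classical TSVM lets the single far point dominate; a linear
# loss keeps its influence proportional.
d = np.array([1.0, 1.0, 1.0, 10.0])
sq_share = d[-1]**2 / np.sum(d**2)   # outlier's share of the squared loss
lin_share = d[-1] / np.sum(d)        # outlier's share of the linear loss
```

Here the outlier accounts for roughly 97% of the squared loss but only about 77% of the linear loss, which is the motivation for the weighted linear loss above.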
3.1 Testing with dataset Australian
The dataset Australian has 690 samples and data dimension 14. Fig. 3, generated by testing the original TSVM algorithm on the dataset Australian, shows how its classification accuracy varies with the tuning parameter. The red line in Fig. 3 shows the accuracy under the noise-free condition, where the best performance of the algorithm is only 77.81251%; the blue line shows the accuracy under the noisy condition, where the best performance is only 95.625025%.
Fig. 4 shows the result of testing the improved algorithm of the present invention on the dataset Australian, again as classification accuracy versus the tuning parameter. The red line shows the noise-free condition, where the best performance improves to 93.12505%; the blue line shows the noisy condition, where the best performance improves to 96.87505%. Under both conditions the accuracy is higher than that of the conventional algorithm, and the improvement is especially significant under noise. Moreover, under the noisy condition the optimum is reached at C = 0.5, so the time cost is reduced compared with the red line of Fig. 3 (optimal at C = 1).
3.2 Testing with dataset Blood
The dataset Blood has 748 samples and data dimension 4. Fig. 5, generated by testing the original TSVM algorithm on the dataset Blood, shows how its classification accuracy varies with the tuning parameter. The red line shows the noise-free condition, where the best performance of the algorithm is only 76.8751%; the blue line shows the noisy condition, where the best performance is only 93.4375%.
Fig. 6, generated by testing the improved algorithm on the dataset Blood, shows the classification accuracy of the improved algorithm versus the tuning parameter. The red line shows the noise-free condition, where the best performance improves to 87.8125%; the blue line shows the noisy condition, where the best performance improves to 97.5%. Under both conditions the accuracy is higher than that of the conventional algorithm, and the improvement is especially significant under noise. Meanwhile, under the noise-free condition the optimum is reached at C = 0.0625, compared with C = 0.25 for the red line of Fig. 5, and under the noisy condition the optimum is reached at C = 0.25, compared with C = 1 for the red line of Fig. 5; the time cost is reduced in both cases.
3.3 Testing with dataset Heart
The dataset Heart has 270 samples and data dimension 13. Fig. 7, generated by testing the original TSVM algorithm on the dataset Heart, shows its classification accuracy versus the tuning parameter. The red line shows the noise-free condition, where the best performance of the algorithm is only 76.5625%; the blue line shows the noisy condition, where the best performance is only 93.75%.
Fig. 8 shows the result of testing the improved algorithm on the dataset Heart, again as classification accuracy versus the tuning parameter. The red line shows the noise-free condition, where the best performance improves to 88.4375%; the blue line shows the noisy condition, where the best performance improves to 98.75%. Under both conditions the accuracy is higher than that of the conventional algorithm, especially under noise. Moreover, under the noisy condition the optimum is reached at C = 0.0625, so the time cost is reduced compared with the red line of Fig. 7 (optimal at C = 0.125).
3.4 Precision testing of each data set under noisy conditions using linear and Gaussian kernel functions
Classifiers built with the conventional support vector machine (SVM), the twin support vector machine (TSVM), and the improved algorithm of the present invention were each tested, with both a linear kernel function and a Gaussian kernel function, on the datasets Heart, Australian, Pima, Sonar, Spect, German, Monk1, Cancer, Ionosphere, Splice, Cmc, Blood and Haberman. The sample size and data dimension of each data set are shown in Table 1:
Table 1
[Table 1: image in original]
The test accuracy of the linear kernel function and the Gaussian kernel function on each data set under noisy conditions is shown in Table 2(a) and Table 2(b):
[Table 2(a): image in original]
[Table 2(b): image in original]
Table 2 reports the accuracy and the degree of fluctuation of each data set test for the conventional support vector machine (SVM), the twin support vector machine (TSVM), and the classifier generated by the improved algorithm of the present invention. The comparison of the three algorithms shows that the improved algorithm achieves higher accuracy and a lower standard deviation, and is therefore more efficient and more stable than the two conventional algorithms.
The invention is illustrated below with reference to specific examples:
Example 1
A method for establishing a diabetic retinopathy classifier based on feature extraction and a double support vector machine is provided; data distribution plays an important role in the classification of diabetic retinopathy images. Lesions of the retinal capillaries appear as microaneurysms, hemorrhage spots, hard exudates, cotton-wool spots, venous beading, intraretinal microvascular abnormalities (IRMA), and macular edema. Diabetes can cause two types of retinopathy: proliferative and non-proliferative. Diabetic retinopathy is one of the major blinding eye diseases. When the abnormal lesions occupy only a very small proportion of the image and cannot be distinguished, a data-imbalance problem arises, outliers and noise increase continuously, the speed and performance of classification suffer greatly, and the true cause of the retinal capillary pathology cannot be distinguished. In view of these problems, the present embodiment proposes an improved diabetic retinopathy detection scheme using the classifier creation method provided in steps S101 to S104.
Starting from the motivation and mathematical expression of the classification task, and based on the relationship between proliferative and non-proliferative retinopathy together with the relationships among three lesion features (hard exudates, microaneurysms, and hemorrhage spots), two modes of feature selection are applied: vector-based feature selection, which may use the lasso method, and matrix-based feature selection, which may use lr- and p-norm-based methods. In addition, the fast classification method of the double-bounded support vector machine and the WRSVM are used to handle the outliers generated by the imbalance problem. For the nonlinear classification of diabetic retinopathy images, a dual classifier that sets weight values in two steps can be constructed, and feature selection is used for multi-task recognition, so that the unbalanced distribution of the image data can be classified effectively without any additional external optimizer.
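The lasso-style vector feature selection mentioned above works by shrinking weak coefficients exactly to zero, which is what drops uninformative features. A minimal sketch of the soft-thresholding operator at the heart of lasso (the coefficient values are illustrative, not the embodiment's):

```python
import numpy as np

def soft_threshold(v, lam):
    # Lasso proximal step: shrink each coefficient toward zero by lam and
    # zero out anything smaller than lam, so weak features are discarded.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

coef = soft_threshold(np.array([0.05, -0.8, 0.3]), lam=0.1)
```

Coefficients below the threshold vanish entirely, while the surviving ones, here standing in for strong lesion features, are merely shrunk.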
Specifically, feature extraction is first performed on the sample data for the specific task. From the original diabetic retinopathy image shown in Fig. 9, feature selection can extract the hard-exudate data shown in Fig. 10, the microaneurysm data shown in Fig. 11, and the hemorrhage-spot data shown in Fig. 12.
Further, a classifier is constructed based on the following steps:
The first step is as follows: construct a mapping function from the diabetic retinopathy training samples to the categories, then, over all parallel candidate hyperplanes, determine the shortest distance from a given hyperplane to the nearest lesion sample of the adjacent category, so that the margin of the hyperplane can be clearly delimited; then, based on a margin-sum strategy, maximize the sum of all margins, and finally solve the main classification problem subject to the constraints between all lesion and non-lesion classes.
The second step is as follows: introduce a fuzzy support vector machine, assign a membership grade to each diabetic retinopathy sample, and fuzzify the sample data; with the original data set thus turned into a fuzzy training set, formulate the hyperplane optimization problem that optimally separates the lesion classes.
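The membership-grade assignment of the fuzzification step can be sketched with a common class-center-distance scheme; the distance-based form below is an assumption for illustration, not the patent's exact membership function:

```python
import numpy as np

def fuzzy_membership(X, center=None, delta=1e-6):
    # Class-center-distance membership (a common FSVM scheme, assumed here):
    # samples far from their class center get membership near 0, so noisy
    # lesion samples contribute less to the training objective.
    X = np.asarray(X, dtype=float)
    if center is None:
        center = X.mean(axis=0)
    d = np.linalg.norm(X - center, axis=1)
    return 1.0 - d / (d.max() + delta)

m = fuzzy_membership([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
```

The far-away third sample receives the smallest membership, so it is effectively down-weighted during training.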
The third step is as follows: to address the problem, noted in the second step, that the abnormal lesions occupy too small a proportion of the image and the data set is unbalanced, the samples are weighted. Applying a cost-sensitive learning framework to the support vector machine yields a weighted optimization problem, through which an optimal hyperplane for separating the classes of the diabetic retinopathy training samples is found.
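The cost-sensitive weighting for the imbalance problem can be sketched with weights inversely proportional to class frequency; this is a common scheme assumed for illustration, since the patent does not reproduce its exact costs here:

```python
import numpy as np

def class_weights(y):
    # Weights inversely proportional to class frequency: rare lesion
    # samples get larger penalties, so the hyperplane is not pulled
    # toward the majority (non-lesion) class.
    classes, counts = np.unique(y, return_counts=True)
    w = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), w.tolist()))

wts = class_weights(np.array([0, 0, 0, 0, 1]))
```

With four majority samples and one minority sample, the minority class receives a weight four times larger, balancing its total contribution to the objective.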
The fourth step is as follows: in the third step the large coefficients are estimated with bias and are therefore not optimal. To address the sample skew caused by abnormal diabetic retinopathy data, a penalty factor is applied to each coefficient, representing the importance of the loss contributed by the outliers; when the penalty factor tends to infinity, the hyperplane is forced to classify all samples correctly and the model degenerates into a hard-margin classifier.
The fifth step is as follows: after introducing the linear loss function, in order to balance the influence of each diabetic retina sample point on the projected class mean, the concept of a rough set is followed and a weighted linear loss function with a weighting vector is introduced. The first term of the objective function controls the model complexity so as to find the optimal projection direction; the second term minimizes the empirical risk by minimizing the intra-class variance of the projected samples of one lesion class while keeping the projected samples of the other lesion class as scattered as possible. In addition, the weighting vector balances the effect of each point on the projected class mean, and during training the empirical risk term also works toward the consistency required for classifying the lesion image data. Finally, based on the relationships among the three features (hard exudates, microaneurysms, and hemorrhage spots), a two-step weight-setting method is constructed and the classification is computed.
The test results were as follows:
Fig. 13, generated by testing the conventional twin support vector machine algorithm (TSVM) and the classifier constructed in this embodiment on a noisy diabetic retinopathy image dataset, shows the classification accuracy as a function of the number of training iterations. The red line shows the performance of the original TSVM algorithm and the blue line shows the performance of the improved algorithm of the present invention; the model accuracy makes clear that the improved algorithm is higher than the conventional algorithm and is robust to random noise.
In the present embodiment, three features are used: hard exudates, microaneurysms and hemorrhage spots. Testing on a noisy data set leads to the conclusion that the improved model is the best choice for diabetic retinopathy detection, providing better results in terms of accuracy and robustness to noise. The improved classifier of the present invention also helps to reduce the time consumed in classifying the diabetic and non-diabetic cases of the data set.
Example 2
A pollution flashover is a serious accident with wide impact on the power grid; accurate and timely classification of the contamination severity is key to preventing such accidents and is a technical difficulty. In this embodiment, the method of steps S101 to S104 is used to create a supervised classifier for ozone-level detection, with the relevant features extracted by principal component analysis, addressing the poor generalization caused by insufficient labeled samples in industrial applications. The ozone level detection data set is used; its specific form can be seen in the standard ozone-level feature-correlation heat map. It comprises two ground ozone level data sets, an eight-hour peak set (eighthr.data) and a one-hour peak set (onehr.data), collected in Houston, Galveston and Brazoria from 1998 to 2004.
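The principal-component feature extraction step mentioned above can be sketched with a plain SVD; this is a generic sketch, not the embodiment's exact pipeline:

```python
import numpy as np

def pca_reduce(X, k):
    # Principal component analysis via SVD: center the data, then project
    # onto the top-k right singular vectors (the leading principal
    # directions), reducing the feature dimension to k.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))   # stand-in for the ozone feature table
Z = pca_reduce(X, 2)
```

The reduced matrix Z keeps the directions of greatest variance, which is what supplies the "relevant features" to the downstream classifier.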
Constructing a classifier based on the following steps:
The first step is as follows: construct a mapping function from the ozone-level detection training samples to the categories and, considering all parallel candidate hyperplanes, determine the shortest distance from a given hyperplane to the nearest contaminated sample of the adjacent category, then delimit and identify the margin of the hyperplane; then, based on a margin-sum strategy, maximize the sum of all margins, and solve the main classification problem subject to the constraints among all ozone pollution levels.
The second step is as follows: introduce a fuzzy support vector machine, assign a membership grade to each ozone-level detection training sample, and fuzzify the sample data; turning the original data set into a fuzzy training set yields the hyperplane optimization problem that optimally separates the pollution levels.
The third step is as follows: to address the problems, noted in the second step, that abnormal ozone pollution values occupy too small a proportion of the data and the data set is unbalanced, the samples are weighted and a cost-sensitive learning framework is applied to the support vector machine, yielding a weighted optimization problem through which an optimal hyperplane for separating the ozone-level training sample classes is found.
The fourth step is as follows: in the third step the large coefficients are estimated with bias and are therefore not optimal. To address the sample skew caused by abnormal ozone pollution values, a penalty factor is applied to each coefficient, representing the importance of the loss contributed by outliers; when the penalty factor tends to infinity, the hyperplane is forced to classify all samples correctly and the model degenerates into a hard-margin classifier.
The fifth step is as follows: after introducing the linear loss function, in order to balance the influence of each ozone-level sample point on the projected class mean, the concept of a rough set is followed and a weighted linear loss function with a weighting vector is introduced. The first term of the objective function controls the model complexity to find the optimal projection direction; the second term minimizes the empirical risk by minimizing the intra-class variance of the projected samples of the pollution class itself while keeping the projected samples of the other class as scattered as possible. The weighting vector balances the effect of each point on the projected class mean, and during training the empirical risk term also works toward the consistency required for classifying the pollution-level data. Finally, based on the relationships among features such as local ozone peak prediction, upwind ozone background level, precursor-emission factors, maximum temperature, base temperature of net ozone production, total daily solar radiation, sunrise wind speed, and midday wind speed, a two-step weight-setting method is constructed and the classification is computed.
The test results were as follows:
In the standard ozone-level feature-correlation heat map of Fig. 14, each color block represents the ratio of the number of pixels classified into each contamination-severity level to the total number of sample pixels. Fig. 15 shows the result of testing the TSVM algorithm and the classifier constructed in this embodiment on the noisy ozone-level detection data set, as classification accuracy versus the number of training iterations: the red line shows the performance of the original TSVM algorithm and the blue line shows the performance of the improved algorithm of the present invention. The model accuracy makes clear that the improved algorithm is higher than the conventional algorithm and is robust to random noise.
To sum up, in the method and device for creating a diabetic retinopathy classifier based on feature extraction and a double support vector machine, the method performs feature extraction on the sample data and optimizes the data for different tasks. By introducing membership degrees, it controls through weighting the membership of each sample point in the training set; by introducing relaxation-variable constraints, it balances the singular or noisy points in the training data and reduces the errors they cause. Further, the cost is controlled by weighting: a cost-sensitive learning framework is used and the cost is introduced into the fuzzy support vector machine through the weights, reducing the error with which the equations express the data-imbalance problem. Further, by generating two independent, non-parallel hyperplanes, each hyperplane is brought close to one of the two categories while kept far from the other, so that large-scale classification problems can be handled without any additional external optimizer.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A method for creating a diabetic retinopathy classifier based on feature extraction and a double support vector machine is characterized by comprising the following steps:
acquiring a training sample set, wherein the training sample set comprises a set number of classes, each class comprises a plurality of sample data, and each sample is provided with a class label; the sample data at least comprises images of retinal capillary aneurysm, bleeding spots, hard exudation, cotton velvet spots, venous beading, intraretinal microvascular abnormality and macular edema;
performing feature extraction on each sample data based on vector feature selection or matrix feature selection, wherein a lasso feature selection method is adopted for the vector-based feature selection, and an lr or p-norm-based feature selection method is adopted for the matrix-based feature selection;
introducing a fuzzy support vector machine, controlling the membership degree of each sample point to the training set in a weighted manner, and performing fuzzification processing on each sample data to reduce the membership degree of the isolated points and noise points in the training sample set relative to their classes; meanwhile, relaxation variables are introduced into the fuzzy support vector machine, and penalty factors are introduced based on a cost-sensitive learning framework;
generating two independent and nonparallel hyperplanes based on the structure of the fuzzy support vector machine by using the training sample set after feature extraction, and enabling each hyperplane to be close to one of two categories and to be far away from the other category at the same time so as to create a double classifier; wherein, controlling the membership degree of each sample point to the training set by a weighting mode comprises:
fuzzification processing is carried out on the sample data during classification, matrix transformation is carried out on the actually acquired data set, and the training set representing the fuzzy data is:

S = {(x_1, y_1, s_1), (x_2, y_2, s_2), ..., (x_l, y_l, s_l)}

wherein x_i ∈ R^n is a sample point, y_i ∈ {−1, +1} is its class label, and s_i, with 0 < σ ≤ s_i ≤ 1, is the membership degree of the sample x_i to its class;
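The membership degree of each sample can be assigned, for example, by its distance to the class centre, so that isolated points and noise points receive small weights. This is a common construction in the fuzzy SVM literature and a sketch only, not necessarily the membership function used in the patent (the function name `fuzzy_memberships` and the linear decay are assumptions):

```python
import numpy as np

def fuzzy_memberships(X, y, delta=1e-6):
    """Membership s_i in (0, 1]: closer to the class centre -> higher weight,
    so isolated points and noise points get down-weighted."""
    s = np.empty(len(y), dtype=float)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centre = X[idx].mean(axis=0)
        d = np.linalg.norm(X[idx] - centre, axis=1)
        r = d.max() + delta            # class radius (delta keeps s_i > 0)
        s[idx] = 1.0 - d / r           # linear decay toward the radius
    return s
```

A point far from its class centre (an outlier) ends up with a membership close to zero, while core points keep a membership near one.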
introducing relaxation variables in the fuzzy support vector machine, including:

introducing a relaxation variable in the process of standardizing the linear or nonlinear program; if the relaxation variable equals 0, the constraint keeps its original form, and if the relaxation variable is greater than zero, the constraint is relaxed;

then, after introducing the fuzzy vector machine and the relaxation variables, the optimization expression of the hyperplane is

min_{w, b, ξ} (1/2)||w||^2 + C Σ_{i=1}^{l} s_i ξ_i
s.t. y_i (w · x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, ..., l

wherein C is the penalty factor, ξ_i is the relaxation variable, s_i represents the membership degree of the sample, w is the normal vector of the hyperplane, l is the size of the fuzzy-data training set, and 1 − ξ_i is the margin of the i-th sample to the hyperplane;
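A membership-weighted hinge-loss primal of this shape can be illustrated with a minimal subgradient-descent solver. This is an illustrative sketch for the linear case, not the solver prescribed by the patent (the learning rate, epoch count, and function name are assumptions):

```python
import numpy as np

def fsvm_fit(X, y, s, C=1.0, lr=0.01, epochs=2000):
    """Minimise 0.5*||w||^2 + C * sum_i s_i * max(0, 1 - y_i (w.x_i + b))
    by plain subgradient descent; s_i are the fuzzy membership degrees."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                    # margin violators
        gw = w - C * (s[viol] * y[viol]) @ X[viol]    # subgradient in w
        gb = -C * np.sum(s[viol] * y[viol])           # subgradient in b
        w -= lr * gw
        b -= lr * gb
    return w, b
```

Because each hinge term is scaled by s_i, an isolated or noisy point with a small membership pulls the hyperplane far less than a core point.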
introducing a penalty factor based on a cost-sensitive learning framework, comprising:

positive-class and negative-class penalty factors C_+ and C_− represent different costs, i.e. contribution degrees of different importance to the misclassification error on the data set, and the optimal hyperplane is found by solving the following problem;

the training data set T = {(x_1, y_1), ..., (x_m, y_m)} consists of data points in n-dimensional space; the column vector x_i ∈ R^n is the i-th input pattern, and y_i ∈ {+1, −1} is the i-th output pattern, with +1 and −1 denoting the positive and negative classes respectively; let I = {1, ..., m} represent the sample index set, and let I_+ and I_− represent the positive-sample and negative-sample index sets respectively;

then the hyperplane optimization is expressed as

min_{w, b, ξ} (1/2)||w||^2 + C_+ Σ_{i∈I_+} ξ_i + C_− Σ_{i∈I_−} ξ_i
s.t. y_i (w · x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i ∈ I

wherein C_+ and C_− are the penalty factors, w is the hyperplane normal vector, and ξ_i is the relaxation variable;
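The class penalties C_+ and C_− can be realised as per-sample weights. The inverse-frequency choice below is an assumed example; the claim only requires that the two classes carry different costs:

```python
import numpy as np

def cost_sensitive_weights(y):
    """Give each class a penalty inversely proportional to its frequency,
    so errors on the minority class cost more (mean weight is 1)."""
    n = len(y)
    n_pos = np.sum(y == 1)
    n_neg = n - n_pos
    c_pos = n / (2.0 * n_pos)   # C_+ for the positive class
    c_neg = n / (2.0 * n_neg)   # C_- for the negative class
    return np.where(y == 1, c_pos, c_neg)
```

On an imbalanced retinal data set the rarer class (for example, images showing venous beading) would thus contribute more per misclassified sample.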
the performance of each coefficient is calculated using an adaptive regression method, and a weight is used to represent the loss caused by the solitary points or noise points in the calculation process, the penalty factor serving as that weight; when the sample tends to the limit, it is gradually driven to be classified by the correct hyperplane; the correction of the initial sample selects the correct sample population in successive iterations as [formula], wherein each coefficient corresponds to a weight value; using the penalty factors as weights and introducing the hyperplane normal vector gives [formula]; combining the two expressions and removing the hyperplane normal vector yields [formula]; eliminating the relaxation variables then gives [formula]; and eliminating the penalty factor gives [formula];
generating two independent and non-parallel hyperplanes based on the structure of the fuzzy support vector machine using the training sample set, so that each hyperplane is close to one of the two classes while being far from the other, creating a dual classifier, comprising:

for a binary classification problem in an n-dimensional real space, the training sample set is represented as:

T = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, x_i ∈ R^n, y_i ∈ {+1, −1}

wherein a matrix A of size m_1 × n represents the m_1 samples in the positive-class sample index set I_+, and a matrix B of size m_2 × n represents the m_2 samples in the negative-class sample index set I_−; let K(·, ·) be a kernel matrix and C = [A; B]; then the two hyperplanes are represented as:

K(x^T, C^T) u_1 + b_1 = 0 and K(x^T, C^T) u_2 + b_2 = 0

let x̄_1 be the mean of the inputs in data set A, and let S_1 be the variance-like (scatter) matrix;

in order to improve the marginal benefit and reduce the structural risk to the maximum extent, a regularization term is introduced into the formula, and the optimization of the hyperplane is then expressed as [formulas], wherein [formula] is the regularization term;

the following solution is further performed to find the optimal hyperplane: [formula]; eliminating the balance parameters gives [formulas], wherein the remaining unknown is the weight vector; eliminating the weight vector gives [formulas].
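The two non-parallel hyperplanes can be computed by solving two small linear systems. The sketch below uses the least-squares variant of the twin SVM in its linear form as a stand-in (an assumption for illustration; the claims describe a projection variant with a weighted linear loss):

```python
import numpy as np

def twin_svm_fit(A, B, c1=1.0, c2=1.0):
    """Least-squares twin SVM: each hyperplane passes near one class
    and keeps the other class at (least-squares) distance 1."""
    e1 = np.ones((A.shape[0], 1))
    e2 = np.ones((B.shape[0], 1))
    H = np.hstack([A, e1])                # augmented positive-class matrix
    G = np.hstack([B, e2])                # augmented negative-class matrix
    # plane 1: close to A, far from B
    z1 = -c1 * np.linalg.solve(H.T @ H + c1 * (G.T @ G), G.T @ e2)
    # plane 2: close to B, far from A
    z2 = c2 * np.linalg.solve(G.T @ G + c2 * (H.T @ H), H.T @ e1)
    return (z1[:-1, 0], z1[-1, 0]), (z2[:-1, 0], z2[-1, 0])

def twin_svm_predict(X, plane1, plane2):
    (w1, b1), (w2, b2) = plane1, plane2
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return np.where(d1 <= d2, 1, -1)      # assign to the nearer hyperplane
```

The classification rule is exactly the one the claim describes: a test point is labelled by whichever of the two hyperplanes it lies closer to.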
2. The method for creating a diabetic retinopathy classifier based on feature extraction and dual support vector machines of claim 1, characterized in that a weighted linear loss function with weight vectors is introduced to create the dual classifier.
3. The method for creating a diabetic retinopathy classifier based on feature extraction and dual support vector machines according to claim 2, characterized in that, for the binary classification problem, a linear loss function is introduced, and the original problem of the linear-loss projection dual support vector machine is expressed as a pair of optimization problems [formulas], wherein the c_i are positive parameters and the ξ_i are relaxation variables; the optimal values of the empirical risk are determined as [formulas];

referring to the rough set and the weighted linear loss function with weight vector, the following first problem is solved: [formula], and the following second problem is solved: [formula], wherein the weight vectors are determined by [formulas], in which the remaining quantities are given parameters;

considering the first problem, the equality constraint is substituted into the objective function by the following approximation algorithm: [formula]; after a change of variables the problem is converted to an unconstrained form [formula]; setting the gradient of this expression with respect to the unknowns to 0 gives [formula], and the solution of the first problem is obtained from a system of linear equations as [formula], wherein I is an identity matrix;

considering the second problem and substituting the equality constraint into the objective function gives [formulas]; after the corresponding change of variables the problem is converted to an unconstrained form [formula], and setting its gradient to 0 gives [formula]; the solution of the second problem is then likewise obtained from a system of linear equations as [formula].
4. The method for creating a diabetic retinopathy classifier based on feature extraction and dual support vector machines of claim 3, characterized in that, for nonlinear classification problems, the stacked data matrix C = [A; B] is defined and a kernel function K(·, ·) is determined; projecting the weighted linear loss onto the twin support vector machine, the original problem of the nonlinear version is represented as a third problem [formula] and a fourth problem [formula]; the weight vectors are determined based on the same derivation process as in the linear classification, and the solutions of the third and fourth problems are then obtained from systems of linear equations as [formulas], wherein I_1 and I_2 are unit matrices and the auxiliary matrices [formulas] are defined accordingly.
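In the nonlinear version, the kernel matrix K(x^T, C^T) is evaluated against the stacked training matrix C = [A; B]. A Gaussian (RBF) kernel is one common choice (an assumption; the claim does not fix the kernel function):

```python
import numpy as np

def rbf_kernel(X, C, gamma=0.5):
    """K[i, j] = exp(-gamma * ||x_i - c_j||^2): Gaussian kernel matrix
    between the input rows X and the stacked training matrix C = [A; B]."""
    sq = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)
```

Replacing the raw data matrices by such kernel evaluations is what turns the linear twin formulation into its nonlinear counterpart.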
5. The method for creating a diabetic retinopathy classifier based on feature extraction and dual support vector machines of claim 3, further comprising:

constructing the dual classifier using sample data of a first set proportion of the training sample set, and testing the precision of the classifier using the remaining sample data of the training sample set.
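The train/test protocol of this claim, fitting on a first set proportion and measuring precision on the remainder, can be sketched as follows (the shuffling strategy and the 80/20 default are assumed examples, not fixed by the claim):

```python
import numpy as np

def split_train_test(X, y, train_fraction=0.8, seed=0):
    """Shuffle once, then cut: the first fraction trains the dual classifier,
    the remaining samples measure its precision."""
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(train_fraction * len(y))
    return (X[idx[:cut]], y[idx[:cut]]), (X[idx[cut:]], y[idx[cut:]])
```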
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when executing the program.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN202111366311.6A 2021-11-18 2021-11-18 Method and device for creating diabetic retinopathy classifier based on feature extraction and double support vector machines Active CN113792748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111366311.6A CN113792748B (en) 2021-11-18 2021-11-18 Method and device for creating diabetic retinopathy classifier based on feature extraction and double support vector machines


Publications (2)

Publication Number Publication Date
CN113792748A CN113792748A (en) 2021-12-14
CN113792748B true CN113792748B (en) 2022-05-13

Family

ID=78877389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111366311.6A Active CN113792748B (en) 2021-11-18 2021-11-18 Method and device for creating diabetic retinopathy classifier based on feature extraction and double support vector machines

Country Status (1)

Country Link
CN (1) CN113792748B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6868411B2 (en) * 2001-08-13 2005-03-15 Xerox Corporation Fuzzy text categorizer
CN104462762A (en) * 2014-11-04 2015-03-25 西南交通大学 Fuzzy fault classification method of electric transmission line
CN104502103A (en) * 2014-12-07 2015-04-08 北京工业大学 Bearing fault diagnosis method based on fuzzy support vector machine

Also Published As

Publication number Publication date
CN113792748A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
CN108345911B (en) Steel plate surface defect detection method based on convolutional neural network multi-stage characteristics
CN103136504B (en) Face identification method and device
US20200125889A1 (en) Information processing device, recording medium recording information processing program, and information processing method
EP3762795B1 (en) Method, device, system and program for setting lighting condition and storage medium
Yong et al. A multi-strategy integrated multi-objective artificial bee colony for unsupervised band selection of hyperspectral images
CN109344851B (en) Image classification display method and device, analysis instrument and storage medium
EP3822872A1 (en) Information processing device, information processing method, and information processing program
CN109961093A (en) A kind of image classification method based on many intelligence integrated studies
CN110674940A (en) Multi-index anomaly detection method based on neural network
Zhou et al. Automatic optic disc detection using low-rank representation based semi-supervised extreme learning machine
Damoulas et al. Inferring sparse kernel combinations and relevance vectors: an application to subcellular localization of proteins
CN114818963B (en) Small sample detection method based on cross-image feature fusion
CN114139631B (en) Multi-target training object-oriented selectable gray box countermeasure sample generation method
Spyropoulos et al. Ensemble classifier for combining stereo matching algorithms
Li et al. Adaptive weighted ensemble clustering via kernel learning and local information preservation
EP4287083A1 (en) Determination program, determination apparatus, and method of determining
CN115830401B (en) Small sample image classification method
Chapel et al. Anomaly detection with score functions based on the reconstruction error of the kernel PCA
Jena et al. Elitist TLBO for identification and verification of plant diseases
WO2008118767A1 (en) Generalized sequential minimal optimization for svm+ computations
Li et al. An imbalanced ensemble learning method based on dual clustering and stage-wise hybrid sampling
Sun et al. Sample hardness guided softmax loss for face recognition
Li et al. Imbalanced complemented subspace representation with adaptive weight learning
LV et al. Imbalanced Data Over-Sampling Method Based on ISODATA Clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant