CN106529598A - Classification method and system based on imbalanced medical image data set - Google Patents
- Publication number
- CN106529598A (application CN201610997896.4A)
- Authority
- CN
- China
- Prior art keywords
- sample
- subset
- medical image
- image data
- classifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a classification method and system based on an imbalanced medical image data set. The method comprises: extracting the green channel component of an original medical image; correcting the extracted grayscale image by histogram equalization; extracting texture features, wavelet features and auxiliary wheel features from the corrected image; sorting the extracted feature samples by the distance between samples; dividing the sorted samples into uniform feature subsets while ensuring that the subsets differ from one another; training sub-classifiers on the feature subsets with an SVM algorithm and a BP neural network algorithm; and combining the sub-classifiers and voting to obtain the final classification result. With the technical scheme of the invention, the classification accuracy for negative samples in multi-class ensemble learning is significantly improved, in particular when multiple classifiers are trained on the highly skewed data sets typical of the medical field. This helps reduce misdiagnosis and thus improves the practical value of the classifier.
Description
Technical field
The invention belongs to the field of machine learning, and in particular relates to a classification method and system based on imbalanced medical image data sets.
Background art
In many real-world machine learning classification tasks, the training data set of a classifier is highly imbalanced: some classes have far more samples than others. Because traditional learning algorithms optimize the overall accuracy of the classifier, they tend to misclassify minority-class samples as majority-class samples. In many practical problems, however, the accuracy on the minority class matters more, for example in disease diagnosis, credit card fraud detection and network intrusion detection. Data sets in such domains, for instance the medical domain, share a common feature: the sample distribution is highly skewed, with far more positive samples (i.e. normal samples) than negative samples (i.e. diseased samples). A classifier trained on such a data set is noticeably biased and misclassifies negative samples as positive. For a patient this is very serious, since it leads to misdiagnosis and a missed window for optimal treatment. Effectively improving the classification accuracy for negative samples is therefore essential. Credit card fraud is a similar case, in which the loss from letting one fraudulent transaction through has to be weighed against the loss from rejecting one legitimate customer. Learning methods with higher minority-class accuracy therefore tend to have more practical value.
In practice, the widespread problem of imbalanced data sets has hindered the move of machine learning results into practical applications. The paper "Class Imbalance Problem in Data Mining: Review" by Rushi Longadge and Snehalata Dongre, published in the journal "International Journal of Computer Science and Network", Volume 2, Issue 1, February 2013, analyzes and summarizes the existing methods for this problem. They fall into three categories: sampling, algorithms and feature selection. Sampling is divided into undersampling and oversampling. The most widely used undersampling method is random undersampling, which balances the classes by randomly removing majority-class samples. Its drawback is that the useful information in the removed samples is removed as well, and this loss of information affects the accuracy of the final classifier. The most widely used oversampling method is random oversampling, which reaches a balanced distribution by replicating minority-class samples. Its drawback is that the extra data increase the training time, and the extra near-identical minority-class samples may cause the classifier to overfit. Algorithm-level methods usually introduce cost-sensitive learning: the misclassification cost of minority-class samples is raised, which is similar to increasing their weights, so that the data distribution is balanced in the weighted sense. The problem is that there is no generally valid value for the weight difference between majority-class and minority-class samples, so it usually has to be set from experience or by repeated experiments. Feature selection chooses a subset of the existing feature set so that the classifier reaches its best performance, which is helpful for high-dimensional training sets. As with algorithm-level methods, however, there is no universally suitable subset, and choosing one likewise requires empirical judgment and repeated testing.
In "Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection" by Philip K. Chan and Salvatore J. Stolfo, published at the "International Conference on Knowledge Discovery & Data Mining", 1998, pages 164-168, a uniform sampling method is proposed. Unlike the sampling methods above, it neither discards majority-class samples, which would lose useful information, nor generates extra sample points, which would increase the training time or cause the resulting classifier to overfit. The procedure is as follows:
1. Divide the number of majority-class samples in the training set by the number of minority-class samples and round the result up to determine the number of training subsets.
2. Divide the majority class evenly into that number of parts.
3. Take one of the parts; if it holds fewer samples than the minority class, make up the difference by randomly drawing samples from the other parts.
4. Combine these samples with all minority-class samples to form one uniform subset, and generate all subsets in the same way.
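The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited paper's implementation: the function name `uniform_subsets`, the use of strided slicing to form the equal parts, and the `seed` parameter are all hypothetical choices made for the example.

```python
import math
import random


def uniform_subsets(majority, minority, seed=0):
    """Build balanced training subsets from one majority and one minority class."""
    rng = random.Random(seed)
    # Step 1: number of subsets = ceil(|majority| / |minority|).
    n_subsets = math.ceil(len(majority) / len(minority))
    # Step 2: split the majority class into n_subsets nearly equal parts
    # (strided slicing is an assumption; any even split would do).
    parts = [majority[i::n_subsets] for i in range(n_subsets)]
    subsets = []
    for i, part in enumerate(parts):
        chosen = list(part)
        shortfall = len(minority) - len(chosen)
        if shortfall > 0:
            # Step 3: top up with random samples drawn from the other parts.
            others = [s for j, p in enumerate(parts) if j != i for s in p]
            chosen += rng.sample(others, shortfall)
        # Step 4: pair the majority samples with every minority sample.
        subsets.append((chosen, list(minority)))
    return subsets
```

With ten majority samples and four minority samples this yields three subsets, each holding four samples of either class, and every majority sample appears in at least one subset.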
This method uses all samples, so no sample information is lost, while also balancing the minority class against the majority class within each training set. Experiments further showed that this sampling method improves not only the overall accuracy of the ensemble classifier but also, markedly, the classification accuracy for minority-class samples.
In summary, among the solutions to training set imbalance in ensemble learning, sampling methods applied at the preprocessing stage tend to be the most widely applicable. However, the sampling methods above consider only equalizing the numbers of minority-class and majority-class samples in the training set, and ignore a key property of ensemble learning algorithms: the diversity among sub-classifiers. To obtain a good ensemble, the individual learners should be "good and different": each learner must be reasonably accurate, i.e. its performance cannot be too poor, and the learners must be diverse, i.e. differ from one another. With the same learning algorithm, the simplest way to increase diversity is to increase the differences among the training sets.
Summary of the invention
To address the problem of imbalanced training sets in ensemble learning, the invention provides a classification method and system based on imbalanced medical image data sets.
The invention proposes a new sampling method. Before sampling, the Minkowski distance between each majority-class sample and the center point of the minority-class samples is calculated; when as many samples as there are in the minority class are drawn from the majority class, the more distant samples are drawn first. This keeps the training sets balanced while increasing the differences among them. By the property of ensemble learning described above, this improves both the minority-class accuracy and the overall accuracy of the system, which is of strong practical significance for real-world use of the classifier.
To achieve the above object, the invention adopts the following technical solution:
The invention provides a classification method based on imbalanced medical image data sets, comprising:
extracting the green channel component of an original medical image;
correcting the extracted grayscale image by histogram equalization;
extracting texture features, wavelet features and auxiliary wheel features from the corrected image;
sorting the extracted feature samples by inter-sample distance;
dividing the sorted samples into uniform feature subsets while ensuring diversity among the subsets;
training sub-classifiers on the feature subsets using an SVM algorithm and a BP neural network algorithm respectively;
combining the sub-classifiers and voting to obtain the final classification result.
Preferably, the green channel component is the green component among the red, green and blue components of a color medical image.
Preferably, histogram equalization is a method that automatically adjusts image contrast by means of a grayscale transformation.
Preferably, the grayscale image is the extracted green channel component image.
Preferably, the texture features, wavelet features and auxiliary wheel features are, respectively, the features extracted from the medical image after texture analysis, after wavelet transform processing, and after processing by the auxiliary wheel method.
Preferably, the inter-sample distance is the distance calculated with the Minkowski distance formula.
Preferably, the sorting by inter-sample distance proceeds as follows, taking three-class classification as an example: first calculate the center point of the samples in the smallest class; then sort the samples of the second-smallest class from far to near by their Minkowski distance to that center point; next calculate the center point of all samples in the smallest and second-smallest classes; finally sort the samples of the largest class from far to near by their Minkowski distance to this center point. Classification with more classes proceeds by analogy.
Preferably, the uniform feature subsets are divided as follows: divide the number of samples in the largest class of the training set by the number of samples in the smallest class and round up to determine the number of training subsets; then divide every class other than the smallest evenly into that number of parts; from each of these classes take one part, and make up the shortfall relative to the sample count of the smallest class by drawing from the adjacent part; finally combine equal numbers of samples from every class into one uniform subset, and generate all uniform subsets in the same way.
Preferably, diversity among the subsets is ensured in that the data set is ordered after sorting by distance, so the subsets obtained by dividing it by the number of training subsets are likewise ordered and differ from one another, their distances running from far to near.
Preferably, training sub-classifiers on the feature subsets using the SVM algorithm and the BP neural network algorithm respectively means handing the divided feature subsets to the SVM algorithm and to the BP neural network algorithm for training, generating twice as many sub-classifiers as there are feature subsets.
Preferably, combining the sub-classifiers and voting to obtain the final classification result means that a test medical image is classified by each trained sub-classifier, the classification results are counted, and the class with the most votes is the final classification result.
The invention also provides a classification system based on imbalanced medical image data sets, comprising:
a green channel component extraction device, configured to extract the green channel component of an original medical image;
a histogram equalization device, configured to correct the extracted grayscale image by histogram equalization;
a feature extraction device, configured to extract texture features, wavelet features and auxiliary wheel features from the corrected image;
a sample sorting device, configured to sort the extracted feature samples by inter-sample distance;
a uniform sampling device, configured to divide the sorted samples into uniform feature subsets while ensuring diversity among the subsets;
a sub-classifier training device, configured to train sub-classifiers on the feature subsets using an SVM algorithm and a BP neural network algorithm respectively;
a result voting device, configured to combine the sub-classifiers and vote to obtain the final classification result.
Preferably, the green channel component is the green component among the red, green and blue components of a color medical image.
Preferably, histogram equalization is a method that automatically adjusts image contrast by means of a grayscale transformation.
Preferably, the grayscale image is the extracted green channel component image.
Preferably, the texture features, wavelet features and auxiliary wheel features are, respectively, the features extracted from the medical image after texture analysis, after wavelet transform processing, and after processing by the auxiliary wheel method.
Preferably, the inter-sample distance is the distance calculated with the Minkowski distance formula.
Preferably, the sample sorting device operates as follows, taking three-class classification as an example: first calculate the center point of the samples in the smallest class; then sort the samples of the second-smallest class from far to near by their Minkowski distance to that center point; next calculate the center point of all samples in the smallest and second-smallest classes; finally sort the samples of the largest class from far to near by their Minkowski distance to this center point. Classification with more classes proceeds by analogy.
Preferably, the uniform sampling device operates as follows: divide the number of samples in the largest class of the training set by the number of samples in the smallest class and round up to determine the number of training subsets; then divide every class other than the smallest evenly into that number of parts; from each of these classes take one part, and make up the shortfall relative to the sample count of the smallest class by drawing from the adjacent part; finally combine equal numbers of samples from every class into one uniform subset, and generate all uniform subsets in the same way.
Preferably, diversity among the subsets is ensured in that the data set is ordered after sorting by distance, so the subsets obtained by dividing it by the number of training subsets are likewise ordered and differ from one another, their distances running from far to near.
Preferably, the sub-classifier training device hands the divided feature subsets to the SVM algorithm and to the BP neural network algorithm for training, generating twice as many sub-classifiers as there are feature subsets.
The classification system based on imbalanced medical image data sets according to claim 12, characterized in that the result voting device operates as follows: a test medical image is classified by each trained sub-classifier, the classification results are counted, and the class with the most votes is the final classification result.
The new sampling method proposed by the invention markedly improves the classification accuracy for negative samples in multi-class ensemble learning, which gives a clear gain in negative-sample accuracy when multiple classifiers are trained on highly skewed data sets such as those in the medical field. This helps reduce misdiagnosis and thereby improves the practical value of the classifier.
Brief description of the drawings
The invention will be better understood from the following detailed description of its embodiments, read in conjunction with the drawings, in which like reference numerals denote like parts:
Fig. 1 shows a detailed block diagram of the classification system based on imbalanced medical image data sets according to an embodiment of the invention;
Fig. 2 shows a detailed diagram of the classification method based on imbalanced medical image data sets according to an embodiment of the invention;
Fig. 3 shows a schematic diagram of uniform sampling according to an embodiment of the invention.
Detailed description of the embodiments
The features and exemplary embodiments of various aspects of the invention are described in detail below. The following description covers many specific details in order to provide a thorough understanding of the invention. It will be apparent to those skilled in the art, however, that the invention can be practiced without some of these details. The description of the embodiments below is intended only to provide a clearer understanding of the invention by showing examples of it. The invention is in no way limited to any specific configuration or algorithm set forth below, but covers any modification, substitution and improvement of the relevant elements, components and algorithms that does not depart from the spirit of the invention.
In view of the problems described above, the invention proposes a classification method and system based on imbalanced medical image data sets. Examples of the method and system are described with reference to Fig. 1 and Fig. 2. Fig. 1 shows a detailed block diagram of the classification system according to an embodiment of the invention; Fig. 2 shows a detailed diagram of the classification method according to an embodiment of the invention.
As shown in Fig. 1, a classification system based on imbalanced medical image data sets according to the invention comprises a green channel component extraction device 101, a histogram equalization device 102, a feature extraction device 103, a sample sorting device 104, a uniform sampling device 105, a sub-classifier training device 106 and a result voting device 107. Their functions are, in order: extracting the green channel component of the original medical image (i.e. performing step S201); correcting the extracted grayscale image by histogram equalization (i.e. performing step S202); extracting texture features, wavelet features and auxiliary wheel features from the corrected image (i.e. performing step S203); sorting the extracted feature samples by inter-sample distance (i.e. performing step S204); dividing the sorted samples into uniform feature subsets while ensuring diversity among the subsets (i.e. performing step S205); training sub-classifiers on the feature subsets using the SVM algorithm and the BP neural network algorithm respectively (i.e. performing step S206); and combining the sub-classifiers and voting to obtain the final classification result (i.e. performing step S207).
Specifically, the sample sorting device 104 uses the Minkowski distance to calculate the distance between samples; the sorting rule is to order the samples of the majority class from far to near by their Minkowski distance to the center point of the minority class. The uniform sampling device 105 performs uniform sampling on the sorted sample set; because the sample set is ordered, the resulting sample subsets differ from one another. An example of the classification method and system according to the invention is given below:
The detailed process is described here using fundus images as an example. A color fundus image contains three components: red, green and blue. The red component has the highest brightness, but the contrast between blood vessels and background is low, so the target vessels are hard to distinguish from the fundus background. The blue component has low contrast and low brightness, and suffers from severe noise. The green component has moderate brightness and a higher contrast between vessels and background, so it reflects the distribution of the fundus blood vessels well. The green channel (G channel) component is therefore extracted from the training set.
Histogram equalization is a method that automatically adjusts image contrast by means of a grayscale transformation. Its basic idea is to derive the grayscale transformation function from the probability density function of the gray levels; it is a histogram modification method based on a cumulative distribution function transform. Histogram equalization is therefore used to correct the grayscale images obtained by extracting the green channel component from the training set.
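The two preprocessing steps just described (green channel extraction and histogram equalization) can be sketched in plain Python as follows. This is an illustrative sketch only: the function names `green_channel` and `equalize_histogram`, the representation of an image as rows of `(R, G, B)` tuples, and the particular normalisation of the cumulative distribution are assumptions, not details given in the patent.

```python
def green_channel(rgb_image):
    """Return the G component of an image given as rows of (R, G, B) tuples."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


def equalize_histogram(gray, levels=256):
    """Remap gray levels through the normalised cumulative distribution.

    Assumes a non-empty image with integer gray levels in [0, levels).
    """
    # Histogram of gray levels.
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1
    # Cumulative distribution function (as running counts).
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [list(row) for row in gray]
    # Look-up table mapping each old level to its equalized level.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[value] for value in row] for row in gray]
```

In this sketch the darkest level present maps to 0 and the brightest to `levels - 1`, with intermediate levels spread according to how many pixels lie at or below them.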
From the corrected grayscale images, feature sets are extracted by wavelet transform, by the auxiliary wheel method and by texture analysis respectively, and serve as three independent data sets for the subsequent classifier training. The training set thus becomes three independent feature sets: a wavelet feature set, an auxiliary wheel feature set and a texture feature set.
These three medical-domain data sets share a common feature: the sample distribution is highly skewed, with far more positive samples (i.e. normal samples) than negative samples (i.e. diseased samples). A classifier trained on such a data set is noticeably biased and misclassifies negative samples as positive. For a patient this is very serious, since it leads to misdiagnosis and a missed window for optimal treatment. Effectively improving the classification accuracy for negative samples is therefore essential.
As described in the background section, existing solutions cannot fully solve this problem. Building on the existing methods, a sampling method is therefore proposed that both keeps the positive and negative samples in each training set balanced and increases the sample differences between training subsets, thereby effectively improving the classification accuracy for negative samples and the overall accuracy of the classifier. The procedure is as follows. The Minkowski distance is used to calculate the distance between samples, by the following formula:
where d12 is the distance between samples x1 and x2, p denotes the dimension of the sample point attributes, and k is the number of attribute values.
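The formula itself does not appear in the text above. For orientation only, the standard Minkowski distance between two samples is written out below. Reading k as the index over attributes, n (a symbol introduced here) as the number of attributes, and p as the order of the distance is an assumption, since the symbol definitions given in the text are terse and may not match this form exactly.

```latex
d_{12} = \left( \sum_{k=1}^{n} \left| x_{1k} - x_{2k} \right|^{p} \right)^{1/p}
```

Under this standard form, p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.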
Taking three-class classification as an example, first calculate the center point of the samples in the smallest class, then sort the samples of the second-smallest class from far to near by their Minkowski distance to that center point. Next calculate the center point of all samples in the smallest and second-smallest classes, and finally sort the samples of the largest class from far to near by their Minkowski distance to this center point. The sorted samples are then ready for diversity-preserving sampling.
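The sorting rule above can be sketched as follows. This is a minimal sketch under assumptions: the helper names `minkowski`, `centroid` and `sort_classes_by_distance` are hypothetical, samples are taken to be numeric feature vectors, the center point is taken to be the arithmetic mean, and the default order `p=2` is an arbitrary choice because the text does not fix it.

```python
def minkowski(a, b, p=2):
    """Standard Minkowski distance of order p between two feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def centroid(samples):
    """Arithmetic mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def sort_classes_by_distance(classes, p=2):
    """Sort every class except the smallest from far to near.

    `classes` lists the sample sets ordered from the smallest class to the
    largest. Each class is sorted by its distance to the center point of all
    samples in the classes that precede it.
    """
    ordered = [list(classes[0])]   # the smallest class is left as it is
    seen = list(classes[0])
    for samples in classes[1:]:
        center = centroid(seen)
        ordered.append(
            sorted(samples, key=lambda s: minkowski(s, center, p), reverse=True)
        )
        seen.extend(samples)       # the next center includes this class too
    return ordered
```

For three classes this computes the center of the smallest class, sorts the second-smallest class against it, then recomputes the center over both and sorts the largest class, matching the order of operations described above.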
The improved uniform sampling is then applied to each of the three sorted feature sample sets, as shown in Fig. 3. Taking three-class classification as an example, first divide the number of samples in the largest class (i.e. the first class) of the training set by the number of samples in the smallest class (i.e. the third class) and round up to determine the number of training subsets. Then divide every class other than the smallest evenly into that number of parts. From each of these classes take one part, and make up the shortfall relative to the sample count of the smallest class by drawing from the adjacent part. Finally combine equal numbers of samples from every class into one uniform subset, and generate all subsets in the same way.
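One way to read the division step is sketched below. It is an interpretation, not the patent's definitive procedure: the function name `ordered_uniform_subsets` is hypothetical, and "drawing from the adjacent part" is implemented here as taking a window of consecutive samples that starts at the beginning of the current part and runs into its neighbour (shifted back at the end of the list), which is an assumption.

```python
import math


def ordered_uniform_subsets(sorted_classes, minority):
    """Divide distance-sorted classes into balanced, ordered subsets.

    `sorted_classes` holds every class except the smallest, each already
    sorted from far to near; `minority` is the smallest class.
    """
    m = len(minority)
    largest = max(len(samples) for samples in sorted_classes)
    # Number of training subsets = ceil(largest class size / smallest class size).
    n_subsets = math.ceil(largest / m)
    subsets = []
    for i in range(n_subsets):
        subset = [list(minority)]
        for samples in sorted_classes:
            # Start of the i-th of n_subsets equal parts of this class.
            start = (i * len(samples)) // n_subsets
            # Take m consecutive samples: the part itself plus neighbours
            # from the adjacent part; shift back near the end of the list.
            start = min(start, len(samples) - m)
            subset.append(samples[start:start + m])
        subsets.append(subset)
    return subsets
```

Because each class stays in sorted order and the windows move along it, earlier subsets hold the samples farthest from the center point and later subsets the nearest ones, which is the source of the differences between subsets.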
At this point, the training subsets generated by the new sampling method not only have balanced positive and negative sample distributions but also differ from one another, so both the negative-sample accuracy and the overall accuracy of the classifier assembled from the sub-classifiers trained on them are improved. The sampling method applies to both binary and multi-class classification.
The feature subsets sampled from the three feature data sets in the previous step are then each trained with both the support vector machine and the BP neural network learning algorithms, yielding mutually independent sub-classifiers that number twice the feature subsets.
Finally, all of the mutually independent sub-classifiers are combined: a test fundus image is classified by each trained sub-classifier, the classification results are counted, and the class with the most votes is the final classification result.
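The voting step can be sketched as a simple plurality vote. This is a minimal sketch: the function name `majority_vote` is hypothetical, and each sub-classifier is modelled as any callable that maps a feature vector to a class label. How ties are resolved is not stated in the text; here the label that was voted for first wins, which is an assumption.

```python
from collections import Counter


def majority_vote(sub_classifiers, image_features):
    """Return the label predicted by the largest number of sub-classifiers."""
    votes = Counter(clf(image_features) for clf in sub_classifiers)
    # most_common(1) yields the (label, count) pair with the highest count.
    return votes.most_common(1)[0][0]
```

In practice each callable would wrap a trained SVM or BP neural network together with the feature extraction it was trained on.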
The method and system apply not only to fundus image classification but also to the classification of other imbalanced medical images.
It should be made clear that the invention is not limited to the particular configurations and processes described above and shown in the figures. For brevity, detailed descriptions of known methods and techniques are omitted here. In the embodiments above, several specific steps have been described and shown as examples. The method of the invention is not, however, limited to the specific steps described and shown; those skilled in the art, having grasped the spirit of the invention, may make various changes, modifications and additions, or change the order of the steps.
The functional blocks shown in the block diagrams described above can be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, and so on. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments can be stored in a machine-readable medium, or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" can include any medium that can store or transmit information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber-optic media, radio frequency (RF) links, and so on. Code segments can be downloaded via computer networks such as the Internet or an intranet.
The invention can be embodied in other specific forms without departing from its spirit and essential characteristics. For example, the algorithms described in the particular embodiments can be modified while the system architecture stays within the essential spirit of the invention. The present embodiments are therefore to be regarded in all respects as illustrative rather than restrictive; the scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are embraced within that scope.
Claims (10)
1. a kind of sorting technique based on unbalanced medical image data sets, it is characterised in that include:
Extract original medical image green channel component;
The gray level image extracted using histogram equalization amendment;
Respectively from revised image zooming-out textural characteristics, wavelet character, the auxiliary feature of wheel;
To the feature samples that extract by sample separation from sequence;
Uniform characteristics subset is divided to the sample after sequence, and ensures the otherness between subset;
Character subset is respectively trained using SVM algorithm and BP neural network algorithm and produces sub-classifier;
Combination sub-classifier, ballot draw final classification result.
2. The classification method based on an imbalanced medical image data set according to claim 1, characterized in that the sorting by inter-sample distance proceeds as follows: first, the center point of the samples in the smallest (minority) class is calculated; next, each sample in the second-smallest class is sorted from far to near according to its Minkowski distance to the minority-class center point; then the center point of all samples in the minority class and the second-smallest class is calculated; finally, each sample in the majority class is sorted from far to near according to its Minkowski distance to this center point. Classification with more classes proceeds by analogy.
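The distance-sorting procedure of claim 2 can be sketched as follows, under stated assumptions: samples are plain lists of numeric features, classes are held in a dictionary keyed by label, and the Minkowski order `p` defaults to 2 (Euclidean). The names `centroid`, `minkowski`, and `sort_classes_by_distance` are hypothetical and do not come from the patent.

```python
def centroid(samples):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def sort_classes_by_distance(classes, p=2):
    """Sort each non-smallest class from far to near.

    classes maps a label to its list of feature vectors. Classes are visited
    from smallest to largest; each class is sorted by distance to the center
    of all samples in the smaller classes already visited.
    """
    order = sorted(classes, key=lambda label: len(classes[label]))
    smallest = order[0]
    result = {smallest: list(classes[smallest])}  # smallest class is left as is
    pooled = list(classes[smallest])
    for label in order[1:]:
        center = centroid(pooled)
        result[label] = sorted(
            classes[label],
            key=lambda sample: minkowski(sample, center, p),
            reverse=True,  # far to near
        )
        pooled.extend(classes[label])
    return result


# Hypothetical three-class example.
classes = {
    "a": [[0, 0], [2, 0]],
    "b": [[1, 1], [1, 5], [1, 3]],
    "c": [[10, 0], [1, 2], [5, 0], [0, 0]],
}
ordered = sort_classes_by_distance(classes)
```

With three classes this reproduces the sequence in the claim: the second-smallest class is ordered around the minority center, and the majority class around the joint center of the two smaller classes.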
3. The classification method based on an imbalanced medical image data set according to claim 1, characterized in that the division into uniform feature subsets proceeds as follows: the number of samples in the largest class of the training set is divided by the number of samples in the smallest class and the result is rounded up, which determines the number of training subsets; each class other than the smallest class is then divided evenly by the number of subsets; each of these classes then contributes one part of its division, and the shortfall relative to the sample count of the smallest class is made up by drawing samples from the part adjacent to that part; finally, equal numbers of samples from each class are gathered to form a uniform subset, and all uniform subsets are generated by analogy.
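One possible reading of the subset division in claim 3 is sketched below. It assumes each class has already been sorted, that the smallest class is reused in full in every subset, and that "drawing from the adjacent part" can be modeled as a contiguous window of the required length starting at the beginning of the current part (shifted back when it would run past the end of the class). The function name `uniform_subsets` and this windowing rule are assumptions made for illustration.

```python
def uniform_subsets(sorted_classes):
    """Split sorted classes into balanced subsets.

    sorted_classes maps a label to its (already sorted) list of samples.
    Returns a list of dicts, one per subset, each holding the same number of
    samples for every class.
    """
    sizes = {label: len(samples) for label, samples in sorted_classes.items()}
    smallest = min(sizes, key=sizes.get)
    n_min = sizes[smallest]
    n_max = max(sizes.values())
    k = (n_max + n_min - 1) // n_min  # ceil(n_max / n_min) without floats

    subsets = []
    for i in range(k):
        subset = {}
        for label, samples in sorted_classes.items():
            if label == smallest:
                subset[label] = list(samples)  # whole smallest class, every time
                continue
            m = len(samples)
            start = (i * m) // k       # start of the i-th of k even parts
            if start + n_min > m:      # window would overrun: borrow backwards
                start = m - n_min
            subset[label] = samples[start:start + n_min]
        subsets.append(subset)
    return subsets


# Hypothetical class sizes 3, 5 and 7 give ceil(7 / 3) = 3 subsets.
example = {
    "min": [1, 2, 3],
    "mid": [10, 11, 12, 13, 14],
    "maj": [100, 101, 102, 103, 104, 105, 106],
}
parts = uniform_subsets(example)
```

Because each even part holds at most as many samples as the smallest class, every window has to borrow from a neighboring part, which matches the top-up step described in the claim.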
4. The classification method based on an imbalanced medical image data set according to claim 1, characterized in that ensuring diversity among the subsets means that, because the data set is ordered after distance sorting, the subsets obtained by dividing it according to the number of training subsets are likewise ordered and therefore differ from one another, i.e., their distances run from far to near.
5. The classification method based on an imbalanced medical image data set according to claim 1, characterized in that training on each feature subset with the SVM algorithm and the BP neural network algorithm to produce sub-classifiers means that each divided feature subset is given to both the SVM algorithm and the BP neural network algorithm for training, generating twice as many sub-classifiers as there are feature subsets.
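The structure of the ensemble in claims 1 and 5 (two learning algorithms applied to every subset, followed by voting) can be sketched as below. To stay self-contained, the sketch does not implement SVM or BP neural network training; it substitutes two hypothetical nearest-centroid trainers (`make_centroid_trainer` with different distance orders) as stand-ins, so only the "twice as many sub-classifiers" bookkeeping and the majority vote are illustrated. All names here are assumptions.

```python
from collections import Counter


def train_ensemble(subsets, trainers):
    """Train every trainer on every subset.

    Returns len(subsets) * len(trainers) sub-classifiers; with two trainers
    this is twice the number of subsets.
    """
    return [trainer(subset) for subset in subsets for trainer in trainers]


def vote(classifiers, sample):
    """Majority vote over sub-classifier predictions (ties: first label seen)."""
    votes = Counter(classifier(sample) for classifier in classifiers)
    return votes.most_common(1)[0][0]


def make_centroid_trainer(p):
    """Stand-in trainer (NOT SVM or BP): nearest class centroid, order-p distance."""
    def train(subset):
        centers = {
            label: [sum(column) / len(samples) for column in zip(*samples)]
            for label, samples in subset.items()
        }

        def predict(x):
            return min(
                centers,
                key=lambda label: sum(
                    abs(a - b) ** p for a, b in zip(x, centers[label])
                ),
            )

        return predict
    return train


# Hypothetical balanced subsets with two classes each.
subsets = [
    {"neg": [[0.0, 0.0], [1.0, 0.0]], "pos": [[5.0, 5.0], [6.0, 5.0]]},
    {"neg": [[0.0, 1.0], [1.0, 1.0]], "pos": [[5.0, 6.0], [6.0, 6.0]]},
]
trainers = [make_centroid_trainer(1), make_centroid_trainer(2)]
ensemble = train_ensemble(subsets, trainers)
label = vote(ensemble, [0.5, 0.5])
```

Replacing the two stand-in trainers with functions that fit an SVM and a BP network on a subset would leave `train_ensemble` and `vote` unchanged.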
6. A classification system based on an imbalanced medical image data set, characterized by comprising:
A green channel component extraction device, configured to extract the green channel component of the original medical image;
A histogram equalization device, configured to correct the extracted grayscale image using histogram equalization;
A feature extraction device, configured to extract texture features, wavelet features, and wheel-auxiliary features from the corrected image;
A sample sorting device, configured to sort the extracted feature samples by inter-sample distance;
A uniform sampling device, configured to divide the sorted samples into uniform feature subsets and to ensure diversity among the subsets;
A sub-classifier training device, configured to train on each feature subset with the SVM algorithm and the BP neural network algorithm to produce sub-classifiers;
A result voting device, configured to combine the sub-classifiers and vote to obtain the final classification result.
7. The classification system based on an imbalanced medical image data set according to claim 6, characterized in that the sample sorting device operates as follows, taking three-class classification as an example: first, the center point of the samples in the smallest (minority) class is calculated; next, each sample in the second-smallest class is sorted from far to near according to its Minkowski distance to the minority-class center point; then the center point of all samples in the minority class and the second-smallest class is calculated; finally, each sample in the majority class is sorted from far to near according to its Minkowski distance to this center point. Classification with more classes proceeds by analogy.
8. The classification system based on an imbalanced medical image data set according to claim 6, characterized in that the uniform sampling device operates as follows: the number of samples in the largest class of the training set is divided by the number of samples in the smallest class and the result is rounded up, which determines the number of training subsets; each class other than the smallest class is then divided evenly by the number of subsets; each of these classes then contributes one part of its division, and the shortfall relative to the sample count of the smallest class is made up by drawing samples from the part adjacent to that part; finally, equal numbers of samples from each class are gathered to form a uniform subset, and all uniform subsets are generated by analogy.
9. The classification system based on an imbalanced medical image data set according to claim 6, characterized in that ensuring diversity among the subsets means that, because the data set is ordered after distance sorting, the subsets obtained by dividing it according to the number of training subsets are likewise ordered and therefore differ from one another, i.e., their distances run from far to near.
10. The classification system based on an imbalanced medical image data set according to claim 6, characterized in that the sub-classifier training device gives each divided feature subset to both the SVM algorithm and the BP neural network algorithm for training, generating twice as many sub-classifiers as there are feature subsets.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610997896.4A CN106529598B (en) | 2016-11-11 | 2016-11-11 | Method and system for classifying medical image data sets based on imbalance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106529598A (en) | 2017-03-22 |
CN106529598B (en) | 2020-05-08 |
Family ID: 58351504
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610997896.4A (Active; published as CN106529598B (en)) | Method and system for classifying medical image data sets based on imbalance | 2016-11-11 | 2016-11-11 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106529598B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101989289A (en) * | 2009-08-06 | 2011-03-23 | 富士通株式会社 | Data clustering method and device |
CN104091073A (en) * | 2014-07-11 | 2014-10-08 | 中国人民解放军国防科学技术大学 | Sampling method for unbalanced transaction data of fictitious assets |
CN104809226A (en) * | 2015-05-07 | 2015-07-29 | 武汉大学 | Method for early classifying imbalance multi-variable time sequence data |
CN105760889A (en) * | 2016-03-01 | 2016-07-13 | 中国科学技术大学 | Efficient imbalanced data set classification method |
CN106056130A (en) * | 2016-05-18 | 2016-10-26 | 天津大学 | Combined downsampling linear discrimination classification method for unbalanced data sets |
Non-Patent Citations (3)
Title |
---|
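JI-JIANG YANG et al.: "Exploiting ensemble learning for automatic cataract detection and grading", Computer Methods and Programs in Biomedicine * |
胡志军 (Hu Zhijun) et al.: "Fast support vector machine classification algorithm based on distance sorting", Computer Applications and Software * |
陈红波 (Chen Hongbo): "Research on crop leaf disease recognition based on multi-classifier selective ensemble", China Master's Theses Full-text Database, Information Science and Technology * |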
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108230322A (en) * | 2018-01-28 | 2018-06-29 | 浙江大学 | A kind of eyeground feature detection device based on weak sample labeling |
CN108230322B (en) * | 2018-01-28 | 2021-11-09 | 浙江大学 | Eye ground characteristic detection device based on weak sample mark |
CN108846405A (en) * | 2018-04-11 | 2018-11-20 | 东莞迪赛软件技术有限公司 | Uneven medical insurance data classification method based on SSGAN |
CN111758105A (en) * | 2018-05-18 | 2020-10-09 | 谷歌有限责任公司 | Learning data enhancement strategy |
CN108805091A (en) * | 2018-06-15 | 2018-11-13 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating model |
CN108805091B (en) * | 2018-06-15 | 2021-08-10 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating a model |
CN111046891A (en) * | 2018-10-11 | 2020-04-21 | 杭州海康威视数字技术股份有限公司 | Training method of license plate recognition model, and license plate recognition method and device |
CN110069997A (en) * | 2019-03-22 | 2019-07-30 | 北京字节跳动网络技术有限公司 | Scene classification method, device and electronic equipment |
CN110069997B (en) * | 2019-03-22 | 2021-07-20 | 北京字节跳动网络技术有限公司 | Scene classification method and device and electronic equipment |
CN110704662A (en) * | 2019-10-17 | 2020-01-17 | 广东工业大学 | Image classification method and system |
CN112138394A (en) * | 2020-10-16 | 2020-12-29 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
CN112491797A (en) * | 2020-10-28 | 2021-03-12 | 北京工业大学 | Intrusion detection method and system based on unbalanced industrial control data set |
Also Published As
Publication number | Publication date |
---|---|
CN106529598B (en) | 2020-05-08 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |