CN107169506A - Random classification method and device based on a combined classifier - Google Patents

Random classification method and device based on a combined classifier

Info

Publication number
CN107169506A
CN107169506A (application CN201710244805.4A)
Authority
CN
China
Prior art keywords
classifier
combined
combined classifier
new
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710244805.4A
Other languages
Chinese (zh)
Inventor
何为舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weimeng Chuangke Network Technology China Co Ltd
Original Assignee
Weimeng Chuangke Network Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weimeng Chuangke Network Technology China Co Ltd filed Critical Weimeng Chuangke Network Technology China Co Ltd
Priority to CN201710244805.4A priority Critical patent/CN107169506A/en
Publication of CN107169506A publication Critical patent/CN107169506A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of classification computing, and in particular to a random classification method based on a combined classifier, comprising: randomly selecting N classifiers of not entirely identical types to form a combined classifier; selecting a training set and a test set for each randomly selected classifier; training and testing each classifier separately to obtain the average accuracy of the combined classifier; judging, according to the average accuracy of the combined classifier, whether to trigger an elimination mechanism; entering a classification calculation step based on the judgment result to obtain the classification result of each classifier; and voting on the classification results of the classifiers to obtain the final classification result. The present invention can reduce over-fitting and under-fitting, supports both discrete and continuous variables, and can overcome concept drift.

Description

Random classification method and device based on a combined classifier
Technical field
The present invention relates to the field of classification computing, and in particular to a random classification method and device based on a combined classifier.
Background art
Classification has long been one of the hot topics studied by both academia and industry. A fast and accurate classifier can create great value for an enterprise. For example, by classifying customers accurately, advertisements matching each customer's interests can be pushed in a targeted way, greatly increasing advertising revenue.
The simplest and most direct approach to classification is rule-based strategies. A rule-based strategy distills human experience into explicit rules, which are then applied to solve the actual problem. For example, customers can be divided by age and sex into those interested in cars and those not, such as males aged 20-50 and females aged 25-45, and car advertisements can then be pushed to the interested group. The advantage of rule-based strategies is that they are simple to implement and easy to understand; by continually adjusting them with human experience, quite sophisticated schemes can be built up.
However, the drawback of rule-based strategies is equally obvious: they depend heavily on human judgment. All rules are summarized from human experience; although implementation is not difficult, a large amount of manual study is required. Taking the advertising example again: when only age and sex are considered, drafting rules is relatively easy. But the factors influencing a person's interests include, besides age and sex, environment, education, life experience and other complex factors, which are difficult to distill into rules from experience. In such cases, rule-based classification can hardly achieve good results.
Another popular approach to classification is machine learning. Through extensive research and analysis, many low-dimensional machine learning algorithms have been designed, drawing on probability theory, statistics, approximation theory, convex analysis, computational complexity theory and other disciplines. Classical classification algorithms include logistic regression, support vector machines (SVM), decision trees, and naive Bayes. These algorithms are regarded as low-dimensional because they all classify according to some probabilistic or statistical theory. They are relatively easy to implement, place low demands on computing performance, and are highly practical.
A random forest is a classifier that trains many trees on samples and combines their predictions. Simply put, a random forest is composed of many CARTs (Classification And Regression Trees). Its randomness shows in two aspects: (1) when training each tree, a data set of the same size N, possibly containing repeated samples, is drawn from the full training set of N samples (i.e. bootstrap sampling); (2) at each node, a random subset of all features is selected for computing the optimal split. For each tree, the training set is sampled with replacement from the total training set, which means some samples of the total training set may appear repeatedly in one tree's training set while others may not appear at all. When training each tree's nodes, a certain proportion of features is drawn at random without replacement from all features.
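The two sources of randomness described above can be sketched in a few lines of Python. This is a minimal illustration of bootstrap sampling and per-node feature subsetting, not the patent's implementation; the feature names are invented for the example:

```python
import random

def bootstrap_sample(data, rng):
    """Draw a bootstrap sample: same size N as the original
    training set, sampled with replacement, so some records may
    repeat while others are left out entirely."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def random_feature_subset(all_features, k, rng):
    """At each tree node, pick k candidate features at random,
    without replacement, from the full feature list."""
    return rng.sample(all_features, k)

rng = random.Random(0)
train = list(range(10))                      # 10 toy sample ids
sample = bootstrap_sample(train, rng)        # same size, drawn with replacement
features = ["age", "sex", "region", "income"]
subset = random_feature_subset(features, 2, rng)
print(len(sample), sorted(subset))
```

In a real random forest these two steps are repeated for every tree and every split node respectively.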
A random forest can handle very high-dimensional data (many features) without feature selection. After training, it can report which features are important. Because an unbiased estimate of the generalization error is used while the forest is built, the model generalizes well. Training is fast and easily parallelized, and interactions between features can be detected during training.
Although the above classification algorithms have all proven effective in practice, they generally suffer from the following limitations:
1. Over-fitting and under-fitting. If the data are seriously under-used, the fitted result falls short of expectations or the training set cannot even be fitted effectively, and prediction accuracy and recall are far below the theoretical best fit, the model is under-fitted. If too much is considered, beyond the generally meaningful dimensions of the independent variables, noise is fitted as well: prediction on the training set is good but prediction on the test set is poor, the hallmark of over-fitting. Low-dimensional classification algorithms, because the dimensions they consider are limited, are sensitive to noise, and their parameters must be re-tuned for each training set, so over-fitting or under-fitting occurs easily.
2. Discrete versus continuous variables. For a set of training data, variables can be divided by their distribution into discrete and continuous ones. A variable that can only take a few fixed values, such as sex, is discrete; a variable that can take any value within a range, such as age, is continuous. A given low-dimensional algorithm often supports only one kind: decision trees support only discrete variables, while most linear algorithms (logistic regression, support vector machines, etc.) support continuous variables. Although discrete and continuous variables can be converted into one another by certain rules, the converted results are often unsatisfactory and the conversion is complicated.
3. Single characteristics. Different classification algorithms were designed with different intentions and perform differently in different scenarios. In real applications, the characteristics and distribution of the training set vary, and may also change over time, i.e. concept drift. A single classification algorithm often cannot cope with such changes, and its effectiveness suffers considerably.
In addition, these algorithms by design require repeated access to the training set data, and therefore cannot support the processing of streaming data.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the shortcomings of the prior art by providing a random classification method and device based on a combined classifier. The method can overcome the limitations of existing classifiers, reduce over-fitting and under-fitting, support both discrete and continuous variables, and overcome concept drift.
To achieve the above technical purpose, the random classification method based on a combined classifier of the present invention is characterized in that the method comprises: randomly selecting N classifiers of not entirely identical types to form a combined classifier; selecting a training set and a test set for each randomly selected classifier; training and testing each classifier separately to obtain the average accuracy of the combined classifier; judging, according to the average accuracy of the combined classifier, whether to trigger an elimination mechanism; entering a classification calculation step based on the judgment result to obtain the classification result of each classifier; and voting on the classification results of the classifiers to obtain the final classification result.
Further, said selecting a training set and a test set for each randomly selected classifier comprises: dividing each data set under the M features into a labeled (already-classified) data set and a data set to be classified; for each classifier, randomly selecting m of the M features as that classifier's selected features, where m < M and mN ≥ M; taking i% of the labeled data set of the selected features as the corresponding classifier's training set, with i < 50; and taking the remaining (1-i%) of the labeled data set of the selected features as the corresponding classifier's test set. The classification calculation step specifically comprises: sending the to-be-classified data set of the M features into each classifier of the combined classifier for classification calculation.
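The per-classifier feature selection and train/test split above can be sketched as follows. This is a minimal reading of the scheme, not the patent's implementation; the feature names and row ids are invented for the example:

```python
import random

def assign_features_and_split(labeled_rows, all_features, m, i_pct, rng):
    """For one classifier: randomly pick m of the M features, then
    split the labeled data set into i% training rows and (1-i)% test
    rows. With i < 50, most labeled data is held out for testing."""
    selected = rng.sample(all_features, m)
    rows = labeled_rows[:]
    rng.shuffle(rows)
    cut = len(rows) * i_pct // 100
    return selected, rows[:cut], rows[cut:]

rng = random.Random(42)
features = [f"f{j}" for j in range(8)]     # M = 8 hypothetical features
rows = list(range(100))                    # 100 labeled rows
sel, train, test = assign_features_and_split(rows, features, m=3, i_pct=33, rng=rng)
print(len(sel), len(train), len(test))     # 3 33 67
```

Giving each of the N classifiers its own random feature subset and its own small training slice is what makes the combination diverse.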
Further, said training and testing each classifier to obtain the average accuracy of the combined classifier specifically comprises: sending the training set into the corresponding classifier for training to obtain the corresponding classification model, the classification models together constituting the combined classification model of the combined classifier; sending the test set into the corresponding classification model for testing to obtain the accuracy of each classifier; and calculating the average accuracy of the combined classifier from the accuracies of the individual classifiers.
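The testing-and-averaging step can be sketched as follows; this is a toy illustration, not the patent's implementation, and ThresholdModel is an invented stand-in for any trained classification model:

```python
def average_accuracy(models, test_sets):
    """Test each trained classification model on its own test set and
    average the per-classifier accuracies to get the combined
    classifier's average accuracy. Models are assumed to expose
    a predict(x) method."""
    accs = []
    for model, tests in zip(models, test_sets):
        correct = sum(1 for x, label in tests if model.predict(x) == label)
        accs.append(correct / len(tests))
    return sum(accs) / len(accs), accs

class ThresholdModel:
    """Toy classifier: predicts class 1 iff x >= t."""
    def __init__(self, t):
        self.t = t
    def predict(self, x):
        return 1 if x >= self.t else 0

tests = [[(0, 0), (1, 0), (2, 1), (3, 1)]] * 2
models = [ThresholdModel(2), ThresholdModel(3)]
avg, accs = average_accuracy(models, tests)
print(accs, avg)   # [1.0, 0.75] 0.875
```

The per-classifier accuracies are kept alongside the average because the elimination mechanism later ranks classifiers by them.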
Still further, said judging according to the average accuracy of the combined classifier whether to trigger the elimination mechanism specifically comprises: triggering the elimination mechanism when the average accuracy of the combined classifier is less than or equal to x, and not triggering it when the average accuracy is greater than x. Said entering the classification calculation step based on the judgment result specifically comprises: after the elimination mechanism is triggered, returning to the labeled data set, generating a new combined classification model, and testing the new combined classification model, until the average accuracy of the new combined classifier exceeds x, then entering the classification calculation step; when the elimination mechanism is not triggered, entering the classification calculation step directly. The classification calculation step specifically comprises: sending the to-be-classified data set of the M features into each classification model of the combined classifier for classification calculation, obtaining N classification results. The elimination mechanism specifically comprises: sorting the classification models in the current combined classifier from high to low by the accuracy of the corresponding classifier; deleting in turn the n classification models with the lowest accuracy; and randomly supplementing one new classifier each time a classification model is deleted. Said returning to the labeled data set specifically comprises: randomly selecting m features for each new classifier as its selected features; taking i% of the labeled data set under the selected features as the new classifier's training set and (1-i%) as its test set, with i < 50. Said generating a new combined classification model comprises: training the new classifiers to obtain new classification models, and combining the new classification models with the classification models of the classifiers that were not eliminated into a new combined classification model.
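The elimination mechanism itself, sort by accuracy, drop the n worst, and refill with randomly generated newcomers so the total count N stays constant, can be sketched as follows. This is a minimal reading, not the patent's implementation; make_new is an assumed factory for a freshly trained classifier:

```python
import random

def eliminate_and_replenish(classifiers, accuracies, n, make_new, rng):
    """Sort classifiers by accuracy (high to low), keep all but the
    n worst, and randomly generate one new classifier per deletion
    so the total number of classifiers stays constant."""
    order = sorted(range(len(classifiers)),
                   key=lambda i: accuracies[i], reverse=True)
    survivors = [classifiers[i] for i in order[:len(classifiers) - n]]
    newcomers = [make_new(rng) for _ in range(n)]
    return survivors + newcomers

rng = random.Random(1)
pool = ["svm", "tree", "logreg", "bayes"]
accs = [0.9, 0.55, 0.8, 0.6]
new_pool = eliminate_and_replenish(pool, accs, n=1,
                                   make_new=lambda r: "new", rng=rng)
print(new_pool)   # ['svm', 'logreg', 'bayes', 'new']
```

In the method this loop repeats, retraining and retesting the refreshed combination, until the average accuracy exceeds the threshold x.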
In the above technical scheme, the data of the data sets under the M features are streaming data. When a classifier in the combined classifier does not support streaming processing, a training queue is defined; the streaming data continuously enter the queue, the classifier starts training once the queue is filled, and once more than half of the streaming data in the queue have been replaced, the classifier is retrained.
The random classification device based on a combined classifier of the present invention is characterized in that the device comprises a combined classifier and a classification system. The combined classifier is composed of N classifiers of not entirely identical types. The classification system comprises: a data processing unit for selecting a training set and a test set for each classifier; a training-and-testing unit for training and testing each classifier separately and obtaining the average accuracy of the combined classifier; an elimination unit for judging, according to the average accuracy of the combined classifier, whether to trigger the elimination mechanism, and executing the elimination mechanism based on the judgment result; a classification calculation unit for calculating the classification result of each classifier; and a voting unit for voting on the classification results of the classifiers and obtaining the final classification result.
Further, the data processing unit comprises a feature module and a data module. The feature module is used to randomly select, for each classifier, m of the M features as its selected features, where m < M and mN ≥ M. The data module is used to divide each data set under the M features into a labeled data set and a data set to be classified, to take i% of the labeled data set of the selected features as the corresponding classifier's training set, and to take the remaining (1-i%) as the corresponding classifier's test set.
Further, the training-and-testing unit comprises a training module and a test module. The training module is used to send the training set into the corresponding classifier for training, obtaining the corresponding classification model, the classification models together constituting the combined classification model. The test module is used to send the test set into the corresponding classification model for testing, obtaining the accuracy of each classifier, and to obtain the average accuracy of the combined classifier from those accuracies. The classification calculation unit is specifically used to send the to-be-classified data set of the M features into each classifier of the combined classifier for classification calculation.
Still further, the elimination unit comprises an elimination judgment module and an elimination execution module. The elimination judgment module judges whether the average accuracy of the combined classifier is less than or equal to x; if so, the elimination mechanism is triggered; if the average accuracy exceeds x, the classification calculation unit is triggered to calculate the classification result of each classifier. The elimination execution module executes the elimination mechanism, triggers the data processing unit and the training-and-testing unit to generate a new combined classification model, tests the new combined classification model, and sends the average accuracy of the new combined classifier to the elimination judgment module. The elimination mechanism comprises: sorting the classification models in the current combined classifier from high to low by the accuracy of the corresponding classifier; deleting in turn the n classification models with the lowest accuracy; and randomly supplementing one new classifier each time a classification model is deleted. The new combined classification model is composed of the new classification models and the classification models of the classifiers that were not eliminated. Each new classification model is obtained after the data processing unit randomly selects m features for the new classifier as its selected features and takes i% of the labeled data set under the selected features as the new classifier's training set and (1-i%) as its test set, and the new classifier is then trained.
In the above technical scheme, when a classifier in the combined classifier does not support streaming data, the training unit includes a training queue for feeding in and replacing streaming data; after more than half of the streaming data in the training queue have been replaced, the classifier is retrained.
The present invention proposes a random classification method based on a combined classifier, which can solve common classification problems in practice efficiently and generically. Different classification algorithms are brought in simultaneously and assigned different training sets and training features. The algorithms compensate for one another's weaknesses, overcoming the defects of existing classifiers and combining the strengths of all kinds of algorithms, thereby achieving excellent versatility.
Normally, i% of the labeled data set is chosen as the training set and the remaining (1-i%) as the test set, with i < 50. The reason is that a training set usually contains a certain amount of noisy data, and in many classification algorithms a small amount of noise can strongly affect the training result. The present invention therefore trains on only a small portion of the labeled data, which largely excludes noisy data and thus safeguards the training effect. Even if some classifiers are unluckily assigned noisier data, they will be eliminated in the later selection process.
If there are M features in total, m features can be randomly assigned to each classifier for training, with m < M and mN ≥ M. In machine learning there are often hundreds or thousands of possibly relevant features, but it cannot be determined in advance which features or which combination of features is most significant. Randomness solves this problem directly: by continually trying possible combinations, keeping the classifiers that train well and eliminating those that train poorly, the most valuable features are eventually found. The number of features per classifier is kept relatively low precisely to prevent too many irrelevant variables from being introduced and the number of required trials from becoming too large.
In the present invention an elimination mechanism is introduced, and the accuracy of each classifier is monitored in real time or periodically. If the average accuracy of all classifiers falls below a certain threshold, the n classifiers with the lowest accuracy are deleted, new classifiers are generated at random to keep the total number of classifiers constant, and the accuracies are monitored again, which keeps the accuracy of the combined classifier above the satisfactory value at all times. Through continual elimination and generation, a survival-of-the-fittest mechanism arises among the classifiers: the better-adapted classifiers are retained and their proportion keeps growing, while the poorly-adapted ones are eliminated and their proportion shrinks or even vanishes from the combination. In this way the whole classification algorithm automatically selects the more accurate, more effective classifiers. This elimination mechanism therefore effectively reduces the impact of concept drift.
Because of the randomness, each classifier has different features and a different model, so a single classifier is often especially accurate only in some respect and relatively weak in others. The voting mechanism, however, brings out each classifier's speciality. For example, suppose there are three classifiers: classifier A recognizes class I with 99% accuracy but class II with only 60%, while the other two classifiers B and C recognize class I with only 50% accuracy. The voting outcome is then: for data of class I, B and C are guessing at random, so their votes balance out, while A makes an accurate judgment, so the votes for class I prevail. The final voting result is therefore, in effect, led by the classifier with the highest classification accuracy.
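The voting step reduces to a simple majority count over the N classification results; a minimal sketch, with the three-classifier example above encoded directly:

```python
from collections import Counter

def vote(predictions):
    """Majority vote over the classifiers' results: the most common
    class label among the N predictions wins."""
    return Counter(predictions).most_common(1)[0][0]

# Three classifiers as in the example: on a class-I sample, A votes
# accurately while B and C effectively guess; say B happens to agree
# with A and C does not, so A's reliable vote carries the majority.
print(vote(["I", "I", "II"]))   # I
```

Over many samples the random guesses of weak classifiers cancel out, so the specialist classifier's consistent votes dominate the tally.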
Besides solving the prior-art problems above, the present invention can also be applied to streaming data. Classifiers that themselves support streaming processing, such as Hoeffding trees, only need the training data to be fed in directly according to their training method. For classifiers that do not support streaming processing, such as logistic regression, a training queue can be defined: training data are continually filled into the queue, and training starts once the queue is full. New training data are then continually inserted and old training data leave the queue; once half of the data have been replaced, training starts again. In this way batch classifiers can also be used in this algorithm, and the replacement of training data gives them a certain adaptability to concept drift.
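The training-queue idea can be sketched as follows. This is one possible reading of the scheme, not the patent's implementation; the exact "half replaced" threshold and the train_fn callback are assumptions for the example:

```python
from collections import deque

class TrainingQueue:
    """Training queue for batch-only classifiers on streaming data:
    train once the queue first fills, then retrain whenever more
    than half of its contents have been replaced."""
    def __init__(self, capacity, train_fn):
        self.q = deque(maxlen=capacity)   # oldest sample evicted on overflow
        self.capacity = capacity
        self.train_fn = train_fn          # assumed batch-training callback
        self.replaced = 0
        self.trained = False

    def push(self, sample):
        if self.trained and len(self.q) == self.capacity:
            self.replaced += 1            # an old sample is about to be evicted
        self.q.append(sample)
        if not self.trained and len(self.q) == self.capacity:
            self.train_fn(list(self.q))   # first training: queue just filled
            self.trained = True
        elif self.trained and self.replaced > self.capacity // 2:
            self.train_fn(list(self.q))   # retrain: over half the data is new
            self.replaced = 0

calls = []
tq = TrainingQueue(4, calls.append)
for x in range(11):
    tq.push(x)
print(len(calls))   # trains at x=3, retrains at x=6 and x=9 -> 3
```

Each retraining sees a window that is mostly fresh data, which is what gives batch classifiers their limited adaptability to drift.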
In actual tests, the present invention not only applies to streaming data but also shows strong robustness and accuracy; at the same time, over-fitting and under-fitting are effectively reduced, both discrete and continuous variables are supported, and concept drift can be overcome.
Brief description of the drawings
To explain the embodiments of the present invention or the technical schemes of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a schematic flow diagram of the present invention;
Fig. 2 is a structural diagram of the classification system in the present invention;
Fig. 3 is a step flow chart of an embodiment of the present invention;
Fig. 4 is a structural diagram of the data processing unit, the training-and-testing unit and the elimination unit in the present invention;
Fig. 5 is an architecture diagram of the present invention.
Detailed description of the embodiments
The technical schemes in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work belong to the protection scope of the present invention.
As shown in Fig. 1, Fig. 3 and Fig. 5, the random classification method based on a combined classifier comprises:
101. Randomly select N classifiers of not entirely identical types to form the combined classifier; the classifiers are mainstream classifiers used in machine learning.
102. Select a training set and a test set for each randomly selected classifier;
1021. Divide each data set under the M features into a labeled (already-classified) data set and a data set to be classified;
1022. For each classifier, randomly select m of the M features as that classifier's selected features; m, M and N satisfy mN ≥ 3M;
1023. Take i% of the labeled data set of the selected features as the corresponding classifier's training set, with i% = 1/3; take the remaining (1-i%) of the labeled data set of the selected features as the corresponding classifier's test set.
103. Train and test each classifier separately to obtain the average accuracy of the combined classifier;
1031. Send the training set into the corresponding classifier for training to obtain the corresponding classification model; the classification models constitute the combined classification model;
1032. Send the test set into the corresponding classification model for testing to obtain the accuracy of each classifier;
104. Judge, according to the average accuracy of the combined classifier, whether to trigger the elimination mechanism;
1041. Judge whether the average accuracy of the combined classifier is greater than x, with 70% ≤ x ≤ 80%;
1042. When the average accuracy of the combined classifier is less than or equal to x, trigger the elimination mechanism, return to the labeled data set, generate a new combined classification model, and test the new combined classification model, until the average accuracy of the new combined classifier is greater than x, then enter the classification calculation step;
Sort the classification models in the current combined classifier from high to low by the accuracy of the corresponding classifier; delete in turn the n classification models with the lowest accuracy, with n = 0.25N; each time a classification model is deleted, randomly supplement one new classifier; randomly select m features for the new classifier as its selected features; take i% of the labeled data set under the selected features as the new classifier's training set and (1-i%) as its test set; train the new classifier to obtain a new classification model; combine the new classification models and the classification models of the classifiers that were not eliminated into a new combined classification model; test the new combined classification model again, until the average accuracy of the combined classifier is greater than x;
When the average accuracy of the combined classifier is greater than x, do not trigger the elimination mechanism, and enter the classification calculation step directly.
105. Enter the classification calculation step based on the judgment result and obtain the classification result of each classifier;
Send the to-be-classified data set of the M features into each classifier of the combined classifier for classification calculation, obtaining N classification results.
106. Vote on the classification results of the classifiers to obtain the final classification result.
The data of the data sets under the M features are streaming data. When the combined classifier contains a classifier that does not support streaming processing, a training queue is defined on that classifier's training set; streaming data continuously enter the queue, the classifier starts training once the queue is filled, and after more than half of the streaming data in the queue have been replaced, the classifier is retrained.
As shown in Fig. 2, Fig. 4 and Fig. 5, the random classification device based on a combined classifier comprises a combined classifier and a classification system. The combined classifier is composed of N classifiers of not entirely identical types, the classifiers being mainstream classifiers in machine learning. The classification system comprises: a data processing unit 21, a training-and-testing unit 22, an elimination unit 23, a classification calculation unit 24 and a voting unit 25.
The data processing unit 21 comprises a feature module 211 and a data module 212. The feature module 211 is used to randomly select, for each classifier, m of the M features as its selected features; m, M and N satisfy mN ≥ 3M. The data module 212 is used to divide each data set under the M features into a labeled data set and a data set to be classified, to take i% of the labeled data set of the selected features as the corresponding classifier's training set, and to take the remaining (1-i%) as the corresponding classifier's test set, with i% = 1/3.
The training and testing unit 22 includes a training module 221 and a test module 222. The training module 221 is configured to deliver the training set to the corresponding classifier for training to obtain a corresponding classification model, the classification models together forming the assembled classification model. The test module 222 is configured to deliver the test set to the corresponding classification model for testing to obtain the accuracy of each classifier, the average accuracy of the assembled classifier being obtained from the accuracies of the classifiers.
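The averaging step performed by the test module is straightforward; a minimal sketch, with hypothetical names (`models` are callables standing in for trained classification models, each paired with its own test set):

```python
def average_accuracy(models, test_sets):
    """Test each classification model on its own test set and average the
    per-classifier accuracies into the assembled classifier's accuracy (sketch)."""
    accs = []
    for model, tests in zip(models, test_sets):
        correct = sum(1 for x, y in tests if model(x) == y)
        accs.append(correct / len(tests))
    return accs, sum(accs) / len(accs)
```

The per-classifier accuracies are kept alongside the average because the elimination mechanism later needs them to rank the models.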
The elimination unit 23 includes an elimination judgment module 231 and an elimination execution module 232. The elimination judgment module 231 is configured to judge whether the average accuracy of the assembled classifier is less than or equal to x, with 70% ≤ x ≤ 80%: when the average accuracy of the assembled classifier is less than or equal to x, the elimination mechanism is triggered; when the average accuracy of the assembled classifier is greater than x, the classification calculation unit is triggered. The elimination execution module 232 is configured to execute the elimination mechanism, to trigger the data processing unit and the training and testing unit to generate a new assembled classification model, to test the new assembled classification model, and to send the average accuracy of the new assembled classifier to the elimination judgment module.
The elimination mechanism includes: deleting in turn the n classification models with the lowest accuracy, n = 0.25N; each time a classification model is deleted, randomly supplementing one new classifier. The new assembled classification model is composed of the new classification models and the classification models of the classifiers that were not eliminated. Each new classification model is obtained by having the data processing unit randomly select m features for the new classifier as its selected features, taking i% of the classified data set under those selected features as the new classifier's training set and (1 - i%) as its test set, and then training the new classifier.
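One round of the elimination mechanism can be sketched as below. This is an illustrative sketch under simplifying assumptions: `make_new_classifier` is a hypothetical callback standing in for the random selection, feature drawing and retraining of a replacement classifier, which the patent delegates to the data processing and training units.

```python
def eliminate_and_replenish(models, accuracies, make_new_classifier):
    """Sort models by their classifier's accuracy, delete the n = 0.25*N least
    accurate, and add one new randomly chosen classifier per deletion (sketch)."""
    N = len(models)
    n = int(0.25 * N)
    ranked = sorted(zip(models, accuracies), key=lambda p: p[1], reverse=True)
    survivors = [m for m, _ in ranked[:N - n]]           # models that are not eliminated
    newcomers = [make_new_classifier() for _ in range(n)]  # one replacement per deletion
    return survivors + newcomers
```

The returned list would then be retested, and the round repeated until the average accuracy exceeds the threshold x.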
The classification calculation unit 24 is configured to deliver the to-be-classified data sets under the M features to each classifier of the assembled classifier for classification calculation, obtaining N classification results.
The voting unit 25 is configured to vote on the N classification results and obtain the final classification result.
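The voting step reduces to a majority vote over the N classification results; a minimal sketch (the function name is hypothetical):

```python
from collections import Counter

def majority_vote(results):
    """Return the most common of the N classifiers' results (sketch). For equal
    counts, Counter.most_common keeps first-encounter order, so ties are broken
    by the first classifier to produce that label."""
    return Counter(results).most_common(1)[0][0]
```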
When a classifier in the assembled classifier does not support streaming data, the training unit includes a training queue for feeding in and replacing streaming data; after more than half of the streaming data in the training queue has been replaced, the classifier is retrained.
It should be understood that the particular order or hierarchy of steps in the disclosed processes is an example of an exemplary approach. Based on design preferences, it should be understood that the particular order or hierarchy of steps may be rearranged without departing from the protection scope of the present disclosure. The appended method claims present the elements of the various steps in an exemplary order and are not intended to be limited to the particular order or hierarchy described.
In the above detailed description, various features are grouped together in a single embodiment to simplify the disclosure. This method of disclosure should not be construed as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the appended claims reflect, the present invention lies in less than all features of a single disclosed embodiment. Thus the appended claims are hereby incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the present invention.
The above description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments without departing from the spirit and scope of the disclosure. Thus, the present disclosure is not limited to the embodiments set forth herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methods for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art will recognize that further combinations and permutations of the embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the protection scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the specification or the claims, it is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or the claims is meant to denote a "non-exclusive or".
Those skilled in the art will further appreciate that the various illustrative logical blocks, units and steps listed in the embodiments of the present invention may be implemented by electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components, units and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints of the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the protection scope of the embodiments of the present invention.
The various illustrative logical blocks or units described in the embodiments of the present invention may be implemented or performed with a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor; in the alternative, the processor may be any conventional processor, controller, microcontroller or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other such configuration.
The steps of a method or algorithm described in the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. As an example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium. Computer-readable media include both computer storage media and communication media that facilitate transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures and that can be read by a general-purpose or special-purpose computer or processor. Also, any connection is properly termed a computer-readable medium; for example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio and microwave, these are also included within the definition of computer-readable medium. Disk and disc, as used herein, include compact disc, laser disc, optical disc, DVD, floppy disk and Blu-ray disc; disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The above embodiments further describe the objectives, technical solutions and beneficial effects of the present invention in detail. It should be understood that the foregoing are merely embodiments of the present invention and are not intended to limit its protection scope; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A random assortment method based on an assembled classifier, characterized in that the method comprises:
randomly selecting N classifiers of not entirely identical types as the assembled classifier;
selecting a training set and a test set for each randomly selected classifier;
training and testing each classifier respectively, and obtaining the average accuracy of the assembled classifier;
judging whether to trigger an elimination mechanism according to the average accuracy of the assembled classifier;
entering a classification calculation step based on the judgment result, and obtaining the classification result of each classifier;
voting on the classification results of the classifiers to obtain the final classification result.
2. The random assortment method based on an assembled classifier according to claim 1, characterized in that said selecting a training set and a test set for each randomly selected classifier comprises:
dividing each data set under M features into a classified data set and a to-be-classified data set;
for each classifier, randomly selecting m features out of the M features as the classifier's selected features, with m < M and mN ≥ M;
taking i% of the classified data set under the selected features as the training set of the corresponding classifier, with i < 50;
taking (1 - i%) of the classified data set under the selected features as the test set of the corresponding classifier;
and the classification calculation step specifically comprises:
delivering the to-be-classified data sets under the M features to each classifier of the assembled classifier for classification calculation.
3. The random assortment method based on an assembled classifier according to claim 2, characterized in that said training and testing each classifier to obtain the average accuracy of the assembled classifier specifically comprises:
delivering the training set to the corresponding classifier for training to obtain a corresponding classification model, the classification models forming the assembled classification model corresponding to the assembled classifier;
delivering the test set to the corresponding classification model for testing to obtain the accuracy of each classifier;
calculating the average accuracy of the assembled classifier according to the accuracies of the classifiers.
4. The random assortment method based on an assembled classifier according to claim 3, characterized in that said judging whether to trigger an elimination mechanism according to the average accuracy of the assembled classifier specifically comprises:
triggering the elimination mechanism when the average accuracy of the assembled classifier is less than or equal to x;
not triggering the elimination mechanism when the average accuracy of the assembled classifier is greater than x;
said entering a classification calculation step based on the judgment result specifically comprises:
after the elimination mechanism is triggered, returning to the data-set division, generating a new assembled classification model and testing the new assembled classification model, until the average accuracy of the new assembled classifier is greater than x, whereupon the classification calculation step is entered;
when the elimination mechanism is not triggered, entering the classification calculation step directly;
the classification calculation step specifically comprises: delivering the to-be-classified data sets under the M features to each classification model of the assembled classifier for classification calculation, obtaining N classification results;
the elimination mechanism specifically comprises:
sorting the classification models in the current assembled classifier from high to low by the accuracy of the corresponding classifier;
deleting in turn the n classification models with the lowest accuracy;
each time a classification model is deleted, randomly supplementing one new classifier;
said returning to the data-set division specifically comprises:
randomly selecting m features for the new classifier as its selected features;
taking i% of the classified data set under the selected features as the new classifier's training set and (1 - i%) as its test set, with i < 50;
said generating a new assembled classification model specifically comprises:
training the new classifier to obtain a new classification model;
forming the new assembled classification model from the new classification models and the classification models of the classifiers that were not eliminated.
5. The random assortment method based on an assembled classifier according to any one of claims 1 to 4, characterized in that the data of the data sets under the M features are streaming data;
when a classifier in the assembled classifier does not support streaming-data processing, a training queue is defined; the streaming data continuously enters the queue; the classifier starts training once the queue has been filled with streaming data; and after more than half of the streaming data in the training queue has been replaced, the classifier is retrained.
6. A random assortment device based on an assembled classifier, characterized in that the device comprises an assembled classifier and a classification system; the assembled classifier is composed of N classifiers of not entirely identical types; the classification system comprises:
a data processing unit, configured to select a training set and a test set for each classifier;
a training and testing unit, configured to train and test each classifier respectively and obtain the average accuracy of the assembled classifier;
an elimination unit, configured to judge whether to trigger an elimination mechanism according to the average accuracy of the assembled classifier, and to execute the elimination mechanism based on the judgment result;
a classification calculation unit, configured to calculate the classification result of each classifier;
a voting unit, configured to vote on the classification results of the classifiers and obtain the final classification result.
7. The random assortment device based on an assembled classifier according to claim 6, characterized in that the data processing unit comprises a feature module and a data module;
the feature module is configured to randomly select, for each classifier, m features out of the M features as the selected features, with m < M and mN ≥ M;
the data module is configured to divide each data set under the M features into a classified data set and a to-be-classified data set, to take i% of the classified data set under the selected features as the training set of the corresponding classifier, and to take (1 - i%) of the classified data set under the selected features as the test set of the corresponding classifier.
8. The random assortment device based on an assembled classifier according to claim 7, characterized in that the training and testing unit comprises a training module and a test module;
the training module is configured to deliver the training set to the corresponding classifier for training to obtain a corresponding classification model, the classification models forming the assembled classification model;
the test module is configured to deliver the test set to the corresponding classification model for testing to obtain the accuracy of each classifier, the average accuracy of the assembled classifier being obtained according to the accuracies of the classifiers;
the classification calculation unit is specifically configured to deliver the to-be-classified data sets under the M features to each classifier of the assembled classifier for classification calculation.
9. The random assortment device based on an assembled classifier according to claim 8, characterized in that the elimination unit comprises an elimination judgment module and an elimination execution module;
the elimination judgment module is configured to judge whether the average accuracy of the assembled classifier is less than or equal to x, to trigger the elimination mechanism when the average accuracy of the assembled classifier is less than or equal to x, and to trigger the classification calculation unit to calculate the classification result of each classifier when the average accuracy of the assembled classifier is greater than x;
the elimination execution module is configured to execute the elimination mechanism, to trigger the data processing unit and the training and testing unit to generate a new assembled classification model, to test the new assembled classification model, and to send the average accuracy of the new assembled classifier to the elimination judgment module;
the elimination mechanism comprises: sorting the classification models in the current assembled classifier from high to low by the accuracy of the corresponding classifier; deleting in turn the n classification models with the lowest accuracy; and, each time a classification model is deleted, randomly supplementing one new classifier;
the new assembled classification model is composed of the new classification models and the classification models of the classifiers that were not eliminated;
each new classification model is obtained by having the data processing unit randomly select m features for the new classifier as its selected features, taking i% of the classified data set under the selected features as the new classifier's training set and (1 - i%) as its test set, and then training the new classifier.
10. The random assortment device based on an assembled classifier according to any one of claims 6 to 9, characterized in that, when a classifier in the assembled classifier does not support streaming data, the training unit comprises a training queue configured to feed in and replace streaming data; after more than half of the streaming data in the training queue has been replaced, the classifier is retrained.
CN201710244805.4A 2017-04-14 2017-04-14 Random assortment method and device based on assembled classifier Pending CN107169506A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710244805.4A CN107169506A (en) 2017-04-14 2017-04-14 Random assortment method and device based on assembled classifier


Publications (1)

Publication Number Publication Date
CN107169506A true CN107169506A (en) 2017-09-15

Family

ID=59849736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710244805.4A Pending CN107169506A (en) 2017-04-14 2017-04-14 Random assortment method and device based on assembled classifier

Country Status (1)

Country Link
CN (1) CN107169506A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020168690A1 (en) * 2019-02-19 2020-08-27 深圳点猫科技有限公司 Ai implementation method for classification based on graphical programming tool, and electronic device
CN109936582A (en) * 2019-04-24 2019-06-25 第四范式(北京)技术有限公司 Construct the method and device based on the PU malicious traffic stream detection model learnt
CN110516758A (en) * 2019-09-02 2019-11-29 广东工业大学 A kind of alzheimer's disease classification prediction technique and system
CN111343205A (en) * 2020-05-19 2020-06-26 中国航空油料集团有限公司 Industrial control network security detection method and device, electronic equipment and storage medium
CN111343205B (en) * 2020-05-19 2020-09-01 中国航空油料集团有限公司 Industrial control network security detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107169506A (en) Random assortment method and device based on assembled classifier
CN112613552B (en) Convolutional neural network emotion image classification method combined with emotion type attention loss
CN111860638A (en) Parallel intrusion detection method and system based on unbalanced data deep belief network
CN103744928B (en) A kind of network video classification method based on history access record
CN105512916B (en) Method for delivering advertisement accurately and system
CN110764064A (en) Radar interference signal identification method based on deep convolutional neural network integration
CN107609147B (en) Method and system for automatically extracting features from log stream
CN102214320A (en) Neural network training method and junk mail filtering method using same
CN112784031B (en) Method and system for classifying customer service conversation texts based on small sample learning
CN111125429B (en) Video pushing method, device and computer readable storage medium
CN108388929A (en) Client segmentation method and device based on cost-sensitive and semisupervised classification
CN111310918B (en) Data processing method, device, computer equipment and storage medium
CN108509996A (en) Feature selection approach based on Filter and Wrapper selection algorithms
CN112241494A (en) Key information pushing method and device based on user behavior data
CN106095939A (en) The acquisition methods of account authority and device
Sidle et al. Using multi-class classification methods to predict baseball pitch types
CN114493680B (en) Fishery resource statistical method and system based on stream stab net investigation
CN106127226B (en) The flexible grain quality detection method of grain grain and grain grain test sample
CN108108912A (en) Method of discrimination, device, server and the storage medium of interactive low quality user
Wang et al. Changing lane probability estimating model based on neural network
CN113239199A (en) Credit classification method based on multi-party data set
CN116188834B (en) Full-slice image classification method and device based on self-adaptive training model
CN112070112B (en) Method and device for classifying crimes related to network, computer equipment and storage medium
Wu et al. An online-optimized incremental learning framework for video semantic classification
CN113807541B (en) Fairness repair method, system, equipment and storage medium for decision system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170915