CN102402713B - Machine learning method and device - Google Patents

Machine learning method and device

Info

Publication number
CN102402713B
CN102402713B (application CN201010280239.0A)
Authority
CN
China
Prior art keywords
classifier
seed
example set
set
utilize
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201010280239.0A
Other languages
Chinese (zh)
Other versions
CN102402713A (en)
Inventor
杨宇航
于浩
孟遥
陆应亮
夏迎炬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201010280239.0A
Publication of CN102402713A
Application granted
Publication of CN102402713B
Expired - Fee Related
Anticipated expiration

Links

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a machine learning method and device. The machine learning method comprises: automatically labeling an unlabeled data set with different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n >= 2; training n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively; for each seed set Si, i = 1, 2, ..., n, among the n automatically labeled seed sets, verifying the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and retraining the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.

Description

Machine learning method and device
Technical field
The present invention relates to the field of machine learning and, more specifically, to a fault-tolerant machine learning method and device.
Background technology
Machine learning studies how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning methods and devices are widely used in tasks in various fields, such as computer vision, natural language processing, and bioinformatics.
Machine learning can be divided into two large classes: supervised learning and unsupervised learning. Generally, an unsupervised learning method trains a classifier with an unlabeled data set. Fig. 1 shows a schematic flowchart of an unsupervised machine learning method in the prior art. In step S110, the unlabeled data set is randomly labeled to obtain a training set. In step S120, a classifier is trained with the training set. In step S130, the example set to be processed is predicted with the trained classifier. An unsupervised learning method requires no large investment of manpower to label the data set, but because the data set is unlabeled, its performance may be far from ideal.
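For concreteness, this baseline can be sketched in a few lines of Python; the use of scikit-learn and of a random-forest classifier are assumptions of this illustration, not details of the prior art being described:

    import random
    from sklearn.ensemble import RandomForestClassifier

    def unsupervised_baseline(unlabeled_X, n_classes=2):
        # S110: randomly label the unlabeled data set to obtain a training set
        y = [random.randrange(n_classes) for _ in unlabeled_X]
        # S120: train a classifier with the randomly labeled training set
        return RandomForestClassifier().fit(unlabeled_X, y)

    # S130: predict the example set to be processed with the trained classifier,
    # e.g. predictions = unsupervised_baseline(train_X).predict(pending_X)
    # (train_X and pending_X are hypothetical variables of this sketch)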
Fig. 2 shows a schematic flowchart of a supervised machine learning method in the prior art. In step S210, a classifier is trained with a manually labeled training set. In step S220, the example set to be processed is predicted with the trained classifier. A supervised learning method uses a large amount of manually proofread data and can therefore achieve good performance. However, such a method is difficult to port to resource-constrained fields or applications.
Machine learning methods therefore often face a dilemma: unsupervised methods may not perform well, while supervised methods consume considerable manpower and resources to prepare a corpus.
To overcome this dilemma, semi-supervised learning methods have appeared. Fig. 3 shows a schematic flowchart of a semi-supervised machine learning method in the prior art. Compared with the unsupervised learning method of Fig. 1, when training the classifier in Fig. 3, a manually labeled training set is used in addition to the training set obtained by randomly labeling the unlabeled data set. Fig. 4 shows a schematic flowchart of another semi-supervised machine learning method in the prior art. In the method of Fig. 4, a seed set is manually labeled and obtained in step S410, and a classifier is trained with this seed set in step S420. In addition, to improve the performance of the classifier, the example set to be processed is predicted with the classifier in step S430; the examples with the highest confidence in the prediction result are added to the seed set in step S440; and the classifier is retrained with the enlarged seed set in step S450. Steps S430 to S450 are repeated until a prescribed repetition end condition is met.
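A minimal sketch of such a self-training loop (steps S420 to S450) follows; the classifier type, the confidence measure, and the fixed number of rounds used as the end condition are assumptions of this illustration:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(seed_X, seed_y, unlabeled_X, m=5, rounds=10):
        """Fig. 4 style bootstrapping: train, predict, absorb confident examples."""
        pool = list(unlabeled_X)
        clf = LogisticRegression()
        for _ in range(rounds):                        # prescribed end condition
            clf.fit(np.array(seed_X), np.array(seed_y))    # S420/S450: (re)train
            if not pool:
                break
            proba = clf.predict_proba(np.array(pool))      # S430: predict examples
            conf = proba.max(axis=1)
            best = list(conf.argsort()[-m:])               # S440: m most confident
            for j in sorted(best, reverse=True):
                seed_X.append(pool[j])                     # grow the seed set
                seed_y.append(int(proba[j].argmax()))
                pool.pop(j)
        return clf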
A semi-supervised method can use both labeled and unlabeled corpora, but it still depends heavily on the scale and quality of the labeled corpus. Seeking a balance between the degree of manual participation and performance remains a major challenge facing the field of machine learning.
Summary of the invention
A brief summary of the present invention is given below in order to provide a basic understanding of some aspects of the present invention. It should be understood that this summary is not an exhaustive overview of the present invention. It is not intended to identify key or critical parts of the present invention, nor is it intended to limit the scope of the present invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description discussed later.
In view of the above situation of the prior art, the present invention aims to provide an efficient, fault-tolerant machine learning method and device.
According to one aspect of the present invention, a machine learning method comprises: automatically labeling an unlabeled data set with different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n >= 2; training n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively; for each seed set Si, i = 1, 2, ..., n, among the n automatically labeled seed sets, verifying the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and retraining the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
According to another aspect of the present invention, a machine learning device comprises: an initialization unit configured to automatically label an unlabeled data set with different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n >= 2, to train n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively, and, for each seed set Si, i = 1, 2, ..., n, among the n automatically labeled seed sets, to verify the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and an optimization and processing unit configured to retrain the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
In the above method and device, the unlabeled data set is automatically labeled by different methods without manual participation, which improves learning efficiency. In addition, by cross-validating the seed sets with the classifiers and retraining the corresponding classifiers with the cross-validated seed sets, the noise introduced by automatic labeling is effectively controlled and fault-tolerant learning is achieved.
These and other advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments of the present invention in conjunction with the accompanying drawings.
Brief description of the drawings
Embodiments of the present invention are described below with reference to the accompanying drawings, from which the above and other objects, features and advantages of the present invention can be understood more easily. The components in the drawings are merely intended to illustrate the principles of the present invention. In the drawings, the same or similar technical features or components are denoted by the same or similar reference numerals.
Fig. 1 shows a schematic flowchart of an unsupervised machine learning method in the prior art.
Fig. 2 shows a schematic flowchart of a supervised machine learning method in the prior art.
Fig. 3 shows a schematic flowchart of a semi-supervised machine learning method in the prior art.
Fig. 4 shows a schematic flowchart of another semi-supervised machine learning method in the prior art.
Fig. 5 shows a schematic flowchart of a machine learning method according to an embodiment of the present invention.
Fig. 6 shows a schematic flowchart of a machine learning method using two classifiers according to an embodiment of the present invention.
Fig. 7 shows a schematic flowchart of a machine learning method using three classifiers according to an embodiment of the present invention.
Fig. 8 shows a schematic block diagram of a machine learning device according to an embodiment of the present invention.
Fig. 9 shows a schematic block diagram of a computer that can be used to implement the method and device according to embodiments of the present invention.
Embodiments
Embodiments of the present invention are described below with reference to the accompanying drawings. Elements and features described in one drawing or embodiment of the present invention may be combined with elements and features shown in one or more other drawings or embodiments. It should be noted that, for the sake of clarity, representations and descriptions of components and processes that are unrelated to the present invention or well known to those of ordinary skill in the art are omitted from the drawings and the description.
In view of the prior-art challenge of seeking a balance between the degree of manual participation and performance, the present inventors propose a fault-tolerant learning (Fault-Tolerant Learning) method to overcome this problem.
The concept of fault tolerance was first proposed in computer architecture. It refers to a technology by which, when data or files in a system are corrupted or lost for various reasons, the system automatically restores the corrupted or lost files and data to their state before the accident, so that the system keeps running normally and continuously.
In the fault-tolerant learning method and device according to embodiments of the present invention, learning is performed with an automatically labeled corpus rather than with a manually labeled corpus or prior knowledge; the method is fully automatic and is therefore easily applied to any specific field or task. In addition, the method and device achieve fault tolerance by training different classifiers that are respectively used for verification and further prediction, which guarantees an improvement in performance.
The machine learning method and device according to embodiments of the present invention are described below in conjunction with Figs. 5-8.
Fig. 5 shows a schematic flowchart of the machine learning method according to an embodiment of the present invention. As shown in the figure, in step S510, an unlabeled data set is automatically labeled with different methods to obtain multiple different seed sets. Various automatic methods can be used to label the data set, and those skilled in the art can select suitable ones according to the application scenario. For example, in a terminology extraction scenario, the data set may be labeled with the TF-IDF based terminology extraction method proposed by G. Salton and M. J. McGill in Introduction to Modern Information Retrieval, McGraw-Hill, 1983, or with the indicator-word based terminology extraction method proposed by Yuhang Yang, Qin Lu and Tiejun Zhao in Chinese Term Extraction Using Minimal Resources, Proceedings of the 22nd International Conference on Computational Linguistics, pages 1033-1040, 2008; the resulting seed set comprises the terms and non-terms judged by the automatic method.
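One possible realization of the TF-IDF seed-labeling step is sketched below; the scoring rule (peak TF-IDF per candidate) and the symmetric top-k/bottom-k split are assumptions of this sketch, not details fixed by the method:

    from sklearn.feature_extraction.text import TfidfVectorizer

    def tfidf_seed_set(corpus, k=50):
        """Label the top-k candidates as terms and the bottom-k as non-terms."""
        vec = TfidfVectorizer()
        tfidf = vec.fit_transform(corpus)                # documents x candidates
        scores = tfidf.max(axis=0).toarray().ravel()     # peak TF-IDF per candidate
        order = scores.argsort()
        vocab = vec.get_feature_names_out()
        seeds  = [(vocab[i], 1) for i in order[-k:]]     # automatically labeled terms
        seeds += [(vocab[i], 0) for i in order[:k]]      # automatically labeled non-terms
        return seeds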
Then, in step S520, multiple different classifiers are trained with the automatically labeled seed sets, one classifier per seed set.
Then, in step S530, the trained classifiers are used to cross-validate the different seed sets to obtain verified seed sets. That is, for a given seed set, some or all of the classifiers trained on the other seed sets are used to verify it.
In step S540, the corresponding classifiers are retrained with the multiple verified seed sets. That is, each updated seed set is used to retrain the classifier that was trained on it.
Next, the example set to be processed can be processed with the retrained classifiers. This can be done with reference to prior-art methods and is not shown here.
Preferably, to improve performance further, cross-validation can also be introduced into the processing of the example set. Specifically, in step S550, the example set to be processed is predicted with the retrained classifiers. In step S560, the predicted example sets are cross-validated with the classifiers: similarly to step S530, a predicted example set can be verified with some or all of the classifiers other than the classifier that predicted it. Then, in step S570, the examples in each verified example set are added to the corresponding seed set, i.e., the seed set on which the classifier that predicted those examples was trained, so that the classifier is retrained with the updated seed set. As an example, a predetermined number of the highest-confidence examples in a verified example set can be added to the seed set. Steps S540 to S570 are repeated until a repetition end condition (hereinafter also called the iteration stop criterion) is met. The end condition can be set as required; as an example, the iteration can be terminated when the total number of seeds in all the seed sets reaches the predetermined number of examples that need to be labeled. The whole loop is sketched below.
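The loop of steps S510 to S570 might look as follows for n classifiers; this is a sketch only, and the classifier type, the agreement test used for verification, and the handling of the example pool are assumptions of the illustration (the seed sets are assumed to keep examples of both classes throughout):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit(seed):                                   # S520/S540: train one classifier
        X, y = zip(*seed)
        return LogisticRegression().fit(np.array(X), np.array(y))

    def verify(labeled, judges):                     # S530/S560: keep consistent items
        return [(x, y) for x, y in labeled
                if all(c.predict(np.array([x]))[0] == y for c in judges)]

    def top_m(clf, pool, m):                         # S550: m most confident predictions
        proba = clf.predict_proba(np.array(pool))
        best = proba.max(axis=1).argsort()[-m:]
        return [(pool[i], int(proba[i].argmax())) for i in best]

    def fault_tolerant_learn(seeds, pool, m=10, max_seeds=200):
        clfs = [fit(S) for S in seeds]                          # S520
        seeds = [verify(S, clfs[:i] + clfs[i+1:])               # S530
                 for i, S in enumerate(seeds)]
        while sum(map(len, seeds)) < max_seeds:                 # repetition end condition
            clfs = [fit(S) for S in seeds]                      # S540
            preds = [top_m(c, pool, m) for c in clfs]           # S550
            preds = [verify(L, clfs[:i] + clfs[i+1:])           # S560
                     for i, L in enumerate(preds)]
            if not any(preds):                                  # nothing survived verification
                break
            for S, L in zip(seeds, preds):                      # S570
                S.extend(L)                                     # (pool shrinking omitted)
        return clfs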
In the above method, learning uses an automatically labeled corpus rather than a manually labeled one. Automatically labeled seeds have a higher labeling accuracy than randomly labeled ones, which makes obtaining the seed sets with automatic methods worthwhile. Moreover, the different classifiers can be trained from multiple relatively independent views (such as different seed sets, different feature sets, etc.), which makes the verification process more effective.
Furthermore, because an automatically labeled corpus is used, noise may be present from the beginning and may grow after each iteration. To control the noise effectively and make the results more reliable, multiple classifiers are trained and respectively used to verify the seed sets, and the classifiers are retrained with the verified seed sets, which alleviates the noise and improves performance. Predicting and verifying the example sets with multiple classifiers, and retraining the classifiers with seed sets enlarged by the verified examples, alleviates the noise and improves performance further.
Fig. 6 shows a schematic flowchart of a machine learning method using two classifiers according to an embodiment of the present invention. In Fig. 6, an unlabeled data set D and an example set U to be labeled are given, and the number of examples that need to be labeled is N.
First, a seed set S1 is automatically generated with one method, and a seed set S2 is automatically generated with another method.
Then, a first classifier C1 is trained with the seed set S1, and a second classifier C2 is trained with the seed set S2.
Then, the classifiers C1 and C2 are used to cross-validate the automatically labeled seed sets S1 and S2. Specifically, the classifier C1 labels the seed set S2, and the classifier C2 labels the seed set S1. Seeds whose automatic labeling result is inconsistent with the classifier's labeling result are deleted from S1 and S2, respectively, yielding the verified seed sets S1 and S2.
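For the two-classifier case, this mutual verification might look as follows; a sketch only, in which C1 and C2 are assumed to be already trained scikit-learn style classifiers and the seeds are (feature vector, label) pairs:

    import numpy as np

    def mutual_verify(S1, S2, C1, C2):
        """Drop seeds whose automatic label disagrees with the peer classifier."""
        S1v = [(x, y) for x, y in S1 if C2.predict(np.array([x]))[0] == y]  # C2 checks S1
        S2v = [(x, y) for x, y in S2 if C1.predict(np.array([x]))[0] == y]  # C1 checks S2
        return S1v, S2v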
As shown in block 610 in Fig. 6, the above steps may be collectively referred to as the initialization process.
To improve performance further, cross-validation can also be carried out in the processing of the example set, as follows.
First, the classifier C1 is retrained with the seed set S1, and the classifier C2 is retrained with the seed set S2.
Then, the classifier C1 predicts the examples in the set U. Specifically, the classifier C1 labels the examples in U, and the m examples with the highest confidence in the labeling result are chosen to form a labeled example set L1, i.e., the predicted example set L1.
Likewise, the classifier C2 predicts the examples in the set U. Specifically, the classifier C2 labels the examples in U, and the m examples with the highest confidence in the labeling result are chosen to form a labeled example set L2, i.e., the predicted example set L2.
Then, the classifiers C1 and C2 are used to cross-validate the predicted example sets L1 and L2. Specifically, C2 relabels the examples in L1, and the examples for which the labeling result of C2 is inconsistent with the prediction result of C1 are deleted, yielding the verified example set L1; C1 relabels the examples in L2, and the examples for which the labeling result of C1 is inconsistent with the prediction result of C2 are deleted, yielding the verified example set L2.
Then, the examples in the set L1 are added to the seed set S1 and the examples in the set L2 are added to the seed set S2, which completes one iteration.
Whether the iteration should stop can be judged at the beginning or at the end of an iteration. The iteration stop criterion can be, for example, |S1 ∪ S2| >= N: when it holds, the iteration terminates; otherwise, the iteration continues.
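The stop criterion itself is a one-line check; in this sketch the seeds are assumed hashable (e.g. tuples) so that the union is well defined:

    def should_stop(S1, S2, N):
        """Iteration stop criterion |S1 ∪ S2| >= N."""
        return len(set(S1) | set(S2)) >= N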
As shown in block 620 in Fig. 6, the above steps may be collectively referred to as the iterative process.
Fig. 7 shows a schematic flowchart of a machine learning method using three classifiers according to an embodiment of the present invention. Compared with Fig. 6, the method of Fig. 7 employs three classifiers, but its individual steps are substantially the same as those of Fig. 6 and are not repeated here.
It is worth noting that Fig. 7 shows the classifiers C2 and C3 verifying the automatically labeled seed set S1, the classifiers C1 and C3 verifying the automatically labeled seed set S2, and the classifiers C1 and C2 verifying the automatically labeled seed set S3. Seeds for which the verification result is inconsistent with the automatic labeling result are deleted from S1, S2 and S3, respectively, to obtain the verified seed sets S1, S2 and S3. However, a seed set may also be verified with only a part of the other classifiers; for example, only the classifier C2 may be applied to verify the seed set S1, only the classifier C3 may be applied to verify the seed set S2, and so on. The possibilities are not enumerated here; a sketch of such partial verification follows.
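Partial verification of this kind can reuse the same agreement test with any chosen subset of the remaining classifiers; the helper below is a sketch, with judges standing for that subset:

    import numpy as np

    def verify_with(seed_set, judges):
        """Verify a seed set with any chosen subset of the other classifiers."""
        return [(x, y) for x, y in seed_set
                if all(c.predict(np.array([x]))[0] == y for c in judges)]

    # e.g. verify S1 with C2 only, instead of with both C2 and C3:
    # S1_verified = verify_with(S1, [C2])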
Likewise, although Fig. 7 shows the classifiers C2 and C3 verifying the predicted example set L1, the classifiers C1 and C3 verifying the predicted example set L2, and the classifiers C1 and C2 verifying the predicted example set L3, an example set may also be verified with only a part of the other classifiers; for example, only the classifier C2 may be applied to verify the example set L1, only the classifier C2 may be applied to verify the example set L3, and so on. The possibilities are not enumerated here.
The above shows machine learning method examples using two classifiers and three classifiers, but this is merely for illustration and is not intended to limit the present invention. Those skilled in the art will understand that the machine learning method according to embodiments of the present invention can be used with any other number of classifiers, which is not repeated here.
Fig. 8 shows a schematic block diagram of a machine learning device according to an embodiment of the present invention. As shown in the figure, the machine learning device 800 comprises an initialization unit 810 and an optimization and processing unit 820. According to an embodiment of the present invention, the initialization unit 810 may be configured to: automatically label an unlabeled data set with different methods to obtain multiple different seed sets; train corresponding multiple classifiers with the multiple automatically labeled seed sets, respectively; and, for each seed set among the multiple automatically labeled seed sets, verify that seed set with some or all of the multiple classifiers other than the classifier trained on that seed set. The optimization and processing unit 820 may be configured to retrain the corresponding multiple classifiers with the multiple verified seed sets, respectively.
According to another embodiment of the present invention, the optimization and processing unit 820 is further configured to: predict an example set with the retrained multiple classifiers, respectively, to obtain corresponding multiple predicted example sets; for each predicted example set, verify that example set with some or all of the multiple classifiers other than the classifier used to predict it; add the examples in each verified example set to the corresponding seed set; and repeat the retraining, the predicting of the example set, the verifying of each example set and the adding of the examples in each verified example set to the corresponding seed set, until a repetition end condition is met.
According to another embodiment of the present invention, the repetition end condition is that the total number of seeds in all the seed sets reaches the predetermined number of examples that need to be labeled.
According to another embodiment of the present invention, the optimization and processing unit 820 is further configured to: label the example set with the multiple classifiers, respectively; and choose, for each of the multiple classifiers, a predetermined number of the examples with the highest confidence in that classifier's labeling result to form the corresponding multiple predicted example sets.
According to another embodiment of the present invention, the optimization and processing unit 820 is further configured to verify a predicted example set by: labeling that example set with some or all of the multiple classifiers other than the classifier used to predict it; and deleting from that example set the examples for which the prediction result is inconsistent with the labeling results of the some or all classifiers.
According to another embodiment of the present invention, the initialization unit 810 is further configured to verify an automatically labeled seed set by: labeling that seed set with some or all of the multiple classifiers other than the classifier trained on it; and deleting from that seed set the seeds for which the automatic labeling result is inconsistent with the labeling results of the some or all classifiers.
For further details of the operation of the machine learning device according to embodiments of the present invention, reference may be made to the embodiments of the method described above, which are not described in detail again here.
In the above method and device, the unlabeled data set is labeled by automatic methods without manual participation, which improves learning efficiency. In addition, by cross-validating the seed sets with the classifiers and retraining the corresponding classifiers with the cross-validated seed sets, the noise introduced by automatic labeling is effectively controlled and fault-tolerant learning is achieved.
The method and device according to embodiments of the present invention impose no restrictions on the actual application scenario, nor on the type of classifier used or the way the classifier is trained.
In addition, in the above device, all modules and units may be configured by software, firmware, hardware or a combination thereof; the specific means or manners available for such configuration are well known to those skilled in the art and are not repeated here. In the case of implementation by software or firmware, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure, and the computer, with various programs installed, can perform various functions.
Fig. 9 shows a schematic block diagram of a computer that can be used to implement the method and device according to embodiments of the present invention. In Fig. 9, a central processing unit (CPU) 901 performs various processes according to programs stored in a read-only memory (ROM) 902 or programs loaded from a storage section 908 into a random access memory (RAM) 903. The RAM 903 also stores, as needed, data required when the CPU 901 performs various processes. The CPU 901, the ROM 902 and the RAM 903 are connected to one another via a bus 904. An input/output interface 905 is also connected to the bus 904.
The following components are connected to the input/output interface 905: an input section 906 (comprising a keyboard, a mouse, etc.), an output section 907 (comprising a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a loudspeaker, etc.), a storage section 908 (comprising a hard disk, etc.), and a communication section 909 (comprising a network interface card such as a LAN card, a modem, etc.). The communication section 909 performs communication processes via a network such as the Internet. A drive 910 may also be connected to the input/output interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory may be mounted on the drive 910 as needed, so that a computer program read therefrom is installed into the storage section 908 as needed.
In the case where the above series of processes is implemented by software, a program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 911.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 911 shown in Fig. 9 in which the program is stored and which is distributed separately from the apparatus to provide the program to the user. Examples of the removable medium 911 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium may be the ROM 902, a hard disk contained in the storage section 908, or the like, in which the program is stored and which is distributed to the user together with the apparatus containing it.
The present invention also proposes a program product storing machine-readable instruction codes. When the instruction codes are read and executed by a machine, the above method according to embodiments of the present invention can be performed.
Correspondingly, a storage medium for carrying the above program product storing machine-readable instruction codes is also included in the disclosure of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disk, a magneto-optical disk, a memory card, a memory stick, and the like.
In the above description of specific embodiments of the present invention, features described and/or illustrated for one embodiment may be used in one or more other embodiments in the same or a similar manner, combined with features of other embodiments, or substituted for features of other embodiments.
It should be emphasized that the term "comprise/include", when used herein, refers to the presence of a feature, element, step or component, without excluding the presence or addition of one or more other features, elements, steps or components.
Furthermore, the methods of the present invention are not limited to being performed in the chronological order described in the specification; they may also be performed in other chronological orders, in parallel, or independently. Therefore, the order of execution described in this specification does not limit the technical scope of the present invention.
Although the present invention has been disclosed above through the description of specific embodiments, it should be understood that all the above embodiments and examples are illustrative and not restrictive. Those skilled in the art may devise various modifications, improvements or equivalents of the present invention within the spirit and scope of the appended claims, and such modifications, improvements or equivalents should also be considered to fall within the protection scope of the present invention.
Remarks
Remark 1. A machine learning method, comprising:
automatically labeling an unlabeled data set with different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n >= 2;
training n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively;
for each seed set Si, i = 1, 2, ..., n, among the n automatically labeled seed sets, verifying the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
retraining the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
Remark 2. The method according to Remark 1, further comprising:
predicting an example set with the n retrained classifiers, respectively, to obtain n corresponding predicted example sets L1, L2, ..., Ln;
for each predicted example set Li, i = 1, 2, ..., n, verifying the example set Li with some or all of the n classifiers other than the classifier Ci used to predict the example set Li;
adding the examples in each verified example set Li to the corresponding seed set Si; and
repeating the retraining, the predicting of the example set, the verifying of each example set and the adding of the examples in each verified example set to the corresponding seed set, until a repetition end condition is met.
Remark 3. The method according to Remark 2, wherein the repetition end condition is:
the total number of seeds in the seed sets S1, S2, ..., Sn reaches the predetermined number of examples that need to be labeled.
Remark 4. The method according to Remark 2, wherein the predicting of the example set comprises:
labeling the example set with the n classifiers, respectively; and
choosing, for each of the n classifiers, a predetermined number of the examples with the highest confidence in that classifier's labeling result to form the n corresponding predicted example sets L1, L2, ..., Ln.
Remark 5. The method according to Remark 2, wherein the verifying of a predicted example set Li comprises:
labeling the example set Li with some or all of the n classifiers other than the classifier Ci used to predict the example set Li; and
deleting from the example set Li the examples for which the prediction result is inconsistent with the labeling results of the some or all classifiers.
Remark 6. The method according to Remark 1, wherein the verifying of an automatically labeled seed set Si comprises:
labeling the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
deleting from the seed set Si the seeds for which the automatic labeling result is inconsistent with the labeling results of the some or all classifiers.
Remark 7. A machine learning device, comprising:
an initialization unit configured to:
automatically label an unlabeled data set with different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n >= 2;
train n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively; and
for each seed set Si, i = 1, 2, ..., n, among the n automatically labeled seed sets, verify the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
an optimization and processing unit configured to:
retrain the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
Remark 8. The device according to Remark 7, wherein the optimization and processing unit is further configured to:
predict an example set with the n retrained classifiers, respectively, to obtain n corresponding predicted example sets L1, L2, ..., Ln;
for each predicted example set Li, i = 1, 2, ..., n, verify the example set Li with some or all of the n classifiers other than the classifier Ci used to predict the example set Li;
add the examples in each verified example set Li to the corresponding seed set Si; and
repeat the retraining, the predicting of the example set, the verifying of each example set and the adding of the examples in each verified example set to the corresponding seed set, until a repetition end condition is met.
Remark 9. The device according to Remark 8, wherein the repetition end condition is:
the total number of seeds in the seed sets S1, S2, ..., Sn reaches the predetermined number of examples that need to be labeled.
Remark 10. The device according to Remark 8, wherein the optimization and processing unit is further configured to:
label the example set with the n classifiers, respectively; and
choose, for each of the n classifiers, a predetermined number of the examples with the highest confidence in that classifier's labeling result to form the n corresponding predicted example sets L1, L2, ..., Ln.
Remark 11. The device according to Remark 8, wherein the optimization and processing unit is further configured to verify a predicted example set Li by:
labeling the example set Li with some or all of the n classifiers other than the classifier Ci used to predict the example set Li; and
deleting from the example set Li the examples for which the prediction result is inconsistent with the labeling results of the some or all classifiers.
Remark 12. The device according to Remark 7, wherein the initialization unit is further configured to verify an automatically labeled seed set Si by:
labeling the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
deleting from the seed set Si the seeds for which the automatic labeling result is inconsistent with the labeling results of the some or all classifiers.

Claims (10)

1. A machine learning method, comprising:
automatically labeling an unlabeled data set with different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n >= 2;
training n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively;
for each seed set Si, i = 1, 2, ..., n, among the n automatically labeled seed sets, verifying the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
retraining the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
2. The method according to claim 1, further comprising:
predicting an example set with the n retrained classifiers, respectively, to obtain n corresponding predicted example sets L1, L2, ..., Ln;
for each predicted example set Li, i = 1, 2, ..., n, verifying the example set Li with some or all of the n classifiers other than the classifier Ci used to predict the example set Li;
adding the examples in each verified example set Li to the corresponding seed set Si; and
repeating the retraining, the predicting of the example set, the verifying of each example set and the adding of the examples in each verified example set to the corresponding seed set, until a repetition end condition is met.
3. The method according to claim 2, wherein the repetition end condition is:
the total number of seeds in the seed sets S1, S2, ..., Sn reaches the predetermined number of examples that need to be labeled.
4. The method according to claim 2, wherein the predicting of the example set comprises:
labeling the example set with the n classifiers, respectively; and
choosing, for each of the n classifiers, a predetermined number of the examples with the highest confidence in that classifier's labeling result to form the n corresponding predicted example sets L1, L2, ..., Ln.
5. The method according to claim 2, wherein the verifying of a predicted example set Li comprises:
labeling the example set Li with some or all of the n classifiers other than the classifier Ci used to predict the example set Li; and
deleting from the example set Li the examples for which the prediction result is inconsistent with the labeling results of the some or all classifiers.
6. The method according to claim 1, wherein the verifying of an automatically labeled seed set Si comprises:
labeling the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
deleting from the seed set Si the seeds for which the automatic labeling result is inconsistent with the labeling results of the some or all classifiers.
7. A machine learning device, comprising:
an initialization unit configured to:
automatically label an unlabeled data set with different methods to obtain n different seed sets S1, S2, ..., Sn, where n is a natural number and n >= 2;
train n corresponding classifiers C1, C2, ..., Cn with the n automatically labeled seed sets S1, S2, ..., Sn, respectively; and
for each seed set Si, i = 1, 2, ..., n, among the n automatically labeled seed sets, verify the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
an optimization and processing unit configured to:
retrain the n corresponding classifiers C1, C2, ..., Cn with the n verified seed sets S1, S2, ..., Sn, respectively.
8. The device according to claim 7, wherein the optimization and processing unit is further configured to:
predict an example set with the n retrained classifiers, respectively, to obtain n corresponding predicted example sets L1, L2, ..., Ln;
for each predicted example set Li, i = 1, 2, ..., n, verify the example set Li with some or all of the n classifiers other than the classifier Ci used to predict the example set Li;
add the examples in each verified example set Li to the corresponding seed set Si; and
repeat the retraining, the predicting of the example set, the verifying of each example set and the adding of the examples in each verified example set to the corresponding seed set, until a repetition end condition is met.
9. The device according to claim 8, wherein the optimization and processing unit is further configured to verify a predicted example set Li by:
labeling the example set Li with some or all of the n classifiers other than the classifier Ci used to predict the example set Li; and
deleting from the example set Li the examples for which the prediction result is inconsistent with the labeling results of the some or all classifiers.
10. The device according to claim 7, wherein the initialization unit is further configured to verify an automatically labeled seed set Si by:
labeling the seed set Si with some or all of the n classifiers other than the classifier Ci trained on the seed set Si; and
deleting from the seed set Si the seeds for which the automatic labeling result is inconsistent with the labeling results of the some or all classifiers.
CN201010280239.0A 2010-09-09 2010-09-09 machine learning method and device Expired - Fee Related CN102402713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010280239.0A CN102402713B (en) 2010-09-09 2010-09-09 machine learning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010280239.0A CN102402713B (en) 2010-09-09 2010-09-09 machine learning method and device

Publications (2)

Publication Number Publication Date
CN102402713A CN102402713A (en) 2012-04-04
CN102402713B true CN102402713B (en) 2015-11-25

Family

ID=45884896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010280239.0A Expired - Fee Related CN102402713B (en) 2010-09-09 2010-09-09 machine learning method and device

Country Status (1)

Country Link
CN (1) CN102402713B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202177B * 2016-06-27 2017-12-15 腾讯科技(深圳)有限公司 Text classification method and device
CN108509969B (en) * 2017-09-06 2021-11-09 腾讯科技(深圳)有限公司 Data labeling method and terminal
CN110147551B (en) * 2019-05-14 2023-07-11 腾讯科技(深圳)有限公司 Multi-category entity recognition model training, entity recognition method, server and terminal
CN112000808B (en) * 2020-09-29 2024-04-16 迪爱斯信息技术股份有限公司 Data processing method and device and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5555345A (en) * 1991-03-25 1996-09-10 Atr Interpreting Telephony Research Laboratories Learning method of neural network
CN1851703A * 2006-05-10 2006-10-25 南京大学 Active semi-supervised relevance feedback method for digital image retrieval
CN101520847A (en) * 2008-02-29 2009-09-02 富士通株式会社 Pattern identification device and method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4034602B2 (en) * 2002-06-17 2008-01-16 富士通株式会社 Data classification device, active learning method of data classification device, and active learning program
EP1817693A1 (en) * 2004-09-29 2007-08-15 Panscient Pty Ltd. Machine learning system
JP2009211648A (en) * 2008-03-06 2009-09-17 Kddi Corp Method for reducing support vector

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5555345A (en) * 1991-03-25 1996-09-10 Atr Interpreting Telephony Research Laboratories Learning method of neural network
CN1851703A * 2006-05-10 2006-10-25 南京大学 Active semi-supervised relevance feedback method for digital image retrieval
CN101520847A (en) * 2008-02-29 2009-09-02 富士通株式会社 Pattern identification device and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Classifier fusion method based on meta-learning strategies and its applications; 王浩畅 et al.; Journal on Communications (通信学报); Oct. 31, 2007; vol. 28, no. 10; pp. 7-13 *
Research on machine learning methods based on small-scale annotated corpora; 李庆中 et al.; Computer Applications (计算机应用); Feb. 29, 2004; vol. 24, no. 2; pp. 56-58 *
Automatic full-text semantic annotation method based on unsupervised machine learning; 卢志茂 et al.; Acta Automatica Sinica (自动化学报); Mar. 31, 2006; vol. 32, no. 2; pp. 228-236 *

Also Published As

Publication number Publication date
CN102402713A (en) 2012-04-04

Similar Documents

Publication Publication Date Title
JP5987088B2 (en) System and method for using multiple in-line heuristics to reduce false positives
CN102591909B (en) Systems and methods for providing increased scalability in deduplication storage systems
CN102057358B (en) Systems and methods for tracking changes to a volume
WO2022048363A1 (en) Website classification method and apparatus, computer device, and storage medium
JP2021006889A (en) Method, apparatus and device for optimizing wake-up model, and storage medium
CN102163203B (en) Method and device for downloading web pages
CN102402713B (en) machine learning method and device
CN102436521B (en) Random verification method and system
CN102291369A (en) Control method and corresponding control device for verifying junk information settings
CN105550443A (en) SystemC based unified stainer array TLM model with accurate cycle
CN105264488A (en) Merging of sorted lists using array pair
CN104090995B (en) The automatic generation method of rebar unit grids in a kind of ABAQUS tire models
CN110674397B (en) Method, device, equipment and readable medium for training age point prediction model
CN104537012B (en) Data processing method and device
CN105740786A (en) Identity identification method and device of writer
US8341538B1 (en) Systems and methods for reducing redundancies in quality-assurance reviews of graphical user interfaces
CN107704341A (en) File access pattern method, apparatus and electronic equipment
CN103140839A (en) Systems and methods for efficient sequential logging on caching-enabled storage devices
CN117056612B (en) Lesson preparation data pushing method and system based on AI assistance
CN108132942A (en) A kind of page generation method and terminal
CN112131587B (en) Intelligent contract pseudo-random number security inspection method, system, medium and device
CN104580109A (en) Method and device for generating click verification code
CN103631714A (en) Method for generating minimum combination testing cases based on matrix multiplicity
CN117744760A (en) Text information identification method and device, storage medium and electronic equipment
CN104850638B (en) ETL concurrent process decision-making technique and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20151125

Termination date: 20180909