CN104951802A - Classifier updating method

Info

Publication number: CN104951802A
Application number: CN201510336424.XA
Authority: CN
Inventors: 吴偶; 胡卫明; 左海强; 祝守宇; 黄长波
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2015-06-17
Filing date: 2015-06-17
Publication date: 2015-09-30

Abstract

The invention discloses a classifier updating method. The method first collects misclassified training samples and incremental misclassified samples; it then uses the base misclassified sample set to collect all anomalous samples in the incremental misclassified sample set; finally, it updates the classifier using the anomalous sample set and incremental machine learning. Because the incremental misclassified sample set is screened against the base misclassified sample set, the method avoids reducing the generalization performance of a harmful-image classifier by updating it with beneficial misclassified training samples.

Description

A classifier updating method

Technical field

The present invention relates to the field of computer application technology, and in particular to a classifier updating method.

Background art

Multimedia such as pictures, video, and audio has gradually become one of the main ways in which harmful information (for example pornography and violence) spreads on the Internet. Among such harmful online information, pictures are relatively easy to transmit and to browse and place low demands on hardware, so the actual harm they cause to teenagers may be the greatest. The negative social effects and crimes caused by harmful pictures on the network have attracted growing attention. How to automatically identify harmful pictures on the network in a timely manner, and then take effective supervisory measures, has become an urgent problem.

Identification of harmful network images generally proceeds by first extracting different types of image features that capture harmful semantics and then constructing a harmful-image classifier from those features. In addition, while the resulting classifier is used for actual network image recognition, the samples it misclassifies are continually collected by hand, and these incremental misclassified samples are then used with incremental machine learning to update the classifier.

However, updating the harmful-image classifier with all of the incremental misclassified samples may reduce its generalization performance. The main reason is that, during training, the final classifier is generally allowed to keep a certain error rate on the training set so that it generalizes well. In other words, the training set contains some reasonably misclassified samples. If an incremental misclassified sample is identical or very close to the misclassified samples on the training set, that incremental sample should not be used to update the classifier. It is therefore necessary to select among the incremental misclassified samples so that the classifier is updated more reasonably.

Summary of the invention

(1) Technical problem to be solved

The object of the present invention is to provide a classifier updating method that avoids reducing the generalization performance of the classifier by updating it with beneficial misclassified training samples.

(2) Technical solution

The invention provides a classifier updating method, characterized by comprising:

S1, collecting misclassified training samples and incremental misclassified samples to form a base misclassified sample set and an incremental misclassified sample set respectively, wherein the base misclassified samples are used for training the classifier and the incremental misclassified samples are used for updating the classifier;

S2, using the base misclassified sample set to collect all anomalous samples in the incremental misclassified sample set, forming an anomalous sample set;

S3, updating the classifier using the anomalous sample set and incremental machine learning.

Further, the method also comprises: S4, merging the anomalous sample set with the base misclassified training sample set to form a new base misclassified training sample set.

Further, step S2 comprises: placing each incremental misclassified sample in the incremental misclassified sample set in turn into the base misclassified sample set, detecting whether the sample so placed is an anomalous sample, and collecting all anomalous samples in the incremental misclassified sample set to form the anomalous sample set.

Further, the step of detecting whether an incremental misclassified sample is an anomalous sample comprises: temporarily merging one incremental misclassified sample from the incremental misclassified sample set with the base misclassified sample set to form a new sample set, and running an anomaly detection algorithm on the new sample set to judge whether that incremental misclassified sample is anomalous within the new sample set.
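The temporary-merge test described above can be sketched as follows. This is a minimal illustration, not the detector used by the invention: the k-nearest-neighbour distance score, the function names `knn_score` and `is_anomalous`, and the parameters `k` and `factor` are all assumptions chosen for brevity, and samples are assumed to be numeric feature vectors.

```python
import math
import statistics


def knn_score(point, others, k=3):
    """Mean Euclidean distance from `point` to its k nearest neighbours in `others`."""
    dists = sorted(math.dist(point, o) for o in others)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def is_anomalous(candidate, base_set, k=3, factor=2.0):
    """Temporarily merge `candidate` with `base_set` and judge whether it is an
    outlier in the merged set (hypothetical criterion: its k-NN score exceeds
    `factor` times the median score of the base samples)."""
    merged = list(base_set) + [candidate]  # temporary new sample set
    scores = []
    for i, p in enumerate(merged):
        rest = merged[:i] + merged[i + 1:]  # every other sample in the merged set
        scores.append(knn_score(p, rest, k))
    candidate_score = scores[-1]
    base_median = statistics.median(scores[:-1])
    return candidate_score > factor * base_median


# Toy 2-D feature vectors standing in for base misclassified samples.
base = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
near = is_anomalous((0.5, 0.4), base)    # lies among the base samples
far = is_anomalous((10.0, 10.0), base)   # lies far from every base sample
```

The merged list is built inside the function and discarded on return, so the base set itself is left unchanged between candidates, which matches the "temporarily merging" wording.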

Further, the incremental machine learning algorithm may be a support vector machine incremental learning algorithm or a random forest incremental learning algorithm.

Further, the classifier may be a harmful-image classifier.

(3) Beneficial effects

The classifier updating method provided by the invention screens the incremental misclassified sample set against the base misclassified sample set, and thereby avoids reducing the generalization performance of the harmful-image classifier by updating it with beneficial misclassified training samples.

Brief description of the drawings

Fig. 1 is a flowchart of the harmful-image classifier updating method provided by the invention.

Detailed description of the embodiments

To make the object, technical solution, and advantages of the present invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawing.

The classifier updating method provided by the invention first collects misclassified training samples and incremental misclassified samples, then uses the base misclassified sample set to collect all anomalous samples in the incremental misclassified sample set, and finally updates the classifier using the anomalous sample set and incremental machine learning. Because the incremental misclassified sample set is screened against the base misclassified sample set, the invention avoids reducing the generalization performance of the harmful-image classifier by updating it with beneficial misclassified training samples.

Fig. 1 is a flowchart of the harmful-image classifier updating method provided by the invention, whose steps comprise:

S1: A computer collects the misclassified training samples and the incremental misclassified samples, forming the base misclassified sample set and the incremental misclassified sample set respectively, wherein the base misclassified samples are used for training the classifier and the incremental misclassified samples are used for updating the classifier. Suppose the base misclassified sample set contains 100 misclassified samples; that is, when the harmful-image classifier is first trained, the trained classifier misclassifies 100 images of the training set, and those 100 images are precisely the 100 misclassified samples that make up the base misclassified sample set.
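As an illustration of how a misclassified sample set could be gathered in S1, the sketch below keeps every sample whose predicted label differs from its true label. The function name `collect_misclassified`, the threshold classifier, and the toy data are hypothetical; the same routine would apply to training data (giving the base set) and to manually labelled deployment data (giving the incremental set).

```python
def collect_misclassified(predict, samples, labels):
    """Return the samples whose predicted label differs from the true label."""
    return [x for x, y in zip(samples, labels) if predict(x) != y]


# Hypothetical stand-in for a trained classifier: label 1 if the single
# feature exceeds 0.5, otherwise label 0.
def toy_predict(x):
    return int(x[0] > 0.5)


train_samples = [(0.2,), (0.7,), (0.9,), (0.4,)]
train_labels = [0, 1, 0, 1]
base_misclassified = collect_misclassified(toy_predict, train_samples, train_labels)
```

Here the third and fourth training samples are misclassified, so they form the base misclassified sample set of this toy example.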

S2: When the initially trained harmful-image classifier is used in an actual harmful-image detection application, it misclassifies some samples. To use the incremental misclassified sample set to improve the performance of the harmful-image classifier, each incremental misclassified sample in that set is placed in turn into the base misclassified sample set, and an online anomaly detection algorithm is used to judge whether the sample is anomalous within the resulting new sample set. The online anomaly detection algorithm may be a conventional method, such as SmartSifter or an algorithm based on singular value decomposition updating. All samples in the incremental misclassified sample set that are detected as anomalous are picked out to form the anomalous sample set.

S3: The harmful-image classifier is updated using the anomalous sample set and an incremental learning algorithm. The incremental learning algorithm is determined by the learning algorithm originally used to train the harmful-image classifier: if the original algorithm was a support vector machine, the support vector machine incremental learning algorithm is selected; if the original algorithm was a random forest, the incremental random forest learning algorithm is selected.
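The selection rule in S3 amounts to a lookup from the original training algorithm to its incremental counterpart. A minimal sketch follows; the string keys, the returned names, and the function `select_incremental_algorithm` are illustrative labels only and do not refer to any particular library.

```python
# Hypothetical labels for the two pairings named in the description.
INCREMENTAL_COUNTERPART = {
    "svm": "incremental_svm",
    "random_forest": "incremental_random_forest",
}


def select_incremental_algorithm(original_algorithm):
    """Pick the incremental learner that matches the original training algorithm."""
    try:
        return INCREMENTAL_COUNTERPART[original_algorithm]
    except KeyError:
        raise ValueError(
            f"no incremental counterpart registered for {original_algorithm!r}"
        ) from None


chosen = select_incremental_algorithm("random_forest")
```

Raising an error for an unregistered algorithm, rather than falling back to a default, keeps the update consistent with how the classifier was originally trained.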

S4: The anomalous sample set is merged with the base misclassified training sample set to form a new base misclassified training sample set, ready for the next update of the harmful-image classifier.
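One update round covering S2 to S4 might be organized as in the sketch below. The anomaly test and the incremental learner are passed in as callbacks because the description leaves both open; `update_round`, `far_from_base`, the distance threshold, and the 1-D toy data are all assumptions made for illustration. In practice each sample would also carry its label.

```python
def update_round(base_set, incremental_set, is_anomalous, incremental_update):
    """Run steps S2 to S4 once and return (anomalous_set, new_base_set)."""
    # S2: test each incremental sample in turn against the unchanged base set.
    anomalous_set = [s for s in incremental_set if is_anomalous(s, base_set)]
    # S3: update the classifier with the anomalous samples only.
    if anomalous_set:
        incremental_update(anomalous_set)
    # S4: merge, so the next round screens against the enlarged base set.
    new_base_set = list(base_set) + anomalous_set
    return anomalous_set, new_base_set


# Hypothetical anomaly test on 1-D features: anomalous if farther than 1.0
# from every base sample.
def far_from_base(sample, base_set):
    return min(abs(sample - b) for b in base_set) > 1.0


seen_by_learner = []  # stands in for the incremental learner's update call
base = [1.0, 2.0, 3.0]
incremental = [2.5, 9.0, 1.2, -4.0]
anomalous, base = update_round(base, incremental, far_from_base, seen_by_learner.extend)
```

Samples that resemble the base misclassified set (2.5 and 1.2 here) are dropped, while the two distant samples both reach the learner and join the base set for the next round.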

The invention was implemented on a Pentium 4 computer with a 3.0 GHz central processing unit and 2 GB of memory, with the harmful-image classifier updating program written in C++. Other execution environments may also be used and are not described further here.

The specific embodiments described above further explain the object, technical solution, and beneficial effects of the present invention. It should be understood that the foregoing are merely specific embodiments of the invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A classifier updating method, characterized by comprising:

S1, collecting misclassified training samples and incremental misclassified samples to form a base misclassified sample set and an incremental misclassified sample set respectively, wherein the base misclassified samples are used for training the classifier and the incremental misclassified samples are used for updating the classifier;

S2, using the base misclassified sample set to collect all anomalous samples in the incremental misclassified sample set, forming an anomalous sample set;

S3, updating the classifier using the anomalous sample set and incremental machine learning.

2. The method according to claim 1, characterized in that the method further comprises:

S4, merging the anomalous sample set with the base misclassified training sample set to form a new base misclassified training sample set.

3. The method according to claim 2, characterized in that step S2 comprises:

placing each incremental misclassified sample in the incremental misclassified sample set in turn into the base misclassified sample set, detecting whether the sample so placed is an anomalous sample, and collecting all anomalous samples in the incremental misclassified sample set to form the anomalous sample set.

4. The method according to claim 3, characterized in that the step of detecting whether an incremental misclassified sample is an anomalous sample comprises:

temporarily merging one incremental misclassified sample from the incremental misclassified sample set with the base misclassified sample set to form a new sample set, and running an anomaly detection algorithm on the new sample set to judge whether that incremental misclassified sample is anomalous within the new sample set.

5. The method according to any one of claims 1 to 4, characterized in that the incremental machine learning algorithm is a support vector machine incremental learning algorithm or a random forest incremental learning algorithm.

6. The method according to any one of claims 1 to 4, characterized in that the classifier is a harmful-image classifier.