CN104951802A - Classifier updating method - Google Patents

Classifier updating method

Info

Publication number
CN104951802A
Authority
CN
China
Prior art keywords
increment
sample
mistake
sample set
exceptional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510336424.XA
Other languages
Chinese (zh)
Inventor
吴偶
胡卫明
左海强
祝守宇
黄长波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-06-17
Filing date
2015-06-17
Publication date
2015-09-30
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201510336424.XA
Publication of CN104951802A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a classifier updating method. The method comprises: first, collecting misclassified training samples and incremental misclassified samples; second, using the base misclassified sample set to collect all abnormal samples in the incremental misclassified sample set; and finally, updating the classifier using the abnormal sample set and incremental machine learning. Because the incremental misclassified sample set is screened using the base misclassified sample set, the method avoids using certain beneficial misclassified training samples for updating, which would reduce the generalization performance of the harmful image classifier.

Description

A classifier updating method
Technical field
The present invention relates to the field of computer application technology, and in particular to a classifier updating method.
Background art
Multimedia such as pictures, video and audio have gradually become one of the main channels through which harmful information (such as pornography and violence) spreads on the Internet. Among such harmful network information, pictures are relatively easy to transmit and browse and place low demands on hardware, so the actual harm they do to teenagers may be the greatest. The negative influence and crime caused by harmful picture information on the network have drawn increasing public attention. How to automatically identify harmful picture information on the network in a timely manner, and then take effective supervisory measures, has become a pressing problem.
Identifying harmful network images generally involves first extracting different types of image features that reflect harmful semantics, and then constructing a harmful image classifier from these features. In addition, while the resulting harmful image classifier is used for actual network image recognition, samples misclassified by the classifier are continually collected by hand, and these incremental misclassified samples are then used with incremental machine learning to update the harmful image classifier.
However, updating the harmful image classifier with all of the incremental misclassified samples may reduce its generalization performance. The main reason is that, during training, the final classifier is generally allowed to keep a certain error rate on the training set so that it generalizes well. In other words, the training set contains some reasonable misclassified samples. If an incremental misclassified sample is the same as, or very close to, the misclassified samples on the training set, that incremental misclassified sample should not be used to update the harmful image classifier. It is therefore necessary to select among the incremental misclassified samples so that the classifier is updated more reasonably.
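The notion of an incremental sample being "the same as, or very close to" a training-set error can be illustrated with a minimal sketch. It assumes that samples are feature vectors (tuples of floats), that closeness is Euclidean distance, and that a hypothetical tolerance `tol` decides what counts as close. None of these choices come from the source, which relies on an anomaly detection algorithm for the actual screening.

```python
import math

def nearest_distance(sample, base_misclassified):
    """Euclidean distance from `sample` to its closest base misclassified sample."""
    return min(math.dist(sample, b) for b in base_misclassified)

def resembles_training_error(sample, base_misclassified, tol=0.5):
    """True if `sample` is the same as, or very close to, a training-set error.

    `tol` is a hypothetical tolerance chosen for illustration only.
    """
    return nearest_distance(sample, base_misclassified) <= tol

# Feature vectors of samples the classifier got wrong on its training set.
base = [(0.0, 0.0), (1.0, 1.0), (0.9, 1.2)]

print(resembles_training_error((1.0, 1.1), base))  # close to a training error
print(resembles_training_error((5.0, 5.0), base))  # far from every training error
```

Under these assumptions, the first sample would be left out of the update and the second would be a candidate for it.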
Summary of the invention
(1) Technical problem to be solved
The object of the present invention is to provide a classifier updating method that avoids using certain beneficial misclassified training samples for updating, which would reduce the generalization performance of the classifier.
(2) Technical solution
The invention provides a classifier updating method, characterized by comprising:
S1, collecting misclassified training samples and incremental misclassified samples to form a base misclassified sample set and an incremental misclassified sample set respectively, wherein the base misclassified samples are used for training the classifier and the incremental misclassified samples are used for updating the classifier;
S2, using the base misclassified sample set to collect all abnormal samples in the incremental misclassified sample set, forming an abnormal sample set;
S3, updating the classifier using the abnormal sample set and incremental machine learning.
Further, the method also comprises: S4, merging the abnormal sample set with the base misclassified training sample set to form a new base misclassified training sample set.
Further, step S2 comprises: putting each incremental misclassified sample in the incremental misclassified sample set into the base misclassified sample set in turn, detecting whether the sample put in is an abnormal sample, and collecting all abnormal samples in the incremental misclassified sample set to form the abnormal sample set.
Further, the step of detecting whether an incremental misclassified sample is an abnormal sample comprises: temporarily merging one incremental misclassified sample from the incremental misclassified sample set with the base misclassified sample set to form a new sample set, and running an anomaly detection algorithm on the new sample set to judge whether this incremental misclassified sample is an abnormal sample within the new sample set.
Further, the incremental machine learning algorithm may be a support vector machine incremental learning algorithm or a random forest incremental learning algorithm.
Further, the classifier may be a harmful image classifier.
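The screening loop of step S2 (temporarily merge one incremental sample with the base set, run an anomaly detector on the merged set, collect the samples flagged as abnormal) can be sketched as follows. The detector here is a simple distance-to-centroid test with a hypothetical `z_threshold`; it is only a stand-in chosen for brevity and is not the anomaly detection algorithm of the source. The function names and the tuple representation of samples are likewise assumptions.

```python
import math
import statistics

def is_abnormal(new_set, index, z_threshold=3.0):
    """Stand-in anomaly detector (an assumption, not the patent's algorithm).

    Flags new_set[index] when its distance to the centroid of the other
    samples exceeds their mean distance by more than z_threshold
    standard deviations.
    """
    others = [s for i, s in enumerate(new_set) if i != index]
    dim = len(others[0])
    centroid = tuple(sum(s[d] for s in others) / len(others) for d in range(dim))
    dists = [math.dist(s, centroid) for s in others]
    mean = statistics.fmean(dists)
    std = statistics.pstdev(dists)
    return math.dist(new_set[index], centroid) > mean + z_threshold * max(std, 1e-12)

def screen_abnormal(base_set, incremental_set, detector=is_abnormal):
    """Collect the incremental misclassified samples that are abnormal
    with respect to the base misclassified sample set."""
    abnormal = []
    for sample in incremental_set:
        new_set = base_set + [sample]      # temporary merge; base_set is not modified
        if detector(new_set, len(new_set) - 1):
            abnormal.append(sample)
    return abnormal

base = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2), (-0.2, -0.1)]
incremental = [(0.1, 0.1), (4.0, 4.0)]
print(screen_abnormal(base, incremental))  # only the far-away sample is kept
```

Passing the detector as a parameter keeps the loop independent of the detection algorithm, so a different detector can be swapped in without changing the screening logic.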
(3) Beneficial effects
The classifier updating method provided by the invention screens the incremental misclassified sample set using the base misclassified sample set, and thereby avoids using certain beneficial misclassified training samples for updating, which would reduce the generalization performance of the harmful image classifier.
Brief description of the drawings
Fig. 1 is a flow chart of the harmful image classifier updating method provided by the present invention.
Detailed description of the embodiments
To make the object, technical solution and advantages of the present invention clearer, the present invention is described in more detail below in conjunction with specific embodiments and with reference to the accompanying drawing.
The classifier updating method provided by the invention first collects misclassified training samples and incremental misclassified samples, then uses the base misclassified sample set to collect all abnormal samples in the incremental misclassified sample set, and finally updates the classifier using the abnormal sample set and incremental machine learning. Because the incremental misclassified sample set is screened using the base misclassified sample set, the invention avoids using certain beneficial misclassified training samples for updating, which would reduce the generalization performance of the harmful image classifier.
Fig. 1 is a flow chart of the harmful image classifier updating method provided by the present invention. Its steps comprise:
S1, a computer collects the misclassified training samples and the incremental misclassified samples, forming the base misclassified sample set and the incremental misclassified sample set respectively, wherein the base misclassified samples are used for training the classifier and the incremental misclassified samples are used for updating the classifier. Suppose the base misclassified sample set contains 100 misclassified samples; that is, when the harmful image classifier was initially trained, the trained classifier misclassified 100 images on the training set, so these 100 images are exactly the 100 misclassified samples in the base misclassified sample set.
S2, when the initially trained harmful image classifier is used in an actual harmful image detection application, it misclassifies some samples. To use the incremental misclassified sample set to improve the performance of the harmful image classifier, each incremental misclassified sample in the incremental misclassified sample set is put into the base misclassified sample set in turn, and an online anomaly detection algorithm is used to judge whether this incremental misclassified sample is abnormal within the new sample set. The online anomaly detection algorithm may be a conventional method, such as SmartSifter or an algorithm based on singular value decomposition updating. All samples in the incremental misclassified sample set that test as abnormal are picked out to form the abnormal sample set.
S3, the harmful image classifier is updated using the abnormal sample set and an incremental learning algorithm. The incremental learning algorithm is chosen according to the learning algorithm used when the harmful image classifier was first trained. If the original algorithm was a support vector machine, the support vector machine incremental learning algorithm is selected; if the original algorithm was a random forest, the incremental random forest learning algorithm is selected.
S4, the abnormal sample set is merged with the base misclassified training sample set to form a new base misclassified training sample set, ready for the next update of the harmful image classifier.
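Steps S1, S3 and S4 can be sketched end to end as follows. The source names incremental support vector machines and incremental random forests as the learners; the tiny online perceptron below is only a stand-in so that the example stays self-contained. The preset weights standing for "the initially trained classifier", the +1 / -1 labels, the sample values and all function names are hypothetical.

```python
class Perceptron:
    """Tiny online linear classifier, used here only as a stand-in for an
    incremental SVM or incremental random forest (labels are +1 / -1)."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

    def partial_fit(self, samples, epochs=10):
        """Incremental update: adjust the weights using only the given samples."""
        for _ in range(epochs):
            for x, y in samples:
                if self.predict(x) != y:
                    self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
                    self.b += y

def misclassified(clf, labelled):
    """S1: collect the labelled samples that the classifier gets wrong."""
    return [(x, y) for x, y in labelled if clf.predict(x) != y]

def update_round(clf, base_errors, abnormal_errors):
    """S3 and S4: update on the abnormal set, then merge it into the base set."""
    clf.partial_fit(abnormal_errors)       # S3: incremental update
    return base_errors + abnormal_errors   # S4: new base misclassified set

# Hypothetical initially trained classifier (weights preset for illustration).
clf = Perceptron(dim=2)
clf.w = [1.0, 1.0]

train = [((2.0, 1.0), 1), ((-1.0, -2.0), -1), ((0.5, 0.5), -1)]
base_errors = misclassified(clf, train)    # S1: the training-set errors

# Hypothetical result of the S2 screening: one abnormal deployment error.
abnormal = [((-0.5, 3.0), -1)]
base_errors = update_round(clf, base_errors, abnormal)
print(len(base_errors), clf.predict((-0.5, 3.0)))
```

The merged set returned by `update_round` is what the next round of screening would use as its base misclassified sample set. A single perceptron step can disturb earlier decisions, so this stand-in only shows the data flow, not the behavior of the incremental learners named in the source.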
The execution environment of the present invention uses a Pentium 4 computer with a 3.0 GHz central processing unit and 2 GB of memory, on which a program for the harmful image classifier updating method was written in C++, implementing the new harmful image classifier updating method of the present invention. Other execution environments may also be used and are not described further here.
The specific embodiments described above further explain the object, technical solution and beneficial effects of the present invention. It should be understood that the foregoing are only specific embodiments of the present invention and do not limit the present invention; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A classifier updating method, characterized by comprising:
S1, collecting misclassified training samples and incremental misclassified samples to form a base misclassified sample set and an incremental misclassified sample set respectively, wherein the base misclassified samples are used for training the classifier and the incremental misclassified samples are used for updating the classifier;
S2, using the base misclassified sample set to collect all abnormal samples in the incremental misclassified sample set, forming an abnormal sample set;
S3, updating the classifier using the abnormal sample set and incremental machine learning.
2. The method according to claim 1, characterized in that the method further comprises:
S4, merging the abnormal sample set with the base misclassified training sample set to form a new base misclassified training sample set.
3. The method according to claim 2, characterized in that step S2 comprises:
putting each incremental misclassified sample in the incremental misclassified sample set into the base misclassified sample set in turn, detecting whether the sample put in is an abnormal sample, and collecting all abnormal samples in the incremental misclassified sample set to form the abnormal sample set.
4. The method according to claim 3, characterized in that the step of detecting whether an incremental misclassified sample is an abnormal sample comprises:
temporarily merging one incremental misclassified sample from the incremental misclassified sample set with the base misclassified sample set to form a new sample set, and running an anomaly detection algorithm on the new sample set to judge whether this incremental misclassified sample is an abnormal sample within the new sample set.
5. The method according to any one of claims 1-4, characterized in that the incremental machine learning algorithm is a support vector machine incremental learning algorithm or a random forest incremental learning algorithm.
6. The method according to any one of claims 1-4, characterized in that the classifier is a harmful image classifier.
CN201510336424.XA 2015-06-17 2015-06-17 Classifier updating method Pending CN104951802A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510336424.XA CN104951802A (en) 2015-06-17 2015-06-17 Classifier updating method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510336424.XA CN104951802A (en) 2015-06-17 2015-06-17 Classifier updating method

Publications (1)

Publication Number Publication Date
CN104951802A true CN104951802A (en) 2015-09-30

Family

ID=54166442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510336424.XA Pending CN104951802A (en) 2015-06-17 2015-06-17 Classifier updating method

Country Status (1)

Country Link
CN (1) CN104951802A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753580A (en) * 2018-12-21 2019-05-14 Oppo广东移动通信有限公司 A kind of image classification method, device, storage medium and electronic equipment
WO2019179189A1 (en) * 2018-03-23 2019-09-26 北京达佳互联信息技术有限公司 Image classification model optimization method and device and terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101055621A (en) * 2006-04-10 2007-10-17 中国科学院自动化研究所 Content based sensitive web page identification method
CN101315670A (en) * 2007-06-01 2008-12-03 清华大学 Specific shot body detection device, learning device and method thereof
CN103593672A (en) * 2013-05-27 2014-02-19 深圳市智美达科技有限公司 Adaboost classifier on-line learning method and Adaboost classifier on-line learning system
CN104391860A (en) * 2014-10-22 2015-03-04 安一恒通(北京)科技有限公司 Content type detection method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WEIMING HU ETAL: "Recognition of Adult Images, Videos, and Web Page Bags", 《ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS》 *
丁昕苗 等: "基于多视角融合稀疏表示的恐怖视频识别", 《电子学报》 *
李文昊 等: "一种改进的AdaBoost人脸检测算法", 《电视技术》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019179189A1 (en) * 2018-03-23 2019-09-26 北京达佳互联信息技术有限公司 Image classification model optimization method and device and terminal
US11544496B2 (en) 2018-03-23 2023-01-03 Beijing Dajia Internet Information Technology Co., Ltd. Method for optimizing image classification model, and terminal and storage medium thereof
CN109753580A (en) * 2018-12-21 2019-05-14 Oppo广东移动通信有限公司 A kind of image classification method, device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN109639481B (en) Deep learning-based network traffic classification method and system and electronic equipment
CN107729908B (en) Method, device and system for establishing machine learning classification model
CN112668586B (en) Model training method, picture processing device, storage medium, and program product
CN113382279A (en) Live broadcast recommendation method, device, equipment, storage medium and computer program product
US20240185096A1 (en) A Method, Device and Storage Medium for Knowledge Recommendation
CN105574030B (en) A kind of information search method and device
CN110780965B (en) Vision-based process automation method, equipment and readable storage medium
CN113051344A (en) Information pushing method and information pushing system based on cloud computing and big data
WO2019242442A1 (en) Multi-model feature-based malware identification method, system and related apparatus
CN105630662B (en) Internal-memory detection method and device
US20160328466A1 (en) Label filters for large scale multi-label classification
CN109784368A (en) A kind of determination method and apparatus of application program classification
CN112187890B (en) Information distribution method based on cloud computing and big data and block chain financial cloud center
CN112861894A (en) Data stream classification method, device and system
US8712100B2 (en) Profiling activity through video surveillance
CN104951802A (en) Classifier updating method
CN104933077A (en) Rule-based multi-file information analysis method
CN111739649B (en) User portrait capturing method, device and system
CN109389972B (en) Quality testing method and device for semantic cloud function, storage medium and equipment
CN112364185A (en) Method and device for determining characteristics of multimedia resource, electronic equipment and storage medium
CN106293650A (en) A kind of folder attribute method to set up and device
CN104572996A (en) Processing method and device for video webpage
CN116661936A (en) Page data processing method and device, computer equipment and storage medium
CN110716778A (en) Application compatibility testing method, device and system
CN115393034A (en) Method for carrying out risk identification on enterprise account based on natural language processing technology

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150930