CN104361059A - Harmful information identification and web page classification method based on multi-instance learning
- Publication number
- CN104361059A (application number CN201410609728.4A)
- Authority
- CN
- China
- Prior art keywords
- webpage
- effective image
- bag
- text
- web page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410609728.4A (CN104361059B, en) | 2014-11-03 | 2014-11-03 | Harmful information identification and web page classification method based on multi-instance learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104361059A (en) | 2015-02-18 |
CN104361059B (en) | 2018-03-27 |
Family
ID=52528320
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410609728.4A (Active; CN104361059B, en) | 2014-11-03 | 2014-11-03 | Harmful information identification and web page classification method based on multi-instance learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104361059B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101281521A (en) * | 2007-04-05 | 2008-10-08 | 中国科学院自动化研究所 | Method and system for filtering sensitive web page based on multiple classifier amalgamation |
JP2013004093A (en) * | 2011-06-16 | 2013-01-07 | Fujitsu Ltd | Search method and system by multi-instance learning |
CN103218608A (en) * | 2013-04-19 | 2013-07-24 | 中国科学院自动化研究所 | Network violent video identification method |
CN103605794A (en) * | 2013-12-05 | 2014-02-26 | 国家计算机网络与信息安全管理中心 | Website classifying method |
Non-Patent Citations (1)
Title |
---|
RUIGUANG HU et al.: "DRUG-TAKING INSTRUMENTS RECOGNITION", THE FIRST ASIAN CONFERENCE ON PATTERN RECOGNITION * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106021442B (en) * | 2016-05-16 | 2019-10-01 | 江苏大学 | A kind of Internet news summary extracting method |
CN106021442A (en) * | 2016-05-16 | 2016-10-12 | 江苏大学 | Network news outline extraction method |
CN106055705A (en) * | 2016-06-21 | 2016-10-26 | 广东工业大学 | Web page classification method for multi-task and multi-example learning based on maximum distance |
CN106055705B (en) * | 2016-06-21 | 2019-07-05 | 广东工业大学 | Web page classification method based on maximum spacing multitask multi-instance learning |
CN106250924A (en) * | 2016-07-27 | 2016-12-21 | 南京大学 | A kind of newly-increased category detection method based on multi-instance learning |
CN106250924B (en) * | 2016-07-27 | 2019-07-16 | 南京大学 | A kind of newly-increased category detection method based on multi-instance learning |
CN109241379A (en) * | 2017-07-11 | 2019-01-18 | 北京交通大学 | A method of across Modal detection network navy |
CN107480289A (en) * | 2017-08-24 | 2017-12-15 | 成都澳海川科技有限公司 | User property acquisition methods and device |
CN107480289B (en) * | 2017-08-24 | 2020-06-30 | 成都澳海川科技有限公司 | User attribute acquisition method and device |
CN111259237A (en) * | 2020-01-13 | 2020-06-09 | 中国搜索信息科技股份有限公司 | Method for identifying public harmful information |
CN113254636A (en) * | 2021-04-27 | 2021-08-13 | 上海大学 | Remote supervision entity relationship classification method based on example weight dispersion |
CN116992035A (en) * | 2023-09-27 | 2023-11-03 | 湖南正宇软件技术开发有限公司 | Intelligent classification method, device, computer equipment and medium |
CN116992035B (en) * | 2023-09-27 | 2023-12-08 | 湖南正宇软件技术开发有限公司 | Intelligent classification method, device, computer equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN104361059B (en) | 2018-03-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN104361059A (en) | Harmful information identification and web page classification method based on multi-instance learning | |
CN103218444B (en) | Based on semantic method of Tibetan language webpage text classification | |
CN101430695B (en) | System and method for computing difference affinities of word | |
CN107133213A (en) | A kind of text snippet extraction method and system based on algorithm | |
US20070294223A1 (en) | Text Categorization Using External Knowledge | |
CN104951548A (en) | Method and system for calculating negative public opinion index | |
CN107992542A (en) | A kind of similar article based on topic model recommends method | |
CN103617157A (en) | Text similarity calculation method based on semantics | |
CN107291723A (en) | The method and apparatus of web page text classification, the method and apparatus of web page text identification | |
CN103559199B (en) | Method for abstracting web page information and device | |
CN105653668A (en) | Webpage content analysis and extraction optimization method based on DOM Tree in cloud environment | |
CN104615593A (en) | Method and device for automatic detection of microblog hot topics | |
CN103246644B (en) | Method and device for processing Internet public opinion information | |
CN103544255A (en) | Text semantic relativity based network public opinion information analysis method | |
EP2041669A2 (en) | Text categorization using external knowledge | |
CN102945244A (en) | Chinese web page repeated document detection and filtration method based on full stop characteristic word string | |
CN104239485A (en) | Statistical machine learning-based internet hidden link detection method | |
CN106126502A (en) | A kind of emotional semantic classification system and method based on support vector machine | |
Chen et al. | Learning to predict charges for judgment with legal graph | |
CN103530316A (en) | Science subject extraction method based on multi-view learning | |
CN104537280B (en) | Protein interactive relation recognition methods based on text relation similitude | |
Hassan et al. | Automatic document topic identification using wikipedia hierarchical ontology | |
CN103699568B (en) | A kind of from Wiki, extract the method for hyponymy between field term | |
Croce et al. | Semantic convolution kernels over dependency trees: smoothed partial tree kernel | |
de Silva | SAFS3 algorithm: Frequency statistic and semantic similarity based semantic classification use case |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | TR01 | Transfer of patent right | Effective date of registration: 20191204. Address after: 250101 2F, Hanyu Jingu new media building, high tech Zone, Jinan City, Shandong Province. Patentee after: Renmin Zhongke (Shandong) Intelligent Technology Co.,Ltd. Address before: 100190 Zhongguancun East Road, Beijing, No. 95, No. Patentee before: Institute of Automation, Chinese Academy of Sciences |
 | TR01 | Transfer of patent right | Effective date of registration: 20200311. Address after: Room 201, 2/F, Hanyu Jingu new media building, No. 7000, Jingshi Road, Jinan City, Shandong Province, 250000. Patentee after: Renmin Zhongke (Jinan) Intelligent Technology Co.,Ltd. Address before: 250101 2F, Hanyu Jingu new media building, high tech Zone, Jinan City, Shandong Province. Patentee before: Renmin Zhongke (Shandong) Intelligent Technology Co.,Ltd. |
 | CP03 | Change of name, title or address | Address after: 100176 1401, 14th floor, building 8, No. 8 courtyard, No. 1 KEGU street, Beijing Economic and Technological Development Zone, Daxing District, Beijing (Yizhuang group, high-end industrial area, Beijing Pilot Free Trade Zone). Patentee after: Renmin Zhongke (Beijing) Intelligent Technology Co.,Ltd. Address before: Room 201, 2/F, Hanyu Jingu new media building, 7000 Jingshi Road, Jinan City, Shandong Province. Patentee before: Renmin Zhongke (Jinan) Intelligent Technology Co.,Ltd. |