CN104361059B - A kind of harmful information identification and Web page classification method based on multi-instance learning
- Publication number
- CN104361059B (application CN201410609728.4A)
- Authority
- CN
- China
- Prior art keywords
- webpage
- effective image
- bag
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (6)
Priority Applications (1)
Application Number | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201410609728.4A | CN104361059B (en) | 2014-11-03 | 2014-11-03 | A kind of harmful information identification and Web page classification method based on multi-instance learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104361059A (en) | 2015-02-18 |
CN104361059B (en) | 2018-03-27 |
Family
ID=52528320
Family Applications (1)
Application Number | Publication Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201410609728.4A | CN104361059B (en) | Active | 2014-11-03 | 2014-11-03 | A kind of harmful information identification and Web page classification method based on multi-instance learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104361059B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106021442B (en) * | 2016-05-16 | 2019-10-01 | Jiangsu University (江苏大学) | A kind of Internet news summary extracting method |
CN106055705B (en) * | 2016-06-21 | 2019-07-05 | Guangdong University of Technology (广东工业大学) | Web page classification method based on maximum spacing multitask multi-instance learning |
CN106250924B (en) * | 2016-07-27 | 2019-07-16 | Nanjing University (南京大学) | A kind of newly-increased category detection method based on multi-instance learning |
CN109241379A (en) * | 2017-07-11 | 2019-01-18 | Beijing Jiaotong University (北京交通大学) | A method of across Modal detection network navy |
CN107480289B (en) * | 2017-08-24 | 2020-06-30 | Chengdu Aohaichuan Technology Co., Ltd. (成都澳海川科技有限公司) | User attribute acquisition method and device |
CN111259237B (en) * | 2020-01-13 | 2021-02-09 | China Search Information Technology Co., Ltd. (中国搜索信息科技股份有限公司) | Method for identifying public harmful information |
CN113254636A (en) * | 2021-04-27 | 2021-08-13 | Shanghai University (上海大学) | Remote supervision entity relationship classification method based on example weight dispersion |
CN116992035B (en) * | 2023-09-27 | 2023-12-08 | Hunan Zhengyu Software Technology Development Co., Ltd. (湖南正宇软件技术开发有限公司) | Intelligent classification method, device, computer equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101281521A (en) * | 2007-04-05 | 2008-10-08 | Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所) | Method and system for filtering sensitive web page based on multiple classifier amalgamation |
CN103218608A (en) * | 2013-04-19 | 2013-07-24 | Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所) | Network violent video identification method |
CN103605794A (en) * | 2013-12-05 | 2014-02-26 | National Computer Network and Information Security Management Center (国家计算机网络与信息安全管理中心) | Website classifying method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831129B (en) * | 2011-06-16 | 2015-03-04 | Fujitsu Limited (富士通株式会社) | Retrieval method and system based on multi-instance learning |
Non-Patent Citations (1)
Title |
---|
Ruiguang Hu et al.; "Drug-taking instruments recognition"; The First Asian Conference on Pattern Recognition; 2011-11-28; pp. 90-94 * |
Also Published As
Publication number | Publication date |
---|---|
CN104361059A (en) | 2015-02-18 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN104361059B (en) | A kind of harmful information identification and Web page classification method based on multi-instance learning | |
Novendri et al. | Sentiment analysis of YouTube movie trailer comments using naïve bayes | |
CN109471937A (en) | A kind of file classification method and terminal device based on machine learning | |
CN110909164A (en) | Text enhancement semantic classification method and system based on convolutional neural network | |
CN108965245A (en) | Detection method for phishing site and system based on the more disaggregated models of adaptive isomery | |
US20070294223A1 (en) | Text Categorization Using External Knowledge | |
Huang et al. | JSContana: Malicious JavaScript detection using adaptable context analysis and key feature extraction | |
CN110705247B (en) | Based on x2-C text similarity calculation method | |
Doshi et al. | Movie genre detection using topological data analysis | |
Ashraf et al. | CIC at CheckThat! 2021: Fake News detection Using Machine Learning And Data Augmentation. | |
CN104537280B (en) | Protein interactive relation recognition methods based on text relation similitude | |
CN112052424A (en) | Content auditing method and device | |
Rajesh et al. | Fraudulent news detection using machine learning approaches | |
Huang et al. | Topic detection from microblog based on text clustering and topic model analysis | |
Pritzkau et al. | Finding a line between trusted and untrusted information on tweets through sequence classification | |
Abbasi et al. | Organizing resources on tagging systems using t-org | |
Su et al. | SSL-GAN-RoBERTa: A robust semi-supervised model for detecting Anti-Asian COVID-19 hate speech on social media | |
de Silva | SAFS3 algorithm: Frequency statistic and semantic similarity based semantic classification use case | |
Saha et al. | A large scale study of SVM based methods for abstract screening in systematic reviews | |
Khan et al. | Fake news detection of South African COVID-19 related tweets using machine learning | |
Cuzzola et al. | Automated classification and localization of daily deal content from the Web | |
CN112434126B (en) | Information processing method, device, equipment and storage medium | |
Surendran et al. | Covid-19 fake news detector using hybrid convolutional and Bi-lstm model | |
Chouliara et al. | Fake News Detection Utilizing Textual Cues | |
Ma et al. | LTCR: Long-Text Chinese Rumor Detection Dataset |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2019-12-04. Address after: 250101 2F, Hanyu Jingu new media building, High-tech Zone, Jinan City, Shandong Province. Patentee after: Renmin Zhongke (Shandong) Intelligent Technology Co.,Ltd. Address before: 100190 No. 95 Zhongguancun East Road, Beijing. Patentee before: Institute of Automation, Chinese Academy of Sciences |
TR01 | Transfer of patent right | Effective date of registration: 2020-03-11. Address after: Room 201, 2/F, Hanyu Jingu new media building, No. 7000 Jingshi Road, Jinan City, Shandong Province, 250000. Patentee after: Renmin Zhongke (Jinan) Intelligent Technology Co.,Ltd. Address before: 250101 2F, Hanyu Jingu new media building, High-tech Zone, Jinan City, Shandong Province. Patentee before: Renmin Zhongke (Shandong) Intelligent Technology Co.,Ltd. |
CP03 | Change of name, title or address | Address after: 100176 1401, 14th floor, building 8, No. 8 courtyard, No. 1 Kegu Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing (Yizhuang group, high-end industrial area, Beijing Pilot Free Trade Zone). Patentee after: Renmin Zhongke (Beijing) Intelligent Technology Co.,Ltd. Address before: Room 201, 2/F, Hanyu Jingu new media building, No. 7000 Jingshi Road, Jinan City, Shandong Province. Patentee before: Renmin Zhongke (Jinan) Intelligent Technology Co.,Ltd. |