CN104361059A - Harmful information identification and web page classification method based on multi-instance learning - Google Patents
- Publication number
- CN104361059A
- Authority
- CN
- China
- Prior art keywords
- webpage
- effective image
- bag
- text
- web page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410609728.4A CN104361059B (zh) | 2014-11-03 | 2014-11-03 | Harmful information identification and web page classification method based on multi-instance learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104361059A (zh) | 2015-02-18 |
CN104361059B (zh) | 2018-03-27 |
Family
ID=52528320
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410609728.4A (Active) CN104361059B (zh) | 2014-11-03 | 2014-11-03 | Harmful information identification and web page classification method based on multi-instance learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104361059B (zh) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
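CN106021442A (zh) * | 2016-05-16 | 2016-10-12 | Jiangsu University | Method for extracting online news summaries |
CN106055705A (zh) * | 2016-06-21 | 2016-10-26 | Guangdong University of Technology | Web page classification method based on maximum-margin multi-task multi-instance learning |
CN106250924A (zh) * | 2016-07-27 | 2016-12-21 | Nanjing University | Method for detecting newly added classes based on multi-instance learning |
CN107480289A (zh) * | 2017-08-24 | 2017-12-15 | Chengdu Aohaichuan Technology Co., Ltd. | User attribute acquisition method and device |
CN109241379A (zh) * | 2017-07-11 | 2019-01-18 | Beijing Jiaotong University | Cross-modal method for detecting paid online posters ("Internet water army") |
CN111259237A (zh) * | 2020-01-13 | 2020-06-09 | China Search Information Technology Co., Ltd. | Method for identifying information harmful to the public |
CN113254636A (zh) * | 2021-04-27 | 2021-08-13 | Shanghai University | Distantly supervised entity relation classification method based on instance weight dispersion |
CN116992035A (zh) * | 2023-09-27 | 2023-11-03 | Hunan Zhengyu Software Technology Development Co., Ltd. | Method, device, computer equipment and medium for intelligent classification of proposals |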
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
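CN101281521A (zh) * | 2007-04-05 | 2008-10-08 | Institute of Automation, Chinese Academy of Sciences | Sensitive web page filtering method and system based on multi-classifier fusion |
JP2013004093A (ja) * | 2011-06-16 | 2013-01-07 | Fujitsu Ltd | Search method and system using multi-instance learning |
CN103218608A (zh) * | 2013-04-19 | 2013-07-24 | Institute of Automation, Chinese Academy of Sciences | Method for recognizing violent online videos |
CN103605794A (zh) * | 2013-12-05 | 2014-02-26 | National Computer Network and Information Security Management Center | Website classification method |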
Non-Patent Citations (1)
Title |
---|
RUIGUANG HU et al.: "DRUG-TAKING INSTRUMENTS RECOGNITION", THE FIRST ASIAN CONFERENCE ON PATTERN RECOGNITION * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
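CN106021442B (zh) * | 2016-05-16 | 2019-10-01 | Jiangsu University | Method for extracting online news summaries |
CN106021442A (zh) * | 2016-05-16 | 2016-10-12 | Jiangsu University | Method for extracting online news summaries |
CN106055705A (zh) * | 2016-06-21 | 2016-10-26 | Guangdong University of Technology | Web page classification method based on maximum-margin multi-task multi-instance learning |
CN106055705B (zh) * | 2016-06-21 | 2019-07-05 | Guangdong University of Technology | Web page classification method based on maximum-margin multi-task multi-instance learning |
CN106250924A (zh) * | 2016-07-27 | 2016-12-21 | Nanjing University | Method for detecting newly added classes based on multi-instance learning |
CN106250924B (zh) * | 2016-07-27 | 2019-07-16 | Nanjing University | Method for detecting newly added classes based on multi-instance learning |
CN109241379A (zh) * | 2017-07-11 | 2019-01-18 | Beijing Jiaotong University | Cross-modal method for detecting paid online posters ("Internet water army") |
CN107480289A (zh) * | 2017-08-24 | 2017-12-15 | Chengdu Aohaichuan Technology Co., Ltd. | User attribute acquisition method and device |
CN107480289B (zh) * | 2017-08-24 | 2020-06-30 | Chengdu Aohaichuan Technology Co., Ltd. | User attribute acquisition method and device |
CN111259237A (zh) * | 2020-01-13 | 2020-06-09 | China Search Information Technology Co., Ltd. | Method for identifying information harmful to the public |
CN113254636A (zh) * | 2021-04-27 | 2021-08-13 | Shanghai University | Distantly supervised entity relation classification method based on instance weight dispersion |
CN116992035A (zh) * | 2023-09-27 | 2023-11-03 | Hunan Zhengyu Software Technology Development Co., Ltd. | Method, device, computer equipment and medium for intelligent classification of proposals |
CN116992035B (zh) * | 2023-09-27 | 2023-12-08 | Hunan Zhengyu Software Technology Development Co., Ltd. | Method, device, computer equipment and medium for intelligent classification of proposals |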
Also Published As
Publication number | Publication date |
---|---|
CN104361059B (zh) | 2018-03-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
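CN104361059A (zh) | | Harmful information identification and web page classification method based on multi-instance learning |
CN103218444B (zh) | | Semantics-based Tibetan web page text classification method |
CN101430695B (zh) | | System and method for calculating the difference correlation between words |
CN107133213A (zh) | | Algorithm-based method and system for automatic text summary extraction |
US20070294223A1 (en) | | Text Categorization Using External Knowledge |
CN104951548A (zh) | | Method and system for calculating a negative public opinion index |
CN107992542A (zh) | | Similar article recommendation method based on a topic model |
CN103617157A (zh) | | Semantics-based text similarity calculation method |
CN107291723A (zh) | | Method and device for web page text classification, and method and device for web page text recognition |
CN103559199B (zh) | | Web page information extraction method and device |
CN105653668A (zh) | | DOMTree-based optimization method for web page content analysis and extraction in a cloud environment |
CN104615593A (zh) | | Method and device for automatic detection of microblog hot topics |
CN103246644B (zh) | | Method and device for processing online public opinion information |
CN103544255A (zh) | | Online public opinion information analysis method based on text semantic relevance |
EP2041669A2 (en) | | Text categorization using external knowledge |
CN102945244A (zh) | | Method for detecting and filtering duplicate Chinese web page documents based on period-mark feature strings |
CN104239485A (zh) | | Internet hidden link detection method based on statistical machine learning |
CN106126502A (zh) | | Sentiment classification system and method based on support vector machines |
Chen et al. | | Learning to predict charges for judgment with legal graph |
CN103530316A (zh) | | Scientific topic extraction method based on multi-view learning |
CN104537280B (zh) | | Protein interaction relationship identification method based on text relation similarity |
Hassan et al. | | Automatic document topic identification using wikipedia hierarchical ontology |
CN103699568B (zh) | | Method for extracting hypernym-hyponym relations between domain terms from wikis |
Croce et al. | | Semantic convolution kernels over dependency trees: smoothed partial tree kernel |
de Silva | | SAFS3 algorithm: Frequency statistic and semantic similarity based semantic classification use case |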
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2019-12-04. Address after: 250101 2F, Hanyu Jingu new media building, high tech Zone, Jinan City, Shandong Province. Patentee after: Renmin Zhongke (Shandong) Intelligent Technology Co., Ltd. Address before: 100190 Zhongguancun East Road, Beijing, No. 95, No. Patentee before: Institute of Automation, Chinese Academy of Sciences |
TR01 | Transfer of patent right | Effective date of registration: 2020-03-11. Address after: Room 201, 2/F, Hanyu Jingu new media building, No. 7000, Jingshi Road, Jinan City, Shandong Province, 250000. Patentee after: Renmin Zhongke (Jinan) Intelligent Technology Co., Ltd. Address before: 250101 2F, Hanyu Jingu new media building, high tech Zone, Jinan City, Shandong Province. Patentee before: Renmin Zhongke (Shandong) Intelligent Technology Co., Ltd. |
CP03 | Change of name, title or address | Address after: 100176 1401, 14th floor, building 8, No. 8 courtyard, No. 1 KEGU street, Beijing Economic and Technological Development Zone, Daxing District, Beijing (Yizhuang group, high-end industrial area, Beijing Pilot Free Trade Zone). Patentee after: Renmin Zhongke (Beijing) Intelligent Technology Co., Ltd. Address before: Room 201, 2/F, Hanyu Jingu new media building, 7000 Jingshi Road, Jinan City, Shandong Province. Patentee before: Renmin Zhongke (Jinan) Intelligent Technology Co., Ltd. |