CN112884053A - Website classification method, system, equipment and medium based on image-text mixed characteristics - Google Patents
Website classification method, system, equipment and medium based on image-text mixed characteristics
- Publication number
- CN112884053A (application number CN202110222323.5A)
- Authority
- CN
- China
- Prior art keywords
- model
- classification
- vector
- image
- paragraph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Model | Error rate |
---|---|
LSTM model based on image-text mixed characteristics | 9.7% |
CNN model based on text characteristics | 26.7% |
SVM model based on webpage structure characteristics | 41.9% |
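The comparison in the table can be made more concrete with a little arithmetic. The sketch below is illustrative only: it takes the three error rates from the table and derives the accuracy and the relative error reduction of the LSTM model against each baseline. It assumes that "error rate" means the share of misclassified test samples, which this record does not define. The function and variable names are hypothetical.

```python
# Error rates in percent, copied from the table above.
ERROR_RATES = {"LSTM": 9.7, "CNN": 26.7, "SVM": 41.9}


def accuracy(error_rate_pct: float) -> float:
    """Accuracy in percent, ASSUMING error rate = share of misclassified samples."""
    return 100.0 - error_rate_pct


def relative_error_reduction(baseline_pct: float, proposed_pct: float) -> float:
    """Percentage of the baseline's errors that the proposed model avoids."""
    return 100.0 * (baseline_pct - proposed_pct) / baseline_pct


if __name__ == "__main__":
    proposed = ERROR_RATES["LSTM"]
    for name, err in ERROR_RATES.items():
        line = f"{name}: error {err:.1f}%, accuracy {accuracy(err):.1f}%"
        if name != "LSTM":
            gain = relative_error_reduction(err, proposed)
            line += f", LSTM removes {gain:.1f}% of this model's errors"
        print(line)
```

Under that assumption, the LSTM model avoids about 64% of the CNN baseline's errors and about 77% of the SVM baseline's errors.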
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110222323.5A CN112884053B (en) | 2021-02-28 | 2021-02-28 | Website classification method, system, equipment and medium based on image-text mixed characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110222323.5A CN112884053B (en) | 2021-02-28 | 2021-02-28 | Website classification method, system, equipment and medium based on image-text mixed characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112884053A (en) | 2021-06-01 |
CN112884053B (en) | 2022-04-15 |
Family
ID=76054868
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202110222323.5A | Active | CN112884053B (en) | 2021-02-28 | 2021-02-28 | Website classification method, system, equipment and medium based on image-text mixed characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112884053B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115982505A (en) * | 2023-03-16 | 2023-04-18 | 北京匠数科技有限公司 | Website detection method and device based on VLM |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160328384A1 (en) * | 2015-05-04 | 2016-11-10 | Sri International | Exploiting multi-modal affect and semantics to assess the persuasiveness of a video |
US20170270123A1 (en) * | 2016-03-18 | 2017-09-21 | Adobe Systems Incorporated | Generating recommendations for media assets to be displayed with related text content |
CN109934260A (en) * | 2019-01-31 | 2019-06-25 | 中国科学院信息工程研究所 | Image, text and data fusion sensibility classification method and device based on random forest |
CN110196945A (en) * | 2019-05-27 | 2019-09-03 | 北京理工大学 | A kind of microblog users age prediction technique merged based on LSTM with LeNet |
CN110399458A (en) * | 2019-07-04 | 2019-11-01 | 淮阴工学院 | A kind of Text similarity computing method based on latent semantic analysis and accidental projection |
CN112287272A (en) * | 2020-10-27 | 2021-01-29 | 中国科学院计算技术研究所 | Method, system and storage medium for classifying website list pages |
Also Published As
Publication number | Publication date |
---|---|
CN112884053B (en) | 2022-04-15 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109885692B (en) | Knowledge data storage method, apparatus, computer device and storage medium | |
Nguyen et al. | A neural local coherence model | |
CN107463605B (en) | Method and device for identifying low-quality news resource, computer equipment and readable medium | |
US11288324B2 (en) | Chart question answering | |
US9336299B2 (en) | Acquisition of semantic class lexicons for query tagging | |
CN105022754B (en) | Object classification method and device based on social network | |
CN111241232B (en) | Business service processing method and device, service platform and storage medium | |
CN112417153B (en) | Text classification method, apparatus, terminal device and readable storage medium | |
WO2022048363A1 (en) | Website classification method and apparatus, computer device, and storage medium | |
CN110175221B (en) | Junk short message identification method by combining word vector with machine learning | |
CN110968725B (en) | Image content description information generation method, electronic device and storage medium | |
CN113392209A (en) | Text clustering method based on artificial intelligence, related equipment and storage medium | |
CN111782804B (en) | Text CNN-based co-distributed text data selection method, system and storage medium | |
CN108595426B (en) | Word vector optimization method based on Chinese character font structural information | |
Dong et al. | Cross-media similarity evaluation for web image retrieval in the wild | |
Aziguli et al. | A robust text classifier based on denoising deep neural network in the analysis of big data | |
CN112884053B (en) | Website classification method, system, equipment and medium based on image-text mixed characteristics | |
Schmitt et al. | Outlier detection on semantic space for sentiment analysis with convolutional neural networks | |
US11989526B2 (en) | Systems and methods for short text similarity based clustering | |
CN109446321A (en) | Text classification method, text classification device, terminal and computer readable storage medium | |
CN115640376A (en) | Text labeling method and device, electronic equipment and computer-readable storage medium | |
Jirathampradub et al. | A 3D-CNN siamese network for motion gesture Sign Language alphabets recognition | |
Chen et al. | Class-aware convolution and attentive aggregation for image classification | |
CN113962221A (en) | Text abstract extraction method and device, terminal equipment and storage medium | |
Hong et al. | Deep cross-modal hashing retrieval based on semantics preserving and vision transformer |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventors after: Zhang Leping; Wu Yichao; Gu Mingjuan; Bian Hao. Inventors before: Zhang Leping; Gu Mingjuan; Wu Yichao; Bian Hao |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Address after: Room 608, 6th Floor, No. 66 Jingcheng Haoyuan, Zhonglou District, Nanjing City, Jiangsu Province, 213003. Patentee after: Changzhou Jiangsuan Tiancheng Information Technology Co.,Ltd. Country or region after: China. Address before: Room 608, 6/F, 66 Jingcheng Haoyuan, Zhonglou District, Changzhou City, Jiangsu Province 213000. Patentee before: Jiangsu Jiangsuan Tiancheng Information Technology Co.,Ltd. Country or region before: China |