CN113157998A - Method, system, device and medium for polling website and judging website type through IP - Google Patents

Publication number
CN113157998A
Authority
CN
China
Prior art keywords
classification
website
pictures
characters
webpage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110222311.2A
Other languages
Chinese (zh)
Inventor
张乐平
顾明娟
吴一超
卞豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Jiangsuan Tiancheng Information Technology Co ltd
Original Assignee
Jiangsu Jiangsuan Tiancheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-02-28
Filing date
2021-02-28
Publication date
2021-07-23
Application filed by Jiangsu Jiangsuan Tiancheng Information Technology Co ltd
Priority to CN202110222311.2A
Publication of CN113157998A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method, system, device and medium for polling websites by IP and judging website category. The method comprises: capturing the webpage content of a target website; extracting the valid text and pictures in the webpages; classifying and labeling the extracted text and pictures; constructing and training network models for the text and picture data; feeding the pictures and text crawled from the webpages of a website into their respective models to obtain classification predictions, and setting weights for the picture classification result and the text classification result; counting the predictions for all pictures and text in the website to generate a picture classification distribution and a text classification distribution; and calculating a score to obtain the final classification result. The invention simulates a human visitor browsing the web, uses artificial intelligence to analyze the actual content of the website directly, covers website information such as videos, pictures and text, and combines these into an overall judgment of the website content.

Description

Method, system, device and medium for polling website and judging website type through IP
Technical Field
The invention relates to the field of computer image processing, in particular to a method, a system, equipment and a medium for polling a website and judging the category of the website through IP.
Background
At present, the main approaches to website classification on the market are:
1) classification based on webpage text:
A. building a website classification dictionary and matching the valid words of the webpage under review against it to judge the website type;
B. modeling the similarity between texts with deep learning algorithms such as CNNs;
C. classifying texts with machine learning methods such as logistic regression and Bayesian classifiers.
2) classification based on the structural features of the website.
3) classification based on website log data.
However, these methods extract only part of a website's features, such as text features and HTML structural features, and cannot characterize the webpage content comprehensively in mathematical terms. Classification accuracy is therefore low, and a large amount of manual correction is needed after machine classification.
Disclosure of Invention
To address the low accuracy of the above classification methods, and considering that pictures and text are the most direct expression of a website's content category, the invention provides a method, system, device and medium for polling websites by IP and judging website category, which can raise classification accuracy to over 85%.
The technical scheme for realizing the purpose of the invention is as follows: a method for polling websites by IP and judging website category, comprising the following steps:
inputting an IP list, starting crawler scanning, and capturing the webpage content of a target website;
judging whether the website is accessible, and recording the result in a database;
judging whether an ICP record number is present in the webpage content and whether it can be verified, and recording the result in the database;
extracting the valid text and pictures in the webpage;
classifying and labeling the extracted text and pictures;
constructing and training network models for the text and picture data, and writing the model parameters into a model library after training is finished;
feeding the pictures and text crawled from the webpages of a website into their respective models to obtain classification predictions for the pictures and text, and setting weights for the picture classification result and the text classification result; counting the predictions for all pictures and text in the website to generate a picture classification distribution and a text classification distribution; and calculating a score to obtain the final classification result.
Furthermore, the webpage content of the target website is captured with the Python crawler framework Scrapy combined with the JavaScript rendering service Splash.
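As a rough illustration of the rendering step, the sketch below builds the request a crawler could send to Splash for a JavaScript-heavy page. It assumes a Splash instance at http://localhost:8050 (a hypothetical address) and relies only on the render.html endpoint with its url and wait arguments; the helper names splash_render_url and candidate_urls are illustrative and do not come from the patent.

```python
from urllib.parse import urlencode

# Assumed address of a locally running Splash rendering service.
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(target_url: str, wait_seconds: float = 1.0) -> str:
    """Build a Splash render.html request URL for one target page."""
    query = urlencode({"url": target_url, "wait": wait_seconds})
    return f"{SPLASH_BASE}/render.html?{query}"


def candidate_urls(ip: str) -> list[str]:
    """Hypothetical helper: URLs to probe for a website hosted on an IP."""
    return [f"http://{ip}/", f"https://{ip}/"]


# 192.0.2.10 is a documentation-only address used purely as an example.
request_urls = [splash_render_url(u) for u in candidate_urls("192.0.2.10")]
```

In a Scrapy project these requests would more likely be issued through a Splash integration plugin than assembled by hand; the sketch only shows what is being asked of the rendering service.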
Further, classifying and labeling the extracted text and pictures specifically comprises: taking the webpage as the grouping dimension, the pictures and text are labeled together and assigned to one or more categories in a preset classification list.
Further, for picture data, a VGGNET model is used; for text data, the textCNN model is used, with ReLU as the activation function and convolution kernel sizes 14, 15 and 16.
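To make the text branch concrete, the following is a minimal pure-Python sketch of one textCNN forward pass: a kernel spanning k consecutive word vectors slides over the text matrix, ReLU is applied, and the maximum over all positions is kept as the feature for that kernel. The kernel sizes 14, 15 and 16 and the 9-dimensional word vectors come from the text; the random weights, the 20-word input and the single filter per kernel size are assumptions made only for illustration, and the final classification layer is omitted.

```python
import random


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def conv_relu_maxpool(text_matrix, kernel):
    """Slide a (k, d) kernel over an (n, d) text matrix, apply ReLU,
    then max-pool over all positions to obtain a single feature."""
    n, k = len(text_matrix), len(kernel)
    if n < k:
        raise ValueError("text is shorter than the convolution kernel")
    activations = []
    for start in range(n - k + 1):
        total = 0.0
        for i in range(k):
            row, weights = text_matrix[start + i], kernel[i]
            total += sum(x * w for x, w in zip(row, weights))
        activations.append(relu(total))
    return max(activations)


def textcnn_features(text_matrix, kernels):
    """One pooled feature per kernel; a real model uses many filters."""
    return [conv_relu_maxpool(text_matrix, kern) for kern in kernels]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
EMBED_DIM = 9                   # word-vector size stated in the text
KERNEL_SIZES = (14, 15, 16)     # kernel sizes stated in the text

# Hypothetical 20-word text and untrained, randomly initialised kernels.
text = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(20)]
kernels = [[[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(k)]
           for k in KERNEL_SIZES]
features = textcnn_features(text, kernels)
```

One consequence worth noting: with kernels this tall, a text must contain at least 16 words for every kernel to produce an output, so shorter texts would need padding.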
Further, picture prediction is optimized before the picture enters the model: the input picture is resized and padded into n images that form a batch, and batch prediction is performed. The output of the second-to-last layer is then taken for judging the result, producing n tensors of shape (C, J, K), and the pmap (activation map) of a given class is taken for a combined scoring judgment;
the final pmap activation map matrix is
P = (P_1 + P_2 + ... + P_n) / n
A connected bright region of the P matrix is then computed; if the area of the connected bright region for a class exceeds 50% of the total area, the picture is judged to belong to that class.
Further, during network model training, the pictures are preprocessed: each original image is expanded into 8 images, and the two-dimensional arrays of the three (r, g, b) channels are extracted; with image height and width both 224, this gives a tensor of shape (3, 224, 224);
the text is preprocessed: the collected text is converted into word vectors with word2vec, each word being represented by a 9-dimensional vector, forming an n × 9 matrix.
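The two preprocessing steps can be sketched as follows. The text does not say how one image becomes 8, so this sketch assumes the 8 symmetries of a square (4 rotations, each with and without a horizontal flip); that choice, the tiny 2 × 2 demo image and the embeddings dictionary standing in for a trained word2vec model are all assumptions.

```python
def rotate90(channel):
    """Rotate one 2-D channel 90 degrees clockwise."""
    return [list(row) for row in zip(*channel[::-1])]


def flip_horizontal(channel):
    """Mirror one 2-D channel left to right."""
    return [row[::-1] for row in channel]


def expand_to_eight(image):
    """image: list of 3 channels (r, g, b), each a square 2-D list.
    Returns 8 variants: 4 rotations, each with and without a flip."""
    variants = []
    current = image
    for _ in range(4):
        variants.append(current)
        variants.append([flip_horizontal(c) for c in current])
        current = [rotate90(c) for c in current]
    return variants


def text_to_matrix(tokens, embeddings, dim=9):
    """Look up each token's vector; unknown tokens map to zeros."""
    zero = [0.0] * dim
    return [embeddings.get(tok, zero) for tok in tokens]


# Demo with a 2 x 2 image; the real input would be 224 x 224 per channel.
tiny = [[[1, 2], [3, 4]] for _ in range(3)]
variants = expand_to_eight(tiny)

# Hypothetical embedding table in place of a trained word2vec model.
embeddings = {"movie": [0.1] * 9, "forum": [0.2] * 9}
matrix = text_to_matrix(["movie", "forum", "unknown"], embeddings)
```

Each variant keeps the channel-first layout, so a 224 × 224 input yields eight tensors of shape (3, 224, 224), and a text of n words yields an n × 9 matrix.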
Further, the model training method is as follows:
the picture matrices in the data set are input into the model for gradient descent training, and the VGGNET model parameters are written into the model library after training is finished;
the text matrices in the data set are input into textCNN for gradient descent training, and the model parameters are written into the model library after training is finished.
Further, the weight of the picture classification result is set to a and the weight of the text classification result to b, with a + b = 1. The prediction results of all pictures and text in a website are counted to generate a picture classification distribution and a text classification distribution. Let Y_{n1} be the class with the highest count among the picture classifications, with count C_{n1}, and let Y_{n2} be the class with the highest count among the text classifications, with count C_{n2}. The final scores are:
r_p = C_{n1} · a
r_t = C_{n2} · b
where r_p and r_t are the picture score and the text score respectively;
of the classes Y_{n1} and Y_{n2}, the one with the higher score is the final classification result.
A system for polling websites by IP and judging website category comprises a user interaction system, a crawler management system, a prediction service system and an AI platform;
the AI platform consists of a data labeling tool, a model version management subsystem and a task flow scheduling subsystem, and is used for model training;
the prediction service subsystem is used for classification prediction of text or pictures;
the crawler management system is used for crawler task allocation, crawler task scheduling, configuration of specific crawler extraction logic, and extraction of webpage text and pictures;
the user interaction system is used for customizing the library of websites to be scanned, and for periodically scanning and classifying the websites in that library by placing orders.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above method when executing the program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the above-mentioned method.
Compared with the prior art, the invention has the following beneficial effects: (1) the method simulates a human visitor browsing the web and uses deep learning to analyze the actual content of a website directly. Webpage content such as videos, pictures and text is extracted, two different neural networks perform classification prediction on the pictures and the text respectively, and the distributions of the prediction results are then weighted to characterize the webpage content, which greatly improves classification accuracy. Specifically, the improved algorithm uses VGGNET to predict the pictures in a webpage and TextCNN to predict the text, and finally weights the distributions of the picture and text classification results to form the judgment of the website content; (2) the invention can greatly improve the efficiency of cultural law enforcement officers, who can quickly open and investigate cases based on the classification results.
Drawings
FIG. 1 is a schematic block diagram of a system for polling a website and determining the category of the website according to IP.
Fig. 2 is a flow chart of an IP patrol process.
Fig. 3 is a schematic diagram of processing optimization before picture prediction is input into a model.
Detailed Description
With reference to fig. 1, the invention periodically captures the webpage content of websites on the monitored IPs and uses artificial intelligence techniques such as image recognition and semantic recognition to analyze the text, images and videos in the webpages and classify them. The classification method mainly comprises the following steps:
step 1, capturing the webpage content of n target websites with Scrapy + Splash;
step 2, extracting the valid text and pictures in the webpages;
step 3, classifying and labeling the extracted text and pictures; the specific method is as follows:
taking the webpage as the grouping dimension, the pictures and text are labeled together and assigned to one or more categories in a preset classification list;
step 4, training
step 4.1 network model design
1) for picture data, an ALEXNET model with the last layer (FCN layer) removed is used, and the output of the second-to-last layer is taken as the model output;
2) for text data, the textCNN model is used, with ReLU as the activation function and convolution kernel sizes 14, 15 and 16;
step 4.2 picture preprocessing
Each original image is expanded into 8 images, and the two-dimensional arrays of the three (r, g, b) channels are extracted; with image height and width both 224, this gives a tensor of shape (3, 224, 224);
step 4.3 text preprocessing
The collected text is converted into word vectors with word2vec, each word being represented by a 9-dimensional vector, forming a matrix of shape (n, 9);
step 4.4 training the model
The picture matrices in the data set are input into the model for gradient descent training, and the model parameters are written into the model library after training is finished;
the text matrices in the data set are input into textCNN for gradient descent training, and the model parameters are written into the model library after training is finished;
step 5, use of the model
The pictures and text crawled from the webpages of a website are fed into their respective models to obtain the classification predictions Y_{n1} and Y_{n2} for the pictures and the text. The weight of the picture classification result is set to 0.6 and the weight of the text classification result to 0.4. Finally, the predictions for all pictures and text in a website are counted to generate the picture classification distribution and the text classification distribution, as in the following table:
Input X                  | Classification result Y
Web page 1, picture 1    | Film and television
Web page 1, text 1       | Forum/basketball
Web page 1-2             | Film and television
Web page 2, picture 1    | Film and television
Web page 2, text 1       | Forum/football
Web page 3, picture 1    | Film and television/documentary
......                   | ......
Let Y_{n1} be the class with the highest count among the picture classifications, with count C_{n1}, and let Y_{n2} be the class with the highest count among the text classifications, with count C_{n2}. The final scores are:
r_p = C_{n1} · 0.6
r_t = C_{n2} · 0.4
where r_p and r_t are the picture score and the text score.
If class Y_{n1} has the higher score, Y_{n1} is the final classification result; otherwise Y_{n2} is.
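The scoring rule can be illustrated with a short sketch. It counts the predicted labels for pictures and for text, takes the most frequent label of each, multiplies the counts by the weights 0.6 and 0.4, and returns the label with the higher score. The function name classify_site, the sample labels and the tie-breaking choice (pictures win on equal scores) are assumptions not stated in the text.

```python
from collections import Counter


def top_label(labels):
    """Most frequent label and its count, or (None, 0) for an empty list."""
    return Counter(labels).most_common(1)[0] if labels else (None, 0)


def classify_site(picture_labels, text_labels, a=0.6, b=0.4):
    """Return (final_label, r_p, r_t) from per-item predictions."""
    y_p, c_p = top_label(picture_labels)
    y_t, c_t = top_label(text_labels)
    r_p, r_t = c_p * a, c_t * b          # r_p = C_n1 * a, r_t = C_n2 * b
    final = y_p if r_p >= r_t else y_t   # assumed: pictures win ties
    return final, r_p, r_t


# Hypothetical predictions for one website.
pictures = ["film", "film", "film", "documentary"]
texts = ["forum", "forum"]
final, r_p, r_t = classify_site(pictures, texts)
```

Because the scores use raw counts rather than proportions, a site with many more pictures than text blocks will tend to be decided by the picture label; whether that is intended is not stated.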
Picture prediction is optimized before the picture enters the model. As shown in fig. 3, the input picture is resized and padded into n images that form a batch, and batch prediction is performed. The output of the second-to-last layer is then taken for judging the result, producing n tensors of shape (C, J, K), where C is the number of classes and J × K is the number of 224 × 224 regions contained in the input picture; the pmap (activation map) of a given class is taken for a combined scoring judgment. The pmap activation map is shown in fig. 3.
The final pmap activation map matrix is:
P = (P_1 + P_2 + ... + P_n) / n
A connected bright region of the P matrix is then computed; if the area of the connected bright region for a class exceeds 50% of the total area, the picture is judged to belong to that class. The batch prediction method improves overall prediction speed and accuracy.
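A minimal sketch of the pmap decision, assuming each of the n maps is the J × K slice for one class: the maps are averaged element-wise, cells above a brightness threshold are grouped into 4-connected regions, and the picture is assigned to the class if the largest region covers more than 50% of the map. The threshold of 0.5, the use of 4-connectivity and all function names are assumptions; the text fixes only the averaging and the 50% area rule.

```python
from collections import deque


def average_maps(maps):
    """Element-wise mean of n equally sized 2-D activation maps."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(cols)]
            for r in range(rows)]


def largest_bright_region(p, threshold=0.5):
    """Size of the largest 4-connected region with values above threshold."""
    rows, cols = len(p), len(p[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or p[r0][c0] <= threshold:
                continue
            seen[r0][c0] = True
            queue, size = deque([(r0, c0)]), 0
            while queue:                      # breadth-first flood fill
                r, c = queue.popleft()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and p[nr][nc] > threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            best = max(best, size)
    return best


def is_class_picture(maps, threshold=0.5, min_fraction=0.5):
    """Apply the 50% area rule to the averaged map P."""
    p = average_maps(maps)
    area = len(p) * len(p[0])
    return largest_bright_region(p, threshold) > min_fraction * area


# Two hypothetical 2 x 3 activation maps for one class.
m1 = [[0.9, 0.8, 0.1], [0.7, 0.9, 0.2]]
m2 = [[0.7, 0.6, 0.3], [0.9, 0.7, 0.0]]
belongs = is_class_picture([m1, m2])
```

In this example four of the six averaged cells exceed the threshold and are connected, so the picture would be assigned to the class.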
Given an IP range, the tool determines whether a website is hosted on each IP and sorts the accessible websites into fine-grained categories such as pornography, gambling, reactionary content, literature/domestic-intrigue novels, audio-visual works, and film and animation. In addition, based on the text and picture features of a number of copyrighted works, evidence of website infringement can be collected and a list of suspected infringing audio-visual works produced.
As shown in fig. 2, the specific IP polling process is as follows:
1) a user inputs an IP list;
2) starting crawler scanning;
3) judging whether a certain website is accessible or not, and recording the result to a database;
4) judging whether the record number exists in the webpage content and whether the record number can be checked, and recording the result to a database;
5) inputting the extracted characters and pictures into a prediction system;
6) recording the classification result of the website corresponding to each IP in the database.
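The polling flow above can be sketched as a simple loop. The fetch, extract and predict callables are hypothetical stand-ins for the crawler, the extraction logic and the prediction service, and the regular expression is only an assumed shape for an ICP record number; verifying the number against an official registry, which step 4 also mentions, is not covered here.

```python
import re

# Assumed shape of an ICP record number, e.g. one Chinese character,
# "ICP备", digits, "号" and an optional "-N" suffix.
ICP_PATTERN = re.compile(r"[\u4e00-\u9fa5]ICP备\d+号(?:-\d+)?")


def poll_ips(ip_list, fetch, extract, predict):
    """Run the polling flow; each callable is a hypothetical stand-in."""
    records = []
    for ip in ip_list:
        html = fetch(ip)                      # None means not accessible
        record = {"ip": ip, "accessible": html is not None,
                  "icp_number": None, "category": None}
        if html is not None:
            match = ICP_PATTERN.search(html)
            record["icp_number"] = match.group(0) if match else None
            texts, pictures = extract(html)
            record["category"] = predict(texts, pictures)
        records.append(record)                # stands in for a database write
    return records


# Fake pages keyed by documentation-only IP addresses.
fake_pages = {
    "192.0.2.1": "<html>苏ICP备12345678号 movie list</html>",
    "192.0.2.2": None,
}
results = poll_ips(
    ["192.0.2.1", "192.0.2.2"],
    fetch=fake_pages.get,
    extract=lambda html: ([html], []),
    predict=lambda texts, pictures: "film and television",
)
```

Passing the crawler and classifier in as functions keeps the flow itself testable without network access or trained models.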
The invention also provides a system for polling websites by IP and judging website category, comprising a user interaction system, a crawler management system, a prediction service system and an AI platform;
the AI platform consists of subsystems such as a data labeling tool, model version management and task flow scheduling, and addresses how the models are trained;
the prediction service subsystem provides classification prediction for text or pictures, and addresses how the models are used;
the crawler management system provides crawler task allocation, crawler task scheduling and configuration of specific crawler extraction logic, and addresses how to extract the text and pictures of a webpage quickly and accurately;
the user interaction system lets a user customize the library of websites to be scanned, scan and classify those websites periodically by placing orders, and flags possible content problems (pictures or text) on a website, such as pornographic pictures, for law enforcement to review.
Further, the present invention also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the classification method when executing the computer program.
The invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program carries out the above classification method.
The technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings of the embodiments of the present invention, and it should be noted that only some embodiments, but not all embodiments, are provided in the implementation of the present invention.
Examples
1,089,793 IP addresses within a city were scanned to detect whether a website was hosted on each. For each IP hosting a website, the first two pages of the site were accessed, the presence of an ICP record number was checked, keywords were extracted, and the website was classified by attribute, with a focus on three categories of website: games, film and music, and novels. The overall scan results are shown in the table below.
[Table: overall scan results; provided as images in the original publication and not reproduced here]
The table shows that the method can help law enforcement departments inspect the websites within their jurisdiction and classify them accurately, with classification accuracy reaching 99.79%.
In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. Each functional unit may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for polling websites and judging the category of the websites through IP is characterized by comprising the following steps:
inputting an IP list, starting crawler scanning, and capturing webpage content of a target website;
judging whether the website is accessible or not, and recording the result to a database;
judging whether an ICP record number is present in the webpage content and whether it can be verified, and recording the result in the database;
extracting the valid text and pictures in the webpage;
classifying and labeling the extracted text and pictures;
constructing and training network models for the text and picture data, and writing the model parameters into a model library after training is finished;
feeding the pictures and text crawled from the webpages of a website into their respective models to obtain classification predictions for the pictures and text, and setting weights for the picture classification result and the text classification result; counting the predictions for all pictures and text in the website to generate a picture classification distribution and a text classification distribution; and calculating a score to obtain the final classification result.
2. The method of claim 1, wherein the webpage content of the target website is captured with the Python crawler framework Scrapy combined with the JavaScript rendering service Splash;
classifying and labeling the extracted text and pictures specifically comprises: taking the webpage as the grouping dimension, the pictures and text are labeled together and assigned to one or more categories in a preset classification list.
3. The method according to claim 1, characterized in that for picture data a VGGNET model is used; for text data, the textCNN model is used, with ReLU as the activation function and convolution kernel sizes 14, 15 and 16.
4. The method of claim 1 or 3, wherein picture prediction is optimized before the picture enters the model: the input picture is resized and padded into n images that form a batch, and batch prediction is performed; the output of the second-to-last layer is then taken for judging the result, producing n tensors of shape (C, J, K), and the pmap (activation map) of a given class is taken for a combined scoring judgment;
the final pmap activation map matrix is
P = (P_1 + P_2 + ... + P_n) / n
A connected bright region of the P matrix is then computed; if the area of the connected bright region for a class exceeds 50% of the total area, the picture is judged to belong to that class.
5. The method of claim 4, wherein during network model training the pictures are preprocessed: each original image is expanded into 8 images, and the two-dimensional arrays of the three (r, g, b) channels are extracted; with image height and width both 224, this gives a tensor of shape (3, 224, 224);
the text is preprocessed: the collected text is converted into word vectors with word2vec, each word being represented by a 9-dimensional vector, forming an n × 9 matrix.
6. The method of claim 1, wherein the model training method is as follows:
inputting the picture matrices in the data set into the model for gradient descent training, and writing the VGGNET model parameters into the model library after training is finished;
and inputting the text matrices in the data set into textCNN for gradient descent training, and writing the model parameters into the model library after training is finished.
7. The method according to claim 1, wherein the weight of the picture classification result is set to a and the weight of the text classification result to b, with a + b = 1; the prediction results of all pictures and text in a website are counted to generate a picture classification distribution and a text classification distribution; Y_{n1} is the class with the highest count among the picture classifications, with count C_{n1}; Y_{n2} is the class with the highest count among the text classifications, with count C_{n2}; the final scores are:
r_p = C_{n1} · a
r_t = C_{n2} · b
where r_p and r_t are the picture score and the text score respectively;
of the classes Y_{n1} and Y_{n2}, the one with the higher score is the final classification result.
8. A system for inspecting websites and judging website types through IP is characterized by comprising a user interaction system, a crawler management system, a prediction service system and an AI platform;
the AI platform consists of a data marking tool, a model version management subsystem and a task flow scheduling subsystem and is used for carrying out model training;
the prediction service subsystem is used for classified prediction of characters or pictures;
the crawler management system is used for crawler task allocation, crawler task scheduling, specific crawler extraction logic setting and webpage character and picture extraction;
the user interaction system is used for customizing the library of websites to be scanned, and for periodically scanning and classifying the websites in that library by placing orders.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202110222311.2A 2021-02-28 2021-02-28 Method, system, device and medium for polling website and judging website type through IP Pending CN113157998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110222311.2A CN113157998A (en) 2021-02-28 2021-02-28 Method, system, device and medium for polling website and judging website type through IP

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110222311.2A CN113157998A (en) 2021-02-28 2021-02-28 Method, system, device and medium for polling website and judging website type through IP

Publications (1)

Publication Number Publication Date
CN113157998A true CN113157998A (en) 2021-07-23

Family

ID=76883725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110222311.2A Pending CN113157998A (en) 2021-02-28 2021-02-28 Method, system, device and medium for polling website and judging website type through IP

Country Status (1)

Country Link
CN (1) CN113157998A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114021064A (en) * 2022-01-06 2022-02-08 北京微步在线科技有限公司 Website classification method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341183A (en) * 2017-05-31 2017-11-10 中国科学院信息工程研究所 Website classification method based on comprehensive features of darknet websites
CN110852368A (en) * 2019-11-05 2020-02-28 南京邮电大学 Global and local feature embedding and image-text fusion sentiment analysis method and system
CN111259141A (en) * 2020-01-13 2020-06-09 北京工业大学 Social media corpus sentiment analysis method based on multi-model fusion
CN112330463A (en) * 2020-11-27 2021-02-05 杭州安恒信息技术股份有限公司 Method, device, equipment and medium for detecting legal qualification of financing website
CN112347244A (en) * 2019-08-08 2021-02-09 四川大学 Method for detecting pornography and gambling websites based on mixed feature analysis

Similar Documents

Publication Publication Date Title
CN106599155B (en) Webpage classification method and system
CN108334605B (en) Text classification method and device, computer equipment and storage medium
CN108629043B (en) Webpage target information extraction method, device and storage medium
CN109325165B (en) Network public opinion analysis method, device and storage medium
CN112749608B (en) Video auditing method, device, computer equipment and storage medium
CN111860171B (en) Method and system for detecting irregular-shaped target in large-scale remote sensing image
CN108073568A Keyword extraction method and device
CN109299258A Public opinion event detection method, device and equipment
CN112541476B (en) Malicious webpage identification method based on semantic feature extraction
CN116097250A (en) Layout aware multimodal pre-training for multimodal document understanding
Termritthikun et al. NU-InNet: Thai food image recognition using convolutional neural networks on smartphone
KR102407057B1 (en) Systems and methods for analyzing the public data of SNS user channel and providing influence report
CN116415017B (en) Advertisement sensitive content auditing method and system based on artificial intelligence
CN114218958A (en) Work order processing method, device, equipment and storage medium
CN110990563A (en) Artificial intelligence-based traditional culture material library construction method and system
CN113469214A (en) False news detection method and device, electronic equipment and storage medium
CN110969332A (en) Enterprise screening method and device
CN111723287A (en) Content and service recommendation method and system based on large-scale machine learning
CN112818206B (en) Data classification method, device, terminal and storage medium
CN113157998A (en) Method, system, device and medium for polling website and judging website type through IP
CN112396091B (en) Social media image popularity prediction method, system, storage medium and application
CN112241470A (en) Video classification method and system
CN106570910B Automatic image annotation method based on autoencoder features and neighborhood model
Das et al. A comparative analysis and study of a fast parallel CNN based deepfake video detection model with feature selection (FPC-DFM)
CN112507115B (en) Method and device for classifying emotion words in barrage text and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Zhang Leping, Wu Yichao, Gu Mingjuan, Bian Hao
Inventor before: Zhang Leping, Gu Mingjuan, Wu Yichao, Bian Hao
RJ01 Rejection of invention patent application after publication
Application publication date: 20210723