CN114928472A - Method for filtering bad site grey list based on full-volume circulation main domain name - Google Patents

Method for filtering bad site grey list based on full-volume circulation main domain name

Info

Publication number
CN114928472A
Authority
CN
China
Prior art keywords
domain name
filtering
bad
model
bad site
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210416876.9A
Other languages
Chinese (zh)
Other versions
CN114928472B (en)
Inventor
张兆心
孟月阳
柴婷婷
赵东
陈俊仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Weihai
Original Assignee
Harbin Institute of Technology Weihai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Weihai
Priority to CN202210416876.9A
Publication of CN114928472A
Application granted
Publication of CN114928472B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/101 Access control lists [ACL]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for filtering a grey list of bad sites based on the full set of main domain names in circulation, comprising the following steps: step 1, constructing a discrimination model for bad site domain names based on character similarity, to coarsely filter suspected bad site domain names out of the full set of domain names; step 2, identifying whether each domain name can be resolved and is used for a Web service; step 3, performing coarse filtering based on IP similarity; step 4, classifying the geographical regions of the domain names based on IP geolocation; step 5, analyzing the accuracy of the bad site domain name grey list obtained by coarse filtering; and step 6, iteratively optimizing coarse filtering steps 1 and 3. By filtering on the similarity of domain name characters and of service IPs, the method greatly narrows the set of existing domain names to be examined, greatly reduces the time spent acquiring and analyzing web page text and snapshots, and achieves efficient and accurate filtering of the full set of domain names.

Description

Method for filtering bad site grey list based on full-volume circulation main domain name
Technical Field
The invention relates to the technical field of constructing domain name grey lists of bad sites, and in particular to a method for filtering a bad site grey list based on the full set of main domain names in circulation.
Background
With the rapid development of computer networks, the internet has become an indispensable part of human life. The domain name system maps between IP addresses and domain names for the applications and services in a network. Domain names make it more convenient for people to access the internet. Today, however, networks are flooded with a large number of bad sites devoted to pornography, gambling, fraud and the like. Such sites not only harm people's mental well-being but can also seriously endanger their property. Therefore, identifying, monitoring and managing bad sites is very important.
About 260 million main domain names are in circulation globally; roughly 300,000 new domain names are added each day and roughly 300,000 expire each day. At present, bad sites are mainly identified from web page text and web page snapshots, but acquiring and analyzing these is very time-consuming. Consequently, there is no systematic method for efficiently filtering the full set of circulating main domain names, and a bad site domain name grey list covering the full set cannot be constructed effectively.
Disclosure of Invention
To address the technical problems of long processing time and high cost in existing methods that filter a grey list from the full set of domain names based on web page text and web page snapshots, the invention provides a method for filtering a bad site grey list based on the full set of main domain names in circulation.
To this end, the technical scheme of the invention is a method for filtering a bad site grey list based on the full set of main domain names in circulation, comprising the following steps:
step 1, extracting features from the character strings of existing bad site domain names, establishing a bad site keyword phrase library, and constructing a discrimination model for bad site domain names based on character similarity, so as to coarsely filter suspected bad site domain names out of the full set of domain names;
step 2, constructing an IP and port fast-scanning model, obtaining the service IP and port attribute information of the suspected bad site domain names, and identifying whether each domain name can be resolved and is used for a Web service;
step 3, establishing an IP mapping range model of bad site domain names from the existing group of bad site service IPs, and performing coarse filtering based on IP similarity;
step 4, classifying the geographical regions of the domain names based on IP geolocation technology;
step 5, analyzing the accuracy of the coarsely filtered bad site domain name grey list by using existing bad site identification technology;
step 6, iteratively optimizing coarse filtering steps 1 and 3.
Further, the structure of bad site domain names falls into two categories: in the first, the domain name contains English words or Chinese pinyin; in the second, the domain name is composed of a random character sequence. In the method based on the character similarity model, a pornography and gambling keyword phrase library is constructed to match keywords for the first category, and an LSTM neural network model is trained to judge whether the character sequences are randomly generated for the second category.
Further, the pornography and gambling keyword phrase library is constructed as follows: a dictionary of 370,000 English words and 405 Chinese pinyin syllables are merged into an English and Chinese pinyin dictionary, longest-word matching is performed against a set of 390,000 pornography and gambling domain names, and the English and pinyin phrases that occur with high frequency are extracted to form the keyword phrase library used for subsequent keyword matching and filtering.
Further, the LSTM neural network model is trained as follows: 700,000 Alexa domain names and 780,000 random character sequence domain names are used as the training set and test set, and the LSTM neural network is divided into 3 layers: 1. a preprocessing layer, which pads the domain name character sequence to a length of 75, then maps each character to an integer index, and finally converts the positive integer indices into fixed-size dense vectors that serve as character embeddings; 2. a long short-term memory layer, with the number of units set to 128 and dropout set to 0.5 to avoid overfitting; 3. an output layer, which produces a two-class output.
Further, suspected bad site domain names are coarsely filtered as follows: keywords are first matched against the constructed pornography and gambling keyword phrase library to judge whether the domain name contains a keyword phrase; if it does, the domain name is considered possibly used for a bad site; if it does not, the trained LSTM neural network model judges whether the domain name is composed of a random character sequence, and if it is, the domain name is considered possibly used for a bad site.
Further, coarse filtering based on IP similarity is performed by analyzing the similarity of the IPs stored in step 2 against the existing IP mapping range model; if an IP falls within one of the model's mapping ranges, the IP is considered to be used for serving bad content.
Further, coarse filtering steps 1 and 3 are iteratively optimized as follows:
step S1, dynamically updating the pornography and gambling keyword phrase library, adding newly appearing high-frequency English and pinyin phrases to the library, and deleting phrases in the library that have not been used for a long time;
step S2, dynamically updating the IP mapping range model, merging newly appearing bad site service IPs into the model, and shrinking IP ranges in the model that have had no hits for a long time.
The beneficial effect of the invention is that, when the bad site grey list is filtered from the full set of circulating main domain names, filtering by the similarity of domain name characters and of service IPs reduces the 260 million domain names by 90%, which greatly reduces the time spent acquiring and analyzing web page text and snapshots while filtering the full set of domain names efficiently and accurately. The method provided by the invention can achieve high-speed, high-precision filtering of the full set of domain names.
Drawings
FIG. 1 is a schematic diagram of the construction of the keyword phrase library, the LSTM neural network model and the IP mapping range model according to the present invention;
FIG. 2 is a schematic flow chart of filtering the bad site grey list according to the present invention.
Detailed Description
The present invention will be further described with reference to the following examples.
As shown in FIG. 1, the first stage of the present invention requires two steps to construct a character similarity model and an IP mapping range model respectively. The method comprises the following specific steps:
Step (1): Analysis of bad site domain names shows that their structure falls into two categories. The first category is domain names that contain English words or Chinese pinyin (the language varies; the domain names of Chinese pornography and gambling sites contain many pinyin forms), for example tiyubocai.cn. The second category is domain names composed of random character sequences (possibly generated by an algorithm), for example vdqw-96.com and 12034.cn. Therefore, in the method based on the character similarity model, a pornography and gambling keyword phrase library is constructed to match keywords in the first category of domain names, and an LSTM neural network is trained to judge whether the character sequences in the second category are randomly generated.
(1) Constructing the pornography and gambling keyword phrase library: a dictionary of 370,000 English words and 405 Chinese pinyin syllables (without tone marks) are merged into an English and Chinese pinyin dictionary. Longest-word matching is performed against a set of 390,000 pornography and gambling domain names, and the English and pinyin phrases that occur with high frequency are extracted to form the keyword phrase library used for subsequent keyword matching and filtering.
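The construction of the phrase library can be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: it assumes greedy forward longest matching, treats each matched dictionary word as a phrase, and uses a tiny made-up dictionary and domain set. The names `longest_match_segments`, `build_phrase_library` and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def longest_match_segments(label, dictionary, min_len=2):
    """Greedy forward longest-word matching of dictionary entries in a label."""
    max_len = max((len(word) for word in dictionary), default=0)
    found, i = [], 0
    while i < len(label):
        match = None
        # Try the longest candidate first and shrink until a dictionary hit.
        for j in range(min(len(label), i + max_len), i + min_len - 1, -1):
            if label[i:j] in dictionary:
                match = label[i:j]
                break
        if match:
            found.append(match)
            i += len(match)
        else:
            i += 1  # no dictionary word starts here; skip one character
    return found


def build_phrase_library(domains, dictionary, min_count=2):
    """Keep words that occur in at least min_count known bad domains (assumed threshold)."""
    counts = Counter()
    for domain in domains:
        label = domain.lower().split(".")[0]  # assumes the first label is the main name
        counts.update(set(longest_match_segments(label, dictionary)))
    return {word for word, count in counts.items() if count >= min_count}


# Toy data for illustration only.
dictionary = {"ti", "yu", "bo", "cai", "bocai", "casino", "bet"}
domains = ["tiyubocai.cn", "bocai88.com", "casinobet.net", "example.org"]
library = build_phrase_library(domains, dictionary)
```

With real data the dictionary would hold the merged English and pinyin entries, and the threshold would be tuned against the known bad domain set.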
(2) Training the LSTM neural network model: 700,000 Alexa domain names and 780,000 random character sequence domain names (drawn from the random character sequence domain names among the 390,000 pornography and gambling domain names and from DGA domain names) are used as the training set and test set. The neural network is divided into 3 layers: 1. Preprocessing layer: pads the domain name character sequence to a length of 75, then maps each character to an integer index, and finally converts the positive integer indices into fixed-size dense vectors that serve as character embeddings. 2. Long short-term memory layer: the number of units is set to 128 and dropout is set to 0.5 to avoid overfitting. 3. Output layer: produces a two-class output. The final accuracy was 94% on the training set and 96% on the test set.
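The first part of the preprocessing layer (padding to length 75 and mapping characters to integer indices) can be sketched in plain Python as below. The alphabet, the use of index 0 for padding and unknown characters, and right-side padding are assumptions made for illustration; the embedding, the 128-unit LSTM layer with dropout 0.5 and the two-class output would be built in a deep learning framework and are not shown.

```python
import string

MAX_LEN = 75  # sequence length stated in the description

# Hypothetical alphabet; index 0 is reserved for padding and unknown characters.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_domain(domain, max_len=MAX_LEN):
    """Map a domain name to a fixed-length list of positive integer indices."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in domain.lower()[:max_len]]
    # Right-pad with zeros so every sequence has exactly max_len entries.
    return indices + [0] * (max_len - len(indices))


encoded = encode_domain("vdqw-96.com")
```

Each encoded sequence would then be fed to an embedding layer that turns every index into a dense vector.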
Step (2): DNS resolution is performed on the existing 390,000 pornography and gambling domain names to obtain all of their service IP addresses. When IP addresses are applied for, a batch of consecutive addresses is usually requested so that some can serve as backups. Therefore, all of the pornography and gambling IPs are mapped to ranges, and a pornography and gambling IP mapping range model is constructed for subsequent filtering.
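One possible way to turn individual service IPs into ranges is sketched below. The source does not say how ranges are derived, so the choice of expanding each IPv4 address to its enclosing /24 block and merging adjacent blocks is an assumption, and `build_ip_range_model` is a hypothetical name. The sample addresses come from documentation-only address blocks.

```python
import ipaddress


def build_ip_range_model(ips, prefix_len=24):
    """Expand each known IPv4 address to its enclosing block and merge adjacent blocks."""
    blocks = {
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in ips
    }
    # collapse_addresses merges overlapping and adjacent networks.
    return list(ipaddress.collapse_addresses(blocks))


# Made-up example addresses standing in for resolved bad site service IPs.
known_bad_ips = ["203.0.113.7", "203.0.113.200", "198.51.100.14", "198.51.101.9"]
model = build_ip_range_model(known_bad_ips)
```

Here the two neighbouring /24 blocks merge into a single /23, reflecting the idea that consecutive addresses tend to belong to the same operator.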
As shown in FIG. 2, the method for filtering a bad site grey list based on the full set of main domain names in circulation includes the following specific steps:
Step 1: Features are extracted from the character strings of existing bad site domain names, a bad site keyword phrase library is established, and a discrimination model for bad site domain names is constructed based on character similarity, so as to coarsely filter suspected bad site domain names out of the full set of domain names. The full set of 260 million main domain names is taken as input and filtered by character similarity. The filtering is performed in two parts. First, keywords are matched against the constructed pornography and gambling keyword phrase library to judge whether the domain name contains a keyword phrase; if it does, the domain name is considered possibly used for a bad site. If no sensitive keyword is present, the trained LSTM model judges whether the domain name is composed of a random character sequence; if it is, the domain name is considered possibly used for a bad site. Together, these two parts coarsely filter the suspected bad site domain names.
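The two-part decision in step 1 can be sketched as follows. The randomness check is passed in as a callable so that a trained LSTM classifier could be plugged in; `toy_is_random` is only a placeholder heuristic for this illustration, and all names and data are hypothetical.

```python
def contains_keyword(domain, phrase_library):
    """Part 1: does the main label contain any phrase from the library?"""
    label = domain.lower().split(".")[0]
    return any(phrase in label for phrase in phrase_library)


def coarse_filter(domains, phrase_library, is_random):
    """Return (domain, reason) pairs for domains suspected of serving bad sites."""
    suspected = []
    for domain in domains:
        if contains_keyword(domain, phrase_library):
            suspected.append((domain, "keyword"))
        elif is_random(domain):  # Part 2 runs only when no keyword matched
            suspected.append((domain, "random"))
    return suspected


def toy_is_random(domain):
    """Placeholder for the trained model: flags labels that contain no vowels."""
    label = domain.lower().split(".")[0]
    return not any(ch in "aeiou" for ch in label)


suspected = coarse_filter(
    ["tiyubocai.cn", "vdqw-96.com", "example.org"],
    phrase_library={"bocai"},
    is_random=toy_is_random,
)
```

Running the cheap keyword match first means the model is only evaluated on domains that the phrase library did not already flag.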
Step 2: An IP and port fast-scanning model is constructed to obtain the service IP and port attribute information of the suspected bad site domain names and to identify whether each domain name can be resolved and is used for a Web service. The service IPs and port attributes are obtained for the domain name set produced by the previous step. The A records are obtained through DNS resolution, and all available IPs are stored. A port scan then checks whether ports such as 80, 443 and 8080 are open, thereby selecting the IPs that are used for Web services.
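A simple, sequential version of the resolution and port check might look like the sketch below, using only the standard `socket` module. It is not the fast-scanning model itself, which would need concurrency and rate control; the function names and the one-second timeout are assumptions.

```python
import socket

WEB_PORTS = (80, 443, 8080)  # ports named in the description


def resolve_a_records(domain):
    """Return the IPv4 addresses for a domain, or an empty list if resolution fails."""
    try:
        infos = socket.getaddrinfo(
            domain, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def is_port_open(ip, port, timeout=1.0):
    """Attempt a TCP connection; connect_ex returns 0 when the port accepts it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((ip, port)) == 0


def web_service_ips(domain, ports=WEB_PORTS):
    """Keep the resolved IPs that have at least one Web port open."""
    return [
        ip
        for ip in resolve_a_records(domain)
        if any(is_port_open(ip, port) for port in ports)
    ]
```

Calling `web_service_ips` on a suspected domain returns the subset of its resolved IPs that appear to host a Web service; domains with an empty result can be dropped from further processing.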
Step 3: An IP mapping range model of bad site domain names is established from the existing group of bad site service IPs, and coarse filtering is performed based on IP similarity. The IPs stored in step 2 are analyzed for similarity against the existing IP mapping range model. If an IP falls within one of the model's mapping ranges, the IP is considered to be used for serving bad content. In this way, bad sites are coarsely filtered by IP similarity.
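The range check in step 3 can be sketched as an interval lookup. The sketch below assumes the model is a list of IPv4 CIDR blocks and uses binary search over the sorted range starts; the class name `IpRangeModel` and the sample blocks are hypothetical.

```python
import ipaddress
from bisect import bisect_right


class IpRangeModel:
    """Sorted, non-overlapping integer ranges built from IPv4 CIDR blocks."""

    def __init__(self, cidrs):
        nets = ipaddress.collapse_addresses(ipaddress.ip_network(c) for c in cidrs)
        self._ranges = [
            (int(net.network_address), int(net.broadcast_address)) for net in nets
        ]
        self._starts = [start for start, _ in self._ranges]

    def contains(self, ip):
        """True if the address falls inside any mapped range."""
        value = int(ipaddress.ip_address(ip))
        # Find the last range whose start is <= value, then check its end.
        pos = bisect_right(self._starts, value) - 1
        return pos >= 0 and value <= self._ranges[pos][1]


ranges = IpRangeModel(["198.51.100.0/23", "203.0.113.0/24"])
hits = [ip for ip in ["198.51.101.9", "203.0.114.1", "192.0.2.1"] if ranges.contains(ip)]
```

A binary search keeps each lookup logarithmic in the number of ranges, which matters when millions of resolved IPs are checked.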
Step 4: The geographical regions of the domain names are classified based on IP geolocation technology. The physical location attribute of each service IP is obtained through IP geolocation and classified as domestic or foreign. The IPs retained by the preceding filtering steps are stored together with their corresponding domain names.
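The domestic or foreign split can be sketched as below. A real system would query an IP geolocation database; here `GEO_TABLE` is a hypothetical stand-in that assigns invented country codes to documentation-only address blocks, and `home_country="CN"` is an assumption.

```python
import ipaddress

# Hypothetical geolocation table: documentation address blocks with invented country codes.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "CN"),
    (ipaddress.ip_network("198.51.100.0/24"), "US"),
]


def classify_region(ip, geo_table=GEO_TABLE, home_country="CN"):
    """Return 'domestic', 'foreign', or 'unknown' for a service IP."""
    addr = ipaddress.ip_address(ip)
    for network, country in geo_table:
        if addr in network:
            return "domestic" if country == home_country else "foreign"
    return "unknown"


def group_by_region(domain_ips):
    """Store each (domain, ip) pair under its region class."""
    grouped = {"domestic": [], "foreign": [], "unknown": []}
    for domain, ip in domain_ips:
        grouped[classify_region(ip)].append((domain, ip))
    return grouped


grouped = group_by_region(
    [("a.example", "192.0.2.10"), ("b.example", "198.51.100.5"), ("c.example", "203.0.113.1")]
)
```

Keeping the domain and IP together in each group preserves the correspondence that the step asks to be stored.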
Step 5: The accuracy of the coarsely filtered bad site domain name grey list is analyzed by using existing bad site identification technology. The domain names obtained from the preceding filtering steps are judged precisely by an existing bad site discrimination model. That model is based on web page content and snapshots, so acquiring the text content and snapshots is time-consuming. However, the preceding filtering steps have already reduced the set of domain names by 90%, and the remaining domain names are all highly suspected of being used for bad sites. Therefore, this step can efficiently produce the bad site domain name grey list and evaluate the effect of the preceding filtering steps.
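The arithmetic behind the 90% reduction, and one way to score the coarse filter against the existing recognizer, can be sketched as follows. The precision metric and the sample verdicts are illustrative assumptions; the source does not specify how accuracy is computed.

```python
def remaining_after_reduction(total, reduction_percent):
    """Number of domains left after removing reduction_percent of the total."""
    return total * (100 - reduction_percent) // 100


def grey_list_precision(verdicts):
    """Fraction of coarse-filtered domains confirmed as bad by the existing recognizer."""
    if not verdicts:
        return 0.0
    confirmed = sum(1 for is_bad in verdicts.values() if is_bad)
    return confirmed / len(verdicts)


# 260 million domains reduced by 90% leaves 26 million for content-based checks.
remaining = remaining_after_reduction(260_000_000, 90)

# Made-up verdicts from a content-based recognizer, for illustration only.
verdicts = {"a.example": True, "b.example": True, "c.example": False, "d.example": True}
precision = grey_list_precision(verdicts)
```

Only the remaining tenth of the domains incurs the cost of fetching page text and snapshots, and the measured precision feeds back into step 6.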
Step 6: Coarse filtering is iteratively optimized: the grey list of pornography and gambling domain names obtained from the full set is stored, and steps 1 and 3 are iteratively optimized. The optimization proceeds as follows. Step S1: the pornography and gambling keyword phrase library is dynamically updated; newly appearing, frequently used English and pinyin phrases are added to the library, and phrases in the library that have not been used for a long time are deleted. Step S2: the IP mapping range model is dynamically updated; newly appearing bad site service IPs are merged into the model, and IP ranges in the model that have had no hits for a long time are shrunk.
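Steps S1 and S2 can be sketched as two small update functions. The sketch assumes each phrase and each IP range carries the date of its last hit; the thresholds `min_count` and `max_idle_days` are hypothetical, and stale IP ranges are simply removed here as a simplification of shrinking them.

```python
from datetime import date, timedelta


def update_phrase_library(library, new_phrase_counts, today, min_count=50, max_idle_days=180):
    """S1: drop phrases idle for too long, then add new high-frequency phrases.

    library maps phrase -> date it last matched a confirmed bad site.
    """
    limit = timedelta(days=max_idle_days)
    updated = {p: last for p, last in library.items() if today - last <= limit}
    for phrase, count in new_phrase_counts.items():
        if count >= min_count:
            updated[phrase] = today
    return updated


def update_ip_ranges(ranges, new_hit_ranges, today, max_idle_days=180):
    """S2: drop ranges with no recent hits, then merge in newly observed ranges.

    ranges maps CIDR string -> date of the last confirmed hit.
    """
    limit = timedelta(days=max_idle_days)
    updated = {r: last for r, last in ranges.items() if today - last <= limit}
    for cidr in new_hit_ranges:
        updated[cidr] = today
    return updated


today = date(2024, 1, 1)
library = update_phrase_library(
    {"bocai": date(2023, 12, 1), "oldphrase": date(2022, 1, 1)},
    {"newbet": 120, "rare": 3},
    today,
)
ranges = update_ip_ranges(
    {"203.0.113.0/24": date(2023, 11, 15), "198.51.100.0/24": date(2021, 6, 1)},
    ["192.0.2.0/24"],
    today,
)
```

Running both updates after each verification pass keeps the two coarse filters aligned with the domains that step 5 actually confirms.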
When the method filters the bad site grey list from the full set of circulating main domain names, filtering by the similarity of domain name characters and of service IPs reduces the 260 million domain names by 90%, which greatly reduces the time spent acquiring and analyzing web page text and snapshots while filtering the full set of domain names efficiently and accurately. The method provided by the invention can achieve high-speed, high-precision filtering of the full set of domain names.
The above description is only an embodiment of the present invention and should not limit its scope; equivalent substitutions of elements, and equivalent changes and modifications made according to the claims, shall fall within the scope of the present invention.

Claims (7)

1. A method for filtering a bad site grey list based on the full set of main domain names in circulation, characterized by comprising the following steps:
step 1, extracting features from the character strings of existing bad site domain names, establishing a bad site keyword phrase library, and constructing a discrimination model for bad site domain names based on character similarity, so as to coarsely filter suspected bad site domain names out of the full set of domain names;
step 2, constructing an IP and port fast-scanning model, obtaining the service IP and port attribute information of the suspected bad site domain names, and identifying whether each domain name can be resolved and is used for a Web service;
step 3, establishing an IP mapping range model of bad site domain names from the existing group of bad site service IPs, and performing coarse filtering based on IP similarity;
step 4, classifying the geographical regions of the domain names based on IP geolocation technology;
step 5, analyzing the accuracy of the coarsely filtered bad site domain name grey list by using existing bad site identification technology;
step 6, iteratively optimizing coarse filtering steps 1 and 3.
2. The method for filtering a bad site grey list based on the full set of main domain names in circulation according to claim 1, characterized in that: in the method based on the character similarity model, a pornography and gambling keyword phrase library is constructed to match keywords for the first category of domain names, and an LSTM neural network model is trained to judge whether the character sequences are randomly generated for the second category of domain names.
3. The method for filtering a bad site grey list based on the full set of main domain names in circulation according to claim 2, characterized in that: the pornography and gambling keyword phrase library is constructed by merging a dictionary of 370,000 English words and 405 Chinese pinyin syllables into an English and Chinese pinyin dictionary, performing longest-word matching against a set of 390,000 pornography and gambling domain names, and extracting the English and pinyin phrases that occur with high frequency to form the keyword phrase library used for subsequent keyword matching and filtering.
4. The method for filtering a bad site grey list based on the full set of main domain names in circulation according to claim 3, characterized in that: the LSTM neural network model is trained by using 700,000 Alexa domain names and 780,000 random character sequence domain names as the training set and test set, and the LSTM neural network is divided into 3 layers: 1. a preprocessing layer, which pads the domain name character sequence to a length of 75, then maps each character to an integer index, and finally converts the positive integer indices into fixed-size dense vectors that serve as character embeddings; 2. a long short-term memory layer, with the number of units set to 128 and dropout set to 0.5 to avoid overfitting; 3. an output layer, which produces a two-class output.
5. The method for filtering a bad site grey list based on the full set of main domain names in circulation according to claim 4, characterized in that: suspected bad site domain names are coarsely filtered by first matching keywords against the constructed pornography and gambling keyword phrase library to judge whether the domain name contains a keyword phrase; if it does, the domain name is considered possibly used for a bad site; if it does not, the trained LSTM neural network model judges whether the domain name is composed of a random character sequence, and if it is, the domain name is considered possibly used for a bad site.
6. The method for filtering a bad site grey list based on the full set of main domain names in circulation according to claim 5, characterized in that: coarse filtering based on IP similarity is performed by analyzing the similarity of the IPs stored in step 2 against the existing IP mapping range model, and if an IP falls within one of the model's mapping ranges, the IP is considered to be used for serving bad content.
7. The method for filtering a bad site grey list based on the full set of main domain names in circulation according to claim 6, characterized in that coarse filtering steps 1 and 3 are iteratively optimized as follows:
step S1, dynamically updating the pornography and gambling keyword phrase library, adding newly appearing, frequently used English and pinyin phrases to the library, and deleting phrases in the library that have not been used for a long time;
step S2, dynamically updating the IP mapping range model, merging newly appearing bad site service IPs into the model, and shrinking IP ranges in the model that have had no hits for a long time.
CN202210416876.9A 2022-04-20 2022-04-20 Bad site gray list filtering method based on full circulation main domain name Active CN114928472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210416876.9A CN114928472B (en) 2022-04-20 2022-04-20 Bad site gray list filtering method based on full circulation main domain name

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210416876.9A CN114928472B (en) 2022-04-20 2022-04-20 Bad site gray list filtering method based on full circulation main domain name

Publications (2)

Publication Number Publication Date
CN114928472A true CN114928472A (en) 2022-08-19
CN114928472B CN114928472B (en) 2023-07-18

Family

ID=82807565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210416876.9A Active CN114928472B (en) 2022-04-20 2022-04-20 Bad site gray list filtering method based on full circulation main domain name

Country Status (1)

Country Link
CN (1) CN114928472B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040243537A1 (en) * 2001-09-07 2004-12-02 Jiang Wang Contents filter based on the comparison between similarity of content character and correlation of subject matter
CN105897714A (en) * 2016-04-11 2016-08-24 天津大学 Botnet detection method based on DNS (Domain Name System) flow characteristics
US20180227223A1 (en) * 2017-02-06 2018-08-09 Silver Peak Systems, Inc. Multi-level Learning for Classifying Traffic Flows
CN110191103A (en) * 2019-05-10 2019-08-30 长安通信科技有限责任公司 A kind of DGA domain name detection classification method
US10440042B1 (en) * 2016-05-18 2019-10-08 Area 1 Security, Inc. Domain feature classification and autonomous system vulnerability scanning
CN111866196A (en) * 2019-04-26 2020-10-30 深信服科技股份有限公司 Domain name traffic characteristic extraction method, device, equipment and readable storage medium
US20200349430A1 (en) * 2019-05-03 2020-11-05 Webroot Inc. System and method for predicting domain reputation
CN112948725A (en) * 2021-03-02 2021-06-11 北京六方云信息技术有限公司 Phishing website URL detection method and system based on machine learning
US20210314352A1 (en) * 2020-04-03 2021-10-07 Paypal, Inc. Detection of User Interface Imitation
CN114095176A (en) * 2021-10-29 2022-02-25 北京天融信网络安全技术有限公司 Malicious domain name detection method and device
CN114266251A (en) * 2021-12-27 2022-04-01 北京天融信网络安全技术有限公司 Malicious domain name detection method and device, electronic equipment and storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040243537A1 (en) * 2001-09-07 2004-12-02 Jiang Wang Contents filter based on the comparison between similarity of content character and correlation of subject matter
CN105897714A (en) * 2016-04-11 2016-08-24 天津大学 Botnet detection method based on DNS (Domain Name System) flow characteristics
US10440042B1 (en) * 2016-05-18 2019-10-08 Area 1 Security, Inc. Domain feature classification and autonomous system vulnerability scanning
US20180227223A1 (en) * 2017-02-06 2018-08-09 Silver Peak Systems, Inc. Multi-level Learning for Classifying Traffic Flows
CN111866196A (en) * 2019-04-26 2020-10-30 深信服科技股份有限公司 Domain name traffic characteristic extraction method, device, equipment and readable storage medium
US20200349430A1 (en) * 2019-05-03 2020-11-05 Webroot Inc. System and method for predicting domain reputation
CN110191103A (en) * 2019-05-10 2019-08-30 长安通信科技有限责任公司 A kind of DGA domain name detection classification method
US20210314352A1 (en) * 2020-04-03 2021-10-07 Paypal, Inc. Detection of User Interface Imitation
CN112948725A (en) * 2021-03-02 2021-06-11 北京六方云信息技术有限公司 Phishing website URL detection method and system based on machine learning
CN114095176A (en) * 2021-10-29 2022-02-25 北京天融信网络安全技术有限公司 Malicious domain name detection method and device
CN114266251A (en) * 2021-12-27 2022-04-01 北京天融信网络安全技术有限公司 Malicious domain name detection method and device, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LONGXI LI et al.: "Identifying Gambling and Porn Websites with Image Recognition", Springer International Publishing AG *
XIANG TIAN et al.: "VegaStar: An Illegal Domain Detection System on Large-Scale Video Traffic", 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications / 12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE) *
刘乐群; 史君华: "Research on website content monitoring based on IP address segments", Modern Electronics Technique (现代电子技术), no. 21 *
杜刚 et al.: "Technology for discovering unknown bad domain names based on the creative capability of artificial intelligence", Telecom Engineering Technics and Standardization (电信工程技术与标准化) *

Also Published As

Publication number Publication date
CN114928472B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
Abdullah et al. Fake news classification bimodal using convolutional neural network and long short-term memory
CN112347244B (en) Yellow-based and gambling-based website detection method based on mixed feature analysis
CN111241389B (en) Sensitive word filtering method and device based on matrix, electronic equipment and storage medium
Maier et al. Machine translation vs. multilingual dictionaries assessing two strategies for the topic modeling of multilingual text collections
CN108053545B (en) Certificate verification method and device, server and storage medium
CN108171073A (en) A kind of private data recognition methods based on the parsing driving of code layer semanteme
CN105760439A (en) Figure cooccurrence relation graph establishing method based on specific behavior cooccurrence network
CN109299469B (en) Method for identifying complex address in long text
CN111723870B (en) Artificial intelligence-based data set acquisition method, apparatus, device and medium
CN111104801A (en) Text word segmentation method, system, device and medium based on website domain name
CN115238688B (en) Method, device, equipment and storage medium for analyzing association relation of electronic information data
CN112989414A (en) Mobile service data desensitization rule generation method based on width learning
CN111369980A (en) Voice detection method and device, electronic equipment and storage medium
CN113254995B (en) Data desensitization method, device, system and computer readable medium
CN110737770B (en) Text data sensitivity identification method and device, electronic equipment and storage medium
CN117312904A (en) Data classification and classification method and related products
CN116663536A (en) Matching method and device for clinical diagnosis standard words
CN114928472B (en) Bad site gray list filtering method based on full circulation main domain name
CN113761137A (en) Method and device for extracting address information
CN116562296A (en) Geographic named entity recognition model training method and geographic named entity recognition method
CN115438340A (en) Mining behavior identification method and system based on morpheme characteristics
CN111159360B (en) Method and device for obtaining query topic classification model and query topic classification
CN113268986A (en) Unit name matching and searching method and device based on fuzzy matching algorithm
CN115619443A (en) Company operation prediction method and system for emotion analysis based on annual report of listed company
CN111488622A (en) Method and device for detecting webpage tampering behavior and related components

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Zhang Zhaoxin, Chen Junren, Chai Tingting, Zhao Dong, Meng Yueyang
Inventor before: Zhang Zhaoxin, Meng Yueyang, Chai Tingting, Zhao Dong, Chen Junren
GR01 Patent grant