CN106911717A - Domain name detection method and device - Google Patents


Info

Publication number
CN106911717A
Authority
CN
China
Prior art keywords
domain name
condition code
detected
normal
letter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710242441.6A
Other languages
Chinese (zh)
Inventor
曹磊
徐业礼
童宁
吴湘宁
徐江明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Yaxin Network Security Industry Technology Research Institute Co Ltd
Original Assignee
Chengdu Yaxin Network Security Industry Technology Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Yaxin Network Security Industry Technology Research Institute Co Ltd
Priority to CN201710242441.6A
Publication of CN106911717A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1466 Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/30 Managing network names, e.g. use of aliases or nicknames
    • H04L61/3015 Name registration, generation or assignment
    • H04L61/3025 Domain name generation or assignment

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Embodiments of the invention provide a domain name detection method and device, relating to the communications field, which can solve the prior-art problem that domain names generated by a domain generation algorithm (DGA) cannot be identified. The method includes: obtaining the condition code of the domain names to be detected and the condition code of normal domain names, where a condition code indicates the distribution of letters, or of letters and digits, in domain names; calculating the feature gap between the condition code of the domain names to be detected and the condition code of the normal domain names, where the feature gap indicates the degree of similarity between the two condition codes; and determining, according to the feature gap, whether an accessed domain name was generated by a DGA. The invention is used for detecting domain names.

Description

A domain name detection method and device
Technical field
The present invention relates to the communications field, and in particular to a domain name detection method and device.
Background
With the continuous development of Internet technology, networks have become part of every aspect of people's lives. However, hacker intrusion, a by-product of that development, has also become pervasive and threatens network security ever more seriously. By implanting a remotely controllable malicious program, such as a Trojan horse, in a terminal that accesses the network, a hacker can take control of that terminal.
To counter hacker intrusion, a firewall can be used to monitor and block the malicious programs that hackers implant on controlled terminals. With the development of technology, however, more and more malicious programs actively initiate connections themselves. Such connections are usually made over the HTTP protocol, which lets them bypass the firewall and reach a remote server, giving the hacker remote control of the terminal. To solve this problem, the prior art provides a blacklist-based domain name detection method: when a domain name accessed by a user through a terminal matches a domain name in the blacklist, the user is forbidden from continuing to access that domain name through the terminal.
Although this method can block some of the connections actively initiated by implanted malicious programs, more and more malicious programs have begun to generate domain names with a domain generation algorithm (DGA). The blacklist-based method of the prior art cannot identify domain names generated by a DGA. Moreover, a DGA generates domain names quickly and can automatically produce more than 50,000 random domain names per day, so the blacklist contains far fewer domain names than the DGA generates. Malicious programs can therefore usually bypass prior-art domain name detection, which lowers the success rate of detecting abnormal domain names.
Summary of the invention
The application provides a domain name detection method and device that can solve the prior-art problem that domain names generated by a DGA cannot be identified.
In a first aspect, embodiments of the invention provide a domain name detection method, including: obtaining the condition code of the domain names to be detected and the condition code of normal domain names, where a condition code indicates the distribution of letters, or of letters and digits, in domain names; calculating the feature gap between the condition code of the domain names to be detected and the condition code of the normal domain names, where the feature gap indicates the degree of similarity between the two condition codes; and determining, according to the feature gap, whether an accessed domain name was generated by a DGA.
In a second aspect, embodiments of the invention provide a domain name detection device, including: a processing module, configured to obtain the condition code of the domain names to be detected and the condition code of normal domain names, where a condition code indicates the distribution of letters, or of letters and digits, in domain names; the processing module being further configured to calculate the feature gap between the condition code of the domain names to be detected and the condition code of the normal domain names, where the feature gap indicates the degree of similarity between the two condition codes; and a detection module, configured to determine, according to the feature gap, whether an accessed domain name was generated by a DGA.
Embodiments of the invention provide a domain name detection method and device that obtain the condition code of the domain names to be detected and the condition code of normal domain names, and calculate the feature gap between the two. Because a condition code indicates the distribution of letters, or of letters and digits, in domain names, the feature gap shows how similar the condition code of the domain names to be detected is to that of normal domain names. When the two condition codes differ greatly, the domain name to be detected is more likely to have been generated by a DGA, so whether an accessed domain name was generated by a DGA can be determined from the feature gap. The domain name detection method provided by the embodiments therefore solves the prior-art problem that domain names generated by a DGA cannot be identified, raises the success rate of detecting abnormal domain names, and improves the user experience.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the character distribution probability in domain names generated by a DGA, provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of the character distribution probability in normal domain names, provided by an embodiment of the invention;
Fig. 3 is a schematic flowchart of a domain name detection method provided by an embodiment of the invention;
Fig. 4 is a schematic flowchart of a domain name detection method provided by another embodiment of the invention;
Fig. 5 is a schematic diagram of a domain name detection device provided by an embodiment of the invention;
Fig. 6 is a schematic diagram of a domain name detection device provided by another embodiment of the invention.
Detailed description
The technical solutions of the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
To describe the technical solutions of the embodiments clearly, the embodiments use words such as "first" and "second" to distinguish identical or similar items whose functions and effects are essentially the same. Those skilled in the art will understand that words such as "first" and "second" do not limit quantity or order of execution.
As a by-product of the development of Internet technology, malicious attacks led by hackers seriously affect people's lives. By implanting a remotely controllable malicious program, such as a Trojan horse, in a terminal that accesses the network, a hacker can take control of that terminal. Existing malicious programs such as Trojan horses and bots typically listen on a network port and wait for a remote control server to connect to them, so as to receive the hacker's remote control. To guard against such attacks, a firewall can be set up to block the connection between the malicious program and the remote server. With the development of technology, however, malicious programs have begun to initiate connections actively from inside the network in order to avoid being discovered or detected. These actively initiated connections are usually made over the HTTP protocol, which lets them bypass the firewall and reach the remote server, achieving the goal of receiving remote control.
To solve this problem, the prior art provides a blacklist-based domain name detection method: when a domain name accessed by a user through a terminal matches a domain name in the blacklist, the user is forbidden from continuing to access that domain name through the terminal. This approach does effectively stop some malicious programs from receiving remote control, but the detection method is simplistic and depends heavily on the blacklist. Meanwhile, malicious programs, with Zeus and Conficker as representative examples, have gradually begun to use a domain generation algorithm (DGA) to generate domain names periodically and automatically, and to access the generated domain names actively. A DGA generates domain names quickly; for example, a Conficker variant can automatically generate up to more than 50,000 random domain names per day. By comparison, the blacklist contains far fewer domain names than the DGA generates, so malicious programs can usually bypass prior-art domain name detection, which lowers the success rate of the prior-art detection method.
In view of the above problem, the applicant has observed that the vast majority of characters in commonly used domain names are digits and letters. Domain names are composed of digits and letters so that they are easy to remember, and the DNS service, deployed around the world, converts these memorable domain names into IP addresses for access. Therefore, when a domain name is a normal domain name not generated by a DGA, the characters chosen for it, for ease of memory, mostly form words or phrases with actual meaning. By contrast, malicious programs such as Trojan horses generate domain names with a DGA in order to bypass existing detection systems, and most DGA-generated domain names are chosen at random. The character composition of normal domain names therefore differs to some extent from that of DGA-generated domain names.
Fig. 1 shows a schematic diagram of the character distribution probability in domain names generated by a DGA, and Fig. 2 shows a schematic diagram of the character distribution probability in normal domain names. In both figures the horizontal axis shows the digits, the letters and "-", which are the most common characters in domain names, and the vertical axis shows the probability of occurrence across all samples counted. Figs. 1 and 2 show that in DGA-generated domain names the digits and letters occur with fairly even probability, and some commonly used letters actually occur less often. The character distribution of normal domain names, by contrast, shows clearly differentiated probabilities, with some characters occurring far more often than others. In addition, DGA-generated domain names use digits and some uncommon letters, such as "x", "y" and "z", more than normal domain names do.
Based on the above characteristics, as shown in Fig. 3, an embodiment of the invention provides a domain name detection method, including:
101. Obtain the condition code of the domain names to be detected and the condition code of normal domain names.
Here, a condition code indicates the distribution of letters, or of letters and digits, in domain names.
Specifically, the domain names to be detected are domain names that may include ones generated by a DGA; they may be all accessed domain names in the system's dynamic log data. By way of example, the domain names to be detected may be the domain names accessed by all users in the network; they may be obtained by using bypass detection equipment to collect the web access logs of all users in the network and parsing out all domain names in the logs as the domain names to be detected. The normal domain names are domain names confirmed not to include any generated by a DGA, and may be obtained in advance; for example, the top 1,000,000 domain names in the Alexa website ranking may be obtained and used as the normal domain names.
It should be noted that when the domain names to be detected or the normal domain names include many domain names, they may be grouped so as to obtain multiple condition codes. The domain names to be detected may be grouped by destination IP address, so that domain names corresponding to the same destination IP address fall into the same group. They may also be grouped by identical subdomain name, where the subdomain name is what remains after the TLD and ccTLD are removed. For example, for the two domain names to be detected hezl3.xk80p.com and 14lyu.xk80p.com, the subdomain name after removing the suffix ".com" is xk80p, so they can be placed in the same group.
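The subdomain-based grouping can be sketched as follows. This is a minimal illustration, not the patent's implementation: the suffix list `SUFFIXES` and the function names `group_key` and `group_domains` are hypothetical, and a real system would need a complete TLD/ccTLD list.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny suffix list (assumption); longer suffixes come first.
SUFFIXES = (".com.cn", ".com", ".net", ".org", ".cn")


def group_key(domain: str) -> str:
    """Return the label left of the TLD/ccTLD suffix, e.g. 'xk80p' for 'hezl3.xk80p.com'."""
    name = domain.lower().rstrip(".")
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    # Keep only the right-most remaining label as the grouping key.
    return name.rsplit(".", 1)[-1]


def group_domains(domains):
    """Group domain names that share the same key."""
    groups = defaultdict(list)
    for domain in domains:
        groups[group_key(domain)].append(domain)
    return dict(groups)
```

With the two example names from the text, `group_domains(["hezl3.xk80p.com", "14lyu.xk80p.com"])` places both under the key `xk80p`.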
Likewise, because the normal domain names may include many domain names, a certain number of domain names may be drawn at random from the normal domain names to generate multiple sample groups, each containing the same number of domain names. The sample groups serve as the reference against which the grouped domain names to be detected are compared, in order to find forged domain names that may have been generated by a DGA. By way of example, 1000 sample groups of 500 domain names each may be randomly generated from the top 1,000,000 domain names in the Alexa ranking.
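A minimal sketch of the sampling step, assuming the normal domain names are already loaded into a list. The function name `make_sample_groups` and the `seed` parameter are illustrative assumptions, and the text does not say whether sampling within a group is with or without replacement; this sketch samples without replacement.

```python
import random


def make_sample_groups(normal_domains, n_groups=1000, group_size=500, seed=None):
    """Draw n_groups random groups of group_size distinct domain names each."""
    rng = random.Random(seed)  # a seed makes the sampling reproducible
    return [rng.sample(normal_domains, group_size) for _ in range(n_groups)]
```

The defaults mirror the example figures in the text (1000 groups of 500 names); the list passed in must contain at least `group_size` names.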
Specifically, the condition code indicates the distribution of letters, or of letters and digits, in domain names, where the letters may include a to z and the digits may include 0 to 9. Further, the condition code may be the character string obtained by sorting the letters, or the letters and digits, in the domain names by number of occurrences. When the domain names to be detected or the normal domain names are divided into multiple domain name groups, multiple condition codes may be obtained; each condition code then indicates the distribution of the occurrence counts of the digits or letters within one domain name group.
By way of example, the condition code of the domain names to be detected may be obtained as follows:
Group the domain names to be detected to obtain multiple domain name groups;
When the condition code indicates the distribution of letters and digits in the domain names, count the number of times each digit and letter occurs in the domain name group; for example, "x" occurs 230 times, "3" occurs 59 times, and so on. Select the 10 digits and letters with the highest occurrence counts in the group; if fewer than 10 digits and letters are obtained, the information in the group is considered insufficient for subsequent judgment and the group is discarded directly. Arrange the 10 selected digits and letters in descending order of occurrence count to obtain a 10-byte condition code;
When the condition code indicates the distribution of letters in the domain names, count the number of times each letter occurs in the domain name group. Select the 10 letters with the highest occurrence counts in the group; if fewer than 10 letters are obtained, the information in the group is considered insufficient for subsequent judgment and the group is discarded directly. Arrange the 10 selected letters in descending order of occurrence count to obtain a 10-byte condition code;
The condition code of the normal domain names may be obtained as follows:
When the condition code indicates the distribution of letters and digits in the domain names, count the number of times each digit and letter occurs in the normal domain names. Select the 10 digits and letters with the highest occurrence counts, and arrange them in descending order of occurrence count to obtain a 10-byte condition code;
When the condition code indicates the distribution of letters in the domain names, count the number of times each letter occurs in the normal domain names. Select the 10 letters with the highest occurrence counts, and arrange them in descending order of occurrence count to obtain a 10-byte condition code.
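The counting and ranking described above can be sketched as follows. The function name `condition_code` and its parameters are illustrative assumptions; the text does not say how ties between equally frequent characters are broken, so this sketch breaks them by character order.

```python
from collections import Counter
import string

LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)


def condition_code(domains, include_digits=True, length=10):
    """Return the `length` most frequent characters of a domain name group,
    in descending order of occurrence count, or None if the group has
    fewer than `length` distinct characters and should be discarded."""
    allowed = LETTERS | DIGITS if include_digits else LETTERS
    counts = Counter(ch for d in domains for ch in d.lower() if ch in allowed)
    if len(counts) < length:
        return None  # too little information for a judgment
    # Descending count; ties broken by character order (assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "".join(ch for ch, _ in ranked[:length])
```

Calling the function once with `include_digits=True` and once with `include_digits=False` yields the two kinds of condition code that the text distinguishes.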
102. Calculate the feature gap between the condition code of the domain names to be detected and the condition code of the normal domain names.
Here, the feature gap indicates the degree of similarity between the condition code of the domain names to be detected and the condition code of the normal domain names.
Specifically, the larger the feature gap, the lower the similarity between the condition code of the domain names to be detected and the condition code of the normal domain names, and the more likely it is that a domain name to be detected is a forged domain name generated by a DGA. Conversely, a smaller feature gap indicates higher similarity between the two condition codes and a lower probability that the domain name to be detected is a forged domain name generated by a DGA.
Further, whether the condition code indicates the distribution of letters or the distribution of letters and digits in the domain names, the Jaccard similarity measure between the condition code of the domain names to be detected and the condition code of the normal domain names may be calculated and used as the feature gap between the two condition codes.
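A minimal sketch of a Jaccard measure over two condition codes, treating each code as a set of characters. The text does not give a formula; because it states that a larger feature gap means lower similarity, this sketch assumes the gap is the Jaccard distance, that is, one minus the Jaccard similarity coefficient. Both function names are illustrative.

```python
def jaccard_similarity(code_a: str, code_b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the character sets of the two codes."""
    a, b = set(code_a), set(code_b)
    if not a and not b:
        return 1.0  # two empty codes are treated as identical
    return len(a & b) / len(a | b)


def jaccard_distance(code_a: str, code_b: str) -> float:
    """Assumed feature gap: 0.0 for identical character sets, 1.0 for disjoint ones."""
    return 1.0 - jaccard_similarity(code_a, code_b)
```

Note that a set-based measure ignores the order of characters within a condition code; only which characters made the top 10 matters.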
When the condition code indicates the distribution of letters and digits in the domain names, the Damerau-Levenshtein distance between the condition code of the domain names to be detected and the condition code of the normal domain names may be calculated with the Damerau-Levenshtein distance algorithm; this distance is then the feature gap between the two condition codes.
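An edit distance of this kind can be sketched as below. For brevity this is the restricted variant, usually called optimal string alignment, which counts insertions, deletions, substitutions and swaps of adjacent characters but never edits a substring twice; it is not the unrestricted Damerau-Levenshtein algorithm, and the text does not say which variant is intended. The function name `osa_distance` is illustrative.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Unlike the set-based Jaccard measure, an edit distance is sensitive to the ranking order inside the 10-character condition codes.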
103. Determine, according to the feature gap, whether the accessed domain name was generated by a DGA.
Specifically, after the feature gap between the condition code of the domain names to be detected and the condition code of the normal domain names is obtained, it may be compared with a preset standard value. When the comparison result meets the requirement, it can be determined that the distribution of letters in the domain names to be detected differs markedly from that in normal domain names, or that the distribution of letters and digits in the domain names to be detected differs markedly from that in normal domain names, and thus that the domain name to be detected was generated by a DGA.
Further, when a domain name to be detected is determined to have been generated by a DGA, it may be marked.
Embodiments of the invention provide a domain name detection method that obtains the condition code of the domain names to be detected and the condition code of normal domain names, and calculates the feature gap between the two. Because a condition code indicates the distribution of letters, or of letters and digits, in domain names, the feature gap shows how similar the condition code of the domain names to be detected is to that of normal domain names. When the two condition codes differ greatly, the domain name to be detected is more likely to have been generated by a DGA, so whether an accessed domain name was generated by a DGA can be determined from the feature gap. The domain name detection method provided by the embodiments therefore solves the prior-art problem that domain names generated by a DGA cannot be identified, raises the success rate of detecting abnormal domain names, and improves the user experience.
Further, as shown in Fig. 4, an embodiment of the invention provides a domain name detection method, including:
201. Obtain a web access log, and parse the web access log to obtain the domain names to be detected.
Specifically, the web access log may be the log of the web accesses of all users in the network, obtained by bypass detection equipment; all domain names in the log are parsed out and serve as the basis for subsequent processing. Unlike the common approach of analyzing the domain names accessed by users on the basis of DNS traffic, using the actual web access log of user terminals as the raw data is more targeted and more accurate than using DNS traffic.
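A minimal sketch of extracting domain names from a web access log. The log format is not specified in the text, so this assumes, purely for illustration, that each log line may contain one or more full `http://` or `https://` URLs; the function name `domains_from_log` is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches a URL up to the next whitespace (assumed log layout).
URL_RE = re.compile(r"https?://\S+")


def domains_from_log(lines):
    """Collect the distinct host names of all URLs found in the log lines."""
    found = set()
    for line in lines:
        for url in URL_RE.findall(line):
            host = urlsplit(url).hostname  # lower-cased, port stripped
            if host:
                found.add(host)
    return found
```

A log that records only a `Host` header or a proxy-specific field layout would need a different parser.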
202. Remove the top-level domain (TLD) suffix and the country-code top-level domain (ccTLD) suffix from the domain names to be detected and the normal domain names.
Specifically, removing the top-level domain (TLD) suffix and the country-code top-level domain (ccTLD) suffix from the domain names to be detected and the normal domain names prevents the TLD and ccTLD from strongly distorting the letter statistics, or the letter-and-digit statistics, of the domain names to be detected and the normal domain names, and thereby affecting the final detection result.
203. Remove the prefix "www" from the domain names to be detected and the normal domain names.
Specifically, removing the prefix "www" from the domain names to be detected and the normal domain names prevents the prefix from distorting the letter statistics, or the letter-and-digit statistics, of the domain names to be detected and the normal domain names, and thereby affecting the final detection result.
204. Remove characters other than 0-9, a-z, ".", "_" and "-" from the domain names to be detected and the normal domain names.
Specifically, a regular expression such as ^[0-9a-z._-]+ may be used for this purpose. The regular expression means that the character set consists only of 0-9 and a-z plus the three characters ".", "_" and "-", 39 characters in total. A domain name formed by freely combining characters from this set is treated as a valid domain name, and every domain name that does not meet this condition is discarded as invalid.
Further, because a browser converts a Chinese domain name into a Punycode domain name (beginning with "xn--"), Chinese domain names in the domain names to be detected and the normal domain names need to be converted to and identified as Punycode, to prevent them from being treated as conventional domain names and misjudged as DGA domain names.
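The Punycode handling can be sketched with Python's built-in `idna` codec, which implements the older IDNA 2003 rules. The function names `to_punycode` and `is_punycode` are illustrative, and what the detector does with a recognized Punycode name (skip it or handle it separately) is left open by the text.

```python
def to_punycode(domain: str) -> str:
    """Convert an internationalized domain name to its ASCII (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def is_punycode(domain: str) -> bool:
    """True if any label of the domain name carries the 'xn--' ACE prefix."""
    return any(label.startswith("xn--") for label in domain.lower().split("."))
```

Names flagged by `is_punycode` can then be kept out of the character statistics, since their encoded labels look random even though the underlying name is meaningful.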
Further, all domain names in the domain names to be detected and the normal domain names may also be converted to lowercase, to make subsequent comparison uniform.
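Steps 202 to 204 and the lowercase conversion can be combined into one normalization pass, sketched below. The suffix list `SUFFIXES` and the function name `normalize` are hypothetical, and the order differs slightly from the text: lowercasing is done first so that the later checks are case-insensitive. Following the explanation of step 204, a name containing a character outside the 39-character set is discarded and not repaired.

```python
import re

# Hypothetical, deliberately tiny suffix list (assumption); longer suffixes come first.
SUFFIXES = (".com.cn", ".com", ".net", ".org", ".cn")
VALID = re.compile(r"[0-9a-z._-]+")


def normalize(domain: str):
    """Lowercase, strip 'www.' and the TLD/ccTLD suffix, and validate the rest.
    Returns the cleaned name, or None if the name is invalid."""
    name = domain.strip().lower()
    if name.startswith("www."):
        name = name[4:]
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    if not VALID.fullmatch(name):
        return None  # empty, or contains a character outside the allowed set
    return name
```

For instance, `normalize("WWW.Example.COM")` gives `example`, so neither the prefix nor the suffix contributes to the character counts.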
205. Obtain the condition code of the domain names to be detected and the condition code of the normal domain names.
For details, refer to step 101 in the above embodiment; they are not repeated here.
206. When the condition code indicates the distribution of letters, or of letters and digits, in the domain names, calculate the Jaccard similarity measure between the condition code of the domain names to be detected and the condition code of the normal domain names.
Here, the Jaccard similarity measure is the feature gap between the condition code of the domain names to be detected and the condition code of the normal domain names.
Specifically, when the condition code indicates the distribution of letters in the domain names, the Jaccard similarity measure may be calculated between the condition code of each domain name group of the grouped domain names to be detected and the condition code of each sample group of the grouped normal domain names. The arithmetic mean of the resulting Jaccard similarity measures is then calculated and used as the feature gap between the condition code of the domain names to be detected and the condition code of the normal domain names for the case where the condition code indicates the distribution of letters.
When the condition code indicates the distribution of letters and digits in the domain names, the Jaccard similarity measure may be calculated between the condition code of each domain name group of the grouped domain names to be detected and the condition code of each sample group of the grouped normal domain names. The arithmetic mean of the resulting Jaccard similarity measures is then calculated and used as the feature gap between the condition code of the domain names to be detected and the condition code of the normal domain names for the case where the condition code indicates the distribution of letters and digits.
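The averaging in step 206 can be sketched as follows. Two points are assumptions: the measure is taken as a Jaccard distance (one minus the Jaccard similarity coefficient), consistent with a larger gap meaning lower similarity, and the mean is computed per domain name group across all sample groups, which the text does not state explicitly. The function names are illustrative.

```python
from statistics import mean


def jaccard_distance(code_a: str, code_b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the character sets of two condition codes."""
    a, b = set(code_a), set(code_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def mean_feature_gap(group_code: str, sample_codes) -> float:
    """Arithmetic mean of the gap between one domain name group's condition
    code and the condition code of every normal sample group."""
    return mean(jaccard_distance(group_code, code) for code in sample_codes)
```

Averaging over many sample groups smooths out the variation between individual random samples of normal domain names.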
207. When the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name is greater than or equal to 0.8, determine that the accessed domain name is a domain name generated by a DGA algorithm.
The above step indicates a clear gap between the condition code of the domain name to be detected and the condition code of the normal domain name. This means that the letter distribution in the domain name to be detected differs markedly from that in the normal domain name, or that the digit and letter distribution in the domain name to be detected differs markedly from that in the normal domain name. All domain names to be detected that meet the above condition may therefore be forged domain names generated by a DGA algorithm.
208. When the ratio of the Jaccard difference to the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name is less than 0.1, determine that the accessed domain name is a domain name generated by a DGA algorithm.
Here, the Jaccard difference is the absolute value of the difference between the Jaccard similarity measure obtained from the condition code indicating the distribution of letters in a domain name and the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name.
The above step prevents domain names to be detected and normal domain names that contain many digits from being misidentified as forged domain names (for example, China Mobile's website 10086.cn). Although digits are used relatively rarely in the digit and letter distribution of normal domain names, it is not appropriate to judge a domain name as forged merely because it contains many digits.
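The two threshold checks in steps 207 and 208 can be sketched in Python as below. The thresholds 0.8 and 0.1 come from the text; the function and parameter names are hypothetical. The text does not settle how the two checks are combined (the claims join them with "and/or"), so the sketch keeps them as separate predicates and leaves the combination to the caller.

```python
def exceeds_gap_threshold(gap_letters_digits: float, threshold: float = 0.8) -> bool:
    """Step 207: the letters-and-digits gap is at least the threshold."""
    return gap_letters_digits >= threshold


def ratio_below_threshold(
    gap_letters: float, gap_letters_digits: float, threshold: float = 0.1
) -> bool:
    """Step 208: |letters gap - letters-and-digits gap| divided by the
    letters-and-digits gap is below the threshold.

    Assumption: a zero letters-and-digits gap yields False, since the
    ratio is undefined in that case.
    """
    if gap_letters_digits == 0:
        return False
    difference = abs(gap_letters - gap_letters_digits)
    return difference / gap_letters_digits < threshold
```

A small ratio means the letter-only measure and the letters-and-digits measure nearly agree, so the presence of digits is not what drives the gap.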
209. When the condition code indicates the distribution of letters and digits in a domain name, calculate the Damerau-Levenshtein distance between the condition code of the domain name to be detected and the condition code of the normal domain name using the Damerau-Levenshtein distance algorithm.
Here, the Damerau-Levenshtein distance serves as the feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name.
210. When the Damerau-Levenshtein distance is greater than or equal to 0.9, determine that the accessed domain name is a domain name generated by a DGA algorithm.
The above step indicates a very clear gap between the condition code of the domain name to be detected and the condition code of the normal domain name. This means that the letter distribution in the domain name to be detected differs markedly from that in the normal domain name, or that the digit and letter distribution in the domain name to be detected differs markedly from that in the normal domain name. All domain names to be detected that meet the above condition may therefore be forged domain names generated by a DGA algorithm.
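A minimal Python sketch of the distance used in steps 209 and 210 follows. It implements the optimal string alignment variant of the Damerau-Levenshtein distance (insertion, deletion, substitution, and transposition of adjacent characters). Because the text compares the distance with 0.9, the sketch assumes the raw edit count is divided by the length of the longer condition code; that normalization and the function names are assumptions, not something the source states.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance, a restricted Damerau-Levenshtein
    distance in which no substring is edited more than once."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer input (an assumption)."""
    longest = max(len(a), len(b))
    return osa_distance(a, b) / longest if longest else 0.0
```

Two 10-character condition codes with no positions in common give a normalized distance of 1.0, which would exceed the 0.9 threshold.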
211. Match the domain names in the access records of a terminal against the domain names determined to be generated by DGA algorithms.
Specifically, once the detection result has identified the domain names generated by DGA algorithms, the result can be taken back to the original web access logs, that is, the original log records from which the domain name groups were generated, and matched against them. The matching result is used to filter the web access behavior of users, which reduces false positives as far as possible and, during the behavior filtering, identifies potentially infected machines and the severity of the infection.
When the domain names in the access records of the terminal and the domain names determined to be generated by DGA algorithms satisfy the infection matching condition, step 212 is performed.
When the domain names in the access records of the terminal and the domain names determined to be generated by DGA algorithms satisfy the deletion matching condition, step 213 is performed.
212. Determine that the terminal is an infected terminal.
213. Delete the domain names that satisfy the deletion matching condition from the domain names determined to be generated by DGA algorithms.
For example, the web access logs of the past 7 days can be compared with all domain names determined to be generated by DGA algorithms. Only the access records in which such domain names were accessed on more than 3 days are kept, and machines that accessed at least 3 different domain names determined to be generated by DGA algorithms are taken as the final infected machines, so that measures such as virus removal or isolation can be applied to them as early as possible to limit further harm. If a domain name determined to be generated by a DGA algorithm was accessed fewer than 3 times by only one machine, it is treated as possible statistical noise and is no longer marked as a forged domain name. The principle behind this behavior filtering is as follows: a machine infected by a bot program and forming part of a botnet must periodically contact the controlling machine behind it to maintain the control relationship, and the purpose of forging domain names with a DGA algorithm is precisely to evade possible domain name blacklist filtering, so the forged DGA domain names accessed by an infected machine also need to change frequently. A terminal that exhibits the above behavior can therefore largely be determined to be an infected terminal.
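The behavior filtering in this example can be sketched in Python as follows. The sketch assumes that each log record is a hypothetical (machine, day, domain) tuple already restricted to the past 7 days, and that "accessed on more than 3 days" is counted per machine; the source does not specify either detail, and the function names are illustrative.

```python
from collections import defaultdict


def find_infected_machines(records, dga_domains, min_days=3, min_domains=3):
    """Machines that accessed flagged domain names on more than `min_days`
    days and accessed at least `min_domains` different flagged names."""
    days_seen = defaultdict(set)
    domains_seen = defaultdict(set)
    for machine, day, domain in records:
        if domain in dga_domains:
            days_seen[machine].add(day)
            domains_seen[machine].add(domain)
    return {
        m for m in days_seen
        if len(days_seen[m]) > min_days and len(domains_seen[m]) >= min_domains
    }


def drop_noise_domains(records, dga_domains, min_hits=3):
    """Remove flagged domain names accessed fewer than `min_hits` times
    by exactly one machine, treating them as statistical noise."""
    hits = defaultdict(int)
    machines = defaultdict(set)
    for machine, _day, domain in records:
        if domain in dga_domains:
            hits[domain] += 1
            machines[domain].add(machine)
    noise = {d for d in hits if len(machines[d]) == 1 and hits[d] < min_hits}
    return set(dga_domains) - noise
```

The two functions correspond to steps 212 and 213 respectively and can be run independently on the same log window.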
An embodiment of the present invention provides a domain name detection method that obtains the condition code of the domain name to be detected and the condition code of the normal domain name, and calculates the feature gap between the two. Because a condition code indicates the distribution of letters, or of letters and digits, in a domain name, the feature gap between the two condition codes reflects how similar they are. The larger the gap between the condition code of the domain name to be detected and that of the normal domain name, the more likely it is that the domain name to be detected was generated by a domain generation algorithm (DGA), so the feature gap can be used to determine whether the accessed domain name was generated by a DGA algorithm. The domain name detection method provided in the embodiment of the present invention therefore solves the prior-art problem that domain names generated by DGA algorithms cannot be identified, improves the success rate of detecting abnormal domain names, and improves the user experience.
As shown in Figure 5, an embodiment of the present invention provides a domain name detection device 500, including:
A processing module 501, configured to obtain the condition code of the domain name to be detected and the condition code of the normal domain name.
Here, a condition code indicates the distribution of letters in a domain name, or the distribution of letters and digits.
Specifically, the domain names to be detected may include domain names generated by DGA algorithms and can be all accessed domain names in the system log data. For example, the domain names to be detected can be the domain names accessed by all users in a network: a bypass detection device obtains the web access logs of all users in the network, and every domain name in the logs is parsed out as a domain name to be detected. Normal domain names are domain names confirmed not to include any generated by DGA algorithms and can be obtained in advance. For example, the top 1,000,000 domain names in the Alexa website ranking can be obtained and used as the normal domain names.
It should be noted that when the domain names to be detected or the normal domain names are numerous, they can be grouped so as to obtain multiple condition codes. The domain names to be detected can be grouped by destination IP address, so that domain names corresponding to the same destination IP address fall into the same group. They can also be grouped by identical subdomain, where the subdomain is what remains after the TLD and ccTLD are removed. For example, for the two domain names to be detected hezl3.xk80p.com and 14lyu.xk80p.com, the subdomain after removing the suffix ".com" is xk80p, so they are placed in the same group.
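Grouping by subdomain can be sketched in Python as below. The sketch assumes a small, purely illustrative set of suffix labels; a real deployment would need a complete TLD and ccTLD list, which the source does not provide. The names `SUFFIX_LABELS`, `group_key`, and `group_by_subdomain` are hypothetical.

```python
from collections import defaultdict

# Illustrative only: a real system would need a full TLD/ccTLD list.
SUFFIX_LABELS = {"com", "net", "org", "cn", "uk", "co"}


def group_key(domain: str) -> str:
    """Strip trailing TLD/ccTLD labels and return the label to their left."""
    labels = domain.lower().split(".")
    while len(labels) > 1 and labels[-1] in SUFFIX_LABELS:
        labels.pop()
    return labels[-1]


def group_by_subdomain(domains):
    """Map each group key to the domain names that share it."""
    groups = defaultdict(list)
    for domain in domains:
        groups[group_key(domain)].append(domain)
    return dict(groups)
```

Applied to the two example names in the text, both fall under the key xk80p and therefore end up in the same group.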
Likewise, because the normal domain names may be numerous, a certain number of domain names can be drawn at random from them to generate multiple sample groups, each containing the same number of domain names. The resulting sample groups serve as reference objects that are compared with the grouped domain names to be detected in order to find potential forged domain names generated by DGA algorithms. For example, 1000 sample groups of 500 domain names each can be generated at random from the top 1,000,000 domain names in the Alexa ranking.
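The random sampling can be sketched in Python with the standard library. The defaults of 1000 groups and 500 domain names per group follow the example in the text; the function name and the optional seed parameter are assumptions added for reproducibility. The sketch samples without replacement inside a group while allowing a domain name to appear in several groups, which the source does not specify.

```python
import random


def make_sample_groups(normal_domains, group_count=1000, group_size=500, seed=None):
    """Draw `group_count` random groups of `group_size` distinct domain names.

    `normal_domains` must be a sequence (for example a list) holding at
    least `group_size` entries.
    """
    rng = random.Random(seed)
    return [rng.sample(normal_domains, group_size) for _ in range(group_count)]
```

Each returned group can then be turned into one condition code and used as a reference for the comparison step.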
Specifically, a condition code indicates the distribution of letters, or of letters and digits, in domain names, where the letters are a to z and the digits are 0 to 9. Further, a condition code can be the character string obtained by sorting the letters, or the letters and digits, in the domain names by number of occurrences. When the domain names to be detected or the normal domain names are divided into multiple groups, multiple condition codes may be obtained, and each condition code then indicates how often each digit or letter occurs in one domain name group.
For example, the condition code of the domain names to be detected can be obtained as follows:
Group the domain names to be detected to obtain multiple domain name groups;
When the condition code indicates the distribution of letters and digits in a domain name, count the number of times each digit and letter occurs in the domain name group, for example "x" occurs 230 times, "3" occurs 59 times, and so on; select the 10 most frequent digits and letters in the domain name group, and if fewer than 10 digits and letters are obtained, the information contained in the domain name group is considered insufficient for subsequent judgment and the group is discarded; arrange the 10 selected digits and letters in descending order of occurrence to obtain a 10-byte condition code;
When the condition code indicates the distribution of letters in a domain name, count the number of times each letter occurs in the domain name group; select the 10 most frequent letters in the domain name group, and if fewer than 10 letters are obtained, the information contained in the domain name group is considered insufficient for subsequent judgment and the group is discarded; arrange the 10 selected letters in descending order of occurrence to obtain a 10-byte condition code;
The condition code of the normal domain names can be obtained as follows:
When the condition code indicates the distribution of letters and digits in a domain name, count the number of times each digit and letter occurs in the normal domain names; select the 10 most frequent digits and letters in the normal domain names; arrange the 10 selected digits and letters in descending order of occurrence to obtain a 10-byte condition code;
When the condition code indicates the distribution of letters in a domain name, count the number of times each letter occurs in the normal domain names; select the 10 most frequent letters in the normal domain names; arrange the 10 selected letters in descending order of occurrence to obtain a 10-byte condition code.
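The steps above can be sketched in Python with `collections.Counter`. The function name is hypothetical, and breaking ties between equally frequent characters alphabetically is an assumption, since the source does not say how ties are ordered. Returning `None` models discarding a group that yields fewer than 10 distinct characters.

```python
import string
from collections import Counter


def condition_code(domains, include_digits=True, length=10):
    """Return the `length` most frequent characters in a domain name group,
    in descending order of occurrence, or None if too few are present."""
    allowed = set(string.ascii_lowercase)
    if include_digits:
        allowed |= set(string.digits)
    counts = Counter(
        ch for domain in domains for ch in domain.lower() if ch in allowed
    )
    if len(counts) < length:
        return None  # not enough information: discard this group
    # Sort by count (descending), then by character to make ties stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "".join(ch for ch, _ in ranked[:length])
```

Calling the function twice on the same group, once with `include_digits=True` and once with `include_digits=False`, gives the two condition codes the method compares.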
The processing module 501 is further configured to calculate the feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name.
Here, the feature gap indicates the degree of similarity between the condition code of the domain name to be detected and the condition code of the normal domain name.
Specifically, the larger the feature gap, the lower the similarity between the condition code of the domain name to be detected and the condition code of the normal domain name, and the higher the probability that the domain name to be detected is a forged domain name generated by a DGA algorithm. Conversely, a smaller feature gap indicates that the two condition codes are more similar and that the domain name to be detected is less likely to be a forged domain name generated by a DGA algorithm.
Further, when the condition code indicates the distribution of letters in a domain name, or the distribution of letters and digits, the Jaccard similarity measure between the condition code of the domain name to be detected and the condition code of the normal domain name can be calculated and used as the feature gap between the two condition codes.
When the condition code indicates the distribution of letters and digits in a domain name, the Damerau-Levenshtein distance between the condition code of the domain name to be detected and the condition code of the normal domain name can be calculated using the Damerau-Levenshtein distance algorithm, and this distance is the feature gap between the two condition codes.
A detection module 502, configured to determine, according to the feature gap, whether the accessed domain name is a domain name generated by a domain generation algorithm (DGA).
Specifically, after the feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name is obtained, it can be compared with a preset standard value. When the comparison result meets the requirement, it can be determined that the letter distribution of the domain name to be detected differs markedly from that of the normal domain name, or that the letter and digit distribution of the domain name to be detected differs markedly from that of the normal domain name, and hence that the domain name to be detected is a domain name generated by a DGA algorithm.
Further, when the domain name to be detected is determined to be a domain name generated by a DGA algorithm, it can be marked.
An embodiment of the present invention provides a domain name detection device that obtains the condition code of the domain name to be detected and the condition code of the normal domain name, and calculates the feature gap between the two. Because a condition code indicates the distribution of letters, or of letters and digits, in a domain name, the feature gap between the two condition codes reflects how similar they are. The larger the gap between the condition code of the domain name to be detected and that of the normal domain name, the more likely it is that the domain name to be detected was generated by a domain generation algorithm (DGA), so the feature gap can be used to determine whether the accessed domain name was generated by a DGA algorithm. The domain name detection device provided in the embodiment of the present invention therefore solves the prior-art problem that domain names generated by DGA algorithms cannot be identified, improves the success rate of detecting abnormal domain names, and improves the user experience.
Specifically, as shown in Figure 6, the domain name detection device 500 provided by the embodiment of the present invention can further include an acquisition module 503, configured to obtain web access logs and parse them to obtain the domain names to be detected.
Specifically, the web access logs can be the logs of web accesses by all users in the network, obtained through a bypass detection device; every domain name in the logs is parsed out and used as the basis for subsequent processing. Unlike the common approach of analyzing the domain names users access based on DNS traffic, using the actual web access logs of user terminals as the raw data is more targeted and more accurate than using DNS traffic.
Specifically, the acquisition module 503 is further configured to:
Remove the top-level domain suffix (TLD) and the country-code top-level domain suffix (ccTLD) from the domain names to be detected and the normal domain names; and/or remove the prefix "www" from the domain names to be detected and the normal domain names; and/or remove characters other than 0-9, a-z, ".", "_" and "-" from the domain names to be detected and the normal domain names.
Specifically, removing the top-level domain suffix (TLD) and the country-code top-level domain suffix (ccTLD) from the domain names to be detected and the normal domain names prevents TLDs and ccTLDs from strongly distorting the letter, or letter and digit, distribution statistics of those domain names and thereby affecting the final detection result for the domain names to be detected.
Specifically, removing the prefix "www" from the domain names to be detected and the normal domain names prevents that prefix from distorting the letter, or letter and digit, distribution statistics of those domain names and thereby affecting the final detection result for the domain names to be detected.
Specifically, a regular expression such as ^[0-9a-z._-]+ can be used to remove characters other than 0-9, a-z, ".", "_" and "-" from the domain names to be detected and the normal domain names. The regular expression means that the character set consists of 39 characters in total: 0-9, a-z, and the three characters ".", "_" and "-". Any free combination of characters from this set is treated as a valid domain name, and all domain names that do not satisfy this condition are discarded as invalid.
Further, because browsers convert Chinese domain names into Punycode domain names (beginning with xn--), Chinese domain names in the domain names to be detected and the normal domain names need to be recognized in their Punycode form, so that such conventional domain names are not misjudged as DGA domain names.
Further, all domain names in the domain names to be detected and the normal domain names can also be converted to lowercase to allow uniform comparison later.
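The preprocessing described above can be sketched in Python with the `re` module. The pattern follows the character set given in the text, anchored at both ends so that a domain name containing any other character is rejected as invalid. The function names are hypothetical, and treating any label that starts with "xn--" as Punycode is an assumption about how the recognition would be done.

```python
import re

# 0-9, a-z, ".", "_" and "-": the 39-character set described in the text.
VALID_DOMAIN = re.compile(r"[0-9a-z._-]+")


def normalize(domain: str):
    """Lowercase a domain name, strip a leading "www." prefix, and return
    None if any character falls outside the allowed set."""
    name = domain.strip().lower()
    if name.startswith("www."):
        name = name[len("www."):]
    if not VALID_DOMAIN.fullmatch(name):
        return None
    return name


def is_punycode(domain: str) -> bool:
    """True if any label carries the "xn--" prefix used for
    internationalized (for example Chinese) domain names."""
    return any(label.startswith("xn--") for label in domain.lower().split("."))
```

Names flagged by `is_punycode` can be set aside before condition codes are computed, so that their encoded labels are not mistaken for random DGA output.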
Specifically, the processing module 501 is configured to:
When the condition code indicates the distribution of letters in a domain name, or the distribution of letters and digits, calculate the Jaccard similarity measure between the condition code of the domain name to be detected and the condition code of the normal domain name, the Jaccard similarity measure being the feature gap between the two condition codes; and/or,
When the condition code indicates the distribution of letters and digits in a domain name, calculate the Damerau-Levenshtein distance between the condition code of the domain name to be detected and the condition code of the normal domain name using the Damerau-Levenshtein distance algorithm, the Damerau-Levenshtein distance being the feature gap between the two condition codes.
Here, the Jaccard similarity measure serves as the feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name.
Specifically, when the condition code indicates the distribution of letters in a domain name, the Jaccard similarity measure can be calculated between the condition code of each domain name group of the grouped domain names to be detected and the condition code of each sample group of the grouped normal domain names. The arithmetic mean of the resulting Jaccard similarity measures is then calculated and used as the feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name for the letter distribution.
When the condition code indicates the distribution of letters and digits in a domain name, the Jaccard similarity measure can likewise be calculated between the condition code of each domain name group of the grouped domain names to be detected and the condition code of each sample group of the grouped normal domain names. The arithmetic mean of the resulting Jaccard similarity measures is then calculated and used as the feature gap between the condition code of the domain name to be detected and the condition code of the normal domain name for the letter and digit distribution.
Specifically, the detection module 502 is configured to:
When the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name is greater than or equal to 0.8, determine that the accessed domain name is a domain name generated by a DGA algorithm; and/or,
When the ratio of the Jaccard difference to the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name is less than 0.1, determine that the accessed domain name is a domain name generated by a DGA algorithm, the Jaccard difference being the absolute value of the difference between the Jaccard similarity measure obtained from the condition code indicating the distribution of letters in a domain name and the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name; and/or,
When the Damerau-Levenshtein distance is greater than or equal to 0.9, determine that the accessed domain name is a domain name generated by a DGA algorithm.
The above conditions indicate a clear gap between the condition code of the domain name to be detected and the condition code of the normal domain name. This means that the letter distribution in the domain name to be detected differs markedly from that in the normal domain name, or that the digit and letter distribution in the domain name to be detected differs markedly from that in the normal domain name. All domain names to be detected that meet the above conditions may therefore be forged domain names generated by a DGA algorithm.
Here, the Jaccard difference is the absolute value of the difference between the Jaccard similarity measure obtained from the condition code indicating the distribution of letters in a domain name and the Jaccard similarity measure obtained from the condition code indicating the distribution of letters and digits in a domain name.
The ratio condition prevents domain names to be detected and normal domain names that contain many digits from being misidentified as forged domain names (for example, China Mobile's website 10086.cn). Although digits are used relatively rarely in the digit and letter distribution of normal domain names, it is not appropriate to judge a domain name as forged merely because it contains many digits.
Specifically, as shown in Figure 6, the domain name detection device 500 provided by the embodiment of the present invention can further include a backtracking module 504. The backtracking module 504 is configured to match the domain names in the access records of a terminal against the domain names determined to be generated by DGA algorithms; when the domain names in the access records of the terminal and the domain names determined to be generated by DGA algorithms satisfy the infection matching condition, determine that the terminal is an infected terminal; and/or, when they satisfy the deletion matching condition, delete the domain names that satisfy the deletion matching condition from the domain names determined to be generated by DGA algorithms.
Specifically, once the detection result has identified the domain names generated by DGA algorithms, the result can be taken back to the original web access logs, that is, the original log records from which the domain name groups were generated, and matched against them. The matching result is used to filter the web access behavior of users, which reduces false positives as far as possible and, during the behavior filtering, identifies potentially infected machines and the severity of the infection.
For example, the web access logs of the past 7 days can be compared with all domain names determined to be generated by DGA algorithms. Only the access records in which such domain names were accessed on more than 3 days are kept, and machines that accessed at least 3 different domain names determined to be generated by DGA algorithms are taken as the final infected machines, so that measures such as virus removal or isolation can be applied to them as early as possible to limit further harm. If a domain name determined to be generated by a DGA algorithm was accessed fewer than 3 times by only one machine, it is treated as possible statistical noise and is no longer marked as a forged domain name. The principle behind this behavior filtering is as follows: a machine infected by a bot program and forming part of a botnet must periodically contact the controlling machine behind it to maintain the control relationship, and the purpose of forging domain names with a DGA algorithm is precisely to evade possible domain name blacklist filtering, so the forged DGA domain names accessed by an infected machine also need to change frequently. A terminal that exhibits the above behavior can therefore largely be determined to be an infected terminal.
An embodiment of the present invention provides a domain name detection device that obtains the condition code of the domain name to be detected and the condition code of the normal domain name, and calculates the feature gap between the two. Because a condition code indicates the distribution of letters, or of letters and digits, in a domain name, the feature gap between the two condition codes reflects how similar they are. The larger the gap between the condition code of the domain name to be detected and that of the normal domain name, the more likely it is that the domain name to be detected was generated by a domain generation algorithm (DGA), so the feature gap can be used to determine whether the accessed domain name was generated by a DGA algorithm. The domain name detection device provided in the embodiment of the present invention therefore solves the prior-art problem that domain names generated by DGA algorithms cannot be identified, improves the success rate of detecting abnormal domain names, and improves the user experience.
Through the above description of the embodiments, it is apparent to those skilled in the art that the present invention can be implemented in hardware, in firmware, or in a combination thereof. When implemented in software, the above functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transferring a computer program from one place to another. A storage medium can be any available medium that a computer can access. By way of example and not limitation, a computer-readable medium can include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection may properly be termed a computer-readable medium. For example, if software is transmitted from a website, server, or other remote source using coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, optical fiber cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of computer-readable medium.
From the above description of the embodiments, those skilled in the art will clearly understand that, when the present invention is implemented in software, the instructions or code for performing the above method can be stored in a computer-readable medium or transmitted over a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium can be any available medium that a computer can access. By way of example and not limitation, computer-readable media can include RAM, ROM, electrically erasable programmable read-only memory (EEPROM), optical discs, magnetic disks or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A domain name detection method, characterized by comprising:
obtaining a feature code of a domain name to be detected and a feature code of a normal domain name, the feature code indicating the distribution of letters, or the distribution of letters and digits, in a domain name;
calculating a feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name, the feature gap indicating the degree of similarity between the feature code of the domain name to be detected and the feature code of the normal domain name; and
determining, according to the feature gap, whether an accessed domain name is a domain name generated using a domain generation algorithm (DGA).
2. The domain name detection method according to claim 1, characterized in that, before the obtaining of the feature code of the domain name to be detected and the feature code of the normal domain name, the method further comprises:
obtaining web access logs, and parsing the web access logs to obtain the domain name to be detected.
3. The domain name detection method according to claim 1, characterized in that, before the obtaining of the feature code of the domain name to be detected and the feature code of the normal domain name, the method further comprises:
removing the top-level domain suffix (TLD) and the country-code top-level domain suffix (ccTLD) from the domain name to be detected and the normal domain name; and/or,
removing the prefix "www" from the domain name to be detected and the normal domain name; and/or,
removing from the domain name to be detected and the normal domain name all characters other than 0-9, a-z, " ", "_" and "-".
4. The domain name detection method according to claim 1, characterized in that the calculating of the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name comprises:
when the feature code indicates the distribution of letters, or the distribution of letters and digits, in a domain name, calculating a Jaccard similarity measure between the feature code of the domain name to be detected and the feature code of the normal domain name, the Jaccard similarity measure being the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name; and/or,
when the feature code indicates the distribution of letters and digits in a domain name, calculating, according to the Damerau-Levenshtein distance algorithm, the Damerau-Levenshtein distance between the feature code of the domain name to be detected and the feature code of the normal domain name, the Damerau-Levenshtein distance being the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name.
5. The domain name detection method according to claim 4, characterized in that the determining, according to the feature gap, of whether the accessed domain name is a domain name generated using a DGA comprises:
when the Jaccard similarity measure obtained from the feature code indicating the distribution of letters and digits in the domain name is greater than or equal to 0.8, determining that the accessed domain name is a domain name generated using a DGA; and/or,
when the ratio of a Jaccard difference to the Jaccard similarity measure obtained from the feature code indicating the distribution of letters and digits in the domain name is less than 0.1, determining that the accessed domain name is a domain name generated using a DGA, the Jaccard difference being the absolute value of the difference between the Jaccard similarity measure obtained from the feature code indicating the distribution of letters in the domain name and the Jaccard similarity measure obtained from the feature code indicating the distribution of letters and digits in the domain name; and/or,
when the Damerau-Levenshtein distance is greater than or equal to 0.9, determining that the accessed domain name is a domain name generated using a DGA.
6. The domain name detection method according to any one of claims 1-5, characterized in that the method further comprises:
matching the domain names in an access record of a terminal against the domain names determined to have been generated using a DGA;
when the domain names in the access record of the terminal and the domain names determined to have been generated using a DGA satisfy an infection matching condition, determining that the terminal is an infected terminal; and/or,
when the domain names in the access record of the terminal and the domain names determined to have been generated using a DGA satisfy a deletion matching condition, deleting the domain names that satisfy the deletion matching condition from the domain names determined to have been generated using a DGA.
7. A domain name detection device, characterized by comprising:
a processing module, configured to obtain a feature code of a domain name to be detected and a feature code of a normal domain name, the feature code indicating the distribution of letters, or the distribution of letters and digits, in a domain name;
the processing module being further configured to calculate a feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name, the feature gap indicating the degree of similarity between the feature code of the domain name to be detected and the feature code of the normal domain name; and
a detection module, configured to determine, according to the feature gap, whether an accessed domain name is a domain name generated using a domain generation algorithm (DGA).
8. The domain name detection device according to claim 7, characterized in that the device further comprises:
an acquisition module, configured to obtain web access logs and to parse the web access logs to obtain the domain name to be detected.
9. The domain name detection device according to claim 7, characterized in that the acquisition module is further configured to:
remove the top-level domain suffix (TLD) and the country-code top-level domain suffix (ccTLD) from the domain name to be detected and the normal domain name; and/or, remove the prefix "www" from the domain name to be detected and the normal domain name; and/or, remove from the domain name to be detected and the normal domain name all characters other than 0-9, a-z, " ", "_" and "-".
10. The domain name detection device according to claim 7, characterized in that the processing module is specifically configured to:
when the feature code indicates the distribution of letters, or the distribution of letters and digits, in a domain name, calculate a Jaccard similarity measure between the feature code of the domain name to be detected and the feature code of the normal domain name, the Jaccard similarity measure being the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name; and/or,
when the feature code indicates the distribution of letters and digits in a domain name, calculate, according to the Damerau-Levenshtein distance algorithm, the Damerau-Levenshtein distance between the feature code of the domain name to be detected and the feature code of the normal domain name, the Damerau-Levenshtein distance being the feature gap between the feature code of the domain name to be detected and the feature code of the normal domain name.
11. The domain name detection device according to claim 10, characterized in that the detection module is specifically configured to:
when the Jaccard similarity measure obtained from the feature code indicating the distribution of letters and digits in the domain name is greater than or equal to 0.8, determine that the accessed domain name is a domain name generated using a DGA; and/or,
when the ratio of a Jaccard difference to the Jaccard similarity measure obtained from the feature code indicating the distribution of letters and digits in the domain name is less than 0.1, determine that the accessed domain name is a domain name generated using a DGA, the Jaccard difference being the absolute value of the difference between the Jaccard similarity measure obtained from the feature code indicating the distribution of letters in the domain name and the Jaccard similarity measure obtained from the feature code indicating the distribution of letters and digits in the domain name; and/or,
when the Damerau-Levenshtein distance is greater than or equal to 0.9, determine that the accessed domain name is a domain name generated using a DGA.
12. The domain name detection device according to any one of claims 7-11, characterized in that the device further comprises:
a backtracking module, configured to match the domain names in an access record of a terminal against the domain names determined to have been generated using a DGA; when the domain names in the access record of the terminal and the domain names determined to have been generated using a DGA satisfy an infection matching condition, determine that the terminal is an infected terminal; and/or, when the domain names in the access record of the terminal and the domain names determined to have been generated using a DGA satisfy a deletion matching condition, delete the domain names that satisfy the deletion matching condition from the domain names determined to have been generated using a DGA.
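The preprocessing and edit-distance steps recited in the claims above can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the suffix sets `SAMPLE_TLDS` and `SAMPLE_CCTLDS` are tiny hypothetical samples, the distance is the optimal string alignment (restricted) variant of Damerau-Levenshtein, the cleaned domain string itself stands in for the feature code, and dividing by the longer length is an assumed normalization that makes the result comparable with the 0.9 threshold. All function names are hypothetical.

```python
import re

SAMPLE_TLDS = {"com", "net", "org"}   # hypothetical sample, not a full TLD list
SAMPLE_CCTLDS = {"cn", "uk", "de"}    # hypothetical sample, not a full ccTLD list
DL_THRESHOLD = 0.9                    # threshold value recited in the claims


def normalize(domain: str) -> str:
    """Strip a ccTLD, a TLD, and a leading "www", then drop disallowed characters."""
    labels = domain.lower().strip(".").split(".")
    if len(labels) > 1 and labels[-1] in SAMPLE_CCTLDS:
        labels.pop()
    if len(labels) > 1 and labels[-1] in SAMPLE_TLDS:
        labels.pop()
    if len(labels) > 1 and labels[0] == "www":
        labels.pop(0)
    # Assumption: keep only 0-9, a-z, "_" and "-"; one further permitted
    # character is unclear in the source text and is not kept here.
    return re.sub(r"[^0-9a-z_-]", "", "".join(labels))


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def normalized_distance(a: str, b: str) -> float:
    """Assumed normalization: distance divided by the longer string's length."""
    longest = max(len(a), len(b))
    return osa_distance(a, b) / longest if longest else 0.0


def exceeds_threshold(candidate: str, normal: str) -> bool:
    """True when the normalized distance reaches the 0.9 threshold."""
    return normalized_distance(candidate, normal) >= DL_THRESHOLD


print(normalize("www.example.com.cn"))
print(exceeds_threshold(normalize("www.goggle.com"), normalize("www.google.com")))
```

A production system would need a complete suffix list in place of the two sample sets, and a candidate would normally be compared against many normal domain names rather than a single one.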
CN201710242441.6A 2017-04-13 2017-04-13 A kind of domain name detection method and device Pending CN106911717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710242441.6A CN106911717A (en) 2017-04-13 2017-04-13 A kind of domain name detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710242441.6A CN106911717A (en) 2017-04-13 2017-04-13 A kind of domain name detection method and device

Publications (1)

Publication Number Publication Date
CN106911717A true CN106911717A (en) 2017-06-30

Family

ID=59209445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710242441.6A Pending CN106911717A (en) 2017-04-13 2017-04-13 A kind of domain name detection method and device

Country Status (1)

Country Link
CN (1) CN106911717A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1670723A (en) * 2004-03-16 2005-09-21 微软公司 Systems and methods for improved spell checking
CN103098050A (en) * 2010-01-29 2013-05-08 因迪普拉亚公司 Systems and methods for word offensiveness detection and processing using weighted dictionaries and normalization
US20160057165A1 (en) * 2014-08-22 2016-02-25 Mcafee, Inc. System and method to detect domain generation algorithm malware and systems infected by such malware
CN106557476A (en) * 2015-09-24 2017-04-05 北京奇虎科技有限公司 The acquisition methods and device of relevant information
CN105577660A (en) * 2015-12-22 2016-05-11 国家电网公司 DGA domain name detection method based on random forest
CN105610830A (en) * 2015-12-30 2016-05-25 山石网科通信技术有限公司 Method and device for detecting domain name
CN105827594A (en) * 2016-03-08 2016-08-03 北京航空航天大学 Suspicion detection method based on domain name readability and domain name analysis behavior
CN106372056A (en) * 2016-08-25 2017-02-01 久远谦长(北京)技术服务有限公司 Natural language-based topic and keyword extraction method and system
CN106372659A (en) * 2016-08-30 2017-02-01 五八同城信息技术有限公司 Similar object determination method and device

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019096099A1 (en) * 2017-11-15 2019-05-23 瀚思安信(北京)软件技术有限公司 Real-time detection method and apparatus for dga domain name
US11334764B2 (en) 2017-11-15 2022-05-17 Han Si An Xin (Beijing) Software Technology Co., Ltd Real-time detection method and apparatus for DGA domain name
US10965697B2 (en) 2018-01-31 2021-03-30 Micro Focus Llc Indicating malware generated domain names using digits
US11108794B2 (en) 2018-01-31 2021-08-31 Micro Focus Llc Indicating malware generated domain names using n-grams
US10911481B2 (en) 2018-01-31 2021-02-02 Micro Focus Llc Malware-infected device identifications
US10880319B2 (en) 2018-04-26 2020-12-29 Micro Focus Llc Determining potentially malware generated domain names
CN109246083A (en) * 2018-08-09 2019-01-18 北京奇安信科技有限公司 A kind of detection method and device of DGA domain name
CN109246083B (en) * 2018-08-09 2021-08-03 奇安信科技集团股份有限公司 DGA domain name detection method and device
US11271963B2 (en) 2018-12-20 2022-03-08 Micro Focus Llc Defending against domain name system based attacks
CN109936560A (en) * 2018-12-27 2019-06-25 上海银行股份有限公司 Malware means of defence and device
US10931714B2 (en) 2019-01-08 2021-02-23 Acer Cyber Security Incorporated Domain name recognition method and domain name recognition device
CN111478877B (en) * 2019-01-24 2022-08-02 安碁资讯股份有限公司 Domain name recognition method and domain name recognition device
CN111478877A (en) * 2019-01-24 2020-07-31 安碁资讯股份有限公司 Domain name recognition method and domain name recognition device
US11245720B2 (en) 2019-06-06 2022-02-08 Micro Focus Llc Determining whether domain is benign or malicious
CN112751804A (en) * 2019-10-30 2021-05-04 北京观成科技有限公司 Method, device and equipment for identifying counterfeit domain name
CN112751804B (en) * 2019-10-30 2023-04-07 北京观成科技有限公司 Method, device and equipment for identifying counterfeit domain name
CN111641663A (en) * 2020-07-06 2020-09-08 奇安信科技集团股份有限公司 Safety detection method and device
CN111641663B (en) * 2020-07-06 2022-08-12 奇安信科技集团股份有限公司 Safety detection method and device
CN111935099A (en) * 2020-07-16 2020-11-13 兰州理工大学 Malicious domain name detection method based on deep noise reduction self-coding network
CN112073551B (en) * 2020-08-26 2021-07-20 重庆理工大学 DGA domain name detection system based on character-level sliding window and depth residual error network
CN112073551A (en) * 2020-08-26 2020-12-11 重庆理工大学 DGA domain name detection system based on character-level sliding window and depth residual error network
CN116980234A (en) * 2023-09-25 2023-10-31 北京源堡科技有限公司 Domain name imitation detection method and system
CN116980234B (en) * 2023-09-25 2024-01-05 北京源堡科技有限公司 Domain name imitation detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170630