CN110602020A - Botnet detection technology based on DGA domain name and periodic network connection session behavior - Google Patents
- Publication number
- CN110602020A
- Authority
- CN
- China
- Prior art keywords
- domain name
- host
- dga
- dga domain
- population
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4505—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
- H04L61/4511—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2463/00—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
- H04L2463/144—Detection or countermeasures against botnets
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a botnet detection technique based on DGA (domain generation algorithm) domain names and periodic network connection session behavior, and relates to the field of network security. The detection method comprises the following steps: acquiring network protocol data and parsing DNS protocol logs; building a machine learning detection model for DGA domain names and detecting whether a domain name accessed by a host has the characteristics of a DGA domain name; using a clustering method to find host groups with similar DGA domain name access behavior; using a circular autocorrelation method to detect whether intermittent network connection behavior exists between a host and a DGA domain name; and detecting botnets among the host groups by combining the results of the preceding steps. The invention collects network protocol data only from the switch, in bypass mode, and offers a wide range of application, high detection efficiency, and high accuracy.
Description
Technical Field
The invention relates to the technical field of information security, in particular to a botnet detection technology based on a DGA domain name and periodic network connection session behaviors.
Background
A botnet is a computer network in which a large number of hosts have been infected with a bot program (virus) through one or more propagation methods, and in which each infected node is controlled one-to-many through a command-and-control (C&C) server. The bot program usually has a command domain name embedded in it, and the bot host reaches the C&C server by accessing this domain name intermittently. Once communication with the C&C server is established, the host can receive commands from the controller and launch distributed denial-of-service (DDoS) attacks or send massive amounts of spam; the controller can also steal information from the bot host, such as bank account passwords. Botnets are therefore a direct threat to the normal operation of enterprise networks and to the security of user data. Existing botnet detection methods rely on large numbers of manually written rules and suffer from low detection precision, high difficulty, and high false-alarm rates, so finding a high-precision botnet detection method that incorporates machine learning is particularly important.
Disclosure of Invention
The underlying network protocols are parsed to obtain message protocol information. DGA domain names in the DNS records are identified using machine learning. A clustering algorithm is used to discover groups of hosts with similar DGA domain name access. It is then detected whether intermittent network connections exist between the hosts in a group and the server IPs corresponding to the DGA domain names. A host group that meets these conditions is considered a botnet.
To achieve this purpose, the technical scheme provided by the invention for botnet detection based on DGA domain names and periodic network sessions is as follows:
Step 1: Traffic in the enterprise network is captured with a traffic collection probe; the protocols, mainly TCP, UDP, and DNS, are parsed into a standardized format and stored in HDFS and Elasticsearch.
Step 2: Nine features are constructed for the domain names in DNS messages, and DGA domain names are detected using the gradient boosting decision tree (GBDT) algorithm.
Step 3: Using the DGA domain names detected in step 2, the K-means method is applied to find host groups with similar DGA domain name access behavior.
Step 4: For the suspicious groups detected in step 3, methods such as circular autocorrelation are used to find hosts that intermittently communicate with the IP to which a DGA domain name belongs.
Step 5: Combining the results of steps 3 and 4, when the number of hosts meeting the conditions in a group reaches a set threshold, the group is considered a botnet.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the method proposed by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The implementation flow of the scheme is as follows:
Step 1: The enterprise core switch is tapped in bypass traffic access mode to capture the traffic of all hosts of interest. Messages of protocols such as TCP, UDP, and DNS are parsed with a traffic analysis tool and stored in HDFS and Elasticsearch via a Kafka message queue.
Step 2: Domain name information is extracted from the DNS messages parsed in step 1, and a DGA domain name detection model is trained.
Step 2.1: Construct a training library. Domain names generated by DGA algorithms are obtained from open-source collections such as http://data.netlab.360.com/DGA/ and used as negative samples; positive samples are obtained from the Alexa website ranking. Together they form the training library.
Step 2.2: Construct nine domain name detection features: (1) the entropy of the domain name; (2) the uni-grams, bi-grams, and tri-grams of the domain name, computed with the n-gram method; (3) the probability of vowels and digits in the domain name; (4) the probability of repeated letters in the domain name; (5) the probability of consecutive consonants in the domain name; (6) the probability of consecutive digits in the domain name; (7) the probability of generating the domain name under a Markov model; (8) the length of the domain name; (9) the frequency with which the domain name is accessed within a short time.
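Several of the lexical features in step 2.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the "probabilities" are read here as per-character proportions, the features are computed on the leftmost label of the domain name only, and all function names (`shannon_entropy`, `ngrams`, `run_ratio`, `lexical_features`) are hypothetical.

```python
import math
from collections import Counter

VOWELS = set("aeiou")

def shannon_entropy(s):
    """Shannon entropy of the character distribution, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

def ngrams(s, n):
    """Counts of all contiguous character n-grams in s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def run_ratio(s, pred):
    """Fraction of characters lying in a run of two or more characters matching pred."""
    if not s:
        return 0.0
    hits = 0
    for i, ch in enumerate(s):
        prev_ok = i > 0 and pred(s[i - 1])
        next_ok = i + 1 < len(s) and pred(s[i + 1])
        if pred(ch) and (prev_ok or next_ok):
            hits += 1
    return hits / len(s)

def is_consonant(ch):
    return ch.isalpha() and ch not in VOWELS

def lexical_features(domain):
    """Feature dict for one domain name (assumption: leftmost label only)."""
    label = domain.lower().split(".")[0]
    n = max(len(label), 1)
    counts = Counter(label)
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        # share of characters whose letter occurs more than once in the label
        "repeated_letter_ratio": sum(c for ch, c in counts.items()
                                     if ch.isalpha() and c > 1) / n,
        "consecutive_consonant_ratio": run_ratio(label, is_consonant),
        "consecutive_digit_ratio": run_ratio(label, str.isdigit),
        "distinct_bigrams": len(ngrams(label, 2)),
    }
```

Calling `lexical_features("owfcpcrkpmokbu.biz")` yields one row of the feature table; the access-frequency feature (9) depends on traffic logs rather than on the string and is therefore left out of this sketch.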
Step 2.3: Train a DGA domain name recognition model with the gradient boosting decision tree (GBDT) algorithm, using the library built in step 2.1 as training samples and the features of step 2.2 as the feature processing.
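Before the model in step 2.3 can be trained, the Markov-based feature of step 2.2 has to be fitted on the training library itself. The patent does not say which Markov model is used; the sketch below assumes a first-order (character bigram) chain fitted on the Alexa-style samples, with add-one smoothing, and scores a name by its average log transition probability. The class name `BigramMarkov` and its methods are hypothetical.

```python
import math
from collections import Counter, defaultdict

class BigramMarkov:
    """First-order character Markov chain with add-one smoothing (an assumed model)."""

    def __init__(self, alphabet="abcdefghijklmnopqrstuvwxyz0123456789-"):
        self.alphabet = alphabet
        self.counts = defaultdict(Counter)  # counts[a][b] = times b followed a

    def fit(self, labels):
        for label in labels:
            s = "^" + label.lower() + "$"  # start and end markers
            for a, b in zip(s, s[1:]):
                self.counts[a][b] += 1
        return self

    def avg_log_prob(self, label):
        """Average log-probability per transition; lower means less word-like."""
        s = "^" + label.lower() + "$"
        v = len(self.alphabet) + 1  # possible next symbols: alphabet plus end marker
        total = 0.0
        for a, b in zip(s, s[1:]):
            row = self.counts.get(a, Counter())
            total += math.log((row[b] + 1) / (sum(row.values()) + v))
        return total / (len(s) - 1)

# Toy stand-in for the Alexa-derived samples (illustrative data only).
model = BigramMarkov().fit(["google", "facebook", "wikipedia", "amazon", "youtube"])
benign_score = model.avg_log_prob("google")
dga_score = model.avg_log_prob("owfcpcrkpmokbu")
```

With a realistically sized training library the score becomes one numeric column alongside the other eight features that the GBDT model consumes.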
Step 2.4: the domain names of all DNS accesses are identified using the model of step 2.3. If such domain names are identifiable: owfcpcrkpmokbu. biz, gowbqepkhqnk. com, etc.
Step 3: Use the K-means method to find host groups with similar DGA domain name access behavior.
Step 3.1: Filter the DNS records using the detection results of step 2, keeping only the records in which a DGA domain name was accessed.
Step 3.2: Treat each domain name as a feature and the number of times a host accessed that domain name as the feature value, and construct a feature vector for each host.
Step 3.3: Collect the feature vectors of all hosts from step 3.2 and, using the silhouette coefficient to choose the number of clusters, apply the K-means algorithm to find the groups {G_1, ..., G_k} of hosts that access similar DGA domain names.
Step 4: For each host in each group G_i of the set {G_1, ..., G_k} found in step 3, use the circular autocorrelation method, with a time slice of 10 minutes, to determine whether intermittent communication connections exist between the host and the IP to which a DGA domain name resolves. The calculation proceeds as follows:
A one-day time window is divided into ten-minute slices t_1, t_2, ..., t_n. For each candidate period k_i, the following operations are performed:
(1) Compute the periodicity score for the candidate period k_i using circular autocorrelation. The formula itself is missing from this text; the standard definition consistent with the terms below, with α(k_i) taken as r(k_i) normalized by r(0), is:
$$r(k_i) = \sum_{t=1}^{n} f(t)\, f(t + k_i), \qquad \alpha(k_i) = \frac{r(k_i)}{r(0)}$$
where f(t) is the number of accesses for the IP-DN (IP and domain name) pair in time slice t, and f(t + k_i) is f(t) cyclically shifted by k_i steps, so that the index t + k_i wraps around modulo n.
(2) For any natural number k_i,
$$r(k_i) \le r(0)$$
Set a threshold σ. If
$$\alpha(k_i) \ge \sigma \quad (0 \le \sigma \le 1)$$
then the sequence is considered periodic, i.e., there is intermittent network communication activity.
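The periodicity test of step 4 can be illustrated with a short Python sketch. It assumes that α(k) is the circular autocorrelation r(k) divided by r(0), which the text implies through r(k) ≤ r(0) and 0 ≤ σ ≤ 1 but does not state outright; the function names and the default threshold of 0.8 are hypothetical.

```python
def circular_autocorrelation(f, k):
    """r(k): sum over t of f(t) * f(t + k), with the index wrapping modulo n."""
    n = len(f)
    return sum(f[t] * f[(t + k) % n] for t in range(n))

def periodicity_score(f, k):
    """alpha(k) = r(k) / r(0) (assumed normalization); 0.0 for an all-zero series."""
    r0 = circular_autocorrelation(f, 0)
    return circular_autocorrelation(f, k) / r0 if r0 else 0.0

def find_periods(f, sigma=0.8):
    """Candidate periods k (in time slices) whose score reaches the threshold sigma."""
    return [k for k in range(1, len(f) // 2 + 1) if periodicity_score(f, k) >= sigma]

# One day in 10-minute slices (n = 144) with one connection per hour.
f = [1 if t % 6 == 0 else 0 for t in range(144)]
hourly = periodicity_score(f, 6)     # shift by 6 slices = 60 minutes
half_hour = periodicity_score(f, 3)  # shift by 3 slices = 30 minutes
periods = find_periods(f)
```

A host that beacons every hour scores 1.0 at a shift of six slices and 0.0 at three, so the smallest returned period is 6. Because the shift wraps around the one-day window, periods that do not divide n evenly score lower than they would on an unbounded series.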
Step 5: Combining the results of steps 3 and 4, a group is considered a botnet when the number of DGA domain names accessed within the group exceeds a set threshold β and the number of hosts with intermittent communication connections to DGA domain names reaches a set threshold γ.
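The decision rule of step 5 reduces to two threshold checks per group. The sketch below shows one way to express it; the function name, argument layout, and the sample values of β and γ are assumptions for illustration only.

```python
def is_botnet(group_hosts, dga_domains_by_host, periodic_hosts, beta, gamma):
    """True when the group accessed more than beta distinct DGA domain names
    and at least gamma of its hosts show periodic (intermittent) connections."""
    dga_domains = set()
    for host in group_hosts:
        dga_domains |= dga_domains_by_host.get(host, set())
    n_periodic = sum(1 for host in group_hosts if host in periodic_hosts)
    return len(dga_domains) > beta and n_periodic >= gamma

# Hypothetical inputs: output of steps 2-3 (domains per host) and step 4 (periodic hosts).
domains_by_host = {
    "h1": {"owfcpcrkpmokbu.biz", "gowbqepkhqnk.com"},
    "h2": {"owfcpcrkpmokbu.biz"},
    "h3": {"gowbqepkhqnk.com"},
}
flagged = is_botnet({"h1", "h2", "h3"}, domains_by_host, {"h1", "h2"}, beta=1, gamma=2)
not_flagged = is_botnet({"h1", "h2", "h3"}, domains_by_host, {"h1"}, beta=1, gamma=2)
```

Requiring both conditions means that a single host querying many generated names, or a group with periodic traffic but few DGA domain names, is not reported on its own.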
The botnet detection technique based on DGA domain names and periodic network connection session behavior provided by the embodiment of the invention has been described in detail above. A specific example is used to explain the principle and implementation of the invention, and the description of the embodiment is intended only to help in understanding the method and its core idea. Those skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.
Claims (6)
1. A botnet detection technique based on DGA domain names and periodic network connection session behavior, comprising: parsing the raw DNS protocol; building a machine learning detection model for DGA domain names and detecting whether a domain name accessed by a host has the characteristics of a DGA domain name; using a clustering method to find host groups with similar DGA domain name access behavior; using a circular autocorrelation method to detect whether intermittent network connection behavior exists between a host and a DGA domain name; and, when the number of hosts in a host group that have periodic network sessions with DGA domain names reaches a set threshold, considering the host group to be a botnet.
2. The method of claim 1, wherein parsing the raw DNS protocol comprises: collecting DNS protocol data from the switch in bypass mode and parsing the DNS protocol data into raw data.
3. The DGA domain name detection method according to claim 1, comprising: constructing nine features: (1) the entropy of the domain name; (2) the uni-grams, bi-grams, and tri-grams of the domain name, computed with the n-gram method; (3) the probability of vowels and digits in the domain name; (4) the probability of repeated letters in the domain name; (5) the probability of consecutive consonants in the domain name; (6) the probability of consecutive digits in the domain name; (7) the probability of generating the domain name under a Markov model; (8) the length of the domain name; (9) the frequency with which the domain name is accessed within a short time; and training the detection model with the gradient boosting decision tree algorithm. When a new DNS record appears, the model is used to detect whether the domain name in the DNS record has the characteristics of a DGA domain name.
4. The host group discovery of claim 1, comprising: processing the DNS records of each host into a feature vector in which each accessed DGA domain name is a feature dimension, and constructing a feature matrix; and clustering the feature matrix with the K-means algorithm to find groups with similar DGA domain name access behavior. The number of clusters K is determined using the silhouette coefficient.
5. The periodic network session detection of claim 1, comprising: for every detected DGA domain name, finding the IP addresses to which the domain name resolves; and, using the circular autocorrelation method with 10-minute time slices and a set threshold, determining whether the host has periodic network sessions with those IPs.
6. The botnet determination of claim 1, comprising: for a host group found by clustering, if the number of hosts in the group that have periodic network sessions with the IP addresses to which DGA domain names resolve exceeds a set threshold, the group is a botnet.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810603034.8A CN110602020A (en) | 2018-06-12 | 2018-06-12 | Botnet detection technology based on DGA domain name and periodic network connection session behavior |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810603034.8A CN110602020A (en) | 2018-06-12 | 2018-06-12 | Botnet detection technology based on DGA domain name and periodic network connection session behavior |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110602020A (en) | 2019-12-20 |
Family
ID=68849381
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810603034.8A Pending CN110602020A (en) | 2018-06-12 | 2018-06-12 | Botnet detection technology based on DGA domain name and periodic network connection session behavior |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110602020A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113098989A (en) * | 2020-01-09 | 2021-07-09 | 深信服科技股份有限公司 | Dictionary generation method, domain name detection method, device, equipment and medium |
CN113452714A (en) * | 2021-06-29 | 2021-09-28 | 清华大学 | Host clustering method and device |
CN114666071A (en) * | 2020-12-04 | 2022-06-24 | ***通信集团广东有限公司 | Botnet identification method and device and terminal equipment |
CN115134095A (en) * | 2021-03-10 | 2022-09-30 | 中国电信股份有限公司 | Botnet control terminal detection method and device, storage medium and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130191915A1 (en) * | 2012-01-25 | 2013-07-25 | Damballa, Inc. | Method and system for detecting dga-based malware |
CN106713371A (en) * | 2016-12-08 | 2017-05-24 | 中国电子科技网络信息安全有限公司 | Fast Flux botnet detection method based on DNS anomaly mining |
CN107566376A (en) * | 2017-09-11 | 2018-01-09 | 中国信息安全测评中心 | One kind threatens information generation method, apparatus and system |
CN107645503A (en) * | 2017-09-20 | 2018-01-30 | 杭州安恒信息技术有限公司 | A kind of detection method of the affiliated DGA families of rule-based malice domain name |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113098989A (en) * | 2020-01-09 | 2021-07-09 | 深信服科技股份有限公司 | Dictionary generation method, domain name detection method, device, equipment and medium |
CN114666071A (en) * | 2020-12-04 | 2022-06-24 | ***通信集团广东有限公司 | Botnet identification method and device and terminal equipment |
CN114666071B (en) * | 2020-12-04 | 2023-09-05 | ***通信集团广东有限公司 | Botnet identification method and device and terminal equipment |
CN115134095A (en) * | 2021-03-10 | 2022-09-30 | 中国电信股份有限公司 | Botnet control terminal detection method and device, storage medium and electronic equipment |
CN113452714A (en) * | 2021-06-29 | 2021-09-28 | 清华大学 | Host clustering method and device |
CN113452714B (en) * | 2021-06-29 | 2022-11-18 | 清华大学 | Host clustering method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8260914B1 (en) | Detecting DNS fast-flux anomalies | |
Yin et al. | ConnSpoiler: Disrupting C&C communication of IoT-based botnet through fast detection of anomalous domain queries | |
CN108768883B (en) | Network traffic identification method and device | |
CN110602020A (en) | Botnet detection technology based on DGA domain name and periodic network connection session behavior | |
US20200059451A1 (en) | System and method for detecting generated domain | |
CN112242984B (en) | Method, electronic device and computer program product for detecting abnormal network request | |
CN112866023B (en) | Network detection method, model training method, device, equipment and storage medium | |
CN107657174B (en) | Database intrusion detection method based on protocol fingerprint | |
CN108924118B (en) | Method and system for detecting database collision behavior | |
CN108616498A (en) | A kind of web access exceptions detection method and device | |
CN111131260B (en) | Mass network malicious domain name identification and classification method and system | |
CN111031026A (en) | DGA malicious software infected host detection method | |
Tong et al. | A method for detecting DGA botnet based on semantic and cluster analysis | |
Celik et al. | Detection of Fast-Flux Networks using various DNS feature sets | |
CN110365636B (en) | Method and device for judging attack data source of industrial control honeypot | |
CN113179260B (en) | Botnet detection method, device, equipment and medium | |
US11886818B2 (en) | Method and apparatus for detecting anomalies in mission critical environments | |
CN110705250A (en) | Method and system for identifying target content in chat records | |
CN112839054A (en) | Network attack detection method, device, equipment and medium | |
Mimura et al. | Leaving all proxy server logs to paragraph vector | |
CN113691489A (en) | Malicious domain name detection feature processing method and device and electronic equipment | |
CN116886400A (en) | Malicious domain name detection method, system and medium | |
CN114205146B (en) | Processing method and device for multi-source heterogeneous security log | |
CN112261004B (en) | Method and device for detecting Domain Flux data stream | |
Zhou et al. | Fingerprinting IIoT devices through machine learning techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||