CN110602020A - Botnet detection technology based on DGA domain name and periodic network connection session behavior - Google Patents

Botnet detection technology based on DGA domain name and periodic network connection session behavior

Info

Publication number
CN110602020A
Authority
CN
China
Prior art keywords
domain name
host
dga
dga domain
population
Legal status
Pending
Application number
CN201810603034.8A
Other languages
Chinese (zh)
Inventor
杨育斌
吴智东
柯宗贵
Current Assignee
Blue Shield Information Security Technology Co Ltd
Bluedon Information Security Technologies Co Ltd
Original Assignee
Blue Shield Information Security Technology Co Ltd
Priority date
2018-06-12
Filing date
2018-06-12
Publication date
2019-12-20
Application filed by Blue Shield Information Security Technology Co Ltd
Priority to CN201810603034.8A
Publication of CN110602020A
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/144 Detection or countermeasures against botnets

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a botnet detection technology based on DGA domain names and periodic network connection session behavior, and relates to the field of network security. The detection method comprises the following steps: acquiring network protocol data and parsing DNS protocol logs; establishing a machine learning detection model for DGA domain names and detecting whether the domain names accessed by a host have the characteristics of DGA domain names; using a clustering method to find host groups with similar DGA domain name access behavior; using a cyclic autocorrelation method to detect whether intermittent network connections exist between a host and a DGA domain name; and combining the results of these steps to detect botnets within the host groups. The invention collects network protocol data from the switch only in bypass mode, and has a wide application range, high detection efficiency and high accuracy.

Description

Botnet detection technology based on DGA domain name and periodic network connection session behavior
Technical Field
The invention relates to the technical field of information security, in particular to a botnet detection technology based on a DGA domain name and periodic network connection session behaviors.
Background
A botnet is a network of computers in which one or more propagation methods are used to infect a large number of hosts with a bot program, after which every controlled node is controlled one-to-many through a control server (C&C server). The bot program usually has a command-and-control domain name embedded in it, and the bot host reaches the C&C server by intermittently accessing this domain name. Once communication with the C&C server is established, the bot can receive commands from the controller and launch distributed denial-of-service (DDoS) attacks and mass spam campaigns, while the controller can also steal information from the bot host, such as bank account passwords. Botnets are therefore a direct threat to the normal operation of enterprise networks and to user data security. Existing botnet detection methods depend on large numbers of manual rules and suffer from low detection precision, high difficulty and high false-alarm rates, so a high-precision botnet detection method that incorporates machine learning is particularly important.
Disclosure of Invention
The underlying network protocols are parsed to obtain message protocol information. DGA domain names in DNS records are identified using machine learning. A clustering algorithm is used to discover groups of hosts that access similar DGA domain names. For all hosts in each group, the method detects whether intermittent network connections exist between the host and the server IP corresponding to the DGA domain name. A host group that meets the above conditions is judged to be a botnet.
To achieve this purpose, the invention provides the following technical scheme for botnet detection based on DGA domain names and periodic network sessions:
Step 1: Traffic in the enterprise network, mainly TCP, UDP, DNS and similar protocols, is captured by a traffic collection probe, parsed into a standardized format, and stored in HDFS and Elasticsearch.
Step 2: Construct nine features for the domain names in DNS messages and detect DGA domain names with the gradient boosting decision tree (GBDT) algorithm.
Step 3: Using the DGA domain names detected in Step 2, apply the K-means method to find host groups with similar DGA domain name access behavior.
Step 4: For the suspicious groups detected in Step 3, use cyclic autocorrelation and similar methods to find hosts that intermittently communicate with the IP addresses to which the DGA domain names resolve.
Step 5: Combining the results of Steps 3 and 4, a group is considered a botnet when the number of qualifying hosts in the group reaches a set threshold.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of the method proposed by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the present invention.
The implementation flow of the scheme is as follows:
Step 1: Connect to the enterprise core switch in bypass mode to collect the traffic of all hosts of interest. A traffic analysis tool parses the messages of protocols such as TCP, UDP and DNS, and the parsed records are stored in HDFS and Elasticsearch through a Kafka message queue.
Step 2: Extract domain name information from the DNS messages parsed in Step 1 and train a DGA domain name detection model.
Step 2.1: Construct a training library. Obtain a library of DGA-generated domain names from an open-source source such as http://data.netlab.360.com/DGA/ and use it as the negative samples; obtain the positive samples from the Alexa website ranking.
Step 2.2: Construct nine domain name detection features: ① domain name entropy; ② uni-gram, bi-gram and tri-gram statistics of the domain name, computed with the n-gram method; ③ proportion of vowels and digits in the domain name; ④ proportion of repeated letters in the domain name; ⑤ proportion of consecutive consonant letters in the domain name; ⑥ proportion of consecutive digits in the domain name; ⑦ probability of the domain name under a Markov model; ⑧ length of the domain name; ⑨ access frequency within a short time window.
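Most of the lexical features listed above can be computed directly from the domain string. The following Python sketch is illustrative only. The patent gives no exact definitions, so several choices here are assumptions: scoring only the label left of the TLD, defining features ⑤ and ⑥ as the share of characters in runs of two or more, the repeated-letter ratio, the add-one-smoothed character-bigram model for feature ⑦, and all function names. Feature ⑨ needs query logs and is omitted. Feature ② is reduced to a single distinct-bigram count.

```python
import math
from collections import Counter

VOWELS = set("aeiou")
ALPHABET_SIZE = 37  # assumed character set: a-z, 0-9 and "-"


def label_of(domain: str) -> str:
    """Assumption: score only the label immediately left of the TLD."""
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def entropy(s: str) -> float:
    """Feature 1: Shannon entropy of the characters, in bits."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def ngrams(s: str, n: int) -> list:
    """Feature 2 building block: character n-grams (n = 1, 2, 3)."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def run_ratio(s: str, pred) -> float:
    """Share of characters lying in runs of >= 2 consecutive matches."""
    in_runs = run = 0
    for ch in s + "\0":  # the sentinel never matches, so the last run is flushed
        if pred(ch):
            run += 1
        else:
            if run >= 2:
                in_runs += run
            run = 0
    return in_runs / len(s)


class BigramMarkov:
    """Feature 7: character-bigram Markov model trained on benign labels."""

    def __init__(self, benign_labels):
        self.pairs = Counter()
        self.starts = Counter()
        for lab in benign_labels:
            for a, b in zip(lab, lab[1:]):
                self.pairs[(a, b)] += 1
                self.starts[a] += 1

    def avg_log_prob(self, s: str) -> float:
        """Mean log P(b | a) with add-one smoothing; higher = more natural."""
        logs = [math.log((self.pairs[(a, b)] + 1) / (self.starts[a] + ALPHABET_SIZE))
                for a, b in zip(s, s[1:])]
        return sum(logs) / len(logs) if logs else 0.0


def lexical_features(domain: str, markov: BigramMarkov) -> dict:
    s = label_of(domain)
    n = len(s)
    return {
        "entropy": entropy(s),                                      # 1
        "distinct_bigrams": len(set(ngrams(s, 2))),                 # 2 (reduced)
        "vowel_ratio": sum(ch in VOWELS for ch in s) / n,           # 3
        "digit_ratio": sum(ch.isdigit() for ch in s) / n,           # 3
        "repeated_ratio": sum(c > 1 for c in Counter(s).values()) / len(set(s)),  # 4
        "consonant_run_ratio": run_ratio(s, lambda c: c.isalpha() and c not in VOWELS),  # 5
        "digit_run_ratio": run_ratio(s, str.isdigit),               # 6
        "markov_logp": markov.avg_log_prob(s),                      # 7
        "length": n,                                                # 8
    }
```

For example, with `markov = BigramMarkov(["google", "facebook", "amazon", "wikipedia"])`, calling `lexical_features("owfcpcrkpmokbu.biz", markov)` gives a length of 14 and a consonant-run ratio of 11/14, and its Markov score is lower than that of `google`. In a real deployment the Markov model would be trained on the benign (Alexa) samples of Step 2.1, and these vectors, together with feature ⑨, would be passed to the GBDT classifier of Step 2.3.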
Step 2.3: Using the library constructed in Step 2.1 as training samples and the features of Step 2.2 as the feature-processing scheme, train a DGA domain name recognition model with the gradient boosting decision tree (GBDT) algorithm.
Step 2.4: Use the model from Step 2.3 to classify the domain names in all DNS queries. For example, domain names such as owfcpcrkpmokbu.biz and gowbqepkhqnk.com are identified.
Step 3: Use the K-means method to find groups of hosts with similar DGA domain name access behavior.
Step 3.1: Filter the DNS records using the detection results of Step 2, keeping only the records in which a DGA domain name was accessed.
Step 3.2: Treat each domain name as a feature; for each host, count the number of times it accessed that domain name and use the count as the feature value, constructing one feature vector per host.
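Steps 3.1 and 3.2 amount to filtering the DNS log and building a host × domain count matrix. A minimal Python sketch, assuming each parsed DNS record has been reduced to a hypothetical `(host_ip, queried_domain)` pair and that the Step 2 model has produced a set of flagged domains:

```python
from collections import Counter, defaultdict


def host_domain_matrix(dns_records, dga_domains):
    """Steps 3.1-3.2 sketch: keep DGA lookups and count them per host.

    dns_records: iterable of (host_ip, queried_domain) pairs (assumed shape).
    dga_domains: set of domains flagged by the Step 2 model.
    Returns (hosts, domains, matrix) where matrix[i][j] is the number of
    times hosts[i] queried domains[j].
    """
    counts = defaultdict(Counter)
    for host, domain in dns_records:
        if domain in dga_domains:  # Step 3.1: keep only DGA lookups
            counts[host][domain] += 1
    hosts = sorted(counts)
    domains = sorted({d for c in counts.values() for d in c})
    # Step 3.2: one feature vector per host, one dimension per DGA domain.
    matrix = [[counts[h][d] for d in domains] for h in hosts]
    return hosts, domains, matrix
```

Hosts that never queried a flagged domain drop out entirely, which keeps the matrix handed to the clustering step small.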
Step 3.3: Collect the feature vectors of all hosts from Step 3.2 and apply the K-means algorithm, with the number of clusters chosen by computing the silhouette coefficient, to find the groups $\{G_1, \ldots, G_k\}$ of hosts that access similar DGA domain names.
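The clustering in Step 3.3 can be sketched without external libraries. The code below is a plain K-means that uses the silhouette coefficient to pick K, as the step describes. The farthest-first seeding, the Euclidean distance, the iteration cap and the upper bound `k_max` are assumptions, because the patent does not specify them. In practice a library implementation such as scikit-learn's `KMeans` and `silhouette_score` would normally be used instead.

```python
import math


def dist(a, b):
    """Euclidean distance between two count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=100):
    """Plain K-means; returns one cluster label per point."""
    # Farthest-first seeding (assumption; keeps the sketch deterministic).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b)."""
    if len(set(labels)) < 2:
        return -1.0  # undefined for a single cluster; treat as worst
    n = len(points)

    def mean_dist(i, cluster):
        ds = [dist(points[i], points[j]) for j in range(n)
              if j != i and labels[j] == cluster]
        return sum(ds) / len(ds) if ds else None

    total = 0.0
    for i in range(n):
        a = mean_dist(i, labels[i])
        if a is None:
            continue  # singleton cluster: s(i) = 0 by convention
        b = min(mean_dist(i, c) for c in set(labels) if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


def choose_k(points, k_max):
    """Try K = 2..k_max and keep the clustering with the best silhouette."""
    best = None
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best
```

Called on the matrix from Step 3.2, `choose_k(matrix, k_max)` returns `(score, K, labels)`, and the hosts sharing a label form one group $G_i$. The pairwise-distance silhouette is O(n²), which is acceptable for a sketch but not for large enterprise host counts.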
Step 4: For the groups $\{G_1, \ldots, G_k\}$ found in Step 3, take each host in group $G_i$ and use the cyclic autocorrelation method over 10-minute time slots to determine whether intermittent communication exists between the host and the IP addresses resolved from the DGA domain names. The calculation is as follows:
Divide a 1-day time window into ten-minute time slots $t_1, t_2, \ldots, t_n$ (so $n = 144$). Traverse the candidate periods $k$ and, for each value, perform the following operations:
① For a candidate period $k_i$, compute the periodic-activity value with the cyclic (circular) autocorrelation
$$r(k_i) = \sum_{t=1}^{n} f(t)\, f(t + k_i),$$
where $f(t)$ is the number of accesses to the IP-DN pair (the IP resolved from the DGA domain name) in time slot $t$ of the window, and $f(t + k_i)$ is $f(t)$ circularly shifted by $k_i$ slots, with indices taken modulo $n$.
② For any natural number $k_i$,
$$r(k_i) \le r(0),$$
so for non-negative counts the normalized value $\alpha(k_i) = r(k_i) / r(0)$ lies between 0 and 1. Set a threshold $\sigma$ ($0 \le \sigma \le 1$). If
$$\alpha(k_i) \ge \sigma,$$
the sequence is considered periodic, i.e., intermittent network communication activity exists.
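The periodicity test of Step 4 can be sketched as follows. It assumes that connection timestamps (in seconds) for one host/IP pair are available from the parsed flow records. The sketch bins a 1-day window into 144 ten-minute slots to build f(t), computes the circular autocorrelation r(k), normalizes by r(0), and compares the best α(k) against σ. The default σ = 0.8, the restriction to k ≤ n/2 (a circular autocorrelation satisfies r(k) = r(n − k)) and the function names are assumptions, not values given in the patent.

```python
def slot_counts(timestamps, day_start, slot_seconds=600, n_slots=144):
    """Build f(t): accesses from one host to one DGA-resolved IP per slot.

    A 1-day window split into 10-minute (600 s) slots gives n = 144.
    Timestamps outside the window are ignored.
    """
    f = [0] * n_slots
    for ts in timestamps:
        idx = int((ts - day_start) // slot_seconds)
        if 0 <= idx < n_slots:
            f[idx] += 1
    return f


def circular_autocorr(f, k):
    """r(k) = sum over t of f(t) * f((t + k) mod n)."""
    n = len(f)
    return sum(f[t] * f[(t + k) % n] for t in range(n))


def is_periodic(f, sigma=0.8):
    """Return (periodic, best_k, best_alpha) with alpha(k) = r(k) / r(0).

    sigma = 0.8 is an assumed default; the patent leaves it unspecified.
    """
    r0 = circular_autocorr(f, 0)
    if r0 == 0:
        return False, None, 0.0  # no traffic in the window
    best_k, best_alpha = None, 0.0
    # r(k) == r(n - k) for a circular autocorrelation, so k <= n/2 suffices.
    for k in range(1, len(f) // 2 + 1):
        alpha = circular_autocorr(f, k) / r0
        if alpha > best_alpha:
            best_k, best_alpha = k, alpha
    return best_alpha >= sigma, best_k, best_alpha
```

A host that beacons every 30 minutes produces a 1 in every third slot, so `is_periodic` returns `(True, 3, 1.0)`. A single isolated connection yields α(k) = 0 for every k ≥ 1. Note that a host that contacts the IP in every slot also reaches α(1) = 1, so continuous traffic would be flagged as well; the patent does not say how such cases are handled.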
Step 5: Combining the results of Steps 3 and 4, a group is considered a botnet when the number of DGA domain names accessed within the group exceeds a set threshold β and the number of hosts in intermittent communication with the DGA domain names reaches a set threshold γ.
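Step 5 combines the two signals for each group. A minimal sketch follows. The input shapes are hypothetical: a per-host set of accessed DGA domains, and a set of hosts flagged as periodic in Step 4. β and γ must be chosen for the deployment, because the patent gives no values. "Exceeds β" is read as a strict comparison and "reaches γ" as a non-strict one.

```python
def is_botnet(group_hosts, dga_domains_by_host, periodic_hosts, beta, gamma):
    """Step 5 sketch: decide whether one host group G_i is a botnet.

    group_hosts: hosts in one cluster from Step 3.
    dga_domains_by_host: host -> set of DGA domains it queried (Step 3.1).
    periodic_hosts: hosts with intermittent communication (Step 4).
    """
    # Distinct DGA domain names accessed anywhere in the group.
    domains = set().union(*(dga_domains_by_host.get(h, set()) for h in group_hosts))
    # Hosts in the group that showed periodic sessions with a DGA-resolved IP.
    n_periodic = sum(1 for h in group_hosts if h in periodic_hosts)
    return len(domains) > beta and n_periodic >= gamma
```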
The botnet detection technology based on DGA domain names and periodic network connection session behavior provided by the embodiments of the invention has been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help understand the method and its core idea. A person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application; accordingly, this specification should not be construed as limiting the invention.

Claims (6)

1. A botnet detection technique based on DGA domain names and periodic network connection session behavior, comprising: parsing raw DNS protocol data; establishing a machine learning detection model for DGA domain names and detecting whether the domain names accessed by a host have the characteristics of DGA domain names; using a clustering method to find host populations with similar DGA domain name access behavior; using a cyclic autocorrelation method to detect whether intermittent network connection behavior exists between a host and a DGA domain name; and, when the number of hosts in a host population that have periodic network sessions with the DGA domain name reaches a set threshold, determining that the host population is a botnet.
2. The method of claim 1, wherein parsing the raw DNS protocol data comprises: collecting DNS protocol data at the switch in bypass mode and parsing it into raw data records.
3. The DGA domain name detection method according to claim 1, comprising: constructing nine features: ① domain name entropy; ② uni-gram, bi-gram and tri-gram statistics of the domain name, computed with the n-gram method; ③ proportion of vowels and digits in the domain name; ④ proportion of repeated letters in the domain name; ⑤ proportion of consecutive consonant letters in the domain name; ⑥ proportion of consecutive digits in the domain name; ⑦ probability of the domain name under a Markov model; ⑧ length of the domain name; ⑨ access frequency within a short time window; and training the detection model with the gradient boosting decision tree (GBDT) algorithm. When a new DNS record appears, the model detects whether its domain name has the characteristics of a DGA domain name.
4. The host population discovery of claim 1, comprising: processing the DNS records of each host into a feature vector in which each accessed DGA domain name is a feature dimension, and constructing a feature matrix; and clustering the feature matrix with the K-means algorithm to find populations with similar DGA domain name access behavior, where the number of clusters K is determined by the silhouette coefficient method.
5. The periodic network session detection of claim 1, comprising: for every detected DGA domain name, finding the IP addresses to which the domain name resolves; and using the cyclic autocorrelation method over 10-minute time slots, with a set threshold, to determine whether the host has a periodic network session with each such IP.
6. The botnet determination of claim 1, comprising: for a host population found by clustering, if the number of hosts in the population that have periodic network sessions with the IP addresses resolved from DGA domain names exceeds a set threshold, determining that the population is a botnet.
CN201810603034.8A 2018-06-12 2018-06-12 Botnet detection technology based on DGA domain name and periodic network connection session behavior Pending CN110602020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810603034.8A CN110602020A (en) 2018-06-12 2018-06-12 Botnet detection technology based on DGA domain name and periodic network connection session behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810603034.8A CN110602020A (en) 2018-06-12 2018-06-12 Botnet detection technology based on DGA domain name and periodic network connection session behavior

Publications (1)

Publication Number Publication Date
CN110602020A (en) 2019-12-20

Family

ID=68849381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810603034.8A Pending CN110602020A (en) 2018-06-12 2018-06-12 Botnet detection technology based on DGA domain name and periodic network connection session behavior

Country Status (1)

Country Link
CN (1) CN110602020A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130191915A1 (en) * 2012-01-25 2013-07-25 Damballa, Inc. Method and system for detecting dga-based malware
CN106713371A (en) * 2016-12-08 2017-05-24 中国电子科技网络信息安全有限公司 Fast Flux botnet detection method based on DNS anomaly mining
CN107566376A (en) * 2017-09-11 2018-01-09 中国信息安全测评中心 Threat intelligence generation method, apparatus and system
CN107645503A (en) * 2017-09-20 2018-01-30 杭州安恒信息技术有限公司 Rule-based method for detecting the DGA family to which a malicious domain name belongs

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113098989A (en) * 2020-01-09 2021-07-09 深信服科技股份有限公司 Dictionary generation method, domain name detection method, device, equipment and medium
CN114666071A (en) * 2020-12-04 2022-06-24 ***通信集团广东有限公司 Botnet identification method and device and terminal equipment
CN114666071B (en) * 2020-12-04 2023-09-05 ***通信集团广东有限公司 Botnet identification method and device and terminal equipment
CN115134095A (en) * 2021-03-10 2022-09-30 中国电信股份有限公司 Botnet control terminal detection method and device, storage medium and electronic equipment
CN113452714A (en) * 2021-06-29 2021-09-28 清华大学 Host clustering method and device
CN113452714B (en) * 2021-06-29 2022-11-18 清华大学 Host clustering method and device

Similar Documents

Publication Publication Date Title
US8260914B1 (en) Detecting DNS fast-flux anomalies
Yin et al. ConnSpoiler: Disrupting C&C communication of IoT-based botnet through fast detection of anomalous domain queries
CN108768883B (en) Network traffic identification method and device
CN110602020A (en) Botnet detection technology based on DGA domain name and periodic network connection session behavior
US20200059451A1 (en) System and method for detecting generated domain
CN112242984B (en) Method, electronic device and computer program product for detecting abnormal network request
CN112866023B (en) Network detection method, model training method, device, equipment and storage medium
CN107657174B (en) Database intrusion detection method based on protocol fingerprint
CN108924118B (en) Method and system for detecting database collision behavior
CN108616498A (en) A kind of web access exceptions detection method and device
CN111131260B (en) Mass network malicious domain name identification and classification method and system
CN111031026A (en) DGA malicious software infected host detection method
Tong et al. A method for detecting DGA botnet based on semantic and cluster analysis
Celik et al. Detection of Fast-Flux Networks using various DNS feature sets
CN110365636B (en) Method and device for judging attack data source of industrial control honeypot
CN113179260B (en) Botnet detection method, device, equipment and medium
US11886818B2 (en) Method and apparatus for detecting anomalies in mission critical environments
CN110705250A (en) Method and system for identifying target content in chat records
CN112839054A (en) Network attack detection method, device, equipment and medium
Mimura et al. Leaving all proxy server logs to paragraph vector
CN113691489A (en) Malicious domain name detection feature processing method and device and electronic equipment
CN116886400A (en) Malicious domain name detection method, system and medium
CN114205146B (en) Processing method and device for multi-source heterogeneous security log
CN112261004B (en) Method and device for detecting Domain Flux data stream
Zhou et al. Fingerprinting IIoT devices through machine learning techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination