CN111756871B - Data processing method based on domain name service protocol and electronic equipment - Google Patents

Data processing method based on domain name service protocol and electronic equipment

Info

Publication number
CN111756871B
Authority
CN
China
Prior art keywords
character
sequence
probability
detection model
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010558544.5A
Other languages
Chinese (zh)
Other versions
CN111756871A (en
Inventor
张新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd
Priority to CN202010558544.5A
Publication of CN111756871A
Application granted
Publication of CN111756871B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application discloses a data processing method and electronic equipment based on a domain name service protocol, wherein the method comprises the following steps: under the condition that a first request based on a domain name service protocol is detected, first data based on the domain name service protocol corresponding to the first request is obtained; analyzing the first data to obtain a first character sequence at a specific position in the first data, and extracting character segments with specific bit length from the first character sequence based on a specific sequence to form a character segment sequence; determining the occurrence probability of character segments in the character segment sequence to form a probability information set, and acquiring first characteristic information based on the probability information set, wherein the first characteristic information represents the chaos degree of the first character sequence; and inputting the first characteristic information into the detection model which is trained, and calculating by using the detection model to obtain the identification result of the protocol type of the first request. The method can identify the encrypted protocol type, and the accuracy of the identification result is higher.

Description

Data processing method based on domain name service protocol and electronic equipment
Technical Field
The present disclosure relates to the field of network information, and in particular, to a data processing method and an electronic device based on a domain name service protocol.
Background
The DNS protocol (Domain Name System, domain name service protocol) is one of the indispensable network communication protocols. It provides domain name resolution, translating between domain names and IP addresses so that internet and intranet resources can be accessed. Most firewalls and intrusion detection devices do little to filter or block DNS traffic, so hiding data or instructions inside the DNS protocol is a covert and efficient means of transmission. In practice, once an attacker has gained control of a server, or the server has been infected by malware, worms, trojans and the like, a DNS tunnel can be established to steal sensitive information, transfer files, return control instructions and so on. Attackers can therefore use DNS tunneling to evade firewall detection, which makes it necessary to detect whether DNS tunneling is present in the data traffic of a network.
If DNS tunneling is present in the data traffic of a network, the traffic needs to be investigated and forensically analyzed. Forensic analysis requires first determining the upper-layer protocol type carried by the DNS tunnel, then identifying the content of the DNS traffic based on that protocol type, and finally determining whether the DNS traffic contains sensitive information, files, control instructions and the like.
Generally, the upper-layer protocol type of DNS traffic is detected by comparing the entropy of the encoded protocol data in the DNS traffic with the entropy of traffic data of various specific protocols, and determining the upper-layer protocol type based on similarity. However, this detection method has two defects: first, its detection accuracy is low; second, it can detect only unencrypted protocol types, not encrypted ones.
Disclosure of Invention
An object of the embodiments of the present application is to provide a data processing method and an electronic device based on a domain name service protocol, where the method can detect data in network traffic through a trained detection model, and conveniently and accurately detect whether a network request is an abnormal specific request based on the domain name service protocol, where the specific request includes a request based on a DNS tunnel technology.
In order to solve the technical problem, the embodiment of the application adopts the following technical scheme:
a data processing method based on a domain name service protocol comprises the following steps:
under the condition that a first request based on a domain name service protocol is detected, first data based on the domain name service protocol corresponding to the first request is obtained;
analyzing the first data to obtain a first character sequence at a specific position in the first data, and extracting character segments with specific bit length from the first character sequence based on a specific sequence to form a character segment sequence;
determining the occurrence probability of the character segments in the character segment sequence to form a probability information set, and acquiring first characteristic information based on the probability information set, wherein the first characteristic information represents the chaos degree of the first character sequence;
and inputting the first characteristic information into a detection model which is trained, and calculating by using the detection model to obtain a recognition result of the protocol type of the first request.
In some embodiments, the method further comprises:
performing a cumulative sum test on the first character sequence to obtain second characteristic information;
correspondingly, the inputting the first feature information into the detection model that has been trained, and performing calculation by using the detection model to obtain the recognition result of the protocol type of the first request includes:
and inputting the first characteristic information and the second characteristic information into the detection model which is trained, and calculating by using the detection model to obtain the identification result of the protocol type of the first request.
In some embodiments, said extracting the character segments having the specific bit length from the first character sequence based on the specific order forms a character segment sequence, including:
starting from the first character of the first character sequence, extracting a character segment having the specific bit length, then shifting by one character at a time toward the end of the first character sequence and extracting another character segment having the specific bit length at each position, until the end of the first character sequence is reached, so as to form the character segment sequence.
In some embodiments, said extracting the character segments having the specific bit length from the first character sequence based on the specific order forms a character segment sequence, including:
character segments having different bit lengths are extracted from the first character sequence based on a specific order to form a plurality of character segment sequences, respectively.
In some embodiments, the determining the probability of occurrence of the character segment in the sequence of character segments forms a probability information set, and the obtaining first feature information based on the probability information set includes:
respectively determining the occurrence probability of the character segments in each character segment sequence, and respectively forming a plurality of corresponding probability information sets;
and acquiring corresponding sub-feature information based on the probability information sets respectively.
In some embodiments, the inputting the first feature information into a detection model that is trained, and performing calculation by using the detection model to obtain a recognition result of the protocol type of the first request includes:
and inputting a plurality of pieces of sub-feature information into a detection model which is trained, and calculating by using the detection model to obtain a recognition result of the protocol type of the first request.
In some embodiments, said extracting character segments having different bit lengths from said first sequence of characters based on a particular order to form a plurality of sequences of character segments, respectively, comprises:
extracting a first character segment with a first bit length from the first character sequence based on a specific sequence to form a first character segment sequence;
extracting a second character segment with a second bit length from the first character sequence based on a specific sequence to form a second character segment sequence;
and extracting a third character segment with a third bit length from the first character sequence based on a specific sequence to form a third character segment sequence.
In some embodiments, the determining the probability of occurrence of the character segment in the sequence of character segments forms a probability information set, and the obtaining first feature information based on the probability information set includes:
calculating the occurrence probability of the character segments in the character segment sequence based on an N-Gram model to form a probability information set;
calculating entropy of the set of probability information as the first feature information.
In some embodiments, the detection model is formed by training an established model architecture, wherein the training process comprises:
preparing a training data set, wherein the training data set comprises a first characteristic information set and a corresponding recognition result data set;
and training the model architecture by taking the first characteristic information set as input data and the recognition result data set as output data.
An electronic device, comprising:
an acquisition module, configured to acquire, when a first request based on a domain name service protocol is detected, first data based on the domain name service protocol and corresponding to the first request;
an analysis module, configured to analyze the first data, obtain a first character sequence at a specific position in the first data, and extract character segments with a specific bit length from the first character sequence based on a specific order to form a character segment sequence;
a determining module, configured to determine the occurrence probabilities of the character segments in the character segment sequence to form a probability information set, and obtain first feature information based on the probability information set, where the first feature information represents the degree of chaos of the first character sequence;
and a processing module, configured to input the first feature information into a detection model which has been trained, perform calculation using the detection model, and obtain a recognition result of the protocol type of the first request.
According to the data processing method based on the domain name service protocol, the probability of occurrence of each character fragment in the character fragment sequence is calculated, the probability information set is formed, the first characteristic information capable of representing the chaos degree of the first character sequence is obtained based on the probability information set, the protocol type of the first request can be well described, the first characteristic information is used as input data and input into a detection model completing training, the identification result of the protocol type of the first request can be obtained, the encryption protocol can be identified, and the identification result has high accuracy.
Drawings
Fig. 1 is a flowchart of a data processing method based on a domain name service protocol according to an embodiment of the present application;
Fig. 2 is a flowchart of a specific implementation of a data processing method based on a domain name service protocol according to an embodiment of the present application;
Fig. 3 is a flowchart of another specific implementation of a data processing method based on a domain name service protocol according to an embodiment of the present application;
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Various aspects and features of the present application are described herein with reference to the drawings.
It will be understood that various modifications may be made to the embodiments of the present application. Accordingly, the foregoing description should not be construed as limiting, but merely as exemplifications of embodiments. Those skilled in the art will envision other modifications within the scope and spirit of the application.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the application and, together with a general description of the application given above and the detailed description of the embodiments given below, serve to explain the principles of the application.
These and other characteristics of the present application will become apparent from the following description of preferred forms of embodiment, given as non-limiting examples, with reference to the attached drawings.
It should also be understood that, although the present application has been described with reference to some specific examples, a person of skill in the art shall certainly be able to achieve many other equivalent forms of application, having the characteristics as set forth in the claims and hence all coming within the field of protection defined thereby.
The above and other aspects, features and advantages of the present application will become more apparent in view of the following detailed description when taken in conjunction with the accompanying drawings.
Specific embodiments of the present application are described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely exemplary of the application, which can be embodied in various forms. Well-known and/or repeated functions and constructions are not described in detail, to avoid obscuring the application with unnecessary or redundant detail. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present application in virtually any appropriately detailed structure.
The specification may use the phrases "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," which may each refer to one or more of the same or different embodiments in accordance with the application.
Fig. 1 is a flowchart of a data processing method based on a domain name service protocol according to an embodiment of the present application, and as shown in fig. 1, the data processing method based on the domain name service protocol according to the embodiment of the present application specifically includes the following steps:
S100, when a first request based on a domain name service protocol is detected, first data based on the domain name service protocol and corresponding to the first request is obtained.
The domain name service protocol is the DNS protocol (Domain Name System). The first request may be a DNS request sent by a terminal to a server, or may be other request information. In a specific implementation, the server side may be monitored so that the first request is detected when the server receives it; alternatively, the terminal may be monitored so that the first request is detected when the terminal sends it to the server.
The first data may be the first request itself or data associated with the first request. In one embodiment, when the first request is detected, the DNS traffic can be captured and saved in pcap format using the Wireshark application or the tcpdump command on the Linux operating system.
S200, analyzing the first data, obtaining a first character sequence at a specific position in the first data, and extracting character segments with specific bit length from the first character sequence based on a specific sequence to form a character segment sequence.
The first character sequence is a character sequence at a specific position in the first data. Taking the case where the first data is a DNS request as an example, the first character sequence may be any character sequence in the DNS request that includes tunnel information; for example, it may be the query name (Query Name). After the DNS request is obtained, it may be parsed and the query name extracted as the first character sequence.
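As an illustration of extracting the query name, the following is a minimal sketch using only the Python standard library. It assumes an uncompressed name in the first question of a raw DNS message; the helper names `build_query` and `extract_query_name`, the transaction ID and the sample domain are hypothetical and are not taken from the application.

```python
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query message (hypothetical helper for the demo)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    # QTYPE and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def extract_query_name(message: bytes) -> str:
    """Return the query name of the first question in a DNS message."""
    if len(message) < 12:
        raise ValueError("message shorter than the 12-byte DNS header")
    qdcount = struct.unpack("!H", message[4:6])[0]
    if qdcount < 1:
        raise ValueError("message carries no question")
    labels = []
    pos = 12  # the question section starts right after the header
    while True:
        length = message[pos]
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("compressed names are not handled in this sketch")
        labels.append(message[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels)


query = build_query("dGVzdA.t.example.com")
first_character_sequence = extract_query_name(query)
print(first_character_sequence)
```

In a real deployment the message bytes would come from captured traffic rather than from `build_query`, which exists here only to make the sketch self-contained.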
After the first character sequence is obtained, character segments with a specific bit length can be extracted from it in a specific order, and these character segments form a character segment sequence. The specific order may run from the first character of the first character sequence to its end, or may be another order. The specific bit length refers to the character length of a character segment and may be, for example, 2, 3, 4 or 5 characters, or even more. After the character segments are extracted, they are arranged in the specific order to form the character segment sequence. In a specific implementation, either a single character segment sequence or a plurality of character segment sequences may be obtained.
S300, determining the occurrence probability of the character segments in the character segment sequence to form a probability information set, and acquiring first characteristic information based on the probability information set, wherein the first characteristic information represents the chaos degree of the first character sequence.
In a particular embodiment, the occurrence probability of the character segments in the character segment sequence may be calculated based on an N-Gram model to form the probability information set. Taking the first character sequence abbcabc as an example, extracting character segments of 2 characters gives the character segment sequence [ab, bb, bc, ca, ab, bc], that is, the 2-grams. The occurrence probability of each character segment in the character segment sequence is then: ab: 2/6, bb: 1/6, bc: 2/6, ca: 1/6.
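The probability information set for the 2-gram example can be reproduced with a short Python sketch. The function name `occurrence_probabilities` is an illustrative assumption; the input is the character segment sequence listed above, and exact fractions are used so the result can be compared directly with the listed values.

```python
from collections import Counter
from fractions import Fraction


def occurrence_probabilities(fragments):
    """Map each distinct character segment to its relative frequency."""
    counts = Counter(fragments)
    total = len(fragments)
    return {frag: Fraction(count, total) for frag, count in counts.items()}


# The 2-gram character segment sequence from the example in the text.
fragments = ["ab", "bb", "bc", "ca", "ab", "bc"]
probabilities = occurrence_probabilities(fragments)
for frag, p in probabilities.items():
    print(frag, p)
```

`Fraction` reduces 2/6 to 1/3 when printing, which is the same value.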
The degree of chaos can be understood as the degree of disorder: the more disorderly the distribution of the character segments, the higher the degree of chaos, and the more orderly the distribution, the lower the degree of chaos. It can also be understood in terms of the occurrence probability of discrete random events: the higher the occurrence probability, the smaller the uncertainty and the lower the degree of chaos; the lower the occurrence probability, the greater the uncertainty and the higher the degree of chaos. In a specific embodiment, after the probability information set is obtained, the entropy of the probability information set can be calculated and taken as the first feature information. When the occurrence probability is calculated based on the N-Gram model, the entropy is the N-Gram entropy. The formula for calculating the N-Gram entropy is as follows:
$$H(x) = -\sum_{i=1}^{n} P(x_i)\,\log_2 P(x_i)$$
where $H(x)$ is the N-Gram entropy and $P(x_i)$ is the occurrence probability of the $i$-th character segment.
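Taking the above first character sequence as an example, its 2-gram entropy is calculated as follows:
$$H = -\left(\tfrac{2}{6}\log_2\tfrac{2}{6} + \tfrac{1}{6}\log_2\tfrac{1}{6} + \tfrac{2}{6}\log_2\tfrac{2}{6} + \tfrac{1}{6}\log_2\tfrac{1}{6} + \tfrac{2}{6}\log_2\tfrac{2}{6} + \tfrac{2}{6}\log_2\tfrac{2}{6}\right)$$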
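The entropy calculation above can be sketched in Python. The function name `ngram_entropy` and the `per_position` flag are illustrative assumptions. By default the sum runs over distinct character segments, which is the usual Shannon form; with `per_position=True` it adds one term per element of the character segment sequence, mirroring how the worked example lists its terms. Which of the two the method intends is not stated explicitly, so both are shown.

```python
import math
from collections import Counter


def ngram_entropy(fragments, per_position=False):
    """Entropy in bits of a character segment sequence.

    per_position=False: one term per distinct segment (Shannon entropy).
    per_position=True:  one term per element of the sequence (assumption
                        modelled on the worked example's term list).
    """
    total = len(fragments)
    if total == 0:
        return 0.0
    counts = Counter(fragments)
    if per_position:
        terms = [counts[frag] / total for frag in fragments]
    else:
        terms = [count / total for count in counts.values()]
    return -sum(p * math.log2(p) for p in terms)


fragments = ["ab", "bb", "bc", "ca", "ab", "bc"]
print(round(ngram_entropy(fragments), 4))
print(round(ngram_entropy(fragments, per_position=True), 4))
```

Whichever variant is chosen, the same one would need to be used both when building the training data and when extracting features at detection time.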
S400, inputting the first characteristic information into a detection model which has been trained, and calculating by using the detection model to obtain a recognition result of the protocol type of the first request.
The detection model is a machine learning model obtained by training a pre-constructed model architecture, as shown in fig. 2. The pre-constructed model architecture may include one or more detection algorithms, such as a random forest algorithm or a gradient boosting decision tree algorithm. After the model architecture is constructed, it needs to be trained to improve the accuracy of the output recognition result. The training data set for training the model architecture may include a first feature information set and a corresponding recognition result data set. The first feature information set includes a plurality of pieces of first feature information, and the recognition result data set includes the recognition results corresponding to the first feature information. During training, the first feature information in the first feature information set is used as input data and the recognition results in the recognition result data set are used as output data to train the model architecture repeatedly. Once the accuracy of the recognition results output by the model architecture is found to meet the threshold requirement during verification, the detection model is deemed trained and can be deployed online.
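The train-then-verify flow can be sketched as follows. To stay within the Python standard library, this sketch swaps in a nearest-centroid classifier as a stand-in for the random forest or tree-based algorithms named above; it is not the model the application describes. The feature values, the labels `protocol_a` and `protocol_b`, and the 0.9 accuracy threshold are synthetic placeholders chosen only for illustration.

```python
import math


def train_centroids(features, labels):
    """Fit a nearest-centroid model: one mean feature vector per label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda y: math.dist(model[y], x))


def accuracy(model, features, labels):
    """Fraction of samples whose predicted label matches the given label."""
    hits = sum(predict(model, x) == y for x, y in zip(features, labels))
    return hits / len(labels)


# Synthetic placeholder data: [2-gram, 3-gram, 4-gram entropy] -> protocol label.
train_x = [[2.1, 2.3, 2.4], [2.0, 2.2, 2.3], [4.4, 4.6, 4.7], [4.5, 4.7, 4.8]]
train_y = ["protocol_a", "protocol_a", "protocol_b", "protocol_b"]
val_x = [[2.2, 2.4, 2.5], [4.3, 4.5, 4.6]]
val_y = ["protocol_a", "protocol_b"]

ACCURACY_THRESHOLD = 0.9  # hypothetical threshold requirement

model = train_centroids(train_x, train_y)
val_accuracy = accuracy(model, val_x, val_y)
ready_for_deployment = val_accuracy >= ACCURACY_THRESHOLD
print(val_accuracy, ready_for_deployment)
```

This sketch fits the model once and checks it once; the repeated training described above would wrap the fit and the verification in a loop that adjusts the model or the data until the threshold is met.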
After the training is finished, the first feature information acquired in real time is input into the trained detection model, and the detection model is used for calculation to obtain a recognition result representing the protocol type of the first request. In a specific implementation, the N-Gram entropy may be input into the detection model to obtain the corresponding recognition result.
According to the data processing method based on the domain name service protocol, the probability information set is formed by calculating the occurrence probability of the character fragments in the character fragment sequence, the first characteristic information capable of representing the chaos degree of the character fragments is obtained based on the probability information set, the protocol type of the first request can be well described, the first characteristic information is used as input data and is input into the detection model completing training, the identification result of the protocol type of the first request can be obtained, the encryption protocol can be identified, and the identification result has high accuracy.
As shown in fig. 2, in some embodiments the method further comprises: performing a cumulative sum test on the first character sequence to obtain second characteristic information, where the second characteristic information is the cumulative sum test value. The cumulative sum test is a hypothesis test that calculates how likely it is that the hypothesis that the test object is a random sequence will be accepted. The cumulative sum test value obtained by the calculation lies in the range (0, 1), and a value close to 0 indicates greater randomness. Correspondingly, the inputting the first feature information into the detection model that has been trained, and performing calculation by using the detection model to obtain the recognition result of the protocol type of the first request includes: inputting the first characteristic information and the second characteristic information into the detection model which has been trained, and calculating by using the detection model to obtain the recognition result of the protocol type of the first request. The cumulative sum test value characterizes encrypted protocol types well, and adding it as second characteristic information can improve the accuracy of the recognition result output by the detection model.
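The text does not specify which cumulative sum statistic is used. One widely known form is the cumulative sums test from NIST SP 800-22, and the sketch below follows that formulation (forward mode) as an assumption. Converting the character sequence to bits through its UTF-8 bytes is also an assumption, and the names `to_bits` and `cusum_p_value` are illustrative. In the NIST formulation a small p-value counts as evidence against randomness, so how this value maps onto the convention stated above is left open here.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def to_bits(text):
    """Expand a string into bits via its UTF-8 bytes (assumed encoding)."""
    return [
        (byte >> shift) & 1
        for byte in text.encode("utf-8")
        for shift in range(7, -1, -1)
    ]


def cusum_p_value(bits):
    """Forward cumulative sums test p-value in the NIST SP 800-22 form."""
    n = len(bits)
    if n == 0:
        raise ValueError("empty bit sequence")
    # Map bits to +1/-1 and track the largest excursion of the partial sums.
    partial, z = 0, 0
    for bit in bits:
        partial += 1 if bit else -1
        z = max(z, abs(partial))
    root = math.sqrt(n)
    # Summation limits use floor on real-valued bounds (implementations vary).
    upper = math.floor((n / z - 1) / 4)
    first = sum(
        normal_cdf((4 * k + 1) * z / root) - normal_cdf((4 * k - 1) * z / root)
        for k in range(math.floor((-n / z + 1) / 4), upper + 1)
    )
    second = sum(
        normal_cdf((4 * k + 3) * z / root) - normal_cdf((4 * k + 1) * z / root)
        for k in range(math.floor((-n / z - 3) / 4), upper + 1)
    )
    return 1.0 - first + second


second_feature = cusum_p_value(to_bits("abbcabc"))
print(second_feature)
```

The resulting value could be appended to the entropy features before they are passed to the detection model.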
In some embodiments, said extracting the character segments having the specific bit length from the first character sequence based on the specific order forms a character segment sequence, including:
starting from the first character of the first character sequence, extracting a character segment having the specific bit length, then shifting by one character at a time toward the end of the first character sequence and extracting another character segment having the specific bit length at each position, until the end of the first character sequence is reached, so as to form the character segment sequence.
In specific implementation, a sliding window with a character length of N may be configured, the sliding window is set to slide from the first character in the first character sequence to the end direction, and a character segment with a character length of N is extracted through the sliding window when moving one character each time until the sliding window moves to the end of the first character sequence. Wherein, the value of N can be 2, 3, 4, 5, etc. In other embodiments, the sliding window is not limited to one character per movement, but may move multiple characters.
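The sliding window can be sketched in a few lines of Python. The function name `sliding_fragments` and the `step` parameter are illustrative assumptions; `step=1` corresponds to moving one character at a time, and a larger `step` covers the variant in which the window moves several characters. The input string is an illustrative value.

```python
def sliding_fragments(sequence: str, n: int, step: int = 1):
    """Extract character segments of length n with a sliding window."""
    if n <= 0 or step <= 0:
        raise ValueError("n and step must be positive")
    # The last valid start index is len(sequence) - n, so sequences shorter
    # than the window yield an empty list.
    return [sequence[i : i + n] for i in range(0, len(sequence) - n + 1, step)]


print(sliding_fragments("abbcabc", 2))
print(sliding_fragments("abbcabc", 3, step=2))
```

Only full-length segments are kept here; whether a shorter trailing segment should be retained when `step` is greater than 1 is not specified in the text.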
In practical applications, a plurality of character segment sequences may be obtained, and the character segments in each character segment sequence may have different bit lengths, as shown in fig. 3. That is, the extracting of the character segments having a specific bit length from the first character sequence based on a specific order to form a character segment sequence may include:
character segments having different bit lengths are extracted from the first character sequence based on a specific order to form a plurality of character segment sequences, respectively.
As one specific embodiment, the extracting of the character segments with different bit lengths from the first character sequence based on a specific order to form a plurality of character segment sequences respectively includes:
extracting a first character segment with a first bit length from the first character sequence based on a specific sequence to form a first character segment sequence;
extracting a second character segment with a second bit length from the first character sequence based on a specific sequence to form a second character segment sequence;
and extracting a third character segment with a third bit length from the first character sequence based on a specific sequence to form a third character segment sequence.
For example, the first bit length may be configured as 2 characters, the second bit length may be configured as 3 characters, and the third bit length may be configured as 4 characters. And then configuring a first sliding window based on the first bit length, setting the first sliding window to slide from the first character in the first character sequence to the end direction, extracting a first character segment with the character length being the first bit length every time the first sliding window moves one character until the sliding window moves to the end of the first character sequence, and forming the first character segment sequence based on the first character segments. And configuring a second sliding window based on the second bit length, setting the second sliding window to slide from the first character in the first character sequence to the end direction, extracting a second character segment with the character length of the second bit length when moving one character until the sliding window moves to the end of the first character sequence, and forming a second character segment sequence based on the second character segments. And configuring a third sliding window based on the third bit length, setting the third sliding window to slide from the first character in the first character sequence to the tail direction, extracting a third character segment with the character length of the third bit length when moving one character until the sliding window moves to the tail of the first character sequence, and forming a third character segment sequence based on the third character segments. Of course, for example, the fourth bit length, the fifth bit length, and so on may be configured to obtain other character fragment sequences.
After a plurality of character segment sequences are obtained, the occurrence probability of the character segments in each character segment sequence can be determined separately to form a plurality of corresponding probability information sets; corresponding sub-feature information is then acquired based on each probability information set. That is, determining the occurrence probability of the character segments in the character segment sequence to form a probability information set, and acquiring the first feature information based on the probability information set, includes:
respectively determining the occurrence probability of the character segments in each character segment sequence, and respectively forming a plurality of corresponding probability information sets;
and acquiring corresponding sub-feature information based on the probability information sets respectively.
Specifically, a first occurrence probability of each first character segment in the first character segment sequence may be determined to form a first probability information set; a second occurrence probability of each second character segment in the second character segment sequence may be determined to form a second probability information set; and a third occurrence probability of each third character segment in the third character segment sequence may be determined to form a third probability information set. First sub-feature information is then acquired based on the first probability information set, second sub-feature information based on the second probability information set, and third sub-feature information based on the third probability information set. For example, the 2-Gram entropy, the 3-Gram entropy, and the 4-Gram entropy may be obtained respectively.
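As an illustration of how a probability information set and the corresponding N-Gram entropy might be computed, the sketch below makes two assumptions that the text does not state: the occurrence probability of a segment is its relative frequency within its own character segment sequence, and the entropy is the Shannon entropy in bits. The function names are hypothetical.

```python
import math
from collections import Counter


def segment_probabilities(segments):
    """Probability information set: relative frequency of each distinct segment (assumed)."""
    total = len(segments)
    return {seg: count / total for seg, count in Counter(segments).items()}


def shannon_entropy(probabilities):
    """Shannon entropy in bits; log2(1 / p) keeps every term non-negative."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


def ngram_entropy(sequence, n):
    """Entropy of the length-n segments obtained with a one-character sliding window."""
    segments = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    if not segments:
        return 0.0  # sequence shorter than n: no segments, so no disorder to measure
    return shannon_entropy(segment_probabilities(segments).values())


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, ngram_entropy("www.example.com", n))
```

Under these assumptions, a sequence made of one repeated character has zero entropy for every n, while a sequence whose segments are all distinct reaches the maximum, the base-2 logarithm of the number of segments.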
Then, the plurality of pieces of sub-feature information may be input, as the first feature information, into the trained detection model to obtain a recognition result of the protocol type of the first request. That is, inputting the first feature information into the trained detection model and performing calculation by using the detection model to obtain the recognition result of the protocol type of the first request includes:
and inputting a plurality of pieces of sub-feature information into a detection model which is trained, and calculating by using the detection model to obtain a recognition result of the protocol type of the first request.
Specifically, the first sub-feature information, the second sub-feature information, and the third sub-feature information may be input, as the first feature information, into the trained detection model, and the detection model is used to perform calculation to obtain a recognition result of the protocol type of the first request. For example, in one embodiment, the 2-Gram entropy, the 3-Gram entropy, the 4-Gram entropy, and the accumulation check value may be input together into the trained detection model to obtain the corresponding recognition result. Analyzing several N-Gram entropies together characterizes the protocol type more effectively and improves the accuracy of the recognition result of the protocol type of the first request.
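One possible way to assemble the model input from these values is sketched below. The accumulation check is assumed here to be the sum of the UTF-8 byte values of the first character sequence modulo 256; the text does not specify the exact check, so this choice, like the names `accumulation_check` and `build_feature_vector`, is purely illustrative.

```python
import math
from collections import Counter


def ngram_entropy(sequence, n):
    """Shannon entropy (bits) of the length-n sliding-window segments of `sequence`."""
    segments = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    if not segments:
        return 0.0
    total = len(segments)
    return sum((c / total) * math.log2(total / c) for c in Counter(segments).values())


def accumulation_check(sequence, modulus=256):
    """Assumed check value: sum of the UTF-8 byte values, reduced modulo `modulus`."""
    return sum(sequence.encode("utf-8")) % modulus


def build_feature_vector(sequence, lengths=(2, 3, 4)):
    """Feature vector: one entropy per configured length, then the check value."""
    entropies = [ngram_entropy(sequence, n) for n in lengths]
    return entropies + [float(accumulation_check(sequence))]


if __name__ == "__main__":
    print(build_feature_vector("dGhpcyBpcyBhIHRlc3Q"))
```

The resulting four-element list is what a trained detection model would receive in this sketch; the order of the elements must match the order used during training.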
Referring to FIG. 4, an embodiment of the present application further provides an electronic device, which includes:
an obtaining module 10, configured to obtain, when a first request based on a domain name service protocol is detected, first data based on the domain name service protocol corresponding to the first request;
a parsing module 20, configured to parse the first data, obtain a first character sequence at a specific position in the first data, and extract character segments with a specific bit length from the first character sequence based on a specific order to form a character segment sequence;
a determining module 30, configured to determine the occurrence probabilities of the character segments in the character segment sequence to form a probability information set, and to obtain first feature information based on the probability information set, where the first feature information represents the degree of confusion of the first character sequence;
and a processing module 40, configured to input the first feature information into a trained detection model, perform calculation by using the detection model, and obtain a recognition result of the protocol type of the first request.
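The division of work among the four modules can be pictured with the following sketch. Everything concrete in it is an assumption made for illustration: the request is modeled as a dictionary with a `qname` field, the "specific position" is taken to be the leftmost label of the query name, a single segment length of 2 is used, and a fixed threshold stands in for the trained detection model with hypothetical labels `"suspect"` and `"normal"`.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence


def obtain(request: dict) -> str:
    """Obtaining module 10 (sketch): return the first data for a detected request."""
    return request["qname"]  # hypothetical field name


def parse(first_data: str, n: int = 2) -> list[str]:
    """Parsing module 20 (sketch): take the leftmost label and cut it into n-character segments."""
    first_sequence = first_data.split(".")[0]  # assumed "specific position"
    return [first_sequence[i:i + n] for i in range(len(first_sequence) - n + 1)]


def determine(segments: Sequence[str]) -> float:
    """Determining module 30 (sketch): entropy of the segment frequencies, in bits."""
    if not segments:
        return 0.0
    total = len(segments)
    return sum((c / total) * math.log2(total / c) for c in Counter(segments).values())


@dataclass
class Device:
    """Processing module 40 (sketch): hands the feature to an injected model callable."""
    model: Callable[[Sequence[float]], str]

    def handle(self, request: dict) -> str:
        return self.model([determine(parse(obtain(request)))])


if __name__ == "__main__":
    # Stand-in for a trained model: a fixed entropy threshold (illustrative only).
    device = Device(model=lambda f: "suspect" if f[0] > 2.0 else "normal")
    print(device.handle({"qname": "abcdefgh.example.com"}))
```

Passing the model in as a callable keeps the feature pipeline independent of whichever detection model is eventually trained.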
In some embodiments, the electronic device further comprises:
a verification module, configured to perform an accumulation check on the first character sequence to obtain second feature information;
correspondingly, the processing module 40 is specifically configured to:
input the first feature information and the second feature information into the trained detection model, and perform calculation by using the detection model to obtain the recognition result of the protocol type of the first request.
In some embodiments, the parsing module 20 is specifically configured to:
starting from the first character of the first character sequence, extracting a character segment with the specific bit length, then shifting by one character at a time toward the end of the first character sequence and extracting one character segment with the specific bit length at each position, until the end of the first character sequence is reached, so as to form the character segment sequence.
In some embodiments, the parsing module 20 is specifically configured to:
character segments having different bit lengths are extracted from the first character sequence based on a specific order to form a plurality of character segment sequences, respectively.
In some embodiments, the determining module 30 is specifically configured to:
respectively determining the occurrence probability of the character segments in each character segment sequence, and respectively forming a plurality of corresponding probability information sets;
and acquiring corresponding sub-feature information based on the probability information sets respectively.
In some embodiments, the processing module 40 is specifically configured to:
and inputting a plurality of pieces of sub-feature information into a detection model which is trained, and calculating by using the detection model to obtain a recognition result of the protocol type of the first request.
In some embodiments, the parsing module 20 is further configured to:
extracting a first character segment with a first bit length from the first character sequence based on a specific sequence to form a first character segment sequence;
extracting a second character segment with a second bit length from the first character sequence based on a specific sequence to form a second character segment sequence;
and extracting a third character segment with a third bit length from the first character sequence based on a specific sequence to form a third character segment sequence.
In some embodiments, the determining module 30 is specifically configured to:
calculating the occurrence probability of the character segments in the character segment sequence based on an N-Gram model to form a probability information set;
and calculating the entropy of the probability information set as the first feature information.
In some embodiments, the electronic device further comprises:
a training module, configured to train an established model architecture to form the detection model, wherein the training process comprises:
preparing a training data set, wherein the training data set comprises a first characteristic information set and a corresponding recognition result data set;
and training the model architecture by taking the first characteristic information set as input data and the recognition result data set as output data.
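The training process can be illustrated with a small, self-contained sketch. The text does not fix the model architecture, so a logistic-regression classifier trained by stochastic gradient descent is used here purely as a stand-in; the toy feature values, the binary labels, and the hyperparameters are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(features, labels, epochs=2000, lr=0.1):
    """Fit weights and bias so that the features (input) predict the labels (output)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            err = p - y  # gradient of the log-loss with respect to the logit
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Return class 1 when the predicted probability is at least 0.5, else class 0."""
    weights, bias = model
    return 1 if sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5 else 0


if __name__ == "__main__":
    # Toy training set: one entropy-like feature per sample, with 0/1 class labels.
    training_features = [[0.5], [1.0], [3.0], [3.5]]
    training_labels = [0, 0, 1, 1]
    model = train(training_features, training_labels)
    print(predict(model, [0.8]), predict(model, [3.2]))
```

In practice the feature vectors would hold the sub-feature information described above, and the labels would hold the known protocol types of the training requests.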
The above embodiments are only exemplary embodiments of the present application, and are not intended to limit the present application, and the protection scope of the present application is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present application and such modifications and equivalents should also be considered to be within the scope of the present application.

Claims (8)

1. A data processing method based on a domain name service protocol is characterized by comprising the following steps:
under the condition that a first request based on a domain name service protocol is detected, first data based on the domain name service protocol corresponding to the first request is obtained;
analyzing the first data to obtain a first character sequence at a specific position in the first data, and extracting character segments with specific bit length from the first character sequence based on a specific sequence to form a character segment sequence;
determining the occurrence probability of the character segments in the character segment sequence to form a probability information set, and acquiring first characteristic information based on the probability information set, wherein the first characteristic information represents the chaos degree of the first character sequence;
inputting the first characteristic information into a detection model which is trained, calculating by using the detection model, and acquiring a recognition result of the protocol type of the first request;
wherein the extracting of the character segments with a specific bit length from the first character sequence based on a specific order to form a character segment sequence comprises:
extracting character segments with different bit lengths from the first character sequence based on a specific order to form a plurality of character segment sequences, respectively;
wherein, the determining the occurrence probability of the character segment in the character segment sequence forms a probability information set, and acquiring first feature information based on the probability information set includes:
respectively determining the occurrence probability of the character segments in each character segment sequence, and respectively forming a plurality of corresponding probability information sets;
and acquiring corresponding sub-feature information based on the probability information sets respectively.
2. The method of claim 1, further comprising:
accumulating and checking the first character sequence to obtain second characteristic information;
correspondingly, the inputting the first feature information into the detection model that has been trained, and performing calculation by using the detection model to obtain the recognition result of the protocol type of the first request includes:
and inputting the first characteristic information and the second characteristic information into the detection model which is trained, and calculating by using the detection model to obtain the identification result of the protocol type of the first request.
3. The method of claim 1, wherein extracting the character segments with a specific bit length from the first character sequence based on a specific order forms a sequence of character segments, comprising:
and extracting a character segment with a specific bit length from the first character in the first character sequence, and staggering one character in the ending direction of the first character sequence every time, and extracting one character segment with the specific bit length until the end of the first character sequence so as to form the character segment sequence.
4. The method according to claim 1, wherein the inputting the first feature information into a detection model that is trained, and performing calculation by using the detection model to obtain the recognition result of the protocol type of the first request comprises:
and inputting a plurality of pieces of sub-feature information into a detection model which is trained, and calculating by using the detection model to obtain a recognition result of the protocol type of the first request.
5. The method of claim 1, wherein extracting character segments with different bit lengths from the first character sequence based on a specific order to respectively form a plurality of character segment sequences comprises:
extracting a first character segment with a first bit length from the first character sequence based on a specific sequence to form a first character segment sequence;
extracting a second character segment with a second bit length from the first character sequence based on a specific sequence to form a second character segment sequence;
and extracting a third character segment with a third bit length from the first character sequence based on a specific sequence to form a third character segment sequence.
6. The method of claim 1, wherein the determining the probability of occurrence of the character segment in the sequence of character segments forms a probability information set, and wherein obtaining first feature information based on the probability information set comprises:
calculating the occurrence probability of the character segments in the character segment sequence based on an N-Gram model to form a probability information set;
calculating entropy of the set of probability information as the first feature information.
7. The method of claim 1, wherein the detection model is formed by training an established model architecture, wherein the training process comprises:
preparing a training data set, wherein the training data set comprises a first characteristic information set and a corresponding recognition result data set;
and training the model architecture by taking the first characteristic information set as input data and the recognition result data set as output data.
8. An electronic device, comprising:
an acquisition module, configured to acquire, when a first request based on a domain name service protocol is detected, first data based on the domain name service protocol corresponding to the first request;
a parsing module, configured to parse the first data, acquire a first character sequence at a specific position in the first data, and extract character segments with a specific bit length from the first character sequence based on a specific order to form a character segment sequence;
a determining module, configured to determine the occurrence probabilities of the character segments in the character segment sequence to form a probability information set, and to acquire first feature information based on the probability information set, where the first feature information represents the degree of confusion of the first character sequence;
a processing module, configured to input the first feature information into a trained detection model, perform calculation by using the detection model, and acquire a recognition result of the protocol type of the first request;
wherein the parsing module is specifically configured to:
extracting character segments with different bit lengths from the first character sequence based on a specific order to form a plurality of character segment sequences, respectively;
wherein the determining module is specifically configured to:
respectively determining the occurrence probability of the character segments in each character segment sequence, and respectively forming a plurality of corresponding probability information sets;
and acquiring corresponding sub-feature information based on the probability information sets respectively.
CN202010558544.5A 2020-06-18 2020-06-18 Data processing method based on domain name service protocol and electronic equipment Active CN111756871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010558544.5A CN111756871B (en) 2020-06-18 2020-06-18 Data processing method based on domain name service protocol and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010558544.5A CN111756871B (en) 2020-06-18 2020-06-18 Data processing method based on domain name service protocol and electronic equipment

Publications (2)

Publication Number Publication Date
CN111756871A CN111756871A (en) 2020-10-09
CN111756871B true CN111756871B (en) 2022-04-26

Family

ID=72675459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010558544.5A Active CN111756871B (en) 2020-06-18 2020-06-18 Data processing method based on domain name service protocol and electronic equipment

Country Status (1)

Country Link
CN (1) CN111756871B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105897714A (en) * 2016-04-11 2016-08-24 天津大学 Botnet detection method based on DNS (Domain Name System) flow characteristics
CN106992969A (en) * 2017-03-03 2017-07-28 南京理工大学 DGA based on domain name character string statistical nature generates the detection method of domain name
CN111031026A (en) * 2019-12-09 2020-04-17 杭州安恒信息技术股份有限公司 DGA malicious software infected host detection method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10498751B2 (en) * 2017-05-31 2019-12-03 Infoblox Inc. Inline DGA detection with deep networks
US11108794B2 (en) * 2018-01-31 2021-08-31 Micro Focus Llc Indicating malware generated domain names using n-grams

Also Published As

Publication number Publication date
CN111756871A (en) 2020-10-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant