CN113852645A - Method and device for resisting client DNS cache poisoning attack and electronic equipment - Google Patents
- Publication number
- CN113852645A (application CN202111457407.3A)
- Authority
- CN
- China
- Prior art keywords
- description information
- proxy server
- action
- environment state
- action description
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4505—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
- H04L61/4511—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Biology (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a method, a device and electronic equipment for resisting client DNS cache poisoning attacks. The method comprises the following steps: acquiring a request from a client and a DNS proxy server set; judging whether the target domain name in the request hits a domain name in the local cache; if not, acquiring the current environment state; inputting the current environment state into a trained selection strategy model to obtain action description information for selecting within the DNS proxy server set; and selecting the corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name. By means of the selection strategy model, the invention addresses the uncertainty of the attacker's payoff function under bounded rationality, adaptively selects a DNS proxy server according to the state of the DNS proxy servers in the current attack-defense game, and improves both the effectiveness of the network service in defending against DNS cache poisoning attacks and its seconds-level processing capability.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for resisting client DNS cache poisoning attack and electronic equipment.
Background
At present, cloud platforms have gradually become the mainstream paradigm for internet services because they can adaptively allocate storage and computing resources to meet massive, personalized service demands. More and more internet service providers are migrating their services to the cloud, and domain name system security plays an increasingly important role in network defense. In recent years, client Domain Name System (DNS) cache poisoning attacks have emerged. Unlike other attack methods, such an attack targets the DNS client using only a non-privileged malicious program and directly bypasses the main DNS defense system. First, the non-privileged malware repeatedly requests resolution of a target domain name, such as www.a.com. When the client cannot find the IP of the domain name in its local DNS cache, it queries a DNS resolver. The attacker then attempts to deliver a forged response within the time window before the legitimate DNS response reaches the client. Under the existing DNS response mechanism, the client accepts the first response that matches the source IP, source port, destination IP, destination port, and transaction ID (TXID) of its query. If the attacker provides a matching response first, the client retains the forged mapping in its DNS cache, which can cause the client to interact with the wrong server. Such an attack therefore redirects the cloud user's requests to a zombie server and can further leak private information. This attack can compromise the DNS cache of a cloud instance within tens of seconds, seriously threatening the security of the cloud platform.
Existing cloud DNS defense technologies still suffer from serious shortcomings such as poor adaptability, frequent system changes, and high cost, which severely restrict their practical deployment. At present, defense strategies against client DNS cache poisoning attacks fall mainly into two categories. The first is non-cryptographic defense, such as randomizing the source UDP port, which protects clients from DNS cache poisoning by increasing the uncertainty of the source port. However, in recent DNS attacks a non-privileged malicious program may occupy ports, rendering port randomization ineffective. The second category uses cryptographic techniques such as DNSSEC and DNSCrypt. Although such approaches offer excellent security in theory, introducing encryption adds many changes and burdens to the DNS interaction flow, thereby reducing the communication and energy efficiency of the system.
In summary, there is a need for a method for resisting client DNS cache poisoning attack to solve the above problems in the prior art.
Disclosure of Invention
Due to the problems existing in the existing method, the invention provides a method and a device for resisting client DNS cache poisoning attack and electronic equipment.
In a first aspect, the present invention provides a method for resisting a client DNS cache poisoning attack, including:
acquiring a request from a client and a DNS proxy server set; the request includes a target domain name;
judging whether the target domain name hits the domain name in the local cache;
if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the number of times the server has been selected by the client, the number of times it has been selected by an attacker, and its round-trip delay to the client;
inputting the current environment state into a trained selection strategy model to obtain action description information selected in the DNS proxy server set;
selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name;
the trained selection strategy model is obtained by training by utilizing different environment states.
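The steps of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `resolve` and the callables `get_state`, `policy` and `query` are hypothetical stand-ins for the environment-state collector, the trained selection strategy model and the actual DNS query, and the action description information is assumed to be an index into the DNS proxy server set.

```python
from typing import Callable, Dict, List, Sequence


def resolve(domain: str,
            cache: Dict[str, str],
            proxies: Sequence[str],
            get_state: Callable[[], List[float]],
            policy: Callable[[List[float]], int],
            query: Callable[[str, str], str]) -> str:
    """Return the target IP for `domain`, checking the local cache first."""
    # Cache hit: answer locally, no proxy is contacted.
    if domain in cache:
        return cache[domain]
    # Cache miss: read the current environment state ...
    state = get_state()
    # ... let the (assumed, already trained) model pick an action ...
    action = policy(state)
    # ... map the action to one DNS proxy server and query it.
    proxy = proxies[action]
    ip = query(proxy, domain)
    # Store the response so later requests hit the cache.
    cache[domain] = ip
    return ip
```

In this sketch a second request for the same domain is served from the cache and never reaches the strategy model.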
Further, the selection strategy model includes a value network, a policy network, an actor target network, and a critic target network, and before the current environment state is input to the trained selection strategy model and the action description information selected in the DNS proxy server set is obtained, the method further includes:
acquiring a preset number of training sample sets; each group of training samples comprises a first environment state, action description information, a second environment state and an action reward; the action description information is obtained by inputting the first environment state into the policy network; the first environment state is the environment state before the action corresponding to the action description information is executed; the second environment state is the environment state after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information;
inputting the first environment state and the action description information into the value network to obtain a first function value;
inputting the second environment state into the actor target network to obtain next action description information;
inputting the second environment state and the next action description information into the critic target network to obtain a second function value;
determining an advantage function according to the first function value and the second function value;
determining a gradient according to the advantage function;
and updating the parameters of the selection strategy model according to the gradient to obtain the trained selection strategy model.
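The text does not state how the first and second function values are combined. As a hedged sketch, one common choice is the one-step temporal-difference form, in which the action reward plus the discounted second function value is compared with the first function value. The function name `one_step_advantage` and the discount value are assumptions made only for illustration.

```python
def one_step_advantage(first_value: float,
                       second_value: float,
                       reward: float,
                       gamma: float = 0.99) -> float:
    """One-step advantage estimate (assumed form, not taken from the source).

    first_value  -- value network output for (first state, action)
    second_value -- critic target network output for (second state, next action)
    reward       -- action reward observed for the transition
    gamma        -- discount factor applied to future value
    """
    return reward + gamma * second_value - first_value
```

A positive result indicates that the executed action turned out better than the value network predicted, so the update should make that action more likely.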
Further, the acquiring a preset number of training sample sets includes:
establishing a game model;
determining a first environment state and action description information according to the game model;
determining an action reward according to the action description information;
and determining a second environment state according to the first environment state and the action description information.
Further, the determining an action reward according to the action description information includes:
acquiring transmission time delay corresponding to the action description information;
and determining the action reward corresponding to the action description information according to the transmission delay.
Further, before the obtaining of the trained selection strategy model, the method further includes:
and optimizing the process of updating the parameters of the selection strategy model by adopting trust region policy optimization (TRPO).
Further, before the selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name, the method further includes:
determining a selected DNS proxy server according to the action description information;
and adopting a self-checking component to check the DNS proxy server.
Further, the checking the DNS proxy server by using a self-checking component includes:
acquiring a transition set;
determining, with the self-checking component, whether the DNS proxy server is in the transition set;
and if the transient state of the DNS proxy server has changed from positive incentive to negative feedback, reselecting the action description information by using a normal-distribution sampling component.
In a second aspect, the present invention provides an apparatus for resisting client DNS cache poisoning attacks, including:
the acquisition module is used for acquiring a request from a client and a DNS proxy server set; the request includes a target domain name;
the processing module is used for judging whether the target domain name hits the domain name in the local cache; if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the number of times the server has been selected by the client, the number of times it has been selected by an attacker, and its round-trip delay to the client; inputting the current environment state into a trained selection strategy model to obtain action description information selected in the DNS proxy server set; selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name; the trained selection strategy model is obtained by training by utilizing different environment states.
Further, the selection strategy model includes a value network, a policy network, an actor target network, and a critic target network, and the processing module is further configured to:
acquiring a preset number of training sample sets before inputting the current environment state into a trained selection strategy model to obtain action description information for selection in the DNS proxy server set; each group of training samples comprises a first environment state, action description information, a second environment state and an action reward; the action description information is obtained by inputting the first environment state into the policy network; the first environment state is the environment state before the action corresponding to the action description information is executed; the second environment state is the environment state after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information;
inputting the first environment state and the action description information into the value network to obtain a first function value;
inputting the second environment state into the actor target network to obtain next action description information;
inputting the second environment state and the next action description information into the critic target network to obtain a second function value;
determining an advantage function according to the first function value and the second function value;
determining a gradient according to the advantage function;
and updating the parameters of the selection strategy model according to the gradient to obtain the trained selection strategy model.
Further, the processing module is specifically configured to:
establishing a game model;
determining a first environment state and action description information according to the game model;
determining an action reward according to the action description information;
and determining a second environment state according to the first environment state and the action description information.
Further, the processing module is specifically configured to:
acquiring transmission time delay corresponding to the action description information;
and determining the action reward corresponding to the action description information according to the transmission delay.
Further, the processing module is further configured to:
and before the trained selection strategy model is obtained, optimizing the process of updating parameters of the selection strategy model by adopting trust region policy optimization (TRPO).
Further, the processing module is further configured to:
before the corresponding DNS proxy server is selected according to the action description information and the target IP corresponding to the target domain name is obtained, the selected DNS proxy server is determined according to the action description information;
and adopting a self-checking component to check the DNS proxy server.
Further, the processing module is specifically configured to:
acquiring a transition set;
determining, with the self-checking component, whether the DNS proxy server is in the transition set;
and if the transient state of the DNS proxy server has changed from positive incentive to negative feedback, reselecting the action description information by using a normal-distribution sampling component.
In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the method for resisting the client DNS cache poisoning attack according to the first aspect.
In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for resisting a client DNS cache poisoning attack as described in the first aspect.
According to the above technical scheme, the method, the device and the electronic equipment for resisting client DNS cache poisoning attacks address the uncertainty of the attacker's payoff function under bounded rationality by means of the selection strategy model, adaptively select a DNS proxy server according to the state of the DNS proxy servers in the current attack-defense game, and improve both the effectiveness of the network service in defending against DNS cache poisoning attacks and its seconds-level processing capability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a system framework of a method for resisting a DNS cache poisoning attack of a client according to the present invention;
fig. 2 is a schematic flow chart of a method for resisting a DNS cache poisoning attack of a client according to the present invention;
fig. 3 is a schematic flow chart of a method for resisting a DNS cache poisoning attack at a client according to the present invention;
fig. 4 is a schematic structural diagram of a device for resisting client DNS cache poisoning attacks according to the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The method for resisting client side DNS cache poisoning attacks provided by the embodiment of the present invention may be applied to a system architecture as shown in fig. 1, where the system architecture includes a client side 100, a selection policy model 200, and a DNS proxy server set 300.
Specifically, the DNS proxy server set 300 is used to determine whether the target domain name hits a domain name in the local cache.
The DNS proxy first gets the request from the application and checks if the domain name is retained in the DNS cache map.
The selection policy model 200 is used to obtain the current environment state if the target domain name does not hit the cache and, given the current environment state as input, to output the action description information for selecting within the DNS proxy server set 300.
The selection policy model selects the target IP of a DNS resolver from the DNS target IP set, requests resolution of the domain name that missed the cache, and stores the response. Matching and non-matching responses are collected separately and then analyzed to identify spy processes that may be cooperating with an external attacker, and those processes are then blocked.
The client 100 is configured to select a corresponding DNS proxy according to the action description information, and obtain a target IP corresponding to the target domain name.
It should be noted that fig. 1 is only an example of a system architecture according to the embodiment of the present invention, and the present invention is not limited to this specifically.
Based on the system architecture illustrated above, fig. 2 is a schematic flow chart of a method for resisting a client DNS cache poisoning attack according to an embodiment of the present invention. As shown in fig. 2, the method includes:
Step 201, acquiring a request from a client and a DNS proxy server set.
It should be noted that the request includes the target domain name.
Step 202, judging whether the target domain name hits a domain name in the local cache.
Step 203, if not, acquiring the current environment state.
It should be noted that the environment state includes state information of each DNS proxy server in the DNS proxy server set; the state information includes the number of times the server has been selected by the client, the number of times it has been selected by the attacker, and its round-trip delay to the client.
Step 204, inputting the current environment state into the trained selection strategy model to obtain action description information selected in the DNS proxy server set.
Step 205, selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name.
It should be noted that the trained selection strategy model is obtained by training with different environment states.
According to this scheme, a DNS proxy server system is deployed on the client side, the uncertainty of the attacker's payoff function under bounded rationality is addressed by the selection strategy model, a DNS proxy server is selected adaptively according to the state of the DNS proxy servers in the current attack-defense game, and both the effectiveness of the network service in defending against DNS cache poisoning attacks and its seconds-level processing capability are improved.
Before step 204, the embodiment of the present invention includes the steps shown in fig. 3, which are as follows:
Step 301, acquiring a preset number of training sample sets.
It should be noted that each set of training samples includes a first environment state, action description information, a second environment state, and an action reward; the action description information is obtained by inputting the first environment state into the policy network; the first environment state is the environment state before the action corresponding to the action description information is executed; the second environment state is the environment state after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information.
Specifically, a game model is established.
In practical network attack and defense, decisions are non-cooperative: neither party exposes its strategy to the other, so the game is based on incomplete information. Whether or not the two parties act simultaneously, each predicts the opponent's actions with limited knowledge, so this is a static game. Although the attacker pursues the optimal strategy as if fully rational, its subjective level of cognition means that it can only grasp limited information. A stochastic game is a multi-stage game model that combines game theory with Markov theory and fits a multi-round process, where a Markov process describes the transitions of the game state caused by the behavior of both parties. On this basis, the game model established by the embodiment of the invention is a static stochastic game model with incomplete information under bounded rationality.
In the embodiment of the invention, the game model is defined by the following elements:
1. The players of the game: an attacker, who intends to carry out cache poisoning, and a defender, who operates under the DNS proxy architecture.
2. The action space of the defender: the set of DNS proxy IPs configured in the client. In each time unit, the client selects one IP from this set to request DNS resolution, and this selection is the defender's action for that time unit.
3. The action space of the attacker: through non-privileged spyware, the attacker can obtain the set of DNS proxy server IP addresses configured in the client. The attacker's action in each time unit is the address it chooses from this set when attempting to forge a response packet.
4. The game state: each game state consists of the delay of each IP in the current DNS proxy IP set, the number of times the client has selected each IP, and the number of times the attacker has selected each IP. During the attack-defense confrontation, malicious response packets can be identified by the corresponding matching program, and an experienced attacker can likewise collect and analyze the defender's historical behavior. Thus, the state can be composed of these three quantities, where the delay is the round-trip delay between the client and the DNS proxy IP.
5. The defense policy in a given state: a defense policy is a rule over defensive actions that specifies the selected action in the form of a probability, namely the probability of selecting each defensive action.
6. The return function: the immediate return the defender receives when, in a given state, the attacker takes its action and the defender takes its action.
The return function includes a penalty applied when the defender chooses the same action as the attacker. When the defender selects an action different from the attacker's, the defender receives a positive return that is inversely proportional to the delay of the selected IP.
According to this scheme, the attack-defense game between the attacking end and the defending end is considered comprehensively, and a static stochastic game model with incomplete information under bounded rationality is established to guide attack and defense strategies in large-scale networks.
Further, a first environment state and action description information are determined according to the game model.
Based on the game model, the first environment state is composed of the delay of each IP in the current DNS proxy IP set, the number of times the client has selected each IP, and the number of times the attacker has selected each IP. The action description information corresponds to the action of selecting one IP from the DNS proxy server IP set to request DNS resolution.
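The composition of the environment state described above can be sketched as a flat feature vector. The class name `ProxyStats`, the field names and the per-proxy ordering of the three quantities are assumptions chosen for illustration; the text only specifies which quantities the state contains.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProxyStats:
    """Per-proxy state information (hypothetical field names)."""
    client_picks: int     # times the client selected this proxy IP
    attacker_picks: int   # times the attacker selected this proxy IP
    rtt_ms: float         # round-trip delay between client and proxy


def build_state(stats: List[ProxyStats]) -> List[float]:
    """Flatten the per-proxy statistics into one state vector."""
    state: List[float] = []
    for s in stats:
        state.extend([float(s.client_picks), float(s.attacker_picks), s.rtt_ms])
    return state
```

With this layout the state has three entries per DNS proxy server, so its length grows linearly with the size of the proxy set.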
An action reward is determined according to the action description information.
Specifically, the transmission delay corresponding to the action description information is obtained,
and the action reward corresponding to the action description information is determined according to the transmission delay.
It should be noted that the reward includes a penalty applied when the defender chooses the same action as the attacker, and that the delay used is the round-trip delay between the client and the selected DNS proxy server.
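The reward rule described above can be sketched as follows. The exact return function is not recoverable from this text, so the constants `penalty` and `scale`, the negative sign of the penalty and the function name `action_reward` are assumptions; only the two cases (penalty on a collision with the attacker, otherwise a positive return inversely proportional to the delay) come from the description.

```python
def action_reward(defender_action: int,
                  attacker_action: int,
                  rtt: float,
                  penalty: float = 1.0,
                  scale: float = 1.0) -> float:
    """Immediate return of the defender for one time unit (assumed form)."""
    if rtt <= 0:
        raise ValueError("round-trip delay must be positive")
    # Same proxy IP chosen by both sides: the defender is penalised.
    if defender_action == attacker_action:
        return -penalty
    # Different choices: positive return, inversely proportional to delay.
    return scale / rtt
```

Under this form a faster proxy yields a larger reward, so the learned policy is pushed toward low-delay servers that the attacker is not targeting.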
And determining a second environment state according to the first environment state and the action description information.
It should be noted that under conditions of imperfect rationality, there is no Nash equilibrium between the two parties to the game. In this case, the defender converges to an optimal defense strategy corresponding to the behavior of the attacker through a monotonically non-decreasing deep reinforcement learning method. Under fully rational conditions, the action spaces of both parties are finite sets, neither party knows the other's return function, and each state is a finite static game with incomplete information. Any finite static game with incomplete information has a Bayesian Nash equilibrium, which means that the defender can converge to a Nash equilibrium strategy in the game.
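Step 302, inputting the first environment state and the action description information into the value network to obtain a first function value.
Step 303, inputting the second environment state into the actor target network to obtain next action description information.
Step 304, inputting the second environment state and the next action description information into the critic target network to obtain a second function value.
Step 305, determining an advantage function according to the first function value and the second function value.
In the advantage function, the horizon is the last step of each round, and a discount factor weights future value. Given the environment state, the corresponding value estimate is provided by a neural network.
Step 306, determining a gradient according to the advantage function.
The gradient is taken as the average of N individual gradients to ensure a more accurate estimate. Each term involves the probability of taking the given action in the given state of the stochastic game.
Step 307, updating the parameters of the selection strategy model according to the gradient to obtain the trained selection strategy model.
The update is scaled by the learning rate of the actor in deep reinforcement learning.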
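The averaging of N gradients and the learning-rate-scaled update can be illustrated with a deliberately simplified policy. The sketch below assumes a softmax policy over a single vector of logits (one logit per DNS proxy server, no state dependence) rather than the neural networks of the embodiment, and all function names are hypothetical. For a softmax policy, the gradient of the log-probability of the chosen action with respect to the logits is the one-hot vector of that action minus the probability vector.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def averaged_gradient(logits: Sequence[float],
                      samples: Sequence[Tuple[int, float]]) -> List[float]:
    """Average of N per-sample gradients, each weighted by its advantage.

    samples -- (action index, advantage estimate) pairs
    """
    if not samples:
        raise ValueError("at least one sample is required")
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for action, advantage in samples:
        for k, p in enumerate(probs):
            indicator = 1.0 if k == action else 0.0
            grad[k] += advantage * (indicator - p)
    return [g / len(samples) for g in grad]


def ascent_step(logits: Sequence[float],
                grad: Sequence[float],
                learning_rate: float) -> List[float]:
    """Gradient ascent on the policy parameters, scaled by the learning rate."""
    return [x + learning_rate * g for x, g in zip(logits, grad)]
```

Averaging over several samples reduces the variance of the estimate, and the learning rate controls how far a single update moves the policy.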
In the embodiment of the invention, the DNS proxy server represents the defender in the game model and takes a corresponding action according to each environment state.
The defender can interact with the attacker without knowing the attacker's return function. After the defender takes an action, the environment returns an action reward. By repeating this process, the DNS proxy server iteratively optimizes its strategy and finally converges to the optimal action under different environment states.
According to this scheme, the malicious non-privileged program is identified by training the selection strategy model, the spy process is blocked, and DNS cache poisoning attacks are resisted to a great extent.
For practical application, an intelligent selection strategy based on deep reinforcement learning (DRL) and a threat model with non-privileged malicious programs are further provided. A DNS proxy server is selected intelligently and adaptively according to the state of the DNS servers in the current attack-defense game, so that the success rate of DNS cache poisoning attacks is greatly reduced.
Furthermore, the embodiment of the invention adopts trust region policy optimization (TRPO) to optimize the process of updating the parameters of the selection strategy model, thereby ensuring that the training process is monotonically non-decreasing and that the strategy finally converges to the optimal strategy in the game.
The embodiment of the invention mainly addresses two problems that hinder rapid convergence: the time consumed because historical data cannot be reused, and the unstable convergence caused by gradient updates of the neural network.
When a single actor is used to collect data, the training set must be resampled every time the policy is updated, which is very time-consuming. For this reason, the embodiment of the invention establishes two actor networks.
One actor interacts with the environment and acquires training data. Meanwhile, the other actor iteratively updates its policy from the collected set, and the two are synchronized after a certain number of steps. This involves the key problem of importance sampling.
It should be noted that if samples cannot be drawn from distribution p, another distribution q is used to obtain data for estimating the expectation of a function under p. Although the resulting estimate of the expectation is unbiased, its variance may be large when the two distributions differ. It is therefore necessary to ensure that the distributions p and q do not differ too much.
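Importance sampling rests on the identity E_{x~p}[f(x)] = E_{x~q}[f(x) p(x)/q(x)], which holds whenever q(x) > 0 wherever p(x) > 0. The following sketch checks it numerically on a small discrete example; the distributions, the function f and the name `importance_estimate` are illustrative assumptions, with p and q standing in for the action distributions of the two actors.

```python
import random
from typing import Callable, Dict


def importance_estimate(p: Dict[int, float],
                        q: Dict[int, float],
                        f: Callable[[int], float],
                        n: int,
                        rng: random.Random) -> float:
    """Estimate E_p[f] from n samples drawn from q, reweighted by p/q."""
    support = list(q)
    weights = [q[x] for x in support]
    draws = rng.choices(support, weights=weights, k=n)
    # Each sample is weighted by the likelihood ratio p(x) / q(x).
    return sum(f(x) * p[x] / q[x] for x in draws) / n


# Target distribution p (cannot be sampled, by assumption) and proposal q.
p = {0: 0.1, 1: 0.2, 2: 0.7}
q = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
estimate = importance_estimate(p, q, lambda x: float(x * x), 50_000, random.Random(0))
exact = sum(p[x] * x * x for x in p)  # expectation of x^2 under p
```

The further q moves away from p, the larger the weights p(x)/q(x) become for some samples and the noisier the estimate, which is why the two actors must not drift too far apart between synchronizations.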
A is the advantage function. The accompanying term is the probability, under the two actor policies, of taking action a in state s. The policy is modified to:
Further, the optimization formula is as follows:
In the attack-defense game of embodiments of the present invention, the large number of states implies an almost unlimited number of constraints. Even with the conjugate gradient approximation, the complexity remains higher than is acceptable. Therefore, a PPO-based method for limiting the update step is adopted as follows:
The constraint function is defined as:
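The formulas referred to above are not reproduced in this text. As a rough illustration of how a PPO-style method limits the update step, the sketch below uses the widely used clipped surrogate objective. The clipping form, the parameter `eps` and the function names are assumptions and may differ from the constraint function actually defined in the embodiment.

```python
def clip(value, low, high):
    """Restrict value to the interval [low, high]."""
    return max(low, min(high, value))


def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO-style clipped objective for a single sample.

    ratio is the probability of the action under the actor being updated,
    divided by its probability under the actor that collected the data.
    Taking the minimum removes any incentive to push the ratio outside
    [1 - eps, 1 + eps], which bounds the size of each policy update.
    """
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return min(unclipped, clipped)


def mean_objective(ratios, advantages, eps=0.2):
    """Average the clipped objective over a batch of samples."""
    values = [clipped_surrogate(r, a, eps) for r, a in zip(ratios, advantages)]
    return sum(values) / len(values)


# A ratio of 1.5 with a positive advantage is capped at 1.2 * advantage.
print(clipped_surrogate(1.5, 1.0))
# A ratio of 0.5 with a negative advantage is evaluated at 0.8 * advantage.
print(clipped_surrogate(0.5, -1.0))
```

Compared with an explicit trust-region constraint, this form needs only first-order gradients, which is one common reason for preferring it when the state space is large.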
According to this scheme, trust region policy optimization is used to optimize the parameter-update process of the selection strategy model, so that the training process is monotonically non-decreasing and the performance of the selection strategy model is improved.
In the embodiment of the invention, the state of some servers in the DNS proxy server set may change transiently, for example a change in network delay caused by the network environment or service unavailability caused by a server system failure. Such a change causes the corresponding action selection to switch instantly from positive incentive to negative feedback. Although the state can change instantly, the agent, because of the inherent mechanism of reinforcement learning, may not immediately adjust its action selection for that server node and may still decide on the basis of historical experience. This causes the security performance of the agent to fluctuate. The same holds for a change in the opposite direction.
Based on this, in the embodiment of the present invention, before the corresponding DNS proxy server is selected according to the action description information and the target IP corresponding to the target domain name is obtained, the selected DNS proxy server is determined according to the action description information, and a self-checking component is used to check that DNS proxy server.
Specifically, a transition set is obtained.
In the embodiment of the invention, the states of the servers (such as network delay, service availability, and whether they are under attack) are queried in real time according to the DNS proxy server selection list, and the IP of each DNS proxy server in a transient state is added to the transition set.
The self-checking component is used to judge whether the DNS proxy server is in the transition set.
If the transient state of the DNS proxy server has changed from positive incentive to negative feedback, the action description information is reselected using a normal-distribution sampling component.
Specifically, if the transient state has changed from positive incentive to negative feedback, the normal-distribution sampling component is used for reselection; if it has changed from negative feedback to positive incentive, no processing is performed.
It should be noted that sampling components based on other distributions, such as the Poisson distribution, may also be used; this is not specifically limited in this embodiment of the present invention.
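The self-check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the transition set is modelled as a dictionary from proxy IP to the direction of the transient change, the labels `"pos_to_neg"` and `"neg_to_pos"` are hypothetical, and the way the normal distribution is mapped onto the candidate list (centred on the middle index) is an assumption, since the text does not specify it.

```python
import random


def self_check(selected_ip, transitions, candidates, rng, max_tries=100):
    """Return the proxy IP to use after the self-check.

    transitions maps a proxy IP to the direction of its transient change:
    "pos_to_neg" (positive incentive -> negative feedback) or "neg_to_pos".
    """
    # Not in the transition set, or changed in the harmless direction: keep the choice.
    if transitions.get(selected_ip) != "pos_to_neg":
        return selected_ip

    # Candidates that did not just turn from positive incentive to negative feedback.
    usable = [ip for ip in candidates if transitions.get(ip) != "pos_to_neg"]
    if not usable:
        return selected_ip  # nothing better is available

    # Draw an index from a normal distribution centred on the middle of the list
    # and redraw until it falls inside the valid range.
    mu = (len(usable) - 1) / 2.0
    sigma = max(len(usable) / 4.0, 0.5)
    for _ in range(max_tries):
        index = round(rng.gauss(mu, sigma))
        if 0 <= index < len(usable):
            return usable[index]
    return usable[0]


candidates = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
transitions = {"10.0.0.2": "pos_to_neg", "10.0.0.3": "neg_to_pos"}
rng = random.Random(0)

kept = self_check("10.0.0.3", transitions, candidates, rng)        # no processing
reselected = self_check("10.0.0.2", transitions, candidates, rng)  # reselection
print(kept, reselected)
```

Swapping `rng.gauss` for another sampler would give the alternative distributions mentioned in the text.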
According to this scheme, a self-checking component is added for optimization. The self-checking component jointly checks the state and the action after an action is selected, which resolves the fluctuation in defense performance caused by transient transitions, achieves stable convergence of the agent, ensures active-defense security performance, and completes efficient response and defense. Meanwhile, the energy consumption ratio of the data center is reduced, and the migration decision speed of the virtual machine is increased.
Based on the same inventive concept, FIG. 4 exemplarily shows an apparatus for resisting a client DNS cache poisoning attack according to an embodiment of the present invention; the apparatus may execute the flow of the method for resisting a client DNS cache poisoning attack.
The apparatus comprises:
an obtaining module 401, configured to obtain a request from a client and a DNS proxy server set; the request includes a target domain name;
a processing module 402, configured to determine whether the target domain name hits a domain name in a local cache; if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the times selected by the client, the times selected by an attacker and the round-trip delay between the client and the attacker; inputting the current environment state into a trained selection strategy model to obtain action description information selected in the DNS proxy server set; selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name; the trained selection strategy model is obtained by training by utilizing different environment states.
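As a rough sketch of the flow handled by the obtaining module 401 and the processing module 402, the code below checks the local cache, builds the environment state, asks a policy for an action and queries the chosen proxy. The names `ProxyState`, `resolve`, `policy` and `query` are hypothetical; the policy here is a simple stand-in rather than the trained selection strategy model, and storing the answer in the cache and incrementing the selection counter are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ProxyState:
    ip: str
    client_selections: int    # times selected by the client
    attacker_selections: int  # times selected by an attacker
    rtt_ms: float             # round-trip delay


State = List[Tuple[int, int, float]]


def resolve(domain: str,
            cache: Dict[str, str],
            proxies: List[ProxyState],
            policy: Callable[[State], int],
            query: Callable[[str, str], str]) -> str:
    """Resolve domain through a proxy chosen by the policy on a cache miss."""
    if domain in cache:  # local cache hit: no proxy is selected
        return cache[domain]

    # Current environment state: one entry of state information per proxy.
    state = [(p.client_selections, p.attacker_selections, p.rtt_ms) for p in proxies]
    action = policy(state)  # action description: index of the proxy to use
    proxy = proxies[action]
    proxy.client_selections += 1  # assumption: update the counter after selection

    target_ip = query(proxy.ip, domain)
    cache[domain] = target_ip  # assumption: keep the answer for later requests
    return target_ip


proxies = [ProxyState("10.0.0.1", 0, 0, 40.0), ProxyState("10.0.0.2", 0, 0, 15.0)]
queried = []  # records which proxy was asked, for demonstration only


def fake_query(proxy_ip: str, domain: str) -> str:
    queried.append(proxy_ip)
    return "192.0.2.10"  # documentation address, stands in for a real answer


def lowest_rtt_policy(state: State) -> int:
    return min(range(len(state)), key=lambda i: state[i][2])


cache: Dict[str, str] = {}
first = resolve("example.com", cache, proxies, lowest_rtt_policy, fake_query)
second = resolve("example.com", cache, proxies, lowest_rtt_policy, fake_query)
print(first, second, queried)
```

The second call is answered from the cache, so the proxy is contacted only once.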
Further, the selection policy model includes a value network, a policy network, an actor target network, and a critic target network, and the processing module 402 is further configured to:
acquiring a preset number of training sample sets before inputting the current environment state into a trained selection strategy model to obtain action description information for selection in the DNS proxy server set; each group of training samples comprises a first environment state, action description information, a second environment state and action rewards; the action description information is obtained after the strategy network inputs the first environment state; the first environment state is an environment state before the action corresponding to the action description information is executed; the second environment state is the environment state after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information;
inputting the first environment state and the action description information into the value network to obtain a first function value;
inputting the second environment state into the actor target network to obtain next action description information;
inputting the second environment state and the next action description information into the critic target network to obtain a second function value;
determining an advantage function according to the first function value and the second function value;
determining a gradient according to the advantage function;
and updating the parameters of the selection strategy model according to the gradient to obtain the trained selection strategy model.
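The update steps listed above can be sketched for a toy softmax policy. The text does not state how the two function values are combined, so the temporal-difference form used in `advantage` (reward plus discounted second value minus first value), the discount `gamma`, the learning rate and the tabular logits are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def advantage(first_value, second_value, reward, gamma=0.9):
    """Assumed TD-style advantage: reward + gamma * second_value - first_value."""
    return reward + gamma * second_value - first_value


def policy_gradient_step(logits, action, adv, lr=0.1):
    """One gradient-ascent step on adv * log pi(action) for a softmax policy.

    For a softmax policy the derivative of log pi(action) with respect to
    logit i is (1 if i == action else 0) - pi_i.
    """
    probs = softmax(logits)
    return [
        logit + lr * adv * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


logits = [0.0, 0.0, 0.0]  # three DNS proxy servers, initially equally likely
adv = advantage(first_value=1.0, second_value=2.0, reward=0.5)
new_logits = policy_gradient_step(logits, action=1, adv=adv)
print(softmax(logits)[1], softmax(new_logits)[1])
```

A positive advantage raises the probability of the action that was taken, and a negative one lowers it.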
Further, the processing module 402 is specifically configured to:
establishing a game model;
determining a first environment state and action description information according to the game model;
determining an action reward according to the action description information;
and determining a second environment state according to the first environment state and the action description information.
Further, the processing module 402 is specifically configured to:
acquiring transmission time delay corresponding to the action description information;
and determining the action reward corresponding to the action description information according to the transmission delay.
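A training sample of the kind described (first environment state, action, action reward, second environment state) could be generated by a toy environment step such as the one below. The linear mapping from transmission delay to reward, the cap `max_delay_ms` and the use of per-proxy selection counts as the state are assumptions for illustration; the text only states that the reward is determined from the transmission delay.

```python
def delay_reward(delay_ms, max_delay_ms=200.0):
    """Map a transmission delay to a reward in [0, 1]; lower delay gives a higher reward."""
    bounded = min(max(delay_ms, 0.0), max_delay_ms)
    return 1.0 - bounded / max_delay_ms


def step(state, action, delays_ms):
    """Apply an action to the first environment state.

    state holds one selection count per DNS proxy server; the second
    environment state is the same list with the chosen proxy's count
    increased by one. Returns (second_state, reward).
    """
    reward = delay_reward(delays_ms[action])
    second_state = list(state)  # copy so the first state is left unchanged
    second_state[action] += 1
    return second_state, reward


first_state = [0, 0, 0]
delays_ms = [50.0, 120.0, 20.0]
action = 2
second_state, reward = step(first_state, action, delays_ms)
sample = (first_state, action, reward, second_state)
print(sample)
```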
Further, the processing module 402 is further configured to:
before the trained selection strategy model is obtained, optimizing the parameter-update process of the selection strategy model using trust region policy optimization.
Further, the processing module 402 is further configured to:
before the corresponding DNS proxy server is selected according to the action description information and the target IP corresponding to the target domain name is obtained, the selected DNS proxy server is determined according to the action description information;
and checking the DNS proxy server using a self-checking component.
Further, the processing module 402 is specifically configured to:
acquiring a transition set;
determining, with the self-checking component, whether the DNS proxy server is in the transition set;
and if the transient state of the DNS proxy server has changed from positive incentive to negative feedback, reselecting the action description information using a normal-distribution sampling component.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 5: a processor 501, a memory 502, a communication interface 503, and a communication bus 504;
the processor 501, the memory 502 and the communication interface 503 complete mutual communication through the communication bus 504; the communication interface 503 is used for implementing information transmission between the devices;
the processor 501 is configured to call a computer program in the memory 502, and the processor implements all the steps of the above method for resisting a client DNS cache poisoning attack when executing the computer program, for example, the processor implements the following steps when executing the computer program: acquiring a request from a client and a DNS proxy server set; the request includes a target domain name; judging whether the target domain name hits the domain name in the local cache; if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the times selected by the client, the times selected by an attacker and the round-trip delay between the client and the attacker; inputting the current environment state into a trained selection strategy model to obtain action description information selected in the DNS proxy server set; selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name; the trained selection strategy model is obtained by training by utilizing different environment states.
Based on the same inventive concept, a further embodiment of the present invention provides a non-transitory computer-readable storage medium, having stored thereon a computer program, which when executed by a processor implements all the steps of the above method for resisting a client DNS cache poisoning attack, for example, the processor implements the following steps when executing the computer program: acquiring a request from a client and a DNS proxy server set; the request includes a target domain name; judging whether the target domain name hits the domain name in the local cache; if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the times selected by the client, the times selected by an attacker and the round-trip delay between the client and the attacker; inputting the current environment state into a trained selection strategy model to obtain action description information selected in the DNS proxy server set; selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name; the trained selection strategy model is obtained by training by utilizing different environment states.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an apparatus for resisting client DNS cache poisoning attack, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, an apparatus for resisting client DNS cache poisoning attack, or a network device, etc.) to execute the method for resisting client DNS cache poisoning attack according to the embodiments or some parts of the embodiments.
In addition, in the present invention, terms such as "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Moreover, in the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the description herein, reference to the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for resisting client DNS cache poisoning attacks is characterized by comprising the following steps:
acquiring a request from a client and a DNS proxy server set; the request includes a target domain name;
judging whether the target domain name hits the domain name in the local cache;
if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the times selected by the client, the times selected by an attacker and the round-trip delay between the client and the attacker;
inputting the current environment state into a trained selection strategy model to obtain action description information selected in the DNS proxy server set;
selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name;
the trained selection strategy model is obtained by training by utilizing different environment states.
2. The method of claim 1, wherein the selection policy model comprises a value network, a policy network, an actor target network, and a critic target network, and before inputting the current environmental status into the trained selection policy model to obtain the action description information for selecting in the DNS proxy server set, the method further comprises:
acquiring a preset number of training sample sets; each group of training samples comprises a first environment state, action description information, a second environment state and action rewards; the action description information is obtained after the strategy network inputs the first environment state; the first environment state is an environment state before the action corresponding to the action description information is executed; the second environment state is the environment state after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information;
inputting the first environment state and the action description information into the value network to obtain a first function value;
inputting the second environment state into the actor target network to obtain next action description information;
inputting the second environment state and the next action description information into the critic target network to obtain a second function value;
determining an advantage function according to the first function value and the second function value;
determining a gradient according to the advantage function;
and updating the parameters of the selection strategy model according to the gradient to obtain the trained selection strategy model.
3. The method of claim 2, wherein the obtaining a predetermined number of training sample sets comprises:
establishing a game model;
determining a first environment state and action description information according to the game model;
determining an action reward according to the action description information;
and determining a second environment state according to the first environment state and the action description information.
4. The method for resisting client side DNS cache poisoning attack as claimed in claim 3, wherein the determining action reward according to the action description information comprises:
acquiring transmission time delay corresponding to the action description information;
and determining the action reward corresponding to the action description information according to the transmission delay.
5. The method of resisting client-side DNS cache poisoning attacks according to claim 2, further comprising, before the obtaining the trained selection policy model:
and optimizing the parameter-update process of the selection strategy model using trust region policy optimization.
6. The method according to claim 1, wherein before the selecting the corresponding DNS proxy server according to the action description information to obtain the target IP corresponding to the target domain name, the method further comprises:
determining a selected DNS proxy server according to the action description information;
and adopting a self-checking component to check the DNS proxy server.
7. The method of claim 6, wherein the checking the DNS proxy server with a self-checking component comprises:
acquiring a transition set;
determining, with the self-checking component, whether the DNS proxy server is in the transition set;
and if the transient state of the DNS proxy server has changed from positive incentive to negative feedback, reselecting the action description information using a normal-distribution sampling component.
8. An apparatus for resisting client side DNS cache poisoning attacks, comprising:
the acquisition module is used for acquiring a request from a client and a DNS proxy server set; the request includes a target domain name;
the processing module is used for judging whether the target domain name hits the domain name in the local cache; if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the times selected by the client, the times selected by an attacker and the round-trip delay between the client and the attacker; inputting the current environment state into a trained selection strategy model to obtain action description information selected in the DNS proxy server set; selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name; the trained selection strategy model is obtained by training by utilizing different environment states.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented when the processor executes the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111457407.3A CN113852645B (en) | 2021-12-02 | 2021-12-02 | Method and device for resisting client DNS cache poisoning attack and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113852645A | 2021-12-28 |
CN113852645B | 2022-03-29 |
Family
ID=78982689
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111457407.3A Active CN113852645B (en) | 2021-12-02 | 2021-12-02 | Method and device for resisting client DNS cache poisoning attack and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113852645B (en) |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0770967A2 (en) * | 1995-10-26 | 1997-05-02 | Koninklijke Philips Electronics N.V. | Decision support system for the management of an agile supply chain |
US20090241183A1 (en) * | 2008-03-18 | 2009-09-24 | Gregory Jensen Boss | Dynamic document merging method and system |
US20100042449A1 (en) * | 2008-08-14 | 2010-02-18 | Electronic Data Systems Corporation | Heterogeneous Information Technology (IT) Infrastructure Management Orchestration |
CN101682626A (en) * | 2007-05-24 | 2010-03-24 | 爱维技术解决方案私人有限公司 | Method and system for simulating a hacking attack on a network |
US20150105269A1 (en) * | 2013-10-10 | 2015-04-16 | Severe Adverse Event (Sae) Consortium | Biomarkers for increased risk of drug-induced osteonecrosis of the jaw |
US20160294645A1 (en) * | 2015-04-06 | 2016-10-06 | Illumio, Inc. | Enforcing rules for bound services in a distributed network management system that uses a label-based policy model |
US20160335223A1 (en) * | 2014-06-27 | 2016-11-17 | University Of South Florida | Methods and systems for computation of bilevel mixed integer programming problems |
CN106716404A (en) * | 2014-09-24 | 2017-05-24 | 甲骨文国际公司 | Proxy servers within computer subnetworks |
CN107332811A (en) * | 2016-04-29 | 2017-11-07 | 阿里巴巴集团控股有限公司 | The methods, devices and systems of intrusion detection |
CN107948223A (en) * | 2016-10-12 | 2018-04-20 | 中国电信股份有限公司 | Flow processing method, service strategy equipment and caching system for caching system |
CN108234472A (en) * | 2017-12-28 | 2018-06-29 | 北京百度网讯科技有限公司 | Detection method and device, computer equipment and the readable medium of Challenging black hole attack |
CN108289088A (en) * | 2017-01-09 | 2018-07-17 | ***通信集团河北有限公司 | Abnormal traffic detection system and method based on business model |
CN108833402A (en) * | 2018-06-11 | 2018-11-16 | 中国人民解放军战略支援部队信息工程大学 | A kind of optimal defence policies choosing method of network based on game of bounded rationality theory and device |
CN109327427A (en) * | 2018-05-16 | 2019-02-12 | 中国人民解放军战略支援部队信息工程大学 | A kind of dynamic network variation decision-making technique and its system in face of unknown threat |
CN109688110A (en) * | 2018-11-22 | 2019-04-26 | 顺丰科技有限公司 | DGA domain name detection model construction method, device, server and storage medium |
CN110266673A (en) * | 2019-06-11 | 2019-09-20 | 合肥宜拾惠网络科技有限公司 | Security strategy optimized treatment method and device based on big data |
CN110266647A (en) * | 2019-05-22 | 2019-09-20 | 北京金睛云华科技有限公司 | It is a kind of to order and control communication check method and system |
CN110300106A (en) * | 2019-06-24 | 2019-10-01 | 中国人民解放军战略支援部队信息工程大学 | Mobile target based on Markov time game defends decision choosing method, apparatus and system |
CN111401556A (en) * | 2020-04-22 | 2020-07-10 | 清华大学深圳国际研究生院 | Selection method of opponent type imitation learning winning incentive function |
CN111737168A (en) * | 2020-06-24 | 2020-10-02 | 华中科技大学 | Cache system, cache processing method, device, equipment and medium |
Non-Patent Citations (2)
Title |
---|
TENGCHAO MA et al.: "Intelligent-Driven Adapting Defense Against the Client-Side DNS Cache Poisoning in the Cloud", GLOBECOM 2020 - 2020 IEEE GLOBAL COMMUNICATIONS CONFERENCE * |
王禛鹏: "Design of a DNS framework based on mimic security defense" (一种基于拟态安全防御的DNS框架设计), Acta Electronica Sinica (电子学报) * |
Also Published As
Publication number | Publication date |
---|---|
CN113852645B (en) | 2022-03-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||