CN113852645A

CN113852645A - Method and device for resisting client DNS cache poisoning attack and electronic equipment

Info

Publication number: CN113852645A
Application number: CN202111457407.3A
Authority: CN
Inventors: 杨树杰; 许长桥; 马腾超; 关建峰; 丁中医; 刘朝阳
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-12-02
Filing date: 2021-12-02
Publication date: 2021-12-28
Anticipated expiration: 2041-12-02
Also published as: CN113852645B

Abstract

The invention discloses a method, a device, and electronic equipment for resisting client DNS cache poisoning attacks. The method comprises the following steps: acquiring a request from a client and a DNS proxy server set; judging whether the target domain name in the request hits a domain name in the local cache; if not, acquiring the current environment state; inputting the current environment state into a trained selection strategy model to obtain action description information for selecting from the DNS proxy server set; and selecting the corresponding DNS proxy server according to the action description information to obtain the target IP corresponding to the target domain name. By means of the selection strategy model, the invention addresses the uncertainty of the attack payoff function under bounded rationality, adaptively selects a DNS proxy server according to the state of the DNS proxy servers in the current attack-defense game, and improves both the effectiveness of the network service in defending against DNS cache poisoning attacks and its second-level processing capability.

Description

Method and device for resisting client DNS cache poisoning attack and electronic equipment

Technical Field

The invention relates to the technical field of computers, in particular to a method and a device for resisting client DNS cache poisoning attack and electronic equipment.

Background

At present, cloud platforms have gradually become the mainstream paradigm of internet services because they can adaptively allocate storage and computing resources according to massive personalized service requirements. More and more internet service providers are choosing to migrate their services to the cloud, and domain name system security plays an increasingly important role in network defense. In recent years, client-side Domain Name System (DNS) cache poisoning attacks have emerged. They differ from other attack methods in that they attack a DNS client using only a non-privileged malicious program and directly bypass the main DNS defense systems. First, the non-privileged malware repeatedly requests resolution of a target domain name, such as www.a.com. When the client cannot find the IP of the domain name in its local DNS cache, it sends a request to a DNS resolver. The attacker then attempts to inject a forged response within the time window before the genuine DNS response reaches the client. Under the existing DNS response mechanism, the client accepts the first response that matches its source IP, source port, destination IP, destination port, and TXID. If the attacker's matching response arrives first, the client retains the misleading mapping in its DNS cache and, as a result, interacts with the wrong server. Such an attack thus redirects the cloud user's requests to a zombie server and can further leak private information. This attack can compromise the DNS cache of a cloud instance within tens of seconds and seriously threatens the security of the cloud platform.
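
The acceptance rule described above can be illustrated with a minimal sketch. The `Query` and `Response` types, the `answer` field, and the function names are hypothetical and chosen only for illustration; real resolvers apply further checks that are not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Query:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    txid: int


@dataclass(frozen=True)
class Response:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    txid: int
    answer: str  # resolved IP carried by the response (illustrative field)


def response_matches(q: Query, r: Response) -> bool:
    """A response matches when it mirrors the query's addressing and TXID."""
    return (
        r.src_ip == q.dst_ip
        and r.src_port == q.dst_port
        and r.dst_ip == q.src_ip
        and r.dst_port == q.src_port
        and r.txid == q.txid
    )


def first_accepted(q: Query, responses: Iterable[Response]) -> Optional[Response]:
    """Return the first matching response in arrival order, or None."""
    for r in responses:
        if response_matches(q, r):
            return r
    return None
```

Because only the first matching response is kept, a forged packet that guesses the port and TXID and arrives before the genuine one determines what is cached.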

Existing cloud DNS defense technologies still suffer from serious defects such as poor adaptability, frequent system changes, and high cost, which severely restrict their practical deployment. At present, defense strategies against client DNS cache poisoning attacks fall mainly into two categories. The first is non-cryptographic defense, such as randomizing the source UDP port, which protects clients by increasing the uncertainty of the source port; however, recent DNS attacks can use a non-privileged malicious program to occupy ports and thereby defeat the port randomization. The second uses cryptographic techniques such as DNSSEC and DNSCrypt; although this approach offers excellent security in theory, introducing encryption adds many changes and burdens to the DNS interaction flow, reducing the communication and energy efficiency of the system.

In summary, there is a need for a method for resisting client DNS cache poisoning attack to solve the above problems in the prior art.

Disclosure of Invention

In view of the problems in existing methods, the invention provides a method and a device for resisting client DNS cache poisoning attacks, and electronic equipment.

In a first aspect, the present invention provides a method for resisting a client DNS cache poisoning attack, including:

acquiring a request from a client and a DNS proxy server set; the request includes a target domain name;

judging whether the target domain name hits the domain name in the local cache;

if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the number of times the server has been selected by the client, the number of times it has been selected by an attacker, and its round-trip delay with the client;

inputting the current environment state into a trained selection strategy model to obtain action description information selected in the DNS proxy server set;

selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name;

the trained selection strategy model is obtained by training by utilizing different environment states.

Further, the selection strategy model includes a value network, a policy network, an actor target network, and a critic target network, and before the current environment state is input into the trained selection strategy model to obtain the action description information for selecting from the DNS proxy server set, the method further includes:

acquiring a preset number of training sample sets; each group of training samples comprises a first environment state, action description information, a second environment state, and an action reward; the action description information is obtained by inputting the first environment state into the policy network; the first environment state is the environment state before the action corresponding to the action description information is executed; the second environment state is the environment state after the action corresponding to the action description information is executed; the action reward is the reward value for executing the action corresponding to the action description information;

inputting the first environment state and the action description information into the value network to obtain a first function value;

inputting the second environment state into the actor target network to obtain next action description information;

inputting the second environment state and the next action description information into the critic target network to obtain a second function value;

determining an advantage function according to the first function value and the second function value;

determining a gradient according to the advantage function;

and updating the parameters of the selection strategy model according to the gradient to obtain the trained selection strategy model.

Further, the acquiring a preset number of training sample sets includes:

establishing a game model;

determining a first environment state and action description information according to the game model;

determining an action reward according to the action description information;

and determining a second environment state according to the first environment state and the action description information.

Further, the determining an action reward according to the action description information includes:

acquiring transmission time delay corresponding to the action description information;

and determining the action reward corresponding to the action description information according to the transmission delay.

Further, before the obtaining of the trained selection strategy model, the method further includes:

and optimizing the process of updating the parameters of the selection strategy model by adopting trust region policy optimization (TRPO).

Further, before the selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name, the method further includes:

determining a selected DNS proxy server according to the action description information;

and checking the DNS proxy server with a self-checking component.

Further, the checking the DNS proxy server with a self-checking component includes:

acquiring a transition set;

determining, with the self-checking component, whether the DNS proxy server is in the transition set;

and if the transient state of the DNS proxy server has changed from positive excitation to negative feedback, reselecting the action description information by using a normal-distribution sampling component.

In a second aspect, the present invention provides an apparatus for resisting client DNS cache poisoning attacks, including:

the acquisition module is used for acquiring a request from a client and a DNS proxy server set; the request includes a target domain name;

the processing module is used for judging whether the target domain name hits a domain name in the local cache; if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the number of times the server has been selected by the client, the number of times it has been selected by an attacker, and its round-trip delay with the client; inputting the current environment state into a trained selection strategy model to obtain action description information for selecting from the DNS proxy server set; selecting the corresponding DNS proxy server according to the action description information to obtain the target IP corresponding to the target domain name; the trained selection strategy model is obtained by training with different environment states.

Further, the selection policy model includes a value network, a policy network, an actor target network, and a critic target network, the processing module is further configured to:

acquiring a preset number of training sample sets before inputting the current environment state into a trained selection strategy model to obtain action description information for selection in the DNS proxy server set; each group of training samples comprises a first environment state, action description information, a second environment state and action rewards; the action description information is obtained after the strategy network inputs the first environment state; the first environment state is an environment state before the action corresponding to the action description information is executed; the second environment state is the environment state after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information;

determining a gradient according to the advantage function;

Further, the processing module is specifically configured to:

establishing a game model;

determining an action reward according to the action description information;

Further, the processing module is specifically configured to:

Further, the processing module is further configured to:

and before the trained selection strategy model is obtained, optimizing the process of updating the parameters of the selection strategy model by adopting trust region policy optimization (TRPO).

Further, the processing module is further configured to:

before the corresponding DNS proxy server is selected according to the action description information and the target IP corresponding to the target domain name is obtained, the selected DNS proxy server is determined according to the action description information;

and adopting a self-checking component to check the DNS proxy server.

Further, the processing module is specifically configured to:

acquiring a transition set;

In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the method for resisting the client DNS cache poisoning attack according to the first aspect.

In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for resisting a client DNS cache poisoning attack as described in the first aspect.

According to the above technical scheme, the method, the device, and the electronic equipment for resisting client DNS cache poisoning attacks provided by the invention address the uncertainty of the attack payoff function under bounded rationality by means of the selection strategy model, adaptively select a DNS proxy server according to the state of the DNS proxy servers in the current attack-defense game, and improve both the effectiveness of the network service in defending against DNS cache poisoning attacks and its second-level processing capability.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a system framework of a method for resisting a DNS cache poisoning attack of a client according to the present invention;

Fig. 2 is a schematic flow chart of a method for resisting a DNS cache poisoning attack of a client according to the present invention;

Fig. 3 is a schematic flow chart of a method for resisting a DNS cache poisoning attack at a client according to the present invention;

Fig. 4 is a schematic structural diagram of a device for resisting client DNS cache poisoning attacks according to the present invention;

Fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.

Detailed Description

The following further describes embodiments of the present invention with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.

The method for resisting client side DNS cache poisoning attacks provided by the embodiment of the present invention may be applied to a system architecture as shown in fig. 1, where the system architecture includes a client side 100, a selection policy model 200, and a DNS proxy server set 300.

Specifically, the DNS proxy server set 300 is used to determine whether the target domain name hits a domain name in the local cache.

The DNS proxy first gets the request from the application and checks if the domain name is retained in the DNS cache map.

The selection policy model 200 is used to obtain the current environment state if the target domain name is not hit, and, after the current environment state is input, to output the action description information for selecting from the DNS proxy server set 300.

The selection policy model selects the target IP of a DNS resolver from the DNS target IP set, requests resolution of the domain name that missed the cache, and stores the response. Matching and non-matching responses are collected separately and then analyzed to identify any spy process that may be cooperating with an external attacker; that spy process is then blocked.

The client 100 is configured to select a corresponding DNS proxy according to the action description information, and obtain a target IP corresponding to the target domain name.

It should be noted that fig. 1 is only an example of a system architecture according to the embodiment of the present invention, and the present invention is not limited to this specifically.

Based on the above illustrated system architecture, fig. 2 is a schematic flow chart corresponding to a method for resisting a client DNS cache poisoning attack according to an embodiment of the present invention, and as shown in fig. 2, the method includes:

Step 201: obtain a request from a client and a DNS proxy server set.

It should be noted that the request includes the target domain name.

Step 202: determine whether the target domain name hits a domain name in the local cache.

Step 203: if not, obtain the current environment state.

It should be noted that the environment state includes state information of each DNS proxy server in the DNS proxy server set; the state information includes the number of times the server has been selected by the client, the number of times it has been selected by the attacker, and its round-trip delay with the client.

Step 204: input the current environment state into the trained selection strategy model to obtain action description information for selecting from the DNS proxy server set.

Step 205: select the corresponding DNS proxy server according to the action description information to obtain the target IP corresponding to the target domain name.

It should be noted that the trained selection strategy model is obtained by training with different environment states.

According to the scheme, a DNS proxy server system is deployed on the client side, the uncertainty of the attack payoff function under bounded rationality is addressed by means of the selection strategy model, the DNS proxy server is selected adaptively according to the state of the DNS proxy servers in the current attack-defense game, and both the effectiveness of the network service in defending against DNS cache poisoning attacks and its second-level processing capability are improved.
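
The flow of steps 201 to 205 can be sketched as follows. This is a minimal illustration under stated assumptions: the `resolve` function, the tuple layout of `ProxyState`, the representation of the action description information as an index into the proxy list, and the injected `policy` and `query` callables are all hypothetical stand-ins for the trained selection strategy model and the real DNS transport.

```python
from typing import Callable, Dict, List, Tuple

# Assumed per-proxy state layout:
# (times selected by client, times selected by attacker, round-trip delay in ms)
ProxyState = Tuple[int, int, float]


def resolve(
    domain: str,
    cache: Dict[str, str],
    proxies: List[str],
    env_state: Dict[str, ProxyState],
    policy: Callable[[List[ProxyState]], int],
    query: Callable[[str, str], str],
) -> str:
    """Resolve `domain`, consulting the policy only on a cache miss."""
    # Step 202: check the local cache first.
    if domain in cache:
        return cache[domain]
    # Step 203: collect the current environment state, one entry per proxy.
    state = [env_state[p] for p in proxies]
    # Step 204: the policy returns the index of the proxy to use (assumed encoding).
    action = policy(state)
    # Step 205: query the selected proxy and keep the answer.
    target_ip = query(proxies[action], domain)
    cache[domain] = target_ip
    return target_ip
```

Passing the policy and the query function as parameters keeps the sketch independent of any particular model or network library.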

Before step 204, the embodiment of the present invention has a step flow as shown in fig. 3, which is specifically as follows:

Step 301: obtain a preset number of training sample sets.

It should be noted that each set of training samples includes a first environment state, action description information, a second environment state, and an action reward; the action description information is obtained by inputting the first environment state into the policy network; the first environment state is the environment state before the action corresponding to the action description information is executed; the second environment state is the environment state after the action corresponding to the action description information is executed; the action reward is the reward value for executing the action corresponding to the action description information.

Specifically, a game model is established;

In practical network attack and defense, the decision-making is non-cooperative: neither party exposes its strategy to the other, so the game is based on incomplete information. Whether or not the two parties act simultaneously, each predicts the opponent's actions with limited knowledge; the game is therefore static. Although the attacker pursues the optimal strategy as if fully rational, its subjective level of cognition means that it can grasp only limited information. A stochastic game is a multi-stage game model that combines game theory with Markov theory and fits a multi-round process; the Markov process describes the transitions of the game state caused by the behavior of both parties. On this basis, the game model established by the embodiment of the invention is a static stochastic game model with incomplete information under bounded rationality.

In the embodiment of the invention, the game model is established as follows:

1. Players. The game model has two players, an attacker and a defender. The attacker intends to carry out cache poisoning; the defender operates under the DNS proxy architecture.

2. Action space of the defender. This is the set of DNS proxy server IPs configured in the client. At each time point, the client selects one IP from this set and requests DNS resolution from it, so the defender's action in each time unit is the selected DNS proxy IP.

3. Action space of the attacker. Through non-privileged spyware, an attacker can obtain the set of DNS proxy server IP addresses configured in the client. The attacker's action in each time unit is the address chosen from this set when attempting to forge a response packet.

4. Game state. Each game state consists of, for each IP in the current DNS proxy IP set, its delay, the number of times the client has selected it, and the number of times the attacker has selected it. During the attack-defense contest, malicious response packets can be identified by the corresponding matching program, and an experienced attacker may likewise collect and analyze the defender's historical behavior. The state can therefore be expressed as these three quantities for every IP in the set, where the delay is the round-trip delay between the client and the DNS proxy IP.

5. Defense strategy. The defense strategy in a given state is a rule for defensive actions that specifies the selected action in the form of a probability, that is, the probability of selecting each defensive action in that state.

6. Return function. The return function gives the defender's immediate return when, in a given state, the attacker takes its action and the defender takes its action. When the defender selects the same action as the attacker, the defender incurs a penalty. When the defender selects an action that differs from the attacker's, the defender receives a positive return that is inversely proportional to the delay of the selected IP.

According to the scheme, the attack-defense game conditions of the attacking side and the defending side are considered comprehensively, and a static stochastic game model with incomplete information under bounded rationality is established to guide attack and defense strategies in large-scale network states.
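
The six elements of the game model can be summarized in compact notation. The symbols below are assumptions chosen for illustration, since the elements are defined here only in words; in particular, the penalty $\lambda$ and the proportionality constant $\kappa$ are hypothetical names for quantities whose values are not given.

```latex
\begin{aligned}
N &= \{N_A, N_D\} && \text{players: attacker and defender} \\
D &= \{d_1, \dots, d_n\} && \text{DNS proxy IPs configured in the client, } d_t \in D \\
a_t &\in D && \text{address the attacker targets with a forged response} \\
s_t &= \{(\tau_i, c_i, k_i)\}_{i=1}^{n} && \text{delay, client count, attacker count for each IP} \\
\pi(d \mid s) &\in [0, 1] && \text{probability of defensive action } d \text{ in state } s \\
R(s, a, d) &=
\begin{cases}
-\lambda, & d = a \\
\kappa / \tau_d, & d \neq a
\end{cases}
&& \lambda > 0,\ \kappa > 0
\end{aligned}
```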

Further, determining a first environment state and action description information according to the game model;

based on the game model, the first environment state is composed of the delay of each IP in the current DNS proxy IP set, the times of selecting the IP by the client and the times of selecting the IP by the attacker. The action description information corresponds to an action of selecting one of the IP sets of the DNS proxy server to request DNS resolution.

Determining an action reward according to the action description information;

specifically, a transmission delay corresponding to the action description information is obtained;

Based on the game model, the action reward is calculated as follows: when the defender selects the same action as the attacker, the reward is a penalty; otherwise, the reward is positive and inversely proportional to the round-trip delay between the client and the selected DNS proxy server.
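
A minimal sketch of this reward rule follows. The function name `action_reward` and the `penalty` and `scale` parameters are hypothetical; the text states only that the collision case is penalized and that the other case is inversely proportional to the delay, so the constants are placeholders.

```python
from typing import Dict


def action_reward(
    defender_ip: str,
    attacker_ip: str,
    rtt_ms: Dict[str, float],
    penalty: float = 1.0,  # assumed magnitude of the collision penalty
    scale: float = 1.0,    # assumed proportionality constant
) -> float:
    """Reward for the defender after both sides have chosen a proxy IP."""
    delay = rtt_ms[defender_ip]
    if delay <= 0:
        raise ValueError("round-trip delay must be positive")
    if defender_ip == attacker_ip:
        # The defender picked the IP the attacker is spoofing.
        return -penalty
    # Otherwise the reward grows as the selected proxy's delay shrinks.
    return scale / delay
```

With these defaults, a proxy with half the delay earns twice the reward, which pushes the policy toward fast proxies that the attacker is not targeting.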

It should be noted that under bounded (non-perfect) rationality, there is no Nash equilibrium between the two parties to the game. In this case, the defender converges, through a monotonically non-decreasing deep reinforcement learning method, to an optimal defense strategy corresponding to the behavior of the attacker. Under complete rationality, the action spaces of the defender and the attacker are finite sets, neither party knows the other's return function, and each state is a finite static game with incomplete information. Any finite static game with incomplete information has a Bayesian Nash equilibrium, which means that the defender can converge to a Nash equilibrium strategy in the game.

Step 302: input the first environment state and the action description information into the value network to obtain a first function value.

Step 303: input the second environment state into the actor target network to obtain the next action description information.

Step 304: input the second environment state and the next action description information into the critic target network to obtain a second function value.

Step 305: determine an advantage function according to the first function value and the second function value.

Under the Actor-Critic framework, the agent is divided into an actor and a critic.

The advantage function is computed over each round up to its last step, using a discount factor that reflects future value. When the environment state is given, the corresponding value estimate is provided by the neural network.
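
One common way to compute an advantage of this kind is the discounted return up to the last step of the round minus the critic's value estimate. The sketch below assumes that form; the function name `advantages`, the `bootstrap` argument, and the exact formula are illustrative assumptions and are not confirmed as the formula used in this embodiment.

```python
from typing import List, Sequence


def advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float,
    bootstrap: float = 0.0,
) -> List[float]:
    """Discounted return from each step to the end of the round, minus V(s_t).

    rewards[t] is the reward at step t, values[t] is the critic's estimate
    for the state at step t, and `bootstrap` is the value assumed after the
    last step (0.0 when the round has ended).
    """
    returns: List[float] = []
    running = bootstrap
    # Walk backwards so each return reuses the one after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return [g - v for g, v in zip(returns, values)]
```

A positive entry means the action taken at that step did better than the critic expected; a negative entry means it did worse.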

Step 306: determine a gradient according to the advantage function.

The gradient is taken as the average of N individual gradients to ensure a more accurate estimate, where each term involves the probability of taking a given action in a given state of the stochastic game.
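
For reference, the standard policy-gradient estimator that averages over N sampled rounds has the following form. It is given as an assumption about the general shape of the estimate, using hypothetical symbols: $\theta$ for the actor's parameters, $\pi_\theta(d \mid s)$ for the probability of taking action $d$ in state $s$, and $A_t^{(n)}$ for the advantage at step $t$ of round $n$.

```latex
\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{n=1}^{N} \sum_{t=0}^{T_n}
A_t^{(n)} \, \nabla_\theta \log \pi_\theta\!\left(d_t^{(n)} \mid s_t^{(n)}\right)
```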

Step 307: update the parameters of the selection strategy model according to the gradient to obtain the trained selection strategy model.

The actor's parameters are updated according to the gradient, scaled by the learning rate of the actor in deep reinforcement learning.

In the embodiment of the invention, the DNS proxy server represents the defender in the game model and takes corresponding actions according to different environment states.

The defender can interact with the attacker without knowing the attacker's benefit function. After the defender takes an action, an action reward is returned. By repeating this process, the DNS proxy server is iteratively optimized and finally converges to the optimal action in different environment states.

According to the scheme, the malicious non-privileged program is identified by training the selection strategy model, the spy process is forbidden, and the DNS cache poisoning attack is resisted to a great extent.

For practical application, an intelligent selection strategy based on deep reinforcement learning (DRL) and a threat model with non-privileged malicious programs are further provided. A DNS proxy server is selected intelligently and adaptively according to the state of the DNS servers in the current attack-defense game, so that the success rate of DNS cache poisoning attacks is greatly reduced.

Furthermore, the embodiment of the invention optimizes the process of updating the parameters of the selection strategy model by adopting trust region policy optimization (TRPO), thereby ensuring that the training process is monotonically non-decreasing and that the strategy finally converges to the optimal strategy in the game.

The embodiment of the invention mainly solves two problems that hinder rapid convergence: the time cost caused by historical data that cannot be reused, and the unstable convergence caused by gradient updates of the neural network.

When a single actor is used to collect data, the training set must be resampled after every policy update, which can be very time-consuming. For this reason, the embodiment of the invention establishes two actor networks. One actor interacts with the environment and acquires training data. Meanwhile, the other actor iteratively updates its policy according to the collected set and, after a certain number of steps, the two actors are synchronized. This involves a key issue, importance sampling.

Note that if samples cannot be drawn from the distribution p, another distribution q is used to obtain data for estimating the expectation of a function under p. Although the resulting estimate of the expectation is unbiased, its variance may be large when the two distributions differ. It is therefore necessary to ensure that the distributions p and q do not differ too much.
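
The importance-sampling identity and its variance behaviour can be checked exactly on small discrete distributions. The sketch below is illustrative only: the function names and the dictionary representation of a distribution are assumptions, and the distributions are toy examples rather than policies from the embodiment.

```python
from typing import Callable, Dict

Dist = Dict[int, float]  # maps an outcome to its probability


def expectation(p: Dist, f: Callable[[int], float]) -> float:
    """E_{x~p}[f(x)], computed exactly."""
    return sum(prob * f(x) for x, prob in p.items())


def importance_expectation(p: Dist, q: Dist, f: Callable[[int], float]) -> float:
    """E_{x~q}[(p(x)/q(x)) f(x)], which equals E_{x~p}[f(x)] when q covers p."""
    return sum(q[x] * (p[x] / q[x]) * f(x) for x in q)


def importance_variance(p: Dist, q: Dist, f: Callable[[int], float]) -> float:
    """Variance under q of the weighted sample (p(x)/q(x)) f(x)."""
    mean = importance_expectation(p, q, f)
    second = sum(q[x] * ((p[x] / q[x]) * f(x)) ** 2 for x in q)
    return second - mean ** 2
```

With p = {0: 0.5, 1: 0.5} and f(x) = x, sampling from q = p gives a variance of 0.25, while sampling from q = {0: 0.9, 1: 0.1} gives the same mean but a variance of 2.25, which is why the two distributions should stay close.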

The policy objective is then modified to weight the advantage function by the probabilities with which the two actors' policies take action a in state s.

Further, the optimization is carried out subject to a bound on the KL divergence between the two policies.
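
The textbook form of the TRPO problem is shown below for orientation. It is an assumption that the embodiment follows this standard form; the symbols are hypothetical, with $\theta$ for the actor being updated, $\theta'$ for the actor that collected the data, $A^{\theta'}$ for the advantage function, and $\delta$ for the bound on the KL divergence.

```latex
\max_{\theta} \;
\mathbb{E}_{(s,a) \sim \pi_{\theta'}}
\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta'}(a \mid s)} \, A^{\theta'}(s, a) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{s}\!\left[ D_{\mathrm{KL}}\big( \pi_{\theta'}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s) \big) \right] \le \delta
```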

In the attack-defense game of the embodiment of the invention, the large number of states amounts to an almost unlimited number of constraints. Even with the conjugate gradient approximation, the complexity remains excessive. Therefore, a method of limiting the update step based on proximal policy optimization (PPO) is adopted, with a constraint function defined to bound the update.

According to the scheme, trust region policy optimization is adopted to optimize the process of updating the parameters of the selection strategy model, so that the training process is monotonically non-decreasing and the performance of the selection strategy model is improved.
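
One widely used way to limit the update step in PPO is the clipped surrogate objective, sketched below for a single sample. Whether the embodiment uses the clipped variant or a KL-penalty variant is not stated, so the clipping form, the function name `clipped_surrogate`, and the default `eps` of 0.2 are assumptions.

```python
def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO objective with the probability ratio clipped to [1-eps, 1+eps].

    `ratio` is the new policy's probability of the action divided by the
    data-collecting policy's probability of the same action.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Taking the minimum removes any incentive to move the ratio outside the band.
    return min(ratio * advantage, clipped_ratio * advantage)
```

For a positive advantage, the objective stops growing once the ratio exceeds 1 + eps, so a single update cannot push the policy arbitrarily far from the one that collected the data.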

In the embodiment of the invention, the state of some servers in the DNS proxy server set may change transiently, for example a change in network delay caused by the network environment or service unavailability caused by a server system failure. Such a change causes the corresponding action selection to switch instantly from positive excitation to negative feedback. Although the state changes instantly, the inherent mechanism of reinforcement learning means that the agent does not respond to it immediately: its choice of server node is still determined by historical experience. This causes the agent's security performance to fluctuate. The same holds in the opposite direction.

Based on this, in the embodiment of the present invention, before selecting a corresponding DNS proxy server according to the action description information and obtaining a target IP corresponding to a target domain name, the selected DNS proxy server is determined according to the action description information; and adopting a self-checking component to check the DNS proxy server.

Specifically, a transition set is obtained;

in the embodiment of the invention, the states of the servers, such as network delay, service availability, attacked states and the like, are inquired in real time according to the DNS proxy server selection list, and the IP of the DNS proxy server in the transient state is added to the transition set.

Judging whether the DNS proxy server is in the transition set or not by adopting a self-checking component;

If the transient state of the DNS proxy server has changed from positive excitation to negative feedback, the action description information is reselected by using a normal-distribution sampling component.

Specifically, if the transient state has changed from positive excitation to negative feedback, the normal-distribution sampling component is used for reselection; if the transient state has changed from negative feedback to positive excitation, no processing is performed.

It should be noted that sampling components based on other distributions, such as the Poisson distribution, may also be used, and this is not specifically limited in this embodiment of the present invention.
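
The self-checking step can be sketched as follows. How the normal-distribution sampling component maps a sample to a proxy is not specified, so this sketch assumes one simple possibility: draw an index from a normal distribution centred on the rejected proxy's position and retry until it lands on a proxy that has not turned negative. The function name, the direction labels, and the `sigma` and `max_tries` parameters are all hypothetical.

```python
import random
from typing import Dict, List

POS_TO_NEG = "positive_to_negative"
NEG_TO_POS = "negative_to_positive"


def self_check(
    selected: int,
    proxies: List[str],
    transitions: Dict[str, str],
    rng: random.Random,
    sigma: float = 1.0,
    max_tries: int = 100,
) -> int:
    """Return the index of the proxy to use after checking the transition set.

    `transitions` maps the IP of each proxy in a transient state to the
    direction of its change. Only a positive-to-negative change triggers
    reselection; any other case keeps the original selection.
    """
    if transitions.get(proxies[selected]) != POS_TO_NEG:
        return selected
    usable = [i for i, ip in enumerate(proxies) if transitions.get(ip) != POS_TO_NEG]
    if not usable:
        return selected  # nothing better is available
    for _ in range(max_tries):
        candidate = round(rng.gauss(selected, sigma))
        if candidate in usable:
            return candidate
    return usable[0]  # fallback if sampling keeps missing
```

Passing a seeded `random.Random` instance makes the reselection reproducible in tests.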

According to the scheme, a self-checking component is added for optimization. The self-checking component jointly checks the state and the action after an action has been selected, which resolves the fluctuation in defense performance caused by transient transitions, achieves stable convergence of the agent, ensures active-defense security performance, and enables efficient response and defense. Meanwhile, the energy consumption ratio of the data center is reduced, and the migration decision speed of the virtual machine is increased.

Based on the same inventive concept, fig. 4 exemplarily shows a device for resisting a client DNS cache poisoning attack according to an embodiment of the present invention; the device may execute the flow of the method for resisting a client DNS cache poisoning attack.

The apparatus comprises:

an obtaining module 401, configured to obtain a request from a client and a DNS proxy server set; the request includes a target domain name;

a processing module 402, configured to determine whether the target domain name hits a domain name in a local cache; if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the times selected by the client, the times selected by an attacker and the round-trip delay between the client and the attacker; inputting the current environment state into a trained selection strategy model to obtain action description information selected in the DNS proxy server set; selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name; the trained selection strategy model is obtained by training by utilizing different environment states.
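The flow handled by the processing module can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the claimed implementation: `ProxyState`, `resolve`, the `policy` callable (a stand-in for the trained selection strategy model, assumed to return the index of a proxy), the `query` callable, and the step that writes the result back into the cache are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class ProxyState:
    """State information of one DNS proxy server, as listed in the text."""
    client_selections: int      # times selected by the client
    attacker_selections: int    # times selected by an attacker
    round_trip_delay_ms: float  # round-trip delay (unit is an assumption)


EnvState = List[Tuple[int, int, float]]


def resolve(domain: str,
            cache: Dict[str, str],
            proxies: Sequence[str],
            states: Dict[str, ProxyState],
            policy: Callable[[EnvState], int],
            query: Callable[[str, str], str]) -> str:
    """Return the target IP for `domain`, using the local cache when it hits."""
    if domain in cache:
        return cache[domain]  # cache hit: no proxy is contacted

    # Cache miss: assemble the current environment state, one entry per proxy.
    env_state: EnvState = [
        (states[p].client_selections,
         states[p].attacker_selections,
         states[p].round_trip_delay_ms)
        for p in proxies
    ]

    # The policy maps the environment state to an action (a proxy index here).
    action = policy(env_state)
    if not 0 <= action < len(proxies):
        raise ValueError(f"policy returned an invalid action: {action}")

    target_ip = query(proxies[action], domain)
    cache[domain] = target_ip  # assumption: the answer is cached afterwards
    return target_ip
```

In a test setting, `policy` can be any function over the state list (for example, one that picks the proxy with the lowest delay), and `query` can be a stub that returns a fixed address.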

Further, the selection policy model includes a value network, a policy network, an actor target network, and a critic target network, and the processing module 402 is further configured to:

determining a gradient according to the merit function;

Further, the processing module 402 is specifically configured to:

establishing a game model;

determining an action reward according to the action description information;

Further, the processing module 402 is specifically configured to:

Further, the processing module 402 is further configured to:

and adopting a self-checking component to check the DNS proxy server.

Further, the processing module is specifically configured to:

acquiring a transition set;

Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 5: a processor 501, a memory 502, a communication interface 503, and a communication bus 504;

the processor 501, the memory 502 and the communication interface 503 complete mutual communication through the communication bus 504; the communication interface 503 is used for implementing information transmission between the devices;

the processor 501 is configured to call a computer program in the memory 502, and the processor implements all the steps of the above method for resisting a client DNS cache poisoning attack when executing the computer program, for example, the processor implements the following steps when executing the computer program: acquiring a request from a client and a DNS proxy server set; the request includes a target domain name; judging whether the target domain name hits the domain name in the local cache; if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the times selected by the client, the times selected by an attacker and the round-trip delay between the client and the attacker; inputting the current environment state into a trained selection strategy model to obtain action description information selected in the DNS proxy server set; selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name; the trained selection strategy model is obtained by training by utilizing different environment states.

Based on the same inventive concept, a further embodiment of the present invention provides a non-transitory computer-readable storage medium, having stored thereon a computer program, which when executed by a processor implements all the steps of the above method for resisting a client DNS cache poisoning attack, for example, the processor implements the following steps when executing the computer program: acquiring a request from a client and a DNS proxy server set; the request includes a target domain name; judging whether the target domain name hits the domain name in the local cache; if not, acquiring the current environment state; the environment state comprises state information of each DNS proxy server in the DNS proxy server set; the state information comprises the times selected by the client, the times selected by an attacker and the round-trip delay between the client and the attacker; inputting the current environment state into a trained selection strategy model to obtain action description information selected in the DNS proxy server set; selecting a corresponding DNS proxy server according to the action description information to obtain a target IP corresponding to the target domain name; the trained selection strategy model is obtained by training by utilizing different environment states.

In addition, the logic instructions in the memory may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an apparatus for resisting client DNS cache poisoning attack, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, an apparatus for resisting client DNS cache poisoning attack, or a network device, etc.) to execute the method for resisting client DNS cache poisoning attack according to the embodiments or some parts of the embodiments.

In addition, in the present invention, terms such as "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Moreover, in the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Furthermore, in the description herein, reference to the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for resisting client DNS cache poisoning attacks is characterized by comprising the following steps:

judging whether the target domain name hits the domain name in the local cache;

2. The method of claim 1, wherein the selection policy model comprises a value network, a policy network, an actor target network, and a critic target network, and before inputting the current environmental status into the trained selection policy model to obtain the action description information for selecting in the DNS proxy server set, the method further comprises:

determining a gradient according to the merit function;

3. The method of claim 2, wherein the obtaining a predetermined number of training sample sets comprises:

establishing a game model;

determining an action reward according to the action description information;

4. The method for resisting client side DNS cache poisoning attack as claimed in claim 3, wherein the determining action reward according to the action description information comprises:

5. The method of resisting client-side DNS cache poisoning attacks according to claim 2, further comprising, before the obtaining the trained selection policy model:

6. The method according to claim 1, wherein before the selecting the corresponding DNS proxy server according to the action description information to obtain the target IP corresponding to the target domain name, the method further comprises:

and adopting a self-checking component to check the DNS proxy server.

7. The method of claim 6, wherein the auditing the DNS proxy server with a self-auditing component comprises:

acquiring a transition set;

8. An apparatus for resisting client side DNS cache poisoning attacks, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented when the processor executes the program.

10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.