CN116886443B

CN116886443B - Opponent action preference estimation method and device for attack and defense game and electronic equipment

Info

Publication number: CN116886443B
Application number: CN202311123325.4A
Authority: CN
Inventors: 陈少飞; 胡振震; 袁唯淋; 陆丽娜; 吉祥; 李鹏; 陈佳星; 陈璟
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2023-09-01
Filing date: 2023-09-01
Publication date: 2023-11-10
Anticipated expiration: 2043-09-01
Also published as: CN116886443A

Abstract

The application relates to an opponent action preference estimation method and device for attack and defense games, and to electronic equipment. The method comprises: counting the types and proportions of the action sequences of the attacking and defending parties in the previous round of a network attack and defense process; performing a two-dimensional approximate estimation of the action sequence probabilities according to the proportions of all action sequences of both parties; determining the posterior distribution of the network attacker's information set at the end of the previous round according to the action sequence probabilities and the joint distribution of both parties' information sets; obtaining the distribution of the attacker's information set at the beginning of the next round according to the smoothed posterior distribution and the functional relation between the information sets of successive rounds; and inferring the attacker's action with an opponent strategy model built from the information set distribution, against which the defender applies an optimal counter-strategy. The method improves the accuracy of predicting the attacker's actions and makes the defender's strategy more targeted, thereby improving the capability and effect of network defense.

Description

Opponent action preference estimation method and device for attack and defense game and electronic equipment

Technical Field

The application relates to the technical field of network security, in particular to an opponent action preference estimation method and device for attack and defense game and electronic equipment.

Background

As informatization deepens, networks bring people more convenience, but network attacks are also becoming more frequent and cause huge losses to the parties attacked, so network attack and defense has become one of the important problems of network security. Network attack and defense is a multi-stage random repeated game of incomplete information between two parties. Whether an attack succeeds depends not only on the strength of the attack capability but also on how targeted the defense measures are, so the process is one of confrontation and gaming between the two parties. Reconstructing an opponent strategy model that predicts the attack strategy of the network attacker in this game is challenging.

In the network attack and defense process, because information is incomplete, neither party can fully observe the opponent's hidden information. Therefore, when an opponent strategy model (that is, an action probability model) is explicitly constructed from the observed data of completed games, the opponent information sets whose hidden information was not observed and those whose hidden information was observed must both be associated with the opponent's actions by some method. (In a game, the information set of a party is the set of histories that the party cannot tell apart when it acts, because part of the opponent's information is unknown. Reconstructing the opponent's strategy model is essentially establishing a probability model that associates the opponent's information sets with the opponent's actions, and the opponent's information set is determined by the opponent's hidden information.) The current decision-point-based method for explicit reconstruction of the opponent strategy model aggregates different information sets of the game into decision points, uses the independent and identically distributed property to treat the opponent information sets with unobserved and with observed hidden information as a single distribution, and obtains probability models of the different actions from probability density estimates of the information sets corresponding to those actions.

However, this method assumes that the information sets within a decision point are uniformly distributed. This overly strong assumption biases the probability density estimates of the actions and thereby limits the accuracy of the reconstructed opponent model. The reasons are: (1) within a stage, the actions of both parties exhibit preferences, so the opponent's current action is affected by the preceding actions of both parties, and the information sets within the opponent's current decision point are therefore not uniformly distributed; and (2) after a stage transition, the influence of both parties' actions in the previous stage on the information set distribution of the current stage is weakened to some extent by randomness, but still cannot be completely ignored. The non-uniform distribution of opponent information sets caused by both parties' action preferences thus biases any opponent action probability model that adopts the uniform distribution assumption.

In summary, the uniform distribution assumption made by the existing decision-point-based method for explicit reconstruction of the opponent strategy is inconsistent with the actual, preference-induced non-uniform distribution of the opponent's information sets. As a result, the accuracy of the resulting opponent strategy model is limited, the inference of the opponent's actions is biased, the defense strategy adopted by the defender is poorly targeted, and the capability and effect of network defense suffer.

Disclosure of Invention

Based on the foregoing, it is necessary to provide an opponent action preference estimation method, an opponent action preference estimation device and an electronic device for the attack and defense game.

An opponent action preference estimation method for attack and defense games, the method comprising:

counting the types and proportions of the action sequences of the network attacking and defending parties in the previous round of the network attack and defense process;

performing a two-dimensional approximate estimation of the probabilities of the action sequences according to the proportions of all action sequences of both parties;

determining the posterior distribution of the network attacker's information set at the end of the previous round according to the probabilities of the action sequences and the joint distribution of the information sets of the attacking and defending parties;

obtaining the distribution of the network attacker's information set at the beginning of the next round according to the smoothed posterior distribution of the attacker's information set and the functional relation of the information sets between rounds, and taking this distribution as the information set distribution within the opponent's first-order decision point of the round; and

establishing an opponent strategy model of the network attacker according to the distribution of the attacker's information set, inferring the attacker's action with the opponent strategy model, and having the defender defend with an optimal counter-strategy chosen according to the attacker's action.

In one embodiment, counting the types and proportions of the action sequences of the network attacking and defending parties in the previous round of the network attack and defense process includes:

counting, in the data observed over multiple games of historical confrontation between the two parties, the types and numbers of the action sequences of the two parties in the previous round;

determining, according to the types and numbers of all action sequences, the proportion of each type of action sequence as:

$$P(acts) = \frac{n_{acts}}{N} = \iint P\left(acts \mid I_P^{pre}, I_O^{pre}\right) f\left(I_P^{pre}, I_O^{pre}\right) \, dI_P^{pre} \, dI_O^{pre};$$

where $P(acts)$ is the ratio of the number of action sequences of type $acts$ to the total number of action sequences; $n_{acts}$ is the number of action sequences of type $acts$; $N$ is the total number of action sequences; $P(acts \mid I_P^{pre}, I_O^{pre})$ is the conditional probability of the action sequence under the joint distribution of both parties' information sets; $f(I_P^{pre}, I_O^{pre})$ is the joint distribution of the information sets of the attacking and defending parties; and $I_P^{pre}$, $I_O^{pre}$ are the information sets of the two parties in the previous round, where the subscript P denotes the defender, the subscript O denotes the network attacker (that is, the opponent), and pre denotes the previous round.
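The counting step above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `sequence_proportions`, the representation of an action sequence as a tuple, and the action labels are all assumptions made for the example.

```python
from collections import Counter


def sequence_proportions(games):
    """Return {action_sequence: n_acts / N} for the previous round.

    `games` is a hypothetical list of observed games; each entry is the
    sequence of actions both parties took in the previous round.
    """
    counts = Counter(tuple(seq) for seq in games)  # n_acts per sequence type
    total = sum(counts.values())                   # N, total number of sequences
    if total == 0:
        return {}
    return {seq: n / total for seq, n in counts.items()}


# Hypothetical observations (action labels are illustrative only).
observed = [
    ("scan", "monitor"),
    ("scan", "monitor"),
    ("exploit", "patch"),
    ("scan", "patch"),
]
proportions = sequence_proportions(observed)
```

Each value in `proportions` plays the role of the proportion of one action sequence type, and the values sum to one over all observed types.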

In one embodiment, performing the two-dimensional approximate estimation of the probabilities of the action sequences in the previous round according to the proportions of all action sequences of both parties includes:

setting, according to the general decision logic of both parties' actions, the approximate probability of each action sequence over its corresponding two-dimensional interval of the information sets as:

$$\hat{P}\left(acts \mid I_P^{pre}, I_O^{pre}\right) = \begin{cases} 1, & I_P^{pre} \in R_P^{acts} \text{ and } I_O^{pre} \in R_O^{acts} \\ 0, & \text{otherwise} \end{cases};$$

where $\hat{P}(acts \mid I_P^{pre}, I_O^{pre})$ is the approximate probability of the action sequence over the two-dimensional information set space; $I_P^{pre}$ and $I_O^{pre}$ are the information sets of the defending and attacking parties respectively; and $R_P^{acts}$, $R_O^{acts}$ are the interval ranges of the two parties' information sets that correspond to the action sequence $acts$. The probability of each action sequence $acts$ should satisfy:

$$\iint \hat{P}\left(acts \mid I_P^{pre}, I_O^{pre}\right) f\left(I_P^{pre}, I_O^{pre}\right) \, dI_P^{pre} \, dI_O^{pre} = P(acts);$$

where $P(acts)$ is the proportion of the action sequence and $f(I_P^{pre}, I_O^{pre})$ is the prior joint distribution of the information sets of the attacking and defending parties in the previous round.

According to the proportions of all action sequences of both parties and the general decision logic of their actions, the two-dimensional interval range of the information sets corresponding to each action sequence $acts$ of the two parties is obtained by two-mode iteration, which in turn gives the approximate probability of the action sequences over the two-dimensional interval of the complete information set.
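One reading of the constraint above is that the prior probability mass of each sequence's two-dimensional interval should match that sequence's observed proportion. The sketch below checks this on a discretized grid. The grid resolution, the uniform prior, the rectangle bounds, and the helper name `rectangle_mass` are assumptions for illustration; the iterative search for the interval bounds itself is not reproduced here.

```python
def rectangle_mass(prior, lo_p, hi_p, lo_o, hi_o):
    """Prior mass of the cell block [lo_p, hi_p) x [lo_o, hi_o).

    `prior[i][j]` is the probability mass of defender cell i and attacker
    cell j on a uniform grid over [0, 1] x [0, 1] (an assumed discretization).
    """
    return sum(
        prior[i][j]
        for i in range(lo_p, hi_p)
        for j in range(lo_o, hi_o)
    )


m = 10  # assumed number of cells per axis
uniform_prior = [[1.0 / (m * m)] * m for _ in range(m)]

# Hypothetical interval for one action sequence:
# defender cells 0..4 and attacker cells 6..9.
mass = rectangle_mass(uniform_prior, 0, 5, 6, 10)

# A candidate interval is acceptable for a sequence whose observed
# proportion is 0.2 if the mass matches within a tolerance.
matches_proportion = abs(mass - 0.2) < 1e-9
```

An iterative procedure could widen or narrow the bounds until `matches_proportion` holds for every action sequence.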

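In one embodiment, determining the posterior distribution of the network attacker's information set at the end of the previous round according to the probabilities of the action sequences and the joint distribution of the information sets of the attacking and defending parties includes:

obtaining, according to the approximate probability $\hat{P}(acts \mid I_P^{pre}, I_O^{pre})$ of the action sequences over the two-dimensional information set space and the set $Acts$ of action sequences that can enter the next round, the joint posterior distribution of both parties' information sets at the end of the previous round:

$$f\left(I_P^{pre}, I_O^{pre} \mid Acts\right) = \frac{f\left(I_P^{pre}, I_O^{pre}\right) \sum_{acts \in Acts} \hat{P}\left(acts \mid I_P^{pre}, I_O^{pre}\right)}{\iint f\left(I_P^{pre}, I_O^{pre}\right) \sum_{acts \in Acts} \hat{P}\left(acts \mid I_P^{pre}, I_O^{pre}\right) \, dI_P^{pre} \, dI_O^{pre}};$$

where $f(I_P^{pre}, I_O^{pre} \mid Acts)$ is the joint posterior distribution of both parties' information sets at the end of the previous round; $f(I_P^{pre}, I_O^{pre})$ is the prior joint distribution of the information sets of the attacking and defending parties in the previous round; $\hat{P}(acts \mid I_P^{pre}, I_O^{pre})$ is the approximate probability of the action sequence over the two-dimensional information set space; $Acts$ is the set of action sequences that can enter the next round; and $acts$ is each such action sequence;

obtaining the posterior distribution of the network attacker's information set as the marginal distribution of the joint posterior distribution:

$$f\left(I_O^{pre} \mid Acts\right) = \int f\left(I_P^{pre}, I_O^{pre} \mid Acts\right) \, dI_P^{pre};$$

where $f(I_O^{pre} \mid Acts)$ is the posterior distribution of the network attacker's information set in the previous round.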

In one embodiment, obtaining the distribution of the network attacker's information set at the beginning of the next round according to the smoothed posterior distribution of the attacker's information set and the functional relation of the information sets between rounds includes:

smoothing the posterior distribution of the network attacker's information set with a preset kernel function to obtain a smooth nonlinear density function:

$$\tilde{f}^{pre}\left(I_O^{pre}\right) = \sum_j K\left(I_O^{pre} - I_{O,j}^{pre}\right) f\left(I_{O,j}^{pre} \mid Acts\right);$$

where $\tilde{f}^{pre}$ is the smoothed nonlinear density function of the previous round; $K$ is the preset kernel function; the subscript $j$ is the index after uniform discretization of the information set; $f(I_{O,j}^{pre} \mid Acts)$ is the posterior distribution of the network attacker's information set in the previous round; and $I_O^{pre}$ is the information set of the network attacker in the previous round.

According to the smoothed posterior distribution of the attacker's information set and the functional relation of the information sets between rounds, the distribution of the attacker's information set at the beginning of the next round is obtained and taken as the information set distribution within the opponent's first-order decision point of the round:

$$f^{next}\left(I_O^{next}\right) = \tilde{f}^{pre}\left(g^{-1}\left(I_O^{next}\right)\right) \left| \frac{d\, g^{-1}\left(I_O^{next}\right)}{d I_O^{next}} \right|, \qquad I_O^{next} = g\left(I_O^{pre}\right);$$

where $f^{next}(I_O^{next})$ is the distribution of the network attacker's information set at the beginning of the next round; $g$ is the functional relation between the information sets of the previous round and the next round; and $I_O^{next}$ is the information set of the network attacker at the beginning of the next round.
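The smoothing and inter-round transformation can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian kernel and its bandwidth `h`, the discretized posterior weights, and the monotonic relation between rounds (here the square root, so its inverse is the square) are all hypothetical choices, since the source specifies only "a preset kernel function" and a functional relation obtained by traversing the information sets.

```python
import math


def gaussian_kernel(u, h):
    """Gaussian kernel with bandwidth h (an assumed choice of kernel)."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def smooth_density(centers, masses, h):
    """Kernel-smoothed density built from discretized posterior masses."""
    def density(x):
        return sum(w * gaussian_kernel(x - c, h) for c, w in zip(centers, masses))
    return density


def transform_density(density, g_inv, g_inv_prime):
    """Change of variables for a monotonic relation with inverse g_inv."""
    def next_density(y):
        return density(g_inv(y)) * abs(g_inv_prime(y))
    return next_density


# Hypothetical discretized attacker posterior on [0, 1]: weight grows with
# the information set value, then is normalized to total mass one.
m = 20
centers = [(j + 0.5) / m for j in range(m)]
total = sum(centers)
masses = [c / total for c in centers]

smoothed = smooth_density(centers, masses, h=0.05)

# Assumed inter-round relation: next = sqrt(pre), so pre = next ** 2.
next_density = transform_density(smoothed, lambda y: y * y, lambda y: 2.0 * y)
```

With a Gaussian kernel the smoothed density extends slightly beyond [0, 1]; a real implementation would need to decide how to handle the boundary, which this sketch leaves open.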

In one embodiment, the method further includes an opponent action preference estimation step within the second order decision point, specifically including:

counting the types and proportions of all action sequences in the network attack and defense process before the second-order decision point;

performing, according to the proportions of all action sequences before the second-order decision point in the current round, a two-dimensional approximate estimation of the probabilities of the action sequences of the attacking and defending parties;

determining, according to the probabilities of both parties' action sequences and the joint prior distribution of both parties' information sets, the joint posterior distribution of both parties' information sets within the second-order decision point as influenced by the preceding action preferences, and determining the information set distribution within the network attacker's second-order decision point from the marginal distribution of the joint posterior distribution.

In one embodiment, performing the two-dimensional approximate estimation of the probabilities of the action sequences of the attacking and defending parties according to the proportions of all action sequences before the second-order decision point in the current round includes:

setting, according to the general decision logic of both parties' actions, the action sequences to be uniformly distributed over the two-dimensional intervals of the corresponding information sets, and determining, according to the proportions of the action sequences, the two-dimensional information set intervals corresponding to all action sequences.

In one embodiment, determining the joint posterior distribution of both parties' information sets within the second-order decision point as influenced by the preceding action preferences, according to the probabilities of both parties' action sequences and the joint prior distribution of both parties' information sets, and determining the information set distribution within the network attacker's second-order decision point from the marginal distribution of the joint posterior distribution, includes:

determining, based on an action sequence $acts$ that can enter the second-order decision point, the joint posterior distribution of the information sets of the attacking and defending parties within the second-order decision point:

$$f\left(I_P, I_O \mid acts\right) = \frac{P\left(acts \mid I_P, I_O\right) f\left(I_P, I_O\right)}{P(acts)};$$

where $f(I_P, I_O \mid acts)$ is the joint posterior distribution of both parties' information sets within the second-order decision point; $P(acts \mid I_P, I_O)$ is the conditional distribution of the action sequence of the two parties, before the second-order decision point, that can enter the second-order decision point; $f(I_P, I_O)$ is the joint prior distribution of both parties' information sets at the beginning of the current round; and $P(acts)$ is the probability of the action sequence $acts$, equal to its observed proportion;

determining the information set distribution within the network attacker's second-order decision point from the marginal distribution of the joint posterior distribution:

$$f\left(I_O \mid acts\right) = \int f\left(I_P, I_O \mid acts\right) \, dI_P;$$

where $f(I_O \mid acts)$ is the information set distribution within the network attacker's second-order decision point.

An opponent action preference estimation device for attack and defense games, the device comprising:

a proportion determining module, configured to count the types and proportions of the action sequences of the network attacking and defending parties in the previous round of the network attack and defense process;

a probability estimation module, configured to perform a two-dimensional approximate estimation of the probabilities of the action sequences in the previous round according to the proportions of all action sequences of both parties;

a posterior distribution determining module, configured to determine the posterior distribution of the network attacker's information set at the end of the previous round according to the probabilities of the action sequences and the joint distribution of the information sets of the attacking and defending parties;

a next-round information set distribution determining module, configured to obtain the distribution of the network attacker's information set at the beginning of the next round according to the smoothed posterior distribution of the attacker's information set and the functional relation of the information sets between rounds, and to take this distribution as the information set distribution within the opponent's first-order decision point of the round; and

an opponent action inference module, configured to establish an opponent strategy model of the network attacker according to the information set distribution within the attacker's decision point and to infer the attacker's action with the opponent strategy model, so that the defender defends with an optimal counter-strategy chosen according to the attacker's action.

An electronic device comprising a memory storing a computer program and a processor implementing any of the methods described above when executing the computer program.

In the above opponent action preference estimation method and device for attack and defense games and the electronic device, the method comprises: counting the types and proportions of the action sequences of the network attacking and defending parties in the previous round of the network attack and defense process; performing a two-dimensional approximate estimation of the probabilities of the action sequences in the previous round according to the proportions of all action sequences of both parties; determining the posterior distribution of the network attacker's information set at the end of the previous round according to the probabilities of the action sequences and the joint distribution of both parties' information sets; obtaining the distribution of the attacker's information set at the beginning of the next round according to the smoothed posterior distribution of the attacker's information set and the functional relation of the information sets between rounds, and taking it as the information set distribution within the opponent's first-order decision point of the round; and establishing an opponent strategy model of the network attacker according to the information set distribution within the attacker's decision point, inferring the attacker's action with that model, and having the defender defend with an optimal counter-strategy chosen according to the attacker's action. The method removes the limitation of the existing assumption that information sets are uniformly distributed within a decision point, can effectively improve the accuracy of explicit reconstruction of the opponent strategy, improves the accuracy of predicting the attacker's actions, and makes the defense strategy adopted by the defender more targeted, thereby improving the capability and effect of network defense.

Drawings

FIG. 1 is a flow diagram of a method for opponent action preference estimation for an attack and defense game in one embodiment;

FIG. 2 is a flow diagram of a method for estimating action preferences within the opponent's second-order decision point in a network attack and defense process in one embodiment;

FIG. 3 is a block diagram of an opponent action preference estimation device facing an attack and defense game in one embodiment;

FIG. 4 is an internal structural diagram of an electronic device in one embodiment.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

In the network attack and defense process, the network attacker actually makes its decisions on information sets, so reconstructing the strategy model of the network attacker is essentially establishing a probability model that associates the attacker's information sets with the attacker's actions.

Because of the randomness in network attack and defense, the hidden information of the attacking and defending parties is uniformly distributed when the attack and defense begins. The information sets determined by that hidden information are therefore also uniformly distributed, and so are the information sets within a decision point at the start. However, once the two parties have taken some actions, as long as those actions are not chosen at random (that is, as long as some action preference exists), the distribution of the attacker's information sets within subsequent decision points is necessarily affected and is no longer uniform. The influence of these preferences must therefore be taken into account to build a more accurate action probability model. In essence, this influence shows up as a non-uniform distribution of the information sets within the decision points, so accurately estimating the specific distribution of the attacker's information sets is equivalent to estimating the influence of the action preferences.

In this method, the information set is regarded as a random variable, and the action preference estimation problem is converted into the problem of estimating the distribution of the opponent's information sets within a decision point. First-order and second-order decision points are distinguished: across a round transition, the influence of both parties' action preferences in the previous round is considered, and within the current round, the influence of both parties' action preferences before the second-order decision point is considered. Using probability statistics of both parties' action sequences in the previous round and of their action sequences before the second-order decision point in the current round, together with a two-dimensional approximate probability estimate of those action sequences, the joint posterior distribution of both parties' information sets is obtained, conditioned on the action sequences that reach the next round or that reach the second-order decision point in the current round. The marginal distribution is then computed to obtain the opponent's information set distribution, and the former is used as the information set distribution within the opponent's first-order decision point of the round. By applying Bayesian modeling, the method accounts for the practical fact that the distribution of the opponent's information sets in the game is influenced by both parties' action preferences, estimates the distribution of the opponent's information sets within the decision point, removes the limitation of the existing assumption that information sets are uniformly distributed within a decision point, and can effectively improve the accuracy of explicit reconstruction of the opponent strategy.

The information set is treated as a random variable. The information set distribution of the previous round can be represented by a density function $f^{pre}(I^{pre})$ (since the number of information sets is large, a continuous random variable $I$ can be used: all information sets are mapped in order onto the real interval from 0 to 1 according to the magnitude of some characteristic quantity, such as the expected win rate), and the information set distribution of the next round can likewise be represented by a density function $f^{next}(I^{next})$. Because of the randomness between stages, the information set of the next round is not identical to that of the previous round, but a functional relation $I^{next} = g(I^{pre})$ exists between them. If this function is monotonic, the distributions of the two rounds are related by $f^{next}(I^{next}) = f^{pre}\left(g^{-1}(I^{next})\right) \left| \frac{d\, g^{-1}(I^{next})}{d I^{next}} \right|$. The function $g$ can be obtained by traversing the information sets. The information set distribution at the beginning of the next round can thus be derived, through this functional relation, from the information set distribution at the end of the previous round.
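As a worked illustration of this change of variables, write $f^{pre}$ and $f^{next}$ for the densities of the previous and next round and $g$ for the relation between them (symbols assumed here for the example). Suppose, purely hypothetically, that $g(I) = I^2$ on $[0, 1]$ and that the previous-round density is uniform, $f^{pre}(I^{pre}) = 1$; the source does not specify $g$. Then $g^{-1}(y) = \sqrt{y}$ and

$$f^{next}(y) = f^{pre}\left(\sqrt{y}\right) \left| \frac{d \sqrt{y}}{dy} \right| = \frac{1}{2\sqrt{y}}, \qquad 0 < y \le 1,$$

which still integrates to one, since $\int_0^1 \frac{dy}{2\sqrt{y}} = 1$. Under this assumed $g$, the next-round density concentrates near small information set values even though the previous-round density was flat, which shows why the inter-round relation cannot be ignored.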

Starting from the initial distribution $f(I^{pre})$ at the very beginning of the previous round, the information set distribution at the end of the round has changed to $f(I^{pre} \mid Acts)$. This change in distribution, caused by action preferences, can be modeled with Bayesian principles: the initial distribution at the beginning of a round is regarded as the prior, the distribution at the end of the round as the posterior, and the probabilities of all action sequences observed in the round as conditional probabilities given the prior. Since some actions of the two parties in the previous round may end the current game early without proceeding to the next round, the information set distribution of the next round is derived from the posterior distribution, at the end of the previous round, conditioned on the observed action sequences that can proceed to the next round. We therefore need to derive the posterior distribution of the opponent's information set at the end of the previous round from observations of the action sequences that can enter the next round.

Since an action sequence $acts$ of a round is formed jointly by the attacking and defending parties and is influenced by both, the probability of observing a given action sequence is a conditional probability $P(acts \mid I_P^{pre}, I_O^{pre})$ given both parties' information sets, where the subscript P denotes the network defender and the subscript O denotes the network attacker. Because of randomness, the information set distributions of the two parties can be regarded as independent at the beginning of a round, so the joint distribution is the product of the two parties' information set distributions: $f(I_P^{pre}, I_O^{pre}) = f(I_P^{pre})\, f(I_O^{pre})$.

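In observations of $n$ games, $m$ types of action sequences can be observed, some of which (collected in the set $Acts$) allow the previous round to proceed to the next round. The posterior distribution of the information sets at the end of the previous round, conditioned on being able to enter the next round, is therefore:

$$f\left(I_P^{pre}, I_O^{pre} \mid Acts\right) = \frac{f\left(I_P^{pre}, I_O^{pre}\right) \sum_{acts \in Acts} P\left(acts \mid I_P^{pre}, I_O^{pre}\right)}{\iint f\left(I_P^{pre}, I_O^{pre}\right) \sum_{acts \in Acts} P\left(acts \mid I_P^{pre}, I_O^{pre}\right) \, dI_P^{pre} \, dI_O^{pre}} \qquad (1)$$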

After the posterior calculation gives the joint distribution of both parties' information sets at the end of the previous round, conditioned on entering the next round, the information set distribution of the network attacker is obtained by computing the marginal density:

$$f\left(I_O^{pre} \mid Acts\right) = \int f\left(I_P^{pre}, I_O^{pre} \mid Acts\right) \, dI_P^{pre} \qquad (2)$$
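The posterior update and marginalization described above can be sketched on a discretized grid. This is a minimal illustration under assumptions: the 2 x 2 grid, the uniform prior, the likelihood values, and the function names `posterior_joint` and `attacker_marginal` are all hypothetical, and rows index the defender's information set while columns index the attacker's.

```python
def posterior_joint(prior, likelihoods):
    """Bayes update of the joint information set distribution.

    prior[i][j]       : prior mass of defender cell i, attacker cell j
    likelihoods[k][i][j]: P(acts_k | cell i, j) for each sequence in Acts
    """
    rows, cols = len(prior), len(prior[0])
    unnormalized = [
        [prior[i][j] * sum(lk[i][j] for lk in likelihoods) for j in range(cols)]
        for i in range(rows)
    ]
    z = sum(sum(row) for row in unnormalized)  # probability of entering next round
    if z == 0:
        raise ValueError("no continuing action sequence has positive probability")
    return [[v / z for v in row] for row in unnormalized]


def attacker_marginal(joint):
    """Sum out the defender's information set (rows)."""
    cols = len(joint[0])
    return [sum(row[j] for row in joint) for j in range(cols)]


# Hypothetical example: uniform prior and a single continuing sequence that
# is four times as likely when the attacker holds the second information set.
prior = [[0.25, 0.25], [0.25, 0.25]]
continuing = [[[0.2, 0.8], [0.2, 0.8]]]

joint = posterior_joint(prior, continuing)
marginal = attacker_marginal(joint)
```

Here the attacker's marginal shifts from the uniform prior toward the second cell, which is the preference effect the method sets out to capture.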

After the attacker's information set distribution at the end of the previous round is obtained, the functional relation $g$ of the information sets between rounds can be used to obtain the distribution $f^{next}(I_O^{next})$ of the attacker's information set at the beginning of the next round.

Based on the above analysis, given the initial distribution of the information sets in the previous round, the key is to obtain the conditional probabilities $P(acts \mid I_P^{pre}, I_O^{pre})$ of all the different types of action sequences in the previous round, then to derive from them the joint posterior distribution conditioned on observing action sequences that can enter the next round, and finally to obtain the marginal distribution of the network attacker's information set.

In one embodiment, as shown in fig. 1, there is provided an opponent action preference estimation method facing an attack-defense game, the method comprising the steps of:

Step 100: count the types and proportions of the action sequences of the network attacking and defending parties in the previous round of the network attack and defense process.

Specifically, the network attack and defense process is essentially an incomplete-information, multi-stage (round), random repeated game in which the network attacker and defender take part.

In reconstructing the strategy model of the network attacker, decision points are used as the modeling basis, so reconstructing the strategy model is equivalent to reconstructing the action probability model at each decision point. A decision point represents a class of similar decision scenarios formed by similar information sets and can be regarded as a set of identically distributed information sets. First-order and second-order decision points are distinguished. A first-order decision point is defined by only the last action before the decision point. It therefore does not consider the influence of both parties' action preferences within the current stage (also called a round), only the influence of their action preferences in the previous stage, and it only requires the opponent's information set distribution at the beginning of each round. A second-order decision point is defined by the last two actions before the decision point, so the influence of both parties' action preferences within the current round must be considered when determining the information set distribution within the decision point.

Based on the factors that influence the opponent's decisions in the incomplete-information, multi-stage (round), random repeated game in which the network attacker and defender take part, the information sets in which the most critical decision factor (usually the hidden information) is the variable and the other decision factors are equal constants are compressed into one decision point, so that the distribution of the information sets within the decision point is determined by the most critical decision factor.
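The distinction between first-order and second-order decision points can be illustrated by how a decision point would be keyed from the action history. This is a sketch only: the function `decision_point_key`, the list-of-labels history, and the action names are assumptions, and a real decision point would also fix the other decision factors that the text says are held constant.

```python
def decision_point_key(history, order):
    """Key a decision point by the last `order` actions before it.

    order=1 gives a first-order decision point (last action only);
    order=2 gives a second-order decision point (last two actions).
    """
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")
    if len(history) < order:
        raise ValueError("history is shorter than the requested order")
    return tuple(history[-order:])


# Hypothetical action history leading up to the opponent's decision.
history = ["scan", "monitor", "exploit"]
first_order = decision_point_key(history, 1)
second_order = decision_point_key(history, 2)
```

Histories that share the same key fall into the same decision point, so a second-order key splits each first-order decision point by the action that preceded the last one.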

Step 102: perform a two-dimensional approximate estimation of the probability of the action sequences in the previous round according to the proportions of all action sequences of both parties.

Step 104: determine the posterior distribution of the network attacker's information set at the end of the previous round according to the probability of the action sequences and the joint distribution of the information sets of the network attacker and defender.

Step 106: obtain the distribution of the network attacker's information set at the beginning of the next round from the smoothed posterior distribution of the attacker's information set and the functional relation of the information sets between rounds, and use it as the information set distribution in the opponent's first-order decision point for that round.

Step 108: establish an opponent strategy model of the network attacker according to the distribution of the attacker's information set, infer the attacker's actions with that model, and have the defender defend with the optimal counter-strategy against the inferred actions.

The opponent action preference estimation method for attack and defense games comprises: counting the types and proportions of the action sequences of the network attacker and defender in the previous round of the network attack and defense process; performing a two-dimensional approximate estimation of the probability of the action sequences in the previous round according to the proportions of all action sequences of both parties; determining the posterior distribution of the network attacker's information set at the end of the previous round according to the probability of the action sequences and the joint distribution of both parties' information sets; obtaining the distribution of the network attacker's information set at the beginning of the next round from the smoothed posterior distribution and the functional relation of the information sets between rounds, and using it as the information set distribution in the opponent's first-order decision point for that round; and establishing an opponent strategy model of the network attacker according to the information set distribution in the attacker's decision points, inferring the attacker's actions with that model, and having the defender defend with the optimal counter-strategy against the inferred actions. The method removes the assumption, made in existing approaches, that information sets are uniformly distributed within a decision point. It can thereby improve the accuracy of explicit reconstruction of the opponent's strategy and of predicting the network attacker's actions, make the defender's strategy more targeted, and improve the capability and effectiveness of network defense.

In one embodiment, step 100 comprises: counting, in the data observed over multiple historical games between the network attacker and defender, the types and numbers of the action sequences of both parties in the previous round; and determining the proportion of each type of action sequence from the types and numbers of all action sequences as:

$$P(acts) = \frac{N_{acts}}{N} = \iint P(acts \mid I_{P,pre}, I_{O,pre})\, P(I_{P,pre}, I_{O,pre})\, \mathrm{d}I_{P,pre}\, \mathrm{d}I_{O,pre} \tag{3}$$

where $P(acts)$ is the ratio of the number of action sequences of type $acts$ to the total number of action sequences; $N_{acts}$ is the number of action sequences of type $acts$; $N$ is the total number of action sequences; $P(acts \mid I_{P,pre}, I_{O,pre})$ is the conditional probability of the action sequence given the information sets of both parties; $P(I_{P,pre}, I_{O,pre})$ is the joint distribution of the information sets of both parties; and $I_{P,pre}$, $I_{O,pre}$ are the information sets of the two parties in the previous round, where subscript $P$ denotes the defender, subscript $O$ denotes the network attacker (the opponent), and subscript $pre$ denotes the previous round.

Once all types of action sequences have been counted, the expected probability of one type of action sequence is given by the ratio of its count to the total number of action sequences.

This ratio is in fact also the integral of the conditional probability $P(acts \mid I_{P,pre}, I_{O,pre})$ over the joint distribution of the information sets. Since the left-hand side of equation (3) is known, the conditional probability inside the integral on the right-hand side can be approximated under suitable assumptions.
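The counting in step 100 can be sketched in a few lines. This is a minimal illustration, assuming each observed game contributes one action sequence represented as a tuple; the function name `action_sequence_proportions` and the sample data are hypothetical.

```python
from collections import Counter


def action_sequence_proportions(observed):
    """Return {action_sequence: N_acts / N} for the observed sequences."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {acts: n / total for acts, n in counts.items()}


# Hypothetical previous-round logs: (defender action, attacker action) per game.
history = [
    ("patch", "scan"), ("patch", "scan"), ("monitor", "exploit"),
    ("patch", "exploit"), ("monitor", "scan"), ("patch", "scan"),
    ("monitor", "exploit"), ("patch", "scan"),
]
proportions = action_sequence_proportions(history)
```

Each value in `proportions` plays the role of the known left-hand side of equation (3).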

In one embodiment, step 102 includes: assuming, based on the general decision logic of actions, that each action sequence is uniformly distributed over the two-dimensional interval of its corresponding information sets; and estimating the probability of the action sequences from the proportions of all action sequences at the decision point, the general decision logic of actions, and the distribution of the action sequences over the two-dimensional intervals of the corresponding information sets.

In particular, both parties make decisions according to their own information sets, so different action choices often indicate different information sets. If the value range of both parties' information sets is known, the joint distribution $P(I_{P,pre}, I_{O,pre})$ is known, and the conditional probability of an action sequence can be estimated from general action decision logic, then an estimate of the action probability over the entire value range of the information sets can be obtained.

In the general decision logic of actions, the attacker and the defender choose among actions according to the win rate (that is, the success rate of the action) of their information sets: when the win rate of the information set is high, higher-risk actions are more likely to be taken; conversely, lower-risk actions are more likely. If, for each action sequence, a uniform distribution over a specific interval $D_P^{acts} \times D_O^{acts}$ of both parties' information sets $(I_{P,pre}, I_{O,pre})$ (that is, the probability of the action sequence within the interval is 1) is used to approximate $P(acts \mid I_{P,pre}, I_{O,pre})$, then the complete value range $D_P \times D_O$ of $(I_{P,pre}, I_{O,pre})$ is covered by the intervals corresponding to the different action sequences:

$$\bigcup_{acts} \left( D_P^{acts} \times D_O^{acts} \right) = D_P \times D_O \tag{4}$$

and the probability of occurrence of each action sequence equals the integral of the joint distribution over the specific interval corresponding to that action sequence:

$$P(acts) = \iint_{D_P^{acts} \times D_O^{acts}} P(I_{P,pre}, I_{O,pre})\, \mathrm{d}I_{P,pre}\, \mathrm{d}I_{O,pre} \tag{5}$$

where $D_P^{acts} \times D_O^{acts}$ denotes the two-dimensional information set interval assigned to $acts$. Since the joint distribution $P(I_{P,pre}, I_{O,pre})$ in equation (5) is known and the conditional probability $P(acts \mid I_{P,pre}, I_{O,pre})$ is determined by the interval of each action sequence, the area of the two-dimensional interval of each sequence can be obtained from $P(acts)$. Therefore, the probability of the action sequences over the complete two-dimensional value range of the information sets is obtained simply by arranging the intervals of all sequences in the order given by the information-set preference of each action in the sequence (that is, ordered by the win rate of the information set).

Once the intervals of all action sequences have been delimited in this way, the conditional probability $\hat{P}(acts \mid I_{P,pre}, I_{O,pre})$ of every action sequence over the complete value range of the defender's and attacker's information sets is obtained.
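The interval construction can be illustrated under strong simplifying assumptions: a uniform prior on the unit square, one action per party in each sequence, and actions listed from lowest to highest risk. The sketch below is a simplified stand-in and not necessarily the procedure used in the embodiments; the function names and the data are hypothetical. It first cuts the defender axis by the marginal share of each defender action, then cuts the attacker axis inside each strip by the conditional share, so that every rectangle has prior mass equal to the observed proportion, as equation (5) requires.

```python
def partition_unit_square(proportions, defender_order, attacker_order):
    """Map each (defender_action, attacker_action) pair to a rectangle
    ((p_lo, p_hi), (o_lo, o_hi)) whose area equals its observed proportion.

    Assumes a uniform prior on [0, 1] x [0, 1] and actions listed from
    lowest-risk to highest-risk (low win rate to high win rate).
    """
    rectangles = {}
    p_lo = 0.0
    for a_p in defender_order:
        # The marginal share of the defender action fixes the strip width.
        width = sum(proportions.get((a_p, a_o), 0.0) for a_o in attacker_order)
        if width == 0.0:
            continue
        p_hi = p_lo + width
        o_lo = 0.0
        for a_o in attacker_order:
            share = proportions.get((a_p, a_o), 0.0)
            if share == 0.0:
                continue
            # The conditional share of the attacker action fixes the height.
            o_hi = o_lo + share / width
            rectangles[(a_p, a_o)] = ((p_lo, p_hi), (o_lo, o_hi))
            o_lo = o_hi
        p_lo = p_hi
    return rectangles


def area(rect):
    (p_lo, p_hi), (o_lo, o_hi) = rect
    return (p_hi - p_lo) * (o_hi - o_lo)


# Hypothetical proportions of four action sequences.
proportions = {
    ("monitor", "scan"): 0.125, ("monitor", "exploit"): 0.25,
    ("patch", "scan"): 0.5, ("patch", "exploit"): 0.125,
}
rects = partition_unit_square(
    proportions, ["monitor", "patch"], ["scan", "exploit"]
)
```

With a non-uniform prior the cut points would have to be found from the prior's cumulative mass instead of from lengths, which this sketch does not attempt.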

In one embodiment, step 104 includes: obtaining the posterior joint distribution of both parties' information sets at the end of the previous round from the approximate probability $\hat{P}(acts \mid I_{P,pre}, I_{O,pre})$ of the action sequences over the two-dimensional information set space and the set $Acts$ of action sequences that can enter the next round:

$$P(I_{P,pre}, I_{O,pre} \mid Acts) = \frac{\sum_{acts \in Acts} \hat{P}(acts \mid I_{P,pre}, I_{O,pre})\, P(I_{P,pre}, I_{O,pre})}{\iint \sum_{acts \in Acts} \hat{P}(acts \mid I_{P,pre}, I_{O,pre})\, P(I_{P,pre}, I_{O,pre})\, \mathrm{d}I_{P,pre}\, \mathrm{d}I_{O,pre}} \tag{6}$$

where $P(I_{P,pre}, I_{O,pre} \mid Acts)$ is the posterior joint distribution of both parties' information sets at the end of the previous round, $P(I_{P,pre}, I_{O,pre})$ is the prior joint distribution of both parties' information sets in the previous round, $\hat{P}(acts \mid I_{P,pre}, I_{O,pre})$ is the approximate probability of the action sequence over the two-dimensional information set space, $Acts$ is the set of action sequences that enter the next round, and $acts$ is each action sequence in that set; and obtaining the posterior distribution of the attacker's information set as the marginal distribution of the joint posterior:

$$P(I_{O,pre} \mid Acts) = \int P(I_{P,pre}, I_{O,pre} \mid Acts)\, \mathrm{d}I_{P,pre} \tag{7}$$

where $P(I_{O,pre} \mid Acts)$ is the distribution of the network attacker's information set at the end of the previous round.
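The Bayesian update and marginalisation of step 104 can be sketched on a discretised grid. This is an illustrative assumption, not the embodiment's implementation: information sets are represented by cell midpoints in [0, 1], the likelihood is the indicator approximation described above, and `attacker_posterior`, the rectangles, and the prior are hypothetical.

```python
def attacker_posterior(prior, rects, acts_next, grid):
    """Discretised posterior of the attacker's information set.

    prior:     function (i_p, i_o) -> prior density value
    rects:     {acts: ((p_lo, p_hi), (o_lo, o_hi))} interval of each sequence
    acts_next: action sequences that can enter the next round
    grid:      cell midpoints used for both information-set axes
    """
    def likelihood(i_p, i_o):
        # Indicator approximation: 1 inside the interval of a continuing sequence.
        for acts in acts_next:
            (p_lo, p_hi), (o_lo, o_hi) = rects[acts]
            if p_lo <= i_p < p_hi and o_lo <= i_o < o_hi:
                return 1.0
        return 0.0

    # Unnormalised joint posterior: rows index the defender, columns the attacker.
    joint = [[likelihood(i_p, i_o) * prior(i_p, i_o) for i_o in grid]
             for i_p in grid]
    norm = sum(sum(row) for row in joint)
    # Marginalise over the defender's information set and normalise.
    return [sum(row[c] for row in joint) / norm for c in range(len(grid))]


grid = [0.125, 0.375, 0.625, 0.875]
rects = {
    "A": ((0.0, 0.5), (0.0, 0.5)),
    "B": ((0.0, 0.5), (0.5, 1.0)),
    "C": ((0.5, 1.0), (0.0, 1.0)),
}
# Uniform prior; only sequences "B" and "C" lead into the next round.
posterior = attacker_posterior(lambda i_p, i_o: 1.0, rects, ["B", "C"], grid)
# posterior is approximately [1/6, 1/6, 1/3, 1/3]
```

Because sequence "A" ends the round, attacker information sets in the lower half lose weight, which is the non-uniformity the method is designed to capture.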

In one embodiment, step 106 includes: smoothing the posterior distribution of the network attacker's information set with a preset kernel function to obtain a smooth nonlinear density function:

$$\tilde{f}(I_{O,pre}) = \sum_{j} K\!\left(I_{O,pre} - I_{O,pre,j}\right) P(I_{O,pre,j} \mid Acts) \tag{8}$$

where $\tilde{f}(I_{O,pre})$ is the smoothed nonlinear density function of the previous round, $K(\cdot)$ is the preset kernel function, subscript $j$ is the index after uniform discretization of the information set, $P(I_{O,pre,j} \mid Acts)$ is the posterior distribution of the network attacker's information set in the previous round, and $I_{O,pre}$ is the information set of the network attacker in the previous round.

Specifically, because the two-dimensional approximation of the action sequences assumes a uniform distribution within each interval, the posterior distribution obtained for the information set of the opponent $O$ is a piecewise linear function. Smoothing turns this density into a smooth nonlinear density function, as shown in equation (8).

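The functional relation between the information sets of successive rounds, $I_{O,next} = g(I_{O,pre})$, then yields the distribution $P(I_{O,next})$ of the attacker's information set at the beginning of the next round, which is used as the information set distribution in the opponent's first-order decision point for that round:

$$P(I_{O,next}) = \tilde{f}\!\left(g^{-1}(I_{O,next})\right) \left| \frac{\mathrm{d}\, g^{-1}(I_{O,next})}{\mathrm{d} I_{O,next}} \right| \tag{9}$$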
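Step 106 can be sketched on the same kind of grid. The following is a minimal illustration under assumptions that are not stated in the embodiments: a Gaussian kernel with a hand-picked bandwidth, renormalisation after smoothing, and a hypothetical monotone inter-round relation `g`. The change of variables is done in discrete form by carrying each probability mass from a grid point to its image under `g`.

```python
import math


def smooth(posterior, grid, bandwidth):
    """Gaussian-kernel smoothing of a discretised distribution.

    Returns probability masses on the same grid, renormalised to sum to 1.
    """
    def kernel(u):
        return math.exp(-0.5 * (u / bandwidth) ** 2)

    raw = [sum(kernel(x - xj) * pj for xj, pj in zip(grid, posterior))
           for x in grid]
    total = sum(raw)
    return [v / total for v in raw]


def push_forward(masses, grid, g):
    """Carry each probability mass from x to g(x) (discrete change of variables)."""
    return [(g(x), m) for x, m in zip(grid, masses)]


grid = [0.125, 0.375, 0.625, 0.875]
posterior = [1 / 6, 1 / 6, 1 / 3, 1 / 3]      # stepwise posterior from step 104
smoothed = smooth(posterior, grid, bandwidth=0.2)
# Hypothetical inter-round relation: next-round information set = x squared.
next_round = push_forward(smoothed, grid, g=lambda x: x ** 2)
```

The smoothed masses still sum to one, but the jump between the second and third grid cells is smaller than in the stepwise posterior. The bandwidth controls how strongly the step is flattened and would need to be chosen for the application.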

In one embodiment, as shown in fig. 2, the method further includes estimating the opponent's action preference at a second-order decision point within a round, specifically comprising the following steps:

Step 200: count the types and proportions of all action sequences in the network attack and defense process before the second-order decision point.

Specifically, a second-order decision point is defined by the two actions preceding the current decision point, so the influence of the action preferences expressed by those two actions within the current stage must be considered.

Step 202: perform a two-dimensional approximate estimation of the probability of the action sequences of the network attacker and defender according to the proportions of all action sequences before the second-order decision point in the current round.

Step 204: determine the joint posterior distribution of both parties' information sets within the second-order decision point, as influenced by the preceding action preferences, from the probability of the action sequences of both parties and the joint prior distribution of both parties' information sets; and determine the information set distribution within the attacker's second-order decision point from the marginal distribution of the joint posterior distribution.

In one embodiment, step 202 includes: assuming, according to the general decision logic of both parties' actions, that each action sequence is uniformly distributed over the two-dimensional interval of its corresponding information sets, and determining the two-dimensional information set intervals corresponding to all action sequences according to the proportions of the action sequences.

In one embodiment, step 204 includes: determining, from an action sequence $acts$ that can enter the second-order decision point, the joint posterior distribution of both parties' information sets within the second-order decision point:

$$P(I_P, I_O \mid acts) = \frac{P(acts \mid I_P, I_O)\, P(I_P, I_O)}{P(acts)} \tag{10}$$

and determining the information set distribution of the network attacker within the second-order decision point as the marginal distribution of the joint posterior distribution:

$$P(I_O \mid acts) = \int P(I_P, I_O \mid acts)\, \mathrm{d}I_P \tag{11}$$

Estimating the attacker's information set distribution at a second-order decision point is similar to estimating both parties' action preferences of the previous round at a round transition. The differences are that only the action sequences of both parties before the second-order decision point need to be considered, rather than those of the whole stage, and that no variable transformation of the information set through the inter-round functional relation is required.
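Equations (10) and (11) amount to a single Bayes update followed by marginalisation. A minimal discretised sketch follows; the two-cell grids, the prior masses, the likelihood values, and the function name `second_order_posterior` are all hypothetical.

```python
def second_order_posterior(prior, likelihood):
    """Bayes update within a second-order decision point.

    prior[r][c]:      joint prior mass, defender cell r and attacker cell c
    likelihood[r][c]: P(acts | cell) for the sequence leading to the point
    Returns (P(acts), attacker marginal posterior).
    """
    rows, cols = len(prior), len(prior[0])
    joint = [[likelihood[r][c] * prior[r][c] for c in range(cols)]
             for r in range(rows)]
    p_acts = sum(sum(row) for row in joint)  # total probability of acts
    attacker = [sum(joint[r][c] for r in range(rows)) / p_acts
                for c in range(cols)]
    return p_acts, attacker


# Hypothetical prior at the start of the current round (sums to 1).
prior = [[0.1, 0.2],
         [0.3, 0.4]]
# Indicator likelihood: acts never occurs in cell (defender 1, attacker 0).
likelihood = [[1.0, 1.0],
              [0.0, 1.0]]
p_acts, attacker = second_order_posterior(prior, likelihood)
# p_acts is about 0.7 and attacker is about [1/7, 6/7]
```

No inter-round transformation appears here, which reflects the remark above that the second-order estimate works directly on the current round's information sets.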

It should be understood that, although the steps in the flowcharts of fig. 1 and fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 1 and fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time or in sequence; they may be performed at different times, in turn, or alternately with other steps or with at least part of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 3, there is provided an opponent action preference estimation device for attack and defense games, including: an action sequence proportion determining module 301 for the network attacker and defender, an action sequence probability estimation module 302 for the network attacker and defender, a current-round information set posterior distribution determining module 303 for the network attacker, a next-round information set distribution determining module 304 for the network attacker, and an opponent action inference module 305, wherein:

The action sequence proportion determining module 301 is configured to count the types and proportions of the action sequences of the network attacker and defender in the previous round of the network attack and defense process.

The action sequence probability estimation module 302 is configured to perform a two-dimensional approximate estimation of the probability of the action sequences in the previous round according to the proportions of all action sequences of both parties.

The current-round information set posterior distribution determining module 303 is configured to determine the posterior distribution of the network attacker's information set at the end of the previous round according to the probability of the action sequences and the joint distribution of both parties' information sets.

The next-round information set distribution determining module 304 is configured to obtain the distribution of the network attacker's information set at the beginning of the next round from the smoothed posterior distribution of the attacker's information set and the functional relation of the information sets between rounds, and to use it as the information set distribution in the opponent's first-order decision point for that round.

The opponent action inference module 305 is configured to establish an opponent strategy model of the network attacker according to the information set distribution in the attacker's decision points, to infer the attacker's actions with that model, and to have the defender defend with the optimal counter-strategy against the inferred actions.

In one embodiment, the action sequence proportion determining module 301 is further configured to count, in the data observed over multiple historical games between the network attacker and defender, the types and numbers of the action sequences of both parties in the previous round; and to determine the proportion of each type of action sequence from the types and numbers of all action sequences, as shown in equation (3).

In one embodiment, the action sequence probability estimation module 302 is further configured to assume, based on the general decision logic of actions, that each action sequence is uniformly distributed over the two-dimensional interval of its corresponding information sets; and to estimate the probability of the action sequences from the proportions of all action sequences of both parties, the general decision logic of actions, and the distribution of the action sequences over the two-dimensional intervals of the corresponding information sets:

$$\hat{P}(acts \mid I_{P,pre}, I_{O,pre}) = \begin{cases} 1, & I_{P,pre} \in D_P^{acts},\ I_{O,pre} \in D_O^{acts} \\ 0, & \text{otherwise} \end{cases} \tag{12}$$

where $\hat{P}(acts \mid I_{P,pre}, I_{O,pre})$ is the approximate probability of the action sequence over the two-dimensional information set space; $I_{P,pre}$ and $I_{O,pre}$ are the information sets of the two parties; and $D_P^{acts}$ and $D_O^{acts}$ are the ranges that together form the two-dimensional information set interval corresponding to the action sequence $acts$. The probability of each action sequence $acts$ should satisfy:

$$P(acts) = \iint_{D_P^{acts} \times D_O^{acts}} P(I_{P,pre}, I_{O,pre})\, \mathrm{d}I_{P,pre}\, \mathrm{d}I_{O,pre} \tag{13}$$

where $P(acts)$ is the proportion of the action sequence and $P(I_{P,pre}, I_{O,pre})$ is the prior joint distribution of both parties' information sets in the previous round;

according to the proportions of all action sequences of both parties and the general decision logic of actions, the two-dimensional information set interval corresponding to each action sequence $acts$ is obtained by two-mode iteration, which in turn yields the approximate probability of the action sequences over the complete two-dimensional information set range.

In one embodiment, the current-round information set posterior distribution determining module 303 is further configured to obtain the posterior joint distribution of both parties' information sets at the end of the previous round from the approximate probability $\hat{P}(acts \mid I_{P,pre}, I_{O,pre})$ of the action sequences over the two-dimensional information set space and the set $Acts$ of action sequences that can enter the next round, as shown in equation (6). The posterior distribution of the network attacker's information set is obtained as the marginal distribution of the joint posterior distribution, as shown in equation (7).

In one embodiment, the next-round information set distribution determining module 304 is further configured to smooth the posterior distribution of the network attacker's information set with a preset kernel function to obtain a smooth nonlinear density function, as shown in equation (8); and to obtain the distribution of the network attacker's information set at the beginning of the next round from the smoothed posterior distribution and the functional relation of the information sets between rounds, using it as the information set distribution in the opponent's first-order decision point for that round, as shown in equation (9).

In one embodiment, the device further comprises an opponent action preference estimation module for second-order decision points within a round, configured to count the types and proportions of all action sequences in the network attack and defense process before the second-order decision point; to perform a two-dimensional approximate estimation of the probability of the action sequences of both parties according to the proportions of all action sequences before the second-order decision point in the current round; and to determine the joint posterior distribution of both parties' information sets within the second-order decision point, as influenced by the preceding action preferences, from the probability of the action sequences and the joint prior distribution of both parties' information sets, and the information set distribution within the attacker's second-order decision point from the marginal distribution of the joint posterior distribution.

In one embodiment, the opponent action preference estimation module for second-order decision points within a round is further configured to assume, according to the general decision logic of both parties' actions, that each action sequence is uniformly distributed over the two-dimensional interval of its corresponding information sets, and to determine the two-dimensional information set intervals corresponding to all action sequences according to the proportions of the action sequences.

In one embodiment, the opponent action preference estimation module for second-order decision points within a round is further configured to determine, from an action sequence $acts$ that can enter the second-order decision point, the joint posterior distribution of both parties' information sets within the second-order decision point, as shown in equation (10); and to determine the information set distribution of the network attacker within the second-order decision point from the marginal distribution of the joint posterior distribution, as shown in equation (11).

For specific limitations of the opponent action preference estimation device for attack and defense games, refer to the limitations of the opponent action preference estimation method described above; they are not repeated here. Each module of the device may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the corresponding operations.

In one embodiment, an electronic device is provided, which may be a terminal, and an internal structure diagram thereof may be as shown in fig. 4. The electronic device comprises a processor 401, a memory 402, a network interface 403, a display 404 and an input device 405, which are connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory 402 of the electronic device includes a nonvolatile storage medium 4022 and an internal memory 4021. The nonvolatile storage medium 4022 stores an operating system and a computer program. The internal memory 4021 provides an environment for the operation of the operating system and computer programs in the nonvolatile storage medium. The network interface 403 of the electronic device is used for communication with an external terminal via a network connection. The computer program, when executed by the processor 401, implements a method for opponent action preference estimation for attack-and-defense gaming. The display screen 404 of the electronic device may be a liquid crystal display screen or an electronic ink display screen, and the input device 405 of the electronic device may be a touch layer covered on the display screen 404, or may be a key, a track ball or a touch pad arranged on a casing of the computer device, or may be an external keyboard, a touch pad or a mouse.

It will be appreciated by persons skilled in the art that the architecture shown in fig. 4 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In an embodiment, an electronic device is provided comprising a memory storing a computer program and a processor implementing the steps of any of the method embodiments described above when the computer program is executed.

The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.

The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. An opponent action preference estimation method facing attack and defense games is characterized by comprising the following steps:

counting the types and the proportions of action sequences of both network attack and defense parties in the previous round of network attack and defense process;

according to the proportion of the action sequences of all the two parties, carrying out two-dimensional approximate estimation on the probability of the action sequence in the previous round;

determining posterior distribution of the information set of the network attack party at the end of the previous round according to the probability of the action sequence and the joint distribution of the information sets of the network attack and defense parties;

obtaining the distribution of the information set of the network attacker at the beginning of the next round according to the posterior distribution of the information set of the network attacker after the smoothing and the functional relation of the information set between rounds, and taking the distribution as the information set distribution in the first-order decision point of the opponent of the round;

according to information set distribution in decision points of the network attacker, an opponent strategy model of the network attacker is established, the opponent strategy model of the network attacker is adopted to infer the action of the network attacker, and the defender defends by adopting an optimal countermeasure strategy according to the action of the network attacker.

2. The method of claim 1, wherein counting the types and proportions of the action sequences of the network attacker and defender in the previous round of the network attack and defense process comprises:

$$P(acts) = \frac{N_{acts}}{N} = \iint P(acts \mid I_{P,pre}, I_{O,pre})\, P(I_{P,pre}, I_{O,pre})\, \mathrm{d}I_{P,pre}\, \mathrm{d}I_{O,pre},$$

wherein $P(acts)$ is the ratio of the number of action sequences of type $acts$ to the total number of action sequences; $N_{acts}$ is the number of action sequences of type $acts$; $N$ is the total number of action sequences; $P(acts \mid I_{P,pre}, I_{O,pre})$ is the conditional probability of the action sequence given the information sets of both parties; $P(I_{P,pre}, I_{O,pre})$ is the joint distribution of the information sets of both parties; and $I_{P,pre}$, $I_{O,pre}$ are the information sets of the two parties in the previous round, subscript $P$ denoting the defender, subscript $O$ denoting the network attacker, namely the opponent, and subscript $pre$ denoting the previous round.

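3. The method of claim 1, wherein performing a two-dimensional approximate estimation of the probability of the action sequences in the previous round according to the proportions of all action sequences of both parties comprises:

$$\hat{P}(acts \mid I_{P,pre}, I_{O,pre}) = \begin{cases} 1, & I_{P,pre} \in D_P^{acts},\ I_{O,pre} \in D_O^{acts} \\ 0, & \text{otherwise} \end{cases},$$

wherein $\hat{P}(acts \mid I_{P,pre}, I_{O,pre})$ is the approximate probability of the action sequence over the two-dimensional information set space; $I_{P,pre}$ and $I_{O,pre}$ are the information sets of the two parties; and $D_P^{acts}$ and $D_O^{acts}$ are the ranges that together form the two-dimensional information set interval corresponding to the action sequence $acts$; the probability of each action sequence $acts$ should satisfy:

$$P(acts) = \iint_{D_P^{acts} \times D_O^{acts}} P(I_{P,pre}, I_{O,pre})\, \mathrm{d}I_{P,pre}\, \mathrm{d}I_{O,pre},$$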

4. The method of claim 1, wherein determining the posterior distribution of the information set of the network attacker at the end of the previous round according to the probability of the action sequences and the joint distribution of the information sets of the network attacker and defender comprises:

obtaining the posterior joint distribution of both parties' information sets at the end of the previous round according to the approximate probability of the action sequences over the two-dimensional information set space and the set of action sequences that can enter the next round:

$$P(I_{P,pre}, I_{O,pre} \mid Acts) = \frac{\sum_{acts \in Acts} \hat{P}(acts \mid I_{P,pre}, I_{O,pre})\, P(I_{P,pre}, I_{O,pre})}{\iint \sum_{acts \in Acts} \hat{P}(acts \mid I_{P,pre}, I_{O,pre})\, P(I_{P,pre}, I_{O,pre})\, \mathrm{d}I_{P,pre}\, \mathrm{d}I_{O,pre}},$$

wherein $P(I_{P,pre}, I_{O,pre} \mid Acts)$ is the posterior joint distribution of both parties' information sets at the end of the previous round; $P(I_{P,pre}, I_{O,pre})$ is the prior joint distribution of both parties' information sets in the previous round; and $I_{P,pre}$, $I_{O,pre}$ are the information sets of the two parties in the previous round, subscript $P$ denoting the defender, subscript $O$ denoting the network attacker, namely the opponent, and subscript $pre$ denoting the previous round;

$$P(I_{O,pre} \mid Acts) = \int P(I_{P,pre}, I_{O,pre} \mid Acts)\, \mathrm{d}I_{P,pre},$$

5. The method according to claim 1, wherein obtaining the distribution of the information set of the network attacker at the beginning of the next round from the smoothed posterior distribution of the information set of the network attacker and the functional relation of the information sets between rounds, and using it as the information set distribution in the first-order decision point of the opponent for that round, comprises:

$$\tilde{f}(I_{O,pre}) = \sum_{j} K\!\left(I_{O,pre} - I_{O,pre,j}\right) P(I_{O,pre,j} \mid Acts),$$

wherein $\tilde{f}(I_{O,pre})$ is the smoothed nonlinear density function of the previous round; $K(\cdot)$ is a preset kernel function; subscript $j$ is the index after uniform discretization of the information set; $P(I_{O,pre,j} \mid Acts)$ is the posterior distribution of the information set of the network attacker in the previous round; and $I_{O,pre}$ is the information set of the network attacker in the previous round;

$$P(I_{O,next}) = \tilde{f}\!\left(g^{-1}(I_{O,next})\right) \left| \frac{\mathrm{d}\, g^{-1}(I_{O,next})}{\mathrm{d} I_{O,next}} \right|,$$

wherein $P(I_{O,next})$ is the distribution of the information set of the network attacker at the beginning of the next round; $g$ is the functional relation of the information set between the previous round and the next round; and $I_{O,next}$ is the information set of the network attacker at the beginning of the next round.

6. The method of claim 1, further comprising an opponent action preference estimation within a second order decision point within the current round, the steps comprising:

according to the proportion of all the action sequences of the two parties before the second-order decision point in the current round, carrying out two-dimensional approximate estimation on the probability of the action sequences of the network attack and defense parties;

and determining the joint posterior distribution of the information sets of the two parties in the second-order decision point influenced by the previous action preference according to the probability of the action sequences of the two parties and the joint prior distribution of the information sets of the two parties, and determining the information set distribution in the second-order decision point of the network attacker according to the edge distribution of the joint posterior distribution.

7. The method of claim 6, wherein performing a two-dimensional approximate estimation of the probability of the network attack and defense party action sequences based on the proportion of all action sequences preceding the second-order decision point in the current round comprises:

8. The method of claim 6, wherein determining the joint posterior distribution of the information sets of the two parties within the second-order decision point, as influenced by the previous action preference, according to the probability of the action sequences of the network attack and defense parties and the joint prior distribution of the information sets of the network attack and defense parties, and determining the information set distribution within the second-order decision point of the network attacker from the marginal distribution of the joint posterior distribution, comprises:

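$$p(x_a, x_d \mid h) = \frac{p(h \mid x_a, x_d)\, p(x_a, x_d)}{p(h)},$$

wherein $p(x_a, x_d \mid h)$ is the joint posterior distribution of the information sets of both network attack and defense parties in the second-order decision point; $x_a$ and $x_d$ are the information sets of the network attacker and the network defender, respectively; $p(h \mid x_a, x_d)$ is the conditional distribution of the action sequences of both network attack and defense parties that occur before the second-order decision point and can lead into it; $p(x_a, x_d)$ is the joint prior distribution of the information sets of both parties at the beginning of the current round; and $p(h)$ is the total probability of the action sequence $h$;

$$p(x_a \mid h) = \sum_{x_d} p(x_a, x_d \mid h),$$

wherein $p(x_a \mid h)$ is the information set distribution of the network attacker in the second-order decision point, obtained as the marginal distribution of the joint posterior distribution.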
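The Bayesian update and marginalization described in claims 6 and 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. Both information sets are assumed to be discretized into a small number of values, the function names `joint_posterior` and `attacker_marginal` are hypothetical, and the numbers in the example are invented.

```python
def joint_posterior(prior, likelihood):
    """Bayes update over a discretized joint information-set grid.

    prior[a][d]      : joint prior of attacker value a and defender value d
    likelihood[a][d] : probability of the observed action sequence given (a, d)
    """
    unnormalized = [
        [likelihood[a][d] * prior[a][d] for d in range(len(prior[a]))]
        for a in range(len(prior))
    ]
    # Total probability of the observed action sequence.
    evidence = sum(sum(row) for row in unnormalized)
    if evidence == 0:
        raise ValueError("observed action sequence has zero probability")
    return [[v / evidence for v in row] for row in unnormalized]


def attacker_marginal(posterior):
    """Sum out the defender's information set to get the attacker's distribution."""
    return [sum(row) for row in posterior]


# Hypothetical example with two attacker values and two defender values.
prior = [[0.25, 0.25], [0.25, 0.25]]
likelihood = [[0.8, 0.6], [0.2, 0.4]]
posterior = joint_posterior(prior, likelihood)
attacker_dist = attacker_marginal(posterior)  # approximately [0.7, 0.3]
```

In this example the observed action sequence is more likely under the first attacker value, so the attacker's marginal shifts from the uniform prior toward that value.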

9. An opponent action preference estimation device for an attack and defense game, characterized by comprising:

an action sequence proportion determining module for the network attack and defense parties, configured to count the types and proportions of the action sequences of both network attack and defense parties in the previous round of the network attack and defense process;

an action sequence probability estimation module for the network attack and defense parties, configured to perform two-dimensional approximate estimation of the probability of the action sequences according to the proportions of all action sequences of the previous round, or of all action sequences before the second-order decision point of the current round;

a current-round information set posterior distribution determining module for the network attacker, configured to determine, by marginalization, the posterior distribution of the information set of the network attacker at the second-order decision point of the current round, or the posterior distribution of the information set of the network attacker at the end of the previous round, according to the probability of the action sequences and the joint posterior distribution of the information sets of both network attack and defense parties;

a next-round information set distribution determining module for the network attacker, configured to obtain the distribution of the information set of the network attacker at the beginning of the next round according to the smoothed posterior distribution of the information set of the network attacker at the end of the previous round and the functional relation of the information set between rounds, and to take that distribution as the information set distribution in the opponent's first-order decision point of that round;

and an opponent action presumption module, configured to establish an opponent strategy model of the network attacker according to the information set distribution in the decision points of the network attacker, to infer the action of the network attacker using the opponent strategy model, and to defend with an optimal countermeasure strategy according to the inferred action of the network attacker.
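The counting step performed by the first module of claim 9 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each recorded action sequence is assumed to be a tuple of action labels, the function name `action_sequence_proportions` is hypothetical, and the labels in the example are invented. The two-dimensional approximate estimation performed by the second module is not specified in this section and is not implemented here.

```python
from collections import Counter


def action_sequence_proportions(sequences):
    """Return the proportion of each distinct action sequence in a round log."""
    counts = Counter(tuple(seq) for seq in sequences)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n / total for seq, n in counts.items()}


# Hypothetical log of (attacker action, defender action) sequences.
previous_round_log = [
    ("scan", "block"),
    ("scan", "block"),
    ("exploit", "patch"),
    ("scan", "allow"),
]
proportions = action_sequence_proportions(previous_round_log)
```

The resulting mapping holds one entry per sequence type, and its values sum to 1, which makes it usable as the empirical input to the later probability estimation step.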

10. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 8 when executing the computer program.