CN114285645B

CN114285645B - Man-in-the-middle attack coping method based on repeated game

Info

Publication number: CN114285645B
Application number: CN202111604797.2A
Authority: CN
Inventors: 朱进; 张景龙
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2021-12-24
Filing date: 2021-12-24
Publication date: 2022-09-30
Anticipated expiration: 2041-12-24
Also published as: CN114285645A

Abstract

The invention relates to a method for coping with man-in-the-middle attacks in network security based on repeated games. Information originally transmitted through one port is distributed across a plurality of ports; a fixed number of these ports are selected to transmit invalid information, while the remaining ports transmit the valid information that actually needs to be sent. At regular intervals, the port allocation strategy provided by the invention re-determines which ports transmit invalid information and which transmit valid information. The invention reduces the loss caused by man-in-the-middle attacks as much as possible while also reducing the number of port reallocations.

Description

Man-in-the-middle attack coping method based on repeated game

Technical Field

The invention relates to the field of network security, and in particular to a man-in-the-middle attack coping method based on repeated games, a technique that effectively counters man-in-the-middle attacks and reduces the loss from information leakage.

Background

In the current information era, a large amount of information is transmitted over the Internet, and the security and confidentiality of that information receive increasing attention. In a network, computers communicate with each other through ports. A network attack known as the man-in-the-middle attack intercepts normal network communication data and tampers with or sniffs it, causing information leakage.

An existing countermeasure distributes the service originally provided by a single port across a plurality of ports, i.e., disperses the transmitted information. Because practical constraints allow a man-in-the-middle attacker to attack only some of the ports at the same time, the loss caused by the attack is reduced.

However, this does not achieve the best result: a man-in-the-middle attacker can still steal information, and thus cause loss, by attacking a portion of the ports.

Disclosure of Invention

Problem solved by the invention: to overcome the shortcomings of the prior art, a man-in-the-middle attack coping method based on repeated games is provided, which further reduces the information leakage loss caused by man-in-the-middle attacks.

The invention discloses a technique that effectively counters man-in-the-middle attacks and reduces information leakage loss. It models the attack-defense scenario of a man-in-the-middle attack as a repeated game, has some ports transmit invalid information while the remaining ports transmit valid information, and reallocates at regular intervals which ports transmit valid and invalid information, so as to reduce the loss caused by man-in-the-middle attacks as much as possible. In addition, because reallocating ports affects information transmission, the number of reallocations is also kept as small as possible.

The technical scheme of the invention is as follows: in a man-in-the-middle attack coping method based on repeated games, information originally transmitted through one port is distributed across a plurality of ports; a fixed number of these ports are selected to transmit invalid information, and the remaining ports transmit the valid information to be sent. The problem of defending against man-in-the-middle attacks is modeled as a repeated game, and at regular intervals a new port allocation strategy is generated by the port allocation method provided by the invention (that is, one round of the repeated game is played) to determine which ports transmit invalid or valid information, reducing the number of port reallocations as much as possible while reducing the loss caused by man-in-the-middle attacks as much as possible.

The innovative port allocation strategy of the present invention is specifically implemented as follows:

Step 1: Generate an exploration strategy set such that, for every port, the set contains at least one allocation strategy in which that port transmits invalid information;

Step 2: Initialize a cumulative reward estimate and a strategy perturbation for each port, and execute the subsequent steps at fixed time intervals to generate the port allocation strategy for each time period;

Step 3: At the current time, independently sample a random quantity for each port from a Gaussian distribution and add it to that port's strategy perturbation;

Step 4: With a set probability, explore by randomly selecting one strategy from the exploration strategy set as the allocation strategy of the current round; otherwise, adopt as the allocation strategy the strategy that maximizes the sum of the cumulative reward estimate and the strategy perturbation;

Step 5: Determine according to the allocation strategy whether each port transmits invalid or valid information and, depending on the action taken by the man-in-the-middle attacker, observe the revenue of the ports that are attacked while transmitting invalid information;

Step 6: Use a resampling algorithm to estimate, by simulation, the reciprocal of the probability that each port transmits invalid information at this time;

Step 7: Update the cumulative reward estimate according to the allocation strategy actually adopted, the observed partial rewards, and the probability reciprocals obtained by simulation;

Step 8: At the next time, return to Step 2 and continue generating the allocation strategy of the next round, until all rounds are finished.

Compared with the prior art, the invention has the following advantages:

(1) When facing man-in-the-middle attacks in network security, the method needs no information about the adversary; it is therefore robust and can cope with various types of opponents;

(2) The port allocation strategy of the invention reduces the number of port reallocations while reducing the loss caused by man-in-the-middle attacks as much as possible; in other words, it reduces both the information leakage loss caused by the attack and the adverse effect of port switching on the transmission of valid information.

Drawings

FIG. 1 is a flow chart of the implementation of the method of the present invention.

Detailed Description

The man-in-the-middle attack is a common form of network attack in which the attacker steals the information transmitted through the ports it attacks, causing information leakage loss. Existing means distribute the information originally transmitted through a single port across a plurality of ports, i.e., disperse the transmitted information, so as to reduce the loss caused by the attack. This alone is not sufficient, however: the attacker can still steal information, and thus cause loss, by attacking a portion of the ports. The invention therefore builds a repeated game model on top of the prior art, in which some ports transmit invalid information and the remaining ports transmit valid information. The ports transmitting valid and invalid information are reassigned in each round, and the number of port reallocations is reduced as much as possible while the loss caused by man-in-the-middle attacks is reduced as much as possible.

From the perspective of repeated games, the specific mathematical model established for the man-in-the-middle attack scenario is as follows. The total number of ports capable of transmitting information is $n$, and in each round the defender can select $k$ ($k < n$) ports to transmit invalid information. An $n$-dimensional binary vector $v$ represents the defender's port allocation strategy: if the $i$-th port ($i = 1, \dots, n$) transmits invalid information, the $i$-th element of $v$ is 1, otherwise it is 0, so that $\|v\|_1 = k$. The set of all strategies $v$ is denoted by $\mathcal{V}$. Correspondingly, the attacker can attack only $m$ ports at the same time, so the attacker's strategy is an $n$-dimensional binary vector $a$ with $\|a\|_1 = m$, and $\mathcal{A}$ denotes the set of all strategies $a$.

The total port revenue $r_t$ in each round is an $n$-dimensional vector, set as follows. If port $i$ is attacked and transmits invalid information, the defender gains, and the $i$-th component $r_{t,i}$ of $r_t$ is a random value in $[0, 0.5]$; if the attacked port transmits valid information, the defender suffers a loss, and $r_{t,i}$ is a random value in $[-0.5, 0]$. For a port that is not attacked, the defender's revenue is 0 regardless of whether it transmits valid information. Because the content and importance of the information transmitted differ from port to port, the protection value of each port differs, and so the revenue values set for the ports in the model also differ.

To be closer to the actual situation, the model has two further important features: the defender has no prior knowledge, and the defender has limited observation ability. The former means that the defender does not know the game payoffs or the attacker's behavior model in advance; the latter means that in each round the defender can observe the revenue only on the ports that are not transmitting valid information.

Under this model, online learning methods can be used to generate the defender's strategy. The strategy should pursue two objectives: on the one hand, prevent the valid information on the ports from being stolen as far as possible and obtain more revenue, i.e., make the regret as small as possible; on the other hand, since port reallocation affects information transmission, keep the number of reallocations as small as possible.
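The per-round revenue rule described above can be sketched in Python as follows. This is an illustrative sketch only: the function name `round_revenue`, the use of a uniform distribution for the "random value", and the convention of returning `None` for ports whose revenue the defender cannot observe are assumptions made for the example and are not specified in the text.

```python
import random
from typing import List, Optional, Sequence, Tuple


def round_revenue(
    v: Sequence[int], a: Sequence[int], rng: random.Random
) -> Tuple[List[float], List[Optional[float]]]:
    """Return (revenue vector r_t, the part of r_t the defender can observe).

    v[i] == 1: port i transmits invalid information (defender strategy).
    a[i] == 1: port i is attacked (attacker strategy).
    """
    revenue: List[float] = []
    observed: List[Optional[float]] = []
    for v_i, a_i in zip(v, a):
        if a_i == 0:
            r_i = 0.0                     # un-attacked port: revenue is 0
        elif v_i == 1:
            r_i = rng.uniform(0.0, 0.5)   # attacked, invalid information: gain (uniform is assumed)
        else:
            r_i = rng.uniform(-0.5, 0.0)  # attacked, valid information: loss (uniform is assumed)
        revenue.append(r_i)
        # Limited observation: only ports transmitting invalid information are visible.
        observed.append(r_i if v_i == 1 else None)
    return revenue, observed
```

For example, with `v = [1, 0, 1, 0]` and `a = [1, 1, 0, 0]`, port 1 yields a gain, port 2 a loss that the defender does not see, and ports 3 and 4 yield 0.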

In a general repeated security game, the quality of a defender's strategy algorithm is usually evaluated by its "regret", i.e., the difference between the cumulative revenue of the best fixed strategy in hindsight and the cumulative revenue obtained by the strategy actually adopted. The lower the regret, the better the actual strategy and the greater the revenue obtained. Regret is defined as follows:

where $v^*$ is the theoretically optimal strategy, $v_t$ is the strategy actually adopted by the defender at time $t$, and $T$ is the total duration of the attack-defense scenario.
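One form consistent with this verbal definition, under the simplifying assumption that the revenue of a strategy $v$ in round $t$ is the inner product $v \cdot r_t$ for a revenue vector $r_t$ fixed in that round, is $R_T = \max_{v} \sum_{t=1}^{T} v \cdot r_t - \sum_{t=1}^{T} v_t \cdot r_t$. The Python sketch below evaluates this quantity in hindsight; the linear revenue assumption and the function name `regret` are illustrative choices, not taken from the original.

```python
from typing import Sequence


def regret(
    rewards: Sequence[Sequence[float]],
    played: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Hindsight regret under the assumed linear revenue v . r_t.

    rewards[t][i]: revenue of port i in round t.
    played[t][i]:  1 if port i transmitted invalid information in round t.
    k:             number of ports transmitting invalid information.
    """
    n = len(rewards[0])
    # Cumulative revenue of each port over all rounds.
    totals = [sum(r[i] for r in rewards) for i in range(n)]
    # With a linear revenue, the best fixed strategy selects the k ports
    # with the largest cumulative revenue.
    best_fixed = sum(sorted(totals, reverse=True)[:k])
    # Revenue actually collected by the strategies that were played.
    actual = sum(
        sum(v_i * r_i for v_i, r_i in zip(v, r))
        for v, r in zip(played, rewards)
    )
    return best_fixed - actual
```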

In addition, in network defense, reallocating the ports that transmit valid information may cause additional loss such as delayed or lost transmissions, so the number of reallocations should be kept as small as possible. The "number of reallocations" can therefore also be used to evaluate a strategy: the lower this index, the better the strategy. It is defined as follows:

$S_T = \left|\{\, 1 < t \le T : v_{t-1} \ne v_t \,\}\right|$
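The definition counts the rounds in which the strategy differs from that of the previous round. A minimal Python sketch (the function name `reallocation_count` is chosen for illustration):

```python
from typing import Sequence


def reallocation_count(strategies: Sequence[Sequence[int]]) -> int:
    """Number of rounds t > 1 in which v_t differs from v_{t-1}."""
    return sum(
        1
        for prev, cur in zip(strategies, strategies[1:])
        if tuple(prev) != tuple(cur)
    )
```

For the sequence `[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]` the strategy changes twice, so the count is 2.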

For the scenario of coping with man-in-the-middle attacks in network defense, the method generates an effective defense strategy. It has the following important hyperparameters: $\sigma^2$, the variance of the Gaussian distribution, and $\gamma$, the exploration probability. The method specifically comprises the following steps:

Step 1: Generate an exploration strategy set $E = \{\varepsilon_1, \dots, \varepsilon_n\}$ consisting of $n$-dimensional vectors, where the $i$-th component of the vector $\varepsilon_i$ is fixed to 1, meaning that port $i$ must transmit invalid information; the remaining components are 0 or 1, and exactly $k$ components of $\varepsilon_i$ equal 1, meaning that $k$ ports transmit invalid information;

Step 2: For each of the $n$ ports available for transmitting information, initialize a cumulative reward estimate $\hat{R}_{0,i} = 0$. All initial estimates together form an $n$-dimensional cumulative reward estimate vector $\hat{R}_0 = (\hat{R}_{0,1}, \hat{R}_{0,2}, \dots, \hat{R}_{0,n})$. Similarly, initialize a perturbation $Z_{0,i} = 0$ for each port, forming an $n$-dimensional perturbation vector $Z_0 = (Z_{0,1}, Z_{0,2}, \dots, Z_{0,n})$. Perform the subsequent steps at fixed time intervals, i.e., at $t = 1, 2, \dots, T$;

Step 3: At time $t$, independently sample $n$ random quantities $X_{t,1}, X_{t,2}, \dots, X_{t,n}$ from the Gaussian distribution $\mathcal{N}(0, \sigma^2)$ with expectation 0 and (preset) variance $\sigma^2$, forming an $n$-dimensional vector $X_t = (X_{t,1}, X_{t,2}, \dots, X_{t,n})$. Add the random vector $X_t$ to the perturbation vector $Z_{t-1}$ to obtain $Z_t$, i.e., $Z_t = Z_{t-1} + X_t$;

Step 4: Sample a value $\alpha$ uniformly at random from $[0, 1]$. If $\alpha$ is smaller than the exploration probability $\gamma$ set in advance, randomly select a vector from the exploration strategy set $E$ as the allocation strategy $v_t$ of the current round. Otherwise, take as the allocation strategy $v_t$ the strategy $v$ that maximizes the sum of the cumulative reward estimate $\hat{R}_{t-1}$ and the random-walk perturbation $Z_t$, i.e., $v_t = \arg\max_{v \in \mathcal{V}} v \cdot (\hat{R}_{t-1} + Z_t)$;

Step 5: Assign the ports according to the allocation strategy $v_t$: if the $i$-th component of $v_t$ is 1, the $i$-th port transmits invalid information; if it is 0, the $i$-th port transmits valid information. Depending on the action taken by the man-in-the-middle attacker, part of the components $r_{t,i}$ of the actual reward vector $r_t$ can be observed, namely the revenue of the ports that are attacked while transmitting invalid information;

Step 6: Execute Steps 7-9 (the resampling algorithm) to estimate the reciprocal of the probability that port $i$ transmits invalid information at this time, denoted $K(t, i)$;

Step 7: Initialize $K(t, i) = 0$ for all $i = 1, 2, \dots, n$; repeat Steps 8-9 for $k = 1, 2, \dots, M$, where $M$ is the maximum number of simulations set in advance (here $k$ is the simulation counter, not the number of ports transmitting invalid information);

Step 8: Execute Steps 3-4 to generate a simulated copy $\tilde{v}$ of the allocation strategy $v_t$;

Step 9: For all $i = 1, 2, \dots, n$: if $k < M$, $\tilde{v}_i = 1$ and $K(t, i) = 0$, set $K(t, i) = k$; otherwise, if $k = M$ and $K(t, i) = 0$, set $K(t, i) = M$;

Step 10: Update the cumulative reward estimate according to the allocation strategy $v_t$ actually adopted, the observed partial rewards $r_{t,i}$, and the $K(t, i)$ obtained by simulation.

The specific update follows the following equation:

Step 11: At the next time $t + 1$, return to Step 2 and continue generating the allocation strategy of the next round, until time $T$ is reached.
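Steps 1 to 11 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function names, the cyclic construction of the exploration set, the hypothetical `observe` callback that supplies the rewards, the use of an inner product to score strategies in Step 4, and the Step 10 update `est[i] += K(t, i) * r_{t,i}` on ports transmitting invalid information (the usual importance-weighted form) are all assumptions wherever the text does not spell them out.

```python
import random
from typing import Callable, List, Tuple

Vector = List[int]


def exploration_set(n: int, k: int) -> List[Vector]:
    """Step 1 (assumed construction): strategy i marks port i and the next k-1 ports, cyclically."""
    return [[1 if (j - i) % n < k else 0 for j in range(n)] for i in range(n)]


def top_k(scores: List[float], k: int) -> Vector:
    """Binary vector with exactly k ones that maximizes the inner product with scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(order[:k])
    return [1 if i in chosen else 0 for i in range(len(scores))]


def draw_strategy(est: List[float], z: List[float], k: int, sigma: float, gamma: float,
                  explore: List[Vector], rng: random.Random) -> Tuple[Vector, List[float]]:
    """Steps 3-4: extend the random walk, then explore or follow the perturbed leader."""
    z_new = [z_i + rng.gauss(0.0, sigma) for z_i in z]  # sigma is the standard deviation
    if rng.random() < gamma:
        return rng.choice(explore), z_new
    return top_k([e + p for e, p in zip(est, z_new)], k), z_new


def resample(est: List[float], z_prev: List[float], k: int, sigma: float, gamma: float,
             explore: List[Vector], max_sims: int, rng: random.Random) -> List[int]:
    """Steps 6-9: K[i] estimates 1 / P(port i transmits invalid information)."""
    n = len(est)
    K = [0] * n
    for sim in range(1, max_sims + 1):
        v_sim, _ = draw_strategy(est, z_prev, k, sigma, gamma, explore, rng)
        for i in range(n):
            # First simulation in which port i is selected, capped at max_sims.
            if K[i] == 0 and (v_sim[i] == 1 or sim == max_sims):
                K[i] = sim
    return K


def run_defender(n: int, k: int, rounds: int, sigma: float, gamma: float, max_sims: int,
                 observe: Callable[[int, Vector], List[float]],
                 seed: int = 0) -> Tuple[List[Vector], List[float]]:
    rng = random.Random(seed)
    explore = exploration_set(n, k)   # Step 1
    est = [0.0] * n                   # Step 2: cumulative reward estimates
    z = [0.0] * n                     # Step 2: perturbations
    history: List[Vector] = []
    for t in range(1, rounds + 1):
        z_prev = z
        v_t, z = draw_strategy(est, z_prev, k, sigma, gamma, explore, rng)  # Steps 3-4
        rewards = observe(t, v_t)     # Step 5: only entries with v_t[i] == 1 are used
        K = resample(est, z_prev, k, sigma, gamma, explore, max_sims, rng)  # Steps 6-9
        for i in range(n):            # Step 10 (assumed importance-weighted update)
            if v_t[i] == 1:
                est[i] += K[i] * rewards[i]
        history.append(v_t)           # Step 11: continue with the next round
    return history, est
```

Here `observe(t, v_t)` is a stand-in for the real system: it is expected to return a list of $n$ revenues, of which only those on ports transmitting invalid information are read, matching the limited observation described in Step 5.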

In the man-in-the-middle attack scenario described above, with no prior knowledge and limited observation, using the present invention to formulate the port allocation strategy keeps the upper bounds on the expected regret and on the expected number of reallocations at a low level, as shown in the following two formulas:

and

(1) In particular, taking

the upper bound on the expected regret is:

i.e., the upper bound on the expected regret after T rounds does not exceed

This means that as T tends to infinity the regret approaches 0, and the actual strategy converges to the optimal fixed strategy.

(2) Using

and

the upper bound on the expected number of reallocations can be approximated as:

Generally, the exploration probability γ is set to be small (between 0 and 0.1), and when

does not exceed the order of k log n, the order of the first term on the right side of the above formula is not exceeded by the second term, and the number of reallocations can be approximated as

That is, the number of reallocations grows sublinearly with the number of rounds T.

Claims

1. A man-in-the-middle attack coping method based on repeated games, characterized in that: information originally transmitted through one port is distributed across a plurality of ports; a fixed number of these ports are selected to transmit invalid information, and the remaining ports transmit the valid information that actually needs to be sent; the problem of defending against man-in-the-middle attacks is modeled as a repeated game; and at set time intervals a new port allocation strategy is generated according to the port allocation strategy (that is, one round of the repeated game is played) to determine which ports transmit invalid or valid information, so that the number of port reallocations is reduced as much as possible while the loss caused by man-in-the-middle attacks is reduced as much as possible;

the port allocation policy is specifically implemented as follows:

Step 2: Initialize a cumulative reward estimate and a strategy perturbation for each port, and execute the subsequent steps at fixed time intervals to generate the port allocation strategy for each time period;

Step 4: With a set probability, explore by randomly selecting one strategy from the exploration strategy set as the allocation strategy of the current round; otherwise, adopt as the allocation strategy the strategy that maximizes the sum of the cumulative reward estimate and the strategy perturbation;

Step 6: Use a resampling algorithm to estimate, by simulation, the reciprocal of the probability that each port transmits invalid information at this time;

Step 8: At the next time, return to Step 2 and continue generating the allocation strategy of the next round, until all rounds are finished.