CN115361686A - Safe exploration reinforcement learning method for wireless communication security - Google Patents

Safe exploration reinforcement learning method for wireless communication security

Info

Publication number
CN115361686A
CN115361686A (application CN202211007434.5A)
Authority
CN
China
Prior art keywords
network
wireless communication
security
action
risk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211007434.5A
Other languages
Chinese (zh)
Other versions
CN115361686B (en)
Inventor
肖亮
牛国航
吕泽芳
肖奕霖
杨和林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202211007434.5A priority Critical patent/CN115361686B/en
Publication of CN115361686A publication Critical patent/CN115361686A/en
Application granted granted Critical
Publication of CN115361686B publication Critical patent/CN115361686B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W12/00: Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12: Detection or prevention of fraud
    • H04W12/121: Wireless intrusion detection systems [WIDS]; Wireless intrusion prevention systems [WIPS]
    • H04W12/122: Counter-measures against attacks; Protection against rogue devices
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W12/00: Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/60: Context-dependent security
    • H04W12/67: Risk-dependent, e.g. selecting a security level depending on risk profiles
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A safe exploration reinforcement learning method for wireless communication security, relating to the security of wireless communication. A state risk network and an action risk network are introduced to distinguish state risk from action risk, improving the fitting accuracy of the action risk degree; action selection is corrected by the action risk degree, so that dangerous strategies are not explored and safe exploration is achieved in wireless communication scenarios. The method comprises: the information sender uses a value network to evaluate the long-term cumulative return of taking different actions in the current state, evaluates the risk values of taking different actions in the current state according to the performance evaluation indexes and communication requirements of the communication system, uses the state risk network and the action risk network to fit the long-term cumulative risk value and revise the output of the value network, and selects a safe transmission strategy according to the revised values of the different actions. The method reduces the exploration of risky strategies in wireless communication security applications and improves the security of wireless communication.

Description

Safe exploration reinforcement learning method for wireless communication security
Technical Field
The invention relates to the security of wireless communication, belongs to the field of modern wireless communication security, and particularly relates to a safe exploration reinforcement learning method for wireless communication security.
Background
With the rapid development of wireless communication technologies such as unmanned-aerial-vehicle video and image transmission, voice calls, and wireless body area networks, wireless communication has become closely tied to everyday life. However, owing to the openness of the wireless medium, communication is vulnerable to jamming, eavesdropping, and similar attacks, which seriously threaten the privacy and security of the communication system. Wireless communication systems therefore commonly use techniques such as frequency hopping and power control to counter illegal attacks and improve system security.
Reinforcement learning learns in an unknown environment by trial and error, requiring no prior knowledge of attack strategies such as jamming or of network parameters such as the channel state, and it has been widely applied in the field of wireless communication security. For example, Chinese patent CN112291495B proposes a low-latency anti-jamming wireless video transmission method based on reinforcement learning, which combines the Boltzmann distribution with the DQN algorithm in an improved deep reinforcement learning algorithm and dynamically optimizes the transmission channel, transmit power, and coding/modulation scheme to resist jamming attacks; Chinese patent CN113079167A provides a deep-reinforcement-learning-based intrusion detection method and system for the Internet of Vehicles, which establishes an intrusion detection model over traffic data using the deep deterministic policy gradient algorithm; Chinese patent CN113225794A proposes a full-duplex cognitive communication power control method based on deep reinforcement learning, which directly uses the DQN algorithm to optimize the power control strategy of the secondary-user transmitter.
Dai et al. [C. Dai, L. Xiao, X. Wan and Y. Chen, "Reinforcement learning with safe exploration for network security," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, May 2019] propose a safe exploration reinforcement learning algorithm for network security that uses security performance indexes to evaluate the risk values of actions, thereby improving the security performance of network security applications. Lu et al. [X. Lu, L. Xiao, G. Niu, et al., "Safe Exploration in Wireless Security: A Safe Reinforcement Learning Algorithm With Hierarchical Structure," IEEE Transactions on Information Forensics and Security, 2022] propose a safe reinforcement learning algorithm based on a hierarchical structure and an action-selection-priority security criterion, which compresses the action space using the hierarchical structure and an action risk assessment criterion and optimizes the security policy of wireless communication security applications, thereby preventing serious consequences such as network collapse. Wachi and Sui [A. Wachi and Y. Sui, "Safe reinforcement learning in constrained Markov decision processes," in Proc. International Conference on Machine Learning (ICML), PMLR, 2020: 9797-9806] propose a method for exploration and optimization in Markov decision processes under unknown safety constraints, which learns the safety constraints by expanding the safe region and then optimizes the cumulative reward inside the certified safe region, achieving a near-optimal cumulative reward while guaranteeing safety in the constrained Markov decision process. Tessler et al. [C. Tessler, D. J. Mankowitz, and S. Mannor, "Reward constrained policy optimization," in Proc. Int. Conf. Learning Representations (ICLR), New Orleans, LA, May 2019] propose a policy optimization method based on reward constraints, which introduces two critic networks that respectively fit the reward and the safety-constraint return and injects the safety constraints into the reward function as penalty signals, realizing safe exploration in reinforcement learning.
Although existing reinforcement-learning-based wireless communication security schemes achieve certain anti-jamming and intrusion-detection effects in wireless communication security scenarios, most of them do not account for the exploration of risky strategies, such as strategies that cause communication interruption, during the initial learning phase; moreover, the proposed safe reinforcement learning algorithms do not distinguish state risk from action risk and therefore cannot accurately fit the action risk degree.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a safe exploration reinforcement learning method for wireless communication security, which designs a state risk network and an action risk network to improve the fitting accuracy of the action risk degree and corrects risky actions so as to realize safe exploration, avoiding the selection of risky strategies that cause communication interruption and improving the security of wireless communication.
The invention comprises the following steps:
Step 1: initialize parameters:
The total number of data packets to be transmitted in the wireless communication system is K; transmitting one data packet occupies one time slot, giving time slots {1, 2, …, k, …, K}. The information sender can adjust N wireless communication security policies, such as frequency hopping, power control, and the coding/modulation scheme, to cope with jamming attacks in wireless communication. The i-th security policy p_i (1 ≤ i ≤ N) has L_i (1 ≤ L_i ≤ N) feasible values, the action space formed by all possible security policies is T, and the number of actions in the action space is L = ∏_{i=1}^{N} L_i.
The communication system has M performance evaluation indexes {d_i}_{1≤i≤M}, e.g., delay and bit error rate, where performance index i (1 ≤ i ≤ M) satisfies the normal-communication condition d_i ≤ δ_i, with δ_i denoting the communication requirement on index i. The information sender can sense J pieces of communication state information {o_i}_{1≤i≤J}, such as the channel state and the transmission information type. Construct three neural networks V, S, and A, each with three fully-connected layers: network V contains M + J input neurons, H hidden neurons, and L output neurons; network S contains M + J input neurons, H hidden neurons, and 1 output neuron; network A contains M + J input neurons, H hidden neurons, and L output neurons. Randomly initialize the weight matrices θ, ω, and ψ of the three neural networks; initialize the learning parameter ζ ∈ (0, 1), an empty buffer C, the sample size B, the random exploration probability η, and the initial performance {d_i^(0)}_{1≤i≤M}.
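For illustration, the setup of Step 1 can be sketched as follows. PyTorch is an assumption (the patent names no framework), and the helper make_net and the example sizes are illustrative, not from the patent; the layer widths (M + J inputs, H hidden, L or 1 outputs) follow the description above.

```python
# Minimal sketch of Step 1; framework (PyTorch) and names are assumptions.
import torch
import torch.nn as nn

def make_net(n_in: int, n_hidden: int, n_out: int) -> nn.Sequential:
    # Three layers of neurons (input, hidden, output), i.e. two fully-connected maps.
    return nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU(), nn.Linear(n_hidden, n_out))

M, J, H = 2, 2, 128           # performance indexes, state observations, hidden neurons
L = 24                        # number of actions, L = prod_i L_i (example value)
V = make_net(M + J, H, L)     # value network: long-term value of each action
S = make_net(M + J, H, 1)     # state risk network: one risk value per state
A = make_net(M + J, H, L)     # action risk network: one risk value per action
zeta, B, eta = 0.5, 64, 0.05  # learning parameter, sample size, exploration probability
buffer: list = []             # replay buffer C, initially empty
```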
Step 2: in the k-th time slot, the information sender receives the performance evaluation indexes {d_i^(k-1)}_{1≤i≤M} of the previous time slot's communication system, obtains the communication state information {o_i^(k)}_{1≤i≤J} by perception and computation, and constructs the current system state s^(k) = [{d_i^(k-1)}_{1≤i≤M}, {o_i^(k)}_{1≤i≤J}].
Step 3: the information sender feeds the state s^(k) to network V, network S, and network A, respectively. Denote the output of network V by V = {V_m}_{1≤m≤L}, representing the values of the different actions; denote the output of network S by S, representing the risk value of the current state; denote the output of network A by A = {A_m}_{1≤m≤L}, representing the risk values of taking the different actions in the current state. The outputs of network S and network A together form the risk degree X = {X_m}_{1≤m≤L} of the state-action pairs.
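The formula combining S and A into X appears in the original only as an image; the mean-centred, dueling-DQN-style aggregation below is therefore an assumption, not the patent's confirmed formula. The sketch reuses the networks from the Step 1 sketch.

```python
# Assumed aggregation X_m = S + A_m - mean(A), modelled on dueling DQN;
# the patent's exact combination formula is not reproduced here.
import torch

def risk_degree(S_net, A_net, s: torch.Tensor) -> torch.Tensor:
    state_risk = S_net(s)                    # scalar risk S of the current state
    action_risk = A_net(s)                   # per-action risks {A_m}
    return state_risk + action_risk - action_risk.mean(dim=-1, keepdim=True)
```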
Step 4: compute the Q-value vector Q = V - X. With probability 1 - η, the information sender selects the action p_i (1 ≤ i ≤ N) with the largest corresponding Q value; with probability η, it randomly selects other security policies. According to the obtained action combination P^(k) = [p_1, p_2, …, p_N], it adjusts the wireless communication security policy and sends a data packet to the information receiver.
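Step 4 is an epsilon-greedy rule over the risk-corrected values. A sketch, reusing risk_degree from the Step 3 sketch; the flat action index standing in for the combination P^(k) is an illustrative simplification.

```python
# Sketch of Step 4: pick argmax of Q = V - X with prob. 1 - eta, else explore
# randomly (decoding the flat index back into [p_1, ..., p_N] is omitted).
import random
import torch

def select_action(V_net, S_net, A_net, s: torch.Tensor, eta: float, L: int) -> int:
    if random.random() < eta:
        return random.randrange(L)          # random exploration with probability eta
    with torch.no_grad():
        q = V_net(s) - risk_degree(S_net, A_net, s)   # corrected Q vector
    return int(q.argmax().item())
```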
Step 5: after receiving the data packet, the information receiver calculates the performance evaluation indexes {d_i^(k)}_{1≤i≤M} of the current communication system and feeds them back to the information sender.
Step 6: the information sender receives the performance evaluation indexes and calculates the benefit u^(k) through the benefit function f:
u^(k) = f(d_1^(k), d_2^(k), …, d_M^(k))
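In the embodiment below, f is instantiated as a weighted sum of the delay and the bit error rate; written out as a function (a sketch with illustrative names):

```python
def benefit(delay_s: float, ber: float) -> float:
    # The concrete f used in the embodiment: u = -d1 - 1000 * d2.
    return -delay_s - 1000.0 * ber
```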
Step 7: the information sender evaluates the risk degree r^(k) of the current state-action pair, where I(·) is an indicator function that equals 0 if the condition in parentheses holds and 1 otherwise, used to measure the risk degree:
r^(k) = Σ_{i=1}^{M} I(d_i^(k) ≤ δ_i)
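In code, r^(k) simply counts how many communication requirements are violated (the inverted indicator contributes 0 when a requirement holds). A sketch with illustrative names:

```python
def risk_value(indexes: list[float], thresholds: list[float]) -> int:
    # I(d_i <= delta_i) contributes 0 when the requirement holds, 1 when violated.
    return sum(0 if d <= t else 1 for d, t in zip(indexes, thresholds))

# e.g. risk_value([0.3, 0.02], [0.4, 0.01]) == 1  (delay OK, bit error rate violated)
```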
Step 8: store the quadruple χ^(k) = {s^(k), P^(k), u^(k), r^(k)} in the buffer C. If the number of entries in the buffer is greater than or equal to the sample size B, randomly draw B entries {χ^(i)}_{1≤i≤B} from the buffer and use them to update the parameters θ^(k), ω^(k), and ψ^(k) of network V, network S, and network A, where V(·), S(·), and A(·) denote the output values of network V, network S, and network A, respectively.
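The update equations of Step 8 appear in the original only as images, so the sketch below is an assumed reconstruction: V(s) at the chosen action is regressed toward the observed benefit u, and the combined risk degree X(s) at the chosen action toward the observed risk r. It reuses risk_degree from the Step 3 sketch; buffer entries are dicts with keys "s", "a", "u", "r" (a flat action index replacing P^(k)), which is an illustrative storage choice.

```python
# Assumed Step 8 update (the patent's exact loss functions are not reproduced):
# regress V(s)[a] toward u and X(s)[a] toward r over B replayed samples.
import random
import torch
import torch.nn.functional as F

def update_networks(V_net, S_net, A_net, optimizer, buffer, B: int) -> None:
    if len(buffer) < B:
        return                                        # wait until B samples exist
    batch = random.sample(buffer, B)
    s = torch.stack([chi["s"] for chi in batch])      # states s^(k)
    a = torch.tensor([chi["a"] for chi in batch])     # flat action indices
    u = torch.tensor([chi["u"] for chi in batch])     # benefits u^(k)
    r = torch.tensor([chi["r"] for chi in batch], dtype=torch.float32)  # risks r^(k)

    q = V_net(s).gather(1, a.unsqueeze(1)).squeeze(1)                       # V(s)[a]
    x = risk_degree(S_net, A_net, s).gather(1, a.unsqueeze(1)).squeeze(1)   # X(s)[a]
    loss = F.mse_loss(q, u) + F.mse_loss(x, r)        # assumed regression losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```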
Step 9: repeat steps 2-8 until the performance evaluation indexes of the communication system meet the normal-communication requirements, i.e., d_i^(k) ≤ δ_i for all 1 ≤ i ≤ M.
Compared with the prior art, the invention has the following outstanding advantages:
according to the method and the device, risk values of different actions taken in the current state are evaluated according to performance evaluation indexes and communication requirements of the communication system, state risk and action risk are distinguished by a state risk network and an action risk network, fitting accuracy of action risk degree is improved, action selection is corrected by utilizing the action risk degree, a danger strategy is prevented from being explored, exploration on the risk strategy is reduced in wireless communication safety application, and safety of wireless communication is improved.
Drawings
Fig. 1 is a comparison of packet loss rates in image transmission.
Fig. 2 is a comparison of communication interruption probabilities.
Fig. 3 is a comparison of communication power consumption.
Detailed Description
For a clearer understanding of the technical content of the present invention, the technical solution is described below with reference to specific embodiments and the accompanying drawings.
The embodiment of the invention comprises the following steps:
Step 1: the total number of data packets to be transmitted in the wireless communication system is 1000; transmitting one data packet occupies one time slot, giving time slots {1, 2, …, k, …, 1000}. The information sender can adjust three wireless communication security policies, namely frequency hopping, power control, and the coding/modulation scheme, to cope with jamming attacks in wireless communication. The i-th security policy p_i (1 ≤ i ≤ 3) has L_i feasible values; the action space formed by all possible security policies is T, and the number of actions is L = ∏_{i=1}^{3} L_i.
The communication system has two performance evaluation indexes, the delay d_1 and the bit error rate d_2, and the normal-communication condition is d_1 ≤ 0.4 s and d_2 ≤ 0.01%. The information sender can sense two pieces of communication state information: the channel state and the transmission information type. Construct three neural networks V, S, and A, each with three fully-connected layers: network V contains 4 input neurons, 128 hidden neurons, and L output neurons; network S contains 4 input neurons, 128 hidden neurons, and 1 output neuron; network A contains 4 input neurons, 128 hidden neurons, and L output neurons. Randomly initialize the weight matrices θ, ω, and ψ of the three neural networks; set the learning parameter ζ = 0.5, an empty buffer C, the sample size B = 64, the random exploration probability η = 0.05, and the initial performance d_1^(0) = 1 and d_2^(0) = 0.001.
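Using the make_net helper from the earlier sketch, this configuration can be instantiated as follows; the value of L and the optimizer choice are assumptions, since the patent states neither.

```python
import torch

L = 27                          # e.g. 3 feasible values per policy: 3 * 3 * 3 (assumed)
V = make_net(4, 128, L)         # 4 = M + J input neurons, 128 hidden neurons
S = make_net(4, 128, 1)
A = make_net(4, 128, L)
params = [*V.parameters(), *S.parameters(), *A.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-3)   # optimizer and rate are assumptions
zeta, B, eta = 0.5, 64, 0.05
d0 = [1.0, 0.001]               # initial performance d1(0) = 1, d2(0) = 0.001
```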
Step 2: in the k-th time slot, the information sender receives the performance evaluation indexes of the previous time slot's communication system, namely the delay d_1^(k-1) and the bit error rate d_2^(k-1), obtains the channel state o_1^(k) and the transmission information type o_2^(k) by perception and computation, and constructs the current system state s^(k) = [d_1^(k-1), d_2^(k-1), o_1^(k), o_2^(k)].
Step 3: the information sender feeds the state s^(k) to network V, network S, and network A, respectively. Denote the output of network V by V = {V_m}_{1≤m≤L}, representing the values of the different actions; denote the output of network S by S, representing the risk value of the current state; denote the output of network A by A = {A_m}_{1≤m≤L}, representing the risk values of taking the different actions in the current state. The outputs of network S and network A together form the risk degree X = {X_m}_{1≤m≤L} of the state-action pairs.
Step 4: compute the Q-value vector Q = V - X. With probability 0.95, the information sender selects the frequency hopping, power, and coding/modulation actions p_1, p_2, p_3 with the largest corresponding Q values; with probability 0.05, it randomly selects other security policies. According to the obtained action combination P^(k) = [p_1, p_2, p_3], it adjusts the wireless communication security policy and sends a data packet to the information receiver.
Step 5: after receiving the data packet, the information receiver calculates the performance evaluation indexes d_1^(k) and d_2^(k) of the current communication system and feeds them back to the information sender.
Step 6: the information sender receives the performance evaluation indexes and calculates the benefit u^(k) according to the following formula:
u^(k) = -d_1^(k) - 1000 · d_2^(k)
Step 7: the information sender evaluates the risk degree r^(k) of the current state-action pair through the following formula, where I(·) is an indicator function that equals 0 if the condition in parentheses holds and 1 otherwise, used to measure the risk degree:
r^(k) = I(d_1^(k) ≤ 0.4 s) + I(d_2^(k) ≤ 0.01%)
and 8: four-tuple x (k) ={s (k) ,P (k) ,u (k) ,r (k) Storing the data into a buffer area C, if the number of the data in the buffer area is more than or equal to the sampling number B, randomly extracting B pieces of data { χ ] from the buffer area (i) } 1≤i≤B And updating parameters of network V, network S and network A by the following equations
Figure BDA0003809499590000056
ω (k) And psi (k) Where V (-), S (-), and A (-) represent the output values of network V, network S, and network A, respectively:
Figure BDA0003809499590000055
Figure BDA0003809499590000061
Step 9: repeat steps 2-8 until the performance evaluation indexes of the communication system meet the normal-communication requirements, i.e., d_1 ≤ 0.4 s and d_2 ≤ 0.01%.
Fig. 1 compares the image-transmission packet loss rate of the safe exploration reinforcement learning method for wireless communication security of this embodiment with that of the DQN algorithm proposed by ***. Fig. 2 compares the communication interruption probability of the method of this embodiment with that of the DQN method proposed by ***. Fig. 3 compares the communication energy consumption of the method of this embodiment with that of the DQN method proposed by ***. The method evaluates the risk values of taking different actions in the current state according to the performance evaluation indexes and communication requirements of the communication system, introduces the state risk network and the action risk network to distinguish state risk from action risk, improves the fitting accuracy of the action risk degree, uses the action risk degree to correct action selection, and avoids exploring dangerous strategies, thereby reducing the exploration of risky strategies in wireless communication security applications and improving the security of wireless communication.
The above-described embodiments are merely preferred embodiments of the present invention, and should not be construed as limiting the scope of the invention. All equivalent changes and modifications made within the scope of the present invention shall fall within the scope of the present invention.

Claims (9)

1. A safe exploration reinforcement learning method for wireless communication security, characterized by comprising the following steps:
Step 1: construct three neural networks, each with three fully-connected layers: network V, network S, and network A, and initialize their parameters;
Step 2: in the k-th time slot, the information sender receives the performance evaluation indexes of the previous time slot's communication system, obtains communication state information by perception and computation, and constructs the current system state s^(k);
Step 3: the information sender feeds the state s^(k) to network V, network S, and network A; the outputs of network S and network A together form the risk degree X of the state-action pairs;
Step 4: with probability 1 - η, the information sender selects the action p_i with the largest corresponding Q value, and with probability η randomly selects other security policies; according to the obtained action combination P^(k), it adjusts the wireless communication security policy and sends a data packet to the information receiver;
Step 5: after receiving the data packet, the information receiver calculates the performance evaluation indexes {d_i^(k)}_{1≤i≤M} of the current communication system and feeds them back to the information sender;
Step 6: the information sender receives the performance evaluation indexes and calculates the benefit u^(k) through the benefit function f:
u^(k) = f(d_1^(k), d_2^(k), …, d_M^(k))
Step 7: the information sender evaluates the risk degree r^(k) of the current state-action pair;
Step 8: store the quadruple χ^(k) = {s^(k), P^(k), u^(k), r^(k)} in the buffer C; if the number of entries in the buffer is greater than or equal to the sample size B, randomly draw B entries {χ^(i)}_{1≤i≤B} from the buffer and update the parameters θ^(k), ω^(k), and ψ^(k) of network V, network S, and network A;
Step 9: repeat steps 2-8 until the performance evaluation indexes of the communication system meet the normal-communication requirements, i.e., d_i^(k) ≤ δ_i for all 1 ≤ i ≤ M.
2. The method as claimed in claim 1, wherein in step 1, the specific steps of constructing the three neural networks with three fully-connected layers are: the total number of data packets to be transmitted in the wireless communication system is K; transmitting one data packet occupies one time slot, giving time slots {1, 2, …, k, …, K}; the information sender adjusts N wireless communication security policies to cope with jamming attacks in wireless communication; the i-th security policy p_i (1 ≤ i ≤ N) has L_i (1 ≤ L_i ≤ N) feasible values, the action space formed by all possible security policy combinations is T, and the number of actions in the action space is L = ∏_{i=1}^{N} L_i; the communication system has M performance evaluation indexes {d_i}_{1≤i≤M}, where performance index i (1 ≤ i ≤ M) satisfies the normal-communication condition d_i ≤ δ_i; the information sender can sense J pieces of communication state information {o_i}_{1≤i≤J}; construct three networks V, S, and A with three fully-connected layers: network V contains M + J input neurons, H hidden neurons, and L output neurons; network S contains M + J input neurons, H hidden neurons, and 1 output neuron; network A contains M + J input neurons, H hidden neurons, and L output neurons.
3. The method as claimed in claim 2, wherein the N wireless communication security policies include but are not limited to frequency hopping, power control, and the coding/modulation scheme; the M performance evaluation indexes include but are not limited to delay and bit error rate; and the J pieces of communication state information include but are not limited to the channel state and the transmission information type.
4. The method as claimed in claim 1, wherein the parameter initialization in step 1 is: randomly initialize the weight matrices θ, ω, and ψ of the three neural networks; initialize the learning parameter ζ ∈ (0, 1), an empty buffer C, the sample size B, the random exploration probability η, and the initial performance {d_i^(0)}_{1≤i≤M}.
5. The method as claimed in claim 1, wherein in step 2, the specific steps of constructing the current system state s^(k) are: in the k-th time slot, the information sender receives the performance evaluation indexes {d_i^(k-1)}_{1≤i≤M} of the previous time slot's communication system, obtains the communication state information {o_i^(k)}_{1≤i≤J} by perception and computation, and constructs the current system state s^(k) = [{d_i^(k-1)}_{1≤i≤M}, {o_i^(k)}_{1≤i≤J}].
6. The method as claimed in claim 1, wherein in step 3, the information sender feeds the state s^(k) to network V, network S, and network A, and the outputs of network S and network A together form the risk degree X of the state-action pairs, specifically: the information sender takes the state s^(k) as the input of network V, network S, and network A, respectively; the output of network V is denoted V = {V_m}_{1≤m≤L}, representing the values of the different actions; the output of network S is denoted S, representing the risk value of the current state; the output of network A is denoted A = {A_m}_{1≤m≤L}, representing the risk values of taking the different actions in the current state; the outputs of network S and network A together form the risk degree X = {X_m}_{1≤m≤L} of the state-action pairs.
7. The method as claimed in claim 1, wherein in step 4, selecting the action p_i with the largest corresponding Q value with probability 1 - η, randomly selecting other security policies with probability η, adjusting the wireless communication security policy according to the obtained action combination P^(k), and sending a data packet to the information receiver specifically comprises: let the Q-value vector be Q = V - X; with probability 1 - η, select the action p_i (1 ≤ i ≤ N) with the largest corresponding Q value; with probability η, randomly select other security policies; according to the obtained action combination P^(k) = [p_1, p_2, …, p_N], adjust the wireless communication security policy and send the data packet to the information receiver.
8. The method as claimed in claim 1, wherein in step 7, the information sender evaluates the risk degree r^(k) of the current state-action pair specifically as follows: the information sender evaluates the risk degree r^(k) of the current state-action pair through the formula r^(k) = Σ_{i=1}^{M} I(d_i^(k) ≤ δ_i), where I(·) is an indicator function that equals 0 if the condition in parentheses holds and 1 otherwise, used to measure the risk degree.
9. The method as claimed in claim 1, wherein in step 8, the parameters θ^(k), ω^(k), and ψ^(k) of network V, network S, and network A are updated using the B randomly drawn entries, where V(·), S(·), and A(·) denote the output values of network V, network S, and network A, respectively.
CN202211007434.5A 2022-08-22 2022-08-22 Safe exploration reinforcement learning method for wireless communication security Active CN115361686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211007434.5A CN115361686B (en) 2022-08-22 2022-08-22 Safe exploration reinforcement learning method for wireless communication security

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211007434.5A CN115361686B (en) 2022-08-22 2022-08-22 Safe exploration reinforcement learning method for wireless communication security

Publications (2)

Publication Number Publication Date
CN115361686A true CN115361686A (en) 2022-11-18
CN115361686B CN115361686B (en) 2024-05-03

Family

ID=84003516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211007434.5A Active CN115361686B (en) Safe exploration reinforcement learning method for wireless communication security

Country Status (1)

Country Link
CN (1) CN115361686B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190228309A1 (en) * 2018-01-25 2019-07-25 The Research Foundation For The State University Of New York Framework and methods of diverse exploration for fast and safe policy improvement
US20200076857A1 (en) * 2018-08-31 2020-03-05 Microsoft Technology Licensing, Llc Secure exploration for reinforcement learning
CN112291495A (en) * 2020-10-16 2021-01-29 厦门大学 Wireless video low-delay anti-interference transmission method based on reinforcement learning
KR20220102395A (en) * 2021-01-13 2022-07-20 부경대학교 산학협력단 System and Method for Improving of Advanced Deep Reinforcement Learning Based Traffic in Non signalalized Intersections for the Multiple Self driving Vehicles

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CANHUANG DAI et al.: "Reinforcement Learning with Safe Exploration for Network Security", ICASSP 2019 - 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 17 April 2019 (2019-04-17)
WU JINLI: "Research on 5G millimeter-wave beam management technology for UAV applications", China Master's Theses Full-text Database, 15 May 2021 (2021-05-15)
XU TANGWEI et al.: "Reinforcement-learning-based low-latency group key distribution and management technology for the Internet of Vehicles", Chinese Journal of Network and Information Security, vol. 6, no. 5, 31 October 2020 (2020-10-31)

Also Published As

Publication number Publication date
CN115361686B (en) 2024-05-03

Similar Documents

Publication Publication Date Title
Sagduyu et al. IoT network security from the perspective of adversarial deep learning
Shi et al. Spectrum data poisoning with adversarial deep learning
Xiao et al. Anti-jamming underwater transmission with mobility and learning
Abdalzaher et al. A deep autoencoder trust model for mitigating jamming attack in IoT assisted by cognitive radio
CN111294058A (en) Channel coding and error correction decoding method, equipment and storage medium
Parras et al. Learning attack mechanisms in wireless sensor networks using Markov decision processes
Sattiraju et al. AI-assisted PHY technologies for 6G and beyond wireless networks
CN113726471B (en) Parameter optimization method of intelligent reflection surface auxiliary MIMO hidden communication system
CN113225794B (en) Full-duplex cognitive communication power control method based on deep reinforcement learning
US11611457B2 (en) Device and method for reliable classification of wireless signals
Wang et al. A robust cooperative spectrum sensing scheme based on Dempster-Shafer theory and trustworthiness degree calculation in cognitive radio networks
DelVecchio et al. Effects of forward error correction on communications aware evasion attacks
Dai et al. Reinforcement learning based power control for vanet broadcast against jamming
Xu et al. A new anti-jamming strategy based on deep reinforcement learning for MANET
Lee et al. Robust transmit power control with imperfect CSI using a deep neural network
AlQerm et al. Adaptive multi-objective Optimization scheme for cognitive radio resource management
CN115361686A (en) Safety exploration reinforcement learning method oriented to wireless communication safety
Wang et al. Adaptive resource allocation for semantic communication networks
CN113453220A (en) Security method for resisting trust attack of wireless sensor network
CN112329523A (en) Underwater acoustic signal type identification method, system and equipment
CN116667966A (en) Intelligent interference model rewarding poisoning defense and training method and system
Zhang et al. Resource management for heterogeneous semantic and bit communication systems
WO2019237475A1 (en) Secure multi-user pilot authentication method based on hierarchical two dimensional feature coding
Xu et al. Finite Blocklength covert communications: When the warden wants to detect the communications quickly
Arjoune et al. Real-time machine learning based on hoeffding decision trees for jamming detection in 5G new radio

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant