CN112954814B - Channel quality access method in cognitive radio - Google Patents

Channel quality access method in cognitive radio

Info

Publication number
CN112954814B
CN112954814B (application CN202110107271.7A)
Authority
CN
China
Prior art keywords
network
channel
actor
secondary user
global
Prior art date
Legal status
Active
Application number
CN202110107271.7A
Other languages
Chinese (zh)
Other versions
CN112954814A (en)
Inventor
叶方 (Ye Fang)
张音捷 (Zhang Yinjie)
李一兵 (Li Yibing)
孙骞 (Sun Qian)
田园 (Tian Yuan)
Current Assignee
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date
Filing date
Publication date
Application filed by Harbin Engineering University
Priority to CN202110107271.7A
Publication of CN112954814A
Application granted
Publication of CN112954814B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W74/00: Wireless channel access
    • H04W74/08: Non-scheduled access, e.g. ALOHA
    • H04W74/0808: Non-scheduled access, e.g. ALOHA using carrier sensing, e.g. carrier sense multiple access [CSMA]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04B: TRANSMISSION
    • H04B17/00: Monitoring; Testing
    • H04B17/30: Monitoring; Testing of propagation channels
    • H04B17/309: Measuring or estimating channel quality parameters
    • H04B17/336: Signal-to-interference ratio [SIR] or carrier-to-interference ratio [CIR]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04B: TRANSMISSION
    • H04B17/00: Monitoring; Testing
    • H04B17/30: Monitoring; Testing of propagation channels
    • H04B17/382: Monitoring; Testing of propagation channels for resource allocation, admission control or handover
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Electromagnetism (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a channel quality access method in cognitive radio. Each local network comprises an actor network and a critic network: the actor network is responsible for channel selection and interacts with the environment to collect interaction information, while the critic network evaluates the quality of the actor network's channel-selection strategy. The local networks do not apply gradient updates themselves; they accumulate their gradients and transmit them to the global network. The global network does not interact with the environment: it aggregates the gradients collected by the local networks, performs the gradient updates, and transmits the updated network parameters back to the local networks. Because the invention jointly considers channel quality and idle probability, the secondary user can effectively avoid accessing inferior channels, greatly improving the success rate of accesses that meet the quality-of-service requirement.

Description

Channel quality access method in cognitive radio
(I) Technical Field
The invention belongs to the technical field of communication, relates in particular to cognitive radio communication technology, and specifically relates to a channel quality access method in cognitive radio.
(II) Background of the Invention
With the popularization of 4G/5G networks, mobile devices are proliferating, fields such as cloud computing, the Internet of Things and artificial intelligence have emerged, and new communication services appear endlessly. The wireless spectrum, as the foundation on which all of these services operate, has become increasingly scarce under the existing spectrum planning and management. The existing spectrum allocation model is exclusive and static: even when a licensed user is not using its allocated band, other users may not use it. Cognitive radio uses licensed bands through dynamic spectrum access, providing a brand-new scheme for improving spectrum utilization without causing harmful interference to the licensed/primary users. Which channel the secondary user senses and accesses directly affects the secondary user's sensing delay, transmission performance and other aspects, so research on this problem is urgent; it is one of the key factors for improving the performance of a cognitive radio system.
Existing channel access algorithms adopt sequential detection access: a sensing order is determined before sensing, and channels are sensed in that predefined order. Sequential detection access designs the channel sensing-access order using prior information about the channel environment, such as the channel idle probability, the primary user's occupancy pattern, and the channel signal-to-noise ratio. Although sequential detection access is simple to design, it requires most of this environmental prior knowledge, which is difficult to obtain in a practical environment. The performance of sequential detection algorithms is also easily affected by "poor channels" in the environment: a channel may be idle most of the time yet have a low signal-to-noise ratio, or have a high signal-to-noise ratio yet be frequently occupied by the primary user. A sequential detection algorithm based on signal-to-noise ratio tends to select a channel with high signal-to-noise ratio that is frequently occupied by the primary user, which yields a low sensing-access success rate; a sequential detection algorithm based on channel idle probability tends to select a channel that is often idle but has low signal-to-noise ratio, so the secondary user fails to meet the quality-of-service requirement and obtains low throughput.
Deep reinforcement learning has achieved outstanding success in fields such as video games, robotics and Go, and can learn by interacting with the environment even when most prior information about the environment is missing, thereby making intelligent decisions. The invention introduces the asynchronous advantage actor-critic network from deep reinforcement learning into cognitive radio, so that the secondary user can intelligently select a channel that meets its own quality-of-service requirement for sensing access even when most prior information about the channel environment is unknown.
(III) Disclosure of the Invention
The invention aims to provide a channel access method that overcomes the shortcoming that sequential detection algorithms are easily disturbed by low-quality channels in the environment, and that intelligently selects a channel meeting the secondary user's own quality-of-service requirement for sensing access even when most prior information about the channel environment is unknown.
The purpose of the invention is realized as follows:
1.1, initializing the actor network and critic network parameters in the global network, and assigning the global network parameters to the local networks;
1.2, under each local network, the secondary user selects a channel for access according to the observation matrix formed from its observation information and the current strategy; the secondary user senses and accesses the selected channel and obtains an instant reward according to the channel state;
1.3, after a number of iterations, calculating the gradients of the local actor network and the local critic network respectively, transmitting the gradients to the global network, and resetting the gradients of the local actor and critic networks;
1.4, updating the global actor network according to the actor network update function, updating the global critic network according to the critic network update function, and assigning the updated global network parameters to the local networks;
1.5, repeating steps 1.2 to 1.4 until all iterations are finished, yielding the complete neural network model.
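For clarity, a minimal structural sketch of steps 1.1 to 1.5 is given below. It is Python with a stub environment; the class names, the simplified reward and all numeric values are illustrative assumptions, not part of the claimed method.

```python
import numpy as np

class StubChannelEnv:
    """Toy environment: each channel is busy with a fixed probability."""
    def __init__(self, p_busy, seed=0):
        self.p_busy = np.asarray(p_busy)
        self.rng = np.random.default_rng(seed)

    def sense_and_access(self, channel):
        idle = self.rng.random() > self.p_busy[channel]
        return 1.0 if idle else -1.0   # simplified reward: +1 idle, -1 occupied

env = StubChannelEnv(p_busy=[0.1, 0.3, 0.4, 0.3, 0.2, 0.5, 0.3, 0.4, 0.4, 0.9])
rng = np.random.default_rng(1)

for episode in range(5):                  # step 1.5: repeat steps 1.2 to 1.4
    rewards = []
    for t in range(20):                   # step 1.2: interact with the channels
        action = int(rng.integers(10))    # placeholder for the actor's policy
        rewards.append(env.sense_and_access(action))
    # Steps 1.3 and 1.4 would compute the local actor/critic gradients here,
    # push them to the global network, and pull the updated parameters back.
    print(f"episode {episode}: mean reward {np.mean(rewards):+.2f}")
```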
The asynchronous advantage actor-critic network consists of two major parts: a global network and local networks. The global network and the local networks share the same neural network structure: the actor network has one hidden layer with 200 neurons and a linear rectification activation function, and the critic network likewise has one hidden layer with 200 neurons and a linear rectification activation function.
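A sketch of this architecture in PyTorch might look as follows; only the 200-neuron hidden layer and the linear rectification activation come from the description above, while the input and output sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_CHANNELS = 10    # assumed number of channels (actions)
OBS_DIM = 10 * 5   # assumed flattened observation matrix: N channels x M steps

class Actor(nn.Module):
    """One 200-neuron ReLU hidden layer; outputs a channel-selection distribution."""
    def __init__(self, obs_dim=OBS_DIM, n_actions=N_CHANNELS):
        super().__init__()
        self.hidden = nn.Linear(obs_dim, 200)
        self.out = nn.Linear(200, n_actions)

    def forward(self, s):
        return torch.softmax(self.out(torch.relu(self.hidden(s))), dim=-1)

class Critic(nn.Module):
    """One 200-neuron ReLU hidden layer; outputs a scalar state-value estimate."""
    def __init__(self, obs_dim=OBS_DIM):
        super().__init__()
        self.hidden = nn.Linear(obs_dim, 200)
        self.out = nn.Linear(200, 1)

    def forward(self, s):
        return self.out(torch.relu(self.hidden(s)))
```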
The local networks of the invention are as follows: each local network interacts with the environment independently, so each local network also has its own actor network and critic network; the local actor networks each interact with the channel environment independently, the critic network evaluates the actor network's action strategy, and the network structures of all local networks are exactly the same.
The observation matrix is as follows: the secondary user can only observe the states of the channels it selects for sensing, and the secondary user's observation information in the t-th time slot is:
$$O_t=[o_{1,t},o_{2,t},\ldots,o_{N,t}]$$
After a temporary memory mechanism is introduced, the secondary user stores the observation information of the previous M steps. These M steps of observation information form the observation matrix, which at time t can be expressed as:
$$S_t=[O_{t-1},O_{t-2},O_{t-3},\ldots,O_{t-M}]$$
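A small sketch of this temporary-memory mechanism follows; the zero placeholder for unobserved channels and the sizes are assumptions, since the text only states that unsensed channel states are unavailable.

```python
from collections import deque
import numpy as np

N, M = 10, 5                    # channels, memory depth (illustrative)
memory = deque(maxlen=M)        # holds O_{t-1}, O_{t-2}, ..., O_{t-M}

for t in range(7):              # pretend we sense one channel per slot
    O_t = np.zeros(N)           # unobserved channels left at 0 (assumption)
    sensed = t % N              # placeholder channel choice
    O_t[sensed] = 1.0           # record the observed state of that channel
    memory.appendleft(O_t)      # newest observation first

if len(memory) == M:
    S_t = np.stack(list(memory))  # the M x N observation matrix S_t
```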
The interaction reward function is as follows: if the channel the secondary user selects for sensing access is idle and meets its own quality-of-service requirement, the decision is correct and positive feedback is obtained; if the channel selected and sensed by the secondary user is occupied by the primary user, the decision is wrong and a negative-feedback penalty is received. Considering that all channels meeting the secondary user's quality-of-service requirement may be busy during a certain period, the reward is set so that a selected channel that is idle but does not meet the quality-of-service requirement still yields a small positive feedback:
$$r=\begin{cases}(D_i-\eta)/\eta, & \text{sensed channel idle and } D_i\ge\eta\\ \epsilon, & \text{sensed channel idle and } D_i<\eta\\ -\delta, & \text{sensed channel occupied by the primary user}\end{cases}$$
where D_i represents the throughput obtained on the i-th channel, η is the secondary user's throughput threshold, ε denotes the small positive feedback, and δ > 0 the negative-feedback penalty. (D_i - η)/η, the difference between the throughput obtained on the i-th channel and the threshold η normalized by η, mainly guides the secondary user to select the better channels.
The global network of the invention is as follows: the global network does not interact with the environment; its main work is to collect the gradient data from each local network, update the network with that gradient data, and transmit the updated network parameters back to each local network.
The update function of the global actor network is:
$$\theta\leftarrow\theta+\nabla_{\theta'}\log\pi_{\theta'}(s,a)\,A(s,a)+\beta\,\nabla_{\theta'}H(\pi_{\theta'}(s))$$
where θ represents the parameters of the global actor network; A(s,a) is the advantage function, which measures how good the action is in the given environment state; H(π_{θ'}(s)) is the policy entropy, used to increase the secondary user's exploration; and β is the policy entropy weight, which controls the degree of exploration.
The update function of the global critic network is:
$$\mu\leftarrow\mu-\lambda\,\nabla_{\mu}A(s,a)^{2}$$
where μ represents the parameters of the global critic network; r is the instant reward obtained by the secondary user; γ is the discount factor; and λ is the learning rate of the critic network.
Compared with the prior art, the beneficial effects of the invention are:
1. The invention jointly considers the channel signal-to-noise ratio and idle probability, can effectively avoid poor channels in the environment, and effectively improves the secondary user's success rate in accessing high-quality channels;
2. The reward function of the invention is designed to encourage the secondary user to access a better channel on the premise of meeting QoS, thereby guiding the secondary user to make better decisions;
3. With most of the environmental prior information missing, the method approaches the access success rate of algorithms with fully known prior information, and exceeds the access success rate of algorithms with partially known prior information when the number of sensing operations is small.
(IV) Description of the Drawings
FIG. 1 is a flow chart of the algorithm of the invention;
FIG. 2 shows the number of times each channel is selected in each cycle;
FIG. 3 compares the access success rate of the invention against sequential sensing with different amounts of known prior information.
(V) Detailed Description of the Preferred Embodiments
The following detailed description is made with reference to the accompanying drawings and specific examples:
The final objective of the invention's algorithm is for the secondary user to intelligently select, according to the learned channel access strategy, an idle channel that meets its own quality of service for sensing access. Abstracted into reinforcement learning, this means the agent should adopt the strategy that maximizes the accumulated reward. Since a user's communication in a single cycle could continue indefinitely over time, the accumulated reward would tend to infinity and the quality of a strategy could not be effectively evaluated; the number of time slots in a single iteration is therefore defined as T. The above problem can be expressed as:
$$\max\ \sum_{t=1}^{T} r_{i,t}$$ (1)
where r_{i,t} denotes the instant reward obtained by selecting the i-th channel at time t.
The invention assumes that N channels and one secondary user exist in the environment, that the states of the N channels are all time-varying, and that the channel state depends only on the primary user's occupancy. The secondary user can sense n (n << N) channels in one time slot; in the t-th time slot, the environment information the secondary user can observe is:
$$O_t=[o_{1,t},o_{2,t},\ldots,o_{N,t}]$$ (2)
where o_{i,t} represents the secondary user's observation of the i-th channel at time t:
$$o_{i,t}=\begin{cases}x_{i,t}, & \text{channel } i \text{ is sensed in slot } t\\ \text{unobserved}, & \text{otherwise}\end{cases}$$ (3)
where x_{i,t} is the channel state of the i-th channel at time t. After a temporary memory mechanism is introduced, the secondary user stores the observation information of the previous M steps. These M steps of observation information form the observation matrix, which at time t can be expressed as:
$$S_t=[O_{t-1},O_{t-2},O_{t-3},\ldots,O_{t-M}]$$ (4)
After sensing the n channels, the secondary user selects for access the one that best meets its own QoS requirement. When n channels are selected for sensing, the number of elements in the action set is:
$$|A|=C_N^n=\frac{N!}{n!\,(N-n)!}$$ (5)
For example, if 5 channels exist in the environment and two channels are selected for sensing in a single time slot, the action set is A = {(1,2), (1,3), (1,4), ..., (4,5)}. If only one channel can be sensed in a single time slot, the action set is simply the set of channels existing in the environment:
$$A=\{1,2,3,\ldots,N\}$$ (6)
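The action-set sizes above are ordinary combinations, as the following snippet checks for the 5-channel example (illustrative code):

```python
from itertools import combinations
from math import comb

N, n = 5, 2
A = list(combinations(range(1, N + 1), n))  # {(1,2), (1,3), ..., (4,5)}
assert len(A) == comb(N, n) == 10           # |A| = C(N, n), as in equation (5)

A_single = list(range(1, N + 1))            # n = 1: the action set is the channels
```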
if the secondary user selects to sense that the accessed channel is idle and meets the self service quality requirement, the decision is correct, and positive feedback is obtained; if the channel selected and sensed by the secondary user is occupied by the primary user, the decision is wrong, and a negative feedback punishment is received. Considering that channels meeting the service quality requirement of the secondary user are all in a busy state in a certain period of time, the channel selected and sensed by the secondary user is set to be an idle channel although the channel does not meet the service quality requirement, and a small positive feedback can still be obtained. The reward function may be represented by the following equation:
Figure BDA0002918007300000043
the quality of service requirement of the secondary user is determined by the throughput, and the quality of service is determined to be qualified only if the obtained throughput of the access channel is higher than the threshold requirement. DiRepresenting the obtained throughput of the ith channel, with η being the throughput threshold of the secondary user. (D)i- η)/η is the ratio of the throughput obtained for the ith channel to the threshold η difference, mainly to guide the secondary user to select the more excellent channel.
The asynchronous advantage actor-critic network is divided into local networks and a global network. Each local network interacts with the environment independently, so each local network also has its own actor network and critic network; the local actor networks each interact with the channel environment independently, the critic network evaluates the actor network's action strategy, and the network structures of all local networks are exactly the same. The global network does not interact with the environment; its main work is to collect the gradient data from each local network, update the network with that gradient data, and transmit the updated network parameters back to each local network.
The actor network in a local network interacts with the environment and selects actions; its main task is strategy learning, and it computes the gradient directly on the strategy:
$$\nabla_\theta J(\theta)=\sum_{s}d(s)\sum_{a}\pi_\theta(s,a)\,\nabla_\theta\log\pi_\theta(s,a)\,r_{s,a}$$ (8)
where J(θ) represents the objective function of the policy network; π_θ(s,a) represents the probability of selecting action a in state s when the network parameter is θ; d(s) represents the distribution of the states collected in this interaction; and r_{s,a} represents the immediate reward obtained by selecting action a in state s.
The local critic network is mainly used to estimate the state value, evaluate how good the actor network's action strategy is, and guide the actor network's update through the advantage function. The advantage function is the advantage of a given action a over the average in state s. Multi-step sampling is employed in the asynchronous advantage actor-critic network to accelerate convergence:
$$A(s,a)=Q(s,a)-V(s)=r_{t+1}+\gamma r_{t+2}+\cdots+\gamma^{n-1}r_{t+n}+\gamma^{n}V(s')-V(s)$$ (9)
where V(s) represents the value of state s, which can be estimated by the critic network. Combining equation (9), the policy gradient calculation of equation (8) becomes:
$$\nabla_\theta J(\theta)=\sum_{s}d(s)\sum_{a}\pi_\theta(s,a)\,\nabla_\theta\log\pi_\theta(s,a)\,A(s,a)$$ (10)
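A direct transcription of the n-step advantage in equation (9), in plain Python with illustrative inputs:

```python
def n_step_advantage(rewards, v_s, v_s_next, gamma=0.9):
    """A(s,a) = r_{t+1} + gamma*r_{t+2} + ... + gamma^(n-1)*r_{t+n}
                + gamma^n * V(s') - V(s)"""
    n = len(rewards)
    n_step_return = sum(gamma**k * r for k, r in enumerate(rewards))
    return n_step_return + gamma**n * v_s_next - v_s

# three-step rollout with rewards [1, 0, 1] and critic estimates V(s), V(s'):
adv = n_step_advantage([1.0, 0.0, 1.0], v_s=0.5, v_s_next=0.8, gamma=0.9)
print(adv)  # 1.8932 = 1 + 0.9*0 + 0.81*1 + 0.729*0.8 - 0.5
```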
the global network does not interact with the environment, and the method mainly works by collecting gradient data of each network, updating the network through the gradient data and transmitting updated network parameters to each local network. The structure of the global network also remains consistent with the local network due to the mutual communication of parameters and gradients. The actor network in the global network is also responsible for updating the action strategy, and the gradient update can be expressed as:
Figure BDA0002918007300000053
where θ represents a parameter of the global actor network; a (s, a) represents a merit function representing the degree of superiority and inferiority of the operation in the environmental state; h (Pi)θ'(s)) is a policy entropy for increasing exploratory power of previous users; beta represents a policy entropy weight for controlling the degree of exploration. After the dominance function is introduced, the global network critics network improves the fitting accuracy of the value function by minimizing the square of the dominance function, and the gradient update of the global network critics network can be expressed as:
Figure BDA0002918007300000054
where μ represents a parameter of the global critic network; r represents the instant reward obtained by the secondary user; gamma is a discount factor; λ is the learning rate of the critic network.
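One way a local worker could realize equations (11) and (12) against shared global networks is sketched below in PyTorch; the combined loss and the push-and-pull helper follow common A3C implementations and are assumptions, not the patent's code.

```python
import torch

def a3c_loss(probs, action, advantage, beta=0.01):
    """Actor term: -(log pi * A + beta * entropy), eq. (11); critic term: A^2, eq. (12)."""
    dist = torch.distributions.Categorical(probs=probs)
    actor_loss = -(dist.log_prob(action) * advantage.detach() + beta * dist.entropy())
    critic_loss = advantage.pow(2)          # minimize the squared advantage
    return (actor_loss + critic_loss).sum()

def push_and_pull(local_net, global_net, global_opt, loss):
    global_opt.zero_grad()
    loss.backward()                                      # local gradients (step 1.3)
    for lp, gp in zip(local_net.parameters(), global_net.parameters()):
        gp.grad = lp.grad                                # hand gradients to the global net
    global_opt.step()                                    # global update (step 1.4)
    local_net.load_state_dict(global_net.state_dict())   # pull parameters back
```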
The simulation parameters of the invention's simulation example are set as follows. The simulation parameters divide into system environment parameters and neural network parameters. The system environment parameters are: the environment contains N = 10 independent channels, each of which may be occupied by a primary user with occupation probability P_busy ∈ (0,1), and the channel signal-to-noise ratios lie in the range [-10, 10] dB. In the simulation experiments, the signal-to-noise ratios of the 10 channels are set to SNR = [-10, -8, -9, -5, -3, 0, 4, 5, 7, 10] dB, with corresponding occupation probabilities P_busy = [0.1, 0.3, 0.4, 0.3, 0.2, 0.5, 0.3, 0.4, 0.4, 0.9]. The neural network parameters are: the actor and critic network structures of the local networks and the global network are identical; the actor network has one hidden layer with 200 neurons and a linear rectification activation function, and its output layer directly outputs the action-selection probability distribution; the critic network also has one hidden layer with 200 neurons and a linear rectification activation function, and its output layer outputs the value estimate. The learning rate of the critic network must be greater than or equal to that of the actor network; the invention sets the critic network learning rate Lr_c = 0.001 and the actor network learning rate Lr_a = 0.0001. The invention defines the access success rate as the probability that the secondary user successfully accesses an idle channel that meets the quality of service.
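The settings above, collected into one configuration block for reproduction (the variable names are illustrative):

```python
config = {
    "n_channels": 10,
    "snr_db":  [-10, -8, -9, -5, -3, 0, 4, 5, 7, 10],  # per-channel SNR in dB
    "p_busy":  [0.1, 0.3, 0.4, 0.3, 0.2, 0.5, 0.3, 0.4, 0.4, 0.9],
    "lr_critic": 1e-3,     # must be >= the actor learning rate
    "lr_actor":  1e-4,
    "hidden_units": 200,   # one ReLU hidden layer in both actor and critic
}
```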
As shown in FIG. 2, there are 3 channels in the environment that meet the QoS requirement; the figure shows the number of times the secondary user selects each of these three channels for sensing access when sensing once per time slot. As can be seen, at the start of the iterations the three channels are selected almost equally often because of exploration. As the iterations progress, however, although the 10th channel has a high signal-to-noise ratio, its primary user occupies it frequently, so the number of times it is selected keeps decreasing; the secondary user's learning considers channel access over a longer horizon, so poor channels can be effectively avoided. The other two channels meeting the QoS requirement are gradually selected more often, since their primary users occupy them less frequently. Meanwhile, owing to the design of the reward function, the secondary user prefers to access the 9th channel when the primary-user occupation probabilities are similar, showing that the reward function design can guide the secondary user to make better decisions.
As shown in FIG. 3, with 3 channels in the environment meeting the QoS requirement, the invention's access success rate is compared against sequential sensing with different amounts of known prior information, under different numbers of sensing operations. Fully-known sensing assumes the secondary user knows the signal-to-noise ratio of every channel and the primary-user occupation probability corresponding to each channel, and senses in the order given by the product of signal-to-noise ratio and idle probability, SNR·(1 - P_busy). As the figure shows, fully-known sensing always senses a fixed channel because of the nature of sequential sensing, so with a single sensing operation its access success rate depends entirely on the first sensed channel, whereas the sensing access algorithm of the invention can intelligently select a suitable channel for access without being restricted to sequential sensing access.
In summary, the invention provides a channel quality access method in cognitive radio. Each local network has an actor network and a critic network: the actor network is responsible for channel selection and interacts with the environment to collect interaction information, and the critic network evaluates the quality of the actor network's channel-selection strategy. The local networks do not apply gradient updates themselves but accumulate their gradients and transmit them to the global network; the global network does not interact with the environment, aggregates the gradients collected by the local networks, performs the gradient updates, and transmits the updated network parameters back to the local networks. Because channel quality and idle probability are considered jointly, the secondary user can effectively avoid accessing inferior channels, greatly improving the success rate of accesses that meet the quality-of-service requirement.
The technical solution of the present invention is not limited to the embodiments described above; the invention can be extended to other modifications, variations, applications and embodiments, and all such modifications, variations, applications and embodiments are considered to be within the spirit and teaching scope of the present invention.

Claims (5)

1. A channel quality access method in cognitive radio, characterized in that the method comprises the following steps:
1.1, initializing the actor network and critic network parameters in the global network, and assigning the global network parameters to the local networks;
1.2, under each local network, the secondary user selects a channel for access according to the observation matrix formed from its observation information and the current strategy; the secondary user senses and accesses the selected channel and obtains an instant reward according to the channel state;
1.3, after a number of iterations, calculating the gradients of the local actor network and the local critic network respectively, transmitting the gradients to the global network, and resetting the gradients of the local actor and critic networks;
the actor network in the local network interacts with the environment and selects actions; its main task is strategy learning, and it computes the gradient directly on the strategy:
$$\nabla_\theta J(\theta)=\sum_{s}d(s)\sum_{a}\pi_\theta(s,a)\,\nabla_\theta\log\pi_\theta(s,a)\,r_{s,a}$$
where J(θ) represents the objective function of the policy network; π_θ(s,a) represents the probability of selecting action a in state s when the network parameter is θ; d(s) represents the distribution of the states collected in this interaction; and r_{s,a} represents the instant reward obtained by selecting action a in state s;
the local critic network is mainly used to estimate the state value, evaluate how good the actor network's action strategy is, and guide the actor network's update through the advantage function; the advantage function is the advantage of a given action a over the average in state s, and multi-step sampling is adopted in the asynchronous advantage actor-critic network to accelerate convergence:
$$A(s,a)=Q(s,a)-V(s)=r_{t+1}+\gamma r_{t+2}+\cdots+\gamma^{n-1}r_{t+n}+\gamma^{n}V(s')-V(s)$$
where V(s) represents the value of state s, which can be estimated by the critic network; the gradient calculation for the strategy then becomes:
$$\nabla_\theta J(\theta)=\sum_{s}d(s)\sum_{a}\pi_\theta(s,a)\,\nabla_\theta\log\pi_\theta(s,a)\,A(s,a)$$
1.4, updating the global actor network according to the actor network update function, updating the global critic network according to the critic network update function, and assigning the updated global network parameters to the local networks;
the update function of the global actor network is:
$$\theta\leftarrow\theta+\nabla_{\theta'}\log\pi_{\theta'}(s,a)\,A(s,a)+\beta\,\nabla_{\theta'}H(\pi_{\theta'}(s))$$
where θ represents the parameters of the global actor network, A(s,a) represents the advantage function measuring how good the action is in the environment state, H(π_{θ'}(s)) is the policy entropy used to increase the secondary user's exploration, and β is the policy entropy weight;
the update function of the global critic network is:
$$\mu\leftarrow\mu-\lambda\,\nabla_{\mu}A(s,a)^{2}$$
where μ represents the parameters of the global critic network, r represents the instant reward obtained by the secondary user, γ is the discount factor, and λ is the learning rate of the critic network;
1.5, repeating steps 1.2 to 1.4 until all iterations are finished, yielding the complete neural network model.
2. The method of claim 1, characterized in that: multiple channels are accessible in the environment, and the secondary user quickly finds and accesses a channel that meets its own quality-of-service requirement.
3. The method of claim 1, characterized in that: in step 1.1, the neural networks of the global network and the local networks have the same structure, wherein the actor network has one hidden layer with 200 neurons and a linear rectification activation function, and the critic network has one hidden layer with 200 neurons and a linear rectification activation function.
4. The method of claim 1, characterized in that: in step 1.2, each local network interacts with the environment independently and has its own actor network and critic network; the local actor networks each interact with the channel environment independently, the critic network evaluates the actor network's action strategy, and the network structures of the local networks are exactly the same.
5. The method of claim 1, characterized in that: in the observation matrix of step 1.2, the secondary user can only observe the states of the channels selected for sensing, and the secondary user's observation information in the t-th time slot is:
$$O_t=[o_{1,t},o_{2,t},\ldots,o_{N,t}]$$
after a temporary memory mechanism is introduced, the secondary user stores the observation information of the previous M steps; these M steps of observation information form the observation matrix, which at time t can be expressed as:
$$S_t=[O_{t-1},O_{t-2},O_{t-3},\ldots,O_{t-M}]$$
a reward is obtained after interacting with the environment, where the reward function is:
$$r=\begin{cases}(D_i-\eta)/\eta, & \text{sensed channel idle and } D_i\ge\eta\\ \epsilon, & \text{sensed channel idle and } D_i<\eta\\ -\delta, & \text{sensed channel occupied by the primary user}\end{cases}$$
if the channel the secondary user selects for sensing access is idle and meets its own quality-of-service requirement, the decision is correct and positive feedback is obtained; if the channel selected and sensed by the secondary user is occupied by the primary user, the decision is wrong and a negative-feedback penalty is received; considering that channels meeting the secondary user's quality-of-service requirement may all be busy in a certain period, a selected channel that is idle but does not meet the quality-of-service requirement still yields a small positive feedback, denoted ε above; D_i represents the throughput obtained on the i-th channel, η is the secondary user's throughput threshold, and (D_i - η)/η, the difference between the throughput obtained on the i-th channel and the threshold η normalized by η, mainly guides the secondary user to select the better channels.
CN202110107271.7A 2021-01-27 2021-01-27 Channel quality access method in cognitive radio Active CN112954814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110107271.7A CN112954814B (en) 2021-01-27 2021-01-27 Channel quality access method in cognitive radio

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110107271.7A CN112954814B (en) 2021-01-27 2021-01-27 Channel quality access method in cognitive radio

Publications (2)

Publication Number Publication Date
CN112954814A CN112954814A (en) 2021-06-11
CN112954814B (en) 2022-05-20

Family

ID=76237380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110107271.7A Active CN112954814B (en) 2021-01-27 2021-01-27 Channel quality access method in cognitive radio

Country Status (1)

Country Link
CN (1) CN112954814B (en)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108471619B (en) * 2018-03-22 2021-02-02 中南大学 Channel selection method of cognitive wireless sensor network
CN109089307B (en) * 2018-07-19 2021-05-18 浙江工业大学 Energy-collecting wireless relay network throughput maximization method based on asynchronous dominant actor critic algorithm
CN109379752B (en) * 2018-09-10 2021-09-24 ***通信集团江苏有限公司 Massive MIMO optimization method, device, equipment and medium
WO2020152389A1 (en) * 2019-01-22 2020-07-30 Nokia Solutions And Networks Oy Machine learning for a communication network
CN110190918B (en) * 2019-04-25 2021-04-30 广西大学 Cognitive wireless sensor network spectrum access method based on deep Q learning
CN110492955B (en) * 2019-08-19 2021-11-23 上海应用技术大学 Spectrum prediction switching method based on transfer learning strategy
CN110691422B (en) * 2019-10-06 2021-07-13 湖北工业大学 Multi-channel intelligent access method based on deep reinforcement learning
CN111262638B (en) * 2020-01-17 2021-09-24 合肥工业大学 Dynamic spectrum access method based on efficient sample learning
CN112188503B (en) * 2020-09-30 2021-06-22 南京爱而赢科技有限公司 Dynamic multichannel access method based on deep reinforcement learning and applied to cellular network

Also Published As

Publication number Publication date
CN112954814A (en) 2021-06-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant