CN109660375A - A kind of highly reliable adaptive MAC layer scheduling method - Google Patents
A kind of highly reliable adaptive MAC layer scheduling method
- Publication number
- CN109660375A (application CN201710946487.6A)
- Authority
- CN
- China
- Prior art keywords
- probability
- duty ratio
- feedback
- node
- strategy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
- H04L41/044—Network management architectures or arrangements comprising hierarchical management structures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/06—Testing, supervising or monitoring using simulated traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W84/00—Network topologies
- H04W84/18—Self-organising networks, e.g. ad-hoc networks or sensor networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention discloses a highly reliable adaptive MAC layer scheduling method. It mainly addresses the large energy consumption caused by idle listening at cluster-head nodes in wireless sensor networks. The method includes: building a model of the wireless sensor network; generating a particular frame format in which queue occupancy and delay are embedded in the frame control field; initializing the action set, the selection-probability set, and the feedback set; having the coordinator interact with the surrounding environment via a learning automaton and update its actions and state; dividing the whole learning process into three phases (an initial phase, an exploratory phase, and a greedy phase), each with a corresponding search strategy; evaluating the effect of each action's interaction with the environment and updating the feedback and selection-probability sets; and choosing, based on the feedback set, the parameters that determine the duty cycle, thereby realizing adaptive MAC layer scheduling. Embodiments of the invention let a node adaptively adjust its duty cycle during operation and minimize power consumption, and have broad application prospects.
Description
Technical field
The invention belongs to the field of wireless sensor network technology, and in particular relates to a highly reliable adaptive MAC layer scheduling method.
Background technique
Wireless sensor network (WSN) nodes are usually battery powered, and in many deployment environments replacing or recharging batteries is expensive or even infeasible. Low power consumption is therefore considered the most important metric for wireless sensor network communication protocols. In particular, a node does not know when other nodes will produce data, so even in the idle state its transceiver stays continuously in receive mode. Idle listening is considered one of the main sources of wasted energy.
Currently, the widely adopted IEEE 802.15.4 standard defines several different types of nodes. A full-function device (FFD), also called a beacon-enabled device, can operate as a personal area network coordinator, cluster head, or router; a reduced-function device (RFD), also called a non-beacon device, can only operate as a terminal device. When an FFD serves as a cluster head, it cannot predict when the other sensor nodes will send it data, so it must stay in receive mode at all times to collect all incoming information, which rapidly exhausts its energy. To overcome this problem, the standard defines a beacon-enabled mode, in which beacon frames transmitted by the coordinator allow terminal devices to synchronize with it. All devices can then sleep between coordinated transmissions, which helps reduce idle listening and extends the network lifetime.
In recent years many duty-cycle adjustment algorithms have been proposed in response to this problem. One modifies the reserved frame control field in the MAC frame header so that it carries information such as a node's transmit-queue occupancy and end-to-end delay, which is then used to select the duty cycle. Another scheme applies reinforcement learning, with the main goal of finding the optimal duty cycle: it adjusts the SMAC protocol's sleep time in a WSN environment, taking the number of frames queued for transmission as the state and the reserved active time as the action. However, this means a large number of state-action pairs must be stored, which is impractical on memory-constrained wireless sensor nodes. More recently, an extension of the standard CAP based on a busy tone issued by a device at the end of the CAP has been proposed: a busy tone is sent only when a device fails to send all of its data frames, and if any device still has real-time data in its transmit queue at the end of the CAP, the CAP is extended. However, these extensions do not conform to the standard and require the superframe structure to be modified.
Summary of the invention
Embodiments of the present invention provide a highly reliable adaptive MAC layer scheduling method that adaptively adjusts the duty cycle during operation without human intervention, so as to minimize power consumption while balancing the probability of successful data delivery against the application's delay constraints.
In order to achieve the above objectives, an embodiment of the invention provides a highly reliable adaptive MAC layer scheduling method, applied to the coordinator unit in a wireless sensor network. The method includes:
Building a model from the wireless sensor network environment. The environment model is represented by the triple E = (α, β, p), where α denotes the action set that the node's learning automaton takes as input, which in the present invention is the node's set of candidate duty cycles, and β denotes the feedback signal output by the interaction with the environment after the node selects a suitable duty cycle.
Specifically, the environment can be divided into the P-model and the Q-model according to the type of the β values: in the P-model the feedback signal is Boolean (0 or 1); in the Q-model it is a continuous random variable in [0, 1]. Because its control model is easy to use, the present invention adopts the P-model. p = {p1, p2, ..., pr} denotes a set of reward/penalty probabilities, and each learning automaton action αi has a corresponding pi.
The node generates a specific frame structure format, embedding parameters such as the queue occupancy and the queueing delay in the reserved bits of the frame control field.
Specifically, to avoid introducing any additional overhead, each terminal device embeds the queue occupancy O and the queueing delay D in the frame control structure of every data frame it transmits, using the 3 reserved bits of the frame control field shown in Fig. 3.
It should be noted that each sender uses two bits to represent a queue occupancy oi with 4 different levels, while the queueing delay di is divided into 2 levels.
The coordinator (FFD) performs traffic estimation and generates a traffic-adaptive set of duty cycles.
It should be noted that the present invention assumes the wireless sensor network has a star topology, with the coordinator collecting the data sent by the terminal devices. Each coordinator estimates the incoming traffic by computing the idle listening, the packet accumulation in the terminal devices' transmit queues, and the delay.
The coordinator initializes its action set, action selection-probability set, and feedback set.
Specifically, a learning automaton is a probability-based learning tool: it selects an action through a stochastic action-probability vector Pi(t). The action-probability vector is the core member of a learning automaton and must therefore be kept up to date at all times.
It should be noted that, to prevent losing large amounts of data in the initial phase when the wireless sensor network's data flow is heavy, the initial action is chosen to be the largest duty cycle, i.e. the coordinator stays in the receive state, and its corresponding action selection probability is set to 1, ensuring that the coordinator can collect more information about the network early on.
The coordinator (FFD) interacts with the surrounding environment using the learning automaton (LA) method.
Specifically, a variable-structure learning automaton can be represented by the triple LA = (α, β, p), where α = {α1, α2, ..., αr} denotes the automaton's action set, β = {β1, β2, ..., βr} denotes the set of feedback signals given by the environment, and p = {p1, p2, ..., pr} denotes the action-probability set, satisfying Σi pi(n) = 1, where pi(n) denotes the action probability corresponding to αi after the n-th round of learning.
Selecting an exploration strategy: different periods use different exploration strategies.
Specifically, the exploration strategy is divided into 3 phases: an initial phase, an exploratory phase, and a greedy phase.
In the initial phase, every action in the set is explored deterministically with a cyclic search strategy: the node starts by selecting the highest duty cycle and slowly decreases it down to the minimum duty cycle, which ensures that every duty cycle in the set is tried.
In the exploratory phase, the automaton randomly explores actions with a higher duty cycle than the one currently selected; an increase in the selection probability indicates an increased reward. Otherwise, if the reward stays unchanged or declines, it randomly explores lower duty-cycle actions.
In the greedy phase, after the exploratory strategy has been learning for some time, the node's knowledge of the environment is nearly complete, so it can start selecting actions autonomously.
Evaluating the effect of the action's interaction with the environment on data transmission, and updating the feedback set and the action selection-probability set.
Specifically, the coordinator updates the reward of each beacon interval using the feedback received from the senders during the last active duration.
Selecting an action: based on the feedback set, the BO and SO standard parameters that determine the duty cycle are selected, realizing adaptive MAC scheduling. After the action value is selected, the BO and SO standard parameters that determine the duty cycle are adjusted.
In order to achieve the above objectives, an embodiment of the invention also provides a highly reliable adaptive MAC layer scheduling device, applied to the coordinator unit in a wireless sensor network. The device includes:
A generation unit, which generates the specific frame control structure format and embeds parameters such as the queue occupancy and the queueing delay in the reserved bits of the frame control field;
A transmission unit, by which each sensor node sends its own status, in the generated frame format, to the other sensor nodes;
A receiving unit, which receives the data frames sent by each sensor node after it accesses the channel, each data frame containing at least parameters such as the queue occupancy and the queueing delay;
An assessment unit, which evaluates the selection probability of the transmission action according to those parameters and the working state of the coordinator;
An autonomous learning unit, by which the node updates its own action set, action selection-probability set, and feedback set using the learning automaton method;
A policy selection unit, which determines which phase the automaton is in and applies the corresponding strategy: a cyclic exploration strategy in the initial phase, a randomized strategy in the exploratory phase, and a greedy strategy in the final greedy phase;
An adaptive adjustment unit, which, after an action is selected, adjusts the parameters BO and SO based on the feedback set and the action set, completing the adaptive MAC scheduling.
Brief description of the drawings
In order to explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from them without any creative effort.
Fig. 1 is a flow diagram of the highly reliable adaptive MAC layer scheduling method provided in an embodiment of the present invention;
Fig. 2 is a model schematic of the learning automaton provided in an embodiment of the present invention;
Fig. 3 is a structural schematic of the frame control format provided in an embodiment of the present invention;
Fig. 4 is a structural schematic of the highly reliable adaptive MAC layer scheduling device provided in an embodiment of the present invention;
Fig. 5 is a schematic of transmission collisions between MAC layer scheduling nodes provided in an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will now be described clearly and completely in conjunction with the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.
The technical solution of the present invention is described below with reference to the drawings.
The highly reliable adaptive MAC layer scheduling method comprises the following steps:
S101: A model of the wireless sensor network is established.
Specifically, the wireless sensor network environment model is represented by the triple E = (α, β, p), where α denotes the action set that the node's learning automaton takes as input, which in the present invention is the node's set of candidate duty cycles, and β denotes the feedback signal output by the interaction with the environment after the node selects a suitable duty cycle.
Specifically, the environment can be divided into the P-model and the Q-model according to the type of the β values: in the P-model the feedback signal is Boolean (0 or 1); in the Q-model it is a continuous random variable in [0, 1], suited to real control domains. The P-model is widely used in wireless sensor network research because its control model is easy to use. p = {p1, p2, ..., pr} denotes a set of reward/penalty probabilities, and each learning automaton action αi has a corresponding pi. In the present invention, we use the P-model to model the wireless sensor network environment.
S102: The node generates a specific frame structure format, embedding parameters such as the queue occupancy and the queueing delay in the reserved bits of the frame control field.
Specifically, to avoid introducing any additional overhead, each terminal device embeds the queue occupancy O and the queueing delay D in the frame control structure of every data frame it transmits, using the 3 reserved bits of the frame control field shown in Fig. 3.
It should be noted that each sender uses two bits to represent a queue occupancy oi with 4 different levels, while the queueing delay di is divided into 2 levels. From this information, the coordinator can estimate the queue occupancy O and the queueing delay D. The queue occupancy O is defined as follows: if any node reaches or exceeds the maximum number of frames its queue can store, O is set equal to 1; otherwise it equals the average queue occupancy over the inactive period, i.e. the time of highest occupancy during packet accumulation in the CAP. Representing the queue occupancy O with 2 bits not only saves space but also reduces the fluctuation range of the value.
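The 3-bit embedding described above, two bits for the occupancy level oi and one bit for the delay flag di, can be sketched as a pack/unpack pair. This is a minimal sketch in Python; placing the occupancy level in the low two bits of the reserved field is an assumption for illustration, since the exact bit positions of Fig. 3 are not reproduced in this text:

```python
def pack_reserved_bits(occupancy_level: int, delay_flag: int) -> int:
    """Pack the 2-bit queue-occupancy level (0-3) and the 1-bit
    queueing-delay flag into the 3 reserved bits of the frame
    control field. Bit layout (occupancy in the low two bits,
    delay in the third bit) is an illustrative assumption."""
    if not 0 <= occupancy_level <= 3:
        raise ValueError("occupancy level must fit in 2 bits")
    if delay_flag not in (0, 1):
        raise ValueError("delay flag must fit in 1 bit")
    return (delay_flag << 2) | occupancy_level


def unpack_reserved_bits(bits: int) -> tuple[int, int]:
    """Inverse of pack_reserved_bits: recover (occupancy_level, delay_flag)."""
    return bits & 0b11, (bits >> 2) & 0b1
```

Because only 3 bits are used, the embedding adds no overhead to the data frame, matching the design goal stated above.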
S103: The coordinator (FFD) performs traffic estimation and generates a traffic-adaptive set of duty cycles.
It should be noted that the present invention assumes the wireless sensor network has a star topology, with the coordinator collecting the data sent by the terminal devices. Each coordinator estimates the incoming traffic by computing the idle listening, the packet accumulation in the terminal devices' transmit queues, and the delay. The expression for the idle listening IL is as follows:
IL=1.0-SFu (2)
where SFu denotes the superframe utilization, the ratio between the superframe time occupied by the terminal devices and the total time available for data communication, defined in terms of the following quantities: SD is the superframe duration, Tb is the time the coordinator spends on beacon transmission, Tc is the time the channel is busy due to frame collisions, and Tr is the time spent on data reception.
Illustratively, see Fig. 5. In type 1 (C1), the considered sender node (node A) finishes its transmission first, while the other node's transmission continues. In type 2 (C2), sender A completes its transmission after a collision has occurred. Finally, in type 3 (C3), both nodes finish transmitting at the same time. To detect C1 and C2, A or B can monitor the channel for the other's transmission, provided they are within range of each other; a sender therefore concludes that a collision occurred when, after transmitting, it senses a busy channel and receives no acknowledgement frame. To detect C3, on the other hand, the receiver perceives an increase in received energy above its CCA threshold without being synchronized to a start-of-frame delimiter.
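Equation (2) above, IL = 1.0 - SFu, can be illustrated in code. This sketch assumes the superframe utilization SFu is the ratio (Tb + Tc + Tr) / SD, since the patent's formula for SFu is not reproduced in this text; treat the function names as illustrative:

```python
def superframe_utilization(sd: float, tb: float, tc: float, tr: float) -> float:
    """SFu: fraction of the superframe duration SD occupied by beacon
    transmission (Tb), collision-busy time (Tc), and data reception (Tr).
    Treating SFu = (Tb + Tc + Tr) / SD is an assumption."""
    return (tb + tc + tr) / sd


def idle_listening(sd: float, tb: float, tc: float, tr: float) -> float:
    """Equation (2): IL = 1.0 - SFu."""
    return 1.0 - superframe_utilization(sd, tb, tc, tr)
```

For example, a superframe of 100 time units with 10 units of beacon time, 5 units lost to collisions, and 25 units of reception yields an idle-listening fraction of 0.6.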
S104: The coordinator initializes its action set, action selection-probability set, and feedback set.
Specifically, a learning automaton is a probability-based learning tool: it selects an action through a stochastic action-probability vector Pi(t). The action-probability vector is the core member of a learning automaton and must therefore be kept up to date at all times. The action-probability vector of learning automaton Ai is expressed as follows:
where Pi(t) denotes the probability that node ni selects a given duty cycle at time t. In the present invention, this probability is set to the expected value of the aggregate feedback return of the corresponding duty cycle, defined as follows:
It should be noted that, to prevent losing large amounts of data in the initial phase when the wireless sensor network's data flow is heavy, the initial action is chosen to be the maximum duty cycle, i.e. the coordinator stays in the receive state, and its corresponding action selection probability is set to 1, ensuring that the coordinator can collect more information about the network early on.
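The initialization above can be sketched in Python. The function name and the concrete representation of the three sets (sorted action list, probability list, feedback list) are illustrative assumptions:

```python
def init_learning_automaton(duty_cycles):
    """Initialize the action set, selection-probability set, and
    feedback set for a coordinator. The action set is the candidate
    duty cycles sorted from highest to lowest; all probability mass
    is placed on the maximum duty cycle so that no data is lost
    while the network's traffic is still unknown."""
    actions = sorted(duty_cycles, reverse=True)
    probs = [1.0 if i == 0 else 0.0 for i in range(len(actions))]
    feedback = [0.0] * len(actions)
    return actions, probs, feedback
```

Starting fully biased toward the maximum duty cycle mirrors the text: the coordinator begins in an always-receiving state and only later shifts probability toward lower duty cycles as feedback accumulates.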
S105: The coordinator (FFD) interacts with the surrounding environment using the learning automaton (LA) method.
It should be noted that a variable-structure learning automaton can be represented by the triple LA = (α, β, p), where α = {α1, α2, ..., αr} denotes the automaton's action set, β = {β1, β2, ..., βr} denotes the set of feedback signals given by the environment, and p = {p1, p2, ..., pr} denotes the action-probability set, satisfying Σi pi(n) = 1, where pi(n) denotes the action probability corresponding to αi after the n-th round of learning. The probabilities satisfy the update formula p(n+1) = T(α(n), β(n), p(n)), where T denotes the learning algorithm. The general learning mechanism of a learning automaton is defined as follows:
where a(n) and b(n) are the weight coefficients of the linear functions gi and hi, which can be defined as linear functions or constants depending on the concrete application. Using the P environment model, the feedback signal takes the value 0 or 1; when the feedback signal is 0, the environment issues a reward signal, and the corresponding probability update is expressed as follows:
When the feedback signal is 1, the corresponding probability update is expressed as follows:
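The reward and penalty updates referenced above (whose concrete formulas are not reproduced in this text) follow the general shape of a linear reward-penalty automaton. The sketch below is a standard L_RP update written under that assumption; the learning constants a and b are illustrative, not values from the patent:

```python
def update_probabilities(p, chosen, beta, a=0.1, b=0.1):
    """One round of a linear reward-penalty (L_RP) automaton update.
    beta = 0 rewards the chosen action (mass moves toward it);
    beta = 1 penalises it (mass moves away, spread over the others).
    Both branches keep the probability vector normalised."""
    r = len(p)
    q = list(p)
    if beta == 0:
        for j in range(r):
            q[j] = p[j] + a * (1 - p[j]) if j == chosen else (1 - a) * p[j]
    else:
        for j in range(r):
            q[j] = (1 - b) * p[j] if j == chosen else b / (r - 1) + (1 - b) * p[j]
    return q
```

Both branches preserve Σi pi = 1, consistent with the constraint on the action-probability set stated above.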
S106: Selecting an exploration strategy: different periods use different exploration strategies.
It should be noted that, although an action selection probability is available here, relying solely on it can make the coordinator's adjustments sluggish and unable to reflect the environment in time. In the present invention, the action selection probability is therefore used as one measured parameter, assisted by an exploration strategy, so that the coordinator is more sensitive to changes in the surrounding environment.
Specifically, the exploration strategy is divided into 3 phases: an initial phase, an exploratory phase, and a greedy phase.
In the initial phase, every action in the set is explored deterministically with a cyclic search strategy: the node starts by selecting the highest duty cycle and slowly decreases it down to the minimum duty cycle, which ensures that every duty cycle in the set is tried, i.e. the learning automaton's action set is fully enumerated.
In the exploratory phase, once all actions have been selected, we use the following strategy: the automaton randomly explores actions with a higher duty cycle than the one currently selected; an increase in the selection probability indicates an increased reward. Otherwise, if the reward stays unchanged or declines, it randomly explores lower duty-cycle actions.
In the greedy phase, after the exploratory strategy has been learning for some time, the node's knowledge of the environment is nearly complete, so it can start selecting actions autonomously, using the following strategy: the greedy strategy selects the action with the best P value in the subset of actions with lower action values; in other words, it selects a higher duty cycle than the one selected at the previous moment. When several actions in the selected subset have the same P value, the action with the lowest duty cycle (highest action value) is selected. This means we select the best action with the lowest duty cycle: if its reward is equal to or lower than the reward received in the previous phase, this shows it selects a better P value. Hence, under stable conditions, the minimum duty cycle is preferred. Once an action has been selected, if the new action value differs from that of the previous phase, the node's exploration probability is increased.
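The three-phase selection logic above can be sketched as follows. The neighbourhood rule used for the exploratory phase (stepping one index up or down from the last tried action) is a simplified assumption, since the patent's strategy formulas are not reproduced here:

```python
import random


def choose_action(phase, actions, probs, cursor):
    """Pick the index of the next duty-cycle action under the
    three-phase strategy. `actions` is sorted from highest to
    lowest duty cycle; `cursor` is the index tried last round."""
    if phase == "initial":
        # Deterministic cyclic sweep: high duty cycle down to low.
        return (cursor + 1) % len(actions)
    if phase == "exploratory":
        # Randomly probe a neighbouring action (higher or lower
        # duty cycle than the current one) -- simplified assumption.
        lo, hi = max(cursor - 1, 0), min(cursor + 1, len(actions) - 1)
        return random.choice([lo, hi])
    # Greedy phase: exploit the best selection probability learned so far.
    return max(range(len(actions)), key=lambda i: probs[i])
```

The sweep in the initial phase guarantees the action set is fully enumerated before any probability-driven choice is trusted, matching the cyclic search strategy described above.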
S107: The effect of the action's interaction with the environment on data transmission is evaluated, and the feedback set and the action selection-probability set are updated.
It should be noted that the coordinator updates the reward of each beacon interval using the feedback received from the senders during the last active duration. The reward function is defined as follows:
where β is expressed as a combination of penalty (negative) values for the performance of the duty cycle selected in this phase. From the above formula, the best reward is zero (no penalty), because it indicates no idle listening and no transmit-queue overflow.
Specifically, the reward is based on a comparison between the queue occupancy O and the threshold Omax. If the queue occupancy exceeds the upper threshold Omax, the reward signal is negative (-1); this means that the larger Omax is set, the more often the device is eventually forced to discard packets, and so the lower the obtained reward. The choice of Omax expresses the coordinator's sensitivity to frame loss, and this parameter can be configured according to the application's reliability requirements; under normal conditions it can be set to 0.8. If the queue occupancy O is below the threshold Omax, the feedback signal is defined as a negative value equal to the amount of idle listening, since idle listening is one of the main sources of energy consumption; the lower it is, the better. Only when the idle listening is zero and the queue occupancy O indicates no data-frame loss is the maximum reward of zero (no penalty) reached. This means the target of an optimal trade-off between bandwidth utilization and energy consumption has been achieved.
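The reward just described, a flat -1 when O exceeds Omax and otherwise a penalty equal to the idle-listening amount, with 0 as the best possible value, can be sketched directly (the function signature is an illustrative assumption):

```python
def reward(queue_occupancy: float, idle_listening: float, o_max: float = 0.8) -> float:
    """Penalty-style reward for one beacon interval; the best value is 0.
    queue_occupancy > o_max signals likely frame loss -> flat -1;
    otherwise the penalty equals the idle-listening fraction, so
    zero idle listening with no overflow yields the maximum reward."""
    if queue_occupancy > o_max:
        return -1.0
    return -idle_listening
```

The default o_max = 0.8 follows the "under normal conditions" value given in the text; an application with stricter reliability requirements would lower it.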
S108: An action is selected: based on the feedback set, the BO and SO standard parameters that determine the duty cycle are selected, realizing adaptive MAC scheduling.
After the action value is selected, the BO and SO standard parameters that determine the duty cycle are adjusted. The adjustment is defined as follows:
BO=max (4, | A | → (BI-SD) < δ) (13)
SO ← max (0, BO- αt) (14)
It should be noted that this selection is based on the delay experienced by the data frames, and that the parameter values BO and SO are embedded in the beacon frame broadcast to the terminal devices, for synchronization.
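Equation (14) and the resulting duty cycle can be illustrated as below. The relation duty cycle = 2^(SO - BO) is the standard IEEE 802.15.4 definition; treating the chosen action value αt as the amount subtracted from BO is an assumption consistent with equation (14):

```python
def adjust_superframe(bo: int, action_value: int):
    """Derive the superframe order SO from the chosen action value,
    per equation (14): SO = max(0, BO - action_value). The resulting
    IEEE 802.15.4 duty cycle is 2 ** (SO - BO); larger action values
    therefore mean lower duty cycles. Assumes bo >= 4 already holds."""
    so = max(0, bo - action_value)
    duty_cycle = 2 ** (so - bo)
    return so, duty_cycle
```

For example, with BO = 6 an action value of 2 gives SO = 4 and a 25% duty cycle, while a very large action value saturates at SO = 0, the minimum duty cycle for that BO.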
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, without necessarily requiring or implying any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further restrictions, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes it.
Claims (9)
1. A highly reliable adaptive MAC layer scheduling method, characterized by comprising the following steps:
The first step carries out model foundation according to wireless sensor network environment, learning automaton method is applied to wireless sensing
Among the environment of device network, three bit array E=(α, β, p) of sensor network environment model are indicated, wherein α={ α1,
α2..., αnIndicate node duty ratio set, β={ β1, β2..., βmIndicate the feedback letter that node and environmental interaction export
Number, p={ p1, p2..., pnIndicate a series of rewards and punishments probability;
Second step: the node generates a specific frame structure format, embedding parameters such as the queue occupancy and the queueing delay in the reserved bits of the frame control field; specifically, each terminal device embeds the queue occupancy O and the queueing delay D in the frame control structure of every data frame it transmits, using the 3 reserved bits of the frame control field shown in Fig. 3;
Third step: the coordinator (FFD) performs traffic estimation and generates a traffic-adaptive set of duty cycles; each coordinator estimates the incoming traffic from the idle listening, packet accumulation, and delay in the terminal devices' transmit queues;
Fourth step: the coordinator initializes its action set, action selection-probability set, and feedback set;
Fifth step: the coordinator (FFD) interacts with the surrounding environment using the learning automaton (LA) method, adopting the P environment model with a feedback signal value of 0 or -1; if the feedback signal is 0, the probability is defined as follows:
If the feedback signal is -1, the probability is defined as follows:
Sixth step: an exploration strategy is selected: different periods use different exploration strategies, the whole learning process being divided into three phases; the initial phase uses a cyclic search strategy, the exploratory phase uses a random search strategy, and the greedy phase uses a greedy strategy;
Seventh step: the effect of the action's interaction with the environment on data transmission is evaluated, and the feedback set and action selection-probability set are updated;
Eighth step: an action is selected: based on the feedback set, the BO and SO standard parameters that determine the duty cycle are selected and the optimal duty cycle is chosen, where the BO parameter is defined as follows:
BO=max (4, | A | → (BI-SD) < δ) (3)
2. The highly reliable adaptive MAC layer scheduling method according to claim 1, characterized in that, for the establishment of the network environment model: specifically, the wireless sensor network environment model is represented by the triple E = (α, β, p), where α = {α1, α2, ..., αn} denotes the finite action set that the node's learning automaton takes as input, which in the present invention is the node's set of duty cycles; β = {β1, β2, ..., βm} denotes the feedback signal output after the node selects a suitable duty cycle and interacts with the environment; and p = {p1, p2, ..., pn} denotes a set of reward/penalty probabilities, each penalty probability pi being related to the given input variable αi. Based on the feedback signal β, the environment can be divided into 3 types: P-type, Q-type, and S-type. The present invention uses the P-model to model the wireless sensor network environment, with a Boolean feedback signal (0 or 1), i.e. β is described only by binary 0 and 1.
Here, αi (αi ∈ α) denotes the action selected by the learning automaton, p(t) denotes the probability vector at time t, Preward denotes the reward factor, and Ppenalty denotes the penalty factor; these two factors determine, respectively, whether the action probability is increased or decreased. The update of the probability vector P(t) is defined as follows:
If the action is rewarded by the random environment, the update of the action-probability vector P(t) is defined as follows:
3. The highly reliable adaptive MAC layer scheduling method according to claim 1, characterized in that each node generates a specific frame structure format. In particular, each terminal device embeds its queue occupancy O and queuing delay D in the frame control structure of every data frame it transmits, using the 3 reserved bits of the frame control field as shown in Figure 3.
Specifically, each sending node uses two bits to indicate its queue occupancy oi at one of 4 different levels, and the queuing delay di is divided into 2 levels. From this information, the coordinator can estimate the queue occupancy O and the queuing delay D. The queue occupancy O is defined as follows:
It should be noted that, through the information carried in the 3 reserved bits, the coordinator can estimate the queue occupancy O and the queuing delay D. If a node device reaches or exceeds the maximum number of frames that can be stored in its queue, O is equal to 1; otherwise, it is equal to the average queue occupancy indicated by the first message received in the CAP in which packets accumulate, i.e. the period during which the inactive-period queue occupancy is highest.
It should also be noted that the queuing-delay bit Di of each terminal device i indicates a comparison between the current beacon interval BI and the minimum of the defined delay threshold Dth: if BI is less than this minimum, the queuing-delay bit Di is '0'; otherwise it is '1'. The coordinator marks the delay as the maximum delay reported by the node devices. This is done to guarantee that any node can still transmit data even when its queuing delay is above the threshold.
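The claim above describes packing a 2-bit occupancy level and a 1-bit delay flag into the 3 reserved bits of the frame control field; a minimal sketch follows. The exact bit positions within the reserved field are not specified in the text, so the layout below (bits 0-1 for occupancy, bit 2 for the delay flag) is an assumption.

```python
def pack_reserved_bits(queue_level, delay_flag):
    """Pack the 2-bit queue-occupancy level (0..3) and the 1-bit
    queuing-delay flag into a 3-bit value for the reserved bits of
    the frame-control field.  Bit layout is assumed:
      bits 0-1 = occupancy level, bit 2 = delay flag."""
    assert 0 <= queue_level <= 3 and delay_flag in (0, 1)
    return (delay_flag << 2) | queue_level


def unpack_reserved_bits(bits):
    """Coordinator side: recover (occupancy level, delay flag)."""
    return bits & 0b11, (bits >> 2) & 0b1
```

A terminal device would call `pack_reserved_bits` when building each data frame, and the coordinator would call `unpack_reserved_bits` on reception to feed its traffic estimation.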
4. The highly reliable adaptive MAC layer scheduling method according to claim 1, characterized in that the coordinator (FFD) performs traffic estimation. Specifically, each coordinator estimates the incoming traffic by computing the idle listening, packet accumulation and delay of the terminal-device transmit queues. The expression for the idle listening IL is as follows:
IL=1.0-SFu (7)
where SFu denotes the superframe utilization, i.e. the ratio between the time the terminal devices occupy the superframe and the total time available for data communication, defined as:
where SD is the superframe duration, Tb is the time the coordinator spends on beacon transmission, Tc is the time the channel is busy due to frame collisions, Tr is the time spent on data reception, and Ts is defined as follows:
Ts=TCCA+TDATA+TIFS+TACK (9)
where TCCA denotes the clear-channel-assessment time in each data-frame transmission, TDATA denotes the data transmission time, TIFS denotes the inter-frame space, and TACK denotes the acknowledgement reception time.
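Equations (7) and (9) can be sketched directly; equation (8) for SFu is not reproduced in the text, so the sketch below assumes SFu is the occupied superframe time (beacon + collision + receive + successful-transmission time) over the superframe duration SD, per the verbal description above. Function names are illustrative only.

```python
def per_frame_time(t_cca, t_data, t_ifs, t_ack):
    # Equation (9): time consumed by one successful data-frame exchange
    # (CCA + data + inter-frame space + ACK).
    return t_cca + t_data + t_ifs + t_ack


def superframe_utilization(t_b, t_c, t_r, t_s_total, sd):
    # Assumed form of equation (8): occupied time over superframe duration.
    # t_s_total is the sum of per_frame_time over all frames in the superframe.
    return (t_b + t_c + t_r + t_s_total) / sd


def idle_listening(sf_utilization):
    # Equation (7): IL = 1.0 - SFu.
    return 1.0 - sf_utilization
```

With these three pieces a coordinator can turn its per-superframe timing counters into the idle-listening figure used for traffic estimation.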
5. The highly reliable adaptive MAC layer scheduling method according to claim 1, characterized in that the action set, the action-selection probability set and the feedback set are initialized. Specifically, a learning automaton is a probability-based learning tool that selects an action through a stochastic action probability vector Pi(t). The action probability vector is the main member of the learning automaton and must therefore be kept up to date at all times. The action probability vector of learning automaton Ai is expressed as follows:
where Pi(t) denotes the probability that node ni selects a certain duty cycle at time t; in the present invention, this probability is expressed as the expected value of the overall feedback return corresponding to that duty cycle:
It should be noted that the initially selected action is the largest duty cycle, i.e. the coordinator is always in the receiving state, and the corresponding action-selection probability is 1; this guarantees that the coordinator can collect more information about the network in the early period.
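The initialization described above (largest duty cycle pre-selected with probability 1) can be sketched as follows; the function name and the representation of duty cycles as plain floats are illustrative assumptions.

```python
def init_learning_automaton(duty_cycles):
    """Initialize the action set, the action-selection probability set and
    the feedback set.  The largest duty cycle is pre-selected with
    probability 1, so the coordinator stays in the receiving state early
    on and can gather more information about the network."""
    actions = sorted(duty_cycles, reverse=True)   # action set α, highest first
    probs = [0.0] * len(actions)                  # action-selection probabilities
    probs[0] = 1.0                                # largest duty cycle gets probability 1
    feedback = [0.0] * len(actions)               # feedback set β, initially empty
    return actions, probs, feedback
```

After the first few interactions with the environment, the probability vector would be updated away from this degenerate distribution by the automaton's reward/penalty rule.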
6. The highly reliable adaptive MAC layer scheduling method according to claim 1, characterized in that the coordinator (FFD) interacts with the surrounding environment using the learning automaton (LA) method. Specifically, the learning automaton model can be represented by the triple LA = (α, β, p), where α = {α1, α2, ..., αr} denotes the action set of the learning automaton, β = {β1, β2, ..., βr} denotes the set of feedback signals given by the environment, and p = {p1, p2, ..., pr} denotes the action probability set, satisfying Σi pi(n) = 1, where pi(n) denotes the action probability corresponding to αi in the n-th round of learning. The probabilities satisfy the update formula p(n+1) = T(α(n), β(n), p(n)).
Specifically, the P-model is used, so the feedback signal takes the value 0 or 1; when the feedback signal is 0, the environment rewards the action. When the feedback signal takes 0 or 1, the corresponding probability updates are respectively as follows:
It should be noted that, while adjusting the duty cycle with the learning automaton method, the node continuously receives a feedback β from the environment. The total received feedback can be understood as the sum of the immediate feedback and the discounted future feedback, as follows:
where γ is the discount factor, γ ∈ [0, 1], representing the weight given to future feedback.
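The discounted total feedback described above can be sketched as a simple geometric weighting of future feedback values; the function name and the list representation of future feedback are illustrative assumptions.

```python
def discounted_feedback(immediate, future, gamma=0.9):
    """Total received feedback = immediate feedback plus the discounted
    sum of future feedback values, with discount factor gamma in [0, 1].
    future is the sequence of feedback values at steps t+1, t+2, ..."""
    total = immediate
    for k, beta in enumerate(future, start=1):
        total += (gamma ** k) * beta   # each later feedback weighted by gamma^k
    return total
```

With gamma = 0 only the immediate feedback counts; with gamma close to 1 the node weights future feedback almost as heavily as the present one.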
7. The highly reliable adaptive MAC layer scheduling method according to claim 1, characterized in that different exploration strategies are selected in different periods. In particular, the overall exploration strategy is divided into 3 stages: the initial stage, the exploratory stage and the greedy stage.
In the initial stage, the cyclic search strategy explores every action in the set in a deterministic manner: the node first selects the highest duty cycle and then slowly reduces it until the minimum duty cycle is reached. This ensures that every duty cycle in the set is tried, i.e. that the action set of the learning automaton has been fully enumerated.
In the exploratory stage, once all actions have been selected, actions with a duty cycle higher than the one selected in the current stage are explored at random. If the corresponding βti in the feedback set β increases, the duty cycle represented by action αi is better; otherwise, if the feedback set β remains unchanged or the corresponding βti decreases, actions with a lower duty cycle are explored at random. The strategy is as follows:
In the greedy stage, after the exploratory strategy has been learning for a period of time, the node's knowledge of the environment is essentially complete, and the greedy strategy is used to find the optimal action value. If the corresponding βti in the feedback set β is higher than that of the previous stage, the traffic has increased, and the greedy strategy selects the action subset with the lower action value, i.e. a higher duty cycle; if the corresponding βti in the feedback set β is lower than or equal to that of the previous stage, the greedy strategy selects the action subset with the higher action value, i.e. a lower duty cycle. Therefore, under stable conditions, the minimum duty cycle is preferred. If the duty cycle selected in the next stage differs from that of the current stage, the search probability is increased; otherwise, the learning and exploration probability is reduced, to avoid oscillation when the best action is selected. The strategy is as follows:
Here β represents the combined penalty (negative) value of the performance of the duty cycle selected in that stage. From formula (16) it can be concluded that the best reward is zero (no penalty), because it indicates that there is no idle listening and the transmit queue does not overflow.
It should be noted that if the new action is equal to the last selected action, the learning and exploration rate is reduced, to avoid oscillation around the optimal action.
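The three-stage selection rule described above can be sketched as follows. The exact probability expressions of the strategies are given only as figures in the source, so this sketch mirrors just the direction of each rule (cycle through everything, then move toward higher or lower duty cycles depending on feedback, then stay greedy); the function name and stage labels are assumptions.

```python
import random

def choose_action(stage, actions, last_idx, feedback_improved):
    """Stage-dependent exploration over the duty-cycle action set.
    actions is assumed sorted from highest to lowest duty cycle.

    - "initial": cyclic search, visiting every action once in order
    - "explore": move randomly toward a higher duty cycle (smaller index)
      if the feedback improved, toward a lower one otherwise
    - "greedy" : keep the current best action
    """
    n = len(actions)
    if stage == "initial":
        return (last_idx + 1) % n                  # deterministic cycle
    if stage == "explore":
        if feedback_improved and last_idx > 0:
            return random.randrange(0, last_idx)       # try a higher duty cycle
        if not feedback_improved and last_idx < n - 1:
            return random.randrange(last_idx + 1, n)   # try a lower duty cycle
        return last_idx                                # boundary: nothing to explore
    return last_idx                                # greedy: stay on current best
```

In a real scheduler the stage would advance once the action set is enumerated and again once the feedback estimates stabilize, with the exploration rate reduced whenever the same action is re-selected.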
8. The highly reliable adaptive MAC layer scheduling method according to claim 1, characterized in that an action is selected, and the BO and SO standard parameters that determine the duty cycle are chosen based on the feedback set, realizing adaptive MAC scheduling. In particular, after the action value is selected, the BO and SO standard parameters that determine the duty cycle are adjusted. The adjustment formulas are defined as follows:
BO = max(4, |A| → (BI - SD) < δ)   (17)
SO ← max(0, BO - αt)   (18)
It should be noted that this selection is based on the delay experienced by the data frames. The parameter values BO and SO are embedded in the beacon frame broadcast to the terminal devices in order to synchronize them.
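A rough sketch of the adjustment in equations (17)-(18) follows. The BO rule as printed is partially garbled, so the sketch assumes BO keeps a candidate value, floored at 4, only while the inactive period BI - SD stays below the threshold δ, and that αt is the exponent offset of the selected action; SO is then BO reduced by αt, floored at 0. The function name and these readings are assumptions, not the patent's definitive rule.

```python
def adjust_bo_so(selected_action, candidate_bo, bi, sd, delta):
    """Sketch of equations (17)-(18), under the assumptions stated above.

    selected_action : exponent offset alpha_t of the chosen action
    candidate_bo    : proposed beacon order before flooring
    bi, sd          : beacon interval and superframe duration
    delta           : threshold on the inactive period BI - SD
    """
    # Equation (17): floor BO at 4; fall back to 4 when the inactive
    # period exceeds the threshold.
    bo = max(4, candidate_bo) if (bi - sd) < delta else 4
    # Equation (18): SO <- max(0, BO - alpha_t).
    so = max(0, bo - selected_action)
    return bo, so
```

The resulting BO and SO would then be written into the broadcast beacon frame so that all terminal devices adopt the new duty cycle together.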
9. The highly reliable adaptive MAC layer scheduling method according to claim 1, characterized in that the device comprises:
a generation unit, which generates the specific frame control structure format and embeds parameters such as the queue occupancy and queuing delay in the reserved bits of the frame control field;
a transmission unit, through which each sensor node sends its own status to the other sensor nodes according to the generated frame format;
a receiving unit, for receiving the data frames sent by each sensor node after accessing the channel, the data frames including at least parameters such as the queue occupancy and queuing delay;
an assessment unit, which evaluates the selection probability of the transmission action according to these parameters and the working state of the coordinator;
an autonomous learning unit, through which the node updates its own action set, action-selection probability set and feedback set using the learning automaton method;
a policy selection unit, which judges which stage the node is in and adopts the corresponding strategy: the cyclic exploration strategy in the initial stage, the random strategy in the exploratory stage, and the greedy strategy in the final greedy stage;
an adaptive adjustment unit, which, after an action is selected, adjusts the parameters BO and SO based on the feedback set and the action set, completing adaptive MAC scheduling.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710946487.6A CN109660375B (en) | 2017-10-11 | 2017-10-11 | High-reliability self-adaptive MAC (media Access control) layer scheduling method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109660375A true CN109660375A (en) | 2019-04-19 |
CN109660375B CN109660375B (en) | 2020-10-02 |
Family
ID=66108497
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710946487.6A Active CN109660375B (en) | 2017-10-11 | 2017-10-11 | High-reliability self-adaptive MAC (media Access control) layer scheduling method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109660375B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110856264A (en) * | 2019-11-08 | 2020-02-28 | 山东大学 | Distributed scheduling method for optimizing information age in sensor network |
CN111542070A (en) * | 2020-04-17 | 2020-08-14 | 上海海事大学 | Efficient multi-constraint deployment method for industrial wireless sensor network |
CN114666880A (en) * | 2022-03-16 | 2022-06-24 | 中南大学 | Method for reducing end-to-end delay in delay-sensitive wireless sensor network |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103260229A (en) * | 2013-06-04 | 2013-08-21 | 东北林业大学 | Wireless sensor network MAC protocol based on forecast and feedback |
Non-Patent Citations (2)
Title |
---|
CHEN HAO等: "Traffic Adaptive Duty Cycle MAC Protocol for Wireless Sensor Networks", 《IEEE》 * |
范清峰等: "无线传感器网络自适应MAC协议", 《计算机工程与应用》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||