WO2021233061A1 - Scheduling method, scheduling system, and scheduling device - Google Patents

Publication number
WO2021233061A1
Authority
WO
WIPO (PCT)
Prior art keywords
scheduling
scheduler
scheduling decision
feedback
decision
Prior art date
Application number
PCT/CN2021/089129
Other languages
English (en)
French (fr)
Inventor
韩育超
金爱祥
张倬钒
王坚
李榕
杜颖钢
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Priority to EP21809268.2A (EP4142409A4)
Publication of WO2021233061A1
Priority to US17/988,815 (US20230072585A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/20 Control channels or signalling for resource management
    • H04W72/23 Control channels or signalling for resource management in the downlink direction of a wireless link, i.e. towards a terminal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/39 Credit based
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/535 Allocation or scheduling criteria for wireless resources based on resource usage policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/20 Control channels or signalling for resource management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/20 Control channels or signalling for resource management
    • H04W72/21 Control channels or signalling for resource management in the uplink direction of a wireless link, i.e. towards the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/20 Control channels or signalling for resource management
    • H04W72/23 Control channels or signalling for resource management in the downlink direction of a wireless link, i.e. towards a terminal
    • H04W72/232 Control channels or signalling for resource management in the downlink direction of a wireless link, i.e. towards a terminal the control data signalling from the physical layer, e.g. DCI signalling

Definitions

  • This application relates to the field of communications, and in particular, to a scheduling method, a scheduling system, and a scheduling device.
  • MAC layer scheduling mainly addresses time-frequency resource allocation, modulation and coding scheme (MCS) selection, user pairing, and precoding, in order to achieve a compromise between system throughput and fairness.
  • MCS modulation and coding scheme
  • a base station (BS) scheduler using deep reinforcement learning can better achieve a compromise between system throughput and fairness.
  • After receiving the revenue feedback for the previous scheduling decision, the scheduler must determine the current scheduling decision based on that feedback and send it to the BS. The BS encodes the decision as downlink control information (DCI) and sends the DCI to the terminal device at the agreed time.
  • DCI downlink control information
  • The scheduler may not be able to obtain the revenue feedback for the previous scheduling decision in time, so the BS cannot send the DCI that encodes the current scheduling decision at the time agreed by the system. The resulting air interface feedback lag prevents the scheduler from performing deep reinforcement learning training effectively in time sequence.
  • The present application provides a scheduling method, a scheduling system, and a scheduling device that effectively resolve the timing conflict between the scheduling process and the air interface, which arises when the scheduler cannot obtain the revenue feedback for the previous scheduling decision in time.
  • A scheduling method is provided, applied to a scheduling system composed of at least one scheduler, where the scheduling system includes a first scheduler. The method includes: the first scheduler obtains first revenue feedback in the i-th time unit, where i ≥ 1 and i is an integer; the first scheduler determines a first scheduling decision according to the first revenue feedback, where the first revenue feedback is determined by the terminal device according to a second scheduling decision, and the second scheduling decision is the last scheduling decision determined by the first scheduler before the first scheduling decision; and the first scheduler sends the first scheduling decision in the (i+N)-th time unit, where N > 1 and N is an integer.
  • After obtaining the revenue feedback for the previous scheduling decision, the scheduler sends the current scheduling decision, calculated from that feedback, N time units after receiving the feedback. This leaves the scheduler sufficient time to calculate and encode the scheduling decision, and effectively resolves the timing conflict between the scheduling process and the air interface that arises when the scheduler cannot obtain the revenue feedback for the previous scheduling decision in time.
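  • The timing rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the class name `DelayedScheduler`, the `decide` callback, and the dictionary used as a send queue are hypothetical names introduced here. Revenue feedback obtained in time unit i produces a decision that is held and sent only in time unit i+N.

```python
from typing import Callable, Dict, Optional


class DelayedScheduler:
    """Hypothetical scheduler that sends each decision N time units after
    the revenue feedback it was computed from (N > 1)."""

    def __init__(self, n: int, decide: Callable[[float], str]) -> None:
        if n <= 1:
            raise ValueError("N must be an integer greater than 1")
        self.n = n
        self.decide = decide                # maps revenue feedback -> decision
        self.pending: Dict[int, str] = {}   # send time unit -> decision

    def on_feedback(self, i: int, feedback: float) -> int:
        """Handle feedback obtained in time unit i; return the send time unit."""
        send_at = i + self.n
        self.pending[send_at] = self.decide(feedback)
        return send_at

    def on_time_unit(self, t: int) -> Optional[str]:
        """Return the decision due for sending in time unit t, if any."""
        return self.pending.pop(t, None)


# Illustrative values only: N = 2, feedback obtained in time unit 5.
scheduler = DelayedScheduler(n=2, decide=lambda fb: f"decision(feedback={fb})")
send_at = scheduler.on_feedback(i=5, feedback=0.8)
```

  Holding the decision in a queue keyed by its send time unit is one simple way to decouple decision calculation from the air interface timing; other buffering schemes would serve equally well.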
  • The scheduling system further includes one or more second schedulers, and the method further includes: the second scheduler obtains second revenue feedback in the (i+j)-th time unit, where 1 ≤ j ≤ N-1 and j is an integer; the second scheduler determines a third scheduling decision according to the second revenue feedback, where the second revenue feedback is determined by the terminal device according to a fourth scheduling decision, and the fourth scheduling decision is the last scheduling decision determined by the second scheduler before the third scheduling decision; the scheduling decisions determined by the first scheduler and by the second scheduler are scheduling decisions made by the first scheduler and the second scheduler, respectively, for the same task; and the second scheduler sends the third scheduling decision in the (i+j+M)-th time unit, where M > 1 and M is an integer.
  • A scheduling process in which multiple schedulers coordinate is adopted, which applies to a wider range of scenarios than using only one scheduler. The first scheduler and the second scheduler alternately obtain uplink revenue feedback and output scheduling strategies, which effectively improves the adaptability of the scheduler to the air interface environment.
  • The first scheduler sends first information to the second scheduler, where the first information includes the first scheduling decision or third revenue feedback, and the third revenue feedback is determined by the terminal device according to the first scheduling decision; the second scheduler receives the first information and adjusts its scheduling decisions for the task made after receiving the first information.
  • The schedulers have an information exchange function and can adjust their own scheduling parameters, so that the scheduling strategies of different schedulers remain similar or identical and the scheduling revenue is maximized.
  • The second scheduler sends second information to the first scheduler, where the second information includes the second scheduling decision or fourth revenue feedback, and the fourth revenue feedback is determined by the terminal device according to the second scheduling decision; the first scheduler receives the second information and adjusts its scheduling decisions for the task made after receiving the second information.
  • The schedulers have an information exchange function and can adjust their own scheduling parameters, so that the scheduling strategies of different schedulers remain similar or identical and the scheduling revenue is maximized.
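  • One possible reading of this information exchange is sketched below. The application does not specify how a scheduler adjusts its decisions after receiving a peer's information, so everything here is an assumption made for illustration: the class `PeerAwareScheduler`, the choice to exchange a decision together with its revenue feedback, and the "highest mean revenue" selection rule are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class PeerAwareScheduler:
    """Hypothetical scheduler that ranks decisions by mean observed revenue,
    using its own feedback and information received from a peer scheduler."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.revenue_log: Dict[str, List[float]] = defaultdict(list)

    def record(self, decision: str, revenue: float) -> None:
        """Store revenue feedback observed for one of this scheduler's decisions."""
        self.revenue_log[decision].append(revenue)

    def receive_info(self, decision: str, revenue: float) -> None:
        """Store information sent by a peer; treated the same as own feedback."""
        self.revenue_log[decision].append(revenue)

    def best_decision(self) -> str:
        """Return the decision with the highest mean revenue seen so far."""
        return max(
            self.revenue_log,
            key=lambda d: sum(self.revenue_log[d]) / len(self.revenue_log[d]),
        )


first = PeerAwareScheduler("first scheduler")
second = PeerAwareScheduler("second scheduler")
first.record("A", 0.9)            # first scheduler's own feedback
second.record("B", 0.4)           # second scheduler's own feedback
second.receive_info("A", 0.9)     # first information sent to the second scheduler
first.receive_info("B", 0.4)      # second information sent to the first scheduler
```

  After the exchange both schedulers hold the same revenue history, so they prefer the same decision, which matches the stated goal of keeping the strategies of different schedulers similar.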
  • A scheduling method includes: a terminal device sends first revenue feedback in the i-th time unit, where i ≥ 1 and i is an integer; and the terminal device receives, in the (i+N)-th time unit, a first scheduling decision determined by a first scheduler based on the first revenue feedback, where the first revenue feedback is determined by the terminal device based on a second scheduling decision, the second scheduling decision is the last scheduling decision determined by the first scheduler before the first scheduling decision, and N > 1 and N is an integer.
  • The terminal device receives the scheduler's scheduling decision for the revenue feedback some time after sending that feedback, which leaves the scheduler sufficient time to calculate and encode the scheduling decision. This effectively resolves the timing conflict between the scheduling process and the air interface that arises when the scheduler cannot obtain the revenue feedback for the previous scheduling decision in time.
  • The terminal device sends second revenue feedback in the (i+j)-th time unit, where 1 ≤ j ≤ N-1 and j is an integer; the terminal device receives, in the (i+j+M)-th time unit, a third scheduling decision determined by the second scheduler according to the second revenue feedback, where M > 1 and M is an integer, the second revenue feedback is determined by the terminal device according to a fourth scheduling decision, and the fourth scheduling decision is the last scheduling decision determined by the second scheduler before the third scheduling decision.
  • The scheduling decisions determined by the first scheduler and by the second scheduler are scheduling decisions made by the first scheduler and the second scheduler, respectively, for the same task.
  • A scheduling process in which multiple schedulers coordinate is adopted. The terminal device alternately receives the scheduling decisions of different schedulers, determines revenue feedback according to each scheduler's decision, and sends the feedback to the corresponding scheduler, which effectively improves the adaptability of scheduling to the air interface environment.
  • N is equal to 2.
  • the value of N is specified by the communication system or communication protocol.
  • A scheduling system is provided, including a first scheduler. The first scheduler is configured to obtain first revenue feedback in the i-th time unit, where i ≥ 1 and i is an integer; the first scheduler is further configured to determine a first scheduling decision based on the first revenue feedback, where the first revenue feedback is determined by the terminal device according to a second scheduling decision, and the second scheduling decision is the last scheduling decision determined by the first scheduler before the first scheduling decision; and the first scheduler is further configured to send the first scheduling decision in the (i+N)-th time unit, where N > 1 and N is an integer.
  • The scheduling system further includes one or more second schedulers. The second scheduler is configured to obtain second revenue feedback in the (i+j)-th time unit, where 1 ≤ j ≤ N-1 and j is an integer; the second scheduler is further configured to determine a third scheduling decision based on the second revenue feedback, where the second revenue feedback is determined by the terminal device according to a fourth scheduling decision, and the fourth scheduling decision is the last scheduling decision determined by the second scheduler before the third scheduling decision; the scheduling decisions determined by the first scheduler and by the second scheduler are scheduling decisions made by the first scheduler and the second scheduler, respectively, for the same task; and the second scheduler is further configured to send the third scheduling decision in the (i+j+M)-th time unit, where M > 1 and M is an integer.
  • The first scheduler is further configured to send first information to the second scheduler, where the first information includes the first scheduling decision or third revenue feedback, and the third revenue feedback is determined by the terminal device according to the first scheduling decision; the second scheduler is further configured to receive the first information and adjust its scheduling decisions for the task made after receiving the first information.
  • The second scheduler is further configured to send second information to the first scheduler, where the second information includes the second scheduling decision or fourth revenue feedback, and the fourth revenue feedback is determined by the terminal device according to the second scheduling decision; the first scheduler is further configured to receive the second information and adjust its scheduling decisions for the task made after receiving the second information.
  • a scheduling device is provided, and the scheduling device is configured to execute the scheduling method provided in the foregoing first aspect.
  • the scheduling apparatus may include a module for executing the scheduling method provided in the first aspect.
  • a scheduling device is provided, and the scheduling device is configured to execute the scheduling method provided in the second aspect.
  • the scheduling apparatus may include a module for executing the scheduling method provided in the second aspect.
  • a scheduling device including a processor.
  • the processor is coupled with the memory and can be used to execute instructions in the memory to implement the foregoing first aspect and the scheduling method in any one of the possible implementation manners of the first aspect.
  • the scheduling device further includes a memory.
  • the scheduling device further includes a communication interface, the processor is coupled with the communication interface, and the communication interface is used to input and/or output information.
  • the information includes at least one of instructions and data.
  • the scheduling device is a first scheduler or a second scheduler.
  • the communication interface may be a transceiver or an input/output interface.
  • the scheduling device is a chip or a chip system.
  • the communication interface may be an input/output interface, interface circuit, output circuit, input circuit, pin, or related circuit on the chip or chip system.
  • the processor can also be embodied as a processing circuit or a logic circuit.
  • the scheduling device is a chip or a chip system configured in the first scheduler or the second scheduler.
  • the transceiver may be a transceiver circuit.
  • the input/output interface may be an input/output circuit.
  • a scheduling device including a processor.
  • the processor is coupled with the memory and can be used to execute instructions in the memory to implement the foregoing second aspect and the scheduling method in any one of the possible implementation manners of the second aspect.
  • the scheduling device further includes a memory.
  • the scheduling device further includes a communication interface, the processor is coupled with the communication interface, and the communication interface is used to input and/or output information.
  • the information includes at least one of instructions and data.
  • the scheduling device is a terminal device.
  • the communication interface may be a transceiver, or an input/output interface.
  • the scheduling device is a chip or a chip system.
  • the communication interface may be an input/output interface, interface circuit, output circuit, input circuit, pin or related circuit on the chip or chip system.
  • the processor can also be embodied as a processing circuit or a logic circuit.
  • the scheduling device is a chip or a chip system configured in a terminal device.
  • the transceiver may be a transceiver circuit.
  • the input/output interface may be an input/output circuit.
  • a computer-readable storage medium is provided with a computer program stored thereon.
  • When the computer program is executed by a scheduling device, the scheduling device is enabled to implement the scheduling method in the first aspect or in any possible implementation manner of the first aspect.
  • a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a scheduling device, the scheduling device is enabled to implement the scheduling method in the second aspect or in any possible implementation manner of the second aspect.
  • a computer program product containing instructions is provided.
  • the scheduling device implements the scheduling method provided in the first aspect.
  • a computer program product containing instructions is provided.
  • the scheduling device implements the scheduling method provided in the second aspect.
  • a scheduling system including the aforementioned first scheduler and terminal equipment; or, including the aforementioned first scheduler, second scheduler, and terminal equipment.
  • Fig. 1 is a schematic diagram of a network architecture applicable to an embodiment of the present application.
  • Figure 2 is a schematic diagram of the training process of reinforcement learning.
  • Fig. 3 is a processing sequence diagram of BS and UE using deep reinforcement learning for scheduling.
  • Fig. 4 is a schematic block diagram of a scheduling method provided by an embodiment of the present application.
  • Fig. 5 is a schematic flowchart of an uplink BS scheduling method provided by an embodiment of the present application.
  • Fig. 6 is a schematic flowchart of a downlink BS scheduling method provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a scheduling method performed by multiple BS schedulers according to an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of the scheduling device 1000 provided by this application.
  • FIG. 9 is a schematic block diagram of a scheduling device 2000 provided by this application.
  • FIG. 10 is a schematic structural diagram of the communication device 10 provided by this application.
  • FIG. 11 is a schematic structural diagram of the communication device 20 provided by this application.
  • LTE long term evolution
  • FDD frequency division duplex
  • TDD time division duplex
  • UMTS universal mobile telecommunications system
  • NR new radio
  • 5G fifth generation
  • Fig. 1 is a schematic diagram of a network architecture applicable to an embodiment of the present application.
  • the network architecture may include at least one network device 110, at least one terminal device 120, and at least one scheduler 130.
  • the terminal device 120 may be mobile or fixed.
  • the network device 110 is a device that can communicate with the terminal device 120 via a wireless link, such as a base station or a base station controller.
  • the scheduler 130 can achieve a compromise between system throughput and fairness through information interaction with the network device 110 and the terminal device 120.
  • FIG. 1 only exemplarily shows a network device, a terminal device, and a scheduler, but this should not constitute any limitation to this application.
  • the network device 110 and the scheduler 130 may be physically independent devices, or the network device 110 may also be integrated with the scheduler 130, which is not limited herein.
  • Each of the above-mentioned communication devices may be equipped with multiple antennas.
  • the plurality of antennas may include at least one transmitting antenna for transmitting signals and at least one receiving antenna for receiving signals.
  • each communication device additionally includes a transmitter chain and a receiver chain.
  • Those of ordinary skill in the art can understand that both chains can include multiple components related to signal transmission and reception (such as processors, modulators, multiplexers, demodulators, demultiplexers, or antennas). Therefore, multiple-antenna technology can be used for communication between network devices and terminal devices.
  • the network device may be any device that has a wireless transceiver function.
  • Network devices include, but are not limited to: an evolved Node B (eNB), a radio network controller (RNC), a Node B (NB), a home base station (for example, a home evolved Node B or home Node B, HNB), a baseband unit (BBU), an access point (AP) in a wireless fidelity (WIFI) system, a wireless relay node, a wireless backhaul node, a transmission point (TP), or a transmission and reception point (TRP). A network device may also be a gNB or a transmission point (TRP or TP) in a 5G (such as NR) system, one antenna panel or a group of antenna panels (including multiple antenna panels) of a base station in a 5G system, or a network node that constitutes a gNB or transmission point, such as a baseband unit (BBU) or a distributed unit (DU).
  • Network equipment can communicate with terminal equipment through uplink transmission or downlink transmission of data.
  • Terminal equipment may also be referred to as user equipment (UE), access terminal, user unit, user station, mobile station, remote station, remote terminal, mobile equipment, user terminal, terminal, wireless communication equipment, user agent, or user device.
  • UE user equipment
  • The terminal device in the embodiments of the present application may be a mobile phone, a tablet computer (pad), a computer with a wireless transceiver function, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a cellular phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with a wireless communication function, a computing device or another processing device connected to a wireless modem, an in-vehicle device, a wearable device, a terminal device in a 5G network, a terminal device in a non-public network, and so on.
  • Wearable devices, also called wearable smart devices, is a general term for devices developed by applying wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing, and shoes.
  • a wearable device is a portable device that is directly worn on the body or integrated into the user's clothes or accessories.
  • Wearable devices are not only a kind of hardware device, but also realize powerful functions through software support, data interaction, and cloud interaction.
  • In a broad sense, wearable smart devices include full-featured, large-sized devices that can achieve complete or partial functions without relying on smartphones, such as smart watches or smart glasses, as well as devices that focus on only a certain type of application function and need to be used together with other devices such as smartphones.
  • the terminal device may also be a terminal device in an Internet of Things (IoT) system.
  • IoT Internet of Things
  • Its main technical feature is to connect objects to the network through communication technology, so as to realize the intelligent network of human-machine interconnection and interconnection of things.
  • Reinforcement learning is a field in machine learning.
  • Figure 2 is a schematic diagram of the training process of reinforcement learning.
  • Reinforcement learning mainly involves the following elements: agent, environment, state, action, and reward.
  • The input of the agent is the state, and the output is the action.
  • The training process of reinforcement learning is as follows: the agent interacts with the environment multiple times to obtain the action, state, and reward of each interaction; these multiple (action, state, reward) sets are used as training data to train the agent once. This process is repeated to train the agent for the next round until the convergence condition is met. The process of obtaining the action, state, and reward of one interaction is shown in Figure 2: the current state s(t) of the environment is input to the agent, and the action a(t) output by the agent is obtained. The reward r(t) of this interaction is calculated according to the relevant performance index of the environment under the action a(t). At this point, the state s(t), action a(t), and reward r(t) of this interaction have been obtained and are recorded for subsequent training of the agent. The next state s(t+1) of the environment under the action a(t) is also recorded, so that the next interaction between the agent and the environment can take place.
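  • The interaction loop described above can be sketched as follows. This is a generic illustration of collecting (state, action, reward) records, not code from the application; the function name `collect_transitions` and the toy policy and environment are hypothetical and chosen only so that the example is self-contained.

```python
from typing import Callable, List, Tuple

Transition = Tuple[int, int, float]  # (state s(t), action a(t), reward r(t))


def collect_transitions(
    policy: Callable[[int], int],
    step: Callable[[int, int], Tuple[int, float]],
    initial_state: int,
    num_steps: int,
) -> List[Transition]:
    """Run the agent-environment loop and record (s(t), a(t), r(t))."""
    transitions: List[Transition] = []
    state = initial_state
    for _ in range(num_steps):
        action = policy(state)                    # agent maps state to action
        next_state, reward = step(state, action)  # environment applies a(t)
        transitions.append((state, action, reward))
        state = next_state                        # s(t+1) drives the next interaction
    return transitions


def toy_step(state: int, action: int) -> Tuple[int, float]:
    """Toy environment: reward 1.0 when the action equals the state's parity."""
    reward = 1.0 if action == state % 2 else 0.0
    return state + 1, reward


def toy_policy(state: int) -> int:
    """Toy agent that always picks the state's parity."""
    return state % 2


data = collect_transitions(toy_policy, toy_step, initial_state=0, num_steps=3)
```

  The recorded list plays the role of the training data mentioned in the text; the training step itself is omitted here.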
  • Deep reinforcement learning (DRL)
  • Deep reinforcement learning still conforms to the framework of interaction between the agent and the environment in reinforcement learning. The difference is that the agent uses a deep neural network to make decisions.
  • FIG. 3 is a processing sequence diagram of BS and UE using deep reinforcement learning for scheduling.
  • the current workflow of the BS scheduler based on deep reinforcement learning is generally:
  • a. The scheduler obtains feedback on scheduling revenue. For example, the scheduler obtains the uplink data sent by the UE in frame n, demodulates and decodes the uplink data, and obtains the revenue feedback for the most recent scheduling decision output by the scheduler.
  • In deep reinforcement learning, revenue feedback is used to determine the impact of a scheduling decision on system throughput and user fairness. For example, if scheduling decision A of the scheduler improves system throughput and user fairness, the terminal device gives higher revenue feedback for scheduling decision A. Similarly, if scheduling decision B of the scheduler reduces system throughput and user fairness, the terminal device gives lower revenue feedback for scheduling decision B. The scheduler can therefore continuously update its scheduling decisions based on the revenue feedback sent by the terminal device, and thereby determine the optimal scheduling decision.
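  • The application does not give a formula for the revenue feedback. As a hedged illustration only, the sketch below combines a normalized throughput term with Jain's fairness index, a commonly used fairness measure that is swapped in here as an assumption. The function names, the weight `alpha`, and the `peak_rate` normalization are hypothetical.

```python
from typing import Sequence


def jain_fairness(rates: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range (0, 1]."""
    if not rates:
        raise ValueError("rates must not be empty")
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 1.0  # assumption: all-zero allocation is treated as equal shares
    total = sum(rates)
    return total * total / (len(rates) * squares)


def revenue(rates: Sequence[float], peak_rate: float, alpha: float = 0.5) -> float:
    """Assumed revenue: weighted sum of normalized throughput and fairness."""
    throughput_term = sum(rates) / (len(rates) * peak_rate)
    return alpha * throughput_term + (1 - alpha) * jain_fairness(rates)


# Illustrative per-user rates under two hypothetical scheduling decisions.
revenue_a = revenue([2.0, 2.0, 2.0, 2.0], peak_rate=4.0)  # balanced allocation
revenue_b = revenue([4.0, 0.0, 0.0, 0.0], peak_rate=4.0)  # one user gets everything
```

  With these illustrative numbers, decision A yields the higher revenue because it delivers both more total throughput and an even split across users.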
  • b. The scheduler outputs the scheduling decision, and the BS encodes it as DCI and sends it. For example, the scheduler determines the current scheduling decision according to the revenue feedback obtained in frame n, the BS encodes the current scheduling decision to generate DCI#1, and the BS sends DCI#1 to the UE in frame n+1.
  • c. After receiving DCI#1 sent by the BS, the UE decodes DCI#1 and sends revenue feedback (that is, uplink data) to the BS according to the decoding result. The BS scheduler obtains the revenue feedback sent by the UE in frame n+1, makes a new decision according to the uplink data and encodes it as DCI#2 in frame n+1, and sends DCI#2 to the UE in frame n+2.
  • The scheduling process above may be limited by step c, obtaining the revenue after the decision through the uplink. First, the duration of step c directly affects the response speed of the BS scheduler: if the feedback takes too long, the dynamic adaptability of the scheduler decreases. Second, the frame structure also strongly affects the timeliness of step c. For example, with a 10 ms system frame, suppose the BS issues DCI in subframe 0 of frame n+1 and the UE reports scheduling revenue in subframe 9 of frame n. Because DCI encoding must be carried out in advance, the DCI that the BS sends in subframe 0 of frame n+1 cannot take into account the scheduling revenue reported in subframe 9 of frame n, so the air interface feedback lags. Therefore, if this timing is followed, deep reinforcement learning training cannot be performed effectively.
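  • The lag in this example can be checked with simple arithmetic. The sketch below assumes the 10 ms frame with ten 1 ms subframes from the example; the parameter `encode_lead_ms` (how far in advance the decision calculation and DCI encoding must start) and its value of 2 ms are hypothetical, since the application does not state a figure.

```python
SUBFRAMES_PER_FRAME = 10  # 10 ms frame made of 1 ms subframes
SUBFRAME_MS = 1


def subframe_start_ms(frame: int, subframe: int) -> int:
    """Start time in ms of a subframe, counted from frame 0, subframe 0."""
    return (frame * SUBFRAMES_PER_FRAME + subframe) * SUBFRAME_MS


def feedback_usable(
    report_frame: int,
    report_subframe: int,
    dci_frame: int,
    dci_subframe: int,
    encode_lead_ms: int,
) -> bool:
    """True if feedback reported in one subframe can shape the DCI sent in another."""
    # Feedback is complete only at the end of the reporting subframe.
    feedback_ready = subframe_start_ms(report_frame, report_subframe) + SUBFRAME_MS
    # Decision calculation and DCI encoding must begin this early (assumed lead time).
    encoding_must_start = subframe_start_ms(dci_frame, dci_subframe) - encode_lead_ms
    return feedback_ready <= encoding_must_start


n = 3  # arbitrary frame index for illustration
same_cycle = feedback_usable(n, 9, n + 1, 0, encode_lead_ms=2)  # DCI in frame n+1
next_cycle = feedback_usable(n, 9, n + 2, 0, encode_lead_ms=2)  # DCI in frame n+2
```

  Under these assumptions the feedback from subframe 9 of frame n arrives too late for the DCI in subframe 0 of frame n+1, but in time for frame n+2, which is the situation that sending the decision N > 1 time units later is meant to accommodate.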
  • this application proposes a scheduling method to solve the impact of air interface feedback lag on the deep reinforcement learning scheduler.
  • one or more time units may be included in the time domain.
  • it can be divided into frames with a time length of 10ms, each frame is divided into 10 subframes with the same size and a length of 1ms, and each subframe can contain one or more time slots .
  • the time unit in the embodiment of the present application is described by taking a frame as an example.
  • FIG. 4 is a schematic block diagram of a scheduling method provided by an embodiment of the present application. The method is applied to a scheduling system composed of at least one scheduler.
  • the scheduling process of the first scheduler includes:
  • The first scheduler obtains the first revenue feedback in the i-th time unit, where i ≥ 1 and i is an integer.
  • The terminal device sends the first revenue feedback to the network device in the i-th time unit, and the first scheduler obtains the first revenue feedback from the network device in the i-th time unit. The first revenue feedback is determined by the terminal device according to the most recent scheduling decision output by the first scheduler (that is, the second scheduling decision).
  • the network device here may be the network device described in FIG. 1, for example, the network device may be a BS, and more specifically, it may be a BS baseband.
  • The first scheduler determines a first scheduling decision according to the first revenue feedback.
  • The first scheduler sends the first scheduling decision in the (i+N)-th time unit.
  • The value of N may be specified by the communication system or communication protocol.
  • The first scheduler sends the first scheduling decision to the network device for DCI encoding, and the network device sends the DCI to the terminal device in the (i+N)-th time unit.
  • the scheduling system may also include one or more second schedulers.
  • The second scheduler and the first scheduler each output scheduling decisions for the same task, and the first scheduler and the second scheduler take turns, in a round-robin manner, issuing scheduling decisions and obtaining scheduling revenue feedback.
  • the same task here refers to a process in which the system side determines whether the same terminal device is scheduled or the system side allocates time-frequency resources to the same terminal device in the same scheduling system.
  • the scheduling process of the second scheduler includes:
  • the terminal device sends the second revenue feedback to the network device in the i+jth time unit, and the second scheduler obtains the second revenue feedback from the network device in the i+jth time unit.
  • The second revenue feedback is determined by the terminal device according to the most recent scheduling decision output by the second scheduler (that is, the fourth scheduling decision).
  • the second scheduler determines the third scheduling decision based on the second revenue feedback.
  • the second scheduler sends the third scheduling decision in the i+j+M-th time unit, where M>1 and M is an integer.
  • the scheduling of the first scheduler and the second scheduler will be described below with examples.
  • the second scheduler sends the third scheduling decision to the network device for DCI encoding, and the network device sends the DCI to the terminal device in the (i+j+M)-th time unit.
  • the value of M may be specified by the communication system or communication protocol.
  • the scheduling period M of the second scheduler may be equal to or not equal to the scheduling period N of the first scheduler.
  • For example, the first scheduler sends scheduling decisions and obtains scheduling revenue feedback in frame n, frame (n+2), frame (n+4), frame (n+6), ...;
  • the second scheduler sends scheduling decisions and obtains scheduling revenue feedback in frame (n+1), frame (n+3), frame (n+5), frame (n+7), ...;
  • alternatively, the second scheduler sends scheduling decisions and obtains scheduling revenue feedback in frame (n+1), frame (n+5), frame (n+9), frame (n+13), ....
  • As another example, the first scheduler sends scheduling decisions and obtains scheduling revenue feedback in frame n, frame (n+3), frame (n+6), frame (n+9), ...;
  • if the scheduling period M of the second scheduler is 4, the second scheduler sends scheduling decisions and obtains scheduling revenue feedback in frame (n+1), frame (n+5), frame (n+9), frame (n+13), .... In this case, both schedulers would send scheduling decisions and obtain scheduling revenue feedback in frame (n+9).
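The frame sequences in these examples, and the frames where two schedulers coincide, can be enumerated directly. The function names below are illustrative only, and frame numbers are written as offsets from frame n.

```python
def scheduling_frames(offset: int, period: int, horizon: int) -> list:
    """Frame offsets (relative to frame n) used by one scheduler."""
    return list(range(offset, horizon, period))


def colliding_frames(first: list, second: list) -> list:
    """Frame offsets in which both schedulers would act at once."""
    return sorted(set(first) & set(second))


# First scheduler: frame n, n+3, n+6, ... (period 3).
first = scheduling_frames(offset=0, period=3, horizon=24)
# Second scheduler: frame n+1, n+5, n+9, ... (period M = 4).
second = scheduling_frames(offset=1, period=4, horizon=24)

print(colliding_frames(first, second))  # [9, 21]
```

With periods 2 and 2 (offsets 0 and 1) the two sequences never intersect, whereas periods 3 and 4 coincide every 12 frames, starting at frame (n+9).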
  • only one of the schedulers is selected by the network device.
  • the selected scheduler normally performs according to the preset scheduling period, and the unselected scheduler skips the current scheduling profit feedback acquisition, scheduling decision calculation, and scheduling decision issuance.
  • the unselected scheduler does not receive the corresponding scheduling revenue within the time window for receiving the scheduling revenue this time, and will retransmit the previous scheduling strategy at the next scheduling moment.
  • The time window for receiving the scheduling revenue runs from the last issuance of the scheduling policy to the latest time at which the scheduling revenue required for the decision calculation may arrive.
  • the network device selects the scheduler in a predetermined manner, one of which can be fixedly selected, or it can be selected by polling, or it can be selected according to other parameters (for example, allocated to a scheduler with a lower load).
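The three selection manners mentioned above (fixed, polling, and by another parameter such as load) might look like the following sketch. All names and the load values are hypothetical; the source does not define an interface for the selection.

```python
def select_fixed(candidates: list, preferred: str) -> str:
    """Always pick the same scheduler when it is among the candidates."""
    return preferred if preferred in candidates else candidates[0]


class PollingSelector:
    """Pick the colliding schedulers in turn, one per collision."""

    def __init__(self):
        self.turn = 0

    def select(self, candidates: list) -> str:
        chosen = candidates[self.turn % len(candidates)]
        self.turn += 1
        return chosen


def select_lowest_load(candidates: list, load: dict) -> str:
    """Pick the candidate with the smallest load value."""
    return min(candidates, key=lambda name: load[name])


colliding = ["first", "second"]
print(select_fixed(colliding, preferred="first"))
polling = PollingSelector()
print([polling.select(colliding) for _ in range(3)])
print(select_lowest_load(colliding, {"first": 0.8, "second": 0.3}))
```

Whichever manner is used, the scheduler that is not selected skips that round, as described above.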
  • the first scheduler and the second scheduler may also interact with the scheduling decision.
  • the first scheduler and the second scheduler conduct decision-making interaction after a preset time period.
  • the first scheduler and the second scheduler perform decision-making interactions after reaching a preset number of scheduling decisions.
  • the interaction information between the first scheduler and the second scheduler may be the profit corresponding to the most recent scheduling decision.
  • The first scheduler may send the first scheduling decision and the revenue feedback of the first scheduling decision to the second scheduler, or the second scheduler may send the third scheduling decision and the revenue feedback of the third scheduling decision to the first scheduler.
  • the first scheduler and the second scheduler amend their respective scheduling decisions according to the received interactive information, so that the scheduling decisions between the interactive schedulers gradually converge in the same direction.
  • Alternatively, the first scheduler and the second scheduler may make no decision adjustments and instead only compare, based on the received interaction information and their own scheduling decisions, the difference between the scheduling decisions output by the two schedulers.
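The source does not say how a scheduler amends its decision after an interaction. One possible rule, shown purely as an assumption, is to move a numeric decision vector part of the way toward the peer's decision when the peer reported a higher revenue; the second function covers the compare-only variant. Representing a decision as a list of floats is also an assumption.

```python
def amend_toward_peer(own, peer, own_revenue, peer_revenue, step=0.5):
    """Hypothetical amendment rule: follow the peer only if it earned more."""
    if peer_revenue <= own_revenue:
        return list(own)
    return [o + step * (p - o) for o, p in zip(own, peer)]


def decision_gap(own, peer):
    """Compare-only variant: largest element-wise difference."""
    return max(abs(o - p) for o, p in zip(own, peer))


own_decision = [0.0, 1.0]
peer_decision = [1.0, 1.0]

print(decision_gap(own_decision, peer_decision))
print(amend_toward_peer(own_decision, peer_decision, own_revenue=0.2, peer_revenue=0.6))
```

Applying such a rule repeatedly shrinks the gap between the two decision vectors, which is one way to read "gradually converge in the same direction".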
  • After the scheduler obtains the revenue feedback of the last scheduling decision, it does not need to finish calculating the scheduling decision within the current time unit, and the network device does not need to send DCI in the next time unit; instead, the DCI is sent one or more time units later. The method is therefore not constrained by the air interface frame structure and leaves enough time for the scheduler to calculate the scheduling decision and for the decision to be encoded. This effectively resolves the conflict between the scheduling process and the air interface timing that arises when the scheduler cannot obtain the revenue of the last scheduling decision in time.
  • Because the first scheduler outputs scheduling decisions across time units in the time domain, one or more second schedulers can use the time units not used by the first scheduler to make scheduling decisions for the same task, thereby improving the utilization of system time-frequency resources.
  • FIG. 5 is a schematic flowchart of an uplink BS scheduling method provided by an embodiment of the present application.
  • Fig. 5 includes two schedulers, namely scheduler A and scheduler B.
  • It should be understood that Fig. 5 shows only two schedulers by way of example; the number of schedulers in this application is not limited to the 2 schedulers in Fig. 5 and can be 3 or more.
  • The system uses scheduler A (an example of the second scheduler) and scheduler B (an example of the first scheduler) to coordinate and schedule together, and each scheduler obtains scheduling input and executes scheduling decisions in a round-robin manner.
  • The scheduling decision determined by scheduler A and the scheduling decision determined by scheduler B are the scheduling decisions of scheduler A and scheduler B, respectively, for the same task.
  • the specific uplink scheduling process is as follows:
  • the UE receives and decodes DCI#1 in frame n.
  • the DCI#1 here is generated by the BS baseband according to the scheduling decision output last time by the scheduler B (that is, an example of the second scheduling decision).
  • the UE encodes and sends uplink data in frame n.
  • the UE performs PUSCH encoding on the uplink data according to the received DCI#1 information and sends the encoded uplink data to the BS baseband in frame n.
  • the BS baseband receives and decodes uplink data in frame n.
  • the BS baseband receives the uplink data in frame n and decodes the uplink data, obtains the revenue feedback R1 of the last uplink scheduling decision (that is, an example of the first revenue feedback) from the decoded data and sends it to the scheduler B.
  • the terminal device does not directly calculate the revenue feedback, but only performs uplink data encoding based on the received DCI information, and the BS calculates and obtains the corresponding revenue feedback based on the received uplink data.
  • the scheduler B receives the revenue feedback R1 sent by the BS baseband.
  • the scheduler B calculates the scheduling decision B1 (that is, an example of the first scheduling decision) according to the revenue feedback R1.
  • the scheduler B outputs the scheduling decision B1 to the BS baseband.
  • the BS baseband encodes DCI#2, and sends DCI#2 in frame n+2.
  • After receiving the scheduling decision B1, the BS baseband integrates it into the DCI and performs DCI encoding to generate DCI#2.
  • the BS baseband sends DCI#2 to the UE in frame (n+2).
  • DCI encoding can start once the scheduling decision B1 has been received, and it only needs to be completed before the BS baseband sends the DCI.
  • the UE receives and decodes DCI#2 in frame (n+2).
  • step 108 is the same as step 101, which means that the terminal device and the scheduler start to enter the next round of learning.
  • the UE encodes and sends uplink data in frame (n+2).
  • the UE performs PUSCH coding on the uplink data according to the received DCI#2 and sends the coded uplink data to the BS baseband in frame (n+2).
  • the BS baseband receives and decodes uplink data in frame (n+2).
  • the BS receives the uplink data in the frame (n+2) and decodes the uplink data, obtains the revenue feedback R2 of this uplink scheduling from the decoded data, and sends it to the scheduler B.
  • the scheduler B receives the revenue feedback R2.
  • the scheduler B calculates the scheduling decision B2 according to the revenue feedback R2.
  • Step 113: the scheduler B outputs the scheduling decision B2 to the BS baseband.
  • the operation of step 113 is the same as that of step 106. Refer to the operations after step 106 for subsequent loop operations.
  • The scheduling process of scheduler A is the same as the scheduling process of scheduler B; the difference is that the two schedulers make scheduling decisions in different time units.
  • Scheduler A receives revenue feedback R3 in frame (n-1).
  • the scheduler A calculates the scheduling decision A1 according to the revenue feedback R3.
  • the scheduler A outputs the scheduling decision A1 (that is, an example of the fourth scheduling decision) to the BS baseband.
  • the BS baseband encodes DCI#3, and sends DCI#3 in frame (n+1).
  • After receiving the scheduling decision A1, the BS baseband integrates it into the DCI and performs DCI encoding to generate DCI#3.
  • the BS baseband sends DCI#3 to the UE in frame (n+1).
  • DCI encoding can start once the scheduling decision of scheduler A has been received, and it only needs to be completed before the BS baseband sends the DCI.
  • the UE receives and decodes DCI#3 in frame (n+1).
  • the UE encodes and sends uplink data in frame (n+1).
  • the UE performs PUSCH encoding on the uplink data according to the received DCI#3 and sends the encoded uplink data to the BS baseband in frame (n+1).
  • the BS baseband receives and decodes uplink data in frame (n+1).
  • the BS receives the uplink data in frame (n+1) and decodes the uplink data, and obtains the revenue feedback R4 of this uplink scheduling from the decoded data (that is, an example of the second revenue feedback) and sends it to the scheduler A.
  • the scheduler A receives the revenue feedback R4.
  • the scheduler A calculates the scheduling decision A2 (that is, an example of the third scheduling decision) according to the revenue feedback R4.
  • the scheduler A outputs the scheduling decision A2 to the BS baseband.
  • the BS baseband encodes DCI#4, and sends DCI#4 in frame (n+3).
  • the BS baseband integrates it into the DCI and performs DCI encoding to generate DCI#4, and the BS baseband sends DCI#4 to the UE in frame (n+3).
  • the operation of step 211 is the same as that of step 204. Refer to the operations after step 204 for subsequent loop operations.
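The interleaving of the two uplink flows can be summarized as a per-frame timeline. The sketch below assumes the two-frame spacing used in this example (scheduler B acts in frames n, n+2, ... and scheduler A in frames n-1, n+1, n+3, ...); the function names are illustrative, and frames are written as offsets from frame n.

```python
def owner(frame_offset: int) -> str:
    """Scheduler whose DCI is in force in frame n + frame_offset."""
    return "B" if frame_offset % 2 == 0 else "A"


def timeline(frame_offsets):
    """For each frame: (offset, scheduler fed back, offset of its next DCI)."""
    rows = []
    for k in frame_offsets:
        # The scheduler that gets revenue feedback in frame n+k has its
        # next decision sent as DCI two frames later, in frame n+k+2.
        rows.append((k, owner(k), k + 2))
    return rows


for offset, scheduler, next_dci in timeline(range(-1, 4)):
    print(f"frame n{offset:+d}: feedback to {scheduler}, next DCI in frame n{next_dci:+d}")
```

Each scheduler thus has a full frame between receiving its feedback and the frame in which its next DCI is sent, while every frame still carries a scheduling decision.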
  • multiple schedulers in the scheduling system may also interact with the scheduling decision, and the interaction period may be defined by the user or the system.
  • multiple schedulers conduct decision-making interactions after a preset time period.
  • multiple schedulers perform decision-making interactions after reaching a preset number of scheduling decisions.
  • the multiple scheduler interaction information may be the profit corresponding to the most recent scheduling decision.
  • the scheduler A can send the scheduling decision A1 and the revenue feedback R4 to the scheduler B, or the scheduler B can send the scheduling decision B1 and the revenue feedback R2 to the scheduler A.
  • the schedulers A and B modify their respective scheduling decisions according to the received interactive information, so that the scheduling decisions of multiple schedulers gradually converge in the same direction.
  • the schedulers A and B do not make decision adjustments, and compare the differences between the scheduling decisions output by the two schedulers according to the received information and their own scheduling decisions.
  • the use scenario is broader than using only one scheduler.
  • the schedulers A and B alternately obtain uplink revenue feedback and output the scheduling strategy to DCI.
  • the UE receives and decodes the DCI and performs uplink data encoding according to the scheduling decision.
  • the BS alternately feeds back the scheduling profit results to the corresponding scheduler.
  • After the BS baseband obtains a scheduling strategy from one scheduler and encodes it into DCI, it feeds the next scheduling revenue it obtains back to that same scheduler; when it next obtains a scheduling strategy from another scheduler, it likewise feeds the subsequent scheduling revenue back to that scheduler.
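The routing rule for the BS baseband can be sketched as a small bookkeeping structure: remember which scheduler produced the DCI sent in each frame, and hand the revenue obtained for that frame back to the same scheduler. The class and method names are hypothetical and do not come from the source.

```python
class BasebandFeedbackRouter:
    """Tracks which scheduler's decision was encoded into each frame's DCI."""

    def __init__(self):
        self._dci_owner = {}

    def on_decision(self, scheduler: str, send_frame: int) -> None:
        """Record that `scheduler` supplied the decision sent in `send_frame`."""
        self._dci_owner[send_frame] = scheduler

    def route_revenue(self, frame: int) -> str:
        """Scheduler that should receive the revenue obtained for `frame`."""
        return self._dci_owner.pop(frame)


router = BasebandFeedbackRouter()
router.on_decision("B", send_frame=2)  # decision B1 goes out as DCI#2 in frame n+2
router.on_decision("A", send_frame=3)  # decision A2 goes out as DCI#4 in frame n+3
print(router.route_revenue(2))  # revenue R2 goes back to scheduler B
print(router.route_revenue(3))  # the next revenue goes back to scheduler A
```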
  • the solution of multiple schedulers can effectively improve the adaptability of the scheduler to changes in the air interface environment.
  • There is an information exchange function between the schedulers, through which they can adjust their own scheduling parameters so that the scheduling strategies of different schedulers are similar or the same and the scheduling revenue is maximized.
  • FIG. 6 is a schematic flowchart of a downlink BS scheduling method provided by an embodiment of the present application.
  • Fig. 6 includes two schedulers, namely scheduler A and scheduler B. It should be understood that Fig. 6 shows only two schedulers by way of example; the number of schedulers in this application is not limited to the 2 schedulers in Fig. 6 and can be 3 or more.
  • the system uses scheduler A and scheduler B to coordinate and schedule together, and each scheduler obtains scheduling input and executes scheduling decisions in a round-robin manner.
  • the specific downlink scheduling process is as follows:
  • the UE receives and decodes DCI#1 in frame n.
  • the DCI#1 here is generated by the BS baseband according to the scheduling decision output last time by the scheduler B (that is, an example of the second scheduling decision).
  • the UE sends revenue feedback R1 in frame n.
  • the UE determines and encodes the revenue feedback R1 according to the received DCI#1 information, and sends the encoded revenue feedback R1 to the BS baseband in frame n.
  • The terminal device calculates the revenue feedback itself, encodes it, and feeds it back to the BS baseband; the BS baseband only needs to decode the received data to obtain this revenue feedback directly.
  • the BS baseband receives the revenue feedback R1 in frame n.
  • the BS baseband receives the uplink data in frame n and decodes the uplink data, obtains the revenue feedback R1 (that is, an example of the first revenue feedback) from the decoded data and sends it to the scheduler B.
  • the scheduler B receives the revenue feedback R1 sent by the BS baseband.
  • the scheduler B calculates the scheduling decision B1 (that is, an example of the first scheduling decision) according to the revenue feedback R1.
  • the scheduler B outputs the scheduling decision B1 to the BS baseband.
  • the BS baseband encodes DCI#2, and sends DCI#2 in frame n+2.
  • the BS baseband integrates it into DCI#2 and determines the downlink data for PDSCH encoding.
  • the BS baseband sends DCI#2 and the encoded downlink data to the UE in frame (n+2).
  • the BS baseband can start DCI coding after receiving the scheduling decision, and it can be completed before the BS baseband sends the DCI.
  • the UE receives and decodes DCI#2 in frame (n+2).
  • step 608 is the same as step 601, which means that the terminal device and the scheduler start to enter the next round of learning.
  • the UE sends revenue feedback R2 in frame (n+2).
  • the UE performs PUSCH coding on the uplink data according to the received DCI#2 and sends the coded uplink data to the BS baseband in frame (n+2).
  • the UE determines and encodes the revenue feedback R2 according to the received DCI#2 information, and sends the encoded revenue feedback R2 to the BS baseband in frame (n+2).
  • the BS baseband receives the revenue feedback R2 in frame (n+2).
  • the BS receives the uplink data in the frame (n+2) and decodes the uplink data, obtains the revenue feedback R2 from the decoded data and sends it to the scheduler B.
  • the scheduler B receives the revenue feedback R2.
  • the scheduler B calculates the scheduling decision B2 according to the revenue feedback R2.
  • Step 613: the scheduler B outputs the scheduling decision B2 to the BS baseband.
  • Step 613 has the same operation as step 606. Refer to the operations after step 606 for subsequent loop operations.
  • The scheduling process of scheduler A is the same as the scheduling process of scheduler B; the difference is that the two schedulers make scheduling decisions in different time units.
  • Scheduler A receives revenue feedback R3 in frame (n-1).
  • the scheduler A calculates the scheduling decision A1 according to the revenue feedback R3.
  • the scheduler A outputs the scheduling decision A1 (that is, an example of the fourth scheduling decision) to the BS baseband.
  • the BS baseband encodes DCI#3, and sends DCI#3 in frame (n+1).
  • After receiving the scheduling decision A1, the BS baseband integrates it into DCI#3 and determines the downlink data for PDSCH encoding.
  • the BS baseband sends DCI#3 and downlink data to the UE in frame (n+1).
  • The BS baseband can start DCI encoding once it has received the scheduling decision of scheduler A, and the encoding only needs to be completed before the BS baseband sends the DCI.
  • the UE receives and decodes DCI#3 in frame (n+1).
  • the UE sends revenue feedback R4 in frame (n+1).
  • the UE determines and encodes the revenue feedback R4 according to the received DCI#3 information, and sends the encoded revenue feedback R4 to the BS baseband in frame (n+1).
  • the BS baseband receives the revenue feedback R4 in frame (n+1).
  • the BS receives the uplink data in frame (n+1) and decodes the uplink data, obtains the revenue feedback R4 (that is, an example of the second revenue feedback) from the decoded data and sends it to the scheduler A.
  • the scheduler A receives the revenue feedback R4.
  • the scheduler A calculates the scheduling decision A2 (that is, an example of the third scheduling decision) according to the revenue feedback R4.
  • the scheduler A outputs the scheduling decision A2 to the BS baseband.
  • the BS baseband encodes DCI#4, and sends DCI#4 in frame (n+3).
  • After receiving the scheduling decision A2, the BS baseband integrates it into DCI#4 and determines the downlink data for PDSCH encoding.
  • the BS baseband sends DCI#4 and downlink data to the UE in frame (n+3).
  • Step 711 has the same operation as step 704. Refer to the operations after step 704 for subsequent loop operations.
  • schedulers A and B can also interact and adjust scheduling decisions, which will not be repeated here.
  • The schedulers A and B alternately obtain downlink revenue feedback and output scheduling decisions, which the BS baseband encodes into DCI and uses to encode the downlink data; the UE receives and decodes the DCI and determines the revenue feedback of each scheduling decision according to that decision.
  • the BS alternately feeds back the scheduling revenue result to the corresponding scheduler.
  • After the BS baseband obtains a scheduling strategy from one scheduler and encodes it into DCI, it feeds the next scheduling revenue it obtains back to that same scheduler; when it next obtains a scheduling strategy from another scheduler, it likewise feeds the subsequent scheduling revenue back to that scheduler.
  • The usage scenarios are broader and are not constrained by the air interface frame structure.
  • After receiving the revenue feedback, the scheduler has sufficient time to calculate the scheduling decision, and the BS baseband has enough time to perform encoding after receiving the scheduling decision, which resolves the effect of air interface feedback lag on the deep reinforcement learning scheduler. At the same time, there is an information exchange function between the schedulers, through which they can adjust their own scheduling parameters so that the scheduling strategies of different schedulers are similar or the same and the scheduling revenue is maximized.
  • FIG. 7 is a schematic flowchart of a method for performing scheduling by multiple BS schedulers according to an embodiment of the present application.
  • Figure 7 includes three schedulers, namely scheduler A, scheduler B and scheduler C.
  • The scheduling sequence of the three schedulers is B, A, C, B, A, C, ...: scheduler B obtains revenue feedback from the BS baseband in frame n, and the BS outputs the scheduling decision of scheduler B in frame (n+3); scheduler A obtains revenue feedback from the BS baseband in frame (n+1), and the BS outputs the scheduling decision of scheduler A in frame (n+4); scheduler C obtains revenue feedback from the BS baseband in frame (n+2), and the BS outputs the scheduling decision of scheduler C in frame (n+5).
  • Scheduler A obtains revenue feedback from the BS baseband in frame (n+1), and the BS outputs the scheduling decision of scheduler A in frame (n+1+N); scheduler B obtains revenue feedback from the BS baseband in frame n, and the BS outputs the scheduling decision of scheduler B in frame (n+N); scheduler C obtains revenue feedback from the BS baseband in frame (n+2), and the BS outputs the scheduling decision of scheduler C in frame (n+2+N), where N ≥ Q and N is an integer.
  • The remaining schedulers can obtain revenue feedback and output decisions in the reserved frames according to the above rules, which will not be repeated here.
  • the alternate round-robin sequence of the scheduler in this application is not limited to the sequential output, as long as the scheduling sequence among multiple schedulers is regular and followable in time units.
  • the scheduling sequence of schedulers A and B in Figure 5 and Figure 6 can also be A, B, B, A, B, B, A or A, A, B, B, A, A, B, B
  • the scheduling sequence of schedulers A, B, and C in Figure 7 can also be A, B, A, C, A, B, A, C or A, B, B, A, C, A, B, B, A, C and other periodic and regular sequence.
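The sequential round-robin case of Figure 7 can be generated programmatically. This sketch assumes that each scheduler takes one slot per cycle and that the period N is at least the number of schedulers (read here as Q, which is an interpretation, since Q is not defined in this passage); the function name is illustrative, and frames are offsets from frame n.

```python
def round_robin_plan(order: list, n_period: int, cycles: int) -> list:
    """Return (scheduler, feedback frame, decision output frame) tuples."""
    if n_period < len(order):
        raise ValueError("period N must be at least the number of schedulers")
    plan = []
    for cycle in range(cycles):
        for slot, name in enumerate(order):
            feedback_frame = cycle * n_period + slot
            plan.append((name, feedback_frame, feedback_frame + n_period))
    return plan


for name, feedback_frame, output_frame in round_robin_plan(["B", "A", "C"], 3, 2):
    print(f"scheduler {name}: feedback in n+{feedback_frame}, decision in n+{output_frame}")
```

With N larger than the number of schedulers, the unused slots in each cycle correspond to the reserved frames mentioned above. Non-sequential patterns such as A, B, A, C would need an explicit slot table instead of this per-cycle enumeration.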
  • FIG. 8 is a schematic block diagram of the scheduling apparatus 1000 provided by this application.
  • the scheduling device 1000 includes a sending unit 1100, a receiving unit 1200, and a processing unit 1300.
  • The sending unit 1100 is configured to send the first revenue feedback in the i-th time unit, where i ≥ 1 and i is an integer; the receiving unit 1200 is configured to receive, in the (i+N)-th time unit, the first scheduling decision determined by the first scheduler according to the first revenue feedback;
  • the processing unit 1300 is configured to determine the first revenue feedback according to the second scheduling decision, where the second scheduling decision is the last scheduling decision determined by the first scheduler before the first scheduling decision, N>1 and N is an integer.
  • the sending unit 1100 is further configured to send the second revenue feedback in the (i+j)-th time unit, where 1 ≤ j ≤ N-1 and j is an integer;
  • the receiving unit 1200 is configured to receive, in the i+j+M-th time unit, the third scheduling decision determined by the second scheduler according to the second revenue feedback, where M>1 and M is an integer,
  • the processing unit 1300 is configured to determine the second revenue feedback according to the fourth scheduling decision, the fourth scheduling decision being the last scheduling decision determined by the second scheduler before the third scheduling decision; the scheduling decision determined by the first scheduler and the scheduling decision determined by the second scheduler are the scheduling decisions of the first scheduler and the second scheduler, respectively, for the same task.
  • The receiving unit 1200 and the sending unit 1100 can also be integrated into one transceiver unit, which has both receiving and sending functions, which is not limited here.
  • N is equal to 2.
  • the value of N is specified by the communication system or communication protocol.
  • the scheduling apparatus 1000 may be a terminal device in the method embodiment.
  • the sending unit 1100 may be a transmitter
  • the receiving unit 1200 may be a receiver.
  • the receiver and transmitter can also be integrated into one transceiver.
  • the processing unit 1300 may be a processing device.
  • the scheduling apparatus 1000 may be a chip or an integrated circuit installed in a terminal device.
  • the sending unit 1100 and the receiving unit 1200 may be communication interfaces or interface circuits.
  • the sending unit 1100 is an output interface or an output circuit
  • the receiving unit 1200 is an input interface or an input circuit
  • the processing unit 1300 may be a processing device.
  • the function of the processing device can be realized by hardware, or by hardware executing corresponding software.
  • the processing device may include a memory and a processor, where the memory is used to store a computer program, and the processor reads and executes the computer program stored in the memory, so that the scheduling device 1000 executes the operations and/or processing performed by the terminal device in each method embodiment.
  • the processing device may only include a processor, and the memory for storing the computer program is located outside the processing device.
  • the processor is connected to the memory through a circuit/wire to read and execute the computer program stored in the memory.
  • the processing device may be a chip or an integrated circuit.
  • FIG. 9 is a schematic block diagram of the scheduling device 2000 provided by this application.
  • the scheduling device 2000 includes a sending unit 2100, a receiving unit 2200, and a processing unit 2300.
  • the processing unit 2300 is configured to obtain the first revenue feedback in the i-th time unit, where i ≥ 1 and i is an integer;
  • the processing unit 2300 is further configured to determine a first scheduling decision according to the first revenue feedback, where the first revenue feedback is determined by the terminal device according to a second scheduling decision, and the second scheduling decision is the last scheduling decision determined by the processing unit before the first scheduling decision;
  • the sending unit 2100 is configured to send the first scheduling decision before the i+Nth time unit, where N>1 and N is an integer.
  • the sending unit 2100 and the receiving unit 2200 can also be integrated into one transceiver unit, which has the functions of receiving and sending at the same time, which is not limited here.
  • the scheduling apparatus 2000 may be the first scheduler or the second scheduler in the method embodiment.
  • the sending unit 2100 may be a transmitter
  • the receiving unit 2200 may be a receiver.
  • the receiver and transmitter can also be integrated into one transceiver.
  • the processing unit 2300 may be a processing device.
  • the scheduling device 2000 may be a chip or an integrated circuit installed in the first scheduler or the second scheduler.
  • the sending unit 2100 and the receiving unit 2200 may be communication interfaces or interface circuits.
  • the sending unit 2100 is an output interface or an output circuit
  • the receiving unit 2200 is an input interface or an input circuit
  • the processing unit 2300 may be a processing device.
  • the function of the processing device can be realized by hardware, or by hardware executing corresponding software.
  • the processing device may include a memory and a processor, where the memory is used to store a computer program, and the processor reads and executes the computer program stored in the memory, so that the scheduling device 2000 executes the operations and/or processing performed by the first scheduler or the second scheduler in each method embodiment.
  • the processing device may only include a processor, and the memory for storing the computer program is located outside the processing device.
  • the processor is connected to the memory through a circuit/wire to read and execute the computer program stored in the memory.
  • the processing device may be a chip or an integrated circuit.
  • the communication device 10 includes: one or more processors 11, one or more memories 12 and one or more communication interfaces 13.
  • the processor 11 is used to control the communication interface 13 to send and receive signals
  • the memory 12 is used to store a computer program
  • the processor 11 is used to call and run the computer program from the memory 12, so that the processes and/or operations executed by the terminal device in each method embodiment of the present application are executed.
  • the processor 11 may have the function of the processing unit 1300 shown in FIG. 8, and the communication interface 13 may have the function of the sending unit 1100 and/or the receiving unit 1200 shown in FIG. 8.
  • the processor 11 may be used to execute the processing or operations executed internally by the terminal device in FIG. 4 to FIG. 7, and the communication interface 13 is used to execute the sending and/or receiving actions executed by the terminal device in FIG. 4 to FIG. 7.
  • the communication device 10 may be a terminal device in the method embodiment.
  • the communication interface 13 may be a transceiver.
  • the transceiver may include a receiver and a transmitter.
  • the processor 11 may be a baseband device, and the communication interface 13 may be a radio frequency device.
  • the communication device 10 may be a chip installed in a terminal device.
  • the communication interface 13 may be an interface circuit or an input/output interface.
  • the communication device 20 includes: one or more processors 21, one or more memories 22 and one or more communication interfaces 23.
  • the processor 21 is used to control the communication interface 23 to send and receive signals
  • the memory 22 is used to store a computer program
  • the processor 21 is used to call and run the computer program from the memory 22, so that the processes and/or operations executed by the first scheduler or the second scheduler in each method embodiment of the present application are executed.
  • the processor 21 may have the functions of the processing unit 2300 shown in FIG. 9, and the communication interface 23 may have the functions of the sending unit 2100 and the receiving unit 2200 shown in FIG. 9.
  • the processor 21 may be used to execute the processing or operations executed internally by the first scheduler or the second scheduler in FIGS. 4-7, and the communication interface 23 may be used to execute the sending and/or receiving actions performed by the first scheduler or the second scheduler, which will not be repeated.
  • processor and the memory in the foregoing device embodiments may be physically independent units, or the memory may also be integrated with the processor, which is not limited herein.
  • this application also provides a computer-readable storage medium in which computer instructions are stored.
  • when the computer instructions run on a computer, the operations and/or processes performed by the terminal device in the method embodiments of this application are executed.
  • this application also provides a computer-readable storage medium that stores computer instructions.
  • when the computer instructions run on a computer, the operations and/or processes performed by the first scheduler or the second scheduler in the method embodiments of this application are executed.
  • this application also provides a computer program product.
  • the computer program product includes computer program code or instructions.
  • when the computer program code or instructions run on a computer, the operations and/or processes performed by the terminal device in the method embodiments of this application are executed.
  • the computer program product includes computer program code or instructions.
  • when the computer program code or instructions run on a computer, the operations and/or processes performed by the first scheduler or the second scheduler in the method embodiments of this application are executed.
  • the present application also provides a chip including a processor.
  • the memory for storing the computer program is provided independently of the chip, and the processor is used to execute the computer program stored in the memory, so that operations and/or processing performed by the terminal device in any method embodiment are executed.
  • the chip may also include a communication interface.
  • the communication interface may be an input/output interface, or an interface circuit or the like.
  • the chip may also include the memory.
  • the application also provides a chip including a processor.
  • the memory for storing the computer program is provided independently of the chip, and the processor is used to execute the computer program stored in the memory, so that the operations and/or processing performed by the first scheduler or the second scheduler in any method embodiment are executed.
  • the chip may also include a communication interface.
  • the communication interface may be an input/output interface, or an interface circuit or the like.
  • the chip may also include the memory.
  • this application also provides a scheduling system, including some or all of the terminal equipment, network equipment, first scheduler, and second scheduler in the embodiments of this application.
  • the processor in the embodiment of the present application may be an integrated circuit chip, which has the ability to process signals.
  • the steps of the foregoing method embodiments can be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • the processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware encoding processor, or executed and completed by a combination of hardware and software modules in the encoding processor.
  • the software module can be located in a storage medium that is mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
  • volatile memory can be random access memory (RAM), which is used as an external cache. Many forms of RAM are available, for example:
  • static random access memory (static RAM, SRAM)
  • dynamic random access memory (dynamic RAM, DRAM)
  • synchronous dynamic random access memory (synchronous DRAM, SDRAM)
  • double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM)
  • enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM)
  • synchlink dynamic random access memory (synchlink DRAM, SLDRAM)
  • direct rambus random access memory (direct rambus RAM, DRRAM)
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative, for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or It can be integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • "A and/or B" can mean three cases: A exists alone, both A and B exist, or B exists alone. Among them, A, B, and C can all be singular or plural, and are not limited.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: U disk, mobile hard disk, ROM, RAM, magnetic disk or optical disk and other media that can store program codes.

Abstract

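This application provides a scheduling method, a scheduling system, and a scheduling apparatus. After obtaining the benefit feedback that a terminal device sends for the previous scheduling decision, a scheduler may send the current scheduling decision, computed from that benefit feedback, N time units after receiving the feedback, where N > 1 and N is an integer. This leaves the scheduler enough time to compute and encode the scheduling decision, and it resolves the conflict between the scheduling procedure and the air-interface timing that arises when the scheduler cannot obtain the benefit of the previous scheduling decision in time.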

Description

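Scheduling method, scheduling system, and scheduling apparatus
This application claims priority to Chinese Patent Application No. 202010430887.3, filed with the China National Intellectual Property Administration on May 20, 2020 and entitled "Scheduling method, scheduling system, and scheduling apparatus", which is incorporated herein by reference in its entirety.
Technical field
This application relates to the communications field and, more specifically, to a scheduling method, a scheduling system, and a scheduling apparatus.
Background
In a cellular network, scheduling at the media access control (MAC) layer mainly addresses time-frequency resource allocation, modulation and coding scheme (MCS) selection, user pairing, precoding, and similar problems; scheduling is used to trade off system throughput against fairness.
Deep reinforcement learning is currently used to achieve this trade-off between system throughput and fairness. During deep reinforcement learning, after receiving the benefit feedback for the previous scheduling decision, the scheduler determines the current scheduling decision from that feedback and sends it to the base station (BS), which encodes it as downlink control information (DCI) and sends it to the terminal device at the agreed time. In practice, the scheduler may be unable to obtain the benefit of the previous scheduling decision in time, so the BS cannot send the DCI for the current scheduling decision at the time agreed by the system. The resulting air-interface feedback lag prevents the scheduler from carrying out deep reinforcement training effectively in terms of timing.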
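Summary
This application provides a scheduling method, a scheduling system, and a scheduling apparatus, which effectively resolve the conflict between the scheduling procedure and the air-interface timing that arises when the scheduler cannot obtain the benefit of the previous scheduling decision in time.
According to a first aspect, a scheduling method is provided, applied to a scheduling system formed by at least one scheduler, where the scheduling system includes a first scheduler. The method includes: the first scheduler obtains first benefit feedback in an i-th time unit, where i ≥ 1 and i is an integer; the first scheduler determines a first scheduling decision based on the first benefit feedback, where the first benefit feedback is determined by a terminal device based on a second scheduling decision, and the second scheduling decision is the previous scheduling decision that the first scheduler determined before the first scheduling decision; and the first scheduler sends the first scheduling decision in an (i+N)-th time unit, where N > 1 and N is an integer.
In the foregoing technical solution, when the scheduler obtains the benefit feedback for the previous scheduling decision, it may send the current scheduling decision, computed from that feedback, N time units after receiving the feedback. This leaves the scheduler enough time to compute and encode the scheduling decision, and it effectively resolves the conflict between the scheduling procedure and the air-interface timing that arises when the scheduler cannot obtain the benefit of the previous scheduling decision in time.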
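With reference to the first aspect, in some implementations of the first aspect, the scheduling system further includes one or more second schedulers, and the method further includes: the second scheduler obtains second benefit feedback in an (i+j)-th time unit, where 1 ≤ j ≤ N-1 and j is an integer; the second scheduler determines a third scheduling decision based on the second benefit feedback, where the second benefit feedback is determined by the terminal device based on a fourth scheduling decision, the fourth scheduling decision is the previous scheduling decision that the second scheduler determined before the third scheduling decision, and the scheduling decisions determined by the first scheduler and by the second scheduler are their respective scheduling decisions for the same task; and the second scheduler sends the third scheduling decision in an (i+j+M)-th time unit, where M > 1 and M is an integer.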
上述技术方案中,相对于只使用1个调度器的使用场景更宽泛,采用多调度器协作调度的调度流程方案,第一调度器和第二调度器交替获取上行收益反馈并输出调度策略,有效地提升调度器对空口环境的适应性。
结合第一方面,在第一方面的某些实现方式中,第一调度器向第二调度器发送第一信息,第一信息包括第一调度决策或第三收益反馈,第三收益反馈是终端设备根据第一调度决策确定的;第二调度器接收第一信息并根据第一信息调整之后对任务的调度决策。
上述技术方案中,调度器间具有信息交互功能,可以调整自身的调度参数,确保不同调度器的调度策略相近、相同,以及调度收益最大化。
结合第一方面,在第一方面的某些实现方式中,第二调度器向第一调度器发送第二信息,第二信息包括第二调度决策或第四收益反馈,第四收益反馈是终端设备根据第二调度决策确定的;第一调度器接收第二信息并根据第二信息调整之后对任务的调度决策。
上述技术方案中,调度器间具有信息交互功能,可以调整自身的调度参数,确保不同调度器的调度策略相近、相同,以及调度收益最大化。
第二方面,提供了一种调度方法,方法包括:终端设备在第i个时间单元发送第一收益反馈,其中,i≥1且i为整数;终端设备在第i+N个时间单元接收第一调度器根据第一收益反馈确定的第一调度决策,其中,第一收益反馈是终端设备根据第二调度决策确定的,第二调度决策为第一调度器在第一调度决策之前确定的上一次的调度决策,N>1且N为整数。
上述技术方案中,终端设备在发送收益反馈后,在一段时间之后再接收调度器针对该收益反馈的调度决策,这样可以留给调度器充足的时间进行调度决策的计算与编码,从而有效解决了调度器无法及时获取上一次调度决策收益时,调度流程与空口时序卡滞冲突的问题。
结合第二方面,在第二方面的某些实现方式中,终端设备在第i+j个时间单元发送第二收益反馈,其中,1≤j≤N-1且j为整数;终端设备在第i+j+M个时间单元接收第二调度器根据第二收益反馈确定的第三调度决策,其中,M>1且M为整数,第二收益反馈是终端设备根据第四调度决策确定的,第四调度决策为第二调度器在第三调度决策之前确定的上一次的调度决策,第一调度器确定的调度决策和第二调度器确定的调度决策分别为第一调度器和第二调度器对同一任务的调度决策。
上述技术方案中,采用多调度器协作调度的调度流程方案,终端设备交替接收不同调度器的调度决策并根据不同的调度器的调度决策确定收益反馈并发送给对应的调度器,有效地提升调度器对空口环境的适应性。
结合第二方面,在第二方面的某些实现方式中,N等于2。
结合第二方面,在第二方面的某些实现方式中,N的值是通信***或通信协议规定的。
第三方面,提供了一种调度***,调度***包括:第一调度器,用于在第i个时间单 元获取第一收益反馈,其中,i≥1且i为整数;第一调度器,还用于根据第一收益反馈确定第一调度决策,其中,第一收益反馈是终端设备根据第二调度决策确定的,第二调度决策为第一调度器在第一调度决策之前确定的上一次的调度决策;第一调度器,还用于在第i+N个时间单元发送第一调度决策,其中,N>1且N为整数。
结合第三方面,在第三方面的某些实现方式中,调度***还包括一个或多个第二调度器,第二调度器,用于在第i+j个时间单元获取第二收益反馈,其中,1≤j≤N-1且j为整数;第二调度器,还用于根据第二收益反馈确定第三调度决策,其中,第二收益反馈是终端设备根据第四调度决策确定的,第四调度决策为第二调度器在第三调度决策之前确定的上一次的调度决策,第一调度器确定的调度决策和第二调度器确定的调度决策分别为第一调度器和第二调度器对同一任务的调度决策;第二调度器,还用于在第i+j+M个时间单元发送第二调度决策,其中,M>1且M为整数。
结合第三方面,在第三方面的某些实现方式中,第一调度器,还用于向第二调度器发送第一信息,第一信息包括第一调度决策或第三收益反馈,第三收益反馈是终端设备根据第一调度决策确定的;第二调度器,还用于接收第一信息并根据第一信息调整之后对任务的调度决策。
结合第三方面,在第三方面的某些实现方式中,第二调度器,还用于向第一调度器发送第二信息,第二信息包括第二调度决策或第四收益反馈,第四收益反馈是终端设备根据第二调度决策确定的;第一调度器,还用于接收第二信息并根据第二信息调整之后对任务的调度决策。
关于第三方面的所产生的有益效果,参考第一方面中的描述,这里不再赘述。
第四方面,提供一种调度装置,调度装置用于执行上述第一方面提供的调度方法。具体地,调度装置可以包括用于执行第一方面提供的调度方法的模块。
第五方面,提供一种调度装置,调度装置用于执行上述第二方面提供的调度方法。具体地,调度装置可以包括用于执行第二方面提供的调度方法的模块。
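According to a sixth aspect, a scheduling apparatus is provided, including a processor. The processor is coupled to a memory and may be configured to execute instructions in the memory to implement the scheduling method in the first aspect or any possible implementation of the first aspect. Optionally, the scheduling apparatus further includes the memory. Optionally, the scheduling apparatus further includes a communication interface; the processor is coupled to the communication interface, and the communication interface is used to input and/or output information. The information includes at least one of instructions and data.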
在一种实现方式中,该调度装置为第一调度器或第二调度器。当该调度装置为第一调度器或第二调度器时,通信接口可以是收发器,或,输入/输出接口。
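In another implementation, the scheduling apparatus is a chip or a chip system. In that case, the communication interface may be an input/output interface, an interface circuit, an output circuit, an input circuit, a pin, or a related circuit on the chip or chip system. The processor may also be embodied as a processing circuit or a logic circuit.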
在另一种实现方式中,该调度装置为配置于第一调度器或第二调度器中的芯片或芯片***。
可选地,收发器可以为收发电路。可选地,输入/输出接口可以为输入/输出电路。
第七方面,提供一种调度装置,包括处理器。该处理器与存储器耦合,可用于执行存储器中的指令,以实现上述第二方面以及第二方面中任一种可能实现方式中的调度方法。 可选地,该调度装置还包括存储器。可选地,该调度装置还包括通信接口,处理器与通信接口耦合,通信接口用于输入和/或输出信息。信息包括指令和数据中的至少一项。
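In one implementation, the scheduling apparatus is a terminal device. When the scheduling apparatus is a terminal device, the communication interface may be a transceiver or an input/output interface.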
在另一种实现方式中,该调度装置为芯片或芯片***。当该调度装置为芯片或芯片***时,通信接口可以是该芯片或芯片***上的输入/输出接口、接口电路、输出电路、输入电路、管脚或相关电路等。处理器也可以体现为处理电路或逻辑电路。
在另一种实现方式中,该调度装置为配置于终端设备中的芯片或芯片***。
可选地,收发器可以为收发电路。可选地,输入/输出接口可以为输入/输出电路。
第八方面,提供一种计算机可读存储介质,其上存储有计算机程序,计算机程序被调度装置执行时,使得调度装置实现第一方面以及第一方面的任一可能的实现方式中的调度方法。
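According to a ninth aspect, a computer-readable storage medium is provided, storing a computer program that, when executed by a scheduling apparatus, causes the scheduling apparatus to implement the scheduling method in the second aspect or any possible implementation of the second aspect.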
第十方面,提供一种包含指令的计算机程序产品,指令被计算机执行时使得调度装置实现第一方面提供的调度方法。
第十一方面,提供一种包含指令的计算机程序产品,指令被计算机执行时使得调度装置实现第二方面提供的调度方法。
第十二方面,提供了一种调度***,包括前述的第一调度器和终端设备;或者,包括前述的第一调度器、第二调度器以及终端设备。
附图说明
图1是适用于本申请实施例的网络架构的示意图。
图2是强化学习的训练过程示意图。
图3是BS和UE使用深度强化学习进行调度的处理时序图。
图4是本申请实施例提供的调度方法的示意性框图。
图5是本申请实施例提供的上行BS调度方法的示意性流程图。
图6是本申请实施例提供的下行BS调度方法的示意性流程图。
图7是本申请实施例提供的多个BS调度器执行调度方法的示意性流程图。
图8为本申请提供的调度装置1000的示意性框图。
图9为本申请提供的调度装置2000的示意性框图。
图10为本申请提供的通信装置10的示意性结构图。
图11为本申请提供的通信装置20的示意性结构图。
具体实施方式
下面将结合附图,对本申请中的技术方案进行描述。
本申请实施例的技术方案可以应用于各种通信***,例如:长期演进(long term evolution,LTE)***、LTE频分双工(frequency division duplex,FDD)***、LTE时分 双工(time division duplex,TDD)、通用移动通信***(universal mobile telecommunication system,UMTS)、新无线(new radio,NR)***等第五代(5th generation,5G)***,卫星通信***,以及其它未来演进的通信***等。
图1是适用于本申请实施例的网络架构的示意图。如图1所示,该网络架构可以包括至少一个网络设备110、至少一个终端设备120以及至少一个调度器130。其中,终端设备120可以是移动的或固定的。网络设备110为可以通过无线链路与终端设备120通信的设备,如基站或基站控制器等。调度器130可以通过与网络设备110和终端设备120的信息交互实现***吞吐和公平性的折中。应理解,图1只是示例性地给出了一个网络设备、一个终端设备和一个调度器,但这不应对本申请构成任何限定。
可选地,网络设备110和调度器130可以是物理上相互独立的设备,或者,网络设备110也可以和调度器130集成在一起,本文不做限定。
上述各个通信设备,可以配置多个天线。该多个天线可以包括至少一个用于发送信号的发射天线和至少一个用于接收信号的接收天线。另外,各通信设备还附加地包括发射机链和接收机链,本领域普通技术人员可以理解,它们均可包括与信号发送和接收相关的多个部件(例如处理器、调制器、复用器、解调器、解复用器或天线等)。因此,网络设备与终端设备之间可通过多天线技术通信。
本申请实施例中,网络设备可以是任意一种具有无线收发功能的设备。网络设备包括但不限于:演进型节点B(evolved Node B,eNB)、无线网络控制器(radio network controller,RNC)、节点B(Node B,NB)、家庭基站(例如,home evolved Node B,或home Node B,HNB)、基带单元(baseband unit,BBU),无线保真(wireless fidelity,WIFI)***中的接入点(access point,AP)、无线中继节点、无线回传节点、传输点(transmission point,TP)或者发送接收点(transmission and reception point,TRP)等,还可以为5G(如NR)***中的gNB或传输点(TRP或TP),或者,5G***中的基站的一个或一组(包括多个天线面板)天线面板,或者,可以为构成gNB或传输点的网络节点,如基带单元(BBU),或,分布式单元(distributed unit,DU)等。
网络设备可以与终端设备通过上行传输或下行传输数据进行通信。在本申请实施例中,终端设备也可以称为用户设备(user equipment,UE)、接入终端、用户单元、用户站、移动站、移动台、远方站、远程终端、移动设备、用户终端、终端、无线通信设备、用户代理或用户装置。本申请的实施例中的终端设备可以是手机(mobile phone)、平板电脑(pad)、带无线收发功能的电脑、虚拟现实(virtual reality,VR)终端设备、增强现实(augmented reality,AR)终端设备、工业控制(industrial control)中的无线终端、无人驾驶(self driving)中的无线终端、远程医疗(remote medical)中的无线终端、智能电网(smart grid)中的无线终端、运输安全(transportation safety)中的无线终端、智慧城市(smart city)中的无线终端、智慧家庭(smart home)中的无线终端、蜂窝电话、无绳电话、会话启动协议(session initiation protocol,SIP)电话、无线本地环路(wireless local loop,WLL)站、个人数字助理(personal digital assistant,PDA)、具有无线通信功能的手持设备、计算设备或连接到无线调制解调器的其它处理设备、车载设备、可穿戴设备、5G网络中的终端设备、非公共网络中的终端设备等。
其中,可穿戴设备也可以称为穿戴式智能设备,是应用穿戴式技术对日常穿戴进行智 能化设计、开发出可以穿戴的设备的总称,如眼镜、手套、手表、服饰及鞋等。可穿戴设备即直接穿在身上,或是整合到用户的衣服或配件的一种便携式设备。可穿戴设备不仅仅是一种硬件设备,更是通过软件支持以及数据交互、云端交互来实现强大的功能。广义穿戴式智能设备包括功能全、尺寸大、可不依赖智能手机实现完整或者部分的功能,例如:智能手表或智能眼镜等,以及只专注于某一类应用功能,需要和其它设备如智能手机配合使用,如各类进行体征监测的智能手环、智能首饰等。
此外,终端设备还可以是物联网(internet of things,IoT)***中的终端设备。IoT是未来信息技术发展的重要组成部分,其主要技术特点是将物品通过通信技术与网络连接,从而实现人机互连,物物互连的智能化网络。
为便于理解本申请实施例,首先对本申请中涉及到的术语作简单说明。
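1. Reinforcement learning:
Reinforcement learning is a field of machine learning. Refer to FIG. 2, which is a schematic diagram of the reinforcement learning training process. As shown in FIG. 2, reinforcement learning mainly involves five elements: agent, environment, state, action, and reward. The agent takes the state as input and outputs an action.
In the current technology, the training process is as follows: the agent interacts with the environment multiple times, yielding the action, state, and reward of each interaction; these (action, state, reward) tuples are used as training data to train the agent once. The same process is repeated for the next round of training until a convergence condition is met. One interaction, as shown in FIG. 2, proceeds as follows: the current state s(t) of the environment is input to the agent, which outputs an action a(t); the reward r(t) of this interaction is computed from the relevant performance indicators of the environment under action a(t). This yields the state s(t), action a(t), and reward r(t) of the interaction, which are recorded for later training of the agent. The next state s(t+1) of the environment under action a(t) is also recorded so that the agent can carry out its next interaction with the environment.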
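The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `collect_episode`, `toy_env`, and `parity_policy` are hypothetical names, and the toy environment is not taken from the application.

```python
def collect_episode(policy, step_env, s0, num_steps):
    """Run num_steps agent-environment interactions and record (state, action, reward)."""
    trajectory = []
    state = s0
    for _ in range(num_steps):
        action = policy(state)                        # a(t) chosen from s(t)
        reward, next_state = step_env(state, action)  # r(t) and s(t+1)
        trajectory.append((state, action, reward))    # kept as training data
        state = next_state                            # basis of the next interaction
    return trajectory


# Hypothetical toy environment: the state is an integer counter and the
# reward is 1.0 when the action equals the parity of the state.
def toy_env(state, action):
    reward = 1.0 if action == state % 2 else 0.0
    return reward, state + 1


def parity_policy(state):
    return state % 2


traj = collect_episode(parity_policy, toy_env, s0=0, num_steps=4)
```

The recorded tuples would then be handed to whatever training step the agent uses; that step is outside the scope of this sketch.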
2、深度强化学习(deep reinforcement learning,DRL):
将强化学习和深度学习相结合,就得到了深度强化学习。深度强化学习仍然符合强化学习中主体和环境交互的框架。不同的是,智能体(agent)中使用深度神经网络进行决策。
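Refer to FIG. 3, which is a processing timing diagram of a BS and a UE scheduling with deep reinforcement learning. The typical workflow of a current BS scheduler based on deep reinforcement learning is:
a. The scheduler obtains scheduling benefit feedback. For example, in frame n the scheduler obtains the uplink data sent by the UE, demodulates and decodes it, and obtains the benefit feedback for the scheduling decision it most recently output.
Benefit feedback, in deep reinforcement learning, reflects the effect of a scheduling decision on system throughput and user fairness. For example, if scheduling decision A of the scheduler improves system throughput and user fairness, the terminal device's benefit feedback for decision A is high; likewise, if scheduling decision B lowers system throughput and user fairness, the benefit feedback for decision B is low. The scheduler can therefore keep updating its scheduling decisions according to the benefit feedback sent by the terminal device and so determine the best scheduling decision.
b. The scheduler outputs a scheduling decision, and the BS encodes and sends the DCI. For example, the scheduler determines the current scheduling decision from the benefit feedback obtained in frame n, the BS encodes that decision into DCI#1, and the BS sends DCI#1 to the UE in frame n+1.
c. The benefit feedback after the scheduling decision is obtained over the uplink. For example, after receiving DCI#1 from the BS, the UE decodes it and, based on the decoding result, sends benefit feedback (that is, uplink data) to the BS; correspondingly, the BS scheduler obtains this feedback in frame n+1.
d. The decision is updated, and DCI is encoded and sent again. For example, based on the uplink data, the BS scheduler makes a new decision in frame n+1, encodes it as DCI#2, and sends DCI#2 to the UE in frame n+2.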
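The text says only that benefit feedback reflects throughput and user fairness; it does not give a formula. As a hedged illustration, the sketch below combines total throughput with Jain's fairness index. The choice of Jain's index, the weighting, and the names `jain_fairness` and `benefit_feedback` are assumptions for this example, not the application's definition.

```python
def jain_fairness(rates):
    """Jain's fairness index: 1.0 when all users get equal rates, 1/n when one user gets everything."""
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 0.0
    return total * total / (len(rates) * squares)


def benefit_feedback(rates, fairness_weight=0.5):
    """Hypothetical scalar benefit: throughput, partly discounted by unfairness."""
    throughput = sum(rates)
    fair_part = throughput * jain_fairness(rates)
    return (1 - fairness_weight) * throughput + fairness_weight * fair_part


even = benefit_feedback([2.0, 2.0])    # same throughput, evenly shared
skewed = benefit_feedback([4.0, 0.0])  # same throughput, one user starved
```

With equal total throughput, the evenly shared allocation scores higher, which matches the qualitative behaviour described for decisions A and B.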
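This scheduling procedure can be limited by step c, obtaining the benefit after the decision over the uplink. First, the time that step c takes directly affects how quickly the BS scheduler responds; if the feedback takes too long, the scheduler's dynamic adaptability degrades. Second, the frame structure strongly affects the timeliness of step c. For example, with 10 ms system frames, the BS sends DCI in subframe 0 of frame n+1. If the UE reports the scheduling benefit in subframe 9 of frame n, the DCI in subframe 0 of frame n+1 cannot reflect that feedback, because DCI encoding must start in advance. The air-interface feedback therefore lags, and deep reinforcement learning cannot be trained effectively under this timing.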
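The subframe 9 example can be checked with a small timing calculation. The sketch below assumes 10 subframes per frame, as in the example, and a hypothetical `lead_subframes` parameter standing for the time needed to compute the decision and encode the DCI; the application does not give a value for that lead time.

```python
SUBFRAMES_PER_FRAME = 10  # 10 ms frame split into 1 ms subframes


def absolute_subframe(frame, subframe):
    """Map a (frame, subframe) pair onto a single running subframe index."""
    return frame * SUBFRAMES_PER_FRAME + subframe


def feedback_usable(feedback_at, dci_at, lead_subframes):
    """True if feedback received at feedback_at can still shape the DCI sent at dci_at."""
    ready = absolute_subframe(*feedback_at) + lead_subframes
    return ready <= absolute_subframe(*dci_at)


n = 100  # arbitrary frame number for the example
# Feedback in subframe 9 of frame n, DCI in subframe 0 of frame n+1: too late.
late = feedback_usable((n, 9), (n + 1, 0), lead_subframes=2)
# Sending the DCI one frame later leaves enough room.
relaxed = feedback_usable((n, 9), (n + 2, 0), lead_subframes=2)
```

Under these assumed numbers the first case fails and the second succeeds, which is the motivation for delaying the decision by more than one time unit.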
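In view of this, this application proposes a scheduling method to address the effect of air-interface feedback lag on a deep reinforcement learning scheduler.
In all embodiments of this application, the time domain may include one or more time units. For example, the time domain may be divided into frames of 10 ms, each frame into 10 equal subframes of 1 ms, and each subframe may contain one or more slots. By way of example and not limitation, the time unit in the embodiments of this application is a frame.
Refer to FIG. 4, which is a schematic block diagram of the scheduling method provided in an embodiment of this application. The method is applied to a scheduling system formed by at least one scheduler.
The scheduling procedure of the first scheduler includes:
401. The first scheduler obtains first benefit feedback in the i-th time unit, where i ≥ 1 and i is an integer.
The terminal device sends the first benefit feedback to the network device in the i-th time unit, and the first scheduler obtains it from the network device in the same time unit. The first benefit feedback is determined by the terminal device based on the scheduling decision most recently output by the first scheduler (that is, the second scheduling decision).
It should be understood that the network device here may be the network device described in FIG. 1; for example, it may be a BS or, more specifically, the BS baseband.
402. The first scheduler determines a first scheduling decision based on the first benefit feedback.
403. The first scheduler sends the first scheduling decision in the (i+N)-th time unit.
Optionally, the value of N may be specified by the communication system or a communication protocol.
For example, if the i-th time unit is frame n, the first scheduler may send the first scheduling decision in the N-th frame after frame n, where N > 1 and N is an integer. N may be, for instance, 2, 3, or 5: after obtaining the first benefit feedback in frame n, the first scheduler determines the first scheduling decision from it and then sends the decision in frame (n+2), frame (n+3), or frame (n+5).
The first scheduler sends the first scheduling decision to the network device for DCI encoding, and the network device sends the DCI to the terminal device in the (i+N)-th time unit.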
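The timing rule in steps 401 to 403 reduces to a single addition with a constraint on N. A minimal sketch follows; the function name `decision_send_unit` is hypothetical.

```python
def decision_send_unit(feedback_unit, n_delay):
    """Time unit in which the decision based on feedback from feedback_unit is sent.

    The method requires N to be an integer greater than 1, so that at least
    one full time unit lies between the feedback and the decision.
    """
    if not isinstance(n_delay, int) or n_delay <= 1:
        raise ValueError("N must be an integer greater than 1")
    return feedback_unit + n_delay


n = 10  # feedback obtained in frame n
send_frames = [decision_send_unit(n, N) for N in (2, 3, 5)]
```

For feedback in frame n = 10, the three values of N from the example give frames 12, 13, and 15, that is, frames (n+2), (n+3), and (n+5).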
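Optionally, the scheduling system may further include one or more second schedulers. The second scheduler and the first scheduler each output scheduling decisions for the same task, and they take turns delivering scheduling decisions and obtaining scheduling benefit feedback.
It should be understood that "the same task" here refers to the process, within one scheduling system, in which the system side decides whether a given terminal device is scheduled or allocates time-frequency resources to that terminal device.
The scheduling procedure of the second scheduler includes:
(1) The second scheduler obtains second benefit feedback in the (i+j)-th time unit, where 1 ≤ j ≤ N-1 and j is an integer. For example, j = 1 when N = 2, and j = 1, 2, 3, or 4 when N = 5.
The terminal device sends the second benefit feedback to the network device in the (i+j)-th time unit, and the second scheduler obtains it from the network device in the same time unit. The second benefit feedback is determined by the terminal device based on the scheduling decision most recently output by the second scheduler (that is, the fourth scheduling decision).
(2) The second scheduler determines a third scheduling decision based on the second benefit feedback.
(3) The second scheduler sends the third scheduling decision in the (i+j+M)-th time unit, where M > 1 and M is an integer.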
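The scheduling of the first scheduler and the second scheduler is illustrated below.
For example, if the i-th time unit is frame n, the first scheduler may send the first scheduling decision in the N-th frame after frame n, where N > 1 and N is an integer. N may be, for instance, 2, 3, or 5: after obtaining the first benefit feedback in frame n, the first scheduler determines the first scheduling decision from it and then sends the decision in frame (n+2), frame (n+3), or frame (n+5).
The second scheduler sends the third scheduling decision to the network device for DCI encoding, and the network device sends the DCI to the terminal device in the (i+j+M)-th time unit.
Optionally, the value of M may be specified by the communication system or a communication protocol.
The scheduling period M of the second scheduler may or may not equal the scheduling period N of the first scheduler.
For example, when N = 2, the first scheduler delivers scheduling decisions and obtains scheduling benefit feedback in frames n, (n+2), (n+4), (n+6), and so on.
When M = N = 2 and j = 1, the second scheduler delivers scheduling decisions and obtains scheduling benefit feedback in frames (n+1), (n+3), (n+5), (n+7), and so on.
When M = 4 ≠ N and j = 1, the second scheduler delivers scheduling decisions and obtains scheduling benefit feedback in frames (n+1), (n+5), (n+9), (n+13), and so on.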
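The frame sequences in these examples follow directly from a start frame and a period. The helper below is a hypothetical illustration (`scheduler_frames` is not a name from the application), with n set to 0 for readability.

```python
def scheduler_frames(start, period, how_many):
    """Frames in which a scheduler delivers decisions and obtains feedback."""
    return [start + k * period for k in range(how_many)]


n = 0
first = scheduler_frames(n, period=2, how_many=4)           # N = 2
second_same = scheduler_frames(n + 1, period=2, how_many=4)  # M = N = 2, j = 1
second_slow = scheduler_frames(n + 1, period=4, how_many=4)  # M = 4, j = 1
```

With n = 0, these lists are the offsets 0, 2, 4, 6 for the first scheduler and 1, 3, 5, 7 or 1, 5, 9, 13 for the second scheduler, matching the three cases above.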
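When M ≠ N, the two schedulers may land on the same time unit. For example, if the period N of the first scheduler is 3, the first scheduler delivers scheduling decisions and obtains scheduling benefit feedback in frames n, (n+3), (n+6), (n+9), and so on; if the period M of the second scheduler is 4, the second scheduler does so in frames (n+1), (n+5), (n+9), (n+13), and so on. Both schedulers then deliver scheduling decisions and obtain scheduling benefit feedback in frame (n+9).
In this case, optionally, the network device selects only one of the schedulers. The selected scheduler proceeds with its preset scheduling period as usual; the scheduler that is not selected skips this round of obtaining benefit feedback, computing a decision, and delivering the decision. Having received no scheduling benefit within its receive window for this round, the unselected scheduler resends its previous scheduling decision at its next scheduling occasion. The receive window for scheduling benefit runs from the previous delivery of a scheduling decision to the latest arrival time of the benefit that the current decision computation requires.
In this case, optionally, the network device selects the scheduler in a predefined manner: it may always select the same one, select by round robin, or select based on other parameters (for example, assigning the time unit to the scheduler with the lower load).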
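The collision case and the three selection options can be sketched as follows. The function names, the policy strings, and the load values are assumptions made for this example; the application names the options but does not prescribe an interface.

```python
def overlapping_frames(start_a, period_a, start_b, period_b, horizon):
    """Frames below horizon that both schedulers would use."""
    frames_a = set(range(start_a, horizon, period_a))
    return [f for f in range(start_b, horizon, period_b) if f in frames_a]


def pick_scheduler(policy, candidates, loads=None, turn=0):
    """Choose which scheduler keeps a contested frame (hypothetical policies)."""
    if policy == "fixed":
        return candidates[0]
    if policy == "round_robin":
        return candidates[turn % len(candidates)]
    if policy == "least_load":
        return min(candidates, key=lambda name: loads[name])
    raise ValueError(f"unknown policy: {policy}")


n = 0
clashes = overlapping_frames(n, 3, n + 1, 4, horizon=25)  # N = 3, M = 4
winner = pick_scheduler("least_load", ["first", "second"],
                        loads={"first": 0.7, "second": 0.2})
```

With N = 3 and M = 4, the first contested frame is (n+9), as in the text; the scheduler that loses a contested frame would skip that round and resend its previous decision at its next occasion.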
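Optionally, during decision-making, the first and second schedulers may also exchange scheduling decisions.
Optionally, the first and second schedulers exchange decisions after a preset period of time.
Optionally, the first and second schedulers exchange decisions after a preset number of scheduling decisions.
Optionally, the information exchanged may be the most recent scheduling decision and the benefit corresponding to it. For example, the first scheduler may send the first scheduling decision and its benefit feedback to the second scheduler, or the second scheduler may send the third scheduling decision and its benefit feedback to the first scheduler.
Optionally, the first and second schedulers correct their own scheduling decisions based on the exchanged information, so that the decisions of the schedulers gradually converge in the same direction.
Optionally, after receiving the exchanged information, the first and second schedulers may instead leave their decisions unadjusted, and each simply compares the received information with its own scheduling decision to assess the difference between the decisions output by the two schedulers.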
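The application does not say how a scheduler corrects its decision after an exchange. One simple possibility, shown here purely as an assumption, is to treat each scheduler's decision as a parameter vector and move it a fraction of the way toward the peer's vector; `mix_parameters`, `decision_gap`, and the mixing rate are hypothetical.

```python
def mix_parameters(own, peer, rate=0.25):
    """Move own parameters a fraction `rate` of the way toward the peer's."""
    return [(1 - rate) * o + rate * p for o, p in zip(own, peer)]


def decision_gap(own, peer):
    """Largest per-parameter difference, usable for comparison without adjusting."""
    return max(abs(o - p) for o, p in zip(own, peer))


first = [0.0, 2.0]
second = [1.0, 0.0]
gap_before = decision_gap(first, second)

# Both schedulers update from the values exchanged in the same round.
first_next = mix_parameters(first, second)
second_next = mix_parameters(second, first)
gap_after = decision_gap(first_next, second_next)
```

The gap shrinks after each exchange, which is one way the decisions could "converge in the same direction"; the comparison-only option corresponds to calling `decision_gap` without applying `mix_parameters`.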
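In the foregoing technical solution, when the scheduler obtains the benefit feedback for the previous scheduling decision, it does not have to finish computing the scheduling decision in the current time unit, and the network device does not have to send the DCI in the very next time unit. Instead, the DCI is sent one or more time units later, independent of the air-interface frame structure, which leaves the scheduler enough time to compute and encode the decision. This effectively resolves the conflict between the scheduling procedure and the air-interface timing that arises when the scheduler cannot obtain the benefit of the previous scheduling decision in time. In addition, because the first scheduler outputs decisions across time units in the time domain, one or more second schedulers can use the time units that the first scheduler does not use to make scheduling decisions for the same task, which improves the utilization of the system's time-frequency resources.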
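Refer to FIG. 5, which is a schematic flowchart of the uplink BS scheduling method provided in an embodiment of this application. FIG. 5 includes two schedulers, scheduler A and scheduler B. It should be understood that FIG. 5 shows two schedulers only as an example; the number of schedulers in this application is not limited to the two in FIG. 5 and may be three or more.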
当BS使用基于深度强化学习调度器时,***采用调度器A(即第二调度器的一例)和调度器B(即第一调度器的一例)共同协作调度,各调度器轮循获取调度输入与输出调度决策,调度器A确定的调度决策和调度器B确定的调度决策分别为调度器A和调度器B对同一任务的调度决策,具体的上行调度过程如下:
对于调度器B:
101、UE在帧n中接收与译码DCI#1。
应理解,这里的DCI#1是BS基带根据调度器B最近一次输出的调度决策(即第二调度决策的一例)编码生成的。
102、UE在帧n中对上行数据进行编码与发送。
UE根据接收到的DCI#1信息对上行数据进行PUSCH编码并在帧n中向BS基带发送编码后的上行数据。
103、BS基带在帧n中接收与译码上行数据。
BS基带在帧n中接收上行数据并对上行数据进行译码,从译码后的数据中获取上次上行调度决策的收益反馈R1(即第一收益反馈的一例)并发送给调度器B。
需要说明的是,在上行BS调度流程中,终端设备不直接计算收益反馈,只根据接收到的DCI信息进行上行数据编码,由BS根据接收到的上行数据计算获取相应的收益反馈。
104、调度器B接收BS基带发送的收益反馈R1。
105、调度器B根据收益反馈R1计算调度决策B1(即第一调度决策的一例)。
106、调度器B向BS基带输出调度决策B1。
107、BS基带编码DCI#2,并在帧n+2中发送DCI#2。
BS基带根据从调度器B获取的调度决策B1,将其整合入DCI并进行DCI编码,生 成DCI#2,BS基带在帧(n+2)中向UE发送DCI#2。
应理解,DCI编码只要在接收到调度决策B1后就可以开始编码,同时在BS基带发送DCI前完成即可。
108、UE在帧(n+2)中接收与译码DCI#2。
应理解,步骤108与步骤101是相同的,表示终端设备与调度器开始进入下一轮学习。
109、UE在帧(n+2)中对上行数据进行编码与发送。
UE根据接收到的DCI#2对上行数据进行PUSCH编码并在帧(n+2)中向BS基带发送编码后的上行数据。
110、BS基带在帧(n+2)中接收与译码上行数据。
BS在帧(n+2)中接收上行数据并对上行数据进行译码,从译码后的数据中获取本次上行调度的收益反馈R2并发送给调度器B。
111、调度器B接收收益反馈R2。
112、调度器B根据收益反馈R2计算调度决策B2。
113、调度器B向BS基带输出调度决策B2。步骤113与步骤106操作相同。后续循环操作参见步骤106之后的操作。
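Scheduler A follows the same scheduling procedure as scheduler B; the difference is that the two schedulers make their scheduling decisions in different time units.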
对于调度器A:
201、调度器A在帧(n-1)中接收收益反馈R3。
202、调度器A根据收益反馈R3计算调度决策A1。
203、调度器A向BS基带输出调度决策A1(即第四调度决策的一例)。
204. The BS baseband encodes DCI#3 and sends DCI#3 in frame (n+1).
BS基带根据从调度器A获取的调度决策A1,将其整合入DCI并进行DCI编码,生成DCI#3,BS基带在帧(n+1)中向UE发送DCI#3。
应理解,DCI编码只要在接收到调度器A的调度决策后就可以开始编码,同时在BS基带发送DCI前完成即可。
205、UE在帧(n+1)中接收与译码DCI#3。
206、UE在帧(n+1)中对上行数据进行编码与发送。
UE根据接收到的DCI#3对上行数据进行PUSCH编码并在帧(n+1)中向BS基带发送编码后的上行数据。
207、BS基带在帧(n+1)中接收与译码上行数据。
BS在帧(n+1)中接收上行数据并对上行数据进行译码,从译码后的数据中获取本次上行调度的收益反馈R4(即第二收益反馈的一例)并发送给调度器A。
208、调度器A接收收益反馈R4。
209、调度器A根据收益反馈R4计算调度决策A2(即第三调度决策的一例)。
210、调度器A向BS基带输出调度决策A2。
211、BS基带编码DCI#4,并在帧(n+3)中发送DCI#4。
BS基带根据从调度器A获取的调度决策A2,将其整合入DCI并进行DCI编码,生成DCI#4,BS基带在帧(n+3)中向UE发送DCI#4。步骤211与步骤204操作相同。后 续循环操作参见步骤204之后的操作。
可选的,在调度器的决策过程中,调度***中的多个调度器还可以进行调度决策的交互,交互的周期可由用户或***定义。
可选的,多个调度器经过预设的时间段进行决策交互。
可选的,多个调度器达到预设的调度决策次数进行决策交互。
可选的,多个调度器交互信息可以为最近一次的调度决策与该决策对应的收益。例如:调度器A可以将调度决策A1和收益反馈R4发送给调度器B,或者调度器B可以将调度决策B1和收益反馈R2发送给调度器A。
可选的,调度器A、B根据收到的交互信息修正各自的调度决策,使多个调度器的调度决策逐渐收敛于同一个方向。
可选的,调度器A、B在收到的交互信息后,不进行决策调整,各自根据接收到的信息与自己的调度决策,比较两个调度器输出的调度决策之间的差异。
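In the foregoing technical solution, which applies more broadly than a single-scheduler setup, schedulers A and B alternately obtain uplink benefit feedback and output scheduling decisions into DCI; the UE receives and decodes the DCI and encodes uplink data according to the scheduling decision. After receiving and decoding the uplink data, the BS feeds the scheduling benefit back alternately to the corresponding scheduler. After the BS baseband obtains a scheduling decision from one scheduler and encodes it into DCI, it feeds the next scheduling benefit back to that same scheduler; the next time, the BS baseband obtains a scheduling decision from the other scheduler and feeds the following scheduling benefit back to it. Using multiple schedulers effectively improves the schedulers' adaptability to changes in the air-interface environment. In addition, the schedulers can exchange information and adjust their own scheduling parameters, so that the decisions of different schedulers stay similar or identical and the scheduling benefit is maximized.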
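The rule that the BS baseband returns each benefit to the scheduler whose decision produced it can be sketched as a small bookkeeping class. `FeedbackRouter` and its methods are hypothetical names; the sketch assumes, as in the FIG. 5 flow, that the feedback for a DCI arrives in the same frame in which that DCI was sent.

```python
class FeedbackRouter:
    """Remembers which scheduler's decision went out in each frame."""

    def __init__(self):
        self._owner_of_frame = {}

    def dci_sent(self, frame, scheduler):
        # Record the scheduler whose decision was encoded into this frame's DCI.
        self._owner_of_frame[frame] = scheduler

    def route(self, frame):
        # Feedback received in `frame` goes back to that frame's scheduler.
        # Raises KeyError if no DCI was recorded for the frame.
        return self._owner_of_frame.pop(frame)


router = FeedbackRouter()
router.dci_sent(0, "B")  # DCI#1 from scheduler B in frame n (n = 0 here)
router.dci_sent(1, "A")  # DCI#3 from scheduler A in frame n+1
targets = [router.route(0), router.route(1)]
```

Each benefit is delivered once and only to the owning scheduler, so schedulers A and B learn from their own decisions even though they share the air interface.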
参见图6,图6是本申请实施例提供的下行BS调度方法的示意性流程图。图6中包括2个调度器,分别为调度器A和调度器B,应理解,图6只是示例性的给出2个调度器,本申请中的调度器的数量不局限于图6中的2个调度器,可以是3个或多个。
当BS使用基于深度强化学习调度器时,***采用调度器A和调度器B共同协作调度,各调度器轮循获取调度输入与执行调度决策。具体的下行调度过程如下:
对于调度器B:
601、UE在帧n中接收与译码DCI#1。
应理解,这里的DCI#1是BS基带根据调度器B最近一次输出的调度决策(即第二调度决策的一例)编码生成的。
602、UE在帧n中发送收益反馈R1。
UE根据接收到的DCI#1信息确定收益反馈R1并进行编码,并在帧n中向BS基带发送编码后的收益反馈R1。
需要说明的是,与上行BS调度流程不同的是,在下行BS调度流程中,终端设备直接计算收益反馈并将计算出的收益反馈编码后反馈给BS基带,BS基带只需要对接收到的数据进行译码即可直接获取本次的收益反馈。
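603. The BS baseband receives the benefit feedback R1 in frame n.
The BS baseband receives the uplink data in frame n, decodes it, obtains the benefit feedback R1 (an example of the first benefit feedback) from the decoded data, and sends it to scheduler B.
604. Scheduler B receives the benefit feedback R1 sent by the BS baseband.
605. Scheduler B computes scheduling decision B1 (an example of the first scheduling decision) from the benefit feedback R1.
606. Scheduler B outputs scheduling decision B1 to the BS baseband.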
607、BS基带编码DCI#2,并在帧n+2中发送DCI#2。
BS基带根据从调度器B获取的调度决策B1,将其整合入DCI#2并决策下行数据进行PDSCH编码,BS基带在帧(n+2)中向UE发送DCI#2和编码后的下行数据。
应理解,BS基带在接收到调度决策后就可以开始DCI编码,同时在BS基带发送DCI前完成即可。
608、UE在帧(n+2)中接收与译码DCI#2。
应理解,步骤608与步骤601是相同的,表示终端设备与调度器开始进入下一轮学习。
609、UE在帧(n+2)中发送收益反馈R2。
The UE determines the benefit feedback R2 from the received DCI#2, encodes it, and sends the encoded benefit feedback R2 to the BS baseband in frame (n+2).
610、BS基带在帧(n+2)中接收收益反馈R2。
BS在帧(n+2)中接收上行数据并对上行数据进行译码,从译码后的数据中获取收益反馈R2并发送给调度器B。
611、调度器B接收收益反馈R2。
612、调度器B根据收益反馈R2计算调度决策B2。
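613. Scheduler B outputs scheduling decision B2 to the BS baseband. Step 613 is the same operation as step 606; for the subsequent loop, see the operations after step 606.
Scheduler A follows the same scheduling procedure as scheduler B; the difference is that the two schedulers make their scheduling decisions in different time units.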
对于调度器A:
701、调度器A在帧(n-1)中接收收益反馈R3。
702、调度器A根据收益反馈R3计算调度决策A1。
703、调度器A向BS基带输出调度决策A1(即第四调度决策的一例)。
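704. The BS baseband encodes DCI#3 and sends DCI#3 in frame (n+1).
Based on scheduling decision A1 obtained from scheduler A, the BS baseband integrates the decision into DCI#3, PDSCH-encodes the downlink data according to the decision, and sends DCI#3 and the downlink data to the UE in frame (n+1).
It should be understood that the BS baseband can start DCI encoding as soon as it receives the scheduling decision, as long as encoding finishes before the BS baseband sends the DCI.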
705、UE在帧(n+1)中接收与译码DCI#3。
706、UE在帧(n+1)中发送收益反馈R4。
UE根据接收到的DCI#3信息确定收益反馈R4并进行编码,并在帧(n+1)中向BS基带发送编码后的收益反馈R4。
707、BS基带在帧(n+1)中接收收益反馈R4。
BS在帧(n+1)中接收上行数据并对上行数据进行译码,从译码后的数据中获取收益反馈R4(即第二收益反馈的一例)并发送给调度器A。
708、调度器A接收收益反馈R4。
709、调度器A根据收益反馈R4计算调度决策A2(即第三调度决策的一例)。
710、调度器A向BS基带输出调度决策A2。
711、BS基带编码DCI#4,并在帧(n+3)中发送DCI#4。
BS基带根据从调度器A获取的调度决策A2,将其整合入DCI#4并决策下行数据进行PDSCH编码,BS基带在帧(n+3)中向UE发送DCI#4和下行数据。步骤711与步骤704操作相同。后续循环操作参见步骤704之后的操作。
在下行调度流程中,调度器A、B也可以进行调度决策的交互和调整,这里不再赘述。
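In the foregoing technical solution, schedulers A and B alternately obtain downlink benefit feedback, output scheduling decisions into DCI, and have the downlink data encoded according to those decisions; the UE receives and decodes the DCI and determines the benefit feedback for the scheduling decision. After receiving that feedback, the BS feeds the scheduling benefit back alternately to the corresponding scheduler. After the BS baseband obtains a scheduling decision from one scheduler and encodes it into DCI, it feeds the next scheduling benefit back to that same scheduler; the next time, the BS baseband obtains a scheduling decision from the other scheduler and feeds the following scheduling benefit back to it. This applies more broadly than a single-scheduler setup and is not tied to the air-interface frame structure: for example, a scheduler has enough time to compute a scheduling decision after receiving benefit feedback, and the BS baseband has enough time to encode after receiving the decision, which addresses the effect of air-interface feedback lag on a deep reinforcement learning scheduler. In addition, the schedulers can exchange information and adjust their own scheduling parameters, so that the decisions of different schedulers stay similar or identical and the scheduling benefit is maximized.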
参见图7,图7是本申请实施例提供的多个BS调度器执行调度方法的示意性流程图。图7中包括3个调度器,分别为调度器A、调度器B和调度器C。
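As FIG. 7 shows, the three schedulers are scheduled in the order B, A, C, B, A, C. Scheduler B obtains benefit feedback from the BS baseband in frame n, and the BS outputs scheduler B's decision in frame (n+3); scheduler A obtains benefit feedback from the BS baseband in frame (n+1), and the BS outputs scheduler A's decision in frame (n+4); scheduler C obtains benefit feedback from the BS baseband in frame (n+2), and the BS outputs scheduler C's decision in frame (n+5). For the detailed procedures of schedulers A, B, and C, refer to the descriptions of FIG. 5 and FIG. 6; details are not repeated here.
Similarly, when there are Q schedulers (Q > 3 and Q is an integer), scheduler A obtains benefit feedback from the BS baseband in frame (n+1), and the BS outputs scheduler A's decision in frame (n+1+N); scheduler B obtains benefit feedback in frame n, and the BS outputs scheduler B's decision in frame (n+N); scheduler C obtains benefit feedback in frame (n+2), and the BS outputs scheduler C's decision in frame (n+2+N), where N ≥ Q and N is an integer. The remaining schedulers can follow the same pattern for benefit feedback and decision output in the reserved frames; details are not repeated here.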
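The staggering rule for several schedulers can be written as a short planning function. This is a sketch under the stated condition N ≥ Q; the name `staggered_plan` and the tuple layout are assumptions for illustration.

```python
def staggered_plan(names, base_frame, n_delay):
    """Assign each scheduler a (feedback_frame, decision_frame) pair.

    Scheduler k in the list gets feedback in frame base_frame + k and has its
    decision sent n_delay frames later. n_delay must be at least the number of
    schedulers so that no two schedulers share a frame within one cycle.
    """
    if n_delay < len(names):
        raise ValueError("N must be at least the number of schedulers")
    return {name: (base_frame + k, base_frame + k + n_delay)
            for k, name in enumerate(names)}


n = 0
plan = staggered_plan(["B", "A", "C"], base_frame=n, n_delay=3)
```

For the three schedulers of FIG. 7 with N = 3, this gives B: n and (n+3), A: (n+1) and (n+4), C: (n+2) and (n+5), the same frames as in the text.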
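It should be noted that the alternating order of the schedulers in this application is not limited to strict sequential output, as long as the scheduling order across time units follows a regular pattern. For example, for schedulers A and B in FIG. 5 and FIG. 6, the order may also be a regular periodic sequence such as A, B, B, A, B, B, A or A, A, B, B, A, A, B, B; for schedulers A, B, and C in FIG. 7, the order may also be a regular periodic sequence such as A, B, A, C, A, B, A, C or A, B, B, A, C, A, B, B, A, C.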
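Any such periodic order can be produced by cycling through a fixed pattern. The helper below is a hypothetical illustration built on the standard `itertools` module.

```python
from itertools import cycle, islice


def order_for_frames(pattern, num_frames):
    """Repeat a scheduling pattern over consecutive frames."""
    return list(islice(cycle(pattern), num_frames))


two_schedulers = order_for_frames(["A", "B", "B"], 7)
three_schedulers = order_for_frames(["A", "B", "A", "C"], 8)
```

The first call reproduces the sequence A, B, B, A, B, B, A and the second reproduces A, B, A, C, A, B, A, C from the examples above.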
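The scheduling method provided in this application has been described in detail above. The scheduling apparatus provided in this application is described below.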
参见图8,图8为本申请提供的调度装置1000的示意性框图。如图8,调度装置1000 包括发送单元1100、接收单元1200以及处理单元1300。
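The sending unit 1100 is configured to send first benefit feedback in an i-th time unit, where i ≥ 1 and i is an integer. The receiving unit 1200 is configured to receive, in an (i+N)-th time unit, a first scheduling decision that the first scheduler determined based on the first benefit feedback, where N > 1 and N is an integer.
The processing unit 1300 is configured to determine the first benefit feedback based on a second scheduling decision, where the second scheduling decision is the previous scheduling decision that the first scheduler determined before the first scheduling decision.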
可选地,在一个实施例中,发送单元1100,还用于在第i+j个时间单元发送第二收益反馈,其中,1≤j≤N-1且j为整数;
接收单元1200,用于在第i+j+M个时间单元接收第二调度器根据第二收益反馈确定的第三调度决策,其中,M>1且M为整数,
处理单元1300,用于根据第四调度决策确定第二收益反馈,第四调度决策为第二调度器在第三调度决策之前确定的上一次的调度决策,第一调度器确定的调度决策和第二调度器确定的调度决策分别为第一调度器和第二调度器对同一任务的调度决策。
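Optionally, the sending unit 1100 and the receiving unit 1200 may also be integrated into one transceiver unit that both receives and sends; this is not limited here.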
可选地,在一个实施例中,N等于2。
可选地,在一个实施例中,N的值是通信***或通信协议规定的。
在一种实现方式中,调度装置1000可以为方法实施例中的终端设备。在这种实现方式中,发送单元1100可以为发射器,接收单元1200可以为接收器。接收器和发射器也可以集成为一个收发器。处理单元1300可以为处理装置。
在另一种实现方式中,调度装置1000可以为安装在终端设备中的芯片或集成电路。在这种实现方式中,发送单元1100和接收单元1200可以为通信接口或者接口电路。例如,发送单元1100为输出接口或输出电路,接收单元1200为输入接口或输入电路,处理单元1300可以为处理装置。
其中,处理装置的功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。例如,处理装置可以包括存储器和处理器,其中,存储器用于存储计算机程序,处理器读取并执行存储器中存储的计算机程序,使得调度装置1000执行各方法实施例中由终端设备执行的操作和/或处理。可选地,处理装置可以仅包括处理器,用于存储计算机程序的存储器位于处理装置之外。处理器通过电路/电线与存储器连接,以读取并执行存储器中存储的计算机程序。又例如,处理装置可以芯片或集成电路。
参见图9,图9为本申请提供的调度装置2000的示意性框图。如图9,调度装置2000包括发送单元2100、接收单元2200以及处理单元2300。
处理单元2300,用于在第i个时间单元获取第一收益反馈,其中,i≥1且i为整数;
所述处理单元2300,还用于根据所述第一收益反馈确定第一调度决策,其中,所述第一收益反馈是终端设备根据第二调度决策确定的,所述第二调度决策为所述处理单元在所述第一调度决策之前确定的上一次的调度决策;
发送单元2100,用于在所述第i+N个时间单元之前发送第一调度决策,其中,N>1且N为整数。
可选地,发送单元2100和接收单元2200也可以集成为一个收发单元,同时具备接收 和发送的功能,这里不作限定。
在一种实现方式中,调度装置2000可以为方法实施例中的第一调度器或第二调度器。在这种实现方式中,发送单元2100可以为发射器,接收单元2200可以为接收器。接收器和发射器也可以集成为一个收发器。处理单元2300可以为处理装置。
在另一种实现方式中,调度装置2000可以为安装在第一调度器或第二调度器中的芯片或集成电路。在这种实现方式中,发送单元2100和接收单元2200可以为通信接口或者接口电路。例如,发送单元2100为输出接口或输出电路,接收单元2200为输入接口或输入电路,处理单元2300可以为处理装置。
其中,处理装置的功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。例如,处理装置可以包括存储器和处理器,其中,存储器用于存储计算机程序,处理器读取并执行存储器中存储的计算机程序,使得调度装置2000执行各方法实施例中由第一调度器和第二调度器执行的操作和/或处理。可选地,处理装置可以仅包括处理器,用于存储计算机程序的存储器位于处理装置之外。处理器通过电路/电线与存储器连接,以读取并执行存储器中存储的计算机程序。又例如,处理装置可以芯片或集成电路。
参见图10,图10为本申请提供的通信装置10的示意性结构图。如图10,通信装置10包括:一个或多个处理器11,一个或多个存储器12以及一个或多个通信接口13。处理器11用于控制通信接口13收发信号,存储器12用于存储计算机程序,处理器11用于从存储器12中调用并运行该计算机程序,以使得本申请各方法实施例中由终端设备执行的流程和/或操作被执行。
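For example, the processor 11 may have the function of the processing unit 1300 shown in FIG. 8, and the communication interface 13 may have the function of the sending unit 1100 and/or the receiving unit 1200 shown in FIG. 8. Specifically, the processor 11 may be configured to perform the processing or operations performed internally by the terminal device in FIG. 4 to FIG. 7, and the communication interface 13 is configured to perform the sending and/or receiving actions performed by the terminal device in FIG. 4 to FIG. 7.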
在一种实现方式中,通信装置10可以为方法实施例中的终端设备。在这种实现方式中,通信接口13可以为收发器。收发器可以包括接收器和发射器。
可选地,处理器11可以为基带装置,通信接口13可以为射频装置。
在另一种实现中,通信装置10可以为安装在终端设备中的芯片。在这种实现方式中,通信接口13可以为接口电路或者输入/输出接口。
参见图11,图11是本申请提供的通信装置20的示意性结构图。如图11,通信装置20包括:一个或多个处理器21,一个或多个存储器22以及一个或多个通信接口23。处理器21用于控制通信接口23收发信号,存储器22用于存储计算机程序,处理器21用于从存储器22中调用并运行该计算机程序,以使得本申请各方法实施例中由第一调度器或第二调度器执行的流程和/或操作被执行。
例如,处理器21可以具有图9中所示的处理单元2300的功能,通信接口23可以具有图9中所示的发送单元2100和接收单元2200的功能。具体地,处理器21可以用于执行图4-图7中由第一调度器或第二调度器内部执行的处理或操作,通信接口23用于执行图4-图7中由第一调度器或第二调度器执行的发送和/或接收的动作,不再赘述。
可选的,上述各装置实施例中的处理器与存储器可以是物理上相互独立的单元,或者,存储器也可以和处理器集成在一起,本文不做限定。
此外,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机指令,当计算机指令在计算机上运行时,使得本申请各方法实施例中由终端设备执行的操作和/或流程被执行。
本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机指令,当计算机指令在计算机上运行时,使得本申请各方法实施例中由第一调度器或第二调度器执行的操作和/或流程被执行。
本申请还提供一种计算机程序产品,计算机程序产品包括计算机程序代码或指令,当计算机程序代码或指令在计算机上运行时,使得本申请各方法实施例中由终端设备执行的操作和/或流程被执行。
本申请还提供一种计算机程序产品,计算机程序产品包括计算机程序代码或指令,当计算机程序代码或指令在计算机上运行时,使得本申请各方法实施例中由第一调度器或第二调度器执行的操作和/或流程被执行。
此外,本申请还提供一种芯片,所述芯片包括处理器。用于存储计算机程序的存储器独立于芯片而设置,处理器用于执行存储器中存储的计算机程序,以使得任意一个方法实施例中由终端设备执行的操作和/或处理被执行。
进一步地,所述芯片还可以包括通信接口。所述通信接口可以是输入/输出接口,也可以为接口电路等。进一步地,所述芯片还可以包括所述存储器。
本申请还提供一种芯片,所述芯片包括处理器。用于存储计算机程序的存储器独立于芯片而设置,处理器用于执行存储器中存储的计算机程序,以使得任意一个方法实施例中由第一调度器或第二调度器执行的操作和/或处理被执行。
进一步地,所述芯片还可以包括通信接口。所述通信接口可以是输入/输出接口,也可以为接口电路等。进一步地,所述芯片还可以包括所述存储器。
此外,本申请还提供一种调度***,包括本申请实施例中的终端设备、网络设备、第一调度器及第二调度器中的部分或全部。
本申请实施例中的处理器可以是集成电路芯片,具有处理信号的能力。在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。处理器可以是通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。本申请实施例公开的方法的步骤可以直接体现为硬件编码处理器执行完成,或者用编码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓 存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DRRAM)。应注意,本文描述的***和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的***、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的***、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
本申请中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。其中,A、B以及C均可以为单数或者复数,不作限定。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (24)

  1. 一种调度方法,其特征在于,应用于由至少一个调度器组成的调度***中,所述调度***包括第一调度器,所述方法包括:
    所述第一调度器在第i个时间单元获取第一收益反馈,其中,i≥1且i为整数;
    所述第一调度器根据所述第一收益反馈确定第一调度决策,其中,所述第一收益反馈是终端设备根据第二调度决策确定的,所述第二调度决策为所述第一调度器在所述第一调度决策之前确定的上一次的调度决策;
    所述第一调度器在第i+N个时间单元发送所述第一调度决策,其中,N>1且N为整数。
  2. 根据权利要求1所述的方法,其特征在于,所述调度***还包括一个或多个第二调度器,所述方法还包括:
    所述第二调度器在第i+j个时间单元获取第二收益反馈,其中,1≤j≤N-1且j为整数;
    所述第二调度器根据所述第二收益反馈确定第三调度决策,其中,所述第二收益反馈是所述终端设备根据第四调度决策确定的,所述第四调度决策为所述第二调度器在所述第三调度决策之前确定的上一次的调度决策,
    所述第一调度器确定的调度决策和所述第二调度器确定的调度决策分别为所述第一调度器和所述第二调度器对同一任务的调度决策;
    所述第二调度器在第i+j+M个时间单元发送所述第二调度决策,其中,M>1且M为整数。
  3. 根据权利要求2所述的方法,其特征在于,所述方法还包括:
    所述第一调度器向所述第二调度器发送第一信息,所述第一信息包括所述第一调度决策或第三收益反馈,所述第三收益反馈是所述终端设备根据所述第一调度决策确定的;
    所述第二调度器接收所述第一信息并根据所述第一信息调整之后对所述任务的调度决策。
  4. 根据权利要求2或3所述的方法,其特征在于,所述方法还包括:
    所述第二调度器向所述第一调度器发送第二信息,所述第二信息包括所述第二调度决策或第四收益反馈,所述第四收益反馈是所述终端设备根据所述第二调度决策确定的;
    所述第一调度器接收所述第二信息并根据所述第二信息调整之后对所述任务的调度决策。
  5. 一种调度方法,其特征在于,包括:
    终端设备在第i个时间单元发送第一收益反馈,其中,i≥1且i为整数;
    所述终端设备在第i+N个时间单元接收第一调度器根据所述第一收益反馈确定的第一调度决策,其中,所述第一收益反馈是所述终端设备根据第二调度决策确定的,所述第二调度决策为所述第一调度器在所述第一调度决策之前确定的上一次的调度决策,N>1且N为整数。
  6. 根据权利要求5所述的方法,其特征在于,所述方法还包括:
    所述终端设备在第i+j个时间单元发送第二收益反馈,其中,1≤j≤N-1且j为整数;
    所述终端设备在第i+j+M个时间单元接收第二调度器根据所述第二收益反馈确定的第三调度决策,其中,M>1且M为整数,
    所述第二收益反馈是所述终端设备根据第四调度决策确定的,所述第四调度决策为所述第二调度器在所述第三调度决策之前确定的上一次的调度决策,
    所述第一调度器确定的调度决策和所述第二调度器确定的调度决策分别为所述第一调度器和所述第二调度器对同一任务的调度决策。
  7. 根据权利要求5或6所述的方法,其特征在于,N等于2。
  8. 根据权利要求5至7中任一项所述的方法,其特征在于,所述N的值是通信***或通信协议规定的。
  9. 一种调度***,其特征在于,所述调度***包括:
    第一调度器,用于在第i个时间单元获取第一收益反馈,其中,i≥1且i为整数;
    所述第一调度器,还用于根据所述第一收益反馈确定第一调度决策,其中,所述第一收益反馈是终端设备根据第二调度决策确定的,所述第二调度决策为所述第一调度器在所述第一调度决策之前确定的上一次的调度决策;
    所述第一调度器,还用于在第i+N个时间单元发送所述第一调度决策,其中,N>1且N为整数。
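  10. The scheduling system according to claim 9, wherein the scheduling system further comprises one or more second schedulers,
    the second scheduler is configured to obtain second reward feedback in an (i+j)-th time unit, wherein 1 ≤ j ≤ N-1 and j is an integer;
    the second scheduler is further configured to determine a third scheduling decision based on the second reward feedback, wherein
    the second reward feedback is determined by the terminal device based on a fourth scheduling decision, and the fourth scheduling decision is the most recent scheduling decision determined by the second scheduler before the third scheduling decision,
    the scheduling decisions determined by the first scheduler and the scheduling decisions determined by the second scheduler are scheduling decisions of the first scheduler and the second scheduler, respectively, for a same task; and
    the second scheduler is further configured to send the second scheduling decision in an (i+j+M)-th time unit, wherein M > 1 and M is an integer.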
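  11. The scheduling system according to claim 10, wherein
    the first scheduler is further configured to send first information to the second scheduler, wherein the first information comprises the first scheduling decision or third reward feedback, and the third reward feedback is determined by the terminal device based on the first scheduling decision; and
    the second scheduler is further configured to receive the first information and adjust a subsequent scheduling decision for the task based on the first information.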
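  12. The scheduling system according to claim 10 or 11, wherein
    the second scheduler is further configured to send second information to the first scheduler, wherein the second information comprises the second scheduling decision or fourth reward feedback, and the fourth reward feedback is determined by the terminal device based on the second scheduling decision; and
    the first scheduler is further configured to receive the second information and adjust a subsequent scheduling decision for the task based on the second information.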
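  13. A scheduling apparatus, comprising:
    a sending unit, configured to send first reward feedback in an i-th time unit, wherein i ≥ 1 and i is an integer;
    a receiving unit, configured to receive, in an (i+N)-th time unit, a first scheduling decision determined by a first scheduler based on the first reward feedback; and
    a processing unit, configured to determine the first reward feedback based on a second scheduling decision, wherein the second scheduling decision is the most recent scheduling decision determined by the first scheduler before the first scheduling decision, N > 1, and N is an integer.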
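  14. The scheduling apparatus according to claim 13, wherein
    the sending unit is further configured to send second reward feedback in an (i+j)-th time unit, wherein 1 ≤ j ≤ N-1 and j is an integer;
    the receiving unit is configured to receive, in an (i+j+M)-th time unit, a third scheduling decision determined by a second scheduler based on the second reward feedback, wherein M > 1 and M is an integer,
    the processing unit is configured to determine the second reward feedback based on a fourth scheduling decision, wherein the fourth scheduling decision is the most recent scheduling decision determined by the second scheduler before the third scheduling decision, and
    the scheduling decisions determined by the first scheduler and the scheduling decisions determined by the second scheduler are scheduling decisions of the first scheduler and the second scheduler, respectively, for a same task.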
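  15. The scheduling apparatus according to claim 13 or 14, wherein N is equal to 2.
  16. The scheduling apparatus according to any one of claims 13 to 15, wherein the value of N is specified by a communication system or a communication protocol.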
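  17. A scheduling apparatus, applied to a communication system comprising at least one scheduling apparatus, and comprising:
    a first processing unit, configured to obtain first reward feedback in an i-th time unit, wherein i ≥ 1 and i is an integer;
    wherein the first processing unit is further configured to determine a first scheduling decision based on the first reward feedback, the first reward feedback is determined by a terminal device based on a second scheduling decision, and the second scheduling decision is the most recent scheduling decision determined by the processing unit before the first scheduling decision; and
    a first sending unit, configured to send the first scheduling decision before the (i+N)-th time unit, wherein N > 1 and N is an integer.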
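  18. The scheduling apparatus according to claim 17, wherein the scheduling apparatus further comprises:
    a second receiving unit, configured to obtain second reward feedback in an (i+j)-th time unit, wherein 1 ≤ j ≤ N-1 and j is an integer;
    a second processing unit, configured to determine a third scheduling decision based on the second reward feedback, wherein the second reward feedback is determined by the terminal device based on a fourth scheduling decision, the fourth scheduling decision is the most recent scheduling decision determined by the second processing unit before the third scheduling decision, and the scheduling decisions determined by the first processing unit and the scheduling decisions determined by the second processing unit are scheduling decisions of the first processing unit and the second processing unit, respectively, for a same task; and
    a second sending unit, configured to send the second scheduling decision in an (i+j+M)-th time unit, wherein M > 1 and M is an integer.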
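  19. The scheduling apparatus according to claim 18, wherein
    the first sending unit is further configured to send first information to the second scheduler, wherein the first information comprises the first scheduling decision or third reward feedback, and the third reward feedback is determined by the terminal device based on the first scheduling decision; and
    the second receiving unit is further configured to receive the first information and adjust a subsequent scheduling decision for the task based on the first information.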
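  20. The scheduling apparatus according to claim 18 or 19, wherein the scheduling apparatus further comprises a first receiving unit;
    the second sending unit is further configured to send second information to the first receiving unit, wherein the second information comprises the second scheduling decision or fourth reward feedback, and the fourth reward feedback is determined by the terminal device based on the second scheduling decision; and
    the first receiving unit is configured to receive the second information and adjust a subsequent scheduling decision for the task based on the second information.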
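  21. A scheduling apparatus, comprising at least one processor, wherein the at least one processor is coupled to at least one memory, and the at least one processor is configured to execute a computer program or instructions stored in the at least one memory, so that the scheduling apparatus implements the method according to any one of claims 1 to 4, or so that the apparatus implements the method according to any one of claims 5 to 8.
  22. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and when the computer instructions are run on a computer, the method according to any one of claims 1 to 4 is performed, or the method according to any one of claims 5 to 8 is performed.
  23. A computer program product, wherein the computer program product comprises computer program code, and when the computer program code is run on a computer, the method according to any one of claims 1 to 4 is performed, or the method according to any one of claims 5 to 8 is performed.
  24. A computer program that, when run on a computer, causes the method according to any one of claims 1 to 4 to be performed, or causes the method according to any one of claims 5 to 8 to be performed.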
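
The timing relationship recited in claims 1 and 2 can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions, not the applicant's implementation: the names `Event` and `build_timeline` and the scheduler labels are hypothetical. For a first scheduler that obtains feedback in time unit i and sends its decision in time unit i+N, and for second schedulers that obtain feedback in time unit i+j and send in time unit i+j+M, it lists when each event occurs and rejects parameters outside the claimed ranges (i ≥ 1, N > 1, 1 ≤ j ≤ N-1, M > 1).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One step on the scheduling timeline (hypothetical helper type)."""
    time_unit: int
    scheduler: str
    action: str  # "obtain feedback" or "send decision"


def build_timeline(i: int, n: int, second: list[tuple[int, int]] | None = None) -> list[Event]:
    """Return the feedback/decision events for one round of staggered scheduling.

    i      -- time unit in which the first scheduler obtains feedback (i >= 1)
    n      -- delay N, in time units, before the first scheduler sends (N > 1)
    second -- (j, M) pairs, one per second scheduler, with 1 <= j <= N-1 and M > 1
    """
    if i < 1:
        raise ValueError("i must be an integer >= 1")
    if n <= 1:
        raise ValueError("N must be an integer > 1")

    events = [
        Event(i, "scheduler-1", "obtain feedback"),
        Event(i + n, "scheduler-1", "send decision"),
    ]
    # Each second scheduler starts j time units after the first one,
    # i.e. while the first scheduler is still computing its decision.
    for index, (j, m) in enumerate(second or [], start=2):
        if not 1 <= j <= n - 1:
            raise ValueError("j must satisfy 1 <= j <= N-1")
        if m <= 1:
            raise ValueError("M must be an integer > 1")
        name = f"scheduler-{index}"
        events.append(Event(i + j, name, "obtain feedback"))
        events.append(Event(i + j + m, name, "send decision"))

    return sorted(events, key=lambda e: (e.time_unit, e.scheduler))


if __name__ == "__main__":
    for event in build_timeline(i=1, n=2, second=[(1, 2)]):
        print(event.time_unit, event.scheduler, event.action)
```

With i = 1, N = 2 and one second scheduler with j = 1 and M = 2, the events fall in time units 1, 2, 3 and 4: the second scheduler obtains its feedback while the first is still determining its decision, so the two schedulers alternate rather than leaving time units idle.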
PCT/CN2021/089129 2020-05-20 2021-04-23 Scheduling method, scheduling system, and scheduling apparatus WO2021233061A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21809268.2A EP4142409A4 (en) 2020-05-20 2021-04-23 PLANNING METHOD, PLANNING SYSTEM AND PLANNING DEVICE
US17/988,815 US20230072585A1 (en) 2020-05-20 2022-11-17 Scheduling method, scheduling system, and scheduling apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010430887.3A CN113709890A (zh) 2020-05-20 2020-05-20 Scheduling method, scheduling system, and scheduling apparatus
CN202010430887.3 2020-05-20

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/988,815 Continuation US20230072585A1 (en) 2020-05-20 2022-11-17 Scheduling method, scheduling system, and scheduling apparatus

Publications (1)

Publication Number Publication Date
WO2021233061A1 (zh) 2021-11-25

Family

ID=78645600

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/089129 WO2021233061A1 (zh) 2020-05-20 2021-04-23 Scheduling method, scheduling system, and scheduling apparatus

Country Status (4)

Country Link
US (1) US20230072585A1 (zh)
EP (1) EP4142409A4 (zh)
CN (1) CN113709890A (zh)
WO (1) WO2021233061A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112888076B (zh) * 2019-11-29 2023-10-24 华为技术有限公司 Scheduling method and apparatus
US20220007382A1 (en) * 2020-10-07 2022-01-06 Intel Corporation Model-assisted deep reinforcement learning based scheduling in wireless networks

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007123121A1 (ja) * 2006-04-19 2007-11-01 Ntt Docomo, Inc. Radio base station and transmission control method
CN101291526A (zh) * 2007-04-18 2008-10-22 松下电器产业株式会社 Adaptive scheduling method and apparatus for reducing the amount of information feedback
CN101754385A (zh) * 2008-12-01 2010-06-23 日电(中国)有限公司 Proportional fair scheduler and scheduling method using imperfect CQI feedback
US20100284345A1 (en) * 2009-05-11 2010-11-11 Rudrapatna Ashok N System and method for cell-edge performance management in wireless systems using distributed scheduling
CN108924198A (zh) * 2018-06-21 2018-11-30 中国联合网络通信集团有限公司 Data scheduling method, apparatus, and system based on edge computing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7702289B2 (en) * 2005-07-21 2010-04-20 Motorola, Inc. Fast acquisition of a communication uplink allocation in a mobile communication system based on mobile processing capabilities
CN101515847B (zh) * 2008-02-18 2012-10-10 中兴通讯股份有限公司 Uplink transmission/feedback method and system based on a wireless communication time division duplex system
US8929239B2 (en) * 2012-07-02 2015-01-06 Apple Inc. Modulation and coding scheme (MCS) recovery based on CQI offset
EP3776374A4 (en) * 2018-03-27 2021-11-17 Nokia Solutions and Networks Oy METHOD AND APPARATUS FOR FACILITATING THE MATCHING OF RESOURCES USING A DEEP Q NETWORK

Also Published As

Publication number Publication date
US20230072585A1 (en) 2023-03-09
EP4142409A4 (en) 2023-11-08
EP4142409A1 (en) 2023-03-01
CN113709890A (zh) 2021-11-26

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21809268

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021809268

Country of ref document: EP

Effective date: 20221124

NENP Non-entry into the national phase

Ref country code: DE