CN116113050A

CN116113050A - Dynamic beam scheduling method and device

Info

Publication number: CN116113050A
Application number: CN202211711903.1A
Authority: CN
Inventors: 黄锐; 李屹寰; 赵冬; 齐浩; 刘悦; 陈宏�
Original assignee: China Telecom Satellite Communications Co Ltd
Current assignee: China Telecom Satellite Communications Co Ltd
Priority date: 2022-12-29
Filing date: 2022-12-29
Publication date: 2023-05-12

Abstract

The application discloses a dynamic beam scheduling method and device. The method comprises: issuing a first beam scheduling instruction to a satellite communication system, and acquiring a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction; inputting the first beam projection state and the first network communication state into a reinforcement-learning-based beam scheduling strategy model to obtain a second beam scheduling instruction output by the model, wherein the model predicts that a second network communication state in the beam coverage area, after the satellite communication system executes the second beam scheduling instruction, is better than the first network communication state; and issuing the second beam scheduling instruction to the satellite communication system. The method and device solve the technical problems of low beam scheduling efficiency and poor flexibility in the satellite-terrestrial integrated network in the related art.

Description

Dynamic beam scheduling method and device

Technical Field

The present application relates to the field of mobile communications technologies, and in particular, to a method and an apparatus for dynamic beam scheduling.

Background

In recent years, the satellite-terrestrial integrated network (STIN) has shown clear advantages in bandwidth, power consumption, spectrum efficiency and the like, and has therefore gradually become a new research focus in the field of future communications. In addition, with the large-scale growth in the number of wireless terminal devices and users' ever-increasing demand for transmission rate, high-rate transmission and low power consumption have become key issues to be considered in future communication systems.

Although related dynamic resource allocation algorithms have achieved considerable results in improving system throughput, reducing packet transmission delay and the like, they need to be updated and iterated whenever the service scenario changes, and offline heuristic algorithms have high complexity and poor flexibility, so they are not suitable for satellite-borne resource management. Therefore, no beam scheduling method with high efficiency and high flexibility that can be applied to the satellite-terrestrial integrated network has been provided so far.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

The embodiments of the application provide a dynamic beam scheduling method and a dynamic beam scheduling device, which at least solve the technical problems of low beam scheduling efficiency and poor flexibility in the satellite-terrestrial integrated network in the related art.

According to an aspect of the embodiments of the present application, there is provided a dynamic beam scheduling method, including: issuing a first beam scheduling instruction to a satellite communication system, and acquiring a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction; inputting the first beam projection state and the first network communication state into a beam scheduling strategy model based on reinforcement learning, and obtaining a second beam scheduling instruction output by the beam scheduling strategy model, wherein the beam scheduling strategy model predicts that the second network communication state in a beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state; and issuing a second beam scheduling instruction to the satellite communication system.

Optionally, the first beam projection state includes at least: a first number of projection beams and a second number of beam coverage areas, wherein the first number is not greater than the second number; the first network communication state includes at least: traffic requests in the beam coverage area, signal-to-interference-and-noise ratio, channel capacity and channel transmission delay, wherein the signal-to-interference-and-noise ratio and channel capacity are determined based on the transmit power, power gain and noise power spectral density of the beam projected into the beam coverage area.

Optionally, the beam scheduling policy model includes: a beam scheduling strategy determination sub-model, an experience pool, a label determination sub-model and an optimizer, wherein the beam scheduling strategy determination sub-model includes a state-action mapping function, which is used for determining the second beam scheduling instruction according to the first beam projection state and the first network communication state; the experience pool is used for storing, as training samples in array form, the first beam scheduling instruction, the first beam projection state, the first network communication state and the second beam scheduling instruction from each previous scheduling round; the label determination sub-model is used for determining the sample labels corresponding to the training samples; and the optimizer is used for adjusting the model parameters of the beam scheduling strategy determination sub-model.

Optionally, the training process of the beam scheduling policy model includes: selecting a preset number of target training samples from the experience pool; determining the target sample label corresponding to each target training sample by using the label determination sub-model; sequentially inputting each target training sample into the beam scheduling strategy determination sub-model, and constructing a target loss function according to the output of the beam scheduling strategy determination sub-model and the target sample labels; and adjusting, with the optimizer, the model parameters of the beam scheduling strategy determination sub-model so as to minimize the target loss function.

Optionally, before training the beam scheduling policy model, the method further comprises: initializing the experience pool, and storing the first beam scheduling instruction, the beam projection state, the network communication state and the second beam scheduling instruction from each previous scheduling round into the experience pool in array form as training samples; when the number of training samples in the experience pool exceeds a first preset threshold, training the beam scheduling policy model; and when the number of training samples in the experience pool exceeds the capacity threshold of the experience pool, deleting the training samples that were stored earliest, according to the first-in first-out principle.

Optionally, the second beam scheduling instruction is configured to instruct the satellite communication system to project, to each beam coverage area, a target beam that matches the service request amount of that beam coverage area and has a channel capacity greater than a second preset threshold, wherein the subcarriers are ordered according to signal-to-noise ratio and channel condition, the target beams are allocated the higher-ranked subcarriers, and the smaller the channel capacity of a target beam, the higher the rank of the subcarriers allocated to it.

Optionally, the second beam scheduling instruction is further configured to instruct the satellite communication system to project a target beam matching the service request to an area where the service request exists but is not covered by the beam.

According to another aspect of the embodiments of the present application, there is also provided a dynamic beam scheduling apparatus, including: the first issuing module is used for issuing a first beam scheduling instruction to the satellite communication system and acquiring a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction; the strategy determining module is used for inputting the first beam projection state and the first network communication state into a beam scheduling strategy model based on reinforcement learning to obtain a second beam scheduling instruction output by the beam scheduling strategy model, wherein the beam scheduling strategy model predicts that the second network communication state in the beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state; and the second issuing module is used for issuing a second beam scheduling instruction to the satellite communication system.

According to another aspect of the embodiments of the present application, there is further provided a nonvolatile storage medium, where the nonvolatile storage medium includes a stored program, and a device where the nonvolatile storage medium is located executes the dynamic beam scheduling method described above by running the program.

According to another aspect of the embodiments of the present application, there is also provided an electronic device including: the system comprises a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the dynamic beam scheduling method through the computer program.

In the embodiment of the application, a first beam scheduling instruction is issued to a satellite communication system, and a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction are obtained; the first beam projection state and the first network communication state are input into a reinforcement-learning-based beam scheduling strategy model to obtain a second beam scheduling instruction output by the model, wherein the model predicts that the second network communication state in the beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state; and the second beam scheduling instruction is issued to the satellite communication system. By inputting the beam projection state and the network communication state into the reinforcement-learning-based beam scheduling strategy model, beam scheduling is performed effectively according to service requirements and system energy efficiency, thereby solving the technical problems of low beam scheduling efficiency and poor flexibility in the satellite-terrestrial integrated network in the related art.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 is a flow chart of an alternative dynamic beam scheduling method according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of an alternative multi-beam satellite system model with beam hopping functionality according to an embodiment of the present application;

FIG. 3 is a flow chart of an alternative reinforcement learning based beam scheduling policy model in accordance with an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an alternative dynamic beam scheduling apparatus according to an embodiment of the present application.

Detailed Description

To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and the accompanying drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

In accordance with embodiments of the present application, a dynamic beam scheduling method is provided, it being noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as a set of computer executable instructions, and, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein.

FIG. 1 is a flowchart of an alternative dynamic beam scheduling method according to an embodiment of the present application. As shown in FIG. 1, the method includes at least steps S102-S106, where:

Step S102, a first beam scheduling instruction is issued to the satellite communication system, and a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction are obtained.

In the technical scheme provided in the step S102, a multi-beam satellite communication system model with a beam hopping function is determined, a first beam scheduling instruction is issued to the satellite communication system through the dynamic beam scheduling system, and a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction are acquired.

Step S104, the first beam projection state and the first network communication state are input into a beam scheduling strategy model based on reinforcement learning, and a second beam scheduling instruction output by the beam scheduling strategy model is obtained, wherein the beam scheduling strategy model predicts that the second network communication state in the beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state.

In the technical solution provided in step S104 of the present application, the reinforcement-learning-based beam scheduling policy model analyzes the currently acquired first beam projection state and first network communication state and, by jointly considering channel capacity, service request amount and data waiting time, outputs a beam scheduling instruction expected to yield a network communication state better than the current first network communication state. The model structure of the reinforcement-learning-based beam scheduling policy model can evaluate a given state, and the cumulative reward value is used to reinforce dynamic beam scheduling in the satellite communication system.

Step S106, the second beam scheduling instruction is issued to the satellite communication system.

In the technical solution provided in the above step S106 of the present invention, after the second beam scheduling instruction is output by the beam scheduling policy model based on reinforcement learning, the second beam scheduling instruction is issued to the satellite communication system, so that the second network communication state in the beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state.

In the technical solution provided in steps S102-S106 of the present application, a first beam scheduling instruction is issued to a satellite communication system, and a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction are obtained; the first beam projection state and the first network communication state are input into a reinforcement-learning-based beam scheduling strategy model to obtain a second beam scheduling instruction output by the model, wherein the model predicts that the second network communication state in the beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state; and the second beam scheduling instruction is issued to the satellite communication system. By inputting the beam projection state and the network communication state into the reinforcement-learning-based beam scheduling strategy model, beam scheduling is performed effectively according to service requirements and system energy efficiency, thereby solving the technical problems of low beam scheduling efficiency and poor flexibility in the satellite-terrestrial integrated network in the related art.
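The control flow of steps S102-S106 can be illustrated with a minimal sketch. Everything named here is hypothetical: `Observation`, `schedule_once`, the toy satellite system and the greedy policy are illustration-only stand-ins, and the greedy policy merely takes the place of the reinforcement-learning model described in the application.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Observation:
    beam_state: List[int]       # 1 if the cell is illuminated by a beam, else 0
    network_state: List[float]  # per-cell backlog of service requests (assumed metric)


def schedule_once(
    issue: Callable[[List[int]], Observation],
    policy: Callable[[Observation], List[int]],
    first_instruction: List[int],
) -> Tuple[Observation, List[int], Observation]:
    """Run S102-S106 once: issue, observe, decide, issue again."""
    first_obs = issue(first_instruction)    # S102: issue and observe
    second_instruction = policy(first_obs)  # S104: policy model decides
    second_obs = issue(second_instruction)  # S106: issue the new instruction
    return first_obs, second_instruction, second_obs


def make_toy_system(demand: List[float]) -> Callable[[List[int]], Observation]:
    """Toy stand-in for the satellite system: each beam serves 5 units per step."""
    backlog = list(demand)

    def issue(instruction: List[int]) -> Observation:
        for n, lit in enumerate(instruction):
            if lit:
                backlog[n] = max(0.0, backlog[n] - 5.0)
        return Observation(list(instruction), list(backlog))

    return issue


def greedy_policy(k: int) -> Callable[[Observation], List[int]]:
    """Placeholder for the RL model: illuminate the k cells with the largest backlog."""
    def policy(obs: Observation) -> List[int]:
        order = sorted(range(len(obs.network_state)),
                       key=lambda n: -obs.network_state[n])
        top = set(order[:k])
        return [1 if n in top else 0 for n in range(len(obs.network_state))]

    return policy


issue = make_toy_system([8.0, 1.0, 6.0, 3.0])
first, second_instruction, second = schedule_once(issue, greedy_policy(2), [1, 1, 0, 0])
```

In this toy setting the second instruction moves a beam to the cell with the largest remaining backlog, so the total backlog after the second instruction is lower than after the first.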

The above steps in this embodiment will be further described below.

As an optional implementation manner, in the technical solution provided in step S102 of the present invention, the method includes: the first beam projection state at least comprises: a first number of projection beams and a second number of beam coverage areas, wherein the first number is not greater than the second number; the first network communication state includes at least: traffic requests in the beam coverage area, signal-to-interference-and-noise ratio, channel capacity and channel transmission delay, wherein the signal-to-interference-and-noise ratio and channel capacity are determined based on the transmit power, power gain and noise power spectral density of the beam projected into the beam coverage area.

In this embodiment, ground user terminals are the main object of the Internet access service, and a ground user terminal is connected to the gateway station through the forward link (i.e., the downlink) of the satellite communication system. FIG. 2 is a schematic structural diagram of an alternative multi-beam satellite system model with a beam hopping function according to an embodiment of the present application. Assume that the K active beams provided by the satellite communication system cover the N cells in the service area, and that the number K of active beams is much smaller than the number N of cells. The multi-beam satellite communication system model with a beam hopping function can then be represented by sets: the first number of projection beams can be expressed as $b = \{d_k \mid k = 1, 2, 3, \dots, K\}$, and the second number of beam coverage areas as $C = \{c_n \mid n = 1, 2, 3, \dots, N\}$.

In addition, for the beams allocated to the cells at a given moment, decisions on the payload parameters and channel conditions can be optimized by jointly considering the channel capacity and the user service requests. Suppose that at each time $t_s$ the satellite communication system dynamically schedules the coverage of the $K$ projection beams over the $N$ cells, and that the service request amount in the beam coverage area is $W$. The projection beams dynamically scheduled at each time $t_s$ can then be expressed as:

[Equation (1): not recoverable from the source text]

where the indicator (symbol lost in extraction) denotes that cell $c_n$ obtains a beam at time $t_j$; otherwise the cell does not obtain a beam. Thus, for a cell $c_n$ to which a beam is assigned, the signal-to-interference-plus-noise ratio at time $t_j$ can be expressed by the following equation:

[Equation: not recoverable from the source text]

where the signal-to-interference-plus-noise ratio is mainly determined by the transmit power of the beam projected into the beam coverage area, the power gain, and the noise power spectral density $N_0$. One term (symbol lost in extraction) denotes the transmit power of the beam projected at time $t_j$ to cell $c_n$ within the beam coverage area; a further term (symbol also lost) refers to the beam projected at time $t_j$ to cell $c_r$ within the beam coverage area.

Further, the channel capacity of the beam projected to cell $c_n$ in the beam coverage area can be determined from the signal-to-interference-plus-noise ratio, calculated as follows:

[Equation: not recoverable from the source text]

where $f_{\text{DVB-S2}}$ is a mapping function based on the second-generation satellite digital video broadcasting (DVB-S2) specification.
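As a rough illustration of how the quantities named above (transmit power, power gain, noise power spectral density) could combine, the following sketch uses a textbook SINR form. This is an assumption, not the equation of the application, whose formulas did not survive in the text. The names `sinr`, `capacity_shannon`, `gain[r][n]` and `bandwidth` are hypothetical, and the Shannon bound is used only as a stand-in for the DVB-S2 mapping $f_{\text{DVB-S2}}$, which is not reproduced here.

```python
import math
from typing import Sequence


def sinr(
    n: int,
    active: Sequence[bool],
    tx_power: Sequence[float],
    gain: Sequence[Sequence[float]],
    noise_psd: float,
    bandwidth: float,
) -> float:
    """Assumed textbook SINR for cell n.

    gain[r][n] is the power gain from the beam aimed at cell r towards cell n.
    A cell without a beam is given SINR 0.
    """
    if not active[n]:
        return 0.0
    signal = tx_power[n] * gain[n][n]
    # Interference comes from the other beams that are active in the same slot.
    interference = sum(
        tx_power[r] * gain[r][n]
        for r in range(len(active))
        if r != n and active[r]
    )
    return signal / (noise_psd * bandwidth + interference)


def capacity_shannon(sinr_value: float, bandwidth: float) -> float:
    """Shannon capacity, used here only as a stand-in for f_DVB-S2."""
    return bandwidth * math.log2(1.0 + sinr_value)


active = [True, True, False]
tx_power = [4.0, 2.0, 1.0]
gain = [[1.0, 0.25, 0.0], [0.5, 1.0, 0.5], [0.0, 0.25, 1.0]]
sinr_cell0 = sinr(0, active, tx_power, gain, noise_psd=0.5, bandwidth=2.0)
```

With these toy numbers the signal power towards cell 0 is 4, the interference from the beam aimed at cell 1 is 1, and the noise power is 1, so the SINR of cell 0 is 2. A real system would replace `capacity_shannon` with a lookup of the spectral efficiency of the DVB-S2 modulation and coding scheme supported at that SINR.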

As an optional implementation manner, in the technical solution provided in step S104 of the present invention, the beam scheduling policy model includes: a beam scheduling strategy determination sub-model, an experience pool, a label determination sub-model and an optimizer, wherein the beam scheduling strategy determination sub-model includes a state-action mapping function, which is used for determining the second beam scheduling instruction according to the first beam projection state and the first network communication state; the experience pool is used for storing, as training samples in array form, the first beam scheduling instruction, the first beam projection state, the first network communication state and the second beam scheduling instruction from each previous scheduling round; the label determination sub-model is used for determining the sample labels corresponding to the training samples; and the optimizer is used for adjusting the model parameters of the beam scheduling strategy determination sub-model.

In this embodiment, the construction of the beam scheduling policy model is itself a Markov decision process. That is, after the decision agent outputs a decision action for the current beam state, the satellite communication system receives the decision action immediately and returns a corresponding reward for that output. The resulting action-value function is then input into the beam scheduling strategy determination sub-model to obtain the mapped action values of the relevant beams, and the output decision is further optimized through the experience pool, the label determination sub-model and the optimizer. In this way, the scheduling performance of the multi-beam satellite communication system is optimized, and the communication quality requirements of users in different cells are met to the greatest possible extent.

For example, FIG. 3 is a flow chart of an alternative reinforcement-learning-based beam scheduling policy model according to an embodiment of the present application. As can be seen from FIG. 3, the multi-beam satellite system model inputs the executed first beam scheduling instruction (i.e., action $a_t$), the first beam projection state (i.e., state $s_t$) and the first network communication state in the beam coverage area (i.e., reward $r_t$) into the beam scheduling strategy determination sub-model (i.e., the Q network) of the beam scheduling policy model, which determines the second beam scheduling instruction from the first beam projection state and the first network communication state. The first beam scheduling instruction, the first beam projection state, the first network communication state and the second beam scheduling instruction are then stored in the experience pool as a training sample in array form, such as $(s_t, a_t, r_t, s_{t+1})$. Next, the label determination sub-model (i.e., the target network) determines the sample label corresponding to each training sample. In addition, the optimizer can continuously adjust the model parameters of the beam scheduling strategy determination sub-model (i.e., the Q network) to improve decision performance.

Further, once the Markov decision process model has been analyzed, the packet arrival rate and the time variability of the channel conditions in a specific scenario can be considered jointly. Because the time variability of the channel conditions is difficult to obtain during the decision process, in the present application a deep learning algorithm can be used to optimize the state, the action and the reward of the decision process, so as to improve global performance. The state is abstracted mainly from the specific environment; it allows an optimal solution to be provided for beam scheduling and is an important basis for the agent's decisions. After the state is input into the deep neural network, the delay plan under the fixed-structure state is reconstructed, so that delay balance is ensured. The action represents how far the agent's decision process has advanced; all beam scheduling actions are obtained by combining the minimum average packet delay with the non-zero elements of the known vector. The reward is the maximum reward that the agent can obtain in the Markov decision process model; a larger current accumulated delay means a smaller reward for the agent.
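The relation between accumulated delay and reward can be illustrated with a toy transition function. This is a sketch under stated assumptions, not the model of the application: the state is assumed to be a vector of per-cell waiting times in slots, the action a 0/1 illumination vector, and the reward the negative sum of the waiting times after the step. The name `step` is hypothetical.

```python
from typing import List, Sequence, Tuple


def step(delays: Sequence[int], action: Sequence[int]) -> Tuple[List[int], float]:
    """One toy MDP transition.

    Assumption: an illuminated cell is fully served (its delay resets to 0),
    while every other cell waits one more slot.
    """
    next_delays = [0 if lit else d + 1 for d, lit in zip(delays, action)]
    # Larger accumulated delay gives a smaller (more negative) reward.
    reward = -float(sum(next_delays))
    return next_delays, reward


state = [2, 0, 3]
_, reward_serve_cell0 = step(state, [1, 0, 0])
_, reward_serve_cell2 = step(state, [0, 0, 1])
```

Serving the cell that has waited longest (cell 2) leaves less accumulated delay and therefore a higher reward than serving cell 0, which is the behaviour the text describes.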

As an optional implementation manner, in the technical solution provided in step S104 of the present invention, the training process of the beam scheduling policy model includes: selecting a preset number of target training samples from the experience pool; determining the target sample label corresponding to each target training sample by using the label determination sub-model; sequentially inputting each target training sample into the beam scheduling strategy determination sub-model, and constructing a target loss function according to the output of the beam scheduling strategy determination sub-model and the target sample labels; and adjusting, with the optimizer, the model parameters of the beam scheduling strategy determination sub-model so as to minimize the target loss function.

In this embodiment, since the construction of the beam scheduling policy model is itself a Markov decision process, a mini-batch of $N_{mb}$ target training samples can be randomly sampled directly from the experience pool. The label determination sub-model determines the target sample label corresponding to each target training sample; the target training samples whose labels have been determined are sequentially input into the beam scheduling strategy determination sub-model; a target loss function is constructed from the output of the beam scheduling strategy determination sub-model and the target sample labels; and the optimizer adjusts the model parameters of the beam scheduling strategy determination sub-model by minimizing the mean square error, so that the target loss function can be written as:

$$L(\theta) = \mathbb{E}\left[\left(y_t - Q(s_t, a_t; \theta)\right)^2\right]$$

where $y_t$ denotes the target sample label.

As an optional implementation manner, in the technical solution provided in step S104 of the present invention, before training the beam scheduling policy model, the method further includes: initializing the experience pool, and storing the first beam scheduling instruction, the beam projection state, the network communication state and the second beam scheduling instruction from each previous scheduling round into the experience pool in array form as training samples; when the number of training samples in the experience pool exceeds a first preset threshold, training the beam scheduling policy model; and when the number of training samples in the experience pool exceeds the capacity threshold of the experience pool, deleting the training samples that were stored earliest, according to the first-in first-out principle.

In this embodiment, when a nonlinear function approximator such as a neural network is used to represent the action-value function, the training samples are correlated and the training labels become unstable as the beam scheduling strategy determination sub-model is updated, so reinforcement learning is difficult to converge and may even diverge. To overcome this drawback, in the embodiment of the present application the experience pool is initialized before the beam scheduling policy model is trained, that is, the contents of the experience pool are emptied. Training of the beam scheduling policy model starts only when the number of training samples stored in the experience pool in array form (each comprising a first beam scheduling instruction, a beam projection state, a network communication state and a second beam scheduling instruction) exceeds the first preset threshold, thereby implementing batch normalization processing of multiple beams.
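The experience pool behaviour described above (start empty, begin training once a threshold is exceeded, evict the oldest samples first when full) can be sketched with a bounded double-ended queue. The class name `ExperiencePool` and the parameters `capacity` and `min_samples` are hypothetical names for the capacity threshold and the first preset threshold.

```python
import random
from collections import deque
from typing import Any, Deque, List


class ExperiencePool:
    """FIFO replay buffer; a sketch, not the implementation of the application."""

    def __init__(self, capacity: int, min_samples: int) -> None:
        # A deque with maxlen silently drops the oldest item when it is full,
        # which gives the first-in first-out deletion described in the text.
        self._buffer: Deque[Any] = deque(maxlen=capacity)
        self.min_samples = min_samples

    def store(self, sample: Any) -> None:
        self._buffer.append(sample)

    def ready(self) -> bool:
        """Training may start once the sample count exceeds the threshold."""
        return len(self._buffer) > self.min_samples

    def sample(self, batch_size: int) -> List[Any]:
        """Uniform random mini-batch, which weakens correlation between samples."""
        return random.sample(list(self._buffer), batch_size)

    def contents(self) -> List[Any]:
        return list(self._buffer)

    def __len__(self) -> int:
        return len(self._buffer)


pool = ExperiencePool(capacity=3, min_samples=1)
for i in range(5):
    pool.store(i)  # integers stand in for (s_t, a_t, r_t, s_{t+1}) arrays
```

After five stores into a pool of capacity three, only the three most recent samples remain. Sampling uniformly at random from the pool, rather than training on consecutive samples, is the usual way such a buffer reduces the sample correlation mentioned in the text.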

In addition, the action policy is generally selected after the agent has been trained. Because the state space experienced by the agent is incomplete, the factors that may influence the action value need to be optimized with a trade-off method, so that the agent obtains the best output when exploring new actions and, as the number of iterations increases, the probability of exploration in multi-beam scheduling is gradually reduced.
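One common way to realize an exploration probability that shrinks with the number of iterations is a decaying epsilon-greedy rule. The text does not name the trade-off method it uses, so the following is an assumption for illustration; `epsilon_at`, `select_action` and the default decay constants are hypothetical.

```python
import random
from typing import Sequence


def epsilon_at(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
               decay: float = 0.99) -> float:
    """Exploration probability after `step` iterations (exponential decay with a floor)."""
    return max(eps_end, eps_start * decay ** step)


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Explore a random action with probability epsilon, otherwise exploit the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
greedy_choice = select_action([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
```

Early in training `epsilon_at` is close to 1, so most scheduling actions are exploratory; later it settles at the floor value, so the agent mostly follows the action with the highest estimated value.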

As an optional implementation manner, in the technical solution provided in step S106 of the present invention, the method includes: the second beam scheduling instruction is used for instructing the satellite communication system to project, to each beam coverage area, a target beam that matches the service request amount of that beam coverage area and has a channel capacity greater than a second preset threshold, wherein the subcarriers are ordered according to signal-to-noise ratio and channel condition, the target beams are allocated the higher-ranked subcarriers, and target beams with smaller channel capacity are allocated subcarriers of higher rank.

In this embodiment, the satellite communication system orders the subcarriers of each target beam according to their priority relationship, thereby ensuring that each beam can obtain its optimal subcarriers. The subcarriers with the best channel condition and signal-to-noise ratio among the multiple beams are then identified and allocated preferentially, in ascending order of the difference between the channel capacity and the target beam. The second beam scheduling instruction controls the satellite communication system to project, to each beam coverage area, a target beam that matches the service request amount of that area and has a channel capacity greater than the second preset threshold. In this way, the optimal subcarriers are preferentially matched to users while optimal channel quality is ensured, which further improves resource utilization efficiency and guarantees communication quality.
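The ordering rule described above can be sketched as two sorts followed by a pairing. The sketch makes simplifying assumptions that are not in the text: each subcarrier is ranked by a single score (here its signal-to-noise ratio, with channel condition folded into the same number), and each target beam receives exactly one subcarrier. The function name `allocate_subcarriers` is hypothetical.

```python
from typing import Dict


def allocate_subcarriers(
    beam_capacity: Dict[str, float],
    subcarrier_snr: Dict[str, float],
) -> Dict[str, str]:
    """Give the best-ranked subcarriers to the beams with the smallest channel capacity."""
    # Subcarriers from best to worst score.
    ranked_subcarriers = sorted(subcarrier_snr, key=lambda sc: subcarrier_snr[sc], reverse=True)
    # Beams from smallest to largest channel capacity.
    ranked_beams = sorted(beam_capacity, key=lambda b: beam_capacity[b])
    # zip stops at the shorter list, so surplus subcarriers stay unassigned.
    return dict(zip(ranked_beams, ranked_subcarriers))


allocation = allocate_subcarriers(
    beam_capacity={"b1": 30.0, "b2": 10.0, "b3": 20.0},
    subcarrier_snr={"sc1": 5.0, "sc2": 12.0, "sc3": 8.0, "sc4": 1.0},
)
```

Here beam `b2`, which has the smallest channel capacity, receives the best subcarrier `sc2`, and the weakest subcarrier `sc4` is left unused.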

As an optional implementation manner, in the technical solution provided in step S106 of the present invention, the method includes: the second beam scheduling instructions are also for instructing the satellite communication system to project a target beam matching the service request to an area where the service request is present but not covered by the beam.

In this embodiment, resources are reallocated through the second beam scheduling instruction, which avoids the situation in which the system cannot meet users' individual requirements for satellite communication when the number of users is too large or changes only slightly. In principle, after a new user accesses the network through a terminal, the embodiment of the application can cover the whole cell with a beam, and can also reallocate resources to cells that are not covered by a beam, thereby improving resource utilization efficiency. Meanwhile, after users move out of a certain number of projected beams, the satellite communication system reallocates resources to the cells that do not meet the beam coverage condition according to the changes in the positions and number of users. During reallocation, the scheme also rearranges the source data combination according to user feedback and merges beams with similar conditions.
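Identifying the areas that have service requests but no beam can be sketched as below. The data layout (a dictionary of request amounts per cell and a set of covered cells) and the busiest-first ordering are assumptions made for illustration; the application does not prescribe them.

```python
def uncovered_cells_with_requests(requests, covered):
    """Return cells that have pending service requests but no beam, busiest first."""
    pending = [c for c, amount in requests.items() if amount > 0 and c not in covered]
    return sorted(pending, key=lambda c: requests[c], reverse=True)

def assign_free_beams(free_beams, requests, covered):
    """Pair each free beam with an uncovered cell (hypothetical helper).

    zip() stops at the shorter sequence, so surplus beams or cells stay unassigned.
    """
    targets = uncovered_cells_with_requests(requests, covered)
    return dict(zip(free_beams, targets))
```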

Optionally, after the throughput of the multi-beam satellite communication system increases, the fairness among users in each cell may change accordingly. To maximize user fairness, the present application may use a steady-state time average to provide resources according to the different demands of users, so as to maximize the throughput of the multi-beam satellite communication system while ensuring fairness.

In the above steps, the mobility of users and of user groups at different geographic positions is taken into account, and the first beam projection state and the first network communication state are analyzed by a beam scheduling strategy model based on reinforcement learning, so that the advantages of beam scheduling can be fully exploited and better communication service is provided for users. The states, actions and rewards in the decision process of the beam scheduling strategy model are optimized with a deep learning algorithm, which improves global performance; deep learning also addresses the difficulty of convergence in the decision process and thereby improves the stability of the model. In addition, an experience pool is initialized, and training of the beam scheduling strategy model starts when the number of training samples in the experience pool exceeds a first preset threshold, so that batch normalization processing of multiple beams is realized.

Example 2

According to an embodiment of the present application, there is further provided a dynamic beam scheduling apparatus for implementing the dynamic beam scheduling method in embodiment 1. Fig. 4 is a schematic structural diagram of an optional dynamic beam scheduling apparatus according to an embodiment of the present application. As shown in Fig. 4, the dynamic beam scheduling apparatus includes at least a first issuing module 41, a policy determining module 42 and a second issuing module 43, where:

the first issuing module 41 is configured to issue a first beam scheduling instruction to the satellite communication system, and acquire a first beam projection state and a first network communication state within a beam coverage area after the satellite communication system executes the first beam scheduling instruction.

Specifically, a multi-beam satellite communication system model with a beam hopping function is determined, a first beam scheduling instruction is issued to the satellite communication system through a first issuing module 41, and a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction are acquired.

As an alternative embodiment, the first beam scheduling instruction issued by the first issuing module 41 includes a first beam projection state and a first network communication state within a beam coverage area, where: the first beam projection state at least comprises: a first number of projection beams and a second number of beam coverage areas, wherein the first number is not greater than the second number; the first network communication state includes at least: traffic requests in the beam coverage area, signal-to-interference-and-noise ratio, channel capacity and channel transmission delay, wherein the signal-to-interference-and-noise ratio and channel capacity are determined based on the transmit power, power gain and noise power spectral density of the beam projected into the beam coverage area.

In this embodiment, the terrestrial user terminal is the main object of the internet access service, and the terrestrial user terminal is connected to the gateway station through the forward link (i.e. the downlink) of the satellite communication system. Fig. 2 is a schematic diagram of an optional multi-beam satellite system model with a beam hopping function according to an embodiment of the present application. Assume that K active beams provided by the satellite communication system cover N cells in the service area, and that the number K of active beams is much smaller than the number N of cells. The multi-beam satellite communication system model with a beam hopping function can then be represented by sets: the first number of projection beams can be expressed as $B = \{b_k \mid k = 1, 2, 3, \ldots, K\}$, and the second number of beam coverage areas as $C = \{c_n \mid n = 1, 2, 3, \ldots, N\}$.

In addition, the beams allocated to a plurality of cells at a given moment can be considered jointly, and the payload parameters and channel conditions can be optimized according to the channel capacity and the user service requests. If the satellite communication system dynamically schedules K projection beams onto the coverage areas of N cells at each time $t_s$, and the service request amount in the beam coverage area is W, the projection beams dynamically scheduled at each time $t_s$ can be expressed by a scheduling expression (the expression and the definitions of its terms are not reproduced in this text).

The signal-to-interference-and-noise ratio is mainly determined by the transmit power of the beam projected into the beam coverage area, the power gain, and the noise power spectral density $N_0$. The symbols for the transmit power and the power gain, and the quantity defined for cell $c_r$ within the beam coverage area at time $t_j$, are likewise not reproduced in this text.
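Since the exact expressions are not legible in this text, the dependence of the signal-to-interference-and-noise ratio and the channel capacity on transmit power, power gain and noise power spectral density can only be illustrated with the textbook forms. The sketch below assumes the usual ratio of received signal power to interference plus noise power, and the Shannon capacity formula. The bandwidth and interference arguments are assumptions introduced for the illustration and do not come from the application.

```python
import math

def sinr(tx_power_w, power_gain, interference_w, noise_psd_w_per_hz, bandwidth_hz):
    """Linear SINR: received signal power over interference plus noise power."""
    signal = tx_power_w * power_gain                 # received power of the serving beam
    noise = noise_psd_w_per_hz * bandwidth_hz        # N0 multiplied by bandwidth
    return signal / (interference_w + noise)

def shannon_capacity(bandwidth_hz, sinr_linear):
    """Shannon capacity in bit/s for a given bandwidth and linear SINR."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)
```

For example, a beam with 10 W transmit power, a power gain of 1e-12, no interference, a noise density of 1e-20 W/Hz and 1 MHz of bandwidth has a linear SINR of about 1000 (30 dB).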

The policy determining module 42 is configured to input the first beam projection state and the first network communication state into a beam scheduling policy model based on reinforcement learning, and obtain a second beam scheduling instruction output by the beam scheduling policy model, where the beam scheduling policy model predicts that the second network communication state in the beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state.

Specifically, the currently acquired first beam projection state and first network communication state are analyzed by the beam scheduling strategy model based on reinforcement learning, and, by comprehensively considering the channel capacity, the service request amount and the data waiting time, a second beam scheduling instruction is output that is expected to yield a network communication state better than the current first network communication state.

As an alternative embodiment, the beam scheduling policy model in the policy determination module 42 includes: a beam scheduling strategy determination sub-model, an experience pool, a label determination sub-model and an optimizer, wherein the beam scheduling strategy determination sub-model comprises a state-action mapping function which is used for determining the second beam scheduling instruction according to the first beam projection state and the first network communication state; the experience pool is used for storing the historical first beam scheduling instructions, first beam projection states, first network communication states and second beam scheduling instructions in array form as training samples; the label determination sub-model is used for determining the sample labels corresponding to the training samples; and the optimizer is used for adjusting the model parameters of the beam scheduling strategy determination sub-model.

As an alternative embodiment, the training process of the beam scheduling policy model by the policy determination module 42 includes: selecting a preset number of target training samples from the experience pool; determining the target sample label corresponding to each target training sample by using the label determination sub-model; sequentially inputting each target training sample into the beam scheduling strategy determination sub-model, and constructing a target loss function according to the output result of the beam scheduling strategy determination sub-model and the target sample label; and adjusting, with the optimizer, the model parameters of the beam scheduling strategy determination sub-model so as to minimize the target loss function:

$$L(\theta) = \mathbb{E}\left[\left(y_t - Q(s_t, a_t; \theta)\right)^2\right]$$

where $y_t$ represents the target sample label, and $Q(s_t, a_t; \theta)$ is the output of the beam scheduling strategy determination sub-model with parameters $\theta$ for state $s_t$ and action $a_t$.
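The loss above can be evaluated on a mini-batch as in the following sketch. The application does not state in this passage how the label determination sub-model computes $y_t$; the sketch assumes the usual deep Q-learning target, $y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a')$, and the discount factor and function names are likewise assumptions.

```python
def td_targets(rewards, next_q_values, gamma=0.9):
    """Assumed label rule: y_t = r_t + gamma * max over a' of Q(s_{t+1}, a')."""
    return [r + gamma * max(q_next) for r, q_next in zip(rewards, next_q_values)]

def target_loss(q_values, actions, targets):
    """Mean squared error between y_t and Q(s_t, a_t; theta) over a batch.

    q_values: one list of action values per sample, as output by the sub-model.
    actions:  index of the action actually taken in each sample.
    targets:  sample labels y_t.
    """
    errors = [(y - q[a]) ** 2 for q, a, y in zip(q_values, actions, targets)]
    return sum(errors) / len(errors)
```

The optimizer would then update the parameters $\theta$ in the direction that reduces this value; the update step itself is omitted here.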

As an alternative implementation manner, before training the beam scheduling policy model, the policy determining module 42 may initialize the experience pool and store the historical first beam scheduling instructions, beam projection states, network communication states and second beam scheduling instructions in the experience pool in array form as training samples. When the number of training samples in the experience pool exceeds a first preset threshold, training of the beam scheduling strategy model starts. When the number of training samples in the experience pool exceeds the capacity threshold of the experience pool, the training samples that were stored first are deleted according to the first-in first-out principle.
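The experience pool behaviour described here (a start-of-training threshold plus first-in first-out eviction) maps naturally onto a bounded double-ended queue. The following is a minimal sketch; the class name, method names and the tuple layout of a sample are illustrative assumptions.

```python
import random
from collections import deque

class ExperiencePool:
    """FIFO experience pool: the oldest samples are evicted once capacity is reached."""

    def __init__(self, capacity, start_threshold):
        # A deque with maxlen silently drops the oldest item when a new one is appended.
        self.samples = deque(maxlen=capacity)
        self.start_threshold = start_threshold

    def store(self, first_instruction, projection_state, network_state, second_instruction):
        # One training sample is stored as a single tuple.
        self.samples.append(
            (first_instruction, projection_state, network_state, second_instruction)
        )

    def ready(self):
        # Training starts only once the pool holds more samples than the threshold.
        return len(self.samples) > self.start_threshold

    def sample(self, batch_size, rng=random):
        # Draw a mini-batch of distinct samples uniformly at random.
        return rng.sample(list(self.samples), batch_size)
```

Using `deque(maxlen=...)` keeps eviction implicit, so no separate check against the capacity threshold is needed when storing.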

A second issuing module 43, configured to issue a second beam scheduling instruction to the satellite communication system.

Specifically, after the policy determining module 42 outputs the second beam scheduling instruction through the beam scheduling policy model based on reinforcement learning, the second issuing module 43 issues the second beam scheduling instruction to the satellite communication system, so that the second network communication state in the beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state.

As an optional implementation manner, the second beam scheduling instruction issued by the second issuing module 43 is used to instruct the satellite communication system to project, to each beam coverage area, a target beam that matches the service request amount of the beam coverage area and has a channel capacity greater than a second preset threshold, where each target beam is allocated subcarriers that rank higher according to signal-to-noise ratio and channel condition, and a target beam with a smaller channel capacity is allocated higher-ranked subcarriers.

As an alternative embodiment, the second beam scheduling instruction issued by the second issuing module 43 is further used to instruct the satellite communication system to project a target beam matching the service request to the area where the service request exists but is not covered by the beam.

It should be noted that, each module in the dynamic beam scheduling apparatus in the embodiment of the present application corresponds to each implementation step of the dynamic beam scheduling method in embodiment 1 one by one, and since the detailed description has been already made in embodiment 1, some details not shown in the embodiment may refer to embodiment 1, and will not be repeated here.
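The correspondence between the three modules and the method steps can be sketched as follows. The satellite interface (`execute`, `observe`) and the stand-in classes are hypothetical and exist only to make the sketch self-contained; a real system would replace them with its own control and telemetry interfaces.

```python
class FakeSatellite:
    """Stand-in for the satellite communication system (hypothetical interface)."""

    def __init__(self):
        self.executed = []

    def execute(self, instruction):
        self.executed.append(instruction)

    def observe(self):
        # Returns (beam projection state, network communication state)
        # as seen after the most recently executed instruction.
        return {"last": self.executed[-1]}, {"requests": 4}

class DynamicBeamScheduler:
    """Mirrors the first issuing, policy determining and second issuing modules."""

    def __init__(self, satellite, policy_model):
        self.satellite = satellite
        self.policy_model = policy_model  # callable: (projection, network) -> instruction

    def first_issue(self, instruction):
        self.satellite.execute(instruction)
        return self.satellite.observe()

    def determine_policy(self, projection_state, network_state):
        return self.policy_model(projection_state, network_state)

    def second_issue(self, instruction):
        self.satellite.execute(instruction)

    def step(self, first_instruction):
        projection_state, network_state = self.first_issue(first_instruction)
        second = self.determine_policy(projection_state, network_state)
        self.second_issue(second)
        return second
```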

Example 3

According to an embodiment of the present application, there is further provided a nonvolatile storage medium, where the nonvolatile storage medium includes a stored program, and a device where the nonvolatile storage medium is located executes the dynamic beam scheduling method in embodiment 1 by running the program.

Optionally, the device where the nonvolatile storage medium is located performs the following steps by running the program: issuing a first beam scheduling instruction to a satellite communication system, and acquiring a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction; inputting the first beam projection state and the first network communication state into a beam scheduling strategy model based on reinforcement learning, and obtaining a second beam scheduling instruction output by the beam scheduling strategy model, wherein the beam scheduling strategy model predicts that the second network communication state in a beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state; and issuing a second beam scheduling instruction to the satellite communication system.

According to an embodiment of the present application, there is further provided a processor, configured to execute a program, where the program executes the dynamic beam scheduling method in embodiment 1.

Optionally, the program execution realizes the following steps: issuing a first beam scheduling instruction to a satellite communication system, and acquiring a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction; inputting the first beam projection state and the first network communication state into a beam scheduling strategy model based on reinforcement learning, and obtaining a second beam scheduling instruction output by the beam scheduling strategy model, wherein the beam scheduling strategy model predicts that the second network communication state in a beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state; and issuing a second beam scheduling instruction to the satellite communication system.

According to an embodiment of the present application, there is also provided an electronic device including: a memory and a processor, wherein the memory stores a computer program, the processor configured to execute the dynamic beam scheduling method in embodiment 1 by the computer program.

Optionally, the processor is configured to implement the following steps by computer program execution: issuing a first beam scheduling instruction to a satellite communication system, and acquiring a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction; inputting the first beam projection state and the first network communication state into a beam scheduling strategy model based on reinforcement learning, and obtaining a second beam scheduling instruction output by the beam scheduling strategy model, wherein the beam scheduling strategy model predicts that the second network communication state in a beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state; and issuing a second beam scheduling instruction to the satellite communication system.

The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments.

In the foregoing embodiments of the present application, each embodiment is described with its own emphasis; for a portion that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed technology content may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of units may be a logic function division, and there may be another division manner in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.

The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are intended to fall within the scope of protection of the present application.

Claims

1. A method for dynamic beam scheduling, comprising:

issuing a first beam scheduling instruction to a satellite communication system, and acquiring a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction;

inputting the first beam projection state and the first network communication state into a beam scheduling strategy model based on reinforcement learning, and obtaining a second beam scheduling instruction output by the beam scheduling strategy model, wherein the beam scheduling strategy model predicts that the second network communication state in a beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state;

and issuing the second beam scheduling instruction to the satellite communication system.

2. The method of claim 1, wherein:

the first beam projection state at least comprises: a first number of projection beams and a second number of beam coverage areas, wherein the first number is not greater than the second number;

the first network communication state at least comprises: a service request amount, a signal-to-interference-and-noise ratio, a channel capacity and a channel transmission delay in the beam coverage area, wherein the signal-to-interference-and-noise ratio and the channel capacity are determined according to the transmit power, power gain and noise power spectral density of the beam projected into the beam coverage area.

3. The method of claim 1, wherein the beam scheduling policy model comprises: a beam scheduling policy determination sub-model, an experience pool, a tag determination sub-model, and an optimizer, wherein,

the beam scheduling strategy determination submodel comprises a state action mapping function which is used for determining the second beam scheduling instruction according to the first beam projection state and the first network communication state;

the experience pool is used for storing the historical first beam scheduling instructions, first beam projection states, first network communication states and second beam scheduling instructions in array form as training samples;

the label determination sub-model is used for determining sample labels corresponding to the training samples;

the optimizer is configured to adjust model parameters of the beam scheduling policy determination sub-model.

4. The method of claim 3, wherein the training process of the beam scheduling policy model comprises:

selecting a preset number of target training samples from the experience pool;

determining target sample labels corresponding to the target training samples by using the label determination sub-model;

sequentially inputting each target training sample into the beam scheduling strategy determination sub-model, and constructing a target loss function according to the output result of the beam scheduling strategy determination sub-model and the target sample label;

and adjusting, by using the optimizer, the model parameters of the beam scheduling strategy determination sub-model so as to minimize the target loss function.

5. The method of claim 4, wherein prior to training the beam scheduling policy model, the method further comprises:

initializing the experience pool, and storing the historical first beam scheduling instructions, beam projection states, network communication states and second beam scheduling instructions in the experience pool in array form as training samples;

when the number of training samples in the experience pool exceeds a first preset threshold value, training the beam scheduling strategy model;

and deleting the training samples stored in the experience pool first according to a first-in first-out principle when the number of the training samples in the experience pool exceeds a capacity threshold of the experience pool.

6. The method of claim 1, wherein:

the second beam scheduling instruction is configured to instruct the satellite communication system to project, to each beam coverage area, a target beam that matches the service request amount of the beam coverage area and has a channel capacity greater than a second preset threshold, where each target beam is allocated subcarriers that rank higher according to signal-to-noise ratio and channel condition, and a target beam with a smaller channel capacity is allocated higher-ranked subcarriers.

7. The method of claim 6, wherein:

the second beam scheduling instructions are further for instructing the satellite communication system to project the target beam matching the service request to an area where the service request is present but not covered by a beam.

8. A dynamic beam scheduling apparatus, comprising:

the first issuing module is used for issuing a first beam scheduling instruction to the satellite communication system and acquiring a first beam projection state and a first network communication state in a beam coverage area after the satellite communication system executes the first beam scheduling instruction;

the strategy determining module is used for inputting the first beam projection state and the first network communication state into a beam scheduling strategy model based on reinforcement learning to obtain a second beam scheduling instruction output by the beam scheduling strategy model, wherein the beam scheduling strategy model predicts that the second network communication state in a beam coverage area after the satellite communication system executes the second beam scheduling instruction is better than the first network communication state;

and the second issuing module is used for issuing the second beam scheduling instruction to the satellite communication system.

9. A non-volatile storage medium, characterized in that the non-volatile storage medium comprises a stored program, wherein a device in which the non-volatile storage medium is located performs the dynamic beam scheduling method according to any one of claims 1 to 7 by running the program.

10. An electronic device, comprising: a memory and a processor, wherein the memory has stored therein a computer program, the processor being configured to perform the dynamic beam scheduling method of any one of claims 1 to 7 by the computer program.