CN115483960B

CN115483960B - Beam hopping scheduling method, system, device and storage medium for low-orbit satellite

Info

Publication number: CN115483960B
Application number: CN202211014921.4A
Authority: CN
Inventors: 王丹
Original assignee: Aipu Road Network Technology Nanjing Co ltd
Current assignee: Aipu Road Network Technology Nanjing Co ltd
Priority date: 2022-08-23
Filing date: 2022-08-23
Publication date: 2023-08-29
Anticipated expiration: 2042-08-23
Also published as: CN115483960A

Abstract

The application discloses a beam hopping scheduling method, system, device and storage medium for a low-orbit satellite, and relates to the technical field of satellite beam hopping. The beam hopping scheduling method comprises the following steps: acquiring cell coverage information of a beam cluster in a low-orbit satellite system; constructing, according to the cell coverage information, a Q value matrix table with the cells as states and the beam hopping directions as behaviors; initializing the Q values in the Q value matrix table; and training the Q value matrix table a preset number of times to obtain a beam hopping strategy, so that the low-orbit satellite performs beam scheduling according to the beam hopping strategy. Each training of the Q value matrix table comprises: taking the beam traversing all cells in the beam cluster as the target state, and updating the Q value matrix table using a reinforcement learning algorithm. The application can automatically generate a beam hopping strategy to reasonably schedule the beams of the low-orbit satellite.

Description

Beam hopping scheduling method, system, device and storage medium for low-orbit satellite

Technical Field

The present application relates to the field of satellite communications technologies, and in particular, to a method, a system, an apparatus, and a storage medium for beam hopping scheduling of a low-orbit satellite.

Background

Currently, as 5G technology matures, 5G deployment is steadily advancing. Owing to its high performance, low latency and high capacity, 5G opens a new era of the Internet of Everything and integrates technologies such as artificial intelligence and big data. However, as a land mobile system, 5G communication has certain limitations. Due to economic and technical constraints, land mobile communication services cannot cover all areas; for example, ships, airplanes and scientific equipment in remote areas such as oceans, forests and deserts have difficulty obtaining bandwidth. Using a satellite network as an auxiliary means of communication can solve the communication problem in areas not covered by land mobile communication services, and combining the 5G network with a satellite network can greatly improve network coverage.

High-orbit satellites have limited orbit resources and a large data transmission delay, and cannot meet the delay requirements of services such as online video chat or games. In contrast, the data transmission delay of low-orbit satellites is much shorter. With the rapid development of modern mobile communication and electronic component technology, the problems that constrained early low-orbit satellite communication systems, such as communication quality, data transmission rate and cost of use, have been solved, so low-orbit satellite communication systems can be widely applied. At present, the orbit and spectrum resources of a low-orbit satellite system are limited. Beam hopping technology can be used to allocate the resources of a low-orbit satellite system, but there is currently no reasonable beam hopping strategy for beam scheduling.

Disclosure of Invention

The present application aims to solve at least one of the technical problems existing in the prior art. Therefore, the application provides a beam hopping scheduling method, a system, a device and a storage medium for a low-orbit satellite, which can automatically generate a beam hopping strategy to reasonably schedule the beam of the low-orbit satellite.

In one aspect, an embodiment of the present application provides a beam hopping scheduling method for a low-orbit satellite, including the following steps:

acquiring cell coverage information of a beam cluster in a low-orbit satellite system;

constructing, according to the cell coverage information, a Q value matrix table with the cells as states and the beam hopping directions as behaviors;

initializing the Q values in the Q value matrix table;

training the Q value matrix table a preset number of times to obtain a beam hopping strategy, so that the low-orbit satellite performs beam scheduling according to the beam hopping strategy;

wherein each training of the Q value matrix table comprises the following step:

taking the beam traversing all cells in the beam cluster as the target state, and updating the Q value matrix table using a reinforcement learning algorithm.

According to some embodiments of the application, each training update of the Q value matrix table comprises the following steps:

selecting one behavior from all possible behaviors of the current state and executing it to obtain a next state and a reward value;

updating the Q value of the selected behavior of the current state according to the maximum Q value of the next state and the reward value.

According to some embodiments of the present application, said selecting one behavior from all possible behaviors of the current state and executing it to obtain the next state and the reward value comprises the following steps:

selecting one behavior from all possible behaviors of the current state and executing it to obtain the number of coverage users, the number of coincident beams and the next cell position corresponding to the next state;

determining a beam moving distance according to the current cell position corresponding to the current state and the next cell position;

determining the reward value according to the number of coverage users, the number of coincident beams and the beam moving distance.

According to some embodiments of the application, said determining the reward value according to the number of coverage users, the number of coincident beams and the beam moving distance comprises the following steps:

determining a reward value positive correlation term according to the number of coverage users;

determining a first reward value negative correlation term according to the number of coincident beams;

determining a second reward value negative correlation term according to the beam moving distance;

determining the reward value according to the reward value positive correlation term, the first reward value negative correlation term and the second reward value negative correlation term.

According to some embodiments of the application, the reward value is obtained by the following formula:

wherein reward represents the reward value, M represents the number of coverage users, N represents the number of coincident beams, and D represents the beam moving distance.

According to some embodiments of the application, the updating the Q value of the selected behavior of the current state according to the maximum Q value of the next state and the reward value comprises the steps of:

determining an expected Q value of the current state according to the maximum Q value of the next state and the rewarding value;

and updating the Q value of the selected behavior of the current state according to the difference between the expected Q value of the current state and the Q value of the current state before updating.

According to some embodiments of the application, the expected Q value of the current state is calculated by the following formula:

$Q(s_t', a_t') = \mathrm{reward} + \gamma \times \arg(\max(Q(s_{t+1})))$;

wherein $Q(s_t', a_t')$ represents the expected Q value of the current state, $\gamma$ (gamma) represents the preset attenuation value, and $Q(s_{t+1})$ represents the Q values corresponding to all possible behaviors of the next state.
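Read together with the update step described above, the expected Q value plays the role of the target in a standard tabular Q-learning update. The following LaTeX sketch writes this out under two assumptions: the learning rate $\alpha$ is a symbol introduced here for illustration, and $\arg(\max(\cdot))$ is read as the maximum Q value over the behaviors $a$ of the next state.

```latex
\begin{aligned}
Q(s_t', a_t') &= \mathrm{reward} + \gamma \max_{a} Q(s_{t+1}, a) \\
Q_{\text{new}}(s_t, a_t) &= Q(s_t, a_t) + \alpha \left[ Q(s_t', a_t') - Q(s_t, a_t) \right]
\end{aligned}
```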

In another aspect, an embodiment of the application further provides a beam hopping scheduling system for a low-orbit satellite, comprising:

a first module, configured to acquire cell coverage information of a beam cluster in the low-orbit satellite system;

a second module, configured to construct, according to the cell coverage information, a Q value matrix table with the cells as states and the beam hopping directions as behaviors;

a third module, configured to initialize a Q value in the Q value matrix table;

a fourth module, configured to train the Q-value matrix table according to a preset training number to obtain a beam hopping strategy, so that the low-orbit satellite performs beam scheduling according to the beam hopping strategy;

In another aspect, an embodiment of the application further provides a beam hopping scheduling device for a low-orbit satellite, comprising:

at least one processor;

at least one memory for storing at least one program;

the at least one program, when executed by the at least one processor, causes the at least one processor to implement a beam-hopping scheduling method for low-orbit satellites as previously described.

In another aspect, embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions for causing a computer to perform a beam-hopping scheduling method for low-orbit satellites as described above.

The technical scheme of the application has at least one of the following advantages or beneficial effects: a Q value matrix table is constructed with the cells in the beam cluster as states and the beam hopping directions as behaviors, and is then initialized. The Q value matrix table is updated with the beam traversing all cells in the beam cluster as the target state, and this update is repeated a preset number of times, so that each Q value in the table accurately reflects the environmental reward of selecting the corresponding beam hopping direction at the current cell position. Based on the Q value matrix table, the beam hopping direction with the higher environmental reward can be selected at each cell, so that a reasonable beam hopping strategy is generated automatically and the beams of the low-orbit satellite are reasonably scheduled according to it.

Drawings

Fig. 1 is a flowchart of a beam hopping scheduling method of a low-orbit satellite according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a satellite communication system according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a beam hopping scheduling device for a low-orbit satellite according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.

In the description of the present application, it should be understood that the direction or positional relationship indicated with respect to the description of the orientation, such as up, down, left, right, etc., is based on the direction or positional relationship shown in the drawings, is merely for convenience of describing the present application and simplifying the description, and does not indicate or imply that the apparatus or element to be referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application.

In the description of the present application, the description of first, second, etc. is for the purpose of distinguishing between technical features only, and should not be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated or implicitly indicating the precedence of the technical features indicated.

Before describing the embodiments of the present application in further detail, the terms involved in the embodiments are explained as follows.

A beam refers to the shape formed on the earth's surface by the electromagnetic waves emitted by a satellite antenna (like the beam of light cast by a flashlight in the dark). The main types are global beams, spot beams and shaped beams; the beam shape is determined by the transmitting antenna.

A cell: assume that a region is covered by K spot beam areas; each spot beam area is referred to as a cell. The communication traffic of the users in a cell is uploaded to the satellite by the gateway station and then sent by the satellite to each user through the hopping beam downlink. Every N spot beams form a group, called a cluster, giving M clusters in total, where K = N × M.

Beam hopping technology is a dynamically adjustable beam technology for satellite systems. It applies the idea of time division multiplexing to divide the time resources of a low-orbit satellite system into a plurality of time slots; in each time slot only some of the beams work, as required, and in the next time slot the beams are dynamically scheduled according to traffic requests, so that the beams of the system "hop" to other cells. In other words, the beams are scheduled by time slot. In a conventional multi-beam system, all spot beams work simultaneously, but not all areas have service requirements all the time, which wastes resources. In a beam hopping system, only some of the hopping beams on the satellite work simultaneously; that is, at any given moment only a small number of spot beam areas in each cluster are illuminated and in the working state. In each beam cluster, the system hops on demand to a cell with a service request to provide service for it, which greatly reduces the resource waste caused by idle channels.
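As a rough illustration of the time-slot idea above, the following Python sketch picks, for each time slot, which cells of one cluster are illuminated. The function name `illuminate`, the per-slot demand dictionaries and the rule of serving the highest-demand cells first are all assumptions made for this example; the text does not prescribe them.

```python
def illuminate(demand_by_slot, active_beams):
    """Return, per time slot, the cells illuminated in one cluster.

    demand_by_slot: list of dicts mapping cell name -> traffic demand (hypothetical units).
    active_beams:   number of beams that may work simultaneously in the cluster.
    """
    schedule = []
    for demand in demand_by_slot:
        # Only cells with a service request are candidates in this slot.
        requesting = [cell for cell, load in demand.items() if load > 0]
        # Assumed rule: serve the cells with the highest demand first.
        requesting.sort(key=lambda cell: demand[cell], reverse=True)
        schedule.append(requesting[:active_beams])
    return schedule


# Hypothetical demand for a four-cell cluster over two time slots.
demand = [
    {"c1": 5, "c2": 0, "c3": 2, "c4": 9},
    {"c1": 0, "c2": 3, "c3": 0, "c4": 0},
]
schedule = illuminate(demand, active_beams=2)
```

In the second slot only one cell requests service, so only one beam is active there, which is the idle-channel saving the paragraph describes.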

AMF (Access and Mobility Management Function): a 5G core network element responsible for user access and mobility management.

NWDAF (Network Data Analytics Function): a 5G core network element responsible for providing network analysis services according to request data from network services.

The embodiment of the application provides a beam hopping scheduling method of a low-orbit satellite, which can be applied to a satellite communication system, and referring to fig. 2, the satellite communication system comprises a satellite base station (low-orbit satellite) and a 5G core network, and the 5G core network comprises an AMF, a database and an NWDAF.

The satellite base station is used for reporting satellite positions, cell coverage information, wireless resource information and the like to the AMF;

the AMF is used for receiving satellite positions, cell coverage information and radio resource information reported by satellites, adding corresponding time stamps to the received various information and storing the information into the database; meanwhile, the AMF sends a beam hopping analysis request to the NWDAF;

the NWDAF is configured to obtain corresponding data from the database according to the beam hopping analysis request, perform beam hopping strategy training analysis according to the data, and send the obtained beam hopping strategy to the AMF;

the AMF forwards the beam hopping strategy to the corresponding satellite base station, so that the satellite base station performs beam hopping according to the received beam hopping strategy.
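The interaction among the satellite base station, the AMF, the database and the NWDAF can be sketched in Python as follows. This is only a toy model of the message flow: the class and method names are hypothetical, and the `analyze` method returns the reported cells in order as a placeholder where the actual strategy training would run.

```python
import time


class Database:
    def __init__(self):
        self.records = []

    def store(self, record):
        self.records.append(record)

    def query(self, satellite_id):
        return [r for r in self.records if r["satellite_id"] == satellite_id]


class NWDAF:
    def __init__(self, database):
        self.database = database

    def analyze(self, satellite_id):
        # Fetch the stored reports and derive a strategy from the latest one.
        latest = self.database.query(satellite_id)[-1]
        # Placeholder for the training analysis: visit cells in reported order.
        return list(latest["cell_coverage"])


class AMF:
    def __init__(self, database, nwdaf):
        self.database = database
        self.nwdaf = nwdaf

    def on_report(self, satellite, report):
        # Add a time stamp, store the report, then request the analysis.
        record = dict(report, satellite_id=satellite.satellite_id, timestamp=time.time())
        self.database.store(record)
        strategy = self.nwdaf.analyze(satellite.satellite_id)
        # Forward the resulting strategy to the reporting satellite.
        satellite.apply_strategy(strategy)


class SatelliteBaseStation:
    def __init__(self, satellite_id):
        self.satellite_id = satellite_id
        self.strategy = None

    def report(self, amf, position, cell_coverage, radio_resources):
        amf.on_report(self, {
            "position": position,
            "cell_coverage": cell_coverage,
            "radio_resources": radio_resources,
        })

    def apply_strategy(self, strategy):
        self.strategy = strategy


db = Database()
amf = AMF(db, NWDAF(db))
sat = SatelliteBaseStation("sat-1")
sat.report(amf, position=(0.0, 0.0, 550.0), cell_coverage=["c1", "c2"], radio_resources={})
```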

The beam hopping scheduling method of the low-orbit satellite in the embodiment of the application can be applied to the NWDAF of the satellite communication system and can also be applied to other network elements with data analysis functions in the satellite communication system, and the embodiment of the application is not particularly limited.

Referring to fig. 1, a beam hopping scheduling method of a low-orbit satellite according to an embodiment of the present application includes, but is not limited to, step S110, step S120, step S130, and step S140.

Step S110, obtaining cell coverage information of a beam cluster in a low-orbit satellite system;

step S120, constructing, according to the cell coverage information, a Q value matrix table with the cells as states and the beam hopping directions as behaviors;

step S130, initializing the Q values in the Q value matrix table;

step S140, training the Q value matrix table a preset number of times to obtain a beam hopping strategy, so that the low-orbit satellite performs beam scheduling according to the beam hopping strategy;

wherein each training of the Q value matrix table comprises the following step: taking the beam traversing all cells in the beam cluster as the target state, and updating the Q value matrix table using a reinforcement learning algorithm.

Specifically, beam hopping technology is based on the idea of time division multiplexing: within a given time period, each low-orbit satellite allocates a time slot to each cell of a beam cluster and schedules the beam to the corresponding cells in time-slot order, so that only the cells illuminated by the beam can perform satellite communication. To allocate cell time slots reasonably, the cell coverage information of the beam cluster, that is, the arrangement of cells within the beam cluster, is first obtained. Then, according to the cell coverage information, a Q value matrix table is constructed and initialized with the cells as states and the beam hopping directions as behaviors, as shown in Table 1.

Table 1 Initial Q value matrix table

Referring to Table 1, each state in the Q value matrix table represents a cell, each behavior represents the next beam hopping direction from the current cell, and each entry is a Q value, which represents the environmental reward obtained by hopping in the corresponding direction from the corresponding cell. After the Q value matrix table is constructed, every entry is initialized to 0, and the table is then updated using Q-learning, a reinforcement learning algorithm. An initial state is selected at random, that is, the initial cell covered by the beam is selected at random, and Q values are updated from there until the beam has covered all cells in the beam cluster, which completes one training update of the Q value matrix table. The update is repeated for the preset number of training times to obtain the trained Q value matrix table, as shown in Table 2.

Table 2 Q value matrix table after training

Assume that cell c1 is the starting position of the beam. The next hop is determined by the behavior with the largest Q value among all possible behaviors of cell c1, which here is to hop upward. After hopping upward from cell c1, the beam reaches cell c2, where the next hop is determined in the same way, and so on until the beam has hopped to all cells, which yields the beam scheduling path. Further, time slots are allocated along the beam scheduling path according to the granularity of the time slot division, which yields the beam hopping strategy according to which the low-orbit satellite performs beam scheduling.

It can be understood that the beam hopping directions in the embodiment of the application include up, down, left and right; fewer or more directions may be used, and a beam hopping direction may also be represented by its angle to a reference line.
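Reading a scheduling path off a trained Q value matrix table can be sketched in Python as follows. The Q values, the 2 × 2 cell layout and the `transitions` map are hypothetical stand-ins for a trained table, chosen so that the first hop is upward from c1 to c2; `max_hops` is a guard added here so the sketch always terminates, and one time slot per hop is an assumed slot granularity.

```python
def greedy_path(q_table, transitions, start, max_hops=100):
    """Follow the largest-Q behavior from each cell until all cells are visited."""
    path = [start]
    cell = start
    while len(set(path)) < len(q_table) and len(path) <= max_hops:
        # Behavior with the largest Q value among those possible in this cell.
        behavior = max(q_table[cell], key=q_table[cell].get)
        cell = transitions[cell][behavior]
        path.append(cell)
    return path


# Hypothetical trained table: c1 bottom-left, c2 top-left, c3 top-right, c4 bottom-right.
q_table = {
    "c1": {"up": 88.0, "right": 80.0},
    "c2": {"right": 87.0, "down": 70.0},
    "c3": {"down": 85.0, "left": 60.0},
    "c4": {"left": 50.0, "up": 40.0},
}
transitions = {
    "c1": {"up": "c2", "right": "c4"},
    "c2": {"right": "c3", "down": "c1"},
    "c3": {"down": "c4", "left": "c2"},
    "c4": {"left": "c1", "up": "c3"},
}

path = greedy_path(q_table, transitions, start="c1")
# Assign one time slot per hop along the path.
slots = {slot: cell for slot, cell in enumerate(path)}
```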

step S210, selecting one action to execute from all possible actions in the current state to obtain the next state and a reward value;

step S220, the Q value of the selected behavior of the current state is updated according to the maximum Q value and the rewarding value of the next state.

Specifically, in the first training of the Q value matrix table, every value in the table is 0, so the maximum Q value of the next state is 0. In this case, because all Q values of the next state are tied, any one of them may be selected at random as the maximum Q value, and it is combined with the reward value obtained by hopping from the current state to the next state to update the Q value of the selected behavior of the current state. The Q value of the next state is then updated in the same way, until the Q values of all states have been updated, which completes one training.
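The training procedure described so far (zero-initialized table, random starting cell, updates until the beam has covered every cell, repeated a preset number of times) can be sketched in Python as below. Several details are assumptions for illustration only: cells are laid out on a rectangular grid, behaviors are chosen uniformly at random, the reward is simply the number of users in the next cell, and `alpha`, `gamma` and `max_steps` are hypothetical parameters.

```python
import random

# Beam hopping directions as (row, column) offsets on an assumed grid layout.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def build_q_table(rows, cols):
    """One entry per (cell, direction), initialized to 0."""
    return {(r, c): {a: 0.0 for a in ACTIONS} for r in range(rows) for c in range(cols)}


def valid_actions(cell, rows, cols):
    """Directions that keep the beam inside the cluster."""
    r, c = cell
    return [a for a, (dr, dc) in ACTIONS.items()
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def train(q, rows, cols, users, episodes, alpha=0.5, gamma=0.9, seed=0, max_steps=1000):
    rng = random.Random(seed)
    for _ in range(episodes):
        # Randomly select the initial cell covered by the beam.
        cell = (rng.randrange(rows), rng.randrange(cols))
        visited = {cell}
        steps = 0
        # One training ends when the beam has covered all cells (or at the step guard).
        while len(visited) < rows * cols and steps < max_steps:
            action = rng.choice(valid_actions(cell, rows, cols))
            dr, dc = ACTIONS[action]
            nxt = (cell[0] + dr, cell[1] + dc)
            reward = users[nxt]  # assumed reward: users covered in the next cell
            best_next = max(q[nxt][a] for a in valid_actions(nxt, rows, cols))
            target = reward + gamma * best_next
            q[cell][action] += alpha * (target - q[cell][action])
            visited.add(nxt)
            cell = nxt
            steps += 1
    return q


users = {(0, 0): 5, (0, 1): 1, (1, 0): 3, (1, 1): 8}  # hypothetical user counts
q = train(build_q_table(2, 2), rows=2, cols=2, users=users, episodes=20)
```

Entries for directions that would leave the grid are never selected and therefore stay at 0.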

According to some embodiments of the present application, step S210 includes, but is not limited to, step S310, step S320, and step S330.

Step S310, selecting one action from all possible actions in the current state to be executed, and obtaining the number of coverage users, the number of coincident beams and the position of the next cell corresponding to the next state;

step S320, determining a beam moving distance according to the current cell position and the next cell position corresponding to the current state;

step S330, determining the rewarding value according to the number of the covered users, the number of the coincident beams and the beam moving distance.

Specifically, including the number of coverage users in the reward value calculation lets the Q values in the Q value matrix table reflect resource utilization; including the number of coincident beams lets them reflect beam interference; and including the beam moving distance lets them reflect path length. A beam hopping strategy obtained from the Q value matrix table therefore takes resource utilization, beam interference and path length into account, so that when the low-orbit satellite schedules beams according to the strategy, resource utilization can be optimized and beam interference reduced.
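Steps S310 to S330 can be sketched in Python as follows. The `Cell` fields, the use of Euclidean distance between cell positions, and the reward form M - N - D are assumptions made for illustration; the reward formula itself is not reproduced in this text.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    x: float                 # cell position (hypothetical planar coordinates)
    y: float
    users: int               # number of coverage users, M
    overlapping_beams: int   # number of coincident beams, N


def beam_move_distance(current, nxt):
    """Step S320: distance D between the current and next cell positions."""
    return math.dist((current.x, current.y), (nxt.x, nxt.y))


def reward(current, nxt):
    """Step S330 with an assumed form: rises with M, falls with N and D."""
    m = nxt.users
    n = nxt.overlapping_beams
    d = beam_move_distance(current, nxt)
    return m - n - d


c1 = Cell(x=0.0, y=0.0, users=10, overlapping_beams=1)
c2 = Cell(x=3.0, y=4.0, users=20, overlapping_beams=2)
distance = beam_move_distance(c1, c2)
value = reward(c1, c2)
```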

According to some embodiments of the present application, step S330 includes, but is not limited to, step S410, step S420, step S430, and step S440.

Step S410, determining a reward value positive correlation term according to the number of coverage users;

step S420, determining a first reward value negative correlation term according to the number of coincident beams;

step S430, determining a second reward value negative correlation term according to the beam moving distance;

step S440, determining the reward value according to the reward value positive correlation term, the first reward value negative correlation term and the second reward value negative correlation term.

Specifically, the reward value positive correlation term means that the reward value increases as the number of coverage users increases; the first reward value negative correlation term means that the reward value increases as the number of coincident beams decreases; and the second reward value negative correlation term means that the reward value increases as the beam moving distance decreases. The three terms are added to obtain the reward value. A larger reward value indicates higher resource utilization, less beam interference and a shorter scheduling path, and correspondingly so does a larger Q value. Therefore, when the next hop direction of the beam from the current cell is selected according to the Q value matrix table, the direction with the largest Q value should be selected.

It should be noted that, in the embodiment of the present application, the reward value may be determined from the reward value positive correlation term alone, so that only the influence of resource utilization is considered; from the first reward value negative correlation term alone, so that only the influence of beam interference is considered; or from the second reward value negative correlation term alone, so that only the influence of the scheduling distance is considered. The reward value may also be determined from any two of the three terms.

In some embodiments, the reward value positive correlation term, the first reward value negative correlation term and the second reward value negative correlation term may be weighted according to the importance of the different influencing factors. For example, if beam scheduling mainly considers resource utilization, with beam interference and the scheduling path as secondary considerations, the weights of the three terms may be 0.7, 0.2 and 0.1 respectively; each term is multiplied by its weight and the products are added to obtain the reward value.
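A weighted combination with the example weights 0.7, 0.2 and 0.1 might look like the following Python sketch. The normalization of each term by an assumed maximum (`max_users`, `max_beams`, `max_distance`) is a choice made here so that the weights act on comparable scales; it is not taken from the text.

```python
import math


def weighted_reward(users, overlapping_beams, distance,
                    max_users, max_beams, max_distance,
                    weights=(0.7, 0.2, 0.1)):
    """Weighted sum of one positive and two negative correlation terms."""
    positive = users / max_users                # grows with coverage users
    negative_1 = -overlapping_beams / max_beams  # shrinks as coincident beams grow
    negative_2 = -distance / max_distance        # shrinks as moving distance grows
    w_pos, w_neg1, w_neg2 = weights
    return w_pos * positive + w_neg1 * negative_1 + w_neg2 * negative_2


# Hypothetical inputs: 50 of 100 users, 1 of 4 beams overlapping, distance 2 of 10.
value = weighted_reward(50, 1, 2.0, max_users=100, max_beams=4, max_distance=10.0)
```

Setting a weight to 0 recovers the variants in which only one or two of the terms are used.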

Specifically, the reward value calculation formula may be as shown in formula (1):

According to some embodiments of the application, step S220 includes, but is not limited to, step S510 and step S520.

Step S510, determining the expected Q value of the current state according to the maximum Q value and the rewarding value of the next state;

step S520, the Q value of the selected behavior of the current state is updated according to the difference between the expected Q value of the current state and the Q value of the current state before updating.

Specifically, take continued training of the Q value matrix table in Table 2 as an example. The Q value of the upward behavior in the current state c1 is 88, and taking the upward behavior leads from c1 to the next state c2. The reward value is determined from information such as the number of coverage users of cell c2, the number of coincident beams and the beam moving distance, and the maximum Q value of the next state c2, 87, is obtained by table lookup. The maximum Q value of c2 is multiplied by the preset attenuation value and added to the reward value to obtain the expected Q value, assumed here to be 90. The difference between the expected Q value 90 and the estimated Q value 88 for the upward behavior in c1 is 2; multiplying it by a learning rate of 0.5 and adding the original estimate 88 gives the updated Q value 89 for the upward behavior in c1.
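The arithmetic of this example can be checked with a short Python sketch. The values 88, 87, 90, 89 and the learning rate 0.5 come from the text; the reward of 46.5, the attenuation value of 0.5 and the second Q value of c2 are hypothetical, chosen only so that the expected Q value comes out as 90.

```python
def expected_q(reward, gamma, next_q_values):
    """Expected Q value: reward plus attenuated maximum Q value of the next state."""
    return reward + gamma * max(next_q_values)


def updated_q(old_q, target_q, learning_rate):
    """Move the old estimate toward the expected value by the learning rate."""
    return old_q + learning_rate * (target_q - old_q)


next_state_q = [87.0, 70.0]  # Q values of c2; 87 is from the text, 70 is made up
target = expected_q(reward=46.5, gamma=0.5, next_q_values=next_state_q)
new_q = updated_q(old_q=88.0, target_q=target, learning_rate=0.5)
print(target, new_q)  # prints "90.0 89.0"
```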

According to some embodiments of the application, the expected Q value of the current state is calculated by the following formula:

$Q(s_t', a_t') = \mathrm{reward} + \gamma \times \arg(\max(Q(s_{t+1})))$;

The embodiment of the application also provides a beam hopping scheduling system for a low-orbit satellite, comprising:

a second module, configured to construct, according to the cell coverage information, a Q value matrix table with the cells as states and the beam hopping directions as behaviors;

a third module for initializing the Q value in the Q value matrix table;

wherein, each training Q value matrix table comprises the following steps:

It can be understood that the content of the method embodiments above applies to the system embodiment; the functions implemented by the system embodiment are the same as those of the method embodiments, and the beneficial effects achieved are also the same.

Referring to fig. 3, fig. 3 is a schematic diagram of a beam hopping scheduling device for a low-orbit satellite according to an embodiment of the present application. The beam hopping scheduling device for the low-orbit satellite according to the embodiment of the application comprises one or more control processors and a memory, and in fig. 3, one control processor and one memory are taken as an example.

The control processor and the memory may be connected by a bus or in another manner; fig. 3 takes connection by a bus as an example.

The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located from the control processor, the remote memory being connectable to the beam-hopping scheduler of the low-orbit satellite via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

It will be appreciated by those skilled in the art that the arrangement shown in fig. 3 does not constitute a limitation of the low-orbit satellite's beam-hopping scheduler, and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.

The non-transitory software programs and instructions required to implement the beam hopping scheduling method applied to the beam hopping scheduling device in the above embodiment are stored in the memory and, when executed by the control processor, perform that method.

In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions which, when executed by one or more control processors, cause the one or more control processors to perform the beam hopping scheduling method of the low-orbit satellite in the method embodiment.

Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

The embodiments of the present application have been described in detail with reference to the accompanying drawings, but the present application is not limited to the above embodiments, and various changes can be made within the knowledge of one of ordinary skill in the art without departing from the spirit of the present application.
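As a purely illustrative aid, the Q value matrix table training of the method embodiment can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the present application: the one-dimensional row of cells, the two hopping directions, the learning rate `ALPHA`, the sweep over every cell in each training pass, and the linear `reward` function are all hypothetical stand-ins, and the number of overlapping beams is fixed at zero because only a single beam is modeled. Only the expected Q value (reward plus gamma times the maximum Q value of the next state) follows the formula given in this document.

```python
# Illustrative sketch only; names and constants below are assumptions.
ACTIONS = (-1, +1)   # hypothetical beam hopping directions: left, right
GAMMA = 0.9          # discount factor "gamma" in the expected Q value
ALPHA = 0.5          # learning rate (assumed; not specified in this section)


def reward(covered_users, overlapping_beams, move_distance):
    """Placeholder reward: more covered users is better, more overlapping
    beams and a longer beam movement distance are worse (assumed form)."""
    return covered_users - overlapping_beams - move_distance


def expected_q(q_table, r, next_state):
    """Expected Q value: reward + gamma * maximum Q value of the next state."""
    return r + GAMMA * max(q_table[next_state])


def train(users_per_cell, passes):
    """Train a Q value matrix table with cells as states and hopping
    directions as behaviors, sweeping all cells in every training pass."""
    n_cells = len(users_per_cell)
    # Initialize every Q value in the Q value matrix table to zero.
    q_table = [[0.0] * len(ACTIONS) for _ in range(n_cells)]
    for _ in range(passes):
        for state in range(n_cells):
            for a, step in enumerate(ACTIONS):
                # Hop to the neighboring cell, staying put at the edges.
                nxt = min(max(state + step, 0), n_cells - 1)
                r = reward(users_per_cell[nxt], 0, abs(nxt - state))
                target = expected_q(q_table, r, nxt)
                # Move the stored Q value toward the expected Q value.
                q_table[state][a] += ALPHA * (target - q_table[state][a])
    return q_table


def hopping_strategy(q_table):
    """Read out, for each cell, the hopping direction with the largest Q value."""
    return [ACTIONS[row.index(max(row))] for row in q_table]


if __name__ == "__main__":
    q = train([0, 0, 0, 0, 10], passes=500)
    print(hopping_strategy(q))
```

In this toy setting all users sit in the last cell, so the learned strategy hops toward that cell from every position. A realistic scheduler would replace the placeholder reward and cell layout with the cell coverage information of the beam cluster.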

Claims

1. A beam hopping scheduling method for a low-orbit satellite, characterized by comprising the following steps:

initializing a Q value in the Q value matrix table;

wherein each training of the Q value matrix table comprises the following steps:

with the beam traversing all cells in the beam cluster as target states, updating the Q value matrix table by adopting a reinforcement learning algorithm; wherein said updating the Q value matrix table by adopting the reinforcement learning algorithm comprises the following steps:

determining a reward value according to the number of covered users, the number of overlapping beams, and the beam movement distance;

2. The beam hopping scheduling method for a low-orbit satellite as claimed in claim 1, wherein said determining the reward value according to the number of covered users, the number of overlapping beams, and the beam movement distance comprises the following steps:

3. The beam hopping scheduling method for a low-orbit satellite as claimed in claim 1, wherein the reward value is obtained by the following equation:

4. The beam hopping scheduling method for a low-orbit satellite as claimed in claim 3, wherein said updating the Q value of the selected behavior of the current state according to the maximum Q value of the next state and the reward value comprises the following steps:

5. The beam hopping scheduling method for a low-orbit satellite as claimed in claim 4, wherein the expected Q value of the current state is calculated by the following formula:

$Q(s_t', a_t') = \mathrm{reward} + \gamma \times \arg\bigl(\max\bigl(Q(s_{t+1})\bigr)\bigr)$;

6. A system for beam hopping scheduling for a low-orbit satellite, comprising:

a third module, configured to initialize a Q value in the Q value matrix table;

7. A beam hopping scheduling device for a low-orbit satellite, comprising:

at least one processor;

at least one memory for storing at least one program;

when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the beam hopping scheduling method of the low-orbit satellite as claimed in any one of claims 1 to 5.

8. A computer-readable storage medium in which a processor-executable program is stored, wherein the processor-executable program, when executed by a processor, implements the beam hopping scheduling method of the low-orbit satellite as claimed in any one of claims 1 to 5.