CN115483960B - Wave beam jumping scheduling method, system and device for low orbit satellite and storage medium - Google Patents


Info

Publication number
CN115483960B
Authority
CN
China
Prior art keywords
value
low
orbit satellite
matrix table
current state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211014921.4A
Other languages
Chinese (zh)
Other versions
CN115483960A (en
Inventor
王丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aipu Road Network Technology Nanjing Co ltd
Original Assignee
Aipu Road Network Technology Nanjing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aipu Road Network Technology Nanjing Co ltd filed Critical Aipu Road Network Technology Nanjing Co ltd
Priority to CN202211014921.4A priority Critical patent/CN115483960B/en
Publication of CN115483960A publication Critical patent/CN115483960A/en
Application granted granted Critical
Publication of CN115483960B publication Critical patent/CN115483960B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/14Relay systems
    • H04B7/15Active relay systems
    • H04B7/185Space-based or airborne stations; Stations for satellite systems
    • H04B7/1851Systems using a satellite or space-based relay
    • H04B7/18519Operations control, administration or maintenance
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Radio Relay Systems (AREA)

Abstract

The application discloses a beam hopping scheduling method, system, device and storage medium for a low-orbit satellite, and relates to the technical field of satellite beam hopping. The beam hopping scheduling method for the low-orbit satellite comprises the following steps: acquiring cell coverage information of a beam cluster in a low-orbit satellite system; constructing, according to the cell coverage information, a Q value matrix table with the cells as states and the beam hopping directions as actions; initializing the Q values in the Q value matrix table; and training the Q value matrix table a preset number of times to obtain a beam hopping strategy, so that the low-orbit satellite performs beam scheduling according to the beam hopping strategy. Each training of the Q value matrix table comprises: taking the beam having traversed all cells in the beam cluster as the target state, and updating the Q value matrix table with a reinforcement learning algorithm. The application can automatically generate a beam hopping strategy to reasonably schedule the beams of the low-orbit satellite.

Description

Beam hopping scheduling method, system, device and storage medium for low-orbit satellite
Technical Field
The present application relates to the field of satellite communications technologies, and in particular, to a method, a system, an apparatus, and a storage medium for beam hopping scheduling of a low-orbit satellite.
Background
Currently, 5G technology is maturing and 5G deployment is advancing steadily. With its high performance, low latency and high capacity, 5G opens a new era of the interconnection of everything and integrates technologies such as artificial intelligence and big data. As a land mobile system, however, 5G communication has certain limitations. Owing to economic and technical constraints, land mobile communication services cannot cover all areas; for example, ships, airplanes and scientific equipment in remote areas such as oceans, forests and deserts have difficulty obtaining bandwidth. Using a satellite network as an auxiliary means of communication solves the communication problem in areas not covered by land mobile communication services, and combining the 5G network with the satellite network greatly improves network coverage.
High-orbit satellites have limited orbit resources and a large data transmission delay, and cannot meet the delay requirements of services such as online video chat or games. By contrast, the data transmission delay of low-orbit satellites is much shorter, and with the rapid development of modern mobile communication and electronic component technology, the problems of communication quality, data transmission rate and cost of use that constrained early low-orbit satellite communication systems have been solved, so that low-orbit satellite communication systems can be widely applied. The orbit and spectrum resources of low-orbit satellite systems are currently limited. Beam hopping technology can be used to allocate the resources of a low-orbit satellite system, but at present there is no reasonable beam hopping strategy for beam scheduling.
Disclosure of Invention
The present application aims to solve at least one of the technical problems existing in the prior art. Therefore, the application provides a beam hopping scheduling method, a system, a device and a storage medium for a low-orbit satellite, which can automatically generate a beam hopping strategy to reasonably schedule the beam of the low-orbit satellite.
In one aspect, an embodiment of the present application provides a beam hopping scheduling method for a low-orbit satellite, including the following steps:
acquiring cell coverage information of a beam cluster in a low-orbit satellite system;
constructing, according to the cell coverage information, a Q value matrix table with the cells as states and the beam hopping directions as actions;
initializing a Q value in the Q value matrix table;
training the Q value matrix table according to preset training times to obtain a beam hopping strategy so that the low orbit satellite performs beam scheduling according to the beam hopping strategy;
wherein, each training of the Q value matrix table comprises the following steps:
taking the beam having traversed all cells in the beam cluster as the target state, and updating the Q value matrix table with a reinforcement learning algorithm.
According to some embodiments of the application, each training update of the Q value matrix table comprises the following steps:
selecting one action from all possible actions of the current state and executing it to obtain a next state and a reward value;
and updating the Q value of the selected action of the current state according to the maximum Q value of the next state and the reward value.
According to some embodiments of the application, selecting one action from all possible actions of the current state and executing it to obtain the next state and the reward value comprises the following steps:
selecting one action from all possible actions of the current state and executing it to obtain the number of covered users, the number of coincident beams and the next cell position corresponding to the next state;
determining a beam moving distance according to the current cell position corresponding to the current state and the next cell position;
and determining the reward value according to the number of covered users, the number of coincident beams and the beam moving distance.
According to some embodiments of the application, determining the reward value according to the number of covered users, the number of coincident beams and the beam moving distance comprises the following steps:
determining a positive correlation term of the reward value according to the number of covered users;
determining a first negative correlation term of the reward value according to the number of coincident beams;
determining a second negative correlation term of the reward value according to the beam moving distance;
and determining the reward value according to the positive correlation term, the first negative correlation term and the second negative correlation term of the reward value.
According to some embodiments of the application, the reward value is obtained by the following formula:
wherein reward represents the reward value, M represents the number of covered users, N represents the number of coincident beams, and D represents the beam moving distance.
According to some embodiments of the application, updating the Q value of the selected action of the current state according to the maximum Q value of the next state and the reward value comprises the following steps:
determining an expected Q value of the current state according to the maximum Q value of the next state and the reward value;
and updating the Q value of the selected action of the current state according to the difference between the expected Q value of the current state and the Q value of the current state before the update.
According to some embodiments of the application, the expected Q value of the current state is calculated by the following formula:
$Q(s_t', a_t') = \mathrm{reward} + \gamma \times \max\bigl(Q(s_{t+1})\bigr)$;
wherein $Q(s_t', a_t')$ represents the expected Q value of the current state, $\gamma$ represents the preset attenuation value, and $Q(s_{t+1})$ represents the Q values corresponding to all possible actions of the next state, of which the maximum is taken.
In another aspect, an embodiment of the application further provides a beam hopping scheduling system for a low-orbit satellite, comprising:
a first module, configured to acquire cell coverage information of a beam cluster in the low-orbit satellite system;
a second module, configured to construct, according to the cell coverage information, a Q value matrix table with the cells as states and the beam hopping directions as actions;
a third module, configured to initialize the Q values in the Q value matrix table;
a fourth module, configured to train the Q value matrix table a preset number of times to obtain a beam hopping strategy, so that the low-orbit satellite performs beam scheduling according to the beam hopping strategy;
wherein each training of the Q value matrix table comprises the following steps:
taking the beam having traversed all cells in the beam cluster as the target state, and updating the Q value matrix table with a reinforcement learning algorithm.
In another aspect, an embodiment of the application further provides a beam hopping scheduling device for a low-orbit satellite, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement a beam-hopping scheduling method for low-orbit satellites as previously described.
In another aspect, embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions for causing a computer to perform a beam-hopping scheduling method for low-orbit satellites as described above.
The technical scheme of the application has at least one of the following advantages or beneficial effects: a Q value matrix table is constructed with the cells in the beam cluster as states and the beam hopping directions as actions, and is initialized; the table is then updated with the beam having traversed all cells in the beam cluster as the target state, and this update process is repeated a preset number of times, so that the Q values in the table accurately reflect the environmental reward of selecting a given beam hopping direction at the current cell position. Based on the Q value matrix table, the beam hopping direction with the higher environmental reward can be selected at each cell, so that a reasonable beam hopping strategy is generated automatically and the beams of the low-orbit satellite are reasonably scheduled according to that strategy.
Drawings
FIG. 1 is a flowchart of a beam hopping scheduling method for a low-orbit satellite according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a satellite communication system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a beam hopping scheduling device for a low-orbit satellite according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.
In the description of the present application, it should be understood that the direction or positional relationship indicated with respect to the description of the orientation, such as up, down, left, right, etc., is based on the direction or positional relationship shown in the drawings, is merely for convenience of describing the present application and simplifying the description, and does not indicate or imply that the apparatus or element to be referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application.
In the description of the present application, the description of first, second, etc. is for the purpose of distinguishing between technical features only, and should not be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated or implicitly indicating the precedence of the technical features indicated.
Before the embodiments of the present application are described in further detail, the terms involved in the embodiments are explained below; these explanations apply throughout the following description.
A beam refers to the footprint formed on the earth's surface by the electromagnetic wave emitted by a satellite antenna (much as a flashlight casts a spot of light in the dark). The main types are global beams, spot beams and shaped beams, and the shape of a beam is determined by the transmitting antenna.
Cell: assume that a region is covered by K spot beam areas; each spot beam area is referred to as a cell. The communication traffic of the users in a cell is uploaded to the satellite by the gateway station and then sent down to each user by the satellite through the hopping beam. Every N spot beams form a group called a cluster, giving M clusters in total, where K = N × M.
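The relation K = N × M can be illustrated with a short sketch. The function name `group_into_clusters`, the cell labels and the choice of consecutive grouping are assumptions made for illustration only; the source does not say how cells are assigned to clusters.

```python
def group_into_clusters(cell_ids, cells_per_cluster):
    """Split K cell identifiers into consecutive clusters of N cells each."""
    if len(cell_ids) % cells_per_cluster != 0:
        raise ValueError("K must be a multiple of N")
    return [
        cell_ids[i:i + cells_per_cluster]
        for i in range(0, len(cell_ids), cells_per_cluster)
    ]


# Illustrative values only: K = 12 cells, N = 4 cells per cluster, so M = 3.
cells = [f"c{i}" for i in range(1, 13)]
clusters = group_into_clusters(cells, 4)
```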
Beam hopping is a dynamically adjustable beam technology for satellite systems. It applies the idea of time division multiplexing: the time resource of a low-orbit satellite system is divided into a number of time slots, only some of the beams work in each time slot as required, and in the next time slot the beams are dynamically scheduled according to traffic requests, so that the beams of the system "hop" to other cells; in other words, the beams are scheduled slot by slot. In a conventional multi-beam system all spot beams work simultaneously, but not every area has service demand at all times, which wastes resources. In a beam hopping system only some of the hopping beams on the satellite work simultaneously; that is, at any given moment only a small number of spot beam areas in each cluster are illuminated and in a working state. Within each beam cluster, the system hops on demand to the cells that have service requests and serves them, which greatly reduces the resource waste caused by idle channels.
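The general idea of slot-by-slot, demand-driven illumination can be sketched as follows. This illustrates the background concept only, not the Q-learning scheduling method of the application; the function `illuminate_by_demand`, the demand figures and the "largest demand first" rule are all hypothetical.

```python
def illuminate_by_demand(demand, active_beams):
    """Return the cells lit in one time slot.

    Only cells with a pending service request (demand > 0) are candidates,
    and at most `active_beams` of them are illuminated.
    """
    requesting = [cell for cell, d in demand.items() if d > 0]
    requesting.sort(key=lambda cell: demand[cell], reverse=True)
    return requesting[:active_beams]


# Hypothetical per-slot traffic requests for one cluster of four cells.
slot_demands = [
    {"c1": 5, "c2": 0, "c3": 2, "c4": 0},
    {"c1": 0, "c2": 4, "c3": 0, "c4": 1},
]
schedule = [illuminate_by_demand(d, active_beams=1) for d in slot_demands]
```

With one active beam per cluster, idle cells are never illuminated, which is the resource saving the paragraph above describes.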
AMF (Access and Mobility Management Function): a 5G core network element responsible for user access and mobility management.
NWDAF (Network Data Analytics Function): a 5G core network element responsible for providing network analysis services according to the request data of network services.
The embodiment of the application provides a beam hopping scheduling method of a low-orbit satellite, which can be applied to a satellite communication system, and referring to fig. 2, the satellite communication system comprises a satellite base station (low-orbit satellite) and a 5G core network, and the 5G core network comprises an AMF, a database and an NWDAF.
The satellite base station is used for reporting satellite positions, cell coverage information, wireless resource information and the like to the AMF;
the AMF is used for receiving satellite positions, cell coverage information and radio resource information reported by satellites, adding corresponding time stamps to the received various information and storing the information into the database; meanwhile, the AMF sends a beam hopping analysis request to the NWDAF;
the NWDAF is configured to obtain corresponding data from the database according to the beam hopping analysis request, perform beam hopping strategy training analysis according to the data, and send the obtained beam hopping strategy to the AMF;
the AMF forwards the beam hopping strategy to the corresponding satellite base station, so that the satellite base station performs beam hopping according to the received beam hopping strategy.
The beam hopping scheduling method of the low-orbit satellite in the embodiment of the application can be applied to the NWDAF of the satellite communication system and can also be applied to other network elements with data analysis functions in the satellite communication system, and the embodiment of the application is not particularly limited.
Referring to fig. 1, a beam hopping scheduling method of a low-orbit satellite according to an embodiment of the present application includes, but is not limited to, step S110, step S120, step S130, and step S140.
Step S110, obtaining cell coverage information of a beam cluster in a low-orbit satellite system;
step S120, constructing, according to the cell coverage information, a Q value matrix table with the cells as states and the beam hopping directions as actions;
step S130, initializing the Q value in the Q value matrix table;
step S140, training the Q value matrix table according to preset training times to obtain a beam hopping strategy so that the low orbit satellite performs beam scheduling according to the beam hopping strategy;
wherein each training of the Q value matrix table comprises the following steps:
taking the beam having traversed all cells in the beam cluster as the target state, and updating the Q value matrix table with a reinforcement learning algorithm.
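Steps S120 and S130 can be sketched as building a table with one row per cell and one column per hopping direction. This is a minimal sketch: the dictionary layout, the name `build_q_table` and the four cell labels are assumptions, while the four directions and the zero initial value follow the description of this embodiment.

```python
DIRECTIONS = ("up", "down", "left", "right")


def build_q_table(cells, directions=DIRECTIONS):
    """Create a Q value table: rows are cells (states), columns are
    beam hopping directions (actions), every entry initialized to 0."""
    return {cell: {direction: 0.0 for direction in directions} for cell in cells}


# Hypothetical cluster of four cells.
q_table = build_q_table(["c1", "c2", "c3", "c4"])
```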
Specifically, beam hopping is based on the idea of time division multiplexing: within a given period, each low-orbit satellite allocates a time slot to each cell of a beam cluster and schedules the beam to the corresponding cells in time-slot order, so that only the cells illuminated by the beam can communicate via the satellite. To allocate the cell time slots reasonably, the cell coverage information of the beam cluster, that is, how the cells are distributed within the beam cluster, is obtained first. Then, according to the cell coverage information, a Q value matrix table is constructed and initialized with the cells as states and the beam hopping directions as actions, as shown in Table 1.
Table 1 Initial Q value matrix table
Referring to Table 1, a state in the Q value matrix table represents a cell, an action represents the direction of the beam's next hop from the current cell, and each entry of the table is a Q value, which represents the environmental reward obtained by hopping in the corresponding direction from the corresponding cell. After the Q value matrix table is constructed, every entry is initialized to 0, and the table is then updated using Q-learning, a reinforcement learning algorithm: an initial state is selected at random, that is, the cell initially covered by the beam is selected at random, and the Q values are updated from there until the beam has covered all cells in the beam cluster, which completes one training update of the Q value matrix table. The update is repeated for the preset number of training times to obtain the trained Q value matrix table, as shown in Table 2.
Table 2 Q value matrix table after training
Assuming that cell c1 is the starting position of the beam, the next hop is determined by the action with the largest Q value among all possible actions of cell c1, which here is hopping upward. After hopping upward from cell c1 the beam reaches cell c2, where the next hop is determined in the same way, and so on until the beam has hopped to all cells, which yields the beam scheduling path. Further, time slots are allocated along the beam scheduling path according to the granularity of the time-slot division, which yields the beam hopping strategy according to which the low-orbit satellite performs beam scheduling.
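Reading a scheduling path out of a trained table can be sketched as below. Several things here are assumptions rather than part of the source: the 2 × 2 cell layout, the Q values (Table 2 is not reproduced in this text; only the values 88 for c1/up and 87 as the maximum of c2 echo the worked example), and the rule that hops leading to already visited cells are skipped so that the walk terminates.

```python
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def greedy_path(q_table, positions, start):
    """Follow the highest-Q direction from `start` until every cell is visited."""
    by_position = {pos: cell for cell, pos in positions.items()}
    path, current = [start], start
    while len(path) < len(positions):
        x, y = positions[current]
        # Candidate hops stay inside the cluster and reach an unvisited cell.
        candidates = {}
        for direction, (dx, dy) in MOVES.items():
            target = by_position.get((x + dx, y + dy))
            if target is not None and target not in path:
                candidates[direction] = target
        if not candidates:
            break  # dead end: no unvisited neighbouring cell
        best = max(candidates, key=lambda d: q_table[current][d])
        current = candidates[best]
        path.append(current)
    return path


# Hypothetical layout: c2 is above c1, c3 is right of c2, c4 is below c3.
positions = {"c1": (0, 0), "c2": (0, 1), "c3": (1, 1), "c4": (1, 0)}
trained_q = {
    "c1": {"up": 88, "down": 0, "left": 0, "right": 60},
    "c2": {"up": 0, "down": 50, "left": 0, "right": 87},
    "c3": {"up": 0, "down": 80, "left": 40, "right": 0},
    "c4": {"up": 30, "down": 0, "left": 70, "right": 0},
}
path = greedy_path(trained_q, positions, start="c1")
# One time slot per hop, in path order.
slot_plan = list(enumerate(path))
```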
It can be understood that the beam hopping directions provided by the embodiment of the application include up, down, left and right; fewer or more directions may also be used, and a beam hopping direction may be expressed as an angle relative to a reference line.
According to some embodiments of the application, each training update of the Q value matrix table comprises the following steps:
step S210, selecting one action from all possible actions of the current state and executing it to obtain the next state and a reward value;
step S220, updating the Q value of the selected action of the current state according to the maximum Q value of the next state and the reward value.
Specifically, in the first training of the Q value matrix table every entry is 0, so the maximum Q value of the next state is also 0. In this case any one of the possible next states may be selected at random, and its maximum Q value is combined with the reward value obtained by hopping from the current state to that next state to update the Q value of the selected action of the current state. The Q values of the next state are then updated in the same way, until the Q values of all states have been updated, which completes one training.
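One possible shape of the whole training procedure is sketched below, under stated assumptions: actions are chosen uniformly at random among the hops that stay inside the cluster, the cluster is a connected 2 × 2 grid, the attenuation value 0.9 is arbitrary, and the reward function passed in is a stand-in that returns only the number of users in the next cell. The learning rate 0.5 is the value used in the worked example of this description. None of the names (`train_q_table`, `reward_fn`) come from the source.

```python
import random

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def train_q_table(positions, reward_fn, episodes, alpha=0.5, gamma=0.9, seed=0):
    """Q-learning over cells: each episode ends once the beam has visited all cells."""
    rng = random.Random(seed)
    by_position = {pos: cell for cell, pos in positions.items()}

    def possible_moves(cell):
        # Directions that keep the beam inside the cluster, mapped to the target cell.
        x, y = positions[cell]
        moves = {}
        for direction, (dx, dy) in MOVES.items():
            target = by_position.get((x + dx, y + dy))
            if target is not None:
                moves[direction] = target
        return moves

    q = {cell: {d: 0.0 for d in MOVES} for cell in positions}
    for _ in range(episodes):
        current = rng.choice(sorted(positions))  # random initial cell
        visited = {current}
        while len(visited) < len(positions):
            moves = possible_moves(current)
            action = rng.choice(sorted(moves))
            nxt = moves[action]
            reward = reward_fn(current, nxt)
            best_next = max(q[nxt][d] for d in possible_moves(nxt))
            expected = reward + gamma * best_next
            q[current][action] += alpha * (expected - q[current][action])
            visited.add(nxt)
            current = nxt
    return q


# Hypothetical layout and per-cell user counts.
positions = {"c1": (0, 0), "c2": (0, 1), "c3": (1, 1), "c4": (1, 0)}
users = {"c1": 3, "c2": 8, "c3": 5, "c4": 1}
q_table = train_q_table(positions, lambda cur, nxt: users[nxt], episodes=200)
```

Directions that would leave the cluster are never selected, so their entries stay at the initial value 0.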
According to some embodiments of the present application, step S210 includes, but is not limited to, step S310, step S320, and step S330.
Step S310, selecting one action from all possible actions of the current state and executing it to obtain the number of covered users, the number of coincident beams and the next cell position corresponding to the next state;
step S320, determining a beam moving distance according to the current cell position corresponding to the current state and the next cell position;
step S330, determining the reward value according to the number of covered users, the number of coincident beams and the beam moving distance.
Specifically, including the number of covered users in the reward calculation builds resource utilization into the Q values of the Q value matrix table; including the number of coincident beams builds in the influence of beam interference; and including the beam moving distance builds in the path length. A beam hopping strategy derived from the Q value matrix table therefore takes resource utilization, beam interference and path length into account, and a low-orbit satellite that schedules its beams according to this strategy can optimize resource utilization and reduce beam interference.
According to some embodiments of the present application, step S330 includes, but is not limited to, step S410, step S420, step S430, and step S440.
Step S410, determining a positive correlation term of the reward value according to the number of covered users;
step S420, determining a first negative correlation term of the reward value according to the number of coincident beams;
step S430, determining a second negative correlation term of the reward value according to the beam moving distance;
step S440, determining the reward value according to the positive correlation term, the first negative correlation term and the second negative correlation term of the reward value.
Specifically, the positive correlation term means that the reward value increases as the number of covered users increases; the first negative correlation term means that the reward value increases as the number of coincident beams decreases; and the second negative correlation term means that the reward value increases as the beam moving distance decreases. The three terms are added to obtain the reward value. A larger reward value indicates higher resource utilization, less beam interference and a shorter scheduling path, and correspondingly so does a larger Q value. Therefore, when the next hop direction of the beam from the current cell is selected according to the Q value matrix table, the direction with the largest Q value should be chosen.
It should be noted that, in the embodiment of the present application, the reward value may be determined from the positive correlation term alone, that is, considering only the influence of resource utilization; from the first negative correlation term alone, that is, considering only the influence of beam interference; from the second negative correlation term alone, that is, considering only the influence of the scheduling distance; or from any two of the positive correlation term, the first negative correlation term and the second negative correlation term.
In some embodiments, the positive correlation term, the first negative correlation term and the second negative correlation term of the reward value may be weighted according to the importance of the different influencing factors. For example, if beam scheduling mainly considers resource utilization, with beam interference and the scheduling path as secondary considerations, the weights of the three terms may be 0.7, 0.2 and 0.1 respectively; each term is multiplied by its weight and the products are added to obtain the reward value.
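A weighted reward of this kind can be sketched as follows. The exact form of each term is not recoverable from this text, so the sketch assumes the simplest shapes consistent with the description: the positive term is the user count itself and the two negative terms are the negated beam count and the negated distance. The Euclidean distance between cell positions and the function names are likewise assumptions.

```python
import math


def beam_distance(current_pos, next_pos):
    """Beam moving distance, assumed here to be the Euclidean distance between cells."""
    return math.dist(current_pos, next_pos)


def reward_value(users, coincident_beams, distance, weights=(1.0, 1.0, 1.0)):
    """Sum of one positive and two negative correlation terms, optionally weighted."""
    w_users, w_beams, w_distance = weights
    positive_term = users                    # grows with the number of covered users
    first_negative_term = -coincident_beams  # shrinks as more beams coincide
    second_negative_term = -distance         # shrinks as the beam moves further
    return (
        w_users * positive_term
        + w_beams * first_negative_term
        + w_distance * second_negative_term
    )


d = beam_distance((0, 0), (0, 1))
unweighted = reward_value(users=10, coincident_beams=2, distance=d)
weighted = reward_value(users=10, coincident_beams=2, distance=d,
                        weights=(0.7, 0.2, 0.1))
```

Setting a weight to 0 drops the corresponding term, which covers the variants in which only one or two of the three factors are considered.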
Specifically, the reward value calculation formula may be as shown in formula (1):
wherein reward represents the reward value, M represents the number of covered users, N represents the number of coincident beams, and D represents the beam moving distance.
According to some embodiments of the application, step S220 includes, but is not limited to, step S510 and step S520.
Step S510, determining the expected Q value of the current state according to the maximum Q value of the next state and the reward value;
step S520, updating the Q value of the selected action of the current state according to the difference between the expected Q value of the current state and the Q value of the current state before the update.
Specifically, take continued training of the Q value matrix table in Table 2 as an example. The Q value of the upward action in the current state c1 is 88, and taking the upward action leads from c1 to the next state c2. The reward value is determined from information such as the number of covered users, the number of coincident beams and the beam moving distance for cell c2, and the maximum Q value of the next state c2, 87, is obtained by table lookup. Multiplying this maximum Q value by the preset attenuation value and adding the reward value gives the expected Q value, assumed here to be 90. The difference between the expected Q value 90 and the estimated Q value 88 for taking the upward action in the current state c1 is then calculated; this difference is multiplied by the learning rate 0.5 and added to the original estimate 88, which gives the updated Q value 89 for taking the upward action in state c1.
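The arithmetic of this example can be checked with a few lines. The source gives the expected Q value 90 only as an assumption and does not state the reward or the attenuation value behind it; the reward 46.5 and attenuation 0.5 below are hypothetical values picked solely so that the expected Q value comes out at 90. The other entries of the c2 row are also invented.

```python
def expected_q(reward, gamma, next_q_values):
    """Expected Q value: reward plus the attenuated maximum Q value of the next state."""
    return reward + gamma * max(next_q_values)


def updated_q(current_q, expected, learning_rate):
    """Move the current estimate toward the expected Q value by the learning rate."""
    return current_q + learning_rate * (expected - current_q)


# Next state c2 has maximum Q value 87 (from the example); other entries are made up.
expected = expected_q(reward=46.5, gamma=0.5, next_q_values=[87, 40, 0, 0])
new_q = updated_q(current_q=88, expected=expected, learning_rate=0.5)
```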
According to some embodiments of the application, the expected Q value for the current state is calculated by the following formula:
$Q(s_t', a_t') = \mathrm{reward} + \gamma \times \max\bigl(Q(s_{t+1})\bigr)$;
wherein $Q(s_t', a_t')$ represents the expected Q value of the current state, $\gamma$ represents the preset attenuation value, and $Q(s_{t+1})$ represents the Q values corresponding to all possible actions of the next state, of which the maximum is taken.
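Taken together with step S520, the update can be written compactly as below. The symbol $\alpha$ for the learning rate and the assignment arrow are notational assumptions; the source names the learning rate only through the value 0.5 in its worked example.

```latex
\begin{aligned}
Q(s_t', a_t') &= \mathrm{reward} + \gamma \times \max\bigl(Q(s_{t+1})\bigr) \\
Q(s_t, a_t) &\leftarrow Q(s_t, a_t) + \alpha \bigl[\, Q(s_t', a_t') - Q(s_t, a_t) \,\bigr]
\end{aligned}
```

Here $Q(s_t, a_t)$ is the Q value of the selected action of the current state before the update, so the bracket is the difference referred to in step S520.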
An embodiment of the application also provides a beam hopping scheduling system for a low-orbit satellite, comprising:
a first module, configured to acquire cell coverage information of a beam cluster in the low-orbit satellite system;
a second module, configured to construct a Q value matrix table according to the cell coverage information, taking the cell as the state and the beam hopping direction as the behavior;
a third module, configured to initialize the Q values in the Q value matrix table;
a fourth module, configured to train the Q value matrix table for a preset number of training runs to obtain a beam hopping strategy, so that the low-orbit satellite performs beam scheduling according to the beam hopping strategy;
wherein each training of the Q value matrix table comprises the following step:
taking the traversal of all cells in the beam cluster by the beam as the target state, and updating the Q value matrix table by adopting a reinforcement learning algorithm.
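The four modules can be sketched end to end as a small tabular training loop. Everything concrete here is an assumption made for illustration: the 2x3 cell layout, the user counts, the epsilon-greedy selection rule, the hyperparameters, and in particular `placeholder_reward`, which only mimics the stated trend (higher with more covered users, lower with more coincident beams and longer moves) and is not the reward formula of the application.

```python
import random

# Hypothetical 2x3 beam cluster: cell -> (row, col) position, and covered users per cell.
POSITIONS = {"c1": (0, 0), "c2": (0, 1), "c3": (0, 2),
             "c4": (1, 0), "c5": (1, 1), "c6": (1, 2)}
USERS = {"c1": 5, "c2": 9, "c3": 2, "c4": 7, "c5": 4, "c6": 8}
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
CELL_AT = {pos: cell for cell, pos in POSITIONS.items()}


def neighbours(cell):
    """Map each valid hop direction of a cell to the cell it leads to."""
    row, col = POSITIONS[cell]
    result = {}
    for action, (d_row, d_col) in MOVES.items():
        target = CELL_AT.get((row + d_row, col + d_col))
        if target is not None:
            result[action] = target
    return result


def placeholder_reward(current, nxt, overlaps=0):
    """Illustrative only: rises with covered users, falls with overlaps and distance."""
    (r0, c0), (r1, c1) = POSITIONS[current], POSITIONS[nxt]
    distance = abs(r0 - r1) + abs(c0 - c1)
    return USERS[nxt] - overlaps - distance


def train(episodes=200, gamma=0.9, learning_rate=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # States are cells, behaviors are hop directions; every Q value starts at 0.
    q_table = {cell: {a: 0.0 for a in neighbours(cell)} for cell in POSITIONS}
    for _ in range(episodes):
        state, visited, steps = "c1", {"c1"}, 0
        # A run ends once the beam has visited every cell; the step cap is a safety net.
        while len(visited) < len(POSITIONS) and steps < 1000:
            options = neighbours(state)
            if rng.random() < epsilon:  # assumed exploration rule
                action = rng.choice(sorted(options))
            else:
                action = max(options, key=lambda a: q_table[state][a])
            nxt = options[action]
            reward = placeholder_reward(state, nxt)
            expected = reward + gamma * max(q_table[nxt].values())
            q_table[state][action] += learning_rate * (expected - q_table[state][action])
            state = nxt
            visited.add(state)
            steps += 1
    return q_table


q_table = train()
# The hopping strategy picks, for each cell, the direction with the largest Q value.
policy = {cell: max(actions, key=actions.get) for cell, actions in q_table.items()}
print(policy)
```

Storing only the valid directions per cell keeps edge cells from ever selecting a hop that leaves the cluster; a full matrix with masked entries would serve equally well.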
It can be understood that the content of the embodiments of the beam hopping scheduling method for the low-orbit satellite is applicable to this system embodiment. The functions implemented by the system embodiment are the same as those of the method embodiments, and the beneficial effects achieved are likewise the same.
Referring to Fig. 3, Fig. 3 is a schematic diagram of a beam hopping scheduling device for a low-orbit satellite according to an embodiment of the present application. The beam hopping scheduling device comprises one or more control processors and a memory; Fig. 3 takes one control processor and one memory as an example.
The control processor and the memory may be connected by a bus or by other means; Fig. 3 takes a bus connection as an example.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located from the control processor, the remote memory being connectable to the beam-hopping scheduler of the low-orbit satellite via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It will be appreciated by those skilled in the art that the arrangement shown in Fig. 3 does not limit the beam hopping scheduling device of the low-orbit satellite, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The non-transitory software programs and instructions required to implement the beam hopping scheduling method applied to the beam hopping scheduling device of the low-orbit satellite in the above embodiments are stored in the memory; when executed by the control processor, they carry out that method.
In addition, an embodiment of the present application further provides a computer readable storage medium storing computer executable instructions which, when executed by one or more control processors, cause the one or more control processors to perform the beam hopping scheduling method of the low-orbit satellite in the method embodiments.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is known to those skilled in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media.
The embodiments of the present application have been described in detail with reference to the accompanying drawings, but the present application is not limited to the above embodiments, and various changes can be made within the knowledge of one of ordinary skill in the art without departing from the spirit of the present application.

Claims (8)

1. A beam hopping scheduling method for a low-orbit satellite, characterized by comprising the following steps:
acquiring cell coverage information of a beam cluster in a low-orbit satellite system;
according to the cell coverage information, constructing a Q value matrix table by taking the cell as a state and taking the beam hopping direction as a behavior;
initializing the Q values in the Q value matrix table;
training the Q value matrix table for a preset number of training runs to obtain a beam hopping strategy, so that the low-orbit satellite performs beam scheduling according to the beam hopping strategy;
wherein updating the Q value matrix table in each training comprises the following step:
taking the traversal of all cells in the beam cluster by the beam as the target state, and updating the Q value matrix table by adopting a reinforcement learning algorithm; wherein the updating the Q value matrix table by adopting the reinforcement learning algorithm comprises the following steps:
selecting one behavior from all possible behaviors of the current state and executing it, and obtaining the number of coverage users, the number of coincident beams and the next cell position corresponding to the next state;
determining a beam moving distance according to the current cell position corresponding to the current state and the next cell position;
determining a reward value according to the number of coverage users, the number of coincident beams and the beam moving distance;
and updating the Q value of the selected behavior of the current state according to the maximum Q value of the next state and the reward value.
2. The beam hopping scheduling method for a low-orbit satellite as claimed in claim 1, wherein said determining a reward value according to the number of coverage users, the number of coincident beams and the beam moving distance comprises the following steps:
determining a reward value positive correlation term according to the number of coverage users;
determining a first reward value negative correlation term according to the number of coincident beams;
determining a second reward value negative correlation term according to the beam moving distance;
and determining the reward value according to the reward value positive correlation term, the first reward value negative correlation term and the second reward value negative correlation term.
3. The beam hopping scheduling method for a low-orbit satellite as claimed in claim 1, wherein the reward value is obtained by the following equation:
wherein reward represents the reward value, M represents the number of coverage users, N represents the number of coincident beams, and D represents the beam moving distance.
4. The beam hopping scheduling method for a low-orbit satellite as claimed in claim 3, wherein said updating the Q value of the selected behavior of the current state according to the maximum Q value of the next state and the reward value comprises the following steps:
determining an expected Q value of the current state according to the maximum Q value of the next state and the reward value;
and updating the Q value of the selected behavior of the current state according to the difference between the expected Q value of the current state and the Q value of the current state before updating.
5. The beam hopping scheduling method for a low-orbit satellite as claimed in claim 4, wherein the expected Q value of the current state is calculated by the following formula:
Q(s_t', a_t') = reward + gamma × arg(max(Q(s_{t+1})));
wherein Q(s_t', a_t') represents the expected Q value of the current state, gamma represents the preset attenuation value, and Q(s_{t+1}) represents the Q values corresponding to all possible behaviors of the next state.
6. A beam hopping scheduling system for a low-orbit satellite, comprising:
a first module, configured to acquire cell coverage information of a beam cluster in the low-orbit satellite system;
a second module, configured to construct a Q value matrix table according to the cell coverage information, taking the cell as a state and the beam hopping direction as a behavior;
a third module, configured to initialize the Q values in the Q value matrix table;
a fourth module, configured to train the Q value matrix table for a preset number of training runs to obtain a beam hopping strategy, so that the low-orbit satellite performs beam scheduling according to the beam hopping strategy;
wherein each training of the Q value matrix table comprises the following step:
taking the traversal of all cells in the beam cluster by the beam as the target state, and updating the Q value matrix table by adopting a reinforcement learning algorithm; wherein the updating the Q value matrix table by adopting the reinforcement learning algorithm comprises the following steps:
selecting one behavior from all possible behaviors of the current state and executing it, and obtaining the number of coverage users, the number of coincident beams and the next cell position corresponding to the next state;
determining a beam moving distance according to the current cell position corresponding to the current state and the next cell position;
determining a reward value according to the number of coverage users, the number of coincident beams and the beam moving distance;
and updating the Q value of the selected behavior of the current state according to the maximum Q value of the next state and the reward value.
7. A beam hopping scheduling device for a low-orbit satellite, comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the beam hopping scheduling method of the low-orbit satellite as claimed in any one of claims 1 to 5.
8. A computer readable storage medium in which a processor executable program is stored, wherein the processor executable program is for implementing the low orbit satellite beam hopping scheduling method according to any one of claims 1 to 5 when executed by the processor.
CN202211014921.4A 2022-08-23 2022-08-23 Wave beam jumping scheduling method, system and device for low orbit satellite and storage medium Active CN115483960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211014921.4A CN115483960B (en) 2022-08-23 2022-08-23 Wave beam jumping scheduling method, system and device for low orbit satellite and storage medium

Publications (2)

Publication Number Publication Date
CN115483960A CN115483960A (en) 2022-12-16
CN115483960B true CN115483960B (en) 2023-08-29

Family

ID=84422279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211014921.4A Active CN115483960B (en) 2022-08-23 2022-08-23 Wave beam jumping scheduling method, system and device for low orbit satellite and storage medium

Country Status (1)

Country Link
CN (1) CN115483960B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118019016B (en) * 2024-04-10 2024-07-05 成都爱瑞无线科技有限公司 Method, device and storage medium for managing jumping beam for satellite communication

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2632061A1 (en) * 2012-02-27 2013-08-28 Agence Spatiale Européenne A method and a system of providing multi-beam coverage of a region of interest in multi-beam satellite communication.
EP2634923A1 (en) * 2012-02-29 2013-09-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Interference Mitigation in a Communication System
CN107949066A (en) * 2017-11-21 2018-04-20 西安空间无线电技术研究所 A kind of ripple position resource flexible scheduling system and dispatching method towards beam-hopping
CN108966352A (en) * 2018-07-06 2018-12-07 北京邮电大学 Dynamic beam dispatching method based on depth enhancing study
CN109743735A (en) * 2018-12-18 2019-05-10 北京邮电大学 A kind of dynamic channel assignment method based on depth enhancing study in satellite communication system
CN111211831A (en) * 2020-01-13 2020-05-29 东方红卫星移动通信有限公司 Multi-beam low-orbit satellite intelligent dynamic channel resource allocation method
WO2020144572A1 (en) * 2019-01-11 2020-07-16 Telefonaktiebolaget Lm Ericsson (Publ) Apparatus and method for facilitating index-based positioning in a non-terrestrial network
CN111970047A (en) * 2020-08-25 2020-11-20 桂林电子科技大学 LEO satellite channel allocation method based on reinforcement learning
WO2021134058A1 (en) * 2019-12-28 2021-07-01 Hughes Network Systems, Llc System and method of traffic-based classification of iot devices and dynamic allocation of link resources to iot devices
CN113572517A (en) * 2021-07-30 2021-10-29 哈尔滨工业大学 Beam hopping resource allocation method, system, storage medium and equipment based on deep reinforcement learning
WO2022015008A1 (en) * 2020-07-13 2022-01-20 Samsung Electronics Co., Ltd. Method and system for determining target cell for handover of ue
WO2022066147A1 (en) * 2020-09-22 2022-03-31 Viasat, Inc. Techniques for switching between operating modes of beamforming systems and satellites
CN114362810A (en) * 2022-01-11 2022-04-15 重庆邮电大学 Low-orbit satellite beam hopping optimization method based on migration depth reinforcement learning
CN114499629A (en) * 2021-12-24 2022-05-13 南京邮电大学 Dynamic resource allocation method for beam-hopping satellite system based on deep reinforcement learning
CN114599117A (en) * 2022-03-07 2022-06-07 中国科学院微小卫星创新研究院 Dynamic configuration method for backspacing resources in random access of low earth orbit satellite network
CN114629547A (en) * 2022-03-19 2022-06-14 西安电子科技大学 High-throughput beam hopping scheduling method for differentiated services
CN114665952A (en) * 2022-03-24 2022-06-24 重庆邮电大学 Low-orbit satellite network beam hopping optimization method based on satellite-ground fusion architecture
CN114727422A (en) * 2022-03-07 2022-07-08 中国科学院微小卫星创新研究院 Dynamic configuration method for channel resources in random access of low-orbit satellite network
CN114900897A (en) * 2022-05-17 2022-08-12 中国人民解放军国防科技大学 Multi-beam satellite resource allocation method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11165491B2 (en) * 2018-12-31 2021-11-02 Hughes Network Systems, Llc Location management for satellite systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于强化学习的卫星通信资源分配算法研究;刘召;《中国优秀硕士学位论文全文数据库-信息科技辑》;全文 *

Also Published As

Publication number Publication date
CN115483960A (en) 2022-12-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant