CN116506863A - Decision optimization method, decision optimization device, electronic equipment and readable storage medium


Info

Publication number: CN116506863A
Application number: CN202210053765.6A
Authority: CN
Other languages: Chinese (zh)
Prior art keywords: network state, target, state information, network, action decision
Legal status: Pending (the legal status is an assumption, not a legal conclusion)
Inventors: 邓娟 (Deng Juan), 刘光毅 (Liu Guangyi)
Assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute, with priority to CN202210053765.6A; published as CN116506863A.


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/22Traffic simulation tools or models
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/06Testing, supervising or monitoring using simulated traffic
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Abstract

The application provides a decision optimization method, a decision optimization device, electronic equipment and a readable storage medium. The decision optimization method applied to a base station comprises the following steps: acquiring target network state information and sending the target network state information to a network node; receiving a network state information set sent by the network node, wherein the network state information set comprises the target network state information and at least one derived network state information, and the derived network state information is determined based on the target network state information; determining an action decision set according to the network state information set, and sending the action decision set to the network node, wherein the action decision set comprises at least one action decision; and receiving a target action decision sent by the network node and executing the target action decision, wherein the target action decision is determined based on the action decision set. The method and the device can improve the reliability of system decisions.

Description

Decision optimization method, decision optimization device, electronic equipment and readable storage medium
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a decision optimization method, a decision optimization device, electronic equipment and a readable storage medium.
Background
An agent is an entity with sensing, interaction and autonomous decision-making capabilities, and can be used in wireless communications. The physical network interacts with the agent: the agent acquires the network state, determines an action to execute according to a certain policy, and then adjusts the policy according to the network's feedback on the executed action, continuously updating and optimizing until the optimal policy is determined.
However, during the interaction between an agent and a network there is often a certain delay, for example in acquiring the network state or in determining and issuing action decisions. As a result, the decision the agent executes may no longer be optimal for the current network state, and the reliability of the system is reduced.
Disclosure of Invention
An embodiment of the application aims to provide a decision optimization method, a decision optimization device, electronic equipment and a readable storage medium, which solve the problem of low reliability of system decisions in the prior art.
In a first aspect, an embodiment of the present application provides a decision optimization method, which is applied to a base station, and includes:
acquiring target network state information and sending the target network state information to a network node;
receiving a network state information set sent by the network node, wherein the network state information set comprises the target network state information and at least one derived network state information, and the derived network state information is determined based on the target network state information;
determining an action decision set according to the network state information set, and sending the action decision set to the network node, wherein the action decision set comprises at least one action decision;
and receiving a target action decision sent by the network node, and executing the target action decision, wherein the target action decision is determined based on the action decision set.
Optionally, before the obtaining the target network state information and sending the target network state information to a network node, the method further includes:
receiving the requirement information of the network state parameters sent by the network node, wherein the requirement information of the network state parameters is determined based on a target evaluation index, and the target evaluation index is determined based on a network optimization intention;
determining a target network state parameter according to the requirement information of the network state parameter;
the obtaining the target network state information includes:
acquiring the parameter value of the target network state parameter to obtain the target network state information.
Optionally, the determining an action decision set according to the network state information set includes:
inputting first network state information into a reinforcement learning algorithm model to obtain a first action decision, wherein the first network state information is any network state information in the network state information set, and the action decision set comprises the first action decision.
Optionally, after the performing the target action decision, the method further comprises:
acquiring target network performance information after the target action decision is executed;
determining target feedback information according to the target network performance information;
generating a training sample based on the target network state information, the target action decision and the target feedback information;
and updating model parameters of the reinforcement learning algorithm model based on the training samples.
In a second aspect, an embodiment of the present application provides a decision optimization method, applied to a network node, including:
receiving target network state information sent by a base station;
determining at least one derived network state information according to the target network state information, and sending a network state information set to the base station, wherein the network state information set comprises the target network state information and the at least one derived network state information;
receiving an action decision set sent by the base station, wherein the action decision set is determined based on the network state information set;
and determining a target action decision according to the action decision set, and sending the target action decision to the base station.
Optionally, before the receiving the target network state information sent by the base station, the method further includes:
determining a target evaluation index, wherein the target evaluation index is determined based on the network optimization intention;
according to the target evaluation index, determining the requirement information of the network state parameters;
transmitting the requirement information of the network state parameters to the base station;
wherein the target network state information is acquired based on the requirement information of the network state parameters.
Optionally, the determining the requirement information of the network state parameter according to the target evaluation index includes:
and determining the requirement information of the network state parameters corresponding to the target evaluation indexes based on a pre-configured knowledge graph, wherein the knowledge graph stores the association relation between the evaluation indexes and the network state parameters.
Optionally, the determining at least one derived network state information according to the target network state information includes:
determining a virtual twin network corresponding to a physical network based on the target network state information;
at least one derived network state information within a target time window is predicted based on the virtual twin network.
Optionally, the determining a target action decision according to the action decision set includes:
based on the virtual twin network, performing simulation on each action decision in the action decision set to obtain a simulation result corresponding to each action decision;
evaluating the simulation result to obtain evaluation information corresponding to the simulation result;
and determining a target action decision according to the evaluation information.
Optionally, the performing, based on the virtual twin network, a simulation on each action decision in the action decision set to obtain a simulation result corresponding to each action decision includes:
based on the virtual twin network, performing simulation on a second action decision and each piece of network state information in the network state information set to obtain a simulation result corresponding to the second action decision and each piece of network state information;
wherein the second action decision is any action decision in the action decision set.
In a third aspect, an embodiment of the present application provides a decision optimization apparatus, including:
the first acquisition module is used for acquiring target network state information and sending the target network state information to a network node;
a first receiving module, configured to receive a network state information set sent by the network node, where the network state information set includes the target network state information and at least one derived network state information, and the derived network state information is determined based on the target network state information;
a first determining module, configured to determine a set of action decisions according to the set of network state information, and send the set of action decisions to the network node, where the set of action decisions includes at least one action decision;
and the second receiving module is used for receiving the target action decision sent by the network node and executing the target action decision, and the target action decision is determined based on the action decision set.
Optionally, the apparatus further comprises:
the third receiving module is used for receiving the requirement information of the network state parameters sent by the network node, wherein the requirement information of the network state parameters is determined based on a target evaluation index, and the target evaluation index is determined based on the network optimization intention;
The second determining module is used for determining a target network state parameter according to the requirement information of the network state parameter;
the first acquisition module is used for:
and acquiring the parameter value of the target network state parameter to obtain the target network state information.
Optionally, the first determining module is configured to:
inputting first network state information into a reinforcement learning algorithm model to obtain a first action decision, wherein the first network state information is any network state information in the network state information set, and the action decision set comprises the first action decision.
Optionally, the apparatus further comprises:
the second acquisition module is used for acquiring the target network performance information after the target action decision is executed;
the third determining module is used for determining target feedback information according to the target network performance information;
the generation module is used for generating a training sample based on the target network state information, the target action decision and the target feedback information;
and the optimization module is used for updating the model parameters of the reinforcement learning algorithm model based on the training samples.
In a fourth aspect, an embodiment of the present application provides a decision optimization apparatus, including:
A fourth receiving module, configured to receive target network state information sent by the base station;
a fourth determining module, configured to determine at least one derived network state information according to the target network state information, and send a network state information set to the base station, where the network state information set includes the target network state information and the at least one derived network state information;
a fifth receiving module, configured to receive an action decision set sent by the base station, where the action decision set is determined based on the network state information set;
and a fifth determining module, configured to determine a target action decision according to the action decision set, and send the target action decision to the base station.
Optionally, the apparatus further comprises:
a sixth determining module, configured to determine a target evaluation index, where the target evaluation index is determined based on the network optimization intention;
a seventh determining module, configured to determine, according to the target evaluation index, requirement information of a network state parameter;
the first sending module is used for sending the requirement information of the network state parameters to the base station;
wherein the target network state information is acquired based on the requirement information of the network state parameters.
Optionally, the seventh determining module is configured to:
and determining the requirement information of the network state parameters corresponding to the target evaluation indexes based on a pre-configured knowledge graph, wherein the knowledge graph stores the association relation between the evaluation indexes and the network state parameters.
Optionally, the fourth determining module includes:
a first determining unit, configured to determine a virtual twin network corresponding to a physical network based on the target network state information;
and a second determining unit, configured to predict at least one derived network state information within a target time window based on the virtual twin network.
Optionally, the fifth determining module includes:
the simulation unit is used for performing simulation on each action decision in the action decision set based on the virtual twin network to obtain a simulation result corresponding to each action decision;
the evaluation unit is used for evaluating the simulation result to obtain evaluation information corresponding to the simulation result;
and the third determining unit is used for determining a target action decision according to the evaluation information.
Optionally, the simulation unit is configured to:
based on the virtual twin network, performing simulation on a second action decision and each piece of network state information in the network state information set to obtain a simulation result corresponding to the second action decision and each piece of network state information;
wherein the second action decision is any action decision in the action decision set.
In a fifth aspect, embodiments of the present application provide an electronic device comprising a transceiver, a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program implementing the steps of the decision optimization method as provided in the first aspect when executed by the processor; alternatively, the computer program when executed by the processor implements the steps of the decision optimization method as provided in the second aspect.
In a sixth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the decision optimization method as provided in the first aspect; alternatively, the computer program when executed by the processor implements the steps of the decision optimization method as provided in the second aspect.
In the embodiments of the application, after the base station acquires the target network state information from the physical network, it can send that information to the network node; the network node integrates the target network state information and derives the network states over a future period of time, so that the base station agent can consider both the target network state information and the derived network state information when determining the action decision set, from which the network node then comprehensively determines the target action decision to be executed by the base station. The time granularity at which the agent determines action decisions is thus finer, and, compared with an action decision determined from the target network state information alone, the target action decision better fits the real state of the physical network. The system decision process is thereby optimized, and the reliability and security of system decisions are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of a network system to which embodiments of the present application are applicable;
FIG. 2 is one of the flowcharts of a decision optimization method provided in an embodiment of the present application;
FIG. 3 is a second flowchart of a decision optimization method provided in an embodiment of the present application;
FIG. 4 is one of the interaction schematic diagrams of a decision optimization method provided in an embodiment of the present application;
FIG. 5 is a second interaction schematic diagram of a decision optimization method provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a codebook space design of a Massive MIMO antenna according to an embodiment of the present application;
FIG. 7 is a block diagram of a decision optimizing apparatus according to an embodiment of the present application;
FIG. 8 is a second block diagram of a decision optimizing apparatus according to an embodiment of the present application;
FIG. 9 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following clearly and fully describes the embodiments of the present application with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without inventive effort fall within the scope of the present application.
The terms "first," "second," and the like in this application are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Furthermore, the use of "and/or" in this application means at least one of the connected objects, such as a and/or B and/or C, is meant to encompass the 7 cases of a alone, B alone, C alone, and both a and B, both B and C, both a and C, and both A, B and C.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the related art, an agent in wireless communication can acquire the real-time network state of a physical network, determine an action according to a certain policy, and then adjust the policy according to the physical network's feedback on the executed action, so as to continuously update and optimize. Specifically, the agent may learn the action value functions of adopting different action decisions under different network states, and determine the action decision to be executed. After the action decision is executed, a reward may be calculated by a reward function, resulting in a <state, action, reward> tuple, and the model parameters are optimized based on this tuple.
However, the time granularity at which the agent acquires network state parameters and generates action decisions is limited by the interaction and transmission between the agent and the physical network: one or more network states occurring between two successive action decisions cannot be optimized by the currently determined action decision, which reduces the reliability of action decisions and increases security risk.
The embodiment of the application provides a decision optimization method. Referring to fig. 1, fig. 1 is a block diagram of a network system to which an embodiment of the present application is applicable. As shown in fig. 1, the network system includes a base station 11 and a network node 12, and communication is possible between the base station 11 and the network node 12. The base station 11 may be any base station in a physical network, and an agent is deployed in the base station 11.
The network node 12 may be a centralized network node in the physical network, which may also be referred to as a centralized management network node or a core network node; it may specifically be a wireless network management device or another centralized network node. The network node 12 may interact with one or more base stations 11 and integrate and process the information they transmit, so as to assist the base station agent in optimizing decisions, thereby improving the reliability and security of system decision execution.
Embodiments of the present application will be described below from the perspective of a base station and a network node, respectively.
Referring to fig. 2, fig. 2 is one of flowcharts of a decision optimization method according to an embodiment of the present application, where the decision optimization method may be applied to a base station.
As shown in fig. 2, the decision optimization method may include the steps of:
step 201, obtaining target network state information, and sending the target network state information to a network node.
It should be noted that the embodiments of the application take decision optimization for the physical network in a target area as an example; decision optimization for the global physical network may also be performed with reference to these embodiments.
The target network state information may include parameter values of one or more network state parameters in the physical network. The network state parameters may include static parameters, such as base station location and number, cell number, frequency band, carrier frequency bandwidth, scene channel type and scene geographic information, and dynamic parameters, such as base station switching, base station antenna configuration and user location distribution, which are not limited here.
In specific implementations, the base station may preset one or more network state parameters for describing the network state, and acquire the target network state information by collecting the parameter values of these preset network state parameters. Alternatively, the base station may determine the network state parameters to be collected based on requirement information sent by the network node, and acquire the target network state information by collecting the parameter values of the required network state parameters. The embodiment for determining the network state parameters may be chosen according to the actual situation and is not specifically limited here.
After acquiring the target network state information, the base station may send the target network state information to the network node in real time, or may periodically send the target network state information to the network node.
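As a concrete illustration, a minimal sketch follows of one possible shape for the target network state information described above, split into the static and dynamic parameters named in the text; all field names and values are illustrative assumptions, not taken from the patent.

```python
# Hypothetical container for target network state information; the fields
# mirror the static/dynamic parameters listed above but are assumptions.
from dataclasses import dataclass, field

@dataclass
class TargetNetworkState:
    base_station_id: str
    # static parameters
    cell_ids: list[int] = field(default_factory=list)
    frequency_band: str = "n78"
    carrier_bandwidth_mhz: float = 100.0
    # dynamic parameters
    antenna_config: dict = field(default_factory=dict)
    user_positions: list[tuple[float, float]] = field(default_factory=list)

# The base station would send such a record to the network node in real
# time or periodically, as described above.
state = TargetNetworkState(base_station_id="gNB-0001", cell_ids=[1, 2, 3])
```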
Step 202, receiving a network state information set sent by a network node.
The network state information set comprises the target network state information and at least one derived network state information; the derived network state information is determined based on the target network state information.
After receiving the target network state information sent by the base station, the network node may determine at least one derived network state information based on the target network state information, and generate a network state information set.
The derived network state information may represent information of network states that may occur in the physical network during the period after the base station executes the previous action decision and before it executes the next one. For convenience of description, this period is denoted as the target time window. In specific implementations, the network node may predict the network states within the target time window based on the target network state information and stored historical network state information, and determine the information of each network state.
It should be noted that when the network node interacts with multiple base stations, it may acquire the target network state information sent by each of them and comprehensively determine a full state information set based on the multiple pieces of target network state information and the actual situation of each base station. For a target base station, the network node may send the full state information set, only the subset related to the target base station, or only the subset related to the target base station and its neighboring base stations; this may be determined according to the actual situation, for example based on the algorithm design of the agent in the target base station, and is not limited here.
Step 203, determining an action decision set according to the network state information set, and sending the action decision set to the network node.
Wherein the set of action decisions comprises at least one action decision.
Specifically, the base station may determine one action decision for each piece of network state information in the network state information set, finally obtaining the action decision set.
In an alternative embodiment, the base station may determine an action decision for each piece of network state information in the set based on a reinforcement learning algorithm. Specifically, the base station agent is deployed with a reinforcement learning algorithm model. Taking the first network state information as an example: when the first network state information is input into the reinforcement learning algorithm model, the model can determine the action values of different candidate action decisions based on its policy and, based on these action values, output the optimal action decision, i.e. the first action decision.
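To make the per-state inference concrete, here is a minimal sketch of how an agent might map every network state in the received set to an action decision. The linear Q-model, the epsilon-greedy rule and all names are illustrative assumptions, since the patent does not fix a particular reinforcement learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

class QNetwork:
    """Toy action-value model: scores each candidate action for a state."""
    def __init__(self, state_dim: int, num_actions: int):
        self.w = rng.normal(size=(state_dim, num_actions))

    def action_values(self, state: np.ndarray) -> np.ndarray:
        return state @ self.w  # one value per candidate action decision

def decide_actions(model: QNetwork, state_set: list[np.ndarray],
                   epsilon: float = 0.1) -> list[int]:
    """For every network state in the set, pick the greedy action
    (with epsilon-greedy exploration) -> the action decision set."""
    decisions = []
    for state in state_set:
        if rng.random() < epsilon:                      # explore
            decisions.append(int(rng.integers(model.w.shape[1])))
        else:                                           # exploit
            decisions.append(int(np.argmax(model.action_values(state))))
    return decisions

# Example: target state S_t plus two derived states S'_1, S'_2
states = [rng.normal(size=8) for _ in range(3)]
model = QNetwork(state_dim=8, num_actions=4)
print(decide_actions(model, states))   # e.g. [2, 0, 2]
```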
Step 204, receiving the target action decision sent by the network node, and executing the target action decision.
Wherein the target action decision is determined based on the set of action decisions.
After receiving the action decision set sent by the base station, the network node comprehensively considers the action decisions in the set and selects the final action decision, i.e. the target action decision, to send to the base station.
In the embodiments of the application, after the base station acquires the target network state information from the physical network, it can send that information to the network node; the network node integrates the target network state information and derives the network states over a future period of time, so that the base station agent can consider both the target network state information and the derived network state information when determining the action decision set, from which the network node then comprehensively determines the target action decision to be executed by the base station. The time granularity at which the agent determines action decisions is thus finer, and, compared with an action decision determined from the target network state information alone, the target action decision better fits the real state of the physical network. The system decision process is thereby optimized, and the reliability and security of system decisions are improved.
The following is a further description of the embodiments of the present application:
One) Target network state information
The target network state information may be determined based on the parameter values of preset network state parameters. However, in some scenarios, the preset network state parameters may not accurately and completely describe all states during the actual operation of the physical network. In an alternative embodiment, the target network state information may instead be determined based on requirement information determined by the network node, so that it describes the operating state of the physical network more accurately and completely, and action decisions determined based on it are more reliable and secure.
In an alternative embodiment, before step 201, the decision optimization method further includes:
receiving the requirement information of the network state parameters sent by the network node, wherein the requirement information of the network state parameters is determined based on a target evaluation index, and the target evaluation index is determined based on a network optimization intention;
determining a target network state parameter according to the requirement information of the network state parameter;
step 201 comprises:
and acquiring parameter values of the target network state parameters to obtain the target network state information.
In this embodiment, the network node may determine the target evaluation index based on the network optimization intention, and determine the requirement information of the network state parameter according to the target evaluation index.
In specific implementations, the network optimization intention may represent the evaluation index to be optimized in the physical network of the target area, i.e. the target evaluation index. The target evaluation index may include a performance indicator (Key Performance Indicator, KPI) of the physical network in the target area, and may further include a base station number list of the target area, the performance indicators to be optimized in the target area, the target values of those indicators, and the like. The performance indicators of the physical network may include, but are not limited to, base station coverage signal strength, user call drop rate, user handover success rate, system throughput and system energy consumption; they may be determined according to the actual situation and are not specifically limited here.
The network node may receive operator input, analyze the network optimization intention and determine the target evaluation index; it may also automatically analyze the network optimization intention and determine the target evaluation index based on information reported by one or more base stations. The network node then determines the requirement information of the network state parameters according to the target evaluation index. The requirement information may include the parameter types of the network state parameters (static parameters, dynamic parameters, or both), and may also include the specific network state parameters required under each parameter type.
In an alternative embodiment, the network node may determine the requirement information of the network state parameters based on a knowledge graph. Specifically, the network node may be preconfigured with a knowledge graph that stores the association relationships between evaluation indexes and network state parameters. After determining the target evaluation index, the network node may determine the network state parameters associated with it based on the knowledge graph.
In specific implementations, the network node may acquire the information for establishing the knowledge graph through wireless network devices, and determine the association and influence relationships between the initial evaluation indexes and the network state parameters by means such as algorithmic analysis, thereby establishing the knowledge graph. During operation, the network node may update the association and influence relationships in the knowledge graph based on periodically acquired information.
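A minimal sketch of such a knowledge graph follows, reduced here to a plain mapping from evaluation indexes to associated network state parameters; the index and parameter names are assumptions for illustration, not taken from the patent.

```python
# Assumed knowledge-graph structure: evaluation index -> associated
# network state parameters, grouped by parameter type.
KNOWLEDGE_GRAPH = {
    "coverage_signal_strength": {
        "static": ["base_station_location", "frequency_band", "scene_geography"],
        "dynamic": ["antenna_configuration", "user_position_distribution"],
    },
    "system_throughput": {
        "static": ["carrier_bandwidth", "cell_count"],
        "dynamic": ["resource_allocation", "user_traffic_model"],
    },
}

def requirement_info(target_indexes: list[str]) -> dict[str, set[str]]:
    """Collect the parameter requirements associated with the target
    evaluation indexes, merged by parameter type."""
    needed = {"static": set(), "dynamic": set()}
    for index in target_indexes:
        for ptype, params in KNOWLEDGE_GRAPH.get(index, {}).items():
            needed[ptype].update(params)
    return needed

print(requirement_info(["coverage_signal_strength"]))
```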
Two) Derived network state information
The derived network state information is determined based on the target network state information.
The network node may first determine a target time window. The target time window may be determined based on a preset rule; for example, it may be the time window of duration T after the base station last executed an action decision, where T is a preset value. The target time window may also be determined by the network node according to the actual situation.
The network node may predict network states within a target time window based on the target network state information and the stored historical network state information and determine information for each network state.
In an alternative embodiment, the network node may predict the derived network state information by establishing a virtual twin network. Specifically, the network node may determine a virtual twin network corresponding to the physical network based on the target network state information, and predict at least one derived network state information within the target time window based on the virtual twin network.
In this embodiment, the virtual twin network, which may also be called a twin digital network or a digital twin network, is a virtual digital model of the physical network; it is a simulation mapping of the physical network and can reflect its network state. The virtual twin network may include various antenna and channel models, and may also include a machine learning algorithm model for predicting wireless network conditions. It may store historical network state information and historical network configuration information.
In specific implementations, when the network node receives target network state information for the first time, it can establish the virtual twin network based on it; when target network state information is subsequently received, the virtual twin network is updated accordingly. Because the virtual twin network is a mapping of the physical network, the network node can perform simulation on the target network state information and the stored historical network state information based on the virtual twin network to obtain at least one derived network state information within the target time window.
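The following sketch illustrates the idea under a strong simplifying assumption: the virtual twin is reduced to a one-step state transition function that is rolled forward across the target time window to produce the derived states. All names and the toy dynamics are illustrative, not the patent's concrete twin model.

```python
import numpy as np

def derive_states(twin_step, s_t: np.ndarray, window: float,
                  step: float) -> list[np.ndarray]:
    """Roll the twin forward over the target time window T, producing
    the derived states S'_1 ... S'_n between two action decisions."""
    derived, state = [], s_t
    t = step
    while t <= window:
        state = twin_step(state)      # twin predicts the next state
        derived.append(state)
        t += step
    return derived

# Toy twin dynamics: slow drift of the state (purely illustrative).
rng = np.random.default_rng(1)
drift = rng.normal(scale=0.05, size=8)
twin = lambda s: s + drift

s_t = rng.normal(size=8)
state_set = [s_t] + derive_states(twin, s_t, window=1.0, step=0.25)
print(len(state_set))   # 1 target state + 4 derived states
```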
Three) Target action decision
The network node may determine a target action decision from a set of action decisions determined by the base station.
In an alternative embodiment, when the network node establishes the virtual twin network, the network node may perform simulation on each action decision in the action decision set based on the virtual twin network, to obtain a simulation result corresponding to each action decision, and evaluate the simulation result to obtain evaluation information corresponding to the simulation result, and determine a target action decision according to the evaluation information.
In specific implementations, when the network node determines <network state information, action decision> pairs based on the network state information set and the action decision set, each action decision in the action decision set may be combined with each piece of network state information in the network state information set, which is equivalent to permuting and combining the two sets.
Taking a second action decision in the action decision set as an example: based on the virtual twin network, the second action decision can be simulated with each piece of network state information in the network state information set, obtaining a simulation result corresponding to the second action decision and each piece of network state information. The network node may then evaluate the multiple simulation results to determine the target action decision.
It should be noted that when the network node interacts with multiple base stations, it may obtain the action decision sets sent by the multiple base stations and determine <network state information, action decision, base station> tuples based on the multiple network state information sets, the multiple action decision sets and the actual situations of the base stations: each piece of network state information may be combined with each action decision and each base station, which is equivalent to permuting and combining the network state information sets, the action decision sets and the base stations.
In this way, the network node can pre-verify each action decision based on the virtual twin network, and the reliability and safety of the system decision are further improved.
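A minimal sketch of this pre-verification step follows, assuming simulate() and evaluate() interfaces to the virtual twin; aggregating by mean score is one plausible choice, not mandated by the text.

```python
from itertools import product

def pick_target_action(states, actions, simulate, evaluate):
    """Permute-and-combine S x A, simulate each <state, action> pair on
    the twin, and return the action with the best mean evaluation."""
    scores = {a: [] for a in actions}
    for s, a in product(states, actions):
        result = simulate(s, a)             # twin simulation of the pair
        scores[a].append(evaluate(result))  # evaluation information
    return max(actions, key=lambda a: sum(scores[a]) / len(scores[a]))

# Toy usage: 3 states, actions 0..2, scoring favours action 1.
states = [0.1, 0.5, 0.9]
actions = [0, 1, 2]
sim = lambda s, a: s * (1.2 if a == 1 else 1.0)
ev = lambda r: r
print(pick_target_action(states, actions, sim, ev))   # -> 1
```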
Four) Self-optimization of the base station agent
After the base station executes the target action decision, it can acquire the network performance information after execution, that is, the value of each network performance indicator, from the physical network in order to calculate the target reward corresponding to the target action decision. In addition, the base station may also obtain the updated network state information after executing the target action decision, and determine a training sample <target network state information, target action decision, target reward> or <target network state information, target action decision, target reward, updated network state information>. The base station can then optimize the model parameters of the reinforcement learning algorithm based on the training sample to realize self-optimization.
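As an illustration, the sketch below builds such a training sample and applies a one-step tabular Q-learning update; the tabular form is an assumed stand-in, since the patent leaves the concrete reinforcement learning model open.

```python
from collections import defaultdict

Q = defaultdict(float)          # Q[(state, action)] -> action value
ALPHA, GAMMA = 0.1, 0.9
ACTIONS = range(4)

def update(sample):
    """sample = <S_t, a_t, r_t, S_{t+1}> as described in the text."""
    s_t, a_t, r_t, s_next = sample
    best_next = max(Q[(s_next, a)] for a in ACTIONS)
    td_target = r_t + GAMMA * best_next
    Q[(s_t, a_t)] += ALPHA * (td_target - Q[(s_t, a_t)])

# Toy sample: states hashed to small ids, reward from network KPIs.
update(("s42", 2, 1.5, "s43"))
print(Q[("s42", 2)])   # 0.15 after one update
```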
Referring to fig. 3, fig. 3 is a second flowchart of a decision optimization method according to an embodiment of the present application, where the decision optimization method may be applied to a network node.
As shown in fig. 3, the decision optimization method may include the steps of:
step 301, receiving target network state information sent by a base station.
Step 302, determining at least one derived network state information according to the target network state information, and sending a network state information set to the base station.
Wherein the network state information set includes target network state information and at least one derived network state information;
step 303, receiving an action decision set sent by the base station.
Wherein the action decision set is determined based on the network state information set;
step 304, determining a target action decision according to the action decision set, and sending the target action decision to the base station.
Optionally, before receiving the target network state information sent by the base station, the method further includes:
determining a target evaluation index, wherein the target evaluation index is determined based on the network optimization intention;
according to the target evaluation index, determining the requirement information of the network state parameters;
transmitting the requirement information of the network state parameters to the base station;
wherein the target network state information is acquired based on the requirement information of the network state parameters.
Optionally, determining the requirement information of the network state parameter according to the target evaluation index includes:
and determining the requirement information of the network state parameters corresponding to the target evaluation indexes based on a pre-configured knowledge graph, wherein the knowledge graph stores the association relationship between the evaluation indexes and the network state parameters.
Optionally, determining at least one derived network state information according to the target network state information includes:
determining a virtual twin network corresponding to a physical network based on the target network state information;
at least one derived network state information within a target time window is predicted based on the virtual twin network.
Optionally, determining the target action decision from the action decision set includes:
based on the virtual twin network, performing simulation on each action decision in the action decision set to obtain a simulation result corresponding to each action decision;
evaluating the simulation result to obtain evaluation information corresponding to the simulation result;
and determining a target action decision according to the evaluation information.
Optionally, performing simulation on each action decision in the action decision set based on the virtual twin network to obtain a simulation result corresponding to each action decision includes:
based on the virtual twin network, performing simulation on each piece of network state information in the second action decision and the network state information set to obtain a simulation result corresponding to the second action decision and each piece of network state information;
wherein the second action decision is any action decision in the action decision set.
It should be noted that this embodiment is the network-node-side implementation corresponding to the above method embodiment, so reference may be made to the related description of that embodiment, and the same beneficial effects can be achieved. To avoid repetition, details are not described again here.
In the embodiments of the application, after the base station acquires the target network state information from the physical network, it can send that information to the network node; the network node integrates the target network state information and derives the network states over a future period of time, so that the base station agent can consider both the target network state information and the derived network state information when determining the action decision set, from which the network node then comprehensively determines the target action decision to be executed by the base station. The time granularity at which the agent determines action decisions is thus finer, and, compared with an action decision determined from the target network state information alone, the target action decision better fits the real state of the physical network. The system decision process is thereby optimized, and the reliability and security of system decisions are improved.
For ease of understanding, one specific implementation of the embodiments of the present application is described below:
in this embodiment, as shown in fig. 4, the network system for executing the decision optimization method includes two functional entities: base stations and network nodes. The interaction between the base station and the network node may be performed.
The base station is provided with a base station agent, which specifically comprises a sample collection module and a reinforcement learning module. The reinforcement learning module performs calculation based on a reinforcement learning algorithm model. The state variables required by the model may include, but are not limited to, user distribution, user service models, the resource distribution allocated by the base station to users, base station transmitting antenna parameters and the like; the required action variables may include, but are not limited to, user switching/accessing, base station resource redistribution, base station switching selection, base station antenna parameter reconfiguration and the like; and the required reward variables may include, but are not limited to, user call drop rate, user switching success rate, base station coverage signal strength, system throughput, system energy consumption and the like.
The network node is a centralized network node in the physical network, which can be wireless network management equipment or other centralized network nodes, and is deployed with a virtual twin network. The network node may specifically include an optimization intent module, a twin configuration module, and a physical simulation module.
It should be noted that, in the embodiment of the present application, the division of the modules in the base station and the network node is not limited, and this embodiment is only illustrative, and the information interaction between the base station and the network node may also be as shown in fig. 5.
As shown in fig. 4 and 5, the flow of the decision optimization method in the present embodiment is as follows:
1) The optimization intention module receives network optimization intention input by a user, and sends an optimization intention request message to the twin configuration module, wherein the optimization intention request message carries an evaluation index to be optimized, namely the target evaluation index in the embodiment of the application.
2) After the twin configuration module receives the optimization intention request message, it can analyze the target evaluation index carried in the message based on a pre-configured knowledge graph and determine the requirement information of the network state parameters associated with the target evaluation index. The twin configuration module can then send a twin configuration parameter acquisition request message to the sample collection module of one or more base station agents in the target area; the message carries the parameter types of the twin configuration parameters and the twin configuration parameter requirements under the corresponding parameter types.
3) After the sample collection module receives the twin configuration parameter acquisition request message, it can collect the parameter values required by the twin configuration parameter requirements from the physical network to form the target network state information S_t, and send the target network state information to the twin configuration module through a twin configuration parameter acquisition response message. The target network state information may include information on twin static parameters, such as base station addresses and numbers, cell numbers, frequency bands, carrier frequency bandwidths, scene channel types, scene geographic information and the like, and may also include information on twin dynamic parameters, such as base station switching conditions, base station antenna configuration, user position distribution and the like.
It should be noted that after the sample collection module first receives the twin configuration parameter acquisition request message, it may collect the current network state information of the physical network and return it through the twin configuration parameter acquisition response message; this network state information may be used to establish the virtual twin network. The sample collection module may then periodically collect network state information of the physical network and return it through twin configuration parameter acquisition update messages; this network state information may be used to update the virtual twin network. The collection periods of different twin configuration parameters may differ and can be decided by the sample collection module itself; for example, the collection period of twin static parameters is typically longer than that of twin dynamic parameters.
4) After receiving the twin configuration parameter acquisition response message or the twin configuration parameter acquisition update message, the twin configuration module can send a twin network update message to the physical simulation module; the twin network update message can carry the network state information sent by each sample collection module and is used for establishing or updating the virtual twin network.
5) In the initialization stage of the virtual twin network, the physical simulation module receives the twin network update message transmitted by the twin configuration module, and generates the virtual twin network according to the same nodes and environments as the physical network.
6) In the base station agent self-optimization stage, the physical simulation module can, based on the received target network state information S_t and the stored historical network state information, predict n derived network state information S'_1, S'_2, ..., S'_n of the physical network in the target area within the target time window T. The n+1 pieces of network state information S_t, S'_1, S'_2, ..., S'_n are denoted as the set S.
7) The physical simulation module sends a decision reasoning request message to the reinforcement learning modules of one or more base station agents in the target area, wherein the decision reasoning request message carries the network state information set S.
8) After the reinforcement learning module receives the decision reasoning request message, it can input the network state information set S into the reinforcement learning algorithm model to correspondingly generate an action decision set A, which comprises the corresponding action decisions a_t, a'_1, a'_2, ..., a'_n, i.e. n+1 action decisions in total. The reinforcement learning module sends the action decision set A to the physical simulation module through a decision reasoning response message.
9) After the physical simulation module receives the decision reasoning response message, it can perform simulation on each state-decision pair <s, a>, with s in S and a in A, based on the virtual twin network, to evaluate and pre-verify the performance, and finally comprehensively select the optimal action decision a_t, i.e. the target action decision described in the embodiments of the application. The physical simulation module sends a decision preference feedback message to the reinforcement learning module of the base station agent, carrying the optimal action decision a_t.
10) After the reinforcement learning module receives the decision preference feedback message, it can control the base station to execute the optimal action decision a_t, and after a_t is executed, send a network feedback request message to the sample collection module, requesting it to collect feedback information of the physical network after a_t is executed, so as to calculate the reward r_t. In addition, through the network feedback request message, the reinforcement learning module can request the sample collection module to collect the network state information S_{t+1} of the physical network after a_t is executed.
11) The sample collection module may send a training sample update message to the reinforcement learning module, carrying the updated training sample <S_t, a_t, r_t> or <S_t, a_t, r_t, S_{t+1}>.
12) After the reinforcement learning module receives the training sample update message, it optimizes the model parameters of the reinforcement learning algorithm based on the training sample <S_t, a_t, r_t> or <S_t, a_t, r_t, S_{t+1}>, realizing self-optimization.
For ease of understanding, the following takes a scenario of Massive Multiple Input Multiple Output (Massive MIMO) antenna weight self-optimization in a mobile communication system as an example.
In this example, the mobile communication system includes a base station and a terminal; the base station includes an antenna array and transmits signals to the terminal through the antennas. The base station may configure antenna weights for the antenna array, that is, a weight on each antenna element, for example the amplitude and phase of the transmitted signal; specifically, a weight configuration module may be set to determine the antenna weights. The direction and shape of the lobe transmitted by the antenna array are determined by the weights. Illustratively, fig. 6 is a schematic diagram of 27 lobe directions for Massive MIMO.
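As general antenna-array background rather than the specific weight scheme of the embodiments, the sketch below shows how phase-only weights steer the main lobe of a uniform linear array; the element count, spacing and steering angle are arbitrary example values.

```python
import numpy as np

def steering_weights(n_elements: int, theta_deg: float,
                     spacing_wl: float = 0.5) -> np.ndarray:
    """Per-element complex weights steering a uniform linear array's
    main lobe towards theta_deg (element spacing in wavelengths)."""
    n = np.arange(n_elements)
    phase = 2 * np.pi * spacing_wl * n * np.sin(np.deg2rad(theta_deg))
    return np.exp(-1j * phase)

w = steering_weights(8, theta_deg=15.0)  # 8 elements, lobe steered to 15 degrees
```

Changing theta_deg rotates the lobe, which is why an antenna weight configuration value matrix can shape the coverage discussed below.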
The target network state information S_t is the user position distribution information at a certain time t, which represents the positions of all users in the coverage area of the base station; it may include the position of each user, or the number of users in each grid after the base station coverage area is rasterized.
The action decision performed by the base station is the weight allocation of the Massive MIMO antennas, specifically the antenna weight configuration value matrix for which the base station is responsible. The reward is the coverage performance in the coverage area of the base station after the antenna weights are configured with the decision and signals are transmitted. Specifically, the reward may be represented by a statistical RSRP of the whole area calculated from the Reference Signal Receiving Power (RSRP) of each terminal in the coverage area of the base station, for example the statistical average of the RSRP of all terminals or the 95% point of its Cumulative Distribution Function (CDF).
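Purely as an illustration of such a reward (the helper name compute_reward and the reading of the 95% figure as a CDF point are assumptions), per-terminal RSRP reports could be aggregated as follows:

```python
import numpy as np

def compute_reward(rsrp_dbm: np.ndarray, use_cdf_point: bool = False) -> float:
    """Aggregate per-terminal RSRP (dBm) into a single coverage reward."""
    if use_cdf_point:
        # RSRP value below which 95% of the terminals fall (CDF 95% point).
        return float(np.percentile(rsrp_dbm, 95))
    return float(rsrp_dbm.mean())  # statistical average over all terminals

reports = np.array([-95.0, -88.5, -102.3, -91.0, -84.7])  # five terminals
r_t = compute_reward(reports)            # average-based reward
r_t_cdf = compute_reward(reports, True)  # CDF-point-based reward
```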
In this example, the virtual twin network is deployed in a southbound network management device of the physical network. If only the coverage performance under a single base station needs to be optimized, that base station and the southbound network management device interact according to the above scheme. If the coverage performance of a plurality of base stations in a larger area needs to be optimized, the plurality of base stations in the area all need to interact with the virtual twin network.
The agent is deployed in the mobile communication base station, and the target evaluation index determined by the optimization intention module in fig. 4 may be an RSRP statistic in the area to be optimized. The network configuration parameters required by the twin configuration module in fig. 4, also referred to as twin configuration parameters, include the number of base stations in the area, the number of sectors per base station, the number of users in the area, the base station transmit power, the base station antenna gain pattern, the environment scenario of the area, etc.
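Purely to illustrate how such twin configuration parameters might be grouped (every field name and value below is an invented example):

```python
from dataclasses import dataclass

@dataclass
class TwinConfig:
    """Hypothetical container for the twin configuration parameters."""
    num_base_stations: int     # number of base stations in the area
    sectors_per_station: int   # number of sectors per base station
    num_users: int             # number of users in the area
    tx_power_dbm: float        # base station transmit power
    antenna_gain_pattern: str  # identifier of the antenna gain pattern
    environment: str           # regional environment scenario

cfg = TwinConfig(num_base_stations=7, sectors_per_station=3, num_users=1200,
                 tx_power_dbm=46.0, antenna_gain_pattern="massive_mimo_64t",
                 environment="dense_urban")
```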
The base station collects the network configuration parameters, generates the target network state information S_t, and sends it to the southbound network management device. The physical simulation module in fig. 4 may establish a virtual twin network of the physical network. The physical simulation module predicts and expands the derived network state information set S according to the current user position distribution information S_t; the weight configurations generated by the reinforcement learning module in fig. 4 form the set A, that is, the action decision set A, which is then evaluated by the physical simulation module to obtain the optimal weight configuration a_t.
The base station executes a_t, that is, adjusts the antenna weights. After a_t is executed, the terminals report to the base station the RSRP they receive in the new antenna beam directions. The base station then calculates the reward r_t and adjusts the model parameters according to <user distribution, antenna weight, terminal RSRP>, completing the self-optimization of the base station-side agent; a minimal sketch of this update step follows.
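The following sketch of that sample-collection and update step is illustrative only: the Sample tuple, the replay buffer, the mini-batch size and the placeholder update are assumptions, not the specific algorithm of the embodiments.

```python
from collections import deque
from typing import NamedTuple
import numpy as np

class Sample(NamedTuple):
    user_distribution: np.ndarray  # S_t: e.g. users per grid cell
    antenna_weights: np.ndarray    # a_t: weight configuration value matrix
    reward: float                  # r_t: RSRP-based coverage reward

replay_buffer: deque = deque(maxlen=10_000)

def on_feedback(s_t: np.ndarray, a_t: np.ndarray,
                rsrp_dbm: np.ndarray) -> None:
    """Store a <user distribution, antenna weight, terminal RSRP>-derived
    sample and trigger a (placeholder) model parameter update."""
    r_t = float(rsrp_dbm.mean())             # reward from terminal reports
    replay_buffer.append(Sample(s_t, a_t, r_t))
    if len(replay_buffer) >= 32:
        _update_model(list(replay_buffer)[-32:])  # most recent mini-batch

def _update_model(batch) -> None:
    # Placeholder: a real agent would run e.g. a DQN or policy-gradient
    # parameter update here to realize self-optimization.
    pass
```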
In this embodiment, the virtual twin network is used to derive the network states within a future period of time, so that the reinforcement learning algorithm can jointly consider the current network state and the derived network states when determining an optimal action decision, which is beneficial to improving system performance. In addition, the virtual twin network can be used to pre-verify the result brought by an action decision, further improving the reliability and safety of system decisions. Moreover, the reinforcement learning algorithm may use training samples fed back by the physical network to realize self-optimization.
Referring to fig. 7, fig. 7 is one of the block diagrams of the decision optimizing apparatus provided in the embodiment of the present application.
As shown in fig. 7, the decision optimizing apparatus 700 includes:
a first obtaining module 701, configured to obtain target network state information, and send the target network state information to a network node;
a first receiving module 702, configured to receive a set of network state information sent by the network node, where the set of network state information includes the target network state information and at least one derived network state information, and the derived network state information is determined based on the target network state information;
a first determining module 703, configured to determine a set of action decisions according to the set of network state information, and send the set of action decisions to the network node, where the set of action decisions includes at least one action decision;
a second receiving module 704, configured to receive a target action decision sent by the network node, and execute the target action decision, where the target action decision is determined based on the action decision set.
Optionally, the decision optimizing device 700 further comprises:
the third receiving module is used for receiving the requirement information of the network state parameters sent by the network node, wherein the requirement information of the network state parameters is determined based on a target evaluation index, and the target evaluation index is determined based on the network optimization intention;
The second determining module is used for determining a target network state parameter according to the requirement information of the network state parameter;
the first acquisition module 701 is configured to:
and acquiring the parameter value of the target network state parameter to obtain the target network state information.
Optionally, the first determining module 703 is configured to:
inputting first network state information into a reinforcement learning algorithm model to obtain a first action decision, wherein the first network state information is any network state information in the network state information set, and the action decision set comprises the first action decision.
Optionally, the decision optimizing device 700 further comprises:
the second acquisition module is used for acquiring the target network performance information after the target action decision is executed;
the third determining module is used for determining target feedback information according to the target network performance information;
the generation module is used for generating a training sample based on the target network state information, the target action decision and the target feedback information;
and the optimization module is used for updating the model parameters of the reinforcement learning algorithm model based on the training samples.
The decision optimization apparatus 700 can implement each process that the base station can implement in the method embodiments of the present application and achieve the same beneficial effects; to avoid repetition, details are not repeated here.
Referring to fig. 8, fig. 8 is a second block diagram of the decision optimizing apparatus according to the embodiment of the present application.
As shown in fig. 8, the decision optimizing apparatus 800 includes:
a fourth receiving module 801, configured to receive target network state information sent by a base station;
a fourth determining module 802, configured to determine at least one derived network state information according to the target network state information, and send a network state information set to the base station, where the network state information set includes the target network state information and the at least one derived network state information;
a fifth receiving module 803, configured to receive a set of action decisions sent by the base station, where the set of action decisions is determined based on the set of network state information;
a fifth determining module 804, configured to determine a target action decision according to the action decision set, and send the target action decision to the base station.
Optionally, the decision optimizing device 800 further comprises:
a sixth determining module, configured to determine a target evaluation index, where the target evaluation index is determined based on the network optimization intention;
a seventh determining module, configured to determine, according to the target evaluation index, requirement information of a network state parameter;
The first sending module is used for sending the requirement information of the network state parameters to the base station;
the target network state information is acquired based on the requirement information of the network state parameters.
Optionally, the seventh determining module is configured to:
and determining the requirement information of the network state parameters corresponding to the target evaluation indexes based on a pre-configured knowledge graph, wherein the knowledge graph stores the association relation between the evaluation indexes and the network state parameters.
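As a toy illustration of such a pre-configured association (the indicator names and parameter lists below are invented for the example, and a plain dictionary stands in for the knowledge graph):

```python
# Hypothetical mapping: evaluation index -> required network state
# parameters; a real system might hold this in a graph database.
KNOWLEDGE_GRAPH = {
    "rsrp_statistic": ["user_position_distribution", "antenna_weights"],
    "cell_throughput": ["prb_utilization", "user_count", "cqi_reports"],
}

def required_parameters(target_index: str) -> list:
    """Look up the network state parameters associated with an index."""
    return KNOWLEDGE_GRAPH.get(target_index, [])

print(required_parameters("rsrp_statistic"))
# ['user_position_distribution', 'antenna_weights']
```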
Optionally, the fourth determining module 802 includes:
a first determining unit, configured to determine a virtual twin network corresponding to a physical network based on the target network state information;
and a second determining unit, configured to predict at least one derived network state information within a target time window based on the virtual twin network.
Optionally, the fifth determining module 804 includes:
the simulation unit is used for performing simulation on each action decision in the action decision set based on the virtual twin network to obtain a simulation result corresponding to each action decision;
the evaluation unit is used for evaluating the simulation result to obtain evaluation information corresponding to the simulation result;
And the third determining unit is used for determining a target action decision according to the evaluation information.
Optionally, the simulation unit is configured to:
based on the virtual twin network, performing simulation on a second action decision and each piece of network state information in the network state information set to obtain a simulation result corresponding to the second action decision and each piece of network state information;
wherein the second action decision is any action decision in the action decision set.
The decision optimization apparatus 800 can implement each process that the network node can implement in the method embodiments of the present application and achieve the same beneficial effects; to avoid repetition, details are not repeated here.
The embodiment of the application further provides an electronic device. Since the electronic device solves the problem on a principle similar to that of the decision optimization method provided in the embodiments of the present application, its implementation may refer to the implementation of the method, and repeated description is omitted. As shown in fig. 9, the electronic device of the embodiment of the present application includes a processor 900, a transceiver 910, and a memory 920.
In one embodiment, the electronic device is a base station, and the processor 900 is configured to read the program in the memory 920, and perform the following procedures:
Acquiring target network state information and sending the target network state information to a network node;
receiving a network state information set sent by the network node, wherein the network state information set comprises the target network state information and at least one derived network state information, and the derived network state information is determined based on the target network state information;
determining an action decision set according to the network state information set, and sending the action decision set to the network node, wherein the action decision set comprises at least one action decision;
and receiving a target action decision sent by the network node, and executing the target action decision, wherein the target action decision is determined based on the action decision set.
The transceiver 910 is configured to receive and transmit data under the control of the processor 900.
In fig. 9, the bus architecture may comprise any number of interconnected buses and bridges, specifically linking together one or more processors represented by the processor 900 and the memory represented by the memory 920. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 910 may be a number of elements, including a transmitter and a receiver, providing a unit for communicating with various other apparatuses over a transmission medium. The processor 900 is responsible for managing the bus architecture and general processing, and the memory 920 may store data used by the processor 900 when performing operations.
Optionally, the processor 900 is further configured to read the program in the memory 920, and perform the following steps:
receiving the requirement information of the network state parameters sent by the network node, wherein the requirement information of the network state parameters is determined based on a target evaluation index, and the target evaluation index is determined based on the network optimization intention;
determining a target network state parameter according to the requirement information of the network state parameter;
and acquiring the parameter value of the target network state parameter to obtain the target network state information.
Optionally, the processor 900 is further configured to read the program in the memory 920, and perform the following steps:
inputting first network state information into a reinforcement learning algorithm model to obtain a first action decision, wherein the first network state information is any network state information in the network state information set, and the action decision set comprises the first action decision.
Optionally, the processor 900 is further configured to read the program in the memory 920, and perform the following steps:
acquiring target network performance information after the target action decision is executed;
determining target feedback information according to the target network performance information;
generating a training sample based on the target network state information, the target action decision and the target feedback information;
And updating model parameters of the reinforcement learning algorithm model based on the training samples.
In another embodiment, the electronic device is a network node, and the processor 900 is configured to read the program in the memory 920, and perform the following procedure:
receiving target network state information sent by a base station;
determining at least one derived network state information according to the target network state information, and sending a network state information set to the base station, wherein the network state information set comprises the target network state information and the at least one derived network state information;
receiving an action decision set sent by the base station, wherein the action decision set is determined based on the network state information set;
and determining a target action decision according to the action decision set, and sending the target action decision to the base station.
Optionally, the processor 900 is further configured to read the program in the memory 920, and perform the following steps:
determining a target evaluation index, wherein the target evaluation index is determined based on the network optimization intention;
according to the target evaluation index, determining the requirement information of the network state parameters;
transmitting the requirement information of the network state parameters to the base station;
The target network state information is acquired based on the requirement information of the network state parameters.
Optionally, the processor 900 is further configured to read the program in the memory 920, and perform the following steps:
and determining the requirement information of the network state parameters corresponding to the target evaluation indexes based on a pre-configured knowledge graph, wherein the knowledge graph stores the association relation between the evaluation indexes and the network state parameters.
Optionally, the processor 900 is further configured to read the program in the memory 920, and perform the following steps:
determining a virtual twin network corresponding to a physical network based on the target network state information;
at least one derived network state information within a target time window is predicted based on the virtual twin network.
Optionally, the processor 900 is further configured to read the program in the memory 920, and perform the following steps:
based on the virtual twin network, performing simulation on each action decision in the action decision set to obtain a simulation result corresponding to each action decision;
evaluating the simulation result to obtain evaluation information corresponding to the simulation result;
and determining a target action decision according to the evaluation information.
Optionally, the processor 900 is further configured to read the program in the memory 920, and perform the following steps:
based on the virtual twin network, performing simulation on a second action decision and each piece of network state information in the network state information set to obtain a simulation result corresponding to the second action decision and each piece of network state information;
wherein the second action decision is any action decision in the action decision set.
The electronic device provided in the embodiments of the present application may execute the above method embodiments; its implementation principle and technical effects are similar and are not described here again.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the methods of the embodiments described above may be completed by program instructions controlling the relevant hardware, and the program may be stored on a readable medium. The embodiments of the present application further provide a readable storage medium storing a computer program; when the computer program is executed by a processor, any step in the method embodiments corresponding to fig. 2 or fig. 3 can be implemented and the same technical effects can be achieved, which are not repeated here to avoid repetition. The readable storage medium may be, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In the several embodiments provided in this application, it should be understood that the disclosed methods and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be physically included separately, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The foregoing is a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application and are intended to be comprehended within the scope of the present application.

Claims (22)

1. A decision optimization method applied to a base station, comprising:
acquiring target network state information and sending the target network state information to a network node;
receiving a network state information set sent by the network node, wherein the network state information set comprises the target network state information and at least one derived network state information, and the derived network state information is determined based on the target network state information;
Determining an action decision set according to the network state information set, and sending the action decision set to the network node, wherein the action decision set comprises at least one action decision;
and receiving a target action decision sent by the network node, and executing the target action decision, wherein the target action decision is determined based on the action decision set.
2. The method of claim 1, wherein prior to the obtaining the target network state information and transmitting the target network state information to a network node, the method further comprises:
receiving the requirement information of the network state parameters sent by the network node, wherein the requirement information of the network state parameters is determined based on a target evaluation index, and the target evaluation index is determined based on the network optimization intention;
determining a target network state parameter according to the requirement information of the network state parameter;
the obtaining the target network state information includes:
and acquiring the parameter value of the target network state parameter to obtain the target network state information.
3. The method of claim 1, wherein said determining a set of action decisions from said set of network state information comprises:
Inputting first network state information into a reinforcement learning algorithm model to obtain a first action decision, wherein the first network state information is any network state information in the network state information set, and the action decision set comprises the first action decision.
4. A method according to claim 3, wherein after said performing said target action decision, the method further comprises:
acquiring target network performance information after the target action decision is executed;
determining target feedback information according to the target network performance information;
generating a training sample based on the target network state information, the target action decision and the target feedback information;
and updating model parameters of the reinforcement learning algorithm model based on the training samples.
5. A method of decision optimization for use in a network node, the method comprising:
receiving target network state information sent by a base station;
determining at least one derived network state information according to the target network state information, and sending a network state information set to the base station, wherein the network state information set comprises the target network state information and the at least one derived network state information;
Receiving an action decision set sent by the base station, wherein the action decision set is determined based on the network state information set;
and determining a target action decision according to the action decision set, and sending the target action decision to the base station.
6. The method of claim 5, wherein prior to receiving the target network state information sent by the base station, the method further comprises:
determining a target evaluation index, wherein the target evaluation index is determined based on the network optimization intention;
according to the target evaluation index, determining the requirement information of the network state parameters;
transmitting the requirement information of the network state parameters to the base station;
the target network state information is acquired based on the requirement information of the network state parameters.
7. The method of claim 6, wherein determining the demand information for the network status parameter based on the target evaluation index comprises:
and determining the requirement information of the network state parameters corresponding to the target evaluation indexes based on a pre-configured knowledge graph, wherein the knowledge graph stores the association relation between the evaluation indexes and the network state parameters.
8. The method of claim 5, wherein said determining at least one derived network state information from said target network state information comprises:
determining a virtual twin network corresponding to a physical network based on the target network state information;
at least one derived network state information within a target time window is predicted based on the virtual twin network.
9. The method of claim 8, wherein the determining a target action decision from the set of action decisions comprises:
based on the virtual twin network, performing simulation on each action decision in the action decision set to obtain a simulation result corresponding to each action decision;
evaluating the simulation result to obtain evaluation information corresponding to the simulation result;
and determining a target action decision according to the evaluation information.
10. The method of claim 9, wherein the performing a simulation of each action decision in the set of action decisions based on the virtual twin network to obtain a simulation result corresponding to each of the action decisions comprises:
based on the virtual twin network, performing simulation on a second action decision and each piece of network state information in the network state information set to obtain a simulation result corresponding to the second action decision and each piece of network state information;
Wherein the second action decision is any action decision in the action decision set.
11. A decision-making optimization apparatus, comprising:
the first acquisition module is used for acquiring target network state information and sending the target network state information to a network node;
a first receiving module, configured to receive a network state information set sent by the network node, where the network state information set includes the target network state information and at least one derived network state information, and the derived network state information is determined based on the target network state information;
a first determining module, configured to determine a set of action decisions according to the set of network state information, and send the set of action decisions to the network node, where the set of action decisions includes at least one action decision;
and the second receiving module is used for receiving the target action decision sent by the network node and executing the target action decision, and the target action decision is determined based on the action decision set.
12. The apparatus of claim 11, wherein the apparatus further comprises:
the third receiving module is used for receiving the requirement information of the network state parameters sent by the network node, wherein the requirement information of the network state parameters is determined based on a target evaluation index, and the target evaluation index is determined based on the network optimization intention;
The second determining module is used for determining a target network state parameter according to the requirement information of the network state parameter;
the first acquisition module is used for:
and acquiring the parameter value of the target network state parameter to obtain the target network state information.
13. The apparatus of claim 11, wherein the first determining module is configured to:
inputting first network state information into a reinforcement learning algorithm model to obtain a first action decision, wherein the first network state information is any network state information in the network state information set, and the action decision set comprises the first action decision.
14. The apparatus of claim 13, wherein the apparatus further comprises:
the second acquisition module is used for acquiring the target network performance information after the target action decision is executed;
the third determining module is used for determining target feedback information according to the target network performance information;
the generation module is used for generating a training sample based on the target network state information, the target action decision and the target feedback information;
and the optimization module is used for updating the model parameters of the reinforcement learning algorithm model based on the training samples.
15. A decision-making optimization apparatus, comprising:
a fourth receiving module, configured to receive target network state information sent by the base station;
a fourth determining module, configured to determine at least one derived network state information according to the target network state information, and send a network state information set to the base station, where the network state information set includes the target network state information and the at least one derived network state information;
a fifth receiving module, configured to receive an action decision set sent by the base station, where the action decision set is determined based on the network state information set;
and a fifth determining module, configured to determine a target action decision according to the action decision set, and send the target action decision to the base station.
16. The apparatus of claim 15, wherein the apparatus further comprises:
a sixth determining module, configured to determine a target evaluation index, where the target evaluation index is determined based on the network optimization intention;
a seventh determining module, configured to determine, according to the target evaluation index, requirement information of a network state parameter;
the first sending module is used for sending the requirement information of the network state parameters to the base station;
The target network state information is acquired based on the requirement information of the network state parameters.
17. The apparatus of claim 16, wherein the seventh determination module is configured to:
and determining the requirement information of the network state parameters corresponding to the target evaluation indexes based on a pre-configured knowledge graph, wherein the knowledge graph stores the association relation between the evaluation indexes and the network state parameters.
18. The apparatus of claim 15, wherein the fourth determination module comprises:
a first determining unit, configured to determine a virtual twin network corresponding to a physical network based on the target network state information;
and a second determining unit, configured to predict at least one derived network state information within a target time window based on the virtual twin network.
19. The apparatus of claim 18, wherein the fifth determination module comprises:
the simulation unit is used for performing simulation on each action decision in the action decision set based on the virtual twin network to obtain a simulation result corresponding to each action decision;
the evaluation unit is used for evaluating the simulation result to obtain evaluation information corresponding to the simulation result;
And the third determining unit is used for determining a target action decision according to the evaluation information.
20. The apparatus of claim 19, wherein the simulation unit is configured to:
based on the virtual twin network, performing simulation on a second action decision and each piece of network state information in the network state information set to obtain a simulation result corresponding to the second action decision and each piece of network state information;
wherein the second action decision is any action decision in the action decision set.
21. An electronic device comprising a transceiver, a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 4; alternatively, the computer program, when executed by the processor, implements the steps of the method according to any one of claims 5 to 10.
22. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4; alternatively, the computer program, when executed by the processor, implements the steps of the method according to any one of claims 5 to 10.
CN202210053765.6A 2022-01-18 2022-01-18 Decision optimization method, decision optimization device, electronic equipment and readable storage medium Pending CN116506863A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210053765.6A CN116506863A (en) 2022-01-18 2022-01-18 Decision optimization method, decision optimization device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210053765.6A CN116506863A (en) 2022-01-18 2022-01-18 Decision optimization method, decision optimization device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN116506863A true CN116506863A (en) 2023-07-28

Family

ID=87327145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210053765.6A Pending CN116506863A (en) 2022-01-18 2022-01-18 Decision optimization method, decision optimization device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116506863A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117014051A (en) * 2023-09-27 2023-11-07 中铁电气化铁路运营管理有限公司 High-speed rail mobile communication method and system based on composite antenna
CN117014051B (en) * 2023-09-27 2023-12-22 中铁电气化铁路运营管理有限公司 High-speed rail mobile communication method and system based on composite antenna

Similar Documents

Publication Publication Date Title
EP3635505B1 (en) System and method for deep learning and wireless network optimization using deep learning
US10375585B2 (en) System and method for deep learning and wireless network optimization using deep learning
CN112913274B (en) Procedure for optimization of ad hoc networks
Orhan et al. Connection management xAPP for O-RAN RIC: A graph neural network and reinforcement learning approach
CN105379412A (en) System and method for controlling multiple wireless access nodes
Xu et al. Data-cognition-empowered intelligent wireless networks: Data, utilities, cognition brain, and architecture
KR20210140901A (en) Apparatus and method for quality management of wireless communication
Fernandes et al. Comparison of artificial intelligence and semi-empirical methodologies for estimation of coverage in mobile networks
Mirzaei Somarin et al. Big data based self-optimization networking in next generation mobile networks
Hu et al. A study of LTE network performance based on data analytics and statistical modeling
Lin et al. Fueling the next quantum leap in cellular networks: Embracing AI in 5G evolution towards 6G
CN114828045A (en) Network optimization method and device, electronic equipment and computer readable storage medium
CN116506863A (en) Decision optimization method, decision optimization device, electronic equipment and readable storage medium
CN101640914B (en) Method and device for achieving network selection
Pereira et al. Deployment strategies for large intelligent surfaces
Luo et al. SRCON: A data-driven network performance simulator for real-world wireless networks
Ferrús et al. Data analytics architectural framework for smarter radio resource management in 5G radio access networks
Fonseca et al. Adaptive height optimization for cellular-connected UAVs: A deep reinforcement learning approach
Muñoz et al. Capacity self-planning in small cell multi-tenant 5G networks
Meer et al. Mobility Management for Cellular-Connected UAVs: Model Based Versus Learning Based Approaches for Service Availability
Raftopoulos et al. DRL-based Latency-Aware Network Slicing in O-RAN with Time-Varying SLAs
Zhuang et al. When multi-access edge computing meets multi-area intelligent reflecting surface: A multi-agent reinforcement learning approach
EP4258730A1 (en) Method and apparatus for programmable and customized intelligence for traffic steering in 5g networks using open ran architectures
Menard et al. Distributed Resource Allocation In 5g Networks With Multi-Agent Reinforcement Learning
Vilà et al. On the Training of Reinforcement Learning-based Algorithms in 5G and Beyond Radio Access Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination