WO2023024728A1 - Policy management method and device, and computer-readable storage medium - Google Patents


Info

Publication number
WO2023024728A1
Authority
WO
WIPO (PCT)
Prior art keywords
policy
strategy
candidate
optimal
space
Prior art date
Application number
PCT/CN2022/104720
Other languages
French (fr)
Chinese (zh)
Inventor
林志远
林伟
刘向凤
芮华
黄河
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2023024728A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211: Selection of the most significant subset of features
    • G06F18/2111: Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms

Definitions

  • the embodiments of the present application relate to but are not limited to the field of communication technologies, and in particular, relate to a policy management method, device, and computer-readable storage medium.
  • In the related art, a particle swarm optimization algorithm is usually used to select the optimal strategy, and the selection is calculated on the basis of average performance.
  • When the performance produced by executing a strategy is random and cannot be calculated accurately, the optimal strategy cannot be found accurately and the strategy cannot be updated accurately, which affects subsequent optimal strategy selection.
  • Embodiments of the present application provide a policy management method, device, and computer-readable storage medium.
  • An embodiment of the present application provides a policy management method, including: obtaining condition information; selecting a candidate policy set corresponding to the current period from a historical policy set; obtaining an optimal policy based on the condition information and the candidate policy set; collecting operating performance parameters obtained by executing the optimal policy; and updating the candidate policy set according to the operating performance parameters.
  • An embodiment of the present application also provides a policy management device, including: at least one processor; and at least one memory configured to store at least one program; when the at least one program is executed by the at least one processor, the policy management method described above is implemented.
  • the embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, and the computer-executable instructions are configured to execute the above-mentioned policy management method.
  • FIG. 1 is a flowchart of a policy management method provided by an embodiment of the present application
  • FIG. 2 is a specific flow chart of selecting an optimal strategy provided by another embodiment of the present application.
  • FIG. 3 is a specific flowchart of generating an optimal strategy provided by another embodiment of the present application.
  • FIG. 4 is a specific flow chart of selecting a candidate strategy set provided by another embodiment of the present application.
  • FIG. 5 is a specific flow chart of updating a set of candidate strategies provided by another embodiment of the present application.
  • FIG. 6 is a specific flow chart of updating a set of candidate strategies provided by another embodiment of the present application.
  • FIG. 7 is a specific flow chart of updating a set of candidate strategies provided by another embodiment of the present application.
  • FIG. 8 is a specific flow chart of updating a set of candidate policies provided by another embodiment of the present application.
  • FIG. 9 is a specific flow chart of updating a set of candidate strategies provided by another embodiment of the present application.
  • FIG. 10 is a specific flow chart of updating a set of candidate strategies provided by another embodiment of the present application.
  • FIG. 11 is a flowchart of a policy management method provided by another embodiment of the present application.
  • Fig. 12 is a schematic structural diagram of a policy management device provided by another embodiment of the present application.
  • The present application provides a policy management method, device, and computer-readable storage medium, to: obtain condition information; select a candidate policy set corresponding to the current period from a historical policy set; obtain an optimal policy based on the condition information and the candidate policy set; collect the operating performance parameters obtained by executing the optimal policy; and update the candidate policy set according to the operating performance parameters.
  • FIG. 1 is a flowchart of a policy management method provided by an embodiment of the present application.
  • The policy management method includes but is not limited to step S100, step S200, step S300, step S400, and step S500.
  • Step S100 acquiring condition information.
  • Step S200 selecting a candidate policy set corresponding to the current period from the historical policy set.
  • Step S300 obtaining an optimal strategy based on the condition information and the set of candidate strategies.
  • Step S400 collecting operating performance parameters obtained by executing the optimal strategy.
  • Step S500 updating the set of candidate policies according to the operating performance parameters.
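  • The flow of steps S100 to S500 can be sketched as a minimal control loop. The class name, method names, and the set-based policy representation below are illustrative assumptions for this sketch, not the patent's implementation:

```python
# Minimal sketch of steps S100-S500; all names here are illustrative assumptions.
class PolicyManager:
    def __init__(self, historical_policy_sets):
        # historical_policy_sets: dict mapping a period key (e.g. an hour)
        # to a list of (policy, performance) candidate entries
        self.historical = historical_policy_sets

    def run_once(self, condition_info, period, execute):
        candidates = self.historical.get(period, [])               # S200
        optimal = self.select_optimal(condition_info, candidates)  # S300
        performance = execute(optimal)                             # S400
        self.update(candidates, optimal, performance)              # S500
        return optimal, performance

    def select_optimal(self, condition_info, candidates):
        # choose the best-performing candidate that satisfies the condition info
        matching = [c for c in candidates if condition_info <= c[0]]
        if matching:
            return max(matching, key=lambda c: c[1])[0]
        # fall back to "generating" a policy from the condition info (toy stand-in)
        return frozenset(condition_info)

    def update(self, candidates, policy, performance):
        for i, (p, perf) in enumerate(candidates):
            if p == policy:
                candidates[i] = (p, (perf + performance) / 2)  # toy update rule
                return
        candidates.append((policy, performance))
```

The real method refines each of these hooks: period-indexed selection (S200), subset matching (S300), and counter-gated statistics updates (S500) are detailed in the embodiments below.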
  • The candidate policy set is updated according to the operating performance parameters, realizing optimal policy selection based on condition information; the candidate policy set can also be updated according to the operating performance parameters obtained by executing the optimal policy, which facilitates subsequent optimal policy selection.
  • the strategy may include but not limited to: a combination of beams in a multi-user transmission scenario, an MCS parameter configuration for a given beam in a single-user transmission scenario, and a routing path in a network routing problem.
  • condition information is a constraint condition that needs to be set to complete a certain operation.
  • The beam space division selection problem is specifically: from a given number of beams (for example, 64), select a number of appropriate beams to form a beamforming space division combination for space division transmission. In this problem, a strategy represents a beam space division combination.
  • The performance of a beam space division combination is random: even if the beam combination is the same, the users, or the users' channels within the beams, may differ, so the throughput or spectral efficiency of the combination differs. The average performance of a beam space division combination cannot be calculated accurately in advance, because all possible performance values and their corresponding probability distributions cannot be known in advance.
  • For a primary user that must be scheduled in the current scheduling, the condition information may be that the beam where the user is located must be included in the beam space division set.
  • The routing planning problem is specifically: there is a source node, a destination node, and multiple transit nodes, and each path between nodes has a certain overhead (such as delay); it is required to find a path from the source node to the destination node that minimizes the total overhead. In this problem, a strategy represents a path that starts from the source node, may pass through multiple transit nodes, and ends at the destination node. The performance of a strategy is random because the overhead between nodes is also random; for example, the transmission delay between nodes fluctuates with the background traffic. The average performance of a strategy cannot be calculated accurately in advance, because all possible performance values and their corresponding probability distributions cannot be known in advance. For a routing planning operation, the condition information may include the source node, the destination node, and the given range of transit nodes.
  • The historical policy set can be a set composed of strategies that have appeared in history; for this embodiment, the historical policy set can be understood as a set composed of several historical optimal strategies, where a historical optimal strategy is the strategy with the best long-term statistical performance among the many strategies satisfying the same condition constraints that have appeared in history.
  • Several candidate strategy sets can be stored in the historical strategy set, each candidate strategy set can include several historical optimal strategies, and the candidate strategy sets in the historical strategy set can be sorted by period; for example, with periods arranged by the hour, 24 candidate policy sets are formed.
  • In this way, the candidate policy set of the corresponding period can be selected simply according to the current time.
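  • With the hourly layout from the example above, selecting by current time reduces to indexing by the current hour. A minimal sketch, in which the function name and the list-of-24-sets layout are assumptions for illustration:

```python
from datetime import datetime

def select_candidate_set(historical_policy_sets, now=None):
    """Pick the candidate policy set for the current hourly period.

    historical_policy_sets is assumed to be a list of 24 candidate sets,
    one per hour of the day (the example layout, not a mandated one).
    """
    now = now or datetime.now()
    return historical_policy_sets[now.hour]
```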
  • the optimal strategy is a strategy with relatively better performance indicators that can be executed based on current condition information.
  • For the beam selection problem, the spectral efficiency generated in the process of executing the optimal strategy can be collected; for the routing planning problem, the network delay generated in the process of executing the optimal strategy can be collected.
  • step S300 may include but not limited to step S310 and step S320.
  • Step S310 searching for a target policy matching the condition information in the candidate policy set.
  • Step S320 when there is a target policy in the candidate policy set, use the target policy as the optimal policy.
  • The target strategy matching the condition information is searched for in the candidate strategy set; if the target strategy exists in the candidate strategy set, the target strategy is used as the optimal strategy. Since the candidate strategy set includes several candidate strategies, the obtained condition information is compared and matched against each candidate strategy; when the comparison succeeds, the corresponding target strategy can be used as the optimal strategy, and the optimal strategy is then executed to collect the relevant operating performance parameters.
  • condition information contains several elements.
  • If a candidate strategy contains all the elements in the condition information, it can be determined that the candidate strategy matches the condition information successfully; if multiple candidate strategies match, the candidate strategy with the best performance index is taken as the target strategy.
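  • The element-containment matching rule can be sketched as follows; the function name and the (elements, performance) tuple representation are illustrative assumptions:

```python
def find_target_policy(condition_info, candidate_set):
    """Return the best-performing candidate containing every element of
    condition_info, or None if no candidate matches.

    condition_info: set of required elements (e.g. beams that must be served)
    candidate_set: list of (policy_elements, performance_index) pairs
    """
    matches = [(p, perf) for p, perf in candidate_set
               if condition_info.issubset(p)]
    if not matches:
        return None
    # among multiple matches, take the one with the best performance index
    return max(matches, key=lambda m: m[1])[0]
```

Returning None corresponds to the "no target policy exists" branch, where step S330 generates an optimal policy instead.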
  • step S300 may include but not limited to step S310 and step S330.
  • Step S310 searching for a target policy matching the condition information in the candidate policy set.
  • Step S330 when the target policy does not exist in the candidate policy set, an optimal policy is generated according to the condition information.
  • the target policy matching the condition information is searched in the candidate policy set, and if no target policy matching the condition information is found, an optimal policy will be generated according to the condition information.
  • Generating the optimal policy based on the condition information means generating it in a currently common general way. For example, for the beam selection problem in the multi-user space division field, calculate the correlation between different beams and select a beam space division set whose correlation is below a given threshold and which contains the beam where the primary user is located; for the routing planning problem, the optimal strategy can be generated based on the Dijkstra algorithm.
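  • For the routing fallback, a standard Dijkstra shortest-path sketch is shown below (the graph encoding and function signature are assumptions; the algorithm itself is the classic one named in the text):

```python
import heapq

def dijkstra(graph, source, destination):
    """Classic Dijkstra shortest path; graph maps node -> {neighbor: cost}."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == destination:
            break
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float('inf')):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    # walk predecessors back from the destination to recover the path
    path, node = [], destination
    while node != source:
        path.append(node)
        node = prev[node]
    path.append(source)
    return list(reversed(path)), dist[destination]
```

The returned path plays the role of a generated optimal strategy when no stored candidate matches the routing condition.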
  • step S200 may include but not limited to step S210 , step S220 and step S230 .
  • Step S210 constructing a policy statistics space.
  • Step S220 determining a candidate policy set corresponding to the current period from the historical policy sets.
  • Step S230 copying the candidate policy set to the policy statistics space, so that the policy statistics space includes the candidate policy set.
  • Specifically, the policy statistics space is first constructed, then the candidate policy set corresponding to the current period is determined from the historical policy set, and finally the candidate policy set is copied into the policy statistics space, so that the policy statistics space includes the candidate policy set.
  • the policy statistics space can be in the form of a table or a matrix, and can be used to store a set of candidate policies.
  • The purpose of copying the candidate policy set into the policy statistics space is mainly to implement policy statistics and update processing, updating the content of the candidate policy set and preparing for subsequent optimal policy selection.
  • The optimal policy includes a first performance parameter; as shown in FIG. 5, the above step S500 may include but is not limited to step S510.
  • Step S510 when there is an optimal strategy in the strategy statistics space, update the first performance parameter according to the running performance parameter so as to update the candidate strategy set.
  • When the optimal strategy already exists in the strategy statistics space, the operating performance parameters collected by executing the optimal strategy are used to update the first performance parameter of the optimal strategy stored in the strategy statistics space, thereby updating the candidate strategy set.
  • The optimal strategy may include strategy content and a first performance parameter corresponding to the strategy content; the first performance parameter is the performance index corresponding to the strategy content, and "first" is only used to distinguish the subject corresponding to the performance parameter, to facilitate explanation of the embodiments.
  • the policy content refers to the execution content specifically included in the policy. Exemplarily, for the routing problem, the policy content may refer to a certain network path.
  • For the beam selection problem, the collected operating performance parameter may be the spectral efficiency, and the first performance parameter may be the average spectral efficiency, updated by the incremental average A_n = ((n - 1) * A_{n-1} + X_n) / n, where n is the number of statistics, A_n is the average spectral efficiency at the nth statistic, and X_n is the nth spectral efficiency.
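  • The incremental average consistent with the variable list (n, A_n, X_n) can be implemented in a few lines; the function name is an illustrative assumption:

```python
def update_average(prev_avg, x_n, n):
    """Incremental average: A_n = ((n - 1) * A_{n-1} + X_n) / n.

    prev_avg: A_{n-1}; x_n: the nth sample (e.g. spectral efficiency);
    n: the statistic count after including x_n.
    """
    return ((n - 1) * prev_avg + x_n) / n
```

This avoids storing the full sample history: only the running average and the count need to be kept per candidate strategy.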
  • For the routing planning problem, the collected operating performance parameter can be the delay, and the first performance parameters can be the average delay and the delay variance, which can be updated incrementally as D_n = ((n - 1) * D_{n-1} + Y_n) / n and V_n = ((n - 1) * V_{n-1} + (Y_n - D_{n-1}) * (Y_n - D_n)) / n, where n is the number of statistics, Y_n is the nth delay, D_n is the average delay at the nth statistic, and V_n is the delay variance at the nth statistic.
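  • The exact formula in the patent figure is not reproduced in this extract; one consistent choice matching the variable list (n, Y_n, D_n, V_n) is Welford's recurrence for a running mean and population variance, sketched below with an assumed function name:

```python
def update_delay_stats(prev_mean, prev_var, y_n, n):
    """Incrementally update average delay D_n and delay variance V_n.

    Uses Welford's recurrence (an assumption consistent with the variable
    definitions, not necessarily the patent's exact formula).
    """
    mean = prev_mean + (y_n - prev_mean) / n
    # population variance: V_n = ((n-1)*V_{n-1} + (y - D_{n-1})*(y - D_n)) / n
    var = ((n - 1) * prev_var + (y_n - prev_mean) * (y_n - mean)) / n
    return mean, var
```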
  • step S500 may include but not limited to step S520 and step S530 .
  • Step S520 when there is no optimal policy in the policy statistics space and no optimal policy in the preset policy buffer space, save the optimal policy in the policy buffer space.
  • Step S530 when the optimal strategy in the strategy buffer space satisfies the update condition, save the optimal strategy in the strategy buffer space to the strategy statistics space, and update the set of candidate strategies.
  • When the optimal strategy exists neither in the strategy statistics space nor in the preset strategy buffer space, the optimal strategy is saved into the strategy buffer space; when the optimal strategy in the strategy buffer space satisfies the update condition, the optimal strategy in the strategy buffer space is saved into the strategy statistics space, thereby updating the candidate strategy set.
  • The strategy buffer space may be in the form of a table or a matrix. Setting the policy buffer space avoids unreliable updates of the candidate policy set stored in the policy statistics space when the operating performance parameters are random, which benefits subsequent optimal policy selection.
  • step S500 may include but not limited to step S540 and step S550 .
  • Step S540 when there is no optimal policy in the policy statistics space and there is an optimal policy in the preset policy buffer space, update the first performance parameter according to the running performance parameter so as to update the optimal policy.
  • Step S550 when the updated optimal policy in the policy buffer space satisfies the update condition, save the updated optimal policy in the policy buffer space to the policy statistics space, and update the set of candidate policies.
  • The operating performance parameters collected by executing the optimal strategy are used to update the first performance parameter of the optimal strategy stored in the strategy buffer space; when the updated optimal strategy satisfies the update condition, the updated optimal strategy in the strategy buffer space is saved into the strategy statistics space, realizing the update of the candidate strategy set.
  • The specific method of updating the first performance parameter of the optimal strategy already stored in the strategy buffer space with the operating performance parameters can be the same as the above-mentioned method used when the optimal strategy already exists in the strategy statistics space, which will not be repeated here.
  • the strategy buffer space includes multiple buffer strategies, and the buffer strategy includes the second performance parameter; as shown in FIG. 8 , the above step S520 may include but not limited to step S521.
  • Step S521 when the strategy buffer space is saturated, replace the optimal strategy with the buffer strategy with the worst second performance parameter in the strategy buffer space.
  • The strategy buffer space being in a saturated state means the strategy buffer space is filled with elements, where the elements are the entries stored in the strategy buffer space; for example, for a strategy buffer space in table form, saturation means the table is full and no more elements can be added.
  • the second performance parameter refers to the performance index corresponding to the buffer policy stored in the policy buffer space, and "second" is only used to distinguish the subject corresponding to the performance parameter, so as to facilitate the explanation of the embodiment.
  • the buffer strategy can be understood as the executed strategy stored in the strategy buffer space.
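  • The saturated-buffer replacement of step S521, together with the counter initialization described later for newly cached policies, can be sketched as follows (the list-based entry layout and function name are illustrative assumptions; higher performance is assumed better):

```python
def cache_policy(buffer, policy, performance, capacity):
    """Insert a newly executed policy into the policy buffer space.

    buffer: list of [policy, second_performance, first_counter, second_counter]
    entries. When the buffer is saturated, the entry with the worst second
    performance parameter is replaced.
    """
    entry = [policy, performance, 1, 1]  # counters start at 1 for a new entry
    if len(buffer) < capacity:
        buffer.append(entry)
    else:
        worst = min(range(len(buffer)), key=lambda i: buffer[i][1])
        buffer[worst] = entry
    return buffer
```

This replacement rule is what keeps the buffer holding the best-performing cached strategies.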
  • the set of candidate policies includes multiple candidate policies, and the candidate policies include a third performance parameter; as shown in FIG. 9 , the above step S530 may include but not limited to step S531.
  • Step S531 when the strategy statistics space is saturated, replace the candidate strategy with the worst third performance parameter in the candidate strategy set with the optimal strategy in the strategy buffer space.
  • The first performance parameter of the optimal strategy is compared with the third performance parameters of the candidate strategies in the strategy statistics space; if the optimal strategy turns out to be the strategy with the worst performance parameter, the optimal strategy is deleted and the candidate strategies in the strategy statistics space remain unchanged.
  • The strategy statistics space being in a saturated state means the strategy statistics space is filled with elements; for example, for a strategy statistics space in table form, saturation means the table is full and no more elements can be added.
  • the third performance parameter refers to the performance index corresponding to the candidate policy stored in the policy statistics space, and the "third" is only for distinguishing the subject corresponding to the performance parameter, so as to facilitate the explanation of the embodiment.
  • Candidate policies can be understood as policies that have been stored in the policy statistics space.
  • the set of candidate policies includes multiple candidate policies, and the candidate policies include a third performance parameter; as shown in FIG. 10 , the above step S550 may include but not limited to step S551.
  • Step S551 when the policy statistics space is saturated, replace the candidate policy with the worst third performance parameter in the candidate policy set with the updated optimal policy in the policy buffer space.
  • The strategy statistics space being in a saturated state means the strategy statistics space is filled with elements; for example, for a strategy statistics space in table form, saturation means the table is full and no more elements can be added.
  • the third performance parameter refers to the performance index corresponding to the candidate strategy stored in the policy statistics space, and the "third" is only to distinguish the subject corresponding to the performance parameter, so as to facilitate the explanation of the embodiment.
  • Candidate policies can be understood as policies that have been stored in the policy statistics space.
  • The update condition in step S530 and step S550 is specifically: the first value recorded by the first counter reaches the first preset threshold, and the ratio of the first value to the second value recorded by the second counter is greater than the second preset threshold. The first counter and the second counter are configured for the optimal strategy: the first counter is set to record the number of times the optimal strategy is adopted, and the second counter is set to record the number of times the policies in the policy buffer space are adopted.
  • The first counter is set to record the number of times the optimal strategy is adopted, and the second counter is set to record the number of times the strategies in the policy buffer space are adopted. When the first value recorded by the first counter reaches the first preset threshold, the number of times the optimal strategy has been adopted has reached the first preset threshold; it is then checked whether the ratio of the first value recorded by the first counter to the second value recorded by the second counter is greater than the second preset threshold, and if so, the optimal strategy in the strategy buffer space is saved into the strategy statistics space and the candidate strategy set is updated.
  • For example, the first preset threshold is set to 20 and the second preset threshold to 0.5. When the first value recorded by the first counter is 20 and the second value recorded by the second counter is 25, the ratio of the first value to the second value is 0.8; since 0.8 is greater than the second preset threshold 0.5, it is determined that the optimal strategy is adopted frequently enough, and the optimal strategy in the strategy buffer space is saved into the strategy statistics space.
  • The "first value" and "second value" are only used to distinguish the subjects being counted, and should not be deemed to belong to different types of data.
  • The optimal policy is deleted from the policy buffer space, which prevents the policy buffer space from being congested by rarely adopted policies and makes full use of space resources.
  • For example, the third preset threshold is set to 30 and the second preset threshold to 0.5. When the second value recorded by the second counter reaches 30 while the first value recorded by the first counter is 3, the ratio of the first value to the second value is 0.1; since 0.1 is less than the second preset threshold 0.5, it is determined that the optimal strategy is rarely adopted, and the optimal strategy is deleted from the strategy buffer space, which prevents rarely adopted optimal strategies from occupying the strategy buffer space.
  • The update condition in step S530 and step S550 can also simply be that the first value recorded by the first counter reaches the first preset threshold: once the first value recorded by the first counter reaches the first preset threshold, the optimal strategy in the strategy buffer space is saved into the strategy statistics space. In addition, when the first value recorded by the first counter has not reached the first preset threshold and the second value recorded by the second counter reaches the third preset threshold, the optimal strategy is deleted from the strategy buffer space.
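  • The ratio-based promotion rule and the deletion rule above can be sketched as one decision function; the thresholds are the example values from the text (20, 0.5, 30), and the function name and string return values are illustrative assumptions:

```python
def buffer_action(first_count, second_count,
                  first_threshold=20, second_threshold=0.5, third_threshold=30):
    """Decide what to do with a cached optimal policy.

    Returns 'promote' when the policy should move to the statistics space,
    'delete' when it should be dropped from the buffer, else 'keep'.
    Counters are initialized to 1, so second_count is never zero.
    """
    if (first_count >= first_threshold
            and first_count / second_count > second_threshold):
        return 'promote'
    if first_count < first_threshold and second_count >= third_threshold:
        return 'delete'
    return 'keep'
```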
  • In an embodiment, the policy management method may further include but is not limited to step S600.
  • Step S600 at the end of the current period, update the historical policy set according to the candidate policy set in the policy statistics space.
  • At the end of the current period, the candidate policy set in the policy statistics space is used to update the candidate policy set corresponding to the current period in the historical policy set, facilitating subsequent optimal policy selection.
  • For example, the current period runs from 9:00 a.m. to 10:00 a.m. When the current time reaches 10:00 a.m., the candidate policy set in the current policy statistics space overwrites and updates the candidate policy set corresponding to 9:00 a.m. to 10:00 a.m. in the historical policy set, achieving rapid convergence of the statistical update results; when 9:00 a.m. arrives the next day, the optimal strategy can be selected from the candidate policy set overwritten and updated the previous day, realizing fast and efficient subsequent selection of optimal strategies.
  • The reason for the above operation is that the optimal strategy may differ across time periods.
  • This embodiment is oriented to the problem of beam selection in the field of multi-user space division.
  • The beam space division selection problem, briefly: from a given number of beams (for example, 64), select a number of appropriate beams to form a space division combination for space division transmission.
  • a strategy represents a beam space division combination.
  • the performance of beam space division combining is random. Even if the beam combination is the same, users or channels of users in the beam may be different, resulting in different throughput or spectral efficiency of space division combining.
  • the average performance of the beam space division combination cannot be accurately calculated in advance, because all possible performance values and corresponding probability distributions cannot be known in advance.
  • the content in the candidate strategy set and the strategy statistics space includes a specific candidate strategy (or an index of the candidate strategy) and a third performance parameter corresponding to the candidate strategy.
  • the storage method of the specific candidate strategy may adopt the method of storing the beam index; the third performance parameter includes the average spectral efficiency of the beam space division strategy.
  • the beam space division condition is: for the primary user that must be scheduled in this scheduling, the beam where the user is located must be included in the beam space division set.
  • An existing policy generation method can be used, that is, calculating the correlation between different beams and selecting a beam space division set whose correlations are below a given threshold and which contains the beam where the primary user is located. Alternatively, for a given beam, the space division combination containing that beam with the highest average spectral efficiency is selected from the strategy statistics space; if no such combination exists, an existing policy generation method is adopted.
  • Space division transmission is performed based on the obtained beam space division strategy (that is, the optimal strategy), and the spectrum efficiency of the space division strategy is obtained. If the optimal strategy has been stored in the strategy statistics space, then update the stored average spectral efficiency of the candidate strategy with the spectral efficiency of the new feedback, the formula is as follows:
  • A_n = ((n - 1) * A_{n-1} + X_n) / n, where n is the number of statistics, A_n is the average spectral efficiency at the nth statistic, and X_n is the nth spectral efficiency.
  • If the optimal policy is not stored in the policy statistics space but is cached in the policy buffer space, the average spectral efficiency of the cached strategy is updated with the newly fed-back spectral efficiency, using the same formula.
  • the first counter corresponding to the optimal policy is incremented by 1
  • The second counter of every policy in the policy buffer space is incremented by 1.
  • If the policy is stored neither in the policy statistics space nor in the policy buffer space, the optimal policy is added to the policy buffer space. If the policy buffer space is saturated, the cache policy with the worst second performance parameter is replaced, and the newly cached optimal policy is configured with a first counter and a second counter initialized to 1.
  • the first counter is set to record the number of times the optimal policy is used after it is added to the policy buffer space
  • the second counter is set to record the number of times all cache policies are used after the optimal policy is added to the policy buffer space.
  • the above policy replacement method in the policy buffer space helps to retain several policies with the best performance indicators in the policy buffer space.
  • The cache policy is deleted from the policy buffer space and added to the policy statistics space. If the strategy statistics space is full, the cache strategy is compared with the candidate strategies: when the cache strategy is the strategy with the worst performance parameter, the candidate strategies in the strategy statistics space remain unchanged; otherwise the cache strategy replaces the candidate strategy with the worst third performance parameter in the strategy statistics space. In addition, if the second value recorded by the second counter reaches the third preset threshold, the cache policy is deleted from the policy buffer space.
  • the strategy replacement method in the strategy statistics space mentioned above helps to retain several strategies with better performance indicators in the strategy statistics space.
  • The content of the policy statistics space is transmitted to the historical policy set, directly overwriting the corresponding candidate policy set in the historical policy set and completing the update of the historical policy set.
  • This embodiment is oriented to network routing problems.
  • A brief introduction to the network routing problem: there is a source node, a destination node, and multiple transit nodes; each path between nodes has a certain overhead (such as delay); and it is required to find a path from the source node to the destination node that minimizes the total overhead.
  • a policy represents a path that starts from a source node, may pass through multiple transit nodes, and ends at a destination node.
  • the performance of the strategy is random, because the overhead between nodes is also random, for example, the transmission delay between nodes will fluctuate with the background traffic.
  • the average performance of a strategy cannot be accurately calculated in advance, because all possible performance values and corresponding probability distributions cannot be known in advance.
  • the content in the candidate policy set and the policy statistics space includes a specific candidate policy (or an index of the candidate policy) and the third performance parameter corresponding to that candidate policy.
  • a specific candidate policy may be stored as a sequence of node indexes; the third performance parameter includes the average delay and the delay variance of the routing policy.
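A minimal sketch of one stored entry as just described — a path kept as a sequence of node indexes together with its third performance parameter. The field names are illustrative assumptions, not terms from the application.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class RoutePolicy:
    """One entry of the candidate policy set / policy statistics space:
    the path as a sequence of node indexes, plus its third performance
    parameter (average delay and delay variance)."""
    nodes: Tuple[int, ...]  # source, transit nodes..., destination
    avg_delay: float
    delay_var: float
```

For example, `RoutePolicy(nodes=(0, 4, 9), avg_delay=12.5, delay_var=0.8)` represents a path from node 0 through node 4 to node 9.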
  • the route planning condition is: the source node and the destination node must be included in the plan, and a set of alternative transit nodes is given in advance.
  • an existing policy generation method, for example the classic Dijkstra algorithm, can be used. For a given routing condition (that is, the source node and the destination node must be included, and the choice of transit nodes is limited), select from the policy statistics space the routing policy that meets the above conditions, whose delay variance is below a certain threshold, and whose average delay is the smallest. If no such policy exists, fall back to the existing policy generation method. Adding the delay variance check helps improve the delay stability of the selected policy.
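The selection rule above can be sketched as follows. The record layout, the `meets_conditions` predicate, and the `fallback` generator (standing in for a conventional method such as Dijkstra) are assumptions made for illustration.

```python
def select_routing_policy(stats_space, meets_conditions, var_threshold, fallback):
    """From the policy statistics space, pick the routing policy that
    satisfies the route planning conditions, whose delay variance is
    below the threshold, and whose average delay is smallest; otherwise
    fall back to an existing policy generation method."""
    eligible = [p for p in stats_space
                if meets_conditions(p["path"]) and p["var"] < var_threshold]
    if eligible:
        return min(eligible, key=lambda p: p["avg"])["path"]
    return fallback()
```

The variance filter runs before the average-delay minimization, mirroring the order stated in the text.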
  • Network information transmission is performed based on the obtained optimal policy, and the delay of the optimal policy is obtained. If the optimal policy is already stored in the policy statistics space, the stored average delay and delay variance of the candidate policy are updated with the newly fed-back delay, using the following formulas:
  • Dn = ((n - 1) · D(n-1) + Yn) / n
  • Vn = ((n - 1) · V(n-1) + (Yn - Dn)²) / n
  • where n is the number of statistics
  • Yn is the n-th delay
  • Dn is the average delay at the n-th statistic
  • Vn is the delay variance at the n-th statistic.
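The incremental delay statistics above can be made concrete with a small tracker. Note this assumes the standard recursive forms of the running mean and variance implied by the variable definitions (n, Yn, Dn, Vn); the original formula images are not reproduced in this text.

```python
class DelayStats:
    """Tracks Dn (average delay) and Vn (delay variance) for one policy,
    updated one fed-back delay sample Yn at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0  # Dn
        self.var = 0.0   # Vn

    def update(self, y):
        self.n += 1
        n = self.n
        # Dn = ((n-1) * D(n-1) + Yn) / n
        self.mean = ((n - 1) * self.mean + y) / n
        # Vn = ((n-1) * V(n-1) + (Yn - Dn)^2) / n
        self.var = ((n - 1) * self.var + (y - self.mean) ** 2) / n
```

Each feedback sample costs O(1) time and O(1) memory, which is why the statistics space only needs to store (Dn, Vn, n) per candidate rather than the full delay history.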
  • if the optimal policy is neither stored in the policy statistics space nor cached in the policy buffer space, the optimal policy is added to the policy buffer space. If the policy buffer space is saturated, the cached policy with the worst second performance parameter is replaced, and a first counter and a second counter, each with an initial value of 1, are configured for the newly cached optimal policy.
  • the cached policy is deleted from the policy buffer space and added to the policy statistics space. If the policy statistics space is full, the cached policy is compared with the candidate policies: when the cached policy is the policy with the worst performance parameter, the candidate policies in the policy statistics space remain unchanged; otherwise, the cached policy replaces the candidate policy with the worst third performance parameter in the policy statistics space. In addition, if the second value recorded by the second counter reaches the third preset threshold, the cached policy is deleted from the policy buffer space.
  • the policy replacement method in the policy statistics space described above helps to retain several policies with better performance indicators in the policy statistics space.
  • the content of the policy statistics space is transmitted to the historical policy set, directly overwriting the corresponding candidate policy set in the historical policy set, thereby completing the update of the historical policy set.
  • an embodiment of the present application also provides a policy management device 700; the policy management device 700 includes a memory 720, a processor 710, and a computer program stored in the memory 720 and executable on the processor 710.
  • the processor 710 and the memory 720 may be connected via a bus or in other ways.
  • policy management device 700 in this embodiment and the policy management method in the foregoing embodiments belong to the same inventive concept, so these embodiments have the same implementation principle and technical effect, and will not be described in detail here.
  • the non-transitory software programs and instructions required to implement the policy management method of the above embodiments are stored in the memory 720 and, when executed by the processor 710, perform the policy management method of the above embodiments, for example, method steps S100 to S500 in Fig. 1, method steps S310 to S320 in Fig. 2, method steps S310 to S330 in Fig. 3, method steps S210 to S230 in Fig. 4, method step S510 in Fig. 5, method steps S520 to S530 in Fig. 6, method steps S540 to S550 in Fig. 7, method step S521 in Fig. 8, method step S531 in Fig. 9, method step S551 in Fig. 10, and method step S600 in Fig. 11.
  • an embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor 710 (for example, the processor 710 in the embodiment of the policy management device 700 above), cause the processor 710 to execute the policy management method in the above embodiments, for example, method steps S100 to S500 in Fig. 1, method steps S310 to S320 in Fig. 2, method steps S310 to S330 in Fig. 3, method steps S210 to S230 in Fig. 4, method step S510 in Fig. 5, method steps S520 to S530 in Fig. 6, method steps S540 to S550 in Fig. 7, method step S521 in Fig. 8, method step S531 in Fig. 9, method step S551 in Fig. 10, and method step S600 in Fig. 11.
  • the embodiments of the present application include: obtaining condition information; selecting a candidate policy set corresponding to the current period from the historical policy set; obtaining the optimal policy based on the condition information and the candidate policy set; collecting the operating performance parameters obtained by executing the optimal policy; and updating the candidate policy set according to the operating performance parameters.
  • the condition information is first obtained and the candidate policy set corresponding to the current period is selected from the historical policy set; the optimal policy is then obtained according to the condition information and the candidate policy set; the optimal policy is executed and the resulting operating performance parameters are collected; finally, the candidate policy set is updated according to the operating performance parameters, realizing optimal policy selection based on condition information and keeping the candidate policy set up to date for subsequent optimal policy selection.
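The five steps summarized above can be sketched as one management cycle. Every callable is injected because the concrete implementations vary per embodiment; all parameter names here are illustrative, not from the application.

```python
def policy_management_cycle(get_conditions, historical_sets, period,
                            find_or_generate, execute, update_candidates):
    """One pass of steps S100-S500 of the policy management method."""
    conditions = get_conditions()                        # S100: condition info
    candidates = list(historical_sets[period])           # S200: candidate set
    optimal = find_or_generate(conditions, candidates)   # S300: optimal policy
    perf = execute(optimal)                              # S400: run and measure
    update_candidates(candidates, optimal, perf)         # S500: update the set
    return optimal, perf, candidates
```

The returned triple lets a caller inspect what was chosen, how it performed, and the (possibly updated) candidate set for the period.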
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.


Abstract

A policy management method and device, and a computer-readable storage medium. The policy management method comprises: obtaining condition information (S100); selecting from historical policy sets a candidate policy set corresponding to the current cycle (S200); obtaining an optimal policy on the basis of the condition information and the candidate policy set (S300); collecting an operation performance parameter obtained by executing the optimal policy (S400); and updating the candidate policy set according to the operation performance parameter (S500).

Description

Policy Management Method, Device, and Computer-Readable Storage Medium

Cross-Reference to Related Applications

This application is based on, and claims priority to, Chinese patent application No. 202110969832.4 filed on August 23, 2021, the entire content of which is incorporated herein by reference.

Technical Field

The embodiments of the present application relate to, but are not limited to, the field of communication technologies, and in particular to a policy management method, a device, and a computer-readable storage medium.
Background

With the continuous development of communication technology, selecting the optimal policy from a large number of alternative policies has become a focus of attention. At present, the particle swarm optimization algorithm is usually used to select the optimal policy, and this selection is based on average performance. However, when the performance produced by executing a policy is random and cannot be calculated accurately, the optimal policy cannot be found accurately and policies cannot be updated accurately, which affects the subsequent selection of the optimal policy.
Summary

The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.

Embodiments of the present application provide a policy management method, a device, and a computer-readable storage medium.

In a first aspect, an embodiment of the present application provides a policy management method, including: obtaining condition information; selecting a candidate policy set corresponding to the current period from a historical policy set; obtaining an optimal policy based on the condition information and the candidate policy set; collecting operating performance parameters obtained by executing the optimal policy; and updating the candidate policy set according to the operating performance parameters.

In a second aspect, an embodiment of the present application further provides a policy management device, including: at least one processor; and at least one memory configured to store at least one program which, when executed by the at least one processor, implements the policy management method described above.

In a third aspect, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to execute the policy management method described above.

Additional features and advantages of the application will be set forth in the description that follows and, in part, will be apparent from the description or may be learned by practice of the application. The objectives and other advantages of the application can be realized and attained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Brief Description of the Drawings

The accompanying drawings provide a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they explain the technical solution of the present application and do not limit it.

Fig. 1 is a flowchart of a policy management method provided by an embodiment of the present application;

Fig. 2 is a flowchart of selecting an optimal policy provided by another embodiment of the present application;

Fig. 3 is a flowchart of generating an optimal policy provided by another embodiment of the present application;

Fig. 4 is a flowchart of selecting a candidate policy set provided by another embodiment of the present application;

Fig. 5 is a flowchart of updating a candidate policy set provided by another embodiment of the present application;

Fig. 6 is a flowchart of updating a candidate policy set provided by another embodiment of the present application;

Fig. 7 is a flowchart of updating a candidate policy set provided by another embodiment of the present application;

Fig. 8 is a flowchart of updating a candidate policy set provided by another embodiment of the present application;

Fig. 9 is a flowchart of updating a candidate policy set provided by another embodiment of the present application;

Fig. 10 is a flowchart of updating a candidate policy set provided by another embodiment of the present application;

Fig. 11 is a flowchart of a policy management method provided by another embodiment of the present application;

Fig. 12 is a schematic structural diagram of a policy management device provided by another embodiment of the present application.
Detailed Description

To make the objectives, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present application and do not limit it.

It should be noted that although functional modules are divided in the device schematic diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the device, or in an order different from that in the flowcharts. The terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
The present application provides a policy management method, a device, and a computer-readable storage medium: obtain condition information; select a candidate policy set corresponding to the current period from a historical policy set; obtain an optimal policy based on the condition information and the candidate policy set; collect operating performance parameters obtained by executing the optimal policy; and update the candidate policy set according to the operating performance parameters. The condition information is first obtained and the candidate policy set corresponding to the current period is selected from the historical policy set; the optimal policy is then obtained according to the condition information and the candidate policy set; the optimal policy is executed and the resulting operating performance parameters are collected; finally, the candidate policy set is updated according to the operating performance parameters. This realizes optimal policy selection based on condition information and updates the candidate policy set according to the operating performance parameters, facilitating subsequent optimal policy selection.

The embodiments of the present application are further described below with reference to the accompanying drawings.
As shown in Fig. 1, Fig. 1 is a flowchart of a policy management method provided by an embodiment of the present application. The policy management method includes, but is not limited to, steps S100, S200, S300, S400, and S500.

Step S100: obtain condition information.

Step S200: select a candidate policy set corresponding to the current period from a historical policy set.

Step S300: obtain an optimal policy based on the condition information and the candidate policy set.

Step S400: collect operating performance parameters obtained by executing the optimal policy.

Step S500: update the candidate policy set according to the operating performance parameters.

It should be noted that the condition information is first obtained and the candidate policy set corresponding to the current period is selected from the historical policy set; the optimal policy is then obtained according to the condition information and the candidate policy set; the optimal policy is executed and the resulting operating performance parameters are collected; finally, the candidate policy set is updated according to the operating performance parameters, so that optimal policy selection is based on condition information and the candidate policy set is updated according to the operating performance parameters obtained by executing the optimal policy, facilitating subsequent optimal policy selection.
It should be noted that a policy may include, but is not limited to: a beam combination in a multi-user transmission scenario, an MCS parameter configuration for a given beam in a single-user transmission scenario, or a routing path in a network routing problem.
It can be understood that the condition information is the set of constraints that must be satisfied to complete a certain operation. Exemplarily, consider the beam selection problem in the multi-user space division field: from a given set of beams (for example, 64), several suitable beams are selected to form a space division combination for space division transmission. In this problem, a policy represents one beam space division combination. The performance of a beam space division combination is random: even for the same beam combination, the users in a beam or their channels may differ, so the throughput or spectral efficiency of the combination differs. The average performance of a beam space division combination cannot be accurately calculated in advance, because all possible performance values and their probability distributions cannot be known in advance. For a primary user that must be scheduled in the current scheduling round, the condition information may be that the beam where that user is located must be included in the beam space division set.

Exemplarily, for the route planning problem: there is a source node, a destination node, and multiple transit nodes; paths between nodes carry a certain overhead (for example, delay); and a path from the source node to the destination node that minimizes the total overhead is required. In this problem, a policy represents a path that starts from the source node, may pass through multiple transit nodes, and ends at the destination node. The performance of a policy is random, because the overhead between nodes is also random; for example, the transmission delay between nodes fluctuates with the background traffic. The average performance of a policy cannot be accurately calculated in advance, because all possible performance values and their probability distributions cannot be known in advance. For the route planning operation, the condition information may be that the source node, the destination node, and a given range of transit nodes must be included.
It should be noted that the historical policy set may be a set composed of historical policies that have appeared in the past. For this embodiment, it can be understood as a set of several historically optimal policies, where a historically optimal policy is, among the many past policies satisfying the same constraints, the one with the best long-term statistical performance. The historical policy set may store several candidate policy sets, each containing several historically optimal policies, and the candidate policy sets may be ordered by period. Exemplarily, with hourly periods, 24 candidate policy sets are formed; when a candidate policy set needs to be selected from the historical policy set, the one for the corresponding period can be selected simply according to the current time.
It can be understood that the optimal policy is the policy with relatively better performance indicators that can be executed based on the current condition information.

It should be noted that the operating performance parameters produced in the process of executing the optimal policy are collected. Exemplarily, for the beam selection problem in the multi-user space division field, the spectral efficiency produced during execution of the optimal policy may be collected; for the route planning problem, the network delay produced during execution of the optimal policy may be collected.
In addition, in an embodiment, as shown in Fig. 2, the above step S300 may include, but is not limited to, steps S310 and S320.

Step S310: search the candidate policy set for a target policy matching the condition information.

Step S320: when the target policy exists in the candidate policy set, take the target policy as the optimal policy.

It should be noted that the candidate policy set is searched for a target policy matching the condition information; if the target policy exists in the candidate policy set, the target policy is taken as the optimal policy. Since the candidate policy set includes several candidate policies corresponding to the current period, the obtained condition information is compared and matched against the candidate policies; when the matching succeeds, the corresponding target policy can be taken as the optimal policy, the optimal policy is then executed, and the relevant operating performance parameters are collected.

It is worth noting that the condition information contains several elements. When a candidate policy contains all the elements in the condition information, the candidate policy can be regarded as successfully matching the condition information; if multiple candidate policies match successfully, the candidate policy with the best performance indicator is taken as the target policy.
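The matching rule just described (all condition elements contained; best performer among multiple matches wins) can be sketched as below. The candidate record layout and the higher-is-better performance value are assumptions for illustration.

```python
def pick_target_policy(candidates, condition_elements):
    """Return the candidate that contains every element of the condition
    information; among multiple matches, the one with the best
    performance indicator. Returns None when nothing matches (the
    caller then generates an optimal policy instead, per step S330)."""
    matches = [c for c in candidates
               if set(condition_elements) <= set(c["elements"])]
    if not matches:
        return None
    return max(matches, key=lambda c: c["perf"])
```

The subset test `set(condition_elements) <= set(c["elements"])` is exactly the "contains all elements" criterion from the text.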
In addition, in an embodiment, as shown in Fig. 3, the above step S300 may include, but is not limited to, steps S310 and S330.

Step S310: search the candidate policy set for a target policy matching the condition information.

Step S330: when no target policy exists in the candidate policy set, generate an optimal policy according to the condition information.

It should be noted that the candidate policy set is searched for a target policy matching the condition information; if no target policy matching the condition information is found, an optimal policy is generated according to the condition information.

It is worth noting that generating the optimal policy according to the condition information means generating it with a currently common general-purpose method. Exemplarily, for the beam selection problem in the multi-user space division field, the correlation between different beams can be calculated, and a beam space division set whose correlation is below a given threshold and which contains the beam of the primary user is selected; for the route planning problem, the optimal policy can be generated based on the Dijkstra algorithm.
In addition, in an embodiment, as shown in Fig. 4, the above step S200 may include, but is not limited to, steps S210, S220, and S230.

Step S210: construct a policy statistics space.

Step S220: determine the candidate policy set corresponding to the current period from the historical policy set.

Step S230: copy the candidate policy set into the policy statistics space, so that the policy statistics space includes the candidate policy set.

It should be noted that the policy statistics space is constructed first, the candidate policy set corresponding to the current period is then determined from the historical policy set, and the candidate policy set is finally copied into the policy statistics space, so that the policy statistics space includes the candidate policy set.

It is worth noting that the policy statistics space may take the form of a table or a matrix and can be used to store the candidate policy set. The candidate policy set is determined from the historical policy set according to the current time period; exemplarily, if the current time is 9:30 a.m., the candidate policy set corresponding to the period from 9 a.m. to 10 a.m. can be determined from the historical policy set.
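Period-based selection (step S220) with hourly periods can be sketched as below. Keeping the 24 candidate policy sets in a dict keyed by hour is an assumed layout, and returning a copy mirrors the copy into the policy statistics space in step S230.

```python
def candidate_set_for(historical_sets, hour_of_day):
    """Return a copy of the candidate policy set for the hourly period
    that contains the given hour (24 hourly periods assumed)."""
    return list(historical_sets[hour_of_day % 24])
```

For example, at 9:30 a.m. the call `candidate_set_for(historical_sets, 9)` yields the set for the 9-10 a.m. period, and mutating the copy leaves the historical policy set untouched until the statistics space is written back.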
It should be noted that copying the candidate policy set into the policy statistics space mainly serves to implement policy statistics and update processing: the content of the candidate policy set is updated in preparation for subsequent optimal policy selection.
In addition, in an embodiment, the optimal policy includes a first performance parameter. As shown in Fig. 5, the above step S500 may include, but is not limited to, step S510.

Step S510: when the optimal policy exists in the policy statistics space, update the first performance parameter according to the operating performance parameters, thereby updating the candidate policy set.

It should be noted that when the optimal policy already exists in the policy statistics space, the operating performance parameters collected by executing the optimal policy are used to update the first performance parameter of the optimal policy already stored in the policy statistics space, thereby updating the candidate policy set.
It can be understood that the optimal policy may include policy content and a first performance parameter corresponding to the policy content; the first performance parameter is the performance indicator corresponding to the policy content, and "first" is only used to distinguish the subjects to which performance parameters correspond, for ease of explanation. The policy content refers to the concrete execution content of the policy; exemplarily, for the routing problem, the policy content may refer to a particular network path.
Exemplarily, for the beam selection problem in the field of multi-user spatial division, the collected running performance parameter may be the spectral efficiency, and the first performance parameter may be the average spectral efficiency. When the optimal policy exists in the policy statistics space, the first performance parameter can be updated with the following formula:
A_n = A_{n-1} + (X_n - A_{n-1}) / n
where n is the number of statistics, A_n is the average spectral efficiency at the n-th statistic, and X_n is the n-th spectral efficiency.
Exemplarily, for the routing planning problem, the collected running performance parameter may be the delay, and the first performance parameter may be the average delay and the delay variance. When the optimal policy exists in the policy statistics space, the first performance parameter can be updated with the following formulas:
D_n = D_{n-1} + (Y_n - D_{n-1}) / n
V_n = [(n - 1) · V_{n-1} + (Y_n - D_{n-1}) · (Y_n - D_n)] / n
where n is the number of statistics, Y_n is the n-th delay, D_n is the average delay at the n-th statistic, and V_n is the delay variance at the n-th statistic.
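As an illustration only (not part of the claimed method), the recursive updates above can be sketched in Python; the function names and the Welford-style variance recursion are assumptions consistent with the stated goal of keeping only a few scalars per policy:

```python
def update_mean(prev_mean: float, x: float, n: int) -> float:
    # Running mean: A_n = A_{n-1} + (X_n - A_{n-1}) / n
    return prev_mean + (x - prev_mean) / n

def update_mean_var(prev_mean: float, prev_var: float, y: float, n: int):
    # Welford-style recursion for the average delay D_n and delay variance V_n;
    # only the previous mean, previous variance, and the count n are stored.
    mean = prev_mean + (y - prev_mean) / n
    var = ((n - 1) * prev_var + (y - prev_mean) * (y - mean)) / n
    return mean, var
```

Per policy, only the count, the mean, and the variance need to be kept, instead of the full history of samples.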
In addition, in an embodiment, as shown in FIG. 6, the above step S500 may include, but is not limited to, step S520 and step S530.
Step S520: when the optimal policy exists in neither the policy statistics space nor the preset policy buffer space, save the optimal policy into the policy buffer space.
Step S530: when the optimal policy in the policy buffer space satisfies an update condition, save the optimal policy from the policy buffer space into the policy statistics space, thereby updating the candidate policy set.
It should be noted that when the optimal policy exists in neither the policy statistics space nor the preset policy buffer space, the optimal policy is saved into the policy buffer space; and once the optimal policy in the policy buffer space satisfies the update condition, it is saved from the policy buffer space into the policy statistics space, thereby updating the candidate policy set.
It should be noted that the policy buffer space may take the form of a table or a matrix. Providing a policy buffer space prevents the randomness of the running performance parameters from causing unreliable updates to the candidate policy set stored in the policy statistics space, and is thus more conducive to the subsequent selection of the optimal policy.
In addition, in an embodiment, as shown in FIG. 7, the above step S500 may include, but is not limited to, step S540 and step S550.
Step S540: when the optimal policy does not exist in the policy statistics space but exists in the preset policy buffer space, update the first performance parameter according to the running performance parameter, thereby updating the optimal policy.
Step S550: when the updated optimal policy in the policy buffer space satisfies the update condition, save the updated optimal policy from the policy buffer space into the policy statistics space, thereby updating the candidate policy set.
It should be noted that when the optimal policy does not exist in the policy statistics space but does exist in the preset policy buffer space, the running performance parameter collected while executing the optimal policy is used to update the first performance parameter of the optimal policy already stored in the policy buffer space; and once the updated optimal policy satisfies the update condition, the updated optimal policy is saved from the policy buffer space into the policy statistics space, thereby updating the candidate policy set.
It is worth noting that the specific method of using the running performance parameter to update the first performance parameter of the optimal policy already stored in the policy buffer space may be the same as the method described above for the case in which the optimal policy already exists in the policy statistics space, and is not repeated here.
In addition, in an embodiment, the policy buffer space includes a plurality of buffered policies, and each buffered policy includes a second performance parameter. As shown in FIG. 8, the above step S520 may include, but is not limited to, step S521.
Step S521: when the policy buffer space is saturated, replace the buffered policy with the worst second performance parameter in the policy buffer space with the optimal policy.
It should be noted that when the optimal policy is to be saved into the policy buffer space while the policy buffer space is saturated, the optimal policy replaces the buffered policy with the worst second performance parameter in the policy buffer space.
It can be understood that the policy buffer space being saturated means that the policy buffer space is filled with elements, where an element may be a piece of data or a table entry stored in the policy buffer space. Exemplarily, for a policy buffer space in table form, saturation means that the table is full and no further element can be added. The second performance parameter refers to the performance index corresponding to a buffered policy stored in the policy buffer space; "second" merely distinguishes the subject to which the performance parameter corresponds, so as to facilitate the explanation of the embodiments. A buffered policy can be understood as a policy that has already been executed and is stored in the policy buffer space.
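As a minimal sketch of step S521 (assuming the policy buffer space is modelled as a dictionary mapping policy content to its second performance parameter, with larger values meaning better performance):

```python
def insert_into_buffer(buffer: dict, capacity: int, policy, perf: float) -> None:
    # When the buffer is not yet saturated, simply store the new optimal policy.
    if len(buffer) < capacity:
        buffer[policy] = perf
        return
    # Saturated: the new policy replaces the buffered policy whose
    # second performance parameter is worst (smallest here).
    worst = min(buffer, key=buffer.get)
    del buffer[worst]
    buffer[policy] = perf
```

This replacement rule keeps the best-performing policies resident in the buffer space.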
In addition, in an embodiment, the candidate policy set includes a plurality of candidate policies, and each candidate policy includes a third performance parameter. As shown in FIG. 9, the above step S530 may include, but is not limited to, step S531.
Step S531: when the policy statistics space is saturated, replace the candidate policy with the worst third performance parameter in the candidate policy set with the optimal policy from the policy buffer space.
It should be noted that when the optimal policy in the policy buffer space is to be saved into the policy statistics space while the policy statistics space is saturated, the optimal policy replaces the candidate policy with the worst third performance parameter in the policy statistics space.
It is worth noting that, in some other embodiments, the first performance parameter of the optimal policy is compared with the third performance parameters of the candidate policies in the policy statistics space; if the optimal policy turns out to be the policy with the worst performance parameter, the optimal policy is deleted and the candidate policies in the policy statistics space remain unchanged.
It can be understood that the policy statistics space being saturated means that the policy statistics space is filled with elements. Exemplarily, for a policy statistics space in table form, saturation means that the table is full and no further element can be added. The third performance parameter refers to the performance index corresponding to a candidate policy stored in the policy statistics space; "third" merely distinguishes the subject to which the performance parameter corresponds, so as to facilitate the explanation of the embodiments. A candidate policy can be understood as a policy that has already been stored in the policy statistics space.
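The promotion behavior just described, including the variant that discards a promoted policy that itself performs worst, can be sketched as follows (a hypothetical model in which the statistics space maps candidate policies to their third performance parameter, larger being better):

```python
def promote_to_statistics(stats: dict, capacity: int, policy, perf: float) -> None:
    # Unsaturated statistics space: store the promoted policy directly.
    if len(stats) < capacity:
        stats[policy] = perf
        return
    worst = min(stats, key=stats.get)
    # If the promoted policy itself performs worst, drop it and keep
    # the candidate set unchanged; otherwise replace the worst candidate.
    if perf <= stats[worst]:
        return
    del stats[worst]
    stats[policy] = perf
```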
In addition, in an embodiment, the candidate policy set includes a plurality of candidate policies, and each candidate policy includes a third performance parameter. As shown in FIG. 10, the above step S550 may include, but is not limited to, step S551.
Step S551: when the policy statistics space is saturated, replace the candidate policy with the worst third performance parameter in the candidate policy set with the updated optimal policy from the policy buffer space.
It should be noted that when the updated optimal policy in the policy buffer space is to be saved into the policy statistics space while the policy statistics space is saturated, the updated optimal policy replaces the candidate policy with the worst third performance parameter in the policy statistics space.
It can be understood that the policy statistics space being saturated means that the policy statistics space is filled with elements. Exemplarily, for a policy statistics space in table form, saturation means that the table is full and no further element can be added. The third performance parameter refers to the performance index corresponding to a candidate policy stored in the policy statistics space; "third" merely distinguishes the subject to which the performance parameter corresponds, so as to facilitate the explanation of the embodiments. A candidate policy can be understood as a policy that has already been stored in the policy statistics space.
In some specific embodiments of the present application, the update condition in step S530 and step S550 is specifically that a first value recorded by a first counter reaches a first preset threshold, and the ratio of the first value to a second value recorded by a second counter is greater than a second preset threshold; both counters are configured according to the optimal policy, the first counter being set to record the number of times the optimal policy is adopted, and the second counter being set to record the number of times the policies in the policy buffer space are adopted.
It should be noted that the first counter is set to record the number of times the optimal policy is adopted, and the second counter is set to record the number of times the policies in the policy buffer space are adopted. When the first value recorded by the first counter reaches the first preset threshold, the number of times the optimal policy has been adopted has reached that threshold; it is then checked whether the ratio of the first value to the second value recorded by the second counter is greater than the second preset threshold, and if so, the optimal policy in the policy buffer space is saved into the policy statistics space and the candidate policy set is updated. Exemplarily, suppose the first preset threshold is 20 and the second preset threshold is 0.5. When the first value recorded by the first counter is 20 and the second value recorded by the second counter is 25, the ratio of the first value to the second value is 0.8; since 0.8 is greater than the second preset threshold of 0.5, the optimal policy is deemed to have been adopted frequently, and the optimal policy in the policy buffer space is saved into the policy statistics space. Here, "first value" and "second value" merely distinguish the subjects performing the counting and should not be taken to mean that the two belong to different types of data.
It is worth noting that it may further be set that, when the second value recorded by the second counter reaches a third preset threshold but the ratio of the first value to the second value is not greater than the second preset threshold, the optimal policy is deleted from the policy buffer space. This effectively prevents rarely adopted policies from congesting the policy buffer space and achieves full utilization of the space resources. Exemplarily, suppose the third preset threshold is 30 and the second preset threshold is 0.5. When the second value reaches 30 and the first value recorded by the first counter is 3, the ratio of the first value to the second value is 0.1; since 0.1 is less than the second preset threshold of 0.5, the optimal policy is deemed to have been adopted rarely and is deleted from the policy buffer space, which effectively prevents rarely adopted optimal policies from occupying too much of the policy buffer space.
It is worth noting that the update condition in step S530 and step S550 may also be simply that the first value recorded by the first counter reaches the first preset threshold: as soon as the first value reaches the first preset threshold, the optimal policy in the policy buffer space is saved into the policy statistics space. In addition, when the first value recorded by the first counter has not reached the first preset threshold but the second value recorded by the second counter reaches the third preset threshold, the optimal policy is deleted from the policy buffer space.
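The counter-based decision above can be sketched as a single check; the threshold values are the illustrative ones from the examples above, not values fixed by the method:

```python
def check_buffer_policy(first_value: int, second_value: int,
                        first_threshold: int = 20,
                        ratio_threshold: float = 0.5,
                        third_threshold: int = 30) -> str:
    # first_value: first counter (times this buffered policy was adopted)
    # second_value: second counter (times any buffered policy was adopted)
    if first_value >= first_threshold and first_value / second_value > ratio_threshold:
        return "promote"   # move the policy into the statistics space
    if second_value >= third_threshold and first_value / second_value <= ratio_threshold:
        return "delete"    # rarely adopted: free the buffer space
    return "keep"
```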
In addition, in an embodiment, as shown in FIG. 11, after step S500 is executed, the method may further include, but is not limited to, step S600.
Step S600: at the end time of the current period, update the historical policy set according to the candidate policy set in the policy statistics space.
It should be noted that when the current period ends, the candidate policy set in the policy statistics space updates the candidate policy set corresponding to the current period in the historical policy set, so as to facilitate subsequent optimal-policy selection. Exemplarily, suppose the current period is 9:00 a.m. to 10:00 a.m.; when the current time reaches 10:00 a.m., the candidate policy set currently in the policy statistics space overwrites the candidate policy set originally stored in the historical policy set for the 9:00-10:00 a.m. period, thereby achieving rapid convergence of the statistical update results. When 9:00 a.m. arrives on the following day, the optimal policy can again be selected from the candidate policy set overwritten and updated the previous day, achieving fast and effective selection of subsequent optimal policies. This operation is adopted because the optimal policy may differ across different time periods.
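A minimal sketch of this per-period bookkeeping, assuming the historical policy set is keyed by a period label such as "09:00-10:00" (the key format is illustrative):

```python
def end_of_period_update(history: dict, period_key: str, stats: dict) -> None:
    # Step S600: the candidate set in the statistics space overwrites
    # the set stored for the same period in the historical policy set.
    history[period_key] = dict(stats)

def start_of_period_init(history: dict, period_key: str) -> dict:
    # Start of a period: the statistics space is (re)initialized from
    # the historical candidate set for the same period.
    return dict(history.get(period_key, {}))
```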
In order to illustrate the management flow of the policy management method provided by the embodiments of the present application more clearly, specific examples are described below.
Example 1:
This example addresses the beam selection problem in the field of multi-user spatial division. The beam spatial-division selection problem is first briefly introduced: from a given set of beams (for example, 64), several suitable beams are selected to form a spatial-division combination for spatial-division transmission. In this problem, one policy represents one beam spatial-division combination. The performance of a beam spatial-division combination is random: even for the same beam combination, the users within the beams, or their channels, may differ, so the throughput or spectral efficiency of the combination differs as well. The average performance of a beam spatial-division combination cannot be computed precisely in advance, because all possible performance values and the corresponding probability distribution cannot be known beforehand.
This example takes one period as an example to illustrate the implementation flow of the present application.
At the beginning of a period, the corresponding candidate policy set is selected from the historical policy set, and the content of the candidate policy set overwrites the content of the policy statistics space, completing the initialization process. The content of the candidate policy set and of the policy statistics space includes the concrete candidate policies (or their indexes) and the third performance parameters corresponding to the candidate policies. The concrete candidate policies may be stored by storing the beam indexes; the third performance parameter includes the average spectral efficiency of the beam spatial-division policy.
In a given multi-user scheduling within this period, the beam spatial-division condition is: for the primary user that must be scheduled in this round, the beam where that user is located must be included in the beam spatial-division set. For example, the existing policy generation method may be adopted, namely computing the correlations between different beams and selecting a beam spatial-division set that contains the primary user's beam and whose correlations are below a given threshold. Alternatively, for the given spatial-division condition (i.e., a given beam must be included), the beam spatial-division combination that contains that beam and has the highest average spectral efficiency is selected from the policy statistics space. If no such combination exists, the existing policy generation method is adopted.
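The constrained lookup described above can be sketched as follows, modelling each stored combination as a tuple of beam indexes mapped to its average spectral efficiency (both the data model and the function name are assumptions for illustration):

```python
def pick_beam_combo(stats: dict, required_beam: int):
    # Keep only the stored combinations containing the primary user's beam.
    feasible = {combo: eff for combo, eff in stats.items() if required_beam in combo}
    if not feasible:
        return None  # caller falls back to the existing generation method
    # Among feasible combinations, pick the highest average spectral efficiency.
    return max(feasible, key=feasible.get)
```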
Spatial-division transmission is performed based on the obtained beam spatial-division policy (i.e., the optimal policy), and the spectral efficiency of that policy is obtained. If the optimal policy has already been stored in the policy statistics space, the stored average spectral efficiency of that candidate policy is updated with the newly fed-back spectral efficiency, using the following formula:
A_n = A_{n-1} + (X_n - A_{n-1}) / n
where n is the number of statistics, A_n is the average spectral efficiency at the n-th statistic, and X_n is the n-th spectral efficiency. Compared with accumulating and then averaging, the recursive formula above helps reduce the amount of storage.
If the optimal policy has not been stored in the policy statistics space but is stored in the policy buffer space, the average spectral efficiency of the buffered policy is updated with the newly fed-back spectral efficiency, using the formula above. At the same time, the first counter corresponding to the optimal policy is incremented by 1, and the second counters corresponding to all policies in the policy buffer space are incremented by 1.
If the policy is stored in neither the policy statistics space nor the policy buffer space, the optimal policy is added to the policy buffer space. If the policy buffer space is saturated, the optimal policy replaces the buffered policy with the worst second performance parameter, and a first counter and a second counter with initial value 1 are configured for the newly buffered optimal policy. Here, the first counter is set to record the number of times the optimal policy is adopted after joining the policy buffer space, and the second counter is set to record the number of times all buffered policies are adopted after the optimal policy joins the policy buffer space. This replacement method helps keep the several best-performing policies in the policy buffer space.
It is then judged whether the first value recorded by the first counter of the updated buffered policy is greater than the first preset threshold. If so, the buffered policy is deleted from the policy buffer space and added to the policy statistics space. If the policy statistics space is full, the buffered policy is compared with the candidate policies: when the buffered policy is the one with the worst performance parameter, the candidate policies in the policy statistics space remain unchanged; otherwise, the buffered policy replaces the candidate policy with the worst third performance parameter in the policy statistics space. In addition, if the second value recorded by the second counter reaches the third preset threshold, the buffered policy is deleted from the policy buffer space. This replacement method helps keep several well-performing policies in the policy statistics space.
At the end of this period, the content of the policy statistics space is transmitted to the historical policy set and directly overwrites the corresponding candidate policy set therein, completing the update of the historical policy set.
Example 2:
This example addresses the network routing problem, which is first briefly introduced: there are a source node, a destination node, and multiple transit nodes; the paths between nodes carry a certain cost (for example, delay); the task is to find a path from the source node to the destination node that minimizes the total cost. In this problem, one policy represents one path, which starts from the source node, may pass through multiple transit nodes, and ends at the destination node. The performance of a policy is random because the cost between nodes is also random; for example, the transmission delay between nodes fluctuates with the background traffic. The average performance of a policy cannot be computed precisely in advance, because all possible performance values and the corresponding probability distribution cannot be known beforehand.
This example takes one period as an example to illustrate the specific workflow.
At the beginning of a period, the corresponding candidate policy set is selected from the historical policy set, and the content of the candidate policy set overwrites the content of the policy statistics space, completing the initialization process. The content of the candidate policy set and of the policy statistics space includes the concrete candidate policies (or their indexes) and the third performance parameters corresponding to the candidate policies. The concrete candidate policies may be stored by storing the node indexes in sequence; the third performance parameter includes the average delay and the delay variance of the routing policy.
In a given route planning within this period, the route planning condition is: the plan must contain the source node and the destination node, and the set of alternative transit nodes is given in advance. The existing policy generation method may, for example, be the classic Dijkstra algorithm. For the given routing condition (i.e., the source node and the destination node must be included, and transit-node selection is restricted), the routing policy that satisfies the above condition, whose delay variance is below a certain threshold, and whose average delay is the smallest is selected from the policy statistics space. If no such policy exists, the existing policy generation method is adopted. Adding the judgment on the delay variance helps improve the delay stability of the selected policy.
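Analogously to the beam case, the routing lookup can be sketched as below, with each stored policy a tuple of node indexes mapped to a pair (average delay, delay variance); the representation is an assumption for illustration:

```python
def pick_route(stats: dict, src: int, dst: int, var_threshold: float):
    # Feasible: starts at the source, ends at the destination, and the
    # delay variance stays below the given threshold.
    feasible = {
        route: stats[route]
        for route in stats
        if route[0] == src and route[-1] == dst and stats[route][1] < var_threshold
    }
    if not feasible:
        return None  # fall back to the existing method, e.g. Dijkstra
    # Among feasible routes, pick the one with the smallest average delay.
    return min(feasible, key=lambda r: feasible[r][0])
```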
Network information transmission is performed based on the obtained optimal policy, and the delay of that policy is obtained. If the optimal policy has already been stored in the policy statistics space, the stored average delay and delay variance of that candidate policy are updated with the newly fed-back delay, using the following formulas:
D_n = D_{n-1} + (Y_n - D_{n-1}) / n
V_n = [(n - 1) · V_{n-1} + (Y_n - D_{n-1}) · (Y_n - D_n)] / n
where n is the number of statistics, Y_n is the n-th delay, D_n is the average delay at the n-th statistic, and V_n is the delay variance at the n-th statistic. Computing the mean and variance with the recursive formulas above helps reduce the amount of storage. If the optimal policy has not been stored in the policy statistics space but is stored in the policy buffer space, the average delay and delay variance of the buffered policy stored in the policy buffer space are updated with the newly fed-back delay, using the formulas above. At the same time, the first counter corresponding to the optimal policy is incremented by 1, and the second counters corresponding to all policies in the policy buffer space are incremented by 1.
如果该最优策略没有被策略统计空间存储,且没有被策略缓冲空间缓存,那么将该最优策略加入策略缓冲空间当中。如果策略缓冲空间处于饱和状态,则替代第二性能参数最差的缓存策略,并给新缓存的最优策略配置初值为1的第一计数器和第二计数器。If the optimal policy is not stored in the policy statistics space and not cached in the policy buffer space, then add the optimal policy into the policy buffer space. If the policy buffer space is in a saturated state, replace the cache policy with the worst second performance parameter, and configure the first counter and the second counter with an initial value of 1 for the newly cached optimal policy.
接着判断发生更新的缓存策略的第一计数器所记录的第一数值是否大于第一预设门限值。如果是,则将该缓存策略从策略缓冲空间删除并加入策略统计空间。如果策略统计空间满,则利用该缓存策略与候选策略进行比较,当缓存策略为性能参数最差的策略,则策略统计空间中的候选策略保持不变,否则缓存策略就会替换策略统计空间中第三性能参数最差的候选策略。此外,如果第二计数器所记录的第二数值达到第三预设门限值,则将缓存策略从策略缓冲空间中删除。上述策略统计空间内的策略替换方法有助于在策略统计空间内保留性能指标较好的若干个策略。Then it is judged whether the first value recorded by the first counter of the updated cache policy is greater than the first preset threshold value. If so, the cache policy is deleted from the policy buffer space and added to the policy statistics space. If the strategy statistics space is full, use the cache strategy to compare with the candidate strategy. When the cache strategy is the strategy with the worst performance parameters, the candidate strategy in the strategy statistics space remains unchanged, otherwise the cache strategy will replace the strategy statistics space. The worst candidate strategy for the third performance parameter. In addition, if the second value recorded by the second counter reaches the third preset threshold value, the cache policy is deleted from the policy buffer space. The strategy replacement method in the strategy statistics space mentioned above helps to retain several strategies with better performance indicators in the strategy statistics space.
本周期结束时，策略统计空间内容被传输至历史策略集合，直接覆盖历史策略集合中对应的候选策略集合，完成历史策略集合更新的过程。At the end of the current period, the content of the policy statistics space is transferred to the historical policy set, directly overwriting the corresponding candidate policy set therein, which completes the update of the historical policy set.
另外，如图12所示，本申请的一个实施例还提供了一种策略管理设备700，该策略管理设备700包括：存储器720、处理器710及存储在存储器720上并可在处理器710上运行的计算机程序。In addition, as shown in FIG. 12, an embodiment of the present application further provides a policy management device 700. The policy management device 700 includes a memory 720, a processor 710, and a computer program stored in the memory 720 and executable on the processor 710.
处理器710和存储器720可以通过总线或者其他方式连接。The processor 710 and the memory 720 may be connected via a bus or in other ways.
需要说明的是,本实施例中的策略管理设备700和上述实施例中的策略管理方法属于相同的发明构思,因此这些实施例具有相同的实现原理以及技术效果,此处不再详述。It should be noted that the policy management device 700 in this embodiment and the policy management method in the foregoing embodiments belong to the same inventive concept, so these embodiments have the same implementation principle and technical effect, and will not be described in detail here.
实现上述实施例的策略管理方法所需的非暂态软件程序以及指令存储在存储器720中，当被处理器710执行时，执行上述实施例中的策略管理方法，例如，执行以上描述的图1中的方法步骤S100至S500、图2中的方法步骤S310至S320、图3中的方法步骤S310至S330、图4中的方法步骤S210至S230、图5中的方法步骤S510、图6中的方法步骤S520至S530、图7中的方法步骤S540至S550、图8中的方法步骤S521、图9中的方法步骤S531、图10中的方法步骤S551、图11中的方法步骤S600。The non-transitory software programs and instructions required to implement the policy management method of the above embodiments are stored in the memory 720. When executed by the processor 710, they perform the policy management method of the above embodiments, for example, method steps S100 to S500 in FIG. 1, method steps S310 to S320 in FIG. 2, method steps S310 to S330 in FIG. 3, method steps S210 to S230 in FIG. 4, method step S510 in FIG. 5, method steps S520 to S530 in FIG. 6, method steps S540 to S550 in FIG. 7, method step S521 in FIG. 8, method step S531 in FIG. 9, method step S551 in FIG. 10, and method step S600 in FIG. 11.
此外，本申请的一个实施例还提供了一种计算机可读存储介质，该计算机可读存储介质存储有计算机可执行指令，该计算机可执行指令被一个处理器710执行，例如，被上述策略管理设备700实施例中的一个处理器710执行，可使得上述处理器710执行上述实施例中的策略管理方法，例如，执行以上描述的图1中的方法步骤S100至S500、图2中的方法步骤S310至S320、图3中的方法步骤S310至S330、图4中的方法步骤S210至S230、图5中的方法步骤S510、图6中的方法步骤S520至S530、图7中的方法步骤S540至S550、图8中的方法步骤S521、图9中的方法步骤S531、图10中的方法步骤S551、图11中的方法步骤S600。In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When executed by a processor 710, for example, by a processor 710 of the policy management device 700 embodiment described above, the computer-executable instructions cause the processor 710 to perform the policy management method of the above embodiments, for example, method steps S100 to S500 in FIG. 1, method steps S310 to S320 in FIG. 2, method steps S310 to S330 in FIG. 3, method steps S210 to S230 in FIG. 4, method step S510 in FIG. 5, method steps S520 to S530 in FIG. 6, method steps S540 to S550 in FIG. 7, method step S521 in FIG. 8, method step S531 in FIG. 9, method step S551 in FIG. 10, and method step S600 in FIG. 11.
本申请实施例包括：获取条件信息；从历史策略集合中选取与当前周期对应的候选策略集合；基于条件信息和候选策略集合得到最优策略；采集执行最优策略而得到的运行性能参数；根据运行性能参数更新候选策略集合。根据本申请实施例提供的方案，首先获取条件信息以及从历史策略集合中选取与当前周期对应的候选策略集合，接着根据条件信息和候选策略集合得到最优策略，然后执行最优策略并且采集执行最优策略而得到的运行性能参数，最后根据运行性能参数对候选策略集合进行更新，实现基于条件信息而进行最优策略选取，还可以根据运行性能参数而对候选策略集合进行更新处理，以便于后续最优策略选取。The embodiments of the present application include: obtaining condition information; selecting a candidate policy set corresponding to the current period from a historical policy set; obtaining an optimal policy based on the condition information and the candidate policy set; collecting operating performance parameters obtained by executing the optimal policy; and updating the candidate policy set according to the operating performance parameters. According to the solution provided by the embodiments of the present application, condition information is first obtained and a candidate policy set corresponding to the current period is selected from the historical policy set; an optimal policy is then obtained from the condition information and the candidate policy set; the optimal policy is executed and the resulting operating performance parameters are collected; and finally the candidate policy set is updated according to the operating performance parameters. This enables optimal-policy selection based on condition information, and also allows the candidate policy set to be updated according to the operating performance parameters, facilitating subsequent optimal-policy selection.
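The per-period flow summarized above can be sketched as a short, self-contained illustration. All names and data shapes below (the `history` mapping, string policies, delay as the performance parameter) are assumptions for exposition, not taken from the patent:

```python
def one_period(history, period, condition, execute):
    """One management period: select the candidate set for the period,
    pick or generate the optimal policy, execute it, collect performance,
    and update the candidate set.

    `history` maps period -> {condition: (policy, delay)};
    `execute(policy)` runs a policy and returns its measured delay
    (lower is better).
    """
    candidates = dict(history.get(period, {}))  # candidate set for this period
    entry = candidates.get(condition)           # target policy matching the condition
    policy = entry[0] if entry else f"generated-for-{condition}"  # generate if absent
    delay = execute(policy)                     # execute and collect performance
    if entry is None or delay < entry[1]:       # keep the better-performing policy
        candidates[condition] = (policy, delay)
    history[period] = candidates                # period-end write-back
    return policy, delay
```

On the first period with a given condition the policy is generated from the condition information; thereafter the matching candidate is reused and refined by the measured performance.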
本领域普通技术人员可以理解，上文中所公开方法中的全部或某些步骤、系统可以被实施为软件、固件、硬件及其适当的组合。某些物理组件或所有物理组件可以被实施为由处理器，如中央处理器、数字信号处理器或微处理器执行的软件，或者被实施为硬件，或者被实施为集成电路，如专用集成电路。这样的软件可以分布在计算机可读介质上，计算机可读介质可以包括计算机存储介质（或非暂时性介质）和通信介质（或暂时性介质）。如本领域普通技术人员公知的，术语计算机存储介质包括在用于存储信息（诸如计算机可读指令、数据结构、程序模块或其他数据）的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘（DVD）或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外，本领域普通技术人员公知的是，通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据，并且可包括任何信息递送介质。Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or an appropriate combination thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor; as hardware; or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
以上是对本申请的一些实施进行了具体说明，但本申请并不局限于上述实施方式，熟悉本领域的技术人员在不违背本申请精神的前提下还可作出种种的等同变形或替换，这些等同的变形或替换均包含在本申请权利要求所限定的范围内。The above describes some implementations of the present application in detail, but the present application is not limited to the above-described embodiments. Those skilled in the art may make various equivalent modifications or replacements without departing from the spirit of the present application, and all such equivalent modifications or replacements fall within the scope defined by the claims of the present application.

Claims (13)

  1. 一种策略管理方法,包括:A policy management method comprising:
    获取条件信息;Get conditional information;
    从历史策略集合中选取与当前周期对应的候选策略集合;Select a candidate policy set corresponding to the current period from the historical policy set;
    基于所述条件信息和所述候选策略集合得到最优策略;obtaining an optimal strategy based on the condition information and the set of candidate strategies;
    采集执行所述最优策略而得到的运行性能参数;collecting operating performance parameters obtained by executing the optimal strategy;
    根据所述运行性能参数更新所述候选策略集合。Updating the set of candidate strategies according to the running performance parameters.
  2. 根据权利要求1所述的策略管理方法,其中,所述基于所述条件信息和所述候选策略集合得到最优策略,包括:The policy management method according to claim 1, wherein said obtaining an optimal policy based on said condition information and said set of candidate policies comprises:
    在所述候选策略集合中查找与所述条件信息匹配的目标策略;Searching for a target policy that matches the condition information in the set of candidate policies;
    当所述候选策略集合中存在所述目标策略,将所述目标策略作为所述最优策略;When the target strategy exists in the set of candidate strategies, use the target strategy as the optimal strategy;
    当所述候选策略集合中不存在所述目标策略,根据所述条件信息生成所述最优策略。When the target policy does not exist in the set of candidate policies, generate the optimal policy according to the condition information.
  3. 根据权利要求2所述的策略管理方法,其中,所述从历史策略集合中选取与当前周期对应的候选策略集合,包括:The policy management method according to claim 2, wherein said selecting the candidate policy set corresponding to the current cycle from the historical policy set comprises:
    构建策略统计空间;Build strategy statistics space;
    从历史策略集合中确定与当前周期对应的所述候选策略集合;determining the candidate policy set corresponding to the current period from the historical policy set;
    将所述候选策略集合拷贝至所述策略统计空间,使得所述策略统计空间包括所述候选策略集合。Copying the set of candidate policies to the policy statistics space, so that the policy statistics space includes the set of candidate policies.
  4. 根据权利要求3所述的策略管理方法,其中,所述最优策略包括第一性能参数,所述根据所述运行性能参数更新所述候选策略集合,包括:The policy management method according to claim 3, wherein the optimal policy includes a first performance parameter, and updating the set of candidate policies according to the running performance parameter comprises:
    当所述策略统计空间存在所述最优策略,根据所述运行性能参数更新所述第一性能参数从而更新所述候选策略集合。When the optimal strategy exists in the strategy statistics space, the first performance parameter is updated according to the operation performance parameter, so as to update the set of candidate strategies.
  5. 根据权利要求4所述的策略管理方法,其中,所述根据所述运行性能参数更新所述候选策略集合,还包括:The policy management method according to claim 4, wherein said updating said set of candidate policies according to said operating performance parameters further comprises:
    当所述策略统计空间不存在所述最优策略,并且预设的策略缓冲空间不存在所述最优策略,将所述最优策略保存至所述策略缓冲空间;When the optimal policy does not exist in the policy statistics space and the optimal policy does not exist in the preset policy buffer space, save the optimal policy in the policy buffer space;
    当所述策略缓冲空间中的所述最优策略满足更新条件,将所述策略缓冲空间中的所述最优策略保存至所述策略统计空间,更新所述候选策略集合。When the optimal policy in the policy buffer space satisfies the update condition, save the optimal policy in the policy buffer space to the policy statistics space, and update the set of candidate policies.
  6. 根据权利要求4所述的策略管理方法,其中,所述根据所述运行性能参数更新所述候选策略集合,还包括:The policy management method according to claim 4, wherein said updating said set of candidate policies according to said operating performance parameters further comprises:
    当所述策略统计空间不存在所述最优策略,并且预设的策略缓冲空间存在所述最优策略,根据所述运行性能参数更新所述第一性能参数从而更新所述最优策略;When the optimal strategy does not exist in the strategy statistics space and the optimal strategy exists in the preset strategy buffer space, updating the first performance parameter according to the operating performance parameter so as to update the optimal strategy;
    当所述策略缓冲空间中的更新后的所述最优策略满足更新条件,将所述策略缓冲空间中的更新后的所述最优策略保存至所述策略统计空间,更新所述候选策略集合。When the updated optimal policy in the policy buffer space satisfies the update condition, save the updated optimal policy in the policy buffer space to the policy statistics space, and update the set of candidate policies .
  7. 根据权利要求5所述的策略管理方法，其中，所述策略缓冲空间包括多个缓冲策略，所述缓冲策略包括第二性能参数；所述将所述最优策略保存至所述策略缓冲空间，包括：The policy management method according to claim 5, wherein the policy buffer space includes a plurality of buffered policies, and each buffered policy includes a second performance parameter; and the saving the optimal policy to the policy buffer space comprises:
    当所述策略缓冲空间处于饱和状态,将所述最优策略替换所述策略缓冲空间中的第二性能参数最差的缓冲策略。When the strategy buffer space is in a saturated state, replace the optimal strategy with the buffer strategy with the worst second performance parameter in the strategy buffer space.
  8. 根据权利要求5所述的策略管理方法，其中，所述候选策略集合包括多个候选策略，所述候选策略包括第三性能参数；所述将所述策略缓冲空间中的所述最优策略保存至所述策略统计空间，包括：The policy management method according to claim 5, wherein the candidate policy set includes a plurality of candidate policies, and each candidate policy includes a third performance parameter; and the saving the optimal policy in the policy buffer space to the policy statistics space comprises:
    当所述策略统计空间处于饱和状态,将所述策略缓冲空间中的所述最优策略替换所述候选策略集合中第三性能参数最差的候选策略。When the policy statistics space is in a saturated state, replacing the optimal policy in the policy buffer space with the candidate policy with the worst third performance parameter in the candidate policy set.
  9. 根据权利要求6所述的策略管理方法，其中，所述候选策略集合包括多个候选策略，所述候选策略包括第三性能参数；所述将所述策略缓冲空间中的更新后的所述最优策略保存至所述策略统计空间，包括：The policy management method according to claim 6, wherein the candidate policy set includes a plurality of candidate policies, and each candidate policy includes a third performance parameter; and the saving the updated optimal policy in the policy buffer space to the policy statistics space comprises:
    当所述策略统计空间处于饱和状态,将所述策略缓冲空间中的更新后的所述最优策略替换所述候选策略集合中第三性能参数最差的候选策略。When the policy statistics space is in a saturated state, replacing the candidate policy with the worst third performance parameter in the candidate policy set with the updated optimal policy in the policy buffer space.
  10. 根据权利要求5或6所述的策略管理方法,其中,所述更新条件具体为:The policy management method according to claim 5 or 6, wherein the update condition is specifically:
    第一计数器所记录的第一数值达到第一预设门限值，并且所述第一数值与第二计数器所记录的第二数值之比大于第二预设门限值；其中，所述第一计数器和所述第二计数器均根据所述最优策略而配置，所述第一计数器被设置为记录所述最优策略被采用的次数，所述第二计数器被设置为记录所述策略缓冲空间中的策略被采用的次数。The first value recorded by a first counter reaches a first preset threshold, and the ratio of the first value to a second value recorded by a second counter is greater than a second preset threshold; wherein the first counter and the second counter are both configured according to the optimal policy, the first counter is configured to record the number of times the optimal policy is adopted, and the second counter is configured to record the number of times policies in the policy buffer space are adopted.
  11. 根据权利要求3所述的策略管理方法,其中,所述策略管理方法,还包括:The policy management method according to claim 3, wherein the policy management method further comprises:
    在当前周期的结束时间,根据所述策略统计空间中的所述候选策略集合更新所述历史策略集合。At the end of the current period, the historical policy set is updated according to the candidate policy set in the policy statistics space.
  12. 一种策略管理设备,包括:A policy management device comprising:
    至少一个处理器;at least one processor;
    至少一个存储器,被设置为存储至少一个程序;at least one memory configured to store at least one program;
    当至少一个所述程序被至少一个所述处理器执行时实现如权利要求1至11任意一项所述的策略管理方法。The policy management method according to any one of claims 1 to 11 is implemented when the at least one program is executed by the at least one processor.
  13. 一种计算机可读存储介质,存储有计算机可执行指令,其中,所述计算机可执行指令被设置为执行权利要求1至11任意一项所述的策略管理方法。A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are configured to execute the policy management method according to any one of claims 1 to 11.
PCT/CN2022/104720 2021-08-23 2022-07-08 Policy management method and device, and computer-readable storage medium WO2023024728A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110969832.4A CN115718865A (en) 2021-08-23 2021-08-23 Policy management method, device and computer-readable storage medium
CN202110969832.4 2021-08-23

Publications (1)

Publication Number Publication Date
WO2023024728A1 true WO2023024728A1 (en) 2023-03-02

Family

ID=85253374

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104720 WO2023024728A1 (en) 2021-08-23 2022-07-08 Policy management method and device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN115718865A (en)
WO (1) WO2023024728A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102202063A (en) * 2011-06-10 2011-09-28 中兴通讯股份有限公司 Quality of service (QOS)-time-sharing-strategy-based network data transmission method and device
CN108288855A (en) * 2018-01-23 2018-07-17 国电南京自动化股份有限公司 A kind of dynamic strategy simulation and optimization method based on micro-capacitance sensor
CN108492013A (en) * 2018-03-09 2018-09-04 同济大学 A kind of manufacture system scheduling model validation checking method based on quality control
CN110290122A (en) * 2019-06-13 2019-09-27 中国科学院信息工程研究所 Intrusion response strategy-generating method and device
US20210058131A1 (en) * 2019-08-21 2021-02-25 Samsung Electronics Co., Ltd. Method and apparatus of beam selection at terminal
CN112995280A (en) * 2021-02-03 2021-06-18 北京邮电大学 Data distribution method and device for multi-content demand service

Also Published As

Publication number Publication date
CN115718865A (en) 2023-02-28

Legal Events

Date Code Title Description
NENP Non-entry into the national phase
Ref country code: DE