CN116112422A

CN116112422A - Routing decision method and device

Info

Publication number: CN116112422A
Application number: CN202211689757.7A
Authority: CN
Inventors: 张珮明; 李波; 卢建刚; 曾瑛; 刘元杰; 张思拓; 李星南; 梁文娟; 朱文红; 许世纳; 姜文婷; 黄小红
Original assignee: Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Current assignee: Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority date: 2022-12-27
Filing date: 2022-12-27
Publication date: 2023-05-12

Abstract

The embodiments of the application provide a routing decision method and device. The method comprises: establishing an agent for route planning of the service data of a target service class; taking current network information as the input of a route evaluation network, which outputs corresponding route evaluation information; taking the current network information and the route evaluation information as the input of the agent's route planning network, which outputs corresponding routing policy information; generating a segment routing list according to the routing policy information; and forwarding the service data in the network according to the segment routing list. By using a separate agent for each service class, the method and device can provide efficiently optimized routing policies that meet the requirements of different service classes.

Description

Routing decision method and device

Technical Field

The embodiment of the application relates to the technical field of networks, in particular to a routing decision method and a routing decision device.

Background

The power services carried by the power data network are varied, generate large and dynamically changing data flows, run at high speed and keep growing, which places higher demands on network quality. Existing methods design a unified path decision optimization target for power services with different requirements and build a complex mathematical model, which is difficult to solve, responds poorly in real time and is poorly tailored to individual services. How to provide routing policies that meet service needs and are efficiently optimized for different services is therefore a problem to be solved.

Disclosure of Invention

In view of this, an objective of the embodiments of the present application is to provide a routing decision method and device that can provide efficiently optimized routing decisions for different services in a power data network.

Based on the above objects, an embodiment of the present application provides a routing decision method, including:

establishing an agent for carrying out route planning on service data of a target class service;

taking current network information as input of a route evaluation network, and outputting corresponding route evaluation information by the route evaluation network; the route evaluation network is used for evaluating route strategy information obtained by the intelligent agent through route planning;

taking the current network information and the route evaluation information as the input of a route planning network of the intelligent agent, and outputting corresponding route strategy information by the route planning network;

generating a segment routing list according to the routing strategy information;

and forwarding the service data in the network according to the segment routing list.

Optionally, before the agent is built, the method includes:

dividing the service quality grades of the businesses according to the requirements of different businesses on the service quality;

dividing the services with the same service quality level into the same service class;

the establishment of the agent comprises: and establishing corresponding agents for each type of service.

Optionally, the service quality parameters of the target service include bandwidth, time delay, jitter and packet loss rate; establishing an agent comprising:

determining the weight of each service quality parameter meeting the service quality requirement of the target class service according to the influence degree of each service quality parameter of the target class service on the service;

and determining a feedback rewarding function of the intelligent agent according to the weight of each service quality parameter.

Optionally, taking current network information as input of a route evaluation network, outputting corresponding route evaluation information by the route evaluation network, including:

calculating a feedback rewarding value by utilizing the feedback rewarding function according to the current network information;

and the route evaluation network determines route evaluation information corresponding to the route strategy information according to the current network information and the feedback rewarding value.

Optionally, the feedback reward function of the agent is:

$$r_i = w_{i,1} \times \mathrm{throughput}_i + w_{i,2} \times \mathrm{delay}_i + w_{i,3} \times \mathrm{jitter}_i + w_{i,4} \times \mathrm{loss}_i \quad (1)$$

where $\mathrm{throughput}_i$ is the network throughput of service i, $\mathrm{delay}_i$ is the delay of service i, $\mathrm{jitter}_i$ is the jitter of service i, $\mathrm{loss}_i$ is the packet loss rate of service i, $w_{i,1}$ is the bandwidth weight of service i, $w_{i,2}$ is the delay weight of service i, $w_{i,3}$ is the jitter weight of service i, and $w_{i,4}$ is the packet loss rate weight of service i.

Optionally, before the current network information and the route evaluation information are input into the route planning network of the agent, the method further includes:

acquiring all network information;

screening out target network information corresponding to the target class service from all network information;

taking the current network information and the route evaluation information as the input of the route planning network of the intelligent agent comprises the following steps:

taking the target network information and the route evaluation information as the input of a route planning network of the intelligent agent;

taking current network information as input of a route evaluation network comprises the following steps: and taking all network information as the input of the route evaluation network.

Optionally, generating a segment routing list according to the routing policy information includes:

according to the routing strategy information, a forwarding path is calculated according to a preset routing algorithm;

determining a segment route list according to the forwarding path;

forwarding the service data in the network according to the segment routing list, including:

issuing the segment routing list to a network node;

and the network node forwards the service data according to the segment routing list.

Optionally, determining a segment routing list according to the forwarding path includes:

comparing the forwarding path with a pre-calculated shortest path, and removing repeated paths and network nodes;

and generating the segment routing list according to the rest paths and the network nodes.

Optionally, the method further comprises:

acquiring network information obtained after forwarding the service data in a network according to the segment routing list;

and updating the network parameters of the route planning network and the network parameters of the route evaluation network according to the current network information, the route strategy information, the feedback rewarding value and the network information.

The embodiment of the application also provides a routing decision device, which comprises:

the establishing module is used for establishing an agent for carrying out route planning on the service data of the target service;

the evaluation module is used for taking the current network information as the input of the route evaluation network and outputting the corresponding route evaluation information by the route evaluation network; the route evaluation network is used for evaluating route strategy information obtained by the intelligent agent through route planning;

the planning module is used for taking the current network information and the route evaluation information as the input of the route planning network of the intelligent agent, and outputting corresponding route strategy information by the route planning network;

the generation module is used for generating a segment routing list according to the routing strategy information;

and the execution module is used for forwarding the service data in the network according to the segment routing list.

It can be seen from the foregoing that the routing decision method and device provided in the embodiments of the present application establish an agent for route planning of the service data of a target service class. Current network information is taken as the input of a route evaluation network, which outputs corresponding route evaluation information. The current network information and the route evaluation information are taken as the input of the agent's route planning network, which outputs corresponding routing policy information. A segment routing list is then generated according to the routing policy information, and the service data is forwarded in the network according to the segment routing list. By establishing a separate agent for each service class, the agents provide efficiently optimized routing policies that meet the requirements of their corresponding services.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a process flow according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a network framework according to an embodiment of the present application;

FIG. 4 is a schematic view of a device structure according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same.

It should be noted that unless otherwise defined, technical or scientific terms used in the embodiments of the present application should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like, as used in embodiments of the present application, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.

In the related art, SRv6 technology guides the forwarding of data packets by dividing the forwarding path into segments that form a segment routing list. It allows routing paths to be planned flexibly and offers a solution for path planning of different types of service data in the power data network. A deep reinforcement learning model can continuously adjust its decision model by learning network state information and thereby achieve dynamic path planning. Combining the dynamic path planning of the deep reinforcement learning model with the path orchestration capability of SRv6 enables path decision and deployment for different types of service data.

The technical solution of the present application, which implements routing decisions for the power data network by combining the deep reinforcement learning model and SRv6, is described in further detail below through specific embodiments.

As shown in fig. 1, the present application provides a routing decision method, applied to a power data network, the method includes:

S101: establishing an agent for carrying out route planning on service data of a target class service;

In this embodiment, the power data network carries the service data of various power services, whose quality-of-service requirements may be the same or different. To ensure that the planned routing policy meets the quality-of-service requirements of every power service, the services are assigned quality-of-service grades according to their requirements, services with the same grade are placed in the same service class, and a corresponding agent is established for each service class. Each agent plans the optimal routing policy for its own service class. Compared with building one unified, complex mathematical model, this effectively reduces the complexity of the solution. Because each agent plans routing paths only for its own service class, the planning is well targeted, and efficiently optimized routing policies can be provided for the different service classes.

Because power services are varied and their quality-of-service requirements may be the same or different, all power services can be classified according to those requirements. For example, the distributed distribution automation service needs little bandwidth but requires low delay and a low packet loss rate, whereas the substation video monitoring service needs high bandwidth but is less demanding on delay and jitter. The quality-of-service requirements of a power service can be determined from its quality-of-service parameters, which include bandwidth, delay, packet loss rate and jitter. The quality-of-service grade of each service is assigned according to these parameters, and the services are then classified by grade.

For example, as shown in Table 1, different power services have different quality-of-service parameters. The services are assigned quality-of-service grades according to these parameters, and services with the same grade belong to the same service class. The routing policy for the service data of one service class is planned by one agent, and the service data of that class is forwarded in the power data network according to the routing policy planned by that agent, so that the quality-of-service requirements are met and network performance is ensured.

Table 1 Quality-of-service parameters and grades of different power services

Power service	Bandwidth requirement (kb/s)	Delay requirement	Packet loss requirement	Grade
Distributed distribution automation	Low	High (millisecond level)	High	1
Low-voltage electricity consumption information acquisition	Medium	Low	Low	2
Millisecond-level precise load control	Low	High	High	1
Video monitoring of transformer substation	High (8M-16M)	Low	Low	3
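The grouping step can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the grades are copied from Table 1, while the dictionary keys, the function name `group_services_by_grade` and the agent labels are hypothetical names that do not appear in the application.

```python
from collections import defaultdict

# Grades copied from Table 1; the service keys are illustrative names.
SERVICE_GRADES = {
    "distributed_distribution_automation": 1,
    "low_voltage_consumption_info_acquisition": 2,
    "millisecond_precise_load_control": 1,
    "substation_video_monitoring": 3,
}


def group_services_by_grade(service_grades):
    """Return {grade: [service, ...]}; each grade forms one service class."""
    classes = defaultdict(list)
    for service, grade in service_grades.items():
        classes[grade].append(service)
    return dict(classes)


service_classes = group_services_by_grade(SERVICE_GRADES)

# One agent per service class; an agent is represented here only by a label.
agents = {grade: f"agent_{grade}" for grade in sorted(service_classes)}
```

With the grades of Table 1, the two grade-1 services end up in the same class and three agents are created in total.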

S102: taking the current network information as the input of a route evaluation network, and outputting corresponding route evaluation information by the route evaluation network; the route evaluation network is used for evaluating route strategy information obtained by the intelligent agent for route planning;

S103: taking the current network information and the route evaluation information as the input of a route planning network of an intelligent agent, and outputting corresponding route strategy information by the route planning network;

The deep reinforcement learning model includes an agent (Agent), an environment (Environment), states (State), actions (Action) and rewards (Reward). The agent selects an appropriate action by sensing the state of the environment. After the environment executes the action, it feeds a reward back to the agent, and the agent keeps learning from the last reward and the new state of the environment to determine its next action so as to obtain a larger reward. The goal of deep reinforcement learning is to maximize the long-term overall reward. To achieve this goal, reinforcement learning evaluates the value of each combination of state and action and then selects the most valuable action as the next decision. For a deep reinforcement learning model with multiple agents, the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm can be used for training. Each agent has its own Actor network and its own optimization target, and uses its own policy to maximize its own reward value. All agents share one Critic network, which integrates the policies of all agents.

As shown in FIG. 2, in the application scenario of the power data network, after the services are divided into classes according to their quality-of-service requirements, a corresponding agent is established for each service class, the feedback reward function of each agent is determined from the quality-of-service requirements of its service class, and the agent plans the optimal routing policy for that class. Each agent has its own route planning network and its own route optimization target. The route planning network outputs routing policy information according to the current network information of the power data network and the route evaluation information fed back by the route evaluation network, and the optimization target is to plan the routing policy of the corresponding service. After the routing policy information planned by all agents is issued to the power data network, the network information that results from forwarding the service data according to these routing policies is acquired. Based on this network information and on the feedback reward values calculated with the feedback reward functions, the route evaluation network evaluates the routing policy planned by each agent and feeds the resulting route evaluation information back to the corresponding agent. The agent then plans its next routing policy from the fed-back route evaluation information and the current network information. This avoids conflicts among the routing policies planned by the agents, promotes cooperation among the agents and provides optimal routing policies for the different services.

In some embodiments, establishing an agent includes determining the feedback reward function of the agent, and the method includes:

determining the weight of each quality-of-service parameter that meets the quality-of-service requirement of the target service class according to the degree of influence of each quality-of-service parameter of the target service class on the service;

and determining the feedback reward function of the agent according to the weight of each quality-of-service parameter.

In this embodiment, the quality-of-service parameters of the target service class include bandwidth, delay, jitter and packet loss rate, and different services place different requirements on them. Each quality-of-service parameter is analysed qualitatively, and its weight is determined with the analytic hierarchy process. Specifically, for the target service class, the quality-of-service parameters are compared in pairs to construct a judgment matrix $A = (a_{i,j})_{n \times n}$, which for any integer $k$ satisfies $a_{i,j} \times a_{j,k} = a_{i,k}$. The maximum eigenvalue $\lambda$ of the judgment matrix $A$ and the corresponding eigenvector $W$ are calculated according to $AW = \lambda W$, where $n$ is the order of the matrix. A consistency test is then performed on the judgment matrix $A$, which must satisfy $CR \le 0.1$. In the standard analytic hierarchy process,

$$CR = \frac{CI}{RI}, \qquad CI = \frac{\lambda - n}{n - 1}$$

where RI is the random consistency index. The relationship between n and RI is shown in Table 2, so the value of RI is obtained by table lookup. The judgment matrix A is adjusted repeatedly until the consistency requirement is met, and the eigenvector W of the judgment matrix A that meets the requirement is taken as the weight parameter vector that meets the quality-of-service requirement of the target service class.

Table 2 Relationship between the matrix order n and RI

n	1	2	3	4	5	6	7	8	9
RI	0	0	0.58	0.90	1.12	1.24	1.32	1.41	1.45
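The weight calculation can be sketched in plain Python as follows. This is a minimal sketch rather than the applicant's implementation: it assumes the principal eigenvector is approximated by power iteration and that the consistency ratio follows the standard analytic hierarchy process definition CR = CI / RI with CI = (λ − n) / (n − 1). The RI values are those of Table 2; the example judgment matrix and all function names are hypothetical.

```python
# Random consistency index RI by matrix order n (values from Table 2).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(A, iterations=100):
    """Approximate the principal eigenvector of A by power iteration.

    Returns (weights normalised to sum to 1, estimated maximum eigenvalue).
    """
    n = len(A)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(Aw)  # w sums to 1, so sum(A w) estimates the eigenvalue
        w = [x / lam for x in Aw]
    return w, lam


def consistency_ratio(lam, n):
    """CR = CI / RI with CI = (lam - n) / (n - 1); 0 when RI is 0."""
    if RI[n] == 0:
        return 0.0
    ci = (lam - n) / (n - 1)
    return ci / RI[n]


# Hypothetical judgment matrix for (bandwidth, delay, jitter, packet loss);
# it is perfectly consistent, with importance ratios 1 : 4 : 2 : 4.
A = [
    [1.0, 1 / 4, 1 / 2, 1 / 4],
    [4.0, 1.0, 2.0, 1.0],
    [2.0, 1 / 2, 1.0, 1 / 2],
    [4.0, 1.0, 2.0, 1.0],
]
weights, lam = ahp_weights(A)
cr = consistency_ratio(lam, len(A))
```

For this matrix the weights come out proportional to 1 : 4 : 2 : 4 and the consistency ratio is essentially zero, so the matrix passes the CR ≤ 0.1 test. A matrix that fails the test would be adjusted and the calculation repeated.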

Optionally, the criteria used in the judgment matrix A to compare the quality-of-service parameters may follow the 1-9 scale method of Thomas L. Saaty, as shown in Table 3:

Table 3 Scale method for judgment matrix elements

In some embodiments, the eigenvector W serves as the weight parameter vector $W_i = (w_{i,1}, w_{i,2}, w_{i,3}, w_{i,4})$. The feedback reward function of the agent of the target service class is constructed from the quality-of-service parameters and their weights, and is expressed as formula (1):

$$r_i = w_{i,1} \times \mathrm{throughput}_i + w_{i,2} \times \mathrm{delay}_i + w_{i,3} \times \mathrm{jitter}_i + w_{i,4} \times \mathrm{loss}_i$$

where $\mathrm{throughput}_i$ is the network throughput of service i, $\mathrm{delay}_i$ is the delay of service i, $\mathrm{jitter}_i$ is the jitter of service i, $\mathrm{loss}_i$ is the packet loss rate of service i, $w_{i,1}$ is the bandwidth weight of service i, $w_{i,2}$ is the delay weight of service i, $w_{i,3}$ is the jitter weight of service i, and $w_{i,4}$ is the packet loss rate weight of service i.
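As a small illustration, the weighted sum of formula (1) can be written in Python. The application does not state how the four measurements are scaled or signed, so this sketch simply assumes that the caller passes values already normalised to comparable ranges, with cost-type metrics (delay, jitter, loss) passed as negative numbers so that a larger reward is better. The function name and the sample numbers are hypothetical.

```python
def feedback_reward(weights, throughput, delay, jitter, loss):
    """Weighted sum of the four quality-of-service measurements, formula (1).

    weights: (w1, w2, w3, w4) for bandwidth, delay, jitter and packet loss.
    The scaling and sign of the measurements are assumptions of this sketch.
    """
    w1, w2, w3, w4 = weights
    return w1 * throughput + w2 * delay + w3 * jitter + w4 * loss


# Hypothetical weights and normalised measurements for one service class.
reward = feedback_reward((0.1, 0.4, 0.2, 0.3),
                         throughput=0.8, delay=-0.2, jitter=-0.1, loss=-0.05)
```

A delay-sensitive class would carry a large delay weight, so the same change in delay moves its reward more than it would for a bandwidth-oriented class.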

In some embodiments, before the current network information and the route evaluation information are input into the route planning network of the agent, the method further comprises:

acquiring all network information;

screening out target network information corresponding to the target class service from all the network information;

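taking the current network information and the route evaluation information as the input of the route planning network of the agent comprises: taking the target network information and the route evaluation information as the input of the route planning network of the agent;

taking the current network information as the input of the route evaluation network comprises: taking all the network information as the input of the route evaluation network.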

In this embodiment, the route planning network of the agent of the target service class plans a routing policy according to the network information of the target service class. After all network information is acquired from the power data network, it is classified by service type, and the network information of each service type is input to the route planning network of the corresponding service. According to the network information of its service and the route evaluation information output by the route evaluation network, the route planning network evaluates and adjusts the previously determined routing policy information to determine better routing policy information. The network information includes network parameters such as bandwidth, delay, packet loss rate and jitter.

That is, the input of each agent's route planning network is the network information of its own service type, obtained by classifying all network information acquired from the power data network by service type. For the classified network information, the corresponding feedback reward values are calculated with the feedback reward functions of the respective service types. All acquired network information and the calculated feedback reward values of all service types are then input to the route evaluation network shared by the agents. From these inputs, the route evaluation network determines the route evaluation information for the data forwarding performed under the previous routing policy information, which gives each agent a basis for determining its next routing policy information.
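The difference between the two inputs can be sketched as follows. This is only an illustration under assumptions: network information is modelled as a list of measurement records, each tagged with a hypothetical `service_class` field, and the record fields, function names and sample values are not taken from the application.

```python
def split_by_service_class(records):
    """Group measurement records by their 'service_class' field."""
    per_class = {}
    for record in records:
        per_class.setdefault(record["service_class"], []).append(record)
    return per_class


def build_network_inputs(records, rewards):
    """Return (per-agent planner inputs, shared evaluator input).

    Each route planning network sees only the records of its own class;
    the shared route evaluation network sees every record plus the
    feedback reward values of all classes.
    """
    planner_inputs = split_by_service_class(records)
    evaluator_input = {"network_info": list(records), "rewards": dict(rewards)}
    return planner_inputs, evaluator_input


# Hypothetical measurements for two service classes.
records = [
    {"service_class": 1, "link": "1-3", "delay_ms": 2.0},
    {"service_class": 3, "link": "1-3", "delay_ms": 9.0},
    {"service_class": 1, "link": "3-4", "delay_ms": 1.5},
]
rewards = {1: 0.7, 3: 0.4}
planner_inputs, evaluator_input = build_network_inputs(records, rewards)
```

Keeping the evaluator input global is what lets a shared evaluation network judge each agent's policy in the context of what the other agents did.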

S104: generating a segment route list according to the route strategy information;

S105: forwarding the service data in the network according to the segment routing list.

In this embodiment, after determining the routing policy information of the target class service by using the agent, a segment routing list is generated according to the routing policy information, the segment routing list is distributed to the network node, and the network node forwards the service data of the target class service according to the segment routing list.

In some embodiments, generating the segment routing list according to the routing policy information includes:

according to the routing strategy information, calculating a forwarding path according to a preset routing algorithm;

determining a segment route list according to the forwarding path;

issuing the segment routing list to a network node;

the network node forwards the traffic data according to the segment routing list.

In this embodiment, after determining the corresponding routing policy information, the agent of each service type calculates a forwarding path of the service data according to the routing policy information, generates a segment routing list according to the forwarding path, and issues the segment routing list to the corresponding network node, and utilizes each network node to forward the service data according to the segment routing list, thereby completing deployment and issuing of the routing policy.

In some embodiments, the agent determines a weight value for each link in the network topology, and the routing path is calculated using the Open Shortest Path First (OSPF) algorithm. For example, if the calculated routing path is 1-3-4-5 and the SIDs (Segment IDs) of nodes 1, 3, 4 and 5 are 12001, 12003, 12004 and 12005 respectively, the generated segment routing list is [12001, 12003, 12004, 12005].
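The example can be reproduced with a short Python sketch. OSPF computes its routes with Dijkstra's shortest-path-first algorithm, so the sketch runs plain Dijkstra over link weights standing in for the agent's output. The topology, the link weights and the function names are hypothetical; only the path 1-3-4-5 and the SID values come from the example above.

```python
import heapq


def shortest_path(links, source, target):
    """Dijkstra over an undirected weighted graph given as {(u, v): weight}."""
    graph = {}
    for (u, v), w in links.items():
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return []  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def segment_list(path, sid_of):
    """Map each node on the path to its SID, in forwarding order."""
    return [sid_of[node] for node in path]


# Hypothetical link weights, as an agent might output them.
links = {(1, 2): 5.0, (1, 3): 1.0, (2, 5): 5.0,
         (3, 4): 1.0, (4, 5): 1.0, (2, 4): 2.0}
sid_of = {node: 12000 + node for node in range(1, 6)}

path = shortest_path(links, 1, 5)
sr_list = segment_list(path, sid_of)
```

With these weights the cheapest route from node 1 to node 5 is 1-3-4-5, which yields the segment routing list [12001, 12003, 12004, 12005].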

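In some embodiments, determining the segment routing list from the forwarding path includes:

comparing the forwarding path with a pre-calculated shortest path, and removing the repeated paths and network nodes;

and generating the segment routing list according to the remaining paths and network nodes.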

In this embodiment, the segment routing list is inserted into the packet header, and the data forwarding path is determined by reading the list from the header. If the segment routing list is too long, the header becomes too large and data transmission efficiency suffers. Therefore, the shortest path is generated in advance with a shortest path algorithm. After the forwarding path is determined according to the routing policy information, it is compared with the shortest path, the paths and network nodes that the two have in common are removed, and the segment routing list is generated from the remaining paths and network nodes. This removes redundant segments, shortens the segment routing list and improves data transmission efficiency.

For example, let the forwarding path set determined according to the routing policy information be $P_c$, and let the shortest path set calculated with the shortest path algorithm be $P_0$. Two routing paths with the same source node and destination node are selected from the two sets. If the default next-hop forwarding node g of node h in set $P_0$ is the same as the next-hop forwarding node of node h on path p' in set $P_c$, it is determined that node h needs no forwarding intervention, and the SID of node h is deleted from the segment routing list. That is, node h forwards along the default shortest path, and no additional SID is needed to steer its forwarding behavior.
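The comparison can be sketched in Python. The sketch applies the rule literally as worded above (drop the SID of a node whose planned next hop equals its default next hop, and always keep the destination SID); it does not model how an SRv6 data plane would interpret the remaining SIDs. The function name, the default next-hop table and the example paths are hypothetical.

```python
def prune_segment_list(forwarding_path, default_next_hop, sid_of):
    """Drop SIDs of nodes whose planned next hop equals the default one.

    forwarding_path: planned node sequence p' from source to destination.
    default_next_hop: {node: next hop on the pre-computed shortest path
                       towards the same destination}.
    The SID of the destination node is always kept.
    """
    if not forwarding_path:
        return []
    kept = []
    for h, planned_next in zip(forwarding_path, forwarding_path[1:]):
        if default_next_hop.get(h) != planned_next:
            kept.append(sid_of[h])  # node h deviates from the default path
    kept.append(sid_of[forwarding_path[-1]])
    return kept


sid_of = {node: 12000 + node for node in range(1, 6)}

# Case 1: the planned path equals the default shortest path 1-3-4-5.
same = prune_segment_list([1, 3, 4, 5], {1: 3, 3: 4, 4: 5}, sid_of)

# Case 2: the default path from node 1 would go via node 2 instead.
differs = prune_segment_list([1, 3, 4, 5], {1: 2, 2: 5, 3: 4, 4: 5}, sid_of)
```

When the planned path coincides with the default shortest path, only the destination SID remains, which is the shortest possible list; each deviation from the default path keeps one more SID.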

In some embodiments, the routing decision method further comprises:

acquiring network information obtained after the service data is forwarded in the network according to the segment routing list;

and updating the network parameters of the route planning network and the network parameters of the route evaluation network according to the current network information, the routing policy information, the feedback reward value and the acquired network information.

This embodiment provides a method for training the networks. After the service data is forwarded in the power data network according to the previously determined segment routing list, a new network state is reached and the network information in the new state is acquired. The route planning network of each agent and the route evaluation network shared by all agents are updated according to the previous network information, the routing policy information determined from the previous network information, the calculated feedback reward value and the newly acquired network information. After multiple rounds of optimization and updating, the networks used to plan routing policies for the power data network are obtained.

As shown in FIG. 2 and FIG. 3, in some embodiments, the power services in the power data network are divided into n service classes and n agents are established accordingly. Each agent has its own route planning network (implemented as an Actor network), an experience pool and a feedback reward function. The experience pool stores, for a certain period, the experience data $(s_i, s'_i, a_i, r_i)$ of the route planning performed by the route planning network, where $s_i$ is the network information input to the route planning network of the i-th agent at the previous planning step, $s'_i$ is the network information input at the current planning step, $a_i$ is the routing policy information output at the previous planning step, and $r_i$ is the feedback reward value corresponding to the previously planned routing policy information. A route evaluation network (implemented as a Critic network) shared by all agents is also established.

During initialization, the network parameters of each agent's route planning network and the parameters of the route evaluation network are initialized randomly. Initial network information is acquired from the power data network and divided by service type into the network information $(s_1, s_2, \ldots, s_n)$ of each service type, and a shortest path set $P_0$ is calculated using a shortest-path routing algorithm.

During the first training round, agent $c_i$ outputs routing policy information according to the input network information $s_i$:

$$a_i = \mu_{\theta_i}(s_i) + \mathcal{N}$$

where $\theta_i$ is the network parameter of the route planning network and $\mathcal{N}$ is the noise added to avoid local optima.

According to the routing policy information $(a_1, a_2, \ldots, a_n)$ output by the route planning networks of the agents, a shortest path set $P_c$ is calculated using the shortest-path routing algorithm.

The path sets $P_0$ and $P_c$ are compared. Two routing paths with the same source node and destination node are selected and compared, the paths and network nodes that the two have in common are deleted, and the remaining network nodes form a new routing path. A segment routing list $SR = (sr_1, \ldots, sr_n)$ is generated from the new routing path and issued to the network nodes.

After the service data is forwarded in the power data network according to the segment routing list, a new network state is reached, and the network information $(s'_1, s'_2, \ldots, s'_n)$ in the new state is acquired. From the parameters of each service in this network information, the feedback reward value of the service is calculated with the feedback reward function of formula (1), giving the feedback reward values $(r_1, r_2, \ldots, r_n)$ of all services. Agent $c_i$ stores the previously acquired network information $s_i$, the previously planned routing policy information $a_i$, the feedback reward value $r_i$ obtained for that routing policy information, and the newly acquired network information $s'_i$ in its experience pool $D_i$.
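An experience pool of this kind is commonly implemented as a bounded replay buffer. The following Python sketch shows one possible shape; the class name, the capacity limit and the field names are assumptions of the sketch, since the application only specifies which four items are stored.

```python
import random
from collections import deque, namedtuple

# One stored transition: previous input, policy output, reward, new input.
Experience = namedtuple("Experience", ["s", "a", "r", "s_next"])


class ExperiencePool:
    """Bounded pool of past planning steps for one agent."""

    def __init__(self, capacity=10000):
        # The oldest entries are discarded once the capacity is reached.
        self._buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self._buffer.append(Experience(s, a, r, s_next))

    def sample(self, k):
        """Return up to k experiences chosen uniformly at random."""
        items = list(self._buffer)
        return random.sample(items, min(k, len(items)))

    def __len__(self):
        return len(self._buffer)


pool = ExperiencePool(capacity=3)
for step in range(5):
    pool.store(s=step, a=step * 10, r=0.1 * step, s_next=step + 1)
batch = pool.sample(2)
```

Sampling at random from the pool, rather than replaying steps in order, reduces the correlation between consecutive training samples.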

$K$ samples are randomly selected from the experience pool $D_i$, and the target action-value function $y^j$ is calculated from the selected samples.

Here, $\gamma$ is the attenuation factor and $r_i^j$ is the feedback reward value in the $j$-th sample. The target route evaluation network is a delayed copy of the route evaluation network, and $\mu'$, the target route planning network, is a delayed copy of the route planning network; $n$ is the number of agents. $x'^j = (s'_1, s'_2, \ldots, s'_n)$ is the input network information set, and $a'_z$ is the target policy value of agent $z$. Each target policy value comes from the corresponding target route planning network: for example, $a'_n$ is the policy value obtained by the target route planning network $\mu'_n$ after the corresponding network information is input.
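As a hedged reconstruction, assuming the definitions above follow the target value used in multi-agent deep deterministic policy gradient (MADDPG) training, $y^j$ could take the following form. The symbol $Q_i^{\mu'}$ for the target route evaluation network of agent $i$ and the per-sample notation $s'^j_z$ are assumed here and are not taken from the text.

```latex
y^j = r_i^j + \gamma \, Q_i^{\mu'}\!\left(x'^j, a'_1, \ldots, a'_n\right)
      \Big|_{a'_z = \mu'_z\left(s'^j_z\right)}, \qquad z = 1, \ldots, n
```

Read this way, the target is the observed reward plus the discounted value that the target evaluation network assigns to the next state when every agent acts according to its target planning network.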

$L(\theta_i)$ is the loss function of the evaluation network, and $K$ is the number of samples.

The route evaluation network outputs the action-value function $Q$, and $\theta_i$ denotes the network parameters of the evaluation network.
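A plausible form of this loss, assuming the mean squared error between the target value $y^j$ and the output of the route evaluation network over the $K$ sampled transitions, is sketched below. The notation $Q_i^{\mu}$, $x^j$, and $a_z^j$ for the evaluation network of agent $i$, the sampled network information set, and the sampled policy value of agent $z$ is assumed.

```latex
L(\theta_i) = \frac{1}{K} \sum_{j=1}^{K}
  \left( y^j - Q_i^{\mu}\!\left(x^j, a_1^j, \ldots, a_n^j\right) \right)^2
```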

The network parameters of the agent's route planning network are updated using a gradient descent method. In the update, $s_i^j$ denotes the network information input of agent $i$ in the $j$-th sample.
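Assuming the update follows the sampled deterministic policy gradient used in MADDPG, the gradient with respect to the parameters of the route planning network could be written as below. $J$ denotes the expected feedback reward of agent $i$, and $Q_i^{\mu}$ and $x^j$ are assumed notation for the route evaluation network and the sampled network information set.

```latex
\nabla_{\theta_i} J \approx \frac{1}{K} \sum_{j=1}^{K}
  \nabla_{\theta_i} \mu_i\!\left(s_i^j\right) \,
  \nabla_{a_i} Q_i^{\mu}\!\left(x^j, a_1^j, \ldots, a_i, \ldots, a_n^j\right)
  \Big|_{a_i = \mu_i\left(s_i^j\right)}
```

This expression is the direction that increases $J$. Stated as gradient descent, the same update minimizes $-J$.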

The route evaluation network and the route planning network of each agent are updated with the new network parameters, and the next round of training proceeds in the same way. Training ends when the feedback reward values obtained by all agents stabilize. After training, the policies output by all agents obtain higher feedback reward values, that is, the output routing policies meet the target requirements of the power services, and the trained model is used for route planning in the power data network.
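The text does not define when the feedback values count as stable. One simple stopping test, assuming a fixed window of recent rounds and a tolerance on the spread of rewards within that window, is sketched below. The window length, the tolerance, and the function name are hypothetical.

```python
def rewards_stable(reward_history, window=20, tolerance=0.01):
    """Return True when every agent's recent rewards vary by at most tolerance.

    reward_history: dict mapping an agent id to its list of per-round rewards.
    """
    if not reward_history:
        return False
    for rewards in reward_history.values():
        if len(rewards) < window:
            return False  # not enough rounds observed yet
        recent = rewards[-window:]
        if max(recent) - min(recent) > tolerance:
            return False
    return True


# Hypothetical histories: c1 has settled, c2 is still improving.
history = {
    "c1": [88.0] * 25,
    "c2": [float(i) for i in range(25)],
}
done = rewards_stable(history)
```

A training loop would call such a check after each round and stop once it returns True for all agents.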

It should be noted that the method of the embodiments of the present application may be performed by a single device, for example a computer or a server. The method may also be applied in a distributed scenario, in which a plurality of devices cooperate to complete it. In such a distributed scenario, one of the devices may perform only one or more steps of the method, and the devices interact with each other to complete the method.

It should be noted that the foregoing describes specific embodiments of the present invention. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

As shown in fig. 4, the embodiment of the present application further provides a routing decision device, including:

For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of each module may be implemented in one or more pieces of software and/or hardware when implementing the embodiments of the present application.

The device of the foregoing embodiment is configured to implement the corresponding method in the foregoing embodiment and has the beneficial effects of the corresponding method embodiment, which are not repeated here.

Fig. 5 shows a more specific hardware architecture of an electronic device according to this embodiment, where the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 implement communication connections therebetween within the device via a bus 1050.

The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.

The memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs. When the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in the memory 1020 and executed by the processor 1010.

The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.

The communication interface 1040 is used to connect a communication module (not shown) to enable communication between the present device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).

Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).

It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.

The electronic device of the foregoing embodiment is configured to implement the corresponding method in the foregoing embodiment and has the beneficial effects of the corresponding method embodiment, which are not repeated here.

The computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

Those of ordinary skill in the art will appreciate that the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the present disclosure, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.

Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the embodiments of the present application. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform on which the embodiments of the present application are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.

While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.

The present embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Accordingly, any omissions, modifications, equivalents, improvements, and the like, which are within the spirit and principles of the embodiments of the present application, are intended to be included within the scope of the present disclosure.

Claims

1. A method of routing decisions, comprising:

2. The method of claim 1, comprising, prior to establishing the agent:

3. The method of claim 1, wherein the quality of service parameters of the targeted class of service include bandwidth, delay, jitter, and packet loss rate; establishing an agent comprising:

4. A method according to claim 3, wherein the current network information is taken as an input of a route evaluation network, and the corresponding route evaluation information is output by the route evaluation network, comprising:

5. A method according to claim 3, wherein the feedback reward function of the agent is:

$$r_i = w_{i,1} \times \mathrm{throughput}_i + w_{i,2} \times \mathrm{delay}_i + w_{i,3} \times \mathrm{jitter}_i + w_{i,4} \times \mathrm{loss}_i \qquad (1)$$

wherein $\mathrm{throughput}_i$ is the network throughput of service $i$, $\mathrm{delay}_i$ is the delay of service $i$, $\mathrm{jitter}_i$ is the jitter of service $i$, $\mathrm{loss}_i$ is the packet loss rate of service $i$, $w_{i,1}$ is the bandwidth weight value of service $i$, $w_{i,2}$ is the delay weight value of service $i$, $w_{i,3}$ is the jitter weight value of service $i$, and $w_{i,4}$ is the packet loss rate weight value of service $i$.

6. The method of claim 1, further comprising, prior to the inputting of the route planning network for the agent with the current network information and route evaluation information:

acquiring all network information;

7. The method of claim 1, wherein generating a segment routing list based on the routing policy information comprises:

determining a segment route list according to the forwarding path;

issuing the segment routing list to a network node;

8. The method of claim 7, wherein determining a segment routing list from the forwarding path comprises:

9. The method as recited in claim 4, further comprising:

10. A routing decision apparatus, comprising: