CN111340192A - Network path allocation model training method, path allocation method and device - Google Patents

Network path allocation model training method, path allocation method and device

Info

Publication number
CN111340192A
Authority
CN
China
Prior art keywords
network
tunnel
training
sample
model
Prior art date
Legal status
Granted
Application number
CN202010130022.5A
Other languages
Chinese (zh)
Other versions
CN111340192B (en)
Inventor
陈力
刘礼彬
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010130022.5A
Publication of CN111340192A
Application granted
Publication of CN111340192B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The embodiments of the present application provide a network path allocation model training method, a path allocation method and a device, relating to the fields of traffic engineering and artificial intelligence. The method includes: acquiring a training sample, where the training sample includes sample tunnel information corresponding to at least one sample tunnel in a network; then iteratively training a preset network model based on the training sample until a preset condition is met, and taking the network model that meets the preset condition as the network path allocation model. One round of training of the network model includes: inputting the training sample into the network model and predicting the network path information corresponding to each sample tunnel; determining network state information of the network based on the obtained network path information; and adjusting model parameters of the network model based on the network state information. The embodiments of the present application reduce the time required for model training, improve training efficiency, and reduce the number of samples required for training.

Description

Network path allocation model training method, path allocation method and device
Technical Field
The application relates to the technical field of traffic engineering and artificial intelligence, in particular to a network path distribution model training method, a path distribution method and a device.
Background
For cloud service providers with global services, the backbone wide area network (WAN) connects data centers around the world and provides communication services for large-scale applications; it is one of the most important parts of a cloud service infrastructure. Traffic in WANs is already large in scale and is still growing rapidly. Traffic Engineering (TE), as an important means of improving network application performance and reducing cost in the backbone network, has therefore received wide attention from both academia and industry. Under network performance and cost constraints, TE needs to distribute traffic with different bandwidth requirements and priorities onto different network paths to achieve different goals. Specifically, traffic can be distributed onto different network paths by a trained traffic engineering Deep Neural Network (TE DNN) model, so how to train the TE DNN model becomes a key problem.
In the prior art, when a TE DNN model is trained, the network environment of the WAN and the behavior of real network devices and protocols (for example, the forwarding delay of switches and routers, and the operation and convergence processes of routing and transport protocols) must be continuously simulated from the training samples until a converged TE network state is finally reached. This simulation is slow, and a large number of training samples are required.
Disclosure of Invention
The present application provides a network path allocation model training method, a path allocation method and a device, which can address at least one technical problem in the prior art. The technical solution is as follows:
in a first aspect, a method for training a network path assignment model is provided, where the method includes:
acquiring a training sample, wherein the training sample comprises sample tunnel information corresponding to at least one sample tunnel in a network;
performing iterative training on a preset network model based on a training sample until a preset condition is met, and taking the network model meeting the preset condition as a network path distribution model;
the mode of training the network model for one time comprises the following steps:
inputting the training samples into a network model, and predicting to obtain network path information corresponding to each sample tunnel;
determining network state information of the network based on the obtained network path information;
model parameters of the network model are adjusted based on the network state information.
In one possible implementation, the sample tunnel information includes a source address of the sample tunnel, a target address of the sample tunnel, and sample attribute information of the sample tunnel.
In another possible implementation manner, the sample attribute information of the sample tunnel includes: at least one of a bandwidth requirement of the sample tunnel, a tunnel grade of the sample tunnel, a tunnel latency requirement of the sample tunnel, or a tunnel link cost of the sample tunnel.
In another possible implementation manner, the network path information corresponding to any one sample tunnel includes link identifiers of the directional links through which the network path sequentially passes,
determining network state information of the network based on the obtained network path information, including:
determining prediction attribute information corresponding to each sample tunnel based on link attribute information of each link contained in the network and link information passed by each network path;
and determining network state information based on the corresponding prediction attribute information of each sample tunnel.
In another possible implementation manner, the prediction attribute information corresponding to any sample tunnel includes at least one of the following:
link capacity information; link delay information; link cost information.
In another possible implementation manner, adjusting model parameters of the network model based on the network state information includes:
determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information;
determining a training gradient of the network model based on an objective function value corresponding to the current training and an objective function value corresponding to the last training;
based on the determined gradient, model parameters of the network model are adjusted.
In another possible implementation manner, determining, based on the network state information, an objective function value corresponding to the current training of the objective function corresponding to the network model includes:
and determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information and the sample attribute information of the sample tunnel.
In another possible implementation, the objective function includes at least one of:
a delay function; a bandwidth function; a cost function.
In a second aspect, a network path allocation method is provided, including:
acquiring tunnel information of a network tunnel;
and performing network path allocation processing on the tunnel information of the network tunnel through the trained network model shown in the first aspect or any possible implementation manner of the first aspect to obtain network path information corresponding to the network tunnel.
In a third aspect, a network path allocation model training apparatus is provided, including:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a training sample, and the training sample comprises sample tunnel information corresponding to at least one sample tunnel in a network;
the training module is used for carrying out iterative training on a preset network model based on a training sample until a preset condition is met, and taking the network model meeting the preset condition as a network path distribution model;
wherein, the training module is specifically used for:
inputting the training samples into a network model, and predicting to obtain network path information corresponding to each sample tunnel;
determining network state information of the network based on the obtained network path information;
model parameters of the network model are adjusted based on the network state information.
In one possible implementation, the sample tunnel information includes a source address of the sample tunnel, a target address of the sample tunnel, and sample attribute information of the sample tunnel.
In another possible implementation, the sample attribute information includes: at least one of a bandwidth requirement of the sample tunnel, a tunnel grade of the sample tunnel, a tunnel latency requirement of the sample tunnel, or a tunnel link cost of the sample tunnel.
In another possible implementation manner, the network path information corresponding to any one sample tunnel includes link identifiers of the directional links through which the network path sequentially passes,
when determining the network state information of the network based on the obtained network path information, the training module is specifically configured to:
determining prediction attribute information corresponding to each sample tunnel based on link attribute information of each link contained in the network and link information passed by each network path;
and determining network state information based on the corresponding prediction attribute information of each sample tunnel.
In another possible implementation manner, the prediction attribute information corresponding to any sample tunnel includes at least one of the following:
link capacity information; link delay information; link cost information.
In another possible implementation manner, when the training module adjusts the model parameters of the network model based on the network state information, the training module is specifically configured to:
determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information;
determining a training gradient of the network model based on an objective function value corresponding to the current training and an objective function value corresponding to the last training;
based on the determined gradient, model parameters of the network model are adjusted.
In another possible implementation manner, when determining, based on the network state information, an objective function value corresponding to the current training of the objective function corresponding to the network model, the training module is specifically configured to:
and determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information and the sample attribute information of the sample tunnel.
In another possible implementation, the objective function includes at least one of:
a delay function; a bandwidth function; a cost function.
In a fourth aspect, a network path allocating apparatus is provided, including:
the second acquisition module is used for acquiring tunnel information of the network tunnel;
and the allocation module is configured to perform network path allocation processing on the tunnel information of the network tunnel through a network model trained in the first aspect or any one of possible implementation manners of the first aspect, so as to obtain network path information corresponding to the network tunnel.
The technical solutions provided by the present application bring the following beneficial effects:
the application provides a network path distribution model training method, path distribution method and device, compares with prior art, this application obtains the training sample, and the training sample includes the sample tunnel information that at least one sample tunnel corresponds in the network, then based on the training sample, carries out iterative training to predetermined network model, until satisfying predetermined condition, will satisfy the network model of predetermined condition as network path distribution model, and wherein, the mode of carrying out a training to network model includes: inputting the training samples into a network model, and predicting to obtain network path information corresponding to each sample tunnel; determining network state information of the network based on the obtained network path information; model parameters of the network model are adjusted based on the network state information. In other words, in the present application, for each training, the network state corresponding to the current network can be calculated, and it is not necessary to continuously simulate the network environment, the real network device and the protocol work, etc. through the training samples, so that the time of model training can be reduced, the efficiency of model training can be improved, and the samples required by training can be reduced.
Compared with the prior art, the method and the electronic equipment for distributing the network paths acquire the tunnel information of the network tunnel, and then carry out network path distribution processing on the tunnel information of the network tunnel through a trained network model to acquire the network path information corresponding to the network tunnel. The method comprises the steps of predicting network path information corresponding to each sample tunnel through a training sample, determining the network state of the network according to the network path information corresponding to each sample tunnel, adjusting parameters of a network model according to the network state of the network, and avoiding the need of continuously simulating the network environment, the real network equipment, the protocol work and the like through the training sample, so that the time for model training can be shortened, the efficiency of model training can be improved, and samples required by training can be reduced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a network path assignment model training method according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a trained network path assignment model application according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a long short-term memory (LSTM) network model used for training according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram comparing network path allocation by a trained Long Short-Term Memory (LSTM) network model according to an embodiment of the present application with existing TE algorithms in terms of throughput, maximum congestion, and congestion packet loss;
fig. 5 is a schematic diagram of TE DNN model training performed by a differentiable TE-based network simulator according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a network path allocation model training apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a network path allocating apparatus according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device for training a network path assignment model according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device for network path allocation according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms referred to in this application will first be introduced and explained:
Traffic Engineering (TE) is the final step before a network is put into production and is a tool for planning traffic in the network. The process of planning paths for traffic in a network is traffic engineering;
artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
In the prior art, a network model can be trained with a general-purpose network simulator, for example the third-generation discrete event network simulator (Network Simulator version 3, NS3) or the modular, component-based discrete event network simulator (Objective Modular Network Testbed in C++, OMNeT++). However, existing general-purpose network simulators have the following technical problems when training a network model:
(1) Existing general-purpose network simulators treat the network (i.e., the WAN) as a black box and therefore work only with algorithms that do not rely on a specific network model, such as algorithms based on a Deep Neural Network (DNN) model. For DNN models, such simulators suffer from slow training, poor scalability, and an inability to support DNN algorithm models other than deep reinforcement learning (Deep RL, DRL);
(2) Existing general-purpose network simulators have to reproduce a real network environment through computation, while a DNN model requires a large number of training samples and repeated training. On a discrete-event general-purpose simulator, the training speed of a DNN model is therefore very low and the model is difficult to converge. The larger the network, the longer the model takes to train, and the scalability limits of the simulator further limit the scalability of the algorithm model;
(3) Existing general-purpose network simulators are not differentiable and do not support training models with gradient-descent-based optimization methods. DNN-model-based TE algorithms are therefore limited to the DRL algorithm, because DRL can estimate the dynamics of the TE network system, such as state transitions, action functions and reward functions, through agent learning. DNN models currently applied to TE are thus also constrained by the limitations of RL algorithms, such as models that are extremely difficult to converge and low training efficiency caused by large deviations in the training samples. In addition, in many scenarios other DNN-model-based algorithms perform better than RL, such as trajectory optimization in robot control and Monte Carlo tree search in games. Treating the TE network as a "black box" prevents other DNN models from being used to solve the TE problem.
The embodiments of the present application solve the above problems encountered by existing general-purpose network simulators in software-defined wide area network (SDWAN) TE applications.
(1) In an SDWAN, the network is not a "black box"; its network environment is completely unambiguous: the current network state together with the tunnel-to-path assignment result deterministically determines the next network state, and TE metrics such as delay, path length and link utilization can be calculated directly. Therefore, the future network state can be computed exactly from the current network state and the output of the algorithm, and the TE network simulator in the embodiments of the present application can feed back the network state information immediately, which greatly improves model training speed and model scalability.
(2) In addition, differentiable programming turns software programming into a process of assembling a network of parameterized functional modules, so that every part of the software is differentiable. This allows the program to be trained, i.e., its parameters optimized with gradient-descent-based optimization methods. Following this programming paradigm, the embodiments of the present application design a fully differentiable network simulator, dNE, which can be embedded as a separate "layer" in a deep learning model. This allows any DNN model, such as Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs) and Differentiable Neural Computers (DNCs), to be trained on dNE.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a model training method, as shown in fig. 1, executed by an electronic device, including:
and step S101, obtaining a training sample.
The training sample comprises sample tunnel information corresponding to at least one sample tunnel in the network.
In the present embodiment, a tunnel carries aggregated traffic of a particular priority level between a pair of ingress and egress WAN routers.
And S102, carrying out iterative training on a preset network model based on the training sample until a preset condition is met, and taking the network model meeting the preset condition as a network path distribution model.
Further, in the embodiment of the present application, the network model is iteratively trained based on the training samples until a preset condition is met, yielding a trained network model (that is, the network path allocation model). The trained network model may be deployed in a central controller and used to allocate a network path to each tunnel, i.e., to determine the links traversed by each tunnel. For example, the network model in the embodiment of the present application may be a DNN model.
Further, the preset conditions are detailed in the following embodiments, and are not described herein again.
Specifically, the method for training the network model at one time includes: inputting the training samples into a network model, and predicting to obtain network path information corresponding to each sample tunnel; determining network state information of the network based on the obtained network path information; model parameters of the network model are adjusted based on the network state information.
Specifically, in this embodiment of the present application, adjusting the model parameters of the network model based on the network state information may specifically include: and calculating the value and gradient value of the objective function based on the network state information, and further adjusting the model parameters of the DNN model.
Specifically, in the embodiment of the present application, the network state information is the state information of the WAN. Further, after the network state information is obtained, adjusting the network parameters of the network model based on it may include adjusting the network parameters corresponding to each layer of the network model; of course, only the network parameters of some of the layers may be adjusted. The embodiments of the present application are not limited in this respect.
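Purely as an illustration of the training procedure described above, the following is a minimal, self-contained PyTorch-style sketch of one round of training (predict network path information, evaluate the network state, adjust model parameters). The tensor sizes, the simple feed-forward model, and the objective combining maximum and average tunnel delay are assumptions made for the sketch, not the application's actual implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, E, F = 4, 6, 8   # number of tunnels, directed links, features per tunnel (illustrative)

# Assumed stand-in for the preset network model: maps sample tunnel information
# to a soft |T| x |E| tunnel-to-link assignment (network path information) A.
model = nn.Sequential(nn.Linear(F, 32), nn.ReLU(), nn.Linear(32, E), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

samples = torch.rand(T, F)        # sample tunnel information (made-up values)
link_delay = torch.rand(E)        # measured delay M of each link (made-up values)

def train_one_round():
    A = model(samples)                               # 1. predict network path information
    tunnel_delay = A @ link_delay                    # 2. network state: per-tunnel path delay
    loss = tunnel_delay.max() + tunnel_delay.mean()  # 3. objective from the network state
    optimizer.zero_grad()
    loss.backward()                                  # gradients via automatic differentiation
    optimizer.step()
    return loss.item()

for _ in range(100):                                 # iterate until a preset condition is met
    train_one_round()
```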
Specifically, the sample tunnel information in step S101 includes a source address of the sample tunnel, a target address of the sample tunnel, and sample attribute information of the sample tunnel.
Specifically, the sample attribute information of the sample tunnel includes: at least one of a bandwidth requirement of the sample tunnel, a tunnel grade of the sample tunnel, a tunnel latency requirement of the sample tunnel, or a tunnel link cost of the sample tunnel.
Further, in a specific example of the embodiment of the present application, if the training sample includes the sample tunnel information of the sample network tunnels in a tunnel set T, the training sample may be organized as the following vectors and matrices (a small tensor sketch is given after the list below). Specifically:
1. The source address O is a vector of dimension |T| × 1, where, for any tunnel t in the tunnel set T, O[t] denotes the entry of tunnel t in the source address vector O;
2. The target address D is a vector of dimension |T| × 1, where, for any tunnel t in the tunnel set T, D[t] denotes the entry of tunnel t in the target address vector D;
3. The bandwidth requirement B is a vector of dimension |T| × 1, where B[t] denotes the bandwidth requirement of tunnel t;
4. The tunnel level C is a matrix of dimension |T| × |P|, where P denotes the set of tunnel levels supported in the network; C[t][p] is 1 if the level of tunnel t is p, and 0 otherwise. Each tunnel belongs to exactly one traffic class (Σ_p C[t][p] = 1). The network model assumes that all routers use strict priority queues;
5. The tunnel delay requirement L is a vector of dimension |T| × 1, where L[t] (≥ 0) denotes the maximum total delay acceptable to tunnel t;
6. The tunnel link cost (length) requirement Z is a vector of dimension |T| × 1, where Z[t] (≥ 0) denotes the maximum total link cost acceptable to tunnel t.
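As a small illustration only, the quantities listed above could be laid out as tensors as follows; all concrete values are made up.

```python
import torch

T_count, P_count = 3, 2   # number of tunnels and supported tunnel levels (illustrative)

O = torch.tensor([[0], [2], [5]])             # source address, |T| x 1
D = torch.tensor([[7], [4], [1]])             # target address, |T| x 1
B = torch.tensor([[10.0], [25.0], [5.0]])     # bandwidth requirement, |T| x 1
C = torch.tensor([[1, 0], [0, 1], [1, 0]])    # one-hot tunnel level, |T| x |P|
L = torch.tensor([[30.0], [50.0], [20.0]])    # maximum acceptable delay, |T| x 1
Z = torch.tensor([[100.0], [80.0], [60.0]])   # maximum acceptable link cost, |T| x 1

assert C.shape == (T_count, P_count)
assert (C.sum(dim=1) == 1).all()              # each tunnel belongs to exactly one traffic class
```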
Further, after the training sample is obtained, the network model is iteratively trained based on it. In one round of training, a TE decision (denoted A, i.e., the network path information corresponding to each sample tunnel) is obtained from the network model based on the training sample. Specifically, the TE decision A may be a |T| × |E| matrix that represents the path allocated in the network topology to each of the |T| tunnels; for example, the value of A[t][e] is 0 or 1, indicating whether tunnel t passes through link e. After the TE decision is obtained, the network state information of the current network is determined according to the TE decision; an objective function value corresponding to the current round of training is then determined from the network state information and the chosen objective function; the training gradient of the network model is obtained from the objective function value of the current round and that of the previous round; and the network parameters of the network model are then adjusted according to the obtained training gradient.
Further, determining the network state information of the current network from the TE decision, determining the objective function value of the current round of training from the network state information and the objective function, obtaining the gradients corresponding to the network parameters of the network model from the objective function values of the current and previous rounds, and adjusting the network parameters accordingly may be performed in a differentiable TE network simulator (also referred to as dNE). Specifically, as shown in fig. 5, these steps may be performed in dNE, which comprises two stages: a network state evaluation stage and a network summary stage. The network state evaluation stage determines the network state information of the current network based on the TE decision; the network summary stage determines the objective function value of the current round of training from the network state information and obtains the gradients corresponding to the network parameters of the network model from the objective function values of the current and previous rounds. The network parameters of the TE DNN model are then adjusted according to the gradients obtained in the network summary stage, where the TE decision is produced by the network model from the training sample.
Specifically, when the network model is trained, the differentiable TE network simulator may be embedded in the network model as a "layer"; that is, the differentiable TE network simulator can output the gradients corresponding to the network parameters of the network model, and the network parameters are adjusted based on the automatic differentiation capability of the PyTorch programming framework and the obtained gradients. In the embodiment of the present application, the differentiable TE network simulator is implemented using the automatic differentiation programming paradigm of PyTorch, and the parameters of a network model implemented with PyTorch can be updated directly from the gradients of the network parameters that the simulator outputs.
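The following sketch shows one way such a differentiable simulator could be wrapped as a PyTorch module so that it sits behind any DNN producing a TE decision A; the class name, the particular state quantities, and the way they are combined into a scalar objective are assumptions, not the application's actual dNE implementation.

```python
import torch
import torch.nn as nn

class DifferentiableTESimulator(nn.Module):
    """Assumed dNE-style layer: maps a TE decision A (|T| x |E|) to network state
    and a scalar objective using only differentiable tensor operations."""

    def __init__(self, link_capacity, link_delay, link_cost):
        super().__init__()
        # Link attributes are fixed inputs of the simulator, not trainable parameters.
        self.register_buffer("N", link_capacity)   # |E| link capacities
        self.register_buffer("M", link_delay)      # |E| measured link delays
        self.register_buffer("K", link_cost)       # |E| link (IGP) costs

    def forward(self, A, bandwidth):
        # Network state evaluation stage: per-link load and per-tunnel delay/cost.
        link_load = A.t() @ bandwidth       # |E| total bandwidth demand per link
        tunnel_delay = A @ self.M           # |T| path delay per tunnel
        tunnel_cost = A @ self.K            # |T| path cost per tunnel
        # Network summary stage: a scalar objective combining the state (assumed form).
        over_capacity = torch.relu(link_load - self.N).sum()
        objective = tunnel_delay.max() + over_capacity
        return objective, {"load": link_load, "delay": tunnel_delay, "cost": tunnel_cost}
```

Because the forward pass consists only of tensor operations, calling backward() on the objective lets gradients flow through this layer into whatever DNN produced A.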
In another possible implementation manner of this embodiment, the network path information corresponding to any one sample tunnel includes link identifiers of directional links through which the network path sequentially passes, where determining the network state information of the network based on the obtained network path information includes: determining prediction attribute information corresponding to each sample tunnel based on link attribute information of each link contained in the network and link information passed by each network path; and determining network state information based on the corresponding prediction attribute information of each sample tunnel.
Specifically, the prediction attribute information corresponding to any sample tunnel includes at least one of the following:
link capacity information; link delay information; link cost information.
Specifically, the link capacity N is a vector of dimension |E| × 1, where N[e] is the capacity of link e and may be constant or change dynamically; the measured delay M is a vector of dimension |E| × 1, where M[e] is the measured delay on link e and may be constant or change dynamically; and the link cost K is a vector of dimension |E| × 1, where K[e] is the Interior Gateway Protocol (IGP) metric of link e in the network.
Specifically, in the embodiment of the present application, the preset attribute information may be specified in advance, for example to include link capacity information; it may also be determined from the training sample (for example, if the training sample includes a tunnel link cost (length) requirement Z, the preset attributes include the link cost K); or it may be determined from the objective function (for example, if the objective function is the delay function max_lat(A, M), the preset attribute information includes the measured delay M).
Further, in the embodiment of the present application, the links traversed by the network path of each tunnel are obtained from the network paths (TE decisions) output by the network model for the at least one tunnel; the preset attribute value of each of these links is determined; the preset attribute value of the network path corresponding to each tunnel is determined from the preset attribute values of the links its path traverses; and the network state information is then determined from the preset attribute values of the network paths.
For example, suppose the training sample includes tunnel 1 and tunnel 2 and the preset attribute is link delay. If the network path corresponding to tunnel 1 traverses link 1, link 2, link 3, link 4 and link 5, the link delay values of these links are determined, and the link delay value corresponding to tunnel 1 can then be determined. The link delay value corresponding to tunnel 2 is determined in the same way as for tunnel 1, and the details are not repeated.
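Continuing the delay example above with made-up numbers, per-tunnel prediction attribute information can be obtained from the link attributes and the links each network path traverses as a single matrix-vector product:

```python
import torch

# TE decision A: rows are tunnels, columns are directed links (1 = path uses the link).
A = torch.tensor([[1., 1., 1., 1., 1., 0.],     # tunnel 1 traverses links 1-5
                  [0., 0., 1., 1., 0., 1.]])    # tunnel 2 traverses links 3, 4, 6 (assumed)

M = torch.tensor([2., 3., 1., 4., 2., 5.])        # per-link delay (illustrative units)
K = torch.tensor([10., 10., 20., 10., 10., 30.])  # per-link cost (illustrative)

tunnel_delay = A @ M   # delay of each tunnel's path -> tensor([12., 10.])
tunnel_cost  = A @ K   # cost of each tunnel's path  -> tensor([60., 60.])
```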
Further, adjusting model parameters of the network model based on the network state information includes: determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information; determining a training gradient of the network model based on an objective function value corresponding to the current training and an objective function value corresponding to the last training; based on the determined gradient, model parameters of the network model are adjusted.
Specifically, determining an objective function value corresponding to the current training of an objective function corresponding to the network model based on the network state information includes: and determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information and the sample attribute information of the sample tunnel.
Specifically, the objective function includes at least one of:
a delay function; a bandwidth function; a cost function.
Specifically, in the embodiment of the present application, the delay functions may include: invalid_lat(A, M, L), a delay constraint verification function that checks whether the sum of the delays of the links traversed by a tunnel's path exceeds that tunnel's delay requirement and returns 0 if all delay constraints are satisfied; max_lat(A, M), which obtains the maximum delay over all tunnels; and avg_lat(A, M), which obtains the average delay over all tunnels. The bandwidth functions may include: invalid_bw(A, B, N), a bandwidth constraint verification function that checks whether the sum of the bandwidth demands on each link exceeds the link capacity; max_bw(A, B, N), which obtains the maximum bandwidth over all tunnels; and avg_bw(A, B, N), which obtains the average bandwidth over all tunnels. The cost functions may include: invalid_cost(A, Z, K), a cost constraint verification function that checks whether the sum of the costs of all links on a tunnel's path exceeds that tunnel's cost budget; max_cost(A, Z), which obtains the maximum cost over all tunnels; and avg_cost(A, Z), which obtains the average cost over all tunnels.
In the embodiment of the present application, the objective function may be at least one of the above functions, or may be any combination of the above functions, or a new function (a function defined by a user) may be added to calculate the objective function value, and the new function may be used as the objective function, or at least one of the new function and the above objective function may be combined to be used as the objective function.
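As an illustration only, the constraint-verification and summary functions named above might be written over the tensors A, M, L, B, N, Z and K roughly as follows; the exact definitions used in the application (for example, whether a verification function returns a violation count or a flag) are assumptions here.

```python
import torch

# Assumed shapes: A is |T| x |E|; B, L, Z are 1-D tensors of length |T|;
# M, K, N are 1-D tensors of length |E|.

def invalid_lat(A, M, L):
    # Number of tunnels whose path delay exceeds the tunnel's delay requirement
    # (returns 0 when all delay constraints are satisfied).
    return (A @ M > L).sum()

def max_lat(A, M):
    return (A @ M).max()        # maximum delay over all tunnels

def avg_lat(A, M):
    return (A @ M).mean()       # average delay over all tunnels

def invalid_bw(A, B, N):
    # Number of links on which the summed bandwidth demand exceeds the link capacity.
    return (A.t() @ B > N).sum()

def invalid_cost(A, Z, K):
    # Number of tunnels whose total path cost exceeds the tunnel's cost budget.
    return (A @ K > Z).sum()

# max_bw/avg_bw and max_cost/avg_cost follow the same pattern and are omitted here.
```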
Specifically, for the first iteration, there are two cases. In the first case, the objective function value obtained in the first iteration is taken as the gradient used to adjust the parameters of the network model (for example, the gradient used to adjust the parameters of the DNN model). In the second case, a preset value is subtracted from the objective function value obtained in the first iteration to obtain the gradient used to adjust the network parameters of the network model. The embodiments of the present application are not limited in this respect.
Specifically, for iterations other than the first, the gradient used to adjust the parameters of the network model is determined by subtracting the objective function value obtained in the previous iteration from the objective function value obtained in the current iteration.
Further, after obtaining the gradient corresponding to the parameter of the adjusted network model, the model parameter of the network model is adjusted by a gradient descent method according to the gradient, so as to realize training and optimization of the network model.
Further, the network model is trained through the iterative processing mode until the preset condition is met.
Specifically, the preset condition may include: the number of iterations being greater than a preset threshold, or the gradient used to adjust the network parameters of the network model falling within a preset range.
For example, the preset threshold corresponding to the preset iteration number may be 100, and when the iteration number is equal to 100, the training of the network model is stopped to obtain the trained network model; for another example, the preset range corresponding to the gradient is [ -0.01,0.01], and when the gradient is detected to be in the preset range, the training of the network model is stopped to obtain the trained network model.
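A small sketch of how these two example stopping conditions could be checked inside the training loop; the threshold of 100 iterations and the gradient range [-0.01, 0.01] are simply the example values given above.

```python
def should_stop(iteration, gradient_value, max_iterations=100,
                grad_range=(-0.01, 0.01)):
    """Return True when the preset condition is met: the iteration count reaches
    the preset threshold, or the training gradient falls within the preset range."""
    if iteration >= max_iterations:
        return True
    return grad_range[0] <= gradient_value <= grad_range[1]
```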
Further, after the trained network model is obtained, network path information corresponding to each tunnel may be obtained based on the trained network model, which is described in detail in the following embodiments.
An embodiment of the present application provides a network path allocation method, as shown in fig. 2, which is executed by an electronic device, for example, the electronic device may be a central controller, and the method may include:
step S201, acquiring tunnel information of the network tunnel.
Specifically, in this embodiment of the present application, the tunnel information of the network tunnel may be tunnel information of one network tunnel, or may also be tunnel information of at least two network tunnels.
Specifically, the tunnel information takes the same form as the sample tunnel information, i.e., the tunnel information may include: a source address of the tunnel, a target address of the tunnel, and attribute information of the tunnel. For a detailed description of the tunnel information, refer to the above embodiments, which is not repeated here.
Step S202, the tunnel information of the network tunnel is subjected to network path distribution processing through the trained network model, and network path information corresponding to the network tunnel is obtained.
For the embodiment of the present application, the trained network model is a network model obtained by training with the network path allocation model training method introduced in the above embodiments, that is, a network model that meets the preset condition.
For the embodiment of the application, the tunnel information of the network tunnel is subjected to network path distribution through the trained network model to obtain the network path information corresponding to the network tunnel, that is, to obtain each link through which the network tunnel passes in sequence.
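As an illustrative sketch only: feeding the tunnel information through a trained model and decoding the resulting |T| x |E| output into the links each network tunnel traverses. Thresholding the soft output at 0.5 is an assumption made for the sketch.

```python
import torch

def allocate_paths(trained_model, tunnel_info, threshold=0.5):
    """tunnel_info: |T| x F tensor of tunnel information (source, target and attributes)."""
    with torch.no_grad():                       # inference only, no gradient tracking
        A = trained_model(tunnel_info)          # |T| x |E| network path information
    # For each tunnel, collect the identifiers of the links its network path uses.
    return [torch.nonzero(row > threshold).flatten().tolist() for row in A]
```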
Compared with the prior art, the embodiment of the application provides a network path allocation method, tunnel information of a network tunnel is obtained, then network path allocation processing is carried out on the tunnel information of the network tunnel through a trained network model, and network path information corresponding to the network tunnel is obtained. In the embodiment of the application, the network path information corresponding to each sample tunnel is obtained through the prediction of the training sample, and then the network state of the network is determined according to the network path information corresponding to each sample tunnel, so that the parameters of the network model are adjusted according to the network state of the network, and the continuous simulation of the network environment, the real work of network equipment, a protocol and the like through the training sample is not needed, so that the time for model training can be reduced, the efficiency for model training is improved, and the samples required by training can be reduced.
In the above embodiments, a method of training a network model to obtain a trained network path allocation model and a method of allocating a network path for a network tunnel based on the trained network path allocation model are described, and a method of training a network model (LSTM) to obtain a trained network path allocation model and a method of allocating a network path for a network tunnel based on the trained LSTM model are described below by taking an LSTM as an example, specifically in the following embodiments:
Taking traffic engineering based on a long short-term memory network (LSTM-based TE, LSTM-TE) as an example: an LSTM is a special kind of RNN composed of a plurality of cells. As shown in fig. 3, the LSTM includes LSTM cell @ t-1, LSTM cell @ t and LSTM cell @ t+1, and for each element of the input sequence each LSTM cell (taking LSTM cell @ t as an example) performs the following operations:
Step 1: the LSTM cell decides which information should be discarded from the information passed on by the previous cell. This decision is made by a sigmoid layer (also called the forget gate layer) and is computed as f_t = σ(W_if · x_t + b_if + W_hf · h_(t-1) + b_hf), where the inputs are h_(t-1) and x_t.
Step 2: the LSTM cell decides what information should be stored in the current cell state. This involves two parts: a sigmoid layer (also called the input gate) and a tanh layer. The sigmoid layer decides which values need to be updated, computed as i_t = σ(W_ii · x_t + b_ii + W_hi · h_(t-1) + b_hi); the tanh layer creates a new candidate vector g_t that can be added to the cell state, computed as g_t = tanh(W_ig · x_t + b_ig + W_hg · h_(t-1) + b_hg).
Step 3: the old cell state c_(t-1) is updated according to c_t = f_t * c_(t-1) + i_t * g_t.
Step 4: the current cell determines the output, which again involves a sigmoid layer and a tanh layer. The sigmoid layer decides which parts of the cell state are output, computed as o_t = σ(W_io · x_t + b_io + W_ho · h_(t-1) + b_ho); the tanh layer limits the output values to the range [-1, 1], giving h_t = o_t * tanh(c_t).
Here h_t denotes the hidden state at time t, c_t the cell state at time t, x_t the input at time t, and h_(t-1) either the hidden state of the corresponding LSTM layer at time t-1 or the initial state at time 0; i_t, f_t, g_t and o_t denote the input, forget, cell and output gates, respectively; W denotes the weights of the different functions in the LSTM cell and b the corresponding biases (the rectangular nodes in fig. 3 denote these functions); σ denotes the sigmoid activation function, and * denotes the Hadamard product.
Specifically, at the start of a training iteration, for an input containing |S| tunnels, a vector containing the tunnel information is fed into the LSTM cells at each time step. After all |S| tunnels have been processed over |S| time steps, the final hidden state of the current training epoch is obtained as the output A. A is a |T| × |E| matrix representing the path allocated in the network topology to each of the |T| tunnels. A is then fed into dNE to compute the network state information of the current network, the gradients for adjusting the parameters are computed from that network state information and the objective function, and the parameters of the LSTM are adjusted according to those gradients. This completes one epoch of LSTM model training on dNE.
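A compact sketch of the LSTM-TE idea described above, built on torch.nn.LSTM: the tunnel-information vectors are consumed one per time step and the final hidden state is projected to the |T| x |E| decision A. The linear projection and sigmoid used here are an assumed simplification of whatever output head the application actually uses.

```python
import torch
import torch.nn as nn

class LSTMTE(nn.Module):
    def __init__(self, feat_dim, num_tunnels, num_links, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=feat_dim, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_tunnels * num_links)
        self.shape = (num_tunnels, num_links)

    def forward(self, tunnel_seq):
        # tunnel_seq: 1 x |S| x feat_dim, one tunnel-information vector per time step.
        _, (h_n, _) = self.lstm(tunnel_seq)
        A = torch.sigmoid(self.head(h_n[-1])).reshape(self.shape)   # |T| x |E|
        return A

# Example with illustrative sizes: 5 tunnels, 8 features each, 10 directed links.
model = LSTMTE(feat_dim=8, num_tunnels=5, num_links=10)
A = model(torch.rand(1, 5, 8))
```

A can then be passed to the dNE layer to obtain the network state and the objective used to adjust the LSTM parameters, as in the training sketch given earlier.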
Further, after the LSTM-TE model is obtained through training, the tunnel information of at least one network tunnel is passed through the trained LSTM-TE to obtain the network path corresponding to each network tunnel.
Further, the LSTM model trained in the embodiment of the present application (referred to as LSTM-TE) is used to allocate network paths to tunnels and is compared with existing TE algorithms. Fig. 4 shows measurements over 35 hours of traffic data using throughput normalized by total traffic demand, maximum congestion (maximum link utilization) and congestion packet loss (packet loss due to congestion, normalized by total traffic demand). The compared algorithms include Constrained Shortest Path First (CSPF), Equal-Cost Multi-Path (ECMP), k Shortest Paths plus Multi-Commodity Flow (KSP + MCF), SMORE, and DRL-based traffic engineering (DRL-TE). LSTM-TE was found to perform better than ECMP and KSP + MCF on all metrics, and to perform at the same level in throughput and congestion packet loss as the best-performing existing non-DNN algorithms, CSPF and SMORE. Compared with ECMP, LSTM-TE improves throughput by 13.1% and reduces congestion packet loss by 93.9% on average; compared with KSP + MCF, the corresponding gains are 0.8% and 51.8%, respectively. LSTM-TE performs worse than CSPF and SMORE in terms of maximum congestion. Despite its single-path routing limitation, LSTM-TE still exceeds DRL-TE in throughput and congestion packet loss (average gains of 7.0% and 89.8%, respectively). That is, in the embodiment of the present application an arbitrary DNN model can be trained as a network path allocation model to solve the TE problem and obtain a substantial performance benefit.
Specifically, CSPF is a greedy algorithm that searches for the shortest path in the network that meets the tunnel requirements and uses it as the allocated path for the tunnel; for a batch of tunnel requirements, CSPF is executed for each tunnel one by one. ECMP computes tunnel paths by adjusting link weights, with the path of each tunnel computed by Dijkstra's algorithm based on the link weights. KSP + MCF first computes the k shortest paths and then abstracts an MCF problem, solving the bandwidth allocation or traffic splitting problem with linear programming. SMORE is similar to KSP + MCF, except that SMORE does not take the k shortest paths but uses paths generated by an oblivious routing algorithm. DRL-TE is DRL applied to solving the TE problem.
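For context, a rough sketch of the CSPF baseline as described: prune the links that cannot satisfy the tunnel's bandwidth requirement, then take the shortest remaining path by IGP cost. networkx and the attribute names used here are purely illustrative.

```python
import networkx as nx

def cspf(graph, src, dst, bandwidth, weight="igp_cost"):
    """graph: nx.DiGraph whose edges carry 'capacity' and 'igp_cost' attributes."""
    feasible = nx.DiGraph()
    for u, v, d in graph.edges(data=True):
        if d["capacity"] >= bandwidth:          # keep only links that can carry the tunnel
            feasible.add_edge(u, v, **d)
    try:
        return nx.shortest_path(feasible, src, dst, weight=weight)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None                             # no feasible path for this tunnel

g = nx.DiGraph()
g.add_edge("a", "b", capacity=100, igp_cost=10)
g.add_edge("b", "c", capacity=50, igp_cost=10)
print(cspf(g, "a", "c", bandwidth=60))          # None: link b->c cannot carry 60
```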
Further, the embodiments of the present application define the training speed of dNE (the way the network model is trained in the embodiments of the present application) as the time taken from taking a TE action to observing the effect produced by that action. Taking a TE action means applying the decision of the algorithm model (here, the path allocation for the tunnels) to the actual TE network, i.e., allocating the tunnels to specific network paths according to the TE decision while satisfying the tunnel requirements. The effect produced by the action refers to the TE network state information (such as network link utilization and link delay) obtained after the TE network has converged. The interval from taking a TE action to observing its effect is the process in which applying the TE algorithm model's decision to the actual TE network changes the network state and the network reconverges to a new final TE state; speed is defined as the duration of this process. In this embodiment, dNE is compared with the existing general-purpose network simulator OMNeT++, and a network topology containing 100 nodes and 500 1 Gbps links is built on both dNE and OMNeT++. The embodiment simultaneously creates 100 tunnels each with a bandwidth requirement of 10 Kbps and places them uniformly into the network; the placement of one tunnel is then changed and the resulting change in link bandwidth usage is observed. This process is repeated 1000 times, and the time from placing a tunnel to observing the corresponding change in link utilization is measured each time. The findings are as follows: for OMNeT++, the average delay to observe the change was 228.3 milliseconds and the 99th-percentile delay was 1.594 seconds, whereas for dNE the change produced by the corresponding action was observed almost immediately (99th-percentile delay of 0.977 ms). Because dNE only requires a single matrix operation, it responds extremely quickly compared with existing general-purpose network simulators and can improve training speed by more than 228 times.
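To illustrate the "single matrix operation" point (this is not a reproduction of the reported experiment, and the measured time depends entirely on the machine), one could time a dNE-style state evaluation at the topology size mentioned above:

```python
import time
import torch

T_count, E_count = 100, 500                          # 100 tunnels, 500 links, as above
A = (torch.rand(T_count, E_count) < 0.02).float()    # an illustrative TE decision
bandwidth = torch.full((T_count,), 10e3)             # 10 Kbps bandwidth requirement per tunnel

start = time.perf_counter()
link_usage = A.t() @ bandwidth                       # per-link bandwidth usage after placement
elapsed_ms = (time.perf_counter() - start) * 1e3
print(f"state evaluation took {elapsed_ms:.3f} ms")
```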
Further, scalability is a major concern for existing general-purpose discrete event simulators: as the network size increases, the simulation process becomes very slow. The embodiment of the present application scales the experimental network up to 1000 nodes and 5000 links. It was found that, for OMNeT++, the time to observe the change for each tunnel placement increased from 228.3 milliseconds to 29.1 seconds. This means that for an SDWAN containing 1000 routers, each training iteration on OMNeT++ takes approximately 0.5 minutes. Given that most deep learning models require thousands of iterations, the total time required to train with existing general-purpose network simulators is unacceptable. dNE, on the other hand, is insensitive to changes in network size, since changing the network size in dNE only requires changing the dimensions of the tensors in the network model. The embodiment of the present application finds that scaling up the network results in almost no change in the matrix computation time. This indicates that dNE can scale to far larger networks than traditional network simulators while keeping model evaluation and training speed unchanged.
The above embodiments describe a network path allocation model training method and a network path allocation method from the perspective of a method flow, and the following embodiments describe a network path allocation model training apparatus and a network path allocation apparatus from the perspective of a virtual module or a virtual unit, which are described in detail in the following embodiments, wherein,
an embodiment of the present application provides a network path assignment model training apparatus, as shown in fig. 6, the network path assignment model training apparatus 60 may include: a first acquisition module 61, a training module 62, wherein,
a first obtaining module 61, configured to obtain a training sample, where the training sample includes sample tunnel information corresponding to at least one sample tunnel in a network;
the training module 62 is configured to perform iterative training on a preset network model based on a training sample until a preset condition is met, and use the network model meeting the preset condition as a network path distribution model;
the training module 62 is specifically configured to, when performing a training on the network model for one time:
inputting the training samples into a network model, and predicting to obtain network path information corresponding to each sample tunnel;
determining network state information of the network based on the obtained network path information;
model parameters of the network model are adjusted based on the network state information.
In a possible implementation manner of the embodiment of the present application, the sample tunnel information includes a source address of the sample tunnel, a target address of the sample tunnel, and sample attribute information of the sample tunnel.
In a possible implementation manner of the embodiment of the present application, the sample attribute information of the sample tunnel includes: at least one of a bandwidth requirement of the sample tunnel, a tunnel grade of the sample tunnel, a tunnel latency requirement of the sample tunnel, or a tunnel link cost of the sample tunnel.
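For illustration only, one way to hold such a sample tunnel record is a simple structure like the following; the field names are hypothetical and not prescribed by the embodiment:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SampleTunnel:
    """One sample tunnel; the optional fields cover the 'at least one of' attributes."""
    source_address: str
    target_address: str
    bandwidth_requirement_bps: Optional[float] = None
    tunnel_grade: Optional[int] = None
    latency_requirement_ms: Optional[float] = None
    link_cost: Optional[float] = None

# Example: a 10 Kbps tunnel between two routers with a 50 ms latency requirement.
tunnel = SampleTunnel("10.0.0.1", "10.0.3.7",
                      bandwidth_requirement_bps=10e3, latency_requirement_ms=50.0)
```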
In a possible implementation manner of the embodiment of the present application, the network path information corresponding to any sample tunnel includes link identifiers of directional links through which the network path sequentially passes,
when determining the network state information of the network based on the obtained network path information, the training module 62 is specifically configured to:
determining prediction attribute information corresponding to each sample tunnel based on the link attribute information of each link contained in the network and the links traversed by each piece of network path information;
and determining network state information based on the corresponding prediction attribute information of each sample tunnel.
In a possible implementation manner of the embodiment of the present application, the prediction attribute information corresponding to any sample tunnel includes at least one of the following:
link capacity information; link delay information; link cost information.
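A minimal sketch of how such per-tunnel prediction attributes could be aggregated from the attributes of the links that a predicted path traverses is given below; the link tables, values, and aggregation rules (delay and cost summed, capacity bounded by the tightest link) are assumptions for illustration:

```python
# Hypothetical per-link attributes of the network.
link_capacity_bps = {"l1": 1e9, "l2": 1e9, "l3": 1e9}
link_delay_ms     = {"l1": 2.0, "l2": 5.0, "l3": 1.5}
link_cost         = {"l1": 1.0, "l2": 3.0, "l3": 1.0}

def tunnel_prediction_attributes(path_links):
    """Aggregate link attributes along a predicted path: delay and cost accumulate,
    while the usable capacity is bounded by the tightest link."""
    return {
        "capacity_bps": min(link_capacity_bps[l] for l in path_links),
        "delay_ms": sum(link_delay_ms[l] for l in path_links),
        "cost": sum(link_cost[l] for l in path_links),
    }

# Network path information of one sample tunnel: the ordered directed-link identifiers.
print(tunnel_prediction_attributes(["l1", "l3"]))   # delay 3.5 ms, cost 2.0
```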
In a possible implementation manner of the embodiment of the present application, when the training module 62 adjusts the model parameters of the network model based on the network state information, it is specifically configured to:
determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information;
determining a training gradient of the network model based on an objective function value corresponding to the current training and an objective function value corresponding to the last training;
adjusting model parameters of the network model based on the determined gradient.
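One possible reading of deriving the training gradient from the objective values of the current and previous training rounds is a finite-difference-style update, sketched below; this is an interpretation for illustration, not necessarily the exact update rule of the embodiment:

```python
def adjust_parameters(params, last_step, objective_now, objective_prev, lr=0.01):
    """If the objective decreased after the last parameter change, keep moving in the
    same direction; if it increased, move back (a finite-difference-style heuristic)."""
    delta = objective_now - objective_prev
    return [p - lr * delta * s for p, s in zip(params, last_step)]

params = [0.5, -1.2, 0.3]
last_step = [0.1, 0.0, -0.1]   # parameter change applied in the previous iteration
params = adjust_parameters(params, last_step, objective_now=2.7, objective_prev=3.1)
print(params)
```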
In a possible implementation manner of the embodiment of the present application, when determining, based on the network state information, an objective function value corresponding to a current training of an objective function corresponding to the network model, the training module 62 is specifically configured to:
and determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information and the sample attribute information of the sample tunnel.
In a possible implementation manner of the embodiment of the present application, the objective function includes at least one of the following:
a delay function; a bandwidth function; a cost function.
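For illustration, the listed terms can be combined into a single scalar objective, for example as a weighted sum; the weights and the layout of the network state below are assumptions:

```python
def objective(network_state, w_delay=1.0, w_bandwidth=1.0, w_cost=0.1):
    """Weighted sum of a delay term, a bandwidth-violation term and a cost term."""
    delay_term = sum(t["delay_ms"] for t in network_state["tunnels"])
    bandwidth_term = sum(max(0.0, u - 1.0) for u in network_state["link_utilization"])
    cost_term = sum(t["cost"] for t in network_state["tunnels"])
    return w_delay * delay_term + w_bandwidth * bandwidth_term + w_cost * cost_term

state = {"tunnels": [{"delay_ms": 7.5, "cost": 2.0}, {"delay_ms": 3.0, "cost": 1.0}],
         "link_utilization": [0.4, 1.2, 0.8]}
print(objective(state))   # 10.5 + 0.2 + 0.3 = 11.0
```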
The network path distribution model training apparatus of this embodiment may execute the network path distribution model training method shown in the above method embodiments, and the implementation principles thereof are similar, and are not described herein again.
The embodiment of the application provides a network path distribution model training apparatus. Compared with the prior art, the embodiment of the application obtains a training sample, where the training sample includes sample tunnel information corresponding to at least one sample tunnel in a network, then performs iterative training on a preset network model based on the training sample until a preset condition is met, and takes the network model meeting the preset condition as a network path distribution model, where one round of training of the network model includes: inputting the training samples into the network model and predicting to obtain the network path information corresponding to each sample tunnel; determining the network state information of the network based on the obtained network path information; and adjusting the model parameters of the network model based on the network state information. In other words, in the embodiment of the present application, the network state corresponding to the current network can be calculated for each round of training, and it is not necessary to continuously simulate the network environment, the operation of real network devices and protocols, and so on, through the training samples, so that the time for model training can be reduced, the efficiency of model training can be improved, and the samples required for training can be reduced.
An embodiment of the present application further provides a network path allocation apparatus. As shown in fig. 7, the network path allocation apparatus 70 includes: a second acquisition module 71 and an allocation module 72, wherein,
a second acquisition module 71, configured to obtain tunnel information of a network tunnel;
the allocation module 72 is configured to perform network path allocation processing on the tunnel information of the network tunnel through the trained network model shown in the foregoing method embodiments, so as to obtain the network path information corresponding to the network tunnel.
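At inference time the trained model is queried once per tunnel request. The following minimal sketch assumes a model with the interface of the illustrative PathAllocationModel from the earlier training sketch; the function and variable names are hypothetical:

```python
import torch

def allocate_path(model, tunnel_features: torch.Tensor, candidate_paths):
    """Return the candidate path (a list of link identifiers) with the highest score."""
    with torch.no_grad():
        scores = model(tunnel_features.unsqueeze(0)).squeeze(0)
    return candidate_paths[int(scores.argmax())]

# Hypothetical call; shapes must match the trained model's feature_dim / num_paths:
# path = allocate_path(trained_model, torch.tensor([10e3, 50.0, 1.0]), [["l1", "l3"], ["l2"]])
```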
Compared with the prior art, the embodiment of the application provides a network path allocation apparatus, which obtains the tunnel information of a network tunnel and then performs network path allocation processing on the tunnel information of the network tunnel through a trained network model to obtain the network path information corresponding to the network tunnel. In the embodiment of the application, the network path information corresponding to each sample tunnel is obtained by prediction from the training samples, the network state of the network is then determined according to the network path information corresponding to each sample tunnel, and the parameters of the network model are adjusted according to the network state of the network; continuous simulation of the network environment, the operation of real network devices and protocols, and so on, through the training samples is not needed, so that the time for model training can be reduced, the efficiency of model training can be improved, and the samples required for training can be reduced.
The network path allocation apparatus of this embodiment can execute the network path allocation method shown in the above method embodiments, and the implementation principles thereof are similar, which are not described herein again.
In the above embodiments, a network path allocation model training apparatus and a network path allocation apparatus are introduced from the perspective of virtual modules; in the following embodiments, an electronic device is introduced from the perspective of a physical apparatus. The electronic device can be used to execute the network path allocation model training method shown in the above method embodiments and the network path allocation method shown in the above embodiments, as detailed below.
An embodiment of the present application provides an electronic device. As shown in fig. 8, the electronic device 8000 includes: a processor 8001 and a memory 8003. The processor 8001 is coupled to the memory 8003, for example via a bus 8002. Optionally, the electronic device 8000 may also include a transceiver 8004. In practical applications, the transceiver 8004 is not limited to one, and the structure of the electronic device 8000 does not constitute a limitation on the embodiments of the present application.
The processor 8001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 8001 may also be a combination implementing computing functions, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
Bus 8002 may include a path to transfer information between the aforementioned components. The bus 8002 may be a PCI bus or an EISA bus, etc. The bus 8002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
The memory 8003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 8003 is used for storing the application program code for executing the solution of the present application, and its execution is controlled by the processor 8001. The processor 8001 is configured to execute the application program code stored in the memory 8003 to implement the network path assignment model training method shown in the foregoing method embodiments.
An embodiment of the present application provides an electronic device, where the electronic device includes: a memory and a processor; at least one program is stored in the memory and, when executed by the processor, implements the following: obtaining a training sample, where the training sample includes sample tunnel information corresponding to at least one sample tunnel in a network; then performing iterative training on a preset network model based on the training sample until a preset condition is met, and taking the network model meeting the preset condition as a network path distribution model, where one round of training of the network model includes: inputting the training samples into the network model and predicting to obtain the network path information corresponding to each sample tunnel; determining the network state information of the network based on the obtained network path information; and adjusting the model parameters of the network model based on the network state information. In other words, in the embodiment of the present application, the network state corresponding to the current network can be calculated for each round of training, and it is not necessary to continuously simulate the network environment, the operation of real network devices and protocols, and so on, through the training samples, so that the time for model training can be reduced, the efficiency of model training can be improved, and the samples required for training can be reduced.
The present application provides a computer-readable storage medium on which a computer program is stored; when the program runs on a computer, the computer can execute the corresponding content in the foregoing method embodiments. Compared with the prior art, the embodiment of the application obtains a training sample, where the training sample includes sample tunnel information corresponding to at least one sample tunnel in a network, then performs iterative training on a preset network model based on the training sample until a preset condition is met, and takes the network model meeting the preset condition as a network path distribution model, where one round of training of the network model includes: inputting the training samples into the network model and predicting to obtain the network path information corresponding to each sample tunnel; determining the network state information of the network based on the obtained network path information; and adjusting the model parameters of the network model based on the network state information. In other words, in the embodiment of the present application, the network state corresponding to the current network can be calculated for each round of training, and it is not necessary to continuously simulate the network environment, the operation of real network devices and protocols, and so on, through the training samples, so that the time for model training can be reduced, the efficiency of model training can be improved, and the samples required for training can be reduced.
An embodiment of the present application provides an electronic device. As shown in fig. 9, the electronic device 9000 includes: a processor 9001 and a memory 9003, where the processor 9001 is coupled to the memory 9003, for example via a bus 9002. Optionally, the electronic device 9000 can also include a transceiver 9004. Note that in practical use the transceiver 9004 is not limited to one, and the structure of the electronic device 9000 does not constitute a limitation on the embodiments of the present application.
The processor 9001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 9001 may also be a combination implementing computing functions, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
The bus 9002 may include a pathway to transfer information between the aforementioned components. The bus 9002 may be a PCI bus or an EISA bus, etc. The bus 9002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
The memory 9003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 9003 is used to store application code for performing aspects of the present application and is controlled by the processor 9001 for execution. Processor 9001 is configured to execute application program code stored in memory 9003 to implement what is shown in the foregoing network path allocation method embodiments.
An embodiment of the present application provides an electronic device, where the electronic device includes: a memory and a processor; at least one program is stored in the memory and, when executed by the processor, implements the following: obtaining the tunnel information of a network tunnel, and then performing network path allocation processing on the tunnel information of the network tunnel through the trained network model to obtain the network path information corresponding to the network tunnel. In the embodiment of the application, the network path information corresponding to each sample tunnel is obtained by prediction from the training samples, the network state of the network is then determined according to the network path information corresponding to each sample tunnel, and the parameters of the network model are adjusted according to the network state of the network; continuous simulation of the network environment, the operation of real network devices and protocols, and so on, through the training samples is not needed, so that the time for model training can be reduced, the efficiency of model training can be improved, and the samples required for training can be reduced.
The present application provides a computer-readable storage medium on which a computer program is stored; when the program runs on a computer, the computer can execute the corresponding content in the foregoing method embodiments. Compared with the prior art, the embodiment of the application obtains the tunnel information of a network tunnel and then performs network path allocation processing on the tunnel information of the network tunnel through the trained network model to obtain the network path information corresponding to the network tunnel. In the embodiment of the application, the network path information corresponding to each sample tunnel is obtained by prediction from the training samples, the network state of the network is then determined according to the network path information corresponding to each sample tunnel, and the parameters of the network model are adjusted according to the network state of the network; continuous simulation of the network environment, the operation of real network devices and protocols, and so on, through the training samples is not needed, so that the time for model training can be reduced, the efficiency of model training can be improved, and the samples required for training can be reduced.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts of the figures may include multiple sub-steps or multiple stages, which are not necessarily performed at the same moment but may be performed at different moments, and whose execution order is not necessarily sequential; they may be performed in turns or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (13)

1. A network path allocation model training method is characterized by comprising the following steps:
acquiring a training sample, wherein the training sample comprises sample tunnel information corresponding to at least one sample tunnel in a network;
performing iterative training on a preset network model based on the training sample until a preset condition is met, and taking the network model meeting the preset condition as a network path distribution model;
the method for training the network model for one time comprises the following steps:
inputting the training samples into the network model, and predicting to obtain network path information corresponding to each sample tunnel;
determining network state information of the network based on the obtained network path information;
adjusting model parameters of the network model based on the network state information.
2. The method of claim 1, wherein the sample tunnel information comprises a source address of the sample tunnel, a destination address of the sample tunnel, and sample attribute information of the sample tunnel.
3. The method of claim 2, wherein the sample attribute information of the sample tunnel comprises: at least one of a bandwidth requirement of the sample tunnel, a tunnel class of the sample tunnel, a tunnel latency requirement of the sample tunnel, or a tunnel link cost of the sample tunnel.
4. The method according to any one of claims 1 to 3, wherein the network path information corresponding to any one of the sample tunnels includes link identifiers of respective directed links through which the network path passes in sequence,
the determining the network state information of the network based on the obtained network path information includes:
determining prediction attribute information corresponding to each sample tunnel based on link attribute information of each link contained in the network and link information passed by each network path;
and determining the network state information based on the prediction attribute information corresponding to each sample tunnel.
5. The method of claim 4, wherein the prediction attribute information corresponding to any sample tunnel comprises at least one of:
link capacity information; link delay information; link cost information.
6. The method according to any of claims 1-5, wherein said adjusting model parameters of the network model based on the network state information comprises:
determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information;
determining a training gradient of the network model based on an objective function value corresponding to the current training and an objective function value corresponding to the last training;
adjusting model parameters of the network model based on the determined gradient.
7. The method of claim 6, wherein the determining, based on the network state information, an objective function value corresponding to a current training of an objective function corresponding to the network model comprises:
and determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information and the sample attribute information of the sample tunnel.
8. The method of claim 6 or 7, wherein the objective function comprises at least one of:
a delay function; a bandwidth function; a cost function.
9. A method for network path allocation, comprising:
acquiring tunnel information of a network tunnel;
and performing network path allocation processing on the tunnel information of the network tunnel through the network model trained according to any one of claims 1 to 8, to obtain the network path information corresponding to the network tunnel.
10. A network path assignment model training apparatus, comprising:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a training sample, and the training sample comprises sample tunnel information corresponding to at least one sample tunnel in a network;
the training module is used for carrying out iterative training on a preset network model based on the training sample until a preset condition is met, and taking the network model meeting the preset condition as a network path distribution model;
the training module is specifically configured to, when performing a training on the network model for one time:
inputting the training samples into the network model, and predicting to obtain network path information corresponding to each sample tunnel;
determining network state information of the network based on the obtained network path information;
adjusting model parameters of the network model based on the network state information.
11. A network path allocation apparatus, comprising:
the second acquisition module is used for acquiring tunnel information of the network tunnel;
an allocation module, configured to perform network path allocation processing on the tunnel information of the network tunnel through the network model trained according to any one of claims 1 to 8, to obtain network path information corresponding to the network tunnel.
12. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to: carrying out the method according to any one of claims 1 to 9.
13. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 9.
CN202010130022.5A 2020-02-28 2020-02-28 Network path allocation model training method, path allocation method and device Active CN111340192B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010130022.5A CN111340192B (en) 2020-02-28 2020-02-28 Network path allocation model training method, path allocation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010130022.5A CN111340192B (en) 2020-02-28 2020-02-28 Network path allocation model training method, path allocation method and device

Publications (2)

Publication Number Publication Date
CN111340192A true CN111340192A (en) 2020-06-26
CN111340192B CN111340192B (en) 2023-06-30

Family

ID=71185826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010130022.5A Active CN111340192B (en) 2020-02-28 2020-02-28 Network path allocation model training method, path allocation method and device

Country Status (1)

Country Link
CN (1) CN111340192B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951836A (en) * 2014-03-25 2015-09-30 上海市玻森数据科技有限公司 Posting predication system based on nerual network technique
CN107852365A (en) * 2015-08-19 2018-03-27 思科技术公司 Dynamic VPN Policy model with encryption and traffic engineering parsing
WO2018176385A1 (en) * 2017-03-31 2018-10-04 Huawei Technologies Co., Ltd. System and method for network slicing for service-oriented networks
US20190190815A1 (en) * 2017-12-18 2019-06-20 Cisco Technology, Inc. Inspired path computation in a network
CN108462644A (en) * 2018-02-09 2018-08-28 深圳市唯特视科技有限公司 A kind of dynamics route selection technology based on convolutional neural networks
CN108667734A (en) * 2018-05-18 2018-10-16 南京邮电大学 It is a kind of that the through street with LSTM neural networks is learnt by decision making algorithm based on Q
CN109768940A (en) * 2018-12-12 2019-05-17 北京邮电大学 The flow allocation method and device of multi-service SDN network
CN110336754A (en) * 2019-05-09 2019-10-15 北京邮电大学 A kind of network flow configuration method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LÉNAÏC CHIZAT et al.: "On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport", 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), pages 1-11 *
LIBIN LIU et al.: "Automated Traffic Engineering in SDWAN: Beyond Reinforcement Learning", IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pages 430-435 *
廖晓闽 et al.: "Resource allocation algorithm for cellular networks based on deep reinforcement learning" (基于深度强化学习的蜂窝网资源分配算法), Journal on Communications (《通信学报》), vol. 40, no. 2, pages 2019002-1 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653680A (en) * 2020-12-14 2021-04-13 广东电网有限责任公司 Model training method, network situation prediction method, device, equipment and medium
CN112653680B (en) * 2020-12-14 2022-04-12 广东电网有限责任公司 Model training method, network situation prediction method, device, equipment and medium
CN114567517A (en) * 2022-01-17 2022-05-31 深圳绿米联创科技有限公司 Parameter adjusting method and device and server
CN114756211A (en) * 2022-05-13 2022-07-15 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium
CN114756211B (en) * 2022-05-13 2022-12-16 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111340192B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
CN107911299B (en) Deep Q learning-based routing planning method
CN109768940A (en) The flow allocation method and device of multi-service SDN network
CN111340192B (en) Network path allocation model training method, path allocation method and device
US9740534B2 (en) System for controlling resources, control pattern generation apparatus, control apparatus, method for controlling resources and program
Kim et al. Multi-agent reinforcement learning-based resource management for end-to-end network slicing
CN114745317B (en) Computing task scheduling method facing computing power network and related equipment
CN111211987B (en) Method and system for dynamically adjusting flow in network, electronic equipment and storage medium
Otokura et al. Application of evolutionary mechanism to dynamic virtual network function placement
CN113612692B (en) Centralized optical on-chip network self-adaptive route planning method based on DQN algorithm
Li et al. Traffic modeling and optimization in datacenters with graph neural network
Dalgkitsis et al. Dynamic resource aware VNF placement with deep reinforcement learning for 5G networks
CN109298930A (en) A kind of cloud workflow schedule method and device based on multiple-objection optimization
CN115907038A (en) Multivariate control decision-making method based on federated split learning framework
CN113887748B (en) Online federal learning task allocation method and device, and federal learning method and system
Liu et al. Automated traffic engineering in SDWAN: Beyond reinforcement learning
CN110233763B (en) Virtual network embedding algorithm based on time sequence difference learning
CN108737130B (en) Network flow prediction device and method based on neural network
CN116055406A (en) Training method and device for congestion window prediction model
CN106936611A (en) A kind of method and device for predicting network state
CN115022231A (en) Optimal path planning method and system based on deep reinforcement learning
CN115150335A (en) Optimal flow segmentation method and system based on deep reinforcement learning
CN117014355A (en) TSSDN dynamic route decision method based on DDPG deep reinforcement learning algorithm
CN115220818A (en) Real-time dependency task unloading method based on deep reinforcement learning
Sharma et al. Meta-reinforcement learning based resource management in software defined networks using bayesian network
CN114254735A (en) Distributed botnet model construction method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40023607

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant