CN111340192B - Network path allocation model training method, path allocation method and device - Google Patents


Info

Publication number
CN111340192B
CN111340192B (application CN202010130022.5A)
Authority
CN
China
Prior art keywords
network
tunnel
training
sample
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010130022.5A
Other languages
Chinese (zh)
Other versions
CN111340192A (en)
Inventor
陈力
刘礼彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010130022.5A priority Critical patent/CN111340192B/en
Publication of CN111340192A publication Critical patent/CN111340192A/en
Application granted granted Critical
Publication of CN111340192B publication Critical patent/CN111340192B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Learning methods

Abstract

The embodiment of the application provides a network path allocation model training method, a path allocation method and a device, relating to the fields of traffic engineering and artificial intelligence. The method comprises: obtaining a training sample that comprises sample tunnel information corresponding to at least one sample tunnel in a network; then performing iterative training on a preset network model based on the training sample until a preset condition is met, and taking the network model meeting the preset condition as the network path allocation model. One training pass of the network model comprises: inputting the training sample into the network model and predicting the network path information corresponding to each sample tunnel; determining network state information of the network based on the obtained network path information; and adjusting model parameters of the network model based on the network state information. The embodiment of the application reduces model training time, improves model training efficiency, and can reduce the number of samples required for training.

Description

Network path allocation model training method, path allocation method and device
Technical Field
The application relates to the technical fields of traffic engineering and artificial intelligence, in particular to a network path allocation model training method, a path allocation method and a device.
Background
For cloud service providers with global business, the backbone wide area network (Wide Area Network, WAN) connects their data centers worldwide and provides communication services for large-scale applications; it is one of the most important parts of the cloud service infrastructure. Traffic in WANs is massive and still growing rapidly. Traffic engineering (Traffic Engineering, TE) has therefore received considerable attention from both academia and industry as an important means of improving network application performance and reducing cost in backbone networks. Subject to network performance and cost constraints, TE must distribute traffic with different bandwidth requirements and priorities over different network paths to achieve different goals. In particular, traffic can be distributed over different network paths by a trained TE deep neural network (Deep Neural Networks, DNN) model, so how to train the TE DNN model becomes a critical issue.
In the prior art, training the TE DNN model requires continuously simulating, through the training samples, the network environment, the actual network devices and the operation of the protocols in the WAN, such as the forwarding delay of switch and router devices and the operation and convergence process of routing and transport protocols, until the TE network finally reaches a converged state. Training the DNN model in this simulation manner may require a large number of training samples, and the training time is long.
Disclosure of Invention
The application provides a network path allocation model training method, a path allocation method and a device, which can solve at least one of the above technical problems. The technical solution is as follows:
in a first aspect, a network path allocation model training method is provided, the method including:
acquiring a training sample, wherein the training sample comprises sample tunnel information corresponding to at least one sample tunnel in a network;
based on the training sample, performing iterative training on a preset network model until a preset condition is met, and taking the network model meeting the preset condition as a network path allocation model;
the method for training the network model once comprises the following steps:
inputting training samples into a network model, and predicting to obtain network path information corresponding to each sample tunnel;
determining network state information of a network based on the obtained network path information;
model parameters of the network model are adjusted based on the network state information.
In one possible implementation, the sample tunnel information includes a source address of the sample tunnel, a destination address of the sample tunnel, and sample attribute information of the sample tunnel.
In another possible implementation, the sample attribute information of the sample tunnel includes: at least one of a bandwidth requirement of the sample tunnel, a tunnel class of the sample tunnel, a tunnel latency requirement of the sample tunnel, or a tunnel link cost of the sample tunnel.
In another possible implementation, the network path information corresponding to any sample tunnel includes the link identification of each directional link through which the network path sequentially passes, and determining the network state information of the network based on the obtained network path information comprises:
determining predicted attribute information corresponding to each sample tunnel based on link attribute information of each link included in the network and link information passed by each network path;
and determining network state information based on the prediction attribute information corresponding to each sample tunnel.
In another possible implementation, the predicted attribute information of any sample tunnel includes at least one of the following:
link capacity information; link delay information; link cost information.
In another possible implementation, adjusting model parameters of the network model based on the network state information includes:
determining an objective function value corresponding to the current training of an objective function corresponding to the network model based on the network state information;
determining a training gradient of the network model based on the objective function value corresponding to the current training and the objective function value corresponding to the last training;
and adjusting model parameters of the network model based on the determined gradient.
In another possible implementation manner, determining an objective function value corresponding to a current training of an objective function corresponding to a network model based on network state information includes:
and determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information and the sample attribute information of the sample tunnel.
In another possible implementation, the objective function includes at least one of:
a time delay function; a bandwidth function; cost function.
In a second aspect, a network path allocation method is provided, including:
acquiring tunnel information of a network tunnel;
and performing network path allocation processing on the tunnel information of the network tunnel through the network model trained according to the first aspect or any possible implementation of the first aspect, to obtain the network path information corresponding to the network tunnel.
In a third aspect, a network path allocation model training device is provided, including:
the first acquisition module is used for acquiring training samples, wherein the training samples comprise sample tunnel information corresponding to at least one sample tunnel in a network;
the training module is used for performing iterative training on a preset network model based on the training sample until a preset condition is met, and taking the network model meeting the preset condition as a network path allocation model;
the training module is specifically configured to, when training the network model once:
inputting training samples into a network model, and predicting to obtain network path information corresponding to each sample tunnel;
determining network state information of a network based on the obtained network path information;
model parameters of the network model are adjusted based on the network state information.
In one possible implementation, the sample tunnel information includes a source address of the sample tunnel, a destination address of the sample tunnel, and sample attribute information of the sample tunnel.
In another possible implementation, the sample attribute information includes: at least one of a bandwidth requirement of the sample tunnel, a tunnel class of the sample tunnel, a tunnel latency requirement of the sample tunnel, or a tunnel link cost of the sample tunnel.
In another possible implementation, the network path information corresponding to any sample tunnel includes the link identification of each directional link through which the network path sequentially passes, and the training module is specifically configured to, when determining the network state information of the network based on the obtained network path information:
determining predicted attribute information corresponding to each sample tunnel based on link attribute information of each link included in the network and link information passed by each network path;
and determining network state information based on the prediction attribute information corresponding to each sample tunnel.
In another possible implementation manner, the prediction attribute information corresponding to any sample tunnel includes at least one of the following:
link capacity information; link delay information; link cost information.
In another possible implementation manner, the training module is specifically configured to, when adjusting the model parameters of the network model based on the network state information:
determining an objective function value corresponding to the current training of an objective function corresponding to the network model based on the network state information;
determining a training gradient of the network model based on the objective function value corresponding to the current training and the objective function value corresponding to the last training;
and adjusting model parameters of the network model based on the determined gradient.
In another possible implementation manner, the training module is specifically configured to, when determining, based on the network state information, an objective function value corresponding to a current training of an objective function corresponding to the network model:
determine the objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information and the sample attribute information of the sample tunnel.
In another possible implementation, the objective function includes at least one of:
a time delay function; a bandwidth function; cost function.
In a fourth aspect, there is provided a network path allocation apparatus, comprising:
the second acquisition module is used for acquiring the tunnel information of the network tunnel;
and the allocation module is used for performing network path allocation processing on the tunnel information of the network tunnel through the network model trained according to the first aspect or any possible implementation of the first aspect, to obtain the network path information corresponding to the network tunnel.
The beneficial effects that this application provided technical scheme brought are:
compared with the prior art, the method for acquiring training samples, which comprise sample tunnel information corresponding to at least one sample tunnel in a network, and then performing iterative training on a preset network model based on the training samples until a preset condition is met, wherein the network model meeting the preset condition is used as the network path distribution model, and the method for performing one-time training on the network model comprises the following steps: inputting training samples into a network model, and predicting to obtain network path information corresponding to each sample tunnel; determining network state information of a network based on the obtained network path information; model parameters of the network model are adjusted based on the network state information. In the application, the network state corresponding to the current network can be calculated for each training without continuously simulating the network environment, the actual network equipment, the operation of the protocol and the like through training samples, so that the time of model training can be reduced, the efficiency of model training can be improved, and samples required by training can be reduced.
Compared with the prior art, the method and electronic device for network path allocation obtain the tunnel information of a network tunnel and then perform network path allocation processing on it through a trained network model, obtaining the network path information corresponding to the network tunnel. Because the network path information corresponding to each sample tunnel is predicted from the training samples, and the network state of the network is then determined from that path information so that the parameters of the network model can be adjusted accordingly, there is no need to continuously simulate the network environment, real network devices, protocol operation and so on through training samples. This shortens model training time, improves model training efficiency, and can reduce the number of samples required for training.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a flow chart of a training method of a network path allocation model according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of applying the trained network path allocation model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of the long short-term memory network model used for training according to an embodiment of the present application;
fig. 4 is a schematic diagram comparing network path allocation by the trained long short-term memory (Long Short-Term Memory, LSTM) network model with an existing TE algorithm in terms of throughput, maximum congestion and congestion packet loss, according to an embodiment of the present application;
FIG. 5 is a schematic diagram of training a TE DNN model based on a differentiable TE network simulator according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a training device for a network path allocation model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a network path allocation device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device for network path allocation model training according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device for network path allocation according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, serve only to illustrate the present application, and are not to be construed as limiting it.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Several terms which are referred to in this application are first introduced and explained:
Traffic engineering (Traffic Engineering, TE) is the final step before a network is put into production and is a tool for planning traffic in the network. The process of planning paths for traffic in a network is traffic engineering;
artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
In the prior art, the network model may be trained with a general-purpose network simulator, for example the third-generation discrete-event network simulator NS-3 or the modular, component-based discrete-event network simulator OMNeT++ (Objective Modular Network Testbed in C++). However, existing general-purpose network simulators have the following technical problems when training a network model:
(1) Existing general-purpose network simulators treat the network (i.e., the WAN network) as a black box, independent of the algorithm of a specific network model, such as that of a deep neural network (Deep Neural Network, DNN) model. Such network simulators suffer from slow training speed and poor scalability, and for DNN models they cannot support algorithm models other than deep reinforcement learning (Deep RL, DRL);
(2) The existing general-purpose network simulator needs to simulate a real network environment through computation, and a DNN model needs a large number of training samples for repeated training. On a general-purpose simulator based on discrete events, the training speed of a DNN model is very low, making the model difficult to converge. The larger the network scale, the longer the model training time, and the scalability problem of the simulator also limits the scalability of the algorithm model;
(3) The existing general-purpose network simulator is not differentiable and does not support training a model with gradient-descent-based optimization methods. TE algorithms based on DNN models are therefore limited to the DRL algorithm, because DRL can estimate the dynamics of the TE network system, such as state transitions, action functions and reward functions, through agent learning. The DNN models currently applied in TE are also constrained by the limits of the RL algorithm, such as large training-sample deviation making the model extremely difficult to converge and the training efficiency low. In addition, in many scenarios, algorithms based on other DNN models outperform RL, such as trajectory optimization in robot control and Monte Carlo tree search in games. The assumption of the TE network as a "black box" prevents the use of other DNN models to solve the TE problem.
The embodiments of the present application solve the above problems encountered by existing general-purpose network simulators in software-defined backbone network (Software Defined WAN, SDWAN) TE applications.
(1) In SDWAN, the network is not a "black box": its network environment is fully defined. The current network state and the tunnel-path allocation result explicitly determine the next network state, and TE metrics such as latency, path length and link utilization can be calculated directly. Therefore, the future network state can be computed exactly from the current network state and the output of the algorithm, and the TE network simulator in the embodiment of the application can feed back network state information immediately, greatly improving model training speed and model scalability.
(2) In addition, differentiable programming lets software be written as a process that assembles a network of parameterized functional modules, making each part of the software differentiable. This allows the program to be trained with gradient-descent-based optimization methods to optimize the parameters in the software. Following this programming paradigm, embodiments of the present application design a fully differentiable network simulator, dNE, that can be embedded as a separate "layer" in a deep learning model. This allows any DNN model, such as recurrent neural networks (Recurrent Neural Network, RNN), generative adversarial networks (Generative Adversarial Networks, GAN) and differentiable neural computers (Differentiable Neural Computers, DNC), to be trained on dNE.
The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The embodiment of the application provides a model training method, as shown in fig. 1, which is executed by an electronic device and includes:
step S101, obtaining training samples.
The training samples comprise sample tunnel information corresponding to at least one sample tunnel in the network.
For the embodiments of the present application, a tunnel is a channel of aggregated traffic of a particular class between a pair of ingress and egress WAN routers.
Step S102, based on the training samples, iterative training is performed on a preset network model until a preset condition is met, and the network model meeting the preset condition is used as the network path allocation model.
Further, in the embodiment of the present application, the network model is trained iteratively based on the training samples until a preset condition is met, yielding a trained network model (i.e., the network path allocation model). The trained network model may be deployed in the central controller and used to allocate a network path for each tunnel, i.e., to determine the links through which each tunnel passes. For example, the network model in the embodiments of the present application may be a DNN model.
Further, the preset conditions are described in the following examples, and are not described herein.
Specifically, the method for training the network model once includes: inputting training samples into a network model, and predicting to obtain network path information corresponding to each sample tunnel; determining network state information of a network based on the obtained network path information; model parameters of the network model are adjusted based on the network state information.
Specifically, in the embodiment of the present application, adjusting the model parameters of the network model based on the network state information may specifically include: based on the network state information, calculating the value and gradient value of the objective function, and further adjusting the model parameters of the DNN model.
Specifically, in the embodiment of the present application, the network state information is the network state information of the WAN network. Further, after the network state information is obtained, adjusting the network parameters in the network model based on it may include adjusting the network parameters corresponding to each layer in the network model; of course, only the network parameters of some network layers may be adjusted. The embodiments of the present application are not limited in this respect.
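The single training pass described above (predict paths, evaluate network state, derive an objective to adjust parameters) can be sketched as follows with NumPy. All shapes and names here are purely illustrative assumptions, not the patent's actual model; the stand-in predictor and the max-utilisation objective are placeholders for the TE DNN and the objective functions discussed later:

```python
import numpy as np

rng = np.random.default_rng(0)

T, E = 4, 6                      # number of sample tunnels / directed links
B = rng.uniform(1, 10, size=T)   # bandwidth requirement per tunnel (|T|x1)
N = np.full(E, 20.0)             # link capacity per link (|E|x1)

def predict_paths(params):
    # Stand-in for the DNN: threshold per-(tunnel, link) scores into a
    # 0/1 TE decision matrix A of shape |T|x|E|.
    return (params > 0).astype(float)

def network_state(A):
    # State evaluation: traffic load placed on each link, load = A^T B.
    return A.T @ B

def objective(load):
    # Example objective: maximum link utilisation (lower is better).
    return float(np.max(load / N))

params = rng.standard_normal((T, E))   # toy model parameters
A = predict_paths(params)              # 1) predict path information
state = network_state(A)               # 2) determine network state
obj = objective(state)                 # 3) objective value used to adjust params
```

In a real training pass, the objective value would then drive a gradient update of the model parameters rather than being computed once.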
Specifically, the sample tunnel information in step S101 includes the source address of the sample tunnel, the destination address of the sample tunnel, and the sample attribute information of the sample tunnel.
Specifically, the sample attribute information of the sample tunnel includes: at least one of a bandwidth requirement of the sample tunnel, a tunnel class of the sample tunnel, a tunnel latency requirement of the sample tunnel, or a tunnel link cost of the sample tunnel.
Further, in one embodiment of the present application, if the training samples include the sample tunnel information of |T| sample network tunnels, the information may be organized into vectors and matrices. Specifically:
1. the source address O is a |T|×1 vector, where for any tunnel t in the tunnel set T, O[t] represents the source address of tunnel t;
2. the destination address D is a |T|×1 vector, where for any tunnel t in the tunnel set T, D[t] represents the destination address of tunnel t;
3. the bandwidth requirement B of the tunnels is a |T|×1 vector, where B[t] represents the bandwidth requirement of tunnel t;
4. the tunnel class C is a |T|×|P| matrix, where P represents the set of tunnel classes supported in the network; if the class of tunnel t is p, the value of C[t][p] is 1; otherwise, the value of C[t][p] is 0. Each tunnel belongs to exactly one traffic class (Σ_p C[t][p] = 1). The present network model assumes that all routers use strict priority queues;
5. the tunnel delay requirement L is a |T|×1 vector, where L[t] (≥ 0) represents the maximum total delay acceptable to tunnel t;
6. the tunnel link cost (length) requirement Z is a |T|×1 vector, where Z[t] (≥ 0) represents the maximum total link cost acceptable to tunnel t.
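As an illustrative sketch only (the tunnel values, node indices and NumPy representation are assumptions for demonstration, not from the patent), the vectors and the one-hot class matrix above might be assembled like this:

```python
import numpy as np

# Hypothetical sample of |T| = 3 tunnels and |P| = 2 traffic classes.
tunnels = [
    # (source node, destination node, bandwidth, class, max delay, max cost)
    (0, 2, 5.0, 0, 30.0, 100.0),
    (1, 3, 2.5, 1, 50.0, 120.0),
    (0, 3, 8.0, 0, 40.0, 150.0),
]
P = 2

O = np.array([t[0] for t in tunnels], dtype=float)  # source addresses, |T|x1
D = np.array([t[1] for t in tunnels], dtype=float)  # destination addresses, |T|x1
B = np.array([t[2] for t in tunnels])               # bandwidth requirements, |T|x1
L = np.array([t[4] for t in tunnels])               # delay requirements, |T|x1
Z = np.array([t[5] for t in tunnels])               # link-cost requirements, |T|x1

# One-hot tunnel-class matrix C (|T|x|P|): each row sums to 1,
# matching the constraint sum_p C[t][p] = 1.
C = np.zeros((len(tunnels), P))
C[np.arange(len(tunnels)), [t[3] for t in tunnels]] = 1.0
```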
Further, after the training samples are obtained, the network model is trained iteratively based on them. In one training pass, a TE decision (denoted A, i.e., the network path information corresponding to each sample tunnel) is obtained through the network model based on the training samples. Specifically, the TE decision may be a |T|×|E| matrix characterizing the path allocated in the network topology to each tunnel in T; for example, the value of A[t][e] is 0 or 1, indicating whether tunnel t passes through link e. After the TE decision is obtained, the network state information of the current network is determined from the TE decision; the objective function value corresponding to the current training pass is then determined from the network state information and the chosen objective function; the training gradient of the network model is obtained from the objective function values of the current and previous training passes; and the network parameters in the network model are then adjusted according to the obtained training gradient.
Further, determining the network state information of the current network from the TE decision, determining the objective function value of the current training pass from the network state information and the objective function, obtaining the gradients of the network parameters from the objective function values of the current and previous training passes, and adjusting the network parameters accordingly may be performed in a differentiable TE network simulator (also referred to as dNE). Specifically, as shown in fig. 5, these steps may be performed in the dNE, which comprises two stages: a network state evaluation stage and a network summarization stage. The network state evaluation stage determines the network state information of the current network based on the TE decision; the network summarization stage determines the objective function value of the current training pass from the network state information and obtains the gradients of the network parameters from the objective function values of the current and previous training passes. The network parameters of the TE DNN model are then adjusted according to the gradients obtained in the network summarization stage, where the TE decision is obtained through the network model based on the training samples.
Specifically, when training the network model, the differentiable TE network simulator may be embedded as a "layer" in the network model; that is, the differentiable TE network simulator outputs the gradients corresponding to the network parameters of the network model, and the network parameters are then adjusted based on those gradients using the automatic differentiation capability of the PyTorch programming framework. In the embodiment of the application, the differentiable TE network simulator is implemented using the automatically differentiable programming paradigm of PyTorch, and can therefore update the parameters of a network model that is also implemented in PyTorch directly from the gradients it outputs for the network parameters.
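The role of the dNE as a forward "layer" can be illustrated with a minimal sketch, written in plain Python rather than PyTorch and with made-up dimensions: given the |T|×|E| path-allocation matrix A and the per-tunnel bandwidth demands B, the per-link load is a single matrix product — the kind of operation automatic differentiation handles directly. The function name and data are illustrative, not from the original implementation.

```python
# Minimal sketch of a dNE-style forward pass (illustrative only; the
# document's implementation uses PyTorch tensors so autograd supplies
# the backward pass automatically).
def link_loads(A, B):
    """A: |T| x |E| path-allocation matrix (A[t][e] = 1 if tunnel t uses link e).
    B: length-|T| vector of tunnel bandwidth demands.
    Returns the length-|E| vector of per-link bandwidth loads, i.e. B^T A."""
    num_links = len(A[0])
    loads = [0.0] * num_links
    for t, row in enumerate(A):
        for e, used in enumerate(row):
            loads[e] += B[t] * used
    return loads

A = [[1, 1, 0],   # tunnel 0 uses links 0 and 1
     [0, 1, 1]]   # tunnel 1 uses links 1 and 2
B = [10.0, 5.0]   # bandwidth demands (Kbps)
print(link_loads(A, B))  # link 1 carries both tunnels
```

Because the whole evaluation is one matrix operation, each entry of the result is a differentiable function of A, which is what lets gradients flow back to the model that produced A.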
In another possible implementation manner of this embodiment of the present application, the network path information corresponding to any sample tunnel includes the link identifier of each directional link through which the network path sequentially passes, and determining the network state information of the network based on the obtained network path information includes: determining the predicted attribute information corresponding to each sample tunnel based on the link attribute information of each link included in the network and the links traversed by each network path; and determining the network state information based on the predicted attribute information corresponding to each sample tunnel.
Specifically, the prediction attribute information corresponding to any sample tunnel includes at least one of the following:
link capacity information; link delay information; link cost information.
Specifically, the link capacity N is a vector of dimension |E|×1, where N[e] represents the link capacity of link e, which may remain unchanged or change dynamically; the measured delay M is a vector of dimension |E|×1, where M[e] represents the delay measured on link e, which may likewise remain unchanged or change dynamically; the link cost K is a vector of dimension |E|×1, where K[e] represents the Interior Gateway Protocol (IGP) metric of link e in the network.
Specifically, in the embodiment of the present application, the preset attribute information may be configured in advance, for example, to include the link capacity information; it may also be determined according to the training sample, for example, when the training sample includes a tunnel link cost (length) requirement Z, the preset attribute includes the link cost K; it may also be determined according to the objective function, for example, when the objective function is the delay function max_lat(A, M), the preset attribute information includes the measured delay M.
Further, in the embodiment of the present application, the links through which the network path corresponding to each tunnel passes are obtained based on the network paths (TE decisions) respectively corresponding to the at least one tunnel output by the network model; the preset attribute value corresponding to each link is determined; the preset attribute value of the network path corresponding to each tunnel is determined based on the preset attribute values of those links and the links through which that network path passes; and the network state information is then determined based on the preset attribute values of the network paths corresponding to the tunnels.
For example, the training samples include a tunnel 1 and a tunnel 2, and the preset attribute is link delay. If the network path corresponding to tunnel 1 traverses link 1, link 2, link 3, link 4 and link 5, the link delay value of each of those links is determined, from which the link delay value corresponding to tunnel 1 can be computed. The link delay value corresponding to tunnel 2 is determined in the same way and is not repeated here. After the link delay values corresponding to tunnel 1 and tunnel 2 are calculated, the network state information (that is, the network delay of the current network) is obtained from those two values.
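The per-tunnel computation in this example can be sketched as follows, using the |E|-dimensional measured-delay vector M from the notation above and hypothetical delay values:

```python
def tunnel_delay(path_links, M):
    """Sum the measured link delays M[e] over the links a tunnel's path traverses."""
    return sum(M[e] for e in path_links)

M = [2.0, 3.0, 1.0, 4.0, 5.0]    # hypothetical measured delay per link (ms)
tunnel1_path = [0, 1, 2, 3, 4]   # the example's links 1..5, 0-indexed
tunnel2_path = [0, 2]            # a hypothetical path for tunnel 2
delays = [tunnel_delay(tunnel1_path, M), tunnel_delay(tunnel2_path, M)]
network_delay = max(delays)      # e.g. network state as the maximum tunnel delay
print(delays, network_delay)
```

How the per-tunnel values are aggregated into network state (maximum here) depends on the chosen objective function, as described below in the document.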
Further, adjusting model parameters of the network model based on the network state information includes: determining an objective function value corresponding to the current training of an objective function corresponding to the network model based on the network state information; determining a training gradient of the network model based on the objective function value corresponding to the current training and the objective function value corresponding to the last training; and adjusting model parameters of the network model based on the determined gradient.
Specifically, determining, based on the network state information, an objective function value corresponding to a current training of an objective function corresponding to the network model includes: and determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information and the sample attribute information of the sample tunnel.
Specifically, the objective function includes at least one of:
a time delay function; a bandwidth function; cost function.
Specifically, in the embodiment of the present application, the delay functions may include: invalid_lat(A, M, L), a delay-constraint verification function that checks whether the sum of the link delays on a tunnel's path exceeds the delay requirement of that tunnel, returning 0 if all delay constraints are satisfied; max_lat(A, M), which obtains the maximum delay over all tunnels; and avg_lat(A, M), which obtains the average delay over all tunnels. The bandwidth functions may include: invalid_bw(A, B, N), a bandwidth-constraint verification function that checks whether the sum of the bandwidth requirements on each link exceeds the link capacity; max_bw(A, B, N), which obtains the maximum bandwidth over all tunnels; and avg_bw(A, B, N), which obtains the average bandwidth over all tunnels. The cost functions may include: invalid_cost(A, Z, K), a cost-constraint verification function that checks whether the sum of the costs of all links on a tunnel's path exceeds the cost budget of that tunnel; max_cost(A, Z), which obtains the maximum cost over all tunnels; and avg_cost(A, Z), which obtains the average cost over all tunnels.
For the embodiment of the application, the objective function may be at least one of the above functions or any combination of them; alternatively, a new (user-defined) function may be added when calculating the objective function value and used as the objective function, either alone or in combination with at least one of the above functions.
Specifically, for the first iterative process, in a first case, the objective function value obtained by the first iteration is taken as the gradient for the current adjustment of the network model parameters (for example, the gradient for adjusting the parameters of the DNN model); in a second case, a preset value is subtracted from the objective function value obtained by the first iteration to obtain the gradient for the current adjustment of the network parameters. The embodiments of the present application are not limited thereto.
Specifically, for a non-first iteration, the gradient for adjusting the parameters of the network model is determined as the objective function value obtained by the current iteration minus the objective function value obtained by the previous iteration.
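A minimal sketch of this update rule, reduced to a single scalar parameter for readability (in the actual implementation the gradients come from PyTorch autograd and apply per-parameter; the learning rate is a made-up value):

```python
def training_step(param, obj_now, obj_prev, lr=0.01):
    """Non-first iteration: use the change in objective value between
    consecutive trainings as the (scalar) gradient signal, then apply
    gradient descent to the model parameter."""
    grad = obj_now - obj_prev
    return param - lr * grad

p = 1.0
# Objective improved from 1.0 to 0.8, so grad = -0.2 and p increases slightly.
p = training_step(p, obj_now=0.8, obj_prev=1.0)
print(p)
```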
Further, after obtaining the gradient corresponding to the parameters of the network model, the model parameters of the network model are adjusted according to the gradient by a gradient descent method so as to train and optimize the network model.
Further, the network model is trained in the above iterative manner until a preset condition is met.
Specifically, the preset conditions may include: the number of iterations exceeds a preset threshold, or the gradient corresponding to the network parameters of the network model falls within a preset range.
For example, the preset threshold corresponding to the preset iteration number may be 100, and when the iteration number is equal to 100, training the network model is stopped to obtain a trained network model; for another example, the preset range corresponding to the gradient is [ -0.01,0.01], and when the gradient is detected to be in the preset range, training of the network model is stopped to obtain the trained network model.
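The two stopping conditions from these examples (threshold 100 and gradient range [−0.01, 0.01]) can be checked together; the function name and parameter defaults are illustrative:

```python
def should_stop(iteration, grad, max_iters=100, grad_range=(-0.01, 0.01)):
    """Stop training when the iteration count reaches the preset threshold
    or the gradient falls within the preset range."""
    lo, hi = grad_range
    return iteration >= max_iters or lo <= grad <= hi

print(should_stop(100, 0.5))   # iteration threshold reached
print(should_stop(3, 0.005))   # gradient inside [-0.01, 0.01]
print(should_stop(3, 0.5))     # keep training
```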
Further, after the trained network model is obtained, network path information corresponding to each tunnel may be obtained based on the trained network model, which is described in detail in the following embodiments.
An embodiment of the present application provides a network path allocation method, as shown in fig. 2, performed by an electronic device, for example, the electronic device may be a central controller, and the method may include:
step S201, obtaining tunnel information of a network tunnel.
Specifically, in the embodiment of the present application, the tunnel information of the network tunnel may be the tunnel information of one network tunnel, or may be the tunnel information of at least two network tunnels.
Specifically, the tunnel information has the same form as the sample tunnel information; that is, the tunnel information may include: the source address of the tunnel, the destination address of the tunnel, and attribute information of the tunnel. The specific description of the tunnel information is detailed in the above embodiments and is not repeated here.
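The tunnel information described above can be represented as a simple record. All field names and example values below are illustrative, not from the original implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TunnelInfo:
    """Tunnel information: source address, destination address, and the
    optional attribute fields listed in the embodiments above."""
    source: str
    destination: str
    bandwidth_kbps: Optional[float] = None    # bandwidth requirement B
    delay_ms: Optional[float] = None          # tunnel delay requirement L
    link_cost_budget: Optional[float] = None  # tunnel link cost requirement Z

t = TunnelInfo("10.0.0.1", "10.0.1.1", bandwidth_kbps=10.0)
print(t.source, t.bandwidth_kbps)
```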
And step S202, performing network path allocation processing on the tunnel information of the network tunnel through the trained network model to obtain the network path information corresponding to the network tunnel.
For the embodiment of the present application, the trained network model is a network model obtained by training based on the network path allocation model training method described in the above embodiments, that is, a network model that meets the preset condition.
For the embodiment of the application, network path allocation is performed on the tunnel information of the network tunnel through the trained network model to obtain the network path information corresponding to the network tunnel, that is, the links through which the network tunnel sequentially passes.
Compared with the prior art, the embodiment of the application acquires the tunnel information of the network tunnel and then performs network path allocation processing on that tunnel information through the trained network model to obtain the network path information corresponding to the network tunnel. In the embodiment of the application, the network path information corresponding to each sample tunnel is predicted from the training samples, the network state of the network is determined according to that network path information, and the parameters of the network model are adjusted according to the network state. The network environment, real network devices, protocol behavior and the like therefore do not need to be continuously simulated for the training samples, which reduces model training time, improves training efficiency, and reduces the number of samples required for training.
The foregoing embodiments describe how to train a network model to obtain a trained network path allocation model, and how to allocate a network path for a network tunnel based on that model. The following takes LSTM as an example and describes how an LSTM network model is trained into a network path allocation model and how network paths are allocated for network tunnels based on the trained LSTM model, as detailed in the following embodiments:
taking traffic engineering based on long short-term memory networks (LSTM-based TE, LSTM-TE) as an example: LSTM is a special RNN composed of a plurality of cells. As shown in fig. 3, the LSTM includes the LSTM cell at time t−1, the LSTM cell at time t, and the LSTM cell at time t+1. For each element in the input sequence, each LSTM cell (taking the LSTM cell at time t as an example) performs the following operations:
The first step: the LSTM cell determines which information to discard from the information passed on by the previous cell. This is decided by a sigmoid layer (also called the forget gate layer) and computed by the formula f_t = σ(W_if·x_t + b_if + W_hf·h_(t−1) + b_hf), whose inputs are h_(t−1) and x_t.
The second step: the LSTM cell determines what information to store in the current cell state. This step comprises two parts: a sigmoid layer (also called the input gate) and a tanh layer. The sigmoid layer determines which values need to be updated, computed by the formula i_t = σ(W_ii·x_t + b_ii + W_hi·h_(t−1) + b_hi); the tanh layer creates a new candidate vector g_t that can be added to the cell state, computed by the formula g_t = tanh(W_ig·x_t + b_ig + W_hg·h_(t−1) + b_hg). The third step: the old cell state c_(t−1) is updated by the formula c_t = f_t ∗ c_(t−1) + i_t ∗ g_t. The fourth step: the current cell determines the output information, which again comprises a sigmoid layer and a tanh layer. The sigmoid layer determines which parts of the cell state to output, computed by the formula o_t = σ(W_io·x_t + b_io + W_ho·h_(t−1) + b_ho); the tanh layer limits the range of the output values to [−1, 1], giving h_t = o_t ∗ tanh(c_t).
Here, h_t denotes the hidden state at time t, c_t the cell state at time t, x_t the input at time t, and h_(t−1) the hidden state of the corresponding LSTM layer at time t−1 (or the initial state at time 0); i_t, f_t, g_t and o_t denote the input, forget, cell and output gates, respectively; W denotes the weights of the different functions within the LSTM cell and b the bias of the corresponding function, as the different rectangular nodes in fig. 3 represent different functions within the cell; σ denotes the sigmoid activation function, and ∗ denotes the Hadamard product.
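The four-gate computation above can be sketched directly from the formulas. Scalar weights are used for readability (real LSTM cells use weight matrices), and all weight values are made up:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following the gate formulas above. W and b hold
    the (scalar, for illustration) input/hidden weights and biases for the
    four gates: i (input), f (forget), g (cell candidate), o (output)."""
    i_t = sigmoid(W["ii"] * x_t + b["ii"] + W["hi"] * h_prev + b["hi"])
    f_t = sigmoid(W["if"] * x_t + b["if"] + W["hf"] * h_prev + b["hf"])
    g_t = math.tanh(W["ig"] * x_t + b["ig"] + W["hg"] * h_prev + b["hg"])
    o_t = sigmoid(W["io"] * x_t + b["io"] + W["ho"] * h_prev + b["ho"])
    c_t = f_t * c_prev + i_t * g_t    # update the old cell state
    h_t = o_t * math.tanh(c_t)       # hidden state, bounded to [-1, 1]
    return h_t, c_t

keys = ("ii", "hi", "if", "hf", "ig", "hg", "io", "ho")
W = {k: 0.5 for k in keys}   # hypothetical weights
b = {k: 0.0 for k in keys}   # hypothetical biases
h, c = lstm_cell_step(1.0, 0.0, 0.0, W, b)
print(h, c)
```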
Further, in the embodiments of the present application, in LSTM-TE the network environment is treated as a discrete event dynamic system. Specifically, at the beginning of a training iteration, for an input containing |S| tunnels, a vector containing one tunnel's information is input into the LSTM cell at each time step. After all |S| tunnels have been processed in |S| time steps, the final hidden state of the current training epoch is obtained as the output A. A is a |T|×|E| matrix representing the path each of the |T| tunnels is allocated in the network topology. A is then input into the dNE to calculate the network state information of the current network, the gradient for tuning the parameters is calculated from that network state information and the objective function, and the parameters in the LSTM are adjusted according to that gradient, thereby completing one epoch of LSTM model training in the dNE.
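The per-epoch data flow described above can be sketched as feeding the |S| tunnel vectors through a cell one time step at a time and taking the final hidden state as the output. A trivial stub stands in for the real LSTM cell here, so the sketch shows only the sequencing, not the recurrence itself:

```python
def run_epoch(tunnel_vectors, cell_step, h0=0.0, c0=0.0):
    """Feed |S| tunnel-information vectors through an LSTM cell one time
    step at a time; the final hidden state is the epoch output A."""
    h, c = h0, c0
    for x_t in tunnel_vectors:
        h, c = cell_step(x_t, h, c)
    return h

# A stub cell (not a real LSTM) just to make the data flow concrete:
def stub_cell(x_t, h, c):
    return h + x_t, c

print(run_epoch([1.0, 2.0, 3.0], stub_cell))
```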
Further, after the LSTM-TE model is obtained through training, tunnel information of at least one network tunnel is obtained through the trained LSTM-TE to obtain network paths corresponding to the network tunnels respectively.
Further, the LSTM model obtained by training in the embodiment of the present application (referred to as LSTM-TE) was compared for network path allocation on tunnels against existing TE algorithms: Constrained Shortest Path First (CSPF), Equal-Cost Multi-Path (ECMP), k-Shortest Paths (KSP) + Multi-Commodity Flow (MCF), SMORE, and traffic engineering based on deep reinforcement learning (DRL-based TE, DRL-TE). LSTM-TE shows certain advantages in throughput, congestion packet loss, and maximum congestion. As shown in fig. 4, fig. 4 illustrates the variation in traffic data over a period of 35 hours using metrics such as throughput normalized by total traffic demand, maximum congestion (maximum link utilization), and congestion packet loss (the congestion-induced packet loss normalized by total traffic demand). LSTM-TE was found to perform better than ECMP and KSP+MCF on all metrics, and shows comparable throughput and congestion packet loss compared to the best-performing existing non-DNN algorithms, CSPF and SMORE. Compared with ECMP, LSTM-TE improves throughput by 13.1% and reduces congestion packet loss by 93.9%; against KSP+MCF, the corresponding gains are 0.8% and 51.8%, respectively. LSTM-TE performs worse than CSPF and SMORE in terms of maximum congestion. Despite the limitations of single-path routing, LSTM-TE still exceeds DRL-TE in throughput and congestion packet loss (gains of 7.0% and 89.8%, respectively). That is, in the embodiment of the present application, any DNN model may be trained as a network path allocation model, used to solve TE problems, and obtain substantial performance benefits.
Specifically, CSPF is a greedy algorithm that searches for the shortest path in the network meeting a tunnel's requirements and uses it as the tunnel's allocated path; for batch tunnel requirements, CSPF is performed for each tunnel one by one. ECMP adjusts link weights to calculate the tunnel paths, with each tunnel's path calculated via Dijkstra's algorithm based on the link weights. In KSP+MCF, the k shortest paths are first calculated, and the bandwidth allocation or flow-splitting problem is then abstracted as an MCF problem and solved with linear programming. SMORE is similar to KSP+MCF, except that SMORE does not take the k shortest paths but instead uses paths generated by an oblivious routing algorithm. DRL-TE uses DRL to solve the TE problem.
Further, the embodiments of the present application define the training speed of the dNE (the manner in which the network model is trained in the embodiments of the present application) as the time from taking a TE action until the effect of that action is observed. Taking a TE action refers to applying the decision of the algorithm model (here, the path allocation of a tunnel) to the actual TE network, that is, allocating the tunnel to a specific network path according to the TE decision while meeting the tunnel's requirements. The effect of the action is the TE network state information (such as network link utilization and link delay) obtained after the TE network has converged. The time from taking a TE action to observing its effect is the duration of the process in which applying the TE algorithm model's decision to the actual TE network causes the TE network state to change and re-converge to a new final state; speed is defined as the duration of this process. In the present embodiment, the dNE is compared against an existing general-purpose network simulator, OMNeT++, with a network topology comprising 100 nodes and 500 1 Gbps links established on both the dNE and OMNeT++. The embodiment of the application creates 100 tunnels with 10 Kbps bandwidth requirements and places them evenly into the network, then changes the placement of an arbitrary tunnel and observes the corresponding change in link bandwidth usage. This process is repeated 1000 times, measuring for each tunnel placement the time taken to observe the corresponding link utilization change. The findings are as follows: for OMNeT++, the observed average change delay is 228.3 milliseconds and the 99th-percentile delay is 1.594 seconds, whereas for the dNE the change caused by the corresponding action is almost immediate (99th-percentile delay of 0.977 ms).
Because the dNE only needs one matrix operation, it responds extremely fast compared with existing general-purpose network simulators, and the training speed can be improved by more than 228 times.
Further, scalability is a major concern for prior-art general-purpose discrete event simulators: as the network scale increases, the simulation process becomes very slow. The present embodiment expands the network in the above experiment to one comprising 1000 nodes and 5000 links. For OMNeT++, the time to observe the change for each tunnel placement was found to increase from 228.3 milliseconds to 29.1 seconds. This means that for an SD-WAN network containing 1000 routers, each training iteration on OMNeT++ requires approximately 0.5 minutes. Considering that most deep learning models require thousands of iterations, the total time required to train using a prior-art general-purpose network simulator is unacceptable. The dNE, on the other hand, is insensitive to changes in network size, since changing the network size in the dNE only requires changing the dimensions of tensors in the network model; scaling up the network was found to cause little change in matrix computation time. This suggests that, compared with traditional network simulators, the dNE can be scaled to larger networks while keeping model evaluation and training speeds unchanged.
The above-described embodiments describe the network path allocation model training method and the network path allocation method from the viewpoint of the method flow; the following embodiments describe the network path allocation model training device and the network path allocation device from the viewpoint of virtual modules or virtual units, as detailed below, wherein,
The embodiment of the present application provides a network path allocation model training device, as shown in fig. 6, the network path allocation model training device 60 may include: a first acquisition module 61, a training module 62, wherein,
the first obtaining module 61 is configured to obtain a training sample, where the training sample includes sample tunnel information corresponding to at least one sample tunnel in the network;
the training module 62 is configured to iteratively train a preset network model based on the training sample until a preset condition is met, and take the network model meeting the preset condition as a network path allocation model;
wherein, the training module 62 is specifically configured to, when training the network model once:
inputting training samples into a network model, and predicting to obtain network path information corresponding to each sample tunnel;
determining network state information of a network based on the obtained network path information;
model parameters of the network model are adjusted based on the network state information.
In one possible implementation manner of the embodiment of the present application, the sample tunnel information includes a source address of the sample tunnel, a target address of the sample tunnel, and sample attribute information of the sample tunnel.
In one possible implementation manner of the embodiment of the present application, sample attribute information of a sample tunnel includes: at least one of a bandwidth requirement of the sample tunnel, a tunnel class of the sample tunnel, a tunnel latency requirement of the sample tunnel, or a tunnel link cost of the sample tunnel.
In one possible implementation manner of the embodiment of the present application, the network path information corresponding to any one sample tunnel includes link identifiers of each directional link through which the network path sequentially passes,
the training module 62 is specifically configured to, when determining the network state information of the network based on the obtained network path information:
determining the predicted attribute information corresponding to each sample tunnel based on the link attribute information of each link included in the network and the links traversed by each network path;
and determining network state information based on the prediction attribute information corresponding to each sample tunnel.
In one possible implementation manner of the embodiment of the present application, the prediction attribute information corresponding to any sample tunnel includes at least one of the following:
link capacity information; link delay information; link cost information.
In one possible implementation manner of the embodiment of the present application, when the training module 62 adjusts the model parameters of the network model based on the network state information, the training module is specifically configured to:
determining an objective function value corresponding to the current training of an objective function corresponding to the network model based on the network state information;
determining a training gradient of the network model based on the objective function value corresponding to the current training and the objective function value corresponding to the last training;
And adjusting model parameters of the network model based on the determined gradient.
In one possible implementation manner of this embodiment of the present application, when determining, based on the network state information, an objective function value corresponding to the current training of the objective function corresponding to the network model, the training module 62 is specifically configured to:
and determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information and the sample attribute information of the sample tunnel.
One possible implementation manner of the embodiment of the present application, the objective function includes at least one of the following:
a time delay function; a bandwidth function; cost function.
The training device for the network path allocation model in this embodiment may execute the training method for the network path allocation model shown in the foregoing method embodiment, and its implementation principle is similar, and will not be described herein again.
Compared with the prior art, the embodiment of the application provides a network path allocation model training device. A training sample is obtained that includes sample tunnel information corresponding to at least one sample tunnel in the network, and a preset network model is then iteratively trained based on the training sample until a preset condition is met, with the network model meeting the preset condition used as the network path allocation model. One training of the network model includes: inputting the training sample into the network model and predicting the network path information corresponding to each sample tunnel; determining the network state information of the network based on the obtained network path information; and adjusting the model parameters of the network model based on the network state information. In the embodiment of the application, the network state corresponding to the current network can be calculated for each training without continuously simulating the network environment, real network devices, protocol behavior and the like for the training samples, which reduces model training time, improves training efficiency, and reduces the number of samples required for training.
The embodiment of the present application further provides a network path allocation device, as shown in fig. 7, where the network path allocation device 70 includes: a second acquisition module 71, an allocation module 72, wherein,
a second obtaining module 71, configured to obtain tunnel information of a network tunnel;
the allocation module 72 is configured to perform network path allocation processing on the tunnel information of the network tunnel through the trained network model shown in the above method embodiment, so as to obtain network path information corresponding to the network tunnel.
Compared with the prior art, the embodiment of the application acquires the tunnel information of the network tunnel and then performs network path allocation processing on that tunnel information through the trained network model to obtain the network path information corresponding to the network tunnel. In the embodiment of the application, the network path information corresponding to each sample tunnel is predicted from the training samples, the network state of the network is determined according to that network path information, and the parameters of the network model are adjusted according to the network state. The network environment, real network devices, protocol behavior and the like therefore do not need to be continuously simulated for the training samples, which reduces model training time, improves training efficiency, and reduces the number of samples required for training.
The network path allocation device of this embodiment may execute the network path allocation method shown in the foregoing method embodiment, and its implementation principle is similar, and will not be described herein.
In the above embodiments, the network path model training device and the network path allocation device are described from the viewpoint of virtual modules. The following embodiment describes, from the viewpoint of a physical device, an electronic device that may be used to perform the network path allocation model training method and the network path allocation method shown in the above embodiments, as detailed below.
The embodiment of the application provides an electronic device. As shown in fig. 8, the electronic device 8000 includes a processor 8001 and a memory 8003, the processor 8001 being coupled to the memory 8003, for example via a bus 8002. Optionally, the electronic device 8000 may further include a transceiver 8004. In practical applications, the number of transceivers 8004 is not limited to one, and the structure of the electronic device 8000 does not limit the embodiments of the present application.
The processor 8001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 8001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 8002 may include a path for transferring information between the above components. The bus 8002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 8, but this does not mean that there is only one bus or one type of bus.
The memory 8003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 8003 is used to store application program code for executing the solutions of the present application, and execution is controlled by the processor 8001. The processor 8001 is configured to execute the application program code stored in the memory 8003 to implement the network path allocation model training method shown in the foregoing method embodiments.
The embodiment of the application provides an electronic device comprising a memory and a processor, the memory storing at least one program which, when executed by the processor, performs the following: acquiring a training sample comprising sample tunnel information corresponding to at least one sample tunnel in a network; performing iterative training on a preset network model based on the training sample until a preset condition is met; and taking the network model meeting the preset condition as the network path allocation model. One training pass of the network model comprises: inputting the training sample into the network model and predicting the network path information corresponding to each sample tunnel; determining network state information of the network based on the obtained network path information; and adjusting model parameters of the network model based on the network state information. Because the network state corresponding to the current network can be computed directly in each training pass, there is no need to repeatedly simulate the network environment, real network devices, protocol operation and the like for the training samples; this reduces training time, improves training efficiency, and reduces the number of samples required for training.
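The training pass described above can be illustrated with a toy sketch (not the patented implementation): a softmax over candidate paths per tunnel stands in for the network model, a hand-written differentiable computation of link loads stands in for the traffic engineering network simulator, and the objective penalises link utilisation. The topology, demands, capacities, and candidate paths are all invented for illustration.

```python
import numpy as np

# Toy setup: 2 tunnels, each choosing between 2 candidate paths over 3 links.
# path_links[t][p] = indicator vector of the links used by path p of tunnel t.
path_links = np.array([
    [[1, 0, 1], [0, 1, 1]],   # tunnel 0: path 0 uses links 0,2; path 1 uses links 1,2
    [[1, 1, 0], [0, 1, 1]],   # tunnel 1: path 0 uses links 0,1; path 1 uses links 1,2
], dtype=float)
demand = np.array([1.0, 2.0])         # bandwidth demand per tunnel (hypothetical)
capacity = np.array([2.0, 2.0, 3.0])  # link capacities (hypothetical)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def network_state(theta):
    """Differentiable 'simulator': soft path choices -> per-link load."""
    load = np.zeros(3)
    for t in range(2):
        w = softmax(theta[t])                  # soft assignment over candidate paths
        load += demand[t] * (w @ path_links[t])
    return load

def objective(theta):
    # Penalise squared link utilisation; minimising this spreads traffic.
    u = network_state(theta) / capacity
    return float(np.sum(u ** 2))

theta = np.zeros((2, 2))   # model parameters: per-tunnel path logits
lr, eps = 0.5, 1e-5
for step in range(200):
    grad = np.zeros_like(theta)
    for i in np.ndindex(theta.shape):          # finite-difference gradient
        d = np.zeros_like(theta); d[i] = eps
        grad[i] = (objective(theta + d) - objective(theta - d)) / (2 * eps)
    theta -= lr * grad                         # gradient descent on model parameters

paths = [int(np.argmax(theta[t])) for t in range(2)]
print("chosen paths:", paths, "final objective:", round(objective(theta), 3))
```

Because the "simulator" is plain arithmetic on the predicted soft path assignment, the objective can be differentiated (here crudely, by finite differences) and the model parameters updated by gradient descent, with no packet-level simulation in any training pass — which is the efficiency point the embodiment makes.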
The present application provides a computer readable storage medium having a computer program stored thereon which, when run on a computer, causes the computer to perform the corresponding method embodiments described above. Compared with the prior art, the method acquires a training sample comprising sample tunnel information corresponding to at least one sample tunnel in a network, performs iterative training on a preset network model based on the training sample until a preset condition is met, and takes the network model meeting the preset condition as the network path allocation model. One training pass of the network model comprises: inputting the training sample into the network model and predicting the network path information corresponding to each sample tunnel; determining network state information of the network based on the obtained network path information; and adjusting model parameters of the network model based on the network state information. Because the network state corresponding to the current network can be computed directly in each training pass, there is no need to repeatedly simulate the network environment, real network devices, protocol operation and the like for the training samples; this reduces training time, improves training efficiency, and reduces the number of samples required for training.
The embodiment of the application provides an electronic device. As shown in fig. 9, the electronic device 9000 includes a processor 9001 and a memory 9003, the processor 9001 being coupled to the memory 9003, for example via a bus 9002. Optionally, the electronic device 9000 may further include a transceiver 9004. In practical applications, the number of transceivers 9004 is not limited to one, and the structure of the electronic device 9000 does not limit the embodiments of the present application.
The processor 9001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 9001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 9002 may include a path for transferring information between the above components. The bus 9002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean that there is only one bus or one type of bus.
The memory 9003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 9003 is used to store application program code for executing the solutions of the present application, and execution is controlled by the processor 9001. The processor 9001 is configured to execute the application program code stored in the memory 9003 to implement the network path allocation method shown in the foregoing method embodiments.
The embodiment of the application provides an electronic device comprising a memory and a processor, the memory storing at least one program which, when executed by the processor, performs the following: acquiring tunnel information of a network tunnel, and performing network path allocation processing on that tunnel information through the trained network model to obtain the network path information corresponding to the network tunnel. In the embodiments of the application, during training the network path information corresponding to each sample tunnel is predicted from the training samples, the network state of the network is determined from that path information, and the parameters of the network model are adjusted according to the network state; there is no need to repeatedly simulate the network environment, real network devices, protocol operation and the like for the training samples, so training time is reduced, training efficiency is improved, and fewer training samples are required.
The present application provides a computer readable storage medium having a computer program stored thereon which, when run on a computer, causes the computer to perform the corresponding method embodiments described above. Compared with the prior art, the method acquires tunnel information of a network tunnel and performs network path allocation processing on that tunnel information through the trained network model, obtaining the network path information corresponding to the network tunnel. During training, the network path information corresponding to each sample tunnel is predicted from the training samples, the network state of the network is determined from that path information, and the parameters of the network model are adjusted according to the network state; there is no need to repeatedly simulate the network environment, real network devices, protocol operation and the like for the training samples, so training time is reduced, training efficiency is improved, and fewer training samples are required.
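At inference time, the allocation step described above reduces to a forward pass of the trained model. A minimal sketch follows, in which the addresses, link identifiers, candidate paths, and "trained" logits are all hypothetical placeholders standing in for a real trained model:

```python
import numpy as np

# Hypothetical trained model: per tunnel endpoint pair, logits over candidate paths.
# In the embodiments the model is trained offline; these weights are made up.
candidate_paths = {
    ("10.0.0.1", "10.0.3.1"): [["l1", "l3"], ["l2", "l3"]],  # link-id sequences
}
trained_logits = {("10.0.0.1", "10.0.3.1"): np.array([1.2, -0.4])}

def allocate_path(tunnel):
    """Map tunnel information -> network path information (sequence of link ids)."""
    key = (tunnel["src"], tunnel["dst"])
    logits = trained_logits[key]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                  # softmax over candidate paths
    best = int(np.argmax(probs))          # pick the highest-scoring candidate
    return candidate_paths[key][best]

path = allocate_path({"src": "10.0.0.1", "dst": "10.0.3.1", "bandwidth": 5.0})
print(path)  # -> ['l1', 'l3']
```

A production model would score paths from the full tunnel information (bandwidth requirement, tunnel class, latency requirement, link cost) rather than a lookup table; the sketch only shows the shape of the mapping from tunnel information to network path information.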
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be executed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (12)

1. A method for training a network path allocation model, comprising:
acquiring a training sample, wherein the training sample comprises sample tunnel information corresponding to at least one sample tunnel in a network;
performing iterative training on a preset network model based on the training sample through a differentiable traffic engineering network simulator until a preset condition is met, and taking the network model meeting the preset condition as a network path allocation model;
the method for training the network model once comprises the following steps:
inputting the training samples into the network model, and predicting to obtain network path information corresponding to each sample tunnel;
determining network state information of the network based on the obtained network path information;
determining an objective function value corresponding to the current training of an objective function corresponding to the network model based on the network state information;
determining a training gradient of the network model based on the objective function value corresponding to the current training and the objective function value corresponding to the last training;
and adjusting model parameters of the network model by a gradient descent method based on the determined gradient.
2. The method of claim 1, wherein the sample tunnel information comprises a source address of the sample tunnel, a destination address of the sample tunnel, and sample attribute information of the sample tunnel.
3. The method of claim 2, wherein the sample attribute information of the sample tunnel comprises: at least one of a bandwidth requirement of the sample tunnel, a tunnel class of the sample tunnel, a tunnel latency requirement of the sample tunnel, or a tunnel link cost of the sample tunnel.
4. The method of any one of claims 1-3, wherein the network path information corresponding to any one of the sample tunnels includes a link identification of each directional link through which the network path sequentially passes, and
the determining network state information of the network based on the obtained network path information includes:
determining predicted attribute information corresponding to each sample tunnel based on link attribute information of each link included in the network and the link information passed by each network path;
and determining the network state information based on the prediction attribute information corresponding to each sample tunnel.
5. The method of claim 4, wherein the predicted attribute information corresponding to any sample tunnel comprises at least one of:
link capacity information; link delay information; link cost information.
6. The method of claim 1, wherein determining the objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information comprises:
and determining an objective function value corresponding to the current training of the objective function corresponding to the network model based on the network state information and the sample attribute information of the sample tunnel.
7. The method according to claim 1 or 6, wherein the objective function comprises at least one of:
a time delay function; a bandwidth function; cost function.
8. A network path allocation method, comprising:
acquiring tunnel information of a network tunnel;
and performing network path allocation processing on the tunnel information of the network tunnel through the network path allocation model trained by the method of any one of claims 1-7, to obtain the network path information corresponding to the network tunnel.
9. A network path allocation model training device, comprising:
the first acquisition module is used for acquiring training samples, wherein the training samples comprise sample tunnel information corresponding to at least one sample tunnel in a network;
the training module is used for performing iterative training on a preset network model based on the training sample through the differentiable traffic engineering network simulator until a preset condition is met, and taking the network model meeting the preset condition as a network path allocation model;
the training module is specifically configured to, when training the network model once:
inputting the training samples into the network model, and predicting to obtain network path information corresponding to each sample tunnel;
determining network state information of the network based on the obtained network path information;
determining an objective function value corresponding to the current training of an objective function corresponding to the network model based on the network state information;
determining a training gradient of the network model based on the objective function value corresponding to the current training and the objective function value corresponding to the last training;
and adjusting model parameters of the network model by a gradient descent method based on the determined gradient.
10. A network path allocation apparatus, comprising:
the second acquisition module is used for acquiring the tunnel information of the network tunnel;
and the allocation module is used for performing network path allocation processing on the tunnel information of the network tunnel through the network path allocation model trained by the method of any one of claims 1-7, to obtain the network path information corresponding to the network tunnel.
11. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method of any one of claims 1-8.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method of any of claims 1-8.
CN202010130022.5A 2020-02-28 2020-02-28 Network path allocation model training method, path allocation method and device Active CN111340192B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010130022.5A CN111340192B (en) 2020-02-28 2020-02-28 Network path allocation model training method, path allocation method and device

Publications (2)

Publication Number Publication Date
CN111340192A CN111340192A (en) 2020-06-26
CN111340192B true CN111340192B (en) 2023-06-30

Family

ID=71185826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010130022.5A Active CN111340192B (en) 2020-02-28 2020-02-28 Network path allocation model training method, path allocation method and device

Country Status (1)

Country Link
CN (1) CN111340192B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653680B (en) * 2020-12-14 2022-04-12 广东电网有限责任公司 Model training method, network situation prediction method, device, equipment and medium
CN114567517A (en) * 2022-01-17 2022-05-31 深圳绿米联创科技有限公司 Parameter adjusting method and device and server
CN114756211B (en) * 2022-05-13 2022-12-16 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN104951836A (en) * 2014-03-25 2015-09-30 上海市玻森数据科技有限公司 Posting predication system based on nerual network technique
CN107852365A (en) * 2015-08-19 2018-03-27 思科技术公司 Dynamic VPN Policy model with encryption and traffic engineering parsing
WO2018176385A1 (en) * 2017-03-31 2018-10-04 Huawei Technologies Co., Ltd. System and method for network slicing for service-oriented networks

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US11108678B2 (en) * 2017-12-18 2021-08-31 Cisco Technology, Inc. Inspired path computation in a network
CN108462644A (en) * 2018-02-09 2018-08-28 深圳市唯特视科技有限公司 A kind of dynamics route selection technology based on convolutional neural networks
CN108667734B (en) * 2018-05-18 2020-12-08 南京邮电大学 Fast route decision-making method based on Q learning and LSTM neural network
CN109768940B (en) * 2018-12-12 2020-12-29 北京邮电大学 Flow distribution method and device for multi-service SDN
CN110336754B (en) * 2019-05-09 2020-04-21 北京邮电大学 Network traffic configuration method and device

Non-Patent Citations (3)

Title
Automated Traffic Engineering in SDWAN: Beyond Reinforcement Learning; Libin Liu et al.; IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS); pp. 430-435 *
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport; Lénaïc Chizat et al.; 32nd Conference on Neural Information Processing Systems (NeurIPS 2018); pp. 1-11 *
Deep Reinforcement Learning Based Resource Allocation Algorithm for Cellular Networks; Liao Xiaomin et al.; Journal on Communications; Vol. 40, No. 2; 2019002-1:8 *

Also Published As

Publication number Publication date
CN111340192A (en) 2020-06-26

Similar Documents

Publication Publication Date Title
CN111340192B (en) Network path allocation model training method, path allocation method and device
Mao et al. Routing or computing? The paradigm shift towards intelligent computer network packet transmission based on deep learning
CN107911299B (en) Deep Q learning-based routing planning method
JP6831527B2 (en) Route control method and route setting device
CN113412494B (en) Method and device for determining transmission strategy
CN113518035B (en) Route determining method and device
Li et al. Traffic modeling and optimization in datacenters with graph neural network
Lei et al. Congestion control in SDN-based networks via multi-task deep reinforcement learning
Cho et al. Qos-aware workload distribution in hierarchical edge clouds: A reinforcement learning approach
Belgaum et al. Artificial intelligence based reliable load balancing framework in software-defined networks
Liu et al. Automated traffic engineering in SDWAN: Beyond reinforcement learning
CN116541106B (en) Computing task unloading method, computing device and storage medium
CN108737130B (en) Network flow prediction device and method based on neural network
CN117014355A (en) TSSDN dynamic route decision method based on DDPG deep reinforcement learning algorithm
CN114422453B (en) Method, device and storage medium for online planning of time-sensitive stream
Bakhshi et al. Model-based reinforcement learning framework of online network resource allocation
CN115150335A (en) Optimal flow segmentation method and system based on deep reinforcement learning
CN115022231A (en) Optimal path planning method and system based on deep reinforcement learning
CN114254735A (en) Distributed botnet model construction method and device
Xu et al. A Graph reinforcement learning based SDN routing path selection for optimizing long-term revenue
Pan et al. A hybrid neural network and genetic algorithm approach for multicast QoS routing
CN116384498B (en) Parallel training method for variable component sub-algorithm circuit and storage medium
KR20200126212A (en) Deep learning-based dynamic routing technology to achieve maximum user request throughput and minimum intra-communication latency in high performance computing environments with interconnection network based multi-node cluster
Bouchmal et al. From classical to quantum machine learning: Survey on routing optimization in 6G software defined networking
Zhang et al. MLP Modeling and Prediction of IP Subnet Packets Forwarding Performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code; Ref country code: HK; Ref legal event code: DE; Ref document number: 40023607; Country of ref document: HK

GR01 Patent grant