CN111901237A - Source routing method and system, related device and computer readable storage medium - Google Patents


Info

Publication number
CN111901237A
CN111901237A
Authority
CN
China
Prior art keywords
path
information
network
source routing
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910373196.1A
Other languages
Chinese (zh)
Other versions
CN111901237B (en)
Inventor
李亦然
徐葳
蔡庆芃
郑顺
胡苏�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Huawei Technologies Co Ltd
Original Assignee
Tsinghua University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Huawei Technologies Co Ltd filed Critical Tsinghua University
Priority to CN201910373196.1A priority Critical patent/CN111901237B/en
Publication of CN111901237A publication Critical patent/CN111901237A/en
Application granted granted Critical
Publication of CN111901237B publication Critical patent/CN111901237B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/34Source routing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/02Topology update or discovery
    • H04L45/08Learning-based routing, e.g. using neural networks or artificial intelligence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/12Shortest path evaluation
    • H04L45/124Shortest path evaluation using a combination of metrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present application provides a source routing method, a source routing system, a source routing device, a computer device, and a computer-readable storage medium. Local synchronous information and global asynchronous information are combined, and a path decision is made according to both, so that target path information is obtained; this effectively improves the routing efficiency and transmission rate of source routing.

Description

Source routing method and system, related device and computer readable storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a source routing method, a source routing system, a source routing device, a computer device, and a computer-readable storage medium.
Background
Cloud providers and data center operators have been striving to improve the performance of data center networks. Although the throughput of a data center can be improved by extending the link bandwidth and adding redundant links, reducing network data flow delay remains a challenging problem. Moreover, latency has been a key performance bottleneck for various applications (e.g., Web services, search, storage) supported by data centers.
One way to improve latency is to take full advantage of the redundant paths available in modern data centers and select the redundant path with the least latency. Multi-path topologies have been adopted to increase the available bandwidth and thereby reduce latency. The most widely used method to reduce delay by utilizing redundant paths is Equal-Cost Multi-Path (ECMP). ECMP randomly selects a path for each data Flow among the shortest paths. Assuming that traffic patterns are sufficiently uniform, the ECMP method can balance the network load well, thereby improving latency. However, data center traffic patterns can change rapidly over short periods of time and can sometimes be very unbalanced. Therefore, random selection may cause considerable congestion, thereby increasing delay.
Disclosure of Invention
In view of the above-mentioned shortcomings of the related art, it is an object of the present application to provide a source routing method, a source routing system, a source routing device, a computer device and a computer readable storage medium to solve the above-mentioned problems.
To achieve the above and other related objects, a first aspect of the present application provides a source routing method, comprising: determining a plurality of candidate paths when at least one data packet is received from a host of a network, and observing the flow state of the network in real time; determining state information related to a plurality of observed paths in the observed flow states as local synchronization information; and performing path decision according to the local synchronous information and the acquired global asynchronous information to output a data packet containing target path information to a switch in the network.
A second aspect of the present application also provides a source routing device, including: the network interface is used for determining a plurality of candidate paths when receiving at least one data packet from a host of a network, observing the flow state of the network in real time and sending the data packet containing target path information to a switch in the network; a memory for storing local synchronization information determined by observed state information associated with a plurality of observed paths in the flow state; and the processor is used for carrying out path decision according to the local synchronous information and the acquired global asynchronous information so as to generate a data packet containing target path information.
A third aspect of the present application further provides a source routing system, comprising: the network module is used for determining a plurality of candidate paths when receiving at least one data packet from a host of a network, observing the flow state of the network in real time, and sending the data packet containing target path information to a switch in the network; the local cache module is used for determining state information related to a plurality of observation paths in the observed flow state as local synchronization information; and the route decision module is used for carrying out route decision according to the local synchronous information and the acquired global asynchronous information so as to output a data packet containing target route information to the network module.
The fourth aspect of the present application also provides a computer device comprising: a network card device; a memory storing a computer program for source routing; and one or more processors, wherein the processor is configured to invoke the computer program for source routing in the memory to perform the source routing method of the first aspect.
A fifth aspect of the present application also provides a computer-readable storage medium storing a computer program for source routing, which, when executed, performs the source routing method according to the first aspect.
As described above, the source routing method, system, device, computer device, and computer-readable storage medium of the present application combine local synchronous information and global asynchronous information and make path decisions according to both, thereby obtaining target path information and effectively improving the routing efficiency and transmission rate of source routing.
Drawings
Fig. 1 is a schematic flow chart of a source routing method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a source routing method according to another embodiment of the present application.
Fig. 3 is a schematic flow chart of a source routing method according to another embodiment of the present application.
Fig. 4 is a schematic flow chart of a source routing method according to yet another embodiment of the present application.
Fig. 5 is a schematic diagram of a reinforcement learning model of the source routing method of the present application in one embodiment.
Fig. 6 is a schematic diagram of a neural network of the source routing method of the present application in one embodiment.
Fig. 7 is a block diagram of a source routing device according to an embodiment of the present application.
Fig. 8 is a block diagram of the modules of the source routing system of the present application in one embodiment.
Fig. 9 is a block diagram of a computer device of the present application in one embodiment.
Detailed Description
The following description of the embodiments of the present application is provided for illustrative purposes, and other advantages and capabilities of the present application will become apparent to those skilled in the art from the present disclosure. In the following description, reference is made to the accompanying drawings that describe several embodiments of the application. It is to be understood that other embodiments may be utilized and that compositional and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the patent of the present application. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. For example, in the present application, the "at least one user" includes a user and a plurality of users, or the "at least one ontology entity" includes an ontology entity and a plurality of ontology entities. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence, or addition of one or more other features, steps, operations, elements, components, species, and/or groups thereof. The terms "or" and/or "as used herein are to be construed as inclusive or meaning any one or any combination.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. It should be understood that in the embodiments of the present application, the terms "network element," "node," "component," "module," and the like may be used interchangeably to generally describe a network device, and the terms "policy," "method," and the like may be used interchangeably to refer to a method; no special meaning is implied by such terms, and the use of different nomenclature is intended only to distinguish one item from another as is conventional in the art, not to limit the scope, unless otherwise specifically indicated herein.
As described in the background, attempts have been made to optimize multi-path delay for data flows using a centralized planning algorithm. This approach works poorly because data flows can arrive randomly and many are very short, making it very time-consuming for a central controller to route each flow. Thus, it is desirable to select a path locally at the source without a global controller.
Still other approaches use local real-time information for path selection, typically via source routing mechanisms. When these methods perform path selection, global information, such as overall link utilization, is crucial to the quality of the chosen path. Although source routing avoids the delay of querying a central controller, global information is not available locally.
There is currently no clear consensus on a path selection method. Since local information and global information are of different types, it is difficult for a system to combine them. Therefore, in order to effectively combine local and global information to optimize path selection and reduce delay, the present application provides a source routing method to solve the above problems.
It should be understood that source routing is a routing technology for routing based on a source address, and can implement a function of selectively sending data packets to different destination addresses according to a plurality of different subnet or intranet addresses. Source routing may be used in different network architectures including Internet Protocol (IP) networks, multiprotocol label switching (MPLS) networks, Asynchronous Transfer Mode (ATM) networks, software-defined networking (SDN) based networks, and any other suitable network architecture. The network refers to a network composed of devices that can communicate with each other. In general, the network refers to the Internet (Internet), but the embodiments of the present application are not limited thereto, and may also include, for example, an intranet, a local area network, a wide area network, a metropolitan area network, and the like.
Referring to fig. 1, a schematic flow chart of an embodiment of the source routing method of the present application is shown. As shown in the figure, the method includes:
in step S11, a plurality of candidate paths are determined when at least one packet is received from a host of a network, and a flow state of the network is observed in real time. In an embodiment, an End-Host (End-Host) determines a plurality of candidate Paths (Paths) when receiving at least one packet from a Host configured as a previous node in a network, and simultaneously, the End-Host observes a flow state of the network in real time.
In this embodiment, the host refers to a device connected to a network and configured to complete network communication and data processing, for example a physical server whose bottom layer is a hardware layer mainly comprising hardware resources such as a Central Processing Unit (CPU), memory, a hard disk, and a network card. The host can provide information resources, services, or applications to users or other nodes on the network. For example, the host may be a server, a modem, a network switch, an intelligent network card, or a hardware router. The terminal host is used to provide a path selection service. In an embodiment, the terminal host may be a device such as a physical server, a modem, a network switch, an intelligent network card, or a hardware router, or a network module or network card device installed on an electronic terminal, network host, network system, or network server. It should be noted that the terminal hosts do not rely on any particular source routing mechanism.
In this embodiment, the host sends the data packet to the terminal host, and the terminal host determines a plurality of candidate paths according to the received data packet. Meanwhile, the terminal host monitors real-time network information through its configured network device, for example a network card device; in this embodiment, the network information is delivered to the terminal host as a flow state.
It should be understood that the data packet is the basic unit of transmission data. The data packet mainly comprises a destination IP address, a source IP address, payload data and the like. An IP Address (Internet Protocol Address) is a set of numbers used in a network to uniquely identify a device or node, and the devices or nodes communicate with each other using an IP Protocol. The destination IP address is used for representing the address of a receiver of the data packet, the source IP address is used for representing the address of a sender of the data packet, and the payload data is used for representing data content.
It should be understood that a path refers to the flow path or transmission path of a data packet in the network. The host sends the data packet to a receiver, and the terminal host takes each path that can be traversed from the host to the receiver as a candidate path. A path is composed of at least two links (Links).
In this application, the at least one data packet may be a single data packet or a plurality of data packets, wherein a plurality of data packets form a small flow (Flowlet) and a plurality of small flows form a data flow (Flow). That is, a small flow may be understood as a packet group consisting of a plurality of packets transmitted consecutively within one data flow, and each Flow includes a plurality of Flowlets. In the embodiments of the present application, the terms "data flow," "network flow," and the like are used interchangeably to refer to a "Flow" in data communication transmission; the terms "small flow," "Flowlet," and the like are used interchangeably to refer to a "Flowlet" that makes up a Flow. Flow States are used to represent network state information, which includes information such as data transmission capacity, throughput, transmission rate, and bandwidth.
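The flowlet grouping described above can be illustrated with a minimal sketch. The class name, the idle-gap threshold, and the 500 µs value below are assumptions for illustration only; the patent does not specify how flowlets are detected, only that consecutive packets of one flow form them.

```python
from dataclasses import dataclass, field

# Illustrative threshold: an idle gap longer than this starts a new flowlet.
FLOWLET_GAP_S = 0.0005  # 500 microseconds (hypothetical value)

@dataclass
class FlowletTracker:
    """Groups packets of one data flow into flowlets by inter-arrival gap."""
    last_seen: float = -1.0
    flowlets: list = field(default_factory=list)

    def on_packet(self, arrival_time: float, pkt_id: int) -> bool:
        """Record a packet; return True when it starts a new flowlet."""
        new_flowlet = (self.last_seen < 0
                       or arrival_time - self.last_seen > FLOWLET_GAP_S)
        if new_flowlet:
            self.flowlets.append([pkt_id])   # first packet of a flowlet
        else:
            self.flowlets[-1].append(pkt_id)  # continues the current flowlet
        self.last_seen = arrival_time
        return new_flowlet
```

Under this sketch, the "first packet" of step S11 is exactly a packet for which `on_packet` returns `True`, i.e., the packet that opens a new flowlet after an idle gap.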
In an exemplary embodiment, the terminal host determines a plurality of candidate paths when receiving a first packet from a host configured as a previous node in the network, and at the same time observes the flow state of the network in real time. In this embodiment, the first packet refers to the first data packet of a data flow (Flow) received by the terminal host when it first receives a data flow containing one packet, multiple continuous packets, or multiple discontinuous packets from the host configured as the previous node. For example, during transmission, a longer time interval (e.g., 1 s, 10 s, 1 min, or 10 min) may exist between two data flows transmitted sequentially between the terminal host and the host configured as the previous node; a plurality of candidate paths are then determined when the terminal host receives the first packet of the next flow. In other words, in this step the first data packet refers to the first data packet of the same data flow sent to the terminal host from the host configured as the previous node in the network.
In another exemplary embodiment, the terminal host determines a plurality of candidate paths when receiving a first packet from a host configured as a previous node in the network, and at the same time observes the flow state of the network in real time. In this embodiment, the first packet refers to the first packet of each small flow (Flowlet) received by the terminal host when it first receives a small flow containing a plurality of packets from the host configured as the previous node.
In step S12, state information related to a plurality of observed paths among the observed flow states is determined as local synchronization information. In an embodiment, the terminal host observes network information in real time through network monitoring, and the network information is delivered to the terminal host as a flow state. In step S12, the terminal host determines state information related to a plurality of observed paths among the observed flow states as local synchronization information.
It should be appreciated that there are often multiple paths for data to travel through the network, but not all paths are available. For example, when a device or node failure occurs along a path, the path containing that device or node becomes unavailable. Alternatively, some paths may have high delay, congested traffic, etc., and thus be unavailable.
In the embodiments of the present application, the observed paths include the following cases: in an embodiment, the observed paths comprise the plurality of candidate paths; in another embodiment, the observed paths may comprise available paths or historical paths with recorded information; in yet another embodiment, the observed paths include both the plurality of candidate paths and available paths or historical paths with recorded information.
The local synchronization information includes RTT (Round-Trip Time) information. The RTT refers to the total delay experienced by a transmitting end from transmitting data to receiving an acknowledgement from the receiving end. In embodiments, the method further includes storing the determined local synchronization information, for example in a Buffer on local memory, cache, or another storage medium. In particular implementations, the storage medium may include read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a USB disk, a removable hard disk, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
Wherein the state information related to the plurality of observed paths comprises one or more of RTT information, path length information, path usage history information and path congestion information of each observed path. As mentioned above, the RTT information of the observation path refers to round trip delay information on the observation path, and is not described herein again.
The path length information refers to the sum of the costs along the observed path traversed. Other routing protocols instead define path length as the hop count, i.e., the number of network devices, such as routers, that a packet must traverse on its way from a source to a destination. The path usage history information includes historical usage information and the transmission patterns of historical usage on the observed path. For example, if host a and host b each use a certain observed path, their usage information constitutes the usage history information of that path. The transmission pattern refers to the transmission mode of a data packet, which may differ according to the type of data encapsulated in the packet. For example, a data packet may use a breakpoint-resume transmission mode or a continuous transmission mode; likewise, when the data encapsulated in the packet is voice data, the transmission mode differs from that used when the encapsulated data is text data.
The path congestion information reflects routing delay, i.e., the time it takes for a packet to travel from a source through the network to a destination. In actual network data transmission, many factors affect the transmission process, such as the bandwidth of intermediate network paths, the port queues of each router traversed, the congestion degree of all intermediate network paths, and the physical distance.
In an embodiment, the state information related to the plurality of observed paths may further include path reliability information, path bandwidth information, load information, communication cost, and the like. Path reliability is used to represent the dependability of a path in the network (usually described in terms of bit error rate); in general, a path's reliability refers to the reliability of its least reliable link. The path bandwidth information refers to the maximum throughput capacity of the path. The load information refers to how busy a network resource, such as a router, is.
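The per-path state information enumerated above might be kept locally in a record like the following. The field names and the RTT-based selection policy are hypothetical illustrations, not taken from the patent, which leaves the storage layout and the decision model (described later as reinforcement learning) unspecified.

```python
from dataclasses import dataclass

@dataclass
class PathState:
    """Local synchronization info kept per observed path (illustrative fields)."""
    rtt_us: float      # most recent round-trip time, in microseconds
    hop_count: int     # path length expressed as hop count
    last_used: float   # timestamp of last use (usage history)
    ecn_marks: int     # congestion signal, e.g. count of ECN-marked packets

def lowest_rtt_path(states: dict) -> str:
    """One simple policy sketch: pick the observed path with lowest RTT."""
    return min(states, key=lambda name: states[name].rtt_us)
```

A real decision would combine these fields with the global asynchronous information described in step S13 rather than rely on RTT alone.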
In step S13, a path decision is made according to the local synchronization information and the obtained global asynchronous information, so as to output a packet containing target path information to a switch in the network. In an embodiment, the terminal host acquires global asynchronous information from the network and then makes a path decision based on the global asynchronous information and the local synchronization information determined in step S12, thereby obtaining target path information. The terminal host then packs the target path information together with the data packet to obtain a data packet containing the target path information, and sends that packet to a switch in the network that supports source routing, completing the path selection. In the description of the embodiments of the present application, the term global asynchronous information may also be referred to as asynchronous global information.
As mentioned earlier, the local synchronization information refers to real-time network information reflecting the current state from the perspective of the terminal host; correspondingly, the global asynchronous information refers to long-term aggregated network information provided from a global perspective, reflecting the cumulative state of the network. In source routing, the path selection policy is crucial, and different policies may lead to distinctly different results. Since local synchronous information and global asynchronous information have different attributes and describe different path characteristics, combining them allows better path selection. In embodiments, after obtaining the local synchronous information and the global asynchronous information, the method may further include storing them, for example in a Buffer on a storage medium in local memory or cache space; the storage medium may include read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a USB disk, a removable hard disk, or any other medium that can be used to store and access the desired program code in the form of instructions or data structures.
In one exemplary embodiment, the global asynchronous information includes path utilization information. The path utilization information is computed from the acquired bandwidth and data transmission volume of the path.
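The utilization computation just described can be sketched as achieved rate divided by link capacity over a measurement interval. This is an illustrative formula consistent with the text, not the patent's exact calculation, which is not spelled out.

```python
def link_utilization(bytes_sent: int, bandwidth_bps: float, interval_s: float) -> float:
    """Utilization = achieved bit rate / link capacity over the interval,
    clamped to 1.0 to absorb measurement noise."""
    achieved_bps = bytes_sent * 8 / interval_s
    return min(achieved_bps / bandwidth_bps, 1.0)
```

For example, 125 MB sent in one second on a 10 Gbit/s link corresponds to a utilization of 0.1.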
Referring to fig. 2, a flow chart of a source routing method according to another embodiment of the present application is shown. As shown in the figure, in an embodiment, the step of obtaining the global asynchronous information includes the following steps. In step S121, the flow state is sent to a global Aggregator in the network. The global aggregator collects the flow states sent by all hosts in the network, calculates the usage state of each link in each path included in the flow state information, and then updates the global asynchronous information according to the usage state of each link in the network; the global aggregator sends the updated global asynchronous information to each host (including the terminal host) in the network at a preset update frequency. In step S122, the terminal host receives the global asynchronous information updated by the global aggregator at the preset frequency. In an embodiment, the global aggregator is, for example, a physical server whose bottom layer is a hardware layer mainly comprising hardware resources such as a central processing unit, memory, a hard disk, and a network card.
As in the above embodiment, the terminal host sends the observed flow state to the global aggregator in the network, and the global aggregator updates the global asynchronous information according to the received flow states. The terminal host receives the global asynchronous information updated and sent by the global aggregator at a certain frequency. In an embodiment, the global aggregator receives the flow states sent by each network node in the network, updates the global asynchronous information accordingly, and sends it to each network node at a preset frequency. The global aggregator may also receive the flow states sent by all network nodes, integrate the global network information to update the global asynchronous information, and then send the updated information to all network nodes at the preset frequency. The frequency is preset and can be set manually in advance. For example, in a data center network with 1000 hosts, if the hosts report network traffic at a small interval, the global aggregator will consume a large amount of bandwidth. To avoid such network overhead, the preset frequency may be set to once every 500 ms, so that the global aggregator sends the updated global asynchronous information to each network node every 500 ms. This is not limiting, however: in different implementations the preset frequency may be shorter (e.g., 200 ms) or longer (e.g., 1 s) depending on the number of hosts in the network or its load condition.
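The aggregator's collect-then-broadcast cycle can be sketched as follows. The class and method names, the report format, and the clamping are assumptions for illustration; transport, serialization, and fault handling are omitted.

```python
from collections import defaultdict

class GlobalAggregator:
    """Collects per-link flow-state reports from hosts and periodically
    recomputes per-link utilization for rebroadcast (sketch only)."""

    def __init__(self, broadcast_interval_s: float = 0.5):
        self.broadcast_interval_s = broadcast_interval_s  # e.g. 500 ms
        self.link_bytes = defaultdict(int)  # link id -> bytes this interval
        self.last_broadcast = 0.0

    def on_flow_state(self, report: dict) -> None:
        """Accumulate one host's report: {link_id: bytes sent on that link}."""
        for link, nbytes in report.items():
            self.link_bytes[link] += nbytes

    def maybe_broadcast(self, now: float, capacities: dict):
        """If the interval has elapsed, return {link_id: utilization} and
        reset the counters; otherwise return None."""
        if now - self.last_broadcast < self.broadcast_interval_s:
            return None
        interval = max(now - self.last_broadcast, 1e-9)
        util = {link: min(nbytes * 8 / interval / capacities[link], 1.0)
                for link, nbytes in self.link_bytes.items()}
        self.link_bytes.clear()
        self.last_broadcast = now
        return util
```

In this sketch, the 500 ms pacing from the text becomes `broadcast_interval_s`; reports arriving between broadcasts only accumulate counters, so a burst of 1000 hosts reporting does not trigger 1000 broadcasts.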
In an embodiment, after receiving the updated global asynchronous information, the terminal host temporarily stores it on a storage medium in local memory space or in a buffer (Buffer) on the storage medium.
In an embodiment, the global asynchronous information includes the global utilization of each link within at least one preset historical duration (or historical version). In an exemplary embodiment, if the preset historical duration is 10 s, the terminal host sends all flow states observed in those 10 s to the global aggregator; the global aggregator integrates and updates the utilization of each link in the network over those 10 s, and then sends the updated global asynchronous information, containing the utilization of each link over the 10 s historical duration, to each network node.
In another exemplary embodiment, the global asynchronous information includes the utilization rate of each link in the network within a plurality of preset historical durations, at least two of which are different. In this embodiment, the preset historical durations are divided in a sliced manner: for example, a first historical duration covers the most recent 10 s, a second historical duration covers the 60 s before that, a further historical duration covers the 10 min before that, and so on. That is, the global asynchronous information includes the utilization rate of each link within a plurality of distinct preset historical durations; with, for example, 5 preset historical durations, the global asynchronous information acquired by the end host includes the utilization rate of each link within each of the 5 preset historical durations.
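The sliced history windows can be illustrated with a small helper. This is a sketch under assumptions: the window names, the exact boundaries (10 s, then the 60 s before that, then the 10 min before that), and the use of a per-window mean are illustrative, not specified by the patent.

```python
# Hypothetical sliced history windows: (name, length in seconds), each window
# sitting immediately before the previous one on the time axis.
WINDOWS = [("recent_10s", 10), ("prev_60s", 60), ("prev_10min", 600)]

def slice_samples(samples, now):
    """samples: list of (timestamp, utilization) observations for one link.
    Returns the mean utilization per history window."""
    out = {}
    start = now
    for name, length in WINDOWS:
        lo, hi = start - length, start
        vals = [u for (t, u) in samples if lo <= t < hi]
        out[name] = sum(vals) / len(vals) if vals else 0.0
        start = lo  # the next window covers the time just before this one
    return out

now = 1000.0
samples = [(999.0, 0.8), (995.0, 0.6), (950.0, 0.4), (500.0, 0.2)]
print(slice_samples(samples, now))
```

Keeping several coarsening windows lets the path state carry both recent and long-term link behavior at a fixed, small dimensionality.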
In an embodiment, the method further includes a step of obtaining the utilization rate of each candidate path from the per-link utilization information: the end host determines the utilization rate of each candidate path from the maximum value and the average value of the utilization rates of the links in that path within at least one preset historical duration. In this embodiment, the end host obtains, through a matrix operation, the utilization rate of each link in each candidate path over one or more preset durations. Each path includes a plurality of links; the end host takes the maximum of those links' utilization rates as one utilization metric of the candidate path, and the mean of those links' utilization rates as another. The utilization rate of each candidate path is thereby obtained.
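The max/mean reduction above can be sketched in a few lines. All link names, path memberships, and utilization values below are illustrative, not taken from the patent.

```python
# Per-link utilization over one preset historical duration (illustrative).
link_util = {"l0": 0.2, "l1": 0.9, "l2": 0.5, "l3": 0.4}
# Which links make up each candidate path (illustrative topology).
candidate_paths = {
    "p0": ["l0", "l1"],
    "p1": ["l1", "l2", "l3"],
    "p2": ["l0", "l3"],
}

def path_utilization(paths, util):
    """Per candidate path, return (max, mean) of its links' utilizations:
    the two path-level metrics described in the text."""
    out = {}
    for name, links in paths.items():
        vals = [util[l] for l in links]
        out[name] = (max(vals), sum(vals) / len(vals))
    return out

for name, (u_max, u_mean) in path_utilization(candidate_paths, link_util).items():
    print(name, u_max, round(u_mean, 3))
```

The max captures the path's bottleneck link, while the mean captures its overall load; using both gives the decision model a richer picture than either alone.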
Referring to fig. 3, which is a schematic flow chart of a source routing method according to another embodiment of the present application, in step S13 the step of making a path decision according to the local synchronous information and the obtained global asynchronous information further includes:
in step S131, determining the local synchronous information and the global asynchronous information as the path states of the plurality of candidate paths. In an embodiment, the end host determines the local synchronous information, including the RTT information, and the global asynchronous information, including the utilization rate of each candidate path, as the path states of the candidate paths.
In step S132, the path states of the candidate paths are used as input to compute a target path, and the information of the target path is written into the data packet, which is sent to a switch in the network. In an embodiment, the end host computes the target path with the path states of the candidate paths as input, and writes the information of the target path into the data packet to be sent to a switch in the network.
The end host takes the local synchronous information and the global asynchronous information as the path states of the corresponding candidate paths, and computes the target path from the plurality of path states according to a certain algorithm or model. Here the global asynchronous information refers to the obtained candidate path utilization rates. The target path is a superior path among the plurality of candidate paths. "Superior" here means not only that the path is available, but that one or more of its parameters are better than those of the other candidate paths, such as lower delay, no or less congestion, shorter path length, or lower path utilization. In an embodiment, the target path may be obtained by computing the plurality of path states with, for example, a link state routing (Link State Routing) algorithm or the like.
After the target path is obtained, the information of the target path is written into the data packet, and the data packet is sent to a switch in the network. After receiving the data packet, the switch forwards the data along the target path according to the target path information carried in the data packet.
In an embodiment, the target path may also be obtained by computing the plurality of path states using a neural network and machine learning. With the continuous development of machine learning and neural network technology, these methods are now widely applied in the communication field. In an actual network, however, the scale is huge and the network information is both complicated and highly variable: the network contains a large number of paths, each with its own changing state information. Therefore, when applying machine learning or neural network techniques, the prior art faces the challenge of handling the information of a large number of candidate paths as it changes over time.
In an exemplary embodiment of the present application, the path decision is obtained by taking the path states of the candidate paths as input and computing them with a reinforcement learning model. The reinforcement learning model learns through interaction with the network environment and thereby improves its own path selection; compared with other models such as supervised learning models, it requires less computation, computes faster, and achieves higher efficiency and performance.
For convenience of description, the reinforcement learning model may be divided into a reinforcement learning part and a neural network part: the reinforcement learning part acquires experience from historical path selections and the corresponding delays and learns from it, while the neural network part captures and analyzes the relationship between the network information and the path selection result, so that the path selection as a whole requires no manual calculation or manually set rules.
A neural network is a mathematical or computational model that mimics the structure and function of a biological neural network. In embodiments, the neural network may include a supervised learning network, an unsupervised learning network, or the like; in embodiments, the neural network may include a feed-forward neural network, a recurrent neural network, an enhanced neural network, and the like.
Reinforcement learning is a machine learning method widely applied in fields such as intelligent control, robotics, and multi-agent systems. Reinforcement learning learns a mapping from environment states to behaviors so as to maximize the cumulative reward the system obtains from the environment, finding an optimal behavior strategy through continuous trial and error. In an exemplary embodiment, the path states of the candidate paths are input to a neural network, the potential waiting time of each path is evaluated through reinforcement learning, and the path with the shortest potential waiting time is returned as the result and taken as the target path. The target path is sent to the end host, which thereby completes the path selection. In embodiments, the reinforcement learning method may include algorithms such as Q-Learning, Deep Q Network (DQN), or variations of these algorithms.
Referring to fig. 4, a schematic flow chart of a source routing method according to another embodiment of the present application, the step of computing the path states of the candidate paths with a reinforcement learning model to obtain the path decision includes:
in step S1321, the path states of the plurality of candidate paths are used as inputs to the neural network, and a utility value is calculated for each candidate path. The path states of the candidate paths are input to the neural network, which computes a utility value for each candidate path. The utility value evaluates the usefulness of the candidate path: the higher the utility value, the more useful, or the more "superior", the candidate path.
In machine learning, an algorithm often falls into a local optimum. In a real, complex network, a local optimum means that a solution is optimal only within a certain range or region, or that a means of solving the problem or achieving the goal is optimal only within certain limits. Here, a local optimum arises when the reinforcement learning model places too much emphasis on "exploiting" the currently best-looking path selection while neglecting "exploration" of all paths in the network. This situation easily causes a decrease in path utilization.
In consideration of this, in step S1322, the candidate path with the maximum utility value is determined as the target path according to a preset first probability, and then, in step S1324, the information of the target path is written into the data packet to be sent to a switch in the network; or, in step S1323, the target path is determined using a random strategy according to a preset second probability, and then, in step S1324, the information of the target path is written into the data packet to be sent to a switch in the network, wherein the first probability is greater than the second probability.
In the embodiment, a first probability and a second probability are preset. Once the utility value of each candidate path has been computed, the candidate path with the maximum utility value is selected as the target path with the first probability, or a candidate path is selected as the target path using a random strategy with the second probability. In this embodiment, the first probability is greater than the second probability, and the two probabilities sum to 1. That is, when selecting a target path, the neural network selects by comparing utility values with large probability, and selects with a random strategy with small probability. For example, if the first probability is preset to 80% and the second probability to 20%, the neural network selects the candidate path with the maximum utility value as the target path with a probability of 80%, and selects a candidate path at random with a probability of 20%. This is not limiting: in other embodiments, the first probability may be preset to 90% and the second probability to 10%, so that the candidate path with the largest utility value is selected as the target path with a probability of 90%, and a random selection among the candidate paths is made with a probability of 10%. Presetting the first and second probabilities in this way effectively prevents the algorithm from falling into a local optimum and provides diversity in the target path selection.
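The first/second-probability scheme above is, in reinforcement learning terms, an epsilon-greedy policy. A minimal sketch, using the 80%/20% example from the text (the function and variable names are illustrative):

```python
import random

FIRST_PROB = 0.8  # first probability: exploit the max-utility path

def choose_target_path(utilities, rng=random.random, pick=random.choice):
    """utilities: {path_name: utility value from the neural network}.
    With probability FIRST_PROB return the max-utility path; otherwise
    (the second probability, 1 - FIRST_PROB) pick uniformly at random."""
    if rng() < FIRST_PROB:
        return max(utilities, key=utilities.get)  # greedy choice
    return pick(list(utilities))                  # random exploration

utilities = {"p0": 0.2, "p1": 0.7, "p2": 0.4}
# Force each branch for a deterministic demonstration:
print(choose_target_path(utilities, rng=lambda: 0.0))   # greedy -> p1
print(choose_target_path(utilities, rng=lambda: 0.99,   # random branch,
                         pick=lambda xs: xs[0]))        # forced -> p0
```

Injecting `rng` and `pick` as parameters keeps the selection logic testable; in production both default to the standard library's random sources.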
In an embodiment, the random strategy includes uniform random selection, or a random strategy that makes a decision based on a Max-Boltzmann distribution over the utility values. Uniform random selection means that every candidate path has an equal probability of being selected as the target path.
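A Boltzmann (softmax) random strategy over utility values can be sketched as follows. Higher-utility paths are sampled more often, but every path keeps a non-zero probability; the temperature parameter is an assumption, not specified by the patent.

```python
import math
import random

def boltzmann_probs(utilities, temperature=1.0):
    """Softmax over utility values: P(i) = exp(u_i/T) / sum_j exp(u_j/T).
    Lower temperature concentrates probability on the best path."""
    exps = [math.exp(u / temperature) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann_choice(paths, utilities, temperature=1.0):
    weights = boltzmann_probs(utilities, temperature)
    return random.choices(paths, weights=weights)[0]

paths = ["p0", "p1", "p2"]
utils = [0.2, 0.7, 0.4]
print([round(p, 3) for p in boltzmann_probs(utils)])
```

Unlike uniform random selection, this biases exploration toward paths the model already rates highly, which typically wastes fewer packets on clearly bad paths.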
In an embodiment, the method further includes a step of updating the computation weights of the reinforcement learning model from a training network according to a preset period. Referring to fig. 5, a schematic structural diagram of the reinforcement learning model in an embodiment of the source routing method of the present application, the step includes: taking the previous path decision state ta, the current candidate path states S', and the current immediate reward (Reward) r from the observed flow states of the network as the experience with which the neural network trains its estimate of the utility value. The previous path decision state ta refers to the path state of the previously determined target path a; the current candidate path states S' refer to all candidate path states S_i acquired at the time of the decision; and the current immediate reward r refers to the reward obtained at the time of the decision. In this embodiment, the training network uses not only the current candidate path states S' but also historical input information, such as the previous path decision state ta, which helps the training result converge.
In an embodiment, the end host updates the computation weights of the reinforcement learning model from an external training network according to a preset period, for example every 1 hour or every 24 hours. The external training network may be a device dedicated to training, such as a physical server or a cloud server, which transmits the trained network to the end host to update the computation weights of the reinforcement learning model. The training network is not limited to this; it may also be a training module or training device built into the end host, such as a built-in electronic device or circuit board containing a neural network chip.
In the embodiment, the same neural network makes the decision for all path states: the path state of each candidate path is input to the neural network, and the neural network returns an output value, which is the target path. For example, please refer to fig. 6, which is a schematic structural diagram of a neural network according to an embodiment of the source routing method of the present application. As shown in the figure, the neural network includes an input layer, hidden layers, and an output layer. By way of example, the neural network includes an input layer L1, a first hidden layer L2, a second hidden layer L3, and an output layer L4, where the two hidden layers have eight neurons in total. The figure shows four candidate path states S1, S2, S3, S4 at the input layer (only 4 candidate paths are used as an example here; the number of candidate paths is not limited), each input into the neural network. Within the neural network, the candidate path states share weights; the neural network computes over the candidate path states and determines a target path a0 as the output value.
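The weight-sharing idea of fig. 6 can be sketched as one small scoring network applied to every candidate path state. The weights below are arbitrary (untrained) and the 2-dimensional states are for brevity only; the network shape, like the figure's, is illustrative.

```python
import math

# One shared network: 3 hidden tanh neurons over a 2-dim input, then a
# linear output producing a scalar utility. Weights are arbitrary examples.
W1 = [[0.5, -0.3], [0.1, 0.8], [-0.2, 0.4]]  # hidden layer: 3 x 2
W2 = [0.7, -0.1, 0.6]                        # output layer: 1 x 3

def utility(state):
    """Score one candidate path state with the shared weights."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, state))) for row in W1]
    return sum(w * h for w, h in zip(W2, hidden))

# Four candidate path states S1..S4; the SAME weights score each of them,
# so the model works for any number of candidate paths.
states = {"S1": [0.1, 0.9], "S2": [0.8, 0.2], "S3": [0.5, 0.5], "S4": [0.0, 0.3]}
scores = {name: utility(s) for name, s in states.items()}
a0 = max(scores, key=scores.get)  # the target path a0
print(a0)  # S2
```

Because the weights are shared across paths, adding or removing a candidate path changes only how many times the scorer runs, not the model itself; this is how the design copes with a variable number of paths.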
As described above in step S13 of fig. 2, the end host determines the local synchronous information and the global asynchronous information as the path states of the candidate paths. In the path decision process using the neural network shown in fig. 5, if the path state of each candidate path includes global asynchronous information with 5 historical durations (or 5 historical versions), then the path state of each candidate path includes: the RTT information of the candidate path, the maximum link utilization of the candidate path within each of the 5 historical durations, and the average link utilization of the candidate path within each of the 5 historical durations, a total of 11-dimensional data used as the input of the neural network shown in fig. 5. If 3 candidate paths were previously determined, the input of the neural network is 3 × 11-dimensional data; if 12 candidate paths were previously determined, the input is 12 × 11-dimensional data.
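Assembling the 11-dimensional per-path state can be sketched directly (1 RTT value + 5 per-window maxima + 5 per-window means). The numbers below are illustrative:

```python
def path_state(rtt, link_utils_per_window):
    """Build the 11-dim state of one candidate path.
    link_utils_per_window: 5 lists, each holding the utilizations of this
    path's links within one preset historical duration."""
    assert len(link_utils_per_window) == 5
    maxes = [max(w) for w in link_utils_per_window]
    means = [sum(w) / len(w) for w in link_utils_per_window]
    return [rtt] + maxes + means  # 1 + 5 + 5 = 11 dimensions

windows = [[0.2, 0.5], [0.3, 0.4], [0.6, 0.1], [0.2, 0.2], [0.9, 0.3]]
state = path_state(rtt=0.004, link_utils_per_window=windows)
print(len(state))  # 11
```

With 3 candidate paths, stacking three such vectors yields the 3 × 11 input described in the text.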
The neural network is divided into a decision part and a training part: the training part receives experience from the decision part, updates the neural network according to that experience, and periodically sends the updated neural network parameters to the decision part. The neural network parameters include the number of network nodes, initial weights, training rate, dynamic parameters, allowable error, number of iterations, activation functions, and the like. In a data center, all hosts may share the same neural network parameters, and all hosts may also share the same reinforcement learning model. With this improvement, parallel decisions can be made, helping the reinforcement learning model obtain more experience and more accurate rewards, thereby improving the accuracy of the algorithm.
According to the embodiment of the application, the local synchronous information and the global asynchronous information are combined, and the path decision is made according to both, so that the target path information is obtained and tail latency can be effectively avoided.
The present application also provides a source routing device that combines local synchronous information and global asynchronous information in a network to make routing decisions, thereby optimizing routing and reducing delay. It should be understood that source routing is a routing technique that routes based on the source address, and can selectively send packets to different destination addresses according to a plurality of different subnets or intranet addresses. Source routing may be used in different network architectures, including internet protocol networks, multi-protocol label switching networks, asynchronous transfer mode networks, software-defined networks, and any other suitable network architecture. The network refers to a network composed of devices that can communicate with each other. In general, this means the internet, but the embodiments of the present application are not limited thereto and may also include, for example, an intranet, a local area network, a wide area network, a metropolitan area network, and the like.
In an embodiment, the source routing device is a switch or an intelligent network card, and specifically, the switch is, for example, an Edge switch (Edge switch) with computing capability. Referring to fig. 7, which is a block diagram of a source routing device 20 according to an embodiment of the present application, the source routing device includes a network interface 201, a memory 202, and a processor 203.
The network interface 201 is configured to determine a plurality of candidate paths when receiving at least one data packet from a host of a network, and to observe the flow state of the network in real time. In an embodiment, the host in the network refers to a device connected to the network for completing network communication and data processing; the host is, for example, a physical server whose bottom layer is a hardware layer mainly including hardware resources such as a central processing unit, a memory, a hard disk, and a network card. The host can provide information resources, services, or applications to users or other nodes on the network. For example, the host may be a server, a modem, a network switch, an intelligent network card, or a hardware router. The network interface 201 is, for example, a circuit and/or a software module of the source routing device 20.
In this embodiment, the host 10 sends a data packet to the network interface 201, and the network interface 201 determines a plurality of candidate paths according to the received data packet. Meanwhile, the network interface 201 monitors network information observed in real time; in an embodiment, the network information is sent to the network interface 201 as flow states.
It should be understood that the data packet is the basic unit of transmission data. The data packet mainly comprises a destination IP address, a source IP address, payload data and the like. An IP address is a set of numbers used in a network to uniquely identify a device or node with which communication is to take place using the IP protocol. The destination IP address is used for representing the address of a receiver of the data packet, the source IP address is used for representing the address of a sender of the data packet, and the payload data is used for representing data content.
It should be understood that the path refers to the flow path or transmission path of a data packet in a network. There may be a plurality of paths by which the host 10 can transmit the packet to the receiver, and the source routing device 20 may take any path that can pass from the host 10 to the receiver as a candidate path. A path is composed of two or more links (Links).
In this application, the at least one data packet may be a single data packet or a plurality of data packets, wherein a plurality of data packets form a small flow (Flowlet) and a plurality of small flows form a data flow (Flow). That is, a small flow may be understood as a packet group consisting of a plurality of packets consecutively transmitted in one data flow, and each Flow includes a plurality of Flowlets. In the embodiments of the present application, the terms "data flow", "network flow", and the like are used interchangeably to refer to a "Flow" in data communication transmission; the terms "small flow", "droplet", and the like are used interchangeably to refer to a "Flowlet" making up a Flow. The flow states (Flow States) represent network state information, which includes information such as data transmission capacity, throughput, transmission rate, and bandwidth.
In an exemplary embodiment, the network interface 201 determines a plurality of candidate paths upon receiving a first packet from a host 10 configured as a previous node in the network, while observing the flow state of the network in real time. In this embodiment, the first packet refers to the first data packet of a data flow (Flow) received when the network interface 201 first receives, from the host 10, a data flow consisting of one data packet, multiple continuous data packets, or multiple discontinuous data packets. For example, when a longer time interval, such as 1 s, 10 s, 1 min, or 10 min, separates two successively transmitted data flows between the host 10 and the network interface 201, the network interface 201 determines a plurality of candidate paths upon receiving the first packet of the new flow, while continuing to observe the flow state of the network in real time. In other words, the first packet here refers to the first packet within one and the same data flow sent by the host 10, configured as the previous node in the network, to the network interface 201.
In another exemplary embodiment, the network interface 201 determines a plurality of candidate paths upon receiving a first packet from the host 10 configured as a previous node in the network, while observing the flow state of the network in real time. In this embodiment, the first packet refers to the first packet of each small flow, received when the network interface 201 first receives a small flow (Flowlet) consisting of a plurality of packets from the host 10.
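First-packet detection by inter-arrival gap, as the two embodiments above describe, can be sketched in a few lines. The gap threshold is an illustrative assumption (the text mentions intervals such as 1 s, 10 s, 1 min, 10 min as examples).

```python
FLOWLET_GAP_S = 1.0  # assumed idle-gap threshold separating flows/flowlets

def first_packet_flags(arrival_times):
    """For each packet arrival time, return True if it starts a new
    flow/flowlet, i.e. if the idle gap before it exceeds the threshold."""
    flags, prev = [], None
    for t in arrival_times:
        flags.append(prev is None or t - prev > FLOWLET_GAP_S)
        prev = t
    return flags

times = [0.0, 0.1, 0.2, 5.0, 5.1, 20.0]
print(first_packet_flags(times))  # [True, False, False, True, False, True]
```

Each `True` is the point at which the network interface would determine a fresh set of candidate paths; packets inside a flowlet keep the path already chosen, avoiding per-packet reordering.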
The memory 202 is configured to store local synchronous information, which is determined from the observed state information related to a plurality of observed paths in the flow state. In an embodiment, the memory 202 may comprise read-only memory, random access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a USB flash drive, a removable hard drive, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed.
It should be appreciated that there are often multiple paths for data to travel through the network, but not all paths are available. For example, a device failure or node failure along a path may render the path containing that device or node unavailable. Alternatively, some paths may be unavailable because of high delay, congested traffic, and the like.
In the embodiment of the present application, the observation path includes the following: in an embodiment, the observed path comprises a plurality of the candidate paths; in another embodiment, the observed path may in turn comprise an available path or a historical path with recorded information; in yet another embodiment, the observed path includes both a plurality of candidate paths and an available path or historical path having recorded information.
The local synchronization information includes RTT (Round-trip Time) information. The RTT refers to a total delay experienced by a transmitting end from transmitting data to receiving an acknowledgement from a receiving end. In an embodiment, the determined local synchronization information is stored, for example, the local synchronization information is stored in a Buffer (Buffer) in the memory 202.
In an embodiment, the state information related to the plurality of observed paths includes one or more of RTT information, path length information, path usage history information, and path congestion information of each observed path. As mentioned above, the RTT information of the observation path refers to round trip delay information on the observation path.
The path length information refers to the sum of the costs of the links the observed path traverses. Some routing protocols instead define it as the hop count, i.e., the number of network devices, such as routers, that a packet must traverse on its way from source to destination. The path usage history information includes the historical usage information and the transmission patterns of that historical usage on the observed path. For example, if host a and host b each use a certain observed path, their usage information constitutes the usage history information of that path. The transmission pattern refers to the transmission mode of a data packet, which may differ according to the type of data encapsulated in it. For example, a data packet may use a breakpoint-resume mode or a continuous transmission mode; likewise, when the data encapsulated in the packet is voice data, the transmission mode differs from that used when the encapsulated data is text data.
The path congestion information relates to routing delay, i.e., the time it takes a packet to travel from the source through the network to the destination. In actual network data transmission, many factors affect this, such as the bandwidth of the intermediate network paths, the port queues of each router traversed, the degree of congestion of all intermediate network paths, and the physical distance.
In an embodiment, the state information related to the plurality of observed paths may further include path reliability information, path bandwidth information, load information, communication cost, and the like. Path reliability represents the dependability of the paths in the network (usually described in terms of bit error rate); in general, the reliability of a path is that of its least reliable link. The path bandwidth information refers to the maximum throughput capacity of the path. The load information refers to how busy a network resource, such as a router, is.
The processor 203 is configured to perform a path decision according to the local synchronization information and the obtained global asynchronous information to generate a data packet including target path information; in one embodiment, the processor 203 is configured as a central processing unit, for example.
In an embodiment, the source routing device 20 obtains global asynchronous information from the network, and then performs a path decision according to the global asynchronous information and local synchronous information cached in the memory 202, so as to obtain target path information. The source routing device 20 then encapsulates the destination path information and the data packet together, thereby obtaining a data packet containing the destination path information.
The network interface 201 is used to send a packet containing target path information to a switch in the network. In an embodiment, the source routing device 20 sends the data packet containing the target path information to a switch in the network through its network interface 201 to complete path selection, and the switch supports source routing. In the description of the embodiments of the present application, the term global asynchronous information may also be referred to as asynchronous global information.
As mentioned earlier, the local synchronous information refers to real-time network information reflecting the current state from the perspective of the source routing device 20; correspondingly, the global asynchronous information refers to long-term aggregated network information provided from a global perspective, reflecting the cumulative state of the network. In source routing, the path selection policy is crucial, and different policies may lead to very different results. Since local synchronous information and global asynchronous information have different attributes and describe different path characteristics, combining the two yields better path selection. In an embodiment, after the local synchronous information and the global asynchronous information are obtained, they may further be stored in the memory 202.
In one exemplary embodiment, the global asynchronous information includes path utilization information, which is calculated from the acquired bandwidth and data transmission volume of the path.
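Utilization from bandwidth and transferred volume reduces to one division: bytes sent over the interval, divided by what the link could carry in that interval. The values below (10 Gbit/s link, 500 ms interval) are illustrative assumptions.

```python
def link_utilization(bytes_sent, bandwidth_bps, interval_s):
    """Fraction of a link's capacity used over one measurement interval:
    utilization = bytes sent / (capacity in bytes over the interval)."""
    capacity_bytes = bandwidth_bps / 8 * interval_s
    return bytes_sent / capacity_bytes

# 10 Gbit/s link, 500 ms interval, 250 MB transferred:
u = link_utilization(bytes_sent=250e6, bandwidth_bps=10e9, interval_s=0.5)
print(u)  # 0.4
```

The 500 ms interval here matches the preset reporting frequency example used elsewhere in the text, so a host can compute this directly from the byte counters it reports to the aggregator.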
In an embodiment, the processor 203 acquires the global asynchronous information as follows: the network interface 201 sends the received flow states to a global aggregator in the network, and the global asynchronous information updated by the global aggregator is received at a preset frequency and stored into the memory 202. Specifically, the source routing device 20 temporarily stores the received updated global asynchronous information in a buffer (Buffer) of the memory 202.
In an embodiment, the global aggregator is, for example, a physical server whose bottom layer is a hardware layer mainly including hardware resources such as a central processing unit, a memory, a hard disk, and a network card. The global aggregator collects the flow states sent by all hosts in the network, calculates the usage state of each link in each path included in the flow state information, updates the global asynchronous information according to the usage state of each link in the network, and sends the updated global asynchronous information to each host in the network (including the source routing device 20 of the present application) at a preset frequency; the source routing device 20 receives the global asynchronous information updated by the global aggregator at that preset frequency.
In an embodiment, the global aggregator receives the flow states sent by the network nodes, updates the global asynchronous information accordingly, and sends the updated global asynchronous information to each network node in the network at a preset frequency.
The source routing device 20 sends the observed flow state to a global aggregator in the network, which updates the global asynchronous information according to the received flow states; the source routing device 20 then receives the global asynchronous information that the aggregator updates and sends at a certain frequency. The global aggregator may also receive the flow states sent by all network nodes in the network, integrate them into global network information to update the global asynchronous information, and then send the updated global asynchronous information to all network nodes at a preset frequency. The frequency is preset and can be set manually in advance. For example, in a data center network with 1000 hosts, if the hosts report network traffic at a small interval, the global aggregator will consume a large amount of bandwidth. To avoid such network overhead, the preset frequency may be set to once every 500 ms, so that the global aggregator sends the updated global asynchronous information to each network node every 500 ms. This is not limiting: in different implementations the preset frequency may be shorter (e.g., 200 ms) or longer (e.g., 1 s), depending on the number of hosts in the network or on its load.
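A minimal sketch of such an aggregator's collect-and-publish cycle (class and method names, and the 500 ms figure, are illustrative assumptions; the patent prescribes no API):

```python
class GlobalAggregator:
    """Accumulates flow states from hosts and publishes an aggregated
    snapshot of per-link byte counts at a preset frequency (e.g. every 500 ms)."""

    def __init__(self, publish_interval_ms: int = 500):
        self.publish_interval_ms = publish_interval_ms
        self.link_bytes: dict = {}

    def receive_flow_state(self, link_id: str, bytes_observed: int) -> None:
        # Called whenever a host reports traffic it observed on a link.
        self.link_bytes[link_id] = self.link_bytes.get(link_id, 0) + bytes_observed

    def publish(self) -> dict:
        # Called once per interval: hand out the snapshot and reset the counters,
        # so each published version covers exactly one reporting period.
        snapshot, self.link_bytes = self.link_bytes, {}
        return snapshot
```

A real deployment would add timestamps, transport, and failure handling; the point is only that hosts push flow states in and every node receives the same aggregated view out, on a fixed cadence.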
In an embodiment, the global asynchronous information includes information of utilization rate of each link globally within at least one preset historical duration (or historical version). In an exemplary embodiment, for example, if the historical duration is preset to be 10s, the source routing device 20 sends all the flow states observed in the 10s to the global aggregator, and the global aggregator integrates and updates the utilization rate of each link in the network within the 10s, and sends the updated global asynchronous information containing the utilization rate of each link with the historical duration of 10s to each network node.
In another exemplary embodiment, the global asynchronous information includes the utilization rate of each link in the global state within a plurality of preset historical durations, at least two of which differ. In this embodiment, the preset historical durations are divided in a sliced manner: for example, a first historical duration covers the most recent 10 s, a second historical duration the 60 s before that, a further historical duration the 10 min before that, and so on. For instance, with 5 preset historical durations, the global asynchronous information obtained by the source routing device 20 includes the utilization rate of each link in the global environment within each of the 5 durations; in other words, it includes link utilization rates of 5 historical versions.
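The sliced history can be sketched as follows (the window lengths and the max-per-window summary are illustrative assumptions):

```python
import collections

WINDOWS_S = [10, 60, 600, 3600, 86400]  # 5 hypothetical historical durations

class LinkHistory:
    """Keeps (timestamp, utilization) samples for one link and summarizes
    them into one value per historical duration ("historical version")."""

    def __init__(self):
        self.samples = collections.deque()

    def add(self, utilization: float, now: float) -> None:
        self.samples.append((now, utilization))

    def versions(self, now: float) -> list:
        # One summary (here: the max observed) per preset historical duration.
        out = []
        for w in WINDOWS_S:
            vals = [u for t, u in self.samples if now - t <= w]
            out.append(max(vals) if vals else 0.0)
        return out
```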
In an embodiment, the source routing device 20 is further configured to determine the utilization rate of each candidate path from the maximum value and the average value of the utilization rates of the links on that path within at least one preset historical duration. In this embodiment, the source routing device 20 obtains, through matrix operations, the utilization rate of each link on each candidate path within one or more preset durations; for each candidate path, it then takes the maximum of the link utilization rates and, separately, the average of the link utilization rates as two utilization features of that candidate path, thereby obtaining the utilization rate of each candidate path.
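The two summary features can be sketched in plain Python (the function name is illustrative; values are chosen as exact binary fractions so the printed result is exact):

```python
def path_features(link_utils):
    """For each candidate path, given the utilization of every link on it,
    return the (max, mean) pair used as that path's utilization features."""
    return [(max(p), sum(p) / len(p)) for p in link_utils]

# two candidate paths with three links each
print(path_features([[0.25, 0.75, 0.5], [0.0, 0.25, 0.5]]))
# [(0.75, 0.5), (0.5, 0.25)]
```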
The processor 203 is configured to determine the local synchronization information and the global asynchronous information as path states of the candidate paths, and calculate the path states of the candidate paths as inputs to generate a data packet including target path information. In an embodiment, the source routing device 20 determines local synchronization information including RTT information and global asynchronous information including utilization rate of each candidate path as the path states of the candidate paths. The source routing device 20 calculates the path states of the candidate paths as input to obtain a target path, and writes the information of the target path into the data packet to be sent to a switch in the network.
The source routing device 20 uses the local synchronization information and the global asynchronous information as the path states of the corresponding candidate paths, and performs a calculation on these path states with a certain algorithm or model to obtain a target path. Here, the global asynchronous information refers to the obtained candidate path utilization rates. The target path is a superior path among the plurality of candidate paths. "Superior" here means that the path is not only available but also has one or more parameters that are lower or higher than those of the other candidate paths, such as lower delay, less or no congestion, shorter path length, higher path utilization, and so on. In an embodiment, the target path may be obtained by calculating the plurality of path states through, for example, a link state routing algorithm or the like.
After the target path is obtained, the information of the target path is written into the data packet, and the data packet is sent to a switch in the network. Upon receiving the data packet, the switch forwards the data along the target path according to the target path information carried in the packet.
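As an illustrative sketch (the header layout is an assumption; real systems use formats such as segment-routing label stacks), writing the target path into the packet and popping one hop at each switch could look like:

```python
def encapsulate(payload: bytes, path: list) -> bytes:
    """Prepend a toy source-routing header: one byte for the hop count,
    then one byte per output port along the target path."""
    return bytes([len(path)]) + bytes(path) + payload

def switch_forward(packet: bytes):
    """A switch reads the next output port, strips it from the header,
    and returns (port, remaining packet) for forwarding."""
    hops = packet[0]
    port = packet[1]
    remaining = bytes([hops - 1]) + packet[2:1 + hops] + packet[1 + hops:]
    return port, remaining
```

Each switch along the way needs no routing table lookup: the packet itself names the next port, which is the essence of source routing.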
In an embodiment, the target path may also be obtained by calculating the plurality of path states using a neural network and machine learning. With the continuous development of machine learning and neural network technology, these methods are now widely applied in the communication field. However, an actual network is huge in scale, and its network information is both complicated and highly variable: there are a large number of paths, each with its own changing state information. Therefore, when applying machine learning or neural network techniques, the prior art faces the challenge of handling the information of a large number of candidate paths as it changes over time.
In view of this, in an exemplary embodiment of the present application, the processor 203 calculates the path states of the candidate paths as input by using a reinforcement learning model to generate the data packet containing the target path information; in other words, the path states of the candidate paths are fed into a reinforcement learning model to obtain the path decision. The reinforcement learning model learns through interaction with the network environment and thereby improves its own path selection; compared with other models such as supervised learning models, it requires less computation, computes faster, and achieves higher efficiency and performance.
For convenience of description, the reinforcement learning model may be divided into a reinforcement learning part and a neural network part: the reinforcement learning part acquires experience from historical path selections and the corresponding delays and learns from it, while the neural network part captures and analyzes the relationship between the network information and the path selection result, so that the whole path selection requires neither manual calculation nor manually set rules.
Neural networks are mathematical or computational models that mimic the structure and function of biological neural networks. In embodiments, the neural network may mainly include a supervised learning network, an unsupervised learning network, or the like; in embodiments, the neural network may include a feed-forward neural network, a recurrent neural network, an augmented neural network, and the like.
Reinforcement learning is a machine learning method widely applied in intelligent control, robotics, multi-agent systems, and other fields. It learns a mapping from environment states to behaviors so as to maximize the cumulative reward the system obtains from the environment, finding an optimal behavior strategy through continuous trial and error. In an exemplary embodiment, the path states of the candidate paths are input to a neural network, the potential waiting time of each path is then evaluated through reinforcement learning, and the path corresponding to the shortest potential waiting time is returned as the result and taken as the target path. The target path is provided to the source routing device 20, which thereby completes the path selection. In embodiments, the reinforcement learning method may include algorithms such as Q-Learning, Deep Q Network, or variants of these algorithms.
In an embodiment, the calculation with the reinforcement learning model proceeds as follows: the path state of each candidate path is input to a neural network, which calculates a utility value for that candidate path. The utility value evaluates the usefulness of the candidate path: a higher utility value means the candidate path is more useful, i.e., more "superior".
In machine learning, an algorithm often falls into a local optimum. In a real, complex network, a local optimum means that a solution is optimal only within a certain range or region, or that the means of solving the problem or reaching the goal is optimal only within certain limits. Here it means that the reinforcement learning model places too much emphasis on "exploiting" the currently best path selections while neglecting "exploration" of all paths in the network. This situation easily reduces path utilization.
Therefore, in this embodiment, the candidate path with the maximum utility value is determined as the target path with a preset first probability, and the information of the target path is written into the data packet to be sent to a switch in the network; alternatively, the target path is determined using a random strategy with a preset second probability, and its information is likewise written into the data packet to be sent to a switch in the network, where the first probability is greater than the second probability.
In the embodiment, a first probability and a second probability are preset. Once the utility value of each candidate path has been calculated, the candidate path with the maximum utility value is selected from the multiple candidate paths as the target path with the first probability, or a candidate path is selected as the target path by a random strategy with the second probability. In this embodiment, the first probability is greater than the second probability, and the two sum to 1. That is, when selecting a target path, the neural network selects by comparing utility values with a large probability, and uses a random strategy with a small probability. For example, with the first probability preset to 80% and the second probability to 20%, the candidate path with the maximum utility value is selected as the target path with a probability of 80%, and a candidate path is selected at random with a probability of 20%. This is not limiting: in other embodiments, the first probability may be preset to 90% and the second probability to 10%, so that the candidate path with the largest utility value is selected as the target path with a probability of 90%, and a random selection among the candidate paths is made with a probability of 10%. Presetting the first and second probabilities in this way effectively prevents the algorithm from falling into a local optimum and provides diversity in the target path selection.
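The first/second-probability scheme is the classic epsilon-greedy trade-off; a minimal sketch (the function name and the 0.8 default are illustrative):

```python
import random

def select_path(utilities, first_prob=0.8, rng=random):
    """With the first probability, pick the highest-utility candidate;
    with the complementary second probability, pick uniformly at random."""
    if rng.random() < first_prob:
        return max(range(len(utilities)), key=utilities.__getitem__)
    return rng.randrange(len(utilities))
```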
In an embodiment, the random strategy comprises uniform random selection or a random strategy that makes a decision based on a Max-Boltzmann distribution over the utility values. Uniform random selection means that each candidate path has an equal probability of being selected as the target path.
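A sketch of the Max-Boltzmann strategy: sample a candidate with probability proportional to exp(utility / temperature). The temperature parameter is an assumption not stated in the patent; lower values concentrate probability mass on the best path.

```python
import math, random

def boltzmann_choice(utilities, temperature=1.0, rng=random):
    """Softmax sampling over utility values; subtracting the max
    keeps the exponentials numerically stable."""
    m = max(utilities)
    weights = [math.exp((u - m) / temperature) for u in utilities]
    return rng.choices(range(len(utilities)), weights=weights, k=1)[0]
```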
In an embodiment, the network interface 201 is further configured to update the calculation weights of the reinforcement learning model in the processor from a training network according to a preset period. Referring to fig. 4, which is a schematic structural diagram of a reinforcement learning model in an embodiment of the source routing method of the present application, the updating process includes: taking the path decision state ta at the last moment, the current candidate path state S', and the current instant reward r observed in the flow states of the network as the experience with which the training neural network refines its estimate of the utility value. The last-moment path decision state ta refers to the path state of the most recently determined target path a; the current candidate path state S' refers to all candidate path states Si acquired during the decision; and the current instant reward r refers to the reward obtained during the decision. In this embodiment, the training network considers not only the current candidate path state S' but also historical input information, such as the path decision state ta at the previous moment, which helps the training network converge.
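The experience-driven update described above is, in spirit, a temporal-difference step; one illustrative update (the learning rate and discount factor are assumptions, not values from the patent):

```python
def td_update(q_old, reward, q_next_max, alpha=0.1, gamma=0.9):
    """One temporal-difference step: move the stored utility estimate for the
    last decision toward reward + gamma * (best utility among current candidates)."""
    return q_old + alpha * (reward + gamma * q_next_max - q_old)

# an estimate of 0.0, a reward of 1.0, and no future value move the estimate toward 0.1
print(td_update(0.0, 1.0, 0.0))
```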
In an embodiment, the process of updating the calculation weights of the reinforcement learning model from a training network according to a preset period includes taking the path decision state at the last moment, the current candidate path state, and the current instant reward observed in the flow states of the network as the experience with which the neural network's estimate of the utility values is trained.
In an embodiment, the source routing device 20 updates the computation weight of the reinforcement learning model from an external training network according to a preset period, for example, according to a period of every 1 hour or every 24 hours, where the external training network may be a device dedicated to the training network, such as a physical server or a cloud server, and the physical server or the cloud server transmits the trained network to the source routing device 20 to update the computation weight of the reinforcement learning model. Of course, the training network is not limited to this, and in one case, the training network may also be a training module or training device built in the source routing apparatus 20, such as an electronic device or a circuit board or the like built in the source routing apparatus 20 and including a neural network chip.
In the embodiment, the same neural network makes the decision for all path states: the path state of each candidate path is input to the neural network, which returns an output value through calculation, and the output value identifies the target path. For example, please refer to fig. 5, which is a schematic structural diagram of a neural network according to an embodiment of the source routing method of the present application; as shown in the figure, the neural network includes an input layer, hidden layers, and an output layer. By way of example, the neural network includes an input layer L1, a first hidden layer L2, a second hidden layer L3, and an output layer L4, where the two hidden layers have eight neurons in total. The figure shows four candidate path states S1, S2, S3, S4 input to the neural network as the input layer (four candidate paths serve only as an example; the number of candidate paths is not limited). The candidate path states share weights; the neural network calculates over the candidate path states and determines a target path a0 as the output value.
In an embodiment, the source routing device 20 determines the local synchronization information and the global asynchronous information as the path states of the plurality of candidate paths. In the path decision process using the neural network shown in fig. 5, if the path state of each candidate path includes global asynchronous information with 5 historical durations (or 5 historical versions), then the path state of each candidate path includes: the RTT information of the candidate path, the maximum link utilization rate of the candidate path in each of the 5 historical durations, and the average link utilization rate of the candidate path in each of the 5 historical durations, i.e., 11-dimensional data in total as the input of the neural network shown in fig. 5. If 3 candidate paths were determined beforehand, the input to the neural network is 3 × 11-dimensional data; if 12 candidate paths were determined, the input is 12 × 11-dimensional data.
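The shared-weight scoring can be sketched in pure Python (layer sizes here are illustrative and smaller than the fig. 5 network; the essential point is that the same weights score every candidate, so the candidate count can vary freely):

```python
import math, random

def mlp_score(state, layers):
    """Run one candidate's state vector (e.g. the 11-dim RTT + 2x5
    link-utilization features) through a small tanh MLP; return a scalar utility."""
    h = state
    for weights, biases in layers:
        h = [math.tanh(sum(x * w for x, w in zip(h, neuron)) + b)
             for neuron, b in zip(weights, biases)]
    return h[0]

def decide(states, layers):
    """Score every candidate with the SAME weights and return the argmax index."""
    return max(range(len(states)), key=lambda i: mlp_score(states[i], layers))

def init_layer(n_in, n_out, rng):
    """Random weights for one fully connected layer (illustrative initialization)."""
    return ([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)
```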
The neural network is divided into a decision part and a training part: the training part receives experience from the decision part, updates the neural network accordingly, and periodically sends the updated neural network parameters to the decision part. The neural network parameters include the number of network nodes, initial weights, training rate, dynamic parameters, allowable error, iteration count, activation functions, and the like. In a data center, all hosts may share the same neural network parameters, and all hosts may also share the same reinforcement learning model. This design enables parallel decisions and helps the reinforcement learning model gather more experience and more accurate rewards, thereby improving the accuracy of the algorithm.
In the embodiment of the source routing device 20, the local synchronous information and the global asynchronous information are combined and the path decision is made according to both, so that the target path information is obtained and tail latency can be effectively avoided.
The application also provides a source routing system for combining local synchronous information and global asynchronous information in a network to perform path decision to optimize path selection and reduce delay. It should be understood that source routing is a routing technology for routing based on a source address, and can implement a function of selectively sending data packets to different destination addresses according to a plurality of different subnet or intranet addresses. Source routing may be used in different network architectures including internet protocol networks, multi-protocol label switching networks, asynchronous transfer mode networks, software defined network based networks, and any other suitable network architecture. The network refers to a network composed of devices that can communicate with each other. In general, the network refers to the internet, but the embodiments of the present application are not limited thereto, and may also include, for example, an intranet, a local area network, a wide area network, a metropolitan area network, and the like. Referring to fig. 8, which is a block diagram of a source routing system of the present application, in one embodiment, the source routing system 50 includes: a network module 501, a local cache module 502, and a route decision module 503.
In an embodiment, the source routing system can be implemented in hardware, software, firmware, or any combination thereof. When implemented, the module functions may be embodied as processor-executable function modules in a general-purpose computer architecture, located on tangible, non-transitory computer-readable and writable storage media, which may be any available media accessible by a computer.
The network module 501 is configured to determine a plurality of candidate paths when receiving at least one packet from a host 10 of a network, observe a flow state of the network in real time, and send a packet containing target path information to a switch 40 in the network; in an embodiment, the network module 501 may be, for example, a device such as a physical server, a modem, a network switch, an intelligent network card, or a hardware router, or a network module or a network card device installed on the electronic terminal, the network host, the network system, or the network server, and for example, the network module may also be an Agent (Agent) in a dumbnt environment.
In this embodiment, the host 10 sends a data packet to the network module 501, and the network module 501 determines a plurality of candidate paths according to the received data packet. Meanwhile, the network module 501 monitors real-time network information through a network device configured therein, where in an embodiment, the network information is sent to the network module 501 as a streaming status, and the network device is, for example, a network card device.
It should be understood that the data packet is the basic unit of transmission data. The data packet mainly comprises a destination IP address, a source IP address, payload data and the like. An IP Address (Internet Protocol Address) is a set of numbers used in a network to uniquely identify a device or node, and the devices or nodes communicate with each other using an IP Protocol. The destination IP address is used for representing the address of a receiver of the data packet, the source IP address is used for representing the address of a sender of the data packet, and the payload data is used for representing data content.
It should be understood that the path refers to a flow path or a transmission path of a data packet in a network. There may be multiple paths for the host to send the data packet to the receiver, and the network module 501 takes the paths that can pass from the host to the receiver as candidate paths. The path is composed of at least two or more links (links).
In this application, the at least one data packet may be a single data packet or a plurality of data packets, where a plurality of data packets form a small flow (Flowlet) and a plurality of small flows form a data flow (Flow). That is, a small flow may be understood as a packet group consisting of multiple packets transmitted consecutively within one data flow, and each Flow includes multiple Flowlets. In the embodiments of the present application, the terms "data flow", "network flow", and the like are used interchangeably to refer to a "Flow" in data communication transmission; the terms "small flow", "droplet", and the like are used interchangeably to refer to a "Flowlet" making up a Flow. The flow states (Flow States) represent network state information, which includes information such as data transmission volume, throughput, transmission rate, and bandwidth.
In an exemplary embodiment, the network module 501 determines a plurality of candidate paths when receiving a first packet from a host configured as a previous node in the network, while observing the flow state of the network in real time. In this embodiment, the first data packet refers to the first data packet of a data flow (Flow) received when the network module 501 receives, for the first time, a data flow comprising one data packet, multiple continuous data packets, or multiple discontinuous data packets from a host configured as a previous node in the network. For example, during data transmission, a longer time interval (e.g., 1 s, 10 s, 1 min, or 10 min) may exist between two data flows transmitted sequentially between the network module 501 and that host; when the network module 501 then receives the first data packet of the next data flow, it determines a plurality of candidate paths while observing the flow state of the network in real time. In other words, the first packet refers to the first packet in a given data flow sent by the host configured as the previous node in the network.
In another exemplary embodiment, the network module 501 determines a plurality of candidate paths when receiving a first packet from a host configured as a previous node in the network, while observing the flow state of the network in real time. In this embodiment, the first packet refers to the first packet of each small flow (Flowlet) received when the network module 501 receives, for the first time, a small flow comprising a plurality of packets from a host configured as a previous node in the network.
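Detecting the "first packet" of a flow or flowlet by the inactivity gap can be sketched as follows (the 0.5 s gap and the class name are assumptions for illustration):

```python
FLOWLET_GAP_S = 0.5  # hypothetical inactivity gap that opens a new flowlet

class FlowletDetector:
    """Reports True when a packet starts a new flow or flowlet, i.e. when
    the flow is unseen or its previous packet is older than the gap."""

    def __init__(self, gap_s: float = FLOWLET_GAP_S):
        self.gap_s = gap_s
        self.last_seen: dict = {}

    def is_first_packet(self, flow_id: str, now: float) -> bool:
        prev = self.last_seen.get(flow_id)
        self.last_seen[flow_id] = now
        return prev is None or now - prev > self.gap_s
```

Path selection would then be re-run only for packets where is_first_packet returns True; the remaining packets of a flowlet simply follow the already-chosen path.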
The local cache module 502 is configured to determine state information related to a plurality of observed paths in the observed flow state as local synchronization information; in an embodiment, the network module 501 monitors real-time network information through a network, and the network information is sent to the local cache module 502 as a flow state. The local caching module 502 determines state information associated with a plurality of observation paths among the observed flow states as local synchronization information.
In an embodiment, the local cache module 502 is, for example, a host computer having a storage medium that may include read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a usb disk, a removable hard disk, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed; alternatively, the local caching module 502 is, for example, a Host Agent (Host Agent).
It should be appreciated that there are often multiple paths for data to travel through the network, but not all paths are available. For example, a device failure or node failure along a path may render the path containing that device or node unavailable. Alternatively, some paths may be unavailable because of high delay, congested traffic, and so on.
In the embodiment of the present application, the observation path includes the following: in an embodiment, the observed path comprises a plurality of the candidate paths; in another embodiment, the observed path may in turn comprise an available path or a historical path with recorded information; in yet another embodiment, the observed path includes both a plurality of candidate paths and an available path or historical path having recorded information.
The local synchronization information includes RTT (Round-trip Time) information. The RTT refers to a total delay experienced by a transmitting end from transmitting data to receiving an acknowledgement from a receiving end. In an embodiment, the local cache module 502 stores the determined local synchronization information, for example, stores the local synchronization information in a cache area (Buffer) in a local memory, cache or other storage medium.
Wherein the state information related to the plurality of observed paths comprises one or more of RTT information, path length information, path usage history information and path congestion information of each observed path. As mentioned above, the RTT information of the observation path refers to round trip delay information on the observation path, and is not described herein again.
The path length information refers to the sum of the costs of the links an observed path traverses. Other routing protocols define it as the hop count, i.e., the number of network devices, such as routers, that a packet must traverse on its way from source to destination. The path usage history information includes historical usage information and the transmission patterns of historical usage on the observed path. For example, if host a and host b each use a certain observed path, their usage information constitutes the usage history information of that path. The transmission pattern refers to the transmission mode of a data packet, which may differ according to the type of data encapsulated in it. For example, a data packet may use a breakpoint-resume transmission mode or a continuous transmission mode; likewise, when the encapsulated data is voice data, the transmission mode differs from that used when the encapsulated data is text data.
The path congestion information relates to routing delay, i.e., the time it takes a packet to travel from the source through the network to the destination. In actual network data transmission, many factors affect this delay, such as the bandwidth of the intermediate network paths, the port queues of the routers traversed, the degree of congestion on all intermediate network paths, and the physical distance.
In an embodiment, the state information related to the plurality of observation paths may further include path reliability information, path bandwidth information, load information, communication cost, and the like. Path reliability represents the dependability of the paths in the network (usually described in terms of bit error rate); in general, it refers to the reliability of the least reliable path among the paths. The path bandwidth information refers to the maximum throughput capacity of the path. The load information refers to how busy a network resource, such as a router, is.
The route decision module 503 performs a path decision according to the local synchronization information and the obtained global asynchronous information, so as to output a data packet containing target path information to the network module 501. In an embodiment, the local cache module 502 obtains global asynchronous information from a network, and then performs a path decision according to the global asynchronous information and the local synchronous information cached in the local cache module 502, so as to obtain target path information. The route decision module 503 then encapsulates the target path information and the data packet together, thereby obtaining a data packet containing the target path information. The network module 501 sends the data packet containing the target path information to a switch in the network to complete path selection, where the switch supports source routing. In the description of the embodiments of the present application, the term Global Asynchronous Information may also be referred to as Asynchronous Global Information (Asynchronous Global Information).
In an embodiment, the route decision module 503 is a device or software component capable of performing route decision calculation. The device is, for example, a physical server, a modem, a network switch, an intelligent network interface card, or a hardware router, or a combination of software and hardware with neural network computing capability installed on an electronic terminal, a network host, a network system, or a network server. The route decision module 503 may also be a cloud server or a server cluster configured at a remote location.
As mentioned earlier, the local synchronization information refers to real-time network information reflecting the current state from a local perspective; correspondingly, the global asynchronous information refers to long-term aggregated network information provided from a global perspective, reflecting the cumulative state of the network. In source routing, the path selection policy is crucial, and different policies may lead to markedly different results. Since local synchronous information and global asynchronous information have different attributes and describe different path characteristics, combining the two allows better path selection. In embodiments, after the local synchronous information and the global asynchronous information are obtained, the method may further include storing them, for example, in a buffer on a storage medium of a local memory space or cache space. In embodiments, the storage medium may include a read-only memory, a random access memory, an EEPROM, a CD-ROM or other optical disk storage device, a magnetic disk storage device or other magnetic storage device, a flash memory, a USB flash drive, a removable hard disk, or any other medium that can be used to store and access the desired program code in the form of instructions or data structures.
In one exemplary embodiment, the global asynchronous information includes path utilization information, which is calculated from the acquired bandwidth and data transmission volume of the path.
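As a rough sketch of how such utilization might be derived from bandwidth and transmitted data volume (the function name and the clamping behavior are illustrative assumptions, not part of the embodiment):

```python
def link_utilization(bytes_sent: int, bandwidth_bps: float, interval_s: float) -> float:
    """Fraction of a link's nominal capacity consumed during one observation interval."""
    capacity_bytes = bandwidth_bps / 8.0 * interval_s  # bits/s -> bytes available per interval
    return min(bytes_sent / capacity_bytes, 1.0)       # clamp: short bursts may exceed nominal capacity

# 500 MB observed on a 10 Gbit/s link over 1 s corresponds to 40% utilization
print(link_utilization(500_000_000, 10e9, 1.0))
```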
In an embodiment, the route decision module 503 extracts the global asynchronous information from the local cache module 502. The process by which the local cache module 502 obtains the global asynchronous information includes: sending the flow state to a global Aggregator 30 in the network through the network module 501. The global aggregator 30 aggregates the flow states sent by all hosts in the network and calculates the usage state of each link in each path included in the flow state information; it then updates the global asynchronous information according to the usage state of each link and sends the updated global asynchronous information to each host (including terminal hosts) in the network at a preset frequency. The local cache module 502 receives the global asynchronous information updated by the global aggregator 30 at that preset frequency. In an embodiment, the global aggregator 30 is, for example, a physical server whose bottom layer is a hardware layer, the hardware layer mainly including hardware resources such as a central processing unit, a memory, a hard disk, and a network card.
As in the above embodiment, the network module 501 sends the observed flow state to the global aggregator 30 in the network, and the global aggregator 30 updates the global asynchronous information according to the received flow state. The network module 501 receives the global asynchronous information updated and sent by the global aggregator 30 at a certain frequency. In an embodiment, the global aggregator 30 receives the flow state sent by each network node in the network and sends the updated global asynchronous information to each network node at a preset frequency. The global aggregator 30 may also receive the flow states sent by all network nodes, integrate the global network information to update the global asynchronous information, and then send the updated global asynchronous information to all network nodes at the preset frequency. The frequency is preset and may be configured manually in advance. For example, in a data center network with 1000 hosts, if the hosts report network traffic at a small interval, the global aggregator 30 will consume a large amount of bandwidth. To avoid such network overhead, the preset frequency may be set to once every 500 ms, so that the global aggregator 30 sends the updated global asynchronous information to each network node every 500 ms. The frequency is not limited thereto: in different implementations, the preset interval may be shorter (e.g., 200 ms) or longer (e.g., 1 s) depending on the number of hosts in the network or on the network load.
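The report-and-broadcast cycle described above can be sketched as follows; the class layout, method names, and the handling of the 500 ms interval are illustrative assumptions:

```python
from collections import defaultdict

class GlobalAggregator:
    """Sketch of a global aggregator: hosts report per-link byte counts from
    their observed flow states, and an updated per-link utilization map is
    computed once per broadcast tick (e.g. every 500 ms)."""

    def __init__(self, link_bandwidth_bps, interval_s=0.5):
        self.bw = link_bandwidth_bps      # nominal capacity of every link, bits/s
        self.interval = interval_s        # preset broadcast frequency
        self.bytes = defaultdict(int)     # bytes observed per link since the last tick

    def report_flow_state(self, link_bytes):
        # Aggregate flow states arriving from any host in the network.
        for link, b in link_bytes.items():
            self.bytes[link] += b

    def broadcast_tick(self):
        """Compute per-link utilization for this interval and reset the counters;
        the returned map is what would be sent to every network node."""
        util = {link: min(b / (self.bw[link] / 8.0 * self.interval), 1.0)
                for link, b in self.bytes.items()}
        self.bytes.clear()
        return util
```

For instance, on a 10 Gbit/s link with a 500 ms tick, 125 MB of reported traffic corresponds to 20% utilization.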
In an embodiment, after receiving the updated global asynchronous information, the network module 501 temporarily stores it locally, that is, in a buffer on a storage medium of the local cache module 502, in a memory space or a cache space.
In an embodiment, the global asynchronous information includes the utilization rate of each link globally within at least one preset historical duration (or historical version). In an exemplary embodiment, if the preset historical duration is 10 s, the end host sends all the flow states observed within those 10 s to the global aggregator 30; the global aggregator 30 integrates and updates the utilization rate of each link in the network within those 10 s, and then sends the updated global asynchronous information, including the utilization rate of each link over the 10 s historical duration, to each network node.
In another exemplary embodiment, the global asynchronous information includes the utilization rate of each link globally within a plurality of preset historical durations, at least two of which are different. In this embodiment, the preset historical durations are divided in a sliced manner: for example, a first historical duration covers the most recent 10 s, a second historical duration the 60 s before that, a further historical duration the 10 min before that, and so on. That is, the global asynchronous information includes the utilization rate of each link globally within a plurality of different preset historical durations, for example 5 preset time periods; the global asynchronous information acquired by the terminal host then includes the utilization rate of each link globally within those 5 preset historical durations.
In an embodiment, the process of obtaining the utilization rate of each candidate path from the global per-link utilization information comprises: the local caching module 502 determines the utilization rate of each candidate path according to the maximum value and the average value of the utilization rates of the links in each candidate path within at least one preset historical duration. In this embodiment, the local cache module 502 obtains the utilization rate of each link in each candidate path over one or more preset durations through a matrix operation: for each path and each preset historical duration, it collects the utilization rate of every link on the path, takes the maximum of these link utilizations as one utilization measure of the candidate path, and takes their mean as another. The utilization rate of each candidate path is thereby obtained.
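A minimal matrix-style sketch of this max/mean computation (using NumPy; the array shapes and names are assumptions for illustration):

```python
import numpy as np

def candidate_path_features(link_util: np.ndarray, paths: list) -> list:
    """link_util: (num_durations, num_links) per-link utilization, one row per
    preset historical duration. paths: one list of link indices per candidate
    path. Returns (max, mean) link utilization per duration for every path."""
    features = []
    for path in paths:
        on_path = link_util[:, path]              # (num_durations, links_in_path)
        features.append((on_path.max(axis=1),     # bottleneck link per duration
                         on_path.mean(axis=1)))   # average load per duration
    return features
```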
The route decision module 503 is configured to take the local synchronization information and the global asynchronous information as the path states of the multiple candidate paths, and to compute on these path states as input to generate a data packet including target path information. In an embodiment, the local caching module 502 determines the local synchronous information, including RTT information, and the global asynchronous information, including the utilization rate of each candidate path, as the path states of the candidate paths. The route decision module 503 takes the path states of the candidate paths as input, computes a target path, and writes the information of the target path into the data packet to be sent to a switch in the network.
In an embodiment, the routing policy module takes a plurality of local synchronous information and global asynchronous information as the path states of the corresponding candidate paths, and computes a target path from these path states according to a certain algorithm or model. Here the global asynchronous information refers to the obtained candidate path utilization rates. The target path is a superior path among the candidate paths. "Superior" here means that the path is not only available, but that one or more of its parameters are better than those of the other candidate paths, such as lower delay, no or less congestion, shorter path length, or more favorable utilization. In an embodiment, the target path may be obtained by computing on the path states through, for example, a link state routing algorithm.
After the target path is obtained, the information of the target path is written into the data packet, and the data packet is sent to a switch in the network. After receiving the data packet, the switch transmits the data along the target path according to the target path information carried in the packet.
In an embodiment, the target path may also be obtained by computing on the path states using a neural network and machine learning. With the continuous development of machine learning and neural network technology, such methods are now widely applied in the communication field. An actual network, however, is huge in scale, and its network information is complicated and highly variable: there are a large number of paths in the network, and each path has its own, changing state information. Therefore, when applying machine learning or neural network techniques, the prior art faces the challenge of handling the information of a large number of candidate paths as it changes over time.
In an exemplary embodiment of the present application, the route decision module 503 computes on the path states of the candidate paths as input using a reinforcement learning model to obtain the path decision and generate the data packet containing the target path information. The reinforcement learning model learns through interaction with the network environment and thereby improves its own path selection; compared with other models such as supervised learning models, it requires less computation, computes faster, and achieves higher efficiency and performance.
For convenience of description, the reinforcement learning model may be divided into a reinforcement learning part, which accumulates experience from historical path selections and the corresponding delay outcomes and learns from it, and a neural network part, which processes the network information and captures the relationship between that information and the path selection result, so that the whole path selection requires no manual calculation or manually set rules.
A neural network is a mathematical or computational model that mimics the structure and function of biological neural networks. In embodiments, the neural network may mainly include a supervised learning network, an unsupervised learning network, or the like; in embodiments, the neural network may include a feed-forward neural network, a recurrent neural network, an augmented neural network, and the like.
It should be understood that reinforcement learning is a machine learning method widely applied in fields such as intelligent control, robotics, and multi-agent systems. Reinforcement learning learns a mapping from environment states to behaviors so as to maximize the cumulative reward that the system's behavior obtains from the environment, finding an optimal behavior policy through continuous trial and error. In an exemplary embodiment, the path states of the candidate paths are input to a neural network, the potential waiting time of each path is evaluated through reinforcement learning, and the path with the shortest potential waiting time is returned as the result and taken as the target path. The target path is sent to the source routing device 20, which completes the path selection. In embodiments, the reinforcement learning method may include algorithms such as Q-Learning, Deep Q Network, or variations of these algorithms.
In an embodiment, the above calculation using the reinforcement learning model proceeds as follows: the path states of the candidate paths are respectively input to a neural network, which computes a utility value for each candidate path. The utility value evaluates the usefulness of the candidate path: the higher the utility value, the more useful, or the more "superior", the candidate path.
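As an illustration of per-candidate utility scoring, the following sketch applies one small network, with shared weights, to every candidate path state; the layer sizes, the randomly drawn weights, and the 11-dimensional state are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared weights: a two-layer scorer mapping an 11-dimensional
# path state to a single scalar utility value.
W1, b1 = 0.1 * rng.normal(size=(11, 8)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(8, 1)), np.zeros(1)

def utility_values(path_states: np.ndarray) -> np.ndarray:
    """path_states: (num_candidates, 11). Because the weights are shared,
    the same network scores any number of candidate paths."""
    hidden = np.maximum(path_states @ W1 + b1, 0.0)  # ReLU hidden layer
    return (hidden @ W2 + b2).ravel()                # one utility per candidate
```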
In machine learning, an algorithm often falls into a local optimum. In a real, complex network, a local optimum means that a solution is optimal only within a certain range or area, or that a means of solving the problem or achieving the goal is optimal only within certain limits. Here, a local optimum means that the reinforcement learning model places too much emphasis on "exploitation" of the currently best-looking path selections while neglecting "exploration" of all paths in the network. This situation easily causes a decrease in path utilization.
Therefore, in consideration of this situation, in this embodiment, the candidate path with the maximum utility value is determined as the target path according to the preset first probability, and the information of the target path is written into the data packet to be sent to a switch in the network; or determining the target path by using a random strategy according to a preset second probability, and writing the information of the target path into the data packet to be sent to a switch in the network, wherein the first probability is greater than the second probability.
In this embodiment, a first probability and a second probability are preset. When the utility value of each candidate path has been computed, the candidate path with the maximum utility value is selected from the candidate paths as the target path according to the first probability, or a candidate path is selected as the target path using a random strategy according to the second probability. The first probability is greater than the second probability, and the two sum to 1. That is, when selecting a target path, the neural network selects by comparing utility values with a large probability, and selects using a random strategy with a small probability. For example, with a preset first probability of 80% and second probability of 20%, the candidate path with the maximum utility value is selected as the target path with a probability of 80%, and a candidate path is selected at random with a probability of 20%. The probabilities are not limited thereto: in other embodiments, the first probability may be preset to 90% and the second to 10%, so that the candidate path with the largest utility value is selected with a probability of 90% and a random selection among the candidate paths is made with a probability of 10%. Presetting the first and second probabilities in this way effectively prevents the algorithm from falling into a local optimum and provides diversity in the choice of the target path.
In an embodiment, the random strategy comprises uniform random selection or a random strategy that makes its decision based on a Max-Boltzmann distribution over the utility values. Uniform random selection means that every candidate path has an equal probability of being selected as the target path.
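The two-probability selection rule, together with a Max-Boltzmann fallback, might be sketched as follows (the probability values and the temperature parameter are illustrative):

```python
import math
import random

def select_path(utilities, first_prob=0.8, temperature=1.0, rng=random):
    """With the first probability, pick the candidate with the maximum utility
    value; otherwise fall back to a random strategy -- here a Max-Boltzmann
    draw, in which higher-utility paths keep a proportionally higher chance."""
    if rng.random() < first_prob:
        return max(range(len(utilities)), key=lambda i: utilities[i])
    weights = [math.exp(u / temperature) for u in utilities]
    r, acc = rng.random() * sum(weights), 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(utilities) - 1   # numerical safety net
```

Replacing the Boltzmann draw with `rng.randrange(len(utilities))` would give the uniform random variant.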
In an embodiment, the network module 501 is further configured to update the computation weights of the reinforcement learning model in the route decision module 503 from a training network according to a preset period. Referring to fig. 4, which is a schematic structural diagram of a reinforcement learning model in an embodiment of the source routing method of the present application, the updating process includes: observing, in the flow states of the network, the last path decision state ta, the current candidate path states S', and the current immediate reward r, which serve as the experience for training the neural network's estimate of the utility values. The last path decision state ta refers to the path state of the last determined target path a; the current candidate path states S' refer to all candidate path states Si acquired during the decision; and the current immediate reward r refers to the reward obtained during the decision. In this embodiment, the training network uses not only the current candidate path states S' but also historical input information, such as the previous path decision state ta, which helps the training result converge.
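A minimal Q-learning-style sketch of the update implied by the tuple (ta, S', r); the learning rate and discount factor are hypothetical hyperparameters, not values from the embodiment:

```python
def q_update(q_prev: float, reward: float, q_next_max: float,
             lr: float = 0.1, gamma: float = 0.9) -> float:
    """Nudge the utility estimate of the previous path decision toward the
    immediate reward r plus the discounted best utility found among the
    current candidate path states S'."""
    target = reward + gamma * q_next_max
    return q_prev + lr * (target - q_prev)
```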
In an embodiment, the process of updating the computation weights of the reinforcement learning model from a training network according to a preset period includes observing, in the flow states of the network, the last path decision state, the current candidate path states, and the current immediate reward as the experience for training the neural network's estimate of the utility values.
In an embodiment, the network module 501 updates the computation weights of the reinforcement learning model from an external training network according to a preset period, for example every 1 hour or every 24 hours. The external training network may be a device dedicated to training, such as a physical server or a cloud server, which transmits the trained network to the route decision module 503 to update the computation weights of the reinforcement learning model. The training network may also be a training module or training device built into the source routing system, such as an electronic device or a circuit board containing a neural network chip.
In this embodiment, the same neural network is used to make the decision over all path states: the path state of each candidate path is input to the neural network, and an output value, the target path, is returned through the network's computation. For example, referring to fig. 5, which is a schematic diagram of the structure of a neural network in an embodiment of the source routing method of the present application, the neural network includes an input layer L1, a first hidden layer L2, a second hidden layer L3, and an output layer L4, where the two hidden layers have eight neurons in total. The figure shows four candidate path states S1, S2, S3, S4 input to the network (four candidate paths are used here only as an example; the number of candidate paths is not limited). Within the neural network, the candidate path states share weights; the network computes on the candidate path states and determines a target path a0 as the output value.
In an embodiment, the source routing system determines the local synchronization information and the global asynchronous information as the path states of the plurality of candidate paths. In the process of making a path decision with the neural network shown in fig. 5, if the path state of each candidate path includes global asynchronous information over 5 historical durations (or 5 historical versions), the path state of each candidate path includes: the RTT information of the candidate path, the maximum link utilization in the candidate path over each of the 5 historical durations, and the average link utilization in the candidate path over each of the 5 historical durations, for a total of 11-dimensional data as the input of the neural network shown in fig. 5. If 3 candidate paths were determined beforehand, the input of the neural network is 3 × 11-dimensional data; if 12 candidate paths were determined, the input is 12 × 11-dimensional data.
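The 11-dimensional path state described above can be assembled as follows (the function name and the sample values are illustrative):

```python
import numpy as np

def build_path_state(rtt_s, max_util, mean_util):
    """One candidate path's state: RTT (1 value) plus the max and mean link
    utilization over 5 preset historical durations (5 + 5 values) = 11 dims."""
    assert len(max_util) == 5 and len(mean_util) == 5
    return np.concatenate(([rtt_s], max_util, mean_util))

# Three candidate paths -> a 3 x 11 input matrix for the decision network
states = np.stack([
    build_path_state(0.004, [0.9, 0.7, 0.6, 0.5, 0.5], [0.4, 0.3, 0.3, 0.2, 0.2]),
    build_path_state(0.002, [0.3, 0.4, 0.4, 0.5, 0.6], [0.1, 0.2, 0.2, 0.3, 0.3]),
    build_path_state(0.006, [0.8, 0.8, 0.7, 0.7, 0.6], [0.5, 0.5, 0.4, 0.4, 0.3]),
])
```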
The neural network is divided into a decision part and a training part: the training part receives experience from the decision part, updates the neural network according to that experience, and periodically sends the updated neural network parameters to the decision part. The neural network parameters include the number of network nodes, the initial weights, the training rate, dynamic parameters, the allowable error, the number of iterations, the activation functions, and the like. In a data center, all hosts may share the same neural network parameters, and all hosts may also share the same reinforcement learning model. This design enables parallel decisions and helps the reinforcement learning model accumulate more experience and more accurate rewards, thereby improving the accuracy of the algorithm.
According to the embodiment of the source routing system, the local synchronous information and the global asynchronous information are combined, and the path decision is carried out according to the local synchronous information and the global asynchronous information, so that the target path information is obtained, and the tail delay can be effectively avoided.
Referring to fig. 9, which is a block diagram of a computer device in an embodiment of the present invention, the computer device 60 includes a network card device 601; a memory 602 for a computer program for source routing; and one or more processors 603. In an embodiment, the computer device is, for example, a physical server whose bottom layer is a hardware layer, the hardware layer mainly including hardware resources such as a central processing unit, a memory, a hard disk, and a network card.
In an embodiment, the processor is used to invoke the computer program of source routing in the memory to perform the source routing method of the above embodiments described with respect to fig. 1 to 4.
The present application also provides a computer readable and writable storage medium storing a computer program of a source routing method, which, when executed, implements the source routing method of the above embodiments described with respect to fig. 1 to 4.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application.
In the embodiments provided herein, the computer-readable and writable storage medium may include read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a USB flash drive, a removable hard disk, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable-writable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are intended to be non-transitory, tangible storage media. Disk and disc, as used in this application, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
In one or more exemplary aspects, the functions described in the computer program of the source routing method described herein may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. The steps of a method or algorithm disclosed herein may be embodied in a processor-executable software module, which may be located on a tangible, non-transitory computer-readable and/or writable storage medium. Tangible, non-transitory computer readable and writable storage media may be any available media that can be accessed by a computer.
The flowcharts and block diagrams in the figures described above of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
According to the source routing method, the source routing system, the source routing device, the computer device, and the computer readable storage medium of the present application, the local synchronous information and the global asynchronous information are combined, and the path decision is made according to both, so that the target path information is obtained and the routing efficiency and transmission rate can be effectively improved.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (45)

1. A source routing method, comprising the steps of:
determining a plurality of candidate paths when at least one data packet is received from a host of a network, and observing the flow state of the network in real time;
determining state information related to a plurality of observed paths in the observed flow states as local synchronization information;
and performing path decision according to the local synchronous information and the acquired global asynchronous information to output a data packet containing target path information to a switch in the network.
2. The source routing method of claim 1, wherein the plurality of observed paths comprises the plurality of candidate paths and/or available paths with recorded information.
3. The source routing method according to claim 1 or 2, wherein the state information related to the plurality of observed paths comprises RTT information, path length information, path usage history information, and path congestion information of each observed path.
4. The source routing method of claim 1, wherein the step of obtaining global asynchronous information comprises:
sending the flow state to a global aggregator in the network;
and receiving the global asynchronous information updated by the global aggregator according to a preset frequency.
5. The source routing method as claimed in claim 4, wherein the global aggregator receives the flow status sent by each network node in the network and sends the updated global asynchronous information to each network node in the network according to a predetermined frequency.
6. The source routing method of claim 4, wherein the global asynchronous information comprises information of utilization of each link globally for at least a preset historical duration.
7. The source routing method of claim 6, further comprising the step of obtaining the utilization of each candidate path according to the information of the utilization of each link globally: determining the utilization rate of each candidate path according to the maximum value and the average value of the utilization rate of the links in each candidate path in a preset duration within at least one preset historical duration.
8. The source routing method according to claim 6 or 7, wherein the at least one predetermined historical duration is a plurality of predetermined historical durations, at least two of the predetermined historical durations being different.
9. The source routing method of claim 1, wherein the step of making a path decision according to the local synchronization information and the obtained global asynchronous information comprises:
determining the local synchronous information and the global asynchronous information as the path states of the candidate paths;
and performing a calculation with the path states of the candidate paths as input to obtain a target path, and writing the information of the target path into the data packet to be sent to a switch in the network.
10. The source routing method of claim 9, wherein the step of calculating the path states of the candidate paths as input to obtain the path decision comprises calculating the path states of the candidate paths as input using a reinforcement learning model to obtain the path decision.
11. The source routing method of claim 10, wherein the step of computing the path states of the candidate paths as input using a reinforcement learning model to obtain the path decision comprises:
respectively taking the path states of the candidate paths as the input of a neural network to calculate the utility value of each candidate path;
determining the candidate path with the maximum utility value as a target path according to a preset first probability, and writing the information of the target path into the data packet to be sent to a switch in the network; or
And determining the target path by using a random strategy according to a preset second probability, and writing the information of the target path into the data packet to be sent to a switch in the network, wherein the first probability is greater than the second probability.
12. The source routing method according to claim 11, wherein the random strategy comprises a uniform random strategy or a random strategy that makes a decision according to a Max-Boltzmann distribution of the utility values.
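Claims 10-12 describe an epsilon-greedy style decision: with a larger preset first probability, the candidate path with the maximum utility value is chosen; otherwise a random strategy (uniform, or Boltzmann over the utility values) is applied. A hedged sketch of that decision rule follows, not part of the claims; the neural-network utility function is stubbed out as a plain list of values, and the names and defaults are assumptions.

```python
import math
import random

def choose_path(utilities, greedy_prob=0.9, temperature=1.0, rng=random):
    """utilities: one utility value per candidate path (in the claims these
    come from a neural network). greedy_prob stands in for the preset first
    probability, so 1 - greedy_prob is the smaller second probability.
    Returns the index of the chosen candidate path."""
    if rng.random() < greedy_prob:
        # Exploit: the candidate path with the maximum utility value.
        return max(range(len(utilities)), key=lambda i: utilities[i])
    # Explore: draw from a Boltzmann (softmax) distribution over utilities.
    weights = [math.exp(u / temperature) for u in utilities]
    total = sum(weights)
    r, acc = rng.random() * total, 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(utilities) - 1  # guard against floating-point underflow
```

A uniform random strategy would simply replace the Boltzmann draw with `rng.randrange(len(utilities))`.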
13. The source routing method of claim 11, further comprising the step of updating the calculated weights of the reinforcement learning model from a training network according to a predetermined period.
14. The source routing method of claim 13, wherein the step of updating the calculated weights of the reinforcement learning model from a training network according to a predetermined period comprises: using the path decision state at the previous moment, the current candidate path state, and the current immediate reward observed in the flow state of the network as experience for training the neural network's estimate of the utility value.
15. A source routing device, comprising:
the network interface is used for determining a plurality of candidate paths when receiving at least one data packet from a host of a network, observing the flow state of the network in real time and sending the data packet containing target path information to a switch in the network;
a memory for storing local synchronization information determined by observed state information associated with a plurality of observed paths in the flow state;
and the processor is used for carrying out path decision according to the local synchronous information and the acquired global asynchronous information so as to generate a data packet containing target path information.
16. The source routing device of claim 15, wherein the plurality of observed paths comprises the plurality of candidate paths or/and available paths with recorded information.
17. The source routing device of claim 15, wherein the state information related to the plurality of observed paths comprises RTT information, path length information, path usage history information, and path congestion information of each observed path.
18. The source routing device of claim 15, wherein the network interface sends the received stream state to a global aggregator in the network, and stores global asynchronous information updated by the global aggregator into the memory according to a predetermined frequency.
19. The source routing device of claim 18, wherein the global aggregator receives the flow status sent by each network node in the network and sends the updated global asynchronous information to each network node in the network according to a predetermined frequency.
20. The source routing device of claim 15, wherein the global asynchronous information comprises information of utilization of each link globally for at least a preset historical duration.
21. The source routing device of claim 20, wherein the memory is further configured to determine the utilization of each candidate path according to the maximum value and the average value of the utilization of the links in each candidate path over a preset duration within at least one preset historical duration.
22. The source routing device of claim 21, wherein the at least one preset historical duration is a plurality of preset historical durations, and wherein at least two of the preset historical durations are different.
23. The source routing device of claim 15, wherein the processor is configured to determine the local synchronization information and the global asynchronous information as the path states of the candidate paths, and to compute with the path states of the candidate paths as input to generate a data packet containing the target path information.
24. The source routing device of claim 23, wherein the processor computes with a reinforcement learning model, using the path states of the candidate paths as input, to generate the data packet containing the target path information.
25. The source routing device of claim 24, wherein the computation with the reinforcement learning model comprises: taking the path states of the candidate paths respectively as the input of a neural network to calculate the utility value of each candidate path; determining the candidate path with the maximum utility value as a target path according to a preset first probability, and writing the information of the target path into the data packet to be sent to a switch in the network; or determining the target path by using a random strategy according to a preset second probability, and writing the information of the target path into the data packet to be sent to a switch in the network, wherein the first probability is greater than the second probability.
26. The source routing device of claim 25, wherein the random strategy comprises a uniform random strategy or a random strategy that makes a decision according to a Max-Boltzmann distribution of the utility values.
27. The source routing device of claim 23, wherein the network interface is further configured to update the calculated weights of the reinforcement learning model in the processor from a training network according to a predetermined period.
28. The source routing device of claim 27, wherein the process of updating the calculated weights of the reinforcement learning model from a training network according to a predetermined period comprises using the path decision state at the previous moment, the current candidate path state, and the current immediate reward observed in the flow state of the network as experience for training the neural network's estimate of the utility value.
29. The source routing device of claim 15, wherein the source routing device is a switch or an intelligent network card.
30. A source routing system, comprising:
the network module is used for determining a plurality of candidate paths when receiving at least one data packet from a host of a network, observing the flow state of the network in real time, and sending the data packet containing target path information to a switch in the network;
the local cache module is used for determining state information related to a plurality of observed paths in the observed flow state as local synchronization information;
and the route decision module is used for carrying out route decision according to the local synchronous information and the acquired global asynchronous information so as to output a data packet containing target route information to the network module.
31. The source routing system of claim 30, wherein the plurality of observed paths comprises the plurality of candidate paths and/or available paths with recorded information.
32. The source routing system of claim 30 or 31, wherein the state information related to the plurality of observed paths comprises RTT information, path length information, path usage history information, and path congestion information of each observed path.
33. The source routing system of claim 30, wherein the network module sends the received flow state to a global aggregator in the network, and stores global asynchronous information updated by the global aggregator into the local cache module according to a predetermined frequency.
34. The source routing system of claim 33, wherein the global aggregator receives the flow status sent by each network node in the network and sends the updated global asynchronous information to each network node in the network according to a predetermined frequency.
35. The source routing system of claim 33, wherein the global asynchronous information comprises information on utilization of each link globally for at least a predetermined historical duration.
36. The source routing system of claim 35, wherein the local cache module is further configured to determine the utilization of each of the candidate paths according to a maximum value and an average value of utilization of links in each of the candidate paths for a predetermined duration within at least a predetermined historical duration.
37. The source routing system of claim 36, wherein the at least one predetermined historical time duration is a plurality of predetermined historical time durations, at least two of the predetermined historical time durations being different.
38. The source routing system of claim 30, wherein the route decision module is configured to determine the local synchronization information and the global asynchronous information as the path states of the candidate paths, and to compute with the path states of the candidate paths as input to generate a data packet containing the target path information.
39. The source routing system of claim 38, wherein the computing uses a reinforcement learning model with the path states of the candidate paths as input to generate the data packet containing the target path information.
40. The source routing system of claim 39, wherein the computation with the reinforcement learning model comprises: taking the path states of the candidate paths respectively as the input of a neural network to calculate the utility value of each candidate path; determining the candidate path with the maximum utility value as a target path according to a preset first probability, and writing the information of the target path into the data packet to be sent to a switch in the network; or determining the target path by using a random strategy according to a preset second probability, and writing the information of the target path into the data packet to be sent to a switch in the network, wherein the first probability is greater than the second probability.
41. The source routing system of claim 40, wherein the random strategy comprises a uniform random strategy or a random strategy that makes a decision according to a Max-Boltzmann distribution of the utility values.
42. The source routing system of claim 39, wherein the network module is further configured to update the calculated weights of the reinforcement learning model in the routing decision module from a training network according to a predetermined period.
43. The source routing system of claim 42, wherein the process of updating the calculated weights of the reinforcement learning model from a training network according to a predetermined period comprises using the path decision state at the previous moment, the current candidate path state, and the current immediate reward observed in the flow state of the network as experience for training the neural network's estimate of the utility value.
44. A computer device, comprising:
a network card device;
a memory storing a computer program for source routing;
one or more processors;
wherein the processor is configured to invoke the computer program for source routing in the memory to perform the source routing method of any one of claims 1-14.
45. A computer-readable storage medium, in which a computer program for source routing is stored, which computer program, when executed, implements the source routing method of any one of claims 1-14.
CN201910373196.1A 2019-05-06 2019-05-06 Source routing method and system, related device and computer readable storage medium Active CN111901237B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910373196.1A CN111901237B (en) 2019-05-06 2019-05-06 Source routing method and system, related device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111901237A true CN111901237A (en) 2020-11-06
CN111901237B CN111901237B (en) 2021-06-08

Family

ID=73169547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910373196.1A Active CN111901237B (en) 2019-05-06 2019-05-06 Source routing method and system, related device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111901237B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1409526A (en) * 2001-09-14 2003-04-09 阿尔卡塔尔加拿大公司 Intelligent routing for effectively using network signal resource
CN101026536A (en) * 2006-02-17 2007-08-29 日本电气株式会社 Communications apparatus, communications system, and communication method
WO2007113174A1 (en) * 2006-03-30 2007-10-11 Siemens Aktiengesellschaft Routing method
CN102223671A (en) * 2010-04-15 2011-10-19 华为技术有限公司 Method and communication equipment for data transmission in wireless multihop network
US8085768B1 (en) * 2007-11-01 2011-12-27 Cisco Technology Inc. System and method for managing a list of entries containing routing information
CN104581817A (en) * 2014-12-19 2015-04-29 中国科学院上海微***与信息技术研究所 Dynamic shortest path weighting routing method in mobile ad-hoc network
CN108390820A (en) * 2018-04-13 2018-08-10 华为技术有限公司 Method, equipment and the system of load balancing

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631810A (en) * 2020-12-30 2021-04-09 北京天融信网络安全技术有限公司 Link selection system and method
CN112822113A (en) * 2020-12-31 2021-05-18 北京灵汐科技有限公司 Method and device for acquiring routing address, electronic equipment and readable storage medium
CN112822113B (en) * 2020-12-31 2022-04-12 北京灵汐科技有限公司 Method and device for acquiring routing address, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN111901237B (en) 2021-06-08

Similar Documents

Publication Publication Date Title
US10673763B2 (en) Learning or emulation approach to traffic engineering in information-centric networks
US10171332B2 (en) Probing technique for predictive routing in computer networks
US20230171148A1 (en) Monitoring and detecting causes of failures of network paths
US8730806B2 (en) Congestion control and resource allocation in split architecture networks
US8661295B1 (en) Monitoring and detecting causes of failures of network paths
US9544233B2 (en) Congestion management for datacenter network
JP5750714B2 (en) Computer system, virtual server placement method, and placement control device
US9686189B2 (en) Routing data in a bi-directional communication session over an overlay network using relay nodes
US9104543B1 (en) Determining locations of network failures
KR20190020082A (en) An intelligent adaptive transport layer that uses multiple channels to improve performance
US20080170510A1 (en) Efficient Determination Of Fast Routes When Voluminous Data Is To Be Sent From A Single Node To Many Destination Nodes Via Other Intermediate Nodes
US10411972B2 (en) Determining impact of network failures
US10153964B2 (en) Network routing using dynamic virtual paths in an overlay network
US10873529B2 (en) Method and apparatus for low latency data center network
US9001667B1 (en) Monitoring and detecting causes of failures of network paths
US11005777B2 (en) Software defined prober
CN111901237B (en) Source routing method and system, related device and computer readable storage medium
EP3012742B1 (en) Data distribution system, data communication device and program for data distribution
Lin et al. Proactive multipath routing with a predictive mechanism in software‐defined networks
Balakiruthiga et al. A simple congestion avoidance mechanism for opendaylight (odl)-multipath tcp (mptcp) network structure in software defined data center (sddc)
Prakash et al. A survey on routing algorithms and techniques used to improve network performance in software-defined networking
Nougnanke et al. ML-based Incast Performance Optimization in Software-Defined Data Centers
Shreshta et al. INT Based Network-Aware Task Scheduling for Edge Computing
Ren et al. A reactive traffic flow estimation in software defined networks
CN104579971B (en) For the method and apparatus for the route transferred in the network architecture of software definition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant