CN110581808B - Congestion control method and system based on deep reinforcement learning - Google Patents

Congestion control method and system based on deep reinforcement learning Download PDF

Info

Publication number
CN110581808B
CN110581808B (application CN201910778639.5A)
Authority
CN
China
Prior art keywords
network
congestion control
reward
value
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910778639.5A
Other languages
Chinese (zh)
Other versions
CN110581808A (en)
Inventor
王菲
廖旭东
马成业
胡海燕
陈艳姣
廖崎臣
张竞之
夏振厂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201910778639.5A priority Critical patent/CN110581808B/en
Publication of CN110581808A publication Critical patent/CN110581808A/en
Application granted granted Critical
Publication of CN110581808B publication Critical patent/CN110581808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/22 Traffic shaping
    • H04L47/225 Determination of shaping rate, e.g. using a moving window
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/27 Evaluation or update of window size, e.g. using information derived from acknowledged [ACK] packets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/28 Flow control; Congestion control in relation to timing considerations
    • H04L47/283 Flow control; Congestion control in relation to timing considerations in response to processing delays, e.g. caused by jitter or round trip time [RTT]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/29 Flow control; Congestion control using a combination of thresholds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/32 Flow control; Congestion control by discarding or delaying data units, e.g. packets or frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a congestion control method and system based on deep reinforcement learning. The method first initializes the network environment and the model parameters, then trains a congestion control model using state collected from the network, such as the current window, throughput, delay and data sending rate. According to the training results, the congestion control model with the smallest loss function value and the largest reward function value is selected and deployed into the network for congestion control. The method dynamically adjusts the size of the congestion window according to the current network throughput, round-trip delay and packet loss rate, thereby controlling the data sending rate, improving network throughput, and reducing transmission delay and packet loss rate, which reduces the occurrence of network congestion and optimizes network performance.

Description

Congestion control method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of computers, in particular to a congestion control method and a congestion control system based on deep reinforcement learning.
Background
The rapid development of next-generation Internet technology and the rapid growth of Internet applications have brought convenience to daily life and improved quality of experience, but they also place new demands on network performance, particularly for network congestion control: the data sending rate must be continuously adjusted according to network indicators such as the number of packets retransmitted after timeout, the average packet delay and the percentage of discarded packets, so as to reduce the occurrence of network congestion, use network resources effectively, improve network performance and provide users with high-quality service experience. As an important means of improving network performance, such as increasing network throughput, reducing data transmission delay and lowering packet loss rate, congestion control has become an important research hotspot and development direction in the field of computer network technology.
In the prior art, congestion control methods fall mainly into three categories: (1) packet-loss-based congestion control methods; (2) delay-based congestion control methods; (3) probing-based congestion control methods.
The inventor of the present application finds that the method of the prior art has at least the following technical problems in the process of implementing the present invention:
The packet-loss-based congestion control method treats packet loss as a congestion signal and halves the sending window when packet loss occurs, so as to avoid congestion. However, when no packets are lost, the buffer is continuously filled and remains in an over-full state for a long time, which causes excessive queuing delay; bandwidth utilization is also poor in network environments with link-level packet loss. The delay-based congestion control method uses delay as the congestion signal, including sending delay, queuing delay and transmission delay. The number of packets in the network can be roughly estimated from the delay, so delay-based congestion control protocols perform well at limiting delay; however, when they share a bottleneck with loss-based data flows, the bandwidth allocation is unfair because they lack competitiveness. The probing-based congestion control method does not define a specific congestion signal; instead it forms a congestion control strategy through probing, with the aid of an evaluation function. However, such control strategies all depend on training data: when the real network conditions differ, performance drops sharply, and some of the probing methods used cannot respond quickly to changes in the network environment.
Therefore, the method in the prior art has the technical problem of poor control effect.
Disclosure of Invention
In view of the above, the present invention provides a congestion control method and system based on deep reinforcement learning, so as to solve or at least partially solve the technical problem of poor control effect of the method in the prior art.
In order to solve the technical problem, the invention provides a congestion control method based on deep reinforcement learning, which comprises the following steps:
step S1: initializing a network environment, and generating network state data, wherein the network state data comprises network delay, transmission rate, sending rate and congestion window size;
step S2: initializing parameters of a congestion control model, wherein the parameters of the congestion control model comprise a reward function, an experience pool size, a neural network structure and a learning rate;
step S3: selecting target network state data from the generated network state data, updating parameters of a neural network according to the target network state data, a reward function and a loss function, and generating different congestion control models;
step S4: and screening out an optimal model as a target congestion control model according to the value of the reward function and the value of the loss function, deploying the target congestion control model into the network, and performing congestion control.
In one embodiment, step S1 specifically includes:
step S1.1: establishing connection between two communication parties;
step S1.2: and calculating the network delay, the transmission rate, the sending rate and the size of a congestion window according to the data sent by the two communication parties through the established connection.
In one embodiment, the parameters of the congestion control model in step S2 further include:
parameters of the Q network, parameters of the target Q network, number of rounds, throughput threshold, reward threshold, and maximum number of steps for a round.
In one embodiment, step S3 specifically includes:
step S3.1: according to the target network state data, with probability ε explore and take a random action, or with probability 1-ε select the action with the largest Q value in the current state, argmax_a Q(φ(s_t), a; θ), where ε is a probability variable, Q represents the value calculated by the neural network when taking different actions, a represents a different action, φ(s_t) represents the state at time t, and θ represents the parameters of the neural network;
step S3.2: and updating the neural network parameters in a mode of minimizing a loss function according to the obtained value of the Reward function, and generating different congestion control models.
In one embodiment, the method further comprises:
and judging whether the steps of the current round are finished: if the accumulated reward value of the current round is smaller than the reward threshold and the throughput is smaller than the throughput threshold, or if the number of steps of the current round is greater than or equal to the maximum number of steps of one round, a congestion control model is generated; otherwise the next step of the round starts, wherein each round corresponds to one round of training.
In one embodiment, the neural network in step S3 includes an input layer, a hidden layer and an output layer, wherein the hidden layer includes two convolutional layers, which extract features from the input data set, and two fully connected layers, which integrate the class-discriminative local information produced by the convolutional layers.
In one embodiment, the reward function is of the form:
Reward=α*tput-β*RTT-γ*packet_loss_rate (1)
wherein Reward represents the reward value, tput represents the throughput, RTT represents the network delay, packet_loss_rate represents the packet loss rate, which is the ratio of the number of lost packets to the number of sent packets, and α, β and γ are preset parameters.
In one embodiment, the throughput is calculated as follows:
tput=0.008*(delivered-last_delivered)/max(1,duration) (2)
wherein tput represents the throughput, duration represents the total duration of the current data stream, delivered represents the current data transmission amount, and last_delivered represents the previous data transmission amount.
Based on the same inventive concept, the second aspect of the present invention provides a congestion control system based on deep reinforcement learning, which includes:
the device comprises a parameter initialization module, a congestion control module and a congestion control module, wherein the parameter initialization module is used for initializing a network environment and generating network state data, and the network state data comprises network delay, transmission rate, sending rate and congestion window size;
the system comprises an environment initialization module, a congestion control module and a management module, wherein the environment initialization module is used for initializing parameters of the congestion control model, and the parameters of the congestion control model comprise a reward function, an experience pool size, a neural network structure and a learning rate;
the model generation module is used for selecting target network state data from the generated network state data, updating parameters of the neural network according to the target network state data, the reward function and the loss function, and generating different congestion control models;
and the congestion control module is used for screening out an optimal model as a target congestion control model according to the value of the reward function and the value of the loss function, deploying the target congestion control model into the network and performing congestion control.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the invention provides a congestion control method based on deep reinforcement learning, which comprises the steps of firstly, initializing a network environment and obtaining network state data, wherein the network state data comprises network delay, transmission rate, sending rate and congestion window size; then initializing parameters of a congestion control model, wherein the parameters of the congestion control model comprise a reward function, an experience pool size, a neural network structure and a learning rate; then, selecting target network state data from the generated network state data, updating parameters of the neural network according to the target network state data, the reward function and the loss function, and generating different congestion control models; and finally, screening out an optimal model as a target congestion control model according to the value of the reward function and the value of the loss function, deploying the target congestion control model into the network, and performing congestion control.
The invention can train the congestion control model by utilizing the selected network state data (network time delay, transmission rate, sending rate and congestion window size) and the like, select the congestion control model with the minimum model loss function value and the maximum reward function value according to the training result, and then deploy the model into the network to control the congestion. The method dynamically adjusts the size of the congestion window according to the current network throughput, round-trip delay and data packet loss rate, thereby controlling the sending rate of data, reducing the occurrence of network congestion and achieving the purpose of optimizing network performance. The technical problem of poor control effect in the prior art is solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a congestion control method based on deep reinforcement learning according to the present invention;
FIG. 2 is a diagram illustrating the overall steps of a congestion control method according to an embodiment of the present invention;
FIG. 3 is a flowchart of an initialization run environment of an embodiment of the present invention;
fig. 4 is a schematic diagram of adaptive adjustment of a congestion window according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of model parameter update according to an embodiment of the present invention;
fig. 6 is a block diagram of a congestion control system based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
The invention aims to provide a congestion control method and system based on deep reinforcement learning, aiming at the technical problem of poor control effect of the method in the prior art, so as to achieve the purpose of improving the congestion control effect.
In order to achieve the above purpose, the main concept of the invention is as follows:
the invention provides a congestion control method and system based on deep reinforcement learning, which are mainly based on reinforcement learning and utilize performance indexes such as current window, throughput, time delay, data sending rate and the like in a network. The existing network congestion control technology generally controls the sending rate based on the Time delay (Round-Trip Time, RTT), the packet loss rate (Lose rate), and the like, and although the network congestion can be solved to a certain extent, the congestion window cannot be adjusted according to the real network environment, and the overall performance is not as good as that of the present invention. The method of the invention can fully utilize the performance index of the network, generate the congestion control model through deep reinforcement learning, and generate appropriate values (the size and the direction of the congestion window) to adjust the size and the direction of the network congestion window, so as to improve the network throughput, reduce the packet loss rate and the time delay, and further solve the network congestion. By the method and the device, better network performance can be obtained, and the obtained result is more scientific and accurate.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The present embodiment provides a congestion control method based on deep reinforcement learning, please refer to fig. 1, and the method includes:
step S1: initializing a network environment, and generating network state data, wherein the network state data comprises a network delay, a transmission rate, a sending rate, and a congestion window size.
Specifically, step S1 is to initialize parameters of the computer network and then generate network status data.
In a specific implementation, step S1 specifically includes:
step S1.1: establishing connection between two communication parties;
step S1.2: and calculating the network delay, the transmission rate, the sending rate and the size of a congestion window according to the data sent by the two communication parties through the established connection.
Specifically, before the program starts, it is necessary to initialize a network environment, establish a connection between both communication parties, calculate status data such as a network delay (RTT), a transmission rate (delivery rate), a sending rate (sending rate), and a congestion window size (cwnd) of a network by data transmission of both communication parties, and store the data in an experience pool. After a certain amount of data is stored in the experience pool, a certain amount of state data can be randomly taken from the experience pool to prepare for the operation of each step (i.e., subsequent training).
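As a minimal illustration of the experience-pool mechanism just described, a Python sketch follows; the class name, pool capacity and method names are assumptions of this sketch, not part of the invention's specification.

```python
import random
from collections import deque

# Stores observed network state tuples and serves random mini-batches
# for training, as described above.
class ExperiencePool:
    def __init__(self, capacity=10000):        # capacity assumed
        self.pool = deque(maxlen=capacity)     # oldest entries are dropped first

    def store(self, rtt, delivery_rate, sending_rate, cwnd):
        self.pool.append((rtt, delivery_rate, sending_rate, cwnd))

    def sample(self, n=16):
        # Randomly take n groups of state data, e.g. history_length = 16
        return random.sample(list(self.pool), n)
```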
Step S2: initializing parameters of a congestion control model, wherein the parameters of the congestion control model comprise a reward function, an experience pool size, a neural network structure and a learning rate.
The parameters of the congestion control model in step S2 further include:
parameters of the Q network, parameters of the target Q network, number of rounds, throughput threshold, reward threshold, and maximum number of steps for a round.
Specifically, before training the model, the experience pool, the parameters of the Q network and the parameters of the target Q network must first be initialized. A round is then initialized: a throughput threshold bad-tput, a reward threshold bad-reward and a maximum number of steps per round max-step are set.
Step S3: and selecting target network state data from the generated network state data, updating parameters of the neural network according to the target network state data, the reward function and the loss function, and generating different congestion control models.
Specifically, data may be randomly selected as the target network status data from the network status data generated in step S1. And then training the neural network by using the target network state data, wherein the reward function and the loss function are used for adjusting and updating parameters of the neural network so as to obtain different congestion control models.
In one embodiment, step S3 specifically includes:
step S3.1: according to the target network state data, with probability ε explore and take a random action, or with probability 1-ε select the action with the largest Q value in the current state, argmax_a Q(φ(s_t), a; θ), where ε is a probability variable, Q represents the value calculated by the neural network when taking different actions, a represents a different action, φ(s_t) represents the state at time t, and θ represents the parameters of the neural network;
step S3.2: and updating the neural network parameters in a mode of minimizing a loss function according to the obtained value of the Reward function, and generating different congestion control models.
Specifically, in reinforcement learning, congestion control changes the window size through actions, and the basis for deciding which action to take is the state data. The two modes above are the two action-selection mechanisms of DQN (deep Q network). Reinforcement learning defines an environment in which an Agent takes actions so as to maximize the reward.
When the parameters are updated, the Q network parameters and the target Q parameters must be adjusted according to the reward-function feedback received by the Agent, and each is updated in its own way. During optimization, after a certain number of time steps, the target Q parameters of the target network are updated to the parameters of the currently trained network (eval net). The Q network and the target Q network are the two networks in the DQN model of the present invention; they have the same structure but differ in update manner and function.
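As an illustration of the ε-greedy mechanism of step S3.1, a minimal Python sketch follows; the function name and passing the Q values as a list are assumptions of this sketch.

```python
import random

# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise take argmax_a Q(phi(s_t), a; theta).
def epsilon_greedy(q_values, epsilon):
    if random.random() < epsilon:
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```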
To optimize the neural network of the present invention, a small batch of data in the experience pool can be randomly extracted at each optimization, and the optimization is by minimizing a loss function L (θ), which is defined as follows:
L(θ) = E[ (y - Q(s_t, a_t | θ))² ]

where the target value y is

y = r_t + γ·max_a Q⁻(s_(t+1), a | θ⁻)

Here γ denotes the discount factor, Q⁻(·|θ⁻) is the network with the parameters θ⁻ obtained at the last target-network update (i.e., the network used to calculate the actual value of Q), Q(s_t, a_t | θ) is the estimated value of Q, s_t and a_t respectively denote the state at time t and the action taken, r_t is the value of the reward function obtained, max takes the maximum Q value over actions a, and E is the expectation over taking a in the current state s. The smaller the difference between the actual value of Q and the estimated value of Q, the better. The loss function of the invention is minimized by a stochastic gradient method, thereby effectively achieving the purpose of optimizing the network.
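A minimal PyTorch sketch of one optimization step minimizing L(θ) follows; the discount factor value, the function signature and the mini-batch layout are assumptions of this sketch, and the Q-network architecture is the one described later in this section.

```python
import torch
import torch.nn.functional as F

# One stochastic-gradient step on L(theta), as described above.
def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):  # gamma assumed
    states, actions, rewards, next_states = batch
    # Q(s_t, a_t | theta): estimated Q value of the action actually taken
    q_estimate = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # y = r_t + gamma * max_a Q^-(s_(t+1), a | theta^-)
        target = rewards + gamma * target_net(next_states).max(dim=1).values
    loss = F.mse_loss(q_estimate, target)  # L(theta) = E[(y - Q)^2]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```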
In one embodiment, the method further comprises:
and judging whether the steps of the current round are finished: if the accumulated reward value of the current round is smaller than the reward threshold and the throughput is smaller than the throughput threshold, or if the number of steps of the current round is greater than or equal to the maximum number of steps of one round, a congestion control model is generated; otherwise the next step of the round starts, wherein each round corresponds to one round of training.
In one embodiment, the reward function is of the form:
Reward=α*tput-β*RTT-γ*packet_loss_rate (1)
wherein Reward represents the reward value, tput represents the throughput, RTT represents the network delay, packet_loss_rate represents the packet loss rate, which is the ratio of the number of lost packets to the number of sent packets, and α, β and γ are preset parameters.
In one embodiment, the throughput is calculated as follows:
tput=0.008*(delivered-last_delivered)/max(1,duration) (2)
wherein tput represents the throughput, duration represents the total duration of the current data stream, delivered represents the current data transmission amount, and last_delivered represents the previous data transmission amount.
Specifically, it is judged whether the steps of the current round are finished: if reward < bad-reward and tput < bad-tput, or if step-count (the number of steps in the current round) >= max-step, a congestion control model is generated and saved. A new round is then started and step-count is reset to 0. Otherwise the next step of the round starts, and the step count is increased by step-count = step-count + 1. The program runs continuously, generating different congestion control models.
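A minimal sketch of this round-termination check follows, using the threshold names from the text; wrapping it in a helper function is an assumption of this sketch.

```python
# Ends the round when both reward and throughput are poor, or when the
# per-round step budget is exhausted, as described above.
def round_finished(reward, tput, step_count, bad_reward, bad_tput, max_step):
    return (reward < bad_reward and tput < bad_tput) or step_count >= max_step
```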
Step S4: and screening out an optimal model as a target congestion control model according to the value of the reward function and the value of the loss function, deploying the target congestion control model into the network, and performing congestion control.
Specifically, according to the Loss and Reward values, the model with a low Loss value and a high Reward value over a period of time is selected as the optimal model and deployed in the environment. According to the state of the network, the optimal action for adjusting the size and direction of the congestion window is then selected. The implementation is as follows:
The current network state is acquired, and an action such as cwnd = cwnd * 2 is obtained from it. This action is then executed to double the current congestion window. The sender judges whether an ack (acknowledgement message) has been obtained from the receiver; if not, it waits until one arrives. After the ack is obtained, the state and the reward are updated: the state update observes the network delay (RTT), transmission rate (delivery rate), sending rate (sending rate) and congestion window size (cwnd) of the network link as in step S1, and the reward update is calculated from the reward function; the flow then ends.
In one embodiment, the neural network in step S3 includes an input layer, a hidden layer and an output layer, wherein the hidden layer includes two convolutional layers, which extract features from the input data set, and two fully connected layers, which integrate the class-discriminative local information produced by the convolutional layers.
Specifically, when training the neural network, a subset of samples in the experience pool may be randomly extracted to train the model (the experience pool is a container that stores historical data). First, the number of extracted samples is fixed to history_length; in the present invention history_length is 16, that is, 16 groups of state data are randomly selected from the experience pool each time and used as the input of the input layer of the deep neural network. The output of the input layer then enters the first convolutional layer of the hidden layer. The first convolutional layer has 16 input channels and 32 output channels with a ReLU activation function, and its output serves as the input of the second convolutional layer, which has 32 input channels and 64 output channels, also with a ReLU activation. After processing by the two convolutional layers, the extracted feature data is flattened (Flatten); the flattened data is input to the first fully connected layer, whose output passes through a ReLU and then enters the second fully connected layer. The output of the second fully connected layer is the Q value corresponding to each of the actions required by the invention.
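A possible PyTorch rendering of this architecture is sketched below. The channel counts (16 to 32 to 64), the ReLU activations, the flatten step and the two fully connected layers follow the description above; the convolution kernel size, the hidden width of the first fully connected layer, and the use of one-dimensional convolutions over the four state features are assumptions of this sketch.

```python
import torch
import torch.nn as nn

HISTORY_LENGTH = 16   # groups of state data sampled from the experience pool
STATE_FEATURES = 4    # RTT, delivery rate, sending rate, cwnd
NUM_ACTIONS = 5       # the five window-adjustment actions

class QNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv1d(HISTORY_LENGTH, 32, kernel_size=2)  # 16 -> 32 channels
        self.conv2 = nn.Conv1d(32, 64, kernel_size=2)              # 32 -> 64 channels
        conv_out_len = STATE_FEATURES - 2  # length after two kernel-2 convolutions
        self.fc1 = nn.Linear(64 * conv_out_len, 128)               # hidden width assumed
        self.fc2 = nn.Linear(128, NUM_ACTIONS)                     # one Q value per action

    def forward(self, x):
        # x: (batch, HISTORY_LENGTH, STATE_FEATURES)
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = torch.flatten(x, start_dim=1)  # the "Flatten" step
        x = torch.relu(self.fc1(x))
        return self.fc2(x)                 # Q values for the five actions
```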
In order to more clearly illustrate the implementation and beneficial effects of the method provided by the invention, the following detailed description is given by specific examples.
Please refer to fig. 2, which shows the overall steps of the congestion control method. First the network environment is initialized and network state data is generated; it is then judged whether a model needs to be trained. If not, the left branch is executed; otherwise the right branch is executed. That is, either training a model or loading a model to run on the actual link is selected. Training a model means training the reinforcement-learning agent so as to produce multiple models; loading a model means selecting a trained model to perform congestion control on the network.
The left branch corresponds to selecting a loading model to run on an actual link: the network environment is set up, state data is selected and run to obtain the corresponding action, and the congestion window size is then adjusted.
The right branch corresponds to training a model. First the model parameters must be initialized and state data randomly selected. During model training, an Agent observes the Sender and the Receiver in the Environment and feeds the observed data states into the DQN neural network. The DQN learns continuously from the data it sends and the reward fed back by the Environment, and adopts an action as the means of adjusting the congestion window. After the Environment takes the action, it feeds back to the Agent a reward as the reward or punishment for the previous step, which measures how well the action given by the Agent controlled congestion in that step. The value of Reward is calculated from formula (1).
Reward=α*tput-β*RTT-γ*packet_loss_rate (1)
wherein tput is the throughput of the network, calculated by formula (2); RTT is the round-trip delay of the network, calculated by formula (4); and packet_loss_rate is calculated by formula (3). Experiments show that the network performs best when α = 0.6, β = 0.2 and γ = 0.2.
tput=0.008*(delivered-last_delivered)/max(1,duration) (2)
wherein duration represents the total time for which the current data stream has been open, and delivered and last_delivered represent the amounts transmitted at the current and the previous measurement, respectively;
packet_loss_rate=loss_num/send_num (3)
where loss_num and send_num represent the number of packets lost and sent, respectively.
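Formulas (1)-(3) can be combined into a single reward computation, sketched below with the weights α = 0.6, β = 0.2, γ = 0.2 reported above; the function name and the guard against a zero send count are assumptions of this sketch.

```python
# A minimal sketch of the reward computation from formulas (1)-(3);
# variable names mirror the formulas, and the default weights follow
# the values reported in the text.
def compute_reward(delivered, last_delivered, duration,
                   rtt, loss_num, send_num,
                   alpha=0.6, beta=0.2, gamma=0.2):
    tput = 0.008 * (delivered - last_delivered) / max(1, duration)  # formula (2)
    packet_loss_rate = loss_num / max(1, send_num)                  # formula (3); zero guard assumed
    return alpha * tput - beta * rtt - gamma * packet_loss_rate     # formula (1)
```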
The training process is as follows: the first round must be initialized; the number of steps differs from round to round and is judged according to the specific situation, and each step is then run. Referring to fig. 4: Flatten indicates flattening, the activation function is ReLU, and the hidden layer contains two convolutional layers and two fully connected layers. After processing by the two convolutional layers of the hidden layer, the extracted feature data is flattened (Flatten); the flattened data is then input to the first fully connected layer, whose output, after the ReLU activation, enters the second fully connected layer. The output of the second fully connected layer is the Q value corresponding to each of the actions required by the invention.
The action is obtained from the neural network according to the current state. Specifically, an action list corresponding to the five actions set by the invention was first designed through experience and repeated experiments: ["+0.0", "-100.0", "+100.0", "*2", "/2.0"]. These five actions represent cwnd_(t+1) = cwnd_t, cwnd_(t+1) = cwnd_t - 100, cwnd_(t+1) = cwnd_t + 100, cwnd_(t+1) = cwnd_t * 2 and cwnd_(t+1) = cwnd_t / 2, respectively. After each training step, the Q value corresponding to each action in the current state is obtained from the output of the deep neural network, and the action with the largest Q value yields an index into the action list, which determines the action the agent selects. For example, if in this round of training the neural network learns that the Q value of the second action is the largest (that is, the Q value for "-100.0" is the largest among the network's outputs), the index into the action list gives action_list[1], i.e. "-100.0", so the agent takes the action cwnd_(t+1) = cwnd_t - 100, achieving the purpose of adjusting the window. After the action is obtained, it is executed. Whether the receiver's ack has been obtained is then judged; if not, the sender keeps waiting until it is obtained. After the ack is obtained, the state needs to be updated, which means observing the four parameters of the network link (the network delay, transmission rate, sending rate and congestion window size mentioned above); the reward update is calculated from the reward function using the current state.
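A minimal sketch of applying the selected action index to the congestion window follows; the helper name and the string-parsing approach are assumptions of this sketch.

```python
# Applies one of the five window-adjustment actions described above.
ACTIONS = ["+0.0", "-100.0", "+100.0", "*2", "/2.0"]

def apply_action(cwnd, index):
    action = ACTIONS[index]                  # e.g. index 1 -> "-100.0"
    op, value = action[0], float(action[1:])
    if op == "+":
        return cwnd + value                  # cwnd_(t+1) = cwnd_t + 100 (or + 0)
    if op == "-":
        return cwnd - value                  # cwnd_(t+1) = cwnd_t - 100
    if op == "*":
        return cwnd * value                  # cwnd_(t+1) = cwnd_t * 2
    return cwnd / value                      # cwnd_(t+1) = cwnd_t / 2
```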
The current network state is acquired, and an action is obtained from it. The action is then executed, here reducing the current congestion window by 100. Whether the receiver's ack has been obtained is judged; if not, the sender keeps waiting until it is obtained. After the ack is obtained, the state and the reward are updated: the state update observes the four parameters of the network link from step S1, and the reward update is calculated from the reward function; the flow then ends.
Referring to fig. 3, which shows the flowchart for initializing the running environment: the state fed into the neural network consists of the RTT, the delivery rate, the sending rate and the congestion window size (cwnd), where the first three need to be calculated by equations (4), (5) and (6). The Q value of each action is then calculated by running the neural network, and the action corresponding to the largest Q value is selected as the current adjustment mode.
RTT=float(curr_time_ms-ack.send_ts) (4)
where curr_time_ms represents the time at which the sender received the current ack, and ack.send_ts represents the time at which the packet corresponding to the ack was sent;
delivery_rate=0.008*(delivered-ack.delivered)/max(1,delivered_time-ack.delivered_time) (5)
where delivered and ack.delivered respectively represent the number of packets delivered and the number of packets delivered as recorded in the ack, and delivered_time and ack.delivered_time respectively represent the corresponding delivery times;
send_rate=0.008*(self.sent_bytes-ack.sent_bytes)/max(1,self.rtt) (6)
where self.sent_bytes represents the total number of bytes sent, and ack.sent_bytes represents the number of bytes that had been sent when the acked packet was sent.
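Taken together, formulas (4)-(6) can be sketched as one state-computation helper, shown below; treating the ack fields as attributes of a record, and reusing the freshly computed RTT in formula (6), are assumptions of this sketch.

```python
# A sketch of the state computation from formulas (4)-(6); names mirror
# the fields used in the text.
def compute_state(curr_time_ms, ack, delivered, delivered_time, sent_bytes):
    rtt = float(curr_time_ms - ack.send_ts)                          # formula (4)
    delivery_rate = 0.008 * (delivered - ack.delivered) / max(
        1, delivered_time - ack.delivered_time)                      # formula (5)
    send_rate = 0.008 * (sent_bytes - ack.sent_bytes) / max(1, rtt)  # formula (6); rtt reuse assumed
    return rtt, delivery_rate, send_rate
```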
An observation program is then run. Specifically, as shown in fig. 5:
The buffers and the history data are updated; they hold the relevant parameters from previous steps. It is then determined whether the learning mini-batch has been reached, that is, whether the minimum amount of learning data is available. This is judged from the current learning step counter learn_step_counter, the configured learning start step learn_start and the training frequency train_frequency: if learn_step_counter > learn_start and learn_step_counter % train_frequency == 0, the neural network starts learning. Next, it is determined whether the condition for updating target-Q has been reached; specifically, an update frequency target_q_update_step is set, and if learn_step_counter % target_q_update_step == 0, the q-value parameters of the target network are updated.
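These two scheduling conditions can be sketched as a small helper, shown below; combining them into one function is an assumption of this sketch, while the counter names follow the text.

```python
# Decides when to run a learning step and when to refresh the target
# Q network, per the conditions described above.
def training_schedule(learn_step_counter, learn_start,
                      train_frequency, target_q_update_step):
    should_learn = (learn_step_counter > learn_start
                    and learn_step_counter % train_frequency == 0)
    should_update_target = learn_step_counter % target_q_update_step == 0
    return should_learn, should_update_target
```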
Generally, compared with the prior art, the technical scheme of the invention has the following advantages and beneficial effects:
the invention provides a congestion control method and system based on deep reinforcement learning by using performance indexes such as a current window, throughput, time delay, data sending rate and the like in a network based on reinforcement learning. The existing network congestion control technology generally modifies the sending rate based on the Time delay (Round-Trip Time, RTT), the packet loss rate (Lose rate), and the like, and although the network congestion can be solved to a certain extent, the congestion window cannot be adjusted according to the real network adjustment, and the overall performance is not as good as that of the present invention. The method of the invention can fully utilize the performance index of the network, adopts a proper value through deep reinforcement learning to adjust the size and the direction of the network congestion window, thereby improving the network throughput, reducing the packet loss rate and the time delay and further solving the network congestion. By the method and the device, better network performance can be obtained, and the obtained result is more scientific and accurate.
Example two
Based on the same inventive concept, the present embodiment provides a congestion control system based on deep reinforcement learning, please refer to fig. 6, the system includes:
a parameter initialization module 201, configured to initialize a network environment and generate network status data, where the network status data includes a network delay, a transmission rate, a sending rate, and a size of a congestion window;
the environment initialization module 202 is configured to initialize parameters of a congestion control model, where the parameters of the congestion control model include a reward function, an experience pool size, a neural network structure, and a learning rate;
the model generation module 203 is used for selecting target network state data from the generated network state data, updating parameters of the neural network according to the target network state data, the reward function and the loss function, and generating different congestion control models;
and the congestion control module 204 is configured to screen out an optimal model as a target congestion control model according to the value of the reward function and the value of the loss function, deploy the target congestion control model to the network, and perform congestion control.
In one embodiment, the environment initialization module 202 is specifically configured to perform the following steps:
step S1.1: establishing connection between two communication parties;
step S1.2: and calculating the network delay, the transmission rate, the sending rate and the size of a congestion window according to the data sent by the two communication parties through the established connection.
In one embodiment, the parameters of the congestion control model further comprise:
parameters of the Q network, parameters of the target Q network, number of rounds, throughput threshold, reward threshold, and maximum number of steps for a round.
In one embodiment, the model generation module 203 is specifically configured to perform the following steps:
step S3.1: according to the target network state data, with probability ε explore and take a random action, or with probability 1-ε select the action with the largest Q value in the current state, argmax_a Q(φ(s_t), a; θ), where ε is a probability variable, Q represents the value calculated by the neural network when taking different actions, a represents a different action, φ(s_t) represents the state at time t, and θ represents the parameters of the neural network;
step S3.2: and updating the neural network parameters in a mode of minimizing a loss function according to the obtained value of the Reward function, and generating different congestion control models.
In one embodiment, the system further comprises a determining module configured to:
and judging whether the steps of the current round are finished: if the accumulated reward value of the current round is smaller than the reward threshold and the throughput is smaller than the throughput threshold, or if the number of steps of the current round is greater than or equal to the maximum number of steps of one round, a congestion control model is generated; otherwise the next step of the round starts, wherein each round corresponds to one round of training.
In one embodiment, the neural network comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises two convolutional layers, which extract features from the input data set, and two fully connected layers, which integrate the class-discriminative local information produced by the convolutional layers.
In one embodiment, the reward function is of the form:
Reward=α*tput-β*RTT-γ*packet_loss_rate (1)
wherein Reward represents the reward value, tput represents the throughput, RTT represents the network delay, packet_loss_rate represents the packet loss rate, which is the ratio of the number of lost packets to the number of sent packets, and α, β and γ are preset parameters.
In one embodiment, the throughput is calculated as follows:
tput=0.008*(delivered-last_delivered)/max(1,duration) (2)
wherein tput represents the throughput, duration represents the total duration of the current data stream, delivered represents the current data transmission amount, and last_delivered represents the previous data transmission amount.
Since the system described in the second embodiment of the present invention is a system for implementing the congestion control method based on deep reinforcement learning in the first embodiment of the present invention, a person skilled in the art can understand the specific structure and deformation of the system based on the method described in the first embodiment of the present invention, and thus details thereof are not described herein. All systems adopted by the method of the first embodiment of the present invention are within the intended protection scope of the present invention.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (9)

1. A congestion control method based on deep reinforcement learning is characterized by comprising the following steps:
step S1: initializing a network environment, and generating network state data, wherein the network state data comprises network delay, transmission rate, sending rate and congestion window size;
step S2: initializing parameters of a congestion control model, wherein the parameters of the congestion control model comprise a reward function, an experience pool size, a neural network structure and a learning rate;
step S3: selecting target network state data from the generated network state data, updating parameters of a neural network according to the target network state data, a reward function and a loss function, and generating different congestion control models;
step S4: and screening out an optimal model as a target congestion control model according to the value of the reward function and the value of the loss function, deploying the target congestion control model into the network, and performing congestion control.
2. The method according to claim 1, wherein step S1 specifically comprises:
step S1.1: establishing connection between two communication parties;
step S1.2: and calculating the network delay, the transmission rate, the sending rate and the size of a congestion window according to the data sent by the two communication parties through the established connection.
3. The method of claim 1, wherein the parameters of the congestion control model in step S2 further comprise:
parameters of the Q network, parameters of the target Q network, number of rounds, throughput threshold, reward threshold, and maximum number of steps for a round.
4. The method according to claim 3, wherein step S3 specifically comprises:
step S3.1: according to the target network state data, with probability ε explore and take a random action, or with probability 1-ε select the action with the largest Q value in the current state, argmax_a Q(φ(s_t), a; θ), where ε is a probability variable, Q represents the value calculated by the neural network when taking different actions, a represents a different action, φ(s_t) represents the state at time t, and θ represents a parameter of the neural network;
step S3.2: and updating parameters of the neural network in a mode of minimizing a loss function according to the acquired value of the reward function, and generating different congestion control models.
5. The method of claim 4, wherein the method further comprises:
and judging whether the steps of the current round are finished: if the accumulated reward value of the current round is smaller than the reward threshold and the throughput is smaller than the throughput threshold, or if the number of steps of the current round is greater than or equal to the maximum number of steps of one round, a congestion control model is generated; otherwise the next step of the round starts, wherein each round corresponds to one round of training.
6. The method of claim 1, wherein the neural network in step S3 comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises two convolutional layers, which extract features from the input data set, and two fully connected layers, which integrate the class-discriminative local information produced by the convolutional layers.
7. The method of claim 1, wherein the reward function is of the form:
Reward=α*tput-β*RTT-γ*packet_loss_rate
wherein tput represents throughput, Reward represents the reward value, RTT represents the network delay, packet_loss_rate represents the packet loss rate, which is the ratio of the number of lost packets to the number of sent packets, and α, β and γ are preset parameters.
8. The method of claim 5, wherein the throughput is calculated as follows:
tput=0.008*(delivered-last_delivered)/max(1,duration)
wherein tput represents the throughput, duration represents the total duration of the current data stream, delivered represents the current data transmission amount, and last_delivered represents the previous data transmission amount.
9. A congestion control system based on deep reinforcement learning, comprising:
the device comprises a parameter initialization module, a congestion control module and a congestion control module, wherein the parameter initialization module is used for initializing a network environment and generating network state data, and the network state data comprises network delay, transmission rate, sending rate and congestion window size;
the system comprises an environment initialization module, a congestion control module and a management module, wherein the environment initialization module is used for initializing parameters of the congestion control model, and the parameters of the congestion control model comprise a reward function, an experience pool size, a neural network structure and a learning rate;
the model generation module is used for selecting target network state data from the generated network state data, updating parameters of the neural network according to the target network state data, the reward function and the loss function, and generating different congestion control models;
and the congestion control module is used for screening out an optimal model as a target congestion control model according to the value of the reward function and the value of the loss function, deploying the target congestion control model into the network and performing congestion control.
CN201910778639.5A 2019-08-22 2019-08-22 Congestion control method and system based on deep reinforcement learning Active CN110581808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910778639.5A CN110581808B (en) 2019-08-22 2019-08-22 Congestion control method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910778639.5A CN110581808B (en) 2019-08-22 2019-08-22 Congestion control method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN110581808A CN110581808A (en) 2019-12-17
CN110581808B true CN110581808B (en) 2021-06-15

Family

ID=68811694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910778639.5A Active CN110581808B (en) 2019-08-22 2019-08-22 Congestion control method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN110581808B (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111092823B (en) * 2019-12-25 2021-03-26 深圳大学 Method and system for adaptively adjusting congestion control initial window
CN111199272B (en) * 2019-12-30 2023-11-03 同济大学 Self-adaptive scheduling method for intelligent workshops
CN112422441B (en) * 2020-03-05 2022-10-04 上海哔哩哔哩科技有限公司 Congestion control method and system based on QUIC transmission protocol
CN111372284B (en) * 2020-03-10 2022-07-29 中国联合网络通信集团有限公司 Congestion processing method and device
CN113572694B (en) 2020-04-29 2023-10-13 华为技术有限公司 Congestion control method, device and system and computer storage medium
CN113873571A (en) * 2020-06-30 2021-12-31 华为技术有限公司 Congestion control method and corresponding equipment
CN111818570B (en) * 2020-07-25 2022-04-01 清华大学 Intelligent congestion control method and system for real network environment
CN112104563B (en) * 2020-08-12 2022-08-30 新华三技术有限公司 Congestion control method and device
CN112311690B (en) * 2020-09-25 2022-12-06 福建星网智慧科技有限公司 AI-based congestion control method, device, equipment and medium
CN112383485B (en) * 2020-10-30 2022-08-19 新华三技术有限公司 Network congestion control method and device
CN112469079B (en) * 2020-11-05 2022-04-22 南京大学 Novel congestion control method combining deep reinforcement learning and traditional congestion control
CN112468265B (en) * 2020-11-10 2022-04-22 南京大学 Wireless local area network modulation coding self-adaptive selection method based on reinforcement learning and wireless equipment
CN112822230B (en) * 2020-12-28 2022-03-25 南京大学 Method and system for setting initial rate of sending end based on probability
CN112714074B (en) * 2020-12-29 2023-03-31 西安交通大学 Intelligent TCP congestion control method, system, equipment and storage medium
CN112770353B (en) * 2020-12-30 2022-10-28 武汉大学 Method and device for training congestion control model and method and device for controlling congestion
CN112822718B (en) * 2020-12-31 2021-10-12 南通大学 Packet transmission method and system based on reinforcement learning and stream coding driving
CN112770357B (en) * 2021-01-08 2022-04-26 浙江大学 Wireless network congestion control method based on deep reinforcement learning
US20220231933A1 (en) * 2021-01-20 2022-07-21 Nvidia Corporation Performing network congestion control utilizing reinforcement learning
CN113300970B (en) * 2021-01-22 2022-05-27 青岛大学 TCP congestion dynamic control method and device based on deep learning
CN113079104B (en) * 2021-03-22 2022-09-30 新华三技术有限公司 Network congestion control method, device and equipment
CN113315715B (en) * 2021-04-07 2024-01-05 北京邮电大学 Distributed intra-network congestion control method based on QMIX
CN113315716B (en) * 2021-05-28 2023-05-02 北京达佳互联信息技术有限公司 Training method and equipment of congestion control model and congestion control method and equipment
CN113825171B (en) * 2021-09-30 2023-07-28 新华三技术有限公司 Network congestion control method, device, equipment and medium
CN114553836B (en) * 2022-01-12 2024-02-20 中国科学院信息工程研究所 Data block transmission punctuality improving method based on reinforcement learning
CN114567597B (en) * 2022-02-21 2023-12-19 深圳市亦青藤电子科技有限公司 Congestion control method and device based on deep reinforcement learning in Internet of things
CN114745337B (en) * 2022-03-03 2023-11-28 武汉大学 Real-time congestion control method based on deep reinforcement learning
US20230300671A1 (en) * 2022-03-18 2023-09-21 Qualcomm Incorporated Downlink congestion control optimization
CN114785757B (en) * 2022-03-31 2023-10-20 东北大学 Multipath transmission control method for real-time conversation service
CN114866489A (en) * 2022-04-28 2022-08-05 清华大学 Congestion control method and device and training method and device of congestion control model
CN116055406B (en) * 2023-01-10 2024-05-03 中国联合网络通信集团有限公司 Training method and device for congestion window prediction model
CN117651024A (en) * 2023-12-01 2024-03-05 北京基流科技有限公司 Method for predicting network link congestion of data center

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107171842A (en) * 2017-05-22 2017-09-15 南京大学 Multi-path transmission protocol jamming control method based on intensified learning
CN107864102A (en) * 2017-11-22 2018-03-30 浙江工商大学 A kind of SDN data centers jamming control method based on Sarsa
CN108211362A (en) * 2017-12-26 2018-06-29 浙江大学 A kind of non-player role fight policy learning method based on depth Q learning networks

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10977551B2 (en) * 2016-12-14 2021-04-13 Microsoft Technology Licensing, Llc Hybrid reward architecture for reinforcement learning
CN109194583B (en) * 2018-08-07 2021-05-14 中国地质大学(武汉) Network congestion link diagnosis method and system based on deep reinforcement learning
CN109471963A (en) * 2018-09-13 2019-03-15 广州丰石科技有限公司 A kind of proposed algorithm based on deeply study
CN109471847B (en) * 2018-09-18 2020-06-09 华中科技大学 I/O congestion control method and control system
CN109710741A (en) * 2018-12-27 2019-05-03 中山大学 A kind of mask method the problem of study based on deeply towards online answer platform

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107171842A (en) * 2017-05-22 2017-09-15 南京大学 Multi-path transmission protocol jamming control method based on intensified learning
CN107864102A (en) * 2017-11-22 2018-03-30 浙江工商大学 A kind of SDN data centers jamming control method based on Sarsa
CN108211362A (en) * 2017-12-26 2018-06-29 浙江大学 A kind of non-player role fight policy learning method based on depth Q learning networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Deep Reinforcement Learning Perspective on Internet Congestion Control; Nathan Jay, et al.; International Conference on Machine Learning; 20190531; full text *
Rax: Deep Reinforcement Learning for Congestion Control; Maximilian Bachl, et al.; IEEE International Conference on Communications; 20190715; full text *

Also Published As

Publication number Publication date
CN110581808A (en) 2019-12-17

Similar Documents

Publication Publication Date Title
CN110581808B (en) Congestion control method and system based on deep reinforcement learning
CN107634911B (en) Adaptive congestion control method based on deep learning in information center network
CN110278149B (en) Multi-path transmission control protocol data packet scheduling method based on deep reinforcement learning
CN107864084B (en) The transmission method and device of data packet
CN111818570B (en) Intelligent congestion control method and system for real network environment
WO2021103706A1 (en) Data packet sending control method, model training method, device, and system
CN112770353B (en) Method and device for training congestion control model and method and device for controlling congestion
KR102246465B1 (en) Method and apparatus of allocating resource of terminal in wireless communication system
CN113595923A (en) Network congestion control method and device
CN107070802A (en) Wireless sensor network Research of Congestion Control Techniques based on PID controller
CN113132490A (en) MQTT protocol QoS mechanism selection scheme based on reinforcement learning
CN109698925A (en) Real-time video jamming control method and device based on data-driven
EP4161029A1 (en) System and method for adapting transmission rate computation by a content transmitter
CN113825171A (en) Network congestion control method, device, equipment and medium
CN114760644A (en) Multilink transmission intelligent message scheduling method based on deep reinforcement learning
Xia et al. A multi-objective reinforcement learning perspective on internet congestion control
CN113726656A (en) Method and device for forwarding delay sensitive flow
WO2024001763A1 (en) Data transmission processing method and device, storage medium, and electronic device
CN117082008A (en) Virtual elastic network data transmission scheduling method, computer device and storage medium
Seo et al. Fairness enhancement of TCP congestion control using reinforcement learning
US9877338B1 (en) Wireless scheduler bandwidth estimation for quick start
US20230231810A1 (en) System and method for adapting transmission rate computation by a content transmitter
CN116389375A (en) Network queue management method, device and router for live video stream
CN114845338A (en) Random back-off method for user access
CN114866196A (en) Data packet retransmission method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant