WO2021128181A1 - Method and system for self-adaptively adjusting initial congestion control window - Google Patents


Info

Publication number
WO2021128181A1
WO2021128181A1 (PCT/CN2019/128759)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
congestion control
decision model
streaming
initial window
Prior art date
Application number
PCT/CN2019/128759
Other languages
French (fr)
Chinese (zh)
Inventor
谢瑞桃
孙文斌
伍楷舜
Original Assignee
Shenzhen University (深圳大学)
Priority date
Filing date
Publication date
Application filed by Shenzhen University (深圳大学)
Publication of WO2021128181A1 publication Critical patent/WO2021128181A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/27: Evaluation or update of window size, e.g. using information derived from acknowledged [ACK] packets
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Definitions

  • the present invention relates to the field of communication technology, in particular to a method and system for adaptively adjusting the initial window of congestion control.
  • TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based transport layer communication protocol, and congestion control is an algorithm used to adjust the size of the congestion window in TCP.
  • Many applications communicate with one another through the Transmission Control Protocol. When communication starts, the client establishes a TCP connection with the server; the client then sends a request to the server, and the server sends a response to the client after receiving the request.
  • When transmitting response data, the client dynamically adjusts the size of the congestion window according to the congestion control algorithm. The initial value of the congestion window is very small, usually set to 2 or 10.
  • Properly increasing the size of the initial window can reduce the streaming time and increase the throughput of the network.
  • However, if the initial window is blindly increased, it may cause network congestion and thereby increase the streaming time. It is therefore necessary to set an optimal initial window size, so that the network minimizes the streaming time without causing additional congestion.
  • The setting of the initial window applies to a variety of scenarios. For example, in edge computing, most applications are very sensitive to network delay, so network congestion has a very serious impact. How to set an appropriate initial congestion window so as to minimize the streaming time while minimizing the occurrence of network congestion is therefore a crucial issue.
  • The purpose of the present invention is to overcome the above-mentioned defects of the prior art and provide a method and system for adaptively adjusting the initial window of congestion control. By introducing the A3C reinforcement learning model, a deep neural network is used to dynamically set the initial window of congestion control so as to adapt to changes in the network environment.
  • a method for adaptively adjusting the initial window of congestion control includes the following steps:
  • Construct a neural network decision model, taking the historical streaming state as input and the distribution of the initial congestion control window as the output action;
  • Construct a Markov learning process and obtain the optimized parameters of the neural network decision model through online learning, where a decision corresponds to a set of parameters θ in the neural network decision model, and the decision corresponds to a state trajectory {s_0, a_0, s_1, a_1, ..., s_t, a_t, ...} and a streaming performance set {d_0, d_1, ..., d_t, ...}; through multiple rounds of iterative updates, the best decision parameter θ is found so that the expected streaming performance is optimal, where s represents the input streaming state and a represents the output action;
  • the optimized neural network decision model is used to obtain the initial window value of congestion control, which is used in the subsequent stream transmission process.
  • In one embodiment, the input of the neural network decision model is generated from six state variables: the flow completion time, the inter-arrival time of flows, the inter-departure time of flows, the flow data volume, the flow throughput, and the flow round-trip delay.
  • Specifically, for each state variable, the latest k samples are used to construct a statistical histogram that describes the streaming state over a period of time as input to the neural network decision model.
  • The flow completion time mentioned above is the index used to measure streaming performance.
  • the neural network decision model includes a feature extractor and a predictor.
  • the feature extractor is used to extract features from the input data, which includes five convolutional layers and one fully connected layer that are connected in sequence;
  • the predictor is used to predict the congestion control window value distribution, and its input is the feature output extracted by the feature extractor.
  • The predictor includes two fully connected layers and an output layer, and the output layer uses a softmax activation function to convert its output into a probability distribution over the initial congestion control window values.
  • In one embodiment, the method further includes: when the change in the input histogram information of the neural network decision model reaches a preset threshold, reinitializing the parameters of the last two layers of the neural network decision model for re-optimization.
  • An indicator based on the similarity between consecutive inputs is used to measure changes in the input histogram information of the neural network decision model, where s is the current input of the neural network decision model and s′ is the previous input of the neural network decision model.
  • In one embodiment, the method further includes: after obtaining the initial congestion control window value, repeatedly using that value for stream transmission for a period of time; and starting a timer when the sender begins to transmit a stream.
  • When the timer expires and the transmission has not yet completed, the estimated completion time of the stream is used as a sample to train the neural network decision model.
  • The streaming performance is one or both of the streaming completion time and the throughput.
  • In one embodiment, the online learning is implemented through a parallel architecture, which includes a central agent, multiple sub-agents, and a network environment, wherein the central agent is responsible for maintaining the latest neural network decision model parameters, and each sub-agent makes the initial congestion control window decision through the decision function and calculates parameter updates.
  • A system for adaptively adjusting the initial window of congestion control includes: a model construction unit, which constructs a neural network decision model with the historical streaming state as input and the distribution of the initial congestion control window as the output action; an online learning unit, which constructs a Markov learning process and obtains the optimized parameters of the neural network decision model through online learning, where a decision corresponds to a set of parameters θ in the neural network decision model and corresponds to a state trajectory {s_0, a_0, s_1, a_1, ..., s_t, a_t, ...} and a streaming performance set {d_0, d_1, ..., d_t, ...}, and through multiple rounds of iterative updates the best decision parameter θ is found so that the expected streaming performance is optimal, where s represents the input streaming state and a represents the output action; and a prediction unit, which uses the optimized neural network decision model to obtain the initial congestion control window value for the subsequent streaming process.
  • The present invention has the following advantages: autonomous learning is realized, so there is no need to analyze the internal principles of the wireless transmission system or the base station system, the size of the initial window is adjusted autonomously, and no manual analysis is required; optimal decision-making is realized, so the optimal initial window can be obtained, thereby improving the efficiency of streaming; and adaptive adjustment is realized, so the initial window can be dynamically adjusted according to network conditions, and optimal performance can be obtained even in a changing network environment.
  • Fig. 1 is a process diagram of a method for adaptively adjusting an initial window of congestion control according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for adaptively adjusting the initial window of congestion control according to an embodiment of the present invention
  • Fig. 3 is a schematic structural diagram of a neural network decision model according to an embodiment of the present invention.
  • Figure 4 is a schematic diagram of a parallel training model structure according to an embodiment of the present invention.
  • Fig. 5 is a schematic diagram of a simulation result according to an embodiment of the present invention.
  • According to an embodiment of the present invention, a method for adaptively adjusting the initial window of congestion control is provided; the method adaptively learns an optimal initial-window decision model online.
  • Histograms of the historical states of the network environment are input to the neural network decision model.
  • The model outputs an initial window value for the data stream transmission, which then generates a new system state and a reward; the reward value is then used to calculate parameter updates for the neural network decision model, thereby optimizing the model parameters. In this way, after multiple rounds of iterative calculation, a convergent neural network decision model is finally obtained, and the optimal initial window value is output.
  • the network environment involved in this article can be a mobile edge computing system (such as 5G edge computing) or other fixed or mobile network environments that use the TCP/IP protocol to transmit data streams.
  • In the following, the 5G edge computing scenario is used as an example to introduce the neural network decision model, the online learning algorithm, the adaptive learning algorithm, and the strategies for optimizing the neural network decision model.
  • the method for adaptively adjusting the initial window of congestion control includes the following steps:
  • In step S210, a neural network decision model is constructed, taking the historical streaming state of the network environment as input and the initial congestion control window value as the output action.
  • each input data can be obtained from the edge server; each input state is related to the congestion control problem.
  • The following six state variables are selected as input: the flow completion time, the inter-arrival time of flows, the inter-departure time of flows, the flow data volume, the flow throughput, and the flow round-trip delay.
  • For each state variable, the latest k samples (k is an integer greater than or equal to 2) are collected to construct a histogram that describes the state of the network environment over a period of time. These histograms together form the system state S_t. By using histograms instead of the raw sampled data, the input of the neural network model is more concise, and no normalization processing is required.
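As an illustration of the state construction described above, the following sketch builds one normalized histogram per state variable from the latest k samples and stacks them into the system state. The bin count (16), the sample count k = 100, the variable names, and the value ranges are hypothetical choices for illustration; the text does not fix these values.

```python
import numpy as np

# Six state variables described in the text; names are illustrative.
STATE_VARS = ["fct", "flow_interarrival", "flow_interdeparture",
              "flow_size", "throughput", "rtt"]
K = 100          # latest k samples per variable (k >= 2); assumed value
N_BINS = 16      # histogram resolution; assumed value

def build_state(samples: dict, ranges: dict) -> np.ndarray:
    """samples[var] -> latest measurements; ranges[var] -> (lo, hi)."""
    hists = []
    for var in STATE_VARS:
        lo, hi = ranges[var]
        h, _ = np.histogram(samples[var][-K:], bins=N_BINS, range=(lo, hi))
        # Convert counts to frequencies; no further normalization needed.
        hists.append(h / max(h.sum(), 1))
    return np.stack(hists)          # shape: (6, N_BINS)

rng = np.random.default_rng(0)
samples = {v: rng.exponential(1.0, K) for v in STATE_VARS}
ranges = {v: (0.0, 5.0) for v in STATE_VARS}
state = build_state(samples, ranges)
print(state.shape)                   # (6, 16)
```

Because each row is a frequency histogram, the input is already bounded in [0, 1], which is why the text notes that no separate normalization step is required.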
  • one value can be output as the initial window size or a set of values can be output as the probability distribution of the discrete initial window size.
  • the former explores a continuous probability distribution
  • the latter explores a discrete probability distribution.
  • the present invention uses the probability distribution of the discrete initial window size as the output action space, because this method makes the neural network decision model easier to train.
  • Window sizes of 2^i data segments are selected as the output action space. For example, i starts from 4 and increases up to a value that would cause obvious network congestion, giving an action space such as [16, 32, 64, 128, 256, 512].
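The discrete action space above can be sketched as follows: candidate initial windows of 2^i data segments for i = 4..9, with the model's softmax output giving one probability per candidate. The sampling/exploitation split shown here is an illustrative assumption, not a mechanism the text specifies.

```python
import random

# Candidate initial windows: 2**i data segments, i = 4..9.
ACTIONS = [2 ** i for i in range(4, 10)]    # [16, 32, 64, 128, 256, 512]

def pick_initial_window(probs, explore=False, rng=random):
    """Map the model's softmax output (one prob per candidate) to a window."""
    if explore:
        # During training, sample according to the output distribution.
        return rng.choices(ACTIONS, weights=probs, k=1)[0]
    # For pure exploitation, pick the most probable candidate.
    return ACTIONS[max(range(len(probs)), key=lambda i: probs[i])]

print(pick_initial_window([0.05, 0.1, 0.6, 0.15, 0.05, 0.05]))   # 64
```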
  • The structure of the neural network decision model is shown in Figure 3 (the number in parentheses is the size of the corresponding layer). The model as a whole consists of two parts, a feature extractor and a predictor. The feature extractor extracts features from the input and consists of 5 convolutional layers (conv1 to conv5) followed by a fully connected layer; the kernel size of the first 4 convolutional layers is 5×1, and the kernel size of the fifth convolutional layer is 1×1.
  • The configuration of each layer in the feature extractor is as follows: 1) 10 convolution kernels with a stride of 1; 2) 20 convolution kernels with a stride of 2; 3) 20 convolution kernels with a stride of 1; 4) 40 convolution kernels with a stride of 2; 5) 10 convolution kernels with a stride of 1. The output is then flattened into a 180×1 vector, which is finally mapped by a fully connected layer (fc1) into a one-dimensional vector of 10 neurons. This completes the feature extraction of the input data.
  • The predictor part of the neural network decision model takes as input the features extracted by the feature extractor and is responsible for predicting the initial window size. It consists of two fully connected layers (fc2 and fc3) and an output layer (output); each fully connected layer has 128 neurons, and the output layer converts its output into a probability distribution by using the softmax activation function.
  • Figure 3 shows a feasible neural network decision model structure determined through repeated experiments, but it should be understood that those skilled in the art can also use other improved structures to improve model performance according to application scenarios or training accuracy requirements, for example, More convolutional layers or different convolution kernel sizes, etc.
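The layer sizes quoted above can be checked for consistency with a short calculation, assuming 1-D convolutions with no padding. The input length of 96 bins (for example, six 16-bin histograms concatenated) is an assumption that reproduces the 180-element flattened vector; the text does not state the bin count.

```python
# Output length of a 1-D convolution with no padding.
def conv1d_out(length, kernel, stride):
    return (length - kernel) // stride + 1

# (out_channels, kernel, stride) for conv1..conv5 as described in the text.
layers = [
    (10, 5, 1), (20, 5, 2), (20, 5, 1), (40, 5, 2), (10, 1, 1),
]

length = 96                       # assumed input length (e.g. 6 x 16 bins)
for channels, kernel, stride in layers:
    length = conv1d_out(length, kernel, stride)

flatten = 10 * length             # conv5 has 10 output channels
print(length, flatten)            # 18 180
```

Under this assumption the spatial length shrinks 96 → 92 → 44 → 40 → 18 → 18, and 10 channels × 18 positions gives exactly the 180×1 flattened vector fed into fc1.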
  • In step S220, an online learning method is used to obtain an optimized neural network decision model.
  • The problem to be solved is constructed as a Markov decision process, and, for example, the A3C algorithm in reinforcement learning is used to obtain the best decision.
  • The problem is to find the best decision parameter θ such that the expected cumulative streaming-time value is minimized.
  • Let π_θ(a|s) denote the decision function with parameter θ, which represents the probability of taking action a in state s; the action to be executed is selected through the decision function.
  • Let V_ω(s) denote the value function with parameter ω.
  • The decision function and the value function share all the parameters of the feature extractor in the neural network decision model, and the A3C algorithm continuously updates the parameters θ and ω until they converge.
  • The discount factor, which represents the influence of future rewards, can for example be taken as the constant 0.9.
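A3C optimizes an expected discounted return; under the objective above (minimizing cumulative streaming time), a natural choice of reward is the negative streaming time, discounted with the factor 0.9 mentioned in the text. This is an illustrative calculation under that assumption, not the patent's exact formula, and the streaming times are made up.

```python
GAMMA = 0.9   # discount (influence) factor from the text

def discounted_return(rewards, gamma=GAMMA):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over an episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

stream_times = [12.0, 15.0, 11.0]      # d_t in milliseconds (illustrative)
rewards = [-d for d in stream_times]   # shorter streaming time -> higher reward
print(round(discounted_return(rewards), 2))   # -34.41
```

Maximizing this discounted return is equivalent to minimizing the discounted sum of streaming times, which matches the stated objective.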
  • The A3C parallel learning framework is adopted. As shown in Figure 4, it is composed of a central agent, multiple sub-agents, and the network environment.
  • The central agent is responsible for maintaining the latest parameters θ and ω.
  • Each sub-agent makes the initial window decision through the decision function and calculates updates to the parameters θ and ω within the sub-agent.
  • The network environments of all sub-agents are similar, which means that the sub-agents have similar network states.
  • Such parallel learning is achievable in a real network environment. For example, a study of 3G/LTE mobile traffic shows that many cell towers share the same traffic pattern related to their geographic location; this pattern also applies in 5G cellular networks.
  • Cooperative mobile edge computing nodes are a feasible framework for this deployment.
  • Each sub-agent in parallel learning can run on an edge server co-located with a cellular tower, and the central agent can run on either an edge server or a dedicated server.
  • When an edge server wants to obtain the best initial window size in the current network environment, it sends the current environment state to the associated sub-agent; the sub-agent uses the decision function to calculate the best initial window size and immediately returns the result to the edge server.
  • When a sub-agent has collected a batch of data, it starts to calculate the gradients of θ and ω based on that batch.
  • The sub-agent then sends the computed gradients to the central agent, and the central agent uses these gradients to update the parameters θ and ω; these parameter updates happen asynchronously.
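The asynchronous update scheme above can be sketched as a minimal parameter-server loop: each sub-agent computes gradients on its own batch and pushes them to the central agent, which applies them to the latest parameters without waiting for the other sub-agents. Parameters and gradients are plain scalars here purely for illustration; the learning rate, gradient values, and class names are assumptions.

```python
import threading

class CentralAgent:
    """Holds the latest theta (policy) and omega (value) parameters."""
    def __init__(self, theta, omega, lr=0.01):
        self.theta, self.omega, self.lr = theta, omega, lr
        self._lock = threading.Lock()   # serialize concurrent pushes

    def apply_gradients(self, grad_theta, grad_omega):
        # Applied as soon as a sub-agent pushes, asynchronously w.r.t. others.
        with self._lock:
            self.theta -= self.lr * grad_theta
            self.omega -= self.lr * grad_omega

    def snapshot(self):
        with self._lock:
            return self.theta, self.omega

central = CentralAgent(theta=1.0, omega=1.0)

def sub_agent(grads):
    for gt, go in grads:                # one push per collected batch
        central.apply_gradients(gt, go)

# 8 sub-agents, matching the simulation setup described later in the text.
workers = [threading.Thread(target=sub_agent, args=([(0.1, 0.2)] * 10,))
           for _ in range(8)]
for w in workers: w.start()
for w in workers: w.join()

theta, omega = central.snapshot()
print(round(theta, 2), round(omega, 2))   # 0.92 0.84
```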
  • In step S230, the optimized parameters of the neural network decision model are adaptively learned according to the dynamic changes of the network environment.
  • The embodiment of the present invention uses batch normalization technology to make the probability distribution output by the neural network decision model as uniform as possible, so that the algorithm converges in the process of repeated exploration and decision-making.
  • the network environment will dynamically change, and the neural network decision model trained in the previous network environment may not be suitable for the current network environment.
  • the present invention further proposes an adaptive algorithm, which detects changes in the network environment and restarts training the neural network decision model according to requirements.
  • For example, in order to detect changes in the network environment, the input of the neural network decision model, that is, the histograms, is observed. If significant differences are observed between consecutive histograms, the network environment may be undergoing major changes. Specifically, changes in the input of the neural network decision model (the histogram information) are first used to detect changes in the network environment; cosine similarity is then used to evaluate the similarity between two inputs, and the difference between the current input s and the previous input s′ is defined on this basis. The greater the difference, the greater the change in the network environment.
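The change-detection indicator above can be sketched as follows, with the flattened histograms compared by cosine similarity. Taking the difference as 1 minus the cosine similarity is an assumption made for illustration, since the exact formula is not spelled out in this text.

```python
import math

def cosine_similarity(s, s_prev):
    """Cosine similarity between two flattened histogram inputs."""
    dot = sum(a * b for a, b in zip(s, s_prev))
    norm = (math.sqrt(sum(a * a for a in s))
            * math.sqrt(sum(b * b for b in s_prev)))
    return dot / norm if norm else 0.0

def environment_change(s, s_prev):
    # Larger value -> bigger change in the network environment.
    return 1.0 - cosine_similarity(s, s_prev)

same = environment_change([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])
shifted = environment_change([0.2, 0.5, 0.3], [0.7, 0.1, 0.2])
print(abs(same) < 1e-9, shifted > same)   # True True
```

When this difference exceeds the preset threshold mentioned earlier, the retraining described next is triggered.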
  • The second task of the adaptive algorithm is to retrain the neural network decision model. For example, all parameters in the neural network decision model could be reinitialized, but this would destroy much of the information the model has learned before. In order to make the neural network decision model output a uniform probability distribution, so that each action can be selected again, the embodiment of the present invention reinitializes only the parameters of the last two layers of the neural network decision model.
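The retraining strategy above can be sketched as follows: the learned feature extractor and earlier layers are kept, and only the last two layers are reinitialized so that every action can be re-explored. The layer names, parameter shapes, and the Gaussian reinitialization are illustrative assumptions.

```python
import random

def reinit_last_two_layers(params: dict, rng=random):
    """params: insertion-ordered mapping layer_name -> list of weights."""
    for name in list(params)[-2:]:            # e.g. 'fc3' and 'output'
        # Small random values; the learned earlier layers are untouched.
        params[name] = [rng.gauss(0.0, 0.01) for _ in params[name]]
    return params

model = {
    "conv1": [0.5] * 4, "fc1": [0.3] * 4,     # kept: learned features survive
    "fc3": [0.9] * 4, "output": [1.2] * 4,    # reinitialized
}
reinit_last_two_layers(model, random.Random(0))
print(model["conv1"][0], max(abs(w) for w in model["output"]) < 0.1)
```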
  • The adaptive algorithm can be run on the central agent, and it is executed once per iteration.
  • The mechanisms for mitigating the impact of future decisions on model training include the following.
  • Each sample is a triple consisting of the network state s_t, the action output by the model a_t (the initial window value), and the streaming time d_t.
  • The training algorithm assumes that d_t depends only on s_t and a_t, and it punishes or encourages the action a_t according to the size of d_t when updating the parameters of the neural network decision model.
  • In fact, d_t does not depend only on s_t and a_t; it may also depend on future actions such as a_{t+1} and a_{t+2}. This is because the data stream sent at time t is not completed immediately and may still be transmitting after a period of time; when the sequence of actions a_{t+1}, ..., a_{t+k} is taken, these future actions may cause network congestion and affect the stream sent at time t, thereby affecting d_t and misleading model training into wrong punishment or encouragement.
  • Therefore, the embodiment of the present invention reuses the current initial window value over the next period of time, thereby reducing the impact described above. This approach also avoids frequent adjustments of the initial window.
  • In addition, the present invention proposes an early feedback mechanism, that is, the transmission time of certain streams is estimated before the transmission completes. Specifically, when the sender starts to transmit a stream, it simultaneously starts a timer; when the timer expires, if the transmission has not been completed, the transmission time of the stream is estimated.
  • the estimated transmission time can be set to a constant. If the value is large, then the congestion will be severely penalized; if the value is small, then the congestion penalty will be small.
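The early-feedback mechanism above can be sketched as follows: a timer starts with each stream, and if it expires before the transfer completes, a constant estimated transmission time is fed back as the training sample instead of waiting for the real completion time. The timeout and the estimate constant are illustrative assumptions; as the text notes, a larger estimate penalizes congestion more heavily.

```python
TIMER_MS = 50.0       # timer duration (assumed)
EST_TIME_MS = 200.0   # constant estimate: large -> congestion penalized harder

def feedback_sample(actual_completion_ms):
    """Return (streaming_time, was_estimated) for one transmitted stream.

    actual_completion_ms is None while the stream is still in flight.
    """
    if actual_completion_ms is None or actual_completion_ms > TIMER_MS:
        # Timer expired before completion: use the constant estimate.
        return EST_TIME_MS, True
    return actual_completion_ms, False

print(feedback_sample(12.0))   # (12.0, False)  fast stream: real time used
print(feedback_sample(None))   # (200.0, True)  still running: estimate used
```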
  • the present invention also provides a system for adaptively adjusting the initial window of congestion control, which is used to implement one or more aspects of the above method.
  • The system includes: a model construction unit, which constructs a neural network decision model with the historical streaming state as input and the distribution of the initial congestion control window as the output action; an online learning unit, which constructs a Markov learning process and obtains the optimized parameters of the neural network decision model, where a decision corresponds to a set of parameters θ in the neural network decision model and to a state trajectory {s_0, a_0, s_1, a_1, ...},
  • where s represents the input streaming state and a represents the output action;
  • and a prediction unit, which uses the optimized neural network decision model to obtain the initial congestion control window value for the subsequent streaming process.
  • Each unit in the system can be implemented by logic hardware or a processor.
  • the method of the present invention is compared with two prior art methods.
  • The two methods are: fixedly setting the initial congestion control window size to 10 data segments; and using the UCB algorithm to dynamically adjust the initial window size.
  • 8 subagents are used for parallel learning.
  • different random seeds are used to run 8 simulations, and the average result is taken.
  • the method of adaptively adjusting the initial window of congestion control of the present invention can obtain the best performance under each type of load.
  • Under low load, the transmission time of the present invention is about 10 milliseconds; under high load, it is about 20 milliseconds.
  • the present invention can adaptively adjust the setting of the initial window according to the change of the network environment, so that the adaptability is stronger.
  • In summary, the method for adaptively adjusting the congestion control window proposed by the present invention uses an online learning algorithm to decide the initial window in real time, and uses the A3C algorithm in reinforcement learning to implement autonomous learning of the model.
  • the edge server can send the current network environment state to the subagent.
  • the subagent uses the neural network decision model to calculate an optimal initial window value and returns it to the edge server.
  • The edge server then uses this initial window to transfer data.
  • When a sub-agent has collected a batch of data, it works with the central agent to update the parameter values of the neural network decision model, thereby optimizing model performance. Furthermore, the present invention can detect whether the network environment has changed significantly and then automatically restart the training process of the initial-window decision function according to the change. This enables a decision function that has converged after a long period of training to re-explore and find a new optimal solution in a new environment.
  • the present invention may be a system, a method and/or a computer program product.
  • the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present invention.
  • The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device.
  • the computer-readable storage medium may include, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing, for example.
  • A non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, and mechanical encoding devices, such as punched cards or raised structures in a groove on which instructions are stored.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Provided are a method and system for self-adaptively adjusting an initial congestion control window. The method comprises: constructing a neural network decision-making model, taking the historical streaming state as input, and the probability distribution of the initial congestion control window as output; constructing a Markov learning process, and obtaining optimized parameters of the neural network decision-making model by means of online learning; using the optimized neural network decision-making model to obtain an initial congestion control window value, which is used for a subsequent streaming process. The present invention uses a deep neural network to dynamically configure the initial congestion control window, and can adapt to changes in the network environment, thereby improving network transmission efficiency.

Description

Method and system for adaptively adjusting the initial window of congestion control

Technical field

The present invention relates to the field of communication technology, and in particular to a method and system for adaptively adjusting the initial window of congestion control.

Background art
TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based transport layer communication protocol, and congestion control is an algorithm used to adjust the size of the congestion window in TCP. Many applications communicate with one another through the Transmission Control Protocol. When communication starts, the client establishes a TCP connection with the server; the client then sends a request to the server, and the server sends a response to the client after receiving the request. When transmitting response data, the client dynamically adjusts the size of the congestion window according to the congestion control algorithm. The initial value of the congestion window is very small, usually set to 2 or 10.

Properly increasing the size of the initial window can reduce the streaming time and increase the throughput of the network. However, if the initial window is blindly increased, it may cause network congestion and thereby increase the streaming time. It is therefore necessary to set an optimal initial window size so that the network minimizes the streaming time without causing additional congestion.

The setting of the initial window applies to a variety of scenarios. For example, in edge computing, most applications are very sensitive to network delay, so network congestion has a very serious impact. How to set an appropriate initial congestion window so as to minimize the streaming time while minimizing the occurrence of network congestion is therefore a crucial issue.

In the prior art, some studies have shown that increasing the initial congestion window can reduce the latency of HTTP responses. For example, when the initial congestion window size of TCP is increased to 10 data segments, the average HTTP response time is reduced by about 10%, while the average retransmission rate increases by only 0.5%. However, this approach cannot dynamically adjust the initial window value, which seriously affects network transmission efficiency. For example, in the 5G edge computing scenario, network bandwidth is greatly increased and a large number of short flows need to be transmitted; these short flows often finish within the "slow start" phase, so dynamically adjusting the initial window size can greatly improve the transmission efficiency of the network.

Some studies have proposed using the UCB (Upper Confidence Bound) algorithm to dynamically set the initial window value, maximizing network throughput and minimizing network delay. However, this method cannot adapt to a dynamically changing network environment and cannot achieve optimal performance.
发明内容Summary of the invention
The purpose of the present invention is to overcome the above defects of the prior art and to provide a method and system for adaptively adjusting the initial congestion control window, in which an A3C reinforcement learning model is introduced and a deep neural network is used to set the initial congestion control window dynamically so as to adapt to changes in the network environment.
According to a first aspect of the present invention, a method for adaptively adjusting the initial congestion control window is provided. The method includes the following steps:
constructing a neural network decision model that takes historical streaming states as input and outputs a distribution over initial congestion control window values as its action;
constructing a Markov decision process and obtaining optimized parameters of the neural network decision model through online learning, where a policy corresponds to a set of parameters θ of the neural network decision model and induces a state trajectory {s_0, a_0, s_1, a_1, ..., s_t, a_t, ...} and a set of streaming performance values {d_0, d_1, ..., d_t, ...}; through multiple rounds of iterative updates, the best policy parameters θ are found such that the expected streaming performance is optimal, where s denotes an input streaming state and a denotes an output action;
using the optimized neural network decision model to obtain the initial congestion control window value for subsequent streaming.
In one embodiment, the input of the neural network decision model is generated from six state quantities: flow completion time, flow inter-arrival time, flow inter-completion time, flow data volume, flow throughput, and flow round-trip time. Specifically, for each state quantity, the latest k samples are used to construct a statistical histogram that describes the streaming state of the neural network decision model over a period of time. The flow completion time mentioned above is the metric used to measure streaming performance.
In one embodiment, the neural network decision model includes a feature extractor and a predictor. The feature extractor, which extracts features from the input data, consists of five sequentially connected convolutional layers and one fully connected layer. The predictor, whose input is the feature output of the feature extractor, predicts the distribution of congestion control window values; it consists of two fully connected layers and an output layer, and the output layer converts its output into a probability distribution over initial congestion control window values by using the softmax activation function.
In one embodiment, the method further includes: when the change in the input histogram information of the neural network decision model reaches a preset target, re-initializing the parameters of the last two layers of the neural network decision model and optimizing them again.
In one embodiment, the following metric is used to measure the change in the input histogram information of the neural network decision model:

Δs = 1 − (s · s') / (‖s‖ ‖s'‖)

where s is the current input of the neural network decision model and s' is a previous input of the neural network decision model.
In one embodiment, the method further includes: after the initial congestion control window value is obtained, reusing that value for streaming over a period of time; and starting a timer when the sender begins transmitting a flow, so that if the timer expires before the flow transmission has completed, the estimated flow completion time is used as a sample to train the neural network decision model.
In one embodiment, the streaming performance is one or more of flow completion time and throughput.
In one embodiment, the online learning is implemented through a parallel architecture that includes a central agent, multiple sub-agents, and the network environment, where the central agent maintains the latest parameters of the neural network decision model and each sub-agent makes initial congestion control window decisions through a decision function and computes parameter updates.
According to a second aspect of the present invention, a system for adaptively adjusting the initial congestion control window is provided. The system includes: a model construction unit for constructing a neural network decision model that takes historical streaming states as input and outputs a distribution over initial congestion control window values as its action; an online learning unit for constructing a Markov decision process and obtaining optimized parameters of the neural network decision model through online learning, where a policy corresponds to a set of parameters θ of the neural network decision model and induces a state trajectory {s_0, a_0, s_1, a_1, ..., s_t, a_t, ...} and a set of streaming performance values {d_0, d_1, ..., d_t, ...}, and the best policy parameters θ are found through multiple rounds of iterative updates such that the expected streaming performance is optimal, where s denotes an input streaming state and a denotes an output action; and a prediction unit for using the optimized neural network decision model to obtain the initial congestion control window value for subsequent streaming.
Compared with the prior art, the advantages of the present invention are: autonomous learning is realized, so that the initial window size can be adjusted autonomously without analyzing the internal principles of the wireless transmission system and base station system and without manual analysis; optimal decision-making is realized, so that the optimal initial window can be obtained, thereby improving streaming efficiency; and adaptive adjustment is realized, so that the initial window can be adjusted dynamically according to network conditions and optimal performance can be obtained even in a changing network environment.
Description of the Drawings
The following drawings only schematically illustrate and explain the present invention and are not intended to limit its scope, in which:
Fig. 1 is a process diagram of a method for adaptively adjusting the initial congestion control window according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for adaptively adjusting the initial congestion control window according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a neural network decision model according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a parallel training architecture according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of simulation results according to an embodiment of the present invention.
Detailed Description
In order to make the objectives, technical solutions, design methods, and advantages of the present invention clearer, the present invention is described in further detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.
In all examples shown and discussed herein, any specific value should be construed as merely exemplary rather than limiting. Therefore, other examples of the exemplary embodiments may have different values.
Technologies, methods, and equipment known to those of ordinary skill in the relevant fields may not be discussed in detail, but where appropriate, such technologies, methods, and equipment should be regarded as part of the specification.
According to an embodiment of the present invention, a method for adaptively adjusting the initial congestion control window is provided, which adaptively learns the optimal initial window decision model online. Referring to Fig. 1: first, before a data flow starts to be transmitted, the historical states of the network environment are collected and converted into statistical histograms; the histograms are then fed into the neural network decision model, and the output value, i.e., the initial congestion control window value, is obtained through a forward pass of the neural network. Next, the network environment uses this value to set the initial window for the data flow and transmits it. The transmission of this data flow then produces a new system state and a reward value; the reward value is used to compute the parameter updates of the neural network decision model, which in turn optimize the model parameters. In this way, after multiple rounds of iterative computation, a converged neural network decision model is finally obtained that outputs the optimal initial window value.
It should be noted that the network environment involved herein may be a mobile edge computing system (such as 5G edge computing) or another fixed or mobile network environment that transmits data flows using the TCP/IP protocol. In the following, a 5G edge-computing scenario is taken as an example to introduce the neural network decision model, the online learning algorithm, the adaptive learning algorithm, and the strategies for optimizing the neural network decision model.
Referring to Fig. 2, the method for adaptively adjusting the initial congestion control window provided by this embodiment of the present invention includes the following steps:
Step S210: construct a neural network decision model that takes the historical streaming states of the network environment as input and the initial congestion control window value as the output action.
For the input of the neural network decision model, two principles govern the choice of input states: each input must be obtainable at the edge server, and each input state must be relevant to the congestion control problem. Following these two principles, in one embodiment the following six state quantities are chosen as input: flow completion time, flow inter-arrival time, flow inter-completion time, flow data volume, flow throughput, and flow round-trip time. A flow's transmission starts when the server sends the first packet of a response, and the time elapsed until the transmission ends is called the flow completion time. The interval between the arrivals of two consecutive flows is called the flow inter-arrival time, and the interval between the completions of two consecutive flows is called the flow inter-completion time.
For each of the above state quantities, i.e., flow completion time, flow inter-arrival time, flow inter-completion time, flow data volume, flow throughput, and flow round-trip time, the latest k samples (k being an integer greater than or equal to 2) are collected and used to construct a histogram describing the state of the network environment over a period of time. The collection of these histograms forms the system state S_t. Using histograms instead of the raw sampled data makes the input of the neural network model more compact and removes the need for normalization.
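As an illustrative sketch of how the histogram-based system state S_t can be assembled, the following Python code collects the latest k samples of each of the six state quantities and builds fixed-bin histograms; the values of k, the number of bins, and the per-quantity value ranges are assumptions made for illustration and are not prescribed by this embodiment.

```python
from collections import deque

K_SAMPLES = 16   # latest k samples kept per state quantity (assumed value)
NUM_BINS = 12    # histogram resolution (assumed value)

STATE_QUANTITIES = [
    "flow_completion_time", "flow_inter_arrival_time", "flow_inter_completion_time",
    "flow_size", "flow_throughput", "flow_rtt",
]

class StreamingStateCollector:
    """Collects the six per-flow state quantities and turns them into histograms."""

    def __init__(self, ranges):
        # ranges: {quantity: (lo, hi)} value range used to place the fixed bins
        self.ranges = ranges
        self.samples = {q: deque(maxlen=K_SAMPLES) for q in STATE_QUANTITIES}

    def record(self, quantity, value):
        # deque(maxlen=...) automatically discards the oldest sample
        self.samples[quantity].append(value)

    def histogram(self, quantity):
        lo, hi = self.ranges[quantity]
        width = (hi - lo) / NUM_BINS
        counts = [0] * NUM_BINS
        for v in self.samples[quantity]:
            idx = min(int((v - lo) / width), NUM_BINS - 1) if v > lo else 0
            counts[idx] += 1
        return counts

    def model_input(self):
        # Concatenate the six histograms; no further normalization is needed.
        out = []
        for q in STATE_QUANTITIES:
            out.extend(self.histogram(q))
        return out
```

A collector of this kind would run at the edge server, with `model_input()` producing the vector fed to the neural network decision model before each decision.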
For the output of the neural network decision model, either a single value can be output as the initial window size or a set of values can be output as a probability distribution over discrete initial window sizes. The former explores a continuous probability distribution, the latter a discrete one. The present invention uses a probability distribution over discrete initial window sizes as the output action space, because this makes the neural network decision model easier to train.
Further, to reduce computational complexity, window sizes of 2^i data segments are used as the output action space. For example, i starts at 4 and increases up to a value that would cause obvious network congestion, e.g., [16, 32, 64, 128, 256, 512].
The structure of the neural network decision model can take various forms. For example, as shown in Fig. 3 (where the numbers in parentheses are the sizes of the corresponding layers), the model as a whole consists of two parts: a feature extractor and a predictor. Specifically, the feature extractor extracts features from the input. It consists of 5 convolutional layers (conv1 to conv5); the kernel size of the first 4 convolutional layers is 5*1, the kernel size of the fifth convolutional layer is 1*1, and the fifth convolutional layer is followed by a fully connected layer. The structure of each layer in the feature extractor is as follows: 1) 10 convolution kernels with a stride of 1; 2) 20 convolution kernels with a stride of 2; 3) 20 convolution kernels with a stride of 1; 4) 40 convolution kernels with a stride of 2; 5) 10 convolution kernels with a stride of 1. The output is then flattened into a 180*1 vector, which is finally passed through a fully connected layer (fc1) to produce a one-dimensional vector with 10 neurons. This completes the feature extraction of the input data.
The predictor part of the neural network decision model takes as input the features extracted by the feature extractor and is responsible for predicting the initial window size. It consists of two fully connected layers (fc2 and fc3), each with 128 neurons, and an output layer (output) that converts its output into a probability distribution by using the softmax activation function.
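The softmax conversion performed by the output layer, and the sampling of an initial window from the resulting distribution, can be sketched as follows; the logit values stand in for the (hypothetical) output of fc3, and the discrete action space [16, 32, 64, 128, 256, 512] follows the example given above.

```python
import math
import random

WINDOW_SIZES = [16, 32, 64, 128, 256, 512]  # 2^i data segments, i = 4..9

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_initial_window(logits, rng=random):
    """Sample an initial congestion window from the predicted distribution."""
    probs = softmax(logits)
    r = rng.random()
    acc = 0.0
    for w, p in zip(WINDOW_SIZES, probs):
        acc += p
        if r <= acc:
            return w, probs
    return WINDOW_SIZES[-1], probs
```

Sampling from the distribution (rather than always taking the argmax) preserves the exploration that the reinforcement learning training relies on.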
Because the neural network decision model of Fig. 3 is trained by trial and error with reinforcement learning, it is very important to avoid selection bias at the start of training; otherwise, a lack of exploration in action selection may prevent the best choice from being found. Batch normalization is therefore used in the neural network decision model; it helps the model produce a uniform probability distribution. In the model of Fig. 3, batch normalization is applied only to the second fully connected layer (bn in Fig. 3 denotes batch normalization).
Fig. 3 shows one feasible neural network decision model structure determined through repeated experiments, but it should be understood that those skilled in the art may adopt other improved structures, such as more convolutional layers or different kernel sizes, to improve model performance according to the application scenario or training accuracy requirements.
Step S220: obtain an optimized neural network decision model through online learning.
In one embodiment, the problem to be solved is formulated as a Markov decision process, and, for example, the A3C algorithm from reinforcement learning is used to obtain the best policy.
Specifically, a given policy corresponds to a set of parameters θ of the neural network decision model, and this policy induces a state trajectory {s_0, a_0, s_1, a_1, ..., s_t, a_t, ...} and a set of flow completion times {d_0, d_1, ..., d_t, ...}, where s_0, s_1, ..., s_t denote states and a_0, a_1, ..., a_t denote output actions. The problem to be solved is to find the best policy parameters θ that minimize the expected cumulative discounted flow completion time, expressed as

min_θ E[ Σ_t γ^t d_t ]

Let π_θ(a|s) denote the decision function with parameters θ, which gives the probability of taking action a in state s; the action to be executed is selected through this decision function. Let V_ω(s) denote the value function with parameters ω. The value function adopts the same neural network structure as the decision function, the only difference being that its last layer outputs a numerical value rather than a probability distribution. Moreover, the decision function and the value function share all parameters of the feature extractor of the neural network decision model, and the A3C algorithm continuously updates the parameters θ and ω until they converge. γ denotes the discount factor; for example, a constant of 0.9 can be used.
In a preferred embodiment, the A3C parallel learning framework is adopted for model stability. As shown in Fig. 4, it consists of a central agent, multiple sub-agents, and the network environment. The central agent maintains the latest parameters θ and ω; each sub-agent makes initial window decisions through the decision function and computes updates to its parameters θ and ω. All sub-agents operate in similar network environments, meaning that the sub-agents observe similar network states. Parallel learning of this kind is achievable in real networks: for example, studies of 3G/LTE mobile traffic show that many cellular towers share the same traffic patterns related to their geographic location, and this also applies to 5G cellular networks. Furthermore, cooperating mobile edge computing nodes are a feasible deployment: each sub-agent in parallel learning can run on an edge server co-located with a cellular tower, and the central agent can run on either an edge server or a dedicated server.
In the parallel framework of Fig. 4, when an edge server wants to obtain the best initial window size under the current network environment, it sends the current environment state to its associated sub-agent; the sub-agent computes the best initial window size using the decision function and immediately returns the result to the edge server. Once a sub-agent has collected a batch of data, it computes the gradients Δθ and Δω from that batch and sends them to the central agent, which uses these gradients to update the parameters θ and ω; these parameter updates happen asynchronously.
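A minimal sketch of the asynchronous parameter exchange between sub-agents and the central agent is given below, with the model parameters reduced to flat lists of floats; the class and method names are illustrative assumptions, not interfaces defined by this embodiment.

```python
import threading

class CentralAgent:
    """Holds the latest global parameters θ and ω and applies pushed gradients."""

    def __init__(self, theta, omega, lr_theta=0.01, lr_omega=0.01):
        self.theta, self.omega = list(theta), list(omega)
        self.lr_theta, self.lr_omega = lr_theta, lr_omega
        self._lock = threading.Lock()  # gradient pushes arrive asynchronously

    def pull(self):
        """Called by a sub-agent at the start of an iteration: θ' ← θ, ω' ← ω."""
        with self._lock:
            return list(self.theta), list(self.omega)

    def push(self, d_theta, d_omega):
        """Apply a sub-agent's gradients: θ ← θ + η1·Δθ, ω ← ω + η2·Δω."""
        with self._lock:
            for i, g in enumerate(d_theta):
                self.theta[i] += self.lr_theta * g
            for i, g in enumerate(d_omega):
                self.omega[i] += self.lr_omega * g
```

In a deployment, `pull` and `push` would be remote calls from edge servers; the lock simply keeps concurrent pushes from interleaving within one update.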
For each sub-agent, the A3C algorithm starts at time t = 0, and in each iteration the sub-agent goes through the following steps:
1) Reset the gradients: Δθ ← 0, Δω ← 0;
2) Synchronize the sub-agent parameters (θ' and ω') with the central agent parameters (θ and ω): θ' ← θ, ω' ← ω;
3) Interact with the network environment using the current policy and collect the state trajectory {s_t, a_t, ..., s_{t+T-1}, a_{t+T-1}} and the set of flow completion times {d_t, ..., d_{t+T-1}}, where T is a hyperparameter greater than n (the number of steps of the n-step return);
4) For all i ∈ [0, T−1−n], compute the n-step return

R_{t+i} = Σ_{j=0}^{n-1} γ^j r_{t+i+j} + γ^n V_{ω'}(s_{t+i+n})

where r denotes the reward derived from the corresponding flow completion time;
5) Compute the advantage

A_{t+i} = R_{t+i} − V_{ω'}(s_{t+i});

6) Accumulate the policy gradient

Δθ ← Δθ + ∇_{θ'} log π_{θ'}(a_{t+i}|s_{t+i}) · A_{t+i};

7) Accumulate the value gradient

Δω ← Δω + ∇_{ω'} (R_{t+i} − V_{ω'}(s_{t+i}))^2;

8) Perform an asynchronous update of the global parameters θ and ω: θ ← θ + η_1 Δθ, ω ← ω + η_2 Δω, where η_1 and η_2 are learning rates;
9) Set t ← t + T;
10) Repeat the above steps until the maximum number of iterations is reached.
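Steps 4) and 5) above can be sketched in isolation as follows. The rewards are taken here as the negative flow completion times, one natural choice consistent with minimizing the expected cumulative flow completion time, and the value estimates are placeholder numbers; neither choice is prescribed by this embodiment.

```python
GAMMA = 0.9  # discount factor, as suggested in the description

def n_step_returns(rewards, values, n, gamma=GAMMA):
    """Compute R_{t+i} = sum_{j=0}^{n-1} gamma^j * r_{t+i+j} + gamma^n * V(s_{t+i+n})
    for all i in [0, T-1-n], together with the advantages A_{t+i} = R_{t+i} - V(s_{t+i})."""
    T = len(rewards)
    returns, advantages = [], []
    for i in range(T - n):  # i ranges over [0, T-1-n]
        R = sum(gamma ** j * rewards[i + j] for j in range(n))
        R += gamma ** n * values[i + n]  # bootstrap with the value estimate
        returns.append(R)
        advantages.append(R - values[i])
    return returns, advantages

# Example: flow completion times [1, 2, 3, 4] -> rewards are their negatives.
rets, advs = n_step_returns(
    rewards=[-1.0, -2.0, -3.0, -4.0],
    values=[0.5, 0.5, 0.5, 0.5],
    n=2,
)
```

The returned advantages are what steps 6) and 7) feed into the policy and value gradient accumulation.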
Step S230: adaptively learn the optimized parameters of the neural network decision model according to dynamic changes in the network environment.
As described above, this embodiment of the present invention uses batch normalization to make the probability distribution output by the neural network decision model as uniform as possible, so that the algorithm converges while repeatedly exploring decisions. The network environment changes dynamically, however, and a neural network decision model trained in a previous network environment may not be suitable for the current one. To solve this problem, the present invention further proposes an adaptive algorithm that detects changes in the network environment and restarts the training of the neural network decision model as needed.
For example, to detect changes in the network environment, the input of the neural network decision model, i.e., the histograms, is observed. If significant differences are observed between consecutive histograms, the network environment is probably undergoing a major change. Specifically, changes in the model input (the histogram information) are used to detect changes in the network environment, and cosine similarity is used to evaluate the similarity between two inputs; the difference between the current input s and a previous input s' is defined as

Δs = 1 − (s · s') / (‖s‖ ‖s'‖)

The greater this difference, the greater the change in the network environment.
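The difference measure Δs can be sketched directly over the flattened histogram vectors; the handling of all-zero inputs is an assumption added for robustness.

```python
import math

def state_change(s, s_prev):
    """Delta_s = 1 - cosine_similarity(s, s'); larger means a bigger environment change."""
    dot = sum(a * b for a, b in zip(s, s_prev))
    norm_s = math.sqrt(sum(a * a for a in s))
    norm_p = math.sqrt(sum(b * b for b in s_prev))
    if norm_s == 0.0 or norm_p == 0.0:
        return 1.0  # treat an empty histogram as maximally different (assumption)
    return 1.0 - dot / (norm_s * norm_p)
```

Identical inputs give Δs = 0, orthogonal inputs give Δs = 1, matching the intended reading that a larger Δs signals a larger environment change.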
The second task of the adaptive algorithm is to retrain the neural network decision model. All parameters of the model could be re-initialized, but doing so would destroy much of the information the model has already learned. To make the neural network decision model output uniform probabilities again, so that every action has a chance of being selected, this embodiment of the present invention re-initializes only the parameters of the last two layers of the neural network decision model.
The adaptive algorithm can run on the central agent. Each iteration includes the following steps:
1) Obtain the average state s of the current iteration and the average state of the past L iterations, denoted s';
2) Compute Δs = 1 − (s · s') / (‖s‖ ‖s'‖);
3) When Δs is greater than a threshold, re-initialize the last two layers of the neural network decision models of both the decision function and the value function.
In addition, to make the technical solution provided by the present invention more complete, two further optimization strategies are proposed: a mechanism for mitigating the influence of future decisions and an early feedback mechanism.
The mechanism for mitigating the influence of future decisions is as follows:
Updating the parameters of the neural network decision model requires training samples, each consisting of a triple: the network state s_t, the model output action a_t (the initial window value), and the flow completion time d_t.
Ideally, d_t depends only on s_t and a_t, and the training algorithm updates the parameters of the neural network decision model according to the magnitude of d_t to penalize or encourage the action a_t. However, d_t depends not only on s_t and a_t but possibly also on future actions a_{t+1}, a_{t+2}, and so on. This is because the flow sent at time t does not complete immediately; it may finish only after some time, and when the sequence of actions a_{t+1} ... a_{t+k} is taken, these future actions may cause network congestion and affect the flow sent at time t, thereby affecting d_t and misleading the model training into wrong penalties or encouragement.
To mitigate the influence of such future decisions, this embodiment of the present invention reuses the current initial window value over the following period of time, thereby reducing the effect described above. This approach also avoids frequent adjustments of the initial window.
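The reuse strategy can be sketched as a thin wrapper around the decision function; the reuse interval and the wrapper interface are illustrative assumptions.

```python
class ReusedDecision:
    """Reuses one initial-window decision for a period of time, damping the
    influence of future actions and avoiding frequent window adjustments."""

    def __init__(self, decide, reuse_interval):
        self.decide = decide              # callable: state -> initial window value
        self.reuse_interval = reuse_interval
        self._cached = None
        self._valid_until = None

    def initial_window(self, state, now):
        # Only query the decision model when the cached value has expired.
        if self._cached is None or now >= self._valid_until:
            self._cached = self.decide(state)
            self._valid_until = now + self.reuse_interval
        return self._cached
```

Every flow arriving within the reuse interval is assigned the cached window value, so the training sample for a_t is less contaminated by decisions taken immediately afterward.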
The early feedback mechanism is as follows:
The feedback of a flow's completion time is not only delayed, but delayed by a varying amount. This variation biases the training of the neural network decision model and thus degrades model performance. As explained above, after taking action a_t and keeping the initial window size for a period of time, the system must wait for the corresponding flow to finish transmitting; the waiting time is exactly that flow's completion time d_t. If d_t is large, this negative sample (discouraging the use of a_t) reaches the model for training only after the model has processed many positive samples (encouraging the use of a_t). Because model training cannot exploit negative samples in time, it tends to be too aggressive, which slows the convergence of training.
To solve this problem, the present invention proposes an early feedback mechanism that estimates the completion time of certain flows before their transmission finishes. Specifically, when the sender starts transmitting a flow, it also starts a timer. If the transmission has not completed when the timer expires, the completion time of the flow is estimated. Many estimation methods are possible; for simplicity, the estimated completion time can be set to a constant. If this value is large, congestion is penalized severely; if it is small, the penalty for congestion is mild.
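The early feedback mechanism can be sketched without real timers by comparing elapsed time against a deadline; the timeout and the constant estimated completion time are assumed illustrative values.

```python
ESTIMATED_COMPLETION_TIME = 50.0  # ms; assumed constant used for unfinished flows

def flow_time_feedback(start_time, now, finished_at=None, timeout=20.0,
                       estimate=ESTIMATED_COMPLETION_TIME):
    """Return (completion_time, is_estimate) for a flow, producing early
    feedback when the timer expires before the flow finishes."""
    if finished_at is not None:
        return finished_at - start_time, False   # real sample
    if now - start_time >= timeout:
        # Timer expired: feed the estimated completion time back for training.
        return estimate, True
    return None, False  # still waiting, no feedback yet
```

A large `estimate` makes the early feedback a strong congestion penalty, a small one a mild penalty, matching the trade-off described above.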
Correspondingly, the present invention also provides a system for adaptively adjusting the initial congestion control window, which implements one or more aspects of the above method. For example, the system includes: a model construction unit for constructing a neural network decision model that takes historical streaming states as input and outputs a distribution over initial congestion control window values as its action; an online learning unit for constructing a Markov decision process and obtaining the optimized parameters of the neural network decision model, where a policy corresponds to a set of parameters θ of the neural network decision model and induces a state trajectory {s_0, a_0, s_1, a_1, ..., s_t, a_t, ...} and a set of streaming performance values {d_0, d_1, ..., d_t, ...}, and the best policy parameters θ are found such that the expected streaming performance is optimal, where s denotes an input streaming state and a denotes an output action; and a prediction unit for using the optimized neural network decision model to obtain the initial congestion control window value for subsequent streaming. The units of the system can be implemented in logic hardware or by a processor.
To further verify the feasibility and technical effects of the present invention, a variety of simulation experiments were carried out. In the simulations, the method of the present invention is compared with two prior-art methods: fixing the congestion control initial window at 10 data segments, and dynamically adjusting the initial window size with the UCB algorithm. The embodiment of the present invention uses 8 subagents for parallel learning; for each prior-art comparison method, 8 simulations were run with different random seeds and the results averaged.
The simulations first model a dynamic network load scenario in which the mean inter-arrival time of client requests alternates between 50 ms (low load) and 5 ms (high load). The experimental results are shown in Fig. 5, where IW-10 denotes a fixed initial window of 10 segments, Smart IW uses the UCB algorithm to decide the initial window size, and Neuro IW is the technical solution proposed by the present invention. In Fig. 5(a), the horizontal axis is the time step and the vertical axis is the flow transmission time averaged over the 8 subagents; the vertical axis of Fig. 5(b) is the initial window size in segments. Each subagent carries 1000 flows. Fig. 5(a) shows that, compared with the two existing algorithms, the method of the present invention for adaptively adjusting the congestion control initial window achieves the best performance under each type of load: a transmission time of roughly 10 ms under low load and roughly 20 ms under high load. Fig. 5(b) shows that the present invention adaptively adjusts the initial window setting as the network environment changes, and is therefore more adaptable.
In summary, the method for adaptively adjusting the congestion control window proposed by the present invention applies an online learning algorithm to decide the initial window in real time, using the A3C reinforcement learning algorithm for autonomous model learning. When the current initial window size needs to be adjusted dynamically, the edge server sends the current network environment state to a subagent; the subagent computes an optimal initial window value with the neural network decision model and returns it to the edge server, which transmits data using that initial window. After a subagent has collected a batch of data, it cooperates with the central agent to update the parameter values of the neural network decision model, thereby optimizing model performance. Further, the present invention can detect whether the network environment has changed significantly and, depending on the change, automatically restart the training of the initial-window decision function, so that a decision function that has converged after long training can still explore and discover a new optimal solution in a new environment.
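The central-agent/subagent interplay described above can be sketched as follows. The linear "decision model", the learning rate, the baseline of 10 segments, and the update rule are stand-ins for the actual convolutional network and A3C gradient computation, chosen only to keep the sketch self-contained:

```python
class CentralAgent:
    """Maintains the latest decision-model parameters (flat toy vector)."""
    def __init__(self, dim):
        self.theta = [0.0] * dim

    def apply_update(self, grad, lr=0.01):
        # In A3C fashion, updates from any subagent are applied to the
        # shared parameters as they arrive.
        self.theta = [t + lr * g for t, g in zip(self.theta, grad)]

class SubAgent:
    def __init__(self, central):
        self.central = central

    def decide_iw(self, state):
        # Toy linear stand-in for the neural network decision model:
        # score the observed network state and map it to an initial
        # window value in segments.
        score = sum(t * s for t, s in zip(self.central.theta, state))
        return max(1, round(score) + 10)

    def learn(self, state, advantage):
        # Toy policy-gradient-flavoured update: push theta toward
        # states whose decisions led to good performance.
        self.central.apply_update([advantage * s for s in state])

central = CentralAgent(dim=3)
subagents = [SubAgent(central) for _ in range(8)]  # 8 parallel subagents
iw = subagents[0].decide_iw([1.0, 0.0, 0.0])
```

Every subagent decides with, and writes back into, the single parameter vector held by the central agent, which is the essential structure of the parallel learning architecture in claim 8.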
It should be noted that although the steps are described above in a particular order, this does not mean the steps must be executed in that order; in fact, some of the steps can be executed concurrently or even reordered, as long as the required functions are achieved.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that holds and stores instructions used by an instruction-execution device. The computer-readable storage medium may include, for example but without limitation, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
The embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over what is found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

  1. A method for adaptively adjusting a congestion control initial window, comprising the following steps:
    constructing a neural network decision model that takes the historical flow transmission state as input and outputs the distribution of the congestion control initial window as its action;
    constructing a Markov learning process to obtain optimized parameters of the neural network decision model through online learning, wherein a policy corresponds to a set of parameters θ of the neural network decision model, and the policy corresponds to a state trajectory {s_0, a_0, s_1, a_1, ..., s_t, a_t, ...} and a flow performance set {d_0, d_1, ..., d_t, ...}; and, through multiple rounds of iterative updates, finding the best policy parameters θ such that the expected flow transmission performance is optimal, where s denotes the input flow transmission state and a denotes the output action;
    using the optimized neural network decision model to obtain the congestion control initial window value for use in subsequent flow transmission.
  2. The method according to claim 1, wherein the flow transmission state of the neural network decision model over a period of time is described by statistical histograms constructed from k samples of flow completion time, flow inter-arrival time, flow inter-departure time, flow data volume, flow throughput, and flow round-trip time, wherein the flow completion time is the metric of flow transmission performance.
  3. The method according to claim 1, wherein the neural network decision model comprises a feature extractor and a predictor; the feature extractor extracts features from the input data and comprises five convolutional layers connected in sequence and one fully connected layer; the predictor predicts the probability distribution of congestion control window values, its input being the features output by the feature extractor; and the predictor comprises two fully connected layers and an output layer, the output layer converting its output into a probability distribution over congestion control initial window values by means of a softmax activation function.
  4. The method according to claim 2, further comprising: when the change in the input histogram information of the neural network decision model reaches a preset target, initializing the parameters of the last two layers of the neural network decision model for re-optimization.
  5. The method according to claim 4, wherein the change in the input histogram information of the neural network decision model is measured by the following metric:
    [Formula image PCTCN2019128759-appb-100001; the formula itself is not reproduced in the text]
    where s is the current input of the neural network decision model and s' is the previous input of the neural network decision model.
  6. The method according to claim 1, further comprising: reusing the congestion control initial window value for flow transmission for a period of time after it is obtained; and starting a timer when the sender starts transmitting a flow, and, if the timer expires before the flow transmission has completed, using the estimated flow completion time as a sample for training the neural network decision model.
  7. The method according to claim 1, wherein the flow transmission performance is one or more of flow completion time or throughput.
  8. The method according to claim 1, wherein the online learning is implemented through a parallel architecture comprising a central agent, a plurality of subagents, and the network environment, wherein the central agent maintains the latest parameters of the neural network decision model, and each subagent makes congestion control initial window decisions through a decision function and computes parameter updates.
  9. A system for adaptively adjusting a congestion control initial window, comprising:
    a model construction unit for constructing a neural network decision model that takes the historical flow transmission state as input and outputs the distribution of the congestion control initial window as its action;
    an online learning unit for constructing a Markov learning process to obtain optimized parameters of the neural network decision model through online learning, wherein a policy corresponds to a set of parameters θ of the neural network decision model, and the policy corresponds to a state trajectory {s_0, a_0, s_1, a_1, ..., s_t, a_t, ...} and a flow performance set {d_0, d_1, ..., d_t, ...}; and, through multiple rounds of iterative updates, the unit finds the best policy parameters θ such that the expected flow transmission performance is optimal, where s denotes the input flow transmission state and a denotes the output action;
    a prediction unit for obtaining the congestion control initial window value from the optimized neural network decision model for use in subsequent flow transmission.
  10. A computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the steps of the method according to any one of claims 1 to 8 are implemented.
PCT/CN2019/128759 2019-12-25 2019-12-26 Method and system for self-adaptively adjusting initial congestion control window WO2021128181A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911356902.8A CN111092823B (en) 2019-12-25 2019-12-25 Method and system for adaptively adjusting congestion control initial window
CN201911356902.8 2019-12-25

Publications (1)

Publication Number Publication Date
WO2021128181A1 true WO2021128181A1 (en) 2021-07-01

Family

ID=70397235

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/128759 WO2021128181A1 (en) 2019-12-25 2019-12-26 Method and system for self-adaptively adjusting initial congestion control window

Country Status (2)

Country Link
CN (1) CN111092823B (en)
WO (1) WO2021128181A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113872873A (en) * 2021-09-29 2021-12-31 西安交通大学 Multi-scene cross-layer congestion control method suitable for 5G new application
CN114422453A (en) * 2021-11-30 2022-04-29 北京交通大学 Method, device and storage medium for online planning of time-sensitive streams
CN114567597A (en) * 2022-02-21 2022-05-31 重庆邮电大学 Congestion control method and device based on deep reinforcement learning in Internet of things
CN114866489A (en) * 2022-04-28 2022-08-05 清华大学 Congestion control method and device and training method and device of congestion control model
CN116016223A (en) * 2022-12-09 2023-04-25 国网湖北省电力有限公司信息通信公司 Data transmission optimization method for data center network
CN116915706A (en) * 2023-09-13 2023-10-20 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Data center network congestion control method, device, equipment and storage medium
CN117374952A (en) * 2023-10-19 2024-01-09 河海大学 Power failure event driven recovery method, device, equipment and storage medium
CN117395166A (en) * 2023-12-11 2024-01-12 四海良田(天津)智能科技有限公司 Intelligent agricultural management platform based on Internet of things

Families Citing this family (9)

Publication number Priority date Publication date Assignee Title
CN112367264A (en) * 2020-10-27 2021-02-12 百果园技术(新加坡)有限公司 Method and device for determining initial congestion window value, electronic equipment and storage medium
CN112469079B (en) * 2020-11-05 2022-04-22 南京大学 Novel congestion control method combining deep reinforcement learning and traditional congestion control
CN112822230B (en) * 2020-12-28 2022-03-25 南京大学 Method and system for setting initial rate of sending end based on probability
CN112714074B (en) * 2020-12-29 2023-03-31 西安交通大学 Intelligent TCP congestion control method, system, equipment and storage medium
CN112770353B (en) * 2020-12-30 2022-10-28 武汉大学 Method and device for training congestion control model and method and device for controlling congestion
CN113300970B (en) * 2021-01-22 2022-05-27 青岛大学 TCP congestion dynamic control method and device based on deep learning
CN113079104B (en) * 2021-03-22 2022-09-30 新华三技术有限公司 Network congestion control method, device and equipment
CN113315716B (en) * 2021-05-28 2023-05-02 北京达佳互联信息技术有限公司 Training method and equipment of congestion control model and congestion control method and equipment
CN116860300B (en) * 2023-09-01 2023-11-28 武汉理工大学 Multi-subnet multi-ECU-oriented vehicle-mounted OTA parallel upgrading method and system

Citations (5)

Publication number Priority date Publication date Assignee Title
CN104954279A (en) * 2014-03-28 2015-09-30 华为技术有限公司 Transmission control method, device and system
US20180176138A1 (en) * 2015-02-26 2018-06-21 Citrix Systems, Inc. System for bandwidth optimization with initial congestion window determination
US20180349803A1 (en) * 2014-01-06 2018-12-06 Rev Software, Inc. Dynamically optimized transport system
CN110278149A (en) * 2019-06-20 2019-09-24 南京大学 Multi-path transmission control protocol data packet dispatching method based on deeply study
CN110581808A (en) * 2019-08-22 2019-12-17 武汉大学 Congestion control method and system based on deep reinforcement learning


Cited By (15)

Publication number Priority date Publication date Assignee Title
CN113872873B (en) * 2021-09-29 2023-05-02 西安交通大学 Multi-scene cross-layer congestion control method suitable for 5G new application
CN113872873A (en) * 2021-09-29 2021-12-31 西安交通大学 Multi-scene cross-layer congestion control method suitable for 5G new application
CN114422453A (en) * 2021-11-30 2022-04-29 北京交通大学 Method, device and storage medium for online planning of time-sensitive streams
CN114422453B (en) * 2021-11-30 2023-10-24 北京交通大学 Method, device and storage medium for online planning of time-sensitive stream
CN114567597A (en) * 2022-02-21 2022-05-31 重庆邮电大学 Congestion control method and device based on deep reinforcement learning in Internet of things
CN114567597B (en) * 2022-02-21 2023-12-19 深圳市亦青藤电子科技有限公司 Congestion control method and device based on deep reinforcement learning in Internet of things
CN114866489A (en) * 2022-04-28 2022-08-05 清华大学 Congestion control method and device and training method and device of congestion control model
CN116016223B (en) * 2022-12-09 2024-02-02 国网湖北省电力有限公司信息通信公司 Data transmission optimization method for data center network
CN116016223A (en) * 2022-12-09 2023-04-25 国网湖北省电力有限公司信息通信公司 Data transmission optimization method for data center network
CN116915706A (en) * 2023-09-13 2023-10-20 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Data center network congestion control method, device, equipment and storage medium
CN116915706B (en) * 2023-09-13 2023-12-26 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Data center network congestion control method, device, equipment and storage medium
CN117374952A (en) * 2023-10-19 2024-01-09 河海大学 Power failure event driven recovery method, device, equipment and storage medium
CN117374952B (en) * 2023-10-19 2024-05-17 河海大学 Power failure event driven recovery method, device, equipment and storage medium
CN117395166A (en) * 2023-12-11 2024-01-12 四海良田(天津)智能科技有限公司 Intelligent agricultural management platform based on Internet of things
CN117395166B (en) * 2023-12-11 2024-02-13 四海良田(天津)智能科技有限公司 Intelligent agricultural management platform based on Internet of things

Also Published As

Publication number Publication date
CN111092823A (en) 2020-05-01
CN111092823B (en) 2021-03-26

Similar Documents

Publication Publication Date Title
WO2021128181A1 (en) Method and system for self-adaptively adjusting initial congestion control window
CN112668128B (en) Method and device for selecting terminal equipment nodes in federal learning system
Zhan et al. An incentive mechanism design for efficient edge learning by deep reinforcement learning approach
CN110832509B (en) Black box optimization using neural networks
CN110460880B (en) Industrial wireless streaming media self-adaptive transmission method based on particle swarm and neural network
US20210194733A1 (en) Channel Prediction Method and Related Device
US11784931B2 (en) Network burst load evacuation method for edge servers
CN113469325A (en) Layered federated learning method, computer equipment and storage medium for edge aggregation interval adaptive control
CN110942142B (en) Neural network training and face detection method, device, equipment and storage medium
US20220156574A1 (en) Methods and systems for remote training of a machine learning model
JP7315007B2 (en) LEARNING DEVICE, LEARNING METHOD AND LEARNING PROGRAM
Liu et al. Fedpa: An adaptively partial model aggregation strategy in federated learning
CN116743635B (en) Network prediction and regulation method and network regulation system
CN112511336A (en) Online service placement method in edge computing system
Ebrahim et al. A deep learning approach for task offloading in multi-UAV aided mobile edge computing
CN109375999A (en) A kind of MEC Random Task moving method based on Bayesian network
CN109379747A (en) The deployment of wireless network multi-controller and resource allocation methods and device
CN117202264A (en) 5G network slice oriented computing and unloading method in MEC environment
Tao et al. Content popularity prediction in fog-rans: A bayesian learning approach
CN117709415A (en) Quantum neural network model optimization method and device
Aloui et al. Maximizing the Domain of attraction of nonlinear systems: A PSO optimization approach
Wang et al. Imperfect Digital Twin Assisted Low Cost Reinforcement Training for Multi-UAV Networks
US20240007874A1 (en) Embedding neural networks as a matrix for network device in wireless network
Pritzkoleit et al. Reinforcement Learning and Trajectory Planning based on Model Approximation with Neural Networks applied to Transition Problems
Luu et al. Sample-Driven Federated Learning for Energy-Efficient and Real-Time IoT Sensing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19957074

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08/11/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19957074

Country of ref document: EP

Kind code of ref document: A1