CN109194583A

CN109194583A - Network congestion link diagnosis method and system based on deep reinforcement learning

Info

Publication number: CN109194583A
Application number: CN201810890267.0A
Authority: CN
Inventors: Pan Shengli (潘胜利); Zeng Deze (曾德泽)
Original assignee: China University of Geosciences
Current assignee: China University of Geosciences
Priority date: 2018-08-07
Filing date: 2018-08-07
Publication date: 2019-01-11
Anticipated expiration: 2038-08-07
Also published as: CN109194583B

Abstract

The invention discloses a network congestion link diagnosis method and system based on deep learning. The method uses DQN to combine reinforcement learning with deep learning, exploiting DQN's strength in handling high-dimensional states, and diagnoses congested links with Q-Learning, which constructs training labels from a state-action-reward strategy. In the reinforcement-learning part of DQN, a state is defined as a pair consisting of a link and the set of congestion states of all paths that traverse that link; an action is defined as guessing, from this path congestion-state set, whether the link is congested; and the reward is positive for a correct guess and negative for a wrong one. The deep-learning part of DQN uses a deep neural network such as a deep convolutional neural network. Through continuous iteration, DQN autonomously learns the association between congested network paths and congested network links, achieving accurate diagnosis of congested links with outstanding diagnostic performance.

Description

Network congestion link diagnosis method and system based on deep reinforcement learning

Technical field

The present invention relates to the field of network congestion link diagnosis, and more specifically to a network congestion link diagnosis method and system based on deep reinforcement learning.

Background art

Measurement techniques based on router cooperation initiate the measurement process at the network edge and obtain the parameters of interest from the internal nodes' feedback on probe data. Common tools include ping for diagnosing network connectivity, traceroute for obtaining the network topology, and pathchar for measuring performance parameters such as link bandwidth and delay. When internal nodes do not support cooperation, for example for security reasons, such methods fail. Moreover, these methods mostly use ICMP (Internet Control Message Protocol) messages as probe data, and because ICMP packets have low priority in real networks, the measured performance parameters may not accurately reflect the actual state of the network. End-to-end measurement obtains the end-to-end performance parameters of a network by sending and receiving data between nodes at the network edge. It needs only the basic store-and-forward function of routers and depends minimally on the network itself. Network tomography (NT) infers internal network parameters, such as link performance parameters and topology, from end-to-end measurement data. Because it can obtain internal performance parameters without the cooperation of internal nodes, it fits well with the non-cooperative, heterogeneous, edge-controlled nature of the current Internet. The present invention studies tomographic methods for network link performance parameters and uses deep learning to solve the problem of congested link diagnosis, obtaining the operating state of links in the network more accurately and quickly.

Summary of the invention

To solve the above technical problems, this application proposes a network congestion link diagnosis method and system based on deep reinforcement learning: after the end-to-end link states are measured with network tomography, congested links are located using a deep neural network combined with reinforcement learning.

The network congestion link diagnosis method based on deep learning adopted by the present invention to solve its technical problem comprises the following steps:

S1. Collect M sets of congestion state data for the physical links of the network to be diagnosed, obtaining M link congestion state directed acyclic graphs (DAGs) of the network as the sample pool, where M is an integer greater than 1;

S2. Perform decision state modeling on each link congestion state DAG, jointly generating the state set S;

S3. Using the neural network training method DQN, train the neural network with the state set S and the corresponding decision set A as the training data set; during training, each group of training data takes the state of one DAG as input and the corresponding decision as output;

S4. Using the same method as in step S2, perform decision state modeling on the network DAG to be diagnosed to generate the initial task state s₀, and feed s₀ into the neural network trained in step S3 to predict link congestion states.

Further, in the network congestion link diagnosis method based on deep learning of the invention, the objective function of DQN in step S3 is constructed as follows:

A1. Use a deep neural network as the Q-value network with parameters ω, and update ω so that the Q function approximates the optimal value: Q(s, a, ω) ≈ Q^π(s, a), where s denotes the state and a the decision;

A2. Use the mean-square error of the Q value as the objective function:

$$L(\omega) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a', \omega) - Q(s, a, \omega)\right)^2\right]$$

where s′ denotes the next state, a′ the next decision, E the expectation, r the reward, and γ the discount (decay) coefficient;

A3. Compute the gradient of the objective function with respect to the parameter ω;

A4. Use SGD (stochastic gradient descent) to optimize the objective end to end.
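To make A1-A4 concrete, the sketch below swaps the deep network for a linear approximator Q(s, a; w) = w_a · φ(s), an assumption made only so the gradient of A3 can be written out by hand. It evaluates the mean-squared loss of A2 over a batch, computes its gradient with the target r + γ max Q(s′, a′; w) held constant (the usual semi-gradient convention), and takes one SGD step as in A4. The `done` flag, which suppresses bootstrapping at absorbing states as in step C5, and all names here are illustrative.

```python
import numpy as np


def q_values(w, phi):
    """Linear stand-in for the Q network: w is (n_actions, n_features), phi is (batch, n_features)."""
    return phi @ w.T


def td_loss_and_grad(w, phi, a, r, phi_next, done, gamma=0.9):
    """A2/A3 sketch: mean-squared TD loss and its gradient w.r.t. w (target held fixed)."""
    batch = np.arange(len(a))
    q_sa = q_values(w, phi)[batch, a]                     # Q(s, a; w)
    q_next = q_values(w, phi_next).max(axis=1)            # max_a' Q(s', a'; w)
    target = r + gamma * q_next * (1.0 - done)            # no bootstrapping from absorbing states
    td_error = target - q_sa
    loss = np.mean(td_error ** 2)
    grad = np.zeros_like(w)
    for i, act in enumerate(a):
        # d/dw_a of (target - w_a . phi)^2 / B  =  -2 * td_error * phi / B
        grad[act] -= 2.0 * td_error[i] * phi[i] / len(a)
    return loss, grad


def sgd_step(w, grad, lr=0.1):
    """A4: one stochastic gradient descent step."""
    return w - lr * grad
```

With γ = 0 the target no longer depends on w, so the semi-gradient above is the exact gradient and a small step is guaranteed to lower the loss.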

Further, in the network congestion link diagnosis method based on deep learning of the invention, the main steps of DQN training include:

B1. Initialize the experience pool D with capacity N to store training samples;

B2. Initialize the neural network of the action-value function Q with random weight parameters θ;

B3. Initialize the neural network of the target action-value function Q̂, which has the same structure as Q, with weight parameters θ⁻ = θ;

B4. Set the total number of episodes M;

B5. Initialize the network input state s₀ and compute the network output;

B6. Using the state set S_next = {s₀} as the input set, recursively update the network parameters.

Further, in the network congestion link diagnosis method based on deep learning of the invention, the recursive network update of step B6 comprises:

B61. For each state in the input set, guess an action and update the network, obtaining the next state at the same time; if the next state is not an absorbing state, add it to the next-state set;

B62. If the next-state set is non-empty, use it as the input of the next round of recursive network updates and continue the recursion; otherwise, terminate.
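The recursion of B6, B61 and B62 amounts to sweeping over successive next-state sets until one comes back empty. The sketch below writes it as an equivalent loop; the callback `guess_and_update(state)`, which is assumed to perform the action guess and network update of B61 and return the resulting next states, and the marker `ABSORBING` are hypothetical names, not taken from the patent.

```python
from typing import Any, Callable, Iterable, List

ABSORBING = "ABSORBING"  # hypothetical marker for the absorbing state


def recursive_update(initial_states: Iterable[Any],
                     guess_and_update: Callable[[Any], Iterable[Any]]) -> int:
    """B6 sketch: apply B61 to every state of the input set, repeat while B62 finds it non-empty.

    Returns the number of recursion rounds that were run.
    """
    frontier: List[Any] = list(initial_states)   # S_next = {s0} at the start
    rounds = 0
    while frontier:                              # B62: stop once the next-state set is empty
        next_set: List[Any] = []
        for state in frontier:                   # B61: guess an action and update for each state
            for nxt in guess_and_update(state):
                if nxt != ABSORBING:             # keep only non-absorbing next states
                    next_set.append(nxt)
        frontier = next_set
        rounds += 1
    return rounds
```

Because every state in a round is processed before the next round starts, sibling states (such as the two states of s₂ in the Fig. 6 example) are handled side by side, matching the description of states being learned in parallel.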

Further, right in step B61 in the network congestion Diagnosis of Links method of the invention based on deep learning Each state carries out movement conjecture execution network and updates and include: the step of obtaining the set of NextState

C1, it uses ε-greedy strategy to carry out movement selection: randomly choosing a movement from set of actions A with probability ε As a_t, otherwise current state is input to the Q value for calculating each movement in current network with a CNN, select Q It is worth a maximum movement as a_t；

C2, a is executed_t, obtain executing a_tFeedback r afterwards_tWith NextState s_t+1；

C3, by four parameter (s_t,a_t,r_t,s_t+1) be deposited into D together as state this moment, when storing N number of in D The state at quarter；

C4, minibatch state parameter group (s is taken out from D at random_j,a_j,r_j,s_j+1)；

C5, the target value for calculating each state, specifically by execution a_tReward afterwards updates Q value as target Value: if NextState is absorbing state, y_j=r_j, otherwise

C6, pass through SGD undated parameter θ；

Target action-value function network is updated after C7, every C iterationParameter θ^-It is current The parameter θ of the network Q of action-value function, C are the positive integer greater than 1.
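Steps B1-B3 and C1-C7 follow the standard DQN update loop. The sketch below is a minimal rendering of that loop under stated assumptions: the CNN of C1 is replaced by a linear Q function over a feature vector, the class name `LinearDQN` is hypothetical, and ε, γ, the learning rate, the capacity N (`capacity`), the minibatch size and the sync interval C (`sync_every`) are illustrative defaults rather than values from the patent.

```python
import numpy as np
from collections import deque


class LinearDQN:
    """Minimal DQN sketch: linear Q(s, a; theta) = theta[a] . s plus a target copy theta_minus."""

    def __init__(self, n_features, n_actions=2, capacity=1000, batch_size=8,
                 gamma=0.9, lr=0.05, epsilon=0.1, sync_every=20, seed=0):
        self.rng = np.random.default_rng(seed)
        # B2: action-value parameters theta start random
        self.theta = self.rng.normal(scale=0.1, size=(n_actions, n_features))
        # B3: target network has the same shape, theta_minus = theta
        self.theta_minus = self.theta.copy()
        # B1: experience pool D with capacity N (the oldest transitions are dropped)
        self.replay = deque(maxlen=capacity)
        self.n_actions, self.batch_size = n_actions, batch_size
        self.gamma, self.lr, self.epsilon = gamma, lr, epsilon
        self.sync_every, self.updates = sync_every, 0

    def select_action(self, s):
        """C1: with probability epsilon pick a random action, otherwise the argmax of Q(s, .)."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_actions))
        return int(np.argmax(self.theta @ s))

    def store(self, s, a, r, s_next, absorbing):
        """C3: store the transition (s_t, a_t, r_t, s_{t+1}) in D."""
        self.replay.append((s, a, r, s_next, absorbing))

    def learn(self):
        """C4-C7: sample a minibatch, build targets y_j, take one SGD step, sync the target."""
        if len(self.replay) < self.batch_size:
            return
        idx = self.rng.choice(len(self.replay), size=self.batch_size, replace=False)  # C4
        grad = np.zeros_like(self.theta)
        for i in idx:
            s, a, r, s_next, absorbing = self.replay[int(i)]
            # C5: y_j = r_j at an absorbing state, else r_j + gamma * max_a' Q_hat(s_{j+1}, a'; theta^-)
            y = r if absorbing else r + self.gamma * np.max(self.theta_minus @ s_next)
            td_error = y - self.theta[a] @ s
            grad[a] -= 2.0 * td_error * s / self.batch_size   # gradient of (y_j - Q(s_j, a_j))^2
        self.theta -= self.lr * grad                          # C6: SGD step on theta
        self.updates += 1
        if self.updates % self.sync_every == 0:               # C7: every C updates, theta^- <- theta
            self.theta_minus = self.theta.copy()
```

In a toy one-step setting where the state is [1, x] with x the true link state, each transition is absorbing and the reward is 1 for a correct guess and −2 otherwise, the greedy policy learns to output a = x.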

Further, in the network congestion link diagnosis method based on deep learning of the invention, the link congestion state network is represented by a directed acyclic graph G = (V, E), where V = {0, 1, 2, ..., k, ..., m} is the set of network nodes and E = {l₁, l₂, ..., l_k, ..., l_m} is the set of links, with l_k denoting the link whose end node is k. The set of all paths in the network is defined as P = {p₁, p₂, ..., p_i, ..., p_n}, and the corresponding set of observed path congestion states is Y = {y₁, y₂, ..., y_i, ..., y_n}, where y_i is the congestion state of the i-th path p_i: y_i = 1 means that p_i is congested, and y_i = 0 means that p_i is normal. φ_k denotes the set of paths traversing link l_k, and Y_k is the set of observed congestion states of the paths in φ_k. The set of link states in the network is X = {x₁, x₂, ..., x_k, ..., x_m};

A state s is defined as a pair consisting of a link and the set of congestion states of the paths traversing that link, i.e., s = s_k = (l_k, Y_k), and the state set is S = {s₁, s₂, s₃, ..., s_k, ..., s_m}. In state s = s_k, the action set is A = {0, 1}: a = 0 means guessing that link l_k is normal, i.e., x̂_k = 0, and a = 1 means guessing that l_k is congested, i.e., x̂_k = 1. When the true link congestion state matches the guessed one, i.e., when x_k = x̂_k, a reward is given; otherwise, a penalty is given.
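The state definition can be made concrete with a few lines of code. The sketch below derives φ_k, Y_k and s_k = (l_k, Y_k) from a routing table (path to list of links) and the observed path states y_i, and scores a guess with the reward values R = 1 and R = −2 that the embodiment gives later. The dictionary-based topology and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

State = Tuple[str, Tuple[int, ...]]  # s_k = (l_k, Y_k)


def build_states(paths: Dict[str, List[str]], y: Dict[str, int]) -> Dict[str, State]:
    """Build s_k = (l_k, Y_k) for every link from routes (path -> links) and observed y_i."""
    links = sorted({link for route in paths.values() for link in route})
    states: Dict[str, State] = {}
    for link in links:
        phi_k = [p for p in sorted(paths) if link in paths[p]]   # paths traversing l_k
        states[link] = (link, tuple(y[p] for p in phi_k))        # Y_k, in path order
    return states


def reward(true_link_state: int, action: int) -> int:
    """a = 0 guesses 'normal', a = 1 guesses 'congested'; +1 if the guess is right, else -2."""
    return 1 if action == true_link_state else -2
```

For example, with paths p1 = (l1, l2) and p2 = (l1, l3), where only p1 is congested, the state of l1 is (l1, (1, 0)) and the state of l2 is (l2, (1,)).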

Further, in the network congestion link diagnosis method based on deep learning of the invention, the method as a whole has a state set S, an action set A, and a policy π that selects the next action a = π(s) according to the current state. Each state s in the state set has a corresponding return value R(s). With a discount coefficient γ applied to successive states in the state sequence, each policy π is assigned a value function $V^\pi(s_0) = \mathbb{E}[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0 = s, \pi] = \mathbb{E}[R(s_0) + \gamma V^\pi(s_1)]$.

To solve its technical problem, the present invention also provides a network congestion link diagnosis system based on deep learning, which diagnoses congested network links using any of the network congestion link diagnosis methods based on deep learning described above.

Beneficial effects of the present invention: the invention studies tomographic methods for network link performance parameters and combines deep learning with reinforcement learning through Deep Q-Learning, obtaining the operating state of links in the network more accurately and quickly. DQN is used for network training, and because a state transition in this problem can yield multiple next states, a recursive update scheme is proposed to handle them. Experiments show that this scheme achieves higher inference accuracy and robustness than the SCFS (Smallest Consistent Failure Set) algorithm.

Brief description of the drawings

The present invention is further explained below with reference to the accompanying drawings and embodiments. In the drawings:

Fig. 1 is a flowchart of an embodiment of the network congestion link diagnosis method based on deep learning of the invention;

Fig. 2 is a schematic diagram of the neural network training of the invention;

Fig. 3 is a flowchart of DQN training in the invention;

Fig. 4 is a flowchart of the recursive update in DQN training in the invention;

Fig. 5 is a flowchart of action guessing and network updating in DQN training in the invention;

Fig. 6 is a schematic diagram of state transitions when the guess is 0;

Fig. 7 is a schematic diagram of state transitions when the guess is 1;

Fig. 8 shows the relationship between the number of training cycles and the degree of difference between the value network and the target network in the invention;

Fig. 9 compares the detection rate (DR) of the invention and the SCFS algorithm;

Fig. 10 compares the false positive rate (FPR) of the invention and the SCFS algorithm.

Detailed description of the embodiments

For a clearer understanding of the technical features, objects and effects of the present invention, specific embodiments of the invention are now described in detail with reference to the accompanying drawings.

The invention discloses a network congestion link diagnosis method and system based on Deep Q-Learning (DQN). In essence, DQN combines reinforcement learning with deep learning, exploiting its strength in handling high-dimensional states, and congested links are diagnosed with Q-Learning, which constructs training labels from a "state-action-reward" strategy. In the reinforcement-learning part of DQN, a "state" is defined as a pair consisting of a link and the set of congestion states of all paths traversing that link; an "action" is defined as guessing, from the path congestion-state set, whether the link is congested; and the "reward" is positive for a correct guess and negative for a wrong one. The deep-learning part of DQN uses a deep neural network such as a deep convolutional neural network. Through continuous iteration, DQN autonomously learns the association between congested network paths and congested network links, achieving accurate diagnosis of congested links. Simulation experiments under multiple network congestion scenarios show that the DQN method of the invention achieves better congested link diagnosis performance than the traditional SCFS method.

Referring to Fig. 1, the network congestion link diagnosis method based on deep learning used in this embodiment comprises the following steps:

S1. Collect M sets of congestion state data for the physical links of the network to be diagnosed, obtaining M link congestion state DAGs of the network as the sample pool, where M is a positive integer greater than 1;

S2. Perform decision state modeling on each link congestion state DAG, jointly generating the state set S;

S3. Using the neural network training method DQN, train the neural network with the state set S generated from the M link congestion state DAGs and the corresponding decision set A as input; during training, each group of training data takes the state of one DAG as input and the corresponding decision as output;

S4. Using the same method as in step S2, perform decision state modeling on the network DAG to be diagnosed to generate the initial task state s₀, and feed s₀ into the neural network obtained by DQN training to predict link congestion states.

Referring to Fig. 2, the DQN objective function is constructed as follows:

A1. Use a deep neural network as the Q-value network with parameters ω, and update ω so that the Q function approximates the optimal value: Q(s, a, ω) ≈ Q^π(s, a), where s denotes the state and a the decision (action);

A2. Use the mean-square error of the Q value as the objective function, i.e., the loss function:

$$L(\omega) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a', \omega) - Q(s, a, \omega)\right)^2\right]$$

where L(ω) is the objective function, s′ denotes the next state, a′ the next decision, E the expectation, r the reward, and γ the discount (decay) coefficient;

A3. Compute the gradient of the loss function with respect to the parameter ω;

A4. Use SGD to optimize the objective end to end;

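Referring to Fig. 3, the main steps of DQN training include:

B1. Initialize the experience pool D with capacity N to store training samples;

B2. Initialize the neural network of the action-value function Q with random weight parameters θ;

B3. Initialize the neural network of the target action-value function Q̂, which has the same structure as Q, with weight parameters θ⁻ = θ;

B4. Set the total number of episodes M;

B5. Initialize the network input state s₀ and compute the network output;

B6. Using the state set S_next = {s₀} as the input set, recursively update the network parameters;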

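Step B6, recursively updating the network, comprises:

B61. For each state in the input set, guess an action and update the network, obtaining the next state at the same time; if the next state is not an absorbing state, add it to the next-state set;

B62. If the next-state set is non-empty, use it as the input of the next round of recursive network updates and continue the recursion; otherwise, terminate;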

The detailed steps are shown in Fig. 4. As shown in Fig. 5, the step in B61 of guessing an action and updating the network for each state and obtaining the set of next states comprises:

C1. Select an action with the ε-greedy strategy: with a (very small) probability ε, randomly select an action from the action set A as a_t; otherwise, input the current state into the current network, compute the Q value of each action (with a CNN), and select the action with the largest Q value (the optimal action) as a_t;

C2. Execute a_t, and obtain the feedback r_t and the next state s_{t+1};

C3. Store the four parameters (s_t, a_t, r_t, s_{t+1}) together in D as the transition for this time step (D holds the transitions of N time steps);

C4. Randomly sample a minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D;

C5. Compute the target value of each transition (using the reward obtained after executing the action to update the Q value as the target): if the next state is an absorbing state, $y_j = r_j$; otherwise, $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^-)$;

C6. Update the parameters θ by SGD;

C7. Every C iterations, update the target action-value network Q̂ by setting its parameters θ⁻ to the parameters θ of the current action-value network Q.

The network congestion link diagnosis method based on deep reinforcement learning of the invention mainly involves the Deep Q-Network (DQN) and congested link diagnosis. Deep Q-Learning comprises reinforcement learning and deep neural networks; congested link diagnosis includes obtaining end-to-end link congestion states with network tomography.

Network variable definitions:

The network is represented by a directed acyclic graph G = (V, E), where V = {0, 1, 2, ..., k, ..., m} is the node set and E = {l₁, l₂, ..., l_k, ..., l_m} is the link set, with l_k denoting the link whose end node is k. The set of all paths in the network is defined as P = {p₁, p₂, ..., p_i, ..., p_n}, and the corresponding set of observed path congestion states is Y = {y₁, y₂, ..., y_i, ..., y_n}, where y_i is the congestion state of the i-th path p_i: y_i = 1 means that p_i is congested, and y_i = 0 means that p_i is normal. φ_k denotes the set of paths traversing link l_k, and Y_k is the set of observed congestion states of the paths in φ_k. The set of link states in the network is X = {x₁, x₂, ..., x_k, ..., x_m}.

Variables used in DQN:

A state s is defined as a pair consisting of a link and the set of congestion states of the paths traversing that link, i.e., s = s_k = (l_k, Y_k), and the state set is S = {s₁, s₂, s₃, ..., s_k, ..., s_m}. In state s = s_k, the available action set is A = {0, 1}: a = 0 means guessing that link l_k is normal, i.e., x̂_k = 0, and a = 1 means guessing that l_k is congested, i.e., x̂_k = 1. When the true link congestion state matches the guessed one, i.e., when x_k = x̂_k, the reward is R(s, a) = 1; otherwise, the penalty is R(s, a) = −2.

DQN diagnosis schematic:

Referring to Fig. 6, the initial state is s₁ = (l₁, [1,1,1,1,1]); when the guess is 0, the state transitions;

The states in s₂, (l₂, [1,1]) and (l₅, [1,1,1]), are learned in parallel and transition to their next states;

The states in s₃, (l₆, [1,1]) and (l₉, [1]), transition to the absorbing state when the guess is 1, and the process terminates.

Referring to Fig. 7, for s₁ = (l₁, [1,1,1,1,1]), when the guess is 1, the process transitions directly to the absorbing state E and terminates;
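Figs. 6 and 7 suggest a transition rule: guessing 1 (congested) for a link explains the congestion on all of its paths, so the episode ends in the absorbing state E, while guessing 0 (normal) passes the congestion down to child links whose paths are still congested, and those states are then learned in parallel. The sketch below encodes that reading. The child relation, the rule of keeping only children with at least one congested path, and treating a leaf guessed "normal" as terminal are assumptions inferred from the two examples; the tree used with it (l5 having children l6 and l9) is hypothetical, chosen only to reproduce the states listed for Fig. 6.

```python
from typing import Dict, List, Tuple, Union

ABSORBING = "E"  # absorbing state E, as labelled in Fig. 7
State = Tuple[str, Tuple[int, ...]]  # s_k = (l_k, Y_k)


def next_states(state: State, guess: int,
                children: Dict[str, List[str]],
                y_of: Dict[str, Tuple[int, ...]]) -> List[Union[State, str]]:
    """Assumed transition rule read off Figs. 6 and 7 (hypothetical, not the patent's code)."""
    link, _ = state
    if guess == 1:
        # Blaming l_k explains every congested path through it: the episode ends.
        return [ABSORBING]
    # Guessing "normal" hands the congestion to child links that still carry a congested path.
    succ = [(c, y_of[c]) for c in children.get(link, []) if any(y_of[c])]
    # A leaf guessed "normal" leaves nothing to blame; treat it as terminal as well.
    return succ if succ else [ABSORBING]
```

With children {l1: [l2, l5], l5: [l6, l9]}, guessing 0 at s₁ yields (l₂, [1,1]) and (l₅, [1,1,1]), and guessing 0 at l₅ yields (l₆, [1,1]) and (l₉, [1]), as in Fig. 6.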

In the network congestion link diagnosis method based on deep learning of the invention, the agent starts from an initial state and continually optimizes its policy. The system as a whole has a state set S, an action set A, and a policy π that selects the next action a = π(s) according to the current state; each state s in the state set has a corresponding return value R(s). With a discount coefficient γ applied to successive states in the state sequence, each policy π is assigned a value function $V^\pi(s_0) = \mathbb{E}[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0 = s, \pi]$, which satisfies the Bellman equation and can therefore be written as $V^\pi(s_0) = \mathbb{E}[R(s_0) + \gamma V^\pi(s_1)]$.
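For a fixed policy and a single deterministic trajectory, the discounted value and its Bellman form can be checked numerically. The sketch below computes V^π(s_t) backwards along a hypothetical reward sequence, which is exactly the recursion V(s_t) = R(s_t) + γV(s_{t+1}); the rewards (reusing the +1/−2 values of the embodiment) and γ = 0.5 are illustrative.

```python
from typing import List


def discounted_values(rewards: List[float], gamma: float) -> List[float]:
    """V(s_t) = R(s_t) + gamma * V(s_{t+1}) along one deterministic trajectory; V = 0 after the end."""
    values = [0.0] * len(rewards)
    future = 0.0
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future
        values[t] = future
    return values
```

For rewards [1, −2, 1] and γ = 0.5, this gives V(s₀) = 0.25, which equals the direct sum 1 + 0.5·(−2) + 0.25·1 and also R(s₀) + γV(s₁) = 1 + 0.5·(−1.5).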

A deep neural network (DNN) is a neural network built by stacking multiple layers, each made up of nodes. Computation takes place in the nodes, which work roughly like human neurons: they activate and emit a signal when they receive sufficient stimulus. A node combines its input data with a set of coefficients (weights) that amplify or suppress the inputs, thereby assigning each input its importance for the learning task. The sum of the products of inputs and weights is passed to the node's activation function, which determines whether and how far the signal propagates through the network, and thus how it affects the network's final result, such as a classification. Deep networks differ from the more common single-hidden-layer networks in their depth, i.e., the number of node layers the data passes through in the multistep process of pattern recognition. Systems with three or more layers (counting the input and output layers) can be called "deep" learning. Depth is therefore a strictly defined term meaning more than one hidden layer.

In a deep network, each layer of nodes learns to recognize a particular set of features from the output of the preceding layer. As the network gets deeper, the features that nodes can recognize become increasingly complex, because each layer aggregates and recombines the features of the previous layer. The form and size of deep neural networks vary with the application, and popular architectures are evolving rapidly to improve model accuracy and efficiency. Networks that process inputs take two main forms: feedforward and recurrent. In a feedforward network, such as a CNN, all computation is a series of operations on the output of the previous layer. A recurrent network, such as an LSTM, has internal memory that allows long-term dependencies to influence the output.
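The node computation described above (a weighted sum of inputs plus a bias, passed through an activation) stacks directly into a feedforward network. The sketch below is a small fully connected network with two hidden ReLU layers that maps a fixed-length input vector to one Q value per action. The patent uses a CNN, so the dense layers, the layer sizes, and the idea of padding Y_k to a fixed length are assumptions made for illustration.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Random weights and zero biases for a fully connected network with the given layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Each hidden node: ReLU(sum of inputs * weights + bias); the output layer stays linear."""
    h = x
    for i, (w, b) in enumerate(params):
        z = h @ w + b                                            # weighted sum at every node
        h = np.maximum(z, 0.0) if i < len(params) - 1 else z     # activation on hidden layers only
    return h
```

With layer sizes [8, 16, 16, 2] the network has an input layer, two hidden layers and an output layer, so it is "deep" in the sense defined above; the two outputs play the role of Q(s, 0) and Q(s, 1).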

According to the Q-Learning update rule $Q(s, a) \leftarrow Q(s, a) + \alpha\left(r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right)$, the DQN loss function is $L(\theta) = \mathbb{E}\left[(\mathrm{TargetQ} - Q(s, a; \theta))^2\right]$, where θ denotes the network parameters and the target is $\mathrm{TargetQ} = r + \gamma \max_{a'} Q(s', a'; \theta^-)$, computed by the target network. Experience replay: the experience pool mainly addresses the problems of sample correlation and non-stationary distribution. Concretely, the transition sample (s_t, a_t, r_t, s_{t+1}) obtained from the agent's interaction with the environment at each time step is stored in a replay memory, and minibatches are drawn from it at random for training. The target network generates the TargetQ value: Q(s, a; θ_i) denotes the output of the current network MainNet and evaluates the value of the current state-action pair, while Q(s, a; θ_i⁻) denotes the output of TargetNet, which is substituted into the TargetQ formula above to obtain the target Q value. The parameters of MainNet are updated according to the loss function above, and every N iterations the parameters of MainNet are copied to TargetNet.
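The Q-Learning update rule can be checked on a single transition. The sketch below applies it to a hypothetical tabular Q stored in a dictionary keyed by (state, action); α = 0.5, γ = 0.9 and the stored values are illustrative, and the `absorbing` flag (no bootstrapping from a terminal state, as in step C5) is an addition not present in the formula itself.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9, absorbing=False):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = 0.0 if absorbing else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]
```

For instance, with Q(s′, 0) = 2, Q(s′, 1) = 4, Q(s, 1) = 0 and r = 1, the update gives 0 + 0.5·(1 + 0.9·4 − 0) = 2.3. DQN replaces the table with a network and turns the bracketed difference into the squared loss above.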

Network tomography treats the target network as a black box. Generally, all measurements are performed around the end nodes of the network, and this measurement strategy is called end-to-end measurement. Besides passively monitoring data transmissions between pairs of end nodes, it more often actively sends probe messages between end nodes. Depending on the routing protocol used, end-to-end measurement falls mainly into two types: multicast measurement and unicast measurement. For security and other reasons, routers in the current Internet support unicast better than multicast. Based on multi-slot measurements, Nguyen et al. proved that, under the Boolean network tomography framework, the prior congestion probabilities of links can be uniquely determined from the data of multiple measurement time slots, and proposed a matrix-inversion-based method, CLINK (Congested LINK identification), to solve for them; combining the links' prior congestion probabilities with the end-to-end data of subsequent measurement time slots then yields a maximum a posteriori (MAP) estimate of the link states. Compared with the SCFS algorithm, this method is more accurate and, especially when the network contains many congested links, achieves a higher detection rate. Ghita et al. of EPFL (École Polytechnique Fédérale de Lausanne) generalized the above method to more general scenarios: they identified and proved the necessary and sufficient conditions under which the links' prior congestion probabilities are identifiable when the states of some links in the network are not mutually independent, and proposed a scheme that obtains the links' prior congestion probabilities by measuring only a small number of end-to-end paths.
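Boolean network tomography is commonly formulated so that a path is congested if and only if at least one of its links is congested. The sketch below uses that model to generate path observations Y from link states drawn with per-link prior congestion probabilities, one plausible way to produce the congestion-state samples of step S1 in simulation. The routing matrix, the priors and the function name are hypothetical.

```python
import numpy as np


def simulate_paths(routing, priors, n_samples, seed=0):
    """routing: (n_paths, n_links) 0/1 matrix; priors: per-link congestion probability.

    Returns (X, Y): link states X[t, k] and path states Y[t, i] = OR of X[t, k] over links on path i.
    """
    rng = np.random.default_rng(seed)
    routing = np.asarray(routing, dtype=int)
    x = (rng.random((n_samples, routing.shape[1])) < np.asarray(priors)).astype(int)
    y = (x @ routing.T > 0).astype(int)   # a path is congested iff it crosses a congested link
    return x, y
```

Each row of Y is what an end-to-end measurement would observe in one time slot, while X is the ground truth that the diagnosis tries to recover.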

The method is tested on a simulated network containing 15 paths, with the prior congestion probabilities of the links generated at random; the simulation is repeated 100 times in total. Congested links are diagnosed with the method proposed in this patent, and the relationship between the number of training cycles and the degree of difference between the value network and the target network is shown in Fig. 8, where the horizontal axis is the number of training cycles and the vertical axis is the difference between the value network and the target network. As the number of training cycles increases, the difference between the value network and the target network decreases. The SCFS algorithm is compared with the proposed algorithm, giving line charts of the detection rate (DR) and the false positive rate (FPR) versus the congestion probability ρ, shown in Fig. 9 and Fig. 10 respectively. The detection rate is the proportion of all positive samples that are detected as positive, and the false positive rate is the proportion of samples detected as positive that are actually negative. Fig. 9 shows that the detection rate of the SCFS algorithm decreases as the congestion probability increases, whereas the detection rate of the proposed method is less affected by the congestion probability and remains high throughout, clearly outperforming SCFS. Fig. 10 shows that when the congestion probability is at most 0.7, the false positive rate of the proposed method is only slightly higher than that of SCFS. Overall, the proposed method performs better.
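Both metrics follow directly from the definitions above. Note that the false positive rate as defined here (the share of links flagged as congested that are actually normal) corresponds to what is often called the false discovery rate; the sketch computes it exactly as the text defines it. The example vectors are hypothetical, and returning 1.0 or 0.0 when a denominator is empty is an assumed convention.

```python
import numpy as np


def detection_rate(true_links, predicted_links):
    """DR: fraction of truly congested links that were detected as congested."""
    t = np.asarray(true_links, dtype=bool)
    p = np.asarray(predicted_links, dtype=bool)
    return (t & p).sum() / t.sum() if t.sum() else 1.0


def false_positive_rate(true_links, predicted_links):
    """FPR as defined in the text: fraction of links flagged as congested that are actually normal."""
    t = np.asarray(true_links, dtype=bool)
    p = np.asarray(predicted_links, dtype=bool)
    return (p & ~t).sum() / p.sum() if p.sum() else 0.0
```

For true states [1, 1, 0, 0, 1] and predictions [1, 0, 1, 0, 1], two of the three congested links are found (DR = 2/3) and one of the three flagged links is actually normal (FPR = 1/3).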

The embodiments of the present invention have been described above with reference to the drawings, but the invention is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the invention and the scope protected by the claims, and all of these fall within the protection of the present invention.

Claims

1. A network congestion link diagnosis method based on deep learning, characterized by comprising the following steps:

S1. Collect M sets of congestion state data for the physical links of the network to be diagnosed, obtaining M link congestion state DAGs of the network as the sample pool, where M is an integer greater than 1;

S2. Perform decision state modeling on each link congestion state DAG, jointly generating the state set S;

S3. Using the neural network training method DQN, train the neural network with the state set S and the corresponding decision set A as the training data set; during training, each group of training data takes the state of one DAG as input and the corresponding decision as output;

S4. Using the same method as in step S2, perform decision state modeling on the network DAG to be diagnosed to generate the initial task state s₀, and feed s₀ into the neural network trained in step S3 to predict link congestion states.

2. The network congestion link diagnosis method based on deep learning according to claim 1, characterized in that the objective function of DQN in step S3 is constructed as follows:

A1. Use a deep neural network as the Q-value network with parameters ω, and update ω so that the Q function approximates the optimal value: Q(s, a, ω) ≈ Q^π(s, a), where s denotes the state, a the decision, and π the policy;

A2. Use the mean-square error of the Q value as the objective function:

$$L(\omega) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a', \omega) - Q(s, a, \omega)\right)^2\right]$$

where s′ denotes the next state, a′ the next decision, E the expectation, r the reward, and γ the discount (decay) coefficient;

A3. Compute the gradient of the objective function with respect to the parameter ω;

A4. Use SGD to optimize the objective end to end.

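3. The network congestion link diagnosis method based on deep learning according to claim 2, characterized in that

the main steps of DQN training include:

B1. Initialize the experience pool D with capacity N to store training samples;

B2. Initialize the neural network of the action-value function Q with random weight parameters θ;

B3. Initialize the neural network of the target action-value function Q̂, which has the same structure as Q, with weight parameters θ⁻ = θ;

B4. Set the total number of episodes M;

B5. Initialize the network input state s₀ and compute the network output;

B6. Using the state set S_next = {s₀} as the input set, recursively update the network parameters.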

4. The network congestion link diagnosis method based on deep learning according to claim 3, characterized in that the recursive network update of step B6 comprises:

B61. For each state in the input set, guess an action and update the network, obtaining the next state at the same time; if the next state is not an absorbing state, add it to the next-state set;

B62. If the next-state set is non-empty, use it as the input of the next round of recursive network updates and continue the recursion; otherwise, terminate.

5. The network congestion link diagnosis method based on deep learning according to claim 4, characterized in that the step in B61 of guessing an action and updating the network for each state and obtaining the set of next states comprises:

C1. Select an action with the ε-greedy strategy: with probability ε, randomly select an action from the action set A as a_t; otherwise, input the current state into the current network, compute the Q value of each action with a CNN, and select the action with the largest Q value as a_t;

C2. Execute a_t, and obtain the feedback r_t and the next state s_{t+1};

C3. Store the four parameters (s_t, a_t, r_t, s_{t+1}) together in D as the transition for this time step; D holds the transitions of N time steps;

C4. Randomly sample a minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D;

C5. Compute the target value for each transition, using the reward obtained after executing the action to update the Q value as the target: if the next state is an absorbing state, $y_j = r_j$; otherwise, $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^-)$;

C6. Update the parameters θ by SGD;

C7. Every C iterations, update the target action-value network Q̂ by setting its parameters θ⁻ to the parameters θ of the current action-value network Q, where C is a positive integer greater than 1.

6. The network congestion link diagnosis method based on deep learning according to claim 1, characterized in that the link congestion state network is represented by a directed acyclic graph G = (V, E), where V = {0, 1, 2, ..., k, ..., m} is the set of network nodes and E = {l₁, l₂, ..., l_k, ..., l_m} is the set of links, with l_k denoting the link whose end node is k; the set of all paths in the network is defined as P = {p₁, p₂, ..., p_i, ..., p_n}, and the corresponding set of observed path congestion states is Y = {y₁, y₂, ..., y_i, ..., y_n}, where y_i is the congestion state of the i-th path p_i: y_i = 1 means that p_i is congested, and y_i = 0 means that p_i is normal; φ_k denotes the set of paths traversing link l_k, and Y_k is the set of observed congestion states of the paths in φ_k; the set of link states in the network is X = {x₁, x₂, ..., x_k, ..., x_m};

a state s is defined as a pair consisting of a link and the set of congestion states of the paths traversing that link, i.e., s = s_k = (l_k, Y_k), and the state set is S = {s₁, s₂, s₃, ..., s_k, ..., s_m}; in state s = s_k, the action set is A = {0, 1}, where a = 0 means guessing that link l_k is normal, i.e., x̂_k = 0, and a = 1 means guessing that l_k is congested, i.e., x̂_k = 1; when the true link congestion state matches the guessed one, i.e., when x_k = x̂_k, a reward is given; otherwise, a penalty is given.

7. The network congestion link diagnosis method based on deep learning according to claim 1, characterized in that the method as a whole has a state set S, an action set A and a policy π, the policy selecting the next action a = π(s) according to the current state, and each state s in the state set having a corresponding return value R(s); with a discount coefficient γ applied to successive states in the state sequence, each policy π is assigned a value function $V^\pi(s_0) = \mathbb{E}[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0 = s, \pi] = \mathbb{E}[R(s_0) + \gamma V^\pi(s_1)]$.

8. A network congestion link diagnosis system based on deep learning, characterized in that it diagnoses congested network links using the network congestion link diagnosis method based on deep learning according to any one of claims 1 to 7.