CN113596138B - Heterogeneous information center network cache allocation method based on deep reinforcement learning - Google Patents

Heterogeneous information center network cache allocation method based on deep reinforcement learning

Info

Publication number
CN113596138B
CN113596138B · CN202110843043A · CN113596138A
Authority
CN
China
Prior art keywords
content
network
heterogeneous
cache
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110843043.6A
Other languages
Chinese (zh)
Other versions
CN113596138A (en)
Inventor
马连博
周萍
王兴伟
黄敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN202110843043.6A priority Critical patent/CN113596138B/en
Publication of CN113596138A publication Critical patent/CN113596138A/en
Application granted granted Critical
Publication of CN113596138B publication Critical patent/CN113596138B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/50: Network services
    • H04L67/56: Provisioning of proxy services
    • H04L67/568: Storing data temporarily at an intermediate stage, e.g. caching
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a heterogeneous information-centric network (ICN) cache allocation method based on deep reinforcement learning, relating to the technical field of network cache space allocation. The method specifically comprises the following steps: abstracting a heterogeneous ICN into a topological model; defining dynamically changing content requests in the heterogeneous ICN; converting the cache space allocation problem of the heterogeneous ICN into a network performance optimization problem and constructing a network performance optimization model, comprising an optimization objective function and corresponding constraints; applying a Q learning algorithm to each content request to obtain the cache allocation scheme with optimal network performance corresponding to the content request at each moment; and combining a deep neural network with the Q learning algorithm, using the per-moment optimal cache allocation schemes solved by Q learning to train an optimal cache allocation scheme adapted to the dynamically changing content requests of the heterogeneous ICN. The method can adaptively solve the cache allocation scheme with optimal network performance and is better suited to dynamically changing network requests.

Description

Heterogeneous information center network cache allocation method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of heterogeneous information center networks, in particular to a cache allocation method of a heterogeneous information center network based on deep reinforcement learning.
Background
With the development of internet technology, both the number of network users and the volume of requests for network content keep growing. Information-Centric Networking (ICN) is a new type of network architecture that caches content provided by servers on routers in order to serve users. The outstanding advantage of ICN over traditional network architectures is in-network caching: each router can store content. Because the content routers in an ICN cache different contents from the server, a user's request is answered by the router storing the requested content, which avoids the overhead of long-distance transmission from client to server and greatly improves response speed. For network caching in ICNs, cache allocation (the assignment of cache capacity to each content router) is the basis for caching content. In a heterogeneous ICN, each content router may be allocated a different cache capacity, which makes allocation more complex than in a homogeneous ICN. In addition, since configuring cache space on a content router is expensive and consumes energy, allocating too much cache space to a router causes unnecessary waste, while allocating too little fails to meet users' request demands and degrades user experience and network performance. Therefore, allocating the appropriate cache space to each content router is important for optimizing heterogeneous ICN performance.
Cache allocation for heterogeneous ICNs mainly considers two aspects: first, the centrality of a router in the network topology: the higher the centrality, the more important the node is in the topological structure, and the more cache capacity it needs to be allocated; second, the request frequency of a node: more frequently requested nodes need more cache space. Existing cache allocation methods for heterogeneous ICNs fall into two types: one performs cache allocation based on the importance of nodes in the network topology; the other converts the cache allocation problem into a network performance optimization problem and obtains the optimal cache allocation scheme by solving for the solution that optimizes network performance. However, these methods all target static networks, whereas in reality network requests change dynamically, and the existing methods cannot meet this dynamic demand.
Disclosure of Invention
In order to solve the above problems, the present invention provides a heterogeneous information-centric network cache allocation method based on deep reinforcement learning, aiming to allocate an appropriate cache space to each routing node in accordance with the dynamics of network requests.
The technical scheme of the invention is as follows:
A cache allocation method for a heterogeneous information-centric network based on deep reinforcement learning comprises the following steps:
Step 1: abstracting the heterogeneous ICN into a topological model;
Step 2: defining dynamically changing content requests in the heterogeneous ICN;
Step 3: converting the cache space allocation problem of the heterogeneous ICN into a network performance optimization problem of the heterogeneous ICN, and constructing a network performance optimization model, wherein the network performance optimization model comprises an optimization objective function and corresponding constraints;
Step 4: applying a Q learning algorithm to each content request in the heterogeneous ICN to obtain the cache allocation scheme with optimal network performance corresponding to the content request at each moment;
Step 5: combining a deep neural network with the Q learning algorithm, and using the cache allocation schemes with optimal network performance corresponding to the content requests at each moment solved by the Q learning algorithm in step 4 to train an optimal cache allocation scheme adapted to the dynamically changing content requests of the heterogeneous ICN.
Further, according to the cache allocation method of the heterogeneous information center network based on deep reinforcement learning, the heterogeneous ICN with n content routers is abstracted into a topology model G(V, E, C, Long, Lati):

V = {CR_1, CR_2, …, CR_n}, E = {e_ij | 1 ≤ i, j ≤ n}, C = {c_1, c_2, …, c_n}, Long = {long_1, long_2, …, long_n}, Lati = {lati_1, lati_2, …, lati_n} (1)

wherein V represents the content router set composed of the n content routers; E represents the set of edges between content routers; C represents the set of cache capacities allocated to the content routers; Long represents the longitudes of the content router locations in the topology model G; Lati represents the latitudes of the content router locations in the topology model G; CR_i represents the i-th content router; e_ij represents the path between content router CR_i and the j-th content router CR_j; c_i represents the cache capacity allocated to content router CR_i; long_i represents the longitude of CR_i's location in the topology model G; lati_i represents the latitude of CR_i's location in the topology model G. CR_i and e_ij can further be expressed as follows:

CR_i = CR_i^(c_i), e_ij = e_ij^(c_i, c_j), c_i ∈ {0, 1, …, C_max} (2)

wherein CR_i^(c_i) denotes the i-th content router allocated cache capacity c_i; e_ij^(c_i, c_j) denotes the path between CR_i^(c_i) and the j-th content router CR_j^(c_j) allocated cache capacity c_j; C_max denotes the maximum cache capacity that a content router can be allocated.
Further, according to the heterogeneous information center network cache allocation method based on deep reinforcement learning, the hit rate and energy consumption of content requests are used as evaluation indexes of heterogeneous ICN performance, and the optimization objective function shown in formula (12) is established:

max NetP_total = max Σ_(i=1)^n (ω·hr_i + μ·ec_i) (12)

wherein NetP_total is the overall network performance of the heterogeneous ICN; N_i^hit denotes the number of requests successfully hit at CR_i; N_i^res denotes the total number of requests received by CR_i; hr_i = N_i^hit / N_i^res denotes the request hit rate of content router CR_i; ec_i = P_i·c_i + p_i^tra·tra_i·t_i denotes the energy consumption of routing node CR_i; P_i is the fixed energy consumption of CR_i's router hardware when caching content; p_i^tra is the energy consumption corresponding to transmitting a unit byte of content through CR_i; tra_i is the size of the data stream through CR_i; t_i is the run time of CR_i, determined by distance_(i,j), the distance between the content requesting node CR_j and the service node CR_i; ω and μ are the weights of the request hit rate and of the energy consumption, respectively, in the network performance corresponding to content router CR_i caching a unit size of content.
Further, according to the method for allocating cache of a heterogeneous information-centric network based on deep reinforcement learning, the constraints include the cache space constraint of each content router and the cache space constraint of the overall network topology, as shown in formula (13):

0 ≤ c_i ≤ C_max, Σ_(i=1)^n c_i ≤ C_total (13)

wherein C_max represents the maximum cache capacity that a content router in the heterogeneous ICN can be allocated; C_total represents the maximum overall cache space of all content routers in the heterogeneous ICN.
Further, according to the method for allocating cache of the heterogeneous information center network based on deep reinforcement learning, the method for applying the Q learning algorithm to each content request in the heterogeneous ICN is as follows: the content request at each time is expressed as a Q-learning state, Status = {s_1, s_2, …, s_t}, where s_t is the Q-learning state corresponding to the content request q_t at time t; the topology model G(V, E, C, Long, Lati) of the heterogeneous information-centric network is expressed as the Q-learning environment, Environment = {e_1, e_2, …, e_t}, where e_t is the Q-learning environment state corresponding to the content request q_t at time t; the cache allocation scheme for the content routers is expressed as the Q-learning action, Action = {a_1, a_2, …, a_t}, where a_t is the Q-learning action corresponding to the content request q_t at time t; executing a cache allocation scheme for a network content request returns a network performance value, expressed as the Q-learning reward, Reward = {r_1, r_2, …, r_t}, where r_t is the Q-learning reward value corresponding to the content request q_t at time t. In the Q learning process, the action with the largest corresponding reward value is selected and executed for each state, and after the Q learning process is finished, the obtained Q-learning policy

Policy(s_t) = argmax_(a_t) Q(s_t, a_t)

selects and executes, for each input state, the action with the largest corresponding reward value.
Further, according to the heterogeneous information center network cache allocation method based on deep reinforcement learning, the deep neural network is a BP neural network.
Further, according to the heterogeneous information center network cache allocation method based on deep reinforcement learning, step 5 comprises the following specific steps:
Step 5.1: randomly initializing the weights θ of the BP neural network;
Step 5.2: taking the Q-learning state and action (s_t, a_t) at time t within the period T as the input value of the neural network, and correspondingly taking the maximum reward value R(s_t, a_t; θ) obtained by the Q learning algorithm together with the corresponding action a_t as the output value y_output of the deep neural network;
Step 5.3: calculating the estimated value of the output value of the BP neural network according to the Bellman equation;
Step 5.4: calculating the corresponding loss value from the output value of the BP neural network and the estimated value of the output value;
Step 5.5: updating the weights of the BP neural network by gradient descent according to the loss value;
Step 5.6: repeatedly executing steps 5.2 to 5.5, iteratively updating θ until the stopping condition t = T is met, thereby obtaining the final neural network weights θ, which serve as the optimal cache allocation scheme adapted to the dynamically changing content requests of the period T.
Compared with the prior art, the heterogeneous information-centric network cache allocation method based on deep reinforcement learning of the present invention has the following beneficial effects: after modeling the heterogeneous information-centric network, the dynamics of network requests are analyzed, so that compared with existing topological models of heterogeneous information-centric networks, the dynamic network model better matches reality. Deep learning and Q learning are combined and applied to the cache allocation problem of a dynamic heterogeneous information-centric network; compared with existing cache allocation methods, the cache allocation scheme with optimal network performance can be solved adaptively, and the method is better suited to dynamically changing network requests.
Drawings
FIG. 1 is a schematic diagram of an information-centric network architecture;
fig. 2 is a schematic flow chart of a cache allocation method of a heterogeneous information center network based on deep reinforcement learning according to the embodiment;
fig. 3 is a schematic structural diagram of the deep Q learning algorithm of the present embodiment;
fig. 4 is a schematic flowchart of solving the network cache allocation scheme by deep learning according to this embodiment.
Detailed Description
To facilitate an understanding of the present application, the present application will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present application are given in the accompanying drawings. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
The invention provides a dynamic cache space allocation scheme for the problem of node cache space allocation in a heterogeneous information-centric network; in particular, it provides a cache allocation strategy for network nodes that adapts to network dynamics. When modeling the cache allocation problem, the network request hit rate and energy consumption are used as performance evaluation indexes, integrated into a comprehensive performance measure for evaluating cache allocation schemes, and the cache allocation problem is modeled as a network performance maximization problem. To obtain the optimal cache allocation for each content request, a reinforcement learning method is applied in which cache allocations are the actions selected by an agent, yielding the cache allocation scheme corresponding to the optimal performance of each request. To adapt to the dynamics of network requests, the observed content requests are used as input, the cache allocation schemes obtained by reinforcement learning are used as output, and an optimal cache allocation scheme adapted to the dynamic requests at different moments is obtained by training.
Fig. 1 is a schematic diagram of an information-centric network architecture, which is composed of nodes and the paths between them; the nodes include request nodes, routing nodes and service nodes. A request node is responsible for receiving a user's content request and passing the request to a routing node; a routing node is responsible for transmitting requests or content and can cache content; a service node stores content and is responsible for returning requested content to the user. The paths between nodes carry requests or content. When a user sends a content request to a request node, the request node passes the request along a path to a routing node; the routing node checks whether it caches the requested content, returns the content to the request node if so, and otherwise forwards the request to the next routing node or to a service node according to its forwarding information base. Eventually the request reaches a routing node or service node that holds the requested content, and that node returns the content along the request path to complete the request. The efficiency with which a request completes reflects network performance and depends on the cache space and cached contents of each routing node. With proper cache allocation, frequently requested content is cached at the nodes where it is frequently requested, which improves network performance and content request efficiency. The invention provides a heterogeneous information-centric network cache allocation method based on deep reinforcement learning, aiming to allocate an appropriate cache space to each routing node.
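The request flow just described can be sketched in a few lines of Python; the Router class, the forward_request helper and the content names below are illustrative assumptions rather than anything defined in the patent:

```python
# Illustrative sketch of ICN request resolution (names assumed, not from the patent).
from dataclasses import dataclass, field

@dataclass
class Router:
    name: str
    cache: set = field(default_factory=set)  # identifiers of cached content

def forward_request(content: str, path: list) -> str:
    """Walk the request along the path; the first node that caches the
    content answers it, otherwise the origin at the end of the path does."""
    for hop, node in enumerate(path, start=1):
        if content in node.cache:
            return f"hit at {node.name} after {hop} hop(s)"
    return f"served by origin {path[-1].name} after {len(path)} hop(s)"

# r2 caches the content, so the request never reaches the origin server.
r1, r2 = Router("r1"), Router("r2", {"video/42"})
origin = Router("origin", {"video/42"})
print(forward_request("video/42", [r1, r2, origin]))  # hit at r2 after 2 hop(s)
```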
Fig. 2 is a schematic flow chart of a deep reinforcement learning-based heterogeneous information centric network cache allocation method provided by the present invention, where the deep reinforcement learning-based heterogeneous information centric network cache allocation method includes the following steps:
Step 1: abstracting the heterogeneous information-centric network into a topological model;
In this embodiment, a heterogeneous information-centric network with n content routers is abstracted into a topology model G(V, E, C, Long, Lati), where V represents the content router set composed of the n content routers; E represents the set of edges between content routers; C represents the set of cache capacities allocated to the content routers; Long represents the longitudes of the content router locations in the topology model; Lati represents the latitudes of the content router locations in the topology model. Each component of the heterogeneous information-centric network topology model is specifically expressed as follows:

V = {CR_1, CR_2, …, CR_n}, E = {e_ij | 1 ≤ i, j ≤ n}, C = {c_1, c_2, …, c_n}, Long = {long_1, long_2, …, long_n}, Lati = {lati_1, lati_2, …, lati_n} (1)

wherein CR_i represents the i-th content router; e_ij represents the path between content router CR_i and the j-th content router CR_j; c_i represents the cache capacity allocated to content router CR_i; long_i represents the longitude of CR_i's location in the topology model; lati_i represents the latitude of CR_i's location in the topology model. CR_i and e_ij can further be expressed as follows:

CR_i = CR_i^(c_i), e_ij = e_ij^(c_i, c_j), c_i ∈ {0, 1, …, C_max} (2)

wherein CR_i^(c_i) denotes the i-th content router allocated cache capacity c_i; e_ij^(c_i, c_j) denotes the path between CR_i^(c_i) and the j-th content router CR_j^(c_j) allocated cache capacity c_j; C_max denotes the maximum cache capacity that a content router can be allocated.
Step 2: on the basis of the topological model of the heterogeneous information-centric network, defining dynamically changing content requests;
The content request at each moment changes dynamically, and the content requests Qr within a period T are defined as:

Qr = {q_t | 1 ≤ t ≤ T} (3)

wherein q_t refers to the content requests occurring in the network at time t, each comprising: the content requesting node, the requested content, the longitude and latitude of the requesting node's position in the topological model, the content server node providing the requested content, and the request time.
To elaborate the dynamically changing network requests at different times, q_t can be further expressed as:

q_t = {q_t^k}, q_t^k = (CR_j^k, con^k, long_j^k, lati_j^k, CR_i^k, t) (4)

wherein CR_j^k, con^k, long_j^k, lati_j^k, CR_i^k and t respectively represent, for the k-th content request in q_t, the content requesting node, the requested content, the longitude of the requesting node's position in the network topology model, the latitude of that position, the content server node providing the requested content, and the request time.
On the basis of the static network topology model, a dynamic analysis of network requests is added, so that the differing cache space demands of dynamically changing network requests can be met.
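To make the request structure concrete, here is one way the tuple q_t^k of formula (4) might be encoded; the field order follows the enumeration above, while the class name and field names are assumptions:

```python
# Assumed encoding of the dynamic request sequence Qr = {q_t | 1 <= t <= T}.
from typing import NamedTuple

class ContentRequest(NamedTuple):
    requester: int     # content requesting node CR_j (router index)
    content: str       # requested content identifier
    longitude: float   # position of the requesting node in the topology model
    latitude: float
    server: int        # content server node CR_i (router index)
    time: int          # request time t

# q_t is the set of requests observed at time t; Qr maps t -> q_t
Qr = {
    1: [ContentRequest(0, "video/42", 123.4, 41.8, 1, 1)],
    2: [ContentRequest(1, "news/7", 121.5, 38.9, 0, 2)],
}
```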
Step 3: converting the cache allocation problem into a network performance optimization problem, and constructing a network performance optimization model, wherein the network performance optimization model comprises an optimization objective function and corresponding constraints;
In this embodiment, the cache allocation problem is converted into a network performance optimization problem, and the hit rate and energy consumption of content requests are used as the evaluation indexes of network performance. Let E_total and H_total respectively denote the energy consumption and hit rate of the whole network, and let ec_i and hr_i respectively denote the unit energy consumption and unit hit rate of each content router CR_i; the network totals are the sums over the routers:

E_total = Σ_(i=1)^n ec_i, H_total = Σ_(i=1)^n hr_i (5)

wherein c_i ∈ {0, 1, 2, …, C_max}, and C_max is the maximum cache capacity that can be allocated to each router; c_i = 0 means CR_i is allocated no cache, c_i = 1 means CR_i is allocated a cache of 1 preset unit, c_i = 2 means CR_i is allocated a cache of 2 preset units, and so on. hr_i denotes the hit rate of content router CR_i and, as shown in formula (6), is the ratio of the number of requests received and successfully hit by CR_i to all requests received by CR_i, where the number of requests received and successfully hit is the number of requests actually occurring at CR_i whose requested content is cached at CR_i, and the number of all requests received is the number of requests actually occurring at CR_i. ec_i denotes the energy consumption of routing node CR_i, calculated according to formula (7); it comprises two parts, caching energy consumption and transmission energy consumption, and reflects the overhead of ICN content caching. The caching energy consumption is the energy the router consumes to cache content, related to the router's caching hardware and the size of the cached content. The transmission energy consumption is the energy the router consumes to transmit requests, related to the size of the transmitted content and the time consumed by transmission.

hr_i = N_i^hit / N_i^res (6)

wherein N_i^hit denotes the number of requests successfully hit at CR_i, and N_i^res denotes the total number of requests received by CR_i.

ec_i = P_i·c_i + p_i^tra·tra_i·t_i (7)

wherein P_i is the fixed energy consumption of CR_i's router hardware when caching content; p_i^tra is the energy consumption corresponding to transmitting a unit byte of content through CR_i; t_i is the run time of CR_i; tra_i is the size of the data stream through CR_i.

The run time includes the time for the node to process the cache request and the transmission time for returning the requested content to the requesting node. Assuming the processing time is ignored and CR_j is the content requesting node, t_i is calculated according to equation (8), i.e., the run time is taken proportional to the distance between the serving and requesting nodes:

t_i ∝ distance_(i,j) (8)

wherein distance_(i,j) represents the distance between the content requesting node CR_j and the service node CR_i, calculated from the node positions in the heterogeneous information-centric network topology model, with reference to formula (9):

distance_(i,j) = √((long_i - long_j)² + (lati_i - lati_j)²) (9)

NetP_i denotes the network performance corresponding to content router CR_i caching a unit size of content; it increases with the hit rate and decreases with the energy consumption, and is calculated with reference to equation (10):

NetP_i = ω·hr_i + μ·ec_i (10)

where ω and μ are the weights of the hit rate and of the energy consumption, respectively, in the network performance corresponding to CR_i caching a unit size of content.

In the whole heterogeneous information-centric network topology, the overall network performance NetP_total is expressed as follows:

NetP_total = Σ_(i=1)^n NetP_i = Σ_(i=1)^n (ω·hr_i + μ·ec_i) (11)

For the ICN node cache space allocation problem, the goal is to find a cache allocation scheme that makes network performance optimal for dynamic content requests, i.e., maximizes the overall network performance, and the optimization objective function shown in formula (12) is established:

max NetP_total = max Σ_(i=1)^n (ω·hr_i + μ·ec_i) (12)

While maximizing network performance, the cache space of a single node and the cache space of the whole network must satisfy certain constraint conditions, as shown in equation (13), including the cache space constraint of each content router and the cache space constraint of the overall network topology:

0 ≤ c_i ≤ C_max, Σ_(i=1)^n c_i ≤ C_total (13)

The final network performance optimization model is shown as equation (14):

max NetP_total = max Σ_(i=1)^n (ω·hr_i + μ·ec_i), s.t. 0 ≤ c_i ≤ C_max, Σ_(i=1)^n c_i ≤ C_total (14)

In the above formula, c_i denotes the cache capacity allocated to the i-th content router CR_i; C_total represents the maximum overall cache space of all content routers in the network.
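The pieces of the performance model can be strung together as below. The Euclidean distance in (9), the run time taken proportional to distance in (8), and a negative μ to penalize energy are the readings reconstructed above, i.e. assumptions where the original figures are not recoverable:

```python
# Sketch of formulas (5)-(11) under the reconstructed reading; all
# numeric values are placeholders.
import math

def distance(long_i, lati_i, long_j, lati_j):
    # (9): distance between service node CR_i and requesting node CR_j
    return math.hypot(long_i - long_j, lati_i - lati_j)

def hit_rate(n_hit, n_res):
    # (6): hr_i = N_i^hit / N_i^res
    return n_hit / n_res if n_res else 0.0

def energy(P_i, c_i, p_tra_i, tra_i, t_i):
    # (7): caching energy P_i*c_i plus transmission energy p_tra_i*tra_i*t_i
    return P_i * c_i + p_tra_i * tra_i * t_i

def net_performance(routers, omega, mu):
    # (10)/(11): NetP_total = sum_i (omega*hr_i + mu*ec_i)
    return sum(omega * r["hr"] + mu * r["ec"] for r in routers)

d = distance(123.4, 41.8, 121.5, 38.9)                          # (9)
hr = hit_rate(n_hit=40, n_res=50)                               # (6)
ec = energy(P_i=0.5, c_i=4, p_tra_i=0.01, tra_i=120.0, t_i=d)   # (7), t_i ~ distance
print(net_performance([{"hr": hr, "ec": ec}], omega=1.0, mu=-0.1))
```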
and 4, step 4: applying a Q learning algorithm to cache allocation of a heterogeneous information center network, applying the Q learning algorithm to each content request of the network, and obtaining a cache allocation scheme with optimal network performance corresponding to the content request at each moment:
because the network structure cannot change along with time change in practice, the network dynamics is mainly embodied in the network request dynamics, different content requests can occur in the network at different times, and the network dynamics is caused, so when the Q learning is applied to cache allocation of the heterogeneous information center network, the content request at each time is expressed as a Q learning state Status, and for the content requests at different times, the Q learning state is specifically expressed as Status ═ s {(s) }1,s2,…,stIn which s istRequesting q for content at time ttThe corresponding Q-learned state; representing a topological model G (V, E, C, Long, Lati) of the heterogeneous information center network as an Environment for Q learning; the cache allocation scheme for the content router is represented as Q-learned Action, and the execution of the cache allocation scheme for the network content request returns a network performance value, represented as Q-learned reward value, rework. In the Q learning process, each shapeAnd selecting the action with the maximum corresponding reward value to execute. After the Q learning process is finished, the obtained Policy of Q learning selects the action with the maximum reward value for each input state to execute.
The action of Q learning refers to different cache allocation schemes allocated to the network, specifically, a certain cache space is allocated to each routing node, and the constraint condition of the size of the cache space of the routing node is met. In a heterogeneous information centric network, the cache space size of each node may be unequal. In a real network, the number of nodes is often huge, and the cache space of each node can be selected, so that the selectable cache allocation schemes of the network are various, namely Q learning has a plurality of selectable actions. Network content request q for different time instantstThe Action of Q learning is specifically expressed as Action ═ { a ═ a1,a2,…,atIn which a istRequest for content q at time ttCorresponding Q learning actions.
The environment of Q learning is composed of an integral network, and the quality of actions can be evaluated by interacting the actions and the states and returning to corresponding reward values of the states. The Environment of Q learning is specifically expressed as Environment { e } for network content requests at different times1,e2,…,etIn which etRequest for content q at time ttCorresponding to the environment of Q learning.
The reward value for Q learning refers to the value that a state performs an action interacting with the environment, which returns to the state, represented by the network performance. For different cache allocation schemes, the network has different performances when processing the request, and the higher the performance is, the better the corresponding cache allocation scheme is, i.e. the action with the high reward value is selected to be executed. The reward value is obtained through network performance calculation, in a known network topology model, different network performances can be obtained through different cache allocation schemes for the same network content request, and the Q learning algorithm can select the cache allocation scheme corresponding to the optimal network performance, namely the action of Q learning and selecting the reward value to be the maximum. Network content request q for different time instantstThe reward value for Q learning is specifically expressed asReword={r1,r2,…,rtIn which r istRequest for content q at time ttThe corresponding Q learned reward value.
The strategy of Q learning directs a state to select an action. According to the policy, a cache allocation scheme for a certain network content request may be determined. The strategy is expressed as
Figure BDA0003179578870000101
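Step 4 can be pictured with ordinary tabular Q-learning over the state/action/reward mapping just described; the ε-greedy exploration, the hyperparameter values and the reward plumbing below are assumptions added for the sketch:

```python
# Tabular Q-learning sketch for step 4 (hyperparameters assumed).
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.2
Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def choose_action(state, actions):
    if random.random() < epsilon:                       # explore
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])    # exploit: the policy

def q_update(s, a, r, s_next, actions):
    # Standard Q-learning update toward r + gamma * max_a' Q(s', a')
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

actions = [0, 1, 2]       # indices of candidate cache allocation vectors
a = choose_action("q_1", actions)
q_update("q_1", a, r=0.8, s_next="q_2", actions=actions)  # r stands in for NetP_total
```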
Step 5: combining the deep neural network with Q learning and applying the combination to solving the cache allocation scheme of the heterogeneous information-centric network; using the cache allocation schemes with optimal network performance corresponding to the network content request at each moment solved by Q learning in step 4, an optimal cache allocation scheme adapted to the dynamically changing content requests of the network is trained.
For the deep neural network part, this embodiment uses a BP (Back Propagation) neural network, which comprises two processes: forward propagation and back propagation. The forward propagation process builds the neural network structure: as shown in Fig. 3, the Q-learning states and actions are taken as the input of the neural network, the reward values as its output, and the policy as its weights; the neural network can then train the optimal weights fitting the inputs and outputs at different moments. The back propagation process adjusts the neural network structure, optimizing the weights by minimizing the loss value at each training step, where the loss value is computed from the output value of the neural network and the estimated value of that output. The process by which the deep Q learning algorithm solves the information-centric network cache allocation for dynamic requests is specifically as follows:
As shown in Fig. 3, the Q-learning states and actions serve as the input of the neural network. The input layer receives the input data, the Q-learning state and action, denoted (Status, Action). The output layer receives the network performance value output by Q learning, taking the Q-learning reward value as the output of the neural network. In state s_t, action a_t is performed according to policy(s_t, a_t) and the corresponding reward value r_t is received, denoted r_t(s_t, a_t; policy(s_t, a_t)). The weights θ of the neural network correspond to the Q-learning policy.
The reward value r_t is calculated with reference to equation (15), taking the reward to be the overall network performance obtained by executing the cache allocation a_t for the content request s_t:

r_t(s_t, a_t) = NetP_total(s_t, a_t) (15)

As shown in Fig. 4, step 5 comprises the following specific steps:
Step 5.1: randomly initialize the weights θ of the BP neural network;
Step 5.2: take the Q-learning state and action (s_t, a_t) at time t within the period T as the input value x_input of the neural network, and correspondingly take the maximum reward value R(s_t, a_t; θ) of Q learning together with the corresponding action a_t as the output value y_output of the deep neural network;
Step 5.3: calculate the estimated value of the output value of the BP neural network according to the Bellman equation.
The estimated value ŷ of the output value of the BP neural network is calculated through equation (16):

ŷ = R(s_t, a_t; θ) + α·[r_t + γ·max_a R(s_(t+1), a; θ) - R(s_t, a_t; θ)] (16)

wherein α and γ are respectively the learning rate and the discount rate of the Bellman equation, and a ranges over the selectable actions corresponding to state s_(t+1).
Step 5.4: calculate the corresponding loss value from the output value of the BP neural network and the estimated value of the output value;
During the back propagation process of the BP neural network, the weights are adjusted according to the loss value. The loss value is calculated from the output value of the neural network and the estimated value of the output value, with reference to equation (17):

loss(x_input, y_output, θ) = (1/m)·Σ_(j=1)^m (ŷ_j - y_j)² (17)

wherein m is the preset number of output-layer neurons.
Step 5.5: update the weights of the BP neural network by gradient descent according to the loss value.
In order for the weights to approach the optimum continuously, they should be updated in the direction of decreasing loss. The update rule refers to formula (18): during back propagation, the neural network weights are adjusted by gradient descent according to the loss value, i.e., the weight θ is updated according to loss(x_input, y_output, θ), expressed as follows:

θ ← θ + η·∂loss(x_input, y_output, θ)/∂θ (18)

wherein η is the learning rate of the gradient descent method. Since the objective of the algorithm is to find the weights corresponding to the minimum loss value, η < 0, i.e., the weights move against the gradient.
Step 5.6: repeat steps 5.2 to 5.5, iterating the update of θ until the stopping condition t = T is met; the resulting θ is the final neural network weight vector, and this final weight setting constitutes the optimal cache allocation strategy adapted to the dynamic requests of the period T.
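Steps 5.1-5.6 amount to a DQN-style training loop. The sketch below uses PyTorch as an implementation choice and the standard DQN target in place of the α-weighted form of equation (16); the network sizes and the encoding of (s_t, a_t) as feature vectors are placeholders:

```python
# DQN-style sketch of steps 5.1-5.6 (PyTorch and all sizes are assumptions).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # step 5.1: random init
opt = torch.optim.SGD(net.parameters(), lr=0.01)  # gradient descent, step 5.5
loss_fn = nn.MSELoss()                            # squared loss, step 5.4
gamma = 0.9                                       # discount rate

def train_step(x_t, r_t, x_next_best):
    """x_t encodes (s_t, a_t); x_next_best encodes s_{t+1} with its best action."""
    target = r_t + gamma * net(x_next_best).detach()  # Bellman estimate, step 5.3
    loss = loss_fn(net(x_t), target)                  # step 5.4
    opt.zero_grad()
    loss.backward()                                   # back propagation
    opt.step()                                        # update theta, step 5.5
    return loss.item()

for t in range(100):                 # step 5.6: iterate until t = T
    x_t = torch.randn(1, 8)          # placeholder feature encodings
    x_next = torch.randn(1, 8)
    train_step(x_t, torch.tensor([[1.0]]), x_next)
```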
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims (4)

1. A cache allocation method of a heterogeneous information center network based on deep reinforcement learning is characterized by comprising the following steps:
Step 1: abstracting a heterogeneous ICN into a topological model;
the heterogeneous ICN with n content routers is abstracted into a topology model G(V, E, C, Long, Lati):

V = {CR_1, CR_2, …, CR_n}, E = {e_ij | 1 ≤ i, j ≤ n}, C = {c_1, c_2, …, c_n}, Long = {long_1, long_2, …, long_n}, Lati = {lati_1, lati_2, …, lati_n} (1)

wherein V represents the content router set composed of the n content routers; E represents the set of edges between content routers; C represents the set of cache capacities allocated to the content routers; Long represents the longitudes of the content router locations in the topology model G; Lati represents the latitudes of the content router locations in the topology model G; CR_i represents the i-th content router; e_ij represents the path between content router CR_i and the j-th content router CR_j; c_i represents the cache capacity allocated to content router CR_i; long_i represents the longitude of CR_i's location in the topology model G; lati_i represents the latitude of CR_i's location in the topology model G; CR_i and e_ij can further be expressed as follows:

CR_i = CR_i^(c_i), e_ij = e_ij^(c_i, c_j), c_i ∈ {0, 1, …, C_max} (2)

wherein CR_i^(c_i) denotes the i-th content router allocated cache capacity c_i; e_ij^(c_i, c_j) denotes the path between CR_i^(c_i) and the j-th content router CR_j^(c_j) allocated cache capacity c_j; C_max denotes the maximum cache capacity that a content router can be allocated;
Step 2: defining dynamically changing content requests in the heterogeneous ICN;
Step 3: converting the cache space allocation problem of the heterogeneous ICN into a network performance optimization problem of the heterogeneous ICN, and constructing a network performance optimization model, wherein the network performance optimization model comprises an optimization objective function and corresponding constraints;
the hit rate and energy consumption of content requests are used as evaluation indexes of heterogeneous ICN network performance, and the optimization objective function shown in formula (12) is established:

max NetP_total = max Σ_(i=1)^n (ω·hr_i + μ·ec_i) (12)

wherein NetP_total is the overall network performance of the heterogeneous ICN; N_i^hit denotes the number of requests successfully hit at CR_i; N_i^res denotes the total number of requests received by CR_i; hr_i = N_i^hit / N_i^res denotes the request hit rate of content router CR_i; ec_i = P_i·c_i + p_i^tra·tra_i·t_i denotes the energy consumption of routing node CR_i; P_i is the fixed energy consumption of CR_i's router hardware when caching content; p_i^tra is the energy consumption corresponding to transmitting a unit byte of content through CR_i; tra_i is the size of the data stream through CR_i; t_i is the run time of CR_i, determined by distance_(i,j), the distance between the content requesting node CR_j and the service node CR_i; ω and μ are the weights of the request hit rate and of the energy consumption, respectively, in the network performance corresponding to content router CR_i caching a unit size of content;
the constraints include the cache space constraint of each content router and the cache space constraint of the overall network topology, as shown in formula (13):

0 ≤ c_i ≤ C_max, Σ_(i=1)^n c_i ≤ C_total (13)

wherein C_max represents the maximum cache capacity that a content router in the heterogeneous ICN can be allocated; C_total represents the maximum overall cache space of all content routers in the heterogeneous ICN;
Step 4: applying a Q learning algorithm to each content request in the heterogeneous ICN to obtain the cache allocation scheme with optimal network performance corresponding to the content request at each moment;
Step 5: combining a deep neural network with the Q learning algorithm, and using the cache allocation schemes with optimal network performance corresponding to the content requests at each moment solved by the Q learning algorithm in step 4 to train an optimal cache allocation scheme adapted to the dynamically changing content requests of the heterogeneous ICN.
2. The method for distributing the cache of the heterogeneous information-centric network based on the deep reinforcement learning of claim 1, wherein the method for applying the Q learning algorithm to each content request in the heterogeneous ICN is as follows: the content request at each time is expressed as a Q-learning state, Status = {s_1, s_2, …, s_t}, where s_t is the Q-learning state corresponding to the content request q_t at time t; the topology model G(V, E, C, Long, Lati) of the heterogeneous information-centric network is expressed as the Q-learning environment, Environment = {e_1, e_2, …, e_t}, where e_t is the Q-learning environment state corresponding to the content request q_t at time t; the cache allocation scheme for the content routers is expressed as the Q-learning action, Action = {a_1, a_2, …, a_t}, where a_t is the Q-learning action corresponding to the content request q_t at time t; executing a cache allocation scheme for a network content request returns a network performance value, expressed as the Q-learning reward, Reward = {r_1, r_2, …, r_t}, where r_t is the Q-learning reward value corresponding to the content request q_t at time t; in the Q learning process, the action with the largest corresponding reward value is selected and executed for each state, and after the Q learning process is finished, the obtained Q-learning policy

Policy(s_t) = argmax_(a_t) Q(s_t, a_t)

selects and executes, for each input state, the action with the largest corresponding reward value.
3. The heterogeneous information-centric network cache allocation method based on deep reinforcement learning according to claim 2, wherein the deep neural network is a BP neural network.
4. The method for allocating the cache of the heterogeneous information-centric network based on the deep reinforcement learning of claim 3, wherein step 5 comprises the following specific steps:
Step 5.1: randomly initializing the weights θ of the BP neural network;
Step 5.2: taking the Q-learning state and action (s_t, a_t) at time t within the period T as the input value of the neural network, and correspondingly taking the maximum reward value R(s_t, a_t; θ) obtained by the Q learning algorithm together with the corresponding action a_t as the output value y_output of the deep neural network;
Step 5.3: calculating the estimated value of the output value of the BP neural network according to the Bellman equation;
Step 5.4: calculating the corresponding loss value from the output value of the BP neural network and the estimated value of the output value;
Step 5.5: updating the weights of the BP neural network by gradient descent according to the loss value;
Step 5.6: repeatedly executing steps 5.2 to 5.5, iteratively updating θ until the stopping condition t = T is met, thereby obtaining the final neural network weights θ, which serve as the optimal cache allocation scheme adapted to the dynamically changing content requests of the period T.
CN202110843043.6A 2021-07-26 2021-07-26 Heterogeneous information center network cache allocation method based on deep reinforcement learning Active CN113596138B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110843043.6A CN113596138B (en) 2021-07-26 2021-07-26 Heterogeneous information center network cache allocation method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110843043.6A CN113596138B (en) 2021-07-26 2021-07-26 Heterogeneous information center network cache allocation method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113596138A CN113596138A (en) 2021-11-02
CN113596138B (en) 2022-06-21

Family

ID=78250075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110843043.6A Active CN113596138B (en) 2021-07-26 2021-07-26 Heterogeneous information center network cache allocation method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113596138B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116996921B (en) * 2023-09-27 2024-01-02 香港中文大学(深圳) Whole-network multi-service joint optimization method based on element reinforcement learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108322352A (en) * 2018-03-19 2018-07-24 北京工业大学 It is a kind of based on the honeycomb isomery caching method to cooperate between group
CN110138748A (en) * 2019-04-23 2019-08-16 北京交通大学 A kind of network integration communication means, gateway and system
CN111586439A (en) * 2020-05-25 2020-08-25 河南科技大学 Green video caching method for cognitive content center network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3206348B1 (en) * 2016-02-15 2019-07-31 Tata Consultancy Services Limited Method and system for co-operative on-path and off-path caching policy for information centric networks
CN106131202B (en) * 2016-07-20 2017-03-29 中南大学 Caching in Information central site network based on fluid dynamic theory places decision-making methods of marking
US11258879B2 (en) * 2017-06-19 2022-02-22 Northeastern University Joint routing and caching method for content delivery with optimality guarantees for arbitrary networks
CN110049039B (en) * 2019-04-15 2021-09-10 哈尔滨工程大学 GBDT-based information center network cache pollution detection method
CN111885648A (en) * 2020-07-22 2020-11-03 北京工业大学 Energy-efficient network content distribution mechanism construction method based on edge cache
CN112995950B (en) * 2021-02-07 2022-03-29 华南理工大学 Resource joint allocation method based on deep reinforcement learning in Internet of vehicles


Also Published As

Publication number Publication date
CN113596138A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
He et al. Qoe-based task offloading with deep reinforcement learning in edge-enabled internet of vehicles
CN110365514B (en) SDN multistage virtual network mapping method and device based on reinforcement learning
CN111556461B (en) Vehicle-mounted edge network task distribution and unloading method based on deep Q network
CN111400001B (en) Online computing task unloading scheduling method facing edge computing environment
CN113434212B (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN112486690B (en) Edge computing resource allocation method suitable for industrial Internet of things
CN111711666B (en) Internet of vehicles cloud computing resource optimization method based on reinforcement learning
Rjoub et al. Trust-driven reinforcement selection strategy for federated learning on IoT devices
CN110247795B (en) Intent-based cloud network resource service chain arranging method and system
CN110601973A (en) Route planning method, system, server and storage medium
CN114328291A (en) Industrial Internet edge service cache decision method and system
CN113596138B (en) Heterogeneous information center network cache allocation method based on deep reinforcement learning
CN116566838A (en) Internet of vehicles task unloading and content caching method with cooperative blockchain and edge calculation
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
CN115941790A (en) Edge collaborative content caching method, device, equipment and storage medium
Hu et al. Dynamic task offloading in MEC-enabled IoT networks: A hybrid DDPG-D3QN approach
Chen et al. Joint caching and computing service placement for edge-enabled IoT based on deep reinforcement learning
ABDULKAREEM et al. OPTIMIZATION OF LOAD BALANCING ALGORITHMS TO DEAL WITH DDOS ATTACKS USING WHALE OPTIMIZATION ALGORITHM
CN113543160B (en) 5G slice resource allocation method, device, computing equipment and computer storage medium
Li et al. Optimal service selection and placement based on popularity and server load in multi-access edge computing
CN116684291A (en) Service function chain mapping resource intelligent allocation method suitable for generalized platform
CN113766540B (en) Low-delay network content transmission method, device, electronic equipment and medium
CN114928826A (en) Two-stage optimization method, controller and decision method for software-defined vehicle-mounted task unloading and resource allocation
CN113411826B (en) Edge network equipment caching method based on attention mechanism reinforcement learning
CN114125745A (en) MQTT protocol power control and QoS mechanism selection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant