CN116938721A

CN116938721A - Digital twin-assisted network slice resource allocation method in industrial Internet of things

Info

Publication number: CN116938721A
Application number: CN202311113097.2A
Authority: CN
Inventors: 唐伦; 文雯; 李质萱; 陈前斌
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2023-08-30
Filing date: 2023-08-30
Publication date: 2023-10-24

Abstract

The invention relates to a digital twin-assisted network slice resource allocation method in the industrial Internet of Things, which belongs to the technical field of mobile communication and comprises the following steps: S1: constructing the IIoTSE-DTNS architecture in an industrial Internet of Things scenario; S2: introducing the transmit power of the radio access network, channel allocation, the association between industrial equipment and state estimation VNFs, and the VNF orchestration and flow scheduling variables in the core network; S3: introducing the information age (AoI) to represent the freshness of information and proposing the information age balance; performing state estimation using the state estimation virtual network functions (VNFs) deployed on nodes and the sensing data of the industrial equipment, and characterizing the influence of the information age on the state estimation; S4: establishing an optimization problem that, by adjusting these variables, maximizes the profit weighted by the information age balance value; S5: proposing DT PER-MADCAC to solve the hybrid discrete and continuous action problem.

Description

Digital twin-assisted network slice resource allocation method in industrial Internet of things

Technical Field

The invention belongs to the technical field of mobile communication, and relates to a digital twin-assisted network slice resource allocation method in the industrial Internet of Things.

Background

The industrial Internet of Things (IIoT) uses intelligent sensors and actuators to enhance manufacturing and industrial processes. Network slicing (NS) paves the way for supporting multiple IIoT services: it enables multiple independent logical networks to run on the same physical network infrastructure, each network slice being essentially a slice of the physical infrastructure that contains resources from multiple IIoT network domains. The management of network slices can benefit greatly from the network digital twin (DT), which simulates and predicts the behavior of a physical object by creating a digital virtual copy of it. Accurately grasping the state of industrial equipment in the IIoT requires acquiring correct state information, and DTs monitor the state of industrial processes through a large number of sensors to support real-time feedback control.

Virtual network functions are deployed on virtual machines, containers and commercial off-the-shelf devices using the infrastructure provided by network function virtualization (NFV), so that services are provided in the form of service function chains (SFCs). Each SFC is a sequence of virtual network functions (VNFs) arranged in a predefined order, which provides flexible network functions and improves the flexibility, scalability and manageability of the network. However, few researchers currently consider state estimation and network slice resource allocation jointly; moreover, while many authors study the use of digital twins in edge computing, few combine them with the core network. A digital twin can continuously monitor performance under various operating conditions without affecting the physical network, and the optimization strategy is worked out in the digital twin and then issued to the physical facilities, so that the cost and risk of network hardware can be reduced.

Disclosure of Invention

In view of the above, the invention aims to provide a digital twin-assisted network slice resource allocation method in the industrial internet of things, which solves the problem of sensor perception data reliability in the industrial internet of things IIoT and the problem of network slice resource allocation optimization caused by limited network resources, reduces the cost of network slices and ensures the information age balance in the network.

In order to achieve the above purpose, the present invention provides the following technical solutions:

A digital twin-assisted network slice resource allocation method in the industrial Internet of Things comprises the following steps:

S1: in the industrial Internet of Things scenario, constructing a network slice architecture, IIoTSE-DTNS, that combines state estimation and digital twins;

S2: introducing the transmit power of the radio access network, channel allocation, the association between industrial equipment and state estimation VNFs, and the VNF orchestration and flow scheduling variables in the core network;

S3: introducing the information age (AoI) to represent the freshness of information and proposing the information age balance; performing state estimation using the state estimation virtual network functions (VNFs) deployed on nodes and the sensing data of the industrial equipment, and characterizing the influence of the information age on the state estimation;

S4: establishing an optimization problem that, by adjusting these variables, maximizes the profit weighted by the information age balance value;

S5: proposing a digital twin-assisted multi-agent prioritized experience replay discrete-continuous action actor-critic algorithm, DT PER-MADCAC, to solve the hybrid discrete and continuous action problem.

Further, the IIoTSE-DTNS architecture includes a physical layer, a digital twin network slice layer, and an application layer;

the physical layer comprises industrial Internet of things equipment, a base station and a core network NFV node;

the digital twin network slice layer comprises a twin network layer data domain, a digital twin network slice model domain and a digital twin network slice management domain, which correspond to a data sharing warehouse, a service mapping model and a network slice twin management subsystem, respectively; the data sharing warehouse has four responsibilities: data management, data service, data storage and data acquisition; the service mapping model comprises a functional model and a basic model; the network slice twin management subsystem has three responsibilities: model management, security management and topology management;

services generated by the application layer are transmitted to the digital twin network slice layer through the twin northbound interface.

Further, the functional model in the service mapping model specifically comprises three parts of SFC management, algorithm function library and service management;

the SFC management comprises SFC group chain, VNF deployment, VNF scheduling, resource detection, resource allocation and flow prediction;

the algorithm function library comprises a DRL algorithm, a distributed component, a resource prediction algorithm and a node re-instantiation algorithm;

the service management comprises service twinning, service monitoring and service management.

Further, the basic model in the service mapping model specifically comprises a network element model and a network topology model;

the network element model comprises a node model and a link model, wherein the node model is the twin of the computing resources, storage resources, routing tables and the like of the physical nodes, and the link model is the twin of the physical link numbers, bandwidth resources and link node numbers;

the network topology model is the twin of the physical network topology.

Further, in step S3, information age update expressions are constructed for the data packet of service s of the m-th IIoT device at time t at the source node, the destination node and the destination user, respectively, and an information age balance calculation formula is constructed from these three expressions and the corresponding upper limits of the information age:

where u represents a node, j represents a user, and the three sets over which they range are the set of source nodes, the set of destination nodes and the set of destination users, respectively.

Further, in step S4, the profit maximization problem expression weighted by the information age balance value is:

where $x$ represents the VNF orchestration, $y$ the flow scheduling, $\kappa$ the channel allocation, $J$ the association of industrial equipment with state estimation VNFs, and $p$ the continuous-action transmit power; $U_S$ represents the business profit and $U_E$ the state estimation profit.

Further, the DT PER-MADCAC algorithm described in step S5 specifically includes the following steps:

S51: initializing the parameters of the actor network, the critic network, the target actor network and the target critic network, the experience replay buffer D, and each module of the digital twin network slice layer;

S52: judging whether the set number of iterations is exceeded; if so, stopping iteration; otherwise, continuing with S53;

S53: for $N_{sp}$: obtaining an initial state from the basic model of the digital twin network slice layer, selecting an action at time t for each agent according to the current policy, adding Ornstein-Uhlenbeck (OU) noise to the action, and observing the state at time t+1; if the optimization problem satisfies the constraints in the next state, assigning the maximum priority to the tuple <state, action, reward, next state> and storing it in D; otherwise executing S54;

S54: for $N_{sp}$: computing the action at time t+1 according to the policy and adding OU noise; computing the target value y and the TD error from the action at time t+1, and updating the sample priority;

S55: judging whether the set number of training steps is exceeded; if so, stopping training; otherwise, continuing with S56;

S56: computing the loss function and updating the critic network parameters according to it; computing the gradient of the Q function values of all samples with respect to the continuous actions and updating the actor network parameters according to this gradient; and updating the target critic and target actor parameters by soft update.
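The soft update at the end of S56 can be illustrated with a minimal Python sketch. It assumes the common Polyak-averaging form, in which each target parameter moves a fraction `tau` toward its online counterpart; the function name `soft_update`, the list-of-floats parameter representation and the default value of `tau` are illustrative assumptions and are not specified in this text.

```python
from typing import List


def soft_update(target: List[float], online: List[float], tau: float = 0.01) -> List[float]:
    """Return new target parameters blended toward the online parameters.

    Assumed rule (Polyak averaging): target <- tau * online + (1 - tau) * target.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if len(target) != len(online):
        raise ValueError("parameter lists must have the same length")
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


if __name__ == "__main__":
    # Hypothetical critic weights: the target network lags behind the online one.
    target_critic = [0.0, 2.0]
    online_critic = [1.0, 4.0]
    print(soft_update(target_critic, online_critic, tau=0.5))
```

A small `tau` keeps the target networks slowly varying, which is the usual reason for preferring a soft update over copying the online parameters outright.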

Further, the sample priority in step S5 is:

$$p_i = |\delta_i| + \zeta$$

where $\zeta$ is a small value that prevents the probability of sample i being drawn from becoming 0; i denotes the i-th sample, and $|\delta_i|$ denotes the absolute TD error, defined as:

where $y_k$ is the target value of agent k, $Q_k$ denotes the Q network of agent k, $s(t+1)$ is the state at time t+1, $a_k(t+1)$ is the action of agent k at time t+1, and $\theta_k$ denotes the parameters of the Q network.

The beneficial effects of the invention are as follows: to address the high cost and low profit of network slice resource allocation caused by unreliable sensor data and limited network resources in the industrial Internet of Things, the invention provides a network slice architecture for the industrial Internet of Things that combines state estimation and digital twins. Based on this architecture, state estimation is performed using the state estimation virtual network functions deployed on nodes and the sensing data of the industrial equipment, which reduces the estimation error, gives a better grasp of the internal state of the equipment, and improves the accuracy of the digital twin. In addition, the information age (AoI) is introduced to characterize the freshness of information and the information age balance is proposed; the profit weighted by the information age balance value is maximized by adjusting variables such as the transmit power of the radio access network, channel allocation, the association between industrial equipment and state estimation VNFs, and VNF orchestration and flow scheduling in the core network. A digital twin-assisted multi-agent prioritized experience replay discrete-continuous action actor-critic algorithm is further provided to achieve high performance under different numbers of service requests and different network topologies. The invention can optimize the network slice resource allocation strategy, reduce the waste of network resources and increase profit.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.

Drawings

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in the following preferred detail with reference to the accompanying drawings, in which:

FIG. 1 is a schematic diagram of an IIoTSE-DTNS architecture;

FIG. 2 is a schematic diagram of the working principle of IIoTSE-DTNS architecture;

FIG. 3 is a schematic diagram of the information transmission from the IIoT device to the destination user;

FIG. 4 is a diagram of a training framework for DT PER-MADCAC algorithm.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may be practiced or carried out in other embodiments that depart from the specific details, and the details of the present description may be modified or varied from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the present invention by way of illustration, and the following embodiments and features in the embodiments may be combined with each other without conflict.

Wherein the drawings are for illustrative purposes only and are shown in schematic, non-physical, and not intended to limit the invention; for the purpose of better illustrating embodiments of the invention, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

The same or similar reference numbers in the drawings of embodiments of the invention correspond to the same or similar components; in the description of the present invention, it should be understood that, if there are terms such as "upper", "lower", "left", "right", "front", "rear", etc., that indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, it is only for convenience of describing the present invention and simplifying the description, but not for indicating or suggesting that the referred device or element must have a specific azimuth, be constructed and operated in a specific azimuth, so that the terms describing the positional relationship in the drawings are merely for exemplary illustration and should not be construed as limiting the present invention, and that the specific meaning of the above terms may be understood by those of ordinary skill in the art according to the specific circumstances.

Referring to fig. 1 to 4, the invention provides a method for allocating network slice resources assisted by digital twinning in an industrial internet of things, which comprises the following steps:

s1: in the scene of industrial Internet of things, an IIoTSE-DTNS architecture is designed; FIG. 1 shows a diagram of IIoTSE-DTNS architecture.

At the physical layer, the physical network is regarded as an undirected graph G whose vertices are the physical nodes and whose edges are the physical links; two physical nodes are denoted u and v, and the physical link between them is denoted uv. The numbers of nodes and links are the cardinalities of the node set and the link set, respectively. A further set represents the IIoT devices, its cardinality being the number of IIoT devices, and another set represents the destination users. $C_u$ represents the computing resources of node u, and $B_{(u,v)}$ represents the bandwidth resources on link uv.

At the application layer, a directed service function graph $G_S$ is used to represent the SFC $R_s$. The service function chain has $L_s$ VNFs arranged in a particular order, and $f_l^s$ represents the l-th VNF of SFC chain $G_S$. Further symbols denote the computing resources required by the l-th VNF of service s on node u, the computing resources required by the state estimation VNF $f_E$ on node u, and the bit rate of service s of device m; $D_s$ is the packet size of service s. Each service s has only one source function $o_s$ and one destination function $o_d$. The physical nodes on which the source function and the destination function are placed are called the source node and the destination node, and they form the source node set and the destination node set, respectively.

In addition to the traffic VNFs, the network includes a state estimation VNF $f_E$ for performing state estimation on IIoT devices. The sensing data packets used for state estimation are much smaller than the traffic data packets. The computing resources required by the state estimation VNF $f_E$ of device m on node u are defined accordingly.

The digital twin network slice layer comprises a twin network layer data domain, a digital twin network slice model domain and a digital twin network slice management domain, which correspond respectively to three subsystems: the data sharing warehouse, the service mapping model and network twin management.

The IIoTSE-DTNS workflow is shown in FIG. 2 and is divided into the following five steps:

1) Data acquisition: the digital twin is built on data. The remaining computing resources and remaining storage resources of the infrastructure in the physical network layer, the remaining bandwidth resources of the links between infrastructure elements, the VNF types, the routing tables and the state sensing data are collected and uploaded to the digital twin network slice layer through the twin southbound interface, providing data support for that layer.

2) Virtual-real mapping model construction: based on data in the data sharing warehouse, such as basic network element configuration information, network operating state configuration information and network topology configuration information, a digital twin network corresponding to the physical network is established, completing an accurate description of the physical network.

3) The digital twin network slice layer designs functions according to the service requirements input from the network application layer through the twin northbound interface, and formulates resource allocation strategies such as VNF deployment and flow scheduling while guaranteeing overall performance.

4) After the digital twin network slice layer obtains the intent of the network application layer, it performs theoretical simulation and calls existing capabilities to analyze the virtual simulation results within the layer, ensuring that network performance is not significantly degraded and that network resources are used effectively. The network simulation results can be visualized in the digital twin network slice management module to demonstrate the accuracy and effectiveness of the twin network strategy.

5) Control issuing: network parameters and network performance are optimized iteratively following the "simulation-analysis-tuning-simulation" process; after tuning is completed, the simulated scheme is issued to the physical network for actual deployment, which reduces network cost and deployment risk.

VNF orchestration refers to activating and deactivating deployed VNFs. The activation states of the traffic VNFs and the state estimation VNF are defined as follows:

The flow scheduling policy is related to the mapping between virtual links and physical links:

The core network delay includes a processing delay and propagation and transmission delays, where $d_{uv}$ is the distance between nodes u and v, c is the speed of light, and $\eta_{uv}$ is the spectral efficiency of link uv.
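The propagation and transmission components of the core network delay can be sketched in Python as follows. The exact expressions are not legible in this text, so the sketch assumes the usual forms: propagation delay is distance divided by the speed of light, and transmission delay is packet size divided by a link rate taken as bandwidth times spectral efficiency. All function and parameter names are illustrative.

```python
SPEED_OF_LIGHT = 299_792_458.0  # c, in metres per second


def propagation_delay(distance_m: float) -> float:
    """Time for a signal to travel distance d_uv at the speed of light."""
    return distance_m / SPEED_OF_LIGHT


def transmission_delay(packet_bits: float, bandwidth_hz: float, spectral_eff: float) -> float:
    """Time to push a packet onto the link.

    Assumed link rate: bandwidth (Hz) times spectral efficiency (bit/s/Hz).
    """
    rate_bps = bandwidth_hz * spectral_eff
    if rate_bps <= 0.0:
        raise ValueError("link rate must be positive")
    return packet_bits / rate_bps


def link_delay(distance_m: float, packet_bits: float,
               bandwidth_hz: float, spectral_eff: float) -> float:
    """Propagation plus transmission delay on one physical link uv."""
    return (propagation_delay(distance_m)
            + transmission_delay(packet_bits, bandwidth_hz, spectral_eff))
```

Summing `link_delay` over the links of a flow's path and adding the per-node processing delay would give a core network delay that can be compared against the slot duration.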

The consumption of computing resources on a physical node includes the computing resources consumed by the traffic VNFs and those consumed by the state estimation VNF:

The flow rate of the network slices on a link between physical nodes is defined as:

Assume that the total bandwidths of the uplink (UL) and downlink (DL) access networks are each divided into subcarriers, I of them in the uplink, indexed by an uplink subcarrier set and a downlink subcarrier set, respectively. In each direction the total allocated bandwidth does not exceed the given total bandwidth budget, where $b_i$ represents the bandwidth of uplink subcarrier i, and $B_U$ and its downlink counterpart are the total bandwidth budgets of the UL and DL access networks. An integer variable describes the allocation of subcarriers to service s on the link between IIoT device m and node u: it equals 1 if subcarrier i is allocated to service s on that link, and 0 otherwise. A second variable indicates that subcarrier i is allocated to the link between IIoT device m and node u for transmitting sensing data. A third integer variable describes the allocation of downlink subcarriers to service s on the link between destination node u and destination user $j \in V_{user}$: it equals 1 if the subcarrier is allocated to that link, and 0 otherwise. The traffic transmission rate of the UL access network is defined as follows:

where the two symbols denote, respectively, the power allocated to service s of device m on subcarrier i and the channel gain between device m and node u on subcarrier i. The transmission rate of the sensing data is defined similarly, using the power allocated to device m on subcarrier i for transmitting sensing data. The transmission rate of the DL access network is likewise defined in terms of the power on each downlink subcarrier and the channel gain between destination node u and destination user j on that subcarrier.
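As a rough illustration of how such a rate could be evaluated, the Python sketch below sums a Shannon-type rate over the subcarriers allocated to one link. The Shannon form and the noise power `noise_w` are assumptions made for this example, since the rate expressions themselves are not legible in this text; the function and parameter names are likewise illustrative.

```python
import math
from typing import Sequence


def access_rate(alloc: Sequence[int], bandwidth_hz: Sequence[float],
                power_w: Sequence[float], gain: Sequence[float],
                noise_w: float) -> float:
    """Sum of per-subcarrier rates b_i * log2(1 + p_i * h_i / noise).

    alloc[i] is the 0/1 subcarrier allocation variable for this link;
    subcarriers that are not allocated contribute nothing.
    """
    if noise_w <= 0.0:
        raise ValueError("noise power must be positive")
    rate = 0.0
    for k, b, p, h in zip(alloc, bandwidth_hz, power_w, gain):
        if k:
            rate += b * math.log2(1.0 + p * h / noise_w)
    return rate


if __name__ == "__main__":
    # Two 1 MHz subcarriers, only the first allocated; SNR of 3 gives 2 bit/s/Hz.
    print(access_rate([1, 0], [1e6, 1e6], [1.0, 1.0], [3.0, 3.0], 1.0))
```

The same function could serve the uplink traffic rate, the sensing data rate and the downlink rate by passing the corresponding allocation, power and gain vectors.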

The access network delay consists of the transmission delay and the propagation delay. The uplink and downlink access network delays are calculated accordingly, where $d_{mu}$ is the distance between device m and source node u, and $d_{uj}$ is the distance between destination node u and destination user j.

Define the information ages (AoI) of the data packet of service s of IIoT device m at time step (TS) t at source node u, at destination node u and at destination user j, together with their respective upper limits. Define a binary variable that indicates whether the data packet $l_{ms}(t)$ of service s of the m-th IIoT device is successfully transmitted at time step t: it equals 1 if the data packet is delivered in time step t, and 0 otherwise. The information age at the source node equals 0 at t = 0. If a data packet successfully arrives at source node u at TS t, the information age at the source node equals the sum $\tau_{UL}$ of the propagation delay and the transmission delay from device m to node u; otherwise it increases by one slot duration $\delta$. The update expression is as follows:
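The source-node rule just described (reset to the uplink delay on a successful delivery, otherwise grow by one slot) can be sketched in Python. The function names and the list-based trace are illustrative choices; only the rule itself comes from the description.

```python
from typing import List, Sequence


def update_source_aoi(prev_aoi: float, delivered: bool,
                      tau_ul: float, slot: float) -> float:
    """One AoI step at the source node.

    delivered: whether a packet reached the source node in this time step.
    tau_ul:    propagation plus transmission delay from device m to node u.
    slot:      duration of one time slot.
    """
    return tau_ul if delivered else prev_aoi + slot


def source_aoi_trace(deliveries: Sequence[bool], tau_ul: float, slot: float) -> List[float]:
    """AoI after each time step, starting from an AoI of 0 at t = 0."""
    aoi = 0.0
    trace = []
    for delivered in deliveries:
        aoi = update_source_aoi(aoi, delivered, tau_ul, slot)
        trace.append(aoi)
    return trace


if __name__ == "__main__":
    # A miss, a delivery, then two misses in a row.
    print(source_aoi_trace([False, True, False, False], tau_ul=0.25, slot=1.0))
```

The trace shows the characteristic sawtooth: the age drops to the uplink delay whenever a packet arrives and climbs linearly between arrivals.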

At the destination node, three cases are distinguished: (1) the first case is expressed through $\chi_{sm}(t)$, which describes the VNF placement of service s of IIoT device m; $L_s$ represents the number of VNFs of service s, and when, in time slot t, $L_s$ nodes and $L_s$ links are assigned to the respective $L_s$ VNFs of service s, $\chi_{sm}(t) = (L_s)^2$; (2) no data packet is transmitted from IIoT device m to source node u, or the data transmission fails in the access network, in TS t-1, but the VNFs are successfully placed on the nodes in TS t; (3) if neither a data packet is transmitted from IIoT device m to source node u nor the VNFs are successfully placed, the information age at the destination node increases by one time slot $\delta$ at time t. In summary, the update is as follows:

At destination user j, four cases are distinguished: 1) service s of device m successfully sends a data packet to source node u in TS t-2, the VNFs are successfully placed in TS t-1, and destination node u successfully sends the data packet to destination user j in TS t; 2) no new data packet is received at source node u in TS t-2, but the VNFs are successfully placed in TS t-1 and the data packet is successfully sent from destination node u to destination user j in TS t; 3) no new data packet is received at the source node in TS t-2 and the VNFs are not successfully placed in TS t-1, but a data packet is successfully transmitted from destination node u to destination user j in TS t; 4) if none of the above conditions is met, the information age $Z_{sm}(t)$ at destination user j increases by one TS length $\delta$. The update is as follows:

Information age balance: the sensing data of the IIoT is very sensitive to delay, so ensuring the freshness of information is crucial. The degree of balance among the information ages of a service data packet at the source node, the destination node and the destination user reflects whether the resources for transmitting and processing data in the network are allocated in a reasonably balanced way. The information age balance value is calculated as follows:

When the AoI at a location is smaller than the prescribed upper limit, the balance value at that location is 1; when the AoI of the service packet exceeds the prescribed upper limit, the information age balance value is calculated according to formula (10), which averages the balance values at the source node, the destination node and the destination user to give the total information age balance value, whose range is [0, 1].

Assume that the state estimation VNF $f_E$ has been deployed on P nodes. The association of the devices with the state estimation VNFs is expressed as a matrix $J(t)$:

where each entry represents the association of IIoT device m with state estimation node p; an associated pair indicates that the sensing data of the device is transmitted to node p and estimated by the state estimation VNF there.

The linear system under consideration is monitored cooperatively by a plurality of sensors. The discrete linear time-invariant system is described by $\Gamma_m(t+1) = H\Gamma_m(t) + v_m(t)$ and $\Lambda_m(t) = C_m\Gamma_m(t) + w_m(t)$, $m \in M$. $\Gamma_m(t)$ is the system state obtained by IIoT device m at TS t, and $\Lambda_m(t)$ is the measurement of IIoT device m at TS t. $H$ is the state transition matrix of the system, $C_m$ is the measurement matrix of device m, and $v_m(t)$ and $w_m(t)$ are mutually independent zero-mean noise terms with their respective variances.

The transmission performance of the sensor data directly affects the estimation performance. The invention uses the AoI to represent the delay, and state estimation is performed at the base station based on the received sensor data. Given the latest state information received at node u, the following holds:

The state estimation node u uses the received sensing information to estimate the system state, and the estimate is given by:

where $F_m(t)$ is the Kalman filter gain of IIoT device m.
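For orientation, the Python sketch below performs one predict-and-correct step of a textbook scalar Kalman filter for the model $\Gamma(t+1) = H\Gamma(t) + v(t)$, $\Lambda(t) = C\Gamma(t) + w(t)$. It is a standard filter under the stated assumptions, not the AoI-dependent estimator of the invention, whose expression is not legible in this text; the scalar setting and all names are illustrative.

```python
from typing import Tuple


def kalman_step(x_est: float, p_est: float, z: float,
                h: float, c: float, q: float, r: float) -> Tuple[float, float, float]:
    """One scalar Kalman filter step.

    x_est, p_est: previous state estimate and its error variance.
    z:            new measurement (Lambda).
    h, c:         state transition and measurement coefficients (H, C_m).
    q, r:         process and measurement noise variances.
    Returns the updated estimate, updated error variance and the gain F.
    """
    if r <= 0.0:
        raise ValueError("measurement noise variance must be positive")
    # Predict the state and its error variance one step ahead.
    x_pred = h * x_est
    p_pred = h * p_est * h + q
    # Kalman gain weighs the prediction against the new measurement.
    gain = p_pred * c / (c * p_pred * c + r)
    # Correct the prediction with the measurement residual.
    x_new = x_pred + gain * (z - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new, gain


if __name__ == "__main__":
    print(kalman_step(x_est=0.0, p_est=1.0, z=2.0, h=1.0, c=1.0, q=0.0, r=1.0))
```

When the received measurement is stale, one plausible adaptation is to apply the prediction part repeatedly, once per slot of age, before correcting; that adaptation is an assumption and is not taken from this text.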

The estimation performance is evaluated by the estimation distortion, which is expressed as a function of the AoI-based estimation error:

For a service, the following costs are incurred:

The setup cost is incurred only when a VNF needs to be activated but was not active in the previous slot. Given the unit setup cost of VNF $f_l^s$, the setup cost is defined as
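A minimal Python sketch of this rule follows: a setup cost is charged for every (VNF, node) pair that is active in the current slot but was inactive in the previous one. The dictionary-based placement representation, the per-VNF unit cost table and the identifiers used in the example are hypothetical.

```python
from typing import Dict, Tuple

# (vnf_name, node_name) -> 1 if the VNF is active on that node, else 0
Placement = Dict[Tuple[str, str], int]


def setup_cost(prev: Placement, curr: Placement, unit_cost: Dict[str, float]) -> float:
    """Total setup cost for VNF instances newly activated in the current slot."""
    total = 0.0
    for (vnf, node), active in curr.items():
        was_active = prev.get((vnf, node), 0)
        if active and not was_active:
            total += unit_cost[vnf]
    return total


if __name__ == "__main__":
    previous = {("f1", "u1"): 0, ("f2", "u1"): 1}
    current = {("f1", "u1"): 1, ("f2", "u1"): 1}
    # Only f1 is newly activated, so only its setup cost is charged.
    print(setup_cost(previous, current, {"f1": 3.0, "f2": 5.0}))
```

Because already-active instances are free to keep, a cost of this kind discourages strategies that repeatedly deactivate and reactivate the same VNF.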

After a VNF is activated on a server, the network provider charges an operating cost, determined by the unit operating cost of VNF $f_l^s$ on node u. The total operating cost can be expressed as

The communication cost of transporting a traffic flow from the server hosting its parent function is defined as

where the coefficient is the unit communication cost on link uv.

The uplink and downlink wireless resource costs of the access network are respectively:

where the two coefficients respectively represent the cost of transmitting at unit rate on uplink subcarrier i and on the corresponding downlink subcarrier.

$$C_S(t) = C^{SU}(t) + C^{OP}(t) + C^{TR}(t) + C^{UW}(t) + C^{DW}(t)$$

The business profit weighted by the information age balance is as follows:

where $\mu_1$ is a normalization constant and $r_{ms}$ is the ideal return per unit of traffic of service s of IIoT device m.

For state estimation, the following costs are incurred:

The running cost of the state estimation VNF is:

where the coefficient is the setup cost of VNF $f_E$.

A device associates with a node on which a state estimation VNF is deployed in order to transmit sensing data for state estimation. The association cost is modeled as an exponential model, as shown below, where $v_{mp}$ is a weight coefficient, $d_{mp}$ is the distance from device m to state estimation node p, and $\phi^{REL}$ is the unit association cost.

The total state estimation cost is the sum of the running cost of the state estimation VNF and the associated cost:

$$C_E(t) = C^{SE}(t) + C^{REL}(t)$$

The state estimation profit is defined as follows:

where $\mu_2$ and $\mu_3$ are normalization constants, T is the total number of time slots, and $r_m$ is the reward per unit of sensing data of IIoT device m used for state estimation.

The optimization objective is as follows, subject to constraints C1-C14:

C1 ensures that each physical node supports at most one VNF per SFC; C2 ensures that only one physical node activates VNF $f_l^s$ of SFC $R_s$ in one slot; C3 ensures flow conservation, where $V_O(u)$ and $V_I(u)$ denote the sets of child vertices (i.e., egress nodes) and parent vertices (i.e., ingress nodes) of u in the infrastructure network G; C4 and C5 are the single-path flow balance constraints at the source node and the destination node; C6 ensures that the core network delay does not exceed the duration $\delta$ of one slot, the delay consisting of the processing delay and the propagation and transmission delays, where $d_{uv}$ is the distance between nodes u and v and c is the speed of light; C7 and C8 represent the capacity limits of nodes and links; C9 and C11 represent limits on the uplink and downlink transmission rates; C10 states that each uplink subcarrier can be allocated to at most one IIoT device to transmit sensing data or service data; C12 states that each downlink subcarrier can be allocated to at most one IIoT device to transmit service data; C13 and C14 represent the uplink and downlink access delay constraints.

The DQN algorithm introduces a target network for computing the target value. The Q function estimated by the neural network is expressed as $Q(s(t), a(t); \theta(t))$, where the parameter $\theta(t)$ denotes the neural network weights; the neural network is trained with the updated values to approximate the true Q value. The loss function $L(\theta(t))$ is minimized in each iteration to update the network parameters, as follows:

$$L(\theta(t)) = \mathbb{E}\left[\left(y(t) - Q(s(t), a(t); \theta(t))\right)^2\right]$$

where the target value is $y(t) = r(t) + \gamma \max_{a(t+1)} Q(s(t+1), a(t+1); \theta'(t))$, $a(t+1)$ represents the action generated by the DNN in TS t+1 in the given state $s(t+1)$, and $\theta'(t)$ represents the parameters of the target network. The target network in DQN uses hard updates, and the expectation is taken over a mini-batch sampled from D.
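The target value and the mini-batch loss can be sketched in plain Python. Tabular Q values (nested lists indexed by state and action) stand in for the online and target networks here, which is a simplification made only for illustration; the function names and the transition layout are likewise assumptions.

```python
from typing import List, Sequence, Tuple

# (state index, action index, reward, next state index)
Transition = Tuple[int, int, float, int]


def td_target(reward: float, gamma: float, next_q: Sequence[float]) -> float:
    """y = r + gamma * max over next actions of the target network's Q values."""
    return reward + gamma * max(next_q)


def dqn_loss(batch: Sequence[Transition], q_online: List[List[float]],
             q_target: List[List[float]], gamma: float) -> float:
    """Mean squared difference between targets and online Q values over a mini-batch."""
    if not batch:
        raise ValueError("mini-batch must not be empty")
    squared_errors = [
        (td_target(r, gamma, q_target[s_next]) - q_online[s][a]) ** 2
        for s, a, r, s_next in batch
    ]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    q_online = [[1.0, 0.0], [0.0, 0.0]]
    q_target = [[0.0, 0.0], [2.0, 4.0]]
    # Target is 1.0 + 0.5 * 4.0 = 3.0; online value is 1.0; squared error is 4.0.
    print(dqn_loss([(0, 0, 1.0, 1)], q_online, q_target, gamma=0.5))
```

Keeping `q_target` separate from `q_online` mirrors the role of the target network: the targets stay fixed while the online values are being fitted.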

Depth Deterministic Policy Gradient (DDPG) adds policy network and target policy network on its basis. DDPG comprises two independent neural networks, an actor network and a critic network. Actors output continuous actions a (t) through deterministic strategies pi (a (t) |s (t); ω (t)), and critics calculate losses through Q (s (t), a (t); θ (t)) to the Q value of the estimated network:actor parameters are updated by applying a strategy gradient method. The gradient of the loss function is used to update the parameter θ of the commentator DNN as follows:

wherein alpha is _c Is the learning rate of criticists. Further, since the parameter θ is updated using a small batch of size I, there are:

The TD error is used as the prioritized experience replay (PER) index: the TD error $\delta_i$ of sample $i$ serves as the measure of sample importance, and the priority of sample $i$ is $p_i = |\delta_i| + \zeta$, where $\zeta$ is a small constant that prevents sample $i$ from being drawn with probability 0. For the $N_{sp}$ tuples in $D$, the time complexity of sampling and updating the TD errors is $O(N_{sp})$. The probability of drawing sample $i$ is $P(i) = p_i^{\alpha} / \sum_k p_k^{\alpha}$, where $\alpha = 0$ gives uniform sampling and $\alpha = 1$ gives greedy, priority-driven sampling. The importance weight of sample $i$ is derived from its sampling probability. Compared with uniform sampling, PER adjusts the sampling probability according to sample importance and favors high-importance samples, which improves learning efficiency and speeds up convergence.
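The priority, sampling probability and importance weight can be illustrated with a short NumPy sketch. The priority follows $p_i = |\delta_i| + \zeta$ from the text; the proportional sampling rule and the weight $(N \cdot P(i))^{-\beta}$ normalized by its maximum are the commonly used PER forms and are assumed here, as are the exponent name `beta` and all function names.

```python
import numpy as np

def priorities(td_errors, zeta=1e-6):
    """p_i = |delta_i| + zeta, so no sample has zero priority."""
    return np.abs(td_errors) + zeta

def sampling_probabilities(p, alpha):
    """P(i) = p_i^alpha / sum_k p_k^alpha; alpha = 0 yields uniform sampling."""
    scaled = p ** alpha
    return scaled / scaled.sum()

def importance_weights(probs, beta):
    """Assumed form: w_i = (N * P(i))^(-beta), scaled so the largest weight is 1."""
    n = len(probs)
    w = (n * probs) ** (-beta)
    return w / w.max()

def sample_batch(probs, batch_size, rng):
    """Draw replay-buffer indices according to the sampling probabilities."""
    return rng.choice(len(probs), size=batch_size, p=probs)
```

A call such as `sample_batch(probs, 32, np.random.default_rng(0))` returns the indices of the tuples to replay; the matching weights would then scale each sample's contribution to the critic loss.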

A PER-DCAC algorithm is proposed that combines the DQN and DDPG algorithms and introduces a prioritized experience replay mechanism for importance sampling. It adjusts the probability with which each tuple is sampled by assigning the tuples weights, thereby improving training efficiency and performance.

The optimization objective is expressed as a Markov decision process (MDP). Formally, the MDP is characterized by the 5-tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{P}, \gamma \rangle$, where $\mathcal{S}$ and $\mathcal{A}$ are the state space and action space of the agent, $\mathcal{R}$ is the reward function, $\mathcal{P}$ is the transition probability of the MDP, and $\gamma \in (0, 1)$ is the discount factor. The state space, action space and reward function of the MDP are defined as follows:

State space: the state $s(t)$ at time slot $t$ is a vector comprising the resource (node and bandwidth) usage of a particular NS, the UL/DL channel conditions (channel gains) and the traffic of each SFC. The first element is the computing resources of all physical nodes occupied by the flows of the network slice, where $c_u$ denotes the computing resource of node $u$ occupied by those flows. The second element is the bandwidth resources of all physical links occupied by the flows in the NS, where $b_e$ is the bandwidth resource of link $e$ occupied by the flows in the network slice. The third element is the wireless transmission channel gain. The fourth element is the request rate of the flow of device $m$ at time slot $t$.

Action space: the action includes VNF activation, flow scheduling, the association of devices with state estimation nodes, UL/DL transmission power and subcarrier allocation, and is defined as $a(t) = \langle x(t), y(t), J(t), \rho(t), p(t) \rangle$.

Reward function: the reward function is formulated from the objective of the DNSO problem:

where $r = U_S + U_E$. Note that once the state and action at $t$ are given, the reward at $t$ is deterministic. The negative infinity in the reward function expresses the importance of the constraints: if the agent fails to satisfy a constraint, it is penalized with a very large negative reward.
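The constraint-penalized reward described above can be sketched as follows. This is only an illustration, assuming the constraint checks are available as a list of booleans; the function name, the argument names and the `penalty` parameter are hypothetical.

```python
import math

def reward(u_s, u_e, constraints_satisfied, penalty=-math.inf):
    """Return U_S + U_E when every constraint holds, otherwise the penalty.

    constraints_satisfied is an iterable of booleans, one per constraint
    (for example the checks for C1 to C14).
    """
    if not all(constraints_satisfied):
        return penalty
    return u_s + u_e
```

Because an infinite reward would propagate into the target value $y(t)$, a practical implementation may pass a large finite value for `penalty`, which matches the "very large negative reward" wording of the text.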

To handle both the discrete actions (VNF orchestration, flow scheduling, device association and subcarrier allocation) and the continuous action (transmit power allocation), the optimal resource allocation strategy is divided into two parts: a strategy for jointly selecting the optimal node and the appropriate subcarrier (discrete actions) and a strategy for optimal transmit power allocation (continuous action). The discrete action that maximizes the Q function value in the current state $s$ is:

$$a(t) = \arg\max_{a(t)} Q^{*}\left(s(t), a(t); \theta; \pi^{*}(a(t) \mid s(t); \omega)\right)$$

where the policy $\pi^{*}(a(t) \mid s(t); \omega)$ provides the optimal transmit power for a given state $s(t)$ and discrete action. Because the state and action spaces are high-dimensional, accurately estimating $Q^{*}(s(t), a(t); \theta)$ and $\pi^{*}(a(t) \mid s(t); \omega)$ is very time-consuming, so neural networks are used to approximate the functions $Q^{*}$ and $\pi^{*}$.
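The two-part selection described above, in which the actor supplies the transmit power and the critic picks the discrete action, can be sketched as follows. The `actor` and `critic` arguments are hypothetical callables standing in for the trained networks, and enumerating the discrete actions in a loop is an assumption that only suits small action sets.

```python
def select_hybrid_action(state, discrete_actions, actor, critic):
    """Return the (discrete action, transmit power) pair with the highest Q value.

    actor(state, a_d) returns the continuous transmit power proposed for the
    discrete action a_d, and critic(state, a_d, power) returns its Q value.
    """
    best = None
    for a_d in discrete_actions:
        power = actor(state, a_d)       # continuous part from the policy
        q = critic(state, a_d, power)   # Q value of the combined action
        if best is None or q > best[0]:
            best = (q, a_d, power)
    _, best_action, best_power = best
    return best_action, best_power
```

In a real network the critic would usually output the Q values of all discrete actions in a single forward pass rather than being called once per action.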

Because the learning ability of a single agent (SA) is limited, and to better match practical environments, distributed multi-agent collaborative learning is adopted: each agent sends its own observations and actions to a central controller for centralized training, and the agents then make decisions in a distributed manner based on their own observations. The DT PER-MADCAC algorithm comprises the following steps:

S53: For $N_{sp}$, obtain the initial state from the basic model of the digital twin network slice layer, select an action at time $t$ for each agent according to the current policy, add Ornstein-Uhlenbeck (OU) noise to the action, and observe the state at time $t+1$. If the constraints of the optimization problem are satisfied in the next state, assign the maximum priority to the <state, action, reward, next state> tuple and store it in $D$; otherwise execute S54;

S54: For the $N_{sp}$ tuples in $D$, compute the action at time $t+1$ according to the policy and add OU noise. Compute the target value $y$ and the TD error from the action at time $t+1$, and update the sample priorities;

s56: and calculating a loss function, and updating the commentary network parameters according to the loss function. The gradient of the Q function values of all samples to the continuous motion is calculated and the actor's home network parameters are updated based thereon. And updating the target critics and the target actor parameters according to the soft update.

Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims

1. A digital twin-assisted network slice resource allocation method in the industrial Internet of things is characterized in that: the method comprises the following steps:

2. The method for allocating network slice resources assisted by digital twinning in the industrial internet of things according to claim 1, wherein the method is characterized by comprising the following steps of: the IIoTSE-DTNS architecture comprises a physical layer, a digital twin network slice layer and an application layer;

3. The method for allocating network slice resources assisted by digital twinning in the industrial internet of things according to claim 2, wherein the method is characterized by: the functional model in the service mapping model comprises three parts, namely SFC management, algorithm function library and service management;

4. The method for allocating network slice resources assisted by digital twinning in the industrial internet of things according to claim 1, wherein the method is characterized by comprising the following steps of: the basic model in the service mapping model specifically comprises a network element model and a network topology model;

5. The method for allocating network slice resources assisted by digital twinning in the industrial internet of things according to claim 1, wherein the method is characterized by comprising the following steps of: in step S3, information age update expressions are constructed for the service $s$ data packet of the $m$-th IIoT device at time $t$ at the source node, the destination node and the destination user, respectively, and, from the information ages given by these three expressions, the information age balance calculation formula is constructed:

where $u$ denotes a node, $j$ denotes a user, and the three sets denote the set of source nodes, the set of destination nodes and the set of destination users, respectively.

6. The method for allocating network slice resources assisted by digital twinning in the industrial internet of things according to claim 1, wherein the method is characterized by comprising the following steps of: in step S4, the profit maximization problem expression weighted by the information age balance value is:

7. The method for allocating network slice resources assisted by digital twinning in the industrial internet of things according to claim 1, wherein the method is characterized by comprising the following steps of: the DT PER-MADCAC algorithm described in step S5 specifically includes the following steps:

S54: For the $N_{sp}$ tuples in $D$, compute the action at time $t+1$ according to the policy and add OU noise; compute the target value $y$ and the TD error from the action at time $t+1$, and update the sample priorities;

8. The method for allocating network slice resources assisted by digital twinning in the industrial internet of things of claim 7, wherein the method comprises the following steps: the sample priority in step S5 is:

$p_i = |\delta_i| + \zeta$