CN117939535A

CN117939535A - Dependency task unloading method, terminal and storage medium in V2V scene

Info

Publication number: CN117939535A
Application number: CN202410316670.8A
Authority: CN
Inventors: 张本宏; 何聪; 胡琪炜; 徐浩; 毕翔; 杜朝阳
Original assignee: Intelligent Manufacturing Institute of Hefei University Technology
Current assignee: Intelligent Manufacturing Institute of Hefei University Technology
Priority date: 2024-03-20
Filing date: 2024-03-20
Publication date: 2024-04-26
Anticipated expiration: 2044-03-20
Also published as: CN117939535B

Abstract

The invention relates to the technical field of internet of vehicles (IOT) edge unloading and discloses a dependent task unloading method, a terminal and a storage medium in a V2V scene. The offloading method divides all subtasks in a computing task into critical subtasks and non-critical subtasks. When the key subtasks are unloaded, the key subtasks are duplicated and unloaded to two service vehicles; when the non-critical subtasks are offloaded and the offloading fails, a second offloading is performed on them. And then, a kinematic model, a communication model, a subtask calculation delay model, a subtask execution priority model and a link reliability model between the task vehicle and the service vehicle are built, and a constraint problem for generating an optimal unloading strategy is built. Modeling the constraint problem as a Markov decision process, solving the Markov decision process by adopting a deep reinforcement learning algorithm to obtain an optimal unloading strategy, and unloading the computing task to be unloaded of the task vehicle to the service vehicle according to the optimal unloading strategy. The invention improves the reliability of the task-dependent unloading.

Description

Dependency task unloading method, terminal and storage medium in V2V scene

Technical Field

The invention relates to the technical field of internet of vehicles edge unloading, in particular to a task-dependent unloading method under a V2V scene, a computer terminal applying the method and a computer-readable storage medium.

Background

With the development of 5G and Internet of vehicles, the explosive growth of novel vehicle-mounted applications such as automatic driving, augmented reality, virtual reality and the like brings great convenience to the life of people. At the same time, however, more and more computationally intensive and delay-sensitive computing tasks are being created, but since the computing and storage resources of the vehicle itself are limited, there are insufficient resources to meet the resource requirements of these tasks, which constitutes a significant challenge in ensuring the quality of service required by the vehicle. V2V (Vehicle-to-Vehicle) unloading is considered a very promising approach to solve the above-mentioned problems. For V2V offloading, many scholars have made great progress in optimizing task delays, reducing energy consumption, and stimulating the joining and offloading of vehicles with idle computing resources. However, the influence of the high-speed mobility of the vehicle on the unloading reliability is not considered, and the influence of the vehicle maneuver on the unloading reliability is reduced by partial students in a relatively passive manner.

For example, ahsan et al, document "A novel contract theory-based incentive mechanism for cooperative task-offloading in electricalvehicular networks[J]. IEEE Transactions on Intelligent Transportation Systems,2021, 23(7): 8380-8395."（ discloses a mechanism for electric vehicle network collaborative task offloading excitation based on contract theory [ J ]. IEEE intelligent transportation system theory), which is capable of reducing the influence of high-speed mobility by setting excitation measures to cause a service vehicle and a task vehicle to increase or decrease acceleration of each other synchronously during offloading. For this study, there is a problem in implementation due to uncertainty of various situations in a real scene.

C-V2X movement perception dynamic unloading strategy [ J ]. Physical communication under "Mobility-aware dynamic offloading strategy for C-V2X under multi-access edge computing[J]. Physical Communication, 2021, 49: 101446."（ multi-access edge calculation proposed by Li et al reduces the influence caused by high-speed mobility by establishing evaluation indexes and selecting proper service vehicles. However, this study also has problems in that the currently selected suitable service vehicle may not be the optimal solution due to unpredictability of vehicle travel, and there may be cases where unloading fails.

Waheed et al propose "An infrastructure-assisted job scheduling and task coordination in volunteer computing-based VANET[J]. Complex&Intelligent Systems, 2023,9:3613-3633."（ to divide a task into several subtasks based on infrastructure assisted job scheduling and task coordination J. Complex and intelligent systems in volunteer computing VANET, to offload the subtasks to a vehicle with free computing resources in the same RSU (roadside unit) for computing, to replicate the task and to select one more vehicle for offloading if the subtasks are offloaded to the vehicle for vehicles that may be driving out of the RSU communication range. For the study of Waheed et al, firstly, it does not consider the situation in the RSU-free scenario, and secondly, due to the unpredictable vehicle travel trajectory, it is impossible to determine whether the vehicle will travel out of the communication range for all service vehicles based on the existing vehicle speed and distance from the RSU, and it is possible for all service vehicles to travel out of the communication range.

In summary, the above research focuses on offloading strategies for independent tasks, only for independent tasks, and not for dependent tasks. However, with the development of technology, how to offload a plurality of interdependent subtasks and ensure the reliability of the task offload has become a urgent problem in vehicle network research.

Disclosure of Invention

In order to avoid and overcome the technical problem of lower reliability of dependent task unloading in the existing V2V unloading technology, the invention provides a dependent task unloading method, a terminal and a storage medium in a V2V scene.

In order to achieve the above purpose, the present invention provides the following technical solutions:

The invention discloses a dependent task unloading method in a V2V scene, which comprises the following steps of S1-S3.

S1, taking the urgency degree of a plurality of subtasks with dependency relations in a task vehicle computing task as a dividing standard, and dividing all the subtasks in the computing task into a critical subtask and a non-critical subtask.

When the key subtasks are unloaded, the key subtasks are copied and unloaded to two service vehicles; and when the non-critical subtasks are unloaded and the unloading fails, unloading the subtasks which fail to be unloaded for the second time.

S2, respectively constructing a kinematic model, a communication model, a subtask calculation delay model, a subtask execution priority model and a link reliability model between the task vehicle and the service vehicle, and accordingly establishing a constraint problem for generating an optimal unloading strategy.

And S3, modeling the constraint problem as a Markov decision process, solving the Markov decision process by adopting a deep reinforcement learning algorithm to obtain an optimal unloading strategy, and unloading subtasks to be unloaded of the task vehicle to the service vehicle according to the optimal unloading strategy.

As a further improvement of the above solution, in step S1, the method of dividing all the subtasks in the computing task into critical subtasks and non-critical subtasks includes the following specific steps, namely S11-S13.

S11, calculating the earliest unloading starting time and the latest unloading ending time of each subtask, wherein the calculation formula is as follows:

In the method, in the process of the invention, Representing the ith subtask/>, of the mission vehicle mIs the latest unloading end time of (3); /(I)Representing subtasks/>Is the earliest unload start time in (a); /(I)Representing subtasks/>Is a subsequent sub-task of (a); /(I)Representing subtasks/>Is a preamble sub-task of (2); /(I)Representing subtasks/>Data volume of/>Representing processing subtasks/>Computing resources required for each bit of data; f _max represents the most free resources of the service vehicle n; /(I)Representing subtasks/>Is a communication delay of (1); for subtasks/>, without a predecessor subtask，/>Is 0; for subtasks without subsequent subtasks/>，/>To calculate the maximum tolerable delay of the task.

S12, calculating the urgent degree of each subtask, wherein the expression formula is as follows:

In the method, in the process of the invention, Representing subtasks/>Is the urgency of (a); /(I)For subtask/>Is used for the loading of the load onto the vehicle.

S13, comparing the urgency degree of each subtask with a preset threshold value; and when the urgency degree of the subtask is not higher than the preset threshold, defining the subtask as a critical subtask, otherwise defining the subtask as a non-critical subtask.

As a further improvement of the above scheme, subtasksAverage unload time of/>The calculation formula of (2) is as follows:

Wherein n is the maximum number of service vehicles in the service vehicle set; f _j is the free computing resources of the j-th service vehicle.

As a further improvement of the above scheme, step S2 includes the following specific steps, namely S21-S26.

S21, constructing a kinematic model; the distance expression formula between the task vehicle and the service vehicle is as follows:

Wherein D _m,n (t) represents the distance between the task vehicle m and the service vehicle n at the time t when the sub-task execution phase is completed; d _m,n(t₀) represents the distance between the task vehicle m and the service vehicle n at the start time t ₀ of subtask offloading, D _m,n(t₀)=x_m(t₀)－x_n(t₀),x_m(t₀) and x _n(t₀) represent the positions of the task vehicle m and the service vehicle n at t ₀, respectively; deltav (t ₀)=v_m(t₀)－v_n(t₀),v_m(t₀) and v _n(t₀) represent the speeds of mission vehicle m and service vehicle n at t ₀, respectively; Δa (t ₀)=a_m(t₀)－a_n(t₀),a_m(t₀) and a _n(t₀) represent the acceleration of the mission vehicle m and the service vehicle n at t ₀, respectively.

S22, constructing a communication model; wherein the transmission delay of the task vehicle for transmitting the subtasks to the service vehicle through the uploading channel is as followsAnd/>; Wherein R _m,n represents the transmission rate of the mission vehicle m to the service vehicle n; representing subtasks/> The size of the data volume transmitted to the service vehicle n is defined by the subtask/>Data volume per se/>And the sum of the data amounts transmitted by the preamble subtasks.

S23, constructing a subtask calculation delay model; wherein the subtasksDelay in servicing vehicle offload tasksAnd/>; Subtask/>The time required to complete the unloading process is/>And (2) and; Subtask/>Unload start time of/>Subtask/>Is (1) the unloading end time isAnd/>，/>; In the/>For subtask/>The unload end time of the preamble subtask.

S24, constructing a subtask execution priority model; wherein subtasks of task vehicle m are definedFor subtask/>Preamble subtasks of/>Execution priority of (3) is higher than/>；/>Is/>And (2) and; Where pri (p) is the subtask/>Execution priority of the preamble subtask.

S25, constructing a link reliability model; wherein, in unloading subtasksWhen the link reliability P _m,n,i between the service vehicle n and the mission vehicle m is expressed as follows:

Wherein e is a natural constant; t _theroy is the theoretical time for the service vehicle to travel out of the communication range of the task vehicle m at the current speed; the range of values of P _m,n,i (0, 1).

S26, establishing a constraint problem as shown below according to the model constructed in the steps S21-S25:

Wherein D _success represents the number of calculation tasks successfully unloaded; constraint C1 represents the last subtask completion time of task vehicle m The maximum tolerant time delay T _m of the whole calculation task is not exceeded; constraint C2 indicates that initial free resource F _n serving vehicle n is between most free resource F _max and least free resource F _min; constraint C3 represents subtask/>Execution priority/>Lower than the lowest execution priority/>, among all its predecessor subtasks; Constraint C4 represents a subtaskUnloading start time/>Not earlier than the latest unload end time/>, of all its predecessor subtasks; In constraint C5,/>And/>Representing a set of task vehicles and a set of service vehicles, respectively.

As a further improvement of the above scheme, in step S22, the calculation formula of the transmission rate R _m,n is:

Wherein B _m,n represents the bandwidth of the upload channel; p _m denotes the transmission power of the mission vehicle m; Represents the path loss between the service vehicle and the mission vehicle, delta represents the path loss factor, and l represents the distance between the service vehicle and the mission vehicle; h represents the channel fading factor of the uploading link; n ₀ represents gaussian white noise power.

As a further improvement of the above solution, in step S3, the markov decision process includes: state space, action space, and reward functions.

The state space is used for representing information parameters of the service vehicle in the communication range of the task vehicle and state parameters of the self-calculation task at the current moment at each moment, and the parameters are used as input states of a deep reinforcement learning algorithm and are represented as s (t ')= [ C (t '), Q (t ') ]; wherein s (t ') is an input state at any time t'; c (t ') is t', said vehicle queue being a member of the service vehicle; and (3) a task queue when Q (t ') is t', wherein the task queue is taken as a member by the subtask.

In the unloading process of a computing task, when one subtask returns a computing result, recalculating the execution priority of the not-scheduled subtask in the computing task; when the service vehicle does not return an unloading result within the expected completion time, judging that the unloading is failed, changing the execution priority of the corresponding subtasks to 0, and putting the subtasks into an unloading queue again to wait for calculation; wherein, for a critical subtask, the maximum value of the predicted completion times offloaded to two service vehicles is taken as the predicted completion time of the critical subtask. When the offloading of the subtask is completed, the execution priority of the subtask becomes 1.

The action space is used for selecting a service vehicle for the subtasks; for a subtask with execution priority of 0, the subtask can be unloaded, so that the subtask can be unloaded to a service vehicle for calculation; all action sets of the action space are represented as action= {1,2, …, n },1,2, …, n being the number of the service vehicle.

The calculation formula of the reward function is; Where r _i (t ') represents the task vehicle's scheduling subtasks/>, at tThe awards obtained; t _true is subtask/>Is calculated as/>。

As a further improvement of the above, in the vehicle train C (t'), each service vehicle member includes four elements, expressed as:

where x _n(t')、a_n(t')、v_n (t') represents the position, acceleration, and speed, respectively, of the service vehicle n at the current time.

In the task queue Q (t'), each subtask member includes eight elements, denoted:

In the method, in the process of the invention, Representing subtasks/>Maximum tolerated time delay of (2); choose denotes a service vehicle to which the subtask is offloaded.

As a further improvement of the above solution, in step S3, the deep reinforcement learning algorithm is DDPG algorithm.

The invention also discloses a computer terminal which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of the dependent task unloading method under the V2V scene when executing the program.

The invention also discloses a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the steps of the dependent task offloading method in the V2V scenario described above.

Compared with the prior art, the invention has the beneficial effects that:

1. According to the dependency task unloading method under the V2V scene, the concept of a key subtask is provided in consideration of different urgency degrees of each subtask of the dependency task, the subtasks are divided into the key subtask and the non-key subtask before the task is unloaded, and different unloading modes are provided for the subtasks, so that the unloading success rate of the key subtask is ensured as much as possible, the influence of high-speed mobility of a vehicle on the dependent task unloading is reduced, and the unloading success rate of the whole calculation task is improved. On the basis, the invention constructs constraint problems according to the conditions and characteristics of the task vehicles, the service vehicles and the subtask nodes, optimizes the unloading strategy by using a deep reinforcement learning method, and further ensures the unloading reliability of the dependent tasks.

2. The computer terminal and the computer readable storage medium disclosed by the invention can produce the same beneficial effects as the method by applying the task-dependent unloading method, and are not repeated here.

Drawings

FIG. 1 is a flowchart of a method for offloading dependent tasks in a V2V scenario in embodiment 1 of the present invention.

Fig. 2 is a directed acyclic graph of the formation of multiple subtasks in example 1 of the present invention.

Fig. 3 is a system model diagram of a mission vehicle and a service vehicle on a lane in the embodiment 1 of the present invention at the time of V2V communication.

FIG. 4 is a system model diagram of the task vehicle of FIG. 3 in offloading computing tasks to a service vehicle.

Fig. 5 is a bar graph of task success rates for various scenarios under different initial service vehicle numbers in embodiment 1 of the present invention.

Fig. 6 is a histogram of task success rates for various schemes of embodiment 1 of the present invention at different numbers of subtasks.

Fig. 7 is a bar chart of task success rates for various schemes in different communication ranges in embodiment 1 of the present invention.

Fig. 8 is a graph showing average time delay lines for various schemes of example 1 of the present invention for different initial service vehicle numbers.

Fig. 9 is a graph showing the average delay line for various schemes of example 1 according to the present invention at various numbers of subtasks.

Fig. 10 is a graph showing the average delay line for various schemes in example 1 of the present invention under different communication ranges.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

Referring to fig. 1, the present embodiment provides a method for unloading a dependent task in a V2V scenario, which is mainly divided into two parts, wherein the first part is to reduce the influence of high-speed mobility of a vehicle on the unloading of the dependent task, and the second part is to generate an unloading strategy scheme for the dependent task based on the first part scheme. Specifically, the unloading method includes the following steps, S1-S3.

To address the reliability problem of task offloading, reliability of offloading may be ensured by replicating and offloading subtasks of a computing task to two service vehicles. However, due to the limited number of service vehicles within the communication range of the task vehicle, all subtasks cannot be duplicated and offloaded. Therefore, the invention takes the urgency degree of a plurality of subtasks with dependency relations in the task vehicle computing task as a division standard, divides all the subtasks of the computing task into a key subtask and a non-key subtask, and provides an active replication mechanism based on the key subtask to solve the problem of reliability of task unloading.

The present embodiment classifies vehicles traveling on a road into two categories: mission vehicles and service vehicles. For a set of task vehicles and service vehicles, respectivelyAnd/>To represent. For the service vehicle n, its free computing resources are denoted as F _n,F_n∈[F_min,F_max. When the service vehicle has idle computing resources, the service vehicle can continuously broadcast own information (including position, speed and idle computing resources), and when the task vehicle needs to unload a computing task, the task vehicle selects a proper service vehicle to unload according to the received service vehicle information and computing task conditions. When the service vehicle receives the calculation task, a message is broadcast to the vehicles within the communication range, which indicates that the service vehicle has no free calculation resources, and when the calculation task is completed, the service vehicle can restart broadcasting the information.

The calculation task of the task vehicle m is denoted as D _m=(V_m,R_m,T_m), whereAll subtask sets representing task vehicle m, use/>Representing subtasks/>And subtask/>Dependency relationship between subtasks/>Is a subtask/>Preamble node, task/>After completion, task/>Can be performed,/>For subtask/>And subtask/>Data traffic between the two, T _m, is the maximum tolerated delay of the computing task.

As shown in fig. 2, each circle represents a subtask node, edges (a, b) represent the predecessor nodes of subtasks a and b, and the values of edges (a, b) represent the data traffic between subtasks, the dependency between subtasks being irreversible. Each subtask node is represented asWherein/>Data quantity (bits) representing subtask node i of vehicle m,/>Representing the number of CPU cycles required to process 1bit data of a task, i.e., the computational resources,/>, andRepresenting the maximum tolerant delay of the subtask node,/>Representing earliest offload start time for the subtask node,/>Indicating the latest offload completion time for that subtask node.

In step S1, the method for dividing all subtasks in the computing task into critical subtasks and non-critical subtasks comprises the following specific steps, namely S11-S13.

S11, calculating the earliest starting time and the latest finishing time of each subtask, wherein the calculation formula is as follows:

In the method, in the process of the invention, Representing subtasks/>Is a subsequent sub-task of (a); /(I)Representing subtasks/>Is a preamble sub-task of (2); /(I)Representing subtasks/>Is a communication delay of (1); for subtasks/>, without a predecessor subtask，/>Is 0; for subtasks without subsequent subtasks/>，/>To calculate the maximum tolerated time delay of the task, i.e., T _m.

In order to improve the unloading reliability and consider the limited service vehicle, the invention divides the subtasks into critical subtasks and non-critical subtasks and adopts different unloading strategies to unload the subtasks. Thus, step S12 is performed.

In the method, in the process of the invention, Representing subtasks/>Is the urgency of (a); /(I)For subtask/>Average unload time of/>The calculation formula of (2) is as follows:

Step S13 may be understood as a subtask satisfying the following conditions is defined as a critical subtask:

In the invention, when the subtask is unloaded and fails, for the critical subtask, if the service vehicle is reselected for the second unloading, if The time after the sub-task re-offload is completed may be greater than the latest completion time, which may result in a computing task failure. Therefore, in order to improve the reliability of unloading, the invention unloads the key subtasks to two service vehicles in a copying way when unloading the key subtasks, thereby improving the unloading success rate and the reliability. In addition, the invention adds sigma as a margin to the above conditional formula to reduce these effects, taking into account the variability of service vehicles, the variability of service vehicles' idle resources, and the communication delay, the sigma value range being (0, 1), the present embodiment sigma taking 0.5. For non-critical subtasks, only a second time of offloading subtasks when the subtask offload fails.

In this embodiment, step S2 includes the following specific steps, S21-S26.

S21, constructing a kinematic model. The speed difference between the task vehicle and the service vehicle may cause the distance between the two workshops to increase/decrease, resulting in connection interruption beyond the maximum communication range of the task vehicle, resulting in task failure. The offloading process is only completed when the service vehicle and the mission vehicle remain within V2V communication after the up-going and execution phases are completed. Therefore, the invention needs to calculate the distance between the task vehicle and the service vehicle at the moment when the sub-task execution stage is completed, if the value is smaller than the V2V maximum communication range, the sub-task is successfully unloaded, and as each automobile has respective speed and acceleration, the distance expression formula between the task vehicle and the service vehicle is as follows:

Wherein D _m,n (t) represents the distance between the task vehicle m and the service vehicle n at the time t when the sub-task execution phase is completed; d _m,n(t₀) represents the distance between the task vehicle m and the service vehicle n at the off-load task start time t ₀, D _m,n(t₀)=x_m(t₀)－x_n(t₀),x_m(t₀) and x _n(t₀) represent the positions of the task vehicle m and the service vehicle n at t ₀, respectively; deltav (t ₀)=v_m(t₀)－v_n(t₀),v_m(t₀) and v _n(t₀) represent the speeds of mission vehicle m and service vehicle n at t ₀, respectively; Δa (t ₀)=a_m(t₀)－a_n(t₀),a_m(t₀) and a _n(t₀) represent the acceleration of the mission vehicle m and the service vehicle n at t ₀, respectively.

In reality, however, the behavior of each car is independent, and the task vehicle only knows the speed of the service vehicle at the time when the off-load task starts, and cannot determine the speed of the service vehicle between the starting time and the task execution completion time. Therefore, the predicted distance D _m,n (t) is only a reference factor.

Wherein, the transmission rate R _m,n can be obtained by shannon formula:

Further, since the size of the output data of the calculation task is generally much smaller than the size of the input data, the present embodiment ignores the result return time in calculating the communication delay.

Wherein, when the subtaskWhen the first node of the computing task is the first node, then the subtask/>Is 0; besides, subtasks/>Is the maximum unload complete time for all of its immediate predecessor subtasks.

Wherein, when a subtask node is unloaded, the priority of the node becomes 1. When the execution priority of the subtask node becomes 0, the node can perform unloading calculation. By a first subtask of the mission vehicle mThe priority of each subtask node is recursively calculated throughout the computing task. As can be seen from the above execution priority formula, when the execution priority of all the direct-predecessor subtasks of a node becomes 1, the execution priority of the node is 0, and the unloading calculation can be performed.

S25, constructing a link reliability model. When the number of service vehicles with idle resources in the environment exceeds the current required number of sub-tasks to be offloaded, the invention uses the link reliability as an index to select the service vehicles to be offloaded. The reliability of the link can ensure the stability of the unloading process and reduce the possibility of sub-node task failure and retransmission. Wherein, in unloading subtasksWhen the link reliability P _m,n,i between the service vehicle n and the mission vehicle m is expressed as follows:

Wherein e is a natural constant; t _theroy is the theoretical time for the service vehicle to travel out of the communication range of the task vehicle m at the current speed; as can be seen from the above link reliability formulas, when the speed and acceleration of the mission vehicle m and the service vehicle n are identical at time t ₀, the link connection is considered to be very reliable, the larger the value of P _m,n,i, the better the link reliability between vehicles, i.e., the ratio of t _theroy to t The larger the link reliability is, the better the link reliability is.

S26, establishing constraint problems according to the model constructed in the steps S21-S25. The invention fully utilizes the computing resources of the service vehicle by offloading the subtask nodes to the service vehicle available in the Internet of vehicles environment. Due to the complexity and diversity of the vehicle network environment, there are multiple subtask nodes to be offloaded and multiple service vehicles available, and at the same time, each subtask node is different, and the idle computing resources and driving conditions of each service vehicle are also different. Thus, to maximize the offloading success rate of the computing tasks, the present invention determines an offloading policy based on the characteristics of each service vehicle and each subtask node. This problem is a linear integer programming problem, which has been demonstrated to be NP-HARD (non-deterministic polynomial), in order to maximize the offloading success rate, the present invention describes the constraint problem as:

Wherein D _success represents the number of calculation tasks successfully unloaded; constraint C1 represents the last subtask completion time of task vehicle m The maximum tolerance time delay of the whole calculation task is not exceeded; constraint C2 indicates that initial free resource F _n serving vehicle n is between most free resource F _max and least free resource F _min; constraint C3 represents subtask/>Execution priority/>Lower than the lowest execution priority/>, among all its predecessor subtasks; Constraint C4 represents a subtaskUnloading start time/>Not earlier than the latest unload end time/>, of all its predecessor subtasks。

For the NP-hard problem, the prior art is usually solved by using a heuristic algorithm, however, most of the heuristic algorithms in the prior art are unstable in the actual vehicle network, and cannot realize fast decision when facing the large-scale problem. The deep reinforcement learning algorithm is considered as an effective method for searching the optimal strategy in the complex dynamic system, and can be solved by modeling the problem as a Markov decision process and then by an algorithm based on the deep reinforcement learning algorithm, see step S3 in detail.

In the internet of vehicles environment defined in this embodiment, a group of task vehicles are set to offload the computing tasks with the dependency relationships. Because the computing resources of the vehicle itself cannot support completion of the computing task within the maximum tolerable latency of the computing task, the task vehicle needs to offload the computing task to a service vehicle within communication range. The invention aims to maximize the unloading success rate of a computing task by specifying an optimal unloading strategy. At the system start time, the task vehicle gathers information about all service vehicles within its communication range, inserts one service vehicle as a member into the vehicle queue C, each member of the queue comprising four elementsAnd respectively representing the position, acceleration and speed of the idle computing resource at the current moment. In addition, each subtask that calculates the task weight can also be described by eight elementsThe input data size, the computing resources required by each bit, the maximum tolerance time delay, the earliest starting time, the latest finishing time, the task urgency degree, the execution priority and the task unloading destination are respectively represented. The tuple for each subtask is placed in the task queue Q.

The Markov decision process includes: state space, action space, and reward functions.

State space: the invention takes the information parameters of the service vehicle in the communication range and the state parameters of the self-calculation task at the current moment as the input states of a deep reinforcement learning algorithm and expresses the parameters as follows:

s(t’)=[C(t’),Q(t’)]

Wherein s (t ') is an input state at any time t'; c (t ') is t', said vehicle queue being a member of the service vehicle; and (3) a task queue when Q (t ') is t', wherein the task queue is taken as a member by the subtask.

In the vehicle queue C (t'), each service vehicle member includes four elements, denoted:

In the task queue Q (t'), each subtask member includes eight elements, denoted:

In the unloading process of a computing task, when one subtask returns a computing result, recalculating the execution priority of the not-scheduled subtask in the computing task; when the service vehicle does not return an unloading result within the expected completion time, judging that the unloading is failed, changing the execution priority of the corresponding subtasks to 0, and putting the subtasks into an unloading queue again to wait for calculation; wherein, for a critical subtask, the maximum value of the predicted completion times offloaded to two service vehicles is taken as the predicted completion time of the critical subtask; when the offloading of the subtask is completed, the execution priority of the subtask becomes 1.

The reward function is used for maximizing the success rate of the calculation task, and consists of the following three parts:

(1) Time delay: with latest offload completion time of subtask nodes Subtracting the actual completion time t _true indicates that the greater the value the more computing resources the subtask node chooses to serve the vehicle. Furthermore, if the predecessor node of the subtask node fails to offload, this may result in an increase in the predicted completion time of the subtask node, which may result in a decrease in the latency portion.

(2) Importance level: for different subtask nodes, the importance degree is different according to the urgency degree, and the higher the urgency degree is, the higher the importance degree on the computing task is, and the higher the influence degree on the computing task delay is. The initial attribute for each subtask node is a constant value.

(3) Link reliability: in the invention, the reliability of the link is positively correlated with the reliability of task unloading, and the higher the reliability of the link is, the lower the probability of failure of the subtask node is.

Thus, the calculation formula for the bonus function is:

Wherein r _i (t') represents that the task vehicle schedules the subtasks at t The awards obtained; t _true is a subtaskIs calculated as/>。

In this embodiment, the deep reinforcement learning algorithm in step S3 may be DDPG algorithm (DEEP DETERMINISTIC Policy Gradient, depth deterministic Policy Gradient algorithm).

DDPG is an algorithm for outputting deterministic actions, whose structural form mainly includes policy network, target policy network, value network and target value network.

Among them, the policy Network (Actor Network) is a neural Network for learning deterministic policies in DDPG algorithm. It receives as input the state of the environment and outputs a deterministic action. The goal of the policy network is to maximize the jackpot to find the optimal policy. During training, the policy network updates the parameters by a gradient ascent method so that the expected jackpot increases.

The target policy network (Target Actor Network) is a copy of the policy network for generating the target action. Its parameters are updated periodically from the policy network by Soft Update (Soft Update) to remain more stable. The existence of the target strategy network helps to reduce estimation errors of the target value network, thereby improving training stability.

A value Network (Critic Network) is a neural Network used to estimate a cost function of a state-action pair. It receives as input the state and action of the environment, outputs a corresponding action value function, representing the expected jackpot under a given state and action. The goal of the value network is to minimize the mean square error to approximate the cumulative value of the true rewards.

The target value Network (TARGET CRITIC Network) is a copy of the value Network for generating the target value. Similar to the target policy network, the parameters of the target value network are also updated periodically from the value network by soft updating. The objective value network is used for reducing the estimation error of the action value function, so that the stability of algorithm training is improved.

It should be noted that DDPG also employs an experience replay technique to store the sampled experiences in order to the experience pool, each of which consists of information of current state s, action a taken, received reward r, next state s' and termination state, and retains a certain amount of recent experience. During the training process, a batch of experience samples are randomly extracted from the experience pool as training data. This helps break the temporal correlation between samples, preventing the negative impact of data correlation on algorithm learning. The method accelerates the convergence speed of the algorithm, improves the stability of exploration and strategy, and finally improves the performance of the algorithm.

In depth deterministic strategy gradients, the Markov Decision Process (MDP) is a mathematical framework used to describe reinforcement learning problems, including state space, action space, reward functions, transition probabilities, and the like. In applying DDPG algorithm, the MDP is used to describe the state transition and rewards mechanism of the problem, thereby guiding the agent to learn and make decisions.

Specifically, how the Markov decision process is applied in the DDPG algorithm is as follows:

1. State space (STATE SPACE): all possible states are defined in the MDP, representing the current situation of the environment. In the DDPG algorithm, the state space is typically represented by a vector or image that is used to describe the current observations of the environment.

2. Action space (Action space): all actions that an agent can take are defined in the MDP. The DDPG algorithm outputs deterministic actions through the policy network so that the intelligent agent can make decisions directly in the continuous action space.

3. Bonus function (Reward function): the rewards function in the MDP defines the immediate rewards that the agent gets during the state transition. The DDPG algorithm adjusts the policy network based on the environmental return prize signal to maximize the future jackpot.

4. Transition probability (Transition probabilities): MDP describes the probability that an agent will transition to the next state after taking some action in the state space. In the DDPG algorithm, new states and rewards are obtained by interacting with the environment without explicit modeling transition probabilities.

5. Value function (Value function): the cost function in the MDP is used to evaluate the value of the state or state-action pair, guiding the agent in making decisions. In the DDPG algorithm, a value function network (Critic) is used to estimate the value of the state-action pairs, helping to optimize the policy network.

By integrating the various components of the MDP into the DDPG algorithm, the agent can continually learn and refine the strategy based on the state, rewards, and value information of the environment to achieve a long-term cumulative maximum rewards. In this process, the theoretical framework of the Markov decision process directs the agent to make a reasonable decision in the reinforcement learning task, thereby achieving efficient learning.

The invention applies DDPG algorithm to obtain the optimal unloading strategy, guides the learning process of an Agent by setting a proper reward function, and the DDPG algorithm combines the ideas of an experience playback mechanism, strategy gradient and Q learning (Q-learning), updates the neural network parameters by maximizing the accumulated reward value and continuously optimizes the strategy by utilizing optimization algorithms such as gradient descent and the like so as to approach the optimal strategy.

Referring to fig. 3 and 4, the present embodiment considers a multi-lane one-way road, where a plurality of vehicles travel in the same direction, the vehicle travel speed range is [0,80] km/h, the subtask number range of the calculation task is [8,25], the V2V communication bandwidth is 10mhz, the V2V maximum communication range is 50-100m, and the number of service vehicles in the task vehicle communication range is 5-10.

For the model constructed in the invention, the initial input data is design data, which is the initial state of DDPG algorithm, and changes with the progress of time and execution of actions in DDPG algorithm. The input data includes two parts, a vehicle part and a calculation task part. The vehicle part comprises the following elements, namelyAnd respectively representing the idle computing resources, the position, the acceleration and the speed of the vehicle, wherein the range of the idle computing resources is 4-5GHz. The computing task section contains the following elements,/>The input data size, the computing resources required by each bit, the maximum tolerance time delay, the earliest starting time, the latest finishing time, the task urgency degree, the execution priority and the task unloading destination are respectively represented. Wherein/>Is in the range of 600-1200KB,/>In the range of 1500-2500 megaweeks,/>Is in the range of 1-5s. The four parts of earliest starting time, latest finishing time, task urgency and execution priority can be obtained by the calculation formula. The output data is in two parts: one part is the return (rewards) per round of the DDPG algorithm when training the model; the other part is whether the delay and unloading obtained after the simulation experiment are successful or not by using the trained model after the training is finished, and the part is the final required result.

Referring to fig. 5 to 10, in order to embody the superiority of the unloading scheme proposed by the present invention over the existing scheme, it is verified by comparing with the following three schemes:

(1) No duplication: the scheme does not replicate subtasks, and other contents are the same as the scheme provided by the invention.

(2) Greedy algorithm replication: the scheme uses a greedy algorithm for copy unloading, and the rest is consistent with the scheme provided by the invention.

(3) Maximum subtask replication: the scheme selects the subtasks requiring more computing resources from all the subtasks to copy, and other contents are consistent with the scheme provided by the invention.

In fig. 5, the number of subtasks of the task is set to 25, and the communication distance, i.e., the communication range, is 50 meters. As the number of service vehicles increases, subtasks have more offload options, and the available computing resources increase, resulting in increased task success rates.

In fig. 6, the service vehicles are 7 vehicles, and the communication distance is 50 meters. As the number of subtasks increases, the completion time of the dependent tasks increases and the likelihood of the service vehicle driving out of communication range increases. The increase in computational resources required to rely on tasks results in subtasks being offloaded to lower-ranked service vehicles due to the limited number of service vehicles. In short, as the number of subtasks increases, the task success rate decreases.

In fig. 7, the number of service vehicles is 7, and the number of subtasks is 25. With the increase of the communication distance, the probability that the service vehicle exits the communication range is reduced, and the success rate of depending on tasks is improved.

In fig. 8, the number of service vehicles is 7, and the communication distance is 50 meters. As the number of service vehicles increases, the mission vehicles may select a better service vehicle to offload, thereby reducing the average delay time. Meanwhile, due to the difference of task success rates, the average calculation delay of the unloading strategy adopted by the embodiment is superior to that of other strategies.

In fig. 9, the number of service vehicles is 7, and the communication distance is 50 meters. With the increase of the number of subtasks, the dependent task completion time also increases, and the average delay of the four strategies is different for the same subtask due to the different task success rates and the different computing resource usage of the four strategies.

In fig. 10, there are 7 service vehicles and 25 subtasks. As the communication distance increases, the probability of the service vehicle exiting the communication range decreases, the task success rate increases, and the average delay decreases.

In summary, compared with the three schemes, the invention has obvious advantages of task success rate and average delay under different conditions.

Example 2

The present embodiment provides a computer terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor.

The computer terminal may be a smart phone, a tablet computer, a notebook computer, etc. capable of executing a program. The processor may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor is typically used to control the overall operation of the computer device. In this embodiment, the processor is configured to execute the program code stored in the memory or process the data. The processor, when executing the program, implements the steps of the dependent task offloading method in the V2V scenario in embodiment 1.

Example 3

The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the dependent task offload method in the V2V scenario in embodiment 1.

The computer readable storage medium may include flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random Access Memory (RAM), static Random Access Memory (SRAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), programmable Read Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the storage medium may be an internal storage unit of a computer device, such as a hard disk or memory of the computer device. In other embodiments, the storage medium may also be an external storage device of the computer device, such as a plug-in hard disk provided on the computer device, a smart memory card (SMART MEDIA CARD, SMC), a secure digital (SecureDigital, SD) card, a flash memory card (FLASH CARD), or the like. Of course, the storage medium may also include both internal storage units of the computer device and external storage devices. In this embodiment, the memory is typically used to store an operating system and various application software installed on the computer device. In addition, the memory can be used to temporarily store various types of data that have been output or are to be output.

The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art, who is within the scope of the present invention, should make equivalent substitutions or modifications according to the technical scheme of the present invention and the inventive concept thereof, and should be covered by the scope of the present invention.

Claims

1. The method for unloading the dependent tasks in the V2V scene is characterized by comprising the following steps of:

s1, taking the urgency degree of a plurality of subtasks with dependency relations in a task vehicle calculation task as a division standard, and dividing all the subtasks in the calculation task into a critical subtask and a non-critical subtask;

when the key subtasks are unloaded, the key subtasks are copied and unloaded to two service vehicles; when non-critical subtasks are unloaded and the unloading is failed, unloading the subtasks which are unloaded failed for the second time;

s2, respectively constructing a kinematic model, a communication model, a subtask calculation delay model, a subtask execution priority model and a link reliability model between the task vehicle and the service vehicle, and accordingly establishing a constraint problem for generating an optimal unloading strategy;

2. The method for offloading dependent tasks in a V2V scenario according to claim 1, wherein in step S1, the method for dividing all subtasks in the computing task into critical subtasks and non-critical subtasks comprises the following specific steps:

In the/> Representing the ith subtask/>, of the mission vehicle mIs the latest unloading end time of (3); /(I)Representing subtasks/>Is the earliest unload start time in (a); /(I)Representing subtasks/>Is a subsequent sub-task of (a); /(I)Representing subtasks/>Is a preamble sub-task of (2); /(I)Representing subtasks/>Data volume of/>Representing processing subtasks/>Computing resources required for each bit of data; f _max represents the most free resources of the service vehicle n; /(I)Representing subtasks/>Is a communication delay of (1); for subtasks/>, without a predecessor subtask，/>Is 0; for subtasks without subsequent subtasks/>，/>The maximum tolerance time delay of the task is calculated;

In the/> Representing subtasks/>Is the urgency of (a); /(I)For subtask/>Is a mean unloading time of (2);

3. The method for offloading dependent tasks in a V2V scenario as claimed in claim 2, wherein the subtasks areAverage unload time of/>The calculation formula of (2) is as follows:

4. The method for offloading dependent tasks in a V2V scenario according to claim 2, wherein step S2 comprises the specific steps of:

Wherein D _m,n (t) represents the distance between the task vehicle m and the service vehicle n at the time t when the sub-task execution phase is completed; d _m,n(t₀) represents the distance between the task vehicle m and the service vehicle n at the start time t ₀ of subtask offloading, D _m,n(t₀)=x_m(t₀)－x_n(t₀),x_m(t₀) and x _n(t₀) represent the positions of the task vehicle m and the service vehicle n at t ₀, respectively; deltav (t ₀)=v_m(t₀)－v_n(t₀),v_m(t₀) and v _n(t₀) represent the speeds of mission vehicle m and service vehicle n at t ₀, respectively; Δa (t ₀)=a_m(t₀)－a_n(t₀),a_m(t₀) and a _n(t₀) represent the acceleration of the mission vehicle m and the service vehicle n at t ₀, respectively;

S22, constructing a communication model; wherein the transmission delay of the task vehicle for transmitting the subtasks to the service vehicle through the uploading channel is as follows And/>; Wherein R _m,n represents the transmission rate of the mission vehicle m to the service vehicle n; /(I)Representing subtasks/>The size of the data volume transmitted to the service vehicle n is defined by the subtask/>The sum of the data volume of the self and the data volume transmitted by the preamble subtask thereof;

S23, constructing a subtask calculation delay model; wherein the subtasks Delay in servicing vehicle offload tasks is/>And/>; Subtask/>The time required to complete the unloading process is/>And/>; Subtask/>Unload start time of/>Subtask/>Is/>And (2) and，/>; In the/>For subtask/>The unloading end time of the preamble subtask;

s24, constructing a subtask execution priority model; wherein subtasks of task vehicle m are defined For subtask/>Preamble subtasks of/>Execution priority of (3) is higher than/>；/>Is/>And (2) and; Where pri (p) is the subtask/>Execution priority of the preamble subtask;

s25, constructing a link reliability model; wherein, in unloading subtasks When the link reliability P _m,n,i between the service vehicle n and the mission vehicle m is expressed as follows:

Wherein e is a natural constant; t _theroy is the theoretical time for the service vehicle to travel out of the communication range of the task vehicle m at the current speed; the value range of P _m,n,i is (0, 1);

wherein D _success represents the number of calculation tasks successfully unloaded; constraint C1 represents the last subtask completion time/>, of task vehicle m The maximum tolerant time delay T _m of the whole calculation task is not exceeded; constraint C2 indicates that initial free resource F _n serving vehicle n is between most free resource F _max and least free resource F _min; constraint C3 represents subtask/>Execution priority/>Lower than the lowest execution priority among all its predecessor subtasks; constraint C4 represents subtask/>Unloading start time/>Not earlier than the latest unload end time of all its predecessor subtasks; In constraint C5,/>And/>Representing a set of task vehicles and a set of service vehicles, respectively.

5. The method for offloading dependent tasks in a V2V scenario as claimed in claim 4, wherein in step S22, the calculation formula of the transmission rate R _m,n is:

Wherein B _m,n represents the bandwidth of the upload channel; p _m denotes the transmission power of the mission vehicle m; /(I) Represents the path loss between the service vehicle and the mission vehicle, delta represents the path loss factor, and l represents the distance between the service vehicle and the mission vehicle; h represents the channel fading factor of the uploading link; n ₀ represents gaussian white noise power.

6. The method for offloading dependent tasks in a V2V scenario according to claim 4, wherein in step S3, said markov decision process comprises: state space, action space, and reward functions;

the state space is used for representing information parameters of the service vehicle in the communication range of the task vehicle and state parameters of the self-calculation task at the current moment at each moment, and the parameters are used as input states of a deep reinforcement learning algorithm and are represented as s (t ')= [ C (t '), Q (t ') ]; wherein s (t ') is an input state at any time t'; c (t ') is t', said vehicle queue being a member of the service vehicle; a task queue when Q (t ') is t', wherein the task queue is formed by subtasks as members;

In the unloading process of a computing task, when one subtask returns a computing result, recalculating the execution priority of the not-scheduled subtask in the computing task; when the service vehicle does not return an unloading result within the expected completion time, judging that the unloading is failed, changing the execution priority of the corresponding subtasks to 0, and putting the subtasks into an unloading queue again to wait for calculation; wherein, for a critical subtask, the maximum value of the predicted completion times offloaded to two service vehicles is taken as the predicted completion time of the critical subtask; when the unloading of the subtask is completed, the execution priority of the subtask is changed to 1;

the action space is used for selecting a service vehicle for the subtasks; for a subtask with execution priority of 0, the subtask can be unloaded, so that the subtask can be unloaded to a service vehicle for calculation; all action sets of the action space are represented as action= {1,2, …, n }, where 1,2, …, n are the numbers of the service vehicles;

The calculation formula of the reward function is ; Where r _i (t ') represents the task vehicle's scheduling subtasks/>, at tThe awards obtained; t _true is subtask/>Is calculated according to the following formula。

7. The method of dependent task offloading in a V2V scenario of claim 6, wherein in the vehicle queue C (t'), each service vehicle member comprises four elements, expressed as:

Wherein x _n(t')、a_n(t')、v_n (t') represents the position, acceleration and speed of the service vehicle n at the current time, respectively;

In the task queue Q (t'), each subtask member includes eight elements, denoted:

In the/> Representing subtasks/>Maximum tolerated time delay of (2); choose denotes a service vehicle to which the subtask is offloaded.

8. The method for task offloading of claim 7, wherein in step S3, the deep reinforcement learning algorithm is DDPG algorithm.

9. Computer terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, realizes the steps of a method of dependency task offloading in a V2V scenario according to any one of claims 1 to 8.

10. A computer readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the steps of a dependent task offload method in a V2V scenario as claimed in any one of claims 1 to 8.