CN111800477B

CN111800477B - Differentiated incentive method for edge-computing data quality perception

Info

Publication number: CN111800477B
Application number: CN202010542414.2A
Authority: CN
Inventors: 骆淑云; 李逸飞; 徐伟强
Original assignee: Zhejiang Sci Tech University ZSTU
Current assignee: Kangxu Technology Co ltd
Priority date: 2020-06-15
Filing date: 2020-06-15
Publication date: 2022-09-23
Anticipated expiration: 2040-06-15
Also published as: CN111800477A

Abstract

The invention discloses a differentiated incentive method for data quality perception in edge computing, which comprises the following steps: S11, extracting crowd sensing data uploaded by users, establishing two-dimensional data coordinates from the crowd sensing data, and calculating, with the LOF algorithm, the outlier factor of each point relative to the other points in the established two-dimensional data coordinates; S12, comparing the difference between the outlier factor of each piece of data and the standard value 1 with a preset range to obtain the quality grade of each piece of data; and S13, obtaining the user grade corresponding to the quality grade of each piece of data, and providing the reward information corresponding to the user grade. Based on data quality, the invention provides an optimal incentive mechanism that meets task requirements and is used for selecting crowd sensing users in edge computing.

Description

Differentiated incentive method for edge-computing data quality perception

Technical Field

The invention relates to the technical field of edge computing for crowd sensing, and in particular to a differentiated incentive method for data quality perception in edge computing.

Background

In the big data era, with the rapid development of mobile terminal capabilities, crowd sensing has gradually come into view and has attracted wide attention in both academia and industry. The essence of crowd sensing is to use the large number of eligible smartphones to collect data for scientific research, such as local temperature, humidity and carbon dioxide content, during users' daily lives, which greatly saves the time and money otherwise spent on assigning dedicated personnel to collect the data.

However, crowd sensing still has some problems to be solved urgently. First, most individuals take part in crowd sensing as volunteers, so neither the quantity nor the quality of the collected data is guaranteed. Second, participating in data collection incurs certain costs (such as time, data traffic and bandwidth) and may even expose personal privacy (such as an individual's geographic location). Therefore, a suitable incentive mechanism is needed to reward individuals who participate in crowd sensing, to maintain their enthusiasm for crowd sensing applications, and at the same time to screen out malicious individuals (or users who continuously provide inaccurate data).

Incentive mechanisms have already been studied; their main purpose is to encourage users to open up and share their resources, so they act mainly as a "door knocker". Yang Dequan et al. proposed a platform-centric incentive model and a user-centric incentive model as early as 2012, but both presuppose that the users and the platform know the service costs of all users, which is impractical in real crowd sensing applications; moreover, the platform-centric model considers only a single task, and the user-centric model considers only independent tasks without considering the association between tasks. Yand Di and Amintoosi, Haleh et al. respectively proposed an online incentive mechanism for mobile crowd sensing that takes reputation updating into account and a reputation-based participatory sensing system, which consider both the quality of the data submitted by users and the trust level of the data in a social network. Zhang Yu et al. proposed a reward-based cooperation mechanism using a repeated game approach. Many scholars have also studied online incentive mechanisms for multiple randomly arriving users, to ensure that the mechanism can meet users' needs under real-time dynamics.

However, these incentive mechanisms focus on a single cooperative task, do not combine multiple cooperative tasks with multiple users, and do not consider discriminating data quality, so the collected data may be "watered down", which makes research based on the data inaccurate. In practical applications, low-quality data can come from several sources: on the one hand, the user's network may cause problems such as packet loss, which produces incomplete data; on the other hand, a user may maliciously upload bad data. Because the data is used for edge computing, where the data volume is usually much smaller than in cloud computing, the influence of malicious data is relatively large: it directly interferes with the normal operation of the upper-layer application and degrades the application's quality of service. In view of users' own selfishness and inertia, it is necessary to design a differentiated incentive method based on data quality perception that discriminates between users and rewards them by grade.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a differentiated incentive method for data quality perception in edge computing.

In order to achieve this purpose, the invention adopts the following technical scheme:

A differentiated incentive method for data quality perception in edge computing comprises the following steps:

S1, extracting crowd sensing data uploaded by users, establishing two-dimensional data coordinates from the crowd sensing data, and calculating, with the LOF algorithm, the outlier factor of each point relative to the other points in the established two-dimensional data coordinates;

S2, comparing the difference between the outlier factor of each piece of data and the standard value 1 with a preset range to obtain the quality grade of each piece of data;

S3, obtaining the user grade corresponding to the quality grade of each piece of data, and providing the reward information corresponding to the user grade.

Further, in step S1, the LOF algorithm is used to calculate the outlier factor of each point relative to the other points in the established two-dimensional data coordinates, which is expressed as:

$$LOF_k(A) = \frac{\sum_{O \in N_k(A)} \dfrac{lrd_k(O)}{lrd_k(A)}}{|N_k(A)|}$$

wherein $LOF_k(A)$ represents the outlier factor of data A; A represents one piece of data uploaded by a user; $lrd_k(A)$ represents the inverse of the average reachable distance of the points in the kth neighborhood of point A; $O \in N_k(A)$ ranges over all points in the kth neighborhood of point A; $lrd_k(O)$ represents the inverse of the average reachable distance for point O; $N_k(A)$ represents the kth neighborhood of point A.

Further, after the difference between the outlier factor of each piece of data and the standard value 1 is compared with the preset range in step S2, the method further includes:

S21, if the difference between the outlier factor and the standard value 1 is smaller than the preset range, determining that the data corresponding to the outlier factor is high-quality data, and storing the user corresponding to the high-quality data in the set $U_g$;

S22, if the difference between the outlier factor and the standard value 1 is greater than or equal to the preset range, determining that the data corresponding to the outlier factor is low-quality data, and storing the user corresponding to the low-quality data in the set $U_m$.

Further, after the data corresponding to the outlier factor is determined to be low-quality data in step S22, the method further includes:

judging whether the accuracy of the low-quality data is higher than a preset threshold; if so, storing the users whose low-quality data has an accuracy higher than the preset threshold in the intermediate set (the level between $U_g$ and $U_m$); if not, storing the users whose low-quality data has an accuracy lower than the preset threshold in the set $U_m$.

Further, step S3 specifically includes:

if the user is in the set $U_m$, no reward is provided;

if the user is in the intermediate set (the level between $U_g$ and $U_m$), the user obtains a basic reward $\delta$;

if the user is in the set $U_g$, the user obtains a reward $r_i$.

Further, the basic reward $\delta$ obtained by the user is represented as:

wherein R represents the total reward given by the server; the cardinality of the intermediate set represents the number of users in that set; $r_i$ represents the reward earned by the user.

Further, the reward $r_i$ obtained by the user is determined as follows:

calculating the credibility of the user, which is expressed as:

$$\rho_i = k / LOF(i)$$

where $\rho_i$ represents the credibility of user i, and k represents a constant defined by the specific application;

the reward $r_i$ obtained by the user is expressed as:

wherein the expression uses an estimate of the reward earned by the user, and W represents an estimate of the total reward earned by the users in the set $U_g$.

Further, in step S3, the reward information corresponding to the user grade is expressed as:

wherein $r_i$ represents the reward earned by the user.

Compared with the prior art, the invention has the following beneficial effects:

(1) the invention provides a differentiated incentive mechanism based on data quality perception, that is, an optimal incentive mechanism that meets task requirements on the basis of data quality and is used for selecting crowd sensing users in edge computing;

(2) the selection of crowd sensing data sources is based on data quality, which greatly improves the accuracy of the user-uploaded data collected by the edge server;

(3) when the system distributes rewards, the invention takes the inertia and selfishness of users into account, adopts a multi-class, multi-(credit-)grade distribution mode, and applies a two-stage Stackelberg game to optimize the distribution, thereby greatly improving the enthusiasm of users.

Drawings

FIG. 1 is a flowchart of the differentiated incentive method for data quality perception in edge computing according to the first embodiment;

FIG. 2 is a schematic diagram of the two-stage Stackelberg game according to the first embodiment;

FIG. 3 is a diagram of a single-edge-server, multi-task, multi-user system according to the first embodiment.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

Example one

This embodiment provides a differentiated incentive method for data quality perception in edge computing which, as shown in FIG. 1, includes the steps of:

S11, extracting crowd sensing data uploaded by users, establishing two-dimensional data coordinates from the crowd sensing data, and calculating, with the LOF algorithm, the outlier factor of each point relative to the other points in the established two-dimensional data coordinates;

S12, comparing the difference between the outlier factor of each piece of data and the standard value 1 with a preset range to obtain the quality grade of each piece of data;

S13, obtaining the user grade corresponding to the quality grade of each piece of data, and providing the reward information corresponding to the user grade.

In step S11, crowd sensing data uploaded by users is extracted, two-dimensional data coordinates are established from the crowd sensing data, and the LOF algorithm is used to calculate the outlier factor of each point relative to the other points in the established two-dimensional data coordinates.

Two features (selected by the server) are extracted from the data provided by the users to build a two-dimensional data set, which is plotted as points on two-dimensional coordinate axes. The LOF (Local Outlier Factor) algorithm is then used to calculate the outlier factor of each point relative to the other points. The outlier factor of data A is calculated according to the following formula; the closer $LOF_k(A)$ is to 0, the more likely data A is abnormal data.

The formula is expressed as:

$$LOF_k(A) = \frac{\sum_{O \in N_k(A)} \dfrac{lrd_k(O)}{lrd_k(A)}}{|N_k(A)|}$$

wherein $LOF_k(A)$ represents the outlier factor of data A; A represents one piece of data uploaded by a user; $lrd_k(A)$ represents the inverse of the average reachable distance of the points in the kth neighborhood of point A; $O \in N_k(A)$ ranges over all points in the kth neighborhood of point A; $lrd_k(O)$ represents the inverse of the average reachable distance for point O; $N_k(A)$ represents the kth neighborhood of point A.

The concepts involved are as follows:

dis(A, O): the Euclidean distance between point A and point O;

kth distance (k-distance): the kth distance of point A is the distance from A to the point that is the kth nearest to A (excluding A itself), denoted $d_k(A)$;

kth distance neighborhood (k-distance neighborhood): the kth distance neighborhood of point A consists of all points within the circle centered at A whose radius is the kth distance (including points on the circle), denoted $N_k(A)$. Therefore, the kth neighborhood of A contains at least k points;

reachable distance (reach-distance): the kth reachable distance from point O to point A is the larger of dis(A, O) and $d_k(A)$, denoted $reach\_dis_k(A, O) = \max(dis(A, O), d_k(A))$;

local reachable density: the local reachable density $lrd_k(A)$ of point A is the inverse of the average reachable distance of the points in the kth neighborhood of point A, and is calculated as follows:

$$lrd_k(A) = 1 \Big/ \frac{\sum_{O \in N_k(A)} reach\_dis_k(A, O)}{|N_k(A)|}$$

$lrd_k(A)$ represents a density: the higher the density, the more points lie around A. A point A with a higher density is more likely to belong to the same cluster as the surrounding points; conversely, a point with a lower density is more likely to be an outlier.
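The definitions above can be tried out with a small, dependency-free sketch. It is not taken from the patent: the function name `lof_scores` and the sample points are illustrative, the points are assumed to be distinct, and the reachable distance follows the original LOF formulation, max(dis(A, O), d_k(O)), which uses the kth distance of the neighbour O (an assumption about the intended algorithm).

```python
import math


def lof_scores(points, k):
    """Return LOF_k for every 2-D point (sketch; assumes distinct points)."""
    n = len(points)
    # dis(A, O): Euclidean distance between every pair of points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []     # d_k(A): distance from A to its k-th nearest other point
    neighbors = []  # N_k(A): indices of all points within d_k(A), ties included
    for a in range(n):
        d_k = sorted(dist[a][o] for o in range(n) if o != a)[k - 1]
        k_dist.append(d_k)
        neighbors.append([o for o in range(n) if o != a and dist[a][o] <= d_k])

    # lrd_k(A): inverse of the average reachable distance over N_k(A).
    # Assumption: reachable distance = max(dis(A, O), d_k(O)), as in standard LOF.
    lrd = []
    for a in range(n):
        total = sum(max(dist[a][o], k_dist[o]) for o in neighbors[a])
        lrd.append(len(neighbors[a]) / total)

    # LOF_k(A): average of lrd_k(O) / lrd_k(A) over the neighbours O of A.
    return [
        sum(lrd[o] for o in neighbors[a]) / (len(neighbors[a]) * lrd[a])
        for a in range(n)
    ]


# Four clustered points and one isolated point (made-up sample data).
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(sample, k=2)
```

With this sample, the four clustered points score 1 and the isolated point scores well above 1, so its difference from the standard value 1 is the largest.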

In step S12, the difference between the outlier factor of each piece of data and the standard value 1 is compared with a preset range to obtain the quality grade of each piece of data.

The edge server sets a threshold range according to its own requirements on the data, judges the quality of each piece of data from its outlier factor, and divides the users into three levels; from high to low data quality these are $U_g$, an intermediate set, and $U_m$.

In this embodiment, after the outlier factor of each piece of data is compared with the preset threshold, the method further includes:

S121, if the difference between the outlier factor and the standard value 1 is smaller than the preset range, determining that the data corresponding to the outlier factor is high-quality data, and storing the user corresponding to the high-quality data in the set $U_g$;

S122, if the difference between the outlier factor and the standard value 1 is greater than or equal to the preset range, determining that the data corresponding to the outlier factor is low-quality data.

For low-quality data, it is necessary to check whether the low quality is caused by the acquisition time, the delay, the packet loss rate and the like, and to judge the accuracy of the low-quality data, specifically:

judging whether the accuracy of the low-quality data is higher than a preset threshold; if so, the low quality is considered to be caused by unstable network conditions and the like, and the users whose low-quality data has an accuracy higher than the preset threshold are stored in the intermediate set (the level between $U_g$ and $U_m$); if not, the user is determined to be a malicious user, and the users whose low-quality data has an accuracy lower than the preset threshold are stored in the set $U_m$.
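The grading rule of steps S121 and S122 and the follow-up accuracy check can be summarized in a short sketch. The names `classify_user`, `lof_range` and `accuracy_threshold` are hypothetical, the difference from the standard value 1 is taken as an absolute value (an assumption), and how `accuracy` is measured is left to the application because the text does not specify it.

```python
def classify_user(lof, accuracy, lof_range, accuracy_threshold):
    """Return the level of one user: 'good', 'basic' or 'malicious'."""
    # S121: outlier factor close enough to the standard value 1 -> set U_g.
    if abs(lof - 1.0) < lof_range:
        return "good"
    # S122 + accuracy check: low quality but accurate enough is attributed to
    # network problems -> intermediate set (basic reward).
    if accuracy > accuracy_threshold:
        return "basic"
    # Otherwise the user is treated as malicious -> set U_m.
    return "malicious"


# Made-up inputs covering the three outcomes.
levels = [
    classify_user(lof=1.1, accuracy=0.9, lof_range=0.3, accuracy_threshold=0.5),
    classify_user(lof=1.8, accuracy=0.9, lof_range=0.3, accuracy_threshold=0.5),
    classify_user(lof=1.8, accuracy=0.2, lof_range=0.3, accuracy_threshold=0.5),
]
```

The three calls return the high-quality, intermediate and malicious levels respectively.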

In step S13, the user grade corresponding to the quality grade of each piece of data is obtained, and the reward information corresponding to the user grade is provided.

The edge server announces the total reward it is willing to provide and distributes it according to the grades of the users, specifically:

if the user is in the set $U_m$, no reward is provided;

if the user is in the intermediate set (the level between $U_g$ and $U_m$), the user obtains a basic reward $\delta$;

if the user is in the set $U_g$, the user obtains a reward $r_i$.
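The three cases above can be written compactly as a piecewise expression. This is only a restatement of the list, not a formula taken from the patent; the symbol $p_i$ for the payout to user i and the name $U_{mid}$ for the intermediate set (the level between $U_g$ and $U_m$) are introduced here for illustration.

```latex
p_i =
\begin{cases}
r_i,    & i \in U_g \\
\delta, & i \in U_{mid} \\
0,      & i \in U_m
\end{cases}
```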

A user in the set $U_m$ is deemed a malicious user; no reward is given, and the data uploaded by that user is no longer adopted.

A user in the intermediate set uploaded low-quality data for network reasons, so the uploaded data does not help the edge server. However, considering that such a user may still provide high-quality data later, a basic reward $\delta$ is given to encourage the user to keep uploading data. $\delta$ is determined by the upper-layer application and is expressed as:

wherein R represents the total reward given by the server; the cardinality of the intermediate set represents the number of users in that set; $r_i$ represents the reward the user receives.

Users in the set $U_g$ contribute the most to edge-computing crowd sensing, so they can reasonably obtain the largest reward; however, each user is selfish and wants to obtain more reward. To solve this problem, the present embodiment adopts a method based on a two-stage Stackelberg game.

First, the credibility of the user is considered, which is expressed as:

$$\rho_i = k / LOF(i)$$

wherein $\rho_i$ represents the credibility of user i, and k denotes a constant defined by the specific application.
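As a quick illustration of the credibility formula, the sketch below computes ρ_i = k / LOF(i) for a few outlier factors. The function name `credibility`, the choice k = 1 and the input values are assumptions made for the example.

```python
def credibility(lof, k=1.0):
    """Credibility rho_i = k / LOF(i); k is an application-defined constant."""
    if lof <= 0:
        raise ValueError("LOF(i) must be positive")
    return k / lof


# Hypothetical outlier factors for three users, with k assumed to be 1.
lof_by_user = {"user_a": 0.5, "user_b": 1.0, "user_c": 2.0}
rho = {user: credibility(lof) for user, lof in lof_by_user.items()}
```

Under this formula a smaller outlier factor gives a larger credibility.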

The utility function of a user is defined as $u_i = r_i - c_i$, where $r_i$ is the reward obtained by the user and $c_i$ is the cost the user spends on uploading data;

the utility function of the MEC (Mobile Edge Computing) server is defined as follows:

wherein $v_j$ indicates the value that the server obtains after task j is completed.

The two parties participating in the game are the MEC server and the users who collect data (from $U_g$). The game is an incomplete-information game, because the server does not know the specific cost a user incurs to participate in a task, but only the probability distribution of that cost. Since both parties want to maximize their own utility, the two-stage Stackelberg game proceeds as follows (as shown in FIG. 2):

(1) Game between the MEC server and the users. Game content: according to the cost probability distribution of the users, the MEC server determines the value of the total reward R so that $u_s$ is maximized.

(2) Given R, the users play a game with each other. Game content: according to the total reward R given by the server, the task participation strategy set of each user is determined so that $u_i$ is maximized.

The solution method is as follows:

Using backward induction, the equilibrium of the second-stage game is solved first, and then the equilibrium of the first-stage game.

Second-stage game: given R, determine the task participation strategy set of each user so that $u_i$ is maximized.

According to individual rationality, $u_i = r_i - c_i \geq 0$. Suppose

Then

Then all users with $r_i \geq r^*$ are willing to participate in the task.

From this, the number of tasks that can be completed can be calculated; the condition for completing task j is $|U_j| \geq m_j$, so $u_s$ can be calculated from the definition of the server utility function.

First-stage game: each R corresponds to one $u_s$, so the R that maximizes $u_s$ can be derived; whether such an R exists and whether it is unique can be judged geometrically.
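The individual-rationality condition and the task threshold used in the second stage can be sketched as follows. This is not the patent's equilibrium computation (the server utility function is not reproduced in this text); it only shows which users would accept an offered reward and whether each task reaches its threshold m_j. The function names and the sample offers, costs and thresholds are hypothetical.

```python
def willing_users(offers, costs):
    """Users whose utility u_i = r_i - c_i is non-negative (individual rationality)."""
    return {user for user, r in offers.items() if r - costs[user] >= 0}


def completed_tasks(participants_by_task, thresholds):
    """Tasks j whose participant set U_j satisfies |U_j| >= m_j."""
    return {
        task
        for task, users in participants_by_task.items()
        if len(users) >= thresholds[task]
    }


# Made-up offers r_i and costs c_i for four users.
offers = {"u1": 3.0, "u2": 3.0, "u3": 3.0, "u4": 3.0}
costs = {"u1": 1.0, "u2": 2.5, "u3": 3.0, "u4": 4.0}
accepted = willing_users(offers, costs)

# Assign the willing users to two tasks and check each threshold m_j.
assignment = {"task1": {"u1", "u2"} & accepted, "task2": {"u3", "u4"} & accepted}
done = completed_tasks(assignment, {"task1": 2, "task2": 2})
```

Here u4 declines because the cost exceeds the offer, so only task1 reaches its threshold.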

To sum up, for a user in the set $U_g$, the reward $r_i$ obtained by the user is expressed as:

wherein,

To summarize, the reward allocated to all users is expressed as:

wherein $r_i$ represents the reward obtained by the user;

wherein $R_j$ represents the total reward assigned to the users participating in task j; $m_j$ indicates that task j can be completed only when at least $m_j$ users participate, and is also called the task threshold; and

$U_j$ represents the set of users participating in task j; $T_c$ represents the set of tasks to be completed; T is the set of all tasks.

FIG. 3 shows a single-edge-server, multi-task, multi-user scenario. There are 7 crowd sensing users (users 1 to 7) who submit crowd sensing data to the MEC server, and LOF(i) has already been calculated for the data submitted by every user (the LOF plotting and calculation process is skipped here): LOF(1) = 0.7; LOF(2) = 0.8; LOF(3) = 0.9; LOF(4) = 0.6; LOF(5) = 0.6; LOF(6) = 0.3; LOF(7) = 0.4.

It should be noted that, at this point, the edge server does not know the specific cost incurred by each user, but only the probability distribution of that cost, which is a normal distribution with expectation $\mu$ and variance $\sigma^2$.

The threshold set by the system is 0.6, and it is assumed that, after subsequent judgment, user 6 is found to have uploaded low-quality data because of network packet loss; the users are then classified as follows:

The MEC server has given a set $T_c$ of tasks to be completed, consisting of two tasks, task 1 and task 2; $U_1$ and $U_2$ represent the sets of users participating in task 1 and task 2, respectively; task 1 requires $m_1$ participating users to be completed and task 2 requires $m_2$ participating users, where $m_1$ and $m_2$ are both assumed to be 2.
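The grading step for the seven users in this scenario can be replayed with a short sketch. It rests on one reading of the text that is an assumption here: the threshold 0.6 is taken as the preset range that bounds |LOF(i) - 1|, with the low-quality case triggered when the difference is greater than or equal to it. The variable names are illustrative.

```python
# Outlier factors LOF(i) of users 1-7, as listed in the scenario.
lof = {1: 0.7, 2: 0.8, 3: 0.9, 4: 0.6, 5: 0.6, 6: 0.3, 7: 0.4}
PRESET_RANGE = 0.6          # assumption: the 0.6 threshold bounds |LOF(i) - 1|
network_loss_users = {6}    # user 6 is assumed to suffer network packet loss

high_quality, basic, malicious = [], [], []
for user, value in lof.items():
    # Round to avoid floating-point noise exactly at the boundary.
    if round(abs(value - 1.0), 9) < PRESET_RANGE:
        high_quality.append(user)   # set U_g
    elif user in network_loss_users:
        basic.append(user)          # intermediate set, basic reward
    else:
        malicious.append(user)      # set U_m
```

Under this reading, users 1 to 5 fall into $U_g$, user 6 into the intermediate set and user 7 into $U_m$.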

The two-stage Stackelberg game then starts: assume that the total reward the MEC server can give for each task is R units. According to the formula:

According to individual rationality, $u_i = r_i - c_i \geq 0$. Suppose

Then

Then all users with $r_i \geq r^*$ are willing to participate in the task. Suppose the cost $c_4$ of user 4 is greater than r; then user 4 will not choose to participate in the task, and among the remaining users those with smaller $c_i$ are preferred. Finally, the data provided by user 1, user 2, user 3 and user 5 is selected, with $U_1 = \{1, 2\}$ and $U_2 = \{3, 5\}$, and each of them receives a reward $r_i$ so that they can continue to provide high-quality data later. Users 4 and 6 only get the basic reward $\delta$; users 7 and 8 receive no reward, and the MEC server subsequently refuses to receive their uploaded data.

In summary, this embodiment provides a differentiated incentive method for data quality perception in edge computing. Based on the quality of the data uploaded by users, it provides an optimal incentive method that satisfies the needs of both sides, the users and the MEC server, in multi-task cooperative applications. By selecting high-quality data sources for the edge server, the method improves the accuracy of crowd sensing data collection and benefits subsequent applications. Meanwhile, rewards are distributed through a two-stage Stackelberg game: for the server, the final payment is minimized, i.e. the server's utility is maximized; for the users, the reward obtained is never less than the cost incurred, which maintains their enthusiasm for participating in crowd sensing data collection, and the higher a user's credibility, the higher the reward obtained, which gives users a strong incentive to keep providing high-quality data to the edge server.

In addition, the incentive method also takes into account the influence of sporadic problems on the data uploaded by users. For users who upload low-quality data, a decision process is established to determine the reason: genuinely malicious users are distinguished from non-malicious users on the basis of data collection time, network packet loss rate, upload delay and the like, and non-malicious users are given a basic reward. This effectively encourages such users, gives positive stimulation to potential crowd sensing users who are still watching from the sidelines, and effectively increases the number of crowd sensing users.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.

It is to be noted that the foregoing description is only exemplary of the invention and that the principles of the technology may be employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A differentiated incentive method for edge-oriented computing data quality perception is characterized by comprising the following steps:

S3, obtaining the user grade corresponding to the quality grade of each piece of data, and providing the reward information corresponding to the user grade;

after the difference between the outlier factor of each piece of data and the standard value 1 is compared with the preset range in step S2, the method further includes:

S21, if the difference between the outlier factor and the standard value 1 is smaller than the preset range, determining that the data corresponding to the outlier factor is high-quality data, and storing the user corresponding to the high-quality data in the set $U_g$;

S22, if the difference between the outlier factor and the standard value 1 is greater than or equal to the preset range, determining that the data corresponding to the outlier factor is low-quality data, and storing the user corresponding to the low-quality data in the set $U_m$;

after the data corresponding to the outlier factor is determined to be low-quality data in step S22, the method further includes:

judging whether the accuracy of the low-quality data is higher than a preset threshold; if so, storing the users whose low-quality data has an accuracy higher than the preset threshold in the intermediate set (the level between $U_g$ and $U_m$); if not, storing the users whose low-quality data has an accuracy lower than the preset threshold in the set $U_m$;

step S3 specifically includes:

if the user is in the set $U_m$, no reward is provided;

if the user is in the intermediate set, the user obtains a basic reward $\delta$;

if the user is in the set $U_g$, the user obtains a reward $r_i$;

the basic reward $\delta$ obtained by the user is expressed as:

wherein R represents the total reward given by the server; the cardinality of the intermediate set represents the number of users in that set; $r_i$ represents the reward obtained by the user;

the reward $r_i$ obtained by the user is determined as follows:

$$\rho_i = k / LOF(i)$$

the reward $r_i$ obtained by the user is expressed as:

2. The differentiated incentive method for data quality perception in edge computing according to claim 1, wherein in step S1 the LOF algorithm is used to calculate the outlier factor of each point relative to the other points in the established two-dimensional data coordinates, which is expressed as:

$$LOF_k(A) = \frac{\sum_{O \in N_k(A)} \dfrac{lrd_k(O)}{lrd_k(A)}}{|N_k(A)|}$$

wherein $LOF_k(A)$ represents the outlier factor of data A; A represents one piece of data uploaded by a user; $lrd_k(A)$ represents the inverse of the average reachable distance of the points in the kth neighborhood of point A;

3. The differentiated incentive method for data quality perception in edge computing according to claim 1, wherein the reward information corresponding to the user grade provided in step S3 is expressed as:

wherein $r_i$ represents the reward the user receives.