CN111028080A

CN111028080A - Crowd sensing data dynamic transaction method based on a multi-armed bandit and the Shapley value

Info

Publication number: CN111028080A
Application number: CN201911250169.1A
Authority: CN
Inventors: 徐畅; 司雅蕴; 祝烈煌; 张川; 张璨; 饶鸿洲
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2019-12-09
Filing date: 2019-12-09
Publication date: 2020-04-17

Abstract

The invention relates to a crowd sensing data dynamic transaction method based on a multi-armed bandit and the Shapley value, and belongs to the technical field of big data and crowd sensing. The invention first uses the Shapley value to determine the marginal contribution of each "worker"'s data to a "buyer", which comprises the direct contribution of new data and the indirect contribution of redundant data. The "buyer" then selects the "worker" with the higher marginal contribution and offers an intended transaction price. To improve the success rate of transactions and obtain the maximum return, the "buyer" applies a learning strategy. To resolve the dilemma between offering a high price to guarantee a successful transaction and probing the "worker"'s bottom line to obtain a greater return, a contextual multi-armed bandit model is used for learning: the strategy selects the best observable price in each round and is gradually adjusted to approach the psychological bottom line of the "worker". The "worker" price inferred by the method is expected to be closer to the actual value, so the "buyer" obtains a greater return.

Description

Crowd sensing data dynamic transaction method based on a multi-armed bandit and the Shapley value

Technical Field

The invention relates to a dynamic data transaction method under crowd sensing, in particular to a dynamic data transaction method based on a multi-armed bandit and the Shapley value, and belongs to the technical field of big data and crowd sensing.

Background

In recent years, with the rapid development of wireless communication and sensor technologies and the rapid popularization of wireless mobile intelligent terminals, most smartphones and tablet computers integrate sensing modules with powerful computing and sensing functions, such as a Global Positioning System (GPS) receiver, an accelerometer, a gyroscope, a microphone and a camera, so that people can sense their surroundings and acquire related data anytime and anywhere. A large number of applications based on perception information continue to emerge, such as environmental monitoring, traffic monitoring and social networking applications. This growing range of applications has prompted the birth and development of crowd sensing.

In the context of crowd sensing, some organizations (such as meteorological centers and traffic management departments) urgently need real-time distributed data and become the party that purchases data, called the "buyer"; the users who upload perception data through intelligent terminals are the party that sells data, called the "worker". As long as the supply-demand relationship persists, there is always a "buyer" paying for valuable data, which naturally forms a data market. This kind of monetary data transaction mechanism can be regarded as a dynamic zero-sum game: both parties seek to maximize their own interest, which at the same time means a loss to the other. From the "buyer"'s perspective, the aim is to obtain more valuable data at the lowest price.

At present, the data market formed in the context of crowd sensing still has some limitations, so that the scenario cannot be fully marketized and the data market cannot be extended to wider application scenarios. It is assumed that "workers" cannot communicate with one another in the trading market: "workers" cannot see or coordinate each other's bids, and thus cannot form a sellers' alliance to control prices. That is, the "buyer" knows the bids of all "workers" in the market, whereas each "worker" knows only its own bid and not the quotations of the market as a whole.

To maximize the benefit of the "buyer", the performance of different "workers" needs to be measured, and the value of the perception data affects the final decision of the "buyer". Some conventional solutions hold that the performance of a "worker", or the objective quality of the data, completely determines the value of the current data to a particular "buyer". Others hold that environmental factors, such as the time and location at which the data is collected, have a significant impact on the value of the data. However, these views are all one-sided. The data transaction process can be divided into multiple time rounds, and as the rounds progress the "buyer" gradually acquires more data. That is, most of the time the "buyer" itself can be considered to hold a stored data set. Under this premise, even the same perception data is likely to be of different value to different "buyers". For example, suppose there are two "buyers" A and B, where A already holds data 1 and B does not; data 1 is then likely to be more valuable to B. "Buyers" therefore tend to choose the data that is more valuable to them.

In addition to the value of data, another important factor in the data market is the final transaction price between buyer and seller. Determining the transaction price of valuable data is a major challenge in this scenario. One conceivable method is for the buyer and the seller to negotiate: through multiple rounds of discussion they gradually converge on a price acceptable to both parties and conclude a contract. However, because the communication cost is high, this method is better suited to scenarios with few bidding rounds. In crowd sensing the number of participating entities is very large, and in particular the number of "workers" may far exceed the number of "buyers", so having each "buyer" contract with a large number of "workers" one by one is feasible only in theory.

"Buyers" are more inclined to make decisions based on observed environmental information. The environmental information here refers to the "worker"'s expectation of the transaction price. This value may fluctuate within certain limits and is private in nature. Some incentive mechanisms have been designed in the past to encourage "workers" to state their desired price on the market. Other schemes assume by default that this expectation is public; in other words, that the "buyer" fully knows in advance the probability distribution of the "worker"'s transaction price, which is somewhat impractical. Specifically, the "worker" may state a preliminary psychological price when offering the data for sale, but this is not equal to the final transaction price, and information asymmetry between the two parties remains. The "buyer" wishes to directly predict the transaction price closest to the "worker"'s psychological bottom line, without multiple rounds of negotiation, so as to trade successfully and obtain the data at minimal expense.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art and to solve the technical problem of how, in a big-data crowd sensing scenario, a data demander ("buyer") can find the optimal data seller ("worker") in a data transaction market over multiple rounds of data transactions and purchase data at a relatively optimal price. To this end, the invention provides a crowd sensing data dynamic transaction method based on a multi-armed bandit and the Shapley value.

The core of the method is as follows: in a crowd sensing data market where data is traded over multiple rounds, the problem of maximizing the return of the data collector is solved. The Shapley value is first used to determine the marginal contribution of each worker's data to the data collector. This contribution is split into two parts: a direct contribution from new data and an indirect contribution from redundant data. The data collector then selects the workers with the higher marginal contributions and offers an intended transaction price. To improve the success rate of transactions and obtain the maximum return, the data collector applies a learning strategy. To resolve the dilemma between offering a high price to guarantee a successful transaction and probing the worker's bottom line to obtain a greater return, a contextual multi-armed bandit model is used for learning: the strategy selects the best observable price in each round and is gradually adjusted to approach the psychological bottom line of the workers.

Advantageous effects

Compared with the prior art, the method of the invention has the following advantages:

1. In data transactions in the crowd sensing scenario, the problem of maximizing the return of the data collector is considered, i.e. how to obtain the maximum return when purchasing data.

2. The value of the perception data is evaluated dynamically. The value of the perception data collected by "workers" in different time rounds is modeled with the Shapley value, i.e. the marginal contribution of a "worker"'s new data set to the "buyer"'s original data set. The marginal contribution includes the direct contribution of new data to the original data set and the indirect contribution of redundant data.

3. A contextual multi-armed bandit model is used as the pricing model between "buyers" and "workers". Given the time-varying supply-demand relationship in the market, the value of data changes across time rounds, which is the contextual attribute. The "worker" price inferred in this way is expected to be closer to the actual value, and the "buyer" thereby obtains a greater return.

Drawings

FIG. 1 is a diagram of a system model in the process of the present invention;

FIG. 2 is a schematic illustration of collected data and uncollected data for a "buyer" in the method of the present invention;

FIG. 3 shows the direct contribution made by the data of "workers" in the method of the present invention;

FIG. 4 is a graph of cumulative average revenue over time rounds in the method of the present invention;

FIG. 5 shows the direct contribution to a "buyer" in different time rounds in the method of the present invention;

FIG. 6 shows the indirect contribution to a "buyer" in the method of the present invention;

FIG. 7 shows the regret of the LinUCB method for different price selection intervals of "workers" in the method of the present invention;

FIG. 8 is a graph of the performance of the LinUCB method for different values of α in the method of the present invention.

Detailed Description

The following describes in further detail embodiments of the method of the present invention with reference to the accompanying drawings and examples.

As shown in FIG. 1, a crowd sensing data dynamic transaction method based on a multi-armed bandit and the Shapley value has the following technical scheme:

In the crowd sensing scenario there are two main kinds of subjects: "buyers", who collect and purchase perception data, and "workers", who collect and sell perception data. There are many "buyers" and many "workers", but the number of "workers" is much greater than the number of "buyers".

There is always a trade relationship between "buyers" and "workers" due to the constant demand for sensory data. This scenario can therefore be viewed as a dynamically changing data market. For convenience of description, the transaction process is divided into a plurality of time rounds, and the time of one transaction is regarded as one time round.

Over a number of time rounds, the "buyer" gradually accumulates the perception data it requires; that is, the "buyer" can be regarded as holding a perception data set. Specifically, in one round the "buyer" assesses the data of different "workers", calculates the marginal value of each new perception data set, selects the data set with the highest marginal value, and enters a pre-transaction stage.

The "buyer" spends money to purchase the perception data of the "worker". To ensure that the transaction succeeds while still expecting the highest return, the "buyer" predicts the psychological price bottom line of the "worker" from the relationship between historical data values and transaction prices. Here the return is defined as the difference between the value of the data and the transaction price.

Step 1: the value of the perception data is evaluated.

The specific evaluation method is as follows:

At time round t, all data on the market is defined as

The market here does not necessarily refer to the entire market: given the distance between the two parties and the number of "workers", it is clearly impractical to communicate with "workers" that are too far away or too numerous. The market therefore refers to a non-empty subset of the original market, obtained by segmentation, within which entities can communicate without obstacles.

Let the data set held by "worker" u_i be

and the data set held by "buyer" c_j be

where 0 < Ω_i ≪ Ω_j < N, and N represents the total amount of data on the market.

The demand of "buyer" c_j for data at time round t is defined as

Step 1.1: Determine the direct contribution and the indirect contribution that make up the marginal value.

The Shapley value is used to measure how much benefit the perception data provided by a "worker" can bring to the "buyer".

Define a value function v that maps every subset of N to ℝ, the real number field, where v(N) represents the value of the finite data set N. The marginal contribution of data d_i to a data set S is:

Δ_{d_i}(v, S) = v(S ∪ {d_i}) - v(S) (1)

The Shapley value is defined as follows:
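For reference, the Shapley value is conventionally written in the textbook form sketched below, using the marginal contribution of formula (1). The summation over subsets S and the factorial weights are assumptions taken from the general definition, not from the filing itself.

```latex
\psi_i(v, N) \;=\; \sum_{S \subseteq N \setminus \{d_i\}}
  \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}\;
  \Delta_{d_i}(v, S)
```

The weights sum to 1 over all subsets S, which is why the result can be read as an average of marginal contributions.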

ψ_i(v, N) is the average of all marginal contributions, i.e. the contribution of new data to the original data set. New data is data that the "worker" holds and the "buyer" does not. For the data held by one "worker", the new data set is represented as

For a single datum

it represents a direct contribution:

For a "worker", the direct contribution is equal to the sum of the contributions of all its new data, i.e.:

The indirect contribution is the contribution that redundant data makes to the "buyer" indirectly, by lowering the price of the same type of data on the market during the transaction. Redundant data is the portion of the data held by the new "worker" that is already owned by old "workers". The indirect value is defined as follows:

where

denotes the redundant data of "worker" u_i with respect to data collector c_j, φ_j denotes the set of "workers" whose data the data collector c_j has accessed or purchased, and

denotes the data set owned by an old "worker" u_l.

Step 1.2: The value of the data is evaluated based on the direct contribution and the indirect contribution.

The total contribution of the data set of a new "worker" u_i to "buyer" c_j is equal to the sum of the direct contribution of the new data in the data set and the indirect contribution of the redundant data.
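Step 1 can be illustrated with a small brute-force sketch. It computes exact Shapley values for a toy market and sums them over a worker's new data to obtain a direct contribution. The additive value function, the item names `d1` to `d3` and the helper names are hypothetical choices made for illustration only; the indirect contribution is not computed because its formula is not given here.

```python
from itertools import combinations
from math import factorial


def shapley_values(items, value):
    """Exact Shapley value of each item under the set function `value`."""
    items = list(items)
    n = len(items)
    result = {}
    for d in items:
        others = [x for x in items if x != d]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of d to S, as in formula (1).
                total += weight * (value(s | {d}) - value(s))
        result[d] = total
    return result


def direct_contribution(worker_data, buyer_data, market, value):
    """Sum of Shapley values of the data the worker holds and the buyer lacks."""
    psi = shapley_values(market, value)
    new_data = set(worker_data) - set(buyer_data)
    return sum(psi[d] for d in new_data)


# Hypothetical additive value function: each datum has a fixed worth.
WORTH = {"d1": 3.0, "d2": 5.0, "d3": 2.0}


def additive_value(subset):
    return sum(WORTH[d] for d in subset)


market = ["d1", "d2", "d3"]
psi = shapley_values(market, additive_value)
# The worker holds d1 and d2; the buyer already holds d1, so only d2 is new.
direct = direct_contribution({"d1", "d2"}, {"d1"}, market, additive_value)
```

The enumeration is exponential in the number of data items, so it is only practical for small markets such as the 10-item example used later in the embodiment.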

Step 2: the transaction price of the data is evaluated.

The specific evaluation method is as follows:

After the data value evaluation, the "buyer" determines the counterparty for the current round and uses the upper confidence bound (UCB) model of the multi-armed bandit algorithm to predict and approach the psychological price bottom line of the "worker" so as to obtain the maximum return. Because the estimate may be wrong, a transaction can either succeed or fail.

In the multi-armed bandit algorithm, each historical transaction price is defined as an "arm" of the bandit. For one arm, let X_t denote the sequence of rewards obtained when it was selected in the previous t rounds; the arm then has an actual mean r and a sample mean

where n represents the number of times the arm has been selected. X_i - r is a random variable following a Gaussian distribution with parameter σ. By Chebyshev's inequality:

where

is the variance of all samples X,

is the mathematical expectation of all samples X, and ε is any value greater than 0. Under the Gaussian distribution, the above formula is equivalent to:

Rearranging formula (9) gives:

Meanwhile, at round t the "buyer" has only collected the samples X_1, ..., X_{t-1} from the first t - 1 rounds. For each "arm", the most plausible candidate for its unknown mean, i.e. the upper confidence bound (UCB), is obtained:

UCB_i(t-1, δ) = ∞, X_{t-1} = 0 (11)

where

represents the difference between the predicted upper bound of the reward and the mean reward of the current arm. As the number of rounds t increases, ln t increases, which means that the uncertainty of the estimate grows. Selecting the arm with the highest confidence bound therefore makes the strategy exploratory. At the same time, each time an "arm" is selected its selection count increases, which decreases this term and reduces the uncertainty about that arm. As the number of rounds increases, the overall uncertainty is kept within a limited range. The reward of the selected arm gradually approaches the actual expected reward, which means that in each round the selected arm is the best choice given the environmental data collected so far.
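The exploration behaviour described above can be sketched with a generic UCB1-style index over a handful of candidate prices. This is a minimal illustration, not the filing's exact formula (11): the bonus term sqrt(2 ln t / n), the scaling of rewards to [0, 1], the price grid and the simulated worker with a noisy bottom line are all assumptions.

```python
import math
import random


def ucb_index(mean, count, t):
    """Sample mean plus an exploration bonus; unplayed arms get infinity."""
    if count == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / count)


def run_price_bandit(prices, value, bottom_line, rounds, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(prices)
    means = [0.0] * len(prices)
    for t in range(1, rounds + 1):
        scores = [ucb_index(means[i], counts[i], t) for i in range(len(prices))]
        arm = scores.index(max(scores))
        # Hypothetical worker: accepts only if the offer reaches a noisy bottom line.
        theta = rng.gauss(bottom_line, 1.0)
        accepted = prices[arm] >= theta
        # Return is value minus price on success and zero on failure, scaled to [0, 1].
        reward = (value - prices[arm]) / value if accepted else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


prices = [100, 120, 140, 160, 180]
counts, means = run_price_bandit(prices, value=250.0, bottom_line=125.0, rounds=2000)
most_played_price = prices[counts.index(max(counts))]
```

Under these assumptions the arm that is played most often is the lowest price that the simulated worker still reliably accepts, which mirrors the idea of approaching the worker's bottom line from above.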

Moreover, since this is a contextual problem, the value of the data may change from round to round. The problem is therefore defined as a contextual multi-armed bandit model whose core idea is the multi-armed bandit algorithm described above. The model has three variables in total, including a two-dimensional feature vector determined by the observed environmental factors, X_{t,i} = (v_{t-1}, 1)^T, where v_{t-1} denotes the value of a particular datum in round t - 1. In addition, I_p denotes the arm with price p,

denotes the number of times arm I_p has been selected in the first t - 1 rounds, given by

F_θ(p) denotes the probability that the "worker" accepts price p, and

denotes an unknown parameter vector.

In the model, the feature vectors are the independent variables and the expected reward is the dependent variable. The problem is therefore modeled as a linear regression problem, with the mapping between historical feature vectors and rewards serving as training samples. Specifically, when price p_i is selected, let

be the rounds in which this price is selected. Let D_i ∈ R^{l×2} be the l contexts observed under arm p_i, with:

c_i ∈ R^l is the corresponding reward vector observed for each price over the n_i rounds. Using the training data (D_i, c_i), the optimal solution of the coefficient vector is estimated by least squares

Using ridge regression:

where I_2 is the two-dimensional identity matrix.
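For orientation, the quantities in this part of the derivation match the standard disjoint LinUCB formulation, in which the ridge-regression estimate takes the form sketched below. The symbol \hat{\theta}_i and the closed form of A_i are assumptions borrowed from that standard formulation rather than taken from the text here.

```latex
\begin{aligned}
A_i &= D_i^{\top} D_i + I_2, \\
\hat{\theta}_i &= A_i^{-1} D_i^{\top} c_i .
\end{aligned}
```

Adding I_2 keeps A_i invertible even before any context has been observed for arm p_i.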

In this model, the expected reward

is estimated as

and the standard deviation is expressed as

where A_{i,t} is a parameter matrix that is initialized to I_2, updated in each round by A_{i,t} ← X_{t,i} X_{t,i}^T, and eventually converges. The optimal arm in round t is therefore:

for the constant

where δ is any value greater than zero.
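A per-arm LinUCB update with the two-dimensional feature vector can be sketched in plain Python as follows. This is a hedged illustration of the general technique: the cumulative updates A ← A + x xᵀ and b ← b + r x are the standard LinUCB form and are an assumption here, as are the class name `LinUCBArm`, the helper `select_price` and the free exploration parameter `alpha`.

```python
import math


class LinUCBArm:
    """One price arm with a two-dimensional ridge-regression reward model."""

    def __init__(self):
        self.A = [[1.0, 0.0], [0.0, 1.0]]  # starts as the identity I_2
        self.b = [0.0, 0.0]

    def _inverse(self):
        # Closed-form inverse of the 2x2 matrix A.
        (a, b), (c, d) = self.A
        det = a * d - b * c
        return [[d / det, -b / det], [-c / det, a / det]]

    def score(self, x, alpha):
        """Estimated reward plus alpha times the confidence width for context x."""
        inv = self._inverse()
        theta = [inv[0][0] * self.b[0] + inv[0][1] * self.b[1],
                 inv[1][0] * self.b[0] + inv[1][1] * self.b[1]]
        mean = theta[0] * x[0] + theta[1] * x[1]
        ix = [inv[0][0] * x[0] + inv[0][1] * x[1],
              inv[1][0] * x[0] + inv[1][1] * x[1]]
        width = math.sqrt(x[0] * ix[0] + x[1] * ix[1])
        return mean + alpha * width

    def update(self, x, reward):
        """Accumulate the outer product x x^T into A and reward * x into b."""
        for r in range(2):
            for c in range(2):
                self.A[r][c] += x[r] * x[c]
            self.b[r] += reward * x[r]


def select_price(arms, prices, x, alpha=1.0):
    """Return the price whose arm has the highest upper confidence score."""
    scores = [arm.score(x, alpha) for arm in arms]
    return prices[scores.index(max(scores))]


prices = [120, 140, 160]
arms = [LinUCBArm() for _ in prices]
x = (1.0, 1.0)                      # context (scaled value, 1), illustrative only
first_choice = select_price(arms, prices, x)
arms[1].update(x, reward=3.0)       # pretend the second price paid off
second_choice = select_price(arms, prices, x)
updated_score = arms[1].score(x, alpha=1.0)
```

In practice the value component of the context would usually be rescaled (for example divided by its maximum) so that the identity prior on A remains meaningful.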

Step 3: Determine the optimal "worker" from which to purchase data according to the data value evaluation result obtained in step 1. Then, for the selected "worker", obtain the data transaction price evaluation result according to step 2, determine the optimal price, and purchase the data.
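The way the three steps fit together can be sketched as below. The function name `choose_transaction`, the worker identifiers and the placeholder pricing rule are hypothetical; in the method itself the price would come from the learned bandit strategy of step 2.

```python
def choose_transaction(contributions, estimate_price):
    """Pick the worker with the largest total contribution, then price the offer.

    contributions: dict mapping worker id -> (direct, indirect) contribution.
    estimate_price: hypothetical callable that returns an offer for a data value.
    """
    totals = {w: direct + indirect for w, (direct, indirect) in contributions.items()}
    best_worker = max(totals, key=totals.get)
    price = estimate_price(totals[best_worker])
    return best_worker, totals[best_worker], price


# Illustrative numbers only; the lambda stands in for the learned pricing strategy.
contributions = {"u1": (4.0, 0.5), "u2": (6.0, 0.2), "u3": (5.0, 1.5)}
worker, value, price = choose_transaction(contributions, lambda v: 0.5 * v)
```

Here the indirect contribution changes the outcome: `u2` has the largest direct contribution, but `u3` has the largest total.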

Examples

The embodiment has two parts. The first part covers the many-to-one relationship in which the data collector determines the seller that yields the highest return; the second part covers the one-to-one transaction relationship and illustrates how the data collector uses the LinUCB learning strategy to complete the transaction at a near-ideal price.

In the example there are 10 data items to be collected, with a total of 10 data collectors and 50 "workers"; each round of transactions lasts 10 units of time, for a total of 100 rounds.

FIG. 3 shows the direct contribution of the data held by the "workers", with the z-axis representing the amount of data held by each "worker". Clear stratification appears in the graph. The value of a given datum differs between collectors, and because a collector's data set is not empty initially, the collector has preferences when selecting "workers"; this leads to differences in the value of the data held by the "workers" and hence to the stratification.

FIG. 5 and FIG. 6 show, respectively, the direct and indirect contributions of data to the same collector in different transaction rounds. The value of the data generally does not fluctuate much across rounds, and because the data collector already holds some data, the direct contribution of part of the data is always 0. The indirect contribution is the average of the historical direct contributions of R_{i,j} and D_j. The indirect contribution is small compared with the direct contribution, because the probability that the redundant data of a "worker" in a given round coincides with data the collector already holds is small; nevertheless, the indirect contribution of data is not negligible.

In the decision part of the data collector, the value v of the collected data is uniformly distributed over the interval [200, 300]; the price θ expected by the "worker" follows a normal distribution N(μ_θ, 1) with μ_θ = v/2; the price offered by the data collector follows a uniform distribution over [0, 400]; a total of 1000 transactions were conducted.
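The simulation settings above can be reproduced with a few lines of code. The sketch below only models the environment with offers drawn at random and no learning, so it serves as a baseline rather than as the LinUCB strategy; the function name and the seed are assumptions.

```python
import random


def simulate_random_offers(rounds=1000, seed=0):
    """Average revenue and failure rate when offers are drawn uniformly at random."""
    rng = random.Random(seed)
    revenue = 0.0
    failures = 0
    for _ in range(rounds):
        v = rng.uniform(200.0, 300.0)    # value of the data to the collector
        theta = rng.gauss(v / 2.0, 1.0)  # worker's expected price, N(v/2, 1)
        p = rng.uniform(0.0, 400.0)      # collector's offer, drawn without learning
        if p >= theta:
            revenue += v - p             # success: return is value minus price
        else:
            failures += 1                # offer below the worker's expected price
    return revenue / rounds, failures / rounds


avg_revenue, failure_rate = simulate_random_offers()
```

Comparing a learning strategy against this kind of baseline makes it easier to see how much of the revenue is due to learning the worker's bottom line.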

The results of the LinUCB strategy are shown in FIG. 4, where the vertical axis represents the cumulative average revenue of the data collector. In the first 100 rounds the curve oscillates markedly, reaching a minimum of -10 near round 100. The revenue curve then begins to increase steadily, with the following relationship to the number of rounds T:

The revenue is approximately 83.5 at 1000 rounds, but by 3000 rounds it has only increased to 91.1. The figure also shows that every failed transaction occurs because the price p offered by the collector is lower than the price θ expected by the "worker"; the failed rounds account for approximately 5% of the total.

Claims

1. A crowd sensing data dynamic transaction method based on a multi-armed bandit and the Shapley value, characterized in that:

the crowd sensing scenario includes two kinds of subjects: "buyers", who collect and purchase perception data, and "workers", who collect and sell perception data; there is always a transaction relationship between "buyers" and "workers"; the transaction process is divided into a plurality of time rounds, wherein the time of one transaction is regarded as one time round;

step 1: evaluating the value of the perception data;

using the Shapley value, determining the benefit that the perception data provided by each "worker" can bring to the "buyer", i.e. the marginal contribution, which includes two parts, the direct contribution of new data and the indirect contribution of redundant data; evaluating the value of the data according to the marginal contribution, wherein the total contribution of the data set of a new "worker" to the "buyer" is equal to the sum of the direct contribution of the new data in the data set and the indirect contribution of the redundant data;

step 2: evaluating the transaction price of the data by using a multi-armed bandit algorithm as the pricing model between "buyers" and "workers";

step 3: determining the optimal "worker" from which to purchase data according to the data value evaluation result obtained in step 1; then, for the selected "worker", obtaining the data transaction price evaluation result according to step 2, determining the optimal price, and purchasing the data.

2. The crowd sensing data dynamic transaction method based on a multi-armed bandit and the Shapley value according to claim 1, wherein the marginal contribution in step 1 is obtained as follows:

at time round t, all data on the market is defined as

the market is a non-empty subset of the original market, obtained by segmentation, within which entities can communicate without obstacles;

let the data set held by "worker" u_i be

and the data set held by "buyer" c_j be

where 0 < Ω_i ≪ Ω_j < N, and N represents the total amount of data on the market;

the demand of "buyer" c_j for data at time round t is defined as

a value function v is defined that maps every subset of N to ℝ, the real number field, where v(N) represents the value of the finite data set N; the marginal contribution of data d_i to a data set S is:

Δ_{d_i}(v, S) = v(S ∪ {d_i}) - v(S) (1)

the Shapley value is defined as follows:

ψ_i(v, N) is the average of all marginal contributions, i.e. the contribution of new data to the original data set; new data is data that the "worker" holds and the "buyer" does not; for the data held by one "worker", the new data set is represented as

for a single datum

it represents a direct contribution:

the indirect contribution is the contribution that redundant data makes to the "buyer" indirectly, by lowering the price of the same type of data on the market during the transaction; redundant data is the portion of the data held by the new "worker" that is already owned by old "workers"; the indirect value is defined as follows:

where

denotes the redundant data of "worker" u_i with respect to "buyer" c_j, φ_j denotes the set of "workers" whose data "buyer" c_j has accessed or purchased, and

denotes the data set owned by an old "worker" u_l.

3. The crowd sensing data dynamic transaction method based on a multi-armed bandit and the Shapley value according to claim 1, wherein evaluating the transaction price of the data in step 2 comprises the following:

predicting and approaching the psychological price bottom line of the "worker" by using the upper confidence bound model of the multi-armed bandit algorithm so as to obtain the maximum return; defining each historical transaction price as an "arm" of the bandit; for one arm, letting X_t denote the sequence of rewards obtained when it was selected in the previous t rounds, the arm then has an actual mean r and a sample mean

where n represents the number of times the arm has been selected; X_i - r is a random variable following a Gaussian distribution with parameter σ; by Chebyshev's inequality:

where

is the variance of all samples X,

is the mathematical expectation of all samples X, and ε is any value greater than 0; under the Gaussian distribution, the above formula is equivalent to:

rearranging formula (9) gives:

meanwhile, at round t the "buyer" has only collected the samples X_1, ..., X_{t-1} from the first t - 1 rounds; for each "arm", the most plausible candidate for its unknown mean, i.e. the upper confidence bound UCB, is obtained:

UCB_i(t-1, δ) = ∞, X_{t-1} = 0 (11)

where

represents the difference between the predicted upper bound of the reward and the mean reward of the current arm;

the model has three variables in total, including a two-dimensional feature vector determined by the observed environmental factors, X_{t,i} = (v_{t-1}, 1)^T, where v_{t-1} denotes the value of a particular datum in round t - 1; in addition, I_p denotes the arm with price p,

denotes the number of times arm I_p has been selected in the first t - 1 rounds, given by

F_θ(p) denotes the probability that the "worker" accepts price p;

denotes an unknown parameter vector;

when price p_i is selected, let

be the rounds in which this price is selected; let D_i ∈ R^{l×2} be the l contexts observed under arm p_i, with:

c_i ∈ R^l is the corresponding reward vector observed for each price over the n_i rounds; using the training data (D_i, c_i), the optimal solution of the coefficient vector is estimated by least squares

using ridge regression:

where I_2 is the two-dimensional identity matrix;

in this model, the expected reward

is estimated as

and the standard deviation is expressed as

where A_{i,t} is a parameter matrix that is initialized to I_2, updated in each round by A_{i,t} ← X_{t,i} X_{t,i}^T, and eventually converges;

the optimal arm in round t is:

for the constant

where δ is any value greater than zero.