CN109284315A - A kind of label data Statistical Inference under crowdsourcing model - Google Patents

A label data statistical inference method under a crowdsourcing model

Info

Publication number
CN109284315A
CN109284315A CN201810975033.6A CN201810975033A CN109284315A CN 109284315 A CN109284315 A CN 109284315A CN 201810975033 A CN201810975033 A CN 201810975033A CN 109284315 A CN109284315 A CN 109284315A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810975033.6A
Other languages
Chinese (zh)
Other versions
CN109284315B (en)
Inventor
刘端阳
弓箭峰
赵敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Mobihike Intelligent Technology Co.,Ltd.
Original Assignee
Dalian Mobi Hike Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Mobi Hike Intelligent Technology Co Ltd
Priority to CN201810975033.6A
Publication of CN109284315A
Application granted
Publication of CN109284315B
Legal status: Active

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a statistical inference method for label data under a crowdsourcing model. The method defines an optimization objective function, imposes constraints along different dimensions such as annotators and objects, and solves the model through a Lagrangian dual transformation. The Lagrange multipliers in the transformation measure, respectively, the skill level of the annotators and the difficulty of labeling each object, so the optimization result is not affected by annotator skill or by the complexity of individual labeling tasks. When little data is available, slack variables are added so that the result is better.

Description

A label data statistical inference method under a crowdsourcing model
Technical field
The present invention relates to the technical field of data mining and machine learning, and more particularly to a statistical inference method for label data under a crowdsourcing model.
Background art
With the rapid development of Internet technology, crowdsourcing services, as a flexible and effective way of solving problems, have attracted growing attention. Crowdsourcing refers to the practice whereby a company or organization takes tasks formerly performed by employees and outsources them, on a free and voluntary basis, to an unspecified (and usually large) network of people. Crowdsourced tasks are usually undertaken by individuals, but tasks that require several people to cooperate may also take the form of open-source-style individual production.
In recent years, research on all aspects of crowdsourcing has made great progress. Many new applications that use crowdsourcing have been proposed, along with many specific processing methods, and these have achieved good results. Because crowdsourcing applications arise on complex online trading platforms, quality control has become a concern. How to effectively improve the quality of completed tasks and how to identify malicious workers have become urgent problems in current crowdsourcing research. The anonymity of workers on crowdsourcing platforms also makes crowdsourcing very different from traditional outsourcing, so solving crowdsourcing quality problems accurately and efficiently is of great significance.
In existing crowdsourcing models, once the labeled data for a labeling task has been collected, the final result is mainly inferred by voting. This approach yields an objective result when most annotators know the correct answer, but it does not account for the case where only a few annotators know the correct answer, and this case occurs often.
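The weakness of voting can be seen in a minimal Python sketch. The votes and the true label below are hypothetical and chosen only to show a case where the minority is right; they are not data from the source.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical data: five annotators label one hard object whose true class is "B".
votes = ["A", "A", "A", "B", "B"]
true_label = "B"

inferred = majority_vote(votes)
print(inferred, inferred == true_label)
```

Here the three less reliable annotators outvote the two who know the answer, which is the situation that motivates weighting by annotator skill and task difficulty.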
Summary of the invention
The object of the present invention is to overcome the above drawbacks of the prior art by providing a statistical inference method for label data under a crowdsourcing model that uses a Lagrangian dual transformation to bring annotator skill and labeling-task complexity into the constraints.
To achieve the above object, the technical solution of the present invention is as follows:
A label data statistical inference method under a crowdsourcing model, comprising the following steps:
Step1: Establish a unified form for the crowdsourcing data. Let m be the number of annotators, n the number of objects, and c the number of classes. Let z_ijk be the frequency with which annotator i assigns object j to class k in the sample. The distribution that z_ijk follows is denoted π_ij, and its probabilities are denoted π_ijk, where π_ijk is the probability that annotator i assigns object j to class k in the real data, with i = 1~m, j = 1~n, k = 1~c;
Step2: Build a model of how annotators generate class labels for objects. Let y_jl denote the probability that object j belongs to class l, l = 1~c. The purpose of the following steps is to solve for y_jl:
Step2.1: Maximum-entropy model. First maximize the objective function to determine the probability distribution of the annotators' labels:
Step2.2: Then minimize the maximized entropy. The optimization model for inferring y_jl is:
Step3: Using the Lagrangian transformation, introduce the Lagrange multipliers λ_ij, τ_jk, σ_ikl and construct the Lagrangian function:
where τ_jk measures the complexity of a labeling task and σ_ikl measures the skill level of an annotator;
Step4: Convert the optimization in Step2 into its dual problem. The Lagrangian function after conversion is:
Step5: Solve for y_jl iteratively. Let:
Then the first term of L in Step4 is rewritten as:
From this, the iterative expression for y_jl can be determined:
where t = 1~N is the iteration index and the total number of iterations is N.
Preferably, when little labeled data is available for the task, the optimization model of Step2.2 is replaced with:
where the added variables are slack variables, and α_j and its counterpart indexed by i are regularization parameters.
As can be seen from the above technical solution, the present invention infers the objective result by mathematical reasoning, through optimizing an objective function subject to a series of constraints. The result does not depend on individual skill and is not affected by task complexity. The invention therefore has the notable features of following the principle of objectivity and not depending on the skill of individual annotators.
Specific embodiment
Specific embodiments of the present invention are described in further detail below.
A label data statistical inference method under a crowdsourcing model, comprising the following steps:
Step1: Establish a unified form for the crowdsourcing data. Let m be the number of annotators, n the number of objects, and c the number of classes. Let z_ijk be the frequency with which annotator i assigns object j to class k in the sample. The distribution that z_ijk follows is denoted π_ij, and its probabilities are denoted π_ijk, where π_ijk is the probability that annotator i assigns object j to class k in the real data, with i = 1~m, j = 1~n, k = 1~c.
Modeling the annotators' labeling behavior is the basis of the scheme. Labeling is the process by which an annotator attaches a label to an object, and what is established here is a probabilistic model. In the overall process, an annotator labels the attribute value of an object according to existing experience and knowledge. Suppose the annotator is i and the object is j; the labeling behavior of i on j is then a classification of j. When the class of j is completely clear, the probability that i identifies j as class k is 0 or 1. In practice, however, human thinking has a certain randomness and ambiguity, so i classifies j as k with probability p and does not classify j as k with probability (1-p). For object j, its class membership is an objective fact that is not affected by human thinking. Let y_jk denote the probability that object j belongs to class k; this is the value that must be inferred from the given data. As in Step1, π_ijk denotes the probability that annotator i assigns object j to class k in the real data.
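The unified data form of Step1 can be sketched as a three-way count array. This is a minimal sketch, assuming the raw data arrives as (annotator, object, class) triples with zero-based indices; the function name `build_counts` and the sample triples are hypothetical.

```python
import numpy as np

def build_counts(annotations, m, n, c):
    """Build z[i, j, k]: how many times annotator i assigned object j to class k."""
    z = np.zeros((m, n, c))
    for i, j, k in annotations:
        z[i, j, k] += 1
    return z

# Hypothetical triples (annotator, object, class), all zero-based.
annotations = [(0, 0, 1), (1, 0, 1), (2, 0, 0), (0, 1, 0), (1, 1, 0)]
z = build_counts(annotations, m=3, n=2, c=2)

# Summing over annotators gives the per-object vote totals that plain voting uses.
votes_per_object = z.sum(axis=0)
print(votes_per_object)
```

Keeping the annotator axis, instead of collapsing it as voting does, is what lets later steps attach a separate skill term to each annotator.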
Step2: Build a model of how annotators generate class labels for objects. Let y_jl denote the probability that object j belongs to class l, l = 1~c. The purpose of the following steps is to solve for y_jl.
Forming the final result is in fact an optimization process, which is divided into two major parts: designing the objective function and choosing the constraints.
Step2.1: Maximum-entropy model. First maximize the objective function to determine the probability distribution of the annotators' labels:
Constructing the objective function and the constraints is the core of the scheme. The goal is to estimate the values of y_jl from the observed values of z_ijk by statistical inference. With the probabilistic model of labeling behavior already established, inferring the true labels of objects first requires the form of π_ijk to be made explicit, which is done here by maximizing entropy.
Unlike voting, which constrains only along the row direction, this model also constrains along the column direction. Maximizing entropy seeks a distribution π_ijk that is as broad as possible, which matches the setting of collective intelligence.
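The formula for Step2.1 is not reproduced in the text above. Purely as a reading aid, a maximum-entropy problem constrained in two directions (one family of constraints per object and class, one per annotator) could take the following shape. The specific constraints, including the use of y_jl as weights, are an assumption made for illustration and are not taken from the source.

```latex
\max_{\pi}\; -\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{c} \pi_{ijk}\ln\pi_{ijk}
\quad\text{s.t.}\quad
\sum_{i}\pi_{ijk}=\sum_{i}z_{ijk}\;\;\forall j,k;\qquad
\sum_{j}y_{jl}\,\pi_{ijk}=\sum_{j}y_{jl}\,z_{ijk}\;\;\forall i,k,l;\qquad
\sum_{k}\pi_{ijk}=1,\;\;\pi_{ijk}\ge 0 .
```

Under this assumed form, the first constraint family matches the model to the observed counts per object, and the second matches it per annotator, so both directions of the data are respected at once.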
Step2.2: Then minimize the maximized entropy. The optimization model for inferring y_jl is:
The minimization is performed because the classification of object j should not be ambiguous, which is consistent with the goal of obtaining true and reliable values of y_jl.
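The general link between low entropy and an unambiguous classification can be checked numerically. The sketch below only illustrates that link for two hypothetical class distributions; it does not compute the objective of Step2.2, whose formula is not reproduced in the text.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())

fuzzy = [0.5, 0.5]        # hypothetical ambiguous classification
confident = [0.95, 0.05]  # hypothetical near-certain classification
print(entropy(fuzzy), entropy(confident))
```

The ambiguous distribution has the larger entropy, so driving entropy down favors confident class assignments.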
When little labeled data is available for the task, this optimization model may be replaced with:
where the added variables are slack variables, and α_j and its counterpart indexed by i are regularization parameters.
Step3: Using the Lagrangian transformation, introduce the Lagrange multipliers λ_ij, τ_jk, σ_ikl and construct the Lagrangian function:
where τ_jk measures the complexity of a labeling task and σ_ikl measures the skill level of an annotator.
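How the two multipliers could shape the labeling probabilities is sketched below. The source's Lagrangian is not reproduced in the text, so the exponential (softmax) form used here is an assumption: it is the usual closed-form solution of a maximum-entropy problem with linear constraints, with τ_jk acting on the object side and σ_ikl on the annotator side. The function name `labeling_probs` and all numeric values are hypothetical.

```python
import numpy as np

def labeling_probs(y, tau, sigma):
    """pi[i, j, k] proportional to exp(sum_l y[j, l] * (tau[j, k] + sigma[i, k, l])).

    y:     (n, c) class probabilities per object, rows summing to 1
    tau:   (n, c) object-side multipliers (task difficulty)
    sigma: (m, c, c) annotator-side multipliers indexed [i, k, l]
    """
    # Because rows of y sum to 1, the tau term can be added without the y weights.
    score = tau[None, :, :] + np.einsum("jl,ikl->ijk", y, sigma)
    score -= score.max(axis=2, keepdims=True)   # stabilise the exponentials
    expo = np.exp(score)
    return expo / expo.sum(axis=2, keepdims=True)

# Hypothetical sizes: 2 annotators, 3 objects, 2 classes.
rng = np.random.default_rng(0)
y = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
tau = rng.normal(size=(3, 2))
sigma = rng.normal(size=(2, 2, 2))
pi = labeling_probs(y, tau, sigma)
print(pi.shape, pi.sum(axis=2))
```

With all multipliers at zero the sketch returns a uniform distribution over classes, which is the maximum-entropy answer when no constraint is active.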
Step4: Convert the optimization in Step2 into its dual problem. The Lagrangian function after conversion is:
Step5: Solve for y_jl iteratively. Let:
Then the first term of L in Step4 is rewritten as:
From this, the iterative expression for y_jl can be determined:
where t = 1~N is the iteration index and the total number of iterations is N.
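The iteration formula itself is not reproduced in the text, so the sketch below substitutes a generic alternating scheme and should not be read as the patent's exact update. It assumes a softmax labeling model with scores τ_jk + σ_ikl, fits the multipliers by regularized gradient ascent for fixed y, and then refreshes y from the fitted model. The function names, step size, regularization weight, and data are all hypothetical.

```python
import numpy as np

def softmax_k(s):
    """Softmax over the class axis k (axis 2) of s[i, j, k, l]."""
    s = s - s.max(axis=2, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=2, keepdims=True)

def infer_labels(z, n_iter=20, inner_steps=50, lr=0.1, reg=0.1):
    m, n, c = z.shape
    counts = z.sum(axis=2)                      # labels annotator i gave object j
    y = z.sum(axis=0) + 1e-9
    y /= y.sum(axis=1, keepdims=True)           # start from vote fractions
    tau = np.zeros((n, c))
    sigma = np.zeros((m, c, c))
    for _ in range(n_iter):
        # Fit the multipliers for the current y (regularized gradient ascent).
        for _ in range(inner_steps):
            s = tau[None, :, :, None] + sigma[:, None, :, :]   # indexed [i, j, k, l]
            pi = softmax_k(s)
            resid = y[None, :, None, :] * (z[..., None] - counts[:, :, None, None] * pi)
            tau += lr * (resid.sum(axis=(0, 3)) - reg * tau)
            sigma += lr * (resid.sum(axis=1) - reg * sigma)
        # Refresh y: score each candidate class l by the log-probability of the labels.
        s = tau[None, :, :, None] + sigma[:, None, :, :]
        log_y = np.einsum("ijk,ijkl->jl", z, np.log(softmax_k(s)))
        log_y -= log_y.max(axis=1, keepdims=True)
        y = np.exp(log_y)
        y /= y.sum(axis=1, keepdims=True)
    return y, tau, sigma

# Hypothetical data: 3 annotators, 4 objects, 2 classes, one label per pair.
z = np.zeros((3, 4, 2))
labels = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]   # labels[i][j]
for i, row in enumerate(labels):
    for j, k in enumerate(row):
        z[i, j, k] = 1
y, tau, sigma = infer_labels(z)
print(y.round(3))
```

The outer loop plays the role of the N iterations in Step5. The regularization term is added only to keep the multipliers bounded on small data sets.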
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (2)

1. A label data statistical inference method under a crowdsourcing model, comprising the following steps:
Step1: Establish a unified form for the crowdsourcing data. Let m be the number of annotators, n the number of objects, and c the number of classes. Let z_ijk be the frequency with which annotator i assigns object j to class k in the sample. The distribution that z_ijk follows is denoted π_ij, and its probabilities are denoted π_ijk, where π_ijk is the probability that annotator i assigns object j to class k in the real data, with i = 1~m, j = 1~n, k = 1~c;
Step2: Build a model of how annotators generate class labels for objects. Let y_jl denote the probability that object j belongs to class l, l = 1~c. The purpose of the following steps is to solve for y_jl:
Step2.1: Maximum-entropy model. First maximize the objective function to determine the probability distribution of the annotators' labels:
Step2.2: Then minimize the maximized entropy. The optimization model for inferring y_jl is:
Step3: Using the Lagrangian transformation, introduce the Lagrange multipliers λ_ij, τ_jk, σ_ikl and construct the Lagrangian function:
where τ_jk measures the complexity of a labeling task and σ_ikl measures the skill level of an annotator;
Step4: Convert the optimization in Step2 into its dual problem. The Lagrangian function after conversion is:
Step5: Solve for y_jl iteratively. Let:
Then the first term of L in Step4 is rewritten as:
From this, the iterative expression for y_jl can be determined:
where t = 1~N is the iteration index and the total number of iterations is N.
2. The label data statistical inference method under a crowdsourcing model according to claim 1, characterized in that, when little labeled data is available for the task, the optimization model of Step2.2 is replaced with:
where ξ_jk and the accompanying variable are slack variables, and α_j and its counterpart indexed by i are regularization parameters.
CN201810975033.6A 2018-08-24 2018-08-24 Label data statistical inference method in crowdsourcing mode Active CN109284315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810975033.6A CN109284315B (en) 2018-08-24 2018-08-24 Label data statistical inference method in crowdsourcing mode

Publications (2)

Publication Number Publication Date
CN109284315A (en) 2019-01-29
CN109284315B (en) 2021-04-23

Family

ID=65183631

Country Status (1)

Country Link
CN (1) CN109284315B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275079A (en) * 2020-01-13 2020-06-12 浙江大学 Crowdsourcing label speculation method and system based on graph neural network
CN111444937A (en) * 2020-01-15 2020-07-24 湖州师范学院 Crowdsourcing quality improvement method based on integrated TSK fuzzy classifier
CN111444937B (en) * 2020-01-15 2023-05-12 湖州师范学院 Crowd-sourced quality improvement method based on integrated TSK fuzzy classifier

Also Published As

Publication number Publication date
CN109284315B (en) 2021-04-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200429

Address after: 518000 Guangdong city of Shenzhen province Qianhai Shenzhen Hong Kong cooperation zone before Bay Road No. 1 building 201 room A

Applicant after: Shenzhen Mobi hi Ke raspberry intelligent robot Co.,Ltd.

Address before: Room 310, No. 1 Pioneer Port E, Dalian Hi-tech Industrial Park, 116000 Liaoning Province

Applicant before: DALIAN MOBI HAIKE INTELLIGENT TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20200731

Address after: Room 107, building a, Chuangye No.1 building, No.43 Yanshan Road, Yanshan community, zhaoshang street, Nanshan District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen mobihaike data intelligent technology Co.,Ltd.

Address before: 518000 Guangdong city of Shenzhen province Qianhai Shenzhen Hong Kong cooperation zone before Bay Road No. 1 building 201 room A

Applicant before: Shenzhen Mobi hi Ke raspberry intelligent robot Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20201223

Address after: 1504, building 1, shuimuyifang building, 286 Nanguang Road, dawangshan community, Nantou street, Nanshan District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen Mobi hi Ke raspberry intelligent robot Co.,Ltd.

Address before: Room 107, building a, Chuangye No.1 building, No.43 Yanshan Road, Yanshan community, zhaoshang street, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: Shenzhen mobihaike data intelligent technology Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230920

Address after: 2nd and 3rd floors of Building B-11, No. 29 Xuehai Road, Big Data Industrial Park, Yannan High tech Zone, Yancheng City, Jiangsu Province, 224000 (CNK)

Patentee after: Jiangsu Mobihike Intelligent Technology Co.,Ltd.

Address before: 1504, building 1, shuimuyifang building, 286 Nanguang Road, dawangshan community, Nantou street, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen Mobi hi Ke raspberry intelligent robot Co.,Ltd.