CN112598286A - Crowdsourcing user cheating behavior detection method and device and electronic equipment - Google Patents


Info

Publication number
CN112598286A
Authority
CN
China
Prior art keywords
user
historical
crowdsourcing
answer data
cheating
Prior art date
Legal status
Pending
Application number
CN202011556332.XA
Other languages
Chinese (zh)
Inventor
黄鹤南
Current Assignee
Beijing Baige Feichi Technology Co.,Ltd.
Original Assignee
Zuoyebang Education Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Zuoyebang Education Technology Beijing Co Ltd
Priority to CN202011556332.XA
Publication of CN112598286A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06398 Performance of employee with respect to a job function
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Educational Administration (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of the Internet and provides a method and device for detecting cheating behavior of crowdsourcing users, and an electronic device. The method comprises the following steps: obtaining valid historical answer data of crowdsourcing users; training a cheating model using the user behavior information and question attribute information corresponding to the valid historical answer data as features, and whether the user corresponding to that data cheated as the label; and inputting the user behavior information and question attribute information of the current crowdsourcing user's current answer into the trained cheating model to detect whether the current crowdsourcing user cheated in that answer. With the invention, cheating by crowdsourcing users can be detected automatically merely by extracting the user behavior and question attributes during the answering process, without manual monitoring, which saves monitoring time and labor cost.

Description

Crowdsourcing user cheating behavior detection method and device and electronic equipment
Technical Field
The invention belongs to the technical field of the Internet, is particularly suitable for Internet-based online education, and more particularly relates to a method and device for detecting cheating behavior of crowdsourcing users, an electronic device, and a computer-readable medium.
A crowdsourcing user is a user who solves the tasks associated with each step of question production. For example: in the duplicate-judgment step, at least two questions are given and the crowdsourcing user judges whether they are the same question; in the subject-classification step, a question is given and the crowdsourcing user judges which subject it belongs to; in the school-stage classification step, a question is given and the crowdsourcing user judges its grade level; and so on.
Background
As a supplement to and enhancement of traditional school education, online education is increasingly accepted, and more and more students and teachers take part in it. In online education, a large number of subject questions are used as example questions, practice questions, variant questions, and the like to consolidate and assess students' mastery of the knowledge taught in class. Meeting these teaching needs requires producing a large number of subject questions.
The existing question production process can be divided into steps such as duplicate judgment, incompleteness judgment, subject classification, school-stage classification, answer review, typesetting review, and multi-question splitting review, and each step requires crowdsourcing users to solve a different kind of task. For example: in the duplicate-judgment step, at least two questions are given and the crowdsourcing user judges whether they are the same question; in the incompleteness-judgment step, a question stem is given and the crowdsourcing user judges whether the stem information is complete (for example, whether any text or picture is incomplete); in the subject-classification step, a question is given and the crowdsourcing user judges which subject it belongs to; in the school-stage classification step, a question is given and the crowdsourcing user judges its grade level; and so on. Crowdsourcing users usually receive and answer questions through a client, and cheating during answering is inevitable. Monitoring them manually through cameras consumes a large amount of labor time and wastes resources.
Disclosure of Invention
(I) Technical problem to be solved
The invention aims to solve the technical problem in the prior art that monitoring crowdsourcing users manually is time-consuming and labor-intensive.
(II) Technical solution
To solve the above technical problem, a first aspect of the present invention provides a method for detecting cheating behavior of crowdsourcing users, where the crowdsourcing users are users who solve the tasks associated with each step of question production, and the method includes the following steps:
obtaining valid historical answer data of crowdsourcing users;
training a cheating model using the user behavior information and question attribute information corresponding to the valid historical answer data as features, and whether the user corresponding to the valid historical answer data cheated as the label;
inputting the user behavior information and question attribute information of the current crowdsourcing user's current answer into the trained cheating model to detect whether the current crowdsourcing user cheated in that answer.
According to a preferred embodiment of the present invention, the valid historical answer data is question data whose answer format is correct and whose answer parameters lie within a preset normal range, and obtaining the valid historical answer data of crowdsourcing users includes:
collecting historical answer data of crowdsourcing users;
extracting correct historical answer data according to the format of the historical answer data;
removing or modifying data with abnormal answer parameters in the correct historical answer data according to the answer parameters of the historical questions and the answer parameter thresholds;
and taking the correct historical answer data, after the abnormal answer data has been removed or modified, as the valid historical answer data.
According to a preferred embodiment of the invention, the correct historical answer data is extracted online in real time according to the correct answers of the historical questions;
or the correct historical answer data is extracted offline, asynchronously and with a delay, according to the correct answers of the historical questions within a preset time period.
According to a preferred embodiment of the present invention, the answer parameters include: the user's answer count for the current day, the user's accuracy rate for the current day, the user's answering time for the current day, and the user operations performed within that answering time; correspondingly, the answer parameter thresholds include: a threshold on the user's daily answer count, a threshold on the user's daily answering time, and the definition of a valid user operation. Modifying abnormal answer data in the correct historical answer data according to the answer parameters and answer parameter thresholds of the historical questions includes:
when the user's answer count for the current day is greater than the daily answer count threshold, replacing it with the user's historical average daily answer count, and replacing the user's accuracy rate for the current day with the user's historical average daily accuracy rate;
when the user's answering time for the current day is greater than the daily answering time threshold, replacing it with the user's average answering time;
and extracting, within the user's answering time, the user operations that qualify as valid user operations, and taking their number as the user operation count.
According to a preferred embodiment of the present invention, the user behavior information includes at least one of: the user's answer count for the current day, the user's average answer count, the user's answering time for the current day, the user operation count, the user's accuracy rate for the current day, and the user's average accuracy rate.
According to a preferred embodiment of the present invention, the question attribute information includes at least one of: question difficulty, the answering time for questions of the same type, the answering time for the question, the question's school stage, and question completeness.
According to a preferred embodiment of the present invention, before the cheating model is trained using the user behavior information and question attribute information corresponding to the valid historical answer data as features and whether the corresponding user cheated as the label, the method further includes:
configuring a question difficulty scoring rule and a question completeness scoring rule;
obtaining question difficulty scores for the valid historical answer data according to the question difficulty scoring rule;
and obtaining question completeness scores for the valid historical answer data according to the question completeness scoring rule.
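The patent does not say what the two scoring rules look like. As a minimal sketch, assuming (hypothetically) that difficulty is bucketed from a question's historical accuracy rate and that completeness is penalized for incomplete text or pictures (the two defects named for the incompleteness-judgment step), a configurable rule set could be expressed as follows. `ScoringRules`, its cut points, and the penalty values are illustrative assumptions, not the applicant's rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringRules:
    """Hypothetical, configurable scoring rules (not specified in the patent)."""
    # Accuracy-rate cut points, highest first; each one missed raises difficulty by 1.
    difficulty_cutoffs: tuple = (0.9, 0.75, 0.5, 0.25)
    missing_text_penalty: float = 0.5
    missing_image_penalty: float = 0.5


def difficulty_score(correct: int, attempts: int, rules: ScoringRules) -> int:
    """Score from 1 (easy) to 5 (hard) based on the question's historical accuracy rate."""
    if attempts == 0:
        return 3  # no history yet: assume medium difficulty
    accuracy = correct / attempts
    score = 1
    for cutoff in rules.difficulty_cutoffs:
        if accuracy >= cutoff:
            break
        score += 1
    return score


def completeness_score(text_complete: bool, image_complete: bool,
                       rules: ScoringRules) -> float:
    """1.0 for a complete question, reduced for each kind of missing content."""
    score = 1.0
    if not text_complete:
        score -= rules.missing_text_penalty
    if not image_complete:
        score -= rules.missing_image_penalty
    return max(score, 0.0)
```

Keeping the thresholds in a frozen dataclass means the rules can be configured once and passed around unchanged, which matches the "configure, then score" order of the steps above.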
According to a preferred embodiment of the present invention, before the user behavior information and question attribute information of the current crowdsourcing user's current answer are input into the trained cheating model to detect whether the current crowdsourcing user cheated in that answer, the method further includes:
collecting user behavior information and question attribute information of designated crowdsourcing users as cross-validation samples, and validating the trained cheating model with the cross-validation samples.
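The text does not say how the validation result is scored. A minimal sketch, assuming each sample is a dict with hypothetical `user_id`, `features`, and `cheated` fields and that the trained model is any callable returning a cheating decision, holds out the designated users' samples and reports accuracy, precision, and recall:

```python
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple

# Hypothetical sample shape: {"user_id": str, "features": [float, ...], "cheated": bool}
Sample = Dict[str, object]


def split_by_users(samples: Iterable[Sample],
                   designated: Set[str]) -> Tuple[List[Sample], List[Sample]]:
    """Samples from designated users form the validation set; the rest are for training."""
    train: List[Sample] = []
    validation: List[Sample] = []
    for s in samples:
        (validation if s["user_id"] in designated else train).append(s)
    return train, validation


def evaluate(predict: Callable[[Sequence[float]], bool],
             validation: Iterable[Sample]) -> Dict[str, float]:
    """Compare model decisions with the cheating labels of the validation samples."""
    tp = fp = fn = tn = 0
    for s in validation:
        predicted, actual = bool(predict(s["features"])), bool(s["cheated"])
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Splitting by user rather than by individual answer keeps a designated user's answers out of training entirely, so the check estimates how the model behaves on users it has not seen.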
According to a preferred embodiment of the invention, the method further includes:
counting, according to the detection results of the cheating model, the number of times each crowdsourcing user is detected cheating within a preset time period;
and banning a crowdsourcing user whose detected cheating count is greater than a cheating threshold.
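The counting-and-banning rule can be sketched as below, assuming the model's detections are available as hypothetical `(user_id, detected_at, is_cheating)` tuples. The window length and threshold are parameters because the patent leaves their values open.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple

# Hypothetical record of one model decision: (user_id, detected_at, is_cheating)
Detection = Tuple[str, datetime, bool]


def users_to_ban(detections: Iterable[Detection], now: datetime,
                 window: timedelta, cheat_threshold: int) -> Set[str]:
    """Users whose detected-cheating count inside [now - window, now] exceeds the threshold."""
    start = now - window
    counts = Counter(
        user_id
        for user_id, detected_at, is_cheating in detections
        if is_cheating and start <= detected_at <= now
    )
    # Strictly greater than the threshold, as in the text.
    return {user_id for user_id, n in counts.items() if n > cheat_threshold}
```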
According to a preferred embodiment of the present invention, the question production step is any one of a duplicate-judgment step, an incompleteness-judgment step, a subject-classification step, and a school-stage classification step.
A second aspect of the present invention provides a device for detecting cheating behavior of crowdsourcing users, where the crowdsourcing users are users who solve the tasks associated with each step of question production, and the device includes:
an acquisition module, configured to obtain valid historical answer data of crowdsourcing users;
a training module, configured to train a cheating model using the user behavior information and question attribute information corresponding to the valid historical answer data as features, and whether the user corresponding to the valid historical answer data cheated as the label;
and a detection module, configured to input the user behavior information and question attribute information of the current crowdsourcing user's current answer into the trained cheating model to detect whether the current crowdsourcing user cheated in that answer.
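One way to picture the three-module device is as small components that receive their concrete logic by injection. This is a sketch under assumptions: the class names, the record dict with a `cheated` key, and the 0.5 decision threshold are all illustrative, and any data source, validity rule, feature extractor, or training routine could be plugged in.

```python
from typing import Callable, Iterable, List, Optional

Record = dict                           # hypothetical answer record
Model = Callable[[List[float]], float]  # returns a cheating probability


class AcquisitionModule:
    """Obtains valid historical answer data; the validity rule is injected."""
    def __init__(self, load: Callable[[], Iterable[Record]],
                 is_valid: Callable[[Record], bool]):
        self.load, self.is_valid = load, is_valid

    def acquire(self) -> List[Record]:
        return [r for r in self.load() if self.is_valid(r)]


class TrainingModule:
    """Trains a model on extracted features plus the 'cheated' label."""
    def __init__(self, featurize: Callable[[Record], List[float]],
                 fit: Callable[[List[List[float]], List[bool]], Model]):
        self.featurize, self.fit = featurize, fit

    def train(self, records: List[Record]) -> Model:
        features = [self.featurize(r) for r in records]
        labels = [bool(r["cheated"]) for r in records]
        return self.fit(features, labels)


class DetectionModule:
    """Scores the current answer with the trained model."""
    def __init__(self, featurize: Callable[[Record], List[float]],
                 threshold: float = 0.5):
        self.featurize, self.threshold = featurize, threshold

    def detect(self, model: Model, record: Record) -> bool:
        return model(self.featurize(record)) > self.threshold


class CheatingDetectionDevice:
    """Wires acquisition -> training -> detection together."""
    def __init__(self, acquisition: AcquisitionModule,
                 training: TrainingModule, detection: DetectionModule):
        self.acquisition, self.training, self.detection = acquisition, training, detection
        self.model: Optional[Model] = None

    def fit(self) -> None:
        self.model = self.training.train(self.acquisition.acquire())

    def detect(self, record: Record) -> bool:
        if self.model is None:
            raise RuntimeError("call fit() before detect()")
        return self.detection.detect(self.model, record)
```

Because the modules only share plain callables, they can live in one process or be split across services, which is consistent with the document's remark that functional blocks need not be physically separate entities.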
A third aspect of the invention provides an electronic device comprising a processor and a memory storing a computer-executable program which, when executed by the processor, performs the above method.
A fourth aspect of the present invention provides a computer-readable medium storing a computer-executable program which, when executed, implements the above method.
(III) Advantageous effects
The invention obtains valid historical answer data of crowdsourcing users, trains a cheating model on the historical user behavior information, the question attribute information, and the label of whether the historical user cheated, all extracted from the valid historical answer data, and uses the trained cheating model to detect whether the current crowdsourcing user cheated in the current answer. Cheating can thus be detected automatically merely by extracting the user behavior and question attributes during answering, without manual monitoring, which saves monitoring time and labor cost.
Correct historical answer data is extracted according to the format of the historical answer data, and data with abnormal answer parameters is removed or modified according to the answer parameters of the historical questions and the answer parameter thresholds. The resulting valid historical answer data improves the quality of the cheating model's training samples and thereby its detection accuracy.
The number of times each crowdsourcing user is detected cheating within a preset time period is counted from the cheating model's detection results, and a user whose count is greater than the cheating threshold is banned. This effectively reduces cheating by crowdsourcing users and improves the quality of question production.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for detecting cheating behavior of crowdsourced users according to the present invention;
FIG. 2 is a schematic structural diagram of a device for detecting cheating actions of crowdsourced users according to the present invention;
FIG. 3 is a schematic structural diagram of an electronic device of one embodiment of the invention;
FIG. 4 is a schematic diagram of a computer-readable recording medium of an embodiment of the present invention.
Detailed Description
In describing particular embodiments, specific details of structures, properties, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by one skilled in the art. However, it is not excluded that a person skilled in the art may implement the invention in a specific case without the above-described structures, performances, effects or other features.
The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, some operations/steps in the flowcharts may be divided, some operations/steps may be combined or partially combined, and the like, and the execution order shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different network and/or processing unit devices and/or microcontroller devices.
The same reference numerals denote the same or similar elements, components, or parts throughout the drawings, and a repeated description of them may therefore be omitted below. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these elements, components, or sections should not be limited by these terms; the terms are used only to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention. Furthermore, the term "and/or" is intended to include all combinations of any one or more of the listed items.
To solve the above technical problem, the invention provides a method for detecting cheating behavior of crowdsourcing users, where the crowdsourcing users are users who answer the tasks associated with each step of question production. Valid historical answer data of the crowdsourcing users is obtained; a cheating model is then trained on the historical user behavior information, the question attribute information, and the label of whether the historical user cheated, all extracted from the valid historical answer data; and the trained cheating model detects whether the current crowdsourcing user cheated in the current answer. Cheating can thus be detected automatically merely by extracting the user behavior and question attributes during answering, without manual monitoring, which saves monitoring time and labor cost.
Correct historical answer data is extracted according to the format of the historical answer data, and data with abnormal answer parameters is removed or modified according to the answer parameters of the historical questions and the answer parameter thresholds. The resulting valid historical answer data improves the quality of the cheating model's training samples and thereby its detection accuracy.
In the invention, correct historical answer data can be extracted online in real time according to the correct answers of historical questions, or offline, asynchronously and with a delay, according to the correct answers of historical questions within a preset time period. Offline asynchronous processing makes it easier to compile statistics over historical answer data and allows batch processing.
In one embodiment of modifying abnormal answer data, the answer parameters include the user's answer count for the current day, the user's accuracy rate for the current day, the user's answering time for the current day, and the user operations within that answering time; correspondingly, the answer parameter thresholds include a daily answer count threshold, a daily answering time threshold, and the definition of a valid user operation. When the user's answer count for the current day is greater than the daily answer count threshold, it is replaced with the user's historical average daily answer count, and the user's accuracy rate for the current day is replaced with the user's historical average daily accuracy rate; when the user's answering time for the current day is greater than the daily answering time threshold, it is replaced with the user's average answering time; and the user operations within the answering time that qualify as valid user operations are extracted and counted as the user operation count.
In the present invention, the user behavior information includes at least one of: the user's answer count for the current day, the user's average answer count, the user's answering time for the current day, the user operation count, the user's accuracy rate for the current day, and the user's average accuracy rate. The question attribute information includes at least one of: question difficulty, the answering time for questions of the same type, the answering time for the question, the question's school stage, and question completeness.
Before the cheating model is trained, a question difficulty scoring rule and a question completeness scoring rule are configured, and question difficulty scores and question completeness scores for the valid historical answer data are obtained according to these rules. This yields the attribute information of the historical questions.
After the cheating model is trained, user behavior information and question attribute information of designated crowdsourcing users are collected as cross-validation samples, and the trained cheating model is validated with them to improve its accuracy.
The number of times each crowdsourcing user is detected cheating within a preset time period is counted from the cheating model's detection results, and a user whose count is greater than the cheating threshold is banned. This effectively reduces cheating by crowdsourcing users and improves the quality of question production.
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
FIG. 1 is a schematic flow diagram of a method for detecting cheating behavior of crowdsourcing users, where the crowdsourcing users are users who solve the tasks associated with each step of question production. As shown in FIG. 1, the method includes the following steps:
S1, obtaining valid historical answer data of crowdsourcing users;
The valid historical answer data is question data whose answer format is correct and whose answer parameters lie within a preset normal range. By preprocessing the question data, the invention removes erroneous and abnormal data, which ensures the validity of the cheating model's training samples.
Specifically, step S1 includes the following steps:
S11, collecting historical answer data of crowdsourcing users;
The historical answer data comprises the answers to historical questions and can be obtained by collecting the answers given by crowdsourcing users to questions in each question production step within a preset time period (for example, the past three months). The question production step is any one of a duplicate-judgment step, an incompleteness-judgment step, a subject-classification step, and a school-stage classification step.
S12, extracting correct historical answer data according to the format of the historical answer data;
The format of the answer data refers to the format requirements for answers to the historical questions. Different question production steps impose different answer format requirements: for example, questions in the duplicate-judgment and incompleteness-judgment steps are answered yes or no; questions in the subject-classification step are answered with a preset subject name, such as a subject name stored in advance in the system (mathematics, English, Chinese, and so on); and questions in the school-stage classification step are answered with a preset grade. Therefore, the production step of each historical question can be obtained first, the answer format requirement can be determined from that step, and it can then be checked whether the answer to the historical question meets that requirement, so that historical question data with wrongly formatted answers is removed.
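The per-step format check can be sketched as a small dispatch on the production step. The step identifiers and grade labels below are hypothetical; the subject names are the examples given in the text, and a real system would load its own preset lists.

```python
from typing import Dict, Iterable, List

# Example subject names from the text; a real system would load its stored list.
PRESET_SUBJECTS = {"mathematics", "English", "Chinese"}
# Hypothetical grade labels for the school-stage classification step.
PRESET_GRADES = {f"grade {n}" for n in range(1, 13)}


def answer_format_ok(step: str, answer: str) -> bool:
    """Check an answer against the format required by its production step."""
    if step in ("duplicate_judgment", "incompleteness_judgment"):
        return answer in ("yes", "no")
    if step == "subject_classification":
        return answer in PRESET_SUBJECTS
    if step == "school_stage_classification":
        return answer in PRESET_GRADES
    return False  # unknown step: treat the format as wrong


def filter_well_formed(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only historical answers whose format matches their step (step S12)."""
    return [r for r in records if answer_format_ok(r["step"], r["answer"])]
```

Rejecting unknown steps by default is a conservative choice: a record that cannot be checked is dropped rather than allowed into the training samples.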
In the invention, correct historical answer data can be extracted online in real time according to the correct answers of historical questions, or offline, asynchronously and with a delay, according to the correct answers of historical questions within a preset time period. Offline asynchronous processing makes it easier to compile statistics over historical answer data and allows batch processing.
S13, removing or modifying data with abnormal answer parameters in the correct historical answer data according to the answer parameters of the historical questions and the answer parameter thresholds;
In the invention, the answer parameters reflect a crowdsourcing user's daily answering activity, and whether answer data is abnormal is judged from that activity. Illustratively, the answer parameters include the user's answer count for the current day, the user's accuracy rate for the current day, the user's answering time for the current day, and the user operations within that answering time; correspondingly, the answer parameter thresholds include a daily answer count threshold, a daily answering time threshold, and the definition of a valid user operation. In the present invention, a user operation is any interaction by the user during answering, including but not limited to clicking, dragging, sliding, and preset gestures (for example, rotating a finger by a predetermined angle). A valid user operation is an interaction between the user and a component of the terminal device during answering, including clicking, dragging, or sliding the component, or rotating it with a preset gesture (for example, rotating a finger by a preset angle). The terminal component may be a mouse, a display screen, a touch screen, or the like.
To compute the answer parameters, the crowdsourcing user ID of each correct historical answer (for example, the crowdsourcing user's login account), the answer timestamps for that user ID, and the user operations within those timestamps can first be obtained. The user's answer count and answering time for the current day are then counted from the user ID and its answer timestamps; the user operations within the current day's answering time are determined from the user operations within the answer timestamps; and the user's accuracy rate for the current day is determined by comparing the user's answers for that day with the correct answers. The answer parameter thresholds define the range of answer parameters for normal answering and can be set according to actual conditions; for example, the daily answer count threshold may be set to 1000 and the daily answering time threshold to 8 hours.
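The aggregation described above can be sketched as follows, assuming each correct-format answer is a hypothetical `AnswerRecord` with start and end timestamps and a list of timestamped operations. The set of valid operation kinds, and the choice to sum per-question durations as the day's answering time, are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Dict, List, Tuple

# Hypothetical names for the valid operations listed in the text.
VALID_OPERATIONS = {"click", "drag", "slide", "rotate_gesture"}


@dataclass
class AnswerRecord:
    user_id: str
    start: datetime          # answer start timestamp
    end: datetime            # answer submit timestamp
    answer: str
    correct_answer: str
    operations: List[Tuple[datetime, str]] = field(default_factory=list)


@dataclass
class DailyParams:
    answer_count: int = 0
    correct_count: int = 0
    answer_seconds: float = 0.0
    operation_count: int = 0

    @property
    def accuracy(self) -> float:
        return self.correct_count / self.answer_count if self.answer_count else 0.0


def daily_answer_params(records: List[AnswerRecord]) -> Dict[Tuple[str, date], DailyParams]:
    """Aggregate answer parameters per (user ID, day); an answer counts toward its start day."""
    params: Dict[Tuple[str, date], DailyParams] = defaultdict(DailyParams)
    for r in records:
        p = params[(r.user_id, r.start.date())]
        p.answer_count += 1
        p.correct_count += int(r.answer == r.correct_answer)
        p.answer_seconds += (r.end - r.start).total_seconds()
        # Count only valid operations that fall inside this answer's time span.
        p.operation_count += sum(
            1 for ts, kind in r.operations
            if kind in VALID_OPERATIONS and r.start <= ts <= r.end
        )
    return dict(params)
```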
Specifically, abnormal answer data may be modified as follows: if the user's daily answer count is greater than the daily answer count threshold, the daily answer count is replaced with the user's historical average daily answer count, and the user's daily accuracy is replaced with the user's historical average daily accuracy; if the user's daily answering time is greater than the daily answering time threshold, it is replaced with the user's average answering time; and the user operations within the answering time that match valid user operations are extracted and counted as the user operation count.
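A minimal sketch of these correction rules, assuming per-day statistics and historical averages have already been computed. The 1000-answer and 8-hour thresholds are the example values given above; the `VALID_OPERATIONS` set, the dataclass names, and rounding the average count to an integer are illustrative assumptions.

```python
from dataclasses import dataclass, replace

COUNT_THRESHOLD = 1000  # example daily answer count threshold from the description
HOURS_THRESHOLD = 8.0   # example daily answering time threshold from the description
# Hypothetical set of interactions treated as valid user operations.
VALID_OPERATIONS = {"click", "drag", "slide", "rotate"}


@dataclass(frozen=True)
class DayStats:
    answer_count: int
    accuracy: float
    answer_hours: float
    operations: tuple


@dataclass(frozen=True)
class UserHistory:
    avg_daily_count: float
    avg_daily_accuracy: float
    avg_answer_hours: float


def correct_abnormal(day, hist):
    """Return a corrected copy of one day's stats and its valid-operation count."""
    if day.answer_count > COUNT_THRESHOLD:
        # Too many answers: fall back to the user's historical daily averages.
        day = replace(day, answer_count=round(hist.avg_daily_count),
                      accuracy=hist.avg_daily_accuracy)
    if day.answer_hours > HOURS_THRESHOLD:
        day = replace(day, answer_hours=hist.avg_answer_hours)
    # Count only the operations that qualify as valid user operations.
    op_count = sum(op in VALID_OPERATIONS for op in day.operations)
    return day, op_count
```

Returning a new frozen record instead of mutating the input keeps the raw historical data available if the corrections later need to be audited.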
S14: Take the correct historical answer data, after the abnormal answer data has been removed or modified, as the valid historical answer data.
S2: Train a cheating model, using the user behavior information and question attribute information corresponding to the valid historical answer data as features, and whether the user corresponding to the valid historical answer data cheated as the label.
The user behavior information includes at least one of: the user's daily answer count, the user's average answer count, the user's daily answering time, the user operation count, the user's daily accuracy, and the user's average accuracy. For the daily answer count, daily answering time, and user operation count, first acquire the crowdsourcing user ID of the correct historical answer data (such as the crowdsourcing user's login account), the answer timestamps for that ID, and the user operations at those timestamps. The daily answer count and daily answering time are counted from the crowdsourcing user ID and its answer timestamps; the user operation count is determined from the user operations at the answer timestamps; and the daily accuracy is determined by comparing the user's answers for the day with the reference answers. The user's average answer count is the crowdsourcing user's average daily answer count, and the user's average accuracy is the crowdsourcing user's average daily accuracy.
The question attribute information includes at least one of: question difficulty, answering time for questions of the same type, question answering time, subject and school stage, and question integrity. The answering time for questions of the same type can be determined from the average time crowdsourcing users spend answering questions of that type, and the subject and school stage can be obtained from the question's attributes. Question difficulty and question integrity can be obtained by configuring a question difficulty scoring rule and a question integrity scoring rule, obtaining question difficulty scores for the valid historical answer data according to the difficulty scoring rule, and obtaining question integrity scores for the valid historical answer data according to the integrity scoring rule. For example, the difficulty scoring rule may compute a difficulty score from the question type, the question's subject, the number of knowledge points involved, and their corresponding weights; the integrity scoring rule may compute an integrity score from the completeness of the question's structure, the completeness of its given conditions, and their corresponding weights.
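The example scoring rules can be expressed as weighted sums. The weights, the per-type and per-subject base scores, and the 0 to 1 completeness scales below are hypothetical placeholders; the description only states which factors and corresponding weights enter each score.

```python
# Hypothetical base scores and weights; real values would be configured per platform.
TYPE_SCORE = {"choice": 1.0, "fill_blank": 2.0, "essay": 3.0}
SUBJECT_SCORE = {"math": 3.0, "physics": 3.0, "english": 2.0}
DIFFICULTY_WEIGHTS = {"type": 0.3, "subject": 0.3, "knowledge_points": 0.4}
INTEGRITY_WEIGHTS = {"structure": 0.5, "conditions": 0.5}


def difficulty_score(q_type, subject, n_knowledge_points):
    """Weighted sum of question type, subject, and number of knowledge points."""
    w = DIFFICULTY_WEIGHTS
    return (w["type"] * TYPE_SCORE.get(q_type, 1.0)       # unknown types default to 1.0
            + w["subject"] * SUBJECT_SCORE.get(subject, 1.0)
            + w["knowledge_points"] * n_knowledge_points)


def integrity_score(structure_completeness, condition_completeness):
    """Both inputs are fractions in [0, 1]; the weights sum to 1, so the result is too."""
    w = INTEGRITY_WEIGHTS
    return (w["structure"] * structure_completeness
            + w["conditions"] * condition_completeness)
```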
After the cheating model is trained, the user behavior information and question attribute information of specified crowdsourcing users are collected as cross-validation samples, and the trained cheating model is verified against these samples to improve its accuracy.
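The description does not name a model family for the cheating model. As one possible choice, the sketch below swaps in plain logistic regression trained by batch gradient descent, then checks it against held-out validation samples as the paragraph describes. `train_logistic`, `predict`, and `validate` are hypothetical names, and features are assumed to be numeric and already scaled.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss.

    X is a list of equal-length numeric feature lists; y holds 0/1 cheating labels.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x, threshold=0.5):
    """Return 1 (cheating) if the predicted probability reaches the threshold."""
    w, b = model
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def validate(model, X_val, y_val):
    """Accuracy of the trained model on held-out validation samples."""
    hits = sum(predict(model, x) == yi for x, yi in zip(X_val, y_val))
    return hits / len(y_val)
```

Holding out one group of specified users gives a single validation split; rotating which group is held out (k-fold cross-validation) would give a less noisy accuracy estimate.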
S3: Input the user behavior information and question attribute information of the current crowdsourcing user's answer into the trained cheating model to detect whether this answer involves cheating.
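At detection time, the two kinds of information are combined into the same feature layout used in training. In this sketch the feature names, their order, and the logistic scoring are assumptions; categorical attributes such as subject and school stage are left out because they would first need an encoding.

```python
import math

# Hypothetical fixed feature order; training and detection must share it.
BEHAVIOR_FEATURES = ("daily_count", "avg_count", "daily_hours",
                     "op_count", "daily_accuracy", "avg_accuracy")
QUESTION_FEATURES = ("difficulty", "same_type_time", "answer_time", "integrity")


def feature_vector(behavior, question):
    """Concatenate behavior and question attributes in a fixed order (missing -> 0.0)."""
    return ([float(behavior.get(k, 0.0)) for k in BEHAVIOR_FEATURES]
            + [float(question.get(k, 0.0)) for k in QUESTION_FEATURES])


def is_cheating(weights, bias, behavior, question, threshold=0.5):
    """Score one answer with a logistic model and compare against the threshold."""
    x = feature_vector(behavior, question)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    # Numerically stable logistic function.
    p = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
    return p >= threshold
```

For instance, weights that reward a long same-type answering time and penalize a short actual answering time would flag answers submitted far faster than usual; in practice the weights would come from training, not be set by hand.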
Furthermore, according to the detection results of the cheating model, the invention can count the number of detected cheating instances of each crowdsourcing user within a preset time period, and ban a crowdsourcing user whose count is greater than the cheating threshold. For crowdsourcing users whose count is greater than 0 but less than the cheating threshold, the cheating answers are treated as invalid, and the cheating answer data is added to the historical answer data used to optimize the cheating model.
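The counting and banning rule can be sketched as follows. The 7-day window and threshold of 3 are hypothetical defaults, and because the text does not say what happens when the count exactly equals the threshold, this sketch places such users in the invalidate-and-retrain group rather than banning them.

```python
from collections import Counter
from datetime import datetime, timedelta


def review_users(detections, now, window=timedelta(days=7), ban_threshold=3):
    """Split users by detected-cheating count inside the window ending at `now`.

    detections: iterable of (user_id, detected_at) pairs, one per detected cheating answer.
    Returns (banned, flagged): banned users exceed the threshold; flagged users have
    1..threshold hits, so their cheating answers are invalidated and reused for retraining.
    """
    counts = Counter(uid for uid, at in detections if now - window <= at <= now)
    banned = {uid for uid, c in counts.items() if c > ban_threshold}
    flagged = {uid for uid, c in counts.items() if 0 < c <= ban_threshold}
    return banned, flagged


if __name__ == "__main__":
    now = datetime(2021, 1, 10)
    hits = [("u1", now - timedelta(days=d)) for d in (1, 2, 3, 4)]
    print(review_users(hits, now))  # u1 has 4 hits in the window, above the threshold of 3
```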
Fig. 2 is a schematic structural diagram of a device for detecting cheating behavior of crowdsourcing users, where crowdsourcing users are users who answer the questions corresponding to each stage of question production. As shown in Fig. 2, the device includes:
an obtaining module 21, configured to obtain valid historical answer data of crowdsourcing users;
a training module 22, configured to train a cheating model using the user behavior information and question attribute information corresponding to the valid historical answer data as features, and whether the corresponding user cheated as the label;
a detection module 23, configured to input the user behavior information and question attribute information of the current crowdsourcing user's answer into the trained cheating model to detect whether this answer involves cheating.
In a specific embodiment, the valid historical answer data is question data whose answer data format is correct and whose answer parameters fall within a preset normal range, and the obtaining module 21 includes:
a collection module, configured to collect historical answer data of crowdsourcing users;
an extraction module, configured to extract correct historical answer data according to the format of the historical answer data;
a removing and modifying module, configured to remove or modify data with abnormal answer parameters in the correct historical answer data according to the answer parameters of historical questions and the answer parameter thresholds;
and a determining module, configured to take the correct historical answer data, after the abnormal answer data has been removed or modified, as the valid historical answer data.
The extraction module may extract correct historical answer data online in real time according to the correct answers of historical questions; alternatively, it may extract correct historical answer data offline, asynchronously and with a delay, according to the correct answers of historical questions within a preset time period.
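The two extraction modes differ mainly in when the format check runs. The sketch assumes, hypothetically, that each historical answer arrives as a JSON string and that a well-formed record must carry `user_id`, `question_id`, `answer`, and a numeric `timestamp`; the real answer format is not specified in the description.

```python
import json

# Hypothetical required fields and types for a well-formed answer record.
REQUIRED_FIELDS = {"user_id": str, "question_id": str, "answer": str,
                   "timestamp": (int, float)}


def is_well_formed(raw):
    """True if `raw` parses as a JSON object carrying every required field."""
    try:
        rec = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(rec, dict) and all(
        isinstance(rec.get(k), t) for k, t in REQUIRED_FIELDS.items())


def extract_online(stream):
    """Real-time mode: yield each well-formed record as soon as it arrives."""
    for raw in stream:
        if is_well_formed(raw):
            yield json.loads(raw)


def extract_offline(batch, start, end):
    """Offline mode: filter a stored batch to [start, end), e.g. in a delayed async job."""
    records = (json.loads(raw) for raw in batch if is_well_formed(raw))
    return [r for r in records if start <= r["timestamp"] < end]
```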
In one embodiment, the answer parameters include: the user's daily answer count, the user's daily accuracy, the user's daily answering time, and the user operations within the daily answering time; correspondingly, the answer parameter thresholds include: a daily answer count threshold, a daily answering time threshold, and the valid user operations. The removing and modifying module is specifically configured to: when the user's daily answer count is greater than the daily answer count threshold, replace it with the user's historical average daily answer count and replace the user's daily accuracy with the user's historical average daily accuracy; when the user's daily answering time is greater than the daily answering time threshold, replace it with the user's average answering time; and extract the user operations within the answering time that match valid user operations, counting them as the user operation count.
In the present invention, the user behavior information includes at least one of: the user's daily answer count, the user's average answer count, the user's daily answering time, the user operation count, the user's daily accuracy, and the user's average accuracy. The question attribute information includes at least one of: question difficulty, answering time for questions of the same type, question answering time, subject and school stage, and question integrity.
Further, the device comprises:
a configuration module, configured to configure a question difficulty scoring rule and a question integrity scoring rule;
a first obtaining module, configured to obtain question difficulty scores for the valid historical answer data according to the question difficulty scoring rule;
and a second obtaining module, configured to obtain question integrity scores for the valid historical answer data according to the question integrity scoring rule.
A sub-acquisition module 24 is configured to collect the user behavior information and question attribute information of specified crowdsourcing users as cross-validation samples, and to verify the trained cheating model against these samples.
A counting module 25 is configured to count, according to the detection results of the cheating model, the number of detected cheating instances of each crowdsourcing user within a preset time period;
a banning module 26 is configured to ban a crowdsourcing user when the number of detected cheating instances is greater than the cheating threshold.
In the invention, the question production stages include any one of: a duplicate-checking stage, an incompleteness-checking stage, a subject classification stage, and a school-stage classification stage.
Those skilled in the art will appreciate that the modules in the above-described embodiments of the apparatus may be distributed as described in the apparatus, and may be correspondingly modified and distributed in one or more apparatuses other than the above-described embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device includes a processor and a memory, where the memory is used to store a computer-executable program, and when the computer program is executed by the processor, the processor executes a crowd-sourced user cheating behavior detection method.
As shown in Fig. 3, the electronic device takes the form of a general-purpose computing device. There may be one or more processors, which can work together. The invention does not exclude distributed processing, i.e., the processors may be distributed over different physical devices. The electronic device of the present invention is not limited to a single entity and may be a combination of multiple physical devices.
The memory stores a computer executable program, typically machine readable code. The computer readable program may be executed by the processor to enable an electronic device to perform the method of the invention, or at least some of the steps of the method.
The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may also be non-volatile memory, such as read-only memory (ROM).
Optionally, in this embodiment, the electronic device further includes an I/O interface, which is used for data exchange between the electronic device and an external device. The I/O interface may be a local bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, and/or a memory storage device using any of a variety of bus architectures.
It should be understood that the electronic device shown in fig. 3 is only one example of the present invention, and elements or components not shown in the above example may be further included in the electronic device of the present invention. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a human-computer interaction element such as a button, a keyboard, and the like. Electronic devices are considered to be covered by the present invention as long as the electronic devices are capable of executing a computer-readable program in a memory to implement the method of the present invention or at least a part of the steps of the method.
Fig. 4 is a schematic diagram of a computer-readable recording medium according to an embodiment of the present invention. As shown in Fig. 4, the computer-readable recording medium stores a computer-executable program which, when executed, implements the method for detecting cheating behavior of crowdsourcing users according to the present invention. The computer-readable medium may include a propagated data signal carrying readable program code, for example in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
From the above description of the embodiments, those skilled in the art will readily appreciate that the present invention can be implemented by hardware capable of executing a specific computer program, such as the system of the present invention, and electronic processing units, servers, clients, mobile phones, control units, processors, etc. included in the system, and the present invention can also be implemented by a vehicle including at least a part of the above system or components. The invention can also be implemented by computer software executing the method of the invention, for example, by control software executed by a microprocessor, an electronic control unit, a client, a server, etc. of a live device. It should be noted that the computer software for executing the method of the present invention is not limited to be executed by one or a specific hardware entity, but may also be implemented in a distributed manner by hardware entities without specific details, and for the computer software, the software product may be stored in a computer readable storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or may be stored in a distributed manner on a network, as long as it can enable an electronic device to execute the method according to the present invention.
While the foregoing embodiments describe the objects, technical solutions, and advantages of the present invention in further detail, it should be understood that the present invention is not inherently related to any particular computer, virtual machine, or electronic device, and various general-purpose machines may be used to implement it. The invention is not limited to the specific embodiments described; all modifications, equivalents, and improvements made within the spirit and scope of the invention fall within its scope of protection.

Claims (10)

1. A method for detecting cheating behaviors of crowdsourcing users is characterized by comprising the following steps:
obtaining valid historical answer data of crowdsourcing users;
training a cheating model, using the user behavior information and question attribute information corresponding to the valid historical answer data as features, and whether the user corresponding to the valid historical answer data cheated as the label;
inputting the user behavior information and question attribute information of the current crowdsourcing user's answer into the trained cheating model to detect whether the current crowdsourcing user cheated in this answer.
2. The method according to claim 1, wherein the valid historical answer data is question data whose answer data format is correct and whose answer parameters fall within a preset normal range, and obtaining the valid historical answer data of crowdsourcing users comprises:
collecting historical answer data of crowdsourcing users;
extracting correct historical answer data according to the format of the historical answer data;
removing or modifying data with abnormal answer parameters in the correct historical answer data according to the answer parameters of historical questions and the answer parameter thresholds;
and taking the correct historical answer data, after the abnormal answer data has been removed or modified, as the valid historical answer data.
3. The method for detecting cheating behavior of crowdsourcing users according to claim 1 or 2, wherein the correct historical answer data is extracted online in real time according to the correct answers of historical questions;
or the correct historical answer data is extracted offline, asynchronously and with a delay, according to the correct answers of historical questions within a preset time period.
4. The method for detecting cheating behavior of crowdsourcing users according to any one of claims 1-3, wherein the answer parameters comprise: the user's daily answer count, the user's daily accuracy, the user's daily answering time, and the user operations within the daily answering time; correspondingly, the answer parameter thresholds comprise: a daily answer count threshold, a daily answering time threshold, and valid user operations; and modifying abnormal answer data in the correct historical answer data according to the answer parameters and answer parameter thresholds of historical questions comprises:
when the user's daily answer count is greater than the daily answer count threshold, replacing the user's daily answer count with the user's historical average daily answer count, and replacing the user's daily accuracy with the user's historical average daily accuracy;
when the user's daily answering time is greater than the daily answering time threshold, replacing the user's daily answering time with the user's average answering time;
and extracting the user operations within the user's answering time that match valid user operations, as the user operation count.
5. The method for detecting cheating behavior of crowdsourcing users according to any one of claims 1-4, wherein the user behavior information comprises at least one of: the user's daily answer count, the user's average answer count, the user's daily answering time, the user operation count, the user's daily accuracy, and the user's average accuracy;
optionally, the question attribute information comprises at least one of: question difficulty, answering time for questions of the same type, question answering time, subject and school stage, and question integrity.
6. The method for detecting cheating behavior of crowdsourcing users according to any one of claims 1-5, wherein before the user behavior information and question attribute information corresponding to the valid historical answer data are used as features, and whether the corresponding user cheated is used as the label, to train the cheating model, the method further comprises:
configuring a question difficulty scoring rule and a question integrity scoring rule;
obtaining question difficulty scores for the valid historical answer data according to the question difficulty scoring rule;
obtaining question integrity scores for the valid historical answer data according to the question integrity scoring rule;
optionally, before the user behavior information and question attribute information of the current crowdsourcing user's answer are input into the trained cheating model to detect whether this answer involves cheating, the method further comprises:
collecting the user behavior information and question attribute information of specified crowdsourcing users as cross-validation samples; and verifying the trained cheating model against the cross-validation samples.
7. The method for detecting cheating behavior of crowdsourcing users according to any one of claims 1-6, further comprising:
counting, according to the detection results of the cheating model, the number of detected cheating instances of each crowdsourcing user within a preset time period;
banning a crowdsourcing user when the number of detected cheating instances is greater than the cheating threshold;
optionally, the question production stages comprise any one of: a duplicate-checking stage, an incompleteness-checking stage, a subject classification stage, and a school-stage classification stage.
8. A device for detecting cheating behavior of crowdsourcing users, the crowdsourcing users being users who answer the questions corresponding to each stage of question production, characterized in that the device comprises:
an obtaining module, configured to obtain valid historical answer data of crowdsourcing users;
a training module, configured to train a cheating model using the user behavior information and question attribute information corresponding to the valid historical answer data as features, and whether the corresponding user cheated as the label;
and a detection module, configured to input the user behavior information and question attribute information of the current crowdsourcing user's answer into the trained cheating model to detect whether this answer involves cheating.
9. An electronic device comprising a processor and a memory, the memory for storing a computer-executable program, characterized in that:
the computer program, when executed by the processor, performs the method of any of claims 1-7.
10. A computer-readable medium storing a computer-executable program, wherein the computer-executable program, when executed, implements the method of any of claims 1-7.
CN202011556332.XA 2020-12-23 2020-12-23 Crowdsourcing user cheating behavior detection method and device and electronic equipment Pending CN112598286A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011556332.XA CN112598286A (en) 2020-12-23 2020-12-23 Crowdsourcing user cheating behavior detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112598286A 2021-04-02

Family

ID=75202501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011556332.XA Pending CN112598286A (en) 2020-12-23 2020-12-23 Crowdsourcing user cheating behavior detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112598286A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313168A (en) * 2021-05-28 2021-08-27 上海大学 Intelligent anti-cheating self-service examination system for unmanned invigilation
CN114926221A (en) * 2022-05-31 2022-08-19 北京奇艺世纪科技有限公司 Cheating user identification method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019200736A1 (en) * 2018-04-17 2019-10-24 平安科技(深圳)有限公司 Operating method and device for crowdsourcing platform, computer device and storage medium
CN110570217A (en) * 2019-09-10 2019-12-13 北京百度网讯科技有限公司 cheating detection method and device
CN110659954A (en) * 2019-08-29 2020-01-07 北京三快在线科技有限公司 Cheating identification method and device, electronic equipment and readable storage medium
CN111639969A (en) * 2020-05-28 2020-09-08 浙江大学 Dynamic incentive calculation method, system, device and medium for crowdsourcing system
US20200380410A1 (en) * 2019-05-27 2020-12-03 Yandex Europe Ag Method and system for determining result for task executed in crowd-sourced environment

Similar Documents

Publication Publication Date Title
Kovanović et al. Penetrating the black box of time-on-task estimation
CN107256650B (en) Exercise pushing method and system and terminal equipment
CN109800320B (en) Image processing method, device and computer readable storage medium
CN111177413A (en) Learning resource recommendation method and device and electronic equipment
CN111831831A (en) Knowledge graph-based personalized learning platform and construction method thereof
CN114511425A (en) Method for providing user-customized learning content
CN112598286A (en) Crowdsourcing user cheating behavior detection method and device and electronic equipment
WO2022170985A1 (en) Exercise selection method and apparatus, and computer device and storage medium
CN113407675A (en) Automatic education subject correcting method and device and electronic equipment
Tack et al. Human and automated CEFR-based grading of short answers
CN107301411A (en) Method for identifying mathematical formula and device
CN110111011B (en) Teaching quality supervision method and device and electronic equipment
JP7111223B2 (en) Learning support device and program
CN110765241A (en) Super-outline detection method and device for recommendation questions, electronic equipment and storage medium
CN110688480A (en) Real-time teaching evaluation method and system based on message
CN114021984A (en) Invigilation data processing method
Fonseca et al. Using early plagiarism detection in programming classes to address the student’s difficulties
CN117765783A (en) Intelligent interactive training system based on virtual reality technology
CN109635214A (en) Learning resource pushing method and electronic equipment
CN112328812B (en) Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment
Amo Filvà et al. Learning analytics to assess students’ behavior with scratch through clickstream
CN112734142B (en) Resource learning path planning method and device based on deep learning
CN113407829A (en) Online learning resource recommendation method, device, equipment and storage medium
CN113918588A (en) Wrong question dynamic intelligent management system based on knowledge points
CN107609401A (en) Automatic test approach and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230606

Address after: 6001, 6th Floor, No.1 Kaifeng Road, Shangdi Information Industry Base, Haidian District, Beijing, 100085

Applicant after: Beijing Baige Feichi Technology Co.,Ltd.

Address before: 100085 4002, 4th floor, No.1 Kaifa Road, Shangdi Information Industry base, Haidian District, Beijing

Applicant before: ZUOYEBANG EDUCATION TECHNOLOGY (BEIJING) CO.,LTD.
