CN110737771A

CN110737771A - question distribution method based on big data and device thereof

Info

Publication number: CN110737771A
Application number: CN201910866615.5A
Authority: CN
Inventors: 孙全智; 耿溟; 孙艺恬
Original assignee: TENFEN Inc
Current assignee: TENFEN Inc
Priority date: 2019-09-12
Filing date: 2019-09-12
Publication date: 2020-01-31
Anticipated expiration: 2039-09-12
Also published as: CN110737771B

Abstract

The invention provides a question distribution method based on big data and a device thereof, belonging to the technical field of computer information. The question distribution method based on big data comprises a data establishment stage and a question distribution stage. In the data establishment stage, users are classified into a plurality of user category groups according to the answer data of each user, and a difficulty coefficient of each question in each user category group is then determined according to the answer data of that user category group.

Description

question distribution method based on big data and device thereof

Technical Field

The invention belongs to the technical field of computer information, and particularly relates to a question distribution method based on big data and a device thereof.

Background

Currently, on websites or applications for online answering, such as online examinations and online quiz games, questions are usually assigned to a user in one of several ways. For example, questions are randomly drawn from a question bank and assigned to the user; or the user selects an answering mode, and all users in the same mode answer the same questions; or the difficulty of the questions is evaluated according to the user's individual answering record, and questions of moderate difficulty are selected and assigned to the user.

Disclosure of Invention

The invention aims to solve at least one of the technical problems in the prior art, and provides a question distribution method based on big data, which can evaluate the difficulty of questions more accurately and distribute questions according to the evaluated difficulty and the personalized requirements of users, thereby achieving a better answering effect.

The technical scheme adopted to solve the technical problem of the invention is a question distribution method based on big data, comprising the following steps:

a data establishing stage:

establishing a plurality of special topics, and respectively establishing question banks for the plurality of special topics, wherein each question bank comprises a plurality of questions;

classifying each user into a plurality of user category groups through a classifier obtained through pre-training according to answer data of each user;

determining the difficulty coefficient of each question under each user category group according to a preset algorithm and the answer data of the users in that user category group;

a question distribution stage:

determining a special topic and an answer mode selected by a user;

acquiring the user category group to which the user belongs, and acquiring, according to that user category group, the difficulty coefficient of each question in the special topic selected by the user;

and selecting, from those questions, questions whose difficulty coefficients fall within the difficulty coefficient interval preset for the answer mode selected by the user, and distributing them to the user for answering.

According to the method provided by the invention, the difficulty coefficient of each question is determined according to the answer data of the individual user and the answer data of the users in the user category group to which the user belongs. Questions with suitable difficulty coefficients are then selected and distributed to the user for answering according to the special topic and the answer mode selected by the user. The difficulty of the questions can thus be evaluated more accurately, and the questions are distributed according to the evaluated difficulty and the individual requirements of the user, thereby achieving a better answering effect.

Preferably, in the above method provided by the present invention, the method further comprises:

after the user finishes answering, updating, according to the user's answer data from this session and the preset algorithm, the difficulty coefficient, under the user category group to which the user belongs, of each question the user answered in this session.

Preferably, in the above method provided by the present invention, each question in the question bank includes at least one label, and the step of classifying each user into a plurality of user category groups through a classifier obtained through pre-training according to the answer data of each user specifically includes:

generating, for each user, associated data of the user and each label according to the answer data of the questions answered by the user and the labels included in those questions; the associated data is, for each label included in the questions answered by the user, the number of questions the user answered and the number of questions the user answered correctly under that label;

and classifying each user into a plurality of user category groups according to the associated data and a classifier obtained by pre-training.

Preferably, in the above method provided by the present invention, the pre-trained classifier uses a clustering algorithm to classify each user into a plurality of user category groups.

Preferably, in the above method provided by the present invention, the clustering algorithm includes any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm.

Preferably, in the above method provided by the present invention, the preset algorithm satisfies:

wherein K1 is the initial difficulty coefficient of the question; AC is the total number of users in the user category group to which the user belongs; ACF is the number of users, among that total, who have attempted the question; and ACFR is the number of users, among those who have attempted the question, who answered it correctly.

Correspondingly, the invention also provides a question distribution device based on big data, which comprises a data establishing unit and a question distribution unit;

the data establishing unit specifically includes:

a question bank establishing module, configured to establish a plurality of special topics and to establish a question bank for each of the special topics, wherein each question bank comprises a plurality of questions;

the user classification module is used for classifying the users into a plurality of user category groups through a classifier obtained by pre-training according to the answer data of the users;

the difficulty calculation module is used for determining a difficulty coefficient of each question under each user category group according to a preset algorithm according to the answer data of each user in each user category group;

the question distribution unit specifically includes:

a selection module, configured to determine the special topic and the answer mode selected by the user;

an acquisition module, configured to acquire the user category group to which the user belongs and to acquire, according to that user category group, the difficulty coefficient of each question in the special topic selected by the user;

and a distribution module, configured to select, from those questions, questions whose difficulty coefficients fall within the difficulty coefficient interval preset for the answer mode selected by the user, and to distribute them to the user for answering.

Preferably, in the above apparatus provided by the present invention, the apparatus further comprises:

a difficulty updating unit, configured to update, after the user finishes answering and according to the user's answer data from this session and the preset algorithm, the difficulty coefficient, under the user category group to which the user belongs, of each question the user answered in this session.

Preferably, in the above apparatus provided by the present invention, each question in the question bank includes at least one label, and the user classification module specifically includes:

a first module, configured to generate, for each user, associated data of the user and each label according to the answer data of the questions answered by the user and the labels included in those questions, where the associated data is, for each label included in the questions answered by the user, the number of questions the user answered and the number of questions the user answered correctly under that label;

and the second module is used for classifying each user into a plurality of user category groups according to the associated data and the classifier obtained by pre-training.

Preferably, in the apparatus provided by the present invention, in the second module, the pre-trained classifier classifies each user into a plurality of user category groups by using a clustering algorithm.

Preferably, in the above apparatus provided by the present invention, the clustering algorithm includes any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm.

Preferably, in the above apparatus provided by the present invention, in the difficulty calculating module and/or the difficulty updating unit, the preset algorithm satisfies:

Drawings

Fig. 1 is a flowchart of the data establishment stage in the question distribution method based on big data provided in this embodiment;

Fig. 2 is a flowchart of the question distribution stage in the question distribution method based on big data provided in this embodiment;

Fig. 3 is a detailed flowchart of step S12 in the data establishment stage of the question distribution method based on big data provided in this embodiment;

Fig. 4 is a schematic structural diagram of the question distribution device based on big data provided in this embodiment.

Detailed Description

To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are only some embodiments of the present invention, rather than all embodiments.

The shapes and sizes of the various elements in the drawings are not to scale and are merely intended to facilitate an understanding of the contents of the embodiments of the invention.

This embodiment provides a question distribution method based on big data, which comprises the following steps:

As shown in fig. 1, in the data establishment stage:

S11, establishing a plurality of special topics, and establishing a question bank for each of the special topics, wherein each question bank comprises a plurality of questions.

Specifically, a plurality of special topics are established as needed, for example special topics for mathematics, politics and English; a question bank is then established for each special topic according to the content that topic requires, and the question bank of each special topic comprises a plurality of questions.

Further, after the question bank of each special topic is built, initialization information is established for each question in each question bank. The initialization information of each question may include an initial difficulty coefficient K1 of the question and the labels of the question, where the initial difficulty coefficient K1 is the same for every question and may be assigned any value.

S12, classifying the users into a plurality of user category groups, according to the answer data of the users, through a classifier obtained through pre-training.

Specifically, the users can be classified according to the historical answer data of the users and the labels of the answered questions. As shown in fig. 2, S12 may specifically include:

S121, generating, for each user, associated data of the user and each label according to the answer data of the questions answered by the user and the labels included in those questions; the associated data may be, for each label included in the questions answered by the user, the number of questions the user answered and the number of questions the user answered correctly under that label.

Specifically, all the labels included in the questions answered by the user are obtained, the number of questions the user answered and the number answered correctly under each label are counted, and the associated data of the user and each label are generated.

For example, user A answers 10 questions and gets 9 of them right. The 10 questions include a label M and a label N. Under label M, user A answers 7 questions and gets 6 right; under label N, user A answers 3 questions and gets all 3 right. The associated data of user A with label M and label N can then be generated from the answer data of these 10 questions, as shown in Table 1-1:

User	Label	Correct answers / questions answered	Accuracy
A	M	6/7	85%
A	N	3/3	100%

TABLE 1-1
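The per-label counts in Table 1-1 can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `build_associated_data` and the representation of answer data as `(label, is_correct)` pairs are assumptions made for illustration and are not specified by the embodiment.

```python
from collections import defaultdict

def build_associated_data(answers):
    """Hypothetical helper: answers is an iterable of (label, is_correct)
    pairs for one user. Returns {label: (num_correct, num_answered)}."""
    counts = defaultdict(lambda: [0, 0])  # label -> [correct, answered]
    for label, is_correct in answers:
        counts[label][1] += 1
        if is_correct:
            counts[label][0] += 1
    return {label: (correct, answered) for label, (correct, answered) in counts.items()}

# User A from the example: 7 questions under label M (6 correct),
# 3 questions under label N (3 correct).
answers_a = [("M", True)] * 6 + [("M", False)] + [("N", True)] * 3
assoc_a = build_associated_data(answers_a)
```

A question carrying several labels would simply contribute one pair per label under this representation.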

Of course, the associated data may be in other forms, may include other contents, and may be specifically designed according to needs, and is not limited herein.

It should be noted that the answer data of a user specifically includes the questions answered by the user and, for each answered question, whether it was answered correctly.

S122, classifying the users into a plurality of user category groups according to the associated data of the users and the classifier obtained by pre-training.

The classifier takes the associated data of each user as input, treats each label as one dimension, selects a suitable metric such as the Euclidean distance or the Manhattan distance, and performs cluster analysis on the users to obtain a plurality of data clusters. Each data cluster is one user category group, and each data point in a cluster is one user. The users are thus divided into a plurality of user category groups according to the questions they have answered under each label and their accuracy on those questions; that is, within the same user category group, the answering behavior of the users is most closely related. In this way users can be classified more accurately, and the inaccuracy that results from classifying users on a single variable is avoided.

The classifier obtained through pre-training may adopt any of various clustering algorithms to classify users, for example any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm. Taking the K-means clustering algorithm as an example, if the users are to be divided into K user category groups, a multi-dimensional coordinate system is established using the associated data of each user, and the associated data of K users are randomly selected as the initial cluster centers. The distance between the associated data of each user and each cluster center is then calculated, and each user is assigned to the nearest cluster center. Once all users have been assigned, K data clusters are obtained, and K new cluster centers are calculated from the positions of the users in each of the K data clusters. As is usual for K-means, the assignment and center update steps are repeated until the cluster centers no longer change.
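The K-means procedure described above can be sketched in plain Python as follows. This is an illustrative sketch, not the embodiment's classifier: the function name `kmeans`, the use of per-label accuracy as each user's coordinates, the Euclidean metric, the fixed random seed, and the stopping rule (stop when assignments no longer change) are all assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical sketch: points is a list of equal-length tuples,
    one per user. Returns (assignment, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k users picked as initial centers
    assignment = None
    for _ in range(max_iter):
        # Assign every user to the nearest center (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no user changed cluster
        assignment = new_assignment
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centers

# Each tuple is (accuracy under label M, accuracy under label N) for one user.
users = [(0.90, 0.20), (0.85, 0.25), (0.10, 0.95), (0.15, 0.90)]
groups, centers = kmeans(users, k=2)
```

With this toy data the first two users end up in one user category group and the last two in the other, whichever users are drawn as initial centers.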

Further, the user category group of each user may be updated based on the user's answer data.

S13, determining the difficulty coefficient of each question in each user category group according to a preset algorithm and the answer data of the users in that user category group.

Specifically, the preset algorithm satisfies:

wherein K1 is the initial difficulty coefficient of the question; AC is the total number of users in the user category group to which the user belongs; ACF is the number of users, among the total number of users in that user category group, who have attempted the question; and ACFR is the number of users, among those who have attempted the question, who answered it correctly. Of course, the difficulty coefficient of a question may also be calculated in other ways, designed as required, which is not limited here.

For each user category group, the first term of the preset algorithm formula is the basic difficulty of the question in the user category group, and the second term is the accuracy rate with which the user category group answers the question. The difficulty coefficient of the question is thus evaluated using big data, that is, the answer data of a plurality of users, and combines the basic difficulty with the accuracy rate within the user category group. The difficulty of the question can therefore be evaluated more accurately, and the inaccuracy that results from calculating the difficulty coefficient from a single variable is avoided. In addition, different difficulty coefficients are calculated for the same question in different user category groups, which makes it easier to assign questions according to the individual requirements of users.
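The quantities AC, ACF and ACFR can be counted from raw answer data as sketched below. The formula that combines them with K1 is not reproduced in this text, so the sketch deliberately stops at the counts. The function name `group_question_stats` and the layout of `answer_log` are hypothetical, and reading the group's accuracy rate as ACFR / ACF is an assumption.

```python
def group_question_stats(group_users, answer_log, question_id):
    """Hypothetical helper.
    group_users: set of user ids in one user category group.
    answer_log: dict mapping (user_id, question_id) -> answered correctly?
    Returns (AC, ACF, ACFR) for the given question in that group."""
    ac = len(group_users)                                   # all users in the group
    attempted = [u for u in group_users if (u, question_id) in answer_log]
    acf = len(attempted)                                    # users who attempted it
    acfr = sum(1 for u in attempted if answer_log[(u, question_id)])  # and got it right
    return ac, acf, acfr

group = {"u1", "u2", "u3", "u4"}
log = {
    ("u1", "q1"): True,
    ("u2", "q1"): False,
    ("u3", "q1"): True,
    ("u5", "q1"): True,   # u5 is in another group and is not counted
}
stats = group_question_stats(group, log, "q1")
ac, acf, acfr = stats
# Assumed reading of the group's accuracy rate; undefined if nobody attempted.
accuracy = acfr / acf if acf else None
```

Because the counts are taken per user category group, the same question yields different statistics, and hence a different difficulty coefficient, in different groups.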

As shown in FIG. 3, in the question distribution stage:

S21, determining the special topic and the answer mode selected by the user.

Specifically, in addition to multiple special topics, multiple answer modes may be set. For example, if the method provided in this embodiment is applied to learning software, a learning mode, an examination mode, an error-prone question review mode, and other modes may be set. Each answer mode is preset with a different difficulty coefficient interval, and the user may select an answer mode according to personal requirements.

S22, obtaining the user category group to which the user belongs, and obtaining, according to that user category group, the difficulty coefficient of each question in the special topic selected by the user.

Specifically, each question has different difficulty coefficients in different user category groups. Therefore, after the special topic selected by the user is determined, the user category group to which the user belongs is obtained, and the difficulty coefficient, within that user category group, of each question in the selected special topic is then obtained.

S23, selecting, from the questions in the special topic selected by the user, questions whose difficulty coefficients fall within the difficulty coefficient interval preset for the answer mode selected by the user, and distributing the selected questions to the user for answering.

Specifically, questions whose difficulty coefficients fall within the difficulty coefficient interval of the answer mode selected by the user are selected from the question bank of the special topic selected by the user, and the selected questions are distributed to the user for answering according to a rule, for example in order of difficulty coefficient from low to high. Further, a different difficulty coefficient interval may be preset for each answer mode according to the requirements of that mode; for example, a wide difficulty coefficient interval may be set for the learning mode so that the questions distributed to the user cover a wide range. The intervals are designed as required and are not limited here.
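Steps S22 and S23 amount to a filter followed by a sort, which can be sketched as follows. The function name `select_questions`, the mode names, the interval values, the difficulty values, and the choice of a closed interval are all assumptions made for illustration; the embodiment leaves them to the designer.

```python
def select_questions(difficulty_by_question, interval, limit=None):
    """Hypothetical helper: return the ids of questions whose difficulty
    coefficient lies in [low, high], ordered from low to high difficulty."""
    low, high = interval
    matched = [(d, q) for q, d in difficulty_by_question.items() if low <= d <= high]
    matched.sort()  # ascending difficulty, ties broken by question id
    ordered = [q for _, q in matched]
    return ordered if limit is None else ordered[:limit]

# Assumed intervals per answer mode; the learning mode gets the widest range.
MODE_INTERVALS = {"learning": (0.0, 1.0), "exam": (0.4, 0.8)}

# Assumed difficulty coefficients of one special topic's questions,
# as seen from the user category group of the current user.
group_difficulty = {"q1": 0.35, "q2": 0.72, "q3": 0.50, "q4": 0.91}

exam_questions = select_questions(group_difficulty, MODE_INTERVALS["exam"])
learning_questions = select_questions(group_difficulty, MODE_INTERVALS["learning"])
```

Keeping one difficulty table per user category group means the same call can return different questions for two users who pick the same special topic and answer mode.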

In summary, in the method provided in this embodiment, the difficulty coefficient of each question is determined according to the answer data of the individual user and the answer data of the users in the user category group to which the user belongs. Questions with suitable difficulty coefficients are then selected and distributed to the user for answering according to the special topic and the answer mode selected by the user. The difficulty of the questions can thus be evaluated more accurately, and the questions are distributed according to the evaluated difficulty and the personalized requirements of the user, so as to achieve a better answering effect.

Optionally, in the method provided in this embodiment, the method may further include:

S31, after the user finishes answering, updating, according to the user's answer data from this session and the preset algorithm, the difficulty coefficient, under the user category group to which the user belongs, of each question the user answered in this session.

Specifically, each time the user finishes a session, the questions the user answered in that session and whether each was answered correctly are recorded, and the difficulty coefficients of those questions are then updated according to the algorithm in S13. The difficulty coefficient of each question is updated as the user category groups change, as the answer data of the users change, and as the preset algorithm changes. Because the difficulty coefficient of a question is dynamically updated using the answer data of the users, it can be evaluated more accurately, so that questions with suitable difficulty coefficients can be better distributed to the user according to the user's personalized requirements.

Optionally, when a user answers for the first time, questions covering each difficulty coefficient may be randomly selected from the multiple special topics and distributed to the user for answering, and the user is then classified according to the resulting answer data.

Correspondingly, as shown in fig. 4, the present embodiment further provides a question distribution device based on big data, which includes a data establishing unit 1 and a question distribution unit 2.

Specifically, the data establishing unit 1 includes:

The question bank establishing module 11, configured to establish a plurality of special topics and to establish a question bank for each of the special topics, wherein each question bank comprises a plurality of questions.

The user classification module 12, configured to classify each user into a plurality of user category groups according to the answer data of each user through a classifier obtained through pre-training.

The difficulty calculating module 13, configured to determine the difficulty coefficient of each question in each user category group according to a preset algorithm and the answer data of the users in that user category group.

Specifically, the question distribution unit 2 includes:

The selection module 21, configured to determine the special topic and the answer mode selected by the user.

The obtaining module 22, configured to obtain the user category group to which the user belongs, and to obtain, according to that user category group, the difficulty coefficient of each question in the special topic selected by the user.

The distribution module 23, configured to select, from the questions in the special topic selected by the user, questions whose difficulty coefficients fall within the difficulty coefficient interval preset for the answer mode selected by the user, and to distribute the selected questions to the user for answering.

Optionally, in the above apparatus provided in this embodiment, the apparatus further includes:

The difficulty updating unit 3, configured to update, after the user finishes answering and according to the user's answer data from this session and the preset algorithm, the difficulty coefficient, under the user category group to which the user belongs, of each question the user answered in this session.

Optionally, in the apparatus provided in this embodiment, each question in the question bank established by the question bank establishing module 11 includes at least one label, and the user classification module 12 specifically includes:

A first module 01, configured to generate, for each user, associated data of the user and each label according to the answer data of the questions answered by the user and the labels included in those questions, where the associated data is, for each label included in the questions answered by the user, the number of questions the user answered and the number of questions the user answered correctly under that label.

A second module 02, configured to classify each user into a plurality of user category groups according to the associated data and a classifier obtained through pre-training.

Optionally, in the apparatus provided in this embodiment, in the second module 02, a pre-trained classifier classifies each user into a plurality of user category groups by using a clustering algorithm.

Optionally, in the apparatus provided in this embodiment, in the second module 02, the clustering algorithm adopted by the pre-trained classifier includes any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm.

Optionally, in the apparatus provided in this embodiment, in the difficulty calculating module 13 and/or the difficulty updating unit 3, the preset algorithm satisfies:

wherein K1 is the initial difficulty coefficient of the question; AC is the total number of users in the user category group to which the user belongs; ACF is the number of users, among the total number of users in that user category group, who have attempted the question; and ACFR is the number of users, among those who have attempted the question, who answered it correctly.

In summary, according to the question distribution method based on big data provided by the present invention, the difficulty coefficient of each question is determined according to the answer data of the individual user and the answer data of the users in the user category group to which the user belongs. Questions with suitable difficulty coefficients are then selected and distributed to the user for answering according to the special topic and the answer mode selected by the user. The difficulty of the questions can thus be evaluated more accurately, and the questions are distributed according to the evaluated difficulty and the personalized requirements of the user, thereby achieving a better answering effect.

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. A question distribution method based on big data, characterized by comprising the following steps:

a data establishing stage:

a question distribution stage:

determining a special topic and an answer mode selected by a user;

2. The method of claim 1, further comprising:

3. The method of claim 1, wherein each question in the question bank comprises at least one label, and classifying each user into a plurality of user category groups according to the answer data of each user through a pre-trained classifier specifically comprises:

4. The method of claim 3, wherein the pre-trained classifier employs a clustering algorithm to classify users into a plurality of user class groups.

5. The method of claim 4, wherein the clustering algorithm comprises any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm.

6. The method according to claim 1 or claim 2, wherein the preset algorithm satisfies:

7. A question distribution device based on big data, characterized in that it comprises a data establishing unit and a question distribution unit;

the data establishing unit specifically includes:

the question distribution unit specifically includes:

8. The apparatus of claim 7, further comprising:

9. The apparatus of claim 7, wherein each question in the question bank comprises at least one label, and the user classification module comprises:

10. The apparatus of claim 9, wherein the pre-trained classifier in the second module classifies users into a plurality of user class groups using a clustering algorithm.

11. The apparatus of claim 10, wherein the clustering algorithm comprises any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm.

12. The apparatus according to claim 7 or claim 8, wherein in the difficulty calculating module and/or the difficulty updating unit, the preset algorithm satisfies: