CN114595923B - Group teaching recommendation system based on deep reinforcement learning - Google Patents

Info

Publication number
CN114595923B
Authority
CN
China
Prior art keywords
student
model
teaching
recommendation
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210028554.7A
Other languages
Chinese (zh)
Other versions
CN114595923A (en)
Inventor
杨腾杰
左琳
陈柯弟
刘念伯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210028554.7A
Publication of CN114595923A
Application granted
Publication of CN114595923B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/08 Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a group teaching recommendation system based on deep reinforcement learning, belonging to the technical field of education and information. The invention collects student data through classroom interaction methods such as voting, question answering, homework and tests, and provides the teaching plan with the largest overall benefit for a given student group. The overall benefit can be represented by a multi-objective optimization function, which may include, but is not limited to, the pass rate, the excellent rate and the average score. The invention uses a deep reinforcement learning method to carry out goal-oriented teaching path planning for teachers and can process large-scale complex data. Meanwhile, the most time-consuming training process is placed before and after class, so that during class a teacher can immediately obtain recommended teaching knowledge points from the students' classroom feedback.

Description

Group teaching recommendation system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of education and information, in particular to a group teaching recommendation system based on deep reinforcement learning.
Background
In conventional classroom teaching, teachers often arrange learning content based on experience, because the details of students' learning are neither visible nor controllable. A teacher does have various sources of information for assessing students' learning performance, including questions and answers, classroom assessment, and students' facial expressions, gestures and body movements. But this information is often coarse and cannot cover every student or track each person's learning details, which makes it difficult for teachers to design teaching paths at a fine granularity. The development of teaching assistance systems has eased this difficulty. A teaching assistance system provides various teacher-student interaction methods and records the interaction information, through which a teacher can understand the students' situation more accurately and deeply. In addition, a teaching assistance system can provide recommended teaching plans or learning plans for teachers or students, further relieving teachers' workload.
The patent application with publication number CN 112700688A discloses an intelligent classroom teaching assistance system. It collects student learning data through classroom interaction methods such as voting, models and tracks the students based on the data, and finally gives a recommended teaching plan according to the models of all students in the class. However, its recommendation algorithm simulates the students' learning process under various teaching plans based on the current student models and selects the teaching plan with the best simulated effect as the recommendation. To obtain a better recommendation, it must simulate as many of the possible teaching plans as it can, which requires a large amount of computation and time. With more students and more knowledge points, the resulting wait may become unacceptably long, so that the teacher cannot get timely feedback in class.
Disclosure of Invention
The invention provides a group teaching recommendation system based on deep reinforcement learning, which can be used for improving the processing efficiency of group teaching recommendation.
The technical scheme adopted by the invention is as follows:
a group teaching recommendation system based on deep reinforcement learning, the system comprising: a user terminal, a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module;
the user terminal is used for a teacher or a student to log in the system and is an interactive input and output terminal of the user and the system;
the knowledge point management module is used for a teacher user to input knowledge point data and sends the knowledge point data to the student model module, the pre-training module and the group teaching recommendation module;
the student data management module is used for a student user to input student basic data and sends the student basic data to the student model module; it is also used for collecting student classroom feedback in class and sending the feedback to the group teaching recommendation module;
the student model module creates a student model based on currently entered knowledge point data and student basic data according to a preset creation strategy and sends the student model to the pre-training module;
the pre-training module takes the student model created by the student model module as the learning subject, takes the data sent by the knowledge point management module and the student data management module as training data, and trains a preset initial group recommendation model to obtain a trained group recommendation model; the initial group recommendation model comprises a first neural network model and a second neural network model, each comprising an input layer, at least one hidden layer and an output layer, wherein the input to the input layer is a student classroom feedback data sequence, the hidden layer is a neural network capable of processing sequence input, the output layer of the first neural network model outputs the recommendation degree of each knowledge point of the current course, and the output layer of the second neural network model outputs the evaluation value of the current classroom teaching, namely the evaluation value of the teaching action currently executed;
the group teaching recommendation module calls the group recommendation model trained by the pre-training module and, during course teaching, combines the student classroom feedback of each class of the course to output teaching recommendation information and send it to the corresponding teacher user; it saves the student classroom feedback collected by the student data management module; and, according to the configured model update period, it updates and trains the group recommendation model during course teaching based on the student classroom feedback saved in the current period;
the output teaching recommendation information comprises recommended knowledge points of the next class and evaluation values of a current student class feedback data sequence, wherein the recommended knowledge points of the next class are knowledge points with the maximum recommendation degree;
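As an illustration of how the recommendation information could be assembled from the two network outputs, the following is a minimal sketch. It assumes the recommendation degrees come from a softmax over raw output scores (as the embodiment describes for the output layer); the function names, the dictionary keys and the example IDs are hypothetical and not part of the original text.

```python
import math
from typing import Dict, List, Union


def softmax(scores: List[float]) -> List[float]:
    """Turn raw output-layer scores into recommendation degrees that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_recommendation(
    knowledge_point_ids: List[str],
    scores: List[float],
    evaluation: float,
) -> Dict[str, Union[str, float]]:
    """Pick the knowledge point with the maximum recommendation degree.

    `scores` stands for the first network's raw outputs and `evaluation`
    for the second network's output; both are supplied by the caller here.
    """
    degrees = softmax(scores)
    best = max(range(len(degrees)), key=lambda i: degrees[i])
    return {
        "next_knowledge_point": knowledge_point_ids[best],  # hypothetical key
        "recommendation_degree": degrees[best],             # hypothetical key
        "evaluation": evaluation,                           # hypothetical key
    }


info = build_recommendation(["kp1", "kp2", "kp3"], [0.1, 2.0, -1.0], 0.5)
print(info["next_knowledge_point"])
```

Because softmax preserves the ordering of the scores, taking the maximum degree selects the same knowledge point as taking the maximum raw score; the degrees are kept only so they can be reported alongside the recommendation.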
further, the knowledge point data includes: knowledge point ID, name of the course it belongs to, knowledge point introduction, knowledge point content, knowledge point difficulty coefficient, the ID of the prerequisite knowledge point of the knowledge point, the classroom test questions matched to the knowledge point, and knowledge point related data.
Further, the student base data includes: student number, name, age, sex, age and student type; the student classroom feedback includes: test question name, ID of the knowledge point it belongs to, test question content, IDs of the students taking the test, student test results and the like.
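The knowledge point and classroom feedback records listed above could be held in simple typed records. The sketch below is one possible layout under stated assumptions: all class and field names are hypothetical, the test result is reduced to a correct/incorrect flag, and `correct_rate` is an illustrative helper rather than anything named in the original.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KnowledgePoint:
    knowledge_point_id: str
    course_name: str
    introduction: str
    content: str
    difficulty: float                      # knowledge point difficulty coefficient
    prerequisite_id: Optional[str] = None  # ID of the prerequisite knowledge point
    test_questions: List[str] = field(default_factory=list)


@dataclass
class ClassroomFeedback:
    question_name: str
    knowledge_point_id: str
    question_content: str
    student_id: str
    correct: bool                          # assumed simplification of the test result


def correct_rate(records: List[ClassroomFeedback], knowledge_point_id: str) -> float:
    """Share of correct answers recorded for one knowledge point (0.0 if none)."""
    relevant = [r for r in records if r.knowledge_point_id == knowledge_point_id]
    if not relevant:
        return 0.0
    return sum(r.correct for r in relevant) / len(relevant)


records = [
    ClassroomFeedback("q1", "kp1", "2 + 2 = ?", "s001", True),
    ClassroomFeedback("q1", "kp1", "2 + 2 = ?", "s002", False),
    ClassroomFeedback("q1", "kp1", "2 + 2 = ?", "s003", True),
]
print(correct_rate(records, "kp1"))
```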
Further, the student model module uses the student model to stand in for real students during the group recommendation model training in the pre-training module, and the student model is constructed as an Ebbinghaus memory model, a half-life memory model or a Bayesian knowledge tracing model;
and the description of the model includes:
describing the current mastering state of the virtual students for each knowledge point;
a process describing how a virtual student transitions from one state to another by learning;
describing the classroom feedback after learning.
Further, the training of the initial group recommendation model by the pre-training module includes:
a student model created by the student model module is used as a virtual student to form a class to participate in training;
setting course requirement information and initializing network parameters of the initial group recommendation model;
taking the whole class of virtual students as the environment and the first neural network model and the second neural network model of the initial group recommendation model as the agent, training the agent by adopting a proximal policy optimization algorithm, and saving the current network parameters when a preset training end condition is met to obtain the trained group recommendation model.
The course requirement information includes: the number of lessons and the pass rate, excellent rate, average score and the like that need to be achieved when the course ends.
Further, training the agent using the proximal policy optimization algorithm includes:
step S1: recording the initial state of the virtual student;
step S2: judging whether the first cycle count has reached a preset first maximum cycle count, and if so, executing step S3; otherwise, cyclically performing the following process:
step S201: resetting the virtual student state to the initial state recorded in step S1;
step S202: cyclically executing steps S202-1 to S202-4 until the cycle count reaches the preset maximum number of sub-cycles; in each cycle, recording the classroom feedback of the virtual students, the recommendation degrees of the knowledge points output by the first neural network and the evaluation value output by the second neural network model, and calculating, from the classroom feedback of all the virtual students and according to the course requirement information, the reward value obtained for the knowledge point learned last time;
step S202-1: all virtual students participate in classroom learning, and the virtual students give student classroom feedback;
step S202-2: the student classroom feedback given in the step S202-1 is formed into a student classroom feedback data sequence and is input into a first neural network, and knowledge points of the next teaching are obtained based on the output of the first neural network, namely, the knowledge point with the maximum recommendation degree is used as the knowledge point of the next teaching;
step S202-3: the student classroom feedback given in the step S202-1 is formed into a student classroom feedback data sequence, a second neural network model is input, and an evaluation value of the student classroom feedback data sequence is obtained based on the output of the second neural network model;
step S202-4: all virtual students learn the knowledge point for the next lesson obtained from the first neural network;
step S3: judging whether the preset second maximum cycle number is reached, if so, ending; otherwise, the following process is circularly performed:
step S301: sampling the student classroom feedback data collected in the step S2;
step S302: calculating a first objective function (namely, the output loss of the first neural network) based on the sampled data, and adjusting the network parameters of the first neural network according to a preset stochastic gradient ascent algorithm;
step S303: calculating a second objective function (namely, the output loss of the second neural network model) based on the sampled data, and adjusting the network parameters of the second neural network model according to a preset stochastic gradient ascent algorithm;
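The data-collection phase (steps S1 and S2) can be sketched as two nested loops. This is only a control-flow illustration under stated assumptions: the environment and both networks are passed in as plain callables, and every name here (`collect_trajectories`, `get_feedback`, `recommend`, `evaluate`, `teach`, `reward_of`) is hypothetical. The parameter updates of step S3 are not shown.

```python
import copy
from typing import Any, Callable, List, Tuple

# One recorded lesson: (feedback, chosen knowledge point, evaluation, reward)
Step = Tuple[Any, int, float, float]


def collect_trajectories(
    initial_class: Any,                         # state recorded in step S1
    n_episodes: int,                            # first maximum cycle count
    n_lessons: int,                             # maximum number of sub-cycles
    get_feedback: Callable[[Any], Any],         # S202-1: students give feedback
    recommend: Callable[[List[Any]], int],      # S202-2: first network
    evaluate: Callable[[List[Any]], float],     # S202-3: second network
    teach: Callable[[Any, int], None],          # S202-4: students learn
    reward_of: Callable[[Any], float],          # reward from class feedback
) -> List[List[Step]]:
    trajectories: List[List[Step]] = []
    for _ in range(n_episodes):                 # step S2
        students = copy.deepcopy(initial_class)  # step S201: reset to initial state
        history: List[Any] = []                 # feedback data sequence so far
        steps: List[Step] = []
        for _ in range(n_lessons):              # step S202
            feedback = get_feedback(students)
            history.append(feedback)
            action = recommend(history)
            value = evaluate(history)
            reward = reward_of(feedback)        # credit for the last taught point
            teach(students, action)
            steps.append((feedback, action, value, reward))
        trajectories.append(steps)
    return trajectories


# Toy usage: each "student" is the set of knowledge points taught so far.
toy = collect_trajectories(
    initial_class=[set(), set()],
    n_episodes=2,
    n_lessons=3,
    get_feedback=lambda cls: [len(s) for s in cls],
    recommend=lambda hist: len(hist) - 1,
    evaluate=lambda hist: 0.0,
    teach=lambda cls, kp: [s.add(kp) for s in cls],
    reward_of=lambda fb: sum(fb) / len(fb),
)
print(len(toy), len(toy[0]))
```

Deep-copying the recorded initial state at the start of every episode is what keeps the episodes independent of one another.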
Further, the recommendation and training process of the group teaching recommendation module is as follows:
initializing the group recommendation model with the network parameters saved by the pre-training module after training;
after a teacher starts teaching, forming a student classroom feedback data sequence from the classroom feedback that students give in each class through the user terminal, and inputting the sequence into the first neural network model and the second neural network model of the group recommendation model respectively; obtaining the recommended teaching knowledge point for the next class and the corresponding evaluation value from the outputs of the two models, and saving the student classroom feedback data sequence, the recommended teaching knowledge point and the evaluation value; sending the recommended teaching knowledge point for the next class to the corresponding teacher;
after class, updating and training the group recommendation model based on historical data stored in the current updating period, wherein the historical data comprises a plurality of groups of data records, and each group of data comprises a student class feedback data sequence, recommended teaching knowledge points and evaluation values.
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects:
compared with the prior art, the invention collects student data through classroom interaction methods such as voting, question answering, homework and tests, and provides the teaching plan with the largest overall benefit for a given student group (such as the whole class). The overall benefit can be represented by a multi-objective optimization function, which may include, but is not limited to, the pass rate, the excellent rate and the average score. The invention uses a deep reinforcement learning method to carry out goal-oriented teaching path planning for teachers and can process large-scale complex data. Meanwhile, the most time-consuming training process is placed before and after class, so that during class a teacher can immediately obtain recommended teaching knowledge points from the students' classroom feedback.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a teaching process data sequence diagram of a group teaching recommendation system based on deep reinforcement learning provided by the embodiment of the invention;
FIG. 3 is a flowchart of a pre-training module of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 4 is a flow chart of a group teaching recommendation module of a group teaching recommendation system based on deep reinforcement learning provided by an embodiment of the invention;
FIG. 5 is a clip function diagram of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
The embodiment of the invention provides a group teaching recommendation system based on deep reinforcement learning. As shown in fig. 1, the system comprises: a user terminal (used by teachers or students to log in to the system), a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module. Group teaching recommendation is realized through data interaction among the modules as follows:
(1) The user terminal (teacher user) inputs knowledge point data through the knowledge point management module, and the knowledge point management module sends the knowledge point data to the student model module, the pre-training module and the group teaching recommendation module;
(2) The user terminal (student user) inputs student basic data through a student data management module, and the student data management module sends the student basic data to a student model module and a pre-training module; in the class, through interaction with the user terminal, student class feedback is collected and sent to the group teaching recommendation module;
(3) The student model module creates a student model based on the currently entered related information (knowledge point data and student basic data) according to a preset creation strategy and sends the student model to the pre-training module;
(4) The pre-training module takes the student model created by the student model module as a study subject, takes data sent by the knowledge point management module and the student data management module as training data, trains a preset initial group recommendation model, and obtains a trained group recommendation model;
(5) The group teaching recommendation module calls the group recommendation model trained by the pre-training module and, during course teaching, combines the student classroom feedback of each class of the course to output teaching recommendation information and send it to the corresponding teacher user; it saves the student classroom feedback collected by the student data management module; and, according to the configured model update period, it updates and trains the group recommendation model during course teaching based on the student classroom feedback saved in the current period.
In this embodiment, the knowledge point management module is configured to: receive and store knowledge point data (namely knowledge point information) input by an expert, and provide the received data as a data set for use by other modules. The expert is a teacher or a group of teachers with extensive teaching experience who are familiar with the knowledge points of the course; the knowledge point information comprises the knowledge point ID, the name of the course it belongs to, the knowledge point introduction, the knowledge point content, the knowledge point difficulty coefficient, the ID of the prerequisite knowledge point of the knowledge point, the classroom test questions matched to the knowledge point, and knowledge point related data.
In this embodiment, the student data management module is configured to: receiving and storing student basic information input by students; collecting and storing student classroom feedback data through a classroom test interaction mode; the received data is provided as a data set to other modules for use. The student basic information comprises student numbers, names, ages, sexes, ages and student types; the classroom feedback data comprises a test question name, a belonging knowledge point ID, test question content, a participation test student ID and a student test result; the data set comprises a student basic information data set and a classroom feedback data set; the data sequence generated during teaching in this embodiment is shown in fig. 2.
In this embodiment, the student model module is configured to create a student model based on the student basic information data set, and to use the student model to stand in for real students during the group recommendation model training in the pre-training module. The student model is implemented with an Ebbinghaus memory model and describes the following pieces of characteristic information:
(1) Describing the current mastery state of the virtual student for each knowledge point, the formula in this embodiment is as follows:
Figure BDA0003465431020000061
where $P_i$ represents the probability that a student has mastered the i-th knowledge point,
Figure BDA0003465431020000062
represents the mastery probability of the prerequisite knowledge point of the i-th knowledge point, $\theta$ is a difficulty coefficient set according to the specific conditions of the students and knowledge points, $D$ is the time elapsed since the knowledge point was last learned, and $S$ is the total number of times the knowledge point has been learned;
(2) Describing how a virtual student transitions from one state to another by learning, in this embodiment by changing D and S in the above-described formulas;
(3) Describing the classroom feedback after learning; in this embodiment, a random number between 0 and 1 is sampled, and if it is less than $P_i$ in the above formula, the student is considered able to answer the question on that knowledge point correctly, and otherwise not.
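The three aspects above (mastery state, state transition through D and S, and sampled classroom feedback) can be sketched as a small virtual-student class. The mastery formula itself is only available as an image here, so the sketch substitutes an assumed exponential forgetting curve, exp(-θ·D/S), multiplied by the prerequisite's mastery probability; that curve, the class name `VirtualStudent` and its attributes are illustrative assumptions, not the formula of this embodiment.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VirtualStudent:
    difficulty: Dict[str, float]                 # theta for each knowledge point
    prerequisite: Dict[str, Optional[str]]       # prerequisite ID, or None
    rng: random.Random = field(default_factory=random.Random)
    last_learned: Dict[str, int] = field(default_factory=dict)   # lesson index
    times_learned: Dict[str, int] = field(default_factory=dict)  # S
    clock: int = 0                               # number of lessons taught so far

    def mastery(self, kp: str) -> float:
        """Mastery probability under the ASSUMED curve exp(-theta * D / S)."""
        s = self.times_learned.get(kp, 0)
        if s == 0:
            return 0.0                           # never taught, not mastered
        d = self.clock - self.last_learned[kp]   # D: lessons since last study
        own = math.exp(-self.difficulty[kp] * d / s)
        pre = self.prerequisite.get(kp)
        return own * (self.mastery(pre) if pre else 1.0)

    def learn(self, kp: str) -> None:
        """State transition: learning a point resets D and increments S."""
        self.clock += 1
        self.last_learned[kp] = self.clock
        self.times_learned[kp] = self.times_learned.get(kp, 0) + 1

    def answer(self, kp: str) -> bool:
        """Classroom feedback: correct if a uniform sample falls below mastery."""
        return self.rng.random() < self.mastery(kp)


student = VirtualStudent(
    difficulty={"a": 0.5, "b": 0.2},
    prerequisite={"a": None, "b": "a"},
    rng=random.Random(0),
)
student.learn("a")
student.learn("b")
print(round(student.mastery("b"), 4))
```

Swapping in a half-life or Bayesian knowledge tracing model would only change `mastery` and `learn`; the sampling rule in `answer` stays the same.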
In this embodiment, the pre-training module is configured to train, before class, a group recommendation model against the student model module using the proximal policy optimization algorithm, and to provide the trained model to the group teaching recommendation module; the flow is shown in fig. 3. The group recommendation model consists of a recommendation neural network and a critic neural network. The recommendation neural network is a recurrent neural network: because student feedback data is a time-ordered sequence, the network must be able to process sequence input, and this embodiment uses a long short-term memory (LSTM) recurrent neural network whose output is the recommended knowledge point to teach. The critic neural network has a similar structure: both networks consist of an input layer, a hidden layer and an output layer, where the input layer receives the student classroom feedback sequence, the hidden layer may have one or more layers, and the two networks may have the same or different numbers of hidden layers. The output layer is the main difference between them. The output layer of the recommendation neural network performs classification output using a softmax function, and its output represents the recommendation degree of each knowledge point of the current course for the next class (when forming the recommendation result, the knowledge point with the maximum recommendation degree is taken as the result); the output layer of the critic neural network uses a linear function, and its output represents the score of the action at each sampling moment, that is, the evaluation (score) of the current classroom teaching. Training the group recommendation model with the proximal policy optimization (PPO) algorithm proceeds as follows:
(1) The student models created by the student model module serve as virtual students and form a class to participate in training, for example a class of 20 students;
(2) Setting course requirement information;
(3) Initializing the group recommendation model, namely initializing the network parameters of the recommendation neural network and the critic neural network;
(4) Taking the whole class of virtual students as the environment and the recommendation neural network and the critic neural network as the agent, and training the agent using the proximal policy optimization algorithm;
(5) After the training of the recommendation neural network and the critic neural network is completed, saving the current network parameters of both networks and providing them to the group teaching recommendation module.
As a possible implementation, in the training process of this embodiment: in (2), the course requirement information comprises 80 lessons, a pass rate of 0.8 and an excellent rate of 0.2 to be achieved at the end of the course, and as high an average score as possible; in (3), the initialized group recommendation model uses two-layer neural networks with 64 hidden-layer neurons; in (4), the flow of the proximal policy optimization algorithm is as follows:
(1) Recording the initial state of the virtual student;
(2) Repeat the following steps a specified number of times:
(2-a) Repeat the following steps a specified number of times:
I. resetting the virtual student status to the initial status saved in (1);
II. Repeat the following steps until the number of lessons reaches the set value; in each cycle, record the test results returned by the students, the knowledge point output by the recommendation neural network and the evaluation value output by the critic neural network, and calculate, from the classroom feedback of all students and according to the course requirement information, the reward value obtained for the knowledge point learned last time. The reward value is given by:
$\mathrm{Reward} = \lambda_1 R_p + \lambda_2 R_e + \lambda_3 R_a$
where $R_p$ denotes the pass rate, $R_e$ denotes the excellent rate, $R_a$ denotes the average mastery probability of all students over the knowledge points, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ denote the respective weights; each weight is greater than or equal to 0, and the specific values are empirical and are not specifically limited by the invention. In this embodiment they are taken as 5, 3 and 1, respectively.
1) All virtual students take the classroom test and return their test results;
2) The classroom feedback is fed into the recommendation neural network, which outputs the recommended knowledge point for the next lesson;
3) The classroom feedback is fed into the critic neural network, which outputs an evaluation value;
4) All virtual students learn the knowledge point output by the recommendation neural network;
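The reward used in step II can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the score thresholds of 60 and 85, and the use of a single mastery value for the whole class are hypothetical; only the weighted-sum form and the weights 5, 3 and 1 come from the text.

```python
from typing import Sequence, Tuple

# Weights for pass rate, excellent rate and mastery probability
# (the values 5, 3, 1 are those given for this embodiment).
LAMBDAS: Tuple[float, float, float] = (5.0, 3.0, 1.0)


def class_rates(scores: Sequence[float],
                pass_mark: float = 60.0,
                excellent_mark: float = 85.0) -> Tuple[float, float]:
    """Return (pass rate, excellent rate) for one classroom test.

    ASSUMPTION: the thresholds 60 and 85 are hypothetical; the source
    does not say how a pass or an excellent result is decided.
    """
    n = len(scores)
    pass_rate = sum(s >= pass_mark for s in scores) / n
    excellent_rate = sum(s >= excellent_mark for s in scores) / n
    return pass_rate, excellent_rate


def reward(pass_rate: float,
           excellent_rate: float,
           mastery: float,
           lambdas: Sequence[float] = LAMBDAS) -> float:
    """Weighted sum: lambda1 * R_p + lambda2 * R_e + lambda3 * R_a."""
    l1, l2, l3 = lambdas
    return l1 * pass_rate + l2 * excellent_rate + l3 * mastery


if __name__ == "__main__":
    r_p, r_e = class_rates([50, 60, 90, 70])
    print(r_p, r_e, reward(r_p, r_e, mastery=0.5))
```

With weights of 5, 3 and 1, a change in the pass rate moves the reward five times as much as the same change in the mastery term, so the agent is pushed first toward getting students over the pass threshold.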
(2-b) Repeating the following operations a specified number of times:
I. Sampling from the data collected in (2-a);
II. Calculating the objective function from the sampled data and training the recommendation neural network with a stochastic gradient ascent algorithm, according to the following formula:
Figure BDA0003465431020000071
where $\theta_k$ denotes the parameters of the recommendation neural network at the k-th training iteration; $D_k$ denotes the sampled data set; $\tau$ denotes the sampled data of one teaching path, i.e. one complete teaching trajectory (for example, if the course lasts 40 lessons, the 40 lessons taught from start to finish constitute one group of data, and $D_k$ consists of several such groups $\tau$); $T$ is the duration of the course; $\pi_\theta(a_t|s_t)$ denotes the probability that, with parameters $\theta$, the output at time $t$ is $a_t$ when the input classroom feedback is $s_t$; and clip() is the function shown in figure 4, whose input parameters include $r_t(\theta)$
Figure BDA0003465431020000081
and the boundary value $\epsilon$: if $r_t(\theta) \le 1-\epsilon$, clip() = $1-\epsilon$; if $r_t(\theta) \ge 1+\epsilon$, clip() = $1+\epsilon$; and if $1-\epsilon < r_t(\theta) < 1+\epsilon$, clip() = $r_t(\theta)$. In this embodiment, the boundary value $\epsilon$ is 0.1.
Figure BDA0003465431020000082
denotes the advantage value of the action at time t, which is calculated as follows:
Figure BDA0003465431020000083
$$\xi_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$
where $\xi_t$ denotes the intermediate parameter at time $t$, i.e. the intermediate parameter at each time step is calculated by the formula for $\xi_t$ above; $\gamma$ denotes the discount factor, which is 0.99 in this embodiment; $T$ denotes the total time so far; $r_t$ denotes the reward value obtained at time $t$; and $V(s_t)$ denotes the evaluation value given by the critic neural network at time $t$;
III. Calculating the objective function from the sampled data and training the critic neural network with a stochastic gradient ascent algorithm, according to the following formula:
Figure BDA0003465431020000084
where
Figure BDA0003465431020000085
denotes the parameters of the critic neural network at the k-th training iteration, and
Figure BDA0003465431020000086
denotes, based on the current network parameters
Figure BDA0003465431020000087
the evaluation value output by the critic neural network at time t.
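The quantities defined for step (2-b) can be sketched in Python. This is a minimal sketch under stated assumptions, not the patent's implementation: the objective and advantage formulas are published only as images, so the clipped term `min(r * A, clip(r) * A)` is taken from the standard PPO formulation, and the advantage is assumed to be the discounted sum of the intermediate parameters ξ. All function names are hypothetical.

```python
from typing import List, Sequence


def clip(ratio: float, eps: float = 0.1) -> float:
    """Clamp the probability ratio r_t(theta) to [1 - eps, 1 + eps]."""
    return max(1.0 - eps, min(1.0 + eps, ratio))


def td_residuals(rewards: Sequence[float],
                 values: Sequence[float],
                 gamma: float = 0.99) -> List[float]:
    """xi_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` must hold one more entry than `rewards`
    (V(s_0) ... V(s_T)), so that V(s_{t+1}) exists for every t.
    """
    return [r + gamma * values[t + 1] - values[t]
            for t, r in enumerate(rewards)]


def discounted_advantages(residuals: Sequence[float],
                          gamma: float = 0.99) -> List[float]:
    """ASSUMPTION: A_t = sum over l >= 0 of gamma**l * xi_{t+l}.

    The source gives the advantage formula only as an image; this
    discounted sum is one common choice, not a confirmed one.
    """
    advantages = [0.0] * len(residuals)
    running = 0.0
    for t in reversed(range(len(residuals))):
        running = residuals[t] + gamma * running
        advantages[t] = running
    return advantages


def clipped_term(ratio: float, advantage: float, eps: float = 0.1) -> float:
    """Per-step term of the standard PPO clipped surrogate objective."""
    return min(ratio * advantage, clip(ratio, eps) * advantage)


if __name__ == "__main__":
    xi = td_residuals([1.0, 2.0], [0.0, 0.0, 0.0], gamma=0.5)
    print(xi, discounted_advantages(xi, gamma=0.5))
    print(clipped_term(1.5, 2.0), clipped_term(1.5, -2.0))
```

Taking the minimum of the clipped and unclipped products means a positive advantage cannot be exploited by moving the ratio beyond 1 + ε, while a negative advantage is still penalised in full.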
In this embodiment, the group teaching recommendation module is configured to receive and store the classroom feedback data of students in class, to give recommended knowledge points for teaching based on the classroom feedback, and to further train the group recommendation model after class using the classroom feedback data. The recommendation and training process is shown in fig. 5:
(1) Initializing the group recommendation model with the parameters saved by the pre-training module after training;
(2) The teacher starts teaching; the students give classroom feedback, which is input into the recommendation neural network and the critic neural network; the recommendation neural network outputs the recommended knowledge point for teaching, the critic neural network outputs an evaluation value, and all data are stored;
(3) Step (2) is executed cyclically, with the teacher teaching according to the recommended knowledge points. After a certain number of cycles, the objective function is calculated after class using the data saved so far, and the group recommendation model is trained again;
(4) Step (3) is repeated until the course ends.
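The recommend-and-retrain loop of steps (1) to (4) can be sketched as below. This is an illustrative sketch only: `actor`, `critic` and `retrain` are hypothetical callables standing in for the two pre-trained networks and the after-class training step, and the classroom feedback is passed in as a fixed list for simplicity, whereas in the described system each feedback depends on the lesson that was actually taught.

```python
from typing import Any, Callable, Dict, List, Sequence


def recommend(degrees: Sequence[float]) -> int:
    """Index of the knowledge point with the largest recommendation degree."""
    return max(range(len(degrees)), key=lambda i: degrees[i])


def run_course(feedback_per_lesson: Sequence[Any],
               actor: Callable[[Any], Sequence[float]],
               critic: Callable[[Any], float],
               retrain: Callable[[List[Dict[str, Any]]], None],
               update_period: int) -> List[int]:
    """Recommend one knowledge point per lesson; retrain periodically."""
    history: List[Dict[str, Any]] = []
    recommendations: List[int] = []
    for lesson, feedback in enumerate(feedback_per_lesson, start=1):
        point = recommend(actor(feedback))   # step (2): recommendation
        value = critic(feedback)             # step (2): evaluation value
        history.append({"feedback": feedback, "point": point, "value": value})
        recommendations.append(point)
        if lesson % update_period == 0:
            # Step (3): after class, train again on the data saved so far.
            retrain(history)
    return recommendations


if __name__ == "__main__":
    sizes: List[int] = []
    recs = run_course(
        feedback_per_lesson=[[0.1, 0.9], [0.7, 0.2], [0.3, 0.3], [0.0, 1.0]],
        actor=lambda fb: fb,        # toy stand-in: feedback used as degrees
        critic=lambda fb: sum(fb),  # toy stand-in for the evaluation value
        retrain=lambda h: sizes.append(len(h)),
        update_period=2,
    )
    print(recs, sizes)
```

The sketch keeps the full history for each retraining call, following "the data saved so far" in step (3); restricting it to the records of the current update period would be an equally plausible reading.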
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
What has been described above is merely some embodiments of the present invention. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the invention.

Claims (10)

1. A group teaching recommendation system based on deep reinforcement learning, comprising: the system comprises a user terminal, a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module;
the user terminal is used for a teacher or a student to log in the system and is an interactive input and output terminal of the user and the system;
the knowledge point management module is used for a teacher user to input knowledge point data and send the knowledge point data to the student model module, the pre-training module and the group teaching recommendation module;
the student data management module is used for a student user to input student basic data and send the student basic data to the student model module, and is further used for collecting student classroom feedback in class and sending it to the group teaching recommendation module;
the student model module creates a student model based on currently entered knowledge point data and student basic data according to a preset creation strategy and sends the student model to the pre-training module;
the pre-training module takes the student model created by the student model module as a learning main body, takes data sent by the knowledge point management module and the student data management module as training data, trains a preset initial group recommendation model, and obtains a trained group recommendation model; the initial group recommendation model comprises a first neural network model and a second neural network model, wherein the first neural network model and the second neural network model comprise an input layer, at least one hidden layer and an output layer, the input layer is a student class feedback data sequence, the hidden layer is a neural network capable of processing sequence input, and the output layer of the first neural network model is used for outputting recommendation degree of each knowledge point of a current course; the output layer of the second neural network model is used for outputting the evaluation value of the current classroom teaching;
the group teaching recommendation module calls the group recommendation model trained by the pre-training module and, during course teaching, outputs teaching recommendation information based on the student classroom feedback of each class of the course and sends it to the corresponding teacher user; saves the student classroom feedback collected by the student data management module; and, during course teaching, updates and trains the group recommendation model according to a configured model update period, based on the student classroom feedback saved in the current period;
the output teaching recommendation information comprises recommended knowledge points of the next class and evaluation values of the current student class feedback data sequence, wherein the recommended knowledge points of the next class are the knowledge points with the maximum recommendation degree.
2. The group teaching recommendation system of claim 1, wherein the knowledge point data comprises: knowledge point ID, belonging course name, knowledge point brief introduction, knowledge point content, knowledge point difficulty coefficient, the prepositive knowledge point ID of the knowledge point, the matched class test questions of the knowledge point and knowledge point related data.
3. The group teaching recommendation system of claim 1, wherein the student base data comprises: student number, name, age, sex, age and student type; the student classroom feedback includes data including: the test question name, the belonging knowledge point ID, the test question content, the participation test student ID and the student test result.
4. The group teaching recommendation system according to claim 1, wherein the student model module uses the student model to simulate real students participating in the training of the group recommendation model in the pre-training module, and the student model is constructed as an Ebbinghaus memory model, a half-life memory model or a Bayesian knowledge tracing model;
and the student model includes:
a description of each virtual student's current mastery state for each knowledge point;
a description of how a virtual student transitions from one state to another through learning;
a description of the classroom feedback given after learning.
5. The group teaching recommendation system of claim 1, wherein the training of the initial group recommendation model by the pre-training module comprises:
using the student models created by the student model module as virtual students that form a class to participate in training;
setting course requirement information and initializing the network parameters of the initial group recommendation model;
taking the whole class of virtual students as the environment and the first neural network model and the second neural network model of the initial group recommendation model as the agent, training the agent with a proximal policy optimization algorithm, and saving the current network parameters when a preset training end condition is met, to obtain the trained group recommendation model.
6. The group teaching recommendation system of claim 5, wherein the curriculum requirements information comprises: the number of lessons and the pass rate to be achieved at the end of the lessons, the excellent rate and the average score.
7. The group teaching recommendation system of claim 1, wherein training the agent using a proximal policy optimization algorithm comprises:
step S1: recording the initial state of the virtual student;
step S2: judging whether the first cycle count has reached a preset first maximum number of cycles; if so, executing step S3; otherwise, cyclically performing the following process:
step S201: resetting the virtual student status to the initial status recorded in step S1;
step S202: cyclically executing steps S202-1 to S202-4 until the number of cycles reaches a preset maximum number of sub-cycles; in each cycle, recording the classroom feedback of the virtual students, the recommendation degree of the knowledge points output by the first neural network model, the evaluation value output by the second neural network model, and the reward value obtained for the previously learned knowledge point, which is calculated from the classroom feedback of all virtual students according to the course requirement information;
step S202-1: all virtual students participate in classroom learning, and the virtual students give student classroom feedback;
step S202-2: the student classroom feedback given in the step S202-1 is formed into a student classroom feedback data sequence and is input into a first neural network model, and knowledge points of the next teaching are obtained based on the output of the first neural network model, namely, the knowledge point with the maximum recommendation degree is used as the knowledge point of the next teaching;
step S202-3: the student classroom feedback given in the step S202-1 is formed into a student classroom feedback data sequence, a second neural network model is input, and an evaluation value of the student classroom feedback data sequence is obtained based on the output of the second neural network model;
step S202-4: all virtual students learn knowledge points of the next teaching obtained based on the first neural network model;
step S3: judging whether the preset second maximum cycle number is reached, if so, ending; otherwise, the following process is circularly performed:
step S301: sampling the student classroom feedback data collected in the step S2;
step S302: calculating a first objective function based on the sampled data, and adjusting the network parameters of the first neural network model according to a preset stochastic gradient ascent algorithm, wherein the first objective function is used for representing the output loss of the first neural network model;
step S303: calculating a second objective function based on the sampled data, and adjusting the network parameters of the second neural network model according to a preset stochastic gradient ascent algorithm, wherein the second objective function is used for representing the output loss of the second neural network model.
8. The group teaching recommendation system of claim 7, wherein the first objective function is:
Figure FDA0004083256830000031
where
$\theta_{k+1}$ denotes the network parameters of the first neural network at the (k+1)-th training iteration;
$D_k$ denotes the sampled data set;
$T$ denotes the duration of the course;
$\tau$ denotes the sampled data of one teaching path;
$\pi_\theta(a_t|s_t)$ denotes the probability that, with network parameters $\theta$, the output at time $t$ is $a_t$ when the input student classroom feedback is $s_t$;
the input parameters of the function clip() include $r_t(\theta)$ and the boundary value $\epsilon$: if $r_t(\theta) \le 1-\epsilon$, clip() = $1-\epsilon$; if $r_t(\theta) \ge 1+\epsilon$, clip() = $1+\epsilon$; and if $1-\epsilon < r_t(\theta) < 1+\epsilon$, clip() = $r_t(\theta)$; wherein
Figure FDA0004083256830000032
Figure FDA0004083256830000033
denotes the advantage value of the action at time t, which is calculated as follows:
Figure FDA0004083256830000034
$$\xi_t = r_t + \gamma V(s_{t+1}) - V(s_t);$$
where $\xi_t$ denotes the intermediate parameter at time $t$, $\gamma$ denotes a preset discount factor, $r_t$ denotes the reward value obtained at time $t$, and $V(s_t)$ denotes the evaluation value output by the second neural network model at time $t$;
the second objective function is:
Figure FDA0004083256830000035
where
Figure FDA0004083256830000041
denotes the network parameters of the second neural network model at the (k+1)-th training iteration; and
Figure FDA0004083256830000042
denotes, based on the current network parameters
Figure FDA0004083256830000043
the evaluation value output by the second neural network model at time t.
9. The group teaching recommendation system of claim 1, wherein the recommendation and training process of the group teaching recommendation module is:
initializing the group recommendation model with the network parameters saved by the pre-training module after training;
after the teacher starts teaching, forming a student classroom feedback data sequence from the student classroom feedback given by the students in each class through the user terminal, and inputting it into the first and second neural network models of the group recommendation model respectively; obtaining the recommended teaching knowledge point for the next class and the corresponding evaluation value based on the outputs of the two models, and storing the student classroom feedback data sequence, the recommended teaching knowledge point and the evaluation value; and sending the recommended teaching knowledge point for the next class to the corresponding teacher;
after class, updating and training the group recommendation model based on historical data stored in the current updating period, wherein the historical data comprises a plurality of groups of data records, and each group of data comprises a student class feedback data sequence, recommended teaching knowledge points and evaluation values.
10. The group teaching recommendation system of claim 1, wherein the hidden layers of the first and second neural network models are long short-term memory (LSTM) recurrent neural networks.
CN202210028554.7A 2022-01-11 2022-01-11 Group teaching recommendation system based on deep reinforcement learning Active CN114595923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210028554.7A CN114595923B (en) 2022-01-11 2022-01-11 Group teaching recommendation system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210028554.7A CN114595923B (en) 2022-01-11 2022-01-11 Group teaching recommendation system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114595923A CN114595923A (en) 2022-06-07
CN114595923B true CN114595923B (en) 2023-04-28

Family

ID=81803873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210028554.7A Active CN114595923B (en) 2022-01-11 2022-01-11 Group teaching recommendation system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114595923B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116521936B (en) * 2023-06-30 2023-09-01 云南师范大学 Course recommendation method and device based on user behavior analysis and storage medium
CN117114937B (en) * 2023-09-07 2024-06-14 深圳市真实智元科技有限公司 Method and device for generating exercise song based on artificial intelligence
CN117455389B (en) * 2023-10-10 2024-05-28 北京华普亿方科技集团股份有限公司 Vocational training management platform based on artificial intelligence
CN117688248B (en) * 2024-02-01 2024-04-26 安徽教育网络出版有限公司 Online course recommendation method and system based on convolutional neural network
CN117910481A (en) * 2024-03-20 2024-04-19 北京语言大学 Spoken language dialogue method and device for assisting language learning and dialogue robot

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108615423A (en) * 2018-06-21 2018-10-02 中山大学新华学院 Instructional management system (IMS) on a kind of line based on deep learning
CN113509726A (en) * 2021-04-16 2021-10-19 超参数科技(深圳)有限公司 Interactive model training method and device, computer equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3543918A1 (en) * 2018-03-20 2019-09-25 Flink AI GmbH Reinforcement learning method
CN108614865B (en) * 2018-04-08 2020-12-11 暨南大学 Personalized learning recommendation method based on deep reinforcement learning
CN109242207A (en) * 2018-10-10 2019-01-18 中山大学 A kind of Financial Time Series prediction technique based on deeply study
CN112307214A (en) * 2019-07-26 2021-02-02 株式会社理光 Deep reinforcement learning-based recommendation method and recommendation device
CN112700688B (en) * 2020-12-25 2021-09-24 电子科技大学 Intelligent classroom teaching auxiliary system
CN112784154B (en) * 2020-12-31 2022-03-15 电子科技大学 Online teaching recommendation system with data enhancement
CN113590929A (en) * 2021-01-28 2021-11-02 腾讯科技(深圳)有限公司 Information recommendation method and device based on artificial intelligence and electronic equipment
CN113033537B (en) * 2021-03-25 2022-07-01 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for training a model

Also Published As

Publication number Publication date
CN114595923A (en) 2022-06-07

Similar Documents

Publication Publication Date Title
CN114595923B (en) Group teaching recommendation system based on deep reinforcement learning
Varma et al. Preservice elementary teachers’ perceptions of their understanding of inquiry and inquiry-based science pedagogy: Influence of an elementary science education methods course and a science field experience
CN109215426A (en) A kind of student's learning information analysis system and its application method
CN109858797A (en) The various dimensions information analysis of the students method of knowledge based network exact on-line education system
CN109118861A (en) A kind of individualized intelligent tutoring system
Khanna et al. Expert systems advances in education
CN110263020A (en) On-line study item bank management system and management method
Noh et al. Intelligent tutoring system using rule-based and case-based: a comparison
CN109920288A (en) Adaptive learning task intelligence generating means and computer learning system
CN110046804A (en) A kind of education training method and system based on student's classification
Chan et al. Applying the genetic encoded conceptual graph to grouping learning
Schwartz et al. Choice-based assessments for the digital age
CN112951022A (en) Multimedia interactive education training system
Wang Exploration on the operation status and optimization strategy of networked teaching of physical education curriculum based on AI algorithm
Lee et al. Comparison of peer-to-peer and virtual simulation rehearsals in eliciting student thinking through number talks
Tang et al. Adaptive narrative game for personalized learning
CN115205072A (en) Cognitive diagnosis method for long-period evaluation
Fleener et al. Dimensions of teacher education accountability: A Louisiana perspective on value-added
Wan et al. Adaptive course generation based on evolutionary algorithm
KR100995679B1 (en) Personalized E-Learning System for the Written Examination of Driver's License Test
Zou et al. A novel learning early-warning model based on knowledge points and question types
CN114155124B (en) Test question resource recommendation method and system
Javadi et al. Improving student's modeling framework in a tutorial-like system based on Pursuit learning automata and reinforcement learning
Wang et al. Research on the Reform of English Precision Teaching in Colleges and Universities Facilitated by Artificial Intelligence Technology
Kamha et al. Implementation of a Curriculum to Enhance Learning Management Competency in Computational Thinking for the Lower Secondary Teachers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant