CN114595923B - Group teaching recommendation system based on deep reinforcement learning

Info

Publication number: CN114595923B
Application number: CN202210028554.7A
Authority: CN
Inventors: 杨腾杰; 左琳; 陈柯弟; 刘念伯
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2022-01-11
Filing date: 2022-01-11
Publication date: 2023-04-28
Anticipated expiration: 2042-01-11
Also published as: CN114595923A

Abstract

The invention discloses a group teaching recommendation system based on deep reinforcement learning, belonging to the technical field of education and information. The invention collects student data through classroom interaction methods such as voting, question answering, homework and tests, and provides the teaching plan with the largest overall benefit for a given student group. The overall benefit can be expressed as a multi-objective optimization function whose objectives may include, but are not limited to, the pass rate, the excellent rate and the average score. The invention uses deep reinforcement learning to carry out goal-oriented teaching path planning for teachers and can process large-scale, complex data. In addition, the most time-consuming training process is placed before and after class, so that during class a teacher can immediately obtain recommended teaching knowledge points from the students' classroom feedback.

Description

Group teaching recommendation system based on deep reinforcement learning

Technical Field

The invention relates to the technical field of education and information, in particular to a group teaching recommendation system based on deep reinforcement learning.

Background

In conventional classroom teaching, teachers often arrange learning content empirically, because the learning details of students are neither visible nor controllable. A teacher can of course draw on a variety of information, including questions and answers, classroom assessment, and students' facial expressions, gestures and body movements, to assess their learning performance. But this information is often coarse and cannot cover every student or track each person's learning details, which often makes it impossible for teachers to design teaching paths at a fine granularity. The development of teaching assistance systems has eased this difficulty. A teaching assistance system provides various teacher-student interaction methods and can record the interaction information, through which a teacher can understand the students' situation more accurately and deeply. In addition, a teaching assistance system can provide recommended teaching plans or learning plans for teachers or students, further relieving teachers' workload.

The patent application with publication number CN 112700688A discloses an intelligent classroom teaching assistance system. It collects student learning data through in-class interaction methods such as voting, models and tracks the students based on that data, and finally gives a recommended teaching plan according to the models of all students in the class. However, its recommendation algorithm simulates the students' learning process under various teaching plans based on the current student models and selects the teaching plan with the best simulated effect as the recommendation. To obtain a better recommendation, it must simulate as many of the possible teaching plans as it can, which incurs a large amount of computation and time. With more students and more knowledge points, the resulting wait may become unacceptably long, so that a teacher cannot get timely feedback in class.

Disclosure of Invention

The invention provides a group teaching recommendation system based on deep reinforcement learning, which can be used for improving the processing efficiency of group teaching recommendation.

The technical scheme adopted by the invention is as follows:

a group teaching recommendation system based on deep reinforcement learning, the system comprising: a user terminal, a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module;

the user terminal is used for a teacher or a student to log in the system and is an interactive input and output terminal of the user and the system;

the knowledge point management module is used by a teacher user to enter knowledge point data and sends the knowledge point data to the student model module, the pre-training module and the group teaching recommendation module;

the student data management module is used by a student user to enter student basic data and sends the student basic data to the student model module; it is also used to collect student classroom feedback in class and send it to the group teaching recommendation module;

the student model module creates a student model based on currently entered knowledge point data and student basic data according to a preset creation strategy and sends the student model to the pre-training module;

the pre-training module takes the student model created by the student model module as the learning subject, takes the data sent by the knowledge point management module and the student data management module as training data, and trains a preset initial group recommendation model to obtain a trained group recommendation model; the initial group recommendation model comprises a first neural network model and a second neural network model, each comprising an input layer, at least one hidden layer and an output layer, wherein the input of the input layer is a student classroom feedback data sequence, the hidden layer is a neural network capable of processing sequence input, the output layer of the first neural network model outputs the recommendation degree of each knowledge point of the current course, and the output layer of the second neural network model outputs the evaluation value of the current classroom teaching, i.e. the evaluation value of the teaching action currently being executed; the group teaching recommendation module calls the group recommendation model trained by the pre-training module and, during course teaching, combines the student classroom feedback of each class of the course to output teaching recommendation information and send it to the corresponding teacher user; saves the student classroom feedback collected by the student data management module; and, during course teaching, updates and trains the group recommendation model at the configured model update period based on the student classroom feedback saved in the current period;

the output teaching recommendation information comprises recommended knowledge points of the next class and evaluation values of a current student class feedback data sequence, wherein the recommended knowledge points of the next class are knowledge points with the maximum recommendation degree;

further, the knowledge point data includes: knowledge point ID, name of the course to which it belongs, knowledge point introduction, knowledge point content, knowledge point difficulty coefficient, the ID of the knowledge point's prerequisite knowledge point, the classroom test questions matched to the knowledge point, and knowledge point related materials.

Further, the student basic data includes: student number, name, age, sex, age and student type; the student classroom feedback includes: test question name, ID of the knowledge point to which the question belongs, test question content, IDs of the students taking the test, student test results, and the like.

Further, the student model module uses the student model to simulate real students participating in the training of the group recommendation model in the pre-training module, and the student model is built on an Ebbinghaus memory model, a half-life memory model or a Bayesian knowledge tracing model;

and the description of the model includes:

describing the current mastering state of the virtual students for each knowledge point;

a process describing how a virtual student transitions from one state to another by learning;

describing the classroom feedback given after learning.

Further, the training of the initial group recommendation model by the pre-training module includes:

a student model created by the student model module is used as a virtual student to form a class to participate in training;

setting course requirement information and initializing network parameters of the initial group recommendation model;

taking the whole class of virtual students as the environment and the first neural network model and second neural network model of the initial group recommendation model as the agent, training the agent with the proximal policy optimization algorithm, and saving the current network parameters when a preset training end condition is met, thereby obtaining the trained group recommendation model.

The course requirement information includes: the number of lessons, and the pass rate, excellent rate, average score and the like to be achieved by the end of the course.

Further, training the agent using the proximal policy optimization algorithm includes:

step S1: recording the initial state of the virtual student;

step S2: judging whether the first cycle times reach a preset first maximum cycle times, if so, executing the step S3; otherwise, the following process is circularly performed:

step S201: resetting the virtual student status to the initial status recorded in step S1;

step S202: steps S202-1 to S202-4 are executed repeatedly until the cycle count reaches the preset maximum number of sub-cycles; in each cycle, the classroom feedback of the virtual students, the recommendation degrees of the knowledge points output by the first neural network and the evaluation value output by the second neural network model are recorded, and the reward value obtained for the most recently learned knowledge point is calculated from the classroom feedback of all the virtual students according to the course requirement information;

step S202-1: all virtual students participate in classroom learning, and the virtual students give student classroom feedback;

step S202-2: the student classroom feedback given in the step S202-1 is formed into a student classroom feedback data sequence and is input into a first neural network, and knowledge points of the next teaching are obtained based on the output of the first neural network, namely, the knowledge point with the maximum recommendation degree is used as the knowledge point of the next teaching;

step S202-3: the student classroom feedback given in the step S202-1 is formed into a student classroom feedback data sequence, a second neural network model is input, and an evaluation value of the student classroom feedback data sequence is obtained based on the output of the second neural network model;

step S202-4: all virtual students learn knowledge points of the next teaching based on the first neural network;

step S3: judging whether the preset second maximum cycle number is reached, if so, ending; otherwise, the following process is circularly performed:

step S301: sampling the student classroom feedback data collected in the step S2;

step S302: calculating a first objective function (i.e. the output loss of the first neural network) based on the sampled data, and adjusting the network parameters of the first neural network according to a preset stochastic gradient ascent algorithm;

step S303: calculating a second objective function (i.e. the output loss of the second neural network model) based on the sampled data, and adjusting the network parameters of the second neural network model according to a preset stochastic gradient ascent algorithm;
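The first and second objective functions of steps S302 and S303 are not written out at this point. Purely as a point of reference, the widely used clipped form of proximal policy optimization is sketched below; that the objectives intended here take this form is an assumption. Here $\theta_k$ denotes the first network's parameters at the k-th update, $D_k$ the set of sampled teaching trajectories $\tau$, $T$ the course length and $\epsilon$ the clipping boundary. The symbols $\hat{A}_t$ (advantage estimate), $\hat{R}_t$ (return target), $\phi$ (parameters of the second neural network model) and $V_\phi$ are introduced only for illustration.

```latex
% Assumed standard PPO-clip objective for the first neural network (maximized)
\theta_{k+1} = \arg\max_{\theta} \frac{1}{|D_k|\,T} \sum_{\tau \in D_k} \sum_{t=0}^{T}
  \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big),
\qquad
r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_k}(a_t \mid s_t)}

% Assumed squared-error objective for the second neural network (minimized)
\phi_{k+1} = \arg\min_{\phi} \frac{1}{|D_k|\,T} \sum_{\tau \in D_k} \sum_{t=0}^{T}
  \big( V_{\phi}(s_t) - \hat{R}_t \big)^{2}
```

In this common form the value objective is minimized, so a description in terms of gradient ascent would correspond to ascending its negative.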

the recommendation and training process of the group teaching recommendation module is as follows:

initializing the group recommendation model with the network parameters saved by the pre-training module after training;

after a teacher starts teaching, a student classroom feedback data sequence is formed from the classroom feedback given by the students in each class through the user terminal and is input into the first and second neural network models of the group recommendation model respectively; the recommended teaching knowledge point for the next class and the corresponding evaluation value are obtained from the outputs of the two models, and the student classroom feedback data sequence, the recommended teaching knowledge point and the evaluation value are saved; the recommended teaching knowledge point for the next class is sent to the corresponding teacher;

after class, updating and training the group recommendation model based on historical data stored in the current updating period, wherein the historical data comprises a plurality of groups of data records, and each group of data comprises a student class feedback data sequence, recommended teaching knowledge points and evaluation values.

The technical scheme provided by the embodiment of the invention at least has the following beneficial effects:

compared with the prior art, the invention collects student data through classroom interaction methods such as voting, question answering, homework and tests, and provides the teaching plan with the largest overall benefit for a given student group (such as a whole class). The overall benefit can be expressed as a multi-objective optimization function whose objectives may include, but are not limited to, the pass rate, the excellent rate and the average score. The invention uses deep reinforcement learning to carry out goal-oriented teaching path planning for teachers and can process large-scale, complex data. In addition, the most time-consuming training process is placed before and after class, so that during class a teacher can immediately obtain recommended teaching knowledge points from the students' classroom feedback.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a block diagram of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;

FIG. 2 is a teaching process data sequence diagram of a group teaching recommendation system based on deep reinforcement learning provided by an embodiment of the invention;

FIG. 3 is a flowchart of a pre-training module of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention;

FIG. 4 is a flow chart of a group teaching recommendation module of a group teaching recommendation system based on deep reinforcement learning provided by an embodiment of the invention;

FIG. 5 is a clip function diagram of a group teaching recommendation system based on deep reinforcement learning according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.

The embodiment of the invention provides a group teaching recommendation system based on deep reinforcement learning. As shown in FIG. 1, the system comprises: a user terminal (through which a teacher or student logs in to the system), a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module. The specific process for realizing group teaching recommendation through data interaction among the modules comprises the following steps:

(1) The user terminal (teacher user) enters knowledge point data through the knowledge point management module, and the knowledge point management module sends the knowledge point data to the student model module, the pre-training module and the group teaching recommendation module;

(2) The user terminal (student user) inputs student basic data through a student data management module, and the student data management module sends the student basic data to a student model module and a pre-training module; in the class, through interaction with the user terminal, student class feedback is collected and sent to the group teaching recommendation module;

(3) The student model module creates a student model based on the currently entered related information (knowledge point data and student basic data) according to a preset creation strategy and sends the student model to the pre-training module;

(4) The pre-training module takes the student model created by the student model module as a study subject, takes data sent by the knowledge point management module and the student data management module as training data, trains a preset initial group recommendation model, and obtains a trained group recommendation model;

(5) The group teaching recommendation module calls the group recommendation model trained by the pre-training module and, during course teaching, combines the student classroom feedback of each class of the course to output teaching recommendation information and send it to the corresponding teacher user; it saves the student classroom feedback collected by the student data management module; and, during course teaching, it updates and trains the group recommendation model at the configured model update period based on the student classroom feedback saved in the current period.

In this embodiment, the knowledge point management module is configured to: receive and store the knowledge point data (i.e. knowledge point information) entered by an expert; and provide the received data as a data set for use by the other modules. The expert is a teacher or teacher group with extensive teaching experience and familiarity with the course's knowledge points; the knowledge point information comprises the knowledge point ID, the name of the course to which it belongs, the knowledge point introduction, the knowledge point content, the knowledge point difficulty coefficient, the ID of the knowledge point's prerequisite knowledge point, the classroom test questions matched to the knowledge point, and knowledge point related materials.

In this embodiment, the student data management module is configured to: receive and store the student basic information entered by students; collect and store student classroom feedback data through classroom test interaction; and provide the received data as data sets for use by the other modules. The student basic information comprises student number, name, age, sex, age and student type; the classroom feedback data comprises the test question name, the ID of the knowledge point to which the question belongs, the test question content, the IDs of the students taking the test and the student test results; the data sets comprise a student basic information data set and a classroom feedback data set. The data sequence generated during teaching in this embodiment is shown in FIG. 2.
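The record types described above can be pictured as simple data classes. The following is a minimal sketch only: the class names, field names, the boolean encoding of a test result and the helper `feedback_sequence` (which summarizes feedback as a per-knowledge-point fraction of correct answers) are all assumptions made for illustration and are not specified by the embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KnowledgePoint:
    """One knowledge point record entered by the expert (hypothetical field names)."""
    kp_id: str
    course_name: str
    introduction: str
    content: str
    difficulty: float                      # knowledge point difficulty coefficient
    prerequisite_id: Optional[str] = None  # ID of the prerequisite knowledge point, if any
    test_questions: List[str] = field(default_factory=list)


@dataclass
class ClassroomFeedback:
    """One student's result on one classroom test question."""
    question_name: str
    kp_id: str
    question_content: str
    student_id: str
    correct: bool  # assumed encoding of the student test result


def feedback_sequence(records: List[ClassroomFeedback], kp_ids: List[str]) -> List[float]:
    """Assumed summary: fraction of correct answers per knowledge point, in kp_ids order."""
    out = []
    for kp in kp_ids:
        hits = [r.correct for r in records if r.kp_id == kp]
        out.append(sum(hits) / len(hits) if hits else 0.0)
    return out
```

A vector of this kind, collected class after class, is one possible way to form the time-ordered feedback sequence that the networks consume.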

In this embodiment, the student model module is configured to create student models based on the student basic information data set, and to use the student models to simulate real students participating in the training of the group recommendation model in the pre-training module. The student model is implemented with an Ebbinghaus memory model and describes the following characteristics:

(1) Describing the current mastery state of the virtual student for each knowledge point. In this embodiment the state is given by a formula in which $P_i$ denotes the probability that the student has mastered the i-th knowledge point; the formula depends on the mastery probability of the prerequisite knowledge point of the i-th knowledge point, on the difficulty coefficient $\theta$, on $D$, the time elapsed since the knowledge point was last learned (depending on the specific circumstances of the student and the knowledge point), and on $S$, the total number of times the knowledge point has been learned;

(2) Describing how a virtual student transitions from one state to another through learning, which in this embodiment is done by changing $D$ and $S$ in the above formula;

(3) Describing the classroom feedback after learning. In this embodiment a random number between 0 and 1 is sampled; if it is less than $P_i$ in the above formula, the virtual student is considered able to answer a question on that knowledge point correctly, and otherwise not.
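The three parts of the student model can be sketched in a few lines of Python. The mastery formula itself is not reproduced in this text, so the exponential forgetting-curve form used in `mastery_probability` is an assumption chosen only to have the stated inputs (prerequisite mastery, $\theta$, $D$, $S$); the function names and the choice to reset $D$ to zero on learning are likewise hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class KnowledgeState:
    theta: float      # difficulty coefficient
    D: float = 0.0    # time elapsed since the knowledge point was last learned
    S: int = 0        # total number of times the knowledge point has been learned


def mastery_probability(state: KnowledgeState, p_prerequisite: float = 1.0) -> float:
    """ASSUMED form P_i = P_pre * exp(-theta * D / S); not the embodiment's formula."""
    if state.S == 0:
        return 0.0  # assumption: a knowledge point never learned is not mastered
    return p_prerequisite * math.exp(-state.theta * state.D / state.S)


def learn(state: KnowledgeState) -> None:
    """State transition through learning: changes D and S (reset of D is an assumption)."""
    state.D = 0.0
    state.S += 1


def elapse(state: KnowledgeState, dt: float) -> None:
    """Time passes without this knowledge point being taught again."""
    state.D += dt


def answer_correctly(state: KnowledgeState, p_prerequisite: float = 1.0, rng=random) -> bool:
    """Classroom feedback: sample u in [0, 1) and answer correctly if u < P_i."""
    return rng.random() < mastery_probability(state, p_prerequisite)
```

Only `answer_correctly` follows the text directly (sample a number between 0 and 1 and compare it with $P_i$); a half-life or Bayesian knowledge tracing model could be substituted behind the same three functions.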

In this embodiment, the pre-training module is configured to train, before class, a group recommendation model on the basis of the student model module using the proximal policy optimization algorithm, and to provide the trained model to the group teaching recommendation module; the flow is shown in FIG. 3. The group recommendation model consists of a recommendation neural network and a critic neural network. The recommendation neural network is a recurrent neural network: because student feedback data is a time-ordered sequence, the network must be able to process sequence input, and this embodiment uses a long short-term memory (LSTM) recurrent neural network whose output is the knowledge point recommended for teaching. The critic neural network has a structure similar to that of the recommendation neural network: both consist of an input layer, a hidden layer and an output layer. The input layer receives the student classroom feedback sequence. The hidden layer may comprise one or more layers, and the number of hidden layers in the two networks may be the same or different. The output layer is the main difference between the two networks. The output layer of the recommendation neural network performs classification using a softmax function, and its output represents the recommendation degree of each knowledge point of the current course for the next class (when forming the recommendation result, the knowledge point with the maximum recommendation degree is taken as the result). The output layer of the critic neural network uses a linear function, and its output represents the score of the action at each sampling time, i.e. the evaluation (score) of the current classroom teaching. The group recommendation model is trained with the proximal policy optimization (PPO) algorithm, and the specific training procedure is as follows:

(1) The student models created by the student model module serve as virtual students and form a class that participates in training, for example a class of 20 students;

(2) Setting course requirement information;

(3) Initializing the group recommendation model, i.e. initializing the network parameters of the recommendation neural network and the critic neural network;

(4) Taking the whole class of virtual students as the environment and the recommendation neural network and critic neural network as the agent, and training the agent with the proximal policy optimization algorithm;

(5) After training of the recommendation neural network and the critic neural network is complete, the current network parameters of both networks are saved and provided to the group teaching recommendation module.
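The difference between the two output layers can be illustrated without any deep learning library. The sketch below assumes the hidden layers have already produced a vector of logits (for the recommendation network) or hidden activations (for the critic network); the function names and the plain-list representation are hypothetical, and the LSTM hidden layers themselves are omitted.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: recommendation degrees that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recommend(logits: Sequence[float], kp_ids: Sequence[str]) -> Tuple[str, List[float]]:
    """Recommendation output: the knowledge point with the maximum recommendation degree."""
    degrees = softmax(logits)
    best = max(range(len(degrees)), key=lambda i: degrees[i])
    return kp_ids[best], degrees


def linear_value(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Critic output: a single linear unit giving the evaluation value."""
    return sum(h * w for h, w in zip(hidden, weights)) + bias
```

For example, `recommend([0.1, 2.0, -1.0], ["kp1", "kp2", "kp3"])` selects `"kp2"`, the entry with the largest logit and therefore the largest recommendation degree.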

As a possible implementation, in the training process of this embodiment: in (2), the course requirement information specifies 80 lessons, a pass rate of 0.8 and an excellent rate of 0.2 to be achieved by the end of the course, and as high an average score as possible; in (3), the initialized group recommendation model uses two-layer neural networks with 64 hidden-layer neurons; and the flow of the proximal policy optimization algorithm in (4) is as follows:

(1) Recording the initial state of the virtual student;

(2) The following steps are repeated a specified number of times:

(2-a) The following steps are repeated a specified number of times:

I. resetting the virtual student status to the initial status saved in (1);

II. The following steps are repeated until the number of learning sessions reaches the set number. In each cycle, the test results returned by the students, the knowledge point output by the recommendation neural network and the evaluation value output by the critic neural network are recorded, and the reward value obtained for the most recently learned knowledge point is calculated from the classroom feedback of all students according to the course requirement information. The reward value is given by:

$$\mathrm{Reward} = \lambda_1 R_p + \lambda_2 R_e + \lambda_3 R_a$$

where $R_p$ denotes the pass rate, $R_e$ denotes the excellent rate, $R_a$ denotes the average mastery probability of all students over the knowledge points, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the respective weights. Each weight is greater than or equal to 0; the specific values are empirical and are not limited by the invention. In this embodiment they are set to 5, 3 and 1, respectively.

1) All virtual students take part in the classroom test and return their test results;

2) The classroom feedback is fed into the recommendation neural network, which outputs the knowledge point recommended for the next lesson;

3) The classroom feedback is fed into the critic neural network, which outputs an evaluation value;

4) All virtual students learn the knowledge point output by the recommendation neural network;
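The weighted reward used in step II can be written as a short function. This is a sketch under stated assumptions: each student is summarized by one score in [0, 1] (for instance an average mastery probability), and the pass and excellent thresholds of 0.6 and 0.85 are hypothetical values, since the text does not say how a pass or an excellent result is decided. Only the weights 5, 3 and 1 come from the embodiment.

```python
from typing import Sequence, Tuple


def reward(
    scores: Sequence[float],
    pass_threshold: float = 0.6,        # hypothetical threshold, not from the embodiment
    excellent_threshold: float = 0.85,  # hypothetical threshold, not from the embodiment
    lambdas: Tuple[float, float, float] = (5.0, 3.0, 1.0),  # weights used in the embodiment
) -> float:
    """Reward = l1 * R_p + l2 * R_e + l3 * R_a over the whole class."""
    n = len(scores)
    if n == 0:
        raise ValueError("the class must contain at least one student")
    r_p = sum(s >= pass_threshold for s in scores) / n       # pass rate
    r_e = sum(s >= excellent_threshold for s in scores) / n  # excellent rate
    r_a = sum(scores) / n                                    # class average
    l1, l2, l3 = lambdas
    return l1 * r_p + l2 * r_e + l3 * r_a
```

With scores `[0.9, 0.7, 0.5, 0.3]` and the default thresholds, the pass rate is 0.5, the excellent rate is 0.25 and the average is 0.6, giving a reward of about 3.85.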

(2-b) cycling through the following operations a specified number of times:

I. sampling from the data collected in (2-a).

II. Calculating an objective function using the sampled data, and training the recommendation neural network with a stochastic gradient ascent algorithm. In the objective function, $\theta_k$ denotes the parameters of the recommendation neural network at the k-th training iteration; $D_k$ denotes the sampled data set; $\tau$ denotes the data sampled along one teaching path, i.e. one complete teaching trajectory (for example, if the course has 40 lessons, the 40 lessons of one complete run of teaching form one group of data, and $D_k$ consists of several such groups $\tau$); $T$ is the duration of the course; and $\pi_\theta(a_t \mid s_t)$ denotes the probability that, with parameters $\theta$, the network outputs $a_t$ at time $t$ when the input classroom feedback is $s_t$. The clip function is shown in FIG. 5; its inputs are $r_t(\theta)$ and the boundary value $\epsilon$:

$$\mathrm{clip}(r_t(\theta), \epsilon) = \begin{cases} 1-\epsilon, & r_t(\theta) \le 1-\epsilon \\ r_t(\theta), & 1-\epsilon < r_t(\theta) < 1+\epsilon \\ 1+\epsilon, & r_t(\theta) \ge 1+\epsilon \end{cases}$$

In this embodiment, the boundary value $\epsilon$ is 0.1.

The advantage value of the action at time $t$ is computed from the intermediate parameters $\xi_t$, which are given by:

$$\xi_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$

where $\xi_t$ denotes the intermediate parameter at time $t$, $\gamma$ denotes the discount factor (0.99 in this embodiment), $T$ denotes the total time so far, $r_t$ denotes the reward value obtained at time $t$, and $V(s_t)$ denotes the evaluation value given by the critic neural network at time $t$;

III. Calculating an objective function using the sampled data, and training the critic neural network with a stochastic gradient ascent algorithm. The objective function involves the parameters of the critic neural network at the k-th training iteration and the output (evaluation value) of the critic neural network at time $t$ under the current network parameters.
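The pieces of the update that are stated explicitly, namely the clip function with boundary value 0.1 and the intermediate parameter $\xi_t$ with discount factor 0.99, can be sketched as follows. The function `clipped_surrogate` additionally shows the per-step term of the commonly used PPO-clip objective; that the embodiment's objective has exactly this form is an assumption, and all function names are hypothetical.

```python
from typing import List, Sequence


def clip(ratio: float, eps: float = 0.1) -> float:
    """clip(): bound the ratio r_t(theta) to the interval [1 - eps, 1 + eps]."""
    if ratio <= 1.0 - eps:
        return 1.0 - eps
    if ratio >= 1.0 + eps:
        return 1.0 + eps
    return ratio


def td_residuals(rewards: Sequence[float], values: Sequence[float], gamma: float = 0.99) -> List[float]:
    """xi_t = r_t + gamma * V(s_{t+1}) - V(s_t); values needs one more entry than rewards."""
    if len(values) != len(rewards) + 1:
        raise ValueError("values must contain V(s_0) .. V(s_T), one more than rewards")
    return [rewards[t] + gamma * values[t + 1] - values[t] for t in range(len(rewards))]


def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.1) -> float:
    """ASSUMED standard PPO-clip term: min(r * A, clip(r) * A)."""
    return min(ratio * advantage, clip(ratio, eps) * advantage)
```

Taking the minimum means a large ratio cannot increase the objective beyond the clipped value when the advantage is positive, while a harmful step (negative advantage) is still penalized in full.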

In this embodiment, the group teaching recommendation module is configured to receive and store the classroom feedback data of students in class; to give a recommended knowledge point for teaching based on the classroom feedback; and to further train the group recommendation model with the classroom feedback data after class. The recommendation and training process is shown in FIG. 4:

(1) Initializing the group recommendation model with the parameters saved by the pre-training module after training;

(2) The teacher starts teaching; the students give classroom feedback, which is input into the recommendation neural network and the critic neural network; the recommendation neural network outputs the recommended teaching knowledge point, the critic neural network outputs the evaluation value, and all data are saved;

(3) Step (2) is executed repeatedly, with the teacher teaching according to the recommended knowledge points. After a certain number of repetitions, the objective function is calculated after class using the data saved so far, and the group recommendation model is trained again;

(4) Step (3) is repeated until the course is finished.
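Steps (1) to (4) amount to a loop that recommends in class and retrains after class once enough records have accumulated. The sketch below is an illustration under assumptions: the two networks are stood in for by plain callables, and the class name `GroupRecommender`, the callback `update_fn` and the record-count trigger `update_period` are hypothetical names, not part of the embodiment.

```python
from typing import Callable, List, Sequence, Tuple

# (feedback sequence, recommended knowledge point, evaluation value)
Record = Tuple[List[float], str, float]


class GroupRecommender:
    def __init__(
        self,
        actor: Callable[[Sequence[float]], Sequence[float]],  # returns recommendation degrees
        critic: Callable[[Sequence[float]], float],           # returns an evaluation value
        kp_ids: Sequence[str],
        update_period: int,                                    # records per model update
        update_fn: Callable[[List[Record]], None],             # retraining hook (hypothetical)
    ) -> None:
        self.actor = actor
        self.critic = critic
        self.kp_ids = list(kp_ids)
        self.update_period = update_period
        self.update_fn = update_fn
        self.history: List[Record] = []

    def recommend(self, feedback: Sequence[float]) -> str:
        """In class: pick the knowledge point with the maximum degree and save the record."""
        degrees = self.actor(feedback)
        value = self.critic(feedback)
        best = max(range(len(degrees)), key=lambda i: degrees[i])
        kp = self.kp_ids[best]
        self.history.append((list(feedback), kp, value))
        return kp

    def end_of_class(self) -> bool:
        """After class: retrain on the saved records once the update period is reached."""
        if len(self.history) >= self.update_period:
            self.update_fn(self.history)
            self.history = []
            return True
        return False
```

Keeping `recommend` free of any training work reflects the stated design goal: the expensive update runs only in `end_of_class`, so the in-class call is a single forward pass through each network.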

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

What has been described above is merely some embodiments of the present invention. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the invention.

Claims

1. A group teaching recommendation system based on deep reinforcement learning, comprising: a user terminal, a knowledge point management module, a student data management module, a student model module, a pre-training module and a group teaching recommendation module;

the pre-training module takes the student model created by the student model module as a learning main body, takes data sent by the knowledge point management module and the student data management module as training data, trains a preset initial group recommendation model, and obtains a trained group recommendation model; the initial group recommendation model comprises a first neural network model and a second neural network model, wherein the first neural network model and the second neural network model comprise an input layer, at least one hidden layer and an output layer, the input layer is a student class feedback data sequence, the hidden layer is a neural network capable of processing sequence input, and the output layer of the first neural network model is used for outputting recommendation degree of each knowledge point of a current course; the output layer of the second neural network model is used for outputting the evaluation value of the current classroom teaching;

the group teaching recommendation module calls the group recommendation model trained by the pre-training module and, during course teaching, combines the student classroom feedback of each class of the course to output teaching recommendation information, which it sends to the corresponding teacher user; saves the student classroom feedback collected by the student data management module; and, during course teaching, updates and retrains the group recommendation model at the configured model update period, based on the student classroom feedback saved in the current period;

the output teaching recommendation information comprises recommended knowledge points of the next class and evaluation values of the current student class feedback data sequence, wherein the recommended knowledge points of the next class are the knowledge points with the maximum recommendation degree.
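The selection rule at the end of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `recommend_next`, the dictionary representation of the first network's output, and the knowledge point IDs are all assumptions made for the example.

```python
from typing import Dict, Tuple


def recommend_next(recommendation: Dict[str, float],
                   evaluation: float) -> Tuple[str, float]:
    """Pick the knowledge point with the maximum recommendation degree.

    `recommendation` stands in for the output of the first neural network
    model (knowledge point ID -> recommendation degree); `evaluation` stands
    in for the output of the second neural network model. Both
    representations are assumptions for this sketch.
    """
    if not recommendation:
        raise ValueError("no knowledge points to recommend")
    best = max(recommendation, key=recommendation.get)
    return best, evaluation


# Hypothetical recommendation degrees for three knowledge points.
degrees = {"KP-01": 0.12, "KP-02": 0.61, "KP-03": 0.27}
next_point, value = recommend_next(degrees, 0.8)
```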

2. The group teaching recommendation system of claim 1, wherein the knowledge point data comprises: the knowledge point ID, the name of the course to which the knowledge point belongs, a brief introduction to the knowledge point, the knowledge point content, the knowledge point difficulty coefficient, the IDs of the prerequisite knowledge points of the knowledge point, the classroom test questions matched to the knowledge point, and knowledge point related data.

3. The group teaching recommendation system of claim 1, wherein the student base data comprises: student number, name, age, sex, age and student type; and the student classroom feedback comprises: the test question name, the ID of the knowledge point to which the test question belongs, the test question content, the IDs of the students taking the test, and the students' test results.
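One possible in-memory layout for the records listed in claims 2 and 3 is sketched below. The class names, field names and types are hypothetical, chosen only to mirror the listed items, and the helper `accuracy_by_knowledge_point` is an illustrative aggregate that the claims do not prescribe.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class KnowledgePoint:
    """Fields follow the items of claim 2; names and types are assumed."""
    kp_id: str
    course_name: str
    intro: str
    content: str
    difficulty: float
    prerequisite_ids: List[str] = field(default_factory=list)
    quiz_ids: List[str] = field(default_factory=list)
    related_data: Dict[str, str] = field(default_factory=dict)


@dataclass
class ClassroomFeedback:
    """One student's result on one test question (items of claim 3)."""
    question_name: str
    kp_id: str
    question_content: str
    student_id: str
    correct: bool


def accuracy_by_knowledge_point(
        records: Iterable[ClassroomFeedback]) -> Dict[str, float]:
    """Fraction of correct answers per knowledge point ID."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for rec in records:
        totals[rec.kp_id] += 1
        hits[rec.kp_id] += int(rec.correct)
    return {kp: hits[kp] / totals[kp] for kp in totals}
```

Keeping one feedback record per student and question makes group-level aggregates simple to derive later.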

4. The group teaching recommendation system according to claim 1, wherein the student model module uses a student model to simulate a real student participating in the group recommendation model training process in the pre-training module, and the student model is constructed as an Ebbinghaus memory model, a half-life memory model or a Bayesian knowledge tracing model;

and the description of the model includes:

classroom feedback after learning is described.
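To make the idea of a virtual student concrete, the following sketch uses an Ebbinghaus-style forgetting curve, R = exp(-elapsed / stability). It is only an illustration under stated assumptions: the class name `EbbinghausStudent`, the multiplicative `gain` applied on each review, and the use of teaching steps as the time unit are not taken from the patent.

```python
import math
import random
from typing import Dict, Optional, Tuple


class EbbinghausStudent:
    """Toy virtual student whose recall decays as exp(-elapsed / stability)."""

    def __init__(self, stability: float = 1.0, gain: float = 1.5,
                 seed: Optional[int] = None) -> None:
        self.initial_stability = stability  # assumed memory strength after first exposure
        self.gain = gain                    # assumed strengthening factor per review
        self.rng = random.Random(seed)
        self.step = 0                       # number of teaching steps so far
        self.memory: Dict[str, Tuple[float, int]] = {}  # kp_id -> (stability, last step)

    def learn(self, kp_id: str) -> None:
        """Teach one knowledge point; reviewing it strengthens the memory."""
        self.step += 1
        if kp_id in self.memory:
            stability = self.memory[kp_id][0] * self.gain
        else:
            stability = self.initial_stability
        self.memory[kp_id] = (stability, self.step)

    def recall_probability(self, kp_id: str) -> float:
        """Probability of answering a question on kp_id correctly right now."""
        if kp_id not in self.memory:
            return 0.0
        stability, last_step = self.memory[kp_id]
        return math.exp(-(self.step - last_step) / stability)

    def answer(self, kp_id: str) -> bool:
        """Sample a classroom test result for kp_id."""
        return self.rng.random() < self.recall_probability(kp_id)
```

A group of such objects with different parameters could stand in for a class during pre-training; a half-life or Bayesian knowledge tracing model would expose the same `learn` and `answer` interface in this sketch.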

5. The group teaching recommendation system of claim 1, wherein the training of the initial group recommendation model by the pre-training module comprises:

6. The group teaching recommendation system of claim 5, wherein the curriculum requirements information comprises: the number of lessons and the pass rate to be achieved at the end of the lessons, the excellent rate and the average score.
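The course requirement information of claim 6 could be turned into a scalar training signal in many ways. The sketch below shows one hypothetical choice: compute the group's pass rate, excellent rate and average score, then penalise the total shortfall against the targets. The marks 60 and 85, the division of the average-score shortfall by 100, and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class CourseRequirements:
    """Targets listed in claim 6; field names are assumed."""
    num_lessons: int
    pass_rate: float
    excellent_rate: float
    average_score: float


def group_metrics(scores: Sequence[float], pass_mark: float = 60.0,
                  excellent_mark: float = 85.0) -> Tuple[float, float, float]:
    """Return (pass rate, excellent rate, average) for a group of scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    passed = sum(1 for s in scores if s >= pass_mark)
    excellent = sum(1 for s in scores if s >= excellent_mark)
    return passed / n, excellent / n, sum(scores) / n


def shortfall_reward(scores: Sequence[float], req: CourseRequirements) -> float:
    """Zero when every target is met, negative in proportion to the shortfall."""
    pass_rate, excellent_rate, average = group_metrics(scores)
    shortfall = (max(0.0, req.pass_rate - pass_rate)
                 + max(0.0, req.excellent_rate - excellent_rate)
                 + max(0.0, req.average_score - average) / 100.0)
    return -shortfall
```

Scaling the average-score term keeps the three components on comparable ranges; a weighted sum would be an equally plausible design.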

7. The group teaching recommendation system of claim 1, wherein training the agent using a proximal policy optimization (PPO) algorithm comprises:

step S1: recording the initial state of the virtual student;

step S202: cyclically executing steps S202-1 to S202-4 until the number of cycles reaches the preset maximum number of sub-cycles; and recording, for each cycle, the classroom feedback of the virtual students, the recommendation degree of each knowledge point output by the first neural network model, the evaluation value output by the second neural network model, and the reward value calculated, according to the course requirement information, from the classroom feedback of all virtual students on the most recently learned knowledge point;

step S202-2: the student classroom feedback given in the step S202-1 is formed into a student classroom feedback data sequence and is input into a first neural network model, and knowledge points of the next teaching are obtained based on the output of the first neural network model, namely, the knowledge point with the maximum recommendation degree is used as the knowledge point of the next teaching;

step S202-4: all virtual students learn knowledge points of the next teaching obtained based on the first neural network model;

step S302: calculating a first objective function based on the sampled data, and adjusting the network parameters of the first neural network model according to a preset stochastic gradient ascent algorithm, wherein the first objective function is used for representing the output loss of the first neural network model;

step S303: calculating a second objective function based on the sampled data, and adjusting the network parameters of the second neural network model according to a preset stochastic gradient ascent algorithm, wherein the second objective function is used for representing the output loss of the second neural network model.
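The sampling loop of step S202 can be sketched as below. This is a toy illustration under stated assumptions: `ToyStudent`, `round_robin_policy`, the constant critic and the mean-feedback reward are hypothetical stand-ins for the virtual students, the first and second neural network models and the course-requirement reward, and only the sub-steps that are spelled out in the claim are labelled.

```python
from typing import Callable, Dict, List

KNOWLEDGE_POINTS = ["A", "B", "C"]  # hypothetical knowledge point IDs


class ToyStudent:
    """Stand-in virtual student: feedback is the count of points learned."""

    def __init__(self) -> None:
        self.learned: List[str] = []

    def feedback(self) -> float:
        return float(len(self.learned))

    def learn(self, kp_id: str) -> None:
        if kp_id not in self.learned:
            self.learned.append(kp_id)


def round_robin_policy(history: List[List[float]]) -> Dict[str, float]:
    """Stand-in for the first network: recommendation degree per point."""
    idx = (len(history) - 1) % len(KNOWLEDGE_POINTS)
    return {kp: (1.0 if i == idx else 0.0)
            for i, kp in enumerate(KNOWLEDGE_POINTS)}


def collect_rollout(students: List[ToyStudent],
                    policy: Callable[[List[List[float]]], Dict[str, float]],
                    critic: Callable[[List[List[float]]], float],
                    reward_fn: Callable[[List[float]], float],
                    max_steps: int) -> List[dict]:
    """Run the teaching loop once and record what step S202 asks for."""
    history: List[List[float]] = []
    records: List[dict] = []
    for _ in range(max_steps):
        feedback = [s.feedback() for s in students]  # S202-1: classroom feedback
        history.append(feedback)
        degrees = policy(history)                    # S202-2: feedback sequence in
        next_kp = max(degrees, key=degrees.get)      # maximum recommendation degree
        records.append({"feedback": feedback, "degrees": degrees,
                        "value": critic(history),
                        "reward": reward_fn(feedback), "action": next_kp})
        for s in students:                           # S202-4: everyone learns it
            s.learn(next_kp)
    return records


records = collect_rollout([ToyStudent(), ToyStudent()], round_robin_policy,
                          critic=lambda h: 0.0,
                          reward_fn=lambda fb: sum(fb) / len(fb),
                          max_steps=3)
```

The recorded tuples are what steps S302 and S303 would consume when updating the two networks.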

8. The group teaching recommendation system of claim 7, wherein the first objective function is:

wherein,

θ_{k+1} represents the network parameters of the first neural network model during the (k+1)-th training;

D_k represents the sampled dataset;

T represents the duration of the course;

τ represents the sampled data under one set of teaching paths;

π_θ(a_t | s_t) represents the probability that, when the network parameters are θ, the output is a_t given that the student classroom feedback input at time t is s_t;

the input parameters of the function clip() include r_t(θ) and the boundary value ε: if r_t(θ) ≤ 1 − ε, clip() = 1 − ε; if r_t(θ) ≥ 1 + ε, clip() = 1 + ε; if 1 − ε < r_t(θ) < 1 + ε, clip() = r_t(θ); wherein,

the advantage value of the behavior at time t is calculated as follows:

ξ_t = r_t + γV(s_{t+1}) − V(s_t);

wherein ξ_t represents an intermediate parameter at time t, γ represents a preset discount factor, r_t represents the reward value obtained at time t, and V(s_t) represents the evaluation value output by the second neural network model at time t;

the second objective function is:

wherein,

representing the network parameters of the second neural network model during the (k+1)-th training;

representing the evaluation value output at time t by the second neural network model under the current network parameters.
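The symbols defined in claim 8 match the clipped surrogate commonly used in proximal policy optimization. The sketch below assumes that standard form: the probability ratio is taken as π_θ(a_t | s_t) divided by the same probability under the previous parameters, and the one-step residual ξ_t is used directly as the advantage estimate. Both choices, and all function names, are assumptions for illustration rather than the formulas of the claim.

```python
import math
from typing import List, Sequence


def clip(ratio: float, eps: float) -> float:
    """Clamp the probability ratio to the interval [1 - eps, 1 + eps]."""
    return max(1.0 - eps, min(ratio, 1.0 + eps))


def td_residuals(rewards: Sequence[float], values: Sequence[float],
                 gamma: float) -> List[float]:
    """xi_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` must hold one more entry than `rewards` (the value of the
    state reached after the last step).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    return [r + gamma * values[t + 1] - values[t]
            for t, r in enumerate(rewards)]


def clipped_surrogate(new_logp: Sequence[float], old_logp: Sequence[float],
                      advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean over time steps of min(ratio * A, clip(ratio) * A)."""
    if not advantages:
        raise ValueError("advantages must not be empty")
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # assumed ratio: new policy / old policy
        total += min(ratio * adv, clip(ratio, eps) * adv)
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes any incentive to move the ratio beyond the boundary when that would increase the objective, which is the usual motivation for the clip() function described above.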

9. The group teaching recommendation system of claim 1, wherein the recommendation and training process of the group teaching recommendation module is:

10. The group teaching recommendation system of claim 1, wherein the hidden layers of the first and second neural network models are long short-term memory (LSTM) recurrent neural networks.