CN108614865B - Personalized learning recommendation method based on deep reinforcement learning - Google Patents
- Publication number
- CN108614865B (application CN201810307140.1A)
- Authority
- CN
- China
- Prior art keywords
- user
- learning
- knowledge points
- nodes
- topic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
Abstract
The invention discloses a personalized learning recommendation method based on deep reinforcement learning, comprising the following steps: define difficulty attributes for knowledge points and topics, and construct a knowledge-point network graph from the relations between knowledge points; determine the relations between the topics under those knowledge points and construct a topic network graph; from the user behavior data, obtain the sub-graph of the topic network graph corresponding to a specified user's current state, which serves as the user's learning boundary; then model the problem with a deep reinforcement learning algorithm over the user's history, and train a policy for selecting a cut set from the sub-graph of the user's current state. The method can intelligently recommend the most suitable topics to the user, saving learning time, raising learning efficiency, and improving the learning experience.
Description
Technical Field
The invention relates to the field of personalized learning recommendation research, in particular to a personalized learning recommendation method based on deep reinforcement learning.
Background
With the release of more and more internet education platforms, online learning resources have been greatly enriched: users can study anytime and anywhere, and can take tests at any time, a convenience whose value is self-evident. However, learning outcomes are strongly affected by differences among students in ability, interest, and learning style, and undifferentiated teaching suffers from low learning efficiency and an inability to teach students according to their aptitude. The American scholar Noel Tichy proposed that the ideal state for learning is the "stretch zone", in which the material being studied is appropriately challenging. Mining a user's learning behavior to find the topics in this zone is therefore highly significant for making recommendations that serve the user's learning process. Moreover, with the popularization of internet learning platforms, the resources best matched to a user's cognitive level can be presented quickly, and personalized recommendation amounts to finding the most suitable topics for each student in a vast pool of questions. The spread of these platforms and the growth of their user bases have also accumulated ever more online learning behavior data. How to use this behavior data to recommend materials or topics suited to each user, and thereby improve the learning experience, has become a current research hotspot.
Existing research models a user's behavior data and recommends personalized topics accordingly. Such methods, however, tend to ignore the information contained in user behavior, make poor use of resources, and produce unstable, low-precision recommendations.
Disclosure of Invention
The invention aims to overcome the inability of the prior art to perform truly personalized recommendation, and provides a personalized learning recommendation method based on deep reinforcement learning.
The purpose of the invention is achieved by the following technical scheme. The personalized learning recommendation method based on deep reinforcement learning comprises the following steps:
(1) defining difficulty attributes for knowledge points and topics, and constructing a knowledge-point network graph from the relations between knowledge points;
(2) determining the relations between the topics under the knowledge points from the relations between the knowledge points, and constructing a topic network graph;
(3) obtaining, from the user behavior data, the sub-graph of the topic network graph corresponding to a specified user's current state;
(4) modeling with a deep reinforcement learning algorithm over the user history, and training a policy for selecting a cut set from the sub-graph of the user's current state, i.e., the user's "learning area".
Preferably, in step (1), the difficulty attribute value of a knowledge point is defined by experts or by modeling on user data, and the difficulty attribute of a topic is defined from the difficulty attribute value of the knowledge point it belongs to together with the topic's own difficulty, the latter likewise given by experts or by modeling on user data.
Preferably, in step (1), the knowledge-point network graph takes knowledge points as nodes and their difficulty attribute values as node difficulty values; edges are established according to the relations between knowledge points, with the degree of relation between them used as the edge weight, the relations being supplied by experts or derived from user data.
Preferably, in step (2), the topic network graph takes the topics under each knowledge point as nodes; a topic's difficulty attribute value becomes the node's topic-difficulty value, and the difficulty attribute value of its knowledge point becomes the node's knowledge-point-difficulty value. Edges are established between topics under connected knowledge points and between topics under the same knowledge point, with the degree of relation between topics used as the edge weight.
Preferably, in step (3), the sub-graph of the user's current state is constructed as follows: using the user behavior data, find the forward or backward nodes of the answered topic nodes in the topic network graph; the found nodes, their edges, and the edge weights together form the sub-graph of the user's current state.
Preferably, in step (4), a deep reinforcement learning model is constructed in which the user's historical answer records form the state, topic-selection strategies based on the difficulty attributes of the nodes in the user's current-state sub-graph form the action set, and the return value is determined by the number of correct answers. Deep reinforcement learning training is performed over a certain number of answering rounds, and a cut set is selected from the sub-graph of the user's current state; this cut set contains the "learning area" topics used for personalized recommendation.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method models the user's learning behavior and applies a deep reinforcement learning algorithm to it, obtaining the user's learning area, so that the topics finally recommended match the user's level; the user can then answer more accurately and learn more efficiently.
2. Based on a complex network graph, the invention locates the topics associated with the user's historical behavior in the topic network graph, making full use of that history to mine the effective information in the user's behavior.
3. When constructing the deep reinforcement learning model, the user's behavior sequence is used for modeling: training proceeds over a certain number of answers, the user's latest answer record is taken as the state after each answer, and the state is updated after every answer, so that the chosen state effectively reflects the user's individuality.
4. The method can intelligently select the user's learning area: a deep reinforcement learning algorithm learns a policy for personalized topic recommendation, so that topics within the learning area are recommended intelligently and the user experience improves.
Drawings
Fig. 1 is a schematic diagram of the principle of the method of this embodiment: (a) the knowledge-point network graph structure; (b) the topic network structure under the same knowledge point; (c) the topic network structure under associated knowledge points; (d) the structure of selected user behavior data in the topic network graph; (e) the forward and backward nodes found for the answered topic nodes in the topic network graph; (f) the structure of the sub-graph of the user's current state; (g) the obtained "learning area" topics.
FIG. 2 is a process diagram of the deep reinforcement learning training of the present invention.
Fig. 3 shows the relationships between data, operations, and other elements in the implementation of the method of this embodiment.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
This embodiment provides a personalized learning recommendation method based on deep reinforcement learning. Complex network graphs are used to represent the relations between knowledge points (the knowledge-point network graph) and between topics (the topic network graph). From the user behavior data, the sub-graph of the topic network graph corresponding to the user's current state is obtained, which converts the problem of finding the learning area into the problem of finding a cut set in that sub-graph. A deep reinforcement learning algorithm then models the user behavior data and trains a policy for selecting the cut set from the current-state sub-graph, realizing personalized learning recommendation for the user. The steps are described below with reference to the drawings.
First, define the difficulty attributes of knowledge points and topics, and construct the knowledge-point network graph from the relations between knowledge points.
In practice, the difficulty attributes of knowledge points and topics can be preset by an experienced teacher based on teaching experience, or generated from users' historical data; a topic's difficulty attribute is defined by combining the difficulty attribute value of its knowledge point with the topic's own difficulty, both relying on experts or on user-data modeling.
In the constructed knowledge-point network graph, knowledge points are the nodes and their difficulty attribute values are the node difficulty values; edges are established according to the relations between knowledge points, with the degree of relation as the edge weight. The structure of the constructed knowledge-point network graph is shown in fig. 1(a).
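As a concrete illustration, the knowledge-point network graph described above can be sketched as a weighted adjacency structure. This is a minimal sketch using plain Python dictionaries; the knowledge-point names, difficulty values, and relation degrees below are invented for illustration and are not taken from the patent.

```python
# Minimal sketch of the knowledge-point network graph: knowledge points are
# nodes carrying a difficulty attribute value, and the relation degree
# between two knowledge points becomes the weight of their edge.
# All concrete names and numbers below are hypothetical.

class KnowledgePointGraph:
    def __init__(self):
        self.difficulty = {}   # knowledge point -> difficulty attribute value
        self.edges = {}        # knowledge point -> {related point: relation degree}

    def add_point(self, name, difficulty):
        self.difficulty[name] = difficulty
        self.edges.setdefault(name, {})

    def relate(self, a, b, degree):
        # the relation degree is stored as the edge weight, in both directions
        self.edges[a][b] = degree
        self.edges[b][a] = degree

kpg = KnowledgePointGraph()
kpg.add_point("fractions", 0.3)
kpg.add_point("ratios", 0.5)
kpg.relate("fractions", "ratios", 0.8)
```

Whether the difficulty values and relation degrees come from experts or from user-data modeling, as the patent allows, only changes how `add_point` and `relate` are fed, not the graph structure.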
Second, determine the relations between the topics under the knowledge points from the relations between the knowledge points, and construct the topic network graph.
In this embodiment, the topic network graph takes the topics under each knowledge point as nodes; a topic's difficulty attribute value is the node's topic-difficulty value, and the difficulty attribute value of its knowledge point is the node's knowledge-point-difficulty value. Edges are established between topics under connected knowledge points and between topics under the same knowledge point, with the degree of relation between topics as the edge weight. The resulting structures are shown in fig. 1(b) and fig. 1(c), where fig. 1(b) shows the topic network structure under the same knowledge point and fig. 1(c) shows the topic network structure under associated knowledge points.
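Deriving the topic network graph from the knowledge-point graph can be sketched as follows. The knowledge points, topics, difficulty values, and the relation degree of 1.0 for topics under the same knowledge point are all assumptions invented for illustration; the patent leaves the actual relation degrees to experts or user data.

```python
# Hypothetical derivation of the topic network graph. Topics under the same
# knowledge point are connected directly; topics under connected knowledge
# points inherit the knowledge-point relation degree as their edge weight.

kp_difficulty = {"fractions": 0.3, "ratios": 0.5}
kp_edges = {("fractions", "ratios"): 0.8}                  # relation degrees
topic_kp = {"t1": "fractions", "t2": "fractions", "t3": "ratios"}
topic_difficulty = {"t1": 0.2, "t2": 0.4, "t3": 0.6}

# each node carries its own difficulty and its knowledge point's difficulty
nodes = {t: {"topic_difficulty": topic_difficulty[t],
             "kp_difficulty": kp_difficulty[kp]}
         for t, kp in topic_kp.items()}

edges = {}
for a in topic_kp:
    for b in topic_kp:
        if a >= b:                                         # visit each pair once
            continue
        if topic_kp[a] == topic_kp[b]:
            edges[(a, b)] = 1.0                            # same knowledge point
        elif (topic_kp[a], topic_kp[b]) in kp_edges:
            edges[(a, b)] = kp_edges[(topic_kp[a], topic_kp[b])]
```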
Third, obtain the sub-graph of the user's current state in the topic network graph from the user behavior data.
(1) First, obtain the user behavior data from the user behavior library and select the most recent answer records, i.e., the behavior data of the user's current state; see the structure within the topic network graph in fig. 1(d);
(2) Then, from the latest answer records, find the forward and backward nodes of each answered topic node in the topic network graph: if a historical topic was answered correctly, find the backward nodes of that topic node; if it was answered incorrectly, find the forward nodes. The structure is shown in fig. 1(e);
(3) The found nodes, their edges, and the edge weights then together form the sub-graph of the user's current state, whose structure is shown in fig. 1(f).
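The three steps above can be sketched in code. This is an illustrative reading that assumes a directed topic graph in which "backward" means successor (harder follow-up material) and "forward" means predecessor (prerequisite); the toy graph and the answer records are invented.

```python
# Sketch of sub-graph construction from the latest answer records:
# a correctly answered topic contributes its backward (successor) nodes,
# an incorrectly answered topic contributes its forward (predecessor) nodes.
# The directed topic graph below is hypothetical.

successors   = {"t1": {"t2": 0.9}, "t2": {"t3": 0.7}, "t3": {}}
predecessors = {"t1": {}, "t2": {"t1": 0.9}, "t3": {"t2": 0.7}}

def current_state_subgraph(answer_records):
    """answer_records: list of (topic, answered_correctly) pairs."""
    sub = {}
    for topic, correct in answer_records:
        frontier = successors[topic] if correct else predecessors[topic]
        for neighbor, weight in frontier.items():
            sub[(topic, neighbor)] = weight   # found node, edge, edge weight
    return sub

sub = current_state_subgraph([("t1", True), ("t3", False)])
# t1 was answered correctly, so its successor t2 joins the sub-graph;
# t3 was answered incorrectly, so its predecessor t2 joins as well
```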
Fourth, use a deep reinforcement learning algorithm combined with the user's history to train a policy for selecting a cut set from the sub-graph of the user's current state.
Referring to fig. 2, the learning process using the deep reinforcement learning algorithm is as follows:
(1) First, construct the initial deep reinforcement learning model and train it over a certain number of user answers. During training, the user's historical answer records serve as the model's state, topic-selection strategies over the difficulty attributes of the nodes in the user's current-state sub-graph serve as the action set, and the return value is determined by the number of correct answers;
(2) The model then feeds back the learning-area topics; after the user answers, the return value of the strategy, the new answer record, the user's new current-state sub-graph, and the original answer records are continuously fed into the deep reinforcement learning model for further training;
(3) Finally, training yields a policy for selecting a cut set from the sub-graph of the user's current state, providing personalized learning recommendation; the resulting learning-area topics are shown in fig. 1(g).
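To give a rough feel for the training loop in steps (1)-(3), here is a deliberately simplified stand-in: tabular Q-learning replaces the deep network, the state is collapsed to the outcome of the user's last answer, and the action set is reduced to three difficulty buckets. The user simulator, reward values, and hyperparameters are all invented for illustration and do not come from the patent.

```python
import random

# Simplified stand-in for the training loop of fig. 2: tabular Q-learning
# instead of a deep network. State = outcome of the last answer; actions =
# difficulty buckets for the next recommended topic; return value = answer
# correctness, as in the patent's reward definition.

random.seed(0)
ACTIONS = ["easier", "same", "harder"]
Q = {(s, a): 0.0 for s in ("right", "wrong") for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.1       # invented hyperparameters

def simulate_answer(state, action):
    # toy user model: an easier topic after a wrong answer, or a harder
    # topic after a right answer, is "appropriately challenging"
    good = (state == "wrong" and action == "easier") or \
           (state == "right" and action == "harder")
    correct = random.random() < (0.8 if good else 0.4)
    reward = 1.0 if correct else -1.0   # return value from answer correctness
    return ("right" if correct else "wrong"), reward

state = "right"
for _ in range(20000):                  # "a certain number of answers"
    if random.random() < eps:           # epsilon-greedy exploration
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    nxt, r = simulate_answer(state, action)
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
    state = nxt
```

After training, the greedy policy recommends harder topics after a correct answer and easier ones after a wrong answer, which is the "stretch zone" behavior the patent aims for; the real method would learn this over the cut-set actions of the current-state sub-graph with a neural network instead of a table.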
Referring to fig. 3, during the method's operation the user continuously produces new answer records, which are continuously fed into the deep reinforcement learning model for training; according to the training result, new learning-area topics are screened from the user's current-state sub-graph, the user keeps answering, and through this loop the optimal topic-selection policy is obtained, realizing personalized learning recommendation.
The method is based on a deep reinforcement learning neural network; with sufficient training it can adapt to the behavior of most users, modeling user behavior and using deep reinforcement learning to learn a topic-setting strategy from it, so that personalized learning recommendation is realized for each user and personalized topic assignment is achieved in practice.
The above embodiments are preferred embodiments of the present invention, but the invention is not limited to them; any change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the invention shall be regarded as an equivalent replacement and is included within the scope of protection of the invention.
Claims (3)
1. The personalized learning recommendation method based on the deep reinforcement learning is characterized by comprising the following steps:
(1) defining difficulty attributes for knowledge points and topics, and constructing a knowledge-point network graph from the relations between knowledge points;
(2) determining the relations between the topics under the knowledge points from the relations between the knowledge points, and constructing a topic network graph;
the topic network graph takes the topics under each knowledge point as nodes, a topic's difficulty attribute value as the node's topic-difficulty value, and the difficulty attribute value of the topic's knowledge point as the node's knowledge-point-difficulty value; edges are established between topics under connected knowledge points and between topics under the same knowledge point, with the degree of relation between topics as the edge weight;
(3) obtaining, from the user behavior data, the sub-graph of the topic network graph corresponding to a specified user's current state, the sub-graph comprising the nodes answered correctly or incorrectly within a specified period together with their neighbor nodes;
in step (3), the sub-graph of the user's current state is constructed as follows: according to the user behavior data, the forward or backward nodes of the answered topic nodes are found in the topic network graph, and the found nodes, their edges, and the edge weights form the sub-graph of the user's current state;
(4) modeling with a deep reinforcement learning algorithm over the user history, training a policy for selecting a cut set from the sub-graph of the user's current state, and thereby determining the topic-selection strategy and selecting topics;
in step (4), a deep reinforcement learning model is constructed in which the user's historical answer records form the state, topic-selection strategies based on the difficulty attributes of the nodes in the user's current-state sub-graph form the action set, and the return value is determined by the number of correct answers; deep reinforcement learning training is performed over a certain number of answers, and a cut set is selected from the sub-graph of the user's current state.
2. The personalized learning recommendation method based on deep reinforcement learning of claim 1, wherein in step (1) the difficulty attribute value of a knowledge point is defined by experts or by modeling on user data, and the difficulty attribute of a topic is defined from the difficulty attribute value of its knowledge point together with the topic's own difficulty, likewise by experts or by modeling on user data.
3. The personalized learning recommendation method based on deep reinforcement learning of claim 1, wherein in step (1) the knowledge-point network graph takes knowledge points as nodes and their difficulty attribute values as node difficulty values, establishes edges according to the relations between knowledge points, uses the degree of relation between knowledge points as the edge weight, and models the relations by relying on experts or user data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810307140.1A CN108614865B (en) | 2018-04-08 | 2018-04-08 | Personalized learning recommendation method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108614865A CN108614865A (en) | 2018-10-02 |
CN108614865B true CN108614865B (en) | 2020-12-11 |
Family
ID=63659587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810307140.1A Active CN108614865B (en) | 2018-04-08 | 2018-04-08 | Personalized learning recommendation method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108614865B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109255994A (en) * | 2018-10-26 | 2019-01-22 | 北京智能优学科技有限公司 | A kind of foreign language teaching adaptive learning method and computer readable storage medium |
CN109543840B (en) * | 2018-11-09 | 2023-01-10 | 北京理工大学 | Dynamic recommendation system design method based on multidimensional classification reinforcement learning |
CN109859554A (en) * | 2019-03-29 | 2019-06-07 | 上海乂学教育科技有限公司 | Adaptive english vocabulary learning classification pushes away topic device and computer learning system |
CN110009956A (en) * | 2019-04-22 | 2019-07-12 | 上海乂学教育科技有限公司 | English Grammar adaptive learning method and learning device |
CN110223553B (en) * | 2019-05-20 | 2021-08-10 | 北京师范大学 | Method and system for predicting answer information |
CN110399541B (en) * | 2019-05-31 | 2021-03-23 | 平安国际智慧城市科技股份有限公司 | Topic recommendation method and device based on deep learning and storage medium |
CN110288878B (en) * | 2019-07-01 | 2021-10-08 | 科大讯飞股份有限公司 | Self-adaptive learning method and device |
CN110675295A (en) * | 2019-09-29 | 2020-01-10 | 联想(北京)有限公司 | Processing method and device and electronic equipment |
CN111061694A (en) * | 2019-11-26 | 2020-04-24 | 上海乂学教育科技有限公司 | Student test question sharing system |
CN111428020A (en) * | 2020-04-09 | 2020-07-17 | 圆梦共享教育科技(深圳)有限公司 | Personalized learning test question recommendation method based on artificial intelligence |
CN114595923B (en) * | 2022-01-11 | 2023-04-28 | 电子科技大学 | Group teaching recommendation system based on deep reinforcement learning |
- 2018-04-08: CN application CN201810307140.1A filed; granted as patent CN108614865B (status: active)
Non-Patent Citations (1)
Title |
---|
Research and Application of Bayesian Networks in Knowledge Maps (贝叶斯网络在知识地图中的研究与应用); Liu Jipeng; China Master's Theses Full-Text Database; 2017-02-15; cited document 1, pp. 10-81 * |
Also Published As
Publication number | Publication date |
---|---|
CN108614865A (en) | 2018-10-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108614865B (en) | Personalized learning recommendation method based on deep reinforcement learning | |
CN110955834A (en) | Knowledge graph driven personalized accurate recommendation method | |
CN108959331B (en) | Method, apparatus and computer program for using a device learning framework | |
CN105183848A (en) | Human-computer chatting method and device based on artificial intelligence | |
CN109947915B (en) | Knowledge management system-based artificial intelligence expert system and construction method thereof | |
CN110473438B (en) | Word auxiliary learning system and method based on quantitative analysis | |
CN107169043A (en) | A kind of knowledge point extraction method and system based on model answer | |
CN115114421A (en) | Question-answer model training method | |
CN111143539A (en) | Knowledge graph-based question-answering method in teaching field | |
CN114372155A (en) | Personalized learning platform based on self-expansion knowledge base and multi-mode portrait | |
CN110134871A (en) | A kind of dynamic course recommended method based on course and learner's network structure | |
CN114201684A (en) | Knowledge graph-based adaptive learning resource recommendation method and system | |
CN113239209A (en) | Knowledge graph personalized learning path recommendation method based on RankNet-transformer | |
CN109300069A (en) | Acquisition methods, device and the electronic equipment of user's learning path model | |
CN111311997B (en) | Interaction method based on network education resources | |
Forsman et al. | Sandbox university: Estimating influence of institutional action | |
CN116228361A (en) | Course recommendation method, device, equipment and storage medium based on feature matching | |
CN112906293B (en) | Machine teaching method and system based on review mechanism | |
CN114611696A (en) | Model distillation method, device, electronic equipment and readable storage medium | |
CN112734608A (en) | Method and system for expanding concept of admiration course | |
Newell et al. | Models for an intelligent context-aware blended m-learning system | |
CN111242518A (en) | Learning process configuration method suitable for intelligent adaptation system | |
Pu et al. | Teaching Path generation model based on machine learning | |
Eagle et al. | Interaction Network Estimation: Predicting Problem-Solving Diversity in Interactive Environments. | |
CN109447865A (en) | A kind of learning Content recommended method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||