CN115688863A - Depth knowledge tracking method based on residual connection and student near-condition feature fusion


Info

Publication number
CN115688863A
Authority
CN
China
Prior art keywords
data
student
knowledge
prediction
list
Prior art date
Legal status
Pending
Application number
CN202211354947.3A
Other languages
Chinese (zh)
Inventor
张姝
韩晓瑜
李子杰
周菊香
欧阳昭相
Current Assignee
Yunnan Normal University
Original Assignee
Yunnan Normal University
Priority date
2022-11-01
Filing date
2022-11-01
Publication date
2023-02-03
Application filed by Yunnan Normal University
Priority to CN202211354947.3A
Publication of CN115688863A


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a deep knowledge tracking method based on residual connection and student near-condition feature fusion, and belongs to the technical field of knowledge tracking. The invention improves on the Deep Knowledge Tracking (DKT) model, which applies a recurrent neural network (RNN) to learners' time-ordered records of which knowledge points were answered and whether each answer was correct, in order to predict students' future answers. The method adds residual connections to the DKT model to alleviate the degradation of network information, and reinforces each student's recent learning condition using the student's recent learning data. This effectively improves the accuracy of model prediction and provides a feasible scheme for the development of the knowledge tracking field.

Description

Depth knowledge tracking method based on residual connection and student near-condition feature fusion
Technical Field
The invention relates to a deep knowledge tracking method based on residual connection and student near-condition feature fusion, and belongs to the technical field of knowledge tracking.
Background
Knowledge tracking is one of the important applications of artificial intelligence in education. Its main task is to build a model that predicts a student's future answering performance from the student's past learning data. In recent years, Deep Knowledge Tracking (DKT), built on recurrent neural networks (RNNs), has been shown to outperform earlier knowledge tracking methods and traditional approaches. RNNs are remarkably effective at predicting student performance in real time, but observation and experiments show that DKT still leaves considerable room for improvement.
With the advent of the big data era, artificial intelligence is increasingly applied in education: the data students generate while learning are stored in large quantities, and the data processing capability of computers has grown substantially. The development of educational data mining and educational data analysis provides momentum for learning prediction. Deep knowledge tracking builds an RNN-based model on students' past learning data to predict their future answering performance. RNNs are very effective on sequential data and can mine the temporal and semantic information it contains; this capability is used to predict future performance from past learning records. This supports teachers' personalized instruction and, at the same time, improves students' learning efficiency.
Disclosure of Invention
The invention aims to provide a deep knowledge tracking method based on residual connection and student near-condition feature fusion, which addresses the problem in existing knowledge tracking methods that the rationality of the predicted values of other knowledge points is ignored, aggravating errors in the subsequent prediction of those knowledge points.
The technical scheme of the invention is as follows: a deep knowledge tracking method based on residual connection and student near-condition feature fusion. The required data are one-hot encoded and used as the input of a recurrent neural network, and each piece of data fed into the network produces a prediction list. A CUT_STEP value is set; every time a student's data have passed CUT_STEP steps through the recurrent network, the corresponding CUT_STEP original records are multiplied by weights, summed, and fed into a fully connected layer, whose output is added to the hidden layer as supplementary information before computation continues, until an initial prediction matrix for the input data is obtained. Each row of the initial prediction matrix is then concatenated with the sum of the student's recent learning data, thereby reinforcing the student's recent learning condition, and the final prediction matrix is obtained through a linear layer. The required predicted values are read from the prediction matrix, the loss between predicted and true values is computed, and the model is optimized according to the loss until it is approximately optimal.
The method comprises the following specific steps:
Step1: Data preprocessing. Process each piece of original data into one-hot form and combine the pieces into a one-hot matrix. The specific steps are as follows:
Step1.1: Data reading and cleaning: read the required features from the students' question-answering records to form a new data frame data. The required features include the time-sequence id, student id, knowledge point id, and whether the answer was correct.
Step1.2: Count the distinct knowledge points and form a list of knowledge point ids K = {k_1, k_2, ..., k_l}.
Step1.3: Form a dictionary from the knowledge point list K, of the form {knowledge point id: position of the knowledge point in the list formed in Step1.2}, written mathematically as dict = {k: e}, where k ∈ K, K is the knowledge point list formed in Step1.2, and e ∈ {0, 1, 2, ..., l-1}.
Step1.4: Extract each student's knowledge point list and the corresponding correct/incorrect answer list from data to form sequences.
Step1.5: Convert sequences into one-hot form and process the data into a unified shape, i.e., each student's data are divided into blocks of MAX_STEP records, and each one-hot vector corresponds to one question answered by the student.
Step2: and reading the training data in batches from the processed data in Step1.
Step3: and (3) predicting the next question and mistake of the student by using the recurrent neural network, namely, putting the data read by Step2 into the recurrent neural network for operation to generate original prediction data. The formula is as follows:
h_t = tanh(W_x · x_t + b_x + W_h · h_(t-1) + b_h)    (1)
where h_t is the hidden state at time t, x_t is the input at time t, and h_(t-1) is the hidden state at time t-1.
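As a sketch, formula (1) is a single RNN step and can be written directly in PyTorch; the tensor names mirror the symbols above, and the shapes are illustrative:

    import torch

    def rnn_cell(x_t, h_prev, W_x, b_x, W_h, b_h):
        # h_t = tanh(W_x x_t + b_x + W_h h_(t-1) + b_h), formula (1)
        return torch.tanh(x_t @ W_x.T + b_x + h_prev @ W_h.T + b_h)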
Step4: the method comprises the following steps of enhancing the recent learning condition of a student through the recent learning data of the student by using the original prediction data generated by the recurrent neural network, and finally predicting.
If a student performed well on earlier questions but poorly on recent ones, it is very likely that something has gone wrong with the student's recent learning, and this will affect the current answer. The student's recent learning condition is therefore judged from the student's past learning data. Experiments show that concatenating the sum of the student's recent learning data gives the best results.
Step4.1: the original prediction data generated by Step3 is spliced with the sum of recent exercise data of the student to form the data o of hearts t The formula is as follows:
Figure BDA0003920697730000041
in the formula o t For the data obtained by splicing the sum of the original prediction data and the recent exercise data of the student, h t For the original prediction data, x i And (3) performing question making data for the one-hot of the student at the moment i, wherein n is the number of recent question making data of the students needing to be spliced.
Step4.2: data o after splicing Step4.1 t Putting into a full connection layerfc 1 In (1), adjusting the predicted dimension, fc 1 The expression of (a) is:
Figure BDA0003920697730000042
in the formula (I), the compound is shown in the specification,
Figure BDA0003920697730000043
and
Figure BDA0003920697730000044
is a linear layer fc 1 The input and the output of (a) are,
Figure BDA0003920697730000045
in order to be the weight, the weight is,
Figure BDA0003920697730000046
is an offset.
Step4.3: putting the output vector of Step4.2 into an activation function, controlling each output element to be between 0 and 1, and obtaining a final prediction result, wherein the numerical value conversion is expressed as:
Figure BDA0003920697730000051
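A sketch of Step4 under the notation above; the layer sizes and the window n are illustrative assumptions, and the sigmoid matches the stated mapping of each output element into (0, 1):

    import torch
    import torch.nn as nn

    class RecentFusion(nn.Module):
        # Step4: splice h_t with the sum of the n most recent one-hot
        # inputs, then fc_1 and the activation (formulas (2)-(4))
        def __init__(self, hidden_size=10, input_size=246, num_skills=123, n=5):
            super().__init__()
            self.n = n
            self.fc1 = nn.Linear(hidden_size + input_size, num_skills)

        def forward(self, h_t, x_seq, t):
            # sum of recent exercise data x_(t-n+1) .. x_t, formula (2)
            recent = x_seq[:, max(0, t - self.n + 1): t + 1].sum(dim=1)
            o_t = torch.cat([h_t, recent], dim=-1)
            return torch.sigmoid(self.fc1(o_t))   # formulas (3)-(4)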
step5: residual error connection is added on the basis of the recurrent neural network in Step3, so that the prediction accuracy is improved, and the method specifically comprises the following steps:
residual error connection is added on the basis of the Step3 recurrent neural network, so that the prediction accuracy is improved. Because the predicted result in the recurrent neural network excessively pursues the result of the next knowledge point answer and ignores the rationality of the predicted values of other knowledge points, the error of predicting other knowledge points later is aggravated, a CUT _ STEP (1 < CUT _ STEP < MAX _ STEP) value is set for segmenting the MAX _ STEP of a one-hot matrix of a student, CUT _ STEP is a segment value of segment data aggregation, the segmented CUT _ STEP data are added and then transformed and added to the hidden layer, and the next cycle is entered, namely residual connection, and the specific STEPs of realizing the residual connection are as follows:
step5.1: every time each student calculates CUT _ STEP data in a cyclic neural network, the previous CUT _ STEP data are multiplied by weight W and then added, the previous CUT _ STEP data are added according to the importance (weight) of the previous CUT _ STEP data by utilizing the residual error idea to obtain a vector containing the previous CUT _ STEP data information, wherein the weight W = { W _ STEP data information 1 ,W 2 ,...,W C C is shorthand for CUT _ STEP.
Step5.2: the vector containing the previous CUT _ STEP data information generated by Step5.1 is taken as an input and put into the full connection layer fc 2 The result of Step5.1 is converted into a form, fc, that can be directly added to the hidden layer 2 The expression of (a) is:
Figure BDA0003920697730000061
in the formula (I), the compound is shown in the specification,
Figure BDA0003920697730000065
and
Figure BDA0003920697730000066
is a linear layer fc 2 The input and the output of (a) a,
Figure BDA0003920697730000067
in order to be the weight, the weight is,
Figure BDA0003920697730000068
is an offset.
Step5.3: then fc will be 2 Adding the output of the STEP (b) to the hidden layer for continuous operation, then operating the CUT _ STEP data in RNN, adding the CUT _ STEP data, repeating the above STEPs, and changing the hidden layer at time t
Figure BDA0003920697730000062
The expression of (a) is:
Figure BDA0003920697730000063
wherein C is short for CUT _ STEP, s hd For the d-th student one-hot data, W refers to s for participating in the calculation hd The weight assigned to the weight is given to the weight,
Figure BDA0003920697730000064
i.e. { s } d(i-C+1) ,...s di Multiply by W and then sum.
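A sketch of the residual connection of Step5: after every CUT_STEP steps, the last CUT_STEP one-hot inputs are combined with learned weights, passed through fc_2, and added to the hidden state (formulas (5)-(6)). The explicit unrolled loop and the sizes are illustrative assumptions:

    import torch
    import torch.nn as nn

    class ResidualRNN(nn.Module):
        def __init__(self, input_size=246, hidden_size=10, cut_step=10):
            super().__init__()
            self.cell = nn.RNNCell(input_size, hidden_size, nonlinearity='tanh')
            self.fc2 = nn.Linear(input_size, hidden_size)   # maps the sum to hidden size
            self.w = nn.Parameter(torch.ones(cut_step))     # W = {W_1, ..., W_C}
            self.cut_step = cut_step

        def forward(self, x):
            # x: (batch, seq_len, input_size) one-hot data
            batch, seq_len, _ = x.shape
            h = x.new_zeros(batch, self.cell.hidden_size)
            outputs = []
            for t in range(seq_len):
                h = self.cell(x[:, t], h)
                if (t + 1) % self.cut_step == 0:
                    # weighted sum of the last CUT_STEP inputs, then fc_2 (formula (6))
                    seg = x[:, t + 1 - self.cut_step: t + 1]
                    h = h + self.fc2((self.w[None, :, None] * seg).sum(dim=1))
                outputs.append(h)
            return torch.stack(outputs, dim=1), h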
Step6: reading true values of correct and wrong answers of students and predicting the correct and wrong answer prediction values of the students by using a model, and specifically comprising the following steps:
since no explicit label is given in the dataset, the scheme adopted is: the i +1 th data of the student is a label of a prediction list generated by the ith data of the student, and since the prediction list generated by the ith data of the student is used for predicting all knowledge points of the student in the next step, and the i +1 th data of the student is a one-hot code for judging whether the student is correct or not when doing a certain knowledge point, the i +1 th data of the student can find a prediction value from the prediction list generated by the ith data of the student, and the specific steps are as follows:
step6.1: finding the positions of all knowledge points that the student does except the first one (i.e., the zero position) from the input one-hot matrix constitutes a position list.
Step6.2: and finding out a corresponding predicted value from the prediction list according to the found position list.
Step6.3: and finding out a real value list consisting of whether all knowledge points made by the student except the first knowledge point (namely the zero position) are correct or not from the input one-hot matrix.
Step7: calculating a prediction error and optimizing a network to obtain a model which can predict whether the next problem of students is correct or not, and the specific steps are as follows:
step7.1: and calculating a loss value by using a loss function according to the real value and the predicted value read in Step6.
Step7.2: and optimizing the network by utilizing an optimizer according to the loss value.
Step7.3: because the optimization is not optimized once, the size of the Epoch is defined as required, and the optimization is determined, i.e. the steps can be repeated for more than several times.
Step7.4: and obtaining a trained model which can predict whether the next questions of the students are correct or not.
In the prior art, for time-ordered knowledge point answers and the corresponding student learning data recording whether each answer was correct, a recurrent neural network is used to predict whether the student's next answer will be correct. Because every prediction covers all knowledge points while the student only answers one question next, it was found that the prediction over-pursues the result of the next knowledge point answer and ignores the rationality of the predicted values of the other knowledge points, which aggravates errors in the future prediction of those knowledge points. It was also found that if a student performed well on earlier questions but poorly on recent ones, it is very likely that something has gone wrong with the student's recent learning, which will affect the current answer. The invention therefore reinforces the student's recent learning condition using the student's recent learning data and obtains good results. The invention uses residual connections, which effectively alleviates the problem that the prediction over-pursues the next knowledge point answer at the expense of the rationality of the other knowledge points' predicted values.
The invention has the beneficial effects that: the invention improves on the Deep Knowledge Tracking (DKT) model, which applies a recurrent neural network (RNN) to learners' time-ordered records of which knowledge points were answered and whether each answer was correct, in order to predict students' future answers. The method adds residual connections to the DKT model to alleviate the degradation of network information and reinforces each student's recent learning condition using the student's recent learning data, effectively improving the accuracy of model prediction and providing a feasible scheme for the development of the knowledge tracking field.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 is a model schematic of the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
Example 1: as shown in FIG. 2, this example takes the public ASSISTments 2009-2010 dataset as an example; each student's data are processed, and the processed data are then fed into the constructed model to train it. The specific process is: process the data into a form that can be input to the model; read training data in batches from the processed data; make a preliminary prediction of whether the student answers the next exercise correctly using the recurrent neural network; reinforce the student's recent learning condition with the student's recent learning data and make the final prediction; add residual connections on the basis of the recurrent neural network to improve prediction accuracy; read the true and predicted values; and calculate the prediction error and optimize the network to obtain a model that can predict whether the student's next answer is correct.
As shown in FIG. 1, the deep knowledge tracking method based on residual connection and student near-condition feature fusion specifically comprises the following steps:
Step1: Data preprocessing.
Process the original data into a form that can be fed directly into the model; each piece of original data is processed into one-hot form. The specific steps are as follows:
Step1.1: Data reading and cleaning:
Read the required features from the students' question-answering records to form a new data frame data. The required features are four: a time-sequence number, the student number, the knowledge point number, and whether the answer was correct. A knowledge point is a concept the student needs to master; each knowledge point may correspond to several questions that help the student master it, and each knowledge point may appear several times in one student's data.
Step1.2: Count the distinct knowledge points and form a list of knowledge point ids K = {k_1, k_2, ..., k_l}; the length of the list is 123. The most important role of the list K is to fix a position for each knowledge point, so that in the subsequent one-hot matrix a knowledge point can be identified solely from the position of the 1 in the one-hot vector.
Step1.3: Form a dictionary from the knowledge point list K of Step1.2, of the form {knowledge point number: position of the knowledge point in the list formed in Step1.2}, written mathematically as dict = {k: e}, where k ∈ K, e ∈ {0, 1, 2, ..., 122}, K is the knowledge point list generated in Step1.2, and 122 is the length of that list minus 1. This prepares for forming the one-hot encoding later.
Step1.4: extracting a knowledge point list of each student from the data and a corresponding list for judging whether the answer is correct or not to form a sequence, and the method comprises the following specific steps of:
step1.4.1: extracting all data S of each student from the data r ={s r1 ,s r2 ,...,s rt },S r Unprocessed data representing all students, and S in Step1.6.4 h Distinction in which s rt Is a two-dimensional array with unequal length (unequal length is caused by unequal number of questions made by each student) and contains all data of the t-th student.
Step1.4.2: extracting a knowledge point list S of each student from all data of the student Ki ={q ij |0≤i<b,j∈N * And the corresponding answer correct or not list S Ai ={a ij |0≤i<b,j∈N * Where S is correct by 1 and error by 0 K Refers to a list of knowledge points of students, i refers to the ith student, S Ki Refers to the i-th student's knowledge point list, q ij The j-th question of the ith student is shown in chronological order, b is the length of the S list in Step1.4.1, namely the number of students, and 4163 students are in the data set. The range of j depends on the number of questions made by the corresponding student (since the number of questions made by each student isDifferent and therefore different range of j for each student), S A Means the correct and wrong answer list of students, i means the ith student, i.e. S Ai Refers to the list of correct and incorrect answers from the ith student.
Step1.4.3: the knowledge point lists of all students and the corresponding answer error lists constitute a sequence sequences.
Step1.5: sequences are converted into a one-hot coding form, each one-hot data corresponds to a question made by a student, and the specific steps are as follows:
step1.5.1: and reading the knowledge point list of each student (the length of the knowledge point list of the student is marked as M) and the answer correct or not list from the sequences.
Step1.5.2: m is the length of the knowledge point list of students (if the number of questions made by each student is different, the value of M is also different), L is twice the number of all knowledge points, i.e. L =246, a MAX _ STEP is set to 50 as required, if the number of data M of a student is an integer multiple of 50, the student forms a zero matrix of M246, otherwise:
if M%50= P
The student forms an M ch *246 zero matrix, M ch The expression of (a) is as follows:
M ch =M+(50-P) (1)
the MAX _ STEP is used for unifying data shapes of students, and because the number of questions made by each student is different, the length M of a formed student knowledge point list is different, so that one-hot matrixes formed by the students are different, and the data shapes of input models need to be the same, so that a MAX _ STEP is set for correcting the one-hot matrix formed by each student, the length of the one-hot matrix of each student is changed into integral multiple of the MAX _ STEP, and zero padding is insufficient, so that the data are further unified into the input patterns needed by the models.
Step1.5.3: student knowledge point list S generated using Step1.4.2 Ki ={q ij |0≤i<b,j∈N * Q in (b) } ij (i.e., the jth question made by the ith student), the dictionary dit formed by Step1.3 is used to find the question qij In Step1.2The position s at which the list of knowledge points is formed,
if the knowledge point answers at the corresponding Step1.4.2, the list S is correct or not Ai If the answer in (1) is correct, the s-th position of the j-th row of the zero matrix of the ith student is marked as 1, otherwise, the s + l-th position of the j-th row of the zero matrix of the student is changed to 1,l which is the length of the knowledge point list formed in Step1.2, and the steps are repeated until the zero matrices of all students become the corresponding one-hot matrices.
Step1.5.4: next, the one-hot matrix M246 (or M) for each student is used ch *246 X50X 246, X is automatically generated according to MAX _ STEP and M, that is, one-hot data of one student is divided into a plurality of blocks according to MAX _ STEP, and the above STEPs are repeated as one-hot data of a plurality of students, all the students compose the last train _ data according to the data after being divided by MAX _ STEP, and the data in the train _ data can be represented as S h ={s h1 ,s h2 ,...,s hi },S h Is Y50X 246, wherein Y is the sum of all X in X50X 246,1 ≦ i ≦ Y, s hi ={s i1 ,s i2 ,...s im In which s hi The dimension of the data of the jth one-hot student is 50 x 246, and m is more than or equal to 1 and less than or equal to 50.
The above steps convert the data into a one-hot matrix of Y x 50 x 246.
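A sketch of Step1.5.2-1.5.4: pad each student's one-hot matrix with zero rows up to a multiple of MAX_STEP (formula (1)) and split it into 50 x 246 blocks; stacking every student's blocks yields the Y x 50 x 246 training tensor. The function names are illustrative:

    import numpy as np

    MAX_STEP, L = 50, 246

    def pad_and_split(one_hot):
        # one_hot: (M, 246) matrix of one student
        M = one_hot.shape[0]
        P = M % MAX_STEP
        if P:  # M_ch = M + (50 - P), formula (1)
            one_hot = np.vstack([one_hot, np.zeros((MAX_STEP - P, L), one_hot.dtype)])
        return one_hot.reshape(-1, MAX_STEP, L)   # X blocks of 50 x 246

    # train_data = np.concatenate([pad_and_split(m) for m in student_matrices])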
Step2: and (3) reading training data from the batch processed data in Step1, setting the batch size to be 64, and determining how many one-hot student data are loaded at one time, wherein the dimension is 64 × 50 × 246.
Step3: predicting the next question correction error of the student by using a recurrent neural network to generate original prediction data, and specifically comprising the following steps of:
the data read by Step2 is put into a recurrent neural network for operation, and as the question making data of the student is time-sequenced, namely the question made by the student in the previous Step influences the correct rate of the next question making of the student, an RNN recurrent neural network with memorability and data sharing characteristics is selected as a basis for constructing a model, input _ size is set to be 246, hidden layer size (hidden _ size) is set to be 10, the number of recurrent layers is 1, selection of 'tanh' of an activation function is set to be 'initial', and the initial probability of the activation function is set to be 'initial' and 'initial' according to the selection of the activation functionStarting the random generation of hidden layer parameters, h t The hidden state at the time t can be used as the output of the recurrent neural network at the time t, and can also be used as the hidden state at the time t to enter the next cycle, so that the hidden state at the time t is as follows:
h t =tanh(W x x t +b x +W h h (t-1) +b h ) (2)
in the formula, h t Hidden state at time t, x t For input at time t, h (t-1) Is a hidden state at the time t-1.
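With the hyperparameters stated above, the recurrent layer can be sketched in PyTorch as follows (batch_first is PyTorch's layout flag for the 64 x 50 x 246 batches, not a term from the source):

    import torch
    import torch.nn as nn

    rnn = nn.RNN(input_size=246, hidden_size=10, num_layers=1,
                 nonlinearity='tanh', batch_first=True)

    batch = torch.zeros(64, 50, 246)   # one batch of one-hot data (Step2)
    h0 = torch.randn(1, 64, 10)        # randomly generated initial hidden state
    output, hn = rnn(batch, h0)        # output: 64 x 50 x 10, one h_t per time step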
Step4: the method is characterized in that the original prediction data generated by the recurrent neural network is used for enhancing the recent learning condition of the student through the recent learning data of the student to perform final prediction, and the specific steps are as follows:
if the student performs well in the previous problem-making situation, but the recent problem-making situation performs poorly, it is highly likely that the student has problems with the recent learning situation. And will affect the current problem-making situation. The recent learning condition of the student is judged according to the past learning data of the student. Through experiments, when the sum of recent learning data of students is spliced, the effect is optimal.
Step4.1: the original prediction data generated by Step3 is spliced with the sum of recent exercise data of the student to form the data o of hearts t The formula is as follows:
Figure BDA0003920697730000141
in the formula o t For the data obtained by splicing the sum of the original prediction data and the recent exercise data of the student, h t For the original prediction data, x i And (3) performing question making data for the one-hot of the student at the moment i, wherein n is the number of recent question making data of the students needing to be spliced.
Step4.2: data o after splicing Step4.1 t Put into full connection layer fc 1 In (1), adjusting the predicted dimension, fc 1 The expression of (a) is:
Figure BDA0003920697730000151
in the formula (I), the compound is shown in the specification,
Figure BDA0003920697730000153
and
Figure BDA0003920697730000154
is a linear layer fc 1 The input and the output of (a) a,
Figure BDA0003920697730000155
in order to be the weight, the weight is,
Figure BDA0003920697730000156
is an offset.
Step4.3: putting the output vector of Step4.2 into an activation function, controlling each element of the output to be between 0 and 1, and obtaining a final prediction result, wherein the numerical conversion is expressed as:
Figure BDA0003920697730000152
step5: residual error connection is added on the basis of a Step3 recurrent neural network, so that the prediction accuracy is improved, and the method specifically comprises the following steps:
residual error connection is added on the basis of the Step3 recurrent neural network, so that the prediction accuracy is improved. The prediction result in the recurrent neural network excessively pursues the result of the next knowledge point answer and ignores the rationality of the prediction values of other knowledge points, so that the error of predicting other knowledge points later is aggravated, a CUT _ STEP is set to be 10 and used for segmenting the MAX _ STEP of a student's one-hot matrix, the CUT _ STEP is a segment value of segment data aggregation, 10 segmented data are added and then are transformed and added to a hidden layer, the next cycle is entered, the above STEP is residual connection, and the specific STEPs of realizing the residual connection are as follows:
step5.1: every time each student calculates 10 data in the recurrent neural network, the first 10 data are calculatedMultiplying by weight W, adding (because the importance of each data is different), adding the first 10 data according to their importance (weight) by using residual idea to obtain a vector containing the information of the first 10 data, wherein the weight W = { W = { (W) } 1 ,W 2 ,...,W C And C is the abbreviation of CUT _ STEP.
Step5.2: the vector containing the first 10 data information generated by Step5.1 is used as input and put into the full connection layer fc 2 ,fc 2 Has an input size of 246 and an output of hidden layer size 10,fc 2 The main role of (1) is to convert the result of Step5.1 into a form, fc, that can be directly added to the hidden layer 2 The expression of (a) is:
Figure BDA0003920697730000161
in the formula (I), the compound is shown in the specification,
Figure BDA0003920697730000165
and
Figure BDA0003920697730000166
is a linear layer fc 2 The input and the output of (a) a,
Figure BDA0003920697730000167
in order to be the weight, the weight is,
Figure BDA0003920697730000168
is an offset.
Step5.3: then fc is read 2 Adding the output of the step (b) to the hidden layer for continuous operation, operating 10 data in RNN, adding the previous 10 data, repeating the above steps, and changing the hidden layer at time t
Figure BDA0003920697730000162
The expression of (a) is:
Figure BDA0003920697730000163
wherein C is short for CUT _ STEP, S hd For the d-th student one-hot data, W refers to s for participating in the calculation hd The weight assigned to the weight is given to the weight,
Figure BDA0003920697730000164
i.e. { s d(i-C+1) ,...s di Multiply by W and then sum.
Step6: reading true values of correct and wrong answers of students and predicting the correct and wrong answer prediction values of the students by using a model, and specifically comprising the following steps:
since no explicit label is given in the dataset, the scheme adopted is: the i +1 th data of the student is a label of a prediction list generated by the ith data of the student, and since the prediction list generated by the ith data of the student is used for predicting all knowledge points of the student in the next step, and the i +1 th data of the student is a one-hot code for judging whether the student is correct or not when doing a certain knowledge point, the i +1 th data of the student can find a prediction value from the prediction list generated by the ith data of the student, and the specific steps are as follows:
step6.1: finding the positions of all knowledge points that the student does except the first one (i.e., the zero position) from the input one-hot matrix constitutes a position list, the reason for the first one being: the prediction of the second strip is generated by the first strip of data, the prediction of the third strip is generated by the second strip of data, and the analogy is obvious, and the prediction of the first strip of data is not generated, namely, the true value and the predicted value of the first strip of data are not compared, and the error Loss is calculated, so that the error Loss is eliminated.
Step6.2: and (3) finding out corresponding predicted values from the prediction list according to the position list found in Step6.1 (because each prediction is used for predicting the correct probability of all knowledge points of the classmate next time, but the classmate next time is used for making one knowledge point, the predicted value of the knowledge point is only needed, and other prediction points are discarded) to form the prediction list.
Step6.3: and finding a real value list consisting of whether all knowledge points which are made by the student except the first knowledge point (namely the zero position) are correct or not from the input one-hot matrix.
Step7: calculating a prediction error and optimizing a network to obtain a model capable of predicting whether the next question of the student is correct or not, and the specific steps are as follows:
step7.1: and calculating a loss value by using a loss function according to the real value and the predicted value read in Step6.
Step7.2: and optimizing the network by utilizing an optimizer according to the loss value.
Step7.3: the size of the Epoch is defined as 70 according to the requirement, and the training is determined for several times, namely the steps can be repeated for several times.
Step7.4: and obtaining a trained model which can predict whether the next questions of the students are correct or not.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (7)

1. A deep knowledge tracking method based on residual connection and student near-condition feature fusion, characterized by comprising the following steps:
Step1: data preprocessing, namely processing each piece of original data into one-hot form;
Step2: reading training data in batches from the data processed in Step1;
Step3: predicting whether the student answers the next exercise correctly by using a recurrent neural network, generating original prediction data;
Step4: using the original prediction data, reinforcing the student's recent learning condition through the student's recent learning data and making the final prediction;
Step5: adding residual connections on the basis of the recurrent neural network in Step3;
Step6: reading the true values of the students' answer correctness and the predicted values of answer correctness predicted by the model;
Step7: calculating the prediction error and optimizing the network to obtain a model capable of predicting whether the student's next answer is correct.
2. The deep knowledge tracking method based on residual connection and student near-condition feature fusion according to claim 1, wherein Step1 is specifically:
Step1.1: data reading and cleaning, namely reading the required features from the students' question-answering records to form a new data frame data, wherein the required features include the time-sequence id, student id, knowledge point id, and whether the answer was correct;
Step1.2: counting the distinct knowledge points and forming a list of knowledge point ids K = {k_1, k_2, ..., k_l};
Step1.3: forming a dictionary from the knowledge point list K, of the form {knowledge point id: position of the knowledge point in the list formed in Step1.2}, written mathematically as dict = {k: e}, wherein k ∈ K, K is the knowledge point list formed in Step1.2, and e ∈ {0, 1, 2, ..., l-1};
Step1.4: extracting each student's knowledge point list and the corresponding correct/incorrect answer list from data to form sequences;
Step1.5: converting sequences into one-hot form and processing the data into a unified shape, i.e., each student's data are divided into blocks of MAX_STEP records, and each one-hot vector corresponds to one question answered by the student.
3. The deep knowledge tracking method based on residual connection and student near-condition feature fusion according to claim 1, wherein Step3 is specifically:
putting the data read in Step2 into the recurrent neural network to generate the original prediction data, specifically:
h_t = tanh(W_x · x_t + b_x + W_h · h_(t-1) + b_h)    (1)
where h_t is the hidden state at time t, x_t is the input at time t, and h_(t-1) is the hidden state at time t-1.
4. The deep knowledge tracking method based on residual connection and student near-condition feature fusion according to claim 1, wherein Step4 is specifically:
Step4.1: concatenating the original prediction data generated in Step3 with the sum of the student's recent exercise data to form the new data o_t, the formula being:
o_t = h_t ⊕ ( x_(t-n+1) + x_(t-n+2) + ... + x_t )    (2)
where o_t is the data obtained by splicing the original prediction data with the sum of the student's recent exercise data, ⊕ denotes concatenation, h_t is the original prediction data, x_i is the student's one-hot exercise data at time i, and n is the number of recent exercise records to be spliced;
Step4.2: putting the spliced data o_t of Step4.1 into the fully connected layer fc_1 to adjust the prediction dimension, the expression of fc_1 being:
y_fc1 = A_fc1 · o_fc1 + b_fc1    (3)
where o_fc1 and y_fc1 are the input and output of the linear layer fc_1, A_fc1 is the weight, and b_fc1 is the bias;
Step4.3: putting the output vector of Step4.2 into an activation function that maps each output element into (0, 1) to obtain the final prediction result, the numerical conversion being expressed as:
p_t = 1 / (1 + e^(-y_fc1))    (4)
5. The deep knowledge tracking method based on residual connection and student near-condition feature fusion according to claim 1, wherein Step5 is specifically:
Step5.1: each time a student's data have passed CUT_STEP steps through the recurrent neural network, the previous CUT_STEP records are multiplied by the weights W and summed; using the residual idea, the previous CUT_STEP records are combined according to their importance to obtain a vector containing the information of the previous CUT_STEP records, where the weights are W = {w_1, w_2, ..., w_C} and C is shorthand for CUT_STEP;
Step5.2: the vector containing the previous CUT_STEP records' information generated in Step5.1 is used as the input of the fully connected layer fc_2, which converts the result of Step5.1 into a form that can be added directly to the hidden layer, the expression of fc_2 being:
y_fc2 = A_fc2 · x_fc2 + b_fc2    (5)
where x_fc2 and y_fc2 are the input and output of the linear layer fc_2, A_fc2 is the weight, and b_fc2 is the bias;
Step5.3: the output of fc_2 is then added to the hidden layer and the computation continues; after the next CUT_STEP records have been processed in the RNN, the most recent CUT_STEP records are summed again, and the above steps are repeated; the changed hidden state at time t, denoted h̃_t, is expressed as:
h̃_t = h_t + fc_2( Σ_{j=1}^{C} W_j · s_d(i-C+j) )    (6)
where C is shorthand for CUT_STEP, s_hd is the one-hot data of the d-th student, W is the weight assigned to the s_hd entries participating in the calculation, and the summation multiplies {s_d(i-C+1), ..., s_di} by W and then sums.
6. The deep knowledge tracking method based on residual connection and student near-condition feature fusion according to claim 1, wherein Step6 is specifically:
Step6.1: from the input one-hot matrix, finding the positions of all knowledge points the student answered except the first to form a position list;
Step6.2: finding the corresponding predicted values from the prediction list according to the position list;
Step6.3: from the input one-hot matrix, finding the list of true values consisting of whether each knowledge point the student answered, except the first, was answered correctly.
7. The deep knowledge tracking method based on residual connection and student near-condition feature fusion according to claim 1, wherein Step7 is specifically:
Step7.1: calculating the loss value with a loss function from the true and predicted values read in Step6;
Step7.2: optimizing the network with an optimizer according to the loss value;
Step7.3: defining the Epoch size as required, determining the number of optimization passes, and repeating the above steps that many times;
Step7.4: obtaining a trained model that can predict whether the student's next answer is correct.
CN202211354947.3A 2022-11-01 2022-11-01 Depth knowledge tracking method based on residual connection and student near-condition feature fusion Pending CN115688863A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211354947.3A CN115688863A (en) 2022-11-01 2022-11-01 Depth knowledge tracking method based on residual connection and student near-condition feature fusion


Publications (1)

Publication Number Publication Date
CN115688863A true CN115688863A (en) 2023-02-03

Family

ID=85047127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211354947.3A Pending CN115688863A (en) 2022-11-01 2022-11-01 Depth knowledge tracking method based on residual connection and student near-condition feature fusion

Country Status (1)

Country Link
CN (1) CN115688863A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117474094A (en) * 2023-12-22 2024-01-30 云南师范大学 Knowledge tracking method based on fusion domain features of Transformer
CN117474094B (en) * 2023-12-22 2024-04-09 云南师范大学 Knowledge tracking method based on fusion domain features of Transformer


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination