CN114881331A - Learner abnormal learning state prediction method facing online education - Google Patents

Learner abnormal learning state prediction method facing online education

Info

Publication number
CN114881331A
Authority
CN
China
Prior art keywords
learner
state
sample
feature
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210498953.XA
Other languages
Chinese (zh)
Inventor
董博
赵锐
王余蓝
阮建飞
师斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202210498953.XA
Publication of CN114881331A

Classifications

    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F 16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G06F 16/2474 Sequence data queries, e.g. querying versioned data
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G06Q 50/205 Education administration or guidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Human Resources & Organizations (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Technology (AREA)
  • Databases & Information Systems (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Biology (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)

Abstract

The invention relates to a learner abnormal learning state prediction method facing online education, which comprises the following steps: preprocessing and encoding the high-dimensional online-education-platform log information and learner registration information, and constructing learner portrait features based on a self-supervised learning method; constructing learner state features, then constructing a state feature sequence based on the generation order of the state features and a state feature graph based on the cosine similarity between state features; constructing a long short-term memory-graph attention (LSTM-GAT) deep network suited to predicting the degree of learning-state abnormality in online education, and determining the number of network layers, the number of neurons per layer, and the input and output dimensions; constructing pseudo labels based on the noise labels to train the network iteratively; and predicting the abnormal learning state of the learner, and its degree, for the learning stage to be predicted with the trained network. The method predicts the degree of a learner's state abnormality from learner registration information and learner log information, providing teachers a reference for giving learners targeted guidance and help.

Description

Learner abnormal learning state prediction method facing online education
Technical Field
The invention relates to the technical field of online education, in particular to a learner abnormal learning state prediction method for online education.
Background
With the widespread popularization of modern computer networks and the rapid development of home electronic terminals, online education built on modern technologies such as computer networks and artificial intelligence has become an important component of home education. The market scale of the online education industry grew by 35.5% in 2020, but as a means of distance education it suffers from little teacher-student interaction and difficulty of supervision, which have a non-negligible negative influence on teaching effectiveness. In recent years, the year-by-year growth in online-education participants and online courses has accumulated massive registration information and log data of online learners; effectively analyzing these data and building a model to predict learners' abnormal learning states is of great significance for teachers to know learners' conditions in time and give targeted guidance and supervision.
Existing related research mainly focuses on discovering poorly performing learners: it regards the discovery of poor learners as a 0/1 classification problem (normal learners versus poor learners), extracts features from learner registration information and the log data generated during learning, and predicts poor learners with supervised learning using binarized academic scores as labels. The following documents give technical solutions to this poor-learner discovery problem:
document 1: a method (201910833015.9) for predicting bad learners based on campus card data.
Document 2: a method for identifying a learner with poor score for network education (201610864980.9).
Document 1 proposes a method for predicting poorly performing learners based on campus card data. The method represents a learner's campus card data in matrix form, takes the academic result as the label, and combines a convolutional neural network (CNN) with a long short-term memory network (LSTM) to train a classifier that judges whether a learner is performing poorly.
Document 2 designs a method, oriented to online education data, for identifying learners with poor performance in network education: it constructs high-dimensional features based on time windows, learning duration and the like, divides the training data set into positive and negative examples according to whether the score exceeds 60, and trains a binary classifier with a random forest algorithm to find the poorly performing learners.
The above methods for finding abnormal learning states mainly have the following problems. First, documents 1 and 2 are both developed for discovering poor performance, whereas in practice it matters to guide learners whose learning states show different degrees of abnormality in a personalized way; for example, learners whose poor performance is occasional and learners whose poor performance recurs need different degrees of supervision and different guidance methods. Second, the academic result or examination score of a given semester or course cannot sufficiently reflect whether a learner needs targeted supervision during the learning process: examination scores depend on where the examination questions place their emphasis, and a peer who studied the emphasized parts more earnestly obtains a higher score, so academic scores cannot sufficiently reflect the learner's state in the currently taught part. In online education, a teacher hopes to discover, in time, learners whose learning state is abnormal in any taught part and to provide targeted guidance, so directly taking academic scores as training labels to measure the whole-process learning state introduces label noise. Meanwhile, documents 1 and 2 mine learner portrait features or time-series representations for learning, and this lack of attention to different feature views makes it difficult to carry out label disambiguation based on multi-view features to solve the noise problem. Obtaining accurate labels of poor learning states and their degrees, however, requires great manual labeling cost; how to construct labels from existing academic scores and multi-view features to more accurately evaluate the abnormal learning state, and its degree, at each stage of a learner has therefore become a problem to be solved.
Disclosure of Invention
The invention aims to provide a learner abnormal learning state prediction method for online education, which aims to solve the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme to realize the purpose:
a learner abnormal learning state prediction method facing online education comprises the following steps:
firstly, preprocessing and encoding the high-dimensional online-education-platform log information and the learner registration information, and constructing learner portrait features based on a self-supervised learning method; secondly, segmenting the learner log information by a time window and encoding each segment to construct learner state features, then constructing a state feature sequence based on the generation order of the state features and a state feature graph based on the cosine similarity between state features; thirdly, constructing a long short-term memory-graph attention (LSTM-GAT) deep network suited to predicting the degree of learning-state abnormality in online education, and determining the number of network layers, the number of neurons per layer, and the input and output dimensions from the constructed learner state features, state feature sequence and state feature graph; fourthly, taking academic scores mapped to [0, 1] as noise labels and constructing pseudo labels from them to train the network iteratively: in each iteration the network is first optimized with the multi-view features and the pseudo labels, reliable samples are then selected based on the temporal local continuity, spatial local consistency and prediction error of the state features, a subgraph centered on each unselected, unreliable sample is built in the state feature graph, and finally the reliable sample labels are aggregated within each subgraph to reconstruct the pseudo label of the central sample, the reconstructed pseudo labels being used in the next training iteration; and finally, predicting the abnormal learning state of the learner, and its degree, for the learning stage to be predicted with the trained network.
A further improvement of the invention is that the method comprises in particular the steps of:
1) learner enrollment information and journal information processing
Preprocessing the learner registration information and learner log information as the learner initial features, encoding them into a unified mathematical representation, then training a masked autoencoder by a self-supervised method and re-encoding the encoded features with it to achieve feature dimension reduction, obtaining the learner portrait features;
2) log fragment information processing and state feature sequence and state feature graph construction
Dividing the learner log information into several log segments by a time window, and preprocessing and encoding the segments as in the log information processing of step 1) to obtain the learner state features; constructing a state feature sequence from each learner's state features ordered by log generation time, and computing the feature similarity between all learners' state features to build a state feature graph by the k-nearest-neighbor method;
3) LSTM-GAT deep network construction
Constructing a deep network group consisting of a parallel LSTM and GAT to extract temporal and spatial characterizations respectively, their inputs being the state feature sequence and the state feature graph; the vectors output after processing by the LSTM and GAT deep network group are concatenated with the learner portrait feature vector and input to a fully connected layer group, which outputs the probability that the learner's learning state is abnormal;
4) LSTM-GAT deep network training based on multi-view feature and label reconstruction
Mapping the learning achievement to [0, 1] as a noise label, taking the learner state characteristic and the learner portrait characteristic as training characteristics, dividing each training iteration into two parts, namely network parameter updating and pseudo label reconstruction, and updating the network parameters by the pseudo label constructed based on the noise label during model training; when reconstructing the pseudo label, firstly selecting a reliable sample, then constructing a subgraph by taking an unreliable sample as a center, and aggregating the pseudo labels of the reliable sample in the subgraph to reconstruct the pseudo label of the unreliable sample at the center;
5) learner abnormal learning state prediction
And processing and encoding the learner registration information and log information to be predicted as the input of the trained LSTM-GAT model, which outputs the probability that the learner's learning state is abnormal after model processing.
The invention has the further improvement that in the step 1), the learner registration information and log information processing specifically comprises the following steps:
step 1: learner enrollment information processing
Since the original learner registration information contains many redundant and missing fields, the fields of gender, birth date, identity type, highest degree, graduating institution and location, which correlate strongly with the learner and are rarely missing, are selected from the fields as initial features. The birth date is reduced to the birth year, which is mapped to a category feature with every 10 years as one interval; the location is mapped, according to the "2020 City Commercial Charm Ranking" of the New First-Tier City Institute, to one of six categories (first-tier, new first-tier, second-tier, third-tier, fourth-tier and fifth-tier cities) as a feature, thereby obtaining a unified mathematical representation of the time and category features. The above features are then one-hot encoded and concatenated as the learner registration information features;
the one-hot encoding method comprises the following detailed steps:
s1. classify the set of characteristics into
Figure BDA0003634473690000041
To have n i Class characteristics c of possible values i Is provided with an n i Bit state register M i
s2.M i Each bit in (1) represents c i Whether one value of (1) is valid or not is determined, and the value of (1) is valid or not is determined as invalid;
s3. for all i e {1, 2, 3, …, l c H, mixing c i Are sequentially wovenCoding to obtain one-hot vectors of all the category characteristics;
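For illustration, a minimal sketch of this one-hot scheme in Python (the field names and value lists are hypothetical, not taken from the patent):

```python
# Minimal sketch of the one-hot scheme above; field names and values are
# illustrative assumptions, not taken from the patent.
CATEGORY_VALUES = {
    "gender": ["male", "female"],
    "city_tier": ["tier1", "new_tier1", "tier2", "tier3", "tier4", "tier5"],
}

def one_hot(feature: str, value: str) -> list[int]:
    """Return the n_i-bit state register M_i for one category feature."""
    values = CATEGORY_VALUES[feature]
    register = [0] * len(values)          # n_i-bit register, all bits invalid
    register[values.index(value)] = 1     # mark the taken value as valid
    return register

def encode_learner(record: dict) -> list[int]:
    """Concatenate the one-hot vectors of all category features (step s3)."""
    encoded = []
    for feature in CATEGORY_VALUES:
        encoded += one_hot(feature, record[feature])
    return encoded

# e.g. encode_learner({"gender": "female", "city_tier": "tier2"})
# -> [0, 1, 0, 0, 1, 0, 0, 0]
```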
step 2: log information processing
The statistical characteristics of each semester's log information reflect the learner's general state in that semester. Therefore, using sum, mean and count aggregation functions, the following statistics are constructed from each semester's log information: total course-video views, total course-video viewing time, number of courses studied, average single viewing time, total comment interactions, average comment interactions per video, number of video pauses, average video viewing interval, and average viewing interval within the same chapter for the current semester. The numerical features are mapped based on z-score standardization so that the mean is 0 and the standard deviation is 1, thereby converting the redundant, hard-to-use features contained in the log information into directly usable learner statistical log features with a unified mathematical representation;
the z-score method comprises the following specific steps:
s1. the set of log feature is
Figure BDA0003634473690000051
Logarithmic feature v i Calculating the sample mean value mu i Sum sample standard deviation σ i
s2. characterizing v i Normalization was performed according to the following z-score equation:
Figure BDA0003634473690000052
s3. for all i e {1, 2, 3, …, l v V, will be i Sequentially coding to obtain one-hot vectors of all the category characteristics;
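A minimal sketch of the z-score step, assuming the per-semester statistics sit in a NumPy array with one column per numerical feature:

```python
import numpy as np

# Minimal sketch of the z-score step, assuming the per-semester statistics
# sit in a (num_learners, l_v) array with one column per numerical feature.
def z_score(features: np.ndarray) -> np.ndarray:
    """Standardize each column to mean 0 and standard deviation 1."""
    mu = features.mean(axis=0)                  # sample mean mu_i per feature
    sigma = features.std(axis=0)                # sample standard deviation sigma_i
    return (features - mu) / (sigma + 1e-12)    # epsilon guards sigma_i = 0
```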
step 3: learner portrait feature coding construction
The learner registration information and the log information of a given semester jointly describe, at a macroscopic level, the learner's general state in that semester, and this general learning state is an important reference for predicting the learner's learning state; the learner registration information and log information are therefore combined and then encoded to construct a learner portrait feature that describes the learner's general learning state. The construction first concatenates the features generated in step 1 and step 2 into the learner initial feature describing the learner, then builds a masked autoencoder consisting of a linear encoder and a linear decoder to process the sparse features, mining deep associations within the learner initial feature by partially masking and then restoring it. The encoder consists of an input layer and two hidden layers; writing $l_s$ for the length of the learner initial feature vector, the input layer has dimension $l_s$, the first hidden layer applies a linear map followed by a tanh activation function, and the second hidden layer applies a further, narrower linear map followed by a tanh activation function. The decoder consists of two layers whose structure mirrors the encoder in reverse order, with a batch normalization layer placed before each activation to improve model convergence, the tanh activation function being formalized as

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.$$

The feature vector is input to the encoder network after random masking; the mask rate grows stepwise during training with step length $t$ along the set sequence $M_t$, where $t$ is a hyperparameter controlling how fast the mask rate increases and $M_t$ is the hyperparameter sequence controlling the growth. The loss function is set to the mean squared error: writing $\hat{x}_s$ for the decoder output vector and $x_s$ for the original feature vector, the mean squared error loss function is expressed as

$$L_{MSE} = \frac{1}{l_s} \sum_{i=1}^{l_s} \left(\hat{x}_{s,i} - x_{s,i}\right)^2.$$

After training, the encoder is taken out of the masked autoencoder as the learner portrait feature encoder, and all learner initial features are encoded with it to obtain the learner portrait features.
The further improvement of the invention is that in the step 2), the log segment information processing and the state feature sequence and state feature diagram construction specifically comprise the following steps:
step 1: log fragment information processing
A learner's learning state is reflected by the learning situation over a period of time. Log segment information processing divides the log information into log segments by a set time window, the log information within each segment reflecting the learner's learning state in that time segment;
specifically, the method comprises the steps of dividing log information of each learner into log information segments according to weeks, processing log information similarly, constructing statistical data of the total video watching times of a chapter, the total video watching time of a course, the number of courses participating in learning, the average single watching time, the total comment interaction times, the average single video comment interaction times, the video pause times, the average video watching interval and the average video watching interval in each chapter for the learners per week, encoding the statistical data according to a log information processing mode, and finally constructing a masking self-encoder consisting of a linear encoder and a linear decoder, wherein the number of encoder layers and the number of neurons in each layer are the same as those of image feature codes of the learners, and the length of a coded feature vector is l p Then dimension of input layer is l p The linear layer sizes of the two hidden layer neurons are respectively
Figure BDA0003634473690000071
And
Figure BDA0003634473690000072
the rest networks and training settings are the same as the construction of the learner portrait feature codes mentioned above; embedding the linked features by using a trained encoder to obtain learner state features;
step 2: state signature sequence construction
The learner state features are generated in chronological order; to make full use of this information, the state feature sequence is constructed from the learning-state generation time. First, each learner's state features are sorted by generation time to build that learner's total state feature sequence; then, for each learner state feature, the $l_b$ preceding state features are selected to form, together with the feature itself, a state feature sequence, so that the log segment information of the previous $l_b$ weeks can assist prediction. For state features with insufficient preceding features, the preamble is padded with a set value to allow batch processing during training. A sequence containing each learner state feature and its preceding nodes is thus constructed to support the mining of temporal information during model training;
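A minimal sketch of this sequence construction, assuming l_b = 5 and zero padding as in the embodiment below:

```python
import numpy as np

# Minimal sketch of the state feature sequence construction, assuming
# l_b = 5 preceding weeks and zero padding as in the embodiment below;
# `states` holds one learner's weekly state features sorted by week.
def build_sequences(states: np.ndarray, l_b: int = 5) -> np.ndarray:
    """Return one (l_b + 1, dim) sequence per weekly state feature."""
    num_weeks, dim = states.shape
    padded = np.vstack([np.zeros((l_b, dim)), states])   # pad missing preamble
    return np.stack([padded[i:i + l_b + 1] for i in range(num_weeks)])

# For a 16-week semester with 16-dim state features this yields a
# (16, 6, 16) array: 16 sequences of 6 state features each.
```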
step 3: state feature graph construction
The log segment information is embedded into a representation space and a state feature graph is constructed based on sample similarity, so that the spatial local consistency of log segments can be used during learning for label disambiguation and for mining associations between similar samples. Let the number of learner state feature samples be $n$ and let $S \in \mathbb{R}^{n \times n}$ be the state feature similarity matrix, where $S_{ij}$ denotes the cosine similarity between state features $x_i^e$ and $x_j^e$:

$$S_{ij} = \frac{x_i^e \cdot x_j^e}{\lVert x_i^e \rVert \, \lVert x_j^e \rVert}.$$

The graph is constructed based on cosine similarity and the k-nearest-neighbor method, i.e., each sample point is connected with its $k$ most similar sample points in the state feature embedding space to form an undirected graph. Writing $A_{n \times n}$ for the adjacency matrix of the constructed graph, $\mathrm{DSC}$ for the descending-sort algorithm, and $S_i^{\mathrm{DSC}} = \mathrm{DSC}(S_i)$ for the vector obtained by sorting the elements of row $S_i$ in descending order, the graph structure is formalized as

$$A_{ij} = \begin{cases} 1, & S_{ij} \ge S_{i,k}^{\mathrm{DSC}} \\ 0, & \text{otherwise,} \end{cases}$$

thereby obtaining the state feature graph and its adjacency matrix representation.
The further improvement of the invention is that in the step 3), the construction of the LSTM-GAT deep layer network specifically comprises the following steps:
step 1: LSTM network construction
The learner state features are organized by the state feature sequence construction into time-ordered learner state feature sequences, and a two-layer LSTM network is constructed to learn the information in these sequences and mine the temporal information in the data; the network takes the state feature sequence as input and outputs the learner state temporal characterization. The input dimension of the LSTM unit is set to the dimension of the state features, the hidden layer comprises $c_{lstm}$ neurons, and the output dimension is $o_{lstm}$;
Step 2: GAT network construction
The learner state features are organized by the state feature graph construction into a learner state feature graph based on their cosine similarity in feature space. A two-layer GAT network with a multi-head attention mechanism is constructed to learn the learner state feature graph and mine its relational information in feature space, using LeakyReLU as the activation function; the network takes the state feature graph as its input and aggregates, for each node, the features of its similar nodes to obtain the learner state spatial characterization;
step 3: fully connected layer set construction
A fully connected layer group consisting of two fully connected layers is constructed to integrate all encoded information and predict the degree of abnormality of the learner state. The first fully connected layer has size $l_{concat} \times h_{concat}$, where $l_{concat}$ is the length of the concatenated input vector, and uses tanh as its activation function; the second fully connected layer has size $h_{concat} \times 1$ and uses sigmoid as its activation function to output a probability representing the degree of abnormality of the learner's learning state. The input is the concatenation of the outputs of the LSTM network, the GAT network and the learner portrait feature encoder; the feature vectors output by these three parts provide the learner's temporal information, spatial information and general-state information respectively, and the fully connected layer group uses the information of all three dimensions simultaneously to predict the degree of abnormality of the learner's state;
the sigmoid function is formalized as

$$\mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}},$$

where $z$ is the output of the second fully connected layer and $\mathrm{sigmoid}(z)$ is the probability prediction of the learner's abnormal learning state.
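For illustration, a minimal sketch of how the three branches and the fully connected head could be assembled; it assumes PyTorch Geometric's GATConv for the graph branch, and all dimensions are illustrative (matching the embodiment below where stated):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv  # assumes PyTorch Geometric is available

# Minimal sketch of the parallel LSTM-GAT network and fully connected head.
# Dimensions follow the embodiment below where stated (16-dim state features,
# 16-dim portrait features); everything else is an illustrative assumption.
class LstmGatNet(nn.Module):
    def __init__(self, state_dim=16, portrait_dim=16, c_lstm=4, o_lstm=16,
                 gat_hidden=16, heads=4, h_concat=16):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, c_lstm, num_layers=2, batch_first=True)
        self.proj = nn.Linear(c_lstm, o_lstm)          # temporal characterization
        self.gat1 = GATConv(state_dim, gat_hidden, heads=heads)
        self.gat2 = GATConv(gat_hidden * heads, gat_hidden, heads=1)
        self.head = nn.Sequential(                     # fully connected layer group
            nn.Linear(o_lstm + gat_hidden + portrait_dim, h_concat), nn.Tanh(),
            nn.Linear(h_concat, 1), nn.Sigmoid(),
        )

    def forward(self, seq, x_all, edge_index, idx, portrait):
        # seq: (batch, l_b + 1, state_dim) state feature sequences
        # x_all, edge_index: the whole state feature graph; idx: batch rows
        _, (h, _) = self.lstm(seq)
        t = self.proj(h[-1])                           # last hidden state
        g = torch.relu(self.gat1(x_all, edge_index))
        g = self.gat2(g, edge_index)[idx]              # spatial characterization
        return self.head(torch.cat([t, g, portrait], dim=1)).squeeze(1)
```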
A further development of the invention is that the aggregation of the layer-$(l-1)$ vectors $h_j^{(l-1)}$ of the GAT network into the layer-$l$ vector is formalized as

$$h_i^{(l)} = \sigma\Big(\sum_{j \in N_i} \alpha_{ij}\, W h_j^{(l-1)}\Big),$$

where $h_i$ is a sample point in the state feature graph, $N_i$ is the neighbor vector index list of sample $h_i$, $\alpha_{ij}$ is the attention coefficient, and $W$ is a parameter matrix;
the attention coefficient computation is formalized as

$$\alpha_{ij} = \frac{\exp\!\big(\mathrm{LeakyReLU}\big(a^{T}\, \mathrm{concat}(W h_i, W h_j)\big)\big)}{\sum_{k \in N_i} \exp\!\big(\mathrm{LeakyReLU}\big(a^{T}\, \mathrm{concat}(W h_i, W h_k)\big)\big)},$$

where $\mathrm{concat}$ represents the splicing operation, $\cdot^{T}$ represents the transposition operation of the matrix, and $a$ is a weight matrix connecting two layers in a single-layer feedforward neural network;
the LeakyReLU function is formalized as

$$\mathrm{LeakyReLU}(x) = \begin{cases} x, & x \ge 0 \\ kx, & x < 0, \end{cases}$$

where $k$ is a hyperparameter representing the slope.
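A from-scratch sketch of a single attention layer implementing the aggregation, attention and LeakyReLU formulas above (dense adjacency, single head; the tanh output activation is an assumption for sigma):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# From-scratch sketch of one GAT layer implementing the formulas above
# (dense adjacency, single attention head). Assumes every node has at
# least one neighbor; the tanh output activation is an assumption.
class GatLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, slope: float = 0.2):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # parameter matrix W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # weight matrix a
        self.slope = slope                                # LeakyReLU slope k

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (n, in_dim) node features; adj: (n, n) 0/1 adjacency matrix
        wh = self.W(h)
        n = wh.size(0)
        # e[i, j] = LeakyReLU(a^T concat(W h_i, W h_j))
        pairs = torch.cat([wh.unsqueeze(1).expand(n, n, -1),
                           wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1), negative_slope=self.slope)
        e = e.masked_fill(adj == 0, float("-inf"))        # restrict to N_i
        alpha = torch.softmax(e, dim=1)                   # attention coefficients
        return torch.tanh(alpha @ wh)                     # sum_j alpha_ij W h_j
```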
The further improvement of the invention is that in the step 4), the LSTM-GAT deep network training based on multi-view feature and label reconstruction specifically comprises the following steps:
step 1: training data preparation
In order to predict the degree of a learner's state abnormality per stage, each state feature within a semester is combined with the learner portrait feature of that semester to form a training sample instance. The preprocessed and encoded training sample set is represented as

$$T = \{(x_i^e, x_i^s, \tilde{y}_i)\}_{i=1}^{n},$$

where $x_i^e$ denotes the learner state feature, $x_i^s$ is the learner portrait feature of the semester containing the log segment represented by $x_i^e$, and $\tilde{y}_i$ is the noise label of the semester containing the log segment represented by $x_i^e$. The noise label is generated by mapping the learner's average academic performance over the semester to $[0, 1]$: recording the per-subject score set of the $k$-th semester as $G_k = \{g_1^k, g_2^k, \ldots, g_{m_k}^k\}$, the computation of the $k$-th semester noise label $\tilde{y}_k$ is formalized as

$$\tilde{y}_k = 1 - \frac{1}{100\, m_k} \sum_{i=1}^{m_k} g_i^k,$$

where $g_i^k$ denotes the score of the $i$-th subject of the $k$-th semester. The sample features of $T$ are subjected in turn to feature encoding, state feature sequence construction and state feature graph construction to obtain a training data set with a unified mathematical representation;
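A minimal sketch of this noise-label construction under the reconstruction above (percent scores averaged per semester and inverted into [0, 1]; the exact mapping appears only as an image in the original document):

```python
import numpy as np

# Minimal sketch of the noise-label construction as reconstructed above:
# percent scores averaged per semester and inverted into [0, 1].
def noise_label(scores: list[float]) -> float:
    """Higher label = more abnormal; a semester of all-100 scores maps to 0."""
    return 1.0 - float(np.mean(scores)) / 100.0

# e.g. noise_label([80, 90, 70]) -> 0.2
```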
step 2: network parameter update
The initialization of the network parameters is very important for the training speed and convergence of a deep learning network, so the network parameters are initialized in the Xavier manner to accelerate training and reduce gradient dispersion. Writing $i$ for the input dimension and $o$ for the output dimension of the layer containing parameter $w$, the Xavier initialization method is formalized as

$$w \sim U\left[-\sqrt{\frac{6}{i+o}},\ \sqrt{\frac{6}{i+o}}\right].$$

To reduce the influence of noise, pseudo labels constructed from the noise labels are used for network training; at the start of training the pseudo labels are initialized to the noise labels. In each training iteration, the state feature graph, the state feature sequence, the learner portrait features and the pseudo labels are input to the network, the loss function is set to the mean squared error, and the network parameters are updated by the back-propagation algorithm;
step 3: pseudo-tag reconstruction
Because the degrees of abnormality of a learner's state differ between learning stages while the labels generated for every stage from the academic result are identical, the training samples carry label noise; the labels therefore need to be refined during training, constructing pseudo labels closer to the true degree of learning-state abnormality to reduce the label noise;
specifically, after the parameters of each iteration are updated, the pseudo labels with low reliability are updated based on the pseudo label information with high reliability so as to reduce label noise, firstly, reliable samples are screened based on the time sequence local continuity, the space local consistency and the sample prediction error of the state features, the intersection of the reliable samples is used as a reliable training sample so as to achieve label disambiguation of three state feature view angles, and the selected reliable sample set is formally expressed as follows:
Figure BDA0003634473690000111
wherein
Figure BDA0003634473690000112
And
Figure BDA0003634473690000113
the method comprises the steps of respectively selecting reliable sample sets based on time sequence local continuity, spatial local consistency and sample prediction errors; specifically, the time sequence local continuity means that the learner has continuous learning states in a short period of continuous time, so that the continuous samples in the state feature sequence have similar labels, and therefore, the reliability of the samples in the time sequence is measured by using the pseudo label difference of the continuous samples in the state feature sequence; the spatial local consistency means that similar samples in the state feature diagram have similar state features, and the samples with similar features have similar labels, so the reliability of the samples in space is measured by using the pseudo label difference of the similar samples on the state feature diagram; meanwhile, the mean square error is used for measuring the error between the sample prediction and the pseudo label and the difference between the sample pseudo labels; then
Figure BDA0003634473690000114
And
Figure BDA0003634473690000115
the formalization of the collection is represented as:
Figure BDA0003634473690000116
Figure BDA0003634473690000117
Figure BDA0003634473690000118
wherein, tau g 、τ s And τ r For three hyper-parameters representing threshold values, reliable sample selection based on time sequence local continuity, spatial local consistency and sample prediction error is controlled respectively;
Figure BDA0003634473690000119
being the sum of the elements of the ith row of the matrix S,
Figure BDA00036344736900001110
is the sum of the elements of row i of matrix D, x i Denotes the ith training sample, y' i The representation is based on
Figure BDA00036344736900001111
Constructed pseudo-tag of SEQ i Represents a sample x i List of indexes of sample in sequence, D ij Represents a sample x i And x j The number of time segments that differ in sequence, thereby obtaining a reliable sample set
Figure BDA00036344736900001112
For subsequent training and pseudo label reconstruction;
then, after each iteration, aggregating information of peripheral reliable samples for the unreliable samples to update labels of the unreliable samples, selecting the unreliable samples as a center, and selecting a reliable adjacent sample structure subgraph on a state feature graph of the unreliable samples; if all the neighbors of the unreliable sample are unreliable, the fact that local label noise taking the sample as a center is too much is shown, effective information is few, therefore, pseudo label reconstruction of the sample is skipped, pseudo label reconstruction on a subgraph is expanded based on a label propagation method, and formalization expression is as follows:
Figure BDA0003634473690000121
wherein
Figure BDA0003634473690000122
Representation of noiseAcoustic sample x i List of reliable samples in the neighborhood.
The invention is further improved in that, in the step 5), the learner's abnormal learning state prediction specifically comprises the following steps:
processing learner registration information and learner log information needing prediction through step 1) and step 2) to obtain state characteristic codes x e Learner image feature coding x s And the state signature sequence SEQ is used as a network input, x e And the preamble thereof is input into the LSTM network part, and the output of the LSTM part is used as the state characteristic time sequence representation of the sample; x is to be e The graph space embedded into the training sample finds k neighbors, and the state feature space characterization of the sample is represented based on the GAT network output of the k neighbors aggregated by distance, and the formalization is represented as follows:
Figure BDA0003634473690000123
wherein GAT (x) represents the output of the sample x after being processed by the GAT network, N train Represents a sample x e Neighbor set in training set, S (x) 1 ,x 2 ) Represents a sample x 1 ,x 2 Cosine similarity between them; connecting the state characteristic time sequence representation output by the LSTM part, the state characteristic space representation output by the GAT part and the learner portrait feature coding as the input of a full-connection layer group, and obtaining a learner state anomaly probability p through the full-connection layer group x ∈[0,1],p x The larger the learner's state is, the higher the degree of abnormality is, and vice versa.
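For illustration, a minimal sketch of this inference-time spatial characterization, assuming the GAT outputs of the training samples have been precomputed:

```python
import numpy as np

# Minimal sketch of the inference-time spatial characterization: a new
# sample's GAT output is the similarity-weighted mean of the GAT outputs
# of its k nearest training samples (gat_train precomputed on the graph).
def spatial_characterization(x_e, X_train, gat_train, k=20):
    sims = (X_train @ x_e) / (
        np.linalg.norm(X_train, axis=1) * np.linalg.norm(x_e) + 1e-12)
    nbrs = np.argsort(sims)[::-1][:k]        # k most similar training samples
    w = sims[nbrs] / sims[nbrs].sum()        # normalized cosine similarities
    return w @ gat_train[nbrs]               # weighted aggregation GAT(x_e)
```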
The invention has at least the following beneficial technical effects:
the learner abnormal learning state prediction method for online education, provided by the invention, predicts the learner state abnormal degree by utilizing the learner registration information and the learner log information, and provides reference for teachers to pertinently guide and help learners. Different state characteristic visual angles are constructed based on time sequence-space characterization, label reconstruction is realized based on the different state characteristic visual angles, the existing technology is improved, and a learner abnormal learning state prediction model with label noise robustness can be constructed under the condition that each learner state segment does not need to be labeled. Compared with the prior art, the invention has the advantages that:
(1) the invention regards the learner's abnormal learning state prediction as a regression problem, and maps the learning achievement to a continuous noisy label to train the prediction model. Compared with the prior art that learning achievement binaryzation is directly used as a label, the noisy label constructed by the method can better reflect the learning state abnormal degrees of learners in different stages in training data, and the learner abnormal learning state prediction model trained by using the data is more in line with the scene requirements of assisting teachers to develop targeted guidance for learners with different state abnormal degrees.
(2) The invention provides a noise label learning method based on multi-view reliable sample disambiguation and label propagation, which constructs an iteratively updated pseudo label based on a noisy label. Specifically, the method divides reliable samples and unreliable samples based on time sequence local continuity, space local consistency and sample prediction errors of state features, and then constructs a subgraph with the unreliable samples as the center and aggregates reliable sample pseudo labels in the subgraph to reconstruct the unreliable sample pseudo labels. The method is different from the prior art that the network supervision training is directly carried out by utilizing the noisy label, the pseudo label reconstruction of the sample is realized, and the influence of the label noise on the network training is effectively reduced.
(3) The invention provides a time sequence-space representation extraction method, which constructs a state feature time sequence structure based on the time sequence of the state features of a learner, constructs a graph structure based on the similarity between the state features of the learner, and learns the time sequence-space representation of the learner based on LSTM and GAT respectively. Compared with the prior art that learner image characteristics or time sequence characterization is usually mined for learning, the method provided by the invention combines static learner image characteristics with time sequence and space characterization, and mines deep multi-view characteristics, thereby improving the accuracy of learner abnormal state prediction.
Drawings
FIG. 1 is an overall framework flow diagram.
Fig. 2 is a flow chart of learner registration information and log information processing.
FIG. 3 is a flow chart of log segment information processing and status signature sequence and status signature graph construction.
FIG. 4 is a flow chart of the construction of the LSTM-GAT deep network.
FIG. 5 is a flow chart of LSTM-GAT deep network training based on multi-view feature and label reconstruction.
FIG. 6 is a flow chart of learner abnormal learning state prediction.
FIG. 7 is a diagram illustrating the prediction of abnormal learning state of a learner.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Examples
The registration information of all learners who enrolled on an online education platform in 2017, together with their log information from 2017 to 2020, is selected. The present invention is described in further detail below with reference to the accompanying drawings, in conjunction with experimental examples and embodiments. All technologies realized based on the present disclosure belong to the scope of the present invention.
In the embodiment of the present invention, as shown in fig. 1, the method for predicting the abnormal learning state of the learner for online education comprises the following steps:
step 1. learner registration information and log information processing
The learner registration information and the log information comprise a plurality of redundant fields and a plurality of data missing fields which are irrelevant to the learner abnormal learning state prediction; meanwhile, many useful time and category fields cannot be directly used for model training. Learner enrollment information and log information processing removes fields of data that are less relevant to the training objectives and builds a unified mathematical representation of fields that cannot be used directly for model training to facilitate subsequent feature extraction and model training. The learner registration information and log information processing implementation process is shown in fig. 2, and specifically comprises the following steps:
s101. learner registration information processing
Learner enrollment information includes the pending fields {gender (sex), date of birth (birthday), identity type, highest degree (qualification), graduating institution (establishment), location (city), personal profile (profile), occupation (job)}.
Specifically, in the present embodiment, since the personal profile and occupation fields are optional at registration and are missing for a large share of samples, and since the personal profile is long-text information that correlates weakly with the target problem and is hard to exploit, these two fields are deleted from the learner registration information and the remaining fields are taken as features. Since date features cannot be used directly, the embodiment maps the birthday to a discrete category feature with every 10 years as one interval; writing year for the year of birth, the mapping is formalized as

$$\mathrm{yearType} = \left\lfloor \frac{\mathrm{year}}{10} \right\rfloor,$$

so that yearType represents the mapped year category. In addition, according to the "2020 City Commercial Charm Ranking" of the New First-Tier City Institute, the location is mapped to {1, 2, 3, 4, 5, 6} for first-tier, new first-tier, second-tier, third-tier, fourth-tier and fifth-tier cities respectively, giving the city category. Learner registration information is thus converted into category features, which are then one-hot encoded: in the embodiment, gender (male/female), year category, identity type (student/employed), highest degree (below associate/associate/bachelor/master/doctor), graduating institution (double first-class/ordinary/other) and city category are encoded separately, and the codes of the above fields are concatenated as the learner registration information features.
S102, log information processing
The statistical characteristics of the log information of each school period reflect the general state of the learner in the school period, but the log information comprises a large number of behavior fields and time fields.
Specifically, in this embodiment, the behavior fields include platform login, video viewing, video pause, commenting and the like, and the time fields include video viewing time, video pause time, platform login time, platform logout time and the like. These fields are large in volume and hard to use directly for machine-learning model training, so their statistics are constructed as features using aggregation functions. In the embodiment, statistics such as the total course-video views, total course-video viewing time, number of courses studied, average single viewing time, total comment interactions, average comment interactions per video, number of video pauses and average viewing interval of videos within the same chapter in the current semester are constructed from these fields as the log information features to be processed.
Since these statistics are numerical features with inconsistent dimensions, they are mapped based on z-score standardization. In the embodiment, the sample means $\mu_1, \mu_2, \ldots, \mu_9$ and sample standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_9$ of the above nine feature columns are first computed, and each feature is then updated according to the z-score formula

$$v_i' = \frac{v_i - \mu_i}{\sigma_i}$$

for $\{v_1, v_2, \ldots, v_9\}$, converting the redundant, hard-to-use features contained in the log information into directly usable features with a unified mathematical representation. These features are concatenated to form the learner statistical log features.
S103, construction of learner portrait feature codes
The learner registration information and the log information of the set school date are combined to macroscopically describe the general state of the learner in the set school date, and the general learning state of the learner has an important reference function on the prediction of the learning state of the learner.
Specifically, in this embodiment, the learner portrait feature coding construction first concatenates the learner registration information features and the learner statistical log features into a vector of length 40 as the learner initial feature, then builds a masked autoencoder composed of a linear encoder and a linear decoder to process the sparse features, mining deep associations in the learner initial feature by partially masking and restoring it. The encoder consists of an input layer of dimension 40 and two hidden layers: the first hidden layer is a linear layer of size 40 x 32 followed by a tanh activation function, and the second hidden layer is a linear layer of size 32 x 16 followed by a tanh activation function. The decoder consists of two layers whose structure mirrors the encoder in reverse order, with a batch normalization layer placed before each activation to improve model convergence.
In the training of the masked autoencoder of this embodiment, the mask rate of the feature vector is increased stepwise during training in the order {0, 15%, 30%, 45%}, with a step size of 10, and the loss function is set to the mean squared error

$$L_{MSE} = \frac{1}{40} \sum_{i=1}^{40} \left(\hat{x}_{s,i} - x_{s,i}\right)^2.$$

The encoder of the trained masked autoencoder is taken as the learner portrait feature encoder, and the 40-dimensional learner initial features are then encoded into 16-dimensional learner portrait features.
Step2, log fragment information processing and state characteristic sequence and state characteristic diagram construction
The learner's learning state is reflected by the learning situation over a period of time. The learner log information is divided into several log segments based on a time window, and the segments are preprocessed and encoded to obtain learner state features reflecting the learner's learning situation in each period; a state feature sequence is constructed from each learner's state features in log-generation order, and the feature similarity between all learners' state features is computed to build a state feature graph by the k-nearest-neighbor method, facilitating the mining of temporal and spatial relationships between learner state features in subsequent model training. The implementation of log segment information processing and state feature sequence and state feature graph construction is shown in fig. 3 and specifically includes the following steps:
s201, processing log fragment information
Each learner's log information is divided into log segments with the week as the time window. Then, for each week, statistical features analogous to those of the log information processing are constructed and encoded in the same manner. A masked autoencoder consisting of a linear encoder and a linear decoder is built to further encode the statistical features and mine their deep feature-correlation information.
Specifically, in this embodiment, the number of encoder layers and neurons per layer is the same as for the learner portrait feature encoder; the length of the feature vector to encode is 9, the linear layers of the two hidden layers have sizes 9 x 24 and 24 x 16 respectively, and the remaining network and training settings are the same as in the learner portrait feature coding construction above. The concatenated features are embedded with the trained encoder to obtain the final state features.
S202, constructing a state characteristic sequence
The learner's state features are generated in a chronological sequence, and to make full use of this information, the present invention constructs a sequence of state features based on the learning state generation time.
Specifically, in this embodiment, each semester comprises 16 weeks. Denote the state features of a learner in one semester as $\{x_1^e, x_2^e, \ldots, x_{16}^e\}$ and the week to which $x_i^e$ belongs as $w_i$; sorting the $x_i^e$ by $w_i$ yields the learner's total state feature sequence for that semester. Then, for each learner state feature, the 5 preceding state features are selected to form, together with the feature itself, a state feature sequence of 6 state features, so that the log segment information of the previous 5 weeks can assist prediction during training. For a state feature with insufficient preceding features, its preamble is padded with 0. Sixteen state feature sequences are thus constructed per learner per semester to support the mining of temporal information in the data during subsequent learning.
S203, constructing a state feature graph
And embedding log fragment information into a representation space and constructing a state feature graph based on the sample similarity so as to perform label disambiguation and mine similar sample association by using the spatial local consistency of the log fragments in learning.
Specifically, in this embodiment, the state feature similarity matrix $S$ is first computed, where $S_{ij}$ denotes the cosine similarity between state features $x_e^{(i)}$ and $x_e^{(j)}$. Each column of $S$ is then sorted in descending order to obtain $S^{DSC}$, where $S^{DSC}_i$ is the sequence obtained by sorting the $i$-th column of $S$ in descending order. Finally, a graph is constructed based on cosine similarity and the k-nearest-neighbor method, with $k = 20$ in this embodiment; that is, each sample point in the state feature embedding space is connected to its 20 most similar sample points to form an undirected graph. The constructed graph is represented by an adjacency matrix $A$, with $A_{ij} = 1$ if $x_e^{(j)}$ is among the $k$ most similar samples of $x_e^{(i)}$ (i.e., $S_{ij} \ge S^{DSC}_{i,k}$) and $A_{ij} = 0$ otherwise. The state feature graph and its adjacency matrix representation are thereby obtained for mining spatial information in the data.
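A minimal NumPy sketch of this k-nearest-neighbor graph construction might look as follows, assuming row-wise feature vectors and symmetrizing the adjacency to obtain an undirected graph; names are illustrative.

```python
import numpy as np

def knn_graph(features, k=20):
    """Build an undirected k-NN adjacency matrix from cosine similarity."""
    norm = features / np.linalg.norm(features, axis=1, keepdims=True)
    S = norm @ norm.T                      # cosine similarity matrix
    np.fill_diagonal(S, -np.inf)           # exclude self-loops
    A = np.zeros_like(S)
    for i in range(S.shape[0]):
        nbrs = np.argsort(S[i])[::-1][:k]  # indices of the k most similar
        A[i, nbrs] = 1.0
    return np.maximum(A, A.T)              # symmetrize: undirected graph

X = np.random.randn(200, 16).astype(np.float32)
A = knn_graph(X, k=20)
print(A.sum(axis=1)[:5])  # each node ends up with at least 20 neighbors
```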
Step 3: LSTM-GAT deep network construction
A deep network group consisting of parallel LSTM and GAT networks is constructed to extract the timing and spatial representations respectively, their inputs coming from the state feature sequence and the state feature graph. The vectors output after processing by the LSTM and GAT deep network group are connected with the learner portrait feature vector and input to a fully connected layer group, which outputs the learner's learning state abnormality probability. Fig. 4 shows the implementation process of constructing the LSTM-GAT deep network, which specifically comprises the following steps:
S301. LSTM network construction
After the learner state features are processed by S202, a time-ordered learner state feature sequence is obtained; an LSTM network is constructed to learn this sequence so as to mine the data timing information, yielding the learner state timing representation.
Specifically, in this embodiment, a two-layer LSTM is constructed, the input layer dimension of the LSTM neural unit is 16, the hidden layer includes 4 neurons, and the output layer dimension is 16.
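A plausible PyTorch rendering of this configuration is sketched below; the final linear projection from the 4-neuron hidden state to the 16-dimensional output is an assumption made to reconcile the stated hidden and output dimensions.

```python
import torch
import torch.nn as nn

class StateLSTM(nn.Module):
    """Two-layer LSTM over length-6 state feature sequences (16-dim inputs)."""
    def __init__(self, in_dim=16, hidden=4, out_dim=16, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=layers, batch_first=True)
        self.proj = nn.Linear(hidden, out_dim)  # map hidden state to 16 dims

    def forward(self, seq):                 # seq: (batch, 6, 16)
        out, _ = self.lstm(seq)
        return self.proj(out[:, -1])        # timing representation of last step

rep = StateLSTM()(torch.randn(8, 6, 16))
print(rep.shape)  # torch.Size([8, 16])
```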
S302. GAT network construction
From the learner state features, the state feature graph construction yields a learner state feature graph based on the cosine similarity of the learner state features in the feature space; a two-layer GAT network with a multi-head attention mechanism is constructed to learn the state feature graph so as to mine the learner state spatial representation.
Specifically, in this embodiment, LeakyReLU is used as the activation function. The vector $h_i^{(l-1)}$ of layer $l-1$ of the GAT network is aggregated at layer $l$ as

$$h_i^{(l)} = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W h_j^{(l-1)}\Big)$$

where $h_i$ is a sample point in the state feature graph, $N_i$ is the neighbor vector index list of sample $i$, and $\alpha_{ij}$ is the attention coefficient, computed as

$$\alpha_{ij} = \mathrm{softmax}_j\Big(\mathrm{LeakyReLU}\big(a^{T}\,\mathrm{concat}(W h_i, W h_j)\big)\Big)$$

where $W$ is a parameter matrix, concat denotes the concatenation operation, $\cdot^{T}$ denotes the matrix transpose operation, and $a$ is a weight matrix connecting two layers in a single-layer feedforward neural network.
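The aggregation above corresponds to a standard graph attention layer; the following single-head PyTorch sketch implements it over a dense adjacency matrix, with σ taken as ELU for illustration (the patent does not fix this choice).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer following the formulas above."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)        # parameter matrix W
        self.a = nn.Parameter(torch.randn(2 * out_dim) * 0.1)  # attention weights a

    def forward(self, h, adj):
        Wh = self.W(h)                                   # (n, out_dim)
        n = Wh.size(0)
        # e_ij = LeakyReLU(a^T concat(W h_i, W h_j)) for every pair (i, j)
        pairs = torch.cat([Wh.unsqueeze(1).expand(n, n, -1),
                           Wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(pairs @ self.a)
        e = e.masked_fill(adj == 0, float('-inf'))       # attend to neighbors only
        alpha = torch.softmax(e, dim=1)                  # attention coefficients
        return F.elu(alpha @ Wh)                         # aggregate neighbors

h = torch.randn(50, 16)
adj = ((torch.rand(50, 50) > 0.8).float() + torch.eye(50)).clamp(max=1)
out = GATLayer(16, 16)(h, adj)
```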
S303. Fully connected layer group construction
A fully connected layer group consisting of two fully connected layers is constructed to integrate all encoded information to predict the degree of abnormality of the learner state; its input is the connection of three vectors: the learner state timing representation, the learner state spatial representation, and the learner portrait feature.
Specifically, in this embodiment, the first fully connected layer has size 48 × 16 and uses tanh as the activation function, and the second fully connected layer has size 16 × 1. Denoting the output of the second fully connected layer as $z$, the sigmoid function

$$\mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}}$$

is used as the activation function to map the output to a probability representing the degree of abnormality of the learner's learning state.
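The fully connected layer group can be sketched directly from the stated sizes; the variable names below are illustrative.

```python
import torch
import torch.nn as nn

# Fully connected head: concat(timing 16, spatial 16, portrait 16) -> 48 -> 16 -> 1
head = nn.Sequential(
    nn.Linear(48, 16), nn.Tanh(),    # first layer 48 x 16 with tanh
    nn.Linear(16, 1), nn.Sigmoid(),  # second layer 16 x 1, sigmoid -> probability
)

timing = torch.randn(8, 16)    # LSTM output
spatial = torch.randn(8, 16)   # GAT output
portrait = torch.randn(8, 16)  # learner portrait features
p = head(torch.cat([timing, spatial, portrait], dim=1))  # abnormality probability
print(p.shape)  # torch.Size([8, 1])
```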
Step 4: LSTM-GAT deep network training based on multi-view features and label reconstruction
The LSTM-GAT deep network training comprises three steps: training data preparation, network parameter updating, and pseudo label reconstruction. In training data preparation, the learning scores are mapped to [0, 1] as noise labels, and the learner state features and learner portrait features are used as training features. Each training iteration comprises two parts, network parameter updating and pseudo label reconstruction: the network parameters are updated during network training; during pseudo label reconstruction, reliable samples are first selected, a subgraph is then constructed around each unreliable sample, and the labels of the reliable samples in the subgraph are aggregated to reconstruct the label of the central unreliable sample. Fig. 5 shows the implementation process of the LSTM-GAT deep network training, which specifically comprises the following steps:
S401. Training data preparation
In order to predict the degree of abnormality of the learner state in stages, each state feature in a semester is combined with the learner portrait feature of the current semester to form a training sample instance.
Specifically, in this embodiment, the data from 2017 to 2019 are divided into a training set and a validation set at a ratio of 9:1, and the data of 2020 are used as the test set; the model is trained on the training set, selected with the validation set, and evaluated on the test set. The specific training process is as follows: the preprocessed and encoded training sample set can be represented as $D = \{(x_e^{(i)}, x_s^{(i)}, \tilde{y}_i)\}$, where $x_e^{(i)}$ denotes a 16-dimensional learner state feature vector, $x_s^{(i)}$ denotes the 16-dimensional learner portrait feature of the semester in which the log segment represented by $x_e^{(i)}$ lies, and $\tilde{y}_i$ is the noise label mapped to $[0, 1]$, computed from the score set $G_k$ of the corresponding learner in semester $k$, with $g_i^{(k)} \in G_k$ denoting the score of the $i$-th subject of the $k$-th semester. The sample features in $D$ are sequentially subjected to feature encoding, state feature sequence construction, and state feature graph construction to obtain training, validation, and test sets that can be processed by the LSTM-GAT network.
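As an illustration of the noise-label mapping, the sketch below normalizes each subject score by its full mark and averages; taking the complement so that larger values indicate greater abnormality is an assumption, since the exact mapping formula is not reproduced here.

```python
import numpy as np

def noise_label(scores, full_marks):
    """Map a semester's subject scores to a [0, 1] noise label.

    Assumption: the label is the complement of the mean normalized score,
    so that larger values indicate a more abnormal learning state."""
    normalized = np.asarray(scores, dtype=float) / np.asarray(full_marks, dtype=float)
    return 1.0 - normalized.mean()

print(noise_label([85, 92, 78], [100, 100, 100]))  # ~0.15
```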
S402. Network parameter updating
Each training iteration of the LSTM-GAT comprises two parts, network parameter updating and pseudo label reconstruction; the network parameter updating part trains on the pseudo labels obtained by noise label reconstruction so as to update the network parameters.
Specifically, in this embodiment, the network parameters are initialized in the Xavier manner to accelerate training and reduce gradient vanishing. Then, the state feature sequence is input into the LSTM network part and the state feature graph into the GAT network part to obtain the learner state timing representation and the learner state spatial representation. The learner state timing representation, the learner state spatial representation, and the learner portrait features are then concatenated and input into the fully connected layer group to obtain the predicted learning state abnormality probability of the learner. Finally, the mean square error between the abnormality probability and the pseudo label obtained by noise label reconstruction is computed, and the network parameters are updated based on the back-propagation algorithm.
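A minimal training-step sketch under these settings follows; the Adam optimizer and the stub model standing in for the LSTM-GAT network are assumptions for illustration.

```python
import torch
import torch.nn as nn

def xavier_init(module):
    """Apply Xavier (Glorot) uniform initialization to all linear layers."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

# `model` stands in for the LSTM-GAT network; here a stub for illustration.
model = nn.Sequential(nn.Linear(48, 16), nn.Tanh(), nn.Linear(16, 1), nn.Sigmoid())
model.apply(xavier_init)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(64, 48)     # concatenated multi-view features
pseudo_labels = torch.rand(64, 1)  # current pseudo labels in [0, 1]

optimizer.zero_grad()
pred = model(features)
loss = nn.functional.mse_loss(pred, pseudo_labels)  # mean square error
loss.backward()                    # back propagation
optimizer.step()
```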
S403. Pseudo label reconstruction
Each training iteration of the LSTM-GAT comprises two parts, network training and pseudo label reconstruction; the pseudo label reconstruction part iteratively reconstructs the noise labels based on the network outputs during training, constructing pseudo labels closer to the true degree of learning state abnormality so as to reduce label noise.
Specifically, in this embodiment, for each sample $x_i$, three pseudo-label discrepancies $e_s$, $e_g$, and $e_l$ are first computed based on the timing local continuity, spatial local consistency, and prediction error of the sample, where $\sum_j S_{ij}$ is the sum of the elements of the $i$-th row of the similarity matrix $S$, $\sum_j D_{ij}$ is the sum of the elements of the $i$-th row of the matrix $D$, $y'_i$ denotes the pseudo label constructed from the noise label $\tilde{y}_i$, $SEQ_i$ denotes the index list of samples in the sequence of sample $x_i$, and $D_{ij}$ denotes the number of time segments by which samples $x_i$ and $x_j$ differ in the sequence. A sample is considered reliable if $(e_s < 0.01) \wedge (e_g < 0.01) \wedge (e_l < 0.01)$, and unreliable otherwise; the reliable samples form the set $D_{clean}$ used for subsequent training and pseudo label reconstruction. Finally, each unreliable sample is taken as a central sample, a subgraph is constructed from the reliable samples neighboring it on the state feature graph, and the pseudo labels of the reliable samples in the subgraph are aggregated to update the pseudo label of the central sample.
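The reconstruction step can be sketched as below, assuming the aggregation is a similarity-weighted mean of the reliable neighbors' pseudo labels on the state feature graph; the function names and weighting scheme are illustrative.

```python
import numpy as np

def reconstruct_pseudo_labels(pseudo, reliable, S, k=20):
    """For each unreliable sample, aggregate the pseudo labels of reliable
    neighbors on the state feature graph (similarity-weighted mean).
    Samples with no reliable neighbor among their k nearest are skipped."""
    new = pseudo.copy()
    for i in np.where(~reliable)[0]:
        nbrs = np.argsort(S[i])[::-1][:k]           # k most similar samples
        nbrs = nbrs[reliable[nbrs]]                 # keep reliable ones only
        if len(nbrs) == 0:
            continue                                # too much local noise: skip
        w = S[i, nbrs]
        new[i] = np.dot(w, pseudo[nbrs]) / w.sum()  # weighted aggregation
    return new

S = np.random.rand(100, 100); S = (S + S.T) / 2
pseudo = np.random.rand(100)
reliable = np.random.rand(100) > 0.3
updated = reconstruct_pseudo_labels(pseudo, reliable, S)
```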
Step 5: learner abnormal learning state prediction
A trained learner abnormal learning state prediction model (fig. 7) is obtained through step 4, and the probability of abnormality in the learning state of a learner to be predicted can be estimated based on this model.
Specifically, in this embodiment, the learner registration information and learner log information to be predicted are first preprocessed as in step 1) and step 2) to obtain the state feature encoding $x_e$, the learner portrait feature encoding $x_s$, and the state feature sequence SEQ (fig. 6, S501). Second, $x_e$ and its preamble are taken as input, and the state feature timing representation of the sample is obtained through the LSTM network (fig. 6, S502). Third, the similarity between $x_e$ and the state features of the training samples is computed, the 10 nearest neighbors in the training set are found, and the state feature spatial characterization of the sample to be predicted is represented by the similarity-weighted aggregation of the GAT network outputs of these 10 nearest neighbors (fig. 6, S503). Finally, the state feature timing representation, the state feature spatial representation, and the learner portrait feature are connected as the input of the fully connected layer group, and the learner state abnormality probability $p_x$ is obtained through the fully connected layer group (fig. 6, S504).
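Step S503 can be illustrated with the following sketch, which aggregates the GAT outputs of the 10 nearest training samples weighted by cosine similarity; the names and the exact weighting are assumptions.

```python
import numpy as np

def spatial_representation(x_e, train_feats, train_gat_out, k=10):
    """Represent a new sample's spatial characterization as the
    similarity-weighted aggregation of its k nearest training samples'
    GAT outputs (a sketch of S503 above; names are illustrative)."""
    norm = lambda v: v / np.linalg.norm(v, axis=-1, keepdims=True)
    sims = norm(train_feats) @ norm(x_e)   # cosine similarity to training set
    nbrs = np.argsort(sims)[::-1][:k]      # 10 nearest neighbors
    w = sims[nbrs]
    return (w[:, None] * train_gat_out[nbrs]).sum(0) / w.sum()

train_feats = np.random.randn(500, 16).astype(np.float32)
train_gat = np.random.randn(500, 16).astype(np.float32)
x_new = np.random.randn(16).astype(np.float32)
rep = spatial_representation(x_new, train_feats, train_gat)
print(rep.shape)  # (16,)
```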
It will be understood by those skilled in the art that the foregoing is only exemplary of the method of the present invention and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A learner abnormal learning state prediction method facing online education is characterized by comprising the following steps:
firstly, the log information of the high-dimensional online education platform and the learner registration information are preprocessed and encoded, and learner portrait features are constructed based on a self-supervised learning method; secondly, the learner log information is segmented based on a time window and separately encoded to construct learner state features, a state feature sequence is further constructed based on the generation time order of the state features, and a state feature graph is constructed based on the cosine similarity between state features; thirdly, a long short-term memory-graph attention (LSTM-GAT) deep network suited to predicting the degree of abnormal learning in online education is constructed, and the number of network layers, the number of neurons per layer, and the input and output dimensions are determined based on the constructed learner state features, state feature sequence, and state feature graph; then, the learning scores mapped to [0, 1] are used as noise labels, and pseudo labels are constructed on the basis of the noise labels to iteratively train the network: in each iteration the network is first optimized based on the multi-view features and pseudo labels, then reliable samples are selected based on the timing local continuity, spatial local consistency, and sample prediction error of the state features, a subgraph centered on each unselected unreliable sample is constructed in the state feature graph, and finally the reliable sample labels are aggregated in each subgraph to reconstruct the pseudo label of the central sample, the reconstructed pseudo labels being used for the next training iteration; and finally, the trained network is used to predict the abnormal learning state and its degree for learners in the learning stage to be predicted.
2. The method as claimed in claim 1, wherein the method for predicting the abnormal learning state of the learner for online education comprises the following steps:
1) learner enrollment information and journal information processing
The learner registration information and learner log information are preprocessed to serve as the learner's initial features and encoded to give them a unified mathematical representation; a masking autoencoder is then trained based on a self-supervised method, and the encoded features are re-encoded by the masking autoencoder to achieve feature dimensionality reduction, thereby obtaining the learner portrait features;
2) log fragment information processing and state feature sequence and state feature graph construction
Dividing the learner log information into a plurality of log information segments based on a time window, and carrying out preprocessing and encoding on the log information segments similar to the log information processing in the step 1) to obtain learner state characteristics; constructing a state feature sequence by using the state features of the same learner based on a time sequence generated by the log of the learner, calculating feature similarity among the state features of all learners and constructing a state feature graph based on a k-nearest neighbor method;
3) LSTM-GAT deep network construction
Constructing a deep network group consisting of parallel LSTM and GAT networks to extract timing and spatial representations respectively, their inputs coming from the state feature sequence and the state feature graph respectively; the vectors output after processing by the LSTM and GAT deep network group are connected with the learner portrait feature vector and input to a fully connected layer group, and the fully connected layer group outputs the learning state abnormality probability of the learner;
4) LSTM-GAT deep network training based on multi-view feature and label reconstruction
Mapping the learning achievement to [0, 1] as a noise label, taking the learner state characteristic and the learner portrait characteristic as training characteristics, dividing each training iteration into two parts, namely network parameter updating and pseudo label reconstruction, and updating the network parameters by the pseudo label constructed based on the noise label during model training; when reconstructing the pseudo label, firstly selecting a reliable sample, then constructing a subgraph by taking an unreliable sample as a center, and aggregating the pseudo labels of the reliable sample in the subgraph to reconstruct the pseudo label of the unreliable sample at the center;
5) learner abnormal learning state prediction
The learner registration information and log information to be predicted are processed and encoded as the input of the trained LSTM-GAT model, which after processing outputs the probability that the learner's learning state is abnormal.
3. The method as claimed in claim 2, wherein the learner registration information and log information processing in step 1) comprises the following steps:
step 1: learner enrollment information processing
Aiming at the problem that the original learner registration information contains many redundant and missing fields, the gender, date of birth, identity type, highest education level, graduating university, and location, which are highly relevant to the learner and rarely missing, are selected from the fields as initial features; meanwhile, only the birth year of the date of birth is retained, and the birth years are mapped into category features at intervals of 10 years; the locations are mapped to six categories (first-tier, new first-tier, second-tier, third-tier, fourth-tier, and fifth-tier cities) as features according to the "2020 urban commercial charm ranking list" of the new first-line city institute, thereby obtaining a unified mathematical representation of time and category features; then, the above features are encoded by one-hot encoding and concatenated as the learner registration information features;
the one-hot encoding method comprises the following detailed steps:
s1. Let the set of category features be $\{c_1, c_2, \ldots, c_{l_c}\}$; for a category feature $c_i$ with $n_i$ possible values, provide an $n_i$-bit state register $M_i$;
s2. Each bit in $M_i$ indicates whether one value of $c_i$ is valid, with 1 denoting valid and 0 denoting invalid;
s3. For all $i \in \{1, 2, 3, \ldots, l_c\}$, encode the $c_i$ in turn to obtain the one-hot vectors of all category features;
step 2: log information processing
The statistical characteristics of the log information of each semester reflect the learner's general state in that semester; therefore, the statistics of the total number of course video views, the total course video viewing time, the number of courses studied, the average single viewing time, the total number of comment interactions, the average number of comment interactions per video, the number of video pauses, the average video viewing interval, and the average viewing interval of videos within the same chapter in the current semester are constructed from the log information of each semester using the summing, averaging, and counting aggregation functions, and the numerical features are mapped based on z-score standardization so that their mean is 0 and standard deviation is 1; the redundant and miscellaneous characteristics contained in the log information, which are difficult to use directly, are thereby converted into directly usable learner statistical log features with a unified mathematical representation;
the z-score method comprises the following specific steps:
s1. Let the set of numerical log features be $\{v_1, v_2, \ldots, v_{l_v}\}$; for a numerical feature $v_i$, compute the sample mean $\mu_i$ and the sample standard deviation $\sigma_i$;
s2. Standardize the feature $v_i$ according to the z-score equation:

$$v_i' = \frac{v_i - \mu_i}{\sigma_i}$$

s3. For all $i \in \{1, 2, 3, \ldots, l_v\}$, standardize the $v_i$ in turn to obtain the standardized representations of all numerical features;
step 3: learner portrait feature coding construction
The learner registration information and the log information of a given semester together macroscopically describe the learner's general state in that semester, and the general learning state has an important reference function for predicting the learner's learning state; therefore, the learner registration information and log information are combined and then encoded to construct a learner portrait feature capable of describing the learner's general learning state. The learner portrait feature encoding construction first connects the features generated in Step 1 and Step 2 into an initial learner feature describing the learner, then constructs a masking autoencoder consisting of a linear encoder and a linear decoder to process the sparse features, mining the deep associations of the initial learner feature by partially masking and restoring it. The configured encoder consists of an input layer and two hidden layers, the first hidden layer consisting of $h_1$ neurons and the second hidden layer of $h_2$ neurons. Denoting the length of the initial learner feature vector as $l_s$, the input layer dimension is $l_s$, the first hidden layer is a linear layer of size $l_s \times h_1$ with a tanh activation function, and the second hidden layer is a linear layer of size $h_1 \times h_2$. The decoder consists of two layers whose structure and order are the reverse of the encoder's, and a batch normalization layer is placed before the activation to improve model convergence, where the tanh activation function is formally expressed as:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

The feature vector is input into the encoder network after random masking; in training, the mask rate grows in steps of $t$ along a configured sequence $M_t$, where $t$ is a hyperparameter controlling the growth rate of the mask rate and $M_t$ is a hyperparameter sequence controlling the growth schedule. The loss function is set to the mean square error loss: denoting the decoder output vector as $\hat{x}_s$ and the original feature vector as $x_s$, the mean square error loss function is expressed as:

$$L_{MSE} = \frac{1}{l_s} \sum_{i=1}^{l_s} (\hat{x}_{s,i} - x_{s,i})^2$$

After training, the encoder of the masking autoencoder is taken out as the learner portrait feature encoder, and all initial learner features are encoded to obtain the learner portrait features.
4. The method as claimed in claim 3, wherein the step 2) of processing the log segment information and constructing the state feature sequence and the state feature diagram specifically comprises the following steps:
step 1: log fragment information processing
The learning state of a learner is reflected by the learning situation within a period of time; the log segment information processing divides the log information into log information segments according to a configured time window, and the log information within a time segment reflects the learner's learning state in that segment;
specifically, the method comprises the steps of dividing log information of each learner into log information segments according to weeks, processing log information similarly, constructing statistical data of the total video watching times of a chapter, the total video watching time of a course, the number of courses participating in learning, the average single watching time, the total comment interaction times, the average single video comment interaction times, the video pause times, the average video watching interval and the average video watching interval in each chapter for the learners per week, encoding the statistical data according to a log information processing mode, and finally constructing a masking self-encoder consisting of a linear encoder and a linear decoder, wherein the number of encoder layers and the number of neurons in each layer are the same as those of image feature codes of the learners, and the length of a coded feature vector is l p Then dimension of input layer is l p The linear layer sizes of the two hidden layer neurons are respectively
Figure FDA0003634473680000051
And
Figure FDA0003634473680000052
the rest networks and training settings are the same as the construction of the learner portrait feature codes mentioned above; embedding the linked features by using a trained encoder to obtainLearner state characteristics;
step 2: state signature sequence construction
In order to make full use of this information, the state feature sequence is constructed based on the learning state generation time: first, the state features of each learner are sorted in the order of their generation times to construct the learner's complete state feature sequence; then, for each learner state feature, the $l_b$ state features preceding it are selected to form a state feature sequence together with the feature itself, so that the log segment information of the previous $l_b$ weeks can assist prediction; for a state feature with an insufficient number of preceding state features, its preamble is padded with a configured value to facilitate batch processing in training; a sequence containing the learner state feature and its preamble nodes is thereby constructed for each learner state feature, facilitating the mining of data timing information in model training;
step 3: state feature graph construction
Embedding the log segment information into a representation space and constructing a state feature graph based on sample similarity, so that label disambiguation can be performed and similar-sample associations mined during learning by using the spatial local consistency of the log segments; let the number of learner state feature samples be $n$ and $S \in \mathbb{R}^{n \times n}$ be the state feature similarity matrix, where $S_{ij}$ denotes the cosine similarity between state features $x_e^{(i)}$ and $x_e^{(j)}$:

$$S_{ij} = \frac{x_e^{(i)} \cdot x_e^{(j)}}{\lVert x_e^{(i)} \rVert \, \lVert x_e^{(j)} \rVert}$$

the graph is constructed based on cosine similarity and the k-nearest-neighbor method, that is, each sample point in the state feature embedding space is connected to its $k$ most similar sample points to form an undirected graph; denoting the adjacency matrix of the constructed graph as $A_{n \times n}$, and denoting by $S^{DSC}_i = \mathrm{DSC}(S_i)$ the vector obtained by sorting the elements of $S_i$ in descending order, where DSC denotes the descending sort operator, the graph structure is formalized as:

$$A_{ij} = \begin{cases} 1, & S_{ij} \ge S^{DSC}_{i,k} \\ 0, & \text{otherwise} \end{cases}$$

thereby obtaining the state feature graph and its adjacency matrix representation.
5. The method as claimed in claim 4, wherein the constructing of the LSTM-GAT deep network in the step 3) comprises the following steps:
step 1: LSTM network construction
From the learner state features, a time-ordered learner state feature sequence is obtained through the state feature sequence construction, and a two-layer LSTM network is constructed to learn the information in the learner state feature sequence so as to mine the data timing information; the network takes the state feature sequence as input and outputs the learner state timing representation; the input layer dimension of the LSTM neural unit is set to $i_{lstm}$, the hidden layer contains $c_{lstm}$ neurons, and the output layer dimension is $o_{lstm}$;
Step 2: GAT network construction
From the learner state features, a learner state feature graph based on the cosine similarity of the state features in the feature space is obtained through the state feature graph construction, and a two-layer GAT network with a multi-head attention mechanism is constructed to learn the learner state feature graph so as to mine its relational information in the feature space; LeakyReLU is used as the activation function, the state feature graph is taken as the network input, and the features of the similar nodes of each node are aggregated to obtain the learner state spatial representation;
step 3: fully connected layer set construction
A fully connected layer group consisting of two fully connected layers is constructed to integrate all encoded information to predict the degree of abnormality of the learner state; the first fully connected layer has size $l_{in} \times h_{concat}$, where $l_{in}$ is the length of the concatenated input vector, and uses tanh as the activation function; the second fully connected layer has size $h_{concat} \times 1$ and uses sigmoid as the activation function to output a probability representing the degree of abnormality of the learner's learning state; the input is the connection of the outputs of the LSTM, the GAT network, and the learner portrait feature encoder; the feature vectors output by these three parts provide the learner's timing information, spatial information, and general state information respectively, and the fully connected layer group simultaneously uses the information of these three dimensions to predict the degree of abnormality of the learner state;

the sigmoid function is formally expressed as:

$$\mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}}$$

wherein $z$ is the output of the second fully connected layer and $\mathrm{sigmoid}(z)$ is the probability prediction of the learner's abnormal learning state.
6. The method as claimed in claim 5, wherein the vector $h_i^{(l-1)}$ of layer $l-1$ of the GAT network is aggregated at layer $l$ as:

$$h_i^{(l)} = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W h_j^{(l-1)}\Big)$$

wherein $h_i$ is a sample point in the state feature graph, $N_i$ is the neighbor vector index list of sample $i$, $\alpha_{ij}$ is the attention coefficient, and $W$ is a parameter matrix;

the attention coefficient calculation is formalized as:

$$\alpha_{ij} = \mathrm{softmax}_j\Big(\mathrm{LeakyReLU}\big(a^{T}\,\mathrm{concat}(W h_i, W h_j)\big)\Big)$$

wherein concat denotes the concatenation operation, $\cdot^{T}$ denotes the matrix transpose operation, and $a$ is a weight matrix connecting two layers in a single-layer feedforward neural network;

the LeakyReLU function is formalized as:

$$\mathrm{LeakyReLU}(x) = \begin{cases} x, & x \ge 0 \\ kx, & x < 0 \end{cases}$$

wherein $k$ is a hyperparameter representing the slope.
7. The method as claimed in claim 6, wherein the step 4) of the LSTM-GAT deep web training based on multi-view feature and label reconstruction specifically comprises the following steps:
step 1: training data preparation
In order to predict the degree of abnormality of the learner state in stages, each state feature in a semester is combined with the learner portrait feature of that semester to form a training sample instance, and the preprocessed and encoded training sample set is represented as $D = \{(x_e^{(i)}, x_s^{(i)}, \tilde{y}_i)\}$, wherein $x_e^{(i)}$ denotes the learner state feature, $x_s^{(i)}$ is the learner portrait feature of the semester in which the log segment represented by $x_e^{(i)}$ lies, and $\tilde{y}_i$ is the noise label of that semester; the noise label is generated by mapping the learner's average learning performance over the semester to $[0, 1]$: denoting the per-subject score set of a learner's $k$-th semester as $G_k = \{g_1^{(k)}, g_2^{(k)}, \ldots\}$, the $k$-th semester noise label $\tilde{y}^{(k)}$ is computed from the average of the scores $g_i^{(k)}$ normalized by the corresponding full scores, wherein $g_i^{(k),max}$ denotes the full score of the $i$-th subject of the $k$-th semester; the sample features in $D$ are sequentially subjected to feature encoding, state feature sequence construction, and state feature graph construction to obtain a training data set with a unified mathematical representation;
step 2: network parameter update
The initialization of the network parameters is very important for improving the training speed and convergence of a deep learning network, so the network parameters are initialized in the Xavier manner to accelerate training and reduce gradient vanishing; denoting the input dimension of the layer containing parameter $w$ as $i$ and its output dimension as $o$, the Xavier initialization method is formally expressed as:

$$w \sim U\left(-\sqrt{\frac{6}{i + o}}, \; \sqrt{\frac{6}{i + o}}\right)$$
in order to reduce noise influence, constructing a pseudo label based on a noise label in network training for training, initializing the pseudo label as the noise label at the initial stage of training, inputting a state feature diagram, a state feature sequence, a learner image drawing feature and the pseudo label into a network in each training iteration process, setting a loss function as a mean square error, and updating network parameters based on a back propagation algorithm;
step 3: pseudo-tag reconstruction
Because the abnormal degrees of the states of the learner in different learning stages are different, and the labels generated for each stage based on the academic achievement are the same, label noise is brought to training samples, so that the labels need to be refined in the training process, and a pseudo label closer to the abnormal degree of the learning states is constructed to reduce the label noise;
specifically, after the parameters of each iteration are updated, the pseudo labels with low reliability are updated based on the pseudo label information with high reliability so as to reduce label noise, firstly, reliable samples are screened based on the time sequence local continuity, the space local consistency and the sample prediction error of the state features, the intersection of the reliable samples is used as a reliable training sample so as to achieve label disambiguation of three state feature view angles, and the selected reliable sample set is formally expressed as follows:
Figure FDA0003634473680000091
wherein
Figure FDA0003634473680000092
And
Figure FDA0003634473680000093
reliable sample sets selected based on time sequence local continuity, spatial local consistency and sample prediction errors respectively; specifically, the time sequence local continuity means that the learner has continuous learning states in a short period of continuous time, so that the continuous samples in the state feature sequence have similar labels, and therefore, the reliability of the samples in the time sequence is measured by using the pseudo label difference of the continuous samples in the state feature sequence; the spatial local consistency means that similar samples in the state feature diagram have similar state features, and the samples with similar features have similar labels, so the reliability of the samples in space is measured by using the pseudo label difference of the similar samples on the state feature diagram; meanwhile, the mean square error is used for measuring the error between the sample prediction and the pseudo label and the difference between the sample pseudo labels; then
Figure FDA0003634473680000094
And
Figure FDA0003634473680000095
the formalization of the collection is represented as:
Figure FDA0003634473680000096
Figure FDA0003634473680000097
Figure FDA0003634473680000098
wherein, tau g 、τ s And τ r For three hyper-parameters representing threshold values, reliable sample selection based on time sequence local continuity, spatial local consistency and sample prediction error is controlled respectively;
Figure FDA0003634473680000099
being the sum of the elements of the ith row of the matrix S,
Figure FDA00036344736800000910
is the sum of the elements of row i of matrix D, x i Denotes the ith training sample, y' i The representation is based on
Figure FDA00036344736800000911
Constructed pseudo-tag of SEQ i Represents a sample x i List of indexes of sample in sequence, D ij Represents a sample x i And x j The number of time segments that differ in sequence, thereby obtaining a reliable sample set
Figure FDA00036344736800000912
For subsequent training and pseudo label reconstruction;
then, after each iteration, aggregating information of peripheral reliable samples for the unreliable samples to update labels of the unreliable samples, selecting the unreliable samples as a center, and selecting a reliable adjacent sample structure subgraph on a state feature graph of the unreliable samples; if all the neighbors of the unreliable sample are unreliable, the fact that local label noise taking the sample as a center is too much is shown, effective information is few, therefore, pseudo label reconstruction of the sample is skipped, pseudo label reconstruction on a subgraph is expanded based on a label propagation method, and formalization expression is as follows:
Figure FDA0003634473680000101
wherein
Figure FDA0003634473680000102
Representing a noise sample x i List of reliable samples in the neighborhood.
8. The method as claimed in claim 7, wherein the step 5) of predicting the abnormal learning state of the learner specifically comprises the following steps:
The learner registration information and learner log information to be predicted are processed through step 1) and step 2) to obtain the state feature encoding $x_e$, the learner portrait feature encoding $x_s$, and the state feature sequence SEQ as the network input; $x_e$ and its preamble are input into the LSTM network part, and the output of the LSTM part is taken as the state feature timing representation of the sample; $x_e$ is embedded into the graph space of the training samples to find its $k$ neighbors, and the state feature spatial representation of the sample is obtained from the similarity-weighted aggregation of the GAT network outputs of the $k$ neighbors, formally expressed as:

$$GAT(x_e) = \frac{\sum_{x \in N_{train}} S(x_e, x) \, GAT(x)}{\sum_{x \in N_{train}} S(x_e, x)}$$

wherein $GAT(x)$ denotes the output of sample $x$ after processing by the GAT network, $N_{train}$ denotes the neighbor set of sample $x_e$ in the training set, and $S(x_1, x_2)$ denotes the cosine similarity between samples $x_1$ and $x_2$; the state feature timing representation output by the LSTM part, the state feature spatial representation output by the GAT part, and the learner portrait feature encoding are connected as the input of the fully connected layer group, and the learner state abnormality probability $p_x \in [0, 1]$ is obtained through the fully connected layer group; the larger $p_x$ is, the higher the degree of abnormality of the learner's state, and vice versa.