CN115730244A - Classroom behavior classification method and device combining text classification and sequence labeling - Google Patents


Info

Publication number
CN115730244A
Authority
CN
China
Prior art keywords
classification
teaching
classroom
training
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211427705.2A
Other languages
Chinese (zh)
Inventor
林晓
沈锴成
阙维俏
李岩
王龚
张波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Normal University
Original Assignee
Shanghai Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Normal University
Priority to CN202211427705.2A
Publication of CN115730244A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application discloses a classroom behavior classification method and device combining text classification and sequence labeling. The method comprises the following steps: acquiring teaching video data and performing voice transcription on the teaching video data to form an initial corpus; preprocessing the initial corpus to obtain chapter data, and inputting the chapter data into an unsupervised pre-training model for incremental pre-training; constructing a combined loss function of text classification and sequence labeling to train on the chapter data, and predicting the chapter data by using a supervised pre-training model to obtain a teaching behavior classification label matrix; and analyzing the teaching behavior classification label matrix to obtain a classroom behavior classification combining text classification and sequence labeling. According to the embodiment of the application, analysis and classification are performed quickly and accurately with respect to computing resources and data volume, and classroom teaching behaviors are classified efficiently and accurately.

Description

Classroom behavior classification method and device combining text classification and sequence labeling
Technical Field
The embodiment of the application relates to the technical field of classroom behavior classification, in particular to a classroom behavior classification method and device combining text classification and sequence labeling.
Background
Classroom teaching reform in China is entering a stage of promoting deep learning by students. It is necessary not only to evaluate the effectiveness of external forms of teaching, such as autonomous, cooperative and exploratory learning, but also to analyze the effect of the teaching language, which occupies the most time in classroom teaching, and to evaluate how effectively it promotes students' autonomous participation and the development of their higher-order thinking ability. Flanders, an American scholar, stated that teachers and students in classroom teaching activities communicate primarily through language, which accounts for 80% of overall classroom teaching activity. However, conventional teaching language analysis can no longer meet the needs of the education field in today's online classroom setting. In conventional analysis, the teaching language is manually recorded and then manually coded and labeled. At present, speech recognition and natural language processing technologies in artificial intelligence can replace this process: empowered by more efficient and accurate information technology, the classroom language information in online classes and other classroom video recordings is segmented, recognized, classified, coded and counted, and on this basis objective, factual and global teaching evaluation is provided for the classes of new and experienced teachers. However, existing classroom teaching analysis technology cannot analyze and classify quickly and accurately with respect to computing resources and data volume, and its classification prediction effect is poor.
Disclosure of Invention
The embodiment of the application provides a classroom behavior classification method and device combining text classification and sequence labeling, so that analysis and classification are performed quickly and accurately with respect to computing resources and data volume, and classroom teaching behaviors are classified efficiently and accurately.
In a first aspect, an embodiment of the present application provides a classroom behavior classification method combining text classification and sequence labeling, where the method includes the following steps:
acquiring teaching video data and performing voice transcription on the teaching video data to form initial corpus;
preprocessing the initial corpus to obtain chapter data, and inputting the chapter data into an unsupervised pre-training model for incremental pre-training;
constructing a combined loss function of text classification and sequence labeling to train the chapter data, and predicting the chapter data by using a supervised pre-training model to obtain a teaching behavior classification label matrix;
and analyzing the teaching behavior classification label matrix to obtain classroom behavior classification combining text classification and sequence labeling.
Further, before inputting the chapter data into the unsupervised pre-training model for incremental pre-training, the method further includes:
constructing sentence-level contrastive learning samples, predicting three types of loss function on the contrastive learning samples through a softmax classifier, and summing them to obtain the overall loss function of the NSP task:
$L_{cNSP} = \alpha L_{NSP} + \beta L_{PSP} + \sigma L_{RSP}$
[Equation images BDA0003945013470000021 to BDA0003945013470000023: definitions of $L_{NSP}$, $L_{PSP}$ and $L_{RSP}$; not reproduced in this text]
where $L_{cNSP}$ is the overall loss function of the NSP task after improvement by the contrastive learning method; $L_{NSP}$, $L_{PSP}$ and $L_{RSP}$ are the loss functions of the three subtasks under the cNSP task; $\alpha$, $\beta$ and $\sigma$ are the weight coefficients of the three subtasks in the overall loss function; h is the output of the last hidden layer of the neural network under the different subtasks; softmax() denotes normalizing with the softmax function and calculating the cross-entropy loss of the result;
constructing word-level contrastive learning samples, wherein the loss function of the word-level task is:
[Equation image BDA0003945013470000024: definition of $L_{cMLM}$; not reproduced in this text]
where $L_{cMLM}$ represents the overall loss function of the MLM task after contrastive learning, $\theta$ represents the parameter set of the model, k represents the length of each scrambled subsequence, and pos represents the position embedding that specifies the word position in the Bert model input.
Further, the inputting the chapter data into an unsupervised pre-training model for incremental pre-training includes:
the loss function for the unsupervised training phase is:
$L = \eta_1 L_{cNSP} + \eta_2 L_{cMLM}$
where $\eta_1$ and $\eta_2$ are the weight coefficients of the corresponding subtasks;
inputting the chapter data into a loss function of an unsupervised training phase to complete incremental training.
Further, the constructing a combined loss function of text classification and sequence annotation trains the chapter data, including:
in the sequence labeling task, a Bi-LSTM + CRF model structure is used to complete the task, so the loss function of this stage is generated by the CRF, and the loss function of the sequence labeling task is:
$y_{st} = \mathrm{CRF}(H_{st})$;
where $H_{st}$ is the [CLS] sequence obtained after Bert encoding, and CRF() denotes applying a conditional random field to $H_{st}$, thereby obtaining $y_{st}$;
the loss function of the text classification task is:
[Equation defining $y_{lc}$ from $W_{lc}$, $H_{lc}$ and $b_{lc}$; not reproduced in this text]
where $W_{lc}$ is the weight matrix of the classification task, $H_{lc}$ is the sentence text representation used for the text classification task, and $b_{lc}$ is the bias, from which $y_{lc}$ is derived;
the joint loss function used for training is obtained by combining the loss functions of the two tasks of sequence labeling and text classification:
$y_{all} = \alpha y_{lc} + \beta y_{st}$
inputting the chapter data into the joint loss function for training.
Further, the predicting the chapter data by using the supervised pre-training model to obtain a teaching behavior classification label matrix includes:
after the training is finished, predicting the chapter data by using a supervised pre-training model to obtain a group of label data;
arranging the chapter data according to a template of the teaching behavior classification label matrix to obtain a teaching behavior classification label matrix $W_{teach}$.
The teaching behavior classification label matrix is an m x n matrix whose size (namely the values of m and n) differs for classes of different lengths; m is the sentence batch size and n is the number of sentence batches, and the matrix as a whole depicts the order of the teaching behavior labels across n batches of the same batch size m.
Further, the analyzing the teaching behavior classification label matrix to obtain a classroom behavior classification combining text classification and sequence labeling includes:
counting the percentage of the teacher and the student occupying the total language of the class;
counting the teacher-student interaction time;
counting the percentage of the specific teaching behaviors in the total teaching behaviors;
and analyzing the classroom teaching mode according to the percentage of the teacher and student language to the total classroom language, the teacher and student interaction time and the percentage of the specific teaching behavior to the total teaching behavior.
Further, the preprocessing the initial corpus includes:
performing a first segmentation operation on the initial corpus to obtain first segmentation data;
and performing a second segmentation operation on the first segmentation data to obtain chapter data.
In a second aspect, an embodiment of the present application provides a classroom behavior classification device combining text classification and sequence labeling, the device including:
the data acquisition module is used for acquiring teaching video data and performing voice transcription on the teaching video data to form initial corpus;
the first processing module is used for preprocessing the initial corpus to obtain chapter data, and inputting the chapter data into an unsupervised pre-training model for incremental pre-training;
the second processing module is used for constructing a combined loss function of text classification and sequence annotation to train the chapter data, and predicting the chapter data by using a supervised pre-training model to obtain a teaching behavior classification label matrix;
and the behavior classification module is used for analyzing the teaching behavior classification label matrix to obtain the classroom behavior classification combining text classification and sequence labeling.
In a third aspect, an embodiment of the present application further provides a computer device, including: a memory and one or more processors;
the memory is configured to store one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the classroom behavior classification method combining text classification and sequence labeling as described above.
In a fourth aspect, embodiments of the present application further provide a storage medium containing computer-executable instructions for performing a method of classroom behavior classification combining text classification and sequence labeling as described above when executed by a computer processor.
According to the embodiment of the application, teaching video data is acquired and subjected to voice transcription to form an initial corpus; the initial corpus is preprocessed to obtain chapter data, and the chapter data is input into an unsupervised pre-training model for incremental pre-training; a combined loss function of text classification and sequence labeling is constructed to train on the chapter data, and the chapter data is predicted by using a supervised pre-training model to obtain a teaching behavior classification label matrix; and the teaching behavior classification label matrix is analyzed to obtain a classroom behavior classification combining text classification and sequence labeling. Analysis and classification are thereby performed quickly and accurately with respect to computing resources and data volume, and classroom teaching behaviors are classified efficiently and accurately.
Drawings
Fig. 1 is a flowchart of a classroom behavior classification method combining text classification and sequence labeling according to an embodiment of the present application;
Fig. 2 is a diagram of an algorithmic model framework provided by an embodiment of the present application;
Fig. 3 is a diagram illustrating the effect of teaching evaluation using a teaching behavior classification label matrix provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a classroom behavior classification device combining text classification and sequence labeling according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application. It should be further noted that, for the convenience of description, only some but not all of the relevant portions of the present application are shown in the drawings. Before discussing exemplary embodiments in greater detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The present application establishes a classroom behavior classification method combining text classification and sequence labeling, so that analysis and classification are performed quickly and accurately with respect to computing resources and data volume, and classroom teaching behaviors are classified efficiently and accurately.
The classroom behavior classification method combining text classification and sequence labeling provided in the embodiment can be executed by a classroom behavior classification device combining text classification and sequence labeling, and the classroom behavior classification device combining text classification and sequence labeling can be realized in a software and/or hardware mode and is integrated in classroom behavior classification equipment combining text classification and sequence labeling. The classroom behavior classification device combining text classification and sequence labeling can be a computer or the like.
Fig. 1 is a flowchart of a classroom behavior classification method combining text classification and sequence labeling according to an embodiment of the present application. Referring to fig. 1, the method comprises the steps of:
Step 110: acquiring teaching video data and performing voice transcription on the teaching video data to form an initial corpus.
For example, since the object of the teaching evaluation in the embodiment of the present application is a course, the entire corpus needs to be segmented into chapter-level classroom language texts when the initial corpus is preprocessed. On this basis, each chapter-level classroom language text is segmented into sentences. The length of a sentence generally does not exceed 64 characters, and sentences exceeding 64 characters are split further. Since simple splitting may damage the semantic consistency of a sentence, the embodiment of the application completes this work by manual labeling.
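The sentence-length rule can be illustrated with a minimal Python sketch. It only flags sentences longer than 64 characters for manual splitting, since the text states that the splitting itself is done by hand. The punctuation-based splitting rule and the function names `split_sentences` and `flag_long_sentences` are assumptions made for illustration and do not come from the application.

```python
import re

MAX_SENTENCE_CHARS = 64  # length limit stated in the text


def split_sentences(chapter_text):
    """Split chapter-level text after sentence-final punctuation.

    The punctuation set is an assumed rule, not one given in the application.
    """
    parts = re.split(r"(?<=[。！？!?])\s*", chapter_text)
    return [p for p in parts if p]


def flag_long_sentences(sentences, limit=MAX_SENTENCE_CHARS):
    """Pair each sentence with a flag saying whether it needs manual splitting."""
    return [(s, len(s) > limit) for s in sentences]
```

Sentences flagged `True` would then be handed to an annotator rather than cut automatically, which matches the stated concern about semantic consistency.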
Step 120: preprocessing the initial corpus to obtain chapter data, and inputting the chapter data into an unsupervised pre-training model for incremental pre-training.
Specifically, the preprocessing of the initial corpus includes: performing a first segmentation operation on the initial corpus to obtain first segmentation data;
and performing a second segmentation operation on the first segmentation data to obtain chapter data.
Specifically, before inputting the chapter data into the unsupervised pre-training model for incremental pre-training, the method further includes:
establishing an unsupervised pre-training model, which specifically comprises the following steps:
constructing sentence-level contrastive learning samples, predicting three types of loss function on the contrastive learning samples through a softmax classifier, and summing them to obtain the overall loss function of the NSP task:
$L_{cNSP} = \alpha L_{NSP} + \beta L_{PSP} + \sigma L_{RSP}$
[Equation images BDA0003945013470000071 to BDA0003945013470000073: definitions of $L_{NSP}$, $L_{PSP}$ and $L_{RSP}$; not reproduced in this text]
where $L_{cNSP}$ is the overall loss function of the NSP task after improvement by the contrastive learning method; $L_{NSP}$, $L_{PSP}$ and $L_{RSP}$ are the loss functions of the three subtasks under the cNSP task; $\alpha$, $\beta$ and $\sigma$ are the weight coefficients of the three subtasks in the overall loss function; h is the output of the last hidden layer of the neural network under the different subtasks; softmax() denotes normalizing with the softmax function and calculating the cross-entropy loss of the result;
constructing word-level contrastive learning samples, wherein the loss function of the word-level task is:
[Equation image BDA0003945013470000074: definition of $L_{cMLM}$; not reproduced in this text]
where $L_{cMLM}$ represents the overall loss function of the MLM task after contrastive learning, $\theta$ represents the parameter set of the model, k represents the length of each scrambled subsequence, and pos represents the position embedding that specifies the word position in the Bert model input.
Further, the inputting the chapter data into the unsupervised pre-training model for incremental pre-training includes:
the loss function for the unsupervised training phase is:
$L = \eta_1 L_{cNSP} + \eta_2 L_{cMLM}$
where $\eta_1$ and $\eta_2$ are the weight coefficients of the corresponding subtasks;
inputting the chapter data into a loss function of an unsupervised training phase to complete incremental training.
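The two weighted sums used in the unsupervised stage can be sketched in a few lines of Python. The sketch takes the subtask losses as plain numbers that are assumed to be computed elsewhere. The default weight values and the function names `cnsp_loss` and `pretraining_loss` are illustrative assumptions, since the application does not give concrete weights.

```python
def cnsp_loss(l_nsp, l_psp, l_rsp, alpha=1.0, beta=1.0, sigma=1.0):
    """Weighted sum of the three sentence-level subtask losses (L_cNSP).

    The default weights of 1.0 are placeholders, not values from the text.
    """
    return alpha * l_nsp + beta * l_psp + sigma * l_rsp


def pretraining_loss(l_cnsp, l_cmlm, eta1=1.0, eta2=1.0):
    """Weighted sum of the sentence-level and word-level losses (L)."""
    return eta1 * l_cnsp + eta2 * l_cmlm
```

For example, `pretraining_loss(cnsp_loss(1.0, 2.0, 3.0, 0.5, 0.25, 0.25), 2.0, 0.5, 0.5)` combines a sentence-level loss of 1.75 with a word-level loss of 2.0.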
Step 130: constructing a combined loss function of text classification and sequence labeling to train on the chapter data, and predicting the chapter data by using a supervised pre-training model to obtain a teaching behavior classification label matrix.
Classroom language text is chapter text that can be split into single sentences and classified sentence by sentence, and it is also logical, ordered sequence data whose basic units are sentences. The two natural language processing prediction tasks of text classification and sequence labeling can therefore be carried out at the same time.
Specifically, referring to fig. 2, the training of the chapter data by constructing the combined loss function of text classification and sequence annotation includes:
in the sequence labeling task, a Bi-LSTM + CRF model structure is used to complete the task, so the loss function of this stage is generated by the CRF, and the loss function of the sequence labeling task is:
$y_{st} = \mathrm{CRF}(H_{st})$;
where $H_{st}$ is the [CLS] sequence obtained after Bert encoding, and CRF() denotes applying a conditional random field to $H_{st}$, thereby obtaining $y_{st}$;
the loss function of the text classification task is:
[Equation defining $y_{lc}$ from $W_{lc}$, $H_{lc}$ and $b_{lc}$; not reproduced in this text]
where $W_{lc}$ is the weight matrix of the classification task, $H_{lc}$ is the sentence text representation used for the text classification task, and $b_{lc}$ is the bias, from which $y_{lc}$ is derived;
the joint loss function used for training is obtained by combining the loss functions of the two tasks of sequence labeling and text classification:
$y_{all} = \alpha y_{lc} + \beta y_{st}$
inputting the chapter data into the joint loss function for training.
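As a rough illustration of the joint objective, the Python sketch below combines a classification loss and a sequence labeling loss with the weights alpha and beta. Treating the classification loss as a softmax cross-entropy is an assumption, because the classification formula is not reproduced in this text. The CRF loss is taken as an externally computed number, and the names `softmax_cross_entropy` and `joint_loss` are hypothetical.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against one target class.

    Assumed form of the classification loss; uses the log-sum-exp trick
    for numerical stability.
    """
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_index]


def joint_loss(y_lc, y_st, alpha=0.5, beta=0.5):
    """Weighted sum of the classification loss y_lc and the CRF loss y_st.

    The default weights are placeholders, not values from the text.
    """
    return alpha * y_lc + beta * y_st
```

In a real training loop both terms would be computed from the same Bert encoding of the chapter data, so that one backward pass updates the shared encoder for both tasks.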
Further, the predicting the chapter data by using the supervised pre-training model to obtain a teaching behavior classification label matrix includes:
after the training is finished, predicting the chapter data by using a supervised pre-training model to obtain a group of label data;
arranging the chapter data according to a template of the teaching behavior classification label matrix to obtain a teaching behavior classification label matrix $W_{teach}$.
The teaching behavior classification label matrix is an m x n matrix whose size (namely the values of m and n) differs for classes of different lengths; m is the sentence batch size and n is the number of sentence batches, and the matrix as a whole depicts the order of the teaching behavior labels across n batches of the same batch size m.
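One possible way to arrange a flat list of predicted labels into such a matrix is sketched below in Python. It assumes that each column holds one batch of m consecutive sentence labels and that an incomplete final batch is padded. The column orientation, the padding label and the function name `build_label_matrix` are all assumptions, since the template itself is not given in the text.

```python
def build_label_matrix(labels, batch_size, pad_label="PAD"):
    """Arrange sentence labels into an m x n matrix (m = batch_size).

    Column j holds the j-th batch of consecutive labels; the last batch is
    padded with pad_label if the class length is not a multiple of m.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n = -(-len(labels) // batch_size)  # ceiling division: number of batches
    padded = list(labels) + [pad_label] * (n * batch_size - len(labels))
    return [
        [padded[j * batch_size + i] for j in range(n)]
        for i in range(batch_size)
    ]
```

Reading the matrix column by column, top to bottom, recovers the original order of the class, which is what the later statistics rely on.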
Illustratively, the input of the model is sequential, chapter-level classroom teaching language data that has been manually pre-labeled. In the unsupervised training stage the labels are not needed, and the language text only needs to be input into the model in order to perform the two pre-training tasks. For each MLM task, the construction rule for positive and negative samples in contrastive learning is as follows: one third of the words keep their normal order and are masked with the mask flag, one third of the words have their order swapped and are masked with the mask flag, and one third of the words are replaced by random words and are masked with the mask flag. For each NSP task, the construction rule for positive and negative samples is as follows: one third of the sentences keep the original task of predicting the next sentence (NSP), one third are changed to predict the previous sentence (PSP), and one third are changed to predict a randomly substituted previous or next sentence (RSP). This significantly improves the robustness of the model.
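The sentence-level rule can be sketched in Python as follows. The sketch covers only the NSP/PSP/RSP split, assigns the three tasks in round-robin order so that each receives one third of the anchor sentences, and draws the RSP candidate from sentences that are not neighbours of the anchor. The round-robin assignment, the choice of RSP candidate and the function name `build_cnsp_samples` are assumptions, not details from the application.

```python
import random

TASKS = ("NSP", "PSP", "RSP")


def build_cnsp_samples(sentences, seed=0):
    """Build (anchor, candidate, task) triples for sentence-level pre-training."""
    rng = random.Random(seed)
    samples = []
    # Skip the first and last sentence so every anchor has both neighbours.
    for i in range(1, len(sentences) - 1):
        task = TASKS[(i - 1) % 3]
        if task == "NSP":
            candidate = sentences[i + 1]  # true next sentence
        elif task == "PSP":
            candidate = sentences[i - 1]  # true previous sentence
        else:
            # RSP: a random sentence that is neither the anchor nor a neighbour.
            others = [j for j in range(len(sentences)) if abs(j - i) > 1]
            candidate = sentences[rng.choice(others)] if others else sentences[i + 1]
        samples.append((sentences[i], candidate, task))
    return samples
```

The word-level (MLM) rule could be handled in the same spirit by assigning each selected token to one of the three treatments before masking.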
It can be understood that, because teaching language text is continuous and logical and its order cannot be exchanged arbitrarily, the classification of a continuous stretch of classroom language text can be regarded as a sequence labeling task in addition to a simple text classification task for teaching behaviors. The embodiment of the application proposes a model framework that carries out the two tasks simultaneously, and improves the effect and performance of the model on teaching behavior classification by combining the loss functions of the two tasks.
It can be understood that, because a pre-trained Bert model is selected in the embodiment of the present application, a large amount of unlabeled corpus from the education domain can be used in the unsupervised pre-training stage to perform incremental pre-training on the Bert model, which can effectively improve the performance of the model on the downstream task. In the pre-training process, the robustness and generalization capability of the model are enhanced by constructing positive and negative samples through contrastive learning. Because the downstream task is a dual-task framework of text classification and sequence labeling, the two pre-training tasks MLM and NSP strengthen the vector representation of the model from the word perspective and the sentence perspective respectively. This goes beyond the traditional pre-training approach that focuses only on MLM and exploits the potential value of the NSP task.
Step 140: analyzing the teaching behavior classification label matrix to obtain a classroom behavior classification combining text classification and sequence labeling.
Specifically, counting the percentage of the teacher and student languages occupying the total language of the class; counting the interaction time of teachers and students; and counting the percentage of the specific teaching behaviors in the total teaching behaviors.
And analyzing the classroom teaching mode according to the percentage of the teacher and the student occupying the total language of the classroom, the teacher and student interaction time length and the percentage of the specific teaching behavior occupying the total teaching behavior.
Illustratively, classroom language is first divided into two broad categories, teacher language and student language. Teacher language is divided into 6 modules: questioning, teaching, instructing, accepting students' opinions, responding to students' questions, and answering one's own questions. Teacher questions are subdivided into 5 modules, including questions focused on teaching content, questions that stimulate metacognition, questions that stimulate student participation, and questions focused on evaluation. Questions focused on teaching content are subdivided into 3 modules: questions that require the student to answer yes or no, questions that require the student to simply name or state something, and questions that require the student to describe or interpret. Student language is divided into 5 modules: answering, questioning, teaching, instructing, and accepting. As these examples show, classroom language behaviors are classified and coded according to a given standard and divided automatically by text classification technology. Once scientific and reasonable coding indexes are available, classroom language can be classified and organized accurately, scientifically and along multiple dimensions, and meaningful educational information can be counted.
Fig. 3 shows a teaching behavior classification label matrix outputted by a model and a method for teaching evaluation by using the matrix, through the matrix, the sequence, time point, rule and frequency of various teaching behaviors in a class can be counted, and then the class style and corresponding teaching effect of a teacher can be deduced according to the information. The label matrix is arranged according to the classroom evolution sequence, the occupation ratios of various behaviors of a teacher and students can be analyzed, key behaviors made by the teacher in different periods and the influence of the key behaviors on the students can be analyzed through intercepting time segments, and then the teaching mode of the whole classroom is judged.
Illustratively, counting the percentage of the total classroom language taken up by teacher language and student language: assuming that the total speech of a class consists of 5000 sentences, the label matrix contains the labels of these 5000 sentences. By counting the number of behavior labels from the teacher and the number from the students, the proportion of teacher and student behavior in the class is obtained, and it can be judged whether the class tends to be teacher-led or student-led.
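The counting described here can be sketched in Python. The sketch assumes that each label string starts with `T:` for teacher behaviors or `S:` for student behaviors. This prefix scheme and the function name `speaker_share` are hypothetical, since the application does not specify a label encoding.

```python
def speaker_share(labels):
    """Return the percentage of teacher and student labels in a class."""
    teacher = sum(1 for label in labels if label.startswith("T:"))
    student = sum(1 for label in labels if label.startswith("S:"))
    total = teacher + student
    if total == 0:
        return {"teacher": 0.0, "student": 0.0}
    return {
        "teacher": 100 * teacher / total,
        "student": 100 * student / total,
    }
```

A class whose teacher share is well above one half would, under this reading, lean toward the teacher-led type.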
Illustratively, counting the teacher-student interaction duration: when teachers and students interact, behavior labels of teachers and students are alternately changed in a matrix in a sequence, and the length of the change sequence is counted to obtain the interaction duration of the teachers and the students in a class.
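A minimal Python sketch of this statistic is given below. It takes the speaker of each sentence in class order and returns the lengths of the maximal stretches in which consecutive speakers alternate. The length is measured in sentences rather than seconds, which is an assumption, and the function name `interaction_runs` is hypothetical.

```python
def interaction_runs(speakers):
    """Return lengths of maximal alternating speaker runs (length >= 2)."""
    runs = []
    run = 1
    for prev, cur in zip(speakers, speakers[1:]):
        if cur != prev:
            run += 1  # speaker changed, the alternation continues
        else:
            if run >= 2:
                runs.append(run)
            run = 1  # same speaker twice in a row ends the alternation
    if run >= 2:
        runs.append(run)
    return runs
```

Summing the returned lengths gives a total interaction length for the class. Converting it to time would require per-sentence timestamps from the transcription step.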
Illustratively, counting the percentage of a specific teaching behavior in the total teaching behaviors: certain key teaching behaviors, such as a teacher's technical operation, can be extracted from the matrix, so as to analyze how much of the whole class the teacher spends demonstrating the technical operation to students, which lays the groundwork for analyzing the value of this teacher behavior for the classroom effect.
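The share of one behavior among all labeled behaviors can be computed as in the Python sketch below. The label strings in the usage example and the function name `behavior_share` are hypothetical.

```python
from collections import Counter


def behavior_share(labels, behavior):
    """Return the percentage of labels equal to the given behavior."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 100 * Counter(labels)[behavior] / total
```

For instance, `behavior_share(["demo", "lecture", "demo", "question"], "demo")` reports that half of the labeled sentences belong to the demonstration behavior.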
Illustratively, analyzing the overall classroom teaching mode: the proportions of lecture-type, autonomous-exploration-type and question-interaction-type teaching behaviors differ from teacher to teacher. By calculating the proportions of different combinations of behavior label types, a teacher's teaching mode in a given class can be identified comprehensively, which helps teachers improve their classes and helps new and experienced teachers learn from each other's experience.
In summary, the model provided in the embodiment of the application uses a pre-training framework of upstream unsupervised learning and downstream supervised learning. In the unsupervised learning part, positive and negative samples are constructed by the contrastive learning method, and incremental pre-training is performed on the model with unlabeled data from the education field, which significantly improves the classification prediction performance of the model.
The embodiment of the application considers the particularity of the classroom language text, creatively combines the tasks of sequence marking and text classification together to solve the classification problem of classroom behaviors, and fully utilizes the value of the NSP pre-training task which is less concerned by the related academic circles and is in the unsupervised learning stage.
According to the embodiment of the application, the teaching behavior label matrix is constructed, and the artificial intelligence model and the algorithm are combined with the classroom behavior analysis, so that the teaching behavior analysis tools and ways are widened. Compared with manual teaching evaluation, the automatic teaching evaluation performed by using the artificial intelligence is more efficient and objective, and is beneficial to analyzing a large number of courses at one time and screening out required teaching evaluation information. The algorithm is a teaching evaluation result obtained by analyzing based on the teaching language, so the algorithm also has certain reference value and reference significance for procedural teaching evaluation.
On the basis of the foregoing embodiment, fig. 4 is a schematic structural diagram of a classroom behavior classification device combining text classification and sequence labeling according to an embodiment of the present application. Referring to fig. 4, the classroom behavior classification apparatus combining text classification and sequence labeling provided in this embodiment specifically includes: a data acquisition module 101, a first processing module 102, a second processing module 103, and a behavior classification module 104.
The data acquisition module is used for acquiring teaching video data and performing voice transcription on the teaching video data to form initial corpus; the first processing module is used for preprocessing the initial corpus to obtain chapter data, and inputting the chapter data into an unsupervised pre-training model for incremental pre-training; the second processing module is used for constructing a combined loss function of text classification and sequence annotation to train the chapter data, and predicting the chapter data by using a supervised pre-training model to obtain a teaching behavior classification label matrix; the behavior classification module is used for analyzing the teaching behavior classification label matrix to obtain classroom behavior classification combining text classification and sequence labeling.
In the above, teaching video data are acquired and subjected to voice transcription to form an initial corpus; the initial corpus is preprocessed to obtain chapter data, and the chapter data are input into an unsupervised pre-training model for incremental pre-training; a joint loss function of text classification and sequence labeling is constructed to train on the chapter data, and the chapter data are predicted by using a supervised pre-training model to obtain a teaching behavior classification label matrix; the teaching behavior classification label matrix is analyzed to obtain a classroom behavior classification combining text classification and sequence labeling. The classification method and device thereby analyze and classify classroom teaching behaviors rapidly and accurately with respect to computing resources and data volume, and the resulting classification of classroom teaching behaviors is efficient and accurate.
The classroom behavior classification device combining text classification and sequence labeling provided by the embodiment of the application can be used for executing the classroom behavior classification method combining text classification and sequence labeling provided by the embodiment, and has corresponding functions and beneficial effects.
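As a rough illustration of how the four modules described above could be chained, the following Python sketch wires them together as plain callables. The class name `ClassroomBehaviorPipeline`, its field names, and the stub functions are hypothetical and are not part of the application; a real system would replace the stubs with speech transcription, the pre-trained model, and the analysis step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LabelMatrix = List[List[str]]


@dataclass
class ClassroomBehaviorPipeline:
    """Hypothetical wiring of modules 101-104; all names are illustrative."""
    acquire: Callable[[str], str]                        # module 101: video -> initial corpus
    preprocess: Callable[[str], List[str]]               # module 102: corpus -> chapter data
    predict: Callable[[List[str]], LabelMatrix]          # module 103: chapter data -> label matrix
    analyze: Callable[[LabelMatrix], Dict[str, float]]   # module 104: label matrix -> statistics

    def run(self, video_path: str) -> Dict[str, float]:
        corpus = self.acquire(video_path)
        chapters = self.preprocess(corpus)
        label_matrix = self.predict(chapters)
        return self.analyze(label_matrix)


# Stub implementations, used only to show the data flow (assumptions, not the real modules).
def stub_acquire(video_path: str) -> str:
    return "Open your books. Who can answer?"


def stub_preprocess(corpus: str) -> List[str]:
    return [part.strip() for part in corpus.split(".") if part.strip()]


def stub_predict(chapters: List[str]) -> LabelMatrix:
    # Pretend model: anything ending in "?" is a question, everything else an instruction.
    return [["question" if c.endswith("?") else "instruction" for c in chapters]]


def stub_analyze(matrix: LabelMatrix) -> Dict[str, float]:
    labels = [label for row in matrix for label in row]
    return {"question_share": labels.count("question") / len(labels)}


pipeline = ClassroomBehaviorPipeline(stub_acquire, stub_preprocess, stub_predict, stub_analyze)
result = pipeline.run("lesson.mp4")
```

Keeping each module behind a simple callable interface means any one stage can be swapped without touching the others, which mirrors the module boundaries of Fig. 4.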
The embodiment of the application also provides a computer device which can integrate the classroom behavior classification device combining text classification and sequence labeling provided by the embodiment of the application. Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application. Referring to Fig. 5, the computer device includes: an input device 43, an output device 44, a memory 42, and one or more processors 41; the memory 42 is used for storing one or more programs; the one or more programs, when executed by the one or more processors 41, cause the one or more processors 41 to implement the classroom behavior classification method combining text classification and sequence labeling provided in the embodiments above. The input device 43, the output device 44, the memory 42 and the processor 41 may be connected by a bus or by other means; connection by a bus is taken as the example in Fig. 5.
The processor 41 executes various functional applications of the device and data processing, i.e., implements the above-described classroom behavior classification method combining text classification and sequence labeling, by executing software programs, instructions, and modules stored in the memory 42.
The computer device provided by this embodiment can be used for executing the classroom behavior classification method combining text classification and sequence labeling provided by the above embodiments, and has corresponding functions and beneficial effects.
Embodiments of the present application further provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a method for classifying classroom behavior combining text classification and sequence annotation, where the method for classifying classroom behavior combining text classification and sequence annotation includes: acquiring teaching video data and performing voice transcription on the teaching video data to form initial corpus; preprocessing the initial corpus to obtain chapter data, and inputting the chapter data into an unsupervised pre-training model for incremental pre-training; constructing a combined loss function of text classification and sequence labeling to train the chapter data, and predicting the chapter data by using a supervised pre-training model to obtain a teaching behavior classification label matrix; and analyzing the teaching behavior classification label matrix to obtain classroom behavior classification combining text classification and sequence labeling.
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer device memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., hard disk) or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer device in which the program is executed, or may be located in a different second computer device connected to the first computer device through a network (such as the internet). The second computer device may provide program instructions to the first computer device for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer devices that are connected via a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
Of course, in the storage medium containing computer-executable instructions provided in the embodiments of the present application, the computer-executable instructions are not limited to the classroom behavior classification method combining text classification and sequence labeling described above, and may also perform related operations in the classroom behavior classification method combining text classification and sequence labeling provided in any embodiment of the present application.
The classroom behavior classification device combining text classification and sequence labeling, the storage medium, and the computer device provided in the foregoing embodiments can execute the classroom behavior classification method combining text classification and sequence labeling provided in any embodiment of the present application. For technical details not described in detail in the foregoing embodiments, reference may be made to the classroom behavior classification method combining text classification and sequence labeling provided in any embodiment of the present application.
The foregoing is considered as illustrative only of the preferred embodiments of the invention and the principles of the technology employed. The present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the claims.

Claims (10)

1. A classroom behavior classification method combining text classification and sequence labeling is characterized by comprising the following steps:
acquiring teaching video data and performing voice transcription on the teaching video data to form initial corpus;
preprocessing the initial corpus to obtain chapter data, and inputting the chapter data into an unsupervised pre-training model for incremental pre-training;
constructing a combined loss function of text classification and sequence annotation to train the chapter data, and predicting the chapter data by using a supervised pre-training model to obtain a teaching behavior classification label matrix;
and analyzing the teaching behavior classification label matrix to obtain classroom behavior classification combining text classification and sequence labeling.
2. The method for classroom behavior classification with joint text classification and sequence labeling as claimed in claim 1, wherein before inputting the chapter data into an unsupervised pre-training model for incremental pre-training, the method further comprises:
constructing sentence-level contrastive learning samples, predicting three types of loss functions on the contrastive learning samples through a softmax classifier, and adding them to obtain the overall loss function of the NSP task:
$L_{cNSP} = \alpha L_{NSP} + \beta L_{PSP} + \sigma L_{RSP}$
Figure FDA0003945013460000011
Figure FDA0003945013460000012
Figure FDA0003945013460000013
wherein $L_{cNSP}$ is the overall loss function of the NSP task after improvement by the contrastive learning method; $L_{NSP}$, $L_{PSP}$ and $L_{RSP}$ are the loss functions of the three subtasks under the cNSP task; $\alpha$, $\beta$ and $\sigma$ are the weight coefficients of the three subtasks in the overall loss function; $h$ is the output of the last hidden layer of the neural network under the different subtasks; softmax() denotes normalization with the softmax function, followed by calculation of the cross-entropy loss of the result;
constructing word-level contrastive learning samples, wherein the loss function of the word-level task is:
Figure FDA0003945013460000014
wherein $L_{cMLM}$ represents the overall loss function of the MLM task after contrastive learning, $\theta$ represents the parameter set of the model, $k$ represents the length of each scrambled subsequence, and $pos$ represents the position embedding that specifies the word position in the BERT model input.
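The weighted sum that forms the sentence-level loss in claim 2 can be sketched in a few lines of Python. The per-subtask formulas appear only as equation images in the source, so this sketch assumes that each subtask loss is an ordinary softmax cross-entropy over that subtask's classifier logits. The function names `softmax`, `cross_entropy`, and `cnsp_loss`, and the default weights of 1.0, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of the softmax distribution against one gold class index."""
    return -math.log(softmax(logits)[target])


def cnsp_loss(
    nsp: Tuple[Sequence[float], int],
    psp: Tuple[Sequence[float], int],
    rsp: Tuple[Sequence[float], int],
    alpha: float = 1.0,
    beta: float = 1.0,
    sigma: float = 1.0,
) -> float:
    """L_cNSP = alpha * L_NSP + beta * L_PSP + sigma * L_RSP.

    Each argument is a (logits, gold_index) pair for one subtask; the use of
    softmax cross-entropy per subtask is an assumption of this sketch.
    """
    return (
        alpha * cross_entropy(*nsp)
        + beta * cross_entropy(*psp)
        + sigma * cross_entropy(*rsp)
    )
```

With uniform logits on every subtask, each term equals ln 2, so the three weights directly set how much each subtask contributes to the total.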
3. The method for classroom behavior classification with joint text classification and sequence labeling as claimed in claim 2, wherein the inputting of the chapter data into an unsupervised pre-training model for incremental pre-training comprises:
the loss function for the unsupervised training phase is:
$L = \eta_1 L_{cNSP} + \eta_2 L_{cMLM}$
wherein $\eta_1$ and $\eta_2$ are the weight coefficients corresponding to the two subtasks;
inputting the chapter data into a loss function of an unsupervised training phase to complete incremental training.
4. The method for classifying classroom behavior in combination with text classification and sequence annotation as claimed in claim 1, wherein the training of the chapter data by constructing a combined loss function of text classification and sequence annotation comprises:
in the sequence labeling task, a Bi-LSTM + CRF model structure is used, so the loss function at this stage is generated by the CRF, and the loss function of the sequence labeling task is:
$y_{st} = \mathrm{CRF}(H_{st})$;
wherein $H_{st}$ is the [CLS] sequence obtained after BERT encoding, and CRF() denotes applying a conditional random field to $H_{st}$, thereby obtaining $y_{st}$;
the loss function of the text classification task:
wherein $W_{lc}$ is the weight matrix of the classification task, $H_{lc}$ is the sentence text representation for the text classification task, and $b_{lc}$ is the bias, from which $y_{lc}$ is derived;
the joint loss function used for training is obtained by weighting the loss functions of the two tasks, sequence labeling and text classification:
$y_{all} = \alpha y_{lc} + \beta y_{st}$;
inputting the chapter data into the joint loss function for training.
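To make the joint objective of claim 4 concrete, the sketch below computes a linear-chain CRF negative log-likelihood for the sequence labeling task, a softmax classification loss for the text classification task, and their weighted sum. Several points are assumptions rather than statements of the application: the text classification formula is not reproduced in the claim text, so a softmax over $W_{lc} H_{lc} + b_{lc}$ is assumed; the CRF has no start or end transitions; the Bi-LSTM encoder is omitted and emission scores are passed in directly. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Stable log(sum(exp(v))) for a non-empty list of scores."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def crf_nll(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[int],
) -> float:
    """Negative log-likelihood of a tag path under a linear-chain CRF.

    emissions[t][j] scores tag j at step t; transitions[i][j] scores i -> j.
    """
    if len(emissions) != len(tags) or not tags:
        raise ValueError("emissions and tags must be non-empty and equally long")
    num_tags = len(transitions)
    # Score of the gold path.
    gold = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    # Forward algorithm for the log partition function.
    alpha: List[float] = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            + emissions[t][j]
            for j in range(num_tags)
        ]
    return logsumexp(alpha) - gold


def classification_nll(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    target: int,
) -> float:
    """Softmax cross-entropy over logits W·h + b (assumed form of the y_lc loss)."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return logsumexp(logits) - logits[target]


def joint_loss(loss_lc: float, loss_st: float, alpha: float = 0.5, beta: float = 0.5) -> float:
    """y_all = alpha * y_lc + beta * y_st, with both terms read as task losses."""
    return alpha * loss_lc + beta * loss_st
```

With all scores set to zero, a three-step sequence over two tags has eight equally likely paths, so `crf_nll` returns ln 8; this is a convenient sanity check when adapting the sketch.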
5. The classroom behavior classification method combining text classification and sequence labeling as defined in claim 4, wherein predicting the chapter data using a supervised pre-trained model to obtain a teaching behavior classification tag matrix comprises:
after the training is finished, predicting the chapter data by using a supervised pre-training model to obtain a group of label data;
arranging the chapter data according to the template of the teaching behavior classification label matrix to obtain the teaching behavior classification label matrix $W_{teach}$;
the teaching behavior classification label matrix is an m × n matrix whose size (namely the values of m and n) differs for lessons of different lengths, wherein m is called the sentence batch size and n is called the number of sentence batches, and the whole matrix describes the order of the teaching behavior labels across n batches that share the same batch size m.
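One possible way to arrange a flat list of predicted labels into such an m × n matrix is sketched below. The layout is an assumption: column j is taken to hold the j-th batch of m consecutive sentence labels, and a short final batch is padded with a placeholder. The function name `build_label_matrix` and the `"PAD"` marker are hypothetical.

```python
import math
from typing import List, Sequence


def build_label_matrix(labels: Sequence[str], m: int, pad: str = "PAD") -> List[List[str]]:
    """Arrange labels into m rows and n = ceil(len(labels) / m) columns.

    Entry (i, j) is the label of sentence j * m + i, so each column is one
    batch of m consecutive sentences (an assumed layout, not the source's).
    """
    if m <= 0:
        raise ValueError("m must be a positive batch size")
    n = math.ceil(len(labels) / m)
    padded = list(labels) + [pad] * (n * m - len(labels))
    return [[padded[j * m + i] for j in range(n)] for i in range(m)]


# Five sentence labels with batch size 2 give a 2 x 3 matrix.
matrix = build_label_matrix(["T", "T", "S", "T", "S"], m=2)
```

Because n grows with the number of sentences, a longer lesson yields a wider matrix for the same m, which matches the statement that the matrix size differs for lessons of different lengths.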
6. The method for classifying classroom behavior in combination with text classification and sequence labeling according to claim 1, wherein the step of analyzing the matrix of teaching behavior classification tags to obtain classroom behavior classification in combination with text classification and sequence labeling comprises:
counting the percentages of teacher speech and student speech in the total classroom speech;
counting the duration of teacher-student interaction;
counting the percentage of specific teaching behaviors in the total teaching behaviors;
and analyzing the classroom teaching mode according to the percentages of teacher speech and student speech in the total classroom speech, the duration of teacher-student interaction, and the percentage of specific teaching behaviors in the total teaching behaviors.
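The three statistics in claim 6 can be computed with simple aggregation, as in the following sketch. The data layout is assumed: each labeled utterance carries a speaker role, a behavior label, and a duration in seconds. Speech share is measured by duration, and interaction time is taken as the total duration of utterances whose behavior is in a caller-supplied set; both choices are assumptions, as are the names `Utterance` and `summarize`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Sequence, Set


@dataclass(frozen=True)
class Utterance:
    speaker: str    # e.g. "teacher" or "student" (illustrative roles)
    behavior: str   # predicted teaching behavior label
    seconds: float  # duration of the utterance


def summarize(utterances: Sequence[Utterance], interactive: Set[str]) -> Dict[str, object]:
    """Return speech share per speaker, interaction time, and behavior share."""
    total_seconds = sum(u.seconds for u in utterances)
    if not utterances or total_seconds <= 0:
        raise ValueError("need at least one utterance with positive duration")
    speaker_seconds: Counter = Counter()
    behavior_counts: Counter = Counter()
    interaction_seconds = 0.0
    for u in utterances:
        speaker_seconds[u.speaker] += u.seconds
        behavior_counts[u.behavior] += 1
        if u.behavior in interactive:
            interaction_seconds += u.seconds
    return {
        "speech_share": {s: sec / total_seconds for s, sec in speaker_seconds.items()},
        "interaction_seconds": interaction_seconds,
        "behavior_share": {b: c / len(utterances) for b, c in behavior_counts.items()},
    }


lesson = [
    Utterance("teacher", "lecture", 30.0),
    Utterance("teacher", "question", 10.0),
    Utterance("student", "answer", 20.0),
]
stats = summarize(lesson, interactive={"question", "answer"})
```

In this toy lesson the teacher accounts for two thirds of the speaking time and 30 seconds count as interaction; a teaching-mode judgment would then be a rule or model applied to these numbers.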
7. The method for classifying classroom behavior in combination with text classification and sequence annotation according to claim 1, wherein the preprocessing the initial corpus comprises:
performing a first segmentation operation on the initial corpus to obtain first segmentation data;
and performing a second segmentation operation on the first segmentation data to obtain the chapter data.
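The claim does not say what the two segmentation operations are, so the sketch below is only one plausible reading: the first operation splits the transcript into sentences at end-of-sentence punctuation, and the second groups consecutive sentences into chapters under a character budget. The function names, the punctuation set, and the `max_chars` default are all hypothetical.

```python
import re
from typing import List

# Split after sentence-final punctuation, but not inside a run such as "?!".
_SENTENCE_END = re.compile(r"(?<=[。！？.!?])(?![。！？.!?])\s*")


def split_sentences(corpus: str) -> List[str]:
    """First segmentation (assumed): transcript -> list of sentences."""
    return [part.strip() for part in _SENTENCE_END.split(corpus) if part.strip()]


def group_into_chapters(sentences: List[str], max_chars: int = 128) -> List[List[str]]:
    """Second segmentation (assumed): greedily pack sentences into chapters."""
    chapters: List[List[str]] = []
    current: List[str] = []
    length = 0
    for sentence in sentences:
        # Start a new chapter when adding this sentence would exceed the budget.
        if current and length + len(sentence) > max_chars:
            chapters.append(current)
            current, length = [], 0
        current.append(sentence)
        length += len(sentence)
    if current:
        chapters.append(current)
    return chapters


sentences = split_sentences("Open your books. Who can answer? I can!")
chapters = group_into_chapters(sentences, max_chars=32)
```

A character budget is used here because encoder models such as BERT accept a bounded input length; a token-based budget would be the natural refinement.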
8. A classroom behavior classification device combining text classification and sequence labeling, characterized by comprising:
the data acquisition module is used for acquiring teaching video data and performing voice transcription on the teaching video data to form initial corpus;
the first processing module is used for preprocessing the initial corpus to obtain chapter data, and inputting the chapter data into an unsupervised pre-training model for incremental pre-training;
the second processing module is used for constructing a combined loss function of text classification and sequence annotation to train the chapter data, and predicting the chapter data by using a supervised pre-training model to obtain a teaching behavior classification label matrix;
and the behavior classification module is used for analyzing the teaching behavior classification label matrix to obtain the classroom behavior classification combining text classification and sequence labeling.
9. A computer device, comprising: a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the classroom behavior classification method combining text classification and sequence labeling according to any one of claims 1-7.
10. A storage medium containing computer-executable instructions for performing a method of classroom behavior classification in combination with text classification and sequence tagging according to any one of claims 1-7 when executed by a computer processor.
CN202211427705.2A 2022-11-15 2022-11-15 Classroom behavior classification method and device combining text classification and sequence labeling Pending CN115730244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211427705.2A CN115730244A (en) 2022-11-15 2022-11-15 Classroom behavior classification method and device combining text classification and sequence labeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211427705.2A CN115730244A (en) 2022-11-15 2022-11-15 Classroom behavior classification method and device combining text classification and sequence labeling

Publications (1)

Publication Number Publication Date
CN115730244A true CN115730244A (en) 2023-03-03

Family

ID=85295807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211427705.2A Pending CN115730244A (en) 2022-11-15 2022-11-15 Classroom behavior classification method and device combining text classification and sequence labeling

Country Status (1)

Country Link
CN (1) CN115730244A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114021580A (en) * 2021-10-14 2022-02-08 华南师范大学 Classroom conversation processing method, system and storage medium based on sequence pattern mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination