CN115171042A - Student classroom behavior identification method, device, terminal equipment and medium - Google Patents

Student classroom behavior identification method, device, terminal equipment and medium Download PDF

Info

Publication number
CN115171042A
CN115171042A CN202210786689.XA CN202210786689A CN115171042A CN 115171042 A CN115171042 A CN 115171042A CN 202210786689 A CN202210786689 A CN 202210786689A CN 115171042 A CN115171042 A CN 115171042A
Authority
CN
China
Prior art keywords
class
loss value
classroom behavior
classroom
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210786689.XA
Other languages
Chinese (zh)
Inventor
余承健
洪洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou City Polytechnic
Original Assignee
Guangzhou City Polytechnic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou City Polytechnic filed Critical Guangzhou City Polytechnic
Priority to CN202210786689.XA priority Critical patent/CN115171042A/en
Publication of CN115171042A publication Critical patent/CN115171042A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method, a device, terminal equipment and a medium for recognizing student classroom behaviors, wherein differences of the student classroom behaviors are amplified through triple training data, so that a primary posture classification model can acquire more difference information in the training process, and the sensitivity of the model can be improved without adding an additional neural network; the method comprises the steps of obtaining feature information of triple training data according to an attitude classification model, calculating a first loss value and a second loss value, calculating a total loss value according to the first loss value and the second loss value, calculating the total loss value, obtaining a final attitude classification model according to parameter adjustment of the model, and being beneficial to improving the judgment capability of the attitude classification model. The recognition of the classroom behavior images to be detected is carried out through the final posture classification model, and the recognition accuracy of classroom behaviors of students can be improved.

Description

Student classroom behavior identification method, device, terminal equipment and medium
Technical Field
The invention relates to the field of image recognition, in particular to a method, a device, terminal equipment and a medium for recognizing classroom behavior of students.
Background
In order to help schools to better understand the class learning quality of students, artificial intelligence technology is gradually introduced into intelligent analysis and decision-making auxiliary tasks such as student class condition analysis and teacher teaching evaluation. Compared with the traditional manual evaluation method, the method has the advantages of convenience, higher efficiency and the like by using the AI technology to assist in analyzing the classroom learning condition. Therefore, in order to deeply and intelligently teach the informatization and intelligentization process, more and more people are dedicated to constructing an automatic classroom teaching management system, and how to accurately classify the behaviors of students in a classroom becomes a challenging task.
The existing classroom behavior recognition method comprises a traditional machine learning classification method and a deep learning method in recent years. The self-designed manual feature priori knowledge required by the traditional machine learning classification method is used for training a classifier to recognize and detect behaviors, algorithm evolution can detect different classroom behaviors from the inspiration point, however, the manual feature design difficulty of the method is high, and the recognition accuracy is low. In contrast to the conventional method, the method does not need the characteristics of manual design, and is convenient in the training process, such as student classroom behavior recognition based on deep learning (reference documents: wei Yan Tao, qin Daoying, hu Min, and the like.) proposed by Wei Yan Tao and the like (student classroom behavior recognition based on deep learning [ J ] modern education technology, 2019,29 (7): 87-91.). Compared with the traditional machine learning classification method, the method reduces the training difficulty and improves the accuracy of behavior classification, however, only an end-to-end training mode is used, only the correct label of a single sample is considered, the cross entropy function is used for loss optimization, the influence of the difference between different behavior classes on the classification result is ignored, in the practical application, the situations that the posture difference is small and the semantics are completely different often exist, and when the trained model is used for detecting the situations, the recognition accuracy is greatly reduced.
Therefore, a recognition strategy of the student classroom behavior is needed to solve the problem of low accuracy of student classroom behavior recognition.
Disclosure of Invention
The embodiment of the invention provides a method, a device, terminal equipment and a medium for identifying student classroom behaviors, so as to improve the identification accuracy of the student classroom behaviors.
In order to solve the above problem, an embodiment of the present invention provides a method for identifying student classroom behavior, including:
extracting first posture information of the constructed triple training data, inputting the first posture information into a preset first posture classification model, and acquiring characteristic information; wherein the triple training data consists of student classroom behavior samples with different anchor point categories;
calculating to obtain a first loss value and a second loss value according to the characteristic information;
calculating a total loss value according to the first loss value and the second loss value, and performing parameter adjustment on the first posture classification model according to the total loss value to obtain a final second posture classification model;
and inputting second posture information of the classroom behavior image to be detected into the second posture classification model, and obtaining a recognition result corresponding to the classroom behavior image to be detected.
Therefore, the invention has the following beneficial effects:
the invention provides a method for identifying the classroom behavior of students, which amplifies the difference of the classroom behavior of students through triple training data, so that a primary posture classification model can obtain more difference information in the training process, and the sensitivity of the model can be improved without adding an additional neural network; the method comprises the steps of obtaining feature information of triple training data according to an attitude classification model, calculating a first loss value and a second loss value, calculating a total loss value according to the first loss value and the second loss value, calculating the total loss value, obtaining a final attitude classification model according to parameter adjustment of the model, and being beneficial to improving the judgment capability of the attitude classification model. The recognition accuracy of the classroom behavior of the students can be improved by recognizing the classroom behavior images to be detected through the final posture classification model.
As an improvement of the above solution, the triplet training data includes: the method comprises the following steps of (1) obtaining a first class classroom behavior sample, a second class classroom behavior sample and a third class classroom behavior sample; wherein the first class classroom behavior samples include: the second class classroom behavior sample comprises: a second class behavior picture and a second anchor point category label, the third class behavior sample comprising: a third classroom behavior picture and a third anchor point category label; the first anchor point category is the same as the second anchor point category, and the first anchor point category is different from the third anchor point category.
By implementing the improved scheme of the embodiment, triple training data is constructed through anchor point types, and the same points and different points between each anchor point type are highlighted, so that differences of classroom behavior samples are distinguished, and a primary posture classification model can be trained better.
As an improvement of the above scheme, the inputting the first posture information into a preset first posture classification model, and acquiring the feature information specifically includes:
respectively inputting the first posture information into a plurality of task branches of the first posture classification model according to anchor point category labels; wherein each task branch corresponds to an anchor category label one-to-one, and the first posture information includes: the gesture information of the first class of classroom behavior samples, the gesture information of the second class of classroom behavior samples and the gesture information of the third class of classroom behavior samples;
summarizing the depth characteristics output by each task branch to obtain the characteristic information; wherein the feature information includes: the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples, and the depth characteristics of the third class of classroom behavior samples.
By implementing the improved scheme of the embodiment, the triple training data is input into the task branch corresponding to the preliminary posture classification model according to different anchor point class labels, so that the characteristic information of the triple training data can be obtained, and a foundation is laid for the calculation of the first loss value and the second loss value.
As an improvement of the above scheme, the calculating and obtaining a first loss value and a second loss value according to the feature information specifically includes:
calculating a first loss value L through a first formula according to the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples and the depth characteristics of the third class of classroom behavior samples TCE
L TCE =max(L 1 ,0)
L 1 =-f a log(f p )+f a log(f n )+margin
Wherein f is a Depth features, f, for class one classroom behavior samples p Depth features, f, for class II classroom behavior samples n Depth features for class three behavior samples, f a log(f p ) Predicted characteristic distance value f for class I and class II classroom behavior samples a log(f n ) The characteristic distance value predicted for the first class of classroom behavior samples and the third class of classroom behavior samples, margin is a first weight, L TCE A pose difference loss value for different task branches;
when L is 1 When greater than 0, then L TCE Is L1;
when L is 1 When less than 0, then L TCE Is 0.
By implementing the improved scheme of the embodiment, the embodiment can obtain the first loss value through calculation by the first formula by calculating the characteristic distance value between the first class of classroom behavior samples and the second class of classroom behavior samples and calculating the characteristic distance value between the first class of classroom behavior samples and the third class of classroom behavior samples, thereby laying a foundation for calculating the total loss value.
As an improvement of the above solution, the calculating and obtaining a first loss value and a second loss value according to the feature information further includes:
calculating a second loss value L through a second formula according to the depth characteristics of the first class classroom behavior samples, the depth characteristics of the second class classroom behavior samples and the depth characteristics of the third class classroom behavior samples cross_entropy
L cross_entropy =L a +L p +L n
L a =-∑y a log(f a )
L p =-∑y p log(f p )
L n =-∑y n log(f n )
Wherein L is a Is the cross entropy loss value, L, of the class one classroom behavior sample p Cross entropy loss value, L, for class two behavior samples n Cross entropy loss value, y, for class three behavior samples a First Anchor Point class Label, y, for first class classroom behavior sample p Second Anchor Point class Label, y, for second class classroom behavior samples n A third anchor point category label for a third category class behavior sample.
By implementing the improved scheme of the embodiment, the second loss value is obtained by calculating the sum of the cross entropy loss values of the class behavior samples of each category, and a foundation is laid for calculating the total loss value.
As an improvement of the above scheme, the calculating a total loss value according to the first loss value and the second loss value specifically includes:
calculating the total loss value L by combining the first loss value and the second loss value with a preset second parameter through a third formula total
L total =L TCE +αL cross_entropy
Wherein α is a second parameter.
By implementing the improved scheme of the embodiment, the embodiment combines the first loss value and the second loss value, and performs balance calculation through the second parameter, so as to obtain a total loss value, perform preliminary parameter adjustment of the posture classification model according to the total loss value, and facilitate improvement of the discrimination capability of the model.
As an improvement of the above solution, the preset first posture classification model is composed of three convolutional layers and three multilayer sensing layers.
By implementing the improved scheme of the embodiment, the embodiment performs preliminary posture classification model design through three convolutional layers and three multilayer sensing layers, does not need pre-training, and can have better processing performance on action classification on the basis of guaranteeing the time-sharing processing efficiency.
Correspondingly, an embodiment of the present invention further provides an apparatus for identifying classroom behavior of a student, including: the device comprises an information acquisition module, a first calculation module, a second calculation module and a result output module;
the information acquisition module is used for extracting first posture information of the constructed triple training data, inputting the first posture information into a preset first posture classification model and acquiring characteristic information; wherein the triple training data consists of student classroom behavior samples with different anchor point categories;
the first calculating module is used for calculating to obtain a first loss value and a second loss value according to the characteristic information;
the second calculation module is configured to calculate a total loss value according to the first loss value and the second loss value, and perform parameter adjustment on the first posture classification model according to the total loss value to obtain a final second posture classification model;
and the result output module is used for inputting the second posture information of the classroom behavior image to be detected into the second posture classification model and obtaining the identification result corresponding to the classroom behavior image to be detected.
As an improvement of the above scheme, the triplet training data includes: the method comprises the following steps of (1) obtaining a first class classroom behavior sample, a second class classroom behavior sample and a third class classroom behavior sample; wherein the first class classroom behavior samples include: the second class classroom behavior sample comprises: a second class behavior picture and a second anchor point class label, the third class behavior sample comprising: a third classroom behavior picture and a third anchor point category label; the first anchor category is the same as the second anchor category, and the first anchor category is different from the third anchor category.
As an improvement of the above scheme, the inputting the first posture information into a preset first posture classification model, and acquiring the feature information specifically includes:
respectively inputting the first posture information into a plurality of task branches of the first posture classification model according to the anchor point category labels; wherein each task branch corresponds to an anchor category label one-to-one, and the first posture information includes: the gesture information of the first class of classroom behavior samples, the gesture information of the second class of classroom behavior samples and the gesture information of the third class of classroom behavior samples;
summarizing the depth characteristics output by each task branch to obtain the characteristic information; wherein the feature information includes: the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples, and the depth characteristics of the third class of classroom behavior samples.
As an improvement of the above scheme, the first calculation module specifically includes:
calculating a first loss value L through a first formula according to the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples and the depth characteristics of the third class of classroom behavior samples TCE
L TCE =max(L 1 ,0)
L 1 =-f a log(f p )+f a log(f n )+margin
Wherein f is a Depth features, f, for class one classroom behavior samples p Class lines for the second categoryAs depth features of the sample, f n Depth features for class three behavior samples, f a log(f p ) Predicted characteristic distance value f for class I and class II classroom behavior samples a log(f n ) The characteristic distance value predicted for the first class of classroom behavior samples and the third class of classroom behavior samples, margin is a first weight, L TCE A pose difference loss value for different task branches;
when L is 1 When greater than 0, then L TCE Is L 1
When L is 1 When less than 0, then L TCE Is 0.
As an improvement of the above scheme, the first computing module further includes:
calculating a second loss value L through a second formula according to the depth characteristics of the first class classroom behavior samples, the depth characteristics of the second class classroom behavior samples and the depth characteristics of the third class classroom behavior samples cross_entropy
L cross_entropy =L a +L p +L n
L a =-∑y a log(f a )
L p =-∑y p log(f p )
L n =-∑y n log(f n )
Wherein L is a Is the cross entropy loss value, L, of the class one classroom behavior sample p Cross entropy loss value, L, for class two behavior samples n Cross entropy loss value, y, for class three behavior samples a First Anchor Point class Label, y, for a first class classroom behavior sample p Second Anchor Point class Label, y, for second class classroom behavior samples n A third anchor category label for a third category classroom behavior sample.
As an improvement of the foregoing solution, the calculating a total loss value according to the first loss value and the second loss value specifically includes:
will be the firstCalculating a total loss value L through a third formula by combining a loss value and the second loss value with a preset second weight total
L total =L TCE +αL cross_entropy
Where α is the second weight.
As an improvement of the above solution, the preset first posture classification model is composed of three convolutional layers and three multilayer sensing layers.
Accordingly, an embodiment of the present invention further provides a computer terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and when the processor executes the computer program, the processor implements a method for identifying student classroom behavior according to the present invention.
Correspondingly, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the method for identifying classroom behavior of a student according to the present invention.
Drawings
Fig. 1 is a flowchart illustrating a method for identifying classroom behavior of a student according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an apparatus for recognizing classroom behavior of a student according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a pose classification model according to an embodiment of the present invention;
fig. 4 is a schematic view illustrating a visualization of the superimposition of the original image and the posture information of the student classroom behavior sample according to an embodiment of the present invention;
fig. 5 is a schematic view illustrating the superposition of the posture information of the classroom behavior image of the student to be tested and the original image according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for identifying student classroom behavior according to an embodiment of the present invention, and as shown in fig. 1, the present embodiment includes steps 101 to 104, where each step is specifically as follows:
step 101: extracting first posture information of the constructed triple training data, inputting the first posture information to a preset first posture classification model, and acquiring characteristic information; wherein the triple training data consists of student classroom behavior samples with different anchor point categories.
In this embodiment, the triplet training data includes: the method comprises the following steps of (1) obtaining a first class classroom behavior sample, a second class classroom behavior sample and a third class classroom behavior sample; wherein the first class classroom behavior samples include: the second class classroom behavior sample comprises: a second class behavior picture and a second anchor point class label, the third class behavior sample comprising: a third classroom behavior picture and a third anchor point category label; the first anchor category is the same as the second anchor category, and the first anchor category is different from the third anchor category.
In a specific embodiment, 6421 images are collected for making a student classroom behavior recognition dataset. Classifying the identification data set: the method comprises the following steps that an anchor point class classroom behavior sample (namely, a first classroom behavior sample), a class behavior sample (namely, a second classroom behavior sample) with the same class as an anchor point class and a similar class behavior sample (namely, a third classroom behavior sample) with a different class from the anchor point are combined to form the anchor point class classroom behavior sample, training data are constructed in a triple mode, and single-batch triple training data comprise student class behavior images which are correspondingly combined and data sets corresponding to behavior class labels:
Triplet={D a ,D p ,D n }
D a ={(pic a ,leibie a )}
D p ={(pic p ,leibie a )}
D n ={(pic n ,leibie n )}
wherein, the triple training data set Triplet of the single batch comprises three data types D a ,D p ,D n Respectively representing an anchor sample (anchor category class behavior sample), a positive example sample (same category class behavior sample as the anchor), a negative example sample (similar class behavior sample as the anchor), wherein the sample data of each category comprises a behavior image pic and a behavior category label leibie, wherein pic comprises a behavior image pic and a behavior category label leibie a ,pic p ,pic n ∈R 3 Respectively identifying training pictures (namely classroom behavior pictures) contained in the corresponding category data of the anchor sample, the positive sample and the negative sample, R 3 Representing a three-dimensional Euclidean space, leibie a ,leibie a ,leibie n And e.N represents sample behavior labels (namely anchor class labels) corresponding to the anchor sample, the positive sample and the negative sample respectively, wherein the N represents the natural data set, and the class sample behavior labels of the anchor sample and the positive sample are the same.
In a specific embodiment, an openpos gesture recognition network is used to extract gesture information of the image in the triple training data, and the gesture information is labeled as gesture information represented by a corresponding image. The extracted attitude information is expressed as:
P={(p 1 ,p 2 ,...,p n ,y)}
wherein p is i ∈R 2 Indicating the position of the ith body part, i e [1, n ]]And n represents the total number of body parts. y represents the corresponding behavior category label, y ∈ N.
Specifically, openpos can extract student whole-body posture information in an experiment, in the embodiment, only the upper half-body posture information is utilized, and n is 12. Comprises a nose bridge, a left eyebrow, a right eyebrow, a left ear, a right ear, a neck, a left shoulder, a right shoulder, a left wrist, a right wrist, a left elbow and a right elbow. A visualization schematic diagram of the overlay of the original drawing and the pose information extracted by the openpos pose recognition network in part of the sample is shown in fig. 4.
In this embodiment, the inputting the first posture information into a preset first posture classification model, and acquiring the feature information specifically includes:
respectively inputting the first posture information into a plurality of task branches of the first posture classification model according to the anchor point category labels; wherein each task branch corresponds to an anchor category label one-to-one, and the first posture information includes: the gesture information of the first class of classroom behavior samples, the gesture information of the second class of classroom behavior samples and the gesture information of the third class of classroom behavior samples;
summarizing the depth characteristics output by each task branch to obtain the characteristic information; wherein the feature information includes: the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples, and the depth characteristics of the third class of classroom behavior samples.
In this embodiment, the preset first pose classification model is composed of three convolutional layers and three multi-layer sensing layers.
In a specific embodiment, the Lesson behavor post Recognition Network, LBPRN) of the first Posture classification model adds a convolution hidden layer and a multilayer sensing layer to output a final result on the basis of student Posture estimation, and classifies the behaviors of listening seriously, listening eastern and western hopes, dozing and doing notes. For better illustration, please refer to the structure of the pose classification model of fig. 3. The model is integrally designed to comprise a three-layer convolution layer and a three-layer multi-layer sensing layer structure, pre-training is not needed, and better processing performance can be achieved on action classification on the basis of guaranteeing the time-sharing processing efficiency;
specifically, the parameters (the number of convolution kernels, the number of rows, the number of columns, and the number of channels) of the three convolution layers are (5, 2, 1, and 32), (3, 2, 1, and 48), (3, 2, 1, and 64), respectively, and are used for extracting the depth features of the student posture information, and each layer uses a RELU activation function. The multilayer perceptron is an output layer behind the three-layer convolution structure and consists of three full-connection layers, wherein the full-connection layers are all designed by linear layers, and the last layer uses a Softplus activation function to output depth features extracted by key point information.
Step 102: and calculating to obtain a first loss value and a second loss value according to the characteristic information.
In this embodiment, the calculating, according to the feature information, to obtain a first loss value and a second loss value specifically includes:
calculating a first loss value L through a first formula according to the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples and the depth characteristics of the third class of classroom behavior samples TCE
L TCE =max(L 1 ,0)
L 1 =-f a log(f p )+f a log(f n )+margin
Wherein, f a Depth features, f, for class one classroom behavior samples p Depth features for class-second behavior samples, f n Depth features for class three behavior samples, f a log(f p ) Predicted characteristic distance value f for class I and class II classroom behavior samples a log(f n ) Predicting characteristic distance values of the first class classroom behavior samples and the third class classroom behavior samples, margin is a first weight, L TCE A pose difference loss value for different task branches;
when L is 1 When greater than 0, then L TCE Is L 1
When L is 1 When less than 0, then L TCE Is 0.
In a specific embodiment, the meaning of L _ TCE is to solve the distance between the anchor sample and the positive sample and the distance difference between the anchor sample and the negative sample according to the information of the attitude difference, the method constrains the distance in a first weight margin, the setting of margin can effectively pull open the distance between the anchor sample and the fixed sample pair and the distance between the anchor sample and the negative sample pair, and here, we set the value to be 1.2; the setting of the margin weight value can be adjusted up or down according to different experimental purposes.
In this embodiment, the calculating to obtain the first loss value and the second loss value according to the feature information further includes:
according to the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples and the depth characteristics of the third class of classroom behavior samples, a second loss value L is calculated through a second formula cross_entropy
L cross_entropy =L a +L p +L n
L a =-∑y a log(f a )
L p =-∑y p log(f p )
L n =-∑y n log(f n )
Wherein L is a Is the cross entropy loss value, L, of the class one classroom behavior sample p Cross entropy loss value, L, for class II behavior samples n Cross entropy loss value, y, for class three behavior samples a First Anchor Point class Label, y, for first class classroom behavior sample p Second Anchor Point class Label, y, for a second class classroom behavior sample n A third anchor point category label for a third category class behavior sample.
Step 103: calculating a total loss value according to the first loss value and the second loss value, and performing parameter adjustment on the first posture classification model according to the total loss value to obtain a final second posture classification model;
in this embodiment, the calculating a total loss value according to the first loss value and the second loss value specifically includes:
summing the first loss valuesThe second loss value is combined with a preset second weight, and the total loss value L is calculated through a third formula total
L total =L TCE +αL cross_entropy
Where α is the second weight.
In a specific embodiment, the second weight is a hyperparameter, which may be 0.4.
Step 104: and inputting second posture information of the classroom behavior image to be detected into the second posture classification model, and obtaining an identification result corresponding to the classroom behavior image to be detected.
In the embodiment, a group of classroom behavior images of students to be tested is input, and the size of the classroom behavior images is 128 x 128;
inputting the classroom behavior image of the student to be tested into an Openpos gesture recognition network to extract gesture information to obtain second gesture information, wherein a visual result graph obtained by superposing the second gesture information and the classroom behavior image of the student to be tested is shown in FIG. 5;
inputting the second posture information into a final posture classification model for behavior recognition, and outputting recognition results, wherein the final output recognition results are respectively: "take notes", "take notes" and "east-Zhang-West".
It should be noted that fig. 4 and 5 perform the fuzzification processing to protect the personal privacy of the student.
The embodiment divides behavior types into carefully listening and speaking, east Zhang hope, dozing and note taking, and establishes corresponding triple training pairs for different behavior types, so that the model can sense the posture difference among different behaviors in the training process, and the error rate of similar behavior prediction types can be effectively reduced. In the embodiment, the parameters of the posture classification model are adjusted by combining the posture difference loss value and the cross entropy loss value, so that the discrimination capability of the model is effectively enhanced. Compared with the traditional convolutional neural network-based image recognition method, the method can effectively improve the accuracy of student classroom behavior recognition.
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of an apparatus for recognizing classroom behavior of a student according to an embodiment of the present invention, including: an information acquisition module 201, a first calculation module 202, a second calculation module 203 and a result output module 204;
the information acquisition module 201 is configured to extract first posture information from the constructed triple training data, input the first posture information to a preset first posture classification model, and acquire feature information; wherein the triple training data consists of student classroom behavior samples with different anchor point categories;
the first calculating module 202 is configured to calculate and obtain a first loss value and a second loss value according to the feature information;
the second calculating module 203 is configured to calculate a total loss value according to the first loss value and the second loss value, and perform parameter adjustment on the first posture classification model according to the total loss value to obtain a final second posture classification model;
the result output module 204 is configured to input second posture information of the classroom behavior image to be detected to the second posture classification model, and obtain an identification result corresponding to the classroom behavior image to be detected.
As an improvement of the above scheme, the triplet training data includes: the method comprises the following steps of (1) obtaining a first class classroom behavior sample, a second class classroom behavior sample and a third class classroom behavior sample; wherein the first class classroom behavior samples include: the second class classroom behavior sample comprises: a second class behavior picture and a second anchor point class label, the third class behavior sample comprising: a third classroom behavior picture and a third anchor point category label; the first anchor category is the same as the second anchor category, and the first anchor category is different from the third anchor category.
As an improvement of the above scheme, the inputting the first posture information into a preset first posture classification model to obtain the feature information specifically includes:
respectively inputting the first posture information into a plurality of task branches of the first posture classification model according to anchor point category labels; wherein each task branch corresponds to an anchor category label one-to-one, and the first posture information includes: the gesture information of the first class of classroom behavior samples, the gesture information of the second class of classroom behavior samples and the gesture information of the third class of classroom behavior samples;
summarizing the depth characteristics output by each task branch to obtain the characteristic information; wherein the feature information includes: the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples, and the depth characteristics of the third class of classroom behavior samples.
As an improvement of the foregoing solution, the first calculating module 202 specifically includes:
according to the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples and the depth characteristics of the third class of classroom behavior samples, a first loss value L is calculated through a first formula TCE
L TCE =max(L 1 ,0)
L 1 =-f a log(f p )+f a log(f n )+margin
Wherein f is a Depth features, f, for class one classroom behavior samples p Depth features, f, for class II classroom behavior samples n Depth features for class three behavior samples, f a log(f p ) Predicted characteristic distance value f for class I and class II classroom behavior samples a log(f n ) Predicting characteristic distance values of the first class classroom behavior samples and the third class classroom behavior samples, margin is a first weight, L TCE The attitude difference loss values of different task branches;
when L is 1 When greater than 0, then L TCE Is L 1
When L is 1 When less than 0, then L TCE Is 0.
As an improvement of the above solution, the first calculating module 202 further includes:
calculating a second loss value L through a second formula according to the depth characteristics of the first class classroom behavior samples, the depth characteristics of the second class classroom behavior samples and the depth characteristics of the third class classroom behavior samples cross_entropy
L cross_entropy =L a +L p +L n
L a =-∑y a log(f a )
L p =-∑y p log(f p )
L n =-∑y n log(f n )
Wherein L is a Is the cross entropy loss value, L, of the class one classroom behavior sample p Cross entropy loss value, L, for class II behavior samples n Cross entropy loss value, y, for class three behavior samples a First Anchor Point class Label, y, for first class classroom behavior sample p Second Anchor Point class Label, y, for second class classroom behavior samples n A third anchor point category label for a third category class behavior sample.
As an improvement of the above scheme, the calculating a total loss value according to the first loss value and the second loss value specifically includes:
combining the first loss value and the second loss value with a preset second weight, and calculating a total loss value L through a third formula total
L total =L TCE +αL cross_entropy
Where α is the second weight.
As an improvement of the above solution, the preset first posture classification model is composed of three convolutional layers and three multilayer sensing layers.
In the embodiment, the information acquisition module is used for training the student class behavior samples according to triple training data to acquire feature information, the first calculation module is used for processing the feature information to acquire a first loss value and a second loss value, the second calculation module is used for processing the first loss value and the second loss value, the calculated total loss value is subjected to preliminary parameter adjustment of the posture classification model to acquire a final posture classification model, and finally, the second posture information of the class behavior image to be detected is input into the final posture classification model to acquire the recognition result. In the embodiment, the difference between different behaviors is fully recognized by the primary posture classification model in the form of triple training data, and the parameter adjustment of the primary posture classification model is carried out by combining the first loss value and the second loss value, so that the accuracy of student classroom behavior recognition is improved.
EXAMPLE III
Referring to fig. 6, fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
A terminal device of this embodiment includes: a processor 601, a memory 602, and computer programs stored in said memory 602 and executable on said processor 601. The processor 601, when executing the computer program, implements the steps of the above-described identification method for student classroom activities in embodiments, such as all the steps of the identification method for student classroom activities shown in fig. 1. Alternatively, the processor, when executing the computer program, implements the functions of the modules in the device embodiments, for example: all the modules of the recognition apparatus for student classroom behavior shown in fig. 2.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the method for identifying student classroom behavior according to any one of the above embodiments.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of a terminal device and does not constitute a limitation of the terminal device, and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input output devices, network access devices, buses, etc.
Processor 601 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general processor may be a microprocessor or the processor may be any conventional processor, etc., and the processor 601 is the control center of the terminal device and connects various parts of the whole terminal device by using various interfaces and lines.
The memory 602 can be used for storing the computer programs and/or modules, and the processor 601 can implement various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
Wherein, the terminal device integrated module/unit can be stored in a computer readable storage medium if it is implemented in the form of software functional unit and sold or used as a stand-alone product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments described above may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U.S. disk, removable hard disk, magnetic diskette, optical disk, computer Memory, read-Only Memory (ROM), random Access Memory (RAM), electrical carrier wave signal, telecommunications signal, and software distribution medium, etc.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A student classroom behavior identification method is characterized by comprising the following steps:
extracting first posture information of the constructed triple training data, inputting the first posture information to a preset first posture classification model, and acquiring characteristic information; wherein the triple training data consists of student classroom behavior samples with different anchor point categories;
calculating to obtain a first loss value and a second loss value according to the characteristic information;
calculating a total loss value according to the first loss value and the second loss value, and performing parameter adjustment on the first posture classification model according to the total loss value to obtain a final second posture classification model;
and inputting second posture information of the classroom behavior image to be detected into the second posture classification model, and obtaining an identification result corresponding to the classroom behavior image to be detected.
2. The method for identifying student classroom behavior as recited in claim 1, wherein the triplet training data comprises: the method comprises the following steps of (1) obtaining a first class classroom behavior sample, a second class classroom behavior sample and a third class classroom behavior sample; wherein the first class classroom behavior samples include: the second class classroom behavior sample comprises: a second class behavior picture and a second anchor point category label, the third class behavior sample comprising: a third classroom behavior picture and a third anchor point category label; the first anchor category is the same as the second anchor category, and the first anchor category is different from the third anchor category.
3. The student classroom behavior recognition method according to claim 2, wherein the first posture information is input to a preset first posture classification model to obtain feature information, specifically:
respectively inputting the first posture information into a plurality of task branches of the first posture classification model according to the anchor point category labels; wherein each task branch corresponds to an anchor category label one-to-one, and the first posture information includes: the gesture information of the first class of classroom behavior samples, the gesture information of the second class of classroom behavior samples and the gesture information of the third class of classroom behavior samples;
summarizing the depth characteristics output by each task branch to obtain the characteristic information; wherein the feature information includes: the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples, and the depth characteristics of the third class of classroom behavior samples.
4. The student classroom behavior identification method according to claim 3, wherein the first loss value and the second loss value are calculated according to the characteristic information, and specifically:
according to the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples and the depth characteristics of the third class of classroom behavior samples, a first loss value L is calculated through a first formula TCE
L TCE =max(L 1 ,0)
L 1 =-f a log(f p )+f a log(f n )+margin
Wherein f is a Depth features, f, for class one classroom behavior samples p Depth features, f, for class II classroom behavior samples n Depth features for class three behavior samples, f a log(f p ) Predicted characteristic distance value f for class I and class II classroom behavior samples a log(f n ) Predicting characteristic distance values of the first class classroom behavior samples and the third class classroom behavior samples, margin is a first weight, L TCE The attitude difference loss values of different task branches;
when L is 1 When greater than 0, then L TCE Is L 1
When L is 1 When less than 0, then L TCE Is 0.
5. The method for identifying student classroom behavior as recited in claim 4, wherein the calculating a first loss value and a second loss value based on the characteristic information further comprises:
according to the depth characteristics of the first class of classroom behavior samples, the depth characteristics of the second class of classroom behavior samples and the third class of classroom behavior samplesThe depth characteristics of class classroom behavior samples are calculated through a second formula to obtain a second loss value L cross_entropy
L cross_entropy =L a +L p +L n
L a =-∑y a log(f a )
K p =-∑y p log(f p )
L n =-∑y n log(f n )
Wherein L is a Is the cross entropy loss value, L, of the class one classroom behavior sample p Cross entropy loss value, L, for class two behavior samples n Cross entropy loss value, y, for class three behavior samples a First Anchor Point class Label, y, for first class classroom behavior sample p Second Anchor Point class Label, y, for second class classroom behavior samples n A third anchor category label for a third category classroom behavior sample.
6. The student classroom behavior identification method as defined in claim 5, wherein the calculating a total loss value based on the first loss value and the second loss value comprises:
calculating the total loss value L through a third formula by combining the first loss value and the second loss value with a preset second weight total
L total =L TCE +αL cross_entropy
Where α is the second weight.
7. The student classroom behavior recognition method as defined in claim 2, wherein the predetermined first posture classification model is comprised of three convolutional layers and three multilayer sensing layers.
8. An apparatus for recognizing classroom behavior of students, comprising: the device comprises an information acquisition module, a first calculation module, a second calculation module and a result output module;
the information acquisition module is used for extracting first posture information of the constructed triple training data, inputting the first posture information to a preset first posture classification model and acquiring characteristic information; the triple training data consists of student classroom behavior samples with different anchor point types;
the first calculation module is used for calculating and obtaining a first loss value and a second loss value according to the characteristic information;
the second calculation module is configured to calculate a total loss value according to the first loss value and the second loss value, and perform parameter adjustment on the first posture classification model according to the total loss value to obtain a final second posture classification model;
and the result output module is used for inputting the second posture information of the classroom behavior image to be detected into the second posture classification model and obtaining the identification result corresponding to the classroom behavior image to be detected.
9. A computer terminal device, comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing a method of student classroom behavior identification as claimed in any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when running, controls an apparatus on which the computer-readable storage medium is located to perform a method for student classroom behavior recognition according to any one of claims 1-7.
CN202210786689.XA 2022-07-05 2022-07-05 Student classroom behavior identification method, device, terminal equipment and medium Pending CN115171042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210786689.XA CN115171042A (en) 2022-07-05 2022-07-05 Student classroom behavior identification method, device, terminal equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210786689.XA CN115171042A (en) 2022-07-05 2022-07-05 Student classroom behavior identification method, device, terminal equipment and medium

Publications (1)

Publication Number Publication Date
CN115171042A true CN115171042A (en) 2022-10-11

Family

ID=83491604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210786689.XA Pending CN115171042A (en) 2022-07-05 2022-07-05 Student classroom behavior identification method, device, terminal equipment and medium

Country Status (1)

Country Link
CN (1) CN115171042A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2609542A (en) * 2021-06-02 2023-02-08 Nvidia Corp Techniques for classification with neural networks

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2609542A (en) * 2021-06-02 2023-02-08 Nvidia Corp Techniques for classification with neural networks
GB2609542B (en) * 2021-06-02 2023-12-13 Nvidia Corp Techniques for classification with neural networks

Similar Documents

Publication Publication Date Title
CN111709409B (en) Face living body detection method, device, equipment and medium
Singh et al. Image classification: a survey
EP3968179A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
CN109145871B (en) Psychological behavior recognition method, device and storage medium
CN111754596A (en) Editing model generation method, editing model generation device, editing method, editing device, editing equipment and editing medium
CN111767900A (en) Face living body detection method and device, computer equipment and storage medium
US20230095182A1 (en) Method and apparatus for extracting biological features, device, medium, and program product
CN113449700B (en) Training of video classification model, video classification method, device, equipment and medium
CN112200154A (en) Face recognition method and device for mask, electronic equipment and storage medium
CN113722474A (en) Text classification method, device, equipment and storage medium
CN115936944B (en) Virtual teaching management method and device based on artificial intelligence
CN111898550A (en) Method and device for establishing expression recognition model, computer equipment and storage medium
CN111507227A (en) Multi-student individual segmentation and state autonomous identification method based on deep learning
CN111507467A (en) Neural network model training method and device, computer equipment and storage medium
CN113569607A (en) Motion recognition method, motion recognition device, motion recognition equipment and storage medium
CN113297956A (en) Gesture recognition method and system based on vision
CN115171042A (en) Student classroom behavior identification method, device, terminal equipment and medium
Zheng et al. Attention assessment based on multi‐view classroom behaviour recognition
KR101334858B1 (en) Automatic butterfly species identification system and method, and portable terminal having automatic butterfly species identification function using the same
CN112417974A (en) Public health monitoring method
CN115659221A (en) Teaching quality assessment method and device and computer readable storage medium
CN111199378A (en) Student management method, student management device, electronic equipment and storage medium
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN114238587A (en) Reading understanding method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination