CN111931585A - Classroom concentration degree detection method and device

Classroom concentration degree detection method and device

Info

Publication number
CN111931585A
Authority
CN
China
Prior art keywords
student
expression
posture
data
evaluation value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010676538.XA
Other languages
Chinese (zh)
Inventor
易秋晨
罗明宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongyun Ruilian Wuhan Computing Technology Co ltd
Original Assignee
Dongyun Ruilian Wuhan Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongyun Ruilian Wuhan Computing Technology Co ltd
Priority to CN202010676538.XA
Publication of CN111931585A
Current status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 Services
    • G06Q 50/20 Education
    • G06Q 50/205 Education administration or guidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Educational Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Strategic Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Educational Administration (AREA)
  • Social Psychology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Psychiatry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a classroom concentration detection method and device. The method comprises the following steps: first, acquiring the posture data and facial expression data of each student in the current classroom scene based on a concentration recognition model; generating a posture optimization benchmark in real time from the posture data of each student and, at the same time, generating an expression optimization benchmark in real time from the facial expression data of each student; and finally, calculating an initial concentration evaluation value for each student based on the posture optimization benchmark, the expression optimization benchmark, and the posture data and facial expression data corresponding to each student, and screening out the target students whose initial concentration evaluation values do not meet a preset condition. The invention can analyze and detect student concentration in a variety of classroom scenes, can accurately analyze the concentration of every student in the current classroom scene by means of artificial intelligence, and finally generates an accurate and objective student concentration detection result automatically, thereby improving the teaching experience.

Description

Classroom concentration degree detection method and device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a classroom concentration degree detection method and device.
Background
In the field of artificial intelligence, a pre-trained model is usually used to accelerate the training process and improve model performance, so as to improve both the training efficiency of a neural network and the inference capability of the resulting model. Likewise, when artificial intelligence technology is applied to the field of digital teaching, a large number of classroom data samples of different types are needed for training, so that information such as students' facial features and body postures can be accurately identified through model inference and used for concentration analysis.
Traditional classroom concentration analysis methods find it difficult to analyze student concentration objectively and are mostly limited to a single teaching environment, for example an indoor classroom in which students can only listen and speak from a seated posture; concentration analysis of outdoor sports courses cannot be performed. Existing methods therefore still need improvement in how to apply concentration analysis models effectively in different classroom environments and how to guarantee, through a high-quality computation scheme, the objective validity of the concentration analysis results.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The main object of the present invention is to provide a classroom concentration detection method and device, aiming to solve the technical problems that artificial intelligence techniques in the prior art have difficulty analyzing student concentration objectively, are mostly limited to a single teaching environment, and cannot guarantee the objective validity of the final student concentration analysis results.
In order to achieve the above object, the present invention provides a method for detecting classroom concentration, which comprises the following steps:
acquiring posture data and facial expression data of each student from a current teaching scene image based on a concentration degree recognition model;
generating a posture optimization benchmark in real time based on the posture data of each student;
generating an expression optimization benchmark in real time based on facial expression data of each student;
calculating an initial concentration degree evaluation value of each student based on the posture optimization reference, the expression optimization reference and the posture data and the facial expression data corresponding to each student;
and screening out target students of which the initial concentration evaluation values do not meet preset conditions.
Preferably, the step of generating a posture optimization reference in real time based on the posture data of each student comprises:
evaluating the posture data of each student in real time to obtain the posture evaluation value of each student, and taking the average value of the posture evaluation values of all students as a posture optimization reference;
the step of generating an expression optimization benchmark in real time based on facial expression data of each student comprises the following steps:
evaluating the facial expression data of each student in real time to obtain the expression evaluation value of each student, and taking the average value of the expression evaluation values of all students as an expression optimization reference;
accordingly, the step of calculating an initial concentration degree evaluation value of each student based on the posture optimization reference, the expression optimization reference, and the posture data and facial expression data corresponding to each student includes:
and calculating an initial concentration degree evaluation value of each student based on the posture optimization reference, the expression optimization reference and the posture evaluation value and the expression evaluation value corresponding to each student.
Preferably, the step of calculating an initial concentration evaluation value of each student based on the posture optimization reference, the expression optimization reference, and the posture evaluation value and the expression evaluation value corresponding to each student specifically includes:
comparing the posture evaluation value of each student with the posture optimization reference to obtain the concentration degree posture evaluation value of each student;
comparing the expression evaluation value of each student with the expression optimization reference to obtain the concentration expression evaluation value of each student;
acquiring a preset weighting proportion of the concentration posture evaluation value and the concentration expression evaluation value in the concentration evaluation value;
calculating an initial concentration degree evaluation value of each student according to the preset weighting proportion, the concentration degree posture evaluation value corresponding to each student and the concentration degree expression evaluation value;
correspondingly, the step of screening out the target students whose initial concentration evaluation values do not meet the preset conditions comprises the following steps:
respectively comparing the initial concentration evaluation value of each student with the current standard score, and screening out target students of which the initial concentration evaluation values are smaller than the current standard score; wherein the current standard score is generated in real time based on the pose optimization reference and the expression optimization reference.
Preferably, after the step of screening out the target students whose initial concentration evaluation values are smaller than the current standard score, the method further includes:
determining posture data and facial expression data of the target student;
and generating the target student concentration correction data according to the difference value between the initial concentration evaluation value of the target student and the current standard score so as to correct the initial concentration evaluation value of the target student.
Preferably, the concentration degree recognition model comprises a preset teaching environment recognition model, a preset human body posture recognition model, a preset expression recognition model and a preset face recognition model;
the step of acquiring the posture data and the facial expression data of each student from the current teaching scene image based on the concentration degree recognition model specifically comprises the following steps:
identifying a current scene based on the preset teaching environment identification model to obtain a current teaching scene image;
carrying out face recognition on each student in the current teaching scene image based on the preset face recognition model;
image cutting is carried out on each student in the current teaching scene image according to a face recognition result, and a cut individual face image of the student is obtained;
obtaining facial expression data of each student by the preset expression recognition model based on the individual face image of each student;
and acquiring posture data of each student from the current teaching scene image based on the preset human body posture recognition model.
Preferably, the step of obtaining facial expression data of each student by the preset expression recognition model based on the individual face image of each student comprises:
obtaining the expression type of the student by the preset expression recognition model based on the individual face image of the student, and obtaining the facial expression data of the student based on the expression type; the expression types at least comprise a happy expression type, a neutral expression type, a perplexed expression type, a disgusted expression type, an angry expression type and a surprised expression type.
Preferably, the preset human body posture recognition model comprises a neural network training program;
the neural network training program means that batched images and annotation information are fed into a neural network algorithm as input and output data, the error between the predicted value and the true value of the neural network is used as the loss function and as the index for measuring model performance, training proceeds iteratively by gradient descent, and training ends when the loss function falls below a threshold or the change amplitude of the neural network parameters reaches a threshold, this serving as the criterion of training convergence; the preset human body posture recognition model is finally obtained.
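For illustration only, the following is a minimal sketch of such a training loop (PyTorch-style Python); the model, loss function and data loader are assumed placeholders rather than components disclosed by this application, and the two stopping conditions mirror the convergence criteria described above (loss below a threshold, or parameter change amplitude below a threshold).
```python
# Illustrative sketch only: a gradient-descent training loop whose stopping
# criteria match the description above. `model`, `loss_fn` and `data_loader`
# are assumed placeholders, not components disclosed by this application.
import torch

def train_until_converged(model, loss_fn, data_loader, optimizer,
                          loss_threshold=1e-3, param_change_threshold=1e-5,
                          max_epochs=100):
    for epoch in range(max_epochs):
        previous_params = [p.detach().clone() for p in model.parameters()]
        running_loss = 0.0
        for images, annotations in data_loader:        # batched images / labels
            optimizer.zero_grad()
            predictions = model(images)                 # predicted values
            loss = loss_fn(predictions, annotations)    # error vs. true values
            loss.backward()                             # gradient descent step
            optimizer.step()
            running_loss += loss.item()
        mean_loss = running_loss / len(data_loader)
        # convergence criterion 1: loss function has fallen below the threshold
        if mean_loss < loss_threshold:
            break
        # convergence criterion 2: parameter change amplitude below the threshold
        change = max((p - q).abs().max().item()
                     for p, q in zip(model.parameters(), previous_params))
        if change < param_change_threshold:
            break
    return model
```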
Preferably, the images and the annotation information comprise human body bounding boxes, human body key point coordinates and face key point coordinates; the human body bounding box is used for detecting the key points of a human body; the human body key point coordinates comprise the coordinates of important body parts in the rectangular coordinate system of the two-dimensional image; the face key point coordinates comprise the coordinates of important facial points in the rectangular coordinate system of the two-dimensional image.
Preferably, the step of performing face recognition on each student in the current teaching scene image based on the preset face recognition model specifically includes:
sending the current teaching scene image into the preset face recognition model, and outputting identity information data to be verified of each student in the current teaching scene image;
and comparing the identity information data to be verified of each student with prestored student information in a student information database respectively, wherein if the identity information data to be verified is consistent with prestored student information in the student information database, the student corresponding to the identity information data to be verified is verified successfully.
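As an illustrative aid only, one simple way to realize the "consistent with prestored student information" check is a lookup of the recognized identity against the student information database; the field names and records below are hypothetical, not data from this application.
```python
# Illustrative sketch only. The description states that identity information
# output by the face recognition model is compared against a student
# information database; matching by identifier is one simple realisation.
from typing import Dict, List, Optional

def verify_students(identity_data_to_verify: List[str],
                    student_database: Dict[str, dict]) -> Dict[str, Optional[dict]]:
    """Return, for each recognized identity, the matching pre-stored record
    (verification succeeded) or None (verification failed)."""
    results = {}
    for student_id in identity_data_to_verify:
        results[student_id] = student_database.get(student_id)  # consistent => found
    return results

# Hypothetical usage
database = {"S001": {"name": "student A", "class": "3-2"}}
print(verify_students(["S001", "S999"], database))
# {'S001': {'name': 'student A', 'class': '3-2'}, 'S999': None}
```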
In addition, to achieve the above object, the present invention further provides an apparatus for detecting classroom concentration, the apparatus including:
the data acquisition module is used for acquiring the posture data and the facial expression data of each student from the current teaching scene image based on the concentration degree recognition model;
the concentration degree analysis module is used for generating a posture optimization benchmark in real time based on the posture data of each student; generating an expression optimization benchmark in real time based on facial expression data of each student; calculating an initial concentration degree evaluation value of each student based on the posture optimization reference, the expression optimization reference and the posture data and the facial expression data corresponding to each student;
and the detection module is used for screening out the target students of which the initial concentration evaluation values do not meet the preset conditions.
The beneficial effects of the invention are as follows: the classroom concentration detection method can analyze and detect student concentration in a variety of classroom scenes, can accurately analyze the concentration of every student in the current classroom scene by means of artificial intelligence, and finally generates an accurate and objective student concentration detection result automatically, thereby improving the teaching experience.
Drawings
FIG. 1 is a schematic flowchart of a first embodiment of a classroom concentration detection method according to the present invention;
fig. 2 is a schematic flow chart illustrating how each operator in the concentration analysis method obtains each evaluation value in the classroom concentration detection method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the relationship between the specification type of a teaching environment, the teaching environment image, and the object information identified in the image, in an embodiment of the classroom concentration detection method of the present invention;
FIG. 4 is a schematic diagram illustrating a relationship between a student's facial image and a tag (annotation information) according to an embodiment of the classroom concentration detection method of the present invention;
FIG. 5 is a block diagram of the hardware structure of the classroom concentration detection apparatus of the present invention;
fig. 6 is a schematic diagram of component relationships in a classroom concentration analysis system framework in an embodiment of the classroom concentration detection method of the invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The solution of the embodiments of the invention is mainly as follows: first, posture data and facial expression data of each student in the current classroom scene are acquired based on a concentration recognition model; a posture optimization benchmark is generated in real time from the posture data of each student, and an expression optimization benchmark is generated in real time from the facial expression data of each student; finally, an initial concentration evaluation value is calculated for each student based on the posture optimization benchmark, the expression optimization benchmark, and the posture data and facial expression data corresponding to each student, and the target students whose initial concentration evaluation values do not meet a preset condition are screened out. The classroom concentration detection method can analyze and detect student concentration in a variety of classroom scenes, can accurately analyze the concentration of every student in the current classroom scene by means of artificial intelligence, and finally generates an accurate and objective student concentration detection result automatically, thereby improving the teaching experience.
It can be understood that deep learning is an algorithm that takes an artificial neural network as its architecture and performs representation learning on data. The invention uses a deep learning algorithm to generate a concentration recognition model for classroom concentration analysis; the concentration recognition model comprises several deep learning submodels, specifically a preset teaching environment recognition model, a preset face recognition model, a preset expression recognition model and a preset human body posture recognition model. To perform model training, a dataset of learnable samples is needed first; such samples are usually text, voice or image data, and the samples used in the present invention are all image data.
Training in deep learning refers to the process by which a program uses known sample data and annotation information to predict the direction in which the parameters of an artificial neural network (neural network for short) should be updated; the trained neural network has the ability to recognize the mapping relation between the data and the annotation information, that is, a model is generated through training. The model also refers to a file containing the trained neural network parameters and the structural features of the network; this file can be used for incremental training and for deploying model inference services. The model file is used in the present invention to extend the scope of the existing model's knowledge by adding new input data.
Referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of the method for detecting classroom concentration according to the present invention; in a first embodiment, the method for detecting classroom concentration includes the following steps (expression recognition, body posture recognition + concentration analysis):
human body posture recognition and expression recognition
Step S10: acquiring posture data and facial expression data of each student from a current teaching scene image based on a concentration degree recognition model;
it should be noted that the main execution body of the embodiment is a computer, and the embodiment collects a current teaching scene image through a camera of the classroom concentration detection apparatus, and sends the current teaching scene image to the concentration recognition model;
the concentration degree recognition model comprises a preset teaching environment recognition model, a preset human body posture recognition model, a preset expression recognition model and a preset face recognition model, and the step S10 specifically comprises the following substeps:
Substep 1: identifying the current scene based on the preset teaching environment recognition model to obtain a current teaching scene image (see the second embodiment of the invention);
Substep 2: performing face recognition on each student in the current teaching scene image based on the preset face recognition model (see the third embodiment of the invention);
Substep 3: cutting each student out of the current teaching scene image according to the face recognition result to obtain the cut individual face image of each student;
Substep 4: obtaining facial expression data of each student from the preset expression recognition model based on the individual face image of each student; specifically, the preset expression recognition model obtains the expression type of each student based on the student's individual face image and obtains the student's facial expression data based on the expression type; the expression types at least comprise a happy expression type, a neutral expression type, a perplexed expression type, a disgusted expression type, an angry expression type and a surprised expression type;
Substep 5: acquiring posture data of each student from the current teaching scene image based on the preset human body posture recognition model. The preset human body posture recognition model comprises a neural network training program; the neural network training program means that batched images and annotation information are fed into a neural network algorithm as input and output data, the error between the predicted value and the true value of the neural network is used as the loss function and as the index for measuring model performance, training proceeds iteratively by gradient descent, and training ends when the loss function falls below a threshold or the change amplitude of the neural network parameters reaches a threshold, this serving as the criterion of training convergence; the preset human body posture recognition model is finally obtained.
Specifically, in the embodiment, a current teaching scene image is acquired through a camera of the classroom concentration degree detection device, and the current teaching scene image is sent to a preset human body posture recognition model;
In a specific implementation, the present embodiment defines a human body posture image dataset as the necessary data for training the preset human body posture recognition model, where the human body posture image dataset has the following characteristics: (1) the images must cover a wide range of scenes, such as classrooms, stadiums and other venues; (2) the images must be clear enough for the human eye to distinguish the appearance and posture of every human body; (3) the images must be annotated with information that reflects human posture, including the human body bounding box, the human body key point coordinates and the face key point coordinates. For example, the human body key point coordinates are the coordinates of important body parts such as the eyes, ears, nose, shoulders, elbows, wrists, knees and ankles in the rectangular coordinate system of the two-dimensional image, and the face key point coordinates are the coordinates of important facial points such as the boundary points of the mouth and eyes in the rectangular coordinate system of the two-dimensional image.
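For illustration, a single annotated sample of such a dataset might look as follows; the field names and numeric values are hypothetical, and only the kinds of information (bounding box, body key point coordinates, face key point coordinates in the two-dimensional image coordinate system) are taken from the description above.
```python
# Hypothetical annotation record for one person in one image; the field names
# and numbers are illustrative assumptions, but the contents (bounding box,
# body key points, face key points in 2D image coordinates) follow the
# dataset characteristics described above.
sample_annotation = {
    "image": "classroom_0001.jpg",
    "body_bbox": [120, 85, 310, 460],           # x_min, y_min, x_max, y_max
    "body_keypoints": {                          # (x, y) in image coordinates
        "left_eye": (180, 120), "right_eye": (210, 119),
        "left_ear": (165, 122), "right_ear": (226, 121),
        "nose": (195, 135),
        "left_shoulder": (150, 200), "right_shoulder": (245, 202),
        "left_elbow": (140, 280), "right_elbow": (255, 283),
        "left_wrist": (138, 350), "right_wrist": (260, 352),
        "left_knee": (170, 420), "right_knee": (230, 421),
        "left_ankle": (172, 455), "right_ankle": (228, 456),
    },
    "face_keypoints": {                          # mouth / eye boundary points
        "mouth_left": (185, 150), "mouth_right": (208, 150),
        "left_eye_outer": (172, 120), "right_eye_outer": (218, 119),
    },
}
```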
Specifically, the neural network training program of the present embodiment includes the following sub-steps:
Posture training process 1: define a neural network training program for human body posture recognition: batched images/labels are fed into the neural network algorithm as input/output data, the error between the predicted value and the true value of the neural network is used as the loss function and as the index for measuring model performance, training proceeds iteratively by gradient descent, and training ends when the loss function falls below a threshold or the change amplitude of the neural network parameters reaches a threshold, this serving as the criterion of training convergence; a neural network model is finally obtained;
Posture training process 2: train according to the neural network training process of posture training process 1 using a target detection algorithm; specifically, the images in the input data pairs are the individual frames of the student image sequence, and the labels are the human body bounding boxes in the human body posture detection dataset; training yields a human body boundary detection model m1;
Posture training process 3: train according to the neural network training process of posture training process 1 using a human body key point detection algorithm; specifically, the images in the input data pairs are the human body images cut out according to the bounding boxes in the labels, and the labels are the human body key point coordinates in the human body posture detection dataset; training yields a human body key point detection model m2;
Posture training process 4: define a post-processing method m2' that converts human body key points into the orientations and angles of the human joints; specifically, its input is the human body key point coordinates and, optionally, control point coordinates in the image/world coordinate system, and its output is the orientations and angles of the human joints. One possible implementation is to solve the rotation vector corresponding to the key point coordinates with an N-point perspective (PnP) pose solving method and convert the rotation vector into Euler angles (a sketch of this conversion is given after this list);
Posture training process 5: train according to the neural network training process of posture training process 1 using a face key point detection algorithm; specifically, the images in the input data pairs are the human body images cut out according to the bounding boxes in the labels, and the labels are the face key point coordinates in the human body posture detection dataset; training yields a face key point detection model m3;
Posture training process 6: define a post-processing method m3' that converts face key points into face orientation; its input is the face key point coordinates and, optionally, control point coordinates in the image/world coordinate system, and its output is the face orientation;
Posture training process 7: connect the neural network models and post-processing methods in series and in parallel as m1 -> [m2 -> m2', m3 -> m3'] to form the overall human body posture recognition model.
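The post-processing methods m2' and m3' above turn key point coordinates into orientations. A minimal sketch of that conversion, assuming OpenCV's solvePnP is used as the N-point perspective pose solving method (the 3D control points and camera intrinsics are placeholders supplied by the caller, and the Euler-angle axis convention is an assumption):
```python
# Illustrative sketch of the m2'/m3' post-processing idea described above:
# solve a perspective-n-point (PnP) problem for the key points, then convert
# the rotation vector into Euler angles. The 3D model points and camera
# matrix are placeholder assumptions, not values from this application.
import numpy as np
import cv2

def keypoints_to_euler_angles(image_points_2d, model_points_3d, camera_matrix):
    """image_points_2d: Nx2 detected key points; model_points_3d: Nx3 control
    points of the same landmarks in a reference (world) coordinate system."""
    dist_coeffs = np.zeros((4, 1))                       # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(model_points_3d, image_points_2d,
                                  camera_matrix, dist_coeffs)
    if not ok:
        return None                                      # orientation undetermined
    rotation_matrix, _ = cv2.Rodrigues(rvec)             # rotation vector -> matrix
    # Decompose the rotation matrix into yaw / pitch / roll (degrees, ZYX order).
    sy = np.sqrt(rotation_matrix[0, 0] ** 2 + rotation_matrix[1, 0] ** 2)
    pitch = np.degrees(np.arctan2(-rotation_matrix[2, 0], sy))
    yaw = np.degrees(np.arctan2(rotation_matrix[1, 0], rotation_matrix[0, 0]))
    roll = np.degrees(np.arctan2(rotation_matrix[2, 1], rotation_matrix[2, 2]))
    return yaw, pitch, roll
```
Per posture training process 7, each bounding box produced by m1 is then fed in parallel into m2/m2' and m3/m3' to obtain the joint angles and the face orientation for that person.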
Concentration analysis
Step S201: generating a posture optimization benchmark in real time based on the posture data of each student;
specifically, the computer evaluates the posture data of each student in real time to obtain the posture evaluation value of each student, and takes the average value of the posture evaluation values of all students as the posture optimization reference;
step S202: generating an expression optimization benchmark in real time based on facial expression data of each student;
specifically, the computer evaluates the facial expression data of each student in real time to obtain the expression evaluation value of each student, and takes the average value of the expression evaluation values of all students as an expression optimization reference;
It should be noted that, during concentration analysis, the computer realizes four operators (operator 1, the posture threshold operator; operator 2, the expression threshold operator; operator 3, the concentration operator; and operator 4, the concentration bias operator) by using the preset human body posture recognition model:
Operator 1, the posture threshold operator: a posture evaluation value is acquired based on the inference result of the human body posture recognition model, and the posture threshold operator optimizes the posture evaluation value to obtain a posture evaluation optimized value;
The present embodiment preferably uses a reward-and-punishment mechanism for the posture evaluation value that takes the average of all current students' posture evaluation values as the optimization benchmark: the posture threshold operator assigns a higher optimized score proportion to a personal posture evaluation value close to the benchmark than to one far from it. The posture evaluation value of this embodiment has two attribute classifications, one positive and one negative;
For example, the inference result of the human body posture recognition model comprises, for every frame image in the output student image sequence, all human body bounding boxes, human body key point coordinates, corrected human joint angles, face key point coordinates and corrected Euler angles of the face orientation; preferably, the human body bounding box is used as the basis for distinguishing individuals, and the arm posture of a person is judged from the visibility and angle of the shoulder-elbow-wrist chain and classified as "raised hand", "on the desk", "under the desk" or "other". The scores of "raised hand" and "on the desk" are positive, the score of "under the desk" is negative, and the score of "other" is 0; preferably, the leg posture of a person is judged from the visibility and angle of the hip, knee and ankle and classified as sitting or standing;
For example, the leg posture is used to confirm the state of the current classroom and to distinguish teachers from students, not for personal concentration scoring. Preferably, the face orientation is judged from the visibility of the face key points and the Euler angles of the face orientation. A spatial rectangular coordinate system is established with its longitudinal axis in the teaching environment plane pointing toward the blackboard, its horizontal axis across the teaching environment plane, and its vertical axis along the height of the teaching environment space. The standard direction is the orientation of a student looking completely straight ahead. When no teacher can be detected, the face orientation range is judged against the standard direction; when a teacher is detected, the face orientation range is judged against the standard direction or the direction from the student toward the teacher. A yaw angle within ±45 degrees, a pitch angle within ±60 degrees and a roll angle within ±60 degrees are scored as positive; otherwise the score is negative. The score is 0 when the orientation cannot be determined because face key points are missing. The posture threshold operator of the concentration analysis method obtains the student's posture evaluation optimized value as a weighted average of all these scores.
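A condensed sketch of the posture threshold operator's scoring rules as described above; the sign conventions and the yaw/pitch/roll thresholds follow the text, while the concrete score magnitudes, weights and function names are illustrative assumptions.
```python
# Illustrative sketch of the posture threshold operator described above.
# Score magnitudes are assumptions; the sign conventions and angle thresholds
# follow the text.
def arm_posture_score(arm_posture: str) -> float:
    positive = {"raised hand": 1.0, "on the desk": 1.0}   # positive score
    negative = {"under the desk": -1.0}                   # negative score
    return positive.get(arm_posture, negative.get(arm_posture, 0.0))  # "other" -> 0

def face_orientation_score(yaw_deg, pitch_deg, roll_deg,
                           keypoints_visible: bool) -> float:
    if not keypoints_visible:
        return 0.0                       # orientation undetermined -> score 0
    if abs(yaw_deg) < 45 and abs(pitch_deg) < 60 and abs(roll_deg) < 60:
        return 1.0                       # facing the standard direction
    return -1.0

def posture_evaluation_optimized_value(scores, weights) -> float:
    """Posture threshold operator: weighted average of all individual scores."""
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total_weight
```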
Operator 2, the expression threshold operator: this embodiment acquires an expression evaluation value based on the inference result of the expression recognition model and optimizes the expression evaluation value with the expression threshold operator to obtain an expression evaluation optimized value; preferably, the mean of the expression evaluation optimized values of all students for the expression with the highest proportion at the current moment is used as the benchmark, and the difference ratio between each student's individual expression evaluation optimized value and this mean is obtained.
For example, based on the result of the face recognition model inference service, the expression recognition model can detect six expressions presented by an aligned face image: happy, neutral (expressionless), perplexed, disgusted, angry and surprised. Students' facial expressions are influenced objectively by the classroom environment and subjectively by individual concentration; when expressions are used as concentration analysis parameters, the expressions of all students at the same moment are used by the expression threshold operator mentioned in the concentration analysis method as the optimization benchmark. In one embodiment, when all students show a higher probability for the perplexed expression than for the other expressions while one student shows a higher probability for the happy expression than for the other five, the expression threshold operator optimizes that student's concentration expression evaluation value for the happy expression, the characteristic of this optimization being that the student receives a lower score than the other students.
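A sketch of the expression threshold operator's benchmark-and-deviation idea, assuming each student's expression output is a score per expression type on a 0-100 scale; the helper names are assumptions, and the penalty shape (100 minus the deviation ratio) follows the worked example given later in this description.
```python
# Illustrative sketch of the expression threshold operator described above: the
# class-wide expression scores at the same moment serve as the benchmark, and a
# student whose dominant expression deviates strongly from it is penalized.
from statistics import mean

EXPRESSIONS = ["happy", "neutral", "perplexed", "disgusted", "angry", "surprised"]

def class_benchmark(all_student_scores):
    """Mean score per expression over all students at the current moment."""
    return {e: mean(s[e] for s in all_student_scores) for e in EXPRESSIONS}

def expression_deviation(student_scores, benchmark):
    """Difference ratio between the student and the benchmark on the
    benchmark's dominant expression (0-100 scale, as in the worked example)."""
    dominant = max(benchmark, key=benchmark.get)          # e.g. "neutral"
    return abs(benchmark[dominant] - student_scores[dominant])

def expression_evaluation_optimized_value(student_scores, benchmark):
    """Punishment mechanism: the larger the deviation from the benchmark, the
    lower the optimized value; the weighting into the concentration value is
    applied later by the concentration operator."""
    return 100.0 - expression_deviation(student_scores, benchmark)
```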
It should be noted that the above description of this embodiment takes as its example a reward-and-punishment mechanism for the concentration posture value in which positive posture evaluation optimized values receive a higher score proportion than negative ones.
Step S30: calculating an initial concentration degree evaluation value of each student based on the posture optimization reference, the expression optimization reference and the posture data and the facial expression data corresponding to each student;
in a specific implementation, calculating a concentration degree posture assessment value and a concentration degree emotion assessment value of each student through a concentration degree calculation rule of an operator 3 concentration degree operator to obtain an initial concentration degree assessment value of each student;
the concentration calculation rule of the operator 3 concentration operator comprises calculating the posture evaluation optimization value to obtain a concentration posture evaluation value; calculating the expression evaluation optimization value based on the concentration degree calculation rule to obtain a concentration degree emotion evaluation value; setting a weighting proportion of the concentration degree posture evaluation value and the concentration degree expression evaluation value when calculating the concentration degree evaluation value and then calculating the concentration degree evaluation value;
specifically, the computer compares the posture evaluation value of each student with the posture optimization reference to obtain a concentration degree posture evaluation value of each student; comparing the expression evaluation value of each student with the expression optimization reference to obtain the concentration expression evaluation value of each student; acquiring a preset weighting ratio of the concentration degree posture evaluation value and the expression evaluation value in the concentration degree evaluation value, and calculating an initial concentration degree evaluation value of each student according to the preset weighting ratio, the concentration degree posture evaluation value corresponding to each student and the concentration degree expression evaluation value;
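A minimal sketch of operator 3's weighted combination and of the screening in step S40; the 60/40 split is only the example weighting used later in this description, and the function names are assumptions.
```python
# Illustrative sketch of operator 3 (the concentration operator) and of the
# screening step: combine the concentration posture value and the concentration
# expression value by preset weighting proportions, then keep the students
# whose initial value falls below the current standard score.
def initial_concentration_value(concentration_posture_value,
                                concentration_expression_value,
                                posture_weight=0.6, expression_weight=0.4):
    assert abs(posture_weight + expression_weight - 1.0) < 1e-9
    return (posture_weight * concentration_posture_value
            + expression_weight * concentration_expression_value)

def screen_target_students(initial_values, current_standard_score):
    """initial_values: {student_id: initial concentration evaluation value}."""
    return [student_id for student_id, value in initial_values.items()
            if value < current_standard_score]
```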
It can be understood that, referring to fig. 2, fig. 2 shows how each operator in the method outputs its values in a scenario where the posture recognition model outputs two posture recognition results and the expression recognition model outputs two expression recognition results; the calculation method defined in this embodiment does not substantially change as the type or number of posture recognition results changes, and likewise a change in the number of expression recognition result types does not affect the weighted-average rule used in the concentration calculation.
Step S40: and screening out target students of which the initial concentration evaluation values do not meet preset conditions.
In a specific implementation, the concentration evaluation result of step S30 is optimized by operator 4, the concentration bias operator, to obtain the concentration analysis result. A specific concentration value may be obtained as follows: the proportion of each individual concentration evaluation value is set according to the concentration analysis frequency, and all concentration evaluation values are combined by weighted calculation to obtain the concentration value.
Specifically, the initial concentration degree evaluation values of the students are compared with the current standard scores respectively, and target students with initial concentration degree evaluation values smaller than the current standard scores are screened out; wherein the current standard score is generated in real time based on the pose optimization reference and the expression optimization reference.
For example, the present embodiment takes 100 as the full score and performs concentration analysis at different points in time throughout the class. Assume 10 concentration detections are performed in one class, so one detection carries a score weight of 10%; the six expression scores (happy, neutral, perplexed, disgusted, angry and surprised) sum to 100 points, the posture score is likewise normalized to 100 points, and a class of 50 students is taken as the example. At a certain moment, student A's six expression scores are 90 happy, 2 neutral, 2 perplexed, 2 disgusted, 2 angry and 2 surprised, while the class averages are 2 happy, 90 neutral, 2 perplexed, 2 disgusted, 2 angry and 2 surprised. The expression threshold operator judges that student A deviates from the current expression benchmark by more than 60%; with expressions contributing 40% of the concentration and posture 60%, and student A's expression deviating severely from the benchmark value (the neutral 90 that has the highest class average), the operator applies the punishment mechanism: the expression score is (100 minus the deviation ratio) multiplied by 40%, giving a final expression score of 4.8. If the posture threshold operator also scores a negative posture, student A's final score for this detection is 4.8 points. The current score accounts for only 10% of the total; if student A scores 90 in each of the remaining 9 detections, the final concentration score for the current class is 90 x 90% + 4.8 = 85.8. Judged against the score bands of 100-80 as excellent, 80-60 as good and below 60 as poor, the overall concentration of student A in the current class is still in the best band.
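Purely as a numerical check of this example (the deviation ratio of 88 is an inference from the stated scores, i.e. the gap between the benchmark's dominant "neutral 90" and student A's "neutral 2"; the remaining figures follow the text):
```python
# Numerical check of the worked example above; the deviation ratio of 88 is
# inferred from the stated scores (90 - 2), the rest follows the text.
deviation = 90 - 2                              # inferred deviation ratio
expression_score = (100 - deviation) * 0.40     # punishment mechanism -> 4.8
current_detection = expression_score            # negative posture adds nothing
final_concentration = 90 * 0.90 + current_detection
print(round(expression_score, 2), round(final_concentration, 2))   # 4.8 85.8
```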
Optionally, the classroom concentration degree analysis value of the target student that does not satisfy the preset condition and is finally screened out by the embodiment is output to the front-end page for previewing through the classroom concentration degree detection device.
The detection method for the class concentration degree can analyze and detect the concentration degree of students in various different class scenes, can analyze and detect the concentration degree of each student in the current class scene through an artificial intelligence technology, and finally automatically generates an accurate and objective student concentration degree detection result, thereby improving teaching experience.
Further, based on the first embodiment of the detection method for classroom concentration degree, the second embodiment (scene recognition) of the invention is provided,
in this embodiment, a teaching environment recognition model training method is preset for a preset teaching environment recognition model, and the teaching environment recognition model training method includes: acquiring a teaching environment image dataset; defining a teaching environment image dataset as essential data for teaching environment recognition model training, the dataset having the following characteristics:
the method comprises the following steps that 1, the image needs to have the capacity of reflecting real scene information of a teaching environment; 2, pictures with multi-angle display are needed in the same type of teaching environment; the characteristics 3 and the image can clearly present the information of each object in the teaching environment scene, for example, the information of a fixedly placed desk and chair is presented in the image. And all the teaching environment image data with the labeled information form a teaching environment image data set. The annotation information of the object that can be identified in the image needs to be expressed explicitly and indicates the area position of the object in the image. Fig. 3 shows a representation of the annotation information of the teaching environment image, where there may be multiple teaching environment images of the same specification type, and each image may contain multiple object information;
correspondingly, the teaching environment recognition model training method corresponds to a teaching environment recognition model training program, and the teaching environment recognition model is trained based on a teaching environment image data set as input data, wherein the teaching environment recognition model training program comprises the steps of 1, setting an interface for acquiring a data set storage address and a model name of the training; 2. normalizing the teaching environment image data into a neural network input data format; 3. calculating the number of samples in the data set, and dynamically adjusting the size and epoch of each batch size according to hardware resources; 4. randomly selecting a sample of batch size, and acquiring object features in the image from the teaching environment image dataset sample according to an object recognition algorithm; 5. and coding the object features by using a neural network A according to a scene recognition algorithm, then combining to obtain teaching environment features, and supervising the neural network training by using a loss function, an optimizer and a random gradient descent method. 6. And based on the epochs, finishing the training of the model after the training program runs to finish the last epoch, and acquiring the model for incremental training and model reasoning.
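A hedged skeleton of such a training program is sketched below; every helper and network module is an assumed placeholder (including the batch-size/epoch heuristic), and only the ordering of steps 1-6 mirrors the description above.
```python
# Illustrative skeleton of the teaching environment recognition training
# program described above. Every helper here is an assumed placeholder; only
# the sequence of steps (1-6) mirrors the text.
import random

def choose_batch_size_and_epochs(num_samples, memory_budget=64):
    """Step 3: toy heuristic for picking batch size and epochs from the
    dataset size / hardware resources (not the rule used by the application)."""
    batch_size = min(memory_budget, max(1, num_samples // 10))
    epochs = max(1, 1000 // max(1, num_samples // batch_size))
    return batch_size, epochs

def train_teaching_environment_model(dataset, object_recognizer, scene_encoder,
                                     loss_fn, train_step, model_name):
    # Step 1 is the caller-facing interface (dataset address, model name);
    # step 2 (normalization into the network input format) is assumed done
    # when `dataset` was built as a list of (image, label) pairs.
    batch_size, epochs = choose_batch_size_and_epochs(len(dataset))   # step 3
    for _ in range(epochs):
        batch = random.sample(dataset, min(batch_size, len(dataset))) # step 4
        object_features = [object_recognizer(img) for img, _ in batch]
        env_features = scene_encoder(object_features)                 # step 5
        loss = loss_fn(env_features, [label for _, label in batch])
        train_step(loss)           # optimizer + stochastic gradient descent
    return {"name": model_name, "status": "trained"}                  # step 6
```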
Correspondingly, the step S30 specifically includes:
Step S301: calculating the initial concentration evaluation value of each student based on the posture optimization benchmark, the expression optimization benchmark, and the in-scene position coordinates, posture data and facial expression data corresponding to each student.
It can be understood that the specific implementation of step S301 in this embodiment is described in the first embodiment; the "position coordinates in the scene corresponding to each student" correspond to the spatial rectangular coordinate system established in the first embodiment (longitudinal axis of the teaching environment plane toward the blackboard, horizontal axis across the teaching environment plane, vertical axis along the teaching environment space), in which the standard direction is the orientation of a student looking completely straight ahead, the face orientation range is judged against the standard direction when no teacher can be detected and against the standard direction or the direction from the student toward the teacher when a teacher is detected, the stated yaw and pitch limits give positive scores and other orientations negative scores, a missing set of face key points gives a score of 0, and the posture threshold operator of the concentration analysis method uses the weighted average of all scores, together with the coordinates in this spatial rectangular coordinate system, in the student's concentration posture evaluation optimized value; this is not repeated here.
This embodiment simplifies the operation flow of training the teaching environment recognition model: images and label data are organized according to the dataset characteristics defined by the method, the dataset address is provided, the name of this training model is defined, and the training program manages the training of the model automatically so as to obtain the mapping relation between the teaching environment and the environment annotation information.
Further, on the basis of the first embodiment of the classroom concentration detection method according to the present invention, a third embodiment of the present invention is provided, in which in the third embodiment (face authentication), the sub-step of performing face recognition on each student in the current teaching scene image based on the preset face recognition model specifically includes:
sending the current teaching scene image into the preset face recognition model, and outputting identity information data to be verified of each student in the current teaching scene image;
and comparing the identity information data to be verified of each student with the prestored student information in a student information database; if the identity information data to be verified is consistent with prestored student information in the student information database, the student corresponding to that identity information data is verified successfully. In a specific implementation, this embodiment presets a face recognition model training method for the classroom environment, and the method includes: acquiring a student face image dataset. The face recognition model training method of this embodiment defines a student face image dataset as the necessary data for face recognition model training, and the dataset should have the following characteristics: characteristic 1, the images must clearly present the facial information of the students; characteristic 2, the correspondence between an image and a student's identity is expressed by unique annotation information; characteristic 3, different face images of the same student carry the same label; characteristic 4, there are at least two face images of every student. Fig. 4 shows the labeling scheme of the images and annotation information in the student face image dataset.
Accordingly, the face recognition model training method corresponds to a face recognition model training program for training the face recognition model with the student face image dataset as input data. The training program method comprises: 1. set up an interface for acquiring the dataset storage address and the name of the training model; 2. normalize the student face image data into the input data format of the training model; 3. calculate the number of samples in the dataset and dynamically adjust the batch size and the number of training epochs over all samples according to the hardware resources; 4. randomly select a batch of samples and obtain the coordinate area of the face in each image from the normalized dataset samples according to a face detection algorithm; 5. crop and resize all student face images to the same size based on the face coordinate areas; 6. extract the face feature vectors of all samples in the same batch with neural network A of the face recognition algorithm, based on the equally sized face images; 7. map all sample image data of the batch into a Euclidean space with neural network B, and supervise the training of the convolutional neural network with a loss function, an optimizer and stochastic gradient descent; 8. based on the number of epochs, finish the training after the training program has run the last epoch, and obtain a model usable for incremental training and model inference.
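A hedged sketch of this face recognition training flow follows; the face detector, the networks A and B, the embedding loss and the helper functions are placeholder assumptions, and only the ordering of steps 1-8 follows the description above.
```python
# Illustrative sketch of the face recognition training program described
# above. The detector, networks A and B and the embedding loss are assumed
# placeholders; only the ordering of steps follows the text.
import random

def crop_and_resize(image, box, size):
    """Placeholder: crop `box` from `image`; real code would also resize to `size`."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]

def train_face_recognition_model(dataset, face_detector, network_a, network_b,
                                 embedding_loss, train_step, model_name,
                                 face_size=(112, 112)):
    batch_size, epochs = 32, max(1, 10_000 // max(1, len(dataset)))   # step 3
    for _ in range(epochs):
        batch = random.sample(dataset, min(batch_size, len(dataset))) # step 4
        faces = []
        for image, student_label in batch:
            box = face_detector(image)                 # face coordinate area
            faces.append((crop_and_resize(image, box, face_size),     # step 5
                          student_label))
        features = [network_a(face) for face, _ in faces]             # step 6
        embeddings = [network_b(f) for f in features]  # to Euclidean space (7)
        labels = [label for _, label in faces]
        loss = embedding_loss(embeddings, labels)
        train_step(loss)                  # loss + optimizer + SGD supervision
    return {"name": model_name, "status": "trained"}                  # step 8
```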
This embodiment simplifies the operation flow of training the face recognition model: images and label data are organized according to the dataset characteristics defined by the method, the dataset address is provided, the name of this training model is defined, and the training program manages the training of the model automatically so as to obtain the mapping relation between the students' face images and their identity information.
In addition, in order to achieve the above object, the present invention further provides an embodiment of a classroom concentration detection apparatus, as shown in fig. 5, the classroom concentration detection apparatus may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, a memory 1005, and a camera 1006. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may comprise a Display screen (Display), and the optional user interface 1003 may also comprise a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may optionally be a storage device separate from the processor 1001, the memory 1005 storing a data acquisition module, a concentration analysis module, and a detection module. The camera 1006 is used for acquiring images of a current classroom scene and acquiring face images of students in the current classroom scene, and the data acquisition module comprises a teaching environment recognition module, a human body posture recognition module, an expression recognition module and a face recognition module; the processor may invoke the teaching environment recognition module, the human posture recognition module, the expression recognition module, the face recognition module, the concentration analysis module, and the detection module stored in the memory 1005 to detect the classroom concentration of the student.
Those skilled in the art will appreciate that the configuration of the apparatus shown in fig. 5 is not intended to limit the device described herein; it may include more or fewer components than those shown, some components may be combined, or the components may be arranged differently.
Specifically, the functions of each functional module in this embodiment are respectively:
the teaching environment recognition module is used for recognizing the current scene based on the preset teaching environment recognition model so as to obtain a current teaching scene image;
the human body posture recognition module is used for acquiring posture data of each student from the current teaching scene image;
the face recognition module is used for carrying out face recognition on each student in the current teaching scene image;
the face recognition module is also used for cutting images of all students in the current teaching scene image according to the face recognition result to obtain cut individual face images of the students;
the expression recognition module is used for obtaining facial expression data of each student based on the individual face image of each student;
the concentration degree analysis module is used for generating a posture optimization benchmark in real time based on the posture data of each student; generating an expression optimization benchmark in real time based on facial expression data of each student; calculating an initial concentration degree evaluation value of each student based on the posture optimization reference, the expression optimization reference and the posture data and the facial expression data corresponding to each student;
and the detection module is used for screening out the target students of which the initial concentration evaluation values do not meet the preset conditions.
The classroom concentration detection device of this embodiment can analyze and detect student concentration in a variety of classroom scenes, can accurately analyze the concentration of every student in the current classroom scene by means of artificial intelligence, and finally generates an accurate and objective student concentration detection result automatically, thereby improving the teaching experience.
Furthermore, each functional module of the classroom concentration detection apparatus can be integrated into an existing software system to provide functional services to its components, or deployed independently; the present invention does not restrict the coupling relationships among the system components, and both deployment modes fall within the protection scope of the invention.
In this embodiment, an implementation scenario in which the present invention is deployed independently is described in the form of a local Web service instantiation. The Web service instantiation is divided into a front-end service and a back-end service: the front-end service implements the interaction with external components, and the instantiated system serves as the back-end service. An external user transmits a training data set through the front-end interface, controls the running state of the system, inputs data to be predicted (the data to be predicted comprises the current teaching scene image and the face images of students in the current classroom scene), queries concentration history information and controls the version of the system model. At the back end, the system receives the training data set, adjusts the system running state, receives the data to be predicted, outputs concentration history information and adjusts the system model version.
Referring to fig. 6, fig. 6 is a schematic diagram showing the relationship between the components of the classroom concentration analysis system framework according to the present invention. This embodiment describes the usage flow of a preferred embodiment according to the component relationship diagram, where the flow includes:
Flow 1: the user issues a system running state instruction on the front-end page and inputs the data information to be used;
Flow 2: the front-end page jumps to the corresponding instruction interface, the user fills in the corresponding instruction information, the front-end service sends the instruction and transmits the information to the system, and the system enters the corresponding instruction state;
Flow 3: the system completes the instruction function and returns the operation result to the front-end interface;
Flow 4: the user views the system operation result through the front-end interface.
The system running state instructions are as follows:
an inference instruction; a training instruction; a query instruction; and a model version switching instruction.
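As a non-authoritative illustration only, a locally deployed back-end could expose these four instruction types roughly as in the following Flask sketch; the route names, payload fields and handler bodies are assumptions and do not reflect the actual instantiated system.

```python
# Hypothetical back-end sketch: one HTTP route per system running state instruction.
from flask import Flask, request, jsonify

app = Flask(__name__)
state = {"model_version": "v1", "history": []}

@app.route("/infer", methods=["POST"])
def infer():
    # Inference instruction: receive the root folder of the data to be predicted.
    root = request.json["data_root"]
    result = {"data_root": root, "status": "inference started"}
    state["history"].append(result)
    return jsonify(result)

@app.route("/train", methods=["POST"])
def train():
    # Training instruction: receive training data path and training parameters.
    params = request.json  # e.g. {"model_type": "face", "data_path": "..."}
    return jsonify({"status": "training started", "params": params})

@app.route("/history", methods=["GET"])
def history():
    # Query instruction: return stored concentration history information.
    return jsonify(state["history"])

@app.route("/model-version", methods=["POST"])
def switch_version():
    # Model version switching instruction.
    state["model_version"] = request.json["version"]
    return jsonify({"model_version": state["model_version"]})

if __name__ == "__main__":
    app.run(port=5000)
```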
When the system running state instruction given by the user is the inference instruction, the system running flow in this embodiment further includes:
Flow 1: the user prepares the data to be predicted and organizes all of the data to be predicted in the same folder;
Flow 2: the storage address of the root folder of the data to be predicted is entered on the front-end page; the data-to-be-predicted management component identifies all image data under the current address as the data information to be predicted and stores the absolute storage paths of all the data to be predicted in a data-storage-address array;
Flow 3: the data-to-be-predicted management component transmits the array obtained in flow 2 as input data to the inference service management component;
Flow 4: the inference service management component receives the inference instruction issued by the system running state and starts the inference service access interface, the data-to-be-predicted management component inputs the data-storage-address array, and the inference service management component starts calling the inference service script management component to run the inference service;
Preferably, the inference service management component uses a loop to traverse the data-storage-address array and, at each iteration, obtains one image data address as the inference service call parameter for the current moment (see the sketch below, after flow 6);
Optionally, an inference service frequency (inference frequency) is input to the system; when the instantiated data to be predicted is video data, the inference frequency determines the interval, in frames, at which video frames are sampled each second;
Flow 5: the inference script management component receives the call parameter, obtains the image information at the data address to be predicted contained in the parameter, generates the inference service method from the combined models, and analyzes the concentration of the data according to the inference rules;
Preferably, after the inference service is completed, the label information of all students identified by face recognition in the current data image is stored in a student identity information array; in this embodiment the label information is the student status number, and the student status number is used to query the student information database for the student's identity details, such as the student's name;
Preferably, after the inference service is completed, all student expression recognition results in the current data image are stored in an expression recognition result array and all student posture recognition results are stored in a posture recognition result array; the module analyzes the model inference service recognition results using the concentration analysis method to obtain the classroom concentration information of all students in the current image data;
Preferably, this component records the detailed identity information and classroom concentration information of each student into a classroom concentration database.
Flow 6: the classroom concentration database sends the associated and stored student identity information and classroom concentration information to the front-end interface for display.
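The traversal described in flow 4 and the optional inference frequency can be sketched as follows; `run_inference` stands in for the combined-model call, and the use of OpenCV for reading video frames is an assumption of this sketch.

```python
# Sketch of the inference service loop over the data-storage-address array.
import os
import cv2  # used only for the optional video-frame sampling case

def run_inference(image):          # hypothetical placeholder for the combined-model call
    return {"concentration": None}

def collect_addresses(data_root):
    """Flow 2: identify every file under the root folder as data to be predicted."""
    return [os.path.join(dirpath, name)
            for dirpath, _, names in os.walk(data_root)
            for name in names]

def inference_loop(address_array, inference_freq=1):
    results = []
    for path in address_array:                       # flow 4: traverse the address array
        if path.lower().endswith((".mp4", ".avi")):
            cap = cv2.VideoCapture(path)
            fps = cap.get(cv2.CAP_PROP_FPS) or 25
            interval = max(1, int(fps // inference_freq))  # frames skipped per sampled frame
            idx = 0
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                if idx % interval == 0:              # sample at the requested inference frequency
                    results.append(run_inference(frame))
                idx += 1
            cap.release()
        else:
            results.append(run_inference(cv2.imread(path)))
    return results
```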
When the system running state instruction given by the user is a training instruction, the system running flow in this embodiment further includes:
Flow 1: the user prepares training data according to the characteristics of the system's data set;
Flow 2: the user fills in the storage path address of the training data and the training parameters on the front-end page, and the front-end page transmits the training parameters as input data to the model training management component; the training parameters comprise the training model name and the model training type, the model training type being face recognition model training, human body posture recognition model training or teaching environment recognition model training; optionally, a historical model is selected for incremental training;
Flow 3: after the system receives the parameter information transmitted by the front end, the model training management component parses the model training type in the parameter information and uses the corresponding training method;
The correspondence between model training type and training method is: the face recognition model training method is used for face recognition model training, the teaching environment recognition model training method is used for teaching environment recognition model training, and the human body posture recognition model training method is used for human body posture recognition model training;
Flow 4: model training is completed, and model training completion information is displayed on the front-end page.
When the user selects the model training type as the face recognition model training, the embodiment of the face recognition model training method further comprises the following steps:
Flow 1: the user prepares a data set according to the student face image data set relation diagram given in fig. 4; in this embodiment, clear facial feature images of the students and standard bareheaded student status photographs are used as the student face image training data, and student status numbers are used as the unique identity label information of the students. Labeling uses the student status number as the folder name; the data set is organized in a directory structure in which the face images of each student are stored in that student's folder, and the image folders of all students are stored in a root folder;
Flow 2: the face recognition model training program parses all image data under the data set storage address given in the training parameter information, normalizes the student face image data into matrix arrays that can be computed by the neural network algorithm, and stores the student status numbers as a label matrix array;
Flow 3: the training program counts the number of samples and dynamically adjusts the batch size and the number of epochs to be trained according to the hardware resources. This embodiment takes two facial images per student from data of 10 students as an example. The training program obtains the graphics processing unit (GPU) video memory size and the system memory size from the hardware device information of the currently running system, and calculates the batch size by dividing the memory occupied by the image data matrix by the GPU video memory size or the system memory size; if this ratio is greater than 2, the batch size is set to the number of images, and in this embodiment the batch size is 200. When the ratio of the batch size to the system resources is greater than 1, the program sets the epoch value to 5 by default; when the ratio is less than 1 and greater than 0, the epoch value is updated to the first digit after the decimal point multiplied by 5; in this example the epoch value is 5 (a minimal sketch of this computation is given below, after flow 7);
Flow 4: the training program randomly selects a batch-size number of samples from the image data matrix, uses the MTCNN algorithm to identify the coordinate positions of the faces in the data set samples, and resizes each face image to a fixed size, for example 32 pixels wide and 32 pixels high; it can be understood that the recognition model can only accept images of consistent size as input data, so this flow adjusts all face images to a uniform size through the alignment operation;
Flow 5: the aligned images are input into the FaceNet algorithm, and face feature vectors are extracted using the convolutional layers of the neural network;
Flow 6: a neural network feature normalization layer maps the face feature vectors extracted in flow 5 into a Euclidean space, in which the spatial distance between face feature vectors of the same person in different images is small and the spatial distance between face feature vectors of different persons is large; a triplet loss function, an Adam optimizer and stochastic gradient descent are used to supervise the training of the neural network;
Flow 7: based on the epoch value in flow 3, model training ends after the training program completes the last epoch.
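A minimal sketch of the batch-size and epoch rule described in flow 3 is given below; the function names, the fallback default batch size and the example numbers are assumptions made to fill in details the description leaves open.

```python
# Illustrative reconstruction of the batch-size / epoch rule in flow 3.
# `available_memory_bytes` would come from the GPU video memory (or system RAM)
# reported by the running hardware; here it is simply passed in as a parameter.

def first_decimal_digit(x):
    """First digit after the decimal point, e.g. 0.37 -> 3."""
    return int(abs(x) * 10) % 10

def choose_batch_size(data_matrix_bytes, available_memory_bytes, num_images,
                      default_batch=32):
    ratio = data_matrix_bytes / available_memory_bytes
    if ratio > 2:
        # Data much larger than memory: fall back to the number of images, per the text.
        return num_images
    # The description leaves the other branch open; a fixed default is assumed here.
    return default_batch

def choose_epochs(batch_size, resource_units):
    ratio = batch_size / resource_units
    if ratio > 1:
        return 5                         # default stated in the description
    if 0 < ratio < 1:
        return first_decimal_digit(ratio) * 5
    return 5

# Example with made-up numbers (not taken from the embodiment):
bs = choose_batch_size(data_matrix_bytes=8e9, available_memory_bytes=2e9, num_images=200)
ep = choose_epochs(bs, resource_units=180)
print(bs, ep)   # -> 200 5
```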
When the user selects the model training type as teaching environment recognition model training, the teaching environment recognition model training method further comprises the following steps:
Flow 1: in this embodiment, annotation labels such as "classroom environment", "teacher environment" and "tiered classroom environment" are used as the specification types, and the pixel coordinate positions of fixed objects appearing in the teaching environment images, such as "seat" and "chair", are recorded in a text file by manual annotation. The text file of object information is called the object annotation file. The organization of the images, specification types and object annotation files can take various forms; this embodiment adopts the organization used for the face recognition image data set: one root folder contains all images and object annotation files, the second-level folders are named after the specification types, the corresponding teaching environment images are stored in the specification type folders, and all object information is recorded in one text file;
Flow 2: the teaching environment recognition model training program parses all image data under the data set storage address in the training parameter information, normalizes the teaching environment image data into matrix data types that can be computed by the neural network algorithm, and normalizes the annotation information in the annotation files into label matrix data types;
Flow 3: the teaching environment recognition training program calculates the batch size and epoch in the same way as the face recognition training program;
Flow 4: the teaching environment recognition training program randomly selects a batch-size number of samples from the image data matrix, trains on the images using a ResNet neural network, trains the teaching environment categories and the object categories using the sampling mode given in fig. 6, and encodes the object features obtained from the image before merging them;
The object feature encoding means that a fully connected layer of the neural network is used to merge all the object feature vectors identified in an image (a rough sketch of this fusion is given below, after flow 5);
Flow 5: based on the epoch value in flow 3, model training ends after the training program completes the last epoch.
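Flow 4's combination of image features with encoded object features through a fully connected layer might look roughly like the following PyTorch sketch; the feature dimensions, the torchvision resnet18 backbone and the fixed-length object padding are assumptions of this sketch, not details given in the embodiment.

```python
# Rough PyTorch sketch of flow 4: a ResNet backbone for the teaching environment image
# plus a fully connected layer that merges the encoded object features.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class TeachingEnvNet(nn.Module):
    def __init__(self, num_scene_classes=3, num_object_classes=10,
                 max_objects=20, obj_dim=16):
        super().__init__()
        self.backbone = resnet18()
        self.backbone.fc = nn.Identity()          # 512-d image feature
        self.obj_embed = nn.Embedding(num_object_classes + 1, obj_dim)  # +1 reserved for padding
        # "Object feature encoding": one fully connected layer merges all object vectors.
        self.obj_fc = nn.Linear(max_objects * obj_dim, 128)
        self.classifier = nn.Linear(512 + 128, num_scene_classes)

    def forward(self, image, object_ids):
        # image: (B, 3, H, W); object_ids: (B, max_objects), padded with the padding id
        img_feat = self.backbone(image)                       # (B, 512)
        obj_feat = self.obj_embed(object_ids).flatten(1)      # (B, max_objects * obj_dim)
        obj_feat = torch.relu(self.obj_fc(obj_feat))          # merged object feature
        return self.classifier(torch.cat([img_feat, obj_feat], dim=1))

# Example forward pass with random data:
net = TeachingEnvNet()
logits = net(torch.randn(2, 3, 224, 224), torch.zeros(2, 20, dtype=torch.long))
print(logits.shape)   # torch.Size([2, 3])
```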
When the user selects the model training type as the human body posture recognition model training, the human body posture recognition model training method further comprises the following steps:
Flow 1: the user prepares a human body posture image data set; the annotations used in this embodiment are: A. bounding box coordinates of the human body; B. coordinates of 14 human body key points (left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle and right ankle); C. the face bounding box; D. coordinates of the face key points;
Optionally, a teaching environment model ME is additionally used;
Flow 2: the human body posture recognition training program calculates the batch size and epoch in the same way as the face recognition training program;
Flow 3: a human body detection model M1 is trained on the annotation data A using the object detection algorithm YOLO; the model takes a picture to be detected as input and outputs all detected human body bounding boxes;
Flow 4: a human body key point detection model M2 is trained on the annotation data A and B using the key point detection algorithm HRNet (a sketch of the resulting two-stage pipeline is given below, after flow 5);
Flow 5: based on the epoch value in flow 2, model training ends after the training program completes the last epoch.
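The two-stage pipeline trained in flows 3 and 4 (person detection with a YOLO-style model, then per-person key points with an HRNet-style model) can be sketched as follows; both model wrappers are hypothetical placeholders rather than real library calls, and the image is assumed to be a NumPy-style array indexable by pixel coordinates.

```python
# Sketch of the two-stage posture pipeline: M1 proposes human bounding boxes,
# M2 predicts 14 key points per cropped person. The model wrappers are hypothetical.

KEYPOINT_NAMES = [
    "left_ear", "right_ear", "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_hip", "right_hip", "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]

def detect_poses(image, person_detector, keypoint_model):
    """Return one keypoint record per detected person in the image."""
    results = []
    for box in person_detector.predict(image):        # M1: (x1, y1, x2, y2) boxes
        x1, y1, x2, y2 = box
        crop = image[y1:y2, x1:x2]                     # local picture containing one person
        points = keypoint_model.predict(crop)          # M2: 14 (x, y, visibility) tuples
        results.append({
            "bbox": box,
            "keypoints": dict(zip(KEYPOINT_NAMES, points)),
        })
    return results
```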
When the system operation state instruction given by the user is the inference instruction, the implementation of the concentration analysis method in this embodiment further includes:
Flow 1: the inference service script management component obtains the inference service results of each model;
Flow 2: based on the human body posture recognition model inference result, the post-processing model M3 is applied to the output of M2 to obtain the corresponding posture angles, and the detection result of the ME is used to correct the output;
Flow 3: a face key point detection model M4 is trained on the annotation data C and D using the face key point detection algorithm PFLD; the model takes a local picture containing a single human body as input and outputs all the face key point coordinates and the Euler angles of the face orientation;
optionally, the output is rectified using the detection result of the ME;
Flow 4: according to the human body posture recognition model inference results, all human body bounding boxes, human body key point coordinates, corrected human body joint angles, face key point coordinates and corrected face orientation Euler angles in the predicted image are output. Preferably, the human body bounding box is used as the basis for distinguishing individuals, and the visibility and angles of the shoulder, elbow and wrist are used to judge the arm posture, which is classified as "raising a hand", "placed on the desktop", "placed under the desk" or "other"; "raising a hand" and "placed on the desktop" score positively, "placed under the desk" scores negatively, and "other" scores 0. Preferably, the leg posture is judged from the visibility and angles of the hip, knee and ankle, and is classified as a sitting posture or a standing posture;
Preferably, the leg posture is used to confirm the state of the current classroom and to distinguish teachers from students, and is not used for individual concentration scoring. Preferably, the face orientation is judged from the visibility of the face key points and the Euler angles of the face orientation. A spatial rectangular coordinate system is established with the direction in the teaching environment plane facing the blackboard as the longitudinal axis, the horizontal direction of the teaching environment plane as the lateral axis, and the vertical direction of the teaching environment space as the vertical axis. The standard direction is the orientation in which a student normally looks straight ahead. When no teacher can be detected, the face orientation range is judged against the standard direction; when a teacher is detected, the face orientation range is judged against the standard direction or the direction from the student toward the teacher. A yaw angle within ±45 degrees, a pitch angle within ±60 degrees and a roll angle within ±60 degrees score positively; otherwise the score is negative. The score is 0 when the orientation cannot be determined because face key points are missing. The concentration threshold operator in the concentration analysis method takes the sum of all these scores, combined by weighted averaging, as the student's concentration estimation optimization value.
Flow 5: based on the face recognition model inference service result, the expression recognition model can detect six expressions presented by the aligned face image: happy, neutral (expressionless), sad, disgust, anger and surprise. Students' facial expressions are generally influenced objectively by the classroom environment and subjectively by personal concentration, so when expressions are used as concentration analysis parameters, the expressions of all students at the same moment are taken by the expression threshold operator mentioned in the concentration analysis method as the optimization reference value. In an example scenario, when all students present a probability of the sad expression higher than that of the other expressions while one student presents a probability of the happy expression higher than that of the other five expressions, the expression threshold operator optimizes the concentration emotion evaluation value of the student making the happy expression, the optimization being that this student's score at that moment is lower than that of the other students;
Flow 6: this embodiment uses a 100-point scale and performs concentration analysis at different moments during the whole class. Suppose concentration is detected 10 times during one class, so a single detection carries a weight of 10%. The six expression scores sum to 100 points, the posture score is likewise normalized to 100 points, and the example takes a class of 50 students. At a certain moment, the six expression scores of student A are happy 90, neutral 2, sad 2, disgust 2, anger 2 and surprise 2, while the average expression scores of the other classmates are happy 2, neutral 90, sad 2, disgust 2, anger 2 and surprise 2. At this moment the expression threshold operator judges that the difference between student A and the expression benchmark at the current moment is greater than 60%. Calculating with expression contributing 40% and posture contributing 60% of the concentration, the expression of student A deviates severely from the benchmark value, namely neutral 90, the expression with the highest average score among the other students. The expression threshold operator adopts a penalty mechanism for the optimization: the expression score is (100 minus the deviation) multiplied by 40%, i.e. (100 − 88) × 40%, giving a final expression score of 4.8. If the posture threshold operator scores the posture as negative, the final score of student A for this detection is 4.8 points. The current score accounts for only 10% of the total score, so if student A scores 90 in the remaining 9 detections, student A's final concentration score for the current class is 90 × 90% + 4.8 = 85.8. Using a grading standard of 100–80 as excellent, 80–60 as good and below 60 as poor, the overall concentration of student A in the current class is excellent (a minimal sketch of this calculation is given below);
Flow 7: the final classroom concentration analysis value of each student is output through the classroom concentration database to the front-end page for the user to preview;
alternatively, the user may query the classroom concentration database for student historical concentration information via the front-end page.
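To make the scoring in flows 4 to 6 concrete, the sketch below reproduces the worked example for student A under the stated weights (expression 40%, posture 60%, 10 detections per class). The helper names, the treatment of the negative posture as contributing nothing, and the comparison formula (100 minus the deviation from the benchmark, then weighted) are reconstructions of the description rather than a verified implementation.

```python
# Illustrative reconstruction of the scoring in flows 4-6.
ARM_SCORES = {"raising a hand": +1, "on the desktop": +1, "under the desk": -1, "other": 0}

def face_orientation_positive(yaw, pitch, roll):
    """Flow 4: orientation scores positively inside the +/-45, +/-60, +/-60 degree ranges."""
    return abs(yaw) < 45 and abs(pitch) < 60 and abs(roll) < 60

def expression_score(student_probs, class_avg_probs, benchmark_key):
    """Flow 6 penalty: (100 - deviation from the benchmark expression) * 40%."""
    deviation = abs(class_avg_probs[benchmark_key] - student_probs[benchmark_key])
    return (100 - deviation) * 0.4

# Worked example from the embodiment (student A versus the class average):
student_a = {"happy": 90, "neutral": 2, "sad": 2, "disgust": 2, "anger": 2, "surprise": 2}
class_avg = {"happy": 2, "neutral": 90, "sad": 2, "disgust": 2, "anger": 2, "surprise": 2}

expr = expression_score(student_a, class_avg, benchmark_key="neutral")  # (100 - 88) * 0.4
pose = 0.0                          # posture judged negative in the example, contributing nothing
detection_score = round(expr + pose, 1)                                 # 4.8 for this detection

# The remaining 9 of 10 detections score 90 (90% of the weight), per the embodiment's arithmetic:
final = round(90 * 0.9 + detection_score, 1)
grade = "excellent" if final >= 80 else "good" if final >= 60 else "poor"
print(detection_score, final, grade)                                    # 4.8 85.8 excellent
```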
The present invention does not limit the specific values of the weight proportions and the detection frequency mentioned in this embodiment; the thresholds should be adjusted according to the actual conditions arising from different teaching environments. For example, a physical education class that requires a large amount of body movement and a teaching environment that only requires listening and taking notes should be assigned different weights for the posture score.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for detecting classroom concentration, which is characterized by comprising the following steps:
acquiring posture data and facial expression data of each student from a current teaching scene image based on a concentration degree recognition model;
generating a posture optimization benchmark in real time based on the posture data of each student;
generating an expression optimization benchmark in real time based on facial expression data of each student;
calculating an initial concentration degree evaluation value of each student based on the posture optimization reference, the expression optimization reference and the posture data and the facial expression data corresponding to each student;
and screening out target students of which the initial concentration evaluation values do not meet preset conditions.
2. The method of claim 1, wherein the step of generating pose optimization benchmarks in real-time based on pose data of individual students comprises:
evaluating the posture data of each student in real time to obtain the posture evaluation value of each student, and taking the average value of the posture evaluation values of all students as a posture optimization reference;
the step of generating an expression optimization benchmark in real time based on facial expression data of each student comprises the following steps:
evaluating the facial expression data of each student in real time to obtain the expression evaluation value of each student, and taking the average value of the expression evaluation values of all students as an expression optimization reference;
accordingly, the step of calculating an initial concentration degree evaluation value of each student based on the posture optimization reference, the expression optimization reference, and the posture data and facial expression data corresponding to each student includes:
and calculating an initial concentration degree evaluation value of each student based on the posture optimization reference, the expression optimization reference and the posture evaluation value and the expression evaluation value corresponding to each student.
3. The method of claim 2, wherein the step of calculating an initial concentration evaluation value for each student based on the pose optimization reference, the expression optimization reference, and the pose evaluation value and the expression evaluation value corresponding to each student comprises:
comparing the posture evaluation value of each student with the posture optimization reference to obtain the concentration degree posture evaluation value of each student;
comparing the expression evaluation value of each student with the expression optimization reference to obtain the concentration expression evaluation value of each student;
acquiring a preset weighting proportion of the concentration degree posture evaluation value and the expression evaluation value in the concentration degree evaluation value;
calculating an initial concentration degree evaluation value of each student according to the preset weighting proportion, the concentration degree posture evaluation value corresponding to each student and the concentration degree expression evaluation value;
correspondingly, the step of screening out the target students whose initial concentration evaluation values do not meet the preset conditions comprises the following steps:
respectively comparing the initial concentration evaluation value of each student with the current standard score, and screening out target students of which the initial concentration evaluation values are smaller than the current standard score; wherein the current standard score is generated in real time based on the pose optimization reference and the expression optimization reference.
4. The method of claim 3, wherein after the step of screening out target students whose initial concentration ratings are less than the current standard score, further comprising:
determining posture data and facial expression data of the target student;
and generating the target student concentration correction data according to the difference value between the initial concentration evaluation value of the target student and the current standard score so as to correct the initial concentration evaluation value of the target student.
5. The method of any one of claims 1-4, wherein the concentration recognition model comprises a preset teaching environment recognition model, a preset body pose recognition model, a preset expression recognition model, and a preset face recognition model;
the step of acquiring the posture data and the facial expression data of each student from the current teaching scene image based on the concentration degree recognition model specifically comprises the following steps:
identifying a current scene based on the preset teaching environment identification model to obtain a current teaching scene image;
carrying out face recognition on each student in the current teaching scene image based on the preset face recognition model;
image cutting is carried out on each student in the current teaching scene image according to a face recognition result, and a cut individual face image of the student is obtained;
obtaining facial expression data of each student by the preset expression recognition model based on the individual face image of each student;
and acquiring posture data of each student from the current teaching scene image based on the preset human body posture recognition model.
6. The method of claim 5, wherein the step of obtaining facial expression data of each student by the preset expression recognition model based on the individual face image of each student comprises:
obtaining the expression type of the student by the preset expression recognition model based on the individual face image of the student, and obtaining the facial expression data of the student based on the expression type; the expression types include at least a happy expression type, a neutral expression type, a sad expression type, a disgusted expression type, an angry expression type, and a surprised expression type.
7. The method of claim 5, wherein the preset human pose recognition model comprises a neural network training program;
the neural network training program means that batch images and annotation information are used as input and output data fed into the neural network algorithm; the error between the predicted value and the true value of the neural network, namely the loss function, is used as the index for measuring the effect of the model; training is carried out iteratively by gradient descent; the loss function falling below a threshold value, or the change amplitude of the neural network parameters reaching a threshold value, is used as the criterion of training convergence, at which point training ends and the preset human body posture recognition model is finally obtained.
8. The method of claim 7, wherein the image and annotation information comprises human bounding boxes, human keypoint coordinates, human face keypoint coordinates; the human body boundary box is used for monitoring key points of a human body; the human body key point coordinates comprise coordinates of the human body important part points in a two-dimensional image rectangular coordinate system; the coordinates of the key points of the human face comprise the coordinates of the important points of the human face in a rectangular coordinate system of the two-dimensional image.
9. The method as claimed in claim 5, wherein the step of performing face recognition on each student in the current teaching scene image based on the preset face recognition model specifically comprises:
sending the current teaching scene image into the preset face recognition model, and outputting identity information data to be verified of each student in the current teaching scene image;
and comparing the identity information data to be verified of each student with prestored student information in a student information database respectively, wherein if the identity information data to be verified is consistent with prestored student information in the student information database, the student corresponding to the identity information data to be verified is verified successfully.
10. A classroom concentration detection apparatus, the apparatus comprising:
the data acquisition module is used for acquiring the posture data and the facial expression data of each student from the current teaching scene image based on the concentration degree recognition model;
the concentration degree analysis module is used for generating a posture optimization benchmark in real time based on the posture data of each student; generating an expression optimization benchmark in real time based on facial expression data of each student; calculating an initial concentration degree evaluation value of each student based on the posture optimization reference, the expression optimization reference and the posture data and the facial expression data corresponding to each student;
and the detection module is used for screening out the target students of which the initial concentration evaluation values do not meet the preset conditions.
CN202010676538.XA 2020-07-14 2020-07-14 Classroom concentration degree detection method and device Pending CN111931585A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010676538.XA CN111931585A (en) 2020-07-14 2020-07-14 Classroom concentration degree detection method and device


Publications (1)

Publication Number Publication Date
CN111931585A true CN111931585A (en) 2020-11-13

Family

ID=73313047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010676538.XA Pending CN111931585A (en) 2020-07-14 2020-07-14 Classroom concentration degree detection method and device

Country Status (1)

Country Link
CN (1) CN111931585A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013039062A1 (en) * 2011-09-15 2013-03-21 国立大学法人大阪教育大学 Facial analysis device, facial analysis method, and memory medium
CN107122789A (en) * 2017-03-14 2017-09-01 华南理工大学 The study focus analysis method of multimodal information fusion based on depth camera
CN108805009A (en) * 2018-04-20 2018-11-13 华中师范大学 Classroom learning state monitoring method based on multimodal information fusion and system
CN109815795A (en) * 2018-12-14 2019-05-28 深圳壹账通智能科技有限公司 Classroom student's state analysis method and device based on face monitoring
CN111353363A (en) * 2019-08-19 2020-06-30 深圳市鸿合创新信息技术有限责任公司 Teaching effect detection method and device and electronic equipment
CN111046823A (en) * 2019-12-19 2020-04-21 东南大学 Student classroom participation degree analysis system based on classroom video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liao Enhong et al.: "Research on Big Data Analysis and Teaching Decision-Making for Video-Based Behavior Recognition", Engineering and Technological Research, no. 12 *
Chen Jingying et al.: "Intelligent Analysis of Students' Learning Interest in a Classroom Teaching Environment", E-education Research, no. 08 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487928A (en) * 2020-11-26 2021-03-12 重庆邮电大学 Classroom learning condition real-time monitoring method and system based on feature model
CN112861650A (en) * 2021-01-19 2021-05-28 北京百家科技集团有限公司 Behavior evaluation method, device and system
CN112819665A (en) * 2021-01-29 2021-05-18 上海商汤科技开发有限公司 Classroom state evaluation method and related device and equipment
CN113283334A (en) * 2021-05-21 2021-08-20 浙江师范大学 Classroom concentration analysis method and device and storage medium
CN113239841B (en) * 2021-05-24 2023-03-24 桂林理工大学博文管理学院 Classroom concentration state detection method based on face recognition and related instrument
CN113239841A (en) * 2021-05-24 2021-08-10 桂林理工大学博文管理学院 Classroom concentration state detection method based on face recognition and related instrument
CN113256129A (en) * 2021-06-01 2021-08-13 南京奥派信息产业股份公司 Concentration degree analysis method and system and computer readable storage medium
CN113657146A (en) * 2021-06-30 2021-11-16 北京惠朗时代科技有限公司 Low-consumption identification method and device for non-concentration learning of students based on single image
CN113657146B (en) * 2021-06-30 2024-02-06 北京惠朗时代科技有限公司 Student non-concentration learning low-consumption recognition method and device based on single image
CN113393160A (en) * 2021-07-09 2021-09-14 北京市博汇科技股份有限公司 Classroom concentration analysis method and device, electronic equipment and medium
CN114419711A (en) * 2022-01-19 2022-04-29 成都节节高教育科技有限公司 Identity recognition method based on AI education system
CN115019220A (en) * 2022-04-19 2022-09-06 北京拙河科技有限公司 Posture tracking method and system based on deep learning
CN115100007A (en) * 2022-08-24 2022-09-23 可可乐博(深圳)科技有限公司 Online teaching management method and system based on artificial intelligence
CN117152688A (en) * 2023-10-31 2023-12-01 江西拓世智能科技股份有限公司 Intelligent classroom behavior analysis method and system based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN111931585A (en) Classroom concentration degree detection method and device
CN106139564A (en) Image processing method and device
US20230119593A1 (en) Method and apparatus for training facial feature extraction model, method and apparatus for extracting facial features, device, and storage medium
CN110728220A (en) Gymnastics auxiliary training method based on human body action skeleton information
CN111191599A (en) Gesture recognition method, device, equipment and storage medium
KR102377561B1 (en) Apparatus and method for providing taekwondo movement coaching service using mirror dispaly
CN111401318B (en) Action recognition method and device
CN113144540A (en) Intelligent safe fitness guidance device, system and method
WO2017161734A1 (en) Correction of human body movements via television and motion-sensing accessory and system
CN111931869B (en) Method and system for detecting user attention through man-machine natural interaction
CN112541529A (en) Expression and posture fusion bimodal teaching evaluation method, device and storage medium
CN111027403A (en) Gesture estimation method, device, equipment and computer readable storage medium
CN112966574A (en) Human body three-dimensional key point prediction method and device and electronic equipment
CN112949622A (en) Bimodal character classification method and device fusing text and image
CN113870395A (en) Animation video generation method, device, equipment and storage medium
CN115205764B (en) Online learning concentration monitoring method, system and medium based on machine vision
CN111814733A (en) Concentration degree detection method and device based on head posture
CN111368768A (en) Human body key point-based employee gesture guidance detection method
CN114783043B (en) Child behavior track positioning method and system
CN107578015B (en) First impression recognition and feedback system and method based on deep learning
CN112329571B (en) Self-adaptive human body posture optimization method based on posture quality evaluation
CN116935441A (en) Image detection method and device
CN116386136A (en) Action scoring method, equipment and medium based on human skeleton key points
CN115984956A (en) Man-machine cooperation student classroom attendance multi-mode visual analysis system
CN108596068A (en) A kind of method and apparatus of action recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination