CN115311606B - Classroom recorded video validity detection method - Google Patents
- Publication number
- CN115311606B CN115311606B CN202211219364.XA CN202211219364A CN115311606B CN 115311606 B CN115311606 B CN 115311606B CN 202211219364 A CN202211219364 A CN 202211219364A CN 115311606 B CN115311606 B CN 115311606B
- Authority
- CN
- China
- Prior art keywords
- video
- detection
- human body
- classroom
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
- G06V20/36—Indoor scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Electrically Operated Instructional Devices (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a method for detecting the validity of classroom recorded video, comprising the following steps: segmenting the video to be analyzed and performing audio detection on each video segment, executing the subsequent steps only when the audio requirements are met; performing video detection, text detection, human body posture detection and human face detection in sequence on each segment that passes audio detection; obtaining the duration occupied by human voice in the video, the number of times text appears, the maximum number of human bodies detected, and the number of times a face is detected; and classifying the detection result of each video segment with a trained decision tree, classifying the whole video from the segment-level results, and finally judging the validity of the whole video. By combining the two key sources of evidence, audio content and video content, and scoring the validity of classroom content with a decision-tree analysis, the invention helps screen out valid classes and reduces the resource waste caused by retaining and processing invalid ones.
Description
Technical Field
The invention relates to the technical field of video analysis, and in particular to a method for detecting the validity of classroom recorded video.
Background
Each classroom in a school automatically records its teaching sessions according to the school timetable, and the school keeps the valid teaching videos. However, for various reasons some classes may not take place as scheduled: the teacher may be on leave, the time or place of the class may change, the students may be in self-study, or everyone may be attending online from home during an epidemic. Such recordings become meaningless once stored, and considerable resources are wasted retaining and processing these invalid classes. How to detect and judge the validity of classroom recorded video is therefore a problem that needs to be addressed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for detecting the validity of classroom recorded video, solving the problem that the validity of such recordings could not previously be detected automatically.
The purpose of the invention is achieved by the following technical scheme. A method for detecting the validity of classroom recorded video comprises the following steps:
S1, segmenting the video to be analyzed at fixed time intervals, and first performing audio detection on each video segment; the subsequent steps are executed only if the audio requirements are met, otherwise detection of that segment ends;
S2, performing video detection, text detection, human body posture detection and human face detection in sequence on each segment that meets the audio detection requirement;
S3, after the above detections are finished, obtaining the duration occupied by human voice in the video, the number of times text appears in the video, the maximum number of human bodies detected, and the number of times a face is detected;
S4, classifying the detection result of each video segment with a trained decision tree, classifying the whole video based on the segment-level classification results, and finally judging the validity of the whole video.
The audio detection comprises: detecting the audio of each segment with a voice detection algorithm to locate the position and duration of human voice in each audio track; if no voice appears in the current video segment, it is directly judged to be invalid classroom content and the subsequent steps are skipped.
The video detection comprises: acquiring the data of one frame of the video and comparing it with the next frame to compute the position and size of the regions that differ between the two pictures; if the current frame shows no change, detection continues with the next frame; if it does change, there is activity in the classroom.
The text detection comprises: cropping the middle region (a proportion N) of the current frame, detecting text with a pre-trained text detection model, and recording the time and position of any detected text.
The human body posture detection comprises: detecting human bodies in the current frame with a posture detection method to judge whether anyone is present in the classroom picture; if postures are detected, the positions of all valid postures are recorded.
The face detection comprises: judging whether a face oriented toward the student seats appears in the current frame; if so, that person is taken to be the teacher and the position of the frontal face is recorded; if not, the bodies found by posture detection are taken to be students, indicating a self-study session.
The detection method further comprises: computing, from a large amount of valid and invalid classroom video data, a data set of four quantities (the duration occupied by human voice, the number of times text appears, the maximum number of human bodies detected, and the number of times a face is detected) together with a label of whether each video is a valid teaching video, and training a classification decision tree with the CART algorithm.
The invention has the following advantages. The method integrates the detection of human voice, of dynamic change in the video, and of text, human bodies and faces to judge whether a recorded classroom video is a valid teaching video. By combining the two key sources of evidence, audio content and video content, and scoring classroom validity with a decision-tree analysis, valid classes can be screened out, reducing the resource waste caused by invalid classes occupying storage and processing capacity.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawing, but the scope of protection of the invention is not limited to the following.
As shown in FIG. 1, the invention is a classroom recorded video validity detection method that applies multiple feature detectors to the audio and video of each segment in turn, scoring each segment to identify those that are likely to show a teacher giving a normal lecture. Whether the whole video is a valid teaching video is then judged from the results of all segments. The method comprises the following steps:
Step one: segment the video to be analyzed at fixed time intervals, for example ten minutes, then extract the audio and video of each segment and detect and score each segment independently. Segmenting avoids a long quiet period somewhere in the class (such as an in-class examination) unduly influencing the overall result.
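The fixed-interval segmentation of step one can be sketched as follows. The function name and the ten-minute default are illustrative assumptions, not the patent's notation; actual audio/video extraction (e.g. with ffmpeg) would then use the returned boundaries.

```python
def segment_intervals(total_seconds, segment_seconds=600):
    """Split a recording's duration into fixed-length (start, end) intervals.

    The final interval may be shorter than segment_seconds when the
    recording length is not an exact multiple. Times are in seconds.
    """
    if segment_seconds <= 0:
        raise ValueError("segment length must be positive")
    intervals = []
    start = 0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        intervals.append((start, end))
        start = end
    return intervals
```

A 45-minute class, for instance, yields four full ten-minute segments plus one five-minute remainder.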
Step two: in audio detection, a voice detection algorithm locates the position and duration of human voice in each audio segment. If no voice appears in the current segment, it is directly judged to be invalid classroom content and no further detection is performed; otherwise, proceed to the video detection of step three.
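The patent does not name a particular voice detection algorithm; the following minimal energy-gate stand-in only illustrates how the total voiced duration of step two could be accumulated per segment (a real system would use a proper VAD such as webrtcvad).

```python
def voiced_duration(frames, energy_threshold=0.01, frame_seconds=0.02):
    """Estimate how long human voice is present in an audio segment.

    `frames` is a list of short frames, each a list of samples in [-1, 1].
    A frame is counted as voiced when its mean squared energy exceeds the
    threshold; the threshold and 20 ms frame length are assumptions.
    """
    voiced = 0
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)
        if energy > energy_threshold:
            voiced += 1
    return voiced * frame_seconds
```

A segment whose voiced duration is zero would be rejected before the video-content steps run.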
However, the presence of voices alone is not sufficient to judge whether the recording is valid: voices also occur in other situations, for example students talking during a self-study session, or cleaning staff conversing while cleaning the classroom. Detection of the video content is therefore also needed to comprehensively judge whether the video shows a teacher giving a lecture.
Step three: in video detection, a motion detection algorithm finds the changed parts of the picture. Specifically, the data of the first frame is obtained and compared pixel by pixel with the next frame, computing the position and size of the regions that differ. If the current frame shows no change, detection continues with the next frame; if it does change, there is activity in the classroom, but whether that activity is teaching, a change of lighting, or something else requires further analysis, so proceed to the next detection step.
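The frame comparison of step three can be sketched as a simple per-pixel difference that reports the bounding box of the changed region; the threshold value is an assumption, and a production system would more likely use OpenCV's `absdiff` on full frames.

```python
def changed_region(prev, curr, threshold=10):
    """Compare two grayscale frames (lists of rows of 0-255 ints) and
    return the bounding box (x, y, w, h) of pixels whose absolute
    difference exceeds the threshold, or None when nothing changed.
    """
    xs, ys = [], []
    for y, (row_p, row_c) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_p, row_c)):
            if abs(p - c) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no change: keep checking subsequent frames
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```

A `None` result means detection moves on to the next frame; any box means there is activity to analyze further.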
Step four: text detection checks whether the teacher's blackboard writing or projected slide text appears. If text appears in the video, the likelihood that the current segment is a valid teaching video increases. Because text that is meaningless for course analysis, such as the recording timestamp, may appear at the edges of the picture, only the middle eighty percent of the current frame is used. A pre-trained text detection model is applied to this region, and the time and position of any detected text are recorded. Then proceed to human body posture detection.
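Computing the central crop used in step four is straightforward; this helper returns the box covering the middle proportion of the frame (0.8 matching the description's eighty percent), which would then be fed to the text detection model.

```python
def central_crop_box(width, height, proportion=0.8):
    """Return the (left, top, right, bottom) box covering the central
    `proportion` of each dimension of a frame, so that edge text such as
    the recording timestamp is excluded from text detection.
    """
    margin_x = round(width * (1 - proportion) / 2)
    margin_y = round(height * (1 - proportion) / 2)
    return (margin_x, margin_y, width - margin_x, height - margin_y)
```

For a 1000x500 frame this keeps the region from (100, 50) to (900, 450).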
Step five: a human body posture method detects people in the current frame; this step judges whether anyone, likely the teacher and students in class, is present in the classroom. If postures are detected, the positions of all valid postures are recorded, and detection proceeds to the next step.
Step six: this step judges whether a face oriented toward the student seats appears. If such a face appears, that person is likely the teacher, and the position of the frontal face in the current frame is recorded. If no face faces the students, the people detected in the previous step are probably only students in self-study, and the video may not show a teacher giving a normal lecture.
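Steps five and six together amount to a small decision over the detector outputs. The label names below are illustrative, not the patent's terminology; the body count and frontal-face flag would come from the posture and face models.

```python
def label_activity(num_bodies, frontal_face_seen):
    """Combine the posture and face results for one frame: people plus a
    face oriented toward the student seats suggests a teacher-led class;
    people without such a face suggests self-study; no people means an
    empty room.
    """
    if num_bodies == 0:
        return "empty"
    return "teacher-led" if frontal_face_seen else "self-study"
```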
Step seven: from the preceding steps the following data are obtained: (1) the duration occupied by human voice in the video; (2) the number of times text appears in the video; (3) the maximum number of human bodies detected; and (4) the number of times a face is detected. These four quantities are computed over a large amount of valid and invalid classroom video data, yielding a data set labelled with whether each video is a valid teaching video. A classification decision tree is then trained with the CART algorithm, and the trained tree classifies the detection results of each video segment.
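In practice the CART tree of step seven would be trained with an existing implementation such as scikit-learn's `DecisionTreeClassifier(criterion="gini")`. As a sketch of what CART does at each node, the following finds the threshold on a single feature (say, voiced duration) that minimises the weighted Gini impurity of the two children; a full tree repeats this search recursively over all four features.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels, CART's splitting criterion."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best v <= t split of one
    feature. `values` are feature measurements, `labels` the 0/1 validity
    labels of the corresponding training videos.
    """
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best
```

With voiced durations [1, 2, 8, 9] labelled [0, 0, 1, 1], the purest split is at threshold 2, separating the classes exactly.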
Step eight: classify the whole video based on the segment-level classification results. A simple empirical threshold suffices: for example, if the proportion of segments judged to be valid teaching exceeds a certain empirical value (e.g., 60%-80% of the total number of segments), the whole video is judged to be a valid teaching video.
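The whole-video decision of step eight reduces to a ratio check over the per-segment labels; the 0.7 default below is an assumption chosen from inside the 60%-80% empirical range mentioned above.

```python
def video_is_valid(segment_labels, threshold=0.7):
    """Judge the whole recording from per-segment decision-tree outputs
    (True = segment classified as valid teaching). Returns True when the
    valid proportion reaches the empirical threshold.
    """
    if not segment_labels:
        return False  # no segments: nothing to validate
    return sum(segment_labels) / len(segment_labels) >= threshold
```

Four valid segments out of five (80%) passes; one out of three does not.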
The foregoing describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise forms disclosed herein, and that various other combinations, modifications and environments falling within the scope of the inventive concept, whether described above or apparent to those skilled in the relevant art, may be resorted to. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (2)
1. A classroom recorded video validity detection method, characterized in that the detection method comprises the following steps:
S1, segmenting the video to be analyzed at fixed time intervals, and first performing audio detection on each video segment; the subsequent steps are executed only if the audio requirements are met, otherwise detection of that segment ends;
S2, performing video detection, text detection, human body posture detection and human face detection in sequence on each segment that meets the audio detection requirement;
S3, after the above detections are finished, obtaining the duration occupied by human voice in the video, the number of times text appears in the video, the maximum number of human bodies detected, and the number of times a face is detected;
S4, classifying the detection result of each video segment with a trained decision tree, classifying the whole video based on the segment-level classification results, and finally judging the validity of the whole video;
the audio detection comprises: detecting the audio of each segment with a voice detection algorithm to locate the position and duration of human voice in each audio track; if no voice appears in the current video segment, it is directly judged to be invalid classroom content and the subsequent steps are skipped;
the video detection comprises: acquiring the data of one frame of the video and comparing it with the next frame to compute the position and size of the regions that differ between the two pictures; if the current frame shows no change, detection continues with the next frame; if it does change, there is activity in the classroom;
the text detection comprises: cropping the middle region (a proportion N) of the current frame, performing text detection with a pre-trained text detection model, and recording the time and position of any detected text;
the human body posture detection comprises: detecting human bodies in the current frame with a human body posture detection method to judge whether anyone is present in the classroom picture of the current frame; if postures are detected, the positions of all valid postures are recorded;
the face detection comprises: judging whether a face oriented toward the student seats appears in the current frame; if so, that person is taken to be the teacher and the position of the frontal face in the current frame is recorded; if not, the bodies found by posture detection are taken to be students, indicating that the current frame shows a self-study session.
2. The classroom recorded video validity detection method according to claim 1, characterized in that the detection method further comprises: computing, from a large amount of valid and invalid classroom video data, a data set of four quantities (the duration occupied by human voice in the video, the number of times text appears in the video, the maximum number of human bodies detected, and the number of times a face is detected) together with a label of whether each video is a valid teaching video, and training a classification decision tree with the CART algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211219364.XA CN115311606B (en) | 2022-10-08 | 2022-10-08 | Classroom recorded video validity detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211219364.XA CN115311606B (en) | 2022-10-08 | 2022-10-08 | Classroom recorded video validity detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115311606A CN115311606A (en) | 2022-11-08 |
CN115311606B true CN115311606B (en) | 2022-12-27 |
Family
ID=83866258
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211219364.XA Active CN115311606B (en) | 2022-10-08 | 2022-10-08 | Classroom recorded video validity detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115311606B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102547248A (en) * | 2012-02-03 | 2012-07-04 | 深圳锐取信息技术股份有限公司 | Multi-channel real-time monitoring single-video-file recording method |
CN102903265A (en) * | 2012-09-19 | 2013-01-30 | 河南智游网络技术有限公司 | Method for automatically recording classroom teaching |
CN103428461A (en) * | 2013-08-16 | 2013-12-04 | 北京中广上洋科技股份有限公司 | System and method for recording teaching video |
CN104539983A (en) * | 2014-12-26 | 2015-04-22 | 湖南亿谷信息科技发展有限公司 | Online class management system and method |
CN105893500A (en) * | 2016-03-30 | 2016-08-24 | 苏州点通教育科技有限公司 | Intelligent real-time class recording system and method |
CN109117731A (en) * | 2018-07-13 | 2019-01-01 | 华中师范大学 | A kind of classroom instruction cognitive load measuring system |
CN109637211A (en) * | 2019-01-22 | 2019-04-16 | 合肥市云联鸿达信息技术有限公司 | A kind of full-automatic recording and broadcasting system |
CN111429554A (en) * | 2020-03-26 | 2020-07-17 | 深圳壹账通智能科技有限公司 | Motion video data processing method and device, computer equipment and storage medium |
CN111563452A (en) * | 2020-05-06 | 2020-08-21 | 南京师范大学镇江创新发展研究院 | Multi-human body posture detection and state discrimination method based on example segmentation |
CN112055257A (en) * | 2019-06-05 | 2020-12-08 | 北京新唐思创教育科技有限公司 | Video classroom interaction method, device, equipment and storage medium |
CN112703478A (en) * | 2018-09-11 | 2021-04-23 | 华为技术有限公司 | Data sharing method, graphical user interface, electronic device and system |
CN113207033A (en) * | 2021-04-29 | 2021-08-03 | 读书郎教育科技有限公司 | System and method for processing invalid video clips recorded in intelligent classroom |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120002848A1 (en) * | 2009-04-16 | 2012-01-05 | Hill Daniel A | Method of assessing people's self-presentation and actions to evaluate personality type, behavioral tendencies, credibility, motivations and other insights through facial muscle activity and expressions |
US8873813B2 (en) * | 2012-09-17 | 2014-10-28 | Z Advanced Computing, Inc. | Application of Z-webs and Z-factors to analytics, search engine, learning, recognition, natural language, and other utilities |
CN110796005A (en) * | 2019-09-27 | 2020-02-14 | 北京大米科技有限公司 | Method, device, electronic equipment and medium for online teaching monitoring |
CN111754368A (en) * | 2020-01-17 | 2020-10-09 | 天津师范大学 | College teaching evaluation method and college teaching evaluation system based on edge intelligence |
- 2022-10-08: CN CN202211219364.XA patent CN115311606B/en (status: Active)
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102547248A (en) * | 2012-02-03 | 2012-07-04 | 深圳锐取信息技术股份有限公司 | Multi-channel real-time monitoring single-video-file recording method |
CN102903265A (en) * | 2012-09-19 | 2013-01-30 | 河南智游网络技术有限公司 | Method for automatically recording classroom teaching |
CN103428461A (en) * | 2013-08-16 | 2013-12-04 | 北京中广上洋科技股份有限公司 | System and method for recording teaching video |
CN104539983A (en) * | 2014-12-26 | 2015-04-22 | 湖南亿谷信息科技发展有限公司 | Online class management system and method |
CN105893500A (en) * | 2016-03-30 | 2016-08-24 | 苏州点通教育科技有限公司 | Intelligent real-time class recording system and method |
CN109117731A (en) * | 2018-07-13 | 2019-01-01 | 华中师范大学 | A kind of classroom instruction cognitive load measuring system |
CN112703478A (en) * | 2018-09-11 | 2021-04-23 | 华为技术有限公司 | Data sharing method, graphical user interface, electronic device and system |
CN109637211A (en) * | 2019-01-22 | 2019-04-16 | 合肥市云联鸿达信息技术有限公司 | A kind of full-automatic recording and broadcasting system |
CN112055257A (en) * | 2019-06-05 | 2020-12-08 | 北京新唐思创教育科技有限公司 | Video classroom interaction method, device, equipment and storage medium |
CN111429554A (en) * | 2020-03-26 | 2020-07-17 | 深圳壹账通智能科技有限公司 | Motion video data processing method and device, computer equipment and storage medium |
CN111563452A (en) * | 2020-05-06 | 2020-08-21 | 南京师范大学镇江创新发展研究院 | Multi-human body posture detection and state discrimination method based on example segmentation |
CN113207033A (en) * | 2021-04-29 | 2021-08-03 | 读书郎教育科技有限公司 | System and method for processing invalid video clips recorded in intelligent classroom |
Non-Patent Citations (1)
Title |
---|
"Auto-Tracking Camera System for Remote Learning Using Face Detection and Hand Gesture Recognition Based on Convolutional Neural Network";Daniel Imanuel Sutanto等;《2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI)》;20211124;第451-457页 * |
Also Published As
Publication number | Publication date |
---|---|
CN115311606A (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108648757B (en) | Analysis method based on multi-dimensional classroom information | |
CN108182649A (en) | For the intelligent robot of Teaching Quality Assessment | |
Akram et al. | Effectiveness of online teaching during COVID-19 | |
Asgari et al. | Copus, portaal, or dart? Classroom observation tool comparison from the instructor user’s perspective | |
Hagans | A response-to-intervention approach to decreasing early literacy differences in first graders from different socioeconomic backgrounds: Evidence for the intervention validity of the DIBELS | |
CN115311606B (en) | Classroom recorded video validity detection method | |
Rahayuningsih | THE EFFECT OF READING HABIT AND VOCABULARY MASTERY TOWARDS STUDENTS’READING COMPREHENSION (Survey at State Senior High School in Central Jakarta) | |
Tan et al. | Exploring the relationship between foreign language anxiety, gender, years of learning English and learners’ oral English achievement amongst Chinese college students | |
CN111667128A (en) | Teaching quality assessment method, device and system | |
Schmeichel et al. | Why has there never been a woman president in the United States? An inquiry lesson | |
Yulianti et al. | Comparison of the effectiveness of certainty factor vs dempster-shafer in the determination of the adolescent learning styles | |
CN114944089A (en) | Remote education training system and method based on user behavior analysis | |
Shields et al. | How Kindergarten Entry Assessments Are Used in Public Schools and How They Correlate with Spring Assessments. REL 2017-182. | |
CN111950472A (en) | Teacher grinding evaluation method and system | |
Takahashi et al. | Improvement of detection for warning students in e-learning using web cameras | |
Shenoy et al. | A Study on various Applications of Computer Vision for Teaching Learning in Classroom | |
Ardini et al. | The Correlation between the Use of Audio-Visual Learning Media and Children's Listening Skill in Suwawa Selatan | |
CN115081923A (en) | Method and device for evaluating intelligent teaching quality | |
Jones | Learning Modalities--Should They Be Considered?. | |
Putri | The Correlation between Students’ Language Awareness and Learning Styles toward Their TOEFL Listening Skill | |
Wang et al. | Facial expressions and politeness effect in foreign language training system | |
Syamsinar | Improving the Listening Achievement of the Students of Smu Negeri 1 Sungguminasa Through the Use of Kangguru English Learning Packages | |
Perez et al. | Image Creation Intervention: Effectiveness on the Reading Comprehension of Learners | |
Juntong | Application of Artificial Neural Network in Student's Behavior Analysis in English Class | |
الأحمري et al. | Integrating behavioural and occupational therapy using a computer visual system to reduce sensory stimulation in children with autism spectrum disorder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||