CN110650369B - Video processing method and device, storage medium and electronic equipment - Google Patents

Video processing method and device, storage medium and electronic equipment

Info

Publication number
CN110650369B
CN110650369B CN201910936396.3A
Authority
CN
China
Prior art keywords
video
video clip
feature
sub
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910936396.3A
Other languages
Chinese (zh)
Other versions
CN110650369A (en)
Inventor
李凯
赵红亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Yudi Technology Co ltd
Original Assignee
Beijing Qian Ren Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qian Ren Technology Co ltd filed Critical Beijing Qian Ren Technology Co ltd
Priority to CN201910936396.3A priority Critical patent/CN110650369B/en
Publication of CN110650369A publication Critical patent/CN110650369A/en
Application granted granted Critical
Publication of CN110650369B publication Critical patent/CN110650369B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/436Interfacing a local distribution network, e.g. communicating with another STB or one or more peripheral devices inside the home
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/475End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data
    • H04N21/4756End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data for rating content, e.g. scoring a recommended movie
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The embodiment of the application discloses a video processing method, a video processing device, a storage medium and electronic equipment, wherein the method comprises the following steps: acquiring a video, wherein the video comprises at least one video clip, the video clip corresponds to at least one courseware page in courseware, and the courseware page is associated with a weight value; identifying user characteristics in the video clip to obtain characteristic information corresponding to the video clip; and determining a highlight video clip based on the characteristic information corresponding to the video clip and the weight value associated with at least one courseware page corresponding to the video clip. According to the embodiment of the application, the importance degree of the courseware page display content is considered, and compared with the situation that wonderful content is intercepted only by observing the states of students and teachers in the prior art, the acquired video data are more accurate.

Description

Video processing method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video processing method and apparatus, a storage medium, and an electronic device.
Background
With the development of the Internet, online education is becoming popular with more and more people. Online education adopts a video form and, compared with a traditional fixed classroom, has the characteristics of being more convenient to move and more visual in its pictures. For online education, if the wonderful contents of a class need to be obtained after the class is finished, in the prior art the classroom video is played back manually and the wonderful contents are intercepted by observing the states of the students and the teacher, so the obtained video data are not accurate enough.
Disclosure of Invention
The embodiment of the application provides a video processing method, a video processing device, a storage medium and electronic equipment, which can solve the problem that, in the prior art, classroom videos have to be played back manually and the wonderful contents are intercepted by observing the states of students and teachers, so that the obtained video data are not accurate enough. The technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a video processing method, where the method includes:
acquiring a video, wherein the video comprises at least one video clip, the video clip corresponds to at least one courseware page in courseware, and the courseware page is associated with a weight value;
identifying user characteristics in the video clip to obtain characteristic information corresponding to the video clip;
and determining a highlight video clip based on the characteristic information corresponding to the video clip and the weight value associated with at least one courseware page corresponding to the video clip.
In a second aspect, an embodiment of the present application provides a video processing apparatus, including:
the video acquisition module is used for acquiring a video, wherein the video comprises at least one video clip, the video clip corresponds to at least one courseware page in courseware, and the courseware page is associated with a weight value;
the characteristic information generation module is used for identifying the user characteristics in the video clip to obtain the characteristic information corresponding to the video clip;
and the highlight video clip determining module is used for determining the highlight video clip based on the characteristic information corresponding to the video clip and the weight value associated with at least one courseware page corresponding to the video clip.
In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The beneficial effects brought by the technical scheme provided by some embodiments of the application at least comprise:
in the embodiment of the application, each video clip of the video corresponds to at least one courseware page in courseware, the courseware pages are associated with the weight values according to the importance degree of the display content, and whether the video clips are wonderful or not is judged by the user characteristics in each video clip and the weight values of the courseware pages corresponding to the user characteristics. According to the embodiment of the application, the importance degree of the courseware page display content is considered, and compared with the situation that wonderful content is intercepted only by observing the states of students and teachers in the prior art, the acquired video data are more accurate. Meanwhile, each video clip is directly used as a selection basis of the highlight content, and compared with the situation that in the prior art, manual capturing of the highlight video in the whole video depends on the individual reaction speed, the positioning is more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a system architecture diagram provided by an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a display of a video on an electronic device according to an embodiment of the present application;
FIG. 3 is a schematic illustration of a display of a video processing operation on an electronic device according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, a system architecture for the video processing apparatus or the video processing method according to an exemplary embodiment of the present application is shown.
As shown in fig. 1, the system architecture may include a first terminal device 100, a first network 101, a server 102, a second network 103, and a second terminal device 104. The first network 101 is used to provide a medium for a communication link between the first terminal device 100 and the server 102, and the second network 103 is used to provide a medium for a communication link between the second terminal device 104 and the server 102. The first network 101 and the second network 103 may include various types of wired or wireless communication links, for example: the wired communication link includes an optical fiber, a twisted pair wire, or a coaxial cable, and the wireless communication link includes a Bluetooth communication link, a Wireless Fidelity (Wi-Fi) communication link, or a microwave communication link, etc.
The first terminal device 100 communicates with the second terminal device 104 through the first network 101, the server 102 and the second network 103: the first terminal device 100 sends a message to the server 102, and the server 102 forwards the message to the second terminal device 104; the second terminal device 104 sends a message to the server 102, and the server 102 forwards the message to the first terminal device 100, thereby realizing communication between the first terminal device 100 and the second terminal device 104. The types of messages exchanged between the first terminal device 100 and the second terminal device 104 include control data and service data.
In the present application, the first terminal device 100 is a terminal used by students to attend class, and the second terminal device 104 is a terminal used by the teacher to give class; or the first terminal device 100 is the terminal used by the teacher and the second terminal device 104 is the terminal used by students. For example, the service data is a video stream: the first terminal device 100 collects a first video stream of the student during class through its camera, and the second terminal device 104 collects a second video stream of the teacher during class through its camera; the first terminal device 100 sends the first video stream to the server 102, the server 102 forwards it to the second terminal device 104, and the second terminal device 104 displays the first video stream and the second video stream on its interface; the second terminal device 104 sends the second video stream to the server 102, the server 102 forwards it to the first terminal device 100, and the first terminal device 100 displays the first video stream and the second video stream.
The class mode of the application can be a one-to-one or one-to-many online live class, that is, one teacher corresponds to one student or one teacher corresponds to a plurality of students. Correspondingly, in the one-to-one teaching mode, the terminal used by the teacher communicates with one terminal used by a student; in the one-to-many teaching mode, the terminal used by the teacher communicates with a plurality of terminals used by students. The class mode of the application can also be a recorded class, attended either individually or by several students together (such as a small class or a large class); correspondingly, in the individual mode one student terminal communicates with a server and/or a platform, and in the multi-student mode a plurality of student terminals communicate with a server and/or a platform. In addition, the application can also combine live classes and recorded classes, for example: part of the time period is a recorded session and part of the time period is a live session, etc.
Various communication client applications may be installed on the first terminal device 100 and the second terminal device 104, for example: video recording applications, video display applications, voice interaction applications, search-type applications, instant messaging tools, mailbox clients, social platform software, and the like.
The first terminal device 100 and the second terminal device 104 may be hardware or software. When they are hardware, they may be various terminal devices having a display screen, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the first terminal device 100 and the second terminal device 104 are software, they may be installed in the terminal devices listed above; they may be implemented as multiple software or software modules (for example, to provide distributed services) or as a single software or software module, and are not particularly limited herein.
When the first terminal device 100 and the second terminal device 104 are hardware, a display device and a camera may further be installed thereon; the display device may be any device capable of implementing a display function, and the camera is used to collect a video stream. For example, the display device may be a cathode ray tube (CRT) display, a light-emitting diode (LED) display, an electronic ink panel, a liquid crystal display (LCD), a plasma display panel (PDP), or the like. The user can view information such as displayed text, pictures and videos using the display devices on the first terminal device 100 and the second terminal device 104.
It should be noted that the video processing method provided in the embodiment of the present application is generally executed by the first terminal device 100, and accordingly, the video processing apparatus is generally disposed on the first terminal device 100; that is, the terminal referred to in the following embodiments may be the first terminal device 100.
The server 102 may be a server that provides various services, and the server 102 may be hardware or software. When the server 102 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 102 is software, it may be implemented as a plurality of software or software modules (for example, to provide distributed services), or as a single software or software module, and is not limited in particular herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. Any number of terminal devices, networks, and servers may be used, as desired for implementation.
The server stores the videos generated in a live course, and when another first terminal device requests a video, the video can be directly distributed to the requesting first terminal device.
In the following method embodiments, for convenience of description, only the execution subject of each step is described as a terminal.
The video processing method provided by the embodiment of the present application will be described in detail below with reference to fig. 2 to 4.
Referring to fig. 2 to fig. 4, a flow chart of a video processing method according to an embodiment of the present application is schematically shown. As shown in fig. 2 to 4, the method of the embodiment of the present application may include the steps of:
s401, acquiring a video.
The video comprises at least one video clip, the video clip corresponds to at least one courseware page in courseware, and the courseware page is associated with a weight value.
The terminal 100 is installed with video processing software, and a display interface 301 of the video processing software includes an input box 302, and an operator can input a video through the input box 302. Specifically, the input video may be a video that is pre-stored in the terminal 100 and selected by the operator through clicking and browsing the corresponding folder, or may be a video that is input by the operator in a manner that the operator directly inputs a path where the video is located in the input box 302.
The video is a video file displayed by the student through the terminal 100 in the process of on-line learning, and the video comprises a courseware interface 201 for displaying courseware, a student interface 202 for displaying student video information and a teacher interface 203 for displaying teacher video information. The courseware can be PPT comprising a plurality of pages, and the courseware page is a page corresponding to each page of PPT displayed in the courseware; for example, if the courseware includes three pages of PPTs, the courseware page is a page displaying a first page of PPT, a page displaying a second page of PPT, and a page displaying a third page of PPT.
Each video clip may correspond to only one courseware page, or to multiple courseware pages. In order to obtain which courseware page or pages each video clip corresponds to, and the duration corresponding to each courseware page, the following steps can be taken: acquiring the playing time interval of each courseware page; and determining the courseware page(s) corresponding to each video clip and the corresponding duration within each courseware page according to the playing time interval of the courseware page and the playing time interval of the video clip. For example, the courseware comprises three pages of PPT, where the playing time interval of the courseware page displaying the first page of PPT is 0-135s and the courseware page displaying the second page of PPT is played from 136s onwards. For the video clip with the playing time interval of 10-20s, the corresponding courseware page is the one displaying the first page of PPT. For the video clip with the playing time interval of 130-140s, the courseware page corresponding to 130-135s is the one displaying the first page of PPT, with a corresponding duration of 5s, and the courseware page corresponding to 136-140s is the one displaying the second page of PPT, with a corresponding duration of 5s.
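For illustration, a minimal Python sketch of this interval-overlap bookkeeping follows; the function name, the half-open interval convention and the second/third page boundaries are assumptions, not taken from the patent:

```python
def map_clip_to_pages(clip_start, clip_end, page_intervals):
    """Return (page_index, overlap_seconds) for every courseware page
    whose playing time interval overlaps the given video clip.

    Intervals are treated as half-open [start, end) in seconds.
    """
    hits = []
    for idx, (page_start, page_end) in enumerate(page_intervals):
        overlap = min(clip_end, page_end) - max(clip_start, page_start)
        if overlap > 0:
            hits.append((idx, overlap))
    return hits

# Page switch modeled at t = 135 s for a 300 s video with three PPT pages;
# the patent's example counts whole seconds, so boundary bookkeeping may
# differ by one second depending on the chosen convention.
pages = [(0, 135), (135, 270), (270, 300)]
print(map_clip_to_pages(10, 20, pages))    # [(0, 10)]        -> first page only
print(map_clip_to_pages(130, 140, pages))  # [(0, 5), (1, 5)] -> 5 s on each page
```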
Obtaining the playing time interval of the courseware page, including: extracting all first image frames in a video and playing a time axis of the first image frames; extracting text information at a specified area in the first image frame; the designated area is a title area or a footer area of the PPT displayed on the courseware page; and combining time axes corresponding to the first image frames with the same text information and continuous playing time to obtain the playing time interval of the courseware page.
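A possible sketch of this title/footer-based page detection, assuming OpenCV for frame sampling and pytesseract as the OCR engine (the patent does not name any particular library):

```python
import cv2
import pytesseract  # assumed OCR backend; any text-recognition engine would do

def page_intervals_from_video(path, title_box, sample_every_s=1.0):
    """Group frames whose title/footer text matches into per-page playing intervals.

    title_box: (x, y, w, h) of the region holding the PPT title or footer.
    Returns a list of (text, start_s, end_s) tuples, one per courseware page.
    """
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    step = int(round(fps * sample_every_s)) or 1
    intervals, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            x, y, w, h = title_box
            text = pytesseract.image_to_string(frame[y:y+h, x:x+w]).strip()
            t = frame_idx / fps
            if intervals and intervals[-1][0] == text:
                intervals[-1][2] = t            # same text: extend the current page
            else:
                intervals.append([text, t, t])  # new text: a new courseware page starts
        frame_idx += 1
    cap.release()
    return [tuple(iv) for iv in intervals]
```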
The video clips can be obtained by traversing and dividing the playing time of the video according to a preset time length. For example, the playing time of the video is 0-300s; if the preset time length is 10 seconds, the video can be divided into 291 video clips with playing time intervals of 0-10s, 1-11s, 2-12s, ..., 290-300s; if the preset time lengths are 10 seconds and 15 seconds, the video can be divided into 577 video clips with playing time intervals of 0-10s, 1-11s, ..., 290-300s, 0-15s, 1-16s, ..., 285-300s.
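A minimal sketch of this sliding-window segmentation (function and parameter names are assumptions):

```python
def sliding_segments(total_s, window_lengths, step_s=1):
    """Enumerate candidate video clips as (start, end) playing time intervals.

    total_s: total playing time of the video in seconds.
    window_lengths: iterable of preset clip lengths, e.g. (10,) or (10, 15).
    """
    segments = []
    for length in window_lengths:
        for start in range(0, total_s - length + 1, step_s):
            segments.append((start, start + length))
    return segments

print(len(sliding_segments(300, (10,))))     # 291 clips: 0-10s, 1-11s, ..., 290-300s
print(len(sliding_segments(300, (10, 15))))  # 577 clips when both lengths are used
```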
The weighted value is the importance degree of one PPT content corresponding to the courseware page in the whole PPT, and the sum of the weighted values of all the courseware pages is 1. For example, the courseware comprises three pages of PPTs, the weight value of a courseware page of the first page of PPT is 0.2, the weight value of a courseware page of the second page of PPT is 0.7, and the weight value of a courseware page of the third page of PPT is 0.1. The weight value may be preset by an administrator.
S402, identifying the user characteristics in the video clip to obtain the characteristic information corresponding to the video clip.
The user characteristics include at least one of: the face sub-features and/or voice sub-features of the student, and the face sub-features and/or voice sub-features of the teacher.
Identifying the user characteristics in the video segment to obtain the characteristic information corresponding to the video segment, which may include: identifying the user features in the video segment, and generating a feature vector corresponding to the video segment; computing a quantile value vector for the video segment based on the feature vector.
Identifying the user features in the video segment, and generating a feature vector corresponding to the video segment includes: s5021, obtaining the user characteristics in the unit time interval of the video clip to generate characteristic sub-vectors; s5022, obtaining the feature vector of the video clip according to the playing time interval of the video clip and the feature sub-vector.
The computing a quantile value vector for the video segment based on the feature vector, comprising: obtaining a random distribution function corresponding to each sub-feature according to the duration of the video segment and the mean value of each sub-feature in the feature vector; and calculating the quantile value of the sub-features in the feature vector according to the random distribution function corresponding to the sub-features to obtain the quantile value vector of the video segment.
S403, determining a highlight video clip based on the characteristic information corresponding to the video clip and the weight value associated with at least one courseware page corresponding to the video clip.
The highlight video clip can be directly output to a designated folder on the terminal 100; the designated folder may be set by the user in advance, or may be default by the system.
In the embodiment of the application, each video clip of the video corresponds to at least one courseware page in courseware, the courseware pages are associated with the weight values according to the importance degree of the display content, and whether the video clips are wonderful or not is judged by the user characteristics in each video clip and the weight values of the courseware pages corresponding to the user characteristics. According to the embodiment of the application, the importance degree of the courseware page display content is considered, and compared with the situation that wonderful content is intercepted only by observing the states of students and teachers in the prior art, the acquired video data are more accurate. Meanwhile, each video clip is directly used as a selection basis of the highlight content, and compared with the situation that in the prior art, manual capturing of the highlight video in the whole video depends on the individual reaction speed, the positioning is more accurate.
Referring to fig. 5, a flow chart of a video processing method according to an embodiment of the present application is shown. The present embodiment is exemplified by applying the video processing method to a terminal. The video processing method may include the steps of:
s501, video is obtained.
The video comprises at least one video clip, the video clip corresponds to at least one courseware page in courseware, and the courseware page is associated with a weight value.
See S401 for details, which are not described herein.
S502, identifying the user features in the video clip, and generating a feature vector corresponding to the video clip.
The user characteristics may be the inclusion of student characteristics only; or may include only teacher features; it may also include both student and teacher features. The features may be face sub-features and/or speech sub-features. The user characteristics include at least one of: a face sub-feature and/or a voice sub-feature of the student; a teacher's face sub-feature and/or a voice sub-feature. The face sub-features comprise face appearance sub-features and happy expression appearance sub-features.
Identifying the user features in the video segment, and generating a feature vector corresponding to the video segment includes: s5021, obtaining the user characteristics in the unit time interval of the video clip to generate characteristic sub-vectors; s5022, obtaining the feature vector of the video clip according to the playing time interval of the video clip and the feature sub-vector.
The user characteristics in the unit time interval of the video clip refer to whether the students and/or teachers in the video clip have human faces, happy expressions and voices in each unit time interval. The method comprises the steps that attribute information of a video part corresponding to a unit time interval is obtained from three dimensions of face appearance, happy expression appearance and voice appearance, and is recorded as a feature sub-vector [ Fs, Es, Ss ]; wherein, Fs is the face appearance in the corresponding unit time interval, Es is the happy expression appearance in the corresponding unit time interval, and Ss is the voice condition in the unit time interval.
The obtaining of the facial sub-features in the unit time interval of the video clip in the S5021 includes: splitting the video clip into a plurality of video sub-clips according to a unit time interval; extracting all or part of a second image frame in the video sub-segment; identifying whether a face appears in each second image frame and whether a happy expression appears in each second image frame through a face identification technology; when the number of frames of the face exceeds a first preset value, Fs is 1, otherwise Fs is zero; and when the number of the frames with the happy expression exceeds a second preset value, Es is taken as 1, otherwise, Es is taken as zero.
The face recognition technology is a biometric technology that performs identity recognition based on facial feature information; it is a series of related technologies in which a camera collects images or video streams containing faces, automatically detects and tracks the faces in the images, and then performs face recognition on the detected faces. The face recognition algorithm may include, but is not limited to, feature-based recognition algorithms, appearance-based recognition algorithms operating on the whole face image, template-based recognition algorithms, recognition algorithms using neural networks, recognition algorithms based on illumination estimation model theory, and the like.
In S5021, the voice sub-feature of the video clip in each unit time interval is acquired through voice recognition technology: if voice information is continuously detected in the unit time interval, Ss is recorded as 1; otherwise it is zero.
The unit time interval may be a time interval corresponding to each second. The feature sub-vector may only include the features of the students in the video information of the students in the unit time interval, the feature sub-vector may also only include the features of the teachers in the video information of the teachers in the unit time interval, and the feature sub-vector may also include the features of the students in the video information of the students in the unit time interval and the features of the teachers in the video information of the teachers.
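A minimal sketch of how one such per-second sub-vector [Fs, Es, Ss] could be assembled; the detector callables stand in for the face-recognition and voice-detection components, which the patent leaves open:

```python
def feature_subvector(frames, audio_window, face_detected, happy_detected,
                      speech_detected, face_min_frames=5, happy_min_frames=3):
    """Build the per-second sub-vector [Fs, Es, Ss] for one unit time interval.

    frames: the image frames sampled in this second (e.g. 24 frames);
    audio_window: the audio samples covering the same second;
    face_detected / happy_detected / speech_detected: callables returning
    True/False, standing in for the recognizers the patent does not pin down.
    """
    face_frames = sum(1 for f in frames if face_detected(f))
    happy_frames = sum(1 for f in frames if happy_detected(f))
    fs = 1 if face_frames > face_min_frames else 0    # exceeds the first preset value
    es = 1 if happy_frames > happy_min_frames else 0  # exceeds the second preset value
    ss = 1 if speech_detected(audio_window) else 0    # continuous speech in this second
    return [fs, es, ss]
```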
For example, suppose the feature sub-vector only includes the student features in the student video information in the unit time interval, the unit time interval is the time interval corresponding to each second, the number of image frames extracted per second is 24, the first preset value is 5 and the second preset value is 3. For the video clip with the playing time interval of 0-10s, if the student's face appears in 10 of the 24 frames of student video extracted in the unit time interval 0-1s, the student's happy expression appears in 5 frames, and the student is detected to be speaking throughout 0-1s, then Fs1 is 1, Es1 is 1 and Ss1 is 1, and the feature sub-vector is [1, 1, 1].
If the feature sub-vector only includes the teacher features in the teacher video information in the unit time interval, with the unit time interval being the time interval corresponding to each second, 24 image frames extracted per second, a first preset value of 5 and a second preset value of 3, then for the video clip with the playing time interval of 0-10s: if the teacher's face appears in 12 of the 24 frames of teacher video extracted in the unit time interval 0-1s, the teacher's happy expression appears in 6 frames, and the teacher is detected to be speaking throughout 0-1s, Fs2 is 1, Es2 is 1 and Ss2 is 1, and the feature sub-vector is [1, 1, 1].
If the feature sub-vector includes both the student features in the student video information and the teacher features in the teacher video information in the unit time interval, with the unit time interval being the time interval corresponding to each second, 24 image frames extracted per second, a first preset value of 5 and a second preset value of 3, then for the video clip with the playing time interval of 0-10s: if the student's face appears in 10 of the 24 frames of student video extracted in the unit time interval 0-1s, the student's happy expression appears in 5 frames, and the student is detected to be speaking throughout 0-1s, then Fs1 is 1, Es1 is 1 and Ss1 is 1; if the teacher's face appears in 12 of the 24 frames of teacher video extracted in the unit time interval 0-1s, the teacher's happy expression appears in 6 frames, and the teacher is detected to be speaking throughout 0-1s, then Fs2 is 1, Es2 is 1 and Ss2 is 1; the feature sub-vector is then [1, 1, 1, 1, 1, 1].
The feature vector of the video segment in S5022 is: and summing all the characteristic sub-vectors in the playing time interval of the video clip. For example, a video segment with a playing time interval of 0-10s is played, and the corresponding feature vector is the sum of the feature sub-vectors of 0-1s, 1-2s, 2-3s, … … and 9-10 s.
S503, obtaining a random distribution function corresponding to the sub-features according to the duration of the video segment and the mean value of each sub-feature in the feature vector.
The duration of the video clip refers to the number of unit time intervals contained in the playing time interval of the video clip; for example, a video clip with a time interval of 0-10s is played, which includes ten unit time intervals of 0-1s, 1-2s, … …, 9-10s, i.e. the duration is 10.
The mean value of each sub-feature in the feature vector is a numerical value obtained by dividing each sub-feature in the feature vector by the duration; for example, if a video segment with a playing time interval of 0-10s is played, and its corresponding feature vector is [2, 1, 5, 3, 1, 6], its average value of each sub-feature is 0.2, 0.1, 0.5, 0.3, 0.1, 0.6.
In the embodiment of the present application, assuming that the face, happy expression and voice characteristics all follow a binomial distribution, the random distribution function corresponding to each sub-feature is the binomial distribution B(t, avg), taking the duration t of the video clip as the number of trials and the mean value avg of the sub-feature as the probability that the sub-feature appears in a unit time interval. For example, for a video clip with a playing time interval of 0-10s whose corresponding feature vector is [2, 1, 5, 3, 1, 6], the mean values of the sub-features are 0.2, 0.1, 0.5, 0.3, 0.1 and 0.6; in this case, the binomial distribution corresponding to the student face sub-feature is B(10, 0.2), the student happy expression sub-feature is B(10, 0.1), the student speech sub-feature is B(10, 0.5), the teacher face sub-feature is B(10, 0.3), the teacher happy expression sub-feature is B(10, 0.1), and the teacher speech sub-feature is B(10, 0.6).
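Under the document's own example values, this step can be sketched as follows (variable names are assumptions):

```python
# Mean of each sub-feature over the clip, used as the per-second success
# probability of a binomial distribution B(t, avg).
feature_vector = [2, 1, 5, 3, 1, 6]   # summed sub-vectors for the 0-10s clip
duration = 10                         # number of unit time intervals in the clip

means = [v / duration for v in feature_vector]
print(means)  # [0.2, 0.1, 0.5, 0.3, 0.1, 0.6]

# Each sub-feature i is then modeled as B(duration, means[i]); e.g. the
# student-face sub-feature follows B(10, 0.2).
binomial_params = [(duration, avg) for avg in means]
```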
S504, calculating the quantile value of the sub-feature in the feature vector according to the random distribution function corresponding to the sub-feature, so as to obtain the quantile value vector of the video segment.
For example, for a video clip with a playing time interval of 0-10s, the binomial distribution function corresponding to the student face appearance sub-feature is B (10, 0.2), and it can be found that the distribution and the quantile value corresponding to the student face appearance sub-feature are respectively:
prb(0):0.1074,0.1074;
prb(1):0.2684,0.3758;
prb(2):0.3020,0.6778;
prb(3):0.2013,0.8791;
prb(4):0.0881,0.9672;
prb(5):0.0264,0.9936;
prb(6):0.0055,0.9991;
prb(7):0.0008,0.9999;
prb(8):0.0001,1.0000;
prb(9):0.0000,1.0000;
prb(10):0.0000,1.0000。
the distribution of prb (i) refers to the probability that the child features of the face of the student appear in a video clip of 10 seconds for i seconds according to a binomial distribution function B (10, 0.2); the quantile value of prb (i) is the probability that the occurrence of the sub-features of the face of the student in a video clip of 10 seconds is less than or equal to i seconds according to the binomial distribution function B (10, 0.2). For example, the distribution 0.3020 corresponding to prb (2) indicates that the probability of the occurrence of the student face sub-feature in the video segment of 10 seconds is 0.3020 for 2 seconds, and the quantile value 0.6778 corresponding to prb (2) indicates that the probability of the occurrence of the student face sub-feature in the video segment of 10 seconds is 0.6778 for 2 seconds or less. The quantile value corresponding to prb (2) is the sum of the distribution corresponding to prb (0), the distribution corresponding to prb (1), and the distribution corresponding to prb (2).
For the video clip with the playing time interval of 0-10s, the student face sub-feature follows the binomial distribution B(10, 0.2); since the student's face appears for 2 seconds within the 10 seconds, its quantile value is 0.6778.
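The distribution and quantile values above can be reproduced with a short sketch, here assuming scipy as the numerical backend (the patent does not prescribe an implementation):

```python
from scipy.stats import binom  # assumed dependency for the binomial model

n, p = 10, 0.2  # student-face sub-feature of the 0-10s clip: B(10, 0.2)
for k in range(n + 1):
    # pmf -> the "distribution" column, cdf -> the quantile value column
    print(f"prb({k}): {binom.pmf(k, n, p):.4f}, {binom.cdf(k, n, p):.4f}")

# The clip's quantile value for this sub-feature: the face appeared for
# 2 of the 10 seconds, so the value is P(X <= 2) = 0.6778.
print(round(binom.cdf(2, n, p), 4))  # 0.6778
```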
By the same method, the quantile values of the student happy expression sub-feature, the student speech sub-feature, the teacher face sub-feature, the teacher happy expression sub-feature and the teacher speech sub-feature within 0-10s can be obtained, giving the quantile value vector of the video clip.
S505, comparing the quantile value of each sub-feature in the quantile value vector with the corresponding quantile value threshold value, and deleting the video segment corresponding to the quantile value vector when the quantile value of any sub-feature is smaller than the corresponding quantile value threshold value.
The quantile value threshold corresponding to each sub-feature may be a system default value. For example, the quantile value threshold of the student face sub-feature is 0.4, of the student happy expression sub-feature is 0.2, of the student speech sub-feature is 0.4, of the teacher face sub-feature is 0.4, of the teacher happy expression sub-feature is 0.2, and of the teacher speech sub-feature is 0.4.
For the video clip with the playing time interval of 0-10s, if any of the quantile values of the student face, student happy expression, student speech, teacher face, teacher happy expression and teacher speech sub-features is smaller than its corresponding quantile value threshold, the video clip with the playing time interval of 0-10s is deleted.
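A minimal sketch of this filtering step; the candidate quantile value vectors below are made-up illustrative numbers, not values from the patent:

```python
def keep_clip(quantile_vector, thresholds):
    """Drop a clip if any sub-feature's quantile value falls below its threshold."""
    return all(q >= t for q, t in zip(quantile_vector, thresholds))

# Thresholds in the order: student face, student happy, student speech,
# teacher face, teacher happy, teacher speech (the document's default values).
thresholds = [0.4, 0.2, 0.4, 0.4, 0.2, 0.4]

candidates = {
    "0-10s": [0.68, 0.74, 0.62, 0.65, 0.93, 0.37],  # teacher speech below 0.4 -> dropped
    "1-11s": [0.68, 0.74, 0.62, 0.65, 0.93, 0.55],  # all above threshold -> kept
}
kept = {clip: q for clip, q in candidates.items() if keep_clip(q, thresholds)}
print(list(kept))  # ['1-11s']
```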
S506, carrying out weighted summation on the quantile values of the quantile value vector of the video clip according to the weight value associated with at least one courseware page corresponding to the video clip, so as to obtain the score value of the video clip.
When the video clip corresponds to one courseware page, the score value of the video clip is the weighted sum of the quantile values of the sub-features in the quantile value vector corresponding to the video clip, weighted by the weight value of the courseware page corresponding to the video clip. For example, if the quantile value vector corresponding to the video clip with the playing time interval of 0-10s is [a, b, c, d, e, f] and the courseware page corresponding to 0-10s displays the first page of PPT with a weight value of 0.2, the score value of the video clip with the playing time interval of 0-10s is 0.2a + 0.2b + 0.2c + 0.2d + 0.2e + 0.2f. When the video clip corresponds to a plurality of courseware pages, the score value of the video clip is obtained as follows: first determine the proportion of the duration of each courseware page within the video clip to the duration of the whole video clip, then calculate the sub-score value of each courseware page in the video clip from this proportion, the quantile value vector of the video clip and the weight value of the courseware page, and then sum the sub-score values of all courseware pages corresponding to the video clip. For example, suppose the quantile value vector corresponding to the video clip with the playing time interval of 130-140s is [o, p, q, r, s, t]; the courseware page corresponding to 130-135s displays the first page of PPT with a weight value of 0.2, its duration is 5s and its proportion of the clip duration is 0.5, so the sub-score value of the first page of PPT is 0.5*0.2*(o + p + q + r + s + t); the courseware page corresponding to 136-140s displays the second page of PPT with a weight value of 0.7, its duration is 5s and its proportion of the clip duration is 0.5, so the sub-score value of the second page of PPT is 0.5*0.7*(o + p + q + r + s + t); the score value of the video clip with the playing time interval of 130-140s is therefore 0.5*0.2*(o + p + q + r + s + t) + 0.5*0.7*(o + p + q + r + s + t).
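A minimal sketch of this scoring rule (function name and example values are assumptions):

```python
def clip_score(quantile_vector, page_shares):
    """Score a clip from its quantile value vector and its courseware pages.

    page_shares: list of (weight, share_of_clip_duration) pairs, one per
    courseware page the clip overlaps.
    """
    base = sum(quantile_vector)
    return sum(weight * share * base for weight, share in page_shares)

q = [0.68, 0.74, 0.62, 0.65, 0.93, 0.55]        # illustrative quantile values
print(clip_score(q, [(0.2, 1.0)]))              # clip covered by one page (weight 0.2)
print(clip_score(q, [(0.2, 0.5), (0.7, 0.5)]))  # clip split 5s/5s across two pages
```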
S507, determining the highlight video clip based on the scoring value.
Based on the score value, determining the highlight video segment may include: arranging the video clips in a descending order based on the score values, and selecting the first N video clips as wonderful video clips of the video; or arranging the video clips in an ascending order based on the score values, and selecting the last N video clips as the wonderful video clips of the video.
Alternatively, the highlight video clips can be selected iteratively: in each loop, the video clip with the highest score value is selected, the video clips that overlap with it on the time axis are deleted, and the above operations are repeated on the updated set of video clips until N video clips remain. N may be three, four, five, etc., and may be a system default value.
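A minimal sketch of this iterative, non-overlapping selection (the clip tuples are illustrative):

```python
def pick_highlights(scored_clips, n):
    """Iteratively keep the highest-scoring clip and drop overlapping candidates.

    scored_clips: list of (start, end, score) tuples.
    """
    remaining = sorted(scored_clips, key=lambda c: c[2], reverse=True)
    picked = []
    while remaining and len(picked) < n:
        best = remaining.pop(0)
        picked.append(best)
        remaining = [c for c in remaining
                     if c[1] <= best[0] or c[0] >= best[1]]  # no overlap on the time axis
    return sorted(picked)  # back to playing order for later synthesis

clips = [(0, 10, 0.8), (5, 15, 0.9), (20, 30, 0.7), (40, 50, 0.6)]
print(pick_highlights(clips, 3))  # [(5, 15, 0.9), (20, 30, 0.7), (40, 50, 0.6)]
```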
And S508, synthesizing the video clips according to the playing sequence to generate a wonderful video.
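A sketch of this synthesis step, assuming moviepy 1.x as the editing backend and hypothetical file names:

```python
from moviepy.editor import VideoFileClip, concatenate_videoclips  # assumed library

def synthesize_highlight(source_path, intervals, out_path="highlight.mp4"):
    """Cut the selected clips out of the source video and join them in playing order."""
    source = VideoFileClip(source_path)
    parts = [source.subclip(start, end) for start, end in sorted(intervals)]
    concatenate_videoclips(parts).write_videofile(out_path)
    source.close()

# e.g. the three clips selected above
synthesize_highlight("lesson.mp4", [(5, 15), (20, 30), (40, 50)])
```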
In the embodiment of the application, each video clip of the video corresponds to at least one courseware page in courseware, the courseware pages are associated with the weight values according to the importance degree of the display content, and whether the video clips are wonderful or not is judged by the user characteristics in each video clip and the weight values of the courseware pages corresponding to the user characteristics. According to the embodiment of the application, the importance degree of the courseware page display content is considered, and compared with the situation that wonderful content is intercepted only by observing the states of students and teachers in the prior art, the acquired video data are more accurate. Meanwhile, each video clip is directly used as a selection basis of the highlight content, and compared with the situation that in the prior art, manual capturing of the highlight video in the whole video depends on the individual reaction speed, the positioning is more accurate.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 6, a schematic structural diagram of a video processing apparatus according to an exemplary embodiment of the present application is shown. The video processing apparatus may be implemented as all or a part of the terminal by software, hardware, or a combination of both. The device includes:
the video obtaining module 601 is configured to obtain a video, where the video includes at least one video clip, the video clip corresponds to at least one courseware page in a courseware, and the courseware page is associated with a weight value;
a feature information generating module 602, configured to identify a user feature in the video segment, to obtain feature information corresponding to the video segment;
the highlight video clip determining module 603 is configured to determine a highlight video clip based on the feature information corresponding to the video clip and a weight value associated with at least one courseware page corresponding to the video clip.
Optionally, the feature information generating module 602 includes:
the feature vector generating unit is used for identifying the user features in the video segment and generating a feature vector corresponding to the video segment;
a quantile value vector generating unit for calculating a quantile value vector of the video segment based on the feature vector.
Optionally, the user characteristics include at least one of:
a face sub-feature and/or a voice sub-feature of the student;
a teacher's face sub-feature and/or a voice sub-feature.
Optionally, the quantile value vector generating unit includes:
a random distribution function determining subunit, configured to obtain a random distribution function corresponding to each sub-feature according to the duration of the video segment and a mean value of each sub-feature in the feature vector;
and the quantile value vector generating subunit is used for calculating the quantile value of the sub-feature in the feature vector according to the random distribution function corresponding to the sub-feature so as to obtain the quantile value vector of the video segment.
Optionally, the highlight video segment determination module 603 includes:
the score value calculating unit is used for carrying out weighted summation on the quantile values of the quantile value vector of the video clip according to the weight value associated with at least one courseware page corresponding to the video clip, so as to obtain the score value of the video clip;
and the highlight video segment determining unit is used for determining the highlight video segment based on the scoring value.
Optionally, the apparatus further includes a video segment screening module, configured to compare a quantile value of each sub-feature in the quantile value vector with a quantile value threshold corresponding to the sub-feature, and delete the video segment corresponding to the quantile value vector when the quantile value of any sub-feature is smaller than the quantile value threshold corresponding to the sub-feature.
Optionally, the apparatus further includes a highlight video generation module, configured to combine the video segments according to a playing sequence to generate a highlight video.
It should be noted that, when the video processing apparatus provided in the foregoing embodiment executes the video processing method, only the division of the functional modules is illustrated, and in practical applications, the above functions may be distributed and completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the video processing apparatus and the video processing method provided by the above embodiments belong to the same concept, and details of implementation processes thereof are referred to in the method embodiments and are not described herein again.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the embodiment of the application, each video clip of the video corresponds to at least one courseware page in courseware, the courseware pages are associated with the weight values according to the importance degree of the display content, and whether the video clips are wonderful or not is judged by the user characteristics in each video clip and the weight values of the courseware pages corresponding to the user characteristics. According to the embodiment of the application, the importance degree of the courseware page display content is considered, and compared with the situation that wonderful content is intercepted only by observing the states of students and teachers in the prior art, the acquired video data are more accurate. Meanwhile, each video clip is directly used as a selection basis of the highlight content, and compared with the situation that in the prior art, manual capturing of the highlight video in the whole video depends on the individual reaction speed, the positioning is more accurate.
An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are suitable for being loaded by a processor and executing the above method steps, and a specific execution process may refer to specific descriptions of the embodiments shown in fig. 4 and fig. 5, which are not described herein again.
The application also provides an electronic device comprising a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps. The electronic device may be a terminal or a server.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not to be construed as limiting the scope of the present application, so that the present application is not limited thereto, and all equivalent variations and modifications can be made to the present application.

Claims (10)

1. A video processing method, comprising:
acquiring a video, wherein the video comprises at least one video clip, the video clip corresponds to at least one courseware page in courseware, and the courseware page is associated with a weight value;
identifying user characteristics in the video clip to obtain characteristic information corresponding to the video clip;
and determining a wonderful video clip based on the characteristic information corresponding to the video clip, the weight value associated with at least one courseware page corresponding to the video clip and the proportion of the duration of each courseware page in the duration of the video clip.
2. The method of claim 1, wherein: the identifying the user characteristics in the video clip to obtain the characteristic information corresponding to the video clip includes:
identifying the user features in the video segment, and generating a feature vector corresponding to the video segment;
computing a quantile value vector for the video segment based on the feature vector.
3. The method of claim 2, wherein: the user characteristics include at least one of:
a face sub-feature and/or a voice sub-feature of the student;
a teacher's face sub-feature and/or a voice sub-feature.
4. The method of claim 3, wherein: the computing a quantile value vector for the video segment based on the feature vector, comprising:
obtaining a random distribution function corresponding to each sub-feature according to the duration of the video segment and the mean value of each sub-feature in the feature vector;
and calculating the quantile value of the sub-features in the feature vector according to the random distribution function corresponding to the sub-features to obtain the quantile value vector of the video segment.
5. The method of claim 3, wherein the determining a highlight video clip based on the feature information corresponding to the video clip and the weight value associated with the at least one courseware page corresponding to the video clip and based on the proportion of the duration of each of the courseware pages in the duration of the video clip comprises:
carrying out weighted summation on the quantile values of the quantile value vector of the video clip according to the weight value associated with at least one courseware page corresponding to the video clip to obtain the score value of the video clip;
determining the highlight video segment based on the score value.
6. The method of claim 3, further comprising: comparing the quantile value of each sub-feature in the quantile value vector with the quantile value threshold corresponding to the sub-feature, and deleting the video segment corresponding to the quantile value vector when the quantile value of any sub-feature is smaller than the quantile value threshold corresponding to the sub-feature.
7. The method of any of claims 1 to 6, further comprising: synthesizing the highlight video clips according to their playing order to generate a highlight video.
8. A video processing apparatus, comprising:
a video acquisition module, configured to acquire a video, wherein the video comprises at least one video clip, each video clip corresponds to at least one courseware page in a courseware, and each courseware page is associated with a weight value;
a feature information generation module, configured to identify user features in the video clip to obtain feature information corresponding to the video clip;
and a highlight video clip determining module, configured to determine a highlight video clip based on the feature information corresponding to the video clip, the weight value associated with the at least one courseware page corresponding to the video clip, and the proportion of the duration of each courseware page in the duration of the video clip.
9. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to carry out the method steps according to any one of claims 1 to 7.
10. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1 to 7.
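
The sketches below are editorial illustrations of how the claimed steps might be realised; they are not part of the claims, and every function name, parameter, and distributional choice in them is an assumption. This first sketch corresponds to claims 2 to 4: each video clip yields event counts for the student/teacher face and voice sub-features, and a quantile value is obtained by evaluating an assumed Poisson distribution, parameterised by the clip duration and a per-sub-feature mean rate, at the observed count. The claims only require "a random distribution function", so the Poisson choice and the baseline rates are purely illustrative.

```python
# Hypothetical sketch of claims 2-4: build a per-clip feature vector from detected
# sub-features (student/teacher face and voice events) and convert it into a
# quantile value vector. The Poisson model below is an illustrative assumption.
from dataclasses import dataclass
from typing import Dict

from scipy.stats import poisson


@dataclass
class VideoClip:
    duration_s: float                    # clip duration in seconds
    sub_feature_counts: Dict[str, int]   # detections per sub-feature in this clip


def quantile_value_vector(clip: VideoClip,
                          baseline_rates: Dict[str, float]) -> Dict[str, float]:
    """Map each sub-feature count to a quantile value in [0, 1].

    baseline_rates: assumed average events per second for each sub-feature,
    e.g. estimated over past lessons (hypothetical input, not from the patent).
    """
    quantiles = {}
    for name, count in clip.sub_feature_counts.items():
        # Distribution parameterised by the clip duration and the sub-feature
        # mean (claim 4); here: Poisson with lambda = rate * duration.
        lam = baseline_rates[name] * clip.duration_s
        quantiles[name] = poisson.cdf(count, lam)
    return quantiles


clip = VideoClip(
    duration_s=180.0,
    sub_feature_counts={"student_face": 150, "student_voice": 90,
                        "teacher_face": 160, "teacher_voice": 70},
)
rates = {"student_face": 0.7, "student_voice": 0.4,
         "teacher_face": 0.8, "teacher_voice": 0.5}
print(quantile_value_vector(clip, rates))
```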
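A second sketch, for claims 5 and 6, scores a clip by weighting its quantile values with a duration-weighted courseware-page weight, and filters out clips whose quantile values fall below per-sub-feature thresholds. The exact combination rule is not fixed by the claims, so this is one plausible reading, with illustrative weights and thresholds.

```python
# Hypothetical sketch of claims 5-6: score a clip from its quantile value vector
# and the courseware-page weights, and drop clips that fail per-sub-feature
# quantile thresholds. The combination rule is an assumption of this sketch.
from typing import Dict, List, Tuple


def clip_score(quantiles: Dict[str, float],
               pages: List[Tuple[float, float]]) -> float:
    """pages: list of (page_weight, seconds_of_this_page_within_the_clip)."""
    clip_duration = sum(seconds for _, seconds in pages)
    # Duration-weighted courseware-page weight for the whole clip (claims 1 and 5).
    page_weight = sum(w * seconds / clip_duration for w, seconds in pages)
    # Weighted summation of the quantile values (claim 5).
    return page_weight * sum(quantiles.values())


def passes_thresholds(quantiles: Dict[str, float],
                      thresholds: Dict[str, float]) -> bool:
    # Claim 6: discard the clip if any sub-feature quantile is below its threshold.
    return all(q >= thresholds.get(name, 0.0) for name, q in quantiles.items())


quantiles = {"student_face": 0.92, "student_voice": 0.75,
             "teacher_face": 0.88, "teacher_voice": 0.60}
pages = [(0.9, 120.0), (0.4, 60.0)]           # (weight, seconds shown in this clip)
thresholds = {name: 0.3 for name in quantiles}

if passes_thresholds(quantiles, thresholds):
    print("score:", round(clip_score(quantiles, pages), 3))
```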
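A final sketch for claim 7 concatenates the retained highlight clips in their original playing order into a single highlight video, using the ffmpeg concat demuxer; the file names and start times are hypothetical.

```python
# Hypothetical sketch of claim 7: concatenate the selected highlight clips in
# playing order with the ffmpeg concat demuxer. Paths and timestamps are examples.
import subprocess
import tempfile


def synthesize_highlights(clips, output_path="highlights.mp4"):
    """clips: iterable of (start_time_seconds, path_to_clip_file)."""
    ordered = sorted(clips, key=lambda c: c[0])    # restore playing order
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for _, path in ordered:
            f.write(f"file '{path}'\n")            # concat-demuxer list format
        list_file = f.name
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_file, "-c", "copy", output_path],
        check=True,
    )
    return output_path


# Usage example (assumed clip files already cut from the lesson recording):
# synthesize_highlights([(300.0, "clip_b.mp4"), (60.0, "clip_a.mp4")])
```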
CN201910936396.3A 2019-09-29 2019-09-29 Video processing method and device, storage medium and electronic equipment Active CN110650369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910936396.3A CN110650369B (en) 2019-09-29 2019-09-29 Video processing method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110650369A CN110650369A (en) 2020-01-03
CN110650369B true CN110650369B (en) 2021-09-17

Family

ID=69012015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910936396.3A Active CN110650369B (en) 2019-09-29 2019-09-29 Video processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110650369B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108877771A (en) * 2018-07-11 2018-11-23 Beijing Dami Technology Co., Ltd. Data processing method, storage medium and electronic equipment
CN109063587A (en) * 2018-07-11 2018-12-21 Beijing Dami Technology Co., Ltd. Data processing method, storage medium and electronic equipment
CN109889920A (en) * 2019-04-16 2019-06-14 Weibi Network Technology (Shanghai) Co., Ltd. Online course video clipping method, system, device and storage medium
CN110087143A (en) * 2019-04-26 2019-08-02 Beijing Qianren Technology Co., Ltd. Video processing method and device, electronic device and computer-readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9792553B2 (en) * 2013-07-31 2017-10-17 Kadenze, Inc. Feature extraction and machine learning for evaluation of image- or video-type, media-rich coursework
US10140379B2 (en) * 2014-10-27 2018-11-27 Chegg, Inc. Automated lecture deconstruction
US9978119B2 (en) * 2015-10-22 2018-05-22 Korea Institute Of Science And Technology Method for automatic facial impression transformation, recording medium and device for performing the method
US10839257B2 (en) * 2017-08-30 2020-11-17 Qualcomm Incorporated Prioritizing objects for object recognition
WO2019143962A1 (en) * 2018-01-19 2019-07-25 Board Of Regents, The University Of Texas System Systems and methods for evaluating individual, group, and crowd emotion engagement and attention
CN109817041A (en) * 2019-01-07 2019-05-28 Beijing Hanbo Information Technology Co., Ltd. Multifunctional teaching system

Also Published As

Publication number Publication date
CN110650369A (en) 2020-01-03

Similar Documents

Publication Publication Date Title
WO2020000879A1 (en) Image recognition method and apparatus
CN106685916B (en) Intelligent device and method for electronic conference
CN110298906B (en) Method and device for generating information
CN111476871B (en) Method and device for generating video
US10438264B1 (en) Artificial intelligence feature extraction service for products
WO2019242222A1 (en) Method and device for use in generating information
US9621851B2 (en) Augmenting web conferences via text extracted from audio content
US9754011B2 (en) Storing and analyzing presentation data
WO2019237657A1 (en) Method and device for generating model
WO2020000876A1 (en) Model generating method and device
US20200051451A1 (en) Short answer grade prediction
CN111432282B (en) Video recommendation method and device
WO2020221103A1 (en) Method for displaying user emotion, and device
CN110287947A Interaction determination method and device for an interactive classroom
CN110880324A (en) Voice data processing method and device, storage medium and electronic equipment
CN116368785A (en) Intelligent query buffering mechanism
CN110867187B (en) Voice data processing method and device, storage medium and electronic equipment
CN112447073A (en) Explanation video generation method, explanation video display method and device
US10719696B2 (en) Generation of interrelationships among participants and topics in a videoconferencing system
CN113850898A (en) Scene rendering method and device, storage medium and electronic equipment
JP2021533489A (en) Computer implementation system and method for collecting feedback
CN117911730A (en) Method, apparatus and computer program product for processing topics
CN110650369B (en) Video processing method and device, storage medium and electronic equipment
CN111327943B (en) Information management method, device, system, computer equipment and storage medium
CN112884538A (en) Item recommendation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230815

Address after: No. 902, 9th Floor, Unit 2, Building 1, No. 333 Jiqing Third Road, Chengdu High tech Zone, China (Sichuan) Pilot Free Trade Zone, Chengdu City, Sichuan Province, 610213

Patentee after: Chengdu Yudi Technology Co.,Ltd.

Address before: 100123 t4-27 floor, Damei center, courtyard 7, Qingnian Road, Chaoyang District, Beijing

Patentee before: BEIJING QIAN REN TECHNOLOGY Co.,Ltd.
