CN118283290A - Video processing method and server - Google Patents

Video processing method and server

Info

Publication number
CN118283290A
Authority
CN
China
Prior art keywords
video
user
server
knowledge
electronic device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211736117.7A
Other languages
Chinese (zh)
Inventor
梁志宙
董凯
夏丁胤
唐舸宇
李景宇
胡剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202211736117.7A
Publication of CN118283290A
Legal status: Pending

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a video processing method and a server, and relates to the technical field of video. In the scheme, the server can receive a first operation for a first video sent by an electronic device while the electronic device is playing the first video. The server can then acquire a second video corresponding to the first operation, embed the second video into an unplayed segment of the first video to obtain an embedded third video, and finally send the third video to the electronic device, so that the electronic device plays the second video when the playing progress of the first video reaches the embedding position of the second video.

Description

Video processing method and server
Technical Field
The embodiment of the application relates to the technical field of videos, in particular to a video processing method and a server.
Background
With the continuing spread of network teaching, the online classroom is no longer limited by time and place, so users can conveniently study anytime and anywhere according to their own needs, which improves the convenience of learning. When learning in an online classroom, a user can study online or offline by watching videos, playing courseware, and the like. While a video or courseware is playing, the screen usually shows the picture of a teacher lecturing or the knowledge points presented in the courseware. However, when the user encounters an unfamiliar or difficult knowledge point during online learning, the user usually has to browse or search for related material to help understand the knowledge point, which reduces the user's online learning efficiency and degrades the user experience.
Disclosure of Invention
The application provides a video processing method and a server, which can improve the online classroom learning efficiency of a user.
In order to achieve the above purpose, the application adopts the following technical scheme:
In a first aspect, a video processing method is provided and applied to a server. The method includes: receiving a first operation for a first video, sent by an electronic device while the electronic device is playing the first video; acquiring a second video corresponding to the first operation; embedding the second video into an unplayed segment of the first video to obtain an embedded third video; and sending the third video to the electronic device, where the third video is used by the electronic device to play the second video when the playing progress of the first video reaches the embedding position of the second video.
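For illustration only, the following is a minimal sketch (in Python, using hypothetical data structures and function names that are not part of the claimed implementation) of this first-aspect flow: the first video is held as a list of clips, and the second video is spliced in at the first clip boundary after the current play position; the implementations below refine how that boundary is chosen.

from dataclasses import dataclass
from typing import List

@dataclass
class Clip:
    """A contiguous piece of video, identified by start/end time in seconds."""
    start: float
    end: float
    label: str  # e.g. the knowledge point the clip covers

def embed_clip(timeline: List[Clip], insert: Clip, play_position: float) -> List[Clip]:
    """Splice `insert` (the second video) into the not-yet-played part of `timeline`.

    Hypothetical simplification: the embedding position is the first clip boundary
    after `play_position`; content-based matching is sketched in later examples.
    """
    result: List[Clip] = []
    inserted = False
    for clip in timeline:
        result.append(clip)
        if not inserted and clip.end > play_position:
            result.append(insert)  # the second video is embedded here
            inserted = True
    if not inserted:               # nothing left to play: append at the end
        result.append(insert)
    return result

# Usage: a lesson video with three clips; the user interacts at t = 100 s.
lesson = [Clip(0, 300, "intro"), Clip(300, 600, "derivative"), Clip(600, 900, "integral")]
micro_lesson = Clip(0, 120, "limit definition")
third_video = embed_clip(lesson, micro_lesson, play_position=100)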
Optionally, the first video may be a lesson video that the user is studying, and the first operation may be an interactive operation performed by the user on the first video, such as taking a screenshot or pausing. The second video may be a micro-lesson video that helps the user learn an unfamiliar or difficult knowledge point.
According to the scheme provided in the first aspect, when the user performs the first operation on the first video while the electronic device is playing it, the electronic device can send the first operation to the server, so that the server can analyze, from the first operation, which knowledge point the user finds unfamiliar or hard to understand, and acquire a second video corresponding to the first operation. The second video is the knowledge-point course resource the user currently needs and can help the user learn the knowledge point that is currently unfamiliar or hard to understand. The server can then embed the second video into the unplayed segment of the first video, so that the course resource the user needs is seamlessly connected to a later playing position of the first video. The server then sends the resulting third video to the electronic device. The electronic device simply continues to play the first video, so the user's current viewing continuity is not interrupted and the user does not have to break off watching to look up related material; when the playing progress of the first video reaches the embedding position of the second video, the course resource the user needs is played naturally. In this way, the server can embed the video resources the user needs into the subsequent playback of the original video according to the user's interactive operation, changing the subsequent playing content of the original video without affecting the user's continuous viewing.
In one possible implementation, embedding the second video into the unplayed segment of the first video may include: determining an embedding position of the second video in the unplayed segment of the first video; and embedding the second video into the unplayed segment according to the embedding position. In this manner, the server can determine a suitable embedding position directly from the unplayed segment of the first video, so that the knowledge-point course resource the user needs is seamlessly connected to a suitable later playing position of the first video. The embedding position may specifically indicate from which playing time point (time stamp) of the first video the embedding starts.
In one possible implementation, determining the embedding position of the second video in the unplayed segment of the first video may include: obtaining the unplayed segment of the first video; and determining the embedding position of the second video in the unplayed segment according to a comparison result between the unplayed segment and the second video. In this way, by comparing the unplayed segment of the first video with the second video, the server can determine an embedding position whose content matches the second video.
In one possible implementation, the unplayed segment of the first video may include a plurality of sub-segments, and determining the embedding position of the second video in the unplayed segment according to the comparison result between the unplayed segment and the second video includes: comparing the plurality of sub-segments with the second video respectively, and acquiring, from the plurality of sub-segments, a target sub-segment whose comparison result meets a preset condition; and determining the embedding position of the second video in the unplayed segment according to the playing time point of the target sub-segment in the first video. In other words, the server slices the first video into relatively independent sliced videos and determines the embedding position according to the playing time point of a sliced video.
In one possible implementation, the video processing method may further include: dividing the unplayed segment into a plurality of sub-segments according to the text content in the unplayed segment; and/or detecting the image content in the unplayed segment and dividing the unplayed segment into a plurality of sub-segments. It can be understood that, to keep a piece of content or a sentence in the unplayed segment continuous, slicing the video by text content preserves the integrity of the text content in each sliced video, and slicing by image content preserves the integrity of the image content. When the server slices the video by combining text content and image content, the two complement each other, which effectively improves the accuracy of the slicing. A toy example of such combined slicing is sketched below.
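As a toy illustration of such slicing (not the claimed method), the sketch below assumes a transcript with sentence end times, for example from subtitles or speech recognition, and a list of timestamps where the image content changes sharply; a cut point is kept only where both sources agree.

from typing import List, Tuple

def slice_points(transcript: List[Tuple[float, str]],
                 scene_changes: List[float],
                 tolerance: float = 1.0) -> List[float]:
    """Return candidate cut points (seconds) for an unplayed segment.

    `transcript` is a list of (end_time, sentence) pairs; `scene_changes` are
    timestamps where the image content changes sharply. A cut is only kept
    where a sentence ends AND a scene change occurs within `tolerance`
    seconds, so neither a sentence nor a slide is split in half.
    Illustrative sketch only, not the patented algorithm itself.
    """
    cuts = []
    for end_time, _sentence in transcript:
        if any(abs(end_time - sc) <= tolerance for sc in scene_changes):
            cuts.append(end_time)
    return cuts

# Usage with made-up timestamps:
transcript = [(12.4, "So this is the chain rule."), (30.1, "Now an example.")]
scene_changes = [12.8, 45.0]
print(slice_points(transcript, scene_changes))  # -> [12.4]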
In a possible implementation, comparing the plurality of sub-segments with the second video and acquiring, from the plurality of sub-segments, a target sub-segment whose comparison result meets the preset condition may include: respectively acquiring the similarity between each of the plurality of sub-segments and the second video; and acquiring, from the plurality of sub-segments, the target sub-segments whose similarity is greater than a preset value. In this way, through similarity calculation the server can determine, among the sliced videos of the unplayed segment, those that are highly similar to the second video, and place the embedding position of the second video near a highly similar sliced video, so that the user can keep watching the first video continuously and the embedded second video does not feel abrupt or jarring.
In one possible implementation, there are a plurality of target sub-segments, and the video processing method may further include: acquiring, from the plurality of target sub-segments, the target sub-segment whose playing time point is nearest. When there are several sliced videos highly similar to the second video, the server may place the embedding position of the second video near the one whose playing time point comes soonest, so that the second video appears in the subsequent playback as early as possible without affecting the user's current viewing continuity. A sketch of this comparison and tie-breaking appears below.
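The similarity comparison and the nearest-time-point selection in the two implementations above could, purely for illustration, be sketched as follows, assuming each sub-segment is represented by its transcript text and a toy bag-of-words similarity stands in for the real comparison.

import math
from collections import Counter
from typing import Dict, Optional

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two transcripts (toy metric)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def pick_embedding_point(sub_segments: Dict[float, str], second_video_text: str,
                         play_position: float, threshold: float = 0.3) -> Optional[float]:
    """Return the start time of the chosen target sub-segment: its similarity
    exceeds the preset value, and among several candidates it is the one whose
    playing time point comes soonest after the current play position."""
    candidates = [start for start, text in sub_segments.items()
                  if start >= play_position
                  and cosine_similarity(text, second_video_text) > threshold]
    return min(candidates, default=None)

# Usage with made-up sub-segment transcripts:
subs = {300.0: "the chain rule lets us differentiate composite functions",
        600.0: "next we turn to definite integrals"}
pick_embedding_point(subs, "chain rule of differentiation", play_position=100.0)  # -> 300.0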
Optionally, after the server embeds the second video into the unplayed segment of the first video to obtain the embedded third video and sends the third video to the electronic device for playing, the user can perform an interactive operation again, and the server can embed a new video into the unplayed segment of the third video according to the newly received user interaction. In this way, after multiple rounds of user interaction during playback, the original first video is gradually turned into a personalized composite video tailored to the user.
In one possible implementation, the video processing method may further include: in the process of playing the first video by the electronic device, generating a knowledge navigation tree corresponding to the first video according to the content of the first video currently being played, where the knowledge navigation tree includes at least one tree node and each tree node corresponds to a segment in the first video; and when the playing progress of the first video reaches the embedding position of the second video, updating the knowledge navigation tree according to the content of the second video currently being played, where the updated knowledge navigation tree includes a target tree node corresponding to the second video. In this way, while the electronic device plays the first video, the server can organize and summarize a knowledge navigation tree for the user in real time by detecting, in real time, the content currently being played. Because the subsequent playing content of the first video is continuously and personally supplemented and changed following the user's interactive feedback, the knowledge navigation tree generated by the server in real time is likewise continuously supplemented and adapted following that feedback.
In a possible implementation, generating the knowledge navigation tree corresponding to the first video according to the content currently being played may include: determining, according to the content of the first video currently being played, a start time stamp and an end time stamp of the knowledge point corresponding to that content; and generating the knowledge navigation tree corresponding to the first video according to the start time stamp and the end time stamp of the knowledge point. Generating the knowledge navigation tree in real time from the knowledge point currently being played and its start and end time stamps effectively reflects the user's personalized learning path, helps the user clearly understand the knowledge points covered in their learning process and the associations between them, and allows the learned content to be understood more comprehensively and systematically. Meanwhile, because the subsequent playing content of the first video keeps being supplemented and changed following the user's interactive feedback, the start and end time stamps of the original knowledge points on the knowledge navigation tree may change, so the server can dynamically change the node content and node time stamps on the tree along with the user's interactive feedback. A minimal data-structure sketch is given below.
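A minimal sketch of such a knowledge navigation tree, with start and end time stamps per node (hypothetical field and function names, not the claimed implementation):

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TreeNode:
    """One node of the knowledge navigation tree; time stamps refer to the
    composite video being played on the device."""
    knowledge_point: str
    start_ts: float
    end_ts: float
    children: List["TreeNode"] = field(default_factory=list)

def add_knowledge_point(root: TreeNode, name: str, start_ts: float, end_ts: float,
                        parent: Optional[TreeNode] = None) -> TreeNode:
    """Append a node for the knowledge point currently being played.

    When an embedded second video starts playing, the same call adds the
    'target tree node' described above under the related parent node.
    """
    node = TreeNode(name, start_ts, end_ts)
    (parent or root).children.append(node)
    return node

# Usage: the first video produces a node; the embedded micro-lesson adds a child.
root = TreeNode("Calculus lesson", 0.0, 5400.0)
deriv = add_knowledge_point(root, "Derivatives", 0.0, 1800.0)
add_knowledge_point(root, "Limit definition (embedded)", 1800.0, 1920.0, parent=deriv)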
In one possible implementation, the video processing method may further include: receiving a deletion operation, sent by the electronic device, on a target tree node in the updated knowledge navigation tree; and deleting, from the third video, the second video corresponding to the target tree node. Because the knowledge navigation tree keeps being supplemented and changed following the user's interactive feedback, and some knowledge points may already have been mastered by the user, the server provides a deletion function for the generated knowledge navigation tree and presents it on the electronic device. When the user performs a deletion operation on a tree node displayed on the electronic device, the node on the knowledge navigation tree and the video segment corresponding to that node in the personalized composite video can both be deleted with one action, so the user does not have to relearn an already mastered knowledge point when watching the composite video.
In one possible implementation, the video processing method may further include: receiving a click operation, sent by the electronic device, on a target tree node in the updated knowledge navigation tree; and sending, to the electronic device, the second video corresponding to the target tree node in the third video, so as to control the electronic device to play the second video. In this way, by clicking a node on the knowledge navigation tree, the user can jump to the corresponding knowledge video for focused study. A sketch of both node operations follows.
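Both node operations can be illustrated with the sketch below, which assumes the composite (third) video is held as a list of clips, each tagged with the knowledge point of its tree node; the deletion shown here only drops the clip, whereas a full implementation would also shift the time stamps of later clips and tree nodes.

from dataclasses import dataclass
from typing import List

@dataclass
class Clip:
    start: float            # position in the composite (third) video, seconds
    end: float
    knowledge_point: str    # the tree node this clip belongs to

def delete_node_clip(timeline: List[Clip], knowledge_point: str) -> List[Clip]:
    """Deletion operation: drop the embedded clip bound to the deleted tree node."""
    return [c for c in timeline if c.knowledge_point != knowledge_point]

def clip_for_node(timeline: List[Clip], knowledge_point: str) -> Clip:
    """Click operation: return the clip to send back so the device jumps to it."""
    return next(c for c in timeline if c.knowledge_point == knowledge_point)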
In one possible implementation, the video processing method may further include: after the electronic device finishes playing the first video, matching the second video against the first video; re-determining the embedding position of the second video in the first video according to the matching result; and re-embedding the second video into the first video according to the re-determined embedding position to obtain a re-embedded fourth video. It will be appreciated that, to avoid affecting the user's continuous viewing experience, the server initially splices the second video into the part of the first video that has not yet been played. If content highly related to the second video already appeared earlier in the first video, the embedding position chosen in the later part may not be the most suitable one. The server may therefore treat the initial splice as temporary and, once playback of the first video ends, rearrange the first video with its one or more embedded videos, so that the most suitable embedding position of the second video can be found precisely from the complete content of the first video.
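The re-matching after playback ends can be pictured with the sketch below; it scans all segments of the first video, not only those that were unplayed at interaction time, and returns the best-matching span. The segment keys and transcripts are hypothetical, and the `similarity` argument can be any text-similarity function such as the toy cosine measure sketched earlier.

from typing import Callable, Dict, Tuple

def rematch_embedding(first_video_segments: Dict[Tuple[float, float], str],
                      second_video_text: str,
                      similarity: Callable[[str, str], float]) -> Tuple[float, float]:
    """Return the (start, end) span of the first video most similar to the
    second video; re-embedding at this span yields the fourth video."""
    return max(first_video_segments,
               key=lambda span: similarity(first_video_segments[span], second_video_text))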
In one possible implementation, acquiring the second video corresponding to the first operation may include: acquiring the content of the first video currently being played when the first operation occurs; and acquiring the second video corresponding to that content. Thus, when the user does not understand or is unfamiliar with the content currently being played by the electronic device, the user can perform the first operation, so that the server can analyze which knowledge point the user finds unfamiliar or hard to understand from the content displayed when the first operation occurred, and recommend the video resources the user is likely to need according to the analysis of that content.
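As a simple illustration of mapping the content being played to a second video (the real scheme would use the content analysis described above rather than a keyword table), consider the following sketch with a hypothetical catalogue of micro-lesson identifiers.

from typing import Dict, Optional

def recommend_second_video(current_play_text: str,
                           catalogue: Dict[str, str]) -> Optional[str]:
    """Pick a second video whose knowledge-point keyword appears in the content
    being played when the first operation (e.g. a screenshot or pause) arrives."""
    lowered = current_play_text.lower()
    for keyword, video_id in catalogue.items():
        if keyword.lower() in lowered:
            return video_id
    return None

# Usage with made-up identifiers:
catalogue = {"chain rule": "micro_lesson_017", "integration by parts": "micro_lesson_042"}
recommend_second_video("Here we apply the chain rule to ...", catalogue)  # -> "micro_lesson_017"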
In a second aspect, a server is provided, including: a receiving unit, configured to receive a first operation for a first video sent by an electronic device while the electronic device is playing the first video; an acquiring unit, configured to acquire a second video corresponding to the first operation; an embedding unit, configured to embed the second video into an unplayed segment of the first video to obtain an embedded third video; and a sending unit, configured to send the third video to the electronic device, where the third video is used by the electronic device to play the second video when the playing progress of the first video reaches the embedding position of the second video.
In one possible implementation, the embedding unit may be configured to: determine an embedding position of the second video in the unplayed segment of the first video; and embed the second video into the unplayed segment according to the embedding position.
In one possible implementation, the embedding unit may be configured to: obtain the unplayed segment of the first video; and determine the embedding position of the second video in the unplayed segment according to a comparison result between the unplayed segment and the second video.
In one possible implementation, the unplayed segment of the first video may include a plurality of sub-segments, and the embedding unit may be configured to: compare the plurality of sub-segments with the second video respectively, and acquire, from the plurality of sub-segments, a target sub-segment whose comparison result meets a preset condition; and determine the embedding position of the second video in the unplayed segment according to the playing time point of the target sub-segment in the first video.
In one possible implementation, the server may further include a segmentation unit, configured to divide the unplayed segment into a plurality of sub-segments according to the text content in the unplayed segment; and/or detect the image content in the unplayed segment and divide the unplayed segment into a plurality of sub-segments.
In one possible implementation, the embedding unit may be configured to: respectively acquire the similarity between each of the plurality of sub-segments and the second video; and acquire, from the plurality of sub-segments, the target sub-segments whose similarity is greater than a preset value.
In one possible implementation, there are a plurality of target sub-segments, and the embedding unit may be further configured to acquire, from the plurality of target sub-segments, the target sub-segment whose playing time point is nearest.
In one possible implementation, the server may further include a navigation tree generating unit and a navigation tree updating unit. The navigation tree generating unit is configured to generate, in the process of playing the first video by the electronic device, a knowledge navigation tree corresponding to the first video according to the content of the first video currently being played, where the knowledge navigation tree includes at least one tree node and each tree node corresponds to a segment in the first video. The navigation tree updating unit is configured to update the knowledge navigation tree according to the content of the second video currently being played when the playing progress of the first video reaches the embedding position of the second video, where the updated knowledge navigation tree includes a target tree node corresponding to the second video.
In a possible implementation, the navigation tree generating unit may be configured to: determine, according to the content of the first video currently being played, a start time stamp and an end time stamp of the knowledge point corresponding to that content; and generate the knowledge navigation tree corresponding to the first video according to the start time stamp and the end time stamp of the knowledge point.
In one possible implementation, the server may further include a navigation tree deleting unit, configured to receive a deletion operation, sent by the electronic device, on a target tree node in the updated knowledge navigation tree, and delete, from the third video, the second video corresponding to the target tree node.
In one possible implementation, the server may further include a navigation tree jumping unit, configured to receive a click operation, sent by the electronic device, on a target tree node in the updated knowledge navigation tree, and send, to the electronic device, the second video corresponding to the target tree node in the third video, so as to control the electronic device to play the second video.
In one possible implementation, the server may further include a matching unit and a re-embedding unit. The matching unit is configured to match the second video against the first video after the electronic device finishes playing the first video. The re-embedding unit is configured to re-determine the embedding position of the second video in the first video according to the matching result, and re-embed the second video into the first video according to the re-determined embedding position to obtain a re-embedded fourth video.
In one possible implementation, the acquiring unit may be configured to: acquire the content of the first video currently being played when the first operation occurs; and acquire the second video corresponding to that content.
In a third aspect, a server is provided that includes one or more processors and one or more memories. The one or more memories are coupled to the one or more processors, the one or more memories being operable to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the server to perform the video processing method in any of the possible implementations of the first aspect described above.
In a fourth aspect, an electronic device is provided that includes one or more processors and one or more memories. The one or more memories are coupled to the one or more processors, the one or more memories being operable to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the video processing method in any of the possible implementations of the first aspect described above.
In a fifth aspect, a video processing apparatus is provided, where the apparatus is included in a server or an electronic device and has the function of implementing the behavior of the server or the electronic device in the first aspect and any of its possible implementations. The function may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the function described above.
In a sixth aspect, a video processing system is provided that includes an electronic device and a server. The electronic device has a function of implementing the behavior of the electronic device in any one of the above first aspect and possible implementation manners of the first aspect. The server has the functionality to implement server behavior in the first aspect and any of the possible implementations of the first aspect.
In a seventh aspect, a chip system is provided that includes one or more interface circuits and one or more processors. The interface circuit and the processor are interconnected by a wire. The interface circuit is for receiving a signal from a memory of the electronic device and transmitting the signal to the processor, the signal including computer instructions stored in the memory. When the processor executes the computer instructions, the electronic device performs the video processing method in any of the possible implementations of the first aspect. The chip system may be formed of a chip or may include a chip and other discrete devices.
In an eighth aspect, there is provided a computer readable storage medium having stored thereon computer program instructions which, when run on a server or an electronic device, cause the server or the electronic device to perform the video processing method in any one of the possible implementations of the first aspect.
In a ninth aspect, there is provided a computer program product comprising instructions which, when run on a server or an electronic device, cause the server or the electronic device to perform the video processing method in any one of the possible implementations of the first aspect.
It will be appreciated that the advantages achieved by the server of the second aspect, the server of the third aspect, the electronic device of the fourth aspect, the apparatus of the fifth aspect, the video processing system of the sixth aspect, the chip system of the seventh aspect, the computer storage medium of the eighth aspect, and the computer program product of the ninth aspect provided above may refer to the advantages in any one of the possible implementations of the first aspect and the advantages will not be repeated here.
Drawings
Fig. 1A is a schematic system diagram of a video processing system according to an embodiment of the present application;
Fig. 1B is a schematic hardware structure of an electronic device according to an embodiment of the present application;
Fig. 1C is a schematic software structure of an electronic device according to an embodiment of the present application;
Fig. 1D is a schematic hardware structure of a server according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a method for pushing data in an online class according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present application;
Fig. 4 is a schematic system diagram of a video processing method according to an embodiment of the present application;
Fig. 5 is an overall flowchart of a video processing method according to an embodiment of the present application;
Fig. 6 is an example schematic diagram of a video processing method according to an embodiment of the present application;
Fig. 7 is an example schematic diagram of another video processing method according to an embodiment of the present application;
Fig. 8 is a flowchart of a knowledge navigation tree generation method according to an embodiment of the present application;
Fig. 9 is an example schematic diagram of a knowledge navigation tree generation method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. It should be understood that in the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural.
The embodiment of the application provides a video processing method and a video processing system for an online classroom. The video processing system may include a server and an electronic device. The video processing method may be implemented by the server, by the electronic device, or by the server and the electronic device in combination. The video processing method provided by the application can solve the problem of low learning efficiency when a user studies in an online classroom.
Fig. 1A is a schematic diagram of an exemplary video processing system for online class according to the present application, where the system includes an electronic device 100 and a server 200, and data transmission between the electronic device 100 and the server 200 may be performed through a communication network, as shown in fig. 1A.
The communication network may be a local area network, a wide area network switched through a relay device, or a combination of a local area network and a wide area network. When the communication network is a local area network, the communication network may be, for example, a short-range communication network such as a wireless fidelity (Wi-Fi) network, a Bluetooth (BT) network, a ZigBee network, or a near field communication (NFC) network. When the communication network is a wide area network, the communication network may be, for example, a third-generation mobile communication technology (3G) network, a fourth-generation mobile communication technology (4G) network, a fifth-generation mobile communication technology (5G) network, a future evolved public land mobile network (PLMN), or the Internet, which is not limited in the embodiments of the present application.
The electronic device 100 may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or another device with a display screen, or may be a wireless device used in mobile office, smart home, sports and health, audio-visual entertainment, or smart travel scenarios. Alternatively, the electronic device 100 may be deployed on land, including indoors or outdoors, handheld or vehicle-mounted; it may also be deployed on the water surface (such as on a ship); or it may be deployed in the air (for example on an aircraft, a balloon, or a satellite). The specific type of the electronic device 100 is not particularly limited in the embodiments of the present application.
In the embodiment of the application, the electronic device 100 can play various course videos of online class and can play other science popularization videos. The embodiment of the application does not limit the type of video played by the electronic device 100. Alternatively, when the electronic device 100 plays a course video of an online classroom, the electronic device 100 may receive a first operation of the course video by a user, where the first operation may be an operation of screen capturing, pausing, etc. to interact with the course video.
The server 200 may be implemented by a stand-alone server or a server cluster formed by a plurality of servers, and may be a cloud server, a cloud terminal, or the like. The embodiment of the present application is not particularly limited as to the specific type of the server 200.
In the embodiment of the present application, when the electronic device 100 plays a course video of an online classroom, the server 200 may dynamically adjust and change the content in the course video according to the first operation of the user. Optionally, the server 200 may also generate, in real time, a knowledge navigation tree corresponding to the course video according to the course content currently being played by the electronic device 100, and as the content in the course video changes dynamically, the server may correspondingly and dynamically update the knowledge navigation tree.
Illustratively, the electronic device 100 in fig. 1A may employ the structure shown in fig. 1B. Fig. 1B is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, a gravity sensor, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory, which avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, among others.
The I2C interface is a bidirectional synchronous serial bus comprising a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 communicates with the touch sensor 180K through the I2C bus interface to implement the touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement a function of answering a call through the bluetooth headset.
PCM interfaces may also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface to implement a function of answering a call through the bluetooth headset.
The UART interface is a universal serial data bus for asynchronous communications. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through a UART interface, to implement a function of playing music through a bluetooth headset.
The MIPI interface may be used to connect the processor 110 to peripheral devices such as the display 194 and the camera 193. The MIPI interfaces include a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 and the camera 193 communicate through a CSI interface to implement the photographing function of the electronic device 100, and the processor 110 and the display 194 communicate through a DSI interface to implement the display function of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, to transfer data between the electronic device 100 and a peripheral device, or to connect a headset and play audio through the headset. The interface may also be used to connect other electronic devices.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also employ different interfacing manners in the above embodiments, or a combination of multiple interfacing manners.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor battery capacity, battery cycle number, battery health (leakage, impedance) and other parameters. In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication applied to the electronic device 100, including wireless local area network (WLAN) (e.g., a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates and filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate and amplify it, and convert it into electromagnetic waves for radiation via the antenna 2.
In the embodiment of the present application, the electronic device 100 may send data to the server through the wireless communication module 160, or may receive data sent by the server through the wireless communication module 160.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with a network and other devices through wireless communication techniques.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
In an embodiment of the present application, the display 194 may be used to display classroom video.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, so that the electrical signal is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer-executable program code that includes instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The electronic device 100 may listen to music, or to hands-free conversations, through the speaker 170A.
A receiver 170B, also referred to as a "earpiece", is used to convert the audio electrical signal into a sound signal. When electronic device 100 is answering a telephone call or voice message, voice may be received by placing receiver 170B in close proximity to the human ear.
Microphone 170C, also referred to as a "microphone" or "microphone", is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can sound near the microphone 170C through the mouth, inputting a sound signal to the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, and may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four, or more microphones 170C to enable collection of sound signals, noise reduction, identification of sound sources, directional recording functions, etc.
The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensors 180A, such as a resistive pressure sensor, an inductive pressure sensor, and a capacitive pressure sensor. A capacitive pressure sensor may include at least two parallel plates with conductive material; the capacitance between the electrodes changes when a force is applied to the pressure sensor 180A, and the electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device 100 may also calculate the touch position based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is smaller than a first pressure threshold acts on a short-message application icon, an instruction for viewing the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short-message application icon, an instruction for creating a new short message is executed.
The gyro sensor 180B may be used to determine a motion gesture of the electronic device 100. The air pressure sensor 180C may be used to measure air pressure. The magnetic sensor 180D may be used to detect the opening and closing of the flip when the electronic device 100 is a flip machine. The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically three axes). The distance sensor 180F may be used to measure distance. The proximity light sensor 180G may be used to determine whether an object is in the vicinity of the electronic device 100. The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 may utilize the collected fingerprint feature to unlock the fingerprint, access the application lock, photograph the fingerprint, answer the incoming call, etc. The temperature sensor 180J may be used to detect temperature. Ambient light sensor 180L may be used to sense ambient light level.
The touch sensor 180K, also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is for detecting a touch operation acting thereon or thereabout. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a different location than the display 194.
In the embodiment of the present application, the touch sensor 180K may be used to detect operations such as clicking, long pressing, sliding, etc. by a user. The long press operation can be understood as an operation corresponding to the time length of executing the operation by the user meeting the preset time length.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The electronic device 100 may receive key inputs, generating key signal inputs related to user settings and function controls of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 192 may be an indicator light and may be used to indicate a charging state, a change in battery level, a message, a missed call, a notification, and the like.
In an embodiment of the present application, the electronic device 100 may be equipped with an Android-based or other operating system; the embodiment of the application does not limit the operating system carried by the electronic device.
The software system of the electronic device 100 may employ a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. Embodiments of the application take the Android system with a layered architecture as an example to illustrate the software architecture of the electronic device 100.
Illustratively, the electronic device 100 of FIG. 1A may employ the software architecture block diagram shown in FIG. 1C. The layered architecture divides the software into several layers, each with a distinct role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, from top to bottom: an application layer, an application framework layer, Android runtime and system libraries, and a kernel layer. The application layer may include a series of application packages.
As shown in fig. 1C, the application packages may include applications such as camera, gallery, calendar, phone, map, navigation, WLAN, Bluetooth, music, video, short message, etc. For convenience of description, an application program will hereinafter be referred to simply as an application. The applications on the electronic device may be native applications or third-party applications, which is not limited in the embodiment of the present application.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for the application of the application layer. The application framework layer includes a number of predefined functions.
As shown in fig. 1C, the application framework layer may include a window manager service (WMS), an activity manager service (AMS), an input manager service (IMS), a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager service is used for managing window programs. The window manager service can acquire the size of the display screen, judge whether a status bar exists, lock the screen, capture the screen, and the like.
The activity manager service (AMS) is responsible for managing activities, and for the starting, switching, and scheduling of the components in the system, as well as the management and scheduling of applications.
The input manager service (IMS) may be configured to translate, encapsulate, and otherwise process an original input event, obtain an input event containing more information, and send the input event to the window manager service. The window manager service stores the clickable areas (such as controls) of each application program, the location information of the focus window, and the like, so that it can correctly distribute the input event to the designated control or focus window.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the electronic device 100, such as management of call states (connected, hung up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows an application to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without requiring user interaction, for example notifying that a download is complete, message alerts, and the like. The notification manager may also present notifications in the form of a chart or scroll-bar text in the system top status bar, such as notifications of background-running applications, or notifications that appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is emitted, the electronic device vibrates, an indicator light blinks, and the like.
Android runtime includes core libraries and a virtual machine. Android runtime is responsible for the scheduling and management of the Android system.
The core libraries consist of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media libraries may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer may contain display drivers, input/output device drivers (e.g., keyboard, touch screen, headphones, speakers, microphones, etc.), device nodes, camera drivers, audio drivers, and sensor drivers, among others. The user performs input operation through the input device, and the kernel layer can generate corresponding original input events according to the input operation and store the corresponding original input events in the device node.
By way of example, the server in FIG. 1A may employ the structure shown in FIG. 1D. Fig. 1D is an exemplary block diagram of a server 200 provided by the present application. As shown in fig. 1D, the server 200 includes an antenna 201, a radio frequency device 202, and a baseband device 203. The antenna 201 is connected to a radio frequency device 202. In the uplink direction, the radio frequency device 202 receives a signal from the electronic apparatus 100 through the antenna 201, and transmits the received signal to the baseband device 203 for processing. In the downstream direction, the baseband device 203 generates a signal to be transmitted to the electronic apparatus 100, and transmits the generated signal to the radio frequency device 202. The radio frequency device 202 transmits the signal via the antenna 201.
The baseband apparatus 203 may include one or more processing units 2031. The processing unit 2031 may be specifically a processor. In addition, the baseband device 203 may further include one or more storage units 2032 and one or more communication interfaces 2033. The storage unit 2032 is for storing computer programs and/or data. The communication interface 2033 is for interacting information with the radio frequency device 202. The storage unit 2032 may be specifically a memory, and the communication interface 2033 may be an input/output interface or a transceiver circuit. Alternatively, the storage unit 2032 may be a storage unit on the same chip as the processing unit 2031, i.e., an on-chip storage unit, or may be a storage unit on a different chip than the processing unit 2031, i.e., an off-chip storage unit. The application is not limited in this regard.
Based on the above description of the video processing system related to online class, the flow of the video processing method related to online class provided by the application will be described in detail with reference to the accompanying drawings.
At present, the learning content of online classes is mainly video-based, including live course videos and non-live course videos. A live course video is usually a live-streamed course (live course for short), and a non-live course video is usually a recorded course video (recorded course for short).
For recorded courses, because they generally lack interaction, users miss the atmosphere of collective learning during online learning and can easily lose attention out of boredom. In addition, the course cannot be fully matched to the user: recorded courses are usually oriented to the general public rather than customized for an individual user, so personalized teaching cannot be achieved. In this case, if the user does not understand or is unfamiliar with a certain piece of course content, the subsequent listening is often affected; moreover, some refined or specialized courses require considerable prerequisite knowledge and are difficult to follow, so a user may simply give up learning altogether. As a result, although existing recorded courses far outnumber live courses, the completion rate of recorded courses is low.
To address these problems, the application provides a solution in which test questions are preset in the course video: when a user watches the course video, the video pops up test questions at specific time points, and the user can continue watching after answering, which helps prevent loss of attention. Although the approach of presetting interactive questions can alleviate the lack of course interaction to some extent, the interaction is too simple and provides no meaningful feedback. Interaction in real teaching means that a teacher adapts the teaching content and manner according to the students' responses, whereas with preset interactive questions the course remains exactly the same after the user answers. In addition, the pop-up of preset questions sometimes affects the viewing experience of the video and may even disturb the user.
The application also considers a solution in which, by combining simple recommendation, search, and similar techniques during course video watching, related learning materials can be pushed quickly when a user encounters a knowledge point that is not understood or unfamiliar, thereby reducing the learning difficulty of the course. Pushing learning materials for the user based on user interaction can reduce the learning difficulty to some extent, but continuously switching to view the recommended materials interrupts video watching, affects the viewing experience, and makes the learning process less smooth. In addition, after the user finishes learning, the course itself is unchanged, the materials need to be pushed again at the next viewing, the knowledge involved in the whole learning process remains fragmented, and the learning resources are neither complete nor systematic.
For example, as shown in fig. 2, during course video watching, the current playing position and the playing content within a certain range before it are obtained through user interaction; the playing content is then converted into text content; knowledge points are then extracted from the text content and displayed; and finally, related materials are queried and pushed according to the knowledge point selected by the user. In this way, relevant information about the lecture content can be provided to the user in time when the user needs it. However, the learning process in this approach amounts to switching from video learning to material learning, back to video learning, then to material learning again: the learning content is switched repeatedly, and the whole learning process is in fact continuously interrupted and fragmented. This affects the learning experience to some extent and limits learning efficiency. After such a learning process, no new complete learning resource is formed, and the original course and the recommended materials remain separate. If the user wants to learn again next time, the processes of interaction, pushing, and so on may need to be repeated, which is not conducive to subsequent repeated learning by the user.
In this regard, the embodiment of the application provides a video processing method, which can seamlessly connect knowledge point course resources required by a user to a proper position of a current course video based on user interaction, and does not influence continuous video watching of the user. Therefore, after the user interaction is finished, the user can continue to watch the course video, so that the required knowledge points can be seen in the subsequent video, and the user does not need to watch the data by interrupting video watching. And through the multi-round interaction of the user, the original course video is converted into a complete and personalized comprehensive course video aiming at the user, instead of scattered fragment resources, so that the user can watch and learn repeatedly.
The following describes, with reference to fig. 3 and fig. 4, a video processing method provided by an embodiment of the present application, which may be applied to the video processing system shown in fig. 1A, and whose execution subject may be a server or an electronic device. The steps of the video processing method may be performed in an order different from that shown in fig. 3 and/or concurrently; the order of execution shown in fig. 3 is not limiting. As shown in fig. 3, the method may include S310-S350.
S310, when receiving a first operation for the first video sent by the electronic device in the process of playing the first video by the electronic device, acquiring a second video corresponding to the first operation.
In the embodiment of the application, the first video may be a video currently played by the electronic device. Alternatively, the first video may be a lesson video for learning knowledge, or may be other videos, such as a science popularization video. The embodiment of the application does not limit the type of the first video.
Optionally, the user may open a browser webpage through the electronic device to search the server for the video to be played through the webpage, the server may retrieve one or more videos related to the user search from the database and feed back to the webpage display, and the user may select the first video to be played from the videos and click to play, so that the server may transmit the video content of the first video to be played to the electronic device in real time, and the first video is played by the electronic device.
Optionally, the user may also open the video playing application through the electronic device, so as to search for the first video to be played in the application and click to play, so that the server corresponding to the video playing application may transmit the video content of the first video to be played to the electronic device in real time, and the electronic device plays the first video.
Optionally, when the electronic device plays the first video, the user may perform the first operation with respect to the first video when the user is not aware or familiar with the content currently played by the electronic device. Alternatively, the first operation for the first video may be a pause, a screenshot, or the like operation performed by the user on the first video.
Optionally, the electronic device may display one or more button controls on the playing interface of the first video, in which case the first operation performed by the user on the first video may also be a click operation on a button control of the playing interface of the first video. Optionally, the electronic device may also be provided with one or more keys, such as a volume key and a power key, in which case the first operation performed by the user on the first video may also be a pressing operation on one or more keys of the electronic device, where the pressing of the one or more keys may be used to control the first video played by the electronic device; for example, pressing the volume key plus the power key may trigger the electronic device to take a screenshot of the first video.
Optionally, when the electronic device plays the first video, a test function for a preset knowledge point may be triggered when the first video is played to a preset time point of the first video. As one way, the electronic device may pop up one or more test questions of a preset knowledge point to detect the user's mastery of the tested knowledge point through the questions. It can be understood that whether the answer is correct or not can reflect the grasping degree of the user on the tested knowledge points, and when the user answers wrong, the user can be considered to be possibly not understand or not familiar with the tested knowledge points.
Optionally, the first operation for the first video may also be an answer operation performed by the user on the first video. In one mode, if one or more test questions of a preset knowledge point are popped up in the playing process of the first video, the user can execute answering operations, such as text input operation of answering contents, selection operation of answering options and the like, on the one or more test questions.
Optionally, the answering operation performed by the user on the first video may be an incorrect answering operation. Optionally, after detecting that the user performs an answering operation on a certain test question, the electronic device may determine whether the user answered the question correctly; when it detects that the user answered incorrectly, it may consider that the user may not understand or be unfamiliar with the knowledge point tested by the question, and may consider that the user has performed an incorrect answering operation on the first video.
It will be appreciated that when the user answers correctly, the user may be considered likely to understand or be familiar with the tested knowledge point. The electronic device may not perform the video processing method of the present application in this case.
Optionally, the first operation of the first video may be other interactive operations performed on the first video by the user, which is not limited by the embodiment of the present application. For example, a search operation is also possible. Optionally, in the process of playing the first video by the electronic device, if the user performs the first operation on the first video to interact with the first video, after the user completes interaction, the user may still continue to watch the first video, and the electronic device may still continue to play the first video.
Alternatively, when detecting a first operation of the user on the first video, the electronic device may feed back the first operation to the server, and the server analyzes a knowledge point that may not be understood or familiar to the user, so as to obtain the second video according to the knowledge point. The second video may be a lesson video related to the knowledge point to assist the user in understanding the knowledge point. Alternatively, the server may search the video library for a second video that best matches the knowledge point.
As one way, the electronic device may send the operation information of the first operation to the server. Alternatively, the operation information may include a current playing position of the first video when the user performs the first operation. After receiving the operation information of the first operation, the server may acquire a video corresponding to the current playing position of the first video, or acquire a video of n frames near the current playing position of the first video. Wherein n is an integer greater than 1. The server may then perform optical character recognition (optical character recognition, OCR) model analysis on the image of the video to identify text content in the video image. Optionally, the server may also perform automatic speech recognition (automatic speech recognition, ASR) model analysis on the audio of the video to identify text content in the audio. Therefore, the video content corresponding to the user interaction position in the first video is converted into the text content.
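The paragraph above describes recovering text from the frames and audio around the user's interaction position. As an illustration only, and not the patent's implementation, the following Python sketch shows one way such a step could look, assuming OpenCV and pytesseract are available; the ASR step is left as a hypothetical placeholder because no specific speech engine is specified here.

```python
# Minimal sketch (illustrative assumption, not the actual server implementation):
# extract the frames around the interaction position and convert them to text with OCR.
import cv2
import pytesseract

def frames_to_text(video_path: str, center_frame: int, n: int = 5) -> str:
    """OCR the n frames before and after the frame where the user interacted."""
    cap = cv2.VideoCapture(video_path)
    texts = []
    for idx in range(max(center_frame - n, 0), center_frame + n + 1):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        # OCR the courseware/subtitle text shown in the frame
        # (pass lang=... to pytesseract for non-English courseware)
        texts.append(pytesseract.image_to_string(frame))
    cap.release()
    return "\n".join(t.strip() for t in texts if t.strip())

def audio_to_text(audio_path: str) -> str:
    """Hypothetical ASR hook; plug in any speech-recognition engine here."""
    raise NotImplementedError("ASR engine not specified in this sketch")
```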
Optionally, after converting the video content corresponding to the user interaction position in the first video into text content, the server may further, in combination with the education knowledge graph, determine one or more knowledge points involved in the text content and send them to the electronic device. The electronic device may display the one or more knowledge points in a list for the user to view, so that the user can determine their weak knowledge points. Here, weak knowledge points may be knowledge points that the user is unfamiliar with or does not understand. When there is only one knowledge point, the server can directly determine that this knowledge point is the user's weak knowledge point, without needing to send it to the electronic device for display in a knowledge point list.
Alternatively, when the knowledge points are plural, the electronic device may display the plural knowledge points in a knowledge point list, so that the user may select knowledge points from the knowledge point list that are not understood or familiar to the user.
Optionally, when the server combines with the education knowledge graph to determine that the text content relates to one or more knowledge points, one or more test questions can be generated and sent to the electronic device based on the one or more knowledge points. The electronic device can display one or more test questions for the user to answer.
After answering questions, the electronic device can automatically analyze knowledge points which are not understood or familiar to the user according to the answers of the user and feed the knowledge points back to the server. And the server can search the second video which is most matched with the knowledge points fed back by the electronic equipment from the video library according to the knowledge points fed back by the electronic equipment.
Optionally, the operation information may also include a currently played video content of the first video when the user performs the first operation. The server can identify the currently played video content to convert the video content into text content and execute the steps.
Optionally, when the first operation is a answering operation, the operation information may further include content of the test question answered by the user when the user performs the first operation. The server can search the second video which is most matched with the knowledge point from the video library according to the knowledge point tested by the test question content.
For example, as shown in fig. 5, after the user performs an interactive operation (i.e., the first operation) on the first video, the server can learn, from the interaction, which video frames of the first video need content analysis. For example, as shown in fig. 6, taking as an example a first video whose content covers knowledge points such as the refraction of light, the calculation of the refractive index, the dispersion of light, and the diffraction of light: when the electronic device is playing the refraction-of-light content of the first video, if the user performs active operations on the playing first video, such as pausing, button pressing, or screen capturing, the electronic device may send this active interaction signal to the server, so that the server can identify, according to the received interaction signal, the video frame where the user signal is located and the course content of the n frames before and after it, and convert them into text content. The recognition process involves OCR and ASR: the picture frames of the course video are converted into text, and the course speech is converted into its corresponding text content.
As shown in fig. 5, the server may then apply natural language processing techniques such as word segmentation, entity recognition, and topic extraction to the text content recognized by OCR and ASR, in combination with the education knowledge graph, to finally convert the course content into a knowledge point list. As shown in fig. 6, the server performs content recognition on the video frames fed back by the electronic device in combination with the knowledge graph, and finally lists knowledge points such as "reflection of light", "definition of light", and "property of light".
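To make the conversion from recognized text to a knowledge point list concrete, the following sketch simply matches the text against the entity names of an education knowledge graph and ranks the hits by frequency; the entity set and the ranking rule are illustrative assumptions, not the method prescribed by the application.

```python
# Illustrative sketch: list candidate knowledge points by matching recognized
# lesson text against knowledge-graph entity names (assumed data, toy rule).
def list_knowledge_points(text: str, knowledge_graph_entities: set[str]) -> list[str]:
    hits = {}
    for entity in knowledge_graph_entities:
        count = text.count(entity)
        if count:
            hits[entity] = count
    # Rank matched knowledge points by how often they appear in the text
    return sorted(hits, key=hits.get, reverse=True)

entities = {"reflection of light", "definition of light",
            "property of light", "refraction of light"}
print(list_knowledge_points("refraction of light ... reflection of light ...", entities))
```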
After the electronic equipment displays the listed knowledge points, the user can select and feed back the listed knowledge points, and select weak knowledge points to feed back to the server. As shown in fig. 6, the user selects "reflection of light" among the listed knowledge points as an unintelligible knowledge point, so that the server can perform a subsequent operation based on the knowledge point.
Optionally, besides having the user select the weak knowledge points, the server can also intelligently determine the user's weak knowledge points through intelligent question-based diagnosis: the user answers the questions, and the server combines the answer results with evaluation techniques.
As shown in fig. 7, when the electronic device plays the dispersion-of-light content of the first video, if the user performs active operations on the playing first video, such as pausing, button pressing, or screen capturing, the server performs content recognition on the video frames fed back by the electronic device in combination with the knowledge graph, and finally lists knowledge points such as "the characteristics of each color light", "the dispersion phenomenon", and "the three primary colors of light". The server conducts an in-class test based on the listed knowledge points, diagnoses the weak knowledge point as "the characteristics of each color light" according to the user's answers, and then performs subsequent operations based on this weak knowledge point.
As shown in fig. 5, after obtaining the weak knowledge point, the server may search the video library for a second video related to the weak knowledge point. Alternatively, the second video may be a micro-lesson video, which refers to a short video (3-5 minutes) in the course library that explains only a single knowledge point and is either created by a teacher or obtained by slicing a longer course.
Optionally, the server may also make recommendations for the user's weak knowledge point based on the user's learning records, learning preferences, region, grade, and the like, and select the micro-lesson video most suitable for the user from the micro-lessons related to the weak knowledge point in the course library. As shown in fig. 6, based on the weak knowledge point "reflection of light" and in combination with the user's personalized information, the micro-lesson video on "reflection of light" most suitable for the user is selected for subsequent video embedding.
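The selection of the most suitable micro-lesson can be pictured as a simple ranking over the user's personalized information. The sketch below is a toy example only; the features (grade, region, style) and their weights are invented for illustration and are not stated in the application.

```python
# Toy ranking sketch for choosing a micro-lesson for the weak knowledge point
# (hypothetical features and weights).
from dataclasses import dataclass

@dataclass
class MicroLesson:
    video_id: str
    knowledge_point: str
    grade: str
    region: str
    style: str  # e.g. "lecture", "animation"

def pick_micro_lesson(candidates, user_grade, user_region, user_style):
    def score(lesson: MicroLesson) -> float:
        # Weighted match against the user's personalized information
        return (2.0 * (lesson.grade == user_grade)
                + 1.0 * (lesson.region == user_region)
                + 1.0 * (lesson.style == user_style))
    return max(candidates, key=score) if candidates else None
```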
S320, determining an embedding position of the second video in the first video, wherein the embedding position is located in an unplayed segment of the first video.
Optionally, the server may slice the first video and the second video according to the explanation content, so as to facilitate subsequent analysis, embedding, and synthesis of the video content. The input here is the first video and the second video, and the output is a set of video slices for each of the two videos.
In this step, the main technique involved is automated video slicing. Where video slicing may be understood as dividing a video into at least one video segment.
Alternatively, the server may perform OCR and ASR text recognition on the first video and the second video, and subject the recognized text, and then determine a start-stop timestamp of the text subject to determine a slice timestamp from the start-stop timestamp of the text subject. The start-stop time stamp of the text theme can be understood as a corresponding playing time point when the first video starts playing to the text theme, and the slice time stamp can be understood as which playing time point of the first video the server performs the video slicing operation. It will be appreciated that the content of one text topic is typically continuous. Video slicing is performed through text theme, so that the integrity of the content in the obtained video clip can be ensured.
It will be appreciated that, in order to ensure the continuity of a piece of explanation or a sentence and to avoid a piece of content or a sentence being cut off mid-play after the second video is embedded, it is necessary to perform object detection on the first video and the second video to detect uninterrupted continuous image content, and then treat this continuous image content as one video slice for subsequent embedding as a whole.
Alternatively, the server performs object detection on the first video and the second video, and may perform segmentation again from the image content, and then determine the slice time stamp according to the start-stop time stamps of the continuous image content by determining the start-stop time stamps of the continuous image content. Wherein the detected object may be a person, a face, etc. that may appear continuously in a video segment. It can be understood that video slicing is performed through image content, so that face images which appear continuously can be prevented from being respectively located in different video slices, and the integrity of the content in the obtained video clips is ensured.
Alternatively, the server may also determine the slice timestamps by combining the start-stop timestamps of the text topics with the image content. Thus, during video slicing, the server may perform two timestamp-locating passes: the first is an analysis of the text to ensure continuity of the text content, and the second is an analysis of the image content to ensure continuity of the image content. The two complement each other and improve the accuracy of video slicing.
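The two complementary passes can be combined by discarding any text-topic boundary that would cut through continuously detected image content. The following sketch assumes that topic boundaries and object (for example, face) tracks have already been computed; the data shapes are illustrative, not part of the application.

```python
# Sketch: keep a candidate cut time only if it does not fall inside a
# continuously detected object track (assumed precomputed inputs).
def merge_slice_points(topic_boundaries, object_tracks):
    """topic_boundaries: sorted candidate cut times (seconds).
    object_tracks: (start, end) intervals where a person/face is on screen
    continuously; a cut inside such an interval would break the content."""
    safe_cuts = []
    for t in topic_boundaries:
        inside_track = any(start < t < end for start, end in object_tracks)
        if not inside_track:
            safe_cuts.append(t)
    return safe_cuts

print(merge_slice_points([120.0, 305.5, 410.0], [(300.0, 310.0)]))  # 305.5 is dropped
```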
Alternatively, when the second video is a relatively short micro-lesson video, video slicing may be performed only on the first video. Optionally, slicing the second video as well can make the result more precise, since some micro-lessons may be mixed with other knowledge points; thus the video slicing may also be applied to the second video.
Alternatively, the server needs to find the appropriate point in time, i.e. the embedding location, in the first video in order to subsequently embed the second video at the appropriate point in time. The input to this step is the slice set of the first video and the second video, and the output is the timestamp of the embedding location of the second video in the first video.
Since the first video has already been segmented into relatively independent slices, the position for the second video can be located more precisely. Therefore, in this step, the server may obtain the slice set of the subsequent playing content of the first video (i.e., the content of the first video that has not yet been played), compare this slice set with the second video, and then determine the embedding position of the second video in the first video according to the comparison result. In the embodiment of the application, the embedding position of the second video in the first video is located in an unplayed segment of the first video, i.e., after the current playing position, so as to avoid affecting the watching of the currently playing content.
Optionally, the server may perform similarity detection between the slice set of the subsequent playing content of the first video and the second video, and find, among the slices, the slice video whose theme is most similar to the second video and whose timestamp is earliest; the termination timestamp of this slice video is the timestamp corresponding to the embedding position. The termination timestamp of the slice video can be understood as the playing time point at which the first video finishes playing that slice. In this way, the server may subsequently embed the second video at the termination timestamp, so that the second video appears as early as possible in the subsequent playback without interrupting the continuity of the user's current viewing.
Alternatively, the server may also use the start timestamp of the slice video in the slice that is most similar to the theme of the second video and has the timestamp that is the most forward (the nearest playing time point) as the timestamp corresponding to the embedding position of the second video in the first video.
Optionally, the server may obtain the similarity between the second video and each slice in the slice set of the subsequent playing content of the first video, and then take a target slice video whose similarity is greater than a preset value as the slice video in the subsequent playing content of the first video that is most similar to the theme of the second video. The preset value can be set reasonably according to the actual scenario and is not limited herein.
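A possible concrete form of this similarity detection is sketched below: TF-IDF cosine similarity between the second video's transcript and each unplayed slice's transcript, returning the end timestamp of the earliest slice above a preset threshold. TF-IDF and the threshold value are assumptions; the application does not prescribe a particular similarity model.

```python
# Hedged sketch of locating the embedding timestamp via text similarity
# (scikit-learn TF-IDF; the threshold is an illustrative value).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def find_embedding_timestamp(slice_texts, slice_end_times, second_video_text,
                             threshold=0.3):
    vec = TfidfVectorizer().fit(slice_texts + [second_video_text])
    slice_mat = vec.transform(slice_texts)
    target = vec.transform([second_video_text])
    sims = cosine_similarity(slice_mat, target).ravel()
    # Slices are assumed to be in play order; pick the earliest sufficiently
    # similar slice and embed right after it.
    for end_time, sim in zip(slice_end_times, sims):
        if sim >= threshold:
            return end_time
    return slice_end_times[-1] if slice_end_times else None
```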
Optionally, for the same first video, due to the difference of different user knowledge levels, the interaction position may be different, and the second video that needs to be embedded in the first video may also be different, where the second video is embedded in the unplayed segment of the first video. In this way, the server can generate personalized embedded composite video for different users, even for the same first video.
S330, embedding the second video into the first video according to the embedding position.
Optionally, the server can finely adjust the color, the picture style and the like of the second video, so that the second video and the first video are unified as much as possible, abrupt sense of the second video when being played after being embedded into the first video is relieved, and continuous watching experience of a user is improved. The inputs of this step are the front and back slice video of the embedding position in the first video and the second video, and the output is the trimmed second video.
Optionally, the server may generate a new, style-adjusted second video by using a generative adversarial network (GAN) model in combination with the style of the first video slices and the content of the second video, thereby realizing style migration and improving the continuous viewing experience while ensuring that the video content is unchanged.
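As a greatly simplified stand-in for the GAN-based style migration, the sketch below only aligns the colour distribution of second-video frames with a reference frame of the first video using histogram matching (scikit-image, assumed version 0.19 or later). It illustrates the goal of visual consistency between the two videos, not the generative-adversarial method itself.

```python
# Simplified colour harmonization sketch (not the GAN-based approach):
# match the colour histogram of a second-video frame to a first-video frame.
import numpy as np
from skimage.exposure import match_histograms

def harmonize_frame(second_video_frame: np.ndarray,
                    first_video_reference: np.ndarray) -> np.ndarray:
    # channel_axis=-1 treats the last axis as the colour channels (RGB)
    return match_histograms(second_video_frame, first_video_reference,
                            channel_axis=-1)
```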
Alternatively, the server may output a new first video after embedding the second video, which may be referred to as a third video, based on the slice set of the first video, the second video, and the embedding timestamp. The server may return the generated new first video to the electronic device, which may continue playing based on the generated new first video.
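The composition of the third video can be pictured with an off-the-shelf editing library. The sketch below uses moviepy 1.x (an assumption; any video editing toolkit would serve) to cut the first video at the embedding timestamp and splice in the second video.

```python
# Minimal composition sketch (assumes moviepy 1.x): insert the second video
# at the embedding timestamp and write out the third video.
from moviepy.editor import VideoFileClip, concatenate_videoclips

def embed_video(first_path: str, second_path: str, embed_t: float, out_path: str):
    first = VideoFileClip(first_path)
    second = VideoFileClip(second_path)
    third = concatenate_videoclips([first.subclip(0, embed_t),   # before the cut
                                    second,                      # embedded micro-lesson
                                    first.subclip(embed_t)])     # rest of the first video
    third.write_videofile(out_path)
```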
Illustratively, as shown in fig. 5, the first video is currently explaining the outline of "refraction of light", and the teacher's explanation involves knowledge points such as "definition of light", "nature of light", and "reflection of light". After the electronic device sends the weak knowledge point "reflection of light" selected by the user to the server, the server finds the second video corresponding to "reflection of light" in the course library, and after video slicing and embedding-point analysis, the second video corresponding to "reflection of light" can be embedded at a proper position in the first video, which is then returned to the electronic device for playing. Thus, the user can quickly see the second video corresponding to "reflection of light" by simply continuing to watch, without interrupting the continuity of "refraction of light 1" currently being watched. Here, "refraction of light 1" and "refraction of light 2" are slices of "refraction of light" in the first video, and together they form the complete "refraction of light" content of the first video.
It can be understood that, in the process of watching the first video, the user can give interactive feedback in various ways, and after each interactive feedback the server can personalize the subsequent playing content of the first video currently being watched, thereby achieving a teaching effect close to that of a real teacher, namely that student interaction receives feedback.
Alternatively, after the whole course of the first video is completed, the electronic device may save the first video in which the second video has been embedded, that is, save the third video. In this way the server can finally generate a new, personalized complete course based on the effect of each interaction, which is provided to the user on the electronic device side for repeated learning and viewing.
It can be understood that, with the interaction mode of presetting questions in the first video and popping up a window at a specific timestamp, the user's viewing is easily interrupted, whereas the interaction of the present application is actively initiated by the user when a piece of content is not understood, with no abrupt interruption. In addition, recommending related learning materials (such as videos and text) to the user after interaction essentially requires the user to look them up and study them, which interrupts learning by switching from the first video to the materials; by contrast, the interaction result of the present application is embedded into the subsequent playing content of the first video, so the user does not need to switch among learning materials and only needs to keep watching, the required knowledge content appears at a suitable position in subsequent playback, and learning continuity is ensured. Furthermore, after the whole course has been learned, the server can also export a new, complete personalized course rather than scattered resource segments, which facilitates the user's repeated learning and use.
Alternatively, the server may include four components, video slicing, content analysis, style migration, video composition. The video slicing component is used for splitting the first video and/or the second video; the content analysis component is used for processing the content of the fragments of the first video and the fragments of the second video and determining the proper position for embedding the second video; the style migration component migrates the style of the second video on the premise of not changing the content, ensures the style to be unified with the style of the original first video as much as possible, and relieves the abrupt sense in playing; and the video synthesis component synthesizes the second video and the original first video into a new personalized course based on the results of the components, so that the user can continue to watch and use the new personalized course repeatedly.
And S340, when the playing progress of the first video reaches the embedded position, playing the second video.
It can be understood that, since the first video is always playing and the embedding position of the second video is in an unplayed segment of the first video, the electronic device can, during playback of the first video, update the video content that it would originally have played later to the corresponding content of the first video after the second video has been embedded, so that when the first video is played to the embedding position, the embedded second video can be played normally.
And S350, when the second video is played, continuing to play the unreleased fragments of the first video after the embedding position.
It can be understood that, since the first video embedded with the second video is always being played, after the embedded second video finishes playing, the electronic device can continue playing the first video embedded with the second video, that is, continue playing the unplayed segment of the first video after the embedding position.
Optionally, since the second video can only be embedded after the current playing position, when content related to the second video already exists before the current playing position, the related content is easily laid out at two separate positions in the video, resulting in scattered knowledge points. Therefore, after the first video embedded with the second video, i.e., the third video, has finished playing, the server can also match the second video in the third video against the first video, so as to readjust the second video in the third video to a suitable playing position according to the content matching result, for example, a position of the first video that contains content related to the second video. The server thus obtains a new third video, which may also be called a fourth video, after the playing position of the second video has been adjusted. This improves the relevance of the knowledge points in the finally generated composite video, so that the user can learn systematically.
Optionally, the second video is embedded into the first video to obtain the third video when the user does not understand the knowledge point of the second video. However, after the user has come to understand that knowledge point, the user would see the already-understood knowledge point every time the third video is viewed, watching it repeatedly, which does not help the user quickly learn the remaining knowledge points. Therefore, the electronic device may also display a skip control on the portion of the third video related to the second video content; when a click operation on the skip control is detected, it indicates that the user does not want to see the content of the second video and has already understood its knowledge point. In this case the electronic device or the server may delete the second video from the third video, so as to obtain a new video after deletion.
Optionally, after the server generates the test question based on the same knowledge point, if it is detected that the answer of the user to the knowledge point corresponding to the second video is correct, the server may consider that the user already understands the knowledge point of the second video, and at this time, the server may delete the second video in the third video. To obtain a new video after deletion.
Further, during the playing of the first video, the video processing method of the application can also construct a knowledge navigation tree corresponding to the first video according to the real-time playing content of the first video. The knowledge navigation tree can be a knowledge point tree diagram that helps organize the knowledge points involved in the first video; it can help the user grasp the learning context, form a systematic understanding, and learn systematically.
Optionally, as one way, the server may obtain the main branches of the knowledge tree by crawling the outline of the first video, perform target detection and recognition on frame pictures or courseware pictures in the online course, extract their text content, then extract knowledge points from the text content based on preset rules, organize the knowledge points hierarchically, and merge them into the main branches of the knowledge tree, thereby generating the knowledge tree of the first video. In this way, the knowledge points contained in the first video can be associated and organized hierarchically, which can improve the user's learning effect.
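A minimal sketch of this static knowledge tree construction is given below: outline chapters become the main branches, and knowledge points mined from frame or courseware text are attached beneath them. The node structure and matching rule are illustrative only, not the structure prescribed by the application.

```python
# Illustrative static knowledge tree: outline chapters as main branches,
# extracted knowledge points as leaves (assumed inputs).
from dataclasses import dataclass, field

@dataclass
class KnowledgeNode:
    name: str
    children: list["KnowledgeNode"] = field(default_factory=list)

def build_static_tree(outline: list[str], extracted_points: dict[str, list[str]]):
    root = KnowledgeNode("course")
    for chapter in outline:                                   # main branches from the outline
        branch = KnowledgeNode(chapter)
        for point in extracted_points.get(chapter, []):       # points mined from frames/courseware
            branch.children.append(KnowledgeNode(point))
        root.children.append(branch)
    return root
```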
It will be appreciated that the knowledge tree generated in this way is static: a knowledge tree diagram can be generated for each recorded online course, but such a diagram can only represent the knowledge hierarchy of the course and is a static, unchanging knowledge tree. Its guiding effect is therefore limited for students whose learning is dynamic, and a knowledge tree generated in this manner cannot be individually adapted to each student's dynamic learning process. The knowledge tree generated in the above manner is thus suitable only for a first video in which no second video is embedded.
In the embodiment of the present application, the content of the first video changes continuously along with the user's interactive feedback, so a static, fixed knowledge tree is not suitable for the first video embedded with the second video.
In the embodiment of the application, the server can generate the knowledge navigation tree corresponding to the first video according to the current playing content of the first video in the process of playing the first video by the electronic equipment.
Optionally, in the process that the electronic device plays the first video, the server may detect knowledge points related to the current playing content of the electronic device in real time, generate a knowledge navigation tree in combination with the education knowledge graph, and record the starting time point of each knowledge point in the video on the tree for real-time jump positioning.
It will be appreciated that the knowledge navigation tree generated by the present application is not the same as the static knowledge tree described above. In the embodiment of the application, user interaction can change the subsequent playing content of the first video, so the playing content of the first video changes dynamically and the knowledge navigation tree is updated in real time along with the user interaction. The knowledge points introduced by user interaction can be automatically related by the server to the current knowledge navigation tree, organized, and summarized, finally forming a personalized learning knowledge navigation tree unique to the user. The personalized knowledge navigation tree not only navigates and organizes the knowledge points in the first video, but is also a representation of the user's dynamic learning; it provides the user with an outline and improves efficiency through systematic learning.
Optionally, if the second video is embedded in the unplayed segment of the first video in the process of playing the first video by the electronic device, when the playing progress of the first video reaches the embedding position of the second video, the video processing method of the application can also update the content of the second video into the knowledge navigation tree. Thus, the dynamic updating of the personalized knowledge navigation tree along with the watching progress and interactive feedback of the user is realized.
Alternatively, as shown in fig. 8, the server may perform content identification on the currently played content of the first video during the playing process of the first video. Optionally, the input of this step is the video frame currently playing the first video, and the output is the text content of that frame and nearby frames. The purpose of which is to convert the video lesson into text content for further analysis.
This step generally uses techniques and principles similar to the "content recognition" described earlier, i.e., both OCR and ASR recognition. The difference is that the earlier recognition is triggered by user interaction feedback, whereas here the playing content is detected and analyzed in real time, and the navigation tree is updated in real time according to the knowledge points detected in real time.
After obtaining the text content of the currently played video frame and nearby frames, the server can output the key knowledge points of the text content, i.e., the knowledge points that are mainly being explained. Compared with the earlier knowledge point extraction step, which lists the related knowledge points for the user to select, this step screens for the key knowledge points and removes minor branches, which better supports updating the knowledge navigation tree.
In this step, the server can first perform named entity recognition and topic extraction on the text content in combination with the knowledge graph to obtain the set of knowledge points related to the currently playing video frame; then, for this knowledge point set, calculate an importance score for each knowledge point according to features such as its occurrence frequency, continuous duration, and predecessor-successor relationships; and finally, select the most important knowledge point for output, so that its timestamps can later be extracted and the knowledge navigation tree updated.
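The importance scoring can be pictured as a weighted combination of the features named above. The weights and example data in the following sketch are invented purely for demonstration and are not taken from the application.

```python
# Toy importance score over the stated features: occurrence frequency,
# continuous duration, and number of knowledge-graph links (hypothetical weights).
def knowledge_point_score(frequency: int, duration_s: float, graph_links: int) -> float:
    return 1.0 * frequency + 0.1 * duration_s + 0.5 * graph_links

candidates = {"reflection of light": (6, 120.0, 2), "plane mirror": (1, 10.0, 0)}
key_point = max(candidates, key=lambda k: knowledge_point_score(*candidates[k]))
print(key_point)  # the key knowledge point for this playback segment
```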
Alternatively, the server may obtain the important knowledge point as described above and output the start-stop timestamps of that important knowledge point. In one approach, when the server detects an important knowledge point, the time at which it appears can be saved as its start time; when the next important knowledge point is subsequently detected, the time at which that next point appears is saved as the end time of the previous one. By continuing detection in this way, the start-stop timestamps of all important knowledge points are obtained.
Alternatively, the server may output an updated knowledge navigation tree based on the important knowledge points and their corresponding start-stop timestamps. As one way, based on the detected important knowledge points and their start-stop timestamps, and in combination with the knowledge point graph, the server may add the important knowledge points to the knowledge navigation tree, label the predecessor-successor relationships between the currently added node and the previous nodes, and feed the updated knowledge navigation tree back to the electronic device, so that the electronic device displays the knowledge navigation tree for the user to view. As shown in fig. 9, each node stores the start timestamp and end timestamp of the corresponding important knowledge point in the whole video. Alternatively, the start timestamp and the end timestamp may be set to an invisible state.
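A possible data structure for the dynamic navigation tree node, consistent with the description above but not taken from the application, is sketched below: each node records the start/end timestamps of its knowledge point in the current composite video, and newly detected points are appended with a predecessor link (placement within the hierarchy via the knowledge graph is omitted here).

```python
# Illustrative navigation-tree node with timestamps and predecessor links
# (assumed structure for demonstration).
from dataclasses import dataclass
from typing import Optional

@dataclass
class NavNode:
    knowledge_point: str
    start_s: float
    end_s: float
    predecessor: Optional["NavNode"] = None

def append_detected_point(detected_nodes: list, point: str,
                          start_s: float, end_s: float) -> NavNode:
    prev = detected_nodes[-1] if detected_nodes else None
    node = NavNode(point, start_s, end_s, predecessor=prev)
    detected_nodes.append(node)
    return node
```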
Alternatively, if the user needs to learn the knowledge of the second node before learning the knowledge of the first node, the second node may be a precursor node of the first node. If the user learns the knowledge of the first node before learning the knowledge of the third node, the third node may be a successor node to the first node.
Optionally, the user may perform a click operation on a node on the knowledge navigation tree displayed by the electronic device, and the electronic device may quickly jump to the video content corresponding to the node in the first video to play in response to the click operation, so that the user may quickly view the content of the knowledge point that the user wants to see. Thus, on the generated knowledge navigation tree, the user can perform thematic knowledge video viewing, and by clicking a certain tree node, the user can jump to a specific time stamp accurately to perform video viewing.
Optionally, when the first video embedded with the second video is played to the embedded position, the server may update the knowledge navigation tree corresponding to the original first video according to the current playing content of the second video.
It can be understood that the user can watch the video in real time, interact in the middle, and simply continue watching after the interaction is completed. The server detects the currently playing content in real time, analyzes and extracts the knowledge points involved, updates the knowledge point navigation tree structure in real time together with the start and end time points of all nodes (knowledge points) on the tree in the video, and returns them to the electronic device.
After receiving the updated knowledge navigation tree, the electronic device can check the information such as the current learning knowledge point, the position in the whole knowledge structure, the previous and subsequent knowledge of the knowledge point and the like. In addition, a certain node (knowledge point) can be clicked, and the specific knowledge video can be jumped to for special learning.
By way of example, fig. 9 shows a schematic diagram of an update process of the knowledge navigation tree of the first video. As shown in fig. 9, in the initial stage of playing the original first video, the server detects the important knowledge point of "refraction of light", and the start time is 0 minutes and the end time is 10 minutes in the first video.
As shown in fig. 6, when part of the "refraction of light" video has been played, the user gives interactive feedback, and the server determines, by detecting the video content corresponding to the interaction position, that the user's weak knowledge point is "reflection of light". The server can then find the second video corresponding to this knowledge point, i.e., the micro-lesson on "reflection of light", in the course library and embed it at a suitable position in the first video so that it becomes part of the subsequent playing content of the first video. The suitable position in the first video may be the termination timestamp of "refraction of light 1", after the video is sliced into "refraction of light 1" and "refraction of light 2". As shown in fig. 6, the micro-lesson on "reflection of light", indicated by the black inverted triangle, is embedded at the termination timestamp of "refraction of light 1".
It can be understood that, when the electronic device continues to play the subsequent playing content of the first video, the server can detect the important knowledge point "reflection of light" in real time and, in combination with the knowledge graph, incorporate it into the existing knowledge navigation tree; at this time, the timestamps of the original knowledge points also change because the playing content of the first video has changed, yielding the updated knowledge navigation tree shown in fig. 9. As shown in fig. 9, the "refraction of light" video segment is now divided into two segments, "refraction of light 1" from 0 to 3 minutes and "refraction of light 2" from 6 to 13 minutes, because the "reflection of light" second video from 3 to 6 minutes is embedded in between.
Similarly, after the subsequent second video on "the characteristics of each color light" is embedded, the knowledge navigation tree changes in the same way. As shown in fig. 7, when the electronic device continues to play the subsequent playing content of the first video, the server may detect the important knowledge point "dispersion of light" in real time, with a start time of 18 minutes and an end time of 24 minutes in the first video.
When the "dispersion of light" video has been played completely, the user performs interactive feedback. By detecting the video content corresponding to the interaction position, the server can adaptively generate test questions, and it can then determine from the questions the user answered incorrectly that the user's weak knowledge point is the "characteristics of each color light". The server can find the second video corresponding to this knowledge point, namely a micro-course on the "characteristics of each color light", in the course library and embed it at a suitable subsequent position in the first video so that it becomes part of the subsequent playing content of the first video. Here, a suitable position in the first video may be the end timestamp of "dispersion of light". As shown in fig. 7, the micro-course on the "characteristics of each color light", indicated by a black inverted triangle, is embedded at the end timestamp of "dispersion of light".
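A simple sketch of how wrong answers could be mapped back to a weak knowledge point is shown below; the question-to-knowledge-point mapping is an assumed, illustrative data model rather than the claimed mechanism.

```python
from collections import Counter

def weak_knowledge_point(questions, wrong_ids):
    """questions: {question_id: knowledge_point}; wrong_ids: ids answered wrong."""
    tally = Counter(questions[qid] for qid in wrong_ids if qid in questions)
    return tally.most_common(1)[0][0] if tally else None

questions = {1: "dispersion of light",
             2: "characteristics of each color light",
             3: "characteristics of each color light"}
print(weak_knowledge_point(questions, wrong_ids=[2, 3]))
# -> "characteristics of each color light"
```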
It can be understood that, as shown in fig. 7, when the electronic device continues to play the subsequent content of the first video, the server may detect the important knowledge point "characteristics of each color light" in real time and, combining the knowledge graph, merge it into the existing knowledge navigation tree. At this point, the timestamps of the original knowledge points also change because the playing content of the first video has changed, yielding the updated knowledge navigation tree shown in fig. 9. As shown in fig. 9, the "diffraction of light" video segment, originally at 24 to 29 minutes, is shifted to 30 to 35 minutes because the 6-minute second video on the "characteristics of each color light" is embedded before it.
Optionally, when knowledge points are detected and merged into the real-time knowledge navigation tree, the server automatically establishes association relationships, such as predecessor-successor relationships, between each knowledge point and the earlier knowledge points. This helps the user find the relationship between the current learning content and the earlier learning content and understand the learning content more comprehensively and systematically. For example, as shown in fig. 9, "reflection of light" and "refraction of light" are parallel knowledge points with no predecessor-successor relationship. For another example, the node "calculation of refractive index" may be a successor node of "refraction of light", so a predecessor-successor relationship exists between "calculation of refractive index" and "refraction of light".
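The following hedged sketch illustrates how a knowledge graph lookup might attach predecessor-successor relationships when a new node is merged into the tree; the knowledge_graph mapping and function names are assumed for illustration only.

```python
knowledge_graph = {
    # knowledge point -> its predecessor in the curriculum (None if root)
    "reflection of light": None,
    "refraction of light": None,
    "calculation of refractive index": "refraction of light",
}

def link_new_node(tree_nodes, new_point):
    """Return (predecessor, successor) edges to add; empty if the node is parallel."""
    predecessor = knowledge_graph.get(new_point)
    if predecessor and predecessor in tree_nodes:
        return [(predecessor, new_point)]   # predecessor-successor relationship
    return []                               # parallel node, no such relationship

print(link_new_node(["refraction of light"], "calculation of refractive index"))
# -> [('refraction of light', 'calculation of refractive index')]
print(link_new_node(["refraction of light"], "reflection of light"))
# -> []  (parallel knowledge points)
```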
It can be understood that, during the playing of the first video, the server can detect the content of the played video in real time and, combined with knowledge graph analysis, sort and organize the knowledge navigation tree for the user. In addition, because the subsequent playing content is continuously and individually supplemented and changed along with the user's interactive feedback, the knowledge navigation tree also changes in a personalized, adaptive way as the user learns.
Optionally, after the server generates the real-time knowledge navigation tree, it may send the tree to the electronic device for display. By clicking a tree node on the real-time knowledge navigation tree, which has been updated based on the second video, the user can jump precisely to the corresponding timestamp for video viewing on the electronic device.
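A minimal, assumed sketch of the device-side jump is given below: clicking a tree node resolves its start timestamp from the payload returned by the server and seeks the player to that position. The payload layout matches the earlier sketch; the function names are hypothetical.

```python
def on_node_clicked(tree_payload, node_name, player_seek):
    """tree_payload: list of {"name", "start_min", "end_min"} dicts."""
    for node in tree_payload:
        if node["name"] == node_name:
            player_seek(node["start_min"] * 60)   # seek position in seconds
            return True
    return False

payload = [{"name": "reflection of light", "start_min": 3, "end_min": 6}]
on_node_clicked(payload, "reflection of light",
                player_seek=lambda s: print("seek to", s, "s"))
# -> seek to 180 s
```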
Alternatively, the real-time knowledge navigation tree displayed by the electronic device may include only the important knowledge points, without displaying the timestamp information of each important knowledge point.
Optionally, a deletion control may be added to the real-time knowledge navigation tree displayed by the electronic device. When the user clicks the deletion control of a tree node on the real-time knowledge navigation tree, the electronic device or the server is triggered to delete that tree node from the real-time knowledge navigation tree. Optionally, the electronic device or the server may synchronously delete the knowledge point video corresponding to the deleted tree node from the integrated video corresponding to the real-time knowledge navigation tree, so as to obtain a new integrated video.
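For illustration, a hedged sketch of the deletion path follows: removing a tree node also removes the matching time range from the integrated video's timeline and shifts later segments left. The in-memory segment list is the same assumed representation used in the earlier embedding sketch.

```python
def delete_node(nodes, name):
    """nodes: list of (name, start_min, end_min); returns the updated timeline."""
    target = next((n for n in nodes if n[0] == name), None)
    if target is None:
        return nodes
    _, t_start, t_end = target
    removed = t_end - t_start
    out = []
    for n, s, e in nodes:
        if n == name:
            continue
        if s >= t_end:                    # after the removed clip: shift left
            out.append((n, s - removed, e - removed))
        else:
            out.append((n, s, e))
    return out

nodes = [("refraction of light 1", 0, 3), ("reflection of light", 3, 6),
         ("refraction of light 2", 6, 13)]
print(delete_node(nodes, "reflection of light"))
# -> [('refraction of light 1', 0, 3), ('refraction of light 2', 3, 10)]
```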
According to the video processing method provided by the embodiment of the application, the server can determine the user's weak knowledge points from the user's interactive feedback, then search the video library for the knowledge point video most suitable for the user, and finally embed that video at a suitable subsequent position in the original video, so that the content the user needs is inserted into the subsequent playback of the original video without interrupting the user's continuous viewing. Meanwhile, while the electronic device plays the original video, the server detects the currently played content in real time, automatically analyzes it, extracts the knowledge points it contains, and can associate and link the current knowledge points with earlier knowledge points according to the knowledge structure to generate a knowledge navigation tree. Because the user keeps interacting and giving feedback, the subsequent content of the original video keeps changing, and the knowledge navigation tree generated by the application is dynamically updated, on the basis of the original video, with each interaction of the user, with the viewing progress, and with changes in the video content. Therefore, the knowledge navigation tree generated by the application is not fixed to the original video; it is dynamically updated according to user interaction and viewing progress to produce a knowledge navigation tree personalized to the user, which can also display the user's learning path.
It can be understood that, after each interaction of the user is completed, the server can embed suitable learning resources into the subsequent playback of the original course, so that the user sees the needed content resources later in the course and simply continues watching the video after the interaction, without interrupting learning to switch elsewhere. Each interaction is a personalized interaction made by the user at a specific position in the course based on the user's own state, so through multiple interactions the server can splice, on the basis of the original video, the various detailed resources the user needs into a complete and comprehensive personalized course video for the user's repeated use.
The video processing method provided by the embodiment of the application can be used for course learning on an online education platform. When a user watches a course video, the content of the original course can be dynamically adjusted and changed based on the user's interactive feedback, a knowledge navigation tree is dynamically generated as the course changes, and a complete personalized new course is finally output for the user's repeated use. Therefore, the application can generate, based on an existing course and user interaction, a personalized course with a dynamic knowledge navigation tree, which reduces the learning difficulty of the course, improves the course completion rate, and improves the user's learning effect.
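To tie the pieces together, the following compact sketch composes the steps summarized above under the assumption that the course library maps a knowledge point to a clip; it reuses the embed_clip helper from the earlier sketch, and all names and the embedding policy shown are illustrative assumptions, not the claimed implementation.

```python
def handle_interaction(timeline, play_min, weak_point, course_library):
    """timeline: list of (name, start_min, end_min);
    course_library: {knowledge_point: (clip_name, clip_len_min)}."""
    clip_name, clip_len = course_library[weak_point]
    # One simple policy: embed right after the segment currently being played.
    current = next((n for n in timeline if n[1] <= play_min < n[2]), timeline[-1])
    return embed_clip(timeline, insert_at_min=current[2],
                      clip_name=clip_name, clip_len_min=clip_len)
```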
It will be appreciated that, in order to implement the above functions, the video processing system may include corresponding hardware and/or software modules for performing each function. The present application can be implemented in hardware or in a combination of hardware and computer software, in conjunction with the example algorithm steps described in connection with the embodiments disclosed herein. Whether a function is implemented as hardware or as computer software driving hardware depends on the particular application and the design constraints of the solution. Those skilled in the art may use different approaches to implement the described functionality for each particular application in combination with the embodiments, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
This embodiment may divide the server and the electronic device in the video processing system into functional modules according to the above method example. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is merely a logical function division; another division manner may be used in actual implementation.
It should be noted that all relevant contents of the steps in the above method embodiment may be referred to in the functional descriptions of the corresponding functional modules, and details are not described here again.
Still further embodiments of the present application provide a video processing apparatus, which may be applied to the above server. The device is configured to perform the functions or steps performed by the server in the method embodiments described above.
The embodiment of the application also provides a chip system, which includes at least one processor and at least one interface circuit. The processor and the interface circuit may be interconnected by wires. The interface circuit may read instructions stored in a memory and send the instructions to the processor. When executed by the processor, the instructions may cause the video processing apparatus or the server to perform the functions or steps in the above method embodiments.
The embodiment of the application also provides a computer storage medium, which includes computer instructions that, when run on the video processing apparatus or the server, cause the video processing apparatus or the server to perform the functions or steps in the above method embodiments.
Embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to perform the functions or steps performed by the server in the method embodiments described above.
The video processing system, the video processing device, the server, the electronic device, the computer storage medium, the computer program product, or the chip provided in this embodiment are all configured to execute the corresponding methods provided above, so that the beneficial effects achieved by the video processing system, the video processing device, the server, the electronic device, the computer storage medium, the computer program product, or the chip can refer to the beneficial effects in the corresponding methods provided above, and are not described herein.
It will be apparent to those skilled in the art from this description that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a device (may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read Only Memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A video processing method, applied to a server, the method comprising:
Receiving a first operation for a first video sent by electronic equipment in the process of playing the first video by the electronic equipment;
acquiring a second video corresponding to the first operation;
embedding the second video into the unplayed segment of the first video to obtain an embedded third video;
and sending the third video to the electronic equipment, wherein the third video is used by the electronic equipment to play the second video when the playing progress of the first video reaches the embedding position of the second video.
2. The method of claim 1, wherein the embedding the second video in the unplayed segments of the first video comprises:
Determining an embedding position of the second video in the unplayed segment of the first video;
and embedding the second video into the unplayed segment according to the embedding position.
3. The method of claim 2, wherein the determining the embedding position of the second video in the unplayed segment of the first video comprises:
acquiring an unplayed segment of the first video;
And determining the embedding position of the second video in the unplayed segment according to the comparison result of the unplayed segment and the second video.
4. The method of claim 3, wherein the unplayed segment comprises a plurality of sub-segments, and wherein the determining the embedding position of the second video in the unplayed segment according to the comparison result of the unplayed segment and the second video comprises:
Comparing the plurality of sub-segments with the second video respectively, and obtaining target sub-segments with comparison results meeting preset conditions from the plurality of sub-segments;
and determining the embedding position of the second video in the unplayed segment according to the playing time point of the target sub-segment in the first video.
5. The method according to claim 4, wherein the method further comprises:
dividing the unplayed segment into the plurality of sub-segments according to text content in the unplayed segment; and/or
detecting image content in the unplayed segment, and dividing the unplayed segment into the plurality of sub-segments.
6. The method of claim 4, wherein comparing the plurality of sub-segments with the second video, respectively, and obtaining a target sub-segment from the plurality of sub-segments, where a comparison result meets a preset condition, includes:
Respectively acquiring the similarity between the plurality of sub-segments and the second video;
and obtaining, from the plurality of sub-segments, target sub-segments whose similarity is greater than a preset value.
7. The method of claim 6, wherein there are a plurality of target sub-segments, the method further comprising:
and acquiring the target sub-segment with the nearest playing time point from the target sub-segments.
8. The method according to any one of claims 1-7, further comprising:
In the process of playing a first video by electronic equipment, generating a knowledge navigation tree corresponding to the first video according to the current playing content of the first video, wherein the knowledge navigation tree comprises at least one tree node, and each tree node corresponds to a segment in the first video;
when the playing progress of the first video reaches the embedded position of the second video, updating the knowledge navigation tree according to the current playing content of the second video, wherein the updated knowledge navigation tree comprises a target tree node, and the target tree node corresponds to the second video.
9. The method of claim 8, wherein generating the knowledge navigation tree corresponding to the first video according to the current playing content of the first video comprises:
Determining a start time stamp and an end time stamp of a knowledge point corresponding to the current playing content according to the current playing content of the first video;
and generating a knowledge navigation tree corresponding to the first video according to the start time stamp and the end time stamp of the knowledge point.
10. The method of claim 8, wherein the method further comprises:
Receiving a deleting operation, sent by the electronic equipment, for the target tree node in the updated knowledge navigation tree;
And deleting the second video corresponding to the target tree node from the third video.
11. The method of claim 8, wherein the method further comprises:
Receiving clicking operation, sent by the electronic equipment, for the target tree node in the updated knowledge navigation tree;
And sending the second video corresponding to the target tree node in the third video to the electronic equipment so as to control the electronic equipment to play the second video.
12. The method according to any one of claims 1-11, further comprising:
after the electronic equipment finishes playing the first video, matching the second video with the first video;
Determining the embedding position of the second video in the first video again according to the matching result of the second video and the first video;
and re-embedding the second video into the first video according to the re-determined embedding position to obtain a re-embedded fourth video.
13. The method of any of claims 1-12, wherein the acquiring a second video corresponding to the first operation comprises:
acquiring the current playing content of the first video corresponding to the first operation;
and acquiring a second video corresponding to the current playing content.
14. A server, wherein the server comprises a memory and one or more processors; the memory is coupled to the processor; the memory is for storing computer program code comprising computer instructions which, when executed by the processor, cause the server to perform the method of any of claims 1-13.
15. A computer storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any one of claims 1-13.
16. A computer program product, characterized in that the computer program product, when run on a computer, causes the computer to perform the method according to any of claims 1-13.
CN202211736117.7A — Video processing method and server — priority date 2022-12-31, filing date 2022-12-31, status: Pending

Priority Applications (1)

CN202211736117.7A — priority date 2022-12-31, filing date 2022-12-31 — Video processing method and server

Publications (1)

CN118283290A — publication date 2024-07-02

Family

ID=91643777

Country Status (1)

CN — CN118283290A (en)


Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination