CN112911332B - Method, apparatus, device and storage medium for editing video from live video stream - Google Patents

Method, apparatus, device and storage medium for editing video from live video stream

Info

Publication number
CN112911332B
Authority
CN
China
Prior art keywords
video
segment
clip
predetermined
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011591386.XA
Other languages
Chinese (zh)
Other versions
CN112911332A (en)
Inventor
李晨曦
庞磊
王珊
***
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011591386.XA priority Critical patent/CN112911332B/en
Publication of CN112911332A publication Critical patent/CN112911332A/en
Application granted granted Critical
Publication of CN112911332B publication Critical patent/CN112911332B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H04N21/2335 Processing of audio elementary streams involving reformatting operations of audio signals, e.g. by converting from one coding standard to another
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234345 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4398 Processing of audio elementary streams involving reformatting operations of audio signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present disclosure provides a method, apparatus, device, and storage medium for editing video from a live video stream, relating to the field of artificial intelligence. A method for editing video from a live video stream comprises: acquiring a plurality of video segments of a first predetermined time length from a video stream; determining whether the plurality of video segments belong to a predetermined classification; and in response to determining that a first video segment of the plurality of video segments belongs to the predetermined classification, extracting a video sub-segment of a second predetermined time length from the first video segment as at least a portion of a video clip segment, wherein the second predetermined time length is less than the first predetermined time length. Aspects of the present disclosure enable classification of, for example, live video streams and classification-based automatic video clipping.

Description

Method, apparatus, device and storage medium for editing video from live video stream
Technical Field
The present disclosure relates generally to the field of artificial intelligence, and more particularly, to methods, apparatuses, devices, and storage media for editing video from live video streams.
Background
With the development of technology, videos, and live videos in particular, are widely used for information dissemination, product sales, social activities, and the like. In general, live video is broadcast to network users over the Internet using streaming media technology, combining images, sound, and other elements. Users can watch live video on various clients, such as PCs and mobile phones, through the Internet. During a live broadcast, highlight segments, such as talent performances, may appear in the network anchor's interactions with network users, and it is often desirable to extract such highlight segments from the live video.
In the traditional video clipping workflow, the anchor must record the video while broadcasting live, then manually browse and play back the recording after the broadcast ends, search for the starting time of each highlight segment, and clip it out with a video editing tool. This workflow is time-consuming and inefficient. An improved method of clipping live video is therefore desirable.
Disclosure of Invention
According to example embodiments of the present disclosure, a method, apparatus, device, and storage medium for editing video from a video stream are presented that can mitigate or eliminate one or more of the technical problems mentioned above.
In a first aspect of the present disclosure, a method for editing video from a live video stream is provided. The method comprises: acquiring a plurality of video segments of a first predetermined time length from a video stream; determining whether the plurality of video segments belong to a predetermined classification; and in response to determining that a first video segment of the plurality of video segments belongs to the predetermined classification, extracting a video sub-segment of a second predetermined time length from the first video segment as at least a portion of a video clip segment, wherein the second predetermined time length is less than the first predetermined time length.
In a second aspect of the present disclosure, an apparatus for editing video from a live video stream is provided. The apparatus comprises: a video clip acquisition module configured to acquire a plurality of video segments of a first predetermined time length from a video stream; a classification module configured to determine whether the plurality of video segments belong to a predetermined classification; and an extraction module configured to: in response to determining that a first video segment of the plurality of video segments belongs to the predetermined classification, extract a video sub-segment of a second predetermined time length from the first video segment as at least a portion of a video clip segment, wherein the second predetermined time length is less than the first predetermined time length.
In a third aspect of the present disclosure, an electronic device is provided. An electronic device includes: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect described above.
In a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method according to the foregoing first aspect.
In a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to the first aspect described above.
According to the method, apparatus, device, and storage medium for editing video from a live video stream of the present disclosure, classification of the video stream and classification-based automatic video clipping can be realized.
It should be understood that this summary is not intended to identify key or essential features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference numerals designate like or similar elements:
FIG. 1 illustrates a schematic diagram of an example environment in which embodiments of the present disclosure may be implemented;
FIG. 2 illustrates a flowchart of a method for editing video from a live video stream, according to some embodiments of the present disclosure;
FIG. 3 illustrates a flowchart of a method for extracting video sub-segments from a video segment, according to some embodiments of the present disclosure;
FIG. 4 illustrates a flowchart for adjusting the clip starting point based on an extracted video sub-segment, according to some embodiments of the present disclosure;
FIG. 5 illustrates a flowchart for adjusting the clip ending point based on an extracted video sub-segment, according to some embodiments of the present disclosure;
FIG. 6 illustrates a schematic block diagram of an apparatus for editing video from a live video stream, according to some embodiments of the present disclosure;
FIG. 7 illustrates a schematic block diagram of an extraction module of an apparatus for editing video from a live video stream, according to some embodiments of the present disclosure;
FIG. 8 illustrates a schematic block diagram of a sharpness determination module of an apparatus for editing video from a live video stream, according to some embodiments of the present disclosure; and
FIG. 9 illustrates a block diagram of a device capable of implementing various embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
In describing embodiments of the present disclosure, the term "comprising" and its variants are to be read as open-ended, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
As described above, with the rapid development of the mobile Internet, consumption patterns are gradually changing, demand for entertainment and culture is growing, and live broadcasting has become a popular form of entertainment. Live video now provides one of the main sources of traffic for network platforms. During a live broadcast, the anchor typically interacts with network users, and these interactions often contain highlight talent segments, which may include, as non-limiting examples, singing, dancing, and instrumental performance; in other embodiments, such performances may also include talk shows and the like. It is desirable to clip these highlight segments from the live video stream. Embodiments of the present disclosure enable classification of live video streams and automatic video clipping based on the identified classification. Although embodiments of the present disclosure illustrate the video clipping method with a live video stream as an example, it should be understood that this is merely exemplary; the video clipping method according to the present disclosure can also be used for editing non-live video.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
FIG. 1 illustrates a schematic diagram of an example environment 100 in which various embodiments of the present disclosure may be implemented. The example environment 100 includes one or more network anchor terminals 110, a server side 120, and one or more network clients 130. The network anchor terminal 110 is an intelligent device with Internet access capability, typically equipped with an operating system, a video device, and an audio device, and adapted to capture the video images and audio of the network anchor. The network anchor terminal 110 may have video live software installed, such as a computer program or mobile application (App) suitable for live broadcasting, and the network anchor may broadcast by starting this software. A typical network anchor terminal 110 may be a mobile intelligent terminal, a notebook computer, a desktop computer, a tablet, or another device providing computing power together with data acquisition and communication capabilities.
The server side 120 may include servers, network nodes such as mainframe servers, cloud computing devices such as virtual machines (VMs), and any other devices providing computing power. In a cloud environment, the server side is sometimes also called a remote server, a cloud device, or a cloud control platform. The server side 120 may be configured to process the collected live video and deliver the live broadcast to users.
The network client 130 is an intelligent device with Internet access capability adapted to receive the video and audio of a network anchor; it may be a mobile intelligent terminal, a notebook computer, a desktop computer, a tablet, a smart TV, or another device providing computing power and data communication capability. In some embodiments, the network client 130 may have video live software installed, such as a computer program or mobile application App, and a network user may watch live video by starting that software on the network client 130. In some embodiments, the network client 130 may include various input/output devices, such as a keyboard, mouse, stylus, video input device, and audio input device, to exchange text, audio, video, and the like with the network anchor.
In the example environment 100, the network anchor terminal 110 may capture the anchor's live video images and live audio signals and transmit the captured live signals to the server side 120. The server side 120 may be configured to send the anchor's live signal to the network clients 130. The network user at the network client 130 can then watch the anchor's live broadcast through the video live software.
It should be noted that although the network anchor terminal 110 and the network client 130 are depicted as separate functional units in the illustrated example environment 100, this is merely exemplary; a network anchor terminal 110 may also serve as a network client 130, and vice versa.
Methods for editing video according to some embodiments of the present disclosure are described in detail below in conjunction with fig. 2-5.
Fig. 2 illustrates a flowchart of a method 200 for editing video according to some embodiments of the present disclosure. The method 200 may be implemented at the server-side 120 of fig. 1. In other embodiments, the method 200 may be implemented at the network anchor 110 of fig. 1. In other embodiments, the method 200 may be implemented at the network client 130 of fig. 1, for example, in the case of obtaining network anchor authorization. It should be understood that although shown in a particular order, some steps in method 200 may be performed in a different order than shown or in parallel. Embodiments of the disclosure are not limited in this respect.
At 202, a plurality of video segments of a first predetermined time length are acquired from a video stream. At 204, it is determined whether the plurality of video segments belong to a predetermined classification. At 206, in the event that a first video segment of the plurality of video segments is determined to belong to the predetermined classification, a video sub-segment of a second predetermined time length is extracted from the first video segment as at least a portion of a video clip segment. The second predetermined time length is less than the first predetermined time length.
A video stream may contain a wide variety of live material, as well as non-live material, which makes it difficult to clip video segments from the stream directly. According to the method 200 of an embodiment of the present disclosure, a plurality of video segments of a first predetermined time length are acquired from the video stream. In some embodiments, the first predetermined time length may be set by a user in software; in other embodiments, it may be set automatically by software, and it may be fixed or user-adjustable. In some embodiments, the first predetermined time length may be 45 s; in other embodiments, it may be 30 s, 40 s, 50 s, 60 s, 120 s, 360 s, and so on. In still other embodiments, the first predetermined time length may be determined according to the classification intended for clipping. For example, in some embodiments, dance and instrumental performance videos are clipped into 15 s segments, while singing performances may be clipped into 25 s segments. It should be understood that these values are examples only, and any other suitable time length may be used.
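As an illustration of this acquisition step, the following Python sketch cuts an incoming stream into consecutive fixed-length files using the ffmpeg segment muxer. This is a minimal sketch of one possible implementation, not the patented method itself; it assumes the ffmpeg CLI is installed, and the stream URL and output pattern are placeholders. The 45 s default mirrors the example above.

```python
import subprocess

def segment_stream(stream_url: str, out_pattern: str = "clip_%05d.mp4",
                   segment_seconds: int = 45) -> None:
    """Cut an incoming stream into consecutive fixed-length files."""
    subprocess.run(
        [
            "ffmpeg",
            "-i", stream_url,          # e.g. an RTMP/HLS live URL
            "-c", "copy",              # remux without re-encoding (cuts land on keyframes)
            "-f", "segment",           # ffmpeg's segment muxer
            "-segment_time", str(segment_seconds),
            "-reset_timestamps", "1",  # each output file starts at t=0
            out_pattern,
        ],
        check=True,
    )
```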
For a first video segment of the acquired plurality of video segments, the video classification of the first video segment is identified. The classification may be a predetermined video classification. In some embodiments, the predetermined classification may include a talent performance class, which may include singing, dancing, and/or instrumental performance. It should be understood that this is merely exemplary, and other classification types may be included. In some embodiments, a classification model may be built for each classification, and whether the first video segment belongs to a predetermined classification can be determined based on the model. The classification model may be implemented in various ways, for example with a two-stream network, image feature aggregation, or convolutional methods; a detailed description is omitted here, since the internals of the classification model are not the focus of the present disclosure. It should be understood that other classification methods may also be employed.
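To make the classification step concrete, the sketch below scores sampled frames with some pretrained classifier and pools the scores into a segment-level probability. The `model` callable, the per-frame scoring, and the average pooling are illustrative assumptions; the patent does not fix a particular model architecture.

```python
import numpy as np

def segment_class_score(frames, model) -> float:
    """Pooled probability that the sampled frames show the target class
    (e.g. a talent performance such as singing or dancing)."""
    scores = [model(frame) for frame in frames]  # per-frame probability in [0, 1]
    return float(np.mean(scores))                # simple average pooling

def belongs_to_classification(frames, model, threshold: float = 0.5) -> bool:
    """Decision step: compare the pooled score against a classification threshold."""
    return segment_class_score(frames, model) >= threshold
```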
If the first video segment is determined to belong to a predetermined classification, a video sub-segment of a second predetermined time length is extracted from the first video segment as at least a portion of a video clip segment. In this way, the first video segment belonging to the predetermined classification can be clipped further to extract a video clip segment that meets the time requirement. The extraction can be implemented in various ways. In some embodiments, the video sub-segment may be extracted according to the relationship between the second predetermined time length and the first predetermined time length. In some embodiments, the video sub-segment may be extracted with its clip starting time placed a predetermined time after the starting time of the first video segment.
According to the method 200 for editing video from a live video stream of an embodiment of the present disclosure, a classification model is used to classify the video stream: video segments acquired in real time are classified, and video clip segments are extracted based on the classification, so that segments not matching the predetermined classification need not be clipped at all. Classification of videos and automatic video clipping can thus be carried out efficiently, saving manpower, storage space, and processing resources. Because only videos that meet the requirements are clipped, clipping efficiency is improved. The generated clip videos (i.e., short videos) can then be reused, e.g., to drive traffic to the anchor.
Fig. 3 illustrates a flowchart of a method 300 for extracting video sub-segments from a video segment according to some embodiments of the present disclosure. In method 300, at 302, a plurality of video sub-segments of a second predetermined time length are acquired from a first video segment. At 304, it is determined whether the plurality of video sub-segments belong to a predetermined classification. At 306, in response to determining that a video sub-segment belongs to the predetermined classification, the video sub-segment is extracted as a video clip segment.
Acquiring a plurality of video sub-segments of the second predetermined time length from the first video segment may be implemented in various ways. In some embodiments, the first video segment is divided equally, in time sequence, into a plurality of video sub-segments of the second predetermined time length. Depending on the magnitude of the second predetermined time length, the sub-segments may or may not overlap each other in time.
According to the method 300 of the embodiment of the disclosure, because a plurality of video sub-segments of the second predetermined time length are acquired from the first video segment and then classified a second time, the precision of video clipping can be further improved.
In some embodiments, a video sub-segment determined to belong to the predetermined classification may be extracted as a video clip segment; in other embodiments, multiple video sub-segments belonging to the predetermined classification may be combined into one video clip segment.
In some embodiments, determining whether the first video segment belongs to the predetermined classification is performed based on a first classification threshold, so that first video segments can be screened initially. Determining whether the plurality of video sub-segments belong to the predetermined classification is performed based on a second classification threshold, where the first classification threshold is less than the second classification threshold, so that the video sub-segments are further refined. In this case, the sub-segments used as the video clip segment are classified with higher precision, further improving the accuracy of the clip.
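A possible reading of this coarse-to-fine screening is sketched below, reusing `segment_class_score` from the earlier sketch: a lenient first threshold admits candidate segments, and a stricter second threshold selects the sub-segments kept as the clip. The threshold values and the non-overlapping window are illustrative assumptions, not values from the patent.

```python
FIRST_THRESHOLD = 0.5    # lenient: initial screening of whole video segments
SECOND_THRESHOLD = 0.8   # strict: refinement of the candidate sub-segments

def extract_clip_subsegments(segment_frames, model, fps: float = 25.0,
                             sub_seconds: float = 15.0):
    """Divide an admitted first video segment into non-overlapping sub-segment
    windows and keep those whose score clears the stricter second threshold."""
    win = int(sub_seconds * fps)
    kept = []
    for start in range(0, len(segment_frames) - win + 1, win):
        sub = segment_frames[start:start + win]
        if segment_class_score(sub, model) >= SECOND_THRESHOLD:
            kept.append((start / fps, sub))  # (offset within segment in seconds, frames)
    return kept
```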
In some embodiments, the method 200 for editing video may further include determining the sharpness of a video segment of the plurality of video segments, wherein determining whether the plurality of video segments belong to a predetermined classification is performed for video segments whose sharpness is above a predetermined sharpness threshold. A video editing method executed by a processor can control video quality more precisely than manual editing, and it is desirable that video clipped from a video stream be of acceptable sharpness. In general, because the sharpness of live video depends on the network conditions of the network anchor, the sharpness of the acquired video is often inconsistent, and low-sharpness video degrades the viewing experience. To ensure video quality, the video segments are therefore filtered by sharpness, and classification is performed only for segments whose sharpness is above the predetermined sharpness threshold. In this way, fine clipping is applied to high-sharpness video, improving the quality of the clipped video; on the other hand, the processing load on the processor is reduced.
In some embodiments, determining the sharpness of a video segment of the plurality of video segments may include: acquiring at least one image frame from the video segment at a predetermined period; and determining the sharpness of the video segment based on the sharpness of the acquired at least one image frame. In some embodiments, the video segment may be sampled, for example, once per second (other suitable frequencies, such as once per 0.5 s or once per 2 s, may also be used). For each sampled image frame, sharpness is determined from the frame itself. In this way, sharpness can be determined readily without consuming excessive processing resources. It should be appreciated that this sharpness determination method is merely illustrative, and those skilled in the art may employ other methods to determine the sharpness of a video segment.
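One common no-reference sharpness heuristic, shown below under the assumption that OpenCV is available, is the variance of the Laplacian of a grayscale frame; the sketch samples roughly one frame per second as described above. The metric choice is an assumption for illustration, since the patent does not mandate a specific sharpness measure.

```python
import cv2

def segment_sharpness(path: str, sample_every_s: float = 1.0) -> float:
    """Average Laplacian-variance sharpness over frames sampled at a fixed period."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(round(fps * sample_every_s)))
    scores, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            scores.append(cv2.Laplacian(gray, cv2.CV_64F).var())
        idx += 1
    cap.release()
    return sum(scores) / len(scores) if scores else 0.0
```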
Fig. 4 illustrates a flowchart for adjusting the clip starting point based on an extracted video sub-segment, according to some embodiments of the present disclosure. In the method 400 shown in fig. 4, at 402, the starting time of the video sub-segment in the first video segment is determined. At 404, a sound start endpoint in the first video segment is determined. At 406, the video clip segment is extracted from the first video segment with the sound start endpoint closest to the starting time as the clip starting point of the video clip segment.
This method can further improve the integrity of the video clip segment. When processing multi-modal video, where a video segment includes both image frames and sound, it can also improve the consistency between the image track and the audio track of the clip. Taking a singing video as an example, the starting time of the video sub-segment in the first video segment may be determined, along with the sound start endpoints at which sound appears in the first video segment. By making the clip begin at a sound start endpoint, the clipped video is more complete; for example, its starting time may fall exactly at the beginning of a line of lyrics. In some embodiments, all sound start endpoints in the first video segment may be determined, and the one nearest to the starting time is used as the clip starting point of the video clip segment. In other embodiments, all sound start endpoints in the video sub-segment may be determined, and the one nearest to the starting time is used as the clip starting point. It should be understood that this is merely exemplary; for example, an appropriate sound start endpoint may be chosen as the clip starting point according to the desired time length of the clipped video segment.
Fig. 5 illustrates a flowchart for adjusting the clip ending point based on an extracted video sub-segment, according to some embodiments of the present disclosure. In the method 500 shown in fig. 5, at 502, the ending time of the video sub-segment in the first video segment is determined. At 504, a sound ending endpoint in the first video segment is determined. At 506, the video clip segment is extracted from the first video segment with the sound ending endpoint nearest to the ending time as the clip ending point of the video clip segment.
This method can likewise further improve the integrity of the video clip segment, and, when processing multi-modal video, improve the consistency between the image track and the audio track. Taking a singing video as an example, the ending time of the video sub-segment in the first video segment may be determined, together with the sound ending endpoints at which a sound ends in the first video segment. By making the clip end at a sound ending endpoint, the clipped video is more complete; for example, its ending time may fall exactly at the end of a line of lyrics. In some embodiments, all sound ending endpoints in the first video segment may be determined, and the one nearest to the ending time is used as the clip ending point of the video clip segment. In other embodiments, all sound ending endpoints in the video sub-segment may be determined, and the one nearest to the ending time is used as the clip ending point. It should be understood that this is merely exemplary; for example, an appropriate sound ending endpoint may be chosen as the clip ending point according to the desired time length of the clipped video segment.
Although the embodiments of fig. 4 and fig. 5 are described as separate implementations, it should be understood that they may be combined. The combined embodiment can further improve the integrity of the video: the clipped segment may begin exactly at the start of a sound (e.g., lyrics or an instrumental sound) and, likewise, end exactly at the end of one.
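The boundary snapping of methods 400 and 500 can be sketched with a simple short-time energy detector: frames whose RMS energy crosses a threshold mark the sound start/end endpoints, and the endpoint nearest the sub-segment boundary is chosen. The energy threshold and frame length below are illustrative assumptions; a production system might use a proper voice activity detector instead.

```python
import numpy as np

def sound_endpoints(audio: np.ndarray, sr: int, frame_ms: int = 20,
                    energy_threshold: float = 0.02):
    """Return (starts, ends) in seconds: times where the audio crosses out of
    and back into silence, detected via short-time RMS energy."""
    hop = int(sr * frame_ms / 1000)
    usable = audio[: len(audio) // hop * hop].reshape(-1, hop)
    active = np.sqrt((usable ** 2).mean(axis=1)) > energy_threshold
    edges = np.diff(active.astype(int))
    starts = (np.where(edges == 1)[0] + 1) * frame_ms / 1000.0   # silence -> sound
    ends = (np.where(edges == -1)[0] + 1) * frame_ms / 1000.0    # sound -> silence
    return starts, ends

def snap(t: float, endpoints: np.ndarray) -> float:
    """Pick the endpoint nearest to time t; fall back to t if none was found."""
    return float(endpoints[np.argmin(np.abs(endpoints - t))]) if len(endpoints) else t
```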
In some embodiments, the method 200 may further include adding, for the video clip segment, a segment header and/or a segment trailer that matches the video clip segment. Adding a segment header to the video clip segment provides prompt information to network users and guides them to click and play; adding a segment trailer supplements and enriches the content of the video clip segment. Because segment headers and segment trailers are produced in a similar manner, the following description takes the production of a segment header as an example; the described method applies analogously to segment trailers. In some embodiments, the segment header may include text, speech, and/or pictures. In some embodiments, video effects may also be added to the video clip segment to make the video more entertaining.
In some embodiments, adding a segment header and/or segment trailer may include: acquiring an image matching the size of the image frames in the video clip segment; creating image frames of the segment header and/or trailer based on the image and text in a predetermined format; converting the text into a speech segment; and generating the segment header and/or trailer video based on those image frames and the speech segment.
When a segment header is produced, the size of the image frames in the video clip segment is obtained in advance, and a header image corresponding to that size is acquired. In some embodiments, the header image may be extracted from the video clip segment itself; for example, a high-quality image selected during sharpness determination may be used as the header image. In other embodiments, a pre-provided image may be used.
In some embodiments, text is added to the header image in a predetermined format, and the segment header and/or trailer video is generated from the corresponding image frames and the speech segment. For example, the text may be converted into a speech segment, and the video frames of the segment header may be produced based on the frame rate of the video segment and the duration of the speech segment. In some embodiments, a music clip may also be added to the segment header.
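A hedged sketch of this header construction follows: render the caption onto an image sized like the clip's frames, and repeat that image for frame rate × speech duration frames, so the header lasts exactly as long as the voice-over. `synthesize_speech` in the usage comment is a hypothetical TTS helper, and the colors and text position are arbitrary choices, none of which come from the patent.

```python
import math
from PIL import Image, ImageDraw

def make_header_frames(width: int, height: int, caption: str,
                       fps: float, speech_duration_s: float):
    """Render the caption onto a title-card image sized like the clip's frames
    and repeat it so the header lasts exactly as long as the voice-over."""
    image = Image.new("RGB", (width, height), color=(16, 16, 16))
    draw = ImageDraw.Draw(image)
    draw.text((width // 10, height // 2), caption, fill=(255, 255, 255))
    n_frames = math.ceil(fps * speech_duration_s)  # frame rate x speech duration
    return [image] * n_frames

# Hypothetical usage -- synthesize_speech is an assumed TTS helper:
# audio_path, duration_s = synthesize_speech("Anchor X sings tonight!")
# frames = make_header_frames(1280, 720, "Anchor X sings tonight!", 25.0, duration_s)
```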
In some embodiments, the method 200 may also include matching the clipped video segment to the format of the segment header and/or trailer. In some embodiments, the method 200 may further include: converting the video clip segment into image frames; generating, based on those image frames, a video segment consistent with the segment header and/or trailer format; and merging the generated video segment with the segment header and/or trailer. This ensures that the video clip segment is consistent with the video format of the header/trailer.
In some embodiments, the method 200 may further include automatically publishing the produced clip videos. Generated video clips usually need to be published on a video platform for network users to view, and the clips generated each day may number in the thousands, involving a large number of network anchors. In some embodiments, the produced clip videos may be published automatically. This reduces manual operation costs and enables near-real-time release of highlight videos: a related clip can be published while, for example, the network anchor is still live, attracting users to click and improving live-broadcast penetration. The automatic publishing function thus makes the short videos more effective at driving traffic to the live room.
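Pulling the preceding sketches together, the outline below mirrors the overall flow of methods 200 through 500 up to publication. `load_frames`, `load_audio`, `cut`, and `publish` are assumed helpers (not APIs defined by the patent), and the sharpness cutoff and 15 s sub-segment length are illustrative.

```python
SHARPNESS_THRESHOLD = 100.0  # illustrative Laplacian-variance cutoff

def process_segment(path: str, model, sr: int = 16000, fps: float = 25.0):
    """End-to-end handling of one acquired segment, mirroring methods 200-500."""
    if segment_sharpness(path) < SHARPNESS_THRESHOLD:   # sharpness filter
        return
    frames = load_frames(path)                          # assumed decode helper
    if segment_class_score(frames, model) < FIRST_THRESHOLD:
        return                                          # initial screening
    audio = load_audio(path, sr)                        # assumed audio helper
    starts, ends = sound_endpoints(audio, sr)
    for offset, _sub in extract_clip_subsegments(frames, model, fps):
        t0 = snap(offset, starts)                       # align start with sound onset
        t1 = snap(offset + 15.0, ends)                  # align end with sound offset
        publish(cut(path, t0, t1))                      # assumed clip-and-publish helpers
```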
Fig. 6 illustrates a schematic block diagram of an apparatus 600 for editing video according to some embodiments of the present disclosure. The apparatus 600 for editing video may include: a video clip acquisition module 610, a classification module 620, and an extraction module 630.
The video clip acquisition module 610 may be configured to acquire a plurality of video segments of a first predetermined time length from the video stream. The classification module 620 may be configured to determine whether the plurality of video segments belong to a predetermined classification. The extraction module 630 may be configured to: in response to determining that a first video segment of the plurality of video segments belongs to the predetermined classification, extract a video sub-segment of a second predetermined time length from the first video segment as at least a portion of a video clip segment, wherein the second predetermined time length is less than the first predetermined time length.
Fig. 7 illustrates a schematic block diagram of an extraction module 700 of an apparatus 600 for editing video according to some embodiments of the present disclosure. In the illustrated embodiment, the extraction module 700 may include: video sub-segment acquisition module 710, video sub-segment classification module 720, and video sub-segment extraction module 730.
The video sub-segment acquisition module 710 may be configured to acquire a plurality of video sub-segments of the second predetermined time length from the first video segment. The video sub-segment classification module 720 may be configured to determine whether the plurality of video sub-segments belong to the predetermined classification. The video sub-segment extraction module 730 may be configured to extract a video sub-segment as a video clip segment in response to determining that the video sub-segment belongs to the predetermined classification.
In some embodiments, the classification module 620 may be configured to determine whether the plurality of video segments belong to a predetermined classification based on a first classification threshold, and the video sub-segment classification module 720 is configured to determine whether the plurality of video sub-segments belong to the predetermined classification based on a second classification threshold, wherein the first classification threshold is less than the second classification threshold.
In some embodiments, as shown in fig. 6, the apparatus 600 for editing video may further comprise a sharpness determination module 640 configured to determine sharpness of a video segment of the plurality of video segments, wherein the classification module 620 is configured to perform classification for video segments having sharpness above a predetermined sharpness threshold.
In some embodiments, as shown in fig. 8, the sharpness determination module 640 may include an image frame acquisition module 810 and an image sharpness determination module 820. The image frame acquisition module 810 may be configured to acquire at least one image frame from a video clip at a predetermined period. The image sharpness determination module 820 may be configured to determine sharpness of the video segment based on the sharpness of the acquired at least one image frame.
In some embodiments, the extraction module 630 may be further configured to: determine, for the video sub-segment, a starting time of the video sub-segment in the first video segment; determine a sound start endpoint in the first video segment; and extract the video clip segment from the first video segment with the sound start endpoint nearest to the starting time as the clip starting point of the video clip segment.
In some embodiments, the extraction module 630 may be further configured to: determine, for the video sub-segment, an ending time of the video sub-segment in the first video segment; determine a sound ending endpoint in the first video segment; and extract the video clip segment from the first video segment with the sound ending endpoint nearest to the ending time as the clip ending point of the video clip segment.
In some embodiments, as shown in fig. 6, the apparatus 600 for editing video may further comprise a segment header and/or segment trailer addition module 650 configured to add, for the video clip segment, a segment header and/or segment trailer that matches the video clip segment.
In some embodiments, the segment header and/or segment trailer addition module 650 may be configured to: acquire an image matching the size of the image frames in the video clip segment; create image frames of the segment header and/or trailer based on the image and text in a predetermined format; convert the text into a speech segment; and generate the segment header and/or trailer video based on those image frames and the speech segment.
In some embodiments, as shown in fig. 6, the apparatus 600 for editing video may further comprise a format conversion and publishing module 660, which may be configured to: convert the video clip segment into image frames; generate, based on the image frames, a video segment consistent with the segment header and/or trailer format; merge the generated video segment with the segment header and/or trailer; and automatically publish the merged video. Although the format conversion and publishing functions are integrated in one functional module in the illustrated embodiment, it should be understood that this is merely exemplary, and they may also be implemented as separate functional modules.
According to embodiments of the present application, there is also provided an electronic device, a computer-readable storage medium, and a computer program product.
As shown in fig. 9, a block diagram of an electronic device for the method of editing video from a live video stream according to an embodiment of the present application is illustrated. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
Fig. 9 illustrates a block diagram of a computing device 900 capable of implementing various embodiments of the disclosure. As shown, the device 900 includes a Central Processing Unit (CPU) 901, which can perform various suitable actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 902 or loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The CPU 901, ROM 902, and RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The processing unit 901 performs the various methods and processes described above, such as the methods 200, 300, 400, 500. For example, in some embodiments, the methods 200, 300, 400, 500 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the CPU 901, one or more steps of the methods 200, 300, 400, 500 described above may be performed. Alternatively, in other embodiments, the CPU 901 may be configured to perform the methods 200, 300, 400, 500 by any other suitable means (e.g., by means of firmware).
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and so forth.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that, when executed by the processor or controller, it causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (22)

1. A method for editing video from a live video stream, comprising:
acquiring, based on time, a plurality of video segments of a first predetermined time length from the live video stream;
determining whether the plurality of video segments belong to a predetermined classification; and
extracting video sub-segments of a second predetermined time length from a first video segment of the plurality of video segments as at least a portion of a video clip segment in response to determining that the first video segment belongs to the predetermined classification, wherein the second predetermined time length is less than the first predetermined time length;
wherein extracting video sub-segments of a second predetermined length of time from the first video segment comprises:
acquiring, in time sequence, a plurality of video sub-segments of the second predetermined time length from the first video segment;
determining whether the plurality of video sub-segments belong to the predetermined classification; and
in response to determining that a video sub-segment belongs to the predetermined classification, extracting the video sub-segment as the video clip segment.
2. The method of claim 1, wherein determining whether the first video segment belongs to the predetermined classification is performed based on a first classification threshold; and
determining whether the plurality of video sub-segments belong to the predetermined classification is performed based on a second classification threshold;
wherein the first classification threshold is less than the second classification threshold.
3. The method of claim 1, further comprising:
determining the sharpness of a video segment of the plurality of video segments,
wherein determining whether the plurality of video segments belong to a predetermined classification is performed for video segments having the sharpness above a predetermined sharpness threshold.
4. The method of claim 3, wherein determining sharpness of a video segment of the plurality of video segments comprises:
acquiring at least one image frame from the video segment at a predetermined period; and
the sharpness of the video segment is determined based on the sharpness of the acquired at least one image frame.
5. The method of any of claims 1-4, wherein the predetermined classification includes a talent performance class, wherein the talent performance class includes at least one of singing, dancing, and musical instrument performance.
6. The method of claim 5, further comprising:
determining, for the video sub-segment, a starting time of the video sub-segment in the first video segment;
determining a sound start endpoint in the first video segment; and
extracting the video clip segment from the first video segment with the sound start endpoint nearest to the starting time as a clip starting point of the video clip segment.
7. The method of claim 5, further comprising:
determining, for the video sub-segment, an end time of the video sub-segment in the first video segment;
determining a sound ending endpoint in the first video segment; and
extracting the video clip segment from the first video segment with the sound ending endpoint nearest to the end time as a clip ending point of the video clip segment.
8. The method of any one of claims 1-4, 6, and 7, further comprising:
for the video clip segment, adding a segment header and/or a segment trailer that matches the video clip segment.
9. The method of claim 8, wherein adding a segment header and/or segment trailer comprises:
acquiring an image matching the size of an image frame in the video clip segment;
creating image frames of the segment header and/or segment trailer based on the image and text in a predetermined format;
converting the text into a speech segment; and
generating the segment header and/or segment trailer video based on the image frames of the segment header and/or segment trailer and the speech segment.
10. The method of claim 8, further comprising:
converting the video clip segment into image frames;
generating, based on the image frames, a video segment consistent with the segment header and/or segment trailer format;
merging the generated video segment with the segment header and/or segment trailer; and
automatically publishing the merged video.
11. An apparatus for editing video from a live video stream, comprising:
a video clip acquisition module configured to acquire, based on time, a plurality of video segments of a first predetermined time length from the live video stream;
a classification module configured to determine whether the plurality of video segments belong to a predetermined classification; and
an extraction module configured to: extract video sub-segments of a second predetermined time length from a first video segment of the plurality of video segments as at least a portion of a video clip segment in response to determining that the first video segment belongs to the predetermined classification, wherein the second predetermined time length is less than the first predetermined time length; wherein the extraction module comprises:
a video sub-segment obtaining module configured to obtain a plurality of video sub-segments of the second predetermined time length from the first video segment;
a video sub-segment classification module configured to determine whether the plurality of video sub-segments belong to the predetermined classification; and
a video sub-segment extraction module configured to extract the video sub-segment as a video clip segment in response to determining that the video sub-segment belongs to the predetermined category.
12. The apparatus of claim 11, wherein
the classification module is configured to determine whether the plurality of video segments belong to a predetermined classification based on a first classification threshold;
the video sub-segment classification module is configured to determine whether the plurality of video sub-segments belong to the predetermined classification based on a second classification threshold;
wherein the first classification threshold is less than the second classification threshold.
13. The apparatus of claim 11, further comprising:
a sharpness determination module configured to determine sharpness of a video segment of the plurality of video segments, wherein the classification module is configured to perform classification for video segments having sharpness above a predetermined sharpness threshold.
14. The apparatus of claim 13, wherein the sharpness determination module comprises:
an image frame acquisition module configured to acquire at least one image frame from the video segment at a predetermined period; and
an image sharpness determination module configured to determine the sharpness of the video segment based on the sharpness of the acquired at least one image frame.
15. The apparatus of any of claims 11-14, wherein the predetermined classification comprises a performance class, wherein the performance class comprises at least one of singing, dancing, and musical instrument performance.
16. The apparatus of claim 15, wherein the extraction module is further configured to:
determine, for the video sub-segment, a start time of the video sub-segment in the first video segment;
determine a sound start endpoint in the first video segment; and
extract the video clip segment from the first video segment, using the sound start endpoint nearest to the start time as the clipping start point of the video clip segment.
17. The apparatus of claim 15, wherein the extraction module is further configured to:
determine, for the video sub-segment, an end time of the video sub-segment in the first video segment;
determine a sound end endpoint in the first video segment; and
extract the video clip segment from the first video segment, using the sound end endpoint nearest to the end time as the clipping end point of the video clip segment.
18. The apparatus of any of claims 11-14, 16, and 17, further comprising a segment header and/or segment trailer adding module configured to add, for the video clip segment, a segment header and/or segment trailer that matches the video clip segment.
19. The apparatus of claim 18, wherein the segment header and/or segment trailer adding module is configured to:
acquire an image matching the size of the image frames in the video clip segment;
create image frames of the segment header and/or segment trailer based on the image and on text in a predetermined format;
convert the text into a speech segment; and
generate the segment header and/or segment trailer video based on the image frames of the segment header and/or segment trailer and the speech segment.
20. The apparatus of claim 18, further comprising a format conversion and publishing module configured to:
convert the video clip segment into image frames;
generate, based on the image frames, a video segment consistent with the format of the segment header and/or segment trailer;
merge the generated video segment with the segment header and/or segment trailer; and
automatically publish the merged video.
21. An electronic device, comprising:
one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
22. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any of claims 1-10.
CN202011591386.XA 2020-12-29 2020-12-29 Method, apparatus, device and storage medium for editing video from live video stream Active CN112911332B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011591386.XA CN112911332B (en) 2020-12-29 2020-12-29 Method, apparatus, device and storage medium for editing video from live video stream

Publications (2)

Publication Number Publication Date
CN112911332A (en) 2021-06-04
CN112911332B (en) 2023-07-25

Family

ID=76111863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011591386.XA Active CN112911332B (en) 2020-12-29 2020-12-29 Method, apparatus, device and storage medium for editing video from live video stream

Country Status (1)

Country Link
CN (1) CN112911332B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113301430B (en) * 2021-07-27 2021-12-07 腾讯科技(深圳)有限公司 Video clipping method, video clipping device, electronic equipment and storage medium
CN114245171B (en) * 2021-12-15 2023-08-29 百度在线网络技术(北京)有限公司 Video editing method and device, electronic equipment and medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104284241A (en) * 2014-09-22 2015-01-14 北京奇艺世纪科技有限公司 Video editing method and device
CN104333802A (en) * 2013-12-13 2015-02-04 乐视网信息技术(北京)股份有限公司 Video playing method and video player
CN107562737A (en) * 2017-09-05 2018-01-09 语联网(武汉)信息技术有限公司 A kind of methods of video segmentation and its system for being used to translate
CN107566907A (en) * 2017-09-20 2018-01-09 广东欧珀移动通信有限公司 video clipping method, device, storage medium and terminal
CN107623860A (en) * 2017-08-09 2018-01-23 北京奇艺世纪科技有限公司 Multi-medium data dividing method and device
WO2018040059A1 (en) * 2016-09-02 2018-03-08 Microsoft Technology Licensing, Llc Clip content categorization
CN109889920A (en) * 2019-04-16 2019-06-14 威比网络科技(上海)有限公司 Network courses video clipping method, system, equipment and storage medium
CN110602560A (en) * 2018-06-12 2019-12-20 优酷网络技术(北京)有限公司 Video processing method and device
CN110650374A (en) * 2019-08-16 2020-01-03 咪咕文化科技有限公司 Clipping method, electronic device, and computer-readable storage medium
CN110798735A (en) * 2019-08-28 2020-02-14 腾讯科技(深圳)有限公司 Video processing method and device and electronic equipment
US10645468B1 (en) * 2018-12-03 2020-05-05 Gopro, Inc. Systems and methods for providing video segments
CN112000842A (en) * 2020-08-31 2020-11-27 北京字节跳动网络技术有限公司 Video processing method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8238718B2 (en) * 2002-06-19 2012-08-07 Microsoft Corporation System and method for automatically generating video cliplets from digital video
US10740620B2 (en) * 2017-10-12 2020-08-11 Google Llc Generating a video segment of an action from a video

Also Published As

Publication number Publication date
CN112911332A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN108833973B (en) Video feature extraction method and device and computer equipment
CN106303658B (en) Exchange method and device applied to net cast
WO2021008394A1 (en) Video processing method and apparatus, and electronic device and storage medium
CN109408639B (en) Bullet screen classification method, bullet screen classification device, bullet screen classification equipment and storage medium
EP3477506A1 (en) Video detection method, server and storage medium
CN111050201B (en) Data processing method and device, electronic equipment and storage medium
CN112911332B (en) Method, apparatus, device and storage medium for editing video from live video stream
US10853433B2 (en) Method and device for generating briefing
CN112511854B (en) Live video highlight generation method, device, medium and equipment
CN110502661A (en) A kind of video searching method, system and storage medium
WO2019227429A1 (en) Method, device, apparatus, terminal, server for generating multimedia content
US11928152B2 (en) Search result display method, readable medium, and terminal device
CN102752540A (en) Automatic categorization method based on face recognition technology
WO2018130173A1 (en) Dubbing method, terminal device, server and storage medium
CN113347489B (en) Video clip detection method, device, equipment and storage medium
CN111046226B (en) Tuning method and device for music
CN114930867A (en) Screen recording method and device and computer readable storage medium
CN104091596A (en) Music identifying method, system and device
WO2023082830A1 (en) Video editing method and apparatus, computer device, and storage medium
CN113992970A (en) Video data processing method and device, electronic equipment and computer storage medium
CN110312161B (en) Video dubbing method and device and terminal equipment
CN114245229B (en) Short video production method, device, equipment and storage medium
CN116166125A (en) Avatar construction method, apparatus, device and storage medium
WO2023109103A1 (en) Video editing method and apparatus, electronic device, and medium
US11461379B1 (en) Speech to text (STT) and natural language processing (NLP) based video bookmarking and classification system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant