CN109618224B

CN109618224B - Video data processing method, device, computer readable storage medium and equipment

Info

Publication number: CN109618224B
Application number: CN201811548970.XA
Authority: CN
Inventors: 杨阳
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2018-12-18
Filing date: 2018-12-18
Publication date: 2021-03-09
Anticipated expiration: 2038-12-18
Also published as: CN112929745B; CN109618224A; CN112929745A

Abstract

The application relates to a video data processing method and apparatus, a computer-readable storage medium, and a computer device. The method includes: obtaining a video screenshot request; obtaining a target video according to the video screenshot request; displaying at least one video frame in the target video; obtaining a video frame selection instruction; selecting a video frame to be spliced from the at least one video frame according to the video frame selection instruction; and splicing the video frame to be spliced to obtain a target spliced picture. The solution provided by this application keeps the production process simple and makes splicing efficient.

Description

Video data processing method, device, computer readable storage medium and equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to a video data processing method, an apparatus, a computer-readable storage medium, and a computer device.

Background

With the development of computer technology, it has become popular in social circles to splice video content into long images. These images express the publisher's mood or convey the publisher's intentions through continuous dialogue or video pictures. With current techniques, however, target video frames must be captured manually from rapidly changing video pictures. The pictures must then be spliced with a separate picture-splicing tool. As a result, splicing efficiency is low and the production process is cumbersome.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a video data processing method, apparatus, computer-readable storage medium, and computer device that make production simple and splicing efficient.

A method of video data processing, the method comprising:

acquiring a video screenshot request;

acquiring a target video according to a video screenshot request, wherein the target video comprises at least one video frame;

displaying at least one video frame in a target video;

acquiring a video frame selection instruction, selecting a video frame to be spliced from the at least one video frame according to the video frame selection instruction, and splicing the video frame to be spliced to obtain a target spliced picture.

A video data processing apparatus, the apparatus comprising:

the video screenshot request acquisition module is used for acquiring a video screenshot request;

the target video acquisition module is used for acquiring a target video according to the video screenshot request, wherein the target video comprises at least one video frame;

the display module is used for displaying at least one video frame in the target video;

the target spliced picture splicing module is used for acquiring a video frame selection instruction, selecting a video frame to be spliced from the at least one video frame according to the video frame selection instruction, and splicing the video frame to be spliced to obtain a target spliced picture.

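A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program:

acquiring a video screenshot request;

acquiring a target video according to the video screenshot request, wherein the target video comprises at least one video frame;

displaying at least one video frame in the target video;

acquiring a video frame selection instruction, selecting a video frame to be spliced from the at least one video frame according to the video frame selection instruction, and splicing the video frame to be spliced to obtain a target spliced picture.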

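A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the following steps:

acquiring a video screenshot request;

acquiring a target video according to the video screenshot request, wherein the target video comprises at least one video frame;

displaying at least one video frame in the target video;

acquiring a video frame selection instruction, selecting a video frame to be spliced from the at least one video frame according to the video frame selection instruction, and splicing the video frame to be spliced to obtain a target spliced picture.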

According to the above video data processing method and apparatus, computer-readable storage medium and computer device, the terminal performs the following steps:
acquires a video screenshot request;
acquires a target video comprising at least one video frame according to the request;
displays the at least one video frame in the target video;
acquires a video frame selection instruction;
selects a video frame to be spliced from the at least one video frame according to the instruction; and
splices the video frames to be spliced to obtain the target spliced picture.

The terminal can automatically capture a target video from the video, and each video frame in the target video can include an original image and corresponding text content. Each video frame in the target video is displayed to the user through the front end. The user selects frames of interest from the displayed frames as the video frames to be spliced, and these frames are spliced to obtain the target spliced picture. Because the target video no longer has to be captured manually from rapidly changing video pictures, the accuracy of capturing the target video improves, and therefore so does the accuracy of splicing. Moreover, the video frames to be spliced can be spliced directly, without arranging the pictures in a separate splicing tool. The production process is therefore simple and splicing efficiency is improved.

Drawings

FIG. 1 is a diagram of an exemplary video data processing system;

FIG. 1A is a flow diagram illustrating a method for video data processing according to one embodiment;

FIG. 1B is a diagram of a video frame including textual content, in one embodiment;

FIG. 1C is a diagram of a video frame that does not include textual content, in one embodiment;

FIG. 1D is a schematic diagram illustrating an interface for displaying a target video, in one embodiment;

FIG. 1E is a diagram illustrating editing of text content of video frames to be stitched in one embodiment;

FIG. 1F is an interface diagram of a target stitched picture in one embodiment;

FIG. 2 is a flow chart illustrating a video data processing method according to another embodiment;

FIG. 2A is a schematic diagram of an interface for displaying a target video according to another embodiment;

FIG. 2B is a schematic diagram of an interface for displaying a target video according to yet another embodiment;

FIG. 2C is a diagram illustrating the selection of a target original image and target text content from the set of original images and the set of text content according to the first image stitching instruction in one embodiment;

FIG. 2D is a schematic diagram of an interface of a target stitched picture in another embodiment;

FIG. 2E is a schematic diagram of an interface of a target stitched picture in a further embodiment;

FIG. 3 is a flowchart illustrating the steps of obtaining a set of original images and a set of text content in one embodiment;

FIG. 3A is a diagram illustrating video images in a sequence of target video images, in accordance with one embodiment;

FIG. 4 is a schematic flowchart of an embodiment of obtaining target video stream data from a target video according to a video screenshot request;

FIG. 5 is a flowchart illustrating an embodiment of obtaining a corresponding sequence of target video images from target video stream data;

FIG. 6 is a flow diagram illustrating a method for video data processing according to one embodiment;

FIG. 6A is a schematic diagram of an interface showing a sequence of target video images in one embodiment;

FIG. 7 is a flowchart illustrating the steps of acquiring an original image set and a text content set in another embodiment;

FIG. 8 is a schematic flow chart illustrating splicing of a target original image and target text content to obtain a target spliced image according to an embodiment;

FIG. 8A is a diagram illustrating an embodiment of obtaining a target original image and target text content from an original image set and a text content set according to a first image stitching instruction;

FIG. 9 is a schematic diagram illustrating a video data processing method in a video playback application scenario according to an embodiment;

FIG. 10 is a block diagram showing the structure of a video data processing apparatus according to one embodiment;

FIG. 11 is a block diagram of a target video stream data acquisition module in one embodiment;

FIG. 12 is a block diagram showing the configuration of a target video stream data acquisition unit in another embodiment;

FIG. 13 is a block diagram showing a configuration of a target video image sequence acquisition unit in one embodiment;

FIG. 14 is a block diagram showing a configuration of an original image set acquisition unit in one embodiment;

FIG. 15 is a block diagram showing a configuration of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Fig. 1 is a diagram of an application environment of a video data processing method according to an embodiment. Referring to fig. 1, the video data processing method is applied to a video data processing system. The video data processing system includes a terminal 110 and a server 120, which are connected through a network. The terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers. A video playing application is installed on the terminal 110, and a video screenshot control is arranged on the interface of the video playing application. A video screenshot request is generated when this control is triggered on the video playing interface. The server 120 stores the target video, i.e., the target video can be obtained from the server 120.

In one embodiment, a video screenshot request is generated when the video screenshot control on the video playing application interface of the terminal 110 is triggered. The terminal 110 sends the video screenshot request to the server 120. According to the request, the server 120 returns to the terminal 110 a target video that includes at least one video frame. The terminal 110 displays the at least one video frame in the target video and acquires a video frame selection instruction. According to the instruction, it selects a video frame to be spliced from the at least one video frame and splices the selection to obtain a target spliced picture. Further, the terminal 110 may display the target spliced picture to the user for viewing.

As shown in fig. 1A, in one embodiment, a video data processing method is provided. The embodiment is mainly illustrated by applying the method to the terminal in fig. 1. Referring to fig. 1A, the video data processing method specifically includes the following steps:

Step 102: obtain a video screenshot request.

The video screenshot request may request screenshots of image frames in the current video, i.e., the video being played by the playing application of the terminal. It may also request screenshots of image frames in a video to be played. This can be any video other than the one currently being played, such as another stored video. For example, while the playing application is playing video A, screenshots of video B can be taken through configuration. Taking a screenshot means capturing, from a video, a visual image that can be displayed on the terminal screen or another display device. The video screenshot request can be generated when the user operates a related control in the playing application of the terminal. Alternatively, a trigger time can be set through a separate service, which automatically generates the video screenshot request when the set trigger time is reached.
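To make the two trigger paths concrete, the following minimal Python sketch shows how a terminal might represent a video screenshot request. It is an illustration, not the patented implementation. The class VideoScreenshotRequest, its field names, and the helper functions request_from_control and request_from_schedule are all hypothetical. Only the distinction between a manually triggered request for the current video and a service-scheduled request for another stored video comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VideoScreenshotRequest:
    """Hypothetical request record; field names are illustrative only."""
    video_id: str           # identifier of the video to capture from
    position_s: float       # playback position (seconds) used as the reference point
    is_current_video: bool  # True: video now playing; False: another stored video
    source: str             # "manual" (control triggered) or "scheduled" (service timer)


def request_from_control(playing_video_id: str, position_s: float) -> VideoScreenshotRequest:
    # Manual path: the user triggers the screenshot control while the current video plays.
    return VideoScreenshotRequest(playing_video_id, position_s, True, "manual")


def request_from_schedule(now_s: float, fire_at_s: float,
                          other_video_id: str, position_s: float) -> Optional[VideoScreenshotRequest]:
    # Scheduled path: a separate service emits a request only once its set trigger time is reached.
    if now_s < fire_at_s:
        return None
    return VideoScreenshotRequest(other_video_id, position_s, False, "scheduled")
```

In the video A / video B example above, request_from_control would be used for video A while it plays, and request_from_schedule would fire for video B at the configured time.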

Step 104: acquire a target video according to the video screenshot request, wherein the target video comprises at least one video frame.

When a video screenshot request is triggered, the terminal automatically acquires the video within a preset time period according to the request. The video frames within that period are used as the target video. The target video includes at least one video frame, and the image of a video frame may or may not include text content. Text content refers to any representation of written language, in any language; for example, it may be lines of dialogue. A video frame including text content may be a movie picture with lines of dialogue, as shown in fig. 1B. A video frame without text content may be a picture only, as shown in fig. 1C.

Methods for acquiring the target video within the preset time period include, but are not limited to, the following two cases.

If the target video is acquired from the current video, the time point at which the video screenshot request is triggered serves as a reference point. A preset duration is obtained, and the target video is acquired according to the reference point and the preset duration.

If the target video is acquired from another (non-current) video, three things are set in advance: a trigger time point for the video screenshot request, a preset duration, and the identifier of the other video. Once the set trigger time point is reached, the terminal automatically acquires the target video according to the preset duration and the identifier of the other video.
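The reference-point rule above can be sketched as a small window computation. This is an illustrative assumption, not the disclosed algorithm. The text does not say whether the preset duration lies before, after, or around the trigger point, so the hypothetical parameter lead_fraction makes that choice explicit. The window is also clamped to the video bounds. The helper frame_timestamps, which samples candidate frames at a hypothetical rate sample_fps, is likewise an assumption.

```python
from typing import List, Tuple


def target_window(trigger_s: float, preset_len_s: float, video_len_s: float,
                  lead_fraction: float = 1.0) -> Tuple[float, float]:
    """Return (start_s, end_s) of the target video segment.

    lead_fraction (an assumption) is the share of the preset duration placed
    before the trigger point: 1.0 ends the window at the trigger, 0.5 centres it.
    """
    start = trigger_s - preset_len_s * lead_fraction
    # Clamp so the window stays inside [0, video_len_s].
    start = max(0.0, min(start, video_len_s - preset_len_s))
    end = min(float(video_len_s), start + preset_len_s)
    return start, end


def frame_timestamps(start_s: float, end_s: float, sample_fps: float) -> List[float]:
    # Hypothetical sampling step: one candidate frame every 1/sample_fps seconds in [start, end).
    count = int((end_s - start_s) * sample_fps + 1e-9)
    return [start_s + i / sample_fps for i in range(count)]
```

For example, with the default lead_fraction, a request triggered at 30 s with a 10 s preset duration on a 100 s video yields the window (20.0, 30.0).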

Step 106: display at least one video frame in the target video.

After acquiring the target video, the terminal may extract the image and the corresponding text content displayed in each video frame of the target video. It then displays the extracted images and text content directly through the front end. Alternatively, the terminal may display the at least one video frame of the target video through the front end directly, without processing the frames. To avoid affecting playback of the current video by the video playing application, the video frames of the target video may be displayed at the front end in a new display window. They may also be displayed in the current playing window of the video playing application that is playing the current video.

In one embodiment, the terminal plays the current video through the playing interface of a video playing application. A video screenshot request may be generated by triggering a related button on the playing interface, and a target video composed of at least one video frame is acquired according to the video screenshot request. The video frames of the target video may then be displayed at the front end of the terminal in a newly created display window. This is shown in fig. 1D, a schematic diagram of an interface for displaying the target video in one embodiment. After acquiring the target video, the terminal can directly create a new display window at the front end to display each video frame in the target video. The interface shown in fig. 1D is this new display window, and it presents the individual video frames of the target video in its lower part. The specific display position of each video frame within the display window is not limited herein.

Step 108: acquire a video frame selection instruction, select a video frame to be spliced from the at least one video frame according to the video frame selection instruction, and splice the video frame to be spliced to obtain a target spliced picture.

The video frame selection instruction is used for selecting the video frames to be spliced. It can be generated by click operations on the video frames of the target video displayed at the front end. For example, the front end may display video frame a, video frame b, video frame c and so on of the target video. The end user can then click video frame b to select it, which generates a video frame selection instruction. As shown in fig. 1D, a video frame selection instruction may be generated by triggering any of the video frames displayed at the front end. The video frames to be spliced are then selected from the at least one video frame according to the instruction. The upper-right corner of each selected video frame to be spliced displays its selection order.
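The numbered badges described above amount to tracking selection order. The minimal sketch below assumes that clicking an already selected frame deselects it and that the remaining badges renumber; the text does not specify either behavior. The class name FrameSelection and its methods are hypothetical.

```python
from typing import Dict, List


class FrameSelection:
    """Records which displayed video frames are selected, in click order (hypothetical helper)."""

    def __init__(self) -> None:
        self._order: List[str] = []

    def toggle(self, frame_id: str) -> None:
        # Clicking an unselected frame selects it; clicking a selected one deselects it (assumed).
        if frame_id in self._order:
            self._order.remove(frame_id)
        else:
            self._order.append(frame_id)

    def badges(self) -> Dict[str, int]:
        # 1-based selection numbers to draw in the upper-right corner of each selected frame.
        return {frame_id: i + 1 for i, frame_id in enumerate(self._order)}

    def frames_to_splice(self) -> List[str]:
        # Frames to be spliced, in the order they were selected.
        return list(self._order)
```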

The click operation on the video frames of the target video may be manual or automatic. In a manual operation, the end user directly clicks video frames in the target video to trigger generation of a video frame selection instruction. In an automatic operation, a separate service automatically selects the video frames to be spliced from all the video frames according to a preset command. This triggers generation of the video frame selection instruction.

When a video frame selection instruction is obtained, at least one video frame to be spliced is selected from the video frames according to the instruction. The selected frames are then spliced in a user-defined manner to obtain the target spliced picture. The splicing format can be customized; for example, all the video frames to be spliced may be spliced vertically or horizontally. In one embodiment, only one video frame to be spliced is selected from the video frames displayed at the front end, and that frame is used directly as the final target spliced picture. Likewise, if only one video frame is displayed at the front end, it can be selected according to the video frame selection instruction and used as the target spliced picture.
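As an illustration of vertical and horizontal splicing, the sketch below treats each frame as a list of rows of grayscale pixel values, a stand-in for a real bitmap. Two choices here are assumptions: smaller frames are padded with a background value rather than scaled, and the function is named splice. A production implementation would more likely use an imaging library.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values; a stand-in for a real bitmap


def splice(frames: List[Image], direction: str = "vertical", fill: int = 255) -> Image:
    """Splice frames top-to-bottom ("vertical") or left-to-right ("horizontal").

    Frames narrower (vertical) or shorter (horizontal) than the largest frame
    are padded with `fill`; padding instead of scaling is an assumption.
    """
    if not frames:
        raise ValueError("at least one frame to be spliced is required")
    if len(frames) == 1:
        # A single selected frame is used directly as the target spliced picture.
        return [row[:] for row in frames[0]]
    if direction == "vertical":
        width = max(len(f[0]) for f in frames)
        # Concatenate all rows, right-padding each row to the common width.
        return [row + [fill] * (width - len(row)) for f in frames for row in f]
    if direction == "horizontal":
        height = max(len(f) for f in frames)
        out: Image = [[] for _ in range(height)]
        for f in frames:
            w = len(f[0])
            for y in range(height):
                # Append this frame's row y, or a padding row if the frame is shorter.
                out[y].extend(f[y] if y < len(f) else [fill] * w)
        return out
    raise ValueError(f"unknown direction: {direction}")
```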

In one embodiment, a terminal user can directly select a video frame to be spliced from at least one video frame displayed at the front end, and the video frame to be spliced is spliced to obtain a target spliced picture. The video frames to be spliced selected here may or may not include text content.

In another embodiment, the text content of the selected video frames to be spliced can be edited. The edited video frames to be spliced are then spliced to obtain the target spliced picture. For example, the text content may be lines of dialogue. Editing may consist of adding new text content to a video frame to be spliced, or of deleting its original text content and then adding new text content.

In yet another embodiment, if the image corresponding to each video frame displayed at the front end includes text content, the image and the text content corresponding to the video frame can be extracted respectively to obtain an original image set and a text content set, then a target original image and a target text content are selected from the original image set and the text content set respectively, and finally the target original image and the target text content are spliced to obtain a target spliced image.

With the above video data processing method, a target video can be captured automatically from the video, and each video frame in the target video can include an original image and corresponding text content. Each video frame in the target video is displayed to the user through the front end. The user selects frames of interest from the displayed frames as the video frames to be spliced, and these frames are spliced to obtain the target spliced picture. Because the target video no longer has to be captured manually from rapidly changing video pictures, the accuracy of capturing the target video improves, and therefore so does the accuracy of splicing. Moreover, the video frames to be spliced can be spliced directly, without arranging the pictures in a separate splicing tool. The production process is therefore simple and splicing efficiency is improved.

In one embodiment, splicing video frames to be spliced to obtain a target spliced picture includes: acquiring a text content editing instruction, and editing a video frame to be spliced according to the text content editing instruction to obtain an edited video frame to be spliced; and splicing the edited video frames to be spliced to obtain a target spliced picture.

After the video frames to be spliced are selected from the video frames according to the video frame selection instruction, free editing of text content can be carried out on the video frames to be spliced. Specifically, a text content editing instruction can be generated by triggering, the video frames to be spliced are edited according to the text content editing instruction, so that the edited video frames to be spliced are obtained, and finally the edited video frames to be spliced are spliced to obtain the target spliced picture.

The text content editing instruction is an instruction for editing text content. Editing a video frame to be spliced according to this instruction may consist of adding new text content to the frame, or of deleting the original text content of the frame and then adding new text content, and the like. Fig. 1E is a schematic diagram of text content editing performed on video frames to be spliced in one embodiment. It shows the video frames to be spliced, each of which includes text content that can be edited, for example by adding new text to one of the frames. The edited video frames to be spliced are then spliced to obtain the target spliced picture. Fig. 1F illustrates this process of splicing all the edited video frames into the target spliced picture.
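The two editing operations named above can be modelled as pure functions on a frame record: adding new text, or deleting the original text and then adding new text. In this hypothetical sketch, FrameToSplice and apply_text_edit are illustrative names. Text is kept as separate lines rather than rendered into pixels.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class FrameToSplice:
    """Hypothetical record for a selected frame: an image reference plus its text lines."""
    image_id: str
    texts: Tuple[str, ...] = ()


def apply_text_edit(frame: FrameToSplice, op: str, new_text: str) -> FrameToSplice:
    """Return an edited copy; the original frame is left unchanged."""
    if op == "add":
        # Add new text content alongside the existing text.
        return replace(frame, texts=frame.texts + (new_text,))
    if op == "replace":
        # Delete the original text content, then add the new text content.
        return replace(frame, texts=(new_text,))
    raise ValueError(f"unsupported edit: {op}")
```

Returning copies keeps the unedited frames available if the user abandons an edit before splicing.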

As shown in fig. 2, in one embodiment, a video data processing method is provided. The embodiment is mainly illustrated by applying the method to the terminal in fig. 1. Referring to fig. 2, the video data processing method specifically includes the following steps:

Step 202: obtain a video screenshot request.

Step 204: obtain a target video according to the video screenshot request, obtain at least one original image from the target video to obtain an original image set, and obtain at least one text content from the target video to obtain a text content set.

When a video screenshot request is triggered, the terminal automatically acquires the video within a preset time period according to the request and uses it as the target video. The target video is composed of a plurality of video frames, which may or may not include text content.

A video frame in the target video may include both an original image and corresponding text content. After the target video is obtained, these two parts of each video frame therefore need to be extracted separately. The original image is the picture portion of the video frame that remains once its text content is separated out. Because the text content is not embedded into the original image, the two can be extracted independently. Extracting the original images and corresponding text content from the video frames yields a plurality of original images and a plurality of text contents. These are then screened according to a preset screening rule to obtain an original image set and a text content set. The original image set consists of at least one screened original image, and the text content set consists of at least one screened text content. The preset screening rule can be customized; for example, it may filter out duplicate original images and duplicate text content.
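Below is a minimal sketch of building the two sets, with duplicate removal as the preset screening rule. It assumes each frame's text is available separately from its picture, consistent with the statement that the text is not embedded in the original image. VideoFrame and build_sets are hypothetical names. Images are represented by identifiers such as content hashes rather than pixel data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class VideoFrame:
    """Hypothetical frame record with the picture and its text stored separately."""
    image: str           # stand-in for the original image, e.g. a content hash
    text: Optional[str]  # separately stored text content, None if the frame has none


def build_sets(frames: List[VideoFrame]) -> Tuple[List[str], List[str]]:
    """Extract an original image set and a text content set from the target video.

    Screening rule used here: drop duplicates and keep first-seen order
    (dict preserves insertion order in Python 3.7+).
    """
    images = list(dict.fromkeys(f.image for f in frames))
    texts = list(dict.fromkeys(f.text for f in frames if f.text))
    return images, texts
```

Keeping first-seen order means the sets are shown at the front end in playback order, which is one reasonable choice among several the text permits.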

Step 206: display at least one original image in the original image set, and display at least one text content in the text content set.

Specifically, after the terminal acquires the original image set and the text content set, each original image in the original image set and each text content in the text content set can be displayed through the front end. The original image set and the text content set displayed at the front end can be displayed through a newly-built display window, and can also be displayed through a current playing window of a playing application playing a current video.

In one embodiment, the terminal may display the original image set and the text content set at the front end in a new display window. This is shown in fig. 2A, a schematic diagram of an interface for displaying a target video in one embodiment. After the terminal acquires the original image set and the text content set, the front end creates a new display window to display them; the interface shown in fig. 2A is this newly created display window. In the display window shown in fig. 2A, the original images of the original image set may be placed on the left side of the window, and the text contents of the text content set on the right side. The display position of each original image within the new display window can be customized, as can the display position of each text content.

In another embodiment, the terminal may display the original image set and the text content set at the front end through a current playing window of a playing application for playing a video, which may be specifically shown in fig. 2B, where fig. 2B shows an interface schematic diagram of a target video in one embodiment. As shown in fig. 2B, after the terminal acquires the original image set and the text content set, the current playing window may pause playing the current video, display each original image in the original image set below the current playing window, and display each text content in the text content set above the current playing window. The display position of each original image in the original image set in the current playing window can be customized, and similarly, the display position of each text content in the text content set in the current playing window can be customized.

Step 208: acquire a first image splicing instruction, select a target original image from the at least one original image according to the first image splicing instruction, and select target text content from the at least one text content according to the first image splicing instruction.

The first image splicing instruction is an instruction to perform image splicing. It can be generated by operations on the original image set and the text content set, and these operations may be manual or automatic. In a manual operation, the end user clicks items in the original image set and the text content set to trigger generation of the first image splicing instruction. In an automatic operation, a separate service automatically selects a target original image and target text content from the two sets according to a preset command. This generates the first image splicing instruction.

Specifically, after the first image stitching instruction is obtained, the target original image and the target text content may be obtained from the original image set and the text content set according to the first image stitching instruction.

Fig. 2C is a schematic diagram of obtaining target original images and target text content from the original image set and the text content set according to a first image splicing instruction in one embodiment. The original image set and the text content set are displayed on the left and right sides of the window respectively. The end user can select target original images from the original image set by clicking, and similarly select target text content from the text content set by clicking. Images and text are matched by selecting them in turn: first a target original image, then its target text content, then the next target original image, then the next target text content, and so on. In this way every target original image is matched one-to-one with its target text content. For example, as shown in fig. 2C, original image a is matched with text content b, original image c with text content e, and original image d with text content c.
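The alternating selection order described above can be turned into image and text pairs as follows. This sketch rests on three assumptions: clicks arrive as (kind, item) tuples, a text click with no pending image is ignored, and a second image click before any text replaces the pending image. The function name pair_selections is hypothetical.

```python
from typing import List, Optional, Tuple


def pair_selections(clicks: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Match each selected original image with the next text content selected after it.

    clicks is a sequence of (kind, item_id) with kind "image" or "text".
    """
    pairs: List[Tuple[str, str]] = []
    pending_image: Optional[str] = None
    for kind, item_id in clicks:
        if kind == "image":
            # A new image click replaces any image still waiting for text (assumption).
            pending_image = item_id
        elif kind == "text" and pending_image is not None:
            pairs.append((pending_image, item_id))
            pending_image = None
        # A text click with no pending image is ignored (assumption).
    return pairs
```

Fed the click sequence from fig. 2C (image a, text b, image c, text e, image d, text c), it yields the pairs (a, b), (c, e) and (d, c).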

Step 210: splice the target original image and the target text content to obtain a target spliced image.

Specifically, the target original images and target text content are first acquired from the original image set and the text content set according to the first image splicing instruction. They are then spliced in a user-defined manner to obtain the target spliced image. User-defined splicing may place the target text content into a target original image to form the target spliced picture. When there are multiple target original images and multiple pieces of target text content, the images and text may instead be combined arbitrarily and the resulting combined pictures spliced to obtain the target spliced picture, and so on. The combination can follow the mood or intention of the publisher, who is the user that publishes or shares the resulting target spliced picture.

Fig. 2D and fig. 2E are schematic diagrams of target spliced images in one embodiment. For the first spliced image shown in fig. 2D, the target original images and target text contents are first combined according to the publisher's intention or mood to obtain combined images, and the combined images are then spliced in a given format, for example from top to bottom. For the first spliced image shown in fig. 2E, the target original images are first spliced in a given format, for example overlapped with one another, and the target text contents are then arranged in sequence within the first target original image. The order in which the target text contents are arranged may follow the intention or mood of the publisher.
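
The top-to-bottom format of fig. 2D amounts to computing a canvas size and a paste offset for each combined image. The sketch below assumes the combined images are described only by their pixel sizes and that narrower images are centred horizontally; `stack_vertically` is a hypothetical helper, and its offsets could then be handed to an imaging library (for example Pillow's `Image.paste`) to draw the actual picture.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def stack_vertically(sizes: List[Size]) -> Tuple[Size, List[Tuple[int, int]]]:
    """Return the canvas size and the top-left paste offset of each image.

    Images are placed top to bottom in the given order; any image narrower
    than the widest one is centred horizontally.
    """
    if not sizes:
        return (0, 0), []
    canvas_w = max(w for w, _ in sizes)
    offsets: List[Tuple[int, int]] = []
    y = 0
    for w, h in sizes:
        offsets.append(((canvas_w - w) // 2, y))
        y += h
    return (canvas_w, y), offsets


canvas, offsets = stack_vertically([(640, 360), (640, 360), (480, 270)])
print(canvas, offsets)  # (640, 990) [(0, 0), (0, 360), (80, 720)]
```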

In the above video data processing method, apparatus, computer-readable storage medium and computer device, when the terminal obtains a video screenshot request, it obtains a target video according to the request; obtains a plurality of corresponding original images from the target video to form an original image set, and a plurality of corresponding text contents to form a text content set; displays the two sets; obtains a first image splicing instruction; acquires a target original image from the original image set and target text content from the text content set according to that instruction; and splices the target original image and the target text content to obtain a target spliced image.

Because the target video is obtained according to the video screenshot request, the video content the user is interested in can be captured as the target video. Moreover, the terminal automatically acquires the corresponding original images and text contents from the target video to obtain the original image set and the text content set, so the user no longer needs to manually capture target video frames from rapidly changing video pictures; this improves the accuracy of capturing the video stream data and therefore the accuracy of splicing. Finally, the terminal displays the original image set and the text content set at the front end, the target original image and target text content are selected from them, and the two are spliced directly into the target spliced image without a separate splicing tool, which simplifies the production process and improves splicing efficiency.

In one embodiment, as shown in fig. 3, obtaining at least one original image from a target video to obtain an original image set, and obtaining at least one text content from the target video to obtain a text content set includes:

Step 302: obtaining target video stream data from the target video according to the video screenshot request.

The target video stream data is the video stream data selected from the target video according to the video screenshot request. It includes, but is not limited to, video stream data composed of the video data generated during the screenshot process together with its text content, or of the video frames or audio frames themselves together with their text content. The text content may be, for example, the lines of dialogue in a video frame. The target video stream data may cover the video frames within a period of time; specifically, the video stream data within a preset time interval may be captured from the target video according to the video screenshot request and used as the target video stream data.

Specifically, after the target video is obtained, the video stream data captured from it according to the video screenshot request can be used as the target video stream data. A capture time interval is preset; once the video screenshot request is obtained, the time point at which the request was triggered is obtained, and the video stream data within the preset capture time interval is captured using that time point as the reference point. The video stream data within the preset interval may be captured from the part of the video before that time point, or from the part of the video after that time point. Alternatively, first video stream data covering part of the interval may be captured from the video before the time point and second video stream data covering the rest of the interval from the video after it, and the two are combined into the target video stream data.
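
The three capture options can be expressed as a single window computation. This is a minimal sketch, assuming times in seconds, a hypothetical `mode` switch with the values "before", "after" and "split", and a hypothetical `before_share` that sets how much of the interval precedes the trigger time in the split case; clipping the window to the video bounds is also an assumption, since the text does not cover requests near the start or end of the video.

```python
from typing import Optional, Tuple


def capture_window(t: float, interval: float, mode: str = "split",
                   before_share: float = 0.5,
                   duration: Optional[float] = None) -> Tuple[float, float]:
    """Return (start, end), in seconds, of the video to capture around t.

    mode "before": the whole interval lies before t.
    mode "after":  the whole interval lies after t.
    mode "split":  before_share of the interval lies before t, the rest after.
    The window is clipped to the start (and, if known, the end) of the video.
    """
    if mode == "before":
        start, end = t - interval, t
    elif mode == "after":
        start, end = t, t + interval
    elif mode == "split":
        back = interval * before_share
        start, end = t - back, t + (interval - back)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    start = max(0.0, start)
    if duration is not None:
        end = min(duration, end)
    return start, end


print(capture_window(600, 90, "split"))  # (555.0, 645.0)
```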

Step 304: obtaining a corresponding target video image sequence according to the target video stream data.

The target video image sequence is a sequence of target video images, which are the video images of the target video stream data that satisfy a preset screening condition. The screening condition can be customized. For example, video images corresponding to consecutive identical text content in the target video stream data may be deduplicated, and the remaining video images form the target video image sequence. Alternatively, the video images corresponding to consecutive identical text content may be deduplicated first, consecutive identical video images among the remaining ones may then be deduplicated a second time, and the result forms the target video image sequence. Or all video images of the target video stream data may be used as the target video image sequence without screening. Note that each video image of the target video stream data includes its corresponding text content.

Step 306: obtaining the original images corresponding to the video images in the target video image sequence to obtain an original image set, and obtaining the text contents corresponding to those video images to obtain a text content set.

Each video image in the target video image sequence is composed of an original image without text content and the corresponding text content. Specifically, the original image and its text content can be given the same timestamp, so that when the original image is played, the corresponding text content is retrieved automatically and the two together form the video image. The video image therefore has the same timestamp as its original image and text content, and the corresponding original image and text content can be obtained from the timestamp of each video image in the target video image sequence. Here, the original image is the part of the video image that contains no text content, and the text content is the text in the video image, for example a line of dialogue. Obtaining the original image and the text content is not limited to using the timestamp of the video image; it may be customized, for example by performing image extraction and text recognition on the video image.

The original image and the text content corresponding to a video image in the target video image sequence are explained with reference to fig. 3A, which is a schematic diagram of a video image in the target video image sequence in one embodiment. Fig. 3A shows one video image of the sequence, composed of original image a and the corresponding text content "Evolution in high-energy cosmic flows may have led to early life on Earth". According to the timestamp of this video image, the text content "Evolution in high-energy cosmic flows may have led to early life on Earth" and original image a can be obtained.

In one embodiment, after the target video image sequence is obtained, corresponding original images and corresponding text contents can be obtained according to timestamps of the video images in the target video image sequence, the obtained original images are combined into an original image set, and the obtained text contents are combined into a text content set.
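
The timestamp-based lookup can be sketched with two dictionaries keyed by timestamp. This assumes a hypothetical `picture_track` (timestamp to text-free original image id) and `subtitle_track` (timestamp to text content) whose keys exactly equal the timestamps of the composed video images, as the shared-timestamp scheme above implies; `VideoImage` and `split_by_timestamp` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VideoImage:
    timestamp_ms: int  # video playing timestamp of the composed video image


def split_by_timestamp(sequence: List[VideoImage],
                       picture_track: Dict[int, str],
                       subtitle_track: Dict[int, str]) -> Tuple[List[str], List[str]]:
    """Look up the text-free original image and the text content of each
    video image by its timestamp."""
    original_images: List[str] = []
    text_contents: List[str] = []
    for image in sequence:
        ts = image.timestamp_ms
        # Skip a video image unless both parts exist, so the sets stay aligned.
        if ts in picture_track and ts in subtitle_track:
            original_images.append(picture_track[ts])
            text_contents.append(subtitle_track[ts])
    return original_images, text_contents


seq = [VideoImage(1000), VideoImage(2000), VideoImage(3000)]
pictures = {1000: "a", 2000: "c", 3000: "d"}
subtitles = {1000: "hello", 2000: "bye"}
print(split_by_timestamp(seq, pictures, subtitles))  # (['a', 'c'], ['hello', 'bye'])
```

Real subtitle tracks usually store display intervals rather than per-frame timestamps; in that case the exact-key lookup would become an interval lookup.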

In another embodiment, after a target video image sequence is acquired, image extraction and character recognition are performed on each video image in the target video image sequence to obtain corresponding original images and text contents respectively, the extracted original images form an original image set, and the recognized text contents form a text content set.

In one embodiment, as shown in fig. 4, acquiring target video stream data from a target video according to a video screenshot request includes:

Step 402: acquiring the video playing time point corresponding to the video screenshot request.

The video playing time point is the position in the video being played by the playing application; the video playing time point corresponding to the video screenshot request is the position of the picture being played at the moment the request is triggered. Specifically, the video screenshot request can be triggered by a screenshot button of the playing application, and as soon as the playing application detects the request, it acquires the corresponding video playing time point. The video screenshot request can also be triggered by an independent service, which likewise obtains from the playing application the position of the video picture being played when the request was generated.

Step 404: acquiring forward video stream data within a preset back-off time interval according to the video playing time point.

Step 406: acquiring backward video stream data within a preset forward time interval according to the video playing time point.

The forward video stream data is the video stream data within a first preset time interval before the video playing time point corresponding to the video screenshot request, and the backward video stream data is the video stream data within a second preset time interval after that time point. The first and second preset time intervals can be set according to actual needs.

Specifically, after the video playing time point corresponding to the video screenshot request is obtained, the preset back-off time interval is obtained; moving back from the video playing time point by the back-off interval gives a first playing time point, and the video stream data between the first playing time point and the video playing time point is used as the forward video stream data. Similarly, the preset forward time interval is obtained; moving ahead from the video playing time point by the forward interval gives a second playing time point, and the video stream data between the video playing time point and the second playing time point is used as the backward video stream data. The first preset time interval may be set equal to the preset back-off time interval, and the second preset time interval equal to the preset forward time interval.

Step 408: splicing the forward video stream data and the backward video stream data to obtain the target video stream data.

The target video stream data is obtained from the forward video stream data and the backward video stream data. Specifically, the two can be combined directly to form the target video stream data; alternatively, part of the forward video stream data can be extracted as target forward video stream data, part of the backward video stream data extracted as target backward video stream data, and the two combined to form the target video stream data.

For example, assume the preset back-off time interval is 30 seconds, the preset forward time interval is 60 seconds, and the video is 1 hour long. When the playing application reaches the 10-minute mark of the current video, the end user clicks the screenshot button of the playing application to trigger a video screenshot request. The playing application obtains the corresponding video playing time point (10 minutes) and, using it as the reference point, acquires the forward video stream data within the 30-second back-off interval and the backward video stream data within the 60-second forward interval, then combines them into the target video stream data, which therefore covers 90 seconds of video.
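
The arithmetic of this example can be written out as follows, with $t_0$ the video playing time point, $\Delta_b$ the preset back-off interval, $\Delta_f$ the preset forward interval and $T$ the video length (symbols introduced here for illustration). The clipping to $[0, T]$ in the last line is an assumption for requests near the start or end of the video; the text does not specify that case, and it has no effect in this example.

```latex
\begin{aligned}
t_0 &= 10\,\text{min} = 600\,\text{s}, \quad \Delta_b = 30\,\text{s}, \quad \Delta_f = 60\,\text{s}, \quad T = 3600\,\text{s} \\
[\,t_0 - \Delta_b,\; t_0 + \Delta_f\,] &= [\,570\,\text{s},\; 660\,\text{s}\,] \quad (\text{09:30 to 11:00}) \\
(t_0 + \Delta_f) - (t_0 - \Delta_b) &= \Delta_b + \Delta_f = 90\,\text{s} \\
\text{window (with clipping)} &= [\,\max(0,\, t_0 - \Delta_b),\; \min(T,\, t_0 + \Delta_f)\,]
\end{aligned}
```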

In one embodiment, as shown in fig. 5, obtaining a corresponding target video image sequence from target video stream data includes:

Step 502: obtaining a corresponding initial video image sequence according to the target video stream data.

The initial video image sequence is the set of video images corresponding to the target video stream data. Since the target video stream data consists of multiple video images, the corresponding initial video image sequence can be obtained from it; specifically, the target video stream data may be split into frames with an open-source tool, such as the open-source program ffmpeg.
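
Splitting a time window of the video into frames with ffmpeg could look like the sketch below, which only builds the command line. It assumes a local source file, the window from the earlier example (start at 570 s, 90 s long), and a sampling rate of 2 frames per second; the sampling rate, file names and the helper `ffmpeg_frame_cmd` are illustrative choices, not taken from the text, which does not say whether every frame is extracted.

```python
import shlex
from typing import List


def ffmpeg_frame_cmd(src: str, start_s: float, length_s: float,
                     fps: float, out_pattern: str) -> List[str]:
    """Build an ffmpeg command that writes frames of [start_s, start_s + length_s)."""
    return [
        "ffmpeg",
        "-ss", f"{start_s:g}",   # seek to the window start (before -i: input seeking)
        "-i", src,
        "-t", f"{length_s:g}",   # stop after this many seconds of output
        "-vf", f"fps={fps:g}",   # sample the decoded video at a fixed frame rate
        out_pattern,             # numbered image files, e.g. frame_0001.png
    ]


cmd = ffmpeg_frame_cmd("input.mp4", 570, 90, 2, "frame_%04d.png")
print(shlex.join(cmd))
# To actually run it (requires ffmpeg on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Sampling at a low fixed rate keeps the initial video image sequence small; subtitle lines typically stay on screen for a second or more, so few lines should be missed, and the deduplication step below removes the resulting repeats.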

Step 504: identifying the target text content corresponding to each video image in the initial video image sequence.

Specifically, since each video image in the initial video image sequence consists of an original image and its text content, the sequence needs to be deduplicated according to the text content of each video image in order to remove video images that share the same text content. Text recognition is therefore performed on each video image in the initial video image sequence to obtain its target text content. This can be done with a text recognition component, or with a text extraction component that extracts the text content of each video image.

Step 506: deduplicating the video images corresponding to consecutive identical target text content to obtain the target video image sequence.

Specifically, the video images in the target video stream data may include several with the same text content, and recognizing the same text repeatedly adds little value, so the video images with repeated text content in the initial video image sequence need to be deduplicated. After the target video stream data is acquired, it is split into frames with a suitable tool, for example the open-source program ffmpeg, to obtain the initial video image sequence. The text content of each video image is then recognized by a text recognition component to obtain its target text content, and finally the video images corresponding to consecutive identical target text content are deduplicated to obtain the target video image sequence.

For example, suppose the initial video image sequence obtained from the target video stream data has 5 video images, whose recognized target text contents are, in order: "The weather is really nice today", "Let's plan an outing when we're free", and then "Let's go on an outing, friends" for each of the last three. Deduplicating the video images corresponding to consecutive identical target text content removes the last two video images with the text "Let's go on an outing, friends", so the target video image sequence consists of the video image for "The weather is really nice today", the video image for "Let's plan an outing when we're free", and the first video image for "Let's go on an outing, friends".
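
The consecutive-text deduplication in this example maps directly onto `itertools.groupby`, which groups only adjacent items with equal keys. In this minimal sketch each video image is represented, as an assumption, by a `(frame index, recognized text)` tuple; `dedup_consecutive_text` is an illustrative name.

```python
from itertools import groupby
from typing import List, Tuple

Frame = Tuple[int, str]  # (frame index, recognized target text content)


def dedup_consecutive_text(frames: List[Frame]) -> List[Frame]:
    """Keep only the first frame of every run of frames with identical text."""
    return [next(run) for _, run in groupby(frames, key=lambda f: f[1])]


frames = [
    (1, "The weather is really nice today"),
    (2, "Let's plan an outing when we're free"),
    (3, "Let's go on an outing, friends"),
    (4, "Let's go on an outing, friends"),
    (5, "Let's go on an outing, friends"),
]
print([i for i, _ in dedup_consecutive_text(frames)])  # [1, 2, 3]
```

Because `groupby` only merges adjacent items, a line that reappears later in the clip is kept, which matches the "consecutive identical" condition rather than global deduplication.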

In one embodiment, as shown in fig. 6, the video data processing method shown in fig. 5 further includes:

Step 602: displaying each target video image in the target video image sequence, where each target video image includes its target text content.

Step 604: acquiring a second image splicing instruction, acquiring the video images to be spliced from the target video image sequence according to the second image splicing instruction, and splicing them to obtain a second spliced image.

Specifically, after the video images corresponding to consecutive identical target text content are deduplicated to obtain the target video image sequence, each target video image, including its target text content, can be displayed at the front end, either in a newly created display window or in the current playing window of the playing application. When the text content of a target video image cannot be separated from its original image, the target video images can be displayed directly at the front end. The user can then operate on the displayed target video images to trigger a second image splicing instruction, that is, an instruction to splice images. The terminal acquires the video images to be spliced from the target video image sequence according to the second image splicing instruction and splices them to obtain the second spliced image. The splicing format of the second spliced image can be customized, for example top to bottom or a 2 × 2 grid.
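
The 2 × 2 grid format can be sketched as a row-by-row cell layout. This assumes, for illustration only, that every selected video image has first been scaled to the same cell size; `grid_layout` is a hypothetical helper that returns the canvas size and each image's top-left offset.

```python
from typing import List, Tuple


def grid_layout(n: int, cell_w: int, cell_h: int,
                cols: int = 2) -> Tuple[Tuple[int, int], List[Tuple[int, int]]]:
    """Canvas size and top-left offsets for n equally sized cells, row by row."""
    rows = -(-n // cols)  # ceiling division
    offsets = [((i % cols) * cell_w, (i // cols) * cell_h) for i in range(n)]
    return (cols * cell_w, rows * cell_h), offsets


canvas, offsets = grid_layout(4, 320, 180)
print(canvas, offsets)  # (640, 360) [(0, 0), (320, 0), (0, 180), (320, 180)]
```

With fewer than four images the last row is simply left partly empty; for example, three selected images still produce a two-row canvas.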

In one embodiment, fig. 6A is a schematic interface diagram showing a target video image sequence. After the target video image sequence is obtained, a new display window may be created at the front end, and each target video image in the sequence, including its target text content, is displayed in the lower part of that window, as shown in fig. 6A. The end user can click target video images to trigger the second image splicing instruction, according to which the video images to be spliced are acquired from the target video image sequence; if, for example, they are the 1st, 2nd and 4th target video images, these are spliced to obtain the second spliced image. The second spliced image may use the splicing format shown in fig. 2D or fig. 2E.

In an embodiment, as shown in fig. 7, acquiring an original image corresponding to a video image in a target video image sequence to obtain an original image set, and acquiring text content corresponding to the video image in the target video image sequence to obtain a text content set, includes:

Step 702: obtaining the video playing timestamp corresponding to each video image in the target video image sequence.

Step 704: obtaining the matching original image according to the video playing timestamp of each video image to obtain a current original image set, and obtaining the matching text content according to the video playing timestamp of each video image to obtain a current text content set.

The video playing timestamp is the time point at which the video image is played. Each video image in the target video image sequence is composed of an original image and its text content, which share the same timestamp, so the video playing timestamp of the composed video image is also the same as theirs. The matching original image and text content can therefore be obtained from the video playing timestamp of each video image: the original images form the current original image set and the text contents form the current text content set. For example, suppose a video image in the target video image sequence is composed of original image a and text content b, nested so that they cannot be separated in the video image. Original image a and text content b as they were before nesting are obtained according to the video playing timestamp of that video image; original image a is added to the current original image set, and text content b is added to the current text content set.

Step 706: deduplicating the original images corresponding to consecutive identical text content in the current original image set to obtain the original image set.

Step 708: deduplicating consecutive identical text content in the current text content set to obtain the text content set.

The current original image set and the current text content set simply hold the original image and the text content of every video image in the target video image sequence, so both need to be deduplicated: original images corresponding to consecutive identical text content are deduplicated, and consecutive identical text contents are deduplicated. Specifically, the text content corresponding to each original image in the current original image set is obtained, and the original images corresponding to consecutive identical text content are deduplicated to obtain the original image set. Likewise, consecutive identical text contents in the current text content set are deduplicated to obtain the text content set.

For example, the current original image set includes original images a, b and c, and their text contents in the current text content set are "The weather is really nice today", "The weather is really nice today" and "Let's go on an outing", respectively. Deduplicating the original images corresponding to consecutive identical text content yields an original image set of original images a and c, and deduplicating consecutive identical text content yields a text content set of "The weather is really nice today" and "Let's go on an outing".
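
Steps 706 and 708 can be performed in a single pass when the two current sets are kept as parallel lists. This sketch assumes that `images[i]` and `texts[i]` come from the same video image (same timestamp), as produced in step 704; `dedup_sets` is an illustrative name.

```python
from typing import List, Tuple


def dedup_sets(images: List[str], texts: List[str]) -> Tuple[List[str], List[str]]:
    """images[i] and texts[i] belong to the same video image (same timestamp).

    An entry is dropped when its text equals the text of the entry kept just
    before it, so both sets are deduplicated in step.
    """
    if len(images) != len(texts):
        raise ValueError("each original image needs exactly one text content")
    kept_images: List[str] = []
    kept_texts: List[str] = []
    for image, text in zip(images, texts):
        if kept_texts and kept_texts[-1] == text:
            continue
        kept_images.append(image)
        kept_texts.append(text)
    return kept_images, kept_texts


images = ["a", "b", "c"]
texts = ["The weather is really nice today",
         "The weather is really nice today",
         "Let's go on an outing"]
print(dedup_sets(images, texts))
# (['a', 'c'], ['The weather is really nice today', "Let's go on an outing"])
```

Deduplicating both lists in the same loop guarantees that the resulting original image set and text content set stay index-aligned for the later combination step.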

In one embodiment, as shown in fig. 8, the first image splicing instruction includes a plurality of combination sub-instructions and a splicing sub-instruction; acquiring the first image splicing instruction, acquiring the target original image from the original image set and the target text content from the text content set according to it, and splicing the target original image and the target text content to obtain the target spliced image includes the following steps:

Step 802: acquiring a combination sub-instruction, and acquiring a combination pair according to the combination sub-instruction, where the combination pair includes a target original image to be combined and a target text content to be combined.

A combination sub-instruction instructs that a target text content be combined with a target original image; each combination sub-instruction yields one combination pair, consisting of a target original image and a target text content. The first image splicing instruction includes a plurality of combination sub-instructions and a splicing sub-instruction, the splicing sub-instruction being an instruction to splice the combined images. The combined images correspond one to one with the combination sub-instructions, so the combined image for each sub-instruction can be obtained from it, and each combined image includes a target original image to be combined and a target text content to be combined. Specifically, the end user may select, from the target original images and target text contents displayed at the front end, a target original image to be combined and a target text content to be combined, and pair them into a combination pair, which triggers generation of the combination sub-instruction.

Step 804: combining the matched target original image to be combined and target text content to be combined to obtain the combined image corresponding to the combination pair.

Specifically, after the end user selects the target original image to be combined and the target text content to be combined from those displayed at the front end, the matched pair can be combined into the combined image corresponding to the combination pair, for example by placing the target text content into the matched target original image. For example, if the target original images to be combined are a, b and c and the target text contents to be combined are A, B and C, combining the matched pairs may yield the combined images Ab, Ba and Cc.

Step 806: obtaining the plurality of combined images corresponding to the plurality of combination sub-instructions.

Step 808: splicing the plurality of combined images according to the splicing sub-instruction to obtain the target spliced image.

After the combined image for each combination sub-instruction is obtained, the plurality of combined images are spliced in a preset format according to the splicing sub-instruction in the first image splicing instruction to obtain the target spliced image. The preset splicing format of the target spliced image can be customized, for example as shown in fig. 2D or fig. 2E.
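
The relationship between combination sub-instructions, combined images and the splicing sub-instruction can be sketched as plain data. In this sketch each combination sub-instruction is assumed to carry one `(image id, text)` pair, and the splicing sub-instruction is assumed, for illustration only, to be a top-to-bottom ordering of the combined images; `CombinedImage`, `combine` and `splice` are hypothetical names, and actual pixel drawing is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CombinedImage:
    original_image: str  # id of the target original image to be combined
    caption: str         # target text content placed into that image


def combine(pairs: List[Tuple[str, str]]) -> List[CombinedImage]:
    """One combination sub-instruction -> one (image, text) pair -> one combined image."""
    return [CombinedImage(image, text) for image, text in pairs]


def splice(combined: List[CombinedImage], order: List[int]) -> List[CombinedImage]:
    """Apply the splicing sub-instruction, modelled here as a top-to-bottom order."""
    if sorted(order) != list(range(len(combined))):
        raise ValueError("order must use every combined image exactly once")
    return [combined[i] for i in order]


# Combined images "Ab", "Ba" and "Cc": text A on image b, text B on image a, text C on image c.
combined = combine([("b", "A"), ("a", "B"), ("c", "C")])
print([c.caption + c.original_image for c in splice(combined, [0, 1, 2])])  # ['Ab', 'Ba', 'Cc']
```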

In one embodiment, this is described in detail with reference to fig. 8A, a schematic diagram of obtaining target original images and target text contents from the original image set and the text content set according to the first image splicing instruction. As shown in fig. 8A, the target original images of the original image set and the target text contents of the text content set are displayed on the left and right of the window, respectively. The end user selects the target original images to be combined and the target text contents to be combined, and forms each combined image by dragging a target text content to be combined onto its matching target original image. Once every target text content to be combined has been dragged onto its matching target original image, the combined images are spliced according to the splicing sub-instruction to obtain the target spliced image. The splicing format can be customized: for example, the combined images may be spliced from top to bottom as in fig. 2D, or overlapped as in fig. 2E, to obtain the first spliced image. The splicing format is not limited to those shown in fig. 2D and fig. 2E.

In one embodiment, obtaining the video screenshot request includes: obtaining the video screenshot request through an operation on the playing application interface of the current video; and displaying the original image set and the text content set includes: displaying the original image set and the text content set in a splicing window newly created in the playing application interface.

Specifically, when the playing application plays the current video, a corresponding control, such as a video screenshot button, is provided on the playing application interface of the current video, and clicking the video screenshot button triggers generation of the video screenshot request. To avoid affecting playback of the current video on the terminal, the playing application may newly create a splicing window to display the original image set and the text content set. Specifically, while the playing application plays the current video, clicking the relevant control on the playing application generates a video screenshot request; the target video is automatically acquired from the current video according to the request, the corresponding target video image sequence is obtained from the target video, and the original images and text contents corresponding to the video images in the target video image sequence are acquired to form the original image set and the text content set. Once a video screenshot request is generated, the playing application creates a splicing window that displays the original image set and the text content set for the terminal user to select from. The newly created splicing window may be the window shown in fig. 2C.

In one embodiment, acquiring the video screenshot request comprises: acquiring the video screenshot request through an operation on the interface of a splicing application. Acquiring the target video according to the video screenshot request comprises: sending the video screenshot request to the playing application in which the current video is located, so that the playing application returns the target video.

To avoid affecting the playing application's playback of the current video, the video screenshot request can instead be generated through the splicing application. Specifically, a screenshot button can be provided on the interface of the splicing application, and the video screenshot request is generated by clicking that button, or is generated automatically by the splicing application at a preset time. When the terminal generates the video screenshot request through the screenshot button on the splicing application interface, the splicing application sends the request to the playing application of the current video, and the playing application acquires the target video from the current video according to the request and returns it to the splicing application.
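The request routing between the two applications can be sketched with in-process objects standing in for real inter-application messaging. This is an illustrative assumption only: ScreenshotRequest, TargetVideo, PlayingApplication and SplicingApplication are hypothetical names, and the 90-second preset duration is borrowed from the fig. 9 example purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScreenshotRequest:
    origin: str  # where the request was triggered, e.g. "splicing_app"


@dataclass(frozen=True)
class TargetVideo:
    video_id: str
    trigger_time_s: float     # playback position when the request arrived
    preset_duration_s: float  # length of the clip to cut around that position


class PlayingApplication:
    """Hypothetical stand-in for the player that owns the current video."""

    def __init__(self, video_id: str, preset_duration_s: float = 90.0):
        self.video_id = video_id
        self.position_s = 0.0  # advanced by playback; screenshot requests never pause it
        self.preset_duration_s = preset_duration_s

    def handle_screenshot_request(self, request: ScreenshotRequest) -> TargetVideo:
        # Read the current position and answer; playback state is left untouched.
        return TargetVideo(self.video_id, self.position_s, self.preset_duration_s)


class SplicingApplication:
    """Hypothetical splicing app that only talks to the player through requests."""

    def __init__(self, player: PlayingApplication):
        self.player = player
        self.received: List[TargetVideo] = []

    def on_screenshot_button(self) -> TargetVideo:
        target = self.player.handle_screenshot_request(ScreenshotRequest("splicing_app"))
        self.received.append(target)  # later steps (deframing, splicing) run here
        return target
```

The design point is that the splicing application holds no reference to the playback state beyond the returned TargetVideo, so the player keeps playing while the splicing application processes the clip.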

Further, after receiving the target video, the splicing application obtains the corresponding target video image sequence from it, and then acquires the original images and text contents corresponding to the video images in the target video image sequence to form the original image set and the text content set. The splicing application can display the two sets in a newly created window or in its current window. A first image splicing instruction is then generated through an operation on the splicing application; the splicing application acquires the target original image and the target text content from the original image set and the text content set according to this instruction, and finally splices them to obtain the target spliced image.

In a specific embodiment, a video data processing method is provided, which specifically includes the following steps:

1. When the terminal plays the current video, acquire a video screenshot request.

1-1. The terminal acquires the video screenshot request through an operation on the playing application interface of the current video.

1-2. Alternatively, the terminal acquires the video screenshot request through the interface of the splicing application and sends it to the playing application of the current video, so that the playing application returns the target video. The splicing application receives the target video and can execute all of the following steps; that is, the splicing application is the execution subject of all of the following steps.

2. Acquire the target video according to the video screenshot request, acquire a plurality of corresponding original images from the target video to obtain an original image set, and acquire a plurality of corresponding text contents from the target video to obtain a text content set.

2-1. Acquire target video stream data from the target video according to the video screenshot request.

2-1-1. Acquire the video playing time point corresponding to the video screenshot request.

2-1-2. Acquire forward video stream data within a preset back-off time interval according to the video playing time point.

2-1-3. Acquire backward video stream data within a preset forward time interval according to the video playing time point.

2-1-4. Splice the forward video stream data and the backward video stream data to obtain the target video stream data.

2-2. Obtain a corresponding target video image sequence according to the target video stream data.

2-2-1. Obtain a corresponding initial video image sequence according to the target video stream data.

2-2-2. Identify the target text content corresponding to each video image in the initial video image sequence.

2-2-3. Deduplicate the video images corresponding to consecutive identical target text content to obtain the target video image sequence.

2-2-4. Display each target video image in the target video image sequence, the target video image including the corresponding target text content.

2-2-5. Acquire a second image splicing instruction, acquire the video images to be spliced from the target video image sequence according to the second image splicing instruction, and splice all the video images to be spliced to obtain a second spliced image.

2-3. Acquire the original images and text contents corresponding to the video images in the target video image sequence to obtain an original image set and a text content set.

2-3-1. Acquire the video playing timestamp corresponding to each video image in the target video image sequence.

2-3-2. Acquire the matched original image and text content according to the video playing timestamp of each video image to form a current original image set and a current text content set.

2-3-3. Deduplicate the original images corresponding to consecutive identical text content in the current original image set to obtain the original image set.

2-3-4. Deduplicate consecutive identical text contents in the current text content set to obtain the text content set.

3. Display the original image set and the text content set.

3-1. Display the original image set and the text content set through a splicing window newly created on the playing application interface.

4. Acquire a first image splicing instruction, acquire a target original image from the original image set according to the first image splicing instruction, and acquire target text content from the text content set according to the first image splicing instruction.

4-1. The first image splicing instruction comprises a plurality of combination sub-instructions and a splicing sub-instruction. Acquire a combination sub-instruction and obtain a combination pair according to it, the combination pair comprising a target original image to be combined and target text content to be combined.

4-2. Combine the matched target original image to be combined with the target text content to be combined to obtain a combined image corresponding to the combination pair.

4-3. Acquire the plurality of combined images corresponding to the plurality of combination sub-instructions.

4-4. Splice the plurality of combined images according to the splicing sub-instruction to obtain the target spliced image.

5. Acquire the target video according to the video screenshot request, the target video comprising at least one video frame.

6. Display the at least one video frame in the target video.

7. Acquire a video frame selection instruction, select the video frames to be spliced from the at least one video frame according to the video frame selection instruction, and splice the video frames to be spliced to obtain a target spliced picture.
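Steps 2-3-1 to 2-3-4 can be sketched as follows. This is a minimal sketch assuming the text content comes from a subtitle track given as time-sorted, non-overlapping cues (start, end, text) and that each original image is identified by an id taken at the frame's playing timestamp; neither assumption comes from the application, and build_sets is a hypothetical name. A frame with no cue at its timestamp gets empty text.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# (start_s, end_s, text): a subtitle cue, sorted by start_s and non-overlapping (assumed)
Cue = Tuple[float, float, str]


def build_sets(
    frames: Sequence[Tuple[float, str]], cues: Sequence[Cue]
) -> Tuple[List[str], List[str]]:
    """frames: (video playing timestamp in seconds, original image id) in playback order."""
    starts = [cue[0] for cue in cues]
    current_images: List[str] = []
    current_texts: List[str] = []
    # 2-3-1 / 2-3-2: match each frame to the text shown at its timestamp
    for ts, image_id in frames:
        i = bisect_right(starts, ts) - 1  # last cue starting at or before ts
        text = cues[i][2] if i >= 0 and ts < cues[i][1] else ""
        current_images.append(image_id)
        current_texts.append(text)
    # 2-3-3 / 2-3-4: drop frames whose text repeats the previous kept text
    image_set: List[str] = []
    text_set: List[str] = []
    for image_id, text in zip(current_images, current_texts):
        if text_set and text_set[-1] == text:
            continue
        image_set.append(image_id)
        text_set.append(text)
    return image_set, text_set
```

With one frame per second over cues "Hi" (0 to 2 s), "Where are you going?" (2 to 4.5 s) and "Home" (6 to 8 s), eight frames collapse to four entries: one per cue plus one for the uncaptioned gap. Only consecutive repeats are removed, so a line that recurs later in the clip is kept again.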

In a video playing application scenario, as shown in fig. 9, which is a schematic diagram illustrating the principle of the video data processing method in one embodiment, a screenshot button is provided on the interface of the playing application. When the screenshot button is clicked while the current video is playing, the playing application automatically acquires the current playing time point, which is the time point at which the video screenshot request was triggered. The playing application then acquires the target video stream data from the current video according to the video screenshot request: for example, taking the trigger time point as the reference, 30 seconds of video stream data before that point and 60 seconds after it are captured, and the resulting 90 seconds of video stream data are used as the target video stream data. A corresponding target video image sequence is then obtained from the target video stream data; for example, the ffmpeg tool may be used to split the target video stream data into the frames that form the target video image sequence.
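The capture window and the ffmpeg deframing call can be sketched as below. The 30-second back-off and 60-second forward intervals come from the example above; clamping to the video bounds, sampling at 1 frame per second, and the file names are assumptions for illustration. The ffmpeg options used (-ss, -i, -t and the fps video filter) are standard ffmpeg options, while capture_window and ffmpeg_deframe_cmd are hypothetical helper names.

```python
import shlex
from typing import List, Tuple


def capture_window(
    t_s: float, duration_s: float, back_s: float = 30.0, ahead_s: float = 60.0
) -> Tuple[float, float]:
    """Return (start, end) in seconds around the trigger time t_s.

    [start, t_s) is the forward data from the back-off interval and
    [t_s, end) is the backward data from the forward interval; both are
    clamped to the bounds of the current video.
    """
    start = max(0.0, t_s - back_s)
    end = min(duration_s, t_s + ahead_s)
    return start, end


def ffmpeg_deframe_cmd(
    src: str, start: float, end: float, fps: float = 1.0, pattern: str = "frame_%04d.png"
) -> List[str]:
    """Build an ffmpeg argv that cuts [start, end) from src and writes numbered images."""
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",       # seek to the start of the window
        "-i", src,
        "-t", f"{end - start:.3f}",  # keep only the window's duration
        "-vf", f"fps={fps:g}",       # sample frames at a fixed rate
        pattern,                     # numbered output images
    ]


if __name__ == "__main__":
    start, end = capture_window(754.0, 1500.0)
    print(shlex.join(ffmpeg_deframe_cmd("episode.mp4", start, end)))
```

Run as a script, this prints ffmpeg -ss 724.000 -i episode.mp4 -t 90.000 -vf fps=1 frame_%04d.png; the list can be passed to subprocess.run(cmd, check=True) where ffmpeg is installed. Without the fps filter, ffmpeg would write every decoded frame, which the deduplication step would then have to thin out.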

Further, the original images and text contents corresponding to the video images in the target video image sequence are acquired to form the original image set and the text content set. Specifically, a first variable PicList[] for storing original images and a second variable word[] for storing text contents may be defined, and a loop for (i = 0; i < len(List); i++) is executed over the video images in List. On each iteration, if the whole List has been traversed, PicList is output; otherwise the next video image is read. For each video image that is read, a text recognition component identifies the text content in the image, and the recognized text is saved as the variable string. It is then determined whether string equals the last value of the word array, that is, whether the text content of the current video image repeats that of the previous video image. If it does, the video image is discarded; if it does not, the original image corresponding to the video image is stored in PicList and its text content is stored in the word array. When the loop finishes, the original image set and the text content set are output.
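The PicList/word loop translates directly into Python. In this sketch the text recognition component is passed in as a function, since the application does not name one; recognize_text and dedupe_by_text are hypothetical names, and any OCR engine or a test stub can be plugged in.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def dedupe_by_text(
    frames: Sequence[Frame], recognize_text: Callable[[Frame], str]
) -> Tuple[List[Frame], List[str]]:
    """Keep a frame only when its recognized text differs from the last kept text."""
    pic_list: List[Frame] = []  # PicList[]: kept original images
    word: List[str] = []        # word[]: kept text contents
    for frame in frames:        # for (i = 0; i < len(List); i++)
        string = recognize_text(frame)
        if word and string == word[-1]:
            continue            # repeats the previous text: discard this frame
        pic_list.append(frame)
        word.append(string)
    return pic_list, word       # original image set, text content set
```

Comparing against word[-1] is equivalent to comparing against the previous frame's text, because every discarded frame carried the same text as the last kept one. Real OCR output may need normalisation (whitespace, punctuation) before the comparison for this to collapse repeats reliably.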

Finally, the terminal can display the original image set and the text content set through the front end, either in a newly created splicing window or directly on the playing application interface of the current video. The terminal user generates a first image splicing instruction by operating on the original image set and the text content set; the target original image and target text content are acquired from the two sets according to this instruction and then spliced to obtain the target spliced image.

It should be understood that, although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

As shown in fig. 10, in one embodiment, there is provided a video data processing apparatus 1000, including:

the first obtaining module 1002 is configured to obtain a video screenshot request.

A second obtaining module 1004, configured to obtain a target video according to the video screenshot request, where the target video includes at least one video frame.

A presentation module 1006, configured to present at least one video frame in the target video.

The splicing module 1008 is configured to obtain a video frame selection instruction, select a video frame to be spliced from at least one video frame according to the video frame selection instruction, and splice the video frame to be spliced to obtain a target spliced picture.

In one embodiment, the splicing module 1008 is further configured to obtain a text content editing instruction, and edit the video frame to be spliced according to the text content editing instruction to obtain an edited video frame to be spliced; and splicing the edited video frames to be spliced to obtain a target spliced picture.
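The select-then-edit sequence of the splicing module can be sketched as two small functions. Assumptions, not taken from the application: a frame is reduced to an id plus its text, the selection instruction is a set of frame ids, selected frames keep playback order, and the editing instruction is a mapping from frame id to replacement text. VideoFrame, select_frames and apply_text_edits are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Sequence


@dataclass(frozen=True)
class VideoFrame:
    frame_id: str
    text: str  # text content shown on the frame


def select_frames(frames: Sequence[VideoFrame], selected_ids: Iterable[str]) -> List[VideoFrame]:
    """Video frame selection instruction: keep the chosen frames, in playback order."""
    wanted = set(selected_ids)
    return [f for f in frames if f.frame_id in wanted]


def apply_text_edits(frames: Sequence[VideoFrame], edits: Dict[str, str]) -> List[VideoFrame]:
    """Text content editing instruction: frame_id -> replacement text."""
    return [replace(f, text=edits.get(f.frame_id, f.text)) for f in frames]
```

Because the frames are immutable and dataclasses.replace returns copies, editing never alters the displayed source frames; only the edited copies go on to be spliced into the target spliced picture.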

In one embodiment, the second obtaining module 1004 is further configured to obtain at least one original image from the target video to obtain an original image set, and obtain at least one text content from the target video to obtain a text content set.

The presentation module 1006 is further configured to present at least one original image in the original image set and present at least one text content in the text content set.

The splicing module 1008 is further configured to obtain a first image splicing instruction, select a target original image from the at least one original image according to the first image splicing instruction, and select a target text content from the at least one text content according to the first image splicing instruction.

The splicing module 1008 is further configured to splice the target original image and the target text content to obtain a target spliced picture.

As shown in fig. 11, in one embodiment, the second obtaining module 1004 includes:

a target video stream data acquiring unit 1004a, configured to acquire target video stream data from the target video according to the video screenshot request.

A target video image sequence obtaining unit 1004b, configured to obtain a corresponding target video image sequence according to the target video stream data.

An original image set obtaining unit 1004c, configured to obtain an original image corresponding to a video image in the target video image sequence to obtain an original image set, and obtain a text content corresponding to the video image in the target video image sequence to obtain a text content set.

In one embodiment, as shown in fig. 12, the target video stream data acquisition unit 1004a includes:

a video playing time point obtaining subunit 1202, configured to obtain a video playing time point corresponding to the video screenshot request.

A forward video stream data obtaining subunit 1204, configured to obtain, according to the video playing time point, forward video stream data within a preset back-off time interval.

A backward video stream data obtaining subunit 1206, configured to obtain, according to the video playing time point, backward video stream data within a preset forward time interval.

A target video stream data combining subunit 1208, configured to splice the forward video stream data and the backward video stream data to obtain the target video stream data.

In one embodiment, as shown in fig. 13, the target video image sequence acquisition unit 1004b includes:

the initial video image sequence obtaining subunit 1302 is configured to obtain a corresponding initial video image sequence according to the target video stream data.

An initial video image sequence identifying subunit 1304, configured to identify target text content corresponding to each video image in the initial video image sequence.

A target text content deduplication subunit 1306, configured to deduplicate the video images corresponding to consecutive identical target text content to obtain the target video image sequence.

In one embodiment, the video data processing apparatus shown in fig. 13 is further configured to display each target video image in the target video image sequence, the target video image including the corresponding target text content; and to acquire a second image splicing instruction, acquire the video images to be spliced from the target video image sequence according to the second image splicing instruction, and splice all the video images to be spliced to obtain a second spliced image.

In one embodiment, as shown in fig. 14, the original image set acquisition unit 1004c includes:

the video playing timestamp obtaining subunit 1402 is configured to obtain a video playing timestamp corresponding to each video image in the target video image sequence.

An original image and text content obtaining subunit 1404, configured to obtain a matched original image according to the video playing timestamp corresponding to each video image to obtain a current original image set, and obtain a matched text content according to the video playing timestamp corresponding to each video image to obtain a current text content set.

The current original image set deduplication subunit 1406 is configured to perform deduplication on original images corresponding to the same continuous text content in the current original image set to obtain an original image set.

The current text content set deduplication subunit 1408 is configured to deduplicate consecutive identical text content in the current text content set to obtain a text content set.

In one embodiment, the first image splicing instruction includes a plurality of combination sub-instructions and a splicing sub-instruction, and the video data processing apparatus 1000 is further configured to: acquire a combination sub-instruction and obtain a combination pair according to it, the combination pair including a target original image to be combined and target text content to be combined; combine the matched target original image to be combined with the target text content to be combined to obtain a combined image corresponding to the combination pair; acquire the plurality of combined images corresponding to the plurality of combination sub-instructions; and splice the plurality of combined images according to the splicing sub-instruction to obtain the target spliced image.

In one embodiment, the video data processing apparatus 1000 is further configured to acquire the video screenshot request through an operation on the playing application interface of the current video, and to display the original image set and the text content set through a splicing window newly created on the playing application interface.

In one embodiment, the video data processing apparatus 1000 is further configured to acquire the video screenshot request through an operation on the interface of the splicing application, and to send the video screenshot request to the playing application in which the current video is located, so that the playing application returns the target video.

FIG. 15 is a diagram showing the internal structure of a computer device in one embodiment. The computer device may specifically be a terminal. As shown in fig. 15, the computer device includes a processor, a memory, a network interface, an input device and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the video data processing method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the video data processing method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen; a key, trackball or touchpad arranged on the housing of the computer device; or an external keyboard, touchpad or mouse.

Those skilled in the art will appreciate that the structure shown in fig. 15 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.

In one embodiment, the video data processing apparatus provided herein may be implemented in the form of a computer program executable on a computer device such as that shown in fig. 15. The memory of the computer device may store the program modules constituting the video data processing apparatus, such as the first obtaining module, the second obtaining module, the presentation module and the splicing module shown in fig. 10. The computer program constituted by these program modules causes the processor to execute the steps of the video data processing method of the embodiments of the present application described in this specification.

For example, the computer device shown in fig. 15 may acquire the video screenshot request through the first obtaining module of the video data processing apparatus shown in fig. 10; acquire, through the second obtaining module, the target video according to the video screenshot request, each video frame of the target video comprising an original image and corresponding text content; display each video frame of the target video through the presentation module; and, through the splicing module, acquire the video frame selection instruction, select the video frames to be spliced according to it, and splice them to obtain the target spliced picture.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the video data processing method described above. Here, the steps of the video data processing method may be steps in the video data processing methods of the respective embodiments described above.

In one embodiment, a computer-readable storage medium is provided, in which a computer program is stored, which, when executed by a processor, causes the processor to carry out the steps of the above-mentioned video data processing method. Here, the steps of the video data processing method may be steps in the video data processing methods of the respective embodiments described above.

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims

1. A video data processing method, comprising:

acquiring a video screenshot request;

acquiring a target video according to the video screenshot request, wherein the target video comprises at least one video frame;

displaying the at least one video frame in the target video; the at least one video frame comprises at least one original image and at least one text content acquired from the target video;

acquiring a first image splicing instruction, selecting a target original image from the at least one original image according to the first image splicing instruction, and selecting target text content from the at least one text content according to the first image splicing instruction;

and splicing the target original image and the target text content to obtain a target spliced image.

2. The method of claim 1, further comprising:

acquiring a video frame selection instruction, and selecting a video frame to be spliced from the at least one video frame according to the video frame selection instruction;

acquiring a text content editing instruction, and editing the video frame to be spliced according to the text content editing instruction to obtain an edited video frame to be spliced;

and splicing the edited video frames to be spliced to obtain a target spliced picture.

3. The method of claim 1, wherein said presenting the at least one video frame in the target video comprises:

acquiring at least one original image from the target video to obtain an original image set, and acquiring at least one text content from the target video to obtain a text content set;

presenting the at least one original image in the set of original images;

presenting the at least one textual content of the set of textual contents.

4. The method of claim 1, wherein said obtaining a target video according to the video screenshot request comprises:

acquiring the triggering time when the video screenshot request is triggered;

and acquiring the target video according to the trigger time and a preset duration.

5. The method of claim 3, wherein the obtaining at least one original image from the target video to obtain an original image set and obtaining at least one text content from the target video to obtain a text content set comprises:

acquiring target video stream data from the target video according to the video screenshot request;

obtaining a corresponding target video image sequence according to the target video stream data;

and acquiring an original image corresponding to the video image in the target video image sequence to obtain an original image set, and acquiring text content corresponding to the video image in the target video image sequence to obtain a text content set.

6. The method of claim 5, wherein obtaining target video stream data from the target video according to the video screenshot request comprises:

acquiring a video playing time point corresponding to the video screenshot request;

acquiring forward video stream data within a preset back-off time interval according to the video playing time point;

acquiring backward video stream data within a preset forward time interval according to the video playing time point;

and splicing the forward video stream data and the backward video stream data to obtain the target video stream data.

7. The method of claim 5, wherein said deriving a corresponding sequence of target video images from said target video stream data comprises:

obtaining a corresponding initial video image sequence according to the target video stream data;

identifying target text content corresponding to each video image in the initial video image sequence;

and carrying out duplication removal on the video images corresponding to the continuous same target text content to obtain the target video image sequence.

8. The method of claim 7, further comprising:

displaying each target video image in the target video image sequence, wherein the target video image comprises corresponding target text content;

and acquiring a second image splicing instruction, acquiring video images to be spliced from the target video image sequence according to the second image splicing instruction, and splicing the video images to be spliced to obtain a second spliced image.

9. The method according to claim 5, wherein the obtaining of the original image corresponding to the video image in the target video image sequence to obtain an original image set and obtaining of the text content corresponding to the video image in the target video image sequence to obtain a text content set comprises:

acquiring video playing time stamps corresponding to all video images in the target video image sequence;

acquiring matched original images according to the video playing time stamps corresponding to the video images to obtain a current original image set;

acquiring matched text content according to the video playing time stamp corresponding to each video image to obtain a current text content set;

carrying out duplication removal on original images corresponding to continuous same text contents in the current original image set to obtain an original image set;

and carrying out duplication removal on continuous same text contents in the current text content set to obtain the text content set.

10. The method of claim 3, wherein the first image splicing instruction comprises a plurality of combination sub-instructions and a splicing sub-instruction; and the acquiring of the first image splicing instruction, the acquiring of a target original image from the original image set according to the first image splicing instruction, the acquiring of target text content from the text content set according to the first image splicing instruction, and the splicing of the target original image and the target text content to obtain a target spliced image comprise:

acquiring a combination sub-instruction, and acquiring a combination pair according to the combination sub-instruction, wherein the combination pair comprises an original image of a target to be combined and text content of the target to be combined;

combining the matched target original image to be combined with the target text content to be combined to obtain a combined image corresponding to the combined pair;

acquiring a plurality of combined images corresponding to the plurality of combined sub-instructions;

and splicing the plurality of combined images according to the splicing sub-instruction to obtain the target spliced image.
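The two kinds of sub-instruction above can be modelled as two small functions. This is a sketch under several assumptions that are not in the claim:

- Each combination sub-instruction is reduced to a hypothetical `(image_index, text_index)` pair.
- The splicing sub-instruction is reduced to an ordering of the combined images.
- Pixel data is replaced by string placeholders.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class CombinedImage:
    image: str  # stands in for decoded pixel data of a target original image
    text: str   # target text content laid out with that image


def combine(
    images: Sequence[str], texts: Sequence[str], pairs: Sequence[Tuple[int, int]]
) -> List[CombinedImage]:
    """Apply each combination sub-instruction, modelled as an (image, text) index pair."""
    return [CombinedImage(images[i], texts[j]) for i, j in pairs]


def splice(combined: Sequence[CombinedImage], order: Sequence[int]) -> List[CombinedImage]:
    """Apply the splicing sub-instruction: arrange combined images top to bottom."""
    return [combined[k] for k in order]


images = ["f0", "f1", "f2"]
texts = ["Hi", "How are you?", "Bye"]
panels = combine(images, texts, [(0, 0), (2, 1)])
result = splice(panels, [1, 0])
print([(p.image, p.text) for p in result])
# [('f2', 'How are you?'), ('f0', 'Hi')]
```

In this sketch, a pair is not required to match the image and the text by playback timestamp. Combining and ordering are kept as separate steps, mirroring the separate combination and splicing sub-instructions.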

11. The method of claim 3, wherein the obtaining a video screenshot request comprises:

acquiring the video screenshot request through an operation on the interface of the playing application in which the current video is played;

the presenting the at least one video frame in the target video comprises:

and displaying the original image set and the text content set in a splicing window newly created within the playing application interface.

12. The method of claim 1, wherein the obtaining a video screenshot request comprises:

acquiring the video screenshot request through an operation on an interface of a splicing application;

the obtaining of the target video according to the video screenshot request comprises:

and sending the video screenshot request to the playing application in which the current video is played, so that the playing application returns the target video.
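The request flow in this claim has the splicing application forward the request and the playing application return the target video. It can be sketched with two hypothetical classes. The claim does not specify the channel between the two applications, so a direct method call stands in for whatever inter-application mechanism a real system would use. `ScreenshotRequest`, `TargetVideo`, `video_id` and the frame timestamps are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScreenshotRequest:
    video_id: str  # hypothetical identifier of the current video


@dataclass(frozen=True)
class TargetVideo:
    video_id: str
    frame_timestamps_ms: Tuple[int, ...]


class PlayingApp:
    """Hypothetical playing application that owns the current video."""

    def __init__(self, video_id: str, frame_timestamps_ms: Tuple[int, ...]):
        self._video = TargetVideo(video_id, frame_timestamps_ms)

    def handle_screenshot_request(self, request: ScreenshotRequest) -> TargetVideo:
        if request.video_id != self._video.video_id:
            raise LookupError(f"video {request.video_id!r} is not being played")
        return self._video


class SplicingApp:
    """Hypothetical splicing application that receives the user's request."""

    def __init__(self, player: PlayingApp):
        self._player = player

    def on_screenshot_request(self, video_id: str) -> TargetVideo:
        # Forward the request; the playing application returns the target video.
        return self._player.handle_screenshot_request(ScreenshotRequest(video_id))


player = PlayingApp("episode-1", (0, 500, 1000))
splicer = SplicingApp(player)
print(splicer.on_screenshot_request("episode-1").frame_timestamps_ms)  # (0, 500, 1000)
```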

13. A video data processing apparatus, characterized in that the apparatus comprises:

a first acquisition module, configured to acquire a video screenshot request;

a second acquisition module, configured to acquire a target video according to the video screenshot request, wherein the target video comprises at least one video frame;

a first display module, configured to display the at least one video frame in the target video, wherein the at least one video frame comprises at least one original image and at least one text content acquired from the target video;

a selecting module, configured to acquire a first image splicing instruction, select a target original image from the at least one original image according to the first image splicing instruction, and select target text content from the at least one text content according to the first image splicing instruction;

and a first splicing module, configured to splice the target original image and the target text content to obtain a target spliced image.

14. The apparatus of claim 13, further comprising:

a third acquisition module, configured to acquire a video frame selection instruction and select a video frame to be spliced from the at least one video frame according to the video frame selection instruction;

a fourth acquisition module, configured to acquire a text content editing instruction and edit the video frame to be spliced according to the text content editing instruction, to obtain an edited video frame to be spliced;

and a second splicing module, configured to splice the edited video frames to be spliced to obtain a target spliced picture.
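The select, edit, then splice flow of these modules can be sketched as a single function. The sketch makes three assumptions that are not in the claim:

- The video frame selection instruction is modelled as indices into the frames.
- The text content editing instruction is modelled as a hypothetical map from a position in the selection to replacement text.
- The spliced picture is represented as the ordered list of edited frames rather than rendered pixels.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class VideoFrame:
    image: str  # placeholder for decoded pixel data
    text: str   # text content shown with the frame, e.g. a subtitle line


def build_spliced_picture(
    frames: Sequence[VideoFrame],
    selection: Sequence[int],
    edits: Dict[int, str],
) -> List[VideoFrame]:
    """Select frames, apply text edits, and return them in top-to-bottom order."""
    selected = [frames[i] for i in selection]
    # dataclasses.replace builds a new frozen frame with only `text` changed,
    # so the source frames are left untouched.
    return [replace(f, text=edits.get(pos, f.text)) for pos, f in enumerate(selected)]


frames = [VideoFrame("f0", "Hi"), VideoFrame("f1", "..."), VideoFrame("f2", "Bye")]
picture = build_spliced_picture(frames, [0, 2], {1: "See you!"})
print([(f.image, f.text) for f in picture])
# [('f0', 'Hi'), ('f2', 'See you!')]
```

Editing copies of the selected frames, rather than the frames themselves, means the user can discard an edit and re-splice from the unmodified originals.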

15. The apparatus of claim 13, wherein the first display module is further configured to:

display the at least one original image in the original image set;

display the at least one text content in the text content set.

16. The apparatus of claim 13, wherein the second acquisition module is further configured to:

acquire the trigger time at which the video screenshot request is triggered;

17. The apparatus of claim 15, wherein the first display module is further configured to:

18. The apparatus of claim 17, wherein the first display module is further configured to:

19. The apparatus of claim 17, wherein the first display module is further configured to:

20. The apparatus of claim 19, further comprising:

a second display module, configured to display each target video image in the target video image sequence, wherein each target video image comprises its corresponding target text content;

and a fifth acquisition module, configured to acquire a second image splicing instruction, acquire video images to be spliced from the target video image sequence according to the second image splicing instruction, and splice the video images to be spliced to obtain a second spliced image.

21. The apparatus of claim 17, wherein the first display module is further configured to:

22. The apparatus of claim 15, wherein the first image splicing instruction comprises a plurality of combination sub-instructions and a splicing sub-instruction; and, for obtaining the first image splicing instruction and obtaining a target original image from the original image set and target text content from the text content set according to the first image splicing instruction, the selecting module is further configured to:

23. The apparatus of claim 15, wherein the first acquisition module is further configured to:

the first display module is further configured to:

24. The apparatus of claim 13, wherein the first acquisition module is further configured to:

the second acquisition module is further configured to:

25. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 12.

26. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 12.