CN112235516A - Video generation method, device, server and storage medium - Google Patents

Video generation method, device, server and storage medium

Info

Publication number
CN112235516A
Authority
CN
China
Prior art keywords
video
target
background image
image
original
Prior art date
Legal status
Granted
Application number
CN202011019061.4A
Other languages
Chinese (zh)
Other versions
CN112235516B (en)
Inventor
李银辉
赵俊
李庆
***
刘凤华
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011019061.4A
Publication of CN112235516A
Application granted
Publication of CN112235516B
Active legal status
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272: Means for inserting a foreground image in a background image, i.e. inlay, outlay

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a video generation method, a video generation apparatus, a server, and a storage medium. The video generation method includes: receiving a video generation request sent by an electronic device, wherein the video generation request includes an original video; determining a target background image according to video frames of the original video, wherein the target background image includes at least one of the following: a key frame in the original video and an image bearing information displayed on the video frames; and generating a target video according to the target background image and the original video, wherein the background image on the playing interface of the target video is the target background image. With the disclosed method, the target background image is determined automatically and a video producer does not need to manually configure the background image of the target video, which simplifies the producer's operations and saves production time.

Description

Video generation method, device, server and storage medium
Technical Field
The present disclosure relates to the field of multimedia technologies, and in particular, to a video generation method, apparatus, server, and storage medium.
Background
With the development of multimedia technology, users can process a pre-prepared original video (for example, a video file) with video processing software to generate a new video.
In the related art, a video producer typically processes an original video through video processing software, and the original video must be configured with a background image. After the original video is processed, a video can be generated.
In the above video generation method, since a video creator is required to manually configure the background image, the operation of creating the video is complicated and time-consuming.
Disclosure of Invention
The present disclosure provides a video generation method, an apparatus, a server, and a storage medium, so as to at least solve the problem in the related art that making a video is relatively complicated and time-consuming because a video producer must manually configure a background image. The technical solutions of the present disclosure are as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a video generation method, including:
receiving a video generation request sent by electronic equipment, wherein the video generation request comprises an original video;
determining a target background image according to the video frame of the original video, wherein the target background image comprises at least one of the following items: a key frame in the original video and an image with information displayed on the video frame;
and generating a target video according to the target background image and the original video, wherein the background image on the playing interface of the target video is the target background image.
In one or more embodiments of the present disclosure, the determining a target background image according to a video frame of the original video includes:
acquiring a plurality of video frames from the original video;
identifying elements included in each video frame of the plurality of video frames;
for any identified target element, acquiring a first video frame with the target element from the plurality of video frames;
when the number of the first video frames is larger than a preset number, determining the target background image according to the target element, wherein the target background image comprises at least one of the following items: a pattern of the target element and text information of the target element.
In one or more embodiments of the present disclosure, the identifying of elements included in each video frame of the plurality of video frames includes:
acquiring a target theme type of the original video;
according to the target theme type, identifying the elements which are included in each video frame and matched with the target theme type.
In one or more embodiments of the present disclosure, the determining the target background image according to the target element includes:
acquiring one or two target video frames in the plurality of first video frames, wherein the target video frames are: a video frame characterizing an overall structure of the target element, a video frame characterizing a local structure of the target element, or a video frame of a usage scene of the target element;
determining at least one of the target video frames as the target background image.
In one or more embodiments of the present disclosure, the determining the target background image according to the target element includes:
cropping a sub-image of the target element from at least one of the first video frames;
and adding the sub-image of the target element to a first preset image to obtain the target background image.
In one or more embodiments of the present disclosure, the determining the target background image according to the target element includes:
acquiring a preset keyword corresponding to the target element;
identifying text information comprising the preset keywords on a video frame of the original video;
and adding the text information to a second preset image to obtain the target background image.
In one or more embodiments of the present disclosure, before the adding the text information to the second preset image to obtain the target background image, the method further includes:
for each color of a plurality of colors appearing in the original video, acquiring the number of pixels of that color in the plurality of video frames and the total number of pixels in the plurality of video frames;
calculating the ratio of the number of pixels of each color to the total number of pixels to obtain the proportion of each color;
arranging the plurality of colors in descending order of proportion, and acquiring at least one target color from the plurality of colors, starting from the color with the largest proportion;
the adding of the text information to the second preset image to obtain the target background image then includes:
and adding the text information to a second preset image with the target color to obtain the target background image.
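The color-selection steps above can be sketched in Python. This is only a rough illustration under simplifying assumptions: each frame is represented as a 2D grid of already-quantized color labels (a real implementation would likely quantize RGB values into buckets first), and the ranking simply sorts by proportion.

```python
from collections import Counter

def select_target_colors(frames, k=1):
    """Count pixels of each color across the sampled frames, compute each
    color's proportion of the total pixel count, and return the top-k
    colors by proportion (largest first)."""
    counts = Counter()
    total = 0
    for frame in frames:            # frame: 2D grid of color labels
        for row in frame:
            counts.update(row)
            total += len(row)
    proportions = {color: n / total for color, n in counts.items()}
    ranked = sorted(proportions, key=proportions.get, reverse=True)
    return ranked[:k]

# Two tiny 2x2 "frames"; "white" dominates (6 of 8 pixels).
frames = [[["white", "white"], ["red", "white"]],
          [["white", "blue"], ["white", "white"]]]
print(select_target_colors(frames, k=1))  # ['white']
```

The selected target color(s) can then be used as the fill color of the second preset image onto which the text information is added.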
In one or more embodiments of the present disclosure, in a case where there are multiple pieces of the text information, before the adding of the text information to a second preset image to obtain the target background image, the method further includes:
acquiring first text information corresponding to the target element from a text information base;
acquiring, from the multiple pieces of text information, second text information with the highest similarity to the first text information;
the adding of the text information to the second preset image to obtain the target background image then includes:
and adding the second text information to the second preset image to obtain the target background image.
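The selection of the most similar piece of text can be sketched as follows. The patent does not specify a similarity metric, so `difflib`'s sequence ratio is used here purely as a stand-in, and the candidate strings are invented for illustration.

```python
import difflib

def pick_second_text(first_text, candidates):
    """Return the candidate text whose similarity to the reference (first)
    text is highest, using a simple character-sequence ratio."""
    return max(candidates,
               key=lambda t: difflib.SequenceMatcher(None, first_text, t).ratio())

candidates = ["brand X lipstick, matte finish",
              "subscribe for more videos",
              "lipstick swatch close-up"]
print(pick_second_text("brand X lipstick", candidates))
# -> 'brand X lipstick, matte finish'
```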
In one or more embodiments of the present disclosure, there are a plurality of the target background images, and the generating of the target video according to the target background image and the original video includes:
generating a plurality of target videos according to the plurality of target background images and the original video, wherein each target video is generated according to at least one target background image and the original video, and different target videos are generated according to different target background images.
According to a second aspect of the embodiments of the present disclosure, there is provided a video generating apparatus including:
the request receiving module is configured to receive a video generation request sent by electronic equipment, wherein the video generation request comprises an original video;
an image determination module configured to determine a target background image from video frames of the original video, the target background image comprising at least one of: a key frame in the original video and an image with information displayed on the video frame;
and the video generation module is configured to generate a target video according to the target background image and the original video, wherein the background image on the playing interface of the target video is the target background image.
In one or more embodiments of the present disclosure, the image determination module includes:
a first acquisition unit configured to acquire a plurality of video frames from the original video;
an element identification unit configured to identify elements included in each of the plurality of video frames;
a second acquiring unit configured to acquire, for any one of the identified target elements, a first video frame having the target element among the plurality of video frames;
an image determining unit configured to determine the target background image according to the target element when the number of the first video frames is greater than a predetermined number, the target background image including at least one of: a pattern of the target element and text information of the target element.
In one or more embodiments of the present disclosure, the element identifying unit includes:
a type obtaining subunit, configured to obtain a target theme type of the original video;
an element identification subunit configured to identify, according to the target topic type, an element included in each video frame that matches the target topic type.
In one or more embodiments of the present disclosure, the image determination unit includes:
a key frame determination subunit configured to obtain, among the plurality of first video frames, one or two target video frames, where the target video frames are: a video frame characterizing an overall structure of the target element, a video frame characterizing a local structure of the target element, or a video frame of a usage scene of the target element;
a first image determination subunit configured to determine one or both of the target video frames as the target background image.
In one or more embodiments of the present disclosure, the image determination unit includes:
a cropping subunit configured to crop out a sub-image of the target element from at least one of the first video frames;
the image adding subunit is configured to add the sub-image of the target element to a first preset image to obtain the target background image.
In one or more embodiments of the present disclosure, the image determination module includes:
the text recognition unit is configured to recognize text information including preset keywords on a video frame of the original video, wherein the preset keywords are keywords corresponding to the target elements;
the text adding unit is configured to add the text information to a second preset image to obtain the target background image.
In one or more embodiments of the present disclosure, the apparatus further includes:
a pixel point obtaining module configured to obtain, for each color of a plurality of colors appearing in the original video, the number of pixels of that color in the plurality of video frames and the total number of pixels in the plurality of video frames;
a proportion calculation module configured to calculate the ratio of the number of pixels of each color to the total number of pixels to obtain the proportion of each color;
a color obtaining module configured to arrange the plurality of colors in descending order of proportion and obtain at least one target color from the plurality of colors, starting from the color with the largest proportion;
the text adding unit is specifically configured to: and adding the text information to a second preset image with the target color to obtain the target background image.
In one or more embodiments of the present disclosure, in a case where there are multiple pieces of the text information, the apparatus further includes:
the first text acquisition module is configured to acquire first text information corresponding to the target element from a text information base;
a second text acquisition module configured to acquire, from the multiple pieces of text information, second text information with the highest similarity to the first text information;
the text adding unit is specifically configured to: and adding the second text information to the second preset image to obtain the target background image.
In one or more embodiments of the present disclosure, the video generation module is specifically configured to:
generating a plurality of target videos according to a plurality of target background images and the original video, wherein each target video is generated according to at least one target background image and the original video, and different target videos are generated according to different target background images.
According to a third aspect of the embodiments of the present disclosure, there is provided a server, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video generation method of any of the above embodiments.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of a server, enable the server to perform the video generation method of any of the above embodiments.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, wherein instructions of the computer program product, when executed by a processor of a server, enable the server to perform the video generation method of any of the above embodiments.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
a target background image is determined according to the video frames of an original video, and a target video is then generated according to the target background image and the original video. Because the target background image is automatically determined from the video frames of the original video, a video producer does not need to manually configure the background image of the target video, which simplifies the producer's operations and saves production time.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is an architecture diagram illustrating a system for implementing a video generation method in accordance with an exemplary embodiment.
Fig. 2 is a flow diagram illustrating a video generation method according to an example embodiment.
FIG. 3 is a schematic diagram illustrating a playback interface for a target video, according to an example embodiment.
Fig. 4 is a flow diagram illustrating another video generation method according to an example embodiment.
Fig. 5 is a schematic diagram illustrating a first video frame according to an example embodiment.
FIG. 6 is a diagram illustrating a target background image according to an exemplary embodiment.
Fig. 7 is a flow chart illustrating yet another video generation method according to an exemplary embodiment.
FIG. 8 is a schematic diagram illustrating another target background image according to an example embodiment.
FIG. 9 is a schematic diagram illustrating another target video playback interface in accordance with an illustrative embodiment.
FIG. 10 is a schematic diagram illustrating yet another target video playback interface in accordance with an illustrative embodiment.
Fig. 11 is a schematic diagram illustrating a playback interface for yet another target video according to an example embodiment.
Fig. 12 is a schematic diagram illustrating a playback interface for yet another target video according to an example embodiment.
Fig. 13 is a schematic diagram illustrating a playback interface for yet another target video according to an example embodiment.
Fig. 14 is a block diagram illustrating a video generation apparatus according to an example embodiment.
FIG. 15 is a block diagram illustrating a server in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The present disclosure provides a video generation method, and a system for implementing the video generation method is first described below. Fig. 1 is an architecture diagram illustrating a system for implementing a video generation method in accordance with an exemplary embodiment. As shown in fig. 1, a system for implementing a video generation method includes an electronic device 102 and a server 104.
The electronic device 102 receives the uploaded original video and receives instructions to generate a video based on the original video. Then, the electronic device 102 sends a video generation request to the server, where the video generation request includes the original video.
Under the condition that the server 104 receives a video generation request sent by the electronic equipment 102, determining a target background image according to a video frame of an original video; and then, generating a target video according to the target background image and the original video.
After the server 104 generates the target video, the electronic device 102 may download the target video from the server 104, thereby enabling downloading of the target video locally. Alternatively, after the server 104 generates the target video, the server 104 transmits the generated target video to the electronic device 102.
The system described above is illustrated by way of an example.
The electronic device 102 is installed with video processing software, receives the uploaded original video through the video processing software, and then receives an instruction to generate a video based on the original video. The electronic device 102 sends a video generation request to the server through the video processing software, wherein the video generation request includes the original video.
The server 104 receives the video generation request, and determines a target background image according to the video generation request including the original video; and then, generating a target video according to the target background image and the original video. When the target video is played, the original video is displayed in the playing area of the target video, and the target background image is displayed in the background image area of the target video. The background image area may be an area located above the playback area of the target video and an area located below the playback area of the target video.
As one example, the video processing software may be a client or an application.
In addition to the above, the electronic device 102 may implement uploading of the original video through a web page.
The above system can implement the video generation method provided by the present disclosure. Based on this system, the video generation method is described below. Fig. 2 is a flowchart illustrating a video generation method according to an exemplary embodiment. The video generation method may be applied to a server; as shown in fig. 2, the video generation method 200 includes:
s202, receiving a video generation request sent by the electronic equipment, wherein the video generation request comprises an original video;
s204, determining a target background image according to the video frame of the original video, wherein the target background image comprises at least one of the following items: key frames in the original video and images with information displayed on the video frames;
and S206, generating a target video according to the target background image and the original video, wherein the background image on the playing interface of the target video is the target background image.
The above-described steps are explained below.
In S202, the server may receive, through the Internet, a video generation request transmitted by the electronic device. The original video in the video generation request may be a video file captured by the electronic device, or a video file that the electronic device received from another electronic device.
As one example, the original video is a landscape-orientation video file.
In S204, as an example, a key frame in the original video may be determined as the target background image.
As another example, the information displayed on the video frame may include at least one of: color, text, and pictures on a video frame.
S204 may include: acquiring a target color displayed on a video frame of the original video, and determining a background image with the target color as the target background image. The target color is a color appearing on a video frame of the original video.
Alternatively, S204 may include: adding the text or the picture displayed on a video frame of the original video to a preset image to obtain the target background image.
In S206, the target background image is used as a background image on the playing interface of the target video, the original video is used as a playing video on the playing interface of the target video, and the target video is generated.
In one example, one target video may be generated using one target background image; that is, the areas above and below the target video display the same target background image.
In another example, one target video may be generated using two target background images. The present embodiment is explained below with reference to fig. 3.
FIG. 3 is a schematic diagram illustrating a playback interface for a target video, according to an example embodiment. As shown in fig. 3, during the playing of the target video, the original video is played in a first area 302 on the playing interface, the first target background image is displayed in a second area 304 on the playing interface above the first area 302, and the second target background image is displayed in a third area 306 on the playing interface below the first area 302.
It should be noted that the first target background image displayed in the second area 304 and the second target background image displayed in the third area 306 may be the same target background image or different target background images.
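As a rough illustration of this three-area layout, the playback and background regions can be computed from the canvas and video dimensions. The function below is a sketch; the portrait 1080x1920 canvas and 16:9 landscape video dimensions are assumed for illustration and do not appear in the patent.

```python
def layout_regions(canvas_w, canvas_h, video_w, video_h):
    """Scale the landscape original video to the canvas width and center it
    vertically; the strips above and below it become the two background
    areas. Returns (second_area, first_area, third_area), each as an
    (x, y, w, h) rectangle."""
    scale = canvas_w / video_w
    scaled_h = round(video_h * scale)
    top = (canvas_h - scaled_h) // 2
    second = (0, 0, canvas_w, top)        # upper background area (304)
    first = (0, top, canvas_w, scaled_h)  # playback area (302)
    third = (0, top + scaled_h, canvas_w, canvas_h - top - scaled_h)  # lower background area (306)
    return second, first, third

print(layout_regions(1080, 1920, 1920, 1080))
```

With these dimensions the 16:9 video scales to 1080x608, leaving two 656-pixel-high background strips above and below it.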
In the embodiment of the disclosure, since the target background image is automatically determined according to the video frame of the original video, the background image of the target video does not need to be manually configured by a video producer, so that the operation of producing the video by the video producer can be simplified and the time for producing the video can be saved. In addition, the target background image may include key frames in the original video, and thus, key content of the original video may also be displayed to the user during the playing of the original video, thereby effectively communicating the key content to the user. In addition, the target background image can have information displayed on the video frame, so that the target background image can be adapted to the information on the video frame of the original video, and the original video in the target video and the target background image can be more coordinated when the target video is played.
In one or more embodiments of the present disclosure, as shown in fig. 4, S204 may include:
s2042: acquiring a plurality of video frames from an original video;
s2044: identifying elements included in each of a plurality of video frames;
s2046: acquiring a first video frame with a target element in a plurality of video frames aiming at any identified target element;
s2048: when the number of the first video frames is larger than the preset number, determining a target background image according to the target elements, wherein the target background image comprises at least one of the following items: a pattern of the target element and text information of the target element.
Next, S2042 to S2048 will be described.
In S2042, a plurality of video frames may be extracted from the original video in a uniform frame extraction manner. For example, in the case where one video frame is extracted every 1 second and the original video is a video of 50 seconds, 50 video frames may be extracted from the original video.
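The uniform frame-extraction step can be sketched as index arithmetic: given the video's duration and frame rate, compute which frame indices to decode (a real implementation would then seek to those indices with a video library such as OpenCV or FFmpeg, which is assumed, not specified, here).

```python
def uniform_frame_indices(duration_s, fps, interval_s=1.0):
    """Indices of the frames to extract when sampling one frame per
    interval from a video of the given duration and frame rate."""
    step = int(fps * interval_s)
    total_frames = int(duration_s * fps)
    return list(range(0, total_frames, step))

# A 50-second video at 30 fps, sampled once per second -> 50 frames.
indices = uniform_frame_indices(50, 30)
print(len(indices))  # 50
```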
In S2044, elements included in each video frame of the plurality of video frames may be identified by means of face recognition and item recognition. The identified elements may include at least one of: an item and a person.
In S2046, for any one of the target elements identified in the plurality of video frames, a first video frame having the target element is acquired in the plurality of video frames. The number of the first video frames may be one or more.
In S2048, as one example, at least one first video frame may be determined as the target background image. As another example, a sub-image of the target element may be cropped from the first video frame, and the sub-image may either be determined as the target background image or be pasted onto a preset image to obtain the target background image.
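The crop-and-paste variant can be sketched as follows. Frames are simplified to 2D grids of pixel values, and the element's bounding box is assumed to come from the recognition step; real code would operate on image arrays (e.g., NumPy/OpenCV) instead.

```python
def crop(frame, box):
    """Crop the target element's bounding box (x, y, w, h) out of a frame
    represented as a 2D list of pixel values."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]

def paste_center(preset, sub):
    """Paste the cropped sub-image onto the center of a preset image,
    returning the composited target background image."""
    canvas = [row[:] for row in preset]
    oy = (len(canvas) - len(sub)) // 2
    ox = (len(canvas[0]) - len(sub[0])) // 2
    for dy, row in enumerate(sub):
        canvas[oy + dy][ox:ox + len(row)] = row
    return canvas

frame = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
sub = crop(frame, (1, 1, 2, 2))      # -> [[5, 6], [9, 10]]
preset = [[0] * 4 for _ in range(4)]
bg = paste_center(preset, sub)
```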
As an example, determining a target background image from the target element may include: determining a target background image according to a first video frame meeting a predetermined condition, wherein the predetermined condition comprises at least one of the following items: the resolution of the video frame is greater than a predetermined threshold; no watermark exists in the video frame; the target element is located in a central region of the video frame; the video frame comprises complete sentences or complete phrases; in the case where the video frame includes a person, the person in the video frame is in an open-eye state.
In the embodiment of the disclosure, by acquiring the first video frame with the target element from the plurality of video frames, when the number of the first video frames is greater than the predetermined number, it is described that the number of times that the target element appears in the original video is greater, and it is further described that the target element is a key element in the original video. Then, a target background image can be determined from the first video frame with the target elements, the target background image including key elements of the original video. Therefore, the target background image comprises the key content of the original video, and the key content of the original video can run through the whole video, so that a user can conveniently obtain effective information from the video.
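The frequency test described above (keep an element only if more than a predetermined number of sampled frames contain it) can be sketched directly. The per-frame element sets are assumed to come from the recognition step in S2044; the element names are invented for illustration.

```python
def key_elements(frames_elements, predetermined_number):
    """Given, per sampled frame, the set of elements identified in it,
    count how many frames contain each element (the 'first video frames'
    for that element) and keep the elements whose count exceeds the
    predetermined number."""
    counts = {}
    for elements in frames_elements:
        for element in set(elements):
            counts[element] = counts.get(element, 0) + 1
    return {e for e, n in counts.items() if n > predetermined_number}

frames_elements = [{"lipstick", "hand"}, {"lipstick"},
                   {"lipstick", "mirror"}, {"mirror"}]
print(key_elements(frames_elements, 2))  # {'lipstick'}
```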
In one or more embodiments of the present disclosure, S2044 may include:
acquiring a target theme type of an original video;
identifying, according to the target topic type, the elements included in each video frame that match the target topic type.
As one example, the target subject type of the original video may be a makeup category, a dress category, a travel category, a fun category, or the like.
As an example, identifying, according to the target topic type, an element included in each video frame that matches the target topic type may include: acquiring an element library corresponding to the target theme type; elements included in each video frame and in the library of elements are identified.
The following description is made by way of an example.
Assuming that the target theme type of the original video is a cosmetic category, an element library corresponding to the cosmetic category is obtained, and the element library includes skin care products, cosmetics, headwear, earrings, necklaces and the like. Then, based on the library of elements corresponding to the cosmetic class, elements in the library of elements included in each video frame are identified.
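As a minimal sketch of the element-library matching described above (the library contents and topic names are hypothetical placeholders, not values fixed by this disclosure):

```python
# Hypothetical element libraries keyed by topic type; a production system
# would maintain one such library per supported theme type.
ELEMENT_LIBRARIES = {
    "cosmetics": {"skin care product", "lipstick", "headwear", "earrings", "necklace"},
    "travel": {"suitcase", "landmark", "map"},
}

def identify_matching_elements(detected_labels, topic_type):
    """Keep only the detected labels present in the topic's element library."""
    library = ELEMENT_LIBRARIES.get(topic_type, set())
    return [label for label in detected_labels if label in library]
```

Here `detected_labels` stands in for the output of whatever per-frame recognition model an actual implementation uses.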
In the embodiment of the present disclosure, the elements included in each video frame, which match the target topic type, are identified according to the target topic type of the original video, and then, according to the target element in the identified elements, a target background image including the target element is determined. Therefore, the target background image can be made to conform to the target theme type of the original video, and further, the target background image in the finally generated target video can be made to be more matched with the original video.
In one or more embodiments of the present disclosure, the determining the target background image according to the target element in S2046 may include:
acquiring one or two target video frames from the plurality of first video frames, wherein the target video frames are video frames representing the whole structure of a target element, video frames representing the local structure of the target element or video frames of a use scene of the target element;
one or two target video frames are determined as target background images.
As an example, in the case of acquiring one target video frame, two target background images on the upper and lower sides of the video may be the same target video frame. In the case of acquiring two target video frames, one of the target video frames serves as a target background image on the upper side of the video, and the other serves as a target background image on the lower side of the video.
As one example, one target video may be generated using two target background images: one serves as a main background image, and the other serves as a detail background image. The main background image may satisfy at least one of: characterizing the overall outline of the target element, reflecting the usage scene of the target element, highlighting the target element, and shot interaction. The detail background image may satisfy at least one of: characterizing the local structure of the target element (i.e., displaying details of the target element), reflecting the usage scene of the target element, and shot interaction.
The following takes an advertisement video with lipstick as an example to determine the target background image.
In a plurality of first video frames including the lipstick, a video frame representing the overall structure of the lipstick is obtained, and the video frame is determined as a target background image, so that the overall structure of the lipstick is shown to a user through the target background image. In addition, a video frame which represents the local structure of the lipstick can be obtained from a plurality of first video frames comprising the lipstick, and the video frame is determined as a target background image, so that the details of the lipstick are shown to a user through the target background image. In addition, a video frame of the lips painted with the lipstick can be acquired from a plurality of first video frames including the lipstick, and the video frame is a use scene of the lipstick.
In the embodiment of the present disclosure, at least one of a video frame characterizing an overall structure of the target element, a video frame characterizing a local structure of the target element, and a video frame of a usage scene of the target element is acquired. Therefore, the high-quality target background image is obtained from the plurality of first video frames, and the characteristics of the target elements can be reflected by the target background image.
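The selection of one or two target video frames can be sketched as follows; the `view` tag on each frame is an assumed label produced by an upstream classifier that distinguishes overall-structure, local-structure, and usage-scene frames:

```python
def choose_target_frames(first_frames):
    """Pick up to two frames: one showing the whole element (main background),
    and one showing a detail or usage scene (detail background)."""
    overall = next((f for f in first_frames if f["view"] == "overall"), None)
    detail = next((f for f in first_frames if f["view"] in ("local", "usage")), None)
    return [f for f in (overall, detail) if f is not None]
```

If only one frame is returned, it can serve as both the upper and the lower target background image, matching the single-target-frame case described above.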
In one or more embodiments of the present disclosure, the determining the target background image according to the target element in S2046 may include:
cropping a sub-image of the target element from the at least one first video frame;
and adding the sub-image of the target element to the first preset image to obtain a target background image.
As an example, a sub-image of the target element may be cropped from a first video frame.
As another example, sub-images of the target element at different angles may be cropped from the respective first video frames.
The first video frame from which the sub-image of the target element is cropped may be a video frame characterizing the overall structure of the target element, a video frame characterizing the local structure of the target element, or a video frame of a usage scene of the target element.
As an example, a sub-image of the target element is added to a preset position of the first preset image, resulting in a target background image. For example, the sub-image of the target element is added to the center position of the first preset image to obtain a target background image.
As one example, the first preset image may be a solid color image. For example, the first preset image is a pure black image.
As another example, the first preset image may be an image having a background pattern.
As yet another example, the first preset image may be an image having a target color, which is a color on a video frame of the original video.
The target element is exemplified as lipstick, and is explained with reference to fig. 5.
Fig. 5 is a schematic diagram illustrating a first video frame according to an example embodiment. As shown in fig. 5, the first video frame includes a lipstick pattern and a promotional literature for lipstick, in which case a sub-image 308 of the lipstick pattern is cut out from the first video frame, and then the sub-image 308 is added to the black image, resulting in the target background image shown in fig. 6.
In the embodiment of the present disclosure, a sub-image of a target element cut out from a first video frame is added to a first preset image, so as to obtain a target background image. The target background image may be made to include key elements of the original video. Thereby, the target background image is made to include the key content of the original video.
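A minimal sketch of pasting the cropped sub-image onto the center of a first preset image, with images represented as plain 2-D pixel arrays for illustration (a real implementation would operate on decoded image buffers):

```python
def paste_center(background, sub):
    """Paste `sub` (a 2-D list of pixel values) at the center of `background`,
    returning a new image and leaving the inputs unmodified."""
    bh, bw = len(background), len(background[0])
    sh, sw = len(sub), len(sub[0])
    top, left = (bh - sh) // 2, (bw - sw) // 2
    out = [row[:] for row in background]  # copy so the preset image is reusable
    for r in range(sh):
        for c in range(sw):
            out[top + r][left + c] = sub[r][c]
    return out

black = [[0] * 6 for _ in range(4)]  # first preset image: pure black
sub = [[9, 9], [9, 9]]               # cropped sub-image of the target element
target_background = paste_center(black, sub)
```

The center position used here corresponds to the example in the text; the disclosure allows any preset position.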
In one or more embodiments of the present disclosure, as shown in fig. 7, S204 may include:
s2050: acquiring a preset keyword corresponding to a target element;
s2052: recognizing text information including preset keywords on a video frame of an original video;
s2054: and adding the text information to a second preset image having a target color to obtain a target background image, wherein the target color is a color on a video frame of the original video.
As an example, the preset keyword is a keyword corresponding to the target element in a keyword library. For example, if the target element is lipstick, the keywords corresponding to the target element include lipstick, color number, internet-famous, limited-time, offer, special sale, and limited quantity.
As one example, the second preset image may be a solid color image.
As another example, the second preset image may be a preset image whose dominant hue is a certain color. That is, the second preset image need not be a solid-color image, and may have some patterns or characters on it.
Next, the above-described S2050 and S2052 will be described by taking the target element as lipstick as an example.
Text information on the video frames of the original video is recognized, and text information including lipstick keywords is obtained, namely 'recently viral lipstick charm series' and 'limited-time offer'. Since the dominant hue of the original video is pink, the text information is added to a pink background image, resulting in the two target background images shown in fig. 8. A target video as shown in fig. 9 can then be generated from the target background images.
In the embodiment of the present disclosure, the text information on the video frame is added to the second preset image with the target color, so as to obtain the target background image. The target color is the color on the video frame of the original video, so that the color of the target background image can be adapted to the color on the video frame of the original video, and the color of the original video in the target video is more coordinated with the color of the target background image when the target video is played. In addition, the key information on the video frame is arranged on the target background image, so that the key content of the original video can be included on the target background image, and the key content can be effectively conveyed to the user through the target background image.
In one or more embodiments of the present disclosure, before S2054, the video generating method may further include:
acquiring the number of pixel points of each color in a plurality of video frames and the total number of pixel points of the plurality of video frames aiming at each color in a plurality of colors on an original video;
calculating the ratio of the pixel point number of each color to the total pixel point number to obtain the ratio of each color;
and in the case where the plurality of colors are arranged in order of proportion, acquiring at least one target color among the plurality of colors, starting from the color with the largest proportion.
S2054 may include: and adding the text information to a second preset image with the target color to obtain a target background image.
The color proportion of any one color a on the original video is described below by way of an example.
Suppose 500 video frames are obtained from the original video. The number of pixels of color a is counted across the 500 video frames, and the ratio of that number to the total number of pixels in the 500 video frames is calculated; this ratio is the proportion of color a on the video frames of the original video.
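The proportion calculation above can be sketched as follows, representing each sampled frame as a 2-D array of color values; `most_common()` yields the colors ordered by count, largest first, from which the top-N target colors are taken:

```python
from collections import Counter

def dominant_colors(frames, n=1):
    """Rank colors by their share of all pixels across the sampled frames
    and return the top `n` as (color, proportion) pairs."""
    counts = Counter()
    total = 0
    for frame in frames:           # each frame: 2-D list of color values
        for row in frame:
            counts.update(row)     # count each pixel's color
            total += len(row)
    ranked = counts.most_common()  # sorted by count, largest first
    return [(color, cnt / total) for color, cnt in ranked[:n]]
```

With `n = N`, this directly yields the N target colors used to generate N target videos, as described below. A production system would quantize raw RGB values into a coarser palette before counting; the color values here are symbolic placeholders.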
As one example, in a case where N target videos need to be generated, N target colors may be acquired from the plurality of colors on the video frames of the original video, the N target colors being the N colors with the largest proportions. One target video may be generated based on each target color.
For example, if 2 target videos need to be generated, two target colors, pink and rose, are obtained from the colors on the video frames of the original video. Based on pink, the target video shown in fig. 9 is generated, and its background image is pink. Based on rose, the target video shown in fig. 10 is generated, and its background image is rose.
In the embodiment of the present disclosure, the color of the main tone of the original video (i.e., the target color) may be obtained in the above manner, and then, the text information is added to the second preset image having the main tone color of the original video to obtain the target background image, so that the tone of the target background image is adapted to the tone of the original video.
It should be noted that the first preset image may also be a preset image having the target color. For example, the first preset image may be a solid-color image in the target color.
In order to screen out the text information for the target background image when there are a plurality of pieces of text information, in one or more embodiments of the present disclosure, before the text information is added to the second preset image to obtain the target background image, the video generation method may further include:
acquiring first text information corresponding to a target element from a text information base;
and acquiring, from the plurality of pieces of text information, second text information having the highest similarity to the first text information.
Adding the text information to a second preset image to obtain a target background image, wherein the method comprises the following steps:
and adding the second text information to a second preset image to obtain a target background image.
The second text information is described below by taking the target element as lipstick as an example.
If a plurality of promotional texts about the lipstick are recognized on the video frames of the original video, not all of them can be displayed on the target background image, because the area of the target background image is limited. In order to select a suitable promotional text, in the embodiment of the disclosure, first text information corresponding to the lipstick is acquired from a text information library; the first text information is a promotional template for lipstick in the text information library, one with high actual user acceptance and a good promotional effect.
Then, second text information having the highest similarity to the first text information is acquired from the plurality of pieces of text information. Since the first text information has high user acceptance and a good promotional effect, and the second text information is similar to it, the second text information can likewise be considered to have high user acceptance and a good promotional effect. The second text information is then added to the second preset image to obtain the target background image.
In the embodiment of the disclosure, when there are a plurality of pieces of text information, the second text information with the highest similarity to the first text information, which corresponds to the target element in the text information library, is acquired from among them. The second text information on the target background image is thus reasonably selected from the plurality of pieces of text information of the original video.
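As an illustrative sketch of selecting the second text information, a simple string-similarity measure such as Python's `difflib.SequenceMatcher` can stand in for whatever similarity model an actual implementation uses:

```python
from difflib import SequenceMatcher

def pick_second_text(candidates, first_text):
    """Among the texts recognized on the video frames, choose the one most
    similar to the library template (the first text information)."""
    return max(candidates,
               key=lambda t: SequenceMatcher(None, t, first_text).ratio())
```

`SequenceMatcher.ratio()` returns a value in [0, 1]; an embedding-based semantic similarity would serve the same role in practice.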
In one or more embodiments of the present disclosure, the number of the target background images is plural; generating a target video according to the target background image and the original video, wherein the generating comprises the following steps:
and generating a plurality of target videos according to the plurality of target background images and the original video, wherein one target video is generated according to at least one target background image and the original video, and different target videos are generated according to different target background images.
As one example, from a plurality of target background images and an original video, a plurality of target videos as shown in fig. 3, 9, 10, 11, 12, and 13 may be generated. Since the above description has been made on fig. 3, 9 and 10, the description is not repeated here. Fig. 11, 12, and 13 will be explained below.
In fig. 11, a lipstick picture is cropped from the original video and added to a preset image to obtain the target background image displayed above the playing interface, and the lipstick's text information 'limited-time offer' is added to a preset image to obtain the target background image displayed below the playing interface.
In fig. 12, a background image in the dominant hue of the original video serves as the target background image displayed above the playing interface, and the lipstick's text information, 'recently viral lipstick' and 'charm series', is added to the background image to obtain the target background image displayed below the playing interface.
In fig. 13, the lipstick's text information 'internet-famous hit' is added to a background image in the dominant hue of the original video to obtain the target background image displayed above the playing interface, and a background image in the dominant hue of the original video serves as the target background image displayed below the playing interface.
In the related art, the generated target videos differ only in transitions, special effects, and titles, so the differences among them are small. Compared with the related art, the embodiment of the disclosure can generate a plurality of target videos whose target background images all differ, so the plurality of target videos are directly presented in a diversified manner. This also avoids the considerable time a user would otherwise spend making background images when producing a video.
In one or more embodiments of the present disclosure, after generating the target video, the video generation method may further include: and in response to the operation of modifying the target background picture, modifying the text information on the target background picture or the style of the target background picture. Therefore, the generated target video can meet the requirements of users.
Fig. 14 is a block diagram illustrating a video generation apparatus according to an example embodiment. Referring to fig. 14, the apparatus 400 includes a request receiving module 402, an image determining module 404, and a video generating module 406.
A request receiving module 402, configured to receive a video generation request sent by an electronic device, where the video generation request includes an original video;
an image determination module 404 configured to determine a target background image from the video frames of the original video, the target background image comprising at least one of: key frames in the original video and images with information displayed on the video frames;
and the video generating module 406 is configured to generate a target video according to the target background image and the original video, where the background image on the playing interface of the target video is the target background image.
The target background image is automatically determined according to the video frame of the original video, and a video producer does not need to manually configure the background image of the target video, so that the operation of producing the video by the video producer can be simplified, and the time for producing the video can be saved. In addition, the target background image may include key frames in the original video, and thus, key content of the original video may also be displayed to the user during the playing of the original video, thereby effectively communicating the key content to the user. In addition, the target background image can have information displayed on the video frame, so that the target background image can be adapted to the information on the video frame of the original video, and the original video in the target video and the target background image can be more coordinated when the target video is played.
In one or more embodiments of the present disclosure, the image determination module 404 may include:
a first acquisition unit configured to acquire a plurality of video frames from an original video;
an element identification unit configured to identify elements included in each of a plurality of video frames;
a second acquisition unit configured to acquire, for any one of the identified target elements, a first video frame having the target element among the plurality of video frames;
an image determining unit configured to determine a target background image according to the target element when the number of the first video frames is greater than a predetermined number, the target background image including at least one of: a pattern of the target element and text information of the target element.
By acquiring a first video frame with a target element from a plurality of video frames, when the number of the first video frames is greater than a predetermined number, it is indicated that the target element appears in the original video more times, and thus it is indicated that the target element is a key element in the original video. Then, a target background image can be determined from the first video frame with the target elements, the target background image including key elements of the original video. Therefore, the target background image comprises the key content of the original video, and the key content of the original video can run through the whole video, so that a user can conveniently obtain effective information from the video.
In one or more embodiments of the present disclosure, the element identifying unit may include:
the type acquisition subunit is configured to acquire a target theme type of the original video;
and the element identification subunit is configured to identify the elements which are included in each video frame and are matched with the target topic type according to the target topic type.
According to the target theme type of the original video, elements which are included in each video frame and matched with the target theme type are identified, and then according to target elements in the identified elements, a target background image including the target elements is determined. Therefore, the target background image can be made to conform to the target theme type of the original video, and further, the target background image in the finally generated target video can be made to be more matched with the original video.
In one or more embodiments of the present disclosure, the image determining unit may include:
a key frame determination subunit configured to obtain, among the plurality of first video frames, one or two target video frames, the target video frames being: a video frame representing the overall structure of the target element, a video frame representing the local structure of the target element, or a video frame of a usage scene of the target element;
a first image determining subunit configured to determine one or two target video frames as a target background image.
In the embodiment of the present disclosure, at least one of a video frame characterizing an overall structure of the target element, a video frame characterizing a local structure of the target element, and a video frame of a usage scene of the target element is acquired. Therefore, the high-quality target background image is obtained from the plurality of first video frames, and the characteristics of the target elements can be reflected by the target background image.
In one or more embodiments of the present disclosure, the image determining unit may include:
a cropping subunit configured to crop out a sub-image of the target element from the at least one first video frame;
and the image adding subunit is configured to add the sub-image of the target element to the first preset image to obtain a target background image.
And adding the subimage of the target element cut out from the first video frame to a first preset image to obtain a target background image. The target background image may be made to include key elements of the original video. Thereby, the target background image is made to include the key content of the original video.
In one or more embodiments of the present disclosure, the image determination module 404 may include:
the text recognition unit is configured to recognize text information including preset keywords on a video frame of an original video, wherein the preset keywords are keywords corresponding to target elements;
and the text adding unit is configured to add the text information to a second preset image having a target color to obtain a target background image, wherein the target color is a color on a video frame of the original video.
And adding the text information on the video frame to a second preset image with a target color to obtain a target background image. The target color is the color on the video frame of the original video, so that the color of the target background image can be adapted to the color on the video frame of the original video, and the color of the original video in the target video is more coordinated with the color of the target background image when the target video is played. In addition, the key information on the video frame is arranged on the target background image, so that the key content of the original video can be included on the target background image, and the key content can be effectively conveyed to the user through the target background image.
In one or more embodiments of the present disclosure, the apparatus 400 may further include:
a pixel point acquisition module configured to acquire, for each of a plurality of colors on an original video, a pixel point number of each color in a plurality of video frames and a total pixel point number of the plurality of video frames;
the proportion calculation module is configured to calculate the ratio of the pixel point number of each color to the total pixel point number to obtain the proportion of each color;
a color obtaining module configured to obtain at least one target color among the plurality of colors, starting from a color with a largest proportion, in a case where the plurality of colors are arranged in a size order of the proportion;
the text addition unit is specifically configured to: and adding the text information to a second preset image with the target color to obtain a target background image.
In one or more embodiments of the present disclosure, in the case that the number of the text information is plural, the apparatus 400 may further include:
the first text acquisition module is configured to acquire first text information corresponding to the target element from a text information base;
the second text acquisition module is configured to acquire, from the plurality of pieces of text information, second text information having the highest similarity to the first text information;
the text addition unit is specifically configured to: and adding the second text information to a second preset image to obtain a target background image.
When there are a plurality of pieces of text information, the second text information with the highest similarity to the first text information, which corresponds to the target element in the text information library, is acquired from among them. The second text information on the target background image is thus reasonably selected from the plurality of pieces of text information of the original video.
In one or more embodiments of the present disclosure, the video generation module 406 may be specifically configured to:
and generating a plurality of target videos according to the plurality of target background images and the original video, wherein one target video is generated according to at least one target background image and the original video, and different target videos are generated according to different target background images.
In the related art, the generated target videos differ only in transitions, special effects, and titles, so the differences among them are small. Compared with the related art, the embodiment of the disclosure can generate a plurality of target videos whose target background images all differ, so the plurality of target videos are directly presented in a diversified manner. This also avoids the considerable time a user would otherwise spend making background images when producing a video.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The present disclosure provides a server comprising: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the video generation method of any of the above embodiments.
FIG. 15 is a block diagram illustrating a server in accordance with an exemplary embodiment. Referring to fig. 15, the server 500 includes a processing component 522, which further includes one or more processors, and memory resources, represented by memory 532, for storing instructions executable by the processing component 522, such as application programs. The application programs stored in memory 532 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 522 is configured to execute the instructions to perform the video generation method of any of the embodiments described above.
The server 500 may also include a power component 526 configured to perform power management for the server 500, a wired or wireless network interface 550 configured to connect the server 500 to a network, and an input/output (I/O) interface 558. The server 500 may operate based on an operating system stored in memory 532, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The present disclosure provides a storage medium in which instructions, when executed by a processor of a server, enable the server to perform the video generation method of any of the above embodiments.
In an exemplary embodiment, a storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of a server to perform the above method is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The present disclosure provides a computer program product, in which instructions, when executed by a processor of a server, enable the server to perform the video generation method of any of the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of video generation, comprising:
receiving a video generation request sent by electronic equipment, wherein the video generation request comprises an original video;
determining a target background image according to the video frame of the original video, wherein the target background image comprises at least one of the following items: a key frame in the original video and an image with information displayed on the video frame;
and generating a target video according to the target background image and the original video, wherein the background image on the playing interface of the target video is the target background image.
2. The method of claim 1, wherein determining a target background image from the video frames of the original video comprises:
acquiring a plurality of video frames from the original video;
identifying elements that each video frame of the plurality of video frames includes;
for any identified target element, acquiring a first video frame with the target element from the plurality of video frames;
when the number of the first video frames is larger than a preset number, determining the target background image according to the target element, wherein the target background image comprises at least one of the following items: a pattern of the target element and text information of the target element.
3. The method of claim 2, wherein identifying the elements included in each video frame of the plurality of video frames comprises:
acquiring a target theme type of the original video;
identifying, according to the target theme type, the elements in each video frame that match the target theme type.
4. The method of claim 2, wherein determining the target background image according to the target element comprises:
acquiring one or two target video frames from the plurality of first video frames, wherein each target video frame is: a video frame characterizing an overall structure of the target element, a video frame characterizing a local structure of the target element, or a video frame showing a usage scene of the target element;
and determining the one or two target video frames as the target background image.
5. The method of claim 2, wherein determining the target background image according to the target element comprises:
cropping a sub-image of the target element from at least one of the first video frames;
and adding the sub-image of the target element to a first preset image to obtain the target background image.
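The crop-and-compose step of claim 5 can be sketched on plain 2-D pixel lists. The coordinates, pixel values, and list representation are illustrative assumptions:

```python
def crop(frame, top, left, height, width):
    # Cut a sub-image of the target element out of a first video frame.
    return [row[left:left + width] for row in frame[top:top + height]]

def paste(preset_image, sub_image, top, left):
    # Add the sub-image onto a copy of the first preset image.
    out = [row[:] for row in preset_image]
    for i, row in enumerate(sub_image):
        out[top + i][left:left + len(row)] = row
    return out

frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sub = crop(frame, 1, 1, 2, 2)
target_background = paste([[0, 0, 0] for _ in range(3)], sub, 0, 0)
```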
6. The method of claim 2, wherein determining the target background image according to the target element comprises:
acquiring a preset keyword corresponding to the target element;
identifying, on the video frames of the original video, text information comprising the preset keyword;
and adding the text information to a second preset image to obtain the target background image.
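The keyword-matching step of claim 6 can be sketched as a simple substring filter. The sample strings, and the assumption that frame text has already been recognized (e.g. by OCR), are illustrative:

```python
def keyword_text(frame_texts, preset_keywords):
    # Keep only recognized text lines that contain a preset keyword.
    return [t for t in frame_texts if any(k in t for k in preset_keywords)]

texts = ["limited-time offer", "Brand X sneakers", "subscribe now"]
hits = keyword_text(texts, ["sneakers", "offer"])
```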
7. The method of claim 6, wherein before adding the text information to the second preset image to obtain the target background image, the method further comprises:
for each color of a plurality of colors in the original video, acquiring the number of pixels of the color in the plurality of video frames and the total number of pixels in the plurality of video frames;
calculating a ratio of the number of pixels of each color to the total number of pixels to obtain a proportion of each color;
arranging the plurality of colors in descending order of proportion, and acquiring at least one target color from the plurality of colors, starting from the color with the largest proportion;
wherein adding the text information to the second preset image to obtain the target background image comprises:
adding the text information to a second preset image having the target color to obtain the target background image.
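The pixel-proportion computation of claim 7 can be sketched as follows. Representing each frame as a flat list of color names is an illustrative assumption; a real implementation would work on decoded pixel data:

```python
from collections import Counter

def target_colors(frames, k=1):
    """frames: per-frame lists of pixel colors. Returns the k colors with the
    largest proportion of all pixels, largest proportion first."""
    counts = Counter()
    total = 0
    for frame in frames:
        counts.update(frame)   # number of pixels of each color
        total += len(frame)    # total number of pixels across the frames
    # proportion of each color = its pixel count / total pixel count
    proportions = {c: n / total for c, n in counts.items()}
    ordered = sorted(proportions, key=proportions.get, reverse=True)
    return ordered[:k]

frames = [["red", "red", "blue"], ["red", "green", "blue"]]
top = target_colors(frames, k=1)
```

With 3 of 6 pixels red, red has the largest proportion (0.5) and would be chosen as the color of the second preset image.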
8. A video generation apparatus, comprising:
a request receiving module configured to receive a video generation request sent by an electronic device, wherein the video generation request comprises an original video;
an image determination module configured to determine a target background image according to video frames of the original video, wherein the target background image comprises at least one of: a key frame of the original video, and an image carrying information displayed on the video frames;
and a video generation module configured to generate a target video according to the target background image and the original video, wherein a background image on a playing interface of the target video is the target background image.
9. A server, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the video generation method of any of claims 1 to 7.
10. A storage medium in which instructions, when executed by a processor of a server, enable the server to perform the video generation method of any one of claims 1 to 7.
CN202011019061.4A 2020-09-24 2020-09-24 Video generation method, device, server and storage medium Active CN112235516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011019061.4A CN112235516B (en) 2020-09-24 2020-09-24 Video generation method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN112235516A true CN112235516A (en) 2021-01-15
CN112235516B CN112235516B (en) 2022-10-04

Family

ID=74108232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011019061.4A Active CN112235516B (en) 2020-09-24 2020-09-24 Video generation method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN112235516B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202317A (en) * 2016-07-01 2016-12-07 传线网络科技(上海)有限公司 Method of Commodity Recommendation based on video and device
CN106791893A (en) * 2016-11-14 2017-05-31 北京小米移动软件有限公司 Net cast method and device
CN107820709A (en) * 2016-12-20 2018-03-20 深圳市柔宇科技有限公司 A kind of broadcast interface method of adjustment and device
US20190132642A1 (en) * 2016-10-17 2019-05-02 Tencent Technology (Shenzhen) Company Limited Video processing method, video processing device, and storage medium
CN110012336A (en) * 2019-03-12 2019-07-12 广州虎牙信息科技有限公司 Picture configuration method, terminal and the device at interface is broadcast live
CN110602554A (en) * 2019-08-16 2019-12-20 华为技术有限公司 Cover image determining method, device and equipment

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN114827752A (en) * 2022-04-25 2022-07-29 中国平安人寿保险股份有限公司 Video generation method, video generation system, electronic device, and storage medium
CN114827752B (en) * 2022-04-25 2023-07-25 中国平安人寿保险股份有限公司 Video generation method, video generation system, electronic device and storage medium

Also Published As

Publication number Publication date
CN112235516B (en) 2022-10-04

Similar Documents

Publication Publication Date Title
CN110968736B (en) Video generation method and device, electronic equipment and storage medium
US20200314482A1 (en) Control method and apparatus
CN111930994A (en) Video editing processing method and device, electronic equipment and storage medium
CN113287118A (en) System and method for face reproduction
CN110868635B (en) Video processing method and device, electronic equipment and storage medium
CN113870133B (en) Multimedia display and matching method, device, equipment and medium
CN111241340A (en) Video tag determination method, device, terminal and storage medium
CN113099297B (en) Method and device for generating click video, electronic equipment and storage medium
CN110266968B (en) Method and device for making dancing video
WO2017157135A1 (en) Media information processing method, media information processing device and storage medium
CN114025189A (en) Virtual object generation method, device, equipment and storage medium
CN113302622A (en) System and method for providing personalized video
WO2023045635A1 (en) Multimedia file subtitle processing method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN113709545A (en) Video processing method and device, computer equipment and storage medium
CN113542833A (en) Video playing method, device and equipment based on face recognition and storage medium
CN112866577B (en) Image processing method and device, computer readable medium and electronic equipment
CN112235516B (en) Video generation method, device, server and storage medium
CN110347869B (en) Video generation method and device, electronic equipment and storage medium
CN113727039B (en) Video generation method and device, electronic equipment and storage medium
CN113766268A (en) Video processing method and device, electronic equipment and readable medium
CN113596574A (en) Video processing method, video processing apparatus, electronic device, and readable storage medium
WO2023241377A1 (en) Video data processing method and device, equipment, system, and storage medium
KR101898765B1 (en) Auto Content Creation Methods and System based on Content Recognition Technology
CN111405371A (en) Data processing method and related device
CN112188116B (en) Video synthesis method, client and system based on object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant