CN114760534A - Video generation method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN114760534A
CN114760534A
Authority
CN
China
Prior art keywords
target
segment
sub
candidate
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210311691.1A
Other languages
Chinese (zh)
Other versions
CN114760534B (en)
Inventor
王愈
李健
陈明
武卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinovoice Technology Co Ltd
Original Assignee
Beijing Sinovoice Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinovoice Technology Co Ltd filed Critical Beijing Sinovoice Technology Co Ltd
Priority to CN202210311691.1A priority Critical patent/CN114760534B/en
Publication of CN114760534A publication Critical patent/CN114760534A/en
Application granted granted Critical
Publication of CN114760534B publication Critical patent/CN114760534B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H04N21/8456: Structuring of content, e.g. decomposing content into time segments, by decomposing the content in the time domain
    • H04N21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Circuits (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application relates to a video generation method and apparatus, an electronic device, and a readable storage medium, in the technical field of video processing. The method comprises the following steps: acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment; constructing a target candidate grid according to the audio sub-segments and the candidate video sub-segments; performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment; and generating a target video according to each audio sub-segment and the target video sub-segment. Because global dynamic programming is performed according to the target candidate grid, each selected target video sub-segment accounts both for its similarity to the corresponding audio sub-segment and for the continuity between consecutive selected sub-segments. This solves the technical problem in the prior art of poor continuity at the junction of two adjacent video clips.

Description

Video generation method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video generation method and apparatus, an electronic device, and a readable storage medium.
Background
With the development of virtual anchor technology, an image sequence often needs to be modified locally according to audio: the correspondence between the audio and a particular frame of the image sequence is configured according to the playback order of the audio and the image sequence. If the picture implied by the audio differs too much from the picture in the original image, the required modification is large, the modification easily falls short, and the modified result is unsatisfactory.
To address this problem, the prior art pre-selects, before running the core algorithm, a video segment close to the desired action according to the audio, thereby reducing the modification range required of the core algorithm, so that the generated new image more easily matches the spoken content.
However, although the prior-art solution pre-selects the closest video segment for each audio segment, it does not consider the continuity between consecutive selected segments, so jumps may occur; for example, the head may still be swinging to the left at the end of one video segment and then abruptly turn to the right at the start of the next. The prior-art solution therefore suffers from poor continuity at the junction of two adjacent video clips.
Disclosure of Invention
In order to overcome the problems in the related art, the present application provides a video generation method, an apparatus, an electronic device, and a readable storage medium.
According to a first aspect of embodiments of the present application, there is provided a video generation method, including:
acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the plurality of candidate video sub-segments are acquired according to a target distance, and the target distance is used for determining the similarity between each candidate video sub-segment and its corresponding audio sub-segment;
constructing a target candidate grid according to the audio sub-segment and the candidate video sub-segment;
performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment;
and generating a target video according to each audio sub-segment and the target video sub-segment.
Optionally, the constructing a target candidate grid according to the audio sub-segment and the candidate video sub-segment includes:
determining each audio sub-segment as a column in the target candidate grid;
determining the plurality of candidate video sub-segments corresponding to each audio sub-segment as the rows of that column in the target candidate grid;
and combining the columns and the rows to obtain the target candidate grid.
Optionally, the performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment includes:
acquiring the target distances corresponding to the plurality of candidate video sub-segments in column n-1 of the target candidate grid, wherein n is an integer greater than or equal to 2.
Optionally, after the step of acquiring the target distances corresponding to the plurality of candidate video sub-segments in column n-1 of the target candidate grid, where n is an integer greater than or equal to 2, the method further includes:
calculating, according to the plurality of candidate video sub-segments in column n-1 of the target candidate grid and the plurality of candidate video sub-segments in column n of the target candidate grid, the join distances corresponding to the plurality of candidate video sub-segments in column n of the target candidate grid, wherein the join distance is used for determining how closely two adjacent candidate video sub-segments match at their junction;
calculating, according to the target distance and the join distance, the total distances corresponding to the plurality of candidate video sub-segments;
sorting the total distances corresponding to the plurality of candidate video sub-segments to obtain a target total distance;
and determining the candidate video sub-segment corresponding to the target total distance as the target video sub-segment.
Optionally, the generating a target video according to each audio sub-segment and the target video sub-segment includes:
inputting each audio sub-segment and each target video sub-segment into a pre-generated face modification model to obtain a target image sequence;
and combining each audio sub-segment and the target image sequence to generate a target video.
According to a second aspect of embodiments of the present application, there is provided a video generating apparatus, the apparatus including:
the data acquisition module is used for acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the plurality of candidate video sub-segments are acquired according to a target distance, and the target distance is used for determining the similarity between each candidate video sub-segment and its corresponding audio sub-segment;
the data construction module is used for constructing a target candidate grid according to the audio sub-segment and the candidate video sub-segment;
The data global dynamic programming module is used for performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment;
and the data generation module is used for generating a target video according to each audio sub-segment and the target video sub-segment.
Optionally, the data construction module includes:
a first determining sub-module, configured to determine each audio sub-segment as a column in the target candidate grid;
a second determining sub-module, configured to determine the plurality of candidate video sub-segments corresponding to each audio sub-segment as the rows of that column in the target candidate grid;
and a data combination sub-module, configured to combine the columns and the rows of the target candidate grid to obtain the target candidate grid.
Optionally, the data global dynamic programming module includes:
and a target distance obtaining sub-module, configured to acquire the target distances corresponding to the plurality of candidate video sub-segments in column n-1 of the target candidate grid, wherein n is an integer greater than or equal to 2.
Optionally, the data global dynamic programming module further includes:
a join distance obtaining sub-module, configured to calculate, according to the plurality of candidate video sub-segments in column n-1 of the target candidate grid and the plurality of candidate video sub-segments in column n of the target candidate grid, the join distances corresponding to the plurality of candidate video sub-segments in column n of the target candidate grid, wherein the join distance is used for determining how closely two adjacent candidate video sub-segments match at their junction;
a total distance obtaining sub-module, configured to calculate, according to the target distance and the join distance, the total distances corresponding to the plurality of candidate video sub-segments;
a target total distance obtaining sub-module, configured to sort the total distances corresponding to the plurality of candidate video sub-segments to obtain a target total distance;
and a target video sub-segment determining sub-module, configured to determine the candidate video sub-segment corresponding to the target total distance as the target video sub-segment.
Optionally, the data generating module includes:
the target image sequence acquisition sub-module is used for inputting each audio sub-segment and each target video sub-segment into a pre-generated face modification model to obtain a target image sequence;
And the target video generation sub-module is used for combining each audio sub-segment and the target image sequence to generate a target video.
According to a third aspect of embodiments of the present application, there is provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video generation method.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the video generation method.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
the method comprises: acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the plurality of candidate video sub-segments are acquired according to a target distance, and the target distance is used for determining the similarity between each candidate video sub-segment and its corresponding audio sub-segment; constructing a target candidate grid according to the audio sub-segments and the candidate video sub-segments; performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment; and generating a target video according to each audio sub-segment and the target video sub-segment. According to the technical scheme provided by the application, global dynamic programming is performed according to the target candidate grid, so that each selected target video sub-segment accounts both for its similarity to the corresponding audio sub-segment and for the continuity between consecutive selected sub-segments. This solves the technical problem in the prior art of poor continuity at the junction of two adjacent video clips.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a method of video generation in accordance with an exemplary embodiment;
FIG. 2 is a flowchart illustrating a step 102 of a flowchart of one method of video generation shown in FIG. 1 according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating step 103 of a flowchart of one method of video generation shown in FIG. 1 according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating step 104 of a flowchart of one method of video generation shown in FIG. 1 according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating an apparatus for video generation in accordance with an exemplary embodiment;
FIG. 6 is an apparatus block diagram of a data construction module 502 in the apparatus block diagram of one video generation shown in FIG. 5 according to an example embodiment;
FIG. 7 is a block diagram of a data global dynamic programming module 503 in the block diagram of an apparatus for video generation shown in FIG. 5 according to an exemplary embodiment;
FIG. 8 is an apparatus block diagram of the data generation module 504 in the apparatus block diagram of a video generation shown in FIG. 5 according to an example embodiment;
FIG. 9 is a block diagram of an electronic device shown in accordance with an exemplary embodiment;
FIG. 10 illustrates a target candidate grid for global dynamic planning, according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Fig. 1 is a flow chart illustrating a video generation method according to an exemplary embodiment; as shown in fig. 1, the method includes the following steps.
Step 101, obtaining a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the plurality of candidate video sub-segments are obtained according to a target distance, and the target distance is used for determining the similarity between each candidate video sub-segment and the audio sub-segment corresponding to each candidate video sub-segment.
It should be noted that, in the embodiment of the present application, the target distance is used to determine the similarity between each candidate video sub-segment and its corresponding audio sub-segment. The video sub-segments are arranged in ascending order of target distance. From the ascending-ordered video sub-segments, the first Sn sub-segments, i.e. those with the smallest target distances, are selected as the plurality of candidate video sub-segments for the n-th audio sub-segment, where Sn is a preset value. It should be noted that the specific value of the preset value Sn is not specifically limited in the embodiment of the present application.
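As an illustration of this selection step, a minimal Python sketch (the function name, the `target_distance` callable, and the toy data below are assumptions for illustration, not part of the patent):

```python
def select_candidates(audio_subsegment, video_subsegments, target_distance, s_n):
    """Keep the s_n video sub-segments with the smallest target distance
    (i.e. the highest similarity) to the given audio sub-segment."""
    # Sort the video sub-segments in ascending order of target distance,
    # then take the first s_n as the candidate video sub-segments.
    ranked = sorted(video_subsegments,
                    key=lambda v: target_distance(audio_subsegment, v))
    return ranked[:s_n]
```

Applied once per audio sub-segment, this yields the Sn candidates that fill one column of the target candidate grid described next.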
Step 102, constructing a target candidate grid according to the audio sub-segments and the candidate video sub-segments.
It should be noted that, in the embodiment of the present application, the target candidate grid is constructed according to the audio sub-segments and the candidate video sub-segments.
Further, in the embodiment of the present application, as shown in fig. 2, step 102 includes the following steps.
Step 201, determining each audio sub-segment as each column in the target candidate grid.
Step 202, determining the plurality of candidate video sub-segments corresponding to each audio sub-segment as the rows of that column in the target candidate grid.
Step 203, combining each column in the target candidate grid and each row in the target candidate grid to obtain the target candidate grid.
It should be noted that, in the embodiment of the present application, each audio sub-segment is a column in the target candidate grid. For example, as shown in FIG. 10, columns 1, 2, …, n correspond to the first audio sub-segment, the second audio sub-segment, …, and the n-th audio sub-segment.
The plurality of candidate video sub-segments corresponding to each audio sub-segment are the rows of that column in the target candidate grid. For example, as shown in FIG. 10, Candidate 1, Candidate 2, …, Candidate S1 in the first column refer to candidate video sub-segments 1, 2, …, S1 corresponding to the first audio sub-segment; likewise, Candidate 1, Candidate 2, …, Candidate S2 in the second column refer to candidate video sub-segments 1, 2, …, S2 corresponding to the second audio sub-segment; and so on, up to Candidate 1, Candidate 2, …, Candidate Sn in the n-th column, which refer to candidate video sub-segments 1, 2, …, Sn corresponding to the n-th audio sub-segment.
The target candidate grid is constructed from its columns, i.e. the audio sub-segments, together with the plurality of candidate video sub-segments corresponding to each column.
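The grid construction above can be sketched as follows; the list-of-columns representation is an assumption, chosen because different columns may hold different numbers of candidates (S1, S2, …, Sn):

```python
def build_candidate_grid(candidates_per_audio):
    """Build the target candidate grid as a list of columns.

    Column n corresponds to the n-th audio sub-segment; the rows of
    column n are its candidate video sub-segments
    (Candidate 1, ..., Candidate Sn).
    """
    return [list(candidates) for candidates in candidates_per_audio]

# Toy grid: three audio sub-segments with S1 = 3, S2 = 2, S3 = 3 candidates.
grid = build_candidate_grid([
    ["v1", "v7", "v3"],  # candidates for audio sub-segment 1
    ["v2", "v5"],        # candidates for audio sub-segment 2
    ["v4", "v6", "v8"],  # candidates for audio sub-segment 3
])
```

Indexing `grid[n - 1]` then gives the column of candidates among which the target video sub-segment for the n-th audio sub-segment is later chosen.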
Step 103, performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment.
It should be noted that, in the embodiment of the present application, global dynamic programming is performed according to the target candidate grid, so as to obtain the target video sub-segment corresponding to each audio sub-segment.
Further, in the embodiment of the present application, step 103 includes the following step: acquiring the target distances corresponding to the plurality of candidate video sub-segments in column n-1 of the target candidate grid, where n is an integer greater than or equal to 2.
It should be noted that, in the embodiment of the present application, the target distances corresponding to the plurality of candidate video sub-segments in column n-1 of the target candidate grid are acquired, where n is an integer greater than or equal to 2. For example, as shown in FIG. 10, the target distances respectively corresponding to Candidate 1, Candidate 2, …, Candidate S1 in column 1 of the target candidate grid are acquired.
Further, in the embodiment of the present application, as shown in fig. 3, step 103 further includes the following steps.
Step 301, calculating, according to the plurality of candidate video sub-segments in column n-1 of the target candidate grid and the plurality of candidate video sub-segments in column n of the target candidate grid, the join distances corresponding to the plurality of candidate video sub-segments in column n of the target candidate grid, where the join distance is used to determine how closely two adjacent candidate video sub-segments match at their junction.
It should be noted that, in the embodiment of the present application, the join distance is used to determine how closely two adjacent candidate video sub-segments match at their junction. For each candidate video sub-segment in each column of the target candidate grid, the join distance to every candidate sub-segment in the previous column needs to be calculated. Specifically, for each candidate video sub-segment in the n-th column of the target candidate grid, the join distance to each candidate sub-segment in the (n-1)-th column must be calculated. Taking one candidate video sub-segment in the n-th column and one candidate video sub-segment in the (n-1)-th column as an example, the join distance between the two candidate video sub-segments is calculated as follows: 1) for the current candidate video sub-segment in the n-th column, find the corresponding image characterization vector in a lookup table of <image segment sequence number, image characterization vector>; the latter half of this vector is its backward vector; 2) for the candidate video sub-segment in the (n-1)-th column, find the corresponding image characterization vector in the same lookup table; the first half of this vector is its forward vector; 3) calculate the distance (cosine distance, or another type of vector distance) between the backward vector from 1) and the forward vector from 2) as the join distance between the two candidate video sub-segments.
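A minimal sketch of steps 1) to 3) using cosine distance; the vector layout (forward half first, backward half second) follows the description above, while the function and variable names are assumptions:

```python
import math

def join_distance(vector_n, vector_n_minus_1):
    """Join distance between a candidate in column n and one in column n-1.

    Per the description: take the backward vector (latter half) of the
    column-n candidate's image characterization vector and the forward
    vector (first half) of the column-(n-1) candidate's vector, and
    return the cosine distance between them.
    """
    half = len(vector_n) // 2
    backward = vector_n[half:]          # latter half: backward vector
    forward = vector_n_minus_1[:half]   # first half: forward vector
    dot = sum(b * f for b, f in zip(backward, forward))
    norm = (math.sqrt(sum(b * b for b in backward))
            * math.sqrt(sum(f * f for f in forward)))
    # Cosine distance = 1 - cosine similarity; 0 means a perfect match.
    return 1.0 - dot / norm
```

Any other vector distance mentioned in the description (e.g. Euclidean) could be substituted for the cosine distance here.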
Step 302, calculating, according to the target distance and the join distance, the total distances corresponding to the plurality of candidate video sub-segments.
Step 303, sorting the total distances corresponding to the plurality of candidate video sub-segments to obtain a target total distance.
Step 304, determining the candidate video sub-segment corresponding to the target total distance as the target video sub-segment.
It should be noted that, in the embodiment of the present application, the total distances corresponding to the plurality of candidate video sub-segments are calculated according to the target distance and the join distance. Specifically, taking one candidate video sub-segment in the n-th column and one candidate video sub-segment in the (n-1)-th column as an example, the total distance corresponding to the candidate in the n-th column is the join distance between it and the candidate in the (n-1)-th column plus the target distance corresponding to that candidate in the (n-1)-th column.
The total distances corresponding to the plurality of candidate video sub-segments are arranged in ascending order to obtain the target total distance. Specifically, the total distances corresponding to all candidate video sub-segments in the n-th column are arranged in ascending order, and the minimum total distance is selected as the target total distance. Further, the candidate video sub-segment corresponding to the target total distance is determined as the target video sub-segment for the n-th column, that is, for the n-th audio sub-segment.
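Under one reading of this procedure (the handling of the first column and the minimisation over the previous column are assumptions; the text above only spells out the recurrence between columns n-1 and n), the selection could be sketched as:

```python
def select_target_subsegments(grid, target_distance, join_distance):
    """Pick one target video sub-segment per column of the candidate grid.

    For each candidate in column n (n >= 2), its total distance is the
    join distance to a column-(n-1) candidate plus that candidate's
    target distance, minimised over column n-1; the candidate with the
    smallest total distance in column n is selected.
    """
    # Assumed first-column rule: take the candidate with the smallest
    # target distance (the text does not spell this case out).
    selected = [min(grid[0], key=target_distance)]
    for n in range(1, len(grid)):
        totals = {
            cand: min(join_distance(cand, prev) + target_distance(prev)
                      for prev in grid[n - 1])
            for cand in grid[n]
        }
        # The minimum total distance is the target total distance; its
        # candidate becomes the target video sub-segment for this column.
        selected.append(min(totals, key=totals.get))
    return selected
```

Here `target_distance` and `join_distance` are the lookup callables from the previous steps; with per-column candidate counts S1, …, Sn this runs in O(sum of Sn * S(n-1)) distance evaluations.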
Step 104, generating a target video according to each audio sub-segment and the target video sub-segment.
It should be noted that, in the embodiment of the present application, the target video is generated according to each audio sub-segment and the target video sub-segment.
Further, in the embodiment of the present application, as shown in fig. 4, step 104 includes the following steps.
Step 401, inputting each audio sub-segment and each target video sub-segment into a pre-generated face modification model to obtain a target image sequence.
Step 402, combining each audio sub-segment and the target image sequence to generate a target video.
It should be noted that, in the embodiment of the present application, each audio sub-segment and each target video sub-segment are used as inputs to a pre-generated face modification model, and the target image sequence is obtained from the output of the face modification model. Combining each audio sub-segment with the target image sequence generates the target video.
The method comprises: acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the plurality of candidate video sub-segments are acquired according to a target distance, and the target distance is used for determining the similarity between each candidate video sub-segment and its corresponding audio sub-segment; constructing a target candidate grid according to the audio sub-segments and the candidate video sub-segments; performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment; and generating a target video according to each audio sub-segment and the target video sub-segment. With the technical scheme provided by the embodiment of the present application, global dynamic programming is performed according to the target candidate grid, so that each selected target video sub-segment accounts both for its similarity to the corresponding audio sub-segment and for the continuity between consecutive selected sub-segments. This solves the technical problem in the prior art of poor continuity at the junction of two adjacent video clips. Moreover, by selecting the plurality of candidate video sub-segments for each audio sub-segment according to the target distance, the similarity between each candidate video sub-segment and its corresponding audio sub-segment is taken into account.
By calculating the join distance between each candidate video sub-segment in each column of the target candidate grid and every candidate sub-segment in the previous column, how closely two adjacent candidate video sub-segments match at the junction is also taken into account. The total distances corresponding to the candidate video sub-segments are calculated according to the target distance and the join distance and then sorted; the minimum total distance, i.e. the target total distance, is obtained, and the candidate video sub-segment corresponding to the target total distance can then be determined to be the video sub-segment that is both highly similar to its audio sub-segment and well matched in continuity with the video segments before and after it.
Fig. 5 is a block diagram illustrating an apparatus for video generation according to an exemplary embodiment, and referring to fig. 5, the apparatus includes a data acquisition module 501, a data construction module 502, a data global dynamic programming module 503, and a data generation module 504.
A data obtaining module 501, configured to obtain a plurality of candidate video sub-segments corresponding to each audio sub-segment, where the plurality of candidate video sub-segments are obtained according to a target distance, and the target distance is used to determine the similarity between each candidate video sub-segment and its corresponding audio sub-segment.
A data constructing module 502, configured to construct a target candidate grid according to the audio sub-segment and the candidate video sub-segment.
A data global dynamic programming module 503, configured to perform global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment.
A data generating module 504, configured to generate a target video according to each audio sub-segment and the target video sub-segment.
Fig. 6 is a block diagram of the data construction module 502 of the video generation apparatus shown in fig. 5 according to an exemplary embodiment. Referring to fig. 6, the module includes a first determination sub-module 601, a second determination sub-module 602, and a data combination sub-module 603.
A first determining sub-module 601, configured to determine each audio sub-segment as each column in the target candidate grid.
A second determining sub-module 602, configured to determine, as each row in the target candidate grid, a number of candidate video sub-segments corresponding to each of the audio sub-segments.
The data combining submodule 603 is configured to combine each column in the target candidate grid with each row in the target candidate grid to obtain the target candidate grid.
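A minimal sketch of this grid construction (the data layout is an assumption, chosen only to mirror the column-per-audio-segment, row-per-candidate description above):

```python
def build_candidate_grid(candidates_per_audio):
    """candidates_per_audio[n] is the list of (video_sub_segment_id,
    target_distance) pairs selected for audio sub-segment n.
    The grid is stored column-major: grid[n][i] is candidate i (row i)
    in column n, i.e. each audio sub-segment is one column and its
    candidate video sub-segments are the rows of that column."""
    return [list(column) for column in candidates_per_audio]

grid = build_candidate_grid([
    [("v3", 0.2), ("v7", 0.5)],   # candidates for audio sub-segment 0
    [("v1", 0.1), ("v4", 0.4)],   # candidates for audio sub-segment 1
])
```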
Further, in an exemplary embodiment, the data global dynamic programming module of the video generation apparatus shown in fig. 5 includes: a target distance obtaining submodule, configured to obtain the target distances corresponding to a plurality of candidate video sub-segments in column n-1 of the target candidate grid, where n is an integer greater than or equal to 2.
Fig. 7 is a block diagram of the data global dynamic programming module 503 of the video generation apparatus shown in fig. 5 according to an exemplary embodiment. Referring to fig. 7, the module includes a connection distance obtaining submodule 701, a total distance obtaining submodule 702, a target total distance obtaining submodule 703, and a target video sub-segment determining submodule 704.
The connection distance obtaining sub-module 701 is configured to compute, from the candidate video sub-segments in column n-1 of the target candidate grid and the candidate video sub-segments in column n, the connection distances corresponding to the candidate video sub-segments in column n, where the connection distance is used to determine the closeness of the junction between two adjacent candidate video sub-segments.
The total distance obtaining sub-module 702 is configured to compute, from the target distance and the connection distance, the total distance corresponding to each candidate video sub-segment.
The target total distance obtaining sub-module 703 is configured to sort the total distances corresponding to the plurality of candidate video sub-segments to obtain the minimum total distance as the target total distance.
The target video sub-segment determining sub-module 704 is configured to determine the candidate video sub-segment corresponding to the target total distance as the target video sub-segment.
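The selection performed by sub-modules 701–704 amounts to a Viterbi-style dynamic program over the candidate grid. One hedged sketch, in which the distance values and the `link_dist` function are placeholders rather than the patent's specified computations:

```python
def viterbi_select(target_dists, link_dist):
    """target_dists[n][i]: target distance of candidate i in column n.
    link_dist(n, i, j): connection distance between candidate j in
    column n-1 and candidate i in column n (a stand-in for the
    patent's junction-closeness measure).
    Returns one candidate index per column minimizing the sum of
    target distances plus connection distances along the path."""
    total = [list(target_dists[0])]          # best cumulative distance per cell
    back = [[-1] * len(target_dists[0])]     # best predecessor per cell
    for n in range(1, len(target_dists)):
        col_total, col_back = [], []
        for i, td in enumerate(target_dists[n]):
            j_best = min(range(len(total[n - 1])),
                         key=lambda j: total[n - 1][j] + link_dist(n, i, j))
            col_total.append(total[n - 1][j_best] + link_dist(n, i, j_best) + td)
            col_back.append(j_best)
        total.append(col_total)
        back.append(col_back)
    # the minimum total distance in the last column is the target total distance
    i = min(range(len(total[-1])), key=total[-1].__getitem__)
    path = [i]
    for n in range(len(target_dists) - 1, 0, -1):
        i = back[n][i]
        path.append(i)
    return path[::-1]
```

With a large connection penalty the path favors continuity across columns; with zero connection cost it degenerates to the per-column nearest candidate, which illustrates the trade-off the grid search balances.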
Fig. 8 is a block diagram of the data generation module 504 of the video generation apparatus shown in fig. 5 according to an exemplary embodiment. Referring to fig. 8, the module includes a target image sequence acquisition sub-module 801 and a target video generation sub-module 802.
The target image sequence acquisition sub-module 801 is configured to input each audio sub-segment and each target video sub-segment into a pre-generated face modification model to obtain a target image sequence.
The target video generation sub-module 802 is configured to combine each audio sub-segment with the target image sequence to generate the target video.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 9 is a block diagram illustrating an electronic device 900 in accordance with an example embodiment. For example, the electronic device 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 9, electronic device 900 may include one or more of the following components: processing component 902, memory 904, power component 906, multimedia component 908, audio component 910, input/output interface 912, sensor component 914, and communication component 916.
The processing component 902 generally controls overall operation of the device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing component 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 902 can include one or more modules that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation at the device 900. Examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 906 provides power to the various components of the electronic device 900. The power components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 900.
The multimedia components 908 include a screen that provides an output interface between the electronic device 900 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 900 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 900 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 further includes a speaker for outputting audio signals.
Input/output interface 912 provides an interface between processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the electronic device 900. For example, the sensor assembly 914 may detect the open/closed state of the electronic device 900 and the relative positioning of components, such as the display and keypad of the electronic device 900. The sensor assembly 914 may also detect a change in the position of the electronic device 900 or a component of the electronic device 900, the presence or absence of user contact with the electronic device 900, the orientation or acceleration/deceleration of the electronic device 900, and a change in the temperature of the electronic device 900. The sensor assembly 914 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the electronic device 900 and other devices. The electronic device 900 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 904 comprising instructions, executable by the processor 920 of the electronic device 900 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A method of video generation, the method comprising:
acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the candidate video sub-segments are acquired according to a target distance, and the target distance is used for determining the similarity of each candidate video sub-segment and the audio sub-segment corresponding to each candidate video sub-segment;
Constructing a target candidate grid according to the audio sub-segment and the candidate video sub-segment;
performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment;
and generating a target video according to each audio sub-segment and the target video sub-segment.
2. The method of claim 1, wherein said constructing a target candidate grid from said audio sub-segments and said candidate video sub-segments comprises:
determining each audio sub-segment as each column in a target candidate grid;
determining a number of the candidate video sub-segments corresponding to each of the audio sub-segments as each row in the target candidate grid;
and combining according to each column in the target candidate grids and each row in the target candidate grids to obtain the target candidate grids.
3. The video generation method according to claim 1, wherein the performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment comprises:
and acquiring the target distances corresponding to a plurality of candidate video sub-segments in column n-1 of the target candidate grid, wherein n is an integer greater than or equal to 2.
4. The method according to claim 1, wherein after the step of acquiring the target distances corresponding to a plurality of candidate video sub-segments in column n-1 of the target candidate grid, where n is an integer greater than or equal to 2, the method further comprises:
calculating according to a plurality of candidate video sub-segments in column n-1 of the target candidate grid and a plurality of candidate video sub-segments in column n of the target candidate grid, to obtain a connection distance corresponding to the plurality of candidate video sub-segments in column n of the target candidate grid, wherein the connection distance is used for determining the closeness of the connection position of two adjacent candidate video sub-segments;
calculating according to the target distance and the connection distance to obtain a total distance corresponding to a plurality of candidate video sub-segments;
sequencing according to the total distance corresponding to the candidate video sub-segments to obtain a target total distance;
and determining the candidate video sub-segment corresponding to the target total distance as a target video sub-segment.
5. The method of claim 1, wherein said generating a target video from each of said audio sub-segments and said target video sub-segments comprises:
Inputting each audio sub-segment and each target video sub-segment into a pre-generated face modification model to obtain a target image sequence;
and combining each audio sub-segment and the target image sequence to generate a target video.
6. A video generation apparatus, characterized in that the apparatus comprises:
the data acquisition module is used for acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, and the candidate video sub-segments are acquired according to a target distance, wherein the target distance is used for determining the similarity between each candidate video sub-segment and the audio sub-segment corresponding to each candidate video sub-segment;
the data construction module is used for constructing a target candidate grid according to the audio sub-segment and the candidate video sub-segment;
the data global dynamic programming module is used for carrying out global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment;
and the data generation module is used for generating a target video according to each audio sub-segment and the target video sub-segment.
7. The video generating apparatus according to claim 6, wherein the data constructing module comprises:
A first determining submodule, configured to determine each audio sub-segment as each column in the target candidate grid;
a second determining sub-module, configured to determine a number of the candidate video sub-segments corresponding to each of the audio sub-segments as each row in the target candidate grid;
and the data combination submodule is used for combining each column in the target candidate grids and each row in the target candidate grids to obtain the target candidate grids.
8. The video generation apparatus according to claim 6, wherein the data global dynamic programming module comprises:
and the target distance obtaining submodule is used for obtaining the target distances corresponding to a plurality of candidate video sub-segments in column n-1 of the target candidate grid, wherein n is an integer greater than or equal to 2.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video generation method of any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, wherein the computer program is configured to implement the video generation method according to any one of claims 1 to 5 when executed by a processor.
CN202210311691.1A 2022-03-28 2022-03-28 Video generation method, device, electronic equipment and readable storage medium Active CN114760534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210311691.1A CN114760534B (en) 2022-03-28 2022-03-28 Video generation method, device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210311691.1A CN114760534B (en) 2022-03-28 2022-03-28 Video generation method, device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN114760534A true CN114760534A (en) 2022-07-15
CN114760534B CN114760534B (en) 2024-03-01

Family

ID=82327479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210311691.1A Active CN114760534B (en) 2022-03-28 2022-03-28 Video generation method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114760534B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5880788A (en) * 1996-03-25 1999-03-09 Interval Research Corporation Automated synchronization of video image sequences to new soundtracks
KR101393351B1 (en) * 2013-06-04 2014-05-09 주식회사 텔레칩스 Method of providing automatic setting of audio configuration of receiver's televisions optimized for multimedia contents to play, and computer-readable recording medium for the same
CN109977262A (en) * 2019-03-25 2019-07-05 北京旷视科技有限公司 The method, apparatus and processing equipment of candidate segment are obtained from video
CN110446066A (en) * 2019-08-28 2019-11-12 北京百度网讯科技有限公司 Method and apparatus for generating video
CN111212245A (en) * 2020-01-15 2020-05-29 北京猿力未来科技有限公司 Method and device for synthesizing video
CN111783566A (en) * 2020-06-15 2020-10-16 神思电子技术股份有限公司 Video synthesis method based on lip language synchronization and expression adaptation effect enhancement
CN112235631A (en) * 2019-07-15 2021-01-15 北京字节跳动网络技术有限公司 Video processing method and device, electronic equipment and storage medium
CN112927712A (en) * 2021-01-25 2021-06-08 网易(杭州)网络有限公司 Video generation method and device and electronic equipment
CN113507627A (en) * 2021-07-08 2021-10-15 北京的卢深视科技有限公司 Video generation method and device, electronic equipment and storage medium
CN114025235A (en) * 2021-11-12 2022-02-08 北京捷通华声科技股份有限公司 Video generation method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN114760534B (en) 2024-03-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant