CN114760534A - Video generation method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN114760534A
CN114760534A
Authority
CN
China
Prior art keywords
target
segment
sub
candidate
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210311691.1A
Other languages
Chinese (zh)
Other versions
CN114760534B (en)
Inventor
王愈
李健
陈明
武卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinovoice Technology Co Ltd
Original Assignee
Beijing Sinovoice Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinovoice Technology Co Ltd filed Critical Beijing Sinovoice Technology Co Ltd
Priority to CN202210311691.1A priority Critical patent/CN114760534B/en
Publication of CN114760534A publication Critical patent/CN114760534A/en
Application granted granted Critical
Publication of CN114760534B publication Critical patent/CN114760534B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H04N21/8456: Structuring of content, e.g. decomposing content into time segments, by decomposing the content in the time domain
    • H04N21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Circuits (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application relates to a video generation method and apparatus, an electronic device, and a readable storage medium, in the technical field of video processing. The method comprises the following steps: acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment; constructing a target candidate grid according to the audio sub-segments and the candidate video sub-segments; performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment; and generating a target video according to each audio sub-segment and the target video sub-segment. Because global dynamic programming is performed according to the target candidate grid, each selected target video sub-segment accounts both for its similarity to the corresponding audio sub-segment and for the continuity between consecutive selected sub-segments. This solves the technical problem in the prior art of poor continuity at the junction of two adjacent video clips.

Description

Video generation method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video generation method and apparatus, an electronic device, and a readable storage medium.
Background
With the development of virtual anchor technology, an image sequence often needs to be modified locally according to audio: the correspondence between the audio and a particular frame of the image sequence is configured according to the playback order of the audio and the image sequence. If the picture implied by the audio differs too much from the picture in the original image, the required modification is large, the modification easily falls short, and the modified result is unsatisfactory.
To address this problem, the prior art pre-selects, before running the core algorithm, a video segment close to the desired action according to the audio, thereby reducing the modification range required of the core algorithm, so that the generated new image more easily matches the spoken content.
However, although the prior-art solution pre-selects the closest video segment for each audio segment, it does not consider the continuity between consecutive selected segments, so jumps may occur; for example, the head may still be swinging to the left at the end of one video segment and then abruptly turn to the right at the start of the next. The prior-art solution therefore suffers from poor continuity at the junction of two adjacent video clips.
Disclosure of Invention
In order to overcome the problems in the related art, the present application provides a video generation method, an apparatus, an electronic device, and a readable storage medium.
According to a first aspect of embodiments of the present application, there is provided a video generation method, including:
acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the plurality of candidate video sub-segments are acquired according to a target distance, and the target distance is used for determining the similarity between each candidate video sub-segment and its corresponding audio sub-segment;
constructing a target candidate grid according to the audio sub-segment and the candidate video sub-segment;
performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment;
and generating a target video according to each audio sub-segment and the target video sub-segment.
Optionally, the constructing a target candidate grid according to the audio sub-segment and the candidate video sub-segment includes:
determining each audio sub-segment as a column in the target candidate grid;
determining the plurality of candidate video sub-segments corresponding to each audio sub-segment as the rows of that column in the target candidate grid;
and combining the columns and the rows to obtain the target candidate grid.
Optionally, the performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment includes:
acquiring the target distances corresponding to the plurality of candidate video sub-segments in column n-1 of the target candidate grid, wherein n is an integer greater than or equal to 2.
Optionally, after the step of acquiring the target distances corresponding to the plurality of candidate video sub-segments in column n-1 of the target candidate grid, where n is an integer greater than or equal to 2, the method further includes:
calculating, according to the plurality of candidate video sub-segments in column n-1 of the target candidate grid and the plurality of candidate video sub-segments in column n of the target candidate grid, the join distances corresponding to the plurality of candidate video sub-segments in column n of the target candidate grid, wherein the join distance is used for determining how closely two adjacent candidate video sub-segments match at their junction;
calculating, according to the target distance and the join distance, the total distances corresponding to the plurality of candidate video sub-segments;
sorting the total distances corresponding to the plurality of candidate video sub-segments to obtain a target total distance;
and determining the candidate video sub-segment corresponding to the target total distance as the target video sub-segment.
Optionally, the generating a target video according to each audio sub-segment and the target video sub-segment includes:
inputting each audio sub-segment and each target video sub-segment into a pre-generated face modification model to obtain a target image sequence;
and combining each audio sub-segment and the target image sequence to generate a target video.
According to a second aspect of embodiments of the present application, there is provided a video generating apparatus, the apparatus including:
the data acquisition module is used for acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the plurality of candidate video sub-segments are acquired according to a target distance, and the target distance is used for determining the similarity between each candidate video sub-segment and its corresponding audio sub-segment;
the data construction module is used for constructing a target candidate grid according to the audio sub-segment and the candidate video sub-segment;
The data global dynamic programming module is used for performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment;
and the data generation module is used for generating a target video according to each audio sub-segment and the target video sub-segment.
Optionally, the data construction module includes:
a first determining sub-module, configured to determine each audio sub-segment as a column in the target candidate grid;
a second determining sub-module, configured to determine the plurality of candidate video sub-segments corresponding to each audio sub-segment as the rows of that column in the target candidate grid;
and a data combination sub-module, configured to combine the columns and the rows of the target candidate grid to obtain the target candidate grid.
Optionally, the data global dynamic programming module includes:
and a target distance obtaining sub-module, configured to acquire the target distances corresponding to the plurality of candidate video sub-segments in column n-1 of the target candidate grid, wherein n is an integer greater than or equal to 2.
Optionally, the data global dynamic programming module further includes:
a join distance obtaining sub-module, configured to calculate, according to the plurality of candidate video sub-segments in column n-1 of the target candidate grid and the plurality of candidate video sub-segments in column n of the target candidate grid, the join distances corresponding to the plurality of candidate video sub-segments in column n of the target candidate grid, wherein the join distance is used for determining how closely two adjacent candidate video sub-segments match at their junction;
a total distance obtaining sub-module, configured to calculate, according to the target distance and the join distance, the total distances corresponding to the plurality of candidate video sub-segments;
a target total distance obtaining sub-module, configured to sort the total distances corresponding to the plurality of candidate video sub-segments to obtain a target total distance;
and a target video sub-segment determining sub-module, configured to determine the candidate video sub-segment corresponding to the target total distance as the target video sub-segment.
Optionally, the data generating module includes:
the target image sequence acquisition sub-module is used for inputting each audio sub-segment and each target video sub-segment into a pre-generated face modification model to obtain a target image sequence;
And the target video generation sub-module is used for combining each audio sub-segment and the target image sequence to generate a target video.
According to a third aspect of embodiments of the present application, there is provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video generation method.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the video generation method.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
the method comprises: acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the plurality of candidate video sub-segments are acquired according to a target distance, and the target distance is used for determining the similarity between each candidate video sub-segment and its corresponding audio sub-segment; constructing a target candidate grid according to the audio sub-segments and the candidate video sub-segments; performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment; and generating a target video according to each audio sub-segment and the target video sub-segment. According to the technical scheme provided by the application, global dynamic programming is performed according to the target candidate grid, so that each selected target video sub-segment accounts both for its similarity to the corresponding audio sub-segment and for the continuity between consecutive selected sub-segments. This solves the technical problem in the prior art of poor continuity at the junction of two adjacent video clips.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a method of video generation in accordance with an exemplary embodiment;
FIG. 2 is a flowchart illustrating a step 102 of a flowchart of one method of video generation shown in FIG. 1 according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating step 103 of a flowchart of one method of video generation shown in FIG. 1 according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating step 104 of a flowchart of one method of video generation shown in FIG. 1 according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating an apparatus for video generation in accordance with an exemplary embodiment;
FIG. 6 is an apparatus block diagram of a data construction module 502 in the apparatus block diagram of one video generation shown in FIG. 5 according to an example embodiment;
FIG. 7 is a block diagram of a data global dynamic programming module 503 in the block diagram of an apparatus for video generation shown in FIG. 5 according to an exemplary embodiment;
FIG. 8 is an apparatus block diagram of the data generation module 504 in the apparatus block diagram of a video generation shown in FIG. 5 according to an example embodiment;
FIG. 9 is a block diagram of an electronic device shown in accordance with an exemplary embodiment;
FIG. 10 illustrates a target candidate grid for global dynamic planning, according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Fig. 1 is a flow chart illustrating a video generation method according to an exemplary embodiment; as shown in fig. 1, the method includes the following steps.
Step 101, obtaining a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the plurality of candidate video sub-segments are obtained according to a target distance, and the target distance is used for determining the similarity between each candidate video sub-segment and the audio sub-segment corresponding to each candidate video sub-segment.
It should be noted that, in the embodiment of the present application, the target distance is used to determine the similarity between each candidate video sub-segment and its corresponding audio sub-segment. The video sub-segments are arranged in ascending order of target distance. From the ascending-ordered video sub-segments, the first Sn sub-segments, i.e. those with the smallest target distances, are selected as the plurality of candidate video sub-segments for the n-th audio sub-segment, where Sn is a preset value. It should be noted that the specific value of the preset value Sn is not specifically limited in the embodiment of the present application.
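As an illustration of this selection step, a minimal Python sketch (the function name, the `target_distance` callable, and the toy data below are assumptions for illustration, not part of the patent):

```python
def select_candidates(audio_subsegment, video_subsegments, target_distance, s_n):
    """Keep the s_n video sub-segments with the smallest target distance
    (i.e. the highest similarity) to the given audio sub-segment."""
    # Sort the video sub-segments in ascending order of target distance,
    # then take the first s_n as the candidate video sub-segments.
    ranked = sorted(video_subsegments,
                    key=lambda v: target_distance(audio_subsegment, v))
    return ranked[:s_n]
```

Applied once per audio sub-segment, this yields the Sn candidates that fill one column of the target candidate grid described next.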
Step 102, constructing a target candidate grid according to the audio sub-segments and the candidate video sub-segments.
It should be noted that, in the embodiment of the present application, the target candidate grid is constructed according to the audio sub-segments and the candidate video sub-segments.
Further, in the embodiment of the present application, as shown in fig. 2, step 102 includes the following steps.
Step 201, determining each audio sub-segment as each column in the target candidate grid.
Step 202, determining the plurality of candidate video sub-segments corresponding to each audio sub-segment as the rows of that column in the target candidate grid.
Step 203, combining each column in the target candidate grid and each row in the target candidate grid to obtain the target candidate grid.
It should be noted that, in the embodiment of the present application, each audio sub-segment is a column in the target candidate grid. For example, as shown in FIG. 10, columns 1, 2, …, n correspond to the first audio sub-segment, the second audio sub-segment, …, and the n-th audio sub-segment.
The plurality of candidate video sub-segments corresponding to each audio sub-segment are the rows of that column in the target candidate grid. For example, as shown in FIG. 10, Candidate 1, Candidate 2, …, Candidate S1 in the first column refer to candidate video sub-segments 1, 2, …, S1 corresponding to the first audio sub-segment; likewise, Candidate 1, Candidate 2, …, Candidate S2 in the second column refer to candidate video sub-segments 1, 2, …, S2 corresponding to the second audio sub-segment; and so on, up to Candidate 1, Candidate 2, …, Candidate Sn in the n-th column, which refer to candidate video sub-segments 1, 2, …, Sn corresponding to the n-th audio sub-segment.
The target candidate grid is constructed from its columns, i.e. the audio sub-segments, together with the plurality of candidate video sub-segments corresponding to each column.
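The grid construction above can be sketched as follows; the list-of-columns representation is an assumption, chosen because different columns may hold different numbers of candidates (S1, S2, …, Sn):

```python
def build_candidate_grid(candidates_per_audio):
    """Build the target candidate grid as a list of columns.

    Column n corresponds to the n-th audio sub-segment; the rows of
    column n are its candidate video sub-segments
    (Candidate 1, ..., Candidate Sn).
    """
    return [list(candidates) for candidates in candidates_per_audio]

# Toy grid: three audio sub-segments with S1 = 3, S2 = 2, S3 = 3 candidates.
grid = build_candidate_grid([
    ["v1", "v7", "v3"],  # candidates for audio sub-segment 1
    ["v2", "v5"],        # candidates for audio sub-segment 2
    ["v4", "v6", "v8"],  # candidates for audio sub-segment 3
])
```

Indexing `grid[n - 1]` then gives the column of candidates among which the target video sub-segment for the n-th audio sub-segment is later chosen.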
Step 103, performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment.
It should be noted that, in the embodiment of the present application, global dynamic programming is performed according to the target candidate grid, so as to obtain the target video sub-segment corresponding to each audio sub-segment.
Further, in the embodiment of the present application, step 103 includes the following step: acquiring the target distances corresponding to the plurality of candidate video sub-segments in column n-1 of the target candidate grid, where n is an integer greater than or equal to 2.
It should be noted that, in the embodiment of the present application, the target distances corresponding to the plurality of candidate video sub-segments in column n-1 of the target candidate grid are acquired, where n is an integer greater than or equal to 2. For example, as shown in FIG. 10, the target distances respectively corresponding to Candidate 1, Candidate 2, …, Candidate S1 in column 1 of the target candidate grid are acquired.
Further, in the embodiment of the present application, as shown in fig. 3, step 103 further includes the following steps.
Step 301, calculating, according to the plurality of candidate video sub-segments in column n-1 of the target candidate grid and the plurality of candidate video sub-segments in column n of the target candidate grid, the join distances corresponding to the plurality of candidate video sub-segments in column n of the target candidate grid, where the join distance is used to determine how closely two adjacent candidate video sub-segments match at their junction.
It should be noted that, in the embodiment of the present application, the join distance is used to determine how closely two adjacent candidate video sub-segments match at their junction. For each candidate video sub-segment in each column of the target candidate grid, the join distance to every candidate sub-segment in the previous column needs to be calculated. Specifically, for each candidate video sub-segment in the n-th column of the target candidate grid, the join distance to each candidate sub-segment in the (n-1)-th column must be calculated. Taking one candidate video sub-segment in the n-th column and one candidate video sub-segment in the (n-1)-th column as an example, the join distance between the two candidate video sub-segments is calculated as follows: 1) for the current candidate video sub-segment in the n-th column, find the corresponding image characterization vector in a lookup table of <image segment sequence number, image characterization vector>; the latter half of this vector is its backward vector; 2) for the candidate video sub-segment in the (n-1)-th column, find the corresponding image characterization vector in the same lookup table; the first half of this vector is its forward vector; 3) calculate the distance (cosine distance, or another type of vector distance) between the backward vector from 1) and the forward vector from 2) as the join distance between the two candidate video sub-segments.
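A minimal sketch of steps 1) to 3) using cosine distance; the vector layout (forward half first, backward half second) follows the description above, while the function and variable names are assumptions:

```python
import math

def join_distance(vector_n, vector_n_minus_1):
    """Join distance between a candidate in column n and one in column n-1.

    Per the description: take the backward vector (latter half) of the
    column-n candidate's image characterization vector and the forward
    vector (first half) of the column-(n-1) candidate's vector, and
    return the cosine distance between them.
    """
    half = len(vector_n) // 2
    backward = vector_n[half:]          # latter half: backward vector
    forward = vector_n_minus_1[:half]   # first half: forward vector
    dot = sum(b * f for b, f in zip(backward, forward))
    norm = (math.sqrt(sum(b * b for b in backward))
            * math.sqrt(sum(f * f for f in forward)))
    # Cosine distance = 1 - cosine similarity; 0 means a perfect match.
    return 1.0 - dot / norm
```

Any other vector distance mentioned in the description (e.g. Euclidean) could be substituted for the cosine distance here.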
Step 302, calculating, according to the target distance and the join distance, the total distances corresponding to the plurality of candidate video sub-segments.
Step 303, sorting the total distances corresponding to the plurality of candidate video sub-segments to obtain a target total distance.
Step 304, determining the candidate video sub-segment corresponding to the target total distance as the target video sub-segment.
It should be noted that, in the embodiment of the present application, the total distances corresponding to the plurality of candidate video sub-segments are calculated according to the target distance and the join distance. Specifically, taking one candidate video sub-segment in the n-th column and one candidate video sub-segment in the (n-1)-th column as an example, the total distance corresponding to the candidate in the n-th column is the join distance between it and the candidate in the (n-1)-th column plus the target distance corresponding to that candidate in the (n-1)-th column.
The total distances corresponding to the plurality of candidate video sub-segments are arranged in ascending order to obtain the target total distance. Specifically, the total distances corresponding to all candidate video sub-segments in the n-th column are arranged in ascending order, and the minimum total distance is selected as the target total distance. Further, the candidate video sub-segment corresponding to the target total distance is determined as the target video sub-segment for the n-th column, that is, for the n-th audio sub-segment.
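Under one reading of this procedure (the handling of the first column and the minimisation over the previous column are assumptions; the text above only spells out the recurrence between columns n-1 and n), the selection could be sketched as:

```python
def select_target_subsegments(grid, target_distance, join_distance):
    """Pick one target video sub-segment per column of the candidate grid.

    For each candidate in column n (n >= 2), its total distance is the
    join distance to a column-(n-1) candidate plus that candidate's
    target distance, minimised over column n-1; the candidate with the
    smallest total distance in column n is selected.
    """
    # Assumed first-column rule: take the candidate with the smallest
    # target distance (the text does not spell this case out).
    selected = [min(grid[0], key=target_distance)]
    for n in range(1, len(grid)):
        totals = {
            cand: min(join_distance(cand, prev) + target_distance(prev)
                      for prev in grid[n - 1])
            for cand in grid[n]
        }
        # The minimum total distance is the target total distance; its
        # candidate becomes the target video sub-segment for this column.
        selected.append(min(totals, key=totals.get))
    return selected
```

Here `target_distance` and `join_distance` are the lookup callables from the previous steps; with per-column candidate counts S1, …, Sn this runs in O(sum of Sn * S(n-1)) distance evaluations.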
Step 104, generating a target video according to each audio sub-segment and the target video sub-segment.
It should be noted that, in the embodiment of the present application, the target video is generated according to each audio sub-segment and the target video sub-segment.
Further, in the embodiment of the present application, as shown in fig. 4, step 104 includes the following steps.
Step 401, inputting each audio sub-segment and each target video sub-segment into a pre-generated face modification model to obtain a target image sequence.
Step 402, combining each audio sub-segment and the target image sequence to generate a target video.
It should be noted that, in the embodiment of the present application, each audio sub-segment and each target video sub-segment are used as inputs to a pre-generated face modification model, and the target image sequence is obtained from the output of the face modification model. Combining each audio sub-segment with the target image sequence generates the target video.
The method comprises: acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the plurality of candidate video sub-segments are acquired according to a target distance, and the target distance is used for determining the similarity between each candidate video sub-segment and its corresponding audio sub-segment; constructing a target candidate grid according to the audio sub-segments and the candidate video sub-segments; performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment; and generating a target video according to each audio sub-segment and the target video sub-segment. With the technical scheme provided by the embodiment of the present application, global dynamic programming is performed according to the target candidate grid, so that each selected target video sub-segment accounts both for its similarity to the corresponding audio sub-segment and for the continuity between consecutive selected sub-segments. This solves the technical problem in the prior art of poor continuity at the junction of two adjacent video clips. Moreover, by selecting the plurality of candidate video sub-segments for each audio sub-segment according to the target distance, the similarity between each candidate video sub-segment and its corresponding audio sub-segment is taken into account.
By calculating the join distance between each candidate video sub-segment in each column of the target candidate grid and every candidate sub-segment in the previous column, how closely two adjacent candidate video sub-segments match at the junction is also taken into account. The total distances corresponding to the candidate video sub-segments are calculated according to the target distance and the join distance and then sorted; the minimum total distance, i.e. the target total distance, is obtained, and the candidate video sub-segment corresponding to the target total distance can then be determined to be the video sub-segment that is both highly similar to its audio sub-segment and well matched in continuity with the video segments before and after it.
Fig. 5 is a block diagram illustrating an apparatus for video generation according to an exemplary embodiment, and referring to fig. 5, the apparatus includes a data acquisition module 501, a data construction module 502, a data global dynamic programming module 503, and a data generation module 504.
A data obtaining module 501, configured to obtain a plurality of candidate video sub-segments corresponding to each audio sub-segment, where the plurality of candidate video sub-segments are obtained according to a target distance, and the target distance is used to determine the similarity between each candidate video sub-segment and its corresponding audio sub-segment.
A data constructing module 502, configured to construct a target candidate grid according to the audio sub-segment and the candidate video sub-segment.
A data global dynamic programming module 503, configured to perform global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment.
A data generating module 504, configured to generate a target video according to each audio sub-segment and the target video sub-segment.
Fig. 6 is a block diagram of the data construction module 502 of the video generation apparatus shown in fig. 5 according to an exemplary embodiment. Referring to fig. 6, the module includes a first determination sub-module 601, a second determination sub-module 602, and a data combination sub-module 603.
A first determining sub-module 601, configured to determine each audio sub-segment as each column in the target candidate grid.
A second determining sub-module 602, configured to determine, as each row in the target candidate grid, a number of candidate video sub-segments corresponding to each of the audio sub-segments.
The data combining submodule 603 is configured to combine each column in the target candidate grid with each row in the target candidate grid to obtain the target candidate grid.
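A minimal sketch of this grid construction (the data layout is an assumption, chosen only to mirror the column-per-audio-segment, row-per-candidate description above):

```python
def build_candidate_grid(candidates_per_audio):
    """candidates_per_audio[n] is the list of (video_sub_segment_id,
    target_distance) pairs selected for audio sub-segment n.
    The grid is stored column-major: grid[n][i] is candidate i (row i)
    in column n, i.e. each audio sub-segment is one column and its
    candidate video sub-segments are the rows of that column."""
    return [list(column) for column in candidates_per_audio]

grid = build_candidate_grid([
    [("v3", 0.2), ("v7", 0.5)],   # candidates for audio sub-segment 0
    [("v1", 0.1), ("v4", 0.4)],   # candidates for audio sub-segment 1
])
```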
Further, in an exemplary embodiment, the data global dynamic programming module of the video generation apparatus shown in fig. 5 includes: a target distance obtaining submodule, configured to obtain the target distances corresponding to a plurality of candidate video sub-segments in column n-1 of the target candidate grid, where n is an integer greater than or equal to 2.
Fig. 7 is a block diagram of the data global dynamic programming module 503 of the video generation apparatus shown in fig. 5 according to an exemplary embodiment. Referring to fig. 7, the module includes a connection distance obtaining submodule 701, a total distance obtaining submodule 702, a target total distance obtaining submodule 703, and a target video sub-segment determining submodule 704.
The connection distance obtaining sub-module 701 is configured to compute, from the candidate video sub-segments in column n-1 of the target candidate grid and the candidate video sub-segments in column n, the connection distances corresponding to the candidate video sub-segments in column n, where the connection distance is used to determine the closeness of the junction between two adjacent candidate video sub-segments.
The total distance obtaining sub-module 702 is configured to compute, from the target distance and the connection distance, the total distance corresponding to each candidate video sub-segment.
The target total distance obtaining sub-module 703 is configured to sort the total distances corresponding to the plurality of candidate video sub-segments to obtain the minimum total distance as the target total distance.
The target video sub-segment determining sub-module 704 is configured to determine the candidate video sub-segment corresponding to the target total distance as the target video sub-segment.
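The selection performed by sub-modules 701–704 amounts to a Viterbi-style dynamic program over the candidate grid. One hedged sketch, in which the distance values and the `link_dist` function are placeholders rather than the patent's specified computations:

```python
def viterbi_select(target_dists, link_dist):
    """target_dists[n][i]: target distance of candidate i in column n.
    link_dist(n, i, j): connection distance between candidate j in
    column n-1 and candidate i in column n (a stand-in for the
    patent's junction-closeness measure).
    Returns one candidate index per column minimizing the sum of
    target distances plus connection distances along the path."""
    total = [list(target_dists[0])]          # best cumulative distance per cell
    back = [[-1] * len(target_dists[0])]     # best predecessor per cell
    for n in range(1, len(target_dists)):
        col_total, col_back = [], []
        for i, td in enumerate(target_dists[n]):
            j_best = min(range(len(total[n - 1])),
                         key=lambda j: total[n - 1][j] + link_dist(n, i, j))
            col_total.append(total[n - 1][j_best] + link_dist(n, i, j_best) + td)
            col_back.append(j_best)
        total.append(col_total)
        back.append(col_back)
    # the minimum total distance in the last column is the target total distance
    i = min(range(len(total[-1])), key=total[-1].__getitem__)
    path = [i]
    for n in range(len(target_dists) - 1, 0, -1):
        i = back[n][i]
        path.append(i)
    return path[::-1]
```

With a large connection penalty the path favors continuity across columns; with zero connection cost it degenerates to the per-column nearest candidate, which illustrates the trade-off the grid search balances.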
Fig. 8 is a block diagram of the data generation module 504 of the video generation apparatus shown in fig. 5 according to an exemplary embodiment. Referring to fig. 8, the module includes a target image sequence acquisition sub-module 801 and a target video generation sub-module 802.
The target image sequence acquisition sub-module 801 is configured to input each audio sub-segment and each target video sub-segment into a pre-generated face modification model to obtain a target image sequence.
The target video generation sub-module 802 is configured to combine each audio sub-segment with the target image sequence to generate the target video.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 9 is a block diagram illustrating an electronic device 900 in accordance with an example embodiment. For example, the electronic device 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 9, electronic device 900 may include one or more of the following components: processing component 902, memory 904, power component 906, multimedia component 908, audio component 910, input/output interface 912, sensor component 914, and communication component 916.
The processing component 902 generally controls overall operation of the device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing component 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 902 can include one or more modules that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation at the device 900. Examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 906 provides power to the various components of the electronic device 900. The power components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 900.
The multimedia components 908 include a screen that provides an output interface between the electronic device 900 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 900 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 900 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 further includes a speaker for outputting audio signals.
Input/output interface 912 provides an interface between processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the electronic device 900. For example, the sensor assembly 914 may detect the open/closed state of the electronic device 900 and the relative positioning of components, such as the display and keypad of the electronic device 900. The sensor assembly 914 may also detect a change in the position of the electronic device 900 or a component of the electronic device 900, the presence or absence of user contact with the electronic device 900, the orientation or acceleration/deceleration of the electronic device 900, and a change in the temperature of the electronic device 900. The sensor assembly 914 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the electronic device 900 and other devices. The electronic device 900 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 904 comprising instructions, executable by the processor 920 of the electronic device 900 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A method of video generation, the method comprising:
acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, wherein the candidate video sub-segments are acquired according to a target distance, and the target distance is used for determining the similarity of each candidate video sub-segment and the audio sub-segment corresponding to each candidate video sub-segment;
Constructing a target candidate grid according to the audio sub-segment and the candidate video sub-segment;
performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment;
and generating a target video according to each audio sub-segment and the target video sub-segment.
2. The method of claim 1, wherein said constructing a target candidate grid from said audio sub-segments and said candidate video sub-segments comprises:
determining each audio sub-segment as each column in a target candidate grid;
determining a number of the candidate video sub-segments corresponding to each of the audio sub-segments as each row in the target candidate grid;
and combining according to each column in the target candidate grids and each row in the target candidate grids to obtain the target candidate grids.
3. The video generation method according to claim 1, wherein the performing global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment comprises:
and acquiring the target distances corresponding to a plurality of candidate video sub-segments in column n-1 of the target candidate grid, wherein n is an integer greater than or equal to 2.
4. The method according to claim 1, wherein after the step of acquiring the target distances corresponding to a plurality of candidate video sub-segments in column n-1 of the target candidate grid, where n is an integer greater than or equal to 2, the method further comprises:
calculating according to a plurality of candidate video sub-segments in column n-1 of the target candidate grid and a plurality of candidate video sub-segments in column n of the target candidate grid, to obtain a connection distance corresponding to the plurality of candidate video sub-segments in column n of the target candidate grid, wherein the connection distance is used for determining the closeness of the connection position of two adjacent candidate video sub-segments;
calculating according to the target distance and the connection distance to obtain a total distance corresponding to a plurality of candidate video sub-segments;
sequencing according to the total distance corresponding to the candidate video sub-segments to obtain a target total distance;
and determining the candidate video sub-segment corresponding to the target total distance as a target video sub-segment.
5. The method of claim 1, wherein said generating a target video from each of said audio sub-segments and said target video sub-segments comprises:
Inputting each audio sub-segment and each target video sub-segment into a pre-generated face modification model to obtain a target image sequence;
and combining each audio sub-segment and the target image sequence to generate a target video.
6. A video generation apparatus, characterized in that the apparatus comprises:
the data acquisition module is used for acquiring a plurality of candidate video sub-segments corresponding to each audio sub-segment, and the candidate video sub-segments are acquired according to a target distance, wherein the target distance is used for determining the similarity between each candidate video sub-segment and the audio sub-segment corresponding to each candidate video sub-segment;
the data construction module is used for constructing a target candidate grid according to the audio sub-segment and the candidate video sub-segment;
the data global dynamic programming module is used for carrying out global dynamic programming according to the target candidate grid to obtain a target video sub-segment corresponding to each audio sub-segment;
and the data generation module is used for generating a target video according to each audio sub-segment and the target video sub-segment.
7. The video generating apparatus according to claim 6, wherein the data constructing module comprises:
A first determining submodule, configured to determine each audio sub-segment as each column in the target candidate grid;
a second determining sub-module, configured to determine a number of the candidate video sub-segments corresponding to each of the audio sub-segments as each row in the target candidate grid;
and the data combination submodule is used for combining each column in the target candidate grids and each row in the target candidate grids to obtain the target candidate grids.
8. The video generation apparatus according to claim 6, wherein the data global dynamic programming module comprises:
and the target distance obtaining submodule is used for obtaining the target distances corresponding to a plurality of candidate video sub-segments in column n-1 of the target candidate grid, wherein n is an integer greater than or equal to 2.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video generation method of any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, wherein the computer program is configured to implement the video generation method according to any one of claims 1 to 5 when executed by a processor.
CN202210311691.1A 2022-03-28 2022-03-28 Video generation method, device, electronic equipment and readable storage medium Active CN114760534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210311691.1A CN114760534B (en) 2022-03-28 2022-03-28 Video generation method, device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210311691.1A CN114760534B (en) 2022-03-28 2022-03-28 Video generation method, device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN114760534A true CN114760534A (en) 2022-07-15
CN114760534B CN114760534B (en) 2024-03-01

Family

ID=82327479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210311691.1A Active CN114760534B (en) 2022-03-28 2022-03-28 Video generation method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114760534B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5880788A (en) * 1996-03-25 1999-03-09 Interval Research Corporation Automated synchronization of video image sequences to new soundtracks
KR101393351B1 (en) * 2013-06-04 2014-05-09 주식회사 텔레칩스 Method of providing automatic setting of audio configuration of receiver's televisions optimized for multimedia contents to play, and computer-readable recording medium for the same
CN109977262A (en) * 2019-03-25 2019-07-05 北京旷视科技有限公司 The method, apparatus and processing equipment of candidate segment are obtained from video
CN110446066A (en) * 2019-08-28 2019-11-12 北京百度网讯科技有限公司 Method and apparatus for generating video
CN111212245A (en) * 2020-01-15 2020-05-29 北京猿力未来科技有限公司 Method and device for synthesizing video
CN111783566A (en) * 2020-06-15 2020-10-16 神思电子技术股份有限公司 Video synthesis method based on lip language synchronization and expression adaptation effect enhancement
CN112235631A (en) * 2019-07-15 2021-01-15 北京字节跳动网络技术有限公司 Video processing method and device, electronic equipment and storage medium
CN112927712A (en) * 2021-01-25 2021-06-08 网易(杭州)网络有限公司 Video generation method and device and electronic equipment
CN113507627A (en) * 2021-07-08 2021-10-15 北京的卢深视科技有限公司 Video generation method and device, electronic equipment and storage medium
CN114025235A (en) * 2021-11-12 2022-02-08 北京捷通华声科技股份有限公司 Video generation method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN114760534B (en) 2024-03-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant