CN114283060B - Video generation method, device, equipment and storage medium - Google Patents

Video generation method, device, equipment and storage medium

Info

Publication number
CN114283060B
CN114283060B
Authority
CN
China
Prior art keywords
information
characteristic information
attribute
video
original image
Prior art date
Legal status
Active
Application number
CN202111566280.9A
Other languages
Chinese (zh)
Other versions
CN114283060A (en)
Inventor
张英杰
张启军
朱亦凡
张清源
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202111566280.9A priority Critical patent/CN114283060B/en
Publication of CN114283060A publication Critical patent/CN114283060A/en
Application granted granted Critical
Publication of CN114283060B publication Critical patent/CN114283060B/en

Landscapes

  • Image Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiments of the disclosure disclose a video generation method, device, equipment and storage medium. The method includes: extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video, wherein the original image and the original driving video both contain character figures; acquiring a plurality of pieces of optical flow transformation information according to the first characteristic information and each piece of second characteristic information; transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images; and splicing the plurality of target images to obtain a target video. According to the video generation method provided by the embodiments of the disclosure, the original image is transformed based on the first characteristic information and the optical flow transformation information corresponding to the original driving video, so that the expression of the character in the original driving video is transferred to the character in the original image, which improves the generation efficiency of expression-driven video and makes the generated video more engaging.

Description

Video generation method, device, equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of image processing, in particular to a video generation method, a device, equipment and a storage medium.
Background
With the continued development of artificial intelligence technology, deep neural networks have become increasingly popular in computer vision, natural language processing, and other interdisciplinary research areas. Expression driving technology is an important computer vision application based on deep neural networks: given a target image and a corresponding driving video, it transfers the motion trajectory in the driving video to the target image, generating a video that follows the motion of the driving video while taking the target image as a reference.
Existing expression driving technology is difficult to run in real time because of the huge amount of model computation and the insufficient compute power and memory of conventional computing equipment, so additional computing and memory devices are needed for heterogeneous acceleration. However, owing to limitations of the computing process in the prior art, conventional heterogeneous computing schemes incur additional data transmission, which causes the following two problems:
1. The extra transmission time makes it impossible to perform expression-driven video generation in real time.
2. The excessive overhead of additional data storage leaves a single-card device facing insufficient storage space.
Disclosure of Invention
The embodiment of the disclosure provides a video generation method, device, equipment and storage medium, which can improve the generation efficiency of expression driving video.
In a first aspect, an embodiment of the present disclosure provides a video generating method, including:
extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video; wherein the original image and the original driving video both comprise character images;
acquiring a plurality of optical flow transformation information according to the first characteristic information and each second characteristic information;
Transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images;
and splicing the plurality of target images to obtain a target video.
In a second aspect, an embodiment of the present disclosure further provides a video generating apparatus, including:
The characteristic information extraction module is used for extracting first characteristic information of an original image and second characteristic information of each video frame in the original driving video; wherein the original image and the original driving video both comprise character images;
An optical flow transformation information acquisition module configured to acquire a plurality of optical flow transformation information based on the first feature information and each of the second feature information;
A target image acquisition module, configured to perform transformation processing on the original image according to the first feature information and the plurality of optical flow transformation information, to obtain a plurality of target images;
and the target video acquisition module is used for splicing the plurality of target images to obtain a target video.
In a third aspect, embodiments of the present disclosure further provide an electronic device, including:
one or more processing devices;
A storage means for storing one or more programs;
The one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the video generation method as described in embodiments of the present disclosure.
In a fourth aspect, the embodiments of the present disclosure further provide a computer readable medium having stored thereon a computer program which, when executed by a processing device, implements a video generation method according to the embodiments of the present disclosure.
The embodiments of the disclosure provide a video generation method, device, equipment and storage medium. The method includes: extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video, wherein the original image and the original driving video both contain character figures; acquiring a plurality of pieces of optical flow transformation information according to the first characteristic information and each piece of second characteristic information; transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images; and splicing the plurality of target images to obtain a target video. According to the video generation method provided by the embodiments of the disclosure, the original image is transformed based on the first characteristic information and the optical flow transformation information corresponding to the original driving video, so that the expression of the character in the original driving video is transferred to the character in the original image, which improves the generation efficiency of expression-driven video and makes the generated video more engaging.
Drawings
FIG. 1 is a flow chart of a video generation method in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of determining optical flow transformation information in an embodiment of the present disclosure;
fig. 3 is a schematic structural view of a video generating apparatus in an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Fig. 1 is a flowchart of a video generating method according to a first embodiment of the present disclosure, where the method may be applied to a case of transferring a character expression in a video to a character in an original image, and the method may be performed by a video generating apparatus, where the apparatus may be composed of hardware and/or software and may be generally integrated in a device having a video generating function, and the device may be an electronic device such as a server, a mobile terminal, or a server cluster. As shown in fig. 1, the method specifically includes the following steps:
Step 110, extracting first characteristic information of the original image and second characteristic information of each video frame in the original driving video.
Here, the original image and the original driving video both contain character figures. The first feature information includes first key point information and first attribute feature information; the second feature information includes second key point information and second attribute feature information. The key point information can be understood as information composed of key points on the character figure and can be represented by a vector or a matrix; the attribute feature information can be understood as high-level abstract features of the character, such as skin color and wrinkles, and can also be represented by a vector or a matrix.
In this embodiment, a standard convolution network may be used to extract the feature information of the original image and of each video frame in the original driving video. The standard convolution network may include convolutional layers, activation layers, residual blocks, and the like. The principle of feature extraction may be: an RGB image is input, and a feature vector of the image is output. The input image size is H×W×C, and the output is reduced in dimension to obtain a low-dimensional feature vector. The feature vector is a high-order representation of the picture that carries its features and can express some abstract characteristics of the picture.
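As an illustration only, the following is a minimal sketch of such a feature extractor, assuming a PyTorch-style implementation; the layer sizes, module names and output dimensionality are illustrative assumptions rather than values taken from this disclosure.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """A small residual block: two 3x3 convolutions with a skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))


class FeatureExtractor(nn.Module):
    """Maps an H x W x C RGB image to a low-dimensional feature vector."""

    def __init__(self, feature_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),  # convolutional layer
            nn.ReLU(inplace=True),                                 # activation layer
            ResidualBlock(64),                                      # residual block
            nn.AdaptiveAvgPool2d(1),                                # spatial dimension reduction
        )
        self.fc = nn.Linear(64, feature_dim)

    def forward(self, image):
        # image: (batch, 3, H, W) tensor holding the RGB picture
        x = self.encoder(image).flatten(1)
        return self.fc(x)  # low-dimensional feature vector
```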
In this embodiment, each video frame in the original driving video may be extracted from the original driving video according to a set sampling frequency. The set sampling frequency may be understood as extracting a video frame every set duration or every set number of frames. The set duration may be any value greater than or equal to 0, and the set frame number may be any integer greater than or equal to 0.
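A minimal sketch of frame sampling at a set frequency is shown below, assuming OpenCV is used for decoding; the function name and the sampling interval are illustrative assumptions.

```python
import cv2


def sample_frames(video_path, frame_interval=1):
    """Extract one frame every `frame_interval` frames from the original driving video."""
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % frame_interval == 0:
            frames.append(frame)  # keep this video frame
        index += 1
    capture.release()
    return frames
```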
Step 120, obtaining a plurality of optical flow transformation information according to the first characteristic information and each second characteristic information.
The optical flow transformation information comprises the adaptation information between the original image and each video frame and the attribute information after linear transformation.
Specifically, the manner of obtaining the plurality of optical flow transformation information according to the first feature information and each of the second feature information may be: determining a first convex hull area according to the first key point information; determining a second convex hull area according to the second key point information; for the Nth video frame, performing linear processing on the first convex hull area and the second convex hull area to obtain adaptation information; wherein N is a positive integer greater than or equal to 1; performing linear processing on the first attribute characteristic information and the second attribute characteristic information to obtain target attribute characteristic information; and obtaining optical flow transformation information corresponding to the Nth video frame according to the adaptation information and the target attribute characteristic information.
The convex hull area can be understood as the area enclosed by the outline of the character figure. The convex hull area may be calculated using an existing convex hull algorithm (for example, the Graham scan) or a boundary method, which is not limited here.
In this embodiment, for the convex hull areas of the video frames in the original driving video, a separate convex hull area may be calculated from the second key point information of each video frame, or only the convex hull area of the first frame may be calculated and then reused as the convex hull area of the subsequent video frames. The latter has the advantage of greatly reducing the amount of calculation.
In this embodiment, the linear processing of the first convex hull area and the second convex hull area may be to divide the first convex hull area by the second convex hull area directly, or to apply a mathematical operation to the first convex hull area and the second convex hull area first and then divide the results of that operation.
Optionally, the linear processing of the first convex hull area and the second convex hull area to obtain the adaptation information may be: taking the root of the first convex hull area and of the second convex hull area respectively; and dividing the rooted first convex hull area by the rooted second convex hull area to obtain the adaptation information of the Nth video frame.
The root may be a square root or a cube root; preferably, the square root of the first convex hull area and of the second convex hull area is taken respectively.
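As a sketch of the two steps above, the adaptation information can be computed roughly as follows, assuming the key points are given as an (N, 2) NumPy array and SciPy is available; the helper names are illustrative.

```python
import numpy as np
from scipy.spatial import ConvexHull


def convex_hull_area(keypoints):
    """Area enclosed by the convex hull of the character's 2D key points."""
    # For 2D inputs, ConvexHull.volume is the enclosed area (ConvexHull.area is the perimeter).
    return ConvexHull(keypoints).volume


def adaptation_info(source_keypoints, driving_keypoints):
    """Square root of the first convex hull area divided by the square root of the second."""
    first_area = convex_hull_area(source_keypoints)     # from the original image
    second_area = convex_hull_area(driving_keypoints)   # from the Nth (or first) video frame
    return np.sqrt(first_area) / np.sqrt(second_area)
```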
Optionally, the method for linearly processing the first attribute feature information and the second attribute feature information to obtain the target attribute feature information may be: linearly fusing the second attribute information of the Nth video frame and the second attribute information of the first video frame to obtain second intermediate attribute characteristic information; and linearly fusing the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information.
Here, the first attribute characteristic information and the second attribute characteristic information are each represented by a matrix. The process of linearly fusing the second attribute information of the Nth video frame and the second attribute information of the first video frame to obtain the second intermediate attribute feature information may be: inverting the matrix corresponding to the second attribute information of the Nth video frame and multiplying it by the matrix corresponding to the second attribute information of the first video frame to obtain the second intermediate attribute characteristic information. The process of linearly fusing the second intermediate attribute feature information and the first attribute feature information to obtain the target attribute feature information may be: multiplying the matrix corresponding to the second intermediate attribute characteristic information by the matrix corresponding to the first attribute characteristic information to obtain the target attribute characteristic information.
Specifically, the matrix corresponding to the second attribute characteristic information of the Nth video frame is inverted, the inverted matrix is then multiplied by the matrix corresponding to the second attribute information of the first video frame, and finally the resulting matrix is multiplied by the matrix corresponding to the first attribute characteristic information of the original image to obtain the target attribute characteristic information.
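The matrix operations just described can be sketched as follows, assuming each piece of attribute characteristic information is an invertible square NumPy matrix; the argument names and the left-to-right multiplication order are illustrative assumptions.

```python
import numpy as np


def target_attribute(attr_original, attr_first_frame, attr_nth_frame):
    """Linearly fuse the attribute matrices into the target attribute characteristic information."""
    # Invert the Nth-frame attribute matrix and multiply by the first-frame attribute matrix
    # to obtain the second intermediate attribute characteristic information.
    intermediate = np.linalg.inv(attr_nth_frame) @ attr_first_frame
    # Multiply the intermediate matrix by the original image's attribute matrix.
    return intermediate @ attr_original
```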
Illustratively, FIG. 2 is a schematic diagram of determining the optical flow transformation information in this embodiment. As shown in FIG. 2, the convex hull area is calculated from the key point information of the original image to obtain the first convex hull area, and its root is taken; the convex hull area is then calculated from the key point information of the current frame or of the first frame to obtain the second convex hull area, and its root is taken; finally, the rooted first convex hull area is divided by the rooted second convex hull area to obtain the adaptation information. The attribute characteristic information of the first frame is multiplied by the inverse matrix of the attribute characteristic information of the current frame and then by the attribute characteristic information of the original image to obtain the target attribute characteristic information. The optical flow transformation information is obtained from the adaptation information and the target attribute characteristic information.
Step 130, performing transformation processing on the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images.
Specifically, the method for transforming the original image according to the first feature information and the plurality of optical flow transformation information to obtain the plurality of target images may be: the original image, the first feature information, and the plurality of optical flow conversion information are input to a setting decoder, and a plurality of target images are obtained.
Wherein the set decoder comprises a plurality of convolution layers. That is, for each video frame, the original image, the first key point information, the first attribute feature information, and the optical flow conversion information corresponding to the video frame are input to a setting decoder, and the target image corresponding to the video frame is obtained. Thereby transferring the character expression in the video frame to the character in the original image.
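A heavily simplified sketch of such a decoder is given below, assuming PyTorch; it only warps the original image with an optical-flow sampling grid and refines it with a few convolution layers, and it omits the conditioning on the key point and attribute information, so it is an illustrative assumption rather than the network actually used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SetDecoder(nn.Module):
    """Warps the original image by an optical-flow grid and reconstructs a target frame."""

    def __init__(self, hidden=64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 3, kernel_size=3, padding=1),  # reconstructed RGB target image
        )

    def forward(self, original_image, flow_grid):
        # original_image: (B, 3, H, W); flow_grid: (B, H, W, 2) normalized sampling grid
        warped = F.grid_sample(original_image, flow_grid, align_corners=True)
        return self.layers(warped)
```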
Step 140, splicing the plurality of target images to obtain a target video.
Specifically, after a plurality of target images are obtained, splicing and encoding are carried out on the plurality of target images, and a target video is obtained.
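The splicing and encoding step can be sketched as follows, assuming the target images are same-sized BGR uint8 arrays and OpenCV is used; the codec and frame rate are illustrative choices.

```python
import cv2


def write_video(target_images, output_path, fps=25):
    """Splice the target images into a target video file."""
    height, width = target_images[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    for frame in target_images:
        writer.write(frame)  # frames are appended in order
    writer.release()
```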
According to the technical scheme of this embodiment, first characteristic information of an original image and second characteristic information of each video frame in an original driving video are extracted, where the original image and the original driving video both contain character figures; a plurality of pieces of optical flow transformation information are acquired according to the first characteristic information and each piece of second characteristic information; the original image is transformed according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images; and the plurality of target images are spliced to obtain a target video. According to the video generation method provided by the embodiments of the disclosure, the original image is transformed based on the first characteristic information and the optical flow transformation information corresponding to the original driving video, so that the expression of the character in the original driving video is transferred to the character in the original image, which improves the generation efficiency of expression-driven video and makes the generated video more engaging.
Fig. 3 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the apparatus includes:
The feature information extracting module 210 is configured to extract first feature information of an original image and second feature information of each video frame in an original driving video; wherein, the original image and the original driving video both comprise character images;
an optical flow transformation information acquisition module 220 for acquiring a plurality of optical flow transformation information based on the first feature information and each of the second feature information;
a target image obtaining module 230, configured to perform a transformation process on the original image according to the first feature information and the plurality of optical flow transformation information, to obtain a plurality of target images;
The target video obtaining module 240 is configured to splice a plurality of target images to obtain a target video.
Optionally, the first feature information includes first key point information and first attribute feature information; the second feature information comprises second key point information and second attribute feature information; the optical flow transformation information acquisition module 220 is further configured to:
determining a first convex hull area according to the first key point information;
determining a second convex hull area according to the second key point information;
For the Nth video frame, performing linear processing on the first convex hull area and the second convex hull area to obtain adaptation information; wherein N is a positive integer greater than or equal to 1;
performing linear processing on the first attribute characteristic information and the second attribute characteristic information to obtain target attribute characteristic information;
And obtaining optical flow transformation information corresponding to the Nth video frame according to the adaptation information and the target attribute characteristic information.
Optionally, the optical flow transformation information obtaining module 220 is further configured to:
taking the root of the first convex hull area and of the second convex hull area respectively;
dividing the rooted first convex hull area by the rooted second convex hull area to obtain the adaptation information of the Nth video frame.
Optionally, the optical flow transformation information obtaining module 220 is further configured to:
Linearly fusing the second attribute information of the Nth video frame and the second attribute information of the first video frame to obtain second intermediate attribute characteristic information;
And linearly fusing the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information.
Optionally, the first attribute feature information and the second attribute feature information are both characterized by a matrix; the optical flow transformation information acquisition module 220 is further configured to:
multiplying the matrix corresponding to the second attribute information of the first video frame after inverse transformation of the matrix corresponding to the second attribute information of the Nth video frame to obtain second intermediate attribute characteristic information;
performing linear fusion on the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information, wherein the method comprises the following steps:
multiplying the matrix corresponding to the second intermediate attribute characteristic information by the matrix corresponding to the first attribute characteristic information to obtain the target attribute characteristic information.
Optionally, the target image acquisition module 230 is further configured to:
Inputting the original image, the first feature information and the plurality of optical flow transformation information into a setting decoder to obtain a plurality of target images; wherein the set decoder comprises a plurality of convolution layers.
Optionally, the method further comprises: a video frame extraction module for:
and extracting video frames from the original driving video according to the set sampling frequency to obtain a plurality of video frames.
The device can execute the method provided by all the embodiments of the disclosure, and has the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in this embodiment can be found in the methods provided by all of the foregoing embodiments of the present disclosure.
Referring now to fig. 4, a schematic diagram of an electronic device 300 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), car terminals (e.g., car navigation terminals), etc., as well as fixed terminals such as digital TVs, desktop computers, etc., or various forms of servers such as stand-alone servers or server clusters. The electronic device shown in fig. 4 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 4, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301, which may perform various suitable actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 4 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 309, or installed from the storage device 308, or installed from the ROM 302. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 301.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video; wherein the original image and the original driving video both comprise character images; acquiring a plurality of optical flow transformation information according to the first characteristic information and each second characteristic information; transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images; and splicing the plurality of target images to obtain a target video.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, the embodiments of the present disclosure disclose a video generation method, including:
extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video; wherein the original image and the original driving video both comprise character images;
acquiring a plurality of optical flow transformation information according to the first characteristic information and each second characteristic information;
Transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images;
and splicing the plurality of target images to obtain a target video.
Further, the first feature information includes first key point information and first attribute feature information; the second feature information comprises second key point information and second attribute feature information; acquiring a plurality of optical flow transformation information according to the first characteristic information and each of the second characteristic information, including:
Determining a first convex hull area according to the first key point information;
determining a second convex hull area according to the second key point information;
For an N-th video frame, carrying out linear processing on the first convex hull area and the second convex hull area to obtain adaptation information; wherein N is a positive integer greater than or equal to 1;
performing linear processing on the first attribute characteristic information and the second attribute characteristic information to obtain target attribute characteristic information;
And obtaining optical flow transformation information corresponding to the Nth video frame according to the adaptation information and the target attribute characteristic information.
Further, performing linear processing on the first convex hull area and the second convex hull area to obtain adaptation information, including:
taking the square root of the first convex hull area and of the second convex hull area respectively;
dividing the rooted first convex hull area by the rooted second convex hull area to obtain the adaptation information of the Nth video frame.
Further, performing linear processing on the first attribute feature information and the second attribute feature information to obtain target attribute feature information, including:
Linearly fusing the second attribute information of the Nth video frame and the second attribute information of the first video frame to obtain second intermediate attribute characteristic information;
And linearly fusing the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information.
Further, the first attribute feature information and the second attribute feature information are both characterized by a matrix; the method for obtaining the second intermediate attribute characteristic information comprises the steps of:
multiplying the matrix corresponding to the second attribute information of the first video frame after inverse transformation of the matrix corresponding to the second attribute information of the Nth video frame to obtain second intermediate attribute characteristic information;
Performing linear fusion on the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information, wherein the method comprises the following steps:
Multiplying the matrix corresponding to the second intermediate attribute characteristic information with the matrix corresponding to the first attribute characteristic information to obtain target attribute characteristic information.
Further, performing a transformation process on the original image according to the first feature information and the plurality of optical flow transformation information to obtain a plurality of target images, including:
inputting the original image, the first feature information, and the plurality of optical flow conversion information into a setting decoder to obtain a plurality of target images; wherein the set decoder includes a plurality of convolutional layers.
Further, before extracting the first feature information of the original image and the second feature information of each video frame in the original driving video, the method further includes:
and extracting video frames from the original driving video according to the set sampling frequency to obtain a plurality of video frames.
Note that the above is only a preferred embodiment of the present disclosure and the technical principle applied. Those skilled in the art will appreciate that the present disclosure is not limited to the specific embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made by those skilled in the art without departing from the scope of the disclosure. Therefore, while the present disclosure has been described in connection with the above embodiments, the present disclosure is not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the present disclosure, the scope of which is determined by the scope of the appended claims.

Claims (10)

1. A video generation method, comprising:
extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video; wherein the original image and the original driving video both comprise character images;
Acquiring a plurality of optical flow transformation information according to the first characteristic information and each second characteristic information; the optical flow transformation information comprises adaptive information between an original image and each video frame and attribute information after linear transformation; the adaptation information is determined based on the first characteristic information and the second characteristic information;
Transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images;
and splicing the plurality of target images to obtain a target video.
2. The method of claim 1, wherein the first characteristic information comprises first keypoint information and first attribute characteristic information; the second feature information comprises second key point information and second attribute feature information; acquiring a plurality of optical flow transformation information according to the first characteristic information and each of the second characteristic information, including:
Determining a first convex hull area according to the first key point information;
determining a second convex hull area according to the second key point information;
For an N-th video frame, carrying out linear processing on the first convex hull area and the second convex hull area to obtain adaptation information; wherein N is a positive integer greater than or equal to 1;
performing linear processing on the first attribute characteristic information and the second attribute characteristic information to obtain target attribute characteristic information;
And obtaining optical flow transformation information corresponding to the Nth video frame according to the adaptation information and the target attribute characteristic information.
3. The method of claim 2, wherein linearly processing the first convex hull area and the second convex hull area to obtain adaptation information comprises:
performing root-of-square calculation on the first convex hull area and the second convex hull area respectively;
Dividing the first convex hull area and the second convex hull area calculated by the root of the square, and obtaining the adaptation information of the Nth video frame.
4. The method of claim 2, wherein linearly processing the first attribute feature information and the second attribute feature information to obtain target attribute feature information comprises:
linearly fusing the second attribute characteristic information of the Nth video frame with the second attribute information of the first video frame to obtain second intermediate attribute characteristic information;
And linearly fusing the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information.
5. The method of claim 4, wherein the first attribute feature information and the second attribute feature information are each characterized by a matrix; the method for obtaining the second intermediate attribute characteristic information comprises the steps of:
multiplying the matrix corresponding to the second attribute information of the first video frame after inverse transformation of the matrix corresponding to the second attribute information of the Nth video frame to obtain second intermediate attribute characteristic information;
Performing linear fusion on the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information, wherein the method comprises the following steps:
Multiplying the matrix corresponding to the second intermediate attribute characteristic information with the matrix corresponding to the first attribute characteristic information to obtain target attribute characteristic information.
6. The method of claim 1, wherein transforming the original image based on the first characteristic information and the plurality of optical-flow transformation information to obtain a plurality of target images, comprises:
inputting the original image, the first feature information, and the plurality of optical flow conversion information into a setting decoder to obtain a plurality of target images; wherein the set decoder includes a plurality of convolutional layers.
7. The method of claim 1, further comprising, prior to extracting the first characteristic information of the original image and the second characteristic information of each video frame in the original drive video:
and extracting video frames from the original driving video according to the set sampling frequency to obtain a plurality of video frames.
8. A video generating apparatus, comprising:
The characteristic information extraction module is used for extracting first characteristic information of an original image and second characteristic information of each video frame in the original driving video; wherein the original image and the original driving video both comprise character images;
an optical flow transformation information acquisition module configured to acquire a plurality of optical flow transformation information based on the first feature information and each of the second feature information; the optical flow transformation information comprises adaptive information between an original image and each video frame and attribute information after linear transformation; the adaptation information is determined based on the first characteristic information and the second characteristic information;
A target image acquisition module, configured to perform transformation processing on the original image according to the first feature information and the plurality of optical flow transformation information, to obtain a plurality of target images;
and the target video acquisition module is used for splicing the plurality of target images to obtain a target video.
9. An electronic device, the electronic device comprising:
one or more processing devices;
A storage means for storing one or more programs;
The one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the video generation method of any of claims 1-7.
10. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processing device, implements a video generation method as claimed in any one of claims 1-7.
CN202111566280.9A 2021-12-20 2021-12-20 Video generation method, device, equipment and storage medium Active CN114283060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111566280.9A CN114283060B (en) 2021-12-20 2021-12-20 Video generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111566280.9A CN114283060B (en) 2021-12-20 2021-12-20 Video generation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114283060A CN114283060A (en) 2022-04-05
CN114283060B true CN114283060B (en) 2024-06-28

Family

ID=80873285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111566280.9A Active CN114283060B (en) 2021-12-20 2021-12-20 Video generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114283060B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102157007A (en) * 2011-04-11 2011-08-17 北京中星微电子有限公司 Performance-driven method and device for producing face animation
CN108416266A (en) * 2018-01-30 2018-08-17 同济大学 A kind of video behavior method for quickly identifying extracting moving target using light stream

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11256951B2 (en) * 2017-05-30 2022-02-22 Google Llc Systems and methods of person recognition in video streams
US10853951B2 (en) * 2017-08-04 2020-12-01 Intel Corporation Methods and apparatus to generate temporal representations for action recognition systems
CN108574794B (en) * 2018-03-30 2021-01-22 京东方科技集团股份有限公司 Image processing method and device, display equipment and computer readable storage medium
US10445921B1 (en) * 2018-06-13 2019-10-15 Adobe Inc. Transferring motion between consecutive frames to a digital image
CN110569702B (en) * 2019-02-14 2021-05-14 创新先进技术有限公司 Video stream processing method and device
CN110909613B (en) * 2019-10-28 2024-05-31 Oppo广东移动通信有限公司 Video character recognition method and device, storage medium and electronic equipment
CN111105382B (en) * 2019-12-31 2021-11-16 北京大学 Video repair method
CN111368137A (en) * 2020-02-12 2020-07-03 百度在线网络技术(北京)有限公司 Video generation method and device, electronic equipment and readable storage medium
CN111476871B (en) * 2020-04-02 2023-10-03 百度在线网络技术(北京)有限公司 Method and device for generating video
CN111726621B (en) * 2020-04-24 2022-12-30 中国科学院微电子研究所 Video conversion method and device
CN113313085B (en) * 2021-07-28 2021-10-15 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114283060A (en) 2022-04-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant