CN114283060B - Video generation method, device, equipment and storage medium - Google Patents

Video generation method, device, equipment and storage medium

Info

Publication number
CN114283060B
CN114283060B
Authority
CN
China
Prior art keywords
information
characteristic information
attribute
video
original image
Prior art date
Legal status
Active
Application number
CN202111566280.9A
Other languages
Chinese (zh)
Other versions
CN114283060A (en)
Inventor
张英杰
张启军
朱亦凡
张清源
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202111566280.9A priority Critical patent/CN114283060B/en
Publication of CN114283060A publication Critical patent/CN114283060A/en
Application granted granted Critical
Publication of CN114283060B publication Critical patent/CN114283060B/en

Landscapes

  • Image Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiments of the disclosure disclose a video generation method, device, equipment and storage medium. The method includes: extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video, wherein the original image and the original driving video both contain character figures; acquiring a plurality of pieces of optical flow transformation information according to the first characteristic information and each piece of second characteristic information; transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images; and splicing the plurality of target images to obtain a target video. According to the video generation method provided by the embodiments of the disclosure, the original image is transformed based on the first characteristic information and the optical flow transformation information corresponding to the original driving video, so that the expression of the character in the original driving video is transferred to the character in the original image, which improves the generation efficiency of expression-driven video and makes the generated video more engaging.

Description

Video generation method, device, equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of image processing, in particular to a video generation method, a device, equipment and a storage medium.
Background
With the continued development of artificial intelligence technology, deep neural networks have become increasingly popular in computer vision, natural language processing, and other interdisciplinary research areas. Expression driving technology is an important computer vision application based on deep neural networks: given a target image and a corresponding driving video, it transfers the motion trajectory in the driving video to the target image, generating a video that follows the motion of the driving video while taking the target image as a reference.
Existing expression driving technology is difficult to run in real time because of the huge amount of model computation and the insufficient compute power and memory of conventional computing equipment, so additional computing and memory devices are needed for heterogeneous acceleration. However, owing to limitations of the computing process in the prior art, conventional heterogeneous computing schemes incur additional data transmission, which causes the following two problems:
1. The extra transmission time makes it impossible to perform expression-driven video generation in real time.
2. The excessive overhead of additional data storage leaves a single-card device facing insufficient storage space.
Disclosure of Invention
The embodiment of the disclosure provides a video generation method, device, equipment and storage medium, which can improve the generation efficiency of expression driving video.
In a first aspect, an embodiment of the present disclosure provides a video generating method, including:
extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video; wherein the original image and the original driving video both comprise character images;
acquiring a plurality of optical flow transformation information according to the first characteristic information and each second characteristic information;
Transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images;
and splicing the plurality of target images to obtain a target video.
In a second aspect, an embodiment of the present disclosure further provides a video generating apparatus, including:
The characteristic information extraction module is used for extracting first characteristic information of an original image and second characteristic information of each video frame in the original driving video; wherein the original image and the original driving video both comprise character images;
An optical flow transformation information acquisition module configured to acquire a plurality of optical flow transformation information based on the first feature information and each of the second feature information;
A target image acquisition module, configured to perform transformation processing on the original image according to the first feature information and the plurality of optical flow transformation information, to obtain a plurality of target images;
and the target video acquisition module is used for splicing the plurality of target images to obtain a target video.
In a third aspect, embodiments of the present disclosure further provide an electronic device, including:
one or more processing devices;
A storage means for storing one or more programs;
The one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the video generation method as described in embodiments of the present disclosure.
In a fourth aspect, the embodiments of the present disclosure further provide a computer readable medium having stored thereon a computer program which, when executed by a processing device, implements a video generation method according to the embodiments of the present disclosure.
The embodiments of the disclosure provide a video generation method, device, equipment and storage medium. The method includes: extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video, wherein the original image and the original driving video both contain character figures; acquiring a plurality of pieces of optical flow transformation information according to the first characteristic information and each piece of second characteristic information; transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images; and splicing the plurality of target images to obtain a target video. According to the video generation method provided by the embodiments of the disclosure, the original image is transformed based on the first characteristic information and the optical flow transformation information corresponding to the original driving video, so that the expression of the character in the original driving video is transferred to the character in the original image, which improves the generation efficiency of expression-driven video and makes the generated video more engaging.
Drawings
FIG. 1 is a flow chart of a video generation method in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of determining optical flow transformation information in an embodiment of the present disclosure;
fig. 3 is a schematic structural view of a video generating apparatus in an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Fig. 1 is a flowchart of a video generating method according to a first embodiment of the present disclosure, where the method may be applied to a case of transferring a character expression in a video to a character in an original image, and the method may be performed by a video generating apparatus, where the apparatus may be composed of hardware and/or software and may be generally integrated in a device having a video generating function, and the device may be an electronic device such as a server, a mobile terminal, or a server cluster. As shown in fig. 1, the method specifically includes the following steps:
Step 110, extracting first characteristic information of the original image and second characteristic information of each video frame in the original driving video.
Here, the original image and the original driving video both contain character figures. The first feature information includes first key point information and first attribute feature information; the second feature information includes second key point information and second attribute feature information. The key point information can be understood as information composed of key points on the character figure and can be represented by a vector or a matrix; the attribute feature information can be understood as high-level abstract features of the character, such as skin color and wrinkles, and can also be represented by a vector or a matrix.
In this embodiment, a standard convolution network may be used to extract the feature information of the original image and of each video frame in the original driving video. The standard convolution network may include convolutional layers, activation layers, residual blocks, and the like. The principle of feature extraction may be: an RGB image is input, and a feature vector of the image is output. The input image size is H×W×C, and the output is reduced in dimension to obtain a low-dimensional feature vector. The feature vector is a high-order representation of the picture that carries its features and can express some abstract characteristics of the picture.
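As an illustration only, the following is a minimal sketch of such a feature extractor, assuming a PyTorch-style implementation; the layer sizes, module names and output dimensionality are illustrative assumptions rather than values taken from this disclosure.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """A small residual block: two 3x3 convolutions with a skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))


class FeatureExtractor(nn.Module):
    """Maps an H x W x C RGB image to a low-dimensional feature vector."""

    def __init__(self, feature_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),  # convolutional layer
            nn.ReLU(inplace=True),                                 # activation layer
            ResidualBlock(64),                                      # residual block
            nn.AdaptiveAvgPool2d(1),                                # spatial dimension reduction
        )
        self.fc = nn.Linear(64, feature_dim)

    def forward(self, image):
        # image: (batch, 3, H, W) tensor holding the RGB picture
        x = self.encoder(image).flatten(1)
        return self.fc(x)  # low-dimensional feature vector
```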
In this embodiment, each video frame in the original driving video may be extracted from the original driving video according to a set sampling frequency. The set sampling frequency may be understood as extracting a video frame every set duration or every set number of frames. The set duration may be any value greater than or equal to 0, and the set frame number may be any integer greater than or equal to 0.
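A minimal sketch of frame sampling at a set frequency is shown below, assuming OpenCV is used for decoding; the function name and the sampling interval are illustrative assumptions.

```python
import cv2


def sample_frames(video_path, frame_interval=1):
    """Extract one frame every `frame_interval` frames from the original driving video."""
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % frame_interval == 0:
            frames.append(frame)  # keep this video frame
        index += 1
    capture.release()
    return frames
```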
Step 120, obtaining a plurality of optical flow transformation information according to the first characteristic information and each second characteristic information.
The optical flow transformation information comprises the adaptation information between the original image and each video frame and the attribute information after linear transformation.
Specifically, the manner of obtaining the plurality of optical flow transformation information according to the first feature information and each of the second feature information may be: determining a first convex hull area according to the first key point information; determining a second convex hull area according to the second key point information; for the Nth video frame, performing linear processing on the first convex hull area and the second convex hull area to obtain adaptation information; wherein N is a positive integer greater than or equal to 1; performing linear processing on the first attribute characteristic information and the second attribute characteristic information to obtain target attribute characteristic information; and obtaining optical flow transformation information corresponding to the Nth video frame according to the adaptation information and the target attribute characteristic information.
The convex hull area can be understood as the area enclosed by the outline of the character figure. The convex hull area may be calculated using an existing convex hull algorithm (for example, the Graham scan) or a boundary method, which is not limited here.
In this embodiment, for the convex hull areas of the video frames in the original driving video, a separate convex hull area may be calculated from the second key point information of each video frame, or only the convex hull area of the first frame may be calculated and then reused as the convex hull area of the subsequent video frames. The latter has the advantage of greatly reducing the amount of calculation.
In this embodiment, the linear processing of the first convex hull area and the second convex hull area may be to divide the first convex hull area by the second convex hull area directly, or to apply a mathematical operation to the first convex hull area and the second convex hull area first and then divide the results of that operation.
Optionally, the linear processing of the first convex hull area and the second convex hull area to obtain the adaptation information may be: taking the root of the first convex hull area and of the second convex hull area respectively; and dividing the rooted first convex hull area by the rooted second convex hull area to obtain the adaptation information of the Nth video frame.
The root may be a square root or a cube root; preferably, the square root of the first convex hull area and of the second convex hull area is taken respectively.
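As a sketch of the two steps above, the adaptation information can be computed roughly as follows, assuming the key points are given as an (N, 2) NumPy array and SciPy is available; the helper names are illustrative.

```python
import numpy as np
from scipy.spatial import ConvexHull


def convex_hull_area(keypoints):
    """Area enclosed by the convex hull of the character's 2D key points."""
    # For 2D inputs, ConvexHull.volume is the enclosed area (ConvexHull.area is the perimeter).
    return ConvexHull(keypoints).volume


def adaptation_info(source_keypoints, driving_keypoints):
    """Square root of the first convex hull area divided by the square root of the second."""
    first_area = convex_hull_area(source_keypoints)     # from the original image
    second_area = convex_hull_area(driving_keypoints)   # from the Nth (or first) video frame
    return np.sqrt(first_area) / np.sqrt(second_area)
```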
Optionally, the method for linearly processing the first attribute feature information and the second attribute feature information to obtain the target attribute feature information may be: linearly fusing the second attribute information of the Nth video frame and the second attribute information of the first video frame to obtain second intermediate attribute characteristic information; and linearly fusing the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information.
Here, the first attribute characteristic information and the second attribute characteristic information are each represented by a matrix. The process of linearly fusing the second attribute information of the Nth video frame and the second attribute information of the first video frame to obtain the second intermediate attribute feature information may be: inverting the matrix corresponding to the second attribute information of the Nth video frame and multiplying it by the matrix corresponding to the second attribute information of the first video frame to obtain the second intermediate attribute characteristic information. The process of linearly fusing the second intermediate attribute feature information and the first attribute feature information to obtain the target attribute feature information may be: multiplying the matrix corresponding to the second intermediate attribute characteristic information by the matrix corresponding to the first attribute characteristic information to obtain the target attribute characteristic information.
Specifically, the matrix corresponding to the second attribute characteristic information of the Nth video frame is inverted, the inverted matrix is then multiplied by the matrix corresponding to the second attribute information of the first video frame, and finally the resulting matrix is multiplied by the matrix corresponding to the first attribute characteristic information of the original image to obtain the target attribute characteristic information.
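The matrix operations just described can be sketched as follows, assuming each piece of attribute characteristic information is an invertible square NumPy matrix; the argument names and the left-to-right multiplication order are illustrative assumptions.

```python
import numpy as np


def target_attribute(attr_original, attr_first_frame, attr_nth_frame):
    """Linearly fuse the attribute matrices into the target attribute characteristic information."""
    # Invert the Nth-frame attribute matrix and multiply by the first-frame attribute matrix
    # to obtain the second intermediate attribute characteristic information.
    intermediate = np.linalg.inv(attr_nth_frame) @ attr_first_frame
    # Multiply the intermediate matrix by the original image's attribute matrix.
    return intermediate @ attr_original
```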
Illustratively, FIG. 2 is a schematic diagram of determining the optical flow transformation information in this embodiment. As shown in FIG. 2, the convex hull area is calculated from the key point information of the original image to obtain the first convex hull area, and its root is taken; the convex hull area is then calculated from the key point information of the current frame or of the first frame to obtain the second convex hull area, and its root is taken; finally, the rooted first convex hull area is divided by the rooted second convex hull area to obtain the adaptation information. The attribute characteristic information of the first frame is multiplied by the inverse matrix of the attribute characteristic information of the current frame and then by the attribute characteristic information of the original image to obtain the target attribute characteristic information. The optical flow transformation information is obtained from the adaptation information and the target attribute characteristic information.
Step 130, performing transformation processing on the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images.
Specifically, the method for transforming the original image according to the first feature information and the plurality of optical flow transformation information to obtain the plurality of target images may be: the original image, the first feature information, and the plurality of optical flow conversion information are input to a setting decoder, and a plurality of target images are obtained.
Wherein the set decoder comprises a plurality of convolution layers. That is, for each video frame, the original image, the first key point information, the first attribute feature information, and the optical flow conversion information corresponding to the video frame are input to a setting decoder, and the target image corresponding to the video frame is obtained. Thereby transferring the character expression in the video frame to the character in the original image.
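A heavily simplified sketch of such a decoder is given below, assuming PyTorch; it only warps the original image with an optical-flow sampling grid and refines it with a few convolution layers, and it omits the conditioning on the key point and attribute information, so it is an illustrative assumption rather than the network actually used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SetDecoder(nn.Module):
    """Warps the original image by an optical-flow grid and reconstructs a target frame."""

    def __init__(self, hidden=64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 3, kernel_size=3, padding=1),  # reconstructed RGB target image
        )

    def forward(self, original_image, flow_grid):
        # original_image: (B, 3, H, W); flow_grid: (B, H, W, 2) normalized sampling grid
        warped = F.grid_sample(original_image, flow_grid, align_corners=True)
        return self.layers(warped)
```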
Step 140, splicing the plurality of target images to obtain a target video.
Specifically, after a plurality of target images are obtained, splicing and encoding are carried out on the plurality of target images, and a target video is obtained.
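The splicing and encoding step can be sketched as follows, assuming the target images are same-sized BGR uint8 arrays and OpenCV is used; the codec and frame rate are illustrative choices.

```python
import cv2


def write_video(target_images, output_path, fps=25):
    """Splice the target images into a target video file."""
    height, width = target_images[0].shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    for frame in target_images:
        writer.write(frame)  # frames are appended in order
    writer.release()
```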
According to the technical scheme of this embodiment, first characteristic information of an original image and second characteristic information of each video frame in an original driving video are extracted, where the original image and the original driving video both contain character figures; a plurality of pieces of optical flow transformation information are acquired according to the first characteristic information and each piece of second characteristic information; the original image is transformed according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images; and the plurality of target images are spliced to obtain a target video. According to the video generation method provided by the embodiments of the disclosure, the original image is transformed based on the first characteristic information and the optical flow transformation information corresponding to the original driving video, so that the expression of the character in the original driving video is transferred to the character in the original image, which improves the generation efficiency of expression-driven video and makes the generated video more engaging.
Fig. 3 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the apparatus includes:
The feature information extracting module 210 is configured to extract first feature information of an original image and second feature information of each video frame in an original driving video; wherein, the original image and the original driving video both comprise character images;
an optical flow transformation information acquisition module 220 for acquiring a plurality of optical flow transformation information based on the first feature information and each of the second feature information;
a target image obtaining module 230, configured to perform a transformation process on the original image according to the first feature information and the plurality of optical flow transformation information, to obtain a plurality of target images;
The target video obtaining module 240 is configured to splice a plurality of target images to obtain a target video.
Optionally, the first feature information includes first key point information and first attribute feature information; the second feature information comprises second key point information and second attribute feature information; the optical flow transformation information acquisition module 220 is further configured to:
determining a first convex hull area according to the first key point information;
determining a second convex hull area according to the second key point information;
For the Nth video frame, performing linear processing on the first convex hull area and the second convex hull area to obtain adaptation information; wherein N is a positive integer greater than or equal to 1;
performing linear processing on the first attribute characteristic information and the second attribute characteristic information to obtain target attribute characteristic information;
And obtaining optical flow transformation information corresponding to the Nth video frame according to the adaptation information and the target attribute characteristic information.
Optionally, the optical flow transformation information obtaining module 220 is further configured to:
taking the root of the first convex hull area and of the second convex hull area respectively;
dividing the rooted first convex hull area by the rooted second convex hull area to obtain the adaptation information of the Nth video frame.
Optionally, the optical flow transformation information obtaining module 220 is further configured to:
Linearly fusing the second attribute information of the Nth video frame and the second attribute information of the first video frame to obtain second intermediate attribute characteristic information;
And linearly fusing the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information.
Optionally, the first attribute feature information and the second attribute feature information are both characterized by a matrix; the optical flow transformation information acquisition module 220 is further configured to:
multiplying the matrix corresponding to the second attribute information of the first video frame after inverse transformation of the matrix corresponding to the second attribute information of the Nth video frame to obtain second intermediate attribute characteristic information;
performing linear fusion on the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information, wherein the method comprises the following steps:
multiplying the matrix corresponding to the second intermediate attribute characteristic information by the matrix corresponding to the first attribute characteristic information to obtain the target attribute characteristic information.
Optionally, the target image acquisition module 230 is further configured to:
Inputting the original image, the first feature information and the plurality of optical flow transformation information into a setting decoder to obtain a plurality of target images; wherein the set decoder comprises a plurality of convolution layers.
Optionally, the method further comprises: a video frame extraction module for:
and extracting video frames from the original driving video according to the set sampling frequency to obtain a plurality of video frames.
The device can execute the method provided by all the embodiments of the disclosure, and has the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in this embodiment can be found in the methods provided by all of the foregoing embodiments of the present disclosure.
Referring now to fig. 4, a schematic diagram of an electronic device 300 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), car terminals (e.g., car navigation terminals), etc., as well as fixed terminals such as digital TVs, desktop computers, etc., or various forms of servers such as stand-alone servers or server clusters. The electronic device shown in fig. 4 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 4, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301, which may perform various suitable actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 4 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 309, or installed from the storage device 308, or installed from the ROM 302. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 301.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video; wherein the original image and the original driving video both comprise character images; acquiring a plurality of optical flow transformation information according to the first characteristic information and each second characteristic information; transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images; and splicing the plurality of target images to obtain a target video.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, the embodiments of the present disclosure disclose a video generation method, including:
extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video; wherein the original image and the original driving video both comprise character images;
acquiring a plurality of optical flow transformation information according to the first characteristic information and each second characteristic information;
Transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images;
and splicing the plurality of target images to obtain a target video.
Further, the first feature information includes first key point information and first attribute feature information; the second feature information comprises second key point information and second attribute feature information; acquiring a plurality of optical flow transformation information according to the first characteristic information and each of the second characteristic information, including:
Determining a first convex hull area according to the first key point information;
determining a second convex hull area according to the second key point information;
For an N-th video frame, carrying out linear processing on the first convex hull area and the second convex hull area to obtain adaptation information; wherein N is a positive integer greater than or equal to 1;
performing linear processing on the first attribute characteristic information and the second attribute characteristic information to obtain target attribute characteristic information;
And obtaining optical flow transformation information corresponding to the Nth video frame according to the adaptation information and the target attribute characteristic information.
Further, performing linear processing on the first convex hull area and the second convex hull area to obtain adaptation information, including:
taking the square root of the first convex hull area and of the second convex hull area respectively;
dividing the rooted first convex hull area by the rooted second convex hull area to obtain the adaptation information of the Nth video frame.
Further, performing linear processing on the first attribute feature information and the second attribute feature information to obtain target attribute feature information, including:
Linearly fusing the second attribute information of the Nth video frame and the second attribute information of the first video frame to obtain second intermediate attribute characteristic information;
And linearly fusing the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information.
Further, the first attribute feature information and the second attribute feature information are both characterized by a matrix; the method for obtaining the second intermediate attribute characteristic information comprises the steps of:
multiplying the matrix corresponding to the second attribute information of the first video frame after inverse transformation of the matrix corresponding to the second attribute information of the Nth video frame to obtain second intermediate attribute characteristic information;
Performing linear fusion on the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information, wherein the method comprises the following steps:
Multiplying the matrix corresponding to the second intermediate attribute characteristic information with the matrix corresponding to the first attribute characteristic information to obtain target attribute characteristic information.
Further, performing a transformation process on the original image according to the first feature information and the plurality of optical flow transformation information to obtain a plurality of target images, including:
inputting the original image, the first feature information, and the plurality of optical flow conversion information into a setting decoder to obtain a plurality of target images; wherein the set decoder includes a plurality of convolutional layers.
Further, before extracting the first feature information of the original image and the second feature information of each video frame in the original driving video, the method further includes:
and extracting video frames from the original driving video according to the set sampling frequency to obtain a plurality of video frames.
Note that the above is only a preferred embodiment of the present disclosure and the technical principle applied. Those skilled in the art will appreciate that the present disclosure is not limited to the specific embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made by those skilled in the art without departing from the scope of the disclosure. Therefore, while the present disclosure has been described in connection with the above embodiments, the present disclosure is not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the present disclosure, the scope of which is determined by the scope of the appended claims.

Claims (10)

1. A video generation method, comprising:
extracting first characteristic information of an original image and second characteristic information of each video frame in an original driving video; wherein the original image and the original driving video both comprise character images;
Acquiring a plurality of optical flow transformation information according to the first characteristic information and each second characteristic information; the optical flow transformation information comprises adaptive information between an original image and each video frame and attribute information after linear transformation; the adaptation information is determined based on the first characteristic information and the second characteristic information;
Transforming the original image according to the first characteristic information and the optical flow transformation information to obtain a plurality of target images;
and splicing the plurality of target images to obtain a target video.
2. The method of claim 1, wherein the first characteristic information comprises first keypoint information and first attribute characteristic information; the second feature information comprises second key point information and second attribute feature information; acquiring a plurality of optical flow transformation information according to the first characteristic information and each of the second characteristic information, including:
Determining a first convex hull area according to the first key point information;
determining a second convex hull area according to the second key point information;
For an N-th video frame, carrying out linear processing on the first convex hull area and the second convex hull area to obtain adaptation information; wherein N is a positive integer greater than or equal to 1;
performing linear processing on the first attribute characteristic information and the second attribute characteristic information to obtain target attribute characteristic information;
And obtaining optical flow transformation information corresponding to the Nth video frame according to the adaptation information and the target attribute characteristic information.
3. The method of claim 2, wherein linearly processing the first convex hull area and the second convex hull area to obtain adaptation information comprises:
performing root-of-square calculation on the first convex hull area and the second convex hull area respectively;
Dividing the first convex hull area and the second convex hull area calculated by the root of the square, and obtaining the adaptation information of the Nth video frame.
4. The method of claim 2, wherein linearly processing the first attribute feature information and the second attribute feature information to obtain target attribute feature information comprises:
linearly fusing the second attribute characteristic information of the Nth video frame with the second attribute information of the first video frame to obtain second intermediate attribute characteristic information;
And linearly fusing the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information.
5. The method of claim 4, wherein the first attribute feature information and the second attribute feature information are each characterized by a matrix; the method for obtaining the second intermediate attribute characteristic information comprises the steps of:
multiplying the matrix corresponding to the second attribute information of the first video frame after inverse transformation of the matrix corresponding to the second attribute information of the Nth video frame to obtain second intermediate attribute characteristic information;
Performing linear fusion on the second intermediate attribute characteristic information and the first attribute characteristic information to obtain target attribute characteristic information, wherein the method comprises the following steps:
Multiplying the matrix corresponding to the second intermediate attribute characteristic information with the matrix corresponding to the first attribute characteristic information to obtain target attribute characteristic information.
6. The method of claim 1, wherein transforming the original image based on the first characteristic information and the plurality of optical-flow transformation information to obtain a plurality of target images, comprises:
inputting the original image, the first feature information, and the plurality of optical flow conversion information into a setting decoder to obtain a plurality of target images; wherein the set decoder includes a plurality of convolutional layers.
7. The method of claim 1, further comprising, prior to extracting the first characteristic information of the original image and the second characteristic information of each video frame in the original drive video:
and extracting video frames from the original driving video according to the set sampling frequency to obtain a plurality of video frames.
8. A video generating apparatus, comprising:
The characteristic information extraction module is used for extracting first characteristic information of an original image and second characteristic information of each video frame in the original driving video; wherein the original image and the original driving video both comprise character images;
an optical flow transformation information acquisition module configured to acquire a plurality of optical flow transformation information based on the first feature information and each of the second feature information; the optical flow transformation information comprises adaptive information between an original image and each video frame and attribute information after linear transformation; the adaptation information is determined based on the first characteristic information and the second characteristic information;
A target image acquisition module, configured to perform transformation processing on the original image according to the first feature information and the plurality of optical flow transformation information, to obtain a plurality of target images;
and the target video acquisition module is used for splicing the plurality of target images to obtain a target video.
9. An electronic device, the electronic device comprising:
one or more processing devices;
A storage means for storing one or more programs;
The one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the video generation method of any of claims 1-7.
10. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processing device, implements a video generation method as claimed in any one of claims 1-7.
CN202111566280.9A 2021-12-20 2021-12-20 Video generation method, device, equipment and storage medium Active CN114283060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111566280.9A CN114283060B (en) 2021-12-20 2021-12-20 Video generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111566280.9A CN114283060B (en) 2021-12-20 2021-12-20 Video generation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114283060A CN114283060A (en) 2022-04-05
CN114283060B true CN114283060B (en) 2024-06-28

Family

ID=80873285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111566280.9A Active CN114283060B (en) 2021-12-20 2021-12-20 Video generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114283060B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102157007A (en) * 2011-04-11 2011-08-17 北京中星微电子有限公司 Performance-driven method and device for producing face animation
CN108416266A (en) * 2018-01-30 2018-08-17 同济大学 A kind of video behavior method for quickly identifying extracting moving target using light stream

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11256951B2 (en) * 2017-05-30 2022-02-22 Google Llc Systems and methods of person recognition in video streams
US10853951B2 (en) * 2017-08-04 2020-12-01 Intel Corporation Methods and apparatus to generate temporal representations for action recognition systems
CN108574794B (en) * 2018-03-30 2021-01-22 京东方科技集团股份有限公司 Image processing method and device, display equipment and computer readable storage medium
US10445921B1 (en) * 2018-06-13 2019-10-15 Adobe Inc. Transferring motion between consecutive frames to a digital image
CN110569702B (en) * 2019-02-14 2021-05-14 创新先进技术有限公司 Video stream processing method and device
CN110909613B (en) * 2019-10-28 2024-05-31 Oppo广东移动通信有限公司 Video character recognition method and device, storage medium and electronic equipment
CN111105382B (en) * 2019-12-31 2021-11-16 北京大学 Video repair method
CN111368137A (en) * 2020-02-12 2020-07-03 百度在线网络技术(北京)有限公司 Video generation method and device, electronic equipment and readable storage medium
CN111476871B (en) * 2020-04-02 2023-10-03 百度在线网络技术(北京)有限公司 Method and device for generating video
CN111726621B (en) * 2020-04-24 2022-12-30 中国科学院微电子研究所 Video conversion method and device
CN113313085B (en) * 2021-07-28 2021-10-15 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114283060A (en) 2022-04-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant