CN111862275B - Video editing method, device and equipment based on 3D reconstruction technology

Video editing method, device and equipment based on 3D reconstruction technology

Info

Publication number
CN111862275B
Authority
CN
China
Prior art keywords
video
model
edited
frame
image
Prior art date
Legal status
Active
Application number
CN202010725481.8A
Other languages
Chinese (zh)
Other versions
CN111862275A (en)
Inventor
吴善思源
龚秋棠
吴方灿
林奇
Current Assignee
Xiamen Zhenjing Technology Co ltd
Original Assignee
Xiamen Zhenjing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Zhenjing Technology Co ltd filed Critical Xiamen Zhenjing Technology Co ltd
Priority to CN202010725481.8A
Publication of CN111862275A
Application granted
Publication of CN111862275B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 Indexing scheme for editing of 3D models
    • G06T2219/2016 Rotation, translation, scaling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a video editing method based on 3D reconstruction technology, which comprises the following steps: acquiring a video to be edited; detecting identifiable objects in each frame of the video to be edited; reconstructing a first 3D model corresponding to each object using a neural network; selecting a frame of the video in which the object appears, editing the selected object, and applying the edited content to the first 3D model to generate a second 3D model; and performing real-time pose estimation on each frame image in which the object appears based on the second 3D model, driving the second 3D model according to the pose estimation to generate a replacement image, and rendering the replacement image onto all frames of the video that contain the same object. With the scheme provided by the invention, an edit made to an object in a single frame is automatically applied to the same object across all frames of the video, improving the user's editing efficiency and experience.

Description

Video editing method, device and equipment based on 3D reconstruction technology
Technical Field
The present invention relates to the field of video processing technology, and in particular to a video editing method, apparatus and device based on 3D reconstruction technology.
Background
With the development of 5G and short-video applications, users are gradually shifting from editing pictures to editing videos. Current video editing software mostly operates on the overall timeline, such as deleting useless clips or adding music. If a user wants to edit an object in the video, such as changing the color of furniture or modifying the pattern on a character's clothes, the video must be modified frame by frame: a 5-minute clip contains 7200 images to edit at 24 frames per second, an enormous workload. There is no way to edit an object once and have the edit propagate to subsequent frames, so the experience of editing objects in video is poor.
Disclosure of Invention
In view of the above, the present invention aims to provide a video editing method, device and equipment based on 3D reconstruction technology, in which an edit made to an object in a single frame is automatically applied to the same object across all frames of the video, improving the user's editing efficiency and experience.
To achieve the above object, the present invention provides a video editing method based on a 3D reconstruction technique, the method comprising:
acquiring a video to be edited;
detecting identifiable objects in each frame of the video to be edited;
reconstructing a first 3D model corresponding to each of the objects using a neural network;
selecting a frame of the video to be edited in which the object appears, editing the selected object, and applying the edited content to the first 3D model to generate a second 3D model;
and performing real-time pose estimation on each frame image in which the object appears based on the second 3D model, driving the second 3D model according to the pose estimation to generate a replacement image, and rendering the replacement image onto all frames of the same object of the video to be edited.
Preferably, the detecting identifiable objects in each frame of the video to be edited includes:
detecting identifiable objects in each frame of the video to be edited by using a general object detection technique.
Preferably, the reconstructing a first 3D model corresponding to each object using a neural network includes:
reconstructing the first 3D model corresponding to each object from its voxel composition by means of an auto-encoder.
Preferably, the performing real-time pose estimation on each frame image in which the object appears based on the second 3D model, driving the second 3D model according to the pose estimation to generate a replacement image, and rendering the replacement image onto all frames of the same object of the video to be edited includes:
cutting out the object according to the coordinates of each frame of image where the object is located, and inputting the object into the second 3D model;
outputting the coordinates of each frame image in which the object appears and the three-dimensional pose parameters of the object;
and driving the second 3D model to rotate and translate to the position of the object in each corresponding frame image according to the coordinates and the pose parameters, projecting the edited content onto all frames of the same object, and replacing the covered pixel points in all frames to realize rendering.
In order to achieve the above object, the present invention further proposes a video editing apparatus based on a 3D reconstruction technique, the apparatus comprising:
the acquisition unit is used for acquiring the video to be edited;
the detection unit is used for detecting identifiable objects in each frame of the video to be edited;
a reconstruction unit, configured to reconstruct a first 3D model corresponding to each object using a neural network;
the editing unit is used for selecting a frame of the video to be edited in which the object appears, editing the selected object, and applying the edited content to the first 3D model to generate a second 3D model;
and the rendering unit is used for performing real-time pose estimation on each frame image in which the object appears based on the second 3D model, driving the second 3D model according to the pose estimation to generate a replacement image, and rendering the replacement image onto all frames of the same object of the video to be edited.
Preferably, the detection unit is further configured to:
detect identifiable objects in each frame of the video to be edited by using a general object detection technique.
Preferably, the editing unit is further configured to:
reconstruct the first 3D model corresponding to each object from its voxel composition by means of an auto-encoder.
Preferably, the rendering unit further comprises:
the input unit is used for cutting out the object according to the coordinates of each frame of image where the object is located and inputting the object into the second 3D model;
the output unit is used for outputting the coordinates of each frame image in which the object appears and the three-dimensional pose parameters of the object;
the driving unit is used for driving the second 3D model to rotate and translate to the position where the object appears in each corresponding frame image according to the coordinates and the pose parameters, projecting the edited content onto all frames of the same object, and replacing the covered pixel points in all frames to realize rendering.
In order to achieve the above object, the present invention further proposes a video editing apparatus based on 3D reconstruction technology, comprising a processor, a memory and a computer program stored in the memory, which, when executed by the processor, implements the video editing method based on 3D reconstruction technology described in any one of the above.
In order to achieve the above object, the present invention further provides a computer-readable storage medium comprising a stored computer program, wherein, when run, the computer program controls the device on which the computer-readable storage medium resides to perform the video editing method based on the 3D reconstruction technology according to any one of the above embodiments.
It can be seen that, with this scheme, a video to be edited is acquired; identifiable objects are detected in each frame of the video; a first 3D model corresponding to each object is reconstructed using a neural network; a frame in which an object appears is selected, the selected object is edited, and the edited content is applied to the first 3D model to generate a second 3D model; real-time pose estimation is then performed on each frame image in which the object appears based on the second 3D model, the second 3D model is driven according to the pose estimation to generate a replacement image, and the replacement image is rendered onto all frames of the video that contain the same object.
Further, in the above scheme, the identifiable objects in each frame of the video to be edited are detected using a general object detection technique, which can accurately identify multiple objects of multiple categories in the video.
Furthermore, in this scheme, an auto-encoder reconstructs the first 3D model of each object from its voxel composition, so an object edited on a single frame is automatically updated throughout the video, removing the need for frame-by-frame editing.
Further, in this scheme, the object is cropped according to its coordinates in each frame image; the coordinates of each frame image in which the object appears and the three-dimensional pose parameters of the object are output; the second 3D model is driven to rotate and translate to the position where the object appears in each corresponding frame image according to the coordinates and pose parameters; the edited content is projected onto all frames containing the same object; and the covered pixel points are replaced to complete the rendering. An edit made to an object in a single frame is thus automatically applied to the same object across all video frames, improving the user's editing efficiency and experience.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required by the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of a video editing method based on a 3D reconstruction technology according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a video editing apparatus based on a 3D reconstruction technique according to another embodiment of the present invention.
The realization of the object, the functional characteristics and the advantages of the invention will be further described with reference to the accompanying drawings in connection with the embodiments.
Detailed Description
The invention is described in further detail below with reference to the drawings and embodiments. It is specifically noted that the following embodiments are only intended to illustrate the invention and do not limit its scope. Likewise, the following embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of the invention.
The following describes the invention in detail with reference to examples.
The video editing method based on 3D reconstruction technology provided by the invention automatically applies an edit made to an object in a single frame to the same object across all frames of the video, improving the user's editing efficiency and experience.
Referring to fig. 1, a flow chart of a video editing method based on a 3D reconstruction technology according to an embodiment of the present invention is shown. The method comprises the following steps:
s1, acquiring a video to be edited.
S2, detecting identifiable objects in each frame of the video to be edited.
Wherein detecting the identifiable objects in each frame of the video to be edited comprises: detecting identifiable objects in each frame of the video to be edited by using a general object detection technique.
In this embodiment, the entire video is traversed and a general object detection technique is used to find the identifiable objects that appear in the video, where these objects include items, people, animals, etc. that the user can select and edit.
The general object detection technique above means that, after a neural network model has been trained on a large amount of labeled data, it can detect the objects contained in a given image, for example cats, dogs, people, beds, quilts, etc., and box the positions of these objects in the image.
Since a video essentially consists of a sequence of images (one second of video generally comprises 30 frames), each frame of the video is input into the general object detection model when the video is analyzed, and the model returns the objects contained in that frame. All per-frame detection results are then aggregated, the n objects with the highest occurrence frequency (for example, n = 5) are selected as the video-level detection result, and the positions of these objects in the video are recorded.
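To make this traversal-and-voting step concrete, the following sketch (illustrative only, not part of the patent) reads frames with OpenCV and uses a torchvision Faster R-CNN as a stand-in for the unnamed general object detector; the detector choice, the 0.7 score threshold and n = 5 are all assumptions:

```python
# Illustrative traversal-and-voting: detect objects in every frame, then keep
# the n most frequent labels as the video-level result. Faster R-CNN is an
# assumed stand-in for the patent's unnamed general object detection model.
from collections import Counter

import cv2
import torch
import torchvision

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
detector.eval()

def detect_video_objects(path, score_thresh=0.7, top_n=5):
    cap = cv2.VideoCapture(path)
    label_counts = Counter()
    boxes_per_frame = []  # (label, box) pairs kept for later cropping
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
        with torch.no_grad():
            out = detector([tensor])[0]
        kept = [(int(label), box.tolist())
                for label, score, box
                in zip(out["labels"], out["scores"], out["boxes"])
                if score >= score_thresh]
        boxes_per_frame.append(kept)
        label_counts.update(label for label, _ in kept)
    cap.release()
    top_labels = {label for label, _ in label_counts.most_common(top_n)}
    return top_labels, boxes_per_frame
```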
S3, reconstructing a first 3D model corresponding to each object by using a neural network.
In this embodiment, depending on the time and precision requirements of the actual application scenario, when a user selects an object, the corresponding 3D model can be reconstructed by the neural network from a single frame or from multiple frames. When the time requirement is strict, a single frame can be used; when the precision requirement is strict, multiple frames can be used.
Wherein reconstructing a first 3D model corresponding to each object using a neural network comprises: reconstructing the first 3D model corresponding to each object from its voxel composition by means of an auto-encoder.
Specifically, the image is passed through an auto-encoder network, which outputs a reconstructed 3D model of the object composed of voxels. The input image is the object detected by the general object detection technique described above, cropped out of the frame according to the detected position.
Furthermore, for time and precision considerations, the 3D model supports two modes. The first is fast: only 1 image is input. The second is high-precision: n frames of the video (for example, 5) are each passed through the same network, yielding n 3D models, and the voxel values of these models are averaged position-wise to obtain the final high-precision 3D model.
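A minimal sketch of the two modes follows; the patent does not specify the network, so `encoder` here is a hypothetical auto-encoder mapping a batch of object crops to voxel occupancy grids in [0, 1], and the 0.5 binarization threshold is an assumption:

```python
# Illustrative sketch of the fast and high-precision reconstruction modes.
# `encoder` is a hypothetical auto-encoder returning a (D, H, W) voxel
# occupancy grid in [0, 1] for each object crop.
import torch

def reconstruct(encoder, crops):
    """Fast mode: pass one crop. High-precision mode: pass n crops; the n
    voxel grids are averaged position-wise, then binarized."""
    with torch.no_grad():
        grids = [encoder(crop.unsqueeze(0))[0] for crop in crops]
    mean_grid = torch.stack(grids).mean(dim=0)   # position-wise average
    return (mean_grid > 0.5).float()             # 0.5 threshold is an assumption

# fast = reconstruct(encoder, [crop])        # 1 input image
# precise = reconstruct(encoder, crops[:5])  # n = 5 frames of the same object
```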
S4, selecting a frame of the video to be edited in which the object appears, editing the selected object, and applying the edited content to the first 3D model to generate a second 3D model.
In this embodiment, when the user edits the object on the image, such as changing its color or shape, the change is recorded on the reconstructed 3D model, resulting in a modified 3D model (the second 3D model).
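As an illustration of how an on-image edit might be recorded on the model, the sketch below recolours the voxels whose projection falls inside the user's edited region; `project`, `edit_mask` and the colour representation are all hypothetical, since the patent does not fix them:

```python
# Illustrative only: bake a 2D edit into the reconstructed model (step S4).
# `project` maps a 3D voxel centre to integer pixel coordinates (u, v) in the
# current frame; `edit_mask` marks the pixels the user edited.
import numpy as np

def bake_edit(voxel_xyz, voxel_rgb, project, edit_mask, edit_color):
    """Return the colours of the 'second 3D model': voxels whose projection
    falls inside the edited region take on the user's new colour."""
    edited_rgb = voxel_rgb.copy()
    for i, centre in enumerate(voxel_xyz):
        u, v = project(centre)
        if 0 <= v < edit_mask.shape[0] and 0 <= u < edit_mask.shape[1] \
                and edit_mask[v, u]:
            edited_rgb[i] = edit_color  # the edit is recorded on the model
    return edited_rgb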
S5, performing real-time pose estimation on each frame image in which the object appears based on the second 3D model, driving the second 3D model according to the pose estimation to generate a replacement image, and rendering the replacement image onto all frames of the same object of the video to be edited.
The performing real-time pose estimation on each frame image in which the object appears based on the second 3D model, driving the second 3D model according to the pose estimation to generate a replacement image, and rendering the replacement image onto all frames of the same object of the video to be edited includes:
S5-1, cutting out the object according to the coordinates of each frame image in which the object appears, and inputting it into the second 3D model;
S5-2, outputting the coordinates of each frame image in which the object appears and the three-dimensional pose parameters of the object;
S5-3, driving the second 3D model to rotate and translate to the position of the object in each corresponding frame image according to the coordinates and the pose parameters, projecting the edited content onto all frames of the same object, and replacing the covered pixel points in all frames to realize rendering.
In this embodiment, a neural network model is trained separately for each object; its input is an image of the object, and its output is the coordinates (x, y) of the object's center in the image together with the object's three-dimensional pose (the yaw, pitch and roll rotation angles).
According to the object selected by the user, the 3D model of the corresponding object is retrieved; the object is cut out of each frame image according to the bounding box and coordinates given by the general object detection, and the crop is input to the pose-estimation network, which outputs the x, y, yaw, pitch, roll pose parameters for subsequent use.
Using the 3D model and the five output pose parameters, the 3D model is driven to rotate and translate to the position of the object in the corresponding frame image; the user's edits to the 3D model are then projected directly onto the 2-dimensional image, and the covered pixel points in the frame image are replaced, completing the rendering.
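The following sketch shows one way the five pose parameters could drive the model and replace pixels; the yaw-pitch-roll axis convention, the orthographic projection and the point-set model representation are assumptions, since the patent does not specify them:

```python
# Illustrative rendering of steps S5-1..S5-3 with numpy only. The model is a
# coloured point set; the pose is (x, y, yaw, pitch, roll) from the network.
import numpy as np

def rotation(yaw, pitch, roll):
    """Rotation matrix for one common yaw-pitch-roll convention (z, y, x axes)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def render_onto_frame(frame, points, colors, pose, scale=1.0):
    """Rotate/translate the edited model to the detected pose, project it
    orthographically, and replace the covered pixels of the frame."""
    x, y, yaw, pitch, roll = pose
    rotated = points @ rotation(yaw, pitch, roll).T
    # Assume z grows toward the camera: draw far points first so that nearer
    # points overwrite them (a crude painter's algorithm).
    for i in np.argsort(rotated[:, 2]):
        u = int(round(x + scale * rotated[i, 0]))
        v = int(round(y + scale * rotated[i, 1]))
        if 0 <= v < frame.shape[0] and 0 <= u < frame.shape[1]:
            frame[v, u] = colors[i]  # pixel replacement completes the rendering
    return frame
```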
For example, in a display video of a home environment, the user selects a quilt detected by the general object detection technique; the neural network reconstructs a 3D model of the quilt; the color of the quilt on the bed is changed through color mixing; and after the edit is confirmed, the quilt's color is modified throughout the video.
For another example, in a self-timer video, the people and clothes in the scene are detected by the general object detection technique; the user selects the clothes on a person; the neural network reconstructs a 3D model of the clothes; the pattern of the clothes is changed by editing; and after the edit is confirmed, the pattern of the clothes is modified throughout the video.
It can be seen that, with this scheme, a video to be edited is acquired; identifiable objects are detected in each frame of the video; a first 3D model corresponding to each object is reconstructed using a neural network; a frame in which an object appears is selected, the selected object is edited, and the edited content is applied to the first 3D model to generate a second 3D model; real-time pose estimation is then performed on each frame image in which the object appears based on the second 3D model, the second 3D model is driven according to the pose estimation to generate a replacement image, and the replacement image is rendered onto all frames of the video that contain the same object.
Further, in the above scheme, the identifiable objects in each frame of the video to be edited are detected using a general object detection technique, which can accurately identify multiple objects of multiple categories in the video.
Furthermore, in this scheme, an auto-encoder reconstructs the first 3D model of each object from its voxel composition, so an object edited on a single frame is automatically updated throughout the video, removing the need for frame-by-frame editing.
Further, in this scheme, the object is cropped according to its coordinates in each frame image; the coordinates of each frame image in which the object appears and the three-dimensional pose parameters of the object are output; the second 3D model is driven to rotate and translate to the position where the object appears in each corresponding frame image according to the coordinates and pose parameters; the edited content is projected onto all frames containing the same object; and the covered pixel points are replaced to complete the rendering. An edit made to an object in a single frame is thus automatically applied to the same object across all video frames, improving the user's editing efficiency and experience.
Referring to fig. 2, a schematic structural diagram of a video editing apparatus based on a 3D reconstruction technique according to another embodiment of the present invention is shown. The device 10 comprises:
an acquisition unit 11 for acquiring a video to be edited;
a detection unit 12, configured to detect an identifiable object in each frame of the video to be edited;
a reconstruction unit 13 for reconstructing a first 3D model corresponding to each of the objects using a neural network;
an editing unit 14, configured to select a current frame of the object in the video to be edited, edit the selected object, and modify the edited content into the first 3D model to generate a second 3D model;
and the rendering unit 15 is configured to perform real-time pose estimation on each frame image in which the object appears based on the second 3D model, drive the second 3D model according to the pose estimation to generate a replacement image, and render the replacement image onto all frames of the same object of the video to be edited.
Optionally, the detecting unit 12 is further configured to:
and detecting identifiable objects in each frame of the video to be edited by using a universal object detection technology.
Optionally, the editing unit 14 is further configured to:
reconstruct the first 3D model corresponding to each object from its voxel composition by means of an auto-encoder.
Optionally, the rendering unit 15 further includes:
an input unit (not labeled in the figure) for cutting out the object according to the coordinates of each frame of image where the object is located, and inputting the object into the second 3D model;
an output unit (not labeled in the figure) for outputting the coordinates of each frame image in which the object appears and the three-dimensional pose parameters of the object;
and a driving unit (not labeled in the figure) for driving the second 3D model to rotate and translate to the position where the object appears in each corresponding frame image according to the coordinates and the pose parameters, projecting the edited content onto all frames of the same object, and replacing the covered pixel points in all frames to realize rendering.
The functions and operation steps implemented by each unit of the above video editing apparatus based on 3D reconstruction technology are substantially the same as those of the foregoing method embodiments and are not repeated here.
The embodiment of the invention also provides video editing equipment based on the 3D reconstruction technology, which comprises a processor, a memory and a computer program stored in the memory, wherein the computer program can be executed by the processor to realize the video editing method based on the 3D reconstruction technology.
The embodiment of the invention also provides a computer-readable storage medium comprising a stored computer program, wherein, when run, the computer program controls the device on which the computer-readable storage medium resides to execute the video editing method based on the 3D reconstruction technology described above.
The computer program may be divided into one or more units, which are stored in the memory and executed by the processor to carry out the invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer program in the video editing device based on 3D reconstruction technology.
The video editing device based on 3D reconstruction technology may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the schematic diagram is merely an example of such a device and does not constitute a limitation; the device may include more or fewer components than illustrated, combine certain components, or use different components; for example, it may further include input-output devices, network access devices, buses, etc.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor; it is the control center of the video editing device based on 3D reconstruction technology, connecting the various parts of the entire device using various interfaces and lines.
The memory may be used to store the computer program and/or modules; the processor implements the various functions of the video editing device based on 3D reconstruction technology by running or executing the computer program and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to use of the device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the integrated units of the video editing device based on 3D reconstruction technology are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may also be implemented by instructing the related hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor it implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form.
The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be adjusted appropriately according to the requirements of legislation and patent practice in each jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that the apparatus embodiments described above are merely illustrative; units described as separate parts may or may not be physically separate, and components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, in the drawings of the apparatus embodiments provided by the invention, the connection relationships between modules indicate communication connections between them, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without inventive effort.
The above embodiments may be further combined or substituted. The embodiments are merely preferred examples of the invention and do not limit its spirit and scope; various changes and modifications made by those skilled in the art to the technical solution of the invention without departing from its design concept fall within the protection scope of the invention.

Claims (8)

1. A video editing method based on a 3D reconstruction technique, the method comprising:
acquiring a video to be edited;
detecting identifiable objects in each frame of the video to be edited;
reconstructing a first 3D model corresponding to each of the objects using a neural network;
selecting a frame of the video to be edited in which the object appears, editing the selected object, and applying the edited content to the first 3D model to generate a second 3D model;
performing real-time pose estimation on each frame image where the object is located based on the second 3D model, driving the second 3D model to generate a replacement image according to the pose estimation, and rendering the replacement image onto all frames of the same object of the video to be edited, wherein the real-time pose estimation comprises the following steps:
cutting out the object according to the coordinates of each frame of image where the object is located, and inputting the object into the second 3D model;
outputting the coordinates of each frame image in which the object appears and the three-dimensional pose parameters of the object;
and driving the second 3D model to rotate and translate to the position of the object in each corresponding frame image according to the coordinates and the pose parameters, projecting the edited content onto all frames of the same object, and replacing the covered pixel points in all frames to realize rendering.
2. The method for video editing based on 3D reconstruction technology according to claim 1, wherein the detecting identifiable objects in each frame of the video to be edited comprises:
and detecting identifiable objects in each frame of the video to be edited by using a universal object detection technology.
3. The video editing method according to claim 1, wherein reconstructing a first 3D model corresponding to each of the objects using a neural network comprises:
reconstructing the first 3D model corresponding to each object from its voxel composition by means of an auto-encoder.
4. A video editing apparatus based on 3D reconstruction technology, the apparatus comprising:
the acquisition unit is used for acquiring the video to be edited;
the detection unit is used for detecting identifiable objects in each frame of the video to be edited;
a reconstruction unit, configured to reconstruct a first 3D model corresponding to each object using a neural network;
the editing unit is used for selecting a frame of the video to be edited in which the object appears, editing the selected object, and applying the edited content to the first 3D model to generate a second 3D model;
the rendering unit is used for performing real-time pose estimation on each frame image in which the object appears based on the second 3D model, driving the second 3D model according to the pose estimation to generate a replacement image, and rendering the replacement image onto all frames of the same object of the video to be edited;
the rendering unit further includes:
the input unit is used for cutting out the object according to the coordinates of each frame of image where the object is located and inputting the object into the second 3D model;
the output unit is used for outputting the coordinates of each frame image in which the object appears and the three-dimensional pose parameters of the object;
the driving unit is used for driving the second 3D model to rotate and translate to the position where the object appears in each corresponding frame image according to the coordinates and the pose parameters, projecting the edited content onto all frames of the same object, and replacing the covered pixel points in all frames to realize rendering.
5. The video editing apparatus based on 3D reconstruction technique according to claim 4, wherein the detecting unit further comprises:
and detecting identifiable objects in each frame of the video to be edited by using a universal object detection technology.
6. The video editing apparatus based on 3D reconstruction technique according to claim 4, wherein the editing unit further comprises:
reconstructing the first 3D model corresponding to each object from its voxel composition by means of an auto-encoder.
7. A 3D reconstruction technique based video editing apparatus comprising a processor, a memory and a computer program stored in the memory, the computer program being executable by the processor to implement the 3D reconstruction technique based video editing method as claimed in any one of claims 1 to 3.
8. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the video editing method based on the 3D reconstruction technique according to any one of claims 1 to 3.
CN202010725481.8A 2020-07-24 2020-07-24 Video editing method, device and equipment based on 3D reconstruction technology Active CN111862275B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010725481.8A CN111862275B 2020-07-24 2020-07-24 Video editing method, device and equipment based on 3D reconstruction technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010725481.8A CN111862275B 2020-07-24 2020-07-24 Video editing method, device and equipment based on 3D reconstruction technology

Publications (2)

Publication Number Publication Date
CN111862275A CN111862275A (en) 2020-10-30
CN111862275B true CN111862275B (en) 2023-06-06

Family

ID=72950754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010725481.8A Active CN111862275B Video editing method, device and equipment based on 3D reconstruction technology

Country Status (1)

Country Link
CN (1) CN111862275B

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270736B (en) * 2020-11-16 2024-03-01 Oppo广东移动通信有限公司 Augmented reality processing method and device, storage medium and electronic equipment
CN112767534B (en) * 2020-12-31 2024-02-09 北京达佳互联信息技术有限公司 Video image processing method, device, electronic equipment and storage medium
CN113518187B (en) * 2021-07-13 2024-01-09 北京达佳互联信息技术有限公司 Video editing method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9736449B1 (en) * 2013-08-12 2017-08-15 Google Inc. Conversion of 2D image to 3D video
CN106254941A * 2016-10-10 2016-12-21 乐视控股(北京)有限公司 Video processing method and device
CN107067429A * 2017-03-17 2017-08-18 徐迪 Video editing system and method based on deep learning for face three-dimensional reconstruction and face replacement
CN108765529A (en) * 2018-05-04 2018-11-06 北京比特智学科技有限公司 Video generation method and device
CN110475157A * 2019-07-19 2019-11-19 平安科技(深圳)有限公司 Multimedia information display method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111862275A 2020-10-30

Similar Documents

Publication Publication Date Title
CN111862275B (en) Video editing method, device and equipment based on 3D reconstruction technology
US11915133B2 (en) Techniques for smooth region merging in image editing
Cole et al. Directing Gaze in 3D Models with Stylized Focus.
JP4312249B2 (en) How to create 3D animation from animation data
US20100095236A1 (en) Methods and apparatus for automated aesthetic transitioning between scene graphs
US20050253849A1 (en) Custom spline interpolation
WO2022142081A1 (en) Self-defined animation curve generation method and apparatus
CN114997105A (en) Design template, material generation method, computing device and storage medium
CN103052973B (en) Generate method and the device of body animation
EP3246921A2 (en) Integrated media processing pipeline
CN111158840B (en) Image carousel method and device
GB2391146A (en) Generating animation data with constrained parameters
Luo et al. Controllable motion-blur effects in still images
JP6275759B2 (en) Three-dimensional content generation method, program, and client device
CN115205427A (en) Cartoon face driving method and device
Forsey Motion control and surface modeling of articulated figures in computer animation
CN110877332B (en) Robot dance file generation method and device, terminal device and storage medium
RU2750278C2 (en) Method and apparatus for modification of circuit containing sequence of dots located on image
KR102561020B1 (en) Emulation of hand-drawn lines in CG animation
CN115359158A (en) Animation processing method and device applied to Unity
CN116188638B (en) Method, system, device and medium for realizing custom animation based on three-dimensional engine
EP4328863A1 (en) 3d image implementation method and system
US20220108515A1 (en) Computer Graphics System User Interface for Obtaining Artist Inputs for Objects Specified in Frame Space and Objects Specified in Scene Space
JPH10188026A (en) Method and storage medium for moving image preparation
CN116342759A (en) Method, device, equipment and storage medium for quick offline rendering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant