CN117745943A - Three-dimensional object reconstruction method, model training method, device, equipment and medium

Info

Publication number: CN117745943A
Application number: CN202311764196.7A
Applicant and assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Inventor: 储文青
Original language: Chinese (zh)
Legal status: Pending


Abstract

The disclosure provides a three-dimensional object reconstruction method and a training method and apparatus of a deep learning model, relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like, and can be applied to scenarios such as AI navigation and the metaverse. The implementation scheme is as follows: obtaining fused three-dimensional parameters based on driving parameters and three-dimensional parameters of an object to be reconstructed, wherein the three-dimensional parameters of the object to be reconstructed are obtained based on an original image containing the object to be reconstructed; obtaining an initial reconstructed image based on the fused three-dimensional parameters, wherein the initial reconstructed image comprises a reconstructed object; obtaining an optical flow reconstructed image based on optical flow information between the reconstructed object and the object to be reconstructed and the original image; and generating a target reconstructed image based on the optical flow reconstructed image, the initial reconstructed image and the original image, wherein the target reconstructed image comprises a target reconstructed object, and the target reconstructed object comprises the identification information of the object to be reconstructed and the information characterized by the driving parameters.

Description

Three-dimensional object reconstruction method, model training method, device, equipment and medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like, can be applied to scenarios such as AI navigation and the metaverse, and more particularly relates to a three-dimensional object reconstruction method, a training method of a deep learning model, an apparatus, an electronic device, a storage medium and a program product.
Background
Computer vision technology is a science that studies how to make a computer "see". Computer vision technology can be applied to scenarios such as image recognition, image semantic understanding, image retrieval, three-dimensional object reconstruction, virtual reality, and simultaneous localization and mapping. Taking a three-dimensional object reconstruction scenario as an example, three-dimensional reconstruction technology (3D Reconstruction) in computer vision can reconstruct three-dimensional information from a single image or multiple images, so that objects or scenes in the real world can be represented by a computer. For each scenario, how to use computer vision technology so that the generated result is reasonable and accurate is worth exploring.
Disclosure of Invention
The present disclosure provides a three-dimensional object reconstruction method, a training method of a deep learning model, an apparatus, an electronic device, a storage medium, and a program product.
According to an aspect of the present disclosure, there is provided a three-dimensional object reconstruction method including: obtaining a fusion three-dimensional parameter based on the driving parameter and the three-dimensional parameter of the object to be reconstructed, wherein the three-dimensional parameter of the object to be reconstructed is obtained based on an original image containing the object to be reconstructed; obtaining an initial reconstructed image based on the fusion three-dimensional parameters, wherein the initial reconstructed image comprises a reconstructed object; obtaining an optical flow reconstruction image based on optical flow information between the reconstruction object and the object to be reconstructed and the original image, wherein the optical flow information is used for representing the offset between the pixels of the reconstruction object and the pixels of the object to be reconstructed; and generating a target reconstructed image based on the optical flow reconstructed image, the initial reconstructed image and the original image, wherein the target reconstructed image comprises a target reconstructed object, and the target reconstructed object comprises identification information of an object to be reconstructed and information characterized by driving parameters.
According to another aspect of the present disclosure, there is provided a training method of a deep learning model, including:
obtaining a sample fusion three-dimensional parameter based on the sample driving parameter and the sample three-dimensional parameter of the sample object, wherein the sample three-dimensional parameter of the sample object is obtained based on a sample original image containing the sample object; obtaining a sample initial reconstruction image based on the sample fusion three-dimensional parameters, wherein the sample initial reconstruction image comprises a sample reconstruction object; obtaining a sample optical flow reconstruction image based on sample optical flow information between a sample reconstruction object and the sample object and a sample original image, wherein the sample optical flow information is used for representing sample offset between pixels of the sample reconstruction object and pixels of the sample object; generating a sample target reconstructed image based on the sample optical flow reconstructed image, the sample initial reconstructed image and the sample original image, wherein the sample target reconstructed image comprises a sample target reconstructed object, and the sample target reconstructed object comprises identification information of the sample object and information characterized by sample driving parameters; and training the deep learning model based on the sample target reconstructed image and the sample driving parameters to obtain a trained deep learning model.
According to another aspect of the present disclosure, there is provided a three-dimensional object reconstruction apparatus including: the fusion module is used for obtaining fusion three-dimensional parameters based on the driving parameters and the three-dimensional parameters of the object to be reconstructed, wherein the three-dimensional parameters of the object to be reconstructed are obtained based on the original image containing the object to be reconstructed; the first reconstruction module is used for obtaining an initial reconstruction image based on the fusion of the three-dimensional parameters, wherein the initial reconstruction image comprises a reconstruction object; the second reconstruction module is used for obtaining an optical flow reconstruction image based on optical flow information between the reconstruction object and the object to be reconstructed and the original image, wherein the optical flow information is used for representing the offset between the pixels of the reconstruction object and the pixels of the object to be reconstructed; and a third reconstruction module for generating a target reconstructed image based on the optical flow reconstructed image, the initial reconstructed image and the original image, wherein the target reconstructed image comprises a target reconstructed object, and the target reconstructed object comprises identification information of the object to be reconstructed and information characterized by driving parameters.
According to another aspect of the present disclosure, there is provided a training apparatus of a deep learning model, including: the sample fusion module is used for obtaining sample fusion three-dimensional parameters based on the sample driving parameters and sample three-dimensional parameters of the sample object, wherein the sample three-dimensional parameters of the sample object are obtained based on a sample original image containing the sample object; the first sample reconstruction module is used for obtaining a sample initial reconstruction image based on the sample fusion three-dimensional parameters, wherein the sample initial reconstruction image comprises a sample reconstruction object; the second sample reconstruction module is used for obtaining a sample optical flow reconstruction image based on sample optical flow information between a sample reconstruction object and the sample object and a sample original image, wherein the sample optical flow information is used for representing sample offset between pixels of the sample reconstruction object and pixels of the sample object; the third sample reconstruction module is used for generating a sample target reconstruction image based on the sample optical flow reconstruction image, the sample initial reconstruction image and the sample original image, wherein the sample target reconstruction image comprises a sample target reconstruction object, and the sample target reconstruction object comprises identification information of the sample object and information characterized by sample driving parameters; and the training module is used for training the deep learning model based on the sample target reconstructed image and the sample driving parameters to obtain a trained deep learning model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a three-dimensional object reconstruction method and/or a training method of a deep learning model as disclosed herein.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a three-dimensional object reconstruction method and/or a training method of a deep learning model as the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a three-dimensional object reconstruction method and/or a training method of a deep learning model as disclosed herein.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which a three-dimensional object reconstruction method, a training method of a deep learning model, and an apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a three-dimensional object reconstruction method according to an embodiment of the present disclosure;
FIG. 3A schematically illustrates a flow diagram for generating an initial reconstructed image according to an embodiment of the disclosure;
FIG. 3B schematically illustrates a flow diagram for generating an optical flow reconstruction image in accordance with an embodiment of the present disclosure;
FIG. 3C schematically illustrates a flow diagram for generating a target reconstructed image according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a flow chart of a three-dimensional object reconstruction method according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a structural schematic of an object reconstruction model according to another embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a training method of a deep learning model according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a three-dimensional object reconstruction apparatus according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure; and
Fig. 9 schematically illustrates a block diagram of an electronic device adapted to implement a three-dimensional object reconstruction method and/or a training method of a deep learning model, according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 schematically illustrates an exemplary system architecture to which a three-dimensional object reconstruction method, a training method of a deep learning model, and an apparatus may be applied according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied, intended to assist those skilled in the art in understanding the technical content of the present disclosure, and it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the three-dimensional object reconstruction method, the training method of the deep learning model, and the apparatus may be applied may include a terminal device, and the terminal device may implement the three-dimensional object reconstruction method, the training method of the deep learning model, and the apparatus provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and/or social platform software, etc. (as examples only).
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for content browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the three-dimensional object reconstruction method and/or the training method of the deep learning model provided in the embodiments of the present disclosure may be generally performed by the terminal device 101, 102, or 103. Accordingly, the three-dimensional object reconstruction apparatus and/or the training apparatus of the deep learning model provided by the embodiments of the present disclosure may also be provided in the terminal device 101, 102, or 103.
Alternatively, the three-dimensional object reconstruction method and/or the training method of the deep learning model provided by the embodiments of the present disclosure may also be generally performed by the server 105. Accordingly, the three-dimensional object reconstruction device and/or training device of the deep learning model provided by the embodiments of the present disclosure may be generally provided in the server 105. The three-dimensional object reconstruction method and/or the training method of the deep learning model provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the three-dimensional object reconstruction apparatus and/or the training apparatus of the deep learning model provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, when a user reconstructs a face online, the terminal devices 101, 102, 103 may acquire an original image containing the face to be reconstructed and a driving image input by the user, then send the acquired original image and driving image to the server 105, and the server 105 processes the original image and the driving image to obtain driving parameters and three-dimensional parameters of the face to be reconstructed; obtains fused three-dimensional parameters based on the driving parameters and the three-dimensional parameters of the face to be reconstructed; obtains an initial reconstructed image based on the fused three-dimensional parameters; obtains an optical flow reconstructed image based on optical flow information between the driving image and the face to be reconstructed and the original image; and generates a target reconstructed image based on the optical flow reconstructed image, the initial reconstructed image, and the original image. Alternatively, the original image containing the face to be reconstructed and the driving image may be processed by a server or a server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105, and the target reconstructed image is finally obtained.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the user's personal information involved all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
Fig. 2 schematically illustrates a flow chart of a three-dimensional object reconstruction method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S240.
In operation S210, a fused three-dimensional parameter is obtained based on the driving parameter and the three-dimensional parameter of the object to be reconstructed.
According to embodiments of the present disclosure, the three-dimensional parameters of the object to be reconstructed may be derived based on an original image comprising the object to be reconstructed.
According to embodiments of the present disclosure, the driving parameters may be derived based on the driving image. The driving image may contain a driving object.
According to an embodiment of the present disclosure, the driving parameter may be further obtained based on a driving video, where each frame of image of the driving video contains a driving object. According to embodiments of the present disclosure, three-dimensional parameters of an object to be reconstructed may be used to characterize identification information of the object to be reconstructed.
According to embodiments of the present disclosure, the fused three-dimensional parameters may be used to characterize the identification information of the object to be reconstructed and the information characterized by the driving parameters, e.g. the attribute information of the driving object characterized by the driving parameters. The driving parameters can be fused to the three-dimensional parameters of the object to be reconstructed to obtain fused three-dimensional parameters.
For example, in the case where the object to be reconstructed is a face to be reconstructed, the driving parameters may be directly spliced onto the three-dimensional parameters of the face to be reconstructed, but the present disclosure is not limited thereto. The driving parameters may also be used to update the three-dimensional parameters of the face to be reconstructed.
In operation S220, an initial reconstructed image is obtained based on the fused three-dimensional parameters.
According to an embodiment of the present disclosure, the initial reconstructed image may comprise a reconstructed object.
According to the embodiment of the disclosure, image rendering may be performed according to the fused three-dimensional parameters to obtain an initial reconstructed image. The rendering method may be a renderer-based rendering method or a deep-learning-based rendering method.
For example, the fused three-dimensional parameters may be input into a renderer, which graphically determines the position and state information of the reconstructed object based on the three-dimensional parameters. An initial reconstructed image is generated based on the position and state information of the reconstructed object.
In operation S230, an optical flow reconstructed image is obtained based on optical flow information between the reconstructed object and the object to be reconstructed and the original image.
According to an embodiment of the present disclosure, the optical flow information is used to characterize an offset between a pixel of the reconstructed object and a pixel of the object to be reconstructed.
According to embodiments of the present disclosure, the optical flow information may be determined from a difference between coordinates of a pixel of the reconstructed object and coordinates of a pixel of the object to be reconstructed.
According to the embodiment of the disclosure, the initial reconstructed image including the reconstructed object and the original image including the object to be reconstructed may be concatenated along the channel dimension and input into an optical flow prediction network, which outputs the optical flow information. The optical flow prediction network may include, for example, FlowNet 2.0 or FlowNet 1.0.
According to the embodiment of the disclosure, an original image can be reconstructed according to optical flow information, and an optical flow reconstructed image is obtained.
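As an illustration only, the warping step described above (applying per-pixel offsets to the original image to obtain the optical flow reconstructed image) can be sketched as follows, assuming PyTorch tensors and a flow field expressed as pixel offsets; the function name, tensor layout, and sign convention are assumptions and not part of the disclosure.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(image, flow):
    """Warp `image` (B, C, H, W) with a per-pixel offset field `flow` (B, 2, H, W) in pixel units."""
    b, _, h, w = image.shape
    # Base sampling grid in pixel coordinates (x, y) for every output location.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().unsqueeze(0).expand(b, h, w, 2)
    # Shift every pixel by its predicted offset, then normalize to [-1, 1] for grid_sample.
    pos = base + flow.permute(0, 2, 3, 1)
    norm_x = 2.0 * pos[..., 0] / (w - 1) - 1.0
    norm_y = 2.0 * pos[..., 1] / (h - 1) - 1.0
    grid = torch.stack((norm_x, norm_y), dim=-1)
    return F.grid_sample(image, grid, align_corners=True)
```

In this sketch the optical flow reconstructed image is simply `warp_with_flow(original_image, optical_flow)`.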
In operation S240, a target reconstructed image is generated based on the optical flow reconstructed image, the initial reconstructed image, and the original image.
According to an embodiment of the present disclosure, the target reconstructed image comprises a target reconstructed object comprising identification information of the object to be reconstructed and information characterized by driving parameters.
According to the embodiment of the disclosure, the optical flow reconstructed image, the initial reconstructed image and the original image may be concatenated along the channel dimension and input into a generative adversarial network (GAN) to generate the target reconstructed image.
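For illustration only, the channel-wise concatenation mentioned above can be sketched in PyTorch as follows; the tensor shapes and variable names are assumptions.

```python
import torch

# Toy tensors standing in for the three images (batch of 1, 3 x 256 x 256 each).
flow_recon_img = torch.rand(1, 3, 256, 256)
initial_recon_img = torch.rand(1, 3, 256, 256)
original_img = torch.rand(1, 3, 256, 256)

# Channel-wise concatenation ("splicing according to channels") before the generative network.
gan_input = torch.cat([flow_recon_img, initial_recon_img, original_img], dim=1)  # (1, 9, 256, 256)
```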
According to the embodiment of the disclosure, the driving parameters are fused into the three-dimensional parameters of the object to be reconstructed to obtain the initial reconstructed image, completing the first reconstruction operation. The motion optical flow of the reconstructed object is applied to the original image to obtain the optical flow reconstructed image, completing the second reconstruction operation. The image is then reconstructed again using the optical flow reconstructed image, the initial reconstructed image and the original image to generate the target reconstructed image, completing the third reconstruction operation. These multiple reconstruction operations improve the reconstruction effect of the object, so that the identification information of the object to be reconstructed is maintained while the information characterized by the driving parameters is accurately fused into the object to be reconstructed, making the reconstruction effect of the target reconstructed image natural and highly acceptable.
According to a related example, an original image of a face to be reconstructed and driving parameters may be processed by a deep face-swapping technique to directly obtain a reconstructed face image.
Compared with a related example, the three-dimensional object reconstruction method provided by the embodiment of the disclosure can maintain the background of the original image and has good reconstruction effect on a single image.
According to another related example, a three-dimensional expression driving technique may be used to process an original image of the driven object and driving parameters, and to transfer the driving parameters, such as the expression parameters and the pose parameters, onto the driven object to obtain a reconstructed image.
Compared with this related example, by using the three-dimensional object reconstruction method provided by the embodiments of the disclosure, three-dimensional reconstruction errors are reduced through repeated reconstruction, the three-dimensional parameter information of the driven object is preserved, and a target reconstructed image showing the texture information inside the mouth can be obtained even from an original image in which the target object's mouth is closed.
According to another related example, the original image of the driven object may also be processed based on an expression driving technique using facial key points: key points are detected in the driving image and the original image, the key point changes of the driving image are modeled as key point changes of the original image through a key point migration technique to obtain optical flow information of the image, the optical flow information is applied to the original image, and the original image is then refined by a generative adversarial network to complete the whole driving process.
Compared with this related example, by using the three-dimensional object reconstruction method provided by the embodiments of the disclosure, key point detection errors can be reduced, and the expression parameters and pose parameters are not mixed up during reconstruction, so that the driving effect is good. Even when the face shape in the driving image differs from that in the driven image, the driving effect remains good.
According to another embodiment of the present disclosure, driving parameters may be extracted from the driving object in each frame of the driving video to form a driving parameter sequence. The three-dimensional parameters of the object to be reconstructed are updated with each driving parameter in the driving parameter sequence to obtain a fused three-dimensional parameter sequence in one-to-one correspondence with the driving parameter sequence. From the fused three-dimensional parameter sequence, an initial reconstructed image sequence in one-to-one correspondence with the fused three-dimensional parameter sequence can be obtained. For each image in the initial reconstructed image sequence, a target reconstructed image sequence in one-to-one correspondence with the initial reconstructed image sequence may be generated in combination with the optical flow reconstructed image and the original image. A driven video is generated based on the target reconstructed image sequence.
According to the embodiment of the disclosure, by generating the driven video from the target reconstructed image sequence, a static image can be brought to life with a natural driving effect, which can be applied to producing various entertaining videos, personalized emoticon packs, single-image digital human videos, and the like.
According to an embodiment of the present disclosure, for operation S210 shown in fig. 2, obtaining a fused three-dimensional parameter based on a driving parameter and a three-dimensional parameter of an object to be reconstructed may include: three-dimensional parameters of a driving object in a driving image are extracted. The driving parameters are determined from the three-dimensional parameters of the driving object. And updating the target three-dimensional subparameter in the three-dimensional parameters of the object to be reconstructed by using the driving parameters to obtain the fused three-dimensional parameters.
According to embodiments of the present disclosure, the target three-dimensional sub-parameter may include a parameter consistent with a parameter type of the driving parameter.
According to an embodiment of the present disclosure, the three-dimensional parameters of the driving object may include at least one of: expression parameters of the driving object, pose parameters of the driving object, visual parameters of the driving object, identity parameters of the driving object, background parameters of the driving object and texture parameters of the driving object. The three-dimensional parameters of the object to be reconstructed may include at least one of: expression parameters of the object to be reconstructed, pose parameters of the object to be reconstructed, visual parameters of the object to be reconstructed, identity parameters of the object to be reconstructed, background parameters of the object to be reconstructed and texture parameters of the object to be reconstructed.
According to the embodiment of the disclosure, the identity parameter of the object to be reconstructed, the background parameter of the object to be reconstructed and the texture parameter of the object to be reconstructed can be used as the identification information of the object to be reconstructed.
According to embodiments of the present disclosure, a feature extraction network, such as a 50-layer deep residual network (ResNet-50), may be employed to extract the three-dimensional parameters of the driving object in the driving image. The present disclosure does not specifically limit this.
According to the embodiments of the present disclosure, parameters used for driving may be selected from the three-dimensional parameters of the driving object as the driving parameters. For example, the expression parameters of the driving object, the pose parameters of the driving object, and the visual parameters of the driving object may be selected as the driving parameters.
According to the embodiment of the disclosure, the expression parameters of the object to be reconstructed, the pose parameters of the object to be reconstructed and the visual parameters of the object to be reconstructed among the three-dimensional parameters of the object to be reconstructed may be taken as the target three-dimensional sub-parameters. The target three-dimensional sub-parameters in the three-dimensional parameters of the object to be reconstructed may be replaced by the driving parameters to obtain the fused three-dimensional parameters. That is, the fused three-dimensional parameters are obtained from the expression parameters of the driving object, the pose parameters of the driving object, the visual parameters of the driving object, the identity parameters of the object to be reconstructed, the background parameters of the object to be reconstructed and the texture parameters of the object to be reconstructed.
According to the embodiment of the disclosure, the driving parameters are fused into the three-dimensional parameters of the object to be reconstructed by updating the target three-dimensional subparameters in the three-dimensional parameters of the object to be reconstructed by using the driving parameters, so that the fusion mode is simple and is suitable for fusion among different objects.
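As an illustration only, the sub-parameter replacement described above can be sketched as follows; the parameter names and vector sizes are assumptions and do not reflect the actual parameter layout of the disclosure.

```python
import numpy as np

# Hypothetical parameter layout; names and sizes are illustrative assumptions.
source_params = {
    "identity":   np.random.randn(80),   # kept: identification information of the object to be reconstructed
    "texture":    np.random.randn(80),
    "lighting":   np.random.randn(27),
    "expression": np.random.randn(64),   # target three-dimensional sub-parameters to be replaced
    "pose":       np.random.randn(6),
    "gaze":       np.random.randn(3),
}
driving_params = {
    "expression": np.random.randn(64),
    "pose":       np.random.randn(6),
    "gaze":       np.random.randn(3),
}

# Overwrite only the sub-parameters whose type matches the driving parameters.
fused_params = dict(source_params)
fused_params.update(driving_params)
```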
According to another embodiment of the present disclosure, for operation S210 shown in fig. 2, obtaining a fused three-dimensional parameter based on the driving parameter and the three-dimensional parameter of the object to be reconstructed may further include: the parameter variation between the reference object and the driving object is determined. And obtaining the fusion three-dimensional parameters based on the parameter variation and the three-dimensional parameters of the object to be reconstructed.
According to an embodiment of the present disclosure, the driving object is used for driving reconstruction of an object to be reconstructed, the object identifiers of the reference object and the driving object are the same, and the driving parameters of the reference object and the driving object are different. The reference object and the driving object have the same object identification, and it is understood that the reference object and the driving object have the same identification information. For example, at least one of the identity parameter, the background parameter, the texture parameter are the same.
According to an embodiment of the present disclosure, the driving parameters of a plurality of initial objects may be traversed to determine the parameter variation between the driving parameters of each initial object and those of the other objects, and the initial object with the lowest parameter variation is determined as the reference object. The initial objects, the reference object and the driving object have the same object identifier but different driving parameters. Alternatively, the similarity of the driving parameters among the plurality of initial objects may be determined, and an initial object whose similarity meets a threshold may be taken as the reference object.
According to the embodiment of the disclosure, the parameter variation may be obtained by taking the difference between the driving parameters of the driving object and the driving parameters of the reference object.
According to the embodiment of the disclosure, the parameter variation can be weighted and summed with the three-dimensional parameter of the object to be reconstructed to obtain the fused three-dimensional parameter.
According to the embodiment of the disclosure, by determining the parameter variation, the method of fusing the parameter variation to the three-dimensional parameter of the object to be reconstructed can effectively avoid reconstructing the identification information of the driving object to the target reconstruction object, and avoid mixing among a plurality of parameters, so that the reconstruction effect is good.
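The following is a minimal sketch of the variation-based fusion described above, assuming the parameters are stored as dictionaries of arrays and that the weighted summation reduces to adding a weighted delta to the source parameters; names and the weighting scheme are assumptions.

```python
import numpy as np

def fuse_with_delta(source, driving, reference, weight=1.0):
    """Fuse the change between the driving and reference parameters into the source parameters.

    `source`, `driving`, `reference` are dicts of numpy arrays; keys present in both `driving`
    and `reference` are treated as driving-type parameters (an assumption for illustration).
    """
    fused = dict(source)
    for key in driving:
        if key in reference and key in source:
            delta = driving[key] - reference[key]       # parameter variation
            fused[key] = source[key] + weight * delta   # weighted fusion with the source parameter
    return fused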
Fig. 3A schematically illustrates a flow diagram for generating an initial reconstructed image according to an embodiment of the disclosure.
As shown in fig. 3A, the driving image 301 and the original image 302 may be input to the feature extraction network M1, respectively, and the driving parameters 303 and the three-dimensional parameters 304 of the object to be reconstructed are output. Based on the driving parameters 303 and the three-dimensional parameters 304 of the object to be reconstructed, fused three-dimensional parameters 305 are obtained. The fused three-dimensional parameters 305 are input to the renderer M2, which outputs an initial reconstructed image 306. The driving image 301 is denoted as I_d in FIG. 3A, the original image 302 is denoted as I_s, and the initial reconstructed image 306 is denoted as I_render.
According to an embodiment of the present disclosure, the driving parameters 303 may include expression parameters (β_d, expression), pose parameters (p_d, pose), visual parameters (τ_d, gaze), and the like. The three-dimensional parameters 304 of the object to be reconstructed may include identity parameters (α_s, identity), illumination parameters (γ_s, lighting), texture parameters, and the like. The feature extraction network M1 may include a 50-layer deep residual network (ResNet-50). The rendering may be performed by the renderer M2.
In accordance with an embodiment of the present disclosure, before operation S230 shown in fig. 2, the three-dimensional object reconstruction method may further include the following operations: fusing the initial reconstructed image and the original image to obtain a first fused image; and obtaining the optical flow information based on the first fused image and the fused three-dimensional parameters.
According to the embodiment of the disclosure, the initial reconstructed image and the original image may be fused by channel to obtain the first fused image.
According to embodiments of the present disclosure, a first feature may be extracted from the initial reconstructed image. For example, the first feature may include at least one of: image shape, image edges, image areas, image contours, image textures, basic properties of objects in the image, and the like. A second feature may be extracted from the original image. For example, the second feature may include at least one of: basic properties of the object to be reconstructed, image illumination properties, image texture, image shape, image edges, image areas, image contours, and the like. The first feature and the second feature are fused to obtain a first fused feature, and the first fused image is obtained based on the first fused feature.
According to an embodiment of the present disclosure, a first fused image and fused three-dimensional parameters are input into an optical flow prediction network, and optical flow information is output.
According to the embodiment of the disclosure, the original image is fused with the initial reconstructed image, and the optical flow information is then accurately determined in combination with the fused three-dimensional parameters, which improves the object reconstruction precision and can at least partially alleviate the problems of low reconstruction precision and poor reconstruction effect in key-point-based object reconstruction caused by key point detection errors.
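For illustration only, one hypothetical way to feed both the first fused image and the fused three-dimensional parameters into a flow-prediction network is to broadcast the parameter vector into spatial maps and concatenate it with the image channels, as sketched below; the architecture, channel counts and parameter dimension are assumptions and do not describe the actual network (e.g., FlowNet) referred to in the disclosure.

```python
import torch
import torch.nn as nn

class TinyFlowNet(nn.Module):
    """Minimal stand-in for the optical flow prediction network (hypothetical architecture)."""
    def __init__(self, param_dim=64):
        super().__init__()
        # 6 image channels (original + initial reconstruction) plus broadcast parameter channels.
        self.net = nn.Sequential(
            nn.Conv2d(6 + param_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # 2-channel flow field (dx, dy)
        )

    def forward(self, fused_image, fused_params):
        b, _, h, w = fused_image.shape
        # Broadcast the fused 3D parameter vector to a spatial map and concatenate by channel.
        param_map = fused_params.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.net(torch.cat([fused_image, param_map], dim=1))

# Usage sketch: a 6-channel first fused image and a 64-dimensional fused parameter vector.
flow = TinyFlowNet(param_dim=64)(torch.rand(1, 6, 256, 256), torch.rand(1, 64))  # (1, 2, 256, 256)
```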
According to an embodiment of the present disclosure, for operation S230 as shown in fig. 2, the operation may be performed by a flowchart as shown in fig. 3B.
FIG. 3B schematically illustrates a flow diagram for generating an optical flow reconstruction image in accordance with an embodiment of the present disclosure.
As shown in fig. 3B, the original image 302 and the initial reconstructed image 305 may be fused, for example, by channel-wise concatenation, to obtain a first fused image 306. The first fused image 306 and the fused three-dimensional parameters are input into the optical flow prediction network M3, which outputs optical flow information 307, and the optical flow information 307 is applied to the original image 302 to obtain an optical flow reconstructed image 308. The optical flow reconstructed image 308 is denoted as I_warp in FIG. 3B.
According to an embodiment of the present disclosure, for operation S240 as shown in fig. 2, generating a target reconstructed image based on the optical flow reconstructed image, the initial reconstructed image, and the original image may include: an initial target reconstructed image and a mask image are generated based on the optical flow reconstructed image, the initial reconstructed image, and the original image. A target reconstructed image is generated based on the mask image, the initial target reconstructed image, and the optical flow reconstructed image.
According to embodiments of the present disclosure, the initial target reconstructed image and the mask image may be generated by inputting the optical flow reconstructed image, the initial reconstructed image and the original image into a generative adversarial network.
According to embodiments of the present disclosure, the mask image may include a plurality of pixels. The pixel value of each pixel is used to characterize the difference between a pixel in the initial target reconstructed image and a pixel in the optical flow reconstructed image.
According to embodiments of the present disclosure, the mask image may have the same size as the initial target reconstructed image and the optical flow reconstructed image. The mask image may be used to indicate regions containing edges or occluded regions. When an original image containing the object to be reconstructed is reconstructed based on the driving parameters, an edge region may include a face edge, which may be missing from the reconstructed object in the initial target reconstructed image due to a head-turning motion. An occluded region may include the teeth inside the mouth and the like; as a closed mouth changes to an open mouth, the teeth and the like may be missing from the reconstructed object in the initial target reconstructed image.
According to embodiments of the present disclosure, the difference between a pixel in the initial target reconstructed image and a pixel in the optical flow reconstructed image may be determined from the pixel value of the corresponding mask pixel; for example, the pixel value may be a value between 0 and 1. The target reconstructed image is generated from the pixels of the initial target reconstructed image and the pixels of the optical flow reconstructed image according to these differences.
According to the embodiment of the disclosure, based on the mask image, information that is missing or distorted during optical flow processing can be completed, so that the reconstruction effect is good. In addition, by generating the mask image, the generative adversarial network used for reconstruction needs to learn fewer features and can reconstruct faster during training, thereby reducing computing resources and network loss.
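As a hypothetical sketch only, a generator that takes the 9-channel concatenated input and emits both an RGB image and a single-channel mask could look like the following; the architecture and activation choices are assumptions, not the disclosed network.

```python
import torch
import torch.nn as nn

class TinyGeneratorWithMask(nn.Module):
    """Minimal stand-in for the generator: 9-channel input, RGB image plus 1-channel mask out."""
    def __init__(self, in_channels=9):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 4, 3, padding=1),  # 3 image channels + 1 mask channel
        )

    def forward(self, x):
        out = self.backbone(x)
        init_target = torch.sigmoid(out[:, :3])   # initial target reconstructed image in [0, 1]
        mask = torch.sigmoid(out[:, 3:4])         # per-pixel values in [0, 1] used for blending
        return init_target, mask
```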
According to another embodiment of the present disclosure, optical flow reconstruction features, initial reconstruction features, and original features of the optical flow reconstruction image, the initial reconstruction image, and the original image, respectively, may be extracted. And fusing the optical flow reconstruction feature, the initial reconstruction feature and the original feature to obtain the target feature. And reconstructing an initial target reconstructed image according to the target characteristics. And carrying out masking operation on the initial target reconstruction image to determine a masking image.
For example, the masking operation may recalculate the value of each pixel in the initial target reconstructed image through a mask kernel, where the mask kernel characterizes the extent to which neighboring pixels affect the new pixel value, while the original pixels in the optical flow reconstructed image are weighted-averaged according to the weighting factors in the mask kernel.
According to the embodiment of the disclosure, the optical flow reconstructed image, the initial reconstructed image and the original image may be concatenated along the channel dimension and input into the generative adversarial network, which outputs the initial target reconstructed image and the mask image. The generative adversarial network may be pre-trained.
According to embodiments of the present disclosure, the degree of pixel matching between the mask image and the initial target reconstructed image may be determined from each pixel in the mask image and the pixels in the initial target reconstructed image. If the pixel matching degree is not smaller than a matching degree threshold, the initial target reconstructed image is determined as the target reconstructed image. If the pixel matching degree is smaller than the matching degree threshold, the optical flow reconstructed image is determined as the target reconstructed image. The matching degree threshold may be determined according to the reconstruction accuracy required in practical applications. According to the embodiment of the disclosure, generating the target reconstructed image based on the mask image on top of the initial target reconstructed image can improve both the object reconstruction precision and the efficiency of generating the target reconstructed image.
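Purely for illustration, the threshold decision above can be sketched as follows; treating the mean mask value as the pixel matching degree is an assumption made here for the sake of a runnable example, not the criterion defined by the disclosure.

```python
import numpy as np

def select_output(mask, init_target, flow_recon, threshold=0.5):
    """Pick the output image from a toy matching-degree criterion (assumed: mean mask value)."""
    matching_degree = float(mask.mean())
    return init_target if matching_degree >= threshold else flow_recon
```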
Fig. 3C schematically illustrates a flow diagram for generating a target reconstructed image according to an embodiment of the disclosure.
As shown in fig. 3C, the original image 302, the optical flow reconstructed image 308, and the initial reconstructed image 305 may be fused to obtain a second fused image 309. The second fused image 309 is input to the generative adversarial network M4, which outputs the initial target reconstructed image 310 and the mask image 311. Based on the mask image 311, the initial target reconstructed image 310, and the optical flow reconstructed image 308, a target reconstructed image 312 is generated. The initial target reconstructed image 310 is denoted as I_out in FIG. 3C, and the mask image 311 is denoted as Mask.
According to another embodiment of the present disclosure, generating an initial target reconstructed image and a mask image based on an optical flow reconstructed image, an initial reconstructed image, and an original image, includes: and fusing the optical flow reconstruction image, the initial reconstruction image and the original image to obtain a second fused image. An initial target reconstructed image and a mask image are generated based on the second fused image and the fused three-dimensional parameters.
According to the embodiment of the disclosure, the optical flow reconstruction image, the initial reconstruction image and the original image can be subjected to channel fusion to obtain a second fused image.
According to embodiments of the present disclosure, the second fused image and the fused three-dimensional parameters may be input into the generative adversarial network to generate the initial target reconstructed image and the mask image.
According to the embodiment of the disclosure, reconstructing the image again to obtain the initial target reconstructed image achieves a further refined reconstruction of the object, and generating the mask image at the same time facilitates a still finer reconstruction of the object.
According to another embodiment of the present disclosure, the driving parameters and the three-dimensional parameters of the object to be reconstructed may be fused to obtain a fused three-dimensional parameter Z_des. The fused three-dimensional parameter Z_des and the second fused image are input into the generative adversarial network simultaneously, and the initial target reconstructed image and the mask image are output.
According to an embodiment of the present disclosure, generating the target reconstructed image based on the mask image, the initial target reconstructed image, and the optical flow reconstructed image may further include: for each pixel in the mask image, a first pixel is determined from the initial target reconstructed image that matches the pixel, and a second pixel is determined from the optical flow reconstructed image that matches the pixel. And fusing the first pixel and the second pixel based on the pixel value of the pixel to obtain a target pixel. A target reconstructed image is generated based on the plurality of target pixels.
According to the embodiment of the disclosure, an image matching algorithm may be used: a template of a preset size is overlaid on the mask image and translated, and the mask sub-image covered by the template is determined. A first similarity is determined based on the gray-level differences between the template and the corresponding positions of the mask sub-image covered by the template. The template of the preset size is likewise overlaid on the initial target reconstructed image and translated, and the initial target reconstructed sub-image covered by the template is determined. A second similarity is determined based on the gray-level differences between the template and the corresponding positions of the initial target reconstructed sub-image covered by the template. A first pixel matching the pixel is determined from the initial target reconstructed image based on the first similarity and the second similarity.
According to the embodiment of the present disclosure, in the case where it is determined that the first similarity and the second similarity are the same, the initial target reconstructed sub-image corresponding to the second similarity may be determined as an image similar to the mask sub-image corresponding to the first similarity. A first pixel is determined based on pixels in the initial target reconstructed sub-image.
According to embodiments of the present disclosure, a template of a predetermined size may be superimposed on the optical flow reconstruction image and translated to determine a template-covered optical flow reconstruction sub-image. And determining a third similarity based on the template and the gray level difference value of the corresponding position of the optical flow reconstruction sub-image covered by the template. Based on the first similarity and the third similarity, a second pixel matching the pixel is determined from the optical flow reconstruction image.
According to the embodiment of the disclosure, when it is determined that the first similarity and the third similarity are the same, the optical flow reconstruction sub-image corresponding to the third similarity may be determined as an image similar to the mask sub-image corresponding to the first similarity. A second pixel is determined based on the pixels in the optical flow reconstruction sub-image.
According to an embodiment of the disclosure, the first pixel may include a plurality of pixels, and the position of each pixel in the first pixel may be determined according to the position of the initial target reconstructed sub-image in the initial target reconstructed image. The second pixel may include a plurality of pixels, and the position of each pixel in the second pixel may be determined according to the position of the optical flow reconstruction sub-image in the optical flow reconstructed image. The target pixels may include first target pixels and second target pixels. A first pixel and a second pixel at the same position may be fused to obtain a first target pixel; a first pixel or a second pixel at a position with no counterpart is determined as a second target pixel.
According to embodiments of the present disclosure, a target reconstructed image may be generated from a first target pixel and a second target pixel.
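For illustration only, the sliding-template matching by gray-level differences described above can be sketched as follows; using the sum of absolute gray-level differences as the (dis)similarity score is an assumption of one simple realization.

```python
import numpy as np

def match_template(gray_image, template):
    """Slide `template` over `gray_image` (both 2-D grayscale arrays) and return the top-left
    corner with the smallest sum of absolute gray-level differences."""
    ih, iw = gray_image.shape
    th, tw = template.shape
    best_pos, best_score = (0, 0), np.inf
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = gray_image[y:y + th, x:x + tw]
            score = np.abs(patch - template).sum()   # gray-level difference at each covered position
            if score < best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```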
According to embodiments of the present disclosure, for each pixel in the mask image, a first pixel that matches the pixel may be determined from the initial target reconstructed image and a second pixel that matches the pixel may be determined from the optical flow reconstructed image based on the image coordinates of the pixel. And carrying out weighted summation on the pixel value of the first pixel and the pixel value of the second pixel based on the pixel value of the pixel to obtain the target pixel. A target reconstructed image is generated based on the plurality of target pixels.
According to embodiments of the present disclosure, the pixel values of the pixels in the mask image may be numbers between 0 and 1. Performing weighted summation of the pixel value of the first pixel and the pixel value of the second pixel based on the pixel value of the mask pixel may include computing: target pixel value = W1 × m × (pixel value of the first pixel) + W2 × (1 − m) × (pixel value of the second pixel), where m is the pixel value of the corresponding pixel in the mask image, and W1 and W2 are preset weights.
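The per-pixel weighted fusion above can be written compactly as follows; this is a minimal sketch assuming array-shaped images and a broadcastable mask, with names chosen for illustration.

```python
import numpy as np

def blend_with_mask(mask, init_target, flow_recon, w1=1.0, w2=1.0):
    """Per-pixel fusion: target = w1 * m * init_target + w2 * (1 - m) * flow_recon.

    `mask` has values in [0, 1] and broadcasts against the image arrays
    (e.g., mask (H, W, 1) with images (H, W, 3))."""
    return w1 * mask * init_target + w2 * (1.0 - mask) * flow_recon
```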
According to the embodiment of the disclosure, the pixel values of the pixels in the mask image can be used to determine the incomplete or distorted areas of the initial target reconstructed image relative to the target reconstructed image, and the optical flow reconstructed image can be used to further refine the initial target reconstructed image. For example, the mask image, the initial target reconstructed image and the optical flow reconstructed image can be used to complete the texture information inside the mouth, thereby improving the reconstruction effect of the target reconstructed image.
According to the embodiment of the disclosure, based on the target pixel obtained by fusing the first pixel and the second pixel, the generated target reconstructed image not only comprises the pixel characteristics of the initial target reconstructed image, but also comprises the pixel characteristics of the optical flow reconstructed image, so that the reconstruction accuracy is improved, and meanwhile, the processing operation is simple and the efficiency is high.
Fig. 4 schematically illustrates a flow chart of a three-dimensional object reconstruction method according to another embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S401 to S409.
In operation S401, an original image is acquired.
According to an embodiment of the present disclosure, the original image contains a human face.
In operation S402, three-dimensional parameters of a face are extracted.
According to the embodiment of the disclosure, the face three-dimensional parameters can be extracted from the obtained original image containing the face, and the face three-dimensional parameters of the original image are obtained.
In operation S403, a driving video is acquired.
In operation S404, a face driving parameter sequence is extracted.
According to the embodiment of the disclosure, the face driving parameters can be extracted from each frame image of the acquired driving video containing a face, and the face driving parameter sequence is obtained.
In operation S405, reference face driving parameters are selected from the face driving parameter sequence, and the parameter variations between the reference face driving parameters and the other face driving parameters are determined.
In operation S406, the parameter variation is fused to the face three-dimensional parameters of the original image, to obtain the face fused three-dimensional parameters.
In operation S407, an initial reconstructed face is obtained based on the face fusion three-dimensional parameters.
In operation S408, the original image, the three-dimensional parameters of the face fusion, and the initial reconstructed face are input into the optical flow prediction network, optical flow information is output, and the optical flow information is applied to the original image to obtain an optical flow reconstructed face image.
In operation S409, the original image, the optical flow reconstructed face image, the face fusion three-dimensional parameters, and the initial reconstructed face are input into the generative adversarial network, and the target reconstructed face is output.
According to an embodiment of the present disclosure, the face three-dimensional parameters may be an identification parameter of the face to be reconstructed, a background parameter of the face to be reconstructed, a texture parameter of the face to be reconstructed, and the like. The face driving parameters may be expression parameters of the driving face, pose parameters of the driving face, viewpoint parameters of the driving face, and the like.
According to the embodiment of the disclosure, the reference face driving parameters may be selected from the face driving parameter sequence by taking a frame of face image with a neutral pose and expression as the reference object and selecting the face driving parameters of that reference object from the sequence. Screening the frame with a neutral pose and expression may include randomly selecting one frame from the driving video as an initial image, extracting image features of the initial image and of the other images except the initial image, calculating the feature similarity between the image features of the initial image and those of the other images, and taking the frame with the maximum feature similarity as the face image with a neutral pose and expression.
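A sketch of this screening step, assuming frame-level features have already been extracted by some image feature network, and taking cosine similarity as one plausible choice of feature similarity; the function name and tensor layout are illustrative:

```python
import torch
import torch.nn.functional as F

def pick_reference_frame_index(frame_features: torch.Tensor) -> int:
    # frame_features: (N, D), one feature vector per frame of the driving video.
    n = frame_features.shape[0]
    init_idx = int(torch.randint(0, n, (1,)))            # randomly chosen initial frame
    init_feat = frame_features[init_idx].unsqueeze(0)    # (1, D)
    sims = F.cosine_similarity(init_feat, frame_features, dim=1)  # (N,)
    sims[init_idx] = float("-inf")                       # ignore the initial frame itself
    # The frame most similar to the randomly chosen frame is taken as the
    # neutral pose/expression reference, as described above.
    return int(torch.argmax(sims))
```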
According to the embodiment of the disclosure, by determining the parameter variation between the face driving parameters of the reference object and the face driving parameters of the other frames in the driving video, the parameter variation can be fused to the face three-dimensional parameters of the face to be reconstructed, so as to obtain the face fusion three-dimensional parameters.
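One plausible reading of this fusion step, assuming dictionary-style parameter sets and additive fusion of the parameter variation (the disclosure does not fix the fusion operator); key names are illustrative:

```python
def fuse_parameter_variation(face_params: dict,
                             driving_params: dict,
                             reference_params: dict,
                             driven_keys=("expression", "pose")) -> dict:
    # face_params: three-dimensional parameters of the face to be reconstructed.
    # driving_params / reference_params: face driving parameters of the current
    # driving frame and of the reference (neutral) frame.
    fused = dict(face_params)
    for key in driven_keys:
        # Only the variation relative to the reference frame is applied, so the
        # identification information of the driving face is not carried over.
        fused[key] = face_params[key] + (driving_params[key] - reference_params[key])
    return fused
```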
According to the embodiment of the disclosure, the face is preliminarily reconstructed by using the face fusion three-dimensional parameters obtained from the face driving parameters and the face three-dimensional parameters of the face to be reconstructed, so that the face driving parameters can be preliminarily fused into the face three-dimensional parameters of the face to be reconstructed. The face is reconstructed by using the optical flow, so that the motion optical flow of the reconstructed face can be applied to the face to be reconstructed. The image is then reconstructed again by using the original image, the image after optical flow processing, the face fusion three-dimensional parameters and the initial reconstructed face to generate the target reconstructed image, which can improve the three-dimensional face reconstruction effect: the information represented by the face driving parameters is accurately fused into the face to be reconstructed while the identification information of the face to be reconstructed is maintained. In addition, by fusing the parameter variation into the face three-dimensional parameters of the face to be reconstructed, the identification information of the driving face can be effectively prevented from being reconstructed into the target reconstructed face, and the reconstruction effect is good.
Fig. 5 schematically illustrates a structural schematic of an object reconstruction model according to another embodiment of the present disclosure.
As shown in fig. 5, the three-dimensional object reconstruction method may be performed using the object reconstruction model as shown in fig. 5. The object reconstruction model may include a feature extraction network M1, a renderer M2, an optical flow prediction network M3, and a generation countermeasure network M4.
As shown in fig. 5, the driving image 501 and the original image 502 may be respectively input into the feature extraction network M1, and the driving parameters 503 and the three-dimensional parameters 504 of the object to be reconstructed are output. The driving parameters 503 and the three-dimensional parameters 504 of the object to be reconstructed are input into the renderer M2, and an initial reconstructed image 505 is output. The driving parameters 503 and the three-dimensional parameters 504 of the object to be reconstructed may be fused to obtain fused three-dimensional parameters 506. The original image 502, the initial reconstructed image 505, and the fused three-dimensional parameters 506 may be input into the optical flow prediction network M3, optical flow information 507 is output, and the optical flow is applied to the original image 502 to obtain an optical flow reconstructed image 508. The original image 502, the optical flow reconstructed image 508, the initial reconstructed image 505, and the fused three-dimensional parameters 506 may be input into the generative adversarial network M4, and an initial target reconstructed image 509 and a mask image 510 are output. Based on the mask image 510, the initial target reconstructed image 509, and the optical flow reconstructed image 508, a target reconstructed image 511 is generated.
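The data flow of fig. 5 can be summarized in the following hedged sketch. The four networks are passed in as callables, and every interface (argument order, the `fuse_params` fusion helper, and the `apply_flow` warping helper, a concrete version of which is sketched in the training section below) is an assumption rather than the actual model API:

```python
import torch

@torch.no_grad()
def object_reconstruction_forward(feature_net, renderer, flow_net, gan,
                                  fuse_params, apply_flow,
                                  driving_image, original_image):
    driving_params = feature_net(driving_image)     # M1: driving parameters 503
    source_params = feature_net(original_image)     # M1: three-dimensional parameters 504
    fused_params = fuse_params(driving_params, source_params)  # fused three-dimensional parameters 506

    initial_recon = renderer(driving_params, source_params)    # M2: initial reconstructed image 505

    flow = flow_net(torch.cat([original_image, initial_recon], dim=1),
                    fused_params)                   # M3: optical flow information 507
    flow_recon = apply_flow(original_image, flow)   # optical flow reconstructed image 508

    initial_target, mask = gan(
        torch.cat([original_image, flow_recon, initial_recon], dim=1),
        fused_params)                               # M4: images 509 and 510
    # Mask-weighted blend of the two reconstructions gives the target image 511.
    return mask * initial_target + (1.0 - mask) * flow_recon
```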
According to an embodiment of the present disclosure, for an object reconstruction model as shown in fig. 5, it may be trained using a training method of a deep learning model as shown in fig. 6.
Fig. 6 schematically illustrates a flowchart of a training method of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 6, the method includes operations S610 to S650.
In operation S610, a sample fusion three-dimensional parameter is obtained based on the sample driving parameter and the sample three-dimensional parameter of the sample object.
Wherein the sample three-dimensional parameter of the sample object is derived based on the sample raw image containing the sample object.
According to embodiments of the present disclosure, the sample driving parameters may be derived based on the sample driving image. The sample drive image contains a sample drive object.
According to an embodiment of the present disclosure, the sample driving parameter may be further obtained based on a sample driving video, where each frame of image of the sample driving video contains a sample driving object.
According to embodiments of the present disclosure, three-dimensional parameters of a sample object may be used to characterize identification information of the sample object.
According to embodiments of the present disclosure, the object identities of the sample object and the sample driven object are the same.
According to embodiments of the present disclosure, sample fusion three-dimensional parameters may be used to characterize identification information of a sample object and information characterized by sample driving parameters. The sample driving parameters can be fused to the three-dimensional parameters of the sample object to obtain the sample fused three-dimensional parameters.
In operation S620, a sample initial reconstructed image is obtained based on the sample fusion three-dimensional parameters, wherein the sample initial reconstructed image includes a sample reconstructed object.
According to the embodiment of the disclosure, image rendering can be performed according to the sample fusion three-dimensional parameters, and an initial reconstructed image of the sample is obtained. The rendering mode may be a rendering mode based on a renderer or a rendering mode based on deep learning.
It should be noted that, for a sample fusion three-dimensional parameter sequence, a sample initial reconstructed image sequence corresponding one-to-one to the sample fusion three-dimensional parameter sequence can be obtained.
In operation S630, a sample optical flow reconstructed image is obtained based on the sample optical flow information between the sample reconstructed object and the sample object, and the sample original image.
According to an embodiment of the present disclosure, sample optical flow information is used to characterize a sample offset between a pixel of a sample reconstruction object and a pixel of the sample object.
According to embodiments of the present disclosure, sample optical flow information may be determined from differences between pixels of a sample reconstruction object and pixels of a sample object.
According to an embodiment of the present disclosure, an image containing the sample reconstructed object and the sample original image containing the sample object may be concatenated along the channel dimension and input into the optical flow prediction network, and the optical flow information is output. For example, the optical flow prediction network may include FlowNet 2.0 or FlowNet 1.0.
According to the embodiment of the disclosure, the original sample image can be reconstructed according to the sample optical flow information, and a sample optical flow reconstructed image is obtained.
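A minimal warping sketch for this step, assuming the predicted optical flow is a dense per-pixel offset expressed in normalized [-1, 1] image coordinates (a common convention, but an assumption here), applied with bilinear sampling:

```python
import torch
import torch.nn.functional as F

def apply_flow(image: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # image: (B, C, H, W); flow: (B, 2, H, W), per-pixel (x, y) offsets in
    # normalized [-1, 1] coordinates.
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=image.device),
        torch.linspace(-1.0, 1.0, w, device=image.device),
        indexing="ij")
    base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)  # (B, H, W, 2)
    sample_grid = base_grid + flow.permute(0, 2, 3, 1)  # shift each pixel by its flow
    # Bilinearly sample the original image at the shifted positions to obtain
    # the optical flow reconstructed image.
    return F.grid_sample(image, sample_grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```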
In operation S640, a sample target reconstructed image is generated based on the sample optical flow reconstructed image, the sample initial reconstructed image, and the sample original image.
According to an embodiment of the present disclosure, the sample target reconstructed image comprises a sample target reconstructed object comprising identification information of the sample object and information characterized by sample driving parameters.
According to the embodiment of the disclosure, the sample optical flow reconstructed image, the sample initial reconstructed image and the sample original image can be concatenated along the channel dimension and then input into an image generator to generate the sample target reconstructed image.
In operation S650, the deep learning model is trained based on the sample target reconstructed image and the sample driving parameters, resulting in a trained deep learning model.
According to the embodiment of the disclosure, the sample three-dimensional parameters of the sample target reconstructed image can be determined according to the sample target reconstructed image; and determining a parameter loss value based on the sample three-dimensional parameter and the sample driving parameter, and adjusting parameters of the deep learning model by minimizing the parameter loss value to obtain a trained deep learning model.
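A hedged sketch of this parameter-level supervision, assuming the feature extraction network is reused to re-estimate the three-dimensional parameters of the sample target reconstructed image, that parameters are dictionaries of tensors, and that an L1 penalty is used (the disclosure does not fix the loss form); all names are illustrative:

```python
import torch.nn.functional as F

def parameter_loss(feature_net,
                   sample_target_recon,
                   sample_driving_params: dict,
                   supervised_keys=("expression", "pose")):
    # Re-estimate the three-dimensional parameters of the generated sample
    # target reconstructed image and compare the driving-related sub-parameters
    # with the sample driving parameters that were fused in.
    recon_params = feature_net(sample_target_recon)
    return sum(F.l1_loss(recon_params[key], sample_driving_params[key])
               for key in supervised_keys)

# A typical optimization step (optimizer and networks assumed to exist):
#   loss = parameter_loss(feature_net, sample_target_recon, sample_driving_params)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```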
According to the embodiment of the disclosure, the sample driving parameters are fused to the three-dimensional parameters of the sample object, and the image is preliminarily reconstructed; the sample optical flow reconstruction image is utilized, and the motion optical flow of the sample reconstruction object can be applied to the sample original image; and the sample optical flow reconstruction image, the sample initial reconstruction image and the sample original image are utilized to reconstruct the image again to generate a sample target reconstruction image, and the trained deep learning model obtained based on the sample target reconstruction image and the sample driving parameters has high accuracy and good reconstruction effect.
According to an embodiment of the present disclosure, generating a sample target reconstructed image based on a sample optical flow reconstructed image, a sample initial reconstructed image, and a sample original image, includes: and generating a sample initial target reconstructed image and a sample mask image based on the sample optical flow reconstructed image, the sample initial reconstructed image and the sample original image. The sample mask image includes a plurality of pixels, the pixel value of each pixel being used to characterize a difference between a pixel in the sample initial target reconstructed image and a pixel in the reconstructed image after sample optical flow processing. And generating a sample target reconstructed image based on the sample mask image, the sample initial target reconstructed image and the sample optical flow reconstructed image.
According to the embodiment of the disclosure, the sample optical flow reconstructed image, the sample initial reconstructed image, the sample original image and the sample fusion three-dimensional parameters can be input into the generative adversarial network of the deep learning model, and the sample initial target reconstructed image and the sample mask image are output.
According to embodiments of the present disclosure, for each sample pixel in the sample mask image, a first sample pixel that matches the sample pixel may be determined from the sample initial target reconstructed image, and a second sample pixel that matches the sample pixel may be determined from the sample optical flow reconstructed image. And fusing the first sample pixel and the second sample pixel based on the sample pixel value of the sample pixel to obtain a target sample pixel. A sample target reconstructed image is generated based on the plurality of target sample pixels.
According to the embodiment of the disclosure, the sample target reconstructed image is generated based on the sample mask image on the basis of generating the sample initial target reconstructed image, so that model training difficulty can be reduced, and model training efficiency can be improved.
According to an embodiment of the present disclosure, training a deep learning model based on a sample target reconstructed image and sample driving parameters to obtain a trained deep learning model includes: obtaining a first loss value based on the sample target reconstructed image and the sample driving image corresponding to the sample driving parameters; obtaining a second loss value based on the sample optical flow reconstructed image and the sample driving image; obtaining a third loss value based on the sample mask image, the sample original image and the sample driving image; and training the deep learning model based on the first loss value, the second loss value and the third loss value to obtain the trained deep learning model.
According to embodiments of the present disclosure, in training the deep learning model, a first loss value between the sample target reconstructed image and the sample driving image corresponding to the sample driving parameters may be calculated using a loss function, and the parameters of the generator in the generative adversarial network are optimized by minimizing the first loss value. For example, the loss function may include at least one of a pixel-wise loss, a mean square error (MSE) loss, an L2 loss, an L1 loss, and a cross-entropy loss, which is not specifically limited herein.
A second loss value between the sample optical flow reconstructed image and the sample driving image may be calculated using the loss function, and the parameters of the optical flow prediction network are optimized by minimizing the second loss value.
Object detection can be carried out on the sample original image and the sample driving image respectively, and a first sample background area and a second sample background area outside the object frames are obtained by segmentation. Features of the first sample background area and the second sample background area are extracted and then fused to obtain a sample fusion feature, a third loss value is determined based on the sample fusion feature and the sample mask image, and the parameters of the generator in the generative adversarial network are optimized by minimizing the third loss value.
According to the embodiment of the disclosure, in the process of training the deep learning model, the first loss value, the second loss value and the third loss value can be weighted and averaged to obtain an average loss value, and parameters of the deep learning model are adjusted by minimizing the average loss value.
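A sketch of how the three loss terms might be combined, assuming L1 reconstruction penalties for the first two terms and taking the third (mask/background) term as an externally computed scalar; the weights are illustrative:

```python
import torch.nn.functional as F

def combined_loss(sample_target_recon,
                  sample_flow_recon,
                  sample_driving_image,
                  third_loss,
                  weights=(1.0, 1.0, 1.0)):
    w1, w2, w3 = weights
    first_loss = F.l1_loss(sample_target_recon, sample_driving_image)   # supervises the generator
    second_loss = F.l1_loss(sample_flow_recon, sample_driving_image)    # supervises the optical flow network
    # Weighted average of the three loss terms, as described above.
    return (w1 * first_loss + w2 * second_loss + w3 * third_loss) / (w1 + w2 + w3)
```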
Based on the training method provided by the embodiment of the disclosure, the processing efficiency and the reconstruction accuracy of the deep learning model can be improved while the training difficulty is reduced.
According to the embodiment of the disclosure, two frames of images can be randomly extracted from the same video and respectively used as the sample driving image and the sample original image. A parameter extraction network is used to extract the sample driving parameters of the sample driving image, for example, parameter information such as expression (β_d, exp), pose (p_d, pose) and vision (τ_d, size); and to extract the sample three-dimensional parameters of the sample original image, for example, parameter information such as identification (α_s, identity), illumination (γ_s, lighting) and texture (α_s, texture). The sample driving parameters of the sample driving image are fused to the sample three-dimensional parameters of the sample original image to obtain the sample fusion three-dimensional parameters. The sample driving parameters of the sample driving image and the sample three-dimensional parameters of the sample original image are input into the renderer, and the sample initial reconstructed image is rendered.
After the sample original image and the sample initial reconstructed image are concatenated along the channel dimension, the motion optical flow is predicted by the optical flow prediction network in combination with the sample fusion three-dimensional parameters, and the optical flow is applied to the sample original image to obtain the sample optical flow reconstructed image.
After the sample original image, the sample optical flow reconstructed image and the sample initial reconstructed image are concatenated along the channel dimension, the sample initial target reconstructed image and the sample mask image are generated by the generative adversarial network in combination with the sample fusion three-dimensional parameters, and the sample target reconstructed image is generated based on the sample mask image, the sample optical flow reconstructed image and the sample initial target reconstructed image.
According to the embodiment of the disclosure, by fusing the sample driving parameters of the sample driving image to the sample three-dimensional parameters of the sample original image, parameter confusion caused by key-point motion coupling pose and expression changes can be avoided, and erroneous driving effects can be avoided. By inputting the sample driving parameters of the sample driving image and the sample three-dimensional parameters of the sample original image into the renderer to render the sample initial reconstructed image, reconstruction errors caused by a three-dimensional expression driving technique can be avoided, and the problem that information not present in the image cannot be rendered can be alleviated; for example, the texture information inside the mouth cannot be rendered from a closed-mouth image.
The deep learning model obtained through training with this training framework can reconstruct a single image or a plurality of images, with high reconstruction accuracy.
According to the embodiment of the disclosure, the sample initial target reconstructed image and the sample original image may be fused to obtain a sample first fused image, and the sample optical flow information is obtained based on the sample first fused image and the sample fusion three-dimensional parameters.
According to an embodiment of the present disclosure, generating a sample initial target reconstructed image and a sample mask image based on a sample optical flow reconstructed image, a sample initial reconstructed image, and a sample original image includes: and fusing the sample optical flow reconstruction image, the sample initial reconstruction image and the sample original image to obtain a sample second fused image. And generating a sample initial target reconstruction image and a sample mask image based on the sample second fused image and the sample fusion three-dimensional parameters.
According to an embodiment of the present disclosure, obtaining the sample fusion three-dimensional parameters based on the sample driving parameters and the three-dimensional parameters of the sample object to be reconstructed includes: extracting three-dimensional parameters of the sample driving object in the sample driving image; determining the sample driving parameters from the three-dimensional parameters of the sample driving object; and updating the sample target three-dimensional sub-parameter in the three-dimensional parameters of the sample object to be reconstructed by using the sample driving parameters to obtain the sample fusion three-dimensional parameters.
According to an embodiment of the present disclosure, the sample target three-dimensional sub-parameters include parameters consistent with the parameter type of the sample driving parameters.
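A minimal sketch of this update-style fusion, assuming dictionary-style parameter sets keyed by parameter type; key names are illustrative:

```python
def fuse_by_update(sample_object_params: dict,
                   sample_driving_params: dict) -> dict:
    # Replace each sub-parameter of the sample object whose parameter type also
    # appears among the sample driving parameters (e.g. expression, pose) with
    # the driving value; identity, texture and lighting sub-parameters are kept.
    fused = dict(sample_object_params)
    for key, value in sample_driving_params.items():
        if key in fused:
            fused[key] = value
    return fused
```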
According to an embodiment of the present disclosure, obtaining the sample fusion three-dimensional parameters based on the sample driving parameters and the three-dimensional parameters of the sample object to be reconstructed includes: determining a sample parameter variation between the sample reference object and the sample driving object; and obtaining the sample fusion three-dimensional parameters based on the sample parameter variation and the three-dimensional parameters of the sample object to be reconstructed.
According to an embodiment of the present disclosure, the sample driving object is used for driving reconstruction of an object to be reconstructed of a sample, the object identifiers of the sample reference object and the sample driving object are the same, and the sample driving parameters of the sample reference object and the sample driving object are different.
Based on the three-dimensional object reconstruction method provided by the disclosure, the disclosure also provides a three-dimensional object reconstruction apparatus. The apparatus will be described in detail below in connection with fig. 7.
Fig. 7 schematically illustrates a block diagram of a three-dimensional object reconstruction apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 of this embodiment includes a first fusion module 710, a first reconstruction module 720, a second reconstruction module 730, and a third reconstruction module 740.
The first fusion module 710 is configured to obtain a fused three-dimensional parameter based on the driving parameter and the three-dimensional parameter of the object to be reconstructed, where the three-dimensional parameter of the object to be reconstructed is obtained based on the original image including the object to be reconstructed.
The first reconstruction module 720 is configured to obtain an initial reconstructed image based on the fused three-dimensional parameters, where the initial reconstructed image includes a reconstructed object.
The second reconstruction module 730 is configured to obtain an optical flow reconstructed image based on optical flow information between the reconstructed object and the object to be reconstructed and the original image, where the optical flow information is used to characterize an offset between a pixel of the reconstructed object and a pixel of the object to be reconstructed.
The third reconstruction module 740 is configured to generate a target reconstructed image based on the optical flow reconstructed image, the initial reconstructed image, and the original image, where the target reconstructed image includes a target reconstructed object, and the target reconstructed object includes identification information of the object to be reconstructed and information characterized by the driving parameters.
According to an embodiment of the present disclosure, the third reconstruction module 740 includes: a first generation unit and a second generation unit.
The first generation unit is used for generating an initial target reconstruction image and a mask image based on the optical flow reconstruction image, the initial reconstruction image and the original image, wherein the mask image comprises a plurality of pixels, and pixel values of the pixels are used for representing differences between pixels in the initial target reconstruction image and pixels in the reconstructed image after optical flow processing.
The second generation unit is used for generating a target reconstruction image based on the mask image, the initial target reconstruction image and the optical flow reconstruction image.
According to an embodiment of the present disclosure, the second generating unit includes: a first determination subunit, a first fusion subunit, and a first generation subunit.
The first determination subunit is configured to determine, for each pixel in the mask image, a first pixel that matches the pixel from the initial target reconstructed image, and a second pixel that matches the pixel from the optical flow reconstructed image.
The first fusion subunit is configured to fuse the first pixel and the second pixel based on the pixel value of the pixel, so as to obtain a target pixel.
The first generation subunit is configured to generate the target reconstructed image based on the plurality of target pixels.
According to an embodiment of the present disclosure, the apparatus 700 further comprises: and the second fusion module and the information determination module.
The second fusion module is used for fusing the initial target reconstruction image and the original image to obtain a first fusion image.
The information determining module is used for obtaining optical flow information based on the first fused image and the fused three-dimensional parameters.
According to an embodiment of the present disclosure, a first generation unit includes: a second fusion subunit and a second generation subunit.
The second fusion subunit is used for fusing the optical flow reconstruction image, the initial reconstruction image and the original image to obtain a second fusion image.
The second generation subunit is configured to generate an initial target reconstructed image and a mask image based on the second fused image and the fused three-dimensional parameter.
According to an embodiment of the present disclosure, the first fusing module 710 includes: the device comprises an extraction unit, a first determination unit and an updating unit.
The extraction unit is used for extracting three-dimensional parameters of a driving object in the driving image.
The first determination unit is configured to determine a driving parameter from three-dimensional parameters of a driving object.
The updating unit is used for updating a target three-dimensional subparameter in the three-dimensional parameters of the object to be rebuilt by using the driving parameters to obtain a fused three-dimensional parameter, wherein the target three-dimensional subparameter comprises parameters consistent with the parameter types of the driving parameters.
According to an embodiment of the present disclosure, the first fusing module 710 includes: a second determination unit and a third determination unit.
The second determining unit is used for determining parameter variation between a reference object and a driving object, wherein the driving object is used for driving reconstruction of an object to be reconstructed, the object identifiers of the reference object and the driving object are the same, and the driving parameters of the reference object and the driving object are different.
The third determining unit is used for obtaining a fusion three-dimensional parameter based on the parameter variation and the three-dimensional parameter of the object to be reconstructed.
According to an embodiment of the present disclosure, any of the first fusion module 710, the first reconstruction module 720, the second reconstruction module 730, and the third reconstruction module 740 may be combined in one module to be implemented, or any of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. At least one of the first fusion module 710, the first reconstruction module 720, the second reconstruction module 730, and the third reconstruction module 740 according to embodiments of the present disclosure may be implemented, at least in part, as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, an Application Specific Integrated Circuit (ASIC), or as hardware or firmware in any other reasonable manner of integrating or packaging the circuitry, or as any one of or a suitable combination of any of the three. Alternatively, at least one of the first fusion module 710, the first reconstruction module 720, the second reconstruction module 730, and the third reconstruction module 740 may be at least partially implemented as computer program modules which, when executed, perform the corresponding functions.
Based on the training method of the deep learning model provided by the disclosure, the disclosure also provides a training device of the deep learning model. The device will be described in detail below in connection with fig. 8.
Fig. 8 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 8, the apparatus 800 of this embodiment includes a sample fusion module 810, a first sample reconstruction module 820, a second sample reconstruction module 830, a third sample reconstruction module 840, and a training module 850.
A sample fusion module 810, configured to obtain a sample fusion three-dimensional parameter based on the sample driving parameter and a sample three-dimensional parameter of the sample object, where the sample three-dimensional parameter of the sample object is obtained based on a sample original image including the sample object;
a first sample reconstruction module 820, configured to obtain a sample initial reconstructed image based on the sample fusion three-dimensional parameter, where the sample initial reconstructed image includes a sample reconstructed object;
a second sample reconstruction module 830, configured to obtain a sample optical flow reconstruction image based on sample optical flow information between a sample reconstruction object and a sample original image, where the sample optical flow information is used to characterize a sample offset between a pixel of the sample reconstruction object and a pixel of the sample object;
A third sample reconstruction module 840, configured to generate a sample target reconstructed image based on the sample optical flow reconstructed image, the sample initial reconstructed image, and the sample original image, where the sample target reconstructed image includes a sample target reconstructed object, and the sample target reconstructed object includes identification information of the sample object and information characterized by sample driving parameters; and
the training module 850 is configured to train the deep learning model based on the sample target reconstructed image and the sample driving parameters, and obtain a trained deep learning model.
According to an embodiment of the present disclosure, the third sample reconstruction module 840 includes: a first sample generation unit and a second sample generation unit.
The first sample generation unit is used for generating a sample initial target reconstruction image and a sample mask image based on the sample optical flow reconstruction image, the sample initial reconstruction image and the sample original image, wherein the sample mask image comprises a plurality of pixels, and the pixel value of each pixel is used for representing the difference between the pixel in the sample initial target reconstruction image and the pixel in the reconstructed image after sample optical flow processing.
The second sample generation unit is used for generating a sample target reconstruction image based on the sample mask image, the sample initial target reconstruction image and the sample optical flow reconstruction image.
According to an embodiment of the present disclosure, training module 850 includes: the model training device comprises a first loss determining unit, a second loss determining unit, a third loss determining unit and a model training unit.
The first loss determination unit is used for obtaining a first loss value based on the sample target reconstructed image and the sample driving image corresponding to the sample driving parameter.
The second loss determination unit is used for obtaining a second loss value based on the sample optical flow reconstructed image and the sample driving image.
The third loss determination unit is used for obtaining a third loss value based on the sample mask image, the sample original image and the sample driving image.
The model training unit is used for training the deep learning model based on the first loss value, the second loss value and the third loss value to obtain a trained deep learning model.
According to an embodiment of the present disclosure, the first sample generation unit includes: a sample fusion subunit and a sample generation subunit.
The sample fusion subunit is used for fusing the sample optical flow reconstruction image, the sample initial reconstruction image and the sample original image to obtain a sample second fusion image.
The sample generation subunit is used for generating a sample initial target reconstruction image and a sample mask image based on the sample second fused image and the sample fused three-dimensional parameter.
According to an embodiment of the present disclosure, the sample fusion module 810 includes: a sample extraction unit, a sample determination unit and a sample update unit.
The sample extraction unit is used for extracting three-dimensional parameters of a sample driving object in the sample driving image.
The sample determination unit is used for determining a sample driving parameter from three-dimensional parameters of the sample driving object.
The sample updating unit is used for updating the sample target three-dimensional subparameter in the three-dimensional parameters of the sample object to be reconstructed by using the sample driving parameters to obtain the sample fusion three-dimensional parameters.
According to an embodiment of the present disclosure, the sample target three-dimensional sub-parameters include parameters consistent with the parameter type of the sample driving parameters.
According to an embodiment of the present disclosure, the sample fusion module 810 includes: a sample second determination unit and a sample third determination unit.
The sample second determining unit is used for determining a sample parameter variation between the sample reference object and the sample driving object.
The sample third determining unit is used for obtaining a sample fusion three-dimensional parameter based on the sample parameter variation and the three-dimensional parameter of the sample object to be reconstructed.
According to an embodiment of the present disclosure, the sample driving object is used for driving reconstruction of an object to be reconstructed of a sample, the object identifiers of the sample reference object and the sample driving object are the same, and the sample driving parameters of the sample reference object and the sample driving object are different.
According to embodiments of the present disclosure, any of the sample fusion module 810, the first sample reconstruction module 820, the second sample reconstruction module 830, the third sample reconstruction module 840, and the training module 850 may be combined in one module to be implemented, or any of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. At least one of the sample fusion module 810, the first sample reconstruction module 820, the second sample reconstruction module 830, the third sample reconstruction module 840, and the training module 850 according to embodiments of the present disclosure may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or as hardware or firmware in any other reasonable manner of integrating or packaging the circuitry, or as any one of or a suitable combination of any of the three. Alternatively, at least one of the sample fusion module 810, the first sample reconstruction module 820, the second sample reconstruction module 830, the third sample reconstruction module 840, and the training module 850 may be at least partially implemented as computer program modules that, when executed, perform the corresponding functions.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as in an embodiment of the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method as in an embodiment of the present disclosure.
According to an embodiment of the present disclosure, a computer program product comprising a computer program which, when executed by a processor, implements a method as an embodiment of the present disclosure.
Fig. 9 schematically illustrates a block diagram of an electronic device adapted to implement a three-dimensional object reconstruction method and/or a training method of a deep learning model, according to an embodiment of the disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to an input/output (I/O) interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, a three-dimensional object reconstruction method and/or a training method of a deep learning model. For example, in some embodiments, the three-dimensional object reconstruction method and/or the training method of the deep learning model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the three-dimensional object reconstruction method and/or the training method of the deep learning model described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the three-dimensional object reconstruction method and/or the training method of the deep learning model in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), systems On Chip (SOCs), complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (23)

1. A method of three-dimensional object reconstruction, comprising:
obtaining a fusion three-dimensional parameter based on a driving parameter and a three-dimensional parameter of an object to be reconstructed, wherein the three-dimensional parameter of the object to be reconstructed is obtained based on an original image containing the object to be reconstructed;
obtaining an initial reconstructed image based on the fusion three-dimensional parameters, wherein the initial reconstructed image comprises a reconstructed object;
obtaining an optical flow reconstruction image based on optical flow information between the reconstruction object and the object to be reconstructed and the original image, wherein the optical flow information is used for representing offset between pixels of the reconstruction object and pixels of the object to be reconstructed; and
Generating a target reconstruction image based on the optical flow reconstruction image, the initial reconstruction image and the original image, wherein the target reconstruction image comprises a target reconstruction object, and the target reconstruction object comprises identification information of the object to be reconstructed and information characterized by the driving parameters.
2. The method of claim 1, wherein the generating a target reconstructed image based on the optical flow reconstructed image, the initial reconstructed image, and the original image comprises:
generating an initial target reconstructed image and a mask image based on the optical flow reconstructed image, the initial reconstructed image, and the original image, wherein the mask image comprises a plurality of pixels, and pixel values of the pixels are used for representing differences between pixels in the initial target reconstructed image and pixels in the reconstructed image after optical flow processing; and
the target reconstructed image is generated based on the mask image, the initial target reconstructed image, and the optical flow reconstructed image.
3. The method of claim 2, wherein the generating the target reconstructed image based on the mask image, the initial target reconstructed image, and the optical flow reconstructed image comprises:
For each pixel in the mask image, determining a first pixel from the initial target reconstructed image that matches the pixel, and determining a second pixel from the optical flow reconstructed image that matches the pixel;
fusing the first pixel and the second pixel based on the pixel value of the pixel to obtain a target pixel;
the target reconstructed image is generated based on a plurality of the target pixels.
4. The method of claim 1, further comprising:
fusing the initial target reconstruction image and the original image to obtain a first fused image; and
and obtaining the optical flow information based on the first fused image and the fused three-dimensional parameter.
5. The method of claim 2, wherein the generating an initial target reconstructed image and a mask image based on the optical flow reconstructed image, the initial reconstructed image, and the original image comprises:
fusing the optical flow reconstruction image, the initial reconstruction image and the original image to obtain a second fused image; and
the initial target reconstructed image and the mask image are generated based on the second fused image and the fused three-dimensional parameters.
6. The method according to claim 1, wherein the obtaining the fused three-dimensional parameters based on the driving parameters and the three-dimensional parameters of the object to be reconstructed comprises:
extracting three-dimensional parameters of a driving object in a driving image;
determining the driving parameters from the three-dimensional parameters of the driving object; and
and updating a target three-dimensional subparameter in the three-dimensional parameters of the object to be rebuilt by using the driving parameters to obtain the fused three-dimensional parameters, wherein the target three-dimensional subparameter comprises parameters consistent with the parameter types of the driving parameters.
7. The method according to claim 1, wherein the obtaining the fused three-dimensional parameters based on the driving parameters and the three-dimensional parameters of the object to be reconstructed comprises:
determining parameter variation between a reference object and a driving object, wherein the driving object is used for driving reconstruction of the object to be reconstructed, the object identifiers of the reference object and the driving object are the same, and the driving parameters of the reference object and the driving object are different; and
and obtaining the fusion three-dimensional parameter based on the parameter variation and the three-dimensional parameter of the object to be reconstructed.
8. A training method of a deep learning model, comprising:
Obtaining a sample fusion three-dimensional parameter based on a sample driving parameter and a sample three-dimensional parameter of a sample object, wherein the sample three-dimensional parameter of the sample object is obtained based on a sample original image containing the sample object;
obtaining a sample initial reconstruction image based on the sample fusion three-dimensional parameters, wherein the sample initial reconstruction image comprises a sample reconstruction object;
obtaining a sample optical flow reconstruction image based on the sample optical flow information between the sample reconstruction object and the sample original image, wherein the sample optical flow information is used for representing a sample offset between a pixel of the sample reconstruction object and a pixel of the sample object;
generating a sample target reconstructed image based on the sample optical flow reconstructed image, the sample initial reconstructed image and the sample original image, wherein the sample target reconstructed image comprises a sample target reconstructed object, and the sample target reconstructed object comprises identification information of the sample object and information characterized by the sample driving parameters; and
training a deep learning model based on the sample target reconstructed image and the sample driving parameters to obtain a trained deep learning model.
9. The method of claim 8, wherein the generating a sample target reconstructed image based on the sample optical flow reconstructed image, the sample initial reconstructed image, and the sample original image comprises:
generating a sample initial target reconstructed image and a sample mask image based on the sample optical flow reconstructed image, the sample initial reconstructed image, and the sample original image, wherein the sample mask image comprises a plurality of pixels, and pixel values of each pixel are used for representing differences between pixels in the sample initial target reconstructed image and pixels in the reconstructed image after sample optical flow processing; and
the sample target reconstructed image is generated based on the sample mask image, the sample initial target reconstructed image, and the sample optical flow reconstructed image.
10. The method of claim 9, wherein the training a deep learning model based on the sample target reconstructed image and the sample driving parameters results in a trained deep learning model, comprising:
obtaining a first loss value based on the sample target reconstructed image and a sample driving image corresponding to the sample driving parameter;
obtaining a second loss value based on the sample optical flow reconstructed image and the sample driving image;
obtaining a third loss value based on the sample mask image, the sample original image and the sample driving image; and
training the deep learning model based on the first loss value, the second loss value, and the third loss value, resulting in the trained deep learning model.
11. A three-dimensional object reconstruction apparatus comprising:
the first fusion module is used for obtaining fusion three-dimensional parameters based on the driving parameters and the three-dimensional parameters of the object to be reconstructed, wherein the three-dimensional parameters of the object to be reconstructed are obtained based on an original image containing the object to be reconstructed;
the first reconstruction module is used for obtaining an initial reconstruction image based on the fusion three-dimensional parameters, wherein the initial reconstruction image comprises a reconstruction object;
the second reconstruction module is used for obtaining an optical flow reconstruction image based on optical flow information between the reconstruction object and the object to be reconstructed and the original image, wherein the optical flow information is used for representing the offset between the pixels of the reconstruction object and the pixels of the object to be reconstructed; and
the third reconstruction module is used for generating a target reconstruction image based on the optical flow reconstruction image, the initial reconstruction image and the original image, wherein the target reconstruction image comprises a target reconstruction object, and the target reconstruction object comprises identification information of the object to be reconstructed and information characterized by the driving parameters.
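The four claimed modules can be read as a single forward pipeline. The sketch below only shows the wiring; the constructor arguments stand in for whatever networks implement each module and are not taken from the disclosure.

```python
from torch import nn

class ThreeDObjectReconstructor(nn.Module):
    """Wiring of the four claimed modules; the submodules are placeholder networks."""
    def __init__(self, param_fuser, coarse_renderer, flow_module, refiner):
        super().__init__()
        self.param_fuser = param_fuser          # first fusion module
        self.coarse_renderer = coarse_renderer  # first reconstruction module
        self.flow_module = flow_module          # second reconstruction module
        self.refiner = refiner                  # third reconstruction module

    def forward(self, original_image, recon_params, driving_params):
        fused = self.param_fuser(recon_params, driving_params)
        initial_recon = self.coarse_renderer(fused)
        flow_recon = self.flow_module(initial_recon, original_image, fused)
        target_recon, _mask = self.refiner(flow_recon, initial_recon, original_image)
        return target_recon
```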
12. The apparatus of claim 11, wherein the third reconstruction module comprises:
a first generation unit configured to generate an initial target reconstructed image and a mask image based on the optical flow reconstructed image, the initial reconstructed image, and the original image, wherein the mask image includes a plurality of pixels, and the pixel value of each pixel is used to characterize a difference between a pixel in the initial target reconstructed image and a pixel in the optical flow reconstructed image; and
a second generation unit configured to generate the target reconstructed image based on the mask image, the initial target reconstructed image, and the optical flow reconstructed image.
13. The apparatus of claim 12, wherein the second generation unit comprises:
a first determination subunit configured to determine, for each pixel in the mask image, a first pixel that matches the pixel from the initial target reconstructed image, and a second pixel that matches the pixel from the optical flow reconstructed image;
a first fusion subunit configured to fuse the first pixel and the second pixel based on the pixel value of the pixel to obtain a target pixel; and
a first generation subunit configured to generate the target reconstructed image based on a plurality of the target pixels.
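A minimal sketch of the per-pixel fusion, assuming the mask value acts as a convex blending weight between the two matching pixels; the claim itself only states that the fusion is based on that value.

```python
def blend_with_mask(initial_target, flow_recon, mask):
    """Per-pixel fusion of the two reconstructions.
    mask: (N, 1, H, W) in [0, 1]; initial_target / flow_recon: (N, 3, H, W).
    Treating the mask value as a convex weight is an assumption."""
    return mask * initial_target + (1.0 - mask) * flow_recon
```

With this convention a mask value near 1 keeps the coarse, parameter-driven reconstruction while a value near 0 falls back to the warped original image, which is one common way such masks are used.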
14. The apparatus of claim 11, further comprising:
the second fusion module is used for fusing the initial target reconstruction image and the original image to obtain a first fusion image; and
the information determining module is used for obtaining the optical flow information based on the first fusion image and the fusion three-dimensional parameters.
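One plausible realisation of this module is sketched below, assuming the first fusion image is a channel-wise concatenation conditioned on the fusion three-dimensional parameters, a small convolutional head predicts per-pixel offsets, and the original image is then warped by bilinear sampling. Layer sizes and the warping choice are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn

class FlowModule(nn.Module):
    """Predict per-pixel offsets from the fused images and warp the original image."""
    def __init__(self, param_dim, hidden=64):
        super().__init__()
        self.param_proj = nn.Linear(param_dim, hidden)
        self.net = nn.Sequential(
            nn.Conv2d(6 + hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 2, 3, padding=1),          # (dx, dy) offsets per pixel
        )

    def forward(self, initial_recon, original_image, fused_params):
        n, _, h, w = original_image.shape
        cond = self.param_proj(fused_params).view(n, -1, 1, 1).expand(-1, -1, h, w)
        fusion_image = torch.cat([initial_recon, original_image, cond], dim=1)
        flow = self.net(fusion_image)                    # optical flow information
        return warp(original_image, flow)

def warp(image, flow):
    """Warp an image by per-pixel offsets (in pixels) using bilinear sampling."""
    n, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=image.device),
                            torch.arange(w, device=image.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float().unsqueeze(0) + flow.permute(0, 2, 3, 1)
    # normalise coordinates to [-1, 1] as required by grid_sample
    gx = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(image, torch.stack((gx, gy), dim=-1), align_corners=True)
```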
15. The apparatus of claim 12, wherein the first generation unit comprises:
the second fusion subunit is used for fusing the optical flow reconstruction image, the initial reconstruction image and the original image to obtain a second fusion image; and
a second generation subunit configured to generate the initial target reconstructed image and the mask image based on the second fusion image and the fusion three-dimensional parameters.
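A hedged sketch of the first generation unit follows, assuming the second fusion image is again a channel-wise concatenation conditioned on the fusion three-dimensional parameters, with two sibling convolutional heads emitting the image and the mask; the architecture is illustrative, not the disclosed network.

```python
import torch
from torch import nn

class RefinerHead(nn.Module):
    """Produce the initial target reconstructed image and the mask image."""
    def __init__(self, param_dim, hidden=64):
        super().__init__()
        self.param_proj = nn.Linear(param_dim, hidden)
        self.backbone = nn.Sequential(
            nn.Conv2d(9 + hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        self.to_image = nn.Conv2d(hidden, 3, 3, padding=1)
        self.to_mask = nn.Conv2d(hidden, 1, 3, padding=1)

    def forward(self, flow_recon, initial_recon, original_image, fused_params):
        n, _, h, w = original_image.shape
        cond = self.param_proj(fused_params).view(n, -1, 1, 1).expand(-1, -1, h, w)
        second_fusion = torch.cat([flow_recon, initial_recon, original_image, cond], dim=1)
        feats = self.backbone(second_fusion)
        initial_target = torch.sigmoid(self.to_image(feats))  # image in [0, 1]
        mask = torch.sigmoid(self.to_mask(feats))              # per-pixel blending weights
        return initial_target, mask
```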
16. The apparatus of claim 11, wherein the first fusion module comprises:
an extraction unit for extracting three-dimensional parameters of a driving object in the driving image;
a first determining unit configured to determine the driving parameter from the three-dimensional parameters of the driving object; and
the updating unit is used for updating a target three-dimensional subparameter in the three-dimensional parameters of the object to be reconstructed by using the driving parameters to obtain the fusion three-dimensional parameters, wherein the target three-dimensional subparameter comprises parameters consistent with the parameter types of the driving parameters.
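If the three-dimensional parameters are represented as a keyed collection, the update amounts to overwriting the sub-parameters whose type matches the driving parameters while leaving identity-related entries untouched. The key names below are hypothetical and only chosen for illustration.

```python
def fuse_by_replacement(recon_params: dict, driving_params: dict) -> dict:
    """Overwrite sub-parameters of the object to be reconstructed whose type
    matches the driving parameters; everything else is kept unchanged."""
    fused = dict(recon_params)
    for name, value in driving_params.items():
        if name in fused:          # same parameter type -> replace
            fused[name] = value
    return fused

# Hypothetical usage: keep identity/texture of the object to be reconstructed,
# take expression and pose from the driving object.
recon = {"identity": [0.1, 0.2], "texture": [0.3], "expression": [0.0], "pose": [0.0]}
driving = {"expression": [0.8], "pose": [0.4]}
print(fuse_by_replacement(recon, driving))
# -> {'identity': [0.1, 0.2], 'texture': [0.3], 'expression': [0.8], 'pose': [0.4]}
```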
17. The apparatus of claim 11, wherein the first fusion module comprises:
a second determining unit, configured to determine a parameter variation between a reference object and a driving object, where the driving object is configured to drive reconstruction of the object to be reconstructed, the reference object and the driving object have the same object identifier, and driving parameters of the reference object and the driving object are different; and
the third determining unit is used for obtaining the fusion three-dimensional parameters based on the parameter variation and the three-dimensional parameters of the object to be reconstructed.
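Under the assumption that the parameter variation is applied additively, the fusion reduces to a single vector operation; the claim does not fix the exact way the variation is combined with the parameters of the object to be reconstructed.

```python
import torch

def fuse_by_delta(recon_params, reference_params, driving_params):
    """Apply the change between the reference object and the driving object
    (same object identifier, different driving parameters) to the parameters
    of the object to be reconstructed. Additive combination is an assumption."""
    delta = driving_params - reference_params
    return recon_params + delta

# e.g. with 1-D coefficient vectors:
recon   = torch.tensor([0.1, 0.2, 0.0])
ref     = torch.tensor([0.0, 0.0, 0.0])
driving = torch.tensor([0.0, 0.5, 0.3])
print(fuse_by_delta(recon, ref, driving))   # tensor([0.1000, 0.7000, 0.3000])
```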
18. A training device for a deep learning model, comprising:
the sample fusion module is used for obtaining sample fusion three-dimensional parameters based on the sample driving parameters and sample three-dimensional parameters of the sample object, wherein the sample three-dimensional parameters of the sample object are obtained based on a sample original image containing the sample object;
the first sample reconstruction module is used for obtaining a sample initial reconstruction image based on the sample fusion three-dimensional parameters, wherein the sample initial reconstruction image comprises a sample reconstruction object;
a second sample reconstruction module, configured to obtain a sample optical flow reconstruction image based on sample optical flow information between the sample reconstruction object and the sample object, and on the sample original image, wherein the sample optical flow information is used to characterize a sample offset between a pixel of the sample reconstruction object and a pixel of the sample object;
a third sample reconstruction module, configured to generate a sample target reconstruction image based on the sample optical flow reconstruction image, the sample initial reconstruction image, and the sample original image, where the sample target reconstruction image includes a sample target reconstruction object, and the sample target reconstruction object includes identification information of the sample object and information characterized by the sample driving parameters; and
the training module is used for training the deep learning model based on the sample target reconstructed image and the sample driving parameters to obtain a trained deep learning model.
19. The apparatus of claim 18, wherein the third sample reconstruction module comprises:
a first sample generation unit configured to generate a sample initial target reconstructed image and a sample mask image based on the sample optical flow reconstructed image, the sample initial reconstructed image, and the sample original image, wherein the sample mask image includes a plurality of pixels, and the pixel value of each pixel is used to characterize a difference between a pixel in the sample initial target reconstructed image and a pixel in the sample optical flow reconstructed image; and
the second sample generation unit is used for generating the sample target reconstruction image based on the sample mask image, the sample initial target reconstruction image and the sample optical flow reconstruction image.
20. The apparatus of claim 19, wherein the training module comprises:
a first loss determination unit, configured to obtain a first loss value based on the sample target reconstructed image and a sample driving image corresponding to the sample driving parameter;
a second loss determination unit, configured to obtain a second loss value based on the sample optical flow reconstruction image and the sample driving image;
a third loss determination unit configured to obtain a third loss value based on the sample mask image, the sample original image, and the sample driving image; and
the model training unit is used for training the deep learning model based on the first loss value, the second loss value and the third loss value to obtain the trained deep learning model.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 10.
22. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 10.
CN202311764196.7A 2023-12-20 2023-12-20 Three-dimensional object reconstruction method, model training method, device, equipment and medium Pending CN117745943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311764196.7A CN117745943A (en) 2023-12-20 2023-12-20 Three-dimensional object reconstruction method, model training method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN117745943A true CN117745943A (en) 2024-03-22

Family

ID=90282905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311764196.7A Pending CN117745943A (en) 2023-12-20 2023-12-20 Three-dimensional object reconstruction method, model training method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117745943A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination