WO2022052620A1 - Image generation method and electronic device - Google Patents

Image generation method and electronic device

Info

Publication number
WO2022052620A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
depth information
depth
pixel
dimensional model
Application number
PCT/CN2021/106178
Other languages
English (en)
French (fr)
Inventor
安世杰
张渊
郑文
Original Assignee
北京达佳互联信息技术有限公司
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2022052620A1

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation > G06T 11/40 Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 3/00 Geometric image transformations in the plane of the image > G06T 3/60 Rotation of whole images or parts thereof
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement > G06T 2207/10 Image acquisition modality > G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20 Special algorithmic details > G06T 2207/20081 Training; Learning
    • G06T 2207/20 Special algorithmic details > G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20212 Image combination > G06T 2207/20221 Image fusion; Image merging

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to an image generation method and an electronic device.
  • In the related art, two-dimensional images of the same scene at different angles are captured by dual cameras, difference information between the two-dimensional images at the different angles is determined, the difference information is converted into depth information of the two-dimensional images, and a three-dimensional image is reconstructed based on the depth information.
  • Embodiments of the present disclosure provide an image generation method and electronic device, which can optimize the image effect of the generated three-dimensional image.
  • the technical solution is as follows:
  • an image generation method comprising:
  • the first image area is the image area where the target object is located
  • the second image area is the image area where the background is located
  • the third image is obtained by fusing image data in the first image area into the depth-filled second image based on the first depth information and the third depth information.
  • In some embodiments, obtaining the third image by fusing the image data in the first image area into the depth-filled second image based on the first depth information and the third depth information includes:
  • a first three-dimensional model is created based on the image data of the first image area, where the first three-dimensional model is a three-dimensional model corresponding to the target object;
  • a second three-dimensional model is created based on the depth-filled second image, where the second three-dimensional model is a three-dimensional model corresponding to the background;
  • the pixel information corresponding to the first three-dimensional model and the second three-dimensional model is fused based on the first depth information and the third depth information to obtain the third image, wherein the depth information of the pixel points corresponding to the first three-dimensional model in the third image is the first depth information, and the depth information of the pixel points corresponding to the second three-dimensional model in the third image is the third depth information.
  • In some embodiments, obtaining the third image by fusing the pixel information corresponding to the first three-dimensional model and the second three-dimensional model based on the first depth information and the third depth information comprises:
  • depth information of each pixel of the target object is determined from the first three-dimensional model, where the depth information of each pixel is referenced to the depth information of a target key point of the target object, and the target key point is a key point of the target object;
  • the pixel information of the first pixel point is the pixel information of the target key point in the first three-dimensional model
  • the depth information of the first pixel point is the first depth information of the target key point
  • the pixel information of the second pixel point is the pixel information of the other pixel points in the first three-dimensional model
  • the depth information of the second pixel is the third depth information of the other pixels.
  • acquiring the second image by replacing the image data of the first image area based on the image data of the second image area includes:
  • the background is filled in the removed outline of the region to obtain the second image.
  • the step of filling the background in the removed region outline to obtain the second image includes:
  • the removed first image is input into an image completion model to obtain the second image, and the image completion model is used to fill the background in the outline of the region.
  • the determining of the first depth information of the first image area and the second depth information of the second image area in the first image includes:
  • the first image is input into a first depth determination model to obtain the first depth information and the second depth information.
  • the first depth determination model includes a feature extraction layer, a feature map generation layer, a feature fusion layer, and a depth determination layer;
  • the inputting the first image into the first depth determination model to obtain the first depth information and the second depth information includes:
  • the first depth information and the second depth information are obtained by performing convolution processing on the fused feature map through the depth determination layer.
  • the method further includes:
  • the first coordinate is the position coordinate of the special effect element in the image coordinate system of the third image
  • the second coordinate is the depth coordinate of the special effect element in the camera coordinate system of the third image
  • the fourth image is obtained by fusing the special effect element to the first target pixel point of the third image based on the first coordinate and the second coordinate, where the first target pixel point is the pixel point whose position coordinate is the first coordinate and whose depth coordinate is the second coordinate.
  • the method further includes:
  • the third image is rotated to generate a video.
  • the rotating the third image to generate a video includes:
  • the pixels in the third image are rotated to generate a video.
  • the determining a rotation angle to rotate in a direction corresponding to each coordinate axis of the camera coordinate system includes:
  • the rotation angle of the direction is determined based on the display angle weight and the preset display angle.
  • a method for training a depth determination model comprising:
  • a sampling weight of the first image set is determined based on a first quantity and a second quantity, the first quantity being the number of sample images included in the first image set, and the second quantity being the total number of sample images included in the plurality of first image sets; the sampling weight is positively correlated with the second quantity and negatively correlated with the first quantity;
  • based on the sampling weight, the first image set is sampled to obtain a second image set;
  • the second depth determination model is trained to obtain the first depth determination model.
  • an image generating apparatus comprising:
  • a first determining unit configured to determine first depth information of a first image area and second depth information of a second image area in the first image, where the first image area is the image area where the target object is located, and the second image area is the image area where the background is located;
  • a replacement unit configured to acquire a second image by replacing the image data of the first image area based on the image data of the second image area;
  • a filling unit configured to obtain third depth information of the third image area by filling the depth of the third image area based on the second depth information, where the third image area is the image area in the second image corresponding to the first image area;
  • a first fusion unit configured to obtain the third image by fusing the image data in the first image area into the depth-filled second image based on the first depth information and the third depth information.
  • the first fusion unit includes:
  • a first creation subunit configured to create a first three-dimensional model based on the image data of the first image area, where the first three-dimensional model is a three-dimensional model corresponding to the target object;
  • a second creation subunit configured to create a second three-dimensional model based on the depth-filled second image, where the second three-dimensional model is a three-dimensional model corresponding to the background;
  • a fusion subunit configured to fuse pixel information corresponding to the first three-dimensional model and the second three-dimensional model based on the first depth information and the third depth information, to obtain the third image, wherein,
  • the depth information of the pixels corresponding to the first three-dimensional model in the third image is the first depth information
  • the depth information of the pixels corresponding to the second three-dimensional model in the third image is the third depth information.
  • the fusion subunit is configured to: determine, from the first three-dimensional model, depth information of each pixel of the target object, where the depth information of each pixel is referenced to the depth information of a target key point of the target object, and the target key point is a key point of the target object; determine a first pixel point based on the target key point, where the first pixel point is the pixel point corresponding to the target key point in the second three-dimensional model; assign pixel information and depth information to the first pixel point, where the pixel information of the first pixel point is the pixel information of the target key point in the first three-dimensional model, and the depth information of the first pixel point is the first depth information of the target key point; determine second pixel points based on the positional relationship between the target key point and the other pixel points of the target object, where the second pixel points are the pixel points corresponding to the other pixel points in the second three-dimensional model; and assign pixel information and depth information to the second pixel points to obtain the third image, where the pixel information of the second pixel points is the pixel information of the other pixel points in the first three-dimensional model, and the depth information of the second pixel points is the third depth information of the other pixel points.
  • the replacement unit includes:
  • a segmentation subunit configured to perform image segmentation on the first image, and determine an area outline corresponding to the first image area
  • a removal subunit configured to remove image data within the outline of the region
  • the completion sub-unit is configured to fill the background in the region outline after removal to obtain the second image.
  • the completion subunit is configured to input the removed first image into an image completion model to obtain the second image, and the image completion model is used to fill the background within the area outline.
  • the first determination unit is configured to input the first image into a first depth determination model to obtain the first depth information and the second depth information.
  • the first depth determination model includes a feature extraction layer, a feature map generation layer, a feature fusion layer, and a depth determination layer;
  • the first determining unit includes:
  • a feature extraction subunit configured to input the first image to the feature extraction layer, and extract multiple layers of features of the first image through the feature extraction layer to obtain a plurality of image features of the first image;
  • a sampling subunit configured to sample the plurality of image features through the feature map generation layer to obtain a plurality of feature maps of different scales
  • a feature fusion subunit configured to fuse the plurality of feature maps through the feature fusion layer to obtain a fused feature map
  • a convolution subunit configured to obtain the first depth information and the second depth information by performing convolution processing on the fused feature map through the depth determination layer.
  • the apparatus further includes:
  • a third determining unit is configured to determine a first coordinate and a second coordinate of the special effect element to be added, where the first coordinate is the position coordinate of the special effect element in the image coordinate system of the third image, the The second coordinate is the depth coordinate of the special effect element in the camera coordinate system of the third image;
  • a second fusion unit configured to obtain a fourth image by fusing the special effect element to the first target pixel point of the third image based on the first coordinate and the second coordinate, where the first target pixel point is a pixel point whose position coordinate is the first coordinate and whose depth coordinate is the second coordinate.
  • the apparatus further includes:
  • a generating unit configured to rotate the third image to generate a video.
  • the generating unit includes:
  • a coordinate setting subunit configured to set the position coordinate corresponding to the target key point of the target object as the coordinate origin of the camera coordinate system corresponding to the third image
  • a determination subunit configured to determine a rotation angle to rotate in a direction corresponding to each coordinate axis of the camera coordinate system
  • a generating subunit is configured to rotate pixels in the third image based on the rotation angle to generate a video.
  • the determination subunit is configured to: obtain a preset display angle, a preset motion speed and a preset number of display frames of the target key point in each direction; determine the display angle weight based on the preset motion speed and the preset number of display frames; and determine the rotation angle of the direction based on the display angle weight and the preset display angle.
  • an apparatus for training a depth determination model comprising:
  • an acquiring unit configured to acquire a plurality of first image sets, each of which corresponds to an image scene
  • a second determination unit configured to, for each first image set, determine a sampling weight of the first image set based on a first number and a second number, the first number being the number of sample images included in the first image set, and the second number being the total number of sample images included in the plurality of first image sets, where the sampling weight is positively correlated with the second number and negatively correlated with the first number;
  • a sampling unit configured to sample the first image set based on the sampling weight to obtain a second image set
  • the model training unit is configured to train a second depth determination model to obtain a first depth determination model based on the plurality of second image sets.
  • an electronic device includes a processor and a memory, the memory stores at least one piece of program code, and the at least one piece of program code is loaded by the processor and execute to achieve the following steps:
  • the first image area is the image area where the target object is located
  • the second image area is the image area where the background is located
  • the third image is obtained by fusing image data in the first image area into the depth-filled second image based on the first depth information and the third depth information.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • a first three-dimensional model is created based on the image data of the first image area, where the first three-dimensional model is a three-dimensional model corresponding to the target object;
  • a second three-dimensional model is created based on the depth-filled second image, where the second three-dimensional model is a three-dimensional model corresponding to the background;
  • the pixel information corresponding to the first three-dimensional model and the second three-dimensional model is fused based on the first depth information and the third depth information to obtain the third image, wherein the depth information of the pixel points corresponding to the first three-dimensional model in the third image is the first depth information, and the depth information of the pixel points corresponding to the second three-dimensional model in the third image is the third depth information.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • depth information of each pixel of the target object is determined from the first three-dimensional model, where the depth information of each pixel is referenced to the depth information of a target key point of the target object, and the target key point is a key point of the target object;
  • the pixel information of the first pixel point is the pixel information of the target key point in the first three-dimensional model
  • the depth information of the first pixel point is the first depth information of the target key point
  • the pixel information of the second pixel point is the pixel information of the other pixel points in the first three-dimensional model
  • the depth information of the second pixel is the third depth information of the other pixels.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the background is filled in the removed outline of the region to obtain the second image.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the removed first image is input into an image completion model to obtain the second image, and the image completion model is used to fill the background in the outline of the region.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the first image is input into a first depth determination model to obtain the first depth information and the second depth information.
  • the first depth determination model includes a feature extraction layer, a feature map generation layer, a feature fusion layer and a depth determination layer; the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the first depth information and the second depth information are obtained by performing convolution processing on the fused feature map through the depth determination layer.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the first coordinate is the position coordinate of the special effect element in the image coordinate system of the third image
  • the second coordinate is the depth coordinate of the special effect element in the camera coordinate system of the third image
  • the fourth image is obtained by fusing the special effect element to the first target pixel point of the third image based on the first coordinate and the second coordinate, where the first target pixel point is the pixel point whose position coordinate is the first coordinate and whose depth coordinate is the second coordinate.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the third image is rotated to generate a video.
  • the at least one piece of program code is loaded and executed by the processor to achieve the following steps:
  • the pixels in the third image are rotated to generate a video.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the rotation angle of the direction is determined based on the display angle weight and the preset display angle.
  • an electronic device is provided.
  • the electronic device includes a processor and a memory, the memory stores at least one piece of program code, and the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • a sampling weight of the first image set is determined based on a first quantity and a second quantity, the first quantity being the number of sample images included in the first image set, and the second quantity being the total number of sample images included in the plurality of first image sets; the sampling weight is positively correlated with the second quantity and negatively correlated with the first quantity;
  • based on the sampling weight, the first image set is sampled to obtain a second image set;
  • the second depth determination model is trained to obtain the first depth determination model.
  • a computer-readable storage medium is provided, and at least one piece of program code is stored in the computer-readable storage medium, and the at least one piece of program code is loaded and executed by a processor to implement the following steps:
  • the first image area is the image area where the target object is located
  • the second image area is the image area where the background is located
  • the third image is obtained by fusing image data in the first image area into the depth-filled second image based on the first depth information and the third depth information.
  • a computer-readable storage medium is provided, and at least one piece of program code is stored in the computer-readable storage medium, and the at least one piece of program code is loaded and executed by a processor to implement the following steps:
  • a sampling weight of the first image set is determined based on a first quantity and a second quantity, the first quantity being the number of sample images included in the first image set, and the second quantity being the total number of sample images included in the plurality of first image sets; the sampling weight is positively correlated with the second quantity and negatively correlated with the first quantity;
  • based on the sampling weight, the first image set is sampled to obtain a second image set;
  • the second depth determination model is trained to obtain the first depth determination model.
  • a computer program product or a computer program comprising computer program code, the computer program code being stored in a computer-readable storage medium
  • the processor of the computer device reads the computer program code from the computer-readable storage medium, and the processor executes the computer program code, so that the computer device performs the following steps:
  • the first image area is the image area where the target object is located
  • the second image area is the image area where the background is located
  • the third image is obtained by fusing image data in the first image region into the depth-filled second image based on the first depth information and the third depth information.
  • a computer program product or a computer program comprising computer program code, the computer program code being stored in a computer-readable storage medium
  • the processor of the computer device reads the computer program code from the computer-readable storage medium, and the processor executes the computer program code, so that the computer device performs the following steps:
  • a sampling weight of the first image set is determined based on a first quantity and a second quantity, the first quantity being the number of sample images included in the first image set, and the second quantity being the total number of sample images included in the plurality of first image sets; the sampling weight is positively correlated with the second quantity and negatively correlated with the first quantity;
  • based on the sampling weight, the first image set is sampled to obtain a second image set;
  • the second depth determination model is trained to obtain the first depth determination model.
  • In the technical solutions provided by the embodiments of the present disclosure, the second image is obtained by performing background filling and depth filling on the first image, and the second image is fused with the first image area where the target object is located in the first image to obtain the third image. In this way, when the viewing angle of the third image changes, background holes can be filled in while distortion or loss at the boundary of the target object is prevented, optimizing the image effect of the generated image.
  • FIG. 1 is a flowchart of an image generation method provided according to an exemplary embodiment
  • FIG. 2 is a flowchart of an image generation method provided according to an exemplary embodiment
  • FIG. 3 is a schematic diagram of an image processing provided according to an exemplary embodiment
  • FIG. 4 is a schematic diagram of an image processing provided according to an exemplary embodiment
  • FIG. 5 is a schematic diagram of an image processing provided according to an exemplary embodiment
  • FIG. 6 is a flowchart of an image generation method provided according to an exemplary embodiment
  • FIG. 7 is a flowchart of an image generation method provided according to an exemplary embodiment
  • FIG. 8 is a schematic diagram of an image processing provided according to an exemplary embodiment
  • FIG. 9 is a flowchart of an image generation method provided according to an exemplary embodiment.
  • FIG. 10 is a schematic diagram of an image processing provided according to an exemplary embodiment
  • Fig. 11 is a block diagram of an image generating apparatus provided according to an exemplary embodiment
  • FIG. 12 is a flowchart of a training method for a depth determination model provided according to an exemplary embodiment
  • FIG. 13 is a schematic structural diagram of an electronic device provided according to an exemplary embodiment.
  • In order to display the collected images in the form of three-dimensional images, the electronic device performs image processing on the collected images to generate a three-dimensional image and displays the three-dimensional image to the user.
  • a three-dimensional image refers to an image with a three-dimensional effect.
  • the solutions provided by the embodiments of the present disclosure are applied in an electronic device, and the electronic device is an electronic device with an image acquisition function.
  • the electronic device is a camera, or the electronic device is a mobile phone, a tablet computer, or a wearable device with a camera.
  • the electronic device is not specifically limited.
  • the image generation method provided by the embodiments of the present disclosure can be applied in the following scenarios:
  • an electronic device when an electronic device captures an image, it directly converts the captured two-dimensional image into a three-dimensional image according to the method provided by the embodiment of the present disclosure.
  • In another scenario, after capturing a two-dimensional image, the electronic device stores the two-dimensional image in the electronic device; when the user shares the two-dimensional image through the electronic device, the electronic device converts the two-dimensional image into a three-dimensional image using the method provided by the embodiments of the present disclosure and shares the three-dimensional image.
  • sharing an image includes at least one of sharing an image with other users, sharing an image with a social display platform, and sharing an image with a short video platform, and the like.
  • In another scenario, the electronic device stores the two-dimensional image in the electronic device after shooting; when the user generates a video through the electronic device, the electronic device obtains a plurality of selected two-dimensional images, converts the multiple two-dimensional images into multiple three-dimensional images through the method provided by the embodiments of the present disclosure, and synthesizes the multiple three-dimensional images into a video. For example, when a user shares a video on a short video platform, the user first selects multiple two-dimensional selfie images containing human faces, converts the multiple two-dimensional selfie images into multiple three-dimensional selfie images by using the method provided by the embodiments of the present application, synthesizes the multiple three-dimensional selfie images into a video, and shares the obtained video to the short video platform.
  • FIG. 1 is a flowchart of an image generation method provided according to an exemplary embodiment. As shown in Figure 1, the method includes the following steps:
  • Step 101 Determine the first depth information of the first image area and the second depth information of the second image area in the first image, where the first image area is the image area where the target object is located, and the second image area is the image area where the background is located.
  • Step 102 Obtain a second image by replacing the image data of the first image area based on the image data of the second image area.
  • Step 103 Obtain third depth information of the third image area by filling the depth of the third image area based on the second depth information, where the third image area is an image area corresponding to the first image area in the second image.
  • Step 104 Obtain a third image by fusing the image data in the first image area into the depth-filled second image based on the first depth information and the third depth information.
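To make the relationship between steps 101-104 concrete, the following Python sketch outlines the pipeline. It is illustrative only: the helper callables estimate_depth, segment_target, inpaint_background, diffuse_depth and fuse_foreground are hypothetical stand-ins for the depth determination model, segmentation, image completion, depth filling and fusion components described below.

```python
import numpy as np

def generate_third_image(first_image, estimate_depth, segment_target,
                         inpaint_background, diffuse_depth, fuse_foreground):
    """Illustrative outline of steps 101-104.

    first_image: H x W x 3 RGB array. Every helper is a hypothetical callable
    standing in for a component described later in this disclosure.
    """
    # Step 101: per-pixel depth of the first image; the segmentation mask splits it
    # into first depth information (target object) and second depth information (background).
    depth = estimate_depth(first_image)            # H x W depth map
    target_mask = segment_target(first_image)      # H x W bool, True inside the target outline

    # Step 102: replace the image data of the first image area with background content.
    second_image = inpaint_background(first_image, target_mask)

    # Step 103: fill the depth of the third image area (the region that was replaced)
    # by diffusing the surrounding second depth information into it.
    background_depth = diffuse_depth(np.where(target_mask, np.nan, depth), target_mask)

    # Step 104: fuse the original target pixels, carrying their first depth
    # information, back into the depth-filled second image.
    return fuse_foreground(second_image, background_depth,
                           first_image, depth, target_mask)
```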
  • In some embodiments, obtaining the third image by fusing the image data in the first image region into the depth-filled second image based on the first depth information and the third depth information includes:
  • a first three-dimensional model is created based on the image data of the first image area, where the first three-dimensional model is a three-dimensional model corresponding to the target object;
  • a second three-dimensional model is created based on the depth-filled second image, where the second three-dimensional model is a three-dimensional model corresponding to the background;
  • the pixel information corresponding to the first three-dimensional model and the second three-dimensional model is fused based on the first depth information and the third depth information to obtain the third image, wherein the depth information of the pixels corresponding to the first three-dimensional model in the third image is the first depth information, and the depth information of the pixels corresponding to the second three-dimensional model in the third image is the third depth information.
  • the third image is obtained by fusing the pixel information of the first three-dimensional model and the second three-dimensional model based on the first depth information and the third depth information, including:
  • depth information of each pixel of the target object is determined from the first three-dimensional model, where the depth information of each pixel is referenced to the depth information of the target key point of the target object, and the target key point is a key point of the target object;
  • the first pixel point is the pixel point corresponding to the target key point in the second three-dimensional model
  • the pixel information of the first pixel point is the pixel information of the target key point in the first three-dimensional model, and the depth information of the first pixel point is the first depth of the target key point information;
  • the pixel information of the second pixel point is the pixel information of the other pixel points in the first three-dimensional model, and the depth information of the second pixel point is the third depth information of the other pixel points.
  • In some embodiments, obtaining the second image by replacing the image data of the first image area based on the image data of the second image area includes: performing image segmentation on the first image to determine the area outline corresponding to the first image area; removing the image data within the area outline; and filling the background within the removed area outline to obtain the second image.
  • In some embodiments, filling the background within the removed area outline to obtain the second image includes:
  • the second image is obtained by inputting the removed first image into the image completion model, which is used to fill the background in the contour of the region.
  • determining the first depth information of the first image area and the second depth information of the second image area in the first image includes:
  • the first image is input into the first depth determination model to obtain first depth information and second depth information.
  • the first depth determination model includes a feature extraction layer, a feature map generation layer, a feature fusion layer, and a depth determination layer;
  • inputting the first image into the first depth determination model to obtain the first depth information and the second depth information includes:
  • a plurality of image features of the first image are obtained by extracting multi-layer features of the first image through the feature extraction layer;
  • a plurality of feature maps of different scales are obtained by sampling the plurality of image features through the feature map generation layer;
  • the fused feature map is obtained by fusing multiple feature maps through the feature fusion layer;
  • the first depth information and the second depth information are obtained by performing convolution processing on the fused feature map through the depth determination layer.
  • the method further includes:
  • the first coordinate is the position coordinate of the special effect element in the image coordinate system of the third image
  • the second coordinate is the depth coordinate of the special effect element in the camera coordinate system of the third image
  • the fourth image is obtained by fusing the special effect element to the first target pixel point of the third image based on the first coordinate and the second coordinate, where the first target pixel point is the pixel point whose position coordinate is the first coordinate and whose depth coordinate is the second coordinate.
  • the method further includes:
  • the third image is rotated to generate a video. In some embodiments, rotating the third image to generate a video includes:
  • the pixels in the third image are rotated to generate a video.
  • determining a rotation angle to rotate in a direction corresponding to each coordinate axis of the camera coordinate system includes:
  • the rotation angle of the direction is determined based on the display angle weight and the preset display angle.
  • In the technical solutions provided by the embodiments of the present disclosure, the second image is obtained by performing background filling and depth filling on the first image, and the second image is fused with the first image area where the target object is located in the first image to obtain the third image. In this way, when the viewing angle of the third image changes, background holes can be filled in while distortion or loss at the boundary of the target object is prevented, optimizing the image effect of the generated image.
  • Fig. 2 is a flowchart of an image generation method provided according to an exemplary embodiment.
  • the training of the first depth determination model is taken as an example for description.
  • the method includes the following steps:
  • Step 201 The electronic device acquires a plurality of first image sets, and each first image set corresponds to an image category.
  • the image category is used to represent the scene to which the image belongs, that is, the image category is the image scene, and the image scene includes an indoor scene and an outdoor scene.
  • the first image set includes a plurality of sample images, and the sample images mark the depth information of the pixels in the sample images.
  • In some embodiments, the step of acquiring the first image set by the electronic device includes: the electronic device acquires a plurality of images whose categories are the image category, marks the depth information of the pixels in the multiple images to obtain a plurality of sample images, and forms the plurality of sample images into a first image set.
  • the electronic device after acquiring the multiple first image sets, divides the multiple first image sets into training data and test data.
  • the training data is used to train the model
  • the test data is used to determine whether the trained model meets the requirements.
  • the electronic device selects some sample images from the plurality of first image sets as training data, and uses the remaining sample images in the plurality of first image sets as test data. In some embodiments, the electronic device selects some sample images from each first image set, composes training data from the sample images selected from each first image set, and combines the remaining samples from each first image set The images make up the test data. For example, the electronic device acquires two first image sets, which are image set A and image set B respectively.
  • the shooting scene of the sample images included in image set A is outdoor, that is, the image category of image set A is outdoor;
  • the shooting scene of the sample images included in image set B is indoor, that is, the image category of image set B is indoor; the electronic device selects some sample images from image set A and image set B respectively, and composes the selected sample images into training data; the remaining sample images in image set A and the remaining sample images in image set B constitute the test data.
  • In the embodiments of the present disclosure, each first image set corresponds to one image category, and subsequent model training is performed through multiple image sets, so that the model can be trained according to the differences in depth under different image categories, thereby improving the accuracy of the trained first depth determination model.
  • Step 202 For each first image set, the electronic device determines the sampling weight of the first image set based on the first quantity and the second quantity.
  • the first number is the number of sample images included in the first image set
  • the second number is the total number of sample images included in the multiple first image sets
  • the sampling weight is positively correlated with the second number
  • the sampling weight is negatively correlated with the first quantity. Since each first image set corresponds to one image category, the electronic device determines the sampling weights of the first image sets of different image categories based on the number of sample images in the different image sets, so that subsequent model training is performed based on the sampling weights of the different image categories to improve accuracy.
  • the electronic device uses the ratio of the second number to the first number as the sampling weight. For example, if the second quantity is K and the first quantity is k_i, the sampling weight is K/k_i. Wherein, i represents the label (image category) of the first image set.
  • the electronic device uses the ratio of the second quantity to the first quantity as the sampling weight, so that a first image set with a larger first quantity has a smaller sampling weight and a first image set with a smaller first quantity has a larger sampling weight. In this way, the sample images of each image category can be balanced during the model training process, preventing bias in the model training.
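The weighting and sampling described here can be sketched as follows. The weight K/k_i follows the text; how the number of images drawn from each set (the fourth number) is derived from the weight and the expected total (the third number) is not specified, so the normalization below is an assumption.

```python
import random

def sampling_weights(image_sets):
    """Sampling weight per first image set: K / k_i, where K is the total number of
    sample images across all first image sets and k_i is the size of set i."""
    total = sum(len(s) for s in image_sets)                 # second quantity K
    return [total / len(s) for s in image_sets]             # one weight per image set

def build_second_image_sets(image_sets, expected_total, rng=random.Random(0)):
    """Draw a second image set from each first image set. The drawn counts
    (the 'fourth number') are taken proportional to the normalized sampling
    weights so that they sum roughly to expected_total (the 'third number')."""
    weights = sampling_weights(image_sets)
    weight_sum = sum(weights)
    second_sets = []
    for image_set, weight in zip(image_sets, weights):
        fourth_number = max(1, round(expected_total * weight / weight_sum))
        second_sets.append(rng.sample(image_set, min(fourth_number, len(image_set))))
    return second_sets
```

For example, with two first image sets of 900 and 100 sample images, the weights are 1000/900 and 1000/100, so the smaller set is sampled relatively more heavily.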
  • Step 203 Based on the sampling weight, the electronic device samples the first image set to obtain a second image set.
  • the electronic device acquires sample images from the first image set based on the sampling weight, and composes the acquired sample images into a second image set.
  • the electronic device determines a third number, the third number being the expected total number of the second set of images. For each first image set, the electronic device determines a fourth number based on the sampling weight of the first image set and the third number, where the fourth number is the number of sample images that need to be collected from the first image set, from The fourth number of sample images are collected from the first image set.
  • the fourth number of sample images are adjacent sample images in the first image set, or the fourth number of sample images are randomly sampled sample images in the first image set, or the The fourth number of sample images are sample images obtained by uniform sampling in the first image set, and the like.
  • the manner in which the electronic device samples the sample image from the first image set is not specifically limited.
  • Step 204 The electronic device trains the second depth determination model to obtain the first depth determination model based on the plurality of second image sets.
  • the electronic device adjusts the model parameters of the second depth determination model based on the second image set and the loss function to obtain the trained first depth determination model, and the process is implemented through the following steps (1)-(3), including:
  • the electronic device determines the loss value of the second depth determination model based on the second image set and the loss function.
  • the electronic device obtains the first depth determination model by training the second depth determination model, and the process includes: for each sample image in the second image set, the electronic device inputs the sample image into the second depth determination model to output the depth information of the sample image; the depth information output by the second depth determination model and the depth information marked in the sample image are input into the loss function to obtain the loss value of the second depth determination model.
  • the loss function is a vector loss function; for example, the loss function includes at least one of a depth x-direction loss function, a depth y-direction loss function, a normal vector loss function, and a reverse robust loss function (Reversed HuBer).
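As one concrete example of the losses listed above, the following is a minimal sketch of the reversed robust (Reversed HuBer, often written berHu) loss commonly used for depth regression; the adaptive threshold is a common convention rather than a value taken from this disclosure.

```python
import torch

def reversed_huber_loss(pred, target):
    """Reversed Huber (berHu) loss for depth regression: L1 for small errors,
    a scaled quadratic for large ones. The adaptive threshold c = 0.2 * max error
    is a common convention, not a value taken from this disclosure."""
    diff = torch.abs(pred - target)
    c = 0.2 * diff.max().detach()
    quadratic = (diff ** 2 + c ** 2) / (2 * c + 1e-8)
    return torch.where(diff <= c, diff, quadratic).mean()
```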
  • the electronic device constructs a second depth determination model.
  • the electronic device constructs the second depth determination model through a convolutional neural network.
  • the second depth determination model includes a feature extraction layer, a feature map generation layer, a feature fusion layer and a depth determination layer.
  • each layer in the second depth determination model consists of a convolutional layer
  • each convolutional layer is a convolutional layer of the same structure or a convolutional layer of a different structure.
  • the convolution layer in the second depth determination model is at least one of Depthwise Convolution (depth convolution structure), Pointwise Convolution (pointwise convolution structure) or Depthwise-Pointwise Convolution (depth pointwise convolution structure).
  • the structure of the convolution layer is not specifically limited.
  • the feature extraction layer consists of four convolutional layers.
  • the feature extraction layer is used to extract multi-layer features of the sample image to obtain multiple image features of the sample image.
  • the sample image is a 3-channel image.
  • the electronic device inputs the 3-channel sample image into the first convolutional layer, which converts the 3-channel sample image into a 16-channel feature image; the subsequent convolutional layers then successively convert the feature image until a 128-channel feature image is obtained. For the feature images with different numbers of channels, the image features are extracted respectively, so that different image features corresponding to the different convolutional layers can be obtained.
  • the feature map generation layer is used to sample multiple image features to obtain multiple feature maps of different scales.
  • The features of the local image and the global image in the sample image are determined from the image features of the different convolutional layers output by the feature extraction layer, and the relative relationship between the position of each pixel in the sample image and the global image is recorded, so as to provide local feature information and global feature information to the feature fusion layer and the depth determination layer.
  • the feature map generation layer consists of five convolutional layers.
  • the first to fourth convolutional layers are used to sample the 128-channel feature image; the first to fourth convolutional layers are respectively connected to the fifth convolutional layer, and the sampled images are input to the fifth convolutional layer; the fifth convolutional layer performs scale conversion on the four received images to obtain multiple feature maps of different scales, and the multiple feature maps of different scales are input to the feature fusion layer.
  • the feature fusion layer is used to perform feature fusion on the multiple feature maps to obtain a fused feature map.
  • the feature fusion layer gradually restores the image resolution and reduces the number of channels, fuses the features of the feature extraction layer, and takes into account the features of different depths in the sample image.
  • the feature fusion layer includes three layers of convolution layers.
  • the first convolutional layer downsamples the feature map of the 128-channel image to obtain a 64-channel feature map; the second convolutional layer downsamples the 64-channel feature map to obtain a 32-channel feature map; the third convolutional layer downsamples the 32-channel feature map to obtain a 16-channel feature map; the obtained feature maps are then fused to obtain a fused feature map, and the fused feature map is input to the depth determination layer.
  • the depth determination layer is used to determine the depth information of each pixel of the sample image based on the fused feature map.
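The four-part structure described above can be sketched in PyTorch as follows. The channel counts (3 to 16 up to 128 in feature extraction, 128 down to 16 in feature fusion, five layers in feature map generation) follow the text; kernel sizes, strides, dilation rates, activations and the bilinear upsampling are illustrative assumptions, and the depthwise/pointwise convolution options mentioned earlier are not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthDeterminationModel(nn.Module):
    """Sketch of the feature extraction, feature map generation, feature fusion
    and depth determination layers; hyperparameters are illustrative."""

    def __init__(self):
        super().__init__()
        chans = [3, 16, 32, 64, 128]
        # Feature extraction layer: four convolutions, each halving the resolution.
        self.extract = nn.ModuleList([
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                          nn.ReLU(inplace=True))
            for i in range(4)])
        # Feature map generation layer: four parallel samplings of the 128-channel
        # features at different dilation rates plus a fifth convolution merging them.
        self.sample = nn.ModuleList([
            nn.Conv2d(128, 32, 3, padding=d, dilation=d) for d in (1, 2, 4, 8)])
        self.merge = nn.Conv2d(4 * 32, 128, 1)
        # Feature fusion layer: progressively restore resolution, channels 128->64->32->16.
        self.fuse = nn.ModuleList([
            nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))
            for cin, cout in ((128, 64), (64, 32), (32, 16))])
        # Depth determination layer: one depth value per pixel.
        self.head = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, x):
        sizes = []
        for layer in self.extract:
            x = layer(x)
            sizes.append(x.shape[-2:])          # remember intermediate resolutions
        feats = torch.cat([branch(x) for branch in self.sample], dim=1)
        x = self.merge(feats)
        for layer, size in zip(self.fuse, reversed(sizes[:-1])):
            x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
            x = layer(x)
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.head(x)                     # N x 1 x H x W depth map
```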
  • In some embodiments, the electronic device first acquires the multiple first image sets and then constructs the second depth determination model; or, the electronic device first constructs the second depth determination model and then acquires the multiple first image sets; or, the electronic device acquires the multiple first image sets and constructs the second depth determination model at the same time. That is, the acquisition of the first image sets and the construction of the second depth determination model may be performed in either order or simultaneously; the execution order of these two operations is not specifically limited in this embodiment of the present disclosure.
  • the electronic device updates the model parameters of the second depth determination model through the loss value and the model optimizer to obtain a third depth determination model.
  • the optimizer is used to update model parameters using stochastic gradient descent.
  • the electronic device updates model parameters through the stochastic gradient descent method, where the model parameters include gradient values.
  • the electronic device determines the loss value of the third depth determination model based on the training data and the vector loss function, and when the loss value is less than the preset loss value, model training is completed and the first depth determination model is obtained.
  • After the electronic device adjusts the model parameters of the second depth determination model, it continues to perform model training on the obtained third depth determination model. This process is similar to steps (1)-(2) and will not be repeated here. Each time step (2) is performed, the electronic device determines whether model training is completed based on the loss value of the model: in response to the loss value being not less than the preset loss value, it is determined that model training has not been completed, and steps (1)-(2) continue to be performed; in response to the loss value being less than the preset loss value, it is determined that model training is completed, and the first depth determination model is obtained.
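A minimal training-loop sketch for steps (1)-(3), assuming a data loader that yields pairs of sample images and their labeled depth maps; the learning rate, momentum, epoch cap and preset loss value are illustrative rather than values from the disclosure.

```python
import torch

def train_depth_model(model, loader, loss_fn, preset_loss=0.05, lr=1e-3, max_epochs=50):
    """Stochastic-gradient-descent training until the loss falls below a preset value.
    loader yields (sample_image, labeled_depth) batches; lr, momentum, max_epochs and
    preset_loss are illustrative values."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(max_epochs):
        running = 0.0
        for sample_image, labeled_depth in loader:
            predicted_depth = model(sample_image)          # model output
            loss = loss_fn(predicted_depth, labeled_depth) # e.g. reversed_huber_loss above
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                               # update the model parameters
            running += loss.item()
        if running / max(1, len(loader)) < preset_loss:    # training completed
            break
    return model                                           # the first depth determination model
```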
  • the electronic device after the electronic device completes the model training, it evaluates the prediction result of the first depth determination model.
  • the electronic device tests the first depth determination model based on the test data, and obtains a test result of the first depth determination model, where the test result is used to indicate whether the first depth determination model meets the requirements.
  • When the test result indicates that the first depth determination model meets the requirements, it is determined that the first depth determination model is an available depth determination model, and the depth information of images is subsequently determined based on the first depth determination model; when the test result indicates that the first depth determination model does not meet the requirements, the first depth determination model continues to be trained until it meets the requirements.
  • the electronic device adopts at least one of the Mean Relative Error algorithm or the Root Mean Squared Error algorithm to determine the test result of the first depth determination model.
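The two metrics named above have standard definitions, sketched below; the threshold at which a test result is considered to meet the requirements is not specified in the disclosure.

```python
import numpy as np

def mean_relative_error(pred, target, eps=1e-8):
    """Mean Relative Error between predicted and labeled depth maps."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.mean(np.abs(pred - target) / (np.abs(target) + eps)))

def root_mean_squared_error(pred, target):
    """Root Mean Squared Error between predicted and labeled depth maps."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```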
  • FIG. 4 and FIG. 5 are renderings of test results of a first depth determination model provided according to an exemplary embodiment. Pixels with the same depth information are marked with the same marker, and the more similar the depth information, the more similar the markers. For example, different depth information is distinguished by different colors, and the more similar the depth information, the more similar the colors.
  • the training process of the first depth determination model is performed by the electronic device currently used to generate the image; or, performed by another electronic device other than the current device.
  • the process for the electronic device to acquire the first depth determination model is: the electronic device sends an acquisition request to other electronic devices, and the acquisition request is used to request to acquire the first depth determination model; Other electronic devices acquire the first depth determination model based on the acquisition request, and send the first depth determination model to the electronic device; the electronic device receives the first depth determination model.
  • the process of training the first depth determination model by other electronic devices is similar to the process of training the first depth determination model by the electronic device, and details are not described herein again.
  • In the embodiments of the present disclosure, the sampling weight is determined based on the first number and the second number, where the first number is the number of sample images in the first image set and the second number is the total number of sample images in the multiple first image sets. Therefore, when the first image sets are sampled based on the sampling weights, the number of sample images taken from each first image set can be controlled, so that a first image set containing more sample images receives a relatively lower sampling weight and the sample images of the different image categories remain balanced.
  • Fig. 6 is a flowchart of an image generation method provided according to an exemplary embodiment.
  • processing an image to generate a three-dimensional dynamic image is taken as an example for description.
  • the method includes the following steps:
  • Step 601 The electronic device determines the first depth information of the first image area and the second depth information of the second image area in the first image.
  • the first image area is the image area where the target object is located
  • the second image area is the image area where the background is located
  • the background is the part of the first image excluding the target object.
  • the target object is a designated object, a human or other animal face, or the like.
  • In some embodiments, the electronic device obtains the first depth information and the second depth information by using the first depth determination model, and the process is as follows: the electronic device inputs the first image into the first depth determination model, and obtains the first depth information and the second depth information.
  • the structure of the first depth determination model is the same as that of the second depth determination model.
  • the first depth determination model includes a feature extraction layer, a feature map generation layer, a feature fusion layer and a depth determination layer. This step is realized through the following steps (1)-(4), including:
  • the electronic device inputs the first image to the feature extraction layer, and extracts multi-layer features of the first image through the feature extraction layer to obtain multiple image features of the first image.
  • This step is similar to the process of extracting image features by the electronic device through the feature extraction layer in the second depth determination model in step (1) of step 204, and details are not described here.
  • the electronic device samples multiple image features through the feature map generation layer to obtain multiple feature maps of different scales.
  • This step is similar to the process in step (1) of step 204 in which the electronic device generates feature maps through the feature map generation layer of the second depth determination model, and details are not repeated here.
  • the electronic device fuses multiple feature maps through the feature fusion layer to obtain a fused feature map.
  • This step is similar to the process of feature fusion performed by the electronic device through the feature fusion layer in the second depth determination model in step (1) of step 204, and details are not repeated here.
  • the electronic device obtains the first depth information and the second depth information by performing convolution processing on the fused feature map through the depth determination layer.
  • This step is similar to the process of determining the depth information of the image by the electronic device through the depth determination layer in the second depth determination model in step (1) of step 204, and details are not repeated here.
  • In the embodiments of the present disclosure, the first depth information and the second depth information of the first image are determined by the pre-trained first depth determination model, thereby shortening the time required to determine the first depth information and the second depth information and further improving the image processing speed, so that the solution can be applied to instant imaging scenarios.
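Assuming the DepthDeterminationModel sketch given earlier and a boolean mask marking the first image area, step 601 can be exercised roughly as follows; splitting the per-pixel depth map into first and second depth information with the mask is an assumption for illustration.

```python
import torch

def determine_depth_information(model, first_image, target_mask):
    """first_image: 1 x 3 x H x W float tensor; target_mask: H x W bool tensor
    marking the first image area (the target object)."""
    model.eval()
    with torch.no_grad():
        depth = model(first_image)[0, 0]              # H x W depth map
    first_depth_information = depth[target_mask]      # depths of target-object pixels
    second_depth_information = depth[~target_mask]    # depths of background pixels
    return depth, first_depth_information, second_depth_information
```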
  • the electronic device detects whether the target object exists in the first image, and in response to the presence of the target object in the first image, the electronic device performs step 601; in response to the absence of the target object in the first image, end.
  • the electronic device in response to the presence of the target object in the first image, the electronic device further detects an area ratio between the first image area where the target object is located and the first image, and in response to the area ratio being greater than a preset threshold, step 601 is executed, In response to the area ratio not being greater than the preset threshold, end.
  • the first image is an RGB (Red Green Blue) three-channel image.
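  • The pre-check described above can be sketched as follows; the `detect_target` function (assumed to return a boolean mask as a NumPy array) and the 0.05 threshold are placeholders, not values given in this disclosure.

```python
def should_generate_3d(first_image_rgb, detect_target, area_ratio_threshold=0.05):
    """Continue to step 601 only when a target object is present and the first
    image area is a large enough share of the first image; otherwise end."""
    mask = detect_target(first_image_rgb)      # hypothetical detector: boolean mask
    if mask is None or not mask.any():
        return False                           # no target object: end
    area_ratio = mask.sum() / mask.size        # first image area / whole first image
    return area_ratio > area_ratio_threshold
```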
  • Step 602 The electronic device acquires the second image by replacing the image data of the first image area based on the image data of the second image area.
  • the image data includes information such as the position and pixel value of the pixel in the image.
  • the electronic device removes the image data in the first image area through a mask, and then fills the background of the first image area through the second image area to obtain a second image.
  • this step is realized through the following steps (1)-(3), including:
  • the electronic device performs image segmentation on the first image, and determines the area contour corresponding to the first image area.
  • the electronic device segments the first image using an image segmentation model to obtain the area outline corresponding to the first image area.
  • the image segmentation model is an image segmentation model acquired in advance by the electronic device.
  • the image segmentation model is a mask segmentation model.
  • the area outline is marked in the first image.
  • the electronic device removes the image data within the outline of the area.
  • the electronic device removes the pixel values of the pixel points in the outline of the area, so as to remove the image data in the outline of the area.
  • after the image data is removed, an image mask of the first image area is obtained. Referring to Fig. 7 and Fig. 8, the image on the left side of Fig. 7 and the image on the left side of Fig. 8 show the mask image of the region outline.
  • the electronic device fills the background in the removed area outline to obtain a second image.
  • step (3) includes: the electronic device inputs the removed first image into an image completion model to obtain the second image, where the image completion model is used to fill the background in the area outline.
  • the electronic device inputs the removed first image into the image completion model, and the image completion model fills the background in the outline of the region based on the image data of the second image area, and the obtained second image is a complete background image.
  • the right image of Figure 7 and the right image of Figure 8 are complete background images.
  • the image completion model determines image features of the second image region, and based on the image features of the second image region, fills a background in the region outline.
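  • Steps (1)-(3) of step 602 can be sketched as below. The segmentation function is a placeholder, and OpenCV's classical inpainting stands in for the learned image completion model only to keep the example self-contained; it is not the model described here.

```python
import cv2
import numpy as np

def build_background_image(first_image, segment_target):
    """Steps (1)-(3): segment the first image area, remove the image data
    inside the region outline, and fill the hole with background content."""
    # (1) Image segmentation: a binary mask marking the first image area (placeholder model).
    mask = segment_target(first_image).astype(np.uint8) * 255
    # (2) Remove the image data inside the region outline.
    removed = first_image.copy()
    removed[mask > 0] = 0
    # (3) Fill the background inside the removed outline. The disclosure uses a
    # learned image completion model; classical inpainting is used here only so
    # the sketch runs on its own.
    second_image = cv2.inpaint(removed, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
    return second_image, mask
```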
  • Step 603 The electronic device obtains third depth information of the third image area by filling the depth of the third image area based on the second depth information, where the third image area is the image area corresponding to the first image area in the second image .
  • the electronic device performs depth information diffusion to the third image area based on the second depth information to obtain the third depth information.
  • the diffusion mode is Poisson diffusion mode.
  • the electronic device determines the variation pattern of the depth information between adjacent pixels in the second image area, and determines the depth information of each pixel in the third image area based on that variation pattern; or, for each pixel in the area outline of the third image area, the electronic device determines depth information and assigns the determined depth information to that pixel.
  • the electronic device fills the depth of the third image area so that it matches the depth of the second image area, which makes the generated background more harmonious and the effect of the generated three-dimensional image more realistic.
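  • One simple way to approximate the depth filling of step 603 is to iteratively diffuse the known second depth information into the hole; this is only a rough stand-in for the Poisson diffusion mentioned above, and the iteration count is an arbitrary assumption.

```python
import numpy as np

def fill_depth_by_diffusion(second_depth, hole_mask, iterations=500):
    """Diffuse the second depth information from the second image area into
    the third image area (hole_mask == True) so the two regions match."""
    depth = second_depth.astype(np.float32)
    depth[hole_mask] = 0.0
    for _ in range(iterations):
        # Average of the four neighbours (a simple heat-equation step).
        # np.roll wraps at the image border; the sketch ignores that detail.
        up    = np.roll(depth, -1, axis=0)
        down  = np.roll(depth,  1, axis=0)
        left  = np.roll(depth, -1, axis=1)
        right = np.roll(depth,  1, axis=1)
        averaged = (up + down + left + right) / 4.0
        # Only pixels inside the third image area are updated; pixels in the
        # second image area keep their depth and act as boundary values.
        depth[hole_mask] = averaged[hole_mask]
    return depth
```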
  • Step 604 The electronic device creates a first three-dimensional model based on the image data of the first image area, where the first three-dimensional model is a three-dimensional model corresponding to the target object.
  • the first three-dimensional model is a three-dimensional model generated based on image data of the first image area.
  • the electronic device creates the first three-dimensional model based on at least one key point of the target object in the first image area. For example, the electronic device identifies at least one key point of the target object in the first image area, and based on the at least one key point, creates a first three-dimensional model through a three-dimensional model generation algorithm.
  • the figure on the right in FIG. 9 is a first three-dimensional model created based on the face image of the figure on the left. For example, if the target object is a face, the at least one key point is a face key point.
  • the three-dimensional model generation algorithm is a 3DMM (3D Morphable Model; 3D, three-dimensional) algorithm, in which case the first three-dimensional model is a mesh model.
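  • A compressed sketch of step 604 follows. Both `detect_keypoints` and `fit_3dmm` are hypothetical placeholders for whatever landmark detector and morphable-model implementation is actually used; only the data flow follows the description above.

```python
import numpy as np

def create_first_3d_model(first_image, region_mask, detect_keypoints, fit_3dmm):
    """Step 604: build the first three-dimensional model (a mesh) for the
    target object from at least one key point of the first image area."""
    # Restrict the image to the first image area before detecting key points.
    target_crop = np.where(region_mask[..., None], first_image, 0)
    keypoints_2d = detect_keypoints(target_crop)   # e.g. face landmarks (placeholder)
    if keypoints_2d is None or len(keypoints_2d) == 0:
        raise ValueError("no key points found for the target object")
    # A 3DMM-style algorithm turns the key points into mesh vertices and faces
    # with per-vertex depth values (placeholder implementation).
    vertices, faces = fit_3dmm(keypoints_2d)
    return {"vertices": vertices, "faces": faces, "keypoints_2d": keypoints_2d}
```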
  • Step 605 The electronic device creates a second three-dimensional model based on the depth-filled second image, where the second three-dimensional model is a three-dimensional model corresponding to the background.
  • This step is similar to step 604 and will not be repeated here.
  • Step 606 Based on the first depth information and the third depth information, the electronic device fuses pixel information corresponding to the first three-dimensional model and the second three-dimensional model to obtain a third image.
  • the depth information of the pixels corresponding to the first three-dimensional model in the third image is the first depth information
  • the depth information of the pixels corresponding to the second three-dimensional model in the third image is the third depth information
  • the third image is generated by fusing the first three-dimensional model and the second three-dimensional model, so that the third image contains the three-dimensional target object and the three-dimensional background. This ensures that background holes can be filled when the perspective changes, while also preventing distortion or missing content at the boundary of the target object, which optimizes the image effect of the generated 3D image.
  • the electronic device determines a coordinate system, and fuses the first three-dimensional model and the second three-dimensional model into the coordinate system, so that the depth information of the pixels corresponding to the first three-dimensional model and the second three-dimensional model is based on the coordinate system.
  • the pixel information corresponding to the first three-dimensional model and the second three-dimensional model are respectively assigned to corresponding pixel positions to obtain a third image.
  • the electronic device establishes a coordinate system based on the first three-dimensional model or the second three-dimensional model, maps the other three-dimensional model into that coordinate system, and assigns the pixel information of the first three-dimensional model and the second three-dimensional model to the corresponding pixel positions to obtain the third image.
  • In some embodiments, the electronic device performs the fusion based on the positions, in the second image, of the key points of the first three-dimensional model or the second three-dimensional model, and on the parameter information between the key points of the first three-dimensional model and the second three-dimensional model.
  • This step is realized through the following steps (A1)-(A5), including:
  • from the first three-dimensional model, the electronic device determines the depth information of each pixel of the target object, where the depth information of each pixel takes the depth information of the target key point of the target object as a reference, and the target key point is a key point of the target object.
  • the depth information of each pixel is based on the depth information of the target key point of the target object, and the target key point is a key point in the at least one key point.
  • the target key point is the pixel point corresponding to the nose in the face image, or the target key point is the center point of the first three-dimensional model.
  • the electronic device selects a target key point from at least one key point of the target object and determines the depth information of the target key point as the first depth information; the electronic device then determines, based on the model parameters of the first three-dimensional model, the depth information of each pixel point relative to the target key point, and determines the depth information of each pixel point in the first three-dimensional model based on the first depth information of the target key point and the depth information of each pixel point relative to the target key point. For example, if the first three-dimensional model is a mesh image determined by a 3DMM algorithm, the depth information of each pixel point of the target object is determined based on the parameter information of each pixel point in the mesh image.
  • the electronic device determines the first pixel point based on the target key point
  • the first pixel point is the pixel point corresponding to the target key point in the second three-dimensional model.
  • the first three-dimensional model and the second three-dimensional model are three-dimensional models corresponding to the target object and the background in the first image. Therefore, the first three-dimensional model and the second three-dimensional model can be mapped to the same image coordinate system.
  • the electronic device maps the first three-dimensional model to the second three-dimensional model.
  • the electronic device selects the center point of the second three-dimensional model as the first pixel point, or the electronic device determines the mapping relationship between the first three-dimensional model and the second three-dimensional model based on the first mapping relationship and the second mapping relationship , the first mapping relationship is the mapping relationship between the first three-dimensional model and the first image, the second mapping relationship is the mapping relationship between the second three-dimensional model and the first image, and based on the mapping relationship between the first three-dimensional model and the second three-dimensional model, The first pixel point corresponding to the target key point is determined from the second three-dimensional model.
  • the electronic device assigns pixel information and depth information of the first pixel point.
  • the pixel information of the first pixel point is the pixel information of the target key point in the first three-dimensional model
  • the depth information of the first pixel point is the first depth information of the target key point.
  • the pixel information includes information such as pixel values of pixel points.
  • the electronic device modifies the depth information of the first pixel point in the second three-dimensional model to the first depth information of the target key point, and modifies the pixel information of the first pixel point to the pixel information of the target key point. For example, the electronic device determines the position of the nose in the face in the first three-dimensional model as the target key point, and then determines the depth information of the first pixel point as the first depth information of the nose.
  • the electronic device directly assigns the pixel information of the target key point and the first depth information to the first pixel point. In some embodiments, the electronic device sets a new layer on the second three-dimensional model, and modifies the pixel information and depth information of the first pixel point in the layer to the pixel information and the first depth information of the target key point.
  • in this way, the first three-dimensional model and the second three-dimensional model do not affect each other while achieving the effect of forming an integral whole, thereby optimizing the image effect of the generated three-dimensional image.
  • the electronic device determines the second pixel point based on the positional relationship between the target key point and other pixel points in the target object.
  • the second pixel point is the pixel point corresponding to other pixel points in the second three-dimensional model.
  • the electronic device sets the target key point at the origin of the coordinate system corresponding to the second three-dimensional model, and sets the origin of the coordinate system corresponding to the first three-dimensional model and the second three-dimensional model at the position of the pixel point corresponding to the target key point in the second image.
  • the electronic device assigns the pixel information and depth information of the second pixel point to obtain a third image, and the pixel information of the second pixel point is the pixel information of the other pixel points in the first three-dimensional model, and the depth of the second pixel point The information is the third depth information of the other pixels.
  • This step is similar to step (A3) and is not repeated here.
  • the electronic device fuses the first three-dimensional model and the second three-dimensional model based on the positional relationship of different pixels in the same image, so that when the perspective changes, the background holes can be filled and distortion or missing content at the boundary of the target object can be prevented, optimizing the image quality of the resulting 3D image.
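  • Steps (A1)-(A5) amount to writing the target object's pixel information and key-point-relative depths into the background model. The sketch below operates on per-pixel maps rather than mesh structures, which is a simplification; all function and array names are assumptions.

```python
import numpy as np

def fuse_models(bg_color, bg_depth, obj_color, obj_rel_depth, obj_mask,
                target_kp_depth):
    """(A1)-(A5): write the target object's pixel information and depth
    information into the background model. Object depths are stored relative
    to the target key point (e.g. the nose), so the key point's first depth
    information anchors the whole object."""
    fused_color = bg_color.copy()
    fused_depth = bg_depth.copy()
    # (A1) Absolute depth of every object pixel, referenced to the key point.
    obj_depth = target_kp_depth + obj_rel_depth
    # (A2)-(A5) For the key point and every other object pixel, locate the
    # corresponding pixel in the background model (same image coordinate
    # system) and assign its pixel information and depth information.
    fused_color[obj_mask] = obj_color[obj_mask]
    fused_depth[obj_mask] = obj_depth[obj_mask]
    return fused_color, fused_depth
```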
  • the electronic device can also add special effects elements to the third image to obtain a fourth image with special effects elements.
  • the process is as follows: the electronic device determines the first coordinate and the second coordinate of the special effect element to be added, where the first coordinate is the position coordinate of the special effect element in the image coordinate system of the third image, and the second coordinate is the depth coordinate of the special effect element in the camera coordinate system of the third image, that is, the coordinate position corresponding to the depth information of the special effect element in the image.
  • by fusing the special effect element to the first target pixel point of the third image based on the first coordinate and the second coordinate, the fourth image is obtained, where the first target pixel point is the pixel point whose position coordinate is the first coordinate and whose depth coordinate is the second coordinate.
  • the electronic device converts the pixel positions into a camera coordinate system based on the principle of camera imaging.
  • the coordinates in this coordinate system are homogeneous coordinates (X, Y, 1), the depth of a pixel in this coordinate system is the distance estimated from the depth map, and multiplying the homogeneous coordinates (whose depth coordinate is 1) by the depth Z gives the true coordinates (X, Y, Z) that form the reconstructed three-dimensional model.
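  • A minimal sketch of this back-projection, assuming a pinhole camera with a known intrinsic matrix K (the numbers used for K below are illustrative only):

```python
import numpy as np

def back_project(u, v, depth, K):
    """Convert a pixel (u, v) with an estimated depth into camera coordinates.
    The pixel is first written in homogeneous form (u, v, 1); scaling the
    normalized ray by the depth gives the 3D point used for the reconstructed
    model. The intrinsic matrix K is an assumption of this sketch."""
    pixel_h = np.array([u, v, 1.0])
    ray = np.linalg.inv(K) @ pixel_h        # normalized homogeneous coordinates
    return depth * ray                      # true coordinates (X, Y, Z)

# Example: principal point at the image centre, focal length 500 (assumed).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
point_3d = back_project(350, 260, depth=2.0, K=K)
```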
  • the electronic device selects different positions and depths in the three-dimensional image, and places different dynamic effects to obtain a fourth image. For example, referring to Figure 10, place butterfly elements around the face with depths of 1, 2, and 3.5, respectively. This process is similar to (A1)-(A5) in step 606, and will not be repeated here.
  • the electronic device adds special effect elements to the third image based on the depth information, so that the added special effect elements and the third image are more vivid, and the image effect of the generated three-dimensional image is optimized.
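  • The depth-aware fusion of a special effect element can be sketched as follows; the RGBA sprite format, the alpha handling, and the assumption that the sprite lies fully inside the image are simplifications for illustration.

```python
import numpy as np

def add_effect_element(color, depth, sprite_rgba, top_left_xy, element_depth):
    """Fuse a special effect element into the third image at the given
    position coordinate (first coordinate) and depth coordinate (second
    coordinate), producing the fourth image."""
    out_color = color.copy()
    x0, y0 = top_left_xy
    h, w = sprite_rgba.shape[:2]
    region_c = out_color[y0:y0 + h, x0:x0 + w]
    region_d = depth[y0:y0 + h, x0:x0 + w]
    alpha = sprite_rgba[..., 3:4] / 255.0
    # The element is drawn only where it is nearer than the scene depth, so a
    # butterfly placed at depth 3.5 can pass behind a face at depth 1.
    visible = (element_depth < region_d)[..., None] * alpha
    region_c[:] = visible * sprite_rgba[..., :3] + (1 - visible) * region_c
    return out_color
```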
  • after the electronic device generates the three-dimensional third image, it can also rotate the third image to generate a video.
  • the process is achieved through the following steps (B1)-(B3), including:
  • the electronic device sets the position coordinate corresponding to the target key point of the target object as the coordinate origin of the camera coordinate system corresponding to the third image.
  • the electronic device determines a rotation angle for rotation in the direction corresponding to each coordinate axis of the camera coordinate system.
  • the electronic device determines the rotation angle in the direction corresponding to each coordinate axis.
  • the rotation angle is a preset rotation angle, or the rotation angle is a rotation angle generated based on a rotation instruction.
  • the electronic device obtains the preset display angle, preset motion speed, and preset number of display frames of the target key point in each direction; determines a display angle weight based on the preset motion speed and the preset number of display frames; and determines the rotation angle of the direction based on the display angle weight and the preset display angle.
  • for example, the preset display angle in the X (or Y) direction is AmpX (or AmpY), t is the preset display frame number (which can also be expressed as time), and s is the preset motion speed; then each frame rotates by an angle of AmpX*sin(s*t) around the X axis (or AmpY*sin(s*t) around the Y axis), where sin(s*t) is the display angle weight.
  • the display track of the third image is determined by the preset motion track, so that the third image can be displayed in rotation according to the specified route, so as to prevent the problem of track confusion when the third image generates a video.
  • the electronic device acquires the rotation instruction, and based on the rotation instruction, selects the rotation angle of the direction from the rotation angle corresponding to the rotation instruction and the preset display angle.
  • the rotation instruction is an instruction input by the user through the screen received by the electronic device, or the rotation instruction is an instruction generated by an angle sensor in the electronic device.
  • the electronic device receives a gesture operation input by the user, and determines the rotation angle based on the gesture operation. In other embodiments, the electronic device determines the current tilt angle of the electronic device through an angle sensor, and determines the tilt angle as the rotation angle.
  • the electronic device obtains the quaternion attitude from the gyroscope based on the attitude of the electronic device, calculates the inclination angles x_angle and y_angle about the X axis and the Y axis, and then rotates by the angle min(x_angle, AmpX) around the X axis and by the angle min(y_angle, AmpY) around the Y axis.
  • the electronic device determines the motion trajectory of the third image based on the received rotation instruction, so that the motion trajectory of the third image is more flexible.
  • the electronic device rotates the pixels in the third image based on the rotation angle to generate a video.
  • the electronic device translates the coordinate system to the pixel point corresponding to the target key point, and rotates the pixel points in the third image based on that pixel point and the rotation angle to obtain a video.
  • the target key point moves according to the above motion trajectory and finally returns to the initial position; repeating the above (B2)-(B3) yields a three-dimensional dynamic video.
  • the third image generates a three-dimensional dynamic video based on the motion trajectory, which enriches the way the image can be displayed.
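  • Steps (B1)-(B3) can be sketched as below: compute a per-frame angle such as AmpX*sin(s*t), rotate the reconstructed points about the target key point, and render each frame. The renderer is a placeholder, angles are in radians, and the constants in the commented example are assumptions; a gyroscope-driven variant would clamp the tilt with min(x_angle, AmpX) instead of using the sinusoidal schedule.

```python
import numpy as np

def rotation_angles(amp_x, amp_y, speed, frame_index):
    """Display-angle weight sin(s*t) scaled by the preset display angles."""
    weight = np.sin(speed * frame_index)
    return amp_x * weight, amp_y * weight

def rotate_and_render(points, colors, keypoint, angles_xy, render):
    """(B1)-(B3): rotate the third image's 3D points about the target key
    point by the given angles and render one video frame."""
    ax, ay = angles_xy
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(ax), -np.sin(ax)],
                      [0, np.sin(ax),  np.cos(ax)]])
    rot_y = np.array([[ np.cos(ay), 0, np.sin(ay)],
                      [ 0,          1, 0],
                      [-np.sin(ay), 0, np.cos(ay)]])
    centred = points - keypoint                 # key point as coordinate origin
    rotated = centred @ (rot_y @ rot_x).T + keypoint
    return render(rotated, colors)              # placeholder renderer

# One possible loop, using a preset trajectory over a fixed number of frames:
# frames = [rotate_and_render(pts, cols, kp,
#                             rotation_angles(0.2, 0.15, 0.1, t), render)
#           for t in range(120)]
```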
  • since the second image is obtained from the first image after background filling and depth filling, fusing the second image with the first image area where the target object is located in the first image yields the third image; when the perspective of the third image changes, the background holes can be filled while distortion or loss at the boundary of the target object is prevented, which optimizes the image effect of the generated image.
  • Figure 11 is a block diagram of an image generation apparatus according to an exemplary embodiment.
  • the device includes:
  • the first determining unit 1101 is configured to determine the first depth information of the first image area and the second depth information of the second image area in the first image, where the first image area is the image area where the target object is located, the The second image area is the image area where the background is located;
  • a replacement unit 1102 configured to acquire a second image by replacing the image data of the first image area based on the image data of the second image area;
  • the filling unit 1103 is configured to obtain third depth information of the third image area by filling the depth of the third image area based on the second depth information, where the third image area is the same as that in the second image. an image area corresponding to the first image area;
  • the first fusion unit 1104 is configured to, based on the first depth information and the third depth information, fuse the image data in the first image area into the depth-filled second image to obtain a third image.
  • the first fusion unit 1104 includes:
  • a first creation subunit configured to create a first three-dimensional model based on the image data of the first image area, where the first three-dimensional model is a three-dimensional model corresponding to the target object;
  • a second creation subunit configured to create a second three-dimensional model based on the depth-filled second image, where the second three-dimensional model is a three-dimensional model corresponding to the background;
  • a fusion subunit configured to fuse pixel information corresponding to the first three-dimensional model and the second three-dimensional model based on the first depth information and the third depth information, to obtain the third image, wherein,
  • the depth information of the pixels corresponding to the first three-dimensional model in the third image is the first depth information
  • the depth information of the pixels corresponding to the second three-dimensional model in the third image is the third depth information.
  • the fusion subunit is configured to determine, from the first three-dimensional model, depth information of each pixel of the target object, where the depth information of each pixel takes the depth information of the target key point of the target object as a reference, and the target key point is a key point of the target object; determine a first pixel point based on the target key point, where the first pixel point is the pixel point corresponding to the target key point in the second three-dimensional model; assign the pixel information and depth information of the first pixel point, where the pixel information of the first pixel point is the pixel information of the target key point in the first three-dimensional model
  • and the depth information of the first pixel point is the first depth information of the target key point; determine a second pixel point based on the positional relationship between the target key point and other pixel points in the target object, where the second pixel point is the pixel point corresponding to the other pixel points in the second three-dimensional model; and assign the pixel information and depth information of the second pixel point to obtain the third image, where
  • the pixel information of the second pixel point is the pixel information of the other pixel points in the first three-dimensional model, and the depth information of the second pixel point is the third depth information of the other pixel points.
  • the replacement unit 1102 includes:
  • a segmentation subunit configured to perform image segmentation on the first image, and determine an area outline corresponding to the first image area
  • a removal subunit configured to remove image data within the outline of the region
  • the completion sub-unit is configured to fill the background in the region outline after removal to obtain the second image.
  • the completion subunit is configured to input the removed first image into an image completion model to obtain the second image, and the image completion model is used for Fill background in area outline.
  • the first determination unit is configured to input the first image into a first depth determination model to obtain the first depth information and the second depth information.
  • the first depth determination model includes a feature extraction layer, a feature map generation layer, a feature fusion layer, and a depth determination layer;
  • the first determining unit 1101 includes:
  • a feature extraction subunit configured to input the first image to the feature extraction layer, and extract multiple layers of features of the first image through the feature extraction layer to obtain a plurality of image features of the first image;
  • sampling subunit configured to sample the plurality of image features through the feature map generation layer to obtain a plurality of feature maps of different scales
  • a feature fusion subunit configured to fuse the plurality of feature maps through the feature fusion layer to obtain a fused feature map
  • the convolution subunit is configured to obtain the first depth information and the second depth information by convolution processing the fused feature map through the depth determination layer.
  • the apparatus further includes:
  • a third determining unit is configured to determine a first coordinate and a second coordinate of the special effect element to be added, where the first coordinate is the position coordinate of the special effect element in the image coordinate system of the third image, the The second coordinate is the depth coordinate of the special effect element in the camera coordinate system of the third image;
  • the second fusion unit is configured to obtain a fourth image by fusing the special effect element to the first target pixel point of the third image based on the first coordinate and the second coordinate.
  • the first target pixel point is a pixel point whose position coordinate is the first coordinate and whose depth coordinate is the second coordinate.
  • the apparatus further includes:
  • a generating unit configured to rotate the third image to generate a video.
  • the generating unit includes:
  • a coordinate setting subunit configured to set the position coordinate corresponding to the target key point of the target object as the coordinate origin of the camera coordinate system corresponding to the third image
  • a determination subunit configured to determine a rotation angle to rotate in a direction corresponding to each coordinate axis of the camera coordinate system
  • a generating subunit is configured to rotate pixels in the third image based on the rotation angle to generate a video.
  • the determining subunit is configured to obtain a preset display angle, a preset motion speed, and a preset number of display frames of the target key point in each direction; determine a display angle weight based on the preset motion speed and the preset number of display frames; and determine the rotation angle of the direction based on the display angle weight and the preset display angle.
  • since the second image is obtained from the first image after background filling and depth filling, fusing the second image with the first image area where the target object is located in the first image yields the third image; when the perspective of the third image changes, the background holes can be filled while distortion or loss at the boundary of the target object is prevented, which optimizes the image effect of the generated image.
  • when the image generation apparatus provided by the above embodiments generates an image, the division into the above functional modules is merely used as an example; in practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the image generating apparatus and the image generating method embodiments provided by the above embodiments belong to the same concept, and the specific implementation process thereof is detailed in the method embodiments, which will not be repeated here.
  • Fig. 12 provides a training device for a depth determination model according to an exemplary embodiment.
  • the device includes:
  • the acquiring unit 1201 is configured to acquire a plurality of first image sets, each of which corresponds to an image scene;
  • the second determining unit 1202 is configured to, for each first image set, determine a sampling weight of the first image set based on a first quantity and a second quantity, where the first quantity is the number of sample images included in the first image set,
  • the second quantity is the total number of sample images included in the plurality of first image sets, the sampling weight is positively correlated with the second quantity, and the sampling weight is negatively correlated with the first quantity;
  • a sampling unit 1203, configured to sample the first image set based on the sampling weight to obtain a second image set
  • the model training unit 1204 is configured to train a second depth determination model to obtain a first depth determination model based on the plurality of second image sets.
  • the sampling weight is determined based on the first quantity and the second quantity,
  • where the first quantity is the number of sample images in the first image set
  • and the second quantity is the total number of sample images in the plurality of first image sets. Therefore, when the first image sets are sampled based on the sampling weights, the number of sample images drawn from each first image set can be controlled: a first image set containing more sample images receives a lower sampling weight, which keeps the sample images of different image categories balanced during training.
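  • The sampling weight described here is simply K/k_i, the ratio between the total number of sample images and the number in a given first image set; a brief sketch with an illustrative example:

```python
def sampling_weights(first_image_sets):
    """weight_i = K / k_i: positively correlated with the total number of
    sample images K and negatively correlated with the set's own count k_i,
    so smaller image categories are sampled relatively more often."""
    total = sum(len(s) for s in first_image_sets)      # second quantity, K
    return [total / len(s) for s in first_image_sets]  # K / k_i per image set

# Illustrative example (assumed counts): an outdoor set with 8000 images and an
# indoor set with 2000 images give weights [1.25, 5.0], so the smaller indoor
# category is sampled relatively more heavily when building the second image sets.
```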
  • when the apparatus for training a depth determination model provided by the above embodiments trains the depth determination model, the division into the above functional modules is merely used as an example; in practical applications, the above functions can be assigned to different functional modules as required, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the apparatus for training a depth determination model provided by the above embodiments and the embodiments of the training method for a depth determination model belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
  • FIG. 13 shows a structural block diagram of an electronic device 1300 provided by an exemplary embodiment of the present disclosure.
  • the electronic device 1300 is a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • Electronic device 1300 may also be called user equipment, portable terminal, laptop terminal, desktop terminal, and the like by other names.
  • the electronic device 1300 includes: a processor 1301 and a memory 1302 .
  • the processor 1301 includes one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. In some embodiments, the processor 1301 is implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). In some embodiments, the processor 1301 also includes a main processor and a coprocessor. The main processor is a processor for processing data in the wake-up state, also referred to as a CPU (Central Processing Unit);
  • the coprocessor is a low-power processor for processing data in the standby state.
  • the processor 1301 is integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is used for rendering and drawing the content that needs to be displayed on the display screen.
  • the processor 1301 further includes an AI (Artificial Intelligence, artificial intelligence) processor, where the AI processor is used to process computing operations related to machine learning.
  • memory 1302 includes one or more computer-readable storage media, which are non-transitory. In some embodiments, memory 1302 also includes high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 1302 is used to store at least one instruction, which is executed by the processor 1301 to implement the image generation method provided by the method embodiments of the present disclosure.
  • the electronic device 1300 may also optionally include: a peripheral device interface 1303 and at least one peripheral device.
  • the processor 1301, the memory 1302 and the peripheral device interface 1303 are connected by a bus or a signal line.
  • each peripheral device is connected to the peripheral device interface 1303 through a bus, signal line or circuit board.
  • the peripheral device includes at least one of: a radio frequency circuit 1304 , a display screen 1305 , a camera assembly 1306 , an audio circuit 1307 , a positioning assembly 1308 , and a power supply 1309 .
  • the peripheral device interface 1303 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 1301 and the memory 1302 .
  • processor 1301, memory 1302, and peripherals interface 1303 are integrated on the same chip or circuit board; in some other embodiments, any one or two of processor 1301, memory 1302, and peripherals interface 1303 are implemented on a separate chip or circuit board, which is not limited by the embodiments of the present disclosure.
  • the radio frequency circuit 1304 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 1304 communicates with communication networks and other communication devices via electromagnetic signals.
  • the radio frequency circuit 1304 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • radio frequency circuitry 1304 includes: an antenna system, an RF transceiver, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and the like.
  • radio frequency circuitry 1304 communicates with other terminals via at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: World Wide Web, Metropolitan Area Network, Intranet, various generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area network and/or WiFi (Wireless Fidelity, Wireless Fidelity) network.
  • the radio frequency circuit 1304 further includes a circuit related to NFC (Near Field Communication, short-range wireless communication), which is not limited in the present disclosure.
  • the display screen 1305 is used to display UI (User Interface, user interface).
  • the UI includes graphics, text, icons, video, and any combination thereof.
  • the display screen 1305 also has the ability to acquire touch signals on or above the surface of the display screen 1305 .
  • the touch signal is input to the processor 1301 as a control signal for processing.
  • the display screen 1305 is also used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • the display screen 1305 is a flexible display screen disposed on a curved or folded surface of the electronic device 1300. The display screen 1305 can even be set as a non-rectangular irregular shape, that is, a special-shaped screen.
  • the display screen 1305 is made of materials such as LCD (Liquid Crystal Display, liquid crystal display), OLED (Organic Light-Emitting Diode, organic light emitting diode).
  • the camera assembly 1306 is used to capture images or video.
  • camera assembly 1306 includes a front-facing camera and a rear-facing camera.
  • the front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal.
  • there are at least two rear cameras, each of which is any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize the background blur function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fused shooting functions.
  • the camera assembly 1306 also includes a flash.
  • the flash is a single color temperature flash, and in some embodiments, the flash is a dual color temperature flash. Dual color temperature flash refers to the combination of warm light flash and cold light flash, which is used for light compensation under different color temperatures.
  • the audio circuit 1307 includes a microphone and a speaker.
  • the microphone is used to collect the sound waves of the user and the environment, convert the sound waves into electrical signals, and input them to the processor 1301 for processing, or to the radio frequency circuit 1304 to realize voice communication.
  • there are multiple microphones which are respectively disposed in different parts of the electronic device 1300 .
  • the microphones are array microphones or omnidirectional collection microphones.
  • the speaker is used to convert the electrical signal from the processor 1301 or the radio frequency circuit 1304 into sound waves.
  • the loudspeaker is a conventional thin-film loudspeaker, and in some embodiments, the loudspeaker is a piezoelectric ceramic loudspeaker.
  • when the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for distance measurement and other purposes.
  • the audio circuit 1307 also includes a headphone jack.
  • the positioning component 1308 is used to locate the current geographic location of the electronic device 1300 to implement navigation or LBS (Location Based Service).
  • the positioning component 1308 is a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, or the Galileo system of the European Union.
  • Power supply 1309 is used to power various components in electronic device 1300 .
  • the power source 1309 is alternating current, direct current, a disposable battery, or a rechargeable battery.
  • the rechargeable battery is a wired rechargeable battery or a wireless rechargeable battery. Wired rechargeable batteries are batteries that are charged through wired lines, and wireless rechargeable batteries are batteries that are charged through wireless coils. The rechargeable battery is also used to support fast charging technology.
  • the electronic device 1300 also includes one or more sensors 1310 .
  • the one or more sensors 1310 include, but are not limited to, an acceleration sensor 1311 , a gyro sensor 1312 , a pressure sensor 1313 , a fingerprint sensor 1314 , an optical sensor 1315 and a proximity sensor 1316 .
  • the acceleration sensor 1311 detects the magnitude of acceleration on the three coordinate axes of the coordinate system established by the electronic device 1300 .
  • the acceleration sensor 1311 is used to detect the components of the gravitational acceleration on the three coordinate axes.
  • the processor 1301 controls the display screen 1305 to display the user interface in a landscape view or a portrait view based on the gravitational acceleration signal collected by the acceleration sensor 1311 .
  • the acceleration sensor 1311 is also used for game or user movement data collection.
  • the gyroscope sensor 1312 detects the body direction and rotation angle of the electronic device 1300 , and the gyroscope sensor 1312 cooperates with the acceleration sensor 1311 to collect 3D actions of the user on the electronic device 1300 .
  • the processor 1301 can implement the following functions: motion sensing (such as changing the UI based on the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 1313 is disposed on the side frame of the electronic device 1300 and/or the lower layer of the display screen 1305 .
  • the pressure sensor 1313 can detect the user's holding signal of the electronic device 1300 , and the processor 1301 performs left and right hand recognition or quick operation based on the holding signal collected by the pressure sensor 1313 .
  • the processor 1301 controls the operability controls on the UI interface based on the user's pressure operation on the display screen 1305.
  • the operability controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.
  • the fingerprint sensor 1314 is used to collect the user's fingerprint, and the processor 1301 identifies the user's identity based on the fingerprint collected by the fingerprint sensor 1314, or the fingerprint sensor 1314 identifies the user's identity based on the collected fingerprint. When the user's identity is identified as a trusted identity, the processor 1301 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings.
  • the fingerprint sensor 1314 is disposed on the front, back, or side of the electronic device 1300 . When the electronic device 1300 is provided with physical buttons or a manufacturer's logo, the fingerprint sensor 1314 is integrated with the physical buttons or the manufacturer's logo.
  • Optical sensor 1315 is used to collect ambient light intensity.
  • the processor 1301 controls the display brightness of the display screen 1305 based on the ambient light intensity collected by the optical sensor 1315 . In some embodiments, when the ambient light intensity is high, the display brightness of the display screen 1305 is increased; when the ambient light intensity is low, the display brightness of the display screen 1305 is decreased. In another embodiment, the processor 1301 further dynamically adjusts the shooting parameters of the camera assembly 1306 based on the ambient light intensity collected by the optical sensor 1315 .
  • Proximity sensor 1316 also referred to as a distance sensor, is typically provided on the front panel of electronic device 1300 .
  • Proximity sensor 1316 is used to collect the distance between the user and the front of electronic device 1300 .
  • when the proximity sensor 1316 detects that the distance between the user and the front of the electronic device 1300 gradually decreases, the processor 1301 controls the display screen 1305 to switch from the bright-screen state to the off-screen state; when the proximity sensor 1316 detects that the distance between the user and the front of the electronic device 1300 gradually increases, the processor 1301 controls the display screen 1305 to switch from the off-screen state to the bright-screen state.
  • the structure shown in FIG. 13 does not constitute a limitation on the electronic device 1300, which can include more or fewer components than shown, combine some components, or adopt a different component arrangement.
  • a computer-readable storage medium in which at least one piece of program code is stored, and at least one piece of program code is loaded and executed by a server to implement the image generation method in the above embodiment.
  • a computer-readable storage medium stores at least one piece of program code, and the at least one piece of program code is loaded and executed by a server, so as to realize the depth determination model in the above embodiment. training method.
  • the computer-readable storage medium is a memory.
  • the computer-readable storage medium is ROM (Read-Only Memory, read-only memory), RAM (Random Access Memory, random access memory), CD-ROM (Compact Disc Read-Only Memory, compact disc read-only storage) devices), magnetic tapes, floppy disks, and optical data storage devices, etc.
  • a computer program product or computer program comprising computer program code stored in a computer-readable storage medium; the processor of a computer device reads the computer program code from the computer-readable storage medium and executes it, causing the computer device to perform the operations performed in the image generation method described above.
  • a computer program product or computer program comprising computer program code stored in a computer-readable storage medium; the processor of a computer device reads the computer program code from the computer-readable storage medium and executes it, causing the computer device to perform the operations performed in the above-mentioned training method for a depth determination model.


Abstract

An image generation method and an electronic device, relating to the technical field of image processing. The method includes: determining first depth information of a first image area and second depth information of a second image area in a first image, the first image area being the image area where a target object is located and the second image area being the image area where the background is located; acquiring a second image by replacing the image data of the first image area based on the image data of the second image area; acquiring third depth information of a third image area by filling the depth of the third image area based on the second depth information; and acquiring a third image by fusing, based on the first depth information and the third depth information, the image data in the first image area into the depth-filled second image.

Description

图像生成方法及电子设备
本公开基于申请号为202010947268.1、申请日为2020年9月10日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本公开作为参考。
技术领域
本公开涉及图像处理技术领域,特别涉及一种图像生成方法及电子设备。
背景技术
随着图像处理技术的发展,用户能随时随地拍摄图像,而由于拍摄的图像是二维图像,因此,该图像只能呈现到平面效果的景物和人像,在用户希望查看具有三维效果的景物和人像的时候,需要依靠后期制作,将二维图像转换为三维图像来实现。
相关技术中,通过双摄相机拍摄同一场景的不同角度的二维图像,确定不同角度的二维图像之间的差异信息,将该差异信息转化成二维图像的深度信息,基于该深度信息重建三维图像。
发明内容
本公开实施例提供了一种图像生成方法及电子设备,能够优化生成的三维图像的图像效果。所述技术方案如下:
根据本公开实施例的一方面,提供了一种图像生成方法,所述方法包括:
确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,所述第一图像区域为目标对象所在的图像区域,所述第二图像区域为背景所在的图像区域;
通过基于所述第二图像区域的图像数据替换所述第一图像区域的图像数据,获取第二图像;
通过基于所述第二深度信息填充所述第三图像区域的深度,获取第三图像区域的第三深度信息,所述第三图像区域为所述第二图像中与所述第一图像区域对应的图像区域;
通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像。
在一些实施例中,所述通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像,包括:
基于所述第一图像区域的图像数据,创建第一三维模型,所述第一三维模型为所述目标对象对应的三维模型;
基于深度填充后的所述第二图像,创建第二三维模型,所述第二三维模型为所述背景对应的三维模型;
基于所述第一深度信息和所述第三深度信息,融合所述第一三维模型和所述第二三维模型对应的像素信息,得到所述第三图像,其中,所述第一三维模型对应的像素点在所述第三图像中的深度信息为所述第一深度信息,所述第二三维模型对应的像素点在所述第三图像中的深度信息为所述第三深度信息。
在一些实施例中,所述基于所述第一深度信息和所述第三深度信息,融合所述第一三维模型和所述第二三维模型对应的像素信息,得到所述第三图像,包括:
从所述第一三维模型中,确定所述目标对象的每个像素点的深度信息,所述每个像素点的深度信息以所述目标对象的目标关键点的深度信息为基准,所述目标关键点为所述目标对象的关键点;
基于所述目标关键点,确定第一像素点,所述第一像素点为所述目标关键点在所述第二三维模型中对应的像素点;
赋值所述第一像素点的像素信息和深度信息,所述第一像素点的像素信息为所述目标关键点在所述第一三维模型中的像素信息,所述第一像素点的深度信息为所述目标关键点的第一深度信息;
基于所述目标关键点与所述目标对象中其他像素点的位置关系,确定第二像素点,所述第二像素点为所述其他像素点在所述第二三维模型中对应的像素点;
赋值所述第二像素点的像素信息和深度信息,得到所述第三图像,所述第二像素点的像素信息为所述其他像素点在所述第一三维模型中的像素信息,所述第二像素点的深度信息为所述其他像素点的第三深度信息。
在一些实施例中,所述通过基于所述第二图像区域的图像数据替换所述第一图像区域的图像数据,获取第二图像,包括:
对所述第一图像进行图像分割,确定所述第一图像区域对应的区域轮廓;
去除所述区域轮廓内的图像数据;
在去除后的所述区域轮廓中填充背景,得到所述第二图像。
在一些实施例中,所述在去除后的所述区域轮廓中填充背景,得到所述第二图像,包括:
将去除后的所述第一图像输入至图像补全模型,得到所述第二图像,所述图像补全模型用于在所述区域轮廓中填充背景。
在一些实施例中,所述确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,包括:
将所述第一图像输入至第一深度确定模型中,得到所述第一深度信息和所述第二深度信息。
在一些实施例中,所述第一深度确定模型包括特征提取层、特征图生成层、特征融合层和深度确定层;
所述将所述第一图像输入至第一深度确定模型中,得到所述第一深度信息和所述第二深度信息,包括:
将所述第一图像输入至所述特征提取层,通过所述特征提取层提取所述第一图像的多层特征得到所述第一图像的多个图像特征;
通过所述特征图生成层采样所述多个图像特征得到不同尺度的多个特征图;
通过所述特征融合层融合所述多个特征图得到融合后的特征图;
通过所述深度确定层卷积处理所述融合后的特征图得到所述第一深度信息和所述第二深度信息。
在一些实施例中,所述方法还包括:
确定待添加的特效元素的第一坐标和第二坐标,所述第一坐标为所述特效元素在所述第三图像的图像坐标系下的位置坐标,所述第二坐标为所述特效元素在所述第三图像的相机坐标系下的深度坐标;
通过基于所述第一坐标和所述第二坐标,将所述特效元素融合至所述第三图像的第一目标像素点,获取第四图像,所述第一目标像素点为位置坐标为所述第一坐标,深度坐标为所述第二坐标的像素点。
在一些实施例中,所述方法还包括:
旋转所述第三图像,生成视频。
在一些实施例中,所述旋转所述第三图像,生成视频,包括:
将所述目标对象的目标关键点对应的位置坐标设置为所述第三图像对应的相机坐标系的坐标原点;
确定向所述相机坐标系的每个坐标轴对应的方向进行旋转的旋转角度;
基于所述旋转角度,旋转所述第三图像中的像素点,生成视频。
在一些实施例中,所述确定向所述相机坐标系的每个坐标轴对应的方向进行旋转的旋转角度,包括:
获取所述目标关键点在每个方向的预设展示角度、预设运动速度和预设展示帧数;
基于所述预设运动速度和预设展示帧数,确定展示角度权重;
基于所述展示角度权重和所述预设展示角度,确定所述方向的旋转角度。
根据本公开实施例的另一方面,提供了一种深度确定模型的训练方法,所述方法包括:
获取多个第一图像集合,每个第一图像集合对应一个图像场景;
对于每个第一图像集合,基于第一数量和第二数量,确定所述第一图像集合的采样权重,所述第一数量为第一图像集合中包括的样本图像的数量,所述第二数量为所述多个第一图像集合中包括的样本图像的总数量,所述采样权重与所述第二数量正相关,且所述采样权重与所述第一数量负相关;
基于所述采样权重,采样所述第一图像集合,得到第二图像集合;
基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
根据本公开实施例的另一方面,提供了一种图像生成装置,所述装置包括:
第一确定单元,被配置为确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,所述第一图像区域为目标对象所在的图像区域,所述第二图像区域为背景所在的图像区域;
替换单元,被配置为通过基于所述第二图像区域的图像数据,替换所述第一图像区域的图像数据,获取第二图像;
填充单元,被配置为通过基于所述第二深度信息填充所述第三图像区域的深度,获取第三图像区域的第三深度信息,所述第三图像区域为所述第二图像中与所述第一图像区域对应的图像区域;
第一融合单元,被配置为通过基于所述第一深度信息和所述第三深度信息, 将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像。
在一些实施例中,所述第一融合单元包括:
第一创建子单元,被配置为基于所述第一图像区域的图像数据,创建第一三维模型,所述第一三维模型为所述目标对象对应的三维模型;
第二创建子单元,被配置为基于深度填充后的所述第二图像,创建第二三维模型,所述第二三维模型为所述背景对应的三维模型;
融合子单元,被配置为基于所述第一深度信息和所述第三深度信息,融合所述第一三维模型和所述第二三维模型对应的像素信息,得到所述第三图像,其中,所述第一三维模型对应的像素点在所述第三图像中的深度信息为所述第一深度信息,所述第二三维模型对应的像素点在所述第三图像中的深度信息为所述第三深度信息。
在一些实施例中,所述融合子单元,被配置为从所述第一三维模型中,确定所述目标对象的每个像素点的深度信息,所述每个像素点的深度信息以所述目标对象的目标关键点的深度信息为基准,所述目标关键点为所述目标对象的关键点;基于所述目标关键点,确定第一像素点,所述第一像素点为所述目标关键点在所述第二三维模型中对应的像素点;赋值所述第一像素点的像素信息和深度信息,所述第一像素点的像素信息为所述目标关键点在所述第一三维模型中的像素信息,所述第一像素点的深度信息为所述目标关键点的第一深度信息;基于所述目标关键点与所述目标对象中其他像素点的位置关系,确定第二像素点,所述第二像素点为所述其他像素点在所述第二三维模型中对应的像素点;赋值所述第二像素点的像素信息和深度信息,得到所述第三图像,所述第二像素点的像素信息为所述其他像素点在所述第一三维模型中的像素信息,所述第二像素点的深度信息为所述其他像素点的第三深度信息。
在一些实施例中,所述替换单元包括:
分割子单元,被配置为对所述第一图像进行图像分割,确定所述第一图像区域对应的区域轮廓;
去除子单元,被配置为去除所述区域轮廓内的图像数据;
补全子单元,被配置为在去除后的所述区域轮廓中填充背景,得到所述第二图像。
在一些实施例中,所述补全子单元,被配置为将去除后的所述第一图像输入 至图像补全模型,得到所述第二图像,所述图像补全模型用于在所述区域轮廓中填充背景。
在一些实施例中,所述第一确定单元,被配置为将所述第一图像输入至第一深度确定模型中,得到所述第一深度信息和所述第二深度信息。
在一些实施例中,所述第一深度确定模型包括特征提取层、特征图生成层、特征融合层和深度确定层;
所述第一确定单元包括:
特征提取子单元,被配置为将所述第一图像输入至所述特征提取层,通过所述特征提取层提取所述第一图像的多层特征得到所述第一图像的多个图像特征;
采样子单元,被配置为通过所述特征图生成层采样所述多个图像特征得到不同尺度的多个特征图;
特征融合子单元,被配置为通过所述特征融合层融合所述多个特征图得到融合后的特征图;
卷积子单元,被配置为通过所述深度确定层卷积处理所述融合后的特征图得到所述第一深度信息和所述第二深度信息。
在一些实施例中,所述装置还包括:
第三确定单元,被配置为确定待添加的特效元素的第一坐标和第二坐标,所述第一坐标为所述特效元素在所述第三图像的图像坐标系下的位置坐标,所述第二坐标为所述特效元素在所述第三图像的相机坐标系下的深度坐标;
第二融合单元,被配置为通过基于所述第一坐标和所述第二坐标,将所述特效元素融合至所述第三图像的第一目标像素点,获取第四图像,所述第一目标像素点为位置坐标为所述第一坐标,深度坐标为所述第二坐标的像素点。
在一些实施例中,所述装置还包括:
生成单元,被配置为旋转所述第三图像,生成视频。
在一些实施例中,所述生成单元,包括:
坐标设置子单元,被配置为将所述目标对象的目标关键点对应的位置坐标设置为所述第三图像对应的相机坐标系的坐标原点;
确定子单元,被配置为确定向所述相机坐标系的每个坐标轴对应的方向进行旋转的旋转角度;
生成子单元,被配置为基于所述旋转角度,旋转所述第三图像中的像素点,生成视频。
在一些实施例中,所述确定子单元,被配置为:
获取子单元,被配置为获取所述目标关键点在每个方向的预设展示角度、预设运动速度和预设展示帧数;基于所述预设运动速度和预设展示帧数,确定展示角度权重;基于所述展示角度权重和所述预设展示角度,确定所述方向的旋转角度。
根据本公开实施例的另一方面,提供了一种深度确定模型的训练装置,所述装置包括:
获取单元,被配置为获取多个第一图像集合,每个第一图像集合对应一个图像场景;
第二确定单元,被配置为对于每个第一图像集合,基于第一数量和第二数量,确定所述第一图像集合的采样权重,所述第一数量为第一图像集合中包括的样本图像的数量,所述第二数量为所述多个第一图像集合中包括的样本图像的总数量,所述采样权重与所述第二数量正相关,且所述采样权重与所述第一数量负相关;
采样单元,被配置为基于所述采样权重,采样所述第一图像集合,得到第二图像集合;
模型训练单元,被配置为基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
根据本公开实施例的另一方面,提供了一种电子设备,所述电子设备包括处理器和存储器,所述存储器中存储有至少一条程序代码,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,所述第一图像区域为目标对象所在的图像区域,所述第二图像区域为背景所在的图像区域;
通过基于所述第二图像区域的图像数据替换所述第一图像区域的图像数据,获取第二图像;
通过基于所述第二深度信息填充所述第三图像区域的深度,获取第三图像区域的第三深度信息,所述第三图像区域为所述第二图像中与所述第一图像区域对应的图像区域;
通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像。
在一些实施例中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
基于所述第一图像区域的图像数据,创建第一三维模型,所述第一三维模型为所述目标对象对应的三维模型;
基于深度填充后的所述第二图像,创建第二三维模型,所述第二三维模型为所述背景对应的三维模型;
基于所述第一深度信息和所述第三深度信息,融合所述第一三维模型和所述第二三维模型对应的像素信息,得到所述第三图像,其中,所述第一三维模型对应的像素点在所述第三图像中的深度信息为所述第一深度信息,所述第二三维模型对应的像素点在所述第三图像中的深度信息为所述第三深度信息。
在一些实施例中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
从所述第一三维模型中,确定所述目标对象的每个像素点的深度信息,所述每个像素点的深度信息以所述目标对象的目标关键点的深度信息为基准,所述目标关键点为所述目标对象的关键点;
基于所述目标关键点,确定第一像素点,所述第一像素点为所述目标关键点在所述第二三维模型中对应的像素点;
赋值所述第一像素点的像素信息和深度信息,所述第一像素点的像素信息为所述目标关键点在所述第一三维模型中的像素信息,所述第一像素点的深度信息为所述目标关键点的第一深度信息;
基于所述目标关键点与所述目标对象中其他像素点的位置关系,确定第二像素点,所述第二像素点为所述其他像素点在所述第二三维模型中对应的像素点;
赋值所述第二像素点的像素信息和深度信息,得到所述第三图像,所述第二像素点的像素信息为所述其他像素点在所述第一三维模型中的像素信息,所述第二像素点的深度信息为所述其他像素点的第三深度信息。
在一些实施例中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
对所述第一图像进行图像分割,确定所述第一图像区域对应的区域轮廓;
去除所述区域轮廓内的图像数据;
在去除后的所述区域轮廓中填充背景,得到所述第二图像。
在一些实施例中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
将去除后的所述第一图像输入至图像补全模型,得到所述第二图像,所述图像补全模型用于在所述区域轮廓中填充背景。
在一些实施例中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
将所述第一图像输入至第一深度确定模型中,得到所述第一深度信息和所述第二深度信息。
在一些实施例中,所述第一深度确定模型包括特征提取层、特征图生成层、特征融合层和深度确定层;所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
将所述第一图像输入至所述特征提取层,通过所述特征提取层提取所述第一图像的多层特征得到所述第一图像的多个图像特征;
通过所述特征图生成层采样所述多个图像特征得到不同尺度的多个特征图;
通过所述特征融合层融合所述多个特征图得到融合后的特征图;
通过所述深度确定层卷积处理所述融合后的特征图得到所述第一深度信息和所述第二深度信息。
在一些实施例中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
确定待添加的特效元素的第一坐标和第二坐标,所述第一坐标为所述特效元素在所述第三图像的图像坐标系下的位置坐标,所述第二坐标为所述特效元素在所述第三图像的相机坐标系下的深度坐标;
通过基于所述第一坐标和所述第二坐标,将所述特效元素融合至所述第三图像的第一目标像素点,获取第四图像,所述第一目标像素点为位置坐标为所述第一坐标,深度坐标为所述第二坐标的像素点。
在一些实施例中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
旋转所述第三图像,生成视频。
在一些实施例中,所述至少一条程序代码由所述处理器加载并执行,以实现 如下步骤:
将所述目标对象的目标关键点对应的位置坐标设置为所述第三图像对应的相机坐标系的坐标原点;
确定向所述相机坐标系的每个坐标轴对应的方向进行旋转的旋转角度;
基于所述旋转角度,旋转所述第三图像中的像素点,生成视频。
在一些实施例中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
获取所述目标关键点在每个方向的预设展示角度、预设运动速度和预设展示帧数;
基于所述预设运动速度和预设展示帧数,确定展示角度权重;
基于所述展示角度权重和所述预设展示角度,确定所述方向的旋转角度。
根据本公开实施例的另一方面,提供了一种电子设备,
所述电子设备包括处理器和存储器,所述存储器中存储有至少一条程序代码,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
获取多个第一图像集合,每个第一图像集合对应一个图像场景;
对于每个第一图像集合,基于第一数量和第二数量,确定所述第一图像集合的采样权重,所述第一数量为第一图像集合中包括的样本图像的数量,所述第二数量为所述多个第一图像集合中包括的样本图像的总数量,所述采样权重与所述第二数量正相关,且所述采样权重与所述第一数量负相关;
基于所述采样权重,采样所述第一图像集合,得到第二图像集合;
基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
根据本公开实施例的另一方面,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条程序代码,所述至少一条程序代码由处理器加载并执行,以实现如下步骤:
确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,所述第一图像区域为目标对象所在的图像区域,所述第二图像区域为背景所在的图像区域;
通过基于所述第二图像区域的图像数据替换所述第一图像区域的图像数据,获取第二图像;
通过基于所述第二深度信息填充所述第三图像区域的深度,获取第三图像区域的第三深度信息,所述第三图像区域为所述第二图像中与所述第一图像区域对应的图像区域;
通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像。
根据本公开实施例的另一方面,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条程序代码,所述至少一条程序代码由处理器加载并执行,以实现如下步骤:
获取多个第一图像集合,每个第一图像集合对应一个图像场景;
对于每个第一图像集合,基于第一数量和第二数量,确定所述第一图像集合的采样权重,所述第一数量为第一图像集合中包括的样本图像的数量,所述第二数量为所述多个第一图像集合中包括的样本图像的总数量,所述采样权重与所述第二数量正相关,且所述采样权重与所述第一数量负相关;
基于所述采样权重,采样所述第一图像集合,得到第二图像集合;
基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
根据本公开实施例的另一方面,提供了一种计算机程序产品或计算机程序,所述计算机程序产品或所述计算机程序包括计算机程序代码,所述计算机程序代码存储在计算机可读存储介质中,计算机设备的处理器从计算机可读存储介质读取所述计算机程序代码,处理器执行所述计算机程序代码,使得所述计算机设备执行如下步骤:
确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,所述第一图像区域为目标对象所在的图像区域,所述第二图像区域为背景所在的图像区域;
通过基于所述第二图像区域的图像数据替换所述第一图像区域的图像数据,获取第二图像;
通过基于所述第二深度信息填充所述第三图像区域的深度,获取第三图像区域的第三深度信息,所述第三图像区域为所述第二图像中与所述第一图像区域对应的图像区域;
通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中 的图像数据融合至深度填充后的所述第二图像中,获取第三图像。
根据本公开实施例的另一方面,提供了一种计算机程序产品或计算机程序,所述计算机程序产品或所述计算机程序包括计算机程序代码,所述计算机程序代码存储在计算机可读存储介质中,计算机设备的处理器从计算机可读存储介质读取所述计算机程序代码,处理器执行所述计算机程序代码,使得所述计算机设备执行如下步骤:
获取多个第一图像集合,每个第一图像集合对应一个图像场景;
对于每个第一图像集合,基于第一数量和第二数量,确定所述第一图像集合的采样权重,所述第一数量为第一图像集合中包括的样本图像的数量,所述第二数量为所述多个第一图像集合中包括的样本图像的总数量,所述采样权重与所述第二数量正相关,且所述采样权重与所述第一数量负相关;
基于所述采样权重,采样所述第一图像集合,得到第二图像集合;
基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
在本公开实施例中,由于第二图像是在第一图像中经过背景填充和深度填充后得到的,这样将第二图像和第一图像中目标对象所在的第一图像区域进行融合,得到第三图像,在第三图像的视角发生变化时,能够填补背景空洞的同时,还防止目标对象的边界处出现扭曲或缺失,优化了生成的图像的图像效果。
附图说明
为了更清楚地说明本公开实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本公开的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还能够根据这些附图获得其他的附图。
图1是根据一示例性实施例提供的一种图像生成方法流程图;
图2是根据一示例性实施例提供的一种图像生成方法流程图;
图3是根据一示例性实施例提供的一种图像处理的示意图;
图4是根据一示例性实施例提供的一种图像处理的示意图;
图5是根据一示例性实施例提供的一种图像处理的示意图;
图6是根据一示例性实施例提供的一种图像生成方法流程图;
图7是根据一示例性实施例提供的一种图像生成方法流程图;
图8是根据一示例性实施例提供的一种图像处理的示意图;
图9是根据一示例性实施例提供的一种图像生成方法流程图;
图10是根据一示例性实施例提供的一种图像处理的示意图;
图11是根据一示例性实施例提供的一种图像生成装置的框图;
图12是根据一示例性实施例提供的一种深度确定模型的训练方法流程图;
图13是根据一示例性实施例提供的一种电子设备的结构示意图。
具体实施方式
为使本公开的目的、技术方案和优点更加清楚,下面将结合附图对本公开实施方式作进一步地详细描述。
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。
为了使采集到的图像能够以三维图像的形式进行展示,电子设备对采集到的画面进行图像处理,生成三维图像,将三维图像展示给用户。三维图像是指具有三维效果的图像。本公开实施例提供的方案应用在电子设备中,电子设备为具有图像采集功能的电子设备。例如,电子设备为摄像机,或者,电子设备为有摄像头的手机、平板电脑或可穿戴设备等。在本公开实施例中,对电子设备不做具体限定。
例如,本公开实施例提供的图像生成方法能够应用在如下几个场景中:
在一个场景:电子设备在拍摄图像时,直接按照本公开实施例提供的方法,将拍摄得到的二维图像转换为三维图像。
在另一个场景中,电子设备在拍摄得到二维图像后,将二维图像存储至电子设备中;在用户通过电子设备分享二维图像时,电子设备通过本公开实施例提供的方法,将二维图像转换为三维图像,分享该三维图像。其中,分享图像包括向其他用户分享图像、向社交展示平台分享图像、向短视频平台分享图像等中的至少一项。
在另一个场景中,电子设备在拍摄得到二维图像后,将二维图像存储至电子设备中;在用户通过电子设备生成视频时,电子设备获取被选择的多个二维图像, 通过本公开实施例提供的方法,将多个二维图像转换为多个三维图像,将多个三维图像合成视频。例如,用户在短视频平台分享视频时,先选择包含人脸的多个二维的自拍图像,通过本申请实施例提供的方法,将多个二维的自拍图像转换为多个三维的自拍图像,将多个三维的自拍图像合成视频,向短视频平台分享得到的该视频。
图1为根据一示例性实施例提供的一种图像生成方法流程图。如图1所示,该方法包括以下步骤:
步骤101:确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,该第一图像区域为目标对象所在的图像区域,该第二图像区域为背景所在的图像区域。
步骤102:通过基于该第二图像区域的图像数据替换该第一图像区域的图像数据,获取第二图像。
步骤103:通过基于该第二深度信息填充第三图像区域的深度,获取第三图像区域的第三深度信息,该第三图像区域为第二图像中与该第一图像区域对应的图像区域。
步骤104:通过基于该第一深度信息和该第三深度信息,将该第一图像区域中的图像数据融合至深度填充后的该第二图像中,获取第三图像。
在一些实施例中,通过基于该第一深度信息和该第三深度信息,将该第一图像区域中的图像数据融合至深度填充后的该第二图像中,获取第三图像,包括:
基于该第一图像区域的图像数据,创建第一三维模型,该第一三维模型为该目标对象对应的三维模型;
基于深度填充后的该第二图像,创建第二三维模型,该第二三维模型为该背景对应的三维模型;
基于该第一深度信息和该第三深度信息,融合该第一三维模型和该第二三维模型对应的像素信息,得到该第三图像,其中,该第一三维模型对应的像素点在该第三图像中的深度信息为该第一深度信息,该第二三维模型对应的像素点在该第三图像中的深度信息为该第三深度信息。
在一些实施例中,该基于该第一深度信息和该第三深度信息,融合该第一三维模型和该第二三维模型的像素信息,得到该第三图像,包括:
从该第一三维模型中,确定该目标对象的每个像素点的深度信息,该每个像 素点的深度信息以该目标对象的目标关键点的深度信息为基准,该目标关键点为该目标对象的关键点;
基于该目标关键点,确定第一像素点,第一像素点为目标关键点在该第二三维模型中对应的像素点;
赋值第一像素点的像素信息和深度信息,第一像素点的像素信息为该目标关键点在第一三维模型中的像素信息,第一像素点的深度信息为该目标关键点的第一深度信息;
基于该目标关键点与该目标对象中其他像素点的位置关系,确定第二像素点,第二像素点为其他像素点在该第二三维模型中对应的像素点;
赋值第二像素点的像素信息和深度信息,得到第三图像,第二像素点的像素信息为该其他像素点在第一三维模型中的像素信息,第二像素点的深度信息为该其他像素点的第三深度信息。
在一些实施例中,通过基于该第二图像区域的图像数据替换该第一图像区域的图像数据,获取第二图像,包括:
对该第一图像进行图像分割,确定该第一图像区域对应的区域轮廓;
去除该区域轮廓内的图像数据;
在去除后的区域轮廓中填充背景,得到第二图像。
在一些实施例中,在去除后的区域轮廓中填充背景,得到第二图像,包括:
将去除后的第一图像输入至图像补全模型,得到该第二图像,该图像补全模型用于在区域轮廓中填充背景。
在一些实施例中,确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,包括:
将第一图像输入至第一深度确定模型中,得到第一深度信息和第二深度信息。
在一些实施例中,该第一深度确定模型包括特征提取层、特征图生成层、特征融合层和深度确定层;
将第一图像输入至第一深度确定模型中,得到第一深度信息和第二深度信息,包括:
将该第一图像输入至该特征提取层,通过特征提取层提取第一图像的多层特征得到第一图像的多个图像特征;
通过特征图生成层采样多个图像特征得到不同尺度的多个特征图;
通过特征融合层融合多个特征图得到融合后的特征图;
通过深度确定层卷积处理融合后的特征图得到第一深度信息和第二深度信息。
在一些实施例中,该方法还包括:
确定待添加的特效元素的第一坐标和第二坐标,第一坐标为特效元素在第三图像的图像坐标系下的位置坐标,第二坐标为特效元素在第三图像的相机坐标系下的深度坐标;
通过基于第一坐标和第二坐标，将特效元素融合至第三图像的第一目标像素点，获取第四图像，第一目标像素点为位置坐标为第一坐标、深度坐标为第二坐标的像素点。
在一些实施例中,该方法还包括:
旋转第三图像,生成视频。
在一些实施例中,旋转第三图像,生成视频,包括:
将目标对象的目标关键点对应的位置坐标设置为第三图像对应的相机坐标系的坐标原点;
确定向相机坐标系的每个坐标轴对应的方向进行旋转的旋转角度;
基于旋转角度,旋转第三图像中的像素点,生成视频。
在一些实施例中,确定向相机坐标系的每个坐标轴对应的方向进行旋转的旋转角度,包括:
获取目标关键点在每个方向的预设展示角度、预设运动速度和预设展示帧数;
基于该预设运动速度和预设展示帧数,确定展示角度权重;
基于该展示角度权重和该预设展示角度，确定该方向的旋转角度。
在本公开实施例中,由于第二图像是在第一图像中经过背景填充和深度填充后得到的,这样将第二图像和第一图像中目标对象所在的第一图像区域进行融合,得到第三图像,在第三图像的视角发生变化时,能够填补背景空洞的同时,还防止目标对象的边界处出现扭曲或缺失,优化了生成的图像的图像效果。
图2为根据一示例性实施例提供的一种图像生成方法流程图。在本公开实施例中以训练第一深度确定模型为例进行说明。如图2所示,该方法包括以下步骤:
步骤201:电子设备获取多个第一图像集合,每个第一图像集合对应一个图像类别。
图像类别用于表示图像所属的场景,也即图像类别为图像场景,图像场景包括室内场景和室外场景。第一图像集合中包括多个样本图像,样本图像中标记该样本图像中像素点的深度信息。对于每个第一图像集合,电子设备获取该第一图像集合的步骤包括:电子设备获取多个图像,该多个图像的类别为该图像类别;标记该多个图像中像素点的深度信息,得到多个样本图像,将多个样本图像组成第一图像集合。
在一些实施例中,电子设备获取到多个第一图像集合后,将多个第一图像集合划分为训练数据和测试数据。训练数据用于对于模型进行训练,测试数据用于确定训练得到的模型是否符合要求。
在一些实施例中，电子设备从多个第一图像集合中选择部分样本图像作为训练数据，将多个第一图像集合中剩下的样本图像作为测试数据。在一些实施例中，电子设备从每个第一图像集合中选择部分样本图像，将从每个第一图像集合中选择的样本图像组成训练数据，将每个第一图像集合中剩下的样本图像组成测试数据。例如，电子设备获取两个第一图像集合，分别为图像集合A和图像集合B，图像集合A包括的样本图像的拍摄场景为室外，也即图像集合A的图像类别为室外；图像集合B包括的样本图像的拍摄场景为室内，也即图像集合B的图像类别为室内；电子设备分别从图像集合A和图像集合B中选择部分样本图像，将选择的样本图像组成训练数据，将图像集合A中剩余的样本图像和图像集合B中剩余的样本图像组成测试数据。
在本实现方式中,由于每个第一图像集合对应一个图像类别,因此通过多个图像集合进行后续的模型训练,使得能够针对不同图像类别下深度的差异对模型进行训练,从而提高了训练得到的第一深度确定模型的准确性。
步骤202:对于每个第一图像集合,电子设备基于第一数量和第二数量,确定该第一图像集合的采样权重。
其中，该第一数量为第一图像集合中包括的样本图像的数量，该第二数量为该多个第一图像集合中包括的样本图像的总数量，该采样权重与该第二数量正相关，且该采样权重与该第一数量负相关。由于每个第一图像集合对应一个图像类别，电子设备基于不同图像集合中的样本图像的数量，确定不同图像类别的第一图像集合的采样权重，从而后续基于不同图像类别的该采样权重进行模型训练，能够提高准确性。
在一些实施例中，电子设备将第二数量和第一数量的比值作为采样权重。例如，第二数量为K，第一数量为k_i，则该采样权重为K/k_i。其中，i表示该第一图像集合的标签(图像类别)。
在本实现方式中,电子设备将第二数量和第一数量的比值作为采样权重,从而第一数量越多的第一图像集合的采样权重越小,第一数量越少的第一图像集合的采样权重越大,这样能够保证在进行模型训练时,各个图像类别的样本图像均衡,防止模型训练出现偏差。
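为便于理解上述比值的计算方式，下面给出一个简化的Python示意(其中的图像类别与文件名均为示例性假设，并非对本公开实现方式的限定)：

```python
# 示意：按 K / k_i 计算每个第一图像集合的采样权重
image_sets = {                      # 假设以"图像类别 -> 样本图像列表"的形式组织
    "indoor": ["in_001.jpg", "in_002.jpg", "in_003.jpg"],
    "outdoor": ["out_001.jpg"],
}
total = sum(len(imgs) for imgs in image_sets.values())    # 第二数量 K
weights = {label: total / len(imgs)                       # 第一数量 k_i
           for label, imgs in image_sets.items()}
# 样本图像越少的集合权重越大：此例中 outdoor 的权重为 4.0，indoor 约为 1.33
```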
步骤203:电子设备基于该采样权重,采样该第一图像集合,得到第二图像集合。
电子设备基于该采样权重,从该第一图像集合中获取样本图像,将获取的样本图像组成第二图像集合。
在一些实施例中，电子设备确定第三数量，第三数量为预期的第二图像集合的总数量。对于每个第一图像集合，电子设备基于该第一图像集合的采样权重和该第三数量，确定第四数量，第四数量为需要从该第一图像集合中采集的样本图像的数量，从该第一图像集合中采集该第四数量的样本图像。
在一些实施例中,该第四数量的样本图像为第一图像集合中相邻的样本图像,或者,该第四数量的样本图像为第一图像集合中随机采样得到的样本图像,或者,该第四数量的样本图像为第一图像集合中均匀采样得到的样本图像等。在本公开实施例中,对电子设备从第一图像集合采样得到样本图像的方式不作具体限定。
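在此基础上，按采样权重将预期总量(第三数量)分配到各个第一图像集合并进行随机采样，可参考如下示意(函数与变量名均为示例性假设，承接上一段示意中的 image_sets 与 weights)：

```python
import random

def sample_second_sets(image_sets, weights, n_total):
    """按采样权重将预期总量 n_total 分配到各个第一图像集合并随机采样(简化示意，允许重复采样)。"""
    weight_sum = sum(weights.values())
    second_sets = {}
    for label, imgs in image_sets.items():
        n_i = max(1, round(n_total * weights[label] / weight_sum))  # 第四数量
        second_sets[label] = random.choices(imgs, k=n_i)            # 从该集合中随机采样
    return second_sets

# 用法(承接上例)：second_sets = sample_second_sets(image_sets, weights, n_total=8)
```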
步骤204:电子设备基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
电子设备基于第二图像集合和损失函数,调整第二深度确定模型的模型参数,得到训练完成的第一深度确定模型,该过程通过以下步骤(1)-(3)实现,包括:
(1)电子设备基于该第二图像集合和损失函数,确定第二深度确定模型的损失值。
电子设备通过训练第二深度确定模型得到第一深度确定模型,该过程包括:对于第二图像集合中的每个样本图像,电子设备将该样本图像输入至第二深度确定模型中,输出该样本图像的深度信息,将该第二深度确定模型输出的深度信 息与该样本图像中标注的深度信息输入至损失函数中,得到第二深度确定模型的损失值。
在一些实施例中，该损失函数为向量损失函数；例如，该损失函数包括深度x方向损失函数、深度y方向损失函数、法向量损失函数和反向鲁棒损失函数(Reverse Huber)中的至少一个。
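其中的反向鲁棒损失可参考如下基于PyTorch的简化实现(阈值 c 取最大绝对误差的20%为常见经验值，属于示例性假设)：

```python
import torch

def berhu_loss(pred, target):
    """反向Huber(berHu)损失的简化示意：误差较小时按L1计算，误差较大时按L2计算。"""
    diff = (pred - target).abs()
    c = 0.2 * diff.max().detach()                 # 阈值，示例性取最大绝对误差的20%
    l2 = (diff ** 2 + c ** 2) / (2 * c + 1e-8)
    return torch.where(diff <= c, diff, l2).mean()
```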
另外,在本步骤之前,电子设备构建第二深度确定模型。在一些实施例中,电子设备通过卷积神经网络来构建第二深度确定模型。
该第二深度确定模型的结构与第一深度确定模型的结构相同。相应的,参见图3,该第二深度确定模型包括特征提取层、特征图生成层、特征融合层和深度确定层。其中,第二深度确定模型中的每一层由卷积层组成,并且,每一层卷积层为相同结构的卷积层或不同结构的卷积层。例如,第二深度确定模型中的卷积层为Depthwise Convolution(深度卷积结构)、Pointwise Convolution(逐点卷积结构)或Depthwise-Pointwise Convolution(深度逐点卷积结构)中的至少一种。在本公开实施例中,对该卷积层的结构不作具体限定。
其中,该特征提取层由四层卷积层组成。该特征提取层用于提取样本图像的多层特征,得到该样本图像的多个图像特征。例如,样本图像为3通道图像。相应的,电子设备将3通道的样本图像输入第一层卷积层,通过第一层卷积层,将该3通道的样本图像转化为16通道的样本图像;再将该16通道的样本图像输入至第二层卷积层,通过第二层卷积层,将该16通道的样本图像转化为32通道的样本图像;再将该32通道的样本图像输入至第三层卷积层,通过第三层卷积层,将该32通道的样本图像转化为64通道的样本图像;再将该64通道的样本图像输入至第四层卷积层,通过第四层卷积层,将该64通道的样本图像转化为128通道的样本图像。对于不同通道数的样本图像,分别提取该样本图像的图像特征,从而能够得到不同卷积层对应的不同的图像特征。
该特征图生成层用于采样多个图像特征,得到不同尺度的多个特征图,通过特征提取层输出的不同卷积层的图像特征,来确定样本图像中局部图像的特征和全局图像的特征,记录每个像素点在样本图像的位置与全局图像的相对关系,以便向特征融合层和深度确定层提供局部特征信息和全局特征信息。
其中，该特征图生成层由五层卷积层组成。第一层卷积层至第四层卷积层用于采样128通道的样本图像；该第一层至第四层卷积层分别与第五层卷积层连接，将采样后的样本图像输入至第五层卷积层，该第五层卷积层对接收到的四个样本图像进行尺度转换，得到不同尺度的多个特征图，将该不同尺度的多个特征图输入至特征融合层。
该特征融合层用于对该多个特征图进行特征融合,得到融合后的特征图。其中,该特征融合层逐步恢复图像分辨率以及缩减通道数,融合了特征提取层的特征,兼顾了样本图像中不同深度的特征。
其中，该特征融合层包括三层卷积层，第一层卷积层上采样128通道的样本图像的特征图，得到64通道的样本图像的特征图；第二层卷积层上采样64通道的样本图像的特征图，得到32通道的样本图像的特征图；第三层卷积层上采样32通道的样本图像的特征图，得到16通道的样本图像的特征图，然后对得到的多个特征图进行特征融合，得到融合后的特征图，将融合后的特征图输入至深度确定层。
该深度确定层用于基于融合后的特征图,确定样本图像的各个像素点的深度信息。
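上述网络结构可以用如下PyTorch骨架近似表示，其中省略了特征图生成层的多尺度细节，以逐元素相加代替具体的特征融合方式，通道数按文中描述设置，仅作结构示意，并非对本公开的限定：

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthNet(nn.Module):
    """深度确定模型的简化骨架：特征提取层、特征融合层与深度确定层。"""

    def __init__(self):
        super().__init__()
        chans = [3, 16, 32, 64, 128]
        # 特征提取层：四层卷积，逐层加深通道并降低分辨率
        self.enc = nn.ModuleList(
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, 2, 1), nn.ReLU(True))
            for i in range(4))
        # 特征融合层：逐步恢复分辨率并缩减通道(128→64→32→16)，融合特征提取层的特征
        self.dec = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, 1, 1), nn.ReLU(True))
            for c_in, c_out in [(128, 64), (64, 32), (32, 16)])
        # 深度确定层：卷积输出每个像素点的深度信息
        self.head = nn.Conv2d(16, 1, 3, 1, 1)

    def forward(self, x):
        feats = []
        for layer in self.enc:
            x = layer(x)
            feats.append(x)                         # 16/32/64/128 通道的多层特征
        y = feats[-1]
        for i, layer in enumerate(self.dec):
            y = F.interpolate(y, scale_factor=2, mode="bilinear", align_corners=False)
            y = layer(y) + feats[-2 - i]            # 与对应尺度的编码特征相加融合
        y = F.interpolate(y, scale_factor=2, mode="bilinear", align_corners=False)
        return self.head(y)

depth_map = DepthNet()(torch.rand(1, 3, 256, 256))  # 输出形状为 (1, 1, 256, 256)
```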
在一些实施例中，电子设备先获取多个第一图像集合，再构建第二深度确定模型；或者，电子设备先构建第二深度确定模型，再获取多个第一图像集合；或者，电子设备同时获取多个第一图像集合和构建第二深度确定模型。在本公开实施例中，对获取多个第一图像集合与构建第二深度确定模型的执行顺序不作具体限定。
(2)电子设备通过该损失值和模型优化器更新该第二深度确定模型的模型参数,得到第三深度确定模型。
优化器用于采用随机梯度下降方法来更新模型参数。在本步骤中,电子设备基于该优化器,通过该随机梯度下降法更新模型参数,模型参数包括梯度值。
(3)电子设备基于该训练数据和向量损失函数,确定该第三深度确定模型的损失值,直到该损失值小于预设损失值,完成模型训练得到该第一深度确定模型。
电子设备调整第二深度确定模型的模型参数后,继续对得到的第三深度确定模型进行模型训练,该过程与步骤(1)-(2)相似,在此不再赘述,每次执行完步骤(2)之后,电子设备基于该模型的损失值,确定模型训练是否完成。响应于该损失值不小于预设损失值,确定模型训练未完成,继续执行步骤(1)-(2),响应于该损失值小于预设损失值,确定模型训练完成,得到第一深度确 定模型。
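上述步骤(1)-(3)的训练流程可概括为如下示意性训练循环(数据加载方式为假设，损失函数沿用上文 berhu_loss 的示意，实际实现中亦可叠加深度方向损失与法向量损失)：

```python
import torch

def train(model, loader, max_epochs=50, loss_threshold=0.05):
    """简化示意：以随机梯度下降更新模型参数，损失值小于预设损失值时完成训练。"""
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    for epoch in range(max_epochs):
        for image, depth_gt in loader:           # 第二图像集合中的样本图像及标注深度
            loss = berhu_loss(model(image), depth_gt)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                     # 步骤(2)：通过优化器更新模型参数
        if loss.item() < loss_threshold:         # 步骤(3)：损失小于预设损失值则停止
            break
    return model
```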
在一些实施例中，电子设备完成模型训练后，对该第一深度确定模型的预测结果进行评价。相应的，电子设备基于该测试数据，测试该第一深度确定模型，得到该第一深度确定模型的测试结果，该测试结果用于表示第一深度确定模型是否符合要求。响应于该测试结果表示第一深度确定模型符合要求，确定第一深度确定模型为可用的深度确定模型，后续基于第一深度确定模型确定图像的深度信息；响应于该测试结果表示第一深度确定模型不符合要求，继续训练第一深度确定模型，直到第一深度确定模型符合要求为止。
其中，电子设备采用mean Relative Error(平均相对误差)算法或Root Mean Squared Error(均方根误差)算法中的至少一种算法，确定第一深度确定模型的测试结果。图4和图5是根据一示例性实施例提供的一种第一深度确定模型的测试结果的效果图，深度信息相同的像素点被标注为相同的标记，且深度信息越相似，标注的标记越相似。例如，通过不同的颜色区分不同的深度信息，深度信息越相似，颜色就越相近。
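平均相对误差与均方根误差的计算可参考如下NumPy示意：

```python
import numpy as np

def mean_relative_error(pred, gt):
    """平均相对误差：|预测深度 - 标注深度| / 标注深度 的均值。"""
    return float(np.mean(np.abs(pred - gt) / np.maximum(gt, 1e-8)))

def root_mean_squared_error(pred, gt):
    """均方根误差。"""
    return float(np.sqrt(np.mean((pred - gt) ** 2)))
```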
在一些实施例中,第一深度确定模型的训练过程由当前用于生成图像的电子设备执行;或者,由除当前设备以外的其他电子设备执行。在第一深度确定模型由其他电子设备执行的情况下,电子设备获取第一深度确定模型的过程为:电子设备向其他电子设备发送获取请求,该获取请求用于请求获取第一深度确定模型;其他电子设备基于该获取请求,获取第一深度确定模型,将该第一深度确定模型发送给电子设备;电子设备接收该第一深度确定模型。其中,其他电子设备训练第一深度确定模型的过程与电子设备训练第一深度确定模型的过程相似,在此不再赘述。
在本公开实施例中，由于采样权重是基于第一数量和第二数量确定的，第一数量为该第一图像集合中样本图像的数量，第二数量为多个第一图像集合中样本图像的总数量，从而在基于该采样权重采样第一图像集合时，能够控制每个第一图像集合中的样本图像的数量，保证了包括样本图像越多的第一图像集合的采样权重越小，而包括样本图像越少的第一图像集合的采样权重越大，这样，保证每个第一图像集合选择出的样本图像是均衡的，防止模型训练出现偏差。
图6为根据一示例性实施例提供的一种图像生成方法流程图。在本公开实施例中，以对图像进行处理、生成三维动态图像为例进行说明。如图6所示，该方法包括以下步骤：
步骤601:电子设备确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息。
该第一图像区域为目标对象所在的图像区域,该第二图像区域为背景所在的图像区域,背景为该第一图像中除该目标对象以外的部分。在一些实施例中,该目标对象为指定物体、人或其他动物面部等对象。
在一些实施例中,电子设备通过第一深度确定模型得到第一深度信息和第二深度信息,该过程为:电子设备将第一图像输入至第一深度确定模型,得到第一深度信息和第二深度信息。其中,该第一深度确定模型的结构与第二深度确定模型的结构相同,相应的,该第一深度确定模型包括特征提取层、特征图生成层、特征融合层和深度确定层。本步骤通过以下步骤(1)-(4)实现,包括:
(1)电子设备将该第一图像输入至该特征提取层,通过该特征提取层提取第一图像的多层特征得到第一图像的多个图像特征。
本步骤与步骤204的步骤(1)中,电子设备通过第二深度确定模型中特征提取层提取图像特征的过程相似,在此不再赘述。
(2)电子设备通过该特征图生成层采样多个图像特征得到不同尺度的多个特征图。
本步骤与步骤204的步骤(1)中,电子设备通过第二深度确定模型中特征图生成层,生成特征图的过程相似,在此不再赘述。
(3)电子设备通过该特征融合层融合多个特征图得到融合后的特征图。
本步骤与步骤204的步骤(1)中,电子设备通过第二深度确定模型中特征融合层进行特征融合的过程相似,在此不再赘述。
(4)电子设备通过该深度确定层卷积处理融合后的特征图得到第一深度信息和第二深度信息。
本步骤与步骤204的步骤(1)中,电子设备通过第二深度确定模型中的深度确定层确定图像的深度信息的过程相似,在此不再赘述。
在本实现方式中,通过事先训练好的第一深度确定模型,确定该第一图像的第一深度信息和第二深度信息,从而缩短了第一深度信息和第二深度信息的确定时长,进而提高了图像处理速度,使得本方案能够适用于即时成像的场景中。
在一些实施例中，电子设备检测第一图像中是否存在目标对象，响应于第一图像中存在目标对象，电子设备执行步骤601；响应于第一图像中不存在目标对象，结束。
在一些实施例中,响应于第一图像中存在目标对象,电子设备还检测目标对象所在的第一图像区域与第一图像的面积比,响应于该面积比大于预设阈值,执行步骤601,响应于该面积比不大于预设阈值,结束。
在一些实施例中,该第一图像为RGB(Red Green Blue)三通道图像。
步骤602:电子设备通过基于第二图像区域的图像数据替换第一图像区域的图像数据,获取第二图像。
图像数据包括图像中像素点的位置、像素值等信息。电子设备将第一图像区域内的图像数据通过掩码去除,再通过第二图像区域对第一图像区域进行背景填充,得到第二图像。其中,本步骤通过以下步骤(1)-(3)实现,包括:
(1)电子设备对该第一图像进行图像分割,确定该第一图像区域对应的区域轮廓。
电子设备通过图像分割模型,对第一图像进行分割,得到第一图像区域对应的区域轮廓。该图像分割模型为电子设备事先获取的图像分割模型。在一些实施例中,该图像分割模型为掩码分割模型。
在一些实施例中,电子设备确定出第一图像区域对应的区域轮廓后,在第一图像中标记该区域轮廓。
(2)电子设备去除该区域轮廓内的图像数据。
电子设备去除区域轮廓中像素点的像素值,以实现去除该区域轮廓内的图像数据。另外,去除区域轮廓内的图像数据后,会得到第一图像区域的图像掩码。参见图7和图8,图7左侧的图像和图8左侧的图像所示的是区域轮廓的掩码图像。
(3)电子设备在去除后的区域轮廓中填充背景,得到第二图像。
电子设备通过图像补全模型,在去除后的区域轮廓中填充背景,则步骤(3)包括:电子设备将去除后的第一图像输入至图像补全模型,得到该第二图像,该图像补全模型用于在区域轮廓中填充背景。
电子设备将去除后的第一图像输入至图像补全模型,通过图像补全模型基于该第二图像区域的图像数据,在区域轮廓中填充背景,得到的第二图像是完整的背景图像。图7的右侧图像和图8右侧图像为完整的背景图像。
在一些实施例中,图像补全模型确定第二图像区域的图像特征,基于第二图像区域的图像特征,在区域轮廓中填充背景。
在本实现方式中,通过在区域轮廓中填充背景,从而防止了在视角变换时,目标对象的边界处出现空洞区域,优化了生成的三维图像的图像效果。
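作为一个示意(非限定)，下面用分割掩码去除第一图像区域并填充背景；其中以OpenCV的传统图像修复算法近似代替文中的图像补全模型，掩码的获取方式(如人像分割网络)为示例性假设：

```python
import cv2
import numpy as np

def fill_background(image_bgr, person_mask):
    """person_mask 为第一图像区域(目标对象)的二值掩码(uint8，255 表示待去除像素)。"""
    removed = image_bgr.copy()
    removed[person_mask > 0] = 0                       # 去除区域轮廓内的图像数据
    # 以传统图像修复近似"图像补全模型"，基于第二图像区域(背景)填充区域轮廓
    second_image = cv2.inpaint(removed, person_mask, 5, cv2.INPAINT_TELEA)
    return second_image
```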
步骤603:电子设备通过基于第二深度信息填充第三图像区域的深度,获取第三图像区域的第三深度信息,该第三图像区域为第二图像中与该第一图像区域对应的图像区域。
电子设备基于该第二深度信息,向该第三图像区域进行深度信息扩散,得到该第三深度信息。在一些实施例中,扩散方式为泊松扩散方式。例如,电子设备确定第二图像区域中相邻像素点之间的深度信息变化规律,基于该深度信息变化规律,确定第三图像区域中每个像素点的深度信息;或者,对于第三图像区域中的每个像素点,电子设备确定该像素点在区域轮廓的深度信息,将确定出的深度信息赋值给该像素点。
在本实现方式中,电子设备通过填充第三图像区域的深度,从而使第三图像区域的深度与第二图像区域的深度匹配,从而使生成的背景更和谐,生成的三维图像的效果更真实。
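深度扩散可以用一个非常简化的迭代平滑来近似，将第二图像区域的深度逐步扩散到第三图像区域(实际实现中的泊松方程求解会更精确，以下仅为示意)：

```python
import numpy as np

def diffuse_depth(depth, hole_mask, iters=200):
    """depth 为第二图像的深度图，hole_mask 为第三图像区域(待填充处)的布尔掩码。"""
    filled = depth.astype(np.float64).copy()
    filled[hole_mask] = filled[~hole_mask].mean()       # 先用背景深度均值初始化空洞
    for _ in range(iters):
        # 以四邻域均值作为一次扩散迭代，仅更新空洞内的像素
        neighbor_mean = (np.roll(filled, 1, 0) + np.roll(filled, -1, 0) +
                         np.roll(filled, 1, 1) + np.roll(filled, -1, 1)) / 4.0
        filled[hole_mask] = neighbor_mean[hole_mask]
    return filled
```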
步骤604:电子设备基于该第一图像区域的图像数据,创建第一三维模型,该第一三维模型为该目标对象对应的三维模型。
第一三维模型为基于第一图像区域的图像数据生成的三维模型。在一些实施例中,电子设备基于第一图像区域中目标对象的至少一个关键点,创建第一三维模型。例如,电子设备识别第一图像区域中目标对象的至少一个关键点,基于该至少一个关键点,通过三维模型生成算法,创建第一三维模型。参见图9,图9中右侧的图为基于左侧的图的人脸图像创建的第一三维模型。例如,该目标对象为人脸,则该至少一个关键点为人脸关键点。在一些实施例中,该三维模型生成算法为3DMM(3D Morphable Model,3D形变模型;3D,3 Dimensional,三维)算法。则该第一三维模型为mesh网格图像模型。
步骤605:电子设备基于深度填充后的该第二图像,创建第二三维模型,该第二三维模型为该背景对应的三维模型。
本步骤与步骤604相似,在此不再赘述。
步骤606:电子设备基于该第一深度信息和该第三深度信息,融合第一三维模型和第二三维模型对应的像素信息,得到第三图像。
其中,第一三维模型对应的像素点在第三图像中的深度信息为该第一深度信息,第二三维模型对应的像素点在第三图像中的深度信息为该第三深度信息。
在本实现方式中,通过将第一三维模型和第二三维模型融合生成第三图像,使得第三图像中包含三维的目标对象和三维的背景,从而保证了在视角变换时,能够填补背景空洞的同时,还防止目标对象的边界处出现扭曲或缺失,优化了生成的三维图像的图像效果。
在一些实施例中,电子设备确定一坐标系,将第一三维模型和第二三维模型融合至该坐标系下,使得第一三维模型和第二三维模型对应的像素点的深度信息为基于该坐标系的标准下的深度信息,分别将第一三维模型和第二三维模型对应的像素信息分别赋值到对应的像素位置,得到第三图像。在一些实施例中,电子设备基于第一三维模型或第二三维模型建立坐标系,将第二三维模型或第一三维模型映射到该坐标系中,将第一三维模型和第二三维模型对应的像素信息分别赋值到对应的像素位置,得到第三图像。其中,映射过程中,电子设备分别基于第一三维模型或第二三维模型中关键点在第二图像中的位置和第一三维模型和第二三维模型中各个关键点之间的参数信息,确定其他像素点与目标关键点的位置关系,基于该位置关系,融合第一三维模型和第二三维模型,得到第三图像。本步骤通过以下步骤(A1)-(A5)实现,包括:
(A1)电子设备从该第一三维模型中,确定该目标对象的每个像素点的深度信息,该每个像素点的深度信息以该目标对象的目标关键点的深度信息为基准,该目标关键点为该目标对象的关键点。
其中,该每个像素点的深度信息以该目标对象的目标关键点的深度信息为基准,该目标关键点为该至少一个关键点中的关键点。例如,该目标关键点为人脸图像中鼻头对应的像素点,或者,该目标关键点为第一三维模型的中心点。电子设备从目标对象的至少一个关键点中,选择一个目标关键点,将该目标关键点的深度信息确定为第一深度信息,电子设备基于第一三维模型的模型参数,确定第一三维模型中各个像素点相对于目标关键点的深度信息,基于该目标关键点的第一深度信息和第一三维模型中各个像素点相对于目标关键点的深度信息,确定第一三维模型中各个像素点的深度信息。例如,该第一三维模型为通过3DMM算法确定的mesh图像,则基于mesh图像中各个像素点的参数信息确定目标对象的各个像素点的深度信息。
(A2)电子设备基于目标关键点,确定第一像素点,
第一像素点为目标关键点在该第二三维模型中对应的像素点。第一三维模型和第二三维模型为第一图像中的目标对象和背景对应的三维模型,因此,第一 三维模型和第二三维模型能够映射到同一图像坐标系中。在本步骤中,电子设备将第一三维模型映射到第二三维模型中。在一些实施例中,电子设备选择第二三维模型的中心点作为第一像素点,或者,电子设备基于第一映射关系和第二映射关系,确定第一三维模型和第二三维模型的映射关系,第一映射关系为第一三维模型与第一图像的映射关系,第二映射关系为第二三维模型与第一图像的映射关系,基于该第一三维模型和第二三维模型的映射关系,从第二三维模型中确定目标关键点对应的第一像素点。
(A3)电子设备赋值第一像素点的像素信息和深度信息。
第一像素点的像素信息为目标关键点在第一三维模型中的像素信息,第一像素点的深度信息为目标关键点的第一深度信息。该像素信息包括像素点的像素值等信息。电子设备将第二三维模型中的第一像素点的深度信息修改为目标关键点的第一深度信息,将第一像素点的像素信息修改为目标关键点的像素信息。例如,电子设备将第一三维模型中人脸中鼻头的位置确定为目标关键点,则将第一像素点的深度信息确定为鼻子的第一深度信息。
在一些实施例中,电子设备直接将目标关键点的像素信息和第一深度信息赋值给该第一像素点。在一些实施例中,电子设备在第二三维模型上设置新的图层,将该图层中第一像素点的像素信息和深度信息修改为目标关键点的像素信息和第一深度信息。
在本实现方式中,通过添加新的图层,使得第一三维模型和第二三维模型之间能够互不影响,且到达一体成型的效果,优化了生成的三维图像的图像效果。
(A4)电子设备基于目标关键点与该目标对象中其他像素点的位置关系,确定第二像素点。
第二像素点为其他像素点在该第二三维模型中对应的像素点。例如,电子设备将目标关键点设置在第二三维模型对应的坐标系的原点,将该第一三维模型和第二三维模型对应的坐标系的原点设置在第二图像中目标关键点对应的像素点的位置。
(A5)电子设备赋值第二像素点的像素信息和深度信息,得到第三图像,第二像素点的像素信息为该其他像素点在第一三维模型中的像素信息,第二像素点的深度信息为该其他像素点的第三深度信息。
本步骤与步骤(A3)相似,在此不再赘述。
在本实现方式中,电子设备基于不同的像素点在同一图像中的位置关系,融 合第一三维模型和第二三维模型,使得在视角变换时,能够填补背景空洞的同时,还防止目标对象的边界处出现扭曲或缺失,优化了生成的三维图像的图像效果。
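若将两个三维模型简化为像素网格上的颜色图与深度图，(A1)-(A5)的融合过程可以示意如下(此处假设两模型共用同一图像坐标系，前景像素的深度以目标关键点深度为基准，仅为对上述步骤的一种简化理解)：

```python
import numpy as np

def fuse_models(fg_color, fg_depth, fg_mask, bg_color, bg_depth, key_uv, key_depth):
    """把第一三维模型(前景)的像素信息与深度写入第二三维模型(背景)的对应像素点。"""
    fused_color, fused_depth = bg_color.copy(), bg_depth.copy()
    ku, kv = key_uv                                # 目标关键点在图像坐标系下的位置(u, v)
    ys, xs = np.nonzero(fg_mask)                   # 第一图像区域内的像素点
    for y, x in zip(ys, xs):
        fused_color[y, x] = fg_color[y, x]         # 赋值像素信息
        # 以目标关键点的第一深度信息为基准，叠加该像素点相对关键点的深度偏移
        fused_depth[y, x] = key_depth + (fg_depth[y, x] - fg_depth[kv, ku])
    return fused_color, fused_depth
```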
另外，电子设备还能够在该第三图像中添加特效元素，得到具有特效元素的第四图像，该过程为：电子设备确定待添加的特效元素的第一坐标和第二坐标，第一坐标为特效元素在第三图像的图像坐标系下的位置坐标，第二坐标为该特效元素在该第三图像的相机坐标系下的深度坐标，该深度坐标为该相机坐标系下该特效元素在该图像中的深度信息对应的坐标位置；通过基于第一坐标和第二坐标，将该特效元素融合至该第三图像的第一目标像素点，获取第四图像，第一目标像素点为位置坐标为第一坐标、深度坐标为第二坐标的像素点。
电子设备基于相机成像原理，将像素位置转换到相机坐标系。像素位置在该坐标系下对应齐次坐标(X,Y,1)，该像素点的深度为深度图估计的距离Z，将该齐次坐标的深度坐标1与深度Z相乘，构成真实深度坐标(X,Y,Z)，即为重建出的三维模型中该像素点的坐标。
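像素位置结合深度反投影到相机坐标系的过程可表示如下(相机内参 fx、fy、cx、cy 为示例性假设)：

```python
import numpy as np

def backproject(u, v, depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """将图像坐标(u, v)与深度 Z 转换为相机坐标系下的三维坐标(简化示意)。"""
    x = (u - cx) / fx                 # 齐次坐标 (X, Y, 1) 中的 X
    y = (v - cy) / fy                 # 齐次坐标 (X, Y, 1) 中的 Y
    return np.array([x * depth, y * depth, depth])    # 与深度 Z 相乘得到真实深度坐标

butterfly_pos = backproject(400, 200, depth=2.0)      # 例：在深度为 2 的位置放置特效元素
```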
在本步骤中,电子设备选定三维图像中不同的位置和深度,放置不同的动态效果,得到第四图像。例如,参见图10,在人脸周围、深度分别为1,2,3.5的位置放置蝴蝶元素。该过程与步骤606中的(A1)-(A5)相似,在此不再赘述。
在本实现方式中,电子设备基于深度信息在第三图像中添加特效元素,使得添加的特效元素与第三图像更加贴合生动,优化了生成的三维图像的图像效果。
电子设备生成了三维的第三图像后,还能够旋转第三图像,生成视频。该过程通过以下步骤(B1)-(B3)实现,包括:
(B1)电子设备将目标对象的目标关键点对应的位置坐标设置为第三图像对应的相机坐标系的坐标原点。
(B2)电子设备确定向相机坐标系的每个坐标轴对应的方向进行选择的旋转角度。
电子设备分别确定每个坐标轴对应的方向上的旋转角度,在一些实施例中,该旋转角度为预设的旋转角度,或者,该旋转角度为基于旋转指令生成的旋转角度。
在一些实施例中，电子设备获取该目标关键点在每个方向的预设展示角度、预设运动速度和预设展示帧数；基于该预设运动速度和预设展示帧数，确定展示角度权重；基于该展示角度权重和该预设展示角度，确定该方向的旋转角度。
例如，预设X(或Y)方向上的预设展示角度为AmpX(或AmpY)，t为预设展示帧数(在一些实施例中，该预设展示帧数也能够通过时间来标识)，s为预设运动速度，那么每次绕X轴旋转AmpX*sin(s*t)角度(或绕Y轴旋转AmpY*sin(s*t)角度)。其中，sin(s*t)为展示角度权重。
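按上述公式，每一帧的旋转角度可按如下方式计算(AmpX、AmpY、s 的取值均为示例)：

```python
import math

def rotation_angles(frame_idx, amp_x=5.0, amp_y=5.0, speed=0.1):
    """第 frame_idx 帧绕X轴与Y轴的旋转角度(度)：预设展示角度 * 展示角度权重 sin(s*t)。"""
    weight = math.sin(speed * frame_idx)          # 展示角度权重
    return amp_x * weight, amp_y * weight

angles = [rotation_angles(t) for t in range(63)]  # 约一个周期内各帧的旋转角度
```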
在本实现方式中,通过预先设置的运动轨迹,确定第三图像的展示轨迹,使得第三图像能够按照指定的路线进行旋转展示,防止第三图像生成视频时产生轨迹混乱的问题。
在一些实施例中,电子设备获取旋转指令,基于该旋转指令,从该旋转指令对应的旋转角度和预设展示角度中,选择该方向的旋转角度。
该旋转指令为电子设备接收到的用户通过屏幕输入的指令，或者，该旋转指令为由电子设备中的角度传感器产生的指令。在一些实施例中，电子设备接收用户输入的手势操作，基于手势操作确定旋转角度。在另一些实施例中，电子设备通过角度传感器确定当前电子设备的倾斜角度，将该倾斜角度确定为旋转角度。例如，该角度传感器为陀螺仪，则电子设备基于电子设备姿态获取陀螺仪四元数attitude，计算出X轴和Y轴的倾斜角度x_angle和y_angle，绕X轴旋转min(x_angle,AmpX)角度，再绕Y轴旋转min(y_angle,AmpY)角度。
在本实现方式中,电子设备基于接收到的旋转指令确定第三图像的运动轨迹,使得第三图像的运动轨迹更加灵活。
(B3)电子设备基于该旋转角度,旋转该第三图像中的像素点,生成视频。
电子设备将坐标系平移到该像素点,基于该像素点和旋转角度旋转第三图像中的该像素点,得到视频。目标关键点按照上述运动轨迹运动,最终回到初始位置,重复执行上述(B2)-(B3)得到三维动态的视频。
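(B1)-(B3)中对像素点的旋转可用如下示意表示，其中 points 为以目标关键点为坐标原点的相机坐标(N×3)，投影回图像并渲染成帧的细节从略：

```python
import numpy as np

def rotate_points(points, angle_x_deg, angle_y_deg):
    """将以目标关键点为原点的三维点分别绕X轴与Y轴旋转(简化示意)。"""
    ax, ay = np.deg2rad(angle_x_deg), np.deg2rad(angle_y_deg)
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(ax), -np.sin(ax)],
                      [0, np.sin(ax),  np.cos(ax)]])
    rot_y = np.array([[ np.cos(ay), 0, np.sin(ay)],
                      [0, 1, 0],
                      [-np.sin(ay), 0, np.cos(ay)]])
    return points @ (rot_y @ rot_x).T              # 先绕X轴、再绕Y轴旋转

# 对每一帧取对应的旋转角度，旋转后再投影渲染，即可得到三维动态的视频帧
```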
在本实现方式中,第三图像基于运行轨迹生成三维动态视频,丰富了图像的展示方式。
在本公开实施例中,由于第二图像是在第一图像中经过背景填充和深度填充后得到的,这样将第二图像和第一图像中目标对象所在的第一图像区域进行融合,得到第三图像,在第三图像的视角发生变化时,能够填补背景空洞的同时,还防止目标对象的边界处出现扭曲或缺失,优化了生成的图像的图像效果。
图11是根据一示例性实施例提供的一种图像生成装置的框图。参见图11，该装置包括：
第一确定单元1101,被配置为确定第一图像中第一图像区域的第一深度信 息和第二图像区域的第二深度信息,所述第一图像区域为目标对象所在的图像区域,所述第二图像区域为背景所在的图像区域;
替换单元1102,被配置为通过基于所述第二图像区域的图像数据,替换所述第一图像区域的图像数据,获取第二图像;
填充单元1103,被配置为通过基于所述第二深度信息填充所述第三图像区域的深度,获取第三图像区域的第三深度信息,所述第三图像区域为所述第二图像中与所述第一图像区域对应的图像区域;
第一融合单元1104,被配置为通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像。
在一些实施例中,所述第一融合单元1104包括:
第一创建子单元,被配置为基于所述第一图像区域的图像数据,创建第一三维模型,所述第一三维模型为所述目标对象对应的三维模型;
第二创建子单元,被配置为基于深度填充后的所述第二图像,创建第二三维模型,所述第二三维模型为所述背景对应的三维模型;
融合子单元,被配置为基于所述第一深度信息和所述第三深度信息,融合所述第一三维模型和所述第二三维模型对应的像素信息,得到所述第三图像,其中,所述第一三维模型对应的像素点在所述第三图像中的深度信息为所述第一深度信息,所述第二三维模型对应的像素点在所述第三图像中的深度信息为所述第三深度信息。
在一些实施例中,所述融合子单元,被配置为从所述第一三维模型中,确定所述目标对象的每个像素点的深度信息,所述每个像素点的深度信息以所述目标对象的目标关键点的深度信息为基准,所述目标关键点为所述目标对象的关键点;基于所述目标关键点,确定第一像素点,所述第一像素点为所述目标关键点在所述第二三维模型中对应的像素点;赋值所述第一像素点的像素信息和深度信息,所述第一像素点的像素信息为所述目标关键点在所述第一三维模型中的像素信息,所述第一像素点的深度信息为所述目标关键点的第一深度信息;基于所述目标关键点与所述目标对象中其他像素点的位置关系,确定第二像素点,所述第二像素点为所述其他像素点在所述第二三维模型中对应的像素点;赋值所述第二像素点的像素信息和深度信息,得到所述第三图像,所述第二像素点的像素信息为所述其他像素点在所述第一三维模型中的像素信息,所述第二像素 点的深度信息为所述其他像素点的第三深度信息。
在一些实施例中,所述替换单元1102包括:
分割子单元,被配置为对所述第一图像进行图像分割,确定所述第一图像区域对应的区域轮廓;
去除子单元,被配置为去除所述区域轮廓内的图像数据;
补全子单元,被配置为在去除后的所述区域轮廓中填充背景,得到所述第二图像。
在一些实施例中,所述补全子单元,被配置为将去除后的所述第一图像输入至图像补全模型,得到所述第二图像,所述图像补全模型用于在所述区域轮廓中填充背景。
在一些实施例中,所述第一确定单元,被配置为将所述第一图像输入至第一深度确定模型中,得到所述第一深度信息和所述第二深度信息。
在一些实施例中,所述第一深度确定模型包括特征提取层、特征图生成层、特征融合层和深度确定层;
所述第一确定单元1101包括:
特征提取子单元,被配置为将所述第一图像输入至所述特征提取层,通过所述特征提取层提取所述第一图像的多层特征得到所述第一图像的多个图像特征;
采样子单元,被配置为通过所述特征图生成层采样所述多个图像特征得到不同尺度的多个特征图;
特征融合子单元,被配置为通过所述特征融合层融合所述多个特征图得到融合后的特征图;
卷积子单元,被配置为通过所述深度确定层卷积处理所述融合后的特征图得到所述第一深度信息和所述第二深度信息。
在一些实施例中,所述装置还包括:
第三确定单元,被配置为确定待添加的特效元素的第一坐标和第二坐标,所述第一坐标为所述特效元素在所述第三图像的图像坐标系下的位置坐标,所述第二坐标为所述特效元素在所述第三图像的相机坐标系下的深度坐标;
第二融合单元,被配置为通过基于所述第一坐标和所述第二坐标,将所述特效元素融合至所述第三图像的第一目标像素点,获取第四图像,所述第一目标像素点为位置坐标为所述第一坐标,深度坐标为所述第二坐标的像素点。
在一些实施例中,所述装置还包括:
生成单元,被配置为旋转所述第三图像,生成视频。
在一些实施例中,所述生成单元,包括:
坐标设置子单元,被配置为将所述目标对象的目标关键点对应的位置坐标设置为所述第三图像对应的相机坐标系的坐标原点;
确定子单元,被配置为确定向所述相机坐标系的每个坐标轴对应的方向进行旋转的旋转角度;
生成子单元,被配置为基于所述旋转角度,旋转所述第三图像中的像素点,生成视频。
在一些实施例中,所述确定子单元,被配置为获取所述目标关键点在每个方向的预设展示角度、预设运动速度和预设展示帧数;基于所述预设运动速度和预设展示帧数,确定展示角度权重;基于所述展示角度权重和所述预设展示角度,确定所述方向的旋转角度。
在本公开实施例中,由于第二图像是在第一图像中经过背景填充和深度填充后得到的,这样将第二图像和第一图像中目标对象所在的第一图像区域进行融合,得到第三图像,在第三图像的视角发生变化时,能够填补背景空洞的同时,还防止目标对象的边界处出现扭曲或缺失,优化了生成的图像的图像效果。
需要说明的是:上述实施例提供的图像生成装置在图像生成时,仅以上述各功能模块的划分进行举例说明,实际应用中,能够根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的图像生成装置与图像生成方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图12是根据一示例性实施例提供的一种深度确定模型的训练装置的框图。参见图12，该装置包括：
获取单元1201,被配置为获取多个第一图像集合,每个第一图像集合对应一个图像场景;
第二确定单元1202,被配置为对于每个第一图像集合,基于第一数量和第二数量,确定所述第一图像集合的采样权重,所述第一数量为第一图像集合中包括的样本图像的数量,所述第二数量为所述多个第一图像集合中包括的样本图像的总数量,所述采样权重与所述第二数量正相关,且所述采样权重与所述第一 数量负相关;
采样单元1203,被配置为基于所述采样权重,采样所述第一图像集合,得到第二图像集合;
模型训练单元1204,被配置为基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
在本公开实施例中，由于采样权重是基于第一数量和第二数量确定的，第一数量为该第一图像集合中样本图像的数量，第二数量为多个第一图像集合中样本图像的总数量，从而在基于该采样权重采样第一图像集合时，能够控制每个第一图像集合中的样本图像的数量，保证了包括样本图像越多的第一图像集合的采样权重越小，而包括样本图像越少的第一图像集合的采样权重越大，这样，保证每个第一图像集合选择出的样本图像是均衡的，防止模型训练出现偏差。
需要说明的是:上述实施例提供的深度确定模型的装置在深度确定模型训练时,仅以上述各功能模块的划分进行举例说明,实际应用中,能够根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的深度确定模型的训练装置与深度确定模型的训练方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图13示出了本公开一个示例性实施例提供的电子设备1300的结构框图。在一些实施例中,电子设备1300是便携式移动终端,比如:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。电子设备1300还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,电子设备1300包括有:处理器1301和存储器1302。
在一些实施例中,处理器1301包括一个或多个处理核心,比如4核心处理器、8核心处理器等。在一些实施例中,处理器1301采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。在一些实施例中,处理器1301也包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central  Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器1301集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器1301还包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
在一些实施例中,存储器1302包括一个或多个计算机可读存储介质,该计算机可读存储介质是非暂态的。在一些实施例中,存储器1302还包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器1302中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器1301所执行以实现本公开中方法实施例提供的图像生成方法。
在一些实施例中,电子设备1300还可选包括有:***设备接口1303和至少一个***设备。在一些实施例中,处理器1301、存储器1302和***设备接口1303之间通过总线或信号线相连。在一些实施例中,各个***设备通过总线、信号线或电路板与***设备接口1303相连。在一些实施例中,***设备包括:射频电路1304、显示屏1305、摄像头组件1306、音频电路1307、定位组件1308和电源1309中的至少一种。
***设备接口1303可被用于将I/O(Input/Output,输入/输出)相关的至少一个***设备连接到处理器1301和存储器1302。在一些实施例中,处理器1301、存储器1302和***设备接口1303被集成在同一芯片或电路板上;在一些其他实施例中,处理器1301、存储器1302和***设备接口1303中的任意一个或两个在单独的芯片或电路板上实现,本公开的实施例对此不加以限定。
射频电路1304用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路1304通过电磁信号与通信网络以及其他通信设备进行通信。射频电路1304将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。在一些实施例中,射频电路1304包括:天线***、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。在一些实施例中,射频电路1304通过至少一种无线通信协议来与其他终端进行通信。该无线通信协议包括但不限于:万维网、城域网、内联网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路1304还包括NFC(Near Field  Communication,近距离无线通信)有关的电路,本公开对此不加以限定。
显示屏1305用于显示UI(User Interface,用户界面)。在一些实施例中,该UI包括图形、文本、图标、视频及其他们的任意组合。当显示屏1305是触摸显示屏时,显示屏1305还具有采集在显示屏1305的表面或表面上方的触摸信号的能力。在一些实施例中,该触摸信号作为控制信号输入至处理器1301进行处理。此时,显示屏1305还用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏1305为一个,设置在电子设备1300的前面板;在另一些实施例中,显示屏1305为至少两个,分别设置在电子设备1300的不同表面或呈折叠设计;在另一些实施例中,显示屏1305是柔性显示屏,设置在电子设备1300的弯曲表面上或折叠面上。甚至,显示屏1305还设置成非矩形的不规则图形,也即异形屏。在一些实施例中,显示屏1305采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件1306用于采集图像或视频。在一些实施例中,摄像头组件1306包括前置摄像头和后置摄像头。通常,前置摄像头设置在终端的前面板,后置摄像头设置在终端的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其他融合拍摄功能。在一些实施例中,摄像头组件1306还包括闪光灯。在一些实施例中,闪光灯是单色温闪光灯,在一些实施例中,闪光灯是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,用于不同色温下的光线补偿。
在一些实施例中,音频电路1307包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器1301进行处理,或者输入至射频电路1304以实现语音通信。出于立体声采集或降噪的目的,在一些实施例中,麦克风为多个,分别设置在电子设备1300的不同部位。在一些实施例中,麦克风是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器1301或射频电路1304的电信号转换为声波。在一些实施例中,扬声器是传统的薄膜扬声器,在一些实施例中,扬声器以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅能够将电信号转换为人类可听见的声波,也能够将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路1307还包括耳 机插孔。
定位组件1308用于定位电子设备1300的当前地理位置,以实现导航或LBS(Location Based Service,基于位置的服务)。在一些实施例中,定位组件1308是基于美国的GPS(Global Positioning System,全球定位***)、中国的北斗***或俄罗斯的伽利略***的定位组件。
电源1309用于为电子设备1300中的各个组件进行供电。在一些实施例中,电源1309是交流电、直流电、一次性电池或可充电电池。当电源1309包括可充电电池时,该可充电电池是有线充电电池或无线充电电池。有线充电电池是通过有线线路充电的电池,无线充电电池是通过无线线圈充电的电池。该可充电电池还用于支持快充技术。
在一些实施例中,电子设备1300还包括有一个或多个传感器1310。该一个或多个传感器1310包括但不限于:加速度传感器1311、陀螺仪传感器1312、压力传感器1313、指纹传感器1314、光学传感器1315以及接近传感器1316。
在一些实施例中,加速度传感器1311检测以电子设备1300建立的坐标系的三个坐标轴上的加速度大小。比如,加速度传感器1311用于检测重力加速度在三个坐标轴上的分量。在一些实施例中,处理器1301基于加速度传感器1311采集的重力加速度信号,控制显示屏1305以横向视图或纵向视图进行用户界面的显示。在一些实施例中,加速度传感器1311还用于游戏或者用户的运动数据的采集。
在一些实施例中,陀螺仪传感器1312检测电子设备1300的机体方向及转动角度,陀螺仪传感器1312与加速度传感器1311协同采集用户对电子设备1300的3D动作。处理器1301基于陀螺仪传感器1312采集的数据,能够实现如下功能:动作感应(比如基于用户的倾斜操作来改变UI)、拍摄时的图像稳定、游戏控制以及惯性导航。
在一些实施例中,压力传感器1313设置在电子设备1300的侧边框和/或显示屏1305的下层。当压力传感器1313设置在电子设备1300的侧边框时,能够检测用户对电子设备1300的握持信号,由处理器1301基于压力传感器1313采集的握持信号进行左右手识别或快捷操作。当压力传感器1313设置在显示屏1305的下层时,由处理器1301基于用户对显示屏1305的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
指纹传感器1314用于采集用户的指纹,由处理器1301基于指纹传感器1314采集到的指纹识别用户的身份,或者,由指纹传感器1314基于采集到的指纹识别用户的身份。在识别出用户的身份为可信身份时,由处理器1301授权该用户执行相关的敏感操作,该敏感操作包括解锁屏幕、查看加密信息、下载软件、支付及更改设置等。在一些实施例中,指纹传感器1314被设置在电子设备1300的正面、背面或侧面。当电子设备1300上设置有物理按键或厂商Logo时,指纹传感器1314与物理按键或厂商Logo集成在一起。
光学传感器1315用于采集环境光强度。在一个实施例中,处理器1301基于光学传感器1315采集的环境光强度,控制显示屏1305的显示亮度。在一些实施例中,当环境光强度较高时,调高显示屏1305的显示亮度;当环境光强度较低时,调低显示屏1305的显示亮度。在另一个实施例中,处理器1301还基于光学传感器1315采集的环境光强度,动态调整摄像头组件1306的拍摄参数。
接近传感器1316,也称距离传感器,通常设置在电子设备1300的前面板。接近传感器1316用于采集用户与电子设备1300的正面之间的距离。在一个实施例中,当接近传感器1316检测到用户与电子设备1300的正面之间的距离逐渐变小时,由处理器1301控制显示屏1305从亮屏状态切换为息屏状态;当接近传感器1316检测到用户与电子设备1300的正面之间的距离逐渐变大时,由处理器1301控制显示屏1305从息屏状态切换为亮屏状态。
本领域技术人员能够理解,图13中示出的结构并不构成对电子设备1300的限定,能够包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
在示例性实施例中,还提供了一种计算机可读存储介质,计算机可读存储介质中存储至少一条程序代码,至少一条程序代码由服务器加载并执行,以实现上述实施例中图像生成方法。
在示例性实施例中,还提供了一种计算机可读存储介质,计算机可读存储介质中存储至少一条程序代码,至少一条程序代码由服务器加载并执行,以实现上述实施例中深度确定模型的训练方法。
在一些实施例中,该计算机可读存储介质是存储器。例如,该计算机可读存储介质是ROM(Read-Only Memory,只读存储器)、RAM(Random Access Memory,随机存取存储器)、CD-ROM(Compact Disc Read-Only Memory,紧凑型光盘只 读储存器)、磁带、软盘和光数据存储设备等。
在示例性实施例中,还提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机程序代码,该计算机程序代码存储在计算机可读存储介质中,计算机设备的处理器从计算机可读存储介质读取该计算机程序代码,处理器执行该计算机程序代码,使得该计算机设备执行上述图像生成方法中所执行的操作。
在示例性实施例中,还提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机程序代码,该计算机程序代码存储在计算机可读存储介质中,计算机设备的处理器从计算机可读存储介质读取该计算机程序代码,处理器执行该计算机程序代码,使得该计算机设备执行上述深度确定模型的训练方法中所执行的操作。
本公开所有实施例均能够单独被执行,还能够与其他实施例相结合被执行,均视为本公开要求的保护范围。
本领域普通技术人员能够理解，实现上述实施例的全部或部分步骤能够通过硬件来完成，也能够通过程序指令相关的硬件完成，该程序存储于一种计算机可读存储介质中，上述提到的存储介质是只读存储器、磁盘或光盘等。

Claims (40)

  1. 一种图像生成方法,所述方法包括:
    确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,所述第一图像区域为目标对象所在的图像区域,所述第二图像区域为背景所在的图像区域;
    通过基于所述第二图像区域的图像数据替换所述第一图像区域的图像数据,获取第二图像;
    通过基于所述第二深度信息填充所述第三图像区域的深度,获取第三图像区域的第三深度信息,所述第三图像区域为所述第二图像中与所述第一图像区域对应的图像区域;
    通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像。
  2. 根据权利要求1所述的方法,其中,所述通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像,包括:
    基于所述第一图像区域的图像数据,创建第一三维模型,所述第一三维模型为所述目标对象对应的三维模型;
    基于深度填充后的所述第二图像,创建第二三维模型,所述第二三维模型为所述背景对应的三维模型;
    基于所述第一深度信息和所述第三深度信息,融合所述第一三维模型和所述第二三维模型对应的像素信息,得到所述第三图像,其中,所述第一三维模型对应的像素点在所述第三图像中的深度信息为所述第一深度信息,所述第二三维模型对应的像素点在所述第三图像中的深度信息为所述第三深度信息。
  3. 根据权利要求2所述的方法,其中,所述基于所述第一深度信息和所述第三深度信息,融合所述第一三维模型和所述第二三维模型对应的像素信息,得到所述第三图像,包括:
    从所述第一三维模型中,确定所述目标对象的每个像素点的深度信息,所述 每个像素点的深度信息以所述目标对象的目标关键点的深度信息为基准,所述目标关键点为所述目标对象的关键点;
    基于所述目标关键点,确定第一像素点,所述第一像素点为所述目标关键点在所述第二三维模型中对应的像素点;
    赋值所述第一像素点的像素信息和深度信息,所述第一像素点的像素信息为所述目标关键点在所述第一三维模型中的像素信息,所述第一像素点的深度信息为所述目标关键点的第一深度信息;
    基于所述目标关键点与所述目标对象中其他像素点的位置关系,确定第二像素点,所述第二像素点为所述其他像素点在所述第二三维模型中对应的像素点;
    赋值所述第二像素点的像素信息和深度信息,得到所述第三图像,所述第二像素点的像素信息为所述其他像素点在所述第一三维模型中的像素信息,所述第二像素点的深度信息为所述其他像素点的第三深度信息。
  4. 根据权利要求1所述的方法,其中,所述通过基于所述第二图像区域的图像数据替换所述第一图像区域的图像数据,获取第二图像,包括:
    对所述第一图像进行图像分割,确定所述第一图像区域对应的区域轮廓;
    去除所述区域轮廓内的图像数据;
    在去除后的所述区域轮廓中填充背景,得到所述第二图像。
  5. 根据权利要求4所述的方法,其中,所述在去除后的所述区域轮廓中填充背景,得到所述第二图像,包括:
    将去除后的所述第一图像输入至图像补全模型,得到所述第二图像,所述图像补全模型用于在所述区域轮廓中填充背景。
  6. 根据权利要求1所述的方法,其中,所述确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,包括:
    将所述第一图像输入至第一深度确定模型中,得到所述第一深度信息和所述第二深度信息。
  7. 根据权利要求6所述的方法,其中,所述第一深度确定模型包括特征提取层、特征图生成层、特征融合层和深度确定层;
    所述将所述第一图像输入至第一深度确定模型中,得到所述第一深度信息和所述第二深度信息,包括:
    将所述第一图像输入至所述特征提取层,通过所述特征提取层提取所述第一图像的多层特征得到所述第一图像的多个图像特征;
    通过所述特征图生成层采样所述多个图像特征得到不同尺度的多个特征图;
    通过所述特征融合层融合所述多个特征图得到融合后的特征图;
    通过所述深度确定层卷积处理所述融合后的特征图得到所述第一深度信息和所述第二深度信息。
  8. 根据权利要求1所述的方法,其中,所述方法还包括:
    确定待添加的特效元素的第一坐标和第二坐标,所述第一坐标为所述特效元素在所述第三图像的图像坐标系下的位置坐标,所述第二坐标为所述特效元素在所述第三图像的相机坐标系下的深度坐标;
    通过基于所述第一坐标和所述第二坐标,将所述特效元素融合至所述第三图像的第一目标像素点,获取第四图像,所述第一目标像素点为位置坐标为所述第一坐标,深度坐标为所述第二坐标的像素点。
  9. 根据权利要求1-8任一项所述的方法,其中,所述方法还包括:
    旋转所述第三图像,生成视频。
  10. 根据权利要求9所述的方法,其中,所述旋转所述第三图像,生成视频,包括:
    将所述目标对象的目标关键点对应的位置坐标设置为所述第三图像对应的相机坐标系的坐标原点;
    确定向所述相机坐标系的每个坐标轴对应的方向进行旋转的旋转角度;
    基于所述旋转角度,旋转所述第三图像中的像素点,生成视频。
  11. 根据权利要求10所述的方法,其特征在于,所述确定向所述相机坐标系 的每个坐标轴对应的方向进行旋转的旋转角度,包括:
    获取所述目标关键点在每个方向的预设展示角度、预设运动速度和预设展示帧数;
    基于所述预设运动速度和预设展示帧数,确定展示角度权重;
    基于所述展示角度权重和所述预设展示角度,确定所述方向的旋转角度。
  12. 一种深度确定模型的训练方法,所述方法包括:
    获取多个第一图像集合,每个第一图像集合对应一个图像场景;
    对于每个第一图像集合,根据第一数量和第二数量,确定所述第一图像集合的采样权重,所述第一数量为第一图像集合中包括的样本图像的数量,所述第二数量为所述多个第一图像集合中包括的样本图像的总数量,所述采样权重与所述第二数量正相关,且所述采样权重与所述第一数量负相关;
    基于所述采样权重,采样所述第一图像集合,得到第二图像集合;
    基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
  13. 一种图像生成装置,所述装置包括:
    第一确定单元,被配置为确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,所述第一图像区域为目标对象所在的图像区域,所述第二图像区域为背景所在的图像区域;
    替换单元,被配置为通过基于所述第二图像区域的图像数据,替换所述第一图像区域的图像数据,获取第二图像;
    填充单元,被配置为通过基于所述第二深度信息填充所述第三图像区域的深度,获取第三图像区域的第三深度信息,所述第三图像区域为所述第二图像中与所述第一图像区域对应的图像区域;
    第一融合单元,被配置为通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像。
  14. 根据权利要求13所述的装置,其中,所述第一融合单元包括:
    第一创建子单元,被配置为基于所述第一图像区域的图像数据,创建第一三 维模型,所述第一三维模型为所述目标对象对应的三维模型;
    第二创建子单元,被配置为基于深度填充后的所述第二图像,创建第二三维模型,所述第二三维模型为所述背景对应的三维模型;
    融合子单元,被配置为基于所述第一深度信息和所述第三深度信息,融合所述第一三维模型和所述第二三维模型对应的像素信息,得到所述第三图像,其中,所述第一三维模型对应的像素点在所述第三图像中的深度信息为所述第一深度信息,所述第二三维模型对应的像素点在所述第三图像中的深度信息为所述第三深度信息。
  15. 根据权利要求14所述的装置,其中,所述融合子单元,被配置为从所述第一三维模型中,确定所述目标对象的每个像素点的深度信息,所述每个像素点的深度信息以所述目标对象的目标关键点的深度信息为基准,所述目标关键点为所述目标对象的关键点;基于所述目标关键点,确定第一像素点,所述第一像素点为所述目标关键点在所述第二三维模型中对应的像素点;赋值所述第一像素点的像素信息和深度信息,所述第一像素点的像素信息为所述目标关键点在所述第一三维模型中的像素信息,所述第一像素点的深度信息为所述目标关键点的第一深度信息;基于所述目标关键点与所述目标对象中其他像素点的位置关系,确定第二像素点,所述第二像素点为所述其他像素点在所述第二三维模型中对应的像素点;赋值所述第二像素点的像素信息和深度信息,得到所述第三图像,所述第二像素点的像素信息为所述其他像素点在所述第一三维模型中的像素信息,所述第二像素点的深度信息为所述其他像素点的第三深度信息。
  16. 根据权利要求13所述的装置,其中,所述替换单元包括:
    分割子单元,被配置为对所述第一图像进行图像分割,确定所述第一图像区域对应的区域轮廓;
    去除子单元,被配置为去除所述区域轮廓内的图像数据;
    补全子单元,被配置为在去除后的所述区域轮廓中填充背景,得到所述第二图像。
  17. 根据权利要求16所述的装置,其中,所述补全子单元,被配置为将去除 后的所述第一图像输入至图像补全模型,得到所述第二图像,所述图像补全模型用于在所述区域轮廓中填充背景。
  18. 根据权利要求13所述的装置,其中,所述第一确定单元,被配置为将所述第一图像输入至第一深度确定模型中,得到所述第一深度信息和所述第二深度信息。
  19. 根据权利要求18所述的装置,其中,所述第一深度确定模型包括特征提取层、特征图生成层、特征融合层和深度确定层;
    所述第一确定单元包括:
    特征提取子单元,被配置为将所述第一图像输入至所述特征提取层,通过所述特征提取层提取所述第一图像的多层特征得到所述第一图像的多个图像特征;
    采样子单元,被配置为通过所述特征图生成层采样所述多个图像特征得到不同尺度的多个特征图;
    特征融合子单元,被配置为通过所述特征融合层融合所述多个特征图得到融合后的特征图;
    卷积子单元,被配置为通过所述深度确定层卷积处理所述融合后的特征图得到所述第一深度信息和所述第二深度信息。
  20. 根据权利要求13所述的装置,其中,所述装置还包括:
    第三确定单元,被配置为确定待添加的特效元素的第一坐标和第二坐标,所述第一坐标为所述特效元素在所述第三图像的图像坐标系下的位置坐标,所述第二坐标为所述特效元素在所述第三图像的相机坐标系下的深度坐标;
    第二融合单元,被配置为通过基于所述第一坐标和所述第二坐标,将所述特效元素融合至所述第三图像的第一目标像素点,获取第四图像,所述第一目标像素点为位置坐标为所述第一坐标,深度坐标为所述第二坐标的像素点。
  21. 根据权利要求13-20任一项所述的装置,其中,所述装置还包括:
    生成单元,被配置为旋转所述第三图像,生成视频。
  22. 根据权利要求21所述的装置,其中,所述生成单元,包括:
    坐标设置子单元,被配置为将所述目标对象的目标关键点对应的位置坐标设置为所述第三图像对应的相机坐标系的坐标原点;
    确定子单元,被配置为确定向所述相机坐标系的每个坐标轴对应的方向进行旋转的旋转角度;
    生成子单元,被配置为基于所述旋转角度,旋转所述第三图像中的像素点,生成视频。
  23. 根据权利要求22所述的装置,其中,所述确定子单元,被配置为获取所述目标关键点在每个方向的预设展示角度、预设运动速度和预设展示帧数;基于所述预设运动速度和预设展示帧数,确定展示角度权重;基于所述展示角度权重和所述预设展示角度,确定所述方向的旋转角度。
  24. 一种深度确定模型的训练装置,所述装置包括:
    获取单元,被配置为获取多个第一图像集合,每个第一图像集合对应一个图像场景;
    第二确定单元,被配置为对于每个第一图像集合,基于第一数量和第二数量,确定所述第一图像集合的采样权重,所述第一数量为第一图像集合中包括的样本图像的数量,所述第二数量为所述多个第一图像集合中包括的样本图像的总数量,所述采样权重与所述第二数量正相关,且所述采样权重与所述第一数量负相关;
    采样单元,被配置为基于所述采样权重,采样所述第一图像集合,得到第二图像集合;
    模型训练单元,被配置为基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
  25. 一种电子设备,所述电子设备包括处理器和存储器,所述存储器中存储有至少一条程序代码,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度 信息,所述第一图像区域为目标对象所在的图像区域,所述第二图像区域为背景所在的图像区域;
    通过基于所述第二图像区域的图像数据替换所述第一图像区域的图像数据,获取第二图像;
    通过基于所述第二深度信息填充所述第三图像区域的深度,获取第三图像区域的第三深度信息,所述第三图像区域为所述第二图像中与所述第一图像区域对应的图像区域;
    通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像。
  26. 根据权利要求25所述的电子设备,其中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    基于所述第一图像区域的图像数据,创建第一三维模型,所述第一三维模型为所述目标对象对应的三维模型;
    基于深度填充后的所述第二图像,创建第二三维模型,所述第二三维模型为所述背景对应的三维模型;
    基于所述第一深度信息和所述第三深度信息,融合所述第一三维模型和所述第二三维模型对应的像素信息,得到所述第三图像,其中,所述第一三维模型对应的像素点在所述第三图像中的深度信息为所述第一深度信息,所述第二三维模型对应的像素点在所述第三图像中的深度信息为所述第三深度信息。
  27. 根据权利要求26所述的电子设备,其中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    从所述第一三维模型中,确定所述目标对象的每个像素点的深度信息,所述每个像素点的深度信息以所述目标对象的目标关键点的深度信息为基准,所述目标关键点为所述目标对象的关键点;
    基于所述目标关键点,确定第一像素点,所述第一像素点为所述目标关键点在所述第二三维模型中对应的像素点;
    赋值所述第一像素点的像素信息和深度信息,所述第一像素点的像素信息为所述目标关键点在所述第一三维模型中的像素信息,所述第一像素点的深度 信息为所述目标关键点的第一深度信息;
    基于所述目标关键点与所述目标对象中其他像素点的位置关系,确定第二像素点,所述第二像素点为所述其他像素点在所述第二三维模型中对应的像素点;
    赋值所述第二像素点的像素信息和深度信息,得到所述第三图像,所述第二像素点的像素信息为所述其他像素点在所述第一三维模型中的像素信息,所述第二像素点的深度信息为所述其他像素点的第三深度信息。
  28. 根据权利要求25所述的电子设备,其中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    对所述第一图像进行图像分割,确定所述第一图像区域对应的区域轮廓;
    去除所述区域轮廓内的图像数据;
    在去除后的所述区域轮廓中填充背景,得到所述第二图像。
  29. 根据权利要求28所述的电子设备,其中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    将去除后的所述第一图像输入至图像补全模型,得到所述第二图像,所述图像补全模型用于在所述区域轮廓中填充背景。
  30. 根据权利要求25所述的电子设备,其中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    将所述第一图像输入至第一深度确定模型中,得到所述第一深度信息和所述第二深度信息。
  31. 根据权利要求30所述的电子设备,其中,所述第一深度确定模型包括特征提取层、特征图生成层、特征融合层和深度确定层;所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    将所述第一图像输入至所述特征提取层,通过所述特征提取层提取所述第一图像的多层特征得到所述第一图像的多个图像特征;
    通过所述特征图生成层采样所述多个图像特征得到不同尺度的多个特征图;
    通过所述特征融合层融合所述多个特征图得到融合后的特征图;
    通过所述深度确定层卷积处理所述融合后的特征图得到所述第一深度信息和所述第二深度信息。
  32. 根据权利要求25所述的电子设备,其中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    确定待添加的特效元素的第一坐标和第二坐标,所述第一坐标为所述特效元素在所述第三图像的图像坐标系下的位置坐标,所述第二坐标为所述特效元素在所述第三图像的相机坐标系下的深度坐标;
    通过基于所述第一坐标和所述第二坐标,将所述特效元素融合至所述第三图像的第一目标像素点,获取第四图像,所述第一目标像素点为位置坐标为所述第一坐标,深度坐标为所述第二坐标的像素点。
  33. 根据权利要求25-32任一项所述的电子设备,其中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    旋转所述第三图像,生成视频。
  34. 根据权利要求33所述的电子设备,其中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    将所述目标对象的目标关键点对应的位置坐标设置为所述第三图像对应的相机坐标系的坐标原点;
    确定向所述相机坐标系的每个坐标轴对应的方向进行旋转的旋转角度;
    基于所述旋转角度,旋转所述第三图像中的像素点,生成视频。
  35. 根据权利要求34所述的电子设备,其中,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    获取所述目标关键点在每个方向的预设展示角度、预设运动速度和预设展示帧数;
    基于所述预设运动速度和预设展示帧数,确定展示角度权重;
    基于所述展示角度权重和所述预设展示角度,确定所述方向的旋转角度。
  36. 一种电子设备,所述电子设备包括处理器和存储器,所述存储器中存储有至少一条程序代码,所述至少一条程序代码由所述处理器加载并执行,以实现如下步骤:
    获取多个第一图像集合,每个第一图像集合对应一个图像场景;
    对于每个第一图像集合,基于第一数量和第二数量,确定所述第一图像集合的采样权重,所述第一数量为第一图像集合中包括的样本图像的数量,所述第二数量为所述多个第一图像集合中包括的样本图像的总数量,所述采样权重与所述第二数量正相关,且所述采样权重与所述第一数量负相关;
    基于所述采样权重,采样所述第一图像集合,得到第二图像集合;
    基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
  37. 一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条程序代码,所述至少一条程序代码由处理器加载并执行,以实现如下步骤:
    确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,所述第一图像区域为目标对象所在的图像区域,所述第二图像区域为背景所在的图像区域;
    通过基于所述第二图像区域的图像数据替换所述第一图像区域的图像数据,获取第二图像;
    通过基于所述第二深度信息填充所述第三图像区域的深度,获取第三图像区域的第三深度信息,所述第三图像区域为所述第二图像中与所述第一图像区域对应的图像区域;
    通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像。
  38. 一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条程序代码,所述至少一条程序代码由处理器加载并执行,以实现如下步骤:
    获取多个第一图像集合,每个第一图像集合对应一个图像场景;
    对于每个第一图像集合,基于第一数量和第二数量,确定所述第一图像集合的采样权重,所述第一数量为第一图像集合中包括的样本图像的数量,所述第二 数量为所述多个第一图像集合中包括的样本图像的总数量,所述采样权重与所述第二数量正相关,且所述采样权重与所述第一数量负相关;
    基于所述采样权重,采样所述第一图像集合,得到第二图像集合;
    基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
  39. 一种计算机程序产品或计算机程序,所述计算机程序产品或所述计算机程序包括计算机程序代码,所述计算机程序代码存储在计算机可读存储介质中,计算机设备的处理器从计算机可读存储介质读取所述计算机程序代码,处理器执行所述计算机程序代码,使得所述计算机设备执行以下步骤:
    确定第一图像中第一图像区域的第一深度信息和第二图像区域的第二深度信息,所述第一图像区域为目标对象所在的图像区域,所述第二图像区域为背景所在的图像区域;
    通过基于所述第二图像区域的图像数据替换所述第一图像区域的图像数据,获取第二图像;
    通过基于所述第二深度信息填充所述第三图像区域的深度,获取第三图像区域的第三深度信息,所述第三图像区域为所述第二图像中与所述第一图像区域对应的图像区域;
    通过基于所述第一深度信息和所述第三深度信息,将所述第一图像区域中的图像数据融合至深度填充后的所述第二图像中,获取第三图像。
  40. 一种计算机程序产品或计算机程序,所述计算机程序产品或所述计算机程序包括计算机程序代码,所述计算机程序代码存储在计算机可读存储介质中,计算机设备的处理器从计算机可读存储介质读取所述计算机程序代码,处理器执行所述计算机程序代码,使得所述计算机设备执行以下步骤:
    获取多个第一图像集合,每个第一图像集合对应一个图像场景;
    对于每个第一图像集合,基于第一数量和第二数量,确定所述第一图像集合的采样权重,所述第一数量为第一图像集合中包括的样本图像的数量,所述第二数量为所述多个第一图像集合中包括的样本图像的总数量,所述采样权重与所述第二数量正相关,且所述采样权重与所述第一数量负相关;
    基于所述采样权重,采样所述第一图像集合,得到第二图像集合;
    基于多个第二图像集合,训练第二深度确定模型得到第一深度确定模型。
PCT/CN2021/106178 2020-09-10 2021-07-14 图像生成方法及电子设备 WO2022052620A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010947268.1A CN114170349A (zh) 2020-09-10 2020-09-10 图像生成方法、装置、电子设备及存储介质
CN202010947268.1 2020-09-10

Publications (1)

Publication Number Publication Date
WO2022052620A1 true WO2022052620A1 (zh) 2022-03-17

Family

ID=80475637

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106178 WO2022052620A1 (zh) 2020-09-10 2021-07-14 图像生成方法及电子设备

Country Status (2)

Country Link
CN (1) CN114170349A (zh)
WO (1) WO2022052620A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115334239A (zh) * 2022-08-10 2022-11-11 青岛海信移动通信技术股份有限公司 前后摄像头拍照融合的方法、终端设备和存储介质
CN116543075A (zh) * 2023-03-31 2023-08-04 北京百度网讯科技有限公司 图像生成方法、装置、电子设备及存储介质
CN116704129A (zh) * 2023-06-14 2023-09-05 维坤智能科技(上海)有限公司 基于全景图的三维图像生成方法、装置、设备及存储介质
CN117197003A (zh) * 2023-11-07 2023-12-08 杭州灵西机器人智能科技有限公司 一种多条件控制的纸箱样本生成方法
CN117422848A (zh) * 2023-10-27 2024-01-19 神力视界(深圳)文化科技有限公司 三维模型的分割方法及装置

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205161B (zh) * 2022-08-18 2023-02-21 荣耀终端有限公司 一种图像处理方法及设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271583A (zh) * 2008-04-28 2008-09-24 清华大学 一种基于深度图的快速图像绘制方法
US20120033852A1 (en) * 2010-08-06 2012-02-09 Kennedy Michael B System and method to find the precise location of objects of interest in digital images
CN102307312A (zh) * 2011-08-31 2012-01-04 四川虹微技术有限公司 一种对dibr技术生成的目标图像进行空洞填充的方法
CN102592275A (zh) * 2011-12-16 2012-07-18 天津大学 虚拟视点绘制方法
CN111222440A (zh) * 2019-12-31 2020-06-02 江西开心玉米网络科技有限公司 一种人像背景分离方法、装置、服务器及存储介质

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115334239A (zh) * 2022-08-10 2022-11-11 青岛海信移动通信技术股份有限公司 前后摄像头拍照融合的方法、终端设备和存储介质
CN115334239B (zh) * 2022-08-10 2023-12-15 青岛海信移动通信技术有限公司 前后摄像头拍照融合的方法、终端设备和存储介质
CN116543075A (zh) * 2023-03-31 2023-08-04 北京百度网讯科技有限公司 图像生成方法、装置、电子设备及存储介质
CN116543075B (zh) * 2023-03-31 2024-02-13 北京百度网讯科技有限公司 图像生成方法、装置、电子设备及存储介质
CN116704129A (zh) * 2023-06-14 2023-09-05 维坤智能科技(上海)有限公司 基于全景图的三维图像生成方法、装置、设备及存储介质
CN116704129B (zh) * 2023-06-14 2024-01-30 维坤智能科技(上海)有限公司 基于全景图的三维图像生成方法、装置、设备及存储介质
CN117422848A (zh) * 2023-10-27 2024-01-19 神力视界(深圳)文化科技有限公司 三维模型的分割方法及装置
CN117197003A (zh) * 2023-11-07 2023-12-08 杭州灵西机器人智能科技有限公司 一种多条件控制的纸箱样本生成方法
CN117197003B (zh) * 2023-11-07 2024-02-27 杭州灵西机器人智能科技有限公司 一种多条件控制的纸箱样本生成方法

Also Published As

Publication number Publication date
CN114170349A (zh) 2022-03-11

Similar Documents

Publication Publication Date Title
CN110544280B (zh) Ar***及方法
US11205282B2 (en) Relocalization method and apparatus in camera pose tracking process and storage medium
WO2022052620A1 (zh) 图像生成方法及电子设备
CN109308727B (zh) 虚拟形象模型生成方法、装置及存储介质
CN110992493B (zh) 图像处理方法、装置、电子设备及存储介质
US11393154B2 (en) Hair rendering method, device, electronic apparatus, and storage medium
KR102595150B1 (ko) 다수의 가상 캐릭터를 제어하는 방법, 기기, 장치 및 저장 매체
CN110427110B (zh) 一种直播方法、装置以及直播服务器
CN110064200B (zh) 基于虚拟环境的物体构建方法、装置及可读存储介质
CN109815150B (zh) 应用测试方法、装置、电子设备及存储介质
CN110148178B (zh) 相机定位方法、装置、终端及存储介质
CN108694073B (zh) 虚拟场景的控制方法、装置、设备及存储介质
CN112287852B (zh) 人脸图像的处理方法、显示方法、装置及设备
CN109522863B (zh) 耳部关键点检测方法、装置及存储介质
CN109166150B (zh) 获取位姿的方法、装置存储介质
WO2020233403A1 (zh) 三维角色的个性化脸部显示方法、装置、设备及存储介质
WO2022042425A1 (zh) 视频数据处理方法、装置、计算机设备及存储介质
CN112581358B (zh) 图像处理模型的训练方法、图像处理方法及装置
CN111897429A (zh) 图像显示方法、装置、计算机设备及存储介质
CN110837300B (zh) 虚拟交互的方法、装置、电子设备及存储介质
WO2022199102A1 (zh) 图像处理方法及装置
CN111105474A (zh) 字体绘制方法、装置、计算机设备及计算机可读存储介质
CN112308103A (zh) 生成训练样本的方法和装置
CN112381729B (zh) 图像处理方法、装置、终端及存储介质
CN110300275B (zh) 视频录制、播放方法、装置、终端及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21865676

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.06.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21865676

Country of ref document: EP

Kind code of ref document: A1