CN116977417A - Pose estimation method and device, electronic equipment and storage medium

Info

Publication number
CN116977417A
Authority
CN
China
Prior art keywords
image
face
filling
missing
images
Prior art date
Legal status
Pending
Application number
CN202310804725.5A
Other languages
Chinese (zh)
Inventor
彭昊天
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310804725.5A
Publication of CN116977417A
Status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, and deep learning, and can be applied to scenes such as the metaverse and digital humans. It specifically provides a pose estimation method and apparatus, an electronic device, and a storage medium. The implementation scheme is as follows: acquiring a plurality of face-missing images of a target object, wherein the target object has a hairstyle area and the face-missing images respectively correspond to different viewing angles; performing face filling on the plurality of face-missing images to obtain a plurality of face-filling images in one-to-one correspondence with the face-missing images; and, for each face-filling image, obtaining a pose estimation result corresponding to the hairstyle area in the face-filling image based on the face area of that image. With this method and apparatus, the accuracy of the pose estimation result corresponding to the hairstyle area can be improved.

Description

Pose estimation method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, and deep learning. It can be applied to scenes such as the metaverse and digital humans, and specifically provides a pose estimation method, a pose estimation apparatus, an electronic device, and a storage medium.
Background
At present, three-dimensional avatars have wide application value in social, live-streaming, gaming, and other user scenarios. Artificial-intelligence-based three-dimensional avatar modeling therefore has broad application prospects, and hair reconstruction is an extremely important link in the modeling process.
Disclosure of Invention
The present disclosure provides a pose estimation method and apparatus, an electronic device, and a storage medium.
According to an aspect of the present disclosure, there is provided a pose estimation method, including:
acquiring a plurality of face-missing images of a target object, wherein the target object has a hairstyle area and the plurality of face-missing images respectively correspond to different viewing angles;
performing face filling on the plurality of face-missing images to obtain a plurality of face-filling images, wherein the face-filling images are in one-to-one correspondence with the face-missing images; and
for each face-filling image, obtaining a pose estimation result corresponding to the hairstyle area in the face-filling image based on the face area of the face-filling image.
According to another aspect of the present disclosure, there is provided a pose estimation apparatus including:
an image acquisition unit configured to acquire a plurality of face-missing images of a target object, wherein the target object has a hairstyle area and the plurality of face-missing images respectively correspond to different viewing angles;
a face filling unit configured to perform face filling on the plurality of face-missing images to obtain a plurality of face-filling images, wherein the face-filling images are in one-to-one correspondence with the face-missing images; and
a first pose estimation unit configured to obtain, for each face-filling image, a pose estimation result corresponding to the hairstyle area in the face-filling image based on the face area of the face-filling image.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor;
a memory communicatively coupled to the at least one processor;
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.
By adopting the method and apparatus of the present disclosure, the accuracy of the pose estimation result corresponding to the hairstyle area can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a flowchart of a pose estimation method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a preparation process for a plurality of face-missing images according to an embodiment of the present disclosure;
figs. 3 and 4 are illustrations of pose estimation effects according to an embodiment of the present disclosure;
fig. 5 is an illustration of a face filling effect provided by an embodiment of the present disclosure;
fig. 6 is an illustration of a process for acquiring a plurality of face-filling images according to an embodiment of the present disclosure;
fig. 7 is an illustration of a face-missing-area calculation process provided by an embodiment of the present disclosure;
fig. 8 is an illustration of an application of a hairstyle pose estimation result according to an embodiment of the present disclosure;
fig. 9 is a schematic view of a scenario of the pose estimation method according to an embodiment of the present disclosure;
fig. 10 is a schematic block diagram of a pose estimation apparatus according to an embodiment of the present disclosure;
fig. 11 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Existing hair reconstruction methods generally reconstruct the hair using a plurality of hairstyle face images that correspond to different viewing angles, obtain a hairstyle pose estimation result for each hairstyle face image, and map the hair reconstruction result onto a three-dimensional avatar according to the hairstyle pose estimation results. However, due to portrait-rights and similar concerns, the faces in hairstyle face images are often missing, which affects the accuracy of the hairstyle pose estimation results.
Based on the above background, embodiments of the present disclosure provide a pose estimation method, which may be applied to an electronic device. In the following, the pose estimation method provided by the embodiments of the present disclosure is described with reference to the flowchart shown in fig. 1. It should be noted that, although the flowchart illustrates a logical order, in some cases the illustrated or described steps may be performed in other orders.
Step S101: acquiring a plurality of face-missing images of a target object, wherein the target object has a hairstyle area and the plurality of face-missing images respectively correspond to different viewing angles;
Step S102: performing face filling on the plurality of face-missing images to obtain a plurality of face-filling images, wherein the face-filling images are in one-to-one correspondence with the face-missing images;
Step S103: for each face-filling image, obtaining a pose estimation result corresponding to the hairstyle area in the face-filling image based on the face area of the face-filling image.
The target object may be a person having a hairstyle area, where the hairstyle area may be the region in three-dimensional space where the target object's hair is located.
In addition, referring to fig. 2, in an embodiment of the disclosure, the plurality of face-missing images may be prepared as follows: original images of the target object 202 are acquired from a plurality of viewpoints by changing the pose of the camera 201, yielding a plurality of original images that include a plurality of second-perspective images and a plurality of first-perspective images; then, for each second-perspective image, the face area in the second-perspective image is matted out to obtain the corresponding face-missing image, finally yielding the plurality of face-missing images. The camera 201 may be an independent image capturing apparatus, or an image capturing component mounted on a smartphone, a wearable device, or the like; the target object 202 may remain in a sitting or standing position. The plurality of second-perspective images may include original images whose face area is larger than a preset area, such as a front image, a right-front image, a left-front image, a right-side image, and a left-side image of the target object 202; the plurality of first-perspective images may include original images whose face area is smaller than or equal to the preset area, such as a back image, a right-back image, and a left-back image of the target object 202. The preset area may be set according to actual requirements, which is not limited in the embodiments of the present disclosure.
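As an illustrative, non-authoritative sketch of the matting step only (the patent does not prescribe an implementation; the function name and the source of face_mask, e.g., an upstream face parser or manual annotation, are assumptions here):

    import numpy as np

    def make_face_missing_image(image_bgr: np.ndarray, face_mask: np.ndarray) -> np.ndarray:
        """Matte the face area out of a second-perspective image.

        face_mask is a uint8 binary mask (nonzero inside the face area);
        the hairstyle area is left untouched.
        """
        out = image_bgr.copy()
        out[face_mask > 0] = 0  # blank the face pixels; any fill color would do
        return out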
After the plurality of face-missing images have been prepared, they are acquired when step S101 is performed.
Thereafter, face filling may be performed on the plurality of face-missing images to obtain a plurality of face-filling images. In a specific example, for each face-missing image, a face filling algorithm may be used to fill the face and obtain the corresponding face-filling image. The face filling algorithm may be the local redrawing (Inpaint) method of the Stable Diffusion algorithm, with the prompt word (Prompt) set to "face".
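For illustration only, a minimal sketch of such an inpainting call using the open-source diffusers library follows; the checkpoint name, file names, and use of a GPU are assumptions, not part of the disclosed embodiments:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # Checkpoint is an assumption; any Stable Diffusion inpainting weights work.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    face_missing = Image.open("face_missing.png").convert("RGB")
    face_mask = Image.open("face_mask.png").convert("L")  # white = region to redraw

    # Prompt word set to "face", mirroring the text above.
    face_filled = pipe(prompt="face", image=face_missing, mask_image=face_mask).images[0]
    face_filled.save("face_filled.png")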
After the plurality of face-filling images are obtained, for each face-filling image, a pose estimation result corresponding to the hairstyle area in the face-filling image may be obtained based on the face area of the face-filling image. This result may be the face pose in the face-filling image.
With the pose estimation method provided by the embodiments of the present disclosure, after the plurality of face-missing images of the target object are acquired, pose estimation results are not obtained directly from the face-missing images. Instead, the face-missing images are face-filled to obtain a plurality of face-filling images in one-to-one correspondence with them, and the pose estimation result corresponding to the hairstyle area in each face-filling image is then obtained based on the face area of that image. When the face area is complete, the face-filling image offers richer referenceable features (e.g., face key points); obtaining the pose estimation result corresponding to the hairstyle area from the face area of the face-filling image therefore improves the accuracy of that result.
Referring to fig. 3: if, after acquiring the plurality of face-missing images of the target object, the pose estimation result corresponding to the hairstyle area were obtained directly from each face-missing image, the missing face area would provide no rich referenceable features, so the face pose could not be accurately predicted (it would differ substantially from the actual face pose), ultimately harming the accuracy of the hairstyle pose estimation result. Referring to fig. 4: with the pose estimation method provided by the embodiments of the present disclosure, the plurality of face-missing images are first face-filled to obtain a plurality of face-filling images; then, for each face-filling image, the complete face area provides richer referenceable features, so the face pose in the face-filling image can be accurately predicted (it is substantially the same as the actual pose). This face pose is used as the pose estimation result corresponding to the hairstyle area, improving the accuracy of that result.
In some alternative embodiments, "face filling a plurality of face-missing images to obtain a plurality of face-filled images" may include the steps of:
selecting a first missing image from the plurality of face-missing images;
performing face filling on the first missing image to obtain a first filling image; and
obtaining the plurality of face-filling images using the first filling image.
The first missing image may be a face-missing image whose face-missing area is relatively large among the plurality of face-missing images. For example, the first missing image may be any one of the three face-missing images with the largest face-missing areas; as another example, it may be the single face-missing image with the largest face-missing area, i.e., the face-missing image obtained by matting the face area out of the front image of the target object.
Referring to fig. 5, in an embodiment of the present disclosure, after the first missing image is selected, a face filling algorithm may be used to fill its face and obtain the first filling image (in fig. 5, the face area of the first filling image is a synthesized face that does not actually exist). The face filling algorithm may again be the local redrawing (Inpaint) method of the Stable Diffusion algorithm, with the prompt word (Prompt) set to "face".
Through the above steps, in the embodiments of the present disclosure, the first missing image can be selected from the plurality of face-missing images and directly face-filled to obtain the first filling image, which then serves as a filling reference image for obtaining the plurality of face-filling images. This simplifies the process of obtaining the face-filling images and thereby improves the execution efficiency of the pose estimation method.
In some alternative embodiments, "obtaining multiple face fills with the first fill image" may include the steps of:
taking the first filling image as a filling reference image and selecting a second missing image from at least one remaining image, where the remaining images are the as-yet-unselected face-missing images among the plurality of face-missing images;
obtaining a second filling image corresponding to the second missing image based on the face area of the filling reference image, and taking the second filling image as the new filling reference image; and
obtaining the plurality of face-filling images based on the first filling image and the plurality of second filling images.
Referring to fig. 6, in the embodiments of the present disclosure, after the first filling image is taken as the filling reference image, the following steps may be performed in a loop until no unselected face-missing image remains among the plurality of face-missing images: select a second missing image from the at least one remaining image (the remaining images being the unselected face-missing images); obtain a second filling image corresponding to the second missing image based on the face area of the filling reference image; and take the second filling image as the new filling reference image. The second missing image may be any remaining image, or the remaining image whose viewing angle is adjacent to that of the filling reference image; the embodiments of the present disclosure do not limit this.
In a specific example, "obtaining a second fill image corresponding to a second missing image based on the face region of the fill reference image" may include: and deforming the facial region of the filling reference image to the facial missing region of the second missing image to obtain a second filling image corresponding to the second missing image. For example, the face region filling the reference image may be deformed to the face missing region of the second missing image using an affine transformation function (warp).
Through the above steps, after the first missing image is selected and face-filled, the resulting first filling image is taken as the filling reference image; a second missing image is then selected from the at least one remaining image, and the corresponding second filling image is obtained based on the face area of the filling reference image and becomes the new filling reference image. Because any two consecutively processed face-filling images are feature-correlated, this simplifies the process of obtaining the plurality of face-filling images compared with running a face filling algorithm independently on every face-missing image, thereby improving the execution efficiency of the pose estimation method.
In some alternative embodiments, the "selecting the first missing image from the plurality of face missing images" may include the steps of:
acquiring a face missing area of each face missing image;
and selecting a face missing image with the largest face missing area from the plurality of face missing images as a first missing image.
The face missing area of each face missing image may be labeled in advance and stored.
Through the above steps, the face-missing area of each face-missing image may be obtained, and the face-missing image with the largest face-missing area may be selected as the first missing image. Because the first missing image has the largest face-missing area, face-filling it yields the first filling image with the most complete face features; using this image as the initial filling reference image to obtain the second filling images corresponding to the other face-missing images improves their face filling effect, which in turn further improves the accuracy of the hairstyle pose estimation result.
In some alternative embodiments, "acquiring a face deletion area of each face deletion image" may include the steps of:
acquiring a plurality of labeling corner points of the face missing image aiming at each face missing image;
generating a face deletion mask of the face deletion image based on the plurality of labeled corner points;
the mask area of the face deletion mask is calculated as the face deletion area of the face deletion image.
The plurality of labeled corner points of the face-missing image may be labeled in advance and stored. In the embodiments of the present disclosure, the labeled corner points may be the corner points of the face-missing area of the face-missing image.
Referring to fig. 7, in the embodiments of the present disclosure, after the plurality of labeled corner points 701 are acquired, any two adjacent labeled corner points may be connected to generate the face-missing mask of the face-missing image, and the mask area of the face-missing mask is then calculated as the face-missing area of the face-missing image. In a specific example, the mask area of the face-missing mask may be calculated by the Monte Carlo method (MCM).
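A minimal sketch of such a Monte Carlo estimate follows, assuming the labeled corner points are available as an ordered (N, 2) array; the sampling count is an arbitrary choice:

    import numpy as np
    from matplotlib.path import Path

    def monte_carlo_mask_area(corners: np.ndarray, n_samples: int = 100_000) -> float:
        """Estimate the area of the polygon spanned by the labeled corner points."""
        poly = Path(corners)                      # adjacent corners are connected
        lo, hi = corners.min(axis=0), corners.max(axis=0)
        samples = np.random.uniform(lo, hi, size=(n_samples, 2))
        hit_ratio = poly.contains_points(samples).mean()
        bbox_area = float(np.prod(hi - lo))       # area of the sampling box
        return hit_ratio * bbox_area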
Through the above steps, for each face-missing image, a plurality of labeled corner points can be obtained; a face-missing mask is generated from them; and the mask area of that mask is calculated as the face-missing area. Because the mask area is actually computed rather than assumed, the reliability of the face-missing area improves, which improves the selection accuracy of the first missing image and, in turn, the accuracy of the hairstyle pose estimation result.
In some alternative embodiments, the "selecting the second missing image from the at least one remaining image" may include the steps of:
and selecting a residual image adjacent to the filling reference image visual angle from at least one residual image as a second missing image.
The viewing angle of each remaining image may be characterized by a pose prediction result obtained by predicting the pose of the camera at the time the remaining image (i.e., its corresponding original image) was captured; likewise, the viewing angle of the filling reference image may be characterized by a pose prediction result for the camera at the time the filling reference image (i.e., its corresponding original image) was captured. Each pose prediction result can be obtained using three-dimensional reconstruction software such as COLMAP or OpenMVS.
Based on this, in a specific example, "selecting, from the at least one remaining image, a remaining image whose viewing angle is adjacent to that of the filling reference image as the second missing image" may include: acquiring a first camera pose corresponding to the filling reference image; acquiring a second camera pose corresponding to each remaining image; for each remaining image, computing the pose difference between its second camera pose and the first camera pose, yielding at least one pose difference; and selecting a remaining image with a relatively small pose difference as the second missing image. For example, the second missing image may be any one of the three remaining images with the smallest pose differences, or the single remaining image with the smallest pose difference.
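One common way to score such a pose difference is the geodesic angle between the camera rotations, sketched below under the assumption that 3x3 rotation matrices (e.g., from COLMAP) are available; the function names are illustrative:

    import numpy as np

    def rotation_geodesic_deg(R1: np.ndarray, R2: np.ndarray) -> float:
        """Angle (degrees) of the relative rotation between two camera poses."""
        cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

    def pick_second_missing(reference_R: np.ndarray, remaining: dict) -> str:
        """remaining maps image id -> camera rotation; returns the nearest view."""
        return min(remaining, key=lambda k: rotation_geodesic_deg(reference_R, remaining[k]))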
Through the above steps, a remaining image whose viewing angle is adjacent to that of the filling reference image may be selected as the second missing image. Because the two viewing angles are adjacent, the face filling result for the second missing image should, in theory, be substantially similar to the face area of the filling reference image; obtaining the second filling image from that face area therefore improves its face filling effect and further improves the accuracy of the hairstyle pose estimation result.
In some alternative embodiments, "selecting a residual image adjacent to the fill reference image perspective from at least one residual image as the second missing image" may include the steps of:
acquiring a first camera pose corresponding to a filling reference image;
acquiring a second camera pose corresponding to each residual image;
for each residual image, acquiring a pose difference between a second camera pose corresponding to the residual image and a first camera pose so as to acquire at least one pose difference;
and selecting the residual image with the minimum pose difference from at least one residual image as a second missing image.
The remaining image with the smallest pose difference is the remaining image whose viewing angle is closest to that of the filling reference image.
Through the above steps, the remaining image whose viewing angle is closest to that of the filling reference image may be selected as the second missing image. Because the viewing angles are closest, the face filling result for the second missing image should, in theory, be substantially similar to the face area of the filling reference image, so obtaining the second filling image from that face area further improves its face filling effect and, in turn, the accuracy of the hairstyle pose estimation result.
In some alternative embodiments, "obtaining a second fill image corresponding to a second missing image based on filling the face region of the reference image" may include the steps of:
obtaining a first image to be fused corresponding to the second missing image based on filling the face region of the reference image;
face filling is carried out on the second missing image, and a second image to be fused is obtained;
And fusing the first image to be fused and the second image to be fused to obtain a second filling image corresponding to the second missing image.
In a specific example, "obtaining a first image to be fused corresponding to a second missing image based on filling a face region of a reference image" may include: and deforming the facial region filled with the reference image to the facial missing region of the second missing image to obtain a first image to be fused corresponding to the second missing image. For example, the face region filling the reference image may be deformed to the face missing region of the second missing image using an affine transformation function (warp).
In a specific example, a face filling algorithm may be used to fill the face of the second missing image and obtain the second image to be fused. The face filling algorithm may again be the local redrawing (Inpaint) method of the Stable Diffusion algorithm, with the prompt word (Prompt) set to "face".
Through the above steps, a first image to be fused is obtained from the face area of the filling reference image; a second image to be fused is obtained by face-filling the second missing image; and the two are fused into the second filling image corresponding to the second missing image. Because the second filling image combines a warped reference face with a freshly filled face, its face filling effect improves, which further improves the accuracy of the hairstyle pose estimation result.
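The text does not fix a particular fusion operator; a Gaussian-feathered alpha blend is one simple choice, sketched below with assumed inputs and feather width:

    import cv2
    import numpy as np

    def fuse_images(first_to_fuse_bgr, second_to_fuse_bgr, missing_mask, feather: int = 15):
        """Blend the warped-prior image with the inpainted image inside the hole."""
        k = 2 * feather + 1
        alpha = cv2.GaussianBlur(missing_mask.astype(np.float32) / 255.0, (k, k), 0)[..., None]
        blended = alpha * first_to_fuse_bgr.astype(np.float32) \
                + (1.0 - alpha) * second_to_fuse_bgr.astype(np.float32)
        return blended.astype(np.uint8)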
In some alternative embodiments, "obtaining a first image to be fused corresponding to a second missing image based on filling the face region of the reference image" may include the steps of:
deforming the face region filled with the reference image to a face missing region of the second missing image to obtain an initial prior image;
and carrying out optimization processing on the initial prior image to obtain a first image to be fused corresponding to the second missing image.
In a specific example, the initial prior image may be obtained by deforming the face area of the filling reference image onto the face-missing area of the second missing image using an affine transformation function (warp).
In a specific example, after the initial prior image is obtained, an image optimization algorithm may be used to optimize it into the first image to be fused corresponding to the second missing image. The image optimization algorithm may be the Image2Image method of the Stable Diffusion algorithm.
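A minimal sketch of this optimization step with the diffusers img2img pipeline follows; the checkpoint, file name, and strength value are assumptions (a low strength keeps the warped prior's layout while cleaning up warping artifacts):

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    initial_prior = Image.open("initial_prior.png").convert("RGB")
    first_to_fuse = pipe(prompt="face", image=initial_prior, strength=0.3).images[0]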
Through the above steps, the face area of the filling reference image can be deformed onto the face-missing area of the second missing image to obtain an initial prior image, which is then optimized into the first image to be fused. This improves the image quality of the first image to be fused, further improving the face filling effect of the second filling image and, in turn, the accuracy of the hairstyle pose estimation result.
In some optional embodiments, "obtaining a pose estimation result corresponding to a hairstyle area in a face-filling image based on a face area in the face-filling image" may include the steps of:
determining a plurality of face keypoints from a face region of a face-filling image;
determining a target camera pose corresponding to the face fill image based on the plurality of face keypoints;
and obtaining a pose estimation result corresponding to the hairstyle area in the face filling image according to the pose of the target camera.
The plurality of facial key points may include key points corresponding to specific positions such as facial contours, eyes, eyebrows, lips, and nose contours.
After the plurality of face key points are determined, their position information can be obtained; the target camera pose corresponding to the face-filling image is then determined from this information, and the face pose in the face-filling image is predicted from the target camera pose and used as the pose estimation result corresponding to the hairstyle area in the face-filling image.
In the embodiments of the present disclosure, the above steps may be implemented by a facial motion capture algorithm (Mocap), which is not described in detail here.
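For illustration, the key-point-to-camera-pose step could be realized with a PnP solver, as sketched below; the generic 3D face model points and the intrinsic matrix K are assumptions, and the disclosure itself does not prescribe this solver:

    import cv2
    import numpy as np

    def estimate_target_camera_pose(landmarks_2d, model_points_3d, K):
        """Recover the target camera pose from detected face key points.

        landmarks_2d (N x 2): detected key points (eyes, brows, lips, contours);
        model_points_3d (N x 3): corresponding points of a generic 3D face model;
        K: 3x3 camera intrinsic matrix.
        """
        ok, rvec, tvec = cv2.solvePnP(model_points_3d.astype(np.float32),
                                      landmarks_2d.astype(np.float32),
                                      K, distCoeffs=None)
        R, _ = cv2.Rodrigues(rvec)  # rotation matrix of the target camera pose
        return R, tvec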
Through the above steps, a plurality of face key points may be determined from the face area of the face-filling image; the target camera pose corresponding to the face-filling image is determined from them; and the target camera pose is used to derive the pose estimation result corresponding to the hairstyle area in the face-filling image. Because face key points have strong feature representation capability, the target camera pose can be determined accurately, which further improves the accuracy of the hairstyle pose estimation result.
In some optional embodiments, the pose estimation method may further include the steps of:
acquiring a plurality of first-perspective images of the target object, wherein the first-perspective images respectively correspond to different viewing angles;
for each first-perspective image, selecting a pose reference image related to the first-perspective image from the plurality of face-filling images; and
obtaining a pose estimation result corresponding to the hairstyle area in the first-perspective image based on the pose estimation result corresponding to the hairstyle area in the pose reference image.
In a specific example, for each first-perspective image, a face-filling image whose viewing angle is adjacent to that of the first-perspective image may be selected from the plurality of face-filling images as its pose reference image. This selection may include: acquiring a third camera pose corresponding to the first-perspective image; acquiring a fourth camera pose corresponding to each face-filling image; for each face-filling image, computing the pose difference between its fourth camera pose and the third camera pose, yielding at least one pose difference; and selecting a face-filling image with a relatively small pose difference as the pose reference image. For example, the pose reference image may be any one of the three face-filling images with the smallest pose differences, or the single face-filling image with the smallest pose difference.
In a specific example, after the pose reference image is selected for a first-perspective image, a fifth camera pose corresponding to the pose reference image may be determined; the pose estimation result corresponding to the hairstyle area in the first-perspective image is then obtained from the pose estimation result of the hairstyle area in the pose reference image, according to the pose conversion relationship between the third camera pose and the fifth camera pose.
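One conventional reading of that conversion is sketched below with 4x4 homogeneous transforms; the composition convention and function name are assumptions introduced for illustration:

    import numpy as np

    def transfer_hairstyle_pose(T_ref_pose, T_cam_ref, T_cam_first):
        """Carry a hairstyle pose estimate from the reference view to a first-perspective view.

        T_ref_pose: pose estimate in the reference (fifth-camera) view;
        T_cam_ref, T_cam_first: camera poses of the two images (all 4x4).
        """
        T_rel = T_cam_first @ np.linalg.inv(T_cam_ref)  # reference view -> first view
        return T_rel @ T_ref_pose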
Through the above steps, a plurality of first-perspective images of the target object may be acquired, each corresponding to a different viewing angle; for each, a related pose reference image is selected from the plurality of face-filling images, and the hairstyle pose estimation result for the first-perspective image is obtained from the hairstyle pose estimation result of the pose reference image. In other words, an accurate hairstyle pose estimation result can also be obtained for first-perspective images, which improves the usability of the pose estimation method.
Referring to fig. 8, after the pose estimation result corresponding to the hairstyle area in each face-filling image is obtained, a first relative pose relationship between the hairstyle area in that face-filling image and the three-dimensional avatar may further be obtained. Then, after hair reconstruction is performed based on that face-filling image (or its corresponding original image), the first hair reconstruction result may be mapped onto the three-dimensional avatar according to the first relative pose relationship.
Likewise, after the pose estimation result corresponding to the hairstyle area in each first-perspective image is obtained, a second relative pose relationship between the hairstyle area in that first-perspective image and the three-dimensional avatar may further be obtained. Then, after hair reconstruction is performed based on that first-perspective image, the second hair reconstruction result may be mapped onto the three-dimensional avatar according to the second relative pose relationship.
Hereinafter, a complete flow of the pose estimation method provided by the embodiments of the present disclosure is described.
Acquire a plurality of face-missing images of a target object, wherein the target object has a hairstyle area and the plurality of face-missing images respectively correspond to different viewing angles;
acquire the face-missing area of each face-missing image, select the face-missing image with the largest face-missing area as the first missing image, and perform face filling on the first missing image to obtain a first filling image;
taking the first filling image as the filling reference image, perform the following steps in a loop: select, from the at least one remaining image (the unselected face-missing images among the plurality of face-missing images), the remaining image whose viewing angle is adjacent to that of the filling reference image as the second missing image; deform the face area of the filling reference image onto the face-missing area of the second missing image to obtain an initial prior image; optimize the initial prior image to obtain a first image to be fused; perform face filling on the second missing image to obtain a second image to be fused; and fuse the two images to obtain the second filling image corresponding to the second missing image;
repeat the loop with the second filling image as the new filling reference image until no unselected face-missing image remains, and obtain the plurality of face-filling images from the first filling image and the plurality of second filling images;
for each face-filling image, determine a plurality of face key points from its face area, determine the corresponding target camera pose from them, and obtain the pose estimation result corresponding to the hairstyle area according to the target camera pose;
acquire a plurality of first-perspective images of the target object; for each first-perspective image, select a related pose reference image from the plurality of face-filling images and obtain the pose estimation result corresponding to the hairstyle area in the first-perspective image from the pose estimation result corresponding to the hairstyle area in the pose reference image.
Fig. 9 is a schematic application scenario diagram of a pose estimation method according to an embodiment of the present disclosure.
As described above, the pose estimation method provided by the embodiment of the present disclosure is applied to an electronic device. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices.
The electronic device may be configured to:
acquiring a plurality of face-missing images of a target object, wherein the target object has a hairstyle area and the plurality of face-missing images respectively correspond to different viewing angles;
performing face filling on the plurality of face-missing images to obtain a plurality of face-filling images, wherein the face-filling images are in one-to-one correspondence with the face-missing images; and
for each face-filling image, obtaining a pose estimation result corresponding to the hairstyle area in the face-filling image based on the face area of the face-filling image.
In an embodiment of the disclosure, the plurality of face-missing images may be prepared by: acquiring original images of the target object from multiple viewpoints by changing the camera pose, yielding a plurality of original images that include a plurality of second-perspective images and a plurality of first-perspective images; and, for each second-perspective image, matting out its face area to obtain the corresponding face-missing image, finally yielding the plurality of face-missing images.
It should be noted that, in the embodiment of the present disclosure, the schematic view of the scenario shown in fig. 9 is merely illustrative and not restrictive, and those skilled in the art may make various obvious changes and/or substitutions based on the example of fig. 9, and the obtained technical solution still falls within the scope of the embodiment of the present disclosure.
In order to better implement the pose estimation method, an embodiment of the present disclosure also provides a pose estimation apparatus 1000, which may be applied to an electronic device. Hereinafter, the pose estimation apparatus 1000 is described with reference to the schematic structural diagram shown in fig. 10.
The pose estimation device 1000 includes:
an image acquisition unit 1001 configured to acquire a plurality of face-missing images of a target object, wherein the target object has a hairstyle area and the plurality of face-missing images respectively correspond to different viewing angles;
a face filling unit 1002 configured to perform face filling on the plurality of face-missing images to obtain a plurality of face-filling images, wherein the face-filling images are in one-to-one correspondence with the face-missing images; and
a first pose estimation unit 1003 configured to obtain, for each face-filling image, a pose estimation result corresponding to the hairstyle area in the face-filling image based on the face area of the face-filling image.
In some alternative embodiments, the face filling unit 1002 is configured to:
select a first missing image from the plurality of face-missing images;
perform face filling on the first missing image to obtain a first filling image; and
obtain the plurality of face-filling images using the first filling image.
In some alternative embodiments, the face filling unit 1002 is configured to:
take the first filling image as a filling reference image and select a second missing image from at least one remaining image, where the remaining images are the unselected face-missing images among the plurality of face-missing images;
obtain a second filling image corresponding to the second missing image based on the face area of the filling reference image, and take the second filling image as the new filling reference image; and
obtain the plurality of face-filling images based on the first filling image and the plurality of second filling images.
In some alternative embodiments, the face filling unit 1002 is configured to:
acquire the face-missing area of each face-missing image; and
select, from the plurality of face-missing images, the face-missing image with the largest face-missing area as the first missing image.
In some alternative embodiments, the face filling unit 1002 is configured to:
for each face-missing image, acquire a plurality of labeled corner points of the face-missing image;
generate a face-missing mask of the face-missing image based on the plurality of labeled corner points; and
calculate the mask area of the face-missing mask as the face-missing area of the face-missing image.
In some alternative embodiments, the face filling unit 1002 is configured to:
select, from the at least one remaining image, a remaining image whose viewing angle is adjacent to that of the filling reference image as the second missing image.
In some alternative embodiments, the face filling unit 1002 is configured to:
acquire a first camera pose corresponding to the filling reference image;
acquire a second camera pose corresponding to each remaining image;
for each remaining image, compute the pose difference between its second camera pose and the first camera pose, yielding at least one pose difference; and
select, from the at least one remaining image, the remaining image with the smallest pose difference as the second missing image.
In some alternative embodiments, the face filling unit 1002 is configured to:
obtain a first image to be fused corresponding to the second missing image based on the face area of the filling reference image;
perform face filling on the second missing image to obtain a second image to be fused; and
fuse the first image to be fused with the second image to be fused to obtain the second filling image corresponding to the second missing image.
In some alternative embodiments, the face filling unit 1002 is configured to:
deform the face area of the filling reference image onto the face-missing area of the second missing image to obtain an initial prior image; and
perform optimization processing on the initial prior image to obtain the first image to be fused corresponding to the second missing image.
In some alternative embodiments, the first pose estimation unit 1003 is configured to:
determine a plurality of face key points from the face area of the face-filling image;
determine a target camera pose corresponding to the face-filling image based on the plurality of face key points; and
determine the pose estimation result corresponding to the hairstyle area in the face-filling image according to the target camera pose.
In some alternative embodiments, the pose estimation device 1000 further comprises a second pose estimation unit for:
acquire a plurality of first-perspective images of the target object, wherein the first-perspective images respectively correspond to different viewing angles;
for each first-perspective image, select a pose reference image related to the first-perspective image from the plurality of face-filling images; and
obtain a pose estimation result corresponding to the hairstyle area in the first-perspective image based on the pose estimation result corresponding to the hairstyle area in the pose reference image.
Descriptions of the specific functions and examples of each unit of the pose estimation apparatus 1000 may be found in the descriptions of the corresponding steps in the above method embodiments and are not repeated here.
In the technical solution of the present disclosure, the acquisition, storage, and application of the user personal information involved (for example, the original images of the target object) all comply with the relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 11 illustrates a schematic block diagram of an example electronic device 1100 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the apparatus 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a random access Memory (Random Access Memory, RAM) 1103. In the RAM 1103, various programs and data required for the operation of the device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An Input/Output (I/O) interface 1105 is also connected to bus 1104.
Various components in the device 1100 are connected to the I/O interface 1105, including: an input unit 1106 such as a keyboard or a mouse; an output unit 1107 such as various types of displays and speakers; a storage unit 1108 such as a magnetic disk or an optical disk; and a communication unit 1109 such as a network card, a modem, or a wireless communication transceiver. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1101 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processors, controllers, and microcontrollers. The computing unit 1101 performs the methods and processes described above, for example the pose estimation method. For example, in some embodiments, the pose estimation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the pose estimation method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the pose estimation method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor and can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-Only Memory (EPROM) or flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other types of devices may also be used to provide interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
The disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the pose estimation method described above.
The disclosed embodiments also provide a computer program product comprising a computer program which, when executed by a processor, implements the pose estimation method described above.
It should be appreciated that steps in the various flows shown above may be reordered, added, or deleted. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed in this respect. Moreover, in this disclosure, relational terms such as "first," "second," and "third" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between the entities or actions. Furthermore, "plurality" in the present disclosure may be understood as at least two.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations, and alternatives are possible depending on design requirements and other factors. Any modification, equivalent substitution, improvement, or the like made within the principles of the present disclosure shall be included within the scope of protection of the present disclosure.

Claims (25)

1. A pose estimation method, comprising:
acquiring a plurality of face missing images of a target object; wherein the target object has a hairstyle area, and the plurality of face missing images respectively correspond to different view angles;
performing face filling on the plurality of face missing images to obtain a plurality of face filling images; wherein the plurality of face filling images are in one-to-one correspondence with the plurality of face missing images;
and for each of the face filling images, obtaining, based on a face area of the face filling image, a pose estimation result corresponding to the hairstyle area in the face filling image.
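By way of illustration only, the overall flow of claim 1 can be sketched in a few lines of Python. The helpers fill_face and estimate_hair_pose_from_face are hypothetical placeholders for the filling step detailed in claims 2 to 9 and the estimation step of claim 10; neither name comes from the patent.

```python
# Minimal sketch of the claimed flow, not the patent's implementation.
# fill_face() and estimate_hair_pose_from_face() are hypothetical helpers.
def estimate_hair_poses(face_missing_images):
    # Face filling: one face filling image per face missing image (one-to-one).
    face_filling_images = [fill_face(img) for img in face_missing_images]
    # Per view: derive the hairstyle-area pose from the filled face area.
    return [estimate_hair_pose_from_face(img) for img in face_filling_images]
```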
2. The method of claim 1, wherein the performing face filling on the plurality of face missing images to obtain a plurality of face filling images comprises:
selecting a first missing image from the plurality of face missing images;
performing face filling on the first missing image to obtain a first filling image;
and obtaining the plurality of face filling images by using the first filling image.
3. The method of claim 2, wherein the obtaining the plurality of face filling images by using the first filling image comprises:
taking the first filling image as a filling reference image, and selecting a second missing image from at least one remaining image; wherein the remaining images are the unselected face missing images of the plurality of face missing images;
obtaining a second filling image corresponding to the second missing image based on the face area of the filling reference image, and taking the second filling image as a new filling reference image;
and obtaining the plurality of face filling images based on the first filling image and the obtained second filling images.
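Claims 2 and 3 describe an iterative chain: fill one image first, then let each newly filled image serve as the reference for the next view. A sketch follows, where select_largest_missing, select_adjacent_view, inpaint, and fill_from_reference are hypothetical stand-ins for the operations of claims 4 to 9:

```python
def fill_all_views(face_missing_images):
    remaining = list(face_missing_images)
    # Claim 4: start from the image with the largest face missing area.
    first = select_largest_missing(remaining)
    remaining.remove(first)
    reference = inpaint(first)  # the first filling image (claim 2)
    filled = [reference]
    while remaining:
        # Claims 6-7: pick the remaining image adjacent in view angle.
        nxt = select_adjacent_view(reference, remaining)
        remaining.remove(nxt)
        # Claims 8-9: fill it using the current reference's face area,
        # and let the result become the new filling reference image.
        reference = fill_from_reference(reference, nxt)
        filled.append(reference)
    return filled
```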
4. The method of claim 2, wherein the selecting a first missing image from the plurality of face missing images comprises:
acquiring a face missing area of each face missing image;
and selecting a face missing image with the largest face missing area from the plurality of face missing images as the first missing image.
5. The method of claim 4, wherein the acquiring a face missing area of each face missing image comprises:
for each face missing image, acquiring a plurality of labeled corner points of the face missing image;
generating a face missing mask of the face missing image based on the plurality of labeled corner points;
and calculating a mask area of the face missing mask as the face missing area of the face missing image.
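A direct reading of claims 4 and 5 with OpenCV: rasterize the labeled corner points into a binary mask and compare pixel counts across images. The per-image pairing of shape and corners shown in the final comment is an assumed data layout, not taken from the patent.

```python
import cv2
import numpy as np

def face_missing_area(image_shape, corner_points):
    # Claim 5: build the face missing mask from the labeled corner points,
    # then use its pixel count as the face missing area.
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    polygon = np.asarray(corner_points, dtype=np.int32).reshape(-1, 1, 2)
    cv2.fillPoly(mask, [polygon], 255)
    return int(np.count_nonzero(mask))

# Claim 4: the first missing image is the one with the largest area, e.g.
# first = max(items, key=lambda it: face_missing_area(it.shape, it.corners))
```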
6. The method of claim 3, wherein the selecting a second missing image from the at least one remaining image comprises:
selecting, from the at least one remaining image, a remaining image adjacent in view angle to the filling reference image as the second missing image.
7. The method of claim 6, wherein the selecting, from the at least one remaining image, a remaining image adjacent in view angle to the filling reference image as the second missing image comprises:
acquiring a first camera pose corresponding to the filling reference image;
acquiring a second camera pose corresponding to each remaining image;
for each remaining image, acquiring a pose difference between the second camera pose corresponding to the remaining image and the first camera pose, so as to obtain at least one pose difference;
and selecting, from the at least one remaining image, the remaining image with the smallest pose difference as the second missing image.
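Claims 6 and 7 rank the remaining images by camera pose difference. One plausible metric (an assumption; the claims do not fix the metric) is the geodesic angle between the two camera rotations:

```python
import numpy as np

def pose_difference(R_ref, R_other):
    # Geodesic angle between two camera rotation matrices, in radians.
    cos_theta = (np.trace(R_ref.T @ R_other) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def select_second_missing(reference_R, remaining_Rs):
    # Claim 7: the remaining image whose camera pose differs least from
    # the filling reference image's pose becomes the second missing image.
    diffs = [pose_difference(reference_R, R) for R in remaining_Rs]
    return int(np.argmin(diffs))
```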
8. The method of claim 3, wherein the obtaining a second filling image corresponding to the second missing image based on the face area of the filling reference image comprises:
obtaining a first image to be fused corresponding to the second missing image based on the face area of the filling reference image;
performing face filling on the second missing image to obtain a second image to be fused;
and fusing the first image to be fused and the second image to be fused to obtain a second filling image corresponding to the second missing image.
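Claim 8 fuses a prior derived from the reference's face area with an independently inpainted result. A simple mask-guided blend is one possible fusion rule; the patent may equally intend a learned or gradient-domain fusion, so treat this as a hedged sketch:

```python
import numpy as np

def fuse_images(first_to_fuse, second_to_fuse, missing_mask):
    # Take the reference-derived prior inside the face missing region
    # and the inpainted second image elsewhere (assumed fusion rule).
    m = (missing_mask > 0).astype(np.float32)[..., None]
    blended = m * first_to_fuse.astype(np.float32) \
            + (1.0 - m) * second_to_fuse.astype(np.float32)
    return blended.astype(np.uint8)
```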
9. The method of claim 8, wherein the obtaining a first image to be fused corresponding to the second missing image based on the face area of the filling reference image comprises:
deforming the face area of the filling reference image to the face missing area of the second missing image to obtain an initial prior image;
and performing optimization processing on the initial prior image to obtain the first image to be fused corresponding to the second missing image.
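Claim 9's "deforming" step can be realized, under the assumption that matched facial keypoints are available in both views, as a homography fit plus warp; the subsequent "optimization processing" is not reproduced here:

```python
import cv2
import numpy as np

def initial_prior(reference_image, ref_points, target_points, target_shape):
    # Fit a homography mapping the reference face onto the target view's
    # face missing area using matched keypoints, then warp the reference.
    H, _ = cv2.findHomography(np.float32(ref_points),
                              np.float32(target_points), cv2.RANSAC)
    h, w = target_shape[:2]
    return cv2.warpPerspective(reference_image, H, (w, h))
```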
10. The method of claim 1, wherein the obtaining a pose estimation result corresponding to the hairstyle area in the face filling image based on the face area of the face filling image comprises:
determining a plurality of face key points from the face area of the face filling image;
determining a target camera pose corresponding to the face filling image according to the plurality of face key points;
and obtaining, according to the target camera pose, a pose estimation result corresponding to the hairstyle area in the face filling image.
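Claim 10 maps face key points to a camera pose. Read as a perspective-n-point (PnP) problem, which is one standard way to do this but an assumption here, with a 3D face template and camera intrinsics K supplied from outside:

```python
import cv2
import numpy as np

def target_camera_pose(face_keypoints_2d, face_template_3d, K):
    # Solve PnP: 2D face key points against 3D template points yield
    # the target camera pose of claim 10 (needs at least 4 points).
    ok, rvec, tvec = cv2.solvePnP(np.float32(face_template_3d),
                                  np.float32(face_keypoints_2d), K, None)
    if not ok:
        raise RuntimeError("PnP did not converge")
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix from rotation vector
    return R, tvec

# The hairstyle-area pose then follows from (R, tvec), e.g. by expressing
# a fixed hair-to-head transform in the recovered camera frame.
```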
11. The method of claim 1, further comprising:
acquiring a plurality of first view images of the target object; wherein the plurality of first view images respectively correspond to different view angles;
for each first view image, selecting a pose reference image related to the first view image from the plurality of face filling images;
and obtaining a pose estimation result corresponding to the hairstyle area in the first view image based on the pose estimation result corresponding to the hairstyle area in the pose reference image.
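Claim 11 reuses the per-view results for new first view images. One hedged reading: pick the face filling image whose camera pose is nearest (reusing pose_difference from the claim 7 sketch) and carry its hairstyle pose over; camera_rotation and transfer_hair_pose are hypothetical helpers not named by the patent.

```python
def estimate_for_first_view(first_view_image, face_filling_images):
    # Select the pose reference image related to this first view image,
    # here by nearest camera rotation (an assumed selection criterion).
    reference = min(
        face_filling_images,
        key=lambda f: pose_difference(camera_rotation(first_view_image),
                                      camera_rotation(f)))
    # Derive the first view image's hairstyle pose from the reference's.
    return transfer_hair_pose(reference, first_view_image)
```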
12. A pose estimation device, comprising:
an image acquisition unit configured to acquire a plurality of face missing images of a target object; wherein the target object has a hairstyle area, and the plurality of face missing images respectively correspond to different view angles;
a face filling unit configured to perform face filling on the plurality of face missing images to obtain a plurality of face filling images; wherein the plurality of face filling images are in one-to-one correspondence with the plurality of face missing images;
and a first pose estimation unit configured to obtain, for each of the face filling images, a pose estimation result corresponding to the hairstyle area in the face filling image based on the face area of the face filling image.
13. The apparatus of claim 12, wherein the face filling unit is configured to:
select a first missing image from the plurality of face missing images;
perform face filling on the first missing image to obtain a first filling image;
and obtain the plurality of face filling images by using the first filling image.
14. The apparatus of claim 13, wherein the face filling unit is configured to:
take the first filling image as a filling reference image, and select a second missing image from at least one remaining image; wherein the remaining images are the unselected face missing images of the plurality of face missing images;
obtain a second filling image corresponding to the second missing image based on the face area of the filling reference image, and take the second filling image as a new filling reference image;
and obtain the plurality of face filling images based on the first filling image and the obtained second filling images.
15. The apparatus of claim 13, wherein the face filling unit is configured to:
acquire a face missing area of each face missing image;
and select a face missing image with the largest face missing area from the plurality of face missing images as the first missing image.
16. The apparatus of claim 15, wherein the face filling unit is configured to:
for each face missing image, acquire a plurality of labeled corner points of the face missing image;
generate a face missing mask of the face missing image based on the plurality of labeled corner points;
and calculate a mask area of the face missing mask as the face missing area of the face missing image.
17. The apparatus of claim 14, wherein the face filling unit is configured to:
select, from the at least one remaining image, a remaining image adjacent in view angle to the filling reference image as the second missing image.
18. The apparatus of claim 17, wherein the face filling unit is configured to:
acquire a first camera pose corresponding to the filling reference image;
acquire a second camera pose corresponding to each remaining image;
for each remaining image, acquire a pose difference between the second camera pose corresponding to the remaining image and the first camera pose, so as to obtain at least one pose difference;
and select, from the at least one remaining image, the remaining image with the smallest pose difference as the second missing image.
19. The apparatus of claim 14, wherein the face filling unit is configured to:
obtain a first image to be fused corresponding to the second missing image based on the face area of the filling reference image;
perform face filling on the second missing image to obtain a second image to be fused;
and fuse the first image to be fused and the second image to be fused to obtain a second filling image corresponding to the second missing image.
20. The apparatus of claim 19, wherein the face filling unit is configured to:
deform the face area of the filling reference image to the face missing area of the second missing image to obtain an initial prior image;
and perform optimization processing on the initial prior image to obtain the first image to be fused corresponding to the second missing image.
21. The apparatus of claim 12, wherein the first pose estimation unit is configured to:
determine a plurality of face key points from the face area of the face filling image;
determine a target camera pose corresponding to the face filling image according to the plurality of face key points;
and obtain, according to the target camera pose, a pose estimation result corresponding to the hairstyle area in the face filling image.
22. The apparatus of claim 12, further comprising a second pose estimation unit configured to:
acquire a plurality of first view images of the target object; wherein the plurality of first view images respectively correspond to different view angles;
for each first view image, select a pose reference image related to the first view image from the plurality of face filling images;
and obtain a pose estimation result corresponding to the hairstyle area in the first view image based on the pose estimation result corresponding to the hairstyle area in the pose reference image.
23. An electronic device, comprising:
at least one processor;
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 11.
24. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 11.
25. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 11.
CN202310804725.5A (priority 2023-06-30; filed 2023-06-30) · Pose estimation method and device, electronic equipment and storage medium · Legal status: Pending · Publication: CN116977417A (en)

Priority Applications (1)

Application Number: CN202310804725.5A · Priority Date: 2023-06-30 · Filing Date: 2023-06-30 · Title: Pose estimation method and device, electronic equipment and storage medium

Publications (1)

Publication Number: CN116977417A · Publication Date: 2023-10-31

Family

ID=88472151

Family Applications (1)

Application Number: CN202310804725.5A · Title: Pose estimation method and device, electronic equipment and storage medium · Status: Pending · Priority Date: 2023-06-30 · Filing Date: 2023-06-30

Country Status (1)

Country: CN · Publication: CN116977417A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication Number · Priority Date · Publication Date · Assignee · Title
CN112257657A * · 2020-11-11 · 2021-01-22 · NetEase (Hangzhou) Network Co., Ltd. · Face image fusion method and device, storage medium and electronic equipment
US20210201458A1 * · 2019-08-28 · 2021-07-01 · Beijing Sensetime Technology Development Co., Ltd. · Face image processing method and apparatus, image device, and storage medium
CN114519881A * · 2022-02-11 · 2022-05-20 · Shenzhen Jizhi Digital Technology Co., Ltd. · Face pose estimation method and device, electronic equipment and storage medium
CN115861525A * · 2022-05-31 · 2023-03-28 · Fuzhou University · Multi-view face reconstruction method based on parameterized model
CN116188574A * · 2023-01-29 · 2023-05-30 · Beijing Dajia Internet Information Technology Co., Ltd. · Face image processing method and device, electronic equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AJIAN LIU et al.: "Disentangling Facial Pose and Appearance Information for Face Anti-spoofing", IEEE, 25 August 2022 (2022-08-25) *

Similar Documents

Publication · Title
CN111598998A (en) Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium
CN113420719B (en) Method and device for generating motion capture data, electronic equipment and storage medium
CN115082639A (en) Image generation method and device, electronic equipment and storage medium
CN115409933B (en) Multi-style texture mapping generation method and device
CN114723888B (en) Three-dimensional hair model generation method, device, equipment, storage medium and product
US20220284678A1 (en) Method and apparatus for processing face information and electronic device and storage medium
CN113362263A (en) Method, apparatus, medium, and program product for changing the image of a virtual idol
CN112652057A (en) Method, device, equipment and storage medium for generating human body three-dimensional model
CN115375823B (en) Three-dimensional virtual clothing generation method, device, equipment and storage medium
CN113052962A (en) Model training method, information output method, device, equipment and storage medium
CN114708374A (en) Virtual image generation method and device, electronic equipment and storage medium
CN113379932A (en) Method and device for generating human body three-dimensional model
CN116342782A (en) Method and apparatus for generating avatar rendering model
CN113380269B (en) Video image generation method, apparatus, device, medium, and computer program product
CN114266693A (en) Image processing method, model generation method and equipment
CN113658035A (en) Face transformation method, device, equipment, storage medium and product
CN115965735B (en) Texture map generation method and device
CN115222895B (en) Image generation method, device, equipment and storage medium
CN116977417A (en) Pose estimation method and device, electronic equipment and storage medium
CN114638919A (en) Virtual image generation method, electronic device, program product and user terminal
CN114419253A (en) Construction and live broadcast method of cartoon face and related device
CN114529649A (en) Image processing method and device
CN114820908B (en) Virtual image generation method and device, electronic equipment and storage medium
CN115953553B (en) Avatar generation method, apparatus, electronic device, and storage medium
CN116206035B (en) Face reconstruction method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination