CN117726907A - Training method of modeling model, three-dimensional human modeling method and device - Google Patents

Training method of modeling model, three-dimensional human modeling method and device

Publication number: CN117726907A (granted as CN117726907B)
Application number: CN202410171192.6A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 王宏升, 林峰
Applicant and current assignee: Zhejiang Lab
Legal status: Active (application granted)
Prior art keywords: feature, human body, image, joint, vertex

Classifications

    • Y02T 10/40 - Engine management systems (full path as listed on the source page: Y general tagging of new technological developments → Y02T climate change mitigation technologies related to transportation → Y02T 10/00 road transport of goods or passengers → Y02T 10/10 internal combustion engine based vehicles → Y02T 10/40)

Landscapes

  • Image Analysis (AREA)

Abstract

A training method for a modeling model, a three-dimensional human modeling method, and a corresponding device are provided. A two-dimensional image of a human body is acquired; the image proxy features of the image are input into a first encoding layer to obtain vertex features, joint rotation features, and shape features, and the camera parameters of the camera that captured the image are input into the first encoding layer to obtain camera features. The image proxy features, vertex features, joint rotation features, shape features, and camera features are then input into a second encoding layer, which associates them to obtain an encoding feature. From the encoding feature, a three-dimensional human body model comprising a vertex coordinate set and a joint coordinate set is obtained, and the modeling model is trained by minimizing the deviation between the predicted vertex coordinate set and the actual vertex coordinate set, and between the predicted joint coordinate set and the actual joint coordinate set.

Description

Training method of modeling model, three-dimensional human modeling method and device
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to a training method for modeling models, a method and an apparatus for three-dimensional human modeling.
Background
In recent years, with the rapid development of computer vision, technologies such as virtual reality and augmented reality have advanced greatly, and the construction of three-dimensional human modeling models has received increasing attention.
At present, a three-dimensional human modeling model can be constructed by fitting the parameters of a preset model template. Alternatively, such a model can be learned from training data.
However, the accuracy of building a three-dimensional human modeling model from a preset model template is low, while building one from training data incurs high data cost and complexity, and in both cases the construction of the model is inefficient.
How to improve both the execution efficiency and the accuracy of building three-dimensional human modeling models is therefore a problem to be solved.
Disclosure of Invention
The present specification provides a training method of a modeling model, a three-dimensional human modeling method and a device, so as to partially solve the above-mentioned problems existing in the prior art.
The technical scheme adopted in the specification is as follows:
the specification provides a training method of a modeling model, comprising the following steps:
acquiring a two-dimensional image of a human body;
obtaining image proxy features for the two-dimensional human body image according to the image, wherein the image proxy features are used for characterizing the overall features of the two-dimensional human body image;
inputting the image proxy features into a first encoding layer in a modeling model to obtain vertex features, joint rotation features, and shape features for the two-dimensional human body image, and inputting the camera parameters of the camera that captured the image into the first encoding layer to obtain camera features;
inputting the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to a second encoding layer in the modeling model, so that the second encoding layer correlates the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to obtain an encoding feature;
obtaining a vertex coordinate set of each model vertex and a joint point coordinate set of each joint point contained in a three-dimensional human model corresponding to the human body two-dimensional image according to the coding characteristics;
training the modeling model by minimizing the deviation between the vertex coordinate set and the actual vertex coordinate set corresponding to the two-dimensional human body image, and minimizing the deviation between the joint coordinate set and the actual joint coordinate set corresponding to the image.
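The training objective in the last step can be sketched as follows. The patent only requires minimizing the two deviations; the use of squared-error terms, the loss weights, and the function name are illustrative assumptions.

```python
import numpy as np

def modeling_loss(pred_vertices, gt_vertices, pred_joints, gt_joints,
                  w_vertex=1.0, w_joint=1.0):
    """Weighted sum of the mean squared deviation between predicted and
    actual vertex coordinates and between predicted and actual joint
    coordinates (weights are hypothetical tuning parameters)."""
    vertex_term = np.mean(np.sum((pred_vertices - gt_vertices) ** 2, axis=-1))
    joint_term = np.mean(np.sum((pred_joints - gt_joints) ** 2, axis=-1))
    return w_vertex * vertex_term + w_joint * joint_term
```

In a real training loop this scalar would be minimized by gradient descent over the modeling model's parameters.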
Optionally, obtaining the image proxy feature for the two-dimensional image of the human body according to the two-dimensional image of the human body specifically includes:
inputting the human body two-dimensional image into a preset extraction model so that the extraction model extracts image edge characteristics aiming at the human body two-dimensional image;
acquiring two-dimensional joint thermodynamic diagram characteristics of the two-dimensional image of the human body;
and splicing the image edge features with the two-dimensional joint thermodynamic diagram features to obtain spliced features, and taking the spliced features as image proxy features aiming at the two-dimensional human body images.
Optionally, the joint points include parent joint points and child joint points, where the joint-state information of a parent joint point can influence the joint-state information of its child joint points, and a joint point that has no parent joint point is a root joint point;
inputting the image proxy features into a first encoding layer in a modeling model to obtain joint rotation features for the two-dimensional human body image specifically includes:
inputting the image proxy features into the first encoding layer of the modeling model, so that the first encoding layer determines the joint rotation feature of the root joint point for the two-dimensional human body image, and determines the joint rotation feature of each child joint point according to the rotation feature of the root joint point and the joint probability distribution among the joint points, where the joint probability distribution among the joint points is used to characterize the motion correlation among them.
Optionally, inputting the image proxy features into the first encoding layer of the modeling model so that the first encoding layer determines the joint rotation feature of the root joint point for the two-dimensional human body image specifically includes:
inputting the image proxy features into the first encoding layer of the modeling model, so that the first encoding layer determines, within the neighborhood of the root joint point in the two-dimensional human body image, the vertex positions that conform to a Fisher probability density distribution as the candidate vertex positions corresponding to the root joint point; determines, from these candidate vertex positions, the partial vertex positions corresponding to the mode of the Fisher probability density distribution; and determines the rotation feature of the root joint point according to those partial vertex positions.
Optionally, determining the joint rotation features of the child joint points according to the rotation feature of the root joint point and the joint probability distribution among the joint points specifically includes:
for each child joint point, determining the vertex positions conforming to the Fisher probability density distribution within the neighborhood of that child joint point as the candidate vertex positions corresponding to it;
determining, from those candidate vertex positions, the partial vertex positions corresponding to the mode of the Fisher probability density distribution, and determining the base rotation feature of the child joint point from the determined vertex positions;
and determining the rotation feature of the child joint point according to its base rotation feature, the rotation feature of the parent joint point corresponding to the child joint point, and the joint probability distribution between the child and parent joint points.
Optionally, inputting the image proxy feature, the vertex feature, the joint rotation feature, the shape feature, and the camera feature into the second encoding layer of the modeling model, so that the second encoding layer associates them to obtain an encoding feature, specifically includes:
determining each image proxy mark (token) corresponding to the image proxy features;
determining each vertex mark corresponding to the vertex features according to the image proxy marks and the vertex features; determining each joint rotation mark corresponding to the joint rotation features according to the image proxy marks and the joint rotation features; determining each shape mark corresponding to the shape features according to the image proxy marks and the shape features; and determining each camera mark corresponding to the camera features according to the image proxy marks and the camera features;
inputting the image proxy features together with their image proxy marks, the vertex marks of the vertex features, the joint rotation marks of the joint rotation features, the shape marks of the shape features, and the camera marks of the camera features into the second encoding layer of the modeling model, so that the second encoding layer determines the association information among the image proxy features according to the image proxy marks, the association information among the vertex features according to the vertex marks, the association information among the joint rotation features according to the joint rotation marks, the association information among the shape features according to the shape marks, and the association information among the camera features according to the camera marks.
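The association performed by the second encoding layer can be illustrated with a single self-attention pass over the concatenated marks: every token attends to every other, so the layer can relate proxy, vertex, rotation, shape, and camera features to one another. The patent does not specify the attention form; this single-head numpy sketch with illustrative dimensions is a stand-in for what would be a multi-head transformer block in practice.

```python
import numpy as np

def associate_tokens(proxy, vertex, rotation, shape, camera):
    """Concatenate all feature tokens (each an (n_i, d) array) and run one
    self-attention pass so every token can attend to every other token."""
    tokens = np.concatenate([proxy, vertex, rotation, shape, camera], axis=0)
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)        # pairwise association scores
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over all tokens
    return attn @ tokens                           # encoded features, same shape
```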
The specification provides a method of three-dimensional human modeling, comprising:
collecting a two-dimensional image of a human body;
obtaining image proxy features for the two-dimensional human body image according to the image, wherein the image proxy features are used for characterizing the overall features of the two-dimensional human body image;
inputting the image proxy features into a first encoding layer in a pre-trained modeling model to obtain vertex features, joint rotation features, and shape features for the two-dimensional human body image, and inputting the camera parameters of the camera that captured the image into the first encoding layer to obtain camera features, the modeling model being trained by the method described above;
inputting the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to a second encoding layer in the modeling model, so that the second encoding layer correlates the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to obtain an encoding feature;
obtaining a vertex coordinate set of each model vertex and a joint point coordinate set of each joint point contained in a three-dimensional human model corresponding to the human body two-dimensional image according to the coding characteristics;
and performing three-dimensional modeling of the human body according to the vertex coordinate set of each model vertex and the joint coordinate set of each joint point, to obtain a three-dimensional human body model corresponding to the two-dimensional human body image.
The present specification provides a training apparatus of a modeling model, comprising:
an acquisition module, configured to acquire a two-dimensional image of a human body;
a proxy module, configured to obtain, according to the two-dimensional human body image, image proxy features that characterize the overall features of the image;
a first encoding module, configured to input the image proxy features into a first encoding layer of a modeling model to obtain vertex features, joint rotation features, and shape features for the two-dimensional human body image, and to input the camera parameters of the camera that captured the image into the first encoding layer to obtain camera features;
a second encoding module, configured to input the image proxy features, the vertex features, the joint rotation features, the shape features, and the camera features into a second encoding layer of the modeling model, so that the second encoding layer associates them to obtain an encoding feature;
a prediction module, configured to obtain, according to the encoding feature, a vertex coordinate set of each model vertex and a joint coordinate set of each joint point contained in the three-dimensional human body model corresponding to the image;
and a training module, configured to train the modeling model by minimizing the deviation between the vertex coordinate set and the actual vertex coordinate set corresponding to the image, and the deviation between the joint coordinate set and the actual joint coordinate set.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the training method of a modeling model or the method of three-dimensional human modeling described above.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the training method of modeling models or the method of three-dimensional human modeling as described above when executing the program.
The above-mentioned at least one technical scheme that this specification adopted can reach following beneficial effect:
According to the training method of the modeling model described above, a two-dimensional image of a human body is first acquired and its image proxy features obtained. The image proxy features are input into the first encoding layer to obtain vertex features, joint rotation features, and shape features, and the camera parameters of the camera that captured the image are input into the first encoding layer to obtain camera features. The image proxy features, vertex features, joint rotation features, shape features, and camera features are then input into the second encoding layer, which associates them to obtain an encoding feature; from the encoding feature, the vertex coordinate set and joint coordinate set contained in the three-dimensional human body model are obtained, and the modeling model is trained by minimizing the deviation between the predicted and actual vertex coordinate sets and between the predicted and actual joint coordinate sets.
It can be seen from the above that the modeling model captures the features of the two-dimensional human body image from multiple perspectives, such as vertices and joint points, and can therefore construct a more accurate three-dimensional human body model by aggregating these multi-aspect features. Moreover, when the modeling model is used, the three-dimensional human body model is obtained simply by feeding the two-dimensional human body image into the model, which improves modeling efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate and explain exemplary embodiments of the present specification together with their description, and are not intended to limit the specification unduly. In the drawings:
FIG. 1 is a flow chart of a method for training a modeling model provided in the present specification;
FIG. 2 is a schematic illustration of a three-dimensional manikin provided herein;
FIG. 3 is a schematic diagram of a method of three-dimensional human modeling provided herein;
FIG. 4 is a schematic diagram of a training apparatus for modeling models provided herein;
FIG. 5 is a schematic diagram of a three-dimensional modeling apparatus provided herein;
fig. 6 is a schematic structural diagram of an electronic device corresponding to fig. 1 or fig. 3 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flow chart of a training method of a modeling model provided in the present specification, including the following steps:
s101: and acquiring a two-dimensional image of the human body.
S102: and obtaining image proxy features aiming at the human body two-dimensional image according to the human body two-dimensional image, wherein the image proxy features are used for representing the integral features aiming at the human body two-dimensional image.
The training method of the modeling model in the present specification may be executed by a terminal device such as a desktop computer or a notebook computer, or may be executed by a client or a server installed in the terminal device. In the following, a training method of a modeling model in the embodiment of the present specification will be described by taking only a server as an execution subject.
At present, in the field of computer vision, a three-dimensional human modeling model can be built by fitting parameters of the model based on a preset model template, and in addition, the three-dimensional human modeling model can also be built based on training data. However, the accuracy of building the three-dimensional human modeling model based on the preset model template is low, the data cost and the complexity of building the three-dimensional human modeling model based on training data are high, and the execution efficiency of building the three-dimensional human modeling model is low.
In order to solve the above-described problems, the present specification provides a training method of a modeling model, in which a server may first acquire a two-dimensional image of a human body, which may be obtained by a photographing device such as a camera.
In order to reduce the complexity of the subsequent training of the modeling model and improve the training efficiency of the modeling model, the server may first extract the image proxy features of the two-dimensional image of the human body.
Specifically, the server may input the two-dimensional human body image into a preset extraction model (such as a convolutional neural network), and thereby extract the initial image features of the image. The edge features of the two-dimensional human body image can then be extracted from the initial image features: the server may apply Gaussian filtering to denoise the initial image features and filter out non-edge pixels, obtaining the edge features of the image. The gray value of each pixel after Gaussian noise reduction can be expressed by the following formula:
g(x, y) = Σ_{(u,v) ∈ N(x,y)} G_σ(u − x, v − y) · f(u, v)

where (x, y) denotes pixel coordinates, f(x, y) denotes the gray value of a pixel before noise reduction, g(x, y) denotes the gray value of the pixel after noise reduction, N(x, y) denotes the neighborhood of the pixel, and G_σ denotes a normalized Gaussian kernel with standard deviation σ.
After the initial image features are denoised, the gradient value along the gradient direction of each pixel can be obtained from the change of gray values between pixels. Pixels whose gradient value is larger than a maximum gradient threshold are taken as edge pixels, and pixels whose gradient value is smaller than a minimum gradient threshold are taken as non-edge pixels; the edge features of the two-dimensional human body image can then be extracted.
It should be noted that, for each pixel with a gradient value between the minimum gradient threshold and the maximum gradient threshold, if the pixel is adjacent to an edge pixel with a gradient value greater than the maximum gradient threshold, the pixel is also used as an edge pixel, otherwise, the pixel is used as a non-edge pixel.
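The pipeline described above (Gaussian noise reduction, double thresholding of gradient magnitudes, and linking of in-between pixels to adjacent edge pixels) can be sketched as follows. The 3x3 kernel, the use of central differences, and the threshold values are illustrative choices, not taken from the patent.

```python
import numpy as np

def extract_edges(gray, low, high):
    """Edge extraction: Gaussian smoothing over each pixel's neighborhood,
    gradient magnitudes, double thresholding, then hysteresis linking of
    in-between ("weak") pixels to adjacent definite edge pixels."""
    h, w = gray.shape
    # 3x3 normalized Gaussian smoothing kernel
    k = np.array([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16.0
    pad = np.pad(gray.astype(float), 1, mode="edge")
    smooth = sum(k[i, j] * pad[i:i + h, j:j + w]
                 for i in range(3) for j in range(3))
    # gradient magnitude from central differences
    gy, gx = np.gradient(smooth)
    mag = np.hypot(gx, gy)
    strong = mag >= high                 # above the maximum threshold: edge
    weak = (mag >= low) & ~strong        # between thresholds: edge only if linked
    edges = strong.copy()
    while True:                          # grow edges into adjacent weak pixels
        grown = np.pad(edges, 1)
        neighbor = np.zeros_like(edges)
        for di in range(3):
            for dj in range(3):
                neighbor |= grown[di:di + h, dj:dj + w]
        new_edges = edges | (weak & neighbor)
        if (new_edges == edges).all():
            return edges
        edges = new_edges
```

Applied to a vertical step image, the function marks the pixels around the step and leaves flat regions empty.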
Then, in order to further improve the accuracy of the subsequent model building and the model training efficiency, the two-dimensional joint thermodynamic diagram features of the two-dimensional image of the human body can be extracted through a preset joint feature extraction model, wherein the two-dimensional joint thermodynamic diagram features can refer to features aiming at the joint state of the human body in the two-dimensional image of the human body, which are represented in a thermodynamic diagram mode.
Then, the server can stitch the edge features of the two-dimensional human body image together with the two-dimensional joint thermodynamic diagram features to obtain the image proxy features of the image, which characterize the overall features of the two-dimensional human body image.
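The stitching step can be as simple as stacking the edge-feature map with the per-joint 2D heatmaps along a channel axis; the channel-first layout and the joint count are assumptions for illustration.

```python
import numpy as np

def build_proxy_feature(edge_map, joint_heatmaps):
    """Stack an (H, W) edge-feature map with (K, H, W) per-joint heatmaps
    along the channel axis to form a (1 + K, H, W) image proxy feature."""
    edge = edge_map[None, ...]                             # (1, H, W)
    return np.concatenate([edge, joint_heatmaps], axis=0)  # (1 + K, H, W)
```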
S103: inputting the image agent characteristics into a first coding layer in a modeling model to obtain vertex characteristics, joint rotation characteristics and shape characteristics aiming at the two-dimensional image of the human body, and inputting camera parameters of a camera for acquiring the two-dimensional image of the human body into the first coding layer to obtain camera characteristics.
In the present specification, the server may input the obtained image proxy features of the two-dimensional human body image into the first encoding layer of the modeling model to obtain the vertex features, joint rotation features, and shape features of the image. In addition, the camera parameters of the camera that captured the image may also be input into the first encoding layer to obtain the camera features. The first encoding layer encodes the image proxy features of the image; the resulting vertex features can be used to characterize the position coordinates of each vertex in the mesh space, and the joint rotation features can be used to characterize the states of the joint points corresponding to the image. The vertices and joint points are shown in fig. 2.
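One minimal realization of the first encoding layer is a set of independent projection heads: three mapping the flattened proxy features to vertex, joint-rotation, and shape features, and one embedding the camera parameters. The patent does not specify the layer's architecture; the linear heads, the vertex/joint counts, and the feature dimension below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class FirstEncodingLayer:
    """Sketch of a first encoding layer: independent linear heads produce
    per-vertex, per-joint-rotation, shape, and camera feature tokens."""
    def __init__(self, proxy_dim, cam_dim, n_vertices=431, n_joints=24, d=64):
        self.w_vertex = rng.normal(0, 0.02, (proxy_dim, n_vertices * d))
        self.w_rot = rng.normal(0, 0.02, (proxy_dim, n_joints * d))
        self.w_shape = rng.normal(0, 0.02, (proxy_dim, d))
        self.w_cam = rng.normal(0, 0.02, (cam_dim, d))
        self.n_vertices, self.n_joints, self.d = n_vertices, n_joints, d

    def __call__(self, proxy, cam_params):
        vertex = (proxy @ self.w_vertex).reshape(self.n_vertices, self.d)
        rotation = (proxy @ self.w_rot).reshape(self.n_joints, self.d)
        shape = (proxy @ self.w_shape).reshape(1, self.d)
        camera = (cam_params @ self.w_cam).reshape(1, self.d)
        return vertex, rotation, shape, camera
```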
Fig. 2 is a schematic diagram of a three-dimensional manikin provided in the present specification.
As can be seen from fig. 2, the three-dimensional mannequin may be a three-dimensional model surrounded by a plurality of grid spaces, and the vertex of each grid space may be regarded as the vertex of the three-dimensional mannequin, while the joint points in fig. 2 correspond to the joint parts of the human body, and the states of the joint points may be represented by the rotation characteristics of the joint points.
Because motion correlation often exists between joint points, in order to reflect it more accurately, the joint points can be divided into parent joint points and child joint points, where the joint-state information of a parent joint point can influence that of its child joint points, and a joint point without a parent joint point serves as a root joint point. In addition, since a joint point often has multiple parent joint points, i.e., is affected by several of them at once (for example, the motion of the wrist joint is often affected by the motions of both the elbow joint and the shoulder joint), the motion correlation between joint points can be characterized by a joint probability distribution.
Therefore, the influence of the parent node needs to be taken into account when determining the rotation characteristics of the nodes of each node. The rotation characteristics of the root node can be determined through the first coding layer, and then the rotation characteristics of the joint points of each sub-node can be determined according to the rotation characteristics of the root node and joint probability distribution among the joint points.
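The dependency structure just described (roots first, every child only after all of its parents) amounts to processing the joint graph in topological order. This sketch computes such an order; the joint names and the dict-based representation are illustrative.

```python
def rotation_order(parents):
    """Topological order over the joint graph so every joint is processed
    only after all of its parent joints; joints with no parents are roots.
    `parents` maps joint -> list of parent joints, so a wrist influenced by
    both elbow and shoulder simply lists both."""
    order, done = [], set()
    pending = list(parents)
    while pending:
        progressed = False
        for j in list(pending):
            if all(p in done for p in parents[j]):
                order.append(j)
                done.add(j)
                pending.remove(j)
                progressed = True
        if not progressed:
            raise ValueError("cycle in joint graph")
    return order
```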
Specifically, the first encoding layer may first determine, within the neighborhood of the root joint point in the two-dimensional human body image, the vertex positions that conform to a Fisher probability density distribution, as the candidate vertex positions corresponding to the root joint point. It may then determine, from these candidate positions, the partial vertex positions corresponding to the mode of the Fisher probability density distribution, determine the rotation matrix corresponding to those vertex positions as the rotation matrix of the root joint point, and finally determine the rotation feature of the root joint point from that rotation matrix. The mode of the Fisher probability density corresponds to the region where the distribution is concentrated. The Fisher probability density distribution over the rotation matrix corresponding to the root joint point can be written as:
p(R; F) = exp(tr(Fᵀ R)) / c(F)

where p(R; F) denotes the Fisher probability distribution of the rotation matrix corresponding to the root joint point, R denotes the rotation matrix corresponding to the root joint point, tr(·) denotes the trace of a matrix, F denotes the parameter matrix of the distribution, and c(F) denotes the corresponding normalization constant.
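For a (matrix) Fisher distribution of this form, the unnormalized log-density is just the trace term, and the mode is the proper rotation closest to the parameter matrix, recoverable via an SVD. This sketch evaluates both; it assumes the matrix Fisher form given above.

```python
import numpy as np

def matrix_fisher_logdensity(R, F):
    """Unnormalized log-density of the matrix Fisher distribution over
    rotation matrices: log p(R; F) = tr(F^T R) + const (c(F) omitted)."""
    return np.trace(F.T @ R)

def matrix_fisher_mode(F):
    """Mode of the distribution: the proper rotation maximizing tr(F^T R),
    obtained from the SVD of F with a determinant sign correction."""
    U, _, Vt = np.linalg.svd(F)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ S @ Vt
```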
Then, since each child joint point is often affected by several parent joint points, the influence of the rotation features of its parents needs to be considered when determining its rotation feature. Therefore, the joint rotation feature of each child joint point can be determined according to the rotation feature of the root joint point and the joint probability distribution among the joint points.
Specifically, for each child joint point, the vertex positions conforming to the Fisher probability density distribution within its neighborhood may first be determined as the candidate vertex positions corresponding to that child joint point. Then, the partial vertex positions corresponding to the mode of the Fisher probability density distribution are determined from those candidate positions, the base rotation matrix corresponding to the child joint point is determined from them, and the base rotation feature of the child joint point is determined from its base rotation matrix.
Then, the rotation feature of the child joint point may be determined according to the basic rotation feature of the child joint point, the rotation features of the parent joint points corresponding to the child joint point, and the joint probability distribution between the child joint point and its parent joint points.
Assuming that a child joint point has two parent joint points, namely a first parent joint point and a second parent joint point, the joint probability distribution between the child joint point and its parent joint points can be expressed by the following formula:

P_joint = p_1(R) · p_2(R)
Wherein p_1(R) can be used to represent the Fisher probability distribution of the rotation matrix corresponding to the first parent joint point, p_2(R) can be used to represent the Fisher probability distribution of the rotation matrix corresponding to the second parent joint point, and P_joint can be used to represent the joint probability value between the child joint point and its parent joint points.
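The product of the two parents' densities can be sketched as follows; the unnormalized Fisher densities and the parameter matrices are assumptions standing in for the trained model's quantities:

```python
import numpy as np

def fisher_density(R, F):
    # Unnormalized matrix Fisher density over a rotation matrix R
    return float(np.exp(np.trace(F.T @ R)))

def joint_parent_probability(R, F1, F2):
    # Joint probability of a child's rotation under its two parents,
    # taken as the product of the two parents' Fisher densities
    return fisher_density(R, F1) * fisher_density(R, F2)

R = np.eye(3)
p = joint_parent_probability(R, np.eye(3), np.eye(3))
print(p)  # e^3 * e^3 = e^6
```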
The Fisher probability density distribution over the rotation matrix corresponding to the child joint point can be expressed by the following formula:

p(R_c) = p_base(R_c) · p_infl(R_c) / p_same
Wherein p_base(R_c) can be used to represent the Fisher probability distribution over the basic rotation matrix corresponding to the child joint point, p_infl(R_c) can be used to represent the Fisher probability distribution over the rotation matrix corresponding to the child joint point under the influence of its parent joint points, and p_same can be used to represent the Fisher probability when the rotation states of the child joint point and its parent joint points are the same. On this basis, the rotation matrix corresponding to the child joint point can be determined, and thus the rotation feature of the child joint point can be determined.
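Selecting the child joint point's rotation as the mode of the combined density can be sketched as follows; the candidate rotation set and the parameter matrices are assumptions, and the constant normalization term is dropped since it does not change the argmax:

```python
import numpy as np

def fisher_density(R, F):
    # Unnormalized matrix Fisher density over a rotation matrix R
    return float(np.exp(np.trace(F.T @ R)))

def select_child_rotation(candidates, F_base, F_parent):
    # Score each candidate rotation by the product of its basic Fisher
    # density and the parent-influenced Fisher density, then keep the mode
    scores = [fisher_density(R, F_base) * fisher_density(R, F_parent)
              for R in candidates]
    return candidates[int(np.argmax(scores))]

identity = np.eye(3)
flip_z = np.diag([-1.0, -1.0, 1.0])   # 180-degree rotation about z
best = select_child_rotation([identity, flip_z], np.eye(3), np.eye(3))
```

With both parameter matrices at the identity, the identity rotation has the larger combined density and is selected.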
Of course, among the child joint points there may also be child joint points having only one parent joint point. For such a child joint point, only the influence of that single parent joint point needs to be considered when determining its rotation feature, which is not described in detail herein.
In addition, a shape feature for the two-dimensional image of the human body can also be obtained through the first encoding layer, and this shape feature can be used to characterize three-dimensional shape information of the human body in the two-dimensional image. The server may also input the camera parameters of the camera that captured the two-dimensional image of the human body into the first encoding layer to obtain camera features. The camera parameters input into the first coding layer may be parameters such as focal length and camera position; the first coding layer may encode the input camera parameters, and the obtained camera features may be used to characterize the physical properties of the camera with respect to the two-dimensional image of the human body.
It should be noted that the first coding layer mentioned in this specification may employ a coding layer such as that in a Skinned Multi-Person Linear (SMPL) model.
S104: inputting the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to a second encoding layer in the modeling model, so that the second encoding layer correlates the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to obtain an encoding feature.
The server may input the image proxy feature, the vertex feature, the joint rotation feature, the shape feature, and the camera feature to a second encoding layer in the modeling model, and the image proxy feature, the vertex feature, the joint rotation feature, the shape feature, and the camera feature may be associated by the second encoding layer to obtain the encoded feature.
Specifically, the server may first determine each image proxy marker corresponding to the image proxy feature, then may linearly combine each image proxy marker with each vertex feature in the first encoding layer to determine each vertex marker, may linearly combine each image proxy marker with each joint point rotation feature in the first encoding layer to determine each joint point rotation marker, may linearly combine each image proxy marker with each shape feature in the first encoding layer to determine each shape marker, and may linearly combine each image proxy marker with each camera feature in the first encoding layer to determine each camera marker.
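The marker (tag) construction described above can be sketched as a linear combination; the feature dimension, token counts, and the fixed random projection below are assumptions standing in for the first encoding layer's learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_proxy, n_vertex = 8, 4, 6   # assumed feature dimension and token counts

proxy_markers = rng.normal(size=(n_proxy, d))
vertex_features = rng.normal(size=(n_vertex, d))

# Linearly combine the image proxy markers with the vertex features to
# obtain one vertex marker per vertex; a random projection stands in
# for the learned combination weights here.
W = rng.normal(size=(d, d))
vertex_markers = vertex_features @ W + proxy_markers.mean(axis=0)
print(vertex_markers.shape)  # (6, 8)
```

Joint point rotation markers, shape markers, and camera markers would be produced the same way from their respective feature groups.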
The server may then input the image proxy features and each image proxy marker, vertex features and each vertex marker, joint rotation features and each joint rotation marker, shape features and each shape marker, and camera features and each camera marker to a second encoding layer in the modeling model.
The server may then determine, via the second encoding layer, association information between image proxy features included in the image proxy features based on the image proxy marks, association information between vertex features included in the vertex features based on the vertex marks, association information between joint rotation features included in the joint rotation features based on the joint rotation marks, association information between shape features included in the shape features based on the shape marks, and association information between camera features included in the camera features based on the camera marks.
Specifically, the server may determine each piece of association information by using a self-attention mechanism through the second encoding layer (such as the encoding layer of a Transformer): by obtaining a query vector, a key vector and a value vector for each image proxy marker, vertex marker, joint point rotation marker, shape marker and camera marker, the association information among the image proxy features contained in the image proxy feature, the vertex features contained in the vertex feature, the joint point rotation features contained in the joint point rotation feature, the shape features contained in the shape feature and the camera features contained in the camera feature can be calculated.
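The self-attention computation described above can be sketched as follows; this is a single-head scaled dot-product sketch with random stand-in weights, not the patent's trained layer:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over token matrix X:
    # queries, keys and values are projections of the same token sequence
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
d = 8
# Proxy, vertex, joint point rotation, shape and camera tokens concatenated
# into one sequence so attention can relate all feature groups
tokens = rng.normal(size=(10, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(tokens, Wq, Wk, Wv)
print(out.shape)  # (10, 8)
```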
S105: and obtaining a vertex coordinate set of each model vertex and a joint point coordinate set of each joint point contained in the three-dimensional human model corresponding to the human body two-dimensional image according to the coding features.
S106: training the modeling model by minimizing the deviation between the vertex coordinate set and the actual vertex coordinate set corresponding to the two-dimensional image of the human body and minimizing the deviation between the joint point coordinate set and the actual joint point coordinate set corresponding to the two-dimensional image of the human body.
The server may input the encoded features to the decoder, and may further obtain a vertex coordinate set of each model vertex and a joint point coordinate set of each joint point included in the three-dimensional human model corresponding to the human two-dimensional image.
And then, the server can acquire an actual vertex coordinate set and an actual joint point coordinate set corresponding to the two-dimensional image of the human body, and can train the modeling model by minimizing the deviation between the vertex coordinate set and the actual vertex coordinate set and the deviation between the joint point coordinate set and the actual joint point coordinate set.
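The training objective of steps S105 and S106 can be sketched as a simple deviation loss; the mean per-point L2 form below is an assumption, since the patent only specifies minimizing the deviations:

```python
import numpy as np

def modeling_loss(pred_vertices, gt_vertices, pred_joints, gt_joints):
    # Sum of mean per-point L2 deviations between the predicted and actual
    # vertex coordinate sets and joint point coordinate sets
    vertex_dev = np.mean(np.linalg.norm(pred_vertices - gt_vertices, axis=-1))
    joint_dev = np.mean(np.linalg.norm(pred_joints - gt_joints, axis=-1))
    return float(vertex_dev + joint_dev)

verts = np.zeros((5, 3))
joints = np.zeros((3, 3))
loss = modeling_loss(verts, verts, joints, joints)
print(loss)  # 0.0 when the prediction matches the ground truth exactly
```

Gradient descent on this loss with respect to the modeling model's parameters would realize the training described above.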
According to the above method, in the encoding process, the features of the two-dimensional image of the human body can be obtained from a plurality of angles such as vertices and joint points, and further, when the modeling model is used, a more accurate three-dimensional human body model can be constructed based on these multi-angle features. In addition, the association information within each of the image proxy feature, the vertex feature, the joint point rotation feature, the shape feature and the camera feature can be obtained through a self-attention mechanism, so that the modeling model can learn the correlations among the features and pay attention to the important partial features. A three-dimensional human body model that is more consistent with the two-dimensional image of the human body can thus be constructed, further improving the accuracy with which the modeling model constructs the three-dimensional human body model.
The above-described method for training a modeling model can be applied to actual three-dimensional human modeling after the modeling model is trained, and the method for three-dimensional human modeling provided in the present specification will be described in detail below, as shown in fig. 3.
Fig. 3 is a flow chart of a three-dimensional modeling method for human body provided in the present specification, specifically including the following steps:
S301: and acquiring a two-dimensional image of the human body.
For the three-dimensional human modeling method provided in the present specification, the execution subject may be a server, or may be a terminal device such as a desktop computer, a notebook computer, or a dedicated device for performing image modeling, and the following will be described in detail by taking the terminal device as an example.
On this basis, a modeling model trained by the above method can be deployed in the terminal device. Before modeling, the terminal device needs to acquire two-dimensional images of the human body through the camera, and construct a three-dimensional human body model through the acquired two-dimensional images of the human body, the camera parameters of the camera, and the pre-trained modeling model. The two-dimensional images of the human body acquired by the camera may be a series of two-dimensional images captured for the same human body at the same moment from different shooting angles.
S302: and obtaining an image proxy feature for the two-dimensional image of the human body according to the two-dimensional image of the human body, wherein the image proxy feature is used to characterize an overall feature of the two-dimensional image of the human body.

After the terminal device acquires the two-dimensional image of the human body, the image proxy feature used to characterize the overall feature of the human body can be determined first. The process by which the terminal device determines the image proxy feature is basically the same as the process of determining the proxy feature during model training, and a detailed description is omitted herein.
S303: inputting the image proxy feature into a first coding layer in a pre-trained modeling model to obtain vertex features, joint point rotation features and shape features for the two-dimensional image of the human body, and inputting camera parameters of a camera for acquiring the two-dimensional image of the human body into the first coding layer to obtain camera features, wherein the modeling model is obtained by training with the above model training method.
S304: inputting the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to a second encoding layer in the modeling model, so that the second encoding layer correlates the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to obtain an encoding feature.
After determining the image proxy feature, the terminal device may input the image proxy feature and the camera parameters of the camera for acquiring the two-dimensional image of the human body into the first coding layer in the modeling model. For the first coding layer, the image proxy feature may be processed in the same manner as in the model training process, so as to obtain the vertex features, joint point rotation features and shape features of the two-dimensional image of the human body; these features can be used to describe the fine features of the human body from various dimensions and in more detail. The first coding layer also needs to determine the camera features of the camera for collecting the two-dimensional image of the human body according to the acquired camera parameters. The camera features are actually used to reflect shooting conditions such as the shooting angle and focal length when the camera captured the two-dimensional image of the human body, so that the modeling model can model accurately, based on the actual shooting conditions of the camera, from the acquired two-dimensional image of the human body.
After the above features are obtained, the above features are further required to be input into a second coding layer in the modeling model, and the second coding layer codes the features on the basis of the features to obtain coding features, wherein the process of obtaining the coding features is basically the same as the process of training the model, and detailed descriptions thereof are omitted herein.
S305: and obtaining a vertex coordinate set of each model vertex and a joint point coordinate set of each joint point contained in the three-dimensional human model corresponding to the human body two-dimensional image according to the coding features.
S306: and carrying out three-dimensional modeling on the human body according to the vertex coordinate set of each model vertex and the joint point coordinate set of each joint point so as to obtain a three-dimensional human body model corresponding to the two-dimensional human body image.
After the coding features are obtained, the modeling model performs decoding operation based on the coding features, so as to determine a vertex coordinate set of each model vertex and a joint point coordinate set of each joint point contained in the three-dimensional human model.
That is, the modeling model may input the encoded features into a preset decoder to obtain the vertex coordinate sets and joint point coordinate sets through the decoder. These coordinate sets actually reflect the positions of the vertices and joint points in the virtual space, so the terminal device can render the three-dimensional human body model from these coordinate sets.
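Decoding the encoded features into coordinate sets can be sketched as follows; the linear decoder and the SMPL-like vertex and joint counts are assumptions, standing in for the preset decoder:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_vertices, n_joints = 8, 6890, 24   # assumed SMPL-like vertex/joint counts

encoded = rng.normal(size=(d,))
# A minimal stand-in decoder: two linear maps from the encoded feature
# to flat coordinate vectors, reshaped into (N, 3) coordinate sets
W_vert = rng.normal(size=(d, n_vertices * 3)) * 0.01
W_joint = rng.normal(size=(d, n_joints * 3)) * 0.01
vertex_coords = (encoded @ W_vert).reshape(n_vertices, 3)
joint_coords = (encoded @ W_joint).reshape(n_joints, 3)
print(vertex_coords.shape, joint_coords.shape)  # (6890, 3) (24, 3)
```

Each row is an (x, y, z) position in the virtual space, from which the mesh and skeleton can be rendered.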
It should be noted that, because the modeling model can further determine the joint probability distribution between the joints, that is, the modeling model considers the constraint relation of the joints of the human body in the motion process during modeling, the modeling model can actually output some coordinate point sets, and these coordinate point sets can construct a plurality of three-dimensional models of the human body, which are equivalent to generating a series of three-dimensional models of the human body with continuous motion from the visual effect, and these three-dimensional models of the human body conform to the constraint condition of the motions of the joints in the motion gesture. Therefore, the three-dimensional human modeling method provided by the specification not only can efficiently and accurately model the three-dimensional human model, but also can ensure the rationality of kinematic actions when constructing a plurality of three-dimensional human models with continuous actions, so that the experience of a user can be further improved when the three-dimensional human modeling method provided by the specification is applied to scenes such as virtual reality, augmented reality and the like.
The above is a method implemented by one or more embodiments of the present specification, and based on the same thought, the present specification further provides a training device of a corresponding modeling model and a device for three-dimensional modeling of a human body, as shown in fig. 4 and 5.
FIG. 4 is a schematic diagram of a training apparatus for modeling models provided herein, comprising:
an acquisition module 401, configured to acquire a two-dimensional image of a human body;
the proxy module 402 is configured to obtain, according to the two-dimensional image of the human body, an image proxy feature for the two-dimensional image of the human body, where the image proxy feature is used to characterize an overall feature for the two-dimensional image of the human body;
a first encoding module 403, configured to input the image proxy feature to a first encoding layer in a modeling model, obtain a vertex feature, a joint rotation feature, and a shape feature for the two-dimensional image of the human body, and input camera parameters of a camera that acquires the two-dimensional image of the human body to the first encoding layer to obtain a camera feature;
a second encoding module 404, configured to input the image proxy feature, the vertex feature, the joint rotation feature, the shape feature, and the camera feature to a second encoding layer in the modeling model, so that the second encoding layer correlates the image proxy feature, the vertex feature, the joint rotation feature, the shape feature, and the camera feature to obtain an encoded feature;
The prediction module 405 is configured to obtain, according to the coding feature, a vertex coordinate set of each model vertex and a joint point coordinate set of each joint point included in the three-dimensional human model corresponding to the human body two-dimensional image;
the training module 406 is configured to train the modeling model by minimizing a deviation between the vertex coordinate set and an actual vertex coordinate set corresponding to the two-dimensional image of the human body, and minimizing a deviation between the joint point coordinate set and an actual joint point coordinate set corresponding to the two-dimensional image of the human body.
Optionally, the proxy module 402 is specifically configured to input the two-dimensional image of the human body into a preset extraction model, so that the extraction model extracts image edge features for the two-dimensional image of the human body; acquire two-dimensional joint heatmap features of the two-dimensional image of the human body; and splice the image edge features with the two-dimensional joint heatmap features to obtain a spliced feature, and take the spliced feature as the image proxy feature for the two-dimensional image of the human body.
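The splicing of edge features with two-dimensional joint heatmap ("thermodynamic diagram") features can be sketched as channel-wise concatenation; the spatial size, channel counts, and random contents below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, n_joints = 16, 16, 24   # assumed spatial size and joint count

edge_features = rng.normal(size=(H, W, 1))          # image edge feature map
joint_heatmaps = rng.normal(size=(H, W, n_joints))  # one 2D heatmap per joint

# Splice (concatenate) the edge features with the two-dimensional joint
# heatmap features along the channel axis to form the image proxy feature
proxy_feature = np.concatenate([edge_features, joint_heatmaps], axis=-1)
print(proxy_feature.shape)  # (16, 16, 25)
```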
Optionally, the joint points comprise parent joint points and child joint points, where the joint point state information of a parent joint point can influence the joint point state information of its child joint points, and the root joint point has no parent joint point;
The first encoding module 403 is specifically configured to input the image proxy feature to a first encoding layer of a modeling model, so that the first encoding layer determines the joint point rotation feature of the root joint point in the two-dimensional image of the human body, and determines the joint point rotation feature of each child joint point according to the joint point rotation feature of the root joint point and the joint probability distribution among the joint points, where the joint probability distribution among the joint points is used to characterize the motion correlation among the joint points.
Optionally, the first encoding module 403 is specifically configured to input the image proxy feature to a first encoding layer of a modeling model, so that the first encoding layer determines the vertex positions conforming to a Fisher probability density distribution within the neighborhood of the root joint point in the two-dimensional image of the human body as the candidate vertex positions corresponding to the root joint point, determines the partial vertex positions corresponding to the mode of the Fisher probability density distribution from the candidate vertex positions corresponding to the root joint point, and determines the rotation feature of the root joint point according to the partial vertex positions corresponding to the mode of the Fisher probability density distribution.
Optionally, the first encoding module 403 is specifically configured to determine, for each child joint point, the vertex positions that conform to the Fisher probability density distribution within the neighborhood of the child joint point, as the candidate vertex positions corresponding to the child joint point; determine the partial vertex positions corresponding to the mode of the Fisher probability density distribution from the candidate vertex positions corresponding to the child joint point, and determine the basic rotation feature of the child joint point according to the determined partial vertex positions corresponding to the mode of the Fisher probability density distribution; and determine the rotation feature of the child joint point according to the basic rotation feature of the child joint point, the rotation features of the parent joint points corresponding to the child joint point, and the joint probability distribution between the child joint point and its parent joint points.
Optionally, the second encoding module 404 is specifically configured to determine each image proxy marker corresponding to the image proxy feature; determine each vertex marker corresponding to the vertex feature according to each image proxy marker and the vertex feature, determine each joint point rotation marker corresponding to the joint point rotation feature according to each image proxy marker and the joint point rotation feature, determine each shape marker corresponding to the shape feature according to each image proxy marker and the shape feature, and determine each camera marker corresponding to the camera feature according to each image proxy marker and the camera feature; and input the image proxy feature and its image proxy markers, the vertex feature and its vertex markers, the joint point rotation feature and its joint point rotation markers, the shape feature and its shape markers, and the camera feature and its camera markers into a second coding layer in the modeling model, so that the second coding layer determines the association information between the image proxy features contained in the image proxy feature according to the image proxy markers, the association information between the vertex features contained in the vertex feature according to the vertex markers, the association information between the joint point rotation features contained in the joint point rotation feature according to the joint point rotation markers, the association information between the shape features contained in the shape feature according to the shape markers, and the association information between the camera features contained in the camera feature according to the camera markers.
Fig. 5 is a schematic diagram of a three-dimensional modeling apparatus provided in the present specification, including:
The acquisition module 501 is configured to acquire a two-dimensional image of the human body;
The proxy module 502 is configured to obtain, according to the two-dimensional image of the human body, an image proxy feature for the two-dimensional image of the human body, where the image proxy feature is used to characterize an overall feature of the two-dimensional image of the human body;
a first encoding module 503, configured to input the image proxy feature to a first encoding layer in a pre-trained modeling model, obtain vertex features, joint rotation features, and shape features for the two-dimensional image of the human body, and input camera parameters of a camera that acquires the two-dimensional image of the human body to the first encoding layer to obtain camera features, where the modeling model is obtained by training the training method of the modeling model;
a second encoding module 504, configured to input the image proxy feature, the vertex feature, the joint rotation feature, the shape feature, and the camera feature to a second encoding layer in the modeling model, so that the second encoding layer correlates the image proxy feature, the vertex feature, the joint rotation feature, the shape feature, and the camera feature to obtain an encoded feature;
The prediction module 505 is configured to obtain, according to the coding feature, a vertex coordinate set of each model vertex and a joint point coordinate set of each joint point included in the three-dimensional human model corresponding to the human body two-dimensional image;
the modeling module 506 is configured to perform three-dimensional modeling on a human body according to the vertex coordinate set of each model vertex and the joint point coordinate set of each joint point, so as to obtain a three-dimensional human body model corresponding to the two-dimensional image of the human body.
The present specification also provides a computer readable storage medium storing a computer program operable to perform the training method of the modeling model provided in fig. 1 above or the three-dimensional human modeling method provided in fig. 3 above.
The present specification also provides a schematic structural diagram, shown in fig. 6, of an electronic device corresponding to fig. 1 or 3. At the hardware level, as shown in fig. 6, the electronic device includes a processor, an internal bus, a network interface, a memory, and a nonvolatile storage, and may of course include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile storage into the memory and then runs it, so as to implement the training method of the modeling model described above in fig. 1 or the three-dimensional human modeling method provided in fig. 3.
Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
In the 90 s of the 20 th century, improvements to one technology could clearly be distinguished as improvements in hardware (e.g., improvements to circuit structures such as diodes, transistors, switches, etc.) or software (improvements to the process flow). However, with the development of technology, many improvements of the current method flows can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain corresponding hardware circuit structures by programming improved method flows into hardware circuits. Therefore, an improvement of a method flow cannot be said to be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the programming of the device by a user. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate application-specific integrated circuit chips. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented by using "logic compiler" software, which is similar to the software compiler used in program development and writing, and the original code before the compiling is also written in a specific programming language, which is called hardware description language (Hardware Description Language, HDL), but not just one of the hdds, but a plurality of kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), lava, lola, myHDL, PALASM, RHDL (Ruby Hardware Description Language), etc., VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently most commonly used. 
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner, for example, the controller may take the form of, for example, a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable logic controllers, and embedded microcontrollers, examples of which include, but are not limited to, the following microcontrollers: ARC 625D, atmel AT91SAM, microchip PIC18F26K20, and Silicone Labs C8051F320, the memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller may thus be regarded as a kind of hardware component, and means for performing various functions included therein may also be regarded as structures within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively simply since they are substantially similar to the method embodiments; for relevant parts, reference may be made to the corresponding description of the method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (10)

1. A method of training a modeling model, comprising:
acquiring a two-dimensional image of a human body;
obtaining image proxy features for the two-dimensional image of the human body according to the two-dimensional image of the human body, wherein the image proxy features are used for representing overall features of the two-dimensional image of the human body;
inputting the image agent characteristics into a first coding layer in a modeling model to obtain vertex characteristics, joint rotation characteristics and shape characteristics aiming at the two-dimensional image of the human body, and inputting camera parameters of a camera for acquiring the two-dimensional image of the human body into the first coding layer to obtain camera characteristics;
inputting the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to a second encoding layer in the modeling model, so that the second encoding layer correlates the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to obtain an encoding feature;
obtaining a vertex coordinate set of each model vertex and a joint point coordinate set of each joint point contained in a three-dimensional human model corresponding to the human body two-dimensional image according to the coding characteristics;
training the modeling model by minimizing the deviation between the vertex coordinate set and the actual vertex coordinate set corresponding to the two-dimensional image of the human body, and minimizing the deviation between the joint point coordinate set and the actual joint point coordinate set corresponding to the two-dimensional image of the human body.
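The training objective of claim 1 — jointly minimizing the deviation of the predicted vertex coordinate set and joint point coordinate set from their ground-truth counterparts — can be sketched as follows. The L1 norm and the equal weighting are illustrative assumptions; the claim does not specify the norm or the loss weights.

```python
import numpy as np

def modeling_loss(pred_vertices, gt_vertices, pred_joints, gt_joints,
                  w_vertex=1.0, w_joint=1.0):
    """Combined deviation between predicted and ground-truth coordinate sets.

    pred_vertices / gt_vertices: (V, 3) arrays of model vertex coordinates.
    pred_joints / gt_joints:     (J, 3) arrays of joint point coordinates.
    The L1 norm and the weights are assumptions, not specified by the claim.
    """
    vertex_loss = np.abs(pred_vertices - gt_vertices).mean()
    joint_loss = np.abs(pred_joints - gt_joints).mean()
    return w_vertex * vertex_loss + w_joint * joint_loss
```

In training, this scalar would be minimized by gradient descent over the modeling model's parameters.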
2. The method of claim 1, wherein obtaining image proxy features for the two-dimensional image of the human body from the two-dimensional image of the human body, specifically comprises:
inputting the human body two-dimensional image into a preset extraction model so that the extraction model extracts image edge characteristics aiming at the human body two-dimensional image;
acquiring two-dimensional joint thermodynamic diagram characteristics of the two-dimensional image of the human body;
and concatenating the image edge features with the two-dimensional joint thermodynamic diagram features to obtain concatenated features, and taking the concatenated features as the image proxy features for the two-dimensional image of the human body.
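The construction in claim 2 — extracting image edge features, obtaining two-dimensional joint thermodynamic diagram (heatmap) features, and concatenating them into the image proxy feature — reduces to a channel-wise concatenation. The tensor layout below is an assumption for illustration; the claim does not fix the feature shapes.

```python
import numpy as np

def build_image_proxy_feature(edge_features, joint_heatmaps):
    """Concatenate image edge features with 2D joint heatmap features.

    edge_features:  (H, W, C) map from the preset extraction model (assumed layout).
    joint_heatmaps: (H, W, J) per-joint two-dimensional thermodynamic diagram features.
    Returns an (H, W, C + J) image proxy feature.
    """
    if edge_features.shape[:2] != joint_heatmaps.shape[:2]:
        raise ValueError("spatial dimensions must match before concatenation")
    return np.concatenate([edge_features, joint_heatmaps], axis=-1)
```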
3. The method of claim 1, wherein the joint points comprise parent joint points and child joint points, wherein the joint point state information of a parent joint point influences the joint point state information of its child joint points, and every joint point other than the root joint point has a parent joint point;
inputting the image proxy features into a first coding layer in a modeling model to obtain joint point rotation features for the two-dimensional image of the human body specifically comprises:
inputting the image proxy features into the first coding layer of the modeling model, so that the first coding layer determines the joint point rotation features of the root joint point for the two-dimensional image of the human body, and determines the joint point rotation features of each child joint point according to the joint point rotation features of the root joint point and the joint probability distribution among the joint points, wherein the joint probability distribution among the joint points is used for representing motion correlation among the joint points.
4. The method according to claim 3, wherein inputting the image proxy features into the first coding layer of the modeling model, so that the first coding layer determines the joint point rotation features of the root joint point for the two-dimensional image of the human body, specifically comprises:
inputting the image proxy features into the first coding layer of the modeling model, so that the first coding layer determines, within the neighborhood of the root joint point in the two-dimensional image of the human body, the positions of all vertices conforming to a Fisher probability density distribution as the candidate vertex positions corresponding to the root joint point, determines, from the candidate vertex positions corresponding to the root joint point, the partial vertex positions corresponding to the mode of the Fisher probability density distribution, and determines the rotation features of the root joint point according to the partial vertex positions corresponding to the mode of the Fisher probability density distribution.
5. The method according to claim 3, wherein determining the joint point rotation features of each child joint point according to the joint point rotation features of the root joint point and the joint probability distribution among the joint points specifically comprises:
for each child joint point, determining the positions of all vertices conforming to the Fisher probability density distribution within the neighborhood of the child joint point as the candidate vertex positions corresponding to the child joint point;
determining, from the candidate vertex positions corresponding to the child joint point, the partial vertex positions corresponding to the mode of the Fisher probability density distribution, and determining the basic rotation features of the child joint point according to the determined partial vertex positions;
and determining the rotation features of the child joint point according to the basic rotation features of the child joint point, the rotation features of the parent joint point corresponding to the child joint point, and the joint probability distribution between the child joint point and the parent joint point.
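Claims 3 through 5 describe resolving joint rotation features down the kinematic tree: the root joint point's feature is computed first, and each child's feature then combines its own basic rotation feature with its parent's resolved feature via the joint probability distribution. The linear blend below is an illustrative assumption; the claims only state that the three inputs determine the child's rotation feature, not how.

```python
import numpy as np

def propagate_joint_rotations(base_rotations, parents, joint_prob):
    """Resolve per-joint rotation features from the root joint point to the leaves.

    base_rotations: (N, D) per-joint basic rotation features; index 0 is the
                    root joint point, whose basic feature is used directly.
    parents:        parents[j] is the parent index of joint j (parents[j] < j,
                    so iterating in index order visits parents before children).
    joint_prob:     (N, N) motion-correlation weights between joint pairs.
    The weighted linear blend is a hypothetical reading of the claims.
    """
    n, _ = base_rotations.shape
    resolved = np.empty_like(base_rotations)
    resolved[0] = base_rotations[0]          # root joint point
    for j in range(1, n):                    # topological (index) order
        p = parents[j]
        w = joint_prob[p, j]
        resolved[j] = w * resolved[p] + (1.0 - w) * base_rotations[j]
    return resolved
```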
6. The method of claim 1, wherein inputting the image proxy feature, the vertex feature, the joint point rotation feature, the shape feature, and the camera feature into a second coding layer in the modeling model, so that the second coding layer associates the image proxy feature, the vertex feature, the joint point rotation feature, the shape feature, and the camera feature to obtain coding features, specifically comprises:
determining the image proxy tokens corresponding to the image proxy feature;
determining the vertex tokens corresponding to the vertex feature according to the image proxy tokens and the vertex feature, determining the joint point rotation tokens corresponding to the joint point rotation feature according to the image proxy tokens and the joint point rotation feature, determining the shape tokens corresponding to the shape feature according to the image proxy tokens and the shape feature, and determining the camera tokens corresponding to the camera feature according to the image proxy tokens and the camera feature;
inputting the image proxy feature together with its image proxy tokens, the vertex tokens corresponding to the vertex feature, the joint point rotation tokens corresponding to the joint point rotation feature, the shape tokens corresponding to the shape feature, and the camera tokens corresponding to the camera feature into the second coding layer in the modeling model, so that the second coding layer determines the association information among the image proxy features according to the image proxy tokens, determines the association information among the vertex features according to the vertex tokens, determines the association information among the joint point rotation features according to the joint point rotation tokens, determines the association information among the shape features according to the shape tokens, and determines the association information among the camera features according to the camera tokens.
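The second coding layer of claim 6 associates the five feature groups through their tokens (rendered as "marks" in the machine translation). A transformer-style self-attention pass over the concatenated token sequence is one plausible reading; the single unprojected attention head below is a simplification, not the patent's actual layer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def associate_feature_tokens(proxy, vertex, joint_rot, shape, camera):
    """Associate all feature groups in one self-attention pass.

    Each argument is an (n_i, D) token matrix for one feature group.
    Concatenating them lets every token attend to every other token,
    yielding association information both within and across groups.
    A single unprojected attention head is an illustrative simplification.
    """
    tokens = np.concatenate([proxy, vertex, joint_rot, shape, camera], axis=0)
    d = tokens.shape[1]
    attn = softmax(tokens @ tokens.T / np.sqrt(d), axis=-1)
    return attn @ tokens  # coding features, one row per input token
```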
7. A method of three-dimensional modeling of a human body, comprising:
collecting a two-dimensional image of a human body;
obtaining image proxy features for the two-dimensional image of the human body according to the two-dimensional image of the human body, wherein the image proxy features are used for representing overall features of the two-dimensional image of the human body;
inputting the image agent characteristics into a first coding layer in a pre-trained modeling model to obtain vertex characteristics, joint rotation characteristics and shape characteristics aiming at the two-dimensional image of the human body, and inputting camera parameters of a camera for acquiring the two-dimensional image of the human body into the first coding layer to obtain camera characteristics, wherein the modeling model is obtained by training according to the method of any one of claims 1-6;
inputting the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to a second encoding layer in the modeling model, so that the second encoding layer correlates the image proxy feature, the vertex feature, the joint rotation feature, the shape feature and the camera feature to obtain an encoding feature;
obtaining a vertex coordinate set of each model vertex and a joint point coordinate set of each joint point contained in a three-dimensional human model corresponding to the human body two-dimensional image according to the coding characteristics;
and carrying out three-dimensional modeling on the human body according to the vertex coordinate set of each model vertex and the joint point coordinate set of each joint point, so as to obtain a three-dimensional human body model corresponding to the two-dimensional image of the human body.
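The modeling method of claim 7 chains the same stages end to end. The sketch below wires them together with the model components passed in as callables; all five interfaces are hypothetical stand-ins for the trained modeling model, and only the data flow follows the claimed method.

```python
def reconstruct_human_model(image, camera_params,
                            extract_proxy, first_layer, second_layer, regress):
    """End-to-end sketch of the claimed three-dimensional human modeling method.

    extract_proxy, first_layer, second_layer, and regress are hypothetical
    callables standing in for the trained modeling model's components.
    """
    proxy = extract_proxy(image)                  # overall image proxy features
    vertex_f, joint_rot_f, shape_f, camera_f = first_layer(proxy, camera_params)
    encoded = second_layer(proxy, vertex_f, joint_rot_f, shape_f, camera_f)
    vertices, joints = regress(encoded)           # coordinate sets for the 3D model
    return vertices, joints
```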
8. A training device for a modeling model, comprising:
an acquisition module: used for acquiring a two-dimensional image of a human body;
a proxy module: used for obtaining image proxy features for the two-dimensional image of the human body according to the two-dimensional image of the human body, wherein the image proxy features are used for representing overall features of the two-dimensional image of the human body;
a first encoding module: used for inputting the image proxy features into a first coding layer in a modeling model to obtain vertex features, joint point rotation features, and shape features for the two-dimensional image of the human body, and inputting camera parameters of a camera for acquiring the two-dimensional image of the human body into the first coding layer to obtain camera features;
a second encoding module: used for inputting the image proxy feature, the vertex feature, the joint point rotation feature, the shape feature, and the camera feature into a second coding layer in the modeling model, so that the second coding layer associates the image proxy feature, the vertex feature, the joint point rotation feature, the shape feature, and the camera feature to obtain coding features;
a prediction module: used for obtaining, according to the coding features, a vertex coordinate set of each model vertex and a joint point coordinate set of each joint point contained in a three-dimensional human body model corresponding to the two-dimensional image of the human body;
a training module: used for training the modeling model by minimizing the deviation between the vertex coordinate set and the actual vertex coordinate set corresponding to the two-dimensional image of the human body and minimizing the deviation between the joint point coordinate set and the actual joint point coordinate set corresponding to the two-dimensional image of the human body.
9. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-7 when executing the program.
CN202410171192.6A 2024-02-06 2024-02-06 Training method of modeling model, three-dimensional human modeling method and device Active CN117726907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410171192.6A CN117726907B (en) 2024-02-06 2024-02-06 Training method of modeling model, three-dimensional human modeling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410171192.6A CN117726907B (en) 2024-02-06 2024-02-06 Training method of modeling model, three-dimensional human modeling method and device

Publications (2)

Publication Number Publication Date
CN117726907A true CN117726907A (en) 2024-03-19
CN117726907B CN117726907B (en) 2024-04-30

Family

ID=90203773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410171192.6A Active CN117726907B (en) 2024-02-06 2024-02-06 Training method of modeling model, three-dimensional human modeling method and device

Country Status (1)

Country Link
CN (1) CN117726907B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160300383A1 (en) * 2014-09-10 2016-10-13 Shenzhen University Human body three-dimensional imaging method and system
CN110599540A (en) * 2019-08-05 2019-12-20 清华大学 Real-time three-dimensional human body shape and posture reconstruction method and device under multi-viewpoint camera
CN112270711A (en) * 2020-11-17 2021-01-26 北京百度网讯科技有限公司 Model training and posture prediction method, device, equipment and storage medium
JP2022018333A (en) * 2020-07-15 2022-01-27 株式会社Mobility Technologies Program, information processing method, information processing apparatus, and model generation method
CN114419277A (en) * 2022-01-19 2022-04-29 中山大学 Image-based step-by-step generation type human body reconstruction method and device
JP2022092528A (en) * 2020-12-10 2022-06-22 Kddi株式会社 Three-dimensional person attitude estimation apparatus, method, and program
WO2022205760A1 (en) * 2021-03-31 2022-10-06 深圳市慧鲤科技有限公司 Three-dimensional human body reconstruction method and apparatus, and device and storage medium
WO2023273093A1 (en) * 2021-06-30 2023-01-05 奥比中光科技集团股份有限公司 Human body three-dimensional model acquisition method and apparatus, intelligent terminal, and storage medium
US20230073340A1 (en) * 2020-06-19 2023-03-09 Beijing Dajia Internet Information Technology Co., Ltd. Method for constructing three-dimensional human body model, and electronic device
WO2023085624A1 (en) * 2021-11-15 2023-05-19 Samsung Electronics Co., Ltd. Method and apparatus for three-dimensional reconstruction of a human head for rendering a human image
CN117352126A (en) * 2023-11-03 2024-01-05 之江实验室 Muscle stress visualization method, device, computer equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI Tianfeng (李天峰): "Dynamic reconstruction of three-dimensional character images based on multimedia technology", Modern Electronics Technique (现代电子技术), no. 09, 3 May 2018 (2018-05-03) *

Also Published As

Publication number Publication date
CN117726907B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
CN112819944B (en) Three-dimensional human body model reconstruction method and device, electronic equipment and storage medium
CN117372631B (en) Training method and application method of multi-view image generation model
CN112884881A (en) Three-dimensional face model reconstruction method and device, electronic equipment and storage medium
CN116977525B (en) Image rendering method and device, storage medium and electronic equipment
CN111238450A (en) Visual positioning method and device
CN115600157B (en) Data processing method and device, storage medium and electronic equipment
CN117635822A (en) Model training method and device, storage medium and electronic equipment
CN117197781B (en) Traffic sign recognition method and device, storage medium and electronic equipment
CN116342888B (en) Method and device for training segmentation model based on sparse labeling
CN117173002A (en) Model training, image generation and information extraction methods and devices and electronic equipment
CN117726907B (en) Training method of modeling model, three-dimensional human modeling method and device
CN117911630B (en) Three-dimensional human modeling method and device, storage medium and electronic equipment
CN117893696B (en) Three-dimensional human body data generation method and device, storage medium and electronic equipment
CN117808976B (en) Three-dimensional model construction method and device, storage medium and electronic equipment
CN117830564B (en) Three-dimensional virtual human model reconstruction method based on gesture distribution guidance
CN117726760B (en) Training method and device for three-dimensional human body reconstruction model of video
CN117880444B (en) Human body rehabilitation exercise video data generation method guided by long-short time features
CN117934858B (en) Point cloud processing method and device, storage medium and electronic equipment
CN116612244B (en) Image generation method and device, storage medium and electronic equipment
CN117975202B (en) Model training method, service execution method, device, medium and equipment
CN116309924B (en) Model training method, image display method and device
CN116740182B (en) Ghost area determining method and device, storage medium and electronic equipment
CN117876610B (en) Model training method, device and storage medium for three-dimensional construction model
CN115862668B (en) Method and system for judging interactive object based on sound source positioning by robot
CN118211132A (en) Three-dimensional human body surface data generation method and device based on point cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant