CN110751730B

CN110751730B - Dressing human body shape estimation method based on deep neural network

Info

Publication number: CN110751730B
Application number: CN201910669676.2A
Authority: CN
Inventors: 陈欣; 庞安琪; 张哿; 王培豪; 张迎梁
Original assignee: Plex VR Digital Technology Shanghai Co Ltd
Current assignee: Plex VR Digital Technology Shanghai Co Ltd
Priority date: 2019-07-24
Filing date: 2019-07-24
Publication date: 2023-02-24
Anticipated expiration: 2039-07-24
Also published as: CN110751730A

Abstract

The invention discloses a dressing human body shape estimation method based on a deep neural network, which constructs a three-dimensional clothes grid model database with rich clothes types and trains a deep neural network model for predicting human body grid fitting degree based on the database. The method solves the problem of estimating the real body type of the user under the complex dressing, comprehensively considers the clothing types, the grid geometric structures and the human posture actions, has more accurate detection results on the complex human body actions and the clothing types, is not limited to the input of grid sequences, can also input a three-dimensional model of a single human body, and expands the application scene of body type estimation.

Description

Dressing human body shape estimation method based on deep neural network

Technical Field

The invention relates to a neural network method, in particular to a dressing human body shape estimation method based on a deep neural network.

Background

With the popularization of 3D scanners and the emergence of mobile 3D scanning sensors based on structured light, toF (Time of Flight), etc., these devices expand the way of acquiring three-dimensional images containing depth information, so that three-dimensional phantoms become more and more popular, for example, high-quality three-dimensional phantoms can be obtained by using a Kinect-based depth camera and a circular color camera system (Dome). However, almost all existing methods perform three-dimensional reconstruction without regard to human body clothing, or more precisely, without regard to the degree of fit of the clothing. In fact, the human body geometry and the clothing geometry worn by the human body can have obvious difference, and by taking the clothes manufacturer's specification as a reference, the clothing can be roughly divided into three types of loose, close-fitting and close-fitting clothing fitting conditions, which brings great difficulty to the estimation of the real body type of the human body.

The real human body shape estimation has very important significance for the applications of virtual fitting, body shape measurement and the like. Considering that it is inconvenient and inconvenient for a user to take off clothes for body shape scanning, and the user is very complicated to wear clothes, it is very challenging to estimate the body shape of the user accurately when wearing clothes. Some existing methods estimate the body type of a human body through continuous dynamic grid sequences and the constraint of a parameterized human body model obtained through statistics. However, these methods still only deal with fitting or tight fitting garments and do not allow for accurate body shaping in the case of loose fitting garments. Therefore, the method is expected to solve the problem of clothes in the aspect of data driving, and can estimate the body type of the human body more accurately on the premise of ensuring wide clothes applicability.

Disclosure of Invention

The invention aims to provide a dressing human body shape estimation method based on a deep neural network, so as to solve the problems in the background technology.

In order to achieve the purpose, the invention provides the following technical scheme:

a dressing human body shape estimation method based on a deep neural network comprises the following steps: shooting static figures wearing different types of clothes by using a high-definition annular camera array, wherein the different types of clothes comprise loose, close-fitting and tight clothes, obtaining a point cloud model by using a multi-view three-dimensional reconstruction algorithm, and performing triangular patch processing on the point cloud model to obtain a database of a human body mesh model with certain noise on the ground and meshes; manually cutting the human body mesh model in the database, dividing the part of the human body mesh model which is not covered by the clothing, and respectively dividing the clothing and the trousers in the human body mesh model; thirdly, calculating the fitting degree of the human body mesh models wearing different types of clothes in the database, using the human body mesh model obtained by shooting with close-fitting clothes or without clothes as the true value of the human body shape, for the human body mesh model wearing loose and close-fitting clothes,the degree of fit was calculated using the following steps: firstly, matching the action of a human body mesh model of the same person wearing loose and close-fitting clothes in a database with the action of a human body mesh model of the same person wearing close-fitting clothes, so that the two models have the same action after deformation; then marking each mesh vertex of the clothes on the human body mesh model of the loose and close-fitting clothes, and finding the corresponding mesh vertex on the human body shape of each mesh vertex by using a ray tracing mode to ensure the fitting degree

Wherein V _i The ith mesh vertex representing the true human body shape at the interior,

is represented by V _i Using a ray of 15 DEG to trace the set of corresponding closest points, K _G The representation is Gaussian convolution, and the symbol containing the c superscript represents each data volume of the corresponding point found based on ray tracing; mapping the human body mesh model in the database to ensure that each vertex on the three-dimensional mesh of the human body mesh model has a mapping of a geometric image on the 2D plane, and the mapping relative position relation of other vertexes except the edge point of the 2D plane is not changed to obtain the mapping relation from the three-dimensional mesh to the 2D plane, and then mapping the three-dimensional coordinate information, the normal information and the color information of the three-dimensional mesh to be used as an input characteristic diagram of the human body mesh model; generating a neural network model of the antagonistic network by adopting conditions to carry out deep neural network prediction, using the input characteristic diagram as the input of the network, wherein the output of the network is the 2D plane mapping fitting degree, and the output of the network is inversely mapped to obtain the predicted fitting degree value of the human body mesh model; subtracting the fitting degree of each vertex in the model by using the human body mesh model with clothes, so as to obtain the human body mesh model without clothes, and smoothing the model to obtain the estimation of the human body shape.

As a further scheme of the invention: the high-definition annular camera array is composed of 80 high-definition annular cameras with 4K resolution.

As a still further scheme of the invention: the types of the clothes comprise short-sleeved T-shirts, long-sleeved T-shirts, hoodie-linked sweaters, down coats, jackets, shorts and trousers.

As a still further scheme of the invention: the objective function of the conditional generation countermeasure network is:

wherein L is _GAN (G, D) represents the objective function of the generation of the competing network, D (x, y) represents the score of the arbiter, G (x, z) represents the score of the generator, I E _x,y With I E _x,z All of the expectations are corresponding, and the loss function of the network uses the L1 norm, defined as follows:

wherein

A loss function representing the loss of the network,

representing the corresponding expectation, G (x, z) represents the generator score.

As a further scheme of the invention: in the fourth step, a method of a structural diagram is used for mapping the human body mesh model in the database.

As a still further scheme of the invention: the neural network model of the conditional generation countermeasure network can learn a conditional mapping generator G and a discriminator D for the countermeasure generator to discriminate its generation authenticity, where G can map an input condition x and a randomly generated noise vector z to an output y.

Compared with the prior art, the invention has the beneficial effects that: the method solves the problem of estimating the real body type of the user under complex clothes wearing, comprehensively considers the clothes types (short-sleeved T-shirts, long-sleeved T-shirts, short coats and long coats), the grid geometric structure and the human posture actions, has more accurate detection results on complex human body actions and clothes types, is not limited to the input of grid sequences, can input a three-dimensional model of a single human body, and expands the application scene of body type estimation.

Drawings

FIG. 1 is a schematic diagram of a high definition loop camera array for capturing a static figure wearing different types of clothing in the method of the present invention.

Fig. 2 is a schematic diagram of the method of the present invention for collecting human body data by itself.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

As an embodiment of the invention, a dressing human body shape estimation method based on a deep neural network comprises the following steps:

firstly, shooting static people wearing different types of clothes by using a high-definition annular camera array consisting of 80 high-definition annular cameras with 4K resolution, wherein the different types of clothes comprise loose, close-fitting and close-fitting clothes (short-sleeved T-shirts, long-sleeved T-shirts, hoodie-linked sweaters, down jackets, shorts, trousers and the like), obtaining a point cloud model by using a multi-view angle stereo reconstruction algorithm, and performing triangular patch processing on the point cloud model to obtain a database of a human body mesh model with ground and meshes having certain noise;

manually cutting the human body mesh model in the database, dividing the part of the human body mesh model which is not covered by the clothing, and dividing different parts of different clothing types in the human body mesh model;

calculating the fitting degree of the human body mesh models with different types of clothes in the database, using the human body mesh models obtained after the close-fitting clothes or the clothes are not worn as the true values of the human body shape, and calculating the fitting degree of the human body mesh models with loose and close-fitting clothes by using the following steps: firstly, carrying out deformation matching on a human body mesh model of the same person wearing loose and close-fitting clothes and a human body mesh model of the same person wearing tight-fitting clothes in a database, so that the two models have the same action after deformation; then marking each mesh vertex of the clothes on the human body mesh model of the loose and close-fitting clothes, and finding the corresponding mesh vertex on the human body shape of each mesh vertex by using a ray tracing mode to ensure the fitting degree

represents by V _i Using a ray of 15 DEG to trace the set of corresponding closest points, K _G Represented is a gaussian convolution, with the symbols with the c superscript representing the respective data volumes of the Corresponding points (coresponsoring points) found on the basis of ray tracing;

mapping the human body mesh model in the database by using a method of a structure diagram, so that each vertex on the three-dimensional mesh of the human body mesh model has a mapping of a geometric image on a 2D plane, the mapping relation from the three-dimensional mesh to the 2D plane is obtained without changing the relative position relation of mapping of other vertexes except the edge point of the 2D plane, and then the three-dimensional coordinate information, the normal information and the color information of the three-dimensional mesh are mapped to be used as an input characteristic diagram of the human body mesh model;

generating a neural network model of the antagonistic network by adopting conditions to carry out deep neural network prediction, using the input characteristic diagram as the input of the network, wherein the output of the network is the 2D plane mapping fitting degree, and the output of the network is inversely mapped to obtain the predicted fitting degree value of the human body mesh model; the neural network model of the conditional generation countermeasure network can learn a conditional mapping generator G and a discriminator D used for discriminating the authenticity of the generation of the conditional mapping generator G, wherein G can map an input condition x and a randomly generated noise vector z into an output y; the objective function of the conditional generation countermeasure network is:

wherein L is _GAN (G, D) represents the objective function of the generation of the competing network, D (x, y) represents the score of the arbiter, G (x, z) represents the score of the generator, I E _x,y With I E _x,z Are all corresponding expectations. The loss function of the network uses the L1 modulo, defined as follows:

wherein

A loss function representing the loss of the network,

represents the corresponding expectation, as above, G (x, z) represents the generator score;

subtracting the fitting degree of each vertex in the model by using the human body mesh model with clothes, so as to obtain the human body mesh model without clothes, and smoothing the model to obtain the estimation of the human body shape.

In conclusion, the method solves the problem of estimation of the real body type of the user under complex clothes wearing, comprehensively considers the clothes types (short-sleeved T-shirts, long-sleeved T-shirts, short coats and long coats), the grid geometric structure and the human posture actions, and has accurate detection results on complex human body actions and clothes types. Because the network parameters are not too complex, the network transplantation of the mobile terminal equipment is possible, the body type estimation work can be realized on the mobile terminal equipment, the human body type acquisition and body type estimation based on the mobile terminal are further completed, and the possibility is brought to virtual fitting and accurate measurement of a human body. Under the test of collecting human body data (see fig. 2) by self, the reconstruction error between the body type result obtained based on network estimation and the real body type result is 5%, and the credible body type result can be accurately estimated.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should integrate the description, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims

1. A dressing human body shape estimation method based on a deep neural network is characterized by comprising the following steps: shooting static figures wearing different types of clothes including loose, close-fitting and close-fitting by using a high-definition annular camera array, obtaining a point cloud model by using a multi-view angle stereo reconstruction algorithm, and carrying out image processing on the point cloud modelProcessing a triangular patch to obtain a database of a human body mesh model with ground and certain noise on meshes; cutting the human body mesh model in the database, cutting out the part of the human body mesh model which is not covered by the clothing, and respectively cutting out the clothing and the trousers in the human body mesh model; calculating the fitting degree of the human body mesh models wearing different types of clothes in the database, using the human body mesh models obtained by shooting with close-fitting clothes or without clothes as the true values of the human body shapes, and calculating the fitting degree of the human body mesh models wearing loose close-fitting clothes by using the following steps: firstly, matching the action of a human body mesh model of the same person wearing loose and close-fitting clothes in a database with the action of a human body mesh model of the same person wearing close-fitting clothes, so that the two models have the same action after deformation; then marking each mesh vertex of the clothes on the human body mesh model of the loose and close-fitting clothes, and finding the corresponding mesh vertex on the human body shape of each mesh vertex by using a ray tracing mode to ensure the fitting degree

represents by V _i Using a ray of 15 DEG to trace the set of corresponding closest points, K _G The representation is Gaussian convolution, and the symbol containing the c superscript represents each data volume of the corresponding point found based on ray tracing; mapping the human body mesh model in the database to ensure that each vertex on the three-dimensional mesh of the human body mesh model has a mapping of a geometric image on the 2D plane, and the relative position relation of the mapping of other vertexes except the edge point of the 2D plane is not changed to obtain the mapping relation from the three-dimensional mesh to the 2D plane, and then mapping the three-dimensional coordinate information, the normal information and the color information of the three-dimensional mesh to be used as the input characteristic of the human body mesh modelDrawing; generating a neural network model of the antagonistic network by adopting conditions to carry out deep neural network prediction, using the input characteristic diagram as the input of the network, wherein the output of the network is the 2D plane mapping fitting degree, and the output of the network is inversely mapped to obtain the predicted fitting degree value of the human body mesh model; subtracting the fitting degree of each vertex in the model by using the human body mesh model with clothes, so as to obtain the human body mesh model without clothes, and smoothing the model to obtain the estimation of the human body shape.

2. The method for estimating the size of a dressed human body according to claim 1, wherein the high definition loop camera array is composed of 80 high definition loop cameras with 4K resolution.

3. The method of estimating a body shape of a wearer based on a deep neural network as claimed in claim 1 or 2, wherein the types of clothes include short-sleeved T-shirts, long-sleeved T-shirts, hooded sweaters, down jackets, shorts, and trousers.

4. The method for estimating body shape of dressed human body based on deep neural network as claimed in claim 3, wherein in step four, the method of structure map is used to map the human mesh model in said database.

5. The method as claimed in claim 4, wherein the neural network model of the conditional generation countermeasure network learns a conditional mapping generator G and a discriminator D for the countermeasure generator to discriminate the authenticity of the generation, wherein the conditional mapping generator G maps the input condition x and the randomly generated noise vector z into the output y.

6. The method of estimating the body shape of a dressed human body based on a deep neural network as claimed in claim 5, wherein the objective function of the conditional generation countermeasure network is:

wherein

A loss function representing the loss of the network,