CN111311729A - Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network - Google Patents

Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network

Info

Publication number
CN111311729A
CN111311729A (application CN202010056119.6A; also published as CN111311729B)
Authority
CN
China
Prior art keywords: dimensional, posture, attitude, network, projection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number: CN202010056119.6A
Other languages: Chinese (zh)
Other versions: CN111311729B (en)
Inventor
林杰 (Lin Jie)
崔健 (Cui Jian)
石光明 (Shi Guangming)
刘丹华 (Liu Danhua)
李甫 (Li Fu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202010056119.6A
Publication of CN111311729A
Application granted
Publication of CN111311729B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/006 - Mixed reality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/06 - Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G06T3/067 - Reshaping or unfolding 3D tree structures onto 2D planes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a natural scene three-dimensional human body posture reconstruction method based on a bidirectional projection network, aimed at improving the human body three-dimensional posture reconstruction process of the prior art. The invention comprises the following steps: step one, acquiring data with a camera; step two, sending the collected video and image data to a two-dimensional posture detector to obtain the two-dimensional human body joint point coordinates of the corresponding postures; step three, designing bidirectional projection networks with two structures according to whether three-dimensional posture data labels are available during training; step four, training the designed network with a deep adversarial learning strategy, minimizing the network loss function, and iterating to finally obtain a trained three-dimensional posture generator; step five, inputting the output of the two-dimensional posture detector from step two into the three-dimensional posture generator trained in step four. The technique is low in cost, can support VR and AR technologies in the 5G era, enables portable somatosensory interaction devices, and allows large-scale popularization and application of three-dimensional motion reconstruction technology.

Description

Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network
Technical Field
The invention relates to the technical field of computer vision, in particular to a natural scene three-dimensional human body posture reconstruction method based on a bidirectional projection network.
Background
In virtual reality and somatosensory human-computer interaction, the motion of a human body usually needs to be captured accurately and a moving three-dimensional human skeleton reconstructed. Existing methods usually rely on hardware peripherals such as professional motion capture systems (MOCAP) or somatosensory cameras (Kinect) to reconstruct the three-dimensional human body posture. However, these professional devices are usually expensive and impose extremely demanding requirements on the experimental environment, which hinders the wide popularization and application of three-dimensional posture reconstruction technology. Estimating the 3D posture of a human body from a monocular image is a difficult task in computer vision, and reconstructing the three-dimensional posture from 2D joint points is an ill-posed problem. Most existing methods rely on paired label data for supervised training of the network, and model performance degrades when label data are scarce and clear correspondences are lacking. Therefore, improving the human body three-dimensional posture reconstruction process with deep learning technology frees the whole process from dependence on professional hardware peripherals, so that three-dimensional human body posture reconstruction in a natural scene can be completed with only an ordinary mobile phone or camera.
Existing deep learning methods usually train the network on paired, labeled human posture data; when three-dimensional labels and clear correspondences are lacking, the model is difficult to train, generalizes poorly, and struggles to produce reasonable three-dimensional reconstructions of the complicated and changeable human postures found in natural environments. A method that accurately reconstructs the three-dimensional human body posture in a natural scene, with a deep learning scheme whose training does not depend on label data, is therefore significant: it can replace professional motion capture equipment at extremely low cost and complete three-dimensional posture reconstruction in natural scenes.
Disclosure of Invention
The invention overcomes the problem that the human body three-dimensional posture reconstruction process still needs to be improved in the prior art, and provides a natural scene three-dimensional human body posture reconstruction method based on a bidirectional projection network, which can carry out three-dimensional reconstruction on human body actions in a natural scene by using a monocular camera.
The technical scheme of the invention is to provide a natural scene three-dimensional human body posture reconstruction method based on a bidirectional projection network, which comprises the following steps:
step one, acquiring natural scene human motion video or image data with a camera;
step two, sending the collected video and image data to a two-dimensional posture detector to obtain the two-dimensional human body joint point coordinates of the corresponding postures;
step three, designing bidirectional projection networks with two structures according to whether three-dimensional posture data labels are available during training;
step four, training the designed network with a deep adversarial learning strategy, minimizing the network loss function, and finally obtaining a trained three-dimensional posture generator through iteration;
and step five, inputting the output of the two-dimensional posture detector from step two into the three-dimensional posture generator trained in step four; the output is the three-dimensional posture data of the person in the video/image.
Preferably, in step one, an ordinary monocular optical camera or a mobile phone camera is used to acquire human motion data in a natural scene, in the form of pictures or videos.
Preferably, in step two, the two-dimensional posture detector is a two-dimensional posture detection method such as OpenPose, Stacked Hourglass, or HRNet; when the acquired data is a picture, the picture is input directly to obtain a two-dimensional joint point detection result, and when the acquired data is a video, it is input frame by frame to obtain a two-dimensional joint point detection sequence.
Preferably, in step three, one of two bidirectional projection networks with different structures, A or B, is selected according to whether the user has three-dimensional posture label data. When three-dimensional posture data is available, the bidirectional projection network works in mode A: the network consists of two opposite dual branches, and its network modules comprise a three-dimensional posture generator, a three-dimensional posture discriminator, a two-dimensional posture projection layer and a two-dimensional posture discriminator. When no three-dimensional posture data is available, the bidirectional projection network works in mode B: the network consists of two projection branches in different directions, and its network modules comprise a three-dimensional posture generator, a two-dimensional posture projection layer and a two-dimensional posture discriminator.
Preferably, the three-dimensional posture generator in step three takes two-dimensional joint point coordinates as input and outputs three-dimensional joint point coordinates. It comprises two deep residual networks and a posture feature extraction layer; each deep residual network is formed by stacking four residual blocks with 1024 neurons per layer, and the posture feature extraction layer completes the coding compression of the posture topological structure. The two-dimensional posture discriminator and the three-dimensional posture discriminator share the same network architecture, each containing a two-/three-dimensional posture feature extraction layer, a deep residual network and a fully connected layer; the discriminator modules take posture vectors of different dimensionalities as input and output a unary discrimination value. The two-dimensional posture projection layer comprises two branches, a residual network forward projection and a rotation transformation, which project the posture to different observation angles; the module takes three-dimensional posture data as input and outputs projected two-dimensional posture data.
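For illustration, a minimal PyTorch sketch of a posture discriminator with the architecture just described (a feature extraction layer, four stacked 1024-unit residual blocks, and a fully connected output producing a unary discrimination value); the class and parameter names and the joint count are assumptions of this sketch, not part of the disclosure:

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Two 1024-unit fully connected layers with a skip connection."""
        def __init__(self, width=1024):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(width, width), nn.ReLU(),
                nn.Linear(width, width), nn.ReLU(),
            )

        def forward(self, x):
            return x + self.fc(x)

    class PoseDiscriminator(nn.Module):
        """Feature extraction layer -> residual trunk -> unary output.
        dim=2 gives the two-dimensional discriminator, dim=3 the
        three-dimensional one; only the extraction layer differs."""
        def __init__(self, num_joints=16, dim=2, width=1024):
            super().__init__()
            self.extract = nn.Linear(num_joints * dim, width)
            self.trunk = nn.Sequential(*[ResidualBlock(width) for _ in range(4)])
            self.head = nn.Linear(width, 1)   # unary discrimination value

        def forward(self, pose):              # pose: (batch, num_joints, dim)
            h = torch.relu(self.extract(pose.flatten(1)))
            return self.head(self.trunk(h))

No sigmoid is applied at the output, consistent with the gradient-penalty (Wasserstein-style) adversarial losses defined below.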
Preferably, said step four comprises the sub-steps of,
step 4.1, when three-dimensional posture data are available for network training, selecting a mode A network architecture for training;
step 4.1.1, taking the two-dimensional posture as input: an initial depth estimation value is first output by a residual network in the three-dimensional posture generator, giving an initial estimate of the three-dimensional posture; the initial estimate is then passed into the posture feature extraction layer, which extracts the posture's prior topological structure features and outputs a feature vector; the feature vector is passed into the second deep residual network, which outputs the final depth estimation value and generates the final three-dimensional reconstructed posture;
step 4.1.2, one path of the generated three-dimensional reconstruction posture obtains forward projection through a two-dimensional posture projection layer, and calculates a posture error with the input two-dimensional posture, and the other path of the generated three-dimensional reconstruction posture is sent to a three-dimensional posture discriminator to calculate a distribution error;
step 4.1.3, taking the three-dimensional posture as input, a forward projection is first obtained through the two-dimensional posture projection layer; one path is sent to the three-dimensional posture generator to obtain a three-dimensional reconstruction result, from which a posture error is computed against the input three-dimensional posture, and the other path is sent to the two-dimensional posture discriminator to compute a distribution error;
4.2, when no three-dimensional posture data is available for network training, selecting a mode B network architecture for training;
step 4.2.1, taking the two-dimensional posture as input: an initial depth estimation value is first output by a residual network in the three-dimensional posture generator, giving an initial estimate of the three-dimensional posture; the initial estimate is then passed into the posture feature extraction layer, which extracts the posture's prior topological structure features and outputs a feature vector; the feature vector is passed into the second deep residual network, which outputs the final depth estimation value and generates the final three-dimensional reconstructed posture;
step 4.2.2, transmitting the three-dimensional reconstruction posture into a two-dimensional posture projection layer to respectively obtain a forward projection and a rotary projection, wherein the forward projection calculates a posture error with the input two-dimensional posture, and the rotary projection calculates a two-dimensional distribution error through a two-dimensional posture discriminator;
4.3, respectively calculating loss functions in the A/B modes, wherein the loss functions comprise an attitude loss function and a distribution loss function;
step 4.3.1, in mode A, the overall loss function of the network is defined as:
loss_A = L_GAN(G_3d, D_3d) + L_GAN(G_2d, D_2d) + L_dual(G_2d, G_3d)
where L_GAN denotes the loss function of a generative adversarial network with a gradient penalty term, reflecting the distribution error, computed as:
L_GAN(G_3d, D_3d) = E[D_3d(X_3d)] - E[D_3d(G_3d(X_2d))] + λ E[(||∇D_3d(A_3d)||_2 - 1)^2]
L_GAN(G_2d, D_2d) = E[D_2d(X_2d)] - E[D_2d(G_2d(X_3d))] + λ E[(||∇D_2d(A_2d)||_2 - 1)^2]
L_dual denotes the bidirectional loss of the dual network, reflecting the posture error, computed as:
L_dual(G_2d, G_3d) = ||G_2d(G_3d(X_2d)) - X_2d||_1 + ||G_3d(G_2d(X_3d)) - X_3d||_1
λ is a neural network hyperparameter weighting the gradient penalty, G_3d denotes the three-dimensional posture generator, G_2d the two-dimensional posture projection layer, D_3d and D_2d the three-dimensional and two-dimensional posture discriminators, X_2d and X_3d the real two-dimensional and three-dimensional postures, A_3d a random three-dimensional posture on the line between sampled points of the reconstructed and real three-dimensional posture distributions, and A_2d a random two-dimensional posture on the line between sampled points of the projected and real two-dimensional posture distributions;
step 4.3.2, in mode B, the overall loss function of the network is defined as:
loss_B = L_GAN(G_R2d G_3d, D_2d) + L_pose(G_K2d G_3d)
where L_GAN denotes the loss function of a generative adversarial network with a gradient penalty term, reflecting the distribution error, computed as:
L_GAN(G_R2d G_3d, D_2d) = E[D_2d(X_2d)] - E[D_2d(G_R2d(G_3d(X_2d)))] + λ E[(||∇D_2d(A_2d)||_2 - 1)^2]
L_pose is the reconstruction loss, reflecting the posture error, computed as:
L_pose(G_K2d G_3d) = ||G_K2d(G_3d(X_2d)) - X_2d||_1
λ is a neural network hyperparameter weighting the gradient penalty, G_3d denotes the three-dimensional posture generator, G_R2d the rotational projection transformation of the two-dimensional posture projection layer, G_K2d the forward projection transformation of the two-dimensional posture projection layer, D_2d the two-dimensional posture discriminator, X_2d the real two-dimensional posture data, and A_2d a random two-dimensional posture on the line between sampled points of the projected and real two-dimensional posture distributions;
and step 4.4, adjusting the network parameters with a neural network optimizer to minimize the error function; after iterating for 20-40 epochs the loss function converges, yielding the trained three-dimensional posture generator.
The step five comprises the following sub-steps,
step 5.1, passing the video or image data acquired by an ordinary camera into the two-dimensional posture detector to first obtain two-dimensional joint point data;
step 5.2, normalizing the output of the two-dimensional posture detector so that it can be used directly as the input of the three-dimensional posture generator; the normalization process has the following substeps:
step 5.2.1, reconstructing the central neck coordinate from the detected left and right shoulder joint coordinates:
(x_T, y_T) = ((x_ls + x_rs)/2, (y_ls + y_rs)/2)
where (x_T, y_T) denotes the central neck coordinate, (x_ls, y_ls) the left shoulder coordinate, and (x_rs, y_rs) the right shoulder coordinate;
and step 5.2.2, reconstructing the central spine coordinate from the detected left and right shoulder and hip joints:
(x_S, y_S) = ((x_ls + x_rs + x_lh + x_rh)/4, (y_ls + y_rs + y_lh + y_rh)/4)
where (x_S, y_S) denotes the central spine coordinate, (x_lh, y_lh) the left hip coordinate, and (x_rh, y_rh) the right hip coordinate;
step 5.3, the normalized two-dimensional posture data is passed into the three-dimensional posture generator, whose output is the reconstructed three-dimensional posture: when the input is image data the output is a three-dimensional human body posture skeleton, and when the input is video data the output is a three-dimensional human skeleton motion.
Compared with the prior art, the natural scene three-dimensional human body posture reconstruction method based on the bidirectional projection network has the following advantages: (1) the deep neural network is trained in a data-driven manner, so low-cost three-dimensional reconstruction of the human body posture can be achieved directly through the neural network without expensive hardware equipment; data can be collected with an ordinary camera or mobile phone, the moving human body can be reconstructed in three dimensions by a purely visual method, and professional hardware peripherals can be replaced entirely. The method is low in cost and convenient to use, can support VR and AR technologies in the 5G era, enables portable somatosensory interaction equipment, and allows large-scale popularization and application of three-dimensional motion reconstruction technology.
(2) A special neural network training scheme is adopted that makes full use of the physiological structure characteristics of human posture data and adds new constraints to the network. The training process therefore does not depend on specific data labels or three-dimensional data sets, realizing label-free deep learning; the trained model generalizes well and can handle complex three-dimensional human posture estimation tasks in natural scenes.
(3) The invention designs a bidirectional projection network based on studying two characteristics of the human body posture. The posture prior knowledge contained in the data set is added to the network's training process as a new constraint, reducing the model's dependence on real 3D data during training, so the network can be trained without label data and accurate 3D human body posture reconstruction in natural scenes can be realized.
Drawings
FIG. 1 is a schematic diagram of an A-mode network structure of a bidirectional projection network according to the present invention;
FIG. 2 is a schematic diagram of a B-mode network structure of the bidirectional projection network of the present invention;
FIG. 3 is a schematic diagram of the internal structure of the bidirectional projection network component module according to the present invention;
FIG. 4 is an overall flow chart of the present invention;
FIG. 5 is a three-dimensional human body posture reconstruction effect diagram under a natural scene.
Detailed Description
The method for reconstructing the three-dimensional human body posture of a natural scene based on a bidirectional projection network is further described below with reference to the accompanying drawings and a specific embodiment.
1. Introduction to the related art
Reconstructing human posture and motion in three-dimensional space is one of the main goals of computer vision, and the problem has been studied since the last century [1]. To remove the dependence on professional equipment, most early methods were based on feature engineering, reconstructing the 3D posture by physiologically modeling the motion of human skeletal joints [2, 3], or were search-based, using a database dictionary of 3D skeletons for nearest-neighbor lookup to output the 3D posture corresponding to a 2D posture [4, 5]. With the development of deep learning, researchers have tried to output the 3D posture of the human body directly from RGB images by building end-to-end models [6, 7, 8, 9], but the complicated backgrounds of images in natural scenes usually interfere with end-to-end 3D posture reconstruction. In recent years, inferring the 3D human posture from a monocular vision system has attracted great attention, as the technology applies widely to animated films, virtual reality, behavior recognition and human-computer interaction. It remains very challenging in computer vision, since recovering a 3D posture from 2D observations is itself an ill-posed problem. In a natural scene, influenced by factors such as illumination, viewing angle and complex backgrounds, directly inferring the 3D posture of a human body from an image is very difficult, so some previous works split the problem into two parts: first estimating the 2D posture from the image with one of various advanced 2D human keypoint detectors, and then reconstructing the 3D human posture from the obtained 2D posture. Among these, [10] proposed a simple baseline algorithm that treats 3D posture reconstruction as a regression task from 2D joint points to 3D coordinate points and completes high-quality 3D posture reconstruction with a neural network. [11] further represented the posture as a distance matrix, turning the problem into a two-dimensional-to-three-dimensional distance matrix regression. [12] treated the human posture as a special kind of topological graph data and designed a semantic graph convolutional network (SemGCN) to complete the regression task on graph-structured data. However, these methods that train the network with three-dimensional label data have two serious limitations: (1) because 3D posture data places high demands on experimental conditions, usually requiring expensive multi-angle motion capture equipment to capture the three-dimensional information of human motion indoors, it is usually difficult to obtain large amounts of 3D human posture data for training in real scenes; (2) the strict correspondences required when training with labeled data cause overfitting on a single data set, which manifests both as a model that cannot generalize to unusual angles or unseen 2D postures and as a network that can only generate the 3D posture data present in the training set and cannot reasonably reconstruct the complex postures and actions of a natural scene. Both limitations stem from the training process's reliance on 3D label data.
In recent years the accuracy of 2D human joint point detection algorithms has kept improving, and real-time 2D posture estimation in natural scenes is now achievable. More and more researchers therefore work on reconstructing 3D postures from these easily obtained 2D joint point data, i.e. in two steps: first obtaining 2D postures from the image with an advanced 2D human joint detector, and then lifting these 2D postures to 3D. The key to solving such an ill-posed problem is to add reasonable prior information as a constraint that matches the characteristics of the problem. In traditional methods the constraint is provided by manually designed regularization terms, which usually solve only a single problem. In the deep learning era, letting a network automatically learn prior-information constraints from data can be regarded as a new way to solve ill-posed problems, through a model trained on large amounts of data.
Therefore, abstracting the important characteristics of posture data and using them as network constraints is the main contribution of the invention. Through studying the physiological structure characteristics of posture data, the invention uses deep learning technology to design a bidirectional projection network with two working modes, A and B, so that the network can be trained both with three-dimensional data labels and with label-free data; the trained network can complete complex three-dimensional human posture reconstruction tasks in natural scenes.
2. The proposed method
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1: referring to fig. 1 to 5, the overall flow of the natural scene three-dimensional human body posture reconstruction method based on a bidirectional projection network is shown in fig. 4. After a picture or video is captured with a monocular camera, the corresponding two-dimensional human body posture is obtained through a two-dimensional posture detection network (OpenPose, HRNet, Stacked Hourglass), giving a two-dimensional joint point detection result. Before the data is sent to the three-dimensional posture generator, the generator must be trained in the mode corresponding to whether three-dimensional label data is available: with three-dimensional posture label data the bidirectional projection network works in mode A, and without it the network works in mode B. The user selects the corresponding mode according to whether three-dimensional human body posture data is available.
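As a sketch of this first stage, the following Python snippet runs a 2D posture detector frame by frame over a video; the detector(frame) -> (num_joints, 2) interface is an assumption standing in for whichever OpenPose/HRNet/Stacked Hourglass wrapper is actually used, not a specific library API:

    import cv2
    import numpy as np

    def detect_2d_sequence(video_path, detector):
        """Run a 2D posture detector frame by frame over a video.
        `detector` is a hypothetical callable mapping a BGR frame to an
        array of (num_joints, 2) pixel coordinates."""
        cap = cv2.VideoCapture(video_path)
        poses = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            poses.append(detector(frame))
        cap.release()
        return np.stack(poses)        # (num_frames, num_joints, 2)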
When mode A is selected, the bidirectional projection network is trained with the structure shown in fig. 1: the two-dimensional posture data and the three-dimensional posture data are sent to the two branches of the bidirectional network respectively. In the first branch, the input two-dimensional posture passes through the three-dimensional posture generator to produce a three-dimensional reconstruction result, and that result passes through the two-dimensional posture projection layer to produce a two-dimensional projection. In the second branch, the input three-dimensional posture is first passed into the two-dimensional posture projection layer, and the output is then passed into the three-dimensional posture generator to complete a second reconstruction. The two branches form a dual pair of operations, and the errors of the reconstructed postures must be computed for both, divided into distribution errors and posture errors. The loss function of the whole network is:
loss_A = L_GAN(G_3d, D_3d) + L_GAN(G_2d, D_2d) + L_dual(G_2d, G_3d),
where L_GAN denotes the loss function of a generative adversarial network with a gradient penalty term, reflecting the distribution error, computed as:
L_GAN(G_3d, D_3d) = E[D_3d(X_3d)] - E[D_3d(G_3d(X_2d))] + λ E[(||∇D_3d(A_3d)||_2 - 1)^2]
L_GAN(G_2d, D_2d) = E[D_2d(X_2d)] - E[D_2d(G_2d(X_3d))] + λ E[(||∇D_2d(A_2d)||_2 - 1)^2]
L_dual denotes the bidirectional loss of the dual network, reflecting the posture error, computed as:
L_dual(G_2d, G_3d) = ||G_2d(G_3d(X_2d)) - X_2d||_1 + ||G_3d(G_2d(X_3d)) - X_3d||_1
Here λ is a neural network hyperparameter weighting the gradient penalty, G_3d denotes the three-dimensional posture generator, G_2d the two-dimensional posture projection layer, D_3d and D_2d the three-dimensional and two-dimensional posture discriminators, X_2d and X_3d the real two-dimensional and three-dimensional postures, A_3d a random three-dimensional posture on the line between sampled points of the reconstructed and real three-dimensional posture distributions, and A_2d a random two-dimensional posture on the line between sampled points of the projected and real two-dimensional posture distributions.
When mode B is selected, the bidirectional projection network is trained with the structure shown in fig. 2; mode B needs no label data at all. The input two-dimensional posture first passes through the three-dimensional posture generator to obtain a reconstruction result, and the three-dimensional posture then undergoes two projection transformations in the two-dimensional posture projection layer: one branch projects it to the forward observation angle to obtain a forward two-dimensional projection, and the other branch applies a rotational projection transformation to obtain observations from other viewing angles. The two branches thus complete two different observation processes, and applying a constraint to each observation likewise yields a posture error and a distribution error. The loss function of the whole network is:
loss_B = L_GAN(G_R2d G_3d, D_2d) + L_pose(G_K2d G_3d),
where L_GAN denotes the loss function of a generative adversarial network with a gradient penalty term, reflecting the distribution error, computed as:
L_GAN(G_R2d G_3d, D_2d) = E[D_2d(X_2d)] - E[D_2d(G_R2d(G_3d(X_2d)))] + λ E[(||∇D_2d(A_2d)||_2 - 1)^2]
L_pose is the reconstruction loss, reflecting the posture error, computed as:
L_pose(G_K2d G_3d) = ||G_K2d(G_3d(X_2d)) - X_2d||_1
Here λ is a neural network hyperparameter weighting the gradient penalty, G_3d denotes the three-dimensional posture generator, G_R2d the rotational projection transformation of the two-dimensional posture projection layer, G_K2d the forward projection transformation of the two-dimensional posture projection layer, D_2d the two-dimensional posture discriminator, X_2d the real two-dimensional posture data, and A_2d a random two-dimensional posture on the line between sampled points of the projected and real two-dimensional posture distributions.
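For illustration, a minimal PyTorch sketch of these two error terms in mode B, written from the critic's side (the generator update flips the sign of the adversarial term); the function names and the penalty weight lam are assumptions of this sketch:

    import torch

    def gradient_penalty(disc, real, fake, lam=10.0):
        """Gradient penalty term: penalize the critic's gradient norm at
        random points A sampled on the line between real and generated
        samples, as in the L_GAN formula above."""
        eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
        a = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
        grad = torch.autograd.grad(disc(a).sum(), a, create_graph=True)[0]
        return lam * ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

    def mode_b_losses(g3d, project_fwd, project_rot, d2d, x2d):
        """Posture error (L_pose) and distribution error (L_GAN, critic
        side) for a batch of real 2D postures x2d of shape (batch, joints, 2)."""
        x3d = g3d(x2d)
        pose_err = (project_fwd(x3d) - x2d).abs().mean()       # L_pose
        fake2d = project_rot(x3d).detach()
        dist_err = d2d(fake2d).mean() - d2d(x2d).mean() \
                   + gradient_penalty(d2d, x2d, fake2d)        # L_GAN
        return pose_err, dist_err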
in the training process, the two A/B modes of the bidirectional projection network share the same network module, and the network module is shown in FIG. 3 and comprises a three-dimensional posture generator, a two/three-dimensional posture discriminator and a two-dimensional posture projection layer.
The three-dimensional posture generator comprises two deep residual networks and a posture feature extraction layer; each deep residual network is formed by stacking four residual blocks with 1024 neurons per layer. The input two-dimensional posture passes through the first residual network, which outputs an initial depth estimation value and thus an initial estimate of the three-dimensional posture. The initial estimate is passed into the posture feature extraction layer, which, by extracting the posture's prior topological structure features, encodes the three-dimensional posture into a feature vector containing spatial angle and depth information. The feature vector is then passed into the second deep residual network, which outputs the final depth estimation value and generates the final three-dimensional reconstructed posture.
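A minimal PyTorch sketch of such a generator, reusing the ResidualBlock class from the discriminator sketch above; the layer names and joint count are assumptions of this sketch:

    import torch
    import torch.nn as nn

    class PoseGenerator3D(nn.Module):
        """Two-stage depth regression: coarse depth, posture feature
        extraction, then refined depth, concatenated with the 2D input."""
        def __init__(self, num_joints=16, width=1024):
            super().__init__()
            self.resnet1 = nn.Sequential(
                nn.Linear(num_joints * 2, width),
                *[ResidualBlock(width) for _ in range(4)],
                nn.Linear(width, num_joints),      # initial per-joint depth
            )
            self.extract = nn.Linear(num_joints * 3, width)  # posture feature extraction
            self.resnet2 = nn.Sequential(
                *[ResidualBlock(width) for _ in range(4)],
                nn.Linear(width, num_joints),      # final per-joint depth
            )

        def forward(self, x2d):                    # x2d: (batch, joints, 2)
            flat = x2d.flatten(1)
            z0 = self.resnet1(flat)                # initial depth estimate
            feat = torch.relu(self.extract(torch.cat([flat, z0], dim=1)))
            z = self.resnet2(feat)                 # refined depth estimate
            return torch.cat([x2d, z.unsqueeze(-1)], dim=-1)  # (batch, joints, 3)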
The two-dimensional posture discriminator and the three-dimensional posture discriminator share the same network architecture, differing mainly in their feature extraction layers. A posture of either dimensionality is first encoded by the corresponding posture feature extraction layer into a feature vector containing the motion posture's topological structure, and the final discrimination value is then output through a deep residual network and a fully connected layer, completing the computation of the difference between the two distributions.
The two-dimensional posture projection layer comprises two branches that project the posture to different angles: observation from the forward viewing angle is completed through a deep residual network of several connected residual blocks, and observation from other, rotated viewing angles is realized through the posture rotation transformation layer.
The forward projection transformation is:
X_2d = G_2d(X_3d)
The rotational projection transformation is:
X_2d = G_R2d X_3d
where X_2d denotes the two-dimensional posture, X_3d the three-dimensional posture, G_2d the deep residual network projection transformation, and G_R2d the rotation transformation, whose rotation matrix can be taken as a rotation by an angle θ about the vertical axis, e.g.:
G_R2d = [[cos θ, 0, sin θ], [0, 1, 0], [-sin θ, 0, cos θ]]
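A sketch of the rotational projection branch under the vertical-axis rotation above, with an orthographic camera (simply dropping the depth coordinate) assumed for the forward view; both assumptions are for illustration only:

    import math
    import torch

    def rotate_project(x3d, theta):
        """Rotate a (batch, joints, 3) posture about the vertical axis by
        theta radians, then keep (x, y): a rotated-view 2D projection."""
        c, s = math.cos(theta), math.sin(theta)
        rot = x3d.new_tensor([[c, 0.0, s],
                              [0.0, 1.0, 0.0],
                              [-s, 0.0, c]])
        return (x3d @ rot.T)[..., :2]

    def forward_project(x3d):
        """Orthographic forward projection: keep the (x, y) coordinates."""
        return x3d[..., :2]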
the A, B two training modes of the bidirectional projection network can be formed by combining the modules, the corresponding mode is selected according to actual conditions to train the network, the error function is continuously iterated and minimized, and the trained three-dimensional posture generator can be finally obtained through 20-40 EPOCH network training.
The previously detected two-dimensional poses are then subjected to a normalization process as follows:
1. Reconstructing the central neck coordinate from the detected left and right shoulder joint coordinates:
(x_T, y_T) = ((x_ls + x_rs)/2, (y_ls + y_rs)/2)
where (x_T, y_T) denotes the central neck coordinate, (x_ls, y_ls) the left shoulder coordinate, and (x_rs, y_rs) the right shoulder coordinate;
2. Reconstructing the central spine coordinate from the detected left and right shoulder and hip joints:
(x_S, y_S) = ((x_ls + x_rs + x_lh + x_rh)/4, (y_ls + y_rs + y_lh + y_rh)/4)
where (x_S, y_S) denotes the central spine coordinate, (x_lh, y_lh) the left hip coordinate, and (x_rh, y_rh) the right hip coordinate;
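A sketch of this normalization step; the joint-index mapping idx is detector-specific and hypothetical, and the root-centering and scale normalization at the end are assumptions of this sketch (the text specifies only the neck and spine reconstruction):

    import numpy as np

    def normalize_pose(joints, idx):
        """Append the reconstructed central neck and spine, then center
        and scale the posture. `joints` is a (num_joints, 2) array."""
        ls, rs = joints[idx["l_shoulder"]], joints[idx["r_shoulder"]]
        lh, rh = joints[idx["l_hip"]], joints[idx["r_hip"]]
        neck = (ls + rs) / 2.0                     # central neck
        spine = (ls + rs + lh + rh) / 4.0          # central spine
        out = np.vstack([joints, neck, spine])
        out = out - neck                           # root-center (an assumption)
        scale = np.linalg.norm(out, axis=1).max() + 1e-8
        return out / scale                         # scale-normalize (an assumption)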
the normalized two-dimensional human body posture is transmitted into a trained three-dimensional posture generator, the three-dimensional posture generator can output a three-dimensional human body skeleton which accords with the human body posture topological structure according to a two-dimensional detection result, and the skeleton sequence of each frame of video is connected, so that the reconstruction of the three-dimensional human body posture in the video can be realized. The reconstruction effect of the invented method is shown in fig. 5.
3. References. The numbers in square brackets in this document refer to the correspondingly numbered documents below.
[1] H.-J. Lee and Z. Chen. Determination of 3D human body postures from a single view. Computer Vision, Graphics, and Image Processing, 30(2):148–168, 1985.
[2] V. Ramakrishna, T. Kanade, and Y. Sheikh. Reconstructing 3D human pose from 2D image landmarks. In European Conference on Computer Vision (ECCV), pages 573–586. Springer, 2012.
[3] C. Ionescu, J. Carreira, and C. Sminchisescu. Iterated second-order label sensitive pooling for 3D human pose estimation. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1661–1668, 2014.
[4] H. Jiang. 3D human pose reconstruction using millions of exemplars. In International Conference on Pattern Recognition (ICPR), pages 1674–1677. IEEE, 2010.
[5] C.-H. Chen and D. Ramanan. 3D human pose estimation = 2D pose estimation + matching. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 5759–5767, 2017.
[6] S. Li and A. B. Chan. 3D human pose estimation from monocular images with deep convolutional neural network. In Asian Conference on Computer Vision (ACCV), pages 332–347. Springer, 2014.
[7] D. Mehta, S. Sridhar, O. Sotnychenko, H. Rhodin, M. Shafiei, H.-P. Seidel, W. Xu, D. Casas, and C. Theobalt. VNect: Real-time 3D human pose estimation with a single RGB camera. Volume 36, 2017.
[8] B. Tekin, I. Katircioglu, M. Salzmann, V. Lepetit, and P. Fua. Structured prediction of 3D human pose with deep neural networks. In British Machine Vision Conference (BMVC), 2016.
[9] G. Pavlakos, X. Zhou, K. G. Derpanis, and K. Daniilidis. Coarse-to-fine volumetric prediction for single-image 3D human pose. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1263–1272. IEEE, 2017.
[10] J. Martinez, R. Hossain, J. Romero, and J. J. Little. A simple yet effective baseline for 3D human pose estimation. In ICCV, 2017.
[11] F. Moreno-Noguer. 3D human pose estimation from a single image via distance matrix regression. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[12] L. Zhao, X. Peng, Y. Tian, et al. Semantic graph convolutional networks for 3D human pose regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3425–3435, 2019.

Claims (7)

1. A natural scene three-dimensional human body posture reconstruction method based on a bidirectional projection network, characterized in that it comprises the following steps:
step one, acquiring natural scene human motion video or image data with a camera;
step two, sending the collected video and image data to a two-dimensional posture detector to obtain the two-dimensional human body joint point coordinates of the corresponding postures;
step three, designing bidirectional projection networks with two structures according to whether three-dimensional posture data labels are available during training;
step four, training the designed network with a deep adversarial learning strategy, minimizing the network loss function, and finally obtaining a trained three-dimensional posture generator through iteration;
and step five, inputting the output of the two-dimensional posture detector from step two into the three-dimensional posture generator trained in step four; the output is the three-dimensional posture data of the person in the video/image.
2. The natural scene three-dimensional human body posture reconstruction method based on the bidirectional projection network as claimed in claim 1, characterized in that: in step one, an ordinary monocular optical camera or a mobile phone camera is used to acquire human motion data in a natural scene, in the form of pictures or videos.
3. The natural scene three-dimensional human body posture reconstruction method based on the bidirectional projection network as claimed in claim 1, characterized in that: in step two, the two-dimensional posture detector is a two-dimensional posture detection method such as OpenPose, Stacked Hourglass or HRNet; when the acquired data is a picture, the picture is input directly to obtain a two-dimensional joint point detection result, and when the acquired data is a video, it is input frame by frame to obtain a two-dimensional joint point detection sequence.
4. The natural scene three-dimensional human body posture reconstruction method based on the bidirectional projection network as claimed in claim 1, characterized in that: in step three, one of two bidirectional projection networks with different structures, A or B, is selected according to whether the user has three-dimensional posture label data; when three-dimensional posture data is available, the bidirectional projection network works in mode A, the network consists of two opposite dual branches, and its network modules comprise a three-dimensional posture generator, a three-dimensional posture discriminator, a two-dimensional posture projection layer and a two-dimensional posture discriminator; when no three-dimensional posture data is available, the bidirectional projection network works in mode B, the network consists of two projection branches in different directions, and its network modules comprise a three-dimensional posture generator, a two-dimensional posture projection layer and a two-dimensional posture discriminator.
5. The natural scene three-dimensional human body posture reconstruction method based on the bidirectional projection network as claimed in claim 1, characterized in that: the three-dimensional posture generator in step three takes two-dimensional joint point coordinates as input and outputs three-dimensional joint point coordinates; it comprises two deep residual networks and a posture feature extraction layer, each deep residual network is formed by stacking four residual blocks with 1024 neurons per layer, and the posture feature extraction layer completes the coding compression of the posture topological structure; the two-dimensional posture discriminator and the three-dimensional posture discriminator share the same network architecture, each containing a two-/three-dimensional posture feature extraction layer, a deep residual network and a fully connected layer, and the discriminator modules take posture vectors of different dimensionalities as input and output a unary discrimination value; the two-dimensional posture projection layer comprises two branches, a residual network forward projection and a rotation transformation, which project the posture to different observation angles; the module takes three-dimensional posture data as input and outputs projected two-dimensional posture data.
6. The natural scene three-dimensional human body posture reconstruction method based on the bidirectional projection network as claimed in claim 1, characterized in that: the fourth step comprises the following sub-steps,
step 4.1, when three-dimensional posture data are available for network training, selecting a mode A network architecture for training;
step 4.1.1, taking the two-dimensional posture as input: an initial depth estimation value is first output by a residual network in the three-dimensional posture generator, giving an initial estimate of the three-dimensional posture; the initial estimate is then passed into the posture feature extraction layer, which extracts the posture's prior topological structure features and outputs a feature vector; the feature vector is passed into the second deep residual network, which outputs the final depth estimation value and generates the final three-dimensional reconstructed posture;
step 4.1.2, one path of the generated three-dimensional reconstruction posture obtains forward projection through a two-dimensional posture projection layer, and calculates a posture error with the input two-dimensional posture, and the other path of the generated three-dimensional reconstruction posture is sent to a three-dimensional posture discriminator to calculate a distribution error;
step 4.1.3, taking the three-dimensional posture as input, a forward projection is first obtained through the two-dimensional posture projection layer; one path is sent to the three-dimensional posture generator to obtain a three-dimensional reconstruction result, from which a posture error is computed against the input three-dimensional posture, and the other path is sent to the two-dimensional posture discriminator to compute a distribution error;
4.2, when no three-dimensional posture data is available for network training, selecting a mode B network architecture for training;
step 4.2.1, taking the two-dimensional posture as input: an initial depth estimation value is first output by a residual network in the three-dimensional posture generator, giving an initial estimate of the three-dimensional posture; the initial estimate is then passed into the posture feature extraction layer, which extracts the posture's prior topological structure features and outputs a feature vector; the feature vector is passed into the second deep residual network, which outputs the final depth estimation value and generates the final three-dimensional reconstructed posture;
step 4.2.2, transmitting the three-dimensional reconstruction posture into a two-dimensional posture projection layer to respectively obtain a forward projection and a rotary projection, wherein the forward projection calculates a posture error with the input two-dimensional posture, and the rotary projection calculates a two-dimensional distribution error through a two-dimensional posture discriminator;
4.3, respectively calculating loss functions in the A/B modes, wherein the loss functions comprise an attitude loss function and a distribution loss function;
step 4.3.1, in mode A, the overall loss function of the network is defined as:
loss_A = L_GAN(G_3d, D_3d) + L_GAN(G_2d, D_2d) + L_dual(G_2d, G_3d)
where L_GAN denotes the loss function of a generative adversarial network with a gradient penalty term, reflecting the distribution error, computed as:
L_GAN(G_3d, D_3d) = E[D_3d(X_3d)] - E[D_3d(G_3d(X_2d))] + λ E[(||∇D_3d(A_3d)||_2 - 1)^2]
L_GAN(G_2d, D_2d) = E[D_2d(X_2d)] - E[D_2d(G_2d(X_3d))] + λ E[(||∇D_2d(A_2d)||_2 - 1)^2]
L_dual denotes the bidirectional loss of the dual network, reflecting the posture error, computed as:
L_dual(G_2d, G_3d) = ||G_2d(G_3d(X_2d)) - X_2d||_1 + ||G_3d(G_2d(X_3d)) - X_3d||_1
λ is a neural network hyperparameter weighting the gradient penalty, G_3d denotes the three-dimensional posture generator, G_2d the two-dimensional posture projection layer, D_3d and D_2d the three-dimensional and two-dimensional posture discriminators, X_2d and X_3d the real two-dimensional and three-dimensional postures, A_3d a random three-dimensional posture on the line between sampled points of the reconstructed and real three-dimensional posture distributions, and A_2d a random two-dimensional posture on the line between sampled points of the projected and real two-dimensional posture distributions;
step 4.3.2, in mode B, the overall loss function of the network is defined as:
loss_B = L_GAN(G_R2d G_3d, D_2d) + L_pose(G_K2d G_3d)
where L_GAN denotes the loss function of a generative adversarial network with a gradient penalty term, reflecting the distribution error, computed as:
L_GAN(G_R2d G_3d, D_2d) = E[D_2d(X_2d)] - E[D_2d(G_R2d(G_3d(X_2d)))] + λ E[(||∇D_2d(A_2d)||_2 - 1)^2]
L_pose is the reconstruction loss, reflecting the posture error, computed as:
L_pose(G_K2d G_3d) = ||G_K2d(G_3d(X_2d)) - X_2d||_1
λ is a neural network hyperparameter weighting the gradient penalty, G_3d denotes the three-dimensional posture generator, G_R2d the rotational projection transformation of the two-dimensional posture projection layer, G_K2d the forward projection transformation of the two-dimensional posture projection layer, D_2d the two-dimensional posture discriminator, X_2d the real two-dimensional posture data, and A_2d a random two-dimensional posture on the line between sampled points of the projected and real two-dimensional posture distributions;
and step 4.4, adjusting the network parameters with a neural network optimizer to minimize the error function; after iterating for 20-40 epochs the loss function converges, yielding the trained three-dimensional posture generator.
7. The natural scene three-dimensional human body posture reconstruction method based on the bidirectional projection network as claimed in claim 1, characterized in that: the step five comprises the following sub-steps,
step 5.1, passing the video or image data acquired by an ordinary camera into the two-dimensional posture detector to first obtain two-dimensional joint point data;
step 5.2, normalizing the output of the two-dimensional posture detector so that it can be used directly as the input of the three-dimensional posture generator; the normalization process has the following substeps:
step 5.2.1, reconstructing the central neck coordinate from the detected left and right shoulder joint coordinates:
(x_T, y_T) = ((x_ls + x_rs)/2, (y_ls + y_rs)/2)
where (x_T, y_T) denotes the central neck coordinate, (x_ls, y_ls) the left shoulder coordinate, and (x_rs, y_rs) the right shoulder coordinate;
and step 5.2.2, reconstructing the central spine coordinate from the detected left and right shoulder and hip joints:
(x_S, y_S) = ((x_ls + x_rs + x_lh + x_rh)/4, (y_ls + y_rs + y_lh + y_rh)/4)
where (x_S, y_S) denotes the central spine coordinate, (x_lh, y_lh) the left hip coordinate, and (x_rh, y_rh) the right hip coordinate;
step 5.3, the normalized two-dimensional posture data is passed into the three-dimensional posture generator, whose output is the reconstructed three-dimensional posture: when the input is image data the output is a three-dimensional human body posture skeleton, and when the input is video data the output is a three-dimensional human skeleton motion.
CN202010056119.6A 2020-01-18 2020-01-18 Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network Active CN111311729B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010056119.6A CN111311729B (en) 2020-01-18 2020-01-18 Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010056119.6A CN111311729B (en) 2020-01-18 2020-01-18 Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network

Publications (2)

Publication Number Publication Date
CN111311729A true CN111311729A (en) 2020-06-19
CN111311729B CN111311729B (en) 2022-03-11

Family

ID=71145156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010056119.6A Active CN111311729B (en) 2020-01-18 2020-01-18 Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network

Country Status (1)

Country Link
CN (1) CN111311729B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112185104A (en) * 2020-08-22 2021-01-05 南京理工大学 Traffic big data restoration method based on countermeasure autoencoder
CN112307940A (en) * 2020-10-28 2021-02-02 有半岛(北京)信息科技有限公司 Model training method, human body posture detection method, device, equipment and medium
CN112949462A (en) * 2021-02-26 2021-06-11 平安科技(深圳)有限公司 Three-dimensional human body posture estimation method, device, equipment and storage medium
CN113170050A (en) * 2020-06-22 2021-07-23 深圳市大疆创新科技有限公司 Image acquisition method, electronic equipment and mobile equipment
CN113158920A (en) * 2021-04-26 2021-07-23 平安科技(深圳)有限公司 Training method and device for specific motion recognition model and computer equipment
CN113239892A (en) * 2021-06-10 2021-08-10 青岛联合创智科技有限公司 Monocular human body three-dimensional attitude estimation method based on data enhancement architecture
CN113569627A (en) * 2021-06-11 2021-10-29 北京旷视科技有限公司 Human body posture prediction model training method, human body posture prediction method and device
CN114581613A (en) * 2022-04-29 2022-06-03 杭州倚澜科技有限公司 Trajectory constraint-based human body model posture and shape optimization method and system
WO2022115991A1 (en) * 2020-12-01 2022-06-09 Intel Corporation Incremental 2d-to-3d pose lifting for fast and accurate human pose estimation
CN115035173A (en) * 2022-06-08 2022-09-09 山东大学 Monocular depth estimation method and system based on interframe correlation
CN116205788A (en) * 2023-04-27 2023-06-02 粤港澳大湾区数字经济研究院(福田) Three-dimensional feature map acquisition method, image processing method and related device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009086088A1 (en) * 2007-12-21 2009-07-09 Honda Motor Co., Ltd. Controlled human pose estimation from depth image streams
WO2015143134A1 (en) * 2014-03-19 2015-09-24 Raytheon Company Bare earth finding and feature extraction for 3d point clouds
CN106651770A (en) * 2016-09-19 2017-05-10 西安电子科技大学 Method for reconstructing multispectral super-resolution imaging based on Lapras norm regularization
CN106934827A (en) * 2015-12-31 2017-07-07 杭州华为数字技术有限公司 The method for reconstructing and device of three-dimensional scenic
CN108460338A (en) * 2018-02-02 2018-08-28 北京市商汤科技开发有限公司 Estimation method of human posture and device, electronic equipment, storage medium, program
CN110189253A (en) * 2019-04-16 2019-08-30 浙江工业大学 A kind of image super-resolution rebuilding method generating confrontation network based on improvement
WO2019213450A1 (en) * 2018-05-02 2019-11-07 Quidient, Llc A codec for processing scenes of almost unlimited detail
CN110427799A (en) * 2019-06-12 2019-11-08 中国地质大学(武汉) Based on the manpower depth image data Enhancement Method for generating confrontation network

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009086088A1 (en) * 2007-12-21 2009-07-09 Honda Motor Co., Ltd. Controlled human pose estimation from depth image streams
WO2015143134A1 (en) * 2014-03-19 2015-09-24 Raytheon Company Bare earth finding and feature extraction for 3d point clouds
CN106934827A (en) * 2015-12-31 2017-07-07 杭州华为数字技术有限公司 The method for reconstructing and device of three-dimensional scenic
CN106651770A (en) * 2016-09-19 2017-05-10 西安电子科技大学 Method for reconstructing multispectral super-resolution imaging based on Lapras norm regularization
CN108460338A (en) * 2018-02-02 2018-08-28 北京市商汤科技开发有限公司 Estimation method of human posture and device, electronic equipment, storage medium, program
WO2019213450A1 (en) * 2018-05-02 2019-11-07 Quidient, Llc A codec for processing scenes of almost unlimited detail
CN110189253A (en) * 2019-04-16 2019-08-30 浙江工业大学 A kind of image super-resolution rebuilding method generating confrontation network based on improvement
CN110427799A (en) * 2019-06-12 2019-11-08 中国地质大学(武汉) Based on the manpower depth image data Enhancement Method for generating confrontation network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHING-HANG CHEN et al.: "3D Human Pose Estimation = 2D Pose Estimation + Matching", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
JIE LIN et al.: "CG Animation Creator: Auto-rendering of Motion Stick Figure Based on Conditional Adversarial Learning", 《CHINESE CONFERENCE ON PATTERN RECOGNITION AND COMPUTER VISION (PRCV)》 *
MENGXI JIANG et al.: "Reweighted sparse representation with residual compensation for 3D human pose estimation from a single RGB image", 《NEUROCOMPUTING》 *
LI XIANG et al.: "Kinect-based three-dimensional human body reconstruction method" (基于Kinect的人体三维重建方法), 《计算机***应用》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113170050A (en) * 2020-06-22 2021-07-23 深圳市大疆创新科技有限公司 Image acquisition method, electronic equipment and mobile equipment
CN112185104B (en) * 2020-08-22 2021-12-10 南京理工大学 Traffic big data restoration method based on countermeasure autoencoder
CN112185104A (en) * 2020-08-22 2021-01-05 南京理工大学 Traffic big data restoration method based on countermeasure autoencoder
CN112307940A (en) * 2020-10-28 2021-02-02 有半岛(北京)信息科技有限公司 Model training method, human body posture detection method, device, equipment and medium
WO2022115991A1 (en) * 2020-12-01 2022-06-09 Intel Corporation Incremental 2d-to-3d pose lifting for fast and accurate human pose estimation
CN112949462B (en) * 2021-02-26 2023-12-19 平安科技(深圳)有限公司 Three-dimensional human body posture estimation method, device, equipment and storage medium
CN112949462A (en) * 2021-02-26 2021-06-11 平安科技(深圳)有限公司 Three-dimensional human body posture estimation method, device, equipment and storage medium
WO2022178951A1 (en) * 2021-02-26 2022-09-01 平安科技(深圳)有限公司 Three-dimensional human pose estimation method and apparatus, device, and storage medium
CN113158920A (en) * 2021-04-26 2021-07-23 平安科技(深圳)有限公司 Training method and device for specific motion recognition model and computer equipment
CN113158920B (en) * 2021-04-26 2023-12-22 平安科技(深圳)有限公司 Training method and device for specific action recognition model and computer equipment
CN113239892A (en) * 2021-06-10 2021-08-10 青岛联合创智科技有限公司 Monocular human body three-dimensional attitude estimation method based on data enhancement architecture
CN113569627A (en) * 2021-06-11 2021-10-29 北京旷视科技有限公司 Human body posture prediction model training method, human body posture prediction method and device
CN114581613A (en) * 2022-04-29 2022-06-03 杭州倚澜科技有限公司 Trajectory constraint-based human body model posture and shape optimization method and system
CN115035173A (en) * 2022-06-08 2022-09-09 山东大学 Monocular depth estimation method and system based on interframe correlation
CN116205788B (en) * 2023-04-27 2023-08-11 粤港澳大湾区数字经济研究院(福田) Three-dimensional feature map acquisition method, image processing method and related device
CN116205788A (en) * 2023-04-27 2023-06-02 粤港澳大湾区数字经济研究院(福田) Three-dimensional feature map acquisition method, image processing method and related device

Also Published As

Publication number Publication date
CN111311729B (en) 2022-03-11

Similar Documents

Publication Publication Date Title
CN111311729B (en) Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network
Sun et al. Multi-view to novel view: Synthesizing novel views with self-learned confidence
Zhou et al. Monocular real-time hand shape and motion capture using multi-modal data
CN109636831B (en) Method for estimating three-dimensional human body posture and hand information
Wang et al. 360sd-net: 360 stereo depth estimation with learnable cost volume
Wang et al. Hmor: Hierarchical multi-person ordinal relations for monocular multi-person 3d pose estimation
CN113160375B (en) Three-dimensional reconstruction and camera pose estimation method based on multi-task learning algorithm
CN112530019B (en) Three-dimensional human body reconstruction method and device, computer equipment and storage medium
Chen et al. Learning a deep network with spherical part model for 3D hand pose estimation
Fu et al. Fast ORB-SLAM without keypoint descriptors
Li et al. 3D human pose and shape estimation through collaborative learning and multi-view model-fitting
Li et al. Hmor: Hierarchical multi-person ordinal relations for monocular multi-person 3d pose estimation
Bashirov et al. Real-time rgbd-based extended body pose estimation
Li et al. Deep learning based monocular depth prediction: Datasets, methods and applications
Keceli Viewpoint projection based deep feature learning for single and dyadic action recognition
Yang et al. SAM-Net: Semantic probabilistic and attention mechanisms of dynamic objects for self-supervised depth and camera pose estimation in visual odometry applications
Fang et al. Self-supervised learning of depth and ego-motion from videos by alternative training and geometric constraints from 3-d to 2-d
Chang et al. Multi-view 3D human pose estimation with self-supervised learning
Fu et al. CBAM-SLAM: A semantic slam based on attention module in dynamic environment
Li et al. Monocular 3-D Object Detection Based on Depth-Guided Local Convolution for Smart Payment in D2D Systems
Price et al. Augmenting crowd-sourced 3d reconstructions using semantic detections
CN116129051A (en) Three-dimensional human body posture estimation method and system based on graph and attention interleaving
Ruget et al. Real-time, low-cost multi-person 3D pose estimation
Chen et al. 360ORB-SLAM: A Visual SLAM System for Panoramic Images with Depth Completion Network
Aleksandrova et al. 3D face model reconstructing from its 2D images using neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant