CN111353381B — 2D image-oriented human body 3D pose estimation method

Publication number: CN111353381B (granted publication of application CN111353381A)
Application number: CN202010021822.3A (filed by Zhejiang Shuike Culture Group Co ltd)
Authority: CN (China)
Legal status: Active (granted)
Other languages: Chinese (zh)
Inventors: 刘龙, 杨乐
Current assignees: Xi'an Huaqi Zhongxin Technology Development Co ltd; Zhejiang Shuike Culture Group Co ltd
Original assignee: Zhejiang Shuike Culture Group Co ltd


Classifications

    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06T2207/20081 — Training; learning
    • G06T2207/20084 — Artificial neural networks [ANN]
    • G06T2207/30196 — Human being; person


Abstract

The invention discloses a 2D image-oriented human body 3D pose estimation method comprising the following steps: step 1, perform convolution, normalization and activation operations on the 2D image in sequence and output the resulting image; step 2, perform convolution, normalization and activation operations in sequence on the image output by step 1 and output the resulting image; step 3, feed the image output by step 2 into subnet one for processing and output feature maps C1 and C2; step 4, feed feature maps C1 and C2 into subnet two for processing and output feature maps D1, D2 and D3; step 5, feed feature maps D1, D2 and D3 into subnet three for processing and output feature maps E1, E2 and E3; step 6, process feature maps E1, E2 and E3 to obtain the matrix P, i.e., the estimated pose. The method provided by the invention estimates depth accurately, uses few algorithm parameters, and generalizes strongly.

Description

2D image-oriented human body 3D pose estimation method
Technical Field
The invention belongs to the technical field of human body pose estimation, and particularly relates to a 2D image-oriented human body 3D pose estimation method.
Background
Among deep neural network approaches, the methods for 3D estimation of the human pose in an image mainly include CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory network), GCN (Graph Convolutional Network) and GAN (Generative Adversarial Network), of which CNN is currently the mainstream.
The initial application of convolutional neural networks to pose estimation was 2D pose estimation of human bodies in images, i.e., estimating the joint points of a single person or multiple persons from a single picture and connecting the related joint points. As algorithm performance has been optimized and extended, the accuracy and evaluation indexes of 2D pose estimation have approached their bottleneck, and attention has turned to 3D pose estimation of human bodies in images. In recent years, methods for estimating the 3D pose of a human body in a single image have mainly been based on Stacked Hourglass (ECCV 2016), CPM (Convolutional Pose Machine, from Yaser Sheikh's group at CMU), MSPN (Multi-Stage Pose Network, proposed by Face++, COCO keypoint detection champion in 2018), HRNet, and the like. Stacked Hourglass stacks several hourglass modules, each using multiple residual blocks (He Kaiming, 2015), and trains the whole framework stage by stage to estimate the 3D pose; CPM computes a response map for each joint point at each stage and takes the position of the maximum response as the joint position; MSPN fuses multi-stage features of different scales, combining the semantic information of small-scale features with the local details of large-scale features to predict the joint positions. These methods can complete 3D pose estimation and score highly on various evaluation standards, but they have the following shortcomings:
(1) In predicting depth, the algorithms cannot accurately estimate the depth value of each joint point;
(2) The relations among the joints of the human body are not fully considered, so some estimated poses are wrong and do not conform to the kinematic relations among the joints of the human body (such as wrongly estimated knee bending);
(3) The algorithm models have a large number of parameters.
Disclosure of Invention
The invention aims to provide a 2D image-oriented human body 3D pose estimation method with accurate depth estimation, few algorithm parameters and strong generalization.
The technical scheme adopted by the invention is a 2D image-oriented human body 3D pose estimation method implemented according to the following steps:
Step 1, perform convolution, normalization and activation operations on the 2D image in sequence, and output the resulting image;
Step 2, perform convolution, normalization and activation operations in sequence on the image output by step 1, and output the resulting image;
Step 3, feed the image output by step 2 into subnet one for processing, and output feature maps C1 and C2;
Step 4, feed feature maps C1 and C2 into subnet two for processing, and output feature maps D1, D2 and D3;
Step 5, feed feature maps D1, D2 and D3 into subnet three for processing, and output feature maps E1, E2 and E3;
Step 6, process feature maps E1, E2 and E3 to obtain the matrix P, i.e., the estimated pose.
The invention is further characterized as follows:
the step 1 is specifically implemented according to the following steps:
step 1.1, carrying out the following operations on the 2D image simultaneously:
(1) The convolution operation is performed by using a convolution kernel of 3×3, and the number of channels is (1-a) in -b in ) X64, obtain high frequency characteristic diagram A 1 =[128,128,(1-a in -b in )×64]The method comprises the steps of carrying out a first treatment on the surface of the Wherein a is in Is a low frequency channel number coefficient; b in Is the intermediate frequency channel number coefficient;
(2) Downsampling by 1/2 times, and obtaining the number of channels b in X64, obtain intermediate frequency characteristic diagram A 2 =[64,64,b in ×64];
(3) Downsampling by 1/4 times, and obtaining channel number a in X64, obtain low frequency characteristic diagram A 3 =[32,32,a in ×64];
Step 1.2, performing the following operations on each image output in step 1.1:
first, the average μ of the image pixels is calculated 1 The method comprises the steps of carrying out a first treatment on the surface of the Then, the variance sigma of the image pixels is calculated 1 The method comprises the steps of carrying out a first treatment on the surface of the Then carrying out normalization processing on the image pixels to obtainFinally, each pixel is activated by a linear rectification function to obtain +.>
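For concreteness, the following is a minimal PyTorch-style sketch of this multi-frequency stem. The module name MultiFreqStem, the 3-channel input, the average-pool downsampling, and the per-branch convolutions are illustrative assumptions, not details fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiFreqStem(nn.Module):
    """Split a 2D image into high/mid/low-frequency feature maps (step 1 sketch)."""
    def __init__(self, a_in=0.25, b_in=0.25, width=64):
        super().__init__()
        c_high = int((1 - a_in - b_in) * width)  # high-frequency channels
        c_mid = int(b_in * width)                # intermediate-frequency channels
        c_low = int(a_in * width)                # low-frequency channels
        self.conv_high = nn.Conv2d(3, c_high, 3, padding=1)
        self.conv_mid = nn.Conv2d(3, c_mid, 3, padding=1)
        self.conv_low = nn.Conv2d(3, c_low, 3, padding=1)

    def forward(self, x):
        # step 1.1: full-, half- and quarter-resolution branches
        a1 = self.conv_high(x)
        a2 = self.conv_mid(F.avg_pool2d(x, 2))
        a3 = self.conv_low(F.avg_pool2d(x, 4))
        # step 1.2: per-map normalization followed by ReLU activation
        def norm_act(t, eps=1e-3):
            mu, var = t.mean(), t.var(unbiased=False)
            return F.relu((t - mu) / torch.sqrt(var + eps))
        return norm_act(a1), norm_act(a2), norm_act(a3)
```

This split into full-, half- and quarter-resolution branches is what the later steps repeatedly exchange and merge.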
Step 2 is specifically implemented according to the following steps:
Step 2.1, extract features from the high-frequency, intermediate-frequency and low-frequency feature maps output by step 1, i.e., perform the following operations simultaneously:
convolve the high-frequency map with a 3×3 convolution kernel to obtain feature map B1_conv;
downsample the high-frequency map by a factor of 2 to obtain feature map B1_down;
downsample the high-frequency map by a factor of 4 to obtain feature map B1_down2;
upsample the intermediate-frequency map by a factor of 2 to obtain feature map B2_up;
convolve the intermediate-frequency map with a 3×3 convolution kernel to obtain feature map B2_conv;
downsample the intermediate-frequency map by a factor of 2 to obtain feature map B2_down;
upsample the low-frequency map by a factor of 4 to obtain feature map B3_up2;
upsample the low-frequency map by a factor of 2 to obtain feature map B3_up;
convolve the low-frequency map with a 3×3 convolution kernel to obtain feature map B3_conv;
Step 2.2, channel merging:
merge feature maps B1_conv, B2_up and B3_up2 along the channel dimension to obtain the high-frequency feature map B1 = [64, 64, (1-a_in-b_in)×64];
merge feature maps B1_down, B2_conv and B3_up along the channel dimension to obtain the intermediate-frequency feature map B2 = [32, 32, b_in×64];
merge feature maps B1_down2, B2_down and B3_conv along the channel dimension to obtain the low-frequency feature map B3 = [16, 16, a_in×64];
Step 2.3, perform the following operations on each feature map output by step 2.2:
first, calculate the mean μ2 of the pixels; then calculate the variance σ2 of the pixels; then normalize the pixels; finally, activate each pixel with the linear rectification function to obtain the output of step 2.
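The nine per-branch operations of step 2.1 together with the merging of step 2.2 form a three-branch exchange unit. Continuing the sketch above, below is a minimal version of one such unit, assuming bilinear upsampling, average-pool downsampling, and a 1×1 projection after each concatenation so the merged maps keep the patent's stated per-branch channel widths; all module names are illustrative.

```python
class ExchangeUnit(nn.Module):
    """Exchange features among the high/mid/low-frequency branches (steps 2.1-2.2 sketch)."""
    def __init__(self, c_high, c_mid, c_low):
        super().__init__()
        c_all = c_high + c_mid + c_low
        # same-resolution 3x3 convolution per source branch
        self.conv_h = nn.Conv2d(c_high, c_high, 3, padding=1)
        self.conv_m = nn.Conv2d(c_mid, c_mid, 3, padding=1)
        self.conv_l = nn.Conv2d(c_low, c_low, 3, padding=1)
        # 1x1 projections restoring the per-branch widths after merging (assumption)
        self.proj_h = nn.Conv2d(c_all, c_high, 1)
        self.proj_m = nn.Conv2d(c_all, c_mid, 1)
        self.proj_l = nn.Conv2d(c_all, c_low, 1)

    def forward(self, h, m, l):
        up = lambda t, s: F.interpolate(t, scale_factor=s, mode="bilinear", align_corners=False)
        down = lambda t, s: F.avg_pool2d(t, s)
        # step 2.2: merge the three contributions arriving at each resolution
        b1 = self.proj_h(torch.cat([self.conv_h(h), up(m, 2), up(l, 4)], dim=1))
        b2 = self.proj_m(torch.cat([down(h, 2), self.conv_m(m), up(l, 2)], dim=1))
        b3 = self.proj_l(torch.cat([down(h, 4), down(m, 2), self.conv_l(l)], dim=1))
        return b1, b2, b3
```

Note that in the patent the step 2 outputs halve in resolution relative to step 1 (B1 is 64×64 while A1 is 128×128); a strided variant of the same unit would account for that.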
In step 1.2 and step 2.3, the mean of the pixels is calculated as follows:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$$

where $x_i$ is the image input of each layer and $m$ is the number of pixels;

the variance of the pixels is calculated as follows:

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu\right)^2$$

where $x_i$ is the image input of each layer and $m$ is the number of pixels;

the normalization formula is as follows:

$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \varepsilon}}$$

where $\varepsilon$ is a very small number between 0.0001 and 0.01;

the linear rectification activation function is as follows:

$$f(x) = \max(0, x)$$
the step 3 is specifically implemented according to the following steps:
step 3.1, image is processedInputting the first residual block in the subnet one for processing
Step 3.1.1, image pairThe high-frequency characteristic diagram, the medium-frequency characteristic diagram and the low-frequency characteristic diagram in the (a) are subjected to characteristic extraction, namely the following operations are performed simultaneously:
performing convolution operation on the high-frequency image by adopting a convolution check of 3×3 to obtain a feature map C 1_conv
Performing 1/2 times downsampling operation on the high-frequency image to obtain a characteristic diagram C 1_down
Downsampling the high-frequency image by 1/4 times to obtain a feature image C 1_down2
Up-sampling the intermediate frequency image by 2 times to obtain a characteristic diagram C 2_up
Performing convolution operation on the intermediate frequency image by adopting a convolution check of 3 multiplied by 3 to obtain a characteristic diagram C 2_conv
Downsampling the intermediate frequency image by 1/2 times to obtain a characteristic diagram C 2_down
4 times up-sampling operation is carried out on the low-frequency image to obtain a characteristic diagram C 3_up2
2 times up-sampling operation is carried out on the low-frequency image to obtain a characteristic diagram C 3_up
Convolving the low-frequency image by using a convolution check of 3×3 to obtain a feature map C 3_conv
Step 3.1.2 channel merger
For characteristic diagram C 1_conv 、C 2_up 、C 3_up2 Combining the channel numbers to obtain a characteristic diagram C first_1_H
For characteristic diagram C 1_down 、C 2_conv 、C 3_up Combining the channel numbers to obtain a characteristic diagram C first_1_M
For characteristic diagram C 1_down2 、C 2_down 、C 3_conv Combining the channel numbers to obtain a characteristic diagram C first_1_L
Step 3.1.3, feature map C is obtained by the method of step 1.2 first_1_H Performing corresponding operation to obtain a characteristic diagram C first_2_H
The characteristic diagram C is subjected to the method of step 1.2 first_1_M Performing corresponding operation to obtain a characteristic diagram C first_2_M
The characteristic diagram C is subjected to the method of step 1.2 first_1_L Performing corresponding operation to obtain a characteristic diagram C first_2_L
Step 3.1.4, the method from step 3.1.1 to step 3.1.3 is adopted for the characteristic diagram C first_2_H Performing corresponding operation to obtain a characteristic diagram C first_3_H The method comprises the steps of carrying out a first treatment on the surface of the The characteristic diagram C is subjected to a method from step 3.1.1 to step 3.1.3 first_2_M Performing corresponding operation to obtain a characteristic diagram C first_3_M The method comprises the steps of carrying out a first treatment on the surface of the The characteristic diagram C is subjected to a method from step 3.1.1 to step 3.1.3 first_2_L Performing corresponding operation to obtain a characteristic diagram C first_3_L
Step 3.1.5, adopting the method from step 3.1.1 to step 3.1.3,for characteristic diagram C first_3_H 、C first_3_M 、C first_3_L Performing corresponding operation to obtain a characteristic diagram C first_4_H 、C first_4_M 、C first_4_L
Step 3.1.6, image is takenHigh-frequency characteristic diagram and characteristic diagram C first_4_H Adding to obtain a characteristic diagram C 1_first The method comprises the steps of carrying out a first treatment on the surface of the Image +.>Mid-frequency signature and signature C first_4_M Adding to obtain a characteristic diagram C 2_first The method comprises the steps of carrying out a first treatment on the surface of the Image +.>Low frequency characteristic diagram and characteristic diagram C first_4_L Adding to obtain a characteristic diagram C 3_first
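Steps 3.1.1 through 3.1.6 thus describe one residual block: three consecutive exchange units, each followed by the normalization and activation of step 1.2, with an identity skip added per frequency branch. A compact sketch reusing the illustrative ExchangeUnit defined after step 2:

```python
class MultiFreqResidualBlock(nn.Module):
    """One residual block of subnet one (step 3.1 sketch): three exchange units
    with per-map normalization and ReLU, plus an identity skip on each branch."""
    def __init__(self, c_high, c_mid, c_low):
        super().__init__()
        self.units = nn.ModuleList(ExchangeUnit(c_high, c_mid, c_low) for _ in range(3))

    @staticmethod
    def norm_act(t, eps=1e-3):
        mu, var = t.mean(), t.var(unbiased=False)
        return F.relu((t - mu) / torch.sqrt(var + eps))

    def forward(self, h, m, l):
        x_h, x_m, x_l = h, m, l
        for unit in self.units:  # steps 3.1.1-3.1.5
            x_h, x_m, x_l = unit(x_h, x_m, x_l)
            x_h, x_m, x_l = map(self.norm_act, (x_h, x_m, x_l))
        # step 3.1.6: residual addition per frequency branch
        return h + x_h, m + x_m, l + x_l
```

The skip addition requires the exchange units to preserve per-branch shapes, which the 1×1 projections assumed above guarantee.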
Step 3.2, feed the output of step 3.1 into the second residual block in subnet one for processing:
apply the operations of step 3.1 to feature maps C1_first, C2_first and C3_first to obtain feature maps C1_second, C2_second and C3_second;
Step 3.3, feed the output of step 3.2 into the third residual block in subnet one for processing:
apply the operations of step 3.1 to feature maps C1_second, C2_second and C3_second to obtain feature maps C1_third, C2_third and C3_third;
Step 3.4, feed the output of step 3.3 into the fourth residual block in subnet one for processing:
apply the operations of step 3.1 to feature maps C1_third, C2_third and C3_third to obtain feature maps C1_fourth, C2_fourth and C3_fourth;
Step 3.5, feed the output of step 3.4 into the conversion layer in subnet one for processing:
convolve feature maps C1_fourth, C2_fourth and C3_fourth with a 3×3 convolution kernel to obtain feature maps C1_fifth_1, C2_fifth_1 and C3_fifth_1, denoted C1; that is, C1 comprises feature maps C1_fifth_1, C2_fifth_1 and C3_fifth_1;
downsample feature maps C1_fourth, C2_fourth and C3_fourth by a factor of 2 to obtain feature maps C1_fifth_2, C2_fifth_2 and C3_fifth_2, denoted C2; that is, C2 comprises feature maps C1_fifth_2, C2_fifth_2 and C3_fifth_2.
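The conversion layer of step 3.5 thus forks the three-branch stream into a same-resolution output C1 and a half-resolution output C2. A sketch under the same assumptions as above:

```python
class SubnetOneTransition(nn.Module):
    """Step 3.5 sketch: emit C1 (3x3 convolution, same resolution) and
    C2 (downsampled by a factor of 2) from the three fourth-block branches."""
    def __init__(self, channels):  # channels = (c_high, c_mid, c_low)
        super().__init__()
        self.convs = nn.ModuleList(nn.Conv2d(c, c, 3, padding=1) for c in channels)

    def forward(self, h, m, l):
        c1 = tuple(conv(t) for conv, t in zip(self.convs, (h, m, l)))  # C*_fifth_1
        c2 = tuple(F.avg_pool2d(t, 2) for t in (h, m, l))              # C*_fifth_2
        return c1, c2
```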
Step 4 is specifically implemented according to the following steps:
step 4.1, processing the first residual block in the output/input subnet II of the step 3, and adopting the method of the step 3.1 to carry out C 1 Performing corresponding operation to obtain D 1_first
C using the method of step 3.1 2 Performing corresponding operation to obtain D 2_first
Step 4.2, processing the second residual block in the output/input sub-network II of the step 4.1
Pair D using the procedure of step 3.1 1_first Performing corresponding operation to obtain D 1_sec ond
Pair D using the procedure of step 3.1 2_first Performing corresponding operation to obtain D 2_sec ond
Step 4.3, processing the third residual block in the output/input subnet II of the step 4.2
Pair D using the procedure of step 3.1 1_sec ond Performing corresponding operation to obtain D 1_third
Pair D using the procedure of step 3.1 2_sec ond Performing corresponding operation to obtain D 2_third
Step 4.4, processing the third residual block in the second output/input subnet of the step 4.3
Pair D using the procedure of step 3.1 1_third Performing corresponding operation to obtain D 1_fourth
Pair D using the procedure of step 3.1 2_third Performing corresponding operation to obtain D 2_fourth
Step 4.5, processing the conversion layer in the output/input sub-network II of the step 4.4
Using a 3 x 3 convolution check D 1_fourth Performing convolution operation to obtain D 1_fifth_1 The method comprises the steps of carrying out a first treatment on the surface of the Pair D 2_fourth Performing up-sampling operation by 2 times to obtain D 2_fifth_1 The method comprises the steps of carrying out a first treatment on the surface of the Will D 1_fifth_1 And D 2_fifth_1 Adding to obtain D 1
Pair D 1_fourth Performing 1/2 times downsampling operation to obtain D 1_fifth_2 The method comprises the steps of carrying out a first treatment on the surface of the Using a 3 x 3 convolution check D 2_fourth Performing convolution operation to obtain D 2_fifth_2 The method comprises the steps of carrying out a first treatment on the surface of the Will D 1_fifth_2 And D 2_fifth_2 Adding to obtain D 2
Pair D 1_fourth Performing 1/4 times downsampling operation to obtain D 1_fifth_3 The method comprises the steps of carrying out a first treatment on the surface of the Pair D 2_fourth Performing 1/2 times downsampling operation to obtain D 2_fifth_3 The method comprises the steps of carrying out a first treatment on the surface of the Will D 1_fifth_3 And D 2_fifth_3 Adding to obtain D 3
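The conversion layer of step 4.5 fuses the two resolution streams into three output scales by bringing them to a common resolution and adding them; step 5.5 does the same with three input streams. Below is a sketch of the two-stream case, treating each stream as a single tensor for brevity (in the patent each stream itself carries three frequency components) and assuming both streams share a channel width so the additions are shape-compatible:

```python
class TwoStreamTransition(nn.Module):
    """Step 4.5 sketch: fuse D1_fourth and D2_fourth into D1, D2, D3."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, d1_fourth, d2_fourth):
        up = lambda t, s: F.interpolate(t, scale_factor=s, mode="bilinear", align_corners=False)
        d1 = self.conv1(d1_fourth) + up(d2_fourth, 2)                 # full resolution
        d2 = F.avg_pool2d(d1_fourth, 2) + self.conv2(d2_fourth)       # 1/2 resolution
        d3 = F.avg_pool2d(d1_fourth, 4) + F.avg_pool2d(d2_fourth, 2)  # 1/4 resolution
        return d1, d2, d3
```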
Step 5 is specifically implemented according to the following steps:
step 5.1, processing the first residual block in the output/input subnet III of the step 4
Pair D using the procedure of step 3.1 1 Performing corresponding operation to obtain E 1_first
Pair D using the procedure of step 3.1 2 Performing corresponding operation to obtain E 2_first
Pair D using the procedure of step 3.1 3 Performing corresponding operation to obtain E 3_first
Step 5.2, processing the second residual block in the output/input subnet III of the step 5.1
Pair E using the procedure of step 3.1 1_first Performing corresponding operationsObtaining E 1_sec ond
Pair E using the procedure of step 3.1 2_first Performing corresponding operation to obtain E 2_sec ond
Pair E using the procedure of step 3.1 3_first Performing corresponding operation to obtain E 3_sec ond
Step 5.3, processing the third residual block in the output/input subnet III of the step 5.2
Pair E using the procedure of step 3.1 1_sec ond Performing corresponding operation to obtain E 1_third
Pair E using the procedure of step 3.1 2_sec ond Performing corresponding operation to obtain E 2_third
Pair E using the procedure of step 3.1 3_sec ond Performing corresponding operation to obtain E 3_third
Step 5.4, processing the fourth residual block in the output/input subnet III of the step 5.3
Pair E using the procedure of step 3.1 1_third Performing corresponding operation to obtain E 1_fourth
Pair E using the procedure of step 3.1 2_third Performing corresponding operation to obtain E 2_fourth
Pair E using the procedure of step 3.1 3_third Performing corresponding operation to obtain E 3_fourth
Step 5.5, processing the conversion layer in the output/input subnet III of step 5.4
E was checked using a 3X 3 convolution 1_fourth Performing convolution operation to obtain E 1_fifth_1 The method comprises the steps of carrying out a first treatment on the surface of the Pair E 2_fourth Performing up-sampling operation by 2 times to obtain E 2_fifth_1 The method comprises the steps of carrying out a first treatment on the surface of the Pair E 3_fourth 4 times of up-sampling operation is carried out to obtain E 3_fifth_1 The method comprises the steps of carrying out a first treatment on the surface of the Will E 1_fifth_1 、E 2_fifth_1 、E 3_fifth_1 Adding to obtain E 1
Pair E 1_fourth Performing 1/2 times downsampling operation to obtain E 1_fifth_2 The method comprises the steps of carrying out a first treatment on the surface of the E was checked using a 3X 3 convolution 2_fourth Performing convolution operation to obtain E 2_fifth_2 The method comprises the steps of carrying out a first treatment on the surface of the Pair E 3_fourth Performing up-sampling operation by 2 times to obtain E 3_fifth_2 The method comprises the steps of carrying out a first treatment on the surface of the Will E 1_fifth_2 、E 2_fifth_2 、E 3_fifth_2 Adding to obtain E 2
Pair E 1_fourth Performing 1/4 times downsampling operation to obtain E 1_fifth_3 The method comprises the steps of carrying out a first treatment on the surface of the Pair E 2_fourth Performing 1/2 times downsampling operation to obtain E 2_fifth_3 The method comprises the steps of carrying out a first treatment on the surface of the E was checked using a 3X 3 convolution 3_fourth Performing convolution operation to obtain E 3_fifth_3 The method comprises the steps of carrying out a first treatment on the surface of the Will E 1_fifth_3 、E 2_fifth_3 、E 3_fifth_3 Adding to obtain E 3
The specific process of step 6 is as follows:
Step 6.1, convolve E1 with a 3×3 convolution kernel to obtain E1_conv; upsample E2 by a factor of 2 to obtain E2_up; upsample E3 by a factor of 4 to obtain E3_up2; add E1_conv, E2_up and E3_up2 to obtain P_pre;
Step 6.2, apply a matrix transformation to P_pre to obtain the feature map P_pre_trans = [64, 64, 64, AllJoint]; apply the softmax operation over the first three dimensions of P_pre_trans to obtain the feature map H;
Step 6.3, extract the joint coordinates from the feature map H; the operation is expressed as:

$$P_x = \sum_{w=1}^{W}\sum_{h=1}^{H}\sum_{d=1}^{D} w \cdot H(w,h,d),\qquad P_y = \sum_{w=1}^{W}\sum_{h=1}^{H}\sum_{d=1}^{D} h \cdot H(w,h,d),\qquad P_z = \sum_{w=1}^{W}\sum_{h=1}^{H}\sum_{d=1}^{D} d \cdot H(w,h,d)$$

where W, H, D are respectively the width, height and depth (number) of the feature map;
Step 6.4, splice P_x, P_y and P_z to obtain the matrix P, i.e., the estimated pose.
The softmax is expressed as:

$$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

where $x_i$ is the pixel value of the i-th pixel.
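Step 6 is in effect a soft-argmax readout: the softmax turns each joint's 64×64×64 volume into a probability distribution, and the expected index along each axis gives that joint's coordinate. A minimal sketch, continuing the imports above and assuming P_pre_trans is laid out as [W, H, D, AllJoint]; the expectation formula reconstructed above is itself an assumption where the patent's own formula image is unavailable:

```python
def soft_argmax_3d(p_pre_trans):
    """Steps 6.2-6.4 sketch: p_pre_trans has shape [W, H, D, AllJoint].

    Returns a [AllJoint, 3] matrix P of (x, y, z) joint coordinates.
    """
    w_dim, h_dim, d_dim, num_joints = p_pre_trans.shape
    vol = p_pre_trans.permute(3, 0, 1, 2).reshape(num_joints, -1)
    probs = torch.softmax(vol, dim=1).reshape(num_joints, w_dim, h_dim, d_dim)  # feature map H
    ws = torch.arange(w_dim, dtype=probs.dtype)
    hs = torch.arange(h_dim, dtype=probs.dtype)
    ds = torch.arange(d_dim, dtype=probs.dtype)
    # expected index along each axis = soft-argmax coordinate
    p_x = (probs.sum(dim=(2, 3)) * ws).sum(dim=1)
    p_y = (probs.sum(dim=(1, 3)) * hs).sum(dim=1)
    p_z = (probs.sum(dim=(1, 2)) * ds).sum(dim=1)
    return torch.stack([p_x, p_y, p_z], dim=1)  # matrix P
```

Because the expectation is differentiable, the joint coordinates can be trained directly with the coordinate losses described later in the document.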
The beneficial effects of the invention are as follows:
(1) Compared with the traditional convolution, the improved convolution can extract different features from the samples with fewer parameters, making the network lightweight;
(2) Through the self-built neural network, the invention extracts large-scale details and small-scale global features in a targeted manner, and adopts a shallow-to-deep-to-shallow process to achieve more accurate 3D pose estimation;
(3) The invention uses the idea of residual networks, reducing the number of algorithm parameters and avoiding gradient explosion and vanishing gradients;
(4) By preprocessing the dataset, optimizing the loss function, and training the network with this loss function, the convolutional neural network algorithm of the invention is better suited to estimating human poses in reality;
(5) The relay supervision adopted in the loss function not only makes the training process of the network observable but also improves the convergence rate of the network.
Drawings
FIG. 1 is a structural block diagram of the convolutional neural network algorithm of the 2D image-oriented human body 3D pose estimation method of the present invention;
FIG. 2 is a convolution schematic diagram of step 1 of the 2D image-oriented human body 3D pose estimation method of the present invention;
FIG. 3 is a schematic diagram of step 3 of the 2D image-oriented human body 3D pose estimation method of the present invention;
FIG. 4 is a schematic diagram of step 4 of the 2D image-oriented human body 3D pose estimation method of the present invention;
FIG. 5 is a schematic diagram of step 5 of the 2D image-oriented human body 3D pose estimation method of the present invention;
FIG. 6 is a human torso mobility diagram in the 2D image-oriented human body 3D pose estimation method of the present invention;
FIG. 7 is a schematic view of the joint points in the 2D image-oriented human body 3D pose estimation method of the present invention;
FIG. 8 is a conventional 3D pose diagram estimated based on the Hourglass framework;
FIG. 9 is a front view of a conventional 3D pose estimated based on the Hourglass framework;
FIG. 10 is a right side view of a conventional 3D pose estimated based on the Hourglass framework;
FIG. 11 is a left side view of a conventional 3D pose estimated based on the Hourglass framework;
FIG. 12 is a front view of a 3D pose estimated by the 2D image-oriented human body 3D pose estimation method of the present invention;
FIG. 13 is a right side view of a 3D pose estimated by the 2D image-oriented human body 3D pose estimation method of the present invention;
FIG. 14 is a left side view of a 3D pose estimated by the 2D image-oriented human body 3D pose estimation method of the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and specific embodiments.
As shown in fig. 1, the 2D image-oriented human body 3D pose estimation method is implemented according to the following steps:
Step 1, perform convolution, normalization and activation operations on the 2D image in sequence, and output the resulting image.
As shown in fig. 2, step 1 is specifically implemented according to the following steps:
Step 1.1, perform the following operations on the 2D image simultaneously:
(1) Convolve with a 3×3 convolution kernel, the number of output channels being (1-a_in-b_in)×64, to obtain the high-frequency feature map A1 = [128, 128, (1-a_in-b_in)×64], where a_in is the low-frequency channel-number coefficient and b_in is the intermediate-frequency channel-number coefficient;
(2) Downsample by a factor of 2, the number of channels being b_in×64, to obtain the intermediate-frequency feature map A2 = [64, 64, b_in×64];
(3) Downsample by a factor of 4, the number of channels being a_in×64, to obtain the low-frequency feature map A3 = [32, 32, a_in×64];
Step 1.2, perform the following operations on each feature map output by step 1.1:
first, calculate the mean μ1 of the pixels; then calculate the variance σ1 of the pixels; then normalize the pixels; finally, activate each pixel with the linear rectification function to obtain the output of step 1.
Step 2, perform convolution, normalization and activation operations in sequence on the image output by step 1, and output the resulting image.
Step 2 is specifically implemented according to the following steps:
Step 2.1, extract features from the high-frequency, intermediate-frequency and low-frequency feature maps output by step 1, i.e., perform the following operations simultaneously:
convolve the high-frequency map with a 3×3 convolution kernel to obtain feature map B1_conv;
downsample the high-frequency map by a factor of 2 to obtain feature map B1_down;
downsample the high-frequency map by a factor of 4 to obtain feature map B1_down2;
upsample the intermediate-frequency map by a factor of 2 to obtain feature map B2_up;
convolve the intermediate-frequency map with a 3×3 convolution kernel to obtain feature map B2_conv;
downsample the intermediate-frequency map by a factor of 2 to obtain feature map B2_down;
upsample the low-frequency map by a factor of 4 to obtain feature map B3_up2;
upsample the low-frequency map by a factor of 2 to obtain feature map B3_up;
convolve the low-frequency map with a 3×3 convolution kernel to obtain feature map B3_conv;
Step 2.2, channel merging:
merge feature maps B1_conv, B2_up and B3_up2 along the channel dimension to obtain the high-frequency feature map B1 = [64, 64, (1-a_in-b_in)×64];
merge feature maps B1_down, B2_conv and B3_up along the channel dimension to obtain the intermediate-frequency feature map B2 = [32, 32, b_in×64];
merge feature maps B1_down2, B2_down and B3_conv along the channel dimension to obtain the low-frequency feature map B3 = [16, 16, a_in×64];
Step 2.3, perform the following operations on each feature map output by step 2.2:
first, calculate the mean μ2 of the pixels; then calculate the variance σ2 of the pixels; then normalize the pixels; finally, activate each pixel with the linear rectification function to obtain the output of step 2.
Step 3, feed the image output by step 2 into subnet one for processing, and output feature maps C1 and C2.
As shown in fig. 3, step 3 is specifically implemented according to the following steps:
Step 3.1, feed the image output by step 2 into the first residual block in subnet one for processing.
Step 3.1.1, extract features from the high-frequency, intermediate-frequency and low-frequency feature maps of the input, i.e., perform the following operations simultaneously:
convolve the high-frequency map with a 3×3 convolution kernel to obtain feature map C1_conv;
downsample the high-frequency map by a factor of 2 to obtain feature map C1_down;
downsample the high-frequency map by a factor of 4 to obtain feature map C1_down2;
upsample the intermediate-frequency map by a factor of 2 to obtain feature map C2_up;
convolve the intermediate-frequency map with a 3×3 convolution kernel to obtain feature map C2_conv;
downsample the intermediate-frequency map by a factor of 2 to obtain feature map C2_down;
upsample the low-frequency map by a factor of 4 to obtain feature map C3_up2;
upsample the low-frequency map by a factor of 2 to obtain feature map C3_up;
convolve the low-frequency map with a 3×3 convolution kernel to obtain feature map C3_conv;
Step 3.1.2, channel merging:
merge feature maps C1_conv, C2_up and C3_up2 along the channel dimension to obtain feature map C_first_1_H;
merge feature maps C1_down, C2_conv and C3_up along the channel dimension to obtain feature map C_first_1_M;
merge feature maps C1_down2, C2_down and C3_conv along the channel dimension to obtain feature map C_first_1_L;
Step 3.1.3, apply the operations of step 1.2 to feature map C_first_1_H to obtain feature map C_first_2_H; apply the operations of step 1.2 to feature map C_first_1_M to obtain feature map C_first_2_M; apply the operations of step 1.2 to feature map C_first_1_L to obtain feature map C_first_2_L;
Step 3.1.4, apply the operations of steps 3.1.1 to 3.1.3 to feature map C_first_2_H to obtain feature map C_first_3_H; apply the operations of steps 3.1.1 to 3.1.3 to feature map C_first_2_M to obtain feature map C_first_3_M; apply the operations of steps 3.1.1 to 3.1.3 to feature map C_first_2_L to obtain feature map C_first_3_L;
Step 3.1.5, apply the operations of steps 3.1.1 to 3.1.3 to feature maps C_first_3_H, C_first_3_M and C_first_3_L to obtain feature maps C_first_4_H, C_first_4_M and C_first_4_L;
Step 3.1.6, add the high-frequency feature map of the input to feature map C_first_4_H to obtain feature map C1_first; add the intermediate-frequency feature map of the input to feature map C_first_4_M to obtain feature map C2_first; add the low-frequency feature map of the input to feature map C_first_4_L to obtain feature map C3_first.
Step 3.2, feed the output of step 3.1 into the second residual block in subnet one for processing:
apply the operations of step 3.1 to feature maps C1_first, C2_first and C3_first to obtain feature maps C1_second, C2_second and C3_second;
Step 3.3, feed the output of step 3.2 into the third residual block in subnet one for processing:
apply the operations of step 3.1 to feature maps C1_second, C2_second and C3_second to obtain feature maps C1_third, C2_third and C3_third;
Step 3.4, feed the output of step 3.3 into the fourth residual block in subnet one for processing:
apply the operations of step 3.1 to feature maps C1_third, C2_third and C3_third to obtain feature maps C1_fourth, C2_fourth and C3_fourth;
Step 3.5, feed the output of step 3.4 into the conversion layer in subnet one for processing:
convolve feature maps C1_fourth, C2_fourth and C3_fourth with a 3×3 convolution kernel to obtain feature maps C1_fifth_1, C2_fifth_1 and C3_fifth_1, denoted C1; that is, C1 comprises feature maps C1_fifth_1, C2_fifth_1 and C3_fifth_1;
downsample feature maps C1_fourth, C2_fourth and C3_fourth by a factor of 2 to obtain feature maps C1_fifth_2, C2_fifth_2 and C3_fifth_2, denoted C2; that is, C2 comprises feature maps C1_fifth_2, C2_fifth_2 and C3_fifth_2.
Step 4, feed feature maps C1 and C2 into subnet two for processing, and output feature maps D1, D2 and D3.
As shown in fig. 4, step 4 is specifically implemented according to the following steps:
Step 4.1, feed the output of step 3 into the first residual block in subnet two for processing:
apply the operations of step 3.1 to C1 to obtain D1_first;
apply the operations of step 3.1 to C2 to obtain D2_first;
Step 4.2, feed the output of step 4.1 into the second residual block in subnet two for processing:
apply the operations of step 3.1 to D1_first to obtain D1_second;
apply the operations of step 3.1 to D2_first to obtain D2_second;
Step 4.3, feed the output of step 4.2 into the third residual block in subnet two for processing:
apply the operations of step 3.1 to D1_second to obtain D1_third;
apply the operations of step 3.1 to D2_second to obtain D2_third;
Step 4.4, feed the output of step 4.3 into the fourth residual block in subnet two for processing:
apply the operations of step 3.1 to D1_third to obtain D1_fourth;
apply the operations of step 3.1 to D2_third to obtain D2_fourth;
Step 4.5, feed the output of step 4.4 into the conversion layer in subnet two for processing:
convolve D1_fourth with a 3×3 convolution kernel to obtain D1_fifth_1; upsample D2_fourth by a factor of 2 to obtain D2_fifth_1; add D1_fifth_1 and D2_fifth_1 to obtain D1;
downsample D1_fourth by a factor of 2 to obtain D1_fifth_2; convolve D2_fourth with a 3×3 convolution kernel to obtain D2_fifth_2; add D1_fifth_2 and D2_fifth_2 to obtain D2;
downsample D1_fourth by a factor of 4 to obtain D1_fifth_3; downsample D2_fourth by a factor of 2 to obtain D2_fifth_3; add D1_fifth_3 and D2_fifth_3 to obtain D3.
Step 5, feed feature maps D1, D2 and D3 into subnet three for processing, and output feature maps E1, E2 and E3.
As shown in fig. 5, step 5 is specifically implemented according to the following steps:
Step 5.1, feed the output of step 4 into the first residual block in subnet three for processing:
apply the operations of step 3.1 to D1 to obtain E1_first;
apply the operations of step 3.1 to D2 to obtain E2_first;
apply the operations of step 3.1 to D3 to obtain E3_first;
Step 5.2, feed the output of step 5.1 into the second residual block in subnet three for processing:
apply the operations of step 3.1 to E1_first to obtain E1_second;
apply the operations of step 3.1 to E2_first to obtain E2_second;
apply the operations of step 3.1 to E3_first to obtain E3_second;
Step 5.3, feed the output of step 5.2 into the third residual block in subnet three for processing:
apply the operations of step 3.1 to E1_second to obtain E1_third;
apply the operations of step 3.1 to E2_second to obtain E2_third;
apply the operations of step 3.1 to E3_second to obtain E3_third;
Step 5.4, feed the output of step 5.3 into the fourth residual block in subnet three for processing:
apply the operations of step 3.1 to E1_third to obtain E1_fourth;
apply the operations of step 3.1 to E2_third to obtain E2_fourth;
apply the operations of step 3.1 to E3_third to obtain E3_fourth;
Step 5.5, feed the output of step 5.4 into the conversion layer in subnet three for processing:
convolve E1_fourth with a 3×3 convolution kernel to obtain E1_fifth_1; upsample E2_fourth by a factor of 2 to obtain E2_fifth_1; upsample E3_fourth by a factor of 4 to obtain E3_fifth_1; add E1_fifth_1, E2_fifth_1 and E3_fifth_1 to obtain E1;
downsample E1_fourth by a factor of 2 to obtain E1_fifth_2; convolve E2_fourth with a 3×3 convolution kernel to obtain E2_fifth_2; upsample E3_fourth by a factor of 2 to obtain E3_fifth_2; add E1_fifth_2, E2_fifth_2 and E3_fifth_2 to obtain E2;
downsample E1_fourth by a factor of 4 to obtain E1_fifth_3; downsample E2_fourth by a factor of 2 to obtain E2_fifth_3; convolve E3_fourth with a 3×3 convolution kernel to obtain E3_fifth_3; add E1_fifth_3, E2_fifth_3 and E3_fifth_3 to obtain E3.
Step 6, process feature maps E1, E2 and E3 to obtain the matrix P, i.e., the estimated pose.
Step 6.1, convolve E1 with a 3×3 convolution kernel to obtain E1_conv; upsample E2 by a factor of 2 to obtain E2_up; upsample E3 by a factor of 4 to obtain E3_up2; add E1_conv, E2_up and E3_up2 to obtain P_pre;
Step 6.2, apply a matrix transformation to P_pre to obtain the feature map P_pre_trans = [64, 64, 64, AllJoint]; apply the softmax operation over the first three dimensions of P_pre_trans to obtain the feature map H;
Step 6.3, extract the joint coordinates from the feature map H; the operation is expressed as:

$$P_x = \sum_{w=1}^{W}\sum_{h=1}^{H}\sum_{d=1}^{D} w \cdot H(w,h,d),\qquad P_y = \sum_{w=1}^{W}\sum_{h=1}^{H}\sum_{d=1}^{D} h \cdot H(w,h,d),\qquad P_z = \sum_{w=1}^{W}\sum_{h=1}^{H}\sum_{d=1}^{D} d \cdot H(w,h,d)$$

where W, H, D are respectively the width, height and depth (number) of the feature map;
Step 6.4, splice P_x, P_y and P_z to obtain the matrix P, i.e., the estimated pose;
where the softmax is expressed as:

$$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

where $x_i$ is the pixel value of the i-th pixel.
In step 1.2 and step 2.3, the mean of the pixels is calculated as follows:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$$

where $x_i$ is the image input of each layer and $m$ is the number of pixels;

the variance of the pixels is calculated as follows:

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu\right)^2$$

where $x_i$ is the image input of each layer and $m$ is the number of pixels;

the normalization formula is as follows:

$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \varepsilon}}$$

where $\varepsilon$ is a very small number between 0.0001 and 0.01;

the linear rectification activation function is as follows:

$$f(x) = \max(0, x)$$
1. Calculating the loss using the optimized loss function
The loss terms comprise the relay supervision loss, the symmetry loss, the motion loss and the depth loss.
(1) Relay supervision loss
The relay loss is predicted for each joint point of the human body; the predicted coordinates of the i-th joint point are $J_i = (\hat{x}, \hat{y})$, and $G_i = (x, y)$ denotes the true joint point coordinates; the relay supervision loss is calculated with the following formula:

$$Loss_{middle} = \sum_{i=1}^{AllJoint} \left\| J_i - G_i \right\|_2^2$$

where AllJoint is the number of predicted human joint points;
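A sketch of this term, assuming (as is typical for relay/intermediate supervision) a squared L2 penalty between predicted and ground-truth joint coordinates summed over all joints; the patent's own formula image is not available, so the exact form may differ:

```python
import torch

def relay_supervision_loss(pred_joints, gt_joints):
    """pred_joints, gt_joints: [AllJoint, 2] tensors of (x, y) joint coordinates."""
    return ((pred_joints - gt_joints) ** 2).sum()
```

In relay supervision this penalty is typically applied to each intermediate stage's prediction as well as to the final output, which is what makes the training process observable stage by stage.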
(2) Symmetry loss
As shown in fig. 7, the lower arms, upper arms, shoulders, thighs and lower legs of the human body are symmetric; for example, the lengths of arm segments 7-8 and 12-13 are equal. The symmetry loss Loss_symmetry is calculated with the following formula:

$$Loss_{symmetry} = \sum_{(a,b)}^{all\_s} \Big| \left\| L_a \right\| - \left\| L_b \right\| \Big|$$

where all_s is the number of symmetric limb pairs, and $L_a$, $L_b$ denote a pair of symmetric limb segments between predicted joint points (e.g., 1-2 in fig. 8, left foot to left knee, and 6-5 in fig. 8, right foot to right knee).
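A sketch of the symmetry term under the same caveat (the formula image is unavailable): each entry pairs a limb with its mirror limb and penalizes the difference of their lengths. The joint indices below follow the numbering of fig. 7/fig. 8 only by way of example:

```python
def symmetry_loss(joints, symmetric_pairs):
    """joints: [AllJoint, 3] predicted coordinates;
    symmetric_pairs: limb index pairs, e.g. (((7, 8), (12, 13)),) for the two lower arms."""
    loss = joints.new_zeros(())
    for (i, j), (k, l) in symmetric_pairs:
        len_a = torch.linalg.norm(joints[i] - joints[j])  # length of one limb
        len_b = torch.linalg.norm(joints[k] - joints[l])  # length of the mirror limb
        loss = loss + torch.abs(len_a - len_b)
    return loss
```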
(3) Motion loss
As shown in fig. 6, the range of motion of each human joint is defined by statistics over the mainstream datasets. With the human body in a normal standing posture, the left-shoulder direction is taken as the positive x-axis direction, the forward-facing direction as the positive y-axis direction, and the direction from the feet to the head as the positive z-axis direction; the range of motion of each joint is then obtained and represented in spherical coordinates: $\gamma_1 \in (\gamma_{min}, \gamma_{max})$ means the radial length of joint 1 lies in $(\gamma_{min}, \gamma_{max})$; $\varphi_1 \in (\varphi_{min}, \varphi_{max})$ means the horizontal angle of joint 1 lies in $(\varphi_{min}, \varphi_{max})$; $\theta_1 \in (\theta_{min}, \theta_{max})$ means the pitch angle of joint 1 lies in $(\theta_{min}, \theta_{max})$; Sphere_coord is the set of such ranges per joint (e.g., lower legs 1 and 4, upper legs 2 and 3, etc. in fig. 9).
Judge whether each predicted joint lies within its range of motion; if so, the loss is 0; if not, a penalty λ is given. The motion loss is calculated with the following formula:

$$Loss_{Sph\_c} = \sum_{i=1}^{AllJoint} \lambda \cdot \mathbf{1}\big[(\gamma_i, \varphi_i, \theta_i) \notin Sphere\_coord_i\big]$$
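A sketch of the range check, assuming the predicted joints have already been converted to (γ, φ, θ) spherical coordinates relative to their parent joints and that the penalty is a fixed λ per out-of-range joint, as the text describes:

```python
def motion_loss(joints_sph, ranges, penalty):
    """joints_sph: [AllJoint, 3] spherical coordinates (gamma, phi, theta) per joint;
    ranges: [AllJoint, 3, 2] per-coordinate (min, max) bounds; penalty: the scalar lambda."""
    lo, hi = ranges[..., 0], ranges[..., 1]
    outside = ((joints_sph < lo) | (joints_sph > hi)).any(dim=1)  # joint outside its range?
    return penalty * outside.float().sum()
```

The hard indicator is not differentiable, so a training implementation would likely substitute a surrogate such as a hinge on the violation amount.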
(4) Depth loss
The predicted coordinates of the i-th joint point of the human body are $J^{3d}_i = (\hat{x}, \hat{y}, \hat{z})$, and $G^{3d}_i = (x, y, z)$ denotes the true joint point coordinates; the depth loss is calculated with the following formula:

$$Loss_{3D} = \sum_{i=1}^{AllJoint} \left\| J^{3d}_i - G^{3d}_i \right\|_2^2$$

The total loss is:

$$Loss_{total} = Loss_{middle} + Loss_{symmetry} + Loss_{Sph\_c} + Loss_{3D}$$
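Putting the four terms together; the depth term below mirrors the reconstruction above (a squared L2 over the full 3D coordinates), which is an assumption where the patent's formula image is missing:

```python
def total_loss(pred_2d, gt_2d, pred_3d, gt_3d, joints_sph, ranges,
               symmetric_pairs, penalty):
    loss_middle = relay_supervision_loss(pred_2d, gt_2d)     # relay supervision loss
    loss_symmetry = symmetry_loss(pred_3d, symmetric_pairs)  # symmetry loss
    loss_sph_c = motion_loss(joints_sph, ranges, penalty)    # motion loss
    loss_3d = ((pred_3d - gt_3d) ** 2).sum()                 # depth loss (assumed L2)
    return loss_middle + loss_symmetry + loss_sph_c + loss_3d
```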
2. Comparison of the method of the invention with the traditional Hourglass-based network
The predicted P = [m, AllJoint, 3] is visualized. Fig. 8 shows the joint points of a human body; figs. 9 to 11 are conventional 3D poses estimated based on the Hourglass network, in which it can be seen that the knees bend forward, which does not conform to the kinematics of the human body; figs. 12 to 14 are 3D poses estimated by the method of the invention, and from the different viewing angles it can be seen that the predicted results conform to the kinematic relations of the human body.
3. Memory consumption and floating-point test of the convolutional neural network algorithm of the invention
When a_in and b_in take different values, the memory consumed and the floating-point operations per second are shown in the following table, where a_in and b_in are respectively the low-frequency and intermediate-frequency channel coefficients.
As can be seen from the table, when the parameters are set to 0 the consumed memory and the floating-point operations per second are unchanged, but as the set values increase, the consumed memory and the floating-point operations per second begin to decrease, indicating that the improved convolution scheme takes effect.

Claims (8)

1. A 2D image-oriented human body 3D pose estimation method, characterized by comprising the following steps:
Step 1, perform convolution, normalization and activation operations on the 2D image in sequence, and output the resulting image;
Step 2, perform convolution, normalization and activation operations in sequence on the image output by step 1, and output the resulting image;
Step 3, feed the image output by step 2 into subnet one for processing, and output feature maps C1 and C2;
Step 4, feed feature maps C1 and C2 into subnet two for processing, and output feature maps D1, D2 and D3;
Step 5, feed feature maps D1, D2 and D3 into subnet three for processing, and output feature maps E1, E2 and E3;
Step 6, process feature maps E1, E2 and E3 to obtain the matrix P, i.e., the estimated pose;
wherein step 3 is specifically implemented according to the following steps:
Step 3.1, feed the image output by step 2 into the first residual block in subnet one for processing.
Step 3.1.1, extract features from the high-frequency, intermediate-frequency and low-frequency feature maps of the input, i.e., perform the following operations simultaneously:
convolve the high-frequency map with a 3×3 convolution kernel to obtain feature map C1_conv;
downsample the high-frequency map by a factor of 2 to obtain feature map C1_down;
downsample the high-frequency map by a factor of 4 to obtain feature map C1_down2;
upsample the intermediate-frequency map by a factor of 2 to obtain feature map C2_up;
convolve the intermediate-frequency map with a 3×3 convolution kernel to obtain feature map C2_conv;
downsample the intermediate-frequency map by a factor of 2 to obtain feature map C2_down;
upsample the low-frequency map by a factor of 4 to obtain feature map C3_up2;
upsample the low-frequency map by a factor of 2 to obtain feature map C3_up;
convolve the low-frequency map with a 3×3 convolution kernel to obtain feature map C3_conv;
Step 3.1.2, channel merging:
merge feature maps C1_conv, C2_up and C3_up2 along the channel dimension to obtain feature map C_first_1_H;
merge feature maps C1_down, C2_conv and C3_up along the channel dimension to obtain feature map C_first_1_M;
merge feature maps C1_down2, C2_down and C3_conv along the channel dimension to obtain feature map C_first_1_L;
Step 3.1.3, apply the operations of step 1.2 to feature map C_first_1_H to obtain feature map C_first_2_H; apply the operations of step 1.2 to feature map C_first_1_M to obtain feature map C_first_2_M; apply the operations of step 1.2 to feature map C_first_1_L to obtain feature map C_first_2_L;
Step 3.1.4, apply the operations of steps 3.1.1 to 3.1.3 to feature map C_first_2_H to obtain feature map C_first_3_H; apply the operations of steps 3.1.1 to 3.1.3 to feature map C_first_2_M to obtain feature map C_first_3_M; apply the operations of steps 3.1.1 to 3.1.3 to feature map C_first_2_L to obtain feature map C_first_3_L;
Step 3.1.5, apply the operations of steps 3.1.1 to 3.1.3 to feature maps C_first_3_H, C_first_3_M and C_first_3_L to obtain feature maps C_first_4_H, C_first_4_M and C_first_4_L;
Step 3.1.6, add the high-frequency feature map of the input to feature map C_first_4_H to obtain feature map C1_first; add the intermediate-frequency feature map of the input to feature map C_first_4_M to obtain feature map C2_first; add the low-frequency feature map of the input to feature map C_first_4_L to obtain feature map C3_first;
Step 3.2, feed the output of step 3.1 into the second residual block in subnet one for processing:
apply the operations of step 3.1 to feature maps C1_first, C2_first and C3_first to obtain feature maps C1_second, C2_second and C3_second;
Step 3.3, feed the output of step 3.2 into the third residual block in subnet one for processing:
apply the operations of step 3.1 to feature maps C1_second, C2_second and C3_second to obtain feature maps C1_third, C2_third and C3_third;
Step 3.4, feed the output of step 3.3 into the fourth residual block in subnet one for processing:
apply the operations of step 3.1 to feature maps C1_third, C2_third and C3_third to obtain feature maps C1_fourth, C2_fourth and C3_fourth;
Step 3.5, feed the output of step 3.4 into the conversion layer in subnet one for processing:
convolve feature maps C1_fourth, C2_fourth and C3_fourth with a 3×3 convolution kernel to obtain feature maps C1_fifth_1, C2_fifth_1 and C3_fifth_1, denoted C1; that is, C1 comprises feature maps C1_fifth_1, C2_fifth_1 and C3_fifth_1;
downsample feature maps C1_fourth, C2_fourth and C3_fourth by a factor of 2 to obtain feature maps C1_fifth_2, C2_fifth_2 and C3_fifth_2, denoted C2; that is, C2 comprises feature maps C1_fifth_2, C2_fifth_2 and C3_fifth_2.
2. The 2D-image-oriented human body 3D pose estimation method according to claim 1, wherein step 1 is specifically implemented according to the following steps:
Step 1.1, the following operations are performed on the 2D image in parallel (an illustrative sketch follows this list):
(1) A convolution operation with a 3×3 convolution kernel and (1−a_in−b_in)×64 output channels, yielding the high-frequency feature map A_1 = [128, 128, (1−a_in−b_in)×64], where a_in is the low-frequency channel-number coefficient and b_in is the mid-frequency channel-number coefficient;
(2) A 1/2 downsampling with b_in×64 output channels, yielding the mid-frequency feature map A_2 = [64, 64, b_in×64];
(3) A 1/4 downsampling with a_in×64 output channels, yielding the low-frequency feature map A_3 = [32, 32, a_in×64].
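A minimal PyTorch sketch of this three-branch split is given below. The module name FrequencySplit, the 128×128 input size, the use of average pooling for downsampling, the 1×1 projections that set the channel counts of the downsampled branches, and the example coefficient values are all assumptions of this sketch, not specified by the claim.

```python
import torch.nn as nn
import torch.nn.functional as F

class FrequencySplit(nn.Module):
    """Sketch of step 1.1: split a 2D image into high/mid/low-frequency maps."""
    def __init__(self, in_ch=3, base=64, a_in=0.25, b_in=0.25):
        super().__init__()
        high_ch = int((1 - a_in - b_in) * base)  # (1 - a_in - b_in) x 64
        mid_ch = int(b_in * base)                # b_in x 64
        low_ch = int(a_in * base)                # a_in x 64
        self.conv_high = nn.Conv2d(in_ch, high_ch, 3, padding=1)
        self.proj_mid = nn.Conv2d(in_ch, mid_ch, 1)
        self.proj_low = nn.Conv2d(in_ch, low_ch, 1)

    def forward(self, x):                        # x: [N, 3, 128, 128]
        a1 = self.conv_high(x)                   # A_1: [N, high_ch, 128, 128]
        a2 = self.proj_mid(F.avg_pool2d(x, 2))   # A_2: [N, mid_ch, 64, 64]
        a3 = self.proj_low(F.avg_pool2d(x, 4))   # A_3: [N, low_ch, 32, 32]
        return a1, a2, a3
```

Splitting the channels across resolutions in this way keeps fine detail at full resolution while the smoother components are processed at lower cost.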
Step 1.2, performing the following operations on each image output in step 1.1:
first, the average μ of the image pixels is calculated 1 The method comprises the steps of carrying out a first treatment on the surface of the Then, the variance sigma of the image pixels is calculated 1 The method comprises the steps of carrying out a first treatment on the surface of the Then carrying out normalization processing on the image pixels to obtainFinally, each pixel is activated by a linear rectification function to obtain +.>
3. The 2D-image-oriented human body 3D pose estimation method according to claim 2, wherein step 2 is specifically implemented according to the following steps:
Step 2.1, feature extraction is performed on the high-frequency, mid-frequency, and low-frequency feature maps output by step 1.2, i.e., the following operations are performed in parallel (an illustrative sketch of steps 2.1 and 2.2 follows step 2.2):
The high-frequency image is convolved with a 3×3 convolution kernel to obtain the feature map B_1_conv;
The high-frequency image is downsampled by a factor of 1/2 to obtain the feature map B_1_down;
The high-frequency image is downsampled by a factor of 1/4 to obtain the feature map B_1_down2;
The mid-frequency image is upsampled by a factor of 2 to obtain the feature map B_2_up;
The mid-frequency image is convolved with a 3×3 convolution kernel to obtain the feature map B_2_conv;
The mid-frequency image is downsampled by a factor of 1/2 to obtain the feature map B_2_down;
The low-frequency image is upsampled by a factor of 4 to obtain the feature map B_3_up2;
The low-frequency image is upsampled by a factor of 2 to obtain the feature map B_3_up;
The low-frequency image is convolved with a 3×3 convolution kernel to obtain the feature map B_3_conv.
Step 2.2, channel merging:
The feature maps B_1_conv, B_2_up, and B_3_up2 are merged along the channel dimension to obtain the high-frequency feature map B_1 = [64, 64, (1−a_in−b_in)×64];
The feature maps B_1_down, B_2_conv, and B_3_up are merged along the channel dimension to obtain the mid-frequency feature map B_2 = [32, 32, b_in×64];
The feature maps B_1_down2, B_2_down, and B_3_conv are merged along the channel dimension to obtain the low-frequency feature map B_3 = [16, 16, a_in×64].
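A minimal PyTorch sketch of the steps 2.1–2.2 exchange-and-merge pattern follows. The function name exchange, average pooling for downsampling, nearest-neighbour interpolation for upsampling, and the assumption that the per-branch channel counts are chosen so that each concatenation matches the totals stated in step 2.2 are all assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def exchange(high, mid, low, conv_h, conv_m, conv_l):
    """Sketch of steps 2.1-2.2; conv_h/m/l are 3x3 convolutions per branch."""
    # Step 2.1: bring every branch to every resolution.
    b1_conv = conv_h(high)                        # B_1_conv
    b1_down = F.avg_pool2d(high, 2)               # B_1_down
    b1_down2 = F.avg_pool2d(high, 4)              # B_1_down2
    b2_up = F.interpolate(mid, scale_factor=2)    # B_2_up
    b2_conv = conv_m(mid)                         # B_2_conv
    b2_down = F.avg_pool2d(mid, 2)                # B_2_down
    b3_up2 = F.interpolate(low, scale_factor=4)   # B_3_up2
    b3_up = F.interpolate(low, scale_factor=2)    # B_3_up
    b3_conv = conv_l(low)                         # B_3_conv
    # Step 2.2: merge along the channel dimension at each resolution.
    b1 = torch.cat([b1_conv, b2_up, b3_up2], dim=1)      # B_1 (high)
    b2 = torch.cat([b1_down, b2_conv, b3_up], dim=1)     # B_2 (mid)
    b3 = torch.cat([b1_down2, b2_down, b3_conv], dim=1)  # B_3 (low)
    return b1, b2, b3
```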
Step 2.3, the following operations are performed on each image output in step 2.2:
First, the mean μ_2 of the image pixels is computed; then, the variance σ_2 of the image pixels is computed; the image pixels are then normalized; finally, each pixel is activated with a linear rectification (ReLU) function to obtain the output feature map.
4. The 2D-image-oriented human body 3D pose estimation method according to claim 3, wherein in step 1.2 and step 2.3 the mean of the pixels is computed as follows:

μ = (1/m) · Σ_{i=1}^{m} x_i

where x_i is the image input to each layer and m is the number of pixels;

the variance of the pixels is computed as follows:

σ² = (1/m) · Σ_{i=1}^{m} (x_i − μ)²

where x_i is the image input to each layer and m is the number of pixels;

the normalization formula is as follows:

x̂_i = (x_i − μ) / √(σ² + ε)

where ε is a very small number, between 0.0001 and 0.01;

the linear rectification activation function is as follows:

f(x) = max(0, x).
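Taken together, the four formulas describe a per-layer normalization followed by a ReLU. A minimal PyTorch sketch is given below; treating the spatial pixels of each channel as the m pixels is an assumption, since the claim says only "the image pixels".

```python
import torch

def normalize_and_activate(x, eps=1e-3):       # x: [N, C, H, W]
    mu = x.mean(dim=(2, 3), keepdim=True)      # mean over the m pixels
    var = x.var(dim=(2, 3), unbiased=False, keepdim=True)  # their variance
    x_hat = (x - mu) / torch.sqrt(var + eps)   # normalization with epsilon
    return torch.relu(x_hat)                   # linear rectification, max(0, x)
```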
5. The 2D-image-oriented human body 3D pose estimation method according to claim 1, wherein step 4 is specifically implemented according to the following steps:
step 4.1, processing the first residual block in the output/input subnet II of the step 3
C using the method of step 3.1 1 Performing corresponding operation to obtain D 1_first
C using the method of step 3.1 2 Performing corresponding operation to obtain D 2_first
Step 4.2, the output of step 4.1 is fed into the second residual block of sub-network II for processing:
The feature map D_1_first is processed by the method of step 3.1 to obtain the feature map D_1_second;
The feature map D_2_first is processed by the method of step 3.1 to obtain the feature map D_2_second.
Step 4.3, the output of step 4.2 is fed into the third residual block of sub-network II for processing:
The feature map D_1_second is processed by the method of step 3.1 to obtain the feature map D_1_third;
The feature map D_2_second is processed by the method of step 3.1 to obtain the feature map D_2_third.
Step 4.4, the output of step 4.3 is fed into the fourth residual block of sub-network II for processing:
The feature map D_1_third is processed by the method of step 3.1 to obtain the feature map D_1_fourth;
The feature map D_2_third is processed by the method of step 3.1 to obtain the feature map D_2_fourth.
Step 4.5, the output of step 4.4 is fed into the conversion layer of sub-network II for processing:
The feature map D_1_fourth is convolved with a 3×3 convolution kernel to obtain D_1_fifth_1; D_2_fourth is upsampled by a factor of 2 to obtain D_2_fifth_1; D_1_fifth_1 and D_2_fifth_1 are added to obtain D_1.
The feature map D_1_fourth is downsampled by a factor of 1/2 to obtain D_1_fifth_2; D_2_fourth is convolved with a 3×3 convolution kernel to obtain D_2_fifth_2; D_1_fifth_2 and D_2_fifth_2 are added to obtain D_2.
The feature map D_1_fourth is downsampled by a factor of 1/4 to obtain D_1_fifth_3; D_2_fourth is downsampled by a factor of 1/2 to obtain D_2_fifth_3; D_1_fifth_3 and D_2_fifth_3 are added to obtain D_3.
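A minimal sketch of the step 4.5 conversion layer follows: the two sub-network II branches are resampled to three target resolutions and fused by element-wise addition, opening a third branch. Average pooling, nearest-neighbour interpolation, and equal channel counts across the added maps are assumptions of this sketch.

```python
import torch.nn.functional as F

def subnet2_transition(d1_fourth, d2_fourth, conv1, conv2):
    """Sketch of step 4.5; conv1/conv2 are 3x3 convolutions."""
    d1 = conv1(d1_fourth) + F.interpolate(d2_fourth, scale_factor=2)
    d2 = F.avg_pool2d(d1_fourth, 2) + conv2(d2_fourth)
    d3 = F.avg_pool2d(d1_fourth, 4) + F.avg_pool2d(d2_fourth, 2)
    return d1, d2, d3
```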
6. The 2D-image-oriented human body 3D pose estimation method according to claim 5, wherein step 5 is specifically implemented as follows:
Step 5.1, the output of step 4 is fed into the first residual block of sub-network III for processing:
The feature map D_1 is processed by the method of step 3.1 to obtain the feature map E_1_first;
The feature map D_2 is processed by the method of step 3.1 to obtain the feature map E_2_first;
The feature map D_3 is processed by the method of step 3.1 to obtain the feature map E_3_first.
Step 5.2, the output of step 5.1 is fed into the second residual block of sub-network III for processing:
The feature map E_1_first is processed by the method of step 3.1 to obtain the feature map E_1_second;
The feature map E_2_first is processed by the method of step 3.1 to obtain the feature map E_2_second;
The feature map E_3_first is processed by the method of step 3.1 to obtain the feature map E_3_second.
Step 5.3, the output of step 5.2 is fed into the third residual block of sub-network III for processing:
The feature map E_1_second is processed by the method of step 3.1 to obtain the feature map E_1_third;
The feature map E_2_second is processed by the method of step 3.1 to obtain the feature map E_2_third;
The feature map E_3_second is processed by the method of step 3.1 to obtain the feature map E_3_third.
Step 5.4, the output of step 5.3 is fed into the fourth residual block of sub-network III for processing:
The feature map E_1_third is processed by the method of step 3.1 to obtain the feature map E_1_fourth;
The feature map E_2_third is processed by the method of step 3.1 to obtain the feature map E_2_fourth;
The feature map E_3_third is processed by the method of step 3.1 to obtain the feature map E_3_fourth.
Step 5.5, the output of step 5.4 is fed into the conversion layer of sub-network III for processing:
The feature map E_1_fourth is convolved with a 3×3 convolution kernel to obtain E_1_fifth_1; E_2_fourth is upsampled by a factor of 2 to obtain E_2_fifth_1; E_3_fourth is upsampled by a factor of 4 to obtain E_3_fifth_1; E_1_fifth_1, E_2_fifth_1, and E_3_fifth_1 are added to obtain E_1.
The feature map E_1_fourth is downsampled by a factor of 1/2 to obtain E_1_fifth_2; E_2_fourth is convolved with a 3×3 convolution kernel to obtain E_2_fifth_2; E_3_fourth is upsampled by a factor of 2 to obtain E_3_fifth_2; E_1_fifth_2, E_2_fifth_2, and E_3_fifth_2 are added to obtain E_2.
The feature map E_1_fourth is downsampled by a factor of 1/4 to obtain E_1_fifth_3; E_2_fourth is downsampled by a factor of 1/2 to obtain E_2_fifth_3; E_3_fourth is convolved with a 3×3 convolution kernel to obtain E_3_fifth_3; E_1_fifth_3, E_2_fifth_3, and E_3_fifth_3 are added to obtain E_3.
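Step 5.5 extends the step 4.5 pattern to three inputs: each branch is brought to every resolution (a 3×3 convolution at its own scale, up/downsampling otherwise) and the three aligned maps are added. A minimal sketch, under the same assumptions as the step 4.5 sketch:

```python
import torch.nn.functional as F

def subnet3_transition(e1, e2, e3, conv1, conv2, conv3):
    """Sketch of step 5.5; conv1/2/3 are 3x3 convolutions."""
    out1 = (conv1(e1)
            + F.interpolate(e2, scale_factor=2)
            + F.interpolate(e3, scale_factor=4))  # E_1
    out2 = (F.avg_pool2d(e1, 2)
            + conv2(e2)
            + F.interpolate(e3, scale_factor=2))  # E_2
    out3 = (F.avg_pool2d(e1, 4)
            + F.avg_pool2d(e2, 2)
            + conv3(e3))                          # E_3
    return out1, out2, out3
```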
7. The 2D-image-oriented human body 3D pose estimation method according to claim 6, wherein step 6 comprises the following specific procedures:
Step 6.1, the feature map E_1 is convolved with a 3×3 convolution kernel to obtain E_1_conv; E_2 is upsampled by a factor of 2 to obtain E_2_up; E_3 is upsampled by a factor of 4 to obtain E_3_up2; E_1_conv, E_2_up, and E_3_up2 are added to obtain P_pre.
Step 6.2, P_pre is subjected to a matrix transformation to obtain the feature map P_pre_trans = [64, 64, 64, Alljoint]; a softmax operation is applied over the first three dimensions of P_pre_trans to obtain the feature map H.
Step 6.3, the joint coordinates are extracted from the feature map H; the operation is expressed as follows:

P_x = Σ_{z=1}^{D} Σ_{y=1}^{H} Σ_{x=1}^{W} x · H(x, y, z)
P_y = Σ_{z=1}^{D} Σ_{y=1}^{H} Σ_{x=1}^{W} y · H(x, y, z)
P_z = Σ_{z=1}^{D} Σ_{y=1}^{H} Σ_{x=1}^{W} z · H(x, y, z)

where W, H, and D are the width, height, and depth (number of depth slices) of the feature map, respectively.
Step 6.4, P_x, P_y, and P_z are concatenated to obtain the matrix P, i.e., the estimated pose.
8. The 2D-image-oriented human body 3D pose estimation method according to claim 7, wherein the softmax is expressed as:

softmax(x_i) = e^{x_i} / Σ_j e^{x_j}

where x_i is the pixel value of the i-th pixel and the sum runs over all pixels.
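Steps 6.2–6.4 together implement a softmax-weighted coordinate read-out (a soft-argmax). A minimal PyTorch sketch follows; the tensor layout [Alljoint, D, H, W] and the function name soft_argmax_3d are assumptions of this sketch.

```python
import torch

def soft_argmax_3d(p_pre_trans):                # [J, D, H, W] response maps
    J, D, H, W = p_pre_trans.shape
    # Step 6.2: softmax over the spatial volume of each joint.
    h = torch.softmax(p_pre_trans.reshape(J, -1), dim=1).reshape(J, D, H, W)
    zs = torch.arange(D, dtype=h.dtype)
    ys = torch.arange(H, dtype=h.dtype)
    xs = torch.arange(W, dtype=h.dtype)
    # Step 6.3: expected index along each axis.
    p_z = (h.sum(dim=(2, 3)) * zs).sum(dim=1)   # P_z
    p_y = (h.sum(dim=(1, 3)) * ys).sum(dim=1)   # P_y
    p_x = (h.sum(dim=(1, 2)) * xs).sum(dim=1)   # P_x
    # Step 6.4: splice into the pose matrix P.
    return torch.stack([p_x, p_y, p_z], dim=1)  # P: [J, 3]
```

Because the read-out is an expectation over a softmax volume rather than a hard argmax, it is differentiable, which allows the joint coordinates to be trained end-to-end.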