WO2022178951A1 - Three-dimensional human pose estimation method and apparatus, device, and storage medium - Google Patents

Three-dimensional human pose estimation method and apparatus, device, and storage medium

Info

Publication number
WO2022178951A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
joint point
human body
trained
network
Prior art date
Application number
PCT/CN2021/084570
Other languages
French (fr)
Chinese (zh)
Inventor
孙奥兰
王健宗
程宁
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022178951A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to a method, apparatus, device and storage medium for estimating a three-dimensional human body posture.
  • 3D human pose detection is an important research field in the field of computer vision, which has important application value and practical significance for human action discrimination, intention recognition, behavior detection, and sports teaching.
  • At present there are two main methods. One method is to predict 2D joint points first and then perform 3D depth joint point regression on the basis of the 2D joint point prediction. The inventor realizes that this method generally makes insufficient use of spatial information when performing 3D regression from the 2D prediction results, resulting in low performance of the 3D prediction part.
  • The other method is to directly perform end-to-end prediction of 3D joint points, but this method is very difficult, the actual prediction effect is poor, and the performance indicators are generally low.
  • Neither method can correctly predict the joint points of the human body when there is interference and occlusion, and abnormal pose prediction results often occur.
  • That is, three-dimensional human body posture detection in the prior art has the technical problem that human body joint points cannot be correctly predicted when there is interference and occlusion, and that abnormal posture prediction results often occur.
  • The main purpose of the present application is to provide a method, apparatus, device and storage medium for estimating a three-dimensional human body posture, aiming to solve the technical problem that three-dimensional human body posture detection in the prior art cannot correctly predict human body joint points when there is interference and occlusion and often produces abnormal posture prediction results.
  • the present application proposes a method for estimating a three-dimensional human body posture, the method comprising:
  • the target three-dimensional human body posture data corresponding to the image to be estimated is obtained.
  • the present application also proposes a device for estimating a three-dimensional human body posture, the device comprising:
  • a data acquisition module for acquiring the image to be estimated
  • a three-dimensional human body posture estimation module, used to input the image to be estimated into a target three-dimensional human body posture estimation model for three-dimensional human posture estimation, wherein the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on the generation network to be trained, the discrimination network to be trained and the strong geometric constraint model of human posture;
  • the target three-dimensional human body posture data determination module is used to obtain the target three-dimensional human body posture data corresponding to the image to be estimated according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model.
  • the present application also proposes a computer device, including a memory and a processor, the memory stores a computer program, and the processor implements the following method steps when executing the computer program:
  • the target three-dimensional human body posture data corresponding to the image to be estimated is obtained.
  • the present application also proposes a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following method steps are implemented:
  • the target three-dimensional human body posture data corresponding to the image to be estimated is obtained.
  • In the method, apparatus, device and storage medium for estimating a three-dimensional human body posture of the present application, the image to be estimated is input into the target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, and the target three-dimensional human body posture data corresponding to the image to be estimated is obtained according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model.
  • The target 3D human pose estimation model is a model obtained by adversarial training based on the generation network to be trained, the discriminant network to be trained and the strong geometric constraint model of human pose. Adversarial training effectively solves the problem of fewer training samples, and the introduction of the strong geometric constraint model of human posture improves the prediction performance of the trained model under interference and occlusion and reduces the probability of abnormal posture prediction.
  • FIG. 1 is a schematic flowchart of a method for estimating a three-dimensional human body posture according to an embodiment of the present application
  • FIG. 2 is a schematic structural block diagram of an apparatus for estimating a three-dimensional human body posture according to an embodiment of the present application
  • FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
  • the present application proposes a three-dimensional human body posture estimation method.
  • the above method is applied to the field of artificial intelligence technology.
  • The method first uses two-dimensional data for preliminary adversarial training and then uses three-dimensional data for optimization adversarial training to obtain a three-dimensional human posture estimation model.
  • The strong geometric constraint model of human posture provides the penalty term for abnormal postures. Adversarial training effectively solves the problem of fewer training samples, and the introduction of the strong geometric constraint model of human posture improves the prediction performance of the trained model under interference and occlusion and reduces the probability of abnormal posture prediction.
  • an embodiment of the present application provides a method for estimating a three-dimensional human body posture, and the method includes:
  • S2 Input the image to be estimated into the target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, wherein the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on the generation network to be trained, the discrimination network to be trained and the strong geometric constraint model of human posture;
  • the image to be estimated is input into the target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, and the target three-dimensional human body posture data corresponding to the to-be-estimated image is obtained according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model.
  • The target 3D human pose estimation model is a model obtained by adversarial training based on the generation network to be trained, the discriminant network to be trained and the strong geometric constraint model of human posture. The strong geometric constraint model improves the prediction performance of the trained model under interference and occlusion and reduces the probability of abnormal posture prediction.
  • The image to be estimated may be input by the user, obtained from a database, or sent by a third-party application system.
  • the image to be estimated refers to a digital image containing a human body whose three-dimensional human pose needs to be estimated.
  • the three-dimensional human body posture data is the three-dimensional coordinate data of the 16 joints of the human body.
  • the three-dimensional coordinate data is expressed as (x, y, z), where x is the abscissa in the image, y is the ordinate in the image, and z is the depth value coordinate.
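The data layout described above can be sketched as a small Python type. This is only an illustration under assumptions: the names `NUM_JOINTS`, `Joint3D` and `validate_pose` are hypothetical and do not come from the application, which fixes only the joint count (16) and the meaning of (x, y, z).

```python
from typing import List, Tuple

NUM_JOINTS = 16  # joint count stated in the text

# (x, y, z): abscissa in the image, ordinate in the image, depth value
Joint3D = Tuple[float, float, float]


def validate_pose(pose: List[Joint3D]) -> List[Joint3D]:
    """Return the pose unchanged if it holds 16 joints with 3 coordinates each."""
    if len(pose) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joints, got {len(pose)}")
    for index, joint in enumerate(pose):
        if len(joint) != 3:
            raise ValueError(f"joint {index} must be an (x, y, z) triple")
    return pose


# Hypothetical pose used only to exercise the check.
example_pose = validate_pose(
    [(float(i), float(i) * 2.0, 0.5) for i in range(NUM_JOINTS)]
)
```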
  • Multiple two-dimensional joint point training samples are used to perform preliminary adversarial training on the generation network to be trained and the discriminant network to be trained. After the preliminary adversarial training reaches the convergence condition, optimization adversarial training is continued with the three-dimensional joint point training samples.
  • During optimization adversarial training, the strong geometric constraint model of human pose is used to constrain the human pose in the loss function of the generation network. Finally, the generation network whose optimization adversarial training reaches the convergence condition is used as the target 3D human pose estimation model, which improves the prediction performance of the trained model under interference and occlusion and reduces the probability of abnormal posture prediction.
  • the human body posture strong geometric constraint model is used to predict whether the human body posture is correct according to the three-dimensional human body posture data.
  • the implementation method of the human body posture strong geometric constraint model can be selected from the prior art, which will not be repeated here.
  • the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model is used as the target three-dimensional human body posture data corresponding to the image to be estimated.
  • Before the step of inputting the image to be estimated into the target 3D human pose estimation model for 3D human pose estimation, the method further includes:
  • S022 Use the plurality of two-dimensional joint point training samples to perform preliminary adversarial training on the generation network to be trained and the discrimination network to be trained, to obtain the generation network to be optimized and the discrimination network to be optimized;
  • S024 Use the penalty function and the multiple three-dimensional joint point training samples to perform optimization adversarial training on the generation network to be optimized and the discriminant network to be optimized, and use the generation network to be optimized whose optimization adversarial training reaches the convergence condition as the target 3D human pose estimation model.
  • In this embodiment, the target three-dimensional human pose estimation model is obtained by adversarial training based on the generation network to be trained, the discrimination network to be trained, and the strong geometric constraint model of human posture. The strong geometric constraint model improves the prediction performance of the trained model under interference and occlusion and reduces the probability of abnormal posture prediction.
  • The multiple two-dimensional joint point training samples may be input by the user, obtained from a database, or sent by a third-party application system.
  • The multiple 3D joint point training samples may likewise be input by the user, obtained from a database, or sent by a third-party application system.
  • the two-dimensional joint point training sample includes a first image sample data and a human body two-dimensional joint point calibration value.
  • the calibration value of the two-dimensional joint point of the human body is the result of calibrating the position data of the two-dimensional posture of the human body in the first image sample data.
  • the first image sample data is a digital image containing a human body.
  • the three-dimensional joint point training sample includes a second image sample data and a human body three-dimensional joint point calibration value.
  • the calibration value of the three-dimensional joint point of the human body is the result of calibrating the position data of the three-dimensional posture of the human body in the second image sample data.
  • the second image sample data is a digital image including a human body.
  • The discriminant network to be trained whose preliminary adversarial training reaches the convergence condition is used as the discriminant network to be optimized, so that the generation network to be optimized and the discriminant network to be optimized learn a part of the 2D joint point information in advance, which helps solve the problem of poor prediction performance of the trained model caused by the small number of 3D joint point training samples.
  • The strong geometric constraint model of human body posture applies geometric constraints to the human body posture in three aspects: joint point position, joint angle, and bone length. When the output result of the strong geometric constraint model of human body posture is abnormal, the geometric constraints are used as the penalty function.
  • Used as the penalty term of the loss function of the generation network, the penalty function strengthens the capture of the deeper spatial characteristics of the image by the target 3D human pose estimation model, making it possible to capture the spatial characteristics of the image at multiple scales and re-check whether the joint points are correctly predicted, thereby improving the prediction performance of the trained model under interference and occlusion and reducing the probability of abnormal pose prediction.
  • The above step of using the multiple two-dimensional joint point training samples to perform preliminary adversarial training on the generation network to be trained and the discrimination network to be trained, to obtain the generation network to be optimized and the discrimination network to be optimized, includes:
  • S0221 Obtain a two-dimensional joint point training sample from the plurality of two-dimensional joint point training samples as a target two-dimensional joint point training sample, where the target two-dimensional joint point training sample includes: first image sample data, human body 2D joint point calibration value;
  • S0222 Input the first image sample data of the target two-dimensional joint point training sample into the to-be-trained generation network to predict human two-dimensional joint points, and obtain a predicted value of human two-dimensional joint point samples;
  • S0224 Use the calibration value of the human body two-dimensional joint point of the target two-dimensional joint point training sample, the predicted value of the human body two-dimensional joint point sample, and the first confidence result to perform preliminary adversarial training on the generation network to be trained and the discriminant network to be trained;
  • S0225 Repeat the step of obtaining a two-dimensional joint point training sample from the plurality of two-dimensional joint point training samples as a target two-dimensional joint point training sample until the preliminary adversarial training reaches a convergence condition, then use the generation network to be trained whose preliminary adversarial training reaches the convergence condition as the generation network to be optimized, and use the discriminant network to be trained whose preliminary adversarial training reaches the convergence condition as the discrimination network to be optimized.
  • This embodiment enables the generation network to be optimized and the discrimination network to be optimized to learn a part of the two-dimensional joint point information in advance, which helps solve the problem of poor prediction performance of the trained model caused by the small number of three-dimensional joint point training samples.
  • the predicted value of the human body two-dimensional joint point sample is the prediction result of the human body two-dimensional joint point position data of the first image sample data of the target two-dimensional joint point training sample.
  • For S0224, the calibration value of the human body two-dimensional joint point of the target two-dimensional joint point training sample and the predicted value of the human body two-dimensional joint point sample are used to calculate the loss value and update the parameters of the generation network to be trained, and the first confidence result is used to calculate the loss value and update the parameters of the discriminant network to be trained.
  • Steps S0221 to S0225 are executed repeatedly until the preliminary adversarial training reaches the convergence condition. The generation network to be trained whose preliminary adversarial training reaches the convergence condition is used as the generation network to be optimized, and the discriminant network to be trained whose preliminary adversarial training reaches the convergence condition is used as the discriminant network to be optimized, so that the generation network and the discriminant network learn a part of the two-dimensional joint point information in advance.
  • The convergence condition of preliminary adversarial training includes: the loss value of the generation network to be trained and the loss value of the discriminant network to be trained both reach the first convergence condition, or the number of training times of preliminary adversarial training reaches the second convergence condition.
  • That both loss values reach the first convergence condition means that, during preliminary adversarial training, the loss value of the generation network to be trained and the loss value of the discriminant network to be trained each meet the first convergence condition.
  • The first convergence condition means that the loss values calculated in two adjacent iterations for the same network (that is, either the generation network to be trained or the discriminant network to be trained) satisfy the Lipschitz condition (the Lipschitz continuity condition).
  • The number of training times of preliminary adversarial training is the number of times the generation network to be trained and the discriminant network to be trained have been used for preliminary adversarial training; that is, the count increases by 1 each time preliminary adversarial training is performed.
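The two stopping rules above can be sketched as a small training driver. This is a minimal sketch under stated assumptions, not the applicant's implementation: the first convergence condition is approximated here by requiring that the change between two adjacent loss values of the same network stays within a tolerance `tol`, and the second by a cap `max_steps`. The names `loss_change_small`, `run_adversarial_stage` and `train_step` are hypothetical; `train_step` stands in for one pass of steps S0222 to S0224 and is assumed to return the generator loss and the discriminator loss.

```python
from typing import Callable, Optional, Sequence, Tuple


def loss_change_small(prev: Optional[float], curr: float, tol: float) -> bool:
    """True when two adjacent loss values of one network differ by at most tol."""
    return prev is not None and abs(curr - prev) <= tol


def run_adversarial_stage(
    samples: Sequence[object],
    train_step: Callable[[object], Tuple[float, float]],
    tol: float = 1e-4,
    max_steps: int = 10000,
) -> int:
    """Cycle through the samples until either stopping rule fires.

    Returns the number of training steps that were performed.
    """
    if not samples:
        raise ValueError("at least one training sample is required")
    prev_gen: Optional[float] = None
    prev_disc: Optional[float] = None
    steps = 0
    while steps < max_steps:
        for sample in samples:
            gen_loss, disc_loss = train_step(sample)
            steps += 1  # the count grows by one per training step
            # Assumed form of the first condition: both networks have settled.
            if loss_change_small(prev_gen, gen_loss, tol) and loss_change_small(
                prev_disc, disc_loss, tol
            ):
                return steps
            # Second condition: the training count reached its limit.
            if steps >= max_steps:
                return steps
            prev_gen, prev_disc = gen_loss, disc_loss
    return steps
```

The same driver could serve both the preliminary stage and the optimization stage, since the text describes both as repeating a sample step until a convergence condition is reached.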
  • The step of using the calibration value of the human body two-dimensional joint point of the target two-dimensional joint point training sample, the predicted value of the human body two-dimensional joint point sample, and the first confidence result to perform preliminary adversarial training on the generation network to be trained and the discriminant network to be trained includes:
  • S02241 Input the calibration value of the human body two-dimensional joint point of the target two-dimensional joint point training sample and the predicted value of the human body two-dimensional joint point sample into the loss function of the generation network to be trained for calculation, obtain the first loss value of the generation network to be trained, and update the parameters of the generation network to be trained according to the first loss value;
  • S02242 Input the first confidence result into the loss function of the discriminant network to be trained for calculation, obtain a second loss value of the discriminant network to be trained, and update the parameters of the discriminant network to be trained according to the second loss value;
  • the loss function of the generation network to be trained adopts the MSE loss function
  • the loss function of the discriminant network to be trained adopts the cross entropy loss function
  • In this embodiment, the calibration value of the human body two-dimensional joint point of the target two-dimensional joint point training sample, the predicted value of the human body two-dimensional joint point sample, and the first confidence result are used to perform preliminary adversarial training on the generation network to be trained and the discriminant network to be trained, so that the generation network to be optimized and the discriminant network to be optimized learn a part of the two-dimensional joint point information in advance, which helps solve the problem of poor prediction performance of the trained model caused by the small number of three-dimensional joint point training samples.
  • For S02241, the loss function of the generation network to be trained is the MSE loss function.
  • the method for updating the parameters of the generating network to be trained according to the first loss value can be selected from the prior art, and details are not described here.
  • the implementation manner of the MSE loss function can be selected from the prior art, and details are not described here.
  • the first confidence result is input into the loss function of the discriminant network to be trained to calculate the loss value, and the calculated loss value is used as the second loss value of the discriminant network to be trained.
  • the method for updating the parameters of the discriminant network to be trained according to the second loss value can be selected from the prior art, and details are not described here.
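The two loss values of S02241 and S02242 can be illustrated in plain Python. This is a sketch under assumptions: the text names only the MSE loss and the cross entropy loss, so the binary form of the cross entropy, the `label` argument (for instance 1.0 for calibrated joints and 0.0 for predicted joints) and the function names are hypothetical choices made for illustration.

```python
import math
from typing import Sequence, Tuple

Point2D = Tuple[float, float]


def mse_loss(predicted: Sequence[Point2D], calibrated: Sequence[Point2D]) -> float:
    """Mean squared error over all coordinates of all 2D joint points."""
    if not predicted or len(predicted) != len(calibrated):
        raise ValueError("joint lists must be non-empty and of equal length")
    total = 0.0
    count = 0
    for pred_joint, true_joint in zip(predicted, calibrated):
        for pred_value, true_value in zip(pred_joint, true_joint):
            total += (pred_value - true_value) ** 2
            count += 1
    return total / count


def binary_cross_entropy(confidence: float, label: float, eps: float = 1e-12) -> float:
    """Cross entropy between a confidence in [0, 1] and a 0/1 label (assumed form)."""
    clipped = min(max(confidence, eps), 1.0 - eps)  # keep log() finite
    return -(label * math.log(clipped) + (1.0 - label) * math.log(1.0 - clipped))


# First loss value: generator prediction against the calibration values.
first_loss = mse_loss([(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 3.0)])
# Second loss value: discriminator confidence against an assumed label of 1.
second_loss = binary_cross_entropy(0.5, 1.0)
```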
  • the above-mentioned steps of obtaining a penalty function according to the strong geometric constraint model of the human body posture include:
  • S0231 Carry out the joint point position constraint according to the strong geometric constraint model of the human body posture, and obtain a joint point position penalty item;
  • S0232 Carry out the joint angle constraint according to the strong geometric constraint model of the human body posture, and obtain a joint angle penalty item;
  • S0233 Carry out the bone length constraint according to the strong geometric constraint model of the human body posture, and obtain a bone length penalty item;
  • the strong geometric constraint model of human posture is introduced to improve the prediction performance of the trained model for interference occlusion, and reduce the probability of abnormal posture prediction.
  • For S0231, the three-dimensional joint point position data of the human body is input into the strong geometric constraint model of human posture. When the prediction result of the model is a deviation in the relative position relationship of the joint points, the square loss of that deviation is used as the joint point position penalty item.
  • For S0232, the three-dimensional joint point position data of the human body is input into the strong geometric constraint model of human posture. When the prediction result of the model is that the bones are abnormally bent, the square loss of the abnormal bending is used as the joint angle penalty item.
  • For example, the normal joint angle between the A bone and the B bone in the human body is $C_{11}$.
  • When the joint angle between the A bone and the B bone is predicted to be abnormal, the square loss $(C_{12} - C_{11})^2$ is used as the penalty item corresponding to the joint angle between the A bone and the B bone.
  • Otherwise, the value of the penalty item corresponding to the joint angle between the A bone and the B bone is set to 0. This example is not specifically limiting.
  • As another example, the three-dimensional joint point position data of the human body is input into the strong geometric constraint model of the human body posture, which predicts that the joint angles $C_2$ and $C_3$ are abnormal.
  • The joint angle penalty item is then $0 + (C_{22} - C_{21})^2 + (C_{32} - C_{31})^2 + 0$, where the first 0 is the penalty item of the joint angle $C_1$, $(C_{22} - C_{21})^2$ is the penalty item of the joint angle $C_2$, $(C_{32} - C_{31})^2$ is the penalty item of the joint angle $C_3$, and the second 0 is the penalty item of the joint angle $C_4$. This example is not specifically limiting.
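The joint angle example above can be worked through in code. This is a minimal sketch under assumptions: the text does not say how an angle is judged abnormal, so a hypothetical `tolerance` in degrees is used here, and the function names and sample angles are illustrative only.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def joint_angle(parent: Point3D, joint: Point3D, child: Point3D) -> float:
    """Angle in degrees at `joint` between the two bones that meet there."""
    u = [p - j for p, j in zip(parent, joint)]
    v = [c - j for c, j in zip(child, joint)]
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a bone has zero length")
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding drift
    return math.degrees(math.acos(cosine))


def joint_angle_penalty(
    predicted: Sequence[float], normal: Sequence[float], tolerance: float
) -> float:
    """Sum of squared deviations over the angles judged abnormal.

    An angle counts as abnormal when it differs from its normal value by
    more than `tolerance` (an assumed criterion); normal angles add 0.
    """
    total = 0.0
    for predicted_angle, normal_angle in zip(predicted, normal):
        deviation = predicted_angle - normal_angle
        if abs(deviation) > tolerance:
            total += deviation ** 2
    return total


# Four joint angles, of which the second and third are abnormal,
# mirroring the 0 + (...)^2 + (...)^2 + 0 pattern in the text.
example_penalty = joint_angle_penalty(
    [90.0, 150.0, 30.0, 170.0], [90.0, 120.0, 60.0, 170.0], tolerance=10.0
)
```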
  • For S0233, the proportions of the bones are calculated from the training data set, and the correct bone proportions are used as the comparison threshold of the strong geometric constraint model of human posture.
  • The three-dimensional joint point position data of the human body is input into the strong geometric constraint model of human posture. When the prediction result of the model is a deviation from the existing bone proportional relationship, the square loss of the deviation of the bone proportional relationship is used as the bone length penalty item.
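A bone length penalty of this kind might look as follows. This is a sketch under assumptions: each bone is taken to be a pair of joint indices, a proportion is taken to be a bone's length divided by the total length of all listed bones, and the reference proportions are passed in rather than computed from a training set. None of these names or conventions comes from the application.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Bone = Tuple[int, int]  # indices of the two joints a bone connects (assumed)


def bone_proportions(pose: Sequence[Point3D], bones: Sequence[Bone]) -> List[float]:
    """Length of each bone divided by the summed length of all bones."""
    lengths = [math.dist(pose[a], pose[b]) for a, b in bones]
    total = sum(lengths)
    if total == 0.0:
        raise ValueError("all bones have zero length")
    return [length / total for length in lengths]


def bone_length_penalty(
    pose: Sequence[Point3D],
    bones: Sequence[Bone],
    reference_proportions: Sequence[float],
) -> float:
    """Squared deviation of the predicted bone proportions from the reference."""
    proportions = bone_proportions(pose, bones)
    return sum(
        (value - reference) ** 2
        for value, reference in zip(proportions, reference_proportions)
    )


# Toy skeleton: three joints and two bones of length 1 and 2.
toy_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
toy_bones = [(0, 1), (1, 2)]
toy_penalty = bone_length_penalty(toy_pose, toy_bones, [0.5, 0.5])
```

Working with proportions rather than absolute lengths keeps the penalty independent of the subject's size and distance from the camera, which is one plausible reading of "bone proportional relationship".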
  • The joint point position penalty item, the joint angle penalty item and the bone length penalty item are added together into one function, and that function is used as the penalty function.
  • For example, if the joint point position penalty item is $P_1$, the joint angle penalty item is $P_2$, and the bone length penalty item is $P_3$, then the sum $\lambda_1 P_1 + \lambda_2 P_2 + \lambda_3 P_3$ is used as the penalty function, where $\lambda_1$, $\lambda_2$, $\lambda_3$ are the penalty coefficients of the respective penalty items, with value range $\lambda_1, \lambda_2, \lambda_3 \in (0, 1)$.
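The weighted sum above is small enough to write out directly. In this sketch the function name and the default coefficient values are assumptions; the text fixes only that each penalty coefficient lies in the open interval (0, 1).

```python
from typing import Sequence


def penalty_function(
    position_penalty: float,
    angle_penalty: float,
    bone_penalty: float,
    coefficients: Sequence[float] = (0.5, 0.5, 0.5),
) -> float:
    """Weighted sum of the three penalty items with coefficients in (0, 1)."""
    if len(coefficients) != 3:
        raise ValueError("exactly three penalty coefficients are required")
    for coefficient in coefficients:
        if not 0.0 < coefficient < 1.0:
            raise ValueError("each penalty coefficient must lie in (0, 1)")
    c1, c2, c3 = coefficients
    return c1 * position_penalty + c2 * angle_penalty + c3 * bone_penalty


# Illustrative values only.
combined = penalty_function(2.0, 4.0, 8.0, (0.5, 0.25, 0.125))
```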
  • The step of using the penalty function and the plurality of three-dimensional joint point training samples to perform optimization adversarial training on the generation network to be optimized and the discriminant network to be optimized, and using the generation network to be optimized whose optimization adversarial training reaches a convergence condition as the target three-dimensional human body pose estimation model, includes:
  • S0242 Use the loss function of the discriminant network to be trained as the loss function of the discriminant network to be optimized
  • S0244 Use the loss function of the generation network to be optimized, the loss function of the discriminant network to be optimized, and the multiple three-dimensional joint point training samples to perform optimization adversarial training on the generation network to be optimized and the discriminant network to be optimized, and use the generation network to be optimized whose optimization adversarial training reaches a convergence condition as the target three-dimensional human body pose estimation model.
  • In this embodiment, the penalty function obtained from the strong geometric constraint model of human posture is added as a penalty term to the loss function of the generation network to be trained, which strengthens the capture of the deeper spatial characteristics of the image by the target three-dimensional human posture estimation model. This makes it possible to capture the spatial characteristics of the image at multiple scales and re-check whether the joint points are correctly predicted, which improves the prediction performance of the trained model under interference and occlusion and reduces the probability of abnormal posture prediction.
  • The loss function of the generation network to be trained and the loss function of the discriminant network to be trained may be obtained from a database, or may be sent by a third-party application system. It can be understood that the loss function of the generation network to be trained and the loss function of the discriminant network to be trained can also be written into the program file for realizing the present application.
  • the loss function of the discriminant network to be trained is used as the loss function of the discriminant network to be optimized.
  • the loss function of the generation network to be trained is subtracted and associated with the penalty function, and the associated function is used as the loss function of the generation network to be optimized. That is to say, the penalty function is used as a subtraction item to achieve the purpose of penalty.
  • the loss function of the generation network to be trained is L gen
  • the penalty function is ⁇ 1 P 1 + ⁇ 2 P 2 + ⁇ 3 P 3
  • the loss function of the generation network to be trained is performed with the penalty function.
  • once the optimized adversarial training reaches the convergence condition, the generation network to be optimized has learned both two-dimensional information and three-dimensional depth information.
  • the generation network to be optimized at that point is used as the target three-dimensional human pose estimation model.
  • the step of performing optimized adversarial training on the generation network to be optimized and the discriminant network to be optimized by using the loss function of the generation network to be optimized, the loss function of the discriminant network to be optimized, and the plurality of three-dimensional joint point training samples, and using the generation network to be optimized whose optimized adversarial training reaches the convergence condition as the target three-dimensional human body pose estimation model, includes:
  • S02441 Obtain a 3D joint point training sample from the plurality of 3D joint point training samples as a target 3D joint point training sample, where the target 3D joint point training sample includes second image sample data and a human body 3D joint point calibration value;
  • S02442 Input the second image sample data of the target three-dimensional joint point training sample into the generation network to be optimized to predict the three-dimensional joint points of the human body, and obtain the predicted value of the three-dimensional joint point sample of the human body;
  • S02445 Repeat the step of obtaining a 3D joint point training sample from the plurality of 3D joint point training samples as a target 3D joint point training sample until the optimized adversarial training reaches the convergence condition, and use the generation network to be optimized whose optimized adversarial training has reached the convergence condition as the target 3D human pose estimation model.
  • the penalty function obtained from the strong geometric constraint model of human posture is added as a penalty term to the loss function of the generation network to be trained. This strengthens the target three-dimensional human posture estimation model's capture of the deeper spatial characteristics of the image, so that spatial characteristics are captured at multiple scales and the joint point predictions are re-checked for correctness, which improves the trained model's prediction performance under interference and occlusion and reduces the probability of abnormal posture prediction.
  • three-dimensional data regression is used, so that the model can learn three-dimensional depth value information.
  • the predicted value of the three-dimensional joint point sample of the human body is the prediction result of the position data of the three-dimensional joint point of the human body in the second image sample data of the target three-dimensional joint point training sample.
  • steps S02441 to S02445 are repeatedly executed until the optimized adversarial training reaches the convergence condition.
  • once the optimized adversarial training reaches the convergence condition, the generation network to be optimized has learned both two-dimensional information and three-dimensional depth information.
  • the generation network to be optimized that has reached the convergence condition is therefore used as the target three-dimensional human body pose estimation model.
  • the optimized adversarial training reaches the convergence condition when the loss value of the generation network to be optimized and the loss value of the discriminant network to be optimized both reach the third convergence condition, or when the number of training iterations of the optimized adversarial training reaches the fourth convergence condition.
  • the loss value of the generation network to be optimized and the loss value of the discriminant network to be optimized both reaching the third convergence condition means that, in the optimized adversarial training, each of the two loss values satisfies the third convergence condition.
  • the third convergence condition means that the loss values calculated in two adjacent iterations for the same network (that is, either the generation network to be optimized or the discriminant network to be optimized) satisfy the Lipschitz condition (Lipschitz continuity condition).
  • the number of training iterations in the fourth convergence condition refers to the number of times that the generation network to be optimized and the discriminant network to be optimized have been used for optimized adversarial training.
  • the present application also proposes a device for estimating a three-dimensional human body posture, and the device includes:
  • the three-dimensional human body posture estimation module 200 is used for inputting the image to be estimated into the target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, wherein the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on the generation network to be trained, the discriminant network to be trained and the strong geometric constraint model of human posture;
  • the target three-dimensional human body posture data determination module 300 is used for obtaining the target three-dimensional human body posture data corresponding to the image to be estimated according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model.
  • an embodiment of the present application further provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3 .
  • the computer device includes a processor, memory, a network interface, and a database connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the nonvolatile storage medium stores an operating system, a computer program, and a database.
  • the memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer equipment is used to store data such as the estimation method of the three-dimensional human body posture.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program implements a method for estimating a three-dimensional human pose when executed by a processor.
  • the method for estimating a three-dimensional human body posture includes: acquiring an image to be estimated; inputting the image to be estimated into a target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, wherein the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on the generation network to be trained, the discriminant network to be trained and the strong geometric constraint model of human posture; and obtaining, according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model, the target three-dimensional human body posture data corresponding to the image to be estimated.
  • in this embodiment, the image to be estimated is input into the target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, and the target three-dimensional human body posture data corresponding to the image to be estimated is obtained according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model.
  • the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on the generation network to be trained, the discriminant network to be trained and the strong geometric constraint model of human posture. The adversarial training effectively addresses the problem of having few training samples, and the introduction of the strong geometric constraint model of human posture improves the trained model's prediction performance under interference and occlusion and reduces the probability of abnormal posture prediction.
  • An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, a method for estimating a three-dimensional human body posture is implemented, including the steps of: acquiring an image to be estimated; inputting the image to be estimated into the target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, wherein the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on the generation network to be trained, the discriminant network to be trained and the strong geometric constraint model of human posture; and obtaining, according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model, the target three-dimensional human body posture data corresponding to the image to be estimated.
  • the above-mentioned method for estimating the three-dimensional human body posture inputs the image to be estimated into the target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, and obtains the target three-dimensional human body posture data corresponding to the image to be estimated according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model.
  • the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on the generation network to be trained, the discriminant network to be trained and the strong geometric constraint model of human posture. The adversarial training effectively addresses the problem of having few training samples, and the introduction of the strong geometric constraint model of human posture improves the trained model's prediction performance under interference and occlusion and reduces the probability of abnormal posture prediction.
  • the computer storage medium can be non-volatile or volatile.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
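
As an illustration of the training objective described earlier, the following minimal Python sketch shows how a penalty function formed from three penalty terms might be subtracted from the loss of the generation network, together with a simple stopping rule. All names (`penalized_generator_loss`, `training_converged`, `weights`, `tol`, `max_steps`) are hypothetical and do not come from the application. The absolute-difference threshold used here is only a stand-in for the Lipschitz-based third convergence condition, and the step budget is a stand-in for the fourth convergence condition.

```python
from typing import Sequence


def penalized_generator_loss(
    l_gen: float, penalties: Sequence[float], weights: Sequence[float]
) -> float:
    """Return l_gen minus the weighted sum of the penalty terms.

    The subtraction follows the wording of the text; the sign convention
    of the individual penalty terms is an assumption left to the caller.
    """
    if len(penalties) != len(weights):
        raise ValueError("penalties and weights must have the same length")
    return l_gen - sum(w * p for w, p in zip(weights, penalties))


def training_converged(
    gen_losses: Sequence[float],
    disc_losses: Sequence[float],
    step: int,
    tol: float = 1e-3,       # assumed threshold, not from the source
    max_steps: int = 10000,  # assumed iteration budget, not from the source
) -> bool:
    """Stop when both networks' last two losses are close, or the budget is spent."""

    def stable(losses: Sequence[float]) -> bool:
        # Compare the two most recent loss values of one network.
        return len(losses) >= 2 and abs(losses[-1] - losses[-2]) < tol

    return (stable(gen_losses) and stable(disc_losses)) or step >= max_steps
```

In practice the loss values would be produced by the two networks during optimized adversarial training; here they are plain floats so that the arithmetic is easy to follow.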

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

A three-dimensional human pose estimation method and apparatus, a device, and a storage medium. The method comprises: obtaining an image for estimation (S1); inputting the image for estimation into a target three-dimensional human pose estimation model for three-dimensional human pose estimation, wherein the target three-dimensional human pose estimation model is a model obtained by adversarial training on the basis of a generative network to be trained, a discriminative network to be trained, and a human pose strong geometric constraint model (S2); and obtaining, according to three-dimensional human pose data output by the target three-dimensional human pose estimation model, target three-dimensional human pose data corresponding to the image for estimation (S3). The problem of a few training samples is effectively solved by adversarial training, and the introduction of the human pose strong geometric constraint model improves the prediction performance of the trained model for interference and occlusion, and reduces the probability of abnormal pose prediction.

Description

Method, apparatus, device and storage medium for estimating three-dimensional human body posture
This application claims priority to the Chinese patent application No. 2021102196064, filed with the China Patent Office on February 26, 2021 and entitled "Method, Apparatus, Device and Storage Medium for Estimating Three-dimensional Human Body Posture", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a method, apparatus, device and storage medium for estimating a three-dimensional human body posture.
Background
Three-dimensional human body posture detection is an important research area in computer vision, with significant application value and practical relevance for human action discrimination, intention recognition, behavior detection, sports teaching, and the like.
At present, three-dimensional human body posture detection mainly uses two methods. The first predicts two-dimensional joint points and then performs regression prediction of three-dimensional depth joint points on that basis. The inventors realized that this method generally extracts and uses spatial information insufficiently when regressing from the two-dimensional prediction results to three dimensions, which lowers the performance of the three-dimensional prediction part. The second method directly performs end-to-end prediction of three-dimensional joint points, but it is very difficult, its actual prediction effect is poor, and its performance indicators are generally low. In addition, neither method can correctly predict human joint points under interference and occlusion, and both often produce abnormal posture predictions.
Technical Problem
Three-dimensional human body posture detection in the prior art cannot correctly predict human joint points under interference and occlusion, and often produces abnormal posture predictions.
Technical Solution
The main purpose of the present application is to provide a method, apparatus, device and storage medium for estimating a three-dimensional human body posture, aiming to solve the technical problem that three-dimensional human body posture detection in the prior art cannot correctly predict human joint points under interference and occlusion and often produces abnormal posture predictions.
To achieve the above purpose, the present application proposes a method for estimating a three-dimensional human body posture, the method comprising:
acquiring an image to be estimated;
inputting the image to be estimated into a target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, wherein the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on a generation network to be trained, a discriminant network to be trained and a strong geometric constraint model of human posture;
obtaining, according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model, the target three-dimensional human body posture data corresponding to the image to be estimated.
The present application also proposes an apparatus for estimating a three-dimensional human body posture, the apparatus comprising:
a data acquisition module, used for acquiring an image to be estimated;
a three-dimensional human body posture estimation module, used for inputting the image to be estimated into a target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, wherein the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on a generation network to be trained, a discriminant network to be trained and a strong geometric constraint model of human posture;
a target three-dimensional human body posture data determination module, used for obtaining, according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model, the target three-dimensional human body posture data corresponding to the image to be estimated.
The present application also proposes a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor implements the following method steps when executing the computer program:
acquiring an image to be estimated;
inputting the image to be estimated into a target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, wherein the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on a generation network to be trained, a discriminant network to be trained and a strong geometric constraint model of human posture;
obtaining, according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model, the target three-dimensional human body posture data corresponding to the image to be estimated.
The present application also proposes a computer-readable storage medium on which a computer program is stored, wherein the following method steps are implemented when the computer program is executed by a processor:
acquiring an image to be estimated;
inputting the image to be estimated into a target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, wherein the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on a generation network to be trained, a discriminant network to be trained and a strong geometric constraint model of human posture;
obtaining, according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model, the target three-dimensional human body posture data corresponding to the image to be estimated.
Beneficial Effects
In the method, apparatus, device and storage medium for estimating a three-dimensional human body posture of the present application, the image to be estimated is input into the target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, and the target three-dimensional human body posture data corresponding to the image to be estimated is obtained according to the three-dimensional human body posture data output by the model. The target three-dimensional human body posture estimation model is a model obtained by adversarial training based on the generation network to be trained, the discriminant network to be trained and the strong geometric constraint model of human posture. The adversarial training effectively addresses the problem of having few training samples, and the introduction of the strong geometric constraint model of human posture improves the trained model's prediction performance under interference and occlusion and reduces the probability of abnormal posture prediction.
Description of Drawings
FIG. 1 is a schematic flowchart of a method for estimating a three-dimensional human body posture according to an embodiment of the present application;
FIG. 2 is a schematic structural block diagram of an apparatus for estimating a three-dimensional human body posture according to an embodiment of the present application;
FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
The realization of the purpose, the functional features and the advantages of the present application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.
Embodiments of the Invention
In order to make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
In order to solve the technical problem that three-dimensional human body posture detection in the prior art cannot correctly predict human joint points under interference and occlusion and often produces abnormal posture predictions, the present application proposes a method for estimating a three-dimensional human body posture, which is applied in the technical field of artificial intelligence. The method first uses two-dimensional data for preliminary adversarial training and then uses three-dimensional data for optimized adversarial training to obtain the three-dimensional human body posture estimation model. During the optimized adversarial training, the strong geometric constraint model of human posture is used as a penalty term for interference, occlusion and abnormal postures. The adversarial training effectively addresses the problem of having few training samples, and the introduction of the strong geometric constraint model of human posture improves the trained model's prediction performance under interference and occlusion and reduces the probability of abnormal posture prediction.
Referring to FIG. 1, an embodiment of the present application provides a method for estimating a three-dimensional human body posture, the method comprising:
S1: acquiring an image to be estimated;
S2: inputting the image to be estimated into a target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, wherein the target three-dimensional human body posture estimation model is a model obtained by adversarial training based on a generation network to be trained, a discriminant network to be trained and a strong geometric constraint model of human posture;
S3: obtaining, according to the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model, the target three-dimensional human body posture data corresponding to the image to be estimated.
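
Steps S1 to S3 can be sketched as a thin wrapper around an already trained model. The Python sketch below is illustrative only: `estimate_pose`, `dummy_model`, the nested-list image format and the callable model interface are all assumptions and do not appear in the application. The placeholder model merely returns 16 identical joints so that the flow can be executed.

```python
from typing import Callable, List, Tuple

Joint = Tuple[float, float, float]        # (x, y, z) of one joint
Image = List[List[float]]                 # hypothetical grayscale image as nested lists
PoseModel = Callable[[Image], List[Joint]]


def estimate_pose(image: Image, model: PoseModel) -> List[Joint]:
    """Run S2 and S3 on an image that was acquired in S1."""
    if not image or not image[0]:
        raise ValueError("the image to be estimated is empty")
    pose = model(image)   # S2: the trained model predicts the 3D posture data
    return list(pose)     # S3: the model output is taken as the target posture data


def dummy_model(image: Image) -> List[Joint]:
    """Placeholder for a trained generation network (assumption for illustration)."""
    height, width = len(image), len(image[0])
    return [(width / 2.0, height / 2.0, 0.0)] * 16
```

A call such as `estimate_pose(image, dummy_model)` returns one `(x, y, z)` triple per joint. A real deployment would replace `dummy_model` with the generation network obtained from the adversarial training.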
In this embodiment, the image to be estimated is input into the target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, and the target three-dimensional human body posture data corresponding to the image to be estimated is obtained according to the three-dimensional human body posture data output by the model. The target three-dimensional human body posture estimation model is a model obtained by adversarial training based on the generation network to be trained, the discriminant network to be trained and the strong geometric constraint model of human posture. The adversarial training effectively addresses the problem of having few training samples, and the introduction of the strong geometric constraint model of human posture improves the trained model's prediction performance under interference and occlusion and reduces the probability of abnormal posture prediction.
For S1, the image to be estimated may be input by a user, obtained from a database, or sent by a third-party application system.
The image to be estimated is a digital image that contains a human body and whose three-dimensional human body posture needs to be estimated.
For S2, the image to be estimated is input into the target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, and the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model is obtained.
The three-dimensional human body posture data is the three-dimensional coordinate data of 16 joints of the human body. The three-dimensional coordinate data is expressed as (x, y, z), where x is the abscissa in the image, y is the ordinate in the image, and z is the depth value coordinate.
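
The data layout described above, 16 joints each with an (x, y, z) coordinate, can be illustrated with a small Python helper. The function name `as_pose` and the assumption that a model emits a flat sequence of 48 numbers are hypothetical; the application only specifies the 16 joints and the meaning of the three coordinates.

```python
from typing import List, Sequence, Tuple

NUM_JOINTS = 16  # number of joints stated in the text

Joint3D = Tuple[float, float, float]  # (x, y, z): image abscissa, image ordinate, depth


def as_pose(flat: Sequence[float]) -> List[Joint3D]:
    """Group a flat sequence of 48 numbers into 16 (x, y, z) triples.

    The flat layout x0, y0, z0, x1, y1, z1, ... is an assumption made
    for this sketch, not something the source specifies.
    """
    if len(flat) != NUM_JOINTS * 3:
        raise ValueError(f"expected {NUM_JOINTS * 3} values, got {len(flat)}")
    return [
        (float(flat[i]), float(flat[i + 1]), float(flat[i + 2]))
        for i in range(0, len(flat), 3)
    ]
```

Validating the length up front makes a malformed model output fail early instead of silently producing a posture with missing joints.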
Specifically, a plurality of two-dimensional joint point training samples are first used to perform preliminary adversarial training on the generation network to be trained and the discriminant network to be trained. After the preliminary adversarial training reaches the convergence condition, three-dimensional joint point training samples are used to continue with optimized adversarial training. During the optimized adversarial training, the constraint that the strong geometric constraint model of human posture places on the human posture is applied in the loss function of the generation network. Finally, the generation network whose optimized adversarial training reaches the convergence condition is used as the target three-dimensional human body posture estimation model, which improves the trained model's prediction performance under interference and occlusion and reduces the probability of abnormal posture prediction.
The strong geometric constraint model of human posture is used to predict, from the three-dimensional human body posture data, whether the human posture is correct. Its implementation can be selected from the prior art and is not described in detail here.
For S3, the three-dimensional human body posture data output by the target three-dimensional human body posture estimation model is used as the target three-dimensional human body posture data corresponding to the image to be estimated.
In one embodiment, before the above step of inputting the image to be estimated into the target three-dimensional human body posture estimation model to perform three-dimensional human body posture estimation, the method further includes:
S021: obtaining a plurality of two-dimensional joint point training samples and a plurality of three-dimensional joint point training samples;
S022: using the plurality of two-dimensional joint point training samples to perform preliminary adversarial training on the generation network to be trained and the discriminant network to be trained, to obtain a generation network to be optimized and a discriminant network to be optimized;
S023: obtaining a penalty function according to the strong geometric constraint model of human posture;
S024: using the penalty function and the plurality of three-dimensional joint point training samples to perform optimized adversarial training on the generation network to be optimized and the discriminant network to be optimized, and using the generation network to be optimized whose optimized adversarial training reaches the convergence condition as the target three-dimensional human body posture estimation model.
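
The two-stage schedule of steps S021 to S024 can be sketched as follows. This Python sketch only shows the ordering of the stages: the names `run_stage`, `train_two_stage` and `record_step`, the dictionary stand-ins for the networks, and the fixed epoch count used in place of the convergence conditions are all assumptions made for illustration. The actual adversarial update is abstracted behind a caller-supplied `step` function.

```python
from typing import Any, Callable, Dict, Optional, Sequence

Penalty = Callable[[Any], float]
Network = Dict[str, Any]  # stand-in for a real network object (assumption)
StepFn = Callable[[Network, Network, Any, Optional[Penalty]], None]


def run_stage(gen: Network, disc: Network, samples: Sequence[Any],
              step: StepFn, penalty: Optional[Penalty] = None,
              max_epochs: int = 1) -> None:
    """Apply one adversarial update per sample for a fixed number of epochs."""
    for _ in range(max_epochs):
        for sample in samples:
            step(gen, disc, sample, penalty)


def train_two_stage(gen: Network, disc: Network,
                    samples_2d: Sequence[Any], samples_3d: Sequence[Any],
                    penalty: Penalty, step: StepFn,
                    max_epochs: int = 1) -> Network:
    """S022 without penalty on 2D samples, then S024 with penalty on 3D samples."""
    run_stage(gen, disc, samples_2d, step, None, max_epochs)     # preliminary stage
    run_stage(gen, disc, samples_3d, step, penalty, max_epochs)  # optimized stage
    return gen  # the generation network serves as the target model


def record_step(gen: Network, disc: Network, sample: Any,
                penalty: Optional[Penalty]) -> None:
    """Toy update that only logs which sample was seen and whether a penalty applied."""
    gen["log"].append((sample, penalty is not None))
```

Keeping the update rule behind `step` lets the same loop serve both stages, with the penalty function supplied only in the second stage, which mirrors the order of S022 and S024.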
本实施例实现了基于待训练的生成网络、待训练的判别网络和人体姿态强几何约束模型进行对抗训练得到目标三维人体姿态估计模型,对抗训练有效解决了训练样本较少的问题,引入人体姿态强几何约束模型提高了训练得到的模型对干扰遮挡的预测性能,降低了不正常姿态预测出现的概率。In this embodiment, the target three-dimensional human pose estimation model is obtained by adversarial training based on the generation network to be trained, the discrimination network to be trained, and the strong geometric constraint model of human posture. The strong geometric constraint model improves the prediction performance of the trained model for interference occlusion, and reduces the probability of abnormal posture prediction.
对于S021,可以获取用户输入的多个二维关节点训练样本,也可以从数据库中获取多个二维关节点训练样本,还可以是第三方应用***发送的多个二维关节点训练样本。For S021, multiple two-dimensional joint point training samples input by the user may be obtained, multiple two-dimensional joint point training samples may also be obtained from a database, or multiple two-dimensional joint point training samples sent by a third-party application system.
可以获取用户输入的多个三维关节点训练样本,也可以从数据库中获取多个三维关节点训练样本,还可以是第三方应用***发送的多个三维关节点训练样本。Multiple 3D joint point training samples input by the user can be obtained, multiple 3D joint point training samples can also be obtained from a database, or multiple 3D joint point training samples sent by a third-party application system.
Each two-dimensional joint point training sample includes one piece of first image sample data and one human body two-dimensional joint point calibration value (that is, a ground-truth annotation). In a two-dimensional joint point training sample, the human body two-dimensional joint point calibration value is the result of annotating the position data of the two-dimensional pose of the human body in the first image sample data.
The first image sample data is a digital image containing a human body.
Each three-dimensional joint point training sample includes one piece of second image sample data and one human body three-dimensional joint point calibration value. In a three-dimensional joint point training sample, the human body three-dimensional joint point calibration value is the result of annotating the position data of the three-dimensional pose of the human body in the second image sample data.
The second image sample data is a digital image containing a human body.
For S022, preliminary adversarial training is performed on the generation network to be trained and the discriminant network to be trained using the plurality of two-dimensional joint point training samples. The generation network to be trained whose preliminary adversarial training has reached the convergence condition is taken as the generation network to be optimized, and the discriminant network to be trained whose preliminary adversarial training has reached the convergence condition is taken as the discriminant network to be optimized. In this way, the generation network to be optimized and the discriminant network to be optimized learn part of the two-dimensional joint point information in advance, which helps to address the poor prediction performance that results from having few three-dimensional joint point training samples.
For S023, the human pose strong geometric constraint model imposes geometric constraints on the human pose in three respects: joint point position, joint angle, and bone length. When the output of the human pose strong geometric constraint model indicates an abnormality, the geometric constraints on the human pose are used as the penalty function.
For S024, optimization adversarial training is performed on the generation network to be optimized and the discriminant network to be optimized using the plurality of three-dimensional joint point training samples. During optimization adversarial training, the penalty function serves as the penalty term of the loss function of the generation network to be optimized. This strengthens the ability of the target three-dimensional human pose estimation model to capture deeper spatial characteristics of the image, and makes it possible to capture image spatial characteristics at multiple scales and thereby re-check whether the joint points are predicted correctly, which improves the trained model's prediction performance under interference and occlusion and reduces the probability of predicting abnormal poses.
In one embodiment, the above step of performing preliminary adversarial training on the generation network to be trained and the discriminant network to be trained using the plurality of two-dimensional joint point training samples, to obtain the generation network to be optimized and the discriminant network to be optimized, includes:
S0221: Obtain one two-dimensional joint point training sample from the plurality of two-dimensional joint point training samples as a target two-dimensional joint point training sample, where the target two-dimensional joint point training sample includes first image sample data and a human body two-dimensional joint point calibration value;
S0222: Input the first image sample data of the target two-dimensional joint point training sample into the generation network to be trained for human body two-dimensional joint point prediction, to obtain a human body two-dimensional joint point sample prediction value;
S0223: Discriminate between the human body two-dimensional joint point calibration value of the target two-dimensional joint point training sample and the human body two-dimensional joint point sample prediction value, to obtain a first confidence result;
S0224: Perform preliminary adversarial training on the generation network to be trained and the discriminant network to be trained using the human body two-dimensional joint point calibration value of the target two-dimensional joint point training sample, the human body two-dimensional joint point sample prediction value, and the first confidence result;
S0225: Repeat the step of obtaining one two-dimensional joint point training sample from the plurality of two-dimensional joint point training samples as the target two-dimensional joint point training sample until preliminary adversarial training reaches the convergence condition, take the generation network to be trained whose preliminary adversarial training has reached the convergence condition as the generation network to be optimized, and take the discriminant network to be trained whose preliminary adversarial training has reached the convergence condition as the discriminant network to be optimized.
This embodiment enables the generation network to be optimized and the discriminant network to be optimized to learn part of the two-dimensional joint point information in advance, which helps to address the poor prediction performance that results from having few three-dimensional joint point training samples.
For S0221, one two-dimensional joint point training sample is obtained from the plurality of two-dimensional joint point training samples, and the obtained sample is used as the target two-dimensional joint point training sample.
For S0222, the first image sample data of the target two-dimensional joint point training sample is input into the generation network to be trained for human body two-dimensional joint point prediction, and the prediction result is used as the human body two-dimensional joint point sample prediction value corresponding to the target two-dimensional joint point training sample.
The human body two-dimensional joint point sample prediction value is the predicted position data of the human body two-dimensional joint points in the first image sample data of the target two-dimensional joint point training sample.
For S0223, the human body two-dimensional joint point calibration value of the target two-dimensional joint point training sample and the human body two-dimensional joint point sample prediction value are discriminated, and the confidence result obtained is used as the first confidence result corresponding to the human body two-dimensional joint point sample prediction value.
For S0224, the human body two-dimensional joint point calibration value of the target two-dimensional joint point training sample and the human body two-dimensional joint point sample prediction value are used to calculate the loss value of, and update the parameters of, the generation network to be trained, and the first confidence result is used to calculate the loss value of, and update the parameters of, the discriminant network to be trained.
For S0225, steps S0221 to S0225 are repeated until preliminary adversarial training reaches the convergence condition. The generation network to be trained whose preliminary adversarial training has reached the convergence condition is taken as the generation network to be optimized, and the discriminant network to be trained whose preliminary adversarial training has reached the convergence condition is taken as the discriminant network to be optimized, so that the generation network and the discriminant network learn part of the two-dimensional joint point information in advance.
Preliminary adversarial training reaches the convergence condition when either of the following holds: the loss value of the generation network to be trained and the loss value of the discriminant network to be trained both reach a first convergence condition, or the number of training iterations of preliminary adversarial training reaches a second convergence condition.
Here, the requirement on the two loss values means that, in preliminary adversarial training, the loss value of the generation network to be trained and the loss value of the discriminant network to be trained must both reach the first convergence condition.
The first convergence condition means that, for the same network (that is, either the generation network to be trained or the discriminant network to be trained), the loss values computed in two adjacent iterations satisfy the Lipschitz condition (Lipschitz continuity condition).
The number of training iterations of preliminary adversarial training is the number of times the generation network to be trained and the discriminant network to be trained have been used for preliminary adversarial training; that is, each round of preliminary adversarial training increases the count by 1.
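The loop of steps S0221 to S0225 and its stopping rule can be sketched as follows. This is a minimal illustration and not the implementation of the application: the callables `generator`, `discriminator`, `update_generator`, and `update_discriminator` are hypothetical placeholders, and the first convergence condition is read here as "the two most recent loss values of a network differ by at most a fixed bound", which is only one possible interpretation of the Lipschitz condition mentioned above.

```python
def loss_change_bounded(losses, bound):
    """Assumed first condition for one network: the two most recent
    loss values differ by at most `bound`."""
    return len(losses) >= 2 and abs(losses[-1] - losses[-2]) <= bound


def has_converged(gen_losses, disc_losses, bound, max_steps):
    """Converged when both networks meet the first condition, or when the
    number of training iterations meets the second condition."""
    first = (loss_change_bounded(gen_losses, bound)
             and loss_change_bounded(disc_losses, bound))
    second = len(gen_losses) >= max_steps
    return first or second


def preliminary_adversarial_training(samples, generator, discriminator,
                                     update_generator, update_discriminator,
                                     bound=1e-3, max_steps=1000):
    """Run the S0221-S0225 loop; returns the number of iterations performed.

    All callables are hypothetical stand-ins supplied by the caller.
    """
    gen_losses, disc_losses = [], []
    while not has_converged(gen_losses, disc_losses, bound, max_steps):
        # S0221: pick one 2D training sample (image, calibration value).
        image, calibration = samples[len(gen_losses) % len(samples)]
        # S0222: the generation network predicts the 2D joint points.
        prediction = generator(image)
        # S0223: the discriminant network yields the first confidence result.
        confidence = discriminator(calibration, prediction)
        # S0224: compute each loss value and update each network.
        gen_losses.append(update_generator(calibration, prediction))
        disc_losses.append(update_discriminator(confidence))
    return len(gen_losses)
```

The iteration count is derived from the length of the recorded loss history, so each round of training increases it by exactly 1, matching the description of the second convergence condition.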
In one embodiment, the above step of performing preliminary adversarial training on the generation network to be trained and the discriminant network to be trained using the human body two-dimensional joint point calibration value of the target two-dimensional joint point training sample, the human body two-dimensional joint point sample prediction value, and the first confidence result includes:
S02241: Input the human body two-dimensional joint point calibration value of the target two-dimensional joint point training sample and the human body two-dimensional joint point sample prediction value into the loss function of the generation network to be trained for calculation, to obtain a first loss value of the generation network to be trained, and update the parameters of the generation network to be trained according to the first loss value;
S02242: Input the first confidence result into the loss function of the discriminant network to be trained for calculation, to obtain a second loss value of the discriminant network to be trained, and update the parameters of the discriminant network to be trained according to the second loss value;
where the loss function of the generation network to be trained is the MSE (mean squared error) loss function, and the loss function of the discriminant network to be trained is the cross-entropy loss function.
In this embodiment, preliminary adversarial training is performed on the generation network to be trained and the discriminant network to be trained using the human body two-dimensional joint point calibration value of the target two-dimensional joint point training sample, the human body two-dimensional joint point sample prediction value, and the first confidence result, so that the generation network to be optimized and the discriminant network to be optimized learn part of the two-dimensional joint point information in advance, which helps to address the poor prediction performance that results from having few three-dimensional joint point training samples.
For S02241, the human body two-dimensional joint point calibration value of the target two-dimensional joint point training sample and the human body two-dimensional joint point sample prediction value are input into the loss function of the generation network to be trained (that is, the MSE loss function) to calculate a loss value, and the calculated loss value is used as the first loss value of the generation network to be trained.
The method of updating the parameters of the generation network to be trained according to the first loss value may be selected from the prior art and is not described in detail here.
The implementation of the MSE loss function may be selected from the prior art and is not described in detail here.
For S02242, the first confidence result is input into the loss function of the discriminant network to be trained to calculate a loss value, and the calculated loss value is used as the second loss value of the discriminant network to be trained.
The method of updating the parameters of the discriminant network to be trained according to the second loss value may be selected from the prior art and is not described in detail here.
The implementation of the cross-entropy loss function may be selected from the prior art and is not described in detail here.
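Since the text leaves both loss functions to the prior art, the following is only a plain-Python sketch of their standard textbook forms. It assumes joint coordinates are flattened into a list of numbers and that the confidence result is a probability in (0, 1) paired with a binary label (1 for a calibration value, 0 for a prediction); the label convention is an assumption and is not stated in the text.

```python
import math


def mse_loss(predicted, calibration):
    """Mean squared error between predicted and annotated joint coordinates,
    both given as flat sequences of equal length."""
    squared = [(p - c) ** 2 for p, c in zip(predicted, calibration)]
    return sum(squared) / len(squared)


def cross_entropy_loss(confidence, label, eps=1e-12):
    """Binary cross-entropy for one confidence result.

    `label` is 1 for a real (annotated) pose and 0 for a generated pose;
    this labelling is an illustrative assumption.
    """
    c = min(max(confidence, eps), 1.0 - eps)  # keep log() finite
    return -(label * math.log(c) + (1 - label) * math.log(1.0 - c))
```

For instance, `mse_loss([1.0, 2.0], [1.0, 4.0])` averages the squared errors 0 and 4, and a confidence of 0.5 costs `log 2` regardless of the label.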
In one embodiment, the above step of obtaining a penalty function according to the human pose strong geometric constraint model includes:
S0231: Apply a joint point position constraint according to the human pose strong geometric constraint model, to obtain a joint point position penalty term;
S0232: Apply a joint angle constraint according to the human pose strong geometric constraint model, to obtain a joint angle penalty term;
S0233: Apply a bone length constraint according to the human pose strong geometric constraint model, to obtain a bone length penalty term;
S0234: Obtain the penalty function according to the joint point position penalty term, the joint angle penalty term, and the bone length penalty term.
In this embodiment, introducing the human pose strong geometric constraint model improves the trained model's prediction performance under interference and occlusion and reduces the probability of predicting abnormal poses.
For S0231, the geometric constraint on joint point positions is mainly reflected in the relative positional relationships that exist between the positions of different joint points in the human body.
The human body three-dimensional joint point position data is input into the human pose strong geometric constraint model, whose prediction result is the deviation of the relative positional relationships of the joint points; the squared loss of this deviation is used as the joint point position penalty term.
For S0232, the geometric constraint on joint angles mainly reflects the fact that human joint movement and bone bending are subject to certain geometric limits, so abnormal geometric configurations do not occur.
The human body three-dimensional joint point position data is input into the human pose strong geometric constraint model; when the prediction result of the model is that a bone is abnormally bent, the squared loss of the abnormal bending is used as the joint angle penalty term. For example, suppose the normal joint angle between bone A and bone B in the human body is $C_{11}$, and the actual joint angle between bone A and bone B in the human body three-dimensional joint point position data is $C_{12}$. When $C_{12}$ is greater than $C_{11}$, the squared loss $(C_{12}-C_{11})^2$ is used as the penalty term for the joint angle between bone A and bone B; when $C_{12}$ is less than or equal to $C_{11}$, the value of that penalty term is set to 0. This example is not limiting.
As a further example, suppose the human body has joint angles $C_1$, $C_2$, $C_3$, and $C_4$, and that after the human body three-dimensional joint point position data is input into the human pose strong geometric constraint model, the joint angles $C_2$ and $C_3$ are predicted to be abnormal. The joint angle penalty term is then $0+(C_{22}-C_{21})^2+(C_{32}-C_{31})^2+0$, where the first 0 is the penalty term for joint angle $C_1$, $(C_{22}-C_{21})^2$ is the penalty term for joint angle $C_2$, $(C_{32}-C_{31})^2$ is the penalty term for joint angle $C_3$, and the second 0 is the penalty term for joint angle $C_4$. This example is not limiting.
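The two examples above amount to a one-sided squared penalty summed over all joints. The sketch below illustrates that arithmetic in plain Python; the helper `joint_angle`, which derives an angle from three joint positions, is a hypothetical addition for illustration and is not described in the text.

```python
import math


def joint_angle(parent, joint, child):
    """Angle in degrees at `joint` between the bones joint->parent and
    joint->child. Assumes neither bone has zero length."""
    u = [p - j for p, j in zip(parent, joint)]
    v = [c - j for c, j in zip(child, joint)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def joint_angle_penalty(actual_angles, normal_angles):
    """Sum of (actual - normal)^2 over joints whose actual angle exceeds the
    normal angle; joints within the normal range contribute 0."""
    return sum(max(0.0, actual - normal) ** 2
               for actual, normal in zip(actual_angles, normal_angles))
```

With four joints where only the second and third exceed their normal angles by 20 and 10 degrees, the penalty is `0 + 400 + 100 + 0`, mirroring the four-term sum in the example.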
For S0233, the geometric constraint on bone length is mainly reflected in the proportional relationships that exist among the bones of the human body.
When the human pose strong geometric constraint model is trained, bone proportion statistics are first collected from the training data set, and the correct bone proportions are used as the comparison threshold of the human pose strong geometric constraint model.
The human body three-dimensional joint point position data is input into the human pose strong geometric constraint model; when the prediction result of the model is that the bone proportions deviate from the established bone proportions, the squared loss of the deviation is used as the bone length penalty term.
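One way to picture the bone length penalty is to compare each bone's length ratio against a reference ratio and sum the squared deviations. The sketch below makes several assumptions that the text does not specify: ratios are taken relative to the first bone in the list, the reference ratios are supplied by the caller (standing in for the statistics gathered from the training data set), and every deviation is penalized without a tolerance band.

```python
import math


def bone_lengths(joints, bones):
    """Euclidean length of each bone, where a bone is a pair of joint indices."""
    return [math.dist(joints[i], joints[j]) for i, j in bones]


def bone_length_penalty(joints, bones, reference_ratios):
    """Squared deviation of each bone's length ratio (relative to the first
    bone, an illustrative choice) from its reference ratio, summed over bones."""
    lengths = bone_lengths(joints, bones)
    ratios = [length / lengths[0] for length in lengths]
    return sum((ratio - reference) ** 2
               for ratio, reference in zip(ratios, reference_ratios))
```

A pose whose second bone is as long as the first, when the reference ratio is 0.5, yields a penalty of `0.25`; a pose that matches the reference ratios yields `0.0`.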
For S0234, the joint point position penalty term, the joint angle penalty term, and the bone length penalty term are added together into a single function, and the resulting function is used as the penalty function.
Optionally, if the joint point position penalty term is $P_1$, the joint angle penalty term is $P_2$, and the bone length penalty term is $P_3$, then the sum $\lambda_1 P_1+\lambda_2 P_2+\lambda_3 P_3$ is used as the penalty function, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the penalty coefficients of the respective penalty terms, with values in the range $\lambda_1,\lambda_2,\lambda_3\in(0,1)$.
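The weighted sum can be written directly. In this sketch the function name, the default coefficient values, and the range check raising `ValueError` are illustrative choices; the text only specifies the form of the sum and the open interval (0, 1) for the coefficients.

```python
def penalty_function(p1, p2, p3, coefficients=(0.5, 0.5, 0.5)):
    """Weighted sum of the joint point position penalty (p1), the joint angle
    penalty (p2), and the bone length penalty (p3).

    Each coefficient must lie strictly between 0 and 1.
    """
    if not all(0.0 < lam < 1.0 for lam in coefficients):
        raise ValueError("penalty coefficients must lie in the open interval (0, 1)")
    lam1, lam2, lam3 = coefficients
    return lam1 * p1 + lam2 * p2 + lam3 * p3
```

For example, penalties of 2, 4, and 6 with all coefficients set to 0.5 combine to `6.0`.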
In one embodiment, the above step of performing optimization adversarial training on the generation network to be optimized and the discriminant network to be optimized using the penalty function and the plurality of three-dimensional joint point training samples, and taking the generation network to be optimized whose optimization adversarial training has reached the convergence condition as the target three-dimensional human pose estimation model, includes:
S0241: Obtain the loss function of the generation network to be trained and the loss function of the discriminant network to be trained;
S0242: Use the loss function of the discriminant network to be trained as the loss function of the discriminant network to be optimized;
S0243: Add the penalty function as a penalty term to the loss function of the generation network to be trained, to obtain the loss function of the generation network to be optimized;
S0244: Perform optimization adversarial training on the generation network to be optimized and the discriminant network to be optimized using the loss function of the generation network to be optimized, the loss function of the discriminant network to be optimized, and the plurality of three-dimensional joint point training samples, and take the generation network to be optimized whose optimization adversarial training has reached the convergence condition as the target three-dimensional human pose estimation model.
In this embodiment, the penalty function obtained from the human pose strong geometric constraint model is added as a penalty term to the loss function of the generation network to be trained. This strengthens the ability of the target three-dimensional human pose estimation model to capture deeper spatial characteristics of the image, makes it possible to capture image spatial characteristics at multiple scales and thereby re-check whether the joint points are predicted correctly, improves the trained model's prediction performance under interference and occlusion, and reduces the probability of predicting abnormal poses.
For S0241, the loss function of the generation network to be trained and the loss function of the discriminant network to be trained may be input by a user, obtained from a database, or sent by a third-party application system. It can be understood that these two loss functions may also be written into the program file that implements the present application.
For S0242, optionally, the loss function of the discriminant network to be trained, that is, the cross-entropy loss function, is used as the loss function of the discriminant network to be optimized.
For S0243, the penalty function is subtracted from the loss function of the generation network to be trained, and the resulting function is used as the loss function of the generation network to be optimized. In other words, the penalty function enters as a subtracted term in order to impose the penalty.
For example, if the loss function of the generation network to be trained is $L_{gen}$ and the penalty function is $\lambda_1 P_1+\lambda_2 P_2+\lambda_3 P_3$, subtracting the penalty function from the loss function gives $L_{gen}-(\lambda_1 P_1+\lambda_2 P_2+\lambda_3 P_3)$, which is used as the loss function of the generation network to be optimized.
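As a small numeric illustration of S0243, the sketch below combines a base loss value with an already computed penalty value. The function and parameter names are hypothetical. The subtraction follows the text literally; whether a penalty is subtracted or added in a concrete implementation depends on how $L_{gen}$ is defined and optimized, which this section does not specify.

```python
def optimized_generator_loss(base_loss, penalty):
    """Loss of the generation network to be optimized, following the text:
    the penalty function value is subtracted from the base loss L_gen."""
    return base_loss - penalty


# Worked example with assumed values: L_gen = 2.5 and a penalty of
# 0.5 * 0.2 + 0.5 * 0.6 + 0.5 * 0.2 = 0.5.
example_loss = optimized_generator_loss(2.5, 0.5)
```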
For S0244, optimization adversarial training is performed on the generation network to be optimized and the discriminant network to be optimized using the plurality of three-dimensional joint point training samples, with loss values calculated from the loss function of the generation network to be optimized and the loss function of the discriminant network to be optimized. After optimization adversarial training, the generation network to be optimized has learned both two-dimensional information and three-dimensional depth information, so the generation network to be optimized whose optimization adversarial training has reached the convergence condition can be taken as the target three-dimensional human pose estimation model.
In one embodiment, the above step of performing optimization adversarial training on the generation network to be optimized and the discriminant network to be optimized using the loss function of the generation network to be optimized, the loss function of the discriminant network to be optimized, and the plurality of three-dimensional joint point training samples, and taking the generation network to be optimized whose optimization adversarial training has reached the convergence condition as the target three-dimensional human pose estimation model, includes:
S02441: Obtain one three-dimensional joint point training sample from the plurality of three-dimensional joint point training samples as a target three-dimensional joint point training sample, where the target three-dimensional joint point training sample includes second image sample data and a human body three-dimensional joint point calibration value;
S02442: Input the second image sample data of the target three-dimensional joint point training sample into the generation network to be optimized for human body three-dimensional joint point prediction, to obtain a human body three-dimensional joint point sample prediction value;
S02443: Discriminate between the human body three-dimensional joint point calibration value of the target three-dimensional joint point training sample and the human body three-dimensional joint point sample prediction value, to obtain a second confidence result;
S02444: Perform optimization adversarial training on the generation network to be optimized and the discriminant network to be optimized using the loss function of the generation network to be optimized, the loss function of the discriminant network to be optimized, the human body three-dimensional joint point calibration value of the target three-dimensional joint point training sample, the human body three-dimensional joint point sample prediction value, and the second confidence result;
S02445: Repeat the step of obtaining one three-dimensional joint point training sample from the plurality of three-dimensional joint point training samples as the target three-dimensional joint point training sample until optimization adversarial training reaches the convergence condition, and take the generation network to be optimized whose optimization adversarial training has reached the convergence condition as the target three-dimensional human pose estimation model.
In this embodiment, the penalty function obtained from the human pose strong geometric constraint model is added as a penalty term to the loss function of the generation network to be trained. This strengthens the ability of the target three-dimensional human pose estimation model to capture deeper spatial characteristics of the image, makes it possible to capture image spatial characteristics at multiple scales and thereby re-check whether the joint points are predicted correctly, improves the trained model's prediction performance under interference and occlusion, and reduces the probability of predicting abnormal poses. Regression on three-dimensional data also allows the model to learn three-dimensional depth information.
For S02441, one three-dimensional joint point training sample is obtained from the plurality of three-dimensional joint point training samples, and the obtained sample is used as the target three-dimensional joint point training sample.
For S02442, the second image sample data of the target three-dimensional joint point training sample is input into the generation network to be trained to predict the three-dimensional joint points of the human body, and the prediction result is used as the human three-dimensional joint point sample predicted value corresponding to the target three-dimensional joint point training sample.
The human three-dimensional joint point sample predicted value is the predicted position data of the three-dimensional joint points of the human body in the second image sample data of the target three-dimensional joint point training sample.
For S02443, the human three-dimensional joint point calibration value of the target three-dimensional joint point training sample and the human three-dimensional joint point sample predicted value are discriminated, and the resulting confidence result is used as the second confidence result corresponding to the human three-dimensional joint point sample predicted value.
For S02444, the human three-dimensional joint point calibration value of the target three-dimensional joint point training sample and the human three-dimensional joint point sample predicted value are used to calculate the loss value of, and update the parameters of, the generation network to be trained, and the second confidence result is used to calculate the loss value of, and update the parameters of, the discrimination network to be trained.
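The discriminator side of step S02444 can be sketched as follows, assuming that the confidence result is a pair of probabilities in (0, 1), one for the calibration value and one for the predicted value, and that a cross-entropy loss is used as stated in the claims. The function names, the clipping constant `eps`, and the plain gradient descent update are assumptions made for illustration; the application does not specify an optimizer.

```python
import math


def binary_cross_entropy(confidence, label, eps=1e-12):
    """Cross-entropy between one confidence in (0, 1) and a 0/1 label."""
    c = min(max(confidence, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(c) + (1.0 - label) * math.log(1.0 - c))


def discriminator_loss(conf_calibrated, conf_predicted):
    """Calibrated joints should score near 1, predicted joints near 0."""
    return (binary_cross_entropy(conf_calibrated, 1.0)
            + binary_cross_entropy(conf_predicted, 0.0))


def gradient_step(params, grads, learning_rate=0.1):
    """One plain gradient descent update (an assumed update rule)."""
    return [p - learning_rate * g for p, g in zip(params, grads)]
```

A discriminator that cannot tell the two inputs apart (both confidences 0.5) has a loss of 2 ln 2, and the loss falls as its confidences separate.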
For S02445, steps S02441 to S02445 are repeated until the optimization adversarial training reaches the convergence condition. After the optimization adversarial training, the generation network to be optimized has learned both two-dimensional information and three-dimensional depth information, so the generation network to be optimized for which the optimization adversarial training has reached the convergence condition can be used as the target three-dimensional human pose estimation model.
The optimization adversarial training reaches the convergence condition when either the loss value of the generation network to be optimized and the loss value of the discrimination network to be optimized both reach a third convergence condition, or the number of training rounds of the optimization adversarial training reaches a fourth convergence condition.
Here, the loss value of the generation network to be optimized and the loss value of the discrimination network to be optimized reaching the third convergence condition means that, in the optimization adversarial training, both of these loss values reach the third convergence condition.
The third convergence condition means that the loss values from two adjacent calculations for the same network (that is, either the generation network to be optimized or the discrimination network to be optimized) satisfy the Lipschitz condition (Lipschitz continuity condition).
For the fourth convergence condition, the number of training rounds of the optimization adversarial training is the number of times the generation network to be optimized and the discrimination network to be optimized have been used in optimization adversarial training; that is, each round of optimization adversarial training increases this count by 1.
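The stopping rule described above can be sketched as a small check. This is only one possible reading: the Lipschitz condition on adjacent loss values is interpreted here as the absolute change between the two most recent losses being at most a constant `bound`, and the fourth condition as a round count reaching `max_rounds`. Both thresholds and all names are assumptions and do not appear in the application.

```python
def losses_stable(history, bound):
    """True when the two most recent losses differ by at most `bound`."""
    return len(history) >= 2 and abs(history[-1] - history[-2]) <= bound


def has_converged(gen_losses, disc_losses, rounds, bound=1e-3, max_rounds=10000):
    """Stop when both networks are stable, or when the round budget is used up."""
    third = losses_stable(gen_losses, bound) and losses_stable(disc_losses, bound)
    fourth = rounds >= max_rounds
    return third or fourth
```

In a training loop, `rounds` would be incremented by 1 after each pass through steps S02441 to S02444, and the loop would exit once `has_converged` returns true.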
Referring to FIG. 2, the present application further provides an apparatus for estimating a three-dimensional human pose, the apparatus comprising:
a data acquisition module 100, configured to acquire an image to be estimated;
a three-dimensional human pose estimation module 200, configured to input the image to be estimated into a target three-dimensional human pose estimation model to perform three-dimensional human pose estimation, wherein the target three-dimensional human pose estimation model is a model obtained by adversarial training based on a generation network to be trained, a discrimination network to be trained, and a strong geometric constraint model of human pose;
a target three-dimensional human pose data determination module 300, configured to obtain, according to the three-dimensional human pose data output by the target three-dimensional human pose estimation model, target three-dimensional human pose data corresponding to the image to be estimated.
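The three modules listed above can be pictured as three stages of a small pipeline. The sketch below is a hypothetical arrangement: the class name, the method names, and the representation of a pose as a list of (x, y, z) tuples are assumptions, and the trained model is stood in for by any callable.

```python
from typing import Callable, List, Sequence, Tuple

Joint = Tuple[float, float, float]


class PoseEstimationApparatus:
    """Illustrative wrapper around a trained pose model (any callable)."""

    def __init__(self, model: Callable[[Sequence], Sequence[Sequence[float]]]):
        self.model = model

    def acquire(self, source: Sequence) -> Sequence:
        """Data acquisition module 100: here the image is passed through as-is."""
        return source

    def estimate(self, image: Sequence) -> Sequence[Sequence[float]]:
        """Pose estimation module 200: run the image through the model."""
        return self.model(image)

    def determine(self, raw_output: Sequence[Sequence[float]]) -> List[Joint]:
        """Determination module 300: normalize output into (x, y, z) tuples."""
        return [tuple(float(c) for c in joint) for joint in raw_output]

    def run(self, source: Sequence) -> List[Joint]:
        return self.determine(self.estimate(self.acquire(source)))
```

For example, wrapping a stub such as `lambda image: [[0, 1, 2]]` and calling `run` on any input returns `[(0.0, 1.0, 2.0)]`.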
Referring to FIG. 3, an embodiment of the present application further provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store data such as that of the method for estimating a three-dimensional human pose. The network interface of the computer device is used to communicate with an external terminal over a network connection. When executed by the processor, the computer program implements a method for estimating a three-dimensional human pose. The method includes: acquiring an image to be estimated; inputting the image to be estimated into a target three-dimensional human pose estimation model to perform three-dimensional human pose estimation, wherein the target three-dimensional human pose estimation model is a model obtained by adversarial training based on a generation network to be trained, a discrimination network to be trained, and a strong geometric constraint model of human pose; and obtaining, according to the three-dimensional human pose data output by the target three-dimensional human pose estimation model, target three-dimensional human pose data corresponding to the image to be estimated.
In this embodiment, the image to be estimated is input into the target three-dimensional human pose estimation model to perform three-dimensional human pose estimation, and the target three-dimensional human pose data corresponding to the image to be estimated is obtained according to the three-dimensional human pose data output by the model. The target three-dimensional human pose estimation model is a model obtained by adversarial training based on the generation network to be trained, the discrimination network to be trained, and the strong geometric constraint model of human pose. The adversarial training effectively addresses the problem of having few training samples, and introducing the strong geometric constraint model of human pose improves the trained model's prediction performance under interference and occlusion and reduces the probability of abnormal pose predictions.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a method for estimating a three-dimensional human pose, including the steps of: acquiring an image to be estimated; inputting the image to be estimated into a target three-dimensional human pose estimation model to perform three-dimensional human pose estimation, wherein the target three-dimensional human pose estimation model is a model obtained by adversarial training based on a generation network to be trained, a discrimination network to be trained, and a strong geometric constraint model of human pose; and obtaining, according to the three-dimensional human pose data output by the target three-dimensional human pose estimation model, target three-dimensional human pose data corresponding to the image to be estimated.
In the method for estimating a three-dimensional human pose executed as above, the image to be estimated is input into the target three-dimensional human pose estimation model to perform three-dimensional human pose estimation, and the target three-dimensional human pose data corresponding to the image to be estimated is obtained according to the three-dimensional human pose data output by the model. The target three-dimensional human pose estimation model is a model obtained by adversarial training based on the generation network to be trained, the discrimination network to be trained, and the strong geometric constraint model of human pose. The adversarial training effectively addresses the problem of having few training samples, and introducing the strong geometric constraint model of human pose improves the trained model's prediction performance under interference and occlusion and reduces the probability of abnormal pose predictions.
The computer storage medium may be non-volatile or volatile.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium provided in the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, herein, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A method for estimating a three-dimensional human pose, wherein the method comprises:
    acquiring an image to be estimated;
    inputting the image to be estimated into a target three-dimensional human pose estimation model to perform three-dimensional human pose estimation, wherein the target three-dimensional human pose estimation model is a model obtained by adversarial training based on a generation network to be trained, a discrimination network to be trained, and a strong geometric constraint model of human pose;
    obtaining, according to three-dimensional human pose data output by the target three-dimensional human pose estimation model, target three-dimensional human pose data corresponding to the image to be estimated.
  2. The method for estimating a three-dimensional human pose according to claim 1, wherein before the step of inputting the image to be estimated into the target three-dimensional human pose estimation model to perform three-dimensional human pose estimation, the method further comprises:
    acquiring a plurality of two-dimensional joint point training samples and a plurality of three-dimensional joint point training samples;
    performing preliminary adversarial training on the generation network to be trained and the discrimination network to be trained by using the plurality of two-dimensional joint point training samples, to obtain a generation network to be optimized and a discrimination network to be optimized;
    obtaining a penalty function according to the strong geometric constraint model of human pose;
    performing optimization adversarial training on the generation network to be optimized and the discrimination network to be optimized by using the penalty function and the plurality of three-dimensional joint point training samples, and using the generation network to be optimized for which the optimization adversarial training has reached a convergence condition as the target three-dimensional human pose estimation model.
  3. The method for estimating a three-dimensional human pose according to claim 2, wherein the step of performing preliminary adversarial training on the generation network to be trained and the discrimination network to be trained by using the plurality of two-dimensional joint point training samples, to obtain the generation network to be optimized and the discrimination network to be optimized, comprises:
    obtaining one two-dimensional joint point training sample from the plurality of two-dimensional joint point training samples as a target two-dimensional joint point training sample, the target two-dimensional joint point training sample comprising first image sample data and a human two-dimensional joint point calibration value;
    inputting the first image sample data of the target two-dimensional joint point training sample into the generation network to be trained to predict two-dimensional joint points of the human body, to obtain a human two-dimensional joint point sample predicted value;
    discriminating the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample and the human two-dimensional joint point sample predicted value, to obtain a first confidence result;
    performing preliminary adversarial training on the generation network to be trained and the discrimination network to be trained by using the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample, the human two-dimensional joint point sample predicted value, and the first confidence result;
    repeating the step of obtaining one two-dimensional joint point training sample from the plurality of two-dimensional joint point training samples as the target two-dimensional joint point training sample until the preliminary adversarial training reaches a convergence condition, using the generation network to be trained for which the preliminary adversarial training has reached the convergence condition as the generation network to be optimized, and using the discrimination network to be trained for which the preliminary adversarial training has reached the convergence condition as the discrimination network to be optimized.
  4. The method for estimating a three-dimensional human pose according to claim 3, wherein the step of performing preliminary adversarial training on the generation network to be trained and the discrimination network to be trained by using the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample, the human two-dimensional joint point sample predicted value, and the first confidence result comprises:
    inputting the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample and the human two-dimensional joint point sample predicted value into a loss function of the generation network to be trained for calculation, to obtain a first loss value of the generation network to be trained, and updating parameters of the generation network to be trained according to the first loss value;
    inputting the first confidence result into a loss function of the discrimination network to be trained for calculation, to obtain a second loss value of the discrimination network to be trained, and updating parameters of the discrimination network to be trained according to the second loss value;
    wherein the loss function of the generation network to be trained is an MSE loss function, and the loss function of the discrimination network to be trained is a cross-entropy loss function.
  5. The method for estimating a three-dimensional human pose according to claim 2, wherein the step of obtaining a penalty function according to the strong geometric constraint model of human pose comprises:
    applying a joint point position constraint according to the strong geometric constraint model of human pose, to obtain a joint point position penalty term;
    applying a joint angle constraint according to the strong geometric constraint model of human pose, to obtain a joint angle penalty term;
    applying a bone length constraint according to the strong geometric constraint model of human pose, to obtain a bone length penalty term;
    obtaining the penalty function according to the joint point position penalty term, the joint angle penalty term, and the bone length constraint.
  6. The method for estimating a three-dimensional human pose according to claim 2, wherein the step of performing optimization adversarial training on the generation network to be optimized and the discrimination network to be optimized by using the penalty function and the plurality of three-dimensional joint point training samples, and using the generation network to be optimized for which the optimization adversarial training has reached the convergence condition as the target three-dimensional human pose estimation model, comprises:
    obtaining the loss function of the generation network to be trained and the loss function of the discrimination network to be trained;
    using the loss function of the discrimination network to be trained as a loss function of the discrimination network to be optimized;
    adding the penalty function as a penalty term to the loss function of the generation network to be trained, to obtain a loss function of the generation network to be optimized;
    performing optimization adversarial training on the generation network to be optimized and the discrimination network to be optimized by using the loss function of the generation network to be optimized, the loss function of the discrimination network to be optimized, and the plurality of three-dimensional joint point training samples, and using the generation network to be optimized for which the optimization adversarial training has reached the convergence condition as the target three-dimensional human pose estimation model.
  7. The method for estimating a three-dimensional human pose according to claim 6, wherein the step of performing optimization adversarial training on the generation network to be optimized and the discrimination network to be optimized by using the loss function of the generation network to be optimized, the loss function of the discrimination network to be optimized, and the plurality of three-dimensional joint point training samples, and using the generation network to be optimized for which the optimization adversarial training has reached the convergence condition as the target three-dimensional human pose estimation model, comprises:
    obtaining one three-dimensional joint point training sample from the plurality of three-dimensional joint point training samples as a target three-dimensional joint point training sample, the target three-dimensional joint point training sample comprising second image sample data and a human three-dimensional joint point calibration value;
    inputting the second image sample data of the target three-dimensional joint point training sample into the generation network to be optimized to predict three-dimensional joint points of the human body, to obtain a human three-dimensional joint point sample predicted value;
    discriminating the human three-dimensional joint point calibration value of the target three-dimensional joint point training sample and the human three-dimensional joint point sample predicted value, to obtain a second confidence result;
    performing optimization adversarial training on the generation network to be optimized and the discrimination network to be optimized by using the loss function of the generation network to be optimized, the loss function of the discrimination network to be optimized, the human three-dimensional joint point calibration value of the target three-dimensional joint point training sample, the human three-dimensional joint point sample predicted value, and the second confidence result;
    repeating the step of obtaining one three-dimensional joint point training sample from the plurality of three-dimensional joint point training samples as the target three-dimensional joint point training sample until the optimization adversarial training reaches the convergence condition, and using the generation network to be optimized for which the optimization adversarial training has reached the convergence condition as the target three-dimensional human pose estimation model.
  8. An apparatus for estimating a three-dimensional human pose, wherein the apparatus comprises:
    a data acquisition module, configured to acquire an image to be estimated;
    a three-dimensional human pose estimation module, configured to input the image to be estimated into a target three-dimensional human pose estimation model to perform three-dimensional human pose estimation, wherein the target three-dimensional human pose estimation model is a model obtained by adversarial training based on a generation network to be trained, a discrimination network to be trained, and a strong geometric constraint model of human pose;
    a target three-dimensional human pose data determination module, configured to obtain, according to three-dimensional human pose data output by the target three-dimensional human pose estimation model, target three-dimensional human pose data corresponding to the image to be estimated.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the following method steps when executing the computer program:
    acquiring an image to be estimated;
    inputting the image to be estimated into a target three-dimensional human pose estimation model to perform three-dimensional human pose estimation, wherein the target three-dimensional human pose estimation model is a model obtained by adversarial training based on a generation network to be trained, a discrimination network to be trained, and a strong geometric constraint model of human pose;
    obtaining, according to three-dimensional human pose data output by the target three-dimensional human pose estimation model, target three-dimensional human pose data corresponding to the image to be estimated.
  10. The computer device according to claim 9, wherein before the step of inputting the image to be estimated into the target three-dimensional human pose estimation model to perform three-dimensional human pose estimation, the method steps further comprise:
    acquiring a plurality of two-dimensional joint point training samples and a plurality of three-dimensional joint point training samples;
    performing preliminary adversarial training on the generation network to be trained and the discrimination network to be trained by using the plurality of two-dimensional joint point training samples, to obtain a generation network to be optimized and a discrimination network to be optimized;
    obtaining a penalty function according to the strong geometric constraint model of human pose;
    performing optimization adversarial training on the generation network to be optimized and the discrimination network to be optimized by using the penalty function and the plurality of three-dimensional joint point training samples, and using the generation network to be optimized for which the optimization adversarial training has reached a convergence condition as the target three-dimensional human pose estimation model.
  11. The computer device according to claim 10, wherein the step of performing preliminary adversarial training on the generative network to be trained and the discriminative network to be trained using the plurality of two-dimensional joint point training samples, to obtain the generative network to be optimized and the discriminative network to be optimized, comprises:
    obtaining one two-dimensional joint point training sample from the plurality of two-dimensional joint point training samples as a target two-dimensional joint point training sample, the target two-dimensional joint point training sample comprising first image sample data and a human two-dimensional joint point calibration value;
    inputting the first image sample data of the target two-dimensional joint point training sample into the generative network to be trained for human two-dimensional joint point prediction, to obtain a human two-dimensional joint point sample prediction value;
    discriminating between the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample and the human two-dimensional joint point sample prediction value, to obtain a first confidence result;
    performing preliminary adversarial training on the generative network to be trained and the discriminative network to be trained using the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample, the human two-dimensional joint point sample prediction value, and the first confidence result;
    repeating the step of obtaining one two-dimensional joint point training sample from the plurality of two-dimensional joint point training samples as a target two-dimensional joint point training sample until the preliminary adversarial training reaches a convergence condition, taking the generative network to be trained whose preliminary adversarial training has reached the convergence condition as the generative network to be optimized, and taking the discriminative network to be trained whose preliminary adversarial training has reached the convergence condition as the discriminative network to be optimized.
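The control flow of the steps above can be sketched as a plain loop. This is only an illustrative sketch: the function name, the callables passed in, the `max_steps` guard, and the round-robin sample selection are all assumptions introduced here, and the stand-in lambdas in the usage are not real networks.

```python
def preliminary_adversarial_training(samples, generator, discriminator,
                                     update_generator, update_discriminator,
                                     has_converged, max_steps=1000):
    """Run the per-sample loop until the convergence callback returns True.

    All parameters are hypothetical stand-ins; `max_steps` is a safety guard
    added for this sketch and is not part of the claimed steps.
    """
    steps = 0
    while steps < max_steps:
        # Take one 2D joint point training sample as the target sample
        # (round-robin selection is an assumption of this sketch).
        image, calibration = samples[steps % len(samples)]
        # The generator predicts 2D joint points from the image sample data.
        prediction = generator(image)
        # The discriminator compares calibration and prediction values,
        # yielding the first confidence result.
        confidence = discriminator(calibration, prediction)
        # Both networks are updated from these quantities.
        update_generator(calibration, prediction)
        update_discriminator(confidence)
        steps += 1
        if has_converged(steps):
            break
    return steps


# Toy usage with stand-in callables (no real networks involved).
log = []
steps_run = preliminary_adversarial_training(
    samples=[("img0", [(0.0, 0.0)]), ("img1", [(1.0, 1.0)])],
    generator=lambda image: [(0.5, 0.5)],
    discriminator=lambda calibration, prediction: (0.9, 0.1),
    update_generator=lambda calibration, prediction: log.append("G"),
    update_discriminator=lambda confidence: log.append("D"),
    has_converged=lambda steps: steps >= 3,
)
```

In a real setting the callables would wrap network forward passes and optimizer steps; the sketch only shows the order in which the claimed steps occur.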
  12. The computer device according to claim 11, wherein the step of performing preliminary adversarial training on the generative network to be trained and the discriminative network to be trained using the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample, the human two-dimensional joint point sample prediction value, and the first confidence result comprises:
    inputting the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample and the human two-dimensional joint point sample prediction value into the loss function of the generative network to be trained for calculation, to obtain a first loss value of the generative network to be trained, and updating parameters of the generative network to be trained according to the first loss value;
    inputting the first confidence result into the loss function of the discriminative network to be trained for calculation, to obtain a second loss value of the discriminative network to be trained, and updating parameters of the discriminative network to be trained according to the second loss value;
    wherein the loss function of the generative network to be trained is an MSE (mean squared error) loss function, and the loss function of the discriminative network to be trained is a cross-entropy loss function.
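The two loss functions named above can be illustrated with a minimal pure-Python sketch. It assumes that joint points are (x, y) tuples and that the first confidence result is a list of probabilities with target 1 for a calibration value and target 0 for a generated prediction; these representations, the function names, and the `eps` clamp are assumptions made here for illustration.

```python
import math


def mse_loss(predicted, calibrated):
    """Mean squared error over all coordinates of the 2D joint points."""
    total = 0.0
    count = 0
    for (px, py), (cx, cy) in zip(predicted, calibrated):
        total += (px - cx) ** 2 + (py - cy) ** 2
        count += 2
    return total / count


def cross_entropy_loss(confidences, targets, eps=1e-12):
    """Binary cross-entropy of discriminator confidences.

    Assumption: target 1 marks a calibration value, target 0 a generated
    prediction. `eps` clamps probabilities to keep log() finite.
    """
    total = 0.0
    for p, t in zip(confidences, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(confidences)


calibrated = [(0.0, 0.0), (1.0, 1.0)]
predicted = [(0.0, 1.0), (1.0, 1.0)]
first_loss = mse_loss(predicted, calibrated)               # generator side
second_loss = cross_entropy_loss([0.9, 0.2], [1, 0])       # discriminator side
```

Here only one of the four coordinates differs by 1, so the first loss value is 1/4; the second loss value is the average of -ln(0.9) and -ln(0.8).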
  13. The computer device according to claim 10, wherein the step of obtaining a penalty function according to the strong geometric constraint model of human pose comprises:
    applying a joint point position constraint according to the strong geometric constraint model of human pose, to obtain a joint point position penalty term;
    applying a joint angle constraint according to the strong geometric constraint model of human pose, to obtain a joint angle penalty term;
    applying a bone length constraint according to the strong geometric constraint model of human pose, to obtain a bone length penalty term;
    obtaining the penalty function according to the joint point position penalty term, the joint angle penalty term, and the bone length constraint.
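The claim does not give formulas for the three penalty terms, so the following is only one possible sketch. It assumes squared hinge penalties (zero inside the allowed range), a reach limit measured from a root joint for the position term, per-joint angle limits in radians, reference bone lengths, and a weighted sum as the combination; every name, parameter, and functional form here is hypothetical.

```python
import math


def joint_position_penalty(joints, root_index, max_reach):
    """Penalize joints lying farther from the root than `max_reach`."""
    root = joints[root_index]
    return sum(max(0.0, math.dist(j, root) - max_reach) ** 2 for j in joints)


def joint_angle_penalty(joints, triples, limits):
    """Penalize angles at joint b (between b->a and b->c) outside [low, high].

    Assumes the joints in each triple are distinct points (non-zero vectors).
    """
    total = 0.0
    for (a, b, c), (low, high) in zip(triples, limits):
        u = [p - q for p, q in zip(joints[a], joints[b])]
        v = [p - q for p, q in zip(joints[c], joints[b])]
        dot = sum(x * y for x, y in zip(u, v))
        cosine = dot / (math.hypot(*u) * math.hypot(*v))
        angle = math.acos(max(-1.0, min(1.0, cosine)))
        total += max(0.0, low - angle) ** 2 + max(0.0, angle - high) ** 2
    return total


def bone_length_penalty(joints, bones, reference_lengths):
    """Penalize deviation of each bone from its reference length."""
    return sum((math.dist(joints[a], joints[b]) - ref) ** 2
               for (a, b), ref in zip(bones, reference_lengths))


def penalty_function(joints, root_index, max_reach, triples, limits,
                     bones, reference_lengths, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms (the weighting is an assumption)."""
    w_pos, w_ang, w_len = weights
    return (w_pos * joint_position_penalty(joints, root_index, max_reach)
            + w_ang * joint_angle_penalty(joints, triples, limits)
            + w_len * bone_length_penalty(joints, bones, reference_lengths))


# Three 3D joints forming a right angle; only the reach limit is violated.
joints = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
penalty = penalty_function(
    joints, root_index=0, max_reach=1.0,
    triples=[(0, 1, 2)], limits=[(0.0, math.pi)],
    bones=[(0, 1), (1, 2)], reference_lengths=[1.0, 1.0],
)
```

In this toy pose both bones have unit length and the angle is within its limits, so the only contribution is the third joint exceeding the reach limit by sqrt(2) - 1.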
  14. The computer device according to claim 10, wherein the step of performing optimization adversarial training on the generative network to be optimized and the discriminative network to be optimized using the penalty function and the plurality of three-dimensional joint point training samples, and taking the generative network to be optimized whose optimization adversarial training has reached a convergence condition as the target three-dimensional human pose estimation model, comprises:
    obtaining the loss function of the generative network to be trained and the loss function of the discriminative network to be trained;
    taking the loss function of the discriminative network to be trained as the loss function of the discriminative network to be optimized;
    adding the penalty function as a penalty term to the loss function of the generative network to be trained, to obtain the loss function of the generative network to be optimized;
    performing optimization adversarial training on the generative network to be optimized and the discriminative network to be optimized using the loss function of the generative network to be optimized, the loss function of the discriminative network to be optimized, and the plurality of three-dimensional joint point training samples, and taking the generative network to be optimized whose optimization adversarial training has reached the convergence condition as the target three-dimensional human pose estimation model.
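Adding the penalty function to the generator loss can be sketched as simple function composition. The helper name, the `weight` factor, and the toy base loss and penalty below are assumptions for illustration; the claim only states that the penalty function is added as a penalty term.

```python
import math


def make_optimized_generator_loss(base_loss, penalty_fn, weight=1.0):
    """Return a loss equal to base_loss + weight * penalty (hypothetical helper)."""
    def loss(predicted, calibrated):
        return base_loss(predicted, calibrated) + weight * penalty_fn(predicted)
    return loss


def mse_3d(predicted, calibrated):
    """Toy base loss: mean squared error over 3D joint coordinates."""
    diffs = [(p - c) ** 2
             for pj, cj in zip(predicted, calibrated)
             for p, c in zip(pj, cj)]
    return sum(diffs) / len(diffs)


def single_bone_penalty(predicted):
    """Toy penalty: squared deviation of one bone from unit length."""
    return (math.dist(predicted[0], predicted[1]) - 1.0) ** 2


optimized_loss = make_optimized_generator_loss(mse_3d, single_bone_penalty,
                                               weight=0.5)
value = optimized_loss(
    predicted=[(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)],
    calibrated=[(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
)
```

The discriminator loss is left unchanged in this stage, so only the generator side needs such a wrapper.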
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following method steps:
    obtaining an image to be estimated;
    inputting the image to be estimated into a target three-dimensional human pose estimation model for three-dimensional human pose estimation, wherein the target three-dimensional human pose estimation model is a model obtained by adversarial training based on a generative network to be trained, a discriminative network to be trained, and a strong geometric constraint model of human pose;
    obtaining target three-dimensional human pose data corresponding to the image to be estimated according to three-dimensional human pose data output by the target three-dimensional human pose estimation model.
  16. The computer-readable storage medium according to claim 15, wherein, before the step of inputting the image to be estimated into the target three-dimensional human pose estimation model for three-dimensional human pose estimation, the method further comprises:
    obtaining a plurality of two-dimensional joint point training samples and a plurality of three-dimensional joint point training samples;
    performing preliminary adversarial training on the generative network to be trained and the discriminative network to be trained using the plurality of two-dimensional joint point training samples, to obtain a generative network to be optimized and a discriminative network to be optimized;
    obtaining a penalty function according to the strong geometric constraint model of human pose;
    performing optimization adversarial training on the generative network to be optimized and the discriminative network to be optimized using the penalty function and the plurality of three-dimensional joint point training samples, and taking the generative network to be optimized whose optimization adversarial training has reached a convergence condition as the target three-dimensional human pose estimation model.
  17. The computer-readable storage medium according to claim 16, wherein the step of performing preliminary adversarial training on the generative network to be trained and the discriminative network to be trained using the plurality of two-dimensional joint point training samples, to obtain the generative network to be optimized and the discriminative network to be optimized, comprises:
    obtaining one two-dimensional joint point training sample from the plurality of two-dimensional joint point training samples as a target two-dimensional joint point training sample, the target two-dimensional joint point training sample comprising first image sample data and a human two-dimensional joint point calibration value;
    inputting the first image sample data of the target two-dimensional joint point training sample into the generative network to be trained for human two-dimensional joint point prediction, to obtain a human two-dimensional joint point sample prediction value;
    discriminating between the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample and the human two-dimensional joint point sample prediction value, to obtain a first confidence result;
    performing preliminary adversarial training on the generative network to be trained and the discriminative network to be trained using the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample, the human two-dimensional joint point sample prediction value, and the first confidence result;
    repeating the step of obtaining one two-dimensional joint point training sample from the plurality of two-dimensional joint point training samples as a target two-dimensional joint point training sample until the preliminary adversarial training reaches a convergence condition, taking the generative network to be trained whose preliminary adversarial training has reached the convergence condition as the generative network to be optimized, and taking the discriminative network to be trained whose preliminary adversarial training has reached the convergence condition as the discriminative network to be optimized.
  18. The computer-readable storage medium according to claim 17, wherein the step of performing preliminary adversarial training on the generative network to be trained and the discriminative network to be trained using the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample, the human two-dimensional joint point sample prediction value, and the first confidence result comprises:
    inputting the human two-dimensional joint point calibration value of the target two-dimensional joint point training sample and the human two-dimensional joint point sample prediction value into the loss function of the generative network to be trained for calculation, to obtain a first loss value of the generative network to be trained, and updating parameters of the generative network to be trained according to the first loss value;
    inputting the first confidence result into the loss function of the discriminative network to be trained for calculation, to obtain a second loss value of the discriminative network to be trained, and updating parameters of the discriminative network to be trained according to the second loss value;
    wherein the loss function of the generative network to be trained is an MSE (mean squared error) loss function, and the loss function of the discriminative network to be trained is a cross-entropy loss function.
  19. The computer-readable storage medium according to claim 16, wherein the step of obtaining a penalty function according to the strong geometric constraint model of human pose comprises:
    applying a joint point position constraint according to the strong geometric constraint model of human pose, to obtain a joint point position penalty term;
    applying a joint angle constraint according to the strong geometric constraint model of human pose, to obtain a joint angle penalty term;
    applying a bone length constraint according to the strong geometric constraint model of human pose, to obtain a bone length penalty term;
    obtaining the penalty function according to the joint point position penalty term, the joint angle penalty term, and the bone length constraint.
  20. The computer-readable storage medium according to claim 16, wherein the step of performing optimization adversarial training on the generative network to be optimized and the discriminative network to be optimized using the penalty function and the plurality of three-dimensional joint point training samples, and taking the generative network to be optimized whose optimization adversarial training has reached a convergence condition as the target three-dimensional human pose estimation model, comprises:
    obtaining the loss function of the generative network to be trained and the loss function of the discriminative network to be trained;
    taking the loss function of the discriminative network to be trained as the loss function of the discriminative network to be optimized;
    adding the penalty function as a penalty term to the loss function of the generative network to be trained, to obtain the loss function of the generative network to be optimized;
    performing optimization adversarial training on the generative network to be optimized and the discriminative network to be optimized using the loss function of the generative network to be optimized, the loss function of the discriminative network to be optimized, and the plurality of three-dimensional joint point training samples, and taking the generative network to be optimized whose optimization adversarial training has reached the convergence condition as the target three-dimensional human pose estimation model.
PCT/CN2021/084570 2021-02-26 2021-03-31 Three-dimensional human pose estimation method and apparatus, device, and storage medium WO2022178951A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110219606.4A CN112949462B (en) 2021-02-26 2021-02-26 Three-dimensional human body posture estimation method, device, equipment and storage medium
CN202110219606.4 2021-02-26

Publications (1)

Publication Number Publication Date
WO2022178951A1 true WO2022178951A1 (en) 2022-09-01

Family

ID=76246626

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084570 WO2022178951A1 (en) 2021-02-26 2021-03-31 Three-dimensional human pose estimation method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN112949462B (en)
WO (1) WO2022178951A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114712835B (en) * 2022-03-25 2022-10-14 中国地质大学(武汉) Supplementary training system based on two mesh human position appearance discernments
CN114663593B (en) * 2022-03-25 2023-04-07 清华大学 Three-dimensional human body posture estimation method, device, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038465A (en) * 2017-12-25 2018-05-15 深圳市唯特视科技有限公司 A kind of three-dimensional more personage's Attitude estimations based on generated data collection
CN111062326B (en) * 2019-12-02 2023-07-25 北京理工大学 Self-supervision human body 3D gesture estimation network training method based on geometric driving
CN111401151B (en) * 2020-02-28 2022-09-06 中国科学技术大学 Accurate three-dimensional hand posture estimation method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200342270A1 (en) * 2019-04-26 2020-10-29 Tata Consultancy Services Limited Weakly supervised learning of 3d human poses from 2d poses
CN110427877A (en) * 2019-08-01 2019-11-08 大连海事大学 A method of the human body three-dimensional posture estimation based on structural information
CN111311729A (en) * 2020-01-18 2020-06-19 西安电子科技大学 Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network
CN111553968A (en) * 2020-05-11 2020-08-18 青岛联合创智科技有限公司 Method for reconstructing animation by three-dimensional human body
CN112016494A (en) * 2020-09-03 2020-12-01 中科人工智能创新技术研究院(青岛)有限公司 Three-dimensional human body posture estimation method and system based on neural network structure search

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MUHAMMED KOCABAS; SALIH KARAGOZ; EMRE AKBAS: "Self-Supervised Learning of 3D Human Pose using Multi-view Geometry", ARXIV.ORG, 6 March 2019 (2019-03-06), pages 1 - 10, XP081162347 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115601505A (en) * 2022-11-07 2023-01-13 广州趣丸网络科技有限公司(Cn) Human body three-dimensional posture restoration method and device, electronic equipment and storage medium
CN115601505B (en) * 2022-11-07 2023-03-14 广州趣丸网络科技有限公司 Human body three-dimensional posture restoration method and device, electronic equipment and storage medium
CN117292407A (en) * 2023-11-27 2023-12-26 安徽炬视科技有限公司 3D human body posture estimation method and system
CN117292407B (en) * 2023-11-27 2024-03-26 安徽炬视科技有限公司 3D human body posture estimation method and system
CN117456612A (en) * 2023-12-26 2024-01-26 西安龙南铭科技有限公司 Cloud computing-based body posture automatic assessment method and system
CN117456612B (en) * 2023-12-26 2024-03-12 西安龙南铭科技有限公司 Cloud computing-based body posture automatic assessment method and system
CN117671738A (en) * 2024-02-01 2024-03-08 山东大学 Human body posture recognition system based on artificial intelligence
CN117671738B (en) * 2024-02-01 2024-04-23 山东大学 Human body posture recognition system based on artificial intelligence

Also Published As

Publication number Publication date
CN112949462B (en) 2023-12-19
CN112949462A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
WO2022178951A1 (en) Three-dimensional human pose estimation method and apparatus, device, and storage medium
US20210374474A1 (en) Method, apparatus, and electronic device for training neural network model
US11783199B2 (en) Image description information generation method and apparatus, and electronic device
CN113792682B (en) Face quality assessment method, device, equipment and medium based on face image
CN112200165A (en) Model training method, human body posture estimation method, device, equipment and medium
CN112818963B (en) Training method, device and equipment of face recognition model and storage medium
CN111221981A (en) Method and device for training knowledge graph embedded model and computer storage medium
CN112990478B (en) Federal learning data processing system
CN110009663B (en) Target tracking method, device, equipment and computer readable storage medium
CN116524062B (en) Diffusion model-based 2D human body posture estimation method
CN112330569A (en) Model training method, text denoising method, device, equipment and storage medium
CN116071601A (en) Method, apparatus, device and medium for training model
CN109785372B (en) Basic matrix robust estimation method based on soft decision optimization
CN111369449A (en) Infrared blind pixel compensation method based on generating type countermeasure network
CN110570487A (en) Undersampling model generation method, image reconstruction method, device and computer equipment
CN109934926B (en) Model data processing method, device, readable storage medium and equipment
CN115880546A (en) Confrontation robustness evaluation method based on class activation mapping chart and terminal equipment
CN115223199A (en) Pig behavior data equalization method and device, computer equipment and storage medium
CN107958229B (en) Face recognition method, device and equipment based on neighbor keeping low-rank representation
CN112183283A (en) Age estimation method, device, equipment and storage medium based on image
CN112989788A (en) Method, device, equipment and medium for extracting relation triples
CN113077379A (en) Method, device, equipment and storage medium for extracting characteristic latent codes
CN111199513A (en) Image processing method, computer device, and storage medium
US20190325318A1 (en) Method and system for learning in a trustless environment
CN115565051B (en) Lightweight face attribute recognition model training method, recognition method and device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21927385; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 21927385; Country of ref document: EP; Kind code of ref document: A1)