CN113033369A - Motion capture method, motion capture device, electronic equipment and computer-readable storage medium - Google Patents

Motion capture method, motion capture device, electronic equipment and computer-readable storage medium

Info

Publication number
CN113033369A
CN113033369A CN202110292819.XA
Authority
CN
China
Prior art keywords
foot
video image
joint point
ground
target person
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110292819.XA
Other languages
Chinese (zh)
Other versions
CN113033369B (en)
Inventor
赵培尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110292819.XA priority Critical patent/CN113033369B/en
Publication of CN113033369A publication Critical patent/CN113033369A/en
Application granted granted Critical
Publication of CN113033369B publication Critical patent/CN113033369B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure relates to a motion capture method, a motion capture apparatus, an electronic device and a computer-readable storage medium, and belongs to the field of image processing. The method comprises the following steps: when the foot of a target person in any frame of video image is in contact with the ground, the position and orientation of the target person's ankle joint point in that frame are optimized using the position and orientation of the ankle joint point in the next frame of video image, yielding the ankle position and orientation that make the target person's postures in adjacent frames as consistent as possible; the positions and orientations of the other joint points in the frame are then adjusted according to the target person's body posture, on the basis of the optimized ankle joint point, so that the target person in the frame keeps the original posture while the feet stand stably on the ground without sliding, and the motion capture effect is better.

Description

Motion capture method, motion capture device, electronic equipment and computer-readable storage medium
Technical Field
The present disclosure relates to the field of motion capture technologies, and in particular, to a motion capture method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Motion capture is a technology that accurately measures and records, in real time, the motion trajectories and postures of a moving object in real three-dimensional space and reconstructs the object's state of motion at each moment in a virtual three-dimensional space. It is widely applied in live entertainment, human-computer interaction, virtual content production and other areas, for example to drive virtual characters. Commonly used motion capture technologies include single-view motion capture, multi-view motion capture, inertial motion capture and the like; limited by the capture accuracy of these methods, the motion capture results often exhibit problems such as unstable feet when a virtual character stands or feet sliding on the ground when it walks, which degrades the user's viewing experience.
In the related art, a Ground IK (Inverse Kinematics) method is adopted to solve the foot-slip problem of virtual characters in a video after motion capture. The specific process is as follows: a ray is cast vertically downward from the virtual character's foot to compute the contact position between the foot and the ground and the distance between them; if that distance is smaller than a certain threshold, inverse kinematics is used to fix the foot at the contact position.
Although the above method can ensure that the foot of the virtual character steps on the ground, the method cannot eliminate the sliding of the foot in the horizontal direction, and therefore, the motion capture effect is not good.
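The related-art check described above can be sketched as follows; the function name, the choice of y as the vertical axis, and the threshold value are illustrative assumptions, not details from the patent.

```python
def ground_ik_check(foot_pos, ground_height=0.0, threshold=0.02):
    """Cast a conceptual ray straight down from the foot: measure the
    vertical gap to a flat ground plane and report whether the foot
    should be pinned at the ray's contact point.

    foot_pos: (x, y, z) with y as the vertical axis.
    """
    gap = foot_pos[1] - ground_height                    # foot-to-ground distance
    contact_point = (foot_pos[0], ground_height, foot_pos[2])
    return gap < threshold, contact_point

# A foot 1 cm above the ground is pinned; a foot 50 cm up is left free.
near = ground_ik_check((0.3, 0.01, 1.2))
far = ground_ik_check((0.3, 0.50, 1.2))
```

As the background notes, this pins the foot vertically at the contact point but constrains nothing in the horizontal plane, which is exactly the sliding the disclosed method goes on to address.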
Disclosure of Invention
The present disclosure provides a motion capture method, a motion capture apparatus, an electronic device and a computer-readable storage medium, so as to at least solve the problem of poor motion capture effect in the related art. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a motion capture method, the method comprising:
carrying out initial motion capture on a target person in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to its parent node;
converting the local rotation matrix of each joint point relative to its parent node into a global rotation matrix of each joint point relative to the root node according to the human body skeleton structure;
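The local-to-global conversion walks the skeleton from the root outward, chaining each joint's local rotation onto its parent's global rotation. A minimal sketch, assuming joints are topologically ordered so every parent precedes its children (the skeleton layout is illustrative):

```python
import numpy as np

def local_to_global(parents, local_R):
    """parents[j] is the parent index of joint j (-1 for the root);
    local_R[j] is that joint's 3x3 local rotation matrix.
    Returns the list of global rotation matrices relative to the root."""
    global_R = [None] * len(parents)
    for j, p in enumerate(parents):
        if p < 0:                       # root: local frame equals global frame
            global_R[j] = local_R[j]
        else:                           # chain onto the parent's global rotation
            global_R[j] = global_R[p] @ local_R[j]
    return global_R
```

For any non-root joint j this computes global_R[j] = global_R[parent] @ local_R[j], the standard forward-kinematics chaining that the conversion step relies on.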
calculating the distance between the foot and the ground according to the three-dimensional coordinates of the foot joint points;
calculating the foot speed according to the two-dimensional coordinates of the foot key points;
determining the contact condition between the foot and the ground according to the distance between the foot and the ground and the foot speed;
in response to the contact between the foot of the target person in the n-th frame of video image and the ground, optimizing the three-dimensional coordinates and the global rotation matrix of the ankle joint point, so that the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point of the target person in the (n+1)-th frame of video image, wherein n ≤ N, and N is the total number of frames of video images included in the video to be processed;
and adjusting the three-dimensional coordinates and local rotation matrixes of other joint points of the target person in the nth frame of video image based on the optimized three-dimensional coordinates and global rotation matrixes of the ankle joint points to obtain the processed nth frame of video image.
In another embodiment of the present disclosure, the calculating a distance between the foot and the ground from the three-dimensional coordinates of the foot joint point comprises:
and calculating the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point.
In another embodiment of the present disclosure, before calculating the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point, the method further comprises:
for a target person in each frame of video image, acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of each joint point;
calculating the distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated and the normal vector of the plane where the ground is located;
calculating the sum of the distances between the lowest joint point of the target person in each frame of video image and the ground;
determining the ground position coordinates and the normal vector that minimize the distance sum as the ground position.
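The ground calibration above can be realized in several ways; one simple sketch fits the plane to the per-frame lowest joint points with an SVD least-squares fit. The patent does not specify the optimizer, so this concrete choice is an assumption:

```python
import numpy as np

def calibrate_ground(lowest_joints):
    """Fit the ground plane to the per-frame lowest joint points.

    lowest_joints: (F, 3) array, the lowest joint of the target person in
    each frame. Returns (point_on_plane, unit_normal): the centroid and
    least-variance direction of the points, i.e. the plane minimizing the
    sum of squared point-to-plane distances.
    """
    pts = np.asarray(lowest_joints, dtype=float)
    centroid = pts.mean(axis=0)                # a point on the fitted plane
    _, _, vt = np.linalg.svd(pts - centroid)   # rows of vt: principal axes
    normal = vt[-1]                            # direction of least variance
    return centroid, normal / np.linalg.norm(normal)

def point_plane_distance(p, point, normal):
    """Distance from point p to the plane given by (point, unit normal)."""
    return abs(np.dot(np.asarray(p, dtype=float) - point, normal))
```

When the lowest joints all lie near a common floor, the recovered normal matches the floor's up direction up to sign, and point_plane_distance then supplies the foot-to-ground distance used in the contact test.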
In another embodiment of the present disclosure, the calculating the foot velocity from the two-dimensional coordinates of the foot key points comprises:
splicing the two-dimensional coordinates of the key points of each foot part into a foot part vector of each foot part;
and calculating the foot speed of each foot according to the foot vector of each foot.
In another embodiment of the present disclosure, said calculating a foot velocity of said each foot from said foot vector of said each foot comprises:
calculating the foot velocity of each side foot from the foot vector of that foot by applying the following formula:

$$v_n^i = \frac{\left\lVert f_{n+1}^i - f_n^i \right\rVert}{K \cdot \Delta T}$$

wherein $v_n^i$ represents the foot velocity of the side-$i$ foot of the target person in the $n$-th frame of video image, $f_n^i$ represents the foot vector of that foot in the $n$-th frame of video image, $K$ represents the number of foot key points included in $f_n^i$, $f_{n+1}^i$ represents the foot vector of that foot in the $(n+1)$-th frame of video image, $\Delta T$ represents the acquisition time interval between two adjacent frames of video images, and $i$ represents the left or right side.
In another embodiment of the present disclosure, the determining the contact condition between the foot and the ground according to the distance between the foot and the ground and the foot speed includes:
for a side foot of the target person in the n-th frame of video image, when the distance between that foot and the ground is smaller than a distance threshold and the foot speed of that foot is smaller than a speed threshold, determining that the foot is in contact with the ground.
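The two-threshold contact test can be sketched as follows; the threshold values are illustrative assumptions, as the patent leaves them unspecified:

```python
def foot_contact(dist_to_ground, foot_speed,
                 dist_thresh=0.05, speed_thresh=0.1):
    """A foot is deemed in contact with the ground when it is both close
    to the ground and nearly stationary; either condition alone is not
    enough (a swinging foot may pass close to the floor)."""
    return dist_to_ground < dist_thresh and foot_speed < speed_thresh
```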
In another embodiment of the present disclosure, the optimizing the three-dimensional coordinates of the ankle joint point in response to the contact between the foot of the target person in the $n$-th frame of video image and the ground includes:
in response to the contact between one side foot of the target person in the $n$-th frame of video image and the ground, applying the following formula, based on a first constraint condition, to optimize the three-dimensional coordinates of the ankle joint point of that foot:

$$\tilde{p}_n^i = \begin{cases} \tilde{p}_{n+1}^i, & \text{if the foot is also in contact with the ground in frame } n+1 \\ p_n^i, & \text{otherwise} \end{cases}$$

wherein $\tilde{p}_n^i$ represents the optimized three-dimensional coordinates of the ankle joint point of the side-$i$ foot of the target person in the $n$-th frame of video image, $\tilde{p}_{n+1}^i$ represents the optimized three-dimensional coordinates of the ankle joint point of that foot in the $(n+1)$-th frame of video image, $p_n^i$ represents the original three-dimensional coordinates of the ankle joint point of that foot in the $n$-th frame of video image, and $i$ represents the left or right side.
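One way to realize the equality between adjacent contact frames is a backward sweep that copies each contact frame's optimized ankle position from its successor, so an entire contact run collapses to a single position. Anchoring the run at its last original position is an assumption of this sketch:

```python
import numpy as np

def snap_ankle_positions(ankle_pos, in_contact):
    """ankle_pos: (F, 3) ankle coordinates per frame; in_contact: (F,) bools.
    Within each run of contact frames, the optimized position of frame n is
    forced equal to that of frame n+1, so the whole run shares one position
    (here, the run's last original position)."""
    out = np.array(ankle_pos, dtype=float, copy=True)
    for n in range(len(out) - 2, -1, -1):      # sweep backwards through frames
        if in_contact[n] and in_contact[n + 1]:
            out[n] = out[n + 1]                # optimized p_n = optimized p_{n+1}
    return out
```

The same backward equality pass applies to the ankle's global rotation matrices under the second constraint condition, replacing position copies with rotation-matrix copies.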
In another embodiment of the present disclosure, the optimizing the global rotation matrix of the ankle joint point in response to the contact between the foot and the ground of the target person in the nth frame of video image comprises:
in response to the contact between one side foot of the target person in the $n$-th frame of video image and the ground, applying the following formula, based on a second constraint condition, to optimize the global rotation matrix of the ankle joint point of that foot:

$$\tilde{R}_n^i = \begin{cases} \tilde{R}_{n+1}^i, & \text{if the foot is also in contact with the ground in frame } n+1 \\ R_n^i, & \text{otherwise} \end{cases}$$

wherein $\tilde{R}_n^i$ represents the optimized global rotation matrix of the ankle joint point of the side-$i$ foot of the target person in the $n$-th frame of video image, $\tilde{R}_{n+1}^i$ represents the optimized global rotation matrix of the ankle joint point of that foot in the $(n+1)$-th frame of video image, $R_n^i$ represents the original global rotation matrix of the ankle joint point of that foot in the $n$-th frame of video image, and $i$ represents the left or right side.
In a second aspect, there is provided a motion capture device, the device comprising:
the motion capture module is used for carrying out initial motion capture on a target person in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to its parent node;
the transformation module is used for transforming the local rotation matrix of each joint point relative to its parent node into a global rotation matrix of each joint point relative to the root node according to the human body skeleton structure;
the calculation module is used for calculating the distance between the foot and the ground according to the three-dimensional coordinates of the foot joint points;
the calculation module is used for calculating the foot speed according to the two-dimensional coordinates of the foot key points;
the determining module is used for determining the contact condition between the foot and the ground according to the distance between the foot and the ground and the foot speed;
the optimization module is used for, in response to the contact between the foot of the target person in the n-th frame of video image and the ground, optimizing the three-dimensional coordinates and the global rotation matrix of the ankle joint point, so that the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point of the target person in the (n+1)-th frame of video image, wherein n ≤ N, and N is the total number of frames of video images included in the video to be processed;
and the adjusting module is used for adjusting the three-dimensional coordinates and the local rotation matrix of other joint points of the target person in the nth frame of video image based on the optimized three-dimensional coordinates and the global rotation matrix of the ankle joint points to obtain the processed nth frame of video image.
In another embodiment of the present disclosure, the calculation module is configured to calculate the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point.
In another embodiment of the present disclosure, the apparatus further comprises:
the acquisition module is used for acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of each joint point for the target person in each frame of video image;
the calculation module is further used for calculating the distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated and the normal vector of the plane where the ground is located;
the computing module is further used for computing the sum of the distance between the lowest joint point of the target person in each frame of video image and the ground;
the determining module is further configured to determine the ground position coordinates and the normal vector that minimize the distance sum as the ground position.
In another embodiment of the present disclosure, the calculating module is further configured to splice the two-dimensional coordinates of the key points of each foot into a foot vector of each foot; and calculating the foot speed of each foot according to the foot vector of each foot.
In another embodiment of the present disclosure, the calculation module is configured to calculate the foot velocity of each side foot from the foot vector of that foot by applying the following formula:

$$v_n^i = \frac{\left\lVert f_{n+1}^i - f_n^i \right\rVert}{K \cdot \Delta T}$$

wherein $v_n^i$ represents the foot velocity of the side-$i$ foot of the target person in the $n$-th frame of video image, $f_n^i$ represents the foot vector of that foot in the $n$-th frame of video image, $K$ represents the number of foot key points included in $f_n^i$, $f_{n+1}^i$ represents the foot vector of that foot in the $(n+1)$-th frame of video image, $\Delta T$ represents the acquisition time interval between two adjacent frames of video images, and $i$ represents the left or right side.
In another embodiment of the present disclosure, the determining module is configured to determine, for a side foot of the target person in the nth frame of video image, that the side foot is in contact with the ground when a distance between the side foot and the ground is less than a distance threshold and a foot speed of the side foot is less than a speed threshold.
In another embodiment of the present disclosure, the optimization module is configured to, in response to the contact between one side foot of the target person in the $n$-th frame of video image and the ground, apply the following formula, based on the first constraint condition, to optimize the three-dimensional coordinates of the ankle joint point of that foot:

$$\tilde{p}_n^i = \begin{cases} \tilde{p}_{n+1}^i, & \text{if the foot is also in contact with the ground in frame } n+1 \\ p_n^i, & \text{otherwise} \end{cases}$$

wherein $\tilde{p}_n^i$ represents the optimized three-dimensional coordinates of the ankle joint point of the side-$i$ foot of the target person in the $n$-th frame of video image, $\tilde{p}_{n+1}^i$ represents the optimized three-dimensional coordinates of the ankle joint point of that foot in the $(n+1)$-th frame of video image, $p_n^i$ represents the original three-dimensional coordinates of the ankle joint point of that foot in the $n$-th frame of video image, and $i$ represents the left or right side.
In another embodiment of the present disclosure, the optimization module is configured to, in response to the contact between one side foot of the target person in the $n$-th frame of video image and the ground, apply the following formula, based on the second constraint condition, to optimize the global rotation matrix of the ankle joint point of that foot:

$$\tilde{R}_n^i = \begin{cases} \tilde{R}_{n+1}^i, & \text{if the foot is also in contact with the ground in frame } n+1 \\ R_n^i, & \text{otherwise} \end{cases}$$

wherein $\tilde{R}_n^i$ represents the optimized global rotation matrix of the ankle joint point of the side-$i$ foot of the target person in the $n$-th frame of video image, $\tilde{R}_{n+1}^i$ represents the optimized global rotation matrix of the ankle joint point of that foot in the $(n+1)$-th frame of video image, $R_n^i$ represents the original global rotation matrix of the ankle joint point of that foot in the $n$-th frame of video image, and $i$ represents the left or right side.
In a third aspect, an electronic device is provided, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the motion capture method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, the instructions in which, when executed by a processor of an electronic device, enable the electronic device to perform the motion capture method of the first aspect.
In a fifth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the motion capture method of the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
when the foot of the target person in any frame of video image is in contact with the ground, the position and orientation of the target person's ankle joint point in that frame are optimized using the position and orientation of the ankle joint point in the next frame of video image, yielding the ankle position and orientation that make the target person's postures in adjacent frames as consistent as possible; the positions and orientations of the other joint points in the frame are then adjusted according to the target person's body posture, on the basis of the optimized ankle joint point, so that the target person keeps the original posture while the feet stand stably on the ground without sliding, and the motion capture effect is better.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method of motion capture in accordance with an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of motion capture in accordance with an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a motion capture method in accordance with an exemplary embodiment.
FIG. 4 is a block diagram illustrating a motion capture device, according to an example embodiment.
Fig. 5 shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The user information to which the present disclosure relates may be information authorized by the user or sufficiently authorized by each party.
Optionally, the environment in which the solution is applied may be described first. For example, a technical solution involving multi-terminal interaction needs to specify the network environment and hardware to which it applies before the detailed description of the solution.
With the continuous development of human-computer interaction technology, natural, multi-modal interaction between humans and computers is in full swing, and motion capture, an important branch of human-computer interaction, has developed rapidly in recent years. A motion capture system detects and records the motion postures and positions of a human body or other objects, converts this information into a digital abstract model, and expresses the target's posture at different moments. Motion capture systems are widely used in motion analysis, model coding, virtual reality, animation production, intelligent monitoring, game production and other fields. For example, in an interactive game, a player's actions can drive the actions of virtual characters in the game environment, bringing the player a brand-new participation experience and enhancing the realism and interactivity of the game. In animation production, motion capture greatly improves the development efficiency and quality of animation and games while reducing their development cost. In sports training, motion capture can record quantitative information such as displacement, speed, acceleration, force and electromyographic signals of an athlete during motion; combined with machine learning and human biomechanics, the athlete's movements can be analyzed quantitatively and scientific improvements proposed. Motion capture applied to the human body therefore has broad application prospects and great commercial value. However, currently adopted motion capture methods have low capture precision, so virtual characters in the capture results exhibit foot slip, which affects the user's viewing experience.
In order to solve the problem of foot slip of a virtual character and improve the viewing experience of a user, the embodiment of the disclosure provides a motion capture method. FIG. 1 is a flow chart illustrating a motion capture method, as shown in FIG. 1, for use in an electronic device, including the following steps, according to an example embodiment.
In step S101, performing initial motion capture on a target person in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to a parent node.
In step S102, the local rotation matrix of each joint point with respect to the parent node is converted into a global rotation matrix of each joint point with respect to the root node according to the human skeleton structure.
In step S103, the distance between the foot and the ground is calculated from the three-dimensional coordinates of the foot joint point.
In step S104, the foot speed is calculated from the two-dimensional coordinates of the foot key points.
In step S105, the contact condition between the foot and the ground is determined according to the distance between the foot and the ground and the foot speed.
In step S106, in response to the contact between the foot of the target person in the n-th frame of video image and the ground, the three-dimensional coordinates and the global rotation matrix of the ankle joint point are optimized, so that the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point of the target person in the (n+1)-th frame of video image, where n ≤ N, and N is the total number of frames of video images included in the video to be processed.
In step S107, the three-dimensional coordinates and the local rotation matrix of the other joint points of the target person in the nth frame of video image are adjusted based on the optimized three-dimensional coordinates and the global rotation matrix of the ankle joint points, so as to obtain a processed nth frame of video image.
In the method provided by the embodiment of the disclosure, the three-dimensional coordinates of a joint point reflect its position and the global rotation matrix reflects its orientation. When the foot of the target person in any frame of video image contacts the ground, the position and orientation of the target person's ankle joint point in that frame are optimized using the position and orientation of the ankle joint point in the next frame of video image, yielding the ankle position and orientation that make the postures in adjacent frames as consistent as possible; the positions and orientations of the other joint points in the frame are then adjusted according to the person's posture, on the basis of the optimized ankle joint point, so that the target person keeps the original posture while the feet stand stably on the ground without sliding, and the motion capture effect is better.
In another embodiment of the present disclosure, calculating a distance between the foot and the ground from the three-dimensional coordinates of the foot joint points comprises:
The distance between the foot joint point and the ground is calculated according to the ground position and the three-dimensional coordinates of the foot joint point.
In another embodiment of the present disclosure, before calculating the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point, the method further comprises:
for a target person in each frame of video image, acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of each joint point;
calculating the distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated and the normal vector of the plane where the ground is located;
calculating the sum of the distances between the lowest joint point of the target person and the ground in each frame of video image;
the ground location coordinate and normal vector that minimizes the distance sum are determined as the ground location.
In another embodiment of the present disclosure, calculating foot velocity from two-dimensional coordinates of foot keypoints comprises:
splicing the two-dimensional coordinates of the key points of each foot part into a foot part vector of each foot part;
and calculating the foot speed of each foot according to the foot vector of each foot.
In another embodiment of the present disclosure, calculating a foot velocity for each foot from the foot vector for each foot comprises:
From the foot vector of each foot, the foot speed of each foot is calculated using the following formula:

$$v_i^n = \frac{\left\lVert f_i^{n+1} - f_i^n \right\rVert}{K \cdot \Delta T}$$

where $v_i^n$ denotes the foot speed of the side-$i$ foot of the target person in the $n$th frame of video image, $f_i^n$ denotes the foot vector of the side-$i$ foot in the $n$th frame of video image, $K$ denotes the number of foot key points included in $f_i^n$, $f_i^{n+1}$ denotes the foot vector of the side-$i$ foot in the $(n+1)$th frame of video image, $\Delta T$ denotes the acquisition time interval of two adjacent frames of video images, and $i$ denotes the left or right side.
In another embodiment of the present disclosure, determining contact between the foot and the ground based on the distance between the foot and the ground and the foot speed comprises:
For one side foot of the target person in the nth frame of video image, when the distance between that foot and the ground is smaller than a distance threshold and the foot speed of that foot is smaller than a speed threshold, the side foot is determined to be in contact with the ground.
In another embodiment of the present disclosure, in response to the foot of the target person in the nth video image contacting the ground, the optimization process of the three-dimensional coordinates of the ankle joint point includes:
In response to the contact between one side foot of the target person in the nth frame of video image and the ground, the three-dimensional coordinates of the ankle joint point of that side foot are optimized based on the first constraint condition using the following formula:

$$\min_{\tilde{p}_i^n} \left\lVert \tilde{p}_i^n - p_i^n \right\rVert^2 \quad \text{s.t.} \quad \tilde{p}_i^n = \tilde{p}_i^{n+1}$$

where $\tilde{p}_i^n$ denotes the optimized three-dimensional coordinates of the ankle joint point of the side-$i$ foot of the target person in the $n$th frame of video image, $\tilde{p}_i^{n+1}$ denotes the optimized three-dimensional coordinates of that ankle joint point in the $(n+1)$th frame of video image, $p_i^n$ denotes the three-dimensional coordinates of that ankle joint point in the $n$th frame of video image, and $i$ denotes the left or right side.
In another embodiment of the present disclosure, in response to the target person's foot in contact with the ground in the nth video image, the optimization process for the global rotation matrix of the ankle joint point includes:
In response to the contact between one side foot of the target person in the nth frame of video image and the ground, the global rotation matrix of the ankle joint point of that side foot is optimized based on the second constraint condition using the following formula:

$$\min_{\tilde{R}_i^n} \left\lVert \tilde{R}_i^n - R_i^n \right\rVert^2 \quad \text{s.t.} \quad \tilde{R}_i^n = \tilde{R}_i^{n+1}$$

where $\tilde{R}_i^n$ denotes the optimized global rotation matrix of the ankle joint point of the side-$i$ foot of the target person in the $n$th frame of video image, $\tilde{R}_i^{n+1}$ denotes the optimized global rotation matrix of that ankle joint point in the $(n+1)$th frame of video image, $R_i^n$ denotes the global rotation matrix of that ankle joint point in the $n$th frame of video image, and $i$ denotes the left or right side.
Fig. 2 is a flow chart illustrating a motion capture method for use in an electronic device, according to an example embodiment. As shown in Fig. 2, the method includes the following steps.
In step S201, a video to be processed is acquired.
The video to be processed may be a single-view video obtained by shooting the target person from one viewing angle, or a multi-view video obtained by shooting the target person from multiple viewing angles. When the video to be processed is a multi-view video, the videos of the different views need to be time-synchronized in advance. Synchronization can be achieved with cameras supporting hardware or software synchronization; alternatively, the time difference between videos shot from different viewing angles can be estimated with a sound synchronization method, and each video delayed by the corresponding time to achieve time synchronization.
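The patent does not specify the sound synchronization algorithm, but one common way to estimate the time difference between two recordings is to locate the peak of their audio cross-correlation. The sketch below illustrates that idea only; the function name `estimate_lag` and the use of NumPy are assumptions, not part of the patent.

```python
import numpy as np

def estimate_lag(audio_a, audio_b, sample_rate):
    """Estimate the time offset (in seconds) of the event in audio_a
    relative to audio_b by locating the peak of the full cross-correlation.
    A positive result means audio_a's content occurs later than audio_b's."""
    corr = np.correlate(audio_a, audio_b, mode="full")
    # In 'full' mode, zero lag sits at index len(audio_b) - 1.
    lag_samples = int(np.argmax(corr)) - (len(audio_b) - 1)
    return lag_samples / sample_rate
```

Delaying the earlier recording by the estimated lag then aligns the two views in time.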
In step S202, initial motion capture is performed on the target person in each frame of video image of the video to be processed, so as to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to the parent node.
After the video to be processed is obtained, initial motion capture is performed on the target person in each frame of video image of the video to be processed, yielding the three-dimensional coordinates of each joint point of the target person in each frame of video image and the local rotation matrix of each joint point relative to its parent node. Motion capture technologies include inertial motion capture, single-view capture, multi-view capture, and the like. The target person is the actor in the video. The local rotation matrices may be denoted $\{R_j\}_{j=1}^{J}$, where $R_j$ is the local rotation matrix of the $j$th joint point in the human skeleton, $j$ takes the values 1 to $J$, and $J$ is the number of joint points in the human skeleton. The three-dimensional coordinates of the joint points may be denoted $\{p_j\}_{j=1}^{J}$.
When the three-dimensional coordinates of each joint point of the target person and the local rotation matrix of each joint point relative to its parent node are obtained for each frame of video image, the related art directly outputs the motion capture result, or redirects it to the virtual character to be driven and outputs it. However, the accuracy of conventional motion capture methods is limited, so the virtual character reconstructed from the target person suffers from foot sliding. To solve this problem, the embodiment of the present disclosure does not directly drive the virtual character with the motion capture result; instead, through the subsequent steps, the position and orientation of the foot joint points of the virtual character are fixed while the foot of the virtual character contacts the ground, so that the foot cannot slide, which well solves the foot sliding problem of the virtual character.
It should be noted that, in the field of motion capture technology, redirection is an optional operation that maps the human skeleton defined in the motion capture algorithm to the skeleton of the virtual character to be driven. To avoid confusion, embodiments of the present disclosure do not distinguish between the joint points of the motion capture algorithm's human skeleton and those of the virtual character's skeleton; that is, if a redirection operation is performed, embodiments of the present disclosure still use $p_j$ and $R_j$ to denote the three-dimensional coordinates of each joint point on the skeleton of the redirected virtual character and the local rotation matrix of each joint point relative to its parent node.
In step S203, the local rotation matrix of each joint point with respect to the parent node is converted into a global rotation matrix of each joint point with respect to the root node according to the human skeleton structure.
Because initial motion capture yields the local rotation matrix of each joint point relative to its parent node, to facilitate subsequent calculation the embodiment of the disclosure further converts, according to the human skeleton structure, the local rotation matrix of each joint point into the global rotation matrix of each joint point relative to the root node. The global rotation matrices may be denoted $\{G_j\}_{j=1}^{J}$, and the root node is typically the pelvic joint point. Specifically, the global rotation matrix of a joint point is the product of the rotation matrices along the kinematic chain from the root node to that joint point. For example, if the root node is the pelvic joint point and the parent node of the wrist joint point is the elbow joint point, the global rotation matrix of the wrist joint point is the product of the rotation matrix of the pelvic joint point, the local rotation matrix of the elbow joint point relative to the pelvic joint point, and the local rotation matrix of the wrist joint point relative to the elbow joint point.
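The chain multiplication described above can be sketched in code. This is a minimal illustration, assuming joints are indexed so that every parent precedes its children; the names `parents` and `local_R` are invented for this sketch.

```python
import numpy as np

def local_to_global(parents, local_R):
    """Convert per-joint local rotations into global rotations by walking
    the kinematic tree from the root (parents[root] == -1).

    parents : list where parents[j] is the parent joint index of j (-1 for root)
    local_R : (J, 3, 3) array of local rotations relative to each parent
    returns : (J, 3, 3) array of global rotations relative to the root frame
    """
    J = len(parents)
    global_R = np.empty_like(local_R)
    for j in range(J):  # assumes parents precede children in index order
        if parents[j] < 0:
            global_R[j] = local_R[j]          # root: local == global
        else:
            global_R[j] = global_R[parents[j]] @ local_R[j]
    return global_R
```

For a pelvis → elbow → wrist chain this yields exactly the product of rotations from the root outward, matching the example above.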
In step S204, the distance between the foot and the ground is calculated from the three-dimensional coordinates of the foot joint point.
Since the ground position is an important reference for determining the contact condition between the foot of the target person and the ground, the ground position can be determined before this step is performed. The ground position comprises the position coordinates of the ground and the normal vector of the plane where the ground is located. In the embodiment of the disclosure, the position coordinate of the ground is the intercept of the ground on the y axis, denoted $y_{floor}$, and the normal vector of the plane where the ground is located is denoted $e$. In determining the ground location, embodiments of the present disclosure make the following assumptions:
first, it is assumed that the target person is in contact with the ground for most of the time in the video to be processed, i.e., the time for the target person to completely vacate the ground is a small percentage of the total duration of the video to be processed.
Second, assume that the camera is placed approximately parallel to the ground, i.e., the positive direction of the camera in the y-axis of the coordinate system is approximately above the actual physical space.
Based on the above two assumptions, when determining the ground position, the following method can be adopted:
the method comprises the following steps of firstly, acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of all joint points for a target person in each frame of video image.
The joint point with the minimum y-axis coordinate is obtained from the coordinates of all joint points of the target person in each frame of video image and taken as the lowest joint point of the target person, and the three-dimensional coordinates of the lowest joint point in each frame of video image are thereby obtained.
When the three-dimensional coordinates of the lowest joint point of the target person in each frame of video image are acquired, the coordinates extracted from the frames form a sequence, which may be expressed as $\{p_{low}^{1}, p_{low}^{2}, \dots, p_{low}^{N}\}$, where $p_{low}^{1}$ denotes the three-dimensional coordinates of the lowest joint point of the target person in the first frame of video image, $p_{low}^{2}$ denotes those in the second frame of video image, $p_{low}^{N}$ denotes those in the $N$th frame of video image, and $N$ is the total number of frames of video images included in the video to be processed.
And secondly, calculating the distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated and the normal vector of the plane where the ground is located.
For each frame of video image, the vector difference between the three-dimensional coordinates of the lowest joint point of the target person and the position coordinates of the ground is obtained, and the inner product of the normal vector of the plane where the ground is located and this vector difference is taken as the distance between the lowest joint point of the target person and the ground in that frame of video image.
And thirdly, calculating the sum of the distances between the lowest joint point of the target person in each frame of video image and the ground.
After the distance between the lowest joint point of the target person and the ground in each frame of video image is calculated, the distance sum between the lowest joint point of the target person and the ground in each frame of video image is obtained by adding the distances between the lowest joint point of the target person and the ground in each frame of video image.
Let the position coordinates of the ground be $P = (0, y_{floor}, 0)$; the sum of the distances between the lowest joint point of the target person in each frame of video image and the ground to be calibrated may then be expressed as $\sum_{n=1}^{N} e^{\top}\!\left(p_{low}^{n} - P\right)$.
Fourthly, the ground position coordinates and normal vector that minimize the distance sum are determined as the ground position.
In this step, the ground position coordinates and normal vector that minimize the distance sum are obtained by computing the optimal solution of the following function:

$$\min_{y_{floor},\, e} \; \sum_{n=1}^{N} e^{\top}\!\left(p_{low}^{n} - P\right)$$

By solving this optimization problem, the $y_{floor}$ and normal vector $e$ that minimize the distance sum are obtained and determined as the ground position. To improve the robustness of the algorithm to outliers, embodiments of the present disclosure may also constrain the energy term $e^{\top}(p_{low}^{n} - P)$ in the above equation with a Huber function.
It should be noted that, if the position of the camera remains unchanged, the ground position determination result may be reused, that is, the ground position does not need to be determined again, and after the ground position is determined this time, subsequent calculation may be performed based on the determination result this time. In determining the ground location, a set of videos may be previously captured for determining the ground location.
When the target person stands or moves, the lowest joint point is generally a foot joint point. By minimizing the sum of the distances between the lowest joint point of the target person and the ground, the calibrated ground is the plane on which the target person's feet rest for most of the time, which facilitates the subsequent judgment of the contact condition between the feet of the target person and the ground.
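The calibration above can be sketched as follows. The patent minimizes the distance sum directly (optionally with a Huber loss); this sketch instead uses a closed-form least-squares plane fit via SVD, which is an assumed simplification, together with the point-to-plane distance used in the subsequent contact test. The camera-parallel assumption is reflected in orienting the normal toward +y.

```python
import numpy as np

def calibrate_ground(lowest_pts):
    """Fit a ground plane to the per-frame lowest joint positions.

    lowest_pts : (N, 3) array, one lowest-joint coordinate per frame.
    Returns (y_floor, e): the plane's y-axis intercept and unit normal.
    """
    centroid = lowest_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(lowest_pts - centroid)
    e = vt[-1]                 # direction of least variance = plane normal
    if e[1] < 0:               # orient toward +y (camera roughly parallel to ground)
        e = -e
    y_floor = float(e @ centroid) / e[1]   # intercept of the plane on the y axis
    return y_floor, e

def point_ground_distance(p, y_floor, e):
    """Signed distance from point p to the plane through (0, y_floor, 0) with normal e."""
    return float(e @ (p - np.array([0.0, y_floor, 0.0])))
```

With the plane fixed, `point_ground_distance` gives the per-frame foot-to-ground distances used in step S204.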
Based on the determined ground position, the distance between a foot joint point and the ground can be calculated from the ground position and the three-dimensional coordinates of the foot joint point. For example, for the $n$th frame of video image, if the three-dimensional coordinates of the left foot joint point of the target person are $p_{left}^{n}$, the distance between the left foot joint point of the target person and the ground is $d_{left}^{n} = e^{\top}\!\left(p_{left}^{n} - P\right)$. As another example, if the three-dimensional coordinates of the right foot joint point of the target person in the $n$th frame of video image are $p_{right}^{n}$, the distance between the right foot joint point of the target person and the ground is $d_{right}^{n} = e^{\top}\!\left(p_{right}^{n} - P\right)$.
The embodiment of the disclosure can calculate the distance between the foot joint point and the ground by applying a point-to-plane distance formula based on the calibrated ground position and the three-dimensional coordinates of the foot joint point, thereby providing a method for calculating the distance between the foot joint point and the ground.
In step S205, the foot speed is calculated from the two-dimensional coordinates of the foot key points.
Before this step is performed, the foot key points may be defined in advance according to the processing requirements for the video image. The foot key points comprise the big toe joint point, the little toe joint point, the heel joint point, the ankle joint point, and the like. The key points can be detected with a 2D (two-dimensional) key point detection algorithm, such as the common OpenPose or another algorithm. When a 2D key point detection algorithm is used for detection, the 2D coordinates of joint points that are not detected are set to (0, 0).
For the target person in each frame of video image, when the foot speed is calculated according to the two-dimensional coordinates of the foot key points, the following method can be adopted:
2051. and splicing the two-dimensional coordinates of the key points of each foot part into the foot part vector of each foot part.
For any frame of video image, the two-dimensional coordinates of the key points of the left foot of the target person in that frame are spliced into a first foot vector of the left foot; for example, the first foot vector of the left foot of the target person in the $n$th frame of video image may be denoted $f_{left}^{n}$. Likewise, the two-dimensional coordinates of the key points of the right foot of the target person in that frame are spliced into a second foot vector of the right foot, denoted $f_{right}^{n}$ for the $n$th frame of video image.
2052. And calculating the foot speed of each foot according to the foot vector of each foot.
For the left foot of the target person in the $n$th frame of video image, the following formula can be applied to calculate the foot speed of the left foot:

$$v_{left}^{n} = \frac{\left\lVert f_{left}^{n+1} - f_{left}^{n} \right\rVert}{K \cdot \Delta T}$$

where $v_{left}^{n}$ denotes the foot speed of the left foot of the target person in the $n$th frame of video image, $K$ denotes the number of foot key points included in $f_{left}^{n}$, $\Delta T$ denotes the acquisition time interval of two adjacent frames of video images, $f_{left}^{n}$ denotes the first foot vector of the left foot of the target person in the $n$th frame of video image, and $f_{left}^{n+1}$ denotes the first foot vector of the left foot of the target person in the $(n+1)$th frame of video image.
For the right foot of the target person in the $n$th frame of video image, the following formula can be applied to calculate the foot speed of the right foot:

$$v_{right}^{n} = \frac{\left\lVert f_{right}^{n+1} - f_{right}^{n} \right\rVert}{K \cdot \Delta T}$$

where $v_{right}^{n}$ denotes the foot speed of the right foot of the target person in the $n$th frame of video image, $K$ denotes the number of foot key points included in $f_{right}^{n}$, $\Delta T$ denotes the acquisition time interval of two adjacent frames of video images, $f_{right}^{n}$ denotes the second foot vector of the right foot of the target person in the $n$th frame of video image, and $f_{right}^{n+1}$ denotes the second foot vector of the right foot of the target person in the $(n+1)$th frame of video image.
The embodiment of the disclosure calculates each foot's speed from the foot vectors spliced from the key points of that foot in two adjacent frames of video images, the acquisition time interval between the two frames, and the number of key points of that foot. The relevant parameters for calculating the speed of each foot can thus be obtained directly, without tools such as sensors, which saves cost and makes the calculation result more accurate.
The embodiment of the disclosure thus provides a foot speed calculation method: based on the detected key points of each side foot, the two-dimensional coordinates of those key points are spliced into a foot vector describing the characteristics of that foot, and the foot speed of each side foot is then calculated from its foot vector.
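A minimal sketch of the per-foot speed formula above; the function name `foot_speed` is invented for illustration.

```python
import numpy as np

def foot_speed(foot_vec_n, foot_vec_n1, num_keypoints, dt):
    """Foot speed between consecutive frames, per the formula
    v = ||f^{n+1} - f^n|| / (K * dT).

    foot_vec_n, foot_vec_n1 : stacked 2D keypoint coordinates (length 2K)
    num_keypoints           : K, the number of foot key points in the vector
    dt                      : acquisition interval between the two frames
    """
    return float(np.linalg.norm(foot_vec_n1 - foot_vec_n)) / (num_keypoints * dt)
```

The same function serves both sides; only the foot vector passed in differs.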
In step S206, the contact condition between the foot and the ground is determined according to the distance between the foot and the ground and the foot speed.
When determining the contact condition between the foot of the target person and the ground in each frame of video image, it is assumed in advance that the foot does not slide while in contact with the ground; on this assumption, the contact condition can be determined from the foot speed and the distance between the foot and the ground. For one side foot of the target person in the $n$th frame of video image, when the distance between that foot and the ground is smaller than a distance threshold and the foot speed of that foot is smaller than a speed threshold, the foot is determined to be in contact with the ground. The distance threshold, denoted $y_{th}$, may be 1 mm, 2 mm, or the like; the speed threshold, denoted $v_{th}$, may be 1 mm/s, 2 mm/s, or the like.
The determination of the contact condition is described separately below for the left and right feet of the target person in the $n$th frame of video image.
For the left foot of the target person in the $n$th frame of video image, when the distance between the left foot and the ground is smaller than the distance threshold and the foot speed of the left foot is smaller than the speed threshold, the left foot is determined to be in contact with the ground. Let $c_{left}^{n}$ denote the contact condition between the left foot of the target person and the ground in the $n$th frame of video image, where $c_{left}^{n} = 1$ indicates that the left foot of the target person is in contact with the ground and $c_{left}^{n} = 0$ indicates that it is not; $c_{left}^{n}$ can then be calculated by the following formula:

$$c_{left}^{n} = \mathbb{1}\!\left[\, d_{left}^{n} < y_{th} \;\text{and}\; v_{left}^{n} < v_{th} \,\right]$$

where $d_{left}^{n}$ denotes the distance between the left foot of the target person and the ground in the $n$th frame of video image, computed from the three-dimensional coordinates $p_{left}^{n}$ of the left foot joint point, and $v_{left}^{n}$ denotes the foot speed of the left foot.
For the right foot of the target person in the $n$th frame of video image, when the distance between the right foot and the ground is smaller than the distance threshold and the foot speed of the right foot is smaller than the speed threshold, the right foot is determined to be in contact with the ground. Let $c_{right}^{n}$ denote the contact condition between the right foot of the target person and the ground in the $n$th frame of video image, where $c_{right}^{n} = 1$ indicates that the right foot of the target person is in contact with the ground and $c_{right}^{n} = 0$ indicates that it is not; $c_{right}^{n}$ can then be calculated by the following formula:

$$c_{right}^{n} = \mathbb{1}\!\left[\, d_{right}^{n} < y_{th} \;\text{and}\; v_{right}^{n} < v_{th} \,\right]$$

where $d_{right}^{n}$ denotes the distance between the right foot of the target person and the ground in the $n$th frame of video image, computed from the three-dimensional coordinates $p_{right}^{n}$ of the right foot joint point, and $v_{right}^{n}$ denotes the foot speed of the right foot.
Both the distance between the foot and the ground and the foot speed reflect the contact condition: when the distance between the foot and the ground is large the foot is not in contact with the ground, and when the speed of the foot relative to the ground is large the foot is likewise not in contact. The embodiment of the disclosure therefore determines, for each side foot, that the foot is in contact with the ground only when both the distance and the foot speed satisfy the requirements, which improves the accuracy of the determination result.
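The two-threshold contact test can be sketched as follows; the default threshold values are the illustrative 2 mm and 2 mm/s mentioned in the text, expressed in metres and metres per second.

```python
def foot_in_contact(distance, speed, dist_thresh=0.002, speed_thresh=0.002):
    """One side foot is judged in contact with the ground only when it is
    both close to the ground AND nearly stationary; failing either test
    means no contact. Thresholds are illustrative, in m and m/s."""
    return distance < dist_thresh and speed < speed_thresh
```

The same test is applied independently to the left and right feet in every frame.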
In step S207, in response to the contact between the foot of the target person in the nth frame of video image and the ground, the three-dimensional coordinates and the global rotation matrix of the ankle joint point are optimized, so that the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point of the target person in the (n+1)th frame of video image, where n ≤ N and N is the total number of frames of video images included in the video to be processed.
As noted above, when judging whether the foot of the target person in each frame of video image contacts the ground, it is assumed that the foot does not slide while in contact with the ground. In reality, however, the foot retains some speed during the motion of the target person, so the foot may slide while in contact with the ground. To solve this problem, the embodiment of the present disclosure further optimizes the three-dimensional coordinates and rotation matrix of the foot joint points of the target person in each frame of video image, so that they are fixed while the foot of the target person in the video image is in contact with the ground. In the field of motion capture, the foot is treated as a rigid body that rotates with the rotation of the ankle joint point, so the optimization problem of the foot joint points can be converted into an optimization of the ankle joint point alone.
The optimization process of the three-dimensional coordinates of the ankle joint point of the left foot of the target person in the nth frame video image comprises the following steps:
In response to the contact between the left foot of the target person in the $n$th frame of video image and the ground, the three-dimensional coordinates of the ankle joint point of the left foot of the target person in the $n$th frame of video image are optimized based on the first constraint condition using the following formula:

$$\min_{\tilde{p}_{left}^{n}} \left\lVert \tilde{p}_{left}^{n} - p_{left}^{n} \right\rVert^{2} \quad \text{s.t.} \quad \tilde{p}_{left}^{n} = \tilde{p}_{left}^{n+1}$$

where $\tilde{p}_{left}^{n}$ denotes the optimized three-dimensional coordinates of the ankle joint point of the left foot of the target person in the $n$th frame of video image, $\tilde{p}_{left}^{n+1}$ denotes the optimized three-dimensional coordinates of the ankle joint point of the left foot in the $(n+1)$th frame of video image, and $p_{left}^{n}$ denotes the three-dimensional coordinates of the ankle joint point of the left foot obtained by motion capture in the $n$th frame of video image. The energy function $\lVert \tilde{p}_{left}^{n} - p_{left}^{n} \rVert^{2}$ constrains the optimized three-dimensional coordinates of the ankle joint point of the left foot to be as consistent as possible with the three-dimensional coordinates obtained by motion capture, and the first constraint condition is that, when the left foot of the target person in the $n$th frame of video image is in contact with the ground, the optimized three-dimensional coordinates of the ankle joint point of the left foot in the $n$th frame of video image are equal to the optimized three-dimensional coordinates of the ankle joint point of the left foot in the $(n+1)$th frame of video image.
The optimization process of the three-dimensional coordinates of the ankle joint point of the right foot of the target person in the nth frame video image comprises the following steps:
In response to the contact between the right foot of the target person in the $n$th frame of video image and the ground, the three-dimensional coordinates of the ankle joint point of the right foot of the target person in the $n$th frame of video image are optimized based on the first constraint condition using the following formula:

$$\min_{\tilde{p}_{right}^{n}} \left\lVert \tilde{p}_{right}^{n} - p_{right}^{n} \right\rVert^{2} \quad \text{s.t.} \quad \tilde{p}_{right}^{n} = \tilde{p}_{right}^{n+1}$$

where $\tilde{p}_{right}^{n}$ denotes the optimized three-dimensional coordinates of the ankle joint point of the right foot of the target person in the $n$th frame of video image, $\tilde{p}_{right}^{n+1}$ denotes the optimized three-dimensional coordinates of the ankle joint point of the right foot in the $(n+1)$th frame of video image, and $p_{right}^{n}$ denotes the three-dimensional coordinates of the ankle joint point of the right foot obtained by motion capture in the $n$th frame of video image. The energy function $\lVert \tilde{p}_{right}^{n} - p_{right}^{n} \rVert^{2}$ constrains the optimized three-dimensional coordinates of the ankle joint point of the right foot to be as consistent as possible with the three-dimensional coordinates obtained by motion capture, and the first constraint condition is that, when the right foot of the target person in the $n$th frame of video image is in contact with the ground, the optimized three-dimensional coordinates of the ankle joint point of the right foot in the $n$th frame of video image are equal to the optimized three-dimensional coordinates of the ankle joint point of the right foot in the $(n+1)$th frame of video image.
It should be noted that the ankle joint point in the embodiments of the present disclosure is one of the foot joint points. Taking the left foot as an example, the three-dimensional coordinates of the left foot joint point of the target person in the nth frame of video image are taken to be the three-dimensional coordinates of the ankle joint point of the left foot, a typical joint point of the foot. Of course, if the amount of computation is not a concern, every joint point of the left foot can be optimized, in which case the foot joint symbols denote all of the left foot joint points.
The embodiments of the present disclosure optimize the three-dimensional coordinates of the ankle joint point of the target person in the current frame of video image according to the three-dimensional coordinates of the ankle joint point of the target person in the next frame of video image, so that the ankle joint points of the target person in adjacent frames are as consistent as possible in position. The three-dimensional coordinates of the ankle joint point in the current frame are the basis for subsequently adjusting the three-dimensional coordinates of the other joint points. Optimizing them makes the ankle coordinates more accurate, so that when the other joint points are adjusted on the basis of the ankle joint point, their adjusted three-dimensional coordinates are also more accurate, which further improves the accuracy of the motion capture result.
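The backward propagation implied by the first constraint can be sketched in a few lines. This is a simplifying illustration, not the disclosure's exact energy minimization: the function name, the (N, 3) array layout, and the hard-copy strategy are assumptions.

```python
import numpy as np

def smooth_ankle_positions(captured, contact):
    """captured: (N, 3) ankle coordinates per frame; contact: length-N
    booleans. When frame n is in contact, its optimized position is tied
    to the optimized position of frame n+1, so we sweep backwards and
    copy positions through each run of contacting frames."""
    optimized = captured.copy()
    for n in range(len(captured) - 2, -1, -1):
        if contact[n]:
            optimized[n] = optimized[n + 1]
    return optimized
```

Sweeping backwards ensures each contacting frame inherits the already-optimized position of its successor, so a whole contact run collapses to a single planted position.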
The optimization process for the global rotation matrix of the ankle joint point of the left foot of the target person in the nth frame of video image is as follows:

In response to the left foot of the target person contacting the ground in the nth frame of video image, the following formula is applied, based on the second constraint condition, to optimize the global rotation matrix of the ankle joint point of the left foot:

[equation image: joint minimization of the first and second energy functions]

where R̂_l^n represents the optimized global rotation matrix of the ankle joint point of the left foot of the target person in the nth frame of video image, R̂_l^{n+1} represents the optimized global rotation matrix of the ankle joint point of the left foot of the target person in the (n+1)th frame of video image, and R_l^n represents the global rotation matrix of the ankle joint point of the left foot of the target person in the nth frame of video image. The first energy function E_1 and the second energy function E_2 are used for constraining the global rotation matrix of the ankle joint point of the left foot, so that the optimized global rotation matrix of the ankle joint point of the left foot remains as consistent as possible with the global rotation matrix of the ankle joint point of the left foot obtained by motion capture. The second constraint condition is that, when the left foot of the target person in the nth frame of video image is in contact with the ground, the optimized global rotation matrix of the ankle joint point of the left foot of the target person in the nth frame of video image is equal to the optimized global rotation matrix of the ankle joint point of the left foot of the target person in the (n+1)th frame of video image.
The optimization process for the global rotation matrix of the ankle joint point of the right foot of the target person in the nth frame of video image is as follows:

In response to the right foot of the target person contacting the ground in the nth frame of video image, the following formula is applied, based on the second constraint condition, to optimize the global rotation matrix of the ankle joint point of the right foot:

[equation image: joint minimization of the first and second energy functions]

where R̂_r^n represents the optimized global rotation matrix of the ankle joint point of the right foot of the target person in the nth frame of video image, R̂_r^{n+1} represents the optimized global rotation matrix of the ankle joint point of the right foot of the target person in the (n+1)th frame of video image, and R_r^n represents the global rotation matrix of the ankle joint point of the right foot of the target person in the nth frame of video image. The first energy function E_1 and the second energy function E_2 are used for constraining the global rotation matrix of the ankle joint point of the right foot, so that the optimized global rotation matrix of the ankle joint point of the right foot remains as consistent as possible with the global rotation matrix of the ankle joint point of the right foot obtained by motion capture. The second constraint condition is that, when the right foot of the target person in the nth frame of video image is in contact with the ground, the optimized global rotation matrix of the ankle joint point of the right foot of the target person in the nth frame of video image is equal to the optimized global rotation matrix of the ankle joint point of the right foot of the target person in the (n+1)th frame of video image.
According to the global rotation matrix of the ankle joint point of the target person in the next frame of video image, the global rotation matrix of the ankle joint point of the target person in the current frame of video image is optimized, so that the ankle joint points of the target person in adjacent frames are as consistent as possible in orientation. The global rotation matrix of the ankle joint point in the current frame is the basis for subsequently adjusting the rotation matrices of the other joint points. Optimizing it makes the global rotation matrix of the ankle joint point more accurate, so that when the other joint points are adjusted on the basis of the ankle joint point, their adjusted local rotation matrices are also more accurate, which further improves the accuracy of the motion capture result.
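A step the passage relies on but does not spell out is recovering a joint's local rotation once its global rotation has been optimized. A minimal sketch, assuming 3x3 rotation matrices and the convention R_global = R_parent_global · R_local:

```python
import numpy as np

def local_from_global(parent_global, joint_global):
    """Recover the joint's local rotation from its optimized global
    rotation: R_local = R_parent_global^T @ R_global (the transpose
    is the inverse of a rotation matrix)."""
    return parent_global.T @ joint_global
```

With this relation, fixing the ankle's global orientation across contacting frames determines the updated local rotation that the skeleton hierarchy actually stores.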
In step S208, the three-dimensional coordinates and the local rotation matrix of other joint points of the target person in the nth frame of video image are adjusted based on the optimized three-dimensional coordinates and the global rotation matrix of the ankle joint point, so as to obtain a processed nth frame of video image.
Based on the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point, the three-dimensional coordinates and local rotation matrices of the other joint points of the target person in the nth frame of video image are adjusted by an inverse kinematics method, yielding the processed nth frame of video image. Because the three-dimensional coordinates and local rotation matrix of every joint point in the processed frame are adjusted on the basis of the optimized ankle joint point, the joint points of the target person move as a whole, so the foot-sliding problem of the target person in the adjusted video image can be resolved. Common 3D engines such as Unity and Unreal provide well-established inverse kinematics tools.
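The disclosure defers this adjustment to engine tooling. As a hedged, self-contained illustration of the underlying idea (not Unity's or Unreal's actual API), a planar two-bone solver computes hip and knee angles that place the ankle at a target position using the law of cosines:

```python
import math

def two_link_ik(l1, l2, x, y):
    """Hip at the origin, thigh length l1, shin length l2, target ankle
    position (x, y). Returns (hip, knee) angles in radians, where the
    knee angle is measured as the bend away from a straight leg."""
    d = math.sqrt(x * x + y * y)
    d = min(d, l1 + l2)  # clamp unreachable targets to full extension
    cos_knee = (l1 ** 2 + l2 ** 2 - d ** 2) / (2 * l1 * l2)
    knee = math.pi - math.acos(max(-1.0, min(1.0, cos_knee)))
    cos_a = (l1 ** 2 + d ** 2 - l2 ** 2) / (2 * l1 * d)
    a = math.acos(max(-1.0, min(1.0, cos_a)))
    hip = math.atan2(y, x) - a
    return hip, knee
```

In the foot-planting context, the target (x, y) would be the optimized ankle position, and the hip position and bone lengths come from the captured skeleton.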
The processing procedure of the video image will be described below with reference to fig. 3 as an example.
Referring to fig. 3, an input single-view or multi-view video is first processed with a motion capture method to obtain the three-dimensional coordinates and the local rotation matrix of each joint point in the human skeleton. The local rotation matrix of each joint point is converted into a global rotation matrix, and ground calibration is performed based on the obtained three-dimensional coordinates and global rotation matrices to obtain the ground position and orientation. A 2D key point detection method is used to obtain the 2D key point coordinates of the feet. Whether each foot is in contact with the ground is then judged from the ground position and orientation, the 2D key point coordinates of the feet, and the three-dimensional coordinates and global rotation matrices of the joint points; the three-dimensional coordinates and global rotation matrix of the foot joint are optimized accordingly, and inverse kinematics is applied to obtain the processed video image.
In the method provided by the embodiments of the present disclosure, the three-dimensional coordinates of a joint point of the target person reflect the position of that joint point, and the global rotation matrix reflects its orientation. When the foot of the target person contacts the ground in any frame of video image, the position and orientation of the ankle joint point in that frame are optimized according to the position and orientation of the ankle joint point in the next frame, so that the postures of the target person in adjacent frames are as consistent as possible. The positions and orientations of the other joint points in the frame are then adjusted, according to the posture of the target person, on the basis of the optimized position and orientation of the ankle joint point. As a result, the target person in the frame keeps the original posture, the feet stand stably on the ground without sliding, and the motion capture effect is better.
FIG. 4 is a block diagram illustrating a motion capture device, according to an example embodiment. Referring to fig. 4, the apparatus includes:
the motion capture module 401 is configured to perform initial motion capture on a target person in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to a parent node;
a conversion module 402, configured to convert, according to a human skeleton structure, a local rotation matrix of each joint point relative to a parent node into a global rotation matrix of each joint point relative to a root node;
a calculating module 403, configured to calculate a distance between the foot and the ground according to the three-dimensional coordinates of the foot joint point;
a calculating module 403, configured to calculate a foot speed according to the two-dimensional coordinates of the foot key points;
a determining module 404, configured to determine a contact condition between the foot and the ground according to a distance between the foot and the ground and a foot speed;
an optimizing module 405, configured to optimize the three-dimensional coordinates and global rotation matrix of an ankle joint point in response to contact between a foot of the target person in an nth frame of video image and the ground, so that the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point are equal, respectively, to the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point of the target person in the (n+1)th frame of video image, where n is less than or equal to N, and N is the total number of frames of video images included in the video to be processed;
and an adjusting module 406, configured to adjust three-dimensional coordinates and local rotation matrices of other joint points of the target person in the nth frame of video image based on the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point, so as to obtain a processed nth frame of video image.
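The conversion performed by module 402 — composing each joint's local rotation with those of its ancestors — can be sketched as follows. The 3x3-matrix representation, the topologically ordered parent list, and the -1 root marker are assumptions for illustration:

```python
import numpy as np

def globalize(local_rots, parents):
    """local_rots[j]: 3x3 local rotation of joint j relative to its
    parent; parents[j]: parent index (-1 for the root). Parents must
    appear before their children. Returns global rotations relative
    to the root: R_global[j] = R_global[parent] @ R_local[j]."""
    global_rots = [None] * len(local_rots)
    for j, p in enumerate(parents):
        global_rots[j] = local_rots[j] if p < 0 else global_rots[p] @ local_rots[j]
    return global_rots
```

Walking the joints in parent-before-child order means each joint's ancestors are already globalized when it is visited, so one pass suffices.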
In another embodiment of the present disclosure, the calculating module 403 is configured to calculate the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point.
In another embodiment of the present disclosure, the apparatus further comprises:
the acquisition module is used for acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of each joint point for the target person in each frame of video image;
the calculation module 403 is further configured to calculate a distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated, and a normal vector of a plane where the ground is located;
the calculating module 403 is further configured to calculate a sum of distances between a lowest joint point of the target person in each frame of video image and the ground;
the determining module 404 is further configured to determine, as the ground position, the ground position coordinate and the normal vector that minimize the sum of the distances.
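The ground calibration these modules describe — take the lowest joint per frame, then find the plane minimizing the summed distances — admits a closed-form least-squares sketch. The SVD formulation and the y-up sign convention here are illustrative choices, not the disclosure's exact method:

```python
import numpy as np

def calibrate_ground(lowest_points):
    """Fit the ground plane to the per-frame lowest joint points.
    Returns a point on the plane (the centroid) and a unit normal,
    found as the direction of least variance of the centered points."""
    pts = np.asarray(lowest_points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]          # last right-singular vector
    if normal[1] < 0:        # assume y-up; fix the sign for consistency
        normal = -normal
    return centroid, normal
```

The signed distance of any joint to this plane is then the dot product of (joint - centroid) with the normal, which is what the distance calculation in module 403 needs.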
In another embodiment of the present disclosure, the calculating module 403 is further configured to splice the two-dimensional coordinates of the key points of each foot into a foot vector of each foot; and calculating the foot speed of each foot according to the foot vector of each foot.
In another embodiment of the present disclosure, the calculating module 403 is configured to calculate the foot speed of each side's foot by applying the following formula to the foot vector of that side's foot:

[equation image]

where V_i^n represents the foot speed of one side's foot of the target person in the nth frame of video image, F_i^n represents the foot vector of that side's foot of the target person in the nth frame of video image, K represents the number of foot key points included in F_i^n, F_i^{n+1} represents the foot vector of that side's foot of the target person in the (n+1)th frame of video image, ΔT represents the acquisition time interval between two adjacent frames of video images, and i denotes the left side or the right side.
In another embodiment of the present disclosure, the determining module 404 is configured to determine, for one side's foot of the target person in the nth frame of video image, that the foot is in contact with the ground when the distance between that foot and the ground is less than a distance threshold and its foot speed is less than a speed threshold.
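The determining module's test can be sketched directly. The mean-key-point-displacement definition of speed and the threshold values below are assumptions for illustration, not values from the disclosure:

```python
import numpy as np

def foot_speed(f_prev, f_next, dt):
    """Foot vector = stacked 2D key points, shape (K, 2); speed is the
    mean per-key-point displacement divided by the frame interval dt."""
    return np.linalg.norm(f_next - f_prev, axis=1).mean() / dt

def in_contact(dist_to_ground, speed, d_thr=0.02, v_thr=0.5):
    # both conditions must hold: near the ground AND nearly stationary
    return dist_to_ground < d_thr and speed < v_thr
```

Using 2D key-point speed rather than 3D joint speed makes the stillness test robust to depth jitter in the captured 3D coordinates.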
In another embodiment of the present disclosure, the optimizing module 405 is configured to, in response to one side's foot of the target person contacting the ground in the nth frame of video image, optimize the three-dimensional coordinates of the ankle joint point of that foot by applying the following formula based on the first constraint condition:

[equation image]

where P̂_i^n represents the optimized three-dimensional coordinates of the ankle joint point of that side's foot of the target person in the nth frame of video image, P̂_i^{n+1} represents the optimized three-dimensional coordinates of the ankle joint point of that side's foot of the target person in the (n+1)th frame of video image, P_i^n represents the three-dimensional coordinates of the ankle joint point of that side's foot of the target person in the nth frame of video image, and i denotes the left side or the right side.
In another embodiment of the present disclosure, the optimization module 405 is configured to, in response to one side's foot of the target person contacting the ground in the nth frame of video image, optimize the global rotation matrix of the ankle joint point of that foot by applying the following formula based on the second constraint condition:

[equation image]

where R̂_i^n represents the optimized global rotation matrix of the ankle joint point of that side's foot of the target person in the nth frame of video image, R̂_i^{n+1} represents the optimized global rotation matrix of the ankle joint point of that side's foot of the target person in the (n+1)th frame of video image, R_i^n represents the global rotation matrix of the ankle joint point of that side's foot of the target person in the nth frame of video image, and i denotes the left side or the right side.
With the device provided by the embodiments of the present disclosure, the three-dimensional coordinates of a joint point of the target person reflect the position of that joint point, and the global rotation matrix reflects its orientation. When the foot of the target person contacts the ground in any frame of video image, the position and orientation of the ankle joint point in that frame are optimized according to the position and orientation of the ankle joint point in the next frame, so that the postures of the target person in adjacent frames are as consistent as possible. The positions and orientations of the other joint points in the frame are then adjusted, according to the posture of the target person, on the basis of the optimized position and orientation of the ankle joint point. As a result, the target person in the frame keeps the original posture, the feet stand stably on the ground without sliding, and the motion capture effect is better.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 5 shows a block diagram of an electronic device 500 according to an exemplary embodiment of the present disclosure. In general, the electronic device 500 includes: a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 501 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a Central Processing Unit (CPU), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 502 is used to store at least one instruction for execution by processor 501 to implement the motion capture method provided by method embodiments in the present disclosure.
In some embodiments, the electronic device 500 may further optionally include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502 and peripheral interface 503 may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface 503 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: a power supply 504.
The peripheral interface 503 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 501 and the memory 502. In some embodiments, the processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The power supply 504 is used to power the various components in the electronic device 500. The power source 504 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power supply 504 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
Those skilled in the art will appreciate that the configuration shown in fig. 5 is not intended to be limiting of the electronic device 500 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of the electronic device 500 to perform the above-described motion capture method is also provided. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Embodiments of the present disclosure provide a computer-readable storage medium, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the above-described motion capture method.
According to an embodiment of the present disclosure, a computer program product is provided, comprising a computer program, which when executed by a processor, implements the above-mentioned motion capture method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of motion capture, the method comprising:
carrying out initial motion capture on a target figure in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target figure in each frame of video image and a local rotation matrix of each joint point relative to a father node;
converting the local rotation matrix of each joint point relative to a father node into a global rotation matrix of each joint point relative to a root node according to a human body skeleton structure;
calculating the distance between the foot and the ground according to the three-dimensional coordinates of the foot joint points;
calculating the foot speed according to the two-dimensional coordinates of the foot key points;
determining the contact condition between the foot and the ground according to the distance between the foot and the ground and the foot speed;
responding to the contact of the foot of the target person in the nth frame of video image with the ground, and optimizing the three-dimensional coordinates and the global rotation matrix of the ankle joint point, so that the optimized three-dimensional coordinates and the optimized global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and the optimized global rotation matrix of the ankle joint point of the target person in the (n+1)th frame of video image, wherein n is less than or equal to N, and N is the total number of frames of video images included in the video to be processed;
and adjusting the three-dimensional coordinates and local rotation matrixes of other joint points of the target person in the nth frame of video image based on the optimized three-dimensional coordinates and global rotation matrixes of the ankle joint points to obtain the processed nth frame of video image.
2. The motion capture method of claim 1, wherein calculating the distance between the foot and the ground based on the three-dimensional coordinates of the foot joint points comprises:
and calculating the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point.
3. The motion capture method of claim 2, wherein prior to calculating the distance between the foot joint point and the ground based on the ground location and the three-dimensional coordinates of the foot joint point, further comprising:
for a target person in each frame of video image, acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of each joint point;
calculating the distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated and the normal vector of the plane where the ground is located;
calculating the sum of the distances between the lowest joint point of the target person in each frame of video image and the ground;
determining the ground position coordinates and the normal vector that minimize the distance sum as the ground position.
4. The motion capture method of claim 1, wherein the calculating foot velocities from the two-dimensional coordinates of the foot keypoints comprises:
splicing the two-dimensional coordinates of the key points of each foot part into a foot part vector of each foot part;
and calculating the foot speed of each foot according to the foot vector of each foot.
5. The motion capture method of claim 4, wherein the calculating the foot velocity for the each foot from the foot vector for the each foot comprises:
calculating the foot speed of each side's foot by applying the following formula to the foot vector of that side's foot:

[equation image]

wherein V_i^n represents the foot speed of one side's foot of the target person in the nth frame of video image, F_i^n represents the foot vector of that side's foot of the target person in the nth frame of video image, K represents the number of foot key points included in F_i^n, F_i^{n+1} represents the foot vector of that side's foot of the target person in the (n+1)th frame of video image, ΔT represents the acquisition time interval between two adjacent frames of video images, and i represents the left side or the right side.
6. The motion capture method of claim 1, wherein determining the contact between the foot and the ground based on the distance between the foot and the ground and the foot velocity comprises:
and for a side foot of the target person in the nth frame of video image, when the distance between the side foot and the ground is smaller than a distance threshold value and the foot speed of the side foot is smaller than a speed threshold value, determining that the side foot is in contact with the ground.
7. A motion capture device, the device comprising:
the motion capture module is used for carrying out initial motion capture on a target person in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to a father node;
the transformation module is used for transforming the local rotation matrix of each joint point relative to the father node into a global rotation matrix of each joint point relative to the root node according to the human body skeleton structure;
the calculation module is used for calculating the distance between the foot and the ground according to the three-dimensional coordinates of the foot joint points;
the calculation module is used for calculating the foot speed according to the two-dimensional coordinates of the foot key points;
the determining module is used for determining the contact condition between the foot and the ground according to the distance between the foot and the ground and the foot speed;
the optimization module is used for responding to the contact between the foot of a target person in the nth frame of video image and the ground, and optimizing the three-dimensional coordinates and the global rotation matrix of the ankle joint point, so that the optimized three-dimensional coordinates and the optimized global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and the optimized global rotation matrix of the ankle joint point of the target person in the (n+1)th frame of video image, wherein n is less than or equal to N, and N is the total number of frames of video images included in the video to be processed;
and the adjusting module is used for adjusting the three-dimensional coordinates and the local rotation matrix of other joint points of the target person in the nth frame of video image based on the optimized three-dimensional coordinates and the global rotation matrix of the ankle joint points to obtain the processed nth frame of video image.
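The transformation module of claim 7 is, in effect, forward composition of rotations down the kinematic tree: a joint's global rotation relative to the root is its parent's global rotation multiplied by its own local rotation. A minimal pure-Python sketch, using an illustrative four-joint chain (the actual skeleton structure and joint numbering are not taken from the patent):

```python
# parents[j] is the parent joint index; -1 marks the root joint.
# Illustrative chain: 0 = root (pelvis), 1 = hip, 2 = knee, 3 = ankle.
PARENTS = {0: -1, 1: 0, 2: 1, 3: 2}

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def matmul3(a, b):
    """Multiply two 3x3 rotation matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def local_to_global(local_rots, parents=PARENTS):
    """Map {joint: local rotation w.r.t. its parent} to
    {joint: global rotation w.r.t. the root}.

    Joints are processed in index order, which must place every parent
    before its children (true for the chain above).
    """
    global_rots = {}
    for j in sorted(local_rots):
        p = parents[j]
        if p == -1:
            global_rots[j] = local_rots[j]        # root: local == global
        else:
            # Compose the parent's accumulated rotation with this
            # joint's local rotation.
            global_rots[j] = matmul3(global_rots[p], local_rots[j])
    return global_rots
```

As a sanity check, giving every joint the same 90° rotation about the z-axis makes the ankle's global rotation a full 360° turn, i.e. the identity matrix.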
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the motion capture method of any of claims 1 to 6.
9. A computer-readable storage medium having instructions stored therein which, when executed by a processor of an electronic device, enable the electronic device to perform the motion capture method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the motion capture method of any of claims 1 to 6.
CN202110292819.XA 2021-03-18 2021-03-18 Motion capture method, motion capture device, electronic equipment and computer readable storage medium Active CN113033369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110292819.XA CN113033369B (en) 2021-03-18 2021-03-18 Motion capture method, motion capture device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113033369A true CN113033369A (en) 2021-06-25
CN113033369B CN113033369B (en) 2024-03-12

Family

ID=76471563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110292819.XA Active CN113033369B (en) 2021-03-18 2021-03-18 Motion capture method, motion capture device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113033369B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140092536A (en) * 2013-01-16 2014-07-24 계명대학교 산학협력단 3d character motion synthesis and control method and device for navigating virtual environment using depth sensor
CN104463146A (en) * 2014-12-30 2015-03-25 华南师范大学 Posture identification method and device based on near-infrared TOF camera depth information
CN112381003A (en) * 2020-11-16 2021-02-19 网易(杭州)网络有限公司 Motion capture method, motion capture device, motion capture equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王文杰; 秦现生; 王鸿博; 洪杰; 牛军龙; 谭小群; 张雪峰: "Motion retargeting technology for humanoid robot teleoperation based on motion capture", 机械设计与研究 (Machine Design and Research), no. 01 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421286A (en) * 2021-07-12 2021-09-21 北京未来天远科技开发有限公司 Motion capture system and method
CN113421286B (en) * 2021-07-12 2024-01-02 北京未来天远科技开发有限公司 Motion capturing system and method
CN113420719A (en) * 2021-07-20 2021-09-21 北京百度网讯科技有限公司 Method and device for generating motion capture data, electronic equipment and storage medium
CN113420719B (en) * 2021-07-20 2022-07-22 北京百度网讯科技有限公司 Method and device for generating motion capture data, electronic equipment and storage medium
CN113657278A (en) * 2021-08-18 2021-11-16 成都信息工程大学 Motion gesture recognition method, device, equipment and storage medium
CN116092120A (en) * 2022-12-30 2023-05-09 北京百度网讯科技有限公司 Image-based action determining method and device, electronic equipment and storage medium
CN116092120B (en) * 2022-12-30 2023-12-05 北京百度网讯科技有限公司 Image-based action determining method and device, electronic equipment and storage medium
CN117541646A (en) * 2023-12-20 2024-02-09 暗物质(北京)智能科技有限公司 Motion capturing method and system based on parameterized model

Also Published As

Publication number Publication date
CN113033369B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN111126272B (en) Posture acquisition method, and training method and device of key point coordinate positioning model
CN113033369B (en) Motion capture method, motion capture device, electronic equipment and computer readable storage medium
CN111738220B (en) Three-dimensional human body posture estimation method, device, equipment and medium
CN103578135B (en) The mutual integrated system of stage that virtual image combines with real scene and implementation method
WO2019005999A1 (en) Method and system for performing simultaneous localization and mapping using convolutional image transformation
US11945125B2 (en) Auxiliary photographing device for dyskinesia analysis, and control method and apparatus for auxiliary photographing device for dyskinesia analysis
US20130136302A1 (en) Apparatus and method for calculating three dimensional (3d) positions of feature points
CN112070782B (en) Method, device, computer readable medium and electronic equipment for identifying scene contour
CN103140879A (en) Information presentation device, digital camera, head mount display, projector, information presentation method, and information presentation program
CN110598590A (en) Close interaction human body posture estimation method and device based on multi-view camera
US20240046557A1 (en) Method, device, and non-transitory computer-readable storage medium for reconstructing a three-dimensional model
CN110211222B (en) AR immersion type tour guide method and device, storage medium and terminal equipment
CN110717391A (en) Height measuring method, system, device and medium based on video image
CN114120432A (en) Online learning attention tracking method based on sight estimation and application thereof
EP0847201A1 (en) Real time tracking system for moving bodies on a sports field
Yan et al. Cimi4d: A large multimodal climbing motion dataset under human-scene interactions
CN116700471A (en) Method and system for enhancing user experience of virtual reality system
CN117711066A (en) Three-dimensional human body posture estimation method, device, equipment and medium
CN116523962A (en) Visual tracking method, device, system, equipment and medium for target object
CN116109974A (en) Volumetric video display method and related equipment
Nagai et al. An on-site visual feedback method using bullet-time video
Pham et al. A low cost system for 3d motion analysis using Microsoft Kinect
Cordea et al. 3-D head pose recovery for interactive virtual reality avatars
KR20210045148A (en) Method, device and non-transitory computer-readable recording medium for estimating information about golf swing
CN111754543A (en) Image processing method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant