CN113033369A - Motion capture method, motion capture device, electronic equipment and computer-readable storage medium - Google Patents

Motion capture method, motion capture device, electronic equipment and computer-readable storage medium

Info

Publication number
CN113033369A
CN113033369A CN202110292819.XA
Authority
CN
China
Prior art keywords
foot
video image
joint point
ground
target person
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110292819.XA
Other languages
Chinese (zh)
Other versions
CN113033369B (en)
Inventor
赵培尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110292819.XA priority Critical patent/CN113033369B/en
Publication of CN113033369A publication Critical patent/CN113033369A/en
Application granted granted Critical
Publication of CN113033369B publication Critical patent/CN113033369B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure relates to a motion capture method, a motion capture apparatus, an electronic device and a computer-readable storage medium, and belongs to the field of image processing. The method comprises the following steps: when the foot of a target person in any frame of video image is in contact with the ground, the position and orientation of the target person's ankle joint point in that frame are optimized using the position and orientation of the ankle joint point in the next frame of video image, yielding the ankle position and orientation that make the target person's postures in adjacent frames as consistent as possible; the positions and orientations of the other joint points in the frame are then adjusted according to the target person's body posture, on the basis of the optimized ankle joint point, so that the target person in the frame keeps the original posture while the feet stand stably on the ground without sliding, and the motion capture effect is better.

Description

Motion capture method, motion capture device, electronic equipment and computer-readable storage medium
Technical Field
The present disclosure relates to the field of motion capture technologies, and in particular, to a motion capture method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Motion capture is a technology that accurately measures and records, in real time, the motion trajectories and postures of a moving object in real three-dimensional space and reconstructs the object's state of motion at each moment in a virtual three-dimensional space. It is widely applied in live entertainment, human-computer interaction, virtual content production and other areas, for example to drive virtual characters. Commonly used motion capture technologies include single-view motion capture, multi-view motion capture, inertial motion capture and the like; limited by the capture accuracy of these methods, the motion capture results often exhibit problems such as unstable feet when a virtual character stands or feet sliding on the ground when it walks, which degrades the user's viewing experience.
In the related art, a Ground IK (Inverse Kinematics) method is adopted to solve the foot-slip problem of virtual characters in a video after motion capture. The specific process is as follows: a ray is cast vertically downward from the virtual character's foot to compute the contact position between the foot and the ground and the distance between them; if that distance is smaller than a certain threshold, inverse kinematics is used to fix the foot at the contact position.
Although the above method can ensure that the foot of the virtual character steps on the ground, the method cannot eliminate the sliding of the foot in the horizontal direction, and therefore, the motion capture effect is not good.
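The related-art check described above can be sketched as follows; the function name, the choice of y as the vertical axis, and the threshold value are illustrative assumptions, not details from the patent.

```python
def ground_ik_check(foot_pos, ground_height=0.0, threshold=0.02):
    """Cast a conceptual ray straight down from the foot: measure the
    vertical gap to a flat ground plane and report whether the foot
    should be pinned at the ray's contact point.

    foot_pos: (x, y, z) with y as the vertical axis.
    """
    gap = foot_pos[1] - ground_height                    # foot-to-ground distance
    contact_point = (foot_pos[0], ground_height, foot_pos[2])
    return gap < threshold, contact_point

# A foot 1 cm above the ground is pinned; a foot 50 cm up is left free.
near = ground_ik_check((0.3, 0.01, 1.2))
far = ground_ik_check((0.3, 0.50, 1.2))
```

As the background notes, this pins the foot vertically at the contact point but constrains nothing in the horizontal plane, which is exactly the sliding the disclosed method goes on to address.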
Disclosure of Invention
The present disclosure provides a motion capture method, a motion capture apparatus, an electronic device and a computer-readable storage medium, so as to at least solve the problem of poor motion capture effect in the related art. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a motion capture method, the method comprising:
carrying out initial motion capture on a target person in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to its parent node;
converting the local rotation matrix of each joint point relative to its parent node into a global rotation matrix of each joint point relative to the root node according to the human body skeleton structure;
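The local-to-global conversion walks the skeleton from the root outward, chaining each joint's local rotation onto its parent's global rotation. A minimal sketch, assuming joints are topologically ordered so every parent precedes its children (the skeleton layout is illustrative):

```python
import numpy as np

def local_to_global(parents, local_R):
    """parents[j] is the parent index of joint j (-1 for the root);
    local_R[j] is that joint's 3x3 local rotation matrix.
    Returns the list of global rotation matrices relative to the root."""
    global_R = [None] * len(parents)
    for j, p in enumerate(parents):
        if p < 0:                       # root: local frame equals global frame
            global_R[j] = local_R[j]
        else:                           # chain onto the parent's global rotation
            global_R[j] = global_R[p] @ local_R[j]
    return global_R
```

For any non-root joint j this computes global_R[j] = global_R[parent] @ local_R[j], the standard forward-kinematics chaining that the conversion step relies on.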
calculating the distance between the foot and the ground according to the three-dimensional coordinates of the foot joint points;
calculating the foot speed according to the two-dimensional coordinates of the foot key points;
determining the contact condition between the foot and the ground according to the distance between the foot and the ground and the foot speed;
in response to the contact between the foot of the target person in the n-th frame of video image and the ground, optimizing the three-dimensional coordinates and the global rotation matrix of the ankle joint point, so that the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point of the target person in the (n+1)-th frame of video image, wherein n ≤ N, and N is the total number of frames of video images included in the video to be processed;
and adjusting the three-dimensional coordinates and local rotation matrixes of other joint points of the target person in the nth frame of video image based on the optimized three-dimensional coordinates and global rotation matrixes of the ankle joint points to obtain the processed nth frame of video image.
In another embodiment of the present disclosure, the calculating a distance between the foot and the ground from the three-dimensional coordinates of the foot joint point comprises:
and calculating the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point.
In another embodiment of the present disclosure, before calculating the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point, the method further comprises:
for a target person in each frame of video image, acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of each joint point;
calculating the distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated and the normal vector of the plane where the ground is located;
calculating the sum of the distances between the lowest joint point of the target person in each frame of video image and the ground;
determining the ground position coordinates and the normal vector that minimize the distance sum as the ground position.
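The ground calibration above can be realized in several ways; one simple sketch fits the plane to the per-frame lowest joint points with an SVD least-squares fit. The patent does not specify the optimizer, so this concrete choice is an assumption:

```python
import numpy as np

def calibrate_ground(lowest_joints):
    """Fit the ground plane to the per-frame lowest joint points.

    lowest_joints: (F, 3) array, the lowest joint of the target person in
    each frame. Returns (point_on_plane, unit_normal): the centroid and
    least-variance direction of the points, i.e. the plane minimizing the
    sum of squared point-to-plane distances.
    """
    pts = np.asarray(lowest_joints, dtype=float)
    centroid = pts.mean(axis=0)                # a point on the fitted plane
    _, _, vt = np.linalg.svd(pts - centroid)   # rows of vt: principal axes
    normal = vt[-1]                            # direction of least variance
    return centroid, normal / np.linalg.norm(normal)

def point_plane_distance(p, point, normal):
    """Distance from point p to the plane given by (point, unit normal)."""
    return abs(np.dot(np.asarray(p, dtype=float) - point, normal))
```

When the lowest joints all lie near a common floor, the recovered normal matches the floor's up direction up to sign, and point_plane_distance then supplies the foot-to-ground distance used in the contact test.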
In another embodiment of the present disclosure, the calculating the foot velocity from the two-dimensional coordinates of the foot key points comprises:
splicing the two-dimensional coordinates of the key points of each foot part into a foot part vector of each foot part;
and calculating the foot speed of each foot according to the foot vector of each foot.
In another embodiment of the present disclosure, said calculating a foot velocity of said each foot from said foot vector of said each foot comprises:
calculating the foot velocity of each side foot from the foot vector of that foot by applying the following formula:

$$v_n^i = \frac{\left\lVert f_{n+1}^i - f_n^i \right\rVert}{K \cdot \Delta T}$$

wherein $v_n^i$ represents the foot velocity of the side-$i$ foot of the target person in the $n$-th frame of video image, $f_n^i$ represents the foot vector of that foot in the $n$-th frame of video image, $K$ represents the number of foot key points included in $f_n^i$, $f_{n+1}^i$ represents the foot vector of that foot in the $(n+1)$-th frame of video image, $\Delta T$ represents the acquisition time interval between two adjacent frames of video images, and $i$ represents the left or right side.
In another embodiment of the present disclosure, the determining the contact condition between the foot and the ground according to the distance between the foot and the ground and the foot speed includes:
for a side foot of the target person in the n-th frame of video image, when the distance between that foot and the ground is smaller than a distance threshold and the foot speed of that foot is smaller than a speed threshold, determining that the foot is in contact with the ground.
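The two-threshold contact test can be sketched as follows; the threshold values are illustrative assumptions, as the patent leaves them unspecified:

```python
def foot_contact(dist_to_ground, foot_speed,
                 dist_thresh=0.05, speed_thresh=0.1):
    """A foot is deemed in contact with the ground when it is both close
    to the ground and nearly stationary; either condition alone is not
    enough (a swinging foot may pass close to the floor)."""
    return dist_to_ground < dist_thresh and foot_speed < speed_thresh
```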
In another embodiment of the present disclosure, the optimizing the three-dimensional coordinates of the ankle joint point in response to the contact between the foot of the target person in the $n$-th frame of video image and the ground includes:
in response to the contact between one side foot of the target person in the $n$-th frame of video image and the ground, applying the following formula, based on a first constraint condition, to optimize the three-dimensional coordinates of the ankle joint point of that foot:

$$\tilde{p}_n^i = \begin{cases} \tilde{p}_{n+1}^i, & \text{if the foot is also in contact with the ground in frame } n+1 \\ p_n^i, & \text{otherwise} \end{cases}$$

wherein $\tilde{p}_n^i$ represents the optimized three-dimensional coordinates of the ankle joint point of the side-$i$ foot of the target person in the $n$-th frame of video image, $\tilde{p}_{n+1}^i$ represents the optimized three-dimensional coordinates of the ankle joint point of that foot in the $(n+1)$-th frame of video image, $p_n^i$ represents the original three-dimensional coordinates of the ankle joint point of that foot in the $n$-th frame of video image, and $i$ represents the left or right side.
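One way to realize the equality between adjacent contact frames is a backward sweep that copies each contact frame's optimized ankle position from its successor, so an entire contact run collapses to a single position. Anchoring the run at its last original position is an assumption of this sketch:

```python
import numpy as np

def snap_ankle_positions(ankle_pos, in_contact):
    """ankle_pos: (F, 3) ankle coordinates per frame; in_contact: (F,) bools.
    Within each run of contact frames, the optimized position of frame n is
    forced equal to that of frame n+1, so the whole run shares one position
    (here, the run's last original position)."""
    out = np.array(ankle_pos, dtype=float, copy=True)
    for n in range(len(out) - 2, -1, -1):      # sweep backwards through frames
        if in_contact[n] and in_contact[n + 1]:
            out[n] = out[n + 1]                # optimized p_n = optimized p_{n+1}
    return out
```

The same backward equality pass applies to the ankle's global rotation matrices under the second constraint condition, replacing position copies with rotation-matrix copies.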
In another embodiment of the present disclosure, the optimizing the global rotation matrix of the ankle joint point in response to the contact between the foot and the ground of the target person in the nth frame of video image comprises:
in response to the contact between one side foot of the target person in the $n$-th frame of video image and the ground, applying the following formula, based on a second constraint condition, to optimize the global rotation matrix of the ankle joint point of that foot:

$$\tilde{R}_n^i = \begin{cases} \tilde{R}_{n+1}^i, & \text{if the foot is also in contact with the ground in frame } n+1 \\ R_n^i, & \text{otherwise} \end{cases}$$

wherein $\tilde{R}_n^i$ represents the optimized global rotation matrix of the ankle joint point of the side-$i$ foot of the target person in the $n$-th frame of video image, $\tilde{R}_{n+1}^i$ represents the optimized global rotation matrix of the ankle joint point of that foot in the $(n+1)$-th frame of video image, $R_n^i$ represents the original global rotation matrix of the ankle joint point of that foot in the $n$-th frame of video image, and $i$ represents the left or right side.
In a second aspect, there is provided a motion capture device, the device comprising:
the motion capture module is used for carrying out initial motion capture on a target person in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to its parent node;
the transformation module is used for transforming the local rotation matrix of each joint point relative to its parent node into a global rotation matrix of each joint point relative to the root node according to the human body skeleton structure;
the calculation module is used for calculating the distance between the foot and the ground according to the three-dimensional coordinates of the foot joint points;
the calculation module is used for calculating the foot speed according to the two-dimensional coordinates of the foot key points;
the determining module is used for determining the contact condition between the foot and the ground according to the distance between the foot and the ground and the foot speed;
the optimization module is used for, in response to the contact between the foot of the target person in the n-th frame of video image and the ground, optimizing the three-dimensional coordinates and the global rotation matrix of the ankle joint point, so that the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point of the target person in the (n+1)-th frame of video image, wherein n ≤ N, and N is the total number of frames of video images included in the video to be processed;
and the adjusting module is used for adjusting the three-dimensional coordinates and the local rotation matrix of other joint points of the target person in the nth frame of video image based on the optimized three-dimensional coordinates and the global rotation matrix of the ankle joint points to obtain the processed nth frame of video image.
In another embodiment of the present disclosure, the calculation module is configured to calculate the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point.
In another embodiment of the present disclosure, the apparatus further comprises:
the acquisition module is used for acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of each joint point for the target person in each frame of video image;
the calculation module is further used for calculating the distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated and the normal vector of the plane where the ground is located;
the computing module is further used for computing the sum of the distance between the lowest joint point of the target person in each frame of video image and the ground;
the determining module is further configured to determine the ground position coordinates and the normal vector that minimize the distance sum as the ground position.
In another embodiment of the present disclosure, the calculating module is further configured to splice the two-dimensional coordinates of the key points of each foot into a foot vector of each foot; and calculating the foot speed of each foot according to the foot vector of each foot.
In another embodiment of the present disclosure, the calculation module is configured to calculate the foot velocity of each side foot from the foot vector of that foot by applying the following formula:

$$v_n^i = \frac{\left\lVert f_{n+1}^i - f_n^i \right\rVert}{K \cdot \Delta T}$$

wherein $v_n^i$ represents the foot velocity of the side-$i$ foot of the target person in the $n$-th frame of video image, $f_n^i$ represents the foot vector of that foot in the $n$-th frame of video image, $K$ represents the number of foot key points included in $f_n^i$, $f_{n+1}^i$ represents the foot vector of that foot in the $(n+1)$-th frame of video image, $\Delta T$ represents the acquisition time interval between two adjacent frames of video images, and $i$ represents the left or right side.
In another embodiment of the present disclosure, the determining module is configured to determine, for a side foot of the target person in the nth frame of video image, that the side foot is in contact with the ground when a distance between the side foot and the ground is less than a distance threshold and a foot speed of the side foot is less than a speed threshold.
In another embodiment of the present disclosure, the optimization module is configured to, in response to the contact between one side foot of the target person in the $n$-th frame of video image and the ground, apply the following formula, based on the first constraint condition, to optimize the three-dimensional coordinates of the ankle joint point of that foot:

$$\tilde{p}_n^i = \begin{cases} \tilde{p}_{n+1}^i, & \text{if the foot is also in contact with the ground in frame } n+1 \\ p_n^i, & \text{otherwise} \end{cases}$$

wherein $\tilde{p}_n^i$ represents the optimized three-dimensional coordinates of the ankle joint point of the side-$i$ foot of the target person in the $n$-th frame of video image, $\tilde{p}_{n+1}^i$ represents the optimized three-dimensional coordinates of the ankle joint point of that foot in the $(n+1)$-th frame of video image, $p_n^i$ represents the original three-dimensional coordinates of the ankle joint point of that foot in the $n$-th frame of video image, and $i$ represents the left or right side.
In another embodiment of the present disclosure, the optimization module is configured to, in response to the contact between one side foot of the target person in the $n$-th frame of video image and the ground, apply the following formula, based on the second constraint condition, to optimize the global rotation matrix of the ankle joint point of that foot:

$$\tilde{R}_n^i = \begin{cases} \tilde{R}_{n+1}^i, & \text{if the foot is also in contact with the ground in frame } n+1 \\ R_n^i, & \text{otherwise} \end{cases}$$

wherein $\tilde{R}_n^i$ represents the optimized global rotation matrix of the ankle joint point of the side-$i$ foot of the target person in the $n$-th frame of video image, $\tilde{R}_{n+1}^i$ represents the optimized global rotation matrix of the ankle joint point of that foot in the $(n+1)$-th frame of video image, $R_n^i$ represents the original global rotation matrix of the ankle joint point of that foot in the $n$-th frame of video image, and $i$ represents the left or right side.
In a third aspect, an electronic device is provided, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the motion capture method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, the instructions in which, when executed by a processor of an electronic device, enable the electronic device to perform the motion capture method of the first aspect.
In a fifth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the motion capture method of the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
when the foot of the target person in any frame of video image is in contact with the ground, the position and orientation of the target person's ankle joint point in that frame are optimized using the position and orientation of the ankle joint point in the next frame of video image, yielding the ankle position and orientation that make the target person's postures in adjacent frames as consistent as possible; the positions and orientations of the other joint points in the frame are then adjusted according to the target person's body posture, on the basis of the optimized ankle joint point, so that the target person keeps the original posture while the feet stand stably on the ground without sliding, and the motion capture effect is better.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method of motion capture in accordance with an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of motion capture in accordance with an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a motion capture method in accordance with an exemplary embodiment.
FIG. 4 is a block diagram illustrating a motion capture device, according to an example embodiment.
Fig. 5 shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The user information to which the present disclosure relates may be information authorized by the user or sufficiently authorized by each party.
Optionally, the environment in which the solution is applied may be described first. For example, a technical solution involving multi-terminal interaction needs to specify the network environment and hardware to which it applies before the detailed description of the solution.
With the continuous development of human-computer interaction technology, natural, multi-modal interaction between humans and computers is in full swing, and motion capture, an important branch of human-computer interaction, has developed rapidly in recent years. A motion capture system detects and records the motion postures and positions of a human body or other objects, converts this information into a digital abstract model, and expresses the target's posture at different moments. Motion capture systems are widely used in motion analysis, model coding, virtual reality, animation production, intelligent monitoring, game production and other fields. For example, in an interactive game, a player's actions can drive the actions of virtual characters in the game environment, bringing the player a brand-new participation experience and enhancing the realism and interactivity of the game. In animation production, motion capture greatly improves the development efficiency and quality of animation and games while reducing their development cost. In sports training, motion capture can record quantitative information such as displacement, speed, acceleration, force and electromyographic signals of an athlete during motion; combined with machine learning and human biomechanics, the athlete's movements can be analyzed quantitatively and scientific improvements proposed. Motion capture applied to the human body therefore has broad application prospects and great commercial value. However, currently adopted motion capture methods have low capture precision, so virtual characters in the capture results exhibit foot slip, which affects the user's viewing experience.
In order to solve the problem of foot slip of a virtual character and improve the viewing experience of a user, the embodiment of the disclosure provides a motion capture method. FIG. 1 is a flow chart illustrating a motion capture method, as shown in FIG. 1, for use in an electronic device, including the following steps, according to an example embodiment.
In step S101, performing initial motion capture on a target person in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to a parent node.
In step S102, the local rotation matrix of each joint point with respect to the parent node is converted into a global rotation matrix of each joint point with respect to the root node according to the human skeleton structure.
In step S103, the distance between the foot and the ground is calculated from the three-dimensional coordinates of the foot joint point.
In step S104, the foot speed is calculated from the two-dimensional coordinates of the foot key points.
In step S105, the contact condition between the foot and the ground is determined according to the distance between the foot and the ground and the foot speed.
In step S106, in response to the contact between the foot of the target person in the n-th frame of video image and the ground, the three-dimensional coordinates and the global rotation matrix of the ankle joint point are optimized, so that the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point of the target person in the (n+1)-th frame of video image, where n ≤ N, and N is the total number of frames of video images included in the video to be processed.
In step S107, the three-dimensional coordinates and the local rotation matrix of the other joint points of the target person in the nth frame of video image are adjusted based on the optimized three-dimensional coordinates and the global rotation matrix of the ankle joint points, so as to obtain a processed nth frame of video image.
In the method provided by the embodiment of the disclosure, the three-dimensional coordinates of a joint point reflect its position and the global rotation matrix reflects its orientation. When the foot of the target person in any frame of video image contacts the ground, the position and orientation of the target person's ankle joint point in that frame are optimized using the position and orientation of the ankle joint point in the next frame of video image, yielding the ankle position and orientation that make the postures in adjacent frames as consistent as possible; the positions and orientations of the other joint points in the frame are then adjusted according to the person's posture, on the basis of the optimized ankle joint point, so that the target person keeps the original posture while the feet stand stably on the ground without sliding, and the motion capture effect is better.
In another embodiment of the present disclosure, calculating a distance between the foot and the ground from the three-dimensional coordinates of the foot joint points comprises:
The distance between the foot joint point and the ground is calculated according to the ground position and the three-dimensional coordinates of the foot joint point.
In another embodiment of the present disclosure, before calculating the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point, the method further comprises:
for a target person in each frame of video image, acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of each joint point;
calculating the distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated and the normal vector of the plane where the ground is located;
calculating the sum of the distances between the lowest joint point of the target person and the ground in each frame of video image;
the ground location coordinate and normal vector that minimizes the distance sum are determined as the ground location.
In another embodiment of the present disclosure, calculating foot velocity from two-dimensional coordinates of foot keypoints comprises:
splicing the two-dimensional coordinates of the key points of each foot part into a foot part vector of each foot part;
and calculating the foot speed of each foot according to the foot vector of each foot.
In another embodiment of the present disclosure, calculating a foot velocity for each foot from the foot vector for each foot comprises:
From the foot vector of each foot, the foot speed of each foot is calculated using the following formula:

$$v_i^n = \frac{\left\lVert f_i^{n+1} - f_i^n \right\rVert}{K \cdot \Delta T}$$

where $v_i^n$ denotes the foot speed of the side-$i$ foot of the target person in the $n$th frame of video image, $f_i^n$ denotes the foot vector of the side-$i$ foot in the $n$th frame of video image, $K$ denotes the number of foot key points included in $f_i^n$, $f_i^{n+1}$ denotes the foot vector of the side-$i$ foot in the $(n+1)$th frame of video image, $\Delta T$ denotes the acquisition time interval of two adjacent frames of video images, and $i$ denotes the left or right side.
In another embodiment of the present disclosure, determining contact between the foot and the ground based on the distance between the foot and the ground and the foot speed comprises:
For one side foot of the target person in the nth frame of video image, when the distance between that foot and the ground is smaller than a distance threshold and the foot speed of that foot is smaller than a speed threshold, the side foot is determined to be in contact with the ground.
In another embodiment of the present disclosure, in response to the foot of the target person in the nth video image contacting the ground, the optimization process of the three-dimensional coordinates of the ankle joint point includes:
In response to the contact between one side foot of the target person in the nth frame of video image and the ground, the three-dimensional coordinates of the ankle joint point of that side foot are optimized based on the first constraint condition using the following formula:

$$\min_{\tilde{p}_i^n} \left\lVert \tilde{p}_i^n - p_i^n \right\rVert^2 \quad \text{s.t.} \quad \tilde{p}_i^n = \tilde{p}_i^{n+1}$$

where $\tilde{p}_i^n$ denotes the optimized three-dimensional coordinates of the ankle joint point of the side-$i$ foot of the target person in the $n$th frame of video image, $\tilde{p}_i^{n+1}$ denotes the optimized three-dimensional coordinates of that ankle joint point in the $(n+1)$th frame of video image, $p_i^n$ denotes the three-dimensional coordinates of that ankle joint point in the $n$th frame of video image, and $i$ denotes the left or right side.
In another embodiment of the present disclosure, in response to the target person's foot in contact with the ground in the nth video image, the optimization process for the global rotation matrix of the ankle joint point includes:
In response to the contact between one side foot of the target person in the nth frame of video image and the ground, the global rotation matrix of the ankle joint point of that side foot is optimized based on the second constraint condition using the following formula:

$$\min_{\tilde{R}_i^n} \left\lVert \tilde{R}_i^n - R_i^n \right\rVert^2 \quad \text{s.t.} \quad \tilde{R}_i^n = \tilde{R}_i^{n+1}$$

where $\tilde{R}_i^n$ denotes the optimized global rotation matrix of the ankle joint point of the side-$i$ foot of the target person in the $n$th frame of video image, $\tilde{R}_i^{n+1}$ denotes the optimized global rotation matrix of that ankle joint point in the $(n+1)$th frame of video image, $R_i^n$ denotes the global rotation matrix of that ankle joint point in the $n$th frame of video image, and $i$ denotes the left or right side.
Fig. 2 is a flow chart illustrating a motion capture method for use in an electronic device, according to an example embodiment. As shown in Fig. 2, the method includes the following steps.
In step S201, a video to be processed is acquired.
The video to be processed may be a single-view video obtained by shooting the target person from one viewing angle, or a multi-view video obtained by shooting the target person from multiple viewing angles. When the video to be processed is a multi-view video, the videos of the different views need to be time-synchronized in advance. Synchronization can be achieved with cameras supporting hardware or software synchronization; alternatively, the time difference between videos shot from different viewing angles can be estimated with a sound synchronization method, and each video delayed by the corresponding time to achieve time synchronization.
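The patent does not specify the sound synchronization algorithm, but one common way to estimate the time difference between two recordings is to locate the peak of their audio cross-correlation. The sketch below illustrates that idea only; the function name `estimate_lag` and the use of NumPy are assumptions, not part of the patent.

```python
import numpy as np

def estimate_lag(audio_a, audio_b, sample_rate):
    """Estimate the time offset (in seconds) of the event in audio_a
    relative to audio_b by locating the peak of the full cross-correlation.
    A positive result means audio_a's content occurs later than audio_b's."""
    corr = np.correlate(audio_a, audio_b, mode="full")
    # In 'full' mode, zero lag sits at index len(audio_b) - 1.
    lag_samples = int(np.argmax(corr)) - (len(audio_b) - 1)
    return lag_samples / sample_rate
```

Delaying the earlier recording by the estimated lag then aligns the two views in time.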
In step S202, initial motion capture is performed on the target person in each frame of video image of the video to be processed, so as to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to the parent node.
After the video to be processed is obtained, initial motion capture is performed on the target person in each frame of video image of the video to be processed, yielding the three-dimensional coordinates of each joint point of the target person in each frame of video image and the local rotation matrix of each joint point relative to its parent node. Motion capture technologies include inertial motion capture, single-view capture, multi-view capture, and the like. The target person is the actor in the video. The local rotation matrices may be denoted $\{R_j\}_{j=1}^{J}$, where $R_j$ is the local rotation matrix of the $j$th joint point in the human skeleton, $j$ takes the values 1 to $J$, and $J$ is the number of joint points in the human skeleton. The three-dimensional coordinates of the joint points may be denoted $\{p_j\}_{j=1}^{J}$.
When the three-dimensional coordinates of each joint point of the target person and the local rotation matrix of each joint point relative to its parent node are obtained for each frame of video image, the related art directly outputs the motion capture result, or redirects it to the virtual character to be driven and outputs it. However, the accuracy of conventional motion capture methods is limited, so the virtual character reconstructed from the target person suffers from foot sliding. To solve this problem, the embodiment of the present disclosure does not directly drive the virtual character with the motion capture result; instead, through the subsequent steps, the position and orientation of the foot joint points of the virtual character are fixed while the foot of the virtual character contacts the ground, so that the foot cannot slide, which well solves the foot sliding problem of the virtual character.
It should be noted that, in the field of motion capture technology, redirection is an optional operation that maps the human skeleton defined in the motion capture algorithm to the skeleton of the virtual character to be driven. To avoid confusion, embodiments of the present disclosure do not distinguish between the joint points of the motion capture algorithm's human skeleton and those of the virtual character's skeleton; that is, if a redirection operation is performed, embodiments of the present disclosure still use $p_j$ and $R_j$ to denote the three-dimensional coordinates of each joint point on the skeleton of the redirected virtual character and the local rotation matrix of each joint point relative to its parent node.
In step S203, the local rotation matrix of each joint point with respect to the parent node is converted into a global rotation matrix of each joint point with respect to the root node according to the human skeleton structure.
Because initial motion capture yields the local rotation matrix of each joint point relative to its parent node, to facilitate subsequent calculation the embodiment of the disclosure further converts, according to the human skeleton structure, the local rotation matrix of each joint point into the global rotation matrix of each joint point relative to the root node. The global rotation matrices may be denoted $\{G_j\}_{j=1}^{J}$, and the root node is typically the pelvic joint point. Specifically, the global rotation matrix of a joint point is the product of the rotation matrices along the kinematic chain from the root node to that joint point. For example, if the root node is the pelvic joint point and the parent node of the wrist joint point is the elbow joint point, the global rotation matrix of the wrist joint point is the product of the rotation matrix of the pelvic joint point, the local rotation matrix of the elbow joint point relative to the pelvic joint point, and the local rotation matrix of the wrist joint point relative to the elbow joint point.
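The chain multiplication described above can be sketched in code. This is a minimal illustration, assuming joints are indexed so that every parent precedes its children; the names `parents` and `local_R` are invented for this sketch.

```python
import numpy as np

def local_to_global(parents, local_R):
    """Convert per-joint local rotations into global rotations by walking
    the kinematic tree from the root (parents[root] == -1).

    parents : list where parents[j] is the parent joint index of j (-1 for root)
    local_R : (J, 3, 3) array of local rotations relative to each parent
    returns : (J, 3, 3) array of global rotations relative to the root frame
    """
    J = len(parents)
    global_R = np.empty_like(local_R)
    for j in range(J):  # assumes parents precede children in index order
        if parents[j] < 0:
            global_R[j] = local_R[j]          # root: local == global
        else:
            global_R[j] = global_R[parents[j]] @ local_R[j]
    return global_R
```

For a pelvis → elbow → wrist chain this yields exactly the product of rotations from the root outward, matching the example above.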
In step S204, the distance between the foot and the ground is calculated from the three-dimensional coordinates of the foot joint point.
Since the ground position is an important reference for determining the contact condition between the foot of the target person and the ground, the ground position can be determined before this step is performed. The ground position comprises the position coordinates of the ground and the normal vector of the plane where the ground is located. In the embodiment of the disclosure, the position coordinate of the ground is the intercept of the ground on the y axis, denoted $y_{floor}$, and the normal vector of the plane where the ground is located is denoted $e$. In determining the ground location, embodiments of the present disclosure make the following assumptions:
first, it is assumed that the target person is in contact with the ground for most of the time in the video to be processed, i.e., the time for the target person to completely vacate the ground is a small percentage of the total duration of the video to be processed.
Second, assume that the camera is placed approximately parallel to the ground, i.e., the positive direction of the camera in the y-axis of the coordinate system is approximately above the actual physical space.
Based on the above two assumptions, when determining the ground position, the following method can be adopted:
the method comprises the following steps of firstly, acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of all joint points for a target person in each frame of video image.
The joint point with the minimum y-axis coordinate is obtained from the coordinates of all joint points of the target person in each frame of video image and taken as the lowest joint point of the target person, and the three-dimensional coordinates of the lowest joint point in each frame of video image are thereby obtained.
When the three-dimensional coordinates of the lowest joint point of the target person in each frame of video image are acquired, the coordinates extracted from the frames form a sequence, which may be expressed as $\{p_{low}^{1}, p_{low}^{2}, \dots, p_{low}^{N}\}$, where $p_{low}^{1}$ denotes the three-dimensional coordinates of the lowest joint point of the target person in the first frame of video image, $p_{low}^{2}$ denotes those in the second frame of video image, $p_{low}^{N}$ denotes those in the $N$th frame of video image, and $N$ is the total number of frames of video images included in the video to be processed.
And secondly, calculating the distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated and the normal vector of the plane where the ground is located.
For each frame of video image, the vector difference between the three-dimensional coordinates of the lowest joint point of the target person and the position coordinates of the ground is obtained, and the inner product of the normal vector of the plane where the ground is located and this vector difference is taken as the distance between the lowest joint point of the target person and the ground in that frame of video image.
And thirdly, calculating the sum of the distances between the lowest joint point of the target person in each frame of video image and the ground.
After the distance between the lowest joint point of the target person and the ground in each frame of video image is calculated, the distance sum between the lowest joint point of the target person and the ground in each frame of video image is obtained by adding the distances between the lowest joint point of the target person and the ground in each frame of video image.
Let the position coordinates of the ground be $P = (0, y_{floor}, 0)$; the sum of the distances between the lowest joint point of the target person in each frame of video image and the ground to be calibrated may then be expressed as $\sum_{n=1}^{N} e^{\top}\!\left(p_{low}^{n} - P\right)$.
Fourthly, the ground position coordinates and normal vector that minimize the distance sum are determined as the ground position.
In this step, the ground position coordinates and normal vector that minimize the distance sum are obtained by computing the optimal solution of the following function:

$$\min_{y_{floor},\, e} \; \sum_{n=1}^{N} e^{\top}\!\left(p_{low}^{n} - P\right)$$

By solving this optimization problem, the $y_{floor}$ and normal vector $e$ that minimize the distance sum are obtained and determined as the ground position. To improve the robustness of the algorithm to outliers, embodiments of the present disclosure may also constrain the energy term $e^{\top}(p_{low}^{n} - P)$ in the above equation with a Huber function.
It should be noted that, if the position of the camera remains unchanged, the ground position determination result may be reused, that is, the ground position does not need to be determined again, and after the ground position is determined this time, subsequent calculation may be performed based on the determination result this time. In determining the ground location, a set of videos may be previously captured for determining the ground location.
When the target person stands or moves, the lowest joint point is generally a foot joint point. By minimizing the sum of the distances between the lowest joint point of the target person and the ground, the calibrated ground is the plane on which the target person's feet rest for most of the time, which facilitates the subsequent judgment of the contact condition between the feet of the target person and the ground.
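The calibration above can be sketched as follows. The patent minimizes the distance sum directly (optionally with a Huber loss); this sketch instead uses a closed-form least-squares plane fit via SVD, which is an assumed simplification, together with the point-to-plane distance used in the subsequent contact test. The camera-parallel assumption is reflected in orienting the normal toward +y.

```python
import numpy as np

def calibrate_ground(lowest_pts):
    """Fit a ground plane to the per-frame lowest joint positions.

    lowest_pts : (N, 3) array, one lowest-joint coordinate per frame.
    Returns (y_floor, e): the plane's y-axis intercept and unit normal.
    """
    centroid = lowest_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(lowest_pts - centroid)
    e = vt[-1]                 # direction of least variance = plane normal
    if e[1] < 0:               # orient toward +y (camera roughly parallel to ground)
        e = -e
    y_floor = float(e @ centroid) / e[1]   # intercept of the plane on the y axis
    return y_floor, e

def point_ground_distance(p, y_floor, e):
    """Signed distance from point p to the plane through (0, y_floor, 0) with normal e."""
    return float(e @ (p - np.array([0.0, y_floor, 0.0])))
```

With the plane fixed, `point_ground_distance` gives the per-frame foot-to-ground distances used in step S204.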
Based on the determined ground position, the distance between a foot joint point and the ground can be calculated from the ground position and the three-dimensional coordinates of the foot joint point. For example, for the $n$th frame of video image, if the three-dimensional coordinates of the left foot joint point of the target person are $p_{left}^{n}$, the distance between the left foot joint point of the target person and the ground is $d_{left}^{n} = e^{\top}\!\left(p_{left}^{n} - P\right)$. As another example, if the three-dimensional coordinates of the right foot joint point of the target person in the $n$th frame of video image are $p_{right}^{n}$, the distance between the right foot joint point of the target person and the ground is $d_{right}^{n} = e^{\top}\!\left(p_{right}^{n} - P\right)$.
The embodiment of the disclosure can calculate the distance between the foot joint point and the ground by applying a point-to-plane distance formula based on the calibrated ground position and the three-dimensional coordinates of the foot joint point, thereby providing a method for calculating the distance between the foot joint point and the ground.
In step S205, the foot speed is calculated from the two-dimensional coordinates of the foot key points.
Before this step is performed, the foot key points may be defined in advance according to the processing requirements for the video image. The foot key points comprise the big toe joint point, the little toe joint point, the heel joint point, the ankle joint point, and the like. The key points can be detected with a 2D (two-dimensional) key point detection algorithm, such as the common OpenPose or another algorithm. When a 2D key point detection algorithm is used for detection, the 2D coordinates of joint points that are not detected are set to (0, 0).
For the target person in each frame of video image, when the foot speed is calculated according to the two-dimensional coordinates of the foot key points, the following method can be adopted:
2051. and splicing the two-dimensional coordinates of the key points of each foot part into the foot part vector of each foot part.
For any frame of video image, the two-dimensional coordinates of the key points of the left foot of the target person in that frame are spliced into a first foot vector of the left foot; for example, the first foot vector of the left foot of the target person in the $n$th frame of video image may be denoted $f_{left}^{n}$. Likewise, the two-dimensional coordinates of the key points of the right foot of the target person in that frame are spliced into a second foot vector of the right foot, denoted $f_{right}^{n}$ for the $n$th frame of video image.
2052. And calculating the foot speed of each foot according to the foot vector of each foot.
For the left foot of the target person in the $n$th frame of video image, the following formula can be applied to calculate the foot speed of the left foot:

$$v_{left}^{n} = \frac{\left\lVert f_{left}^{n+1} - f_{left}^{n} \right\rVert}{K \cdot \Delta T}$$

where $v_{left}^{n}$ denotes the foot speed of the left foot of the target person in the $n$th frame of video image, $K$ denotes the number of foot key points included in $f_{left}^{n}$, $\Delta T$ denotes the acquisition time interval of two adjacent frames of video images, $f_{left}^{n}$ denotes the first foot vector of the left foot of the target person in the $n$th frame of video image, and $f_{left}^{n+1}$ denotes the first foot vector of the left foot of the target person in the $(n+1)$th frame of video image.
For the right foot of the target person in the $n$th frame of video image, the following formula can be applied to calculate the foot speed of the right foot:

$$v_{right}^{n} = \frac{\left\lVert f_{right}^{n+1} - f_{right}^{n} \right\rVert}{K \cdot \Delta T}$$

where $v_{right}^{n}$ denotes the foot speed of the right foot of the target person in the $n$th frame of video image, $K$ denotes the number of foot key points included in $f_{right}^{n}$, $\Delta T$ denotes the acquisition time interval of two adjacent frames of video images, $f_{right}^{n}$ denotes the second foot vector of the right foot of the target person in the $n$th frame of video image, and $f_{right}^{n+1}$ denotes the second foot vector of the right foot of the target person in the $(n+1)$th frame of video image.
The embodiment of the disclosure calculates each foot's speed from the foot vectors spliced from the key points of that foot in two adjacent frames of video images, the acquisition time interval between the two frames, and the number of key points of that foot. The relevant parameters for calculating the speed of each foot can thus be obtained directly, without tools such as sensors, which saves cost and makes the calculation result more accurate.
The embodiment of the disclosure thus provides a foot speed calculation method: based on the detected key points of each side foot, the two-dimensional coordinates of those key points are spliced into a foot vector describing the characteristics of that foot, and the foot speed of each side foot is then calculated from its foot vector.
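A minimal sketch of the per-foot speed formula above; the function name `foot_speed` is invented for illustration.

```python
import numpy as np

def foot_speed(foot_vec_n, foot_vec_n1, num_keypoints, dt):
    """Foot speed between consecutive frames, per the formula
    v = ||f^{n+1} - f^n|| / (K * dT).

    foot_vec_n, foot_vec_n1 : stacked 2D keypoint coordinates (length 2K)
    num_keypoints           : K, the number of foot key points in the vector
    dt                      : acquisition interval between the two frames
    """
    return float(np.linalg.norm(foot_vec_n1 - foot_vec_n)) / (num_keypoints * dt)
```

The same function serves both sides; only the foot vector passed in differs.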
In step S206, the contact condition between the foot and the ground is determined according to the distance between the foot and the ground and the foot speed.
When determining the contact condition between the foot of the target person and the ground in each frame of video image, it is assumed in advance that the foot does not slide while in contact with the ground; on this assumption, the contact condition can be determined from the foot speed and the distance between the foot and the ground. For one side foot of the target person in the $n$th frame of video image, when the distance between that foot and the ground is smaller than a distance threshold and the foot speed of that foot is smaller than a speed threshold, the foot is determined to be in contact with the ground. The distance threshold, denoted $y_{th}$, may be 1 mm, 2 mm, or the like; the speed threshold, denoted $v_{th}$, may be 1 mm/s, 2 mm/s, or the like.
The determination of the contact condition is described separately below for the left and right feet of the target person in the $n$th frame of video image.
For the left foot of the target person in the $n$th frame of video image, when the distance between the left foot and the ground is smaller than the distance threshold and the foot speed of the left foot is smaller than the speed threshold, the left foot is determined to be in contact with the ground. Let $c_{left}^{n}$ denote the contact condition between the left foot of the target person and the ground in the $n$th frame of video image, where $c_{left}^{n} = 1$ indicates that the left foot of the target person is in contact with the ground and $c_{left}^{n} = 0$ indicates that it is not; $c_{left}^{n}$ can then be calculated by the following formula:

$$c_{left}^{n} = \mathbb{1}\!\left[\, d_{left}^{n} < y_{th} \;\text{and}\; v_{left}^{n} < v_{th} \,\right]$$

where $d_{left}^{n}$ denotes the distance between the left foot of the target person and the ground in the $n$th frame of video image, computed from the three-dimensional coordinates $p_{left}^{n}$ of the left foot joint point, and $v_{left}^{n}$ denotes the foot speed of the left foot.
For the right foot of the target person in the $n$th frame of video image, when the distance between the right foot and the ground is smaller than the distance threshold and the foot speed of the right foot is smaller than the speed threshold, the right foot is determined to be in contact with the ground. Let $c_{right}^{n}$ denote the contact condition between the right foot of the target person and the ground in the $n$th frame of video image, where $c_{right}^{n} = 1$ indicates that the right foot of the target person is in contact with the ground and $c_{right}^{n} = 0$ indicates that it is not; $c_{right}^{n}$ can then be calculated by the following formula:

$$c_{right}^{n} = \mathbb{1}\!\left[\, d_{right}^{n} < y_{th} \;\text{and}\; v_{right}^{n} < v_{th} \,\right]$$

where $d_{right}^{n}$ denotes the distance between the right foot of the target person and the ground in the $n$th frame of video image, computed from the three-dimensional coordinates $p_{right}^{n}$ of the right foot joint point, and $v_{right}^{n}$ denotes the foot speed of the right foot.
Both the distance between the foot and the ground and the foot speed reflect the contact condition: when the distance between the foot and the ground is large the foot is not in contact with the ground, and when the speed of the foot relative to the ground is large the foot is likewise not in contact. The embodiment of the disclosure therefore determines, for each side foot, that the foot is in contact with the ground only when both the distance and the foot speed satisfy the requirements, which improves the accuracy of the determination result.
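The two-threshold contact test can be sketched as follows; the default threshold values are the illustrative 2 mm and 2 mm/s mentioned in the text, expressed in metres and metres per second.

```python
def foot_in_contact(distance, speed, dist_thresh=0.002, speed_thresh=0.002):
    """One side foot is judged in contact with the ground only when it is
    both close to the ground AND nearly stationary; failing either test
    means no contact. Thresholds are illustrative, in m and m/s."""
    return distance < dist_thresh and speed < speed_thresh
```

The same test is applied independently to the left and right feet in every frame.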
In step S207, in response to the contact between the foot of the target person in the nth frame of video image and the ground, the three-dimensional coordinates and the global rotation matrix of the ankle joint point are optimized, so that the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point of the target person in the (n+1)th frame of video image, where n ≤ N and N is the total number of frames of video images included in the video to be processed.
As noted above, when judging whether the foot of the target person in each frame of video image contacts the ground, it is assumed that the foot does not slide while in contact with the ground. In reality, however, the foot retains some speed during the motion of the target person, so the foot may slide while in contact with the ground. To solve this problem, the embodiment of the present disclosure further optimizes the three-dimensional coordinates and rotation matrix of the foot joint points of the target person in each frame of video image, so that they are fixed while the foot of the target person in the video image is in contact with the ground. In the field of motion capture, the foot is treated as a rigid body that rotates with the rotation of the ankle joint point, so the optimization problem of the foot joint points can be converted into an optimization of the ankle joint point alone.
The optimization process of the three-dimensional coordinates of the ankle joint point of the left foot of the target person in the nth frame video image comprises the following steps:
In response to the contact between the left foot of the target person in the $n$th frame of video image and the ground, the three-dimensional coordinates of the ankle joint point of the left foot of the target person in the $n$th frame of video image are optimized based on the first constraint condition using the following formula:

$$\min_{\tilde{p}_{left}^{n}} \left\lVert \tilde{p}_{left}^{n} - p_{left}^{n} \right\rVert^{2} \quad \text{s.t.} \quad \tilde{p}_{left}^{n} = \tilde{p}_{left}^{n+1}$$

where $\tilde{p}_{left}^{n}$ denotes the optimized three-dimensional coordinates of the ankle joint point of the left foot of the target person in the $n$th frame of video image, $\tilde{p}_{left}^{n+1}$ denotes the optimized three-dimensional coordinates of the ankle joint point of the left foot in the $(n+1)$th frame of video image, and $p_{left}^{n}$ denotes the three-dimensional coordinates of the ankle joint point of the left foot obtained by motion capture in the $n$th frame of video image. The energy function $\lVert \tilde{p}_{left}^{n} - p_{left}^{n} \rVert^{2}$ constrains the optimized three-dimensional coordinates of the ankle joint point of the left foot to be as consistent as possible with the three-dimensional coordinates obtained by motion capture, and the first constraint condition is that, when the left foot of the target person in the $n$th frame of video image is in contact with the ground, the optimized three-dimensional coordinates of the ankle joint point of the left foot in the $n$th frame of video image are equal to the optimized three-dimensional coordinates of the ankle joint point of the left foot in the $(n+1)$th frame of video image.
The optimization process of the three-dimensional coordinates of the ankle joint point of the right foot of the target person in the nth frame video image comprises the following steps:
In response to the contact between the right foot of the target person in the $n$th frame of video image and the ground, the three-dimensional coordinates of the ankle joint point of the right foot of the target person in the $n$th frame of video image are optimized based on the first constraint condition using the following formula:

$$\min_{\tilde{p}_{right}^{n}} \left\lVert \tilde{p}_{right}^{n} - p_{right}^{n} \right\rVert^{2} \quad \text{s.t.} \quad \tilde{p}_{right}^{n} = \tilde{p}_{right}^{n+1}$$

where $\tilde{p}_{right}^{n}$ denotes the optimized three-dimensional coordinates of the ankle joint point of the right foot of the target person in the $n$th frame of video image, $\tilde{p}_{right}^{n+1}$ denotes the optimized three-dimensional coordinates of the ankle joint point of the right foot in the $(n+1)$th frame of video image, and $p_{right}^{n}$ denotes the three-dimensional coordinates of the ankle joint point of the right foot obtained by motion capture in the $n$th frame of video image. The energy function $\lVert \tilde{p}_{right}^{n} - p_{right}^{n} \rVert^{2}$ constrains the optimized three-dimensional coordinates of the ankle joint point of the right foot to be as consistent as possible with the three-dimensional coordinates obtained by motion capture, and the first constraint condition is that, when the right foot of the target person in the $n$th frame of video image is in contact with the ground, the optimized three-dimensional coordinates of the ankle joint point of the right foot in the $n$th frame of video image are equal to the optimized three-dimensional coordinates of the ankle joint point of the right foot in the $(n+1)$th frame of video image.
It should be noted that the ankle joint point in the embodiments of the present disclosure is one of the foot joint points. Taking the left foot as an example, the three-dimensional coordinates of the left foot joint point of the target person in the nth frame of video image are taken to be the three-dimensional coordinates of the ankle joint point of the left foot, a typical joint point of the foot. Of course, if the amount of computation is not a concern, every joint point of the left foot can be optimized, in which case the foot joint symbols denote all of the left foot joint points.
The embodiments of the present disclosure optimize the three-dimensional coordinates of the ankle joint point of the target person in the current frame of video image according to the three-dimensional coordinates of the ankle joint point of the target person in the next frame of video image, so that the ankle joint points of the target person in adjacent frames are as consistent as possible in position. The three-dimensional coordinates of the ankle joint point in the current frame are the basis for subsequently adjusting the three-dimensional coordinates of the other joint points. Optimizing them makes the ankle coordinates more accurate, so that when the other joint points are adjusted on the basis of the ankle joint point, their adjusted three-dimensional coordinates are also more accurate, which further improves the accuracy of the motion capture result.
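The backward propagation implied by the first constraint can be sketched in a few lines. This is a simplifying illustration, not the disclosure's exact energy minimization: the function name, the (N, 3) array layout, and the hard-copy strategy are assumptions.

```python
import numpy as np

def smooth_ankle_positions(captured, contact):
    """captured: (N, 3) ankle coordinates per frame; contact: length-N
    booleans. When frame n is in contact, its optimized position is tied
    to the optimized position of frame n+1, so we sweep backwards and
    copy positions through each run of contacting frames."""
    optimized = captured.copy()
    for n in range(len(captured) - 2, -1, -1):
        if contact[n]:
            optimized[n] = optimized[n + 1]
    return optimized
```

Sweeping backwards ensures each contacting frame inherits the already-optimized position of its successor, so a whole contact run collapses to a single planted position.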
The optimization process for the global rotation matrix of the ankle joint point of the left foot of the target person in the nth frame of video image is as follows:

In response to the left foot of the target person contacting the ground in the nth frame of video image, the following formula is applied, based on the second constraint condition, to optimize the global rotation matrix of the ankle joint point of the left foot:

[equation image: joint minimization of the first and second energy functions]

where R̂_l^n represents the optimized global rotation matrix of the ankle joint point of the left foot of the target person in the nth frame of video image, R̂_l^{n+1} represents the optimized global rotation matrix of the ankle joint point of the left foot of the target person in the (n+1)th frame of video image, and R_l^n represents the global rotation matrix of the ankle joint point of the left foot of the target person in the nth frame of video image. The first energy function E_1 and the second energy function E_2 are used for constraining the global rotation matrix of the ankle joint point of the left foot, so that the optimized global rotation matrix of the ankle joint point of the left foot remains as consistent as possible with the global rotation matrix of the ankle joint point of the left foot obtained by motion capture. The second constraint condition is that, when the left foot of the target person in the nth frame of video image is in contact with the ground, the optimized global rotation matrix of the ankle joint point of the left foot of the target person in the nth frame of video image is equal to the optimized global rotation matrix of the ankle joint point of the left foot of the target person in the (n+1)th frame of video image.
The optimization process for the global rotation matrix of the ankle joint point of the right foot of the target person in the nth frame of video image is as follows:

In response to the right foot of the target person contacting the ground in the nth frame of video image, the following formula is applied, based on the second constraint condition, to optimize the global rotation matrix of the ankle joint point of the right foot:

[equation image: joint minimization of the first and second energy functions]

where R̂_r^n represents the optimized global rotation matrix of the ankle joint point of the right foot of the target person in the nth frame of video image, R̂_r^{n+1} represents the optimized global rotation matrix of the ankle joint point of the right foot of the target person in the (n+1)th frame of video image, and R_r^n represents the global rotation matrix of the ankle joint point of the right foot of the target person in the nth frame of video image. The first energy function E_1 and the second energy function E_2 are used for constraining the global rotation matrix of the ankle joint point of the right foot, so that the optimized global rotation matrix of the ankle joint point of the right foot remains as consistent as possible with the global rotation matrix of the ankle joint point of the right foot obtained by motion capture. The second constraint condition is that, when the right foot of the target person in the nth frame of video image is in contact with the ground, the optimized global rotation matrix of the ankle joint point of the right foot of the target person in the nth frame of video image is equal to the optimized global rotation matrix of the ankle joint point of the right foot of the target person in the (n+1)th frame of video image.
According to the global rotation matrix of the ankle joint point of the target person in the next frame of video image, the global rotation matrix of the ankle joint point of the target person in the current frame of video image is optimized, so that the ankle joint points of the target person in adjacent frames are as consistent as possible in orientation. The global rotation matrix of the ankle joint point in the current frame is the basis for subsequently adjusting the rotation matrices of the other joint points. Optimizing it makes the global rotation matrix of the ankle joint point more accurate, so that when the other joint points are adjusted on the basis of the ankle joint point, their adjusted local rotation matrices are also more accurate, which further improves the accuracy of the motion capture result.
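A step the passage relies on but does not spell out is recovering a joint's local rotation once its global rotation has been optimized. A minimal sketch, assuming 3x3 rotation matrices and the convention R_global = R_parent_global · R_local:

```python
import numpy as np

def local_from_global(parent_global, joint_global):
    """Recover the joint's local rotation from its optimized global
    rotation: R_local = R_parent_global^T @ R_global (the transpose
    is the inverse of a rotation matrix)."""
    return parent_global.T @ joint_global
```

With this relation, fixing the ankle's global orientation across contacting frames determines the updated local rotation that the skeleton hierarchy actually stores.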
In step S208, the three-dimensional coordinates and the local rotation matrix of other joint points of the target person in the nth frame of video image are adjusted based on the optimized three-dimensional coordinates and the global rotation matrix of the ankle joint point, so as to obtain a processed nth frame of video image.
Based on the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point, the three-dimensional coordinates and local rotation matrices of the other joint points of the target person in the nth frame of video image are adjusted by an inverse kinematics method, yielding the processed nth frame of video image. Because the three-dimensional coordinates and local rotation matrix of every joint point in the processed frame are adjusted on the basis of the optimized ankle joint point, the joint points of the target person move as a whole, so the foot-sliding problem of the target person in the adjusted video image can be resolved. Common 3D engines such as Unity and Unreal provide well-established inverse kinematics tools.
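The disclosure defers this adjustment to engine tooling. As a hedged, self-contained illustration of the underlying idea (not Unity's or Unreal's actual API), a planar two-bone solver computes hip and knee angles that place the ankle at a target position using the law of cosines:

```python
import math

def two_link_ik(l1, l2, x, y):
    """Hip at the origin, thigh length l1, shin length l2, target ankle
    position (x, y). Returns (hip, knee) angles in radians, where the
    knee angle is measured as the bend away from a straight leg."""
    d = math.sqrt(x * x + y * y)
    d = min(d, l1 + l2)  # clamp unreachable targets to full extension
    cos_knee = (l1 ** 2 + l2 ** 2 - d ** 2) / (2 * l1 * l2)
    knee = math.pi - math.acos(max(-1.0, min(1.0, cos_knee)))
    cos_a = (l1 ** 2 + d ** 2 - l2 ** 2) / (2 * l1 * d)
    a = math.acos(max(-1.0, min(1.0, cos_a)))
    hip = math.atan2(y, x) - a
    return hip, knee
```

In the foot-planting context, the target (x, y) would be the optimized ankle position, and the hip position and bone lengths come from the captured skeleton.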
The processing procedure of the video image will be described below with reference to fig. 3 as an example.
Referring to fig. 3, an input single-view or multi-view video is first processed with a motion capture method to obtain the three-dimensional coordinates and the local rotation matrix of each joint point in the human skeleton. The local rotation matrix of each joint point is converted into a global rotation matrix, and ground calibration is performed based on the obtained three-dimensional coordinates and global rotation matrices to obtain the ground position and orientation. A 2D key point detection method is used to obtain the 2D key point coordinates of the feet. Whether each foot is in contact with the ground is then judged from the ground position and orientation, the 2D key point coordinates of the feet, and the three-dimensional coordinates and global rotation matrices of the joint points; the three-dimensional coordinates and global rotation matrix of the foot joint are optimized accordingly, and inverse kinematics is applied to obtain the processed video image.
In the method provided by the embodiments of the present disclosure, the three-dimensional coordinates of a joint point of the target person reflect the position of that joint point, and the global rotation matrix reflects its orientation. When the foot of the target person contacts the ground in any frame of video image, the position and orientation of the ankle joint point in that frame are optimized according to the position and orientation of the ankle joint point in the next frame, so that the postures of the target person in adjacent frames are as consistent as possible. The positions and orientations of the other joint points in the frame are then adjusted, according to the posture of the target person, on the basis of the optimized position and orientation of the ankle joint point. As a result, the target person in the frame keeps the original posture, the feet stand stably on the ground without sliding, and the motion capture effect is better.
FIG. 4 is a block diagram illustrating a motion capture device, according to an example embodiment. Referring to fig. 4, the apparatus includes:
the motion capture module 401 is configured to perform initial motion capture on a target person in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to a parent node;
a conversion module 402, configured to convert, according to a human skeleton structure, a local rotation matrix of each joint point relative to a parent node into a global rotation matrix of each joint point relative to a root node;
a calculating module 403, configured to calculate a distance between the foot and the ground according to the three-dimensional coordinates of the foot joint point;
a calculating module 403, configured to calculate a foot speed according to the two-dimensional coordinates of the foot key points;
a determining module 404, configured to determine a contact condition between the foot and the ground according to a distance between the foot and the ground and a foot speed;
an optimizing module 405, configured to optimize the three-dimensional coordinates and global rotation matrix of an ankle joint point in response to contact between a foot of the target person in an nth frame of video image and the ground, so that the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point are equal, respectively, to the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point of the target person in the (n+1)th frame of video image, where n is less than or equal to N, and N is the total number of frames of video images included in the video to be processed;
and an adjusting module 406, configured to adjust three-dimensional coordinates and local rotation matrices of other joint points of the target person in the nth frame of video image based on the optimized three-dimensional coordinates and global rotation matrix of the ankle joint point, so as to obtain a processed nth frame of video image.
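The conversion performed by module 402 — composing each joint's local rotation with those of its ancestors — can be sketched as follows. The 3x3-matrix representation, the topologically ordered parent list, and the -1 root marker are assumptions for illustration:

```python
import numpy as np

def globalize(local_rots, parents):
    """local_rots[j]: 3x3 local rotation of joint j relative to its
    parent; parents[j]: parent index (-1 for the root). Parents must
    appear before their children. Returns global rotations relative
    to the root: R_global[j] = R_global[parent] @ R_local[j]."""
    global_rots = [None] * len(local_rots)
    for j, p in enumerate(parents):
        global_rots[j] = local_rots[j] if p < 0 else global_rots[p] @ local_rots[j]
    return global_rots
```

Walking the joints in parent-before-child order means each joint's ancestors are already globalized when it is visited, so one pass suffices.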
In another embodiment of the present disclosure, the calculating module 403 is configured to calculate the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point.
In another embodiment of the present disclosure, the apparatus further comprises:
the acquisition module is used for acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of each joint point for the target person in each frame of video image;
the calculation module 403 is further configured to calculate a distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated, and a normal vector of a plane where the ground is located;
the calculating module 403 is further configured to calculate a sum of distances between a lowest joint point of the target person in each frame of video image and the ground;
the determining module 404 is further configured to determine, as the ground position, the ground position coordinate and the normal vector that minimize the sum of the distances.
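The ground calibration these modules describe — take the lowest joint per frame, then find the plane minimizing the summed distances — admits a closed-form least-squares sketch. The SVD formulation and the y-up sign convention here are illustrative choices, not the disclosure's exact method:

```python
import numpy as np

def calibrate_ground(lowest_points):
    """Fit the ground plane to the per-frame lowest joint points.
    Returns a point on the plane (the centroid) and a unit normal,
    found as the direction of least variance of the centered points."""
    pts = np.asarray(lowest_points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]          # last right-singular vector
    if normal[1] < 0:        # assume y-up; fix the sign for consistency
        normal = -normal
    return centroid, normal
```

The signed distance of any joint to this plane is then the dot product of (joint - centroid) with the normal, which is what the distance calculation in module 403 needs.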
In another embodiment of the present disclosure, the calculating module 403 is further configured to splice the two-dimensional coordinates of the key points of each foot into a foot vector of each foot; and calculating the foot speed of each foot according to the foot vector of each foot.
In another embodiment of the present disclosure, the calculating module 403 is configured to calculate the foot speed of each side's foot by applying the following formula to the foot vector of that side's foot:

[equation image]

where V_i^n represents the foot speed of one side's foot of the target person in the nth frame of video image, F_i^n represents the foot vector of that side's foot of the target person in the nth frame of video image, K represents the number of foot key points included in F_i^n, F_i^{n+1} represents the foot vector of that side's foot of the target person in the (n+1)th frame of video image, ΔT represents the acquisition time interval between two adjacent frames of video images, and i denotes the left side or the right side.
In another embodiment of the present disclosure, the determining module 404 is configured to determine, for one side's foot of the target person in the nth frame of video image, that the foot is in contact with the ground when the distance between that foot and the ground is less than a distance threshold and its foot speed is less than a speed threshold.
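The determining module's test can be sketched directly. The mean-key-point-displacement definition of speed and the threshold values below are assumptions for illustration, not values from the disclosure:

```python
import numpy as np

def foot_speed(f_prev, f_next, dt):
    """Foot vector = stacked 2D key points, shape (K, 2); speed is the
    mean per-key-point displacement divided by the frame interval dt."""
    return np.linalg.norm(f_next - f_prev, axis=1).mean() / dt

def in_contact(dist_to_ground, speed, d_thr=0.02, v_thr=0.5):
    # both conditions must hold: near the ground AND nearly stationary
    return dist_to_ground < d_thr and speed < v_thr
```

Using 2D key-point speed rather than 3D joint speed makes the stillness test robust to depth jitter in the captured 3D coordinates.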
In another embodiment of the present disclosure, the optimizing module 405 is configured to, in response to one side's foot of the target person contacting the ground in the nth frame of video image, optimize the three-dimensional coordinates of the ankle joint point of that foot by applying the following formula based on the first constraint condition:

[equation image]

where P̂_i^n represents the optimized three-dimensional coordinates of the ankle joint point of that side's foot of the target person in the nth frame of video image, P̂_i^{n+1} represents the optimized three-dimensional coordinates of the ankle joint point of that side's foot of the target person in the (n+1)th frame of video image, P_i^n represents the three-dimensional coordinates of the ankle joint point of that side's foot of the target person in the nth frame of video image, and i denotes the left side or the right side.
In another embodiment of the present disclosure, the optimization module 405 is configured to, in response to one side's foot of the target person contacting the ground in the nth frame of video image, optimize the global rotation matrix of the ankle joint point of that foot by applying the following formula based on the second constraint condition:

[equation image]

where R̂_i^n represents the optimized global rotation matrix of the ankle joint point of that side's foot of the target person in the nth frame of video image, R̂_i^{n+1} represents the optimized global rotation matrix of the ankle joint point of that side's foot of the target person in the (n+1)th frame of video image, R_i^n represents the global rotation matrix of the ankle joint point of that side's foot of the target person in the nth frame of video image, and i denotes the left side or the right side.
With the device provided by the embodiments of the present disclosure, the three-dimensional coordinates of a joint point of the target person reflect the position of that joint point, and the global rotation matrix reflects its orientation. When the foot of the target person contacts the ground in any frame of video image, the position and orientation of the ankle joint point in that frame are optimized according to the position and orientation of the ankle joint point in the next frame, so that the postures of the target person in adjacent frames are as consistent as possible. The positions and orientations of the other joint points in the frame are then adjusted, according to the posture of the target person, on the basis of the optimized position and orientation of the ankle joint point. As a result, the target person in the frame keeps the original posture, the feet stand stably on the ground without sliding, and the motion capture effect is better.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 5 shows a block diagram of an electronic device 500 according to an exemplary embodiment of the present disclosure. In general, the electronic device 500 includes: a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 501 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a Central Processing Unit (CPU), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 502 is used to store at least one instruction for execution by processor 501 to implement the motion capture method provided by method embodiments in the present disclosure.
In some embodiments, the electronic device 500 may further optionally include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502 and peripheral interface 503 may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface 503 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: a power supply 504.
The peripheral interface 503 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 501 and the memory 502. In some embodiments, the processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The power supply 504 is used to power the various components in the electronic device 500. The power source 504 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power supply 504 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
Those skilled in the art will appreciate that the configuration shown in fig. 5 is not intended to be limiting of the electronic device 500 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of the electronic device 500 to perform the above-described motion capture method is also provided. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Embodiments of the present disclosure provide a computer-readable storage medium, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the above-described motion capture method.
According to an embodiment of the present disclosure, a computer program product is provided, comprising a computer program, which when executed by a processor, implements the above-mentioned motion capture method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of motion capture, the method comprising:
carrying out initial motion capture on a target figure in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target figure in each frame of video image and a local rotation matrix of each joint point relative to a father node;
converting the local rotation matrix of each joint point relative to a father node into a global rotation matrix of each joint point relative to a root node according to a human body skeleton structure;
calculating the distance between the foot and the ground according to the three-dimensional coordinates of the foot joint points;
calculating the foot speed according to the two-dimensional coordinates of the foot key points;
determining the contact condition between the foot and the ground according to the distance between the foot and the ground and the foot speed;
responding to the contact of the foot of the target person in the nth frame of video image with the ground, and optimizing the three-dimensional coordinates and the global rotation matrix of the ankle joint point, so that the optimized three-dimensional coordinates and the optimized global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and the optimized global rotation matrix of the ankle joint point of the target person in the (n+1)th frame of video image, wherein n is less than or equal to N, and N is the total number of frames of video images included in the video to be processed;
and adjusting the three-dimensional coordinates and local rotation matrixes of other joint points of the target person in the nth frame of video image based on the optimized three-dimensional coordinates and global rotation matrixes of the ankle joint points to obtain the processed nth frame of video image.
2. The motion capture method of claim 1, wherein calculating the distance between the foot and the ground based on the three-dimensional coordinates of the foot joint points comprises:
and calculating the distance between the foot joint point and the ground according to the ground position and the three-dimensional coordinates of the foot joint point.
3. The motion capture method of claim 2, wherein prior to calculating the distance between the foot joint point and the ground based on the ground location and the three-dimensional coordinates of the foot joint point, further comprising:
for a target person in each frame of video image, acquiring the three-dimensional coordinates of the lowest joint point according to the three-dimensional coordinates of each joint point;
calculating the distance between the lowest joint point and the ground according to the three-dimensional coordinate of the lowest joint point, the position coordinate of the ground to be calibrated and the normal vector of the plane where the ground is located;
calculating the sum of the distances between the lowest joint point of the target person in each frame of video image and the ground;
determining the ground position coordinates and the normal vector that minimize the distance sum as the ground position.
4. The motion capture method of claim 1, wherein the calculating foot velocities from the two-dimensional coordinates of the foot keypoints comprises:
splicing the two-dimensional coordinates of the key points of each foot part into a foot part vector of each foot part;
and calculating the foot speed of each foot according to the foot vector of each foot.
5. The motion capture method of claim 4, wherein the calculating the foot velocity for the each foot from the foot vector for the each foot comprises:
calculating the foot speed of each side's foot by applying the following formula to the foot vector of that side's foot:

[equation image]

wherein V_i^n represents the foot speed of one side's foot of the target person in the nth frame of video image, F_i^n represents the foot vector of that side's foot of the target person in the nth frame of video image, K represents the number of foot key points included in F_i^n, F_i^{n+1} represents the foot vector of that side's foot of the target person in the (n+1)th frame of video image, ΔT represents the acquisition time interval between two adjacent frames of video images, and i represents the left side or the right side.
6. The motion capture method of claim 1, wherein determining the contact between the foot and the ground based on the distance between the foot and the ground and the foot velocity comprises:
and for a side foot of the target person in the nth frame of video image, when the distance between the side foot and the ground is smaller than a distance threshold value and the foot speed of the side foot is smaller than a speed threshold value, determining that the side foot is in contact with the ground.
7. A motion capture device, the device comprising:
the motion capture module is used for carrying out initial motion capture on a target person in each frame of video image of a video to be processed to obtain three-dimensional coordinates of each joint point of the target person in each frame of video image and a local rotation matrix of each joint point relative to a father node;
the transformation module is used for transforming the local rotation matrix of each joint point relative to the father node into a global rotation matrix of each joint point relative to the root node according to the human body skeleton structure;
the calculation module is used for calculating the distance between the foot and the ground according to the three-dimensional coordinates of the foot joint points;
the calculation module is used for calculating the foot speed according to the two-dimensional coordinates of the foot key points;
the determining module is used for determining the contact condition between the foot and the ground according to the distance between the foot and the ground and the foot speed;
the optimization module is used for responding to the contact between the foot of a target person in the nth frame of video image and the ground, and optimizing the three-dimensional coordinates and the global rotation matrix of the ankle joint point, so that the optimized three-dimensional coordinates and the optimized global rotation matrix of the ankle joint point are respectively equal to the optimized three-dimensional coordinates and the optimized global rotation matrix of the ankle joint point of the target person in the (n+1)th frame of video image, wherein n is less than or equal to N, and N is the total number of frames of video images included in the video to be processed;
and the adjusting module is used for adjusting the three-dimensional coordinates and the local rotation matrix of other joint points of the target person in the nth frame of video image based on the optimized three-dimensional coordinates and the global rotation matrix of the ankle joint points to obtain the processed nth frame of video image.
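The transformation module of claim 7 is, in effect, forward composition of rotations down the kinematic tree: a joint's global rotation relative to the root is its parent's global rotation multiplied by its own local rotation. A minimal pure-Python sketch, using an illustrative four-joint chain (the actual skeleton structure and joint numbering are not taken from the patent):

```python
# parents[j] is the parent joint index; -1 marks the root joint.
# Illustrative chain: 0 = root (pelvis), 1 = hip, 2 = knee, 3 = ankle.
PARENTS = {0: -1, 1: 0, 2: 1, 3: 2}

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def matmul3(a, b):
    """Multiply two 3x3 rotation matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def local_to_global(local_rots, parents=PARENTS):
    """Map {joint: local rotation w.r.t. its parent} to
    {joint: global rotation w.r.t. the root}.

    Joints are processed in index order, which must place every parent
    before its children (true for the chain above).
    """
    global_rots = {}
    for j in sorted(local_rots):
        p = parents[j]
        if p == -1:
            global_rots[j] = local_rots[j]        # root: local == global
        else:
            # Compose the parent's accumulated rotation with this
            # joint's local rotation.
            global_rots[j] = matmul3(global_rots[p], local_rots[j])
    return global_rots
```

As a sanity check, giving every joint the same 90° rotation about the z-axis makes the ankle's global rotation a full 360° turn, i.e. the identity matrix.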
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the motion capture method of any of claims 1 to 6.
9. A computer-readable storage medium having instructions stored therein which, when executed by a processor of an electronic device, enable the electronic device to perform the motion capture method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the motion capture method of any of claims 1 to 6.
CN202110292819.XA 2021-03-18 2021-03-18 Motion capture method, motion capture device, electronic equipment and computer readable storage medium Active CN113033369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110292819.XA CN113033369B (en) 2021-03-18 2021-03-18 Motion capture method, motion capture device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113033369A true CN113033369A (en) 2021-06-25
CN113033369B CN113033369B (en) 2024-03-12

Family

ID=76471563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110292819.XA Active CN113033369B (en) 2021-03-18 2021-03-18 Motion capture method, motion capture device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113033369B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140092536A (en) * 2013-01-16 2014-07-24 계명대학교 산학협력단 3d character motion synthesis and control method and device for navigating virtual environment using depth sensor
CN104463146A (en) * 2014-12-30 2015-03-25 华南师范大学 Posture identification method and device based on near-infrared TOF camera depth information
CN112381003A (en) * 2020-11-16 2021-02-19 网易(杭州)网络有限公司 Motion capture method, motion capture device, motion capture equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王文杰; 秦现生; 王鸿博; 洪杰; 牛军龙; 谭小群; 张雪峰: "Motion retargeting technology for humanoid robot teleoperation based on motion capture", 机械设计与研究 (Machine Design and Research), no. 01 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421286A (en) * 2021-07-12 2021-09-21 北京未来天远科技开发有限公司 Motion capture system and method
CN113421286B (en) * 2021-07-12 2024-01-02 北京未来天远科技开发有限公司 Motion capturing system and method
CN113420719A (en) * 2021-07-20 2021-09-21 北京百度网讯科技有限公司 Method and device for generating motion capture data, electronic equipment and storage medium
CN113420719B (en) * 2021-07-20 2022-07-22 北京百度网讯科技有限公司 Method and device for generating motion capture data, electronic equipment and storage medium
CN113657278A (en) * 2021-08-18 2021-11-16 成都信息工程大学 Motion gesture recognition method, device, equipment and storage medium
CN116092120A (en) * 2022-12-30 2023-05-09 北京百度网讯科技有限公司 Image-based action determining method and device, electronic equipment and storage medium
CN116092120B (en) * 2022-12-30 2023-12-05 北京百度网讯科技有限公司 Image-based action determining method and device, electronic equipment and storage medium
CN117541646A (en) * 2023-12-20 2024-02-09 暗物质(北京)智能科技有限公司 Motion capturing method and system based on parameterized model

Also Published As

Publication number Publication date
CN113033369B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN111126272B (en) Posture acquisition method, and training method and device of key point coordinate positioning model
CN113033369B (en) Motion capture method, motion capture device, electronic equipment and computer readable storage medium
CN111738220B (en) Three-dimensional human body posture estimation method, device, equipment and medium
CN103578135B (en) The mutual integrated system of stage that virtual image combines with real scene and implementation method
WO2019005999A1 (en) Method and system for performing simultaneous localization and mapping using convolutional image transformation
US11945125B2 (en) Auxiliary photographing device for dyskinesia analysis, and control method and apparatus for auxiliary photographing device for dyskinesia analysis
US20130136302A1 (en) Apparatus and method for calculating three dimensional (3d) positions of feature points
CN112070782B (en) Method, device, computer readable medium and electronic equipment for identifying scene contour
CN103140879A (en) Information presentation device, digital camera, head mount display, projector, information presentation method, and information presentation program
CN110598590A (en) Close interaction human body posture estimation method and device based on multi-view camera
US20240046557A1 (en) Method, device, and non-transitory computer-readable storage medium for reconstructing a three-dimensional model
CN110211222B (en) AR immersion type tour guide method and device, storage medium and terminal equipment
CN110717391A (en) Height measuring method, system, device and medium based on video image
CN114120432A (en) Online learning attention tracking method based on sight estimation and application thereof
EP0847201A1 (en) Real time tracking system for moving bodies on a sports field
Yan et al. Cimi4d: A large multimodal climbing motion dataset under human-scene interactions
CN116700471A (en) Method and system for enhancing user experience of virtual reality system
CN117711066A (en) Three-dimensional human body posture estimation method, device, equipment and medium
CN116523962A (en) Visual tracking method, device, system, equipment and medium for target object
CN116109974A (en) Volumetric video display method and related equipment
Nagai et al. An on-site visual feedback method using bullet-time video
Pham et al. A low cost system for 3d motion analysis using Microsoft Kinect
Cordea et al. 3-D head pose recovery for interactive virtual reality avatars
KR20210045148A (en) Method, device and non-transitory computer-readable recording medium for estimating information about golf swing
CN111754543A (en) Image processing method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant