CN112515661B - Posture capturing method and device, electronic equipment and storage medium


Info

Publication number
CN112515661B
CN112515661B · CN202011376569.XA
Authority
CN
China
Prior art keywords
spatial position
model
current
subject
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011376569.XA
Other languages
Chinese (zh)
Other versions
CN112515661A (en)
Inventor
柴金祥
The other inventors have requested that their names not be disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Movu Technology Co Ltd
Mofa Shanghai Information Technology Co Ltd
Original Assignee
Shanghai Movu Technology Co Ltd
Mofa Shanghai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Movu Technology Co Ltd and Mofa Shanghai Information Technology Co Ltd
Priority to CN202011376569.XA
Publication of CN112515661A
Application granted
Publication of CN112515661B
Priority to PCT/CN2021/132799 (WO2022111525A1)
Status: Active

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B 5/103: Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B 5/11: Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B 5/1116: Determining posture transitions
    • A61B 2503/00: Evaluating a particular growth phase or type of persons or animals

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Dentistry (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Physiology (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a posture capturing method and apparatus, an electronic device, and a storage medium. The posture capturing method includes: determining the spatial position of a first marker object attached to a subject and the tag information describing the part of the subject at which the first marker object is located; and adjusting an initial posture model using the spatial position of the first marker object and the tag information to generate a reference posture model corresponding to the subject. According to the method and apparatus, the reference posture model of the subject can be generated using the spatial position and tag information of the first marker object, which facilitates subsequent motion capture.

Description

Posture capturing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular to a posture capturing method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of computer technology, sensor technology, and the virtual reality industry, motion capture technology has advanced extremely quickly and its range of applications keeps widening; it plays a particularly important role in fields such as the sports industry, game production, animation production, and film and television special effects. This has created a new mode in which art and technology permeate and fuse with each other, and it will remain a development trend in the future.
In the current optical capture approach, marker points are placed on each joint of an actor's body, the three-dimensional coordinates of the marker points are acquired with multiple high-speed infrared cameras, and the actor's performance is inferred from those coordinates. This scheme assumes that every bone of the human body is a rigid body, so the marker points must be placed at the positions of bone nodes on the actor, and the placement must be precise. Moreover, reconstructing the actor's skeletal model by moving each joint is time-consuming and cumbersome. The existing method cannot reconstruct a three-dimensional human body model matched to the actor; and because every human bone has muscle attached and is not strictly a rigid body, reality deviates from the assumption of existing optical capture, so existing optical capture suffers some loss of accuracy when capturing an actor's performance. In addition, because existing optical capture cannot reconstruct a three-dimensional human body model matched to the actor, the capture accuracy cannot be improved, nor the robustness enhanced, by increasing the number of marker points attached to the actor.
Disclosure of Invention
At present there are two methods for capturing an actor's motion with relatively high accuracy, optical motion capture and inertial motion capture, described below. In optical capture, multiple high-speed infrared cameras are arranged around a capture field, marker points are placed on each joint of the actor's body, the three-dimensional coordinates of the marker points are acquired, and the actor's performance is inferred from those coordinates. Optical capture has two problems. First, existing optical capture assumes that each bone of the actor is a rigid body, so the marker points must be placed at the actor's bone nodes (bone nodes are positions where a bone protrudes from the body; a marker placed elsewhere may slide as the muscles move). The actor must move each joint so that the length of each joint's bone can be obtained; this process requires that the marker placement be precise and that every joint be moved, which is time-consuming and tedious. Since each human bone has muscle attached and is not strictly a rigid body, reality deviates from the assumption of existing optical capture, so existing optical capture suffers some loss of accuracy when capturing the actor's performance. In addition, because existing optical capture cannot reconstruct a three-dimensional human body model matched to the actor, capture accuracy cannot be improved, nor robustness enhanced, by increasing the number of marker points attached to the actor. Second, when capturing more complicated single-person motions (such as kneeling or rolling on the ground) or performing multi-person capture (such as several people hugging), the capture cameras cannot see the marker points on the actor because of self-occlusion; in this case the capture system cannot acquire the markers attached to the actor, which reduces capture accuracy or causes errors.
The other capture method is inertial capture, in which inertial sensors (gyroscopes, linear accelerometers, and the like) are placed at each joint, so that information such as the angular velocity and linear acceleration of each joint is obtained as the actor moves. Since inertial capture cannot directly obtain the absolute position and orientation of each joint, the angular velocity and acceleration of each joint must be integrated to obtain each joint's absolute posture. Compared with optical capture, inertial capture cannot acquire joint postures directly but only by integrating angular velocity and acceleration. Moreover, the observations of an inertial sensor generally contain a certain amount of noise, and the error accumulated by integration grows as the capture time increases, so inertial capture cannot perform accurate human motion capture over long periods: the captured three-dimensional human posture deviates from the real posture, and the deviation grows with capture time. Inertial capture also requires wearing various sensors, which is inconvenient, and the sensors are battery-driven, so the recording time is limited by the battery.
In view of the above problems, embodiments of the present application provide a posture capturing method and apparatus, an electronic device, and a storage medium, to solve at least the problems mentioned above.
An embodiment of the present application provides a posture capturing method, including: determining the spatial position of a first marker object attached to a subject and the tag information describing the part of the subject at which the first marker object is located; and adjusting an initial posture model using the spatial position of the first marker object and the tag information to generate a reference posture model corresponding to the subject.
This method embodiment has the advantage that the marker points need not be placed at the actor's bone nodes, nor at any precisely fixed position, which saves time; and a three-dimensional human body model matching the actor can be reconstructed.
Optionally, the initial posture model includes body parameters for describing the body shape and motion parameters for describing the motion.
Optionally, adjusting the initial posture model using the spatial position of the first marker object and the tag information to generate the reference posture model corresponding to the subject includes: setting the motion parameters of the initial posture model by having the subject perform a specific motion; with the motion parameters determined, acquiring the spatial position of the first marker object and the tag information while the subject performs the specific motion; and adjusting the body parameters of the initial posture model using the motion parameters, the spatial position, and the tag information to generate the reference posture model corresponding to the subject.
With this method, a three-dimensional human body model matched to the actor can be reconstructed by acquiring the motion parameters together with the spatial position and tag information of the first marker object.
Optionally, after generating the reference posture model corresponding to the subject, the method further includes: acquiring the current spatial position and current tag information of the first marker object in response to a motion of the subject; and adjusting the reference posture model using the current spatial position and the current tag information to obtain a current posture model of the subject, for capturing the subject's current posture.
With this method embodiment, human motion can be captured accurately using the current spatial position and current tag information of the first marker object, and the captured posture matches the real human posture more closely.
Optionally, obtaining the current tag information of the first marker object includes: acquiring the predicted spatial position of the current tag information and the current spatial position of the first marker object; when the current spatial position of the first marker object is determined to be within a preset range of the predicted spatial position of the current tag information, matching the first marker object with the current tag information to obtain a matching relationship, where the preset range is set according to a prediction of the subject's motion trajectory; and determining the current tag information corresponding to the first marker object according to the matching relationship.
Optionally, adjusting the reference posture model corresponding to the subject using the current spatial position and the current tag information to obtain the current posture model of the subject, for capturing the subject's current posture, includes: with the body parameters of the reference posture model determined, continuously adjusting the motion parameters so that the sum of the distances between the virtual spatial positions of all tag information and the spatial positions of the first marker objects corresponding to all tag information is minimized, and acquiring the current posture model for capturing the subject's posture.
Optionally, the method further includes: constraining the current posture model using a prior motion model generated from a preset motion library, and acquiring the constrained current posture model.
This method embodiment can use the prior motion model to constrain the motion, avoiding unreasonable or discontinuous motions of the performer. It can also solve the problem of unreasonable, discontinuous motions being captured when marker points are missing due to occlusion.
In addition, self-occlusion may occur during an actor's performance because of its requirements: hugging the chest with both arms inevitably occludes the marker points on the arms and chest, and squatting on the ground likewise occludes the reflective points on the legs and abdomen. In such cases marker points in the formula are missing, and the three-dimensional coordinates corresponding to the chest markers cannot be found while the chest is hugged; by adding prior information, the performer's motion can still be captured and unreasonable motions are prevented from being captured.
Furthermore, a prior motion model can be added to limit the captured motion so that motions the subject cannot make are not generated: the current posture model is constrained using a prior motion model generated from a preset motion library, and the constrained current posture model is acquired.
For example, taking a human subject: the human skeleton has many degrees of freedom, so without constraints some motions the human body cannot perform would be generated; the constraints also ensure that, when the body performs continuous motions, the captured motions are continuous and reasonable.
Optionally, the method further includes: determining an interactive prop that interacts with the subject; acquiring the prop spatial position and prop tag information of the interactive prop through prop marker objects attached to the interactive prop; and adjusting a base prop posture model corresponding to the interactive prop using the prop spatial position and the prop tag information to generate a current prop posture model for capturing the motion of the interactive prop.
Optionally, determining the spatial position of the first marker object attached to the subject includes: determining at least two two-dimensional positions of the first marker object using images of the first marker object taken by at least two cameras; and determining the spatial position of the first marker object using the at least two two-dimensional positions and the calibration information of the at least two cameras.
Optionally, determining the spatial position of the first marker object using the at least two two-dimensional positions and the calibration information of the at least two cameras includes: determining at least two rays of the at least two cameras corresponding to the first marker object using the at least two two-dimensional positions and the calibration information; and determining the spatial position of the first marker object such that the distance from that spatial position to the at least two rays is minimal.
Optionally, the spatial position includes coordinate data of the first marker object within a spatial coordinate system corresponding to a capture space for capturing the subject.
Optionally, the method further includes: acquiring the calibration information of all cameras by calibrating all cameras used to capture the subject; setting a scale relationship according to the calibration information of all cameras and a calibration device carrying a second marker object; and calibrating the ground of the capture space using the calibration information of all cameras, determining the ground information, and determining the spatial coordinate system from the ground information.
Optionally, determining the spatial position of the first marker object attached to the subject and the tag information of the part of the subject at which the first marker object is located further includes: matching the first marker object with the tag information to obtain the correspondence between the first marker object and the tag information.
An embodiment of the present application further provides a posture capturing method, including: determining, in response to a motion of a subject, the current spatial position of a first marker object attached to the subject and the tag information describing the part of the subject at which the first marker object is located; and adjusting the reference posture model of the subject using the current spatial position and tag information of the first marker object, and acquiring the current posture model of the subject so as to capture the subject's current posture.
This method embodiment can, in response to an actor's motion, accurately capture a motion matching the actor's motion according to the spatial position and tag information of the first marker object.
An embodiment of the present application further provides a posture capturing apparatus, including: a tag information determination unit configured to determine the spatial position of a first marker object attached to a subject and the tag information describing the part of the subject at which the first marker object is located; and a reference posture model generation unit configured to generate a reference posture model corresponding to the subject by adjusting an initial posture model using the spatial position of the first marker object and the tag information.
An embodiment of the present application further provides an electronic device, including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the above methods.
Embodiments of the present application also provide a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform the above methods.
The embodiment of the application adopts at least one technical scheme which can achieve the following beneficial effects:
in summary, the posture capturing method according to the exemplary embodiments of the present application can determine the reference posture model corresponding to the subject using only the spatial position and tag information of the first marker object, without constraining the first marker object or restricting it to bone points; it is therefore more flexible to use, and the reference posture model corresponding to the subject can be acquired more flexibly and conveniently.
The marker points need not be placed at the actor's bone nodes and can be placed anywhere on the actor, which increases the freedom of marker placement. A model matched to the actor is obtained, further improving the accuracy of capturing the actor's performance. An artificial-intelligence prior model of human posture is built from the posture database and used during capture, so that even when a large number of reconstructed reflective points are missing due to self-occlusion, the actor's performance can still be captured.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of the steps of a method of gesture capture according to an exemplary embodiment of the present application;
FIG. 2 is a diagram of a calibration operation performed on a plurality of cameras using a second marking device, according to an exemplary embodiment of the present application;
FIG. 3 is a block diagram of performing calibration operations on multiple cameras according to an exemplary embodiment of the present application;
FIG. 4 is a block diagram of obtaining a spatial position of a first marker object according to an exemplary embodiment of the present application;
FIG. 5 is a diagram of spatial coordinate matching according to an exemplary embodiment of the present application;
FIG. 6 is a block diagram of generating a reference pose model of a subject according to an exemplary embodiment of the present application;
FIG. 7 is a block diagram of obtaining a current pose model according to an exemplary embodiment of the present application;
FIG. 8 is a block diagram of determining current tag information according to an exemplary embodiment of the present application;
FIG. 9 is a block diagram of a gesture capture device according to an exemplary embodiment of the present application;
FIG. 10 is a schematic diagram of the positions of first marker objects on a human body during posture capture according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings. To solve the technical problems above, a posture capturing method of an exemplary embodiment of the present application may determine the spatial position of a first marker object attached to a subject using at least two cameras, and generate a reference posture model of the subject using that spatial position. After the subject moves, the motion parameters of the reference posture model can be determined using the spatial position of the first marker object and the tag information, thereby generating a current posture model corresponding to the motion and achieving the goal of capturing the subject's posture.
The posture capturing method of the exemplary embodiments of the present application can be applied in many fields, including but not limited to animation, sports, game production, and film and television production.
FIG. 1 is a flowchart of the steps of a gesture capture method according to an exemplary embodiment of the present application.
As shown in FIG. 1, in step S110, the spatial position of the first marker object attached to the subject and the tag information describing the part of the subject at which the first marker object is located are determined.
In implementation, the subject is the object photographed by the cameras. The subject may be a living object that can perform various motions on its own, including but not limited to a human (male or female) or an animal (e.g., a panda or a horse); a mechanical object that receives instructions and performs various motions, for example an automatically walking device such as a robot; or an inanimate object, for example an interactive prop that performs various motions in cooperation with the subject, such as a football, a basketball, or a flower stick.
Further, the subject according to the exemplary embodiments of the present application may be a single photographic subject, that is, posture capture is performed for one subject, or multiple photographic subjects; in the latter case each subject may perform its own motions individually, or various interactions may occur. For example, two subjects may hug each other.
A marker object is a marker (marker point) whose surface is covered with a special retro-reflective material, for example a spherical marker. The names "first marker object" and "second marker object" are used in this application only for distinction. In practice, a camera may emit infrared light, which is reflected by the marker, and the camera then acquires the planar coordinates (i.e., two-dimensional coordinates) of the marker. In addition, neither the first marker object nor the second marker object is limited in number; that is, there may be multiple first marker objects and multiple second marker objects, and each first and second marker object may be processed in the manner described below.
Further, the spatial position includes coordinate data of the first marker object within a capture space for capturing the subject. Specifically, to capture the subject's posture, a calibration site consisting of multiple cameras is first constructed; after the calibration site is constructed, a virtual capture space corresponding to the calibration site is determined using the cameras and the second marker object, and a spatial coordinate system corresponding to the capture space is then determined. The camera calibration operation is described below with reference to FIGS. 2 and 3.
Fig. 2 is a diagram of performing a calibration operation on a plurality of cameras using a marking device according to an exemplary embodiment of the present application. FIG. 3 is a block diagram of performing calibration operations on multiple cameras according to an exemplary embodiment of the present application.
As shown in FIG. 2, the cameras constitute a calibration space. They may then be calibrated using a calibration device (e.g., a calibration rod) as in FIG. 2, on which a marker object (i.e., a second marker object) is disposed; preferably, three marker objects may be disposed on the calibration device.
The field is then swept using the calibration device; specifically, the calibration device carrying the second marker object may be a calibration rod with marker points. In implementation, a user (e.g., a technician) waves the calibration rod within the calibration field, each camera acquires the two-dimensional coordinates of the marker points, and all cameras are calibrated from these two-dimensional coordinates to obtain the calibration information of all cameras, where the calibration information includes the relative positional relationships between the cameras and the cameras' internal parameters. The calibration site is a real space.
As shown in FIG. 2, each camera may capture images of the calibration rod carrying the marker objects, from which the calibration information is calculated. In implementation, at least two cameras are used.
Specifically, as shown at block 301, stray retro-reflective points within the capture field may first be excluded. Since some reflective points in the field are inevitably captured by the cameras, the cameras must be tested to eliminate the reflective points that would interfere with capture, i.e., to ensure that the cameras capture the marker objects.
Subsequently, as shown in block 302, a sweep is performed using the calibration device. Three collinear second marker objects may be mounted on the calibration device, with the distances between the three marker objects known. The calibration device is waved within the calibration space, the cameras capture the planar positions of the three marker points, and the sweep ends once every camera has acquired those planar positions.
Subsequently, as shown in block 303, the calibration information of all cameras is determined, where the calibration information includes the cameras' parameter information, relative positions, and scale information. In implementation, the parameter information includes the internal parameters of each camera, such as focal length and distortion parameters, and the external parameters, i.e., the camera's position and orientation.
In implementation, the cameras photograph the calibration device carrying the second marker objects within the calibration space and acquire images of the device; the scale relationship is finally determined from the distances between the second marker objects on the calibration device.
Specifically, after the cameras are calibrated, the calibrated cameras capture the marker points on the calibration rod, the three-dimensional coordinates of the captured marker points are reconstructed in the capture space, and the distances between the reconstructed three-dimensional coordinates are compared with the actual distances between the marker points on the calibration rod, yielding a scale relationship used in subsequent calculations.
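A minimal sketch of this scale computation follows (the function name, the numpy dependency, and the assumption of equally spaced rod markers are illustrative, not specified by the patent):

```python
import numpy as np

def scale_factor(reconstructed_pts, real_spacing_mm):
    """Ratio between the real marker spacing and its reconstructed counterpart.

    reconstructed_pts: (3, 3) array of the reconstructed 3D coordinates of the
    three collinear rod markers (in arbitrary calibration units).
    real_spacing_mm: measured distance between adjacent rod markers, in mm
    (assumed equal here for brevity).
    """
    d01 = np.linalg.norm(reconstructed_pts[1] - reconstructed_pts[0])
    d12 = np.linalg.norm(reconstructed_pts[2] - reconstructed_pts[1])
    # Average the two segment lengths to damp reconstruction noise.
    return real_spacing_mm / ((d01 + d12) / 2.0)
```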
At the same time, as shown in block 304, a set square (with marker objects on its three vertices) may be placed in the capture field to calibrate the ground and thereby determine the ground information. Specifically, an L-shaped triangular rod with a marker point on each corner is placed on the calibration site. The three-dimensional coordinates of the three marker points are reconstructed in the capture space, forming a virtual L-shaped rod there: its right-angle point is taken as the origin, the short side as the Z axis, and the long side as the X axis; the Y axis can then be established from the X and Z axes, the ground information of the capture space can be established from the X and Z axes, and the origin together with the X, Y, and Z axes constitutes the spatial coordinate system of the capture space. The capture space is a virtual space.
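A minimal sketch of deriving this coordinate frame from the three reconstructed corner points (the function name, re-orthogonalization step, and numpy dependency are assumptions for illustration):

```python
import numpy as np

def ground_frame(origin_pt, short_end_pt, long_end_pt):
    """Build the capture-space frame from the L-shaped rod's three markers.

    origin_pt: reconstructed right-angle corner (becomes the origin).
    short_end_pt: marker at the end of the short side (defines the Z axis).
    long_end_pt: marker at the end of the long side (defines the X axis).
    Returns (origin, 3x3 rotation whose rows are the X, Y, Z axes).
    """
    x_axis = long_end_pt - origin_pt
    x_axis = x_axis / np.linalg.norm(x_axis)
    z_axis = short_end_pt - origin_pt
    z_axis = z_axis / np.linalg.norm(z_axis)
    # Y is perpendicular to the ground plane spanned by X and Z
    # (right-handed: z cross x = y).
    y_axis = np.cross(z_axis, x_axis)
    y_axis = y_axis / np.linalg.norm(y_axis)
    # Re-orthogonalize Z in case the rod's corner is not exactly 90 degrees.
    z_axis = np.cross(x_axis, y_axis)
    return origin_pt, np.vstack([x_axis, y_axis, z_axis])
```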
Finally, as shown in block 305, the spatial coordinate system is determined using the calibration information and the ground information determined in block 304. That is, once the ground information of the capture space has been determined, a spatial coordinate system for the capture space based on that ground information can be established.
After the spatial coordinate system of the capture space is determined, the spatial position of the first marker object can be determined, as described in detail below with reference to FIG. 4. FIG. 4 is a block diagram of acquiring the spatial position of a first marker object according to an exemplary embodiment of the present application.
As shown in block 401, the two-dimensional positions of a number of first marker objects may be acquired with a number of cameras. In implementation, each first marker object is photographed by at least two cameras, at least two images of the same first marker object are acquired, and at least two two-dimensional positions of that marker object are then obtained from those images. At block 402, the calibration information of the at least two cameras is obtained. Subsequently, at block 403, at least two rays corresponding to the same first marker object may be generated using the calibration information of the at least two cameras and the corresponding two-dimensional positions.
Subsequently, as shown in block 404, the correspondences between different cameras' observations of the same marker object can be obtained according to various constraints, and a ray is generated for each two-dimensional position using the camera's parameter information.
Finally, at block 406, once the above correspondences are obtained, the three-dimensional position of the same first marker object may be determined by intersecting the rays generated by the different cameras for that object; that is, the point with the smallest total distance to all the rays is taken as the three-dimensional coordinate point of the marker object.
In practice, these rays may not intersect at a single point, and the optimization shown in block 405 may be employed to make the reconstructed three-dimensional position more stable. In short, the optimization iteratively adjusts the weights of the rays according to the distances between the generated three-dimensional coordinate point and each ray, so that the resulting point lies closest to the majority of the rays.
The above describes determining the spatial position of a single first marker object; in implementation, when there are multiple first marker objects, the same process may be applied to obtain the spatial coordinates of each of them.
For better explanation, the following description is given with reference to FIG. 5. FIG. 5 is a diagram of spatial coordinate matching according to an exemplary embodiment of the present application. As shown in FIG. 5, the same first marker object produces different images 510 and 520 in different cameras. The two-dimensional position of the first marker object in image 510 is $P_L$, and in image 520 it is $P_R$. The optical center of the camera corresponding to image 510 is $O_L$, and that of the camera corresponding to image 520 is $O_R$. The rays $O_L P_L$ and $O_R P_R$ thus formed can intersect at a point $P$, which is then the reconstructed spatial position of the first marker object. FIG. 5 may be regarded as the three-dimensional reconstruction process of the first marker object's spatial position.
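A minimal sketch of this ray-intersection reconstruction follows (a standard weighted least-squares triangulation; the function name, the optional per-ray weights for block 405's reweighting, the use of squared distances to obtain a closed-form solve, and the numpy dependency are assumptions rather than the patent's specified implementation):

```python
import numpy as np

def triangulate(origins, directions, weights=None):
    """Find the 3D point minimizing the (weighted) squared distance to all rays.

    origins: (N, 3) camera optical centers, e.g. O_L, O_R.
    directions: (N, 3) unit ray directions through the 2D detections P_L, P_R.
    weights: optional (N,) per-ray weights; block 405's iterative reweighting
    would update these and call this function again.
    """
    if weights is None:
        weights = np.ones(len(origins))
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d, w in zip(origins, directions, weights):
        # Distance of a point x to the ray is |(I - d d^T)(x - o)|.
        P = np.eye(3) - np.outer(d, d)
        A += w * P
        b += w * P @ o
    return np.linalg.solve(A, b)
```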
Subsequently, the part of the subject at which the first marker object is located can be determined. Preferably, a marker set (markerset) is defined in advance, that is, the part of the subject to which each first marker object is attached is specified in advance. For example, multiple marker objects may be attached to a given part of the subject, at different positions on that part. A defined position on a defined part is referred to as tag information.
In implementation, the subject may be asked to assume a specific posture (e.g., a human-shaped pose or a T-pose); the spatial position of each first marker object placed on the subject's motion-capture suit is then acquired, and the tag information of each first marker object is determined based on the preset markerset. For example, the first marker object at the uppermost, middle position may be assigned the tag information "upper position of the head". A sketch of such a markerset is given below.
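A minimal sketch of what a preset markerset might look like (the tag names, the data layout, and the height-based assignment heuristic are illustrative assumptions; the patent does not prescribe them):

```python
# Each tag is a body part plus a defined position on that part; a capture
# suit may carry several markers per part.
MARKERSET = {
    "head_top":       ("head", "upper middle"),
    "head_left":      ("head", "left side"),
    "chest_center":   ("chest", "center"),
    "left_arm_upper": ("left arm", "upper"),
    "left_arm_lower": ("left arm", "lower"),
    "right_leg_knee": ("right leg", "knee"),
}

def assign_head_top(marker_positions):
    """Assign tag info while the subject holds a known pose (e.g. T-pose).
    Here, as in the text's example, the topmost marker becomes 'head_top';
    a real assignment would match the whole markerset at once."""
    order = sorted(marker_positions, key=lambda p: p[1], reverse=True)  # y-up
    return {"head_top": order[0]}
```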
Subsequently, step S120 may be performed: an initial posture model is adjusted using the spatial position of the first marker object and the tag information, and a reference posture model corresponding to the subject is generated. To better describe this step, the following description is given with reference to FIG. 6.
Fig. 6 is a block diagram of generating a reference pose model of a subject according to an exemplary embodiment of the present application.
At block 610, a technician may obtain a large amount of model data via three-dimensional scanning. For example, with human subjects, the posture database may include posture data for various body shapes and/or motions, such as tall, short, fat, thin, male, female, and so on.
At block 620, a low-dimensional body distribution may be generated using the posture database from block 610. Sampling this distribution can generate human bodies of different shapes.
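A minimal sketch of building such a low-dimensional distribution (PCA over registered scan vertices is one common choice; the use of PCA here, all names, and the numpy dependency are assumptions for illustration):

```python
import numpy as np

def build_shape_space(scans, n_components=10):
    """scans: (S, 3V) array of S registered scans flattened to vertex vectors.
    Returns the mean shape and top principal directions; sampling
    low-dimensional coefficients around zero yields new plausible bodies."""
    mean = scans.mean(axis=0)
    centered = scans - mean
    # SVD gives the principal components of the body-shape variation.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    stddev = s[:n_components] / np.sqrt(len(scans) - 1)
    return mean, components, stddev

def sample_body(mean, components, stddev, rng):
    # Shape coefficients (playing the role of alpha, rho) live in this space.
    coeffs = rng.normal(0.0, stddev)
    return mean + coeffs @ components
```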
At block 630, an initial posture model is established, where the initial posture model includes body parameters for describing the body shape and motion parameters for describing the motion. The initial posture model is denoted $FK(\alpha, \rho, pose)$, where $\alpha$ and $\rho$ represent the body parameters (e.g., height and build) and $pose$ represents the subject's posture. Since $\alpha$, $\rho$, and $pose$ are all unknown, they need to be solved for using the spatial positions of the first marker objects.
At block 640, the spatial position and tag information of the first marker object are obtained, and at block 650 the initial posture model is adjusted using that spatial position and tag information, generating the reference posture model corresponding to the subject.
Optionally, in step S110, determining the spatial position of the first marker object attached to the subject and the tag information describing the part of the subject at which it is located may further include: matching the first marker object with the tag information to obtain the correspondence between the first marker object and the tag information.
In implementation, since $\alpha$, $\rho$, and $pose$ are unknown, the spatial positions and tag information of the first marker objects at different times and/or under different motions of the subject can be obtained, and the parameters of $FK$ are then solved for using those spatial positions and tag information, where the parameters $\alpha$ and $\rho$ must satisfy the low-dimensional body distribution. How the tag information of a first marker object is determined when its spatial position changes is explained in detail below with reference to FIG. 7 and is not detailed here.
Preferably, the motion parameters of the initial posture model may be set by having the subject perform a specific motion (e.g., a T-pose). To make the result more accurate, the set motion parameters correspond to a standard version of the specific motion. With the motion parameters determined, the spatial position and tag information of the first marker object are acquired while the subject performs that specific motion.
Finally, the body parameters of the initial posture model are adjusted using the motion parameters, the spatial positions, and the tag information, generating the reference posture model corresponding to the subject. That is, the body-shape parameters of the $FK$ model are continually adjusted according to equation (1) below until equation (1) converges.
In implementation, the initial posture model may be adjusted using the following equation (1):

$$\min_{\alpha,\;\rho,\;pose,\;bodymarker,\;Corr}\;\sum_{(i,\,m)\,\in\,Corr} Dis\Big(bodymarker_i\big(FK(\alpha,\rho,pose)\big),\; Marker_m\Big) \tag{1}$$

In the formula, $\alpha$ and $\rho$ represent the subject's body parameters (height and build), and $pose$ represents the subject's motion parameters (posture); $FK$ is the posture model, and a virtual human body model with motion corresponding to the subject can be reconstructed from $\alpha$, $\rho$, $pose$, and the $FK$ posture model.

$Corr$ represents the matching relationship between tag information and first marker objects, i.e., which first marker object the tag information $i$ corresponds to (equivalently, to which tag information the $m$-th first marker object belongs). $i$ indexes the tag information; $bodymarker_i$ is the position of the $i$-th tag information on the virtual human body model corresponding to the subject, and $bodymarker_i(FK(\alpha,\rho,pose))$ yields the virtual three-dimensional coordinate of the $i$-th tag information. $Marker_m$ is the three-dimensional coordinate of the $m$-th first marker object.

The tag information is matched with the first marker objects through the matching relationship $Corr$, i.e., tag information $i$ corresponds to the $m$-th first marker object. $Dis$ denotes the distance between $bodymarker_i(FK(\alpha,\rho,pose))$ and $Marker_m$.

Equation (1) says that after the three-dimensional coordinates of the first marker objects are acquired and the tag information of each first marker object is determined ($Corr$ represents the matching between first marker objects and tag information), the variables in equation (1) are optimized to minimize the sum of the distances between the virtual three-dimensional coordinates of all tag information on the virtual human body model and the three-dimensional coordinates of the first marker objects corresponding to that tag information, thereby obtaining the virtual human body model corresponding to the subject.
The optimization process of equation (1) is as follows:
1. setting an initial value of the variable. Setting body marker with initial value for defining markeriI.e. when defining the markerset, body markeriIndicating the position of the tag information i set on the virtual human body model; acquiring human body distribution under height constraint in low-dimensional human body distribution of a posture database through the height of a shot object (three-dimensional coordinate height difference of a first marked object), and taking the average value of the human body distribution under the constraint as the initial value of alpha and rho; the subject's position is close to a preset posture, and a preset specific motion (such as T-position) is used as an initial value.
2. The matching relationship of the tag information and the first marker object (Corr in formula (1)) is obtained. By body markeri(FK (α, ρ, pos) can derive virtual three-dimensional coordinates of tag information, the virtual three-dimensional coordinates of all tag information being set A, the three-dimensional coordinates of all first marker objects being set BBecause each virtual three-dimensional coordinate in the set A has label information, the successfully matched three-dimensional coordinate in the set B has label information consistent with the virtual three-dimensional coordinate in the set A after matching.
For example, the matching method may employ a nearest neighbor matching method. The nearest neighbor matching method is a method for forming matching by starting from a certain point a in the set A and searching for a point which is closest to the point a in another set B. In practical use, the present invention is not limited to the matching method (i.e., other matching methods may be used).
3. Optimizing variables alpha, rho, pos, body markeriAnd the sum of the distances between the virtual three-dimensional coordinates of all the label information on the virtual human body model and the three-dimensional coordinates of the first label object corresponding to all the label information is minimized.
4. And returning to the step 3 for iterative optimization until the formula (1) converges or the maximum iteration number is reached.
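A minimal sketch of this alternating match-and-optimize loop (scipy's least_squares and cKDTree stand in for the unspecified optimizer and matcher; all names, and holding the bodymarker offsets fixed for brevity, are assumptions):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial import cKDTree

def calibrate(markers_3d, fk_model, alpha0, rho0, pose0, n_iters=20):
    """Alternate nearest-neighbor matching (Corr) with parameter refinement.

    markers_3d: (M, 3) reconstructed first-marker coordinates (set B).
    fk_model(params) -> (T, 3): virtual coordinates of all tag information
    (set A), where params packs alpha, rho, and pose.
    """
    params = np.concatenate([alpha0, rho0, pose0])
    for _ in range(n_iters):
        virtual = fk_model(params)                    # set A
        corr = cKDTree(markers_3d).query(virtual)[1]  # step 2: Corr
        def residual(p):
            # Step 3: distances between tagged virtual points and markers.
            return (fk_model(p) - markers_3d[corr]).ravel()
        params = least_squares(residual, params).x
    return params
```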
Since every subject's body shape is different, in order to represent each subject more accurately, each subject must go through the above operation, referred to as the calibration process, before motion capture is performed, to determine that subject's reference posture model. The subject's starting motion may also be set to a specific motion so that the reference posture model is generated more accurately. For example, each actor performs a T-pose before motion capture, and the reference posture model for that actor is then generated.
Once the reference posture model corresponding to the subject is determined, the method can use it to generate a current posture model corresponding to the subject's current posture, thereby capturing that posture. To better describe this process, it is detailed below with reference to FIG. 7.
After the calibration operation for the subject is completed as described above, the capture operation can be performed. At the start of capture, the subject first strikes the specific motion (e.g., the T-pose), and the three-dimensional coordinates of the first marker objects and the matching relationship between the tag information and the first marker objects are acquired.
Specifically, the current spatial positions of the first marker objects are obtained by three-dimensional reconstruction. Acquiring the matching relationship is similar to the calibration process, except that now only $pose$ and the matching relationship $Corr$ are optimized, because $\alpha$, $\rho$, and $bodymarker_i$ in equation (1) were acquired during calibration and remain unchanged.
The subject can then perform various motions as actually needed; for example, actors can give various performances according to a script. In this case, in response to the subject's motion, at block 710 the current spatial position and current tag information of the first marker object under that motion are acquired.
The current spatial position of the first marker object is obtained by three-dimensional reconstruction, and its current tag information is obtained through the matching relationship between the tag information and the first marker object.
The current spatial position of the first marker object is the spatial position to which the marker has moved as a result of the subject performing the motion. In implementation, it may be determined as described above with respect to FIG. 4.
Once the current spatial position of the first marker object has been determined, its current tag information can be determined. The process of determining the current tag information is described below in conjunction with FIG. 8.
Fig. 8 is a block diagram illustrating determining current tag information according to an exemplary embodiment of the present application.
At block 810, the predicted spatial position of the current tag information $i$ is obtained. In implementation, the next spatial position of tag information $i$ may be predicted from the preceding spatial position of the first marker object corresponding to tag information $i$, and that predicted position is taken as the predicted spatial position, where the preceding spatial position is the spatial position, at the previous time (i.e., the previous frame), of the first marker object corresponding to tag information $i$. At the previous time, the correspondence between tag information $i$ and the first marker object was determined, i.e., tag information $i$ and the first marker object share the same spatial position.
The predicted spatial position is the spatial position predicted for the current tag information $i$ at the current time (i.e., the current frame). In implementation, it may be determined using any prediction method for the motion trajectory of an object; the prediction method is not limited here.
At block 820, the current spatial position of the first tagged object P is obtained. In practice, the current spatial position may be determined using the methods already described above.
At block 830, it is determined whether the current spatial position of the first marker object P is within the preset range of the predicted spatial position of the current tag information $i$. If so, i.e., when the current spatial position of the first marker object P is determined to be within the preset range of the predicted spatial position of the current tag information $i$, the first marker object P is matched with the current tag information $i$ using the nearest-neighbor method and the match is checked for correctness (the nearest-neighbor method was introduced in the calibration description and is not repeated here); if the match is correct, the first marker object P and the tag information $i$ form a valid matching relationship. If the position is not within the preset range, no matching is required. The preset range is set according to the prediction of the subject's motion trajectory, and the matching relationship between first marker objects and tag information is obtained through this process.
At block 840, the matching relationship between the tag information $i$ and the first marker object P at the current time is determined, and the tag information $i$ corresponding to each successfully matched first marker object P is determined from that matching relationship.
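A minimal sketch of this predict-gate-match step for one frame (the constant-velocity prediction and the gate radius are assumptions; the patent leaves the prediction method open):

```python
import numpy as np
from scipy.spatial import cKDTree

def track_tags(prev_pos, prev_vel, markers_now, gate_radius=0.05):
    """prev_pos: (T, 3) last-frame positions of the markers owning each tag.
    prev_vel: (T, 3) their last-frame velocities (used for prediction).
    markers_now: (M, 3) current-frame reconstructed markers.
    Returns {tag_index: marker_index} for tags matched within the gate."""
    predicted = prev_pos + prev_vel        # block 810: predicted positions
    tree = cKDTree(markers_now)
    matches = {}
    for i, p in enumerate(predicted):
        dist, m = tree.query(p)            # nearest neighbor (block 830)
        if dist <= gate_radius:            # preset-range check
            matches[i] = m
    return matches
```

A fuller implementation would also resolve conflicts when two tags claim the same marker, e.g., by greedy assignment on distance.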
Having determined the current tag information, at block 730 the reference posture model from block 720 is adjusted using the current spatial position and the current tag information to obtain the current posture model of the subject, for capturing the subject's current posture.
Specifically, the reference posture model is the model obtained in the process above, composed of body parameters and motion parameters; with the body parameters determined, the motion parameters of the model can be determined using the current spatial position and current tag information of the first marker object.
In implementation, the motion parameters are continually adjusted so that the sum of the distances between the virtual three-dimensional coordinates of all tag information on the virtual human body model and the three-dimensional coordinates of the first marker objects corresponding to all tag information is minimized.
For example, for first marker objects whose tag information is the head, the left arm, and the right leg, the motion parameters of the reference posture model may be acquired by minimizing the sum of: the difference between the spatial position (three-dimensional coordinates) of the first marker object corresponding to the head tag and the virtual spatial position (virtual three-dimensional coordinates) corresponding to the head tag in the reference posture model; the difference between the spatial position of the first marker object corresponding to the left-arm tag and the virtual spatial position corresponding to the left-arm tag in the model; and the difference between the spatial position of the first marker object corresponding to the right-leg tag and the virtual spatial position corresponding to the right-leg tag in the model. The virtual spatial position corresponding to each tag in the reference posture model indicates the position of the contact point at which the corresponding first marker object is attached to the reference posture model (i.e., on the virtual body).
After the motion parameters are determined, the current posture model can be determined from the body-shape parameters and the motion parameters, so that the subject's current posture is captured.
In addition, self-occlusion may occur during an actor's performance because of its requirements: hugging the chest with both arms inevitably occludes the marker points on the arms and chest, and squatting on the ground likewise occludes the reflective points on the legs and abdomen. In such cases marker points in the formula are missing, and the three-dimensional coordinates corresponding to the chest markers cannot be found while the chest is hugged; by adding prior information to constrain the pose, the performer's motion can still be captured and unreasonable motions are prevented from being captured.
Furthermore, a prior motion model can be added to limit the captured motion so that motions the subject cannot make are not generated: the current posture model is constrained using a prior motion model generated from a preset motion library, and the constrained current posture model is acquired.
For example, taking a human subject: the human skeleton has many degrees of freedom, so without constraints some motions the human body cannot perform would be generated; the constraints also ensure that, when the body performs continuous motions, the captured motions are continuous and reasonable.
Based on the above considerations, the prior motion model can be generated from a preset motion library, where the motion library contains motions that conform to the human skeleton and to motion continuity. Then, when determining the current posture model, it can be constrained using the prior motion model. In implementation, the processing may be performed according to the following equation (2):
$$\min_{pose_j}\;\sum_{(i,\,m)\,\in\,Corr} Dis\Big(bodymarker_i\big(FK(\alpha,\rho,pose_j)\big),\; Marker_m\Big) \;+\; Prior1\big(pose_j \mid pose_{j-1},\ldots,pose_{j-k}\big) \;+\; Prior2\big(pose_j\big) \tag{2}$$

In the formula, $i$ indexes the $i$-th tag and $j$ indexes the current frame: $pose_j$ is the pose of the current frame, $pose_{j-1}$ the pose of the previous frame, and $(pose_{j-1},\ldots,pose_{j-k})$ the poses of the $k$ frames before the current frame.

$FK$ is the reference posture model, $\alpha$ and $\rho$ are its body parameters, and $pose$ is the subject's motion parameter; $Marker_m$ is the three-dimensional coordinate of the $m$-th first marker object; $i$ is the tag information matched to that first marker object through the matching relationship $Corr$. As shown in equation (2), $Prior2(pose_j)$ is the prior motion model term requiring that the acquired pose satisfy the motion prior, while $Prior1(pose_j \mid pose_{j-1},\ldots,pose_{j-k})$ requires that the captured pose be temporally smooth, without sudden changes. $bodymarker_i$ is the position of the $i$-th tag on the virtual human body model corresponding to the subject, and $bodymarker_i(FK(\alpha,\rho,pose_j))$ yields the virtual three-dimensional coordinate of the $i$-th tag.

In equation (2), $\alpha$, $\rho$, and $bodymarker_i$ were obtained during calibration and are fixed, and the correspondence $Corr$ between tag information and first marker objects is known. The subject's motion parameters are obtained by optimizing $pose_j$ to minimize the sum of the distances between the virtual three-dimensional coordinates of all tag information on the virtual human body model and the three-dimensional coordinates of the corresponding first marker objects, plus the prior terms.
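A minimal sketch of the per-frame objective of equation (2) (the quadratic form of the smoothness term, the log-density form of the motion prior, and all names are assumptions; the patent does not fix their functional forms):

```python
import numpy as np
from scipy.optimize import minimize

def solve_frame(pose0, fk_virtual, markers, corr, pose_history,
                w_smooth=1.0, w_prior=1.0, prior_logpdf=None):
    """fk_virtual(pose) -> (T, 3) virtual tag coords (alpha, rho fixed).
    markers: (M, 3) current-frame marker coordinates.
    corr: {tag_index: marker_index}, known at this stage.
    pose_history: list of the k previous poses, most recent last."""
    def objective(pose):
        # Data term: distances between tagged virtual points and markers.
        virtual = fk_virtual(pose)
        data = sum(np.linalg.norm(virtual[i] - markers[m])
                   for i, m in corr.items())
        # Prior1: temporal smoothness against the previous pose(s).
        smooth = np.sum((pose - pose_history[-1]) ** 2)
        # Prior2: stay where the motion library says poses are plausible.
        prior = -prior_logpdf(pose) if prior_logpdf else 0.0
        return data + w_smooth * smooth + w_prior * prior
    return minimize(objective, pose0).x
```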
In addition, the subject may also include interactive props, for example, an actor and a basketball interacting with the actor may be photographed simultaneously using multiple cameras. It should be noted that the interactive props are not limited in number and type, that is, the interactive props may be single or multiple, and may be of the same type or different types.
In this case, an item marker object may be placed on the interactive item in advance, where the item marker object is the same marker object as the first marker object described above, and is not limited in number. And then, acquiring a prop spatial position and prop label information of the interactive prop through a prop mark object attached to the interactive prop, wherein the prop spatial position and the prop label information can be acquired as above.
Finally, the basic prop pose model is adjusted using the spatial positions and tag information of the prop markers to obtain the current prop pose model and capture the prop's current motion. Whereas the virtual human body model is generated by an algorithm, the virtual prop model is made manually: during calibration it is built from the prop's marker-point information, and during capture the prop's motion is recovered from the spatial positions and marker-point information of the markers on the prop, so that the virtual prop model performs the same motions or poses as the real prop.
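Because a rigid prop's markers move together, its current pose can be recovered from the calibrated marker layout, for example with the Kabsch algorithm; the sketch below is one workable assumption, not the patent's prescribed method. It treats the prop as rigid; an articulated prop would need the model-fitting approach used for the body.

```python
import numpy as np

def prop_pose(template: np.ndarray, observed: np.ndarray):
    """Estimate a rigid prop's pose from its markers (Kabsch algorithm).

    template: (N, 3) marker positions on the virtual prop model (calibration).
    observed: (N, 3) reconstructed marker positions in the current frame,
              rows matched to the template by marker label.
    Returns (R, t) such that observed ≈ template @ R.T + t.
    """
    ct, co = template.mean(axis=0), observed.mean(axis=0)
    h = (template - ct).T @ (observed - co)      # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, co - r @ ct
```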
In addition, the method can also perform a redirection (retargeting) operation: the current pose model is redirected onto a virtual object according to a preset correspondence. Likewise, when the subject includes an interactive prop, the current prop pose model can be redirected onto a virtual object according to a preset correspondence.
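A minimal sketch of such a redirection step, assuming joint rotations are represented as 3x3 matrices and the preset correspondence is a joint-name mapping; the per-joint rest-pose offsets are an added assumption to bridge differently proportioned skeletons:

```python
import numpy as np

def redirect_pose(source_pose: dict, correspondence: dict, offsets: dict) -> dict:
    """Map captured joint rotations onto a virtual character's skeleton.

    source_pose: source joint name -> 3x3 rotation from the current pose model.
    correspondence: preset mapping from source joint names to target joint names.
    offsets: target joint name -> rest-pose correction between the skeletons.
    """
    return {tgt: offsets[tgt] @ source_pose[src]
            for src, tgt in correspondence.items()}

# e.g. redirect_pose(pose, {"LeftArm": "arm_L"}, {"arm_L": np.eye(3)})
```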
The invention provides a new optical body motion capture approach: by building a human body model database, a low-dimensional distribution of body models is obtained, so that during optical capture the body model corresponding to the actor (including the skeleton and the actor's stature) can be derived directly. Markers need not be placed at the actor's skeletal nodes and can be placed anywhere on the actor, which increases the freedom of marker placement. Because a model matched to the actor is obtained, the accuracy of capturing the actor's performance is further improved. A pose prior model is built from the pose database and used during capture, so that the performance can still be captured even when many reconstructed reflective points are missing due to self-occlusion.
In summary, the pose capturing method according to the exemplary embodiments of the present application can determine the reference pose model corresponding to the subject using only the spatial positions and tag information of the first marker objects, without constraining the first marker objects or restricting them to skeletal points; it is therefore more flexible, and the reference pose model of the subject can be acquired more conveniently. Further, by determining the body parameters of the reference pose model while the subject performs a specific motion, the reference pose model can be made to conform better to the subject's body. Furthermore, the current spatial positions and current tag information of the first marker objects, together with the reference pose model, can be used to obtain the current pose model of the subject, thereby capturing the subject's current pose and motion and reducing the difficulty of motion capture.
The gesture capturing method can be used in performance-animation production and virtual live broadcasting, in particular for generating high-quality three-dimensional animation: the motion and/or pose of a virtual character is generated by capturing the motion and/or pose of a real subject. The method supports both single-person and multi-person capture, i.e., a single virtual character or multiple virtual characters can be output in the same picture. In the multi-person case, interactions between actors, such as hugs and handshakes, can be captured, and the corresponding interactions of the virtual characters can be output. The gesture capture method supports both an offline animation generation mode and a real-time online animation generation mode.
In the embodiments of the present application, the terminal and the like may be divided into functional modules according to the above method examples; for example, each function may be assigned its own functional module, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. It should be noted that the division into modules in the embodiments of the present application is schematic and is only one way of dividing logical functions; other divisions are possible in actual implementations.
Fig. 9 is a block diagram of a gesture capturing apparatus according to an exemplary embodiment of the present application, with respective functional modules divided in correspondence with respective functions.
As shown in fig. 9, the pose capturing apparatus 900 may include a tag information determining unit 910 and a reference pose model generating unit 920, wherein the tag information determining unit 910 may determine a spatial position of a first marker object attached to a subject and tag information of a part of the first marker object at the subject; the reference pose model generating unit 920 may adjust the initial pose model using the spatial position of the first marker object and the tag information, and generate a reference pose model corresponding to the object.
Optionally, the initial pose model comprises body parameters for describing a body and motion parameters for describing a motion.
The reference posture model generating unit 920 includes an action parameter setting module for setting an action parameter in the initial posture model by causing the subject to make a specific action, a first marker object information acquiring module, and a reference posture model generating module; the first marking object information acquisition module is used for acquiring the spatial position and the label information of a first marking object when the specific action is made by the object under the condition that the action parameters are determined; and the reference attitude model generating module is used for adjusting the body parameters of the initial attitude model by using the action parameters, the spatial position and the label information to generate the reference attitude model corresponding to the shot object.
Optionally, the first marker object information obtaining module may further obtain a current spatial position of the first marker object and current tag information in response to a motion of the subject, and the reference pose model generating module may further adjust the reference pose model using the current spatial position and the current tag information to obtain a current pose model of the subject for capturing a current pose of the subject.
Optionally, the first marked object information obtaining module is specifically configured to obtain a predicted spatial position of the current tag information and a current spatial position of the first marked object; under the condition that the current spatial position of the first marking object is determined to be within a preset range of the predicted spatial position of the current tag information, matching the first marking object with the current tag information to obtain a matching relation, wherein the preset range is a range set according to the motion track prediction of the shot object; and determining the current label information corresponding to the first marking object according to the matching relation.
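The predicted-position matching just described might look like the following sketch; the nearest-neighbor rule and the name match_tags are illustrative assumptions. A fuller implementation would also resolve conflicts when two tags select the same detection, for example with a global assignment step.

```python
import numpy as np

def match_tags(predicted: dict, detections: np.ndarray, radius: float) -> dict:
    """Match each tag's predicted position to the nearest current detection.

    predicted: tag id -> predicted 3D position from the motion trajectory.
    detections: (M, 3) reconstructed marker positions in the current frame.
    radius: the preset range around each prediction; a tag with no detection
            inside this range is treated as occluded for the frame.
    Returns tag id -> row index into detections (the matching relation Corr).
    """
    corr = {}
    for tag, pos in predicted.items():
        dists = np.linalg.norm(detections - pos, axis=1)
        j = int(np.argmin(dists))
        if dists[j] <= radius:
            corr[tag] = j
    return corr
```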
Optionally, the reference pose model generating module is specifically configured to, when the body parameters of the reference pose model are determined, continuously adjust the motion parameters so that the sum of the distances between the virtual spatial positions of all the tag information and the spatial positions of the first marker objects corresponding to all the tag information is minimized, thereby obtaining the current pose model for capturing the pose of the subject.
Optionally, the gesture capturing apparatus 900 further includes a constraint unit, where the constraint unit is configured to utilize a priori motion model generated by a preset motion library to constrain the current gesture model, and obtain the constrained current gesture model.
Optionally, the posture capture device 900 further includes an interactive prop determination unit, a prop information determination unit, and a current prop model determination unit, where the interactive prop determination unit is configured to determine an interactive prop for performing interaction with the subject; the prop information determining unit is used for acquiring a prop spatial position and prop label information of the interactive prop through a prop mark object attached to the interactive prop; and the current prop model confirming unit is used for adjusting a basic prop attitude model corresponding to the interactive prop by utilizing the prop spatial position and the prop label information to generate a current prop attitude model so as to be used for capturing the motion of the interactive prop.
Optionally, the first marker object information acquiring module includes a two-dimensional position determining sub-module and a spatial position determining sub-module, wherein the two-dimensional position determining sub-module is configured to determine at least two two-dimensional positions of the first marker object using images of the first marker object captured by at least two cameras, and the spatial position determining sub-module is configured to determine the spatial position of the first marker object using the at least two two-dimensional positions and the calibration information of the at least two cameras.
Optionally, the spatial position determining sub-module is specifically configured to determine, using the at least two two-dimensional positions and the calibration information of the at least two cameras, at least two rays of the at least two cameras corresponding to the first marker object, and to determine the spatial position of the first marker object such that the distance from this spatial position to the at least two rays is minimal.
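Minimizing the summed squared distance from one 3D point to all the camera rays reduces to a single 3x3 linear system; the sketch below is an illustrative assumption of how the sub-module's computation could be carried out, with camera centers and unit ray directions taken from the calibration.

```python
import numpy as np

def triangulate(origins: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Return the 3D point minimizing the summed squared distance to all rays.

    origins: (C, 3) camera centers; directions: (C, 3) unit rays through each
    camera's 2D detection of the marker.
    """
    # Distance of x to a ray is ||(I - d d^T)(x - o)||; summing the normal
    # equations over all rays yields A x = b with a 3x3 matrix A.
    a = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        p = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        a += p
        b += p @ o
    return np.linalg.solve(a, b)
```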
Optionally, the spatial position comprises coordinate data of the first marker object within a spatial coordinate system corresponding to a capture volume for capturing the subject.
Optionally, the gesture capturing apparatus 900 further includes a camera calibration information acquiring unit, a proportional relation setting unit, and a spatial coordinate system determining unit, wherein the camera calibration information acquiring unit is configured to obtain calibration information of all cameras by calibrating all cameras used to capture the subject; the proportional relation setting unit is configured to set the proportional relation according to the calibration information of all the cameras and a marking device carrying a second marker object; and the spatial coordinate system determining unit is configured to calibrate the ground of the capture volume using the calibration information of all the cameras, determine the ground information, and determine the spatial coordinate system using the ground information.
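Ground calibration can be illustrated by fitting a plane to reconstructed markers laid on the floor and building the spatial coordinate system from it. The sketch below is hypothetical; in particular, the sign test that keeps the normal pointing up is a placeholder for whatever up-direction convention the calibration uses.

```python
import numpy as np

def ground_frame(floor_points: np.ndarray):
    """Fit the ground plane to floor markers and build a world frame (z up).

    floor_points: (N, 3) reconstructed positions of markers on the floor.
    Returns (origin, rotation) where the rows of rotation are the world axes.
    """
    origin = floor_points.mean(axis=0)
    # The plane normal is the direction of least variance of the points.
    _, _, vt = np.linalg.svd(floor_points - origin)
    z = vt[2]
    if z[1] < 0:          # placeholder up-direction convention
        z = -z
    x = vt[0]             # an in-plane direction
    y = np.cross(z, x)    # completes a right-handed frame
    return origin, np.stack([x, y, z])
```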
Optionally, the gesture capturing apparatus 900 further includes a tag information matching unit, wherein the tag information matching unit matches the first tagged object with the tag information to obtain a corresponding relationship between the first tagged object and the tag information.
As shown in fig. 10, during actual pose capture the first marker objects may be placed at different positions on the human body. The subject wears a dedicated pose-capture suit, including pose-capture gloves and pose-capture shoes, onto which the marker points are attached. Fig. 10 is only an illustrative view and does not show the suit, so some first marker objects appear not to be in full contact with the skin; in practice they are in contact with the garment.
In summary, the posture capture device according to the exemplary embodiments of the present application can determine the reference pose model corresponding to the subject using only the spatial positions and tag information of the first marker objects, without constraining the first marker objects or restricting them to skeletal points; it is therefore more flexible, and the reference pose model of the subject can be acquired more conveniently. Further, by determining the body parameters of the reference pose model while the subject performs a specific motion, the reference pose model can be made to conform better to the subject's body. Furthermore, the current spatial positions and current tag information of the first marker objects, together with the reference pose model, can be used to obtain the current pose model of the subject, thereby capturing the subject's current pose and motion and reducing the difficulty of motion capture.
It should be noted that the execution subjects of the steps of the method provided in embodiment 1 may be the same device, or different devices may be used as the execution subjects of the method. For example, the execution subject of steps 21 and 22 may be device 1, and the execution subject of step 23 may be device 2; for another example, the execution subject of step 21 may be device 1, and the execution subjects of steps 22 and 23 may be device 2; and so on.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable gesture capture device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable gesture capture device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable gesture capture device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable gesture capture device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer implemented process such that the instructions which execute on the computer or other programmable device provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in the form of a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (14)

1. A method of gesture capture, comprising:
determining a spatial position of a first marking object attached to a subject and label information of a part of the first marking object on the subject;
adjusting an initial pose model using the spatial position of the first marker object and the tag information to generate a reference pose model corresponding to the subject,
wherein the initial pose model includes body parameters for describing a body and motion parameters for describing a motion, and the adjusting the initial pose model using the spatial position of the first marker object and the tag information to generate a reference pose model corresponding to the subject includes:
setting motion parameters in an initial posture model by causing the object to make a specific motion; acquiring the spatial position of a first marker object and the tag information when the object performs the specific motion if the motion parameter is determined; and adjusting the body parameters of the initial posture model by using the action parameters, the spatial position and the label information to generate a reference posture model corresponding to the shot object.
2. The method of claim 1, further comprising, after generating the reference pose model corresponding to the subject:
acquiring a current spatial position and current tag information of a first marking object in response to the motion of the object;
and adjusting the reference attitude model by using the current spatial position and the current tag information to obtain a current attitude model of the object, so as to be used for capturing the current attitude of the object.
3. The method of claim 2, wherein obtaining current tag information for the first tagged object comprises:
acquiring a predicted spatial position of the current tag information and a current spatial position of the first marker object;
under the condition that the current spatial position of the first marking object is determined to be within a preset range of the predicted spatial position of the current tag information, matching the first marking object with the current tag information to obtain a matching relation, wherein the preset range is a range set according to the motion track prediction of the shot object;
and determining the current label information corresponding to the first marking object according to the matching relation.
4. The method of claim 2, wherein the adjusting the reference pose model corresponding to the subject using the current spatial position and the current tag information to obtain a current pose model of the subject for capturing a current pose of the subject comprises:
and under the condition that the body parameters of the reference pose model are determined, continuously adjusting the motion parameters so that the sum of the distances between the virtual spatial positions of all the tag information and the spatial positions of the first marker objects corresponding to all the tag information is minimized, and acquiring the current pose model for capturing the pose of the subject.
5. The method of claim 4, further comprising:
and utilizing a prior action model generated by a preset action library to constrain the current attitude model and acquiring the constrained current attitude model.
6. The method of any of claims 1 to 5, further comprising:
determining an interactive prop for performing interaction with the subject;
acquiring a prop space position and prop label information of the interactive prop through a prop mark object attached to the interactive prop;
and adjusting a basic prop posture model corresponding to the interactive prop by using the prop spatial position and the prop label information to generate a current prop posture model for capturing the motion of the interactive prop.
7. The method of claim 1, wherein determining the spatial position of the first marker object attached to the subject comprises:
determining at least two two-dimensional positions of the first marker object using images of the first marker object captured by at least two cameras;
and determining the spatial position of the first marker object by using the at least two two-dimensional positions and calibration information of the at least two cameras.
8. The method of claim 7, wherein determining the spatial position of the first marker object using the at least two two-dimensional positions and the calibration information of the at least two cameras comprises:
determining at least two rays of the at least two cameras corresponding to the first marker object by using the at least two two-dimensional positions and the calibration information of the at least two cameras;
the spatial position of the first marker object is determined in such a way that the distance of the first marker object from the at least two rays at the spatial position is minimal.
9. The method of claim 1, wherein the spatial location comprises coordinate data of the first marker object within a spatial coordinate system corresponding to a capture volume used to capture the subject.
10. The method of claim 9, further comprising:
acquiring calibration information of all cameras by performing calibration on all cameras used for capturing the object;
setting a proportional relation according to the calibration information of all the cameras and a marking device with a second marking object;
and calibrating the ground of the capturing space by utilizing the calibration information of all the cameras, determining the ground information and the space coordinate system determined by utilizing the ground information.
11. The method of claim 1, wherein determining the spatial position of the first marker object attached to the subject and the label information of the first marker object at the location of the subject further comprises: and matching the first mark object with the label information to obtain the corresponding relation between the first mark object and the label information.
12. A gesture capture device, comprising:
a tag information determination unit configured to determine a spatial position of a first marker object attached to a subject and tag information describing a part of the first marker object on the subject;
a reference pose model generation unit configured to generate a reference pose model corresponding to the object by adjusting an initial pose model using the spatial position of the first marker object and the tag information,
wherein the initial pose model includes body parameters for describing a body and motion parameters for describing a motion, and the adjusting the initial pose model using the spatial position of the first marker object and the tag information to generate a reference pose model corresponding to the subject includes:
setting motion parameters in an initial posture model by causing the object to make a specific motion; acquiring the spatial position of a first marker object and the tag information when the object performs the specific motion if the motion parameter is determined; and adjusting the body parameters of the initial posture model by using the action parameters, the spatial position and the label information to generate a reference posture model corresponding to the shot object.
13. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-11.
14. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform the method of any of claims 1-11.
CN202011376569.XA 2020-11-30 2020-11-30 Posture capturing method and device, electronic equipment and storage medium Active CN112515661B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011376569.XA CN112515661B (en) 2020-11-30 2020-11-30 Posture capturing method and device, electronic equipment and storage medium
PCT/CN2021/132799 WO2022111525A1 (en) 2020-11-30 2021-11-24 Posture capturing method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011376569.XA CN112515661B (en) 2020-11-30 2020-11-30 Posture capturing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112515661A CN112515661A (en) 2021-03-19
CN112515661B true CN112515661B (en) 2021-09-14

Family

ID=74995360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011376569.XA Active CN112515661B (en) 2020-11-30 2020-11-30 Posture capturing method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112515661B (en)
WO (1) WO2022111525A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112515661B (en) * 2020-11-30 2021-09-14 魔珐(上海)信息科技有限公司 Posture capturing method and device, electronic equipment and storage medium
CN115414648B (en) * 2022-08-30 2023-08-25 北京华锐视界科技有限公司 Football evaluation method and football evaluation system based on motion capture technology
CN116863086B (en) * 2023-09-04 2023-11-24 武汉国遥新天地信息技术有限公司 Rigid body stable reconstruction method for optical motion capture system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102193631A (en) * 2011-05-05 2011-09-21 上海大学 Wearable three-dimensional gesture interaction system and using method thereof
CN106600627A (en) * 2016-12-07 2017-04-26 成都通甲优博科技有限责任公司 Rigid body motion capturing method and system based on mark point
CN107992188A (en) * 2016-10-26 2018-05-04 宏达国际电子股份有限公司 Virtual reality exchange method, device and system
CN111192350A (en) * 2019-12-19 2020-05-22 武汉西山艺创文化有限公司 Motion capture system and method based on 5G communication VR helmet
CN111433783A (en) * 2019-07-04 2020-07-17 深圳市瑞立视多媒体科技有限公司 Hand model generation method and device, terminal device and hand motion capture method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7720259B2 (en) * 2005-08-26 2010-05-18 Sony Corporation Motion capture using primary and secondary markers
CN102622591B (en) * 2012-01-12 2013-09-25 北京理工大学 3D (three-dimensional) human posture capturing and simulating system
US10613626B2 (en) * 2018-06-15 2020-04-07 Immersion Corporation Kinesthetically enabled glove
CN111681281B (en) * 2020-04-16 2023-05-09 北京诺亦腾科技有限公司 Calibration method and device for limb motion capture, electronic equipment and storage medium
CN111783679A (en) * 2020-07-04 2020-10-16 北京中科深智科技有限公司 Real-time whole body dynamic capture system and method based on data mixing of camera and IMU
CN112416133B (en) * 2020-11-30 2021-10-15 魔珐(上海)信息科技有限公司 Hand motion capture method and device, electronic equipment and storage medium
CN112515661B (en) * 2020-11-30 2021-09-14 魔珐(上海)信息科技有限公司 Posture capturing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022111525A1 (en) 2022-06-02
CN112515661A (en) 2021-03-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant