CN115294623B - Human body whole body motion capturing method, device, storage medium and terminal - Google Patents

Human body whole body motion capturing method, device, storage medium and terminal

Info

Publication number
CN115294623B
CN115294623B (application CN202210742349.7A / CN202210742349A)
Authority
CN
China
Prior art keywords
information
human body
gesture
motion
facial expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210742349.7A
Other languages
Chinese (zh)
Other versions
CN115294623A (en)
Inventor
赵天奇
段盼
渠源
巴君
崔丰驿
苗渊渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Juli Dimension Technology Co ltd
Original Assignee
Beijing Juli Dimension Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Juli Dimension Technology Co ltd filed Critical Beijing Juli Dimension Technology Co ltd
Priority to CN202210742349.7A priority Critical patent/CN115294623B/en
Publication of CN115294623A publication Critical patent/CN115294623A/en
Application granted granted Critical
Publication of CN115294623B publication Critical patent/CN115294623B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application relates to a human body whole-body motion capture method, device, storage medium and terminal. The method comprises: acquiring a cropped image of the human body with a single camera, taking the cropped image as input, and capturing facial expression posture information, body motion information, gesture motion information and body root position 3D information through a mechanism of cyclic feedback and iterative mutual optimization between the neural networks, thereby obtaining real-time whole-body motion information of the human body. The method is low in cost, convenient to use, and improves capture precision and stability.

Description

Human body whole body motion capturing method, device, storage medium and terminal
Technical Field
The present invention relates to the field of motion capture technologies, and in particular, to a method, an apparatus, a storage medium, and a terminal for capturing motion of a whole body of a human body.
Background
Motion capture technology has become an indispensable production tool in the games, animation and film industries. Among existing motion capture technologies, optical motion capture is costly and requires complex on-set equipment; inertial motion capture is easily disturbed by magnetic and ferrous equipment in the environment and tends to break down when capturing fast motion; optical-inertial hybrid capture improves precision but remains expensive and complex to use.
Against this background, reducing the cost of motion capture and simplifying the capture process are of general concern.
Disclosure of Invention
The embodiment of the application provides a human body whole body motion capturing method, a device, a storage medium and a terminal. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present application provides a method for capturing whole-body motion of a human body, the method comprising:
collecting motion data of a human body;
cleaning and labeling the motion data to obtain a cropped image of the human body;
extracting facial expression posture information, body motion information, gesture motion information and body root position 3D information of the human body according to the cropped image and/or a fusion mechanism of cyclic feedback and iterative mutual optimization;
and performing space-time fusion on the facial expression posture information, the body motion information, the gesture motion information and the body root position 3D information to obtain the whole-body motion information of the human body.
Optionally, collecting the motion data of the human body comprises:
acquiring the motion data of the human body through a single RGB camera.
Optionally, extracting the facial expression posture information, body motion information and gesture motion information of the human body according to the cropped image and/or the fusion mechanism of cyclic feedback and iterative mutual optimization comprises:
determining the facial expression posture information, body motion information and gesture motion information of the human body according to the cropped image;
iteratively optimizing the facial expression posture information, the body motion information and the gesture motion information through the fusion mechanism of cyclic feedback and iterative mutual optimization;
the iteratively optimized facial expression posture information, body motion information and gesture motion information are the extracted facial expression posture information, body motion information and gesture motion information of the human body.
Optionally, determining the facial expression posture information of the human body according to the cropped image comprises:
inputting the cropped image into a pre-trained model to obtain general facial features of the human body;
acquiring the facial expression posture information of the human body according to the general facial features, a universal ID encoding network, an expression extraction neural network, an expression optimization model and a pose extraction neural network;
and fusing the facial expression posture information through a space-time fusion network to obtain the fused facial expression posture information.
Optionally, determining the body motion information and the gesture motion information of the human body according to the cropped image comprises:
extracting a joint global map, a gesture global map, a joint local map and a gesture local map of the human body according to the cropped image;
inputting the joint global map and the joint local map into a MobileNet network respectively, and outputting joint global features and joint local features of the human body;
inputting the gesture global map and the gesture local map into a MobileNet network respectively, and outputting gesture global features and gesture local features of the human body;
fusing the joint global features and the joint local features through a space-time fusion network to obtain the body motion information;
and fusing the gesture global features and the gesture local features through the space-time fusion network to obtain the gesture motion information.
Optionally, extracting the body root position 3D information of the human body according to the cropped image comprises:
extracting motion direction features and tracking information of the human body according to the cropped image;
and determining the body root position 3D information of the human body according to the motion direction features, the tracking information and a prior constraint based on physical rules.
Optionally, the method further comprises:
retargeting a virtual character according to the whole-body motion information, thereby driving the virtual character in real time.
In a second aspect, an embodiment of the present application provides a human body whole-body motion capture device based on a camera and a cyclic feedback mechanism, the device comprising:
a data acquisition module, configured to collect motion data of a human body;
a cleaning and labeling module, configured to clean and label the motion data to obtain a cropped image of the human body;
a whole-body motion extraction module, configured to extract facial expression posture information, body motion information, gesture motion information and body root position 3D information of the human body according to the cropped image and/or a fusion mechanism of cyclic feedback and iterative mutual optimization;
a whole-body motion determination module, configured to perform space-time fusion on the facial expression posture information, the body motion information, the gesture motion information and the body root position 3D information to obtain the whole-body motion information of the human body.
In a third aspect, embodiments of the present application provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.
In a fourth aspect, embodiments of the present application provide a terminal, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps described above.
The technical solutions provided by the embodiments of the present application may have the following beneficial effects:
in the embodiment of the application, the human body whole-body motion capture method first collects motion data of a human body; then cleans and labels the motion data to obtain a cropped image of the human body; next extracts facial expression posture information, body motion information, gesture motion information and body root position 3D information of the human body according to the cropped image and/or a fusion mechanism of cyclic feedback and iterative mutual optimization; and finally performs space-time fusion on the facial expression posture information, the body motion information, the gesture motion information and the body root position 3D information to obtain the whole-body motion information of the human body. The facial expression posture information, body motion information, gesture motion information and body root position 3D information extracted in real time can thus be fused in time and space to obtain the whole-body motion information of the human body. The method is low in cost, convenient to use, and improves capture precision and stability.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic diagram of the overall design of a human body whole body motion capturing method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a method for capturing motion of a whole body of a human body according to an embodiment of the present application;
fig. 3 is a schematic diagram of extracting gesture motion information and facial expression posture information of a human body whole body motion capturing method according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating the extraction of joint information in a human body whole body motion capturing method according to an embodiment of the present application;
fig. 5 is a schematic diagram of iterative optimization among facial expression posture information, gesture motion information and body motion information in a human body whole body motion capturing method according to an embodiment of the present application;
fig. 6 is a schematic diagram of performing space-time fusion on facial expression posture information, body motion information, gesture motion information and body root position 3D information of a human body whole body motion capturing method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a device for capturing motion of a whole body of a human body according to an embodiment of the present application;
fig. 8 is a schematic diagram of a terminal according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are merely some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of systems and methods that are consistent with aspects of the invention as detailed in the accompanying claims.
In the description of the present invention, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art. Furthermore, in the description of the present invention, unless otherwise indicated, "a plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
Referring to fig. 1-6, a flowchart of a method for capturing motion of a whole body of a human body is provided in an embodiment of the present application. As shown in fig. 1-6, the method of the embodiments of the present application may include the steps of:
s100, collecting motion data of a human body. The S100 includes: and acquiring the motion data of the human body through a single RGB camera.
In the embodiment of the application, the RGB image of a massive human body can be acquired through the RGB camera under the environment of illumination, background, human body height, human body fat and thin and random transformation of human body clothes. The motion data is an RGB image.
And S200, cleaning and labeling the motion data to obtain a cutting graph of the human body.
In the embodiment of the application, a frame containing a human body is manually marked on the RGB image; a network of predicted human body crop frames can be trained from a large number of RGB images containing human body frames.
The network of predicted human body cutting frames can cut the RGB image with the motion data of the human body to obtain a cutting graph of the human body.
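A minimal sketch of this capture-and-crop front end is given below. The crop network architecture is not specified in the text, so `crop_net` (and its assumed box output format) is a hypothetical placeholder for the trained human-crop predictor.

```python
import cv2
import torch

def capture_and_crop(crop_net: torch.nn.Module, camera_id: int = 0):
    """Grab frames from a single RGB camera and yield cropped human images."""
    cap = cv2.VideoCapture(camera_id)
    try:
        while True:
            ok, frame = cap.read()          # frame: H x W x 3 BGR image
            if not ok:
                break
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            inp = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            with torch.no_grad():
                # assumed output: one box per frame as (x1, y1, x2, y2)
                x1, y1, x2, y2 = crop_net(inp)[0].int().tolist()
            yield rgb[y1:y2, x1:x2]         # the cropped image used downstream
    finally:
        cap.release()
```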
S300, extracting facial expression posture information, body motion information, gesture motion information and body root position 3D information of the human body according to the cropped image and/or a fusion mechanism of cyclic feedback and iterative mutual optimization.
In S300, extracting the facial expression posture information, body motion information and gesture motion information of the human body according to the cropped image and/or the fusion mechanism of cyclic feedback and iterative mutual optimization comprises:
S311, determining the facial expression posture information, body motion information and gesture motion information of the human body according to the cropped image.
In the embodiment of the application, because the human hand has a large range of motion, it easily occludes other parts of the body, or is occluded by them, during movement; and because the hand occupies only a small area of the cropped image, hand occlusion in the cropped image requires special handling. As shown in fig. 3, when a hand is occluded by other parts of the body, the gesture motion information of the current frame can be assigned a default gesture; if the hand is not occluded, the gesture motion information can be predicted directly by the gesture motion extraction network. When a hand occludes a body joint, the information of that joint can be predicted from the surrounding joint information.
Gestures also tend to occlude the face, and gesture motion information likewise affects the prediction of facial expression posture information. In the embodiment of the application, face occlusion is handled similarly to hand occlusion: a default natural facial motion can replace the currently occluded facial expression posture information, as in the sketch below.
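A short sketch of the occlusion fallback, with the occlusion test, the default pose vector and the network interface all assumed for illustration; the face follows the same pattern with a default natural expression.

```python
# Hypothetical default gesture parameters; length and values are illustrative.
DEFAULT_HAND_POSE = [0.0] * 48

def hand_pose_for_frame(hand_crop, is_occluded: bool, gesture_net):
    """Return gesture motion for one frame, falling back to a default pose."""
    if is_occluded:
        return DEFAULT_HAND_POSE          # assign the default gesture action
    return gesture_net(hand_crop)         # predict directly from the crop
```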
In an embodiment of the present application, determining the facial expression posture information of the human body according to the cropped image comprises:
inputting the cropped image into a pre-trained model to obtain general facial features of the human body. In the embodiment of the application, the pre-trained model is a general feature extraction model, which can cope with the complexity and diversity of the cropped images and extracts the general facial features through a general spatial position encoding of the face.
The facial expression posture information of the human body is then acquired according to the general facial features, a universal ID encoding network, an expression extraction neural network, an expression optimization model and a pose extraction neural network, and the facial expression posture information is fused through a space-time fusion network to obtain the fused facial expression posture information.
The general facial features include the identity (ID), pose and expression distribution features of the face, and so on.
Across massive human body data, identity is the most distinctive attribute of a person. To capture the hundreds of subtle expressions of an arbitrary person, the facial identity must be successfully separated out so that micro-expressions can be transferred between different people. In an embodiment of the present application, the universal ID encoding network comprises a base model module, a meta-learning module and a random implicit coding mechanism. The random implicit coding mechanism builds a statistical model of the distribution of the face data and obtains a unique random code by randomly sampling a face image under that distribution. The random code and the face image are input into the base model module, which outputs a face ID code. The meta-learning module then applies a consistency constraint between the face ID code and the random code to complete the training of the universal ID encoding network.
In the embodiment of the application, identity code information is preset in the universal ID encoding network, and each face corresponds to a unique face ID code; the face ID code can be determined from the facial identity contained in the general facial features, so that the identity of the human body can be determined. The embodiment of the application combines a neural network with traditional rules in the design of the facial motion extraction network, completing the encoding and separation of human identity information.
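The sketch below illustrates one way the ID-code training loop could look, under strong assumptions: the base model is reduced to a toy encoder, and the meta-learning module's consistency constraint is stood in for by a simple mean-square consistency objective. All layer sizes and the training step are illustrative.

```python
import torch
import torch.nn as nn

class UniversalIDEncoder(nn.Module):
    """Toy stand-in for the base model: maps a face image plus a random
    implicit code to a face ID code."""
    def __init__(self, code_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, code_dim))
        self.mix = nn.Linear(2 * code_dim, code_dim)

    def forward(self, face_img, random_code):
        feat = self.backbone(face_img)
        return self.mix(torch.cat([feat, random_code], dim=1))

def id_consistency_step(model, face_img, optimizer, code_dim: int = 64):
    """One training step enforcing consistency between ID code and random code."""
    random_code = torch.randn(face_img.size(0), code_dim)   # random implicit code
    id_code = model(face_img, random_code)
    loss = nn.functional.mse_loss(id_code, random_code)     # consistency constraint
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```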
Facial expressions are rich, complex and dynamic, which makes extracting facial expression posture information technically difficult: existing extraction methods are either rich but not complex enough, or handle complexity but lack dynamic stability. The expression extraction neural network proposed in the embodiment of the application uses a position-block encoding of the facial expression. Specifically, local region images such as an eyebrow region image, an eye region image, a nose region image and a mouth region image are cropped from the detected facial keypoints and fed into their own feature extraction networks; at the same time, global features are extracted from the original face image. The global and local features are then fused, and the motion parameters of the corresponding regions are finally regressed.
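A minimal sketch of this position-block encoding follows. The region list, channel sizes and the number of motion parameters per region are assumptions made for illustration, not values from the patent.

```python
import torch
import torch.nn as nn

class RegionExpressionNet(nn.Module):
    """One small feature branch per facial region plus a global branch,
    fused before per-region regression of motion parameters."""
    REGIONS = ["eyebrow", "eye", "nose", "mouth"]

    def __init__(self, params_per_region: int = 10):
        super().__init__()
        def branch():
            return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.local_branches = nn.ModuleDict({r: branch() for r in self.REGIONS})
        self.global_branch = branch()
        self.heads = nn.ModuleDict(
            {r: nn.Linear(16, params_per_region) for r in self.REGIONS})

    def forward(self, face_img, region_crops):
        g = self.global_branch(face_img)                      # global features
        out = {}
        for r in self.REGIONS:
            l = self.local_branches[r](region_crops[r])       # local features
            out[r] = self.heads[r](torch.cat([g, l], dim=1))  # fuse, then regress
        return out
```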
In the embodiment of the application, images of different local regions can be combined to produce combined expressions, for example pursing the lips with the eyes closed, or opening the eyes while raising the eyebrows. Most expressions arise from the correlated motion of several local regions of the face, so modeling the motion correlation between local regions is important. The embodiment of the application therefore introduces a spatial association concept: the local facial features are combined randomly, so that the spatial association network can adaptively select and combine them, improving the refinement of the global and local facial expressions as well as their richness and complexity.
The embodiment of the application also introduces a temporal mechanism, so that the expression extraction neural network can use preceding and following frames to achieve dynamic stability of the facial expression.
Facial expressions are also strongly individual; a smile, for example, is semantically the same but differs in degree from person to person. To address this, the embodiment of the application introduces an expression optimization model, similar to the universal ID encoding network, which extracts a personalized characterization of the face and, combined with the other networks in the facial motion extraction network, completes personalized facial expression prediction.
The training process of the expression optimization model is as follows: before capturing expressions, 5-10 face pictures with various angles and illumination changes are taken under a natural expression, and the pictures are then augmented with brightness changes, blurring and other enhancements. The augmented face images are input into the expression extraction neural network, which outputs the corresponding basic expression feature codes. During real-time facial expression capture, the difference between the expression features produced by the expression extraction neural network and the basic expression features corresponding to these codes is computed, and the real-time expression change of the face is obtained by regressing this difference.
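The calibration and runtime delta computation can be sketched as below, assuming the expression extraction network returns a feature vector per face image; averaging the calibration features into a single basic code is an assumption for illustration.

```python
import torch

def calibrate_basis(expression_net, calib_faces):
    """Average the expression features of the 5-10 augmented neutral-expression
    photos to obtain the person's basic expression feature code (assumed)."""
    with torch.no_grad():
        feats = torch.stack([expression_net(f) for f in calib_faces])
    return feats.mean(dim=0)

def personalized_expression(expression_net, live_face, basis_code):
    """Real-time capture: the expression change is regressed from the
    difference between the live feature and the basic expression code."""
    with torch.no_grad():
        live_feat = expression_net(live_face)
    return live_feat - basis_code   # per-person expression delta
```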
Changes in facial expression are accompanied by movement of the head. Only when the facial expression is combined with the head motion does the expression extracted by the expression extraction neural network appear natural and expressive. For this reason, the embodiment of the application designs a pose extraction neural network, which extracts the position and angle information of the head joints of the human body.
The embodiment of the application introduces a space-time fusion network. On the one hand, when an extracted facial expression is too blurred or incomplete, the expression of the current frame can be recovered from frames similar to the defective frame, or from the adjacent normal frames, which improves the extraction of facial expression posture information. On the other hand, the facial expression recognized by the expression extraction neural network (after the ID encoding network has recognized the face ID code and the expression optimization model has been applied) and the head joint position and angle information from the pose extraction neural network can be fused in space and time, yielding personalized facial expression posture information.
Meanwhile, to make facial expression transitions natural, a 3D convolutional network is designed into the space-time fusion network and additional temporal information is added, giving the facial motion extraction network better stability and robustness.
After the block networks in the facial motion extraction network have extracted the facial expression posture information, the space-time fusion network in the embodiment of the application completes the parameter regression of the facial expression posture information.
Traditional network designs rely on end-to-end black-box training over massive data. The deep-learning-based facial motion extraction network here instead uses a design segmented by explicit semantics, and can capture any facial expression posture information, from ordinary expressions with clear semantics to micro-expressions containing various subtle changes; the fine-grained expression extraction gives a more realistic effect. This avoids the problems of traditional numerical-statistics methods, which are time- and labor-consuming, require substantial human judgment to select suitable facial features, and suffer from large individual deviation and instability.
The body motion information and gesture motion information of the human body are determined according to the cropped image. In the embodiment of the application, the gesture motion information can be extracted from the cropped image by the blocks of the gesture motion extraction network and the space-time fusion network; the body motion information can be extracted from the cropped image by the blocks of the body motion extraction network and the space-time fusion network. Specifically:
A joint global map, a gesture global map, a joint local map and a gesture local map of the human body are extracted from the cropped image.
The joint global map and the joint local map are input into a MobileNet network respectively, and the joint global features and joint local features of the human body are extracted; the joint global features and joint local features are fused through a space-time fusion network to obtain the joint information.
As shown in fig. 4, the joint global features are extracted from the joint global map; the joint local map comprises an upper-body joint local map and a lower-body joint local map, from which the upper and lower joint local features are extracted; these local features include joint position information and angle information.
The joint global features and joint local features are fused in space and time at different scales, and the fused features are processed by two convolution layers to obtain the final joint information.
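The fusion head for the joint branch can be sketched as follows. The channel count, joint count and per-joint output size (position plus angle) are assumptions; the two-convolution structure follows the description above.

```python
import torch
import torch.nn as nn

class JointFusionHead(nn.Module):
    """Concatenate global and upper/lower local joint features and pass them
    through two convolution layers before regressing joint information."""
    def __init__(self, channels: int = 64, num_joints: int = 24):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.head = nn.Linear(channels, num_joints * 6)  # position + angle per joint

    def forward(self, f_global, f_upper, f_lower):
        x = torch.cat([f_global, f_upper, f_lower], dim=1)  # fuse along channels
        x = self.fuse(x).mean(dim=(2, 3))                   # spatial pooling
        return self.head(x)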
In the embodiment of the present application, the body motion information is the joint information, and the body motion extraction network is the joint extraction network.
The gesture global map and the gesture local map are input into a MobileNet network respectively, and the gesture global features and gesture local features of the human body are output; the gesture global features and gesture local features are fused through the space-time fusion network to obtain the gesture motion information.
In the embodiment of the application, the left-hand global and local features are extracted by a left-hand motion extraction network, and the right-hand global and local features by a right-hand motion extraction network.
The gesture global map comprises a right-hand global map and a left-hand global map, from which the right-hand and left-hand global features are extracted; the gesture local features extracted from the wrist local map, the palm local map and the finger local map are the wrist, palm and finger local features.
Traditional gesture motion extraction is either based on gesture recognition or relies on relatively complex hardware gloves; both approaches fall short in richness, complexity and cost. In the embodiment of the application, a new characterization and training scheme is designed to cope with the small area that the hands occupy in the cropped image, their high flexibility and large range of motion, and occlusion blur. After annotating the detection confidence of the cropped image, and in order to further reduce the cost of the hand detection stage while improving its precision, the embodiment of the application combines a traditional clustering method with an ultra-lightweight hand detection network. The clustering algorithm is used to analyze the distribution of gesture motions, for example whether it is uniform or has gaps; the ultra-lightweight hand detection network can cope with problems such as small hand size and motion blur while meeting the real-time requirements of gesture recognition.
Because the complexity of gesture motion determines the diversity and effectiveness of the representation, the embodiment of the application refines the block representation of the hand beyond existing gesture feature extraction networks, which represent the hand as a whole, dividing it into three blocks: wrist, palm and fingers.
The inherent physical rigidity constraints of the fingers mean that gesture motion, although diverse, is naturally limited, so the learning space can be reduced. Based on this, the embodiment of the application designs a training scheme that fuses multiple features (global gesture motion, local gesture motion, relative finger-joint motion and the corresponding rotations) and performs temporal-spatial fusion of the gesture global and local features. When training the hand detection network, the hand data is artificially augmented for a large number of scenes; because a large amount of open-source data annotates 2D keypoints at each hand joint, a 2D keypoint constraint is added to the loss function, and the hand detection network is further trained with supervision on the open-source data. The network regresses not only the coordinates of the hand boxes but also the confidence and left/right label of each box. Since the confidence and left/right information of the hand boxes are annotated, the regressed values and annotated values only need to be constrained with a mean-square-error (L2) loss, and the 2D keypoints are introduced for supervised learning so that the hand detection network converges better. The loss can be written as:
$$L = \frac{\alpha_{coor}}{m}\sum_{i}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] + \frac{\beta_{obj}}{q}\sum_{i}(c_i-\hat{c}_i)^2 + \frac{\gamma_{l\text{-}r}}{t}\sum_{i}(v_i-\hat{v}_i)^2 + \frac{\theta_{\theta}}{n}\sum_{i}(\theta_i-\hat{\theta}_i)^2 + L_{smooth}$$
where the subscript $i$ is the joint index; $(x_i, y_i)$ are the pixel coordinates of the manually annotated $i$-th joint and $(\hat{x}_i, \hat{y}_i)$ the predicted pixel coordinates; $c_i$ is the manually annotated confidence of the $i$-th joint and $\hat{c}_i$ the predicted confidence; $v_i$ is the manually annotated left/right-hand label of the $i$-th joint and $\hat{v}_i$ the predicted label; $\theta_i$ is the manually annotated rotation angle of the $i$-th joint and $\hat{\theta}_i$ the predicted rotation angle; $m$ is the number of coordinates, $q$ the number of confidences, $t$ the number of left/right labels and $n$ the number of joint rotation angles. $L_{smooth}$ smooths the 2D keypoints of the preceding and following frames. $\alpha_{coor}$, $\beta_{obj}$, $\gamma_{l\text{-}r}$ and $\theta_{\theta}$ are the weight coefficients of the coordinates, the confidence, the left/right label and the joint rotation angle, respectively.
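The same loss can be sketched in code, under the assumption that predictions and targets are packed into dictionaries of tensors (the keys are illustrative) and that the smoothing term is computed separately and passed in.

```python
import torch

def hand_detection_loss(pred, target, weights, l_smooth=0.0):
    """Mean-square terms for joint coordinates, confidences, left/right labels
    and rotation angles, plus the inter-frame smoothing term."""
    mse = torch.nn.functional.mse_loss
    return (weights["coord"] * mse(pred["xy"], target["xy"])
            + weights["conf"] * mse(pred["conf"], target["conf"])
            + weights["lr"] * mse(pred["lr"], target["lr"])
            + weights["angle"] * mse(pred["angle"], target["angle"])
            + l_smooth)
```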
At this point, the extraction of the body motion information and/or gesture motion information, with an attention mechanism introduced into the MobileNet network, is complete.
In another embodiment of the present application, S312, the facial expression posture information, the body motion information and the gesture motion information are iteratively optimized using the fusion mechanism of cyclic feedback and iterative mutual optimization.
As shown in fig. 5, iterative optimization is performed among the facial expression posture information, the gesture motion information and the body motion information.
The joint extraction network covers the motion of all the body joints, among which the wrist joints are a priori correlated with the global body motion: wrist motion is associated with arm motion, and arm motion is in turn associated with wrist motion. Because neither the joint extraction network nor the gesture motion extraction network predicts perfectly before iterative training starts, in the cyclic iteration the joint information output by the joint extraction network is fed into the gesture motion extraction network, which amounts to giving it the global body motion as a prior; with this global information, the prediction of wrist motion becomes more accurate. Conversely, feeding the gesture information output by the gesture motion extraction network into the joint extraction network accelerates its convergence. The two networks take each other's outputs as inputs, forming a loop iteration, so that both networks improve step by step during training.
The joint information output by the joint extraction network can thus serve as input to the gesture motion extraction network to optimize the gesture motion information. In the embodiment of the application, because gesture motion is complex and variable, the orientation of the hand is hard to determine when gesture motion information is extracted alone. The orientation of the hand, however, is consistent with the orientation of the inner and outer sides of the arm. The correct arm orientation can be obtained from the joint extraction network, and the inner/outer arm orientation it outputs can be used to correct the hand orientation output by the gesture motion extraction network, where hand orientation refers to the orientation of the palm and the back of the hand. In the other direction, feeding the gesture motion information into the joint extraction network compensates for joints that are under-trained, thereby optimizing the joint information.
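One possible shape of this cyclic-feedback loop at inference time is sketched below; the two-argument call signatures (`joint_prior`, `gesture_prior`) and the fixed iteration count are assumptions for illustration.

```python
def mutual_iteration(crop, joint_net, gesture_net, num_iters: int = 3):
    """Alternately refine body joints and gestures, each network receiving
    the other's latest output as an extra prior."""
    joints = joint_net(crop, gesture_prior=None)       # initial body estimate
    gestures = gesture_net(crop, joint_prior=None)     # initial hand estimate
    for _ in range(num_iters):
        # arm orientation from the joints corrects palm/back-of-hand orientation
        gestures = gesture_net(crop, joint_prior=joints)
        # the hand estimate fed back helps the joint network converge
        joints = joint_net(crop, gesture_prior=gestures)
    return joints, gestures
```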
The iterative optimization between the facial motion extraction network and the joint extraction network is similar to that between the gesture motion extraction network and the joint extraction network, and is not repeated here.
S313, the iteratively optimized facial expression posture information, body motion information and gesture motion information are the extracted facial expression posture information, body motion information and gesture motion information of the human body.
In S300, the body root position 3D information of the human body is extracted from the cropped image by a human 3D position extraction network. The specific process is as follows:
S321, extracting motion direction features and tracking information of the human body according to the cropped image.
In the embodiment of the application, the motion of pixels between preceding and following frames of multiple cropped images and of adjacent pixels gives the motion direction features of the human body. The cropped image is input into a neural network, which outputs 2D heatmaps; a feature fusion operation is performed on the 2D heatmaps to obtain a 3D position map. The 3D position map is labeled with the tracking information of the human body box, and the tracking information can be the motion information of the body root.
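A minimal sketch of the heatmap-to-root step is given below. The heatmap layout, fusion layers and the three-dimensional root output are assumptions made to illustrate the described fusion, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class RootPositionNet(nn.Module):
    """Fuse per-frame 2D joint heatmaps into a 3D body-root position."""
    def __init__(self, num_joints: int = 24):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(num_joints, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.root_head = nn.Linear(32, 3)      # (x, y, z) of the body root

    def forward(self, heatmaps_2d):            # B x J x H x W
        return self.root_head(self.fuse(heatmaps_2d))
```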
S322, predicting the final body root motion trajectory according to the motion direction features, the tracking information and the prior constraint based on physical rules, and then determining the body root position 3D information according to the body root motion trajectory.
In the embodiment of the application, the body root position 3D information is the acquired real position of the person in the environment, and its accuracy determines the position awareness available during body motion extraction.
The physical-rule prior constraint is determined by whether the feet are touching the ground: the 2D heatmaps are input into a neural network to judge whether the feet are in contact with the ground; when they are, the binary label is 1, otherwise it is 0. When both feet are in contact with the ground, a human body model is fitted to the joint rotations and positions contained in the initially predicted tracking information, yielding a body model with height and weight information; the acceleration of the feet is then computed, and the predicted position, velocity and direction of the human body are updated accordingly, thereby updating the body root position 3D information.
S400, performing space-time fusion on the facial expression posture information, the body motion information, the gesture motion information and the body root position 3D information to obtain the whole-body motion information of the human body.
In the embodiment of the application, the space-time fusion network fuses the facial expression posture information, the body motion information and the gesture motion information mainly by exploiting spatial and temporal relationships, where the spatial relationship is the global-local relationship and the temporal relationship is the ordering of consecutive frames. The output of the space-time fusion network is then fused with the body root position 3D information, and the regression of the whole-body motion parameters is completed through a fully connected network. The specific process is as follows:
As shown in fig. 6, the cropped and aligned human body image containing the facial expression posture information, body motion information and gesture motion information is fed into two branches. The upper branch extracts features at different scales of the image and fuses them through a spatial fusion network. The lower branch uses the preceding and following frames of the image, computes the motion direction of each pixel between the two frames by pixel-wise matching to obtain an optical flow map, and fuses the optical flow maps through a temporal fusion network. The space-time fusion network then fuses the outputs of the spatial and temporal fusion networks together with the body root position 3D information to obtain a whole-body motion feature map. The whole-body motion feature map is input into a fully connected network, constrained by a penalty function, to predict the whole-body motion information of the human body. The space-time fusion strategy of the gesture motion extraction network is similar to that used for the whole-body motion, differing only in the features used.
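The two-branch fusion can be sketched as below. The dense optical flow computed here with OpenCV's Farneback method is only one way to obtain per-pixel motion between frames, and the feature dimensions and output size of the fully connected regressor are assumptions.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

def optical_flow(prev_rgb: np.ndarray, cur_rgb: np.ndarray) -> np.ndarray:
    """Temporal-branch input: per-pixel motion between consecutive crops."""
    prev_g = cv2.cvtColor(prev_rgb, cv2.COLOR_RGB2GRAY)
    cur_g = cv2.cvtColor(cur_rgb, cv2.COLOR_RGB2GRAY)
    return cv2.calcOpticalFlowFarneback(prev_g, cur_g, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)  # H x W x 2

class WholeBodyFusion(nn.Module):
    """Concatenate spatial features, temporal (flow) features and the 3D root
    position, then regress the whole-body motion parameters."""
    def __init__(self, spatial_dim: int = 256, temporal_dim: int = 64,
                 out_dim: int = 88):             # number of motion params (assumed)
        super().__init__()
        self.regressor = nn.Sequential(
            nn.Linear(spatial_dim + temporal_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, out_dim))

    def forward(self, spatial_feat, temporal_feat, root_xyz):
        return self.regressor(torch.cat([spatial_feat, temporal_feat, root_xyz], dim=1))
```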
Wherein the penalty function is as follows:
$$L = \frac{\alpha_{angle}}{n}\sum\left\|\theta-\hat{\theta}\right\|^2 + \frac{\beta_{2d}}{m}\sum\left\|P_{2d}-\hat{P}_{2d}\right\|^2 + \frac{\gamma_{h}}{s}\sum\left\|H-\hat{H}\right\|^2 + L_{smooth}$$
where $\theta$ is the manually annotated joint rotation angle and $\hat{\theta}$ the predicted joint rotation angle; $P_{2d}$ are the manually annotated joint 2D coordinates and $\hat{P}_{2d}$ the predicted joint 2D coordinates; $H$ is the manually annotated heatmap and $\hat{H}$ the predicted heatmap. $\alpha_{angle}$, $\beta_{2d}$ and $\gamma_{h}$ are the weight coefficients of the joint rotation angle, the joint 2D coordinates and the heatmap, respectively; $n$ is the number of joint rotation angles, $m$ the number of joint 2D coordinates and $s$ the number of heatmaps. $L_{smooth}$ is a regularization term that smooths the prediction through the constraint between preceding and following frames.
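A compact code sketch of this penalty, under the assumption that predictions and targets are given as dictionaries of tensors (keys illustrative) and that the smoothing regularizer is computed separately:

```python
import torch

def whole_body_penalty(pred, target, weights, l_smooth=0.0):
    """Mean-square terms for joint rotation angles, 2D joint coordinates and
    heatmaps, plus the inter-frame smoothing regularizer."""
    mse = torch.nn.functional.mse_loss
    return (weights["angle"] * mse(pred["theta"], target["theta"])
            + weights["p2d"] * mse(pred["p2d"], target["p2d"])
            + weights["heatmap"] * mse(pred["heatmap"], target["heatmap"])
            + l_smooth)
```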
In the embodiment of the application, the facial expression posture information, the body motion information, the gesture motion information and the body root position 3D information are extracted separately and then recombined by the above method, which enhances the representational capability of the network.
The method further comprises: S500, retargeting a virtual character according to the whole-body motion information, thereby driving the virtual character in real time.
In the embodiment of the application, the human body whole-body motion capture method first collects motion data of a human body; then cleans and labels the motion data to obtain a cropped image of the human body; next extracts facial expression posture information, body motion information, gesture motion information and body root position 3D information of the human body according to the cropped image and/or a fusion mechanism of cyclic feedback and iterative mutual optimization; and finally performs space-time fusion on the facial expression posture information, the body motion information, the gesture motion information and the body root position 3D information to obtain the whole-body motion information of the human body. The information extracted in real time can thus be fused in time and space to obtain the whole-body motion information of the human body. The method is low in cost, convenient to use, and improves capture precision and stability.
The following are examples of the apparatus of the present invention that may be used to perform the method embodiments of the present invention. For details not disclosed in the embodiments of the apparatus of the present invention, please refer to the embodiments of the method of the present invention.
Referring to fig. 7, a schematic structural diagram of a human body whole-body motion capture device according to an exemplary embodiment of the invention is shown. The device 1 is based on a camera and a cyclic feedback mechanism and comprises: a data acquisition module 10, a cleaning and labeling module 20, a whole-body motion extraction module 30 and a whole-body motion determination module 40.
The data acquisition module 10 is configured to collect motion data of a human body;
the cleaning and labeling module 20 is configured to clean and label the motion data to obtain a cropped image of the human body;
the whole-body motion extraction module 30 is configured to extract facial expression posture information, body motion information, gesture motion information and body root position 3D information of the human body according to the cropped image and/or a fusion mechanism of cyclic feedback and iterative mutual optimization;
the whole-body motion determination module 40 is configured to perform space-time fusion on the facial expression posture information, the body motion information, the gesture motion information and the body root position 3D information to obtain the whole-body motion information of the human body.
It should be noted that when the human body whole-body motion capture device provided in the above embodiment performs the human body whole-body motion capture method, the division into the above functional modules is used only as an illustration; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiment and the method embodiment above belong to the same concept; the detailed implementation process is described in the method embodiments and is not repeated here.
The foregoing embodiment numbers of the present application are for description purposes only and do not represent the relative merits of the embodiments.
In the embodiment of the application, the human body whole-body motion capture device first collects motion data of a human body; then cleans and labels the motion data to obtain a cropped image of the human body; next extracts facial expression posture information, body motion information, gesture motion information and body root position 3D information of the human body according to the cropped image and/or a fusion mechanism of cyclic feedback and iterative mutual optimization; and finally performs space-time fusion on this information to obtain the whole-body motion information of the human body. The information extracted in real time can thus be fused in time and space to obtain the whole-body motion information of the human body. The device is low in cost, convenient to use, and improves capture precision and stability.
The present invention also provides a computer readable medium having stored thereon program instructions which, when executed by a processor, implement the human body whole body motion capturing method provided by the above-mentioned respective method embodiments.
The invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the human body whole body motion capture method of the above-described method embodiments.
Referring to fig. 8, a schematic structural diagram of a terminal is provided in an embodiment of the present application. As shown in fig. 8, terminal 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, at least one communication bus 1002.
Wherein the communication bus 1002 is used to enable connected communication between these components.
The user interface 1003 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 1003 may further include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Wherein the processor 1001 may include one or more processing cores. The processor 1001 connects various parts within the entire electronic device 1000 using various interfaces and lines, and performs various functions of the electronic device 1000 and processes data by executing or executing instructions, programs, code sets, or instruction sets stored in the memory 1005, and invoking data stored in the memory 1005. Alternatively, the processor 1001 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), programmable logic array (Programmable Logic Array, PLA). The processor 1001 may integrate one or a combination of several of a central processing unit (Central Processing Unit, CPU), an image processor (Graphics Processing Unit, GPU), and a modem, etc. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for rendering and drawing the content required to be displayed by the display screen; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 1001 and may be implemented by a single chip.
The memory 1005 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 may be used to store instructions, programs, code, code sets or instruction sets. The memory 1005 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described method embodiments, etc.; the stored data area may store the data referred to in the above method embodiments. The memory 1005 may also optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 8, the memory 1005, as one type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a human body whole-body motion capture application.
In terminal 1000 shown in fig. 8, user interface 1003 is mainly used for providing an input interface for a user, and acquiring data input by the user; and the processor 1001 may be configured to invoke the human body whole body motion capture application program stored in the memory 1005, and specifically perform the following operations:
collecting motion data of a human body;
cleaning and labeling the motion data to obtain a cropped image of the human body;
extracting facial expression and pose information, body motion information, gesture motion information and body root position 3D information of the human body according to the cropped image and/or a fusion mode of loop feedback and iterative mutual optimization;
performing spatio-temporal fusion on the facial expression and pose information, the body motion information, the gesture motion information and the body root position 3D information to obtain whole body motion information of the human body;
and retargeting a virtual character according to the whole body motion information, so as to drive the virtual character in real time.
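For orientation only, the following is a minimal Python sketch of the five operations above, assuming OpenCV for the single-camera input; extract_parts, spatiotemporal_fuse and retarget are hypothetical placeholders for the stages described in this application, not an API defined by it.

```python
import cv2  # OpenCV, assumed here for reading frames from a single RGB camera

def motion_capture_loop(extract_parts, spatiotemporal_fuse, retarget, camera_id=0):
    """Run the capture pipeline frame by frame.

    The three callables stand in for the extraction, fusion and retargeting
    stages described in the text; they are illustrative placeholders only.
    """
    cap = cv2.VideoCapture(camera_id)            # step 1: collect motion data
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # step 2: "cleaning and labeling" is reduced here to a centre crop
            h, w = frame.shape[:2]
            side = min(h, w)
            crop = frame[(h - side) // 2:(h + side) // 2,
                         (w - side) // 2:(w + side) // 2]
            # step 3: per-part extraction (face, body, hands, root position)
            face, body, hands, root = extract_parts(crop)
            # step 4: spatio-temporal fusion into whole body motion information
            whole_body = spatiotemporal_fuse(face, body, hands, root)
            # step 5: retarget onto the virtual character in real time
            retarget(whole_body)
    finally:
        cap.release()
```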
In one embodiment, when collecting the motion data of the human body, the processor 1001 specifically performs the following operation:
acquiring the motion data of the human body through a single RGB camera.
In one embodiment, when extracting the facial expression and pose information, the body motion information and the gesture motion information of the human body according to the cropped image and/or the fusion mode of loop feedback and iterative mutual optimization, the processor 1001 specifically performs the following operations:
determining the facial expression and pose information, the body motion information and the gesture motion information of the human body according to the cropped image;
performing iterative optimization on the facial expression and pose information, the body motion information and the gesture motion information by adopting the fusion mode of loop feedback and iterative mutual optimization;
the facial expression and pose information, the body motion information and the gesture motion information after iterative optimization are the extracted facial expression and pose information, body motion information and gesture motion information of the human body.
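The application does not fix a particular optimization schedule; the sketch below shows one plausible reading of loop feedback with iterative mutual optimization, in which hypothetical refine_face, refine_body and refine_hands callables re-estimate each part using the current estimates of the other two as context.

```python
def iterative_mutual_optimization(face, body, hands,
                                  refine_face, refine_body, refine_hands,
                                  iterations=3):
    """Loop-feedback refinement: each part estimate is re-optimized using the
    current estimates of the other parts, and the result feeds the next pass."""
    for _ in range(iterations):
        face = refine_face(face, body, hands)    # e.g. head pose constrained by the torso
        body = refine_body(body, face, hands)    # e.g. wrist joints constrained by hand pose
        hands = refine_hands(hands, face, body)  # e.g. hand scale constrained by arm length
    return face, body, hands
```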
In one embodiment, when determining the facial expression and pose information of the human body according to the cropped image, the processor 1001 specifically performs the following operations:
inputting the cropped image into a pre-trained model to obtain general facial features of the human body;
acquiring the facial expression and pose information of the human body according to the general facial features, a general ID coding network, an expression extraction neural network, an expression optimization model and a pose extraction neural network;
and fusing the facial expression and pose information through a spatio-temporal fusion network to obtain fused facial expression and pose information.
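As a rough illustration of how the named face sub-networks could be wired together, here is a hedged PyTorch sketch; the layer types and dimensions (a 512-dimensional general facial feature, 52 expression coefficients, a 6-DoF head pose) are invented for illustration and are not specified by this application.

```python
import torch
import torch.nn as nn

class FaceBranch(nn.Module):
    """Hypothetical arrangement of the face-related sub-networks named in the
    text; all layer choices and sizes are illustrative only."""
    def __init__(self, feat_dim=512, id_dim=64, expr_dim=52, pose_dim=6):
        super().__init__()
        self.id_encoder   = nn.Linear(feat_dim, id_dim)             # general ID coding network
        self.expr_net     = nn.Linear(feat_dim + id_dim, expr_dim)  # expression extraction network
        self.expr_refiner = nn.Linear(expr_dim, expr_dim)           # expression optimization model
        self.pose_net     = nn.Linear(feat_dim + id_dim, pose_dim)  # pose extraction network

    def forward(self, general_face_feat):
        ident = self.id_encoder(general_face_feat)
        x = torch.cat([general_face_feat, ident], dim=-1)
        expr = self.expr_refiner(self.expr_net(x))   # expression coefficients
        pose = self.pose_net(x)                      # head pose (e.g. rotation + translation)
        return expr, pose
```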
In one embodiment, when determining the body motion information and the gesture motion information of the human body according to the cropped image, the processor 1001 specifically performs the following operations:
extracting a joint global map, a gesture global map, a joint local map and a gesture local map of the human body according to the cropped image;
inputting the joint global map and the joint local map into a MobileNet network respectively, and outputting joint global features and joint local features of the human body;
inputting the gesture global map and the gesture local map into a MobileNet network respectively, and outputting gesture global features and gesture local features of the human body;
fusing the joint global features and the joint local features through a spatio-temporal fusion network to obtain the body motion information;
and fusing the gesture global features and the gesture local features through the spatio-temporal fusion network to obtain the gesture motion information.
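To make the global/local split concrete, a hedged PyTorch sketch follows; it stands in a GRU for the spatio-temporal fusion network and uses torchvision's MobileNetV2 as the backbone, with all sizes chosen purely for illustration (torchvision 0.13 or later is assumed for the weights argument).

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class GlobalLocalBranch(nn.Module):
    """Hypothetical global/local feature extractor: a MobileNet backbone is
    applied to the global map and to the local map separately, and the two
    per-frame feature vectors are fused over time by a small GRU standing in
    for the spatio-temporal fusion network."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.backbone_global = mobilenet_v2(weights=None).features
        self.backbone_local  = mobilenet_v2(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fuse = nn.GRU(input_size=1280 * 2, hidden_size=out_dim, batch_first=True)

    def forward(self, global_maps, local_maps):
        # inputs: (batch, time, 3, H, W); fold time into the batch for the CNNs
        b, t = global_maps.shape[:2]
        g = self.pool(self.backbone_global(global_maps.flatten(0, 1))).flatten(1)
        l = self.pool(self.backbone_local(local_maps.flatten(0, 1))).flatten(1)
        seq = torch.cat([g, l], dim=-1).view(b, t, -1)
        fused, _ = self.fuse(seq)   # per-frame fused body or gesture features
        return fused
```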
In one embodiment, when extracting the body root position 3D information of the human body according to the cropped image, the processor 1001 specifically performs the following operations:
extracting motion direction features and tracking information of the human body according to the cropped image;
and determining the body root position 3D information of the human body according to the motion direction features, the tracking information and a priori physical-rule constraints.
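The application does not give the fusion formula for the root position; purely as an illustration, the sketch below combines a hypothetical motion-direction feature with a tracking offset and then clamps the result with two simple physical priors (a speed limit and a ground-plane constraint). All constants are invented.

```python
import numpy as np

def update_root_position(prev_root, motion_dir, track_offset,
                         dt=1 / 30, max_speed=3.0, ground_z=0.0):
    """Combine a motion-direction feature with tracking information, then apply
    simple physics-style priors; values and thresholds are illustrative only."""
    direction = motion_dir / (np.linalg.norm(motion_dir) + 1e-8)  # unit direction
    step = direction * np.linalg.norm(track_offset)               # displacement proposal
    speed = np.linalg.norm(step) / dt
    if speed > max_speed:                                         # prior: plausible human speed
        step *= max_speed / speed
    new_root = prev_root + step
    new_root[2] = max(new_root[2], ground_z)                      # prior: stay above the ground
    return new_root
```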
In the embodiment of the application, the human body whole body motion capture method first collects motion data of a human body; the motion data are then cleaned and labeled to obtain a cropped image of the human body; next, facial expression and pose information, body motion information, gesture motion information and body root position 3D information of the human body are extracted according to the cropped image and/or the fusion mode of loop feedback and iterative mutual optimization; finally, spatio-temporal fusion is performed on the facial expression and pose information, the body motion information, the gesture motion information and the body root position 3D information to obtain the whole body motion information of the human body. Because the information extracted in real time is fused in both time and space to obtain the whole body motion information, the method is low in cost and convenient to use while improving capture accuracy and stability.
Those skilled in the art will appreciate that all or part of the flow of the above embodiment methods may be implemented by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the flow of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
The foregoing disclosure is merely a preferred embodiment of the present application and is not intended to limit the scope of the claims; equivalent changes made according to the claims of the present application still fall within the scope of the present application.

Claims (9)

1. A human body whole body motion capture method based on a camera and a loop feedback mechanism, characterized by comprising the following steps:
collecting motion data of a human body;
cleaning and labeling the motion data to obtain a cropped image of the human body;
extracting facial expression and pose information, body motion information, gesture motion information and body root position 3D information of the human body according to the cropped image and/or a fusion mode of loop feedback and iterative mutual optimization, comprising: determining the facial expression and pose information, the body motion information and the gesture motion information of the human body according to the cropped image; performing iterative optimization on the facial expression and pose information, the body motion information and the gesture motion information by adopting the fusion mode of loop feedback and iterative mutual optimization; wherein the facial expression and pose information, the body motion information and the gesture motion information after iterative optimization are the extracted facial expression and pose information, body motion information and gesture motion information of the human body; and
performing spatio-temporal fusion on the facial expression and pose information, the body motion information, the gesture motion information and the body root position 3D information to obtain whole body motion information of the human body.
2. The human body whole body motion capture method according to claim 1, wherein collecting the motion data of the human body comprises:
acquiring the motion data of the human body through a single RGB camera.
3. The human body whole body motion capture method according to claim 1, wherein determining the facial expression and pose information of the human body according to the cropped image comprises:
inputting the cropped image into a pre-trained model to obtain general facial features of the human body;
acquiring the facial expression and pose information of the human body according to the general facial features, a general ID coding network, an expression extraction neural network, an expression optimization model and a pose extraction neural network;
and fusing the facial expression and pose information through a spatio-temporal fusion network to obtain fused facial expression and pose information.
4. The human body whole body motion capture method according to claim 1, wherein determining the body motion information and the gesture motion information of the human body according to the cropped image comprises:
extracting a joint global map, a gesture global map, a joint local map and a gesture local map of the human body according to the cropped image;
inputting the joint global map and the joint local map into a MobileNet network respectively, and outputting joint global features and joint local features of the human body;
inputting the gesture global map and the gesture local map into a MobileNet network respectively, and outputting gesture global features and gesture local features of the human body;
fusing the joint global features and the joint local features through a spatio-temporal fusion network to obtain the body motion information;
and fusing the gesture global features and the gesture local features through the spatio-temporal fusion network to obtain the gesture motion information.
5. The human body whole body motion capture method according to claim 1, wherein extracting the body root position 3D information of the human body according to the cropped image comprises:
extracting motion direction features and tracking information of the human body according to the cropped image;
and determining the body root position 3D information of the human body according to the motion direction features, the tracking information and a priori physical-rule constraints.
6. The human body whole body motion capture method according to claim 1, further comprising:
retargeting a virtual character according to the whole body motion information, so as to drive the virtual character in real time.
7. A human body whole body motion capture device based on a camera and a loop feedback mechanism, characterized by comprising:
a data acquisition module, configured to collect motion data of a human body;
a cleaning and labeling module, configured to clean and label the motion data to obtain a cropped image of the human body;
a human body whole body motion extraction module, configured to extract facial expression and pose information, body motion information, gesture motion information and body root position 3D information of the human body according to the cropped image and/or a fusion mode of loop feedback and iterative mutual optimization, comprising: determining the facial expression and pose information, the body motion information and the gesture motion information of the human body according to the cropped image; performing iterative optimization on the facial expression and pose information, the body motion information and the gesture motion information by adopting the fusion mode of loop feedback and iterative mutual optimization; wherein the facial expression and pose information, the body motion information and the gesture motion information after iterative optimization are the extracted facial expression and pose information, body motion information and gesture motion information of the human body; and
a human body whole body motion determination module, configured to perform spatio-temporal fusion on the facial expression and pose information, the body motion information, the gesture motion information and the body root position 3D information to obtain whole body motion information of the human body.
8. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any of claims 1-6.
9. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-6.
CN202210742349.7A 2022-06-28 2022-06-28 Human body whole body motion capturing method, device, storage medium and terminal Active CN115294623B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210742349.7A CN115294623B (en) 2022-06-28 2022-06-28 Human body whole body motion capturing method, device, storage medium and terminal


Publications (2)

Publication Number  Publication Date
CN115294623A (en)  2022-11-04
CN115294623B (en)  2023-05-16

Family

ID=83821096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210742349.7A Active CN115294623B (en) 2022-06-28 2022-06-28 Human body whole body motion capturing method, device, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN115294623B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110033505A (en) * 2019-04-16 2019-07-19 西安电子科技大学 A kind of human action capture based on deep learning and virtual animation producing method
CN110223368A (en) * 2019-05-15 2019-09-10 武汉奥贝赛维数码科技有限公司 A kind of unmarked motion capture method of face based on deep learning
CN113569775A (en) * 2021-08-02 2021-10-29 杭州相芯科技有限公司 Monocular RGB input-based mobile terminal real-time 3D human body motion capture method and system, electronic equipment and storage medium
CN113989830A (en) * 2021-09-17 2022-01-28 苏州声影空间智能科技有限公司 Motion gesture recognition method based on 3D video


Legal Events

Code  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant