CN111523408B - Motion capturing method and device - Google Patents

Motion capturing method and device

Info

Publication number
CN111523408B
Authority
CN
China
Prior art keywords
video frame
dimensional coordinates
key points
key point
joint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010272137.8A
Other languages
Chinese (zh)
Other versions
CN111523408A (en)
Inventor
孟庆月
赵晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010272137.8A
Publication of CN111523408A
Application granted
Publication of CN111523408B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present application disclose a motion capture method and device. One embodiment of the method includes: acquiring a video frame that contains an object and is captured by a monocular camera; detecting key points of the object in the video frame to obtain three-dimensional coordinates of the key points of the object; and generating a target associated with the object based on the three-dimensional coordinates of at least one of the key points of the object in the video frame. Because a monocular camera can be used for shooting, shooting cost is effectively reduced. Moreover, by determining the three-dimensional coordinates of the key points, a more accurate target can be generated.

Description

Motion capturing method and device
Technical Field
Embodiments of the present application relate to the field of computer technology, in particular to the field of Internet technology, and more particularly to a motion capture method and device.
Background
Motion capture refers to recording and processing the displacements or movements of a person or another object using external devices. Motion capture technology was initially used mainly in the fields of physical therapy and physical rehabilitation, and began to be applied in the field of animation at the beginning of the 20th century.
With the rapid development of the animation industry, virtual reality, games, and other fields, motion capture is being applied more and more widely.
Disclosure of Invention
The embodiment of the application provides a motion capture method and a motion capture device.
In a first aspect, an embodiment of the present application provides a motion capture method, including: acquiring a video frame that contains an object and is captured by a monocular camera; detecting key points of the object in the video frame to obtain three-dimensional coordinates of the key points of the object; and generating a target associated with the object based on the three-dimensional coordinates of at least one of the key points of the object in the video frame.
In some embodiments, the target includes a virtual animation or a motion trajectory, and generating the target associated with the object based on the three-dimensional coordinates of at least one of the key points of the object in the video frame includes: inputting the three-dimensional coordinates of at least one of the key points of the object in the video frame into a first kinematics algorithm to obtain a motion trajectory of the object in the video frame; and/or inputting the three-dimensional coordinates of at least one of the key points of the object in the video frame into a second kinematics algorithm to obtain parameters for driving the virtual animation, and driving the virtual animation based on the parameters, where the key points of the virtual animation correspond to the at least one key point of the object.
In some embodiments, the second kinematics algorithm includes an inverse kinematics algorithm, the at least one key point includes a joint key point, and the parameters include a joint angle corresponding to the joint key point; inputting the three-dimensional coordinates of at least one of the key points of the object in the video frame into the second kinematics algorithm to obtain the parameters for driving the virtual animation includes: inputting the three-dimensional coordinates of the joint key point in the video frame into the inverse kinematics algorithm to obtain the joint angle corresponding to the joint key point; and inputting the joint angle into a forward kinematics algorithm to obtain the parameters for driving the virtual animation.
In some embodiments, for each of the key points of the object in the video frame, in the three-dimensional coordinates (x, y, z) of the key point, x and y are the coordinates of the key point mapped from the camera coordinate system of the monocular camera to the UV coordinate system of the video frame, and z is the distance of the key point from the monocular camera in the camera coordinate system.
In some embodiments, the method further includes: for each video frame, selecting specified three-dimensional coordinates from the three-dimensional coordinates of the key points of the object in the video frame as the three-dimensional coordinates of the at least one key point, where each category of object has specified key points corresponding to that category.
In a second aspect, an embodiment of the present application provides a motion capture device, including: an acquisition unit configured to acquire a video frame that contains an object and is captured by a monocular camera; a detection unit configured to detect key points of the object in the video frame to obtain three-dimensional coordinates of the key points of the object; and a generation unit configured to generate a target associated with the object based on the three-dimensional coordinates of at least one of the key points of the object in the video frame.
In some embodiments, the generation unit is further configured to generate the target associated with the object based on the three-dimensional coordinates of at least one of the key points of the object in the video frame in the following manner: inputting the three-dimensional coordinates of at least one of the key points of the object in the video frame into a first kinematics algorithm to obtain a motion trajectory of the object in the video frame; and/or inputting the three-dimensional coordinates of at least one of the key points of the object in the video frame into a second kinematics algorithm to obtain parameters for driving the virtual animation, and driving the virtual animation based on the parameters, where the key points of the virtual animation correspond to the at least one key point of the object.
In some embodiments, the second kinematics algorithm includes an inverse kinematics algorithm, the at least one key point includes a joint key point, and the parameters include a joint angle corresponding to the joint key point; the generation unit is further configured to input the three-dimensional coordinates of at least one of the key points of the object in the video frame into the second kinematics algorithm to obtain the parameters for driving the virtual animation in the following manner: inputting the three-dimensional coordinates of the joint key point in the video frame into the inverse kinematics algorithm to obtain the joint angle corresponding to the joint key point; and inputting the joint angle into a forward kinematics algorithm to obtain the parameters for driving the virtual animation.
In some embodiments, for each of the key points of the object in the video frame, in the three-dimensional coordinates (x, y, z) of the key point, x and y are the coordinates of the key point mapped from the camera coordinate system of the monocular camera to the UV coordinate system of the video frame, and z is the distance of the key point from the monocular camera in the camera coordinate system.
In some embodiments, the device further includes: a selection unit configured to select, for each video frame, specified three-dimensional coordinates from the three-dimensional coordinates of the key points of the object in the video frame as the three-dimensional coordinates of the at least one key point, where each category of object has specified key points corresponding to that category.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a method as in any of the embodiments of the motion capture method.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements a method as in any embodiment of the motion capture method.
According to the motion capture scheme provided by the embodiments of the present application, a video frame that contains an object and is captured by a monocular camera is first acquired. Then, key points of the object in the video frame are detected to obtain three-dimensional coordinates of the key points of the object. Finally, a target associated with the object is generated based on the three-dimensional coordinates of at least one of the key points of the object in the video frame. Because the scheme can use a monocular camera for shooting, it effectively saves shooting cost. Moreover, by determining the three-dimensional coordinates of the key points, a more accurate target can be generated.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which some embodiments of the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a motion capture method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the motion capture method according to the present application;
FIG. 4 is a flow chart of yet another embodiment of a motion capture method according to the present application;
FIG. 5 is a schematic diagram illustrating the construction of one embodiment of a motion capture device in accordance with the present application;
FIG. 6 is a schematic diagram of a computer system suitable for use in implementing some embodiments of the application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
FIG. 1 illustrates an exemplary system architecture 100 in which embodiments of motion capture methods or motion capture devices of the present application may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a motion capture application, a video-type application, a live application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablets, e-book readers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, multiple pieces of software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, such as a background server providing support for the terminal devices 101, 102, 103. The background server may analyze and process the received data, such as the video frame, and feed back the processing result (for example, the motion track of the object in the video frame) to the terminal device.
It should be noted that, the motion capturing method provided in the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, 103, and accordingly, the motion capturing apparatus may be disposed in the server 105 or the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a motion capture method in accordance with the present application is shown. The motion capturing method comprises the following steps:
step 201, a video frame containing an object shot by a monocular camera is acquired.
In this embodiment, the execution body of the motion capture method (for example, the server or the terminal device shown in fig. 1) may acquire a video frame captured by a monocular camera. The video frame contains an object. The object here may be any of various objects, for example, a living thing such as a person or an animal, or a virtual object such as an animated character (for example, SpongeBob). The video frame may be a single video frame or a plurality of video frames, such as a plurality of consecutive video frames.
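The patent itself does not prescribe how the video frames are read; purely as an illustrative sketch, frames from an ordinary monocular camera could be obtained with OpenCV as follows (the camera index and the frame limit are assumptions for this example):

```python
# Illustrative sketch only: reading frames from a monocular camera with OpenCV.
# The camera index (0) and max_frames are assumptions, not part of the claimed method.
import cv2

def capture_frames(camera_index=0, max_frames=None):
    """Yield BGR video frames from a single (monocular) camera."""
    cap = cv2.VideoCapture(camera_index)
    if not cap.isOpened():
        raise RuntimeError("Unable to open the monocular camera")
    count = 0
    try:
        while max_frames is None or count < max_frames:
            ok, frame = cap.read()   # frame: H x W x 3 uint8 array
            if not ok:
                break
            yield frame
            count += 1
    finally:
        cap.release()

if __name__ == "__main__":
    for frame in capture_frames(max_frames=5):
        print("captured frame with shape", frame.shape)
```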
Step 202, detecting key points of the object in the video frame to obtain three-dimensional coordinates of the key points of the object.
In this embodiment, the execution body may detect the key points of the object contained in the video frame. If the object is a human body, the key points of the object may be human body key points. If the object is an animal, the key points of the object may be preset for that animal and may include, for example, joint key points indicating joints and face key points indicating the face. The key points may be represented using three-dimensional coordinates. The three-dimensional coordinates here may have various meanings; for example, the three-dimensional coordinates of a key point may be the three-dimensional position of the key point relative to the camera.
In practice, the execution body described above may determine the three-dimensional coordinates of the key points using a pre-trained deep neural network. For example, the deep neural network may be a convolutional neural network, a residual neural network, or the like. In particular, the deep neural network may be used to characterize the correspondence between an image (such as a video frame) and the three-dimensional coordinates of the key points in the image. The execution body inputs the acquired video frame into the deep neural network and obtains the three-dimensional coordinates of the key points in the video frame output by the deep neural network.
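The patent does not disclose a concrete network architecture, so the following toy PyTorch model is only an illustrative stand-in that shows the assumed input/output correspondence: one video frame in, K key points with (x, y, z) coordinates out.

```python
# Illustrative stand-in only: a toy network mapping a video frame to K key points,
# each represented as (x, y, z). The architecture and keypoint count are assumptions.
import torch
import torch.nn as nn

class ToyKeypoint3DNet(nn.Module):
    def __init__(self, num_keypoints=17):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Regression head: one (x, y, z) triple per key point.
        self.head = nn.Linear(32, num_keypoints * 3)

    def forward(self, frame):                      # frame: (B, 3, H, W)
        feat = self.backbone(frame).flatten(1)     # (B, 32)
        coords = self.head(feat)                   # (B, K * 3)
        return coords.view(-1, self.num_keypoints, 3)

model = ToyKeypoint3DNet()
video_frame = torch.rand(1, 3, 256, 256)           # one video frame as a tensor
keypoints_3d = model(video_frame)                  # -> shape (1, 17, 3)
print(keypoints_3d.shape)
```

In a real system the network would of course be trained on annotated data; the sketch only illustrates the frame-to-coordinates correspondence described above.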
Step 203, generating a target associated with the object based on the three-dimensional coordinates of at least one of the key points of the object in the video frame.
In this embodiment, the execution body may generate the target associated with the object based on the three-dimensional coordinates of at least one of the detected key points. Specifically, the target may be the motion trajectory of the object or of a key point of the object, or it may be a virtual animation driven by the three-dimensional coordinates of the at least one key point. For example, if the object in the video frame is a tennis ball, the motion trajectory may be the coordinate data of the generated trajectory line of the tennis ball, and the virtual animation may be an animation of the tennis ball in motion generated using the three-dimensional coordinates of at least one key point (such as the center point of the tennis ball). For another example, if the object in the video frame is a person, the motion trajectory may be the motion trajectory of one or more key points of the human body, and the virtual animation may be an avatar whose motion changes with the motion of the object (facial motion, body motion, and/or hand motion). The avatar may be one that is highly similar to the object, or one that differs from the object, such as an animated character in an animated film.
In practice, the execution body may generate the target associated with the object based on the three-dimensional coordinates of the at least one key point in various ways. For example, the target may be a determination of whether the action of the object is standard, or of how standard it is. The execution body may compare the obtained three-dimensional positions of the at least one key point (at least one joint point) with reference three-dimensional positions of the at least one joint point, and then determine whether the action of the object is standard based on the resulting three-dimensional position deviation, or determine the degree to which the action is standard according to a preset correspondence between three-dimensional position deviations and degrees of conformance. Furthermore, the target may also be the action performed by the object. The execution body may determine which preset action the object performs based on the positional relationship between specified key points among the at least one key point; specifically, it may determine the action corresponding to the at least one key point by using a preset correspondence, such as a correspondence table or a model, between positional relationships of the specified key points and actions.
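As a purely illustrative sketch of the comparison against reference positions described above (the thresholds, the grading scale, and the use of a mean per-joint deviation are all assumptions for this example, not prescribed by the patent):

```python
# Illustrative sketch only: judging how "standard" an action is by comparing
# detected joint positions with reference positions. Thresholds and grades are
# assumptions for demonstration purposes.
import numpy as np

def action_conformance(detected, reference, thresholds=(20.0, 50.0, 100.0)):
    """detected, reference: (K, 3) arrays of joint coordinates (e.g. in mm).
    Returns (is_standard, grade) based on the mean per-joint deviation."""
    deviation = np.linalg.norm(detected - reference, axis=1).mean()
    if deviation < thresholds[0]:
        return True, "excellent"
    if deviation < thresholds[1]:
        return True, "good"
    if deviation < thresholds[2]:
        return False, "fair"
    return False, "poor"

reference = np.random.rand(17, 3) * 100             # reference joint positions
detected = reference + np.random.randn(17, 3) * 5   # detected positions with noise
print(action_conformance(detected, reference))
```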
The method provided by the embodiments of the present application can use a monocular camera for shooting, thereby effectively saving shooting cost. Moreover, by determining the three-dimensional coordinates of the key points, a more accurate target can be generated.
In some optional implementations of the present embodiment, the method further includes: for each video frame, selecting specified three-dimensional coordinates from the three-dimensional coordinates of the key points of the object in the video frame as the three-dimensional coordinates of the at least one key point, where each category of object has specified key points corresponding to that category.
In these optional implementations, the execution body may select, from the three-dimensional coordinates of the key points obtained in step 202, the three-dimensional coordinates of the specified key points as the three-dimensional coordinates of the at least one key point, as sketched in the example below.
Each category of object has specified key points corresponding to it, and thus, in practice, the key points corresponding to different categories of objects may differ. For example, a person's key points include the key points of the person's joints, while a soccer ball's key points may include only its center point. Further, the specified key points may differ even for the same kind of object. For example, where the object is a person, if hand motion is to be captured, the specified key points may include only the key points of the person's hands; if facial motion is to be captured, the specified key points may include only the key points of the person's face; and if body motion is to be captured, the specified key points may include only the key points of the person's joints.
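For illustration only, selecting the specified key points for an object category could look like the following sketch; the category names and index lists here are assumptions, not values defined by the patent:

```python
# Illustrative sketch only: picking the specified key points for a given object
# category. Category names and key point indices are made up for demonstration.
SPECIFIED_KEYPOINT_INDICES = {
    "person_body": [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],  # joint key points
    "person_hand": list(range(17, 38)),                            # hand key points
    "soccer_ball": [0],                                            # center point only
}

def select_specified_keypoints(all_keypoints_3d, category):
    """all_keypoints_3d: sequence of (x, y, z) triples for every detected key point.
    Returns only the coordinates specified for the given object category."""
    indices = SPECIFIED_KEYPOINT_INDICES[category]
    return [all_keypoints_3d[i] for i in indices]

# Example: 40 dummy key points, keep only the body joint key points.
dummy = [(float(i), 0.0, 1000.0) for i in range(40)]
print(len(select_specified_keypoints(dummy, "person_body")))   # -> 12
```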
These implementations use the specified key points to handle different objects or application scenarios, avoiding the invalid processing that would result from handling other key points, thereby improving processing effectiveness, saving computing resources, and improving processing efficiency to a certain extent.
In some optional implementations of this embodiment, for each of the key points of the object in the video frame, in the three-dimensional coordinates (x, y, z) of the key point, x and y are the coordinates of the key point mapped from the camera coordinate system of the monocular camera to the UV coordinate system of the video frame, and z is the distance of the key point from the monocular camera in the camera coordinate system.
In these optional implementations, for each video frame, the three-dimensional coordinates of each key point of the object in the video frame may be (x, y, z), where x and y are the UV coordinates mapped into the UV coordinate system and z is the distance from the monocular camera; the unit of the distance may be millimeters.
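A minimal sketch of this coordinate convention is shown below, assuming a pinhole camera model with known intrinsics and UV coordinates normalized to [0, 1]; both of these choices are assumptions made only for the example:

```python
# Illustrative sketch only: mapping a key point from the camera coordinate system
# to the frame's UV coordinate system while keeping its distance to the camera as z.
# The pinhole intrinsics and the [0, 1] UV normalization are assumptions.
import numpy as np

def camera_point_to_uvz(point_cam_mm, fx, fy, cx, cy, width, height):
    """point_cam_mm: (X, Y, Z) in the camera coordinate system, in millimetres.
    Returns (u, v, z): u and v normalized over the frame, z kept in millimetres."""
    X, Y, Z = point_cam_mm
    u_px = fx * X / Z + cx          # pinhole projection to pixel coordinates
    v_px = fy * Y / Z + cy
    return np.array([u_px / width, v_px / height, Z])

# Example with assumed intrinsics for a 1280 x 720 monocular camera.
print(camera_point_to_uvz((100.0, -50.0, 1500.0),
                          fx=900.0, fy=900.0, cx=640.0, cy=360.0,
                          width=1280, height=720))
```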
These implementations can convey richer key point information through the UV coordinates and the distance to the camera, so that the generated target is more accurate and vivid.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the motion capture method according to the present embodiment. In the application scenario of fig. 3, the execution body 301 may acquire a plurality of consecutive video frames 302, captured by a monocular camera, that contain a white rabbit. The key points (such as head key points and joint key points) of the white rabbit in the video frames 302 are detected to obtain the three-dimensional coordinates 303 of the key points of the white rabbit. A target 304 associated with the white rabbit, such as the white rabbit's motion trajectory, is generated based on the three-dimensional coordinates of at least one of the key points of the white rabbit in a video frame.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a motion capture method is shown. The process 400 of the motion capture method includes the steps of:
Step 401, acquiring a video frame that contains an object and is captured by a monocular camera.
In this embodiment, the execution body of the motion capture method (for example, the server or the terminal device shown in fig. 1) may acquire a video frame captured by a monocular camera. The video frame contains an object. The object may be any of various objects, such as a living thing (a person or an animal) or a virtual object such as an animated character (for example, SpongeBob). The video frame may be a single video frame or a plurality of video frames, such as a plurality of consecutive video frames.
Step 402, detecting key points of the object in the video frame to obtain three-dimensional coordinates of the key points of the object.
In this embodiment, the execution body may detect the key points of the object contained in the video frame. If the object is a human body, the key points of the object may be human body key points. If the object is an animal, the key points of the object may be preset for that animal and may include, for example, key points indicating joints and key points indicating the face. The key points may be represented using three-dimensional coordinates.
Step 403, inputting the three-dimensional coordinates of at least one of the key points of the object in the video frame into a first kinematics algorithm to obtain the motion trajectory of the object in the video frame.
In this embodiment, the target may include a virtual animation or a motion trajectory. The execution body may input the three-dimensional coordinates of at least one key point of the object in the video frame into a first kinematics algorithm (for example, an inverse kinematics algorithm) to obtain the motion trajectory of the object output by the first kinematics algorithm.
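The patent does not detail the internals of the first kinematics algorithm. Purely as an illustrative stand-in, the sketch below strings the per-frame three-dimensional coordinates of one chosen key point into a lightly smoothed motion trajectory; the exponential smoothing and its coefficient are assumptions for the example:

```python
# Illustrative stand-in only: building a motion trajectory from per-frame 3D
# key point coordinates. The smoothing scheme is an assumption for this example.
import numpy as np

def motion_trajectory(per_frame_keypoints, keypoint_index, alpha=0.5):
    """per_frame_keypoints: list of (K, 3) arrays, one per video frame.
    Returns an (N, 3) array: the smoothed trajectory of one key point."""
    trajectory, smoothed = [], None
    for frame_kps in per_frame_keypoints:
        p = np.asarray(frame_kps[keypoint_index], dtype=float)
        smoothed = p if smoothed is None else alpha * p + (1 - alpha) * smoothed
        trajectory.append(smoothed)
    return np.stack(trajectory)

frames = [np.random.rand(17, 3) for _ in range(30)]        # fake detections
print(motion_trajectory(frames, keypoint_index=0).shape)   # -> (30, 3)
```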
Step 404, inputting the three-dimensional coordinates of at least one of the key points of the object in the video frame into a second kinematics algorithm to obtain parameters for driving the virtual animation, and driving the virtual animation based on the parameters, where the key points of the virtual animation correspond to the at least one key point of the object.
In this embodiment, the execution body may input the three-dimensional coordinates of the at least one key point into a second kinematics algorithm (for example, an inverse kinematics algorithm) to obtain parameters for driving the virtual animation, and drive the virtual animation based on the parameters. Specifically, the parameters may be any of various parameters used to drive the virtual animation and may indicate a target position to be reached by a specified point in the virtual animation, such as a target three-dimensional position of a vertex in a three-dimensional model (for example, a three-dimensional face model). The key points of the virtual animation correspond to the at least one key point of the object, that is, the virtual animation and the object each include the at least one key point. Thus, the virtual animation also includes the at least one key point described above, such as the key point at the right corner of the left eye, a mouth key point, and so on.
This embodiment can generate the motion trajectory of the object or drive the virtual animation using a specified kinematics algorithm, thereby realizing multiple functions through accurate use of the three-dimensional coordinates of the key points.
In some optional implementations of this embodiment, the second kinematics algorithm includes an inverse kinematics algorithm, the at least one key point includes a joint key point, and the parameters include a joint angle corresponding to the joint key point; and inputting the three-dimensional coordinates of at least one of the key points of the object in the video frame into the second kinematics algorithm to obtain the parameters for driving the virtual animation in step 404 may include: inputting the three-dimensional coordinates of the joint key point in the video frame into the inverse kinematics algorithm to obtain the joint angle corresponding to the joint key point; and inputting the joint angle into a forward kinematics algorithm to obtain the parameters for driving the virtual animation.
In these optional implementations, the execution body may input the three-dimensional coordinates of the at least one key point into the inverse kinematics algorithm to obtain the joint angle corresponding to the joint key point output by the inverse kinematics algorithm. Thereafter, the execution body may input the joint angle into a forward kinematics algorithm to obtain the parameters for driving the virtual animation.
In practice, the execution body may use the inverse kinematics algorithm to calculate the angle at a joint key point, that is, the joint angle. Three joint key points can form a joint angle, and that joint angle corresponds to all three of those joint key points. Each joint key point may correspond to one or more joint angles. For example, the right wrist joint point, the right elbow joint point, and the right shoulder joint point can form a joint angle with the right elbow joint point as the vertex. The right shoulder joint point can also correspond to other joint angles; for example, it can correspond to the joint angle formed by the right elbow joint point, the right shoulder joint point, and the shoulder-center joint point.
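For illustration only, the sketch below shows (1) the joint angle at a vertex key point computed from three joint key points, in the spirit of the wrist-elbow-shoulder example above, and (2) a toy two-segment forward-kinematics step that turns joint angles into positions that could drive an animated arm; the planar chain and the segment lengths are assumptions made for the example, not the patent's algorithm:

```python
# Illustrative sketch only: joint angle from three joint key points, then a toy
# forward-kinematics step for a planar two-segment arm. Segment lengths are assumed.
import numpy as np

def joint_angle(p_a, p_vertex, p_b):
    """Angle (radians) at p_vertex formed by key points p_a and p_b."""
    v1 = np.asarray(p_a, float) - np.asarray(p_vertex, float)
    v2 = np.asarray(p_b, float) - np.asarray(p_vertex, float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def forward_kinematics_2link(shoulder_angle, elbow_angle, l_upper=300.0, l_fore=250.0):
    """Toy planar chain: elbow and wrist positions (mm) from two joint angles."""
    elbow = np.array([l_upper * np.cos(shoulder_angle),
                      l_upper * np.sin(shoulder_angle)])
    wrist = elbow + np.array([l_fore * np.cos(shoulder_angle + elbow_angle),
                              l_fore * np.sin(shoulder_angle + elbow_angle)])
    return elbow, wrist

wrist = (400.0, 50.0, 1500.0)
elbow = (250.0, 0.0, 1450.0)
shoulder = (0.0, 0.0, 1400.0)
theta_elbow = joint_angle(wrist, elbow, shoulder)   # angle with the elbow as vertex
print(np.degrees(theta_elbow))
print(forward_kinematics_2link(np.pi / 4, theta_elbow))
```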
These implementations can combine the inverse kinematics algorithm with a forward kinematics algorithm to accurately determine the parameters for driving the virtual animation.
With further reference to fig. 5, as an implementation of the method shown in the foregoing figures, the present application provides an embodiment of a motion capture device. The device embodiment corresponds to the method embodiment shown in fig. 2 and, in addition to the features described below, may include the same or corresponding features or effects as that method embodiment. The device can be applied to various electronic devices.
As shown in fig. 5, the motion capture device 500 of the present embodiment includes: an acquisition unit 501, a detection unit 502, and a generation unit 503. The acquisition unit 501 is configured to acquire a video frame that contains an object and is captured by a monocular camera; the detection unit 502 is configured to detect key points of the object in the video frame to obtain three-dimensional coordinates of the key points of the object; and the generation unit 503 is configured to generate a target associated with the object based on the three-dimensional coordinates of at least one of the key points of the object in the video frame.
In this embodiment, the specific processing and the technical effects of the acquiring unit 501, the detecting unit 502 and the generating unit 503 of the motion capturing device 500 may refer to the related descriptions of the steps 201, 202 and 203 in the corresponding embodiment of fig. 2, and are not repeated here.
In some optional implementations of this embodiment, the generation unit is further configured to generate the target associated with the object based on the three-dimensional coordinates of at least one of the key points of the object in the video frame in the following manner: inputting the three-dimensional coordinates of at least one of the key points of the object in the video frame into a first kinematics algorithm to obtain a motion trajectory of the object in the video frame; and/or inputting the three-dimensional coordinates of at least one of the key points of the object in the video frame into a second kinematics algorithm to obtain parameters for driving the virtual animation, and driving the virtual animation based on the parameters, where the key points of the virtual animation correspond to the at least one key point of the object.
In some optional implementations of this embodiment, the second kinematics algorithm includes an inverse kinematics algorithm, the at least one key point includes a joint key point, and the parameters include a joint angle corresponding to the joint key point; the generation unit is further configured to input the three-dimensional coordinates of at least one of the key points of the object in the video frame into the second kinematics algorithm to obtain the parameters for driving the virtual animation in the following manner: inputting the three-dimensional coordinates of the joint key point in the video frame into the inverse kinematics algorithm to obtain the joint angle corresponding to the joint key point; and inputting the joint angle into a forward kinematics algorithm to obtain the parameters for driving the virtual animation.
In some optional implementations of this embodiment, for each of the key points of the object in the video frame, in the three-dimensional coordinates (x, y, z) of the key point, x and y are the coordinates of the key point mapped from the camera coordinate system of the monocular camera to the UV coordinate system of the video frame, and z is the distance of the key point from the monocular camera in the camera coordinate system.
In some optional implementations of this embodiment, the device further includes: a selection unit configured to select, for each video frame, specified three-dimensional coordinates from the three-dimensional coordinates of the key points of the object in the video frame as the three-dimensional coordinates of the at least one key point, where each category of object has specified key points corresponding to that category.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 6 may represent one device or a plurality of devices as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing means 601. It should be noted that the computer readable medium of the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In an embodiment of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Whereas in embodiments of the present disclosure, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, for example, described as: a processor including an acquisition unit, a detection unit, and a generation unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquisition unit may also be described as "a unit that acquires a video frame containing an object captured by a monocular camera".
As another aspect, the present application also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring a video frame containing an object, which is shot by a monocular camera; detecting key points of an object in a video frame to obtain three-dimensional coordinates of the key points of the object; a target associated with the object is generated based on three-dimensional coordinates of at least one of the keypoints of the object in the video frame.
The above description is only illustrative of the preferred embodiments of the present application and of the principles of the technology employed. Persons skilled in the art will appreciate that the scope of the invention referred to in the present application is not limited to technical solutions formed by the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by mutually replacing the above-mentioned features with technical features having similar functions disclosed in the present application (but not limited thereto).

Claims (10)

1. A method of motion capture, the method comprising:
acquiring a video frame that contains an object and is captured by a monocular camera;
detecting key points of the object in the video frame to obtain three-dimensional coordinates of the key points of the object;
generating a target associated with the object based on three-dimensional coordinates of at least one of the key points of the object in the video frame;
wherein the target comprises a virtual animation or a motion trajectory; and the generating a target associated with the object based on three-dimensional coordinates of at least one of the key points of the object in the video frame comprises:
inputting three-dimensional coordinates of at least one key point of the key points of the object in the video frame into a second kinematics algorithm to obtain parameters for driving the virtual animation, and driving the virtual animation based on the parameters, wherein the key points of the virtual animation correspond to the at least one key point of the object;
wherein the second kinematics algorithm comprises an inverse kinematics algorithm, the at least one key point comprises a joint key point, and the parameters comprise a joint angle corresponding to the joint key point; and the inputting three-dimensional coordinates of at least one key point of the key points of the object in the video frame into a second kinematics algorithm to obtain parameters for driving the virtual animation comprises:
inputting the three-dimensional coordinates of the joint key point in the video frame into the inverse kinematics algorithm to obtain the joint angle corresponding to the joint key point; and
inputting the joint angle into a forward kinematics algorithm to obtain the parameters for driving the virtual animation.
2. The method of claim 1, wherein the generating a target associated with the object based on three-dimensional coordinates of at least one of the key points of the object in the video frame further comprises:
inputting three-dimensional coordinates of at least one key point of the key points of the object in the video frame into a first kinematics algorithm to obtain a motion trajectory of the object in the video frame.
3. The method of any of claims 1-2, wherein, for each of the key points of the object in the video frame, in the three-dimensional coordinates (x, y, z) of the key point, x and y are the coordinates of the key point mapped from the camera coordinate system of the monocular camera to the UV coordinate system of the video frame, and z is the distance of the key point from the monocular camera in the camera coordinate system.
4. The method of claim 1, wherein the method further comprises:
for each video frame, selecting specified three-dimensional coordinates from the three-dimensional coordinates of the key points of the object in the video frame as the three-dimensional coordinates of the at least one key point, wherein each category of object has specified key points corresponding to the category.
5. A motion capture device, the device comprising:
an acquisition unit configured to acquire a video frame containing an object captured by a monocular camera;
the detection unit is configured to detect key points of the object in the video frame and obtain three-dimensional coordinates of the key points of the object;
a generation unit configured to generate a target associated with the object based on three-dimensional coordinates of at least one of the key points of the object in the video frame;
wherein the generation unit is further configured to generate the target associated with the object based on three-dimensional coordinates of at least one of the key points of the object in the video frame in the following manner:
inputting three-dimensional coordinates of at least one key point of the key points of the object in the video frame into a second kinematics algorithm to obtain parameters for driving the virtual animation, and driving the virtual animation based on the parameters, wherein the key points of the virtual animation correspond to the at least one key point of the object;
wherein the second kinematics algorithm comprises an inverse kinematics algorithm, the at least one key point comprises a joint key point, and the parameters comprise a joint angle corresponding to the joint key point; and the generation unit is further configured to input the three-dimensional coordinates of at least one key point of the key points of the object in the video frame into the second kinematics algorithm to obtain the parameters for driving the virtual animation in the following manner:
inputting the three-dimensional coordinates of the joint key point in the video frame into the inverse kinematics algorithm to obtain the joint angle corresponding to the joint key point; and
inputting the joint angle into a forward kinematics algorithm to obtain the parameters for driving the virtual animation.
6. The apparatus of claim 5, wherein the generation unit is further configured to generate the target associated with the object based on three-dimensional coordinates of at least one of the key points of the object in the video frame in the following manner:
inputting three-dimensional coordinates of at least one key point of the key points of the object in the video frame into a first kinematics algorithm to obtain a motion trajectory of the object in the video frame.
7. The apparatus of any of claims 5-6, wherein, for each of the key points of the object in the video frame, in the three-dimensional coordinates (x, y, z) of the key point, x and y are the coordinates of the key point mapped from the camera coordinate system of the monocular camera to the UV coordinate system of the video frame, and z is the distance of the key point from the monocular camera in the camera coordinate system.
8. The apparatus of claim 5, wherein the apparatus further comprises:
a selection unit configured to select, for each video frame, specified three-dimensional coordinates from the three-dimensional coordinates of the key points of the object in the video frame as the three-dimensional coordinates of the at least one key point, wherein each category of object has specified key points corresponding to the category.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.
10. A computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-4.
CN202010272137.8A 2020-04-09 2020-04-09 Motion capturing method and device Active CN111523408B (en)

Priority Applications (1)

Application CN202010272137.8A, priority date 2020-04-09, filing date 2020-04-09: Motion capturing method and device (granted as CN111523408B (en)).

Applications Claiming Priority (1)

Application CN202010272137.8A, priority date 2020-04-09, filing date 2020-04-09: Motion capturing method and device (granted as CN111523408B (en)).

Publications (2)

Publication Number Publication Date
CN111523408A CN111523408A (en) 2020-08-11
CN111523408B (en) 2023-09-15

Family

ID=71901386

Family Applications (1)

CN202010272137.8A (Active, granted as CN111523408B (en)), priority date 2020-04-09, filing date 2020-04-09: Motion capturing method and device.

Country Status (1)

Country Link
CN (1) CN111523408B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634400B (en) * 2020-12-21 2024-06-25 浙江大华技术股份有限公司 Rope skipping counting method, terminal and computer readable storage medium thereof
CN113325950B (en) * 2021-05-27 2023-08-25 百度在线网络技术(北京)有限公司 Function control method, device, equipment and storage medium
CN113420719B (en) * 2021-07-20 2022-07-22 北京百度网讯科技有限公司 Method and device for generating motion capture data, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102169366A (en) * 2011-03-18 2011-08-31 汤牧天 Multi-target tracking method in three-dimensional space
CN107066983A (en) * 2017-04-20 2017-08-18 腾讯科技(上海)有限公司 A kind of auth method and device
CN107257980A (en) * 2015-03-18 2017-10-17 英特尔公司 Local change in video detects
CN109191548A (en) * 2018-08-28 2019-01-11 百度在线网络技术(北京)有限公司 Animation method, device, equipment and storage medium
WO2019024793A1 (en) * 2017-07-31 2019-02-07 腾讯科技(深圳)有限公司 Method for displaying augmented reality and method and device for determining pose information
CN110225400A (en) * 2019-07-08 2019-09-10 北京字节跳动网络技术有限公司 A kind of motion capture method, device, mobile terminal and storage medium
CN110349177A (en) * 2019-07-03 2019-10-18 广州多益网络股份有限公司 A kind of the face key point-tracking method and system of successive frame video flowing
WO2020018469A1 (en) * 2018-07-16 2020-01-23 The Board Of Trustees Of The Leland Stanford Junior University System and method for automatic evaluation of gait using single or multi-camera recordings

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
韩晓建, 刘广超, 丁相方, 刘溢. "Gait planning and simulation of a quadruped robot based on trajectory key points" (基于轨迹关键点的四足机器人步态规划与仿真). 机械设计 (Machine Design), No. 05. *
杨锦隆 et al. "A dancing robot imitating motion based on deep learning" (基于深度学习进行动作模仿的舞蹈机器人). 2019, Vol. 58, No. 5, pp. 759-766. *
韩晓建, 刘广超, 丁相方, 刘溢. "Gait planning and simulation of a quadruped robot based on trajectory key points" (基于轨迹关键点的四足机器人步态规划与仿真). 机械设计 (Machine Design), 2016, No. 05, full text. *

Also Published As

Publication number Publication date
CN111523408A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
US10832039B2 (en) Facial expression detection method, device and system, facial expression driving method, device and system, and storage medium
CN111523408B (en) Motion capturing method and device
CN109902659B (en) Method and apparatus for processing human body image
CN111402290B (en) Action restoration method and device based on skeleton key points
CN111694429A (en) Virtual object driving method and device, electronic equipment and readable storage
CN111369428B (en) Virtual head portrait generation method and device
CN107622252B (en) Information generation method and device
CN110866977B (en) Augmented reality processing method, device, system, storage medium and electronic equipment
CN110059624B (en) Method and apparatus for detecting living body
CN113435226B (en) Information processing method and device
CN108875539B (en) Expression matching method, device and system and storage medium
CN109754464B (en) Method and apparatus for generating information
CN109272543B (en) Method and apparatus for generating a model
CN111949112A (en) Object interaction method, device and system, computer readable medium and electronic equipment
CN114445500B (en) Augmented reality scene construction method, device, terminal equipment and storage medium
CN111583355A (en) Face image generation method and device, electronic equipment and readable storage medium
JP7499346B2 (en) Joint rotation estimation based on inverse kinematics
CN110189364B (en) Method and device for generating information, and target tracking method and device
CN109816791B (en) Method and apparatus for generating information
CN110084306B (en) Method and apparatus for generating dynamic image
CN111314627B (en) Method and apparatus for processing video frames
CN111311712B (en) Video frame processing method and device
CN111524062B (en) Image generation method and device
US20240135581A1 (en) Three dimensional hand pose estimator
WO2024082965A1 (en) Image annotation method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant