CN118072392A - Dynamic capture information determining method and device, terminal equipment and readable storage medium - Google Patents

Dynamic capture information determining method and device, terminal equipment and readable storage medium

Info

Publication number
CN118072392A
CN118072392A (Application CN202410215942.5A)
Authority
CN
China
Prior art keywords
information
pieces
images
pose information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410215942.5A
Other languages
Chinese (zh)
Inventor
胡永涛
戴景文
贺杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Virtual Reality Technology Co Ltd
Original Assignee
Guangdong Virtual Reality Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Virtual Reality Technology Co Ltd filed Critical Guangdong Virtual Reality Technology Co Ltd
Priority to CN202410215942.5A priority Critical patent/CN118072392A/en
Publication of CN118072392A publication Critical patent/CN118072392A/en
Pending legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a dynamic capture information determining method and device, a terminal device, and a readable storage medium. The method is applied to a terminal device that includes an image acquisition device and comprises the following steps: collecting video data through the image acquisition device, wherein the video data comprises M images and each image comprises one or more persons; performing human body pose estimation on the M images respectively to obtain M pieces of local 3D pose information, the M images corresponding one-to-one to the M pieces of local 3D pose information; determining spatial pose information of the terminal device according to the M images to obtain M pieces of spatial pose information, the M images corresponding one-to-one to the M pieces of spatial pose information; converting the M pieces of local 3D pose information into global 3D pose information according to the M pieces of spatial pose information to obtain M pieces of global 3D pose information; and determining dynamic capture information of the one or more persons according to the M pieces of global 3D pose information. According to the embodiments of the application, the efficiency of determining dynamic capture information can be improved.

Description

Dynamic capture information determining method and device, terminal equipment and readable storage medium
Technical Field
The application relates to the technical field of motion capture, in particular to a method and a device for determining motion capture information, terminal equipment and a readable storage medium.
Background
Motion capture technology stores human motion in the form of data through a motion capture system. Currently, a dynamic capture system includes a plurality of image acquisition devices, and the system determines the dynamic capture information of a person from the video data collected by these image acquisition devices at different positions. In this approach, the dynamic capture system must calibrate the image acquisition devices with respect to one another and must perform time synchronization, data coordination, and other processing on the video data they collect. These extra processing steps reduce the efficiency of determining the dynamic capture information.
Disclosure of Invention
The embodiment of the application discloses a dynamic capture information determining method, a dynamic capture information determining device, terminal equipment and a readable storage medium, which are used for improving the determining efficiency of dynamic capture information.
In a first aspect, an embodiment of the present application discloses a method for determining dynamic capture information, where the method is applied to a terminal device, the terminal device includes an image acquisition device, and the method includes:
collecting video data through the image acquisition device, wherein the video data comprises M images, the images comprise one or more persons, and M is an integer greater than 1;
performing human body pose estimation on the M images respectively to obtain M pieces of local 3D pose information, wherein the M images are in one-to-one correspondence with the M pieces of local 3D pose information;
determining the spatial pose information of the terminal device according to the M images to obtain M pieces of spatial pose information, wherein the M images are in one-to-one correspondence with the M pieces of spatial pose information;
converting the M pieces of local 3D pose information into global 3D pose information according to the M pieces of spatial pose information to obtain M pieces of global 3D pose information;
and determining the dynamic capture information of the one or more persons according to the M pieces of global 3D pose information.
In a second aspect, an embodiment of the present application discloses a dynamic capture information determining apparatus, where the apparatus is applied to a terminal device, the terminal device includes an image acquisition apparatus, and the apparatus includes:
the image acquisition device is used for acquiring video data, wherein the video data comprises M images, the images comprise one or more persons, and M is an integer greater than 1;
the estimating unit is used for performing human body pose estimation on the M images respectively to obtain M pieces of local 3D pose information, wherein the M images are in one-to-one correspondence with the M pieces of local 3D pose information;
the determining unit is used for determining the spatial pose information of the terminal device according to the M images to obtain M pieces of spatial pose information, wherein the M images are in one-to-one correspondence with the M pieces of spatial pose information;
the conversion unit is used for converting the M pieces of local 3D pose information into global 3D pose information according to the M pieces of spatial pose information to obtain M pieces of global 3D pose information;
the determining unit is further configured to determine the dynamic capture information of the one or more persons according to the M pieces of global 3D pose information.
As a possible implementation manner, the determining unit determines, according to the M images, spatial pose information of the terminal device, where obtaining the M spatial pose information includes:
And determining the space pose information of the terminal equipment through a visual positioning method or a visual odometer method according to the M images to obtain M pieces of space pose information.
As a possible implementation, the terminal device further comprises an inertial measurement unit (Inertial Measurement Unit, IMU);
The acquisition unit is further configured to acquire N pieces of IMU data through the IMU, where the N pieces of IMU data are IMU data acquired by the IMU during an acquisition time of the M images, and N is an integer greater than or equal to M;
the determining unit determines the spatial pose information of the terminal equipment according to the M images, and the obtaining of the M spatial pose information comprises the following steps:
And determining the space pose information of the terminal equipment by using a visual inertial odometer method according to the M images and the N IMU data to obtain M space pose information.
As a possible implementation manner, the terminal device further comprises an IMU and a lidar;
The acquisition unit is further configured to acquire N pieces of IMU data through the IMU, where the N pieces of IMU data are IMU data acquired by the IMU during an acquisition time of the M images, and N is an integer greater than or equal to M;
the acquisition unit is further used for acquiring M radar data through the laser radar, and the M radar data are in one-to-one correspondence with the M images;
the determining unit determines the spatial pose information of the terminal equipment according to the M images, and the obtaining of the M spatial pose information comprises the following steps:
and determining the space pose information of the terminal equipment by using a laser radar inertial odometer method according to the M images, the N IMU data and the M radar data to obtain M space pose information.
As a possible implementation manner, the device further comprises:
the optimizing unit is used for optimizing the M pieces of local 3D gesture information and/or the M pieces of space gesture information according to the M pieces of global 3D gesture information;
The determining unit is further used for determining optimized M pieces of global 3D gesture information according to the optimized M pieces of local 3D gesture information and/or the optimized M pieces of space gesture information;
The determining unit determining the dynamic capture information of the one or more people according to the M global 3D pose information includes:
and determining target dynamic capture information of the one or more people according to the optimized M global 3D gesture information.
As a possible implementation manner, the device further comprises:
and the control unit is used for controlling the movement of the target object according to the dynamic capturing information of the one or more people.
As a possible implementation manner, the device further comprises:
and the output unit is used for outputting the dynamic capturing information of the one or more persons.
In a third aspect, an embodiment of the present application discloses a terminal device, including a processor and a memory, where the processor is configured to invoke a computer program stored in the memory to perform the method disclosed in the first aspect.
In a fourth aspect, embodiments of the present application disclose a computer readable storage medium having stored thereon a computer program or computer instructions which, when executed by a processor, implement a method as disclosed in the first aspect above.
In a fifth aspect, embodiments of the present application disclose a computer program product comprising computer program code which, when executed by a processor, causes the above-mentioned method to be performed.
In the embodiments of the application, the terminal device collects, through the image acquisition device, video data comprising M images, where the images comprise one or more persons; performs human body pose estimation on the M images respectively to obtain M pieces of local 3D pose information; determines spatial pose information of the terminal device according to the M images to obtain M pieces of spatial pose information; converts the M pieces of local 3D pose information into global 3D pose information according to the M pieces of spatial pose information to obtain M pieces of global 3D pose information; and determines dynamic capture information of the one or more persons according to the M pieces of global 3D pose information. The terminal device can therefore determine the dynamic capture information of a person from video data collected by the image acquisition device integrated on the terminal device itself; no calibration between separate image acquisition devices is needed, and no time synchronization, data coordination, or other processing of video data from multiple image acquisition devices is needed. This reduces the amount of processing and thus improves the efficiency of determining the dynamic capture information.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present application, and that a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a dynamic capture information determining method disclosed in the embodiment of the application;
FIG. 3 is a flow chart of another method for determining dynamic capture information according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a dynamic capture information determining apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.
The embodiment of the application discloses a dynamic capture information determining method, a dynamic capture information determining device, terminal equipment and a readable storage medium, which are used for improving the determining efficiency of dynamic capture information. The following will describe in detail.
For a better understanding of the embodiments of the present application, the following description will be given with reference to the related art.
The motion capture technology is to store the human motion in the form of data through the motion capture system.
In one form, a dynamic capture system may include an image acquisition device and a plurality of sensors worn on the human body. The dynamic capture system can determine the dynamic capture information of the person according to the video data acquired by the image acquisition device and the data acquired by the plurality of sensors. However, because multiple sensors need to be worn on the human body, this kind of dynamic capture system is inconvenient to use.
In another form, the dynamic capture system includes a plurality of image acquisition devices, and the system determines the dynamic capture information of a person from the video data collected by these image acquisition devices at different positions. In this approach, the dynamic capture system must calibrate the image acquisition devices with respect to one another and must perform time synchronization, data coordination, and other processing on the video data they collect. These extra processing steps reduce the efficiency of determining the dynamic capture information.
In order to solve the above problems, the application discloses a dynamic capture information determining method in which a terminal device collects, through an image acquisition device, video data comprising M images, where the images comprise one or more persons; performs human body pose estimation on the M images respectively to obtain M pieces of local 3D pose information; determines spatial pose information of the terminal device according to the M images to obtain M pieces of spatial pose information; converts the M pieces of local 3D pose information into global 3D pose information according to the M pieces of spatial pose information to obtain M pieces of global 3D pose information; and determines dynamic capture information of the one or more persons according to the M pieces of global 3D pose information. The terminal device can therefore determine the dynamic capture information of a person from video data collected by the image acquisition device integrated on the terminal device itself; no calibration between separate image acquisition devices is needed, and no time synchronization, data coordination, or other processing of video data from multiple image acquisition devices is needed. This reduces the amount of processing and thus improves the efficiency of determining the dynamic capture information.
In addition, because the terminal device does not require a plurality of sensors to be worn on the human body, it is more convenient to use, which improves the usability of the dynamic capture information.
In order to better understand the embodiments of the present application, the structure of the terminal device will be described first.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 1, the terminal device may include an image acquisition device and a processor.
The image acquisition device may acquire video data. The processor can determine dynamic capture information of the person according to the video data acquired by the image acquisition device.
The number of image acquisition devices may be one or more. When there are a plurality of image acquisition devices, they may be arranged at different positions of the terminal device.
When there is one image acquisition device, it acquires one piece of video data. When there are a plurality of image acquisition devices, they acquire a plurality of pieces of video data, and the pieces of video data are in one-to-one correspondence with the image acquisition devices, i.e., each image acquisition device acquires one piece of video data.
The video data may include images that are grayscale images, color images (i.e., RGB images), or depth images.
Under the condition that the number of the image acquisition devices is 1, the image acquisition devices can be cameras for acquiring color images or gray images or cameras for acquiring depth images.
When there are a plurality of image acquisition devices, all of them may acquire grayscale images, all may acquire color images, or all may acquire depth images; alternatively, some may acquire color images while others acquire grayscale images, some may acquire color images while others acquire depth images, some may acquire grayscale images while others acquire depth images, or some may acquire color images, some grayscale images, and some depth images.
The terminal device may also include other auxiliary positioning devices. The auxiliary positioning devices may include one or more of an IMU, a lidar, a Global Positioning System (GPS), and a Real-Time Kinematic (RTK) positioning system.
In the case where the terminal device includes an IMU, the IMU may collect IMU data. The IMU data is data acquired by the IMU. The number of IMUs may be 1 or more. In case the number of IMUs is plural, the plural IMUs may be provided at different positions of the terminal device.
In case the terminal device comprises a lidar, the lidar may collect radar data. The radar data are data collected by a laser radar.
In case the terminal device comprises a GPS, the GPS may collect GPS data. The GPS data is the position data of the terminal equipment collected by the GPS under the world coordinate system.
In the case where the terminal device comprises an RTK, the RTK may collect the RTK data. The RTK data is position data acquired by the RTK.
In the case that the terminal device includes an image acquisition device and an IMU, the processor may determine the motion capture information of the person according to the video data acquired by the image acquisition device and the IMU data acquired by the IMU. In the case that the terminal device includes an image acquisition device and a laser radar, the processor may determine dynamic capturing information of the person according to video data acquired by the image acquisition device and radar data acquired by the laser radar. In the case that the terminal device includes an image acquisition device and a GPS, the processor may determine the dynamic capturing information of the person according to the video data acquired by the image acquisition device and the GPS data acquired by the GPS. In the case that the terminal device includes an image acquisition device and an RTK, the processor may determine the dynamic capture information of the person according to the video data acquired by the image acquisition device and the RTK data acquired by the RTK. Under the condition that the terminal equipment comprises an image acquisition device, an IMU and a laser radar, the processor can determine the dynamic capturing information of the person according to video data acquired by the image acquisition device, the IMU data acquired by the IMU and the radar data acquired by the laser radar. Under the condition that the terminal equipment comprises an image acquisition device, an IMU and a GPS, the processor can determine the dynamic capturing information of the person according to the video data acquired by the image acquisition device, the IMU data acquired by the IMU and the GPS data acquired by the GPS. In the case that the terminal device includes an image acquisition device, an IMU, and an RTK, the processor may determine motion capture information of the person according to video data acquired by the image acquisition device, IMU data acquired by the IMU, and RTK data acquired by the RTK. In the case that the terminal device includes an image acquisition device, a laser radar and a GPS, the processor may determine the dynamic capturing information of the person according to the video data acquired by the image acquisition device, the radar data acquired by the laser radar and the GPS data acquired by the GPS. In the case that the terminal device includes an image acquisition device, a laser radar and an RTK, the processor may determine the dynamic capture information of the person according to video data acquired by the image acquisition device, radar data acquired by the laser radar, and RTK data acquired by the RTK. In the case that the terminal device includes an image acquisition device, a GPS and an RTK, the processor may determine the dynamic capture information of the person according to video data acquired by the image acquisition device, GPS data acquired by the GPS and RTK data acquired by the RTK. Under the condition that the terminal equipment comprises an image acquisition device, an IMU, a laser radar and a GPS, the processor can determine the dynamic capturing information of the person according to video data acquired by the image acquisition device, IMU data acquired by the IMU, radar data acquired by the laser radar and GPS data acquired by the GPS. 
In the case that the terminal device includes an image acquisition device, an IMU, a lidar, and an RTK, the processor may determine the motion capture information of the person according to video data acquired by the image acquisition device, IMU data acquired by the IMU, radar data acquired by the lidar, and RTK data acquired by the RTK. In the case that the terminal device includes an image acquisition device, a GPS, a lidar, and an RTK, the processor may determine the dynamic capture information of the person according to video data acquired by the image acquisition device, GPS data acquired by the GPS, radar data acquired by the lidar, and RTK data acquired by the RTK. In the case that the terminal device includes an image acquisition device, an IMU, a GPS, and an RTK, the processor may determine the motion capture information of the person according to video data acquired by the image acquisition device, IMU data acquired by the IMU, GPS data acquired by the GPS, and RTK data acquired by the RTK. Under the condition that the terminal equipment comprises an image acquisition device, an IMU, a laser radar, a GPS and an RTK, the processor can determine the dynamic capturing information of the person according to video data acquired by the image acquisition device, IMU data acquired by the IMU, radar data acquired by the laser radar, GPS data acquired by the GPS and RTK data acquired by the RTK.
By way of example, the terminal device may be a Mixed Reality (MR) helmet comprising 4 cameras and an IMU, 2 of the 4 cameras being used for acquiring color images and/or gray scale images and 2 cameras being used for acquiring depth images.
Referring to fig. 2, fig. 2 is a flow chart of a dynamic capturing information determining method according to an embodiment of the present application. The dynamic capture information determining method can be applied to terminal equipment, and the terminal equipment can comprise an image acquisition device. As shown in fig. 2, the dynamic capture information determining method may include the following steps.
201. Video data including M images is acquired by an image acquisition device.
In the process of motion capture, the terminal device can acquire video data through the image acquisition device. The video data may include M images, M being an integer greater than 1. The M images may include one or more persons; that is, each of the M images includes one or more persons. The persons appearing in the M images may be entirely or partially the same.
In the case where the terminal device includes one image acquisition device, the terminal device acquires one piece of video data through that device. In the case where the terminal device includes a plurality of image acquisition devices, the terminal device acquires a plurality of pieces of video data through them, the image acquisition devices being in one-to-one correspondence with the pieces of video data, i.e., each image acquisition device acquires one piece of video data. Each of the plurality of pieces of video data includes M images.
The image acquired by one image acquisition device can be a gray image, a color image or a depth image.
The types of images acquired by the different image acquisition devices may be identical; for example, the images collected by all the image acquisition devices may be grayscale images, color images, or depth images.
The types of images acquired by different image acquisition devices may be different. For example, one of the 2 image capturing devices may capture a color image and the other image capturing device may capture a depth image.
202. And respectively carrying out human body posture estimation on the M images to obtain M pieces of local 3D posture information.
After the terminal equipment acquires video data through the image acquisition device, human body posture estimation can be respectively carried out on M images, and M pieces of local 3D posture information are obtained.
The M images are in one-to-one correspondence with the M local 3D gesture information, namely, each image corresponds to one local 3D gesture information respectively. In the case where one image includes one person, one piece of local 3D pose information includes local 3D pose information of one person. In the case where one image includes a plurality of persons, one piece of local 3D pose information includes local 3D pose information of the plurality of persons.
The local 3D pose information of a person may be understood as the 3D pose information of the person relative to the terminal device. It may include the local 3D pose information of the person's joint points; different local 3D pose information of the joint points corresponds to different poses of the person. The joint points of a person are the articulation points of the human body, and a person may have a plurality of joint points. By way of example, a person's joint points may include 18 joint points: the nose, neck, left shoulder, right shoulder, left wrist, right wrist, left elbow, right elbow, left hip, right hip, left knee, right knee, left ankle, right ankle, left eye, right eye, left ear, and right ear.
It should be understood that the foregoing is illustrative of a human articulation point and is not intended to be limiting. Illustratively, a person's articulation point may include 8 articulation points.
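As an illustrative sketch only (not part of the patent), the 18 joint points listed above could be modeled as a simple data structure, with a person's local 3D pose stored as one 3D coordinate per joint point in the terminal device's frame; all names below are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical names for the 18 joint points listed above.
JOINT_NAMES = [
    "nose", "neck", "left_shoulder", "right_shoulder",
    "left_wrist", "right_wrist", "left_elbow", "right_elbow",
    "left_hip", "right_hip", "left_knee", "right_knee",
    "left_ankle", "right_ankle", "left_eye", "right_eye",
    "left_ear", "right_ear",
]

@dataclass
class LocalPose3D:
    """Local 3D pose of one person: joint name -> (x, y, z) in the device frame."""
    joints: Dict[str, Tuple[float, float, float]]
```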
In some embodiments, the terminal device may use an end-to-end network (end to end network) to perform human body pose estimation on the M images respectively, to directly obtain M pieces of local 3D pose information.
In some embodiments, the terminal device may first perform human body pose estimation on the M images using a 2D human body pose estimation (human pose estimation) network to obtain M local 2D pose information, and then may use a 3D pose network to convert the M local 2D pose information into 3D pose information to obtain M local 3D pose information.
It should be understood that the end-to-end network, the 2D human body pose estimation network, and the 3D pose network are not limited, and only a network having the above functions is required.
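A minimal sketch of the two-stage variant described above (2D human pose estimation followed by lifting to 3D), assuming two pretrained models are available; `detect_2d_keypoints` and `lift_to_3d` are placeholders rather than networks named in the application:

```python
def estimate_local_3d_poses(images, detect_2d_keypoints, lift_to_3d):
    """For each of the M images, run 2D human pose estimation and lift the
    2D keypoints of each detected person to local 3D pose information.
    Returns M entries, one-to-one with the M images."""
    local_3d_poses = []
    for image in images:
        # 2D keypoints per detected person, e.g. a list of (J, 2) arrays.
        keypoints_2d = detect_2d_keypoints(image)
        # Lift each person's 2D keypoints to 3D in the device frame, (J, 3).
        poses_3d = [lift_to_3d(person_kp) for person_kp in keypoints_2d]
        local_3d_poses.append(poses_3d)
    return local_3d_poses
```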
Under the condition that the terminal equipment comprises a plurality of image acquisition devices, the terminal equipment can respectively estimate the human body gesture of M images included in each video data in the plurality of video data to obtain M pieces of local 3D gesture information of the plurality of video data. It can be seen that each of the plurality of video data corresponds to M pieces of local 3D pose information, respectively.
In one case, the M pieces of local 3D pose information of the plurality of video data may be directly used for the subsequent step. In another case, the terminal device may determine M pieces of target local 3D pose information from M pieces of local 3D pose information of the plurality of video data.
The plurality of image capturing devices may have the same capturing time, and the terminal device may determine the images having the same capturing time in the plurality of video data as the corresponding images. The plurality of image capturing devices may have different capturing times, and the terminal device may determine an image having a shortest time interval between capturing times among the plurality of video data as the corresponding image.
The terminal device may divide the images included in the plurality of video data into M groups according to the acquisition time of the images to obtain M image groups, and then determine an average value of local 3D pose information corresponding to the images included in each image group as local 3D pose information corresponding to each image group to obtain M target local 3D pose information. Each of the M image groups includes a corresponding set of images of the plurality of video data.
By way of example, assuming that the terminal device comprises 2 image capturing means, two video data captured by the 2 image capturing means each comprise 4 images, video data 1 comprises image 1, image 2, image 3 and image 4, and video data 2 comprises image 5, image 6, image 7 and image 8. The acquisition time of the image 1 is the same as that of the image 5, the acquisition time of the image 2 is the same as that of the image 6, the acquisition time of the image 3 is the same as that of the image 7, and the acquisition time of the image 4 is the same as that of the image 8. The terminal device may divide the video data 1 and the video data 2 into 4 image groups according to the acquisition time, the image group 1 including the image 1 and the image 5, the image group 2 including the image 2 and the image 6, the image group 3 including the image 3 and the image 7, and the image group 4 including the image 4 and the image 8. The terminal device may determine an average value of the local 3D pose information corresponding to the image 1 and the image 5 as the local 3D pose information corresponding to the image group 1, may determine an average value of the local 3D pose information corresponding to the image 2 and the image 6 as the local 3D pose information corresponding to the image group 2, may determine an average value of the local 3D pose information corresponding to the image 3 and the image 7 as the local 3D pose information corresponding to the image group 3, and may determine an average value of the local 3D pose information corresponding to the image 4 and the image 8 as the local 3D pose information corresponding to the image group 4.
It should be understood that the terminal device, when determining the average value of the local 3D pose information, determines the average value of the local 3D pose information of the same person instead of determining the average value of the local 3D pose information of different persons.
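The grouping-and-averaging step could be sketched as follows, assuming each frame carries a timestamp and a per-person local 3D pose array, and that person identities have already been matched across devices; the function and field names are illustrative:

```python
import numpy as np
from collections import defaultdict

def average_poses_by_group(frames):
    """frames: list of (timestamp, {person_id: (J, 3) local 3D pose}) gathered
    from all video streams. Frames sharing an acquisition time form one group;
    each person's local 3D pose is averaged within the group, giving M target
    local 3D poses (one per group)."""
    groups = defaultdict(list)
    for timestamp, poses in frames:
        groups[timestamp].append(poses)
    target_poses = []
    for timestamp in sorted(groups):
        merged = defaultdict(list)
        for poses in groups[timestamp]:
            for person_id, pose in poses.items():
                merged[person_id].append(pose)
        # Average only the same person's pose across devices (see the note above).
        target_poses.append({pid: np.mean(plist, axis=0) for pid, plist in merged.items()})
    return target_poses
```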
203. And determining the space pose information of the terminal equipment according to the M images to obtain M pieces of space pose information.
The M images are in one-to-one correspondence with the M space pose information, namely, each piece of space pose information in the M space pose information corresponds to the space pose information of the terminal equipment when one image is shot. The spatial pose information of the terminal equipment can be understood as pose information of the image acquisition device in a world coordinate system.
In some embodiments, under the condition that the terminal device only collects video data, the terminal device can determine space pose information of the terminal device according to the M images through a visual positioning method, and also can determine space pose information of the terminal device through a visual odometer method, so that M pieces of space pose information are obtained.
Determining the spatial pose information of the terminal device from the video data alone requires less data to be processed, which can improve the efficiency of determining the dynamic capture information.
In some embodiments, the terminal device may further include an IMU, through which the terminal device may collect N IMU data. The acquisition frequency of the IMU can be greater than or equal to that of the image acquisition device, N is an integer greater than or equal to M, and the N pieces of IMU data are IMU data acquired by the IMU in the acquisition time of M images. The acquisition time of the first IMU data in the N IMU data may be the same as the acquisition time of the first image in the M images. The acquisition time of the last IMU data in the N IMU data can be the same as the acquisition time of the last image in the M images, or can be earlier than the acquisition time of the last image in the M images.
The acquisition frequency and the acquisition time of the IMU and the image acquisition device can be the same, and the terminal equipment can determine the image and IMU data with the same acquisition time as corresponding data. The acquisition frequency and the acquisition time of the IMU and the image acquisition device can be different, and the terminal equipment can determine the image and IMU data with the time interval between the acquisition times within the preset duration as corresponding data.
The terminal equipment can determine the space pose information of the terminal equipment by using a Visual-Inertial Odometry (VIO) method according to the M images and the N IMU data to obtain M space pose information. Specifically, the terminal device may determine, according to the first image and the first IMU data, first spatial pose information of the terminal device using a VIO method. The first image is any one image of the M images, the first IMU data is IMU data corresponding to the first image in the N IMU data, and the first IMU data can comprise one IMU data or a plurality of IMU data. The terminal equipment can determine the space pose information of the terminal equipment according to the video data and the IMU data, and the accuracy of dynamic capture information determination can be improved.
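The image-to-IMU association described above could be sketched as follows, where IMU samples whose timestamps fall within a preset duration of an image's acquisition time are treated as that image's corresponding IMU data (the names and the tolerance parameter are assumptions):

```python
def match_imu_to_images(image_times, imu_samples, max_gap):
    """image_times: acquisition times of the M images.
    imu_samples: list of (timestamp, sample) from the IMU (N >= M samples).
    Returns, for each image, the IMU samples whose timestamps lie within
    max_gap of that image's acquisition time (one or several per image)."""
    matched = []
    for t_img in image_times:
        samples = [s for t_imu, s in imu_samples if abs(t_imu - t_img) <= max_gap]
        matched.append(samples)
    return matched
```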
In some embodiments, the terminal device may further include a laser radar (Light Detection and Ranging, LiDAR), and the terminal device may collect M pieces of radar data by using the laser radar, where the M pieces of radar data are in one-to-one correspondence with the M images. The acquisition time of the laser radar and the acquisition time of the image acquisition device can be the same, and the terminal equipment can determine the image and the radar data with the same acquisition time as corresponding data. The acquisition time of the laser radar and the acquisition time of the image acquisition device can be different, and the terminal equipment can determine the image and radar data with the shortest time interval between the acquisition times as corresponding data. The terminal equipment can determine the space pose information of the terminal equipment by using a laser radar vision fusion method according to the M images and the M radar data, so as to obtain M space pose information. The terminal equipment can determine the space pose information of the terminal equipment according to the video data and the radar data, and the accuracy of dynamic capture information determination can be improved.
In some embodiments, the terminal device may further include a GPS, and the terminal device may collect M pieces of GPS data through the GPS, where the M pieces of GPS data are in one-to-one correspondence with the M images. The acquisition time of the GPS and the image acquisition device can be the same, and the terminal equipment can determine the image and GPS data with the same acquisition time as corresponding data. The acquisition time of the GPS and the image acquisition device may be different, and the terminal device may determine the image and GPS data having the shortest time interval between the acquisition times as the corresponding data. The terminal equipment can determine the space pose information of the terminal equipment by using a visual fusion method according to the M images and the M GPS data, so as to obtain M space pose information. The terminal equipment can determine the space pose information of the terminal equipment according to the video data and the GPS data, and the accuracy of dynamic acquisition information determination can be improved.
In some embodiments, the terminal device may further include an RTK, and the terminal device may collect M pieces of RTK data through the RTK, where the M pieces of RTK data are in one-to-one correspondence with the M images. The acquisition time of the RTK and the image acquisition device can be the same, and the terminal equipment can determine the image and the RTK data with the same acquisition time as corresponding data. The acquisition time of the RTK and the image acquisition device may be different, and the terminal device may determine the image and the RTK data having the shortest time interval between the acquisition times as corresponding data. The terminal equipment can determine the space pose information of the terminal equipment by using a visual fusion method according to the M images and the M RTK data, so as to obtain M space pose information. The terminal equipment can determine the space pose information of the terminal equipment according to the video data and the RTK data, and the accuracy of dynamic capture information determination can be improved.
In some embodiments, the terminal device may further include an IMU and a LiDAR, where the terminal device may collect N IMU data by the IMU, where the N IMU data is IMU data collected by the IMU during a collection time of M images, and N is an integer greater than or equal to M. The terminal equipment can collect M radar data through the laser radar, and the M radar data are in one-to-one correspondence with the M images. According to the M images, the N IMU data and the M radar data, the terminal equipment can determine the space pose information of the terminal equipment by using a laser radar inertial odometer (LiDAR-Inertial Odometry, LIO) method, and can also determine the space pose information of the terminal equipment by using a laser radar visual inertial fusion method, so that the M space pose information is obtained. The terminal equipment can determine the space pose information of the terminal equipment according to the video data, the IMU data and the radar data, and the accuracy of dynamic acquisition information determination can be improved.
In some embodiments, the terminal device may further include an IMU and a GPS, where the terminal device may collect N IMU data by the IMU, where the N IMU data is IMU data collected by the IMU during a collection time of M images, and N is an integer greater than or equal to M. The terminal equipment can collect M pieces of GPS data through GPS, and the M pieces of GPS data are in one-to-one correspondence with the M images. The terminal equipment can determine the space pose information of the terminal equipment by using a visual inertia fusion method according to the M images, the N IMU data and the M GPS data, so that the M space pose information is obtained. The terminal equipment can determine the space pose information of the terminal equipment according to the video data, the IMU data and the GPS data, and the accuracy of dynamic acquisition information determination can be improved.
In some embodiments, the terminal device may further include an IMU and an RTK, where the terminal device may collect N IMU data by the IMU, where the N IMU data is IMU data collected by the IMU during a collection time of M images, and N is an integer greater than or equal to M. The terminal equipment can collect M RTK data through the RTK, and the M RTK data corresponds to the M images one by one. The terminal equipment can determine the space pose information of the terminal equipment by using a visual inertia fusion method according to the M images, the N IMU data and the M RTK data, so that the M space pose information is obtained. The terminal equipment can determine the space pose information of the terminal equipment according to the video data, the IMU data and the RTK data, and the accuracy of dynamic acquisition information determination can be improved.
In some embodiments, the terminal device may further include a laser radar and a GPS, and the terminal device may collect M radar data by the laser radar, where the M radar data corresponds to the M images one to one. The terminal equipment can collect M pieces of GPS data through GPS, and the M pieces of GPS data are in one-to-one correspondence with the M images. The terminal equipment can determine the space pose information of the terminal equipment by using a laser radar vision fusion method according to the M images, the M radar data and the M GPS data, so that the M space pose information is obtained. The terminal equipment can determine the space pose information of the terminal equipment according to the video data, the radar data and the GPS data, and the accuracy of dynamic acquisition information determination can be improved.
In some embodiments, the terminal device may further include a lidar and an RTK, where the terminal device may collect M radar data by the lidar, where the M radar data corresponds to the M images one to one. The terminal equipment can collect M RTK data through the RTK, and the M RTK data corresponds to the M images one by one. The terminal equipment can determine the space pose information of the terminal equipment by using a laser radar vision fusion method according to the M images, the M radar data and the M RTK data, so that the M space pose information is obtained. The terminal equipment can determine the space pose information of the terminal equipment according to the video data, the IMU data and the radar data, and the accuracy of dynamic acquisition information determination can be improved.
In some embodiments, the terminal device may further include a GPS and an RTK, where the terminal device may collect M GPS data through the GPS, where the M GPS data corresponds to the M images one to one. The terminal equipment can collect M RTK data through the RTK, and the M RTK data corresponds to the M images one by one. The terminal equipment can determine the space pose information of the terminal equipment by using a visual fusion method according to the M images, the M GPS data and the M RTK data, so that the M space pose information is obtained. The terminal equipment can determine the space pose information of the terminal equipment according to the video data, the GPS data and the RTK data, and the accuracy of dynamic acquisition information determination can be improved.
In some embodiments, the terminal device may further include an IMU, a laser radar, and a GPS, where the terminal device may collect N IMU data by the IMU, where the N IMU data is IMU data collected by the IMU during a collection time of M images, and N is an integer greater than or equal to M. The terminal equipment can collect M radar data through the laser radar, and the M radar data are in one-to-one correspondence with the M images. The terminal equipment can collect M pieces of GPS data through GPS, and the M pieces of GPS data are in one-to-one correspondence with the M images. The terminal equipment can determine the space pose information of the terminal equipment by using a laser radar visual inertial fusion method according to M images, N pieces of IMU data, M pieces of radar data and M pieces of GPS data, so that M pieces of space pose information are obtained. The terminal equipment can determine the space pose information of the terminal equipment according to the video data, the IMU data, the radar data and the GPS data, and the accuracy of dynamic acquisition information determination can be improved.
In some embodiments, the terminal device may further include an IMU, a laser radar, and an RTK, where the terminal device may collect N IMU data by the IMU, where the N IMU data is IMU data collected by the IMU during a collection time of M images, and N is an integer greater than or equal to M. The terminal equipment can collect M radar data through the laser radar, and the M radar data are in one-to-one correspondence with the M images. The terminal equipment can collect M RTK data through the RTK, and the M RTK data corresponds to the M images one by one. The terminal equipment can determine the space pose information of the terminal equipment by using a laser radar visual inertial fusion method according to M images, N pieces of IMU data, M pieces of radar data and M pieces of RTK data, so that M pieces of space pose information are obtained. The terminal equipment can determine the space pose information of the terminal equipment according to the video data, the IMU data, the radar data and the RTK data, and the accuracy of dynamic acquisition information determination can be improved.
In some embodiments, the terminal device may further include a GPS, a laser radar, and an RTK, where the terminal device may collect M pieces of GPS data through the GPS, where the M pieces of GPS data are in one-to-one correspondence with the M images. The terminal equipment can collect M radar data through the laser radar, and the M radar data are in one-to-one correspondence with the M images. The terminal equipment can collect M RTK data through the RTK, and the M RTK data corresponds to the M images one by one. The terminal equipment can determine the space pose information of the terminal equipment by using a laser radar vision fusion method according to M images, M radar data, M GPS data and M RTK data, so that M space pose information is obtained. The terminal equipment can determine the space pose information of the terminal equipment according to the video data, the GPS data, the radar data and the RTK data, and the accuracy of dynamic acquisition information determination can be improved.
In some embodiments, the terminal device may further include an IMU, a GPS, and an RTK, where the terminal device may collect N IMU data by the IMU, where the N IMU data is IMU data collected by the IMU during a collection time of M images, and N is an integer greater than or equal to M. The terminal equipment can collect M pieces of GPS data through GPS, and the M pieces of GPS data are in one-to-one correspondence with the M images. The terminal equipment can collect M RTK data through the RTK, and the M RTK data corresponds to the M images one by one. The terminal equipment can determine the space pose information of the terminal equipment by using a visual inertia fusion method according to M images, N pieces of IMU data, M pieces of GPS data and M pieces of RTK data, so that M pieces of space pose information are obtained. The terminal equipment can determine the space pose information of the terminal equipment according to the video data, the IMU data, the GPS data and the RTK data, and the accuracy of dynamic acquisition information determination can be improved.
In some embodiments, the terminal device may further include an IMU, a laser radar, a GPS, and an RTK, where the terminal device may collect N IMU data by the IMU, where the N IMU data is IMU data collected by the IMU during a collection time of M images, and N is an integer greater than or equal to M. The terminal equipment can collect M radar data through the laser radar, and the M radar data are in one-to-one correspondence with the M images. The terminal equipment can collect M pieces of GPS data through GPS, and the M pieces of GPS data are in one-to-one correspondence with the M images. The terminal equipment can collect M RTK data through the RTK, and the M RTK data corresponds to the M images one by one. The terminal equipment can determine the space pose information of the terminal equipment by using a laser radar visual inertial fusion method according to M images, N pieces of IMU data, M pieces of radar data, M pieces of GPS data and M pieces of RTK data, so that M pieces of space pose information are obtained. The terminal equipment can determine the space pose information of the terminal equipment according to the video data, the IMU data, the radar data, the GPS data and the RTK data, and the accuracy of determining the dynamic capture information can be improved.
In the case that the terminal device includes a plurality of image acquisition devices, the terminal device may determine spatial pose information of the terminal device according to M images included in each video data in the plurality of video data, to obtain M spatial pose information corresponding to the plurality of video data. It can be seen that each of the plurality of video data corresponds to M spatial pose information, respectively.
In one case, the M pieces of spatial pose information corresponding to the plurality of pieces of video data may be directly used in the subsequent steps. In another case, the terminal device may determine M pieces of target spatial pose information according to the M pieces of spatial pose information corresponding to the plurality of pieces of video data.
The terminal device may divide the images included in the plurality of video data into M groups according to the acquisition time of the images to obtain M image groups, and then determine an average value of spatial pose information corresponding to the images included in each image group as spatial pose information corresponding to each image group to obtain M target spatial pose information.
Step 202 and step 203 may be performed in parallel or in series. When they are performed in series, step 202 may be performed before step 203, or step 203 may be performed before step 202.
204. And according to the M pieces of space pose information, converting the M pieces of local 3D pose information into global 3D pose information, and obtaining M pieces of global 3D pose information.
After obtaining the M local 3D pose information and the M spatial pose information, the terminal device may convert the M local 3D pose information into global 3D pose information according to the M spatial pose information, to obtain M global 3D pose information.
The terminal device may determine the matrix corresponding to the M pieces of global 3D pose information as the product of the matrix corresponding to the M pieces of spatial pose information and the matrix corresponding to the M pieces of local 3D pose information, which may be expressed as:
T3 = T2 * T1
where T1 denotes the matrix corresponding to the M pieces of local 3D pose information, T2 denotes the matrix corresponding to the M pieces of spatial pose information, and T3 denotes the matrix corresponding to the M pieces of global 3D pose information.
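Using homogeneous 4x4 transforms, the conversion can be sketched as follows, assuming T2 is the device-to-world spatial pose for one image and the local 3D pose information gives joint positions in the device frame; this is an illustrative reading of the formula above, not code from the application:

```python
import numpy as np

def to_global_pose(spatial_pose, local_joints):
    """spatial_pose: 4x4 device-to-world transform for one image (T2).
    local_joints: (J, 3) joint positions in the device frame (local 3D pose, T1).
    Returns (J, 3) joint positions in the world frame (global 3D pose, T3)."""
    ones = np.ones((local_joints.shape[0], 1))
    homogeneous = np.hstack([local_joints, ones])          # (J, 4)
    world = (spatial_pose @ homogeneous.T).T               # T3 = T2 * T1 per joint
    return world[:, :3]
```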
In the case that the terminal device includes a plurality of image acquisition devices, in one case, the terminal device may convert the M pieces of local 3D pose information of first video data into global 3D pose information according to the M pieces of spatial pose information corresponding to the first video data, to obtain M pieces of global 3D pose information corresponding to the first video data. The first video data is any one of the plurality of pieces of video data. The terminal device may then determine, for each group of corresponding images, the average value of the global 3D pose information corresponding to the plurality of pieces of video data, to obtain M pieces of target global 3D pose information.
In this way, different pieces of video data and their corresponding data are processed separately, which can improve the accuracy of the data processing and thus the accuracy of the global 3D pose information and of the determined dynamic capture information.
In another case, the terminal device may convert the M target local 3D pose information into global 3D pose information according to the M target space pose information, to obtain M target global 3D pose information.
Therefore, in the method, the data corresponding to different video data are combined and processed, so that the data processing amount can be reduced, and the determination efficiency of the dynamic capture information can be improved.
205. And determining dynamic capturing information of one or more people according to the M pieces of global 3D gesture information.
After obtaining the M pieces of global 3D pose information, the terminal device may determine the dynamic capture information of the one or more persons according to the M pieces of global 3D pose information. Specifically, for each of the one or more persons, the terminal device may take the sequence of that person's global 3D pose information across the M images as that person's dynamic capture information.
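As a sketch of how the per-frame results might be assembled, the dynamic capture information of each person could simply be the time-ordered sequence of that person's global 3D pose information; the identifiers and structure below are assumptions, not specified by the application:

```python
from collections import defaultdict

def collect_motion_capture(global_poses_per_image):
    """global_poses_per_image: list of M dicts {person_id: (J, 3) global pose},
    one dict per image, in acquisition order. Returns, per person, the ordered
    sequence of global 3D poses as that person's dynamic capture information."""
    motion = defaultdict(list)
    for frame_poses in global_poses_per_image:
        for person_id, pose in frame_poses.items():
            motion[person_id].append(pose)
    return dict(motion)
```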
In the dynamic capture information determining method described in Fig. 2, the terminal device can determine the dynamic capture information of a person from video data collected by the image acquisition device integrated on the terminal device itself; no calibration between separate image acquisition devices is needed, and no time synchronization, data coordination, or other processing of video data from multiple image acquisition devices is needed. This reduces the amount of processing and thus improves the efficiency of determining the dynamic capture information.
Referring to fig. 3, fig. 3 is a flow chart of another dynamic capturing information determining method according to an embodiment of the present application. The dynamic capture information determining method can be applied to terminal equipment, and the terminal equipment can comprise an image acquisition device. As shown in fig. 3, the dynamic capture information determining method may include the following steps.
301. Video data including M images is acquired by an image acquisition device.
Wherein a detailed description of step 301 may refer to step 201.
302. And respectively carrying out human body posture estimation on the M images to obtain M pieces of local 3D posture information.
Wherein a detailed description of step 302 may refer to step 202.
303. And determining the space pose information of the terminal equipment according to the M images to obtain M pieces of space pose information.
Wherein a detailed description of step 303 may refer to step 203.
304. And according to the M pieces of space pose information, converting the M pieces of local 3D pose information into global 3D pose information, and obtaining M pieces of global 3D pose information.
Wherein, reference may be made to step 204 for a detailed description of step 304.
305. And optimizing M pieces of local 3D gesture information and/or M pieces of space gesture information according to the M pieces of global 3D gesture information.
After obtaining the M pieces of global 3D gesture information, the terminal equipment can optimize the M pieces of local 3D gesture information according to the M pieces of global 3D gesture information; the M space pose information can be optimized according to the M global 3D pose information; and the M local 3D gesture information and the M space gesture information can be optimized according to the M global 3D gesture information.
For example, when, among the M pieces of global 3D pose information, the change amount of the global 3D pose information of a first joint point of a first person over K consecutive pieces of global 3D pose information is less than or equal to a first change threshold, and the change amount of the K pieces of spatial pose information corresponding to those K pieces of global 3D pose information is less than or equal to a second change threshold, the terminal device may replace the local 3D pose information of the first joint point of the first person in the K pieces of local 3D pose information corresponding to the K pieces of global 3D pose information with the average value of the local 3D pose information of the first joint point over those K pieces, so as to obtain the optimized M pieces of local 3D pose information. The first person is any person included in the M images, the first joint point is any one of the joint points of the first person, and K is an integer greater than 1.
For another example, when, among the M pieces of global 3D pose information, the change amount of the global 3D pose information of the first joint point of the first person over K consecutive pieces of global 3D pose information is less than or equal to the first change threshold, the terminal device may replace the K pieces of spatial pose information corresponding to those K pieces of global 3D pose information with their average value, so as to obtain the optimized M pieces of spatial pose information.
It should be appreciated that the foregoing is an exemplary illustration of optimizing M local 3D pose information and M spatial pose information based on M global 3D pose information, and is not limited to a specific optimization approach.
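Purely as one illustrative realization of the thresholded averaging above (the window length K, both thresholds, and the use of the per-axis peak-to-peak range as the "change amount" are assumptions, not the patent's specification), a short Python sketch:

```python
import numpy as np

def smooth_local_pose(local_xyz, spatial_t, k, pos_thresh, cam_thresh):
    """local_xyz: (M, 3) local 3D positions of one joint point of one person.
    spatial_t: (M, 3) camera translations (spatial pose information).
    Replaces the joint's local pose with the window mean whenever both the joint
    and the camera vary less than the given thresholds over K consecutive frames."""
    out = local_xyz.copy()
    for start in range(len(local_xyz) - k + 1):
        win_joint = local_xyz[start:start + k]
        win_cam = spatial_t[start:start + k]
        if (np.ptp(win_joint, axis=0).max() <= pos_thresh
                and np.ptp(win_cam, axis=0).max() <= cam_thresh):
            out[start:start + k] = win_joint.mean(axis=0)
    return out
```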
306. And determining optimized M pieces of global 3D gesture information according to the optimized M pieces of local 3D gesture information and/or the optimized M pieces of space gesture information.
Under the condition that the terminal equipment optimizes the M pieces of local 3D gesture information, the optimized M pieces of global 3D gesture information can be determined according to the optimized M pieces of local 3D gesture information and the M pieces of space gesture information.
Under the condition that the terminal equipment optimizes the M pieces of space pose information, the optimized M pieces of global 3D pose information can be determined according to the optimized M pieces of space pose information and the M pieces of local 3D pose information.
Under the condition that the terminal equipment optimizes the M pieces of local 3D gesture information and the M pieces of space gesture information, the optimized M pieces of global 3D gesture information can be determined according to the optimized M pieces of space gesture information and the optimized M pieces of local 3D gesture information.
The manner of determining the optimized M pieces of global 3D pose information is the same as the manner of determining the global 3D pose information, and the detailed description may refer to step 204.
In some embodiments, the terminal device may directly optimize the M global 3D pose information according to the M global 3D pose information.
For example, when the change amount of the global 3D pose information of the first joint point of the first person over K consecutive pieces of global 3D pose information is less than or equal to the first change threshold, the terminal device may replace the global 3D pose information of the first joint point of the first person in those K pieces of global 3D pose information with the average value of the global 3D pose information of the first joint point over the K pieces, so as to obtain the optimized M pieces of global 3D pose information.
As another example, the terminal device may replace the global 3D pose information of the first joint point of the first person in the K consecutive pieces of global 3D pose information with the corresponding points on a curve fitted to the global 3D pose information of the first joint point over those K pieces, so as to obtain the optimized M pieces of global 3D pose information; this can improve the smoothness of the joint point's spatial trajectory.
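A hedged sketch of the curve-fitting variant, assuming a simple per-axis polynomial fit over the K consecutive frames (the patent does not prescribe a particular curve model):

```python
import numpy as np

def smooth_joint_by_curve(global_xyz, degree=3):
    """global_xyz: (K, 3) global 3D positions of one joint point over K consecutive
    frames. Fits a low-order polynomial per axis and returns the points on the
    fitted curve, which replace the original global 3D pose information."""
    t = np.arange(len(global_xyz))
    fitted = np.empty_like(global_xyz, dtype=float)
    for axis in range(3):
        coeffs = np.polyfit(t, global_xyz[:, axis], deg=min(degree, len(t) - 1))
        fitted[:, axis] = np.polyval(coeffs, t)
    return fitted
```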
307. And determining dynamic capturing information of one or more people according to the optimized M global 3D gesture information.
Wherein a detailed description of step 307 may refer to step 205.
In the dynamic capture information determining method described in fig. 3, the terminal device can determine the dynamic capture information of a person from the video data collected by the image acquisition device integrated on the terminal device. No calibration between multiple image acquisition devices is needed, and no time synchronization, data co-processing, or similar processing of video data collected by a plurality of image acquisition devices is required, so the processing steps are reduced and the efficiency of determining the dynamic capture information can be improved. In addition, the local 3D pose information and/or the spatial pose information are optimized using the global 3D pose information, and the global 3D pose information is re-determined using the optimized local 3D pose information and/or spatial pose information, which can improve the accuracy of determining the global 3D pose information and, in turn, the accuracy of determining the dynamic capture information.
In some embodiments, the terminal device may control the movement of the target object based on the determined dynamic capture information of the one or more persons.
In the case that the terminal device uses the person's dynamic capture information to control a target object, after determining the dynamic capture information of the one or more persons, the terminal device may control the movement of the target object according to the determined dynamic capture information.
The target object may be a movable device such as a robot, a mechanical device, or the like. The target object may also be a virtual object, such as a virtual person, a virtual animal, a virtual tool, etc.
In this way, the user can control the movement of the target object through his or her own movements, which can improve the user experience.
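As one possible (hypothetical) way to drive a target object from the determined dynamic capture information, the sketch below maps captured joint positions onto a virtual character's rig; the joint names, JOINT_MAP, and the character's set_joint_position API are assumptions made for illustration, not part of the patent:

```python
# Hypothetical mapping from captured joint points to a virtual character's rig.
JOINT_MAP = {"left_elbow": "LeftForeArm", "right_elbow": "RightForeArm"}

def drive_target_object(character, frame_global_pose):
    """frame_global_pose: dict mapping a joint name to its (x, y, z) global
    position for one frame. `character` is assumed to expose
    set_joint_position(rig_joint, xyz)."""
    for joint, rig_joint in JOINT_MAP.items():
        if joint in frame_global_pose:
            character.set_joint_position(rig_joint, frame_global_pose[joint])
```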
In some embodiments, the terminal device may output dynamic capture information for one or more persons.
The terminal device may display the dynamic capture information of the one or more persons, or may output it to other devices, where it can be used to drive target objects such as virtual characters and robots, or to produce MR (mixed reality) videos.
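Output to other devices could take many forms; one hedged example (the JSON layout and field names are assumptions) serializes the per-frame dynamic capture information so that a downstream consumer, such as a robot controller or an MR video pipeline, can read it:

```python
import json

def export_capture(frames, path):
    """frames: list of dicts, one per image, for example
    {"timestamp": 0.033, "persons": [{"id": 0, "joints": {"head": [x, y, z]}}]}."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"frames": frames}, f)
```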
It is to be understood that the same or corresponding information in the different embodiments above may be referred to each other.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a dynamic capture information determining apparatus according to an embodiment of the present application. The dynamic capture information determining device can be applied to or deployed in terminal equipment, and the terminal equipment comprises an image acquisition device. As shown in fig. 4, the dynamic capture information determining apparatus may include:
An acquisition unit 401, configured to acquire video data by using an image acquisition device, where the video data includes M images, the images include one or more persons, and M is an integer greater than 1;
An estimation unit 402, configured to perform human body pose estimation on M images respectively, to obtain M pieces of local 3D pose information, where the M images are in one-to-one correspondence with the M pieces of local 3D pose information;
a determining unit 403, configured to determine spatial pose information of the terminal device according to M images, to obtain M pieces of spatial pose information, where the M images correspond to the M pieces of spatial pose information one by one;
The conversion unit 404 is configured to convert the M pieces of local 3D pose information into global 3D pose information according to the M pieces of spatial pose information, to obtain M pieces of global 3D pose information;
the determining unit 403 is further configured to determine dynamic capturing information of one or more people according to the M global 3D pose information.
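For orientation only, a minimal Python sketch of how the units in fig. 4 might be wired together (class, method, and parameter names are assumptions, not the patent's implementation):

```python
class MotionCaptureDeterminer:
    """Mirrors the unit structure of fig. 4: acquire -> estimate local pose ->
    determine spatial pose -> convert to global pose -> determine capture info."""

    def __init__(self, camera, pose_estimator, localizer):
        self.camera = camera                  # acquisition unit 401
        self.pose_estimator = pose_estimator  # estimation unit 402
        self.localizer = localizer            # determining unit 403 (spatial pose)

    def run(self):
        images = self.camera.capture()                        # M images
        local_poses = [self.pose_estimator(im) for im in images]
        spatial_poses = [self.localizer(im) for im in images]
        # conversion unit 404: compose 4x4 matrices to obtain global 3D poses
        global_poses = [s @ l for s, l in zip(spatial_poses, local_poses)]
        return global_poses                   # basis of the dynamic capture info
```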
In some embodiments, the determining unit 403 determines the spatial pose information of the terminal device according to the M images, and obtaining the M spatial pose information includes:
And determining the space pose information of the terminal equipment according to the M images by a visual positioning method or a visual odometer method to obtain M pieces of space pose information.
In some embodiments, the terminal device further comprises an IMU;
the acquisition unit 401 is further configured to acquire N IMU data by using the IMU, where the N IMU data are IMU data acquired by the IMU during an acquisition time of M images, and N is an integer greater than or equal to M;
the determining unit 403 determines, according to the M images, spatial pose information of the terminal device, where obtaining the M spatial pose information includes:
And determining the space pose information of the terminal equipment by using a visual inertial odometer method according to the M images and the N IMU data to obtain M space pose information.
In some embodiments, the terminal device further comprises an IMU and a lidar;
the acquisition unit 401 is further configured to acquire N IMU data by using the IMU, where the N IMU data are IMU data acquired by the IMU during an acquisition time of M images, and N is an integer greater than or equal to M;
the acquisition unit 401 is further configured to acquire M pieces of radar data through a laser radar, where the M pieces of radar data are in one-to-one correspondence with the M images;
the determining unit 403 determines, according to the M images, spatial pose information of the terminal device, where obtaining the M spatial pose information includes:
And determining the space pose information of the terminal equipment by using a laser radar inertial odometer method according to the M images, the N IMU data and the M radar data to obtain M space pose information.
In some embodiments, the dynamic capture information determining apparatus may further include:
The optimizing unit is used for optimizing the M pieces of local 3D gesture information and/or the M pieces of space gesture information according to the M pieces of global 3D gesture information;
the determining unit 403 is further configured to determine, according to the optimized M pieces of local 3D pose information and/or the M pieces of spatial pose information, the optimized M pieces of global 3D pose information;
The determining unit 403 determines dynamic capture information of one or more persons according to the M global 3D pose information includes:
And determining target dynamic capture information of one or more people according to the optimized M global 3D gesture information.
In some embodiments, the dynamic capture information determining apparatus may further include:
and the control unit is used for controlling the movement of the target object according to the dynamic capturing information of one or more people.
In some embodiments, the dynamic capture information determining apparatus may further include:
And the output unit is used for outputting dynamic capturing information of one or more persons.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working processes of the acquisition unit 401, the estimation unit 402, the determination unit 403, the conversion unit 404, the optimization unit, the control unit and the output unit described above may refer to corresponding processes in the foregoing method embodiments, which are not repeated herein.
In several embodiments provided by the present application, the coupling of the elements to each other may be electrical, mechanical, or other.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 5, the terminal device may include a processor 501 and a memory 502. Memory 502 may store one or more computer programs. The one or more computer programs are configured to perform the methods as described in the foregoing method embodiments. The memory 502 may be separate or integrated with the processor 501.
The processor 501 may include one or more processing cores. The processor 501 connects various parts of the terminal device using various interfaces and lines, and performs the various functions of the terminal device and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 502 and by invoking the data stored in the memory 502. Optionally, the processor 501 may be implemented in at least one hardware form of digital signal processing (DSP), field programmable gate array (FPGA), or programmable logic array (PLA). The processor 501 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communication. It will be appreciated that the modem may also not be integrated into the processor 501 and may instead be implemented by a separate communication chip.
The memory 502 may include random access memory (RAM) or read-only memory (ROM). The memory 502 may be used to store instructions, programs, code sets, or instruction sets. The memory 502 may include a stored program area and a stored data area, where the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The stored data area may also store data created by the terminal device in use (such as a phonebook, audio and video data, and chat-record data), etc.
The processor 501 may be adapted to perform the various operations performed by the terminal device in the method embodiments described above when the computer program instructions stored in the memory 502 are executed. The specific implementation of these operations may be found in the previous embodiments, and will not be described here in detail.
An embodiment of the present application also discloses a computer readable storage medium. The computer readable storage medium stores computer program code, which can be invoked by a processor to perform the various operations of the method embodiments described above. The specific implementation of each operation may be found in the previous embodiments and will not be repeated here.
The computer readable storage medium may be an electronic memory such as a flash memory, an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a hard disk, or a ROM. Optionally, the computer readable storage medium may include a non-volatile computer readable medium (a non-transitory computer-readable storage medium). The computer readable storage medium has storage space for program code to perform any of the method steps described above. The program code can be read from or written into one or more computer program products, and may, for example, be compressed in a suitable form.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. The dynamic capture information determining method is characterized in that the method is applied to terminal equipment, the terminal equipment comprises an image acquisition device, and the method comprises the following steps:
collecting video data by the image collecting device, wherein the video data comprises M images, the images comprise one or more persons, and M is an integer greater than 1;
respectively estimating human body postures of the M images to obtain M pieces of local 3D posture information, wherein the M images are in one-to-one correspondence with the M pieces of local 3D posture information;
Determining the space pose information of the terminal equipment according to the M images to obtain M pieces of space pose information, wherein the M images are in one-to-one correspondence with the M pieces of space pose information;
According to the M pieces of space pose information, converting the M pieces of local 3D pose information into global 3D pose information to obtain M pieces of global 3D pose information;
And determining the dynamic capturing information of the one or more people according to the M pieces of global 3D gesture information.
2. The method of claim 1, wherein determining the spatial pose information of the terminal device according to the M images, and obtaining M spatial pose information comprises:
And determining the space pose information of the terminal equipment through a visual positioning method or a visual odometer method according to the M images to obtain M pieces of space pose information.
3. The method of claim 1, wherein the terminal device further comprises an inertial measurement unit, IMU, the method further comprising:
Acquiring N pieces of IMU data by the IMU, wherein the N pieces of IMU data are acquired by the IMU in the acquisition time of the M images, and N is an integer greater than or equal to M;
the determining the spatial pose information of the terminal equipment according to the M images, and obtaining M pieces of spatial pose information comprises:
And determining the space pose information of the terminal equipment by using a visual inertial odometer method according to the M images and the N IMU data to obtain M space pose information.
4. The method of claim 1, wherein the terminal device further comprises an IMU and a lidar, the method further comprising:
Acquiring N pieces of IMU data by the IMU, wherein the N pieces of IMU data are acquired by the IMU in the acquisition time of the M images, and N is an integer greater than or equal to M;
m radar data are acquired through the laser radar, and the M radar data are in one-to-one correspondence with the M images;
the determining the spatial pose information of the terminal equipment according to the M images, and obtaining M pieces of spatial pose information comprises:
And determining the space pose information of the terminal equipment by using a laser radar inertial odometer method or a laser radar visual inertial fusion method according to the M images, the N IMU data and the M radar data to obtain M space pose information.
5. The method according to any one of claims 1-4, further comprising:
optimizing the M pieces of local 3D gesture information and/or the M pieces of space gesture information according to the M pieces of global 3D gesture information;
determining optimized M pieces of global 3D gesture information according to the optimized M pieces of local 3D gesture information and/or the M pieces of space gesture information;
The determining the dynamic capture information of the one or more people according to the M global 3D gesture information comprises:
and determining target dynamic capture information of the one or more people according to the optimized M global 3D gesture information.
6. The method according to any one of claims 1-4, further comprising:
And controlling the movement of the target object according to the dynamic capturing information of the one or more people.
7. The method according to any one of claims 1-4, further comprising:
outputting the dynamic capturing information of the one or more persons.
8. A dynamic capture information determining apparatus, wherein the apparatus is applied to a terminal device, the terminal device includes an image acquisition apparatus, the apparatus includes:
the image acquisition device is used for acquiring video data, wherein the video data comprises M images, the images comprise one or more persons, and M is an integer greater than 1;
The estimating unit is used for respectively estimating the human body postures of the M images to obtain M pieces of local 3D posture information, and the M images are in one-to-one correspondence with the M pieces of local 3D posture information;
The determining unit is used for determining the space pose information of the terminal equipment according to the M images to obtain M pieces of space pose information, and the M images are in one-to-one correspondence with the M pieces of space pose information;
the conversion unit is used for converting the M pieces of local 3D gesture information into global 3D gesture information according to the M pieces of space gesture information to obtain M pieces of global 3D gesture information;
The determining unit is further configured to determine dynamic capturing information of the one or more people according to the M global 3D pose information.
9. A terminal device comprising a processor and a memory, the processor being configured to invoke a computer program stored in the memory to implement the method according to any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program or computer instructions, which, when executed by a processor, implement the method according to any of claims 1-7.