CN111222437A - Human body posture estimation method based on multi-depth image feature fusion - Google Patents
Human body posture estimation method based on multi-depth image feature fusion
- Publication number
- CN111222437A (application CN201911403474.XA)
- Authority
- CN
- China
- Prior art keywords
- human body
- joint
- joint point
- depth image
- sensor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
A human body posture estimation method based on multi-depth image feature fusion adopts a distributed fusion strategy to solve the problem of multi-sensor human body posture estimation in complex scenes. By fusing human body posture information from multiple 3D vision sensors, the method effectively overcomes factors that degrade posture estimation, such as view occlusion, misrecognition of human body parts and sudden motion changes, and thereby improves the accuracy and robustness of human body posture estimation.
Description
Technical Field
The invention belongs to the field of human body posture estimation, and particularly relates to a human body posture estimation method based on multi-depth image feature fusion.
Background
With the continuous development of 3D vision and artificial intelligence technologies, 3D vision sensors are being applied ever more widely and play an increasingly important role in human body posture estimation. Human body posture estimation based on 3D vision has been applied to behavior recognition, behavior prediction, human-computer interaction, video surveillance, virtual reality and related fields, for example in rehabilitation training of injured patients, motion analysis for athletic training, and character animation for 3D films.
At present, human body posture estimation based on 3D vision is relatively mature: depth image information allows the foreground and background to be segmented rapidly, the joint points of the human body can be identified with random-forest-based methods, and the 3D human body posture can then be computed and output. However, many factors, such as visual occlusion, misrecognition of human body parts, sudden motion changes and dynamic changes in the environment, introduce large deviations in the measurements, so that it is difficult to capture complete and reliable posture information with a single 3D vision sensor. To make the posture estimation system robust to adverse factors such as occlusion and environmental change, an effective approach is to fuse the posture information from multiple 3D vision sensors into a complete and reliable estimate. However, existing 3D visual human body posture estimation lacks a technique that can robustly and effectively fuse the information of multiple 3D vision sensors to solve posture estimation in complex scenes.
Disclosure of Invention
To overcome the poor robustness of a single 3D vision sensor to occlusion, sudden motion changes, dynamic scene changes and similar factors, the invention provides a human body posture estimation method based on multi-depth image feature fusion: a distributed fusion method fuses the information of multiple 3D vision sensors to obtain an estimate of the human body posture, effectively improving the accuracy and robustness of the estimation.
The technical solution adopted by the invention to solve this technical problem is as follows:
A human body posture estimation method based on multi-depth image feature fusion comprises the following steps:
Step 1) determining a world coordinate system and the rotation-translation relationship between each camera coordinate system and the world coordinate system, establishing a kinematic model of each human body joint point and a measurement model of each sensor, and determining the process noise covariance Q_{i,k} of each joint point, the measurement noise covariance of each sensor and related parameters, as well as the initial state of each joint point under each sensor;
Step 2) according to the kinematic model of each joint point, calculating the state prediction value of each human body joint point under each sensor at time k and its covariance;
Step 3) reading a depth image from each 3D vision sensor, identifying and calculating the position of each human body joint point with a random forest method on the depth image, and calculating the residual under each sensor and its covariance;
Step 4) calculating the Kalman filter gain of each human body joint point under each sensor at time k, as well as the state estimate of each joint point at time k and its covariance;
Step 5) transforming the state estimate of each human body joint point under each sensor and its covariance into the world coordinate system, and recording the transformed estimate and covariance for each sensor;
Step 6) fusing the world-frame state estimates and covariances of each human body joint point from all sensors with a distributed fusion method, and calculating the fused state estimate of each joint point at time k and its covariance.
Steps 2) to 6) are executed repeatedly to complete the posture estimation of each human body joint point, yielding the human body posture estimate that fuses the multi-depth image features.
In step 1), i denotes the index of a human body joint point, i = 1, ..., 25; the joint points comprise the head joint, thoracic vertebra joint, shoulder joints, elbow joints, wrist joints and the other human body joint points to be estimated. l denotes the index of a vision sensor, l = 1, 2, ..., n, where n ≥ 2 is the number of sensors, and k is the discrete time index.
In step 1), the state of a human body joint point is its position along the x, y and z axes of the corresponding camera coordinate system.
In step 3), the residual is the difference between the measured value of each human body joint point under each sensor and its predicted value.
In step 3), human body parts are identified from the read depth image using a random forest method, and the position information of each human body joint point is calculated on that basis.
In step 5), the superscript l denotes the estimation result of sensor l in the world coordinate system.
In step 6), the superscript f denotes the fused estimation result.
The invention has the following beneficial effects: to address the occlusion, environmental change and other shortcomings of capturing the human body posture with a single 3D vision sensor, a distributed fusion method fuses the information of multiple 3D vision sensors to estimate the human body posture, reducing the adverse effects of complex environments and effectively improving the accuracy and robustness of the estimation.
Drawings
Fig. 1 is a schematic diagram of the human body joint points in a depth image.
Fig. 2 is a flow chart of human body posture estimation based on multi-depth image feature fusion.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, a human body posture estimation method based on multi-depth image feature fusion includes the following steps:
Step 1) determining a world coordinate system and the rotation-translation relationship between each camera coordinate system and the world coordinate system, establishing a kinematic model of each human body joint point and a measurement model of each sensor, and determining the process noise covariance Q_{i,k} of each joint point, the measurement noise covariance of each sensor and related parameters, as well as the initial state of each joint point under each sensor;
Step 2) according to the kinematic model of each joint point, calculating the state prediction value of each human body joint point under each sensor at time k and its covariance;
Step 3) reading a depth image from each 3D vision sensor, identifying and calculating the position of each human body joint point with a random forest method on the depth image, and calculating the residual under each sensor and its covariance;
Step 4) calculating the Kalman filter gain of each human body joint point under each sensor at time k, as well as the state estimate of each joint point at time k and its covariance;
Step 5) transforming the state estimate of each human body joint point under each sensor and its covariance into the world coordinate system, and recording the transformed estimate and covariance for each sensor;
Step 6) fusing the world-frame state estimates and covariances of each human body joint point from all sensors with a distributed fusion method, and calculating the fused state estimate of each joint point at time k and its covariance.
Steps 2) to 6) are executed repeatedly to complete the posture estimation of each human body joint point, yielding the human body posture estimate that fuses the multi-depth image features.
As shown in fig. 1, the human body posture estimation problem is decomposed into position estimation problems for the individual joint points, which comprise 25 human joint points such as the head joint, thoracic vertebra joint, shoulder joints, elbow joints and wrist joints. The flow of human body posture estimation based on multi-depth image feature fusion is shown in fig. 2. First, the camera coordinate system of each sensor is calibrated against the world coordinate system, and the rotation-translation relationship between each camera coordinate system and the world coordinate system is determined. The kinematic model of each joint point and the measurement model of each sensor are then established:
x_{i,k} = x_{i,k-1} + w_{i,k}    (1)
where k = 1, 2, ... is the discrete time index and x_{i,k} is the state of human body joint point i, with i = 1, 2, ..., m the joint index and m = 25; the components of x_{i,k} are the coordinates of joint point i on the x, y and z axes at time k, and w_{i,k} is zero-mean Gaussian white noise with covariance Q_{i,k}. The measurement of each joint point in the camera coordinate system of sensor l consists of the measured coordinates of the joint point on the x, y and z axes at time k, corrupted by zero-mean Gaussian white measurement noise with known covariance, l = 1, ..., n; the measurement noises of different sensors are mutually uncorrelated and are uncorrelated with w_{i,k}. The initial state and covariance of each joint point under each sensor are also determined.
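The model above is a random-walk (constant-position) state model with a direct 3-D position measurement per sensor. As an illustration only, a minimal Python sketch of the per-joint, per-sensor filter state this implies is given below; the class name, parameter names and noise values are hypothetical placeholders, not taken from the patent.

```python
import numpy as np

class JointFilter:
    """Per-joint, per-sensor Kalman filter state for the random-walk model
    x_{i,k} = x_{i,k-1} + w_{i,k}, with direct 3-D position measurements."""

    def __init__(self, x0, P0, Q, R):
        self.x = np.asarray(x0, dtype=float)  # joint position [x, y, z] in the camera frame
        self.P = np.asarray(P0, dtype=float)  # 3x3 state covariance
        self.Q = np.asarray(Q, dtype=float)   # 3x3 process noise covariance Q_{i,k}
        self.R = np.asarray(R, dtype=float)   # 3x3 measurement noise covariance of this sensor

# Example initialisation for one joint under one sensor (placeholder values).
f = JointFilter(x0=[0.0, 0.0, 2.0],
                P0=np.eye(3) * 0.1,
                Q=np.eye(3) * 1e-3,
                R=np.eye(3) * 5e-3)
```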
Secondly, the state prediction value of each human body joint point under each sensor and its covariance are calculated, together with the residual and its covariance. Thirdly, the Kalman filter gain, the state estimate and its covariance are calculated for each joint point under each sensor. The state estimate of each joint point and its covariance are then transformed into the world coordinate system. Finally, the fused state estimate of each joint point and its covariance are calculated.
According to the kinematic model of each joint point and the state estimate of the previous time instant and its covariance, the state prediction value of each human body joint point under each sensor and its covariance, as well as the residual and its covariance, are calculated.
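The prediction and residual formulas themselves are not reproduced in this text. Under the random-walk model (1) with a direct position measurement, the standard Kalman prediction and innovation step they describe can be sketched as follows (an assumed reconstruction, not the patent's own equations):

```python
import numpy as np

def predict_and_residual(x_prev, P_prev, Q, z, R):
    """Kalman prediction and innovation for x_k = x_{k-1} + w_k, z_k = x_k + v_k."""
    x_pred = np.asarray(x_prev, dtype=float)  # random-walk model: prediction equals previous estimate
    P_pred = P_prev + Q                       # predicted covariance
    residual = z - x_pred                     # innovation: measured joint position minus prediction
    S = P_pred + R                            # innovation covariance
    return x_pred, P_pred, residual, S
```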
calculating Kalman filtering gain of each joint point of human body under each sensorAnd obtaining the state estimation value of each joint point of the human body at the moment kAnd its covariance
The state estimate of each human body joint point under each sensor and its covariance are then transformed into the world coordinate system, using the rotation-translation relationship between each camera coordinate system and the world coordinate system determined by calibration.
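A sketch of this camera-to-world conversion, assuming the calibration of sensor l is given as a rotation matrix R_l and a translation vector t_l (hypothetical variable names):

```python
import numpy as np

def to_world_frame(x_cam, P_cam, R_l, t_l):
    """Transform a joint estimate and its covariance from the camera frame of
    sensor l into the world frame using the calibrated rigid transform (R_l, t_l)."""
    x_world = R_l @ x_cam + t_l    # rotate and translate the position estimate
    P_world = R_l @ P_cam @ R_l.T  # rotate the covariance (translation has no effect)
    return x_world, P_world
```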
The world-frame state estimates of each human body joint point from all sensors are fused with a distributed fusion method to compute the fused state estimate and its covariance.
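The fusion weights used by the patent are not visible in this text. One common distributed fusion rule consistent with the description, shown purely as an assumed example, weights each sensor's world-frame estimate by its inverse covariance:

```python
import numpy as np

def fuse_estimates(estimates, covariances):
    """Inverse-covariance-weighted fusion of per-sensor world-frame estimates
    (an assumed fusion rule; the patent's own formula is not reproduced here)."""
    info = sum(np.linalg.inv(P) for P in covariances)  # total information matrix
    P_fused = np.linalg.inv(info)                      # fused covariance
    x_fused = P_fused @ sum(np.linalg.inv(P) @ x
                            for x, P in zip(estimates, covariances))
    return x_fused, P_fused
```

This simple rule treats the per-sensor estimation errors as uncorrelated; covariance intersection would be a more conservative choice when cross-correlations between sensors cannot be ruled out.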
Formulas 3)–13) are executed repeatedly to complete the state estimation of all human body joint points, thereby obtaining the human body posture estimate based on multi-depth image feature fusion.
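For illustration, a hypothetical per-frame loop tying the above sketches together; function and variable names are assumptions, not from the patent:

```python
def process_frame(filters, measurements, calibrations):
    """One time step k.
    filters[i][l]: JointFilter for joint i under sensor l;
    measurements[i][l]: 3-D position of joint i measured by sensor l
    (from the random-forest joint detector on the depth image);
    calibrations[l]: (R_l, t_l) camera-to-world transform of sensor l."""
    fused_pose = []
    for i, joint_filters in enumerate(filters):
        world_estimates, world_covariances = [], []
        for l, f in enumerate(joint_filters):
            x_pred, P_pred, res, S = predict_and_residual(f.x, f.P, f.Q,
                                                          measurements[i][l], f.R)
            f.x, f.P, _ = kalman_update(x_pred, P_pred, res, S)
            R_l, t_l = calibrations[l]
            xw, Pw = to_world_frame(f.x, f.P, R_l, t_l)
            world_estimates.append(xw)
            world_covariances.append(Pw)
        fused_pose.append(fuse_estimates(world_estimates, world_covariances))
    return fused_pose  # fused (position, covariance) per joint for this frame
```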
Claims (7)
1. A human body posture estimation method based on multi-depth image feature fusion, characterized by comprising the following steps:
Step 1) determining a world coordinate system and the rotation-translation relationship between each camera coordinate system and the world coordinate system, establishing a kinematic model of each human body joint point and a measurement model of each sensor, and determining the process noise covariance Q_{i,k} of each joint point, the measurement noise covariance of each sensor and related parameters, as well as the initial state of each joint point under each sensor;
Step 2) according to the kinematic model of each joint point, calculating the state prediction value of each human body joint point under each sensor at time k and its covariance;
Step 3) reading a depth image from each 3D vision sensor, identifying and calculating the position of each human body joint point with a random forest method on the depth image, and calculating the residual under each sensor and its covariance;
Step 4) calculating the Kalman filter gain of each human body joint point under each sensor at time k, as well as the state estimate of each joint point at time k and its covariance;
Step 5) transforming the state estimate of each human body joint point under each sensor and its covariance into the world coordinate system, and recording the transformed estimate and covariance for each sensor;
Step 6) fusing the world-frame state estimates and covariances of each human body joint point from all sensors with a distributed fusion method, and calculating the fused state estimate of each joint point at time k and its covariance;
Steps 2) to 6) are executed repeatedly to complete the posture estimation of each human body joint point, yielding the human body posture estimate that fuses the multi-depth image features.
2. The human body posture estimation method based on multi-depth image feature fusion as claimed in claim 1, characterized in that: in step 1), i denotes the index of a human body joint point, i = 1, ..., 25; the joint points comprise the head joint, thoracic vertebra joint, shoulder joints, elbow joints, wrist joints and the other human body joint points to be estimated; l denotes the index of a vision sensor, l = 1, 2, ..., n, where n ≥ 2 is the number of sensors.
3. The human body posture estimation method based on multi-depth image feature fusion as claimed in claim 1 or 2, characterized in that: in step 1), the state of a human body joint point is its position along the x, y and z axes of the corresponding camera coordinate system.
4. The human body posture estimation method based on multi-depth image feature fusion as claimed in claim 1 or 2, characterized in that: in step 3), the residual is the difference between the measured value of each human body joint point under each sensor and its predicted value.
5. The human body posture estimation method based on multi-depth image feature fusion as claimed in claim 1 or 2, characterized in that: in step 3), human body parts are identified from the read depth image using a random forest method, and the position information of each human body joint point is calculated on that basis.
6. The human body posture estimation method based on multi-depth image feature fusion as claimed in claim 1 or 2, characterized in that: in step 5), the superscript l denotes the estimation result of sensor l in the world coordinate system.
7. The human body posture estimation method based on multi-depth image feature fusion as claimed in claim 1 or 2, characterized in that: in step 6), the superscript f denotes the fused estimation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911403474.XA CN111222437A (en) | 2019-12-31 | 2019-12-31 | Human body posture estimation method based on multi-depth image feature fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911403474.XA CN111222437A (en) | 2019-12-31 | 2019-12-31 | Human body posture estimation method based on multi-depth image feature fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111222437A true CN111222437A (en) | 2020-06-02 |
Family
ID=70827938
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911403474.XA Pending CN111222437A (en) | 2019-12-31 | 2019-12-31 | Human body posture estimation method based on multi-depth image feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111222437A (en) |
- 2019-12-31: CN application CN201911403474.XA filed, published as CN111222437A; status Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130250050A1 (en) * | 2012-03-23 | 2013-09-26 | Objectvideo, Inc. | Video surveillance systems, devices and methods with improved 3d human pose and shape modeling |
CN106096565A (en) * | 2016-06-16 | 2016-11-09 | 山东大学 | Mobile robot based on sensing network and the task cooperative method of static sensor |
CN106127119A (en) * | 2016-06-16 | 2016-11-16 | 山东大学 | Joint probabilistic data association method based on coloured image and depth image multiple features |
CN106897670A (en) * | 2017-01-19 | 2017-06-27 | 南京邮电大学 | A kind of express delivery violence sorting recognition methods based on computer vision |
CN108549876A (en) * | 2018-04-20 | 2018-09-18 | 重庆邮电大学 | The sitting posture detecting method estimated based on target detection and human body attitude |
CN108871337A (en) * | 2018-06-21 | 2018-11-23 | 浙江工业大学 | Object pose estimation method under circumstance of occlusion based on multiple vision sensor distributed information fusion |
CN110097639A (en) * | 2019-03-18 | 2019-08-06 | 北京工业大学 | A kind of 3 D human body Attitude estimation method |
CN110174907A (en) * | 2019-04-02 | 2019-08-27 | 诺力智能装备股份有限公司 | A kind of human body target follower method based on adaptive Kalman filter |
CN110530365A (en) * | 2019-08-05 | 2019-12-03 | 浙江工业大学 | A kind of estimation method of human posture based on adaptive Kalman filter |
Non-Patent Citations (2)
Title |
---|
HO YUB JUNG et al.: "Random tree walk toward instantaneous 3D human pose estimation" *
TANG Xinyu, SONG Aiguo: "Human body posture estimation and its application in rehabilitation training scenario interaction" *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112131928A (en) * | 2020-08-04 | 2020-12-25 | 浙江工业大学 | Human body posture real-time estimation method based on RGB-D image feature fusion |
CN112131928B (en) * | 2020-08-04 | 2024-06-18 | 浙江工业大学 | Human body posture real-time estimation method based on RGB-D image feature fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109255813B (en) | Man-machine cooperation oriented hand-held object pose real-time detection method | |
CN110530365B (en) | Human body attitude estimation method based on adaptive Kalman filtering | |
JP4148281B2 (en) | Motion capture device, motion capture method, and motion capture program | |
CN113706699B (en) | Data processing method and device, electronic equipment and computer readable storage medium | |
Yu et al. | HeadFusion: 360${^\circ} $ Head Pose Tracking Combining 3D Morphable Model and 3D Reconstruction | |
CN112131928B (en) | Human body posture real-time estimation method based on RGB-D image feature fusion | |
CN113158459A (en) | Human body posture estimation method based on visual and inertial information fusion | |
CN113077519A (en) | Multi-phase external parameter automatic calibration method based on human skeleton extraction | |
CN117671738B (en) | Human body posture recognition system based on artificial intelligence | |
CN102156994B (en) | Joint positioning method for single-view unmarked human motion tracking | |
CN111178201A (en) | Human body sectional type tracking method based on OpenPose posture detection | |
CN111222437A (en) | Human body posture estimation method based on multi-depth image feature fusion | |
CN113033501A (en) | Human body classification method and device based on joint quaternion | |
CN111241936A (en) | Human body posture estimation method based on depth and color image feature fusion | |
CN115205737B (en) | Motion real-time counting method and system based on transducer model | |
Li et al. | 3D human pose tracking approach based on double Kinect sensors | |
CN115050095A (en) | Human body posture prediction method based on Gaussian process regression and progressive filtering | |
CN115205750A (en) | Motion real-time counting method and system based on deep learning model | |
Qi et al. | 3D human pose tracking approach based on double Kinect sensors | |
Cordea et al. | 3-D head pose recovery for interactive virtual reality avatars | |
Henning et al. | Bodyslam++: Fast and tightly-coupled visual-inertial camera and human motion tracking | |
Panduranga et al. | Dynamic hand gesture recognition system: a short survey | |
Chen et al. | An integrated sensor network method for safety management of construction workers | |
Ryu et al. | Skeleton-based Human Action Recognition Using Spatio-Temporal Geometry (ICCAS 2019) | |
WO2007110731A1 (en) | Image processing unit and image processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||