CN109035305A - Indoor human body detection and tracking method based on RGB-D under low viewing angle conditions - Google Patents

Indoor human body detection and tracking method based on RGB-D under low viewing angle conditions Download PDF

Info

Publication number
CN109035305A
CN109035305A CN201810908661.2A CN201810908661A CN109035305A CN 109035305 A CN109035305 A CN 109035305A CN 201810908661 A CN201810908661 A CN 201810908661A CN 109035305 A CN109035305 A CN 109035305A
Authority
CN
China
Prior art keywords
cluster
point
tracking
point cloud
human body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810908661.2A
Other languages
Chinese (zh)
Other versions
CN109035305B (en)
Inventor
袁泽慧
段荣杰
安晓红
李世中
张亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North University of China
Original Assignee
North University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North University of China filed Critical North University of China
Priority to CN201810908661.2A priority Critical patent/CN109035305B/en
Publication of CN109035305A publication Critical patent/CN109035305A/en
Application granted granted Critical
Publication of CN109035305B publication Critical patent/CN109035305B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to an indoor human body detection and tracking method based on RGB-D under low viewing angle conditions, belonging to the field of human detection. The method comprises the following steps: a point cloud is acquired with an Asus Xtion Pro to obtain a 3D point cloud; the 3D point cloud is denoised and downsampled; the ground is detected and removed; 3D clustering is performed using the Euclidean distance between points; the HOG feature of each cluster is computed and fed to a pre-trained binary SVM soft classifier, and clusters with high HOG confidence are classified as people, thereby achieving human detection; finally, human tracking is realised using a joint likelihood probability composed of colour consistency and distance consistency during data association. The method has high precision and is widely applicable to indoor human detection and tracking under low viewing angle conditions.

Description

Indoor human body detection and tracking method based on RGB-D under low viewing angle conditions
Technical field
The present invention relates to an indoor human body detection and tracking method based on RGB-D under low viewing angle conditions, and belongs to the technical field of human detection.
Background technique
Human detection and tracking are key to a mobile robot executing tasks in an indoor environment. A mobile robot must be able to distinguish humans from other obstacles in order to adjust its trajectory according to its task; for example, a service robot must provide help to a specific person in a specific environment.
At present, some methods realise human detection and tracking simply by using an RGB-D depth camera or radar ranging. With the appearance of RGB-D depth cameras such as the Microsoft Kinect or Asus Xtion Pro, which can capture 640 × 480 pixel images at 30 frames/s and consume little power, these sensors have in recent years been widely used in fields such as robot 3D perception, indoor positioning and recognition.
There are already some fairly successful human detection algorithms; however, these algorithms all rely on the whole body of a person, especially the head, being visible. In some cases, for example when the robot is very close to the object, a large part of the object lies outside the sensing range of the sensor. Likewise, when an RGB-D sensor is mounted on a small robot such as a Turtlebot2, its line of sight is very close to the ground, so only the lower body of a person, or only the legs, can be observed. Human detection and tracking at such low viewing angles is very difficult, mainly because the principal features are lost and there is no obvious feature to distinguish a human body from other objects; for example, table legs, chair legs and human legs cannot be distinguished by the usual features. When the RGB-D sensor is mounted on a Turtlebot2 and a person is very close to the robot (distance < 100 cm), in most cases the body is visible from the feet to the waist. Based on this observation, we propose a human detection and tracking algorithm for low viewing angle conditions.
Summary of the invention
To solve the technical problems of the prior art, the present invention provides an indoor human body detection and tracking method based on RGB-D under low viewing angle conditions. Objects on the ground are clustered within a limited height, HOG features combined with an SVM classifier are used to distinguish people from other objects, and the human body is then tracked using a joint likelihood probability composed of colour consistency and distance consistency, giving a high accuracy rate.
To achieve the above object, the technical scheme adopted by the invention is an indoor human body detection and tracking method based on RGB-D under low viewing angle conditions, comprising the following steps.
Step a. Data acquisition and preprocessing
A point cloud is acquired with an Asus Xtion Pro to obtain a dense 3D point cloud; the dense 3D point cloud is denoised with a pass-through filter, and the denoised 3D point cloud is downsampled with a three-dimensional voxel grid filter;
Step b. Ground detection and removal
The ground in the 3D point cloud processed in step a is detected using a RANSAC-based least squares method. For every frame after the first, the ground parameters detected in the previous frame are used as the initial parameters of the next frame; starting from the initially given ground parameters and a preset initial distance threshold, ground detection is performed on the current frame with the RANSAC method, and each three-dimensional point belonging to the ground is removed from the 3D point cloud according to its index. For the first frame, the relative relationship between the RGB-D sensor and the ground can be determined from the sensor's mounting position on the robot, which gives the initial ground parameters;
Step c. Clustering
The point cloud data within 130 cm of the ground is taken from the 3D point cloud with the ground removed, and 3D clustering is then performed using the Euclidean distance between points: when the Euclidean distance between a pair of points is less than a predefined distance threshold, the two points are defined as belonging to the same class. Clustering requires two initial thresholds: the minimum distance between two points belonging to the same cluster, and the minimum number of points required to form a cluster;
Step d. HOG+SVM classification
The point cloud inside each bounding box obtained after 3D clustering is projected onto the RGB image, the HOG descriptor of the resulting image patch is computed, the HOG descriptor is fed to a pre-trained SVM classifier, and the HOG confidence of each cluster is calculated; when the computed HOG confidence is higher than the set threshold, the cluster is judged to be a person, and otherwise it is not;
Step e. Tracking
The human clusters obtained are used as the input of the tracking module, i.e. as the objects to be tracked in the next step; the human clusters detected in each frame are then matched against the existing tracked objects, specifically by computing the maximum likelihood probability between the currently detected human and the known tracked objects with a method that combines distance consistency and colour consistency.
Preferably, in step c, clustering is performed as follows: 1) a Kd tree of the 3D point cloud is first created, to be used as the search structure when extracting point clouds later; 2) an empty list of clusters C and an empty queue Q of points are set up; 3) for each point p in the point cloud, all neighbouring points within a ball centred on p with the preset distance threshold as radius are searched; for each neighbour found, it is first checked whether it has already been added to another cluster, and if not, the neighbour is added to the queue Q; 4) after all points in the queue Q have been processed, the queue Q is added to the cluster list C; 5) once all points in the initial point cloud have been processed, clustering ends.
Preferably, in step c, after clustering is complete, over-clustering and under-clustering problems need to be handled further;
Over-clustering is handled as follows: for each obtained cluster C_i, the projection p_i of its centroid onto the XZ plane is first computed; if the distance between p_i and the projected centroid p_j of a cluster C_j is less than a set threshold, clusters C_i and C_j are considered to belong to the same cluster and are merged;
Under-clustering is handled as follows: for each cluster, its geometric information is computed, specifically width, depth and height; if the geometric information of a cluster is much larger than the set threshold, the cluster is further split using colour information, i.e. points with the same colour are grouped into the same class; clusters with very few points are simply discarded.
Preferably, in step e,
Distance consistency is defined as follows: given a detected human cluster C_i, the closest tracked object T_j is found by a global nearest-neighbour search; if their distance is less than a threshold, the detected cluster is considered to be associated with the tracked cluster. Then, for each point p_{i,j} of the tracked object T_j, the corresponding point p_{j,i} of the detected target C_i is found using an octree search, the distance between them is computed, and the distance consistency probability between point p_{i,j} and point p_{j,i} is defined as follows:
where α is a weight vector;
Colour consistency is defined as follows: when comparing the colour information of the currently detected cluster C_i with that of the tracked object T_j, the colour consistency between the nearest pair of points <p_{j,i}, p_{i,j}> is computed; colour consistency can be computed in RGB, HSV or another colour space. Taking HSV space as an example, the colour consistency probability between points p_{i,j} and p_{j,i} is defined as follows:
where c_{i,j} and c_{j,i} denote the HSV information of p_{i,j} and p_{j,i} respectively, and β denotes a weight;
The joint consistency probability between p_{i,j} and p_{j,i} is defined as:
L(p_{i,j}, p_{j,i}) = L_d(p_{i,j}, p_{j,i}) · L_c(p_{i,j}, p_{j,i});
The maximum joint likelihood probability between each tracked object T_j and detected cluster C_i is defined as:
If L(j, i) is higher than the set threshold, the current cluster C_i and the tracked object T_j are the same person; otherwise they are not. If no tracked object associated with C_i is found, a new tracked object is created.
Compared with the prior art, the present invention has the following technical effect. The invention is mainly aimed at the case where the RGB-D sensor is close to the ground, i.e. at a low viewing angle, or where the detected object is close to the sensor, so that only the lower body of a person is visible; a human detection algorithm using the lower half of the body as the main feature is proposed. The algorithm can effectively improve the accuracy of human detection. Based on the common-sense assumption that people move on the ground, the ground in the scene is first detected and removed, objects on the ground are clustered within a limited height, their HOG features are computed and fed to a pre-trained binary SVM soft classifier, and clusters with a high HOG confidence value are classified as people, while the others are classified as non-people. The detected humans are then used as the input of the tracking module; using the joint likelihood probability composed of colour consistency and distance consistency, the tracked object closest to the currently detected human cluster is found, and when the maximum likelihood probability is greater than a given threshold, the currently detected human cluster and the tracked object are considered to be the same person.
Brief description of the drawings
Fig. 1 is a flow chart of the invention.
Fig. 2 shows the dense 3D point cloud data collected by the present invention.
Fig. 3 shows the point cloud after the dense 3D point cloud data have been preprocessed in the present invention.
Fig. 4 is a schematic diagram of the recognition results of the present invention under various environments.
Detailed description of the embodiments
In order to make the technical problems to be solved, the technical solutions and the advantages clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
As shown in Fig. 1, an indoor human body detection and tracking method based on RGB-D under low viewing angle conditions comprises the following steps.
Step a. Data acquisition and preprocessing
As shown in Fig. 2, a point cloud is acquired with an Asus Xtion Pro to obtain a dense 3D point cloud. Owing to the influence of various error sources, especially discretisation effects in the depth measurement and the fact that the camera is only calibrated within a certain range, the initially collected RGB-D point cloud contains considerable noise, and each frame contains 307200 points, corresponding to the 640 × 480 dimensions of the depth image. Therefore, to improve processing speed and accuracy, a pass-through filter is first used to remove points of no interest; for example, according to the parameters of the Xtion Pro, points farther than 5 m along the z direction are filtered out. A three-dimensional voxel grid filter is then used to downsample the point cloud: the algorithm approximates all points within a voxel by their centroid, thereby downsampling the point cloud and reducing the amount of computation. The side length of a voxel (a three-dimensional cube) is set to 0.06 m. Downsampling can compress the point cloud obtained by the RGB-D sensor by an order of magnitude, greatly reducing later point cloud processing time, while the density of the downsampled data remains uniform throughout. The preprocessed point cloud is shown in Fig. 3.
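A minimal sketch of this preprocessing step is given below (Python/NumPy). It is an illustration only, not the implementation used in the patent; the array layout (an N × 3 XYZ array plus per-point colours) is an assumption, while the 5 m pass-through limit along z and the 0.06 m voxel size follow the description above.

```python
import numpy as np

def preprocess_cloud(points, colors, z_max=5.0, voxel=0.06):
    """Pass-through filter along z, then voxel-grid downsampling by centroids."""
    keep = points[:, 2] <= z_max            # drop points farther than 5 m along z
    points, colors = points[keep], colors[keep]

    # Voxel grid filter: every occupied 0.06 m cube is replaced by the centroid
    # of the points (and colours) that fall inside it.
    cells = np.floor(points / voxel).astype(np.int64)
    _, inv = np.unique(cells, axis=0, return_inverse=True)
    n = inv.max() + 1
    counts = np.bincount(inv, minlength=n).astype(float)
    down_pts = np.stack([np.bincount(inv, weights=points[:, d], minlength=n)
                         for d in range(3)], axis=1)
    down_rgb = np.stack([np.bincount(inv, weights=colors[:, d], minlength=n)
                         for d in range(3)], axis=1)
    return down_pts / counts[:, None], down_rgb / counts[:, None]
```

Replacing each voxel by its centroid, rather than by a fixed grid point, keeps the downsampled cloud close to the original surface while cutting the point count by roughly an order of magnitude.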
Step b. Ground detection and removal
This step is based on the assumption that people walk on the ground, so the ground is first detected in the filtered point cloud and then removed. Since the RGB-D sensor is rigidly mounted on the mobile robot, its relative pose with respect to the ground is approximately known. Based on this, initial ground parameters are set when processing the first frame; on this basis, the ground in the 3D point cloud is detected with a RANSAC-based least squares method to obtain updated ground parameters, and the updated parameters are used as the initial ground parameters of the next frame. In this way the ground parameters in the robot coordinate system can be updated in real time. In theory the ground is a perfect plane whose parameters are fixed relative to the robot, i.e. in the robot coordinate system, but because of camera vibration during robot motion and small undulations of the actual ground, the ground parameters in the robot coordinate system differ from moment to moment.
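The sketch below illustrates the per-frame RANSAC plane fit, with the previous frame's plane (or the known mount pose for the first frame) used to pre-select candidate ground points. The distance threshold, iteration count and candidate-selection factor are illustrative assumptions, not values from the patent.

```python
import numpy as np

def detect_ground(points, prev_plane=None, dist_thresh=0.03, iters=200, seed=0):
    """Fit the ground plane n.x + d = 0 by RANSAC; return (plane, ground_mask)."""
    rng = np.random.default_rng(seed)
    cand = points
    if prev_plane is not None:                      # warm start from the previous frame
        n0, d0 = np.asarray(prev_plane[:3]), prev_plane[3]
        cand = points[np.abs(points @ n0 + d0) < 5 * dist_thresh]
        if len(cand) < 3:
            cand = points
    best_count, best_plane = -1, None
    for _ in range(iters):
        p0, p1, p2 = cand[rng.choice(len(cand), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:
            continue                                # degenerate sample, skip it
        n /= np.linalg.norm(n)
        d = -float(n @ p0)
        count = int((np.abs(points @ n + d) < dist_thresh).sum())
        if count > best_count:
            best_count, best_plane = count, (n, d)
    n, d = best_plane
    ground_mask = np.abs(points @ n + d) < dist_thresh
    return (float(n[0]), float(n[1]), float(n[2]), d), ground_mask
```

A caller would keep the returned plane for the next frame and drop the masked points, e.g. `plane, mask = detect_ground(cloud, prev_plane); cloud = cloud[~mask]`.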
Step c. Clustering
The point cloud data within 130 cm of the ground is taken from the 3D point cloud with the ground removed, and 3D clustering is then performed using the Euclidean distance between points: when the Euclidean distance between a pair of points is less than a predefined distance threshold, the two points are defined as belonging to the same class. Clustering requires two initial thresholds: the minimum distance between two points belonging to the same cluster, and the minimum number of points required to form a cluster.
Once the ground has been detected, the three-dimensional points belonging to it can be removed, so the objects on the ground no longer touch it and appear as disconnected objects. Since our human detection method uses the lower part of the body as the feature description, the point cloud to be analysed is limited to the points within 130 cm of the ground, and 3D clustering is then performed using the Euclidean distance between points.
Clustering proceeds as follows: 1) a Kd tree of the 3D point cloud is first created, to be used as the search structure when extracting point clouds later; 2) an empty list of clusters C and an empty queue Q of points are set up; 3) for each point p in the point cloud, all neighbouring points within a ball centred on p with the preset distance threshold as radius are searched; for each neighbour found, it is first checked whether it has already been added to another cluster, and if not, the neighbour is added to the queue Q; 4) after all points in the queue Q have been processed, the queue Q is added to the cluster list C; 5) once all points in the initial point cloud have been processed, clustering ends.
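A sketch of steps 1)-5) follows, with scipy's cKDTree standing in for the Kd tree; the search radius and minimum cluster size are illustrative values, not the thresholds used in the patent.

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_cluster(points, radius=0.1, min_points=200):
    """Euclidean clustering: Kd-tree search plus breadth-first region growing."""
    tree = cKDTree(points)              # 1) Kd tree used as the search structure
    clusters = []                        # 2) empty list of clusters C
    processed = np.zeros(len(points), dtype=bool)
    for seed in range(len(points)):      # 3) grow a cluster from every unprocessed point
        if processed[seed]:
            continue
        queue, head = [seed], 0
        processed[seed] = True
        while head < len(queue):         # 4) expand the queue Q until exhausted
            nbrs = tree.query_ball_point(points[queue[head]], r=radius)
            for j in nbrs:
                if not processed[j]:     # only points not yet assigned to a cluster
                    processed[j] = True
                    queue.append(j)
            head += 1
        if len(queue) >= min_points:     # enforce the minimum cluster size
            clusters.append(queue)
    return clusters                      # 5) done once every point has been processed
```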
After the Euclidean clustering is complete, two typical problems arise in practice: over-clustering and under-clustering. (1) Over-clustering: a point cloud that should belong to one object is split into several clusters, mainly because of noise, occlusion and missing or wrong depth data. (2) Under-clustering: as the name suggests, point clouds that should belong to two or more different objects are improperly merged into one cluster. For example, when a person stands close to the background, a cabinet or a table, the person is often grouped into one class together with the background or the table. In our experiments, since only the points within 130 cm of the ground are considered, the case in which the points of two different people are merged together is not common in the clustering operation.
To solve these two problems, the clusters obtained after the initial Euclidean clustering need to be processed further.
For over-clustering, after clustering is complete, the projection p_i of the centroid of each obtained cluster C_i onto the XZ plane is first computed; if the distance between p_i and the projected centroid p_j of a cluster C_j is less than a set threshold, clusters C_i and C_j are considered to belong to the same cluster and are merged.
For under-clustering, when a person and the background are merged into one class, colour information is used to separate them. This is based on the assumption that the colour of a person's trousers differs from the background colour, e.g. jeans are blue while a wall is white. The specific procedure is as follows: for each cluster, its geometric information is computed, specifically width, depth and height; if the geometric information of a cluster is much larger than the set threshold, the cluster is further split using colour information, i.e. points with the same colour are grouped into one class; in this operation, clusters with very few points are simply discarded.
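The sketch below mirrors these two post-processing rules. The XZ merge distance, the expected bounding-box size and the colour quantisation step are assumptions made for illustration; colours are assumed to be normalised to [0, 1].

```python
import numpy as np

def merge_over_clusters(clusters, points, xz_thresh=0.3):
    """Merge clusters whose centroids project close together on the XZ plane."""
    cents = [points[c].mean(axis=0)[[0, 2]] for c in clusters]   # centroid XZ projections
    merged, used = [], [False] * len(clusters)
    for i in range(len(clusters)):
        if used[i]:
            continue
        group = list(clusters[i])
        for j in range(i + 1, len(clusters)):
            if not used[j] and np.linalg.norm(cents[i] - cents[j]) < xz_thresh:
                group += clusters[j]                              # same object, merge
                used[j] = True
        used[i] = True
        merged.append(group)
    return merged

def split_under_cluster(cluster, points, colors, size_thresh=(0.8, 1.3, 0.8)):
    """If a cluster's bounding box exceeds the expected size, split it by colour."""
    box = points[cluster].max(axis=0) - points[cluster].min(axis=0)
    if np.all(box <= size_thresh):
        return [cluster]
    # Crude colour split: quantise each channel into 4 bins and group identical bins.
    bins = np.floor(colors[cluster] * 4).astype(int)
    _, inv = np.unique(bins, axis=0, return_inverse=True)
    return [list(np.asarray(cluster)[inv == k]) for k in range(inv.max() + 1)]
```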
Step d. HOG+SVM classification
The point cloud inside each bounding box obtained after 3D clustering is projected onto the RGB image, the HOG descriptor of the resulting image patch is computed, the HOG descriptor is fed to a pre-trained SVM classifier, and the HOG confidence of each cluster is calculated; when the computed HOG confidence is higher than the set threshold, the cluster is judged to be a person, and otherwise it is not.
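A sketch of the HOG plus linear-SVM scoring is given below, using OpenCV's default 64×128 HOG window. The SVM weights `svm_w` and bias `svm_b` are assumed to have been trained offline on lower-body patches; those names, and the patch-resizing choice, are illustrative rather than taken from the patent, while the -2.2 threshold is the one reported in the experiments later.

```python
import cv2
import numpy as np

hog = cv2.HOGDescriptor()   # default 64x128 detection window, 9-bin cells

def hog_confidence(rgb_image, bbox, svm_w, svm_b):
    """Score the image patch under a cluster's projected bounding box."""
    x, y, w, h = bbox
    patch = rgb_image[y:y + h, x:x + w]
    patch = cv2.resize(patch, (64, 128))             # match the HOG window size
    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    feat = hog.compute(gray).ravel()                  # HOG descriptor of the patch
    return float(feat @ svm_w + svm_b)                # signed SVM score = confidence

def is_human(confidence, thresh=-2.2):
    # -2.2 is the HOG confidence threshold used in the experiments below.
    return confidence > thresh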
Step e. Tracking
The human clusters obtained are used as the input of the tracking module, i.e. as the objects to be tracked in the next step; the human clusters are then matched against the existing tracked objects, specifically by computing the maximum likelihood probability between the currently detected human and the known tracked objects with a method that combines distance consistency and colour consistency.
The tracking algorithm estimates the trajectory of each target with a particle filter. Assuming that people move on the ground, the tracked state of each person is a 2D transform, i.e. the position (x, y) of the centroid and the rotation angle θ. The motion model is set to constant velocity, since this model handles the fully coupled problem well.
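A minimal sketch of the constant-velocity prediction step for the (x, y, θ) particle state is shown below; the noise magnitudes are illustrative assumptions, and the measurement-update step (weighting particles by the joint likelihood defined below) is omitted.

```python
import numpy as np

def predict(particles, velocities, dt, rng, pos_noise=0.02, ang_noise=0.05):
    """Constant-velocity prediction for particles holding [x, y, theta]."""
    particles = particles + velocities * dt                               # propagate state
    particles[:, :2] += rng.normal(0.0, pos_noise, (len(particles), 2))   # position jitter
    particles[:, 2] += rng.normal(0.0, ang_noise, len(particles))         # heading jitter
    return particles
```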
When a cluster is classified as a human and does not correspond to any known tracked object, a new tracked object is created. Obviously, every human detected in the first frame is initialised as a new tracked object.
In order to match the people detected in the current frame with the known tracked objects, consistency based on colour and distance is computed, as follows:
Distance consistency is defined as follows: given a detected human cluster C_i, the closest human tracked object T_j is found by a global nearest-neighbour search; if their distance is less than a threshold, the detected cluster is considered to be associated with the tracked object. Then, for each point p_{i,j} of the tracked object T_j, the corresponding point p_{j,i} of the detected target C_i is found using an octree search, the distance between them is computed, and the distance consistency probability between point p_{i,j} and point p_{j,i} is defined as follows:
where α is a weight vector;
Colour consistency is defined as follows: when comparing the colour information of the currently detected cluster C_i with that of the tracked object T_j, the colour consistency between the nearest pair of points <p_{j,i}, p_{i,j}> is computed. Colour consistency can be computed in RGB, HSV or another colour space. Taking HSV space as an example, the colour consistency probability between points p_{i,j} and p_{j,i} is defined as follows:
where c_{i,j} and c_{j,i} denote the HSV information of p_{i,j} and p_{j,i} respectively, and β denotes a weight;
The joint consistency probability between p_{i,j} and p_{j,i} is defined as:
L(p_{i,j}, p_{j,i}) = L_d(p_{i,j}, p_{j,i}) · L_c(p_{i,j}, p_{j,i});
The maximum joint likelihood probability between each tracked object T_j and detected cluster C_i is then defined as:
If L(j, i) is higher than the set threshold, the current cluster C_i and the tracked object T_j are the same person; otherwise they are not. If no tracked object associated with C_i is found, a new tracked object is created.
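The specific formulas for L_d and L_c appear in the original only as figures and are not reproduced in this text, so the kernel forms in the sketch below (exponential decay with weights α and β) are assumptions; the nearest-neighbour search also uses a Kd tree in place of the octree mentioned above. Only the structure follows the description: nearest-point pairing, per-pair distance and colour likelihoods, their product, and the maximum taken as L(j, i).

```python
import numpy as np
from scipy.spatial import cKDTree

def joint_likelihood(track_pts, track_hsv, det_pts, det_hsv, alpha=5.0, beta=2.0):
    """Maximum joint likelihood between a tracked object T_j and a detection C_i."""
    tree = cKDTree(det_pts)
    dists, idx = tree.query(track_pts)                 # nearest detection point per track point
    L_d = np.exp(-alpha * dists)                        # assumed distance-consistency kernel
    hsv_diff = np.linalg.norm(track_hsv - det_hsv[idx], axis=1)
    L_c = np.exp(-beta * hsv_diff)                      # assumed colour-consistency kernel
    return float(np.max(L_d * L_c))                     # maximum joint likelihood L(j, i)
```

A detection would then be associated with the tracked object whose L(j, i) is largest and above the set threshold, and a new tracked object created otherwise.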
To verify the proposed algorithm, the following experiment was carried out: an Asus Xtion Pro and a laser radar were mounted on a Turtlebot 2, where the Xtion Pro collects the raw RGB-D point cloud data of the environment and the laser radar is used for obstacle avoidance.
As shown in Fig. 4, experiments were carried out under three different scenes to verify the method proposed in the present invention:
Simple environment: no obstacles, the sensor is fixed, and two people move along the same track, as in Fig. 4(a).
Moderate environment: no obstacles, the sensor is fixed, more than two people walk randomly and their trajectories intersect, as in Fig. 4(b).
Difficult environment: there are obstacles, the robot moves, three or more people walk randomly and their trajectories intersect, as in Fig. 4(c).
For each scene the video sequence is about 250 frames, and the total test set contains 798 frames. There are 2698 person instances in total, which are hand-labelled on the RGB images as ground truth.
Human detection results:
To verify the performance of the proposed method, frame-based metrics are used: precision (p), recall (r) and F1 score, which measure the quality of the proposed method and are defined as p = TP/(TP+FP), r = TP/(TP+FN) and F1 = 2pr/(p+r), where TP, FP and FN denote the numbers of true positives, false positives and false negatives respectively.
As shown in Fig. 4, detected humans are enclosed in green boxes. It is evident from the figure that for the first two cases, especially the simple environment, the experimental results are good. Under the difficult scene, however, the results are not as good as in the first two. This is easy to understand: there are obstacles in the difficult scene, and in some cases classification by geometric information and HOG+SVM is very difficult, so some obstacles are misidentified as humans. Table 1 records the performance of human detection in the three different environments.
To reduce the false alarm rate, the HOG confidence threshold is strictly set to -2.2. This helps to reduce the false positive rate, but also leads to an increase in the false negative rate.
Table 1: Performance of the human detection model in the three environments

                        Precision   Recall   F1 score
Simple environment      0.97        0.91     0.94
Moderate environment    0.92        0.88     0.89
Difficult environment   0.82        0.78     0.80
Human tracking results:
We evaluated our tracking results in terms of false positive rate and false negative rate; Table 2 records the tracking performance in the three different environments. The results are good in the simple and moderate cases, but in the difficult case the FP and FN ratios are somewhat high, 5.8% and 5.2% respectively. This is mainly because in that scene people move quickly, are occluded by other people, or move out of the camera's field of view.
Table 2: Performance of the human tracking model in the three environments

                        FP      FN
Simple environment      2.4%    1.8%
Moderate environment    4.6%    4.4%
Difficult environment   5.8%    5.2%
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent replacements and improvements made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (4)

1. An indoor human body detection and tracking method based on RGB-D under low viewing angle conditions, characterised by comprising the following steps:
Step a. Data acquisition and preprocessing
A point cloud is acquired with an Asus Xtion Pro to obtain a dense 3D point cloud; the dense 3D point cloud is denoised with a pass-through filter, and the denoised 3D point cloud is downsampled with a three-dimensional voxel grid filter;
Step b. Ground detection and removal
The ground in the 3D point cloud processed in step a is detected using a RANSAC-based least squares method; for every frame after the first, the ground parameters detected in the previous frame are used as the initial parameters of the next frame; starting from the initially given ground parameters and a preset initial distance threshold, ground detection is performed on the current frame with the RANSAC method, and each three-dimensional point belonging to the ground is removed from the 3D point cloud according to its index; for the first frame, the relative relationship between the RGB-D sensor and the ground can be determined from the sensor's mounting position on the robot, which gives the initial ground parameters;
Step c. Clustering
The point cloud data within 130 cm of the ground is taken from the 3D point cloud with the ground removed, and 3D clustering is then performed using the Euclidean distance between points: when the Euclidean distance between a pair of points is less than a predefined distance threshold, the two points are defined as belonging to the same class; clustering requires two initial thresholds: the minimum distance between two points belonging to the same cluster, and the minimum number of points required to form a cluster;
Step d. HOG+SVM classification
The point cloud inside each bounding box obtained after 3D clustering is projected onto the RGB image, the HOG descriptor of the resulting image patch is computed, the HOG descriptor is fed to a pre-trained SVM classifier, and the HOG confidence of each cluster is calculated; when the computed HOG confidence is higher than the set threshold, the cluster is judged to be a person, and otherwise it is not;
Step e. Tracking
The human clusters obtained are used as the input of the tracking module, i.e. as the objects to be tracked in the next step; the human clusters detected in each frame are then matched against the existing tracked objects, specifically by computing the maximum likelihood probability between the currently detected human and the known tracked objects with a method that combines distance consistency and colour consistency.
2. The indoor human body detection and tracking method based on RGB-D under low viewing angle conditions according to claim 1, characterised in that in step c, clustering is performed as follows: 1) a Kd tree of the 3D point cloud is first created, to be used as the search structure when extracting point clouds later; 2) an empty list of clusters C and an empty queue Q of points are set up; 3) for each point p in the point cloud, all neighbouring points within a ball centred on p with the preset distance threshold as radius are searched; for each neighbour found, it is first checked whether it has already been added to another cluster, and if not, the neighbour is added to the queue Q; 4) after all points in the queue Q have been processed, the queue Q is added to the cluster list C; 5) once all points in the initial point cloud have been processed, clustering ends.
3. The indoor human body detection and tracking method based on RGB-D under low viewing angle conditions according to claim 2, characterised in that in step c, after clustering is complete, over-clustering and under-clustering problems need to be handled further;
Over-clustering is handled as follows: for each obtained cluster C_i, the projection p_i of its centroid onto the XZ plane is first computed; if the distance between p_i and the projected centroid p_j of a cluster C_j is less than a set threshold, clusters C_i and C_j are considered to belong to the same cluster and are merged;
Under-clustering is handled as follows: for each cluster, its geometric information is computed, specifically width, depth and height; if the geometric information of a cluster is much larger than the set threshold, the cluster is further split using colour information, i.e. points with the same colour are grouped into the same class; clusters with very few points are simply discarded.
4. The indoor human body detection and tracking method based on RGB-D under low viewing angle conditions according to claim 1, characterised in that in step e,
Distance consistency is defined as follows: given a detected human cluster C_i, the closest tracked object T_j is found by a global nearest-neighbour search; if their distance is less than a threshold, the detected cluster is considered to be associated with the tracked cluster; then, for each point p_{i,j} of the tracked object T_j, the corresponding point p_{j,i} of the detected target C_i is found using an octree search, the distance between them is computed, and the distance consistency probability between point p_{i,j} and point p_{j,i} is defined as follows:
where α is a weight vector;
Colour consistency is defined as follows: when comparing the colour information of the currently detected cluster C_i with that of the tracked object T_j, the colour consistency between the nearest pair of points <p_{j,i}, p_{i,j}> is computed; colour consistency can be computed in RGB, HSV or another colour space; taking HSV space as an example, the colour consistency probability between points p_{i,j} and p_{j,i} is defined as follows:
where c_{i,j} and c_{j,i} denote the HSV information of p_{i,j} and p_{j,i} respectively, and β denotes a weight;
The joint consistency probability between p_{i,j} and p_{j,i} is defined as:
L(p_{i,j}, p_{j,i}) = L_d(p_{i,j}, p_{j,i}) · L_c(p_{i,j}, p_{j,i});
The maximum joint likelihood probability between each tracked object T_j and detected cluster C_i is defined as:
If L(j, i) is higher than the set threshold, the current cluster C_i and the tracked object T_j are the same person; otherwise they are not; if no tracked object associated with C_i is found, a new tracked object is created.
CN201810908661.2A 2018-08-10 2018-08-10 Indoor human body detection and tracking method based on RGB-D low-visual-angle condition Active CN109035305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810908661.2A CN109035305B (en) 2018-08-10 2018-08-10 Indoor human body detection and tracking method based on RGB-D low-visual-angle condition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810908661.2A CN109035305B (en) 2018-08-10 2018-08-10 Indoor human body detection and tracking method based on RGB-D low-visual-angle condition

Publications (2)

Publication Number Publication Date
CN109035305A true CN109035305A (en) 2018-12-18
CN109035305B CN109035305B (en) 2021-06-25

Family

ID=64632680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810908661.2A Active CN109035305B (en) 2018-08-10 2018-08-10 Indoor human body detection and tracking method based on RGB-D low-visual-angle condition

Country Status (1)

Country Link
CN (1) CN109035305B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110687A (en) * 2019-05-15 2019-08-09 江南大学 Fruit automatic identifying method on tree based on colouring information and three-D profile information
CN110175528A (en) * 2019-04-29 2019-08-27 北京百度网讯科技有限公司 Human body tracing method and device, computer equipment and readable medium
CN110456308A (en) * 2019-07-08 2019-11-15 广西工业职业技术学院 A kind of three dimension location method for fast searching
CN111582352A (en) * 2020-04-30 2020-08-25 上海高仙自动化科技发展有限公司 Object-based sensing method and device, robot and storage medium
CN112070840A (en) * 2020-09-11 2020-12-11 上海幻维数码创意科技有限公司 Human body space positioning and tracking method with integration of multiple depth cameras
CN113033481A (en) * 2021-04-20 2021-06-25 湖北工业大学 Method for detecting hand-held stick combined with aspect ratio-first order fully-convolved object detection (FCOS) algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130156264A1 (en) * 2011-12-15 2013-06-20 Linus Mårtensson Minimizing drift using depth camera images
CN106384079A (en) * 2016-08-31 2017-02-08 东南大学 RGB-D information based real-time pedestrian tracking method
CN107016373A (en) * 2017-04-12 2017-08-04 广东工业大学 The detection method and device that a kind of safety cap is worn
CN107491712A (en) * 2016-06-09 2017-12-19 北京雷动云合智能技术有限公司 A kind of human body recognition method based on RGB D images
CN107833270A (en) * 2017-09-28 2018-03-23 浙江大学 Real-time object dimensional method for reconstructing based on depth camera

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130156264A1 (en) * 2011-12-15 2013-06-20 Linus Mårtensson Minimizing drift using depth camera images
CN107491712A (en) * 2016-06-09 2017-12-19 北京雷动云合智能技术有限公司 A kind of human body recognition method based on RGB D images
CN106384079A (en) * 2016-08-31 2017-02-08 东南大学 RGB-D information based real-time pedestrian tracking method
CN107016373A (en) * 2017-04-12 2017-08-04 广东工业大学 The detection method and device that a kind of safety cap is worn
CN107833270A (en) * 2017-09-28 2018-03-23 浙江大学 Real-time object dimensional method for reconstructing based on depth camera

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ju Qing: "Development and Application of Functional Software for Mobile Service Robots Based on an RGB-D Sensor", China Master's Theses Full-text Database *
Zhang Song: "Target Detection and Tracking for Ground Mobile Robots Based on an RGB-D Sensor", China Master's Theses Full-text Database *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175528A (en) * 2019-04-29 2019-08-27 北京百度网讯科技有限公司 Human body tracing method and device, computer equipment and readable medium
CN110175528B (en) * 2019-04-29 2021-10-26 北京百度网讯科技有限公司 Human body tracking method and device, computer equipment and readable medium
CN110110687A (en) * 2019-05-15 2019-08-09 江南大学 Fruit automatic identifying method on tree based on colouring information and three-D profile information
CN110110687B (en) * 2019-05-15 2020-11-17 江南大学 Method for automatically identifying fruits on tree based on color information and three-dimensional contour information
CN110456308A (en) * 2019-07-08 2019-11-15 广西工业职业技术学院 A kind of three dimension location method for fast searching
CN111582352A (en) * 2020-04-30 2020-08-25 上海高仙自动化科技发展有限公司 Object-based sensing method and device, robot and storage medium
CN111582352B (en) * 2020-04-30 2023-06-27 上海高仙自动化科技发展有限公司 Object-based perception method, object-based perception device, robot and storage medium
CN112070840A (en) * 2020-09-11 2020-12-11 上海幻维数码创意科技有限公司 Human body space positioning and tracking method with integration of multiple depth cameras
CN112070840B (en) * 2020-09-11 2023-10-10 上海幻维数码创意科技股份有限公司 Human body space positioning and tracking method fused by multiple depth cameras
CN113033481A (en) * 2021-04-20 2021-06-25 湖北工业大学 Method for detecting hand-held stick combined with aspect ratio-first order fully-convolved object detection (FCOS) algorithm
CN113033481B (en) * 2021-04-20 2023-06-02 湖北工业大学 Handheld stick detection method based on first-order full convolution target detection algorithm

Also Published As

Publication number Publication date
CN109035305B (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN109035305A (en) Indoor human body detection and tracking in the case of a kind of low visual angle based on RGB-D
US11367219B2 (en) Video analysis apparatus, person retrieval system, and person retrieval method
Datta et al. Person-on-person violence detection in video data
Baysal et al. Sentioscope: a soccer player tracking system using model field particles
CN105940430B (en) Personnel's method of counting and its device
Mittal et al. M2tracker: A multi-view approach to segmenting and tracking people in a cluttered scene using region-based stereo
Van Oosterhout et al. Head detection in stereo data for people counting and segmentation
CN106682573B (en) A kind of pedestrian tracting method of single camera
CN107657244B (en) Human body falling behavior detection system based on multiple cameras and detection method thereof
CN102262725A (en) Analysis Of Three-dimensional Scenes
Ren et al. Multi-camera video surveillance for real-time analysis and reconstruction of soccer games
Teachabarikiti et al. Players tracking and ball detection for an automatic tennis video annotation
CN104616006A (en) Surveillance video oriented bearded face detection method
CN110020618A (en) A kind of crowd's abnormal behaviour monitoring method can be used for more shooting angle
Nguyen et al. Single camera based fall detection using motion and human shape features
CN103577832A (en) People flow statistical method based on spatio-temporal context
Saisan et al. Multi-view classifier swarms for pedestrian detection and tracking
Li et al. Evaluating the performance of systems for tracking football players and ball
Girisha et al. Tracking humans using novel optical flow algorithm for surveillance videos
CN106023252A (en) Multi-camera human body tracking method based on OAB algorithm
Pane et al. A people counting system for business analytics
Hung et al. Detecting fall incidents of the elderly based on human-ground contact areas
Lo et al. Vanishing point-based line sampling for real-time people localization
Zeng et al. Human detection using multi-camera and 3D scene knowledge
Yu Automatic basketball tracking in broadcast basketball video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant