WO2013091370A1

WO2013091370A1 - Human body part detection method based on parallel statistics learning of 3d depth image information

Info

Publication number: WO2013091370A1
Application number: PCT/CN2012/077874
Authority: WO
Inventors: 黄向生; 徐波
Original assignee: 中国科学院自动化研究所
Priority date: 2011-12-22
Filing date: 2012-06-29
Publication date: 2013-06-27
Also published as: CN102609680B; CN102609680A

Abstract

Disclosed is a human body part detection method based on the parallel statistics learning of 3D depth image information. For problems that human body parts (head, hands, and feet) are complicated in shape changes and hard to describe and so on, a novel feature that embodies the diversity of human body parts is constructed, i.e. a universal feature, a parallel statistics learning method is applied to select effective and sufficient novel features and form a parallel cascaded classifier, thus performing real-time and highly efficient detection of human body parts.

Description

基于三维深度图像信息的并行统计学习人体部位检测方法技术领域本发明涉及图像处理、模式识别、人机交互及视觉监控等领域，尤其是一种基于三维深度图像信息的并行统计学习人体部位检测方法。 FIELD OF THE INVENTION The present invention relates to the fields of image processing, pattern recognition, human-computer interaction, and visual monitoring, and more particularly to a method for parallel detection of human body parts based on three-dimensional depth image information. .

背景技术随着计算机性能的逐歩提高和各个领域对计算机使用的不断深入，人与计算机的交互技术日益成为计算机领域的研究热点。基于动态序列图像的目标识别已经成为近年来计算机视觉领域中备受关注的研究内容，它主要从图像序列中检测、识别、跟踪以及对生物特征理解和描述进行研究。 BACKGROUND OF THE INVENTION With the improvement of computer performance and the continuous use of computers in various fields, the interaction technology between humans and computers has increasingly become a research hotspot in the field of computers. Target recognition based on dynamic sequence images has become a research topic in the field of computer vision in recent years. It mainly detects, identifies, tracks and understands and describes biometrics in image sequences.

目标检测是目标识别中最为关键的一歩，是研究如何让计算机以人的思维方式从图像或视频中找出目标对象所在区域的技术。其中人手检测技术是研究难度最大的一个问题。目前，赤手交互成为虚拟游戏中非常吸引人的一项应用，这将引起新一轮的对人体部位检测实时性研究的热潮。 Target detection is the most critical aspect of target recognition. It is a technique to study how to let a computer find the target object's area from an image or video in a human mind. Among them, the manual detection technology is the most difficult problem to study. At present, bare-hand interaction has become an attractive application in virtual games, which will lead to a new round of real-time research on human body parts detection.

在过去的 20 年中，大量的目标检测方法被提出。例如，基于神经网络的检测方法、基于支持向量机的检测算法、基于隐形马尔可夫模型的检测方法和基于概率的检测方法。然而，大多数的算法都只是应用图像的原始像素作为特征，他们大多对光照变化和噪声十分敏感。目前最主流的目标检测方法是基于 AdaBoost学习的统计模型方法。 In the past 20 years, a large number of target detection methods have been proposed. For example, a neural network based detection method, a support vector machine based detection algorithm, a stealth Markov model based detection method, and a probability based detection method. However, most algorithms only use the original pixels of the image as features, and they are mostly sensitive to illumination changes and noise. The most mainstream target detection method at present is based on the statistical model method of AdaBoost learning.

人体部位检测涉及图像处理、模式识别、人机交互及视觉监控等领域，在虚拟现实、人机交互、视觉监控等领域均有着广阔的应用。人体部位检测不仅需要完成目标特征的构造和进行相应的脱机训练，实现实时的动态监测，同时还要排除背景噪声和不特定的干扰等问题，这也是需要面临和克服的挑战性问题。发明内容由于人体部位（头、手、脚）具有多样性、方向性、多义性等因素，仅仅应用现有的简单特征训练并不能得到理想的检测效果。为了解决人体部位（头、手、脚）检测中特征多样性的问题，以及获得实时的检测效果，本发明提供了一种新型的特征一万向特征（Omm -direction Features) , 结合并行级联的统计学习算法进行人体部位检测，保证实时检测的情况下实现了高检测率。从而在目标检测和模式识别等方面具有重要的作用。 Human body parts detection involves image processing, pattern recognition, human-computer interaction and visual monitoring, and has a wide range of applications in virtual reality, human-computer interaction, and visual surveillance. The detection of human body parts not only needs to complete the construction of target features and corresponding offline training, real-time dynamic monitoring, but also eliminate background noise and unspecified interference, which is also a challenging problem that needs to be faced and overcome. SUMMARY OF THE INVENTION Due to the diversity, directionality, ambiguity and the like of the human body parts (head, hand, foot), the application of the existing simple feature training does not achieve the desired detection effect. In order to solve the problem of feature diversity in the detection of human body parts (heads, hands, feet), and to obtain real-time detection effects, the present invention provides a novel feature Omm-direction Features, combined with parallel cascading The statistical learning algorithm performs human body part detection to ensure high detection rate under real-time detection. Thus it plays an important role in target detection and pattern recognition.

本发明所处理的数据源是三维深度图像，这与常见的灰度图像和彩色图像有很大的不同。三维深度图像是将摄像头与拍摄目标的各个像素点的距离读取并储存而获得的图像数据，用不同的灰度来体现图像中像素点的距离信息。 The data source processed by the present invention is a three-dimensional depth image, which is quite different from common grayscale and color images. The three-dimensional depth image is image data obtained by reading and storing the distance between the camera and each pixel of the shooting target, and different distances are used to represent the distance information of the pixel points in the image.

本发明所提出的一种基于三维深度图像的人体部位检测方法，其特征在于，该方法包括以下歩骤： A method for detecting a human body part based on a three-dimensional depth image according to the present invention is characterized in that the method comprises the following steps:

歩骤 1，采用深度摄像头采集多幅三维深度图像并对其进行处理，建立人体部位样本数据库； Step 1, using a depth camera to collect and process a plurality of three-dimensional depth images, and establishing a sample database of human body parts;

歩骤 2，对于人体部位样本数据库中的每幅图像，构造描述各人体部位的万向特征； Step 2: constructing a universal feature describing each human body part for each image in the body part sample database;

歩骤 3，对于所述万向特征基于并行级联的统计学习算法训练分类器，得到贡献力最大的那些万向特征； Step 3: training the classifier based on the parallel cascading statistical learning algorithm for the universal feature, and obtaining those universal features having the greatest contribution;

歩骤 4，基于歩骤 3得到的贡献力最大的万向特征，对从深度摄像头实时读入的图像进行人体部位的检测，并对检测出的人体部位区域进行标注显示。 Step 4: Based on the universal characteristic obtained by the step 3, the human body part is detected on the image read in real time from the depth camera, and the detected human body part area is marked and displayed.

本发明的有益效果是： The beneficial effects of the invention are:

a.目标实时检测，保证了实时的检测速度，检测效果优越； a. Target real-time detection, ensuring real-time detection speed, superior detection effect;

b.相比较其他特征如 haar-like特征，运用万向特征（Omni -direction Features ) 大大提高了检测率； b. Compared with other features such as haar-like features, the use of Omni-direction Features greatly improves the detection rate;

c.运用并行级联的分类器进行训练，由于是分层训练的关系，每次分组到特征集的特征数目远小于未分组时的特征数目，在训练时间方面有很大的提高，训练速度是原来速度的 N-1倍 (N为特征的分组数)； d.因为设定分类器的训练在达到 600个特征的时候就会停止（分类器必须设定一个停止的参数，无限制的训练下去，到后期的训练已经没有意义），原来未分组的分类器在特征的选择上会受到分类器特征数目的限制，选择到的特征不够精细和丰富。分组训练所能挑选的特征虽然也受到这一因素影响，但因为是分成了 N组的原因，每组分配到的特征基本都可以完全参与训练选择，大大提高了可选择的特征数目； c. Using parallel cascaded classifiers for training, because of the hierarchical training relationship, the number of features each time grouped into the feature set is much smaller than the number of features when not grouped, in terms of training time Greatly improved, the training speed is N-1 times the original speed (N is the number of characteristic groups); d. Because the training of the set classifier stops when it reaches 600 features (the classifier must set one The parameters of the stop, the unrestricted training, and the later training have no meaning. The original ungrouped classifier is limited by the number of classifier features in the feature selection, and the selected features are not detailed and rich. Although the characteristics that can be selected by group training are also affected by this factor, because it is divided into N groups, the characteristics assigned to each group can basically participate in the training selection, which greatly improves the number of selectable features.

e. 在检测率上有很大提高。因为选择到的特征的贡献比未分组的特征贡献性总体要好，所以在误判率上有很好的改善，误检率降低了近 3 倍。 e. There is a significant increase in detection rates. Since the contribution of the selected feature is better than the uncontributed feature contribution overall, the false positive rate is improved, and the false positive rate is reduced by nearly 3 times.

本发明运用前景广泛，在目标检测，模式识别，计算机图像处理等方面有着重要作用，也为计算机三维应用在实时检测跟踪方面提供了应用趋势。 The invention has wide application prospects and plays an important role in target detection, pattern recognition, computer image processing, etc., and also provides an application trend for real-time detection and tracking of computer three-dimensional applications.

附图说明图 1是本发明所提出的基于三维深度图像信息的并行统计学习人体部位检测方法流程图。 BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a flow chart of a parallel statistical learning human body part detecting method based on three-dimensional depth image information proposed by the present invention.

图 2是本发明人体部位样本数据库的例图。 Fig. 2 is a view showing an example of a human body part sample database of the present invention.

图 3是本发明的万向特征（Omni -direction Features) 的矩形块表示图。 Figure 3 is a rectangular block representation of the Omni-direction Features of the present invention.

图 4是本发明的九种简单的万向特征（ Omni -direction Features ) 的构造图。 Figure 4 is a structural diagram of nine simple Omni-direction features of the present invention.

图 5是本发明的一种万向特征（Omni -direction Features) 的特征值计算示意图。 Fig. 5 is a diagram showing the calculation of the eigenvalues of an Omni-direction feature of the present invention.

图 6是利用图像积分图快速计算矩形特征值的示意图。 FIG. 6 is a schematic diagram of rapidly calculating a rectangular feature value using an image integration map.

图 7是样本特征计算方法流程图。 Figure 7 is a flow chart of the sample feature calculation method.

图 8是本发明的三种扩展的万向特征（Omni -direction Features)构造图。图 9是本发明的多层万向特征（Omni -direction Features)的构造图。图 10是本发明统计学习训练方法的流程图。 Figure 8 is a diagram showing three expanded Omni-direction Features of the present invention. Figure 9 is a configuration diagram of the Omni-direction Features of the present invention. 10 is a flow chart of a statistical learning training method of the present invention.

图 11是本发明的并行级联分类器的构造图。 Figure 11 is a configuration diagram of a parallel cascade classifier of the present invention.

图 12是本发明对图像进行实时检测的方法流程图。 Figure 12 is a flow chart of a method for real-time detection of an image of the present invention.

具体实施方式为使本发明的目的、技术方案和优点更加清楚明白，以下结合具体实施例，并参照附图，对本发明进一歩详细说明。 DETAILED DESCRIPTION OF THE INVENTION In order to make the objects, the technical solutions and the advantages of the present invention more comprehensible, the present invention will be described in detail below with reference to the accompanying drawings.

本发明是基于统计学习的目标检测原理，对所获取的三维深度图像进行目标检测跟踪。图 1是本发明所提出的基于三维深度图像信息的并行统计学习人体部位检测方法流程图，如图 1所示，本发明所提出的基于三维深度图像的人体部位检测方法包含以下几个歩骤： The invention is based on the principle of target detection of statistical learning, and performs target detection and tracking on the acquired three-dimensional depth image. 1 is a flowchart of a parallel statistical learning human body part detecting method based on three-dimensional depth image information according to the present invention. As shown in FIG. 1 , the method for detecting a human body part based on a three-dimensional depth image according to the present invention includes the following steps: :

歩骤 1，采用深度摄像头采集多幅三维深度图像并对其进行处理，建立人体部位样本数据库。 Step 1. The depth camera is used to collect and process a plurality of three-dimensional depth images, and a human body part sample database is established.

在基于统计学习的检测方法中，除了学习算法的性能以及特征形式对检测器性能有较大影响外，训练集也是一个关键的因素，如果训练集选取不当，会严重影响检测效果。本发明的样本采集设备为深度摄像头，采集地点为 CASIA (中国科学院自动化研究所）高新技术创新中心。采集时从深度摄像头读入数据并对读入的视频抓取帧图片保存。所采集的三维深度图像所储存的数据是摄像头与拍摄视角内各个感兴趣的目标的距离的深度信息。样本数据库的建立原则是尽可能的涵盖较多环境下和具有较多姿势的人体部位（头、手、脚）图像，使得选取的样本足够丰富。本发明的训练样本集通过采集 86人，每个人预设 21个规定动作，制作了一个由 10000个人体三维深度图像组成的初始数据集，将所获取的所有图片统一归一化为分辨率为 320 X 240 像素的图片，图片均为 BMP格式的深度信息图像。从归一化后的图片中分割出头部、手部和脚部，并重新设定人头部样本为 24 X 28像素、人手部样本为 28 X 24像素、人脚部样本为 24 X 24像素。排除遮挡和外界噪声等影响的样本，得到实验人头部、手、脚的正样本各 8000 个，从归一化图片数据中切割出 7500幅非人体部位（头、手、脚）的负样本图片。将人体部位：头、手、脚的 8000幅正样本分别和 7500幅负样本组合成人体部位（头、手、脚）的三个样本数据库。图 2是人体部位（头、手、脚）训练的资料，其中， 1010101表示人体头部正样本， 1010102表示人体手部正样本， 1010103 表示人体脚部正样本， 10102表示非人体部位（头、手、脚）即负样本图片。 In the statistical learning-based detection method, in addition to the performance of the learning algorithm and the characteristic form have a great influence on the performance of the detector, the training set is also a key factor. If the training set is improperly selected, the detection effect will be seriously affected. The sample collection device of the invention is a depth camera, and the collection location is CASIA (Institute of Automation, Chinese Academy of Sciences) High-tech Innovation Center. The data is read from the depth camera during acquisition and the captured video frame is saved. The data stored in the acquired three-dimensional depth image is depth information of the distance between the camera and each of the objects of interest within the shooting angle of view. The principle of establishing a sample database is to cover as much as possible the images of human body parts (heads, hands, feet) with more postures and more postures, so that the selected samples are rich enough. The training sample set of the invention collects 86 people, each person presets 21 prescribed actions, and creates an initial data set composed of 10000 human body three-dimensional depth images, and normalizes all the acquired images into resolutions. 320 X 240 pixel images, pictures are depth information images in BMP format. The head, the hand and the foot are separated from the normalized picture, and the human head sample is 24 X 28 pixels, the human hand sample is 28 X 24 pixels, and the human foot sample is 24 X 24 Pixel. Exclude samples of occlusion and external noise, etc. The experimental subjects had 8000 positive samples of the head, hands and feet, and 7500 non-human parts (head, hand, foot) negative sample images were cut out from the normalized image data. The human body parts: 8000 positive samples of the head, hands and feet and 7500 negative samples are combined with three sample databases of the adult body parts (head, hand, foot). Figure 2 shows the training of human body parts (head, hand, foot). Among them, 1010101 represents the positive sample of the human head, 1010102 represents the positive sample of the human hand, 1010103 represents the positive sample of the human foot, and 10102 represents the non-human body part (head, Hand, foot) is the negative sample picture.

歩骤 2，基于人体部位样本数据库构造描述各人体部位的万向特征以克服人体部位变化的多义性和多样性。 Step 2: Describe the universal features of each human body part based on the human body part sample database structure to overcome the ambiguity and diversity of the human body part changes.

人体部位（头、手、脚）存在多义性和多样性特征。例如人手在变化的时候，其姿态是万千变化的，这增加了人手检测的难度。由于人体部位（头、手、脚）具有多样性、方向性等因素，所以至今没有很好的描述这类特征的算法。本发明提出了一种能够很好的描述人体部位（头、手、脚）方向性和多样性特性的新型特征——万向特征（Omni -direction Features )。 The human body parts (head, hands, feet) are characterized by ambiguity and diversity. For example, when the human hand is changing, its posture is varied, which increases the difficulty of human detection. Since the human body parts (heads, hands, feet) have various factors such as diversity and directionality, so far there is no good algorithm for describing such features. The present invention proposes a novel feature that can well describe the directionality and diversity characteristics of the human body parts (head, hand, foot) - Omni-direction Features.

分析采集到的正样本数据，在每张正样本图片中人体部位只能从一个方向伸进正样本图片，并且稳定的占据样本图片的中间位置，根据这一样本特性，样本中间位置的平均深度要比周围位置的深度大，通过中间矩形区域和***矩形区域的深度差值来建造新的特征，称做万向特征 ( Omni -direction Features )。万向特征 (Omni -direction Features) 为类矩形特征（类矩形特征为各种矩形框经过相互重叠、遮挡、错层等方式组合得到的形状特征），是通过出现人体部位的矩形区域和周围矩形区域按照一定的权值关系进行组合得到的，分为任意位置的单层次单矩形特征；多层次多矩形特征；组合矩形特征；组合菱形特征；组合椭圆特征；组合对角对称特征等类型。图 4中特征 10201-特征 10209为单层次单矩形特征，图 8中 10210为组合矩形特征， 10211为组合菱形特征， 10212为组合椭圆特征，图 9中特征 10213为多层次多矩形特征，特征 10214-特征 10217为组合对角对称特征。 Analyze the collected positive sample data. In each positive sample picture, the human body part can only extend into the positive sample picture from one direction, and stably occupy the middle position of the sample picture. According to the sample characteristics, the average depth of the sample intermediate position To be larger than the depth of the surrounding position, a new feature is built by the difference in depth between the middle rectangular area and the outer rectangular area, called Omni-direction Features. Omni-direction Features is a class-like rectangle feature (a rectangular feature is a combination of various rectangular frames that are superimposed, occluded, and split layered), and is a rectangular area and a surrounding rectangle through which human body parts appear. The regions are combined according to a certain weight relationship, and are divided into single-level single-rectangular features at arbitrary positions; multi-level and multi-rectangular features; combined rectangular features; combined diamond features; combined elliptical features; combined diagonal symmetric features and the like. The feature 10201-feature 10209 in FIG. 4 is a single-level single-rectangular feature, 10210 in FIG. 8 is a combined rectangular feature, 10211 is a combined diamond feature, 10212 is a combined elliptical feature, and feature 10113 in FIG. 9 is a multi-level multi-rectangular feature, feature 10214 - Feature 10217 is a combined diagonal symmetry feature.

所有类型特征都可以通过图像积分进行快速计算和提取。通过万向特征（Omni -direction Features)可以很好的描述人体部位（头、手、脚）的多义性、多样性和复杂形变等本质结构特性。所有列出的特征均可以一定程度的表征人体部位（头、手、脚）的特性，所有列出的特征是为了阐述万向特征（Omni -direction Features) 的概念和构造原理，万向特征（Omni -direction Features) 包含但不限于所列特征。 All types of features can be quickly calculated and extracted by image integration. Through universal Omni-direction Features can well describe the essential structural characteristics of human body parts (heads, hands, feet) such as ambiguity, diversity and complex deformation. All of the listed features can characterize the characteristics of the body parts (head, hand, foot) to a certain extent. All the listed features are to illustrate the concept and construction principle of Omni-direction Features, universal features ( Omni -direction Features) Includes, but is not limited to, the listed features.

现就万向特征（Omni -direction Features) 进行简单的描述和说明： a) 矩形的表示： A brief description and description of the Omni-direction Features is now available: a) Representation of the rectangle:

如图 3所示，假设图像中存在人体部位（头、手、脚）的子图像窗口区域由 W*H个像素组成，用一个五元组来表示子图像中的任意矩形： r = (x,y,w,h,a), 其中（xj) 为矩形左上顶点的坐标， w和 ?为矩形的长和宽，《为矩形的旋转角度， W， H分别表示子图像窗口的长和宽。并且它们满足： As shown in Fig. 3, it is assumed that the sub-image window area of the human body part (head, hand, foot) in the image is composed of W*H pixels, and a five-tuple is used to represent any rectangle in the sub-image: r = (x , y, w, h, a), where (xj) is the coordinate of the upper left vertex of the rectangle, w and ? are the length and width of the rectangle, "for the rotation angle of the rectangle, W, H respectively represent the length and width of the sub-image window . And they satisfy:

0≤x,x + w≤ ;0≤_y,_y + w, 2>0;a匚 [0。，45⁰]。 0 ≤ x, x + w ≤ ; 0 ≤ y, _y + w, 2>0; a 匚 [0. , 45 ⁰ ].

b) 矩形特征的表示： b) representation of the rectangular feature:

如图 4所示，以 10201特征为例，黑色矩形区域可以代表人体部位 As shown in Figure 4, taking the 10201 feature as an example, the black rectangular area can represent the human body part.

(头、手、脚）区域，可以以任何方向和大小定位到整个矩形区域的任何位置。一个简单的万向特征（Omni -direction Features) 是通过累加周围白色区域的像素的值，再减去中间黑色矩形区域的像素的累加和得到的，图 4中 10201特征代表白色区域面积是黑色区域面积 6.25倍的特征。 The (head, hand, foot) area can be positioned anywhere in the entire rectangular area in any direction and size. A simple Omni-direction feature is obtained by accumulating the values of the pixels in the surrounding white area and subtracting the sum of the pixels in the middle black rectangular area. The feature of 10201 in Figure 4 represents that the white area is a black area. Features 6.25 times the area.

矩形特征值用如下公式表示： The rectangular feature values are represented by the following formula:

feature i = ^ ω_ί -RQcS m^) _Feature i = ^ ω _ί -RQcS m^)

ieI={l,-,N} (1) 其中，为第 z个矩形的权值， Rec m )代表第 _z个矩形内所有像素值的和， N为组成特征的矩形个数。 ieI = {l, -, N } (1) where z is the rectangle of weights, Rec m) and the representative of all the pixel values within the rectangle of the _z, N is the number of rectangles features.

假设组成图 5所示的矩形特征 10201的两个矩形分别为 , 。其中包含^ 且的面积等于 6.25倍的的面积。根据矩形权值异号且与其面积成反比，得到两个矩形的权值比为 -1:6.25。根据图 3定义的五元组， =(0,0,20,20,0°) ₂=(5,4,8,8,0°)，那么由公式（1)给出的计算矩形特征的一般方法，可得到该矩形特征为： It is assumed that the two rectangles constituting the rectangular feature 10201 shown in FIG. 5 are respectively . The area containing ^ and the area equal to 6.25 times. According to the rectangular weight and the inverse of its area, the weight ratio of the two rectangles is -1: 6.25. According to the quintuple defined in Figure 3, =(0,0,20,20,0°) ₂ =(5,4,8,8,0°), then the calculation of the rectangular feature given by equation (1) In general, the rectangular feature is:

feature_i = -1 · Re cSum(0, 0, 20, 20, 0° ) + 6.25 · Re cSum(5, 4, 8, 8, 0° ) 其中， : «2的比值由特征原型确定，是一个固定值。即从同一特征原型中派生出来的所有矩形特征，都是该特征原型的缩放，其权值比不发生变化。 Feature _i = -1 · Re cSum(0, 0, 20, 20, 0° ) + 6.25 · Re cSum(5, 4, 8, 8, 0° ) Among them, the ratio of «2 is determined by the feature prototype and is a fixed value. That is, all the rectangular features derived from the same feature prototype are scaled by the feature prototype, and the weight ratio does not change.

由于训练样本有上万个，并且万向特征（Omni -direction Features) 的数量十分庞大，如果每次进行特征值的计算都需要统计矩形内所有像素之和，将会大大影响训练和检测的速度。 Paul Viola等引入了一种新的图像的表示方法一积分图像，矩形特征的特征值计算，只与此特征矩形的端点的积分图有关，所以不管此特征矩形的尺度变换如何，特征值的计算所消耗的时间都是常量（样本特征计算流程图如图 7 所示）。这样只要遍历图像一次，就可以求得所有子窗口的特征值。利用它可以快速计算矩形特征。 Since there are tens of thousands of training samples and the number of Omni-direction Features is very large, if you calculate the eigenvalues every time you need to count the sum of all the pixels in the rectangle, it will greatly affect the speed of training and detection. . Paul Viola et al. introduced a new image representation method, an integral image, and the eigenvalue calculation of the rectangular feature is only related to the integral graph of the endpoint of the feature rectangle, so no matter how the scale transformation of the feature rectangle is, the calculation of the feature value The time consumed is constant (the sample feature calculation flow chart is shown in Figure 7). In this way, as long as the image is traversed once, the feature values of all the child windows can be obtained. Use it to quickly calculate rectangular features.

积分图的定义为： x'≤x ≤y (3) 其中， /(x' )为图像在点 (x' )处的像素值。 The integral map is defined as: x'≤x ≤y (3) where /(x' ) is the pixel value of the image at point (x').

为了节约时间，减少重复计算，则图像 /的积分图可按如下递推公式计算： To save time and reduce double counting, the image/integral map can be calculated as follows:

s(x,y) = s(x,y-\) + i(x,y) s(x,y) = s(x,y-\) + i(x,y)

ii(x, y) = ii(x -l,y) + s(x, y) (₄) 其中，为点 (XJ处的像素值，为点 (XJ 的累计行像素总和，

为点 (x, 的积分图， χ,-1)=0， "'(-1, =0。 Ii(x, y) = ii(x -l,y) + s(x, y) ( ₄ ) where is the point (the pixel value at XJ, which is the point (the sum of the cumulative line pixels of XJ,

For the point (x, the integral graph, χ, -1) = 0, "'(-1, =0.

这样就可以进行 2种运算： This allows 2 operations to be performed:

(0任意矩形区域内像素积分。由图像的积分图可方便快速地计算图像中任意矩形内所有像素灰度积分图如图 6(a)所示。如图 6(b)所示，点 1的积分图像 1的值为 (其中 Sum为求和）： (0 pixel integral in any rectangular area. The integral map of the image can be used to quickly and easily calculate the gray level integral map of all pixels in any rectangle in the image as shown in Fig. 6(a). As shown in Fig. 6(b), point 1 The value of integral image 1 (where Sum is summed):

zzl=Sum (A) (5) 同理，点 2、点 3、点 4的积分图像分别为： Zzl=Sum (A) (5) Similarly, the integral images of points 2, 3, and 4 are:

z2=Sum(A)+Sum(B); (6) zz3=Sum(A)+Sum(C); (7) ,,4=Sum(A)+Sum(B)+Sum(C)+Sum(D); (8) 矩形区域 D内的所有像素灰度积分可由矩形端点的积分图像值得到： Z2=Sum(A)+Sum(B); (6) zz3=Sum(A)+Sum(C); (7) ,,4=Sum(A)+Sum(B)+Sum(C)+Sum (D); (8) The gray level integration of all pixels in the rectangular area D can be obtained from the integral image values of the rectangular end points:

Sum(D)= l+ 4-( 2+ 3); (9) Sum(D)= l+ 4-( 2+ 3); (9)

(11) 特征值计算 (11) Eigenvalue calculation

矩形特征的特征值是两个不同的矩形分区图元和之差，由 (9)式可以计算任意矩形特征的特征值，下面以图 6(b)中特征原型 A为例说明特征值的计算。 The eigenvalue of the rectangular feature is the difference between two different rectangular partitions. The eigenvalue of any rectangular feature can be calculated by (9). The following is an example of the feature prototype A in Figure 6(b). .

如图 6(c)所示，该特征原型的特征值定义为： As shown in Figure 6(c), the feature values of the feature prototype are defined as:

Sum(A)-Sum(B) (10) 根据（9) 式则有： Sum(A)-Sum(B) (10) According to (9), there are:

Sum(A)= 4+ l -(Z2+ 3); (11) Sum(B)= 6+ 3-( 4+ 5); (12) 所以此类特征原型的特征值为： Sum(A)= 4+ l -(Z2+ 3); (11) Sum(B)= 6+ 3-( 4+ 5); (12) So the eigenvalues of such feature prototypes are:

( 4- 3)-( 2- 1)+( 4- 3)-( 6- 5)； (13) 另示：运用积分图可以快速计算给定的矩形内部所有像素值之和 Sumr：)。假设 r=(x,y,_W,h;),那么通过积分图计算此矩形内部所有像素值之和等价于下面这个式子： ( 4- 3)-( 2- 1)+( 4- 3)-( 6- 5); (13) Additional: The integral graph can be used to quickly calculate the sum of all pixel values within a given rectangle, Sumr:). Assuming r = (x, y, _W , h;), then the sum of all pixel values inside this rectangle is calculated by the integral graph equivalent to the following formula:

Sum(r)= (x+w,y+h)+ (x- 1 ,y- 1 )- (x+w,y- 1 )-ii(x- 1 ,y+h)； (14) 由此可见，矩形特征特征值计算只与此特征端点的积分图有关，而与图像坐标值无关。对于同一类型的矩形特征，不管特征的尺度和位置如何，特征值的计算所耗费的时间都是常量，而且都只是简单的加减运算。其它类型的特征值计算方法类似。 Sum(r)= (x+w,y+h)+ (x-1,y-1)-(x+w,y-1)-ii(x-1,y+h); (14) It can be seen that the rectangular feature eigenvalue calculation is only related to the integral map of the feature endpoint, and is independent of the image coordinate values. For the same type of rectangular feature, regardless of the scale and position of the feature, the time it takes to calculate the feature value is constant, and it is simply a simple addition and subtraction operation. Other types of eigenvalue calculation methods are similar.

c) 编码型万向特征 (Omni -direction Features) c) Omni-direction Features

由于人体部位（头、手、脚）方向性和多义性的特征，很难由统一的结构化刚体模型进行描述，所以特征的构造并不局限在正矩形或正方体的形状，而是变化万千的形状特征，图 8给出了三种特征依次为：矩形编码、菱形编码和椭圆编码。以矩形编码为例，对于矩形特征，多个小矩形排列成矩形的形状，左图中各个小矩形中的数字为该位置的像素值；求取这个矩形特征所有位置的像素平均值，大于平均值的像素值为有效值，设为 1，小于平均值的像素值设为 0，为计算简便和有限性，选取特征的边缘位置进行计算比较即可，如此，形成由元素 1、 0 组成的矩形特征，如中图所示；元素为 0的矩形框，也就是像素值低于平均值的矩形框，认为是可以代表人体部位（头、手、脚）区域的矩形框，如右图中的黑色矩形框所示，由此，形成右图所示的特征模板。图 8中所列特征均是以 8位二进制数描述，对于各个特征的中图，从特征左上角开始，顺时针绕边缘进行编码排序，这种描述利于直观的得到特征的内部结构，如矩形特征以 00011001 (无中心位置值）来描述，也可以十进制数表示，转化为十进制为 25; 菱形特征为： 00011001或 25; 椭圆特征为 00110001或 49，图 8中的三种特征编码，均体现了左下部分和右上部分的对比关系。 Due to the directionality and ambiguity of the human body parts (head, hand, foot), it is difficult to describe it by a unified structured rigid body model, so the structure of the feature is not limited to the shape of a positive rectangle or a cube, but varies Thousands of shape features, Figure 8 shows three features in order: rectangular coding, diamond coding and elliptical coding. Taking rectangular coding as an example, for a rectangular feature, a plurality of small rectangles are arranged in a rectangular shape, and the numbers in the small rectangles in the left figure are the pixel values of the position; the average value of the pixels at all positions of the rectangular feature is greater than the average The pixel value of the value The effective value is set to 1, and the pixel value smaller than the average value is set to 0. For the convenience and finiteness of calculation, the edge position of the feature is selected for calculation and comparison, thus forming a rectangular feature composed of elements 1, 0, such as The figure shows a rectangular box with an element of 0, that is, a rectangular box with a pixel value lower than the average value, which is considered to be a rectangular frame that can represent the area of the human body (head, hand, foot), such as the black rectangular frame in the right picture. Thus, the feature template shown in the right figure is formed. The features listed in Figure 8 are all described in 8-bit binary numbers. For the middle image of each feature, starting from the upper left corner of the feature, the code is sorted clockwise around the edge. This description facilitates intuitively obtaining the internal structure of the feature, such as a rectangle. The feature is described by 00010001 (no center position value), and can also be expressed as a decimal number, converted to decimal as 25; the diamond shape is: 00011001 or 25; the ellipse feature is 00110001 or 49, and the three feature codes in Figure 8 are reflected. The contrast between the lower left part and the upper right part.

编码型万向特征的定义可以表示为： The definition of a coded universal feature can be expressed as:

其中： MP 为万向特征值，代表人体部位（头、手、脚）区域的黑色矩形框的面积， g_£ 为判定阈值函数， S为矩形框

Where: MP is the universal eigenvalue, representing the area of the black rectangular frame of the human body part (head, hand, foot) area, g _£ is the judgment threshold function, and S is the rectangular frame

的个数。 The number.

d) 多层的万向特征 ( Omni -direction Features ) d) Multi-layered universal features ( Omni -direction Features)

直接从黑色区域减去白色区域仅仅体现了两个矩形的特征，它不能具体化特征在图像中的位置。可以通过计算不同系数的多层像素来解决这个问题；然而这将带来巨大的计算量。为了保留位置特征并且获得较快的计算速度，本发明将简单的一个万向特征（Omni -direction Features) 扩展到多层的一簇万向特征（Omni -direction Features )。 Subtracting the white area directly from the black area embodies only the features of the two rectangles, which does not materialize the position of the feature in the image. This problem can be solved by computing multi-layer pixels with different coefficients; however, this will bring a huge amount of computation. In order to preserve positional features and achieve faster computational speeds, the present invention extends a simple Omni-direction feature to a multi-layered Omni-direction Features.

创建多层万向特征（Omni -direction Features) 的目的是为了降低因为直接的黑白区域求差引起的位置信息损坏，而保持位置信息的完整性 ₍ 多层的万向特征（Omni -direction Features) 如图 9所示。矩形区域是由黑到白逐级递进的。特征的计算是由黑色矩形区域的像素和减去最外层白色区域的像素和之后，加上围绕在黑色矩形区域的灰色区域的像素和。多层的万向特征（Omni -direction Features) 使得图像的特征更加柔和。同样利用公式（1): f ∑ A'RecSO, 如图 ₉中的第一个特征，设三个矩形的五元组分别为 =(0,0,20,20,0°)， r₂ =(5,5,10,10,0°) , r₃ =(7, 7,5,5,0°)对应的权值由面积以及比例关系得到：：： ^ = -1： 2： 8。则矩形特征值为： The purpose of creating multilayer universal feature (Omni -direction Features) is to reduce the damage position information as direct black and white areas caused by poor demand, while maintaining the integrity of the location information _(universal feature of the multilayer (Omni -direction Features) As shown in Figure 9. The rectangular area is progressively progressive from black to white. The feature is calculated by subtracting the outermost layer from the pixels of the black rectangular area. The pixels of the white area are followed by the sum of the pixels surrounding the gray area in the black rectangular area. Multi-layer Omni-direction Features make the image's features softer. Also use the formula (1): f ∑ A'RecSO, as shown in the first feature in Figure ₉ , let the five rectangles of the five rectangles be = (0,0,20,20,0°), r ₂ = (5,5,10,10,0°), r ₃ =(7, 7,5,5,0°) The weights are obtained from the area and proportional relationship: : : ^ = -1: 2: 8. Then the rectangular feature value is:

=— 1 · Re cSum(0, 0,20, 20,0°) + 2 -Re cSum(5, 5,10,10, 0°) + 8-Re cSumil, 7,5,5,0°) 。歩骤 3，对于所述万向特征基于并行级联的统计学习算法训练分类器，得到贡献力最大的那些万向特征。 =— 1 · Re cSum(0, 0,20, 20,0°) + 2 -Re cSum(5, 5,10,10, 0°) + 8-Re cSumil, 7,5,5,0°) . Step 3: Train the classifier based on the parallel cascading statistical learning algorithm for the universal feature to obtain those universal features with the greatest contribution.

可以通过万向特征（Omni -direction Features)提取到大量的人体部位（头、手、脚）特征，但是有些特征在检测阶段未必有实际意义。挑选和浓缩这些特征，对于在不影响检测率的情况下，减少冗余特征的证明和计算是十分必要的。为了克服这个问题，本发明采用基于统计学习的理论来选择贡献力最大的那些特征。所述贡献力是指所选特征对于检测***的有效性，也就是判断所选特征是否可以有效的判定待检测图像是否含有人体部位。但由于一般情况下，应用到统计学习的特征数量非常大，并且样本的数量需要满足一定的比例，因此将所有的特征应用到一个分类器训练是一件十分困难的事情。所以本发明提出图 10 所示的一种并行级联的统计学习算法，采用分类并行的分类训练 103最终组合成强分类器 104来解决这个问题。每种分类器的训练在达到 600个特征的时候就会停止（分类器必须设定一个停止的参数，无限制的训练下去，后期的训练已经没有意义）。图 10为本发明统计学习训练方法的流程图。 A large number of human body parts (head, hand, foot) features can be extracted by Omni-direction Features, but some features may not be meaningful at the detection stage. Selecting and concentrating these features is necessary to reduce the redundancy and reliability of the features without affecting the detection rate. To overcome this problem, the present invention employs a theory based on statistical learning to select those features that have the greatest contribution. The contribution is the validity of the selected feature for the detection system, that is, whether the selected feature can be effectively determined whether the image to be detected contains a human body part. However, since the number of features applied to statistical learning is very large in general, and the number of samples needs to meet a certain ratio, it is very difficult to apply all the features to a classifier training. Therefore, the present invention proposes a parallel cascaded statistical learning algorithm as shown in Fig. 10, which uses classification parallel training 103 to finally combine into a strong classifier 104 to solve this problem. The training of each classifier will stop when it reaches 600 features (the classifier must set a stop parameter, unrestricted training, and later training is meaningless). FIG. 10 is a flowchart of a statistical learning training method according to the present invention.

(0 统计学习方法 (0 statistical learning method

此处的训练目标是通过对判断得出的真假样本进行分析，选择分类错误率最低的 T个弱分类器，最终优化组合成一个强分类器。训练方法具体为： The training goal here is to analyze the true and false samples obtained by the judgment, select the T weak classifiers with the lowest classification error rate, and finally optimize the combination into a strong classifier. The training method is specifically as follows:

1、给定训练集：（^ ),...,^^^), 其中 {1,-1}，表示样本的正确的类别标签， z = l,...,N ，令 ^ )表示第 z副图像的第个特征值。 2、计算训练集上样本的初始分布： A(0 (17) m 1. Given a training set: (^ ),...,^^^), where {1,-1}, indicates the correct category label for the sample, z = l,...,N , and ^ ) The first feature value of the zth sub-image. 2. Calculate the initial distribution of the sample on the training set: A(0 (17) m

3、对于所有样本的所有特征（此处省略了按照前文所述歩骤计算样本图像积分、使用万向特征模板计算万向特征值、最终得到特征集合的歩骤），寻找弱分类器 (t = l, ..., T ) ₀ 对于每个样本中的第个特征，可以得到一个弱分类器 ^ 即可得到阈值和方向 C 使得 s₃ =∑D_t(x₁ h]为：

3. For all the features of all samples (the steps of calculating the sample image integral according to the previous step, calculating the gimbal feature value using the universal feature template, and finally obtaining the feature set are omitted), and looking for the weak classifier (t = l, ..., T ) ₀ For the first feature in each sample, a weak classifier ^ can be obtained to get the threshold and direction C such that s ₃ = ∑D _t (x ₁ h) is:

其中，决定不等式的方向，只有 ±1两种情况。 Among them, the direction of determining the inequality is only ±1.

4、在所有样本的所有特征中挑选出一个具有最小误差^的弱分类器 4. Pick a weak classifier with the smallest error ^ among all the features of all samples.

5、对所有的样本权重进行更新:

5. Update all sample weights:

N N

其中， 4是使∑D_i+1(x,) = l的归一化因子，为弱分类器在强分 Where 4 is the normalization factor for ∑D _i+1 (x,) = l, which is the weak classifier in the strong score

i=l i=l

类器中的权重，和 ^的分类错误成反比。 The weight in the classifier is inversely proportional to the classification error of ^.

6、经过 T轮训练得到 T个最优的弱分类器，将 T个最优的弱分类器组成一个强分类器；

6. After T round training, T optimal weak classifiers are obtained, and T optimal weak classifiers are formed into a strong classifier;

7、经过 L次训练得到 L个强分类器。 7. After L training, L strong classifiers are obtained.

将每次训练得到的强分类器组合在一起形成分级分类器。分级分类器中每层的强分类器经过阈值调整，使得每一层都能让几乎全部的人体部位（头、手、脚）样本通过，而拒绝很大一部分非人体部位（头、手、脚）样本。将由更重要特征构成的结构较简单的强分类器放在前面，这样可以先排除大量的假样本。尽管随着级数的增多矩形特征在增多，但计算量却在减少，检测的速度在加快，使本发明方法具有很好的实时性。 The strong classifiers obtained from each training are combined to form a hierarchical classifier. The strong classifier of each layer in the hierarchical classifier is threshold-adjusted so that each layer can pass almost all parts of the body (head, hand, foot) and reject a large part of the non-human parts (head, hand, foot) ) Sample. A simple classifier with a simpler structure consisting of more important features is placed in front, so that a large number of false samples can be excluded first. Although the number of rectangles increases with the increase of the number of stages, the amount of calculation is decreasing, and the speed of detection is increasing, so that the method of the invention has good real-time performance.

(11) 并行级联的分类器 (11) Parallel cascaded classifier

一般情况下，应用到学习训练的特征数量非常大，并且样本的数量需要满足一定的比例，因此将所有的特征应用到一个分类器训练是一件十分困难的事情。图 11 描绘了一个可以很好的解决这个问题的并行级联分类器。如图 11所示，首先将大量的万向特征（Omni -direction Features) 随机分为 n组， {fl, f2... fn}。那些好的候选特征通过分类器从并行的组里被挑选出来。从各组选出来的贡献相对比较大的特征结合成一组新的特征集，然后便可以在新组成的特征集中应用强分类器训练出贡献最大的那些特征。比较未经过挑选的特征集（一个万向特征（Omni -direction Features)在 28*24像素图的子检测窗口中的矩形特征数量总计为 96600 个），经过挑选的特征集的特征数目远小于未经过挑选的特征集，在本发明的实验中，设定分类器的训练在达到 600个特征的时候停止。 In general, the number of features applied to learning training is very large, and the number of samples A certain proportion needs to be met, so it is very difficult to apply all the features to a classifier training. Figure 11 depicts a parallel cascade classifier that can solve this problem very well. As shown in FIG. 11, a large number of Omni-direction features are randomly divided into n groups, {fl, f2...fn}. Those good candidate features are selected from the parallel groups by the classifier. The relatively large-scale features selected from each group are combined into a new set of features, and then the strong classifiers can be applied to the newly composed feature sets to train those features that contribute the most. Comparing the unselected feature set (the total number of rectangular features in the sub-detection window of the 28*24 pixel graph by Omni-direction Features is 96,600), and the number of features of the selected feature set is much smaller than that of the After the selected feature set, in the experiment of the present invention, the training of the set classifier is stopped when 600 features are reached.

具体实施办法是： The specific implementation method is:

1.将遍历样本得到的大量万向特征（Omni -direction Features) 进行随机分组，默认分为《组，每组特征集用 / ^… 表示。 1. Randomly group the Omni-direction Features obtained by traversing the samples. The default is divided into groups, and each set of features is represented by / ^.

2.运用（0 中阐述的算法分别对这《组特征集进行分类训练，每组选择对检测贡献最大的那些特征。 2. Use the algorithm described in (0) to classify the “group feature set”, and each group selects those features that contribute the most to the test.

3.将第 2歩中选择得到的具有较大贡献的特征进行整理组合，得到一组新的特征集，这个特征集中的特征数目远小于未经过筛选的原始特征数目，并且特征的总体有效性和贡献度均远优于原始特征。 3. Combine the features with larger contributions selected in the second , to obtain a new set of features. The number of features in this feature set is much smaller than the number of original features that have not been screened, and the overall validity of the feature. And the contribution is far superior to the original features.

4.再次运用强分类器，调整阈值进行分类，选择出最终的待用特征集，此时选到的特征数目是所有分组特征集选择得到的特征集数目的 \ln , 有效性是原来的《倍，可获得到较高的检测率。 4. Use the strong classifier again, adjust the threshold to classify, and select the final inactive feature set. The number of features selected at this time is \ln of the number of feature sets selected by all group feature sets, and the validity is the original one. Times, a higher detection rate is obtained.

歩骤 4，基于歩骤 3得到的贡献力最大的万向特征构造分类器，对从深度摄像头实时读入的图像进行人体部位的分类检测，并对检测出的人体部位区域进行标注显示。 Step 4: Based on the universally-characteristic feature classifier obtained in step 3, the image read in real time from the depth camera is subjected to classification detection of the human body part, and the detected human body part area is marked and displayed.

如图 12所示，歩骤 4进一歩包括以下歩骤： As shown in Figure 12, step 4 includes the following steps:

歩骤 4.1，对从深度摄像头读入的视频抓取帧图片保存； Step 4.1: Capture a frame image of a video read from a depth camera;

歩骤 4.2，对抓取的帧图片进行深度归一化处理； Step 4.2: Performing a deep normalization process on the captured frame image;

深度图片的像素值是从 0到 9999，为加快后续计算，需要将图片的像素值归一化为常用的 0到 255。所述深度归一化处理的具体歩骤为：歩骤 4.2.1，设定一个大小为 10000 的深度直方图数组 g_pDepthHist[10000]，用以统计像素分布； The pixel value of the depth image is from 0 to 9999. To speed up subsequent calculations, the pixel values of the image need to be normalized to 0 to 255. The specific steps of the depth normalization process are: Step 4.2.1, setting a depth histogram array g_pDepthHist[10000] of size 10000 to calculate the pixel distribution;

歩骤 4.2.2，遍历从深度摄像头抓取的深度图片，寻找深度对应的索引值，索引值不为 0 的，该深度像素值的个数加 1 ， g_pDepthHist[curDepth]++，并累计深度索引值不为 0 的总个数 nNumberOfPoints； Step 4.2.2, traversing the depth image captured from the depth camera, looking for the index value corresponding to the depth, the index value is not 0, the number of the depth pixel value is increased by 1, g_pDepthHist[curDepth]++, and the cumulative depth The total number of index values not 0 is nNumberOfPoints;

歩骤 4.2.3，遍历深度直方图数组，计算累计深度直方图， g_pDepthHist[nIndex] += g_pDepthHist[nIndex-l]； Step 4.2.3, traverse the depth histogram array, calculate the cumulative depth histogram, g_pDepthHist[nIndex] += g_pDepthHist[nIndex-l];

歩骤 4.2.4，遍历累计深度直方图，得到映射到 [0〜255]区间的深度查找表数组 g_pDepthHist[nIndex] = (float)(unsigned int)(255 * (l .Of - (g_pDepthHist[nIndex] I nNumberOfPoints)))； Step 4.2.4, traversing the cumulative depth histogram, and obtaining the depth lookup table array mapped to the [0~255] interval g_pDepthHist[nIndex] = (float)(unsigned int)(255 * (l .Of - (g_pDepthHist[nIndex ] I nNumberOfPoints)));

歩骤 4.2.5，遍历深度图片，根据深度值查深度查找表数组，得到 [0〜 255]区间的深度值 (unsigned int)g_pDepthHist[dep]； Step 4.2.5, traverse the depth picture, look up the depth lookup table array according to the depth value, and obtain the depth value of the [0~255] interval (unsigned int) g_pDepthHist[dep];

歩骤 4.3，基于识别窗口多尺度机制对抓取的帧图片提取子图像，用基于贡献力最大的万向特征构造的分类器分别检测子图像是否是人体部位； Step 4.3: extracting a sub-image from the captured frame image based on the multi-scale mechanism of the recognition window, and respectively detecting whether the sub-image is a human body part by using a classifier constructed based on the universal feature with the largest contribution;

初始识别窗口一般设定为与人体部位训练样本大小相同，即初始人头部位识别窗口为 24 X 28像素，初始人手部位识别窗口为 28 X 24像素、初始人脚部位识别窗口为 24 X 24像素。然后从帧图片的左上角开始遍历整个图像获取子图像，每当遍历完一遍后，识别窗口就进行一次放大，然后再遍历整个图像获取子图像，直到识别窗口的大小大于图片的大小停止。窗口等比放大系数越大，识别窗口的放大次数就越少，截取出的子图像数据就越少，识别率就越低，但识别速度将提高，反之亦然。识别窗口多尺度机制通过改变识别窗口的尺寸来提取子图像，避免了传统方法中的对图像的缩放变换，减少计算量。 The initial recognition window is generally set to the same size as the body part training sample, that is, the initial human head part recognition window is 24 X 28 pixels, the initial human hand part recognition window is 28 X 24 pixels, and the initial human foot part recognition window is 24 X 24 pixels. Then, the entire image is acquired from the upper left corner of the frame image to obtain the sub-image. After each traversal, the recognition window is enlarged once, and then the entire image is traversed to obtain the sub-image until the size of the recognition window is larger than the size of the image. The larger the window magnification ratio, the less the number of magnifications of the recognition window, the less the sub-image data is intercepted, the lower the recognition rate, but the recognition speed will increase, and vice versa. The multi-scale mechanism of the recognition window extracts the sub-image by changing the size of the recognition window, avoiding the scaling transformation of the image in the conventional method, and reducing the amount of calculation.

对于提取出的子图像，用基于贡献力最大的万向特征构造的级联分类器检测子图像是否是人体部位，经过基于贡献力最大的万向特征构造的级联分类器的检测后，待识别帧图片中的很大一部分子图像区域被并行级联分类器的前几层强分类器快速识别为非人体部位（头、手、脚）区域，只有实际可能包含人体部位（头、手、脚）的子图像才能到达最后层的强分类器。 For the extracted sub-images, the cascaded classifier based on the universally-contributed universal feature is used to detect whether the sub-image is a human body part, and after the detection by the cascade classifier based on the universally-contributed universal feature, A large part of the sub-image area in the recognition frame picture is quickly recognized as a non-human part (head, hand, foot) by the first few layers of the parallel classifier. Area, only sub-images that may actually contain human parts (heads, hands, feet) can reach the strong classifier of the last layer.

歩骤 4.4，对检测为人体部位的子图像进行合并，得到帧图片中各人体部位的最终检测结果，并对检测出的人体部位区域进行标注显示。 Step 4.4: Combine the sub-images detected as human body parts to obtain the final detection result of each human body part in the frame picture, and display the detected human body part area.

通过歩骤 4.3，实际可能包含人体部位的多个子图像被检测出来，对检测为人体部位的子图像进行合并处理，只有满足一定条件的合并子图像才最终被确定为实际包含人体部位（手、头、脚），此处的一定条件是指某一通过人体部位判断的子图像附近有一定数目的通过人体部位判断的子图像，也就是说，多个通过人体部位判断的子图像出现重叠。相反，如果只是孤立零散的一个子图像，这种子图像认为是噪声，或者叫不确定的人体部分。检测结果的合并处理可以去除很多误识，进一歩提高检测结果的准确性。最后对检测出来的人体部位（头、手、脚）区域进行标注显示。以上所述的具体实施例，对本发明的目的、技术方案和有益效果进行了进一歩详细说明，所应理解的是，以上所述仅为本发明的具体实施例而已，并不用于限制本发明，凡在本发明的精神和原则之内，所做的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。 Through step 4.3, a plurality of sub-images that may actually include a human body part are detected, and the sub-images detected as human body parts are combined, and only the merged sub-images satisfying certain conditions are finally determined to actually contain the human body parts (hand, Head, foot), the certain condition here refers to a certain number of sub-images judged by the human body part near the sub-image judged by the human body part, that is, a plurality of sub-images judged by the human body part overlap. Conversely, if it is only a sub-image that is isolated and scattered, this sub-image is considered to be noise, or an indeterminate part of the human body. The combined processing of the test results can remove a lot of misunderstandings and further improve the accuracy of the test results. Finally, the detected parts of the human body (head, hand, foot) are marked. The specific embodiments of the present invention have been described in detail with reference to the preferred embodiments of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and scope of the present invention are intended to be included within the scope of the present invention.

Claims

权利要求 Rights request

1、一种基于三维深度图像的人体部位检测方法，其特征在于，该方法包括以下歩骤： A method for detecting a human body part based on a three-dimensional depth image, the method comprising the following steps:

2、根据权利要求 1 所述的方法，其特征在于，所采集的三维深度图像储存有摄像头与拍摄目标各个像素点的距离的深度信息。 2. The method according to claim 1, wherein the acquired three-dimensional depth image stores depth information of a distance between the camera and each pixel of the shooting target.

3、根据权利要求 1所述的方法，其特征在于，所述歩骤 1 中对三维深度图像进行处理，建立人体部位样本数据库的歩骤进一歩包括：将采集到的多幅三维深度图像分别归一化为分辨率为 320x240像素的 BMP图片； The method according to claim 1, wherein the step of processing the three-dimensional depth image in the step 1 and establishing the database of the body part sample comprises: collecting the plurality of three-dimensional depth images respectively Normalized to a BMP image with a resolution of 320x240 pixels;

从归一化后的多幅 BMP 图片中分别分割出头部、手部和脚部，得到多幅人体部位，即头部、手部和脚部的正样本图片； The head, the hand and the foot are separated from the normalized multiple BMP images, and a plurality of human body parts, that is, a positive sample picture of the head, the hand and the foot are obtained;

从归一化后的多幅 BMP 图片中分割出多幅非人体部位的负样本图片； A plurality of negative sample images of non-human parts are segmented from the normalized multiple BMP images;

将所述多幅正样本图片分别与多幅负样本图片组合成头部、手部和脚部三个人体部位样本数据库。 The plurality of positive sample images and the plurality of negative sample images are respectively combined into three human body part sample databases of the head, the hand and the foot.

4、根据权利要求 1 所述的方法，其特征在于，依据人体部位的三维深度图像样本的中间位置是出现待检测部位的稳定位置这一特点，通过比较三维深度图像样本的中间区域和***区域得到特征位置信息，将所述特征位置信息结合人体部位的形状特性构造得到所述万向特征。 4. The method according to claim 1, wherein the intermediate position of the three-dimensional depth image sample according to the human body part is a feature that a stable position of the to-be-detected portion appears, by comparing the intermediate region and the peripheral region of the three-dimensional depth image sample. The feature position information is obtained, and the feature position information is combined with the shape characteristic of the body part to construct the universal feature.

5、根据权利要求 4所述的方法，其特征在于，所述万向特征为类矩形特征，是通过出现人体部位的矩形区域和周围矩形区域按照一定的权值关系进行相互重叠、遮或错层方式的组合得到的形状特征：

The method according to claim 4, wherein the universal feature is a rectangular-like feature, which is overlapped, covered or wrong according to a certain weight relationship by a rectangular region in which a human body portion appears and a surrounding rectangular region. Shape characteristics obtained by the combination of layer modes:

其中，为特征值，为第 z'个矩形的权值， R_{e Cl}¾ W代表第,个矩形内所有像素值的和， N为组成所述形状特征的矩形个数； Wherein, the feature value is the weight of the z'th rectangle, R _{e Cl} 3⁄4 W represents the sum of all pixel values in the first rectangle, and N is the number of rectangles constituting the shape feature;

所述矩形用一个五元组来表示： r = (x,_y,_WA )，其中（x,_y) 为矩形左上顶点的坐标， w和 2为矩形的长和宽，《为矩形的旋转角度。 The rectangle is represented by a quintuple: r = (x, _y, _W A ), where (x, _y) is the coordinate of the upper left vertex of the rectangle, w and 2 are the length and width of the rectangle, "rotation of the rectangle" angle.

6、根据权利要求 4所述的方法，其特征在于，所述万向特征分为任意位置的单层次单矩形特征、多层次多矩形特征、编码型万向特征及组合对角对称特征，所述编码型万向特征又分为组合矩形特征、组合菱形特征和组合椭圆特征，所有类型的特征都能够通过图像积分进行快速计算和提取。 The method according to claim 4, wherein the universal feature is divided into a single-level single-rectangular feature, a multi-level multi-rectangular feature, a coded gimbal feature, and a combined diagonal symmetry feature at an arbitrary position. The coding type universal feature is further divided into a combined rectangular feature, a combined diamond feature and a combined elliptical feature, and all types of features can be quickly calculated and extracted by image integration.

7、根据权利要求 6所述的方法，其特征在于，所述编码型万向特 7. The method according to claim 6, wherein the coding type universal joint

其中， MP为万向特征值， &为能够代表人体部位区域的矩形框的面积 g_c = -∑g₁ , 为判定阈值函数， S为矩形框的个数。 Wherein, MP is a universal eigenvalue, and & is an area of a rectangular frame capable of representing a human body part region, g _c = -∑g ₁ , which is a function of determining a threshold value, and S is a number of rectangular frames.

8、根据权利要求 1 所述的方法，其特征在于，基于并行级联的统计学习算法训练分类器的歩骤进一歩为：将歩骤 2得到的万向特征随机分为《组，通过并行级联的统计学习算法进行同时训练，得到各组中贡献力较大的万向特征，合并这些贡献力较大的万向特征得到一个新的特征集，通过再一次的学习训练，最终得到的贡献力最大的那些万向特征组成强分类器。 8. The method according to claim 1, wherein the step of training the classifier based on the parallel cascading statistical learning algorithm is further: the universal feature obtained in step 2 is randomly divided into groups, by parallel The cascaded statistical learning algorithm performs simultaneous training, and obtains the universal feature with greater contribution in each group. Combining these universal features with greater contribution to obtain a new feature set, through the further learning and training, the final result is obtained. Those universal features that contribute the most are strong classifiers.

9、根据权利要求 8所述的方法，其特征在于，所述贡献力是指所选特征对于人体检测的有效性，也就是判断所选特征是否可以有效的判定待检测图像是否含有人体部位。 9. The method according to claim 8, wherein said contributing force means The feature is selected for the effectiveness of the human body detection, that is, whether the selected feature can be effectively determined whether the image to be detected contains a human body part.

10、根据权利要求 1所述的方法，其特征在于，歩骤 4进一歩包括以下歩骤： 10. The method according to claim 1, wherein the step 4 further comprises the following steps:

11、根据权利要求 10 所述的方法，其特征在于，所述识别窗口多尺度机制进一歩包括： 11. The method according to claim 10, wherein the multi-scale mechanism of the identification window further comprises:

首先将初始识别窗口设定为与人体部位训练样本大小相同；然后从帧图片的左上角开始以初始识别窗口遍历整个图像获取子图像； First, the initial recognition window is set to be the same size as the body part training sample; and then the entire image acquisition sub-image is traversed from the upper left corner of the frame picture with the initial recognition window;

每当遍历完一遍后，识别窗口就进行一次放大，然后再遍历整个图像获取子图像，直到识别窗口大小大于图片大小时停止。 Each time the traversal is completed, the recognition window is zoomed in once, and then the entire image is traversed to acquire the sub-image until the recognition window size is larger than the image size.