WO2013091370A1 - Human body part detection method based on parallel statistics learning of 3d depth image information - Google Patents


Info

Publication number
WO2013091370A1
WO2013091370A1 (PCT/CN2012/077874)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
human body
image
body part
universal
Prior art date
Application number
PCT/CN2012/077874
Other languages
French (fr)
Chinese (zh)
Inventor
黄向生
徐波
Original Assignee
中国科学院自动化研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院自动化研究所 filed Critical 中国科学院自动化研究所
Publication of WO2013091370A1 publication Critical patent/WO2013091370A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/40 — Extraction of image or video features
    • G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 — Local feature extraction by matching or filtering
    • G06V10/446 — Local feature extraction by matching or filtering using Haar-like filters, e.g. using integral image techniques

Definitions

  • The present invention relates to the fields of image processing, pattern recognition, human-computer interaction and visual surveillance, and more particularly to a method for parallel statistical-learning detection of human body parts based on three-dimensional depth image information.
  • Target detection is the most critical step in target recognition. It studies how to let a computer find the region containing a target object in an image or video, much as a human would. Among its sub-problems, human hand detection is the most difficult to study. At present, bare-hand interaction has become an attractive application in virtual games, which will lead to a new round of research on real-time detection of human body parts.
  • Over the past 20 years, a large number of target detection methods have been proposed: for example, detection methods based on neural networks, detection algorithms based on support vector machines, detection methods based on hidden Markov models, and probability-based detection methods.
  • However, most of these algorithms use only the raw pixels of the image as features, and are therefore quite sensitive to illumination changes and noise.
  • The most mainstream target detection method at present is the statistical-model method based on AdaBoost learning.
  • Human body part detection involves image processing, pattern recognition, human-computer interaction and visual surveillance, and has broad applications in virtual reality, human-computer interaction and visual surveillance.
  • The detection of human body parts not only requires constructing target features, performing the corresponding offline training and achieving real-time dynamic monitoring, but also eliminating background noise and unspecific interference, which is a challenging problem that must be faced and overcome.
  • SUMMARY OF THE INVENTION Because human body parts (head, hand, foot) exhibit diversity, directionality, ambiguity and similar factors, training with existing simple features alone does not achieve the desired detection effect. To solve the problem of feature diversity in detecting human body parts (head, hand, foot) and to obtain real-time detection, the present invention provides a novel feature, the universal feature (Omni-direction Features), combined with a parallel cascaded statistical learning algorithm for human body part detection, achieving a high detection rate while guaranteeing real-time operation. It thus plays an important role in target detection and pattern recognition.
  • The data source processed by the present invention is the three-dimensional depth image, which differs greatly from common grayscale and color images.
  • A three-dimensional depth image is image data obtained by reading and storing the distance between the camera and each pixel of the shooting target; different gray levels represent the distance information of the pixel points in the image.
  • Step 1 using a depth camera to collect and process a plurality of three-dimensional depth images, and establishing a sample database of human body parts;
  • Step 2 constructing a universal feature describing each human body part for each image in the body part sample database
  • Step 3 training the classifier based on the parallel cascading statistical learning algorithm for the universal feature, and obtaining those universal features having the greatest contribution;
  • Step 4, based on the most-contributing universal features obtained in Step 3, detect human body parts in the images read in real time from the depth camera, and mark and display the detected human-body-part regions.
  • The beneficial effects of the invention are:
  • a. Real-time target detection, guaranteeing real-time detection speed with superior detection results;
  • b. Compared with other features such as Haar-like features, using universal features (Omni-direction Features) greatly improves the detection rate;
  • c. Training with parallel cascaded classifiers: because training is layered, the number of features assigned to each group's feature set is far smaller than the ungrouped feature count, so training time improves greatly; the training speed is N-1 times the original speed (N is the number of feature groups);
  • d. Because training of a given classifier stops once 600 features are reached (a classifier must set a stopping parameter, since unlimited training becomes meaningless at later stages), the original ungrouped classifier is limited in its feature selection by this feature-count cap, and the features it selects are not sufficiently fine and rich. Although grouped training is also affected by this factor, because the features are divided into N groups, essentially all features assigned to each group can participate in training selection, greatly increasing the number of selectable features;
  • e. The detection rate is greatly improved: because the selected features contribute more than those of ungrouped training, the false-detection rate improves markedly, dropping by nearly a factor of 3.
  • The invention has broad application prospects, plays an important role in target detection, pattern recognition and computer image processing, and also points the way toward real-time detection and tracking in three-dimensional computer applications.
  • FIG. 1 is a flow chart of a parallel statistical learning human body part detecting method based on three-dimensional depth image information proposed by the present invention.
  • Fig. 2 is a view showing an example of a human body part sample database of the present invention.
  • Figure 3 is a rectangular block representation of the Omni-direction Features of the present invention.
  • Figure 4 is a structural diagram of nine simple Omni-direction features of the present invention.
  • Fig. 5 is a diagram showing the calculation of the eigenvalues of an Omni-direction feature of the present invention.
  • FIG. 6 is a schematic diagram of rapidly calculating a rectangular feature value using an image integration map.
  • Figure 7 is a flow chart of the sample feature calculation method.
  • Figure 8 is a diagram showing three expanded Omni-direction Features of the present invention.
  • Figure 9 is a configuration diagram of the Omni-direction Features of the present invention.
  • FIG. 10 is a flow chart of the statistical learning training method of the present invention.
  • Figure 11 is a configuration diagram of a parallel cascade classifier of the present invention.
  • Figure 12 is a flow chart of a method for real-time detection of an image of the present invention.
  • the invention is based on the principle of target detection of statistical learning, and performs target detection and tracking on the acquired three-dimensional depth image.
  • FIG. 1 is a flowchart of the parallel statistical-learning human body part detection method based on three-dimensional depth image information according to the present invention. As shown in FIG. 1, the method includes the following steps:
  • Step 1 The depth camera is used to collect and process a plurality of three-dimensional depth images, and a human body part sample database is established.
  • the sample collection device of the invention is a depth camera, and the collection location is CASIA (Institute of Automation, Chinese Academy of Sciences) High-tech Innovation Center.
  • the data is read from the depth camera during acquisition and the captured video frame is saved.
  • the data stored in the acquired three-dimensional depth image is depth information of the distance between the camera and each of the objects of interest within the shooting angle of view.
  • The principle for establishing the sample database is to cover images of human body parts (heads, hands, feet) in as many postures as possible, so that the selected samples are sufficiently rich.
  • The training sample set was collected from 86 people, each performing 21 prescribed actions, creating an initial data set of 10000 three-dimensional depth images of the human body; all acquired images were normalized to a resolution of 320 × 240 pixels and stored as depth-information images in BMP format.
  • The head, hand and foot were cropped from the normalized pictures: human head samples are 24 × 28 pixels, human hand samples are 28 × 24 pixels, and human foot samples are 24 × 24 pixels. Samples with occlusion, external noise and the like were excluded.
  • This yielded 8000 positive samples of heads, hands and feet, and 7500 negative sample images of non-human-part regions (not head, hand or foot) cut out of the normalized image data.
  • The 8000 positive samples and 7500 negative samples are combined into three sample databases of human body parts (head, hand, foot).
  • Figure 2 shows training samples of human body parts (head, hand, foot):
  • 1010101 represents positive samples of the human head;
  • 1010102 represents positive samples of the human hand;
  • 1010103 represents positive samples of the human foot;
  • 10102 represents negative sample pictures of non-human parts (not head, hand or foot).
  • Step 2, based on the human-body-part sample database, construct universal features describing each human body part, to overcome the ambiguity and diversity of human-body-part changes.
  • Human body parts are characterized by ambiguity and diversity. For example, when the human hand moves, its posture varies widely, which increases the difficulty of detection. Because human body parts (heads, hands, feet) exhibit diversity, directionality and other such factors, no good algorithm for describing such features has existed so far.
  • The present invention proposes a novel feature that describes well the directionality and diversity of human body parts (head, hand, foot): the universal feature (Omni-direction Features).
  • A universal feature is a rectangle-like feature (a combination of rectangular frames that are superimposed, occluded and split into layers), formed by combining the rectangular region in which a human body part appears with the surrounding rectangular regions according to certain weight relationships. Universal features are divided into single-level single-rectangle features at arbitrary positions; multi-level multi-rectangle features; combined rectangular features; combined diamond features; combined elliptical features; and combined diagonally symmetric features.
  • Features 10201-10209 in FIG. 4 are single-level single-rectangle features;
  • 10210 in FIG. 8 is a combined rectangular feature;
  • 10211 is a combined diamond feature;
  • 10212 is a combined elliptical feature;
  • feature 10113 in FIG. 9 is a multi-level multi-rectangle feature;
  • features 10214-10217 are combined diagonally symmetric features.
  • Omni-direction Features can describe well the essential structural characteristics of human body parts (heads, hands, feet), such as ambiguity, diversity and complex deformation. All of the listed features can characterize body parts (head, hand, foot) to a certain extent; they illustrate the concept and construction principle of Omni-direction Features, which include, but are not limited to, the listed features.
  • The black rectangular area can represent the human body part.
  • The (head, hand, foot) area can be positioned anywhere in the entire rectangular area, in any direction and at any size.
  • A simple Omni-direction feature value is obtained by accumulating the pixel values in the surrounding white area and subtracting the sum of the pixels in the middle black rectangular area.
  • Feature 10201 in Figure 4 represents the case where the white area is 6.25 times the area of the black area.
  • Feature = Σ_{i∈I} w_i · RecSum(r_i), i ∈ I = {1, …, N}  (1)
  • where w_i is the weight of rectangle r_i, RecSum(r_i) is the sum of all pixel values within rectangle r_i, and N is the number of rectangles composing the feature.
  • For example: Feature = -1 × RecSum(0, 0, 20, 20, 0°) + 6.25 × RecSum(5, 4, 8, 8, 0°)  (2)
  • The ratio of the weights w_1 : w_2 is determined by the feature prototype and is a fixed value. That is, all rectangular features derived from the same feature prototype are scaled versions of that prototype, and the weight ratio does not change.
  • The integral image is defined as: ii(x, y) = Σ_{x′≤x, y′≤y} i(x′, y′)  (3), where i(x′, y′) is the pixel value of the image at point (x′, y′).
  • The integral image can be computed in a single pass over the image using the recurrences:
  • s(x, y) = s(x, y-1) + i(x, y); ii(x, y) = ii(x-1, y) + s(x, y)  (4), where s(x, y) is the cumulative sum of row y up to column x, with s(x, -1) = 0 and ii(-1, y) = 0.
  • The integral image can be used to quickly and easily calculate the gray-level sum of all pixels in any rectangle of the image, as shown in Fig. 6(a).
  • The value of the integral image at point 1 is the pixel sum of the region above and to its left (where Sum denotes the pixel sum of a region); combining the integral-image values at the four corner points of a rectangle yields the pixel sum of that rectangle.
  • The eigenvalue of a rectangular feature is the difference between the pixel sums of two different rectangular partitions.
  • The eigenvalue of any rectangular feature can be calculated by (9). The following takes feature prototype A in Figure 6(b) as an example.
  • The feature value of the feature prototype is defined in terms of these integral-image rectangle sums.
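The integral-image recurrences (3)-(4) and the four-corner rectangle-sum lookup can be sketched as follows (a minimal Python illustration; the function names and the omission of the 0° rotation parameter are assumptions for illustration, and the rectangle geometry follows the example in formula (2)):

```python
def integral_image(img):
    """ii[y][x] = sum of img[y'][x'] for all y' <= y, x' <= x (Eqs. 3-4)."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        s = 0  # cumulative row sum s(x, y)
        for x in range(w):
            s += img[y][x]
            ii[y][x] = (ii[y - 1][x] if y > 0 else 0) + s
    return ii

def rec_sum(ii, x, y, w, h):
    """Pixel sum of the upright rectangle with top-left (x, y), width w,
    height h, via four integral-image lookups."""
    def at(xx, yy):
        return ii[yy][xx] if xx >= 0 and yy >= 0 else 0
    return (at(x + w - 1, y + h - 1) - at(x - 1, y + h - 1)
            - at(x + w - 1, y - 1) + at(x - 1, y - 1))

def omni_feature(ii):
    """Feature value per Eqs. (1)-(2): a weighted sum of rectangle sums,
    with weights -1 and 6.25 as in the example for feature 10201."""
    return -1 * rec_sum(ii, 0, 0, 20, 20) + 6.25 * rec_sum(ii, 5, 4, 8, 8)
```

On a uniform image the two weighted rectangle sums cancel, which is one quick sanity check that the weight ratio matches the area ratio.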
  • Figure 8 shows three features, in order: rectangular coding, diamond coding and elliptical coding.
  • For the rectangular coding of a rectangular feature, a number of small rectangles are arranged in a rectangular shape; the numbers in the small rectangles in the left figure are the pixel values at those positions. The average pixel value over all positions of the feature is computed; positions whose pixel value is greater than the average are set to 1, and positions whose pixel value is smaller than the average are set to 0.
  • The edge positions of the feature are selected for calculation and comparison, forming a rectangular feature composed of the elements 1 and 0.
  • The rectangular boxes with element 0, i.e., those whose pixel value is below the average, are considered boxes that can represent the human-body-part (head, hand, foot) region, such as the black rectangular boxes in the right figure, which form the feature template shown there.
  • The features listed in Figure 8 are all described by 8-bit binary numbers: for the middle image of each feature, starting from the upper-left corner, the code is read clockwise around the edge (the center position is not coded). This description makes the internal structure of the feature easy to read off: the rectangle feature is described by 00010001 (17 in decimal); the diamond feature by 00011001 (25); and the elliptical feature by 00110001 (49). These three codes correspond to the three features in Figure 8.
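The clockwise 8-bit edge coding can be illustrated with a small sketch (the 3 × 3 cell layout, the use of per-cell average pixel values as input, and the exact clockwise starting point are assumptions based on the description above, not the patent's exact procedure):

```python
def edge_code(cells):
    """cells: 3x3 grid of average pixel values for a feature's sub-rectangles.
    Returns the 8-bit code read clockwise from the top-left edge cell
    (center excluded), with bit 1 = cell value above the grid mean."""
    flat = [v for row in cells for v in row]
    mean = sum(flat) / len(flat)
    # clockwise order around the edge, starting at the top-left corner
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = ''.join('1' if cells[r][c] > mean else '0' for r, c in order)
    return bits, int(bits, 2)  # binary string and its decimal value
```

For example, a grid whose right-hand and lower-left edge cells are bright yields the diamond code 00011001, i.e. 25 in decimal.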
  • where the universal feature value represents the area of the black rectangular box of the human-body-part (head, hand, foot) region, g is the judgment threshold function, and S is the rectangular box.
  • The present invention further extends the simple Omni-direction feature into multi-layer Omni-direction Features.
  • The purpose of creating multi-layer universal features is to reduce the loss of position information caused by abrupt, purely black-and-white partitioning, while maintaining the integrity of the location information; the multi-layer universal features (Omni-direction Features) are shown in Figure 9.
  • The rectangular area transitions progressively from black through gray to white; the feature value is computed from the pixel sums of the outermost white area, the gray area surrounding the black rectangle, and the black rectangular area itself.
  • Multi-layer Omni-direction Features make the image features softer.
  • Step 3 Train the classifier based on the parallel cascading statistical learning algorithm for the universal feature to obtain those universal features with the greatest contribution.
  • A large number of human-body-part (head, hand, foot) features can be extracted as Omni-direction Features, but some of them may not be meaningful at the detection stage; selecting and concentrating these features is necessary to reduce their redundancy without affecting the detection rate.
  • The present invention employs statistical-learning theory to select the features with the greatest contribution. Contribution here means the effectiveness of a selected feature for the detection system, i.e., whether the selected feature can effectively determine whether an image to be detected contains a human body part.
  • Because the number of features used in statistical learning is generally very large, and the number of samples must keep a certain ratio to it, it is very difficult to apply all of the features to a single classifier training.
  • FIG. 10 is a flowchart of a statistical learning training method according to the present invention.
  • The training goal here is to analyze the true and false samples obtained from each judgment, select the T weak classifiers with the lowest classification error rate, and finally optimally combine them into a strong classifier.
  • The training method is as follows:
  • the polarity determining the direction of the inequality takes only the values ±1;
  • the vote weight of each weak classifier in the combined classifier is inversely related to its classification error ε;
  • T optimal weak classifiers are obtained and combined into a strong classifier;
  • the strong classifiers obtained from each round of training are combined into a cascade (hierarchical) classifier.
  • The strong classifier of each layer in the cascade is threshold-adjusted so that each layer passes almost all human-body-part (head, hand, foot) samples while rejecting a large proportion of non-human-part (head, hand, foot) samples.
  • Simpler classifiers built from the most important features are placed first, so that a large number of false samples can be excluded early. Although the number of rectangles increases with the number of stages, the average amount of computation keeps decreasing and the detection speed keeps increasing, so the method of the invention has good real-time performance.
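A weak classifier of the kind described above — a threshold on a single feature value with polarity ±1, whose vote weight is inversely related to its classification error ε — can be sketched generically as follows (this is a standard AdaBoost-style decision stump, not the patent's exact training code):

```python
import math

def best_stump(feature_vals, labels, weights):
    """Exhaustively choose threshold theta and polarity p in {+1, -1}
    minimising the weighted error of h(x) = 1 if p*f(x) < p*theta else 0."""
    best = None
    for theta in sorted(set(feature_vals)):
        for p in (1, -1):
            err = sum(w for f, y, w in zip(feature_vals, labels, weights)
                      if (1 if p * f < p * theta else 0) != y)
            if best is None or err < best[0]:
                best = (err, theta, p)
    return best  # (weighted error, threshold, polarity)

def vote_weight(err, eps=1e-10):
    """Vote weight of a weak classifier in the strong classifier:
    the smaller the error, the larger the vote."""
    return math.log((1.0 - err + eps) / (err + eps))
```

A strong classifier then thresholds the weighted sum of the T selected stumps' votes; adjusting that threshold per layer gives the cascade behavior described above.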
  • Figure 11 depicts a parallel cascade classifier that solves this problem well.
  • The large number of Omni-direction features is randomly divided into n groups, {f1, f2, …, fn}.
  • Good candidate features are selected from the parallel groups by the classifiers.
  • The relatively strong features selected from each group are combined into a new feature set, and strong classifiers are then trained on this newly composed feature set to select the features that contribute the most.
  • Training of a given classifier is stopped when 600 features are reached.
  • the specific implementation method is:
  • Step 4, based on the classifier built from the most-contributing universal features obtained in Step 3, the images read in real time from the depth camera are subjected to classification detection of human body parts, and the detected human-body-part regions are marked and displayed.
  • step 4 includes the following steps:
  • Step 4.1 Capture a frame image of a video read from a depth camera
  • Step 4.2 Performing a deep normalization process on the captured frame image
  • The pixel values of the depth image range from 0 to 9999. To speed up subsequent calculations, the pixel values of the image need to be normalized to 0 to 255.
  • The specific steps of the depth normalization process are: Step 4.2.1, set up a depth histogram array g_pDepthHist[10000] of size 10000 to count the pixel distribution;
  • Step 4.2.2, traverse the depth image captured from the depth camera; for each pixel whose depth value is not 0, increment the count for that depth value, g_pDepthHist[curDepth]++, and accumulate the total number of pixels with non-zero depth as nNumberOfPoints;
  • Step 4.2.5, traverse the depth picture and, for each pixel, look up the depth lookup-table array by its depth value to obtain a value in the interval [0, 255]: (unsigned int)g_pDepthHist[dep].
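The normalization steps above can be sketched as follows (Python rather than the original C-style code; steps 4.2.3-4.2.4 are not given in the text, so the cumulative-histogram accumulation and scaling shown here are an assumption consistent with the surrounding steps):

```python
def normalize_depth(depth_img, max_depth=10000):
    """Map raw depth values (0..9999) to 0..255 using the cumulative depth
    histogram, spreading gray levels over the occupied depth range."""
    hist = [0] * max_depth                     # step 4.2.1: histogram array
    n_points = 0
    for row in depth_img:                      # step 4.2.2: count non-zero depths
        for d in row:
            if d != 0:
                hist[d] += 1
                n_points += 1
    for d in range(1, max_depth):              # assumed step: cumulative histogram
        hist[d] += hist[d - 1]
    if n_points:                               # assumed step: scale into [0, 255]
        for d in range(max_depth):
            hist[d] = int(256 * hist[d] / n_points) if hist[d] else 0
    # step 4.2.5: look each pixel up in the table (zero depth stays zero)
    return [[min(255, hist[d]) if d != 0 else 0 for d in row]
            for row in depth_img]
```

Histogram equalization of this kind devotes most of the 256 gray levels to depths that actually occur, rather than linearly compressing the full 0-9999 range.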
  • Step 4.3, extract sub-images from the captured frame image using a multi-scale recognition-window mechanism, and use the classifier constructed from the most-contributing universal features to detect whether each sub-image is a human body part.
  • The initial recognition window is generally set to the same size as the body-part training samples: 24 × 28 pixels for the human head, 28 × 24 pixels for the human hand, and 24 × 24 pixels for the human foot. The entire image is then traversed from its upper-left corner to obtain sub-images. After each traversal the recognition window is enlarged once, and the whole image is traversed again, until the recognition window becomes larger than the image. The larger the window magnification ratio, the fewer times the window is enlarged, the fewer sub-images are extracted and the lower the recognition rate, but the faster the recognition, and vice versa.
  • The multi-scale recognition-window mechanism extracts sub-images by changing the size of the recognition window, avoiding the image-scaling transformations of conventional methods and reducing the amount of computation.
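The multi-scale window traversal described above can be sketched as follows (the enlargement ratio `scale` and the step size `step` are assumed parameters; the patent does not specify their values):

```python
def sliding_windows(img_w, img_h, win_w, win_h, scale=1.25, step=4):
    """Yield (x, y, w, h) sub-windows: traverse the image with the initial
    window size, then enlarge the window by `scale` and traverse again,
    until the window no longer fits inside the image. A larger `scale`
    means fewer windows and faster, but coarser, detection."""
    w, h = win_w, win_h
    while w <= img_w and h <= img_h:
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                yield (x, y, w, h)
        w, h = int(w * scale), int(h * scale)
```

For a 320 × 240 frame and a 24 × 28 initial head window this enumerates windows at roughly ten scales, each cropped directly from the frame, so the frame itself is never rescaled.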
  • The cascaded classifier based on the most-contributing universal features is used to detect whether each sub-image is a human body part. During this detection, a large proportion of the sub-image regions in the frame picture are quickly rejected as non-human-part (head, hand, foot) regions by the first few layers of the parallel classifier; only sub-images that may actually contain human parts (heads, hands, feet) reach the strong classifier of the last layer.
  • Step 4.4, merge the sub-images detected as human body parts to obtain the final detection result for each human body part in the frame picture, and display the detected human-body-part regions.
  • Step 4.3 detects a number of sub-images that may contain a human body part. These sub-images are merged, and only merged sub-images satisfying a certain condition are finally judged to actually contain a human body part (hand, head, foot). The condition is that a certain number of other sub-images judged to be the same human body part lie near a given sub-image, i.e., several sub-images judged to be that body part overlap. Conversely, an isolated, scattered sub-image is considered noise, or an indeterminate part of the human body.
  • This merging of detection results removes many false detections and further improves the accuracy of the detection results.
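The merging rule above — keep only clusters of overlapping detection windows, discard isolated ones — can be sketched as follows (the `min_neighbors` threshold and the single-pass grouping are illustrative assumptions, not the patent's exact procedure):

```python
def overlap(a, b):
    """True if two (x, y, w, h) rectangles intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def merge_detections(rects, min_neighbors=3):
    """Group mutually overlapping detection rectangles; keep only groups
    with at least `min_neighbors` members (isolated windows are treated as
    noise) and return the average rectangle of each kept group."""
    groups = []
    for r in rects:
        for g in groups:
            if any(overlap(r, m) for m in g):
                g.append(r)
                break
        else:
            groups.append([r])
    out = []
    for g in groups:
        if len(g) >= min_neighbors:
            n = len(g)
            out.append(tuple(sum(v[i] for v in g) // n for i in range(4)))
    return out
```

Three overlapping hits around one head survive as a single averaged box, while a lone window elsewhere in the frame is dropped as noise.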

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a human body part detection method based on parallel statistical learning over 3D depth image information. To address the problems that human body parts (head, hands, and feet) undergo complicated shape changes and are hard to describe, a novel feature that embodies the diversity of human body parts is constructed, i.e. a universal feature; a parallel statistical learning method is applied to select effective and sufficient novel features and to form a parallel cascaded classifier, thus performing real-time and highly efficient detection of human body parts.

Description

基于三维深度图像信息的并行统计学习人体部位检测方法 技术领域 本发明涉及图像处理、 模式识别、 人机交互及视觉监控等领域, 尤 其是一种基于三维深度图像信息的并行统计学习人体部位检测方法。  FIELD OF THE INVENTION The present invention relates to the fields of image processing, pattern recognition, human-computer interaction, and visual monitoring, and more particularly to a method for parallel detection of human body parts based on three-dimensional depth image information. .
背景技术 随着计算机性能的逐歩提高和各个领域对计算机使用的不断深入, 人与计算机的交互技术日益成为计算机领域的研究热点。 基于动态序列 图像的目标识别已经成为近年来计算机视觉领域中备受关注的研究内 容, 它主要从图像序列中检测、 识别、 跟踪以及对生物特征理解和描述 进行研究。 BACKGROUND OF THE INVENTION With the improvement of computer performance and the continuous use of computers in various fields, the interaction technology between humans and computers has increasingly become a research hotspot in the field of computers. Target recognition based on dynamic sequence images has become a research topic in the field of computer vision in recent years. It mainly detects, identifies, tracks and understands and describes biometrics in image sequences.
目标检测是目标识别中最为关键的一歩, 是研究如何让计算机以人 的思维方式从图像或视频中找出目标对象所在区域的技术。 其中人手检 测技术是研究难度最大的一个问题。 目前, 赤手交互成为虚拟游戏中非 常吸引人的一项应用, 这将引起新一轮的对人体部位检测实时性研究的 热潮。  Target detection is the most critical aspect of target recognition. It is a technique to study how to let a computer find the target object's area from an image or video in a human mind. Among them, the manual detection technology is the most difficult problem to study. At present, bare-hand interaction has become an attractive application in virtual games, which will lead to a new round of real-time research on human body parts detection.
在过去的 20 年中, 大量的目标检测方法被提出。 例如, 基于神经 网络的检测方法、 基于支持向量机的检测算法、 基于隐形马尔可夫模型 的检测方法和基于概率的检测方法。 然而, 大多数的算法都只是应用图 像的原始像素作为特征, 他们大多对光照变化和噪声十分敏感。 目前最 主流的目标检测方法是基于 AdaBoost学习的统计模型方法。  In the past 20 years, a large number of target detection methods have been proposed. For example, a neural network based detection method, a support vector machine based detection algorithm, a stealth Markov model based detection method, and a probability based detection method. However, most algorithms only use the original pixels of the image as features, and they are mostly sensitive to illumination changes and noise. The most mainstream target detection method at present is based on the statistical model method of AdaBoost learning.
人体部位检测涉及图像处理、 模式识别、 人机交互及视觉监控等领 域, 在虚拟现实、 人机交互、 视觉监控等领域均有着广阔的应用。 人体 部位检测不仅需要完成目标特征的构造和进行相应的脱机训练, 实现实 时的动态监测, 同时还要排除背景噪声和不特定的干扰等问题, 这也是 需要面临和克服的挑战性问题。 发明内容 由于人体部位(头、手、脚)具有多样性、方向性、 多义性等因素, 仅仅应用现有的简单特征训练并不能得到理想的检测效果。 为了解决人 体部位 (头、 手、 脚) 检测中特征多样性的问题, 以及获得实时的检测 效果, 本发明提供了一种新型的特征一万向特征 (Omm -direction Features) , 结合并行级联的统计学习算法进行人体部位检测, 保证实时 检测的情况下实现了高检测率。 从而在目标检测和模式识别等方面具有 重要的作用。 Human body parts detection involves image processing, pattern recognition, human-computer interaction and visual monitoring, and has a wide range of applications in virtual reality, human-computer interaction, and visual surveillance. The detection of human body parts not only needs to complete the construction of target features and corresponding offline training, real-time dynamic monitoring, but also eliminate background noise and unspecified interference, which is also a challenging problem that needs to be faced and overcome. SUMMARY OF THE INVENTION Due to the diversity, directionality, ambiguity and the like of the human body parts (head, hand, foot), the application of the existing simple feature training does not achieve the desired detection effect. In order to solve the problem of feature diversity in the detection of human body parts (heads, hands, feet), and to obtain real-time detection effects, the present invention provides a novel feature Omm-direction Features, combined with parallel cascading The statistical learning algorithm performs human body part detection to ensure high detection rate under real-time detection. Thus it plays an important role in target detection and pattern recognition.
本发明所处理的数据源是三维深度图像, 这与常见的灰度图像和彩 色图像有很大的不同。 三维深度图像是将摄像头与拍摄目标的各个像素 点的距离读取并储存而获得的图像数据, 用不同的灰度来体现图像中像 素点的距离信息。  The data source processed by the present invention is a three-dimensional depth image, which is quite different from common grayscale and color images. The three-dimensional depth image is image data obtained by reading and storing the distance between the camera and each pixel of the shooting target, and different distances are used to represent the distance information of the pixel points in the image.
本发明所提出的一种基于三维深度图像的人体部位检测方法, 其特 征在于, 该方法包括以下歩骤:  A method for detecting a human body part based on a three-dimensional depth image according to the present invention is characterized in that the method comprises the following steps:
歩骤 1, 采用深度摄像头采集多幅三维深度图像并对其进行处理, 建立人体部位样本数据库;  Step 1, using a depth camera to collect and process a plurality of three-dimensional depth images, and establishing a sample database of human body parts;
歩骤 2, 对于人体部位样本数据库中的每幅图像, 构造描述各人体 部位的万向特征;  Step 2: constructing a universal feature describing each human body part for each image in the body part sample database;
歩骤 3, 对于所述万向特征基于并行级联的统计学习算法训练分类 器, 得到贡献力最大的那些万向特征;  Step 3: training the classifier based on the parallel cascading statistical learning algorithm for the universal feature, and obtaining those universal features having the greatest contribution;
歩骤 4, 基于歩骤 3得到的贡献力最大的万向特征, 对从深度摄像 头实时读入的图像进行人体部位的检测, 并对检测出的人体部位区域进 行标注显示。  Step 4: Based on the universal characteristic obtained by the step 3, the human body part is detected on the image read in real time from the depth camera, and the detected human body part area is marked and displayed.
The beneficial effects of the invention are:
a. Real-time target detection is achieved, guaranteeing real-time detection speed with superior detection results.
b. Compared with other features such as Haar-like features, the use of omni-direction features greatly improves the detection rate.
c. Training uses parallel cascaded classifiers. Because the training is layered, the number of features assigned to each group's feature set is far smaller than the ungrouped feature count, which greatly reduces training time: training is N-1 times faster than before, where N is the number of feature groups.
d. Because classifier training is set to stop once 600 features are reached (a classifier must be given a stopping parameter; unlimited training is pointless in its later stages), an ungrouped classifier is constrained by this feature limit during selection, and the chosen features are not sufficiently fine or rich. Although grouped training is affected by the same factor, because the features are divided into N groups, essentially all features assigned to each group can participate fully in training and selection, greatly increasing the number of selectable features.
e. The detection rate is greatly improved. Because the selected features contribute more overall than those of the ungrouped scheme, the false-detection rate improves markedly, dropping by nearly a factor of 3.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a flowchart of the parallel statistical learning method for human body part detection based on three-dimensional depth image information proposed by the present invention.
Fig. 2 shows examples from the human body part sample database of the present invention.
Fig. 3 shows the rectangular-block representation of the omni-direction features of the present invention.
Fig. 4 shows the construction of nine simple omni-direction features of the present invention.
Fig. 5 illustrates the feature value computation for one omni-direction feature of the present invention.
Fig. 6 illustrates the fast computation of rectangular feature values using the image integral map.
Fig. 7 is a flowchart of the sample feature computation method.
Fig. 8 shows the construction of three extended omni-direction features of the present invention.
Fig. 9 shows the construction of the multi-layer omni-direction features of the present invention.
Fig. 10 is a flowchart of the statistical learning training method of the present invention.
Fig. 11 shows the construction of the parallel cascade classifier of the present invention.
Fig. 12 is a flowchart of the method of the present invention for real-time detection in images.
DETAILED DESCRIPTION OF THE INVENTION
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The present invention applies the target detection principle of statistical learning to perform target detection and tracking on acquired three-dimensional depth images. Fig. 1 is a flowchart of the parallel statistical learning method for human body part detection based on three-dimensional depth image information proposed by the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 1: A depth camera is used to acquire multiple three-dimensional depth images, which are processed to build a human body part sample database.
In a detection method based on statistical learning, besides the performance of the learning algorithm and the form of the features, the training set also has a major influence on detector performance; a poorly chosen training set seriously degrades detection. The sample acquisition device of the invention is a depth camera, and samples were collected at the CASIA (Institute of Automation, Chinese Academy of Sciences) High-tech Innovation Center. During acquisition, data are read from the depth camera and frame images grabbed from the video stream are saved. The data stored in the acquired three-dimensional depth images are depth information, i.e. the distance from the camera to each target of interest within the viewing angle. The guiding principle for building the sample database is to cover body-part (head, hand, foot) images in as many environments and postures as possible, so that the selected samples are sufficiently rich. The training sample set of the invention was collected from 86 people, each performing 21 prescribed actions, yielding an initial data set of 10000 three-dimensional depth images of the human body. All acquired images were normalized to a resolution of 320 x 240 pixels; the images are depth-information images in BMP format. Heads, hands and feet were segmented from the normalized images and resized to 24 x 28 pixels for head samples, 28 x 24 pixels for hand samples and 24 x 24 pixels for foot samples. After excluding samples affected by occlusion, external noise and the like, 8000 positive samples each of heads, hands and feet were obtained, and 7500 negative sample images of non-body parts (head, hand, foot) were cut from the normalized image data. The 8000 positive samples of each body part (head, hand, foot) were combined with the 7500 negative samples to form three body-part sample databases. Fig. 2 shows material from body-part (head, hand, foot) training, where 1010101 denotes positive head samples, 1010102 positive hand samples, 1010103 positive foot samples, and 10102 negative sample images of non-body parts.
Step 2: Based on the body part sample database, omni-direction features describing each body part are constructed to overcome the ambiguity and diversity of body-part variation.
Human body parts (head, hand, foot) exhibit ambiguity and diversity. For example, a hand in motion takes on countless poses, which makes hand detection difficult. Because body parts (head, hand, foot) are diverse and directional, no algorithm has so far described such characteristics well. The present invention proposes a new type of feature, the omni-direction feature, which describes the directionality and diversity of body parts (head, hand, foot) well.
Analysis of the collected positive samples shows that in every positive sample image the body part can extend into the image from only one direction and stably occupies the center of the image. Based on this property, the average depth at the center of a sample is greater than the depth of the surrounding positions, and a new feature, called the omni-direction feature, is built from the depth difference between a central rectangular region and the peripheral rectangular region. Omni-direction features are rectangle-like features (shape features obtained by combining rectangular boxes through mutual overlap, occlusion, offset layering, and similar arrangements); they are obtained by combining, with certain weights, the rectangular region in which the body part appears and the surrounding rectangular regions. They are classified into single-layer single-rectangle features at arbitrary positions, multi-layer multi-rectangle features, combined rectangle features, combined rhombus features, combined ellipse features, combined diagonally symmetric features, and so on. In Fig. 4, features 10201-10209 are single-layer single-rectangle features; in Fig. 8, 10210 is a combined rectangle feature, 10211 a combined rhombus feature and 10212 a combined ellipse feature; in Fig. 9, feature 10213 is a multi-layer multi-rectangle feature and features 10214-10217 are combined diagonally symmetric features.
All feature types can be computed and extracted quickly by image integration. Omni-direction features describe well the essential structural characteristics of body parts (head, hand, foot), such as ambiguity, diversity and complex deformation. All the listed features characterize body-part (head, hand, foot) properties to some degree; they are listed to explain the concept and construction principle of omni-direction features, which include but are not limited to the listed features.
The omni-direction features are now briefly described.
a) Representation of a rectangle:
As shown in Fig. 3, suppose the sub-image window region of a body part (head, hand, foot) in an image consists of W*H pixels. Any rectangle in the sub-image is represented by a five-tuple r = (x, y, w, h, a), where (x, y) are the coordinates of the upper-left vertex of the rectangle, w and h are its width and height, a is its rotation angle, and W and H are the width and height of the sub-image window. They satisfy:
0 ≤ x, x + w ≤ W; 0 ≤ y, y + h ≤ H; w, h > 0; a ∈ [0°, 45°].
b) Representation of rectangular features:
As shown in Fig. 4, taking feature 10201 as an example, the black rectangular region can represent the body-part (head, hand, foot) region and can be placed at any position in the whole rectangular region, in any orientation and at any size. A simple omni-direction feature is obtained by accumulating the pixel values of the surrounding white region and subtracting the accumulated sum of the pixels in the central black rectangular region; feature 10201 in Fig. 4 represents a feature in which the area of the white region is 6.25 times the area of the black region.
The rectangular feature value is expressed by the following formula:

feature = Σ_{i ∈ I = {1, ..., N}} ω_i · RecSum(r_i)    (1)

where ω_i is the weight of the i-th rectangle, RecSum(r_i) denotes the sum of all pixel values inside the i-th rectangle, and N is the number of rectangles composing the feature.
Suppose the two rectangles composing the rectangular feature 10201 shown in Fig. 5 are r1 and r2, where r1 contains r2 and the area of r1 equals 6.25 times the area of r2. Since the rectangle weights have opposite signs and are inversely proportional to the areas, the weight ratio of the two rectangles is ω1 : ω2 = -1 : 6.25. With the five-tuples defined as in Fig. 3, r1 = (0, 0, 20, 20, 0°) and r2 = (5, 4, 8, 8, 0°), so by the general method of computing rectangular features given in formula (1), this rectangular feature is:

feature = -1 · RecSum(0, 0, 20, 20, 0°) + 6.25 · RecSum(5, 4, 8, 8, 0°)    (2)

Here the ratio ω1 : ω2 is determined by the feature prototype and is a fixed value; that is, all rectangular features derived from the same feature prototype are scalings of that prototype, and their weight ratio does not change.
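As a concrete illustration of formula (1), the sketch below evaluates a weighted combination of rectangle sums by brute force. The helper names and the direct pixel summation are illustrative assumptions (the patent computes these sums via the integral image, and only axis-aligned rectangles with a = 0° are handled here):

```python
def rec_sum(img, x, y, w, h):
    """Sum of all pixel values inside rectangle r = (x, y, w, h)."""
    return sum(img[row][col] for row in range(y, y + h)
                             for col in range(x, x + w))

def feature_value(img, rects):
    """Eq. (1): rects is a list of (weight, (x, y, w, h)) pairs."""
    return sum(wgt * rec_sum(img, *r) for wgt, r in rects)

# Feature 10201 of Eq. (2) on a 20x20 image of constant depth 1:
img = [[1] * 20 for _ in range(20)]
f = feature_value(img, [(-1.0, (0, 0, 20, 20)),
                        (6.25, (5, 4, 8, 8))])
# r1 covers 400 pixels and r2 covers 64, so f = -400 + 6.25 * 64 = 0
```

On a constant-depth image the weighted sums cancel exactly, which is why the weights are chosen inversely proportional to the rectangle areas.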
Since there are tens of thousands of training samples and the number of omni-direction features is enormous, computing every feature value by summing all pixels inside the rectangles would greatly slow both training and detection. Paul Viola et al. introduced a new image representation, the integral image: the feature value of a rectangular feature depends only on the integral image at the endpoints of the feature rectangle, so regardless of the scale of the rectangle, the time consumed to compute a feature value is constant (the sample feature computation flowchart is shown in Fig. 7). Thus the image need be traversed only once to obtain the feature values of all sub-windows, allowing rectangular features to be computed quickly.
The integral image is defined as:

ii(x, y) = Σ_{x' ≤ x, y' ≤ y} i(x', y')    (3)

where i(x', y') is the pixel value of the image at point (x', y').
To save time and avoid repeated computation, the integral image of image i can be computed by the following recurrences:

s(x, y) = s(x, y-1) + i(x, y)
ii(x, y) = ii(x-1, y) + s(x, y)    (4)

where i(x, y) is the pixel value at point (x, y), s(x, y) is the cumulative row pixel sum at point (x, y), ii(x, y) is the integral image at point (x, y), and s(x, -1) = 0, ii(-1, y) = 0.
这样就可以进行 2种运算:  This allows 2 operations to be performed:
(0任意矩形区域内像素积分。 由图像的积分图可方便快速地计算 图像中任意矩形内所有像素灰度积分图如图 6(a)所示。 如图 6(b)所示, 点 1的积分图像 1的值为 (其中 Sum为求和) :  (0 pixel integral in any rectangular area. The integral map of the image can be used to quickly and easily calculate the gray level integral map of all pixels in any rectangle in the image as shown in Fig. 6(a). As shown in Fig. 6(b), point 1 The value of integral image 1 (where Sum is summed):
zzl=Sum (A) (5) 同理, 点 2、 点 3、 点 4的积分图像分别为:  Zzl=Sum (A) (5) Similarly, the integral images of points 2, 3, and 4 are:
z2=Sum(A)+Sum(B); (6) zz3=Sum(A)+Sum(C); (7) ,,4=Sum(A)+Sum(B)+Sum(C)+Sum(D); (8) 矩形区域 D内的所有像素灰度积分可由矩形端点的积分图像值得 到: Z2=Sum(A)+Sum(B); (6) zz3=Sum(A)+Sum(C); (7) ,,4=Sum(A)+Sum(B)+Sum(C)+Sum (D); (8) The gray level integration of all pixels in the rectangular area D can be obtained from the integral image values of the rectangular end points:
Sum(D)= l+ 4-( 2+ 3); (9) Sum(D)= l+ 4-( 2+ 3); (9)
(ii) Feature value computation
The feature value of a rectangular feature is the difference of the sums over two different rectangular partitions, and equation (9) allows the feature value of any rectangular feature to be computed; feature prototype A in Fig. 6(b) is taken as an example below.
As shown in Fig. 6(c), the feature value of this prototype is defined as:

Sum(A) - Sum(B)    (10)

From equation (9):

Sum(A) = ii4 + ii1 - (ii2 + ii3);    (11)
Sum(B) = ii6 + ii3 - (ii4 + ii5);    (12)

so the feature value of this class of prototype is:

(ii4 - ii3) - (ii2 - ii1) + (ii4 - ii3) - (ii6 - ii5);    (13)

Also note that the integral image allows fast computation of the sum Sum(r) of all pixel values inside a given rectangle. Assuming r = (x, y, w, h), computing the sum of all pixel values inside this rectangle from the integral image is equivalent to:

Sum(r) = ii(x+w, y+h) + ii(x-1, y-1) - ii(x+w, y-1) - ii(x-1, y+h);    (14)

It can be seen that the computation of a rectangular feature value depends only on the integral image at the feature's endpoints and is independent of the image coordinate values. For rectangular features of the same type, regardless of the scale and position of the feature, the time spent computing the feature value is constant and consists only of simple additions and subtractions. Other types of feature values are computed similarly.
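The recurrences of (4) and the four-lookup rectangle sum of (14) can be sketched as follows. One assumption to note: the patent's ii(x+w, y+h) uses a one-past-the-rectangle convention, while the array below stores inclusive sums, so the lookups use x+w-1 and y+h-1; all names are illustrative.

```python
def integral_image(img):
    """ii(x, y) per Eqs. (3)/(4): sum of img over x' <= x, y' <= y.
    Stored as ii[y][x]; out-of-range indices are treated as 0."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0                              # s(x, y), with s(-1, y) = 0
        for x in range(w):
            row_sum += img[y][x]                 # s(x, y) = s(x-1, y) + i(x, y)
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, x, y, w, h):
    """Eq. (14): Sum(r) for r = (x, y, w, h) with four table lookups."""
    def at(xx, yy):
        return ii[yy][xx] if xx >= 0 and yy >= 0 else 0
    return (at(x + w - 1, y + h - 1) + at(x - 1, y - 1)
            - at(x + w - 1, y - 1) - at(x - 1, y + h - 1))
```

Whatever the rectangle's size, rect_sum performs exactly four lookups and three additions/subtractions, which is the constant-time property the text relies on.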
c) Encoded omni-direction features
Because of the directionality and ambiguity of body parts (head, hand, foot), they are hard to describe with a single structured rigid-body model, so the construction of the features is not restricted to upright rectangular or square shapes but covers widely varying shape features. Fig. 8 shows three such features: rectangle coding, rhombus coding and ellipse coding. Taking rectangle coding as an example: for the rectangular feature, several small rectangles are arranged in a rectangular shape, and the number in each small rectangle of the left diagram is the pixel value at that position. The average of the pixel values over all positions of this rectangular feature is computed; pixel values greater than the average are taken as valid and set to 1, and pixel values smaller than the average are set to 0. For simplicity and finiteness of computation, only the edge positions of the feature need be compared, forming a rectangular feature composed of the elements 1 and 0, as shown in the middle diagram. The boxes whose element is 0, i.e. whose pixel value is below the average, are taken as boxes that can represent the body-part (head, hand, foot) region, shown as black boxes in the right diagram, thereby forming the feature template shown on the right. The features listed in Fig. 8 are all described by 8-bit binary numbers: for the middle diagram of each feature, the coding starts at the upper-left corner of the feature and proceeds clockwise around the edge. This description gives an intuitive view of the internal structure of a feature. For example, the rectangle feature is described as 00011001 (no centre-position value), which may also be written in decimal as 25; the rhombus feature is 00011001, or 25; the ellipse feature is 00110001, or 49. All three feature codes in Fig. 8 express the contrast between the lower-left part and the upper-right part.
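A minimal sketch of the edge coding just described, for a 3x3 block. The grid values, the strict greater-than comparison for ties, and the helper name are illustrative assumptions, not taken from the patent:

```python
def encode_edge(grid):
    """8-bit edge code of a 3x3 block: clockwise from the top-left corner,
    1 where the cell value exceeds the mean of all nine cells, else 0."""
    cells = [v for row in grid for v in row]
    mean = sum(cells) / len(cells)
    # Clockwise edge positions (row, col), starting at the top-left corner.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = ''.join('1' if grid[r][c] > mean else '0' for r, c in order)
    return bits, int(bits, 2)

# A block whose high values sit on the right edge and the middle-left cell:
code, value = encode_edge([[1, 1, 1],
                           [9, 3, 9],
                           [1, 1, 9]])
# code == '00011001', value == 25, matching the rectangle code in Fig. 8
```

The decimal value (here 25) is simply the 8-bit edge string read as a binary number, which is how the codes 25 and 49 in the text arise.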
The definition of an encoded omni-direction feature can be expressed as:

[the defining formula appears only as an image in the source publication and is not reproduced here]

where MP is the omni-direction feature value, the area term denotes the area of the black rectangular boxes representing the body-part (head, hand, foot) region, g is the threshold decision function, and S is the number of rectangular boxes.
d) Multi-layer omni-direction features
Directly subtracting the white region from the black region expresses only the characteristics of two rectangles; it cannot make concrete the position of the feature within the image. The problem can be solved by computing multiple layers of pixels with different coefficients, but that entails an enormous amount of computation. To retain the positional characteristics while keeping computation fast, the present invention extends a single simple omni-direction feature to a multi-layer cluster of omni-direction features.
The purpose of creating multi-layer omni-direction features is to reduce the loss of positional information caused by the direct black-white difference and to preserve the integrity of the positional information. The multi-layer omni-direction features are shown in Fig. 9: the rectangular regions progress step by step from black to white. The feature is computed from the pixel sum of the black rectangular region, minus the pixel sum of the outermost white region, plus the pixel sum of the grey region surrounding the black rectangular region. Multi-layer omni-direction features make the image characteristics softer. Again using formula (1), feature = Σ_i ω_i · RecSum(r_i), and taking the first feature in Fig. 9 as an example, let the five-tuples of the three rectangles be r1 = (0, 0, 20, 20, 0°), r2 = (5, 5, 10, 10, 0°) and r3 = (7, 7, 5, 5, 0°); the corresponding weights, obtained from the areas and their proportional relationship, are ω1 : ω2 : ω3 = -1 : 2 : 8. The rectangular feature value is then:

feature = -1 · RecSum(0, 0, 20, 20, 0°) + 2 · RecSum(5, 5, 10, 10, 0°) + 8 · RecSum(7, 7, 5, 5, 0°)

Step 3: For the omni-direction features, a classifier is trained with a parallel cascaded statistical learning algorithm, yielding the omni-direction features with the greatest contribution.
A large number of body-part (head, hand, foot) features can be extracted through the omni-direction features, but some of them are of no practical use at the detection stage. Selecting and condensing these features is essential in order to reduce the verification and computation of redundant features without affecting the detection rate. To overcome this problem, the invention uses theory based on statistical learning to select the features with the greatest contribution. Contribution here refers to the effectiveness of a selected feature for the detection system, i.e. whether the selected feature can effectively determine whether an image to be examined contains a human body part. In general, however, the number of features applied to statistical learning is very large, and the number of samples must meet a certain proportion, so applying all features to the training of a single classifier is very difficult. The invention therefore proposes the parallel cascaded statistical learning algorithm shown in Fig. 10, in which grouped parallel classification training 103 is finally combined into a strong classifier 104 to solve this problem. The training of each classifier stops once 600 features are reached (the classifier must be given a stopping parameter; unlimited training is meaningless in its later stages). Fig. 10 is the flowchart of the statistical learning training method of the invention.
(i) Statistical learning method
The training objective here is to analyze the true and false samples obtained from the decisions, select the T weak classifiers with the lowest classification error rate, and finally combine them optimally into one strong classifier. The training method is as follows:
1. Given the training set (x1, y1), ..., (xN, yN), where yi ∈ {1, -1} is the correct class label of sample xi, i = 1, ..., N, let f_j(x_i) denote the j-th feature value of the i-th image.
2. Compute the initial distribution of the samples over the training set:

D_1(x_i) = 1/N    (17)

3. For all features of all samples (omitting here the steps, described above, of computing the sample image integrals, computing omni-direction feature values with the omni-direction feature templates, and finally obtaining the feature set), find the weak classifiers h_t (t = 1, ..., T). For the j-th feature of each sample, a weak classifier h_j can be obtained by finding the threshold θ_j and direction p_j that minimize the classification error ε_j = Σ_i D_t(x_i) · |h_j(x_i) - y_i|, with:

h_j(x) = 1 if p_j · f_j(x) < p_j · θ_j, and -1 otherwise    (18)

where p_j determines the direction of the inequality and takes only the values ±1.
4. From all features of all samples, select the weak classifier h_t with the smallest error ε_t.
5. Update all sample weights:

D_{t+1}(x_i) = D_t(x_i) · exp(-α_t · y_i · h_t(x_i)) / Z_t    (19)

where Z_t is the normalization factor such that Σ_{i=1}^{N} D_{t+1}(x_i) = 1, and α_t = (1/2) · ln((1 - ε_t)/ε_t) is the weight of weak classifier h_t in the strong classifier H, inversely related to the classification error of h_t.
6. After T rounds of training, T optimal weak classifiers are obtained and combined into one strong classifier:

H(x) = sign( Σ_{t=1}^{T} α_t · h_t(x) )    (20)

7. After L rounds of such training, L strong classifiers are obtained.
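Steps 1-6 above can be sketched as a decision-stump AdaBoost over precomputed feature values. The brute-force threshold search, the error clamp and all names are illustrative simplifications for a toy scale, not the patent's implementation:

```python
import math

def train_adaboost(feats, labels, T):
    """feats[i][j] is f_j(x_i); labels[i] in {+1, -1}."""
    n = len(feats)
    D = [1.0 / n] * n                            # Eq. (17): D_1(x_i) = 1/N
    strong = []                                  # entries (alpha, j, theta, p)
    for _ in range(T):
        best = None                              # (error, j, theta, p)
        for j in range(len(feats[0])):
            for theta in sorted({f[j] for f in feats}):
                for p in (1, -1):
                    # Eq. (18): h(x) = 1 if p*f_j(x) < p*theta else -1
                    err = sum(D[i] for i in range(n)
                              if (1 if p * feats[i][j] < p * theta else -1)
                              != labels[i])
                    if best is None or err < best[0]:
                        best = (err, j, theta, p)
        err, j, theta, p = best                  # step 4: smallest error
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # weak-classifier weight
        h = [1 if p * f[j] < p * theta else -1 for f in feats]
        # Eq. (19): reweight samples, then normalize by Z_t
        D = [D[i] * math.exp(-alpha * labels[i] * h[i]) for i in range(n)]
        Z = sum(D)
        D = [d / Z for d in D]
        strong.append((alpha, j, theta, p))
    return strong

def strong_classify(strong, x):
    """Eq. (20): sign of the alpha-weighted vote of the weak classifiers."""
    s = sum(a * (1 if p * x[j] < p * t else -1) for a, j, t, p in strong)
    return 1 if s >= 0 else -1
```

Misclassified samples gain weight through the exp(-alpha*y*h) factor, so later rounds concentrate on the samples the earlier stumps got wrong.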
The strong classifiers obtained from the successive trainings are combined to form a cascaded classifier. The strong classifier at each level of the cascade is threshold-adjusted so that each level passes almost all body-part (head, hand, foot) samples while rejecting a large proportion of non-body-part (head, hand, foot) samples. Structurally simpler strong classifiers built from the more important features are placed first, so that large numbers of false samples can be eliminated early. Although the number of rectangular features grows with the number of levels, the amount of computation decreases and the detection speed increases, giving the method of the invention good real-time performance.
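The cascade just described can be sketched as a chain of thresholded strong classifiers, where a window is rejected as soon as any level's score falls below that level's adjusted threshold. The score functions below are toy stand-ins, not the trained classifiers:

```python
def cascade_detect(levels, window):
    """levels: list of (strong_classifier_score, threshold) pairs.
    A window is accepted only if it passes every level, so most
    negative windows are rejected by the early, simpler levels."""
    for score, threshold in levels:
        if score(window) < threshold:
            return False          # rejected early; no further computation
    return True

# Toy cascade: a cheap first level and a stricter second level.
levels = [(lambda w: sum(w), 5),
          (lambda w: max(w), 3)]
```

Because most candidate windows fail at the first, cheapest level, average per-window cost stays low even though later levels use many more features.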
(ii) Parallel cascaded classifier
In general, the number of features applied to learning and training is very large, and the number of samples must meet a certain proportion, so applying all features to the training of one classifier is very difficult. Fig. 11 depicts a parallel cascaded classifier that solves this problem well. As shown in Fig. 11, the large set of omni-direction features is first divided randomly into n groups, {f1, f2, ..., fn}. Good candidate features are selected from the parallel groups by classifiers. The features with relatively large contributions selected from each group are combined into a new feature set, and a strong classifier is then applied to this newly formed feature set to train out the features contributing the most. Compared with the unselected feature set (the omni-direction features yield a total of 96600 rectangular features in the 28*24-pixel sub-detection window), the selected feature set contains far fewer features; in the experiments of the invention, classifier training is set to stop when 600 features are reached.
The specific implementation is:
1. The large number of omni-direction features obtained by traversing the samples is divided into random groups, by default n groups, with the feature sets denoted f1, f2, ..., fn.
2. The algorithm described in (i) is applied to each of the n feature sets for classification training, and each group selects the features contributing most to detection.
3. The high-contribution features selected in step 2 are collated and combined into a new feature set; the number of features in this set is far smaller than the number of unscreened original features, and the overall effectiveness and contribution of the features are far better than those of the original features.
4. A strong classifier is applied again with adjusted thresholds to classify and select the final working feature set. The number of features selected at this point is 1/n of the number obtained from all the grouped feature-set selections, and their effectiveness is n times the original, so a higher detection rate is obtained.
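The grouped selection of steps 1-4 can be sketched as follows. The per-feature score is a stand-in for the contribution measured during training (lower is better, e.g. the weighted error of the feature's best stump); all names are illustrative:

```python
import random

def select_top(feature_ids, score, k):
    """Keep the k features with the best (lowest) score."""
    return sorted(feature_ids, key=score)[:k]

def parallel_select(all_features, score, n_groups, k_per_group, k_final):
    random.seed(0)                     # deterministic demo shuffle
    feats = list(all_features)
    random.shuffle(feats)
    # step 1: random partition into n groups
    groups = [feats[g::n_groups] for g in range(n_groups)]
    pool = []
    for g in groups:                   # step 2: per-group training/selection
        pool.extend(select_top(g, score, k_per_group))
    # steps 3-4: merge the group winners and run a final selection pass
    return select_top(pool, score, k_final)
```

Each group's selection pass sees only about 1/n of the features, which is where the training-time saving claimed above comes from, while the final pass still picks from the best candidates of every group.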
Step 4: a classifier is constructed from the highest-contribution omni-direction features obtained in step 3; the images read in real time from the depth camera are classified to detect human body parts, and the detected body part regions are marked and displayed.
As shown in Figure 12, step 4 further comprises the following steps:
Step 4.1: capture and save frame images from the video read in from the depth camera;

Step 4.2: perform depth normalization on the captured frame image;
The pixel values of a depth image range from 0 to 9999. To speed up subsequent computation, they must be normalized to the usual range of 0 to 255. The depth normalization proceeds as follows. Step 4.2.1: allocate a depth histogram array of size 10000, g_pDepthHist[10000], to count the pixel distribution;
Step 4.2.2: traverse the depth image captured from the depth camera; for each non-zero depth value, increment the count at the corresponding index, g_pDepthHist[curDepth]++, and accumulate the total number of non-zero depth values, nNumberOfPoints;
Step 4.2.3: traverse the depth histogram array to compute the cumulative depth histogram, g_pDepthHist[nIndex] += g_pDepthHist[nIndex-1];
Step 4.2.4: traverse the cumulative depth histogram to obtain a depth lookup table mapped to the interval [0, 255]: g_pDepthHist[nIndex] = (float)(unsigned int)(255 * (1.0f - (g_pDepthHist[nIndex] / nNumberOfPoints)));
Step 4.2.5: traverse the depth image and look up each depth value in the lookup table to obtain its value in [0, 255]: (unsigned int)g_pDepthHist[dep];
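Steps 4.2.1-4.2.5 amount to histogram equalization of the depth map. A minimal Python sketch of the same procedure (the in-place reuse of one array mirrors the g_pDepthHist usage in the text; zero depth, meaning no sensor reading, is left at 0):

```python
def normalize_depth(depth_pixels, max_depth=10000):
    """Map raw depth values (0..9999) to 0..255 via the cumulative
    histogram, as in steps 4.2.1-4.2.5; zero depth stays 0."""
    hist = [0] * max_depth                      # step 4.2.1: depth histogram
    n_points = 0
    for d in depth_pixels:                      # step 4.2.2: count non-zero depths
        if d != 0:
            hist[d] += 1
            n_points += 1
    for i in range(1, max_depth):               # step 4.2.3: cumulative histogram
        hist[i] += hist[i - 1]
    if n_points:
        for i in range(max_depth):              # step 4.2.4: lookup table to [0, 255]
            hist[i] = int(255 * (1.0 - hist[i] / n_points))
    return [hist[d] if d != 0 else 0 for d in depth_pixels]  # step 4.2.5

levels = normalize_depth([0, 1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000])
```

Note that the mapping is inverted (1.0 minus the cumulative fraction), so nearer pixels come out brighter than farther ones.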
Step 4.3: extract sub-images from the captured frame image using the multi-scale recognition-window mechanism, and use the classifier constructed from the highest-contribution omni-direction features to determine whether each sub-image is a human body part;
The initial recognition window is generally set to the same size as the body part training samples: 24x28 pixels for the head, 28x24 pixels for the hand, and 24x24 pixels for the foot. Starting from the upper-left corner of the frame image, the whole image is traversed to extract sub-images; after each complete traversal the recognition window is enlarged once and the whole image is traversed again, until the recognition window becomes larger than the image. The larger the window scaling factor, the fewer times the window is enlarged, the fewer sub-images are extracted, and the lower the recognition rate, but the faster the recognition, and vice versa. The multi-scale recognition-window mechanism extracts sub-images by changing the size of the recognition window, which avoids the image scaling transformations of traditional methods and reduces computation.
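The window enumeration just described can be sketched as follows. The scale factor and step size here are illustrative choices, not values fixed by the text:

```python
def multiscale_windows(img_w, img_h, win_w, win_h, scale=1.25, step=4):
    """Enumerate sub-window rectangles by growing the recognition window
    (rather than shrinking the image), starting at the training-sample size
    and stopping once the window outgrows the image."""
    windows = []
    w, h = win_w, win_h
    while w <= img_w and h <= img_h:
        for y in range(0, img_h - h + 1, step):      # traverse from the top-left corner
            for x in range(0, img_w - w + 1, step):
                windows.append((x, y, w, h))
        w, h = int(w * scale), int(h * scale)        # enlarge after each full traversal
    return windows

# Head detection on a 320x240 frame with the 24x28 initial window.
wins = multiscale_windows(320, 240, 24, 28)
```

Each tuple (x, y, w, h) names a sub-image to be passed to the cascade; a larger `scale` produces fewer windows and hence faster but coarser detection, matching the trade-off described above.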
For each extracted sub-image, the cascade classifier constructed from the highest-contribution omni-direction features determines whether it is a human body part. After this detection, a large fraction of the sub-image regions in the frame to be recognized are quickly rejected as non-body-part (head, hand, foot) regions by the first few strong-classifier stages of the parallel cascade; only sub-images that may actually contain a body part (head, hand, foot) reach the final strong-classifier stage.
Step 4.4: merge the sub-images detected as human body parts to obtain the final detection result for each body part in the frame image, and mark and display the detected body part regions.
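One way to realize the merging of step 4.4 is a neighbor-count grouping, sketched below. The `min_neighbors` threshold and the greedy, seed-based clustering are illustrative choices, not details fixed by the text:

```python
def overlaps(a, b):
    """Axis-aligned rectangle intersection test; rects are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def merge_detections(rects, min_neighbors=2):
    """Greedy merge: each unused detection seeds a cluster of detections that
    overlap it. Clusters with at least `min_neighbors` other members are
    averaged into one rectangle; isolated hits are discarded as noise."""
    kept, used = [], [False] * len(rects)
    for i, r in enumerate(rects):
        if used[i]:
            continue
        cluster = [j for j in range(len(rects)) if not used[j] and overlaps(r, rects[j])]
        if len(cluster) - 1 >= min_neighbors:   # enough overlapping neighbours
            for j in cluster:
                used[j] = True
            n = len(cluster)
            kept.append(tuple(sum(rects[j][k] for j in cluster) // n for k in range(4)))
    return kept

# Three overlapping head hits plus one isolated (noise) hit.
hits = [(10, 10, 24, 28), (12, 11, 24, 28), (9, 12, 24, 28), (200, 100, 24, 28)]
merged = merge_detections(hits)
```

The overlapping trio collapses to a single averaged rectangle while the lone detection is dropped, which is exactly the "isolated sub-image is noise" rule described below.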
Through step 4.3, multiple sub-images that may actually contain body parts are detected, and the sub-images detected as body parts are merged; only merged sub-images satisfying a certain condition are finally determined to actually contain a body part (hand, head, foot). The condition here is that a sub-image judged to be a body part must have a certain number of other sub-images judged to be body parts nearby; that is, multiple such sub-images overlap. Conversely, a single isolated, scattered sub-image is regarded as noise, or as an indeterminate body part. Merging the detection results removes many false positives and further improves the accuracy of the results. Finally, the detected body part (head, hand, foot) regions are marked and displayed. The specific embodiments described above further explain in detail the objectives, technical solutions and beneficial effects of the present invention. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A human body part detection method based on three-dimensional depth images, characterized in that the method comprises the following steps:
Step 1: collecting a plurality of three-dimensional depth images with a depth camera and processing them to build a human body part sample database;

Step 2: for each image in the body part sample database, constructing omni-direction features describing each body part;

Step 3: training classifiers on the omni-direction features with a parallel cascaded statistical learning algorithm to obtain the omni-direction features with the greatest contribution;

Step 4: based on the highest-contribution omni-direction features obtained in step 3, detecting human body parts in images read in real time from the depth camera, and marking and displaying the detected body part regions.
2. The method according to claim 1, characterized in that the collected three-dimensional depth images store depth information on the distance between the camera and each pixel of the photographed target.
3. The method according to claim 1, characterized in that the step of processing the three-dimensional depth images and building the body part sample database in step 1 further comprises: normalizing the collected plurality of three-dimensional depth images into BMP images with a resolution of 320x240 pixels;
segmenting the head, hands and feet from the normalized BMP images to obtain a plurality of positive sample images of body parts, namely head, hand and foot;

segmenting a plurality of negative sample images of non-body parts from the normalized BMP images;

combining the plurality of positive sample images with the plurality of negative sample images into three body part sample databases: head, hand and foot.
4. The method according to claim 1, characterized in that, exploiting the fact that the middle of a three-dimensional depth image sample of a body part is the stable position where the part to be detected appears, feature position information is obtained by comparing the middle region and the peripheral regions of the sample, and the omni-direction features are constructed by combining this feature position information with the shape characteristics of the body part.
5. The method according to claim 4, characterized in that the omni-direction feature is a rectangle-like feature: a shape feature obtained by combining the rectangular region in which the body part appears with surrounding rectangular regions in overlapping, occluding or staggered arrangements according to certain weight relations:
$$F = \sum_{i=1}^{N} w_i \cdot \mathrm{RecSum}(r_i)$$

where F is the feature value, w_i is the weight of the i-th rectangle, RecSum(r_i) represents the sum of all pixel values inside the i-th rectangle, and N is the number of rectangles composing the shape feature;

the rectangle is represented by a five-tuple: r = (x, y, w, h, a), where (x, y) are the coordinates of the upper-left vertex of the rectangle, w and h are the length and width of the rectangle, and a is the rotation angle of the rectangle.
6. The method according to claim 4, characterized in that the omni-direction features are divided into single-level single-rectangle features at arbitrary positions, multi-level multi-rectangle features, coded omni-direction features and combined diagonally symmetric features; the coded omni-direction features are further divided into combined rectangle features, combined diamond features and combined ellipse features; and all types of features can be rapidly computed and extracted through integral images.
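The rectangle sums RecSum(r) of claim 5 are what make these features cheap to evaluate: with an integral image (summed-area table), any axis-aligned rectangle sum costs four lookups regardless of its size. A minimal sketch of the axis-aligned case follows; the rotated rectangles of claim 5 would additionally need a tilted integral image, not shown here:

```python
def integral_image(img):
    """Summed-area table with a zero border: I[y][x] holds the sum of
    img[0..y-1][0..x-1], so sums can be read without boundary checks."""
    h, w = len(img), len(img[0])
    I = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            I[y + 1][x + 1] = I[y][x + 1] + row
    return I

def rect_sum(I, x, y, w, h):
    """Sum of pixels in the axis-aligned rectangle (x, y, w, h), O(1)."""
    return I[y + h][x + w] - I[y][x + w] - I[y + h][x] + I[y][x]

# Tiny example: a 2x2 image.
I = integral_image([[1, 2], [3, 4]])
```

A weighted feature value is then just a few `rect_sum` calls combined with the weights w_i, independent of rectangle area.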
7. The method according to claim 6, characterized in that the coded omni-direction feature value is computed as:

$$MP = \sum_{i=1}^{S} h(g_i - g_c), \qquad g_c = \frac{1}{S}\sum_{i=1}^{S} g_i$$

where MP is the omni-direction feature value, g_i is the value of the i-th rectangular frame capable of representing the body part region, g_c is their mean, h(.) is the decision threshold function, and S is the number of rectangular frames.
8. The method according to claim 1, characterized in that the step of training classifiers with the parallel cascaded statistical learning algorithm further comprises: randomly dividing the omni-direction features obtained in step 2 into n groups; training them simultaneously with the parallel cascaded statistical learning algorithm to obtain the higher-contribution omni-direction features of each group; merging these higher-contribution omni-direction features into a new feature set; and, through a further round of learning and training, composing a strong classifier from the omni-direction features finally found to contribute the most.
9. The method according to claim 8, characterized in that the contribution refers to the effectiveness of the selected feature for human body detection, that is, whether the selected feature can effectively determine whether the image to be detected contains a human body part.
10. The method according to claim 1, characterized in that step 4 further comprises the following steps:

Step 4.1: capturing and saving frame images from the video read in from the depth camera;

Step 4.2: performing depth normalization on the captured frame image;

Step 4.3: extracting sub-images from the captured frame image using the multi-scale recognition-window mechanism, and using the classifier constructed from the highest-contribution omni-direction features to determine whether each sub-image is a human body part;

Step 4.4: merging the sub-images detected as human body parts to obtain the final detection result for each body part in the frame image, and marking and displaying the detected body part regions.
11. The method according to claim 10, characterized in that the multi-scale recognition-window mechanism further comprises:

first setting the initial recognition window to the same size as the body part training samples; then traversing the whole image with the initial recognition window, starting from the upper-left corner of the frame image, to obtain sub-images;

each time a traversal is completed, enlarging the recognition window once and traversing the whole image again to obtain sub-images, until the recognition window becomes larger than the image.
PCT/CN2012/077874 2011-12-22 2012-06-29 Human body part detection method based on parallel statistics learning of 3d depth image information WO2013091370A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110435745.7 2011-12-22
CN2011104357457A CN102609680B (en) 2011-12-22 2011-12-22 Method for detecting human body parts by performing parallel statistical learning based on three-dimensional depth image information

Publications (1)

Publication Number Publication Date
WO2013091370A1 true WO2013091370A1 (en) 2013-06-27

Family

ID=46527039

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/077874 WO2013091370A1 (en) 2011-12-22 2012-06-29 Human body part detection method based on parallel statistics learning of 3d depth image information

Country Status (2)

Country Link
CN (1) CN102609680B (en)
WO (1) WO2013091370A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104123570A (en) * 2014-07-22 2014-10-29 西安交通大学 Shared weak classifier combination based hand classifier and training and detection method
US9430701B2 (en) 2014-02-07 2016-08-30 Tata Consultancy Services Limited Object detection system and method
CN109558810A (en) * 2018-11-12 2019-04-02 北京工业大学 Divided based on position and merges target person recognition methods
CN109871799A (en) * 2019-02-02 2019-06-11 浙江万里学院 A kind of driver based on deep learning plays the detection method of mobile phone behavior
CN110443748A (en) * 2019-07-31 2019-11-12 思百达物联网科技(北京)有限公司 Human body screen method, device and storage medium
CN111523613A (en) * 2020-05-09 2020-08-11 黄河勘测规划设计研究院有限公司 Image analysis anti-interference method under complex environment of hydraulic engineering
CN113536841A (en) * 2020-04-15 2021-10-22 普天信息技术有限公司 Human body structural information analysis method and system
CN114894807A (en) * 2022-05-16 2022-08-12 福耀玻璃工业集团股份有限公司 Workpiece surface orange peel detection equipment, method and device

Families Citing this family (15)

Publication number Priority date Publication date Assignee Title
CN103345744B (en) * 2013-06-19 2016-01-06 北京航空航天大学 A kind of human body target part automatic analytic method based on many images
WO2015139187A1 (en) * 2014-03-17 2015-09-24 Mediatek Inc. Low latency encoder decision making for illumination compensation and depth look-up table transmission in video coding
WO2015139203A1 (en) * 2014-03-18 2015-09-24 Mediatek Singapore Pte. Ltd. Dlt signaling in 3d video coding
CN104463878A (en) * 2014-12-11 2015-03-25 南京理工大学 Novel depth image local descriptor method
WO2017114846A1 (en) * 2015-12-28 2017-07-06 Robert Bosch Gmbh Depth sensing based system for detecting, tracking, estimating, and identifying occupancy in real-time
CN106127733B (en) * 2016-06-14 2019-02-22 湖南拓视觉信息技术有限公司 The method and apparatus of human body target identification
CN106096551B (en) * 2016-06-14 2019-05-21 湖南拓视觉信息技术有限公司 The method and apparatus of face position identification
CN106127173B (en) * 2016-06-30 2019-05-07 北京小白世纪网络科技有限公司 A kind of human body attribute recognition approach based on deep learning
US10198655B2 (en) * 2017-01-24 2019-02-05 Ford Global Technologies, Llc Object detection using recurrent neural network and concatenated feature map
CN108460362B (en) * 2018-03-23 2021-11-30 成都品果科技有限公司 System and method for detecting human body part
CN110390660A (en) * 2018-04-16 2019-10-29 北京连心医疗科技有限公司 A kind of medical image jeopardizes organ automatic classification method, equipment and storage medium
CN108827974B (en) * 2018-06-28 2024-01-09 广东科达洁能股份有限公司 Ceramic tile defect detection method and system
CN109670532B (en) 2018-11-23 2022-12-09 腾讯医疗健康(深圳)有限公司 Method, device and system for identifying abnormality of biological organ tissue image
CN109934847B (en) * 2019-03-06 2020-05-22 视辰信息科技(上海)有限公司 Method and device for estimating posture of weak texture three-dimensional object
CN111062918B (en) 2019-12-10 2023-11-21 歌尔股份有限公司 Abnormality detection method and device based on computer vision

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101320484A (en) * 2008-07-17 2008-12-10 清华大学 Three-dimensional human face recognition method based on human face full-automatic positioning
CN101373514A (en) * 2007-08-24 2009-02-25 李树德 Method and system for recognizing human face
CN102004899A (en) * 2010-11-03 2011-04-06 无锡中星微电子有限公司 Human face identifying system and method

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
JP3775683B2 (en) * 2003-08-21 2006-05-17 松下電器産業株式会社 Person detection device and person detection method
CN101419669B (en) * 2008-10-14 2011-08-31 复旦大学 Three-dimensional human ear extracting method based on profile wave convert
CN101720992B (en) * 2009-11-13 2012-11-07 东华大学 Three-dimensional human body measurement method by using single camera
US8213680B2 (en) * 2010-03-19 2012-07-03 Microsoft Corporation Proxy training data for human body tracking
CN101938668B (en) * 2010-09-10 2013-01-23 中国科学院自动化研究所 Method for three-dimensional reconstruction of multilevel lens multi-view scene

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN101373514A (en) * 2007-08-24 2009-02-25 李树德 Method and system for recognizing human face
CN101320484A (en) * 2008-07-17 2008-12-10 清华大学 Three-dimensional human face recognition method based on human face full-automatic positioning
CN102004899A (en) * 2010-11-03 2011-04-06 无锡中星微电子有限公司 Human face identifying system and method

Cited By (12)

Publication number Priority date Publication date Assignee Title
US9430701B2 (en) 2014-02-07 2016-08-30 Tata Consultancy Services Limited Object detection system and method
CN104123570A (en) * 2014-07-22 2014-10-29 西安交通大学 Shared weak classifier combination based hand classifier and training and detection method
CN104123570B (en) * 2014-07-22 2018-06-05 西安交通大学 Human hand grader and training and detection method based on the combination of shared Weak Classifier
CN109558810A (en) * 2018-11-12 2019-04-02 北京工业大学 Divided based on position and merges target person recognition methods
CN109558810B (en) * 2018-11-12 2023-01-20 北京工业大学 Target person identification method based on part segmentation and fusion
CN109871799A (en) * 2019-02-02 2019-06-11 浙江万里学院 A kind of driver based on deep learning plays the detection method of mobile phone behavior
CN109871799B (en) * 2019-02-02 2023-03-24 浙江万里学院 Method for detecting mobile phone playing behavior of driver based on deep learning
CN110443748A (en) * 2019-07-31 2019-11-12 思百达物联网科技(北京)有限公司 Human body screen method, device and storage medium
CN113536841A (en) * 2020-04-15 2021-10-22 普天信息技术有限公司 Human body structural information analysis method and system
CN111523613A (en) * 2020-05-09 2020-08-11 黄河勘测规划设计研究院有限公司 Image analysis anti-interference method under complex environment of hydraulic engineering
CN111523613B (en) * 2020-05-09 2023-03-24 黄河勘测规划设计研究院有限公司 Image analysis anti-interference method under complex environment of hydraulic engineering
CN114894807A (en) * 2022-05-16 2022-08-12 福耀玻璃工业集团股份有限公司 Workpiece surface orange peel detection equipment, method and device

Also Published As

Publication number Publication date
CN102609680B (en) 2013-12-04
CN102609680A (en) 2012-07-25

Similar Documents

Publication Publication Date Title
WO2013091370A1 (en) Human body part detection method based on parallel statistics learning of 3d depth image information
Zhan et al. Face detection using representation learning
CN106682598B (en) Multi-pose face feature point detection method based on cascade regression
WO2019134327A1 (en) Facial expression recognition feature extraction method employing edge detection and sift
Satpathy et al. LBP-based edge-texture features for object recognition
Kölsch et al. Robust Hand Detection.
Liu et al. Continuous gesture recognition with hand-oriented spatiotemporal feature
Zhang et al. Real-time multi-view face detection
CN103443804B (en) Method of facial landmark detection
Pazhoumand-Dar et al. Joint movement similarities for robust 3D action recognition using skeletal data
CN110263712B (en) Coarse and fine pedestrian detection method based on region candidates
CN113033398B (en) Gesture recognition method and device, computer equipment and storage medium
CN109558855B (en) A kind of space gesture recognition methods combined based on palm contour feature with stencil matching method
CN111460976B (en) Data-driven real-time hand motion assessment method based on RGB video
Zakaria et al. Hierarchical skin-adaboost-neural network (h-skann) for multi-face detection
Yi et al. Motion keypoint trajectory and covariance descriptor for human action recognition
CN111275010A (en) Pedestrian re-identification method based on computer vision
CN104346602A (en) Face recognition method and device based on feature vectors
Ding et al. Recognition of hand-gestures using improved local binary pattern
Sasithradevi et al. Video classification and retrieval through spatio-temporal Radon features
CN109447022A (en) A kind of lens type recognition methods and device
Sheeba et al. Hybrid features-enabled dragon deep belief neural network for activity recognition
Chen et al. A multi-scale fusion convolutional neural network for face detection
Lin et al. Region-based context enhanced network for robust multiple face alignment
Shan et al. Adaptive slice representation for human action classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12859081

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12859081

Country of ref document: EP

Kind code of ref document: A1