KR101601660B1

KR101601660B1 - Hand part classification method using depth images and apparatus thereof

Info

Publication number: KR101601660B1
Application number: KR1020140154424A
Authority: KR
Inventors: 손명규; 김동주; 이상헌; 김현덕
Original assignee: 재단법인대구경북과학기술원 (Daegu Gyeongbuk Institute of Science and Technology)
Priority date: 2014-11-07
Filing date: 2014-11-07
Publication date: 2016-03-10

Abstract

The present invention relates to a hand part classification method using depth images and an apparatus thereof. According to the present invention, the hand part classification method using depth images may comprise: detecting a hand region in a depth image from a camera; calculating a feature value for each pixel in the hand region, using the difference between the average depth values of two offset patches of a given size generated around two points spaced apart from the pixel by a preset offset in the depth image; classifying the hand region into multiple parts by inputting the feature value of each pixel into a random decision forest trained in advance to output the classification label corresponding to an input feature value; setting a center point for each classified part; and extracting a skeleton of the hand using the set center points. With this method and apparatus, a hand can be classified effectively into multiple parts using only a depth image, which allows the hand's skeleton information to be extracted accurately and improves the recognition rate of hand movements.

Description

Technical Field

The present invention relates to a hand region classification method and apparatus using depth images and, more particularly, to a method and apparatus that process a user's hand image acquired from a depth image to divide the hand into parts and extract the skeleton of the hand.

Recently, hand gesture recognition has attracted considerable attention as a means of human-computer interaction (HCI) in various fields such as games, web browsers, and media control.

Hand gesture recognition requires an algorithm that extracts the hand region, which can be done with either a conventional RGB camera or a depth camera. With an RGB camera, separating the hand region from the background is considerably more complex than with a depth camera. In an image obtained from a depth camera, the foreground can be separated from the background using the depth information, which makes separating the hand region simpler.

Dividing the hand region obtained from a depth image into several parts is expected to be effective for locating the skeleton of the hand or recognizing the hand shape. There is therefore a need for a technique that divides the hand region in a depth image into multiple parts and thereby improves the efficiency of hand gesture recognition.

Background technology for the present invention is disclosed in Korean Patent No. 1171239 (published August 6, 2012).

An object of the present invention is to provide a hand region classification method and apparatus using depth images that divide the hand region in a depth image into parts, thereby improving the accuracy of hand skeleton extraction and the recognition rate of hand gestures.

The present invention provides a hand region classification method using a depth image, comprising: detecting a hand region from a depth image of a camera; calculating a feature value for each pixel in the hand region, using the difference between the average depth values of two offset patches of arbitrary size generated around two points spaced apart from the pixel by a preset offset in the depth image; classifying the hand region into a plurality of parts by individually inputting the feature value of each pixel into a random decision forest pre-trained to output the classification label corresponding to an input feature value; setting a center point for each of the classified parts; and extracting a skeleton of the hand using the set center points.

In addition, the feature value f_o(I, p) of each pixel may be calculated using the following equation.

Here, I is the depth image; p is a pixel position in the image; d_I(·) is the depth value of the depth image I at the pixel position in parentheses; d_I⁰ is the average depth value of the pixels in the hand region; u(i) and v(j) are the first and second offset patches, where i and j index the positions of the pixels in the first and second offset patches, respectively; u(i)/d_I⁰ and v(j)/d_I⁰ are the offset positions of u(i) and v(j) normalized by d_I⁰; and k and l are the numbers of pixels in the first and second offset patches, respectively.

The step of classifying the hand region into a plurality of parts may assign, to each pixel in the hand region, the classification label output for its feature value, thereby dividing the hand region into the plurality of parts. If there is an empty part to which no classification label has been assigned, the step of setting the center points may estimate the center point of the empty part by polynomial curve fitting using the center points of the parts adjacent to it.

Further, if the estimated center point lies in the background image rather than in the foreground image corresponding to the hand region in the depth image, the step of setting the center points may set, as the center point of the empty part, the foreground pixel at the shortest Euclidean distance from the estimated center point.

The present invention also provides a hand region classification apparatus using a depth image, comprising: a region detection unit that detects a hand region from a depth image of a camera; a feature value calculation unit that calculates a feature value for each pixel in the hand region, using the difference between the average depth values of two offset patches of arbitrary size generated around two points spaced apart from the pixel by a preset offset in the depth image; a part classification unit that classifies the hand region into a plurality of parts by individually inputting the feature value of each pixel into a random decision forest pre-trained to output the classification label corresponding to an input feature value; a center point setting unit that sets a center point for each of the classified parts; and a skeleton extraction unit that extracts a skeleton of the hand using the set center points.

According to the hand region classification method and apparatus using depth images of the present invention, the hand region can be classified effectively into multiple parts using only a depth image of the hand, which allows the skeleton information of the hand to be extracted accurately and improves the recognition rate of hand gestures.

FIG. 1 is a conceptual diagram of classifying each part of a hand using a random decision forest in an embodiment of the present invention.
FIG. 2 shows a depth image and its label image in an embodiment of the present invention.
FIG. 3 shows the results obtained through the hand region classification process according to an embodiment of the present invention.
FIG. 4 is a block diagram of a hand region classification apparatus using a depth image according to an embodiment of the present invention.
FIG. 5 is a flowchart of a hand region classification method using the apparatus of FIG. 4.
FIG. 6 is a conceptual diagram illustrating step S520 of FIG. 5.
FIG. 7 is a conceptual diagram of a method of estimating a center point in step S540 of FIG. 5.
FIG. 8 shows accuracy verification results for the hand region classification method according to an embodiment of the present invention.
FIG. 9 shows accuracy verification results for the hand region classification method according to an embodiment of the present invention.
FIG. 10 shows the accuracy of skeleton estimation in the hand region classification method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily carry them out.

The present invention relates to a hand region classification method and apparatus using depth images. More specifically, it presents a method that processes a user's hand image acquired from a depth image obtained by a depth camera, divides the hand into parts, and ultimately extracts the skeleton of the hand.

The basic technique underlying the present invention, for recognizing each part of the hand in a hand image and extracting the hand skeleton, is the random decision forest (an ensemble of random decision trees). The random decision forest is used to classify which part of the hand each pixel of the hand image belongs to. For this classification, the random decision forest must be trained in advance.

Training uses sample images in which the parts of the hand region are known in advance. For each part of each sample image, the feature values of the pixels in that part and the corresponding classification label are input, and the split function of each tree node is determined. The split function is a binary classification function, trained so that the labels are well separated between its two sides. In other words, training determines, for each node, the optimal parameters for sending each input to one of its child nodes so that similar classes are grouped together.
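The text does not say which criterion is used to choose each node's split function. A common choice in random decision forests is to maximize information gain, the drop in Shannon entropy of the labels after the split; the sketch below assumes that criterion and a simple threshold test `feature < t`. It is a minimal illustration under those assumptions, not the patented training procedure, and `entropy` and `best_threshold` are hypothetical names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of part labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(samples, candidates):
    """Choose the split threshold with the highest information gain.

    samples    : (feature_value, label) pairs that reached this node.
    candidates : thresholds to try, e.g. drawn at random during training.
    Returns (threshold, gain); samples with feature < threshold go left.
    """
    labels = [label for _, label in samples]
    parent = entropy(labels)
    n = len(samples)
    best = (None, -1.0)
    for t in candidates:
        left = [label for f, label in samples if f < t]
        right = [label for f, label in samples if f >= t]
        # Size-weighted entropy of the two child nodes.
        child = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = parent - child
        if gain > best[1]:
            best = (t, gain)
    return best


samples = [(0.1, "thumb"), (0.2, "thumb"), (0.8, "palm"), (0.9, "palm")]
print(best_threshold(samples, [0.5, 0.85]))  # (0.5, 1.0)
```

Here the threshold 0.5 separates the two labels perfectly, so it wins with the maximum gain of one bit; 0.85 leaves one palm sample on the thumb side and scores lower.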

After training on the sample images, the random decision forest has an optimal split function at each node. When the feature value of an arbitrary pixel in a test image is later input into the trained random decision forest, the forest outputs the classification label corresponding to that feature value. In this way, it is possible to classify accurately which part of the hand each pixel of the hand region in the test image belongs to.

FIG. 1 is a conceptual diagram of classifying each part of the hand using a random decision forest in an embodiment of the present invention. The random decision forest includes a plurality of trees T₁, …, T_N. When the feature value of a pixel is input to the root node of a tree, it passes through the nodes until it reaches a leaf node, and the label stored in that leaf node gives the hand part (label value) for the input pixel. The label output most frequently across all trees can be taken as the label of that pixel. Classifying data with a random decision forest is a known method, so a more detailed description is omitted.
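The per-tree descent and the majority vote described above can be sketched in a few lines. This assumes a hypothetical tree layout: each internal node is a dict holding the index of a precomputed feature value and a threshold, and each leaf holds a label. The part names are illustrative only.

```python
from collections import Counter


def classify_tree(node, features):
    """Descend one tree from the root to a leaf and return the leaf label."""
    while "label" not in node:
        # Binary split function: compare one feature value with a threshold.
        if features[node["index"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


def classify_forest(trees, features):
    """Return the label output most often across the trees T_1..T_N."""
    votes = Counter(classify_tree(tree, features) for tree in trees)
    return votes.most_common(1)[0][0]


t1 = {"index": 0, "threshold": 0.5,
      "left": {"label": "thumb_tip"}, "right": {"label": "palm"}}
t2 = {"index": 1, "threshold": 0.2,
      "left": {"label": "palm"}, "right": {"label": "thumb_tip"}}
t3 = {"index": 0, "threshold": 0.4,
      "left": {"label": "palm"}, "right": {"label": "thumb_tip"}}
print(classify_forest([t1, t2, t3], [0.3, 0.7]))  # thumb_tip (2 votes to 1)
```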

FIG. 2 shows a depth image and its label image in an embodiment of the present invention. FIG. 2(a) is an example depth image, and FIG. 2(b) is a label image in which the hand in (a) is divided into 21 parts. In FIG. 2(b), each of the 21 parts is drawn in the color corresponding to its classification label. Note that some colors have similar tones and may look identical to the naked eye.

In this embodiment, the thumb is divided into three parts, each of the other four fingers into four parts, and the palm, which has a relatively large area, simply into two parts. The hand region is thus divided into 3 + 4 × 4 + 2 = 21 parts, with correspondingly 21 classification label values.
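The 21-label scheme can be written down as a small lookup table. The region names and the numbering order below are hypothetical; the text fixes only the counts (thumb 3, each other finger 4, palm 2).

```python
# Hypothetical region names; only the per-region counts come from the text.
PARTS_PER_REGION = {
    "thumb": 3,
    "index": 4,
    "middle": 4,
    "ring": 4,
    "little": 4,
    "palm": 2,
}


def build_label_table(parts_per_region):
    """Assign consecutive integer labels 0..N-1 to (region, segment) pairs."""
    table = {}
    label = 0
    for region, count in parts_per_region.items():
        for segment in range(count):
            table[(region, segment)] = label
            label += 1
    return table


LABELS = build_label_table(PARTS_PER_REGION)
print(len(LABELS))  # 21
```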

As described above, training the random decision forest by inputting, for each hand part in the sample images, the feature values of the pixels in that part together with the part's label determines an individual split function for each node in the trees. This division of the hand into parts is only an example; it may of course be simplified or made more detailed.

FIG. 3 shows the results obtained through the hand region classification process according to an embodiment of the present invention. Each row corresponds to one of four hand gestures, and each column shows a stage of hand region classification applied to that gesture.

FIG. 3(a) shows depth images of the hand region obtained by the depth camera. FIG. 3(b) shows, in color, which hand part each pixel of the hand region in (a) belongs to; this is the result of hand part classification by the random decision forest. Each pixel is drawn in the color corresponding to its assigned label.

FIG. 3(c) shows the result of finding the center point of each classified part. Any known method may be used to find the center point of a region. FIG. 3(d) shows the skeleton extracted using the center points found in (c) and information about the physical shape of the hand.

The embodiment of the present invention proposes an effective method for dividing the hand region into its parts in stage (b) of FIG. 3 and an effective method for finding the center points in stage (c). The hand region classification method using a depth image according to the embodiment is described in more detail below.

FIG. 4 is a block diagram of a hand region classification apparatus using a depth image according to an embodiment of the present invention, and FIG. 5 is a flowchart of a hand region classification method using the apparatus of FIG. 4.

Referring to FIGS. 4 and 5, the hand region classification apparatus 100 using a depth image according to an embodiment of the present invention includes a region detection unit 110, a feature value calculation unit 120, a part classification unit 130, a center point setting unit 140, and a skeleton extraction unit 150.

First, the region detection unit 110 detects the hand region in the depth image from the camera (S510). See FIG. 3(a) for an image of the hand region.
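The text does not specify how S510 finds the hand. Since the background section notes that depth lets the foreground be separated from the background, a minimal sketch might keep pixels within a depth band, assuming the hand is the object nearest the camera and that a depth of 0 marks an invalid reading. The same mask also yields the mean hand depth d_I⁰ used by the feature. The `near`/`far` bounds and function names are hypothetical.

```python
def detect_hand_region(depth, near, far):
    """Pixels (row, col) whose depth lies in [near, far].

    Assumes the hand is the object nearest the camera and that a depth
    of 0 marks an invalid reading.
    """
    return {
        (r, c)
        for r, row in enumerate(depth)
        for c, d in enumerate(row)
        if d > 0 and near <= d <= far
    }


def mean_hand_depth(depth, mask):
    """Average depth over the hand pixels (d_I^0 in the text)."""
    return sum(depth[r][c] for r, c in mask) / len(mask)


depth = [[0, 800, 810],
         [1500, 805, 0]]
hand = detect_hand_region(depth, near=500, far=1000)
print(sorted(hand))                  # [(0, 1), (0, 2), (1, 1)]
print(mean_hand_depth(depth, hand))  # 805.0
```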

Next, the feature value calculation unit 120 calculates a feature value for each pixel in the hand region, using the difference between the average depth values of two offset patches generated randomly at two points spaced apart from the pixel by a preset offset (S520).

FIG. 6 is a conceptual diagram illustrating step S520 of FIG. 5. The embodiment classifies the hand region into parts pixel by pixel, and for this classification it calculates a feature value for each pixel. The concept shown in FIG. 6 is applied to this calculation.

FIG. 6(a) illustrates calculating the feature value of a pixel from two randomly generated points spaced apart from the pixel by a preset offset (distance). FIG. 6(b) illustrates calculating the feature value of a pixel from two offset patches generated randomly around two points spaced apart from the pixel by a preset offset.

The embodiment uses the method of FIG. 6(b) to calculate the feature values of the pixels constituting the hand region in the depth image. That is, in addition to randomly generating two points spaced apart from the pixel by a preset offset, as in FIG. 6(a), it randomly generates two offset patches of arbitrary size around those two points, as in FIG. 6(b), and calculates the feature value of the pixel from these patches.

Among the randomly generated offset patch candidates spaced apart by a given offset, the candidates that best separate the hand parts may be selected and used; such candidates may be obtained during training. In addition, a maximum selectable offset size may be preset for the randomly generated offset patches, to prevent an arbitrarily large patch from being selected, which would reduce classification efficiency or increase computational complexity. The accuracy of hand part classification may vary with the preset maximum offset size.
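Drawing a bounded pool of random offset-patch candidates for training could look like the sketch below. It assumes each patch is a square described by a center offset `(drow, dcol)` and a half-size, both in depth-scaled units (pixels × depth), with separate hypothetical caps `max_offset` and `max_half_size`; the text states only that a maximum size is preset.

```python
import random


def random_patch_candidates(n, max_offset, max_half_size, rng=None):
    """Draw n random (u_patch, v_patch) candidate pairs for training.

    Each patch is ((drow, dcol), half): the offset of the patch center
    from the pixel and the patch half-size, both bounded by the preset
    maxima so that no arbitrarily large patch can be chosen.
    """
    rng = rng or random.Random()

    def one_patch():
        center = (rng.randint(-max_offset, max_offset),
                  rng.randint(-max_offset, max_offset))
        return center, rng.randint(0, max_half_size)

    return [(one_patch(), one_patch()) for _ in range(n)]


candidates = random_patch_candidates(50, max_offset=100, max_half_size=5,
                                     rng=random.Random(0))
print(len(candidates))  # 50
```

Passing a seeded `random.Random` makes the pool reproducible; during training, each candidate pair would then be scored (for example by information gain) and the best kept at each node.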

In step S520, the feature value f_o(I, p) of each pixel may be calculated using Equation 1 below.
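Equation 1 itself did not survive in this text. Reconstructed from the variable definitions that follow (an assumption, not the published formula), it presumably reads:

```latex
f_o(I,p) = \frac{1}{k}\sum_{i=1}^{k} d_I\!\left(p + \frac{u(i)}{d_I^{0}}\right)
         - \frac{1}{l}\sum_{j=1}^{l} d_I\!\left(p + \frac{v(j)}{d_I^{0}}\right)
```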

Here, I is the depth image; p is a pixel position in the image; d_I(·) is the depth value of the depth image I at the pixel position in parentheses; and d_I⁰ is the average depth value of the pixels in the hand region. u(i) and v(j) are the first and second offset patches, where i is the position of a pixel in the first offset patch and j the position of a pixel in the second offset patch. u(i)/d_I⁰ and v(j)/d_I⁰ are the offset positions of u(i) and v(j) normalized by the average depth value d_I⁰, and k and l are the numbers of pixels in the first and second offset patches, respectively.

Put simply, the feature value is the difference between the average depth values over the areas of the two offset patches in the depth image, where the positions of the offset patches are normalized by the average depth value of the hand.
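A minimal pure-Python sketch of the patch feature, under several assumptions: the depth image is a 2-D list `depth[row][col]`; each patch is a square given by a center offset and a half-size in depth-scaled units, both divided by the mean hand depth d_I⁰ and rounded to whole pixels; and probes outside the image read a hypothetical large background depth. The text does not give the exact discretization or border handling, so these are illustrative choices.

```python
BACKGROUND_DEPTH = 10_000.0  # hypothetical depth for probes outside the image


def patch_mean_depth(depth, p, center, half, d0):
    """Mean depth of a square patch centered at p + center / d0.

    center : (drow, dcol) patch offset in depth-scaled units.
    half   : patch half-size in the same units.
    Both are divided by d0 (the mean hand depth), so the patch covers a
    similar physical area whatever the hand's distance from the camera.
    """
    rows, cols = len(depth), len(depth[0])
    cr = p[0] + round(center[0] / d0)
    cc = p[1] + round(center[1] / d0)
    h = max(0, round(half / d0))
    values = []
    for r in range(cr - h, cr + h + 1):
        for c in range(cc - h, cc + h + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            values.append(depth[r][c] if inside else BACKGROUND_DEPTH)
    return sum(values) / len(values)  # average over the k (or l) patch pixels


def patch_feature(depth, p, u_patch, v_patch, d0):
    """f_o(I, p): mean depth of patch u minus mean depth of patch v.

    u_patch, v_patch : (center, half) pairs describing the two patches.
    """
    return (patch_mean_depth(depth, p, *u_patch, d0)
            - patch_mean_depth(depth, p, *v_patch, d0))


# 5x5 toy image: columns 0-2 at depth 100, columns 3-4 at depth 200.
depth = [[100, 100, 100, 200, 200] for _ in range(5)]
print(patch_feature(depth, (2, 2), ((0, -100), 100), ((0, 200), 0), d0=100))  # -100.0
```

In the example, the first patch is a 3 × 3 block of depth-100 pixels and the second a single depth-200 pixel, so the feature is 100 − 200.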

If only offset points are used without offset patches, as in FIG. 6(a), the averaging in Equation 1 is unnecessary, and the offset point u is normalized by the depth value at the pixel position p rather than by the average depth value of the entire hand region.
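For comparison, the point-offset variant of FIG. 6(a) can be sketched in the same style: each offset is divided by the depth d_I(p) at the pixel itself rather than by the mean hand depth, and the feature is the depth difference between the two probed points. The 2-D list layout, function names, and out-of-image background value are illustrative assumptions.

```python
BACKGROUND_DEPTH = 10_000.0  # hypothetical depth for probes outside the image


def depth_at(depth, r, c):
    """Depth at (r, c), or the background value outside the image."""
    if 0 <= r < len(depth) and 0 <= c < len(depth[0]):
        return depth[r][c]
    return BACKGROUND_DEPTH


def point_feature(depth, p, u, v):
    """Two-point feature of FIG. 6(a), offsets normalized by d_I(p).

    p must be a hand pixel with a non-zero depth reading.
    """
    dp = depth[p[0]][p[1]]
    pu = (p[0] + round(u[0] / dp), p[1] + round(u[1] / dp))
    pv = (p[0] + round(v[0] / dp), p[1] + round(v[1] / dp))
    return depth_at(depth, *pu) - depth_at(depth, *pv)


depth = [[100, 100, 100, 200, 200] for _ in range(5)]
print(point_feature(depth, (2, 2), (0, -100), (0, 200)))  # -100
```

Because each probe reads a single pixel, one noisy depth value changes the feature directly, which is the weakness the patch version is meant to address.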

As in FIG. 6(b), calculating the feature value from the average depth difference between two offset patch regions generated around two points spaced apart from the pixel by an arbitrary offset gives classification that is more robust to noise than FIG. 6(a), which uses only the depth difference between two points. That is, using two offset patches instead of two offset points makes the system more robust to changes in hand pose and shape and improves classification accuracy.

Next, the part classification unit 130 individually inputs the calculated feature value of each pixel into the random decision forest, pre-trained to output the classification label corresponding to an input feature value, and classifies the hand region into a plurality of parts (S530).

At each node, the pixel's feature value f_o(I, p) is sent to the left or right child node according to the split function set for that node, and this is repeated until a leaf node is reached. Each leaf node stores its corresponding label. This node-by-node procedure is the usual one in random decision tree classifiers.

In step S530, each pixel in the hand region is assigned the classification label output for its feature value, so that the hand region is divided into the plurality of parts. All pixels in the same part naturally share the same label. The classified image can be displayed with one color per label, as in FIG. 3(b).

Next, the center point setting unit 140 sets a center point for each of the classified parts (S540). See FIG. 3(c) for an example of the center points of the hand.

The center point may be found or set as the mean position of the pixels constituting the area of the part. The present invention is of course not limited to this, and various known methods for finding the center point of a pixel region may be used.
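The mean-position option for S540 can be sketched as a single pass over a label image. This assumes a 2-D list of part labels with `None` marking background or unlabeled pixels; `part_centers` is a hypothetical name. A part with no pixels simply does not appear in the result, which is exactly the empty-part case handled next.

```python
def part_centers(labels):
    """Mean (row, col) of the pixels carrying each label.

    labels : 2-D list of part labels; None marks background/unlabeled pixels.
    Returns {label: (mean_row, mean_col)}; a part with no pixels is absent.
    """
    sums = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab is None:
                continue
            sr, sc, n = sums.get(lab, (0, 0, 0))
            sums[lab] = (sr + r, sc + c, n + 1)
    return {lab: (sr / n, sc / n) for lab, (sr, sc, n) in sums.items()}


labels = [[0, 0, None],
          [1, 1, 1]]
print(part_centers(labels))  # {0: (0.0, 0.5), 1: (1.0, 1.0)}
```

For a strongly curved part, the mean can fall outside the part itself; that is one reason other known center-finding methods may be preferred.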

When the parts of the hand image are classified, some parts may end up empty, with no classification label assigned to any pixel. For example, depending on the hand gesture, shape, and angle as seen from the camera, some finger parts may be occluded, or classification errors may leave some parts unclassified.

If an empty part exists after the parts of the hand region are classified, no center point can be set for it, which makes accurate skeleton extraction difficult. In this embodiment, however, when an empty part with no classification label exists, its center point can be estimated from the center points of the neighboring parts adjacent to it using a known polynomial curve fitting method.

FIG. 7 is a conceptual diagram of a method of estimating the center point of an unclassified part in step S540 of FIG. 5. Since the occluded part is usually on a finger, a finger is used as the example.

FIG. 7(a) shows a case in which one part P of a finger divided into four parts has no label. The neighboring parts have labels, so their center points have been set. FIG. 7(b) shows an example of estimating the center point of the invisible second part by quadratic polynomial curve fitting through the three center points of its neighboring parts in (a).
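A quadratic through exactly three points is unique, so for the FIG. 7 case the fit reduces to interpolation. The sketch below assumes, as an illustration not stated in the text, that the finger's parts are indexed 0 (base) to 3 (tip) and that the row and column of each center are each fitted as a quadratic in that part index, using the Lagrange form.

```python
def lagrange_quadratic(ts, values, t):
    """Evaluate the unique quadratic through three (t, value) pairs at t."""
    (t0, t1, t2), (y0, y1, y2) = ts, values
    return (y0 * (t - t1) * (t - t2) / ((t0 - t1) * (t0 - t2))
            + y1 * (t - t0) * (t - t2) / ((t1 - t0) * (t1 - t2))
            + y2 * (t - t0) * (t - t1) / ((t2 - t0) * (t2 - t1)))


def estimate_missing_center(known, missing_index):
    """Estimate the center of an unlabeled part of one finger.

    known : {part_index: (row, col)} for exactly three labeled parts of
            the same finger (hypothetical indexing: 0 = base .. 3 = tip).
    Row and column are each fitted as a quadratic in the part index.
    """
    ts = sorted(known)
    rows = [known[t][0] for t in ts]
    cols = [known[t][1] for t in ts]
    return (lagrange_quadratic(ts, rows, missing_index),
            lagrange_quadratic(ts, cols, missing_index))


# Second part (index 1) is missing, as in FIG. 7.
known = {0: (0.0, 0.0), 2: (4.0, 2.0), 3: (9.0, 3.0)}
print(estimate_missing_center(known, 1))  # (1.0, 1.0)
```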

The estimated center point may lie in the background image rather than in the foreground image corresponding to the hand region in the depth image. In that case, the foreground pixel at the shortest Euclidean distance from the estimated center point is set as the center point of the empty part P instead. In this way, the center point of an unclassified, invisible part can be estimated and corrected effectively.
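The correction step can be sketched as a nearest-neighbor search over the hand pixels. This assumes the foreground is given as a collection of `(row, col)` pixels and that an estimate whose rounded pixel is already foreground is kept unchanged; `snap_to_foreground` is a hypothetical name. Squared distance is used because it has the same minimizer as Euclidean distance.

```python
def snap_to_foreground(point, foreground):
    """Keep point if it lies on the hand, else return the nearest hand pixel.

    point      : (row, col) estimated center, possibly fractional.
    foreground : iterable of (row, col) hand pixels.
    """
    fg = set(foreground)
    if (round(point[0]), round(point[1])) in fg:
        return point
    # Squared Euclidean distance has the same minimizer as the distance itself.
    return min(fg, key=lambda q: (q[0] - point[0]) ** 2 + (q[1] - point[1]) ** 2)


hand = {(0, 0), (0, 1), (5, 5)}
print(snap_to_foreground((4.2, 4.6), hand))  # (5, 5)
```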

Then, the skeleton extraction unit 150 extracts the skeleton of the hand using the set center points (S550). See FIG. 3(d) for an example skeleton image of the hand.

The verification results of the hand region classification method according to the embodiment of the present invention are described below. A total of five data sets were used for the test: thumb folding, two-finger folding, three-finger folding, four-finger folding, and all-finger folding. Each data set may consist of consecutive frames of a hand gesture.

FIG. 8 shows examples of the data sets used in the present invention. FIG. 8(a) shows some of the hand gesture data included in the entire data set, and FIG. 8(b) shows only the fifth data set, which consists of frames of folding all five fingers.

FIG. 9 shows the results of verifying the accuracy of the hand region classification method according to the embodiment of the present invention. FIG. 9 compares the per-pixel classification accuracy of the hand region between the method using offset points, as in FIG. 6(a), and the method using offset patches, as in FIG. 6(b). In most cases, the embodiment of the present invention achieves an accuracy of 0.9 or higher. Because the hand gestures become harder to classify from data set 1 to data set 5, the accuracy varies slightly across the data sets.
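To make the comparison in FIG. 9 concrete, the sketch below computes the feature both ways. An offset point probes a single depth value on each side, while an offset patch averages the depths at several normalized offsets u(i) and v(j), following the definitions given in the claims. The helper names, the square patch layout, and the large constant returned for probes outside the image are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]
Offset = Tuple[float, float]

OUTSIDE_DEPTH = 1e4  # assumed depth for probes that leave the image


def depth_at(img: Image, r: int, c: int) -> float:
    """d_I at integer pixel (r, c); a large constant outside the image."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return OUTSIDE_DEPTH


def probe(img: Image, p: Tuple[int, int], off: Offset, mean_depth: float) -> float:
    """Depth at p + off / d_I^0, rounded to the nearest pixel."""
    r = int(round(p[0] + off[0] / mean_depth))
    c = int(round(p[1] + off[1] / mean_depth))
    return depth_at(img, r, c)


def square_patch(center: Offset, half: int, step: float) -> List[Offset]:
    """Offsets u(i) on a (2*half+1)^2 grid around `center`, `step` apart."""
    return [(center[0] + dr * step, center[1] + dc * step)
            for dr in range(-half, half + 1)
            for dc in range(-half, half + 1)]


def patch_feature(img: Image, p: Tuple[int, int], u_patch: Sequence[Offset],
                  v_patch: Sequence[Offset], mean_depth: float) -> float:
    """Mean depth over the first patch minus mean depth over the second.

    A single-offset patch (half=0) reduces to the offset-point feature.
    """
    a = sum(probe(img, p, u, mean_depth) for u in u_patch) / len(u_patch)
    b = sum(probe(img, p, v, mean_depth) for v in v_patch) / len(v_patch)
    return a - b


# Synthetic 5x7 depth map: columns 0-2 at depth 1, column 3 at depth 2,
# columns 4-6 at depth 3, with one spurious reading of 10 at (2, 1).
img = [[1.0] * 3 + [2.0] + [3.0] * 3 for _ in range(5)]
img[2][1] = 10.0
d0, p = 2.0, (2, 3)
point = patch_feature(img, p, square_patch((0.0, -4.0), 0, 2.0),
                      square_patch((0.0, 4.0), 0, 2.0), d0)
patch = patch_feature(img, p, square_patch((0.0, -4.0), 1, 2.0),
                      square_patch((0.0, 4.0), 1, 2.0), d0)
print(point, patch)  # 7.0 -1.0
```

On this synthetic map, the point feature follows the single outlier, while the patch feature is pulled only part of the way toward it. This is one plausible intuition for why offset patches could be more robust; it does not reproduce the measurements reported in FIG. 9.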

FIG. 10 shows the accuracy of skeleton estimation in the hand region classification method according to the embodiment of the present invention. Hand skeleton estimation is more accurate when offset patches are used, as in FIG. 6(b), than when offset points are used, as in FIG. 6(a).

According to the hand region classification method and apparatus using a depth image of the present invention described above, a hand region can be effectively classified into multiple parts using only a depth image of the hand. This enables accurate extraction of the skeleton information of the hand and improves the recognition rate of hand gestures.

Although the present invention has been described with reference to the embodiments shown in the drawings, they are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

100: hand region classification apparatus
110: region detection unit 120: feature value calculation unit
130: part classification unit 140: center point setting unit
150: skeleton extraction unit

Claims

A hand region classification method using a depth image, comprising:
detecting a hand region from a depth image of a camera;
calculating a feature value of each pixel included in the hand region, wherein the feature value of each pixel is calculated using the difference between the average depth values of two offset patches of arbitrary size generated with reference to two points spaced apart from the pixel by a set offset in the depth image;
classifying the hand region into a plurality of parts by individually inputting the feature value of each pixel into a random decision tree ensemble (Random Decision Forest) trained in advance to output a classification label corresponding to an input feature value;
setting a center point for each of the plurality of classified parts; and
extracting a skeleton of the hand using the set center points,
wherein the classifying of the hand region into a plurality of parts comprises:
assigning, to each pixel included in the hand region, the classification label output in response to its feature value, thereby classifying the hand region into the plurality of parts, and
the setting of the center points comprises:
if there is an empty part to which no classification label has been assigned, estimating the center point of the empty part by polynomial curve fitting using the center points of the parts adjacent to the empty part.
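Claim 1 describes feeding each pixel's feature value into a pre-trained random decision forest. A common way to evaluate such a forest, sketched here as an assumption because the text does not detail the node tests or the voting rule, is to route the pixel's features through each tree by threshold tests, average the class distributions stored at the leaves reached, and take the most probable label. Training is not shown, and the `Node` layout (each `feature` index standing for one offset pair) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    feature: int = 0                     # index of the feature (offset pair) tested here
    threshold: float = 0.0               # split threshold learned during training
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    dist: Optional[List[float]] = None   # class distribution, set only at leaves


def tree_predict(node: Node, feats: Sequence[float]) -> List[float]:
    """Descend from the root to a leaf and return its class distribution."""
    while node.dist is None:
        node = node.left if feats[node.feature] < node.threshold else node.right
    return node.dist


def forest_label(trees: Sequence[Node], feats: Sequence[float]) -> int:
    """Average the leaf distributions over all trees and return the argmax label."""
    n = len(trees)
    avg = [sum(ps) / n for ps in zip(*(tree_predict(t, feats) for t in trees))]
    return max(range(len(avg)), key=avg.__getitem__)


t1 = Node(feature=0, threshold=0.0,
          left=Node(dist=[0.9, 0.1]), right=Node(dist=[0.2, 0.8]))
t2 = Node(feature=1, threshold=5.0,
          left=Node(dist=[0.6, 0.4]), right=Node(dist=[0.1, 0.9]))
print(forest_label([t1, t2], [-1.0, 1.0]))  # 0
```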

The method according to claim 1, wherein the feature value f_o(I,p) of each pixel is calculated using the following equation:

where I is the depth image, p is a pixel position in the image, d_I(·) is the depth value at the pixel position in parentheses within the depth image I, d_I⁰ is the average depth value of the pixels included in the hand region, u(i) and v(j) are the first and second offset patches, i and j are the position values of the pixels in the first and second offset patches, respectively, u(i)/d_I⁰ and v(j)/d_I⁰ are the offset positions of u(i) and v(j) normalized by d_I⁰, and k and l are the numbers of pixels in the first and second offset patches, respectively.
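The equation referred to in this claim is not reproduced in this text. A reconstruction that agrees term by term with the definitions above (the mean depth over the k pixels of the first patch minus the mean depth over the l pixels of the second patch, with every offset normalized by d_I⁰) would read as follows; it should be checked against the original publication before being relied on.

```latex
f_o(I,p) = \frac{1}{k}\sum_{i=1}^{k} d_I\!\left(p + \frac{u(i)}{d_I^{0}}\right)
         - \frac{1}{l}\sum_{j=1}^{l} d_I\!\left(p + \frac{v(j)}{d_I^{0}}\right)
```

With k = l = 1, this reduces to the offset-point feature of FIG. 6(a).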

Deleted.

The method according to claim 1, wherein, in the setting of the center points, if the estimated center point lies in a background image rather than in the foreground image corresponding to the hand region within the depth image, the pixel in the foreground image at the shortest Euclidean distance from the estimated center point is set as the center point of the empty part.

A hand region classification apparatus using a depth image, comprising:
a region detection unit configured to detect a hand region from a depth image of a camera;
a feature value calculation unit configured to calculate a feature value of each pixel included in the hand region, wherein the feature value of each pixel is calculated using the difference between the average depth values of two offset patches of arbitrary size generated with reference to two points spaced apart from the pixel by a set offset in the depth image;
a part classification unit configured to classify the hand region into a plurality of parts by individually inputting the feature value of each pixel into a random decision tree ensemble (Random Decision Forest) trained in advance to output a classification label corresponding to an input feature value;
a center point setting unit configured to set a center point for each of the plurality of classified parts; and
a skeleton extraction unit configured to extract a skeleton of the hand using the set center points,
wherein the part classification unit:
assigns, to each pixel included in the hand region, the classification label output in response to its feature value, thereby classifying the hand region into the plurality of parts, and
the center point setting unit:
if there is an empty part to which no classification label has been assigned, estimates the center point of the empty part by polynomial curve fitting using the center points of the parts adjacent to the empty part.

The apparatus according to claim 5, wherein the feature value f_o(I,p) of each pixel is calculated using the following equation:

where I is the depth image, p is a pixel position in the image, d_I(·) is the depth value at the pixel position in parentheses within the depth image I, d_I⁰ is the average depth value of the pixels included in the hand region, u(i) and v(j) are the first and second offset patches, i and j are the position values of the pixels in the first and second offset patches, respectively, u(i)/d_I⁰ and v(j)/d_I⁰ are the offset positions of u(i) and v(j) normalized by d_I⁰, and k and l are the numbers of pixels in the first and second offset patches, respectively.
Deleted.

The apparatus according to claim 5, wherein, if the estimated center point lies in a background image rather than in the foreground image corresponding to the hand region within the depth image, the center point setting unit sets the pixel in the foreground image at the shortest Euclidean distance from the estimated center point as the center point of the empty part.