KR20220085678A

KR20220085678A - A 3D skeleton generation method using calibration based on joints acquired from multi-view camera

Info

Publication number: KR20220085678A
Application number: KR1020210008979A
Authority: KR
Inventors: 서영호; 박병서
Original assignee: 광운대학교 산학협력단
Priority date: 2020-12-15
Filing date: 2021-01-21
Publication date: 2022-06-22
Also published as: KR102416523B1

Abstract

The invention relates to a method for generating a three-dimensional (3D) skeleton using joint-based calibration acquired from multi-view cameras, in which a partial skeleton is extracted for each viewpoint from distributed RGB-D cameras, camera parameters are computed using the joints of each partial skeleton as feature points, and the partial skeletons are integrated into a single 3D skeleton based on those parameters. The method comprises the steps of: (a) acquiring multi-view color-depth images; (b) generating a 3D skeleton for each viewpoint from the color-depth image of that viewpoint, and generating the joints of each viewpoint's skeleton as feature points; (c) performing extrinsic calibration that optimizes the extrinsic parameters using the joints of each viewpoint's skeleton; and (d) aligning and integrating the 3D skeletons of the viewpoints using the extrinsic parameters. By computing the parameters from the joints of the partial skeletons of the multi-view RGB-D images and integrating the skeletons on that basis, a highly reliable 3D skeleton can be generated, and the generated 3D skeleton can accurately represent the shape and motion of the 3D volumetric information without temporal discontinuity.

Description

A 3D skeleton generation method using calibration based on joints acquired from multi-view camera

The present invention relates to a method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, in which a partial skeleton is extracted for each viewpoint from distributed RGB-D cameras, camera parameters are computed using the joints of each partial skeleton as feature points, and the partial skeletons are integrated into a 3D skeleton based on those parameters.

The present invention also relates to a method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, in which a 3D mesh is generated from the camera parameters and the color-depth images, the partial skeletons are aligned using the camera parameters and corrected with the 3D mesh, and an integrated 3D skeleton is thereby generated.

As the virtual reality and augmented reality industries have grown in recent years, 3D video content technology that provides an immersive experience from multiple viewpoints has also been actively developed. 3D video content is being applied in many fields, including games, video services, medicine, and education.

To generate such 3D content, a 3D object is modeled as a 3D volumetric mesh model, and rigging or animation is performed on that basis. Such applications, however, require a method of obtaining a precise 3D skeleton.

Skeleton extraction has been studied extensively. Many signal processing techniques for skeleton extraction have been investigated, and more recently many techniques based on deep learning have been studied.

The traditional approach to skeleton extraction is to attach devices such as sensors to a person. It captures movement precisely in real time, but it is costly, and because people do not wear such equipment in everyday life, it is practical only in a laboratory or other restricted area.

Research on human pose estimation was therefore carried out to estimate pose from photographs without body-worn equipment. Estimating pose requires extracting features from the photograph, such as the silhouette of the body or contours from which specific body parts can be inferred. However, the accuracy is not yet high.

Kevin Desai, Balakrishnan Prabhakaran, and Suraj Raghuraman. 2018. "Skeleton-based continuous extrinsic calibration of multiple RGB-D kinect cameras." 2018 Proceedings of the 9th ACM Multimedia Systems Conference. Association for Computing Machinery, New York, NY, USA, 250–257. DOI: https://doi.org/10.1145/3204949.3204969
Y. Wu, L. Gao, S. Hoermann and R. W. Lindeman, "Towards Robust 3D Skeleton Tracking Using Data Fusion from Multiple Depth Sensors," 2018 10th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), Wurzburg, 2018, pp. 1-4, doi: 10.1109/VS-Games.2018.8493443.
Cao, Zhe, et al. "OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields." arXiv preprint arXiv:1812.08008 (2018).

An object of the present invention, which addresses the problems described above, is to provide a method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, in which a partial skeleton is extracted for each viewpoint from distributed RGB-D cameras, camera parameters are computed using the joints of each partial skeleton as feature points, and the partial skeletons are integrated into a 3D skeleton based on those parameters.

A further object of the present invention is to provide a method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, in which a 3D mesh is generated from the camera parameters and the color-depth images, the partial skeletons are aligned using the camera parameters and corrected with the 3D mesh, and an integrated 3D skeleton is thereby generated.

To achieve the above objects, the present invention relates to a method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, characterized by comprising the steps of: (a) acquiring multi-view color-depth images; (b) generating a 3D skeleton for each viewpoint from the color-depth image of that viewpoint, and generating the joints of each viewpoint's skeleton as feature points; (c) performing extrinsic calibration that optimizes the extrinsic parameters using the joints of each viewpoint's skeleton; and (d) aligning and integrating the 3D skeletons of the viewpoints using the extrinsic parameters.

Further, in the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, the present invention is characterized in that, in step (a), the multi-view color-depth images consist of the color-depth images of the respective viewpoints captured by color-depth cameras arranged in at least two horizontal layers, with at least four color-depth cameras in each layer.

Further, in the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, the present invention is characterized in that, in step (b), a 3D skeleton is extracted for each camera using the software development kit (SDK) of Azure Kinect.

Further, in the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, the present invention is characterized in that, in step (c), the joints of each viewpoint's skeleton are formed into a feature point set; the extrinsic parameters are optimized over the feature point set and the feature point with the largest error is detected; the detected feature point is excluded from the feature point set and the optimization is repeated on the remaining set; and the optimized extrinsic parameters from the iteration with the smallest optimization error are taken as the final extrinsic parameters.

Further, in the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, the present invention is characterized in that, in step (c), when the error of the joint with the largest error is at or below a specific threshold, the iteration is stopped and the camera parameters at the time of stopping are selected as the final parameters.
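The pruning loop described in the two paragraphs above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the optimizer is replaced by a closed-form, translation-only least-squares fit (the method itself optimizes rotation, translation, and scale), and the names `fit_translation`, `calibrate_with_pruning`, `threshold`, and `min_points` are hypothetical.

```python
import math


def fit_translation(src, ref):
    """Stand-in optimizer (assumption): least-squares translation mapping src onto ref."""
    n = len(src)
    return tuple(sum(r[k] - s[k] for s, r in zip(src, ref)) / n for k in range(3))


def residuals(src, ref, t):
    """Per-joint distance between each translated source joint and its reference joint."""
    return [
        math.dist(tuple(s[k] + t[k] for k in range(3)), r)
        for s, r in zip(src, ref)
    ]


def calibrate_with_pruning(src, ref, threshold=0.01, min_points=3):
    """Fit, drop the worst joint, and repeat until the worst error is small enough.

    Returns the fitted translation and the indices of the joints that were kept.
    threshold and min_points are hypothetical tuning values.
    """
    kept = list(range(len(src)))
    while True:
        s = [src[i] for i in kept]
        r = [ref[i] for i in kept]
        t = fit_translation(s, r)
        errs = residuals(s, r, t)
        worst = max(range(len(errs)), key=errs.__getitem__)
        # Stop when the largest joint error is within the threshold,
        # or when too few joints remain to keep pruning.
        if errs[worst] <= threshold or len(kept) <= min_points:
            return t, kept
        del kept[worst]
```

With one corrupted joint among five, the loop is expected to discard that joint on the first pass and refit on the remaining four.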

Further, in the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, the present invention is characterized in that, in step (c), the transformation parameters are optimized so as to minimize the error between the actual coordinates (X_ref) of the point cloud in the reference coordinate system and the coordinates (X_i') transformed by the transformation parameters, the optimization being repeated by updating the current coordinate transformation parameter P_n to the next coordinate transformation parameter P_n+1 according to Formula 1 below.

[Formula 1]

$$P_{n+1} = P_n - \alpha \frac{\partial f_{Error}}{\partial P_n}$$

Here, α is a preset constant; P denotes the transformation parameters, namely the rotation matrix R, the translation matrix t, and the scaling factor S; P_n is the currently computed value of the transformation parameters and P_n+1 is the updated value of the coordinate transformation parameters; ∂f_Error/∂P_n denotes the partial derivative of f_Error with respect to the transformation parameters; and f_Error is the error function between the actual coordinates (X_ref) of the point cloud in the reference coordinate system and the coordinates (X_i') transformed by the transformation parameters.
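The update from P_n to P_n+1 is a gradient descent step on f_Error with step size α. The sketch below illustrates that step under simplifying assumptions: the rotation R is left out so that the gradient stays short enough to write by hand, f_Error is taken to be the mean squared distance between X_ref and the transformed joints, and the names `error_and_gradient`, `optimize`, `alpha`, and `iterations` are hypothetical.

```python
def error_and_gradient(src, ref, s, t):
    """Return f_Error and its partial derivatives with respect to scale s and translation t.

    f_Error is assumed to be the mean squared distance between ref and s * src + t.
    """
    n = len(src)
    f, ds, dt = 0.0, 0.0, [0.0, 0.0, 0.0]
    for x, r in zip(src, ref):
        for k in range(3):
            d = s * x[k] + t[k] - r[k]    # residual on one coordinate
            f += d * d / n
            ds += 2.0 * d * x[k] / n      # d f_Error / d s
            dt[k] += 2.0 * d / n          # d f_Error / d t_k
    return f, ds, dt


def optimize(src, ref, alpha=0.1, iterations=500):
    """Repeat P_{n+1} = P_n - alpha * dF/dP_n, starting from s = 1 and t = 0."""
    s, t = 1.0, [0.0, 0.0, 0.0]
    for _ in range(iterations):
        _, ds, dt = error_and_gradient(src, ref, s, t)
        s -= alpha * ds
        t = [t[k] - alpha * dt[k] for k in range(3)]
    return s, t
```

The step size must be small relative to the curvature of f_Error for the iteration to converge; the default of 0.1 is only suitable for joint coordinates of roughly unit magnitude.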

Further, in the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, the present invention is characterized in that, in step (d), the 3D skeletons of the viewpoints are aligned into a single world coordinate system using the extrinsic parameters or camera parameters of each viewpoint, and are then integrated.
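As a rough illustration of this alignment and integration step, the sketch below maps each viewpoint's joints into the world coordinate system with that viewpoint's rotation R and translation t, then merges joints that share a name. Averaging is an assumed integration rule chosen for brevity, and `to_world` and `merge_skeletons` are hypothetical names.

```python
def to_world(point, R, t):
    """Apply X_world = R * X_camera + t to one 3D joint (R is a 3x3 nested list)."""
    return tuple(
        sum(R[row][col] * point[col] for col in range(3)) + t[row]
        for row in range(3)
    )


def merge_skeletons(views):
    """Align partial skeletons and merge them into one skeleton.

    views: list of (joints, R, t), where joints maps a joint name to camera coordinates.
    Joints seen by several cameras are averaged (an assumption); joints seen by
    only one camera are kept as they are.
    """
    collected = {}
    for joints, R, t in views:
        for name, p in joints.items():
            collected.setdefault(name, []).append(to_world(p, R, t))
    return {
        name: tuple(sum(q[k] for q in pts) / len(pts) for k in range(3))
        for name, pts in collected.items()
    }
```

Because each partial skeleton may contain only some of the joints, the merged skeleton is the union of the joints observed across all viewpoints.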

Further, in the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, the present invention is characterized in that, in step (d), a 3D mesh model is generated from the multi-view color-depth images; the skeleton of each viewpoint is transformed into the world coordinate system and aligned; the aligned 3D skeletons and the 3D volumetric mesh model are then placed in a single space; and correction is performed by excluding the joints that lie outside the 3D volumetric mesh model.
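One way to decide whether an aligned joint lies outside the mesh is a ray-casting parity test, sketched below. This is an assumed technique, not one named in the source: a ray is cast from the joint, its crossings with the mesh triangles are counted using the Möller–Trumbore intersection test, and an even count is treated as outside. The mesh is assumed to be closed (watertight), and all function names are hypothetical.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def ray_hits_triangle(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test: does the ray from origin cross the triangle in front of it?"""
    v0, v1, v2 = tri
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    h = _cross(direction, e2)
    a = _dot(e1, h)
    if abs(a) < eps:                      # ray is parallel to the triangle plane
        return False
    f = 1.0 / a
    s = _sub(origin, v0)
    u = f * _dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    q = _cross(s, e1)
    v = f * _dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return False
    return f * _dot(e2, q) > eps          # intersection must lie ahead of the origin


def is_inside(point, triangles, direction=(0.31, 0.52, 0.79)):
    """Odd number of crossings means the point is inside a closed mesh."""
    hits = sum(ray_hits_triangle(point, direction, tri) for tri in triangles)
    return hits % 2 == 1


def drop_outside_joints(joints, triangles):
    """Keep only the joints that lie inside the volumetric mesh."""
    return {name: p for name, p in joints.items() if is_inside(p, triangles)}
```

The oblique default ray direction is an arbitrary choice that makes it unlikely for the ray to graze an edge or vertex of an axis-aligned mesh; a production implementation would need to handle such degenerate hits explicitly.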

Further, in the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras, the present invention is characterized in that, in step (d), the 3D volumetric model is generated using point cloud integration or a mesh generation method.

The present invention also relates to a computer-readable recording medium on which is recorded a program for performing the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras.

As described above, with the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras according to the present invention, the parameters are computed using the joints of the partial skeletons of the multi-view RGB-D images as feature points and the skeletons are integrated on that basis, so that a highly reliable 3D skeleton can be generated, and the generated 3D skeleton can accurately represent the shape and motion of the 3D volumetric information without temporal discontinuity.

FIG. 1 is a block diagram of the overall system for implementing the present invention.
FIG. 2 is a flowchart illustrating the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras according to an embodiment of the present invention.
FIG. 3 is an exemplary view of a live-action 3D volumetric capture system according to an embodiment of the present invention, showing (a) the vertical and (b) the horizontal capture angles and ranges.
FIG. 4 is an exemplary view of the images output from an RGB-D camera and the joints of a skeleton according to an embodiment of the present invention, showing (a) a color (RGB) image, (b) a depth image, and (c) the estimated skeleton and defined joints.
FIG. 5 is an exemplary view of extracting feature points from a skeleton according to an embodiment of the present invention, showing (a) the skeleton and (b) the feature points defined as joints.
FIG. 6 is pseudocode showing the skeleton-based extrinsic calibration method according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the extrinsic parameter optimization step, in which the distances between joints in space are used to minimize the difference in distance, according to an embodiment of the present invention.
FIG. 8 is an exemplary view illustrating the joint determination and correction method using the point cloud distribution of the integrated 3D model according to an embodiment of the present invention, in which the thick dotted line represents the surface of the 3D model, the red dots are the determined joints, and the blue dots are the aligned joints; it shows (a) the aligned joints and (b) the joints corrected after excluding unnecessary joints.

Hereinafter, specific details for carrying out the present invention are described with reference to the drawings.

In describing the present invention, the same parts are denoted by the same reference numerals, and repeated descriptions thereof are omitted.

First, examples of the configuration of the overall system for implementing the present invention are described with reference to FIG. 1.

As shown in FIG. 1, the 3D skeleton generation method according to the present invention may be implemented as a program system on a computer terminal (30) that receives the multi-view depth and color (RGB, etc.) images (60) captured by a distributed camera system (20) and generates a 3D skeleton. That is, the 3D skeleton generation method may be configured as a program and installed and executed on the computer terminal (30). The program installed on the computer terminal (30) may operate as a single program system (40).

In another embodiment, instead of being configured as a program running on a general-purpose computer, the 3D skeleton generation method may be implemented as a single electronic circuit such as an ASIC (application-specific integrated circuit). Alternatively, it may be developed as a dedicated computer terminal (30) that exclusively handles the generation of a 3D skeleton from multi-view depth and color images. This is referred to as the 3D skeleton generation system (40). Other possible forms may also be implemented.

The distributed camera system (20) consists of a plurality of color-depth (RGB-D) cameras (21) that capture the object (10) from different viewpoints.

Each RGB-D camera (21) is a camera that measures color information and depth information to acquire a color and depth image (or RGB-D image). Preferably, the RGB-D camera (21) is a Kinect camera. The color and depth image acquired by the RGB-D camera (21) consists of two-dimensional pixels, and each pixel has a color value and a depth value.

The multi-view color-depth images (60) captured by the RGB-D cameras (21) are input directly to the computer terminal (30), stored, and processed by the 3D skeleton generation system (40). Alternatively, the multi-view color-depth images (60) may be stored in advance on a storage medium of the computer terminal (30), and the stored color-depth images (60) may be read and input by the 3D skeleton generation system (40).

A video consists of temporally consecutive frames. For example, if the frame at the current time t is called the current frame, the frame at the immediately preceding time t-1 is called the previous frame, and the frame at t+1 is called the next frame. Each frame has a color image and a depth image (or depth information).

In particular, the object (10) is captured from as many different viewpoints as there are RGB-D cameras (21), and at a given time t, as many multi-view depth and color images (60) are acquired as there are cameras.

The color-depth video (60) consists of temporally consecutive frames, and one frame has one image. The video (60) may also have only a single frame (or image). That is, the video (60) also covers the case of a single image.

Registering multi-view point clouds in a multi-view color-depth video strictly means detecting them in each depth/color frame (or image); however, unless a distinction is specifically needed below, the terms video and image are used interchangeably.

Next, the method for generating a 3D skeleton using joint-based calibration acquired from multi-view cameras according to an embodiment of the present invention is described as a whole.

In general, the results of 3D volumetric modeling can be widely used to produce content for AR and VR services through precise motion recognition, rigging, or region segmentation. A 3D volume can be created with distributed cameras of various forms and characteristics.

The present invention is a method of obtaining a precise 3D skeleton, using a distributed RGB-D camera network capable of capturing from various viewpoints, in order to apply rigging or animation to a 3D volumetric mesh model.

To this end, a new graphics pipeline is constructed that acquires camera parameters using an RGB-D camera network distributed at arbitrary positions in space and generates a single integrated 3D skeleton.

The method according to the present invention consists broadly of three processes.

The first process is Modified Parameter Optimization. That is, it is the process of acquiring the parameters of each camera in the distributed camera network using 3D skeleton information. The acquired parameters are used to integrate the skeletons obtained by the individual cameras.

Specifically, the camera parameters are computed using the joints of the partial skeletons obtained from the cameras as feature points. Because the cameras may each be located at an arbitrary position, the camera parameters must be acquired before integration. Unlike the conventional approach, the present invention does not use a specially made calibration board to compute the intrinsic and extrinsic parameters between cameras.

The second process is 3D Skeleton Generation. The skeletons obtained from the cameras in the distributed camera network are aligned using the camera parameters.

That is, the skeletons obtained from the individual cameras are integrated using the computed camera parameters to generate a single 3D skeleton, and at the same time a 3D volumetric model in the form of a registered point cloud is reconstructed. In other words, the partial skeletons obtained independently from each camera are aligned in 3D space using the acquired camera parameters.

The third process consists of an algorithm that corrects each joint of the skeleton using the 3D volumetric information. A 3D volumetric model in point cloud or mesh form is generated using the camera parameters, depth maps, and RGB images produced by the distributed camera network. The integrated 3D skeleton is generated by correcting the 3D skeleton with the 3D volumetric model.

The correction is performed both spatially and temporally. A highly reliable 3D skeleton is generated by integrating the point cloud information acquired with the RGB-D cameras into a 3D point cloud and correcting the joint positions. In particular, a 3D volumetric model is generated, the aligned 3D skeleton information is integrated using that model, and a high-quality 3D skeleton is finally produced.

The generated 3D skeleton can accurately represent the shape and motion of the 3D volumetric information without temporal discontinuity.

Distributed camera networks are commonly used to acquire 3D volumes. As described above, the multi-view images acquired from a distributed camera network can be used in various ways to generate a 3D volume. Because the cameras of a distributed camera network are located at arbitrary positions, the image acquired by each camera may contain all or only part of the object, depending on the size or position of the object. In the worst case, every camera may capture only part of the object. To obtain a 3D skeleton for a 3D volume in such an environment, the information from all of the cameras must be integrated into a single 3D skeleton.

다음으로, 본 발명의 일실시예에 따른 다시점 카메라로부터 획득된 조인트 기반의 캘리브레이션을 이용한 3차원 스켈레톤 생성 방법을 도 2를 참조하여 보다 구체적으로 설명한다.Next, a method for generating a three-dimensional skeleton using joint-based calibration obtained from a multi-view camera according to an embodiment of the present invention will be described in more detail with reference to FIG. 2 .

도 2에서 보는 바와 같이, 본 발명의 일실시예에 따른 3차원 스켈레톤 생성 방법은 (a) 다시점 색상-깊이 영상 획득 단계(S10), (b) 관절-기반의 특징점 생성 (Joint-based Feature Point Generation) 단계(S20), (c) 스켈레톤 기반의 외부 캘리브레이션 (Skeleton-based Dynamic Extrinsic Calibration) 단계(S30), (d) 스켈레톤 정렬 및 통합 (Skeleton Alignment and Integration) 단계(S40), (f) 스켈레톤 보정(Skeleton Refinement) 단계(S60)로 구성된다. 추가적으로, (e) 3차원 메쉬 모델 생성 단계(S50)를 더 포함할 수 있다.As shown in FIG. 2 , the method for generating a three-dimensional skeleton according to an embodiment of the present invention includes (a) multi-view color-depth image acquisition step (S10), (b) joint-based feature point generation (Joint-based Feature) Point Generation) Steps (S20), (c) Skeleton-based Dynamic Extrinsic Calibration Steps (S30), (d) Skeleton Alignment and Integration Steps (S40), (f) It consists of a skeleton correction (Skeleton Refinement) step (S60). Additionally, (e) the 3D mesh model generation step (S50) may be further included.

즉, 분산 다시점 카메라의 경우에 객체가 이동함에 따라서 각 카메라가 촬영할 수 있는 객체의 영역이 달라진다. 이 경우에 각 카메라들이 촬영한 부분 정보를 하나로 모아서 하나의 정보를 생성하는 방법이 필요하다. 여기에서는 3D 스켈레톤을 생성하는 것을 목표로 하고 있으므로 각 위치에서 촬영한 객체의 부분에 대한 스켈레톤 일부를 얻고(DL-based 3D Skeleton Generation), 이 부분 스켈레톤을 정렬하고(Skeleton Alignment) 통합하여(Skeleton Integration) 완전한 3D 스켈레톤을 얻는다. 또한, 스켈레톤을 정렬하기 위해서는 분산 카메라에 대한 카메라 파라미터가 필요하고, 스켈레톤을 통합하는 과정에서는 3D 메쉬(Mesh) 정보를 이용하여 보정함으로써 정확한 3D 스켈레톤을 3D 체적 모델에 내장시킬 수 있다.That is, in the case of a distributed multi-view camera, as the object moves, the area of the object that can be photographed by each camera changes. In this case, there is a need for a method of generating one piece of information by collecting the partial information photographed by each camera. Here, we aim to create a 3D skeleton, so we get a part of the skeleton for the part of the object shot at each location (DL-based 3D Skeleton Generation), align this part of the skeleton (Skeleton Alignment), and integrate (Skeleton Integration). ) to get a full 3D skeleton. In addition, in order to align the skeleton, camera parameters for the distributed camera are required, and in the process of integrating the skeleton, an accurate 3D skeleton can be embedded in the 3D volume model by correcting it using 3D mesh information.

The method is described in more detail below.

First, multi-view images of the object (10) captured by multi-view color-depth (RGB-D) cameras are received (S10). That is, the multi-view color-depth camera system captures the object, and the captured multi-view color-depth images are received as input. In addition, the depth and RGB images captured by each RGB-D camera are used to obtain, for each camera, a point cloud expressed in that depth camera's coordinate system.
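As an illustrative sketch only, a point cloud in the depth camera's coordinate system can be obtained by back-projecting each valid depth pixel through a pinhole model. The intrinsic values `fx`, `fy`, `cx`, `cy`, the function name, and the convention that a depth of zero marks an invalid pixel are assumptions made for this example and are not specified in the description.

```python
import numpy as np

def depth_to_point_cloud(depth_mm, fx, fy, cx, cy):
    """Back-project a depth image (in millimetres) to an (N, 3) point cloud.

    The pinhole intrinsics fx, fy, cx, cy are assumed inputs; pixels with
    zero depth are treated as invalid and skipped.
    """
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row grids
    z = depth_mm.astype(np.float64)
    valid = z > 0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=1)
```

The resulting points remain in each camera's own coordinate system; bringing them into a common system is the role of the extrinsic calibration in step S30.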

RGB-D cameras are used to generate a photorealistic 3D volume. Because the goal is to generate a 3D skeleton, multiple RGB-D cameras are installed at distributed positions around the space. Preferably, the cameras (for example, eight RGB-D cameras) are mounted on stand-type capture rigs that allow cameras to be placed at different heights, so that the object is captured at all heights. FIG. 3 shows the distributed camera network used in the present invention.

A distributed camera network is a system in which multiple cameras are placed at arbitrary positions within a given space to scan an object. In particular, the distributed camera system (20) has cameras facing the object at a minimum of four points (viewpoints) in the horizontal direction, and at each point at least two cameras are installed spaced apart in the vertical direction. In other words, in the distributed camera system (20) at least four cameras form one horizontal layer, and there are at least two horizontal layers. The cameras do not all need to be installed at exact positions; approximately similar positions are sufficient.

As an example, eight RGB-D cameras are installed as shown in FIG. 3. All eight cameras face the center of the space, with four cameras in the lower layer and four in the upper layer.

More preferably, the color-depth (RGB-D) sensor is the Azure Kinect, a relatively inexpensive time-of-flight (ToF) sensor. However, besides the Azure Kinect SDK, any device or system capable of extracting a 3D skeleton may be used.

Meanwhile, the multiple RGB images and depth maps produced by the distributed RGB-D camera network are corrected through a preprocessing stage.

Next, a 3D skeleton is generated for each view from that view's color-depth image, and the joints of each view's skeleton are taken as feature points (S20).

Preferably, the 3D skeleton is generated with a deep-learning (DL) based method. That is, the skeleton is extracted using a deep-learning technique such as OpenPose.

In particular, the Azure Kinect SDK (software development kit) is used to extract a 3D skeleton for each camera. Any conventional method may be used to extract the 3D skeleton for each camera, and different methods may be used for different cameras.

FIG. 4 shows an example of extracting a skeleton from a color-depth image.

As an example, a 2D skeleton can be extracted using the neural-network-based OpenPose [Non-Patent Document 3]. OpenPose is a library based on a convolutional neural network (CNN) that can extract feature points of the bodies, hands, and faces of multiple people from images in real time. In particular, OpenPose finds the poses of multiple people quickly. OpenPose is a bottom-up method that improves performance without iterative processing. A bottom-up method estimates the joints of everyone in the image, connects the joint positions, and then regroups them into the joint positions of each individual person. The skeleton extracted with OpenPose is output as an image and a JSON file.
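For illustration, the JSON output mentioned above can be read as follows. This is a sketch that assumes OpenPose's usual layout, in which each entry of `people` carries a flat `pose_keypoints_2d` list of (x, y, confidence) triples; the function name and the `min_conf` cutoff are hypothetical choices for this example.

```python
import json
import numpy as np

def load_openpose_keypoints(json_text, min_conf=0.1):
    """Return one (J, 2) array of 2D joints per detected person.

    Joints whose confidence is below min_conf (an assumed cutoff) are
    replaced by NaN so later stages can skip them.
    """
    data = json.loads(json_text)
    skeletons = []
    for person in data.get("people", []):
        kp = np.asarray(person["pose_keypoints_2d"], dtype=float).reshape(-1, 3)
        xy = kp[:, :2].copy()
        xy[kp[:, 2] < min_conf] = np.nan  # mark low-confidence joints as missing
        skeletons.append(xy)
    return skeletons
```

Marking unreliable joints as NaN, rather than dropping them, keeps the joint index stable so that the same joint can be matched across views.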

In addition, the joints of the extracted 3D skeleton are taken as the feature points of each view. FIG. 5 shows an example of generating feature points from the joints of a skeleton.
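One way to turn joints into matched feature points, sketched here under the assumption that every view uses the same joint ordering and that missing joints are stored as NaN, is to keep only the joints observed in both the reference view and the view being calibrated. The function name is hypothetical.

```python
import numpy as np

def matched_joints(joints_ref, joints_i):
    """Pair up joints that are valid in both views.

    joints_ref, joints_i: (J, 3) arrays with NaN rows for missing joints.
    Returns the shared joint indices and the two matched (K, 3) arrays.
    """
    valid = ~np.isnan(joints_ref).any(axis=1) & ~np.isnan(joints_i).any(axis=1)
    idx = np.flatnonzero(valid)
    return idx, joints_ref[idx], joints_i[idx]
```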

Next, extrinsic calibration is performed, in which the extrinsic parameters are optimized using the joints of each view's skeleton (S30).

FIG. 6 shows pseudocode for the extrinsic calibration process that obtains the camera parameters from the skeletons, and FIG. 7 illustrates the extrinsic calibration process schematically.

As shown in FIG. 6 or FIG. 7, the joints of each view's skeleton are assembled into a feature point set. The extrinsic parameters are optimized over this set, the feature point with the largest error is detected and removed from the set, and the optimization is repeated on the remaining set. This is repeated until the size of the feature point set is 1. The optimized extrinsic parameters from the iteration with the smallest optimization error are then taken as the final extrinsic parameters.

Specifically, a first optimization is performed using the valid joint information obtained from each camera as feature points. Next, the joint with the largest error after the previous optimization is excluded, the optimization is performed again, and the error over all joints is recomputed and stored. This process is repeated until the difference between the converged optimization error of the previous joint set and the converged optimization error of the next joint set is at or below a specific threshold, at which point the iteration stops. The camera parameters obtained by the optimization that is current when the iteration stops are taken as the result. This process is carried out repeatedly using the eight skeletons obtained from all of the cameras.
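The rejection loop described above can be sketched as follows. To keep the example short, the inner optimization is replaced by a closed-form least-squares rigid fit (the Kabsch/SVD method); this is a substitution made for illustration and is not the iterative optimizer of FIG. 6. The function names, the default `threshold`, the `min_points` floor, and the choice to measure the error over the retained joint set are likewise assumptions.

```python
import numpy as np

def fit_rigid(src, dst):
    """Least-squares R, t such that dst ≈ src @ R.T + t (Kabsch, used as a stand-in)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def point_errors(R, t, src, dst):
    """Squared Euclidean distance of each transformed source point to its match."""
    return np.sum((src @ R.T + t - dst) ** 2, axis=1)

def calibrate_with_rejection(src, dst, threshold=1e-6, min_points=3):
    """Drop the worst joint and re-fit until the mean error stops changing."""
    keep = np.arange(len(src))
    R, t = fit_rigid(src, dst)
    prev_err = point_errors(R, t, src, dst).mean()
    while len(keep) > min_points:
        worst = np.argmax(point_errors(R, t, src[keep], dst[keep]))
        keep = np.delete(keep, worst)
        R, t = fit_rigid(src[keep], dst[keep])
        err = point_errors(R, t, src[keep], dst[keep]).mean()
        if abs(prev_err - err) <= threshold:
            break  # converged: keep the parameters of the current fit
        prev_err = err
    return R, t, keep
```

Here `src` and `dst` are the matched joints of one camera and of the reference camera; the function would be called once per non-reference camera.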

The process of optimizing the extrinsic parameters over a feature point set is described next. For calibration, the extrinsic parameters of each camera are obtained from matched coordinates in the point cloud sets (or feature point sets) as follows.

These parameters are computed with an optimization algorithm so that the squared Euclidean distance between matched coordinates is minimized. The coordinate transformation matrix contains the rotation angles about the x, y, and z axes and the translation values as its parameters. One camera is chosen as the reference coordinate system, and the parameters that transform the coordinate systems of the other cameras into the reference coordinate system are then obtained. X_ref denotes the coordinates in the reference camera and X_i denotes the coordinates in each of the remaining cameras. R_i→ref and t_i→ref denote the rotation and translation matrices from each camera to the reference camera. Initially, R_i→ref is the identity matrix and t_i→ref is all zeros.

When the initial parameters are applied, Equation (1) in FIG. 6 yields X_i, and as the optimization proceeds the result converges to X_ref. The loss function to be optimized is the mean squared Euclidean distance (SED) between X_ref and X_i'. Equation (2) gives this error function.
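Equations (1) and (2) appear only in FIG. 6, which is not reproduced in this text. A form consistent with the description above is given below as a reconstruction, not as the exact notation of the figure; N (the number of matched feature points) and the index j are symbols introduced here.

```latex
\begin{align}
X_i' &= R_{i \to ref}\, X_i + t_{i \to ref} \tag{1} \\
f_{Error} &= \frac{1}{N} \sum_{j=1}^{N} \left\lVert X_{ref}(j) - X_i'(j) \right\rVert_2^{2} \tag{2}
\end{align}
```

With the initial values (R_i→ref equal to the identity and t_i→ref equal to zero), the first expression reduces to X_i' = X_i, in agreement with the text.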

The process of differentiating the loss function with respect to the coordinate transformation parameters and updating the parameters so that the function is minimized can be expressed as Equation (3). α is the learning rate, a constant fixed in advance; preferably, a value of 0.01 is used. P denotes the camera parameters (extrinsic parameters), and P_n+1 and P_n are the parameters at the (n+1)-th and n-th iterations, respectively.
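The update rule can be sketched as plain gradient descent over six parameters (three rotation angles and three translations). In this example the gradient is approximated by central differences for brevity, the rotation order Rz·Ry·Rx is an arbitrary choice, and the function names and iteration count are assumptions; only the learning rate of 0.01 and the mean squared Euclidean distance loss come from the description.

```python
import numpy as np

def rotation_from_angles(rx, ry, rz):
    """Rotation matrix from x, y, z axis angles (order Rz @ Ry @ Rx, assumed)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def loss(params, x_i, x_ref):
    """Mean squared Euclidean distance between X_ref and the transformed X_i."""
    R = rotation_from_angles(*params[:3])
    x_i_prime = x_i @ R.T + params[3:]
    return np.mean(np.sum((x_ref - x_i_prime) ** 2, axis=1))

def numerical_gradient(params, x_i, x_ref, eps=1e-6):
    """Central-difference approximation of d(loss)/d(params)."""
    grad = np.zeros_like(params)
    for k in range(len(params)):
        step = np.zeros_like(params)
        step[k] = eps
        grad[k] = (loss(params + step, x_i, x_ref)
                   - loss(params - step, x_i, x_ref)) / (2 * eps)
    return grad

def optimize_extrinsics(x_i, x_ref, alpha=0.01, iterations=3000):
    """P_{n+1} = P_n - alpha * dLoss/dP_n, starting from identity rotation, zero shift."""
    params = np.zeros(6)
    for _ in range(iterations):
        params = params - alpha * numerical_gradient(params, x_i, x_ref)
    return params
```

A fixed step size such as 0.01 behaves differently depending on the scale of the coordinates, so the number of iterations needed in practice may differ greatly from the small value used in this sketch.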

According to experiments, when this process is repeated 200,000 times or more, the average error across the eight cameras falls to 2.98 mm.

Next, the 3D skeletons of each view are aligned and integrated using the extrinsic parameters (S40).

That is, using the extrinsic parameters (camera parameters) obtained above for each view, the 3D skeletons of each view are aligned to a single world coordinate system and then integrated. In the example above, eight cameras are used, so up to eight partial skeletons are aligned to one world coordinate system and integrated.

Specifically, once the parameters of each camera have been obtained by the optimization process, Equation 4 below is used to transform from each view's camera coordinate system to the world coordinate system, and the point clouds can be aligned with respect to the unified coordinate system.

The eight pairs of color (RGB) and depth images obtained from the distributed camera network are used to compute the translation and rotation parameters for aligning the skeletons. These translation and rotation parameters are the per-view transformation parameters (camera parameters) obtained in the extrinsic parameter optimization described above.

When the rotation (R) and translation (t) matrices optimized for each camera by the function optimization technique are applied to the 3D skeleton extracted from that camera, which is defined in that camera's own coordinate system, all of the 3D skeletons become aligned with respect to the world coordinate system.

This relationship is defined by Equations 4 and 5. P_W denotes the world coordinates (reference camera coordinates), and S_C denotes the camera coordinates [18].

[Equation 4]

[Equation 5]
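The equation images are not reproduced in this text. As a sketch that assumes the usual rigid-body form, in which a world point equals R times the camera point plus t, the alignment of one camera's skeleton can be written as follows; the function name is hypothetical.

```python
import numpy as np

def to_world(joints_cam, R, t):
    """Map (J, 3) joints from a camera coordinate system to the world system.

    Assumes the rigid-body form p_world = R @ p_cam + t; NaN joints stay NaN.
    """
    return joints_cam @ R.T + t
```

For the reference camera, R is the identity and t is zero, so its skeleton is left unchanged.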

Next, a 3D mesh model is generated from the multi-view color-depth images (S50).

That is, with the skeletons of each view aligned in the preceding step, a 3D volumetric model in point-cloud or mesh form is generated. A truncated signed distance function (TSDF) may be used here, but the method is not limited to it. Any method that makes it possible to determine the spatial distribution of the integrated point cloud may be applied.

Next, refinement is performed on the integrated 3D skeleton (S60).

The partial and incomplete 3D skeletons obtained through the distributed cameras are merged into one through the skeleton integration process. In this process, incorrectly extracted joints are deleted, and the remaining joint information is averaged in space to extract an accurate skeleton.

Preferably, as shown in FIG. 8, the integrated 3D skeleton is refined using the information of the 3D volumetric mesh model. That is, after the aligned 3D skeletons and the 3D volumetric mesh model are placed in a single space, joints lying outside the 3D volumetric mesh model are regarded as incorrectly extracted joints and are deleted.
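A minimal sketch of this inside/outside test is shown below. It assumes the volumetric model is available as a signed distance function, negative inside the surface and positive outside, which is one possible representation and not necessarily the one used in the embodiment; `sdf`, `margin`, and the function name are hypothetical.

```python
import numpy as np

def drop_joints_outside(joints, sdf, margin=0.0):
    """Replace joints that lie outside the model surface with NaN.

    joints: (J, 3) array in world coordinates.
    sdf: callable mapping (J, 3) points to signed distances (assumed
         negative inside the model, positive outside).
    """
    cleaned = joints.copy()
    cleaned[sdf(joints) > margin] = np.nan
    return cleaned

def unit_sphere_sdf(points):
    """Toy stand-in for a real model: signed distance to a unit sphere."""
    return np.linalg.norm(points, axis=1) - 1.0
```

For example, `drop_joints_outside(joints, unit_sphere_sdf)` keeps a joint at the origin and removes one at (2, 0, 0).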

FIG. 8 illustrates the refinement process: FIG. 8(a) shows the aligned joints before refinement, and FIG. 8(b) shows the joints after refinement. In FIG. 8, the bold dotted line represents the surface of the 3D model, the red points are the determined joints, and the blue points are the aligned joints.

In summary, the skeletons of each view are transformed into the world coordinate system and aligned, refined using the volumetric mesh, and the skeletons that remain after refinement are averaged and integrated.
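The final averaging can be sketched as a per-joint mean over the views in which that joint survived refinement. The sketch assumes the aligned skeletons are stacked into a (V, J, 3) array with NaN marking deleted or unobserved joints; the function name is hypothetical.

```python
import numpy as np

def integrate_skeletons(aligned):
    """Average each joint over the views where it is valid.

    aligned: (V, J, 3) array of world-coordinate joints, NaN where missing.
    Returns a (J, 3) skeleton; joints seen by no view remain NaN.
    """
    valid = ~np.isnan(aligned).any(axis=2)                        # (V, J)
    counts = valid.sum(axis=0)                                    # (J,)
    sums = np.where(valid[..., None], aligned, 0.0).sum(axis=0)   # (J, 3)
    merged = np.full(aligned.shape[1:], np.nan)
    seen = counts > 0
    merged[seen] = sums[seen] / counts[seen, None]
    return merged
```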

The invention made by the present inventors has been described above in detail with reference to the embodiments; however, the present invention is not limited to these embodiments and may of course be modified in various ways without departing from its gist.

10: object 20: camera system
30: computer terminal 40: program system

Claims

A method of generating a three-dimensional skeleton using calibration based on joints acquired from multi-view cameras, the method comprising:
(a) acquiring multi-view color-depth images;
(b) generating a three-dimensional skeleton for each view from the color-depth image of that view, and generating the joints of each view's skeleton as feature points;
(c) performing extrinsic calibration that optimizes extrinsic parameters using the joints of each view's skeleton; and
(d) aligning and integrating the three-dimensional skeletons of each view using the extrinsic parameters.

The method of claim 1, wherein, in step (a), the multi-view color-depth images consist of the color-depth images of each view captured by color-depth cameras arranged in at least two horizontal layers, with at least four color-depth cameras in each layer.

The method of claim 1, wherein, in step (b), a 3D skeleton is extracted for each camera using the SDK (software development kit) of the Azure Kinect.

The method of claim 1, wherein, in step (c), the joints of each view's skeleton are assembled into a feature point set, the extrinsic parameters are optimized over the feature point set to detect the feature point with the largest error, the detected feature point is removed from the feature point set and the optimization is repeated on the remaining set, and the optimized extrinsic parameters for the case with the smallest optimization error are obtained as the final extrinsic parameters.

The method of claim 4, wherein, in step (c), when the error of the joint with the largest error is at or below a specific threshold, the iteration is stopped and the camera parameters at the time of stopping are selected as the final parameters.

The method of claim 1, wherein, in step (c), the transformation parameters are optimized so that the error between the actual coordinates (X_ref) of the point cloud in the reference coordinate system and the coordinates (X_i') transformed by the transformation parameters is minimized, the optimization being repeated by updating the current coordinate transformation parameters P_n to the next coordinate transformation parameters P_n+1 according to Formula 1 below.
[Formula 1]
P_n+1 = P_n − α · ∂f_Error/∂P_n
Here, α is a preset constant; P denotes the transformation parameters, namely the rotation matrix R, the translation matrix t, and the scaling factor S; P_n is the currently computed value of the transformation parameters and P_n+1 is the updated value of the coordinate transformation parameters; ∂f_Error/∂P_n denotes the partial derivative of f_Error with respect to the transformation parameters; and f_Error is the error function between the actual coordinates (X_ref) of the point cloud in the reference coordinate system and the coordinates (X_i') transformed by the transformation parameters.

The method of claim 1, wherein, in step (d), the three-dimensional skeletons of each view are aligned to a single world coordinate system using the extrinsic parameters or camera parameters of each view and are then integrated.

The method of claim 7, wherein, in step (d), a 3D mesh model is generated from the multi-view color-depth images, the skeletons of each view are transformed into the world coordinate system and aligned, the aligned 3D skeletons and the 3D volumetric mesh model are placed in a single space, and refinement is performed by excluding joints that lie outside the 3D volumetric mesh model.

The method of claim 8, wherein, in step (d), the three-dimensional volumetric model is generated using point cloud integration or a mesh generation method.

A computer-readable recording medium on which is recorded a program for performing the method of generating a three-dimensional skeleton using calibration based on joints acquired from multi-view cameras according to any one of claims 1 to 9.