KR20200075727A

KR20200075727A - Method and apparatus for calculating depth map

Info

Publication number: KR20200075727A
Application number: KR1020190096306A
Authority: KR
Inventors: 리우 지후아; 김윤태; 이형욱; 마 린; 왕 치앙; 야민 마오; 가오 티안하오
Original assignee: 삼성전자주식회사
Priority date: 2018-12-18
Filing date: 2019-08-07
Publication date: 2020-06-26
Also published as: CN111340922A

Abstract

Disclosed are a method and an apparatus for calculating a depth map. According to one embodiment, the method for calculating the depth map comprises the steps of: calculating a global sparse depth map corresponding to a current frame using a plurality of frames including the current frame; calculating a local dense depth map corresponding to the current frame using the current frame; extracting a non-static object area by masking a static object area in the current frame; removing the non-static object area from the global sparse depth map; and generating a global dense depth map corresponding to the current frame by integrating the global sparse depth map from which the non-static object area has been removed with the local dense depth map.

Description

Depth map calculation method and apparatus {METHOD AND APPARATUS FOR CALCULATING DEPTH MAP}

The embodiments below relate to a method and an apparatus for calculating a depth map.

Simultaneous localization and mapping (SLAM; hereinafter referred to as the 'SLAM technique') is a technique of building a map of an environment from sensor information while simultaneously estimating the current location from the map being built.

In the SLAM technique, a laser distance sensor (LiDAR) and a camera are the sensors mainly used to acquire sensor information (for example, image data). A camera is cheaper than a laser distance sensor and can be used under a wider range of conditions (various weather and situations). However, pose determination and the constructed map are less accurate, so almost none of the necessary information can be provided for scenes that contain many regions of unknown depth.

A depth map calculation method according to an embodiment includes: calculating a global sparse depth map corresponding to a current frame using a plurality of frames including the current frame; calculating a local dense depth map corresponding to the current frame using the current frame; extracting a dynamic object (non-static object) region by masking a static object region in the current frame; removing the dynamic object region from the global sparse depth map; and generating a global dense depth map corresponding to the current frame by integrating the global sparse depth map from which the dynamic object region has been removed with the local dense depth map.
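The removal step named above can be illustrated with a minimal Python sketch. The data layout is an assumption made only for illustration: the sparse map is a dictionary from pixel coordinates to 3D points, and the mask is a list of rows in which True marks a dynamic pixel. The function name `remove_dynamic_points` is hypothetical and does not come from the patent.

```python
def remove_dynamic_points(sparse_points, dynamic_mask):
    """Drop every sparse map point whose pixel lies in the dynamic region.

    sparse_points: dict mapping (u, v) pixel coordinates to an (X, Y, Z) tuple
                   (assumed layout, not specified by the source).
    dynamic_mask:  list of rows; dynamic_mask[v][u] is True for dynamic pixels.
    """
    return {
        (u, v): xyz
        for (u, v), xyz in sparse_points.items()
        if not dynamic_mask[v][u]
    }


# Example: the point at pixel (1, 0) falls on a dynamic object and is removed.
points = {(0, 0): (0.1, 0.2, 3.0), (1, 0): (0.4, 0.2, 2.5)}
mask = [[False, True]]
static_only = remove_dynamic_points(points, mask)
```

Only the points that survive this filter would then take part in the integration with the local dense depth map.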

The calculating of the global sparse depth map may include: calculating depth information corresponding to one or more pixel points included in the current frame; estimating pose information of a camera corresponding to the current frame; and calculating three-dimensional coordinates of the pixel points based on the depth information and the pose information of the camera.

The depth map calculation method according to an embodiment may further include updating the pose information of the camera corresponding to the current frame based on the global dense depth map.

The depth map calculation method according to an embodiment may further include updating the global sparse depth map based on the updated pose information of the camera.

The calculating of the global sparse depth map may include: calculating first depth information corresponding to a key frame that precedes the current frame among the plurality of frames; calculating second depth information corresponding to the current frame; estimating the pose information of the camera corresponding to the current frame based on the first depth information and the second depth information; and calculating the global sparse depth map based on the second depth information and the pose information of the camera.

The calculating of the second depth information may include performing stereo matching between a right-eye image and a left-eye image included in the current frame.
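As background for the stereo matching step, the standard relation between disparity and depth for a rectified stereo pair is Z = f * B / d. The sketch below assumes a rectified pair, a focal length `focal_px` in pixels, and a baseline `baseline_m` in metres; these parameter names are illustrative and are not defined in the source.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero or negative disparity has no finite depth under this model.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Example: 800 px focal length, 0.5 m baseline, 20 px disparity.
z = depth_from_disparity(20.0, 800.0, 0.5)
```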

The pose information of the camera may include at least one of rotation information and translation (movement) information that change as the camera moves from a first position to a second position.

The calculating of the local dense depth map may include: inputting the current frame, which includes a plurality of pixel points, into an artificial neural network to obtain outputs of the artificial neural network corresponding to depth information of the plurality of pixel points; and calculating the local dense depth map based on the outputs.

The extracting of the dynamic object region may include: inputting the current frame into an artificial neural network to obtain outputs of the artificial neural network classified into a static object region and a dynamic object region; and extracting the dynamic object region based on the outputs.

The generating of the global dense depth map may include: dividing the local dense depth map into a plurality of grid cells; updating depth information of pixel points corresponding to corner points of the grid cells based on the global sparse depth map from which the dynamic object region has been removed; and updating depth information of pixel points included in interior regions of the grid cells based on the global sparse depth map from which the dynamic object region has been removed and the updated depth information of the pixel points corresponding to the corner points.
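The passage above does not state how the interior pixel points are updated from the corner points. One plausible reading, offered purely as an assumption, is that each corner receives a depth correction and the interior pixels receive a bilinear blend of the four corner corrections. The function names and the idea of an additive correction are hypothetical.

```python
def bilinear_correction(corner_offsets, tx, ty):
    """Blend four corner corrections; tx, ty are positions in [0, 1] inside the cell.

    corner_offsets = (top_left, top_right, bottom_left, bottom_right).
    """
    tl, tr, bl, br = corner_offsets
    top = tl * (1 - tx) + tr * tx
    bottom = bl * (1 - tx) + br * tx
    return top * (1 - ty) + bottom * ty


def fuse_cell(local_depth, corner_offsets):
    """Add interpolated corrections to one grid cell of the local dense depth map.

    local_depth is a list of rows covering the cell, corners included.
    """
    h, w = len(local_depth), len(local_depth[0])
    fused = []
    for r in range(h):
        ty = r / (h - 1) if h > 1 else 0.0
        row = []
        for c in range(w):
            tx = c / (w - 1) if w > 1 else 0.0
            row.append(local_depth[r][c] + bilinear_correction(corner_offsets, tx, ty))
        fused.append(row)
    return fused


# Example: a flat 3x3 cell at depth 10 with corner corrections 0, 2, 2 and 4.
cell = [[10.0] * 3 for _ in range(3)]
fused_cell = fuse_cell(cell, (0.0, 2.0, 2.0, 4.0))
```

With this scheme adjacent cells that share corner points agree along their common edge, which is one reason a corner-first update is attractive; whether the patent uses this exact interpolation is not confirmed by the text above.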

The calculating of the local dense depth map may include: inputting a right-eye image and a left-eye image included in the current frame into a feature extraction module to calculate a right feature map corresponding to the right-eye image and a left feature map corresponding to the left-eye image; obtaining initial matching cost data of matched pixels between the left-eye image and the right-eye image based on the right feature map and the left feature map; inputting the initial matching cost data into an artificial neural network to predict matching cost data; calculating depth information of each of the matched pixels based on the matching cost data; and calculating the local dense depth map based on the respective depth information.

The feature extraction module may include a left convolutional artificial neural network that receives the left-eye image and a right convolutional artificial neural network that receives the right-eye image, and the left convolutional artificial neural network and the right convolutional artificial neural network may share weights.

The obtaining of the initial matching cost data may include obtaining the initial matching cost data by concatenating (connecting) the right feature map and the left feature map.
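The text only says that the two feature maps are connected. A common way to do this in learned stereo matching, assumed here for illustration, is to pair each left feature vector with the right feature vector shifted by every candidate disparity and concatenate them along the channel axis. The sketch works on a single image row in plain Python; `build_initial_cost` and `max_disparity` are hypothetical names.

```python
def build_initial_cost(left_row, right_row, max_disparity):
    """Build cost[d][x] as the concatenation of left_row[x] and right_row[x - d].

    left_row, right_row: lists of per-pixel feature vectors (lists of floats).
    Positions with no valid right-image partner are filled with zeros.
    """
    channels = len(left_row[0])
    cost = []
    for d in range(max_disparity):
        layer = []
        for x in range(len(left_row)):
            if x - d >= 0:
                layer.append(left_row[x] + right_row[x - d])  # list concatenation
            else:
                layer.append([0.0] * (2 * channels))
        cost.append(layer)
    return cost


# Example: one-channel features on a three-pixel row, two candidate disparities.
left = [[1.0], [2.0], [3.0]]
right = [[4.0], [5.0], [6.0]]
initial_cost = build_initial_cost(left, right, 2)
```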

The predicting of the matching cost data may include predicting the matching cost data based on an hourglass artificial neural network and the initial matching cost data.

The calculating of the depth information may include: performing a spatial convolution operation on the matching cost data using a convolutional artificial neural network; estimating a disparity (parallax) between matched pixels of the left-eye image and the right-eye image based on a result of the spatial convolution operation; and calculating the depth information based on the disparity.

The performing of the spatial convolution operation may include: dividing the matching cost data along a direction set for the matching cost data to obtain a plurality of matching cost layers; and performing a convolution operation on each of the plurality of matching cost layers sequentially along the direction.

The sequentially performing of the convolution operation may include, when performing the convolution operation on a given matching cost layer, accumulating the convolution result of the preceding matching cost layer onto the given matching cost layer and then performing the convolution operation.
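The accumulate-then-convolve order described above can be sketched as follows. For brevity each matching cost layer is reduced to a 1-D list and the convolution is a plain zero-padded 1-D filter; the real layers and kernels would be multi-dimensional and learned, so the shapes and the helper names here are assumptions.

```python
def conv1d_same(values, kernel):
    """Zero-padded 1-D convolution (cross-correlation form), output length == input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):
                acc += w * values[j]
        out.append(acc)
    return out


def sequential_spatial_conv(layers, kernel):
    """Process layers in order; each layer first absorbs the previous layer's result."""
    results = []
    previous = None
    for layer in layers:
        if previous is not None:
            # Accumulate the preceding layer's convolution result onto this layer.
            layer = [a + b for a, b in zip(layer, previous)]
        previous = conv1d_same(layer, kernel)
        results.append(previous)
    return results


# Example with an identity kernel, so the accumulation is easy to follow by hand.
outputs = sequential_spatial_conv([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [1.0])
```

Because each result feeds the next layer, information from the first layer can reach the last one along the chosen direction, which a single independent convolution per layer would not achieve.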

The estimating of the disparity (parallax) may include: obtaining a disparity probability distribution of matched pixels between the left-eye image and the right-eye image based on the result of the spatial convolution operation and a softmax function; and estimating the disparity based on the disparity probability distribution.
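A minimal sketch of this step follows. Two details are assumptions rather than statements from the source: a lower matching cost is taken to mean a better match (so costs are negated before the softmax), and the disparity is taken as the expectation of the resulting distribution.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_disparity(costs):
    """Turn per-disparity costs into probabilities and return the expected disparity.

    costs[d] is the matching cost of candidate disparity d for one pixel.
    """
    probs = softmax([-c for c in costs])  # assumed convention: low cost = likely
    return sum(d * p for d, p in enumerate(probs))


# Example: candidate disparity 1 has by far the lowest cost.
d_hat = expected_disparity([100.0, 0.0, 100.0])
```

Taking the expectation rather than the arg-max yields sub-pixel disparities and keeps the operation differentiable, which is why it is a frequent choice in learned stereo pipelines.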

The extracting of the dynamic object region may include: inputting the current frame into a feature extraction module to calculate a feature map corresponding to the current frame; obtaining category attribute information of objects included in the current frame based on the feature map; and obtaining state information of the objects included in the current frame based on the category attribute information.

The obtaining of the state information may include: determining optical flow information between the current frame and a frame preceding the current frame; and obtaining the state information based on the optical flow information and the category attribute information.
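How the optical flow and the category attribute are combined is not specified above. The sketch below shows one simple rule under stated assumptions: only categories that can move are candidates for the dynamic state, and such an object is treated as dynamic when its mean flow magnitude exceeds a threshold. The category set, the threshold, and the premise that the flow has already been compensated for camera motion are all hypothetical.

```python
import math

# Hypothetical set of categories that are able to move.
MOVABLE_CATEGORIES = {"person", "car", "bicycle"}


def object_state(category, mean_flow, flow_threshold=1.0):
    """Return "dynamic" or "static" for one detected object.

    category:  category attribute of the object, e.g. "car" or "building".
    mean_flow: (dx, dy) mean optical flow of the object's pixels in pixels per
               frame, assumed to be compensated for camera motion.
    """
    if category not in MOVABLE_CATEGORIES:
        return "static"
    magnitude = math.hypot(mean_flow[0], mean_flow[1])
    return "dynamic" if magnitude > flow_threshold else "static"


# Example: a parked car stays static, a moving car becomes dynamic.
parked = object_state("car", (0.1, 0.0))
moving = object_state("car", (3.0, 4.0))
```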

A depth map calculation apparatus according to an embodiment includes: a camera that acquires a plurality of frames including a current frame; and a processor that calculates a global sparse depth map corresponding to the current frame using the plurality of frames, calculates a local dense depth map corresponding to the current frame using the current frame, extracts a dynamic object (non-static object) region by masking a static object region in the current frame, removes the dynamic object region from the global sparse depth map, and generates a global dense depth map corresponding to the current frame by integrating the global sparse depth map from which the dynamic object region has been removed with the local dense depth map.

The processor may calculate depth information corresponding to one or more pixel points included in the current frame, and calculate three-dimensional coordinates of the pixel points based on the depth information.

The processor may update pose information of the camera corresponding to the current frame based on the global dense depth map.

The processor may update the global sparse depth map based on the updated pose information of the camera.

The processor may calculate first depth information corresponding to a key frame among the plurality of frames, calculate second depth information corresponding to the current frame, estimate the pose information of the camera corresponding to the current frame based on the first depth information and the second depth information, and calculate the global sparse depth map based on the second depth information and the pose information of the camera.

The processor may perform stereo matching between a right-eye image and a left-eye image included in the current frame.

The processor may input the current frame, which includes a plurality of pixel points, into an artificial neural network to obtain outputs of the artificial neural network corresponding to depth information of the plurality of pixel points, and calculate the local dense depth map based on the outputs.

The processor may input the current frame into an artificial neural network to obtain outputs of the artificial neural network classified into a static object region and a dynamic object region, and extract the dynamic object region based on the outputs.

The processor may divide the local dense depth map into a plurality of grid cells, update depth information of pixel points corresponding to corner points of the grid cells based on the global sparse depth map from which the dynamic object region has been removed, and update depth information of pixel points included in interior regions of the grid cells based on the global sparse depth map from which the dynamic object region has been removed and the updated depth information of the pixel points corresponding to the corner points.

The processor may input a right-eye image and a left-eye image included in the current frame into a feature extraction module to calculate a right feature map corresponding to the right-eye image and a left feature map corresponding to the left-eye image, obtain initial matching cost data of matched pixels between the left-eye image and the right-eye image based on the right feature map and the left feature map, input the initial matching cost data into an artificial neural network to predict matching cost data, calculate depth information of each of the matched pixels based on the matching cost data, and calculate the local dense depth map based on the respective depth information.

FIG. 1 is a diagram for explaining a SLAM technique according to an embodiment.
FIG. 2 is a flowchart for explaining a method of generating a depth map according to an embodiment.
FIG. 3 is a diagram for explaining the operation of a calculation device according to an embodiment.
FIG. 4 is a flowchart for explaining a method of calculating a global sparse depth map according to an embodiment.
FIG. 5 is a diagram for explaining a method of calculating a global sparse depth map.
FIG. 6 is a diagram for explaining a geometric stereo matching technique according to an embodiment.
FIGS. 7 and 8 are diagrams for explaining a method of calculating a local dense depth map according to an embodiment.
FIG. 9 is a diagram for explaining a method of performing a spatial convolution operation according to an embodiment.
FIG. 10 is a diagram for explaining a method of masking a static object according to an embodiment.
FIG. 11 is a flowchart of a method of generating a global dense depth map according to an embodiment.
FIG. 12 is a diagram for explaining a method of generating a global dense depth map according to an embodiment.
FIG. 13 is a diagram for explaining a method of updating depth information of pixel points according to an embodiment.
FIG. 14 is a block diagram of an apparatus for calculating a depth map according to an embodiment.

The specific structural or functional descriptions disclosed in this specification are given only to describe embodiments according to the technical concept. The embodiments may be implemented in various other forms and are not limited to the embodiments described herein.

Terms such as first and second may be used to describe various components, but these terms should be understood only as distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

When a component is described as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that another component may exist in between. In contrast, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between. Expressions that describe relationships between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to indicate that the stated features, numbers, steps, operations, components, parts, or combinations thereof exist, and should be understood not to exclude in advance the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless explicitly so defined in this specification.

The embodiments may be implemented in various types of products, such as personal computers, laptop computers, tablet computers, smartphones, televisions, smart home appliances, intelligent vehicles, kiosks, and wearable devices. Hereinafter, the embodiments are described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote the same members.

FIG. 1 is a diagram for explaining a SLAM technique according to an embodiment.

Referring to FIG. 1, under the SLAM technique according to an embodiment, an apparatus for calculating a depth map (hereinafter, the 'calculation device') may build a map of the environment from sensor information while simultaneously estimating its own current location from the map being built. The calculation device may combine the sensor information obtained from the sensor(s) and, through an optimization algorithm, compute its own location and obtain the map. For example, the calculation device may be a robot 110. As the robot 110 moves, it may use the sensors mounted on it to recognize the surrounding spatial terrain 120, and may use the recognized terrain to build a map of the surrounding environment while simultaneously determining the relative position of the robot 110.

The SLAM technique may be used not only in various robots, such as indoor autonomous robots, outdoor delivery robots, unmanned aerial vehicles, and underwater and underground exploration robots, but also in AR/VR. GPS is widely used to obtain location information outdoors, but consumer-grade GPS has a position accuracy on the order of several meters, and its accuracy can degrade severely when the signal is blocked by obstacles such as tall buildings or tunnels. For a robot to drive autonomously in such environments, the sensors mounted on the robot must be able to recognize the surrounding environment and estimate the position to within several centimeters. Building a precise map with the SLAM technique may be essential for accurate position estimation.

Sensors commonly used in the SLAM technique include cameras, laser distance sensors (LiDAR), gyro sensors, and encoders. A laser distance sensor can produce a relatively accurate, high-resolution depth map. However, spatial recognition with a laser distance sensor may be difficult in large spaces beyond the sensor's measurement range (airports, shopping malls, large halls, and the like), in spaces with many obstacles the sensor cannot detect (glass, mirrors), and in environments where many people are moving; the sensor may also be expensive.

A SLAM technique that uses a camera is called a Visual SLAM technique. The Visual SLAM technique is cheaper than using a laser distance sensor and can be used under a wider range of conditions (various weather and situations). However, pose determination and the constructed map are less accurate, so almost none of the necessary information can be provided for scenes that contain many regions of unknown depth.

The calculation device according to an embodiment may generate a high-quality, highly reliable map by combining deep-learning-based techniques with the Visual SLAM technique while retaining its advantages. Hereinafter, the depth map calculation method and the overall operation of the calculation device are described in detail with reference to FIGS. 2 and 3, the method of calculating the global sparse depth map with reference to FIGS. 4 to 6, the method of calculating the local dense depth map with reference to FIGS. 7 to 9, the method of masking a static object with reference to FIG. 10, the method of generating the global dense depth map with reference to FIGS. 11 to 13, and the block diagram of the calculation device with reference to FIG. 14.

도 2는 일 실시예에 따른 깊이 맵 생성 방법을 설명하기 위한 순서도이다.2 is a flowchart illustrating a method of generating a depth map according to an embodiment.

도 2를 참조하면, 일 실시예에 따른 깊이 맵 생성 방법은 도 1을 참조하여 전술된 산출 장치에 의해 수행된다. 산출 장치는 하나 또는 그 이상의 하드웨어 모듈, 하나 또는 그 이상의 소프트웨어 모듈, 또는 이들의 다양한 조합에 의하여 구현될 수 있다.Referring to FIG. 2, a method of generating a depth map according to an embodiment is performed by the calculation device described above with reference to FIG. 1. The computing device may be implemented by one or more hardware modules, one or more software modules, or various combinations thereof.

단계(210)에서, 산출 장치는 현재 프레임을 포함하는 복수의 프레임들을 이용하여 현재 프레임에 대응하는 글로벌 스파스 깊이 맵(global sparse depth map)을 산출한다. 산출 장치는 복수의 프레임들을 포함하는 입력 영상을 획득할 수 있다. 프레임은 산출 장치에 입력되는 입력 영상의 단위일 수 있고, 프레임은 픽셀 포인트들의 집합일 수 있다. 깊이 맵 상의 픽셀 포인트를 맵 포인트라 지칭할 수 있다.In step 210, the calculation device calculates a global sparse depth map corresponding to the current frame using a plurality of frames including the current frame. The calculation device may acquire an input image including a plurality of frames. The frame may be a unit of an input image input to the calculation device, and the frame may be a set of pixel points. Pixel points on the depth map may be referred to as map points.

입력 영상은 예를 들어, 실시간 영상(live image) 또는 동영상(moving picture)일 수 있다. 또는 입력 영상은 모노 영상일 수도 있고, 스테레오 영상일 수도 있다. 입력 영상은 산출 장치에 포함된 카메라를 통해 캡쳐된 것일 수도 있고, 산출 장치의 외부로부터 획득된 것일 수도 있다.The input image may be, for example, a live image or a moving picture. Alternatively, the input image may be a mono image or a stereo image. The input image may be captured through a camera included in the calculation device, or may be obtained from the outside of the calculation device.

산출 장치는 입력 영상에 포함된 복수의 프레임들의 특징점(feature point) 매칭을 통해 카메라의 포즈 정보를 추적하고 이들 특징점들에 대한 깊이 맵을 생성할 수 있다. 특징점은 프레임의 대표성을 갖는 픽셀 포인트를 의미할 수 있다.The calculation device may track pose information of the camera through feature point matching of a plurality of frames included in the input image and generate a depth map for these feature points. The feature point may mean a pixel point having representativeness of the frame.

산출 장치는 현재 프레임에 대응하는 글로벌 스파스 깊이 맵을 산출할 수 있다. 깊이 맵은 프레임에 존재하는 픽셀들 사이의 상대적인 거리를 특정 방법으로(예를 들어, 그레이 스케일(gray scale))로 구분하여 나타낸 이미지일 수 있다.The calculating device may calculate a global sparse depth map corresponding to the current frame. The depth map may be an image represented by dividing the relative distances between pixels present in a frame in a specific method (eg, gray scale).

The global sparse depth map may be understood as the counterpart of the local dense depth map described later. Regarding the concepts of 'global' and 'local', a depth map that includes coordinate information in three-dimensional space (hereinafter, 'global coordinates') corresponding to the pixel points of a frame may be referred to as a 'global depth map', and a depth map that includes only the depth information corresponding to the pixel points may be referred to as a 'local depth map'.

Regarding the concepts of 'sparse' and 'dense', a depth map that includes depth information for at least a predetermined ratio of the pixel points (for example, all pixel points included in a frame) may be referred to as a 'dense depth map', and a depth map that includes depth information for less than the predetermined ratio of the pixel points may be referred to as a 'sparse depth map'. In summary, the global sparse depth map may be a depth map that includes global coordinate information for less than the predetermined ratio of the pixel points, and the local dense depth map may be a depth map that includes only depth information for at least the predetermined ratio of the pixel points.

The calculation device may calculate depth information corresponding to the current frame and may estimate pose information of the camera corresponding to the current frame. Furthermore, the calculation device may calculate the global sparse depth map by calculating three-dimensional coordinates of the pixel points based on the depth information and the pose information of the camera.

For example, the calculation device may acquire the input image using a stereo camera, in which case a frame may include a left-eye image corresponding to the left-eye lens and a right-eye image corresponding to the right-eye lens. The calculation device may calculate the depth information corresponding to the current frame by applying a geometric stereo matching strategy to the left-eye image and the right-eye image. The geometric stereo matching technique is described in detail with reference to FIG. 6.

Having calculated the depth information corresponding to the current frame, the calculation device may estimate pose information of the camera corresponding to the current frame. The pose information of the camera may include at least one of rotation information and translation information that change as the camera moves from a first position to a second position. For example, the pose information may include rotation information (R) and translation information (T) of the camera. Alternatively, the pose information may be a six-degree-of-freedom (6 DoF) camera pose including X (horizontal), Y (vertical), and Z (depth) corresponding to the position of the camera and/or pitch, yaw, and roll corresponding to the orientation of the camera.

The calculation device may, for example, use correlations between pixels in a series of consecutive frames to estimate pose information of the camera that captured the input image, including translation information (for example, the position of the camera) and rotation information (for example, the orientation of the camera).

The calculation device may calculate the three-dimensional coordinates of a pixel point based on the depth information and the pose information of the camera. The global coordinates of a pixel point may be determined based on the product of the depth information corresponding to that pixel point and the pose information of the camera corresponding to that pixel point. The calculation of the global sparse depth map is described in detail with reference to FIGS. 4 to 6.
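As an illustration of how depth information and camera pose information can be combined into global coordinates, the following is a minimal sketch. It assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy` and a camera-to-world pose given as a rotation matrix `R` and translation vector `T`; these names, the function names, and the pose convention are assumptions for illustration and are not specified in the description.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with a known depth -> 3D point in the camera frame.

    Assumes a pinhole model; fx, fy, cx, cy are hypothetical intrinsics.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_global(point_cam, R, T):
    """Apply an assumed camera-to-world pose: X_world = R * X_cam + T."""
    return tuple(
        sum(R[i][j] * point_cam[j] for j in range(3)) + T[i]
        for i in range(3)
    )


# Example: identity rotation, camera shifted by 1 unit along X.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [1.0, 0.0, 0.0]
p_cam = backproject(u=320.0, v=240.0, depth=2.0,
                    fx=500.0, fy=500.0, cx=320.0, cy=240.0)
p_world = to_global(p_cam, R, T)
print(p_world)
```

In this sketch the pixel at the principal point with depth 2 lies on the optical axis, so only the translation moves it in the global frame.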

In step 220, the calculation device may calculate a local dense depth map corresponding to the current frame using the current frame. As described above, the local dense depth map may be a depth map that includes only depth information for at least the predetermined ratio of the pixel points.

The calculation device may calculate a high-quality, highly accurate local dense depth map by taking into account prior knowledge about the image, such as semantic information and global contextual information. The calculation of the local dense depth map is described in detail with reference to FIGS. 7 to 9.

The global sparse depth map includes depth information for fewer pixel points than the local dense depth map, but because its depth information is calculated from strict geometric relationships, it may be more accurate than that of the local dense depth map. Accordingly, the calculation device may integrate the global sparse depth map and the local dense depth map to obtain the advantages of both.

However, depth information calculated by a Visual SLAM technique may be less accurate than depth information calculated using a laser distance sensor. Because the depth information is less accurate, the camera pose information estimated from it may also be less accurate. More specifically, when depth information is calculated using a Visual SLAM technique, it may be calculated under the assumption that objects do not move. Therefore, when a pixel point belongs to a dynamic (non-static) object region, the accuracy of the depth information corresponding to that pixel point may be low.

To address this problem, in step 230 the calculation device extracts a dynamic object region by masking a static object region in the current frame, and in step 240 the calculation device removes the dynamic object region from the global sparse depth map.

The calculation device may classify dynamic objects and static objects in a frame using an artificial neural network, and may extract only the dynamic object region by masking the static object region. Because depth information calculated from pixel points in the dynamic object region is less accurate, the calculation device may remove the dynamic object region from the global sparse depth map. The masking of the static object region is described in detail with reference to FIG. 10.
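The removal of the dynamic object region from the global sparse depth map can be sketched as a simple filtering step. The sketch below assumes the sparse map is stored as a dictionary keyed by pixel position and that the neural network has already produced a boolean mask marking dynamic pixels; both data layouts and the function name are hypothetical and chosen only for illustration.

```python
def remove_dynamic_points(sparse_map, dynamic_mask):
    """Keep only map points whose pixel is not flagged as dynamic.

    sparse_map:   dict mapping (row, col) -> (X, Y, Z) global coordinates
                  (hypothetical storage layout)
    dynamic_mask: 2D list of bools, True where the pixel belongs to a
                  dynamic object region (assumed output of the masking step)
    """
    return {
        (r, c): xyz
        for (r, c), xyz in sparse_map.items()
        if not dynamic_mask[r][c]
    }


# Example: a 2x3 frame with one map point on a dynamic pixel.
sparse_map = {(0, 0): (0.1, 0.2, 3.0), (1, 2): (1.5, 0.4, 7.0)}
dynamic_mask = [
    [False, False, False],
    [False, False, True],
]
filtered = remove_dynamic_points(sparse_map, dynamic_mask)
print(filtered)
```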

In step 250, the calculation device generates a global dense depth map corresponding to the current frame by integrating the global sparse depth map, from which the dynamic object region has been removed, with the local dense depth map.

The global sparse depth map from which the dynamic object region has been removed includes highly accurate depth information, but the pixel points that have depth information may be too sparse. The local dense depth map, on the other hand, has depth information for many pixel points, but that depth information may be less accurate. The calculation device may integrate the global sparse depth map, from which the dynamic object region has been removed, with the local dense depth map to generate a global dense depth map that includes highly accurate depth information for a large number of pixel points. The generation of the global dense depth map is described in detail with reference to FIGS. 11 to 13.

FIG. 3 is a diagram for describing the operation of a calculation device according to an embodiment.

Referring to FIG. 3, a calculation device according to an embodiment may include a camera 310, a SLAM module 320, an artificial neural network module 330, and a depth map integration module 340. In the embodiment of FIG. 3, these components are shown separately in order to describe each function distinctly. In an actual product, therefore, all of them may be configured to be processed by a processor, or only some of them may be configured to be processed by the processor.

The camera 310 may be a stereo camera. An image captured by the camera 310 may include a plurality of frames. When a stereo camera is used, a left-eye image corresponding to the left-eye lens and a right-eye image corresponding to the right-eye lens may be obtained.

The SLAM module 320 may include a pose estimation module 321 that performs the simultaneous localization and a global sparse depth map calculation module 323 that performs the mapping. The pose estimation module 321 may receive the plurality of frames acquired by the camera 310 and apply the geometric stereo matching technique to the plurality of frames to calculate depth information corresponding to the current frame. Furthermore, the pose estimation module 321 may estimate pose information of the camera using correlations between pixels in a series of consecutive frames.

The global sparse depth map calculation module 323 may calculate three-dimensional coordinates of pixel points based on the depth information corresponding to the current frame and the pose information of the camera corresponding to the current frame, both received from the pose estimation module 321.

The artificial neural network module 330 may include a local dense depth map calculation module 331 and a masking module 333. Unlike the pose estimation module 321, the local dense depth map calculation module 331 and the masking module 333 need to receive only the current frame and may not receive frames other than the current frame. The local dense depth map calculation module 331 may receive the current frame and output depth information corresponding to a plurality of pixel points. For example, the local dense depth map calculation module 331 may perform stereo matching on the received left-eye image and right-eye image in consideration of semantic information and global contextual information. The masking module 333 may receive the current frame and extract a dynamic object region by masking a static object region in the current frame.

The depth map integration module 340 may include a dynamic object region removal module 341 and a global dense depth map generation module 343. The dynamic object region removal module 341 may remove the dynamic object region from the global sparse depth map. The global dense depth map generation module 343 may generate a global dense depth map corresponding to the current frame by integrating the global sparse depth map, from which the dynamic object region has been removed, with the local dense depth map.

The pose estimation module 321 and the local dense depth map calculation module 331 perform stereo matching of the left-eye image and the right-eye image. Errors may occur during stereo matching, and these errors accumulate as the operations are repeated, so that the estimated values of the camera pose and the global dense depth map may deviate from the actual values. To reduce these errors, optimization may be performed on the camera pose, the global sparse depth map, and the global dense depth map to obtain more accurate results.

The optimization according to an embodiment may include optimization of the camera pose based on the global dense depth map (hereinafter, 'pose optimization'), optimization of the global sparse depth map based on the camera pose (hereinafter, 'sparse depth map optimization'), and optimization of the global dense depth map based on the global sparse depth map (hereinafter, 'depth map integration optimization').

The map points of the global dense depth map may be divided into two categories. A first map point is a map point that comes from the global sparse depth map, and a second map point is a map point that is not located in the global sparse depth map and is located only in the local dense depth map.
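The division into first and second map points can be sketched with plain set operations. The sketch assumes each depth map is indexed by pixel position; the function name and the representation of the maps as collections of `(row, col)` keys are hypothetical.

```python
def split_map_points(sparse_pixels, dense_pixels):
    """Split the pixels of a global dense depth map into two categories.

    sparse_pixels: pixels that have a point in the global sparse depth map
    dense_pixels:  pixels that have a depth in the local dense depth map
    Returns (first, second):
      first  = points that come from the global sparse depth map
      second = points located only in the local dense depth map
    """
    first = set(sparse_pixels)
    second = set(dense_pixels) - first
    return first, second


# Example: a 2x2 frame where the sparse map covers a single pixel.
sparse_pixels = [(0, 1)]
dense_pixels = [(0, 0), (0, 1), (1, 0), (1, 1)]
first_points, second_points = split_map_points(sparse_pixels, dense_pixels)
print(sorted(first_points), sorted(second_points))
```

Keeping the two categories as disjoint sets makes it straightforward to restrict a later optimization pass to one category only.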

Pose optimization may mean updating the pose information of the camera corresponding to the current frame based on the global dense depth map. In consideration of optimization efficiency, the pose estimation module 321 may perform pose optimization based only on the second map points. That is, the pose estimation module 321 may perform pose optimization using the map points located only in the local dense depth map.

Sparse depth map optimization may mean updating the global sparse depth map based on the pose information for which pose optimization has been completed. In consideration of optimization efficiency, the global sparse depth map calculation module 323 may perform sparse depth map optimization based only on the first map points.

When pose optimization and sparse depth map optimization are completed, the global dense depth map generation module 343 may perform depth map integration optimization. Depth map integration optimization may mean generating an updated global dense depth map using the map integration method each time pose optimization and sparse depth map optimization are completed. The updated global dense depth map may in turn be used for pose optimization.

FIG. 4 is a flowchart illustrating a method of calculating a global sparse depth map according to an embodiment.

Referring to FIG. 4, steps 410 to 440 according to an embodiment are performed by the calculation device described above with reference to FIGS. 1 to 3.

In step 410, the calculation device may calculate first depth information corresponding to a key frame that precedes the current frame among the plurality of frames. The first depth information may be calculated by performing stereo matching on the key frame based on multi-view geometry theory. The first depth information may include depth information corresponding to the feature points included in the key frame.

If camera pose estimation (feature point matching) and depth map calculation (or updating) were performed for every image frame, processing would be slow, and the slow processing would in turn make it difficult to use a precise algorithm. Accordingly, the calculation device according to an embodiment may perform depth map calculation only for scenes with large changes or at predetermined time intervals, and may store only the changed information for the scenes in between. A frame that serves as the basis for depth map calculation is called a key frame, and key frames may be placed, for example, at intervals of about 2 seconds. Frames that are not selected as key frames are referred to as general frames to distinguish them from key frames.

As one example of a method of selecting key frames, a frame is newly added to the depth map as a key frame only when it satisfies all of the following three conditions. i) The quality of the camera pose information estimated for the frame must be good. For example, when the ratio of matched feature points is greater than or equal to a specific threshold, the quality of the estimated camera pose information may be judged to be good. ii) There must be a predetermined time difference from the key frame most recently added to the depth map. For example, to be selected as a key frame, the frame must be at least 20 frames apart from the key frame most recently added to the depth map. iii) The shortest distance between the camera and the existing depth map must be less than or equal to a specific threshold. Condition iii) prevents the existing depth map from being corrupted, which may happen when an image taken from a viewpoint too far from the existing depth map is added to it. The above method of selecting key frames is only an example, and the selection is not limited to this method.
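The three example conditions above can be sketched as a single predicate. Only the 20-frame gap comes from the description; the default values `min_match_ratio=0.5` and `max_distance=1.0`, the function name, and the parameter names are placeholders assumed for illustration.

```python
def is_key_frame(match_ratio, frames_since_last_key, distance_to_map,
                 min_match_ratio=0.5, min_frame_gap=20, max_distance=1.0):
    """Return True only if all three example conditions hold.

    match_ratio:           ratio of matched feature points (condition i)
    frames_since_last_key: frames elapsed since the latest key frame (condition ii)
    distance_to_map:       shortest camera-to-map distance (condition iii)
    The thresholds other than min_frame_gap are assumed placeholder values.
    """
    good_pose = match_ratio >= min_match_ratio
    far_enough_in_time = frames_since_last_key >= min_frame_gap
    close_to_map = distance_to_map <= max_distance
    return good_pose and far_enough_in_time and close_to_map


print(is_key_frame(0.8, 25, 0.3))   # all conditions satisfied
print(is_key_frame(0.8, 10, 0.3))   # too close in time to the last key frame
```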

The scope is not limited to the disclosed embodiments. In step 420, the calculation device may calculate second depth information corresponding to the current frame. The second depth information may include depth information corresponding to the feature points included in the current frame. The method of performing stereo matching based on multi-view geometry theory is described in detail with reference to FIG. 6.

In step 430, the calculation device may estimate pose information of the camera corresponding to the current frame based on the first depth information and the second depth information. When the current frame is acquired, the calculation device may match the feature points of the current frame with the feature points of the key frame. As the matching similarity measure, for example, the sum of squared differences (SSD) between 8x8 image patches may be used. When matching is completed, the pose of the camera may be determined so as to minimize the reprojection error over the matched pairs that were found. The reprojection error may mean the error between the position at which a feature point of the matched key frame is projected onto the current frame and the position actually observed in the current frame. However, the method of estimating the pose of the camera is not limited to the above example, and various estimation methods may be applied.
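The two quantities mentioned in step 430 can be sketched as follows. The sketch assumes image patches are given as nested lists of intensities, a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and a candidate pose `(R, T)` that maps key-frame camera coordinates into current-frame camera coordinates. It only evaluates the error for one candidate pose and does not perform the minimization itself.

```python
import math


def ssd(patch_a, patch_b):
    """Sum of squared differences between two equally sized patches."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(patch_a, patch_b)
        for a, b in zip(row_a, row_b)
    )


def reprojection_error(point_key, R, T, observed_uv, fx, fy, cx, cy):
    """Pixel distance between a projected key-frame point and its observation.

    point_key:   3D feature point in key-frame camera coordinates
    R, T:        assumed candidate pose, key-frame camera -> current camera
    observed_uv: position of the matched feature in the current frame
    """
    x, y, z = (
        sum(R[i][j] * point_key[j] for j in range(3)) + T[i]
        for i in range(3)
    )
    u = fx * x / z + cx
    v = fy * y / z + cy
    return math.hypot(u - observed_uv[0], v - observed_uv[1])


# Example: two 8x8 patches that differ by 1 at every pixel.
patch_zero = [[0] * 8 for _ in range(8)]
patch_one = [[1] * 8 for _ in range(8)]
patch_score = ssd(patch_zero, patch_one)

# Example: a point on the optical axis, observed 3 px right and 4 px down.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
err = reprojection_error((0.0, 0.0, 2.0), identity, [0.0, 0.0, 0.0],
                         (323.0, 244.0), 500.0, 500.0, 320.0, 240.0)
print(patch_score, err)
```

A pose estimator would search over `R` and `T` to minimize the sum of such errors across all matched pairs; that search is outside the scope of this sketch.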

In step 440, the calculation device may calculate the global sparse depth map based on the second depth information and the pose information of the camera. The calculation device may calculate global coordinates corresponding to the current frame based on the second depth information and the pose information of the camera, and the global coordinates may be added to the global sparse depth map.

FIG. 5 is a diagram for describing a method of calculating a global sparse depth map.

Referring to FIG. 5, the global sparse depth map calculation operation according to an embodiment may be broadly divided into a step of estimating pose information of the camera and a step of calculating the global sparse depth map. The calculation device may apply the pose estimation step, which has a relatively small computational load, to every frame to pursue real-time performance, and may apply the depth map calculation step only to key frames, using a precise algorithm to pursue accuracy even if it takes longer.

The calculation device may calculate first depth information 513 corresponding to a key frame 510 that precedes the current frame 520 among the plurality of frames. The calculation device may acquire the input image using a stereo camera, in which case the key frame 510 may include a left-eye image 511 corresponding to the left-eye lens and a right-eye image 512 corresponding to the right-eye lens. The calculation device may calculate the first depth information 513 corresponding to the key frame 510 by applying the geometric stereo matching technique to the left-eye image 511 and the right-eye image 512.

The calculation device may calculate second depth information 523 corresponding to the current frame 520. When the current frame 520 is a general frame, the calculation device may estimate only the pose information of the camera corresponding to the current frame 520. The current frame 520 may instead be a key frame, in which case the calculation device may calculate both the pose information of the camera corresponding to the current frame 520 and the global sparse depth map.

The calculation device may acquire the input image using a stereo camera, in which case the current frame 520 may include a left-eye image 521 corresponding to the left-eye lens and a right-eye image 522 corresponding to the right-eye lens. The calculation device may calculate the second depth information 523 corresponding to the current frame 520 by applying the geometric stereo matching technique to the left-eye image 521 and the right-eye image 522.

The calculation device may estimate the pose information 530 of the camera corresponding to the current frame based on the first depth information 513 corresponding to the key frame 510 and the second depth information 523 corresponding to the current frame 520.

The calculation device may calculate the global sparse depth map 540 based on the second depth information 523 and the pose information 530 of the camera. When the current frame 520 is a key frame, the calculation device may calculate global coordinates corresponding to the current frame 520 based on the second depth information 523 and the pose information 530 of the camera, and the global coordinates may be added to the global sparse depth map 540.

FIG. 6 is a diagram for describing a geometric stereo matching technique according to an embodiment.

Referring to FIG. 6, the calculation device according to an embodiment may calculate depth information corresponding to the pixel points of a frame based on geometric stereo matching.

The camera according to an embodiment may be a stereo camera including a left-eye lens 601 and a right-eye lens 602. The left-eye lens 601 may capture a left-eye image, and the right-eye lens 602 may capture a right-eye image. A frame may include the left-eye image and the right-eye image.

The left-eye image may include a pixel point 603, a pixel point 604, and a pixel point 605. The right-eye image may include a pixel point 606, a pixel point 607, and a pixel point 608. The pixel point 603 of the left-eye image corresponds to the pixel point 606 of the right-eye image, and the pixel point 603 and the pixel point 606 may be matched to a map point 609. The pixel point 604 of the left-eye image corresponds to the pixel point 607 of the right-eye image, and the pixel point 604 and the pixel point 607 may be matched to a map point 610. Likewise, the pixel point 605 of the left-eye image corresponds to the pixel point 608 of the right-eye image, and the pixel point 605 and the pixel point 608 may be matched to a map point 611. A pair of pixel points in the left-eye image and the right-eye image that correspond to the same point may be referred to as matching pixel points.

The positions of the left-eye lens 601 and the right-eye lens 602 are fixed, and the distance between the left-eye lens 601 and the right-eye lens 602, the distance between the left-eye lens 601 and the left-eye image, and the distance between the right-eye lens 602 and the right-eye image may be known in advance.

The depth information of a map point may be calculated based on this known distance information. Specifically, the map point 609 may be obtained by extending the line connecting the left-eye lens 601 and the pixel point 603 and the line connecting the right-eye lens 602 and the pixel point 606 until they intersect. The map point 610 and the map point 611 may be obtained in a similar manner, and the map point 609, the map point 610, and the map point 611 may be located in the global sparse depth map.
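For the common special case of a rectified stereo pair, the ray intersection described above reduces to the standard relation Z = f * B / d, where f is the focal length in pixels, B is the baseline between the lenses, and d is the horizontal disparity between the matching pixel points. The following sketch assumes rectified images and a shared pinhole model; the rectification assumption and all parameter names are introduced for illustration and are not stated in the description.

```python
def triangulate_rectified(u_left, u_right, v, focal_px, baseline, cx, cy):
    """Recover a 3D point in the left camera frame from a matching pixel pair.

    Assumes rectified images: the match lies on the same row v, and the
    disparity d = u_left - u_right is positive for points in front of the camera.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a valid match")
    z = focal_px * baseline / disparity      # depth along the optical axis
    x = (u_left - cx) * z / focal_px         # lateral offset
    y = (v - cy) * z / focal_px              # vertical offset
    return (x, y, z)


# Example: 10 px disparity, 500 px focal length, 0.5 unit baseline.
point = triangulate_rectified(u_left=330.0, u_right=320.0, v=240.0,
                              focal_px=500.0, baseline=0.5,
                              cx=320.0, cy=240.0)
print(point)
```

Because depth is inversely proportional to disparity, small matching errors on distant points produce large depth errors, which is one reason the accumulated stereo matching error is later reduced through optimization.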

FIGS. 7 and 8 are diagrams for describing a method of calculating a local dense depth map according to an embodiment.

Referring to FIG. 7, steps 710 to 750 according to an embodiment are performed by the calculation device described above with reference to FIGS. 1 to 3.

Referring to FIG. 7, in step 710, the calculation device may input the right-eye image and the left-eye image included in the current frame to a feature extraction module to calculate a right feature map corresponding to the right-eye image and a left feature map corresponding to the left-eye image.

Referring to FIG. 8, the feature extraction module 810 according to an embodiment may include a first feature extraction module 811 to which the left-eye image is input and a second feature extraction module 812 to which the right-eye image is input. The first feature extraction module 811 may be a left convolutional neural network, the second feature extraction module 812 may be a right convolutional neural network, and the left and right convolutional neural networks may share weights.

The first feature extraction module 811 and the second feature extraction module 812 may each be configured as an artificial neural network that includes a two-dimensional convolutional neural network and a pre-trained spatial pyramid pooling network.

Referring to FIG. 7, in step 720, the calculation device may obtain initial matching cost data for the matching pixels between the left-eye image and the right-eye image based on the right feature map and the left feature map. The matching cost data may also be referred to as a matching cost volume or a matching cost matrix.

Referring to FIG. 8, the calculation device may obtain the initial matching cost data by concatenating the right feature map and the left feature map using the initial cost volume module 820. For example, assume that the left and right feature maps both have dimensions m*n*c (where m, n, and c are natural numbers). If the disparity is 0, the left and right feature maps may be concatenated directly to obtain a matrix of size m*n*2c. If the disparity is d (where d is a natural number), the feature map may be shifted by d columns along a predetermined x-axis direction and the left feature map and the right feature map may then be concatenated, so that d matrices of size m*n*2c are obtained.
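The concatenation described above can be sketched as follows. This is an illustration only, using nested lists in place of tensors. It assumes that the right feature map is the one shifted, that the shift moves features toward larger column indices, and that positions left empty by the shift are filled with zeros; the description fixes none of these choices. The function name is hypothetical.

```python
def build_cost_volume(left, right, max_disparity):
    """left, right: feature maps as nested lists of shape [m][n][c].
    Returns max_disparity matrices, each of shape [m][n][2c]."""
    m, n, c = len(left), len(left[0]), len(left[0][0])
    volume = []
    for d in range(max_disparity):
        layer = []
        for i in range(m):
            row = []
            for j in range(n):
                # Right features shifted by d columns; zeros where the shift
                # runs off the edge of the feature map (assumed padding).
                shifted = right[i][j - d] if j - d >= 0 else [0.0] * c
                row.append(list(left[i][j]) + list(shifted))
            layer.append(row)
        volume.append(layer)
    return volume
```

Each entry of the result pairs the left features at a pixel with the right features at the candidate match for one disparity, which is the input that the subsequent network scores.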

Referring to FIG. 7, in step 730, the calculation device may input the initial matching cost data to an artificial neural network to predict matching cost data.

Referring to FIG. 8, the calculation device may input the initial matching cost data to an hourglass artificial neural network 830 to predict the matching cost data.

Referring to FIG. 7, in step 740, the calculation device may calculate depth information of each of the matching pixels based on the matching cost data.

Referring to FIG. 8, the calculation device may perform a spatial convolution operation on the matching cost data using a spatial convolutional neural network (SCNN) 840 to obtain convolved matching cost data.

To keep the disparity change between adjacent pixel points from being abrupt, the calculation device may perform the spatial convolution operation on the matching cost data using a convolutional neural network (for example, the spatial convolutional neural network). In this way, the calculation device may increase the spatial continuity of the disparity between adjacent pixel points and may remove a series of noise points. A detailed method of performing the spatial convolution operation is described with reference to FIG. 9.

Using the regression module 850, the calculation device may process the matching cost data with a regression function that includes a softmax function, thereby obtaining, for each pair of matching pixel points, the possible disparity values and the corresponding probability distribution.

The calculation device may calculate the depth information corresponding to each pair of matching pixel points based on the possible disparity values and the corresponding probability distribution. Specifically, for each pair of matching pixel points, the calculation device may calculate the disparity as a cumulative value (for example, an expected value). For example, the calculation device may multiply each candidate disparity of the matching pixel points by its corresponding probability and sum the products to obtain the disparity corresponding to the matching pixel points.

The calculation device may calculate the local dense depth map based on the depth information of the matching pixel points.
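The expected-value computation described above can be sketched in a few lines. The sketch assumes that a lower matching cost indicates a better match, so the softmax is applied to negated costs; the description states only that a softmax-based regression function is used. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_disparity(costs):
    """costs[d] is the matching cost of candidate disparity d for one pair of
    matching pixel points (assumption: lower cost means a better match)."""
    probs = softmax([-c for c in costs])
    # Multiply each candidate disparity by its probability and sum the products.
    return sum(d * p for d, p in enumerate(probs))
```

Because the result is a probability-weighted sum rather than the index of the single best candidate, it can take sub-pixel values and varies smoothly with the costs.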

FIG. 9 is a diagram for describing a method of performing a spatial convolution operation according to an embodiment.

Referring to FIG. 9, the calculation device may divide the matching cost data along a set direction to obtain a plurality of matching cost layers, and may perform a convolution operation on each of the matching cost layers in sequence along that direction. When performing the convolution operation on a given matching cost layer, the calculation device may first accumulate onto that layer the convolution result of the preceding matching cost layer and then perform the convolution operation.

For example, using a plurality of first planes perpendicular to the height direction of the matching cost data, the calculation device may divide the matching cost data into a plurality of first matching cost layers arranged in sequence along a first direction parallel to the height direction (for example, from top to bottom).

Referring to Equation 1, the calculation device may use a predetermined convolution kernel to update each first matching cost layer along the first direction, starting from the second of the first matching cost layers.

Here, i, j, and k may denote the height, width, and feature channel of the matching cost data, K may denote the convolution function, X may denote a matching cost layer, and X' may denote the updated matching cost layer.

The updated value of a first matching cost layer may be the sum of the current value of that layer and the result of applying the convolution operation to the preceding matching cost layer.

The calculation device may divide the first convolution matching cost data into a plurality of second matching cost layers arranged in sequence along a second direction opposite to the first direction (for example, from bottom to top), and may perform the convolution operation on the second matching cost layers based on Equation 1.

In addition, using a plurality of second planes perpendicular to the length direction or width direction of the matching cost data, the calculation device may divide the second convolution matching cost data into a plurality of third matching cost layers arranged in sequence along a third direction perpendicular to the second planes (for example, from left to right), and may perform the convolution operation on the third matching cost layers based on Equation 1.

In addition, the calculation device may divide the third convolution matching cost data into a plurality of fourth matching cost layers arranged in sequence along a fourth direction opposite to the third direction (for example, from right to left), and may perform the convolution operation on the fourth matching cost layers based on Equation 1.

The calculation device may perform spatial filtering (that is, matching cost accumulation) in two dimensions of the matching cost data: height and width (or length). Specifically, the spatial filtering may be divided into four passes along four directions: top to bottom, bottom to top, left to right, and right to left. The four passes are computed in a similar manner.

The calculation device may perform both the top-to-bottom pass and the bottom-to-top pass (and likewise both the left-to-right pass and the right-to-left pass) so that the costs corresponding to the respective directions become comparable. By updating the matching cost layers, the calculation device may increase the continuity of the matching cost in each direction.
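The four directional passes can be illustrated with a simplified sketch. It is not a reproduction of Equation 1: it assumes a single-channel cost slice stored as a two-dimensional list, an odd-length one-dimensional kernel with zero padding, and no nonlinearity. It keeps only the stated update rule, in which each layer after the first is replaced by its current value plus the convolution of the preceding updated layer. All names are hypothetical.

```python
def conv1d_same(row, kernel):
    """1-D convolution with zero padding; the output has the same length as row."""
    half = len(kernel) // 2
    out = []
    for j in range(len(row)):
        acc = 0.0
        for t, w in enumerate(kernel):
            src = j + t - half
            if 0 <= src < len(row):
                acc += w * row[src]
        out.append(acc)
    return out


def sweep(layers, kernel):
    """Update each layer from the second onward: X'[i] = X[i] + K(X'[i-1])."""
    updated = [list(layers[0])]
    for i in range(1, len(layers)):
        message = conv1d_same(updated[i - 1], kernel)
        updated.append([x + m for x, m in zip(layers[i], message)])
    return updated


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def spatial_filter(cost, kernel):
    """Run the sweep in four directions, feeding each result to the next pass."""
    cost = sweep(cost, kernel)                                    # top to bottom
    cost = sweep(cost[::-1], kernel)[::-1]                        # bottom to top
    cost = transpose(sweep(transpose(cost), kernel))              # left to right
    cost = transpose(sweep(transpose(cost)[::-1], kernel)[::-1])  # right to left
    return cost
```

Reversing or transposing the data lets one `sweep` routine serve all four directions, which mirrors the statement that the four passes are computed in a similar manner.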

In general, an end-to-end training method is used to predict depth information for pixel points. If the boundary of an object is processed by an existing method without performing the spatial convolution operation (spatial filtering) according to an embodiment, the boundary may appear broken. After the spatial convolution operation, however, a reduction of the matching cost data or a broken boundary may not occur.

FIG. 10 is a diagram for describing a method of masking a static object according to an embodiment.

Referring to FIG. 10, the calculation device according to an embodiment may input the current frame to a feature extraction module to calculate a feature map corresponding to the current frame. The feature extraction module may be the feature extraction module 810 described with reference to FIG. 8. The feature extraction module may include a first feature extraction module 1010 to which the left-eye image is input and a second feature extraction module 1020 to which the right-eye image is input. The first feature extraction module 1010 may be a left convolutional neural network, the second feature extraction module 1020 may be a right convolutional neural network, and the left and right convolutional neural networks may share weights.

The calculation device may input each of the current frame and the frame preceding the current frame (hereinafter, the previous frame) to the feature extraction module. For example, the previous frame may include a first left-eye image and a first right-eye image, and the current frame may include a second left-eye image and a second right-eye image.

Using an artificial neural network for optical flow detection (hereinafter, the first artificial neural network) 1030, the calculation device may predict optical flow information between the current frame and the previous frame based on the feature map of the current frame and the feature map of the previous frame. The optical flow information may refer to the apparent motion pattern of image objects between two frames.

Using an artificial neural network for motion detection (hereinafter, the second artificial neural network) 1040, the calculation device may obtain a static object bounding box 1050 corresponding to the feature map of the current frame. The second artificial neural network may predict object category attribute information based on the static object bounding box 1050. The second artificial neural network may classify the objects included in the current frame into a plurality of categories. For example, the calculation device may classify the objects as cars, people, animals, traffic signs, desks and chairs, roads, and the like.

The calculation device may input the static object bounding box 1050 and the optical flow information to an artificial neural network for dynamic state prediction (hereinafter, the third artificial neural network) 1060 to mask the static object region (1070). The static object mask may be a binary image and may indicate state information of the objects included in the current frame. The state information of an object may include information on whether the object has a dynamic attribute or a static attribute. The calculation device may extract the dynamic object region by masking the static object region (1070).

FIG. 11 is a flowchart illustrating a method of generating a global dense depth map according to an embodiment.

Referring to FIG. 11, steps 1110 to 1130 according to an embodiment are performed by the calculation device described above with reference to FIGS. 1 to 10.

In step 1110, the calculation device may divide the local dense depth map into a plurality of grid cells.

In step 1120, the calculation device may update the depth information of the pixel points corresponding to the corner points of the grid cells based on the global sparse depth map from which the dynamic object region has been removed. Because pixel points included in the dynamic object region may be noise points, the calculation device may integrate the local dense depth map with the depth map obtained by removing the dynamic object region from the global sparse depth map.
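Removing the dynamic object region from the global sparse depth map can be sketched as a simple filter. The data layout is an assumption made for illustration: the sparse depth map is a dictionary from pixel coordinates to depth, and the static object mask is a binary image in which 1 marks a static pixel. The function name is hypothetical.

```python
def remove_dynamic_points(sparse_depth, static_mask):
    """sparse_depth: {(row, col): depth} for the global sparse depth map.
    static_mask: binary image as nested lists; 1 = static, 0 = dynamic (assumed)."""
    return {
        (r, c): depth
        for (r, c), depth in sparse_depth.items()
        if static_mask[r][c] == 1  # keep only points lying on static pixels
    }
```

Only the points that survive this filter would then be used when the corner points of the grid cells are updated.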

In step 1130, the calculation device may update the depth information of the pixel points included in the interior of the grid cells based on the global sparse depth map from which the dynamic object region has been removed and the depth information of the pixel points corresponding to the updated corner points.

The calculation device may first calculate the depth information of the pixel points corresponding to the corner points, and may then interpolate the depth information of the remaining pixel points based on the depth information of the pixel points corresponding to the corner points. A detailed method of updating the depth information of the pixel points is described with reference to FIG. 13.

FIG. 12 is a diagram for describing a method of generating a global dense depth map according to an embodiment.

Referring to FIG. 12, the calculation device according to an embodiment may receive a frame (1210). The calculation device may input the current frame to an artificial neural network module to calculate the local dense depth map corresponding to the current frame (1215). The calculation device may input the current frame to an artificial neural network module to mask the static object included in the current frame (1220). The calculation device may calculate the global sparse depth map corresponding to the current frame based on the current frame and a key frame preceding the current frame (1225).

The calculation device may divide the local dense depth map into a plurality of grid cells (1230). The calculation device may remove the dynamic object region from the global sparse depth map (1235).

The calculation device may update the depth information of the pixel points corresponding to the corner points of the grid cells based on the global sparse depth map from which the dynamic object region has been removed (1240).

The calculation device may update the depth information of the pixel points included in the interior of the grid cells based on the global sparse depth map from which the dynamic object region has been removed and the depth information of the pixel points corresponding to the updated corner points (1245), and may thereby generate a global dense depth map with high accuracy and reliability (1250).

FIG. 13 is a diagram for describing a method of updating depth information of pixel points according to an embodiment.

Referring to FIG. 13, when a pixel point according to an embodiment is located at a corner point of the grid cells (1310), the calculation device may determine the depth information d_p of the pixel point in the global dense depth map based on Equation 2.

When a pixel point according to an embodiment is located inside a grid cell (1320), the calculation device may determine the depth information d_p of the pixel point in the global dense depth map based on Equation 3.

Here, d_l,p denotes the depth information of pixel point p in the local dense depth map, and d_g,q denotes the depth information of pixel point q in the global sparse depth map. The pixel point q belongs to the set obtained by removing the pixel points corresponding to dynamic objects from the set of pixel points in the four grid cells adjacent to pixel point p in the local dense depth map. N_p denotes the number of pixel points in the set that have global sparse depth map coordinates, D(p,q) denotes the distance between pixel point p and pixel point q, and d_l,q denotes the depth information of pixel point q in the global dense depth map. In addition, d_l,p' denotes the depth information of pixel point p' in the local dense depth map, q_k denotes a vertex of the grid cell in which pixel point q' is located, d_qk denotes the depth information of q_k in the global sparse depth map, d_l,qk denotes the depth information of vertex q_k in the local dense depth map, and D(p',q_k) denotes the distance between pixel point p' and vertex q_k.
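Equations 2 and 3 are not reproduced in this text, so the following sketch should be read as one hypothetical fusion rule built from the quantities defined above, not as the claimed formulas. It assumes that the local depth at a point is rescaled by an inverse-distance-weighted average of the ratio between a trusted depth and the local depth at nearby anchor points. For a corner point p, the anchors would be the remaining sparse points q; for an interior point p', the anchors would be the updated vertices q_k. All names are illustrative.

```python
import math


def distance(p, q):
    """Euclidean distance D(p, q) between two pixel points given as (row, col)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def fuse_depth(p, local_depth, anchors, eps=1e-6):
    """Hypothetical fusion rule (an assumption, not Equation 2 or 3).
    p: (row, col) of the pixel point to update.
    local_depth: {(row, col): depth} from the local dense depth map.
    anchors: {(row, col): depth}, trusted depths at nearby points."""
    if not anchors:
        return local_depth[p]  # nothing to fuse with: keep the local estimate
    num = 0.0
    den = 0.0
    for q, d_anchor in anchors.items():
        w = 1.0 / (distance(p, q) + eps)      # closer anchors weigh more
        num += w * d_anchor / local_depth[q]  # ratio of trusted to local depth at q
        den += w
    return local_depth[p] * num / den
```

Applying the same routine first to the corner points and then to the interior points follows the two-stage order described for steps 1120 and 1130.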

FIG. 14 is a block diagram of an apparatus for calculating a depth map according to an embodiment.

Referring to FIG. 14, a calculation device 1400 according to an embodiment includes a processor 1410, a memory 1430, and sensor(s) 1470. The calculation device 1400 may further include a communication interface 1450 and/or a display device (not shown). The processor 1410, the memory 1430, the communication interface 1450, the sensor(s) 1470, and the display device may communicate with one another through a communication bus 1405.

The calculation device 1400 may be an electronic device that implements various augmented reality applications in real time, for example, an augmented reality head-up display, augmented reality/virtual reality glasses, an autonomous vehicle, an intelligent vehicle, a smartphone, or a mobile device.

The sensor(s) 1470 may be an image sensor such as a camera. The camera acquires an input image. The camera may be, for example, an RGB camera or an RGB-D (depth) camera. The input image is an image input to the calculation device 1400 and may be, for example, a real-time image or a video. The input image may be a mono image or a stereo image, and may include a plurality of frames. The input image may be captured by the camera or may be obtained from outside the calculation device 1400.

The processor 1410 may calculate a global sparse depth map corresponding to the current frame using a plurality of frames, calculate a local dense depth map corresponding to the current frame using the current frame, extract a dynamic object region by masking a static object region in the current frame, remove the dynamic object region from the global sparse depth map, and generate a global dense depth map corresponding to the current frame by integrating the local dense depth map with the global sparse depth map from which the dynamic object region has been removed.

The processor 1410 may calculate depth information corresponding to one or more pixel points included in the current frame, and may calculate three-dimensional coordinates of the pixel points based on the depth information.
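Computing three-dimensional coordinates from depth information can be sketched with the standard pinhole back-projection. The camera model and the intrinsic parameters fx, fy, cx, and cy are assumptions introduced for illustration; the description does not specify them.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth into camera coordinates,
    assuming a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), all in pixels."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

For instance, a pixel at the principal point maps to a point on the optical axis at the given depth.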

The processor 1410 may update pose information of the camera corresponding to the current frame based on the global dense depth map, and may update the global sparse depth map based on the updated pose information of the camera.

The processor 1410 may calculate first depth information corresponding to a key frame among the plurality of frames, calculate second depth information corresponding to the current frame, estimate the pose information of the camera corresponding to the current frame based on the first depth information and the second depth information, and calculate the global sparse depth map based on the second depth information and the pose information of the camera.

The processor 1410 may perform stereo matching of the right-eye image and the left-eye image included in the current frame.

The processor 1410 may input the current frame, which includes a plurality of pixel points, to an artificial neural network to obtain outputs of the artificial neural network corresponding to the depth information of the plurality of pixel points, and may calculate the local dense depth map based on the outputs.

The processor 1410 may input the current frame to an artificial neural network to obtain outputs of the artificial neural network classified into a static object region and a dynamic object region, and may extract the dynamic object region based on the outputs.

The processor 1410 may divide the local dense depth map into a plurality of grid cells, update the depth information of the pixel points corresponding to the corner points of the grid cells based on the global sparse depth map from which the dynamic object region has been removed, and update the depth information of the pixel points included in the interior of the grid cells based on the global sparse depth map from which the dynamic object region has been removed and the depth information of the pixel points corresponding to the updated corner points.

The processor 1410 may input the right-eye image and the left-eye image included in the current frame to a feature extraction module to calculate a right feature map corresponding to the right-eye image and a left feature map corresponding to the left-eye image, obtain initial matching cost data for the matching pixels between the left-eye image and the right-eye image based on the right feature map and the left feature map, input the initial matching cost data to an artificial neural network to predict matching cost data, calculate depth information of each of the matching pixels based on the matching cost data, and calculate the local dense depth map based on the depth information.

이 밖에도, 프로세서(1410)는 도 1 내지 도 13을 통해 전술한 방법 또는 방법에 대응되는 알고리즘을 수행할 수 있다. 프로세서(1410)는 프로그램을 실행하고, 산출 장치(1400)를 제어할 수 있다. 프로세서(1410)에 의하여 실행되는 프로그램 코드는 메모리(1430)에 저장될 수 있다.In addition, the processor 1410 may perform an algorithm corresponding to the above-described method or method through FIGS. 1 to 13. The processor 1410 may execute a program and control the calculation device 1400. Program code executed by the processor 1410 may be stored in the memory 1430.

The memory 1430 may store an input image and/or a plurality of frames. The memory 1430 may store the camera pose information that the processor 1410 estimates for the input image, the depth map calculated by the processor 1410, and/or a three-dimensional image that the processor 1410 reconstructs using the depth map.

The memory 1430 may also store various information generated during processing by the processor 1410. In addition, the memory 1430 may store various data and programs. The memory 1430 may include volatile memory or nonvolatile memory, and may include a mass storage medium such as a hard disk to store various data.

Depending on the embodiment, the calculation apparatus 1400 may receive, through the communication interface 1450, an input image captured outside the calculation apparatus 1400. In this case, the communication interface 1450 may receive, together with the input image, pose information such as rotation information and movement information of the capturing device that captured the input image, position information of the capturing device, and the like.

The display device may display the three-dimensional image reconstructed using the depth map calculated by the processor 1410.

The embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware components and software components. For example, the apparatuses, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person having ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like.

Although the embodiments have been described with reference to a limited set of drawings, a person having ordinary skill in the art may apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

A depth map calculation method comprising:
calculating a global sparse depth map corresponding to a current frame using a plurality of frames including the current frame;
calculating a local dense depth map corresponding to the current frame using the current frame;
extracting a dynamic object (non-static object) region by masking a static object region in the current frame;
removing the dynamic object region from the global sparse depth map; and
generating a global dense depth map corresponding to the current frame by integrating the global sparse depth map from which the dynamic object region has been removed and the local dense depth map.
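As an illustration of the removal step only, the sketch below clears sparse depth samples that fall inside the dynamic object region. The data layout (a 2D list with `None` for pixels without a sparse depth, and a boolean mask that is `True` on dynamic pixels) and the name `remove_dynamic_region` are assumptions, not part of the claims.

```python
from typing import List, Optional

SparseMap = List[List[Optional[float]]]  # None marks a pixel with no sparse depth
Mask = List[List[bool]]                  # True marks a dynamic-object pixel


def remove_dynamic_region(sparse: SparseMap, dynamic_mask: Mask) -> SparseMap:
    """Return a copy of the sparse depth map with dynamic-object pixels cleared."""
    return [
        [None if is_dynamic else depth for depth, is_dynamic in zip(depth_row, mask_row)]
        for depth_row, mask_row in zip(sparse, dynamic_mask)
    ]
```

For example, with `sparse = [[1.0, None], [2.0, 3.0]]` and a mask that is `True` only at row 1, column 0, the sample `2.0` is cleared and the other entries are unchanged.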

The depth map calculation method of claim 1, wherein calculating the global sparse depth map comprises:
calculating depth information corresponding to one or more pixel points included in the current frame;
estimating pose information of a camera corresponding to the current frame; and
calculating three-dimensional coordinates of the one or more pixel points based on the depth information and the pose information of the camera.
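One common way to obtain three-dimensional coordinates from a depth value and a camera pose is pinhole back-projection followed by a rigid transform. The sketch below assumes that model; the intrinsics `fx`, `fy`, `cx`, `cy`, the camera-to-world convention for the rotation `R` and translation `t`, and the function name are assumptions rather than details taken from the claims.

```python
from typing import Sequence, Tuple


def pixel_to_world(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
    R: Sequence[Sequence[float]], t: Sequence[float],
) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with the given depth and move it into world coordinates."""
    # Point in the camera frame under the assumed pinhole model.
    p_cam = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # Apply the assumed camera-to-world pose: p_world = R * p_cam + t.
    x, y, z = (sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)
```

With an identity rotation, a pixel at the principal point maps to a point straight ahead of the camera at the given depth, offset by `t`.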

The depth map calculation method of claim 1, further comprising:
updating pose information of a camera corresponding to the current frame based on the global dense depth map.

The depth map calculation method of claim 3, further comprising:
updating the global sparse depth map based on the updated pose information of the camera.

The depth map calculation method of claim 1, wherein calculating the global sparse depth map comprises:
calculating first depth information corresponding to a key frame that precedes the current frame among the plurality of frames;
calculating second depth information corresponding to the current frame;
estimating pose information of a camera corresponding to the current frame based on the first depth information and the second depth information; and
calculating the global sparse depth map based on the second depth information and the pose information of the camera.

The depth map calculation method of claim 5, wherein calculating the second depth information comprises:
performing stereo matching of a right-eye image and a left-eye image included in the current frame.

The depth map calculation method of claim 5, wherein the pose information of the camera includes at least one of rotation information and movement information that change as the camera moves from a first position to a second position.

The depth map calculation method of claim 1, wherein calculating the local dense depth map comprises:
obtaining outputs of an artificial neural network corresponding to depth information of a plurality of pixel points by inputting the current frame, which includes the plurality of pixel points, to the artificial neural network; and
calculating the local dense depth map based on the outputs.

The depth map calculation method of claim 1, wherein extracting the dynamic object region comprises:
obtaining outputs of an artificial neural network, classified into a static object region and a dynamic object region, by inputting the current frame to the artificial neural network; and
extracting the dynamic object region based on the outputs.

The depth map calculation method of claim 1, wherein generating the global dense depth map comprises:
dividing the local dense depth map into a plurality of grid cells;
updating depth information of pixel points corresponding to corner points of the grid cells based on the global sparse depth map from which the dynamic object region has been removed; and
updating depth information of pixel points included in inner regions of the grid cells based on the global sparse depth map from which the dynamic object region has been removed and the updated depth information of the pixel points corresponding to the corner points.

The depth map calculation method of claim 1, wherein calculating the local dense depth map comprises:
calculating a right feature map corresponding to a right-eye image and a left feature map corresponding to a left-eye image by inputting the right-eye image and the left-eye image included in the current frame to a feature extraction module;
obtaining initial matching cost data of matching pixels between the left-eye image and the right-eye image based on the right feature map and the left feature map;
predicting matching cost data by inputting the initial matching cost data to an artificial neural network;
calculating depth information of each of the matching pixels based on the matching cost data; and
calculating the local dense depth map based on the depth information of each of the matching pixels.

The depth map calculation method of claim 11, wherein the feature extraction module includes a left convolutional neural network to which the left-eye image is input and a right convolutional neural network to which the right-eye image is input, and
the left convolutional neural network and the right convolutional neural network share weights.

The depth map calculation method of claim 11, wherein obtaining the initial matching cost data comprises:
obtaining the initial matching cost data by concatenating the right feature map and the left feature map.
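A concatenation-based cost volume can be sketched for a single scanline as follows. The claim states only that the two feature maps are concatenated; pairing the left feature at column `x` with the right feature at column `x - d` for each candidate disparity `d`, zero-filling columns with no valid partner, and the name `concat_cost_volume` are assumptions borrowed from common stereo networks.

```python
from typing import List

Feature = List[float]


def concat_cost_volume(left: List[Feature], right: List[Feature], max_disp: int) -> List[List[Feature]]:
    """Build volume[d][x] = left[x] concatenated with right[x - d] for one scanline."""
    width, channels = len(left), len(left[0])
    volume = []
    for d in range(max_disp):
        row = []
        for x in range(width):
            if x - d >= 0:
                # Valid pair: the left pixel x matches the right pixel x - d at disparity d.
                row.append(left[x] + right[x - d])
            else:
                # No right pixel exists for this disparity; fill with zeros.
                row.append([0.0] * (2 * channels))
        volume.append(row)
    return volume
```

Each entry has twice the channel count of a single feature map, so a later network can learn the matching cost from both features rather than from a fixed distance measure.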

The depth map calculation method of claim 11, wherein predicting the matching cost data comprises:
predicting the matching cost data based on an hourglass artificial neural network and the initial matching cost data.

The depth map calculation method of claim 11, wherein calculating the depth information comprises:
performing a spatial convolution operation on the matching cost data using a convolutional neural network;
estimating a disparity of matching pixels between the left-eye image and the right-eye image based on a result of the spatial convolution operation; and
calculating the depth information based on the disparity.
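For a rectified stereo pair, depth is commonly recovered from disparity as depth = focal length × baseline / disparity. The claim does not state a formula, so the sketch below assumes this standard relation; the parameter names `focal_px` (focal length in pixels) and `baseline_m` (distance between the two cameras in meters) are hypothetical.

```python
def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Convert a disparity in pixels to a depth in meters (rectified stereo assumed)."""
    if disparity_px <= 0:
        # Zero or negative disparity has no finite depth under this model.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points than for near ones.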

The depth map calculation method of claim 15, wherein performing the spatial convolution operation comprises:
obtaining a plurality of matching cost layers by dividing the matching cost data along a direction set for the matching cost data; and
performing a convolution operation on each of the plurality of matching cost layers sequentially along the direction.

The depth map calculation method of claim 16, wherein performing the convolution operation sequentially comprises:
when performing a convolution operation on any given matching cost layer, accumulating the convolution result of the matching cost layer preceding the given matching cost layer into the given matching cost layer, and then performing the convolution operation.
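The accumulate-then-convolve order can be sketched with one-dimensional layers. The kernel, the zero padding, the absence of a nonlinearity, and the names `conv1d_same` and `sequential_conv` are assumptions for illustration; as in most neural-network code, the "convolution" here does not flip the kernel.

```python
from typing import List, Optional


def conv1d_same(signal: List[float], kernel: List[float]) -> List[float]:
    """Zero-padded 1D convolution (no kernel flip) that keeps the input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def sequential_conv(layers: List[List[float]], kernel: List[float]) -> List[List[float]]:
    """Convolve layers in order, adding the previous layer's result before each convolution."""
    results: List[List[float]] = []
    previous: Optional[List[float]] = None
    for layer in layers:
        if previous is not None:
            # Accumulate the preceding layer's convolution result into this layer first.
            layer = [a + b for a, b in zip(layer, previous)]
        previous = conv1d_same(layer, kernel)
        results.append(previous)
    return results
```

Passing each result forward in this way lets information from the first layer reach the last one along the chosen direction, which a single independent convolution per layer would not do.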

The depth map calculation method of claim 15, wherein estimating the disparity comprises:
obtaining a disparity probability distribution of matching pixels between the left-eye image and the right-eye image based on the result of the spatial convolution operation and a softmax function; and
estimating the disparity based on the disparity probability distribution.
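One way to turn per-disparity costs into a distribution and then into a single estimate is sketched below for one pixel. The claim names only the softmax function and the distribution; treating lower cost as a better match (softmax over negated costs) and taking the expected value of the distribution as the estimate are assumptions, as are the function names.

```python
import math
from typing import List


def disparity_distribution(costs: List[float]) -> List[float]:
    """Softmax over negated costs: lower matching cost gives higher probability."""
    scores = [-c for c in costs]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_disparity(costs: List[float]) -> float:
    """Estimate disparity as the probability-weighted mean of the candidate disparities."""
    probs = disparity_distribution(costs)
    return sum(d * p for d, p in enumerate(probs))
```

The expected value yields sub-pixel estimates and is differentiable, which is why it is often preferred over a hard arg-min when the network is trained end to end.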

The depth map calculation method of claim 1, wherein extracting the dynamic object region comprises:
calculating a feature map corresponding to the current frame by inputting the current frame to a feature extraction module;
obtaining category attribute information of objects included in the current frame based on the feature map; and
obtaining state information of the objects included in the current frame based on the category attribute information.

The depth map calculation method of claim 19, wherein obtaining the state information comprises:
determining optical flow information between the current frame and a frame previous to the current frame; and
obtaining the state information based on the optical flow information and the category attribute information.
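The claim does not say how optical flow and category attributes are combined, so the following is only one plausible rule, with every specific labeled as an assumption: the `MOVABLE` category list, the threshold, and the premise that the flow vectors are residuals left after camera motion has been compensated are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical list of categories that are able to move at all.
MOVABLE = {"person", "car", "bicycle"}


def object_state(category: str, flows: List[Tuple[float, float]], threshold: float = 1.0) -> str:
    """Label an object 'dynamic' only if it can move and its mean flow magnitude is large.

    flows holds per-pixel (dx, dy) vectors inside the object, assumed to be
    residual flow after compensating for camera motion.
    """
    if category not in MOVABLE or not flows:
        return "static"
    mean_magnitude = sum(math.hypot(dx, dy) for dx, dy in flows) / len(flows)
    return "dynamic" if mean_magnitude > threshold else "static"
```

Under this rule a parked car is treated as static even though its category is movable, so its depth samples would remain usable.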

A computer program stored on a medium, in combination with hardware, to execute the method of any one of claims 1 to 20.

A depth map calculation apparatus comprising:
a camera configured to acquire a plurality of frames including a current frame; and
a processor configured to calculate a global sparse depth map corresponding to the current frame using the plurality of frames, calculate a local dense depth map corresponding to the current frame using the current frame, extract a dynamic object (non-static object) region by masking a static object region in the current frame, remove the dynamic object region from the global sparse depth map, and generate a global dense depth map corresponding to the current frame by integrating the global sparse depth map from which the dynamic object region has been removed and the local dense depth map.

The depth map calculation apparatus of claim 22, wherein the processor calculates depth information corresponding to one or more pixel points included in the current frame, and calculates three-dimensional coordinates of the one or more pixel points based on the depth information.

The depth map calculation apparatus of claim 23, wherein the processor updates pose information of a camera corresponding to the current frame based on the global dense depth map.

The depth map calculation apparatus of claim 24, wherein the processor updates the global sparse depth map based on the updated pose information of the camera.

The depth map calculation apparatus of claim 22, wherein the processor calculates first depth information corresponding to a key frame among the plurality of frames, calculates second depth information corresponding to the current frame, estimates pose information of a camera corresponding to the current frame based on the first depth information and the second depth information, and calculates the global sparse depth map based on the second depth information and the pose information of the camera.

The depth map calculation apparatus of claim 25, wherein the processor performs stereo matching of a right-eye image and a left-eye image included in the current frame.

The depth map calculation apparatus of claim 22, wherein the processor obtains outputs of an artificial neural network corresponding to depth information of a plurality of pixel points by inputting the current frame, which includes the plurality of pixel points, to the artificial neural network, and calculates the local dense depth map based on the outputs.

The depth map calculation apparatus of claim 22, wherein the processor obtains outputs of an artificial neural network, classified into a static object region and a dynamic object region, by inputting the current frame to the artificial neural network, and extracts the dynamic object region based on the outputs.

The depth map calculation apparatus of claim 22, wherein the processor divides the local dense depth map into a plurality of grid cells, updates depth information of pixel points corresponding to corner points of the grid cells based on the global sparse depth map from which the dynamic object region has been removed, and updates depth information of pixel points included in inner regions of the grid cells based on the global sparse depth map from which the dynamic object region has been removed and the updated depth information of the pixel points corresponding to the corner points.

The depth map calculation apparatus of claim 22, wherein the processor calculates a right feature map corresponding to a right-eye image and a left feature map corresponding to a left-eye image by inputting the right-eye image and the left-eye image included in the current frame to a feature extraction module, obtains initial matching cost data of matching pixels between the left-eye image and the right-eye image based on the right feature map and the left feature map, predicts matching cost data by inputting the initial matching cost data to an artificial neural network, calculates depth information of each of the matching pixels based on the matching cost data, and calculates the local dense depth map based on the depth information of each of the matching pixels.