KR20240027395A

KR20240027395A - Method and apparatus for visual positioning based on single image object recognition in the mobile environment

Info

Publication number: KR20240027395A
Application number: KR1020220105560A
Authority: KR
Inventors: 김주영; 김인선; 정태원; 정계동; 유민수; 정치서; 박찬수; 강진규
Original assignee: 주식회사 공간의파티
Priority date: 2022-08-23
Filing date: 2022-08-23
Publication date: 2024-03-04

Abstract

The present invention relates to a visual positioning method performed in a visual positioning device, the method comprising: (a) receiving, from a user terminal, an image resized to a preset size; (b) determining whether a two-dimensional fixed object or a three-dimensional fixed object is present in the image; (c) obtaining location coordinates of the user terminal by applying a deep learning algorithm to the image according to the type of fixed object present in the image; and (d) transmitting the location coordinates to the user terminal.

Description

Visual positioning method and apparatus based on single-image object recognition in a mobile environment {METHOD AND APPARATUS FOR VISUAL POSITIONING BASED ON SINGLE IMAGE OBJECT RECOGNITION IN THE MOBILE ENVIRONMENT}

The present invention relates to indoor positioning technology. More specifically, it relates to a visual positioning method and apparatus that determine a user's position in an indoor space by estimating the position of a camera and the pose of an object from an image.

Demand for mobile location-based services in indoor spaces, where GPS (Global Positioning System) is unavailable, has recently been increasing. However, indoor positioning and navigation still lack a technology comparable to outdoor GNSS (Global Navigation Satellite System) solutions. Such a technology would need to be fully applicable on mobile devices and guarantee real-time performance.

Indoor single-image positioning with a smartphone camera requires no separate infrastructure. Its low cost and the ubiquity of smartphones also give it a large potential market. However, existing methods based on smartphone cameras and image-processing algorithms face several limitations when implemented in indoor spaces.

Some indoor navigation technologies recognize location with machine learning and deep learning techniques instead of sensors. Without continuous updates of the embedded map and construction of an indoor map, however, these methods struggle to maintain the quality of location-based augmented reality services.

Among the object pose estimation approaches usable in a visual positioning system, methods that compute a depth map from RGB images perform well. They are unreliable, however, because depth-estimation cameras cannot measure the depth of outdoor or reflective objects. Operating the sensor also consumes additional battery on the mobile device.

In addition, the screenshot-based QR code method achieves high accuracy, but it determines the user's location only roughly.

Korean Patent Publication No. 10-2018-0106848

An object of the present invention is to provide a visual positioning method and apparatus for a mobile web environment. The method and apparatus determine the user's visual position in a virtual space using object pose estimation from a single image. They then construct an augmented reality interface that provides the user's indoor location information.

The problems to be solved by the present invention are not limited to those mentioned above. Other problems not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above object, a first aspect of the present invention provides a visual positioning method performed in a visual positioning device, comprising: (a) receiving, from a user terminal, an image resized to a preset size; (b) determining whether a two-dimensional fixed object or a three-dimensional fixed object is present in the image; (c) obtaining location coordinates of the user terminal by applying a deep learning algorithm to the image according to the type of fixed object present in the image; and (d) transmitting the location coordinates to the user terminal.

Preferably, step (b) may include:
- obtaining a confidence value for object recognition by applying, to the image, a deep learning algorithm that outputs the bounding-box coordinates and class information of objects; and
- determining that an object having a given confidence value is present when that value exceeds a preset threshold.

Preferably, when the image contains a plurality of objects whose confidence values exceed the preset threshold, step (b) may include determining the object with the highest confidence value to be the fixed object present in the image.

Preferably, when the fixed object present in the image is a two-dimensional fixed object, step (c) may include:
- estimating the distance between the two-dimensional fixed object and the user terminal;
- obtaining a rotation angle from an orientation sensor provided in the user terminal;
- calculating the relative position coordinates of the user terminal with respect to the two-dimensional fixed object, based on the distance and the rotation angle; and
- correcting the relative position coordinates through grid indexing.
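The text does not state how the distance and rotation angle are combined into coordinates. A minimal sketch of the relative-position step follows, under two assumptions that are not in the source:
- the rotation angle is a heading in degrees, measured clockwise from the direction the 2D fixed object faces; and
- the object sits at the origin of a local floor plane.

The function name `relative_position` is hypothetical. The conversion itself is an ordinary polar-to-Cartesian transform.

```python
import math
from typing import Tuple


def relative_position(distance: float, angle_deg: float) -> Tuple[float, float]:
    """Return (x, y) of the terminal relative to a 2D fixed object at the origin.

    Assumption: angle_deg is measured clockwise from the object's facing
    direction, which is taken as the +y axis of the local floor plane.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    theta = math.radians(angle_deg)
    x = distance * math.sin(theta)  # offset perpendicular to the facing direction
    y = distance * math.cos(theta)  # offset along the facing direction
    return x, y


# Usage: a terminal 3 m from the object, rotated 90 degrees from its facing direction.
x, y = relative_position(3.0, 90.0)
```

In practice, the angle convention must match the terminal's orientation sensor and the floor-plan axes. Any mismatch shows up as a constant rotation of every estimate.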

Preferably, estimating the distance may include:
- obtaining the bounding-box coordinates and class information of the two-dimensional fixed object; and
- calculating the distance between the two-dimensional fixed object and the user terminal using a reference image of the same class as the two-dimensional fixed object.

Preferably, calculating the distance may include:
- obtaining a focal length calculated from three values for the reference image: the height of its bounding box, the object-to-terminal distance entered in advance, and the actual height of the object it shows;
- obtaining the actual height of the two-dimensional fixed object and the height of its bounding box; and
- calculating the distance between the two-dimensional fixed object and the user terminal from the focal length, the actual height of the two-dimensional fixed object, and the height of its bounding box.
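The focal-length and distance calculation described above matches the familiar triangle-similarity relation of the pinhole camera model:
1. Calibrate the focal length once from the reference image: F = (P_ref × D_ref) / H_ref.
2. Estimate the distance for a new detection of the same class: D = (H × F) / P.

Here P is a bounding-box height in pixels and H is a real object height. The sketch below assumes that real heights and distances share one metric unit. The function names are hypothetical.

```python
def focal_length_px(ref_box_height_px: float, ref_distance: float, ref_real_height: float) -> float:
    """Calibrate the focal length (in pixels) once from a reference image: F = P * D / H."""
    if ref_real_height <= 0:
        raise ValueError("reference object height must be positive")
    return ref_box_height_px * ref_distance / ref_real_height


def distance_to_object(focal_px: float, real_height: float, box_height_px: float) -> float:
    """Estimate camera-to-object distance by triangle similarity: D = H * F / P."""
    if box_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return real_height * focal_px / box_height_px


# Usage: a hypothetical 0.5 m tall object appears 200 px tall at 1.0 m in the reference image.
F = focal_length_px(200.0, 1.0, 0.5)   # 400.0
d = distance_to_object(F, 0.5, 100.0)  # 2.0, i.e. the object is now about 2 m away
```

This relation holds only while the image size and camera zoom stay the same as for the reference image. This is one reason why fixing the input image size matters.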

Preferably, the correcting may include:
- dividing the grid mesh of the indoor space in which the image was captured into square grid cells of a preset size;
- generating an address for each cell to construct a grid database; and
- correcting the relative position coordinates by converting them into a grid address based on the grid database.
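One way to realize grid indexing is sketched below. It rests on assumptions that the text does not state:
- the floor plan has its origin at `origin`;
- cells are squares of side `cell_size`, in the same unit as the coordinates;
- a cell's address is its (column, row) pair; and
- "correction" means snapping a coordinate to the center of its cell.

All names are hypothetical.

```python
import math
from typing import Tuple


def grid_address(x: float, y: float, cell_size: float, origin=(0.0, 0.0)) -> Tuple[int, int]:
    """Return the (column, row) address of the square cell containing (x, y)."""
    col = math.floor((x - origin[0]) / cell_size)  # floor keeps negative coordinates correct
    row = math.floor((y - origin[1]) / cell_size)
    return col, row


def snap_to_grid(x: float, y: float, cell_size: float, origin=(0.0, 0.0)):
    """Correct (x, y) by replacing it with the centre of its grid cell."""
    col, row = grid_address(x, y, cell_size, origin)
    cx = origin[0] + (col + 0.5) * cell_size
    cy = origin[1] + (row + 0.5) * cell_size
    return (col, row), (cx, cy)


# Usage: 0.5 m cells; a raw estimate of (1.3, 2.7) falls in cell (2, 5), centred at (1.25, 2.75).
address, corrected = snap_to_grid(1.3, 2.7, 0.5)
```

Snapping trades resolution for stability: small jitter in the raw estimate no longer moves the reported position unless it crosses a cell boundary.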

Preferably, when the fixed object present in the image is a three-dimensional fixed object, step (c) may include:
- obtaining the bounding-box coordinates and the size of the three-dimensional fixed object;
- predicting a rotation vector and a translation vector of the three-dimensional fixed object by applying an algorithm that recovers the camera position and viewing angle from the bounding-box coordinates, the size of the three-dimensional fixed object, and the intrinsic parameters of the camera of the user terminal that captured the image;
- calculating the relative position coordinates of the user terminal with respect to the three-dimensional fixed object, based on the rotation vector and the translation vector; and
- correcting the relative position coordinates through voxel indexing.
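The text does not name the algorithm that recovers the camera pose. A Perspective-n-Point (PnP) solver such as OpenCV's `cv2.solvePnP` is a common choice. It returns a rotation vector and a translation vector that map object coordinates into the camera frame.

Assuming that convention, the camera's position relative to the object is C = -R^T t. The dependency-free sketch below does two things:
1. converts the rotation vector to a matrix with the Rodrigues formula, which is what `cv2.Rodrigues` computes; and
2. applies the relation C = -R^T t.

The function names are hypothetical.

```python
import math


def rodrigues(rvec):
    """Convert an axis-angle rotation vector to a 3x3 rotation matrix."""
    theta = math.sqrt(sum(v * v for v in rvec))
    if theta < 1e-12:  # no rotation
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (v / theta for v in rvec)  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    C = 1.0 - c
    # R = cos(theta) I + sin(theta) [k]x + (1 - cos(theta)) k k^T
    return [
        [c + kx * kx * C, kx * ky * C - kz * s, kx * kz * C + ky * s],
        [ky * kx * C + kz * s, c + ky * ky * C, ky * kz * C - kx * s],
        [kz * kx * C - ky * s, kz * ky * C + kx * s, c + kz * kz * C],
    ]


def camera_position_in_object_frame(rvec, tvec):
    """Camera centre C = -R^T t, where (R, t) map object points into the camera frame."""
    R = rodrigues(rvec)
    return [-sum(R[r][i] * tvec[r] for r in range(3)) for i in range(3)]


# Usage: an unrotated object 5 units in front of the camera;
# the camera sits at (0, 0, -5) in object coordinates.
camera = camera_position_in_object_frame((0.0, 0.0, 0.0), (0.0, 0.0, 5.0))
```

Expressing the camera in the object's frame is what allows a known, fixed object to act as an anchor. Adding the object's surveyed position then gives the terminal's position in the building.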

Preferably, before the algorithm is applied, predicting the rotation vector and the translation vector may include:
- setting the eight corners of the bounding box (its vertices) as control points;
- setting the center of the bounding box as a control point; and
- parameterizing the coordinates of the control points for the three-dimensional fixed object.
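The control-point layout above uses nine points in total: eight box corners plus the box center. This matches the single-shot 6D pose formulation published by Tekin et al., in which each grid cell predicts offsets for the nine projected points:
- the centroid offset passes through a sigmoid so that it stays inside its cell; and
- corner offsets are unbounded.

Whether this patent parameterizes the points exactly this way is an assumption. Below is a minimal decoding sketch for one cell, with a hypothetical function name and an assumed 13×13 grid.

```python
import math

GRID = 13  # assumed output grid size (13x13, as in YOLOv2)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_control_points(raw, cell_col, cell_row, img_w, img_h, grid=GRID):
    """Decode nine 2D control points (box centre + 8 corners) predicted by one grid cell.

    raw: sequence of 9 (dx, dy) network outputs, raw[0] being the centre.
    The centre offset is squashed by a sigmoid so it stays inside the cell;
    corner offsets are used as-is and may point outside the cell.
    Returns the points in pixel coordinates.
    """
    if len(raw) != 9:
        raise ValueError("expected 9 control points (centre + 8 corners)")
    points = []
    for idx, (dx, dy) in enumerate(raw):
        ox = sigmoid(dx) if idx == 0 else dx
        oy = sigmoid(dy) if idx == 0 else dy
        px = (cell_col + ox) / grid * img_w  # grid units -> pixels
        py = (cell_row + oy) / grid * img_h
        points.append((px, py))
    return points
```

The nine decoded 2D points, paired with the nine corresponding 3D points of the object's cuboid, form the 2D-3D correspondences that a PnP solver consumes.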

Preferably, the correcting may include:
- dividing the indoor space in which the image was captured into cubic voxels of a preset size;
- generating an address for each voxel to construct a voxel database; and
- correcting the relative position coordinates by converting them into a voxel address based on the voxel database.
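A sketch of the voxel database and lookup follows. It rests on these illustrative assumptions:
- the indoor space is box-shaped, with its minimum corner at the origin;
- voxels are cubes of side `voxel`;
- each voxel's address is its (i, j, k) index triple; and
- voxel centers serve as the corrected coordinates.

These choices and the function names are illustrative.

```python
import math
from itertools import product


def build_voxel_db(size_x: float, size_y: float, size_z: float, voxel: float):
    """Divide a box-shaped room spanning [0, size) on each axis into cubic voxels.

    Returns a dict mapping a voxel address (i, j, k) to the voxel centre.
    """
    nx, ny, nz = (math.ceil(s / voxel) for s in (size_x, size_y, size_z))
    return {
        (i, j, k): ((i + 0.5) * voxel, (j + 0.5) * voxel, (k + 0.5) * voxel)
        for i, j, k in product(range(nx), range(ny), range(nz))
    }


def correct_with_voxels(point, voxel: float, db):
    """Convert a 3D coordinate to its voxel address and return the stored centre."""
    addr = tuple(math.floor(c / voxel) for c in point)
    if addr not in db:
        raise KeyError(f"point {point} lies outside the indexed space")
    return addr, db[addr]


# Usage: a 2 m x 2 m x 1 m space with 0.5 m voxels holds 4 * 4 * 2 = 32 voxels.
db = build_voxel_db(2.0, 2.0, 1.0, 0.5)
addr, centre = correct_with_voxels((0.7, 1.6, 0.2), 0.5, db)  # (1, 3, 0), (0.75, 1.75, 0.25)
```

Storing the database as a dictionary also rejects estimates that fall outside the mapped space, which a pure arithmetic snap would silently accept.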

To achieve the above object, a second aspect of the present invention provides a visual positioning device comprising:
- a receiving unit that receives, from a user terminal, an image resized to a preset size;
- a positioning unit that determines whether a two-dimensional fixed object or a three-dimensional fixed object is present in the image, and obtains the location coordinates of the user terminal by applying a deep learning algorithm to the image according to the type of fixed object present; and
- a transmitting unit that transmits the location coordinates to the user terminal.

To achieve the above object, a third aspect of the present invention provides a computer program stored in a computer-readable medium. It is characterized in that a data matching method is performed when the instructions of the computer program are executed.

As described above, the present invention is useful where GPS service is difficult to use. A single image captured by the camera of a user terminal can be used in a web environment to detect objects and calculate the location of the user terminal. The user can thus be located in an indoor space and can visually grasp the surrounding environment and direction.

In addition, using an extended single-shot deep CNN network and a YOLO (You Only Look Once) network keeps additional computational cost low while ensuring real-time performance. Fast, sustainable visual positioning information is also delivered at low cost through a mobile web browser, with no additional application to install. As a result, various location-based services can be offered to users.

Figure 1 is a configuration diagram of a visual positioning system according to a preferred embodiment of the present invention.
Figures 2 and 3 are block diagrams of a visual positioning device according to an embodiment.
Figure 4 is a flowchart of a visual positioning method according to an embodiment.
Figure 5 is an example diagram illustrating position coordinate correction through voxel indexing according to an embodiment.
Figure 6 is an example diagram illustrating the relationship between the camera focal length and the camera distance.
Figure 7 is an example diagram of the standard three-axis coordinate system of a user terminal.
Figure 8 is an example diagram illustrating the positional relationship between a fixed object and a camera.
Figure 9 is an example diagram illustrating position coordinate correction through grid indexing according to an embodiment.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms. These embodiments are provided only so that this disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art. The present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification. "And/or" includes each of the mentioned items and every combination of one or more of them.

The terms first, second, and so on are used to describe various elements, components, and/or sections, but these elements, components, and/or sections are not limited by these terms. The terms serve only to distinguish one element, component, or section from another. Therefore, a first element, first component, or first section mentioned below may also be a second element, second component, or second section within the technical spirit of the present invention.

In addition, the identifiers of the steps (e.g., a, b, c) are used for convenience of explanation and do not define the order of the steps. Unless the context clearly specifies a particular order, the steps may occur in an order different from the one stated. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The terminology used herein describes the embodiments and is not intended to limit the present invention. In this specification, singular forms include plural forms unless the context specifically indicates otherwise. As used in the specification, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

Unless otherwise defined, all terms used in this specification, including technical and scientific terms, have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or overly formal sense unless expressly so defined.

In describing the embodiments of the present invention, detailed descriptions of known functions or configurations will be omitted where they may unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the embodiments and may vary with the intention or practice of users or operators. Their definitions should therefore be based on the contents of this entire specification.

Figure 1 is a configuration diagram of a visual positioning system according to a preferred embodiment of the present invention.

Referring to Figure 1, the visual positioning system 100 includes a user terminal 110 and a visual positioning device 120, which are connected through a network.

The visual positioning system 100 has a client-server structure. The user terminal 110 connects to the visual positioning device 120 through the network to request the user's location information. It transmits the data that the visual positioning device 120 needs to perform the visual positioning method. It then receives and outputs the user's location information provided by the visual positioning device 120.

The user terminal 110 is a device with a camera function for acquiring the images needed to perform the visual positioning method. It may be any kind of hardware device that includes at least one processor. Depending on the embodiment, it may also be understood to encompass the software running on that hardware device. For example, the user terminal 110 may be a smartphone or a tablet PC, including the user client and applications running on it, but is not limited thereto.

The visual positioning device 120 is a device that provides the user's location information on a web basis. Preferably, it may be a server connected to the user terminal 110. The server performs the visual positioning method on images acquired from the camera of the user terminal 110. It then provides the user terminal 110 with the relative camera position information needed to construct augmented reality.

Figures 2 and 3 are block diagrams of a visual positioning device according to an embodiment.

Referring to Figure 2, the visual positioning device 120 includes a receiving unit 210, a positioning unit 220, a transmitting unit 230, and a control unit 240. The control unit 240 controls the operations of the receiving unit 210, the positioning unit 220, and the transmitting unit 230, and the flow of data among them.

The receiving unit 210 receives an image from the user terminal 110. Preferably, the user terminal 110 accesses the visual positioning device 120 through a web browser and transmits the image acquired from its camera via WebSocket communication.

The positioning unit 220 detects a two-dimensional or three-dimensional fixed object in the image received from the user terminal 110. It then obtains the location coordinates of the user terminal 110 that captured the image by applying a deep learning algorithm suited to the detected object type.

Preferably, referring to Figure 3, the positioning unit 220 may include:
- an object determination module 221;
- a box coordinate prediction module 222;
- a 6D pose prediction module 223;
- a distance prediction module 224;
- a position coordinate calculation module 225; and
- a position coordinate correction module 226.

Although each module is described as having a distinct function and configuration, the modules may be integrated or further separated depending on the embodiment. The control unit 240 is not shown in Figure 3, but it may control the operations and data flow of each component in Figure 3, as in Figure 2.

The object determination module 221 applies a deep learning algorithm to the image to obtain the bounding boxes and class information of the objects present in it. It then determines which object is present in the image based on the object-recognition confidence value of each object. The determined object may be a two-dimensional fixed object or a three-dimensional fixed object.

The box coordinate prediction module 222 obtains the coordinates of the bounding box of a three-dimensional fixed object present in the image.

The 6D pose prediction module 223 calculates the relative position coordinates of the user terminal 110 with respect to the three-dimensional fixed object. It uses three inputs: the bounding-box coordinates obtained by the box coordinate prediction module 222, the size of the three-dimensional fixed object, and the intrinsic parameters of the camera of the user terminal 110 that captured the image.

The visual positioning method according to the present invention predicts the camera position of the user terminal 110 on the premise that the camera's intrinsic parameters are fixed. The intrinsic parameters include:
- the focal length, which is the distance between the center of the camera lens and the image sensor;
- the principal point, which is the point where the optical axis through the lens center meets the image sensor; and
- the skew coefficient, which describes the skew between the x-axis and the y-axis of the image sensor.

These intrinsic parameters can be expressed as in [Equation 1] below.

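[Equation 1]

$$K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where $f_x$ and $f_y$ are the focal lengths in pixel units along the x- and y-axes, $(c_x, c_y)$ is the principal point, and $s$ is the skew term.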
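The sketch below makes the role of the intrinsic parameters concrete. It builds the standard pinhole intrinsic matrix from focal lengths, principal point, and skew, all in pixels, and projects a point given in camera coordinates onto the image. It assumes that lens distortion can be ignored. The function names are hypothetical.

```python
def intrinsic_matrix(fx: float, fy: float, cx: float, cy: float, skew: float = 0.0):
    """Pinhole intrinsic matrix K with focal lengths, principal point and skew in pixels."""
    return [[fx, skew, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def project(K, point_cam):
    """Project a 3D point given in the camera frame to pixel coordinates (u, v)."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Homogeneous image point K @ [X, Y, Z], divided by its third component Z.
    u = (K[0][0] * X + K[0][1] * Y + K[0][2] * Z) / Z
    v = (K[1][1] * Y + K[1][2] * Z) / Z
    return u, v


# Usage: a 640x480 camera with 500 px focal length and a centred principal point.
K = intrinsic_matrix(500.0, 500.0, 320.0, 240.0)
u, v = project(K, (0.2, -0.1, 2.0))  # approximately (370, 215)
```

Pose solvers invert this projection. Given K and several known 3D-2D point pairs, they recover the rotation and translation that best explain the observed pixels, so K must be known for each phone model.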

In one embodiment, the operations of the 6D pose prediction module 223 may be performed by the object determination module 221. That is, the object determination module 221 and the 6D pose prediction module 223 may be integrated into a single module.

The distance prediction module 224 estimates the distance between a two-dimensional fixed object present in the image and the user terminal 110 whose camera captured the image.

The position coordinate calculation module 225 calculates the relative position coordinates of the user terminal 110. It uses the distance estimated by the distance prediction module 224 and the rotation angle obtained from the orientation sensor of the user terminal 110.

The position coordinate correction module 226 corrects the relative position coordinates obtained by the 6D pose prediction module 223 and by the position coordinate calculation module 225. It uses voxel indexing for the former and grid indexing for the latter.

The transmitting unit 230 transmits the location coordinates of the user terminal 110, obtained by the positioning unit 220, to the user terminal 110.

The operations performed by each component of the visual positioning device 120 shown in Figures 2 and 3 are described in detail below with reference to Figure 4. Each step described with reference to Figure 4 is presented as being performed by a particular component, but this is not limiting. Depending on the embodiment, at least some of the steps may be performed by the same component or by different components.

Figure 4 is a flowchart of a visual positioning method according to an embodiment.

Referring to Figure 4, the receiving unit 210 receives an image resized to a preset size from the user terminal 110 (step S410). Preferably, to ensure real-time operation, the image size may be set in advance in consideration of processing speed and accuracy. The target size may be changed as needed; for example, the image may be resized to 416 px × 582 px.

The positioning unit 220 determines whether a two-dimensional or three-dimensional fixed object is present in the image. It then obtains the location coordinates of the user terminal 110 by applying a deep learning algorithm to the image according to the type of fixed object present (step S420).

More specifically, the object determination module 221 of the positioning unit 220 applies a deep learning algorithm that outputs the bounding-box coordinates and class information of objects, and thereby obtains the object-recognition confidence value.

Preferably, YOLO (You Only Look Once) v2 may be used as the deep learning algorithm. The object determination module 221 may then extract the confidence values using the 13×13 grid of YOLOv2. The YOLO algorithm predicts bounding boxes and class probabilities for the entire image in a single pass of one neural network. YOLO's unified model is end-to-end, meaning the whole object detection pipeline consists of a single network. It can therefore guarantee real-time operation of the system as long as communication speed is not a bottleneck.
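For reference, YOLOv2 decodes each anchor prediction in grid cell (c_x, c_y) as follows:
- b_x = σ(t_x) + c_x and b_y = σ(t_y) + c_y;
- b_w = p_w·e^{t_w} and b_h = p_h·e^{t_h};
- objectness = σ(t_o).

It then multiplies objectness by a softmax class probability to obtain a class-specific confidence. The sketch below follows those published equations for a single prediction. How this system thresholds or post-processes the result is not specified, and the function names are hypothetical.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_yolov2_box(t, col, row, anchor_w, anchor_h, grid=13):
    """Decode one YOLOv2 prediction (tx, ty, tw, th, to) for grid cell (col, row).

    Anchor sizes are in grid-cell units. Returns (bx, by, bw, bh, objectness),
    with the box centre and size expressed as fractions of the input image.
    """
    tx, ty, tw, th, to = t
    bx = (col + sigmoid(tx)) / grid
    by = (row + sigmoid(ty)) / grid
    bw = anchor_w * math.exp(tw) / grid
    bh = anchor_h * math.exp(th) / grid
    return bx, by, bw, bh, sigmoid(to)


def class_confidence(objectness, class_logits):
    """Class-specific confidence: objectness times a (numerically stable) softmax."""
    m = max(class_logits)
    exps = [math.exp(z - m) for z in class_logits]
    total = sum(exps)
    return [objectness * e / total for e in exps]
```

The class-specific confidences produced this way are the values that the threshold test in the next step compares.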

Preferably, the object determination module 221 determines that an object exists when its confidence value, obtained through the deep learning algorithm, exceeds a preset threshold. If several objects exceed the threshold, the object with the highest confidence value is taken as the fixed object present in the image. In this way, the object determination module 221 determines whether the image contains a 2D fixed object or a 3D fixed object. The following describes how the relative position coordinates of the camera of the user terminal 110 with respect to the fixed object are obtained, depending on the type of fixed object found in the image.
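The selection rule above (apply a threshold, then keep the highest confidence) can be sketched as follows. The `Detection` record and its `kind` field ("2d" or "3d") are hypothetical. How a class is mapped to a 2D or 3D fixed object is not specified in this description.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Detection:
    class_name: str    # hypothetical class label, e.g. "exit_sign"
    confidence: float  # confidence value from the detector
    kind: str          # hypothetical: "2d" or "3d" fixed object

def select_fixed_object(detections: List[Detection], threshold: float) -> Optional[Detection]:
    """Return the most confident detection whose confidence exceeds the
    preset threshold, or None when no object qualifies."""
    candidates = [d for d in detections if d.confidence > threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.confidence)
```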

When a 3D fixed object is present in the image

If the object determination module 221 of the positioning unit 220 determines that a 3D fixed object is present in the image, the box coordinate acquisition module 222 obtains the coordinates of that object's bounding box, as detected by the object determination module 221, i.e., the coordinates of the bounding box's vertices. Preferably, the box coordinate acquisition module 222 obtains these coordinates through a Single Shot Deep CNN network, an extended 3D object detection network. The bounding box coordinates are the eight vertices of the rectangular parallelepiped (cuboid) that represents the bounding box. The box coordinate acquisition module 222 can also obtain the size of the 3D fixed object, which is generated when the pre-training data set of the Single Shot Deep CNN network is constructed. This size is the real size of the physical object. It is stored as a 3D mesh model file (.ply) and can consist of the coordinates of the eight vertices of the cuboid enclosing the 3D fixed object.
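This is a minimal sketch that assumes the vertices of the object's .ply mesh have already been loaded into an (N, 3) NumPy array. The eight cuboid vertices are taken from the per-axis minima and maxima. This produces an axis-aligned box in the object's own frame, which is an assumption about how the cuboid is defined. `box_corners` is a hypothetical helper.

```python
import itertools
import numpy as np

def box_corners(vertices: np.ndarray) -> np.ndarray:
    """Return the 8 corners (8, 3) of the axis-aligned cuboid enclosing the
    mesh vertices (N, 3), in the object's own coordinate frame."""
    v = np.asarray(vertices, dtype=float)
    bounds = np.stack([v.min(axis=0), v.max(axis=0)])  # row 0: minima, row 1: maxima
    # Each corner takes either the minimum or the maximum along x, y and z.
    return np.array([[bounds[i, 0], bounds[j, 1], bounds[k, 2]]
                     for i, j, k in itertools.product((0, 1), repeat=3)])
```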

The 6D pose prediction module 223 predicts the camera's rotation vector and translation vector by applying an algorithm that recovers the camera's position and viewing angle. The inputs are the bounding box coordinates, the size of the 3D fixed object, and the intrinsic parameters of the camera of the user terminal 110 that captured the image. The module then predicts the six-degree-of-freedom (6DoF) pose of the 3D fixed object (hereinafter "6D pose"). This prediction uses the object's size, the camera's intrinsic parameters, and the camera's rotation and translation vectors. Here, the 6D pose represents the rigid-body transformation from the object's coordinate system to the camera's coordinate system.

More specifically, the 6D pose prediction module 223 feeds the bounding box coordinates, the size of the 3D fixed object, and the camera's intrinsic parameters into a PnP (Perspective-n-Point) algorithm to predict the camera's rotation and translation vectors. It then feeds the size of the 3D fixed object, the intrinsic parameters, and the rotation and translation vectors into the Single Shot Deep CNN network to predict the 6D pose of the 3D fixed object.
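The PnP step can be illustrated with the linear DLT (direct linear transform) method below. DLT is a swapped-in textbook technique, chosen because it fits in a few lines of NumPy. It is not necessarily the PnP solver used by this method. In practice, a library routine such as OpenCV's `cv2.solvePnP` would typically be called instead. The convention x_cam = R·X + t and the name `solve_pnp_dlt` are assumptions, and the sketch ignores lens distortion and measurement noise.

```python
import numpy as np

def solve_pnp_dlt(object_pts, image_pts, K):
    """Estimate R, t (with x_cam = R @ X + t) from >= 6 non-coplanar
    3D-2D correspondences using the linear DLT method.

    object_pts: (N, 3) control points in the object frame
    image_pts:  (N, 2) their pixel coordinates
    K:          (3, 3) camera intrinsic matrix
    """
    X = np.asarray(object_pts, dtype=float)
    n = len(X)
    # Remove the intrinsics: work with normalised image coordinates.
    pix = np.hstack([np.asarray(image_pts, dtype=float), np.ones((n, 1))])
    norm = pix @ np.linalg.inv(K).T
    Xh = np.hstack([X, np.ones((n, 1))])
    rows = []
    for Xi, (u, v, _) in zip(Xh, norm):
        # u * (r3 . X) = r1 . X  and  v * (r3 . X) = r2 . X
        rows.append(np.concatenate([Xi, np.zeros(4), -u * Xi]))
        rows.append(np.concatenate([np.zeros(4), Xi, -v * Xi]))
    # [R | t] up to scale is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    M = vt[-1].reshape(3, 4)
    if np.linalg.det(M[:, :3]) < 0:  # choose the sign that gives det(R) = +1
        M = -M
    U, S, Vt = np.linalg.svd(M[:, :3])
    R = U @ Vt                       # nearest rotation matrix to the 3x3 block
    t = M[:, 3] / S.mean()           # remove the unknown scale factor
    return R, t
```

Working in normalized image coordinates (after applying K⁻¹) keeps the linear system well conditioned. Projecting the 3 × 3 block onto the nearest orthogonal matrix enforces R ∈ SO(3).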

Preferably, before the PnP algorithm is applied, the 6D pose prediction module 223 sets the eight corners (vertices) of the 3D fixed object's bounding box as control points and sets the center of the bounding box as an additional control point. It then parameterizes these eight corner coordinates and one center coordinate, so that a total of nine coordinates are arranged as a list. This choice of control points is general and applies to any fixed object of arbitrary shape and topology. The visual positioning method according to the present invention supports end-to-end training for real-time 6D pose prediction. It can predict the 2D projections of the corners of the 3D bounding box surrounding an object. In addition to regressing the 2D bounding box, it predicts several extra 2D points for each object instance, namely the projections of the 3D bounding box corners. Given these 2D coordinates and the corresponding 3D control points at the bounding box corners, an efficient PnP algorithm is applied so that the 6D pose is computed algebraically.
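The nine-point control list can be sketched as follows, assuming the eight corners are already available as an (8, 3) array. Placing the center first is an assumed ordering, and `control_points` is a hypothetical helper.

```python
import numpy as np

def control_points(corners: np.ndarray) -> np.ndarray:
    """Stack the box centre and its 8 corners into a (9, 3) control-point
    list (centre first, which is an assumed ordering)."""
    corners = np.asarray(corners, dtype=float)
    if corners.shape != (8, 3):
        raise ValueError("expected 8 box corners with shape (8, 3)")
    centre = corners.mean(axis=0)  # the centre of a cuboid is the mean of its corners
    return np.vstack([centre, corners])
```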

The 6D pose prediction module 223 calculates the relative position coordinates of the user terminal 110 with respect to the 3D fixed object from the rotation vector and the translation vector. Preferably, the module converts the rotation vector into a rotation matrix using the Rodrigues function. It then computes the relative coordinates between the 3D fixed object and the user terminal 110 from the rotation matrix and the translation vector. For example, suppose the transformation of a point X₁ into X₂ in 3D space is expressed by a rotation matrix R. The mapping from X₁ = [x₁ y₁ z₁]^T to X₂ = [x₂ y₂ z₂]^T can then be written as [Equation 2] below.

[Equation 2]
$$X_2 = R\,X_1$$
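The Rodrigues conversion and the resulting relative position can be sketched as follows. The sketch assumes the PnP convention x_cam = R·X_obj + t. Under this convention, the camera (i.e., the user terminal 110) lies at −Rᵀt in the object's frame. OpenCV's `cv2.Rodrigues` performs the same rotation-vector-to-matrix conversion. The function names here are hypothetical.

```python
import numpy as np

def rodrigues(rvec):
    """Convert a rotation vector (unit axis * angle in radians) to a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)  # no rotation
    k = rvec / theta
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def camera_position_in_object_frame(rvec, tvec):
    """With x_cam = R @ X_obj + t, the camera centre (x_cam = 0) is at
    X_obj = -R^T t, i.e. the terminal's position relative to the object."""
    R = rodrigues(rvec)
    return -R.T @ np.asarray(tvec, dtype=float)
```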

Here, rotation matrices are invertible and therefore belong to the general linear group GL(3, ℝ). The orthogonal matrices among them, whose determinants are ±1, form the orthogonal group O(3), with O(3) ⊂ GL(3, ℝ). A transformation that preserves the distance between any two points is called an isometry. The isometries whose matrices have determinant +1 are the proper isometries (rotations), and they form the special orthogonal group SO(3), with SO(3) ⊂ O(3). Because SO(3) can represent only pure rotations, representing translation requires a 4×4 matrix built from the position matrix and the translation matrix (T), as in [Equation 3] below. The 3D points must likewise be extended to homogeneous coordinates (GL(4, ℝ)). Consequently, as shown in [Equation 4], the complete 6D pose consists of two parts: the object's 3D rotation R ∈ SO(3) and its 3D translation t ∈ ℝ³. That is, [Equation 4] shows that applying the translation matrix (T) to X₁, after the position matrix has been applied, yields X₂.

[Equation 3]

[Equation 4]
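The group-theoretic description above can be checked numerically. The sketch below, with hypothetical helper names, does three things. It tests membership in SO(3), packs R and t into a 4 × 4 homogeneous matrix, and applies that matrix to a point extended to homogeneous coordinates [x, y, z, 1].

```python
import numpy as np

def is_rotation(R, tol=1e-9):
    """True if R belongs to SO(3): orthogonal (R^T R = I) with det(R) = +1."""
    R = np.asarray(R, dtype=float)
    return bool(np.allclose(R.T @ R, np.eye(3), atol=tol)
                and np.isclose(np.linalg.det(R), 1.0, atol=tol))

def homogeneous(R, t):
    """Pack a rotation R (3x3) and translation t (3,) into a 4x4 rigid transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def transform_point(T, X):
    """Map a 3D point X by extending it to homogeneous coordinates [x, y, z, 1]."""
    Xh = np.append(np.asarray(X, dtype=float), 1.0)
    return (T @ Xh)[:3]
```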

The position coordinate correction module 226 corrects the relative position coordinates obtained by the 6D pose prediction module 223 through voxel indexing. Preferably, the module first computes the overall extent of the indoor space in which the image was captured along the three axes (X, Y, Z). It divides this extent into cubic voxels of a preset size (equal width, length, and height) and numbers them to generate voxel addresses. It then builds a voxel database keyed by these addresses. The overall extent along the three axes can be computed from the maximum and minimum point-cloud coordinates of the indoor space scanned with a 3D sensor. A voxel address identifies the spatial location of the user terminal 110. For example, the position coordinate correction module 226 can correct the relative position coordinates using the function in [Table 1], which converts coordinates into voxel addresses.

[Table 1]
size: voxel size (cm), i.e., the unit of a voxel
idx: voxel index
a: size of the indoor space
-a <= x, y, z <= a  (range of the coordinates (x, y, z))
num = 2a/size  (number of voxels in one row)

idx_x = x/size
if x > 0:  # a positive value is shifted by one cell, so add 1; num/2 is added to make the index positive
    idx_x = idx_x + num/2 + 1
else:
    idx_x = idx_x + num/2

idx_y = y/size
if y > 0:  # y and z may take the value 0
    idx_y = idx_y + num/2
else:
    idx_y = idx_y + num/2 - 1

idx_z = z/size
if z > 0:
    idx_z = idx_z + num/2
else:
    idx_z = idx_z + num/2 - 1

idx = [idx_x, idx_y*num, idx_z*num*2]  # voxel address
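The offsets in [Table 1] (for example, the extra +1 for positive x) appear to rely on truncating division. As a simplified alternative, not a reproduction of [Table 1], a floor-based mapping gives 0-based indices for the cube −a ≤ x, y, z ≤ a. The flattened address ix + iy·num + iz·num² is an assumed layout, and `voxel_address` is a hypothetical name.

```python
import math
from typing import Tuple

def voxel_address(x: float, y: float, z: float, a: float, size: float) -> Tuple[Tuple[int, int, int], int]:
    """Map a point in the cube -a <= x, y, z <= a to 0-based voxel indices
    and one flattened address (simplified floor-based scheme)."""
    num = int(round(2 * a / size))  # voxels per axis

    def axis_index(c: float) -> int:
        # Shift into [0, 2a], bin by voxel size, clamp the boundary c == a.
        return min(math.floor((c + a) / size), num - 1)

    ix, iy, iz = axis_index(x), axis_index(y), axis_index(z)
    return (ix, iy, iz), ix + iy * num + iz * num * num
```

Because every point inside one voxel maps to the same address, two estimates that fall in the same voxel are treated as the same location.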

The camera position estimated by the present invention is the position of the user terminal 110. If it does not fall in the same voxel address as the actual camera position, the error is, on average, the distance between the center of one voxel and the center of the other. For example, consider a voxel database with an edge length of α cm. When the two positions share the same address, the actual distance between them does not exceed √3·α cm (the space diagonal of a voxel). When the addresses differ, the distance is given by [Equation 5]. Here, (x₁, y₁, z₁) are the actual camera coordinates and (x₂, y₂, z₂) are the estimated camera coordinates.

[Equation 5]
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$
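The two error measures can be written as a short sketch. The first is the Euclidean distance between the actual and estimated coordinates. The second is the largest distance two points can have while sharing one cell, which is the cell diagonal α√d for a d-dimensional cubic cell. The same functions apply to the 2D grid case with d = 2. The names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Sequence

def position_error(actual: Sequence[float], estimate: Sequence[float]) -> float:
    """Euclidean distance between the true and estimated camera positions
    (works for 3D voxel coordinates and 2D grid coordinates alike)."""
    return math.dist(actual, estimate)

def same_cell_bound(alpha: float, dims: int) -> float:
    """Largest distance between two points inside one cubic cell of edge
    alpha: the cell diagonal alpha * sqrt(dims) (sqrt(3) for a voxel)."""
    return alpha * math.sqrt(dims)
```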

FIG. 5 visualizes the result of indexing the relative positioning results for a 3D fixed object into voxels. Blue dots are the labeled ground-truth points and red dots are the points predicted by the network. With a voxel size of 20 cm, the predicted and measured values are indexed into the same voxels.

When a 2D fixed object is present in the image

If the object determination module 221 of the positioning unit 220 determines that the fixed object present in the image is a 2D fixed object, the distance prediction module 224 of the positioning unit 220 estimates the distance between the 2D fixed object and the user terminal 110. Preferably, once the object determination module 221 has predicted the bounding box and class probabilities of the 2D fixed object with the YOLO algorithm, the distance prediction module 224 obtains the coordinates and class information of that bounding box. It then calculates the distance between the 2D fixed object and the user terminal 110 using a reference image of the same class as the 2D fixed object. Here, the bounding box coordinates can be obtained through the YOLOv2 algorithm. Information about the reference image can be acquired in advance and stored in the visual positioning device 120 or in a separate database. This information includes the class of the reference image, its bounding box, the distance between the reference object and the user terminal that captured the reference image, and the actual size of the object in the reference image.

Preferably, the distance prediction module 224 first calculates a focal length from three values: the height of the bounding box in the reference image, the pre-recorded distance between the reference object and the camera that captured it, and the actual height of the reference object. It then calculates the distance between the 2D fixed object and the user terminal 110 from the actual height of the 2D fixed object and the height of its bounding box. More specifically, referring to FIG. 6, the distance prediction module 224 obtains the focal length through [Equation 5] below. Using that focal length, it then calculates the distance between the 2D fixed object and the user terminal 110 through [Equation 6] below.

[Equation 5]
$$F = \frac{P \times D}{H}$$

[Equation 6]
$$D' = \frac{H \times F}{P'}$$

Here, D is the distance between the reference object and the camera, and P is the height of the reference object's bounding box obtained through the YOLO algorithm. H is the actual height of the reference object, and F is the focal length. P' is the height of the 2D fixed object's bounding box obtained through the YOLO algorithm, and D' is the distance between the 2D fixed object and the camera of the user terminal 110.
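With these definitions, the focal-length and distance computations reduce to similar-triangle arithmetic: F = P·D/H, then D' = H·F/P'. The sketch assumes that the 2D fixed object has the same real height H as its same-class reference object. The numbers in the check are made up for illustration.

```python
def focal_length_px(P: float, D: float, H: float) -> float:
    """F = P * D / H: a reference object of real height H, photographed at
    known distance D, appears P pixels tall."""
    return P * D / H

def distance_to_object(F: float, H: float, P_prime: float) -> float:
    """D' = H * F / P': distance to an object of real height H whose
    bounding box is P' pixels tall."""
    return H * F / P_prime
```

For example, a 50 cm reference sign that appears 100 px tall at 200 cm gives F = 400. A same-class sign that appears 50 px tall is then about 400 cm away.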

The position coordinate calculation module 225 of the positioning unit 220 calculates the relative position coordinates of the user terminal 110 with respect to the 2D fixed object. It uses two inputs: the distance between the 2D fixed object and the user terminal, obtained by the distance prediction module 224, and the rotation angle obtained from the orientation sensor of the user terminal 110.

First, the acquisition of the rotation angle is described. The sensors of the user terminal 110 express data values in the standard three-axis coordinate system shown in FIG. 7. This coordinate system is defined relative to the screen of the user terminal 110 when the terminal is held in its default orientation. In this default position, the x-axis points horizontally to the right, the y-axis points vertically upward, and the z-axis points out of the front of the screen. Positions behind the screen therefore have negative z values. The axes are not swapped when the screen orientation of the user terminal 110 changes; that is, the sensor coordinate system never changes as the user terminal 110 moves. The rotation matrix RM in the Android operating system is defined as [Equation 7] below.

[Equation 7]
$$RM = \begin{bmatrix} E_x & E_y & E_z \\ N_x & N_y & N_z \\ G_x & G_y & G_z \end{bmatrix}$$

Here, x, y, and z are the axes relative to the user terminal 110. E and N are unit vectors pointing east and north, respectively, and G is the gravity vector pointing toward the center of the Earth. The Euler angles Ψ, θ, and Φ of the user terminal 110 are defined as rotations about G, E, and N, respectively. Preferably, the coordinate calculation module 225 uses only Φ and smooths it with a moving average filter. It then obtains the rotation angle θ' as the difference between the current Φ value and the Φ value measured when the user terminal 110 is in the default position, facing the front of the 2D fixed object.
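The smoothing and angle-difference step can be sketched as follows. The window length, the use of radians, and the class name `AzimuthFilter` are assumptions. A plain arithmetic moving average, as used here, misbehaves when readings straddle ±π. To limit that problem, the difference θ' is wrapped back into [−π, π].

```python
import math
from collections import deque
from typing import Optional

class AzimuthFilter:
    """Moving-average smoothing of the sensor angle Phi (radians) and the
    rotation angle theta' relative to a frontal reference reading."""

    def __init__(self, window: int = 5):
        self.samples = deque(maxlen=window)     # most recent Phi readings
        self.reference: Optional[float] = None  # Phi while facing the object head-on

    def update(self, phi: float) -> float:
        """Add a reading and return the plain moving average of the window."""
        self.samples.append(phi)
        return sum(self.samples) / len(self.samples)

    def set_reference(self, phi_front: float) -> None:
        self.reference = phi_front

    def theta_prime(self, phi_smoothed: float) -> float:
        """Difference from the reference, wrapped into [-pi, pi]."""
        if self.reference is None:
            raise RuntimeError("reference angle not set")
        d = phi_smoothed - self.reference
        return math.atan2(math.sin(d), math.cos(d))
```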

Next, referring to FIG. 8, let D' be the distance between the camera of the user terminal 110 and the 2D fixed object. Let θ' be the angle between the camera's line of sight and the line perpendicular to the 2D fixed object. The coordinate calculation module 225 then obtains the relative position coordinates (x, y) of the camera of the user terminal 110 with respect to the 2D fixed object as (sin θ'·D', cos θ'·D').
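The following is a direct transcription of (sin θ'·D', cos θ'·D'), assuming θ' is given in radians. Note that y ≥ 0 whenever |θ'| ≤ 90°, which is consistent with [Table 2] treating y as non-negative. `relative_position` is a hypothetical name.

```python
import math
from typing import Tuple

def relative_position(distance: float, theta_prime: float) -> Tuple[float, float]:
    """(x, y) of the camera relative to the 2D object: x = sin(theta') * D',
    y = cos(theta') * D', with theta' measured from the object's normal."""
    return math.sin(theta_prime) * distance, math.cos(theta_prime) * distance
```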

The position coordinate correction module 226 of the positioning unit 220 can correct the relative position coordinates obtained by the coordinate calculation module 225 through grid indexing. Preferably, the module first computes the overall extent of the X and Y axes of the indoor space in which the image was captured, using the maximum and minimum coordinates of a grid mesh. It divides the X and Y axes of the grid mesh into square grid cells of a preset size (equal width and height) and generates an address for each cell. It then builds a grid database keyed by these addresses. For example, the position coordinate correction module 226 can correct the relative position coordinates using the function in [Table 2], which converts coordinates into grid addresses.

[Table 2]
size: grid size (cm)
idx: grid index
a: size of the indoor space
-a <= x, y <= a  (range of x and y)
num = 2a/size  (number of grid cells in one row)

idx_x = x/size
if x > 0:
    idx_x = idx_x + num/2 + 1
else:
    idx_x = idx_x + num/2

idx_y = y/size
# y cannot be negative, so no range-dependent if-branch is needed

idx = [idx_x, idx_y]  # grid address
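As with the voxel case, a simplified floor-based version can be sketched. It does not reproduce the offsets in [Table 2]. The range 0 ≤ y ≤ a, the clamping at the upper boundary, and the name `grid_address` are assumptions.

```python
import math
from typing import Tuple

def grid_address(x: float, y: float, a: float, size: float) -> Tuple[int, int]:
    """Map (x, y), with -a <= x <= a and 0 <= y <= a, to a 0-based grid cell."""
    if y < 0:
        raise ValueError("y is expected to be non-negative")
    num_x = int(round(2 * a / size))  # cells along x
    num_y = int(round(a / size))      # cells along y (y covers only [0, a])
    # Shift x into [0, 2a] before binning; clamp the upper boundary x == a.
    ix = min(math.floor((x + a) / size), num_x - 1)
    iy = min(math.floor(y / size), num_y - 1)
    return ix, iy
```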

The camera position estimated by the present invention is the position of the user terminal 110. If it does not fall in the same grid address as the actual camera position, the error is, on average, the distance between the center of one grid cell and the center of the other. For example, consider a grid database whose cells have a side length of α cm. When the two positions share the same address, the actual distance between them does not exceed √2·α cm (the diagonal of a cell). When the addresses differ, the distance is given by [Equation 6]. Here, (x₁, y₁) are the actual camera coordinates and (x₂, y₂) are the estimated camera coordinates.

[Equation 6]
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

FIG. 9 visualizes the result of indexing the relative positioning results for a 2D fixed object into the grid, with a grid size of 100 cm. Blue dots are the labeled ground-truth points and red dots are the predicted camera positions.

The position coordinates are obtained through either the 6D pose prediction module 223 or the position coordinate calculation module 225. Once the position coordinate correction module 226 has corrected them, the transmitter 230 transmits the corrected position coordinates to the user terminal 110 (step S430). Preferably, the user terminal 110 then provides the user's location information based on the corrected coordinates.

Meanwhile, the steps of the method or algorithm described in connection with the embodiments of the present invention may be implemented directly in hardware, as a software module executed by hardware, or as a combination of the two. The software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, or a CD-ROM. It may also reside in any other form of computer-readable recording medium well known in the art to which the present invention pertains.

The components of the present invention may be implemented as a program (or application) and stored in a medium so as to be executed in combination with a computer, which is hardware. The components may be implemented as software programming or software elements. Similarly, the embodiments may be implemented in a programming or scripting language such as C, C++, Java, or assembler. This includes the various algorithms implemented as combinations of data structures, processes, routines, or other programming constructs. Functional aspects may be implemented as algorithms executed on one or more processors.

Preferred embodiments of the visual positioning method and apparatus according to the present invention have been described above, but the present invention is not limited thereto. It may be practiced with various modifications within the scope of the claims, the detailed description, and the accompanying drawings, and such modifications also fall within the present invention.

100: Visual positioning system
110: User terminal
120: Visual positioning device
210: Receiver
220: Positioning unit
230: Transmitter
240: Control unit
221: Object determination module
222: Box coordinate acquisition module
223: 6D pose prediction module
224: Distance prediction module
225: Position coordinate calculation module
226: Position coordinate correction module

Claims

1. A visual positioning method performed in a visual positioning device, the method comprising:
(a) receiving, from a user terminal, an image resized to a preset size;
(b) determining whether a two-dimensional fixed object or a three-dimensional fixed object is present in the image;
(c) obtaining location coordinates of the user terminal by applying a deep learning algorithm to the image according to the type of fixed object present in the image; and
(d) transmitting the location coordinates to the user terminal.

2. The visual positioning method of claim 1, wherein step (b) comprises:
obtaining a confidence value for object recognition by applying to the image a deep learning algorithm that obtains coordinates and class information of a bounding box of an object; and
determining, when the confidence value exceeds a preset value, that an object having the confidence value is present.

3. The visual positioning method of claim 2, wherein step (b) comprises:
when the image contains a plurality of objects whose confidence values exceed the preset value, determining the object having the highest confidence value as the fixed object present in the image.

4. The visual positioning method of claim 1, wherein step (c) comprises, when the type of fixed object present in the image is a two-dimensional fixed object:
estimating a distance between the two-dimensional fixed object and the user terminal;
obtaining a rotation angle from an orientation sensor provided in the user terminal;
calculating relative position coordinates of the user terminal with respect to the two-dimensional fixed object based on the distance and the rotation angle; and
correcting the relative position coordinates through grid indexing.

5. The visual positioning method of claim 4, wherein estimating the distance comprises:
obtaining coordinates and class information of a bounding box of the two-dimensional fixed object; and
calculating the distance between the two-dimensional fixed object and the user terminal using a reference image of the same class as the two-dimensional fixed object.

6. The visual positioning method of claim 5, wherein calculating the distance comprises:
obtaining a focal length calculated from a height of a bounding box in the reference image, a distance to the user terminal entered in advance for the reference image, and an actual height of the reference image;
obtaining an actual height of the two-dimensional fixed object and a height of a bounding box of the two-dimensional fixed object; and
calculating the distance between the two-dimensional fixed object and the user terminal based on the focal length, the actual height of the two-dimensional fixed object, and the height of the bounding box of the two-dimensional fixed object.

The method of claim 4, wherein the correcting comprises:
dividing the grid mesh of the indoor space in which the image was captured into square grid cells of a preset size;
constructing a grid database by generating an address for each of the divided grid cells; and
correcting the relative position coordinates by converting them into a grid address based on the grid database.
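The grid correction amounts to quantizing a floor-plan position to a cell. The sketch below is a minimal version under several assumptions that the claim does not state:

- the grid mesh is a flat floor plan in metres with its origin at one corner;
- an address is a `(row, col)` tuple;
- the relative coordinates have already been expressed in the floor-plan frame, for example by adding the fixed object's known position;
- the corrected coordinate is the cell centre.

The helper names are hypothetical.

```python
import math


def build_grid_db(width_m: float, depth_m: float, cell_m: float) -> dict[tuple[int, int], tuple[float, float]]:
    """Split a width x depth floor area into square cells; map each (row, col) address to its cell centre."""
    if cell_m <= 0:
        raise ValueError("cell size must be positive")
    n_cols = math.ceil(width_m / cell_m)
    n_rows = math.ceil(depth_m / cell_m)
    return {
        (r, c): ((c + 0.5) * cell_m, (r + 0.5) * cell_m)
        for r in range(n_rows)
        for c in range(n_cols)
    }


def to_grid_address(x_m: float, y_m: float, db: dict, cell_m: float):
    """Convert a floor-plan position to its grid address; return (address, cell centre)."""
    addr = (math.floor(y_m / cell_m), math.floor(x_m / cell_m))
    if addr not in db:
        raise ValueError(f"position ({x_m}, {y_m}) is outside the grid")
    return addr, db[addr]
```

With 0.5 m cells over a 10 m x 6 m room, the estimate (1.26, 3.74) snaps to address (7, 2) with centre (1.25, 3.75). Small jitter in the raw estimate therefore maps to one stable address.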

The method of claim 1, wherein, when the fixed object present in the image is a three-dimensional fixed object, step (c) comprises:
obtaining the coordinates of a bounding box of the three-dimensional fixed object and the size of the three-dimensional fixed object;
predicting a rotation vector and a translation vector of the three-dimensional fixed object by applying an algorithm for obtaining the position and shooting angle of a camera, based on the coordinates of the bounding box, the size of the three-dimensional fixed object, and the intrinsic parameters of the camera of the user terminal that captured the image;
calculating the relative position coordinates of the user terminal with respect to the three-dimensional fixed object based on the rotation vector and the translation vector; and
correcting the relative position coordinates through voxel indexing.
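Claim 8 does not name the algorithm. A Perspective-n-Point (PnP) solver such as OpenCV's `cv2.solvePnP` is one common choice that returns exactly a rotation vector and a translation vector. Assume OpenCV's convention, X_cam = R·X_obj + t, where R is built from the axis-angle rotation vector. Then the camera (user terminal) sits at C = -R^T·t in the object's frame. The stdlib-only sketch below performs that last step. `rodrigues` is a hand-written version of Rodrigues' formula, and the function names are hypothetical.

```python
import math


def rodrigues(rvec: tuple[float, float, float]) -> list[list[float]]:
    """Axis-angle rotation vector -> 3x3 rotation matrix (Rodrigues' formula)."""
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def camera_position_in_object_frame(rvec, tvec) -> tuple[float, float, float]:
    """Relative position of the camera w.r.t. the object: C = -R^T t."""
    R = rodrigues(rvec)
    return tuple(-sum(R[j][i] * tvec[j] for j in range(3)) for i in range(3))
```

With no rotation and t = (0, 0, 5), the camera lies at (0, 0, -5) in the object frame, 5 m in front of the object along the viewing axis. In an OpenCV pipeline, `cv2.Rodrigues` would replace the hand-written `rodrigues`.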

The method of claim 8, wherein predicting the rotation vector and the translation vector comprises, before applying the algorithm:
setting the eight corners corresponding to the vertices of the bounding box as control points;
setting the center of the bounding box as a control point; and
parameterizing the coordinates of the control points for the three-dimensional fixed object.
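This sketch rests on one plausible reading of claim 9, stated as an assumption. The nine control points are the eight corners and the centre of the 3D bounding box, expressed in the object's own frame using the object size from claim 8. "Parameterizing" then flattens them into one coordinate vector. These object-frame points would be paired with their 2D image locations as input to the pose algorithm. `control_points` and `parameterize` are hypothetical names.

```python
from itertools import product


def control_points(width: float, height: float, depth: float) -> list[tuple[float, float, float]]:
    """Eight bounding-box corners followed by the box centre, in an object frame centred on the box."""
    hw, hh, hd = width / 2, height / 2, depth / 2
    # Every sign combination of the half-extents gives one corner of the box.
    corners = [(sx * hw, sy * hh, sz * hd) for sx, sy, sz in product((-1, 1), repeat=3)]
    return corners + [(0.0, 0.0, 0.0)]


def parameterize(points: list[tuple[float, float, float]]) -> list[float]:
    """Flatten the control points into one parameter vector [x0, y0, z0, x1, ...]."""
    return [coord for point in points for coord in point]
```

For a 2.0 x 1.0 x 0.5 m object, this yields 9 points and a 27-value vector. The first corner is (-1.0, -0.5, -0.25) and the last entry is the centre (0, 0, 0).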

The method of claim 8, wherein the correcting comprises:
dividing the indoor space in which the image was captured into cubic voxels of a preset size;
constructing a voxel database by generating an address for each of the divided voxels; and
correcting the relative position coordinates by converting them into a voxel address based on the voxel database.
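Voxel indexing is the 3D counterpart of the grid correction. The sketch below makes two assumptions that the claim leaves open. First, each voxel gets a single integer address in x-fastest order (i + nx·(j + ny·k)). Second, the corrected coordinate is the voxel centre. It also assumes that the relative coordinates have already been placed in a room frame with its origin at one corner. `VoxelIndex` is a hypothetical name.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VoxelIndex:
    extent: tuple[float, float, float]  # room size (x, y, z) in metres
    voxel_size: float                   # edge length of each cubic voxel
    dims: tuple[int, int, int] = field(init=False)
    db: dict[int, tuple[float, float, float]] = field(init=False)

    def __post_init__(self) -> None:
        if self.voxel_size <= 0:
            raise ValueError("voxel_size must be positive")
        self.dims = tuple(math.ceil(e / self.voxel_size) for e in self.extent)
        nx, ny, nz = self.dims
        s = self.voxel_size
        # Voxel database: linear address -> voxel centre.
        self.db = {
            self._linear(i, j, k): ((i + 0.5) * s, (j + 0.5) * s, (k + 0.5) * s)
            for i in range(nx) for j in range(ny) for k in range(nz)
        }

    def _linear(self, i: int, j: int, k: int) -> int:
        nx, ny, _ = self.dims
        return i + nx * (j + ny * k)

    def snap(self, point: tuple[float, float, float]):
        """Convert a room-frame point to (voxel address, voxel centre)."""
        idx = tuple(math.floor(c / self.voxel_size) for c in point)
        if not all(0 <= n < d for n, d in zip(idx, self.dims)):
            raise ValueError(f"point {point} lies outside the indexed space")
        addr = self._linear(*idx)
        return addr, self.db[addr]
```

For a 4.0 x 3.0 x 2.5 m room with 0.5 m voxels, the grid is 8 x 6 x 5 (240 voxels). The point (1.1, 0.2, 2.4) maps to address 194, whose centre is (1.25, 0.25, 2.25).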

A visual positioning device comprising:
a receiving unit that receives, from a user terminal, an image resized to a preset size;
a positioning unit that determines whether a two-dimensional fixed object or a three-dimensional fixed object is present in the image and obtains the location coordinates of the user terminal by applying a deep learning algorithm to the image according to the type of fixed object present in the image; and
a transmission unit that transmits the location coordinates to the user terminal.
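The device claim describes a three-stage pipeline. The sketch below connects the stages with plain callables, and the key step is the positioning unit's dispatch on the fixed-object type. The claim does not specify the detector, the per-type deep-learning estimators, or the transport, so all three are hypothetical stand-ins. The class name and its fields are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Coords = tuple[float, ...]


@dataclass
class VisualPositioningDevice:
    preset_size: tuple[int, int]                           # (width, height) the terminal resizes to
    detect_fixed_object: Callable[[bytes], Optional[str]]  # hypothetical: returns "2d", "3d" or None
    estimators: dict[str, Callable[[bytes], Coords]]       # hypothetical per-type pipelines
    send: Callable[[str, Coords], None]                    # hypothetical transport to the terminal

    def receive(self, terminal_id: str, image: bytes, size: tuple[int, int]) -> Coords:
        """Receiving unit: accept only images already resized to the preset size."""
        if size != self.preset_size:
            raise ValueError(f"expected image size {self.preset_size}, got {size}")
        coords = self.locate(image)
        self.transmit(terminal_id, coords)
        return coords

    def locate(self, image: bytes) -> Coords:
        """Positioning unit: choose the 2D or 3D pipeline by the detected fixed-object type."""
        kind = self.detect_fixed_object(image)
        if kind not in self.estimators:
            raise LookupError("no two- or three-dimensional fixed object found in the image")
        return self.estimators[kind](image)

    def transmit(self, terminal_id: str, coords: Coords) -> None:
        """Transmission unit: return the location coordinates to the user terminal."""
        self.send(terminal_id, coords)
```

Keeping the estimators in a dictionary keyed by object type means the 2D branch (claims 4 to 7) and the 3D branch (claims 8 to 10) can be developed and swapped independently.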

A computer program stored on a computer-readable medium, wherein the method according to any one of claims 1 to 8 is performed when the instructions of the computer program are executed.