KR102285427B1

KR102285427B1 - Method and apparatus of visual feature augmentation for visual slam using object labels

Info

Publication number: KR102285427B1
Application number: KR1020190039736A
Authority: KR
Inventors: 장병탁; 이충연; 이현도; 황인준
Original assignee: 서울대학교산학협력단
Priority date: 2018-11-28
Filing date: 2019-04-04
Publication date: 2021-08-03
Also published as: KR20200063949A

Abstract

A method and apparatus for enhancing image feature points in Visual SLAM using object labels are presented. The apparatus may include: a visual feature processing unit that extracts feature points from the frames constituting an image acquired through a camera and generates a visual word vector recording the distribution of descriptors, each of which contains information on the characteristics of an extracted feature point; a semantic feature processing unit that recognizes an object located in a feature point region of the frame and extracts semantic information about the recognized object; and a location estimator that identifies whether the current location is a previously visited point based on a visual-semantic word vector, that is, the visual word vector with the semantic information added.

Description

METHOD AND APPARATUS OF VISUAL FEATURE AUGMENTATION FOR VISUAL SLAM USING OBJECT LABELS

Embodiments disclosed herein relate to a method and apparatus for enhancing image feature points in Visual SLAM (Simultaneous Localization and Mapping) using object labels, and more particularly to a method and apparatus by which a robot, based on camera images, generates a map for traveling in an unknown space while simultaneously estimating its own location.

In general, people memorize and use various semantic features, such as the types and colors of objects placed in the surrounding environment, during spatial recognition. Visual SLAM, by contrast, extracts recognizable feature points from camera images and maps them onto a three-dimensional map, then compares the image feature points of the current view with the existing map and estimates the current position as the point with the highest matching result.

However, although image processing-based methods have the advantage of high processing speed, they have an inherent limitation: the amount of information that can be extracted from an image is insufficient.

That is, because only local information extracted from the brightness values of a specific pixel and its surrounding pixels is used, such methods are sensitive to image changes caused by the movement of the mobile robot. As a result, when sufficient image feature points are not extracted, position estimation fails and previously visited places are not recognized correctly.

In this regard, Korean Patent Publication No. 10-2016-0111008, a prior art document, merely presents techniques for monocular visual SLAM based on detecting translational motion in the movement of a camera using at least one motion sensor while the camera is performing panoramic SLAM (simultaneous localization and mapping), and on initializing a three-dimensional map for tracking of finite features; it does not solve the problems described above.

Therefore, a technique for solving the above-mentioned problems is needed.

Meanwhile, the background art described above is technical information that the inventors possessed for deriving the present invention or acquired in the process of deriving it, and is not necessarily known art disclosed to the general public before the filing of the present invention.

An object of the embodiments disclosed herein is to present a method and apparatus for enhancing image feature points in Visual SLAM using object labels.

Another object of the embodiments disclosed herein is to present a method and apparatus for enhancing image feature points in Visual SLAM using object labels, in which semantic information of an object is extracted based on object recognition and combined with the feature points of an image.

A further object of the embodiments disclosed herein is to present a method and apparatus for enhancing image feature points in Visual SLAM using object labels that improve the recognition performance for previously visited places by additionally using the semantic information of objects.

As a technical means for achieving the above technical objects, according to one embodiment, an apparatus for enhancing feature points of an image in SLAM may include: a visual feature processing unit that extracts feature points from the frames constituting an image acquired through a camera and generates a visual word vector recording the distribution of descriptors containing information on the characteristics of the extracted feature points; a semantic feature processing unit that recognizes an object located in a feature point region of the frame and extracts semantic information about the recognized object; and a location estimator that identifies whether the current location is a previously visited point based on a visual-semantic word vector in which the semantic information is added to the visual word vector.

According to another embodiment, a method by which an image feature point enhancement apparatus enhances feature points of an image in SLAM may include: extracting feature points from the frames constituting an image acquired through a camera and generating a visual word vector recording the distribution of descriptors containing information on the characteristics of the extracted feature points; recognizing an object located in a feature point region of the frame and extracting semantic information about the recognized object; and identifying whether the current location is a previously visited point based on a visual-semantic word vector in which the semantic information is added to the visual word vector.

According to yet another embodiment, there is provided a computer-readable recording medium on which a program for performing an image feature point enhancement method is recorded, the method including: extracting feature points from the frames constituting an image acquired through a camera and generating a visual word vector recording the distribution of descriptors containing information on the characteristics of the extracted feature points; recognizing an object located in a feature point region of the frame and extracting semantic information about the recognized object; and identifying whether the current location is a previously visited point based on a visual-semantic word vector in which the semantic information is added to the visual word vector.

According to another embodiment, there is provided a computer program that is executed by an image feature point enhancement apparatus and stored in a recording medium to perform an image feature point enhancement method, the method including: extracting feature points from the frames constituting an image acquired through a camera and generating a visual word vector recording the distribution of descriptors containing information on the characteristics of the extracted feature points; recognizing an object located in a feature point region of the frame and extracting semantic information about the recognized object; and identifying whether the current location is a previously visited point based on a visual-semantic word vector in which the semantic information is added to the visual word vector.

According to any one of the above-described means, a method and apparatus for enhancing image feature points in Visual SLAM using object labels can be presented.

According to any one of the above-described means, a method and apparatus for enhancing image feature points in Visual SLAM using object labels can be presented that correctly recognize a visited place by using the semantic information of objects even when feature points of the image are not extracted.

According to any one of the above-described means, a method and apparatus for enhancing image feature points in Visual SLAM using object labels can be presented that improve the recognition performance for previously visited places by additionally using the semantic information of objects.

The effects obtainable from the disclosed embodiments are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the disclosed embodiments belong.

FIG. 1 is a block diagram illustrating an image feature point enhancement apparatus according to an embodiment.
FIG. 2 and FIG. 3 are flowcharts for explaining an image feature point enhancement method according to an embodiment.

Various embodiments are described in detail below with reference to the accompanying drawings. The embodiments described below may be modified and implemented in various different forms. To describe the characteristics of the embodiments more clearly, detailed descriptions of matters widely known to those of ordinary skill in the art to which the embodiments belong are omitted. In the drawings, parts unrelated to the description of the embodiments are omitted, and similar reference numerals are given to similar parts throughout the specification.

Throughout the specification, when a component is said to be "connected" to another component, this includes not only the case where it is directly connected but also the case where it is connected with another component interposed between them. In addition, when a component is said to "include" another component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings.

Before that description, however, the meanings of the terms used below are defined.

'SLAM (Simultaneous Localization and Mapping)' is a technique by which a robot generates a map for traveling in an unknown space while simultaneously estimating its own location. SLAM based on camera images is called Visual SLAM.

A 'key frame' is a frame, among the frames constituting the image acquired through the camera mounted on the robot, that is used to identify the robot's current location.

'Bag of Words' is a technique that converts input text into numbers by using a group into which a plurality of texts have been grouped.
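
As an illustration of the definition above, the following minimal sketch converts a sentence into a vector of word counts over a fixed vocabulary. The function name `bag_of_words` and the example vocabulary are assumptions made for illustration and do not appear in the specification.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return a count vector of `text` over a fixed, ordered vocabulary."""
    counts = Counter(text.lower().split())
    # Words outside the vocabulary are ignored; missing words count as 0.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and input, chosen only for this example.
vocabulary = ["robot", "map", "camera"]
vector = bag_of_words("Robot builds map and robot updates map", vocabulary)
print(vector)
```

In Visual SLAM the same idea is applied to images: the "words" are groups of similar feature descriptors rather than text tokens.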

A 'feature point' is a point in an image captured by the camera whose intensity value differs greatly from that of the surrounding points.

Terms other than those defined above that require explanation are explained separately below.

FIG. 1 is a block diagram for explaining an image feature point enhancement apparatus 10 according to an embodiment.

The image feature point enhancement apparatus 10 may be implemented as a computer, a portable terminal, a television, a wearable device, or the like that can access a remote server through a network N or connect to other terminals and servers. Here, the computer includes, for example, a notebook, desktop, or laptop equipped with a web browser. The portable terminal is, for example, a wireless communication device that guarantees portability and mobility, and may include any kind of handheld-based wireless communication device such as PCS (Personal Communication System), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), GSM (Global System for Mobile communications), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), Wibro (Wireless Broadband Internet), a smartphone, or Mobile WiMAX (Mobile Worldwide Interoperability for Microwave Access). The television may include IPTV (Internet Protocol Television), Internet TV (Internet Television), terrestrial TV, cable TV, and the like. Furthermore, the wearable device is an information processing device of a type that can be worn directly on the human body, such as a watch, glasses, accessories, clothing, or shoes, and may access a remote server or connect to another terminal via a network, either directly or through another information processing device.

Referring to FIG. 1, the image feature point enhancement apparatus 10 according to an embodiment may include an input/output unit 110, a control unit 120, a communication unit 130, and a memory 140.

First, the input/output unit 110 may include an input unit for receiving input from a user and an output unit for displaying information such as the result of a task or the state of the image feature point enhancement apparatus 10. For example, the input/output unit 110 may include an operation panel that receives user input and a display panel that displays a screen.

Specifically, the input unit may include devices capable of receiving various forms of user input, such as a keyboard, physical buttons, a touch screen, a camera, or a microphone. For example, a camera sensor may be mounted on a robot to photograph the robot's surroundings.

The output unit may include a display panel, a speaker, or the like. However, the input/output unit 110 is not limited thereto and may include components supporting various kinds of input and output.

The control unit 120 controls the overall operation of the image feature point enhancement apparatus 10 and may include a processor such as a CPU. The control unit 120 may control the other components included in the image feature point enhancement apparatus 10 so that an operation corresponding to the user input received through the input/output unit 110 is performed.

For example, the control unit 120 may execute a program stored in the memory 140, read a file stored in the memory 140, or store a new file in the memory 140.

The control unit 120 may include a visual feature processing unit 121, a semantic feature processing unit 122, a location estimator 123, or a map update unit 124.

First, the visual feature processing unit 121 may extract feature points from a frame of the image captured by the input/output unit 110, and may calculate a descriptor for each extracted feature point to store the characteristics of that feature point.

To this end, the visual feature processing unit 121 may group the descriptors by K-means clustering to generate the visual words (visual vocabulary) contained in the bag of words, generate a visual word vector (visual vocabulary vector) recording the distribution of descriptors for each frame of the image, and store the characteristics of each frame through the generated visual word vector.
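
The clustering step can be sketched as follows. This is a minimal pure-Python K-means under stated assumptions: descriptors are treated as short real-valued tuples, distances are squared Euclidean, and the initial centroids are simply the first k descriptors. None of these choices is specified in the text, and the names `kmeans` and `squared_distance` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(descriptors, k, iterations=10):
    """Cluster descriptors; the k resulting centroids act as visual words."""
    # Assumption: naive deterministic initialisation from the first k points.
    centroids = [list(d) for d in descriptors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k), key=lambda j: squared_distance(d, centroids[j]))
            clusters[nearest].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Toy descriptors forming two obvious groups.
descriptors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
visual_words = kmeans(descriptors, k=2)
print(visual_words)
```

A practical system would typically use a library implementation and a descriptor-appropriate distance; the sketch only shows how grouped descriptors become a vocabulary.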

The semantic feature processing unit 122 may generate a visual-semantic word vector (visual-semantic vocabulary vector) by adding semantic information, namely the object label of the region to which a feature point belongs, to the visual word vector of the descriptor.

The visual-semantic word vector (visual-semantic vocabulary vector) is expressed as an equation as follows.

Consider the set of semantic words (semantic vocabulary) and the set of visual words (visual vocabulary) contained in frame f and, among the feature points included in f, the set of feature points having semantic word l and the set of feature points having visual word i. With these sets, the visual-semantic word vector (visual-semantic vocabulary vector) can be expressed as the following formula.

[Equation 1]

Here, the equation uses the weight of word i in the bag of words and a constant that adjusts the relative weight between the visual word vector and the semantic word vector.
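
Because the formula itself is not reproduced in this text, the following is only one possible reading of the description, not the equation of the specification: a weighted histogram of visual words is concatenated with a histogram of object labels scaled by a balancing constant. The names `word_weights` and `alpha`, and the concatenation itself, are assumptions.

```python
def visual_semantic_vector(visual_word_ids, label_ids, word_weights, num_labels, alpha):
    """Concatenate a weighted visual-word histogram with an alpha-scaled label histogram.

    visual_word_ids: visual word index of each feature point in the frame
    label_ids:       object label index of each feature point in the frame
    word_weights:    per-word weight in the bag of words (assumed given)
    alpha:           assumed constant balancing visual and semantic parts
    """
    visual = [0.0] * len(word_weights)
    for i in visual_word_ids:
        visual[i] += word_weights[i]
    semantic = [0.0] * num_labels
    for label in label_ids:
        semantic[label] += alpha
    return visual + semantic


vector = visual_semantic_vector(
    visual_word_ids=[0, 2, 2],
    label_ids=[1, 1, 0],
    word_weights=[1.0, 0.5, 2.0],
    num_labels=2,
    alpha=0.5,
)
print(vector)
```

Raising `alpha` makes frames with matching object labels look more alike regardless of their visual words; lowering it falls back toward a purely visual comparison.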

The location estimator 123 may calculate the similarity between frames in order to identify whether the frame captured at the current location corresponds to a previously visited point.

At this time, the location estimator 123 may select candidate key frames whose similarity to the frame at the current location is to be compared; by comparing the visual-semantic word vectors of the key frames, it may select the key frames having a word distribution similar to that of the current frame as the key frames to compare.

First, the location estimator 123 may count the number of object labels shared between each key frame k of the captured image and the frame f at the current location, and may set key frame k as a first candidate frame if the number of labels shared with frame f is not zero.

The location estimator 123 may then extract a preset proportion of the key frames k included in the first candidate frames, based on the number of object labels each shares with the frame f at the current location, and set them as second candidate frames.

For example, the location estimator 123 may set key frames k included in the first candidate frames as second candidate frames according to the number of object labels they share with the frame f at the current location, which is expressed by the following formula.

[Equation 2]

C1: first candidate frames, C2: second candidate frames
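
The first two filtering stages can be sketched as below. Since Equation 2 is not reproduced in this text, the sketch assumes that "a preset proportion" means keeping the top fraction of C1 ranked by shared-label count, and that labels are compared as sets of distinct names. The function name `select_candidates` and the parameter `ratio` are hypothetical.

```python
def select_candidates(current_labels, keyframe_labels, ratio):
    """Return (C1, C2) for the current frame.

    C1: key frames sharing at least one object label with the current frame.
    C2: assumed to be the top `ratio` share of C1 by number of shared labels.
    """
    shared = {k: len(current_labels & labels) for k, labels in keyframe_labels.items()}
    c1 = [k for k, n in shared.items() if n > 0]
    ranked = sorted(c1, key=lambda k: shared[k], reverse=True)
    keep = max(1, int(len(ranked) * ratio)) if ranked else 0
    return c1, ranked[:keep]


# Hypothetical key frames and the object labels detected in each.
keyframe_labels = {
    "k1": {"chair", "desk"},
    "k2": {"door"},
    "k3": {"chair", "desk", "monitor"},
    "k4": {"desk"},
}
c1, c2 = select_candidates({"chair", "desk", "monitor"}, keyframe_labels, ratio=0.5)
print(c1, c2)
```

Key frame k2 shares no label with the current frame and is dropped at the first stage, which is where most of the saving in later similarity computations would come from.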

The location estimator 123 may then define the set of neighboring frames of each key frame k belonging to the second candidate frames, and may set, among the key frames k' belonging to that set of neighboring frames, the frames similar to the current frame f as third candidate frames.

For example, for each frame k' belonging to the neighboring frames, the location estimator 123 may compare the visual-semantic word vector of the current frame f with that of frame k' using the L1 norm, and may set the frame k' most similar to the current frame f as a third candidate frame.
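
The L1-norm comparison can be sketched as follows, assuming each frame is already represented by a visual-semantic word vector of equal length and that "most similar" means smallest L1 distance. The names `l1_distance` and `most_similar_neighbor` and the example vectors are illustrative assumptions.

```python
def l1_distance(u, v):
    """Sum of absolute element-wise differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def most_similar_neighbor(current_vector, neighbor_vectors):
    """Return the id of the neighboring key frame with the smallest L1 distance."""
    return min(neighbor_vectors, key=lambda k: l1_distance(current_vector, neighbor_vectors[k]))


# Hypothetical visual-semantic word vectors of two neighboring key frames.
neighbors = {
    "k3": [1.0, 0.0, 4.0, 0.5],
    "k5": [0.0, 2.0, 1.0, 0.0],
}
best = most_similar_neighbor([1.0, 0.0, 3.0, 0.5], neighbors)
print(best)
```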

Thereafter, the location estimator 123 may set, among the key frames belonging to the third candidate frames, the frames whose vectors are similar to that of the current frame f as candidate key frames.

The algorithm by which the location estimator 123 selects the candidate frames to compare against the current frame in order to identify the current location is expressed in pseudocode as follows.

By selecting the candidate key frames to compare against the current frame using the visual word vector in this way, the similarity to the current frame need not be computed for every key frame stored on the map, which reduces the amount of computation.

The location estimator 123 may then calculate the similarity between the selected candidate key frames and the current frame.

That is, the location estimator 123 may calculate the similarity between frames by calculating the similarity between descriptors, using the visual-semantic word vectors of the descriptors of the feature points extracted from the current frame and from each selected candidate key frame.

For example, the location estimator 123 may use the Hamming distance method, multiplying by a penalty when the object labels included in the visual-semantic word vectors of the two descriptors differ, so that the similarity between descriptors is calculated according to the following formula.

[Equation 3]
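
Equation 3 is not reproduced in this text, so the following is a hedged sketch of the described idea rather than the formula of the specification. It assumes binary descriptors stored as integers, a penalty factor greater than 1, and that the penalty multiplies the Hamming distance when the two object labels differ. The name `penalized_distance` and the default penalty of 2.0 are assumptions.

```python
def hamming_distance(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def penalized_distance(desc_a, label_a, desc_b, label_b, penalty=2.0):
    """Hamming distance, multiplied by an assumed penalty when the labels differ."""
    distance = hamming_distance(desc_a, desc_b)
    return distance * penalty if label_a != label_b else float(distance)


# Same pair of descriptors, compared with matching and with differing labels.
same = penalized_distance(0b10110010, "chair", 0b10010011, "chair")
different = penalized_distance(0b10110010, "chair", 0b10010011, "desk")
print(same, different)
```

With this reading, two visually similar feature points that lie on different kinds of objects are pushed apart, which is the intended effect of adding the label to the descriptor.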

The location estimator 123 may then identify whether the current location is a previously visited point based on the similarity between the current frame and the candidate key frames.

In this way, by recognizing the objects in the regions where the feature points of a frame are located and identifying the similarity between frames using visual-semantic word vectors that include semantic information, it is possible to accurately determine whether the current location is a previously visited point while reducing the amount of computation.

Meanwhile, the map update unit 124 may generate a map based on the captured image, and may update the map when the current location is identified as a previously visited point.

For example, when a frame of the image captured at the current location is similar to a frame of a previously stored image and the location is therefore determined to be a previously visited point, the map update unit 124 may update the map by closing the loop at the return point on the map.

The communication unit 130 may perform wired or wireless communication with other devices or networks. To this end, the communication unit 130 may include a communication module supporting at least one of various wired and wireless communication methods. For example, the communication module may be implemented in the form of a chipset.

The wireless communication supported by the communication unit 130 may be, for example, Wi-Fi (Wireless Fidelity), Wi-Fi Direct, Bluetooth, UWB (Ultra Wide Band), or NFC (Near Field Communication). The wired communication supported by the communication unit 130 may be, for example, USB or HDMI (High Definition Multimedia Interface).

Various types of data, such as files, applications, and programs, may be installed and stored in the memory 140. The control unit 120 may access and use the data stored in the memory 140, or may store new data in the memory 140. The control unit 120 may also execute a program installed in the memory 140.

The memory 140 may store the image captured by the input/output unit 110.

FIG. 2 is a flowchart for explaining an image feature point enhancement method according to an embodiment.

The image feature point enhancement method according to the embodiment shown in FIG. 2 includes steps processed in time series by the image feature point enhancement apparatus 10 shown in FIG. 1. Therefore, even where omitted below, the description given above of the image feature point enhancement apparatus 10 shown in FIG. 1 also applies to the image feature point enhancement method according to the embodiment shown in FIG. 2.

First, the image feature point enhancement apparatus 10 may acquire an image through the camera sensor and extract feature points from the acquired image (S2001).

For example, the image feature point enhancement apparatus 10 may extract, as feature points, the points within each frame of the image that differ from their surrounding points in brightness or the like.
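
As a toy illustration of this step, the sketch below marks a pixel as a feature point when its intensity differs from the mean of its eight neighbors by more than a threshold. The detector, the threshold value, and the name `detect_feature_points` are assumptions chosen for illustration; the specification does not name a particular detector.

```python
def detect_feature_points(image, threshold):
    """Return (row, col) of interior pixels that stand out from their 8 neighbors."""
    points = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            neighbors = [
                image[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            # Compare the pixel with the mean intensity of its neighborhood.
            if abs(image[r][c] - sum(neighbors) / 8) > threshold:
                points.append((r, c))
    return points


# A 4x4 grayscale frame with one bright pixel.
image = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
points = detect_feature_points(image, threshold=100)
print(points)
```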

The image feature point enhancement apparatus 10 may then generate a visual word vector that records the distribution of descriptors, each of which includes information on the characteristics of an extracted feature point (S2002).

For example, the image feature point enhancement apparatus 10 may generate a visual vocabulary in the bag-of-words sense by K-means clustering the descriptors. The image feature point enhancement apparatus 10 may then match the descriptor corresponding to each feature point in a frame to a visual word, and may store the characteristics of the frame by generating a visual word vector from the values corresponding to the matched visual words.
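The vocabulary construction and matching in step S2002 can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: the function names (`kmeans`, `nearest`, `visual_word_vector`), the deterministic initialization from the first k descriptors, the fixed iteration count, and the use of a plain occurrence histogram as the visual word vector are all assumptions made for the example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centers: List[List[float]]) -> int:
    """Index of the visual word (cluster center) closest to a descriptor."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points: List[Vector], k: int, iterations: int = 10) -> List[List[float]]:
    """Plain Lloyd's K-means. Assumption: the first k points seed the centers."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def visual_word_vector(descriptors: List[Vector], vocabulary: List[List[float]]) -> List[int]:
    """Count how many descriptors of one frame fall on each visual word."""
    histogram = [0] * len(vocabulary)
    for d in descriptors:
        histogram[nearest(d, vocabulary)] += 1
    return histogram


# Toy 2-D descriptors standing in for real feature descriptors.
training = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
vocabulary = kmeans(training, k=2)
frame_vector = visual_word_vector([(0.2, 0.1), (9, 9), (10, 12)], vocabulary)
```

Real descriptors are much higher-dimensional, and binary descriptors would normally be clustered with a distance suited to them; the two-dimensional points here only keep the example readable.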

The image feature point enhancement apparatus 10 may then recognize an object located in the region of a feature point in the frame, and may generate a visual semantic word vector by adding semantic information, that is, the label of the recognized object, to the visual word vector (S2003).

For example, the image feature point enhancement apparatus 10 may recognize an object located within a preset radius of a feature point, and may set, as the semantic information, the label among pre-stored labels that corresponds to the recognized object. The image feature point enhancement apparatus 10 may then generate a visual semantic word vector, in which the visual word vector includes the semantic information, by including the object label as the last value of the descriptor of each feature point in the region where the object is located.
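The label augmentation in step S2003 can be illustrated with a short sketch. It is written under several assumptions that the description does not fix: each recognized object is represented by a center point and an integer label, the nearest object within the radius wins when several qualify, and the value `NO_OBJECT = 0` marks a feature point with no object nearby. The function name `augment_descriptors` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NO_OBJECT = 0  # hypothetical placeholder label for "no object within the radius"


def augment_descriptors(
    keypoints: List[Tuple[float, float]],
    descriptors: List[Sequence[int]],
    detections: List[Tuple[float, float, int]],
    radius: float,
) -> List[List[int]]:
    """Append an object label as the last value of each feature descriptor.

    detections holds (center_x, center_y, label) for each recognized object.
    """
    augmented = []
    for (kx, ky), descriptor in zip(keypoints, descriptors):
        label = NO_OBJECT
        best = radius
        for ox, oy, object_label in detections:
            distance = math.hypot(kx - ox, ky - oy)
            if distance <= best:  # keep the closest object inside the radius
                best = distance
                label = object_label
        augmented.append(list(descriptor) + [label])
    return augmented


# One object (label 3) near the first feature point, none near the second.
result = augment_descriptors(
    keypoints=[(10.0, 10.0), (100.0, 100.0)],
    descriptors=[[1, 0, 1], [0, 1, 1]],
    detections=[(12.0, 11.0, 3)],
    radius=5.0,
)
```

An object detector that returns bounding boxes could replace the radius test with a point-in-box test without changing the rest of the sketch.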

Thereafter, to identify whether the current location is a previously visited location, the image feature point enhancement apparatus 10 may calculate, based on the visual semantic word vector, the similarity between the current frame and frames included in pre-stored images. To this end, the image feature point enhancement apparatus 10 may determine the candidate key frames whose similarity to the current frame is to be calculated (S2004).

FIG. 3 is a flowchart illustrating the procedure for determining candidate key frames. Referring to FIG. 3, the image feature point enhancement apparatus 10 may set first candidate frames from among the key frames of the pre-stored images based on the object labels of the current frame (S3001).

For example, among the key frames of the pre-stored images, the image feature point enhancement apparatus 10 may set, as a first candidate frame, each frame that includes at least one object label identical to an object label included in the current frame.

The image feature point enhancement apparatus 10 may then set, as second candidate frames, those first candidate frames in which the number of object labels identical to object labels of the current frame exceeds a preset number (S3002).

For example, the image feature point enhancement apparatus 10 may rank the first candidate frames by the number of object labels they share with the current frame, and may set the first candidate frames within the top 20% as second candidate frames.
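Steps S3001 and S3002 amount to a two-stage filter on shared object labels, which the following sketch illustrates. The data layout (a dictionary from key frame identifier to a set of label strings), the function names, the rounding of the 20% cut with `math.ceil`, and the tie-breaking by identifier are assumptions made for the example rather than details taken from the embodiment.

```python
import math
from typing import Dict, Iterable, List, Set


def first_candidates(current_labels: Iterable[str],
                     keyframe_labels: Dict[int, Set[str]]) -> Dict[int, int]:
    """S3001: keep key frames sharing at least one label with the current frame.

    Returns a mapping from key frame id to the number of shared labels.
    """
    current = set(current_labels)
    shared = {kf: len(current & labels) for kf, labels in keyframe_labels.items()}
    return {kf: count for kf, count in shared.items() if count >= 1}


def second_candidates(shared_counts: Dict[int, int],
                      top_fraction: float = 0.2) -> List[int]:
    """S3002: keep the top fraction of first candidates by shared-label count."""
    if not shared_counts:
        return []
    keep = max(1, math.ceil(len(shared_counts) * top_fraction))
    # Assumption: ties are broken by the smaller key frame id.
    ranked = sorted(shared_counts, key=lambda kf: (-shared_counts[kf], kf))
    return ranked[:keep]


stored = {
    0: {"chair"},
    1: {"chair", "desk", "monitor"},
    2: {"sofa"},
    3: {"desk", "monitor"},
    4: {"chair", "lamp"},
    5: {"monitor"},
}
first = first_candidates({"chair", "desk", "monitor"}, stored)
second = second_candidates(first)
```

Filtering on labels first is inexpensive because it needs only set intersections, so the more costly vector comparisons in the later steps run on far fewer frames.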

Thereafter, among the neighboring frames of the second candidate frames, the image feature point enhancement apparatus 10 may set, as third candidate frames, those neighboring frames that are similar to the current frame, based on the magnitude of the visual word vectors of each neighboring frame and the current frame (S3003).

For example, the image feature point enhancement apparatus 10 may set, as a third candidate frame, the neighboring frame with the largest vector magnitude, where the vector magnitude is the L1-norm value (L1_Norm) between a neighboring frame of a second candidate frame and the current frame.

The image feature point enhancement apparatus 10 may then finally set the candidate key frames from among the third candidate frames by using the magnitude of the visual semantic word vector with respect to the current frame (S3004).

For example, the image feature point enhancement apparatus 10 may sort the third candidate frames based on the L1_Norm value between the current frame and each neighboring frame, and may select, as candidate key frames, those third candidate frames whose visual semantic word vector magnitude with respect to the current frame exceeds a preset value.
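The description does not give the formula behind the L1_Norm value. One way to read steps S3003 and S3004, offered here purely as an assumption, is the L1 score commonly used in bag-of-words place recognition, s(a, b) = 1 - 0.5 * || a/|a| - b/|b| ||_1, which lies between 0 and 1 for non-negative vectors and grows as two frames become more alike. The sketch below sorts candidates by that score and keeps those above a preset value; the names `l1_score` and `select_candidate_key_frames` and the threshold of 0.5 are hypothetical.

```python
from typing import Dict, List, Sequence


def l1_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector so that its absolute values sum to 1 (zero vectors pass through)."""
    total = sum(abs(x) for x in v)
    return [x / total for x in v] if total else list(v)


def l1_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity: 1 - 0.5 * L1 distance between the normalized vectors."""
    na, nb = l1_normalize(a), l1_normalize(b)
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(na, nb))


def select_candidate_key_frames(current: Sequence[float],
                                candidates: Dict[int, Sequence[float]],
                                threshold: float) -> List[int]:
    """Sort candidate frames by score (best first) and keep those above the threshold."""
    scored = sorted(((l1_score(current, v), kf) for kf, v in candidates.items()),
                    reverse=True)
    return [kf for score, kf in scored if score > threshold]


current_vector = [2, 2, 0, 0]
third_candidates = {7: [1, 1, 0, 0], 8: [0, 0, 3, 1], 9: [3, 1, 0, 0]}
selected = select_candidate_key_frames(current_vector, third_candidates, threshold=0.5)
```

Normalizing before comparing makes the score independent of how many feature points each frame contains, so a frame with many features does not dominate simply because of its size.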

Thereafter, the image feature point enhancement apparatus 10 may identify whether the current location is a previously visited point by calculating a similarity based on the differences between the values constituting the visual semantic word vectors of the determined candidate key frame and of the frame at the current location (S2005).

For example, the image feature point enhancement apparatus 10 may calculate the similarity between the current frame and a candidate key frame by calculating the Hamming distance between the visual semantic word vector of the current frame and the visual semantic word vector of the candidate key frame.
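The Hamming distance between two vectors of equal length is the number of positions at which their values differ. The sketch below shows one way it could be turned into the similarity used in step S2005; the mapping from distance to a similarity between 0 and 1, the function names, and the `min_similarity` threshold of 0.75 are assumptions for illustration, since the description does not specify them.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def hamming_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed similarity: fraction of positions at which the vectors agree."""
    if not a:
        return 1.0
    return 1.0 - hamming_distance(a, b) / len(a)


def is_revisited(current: Sequence[int], candidate: Sequence[int],
                 min_similarity: float = 0.75) -> bool:
    """Treat the location as previously visited when the similarity is high enough."""
    return hamming_similarity(current, candidate) >= min_similarity


# The last value of each vector stands in for an appended object label.
current_frame = [1, 0, 2, 0, 3]
candidate_key_frame = [1, 0, 2, 1, 3]
distance = hamming_distance(current_frame, candidate_key_frame)
revisited = is_revisited(current_frame, candidate_key_frame)
```

Because the comparison is position by position, a mismatch in an object label counts the same as a mismatch in any other value; weighting the label positions more heavily would be a separate design decision.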

If the current location is identified as the previously visited point, the image feature point enhancement apparatus 10 may update the map generated based on the image (S2006).

For example, the image feature point enhancement apparatus 10 may generate a map based on the captured images, may detect a loop closure point according to whether the current location was identified in step S2005 as a previously visited location, and may update the map by closing the loop at that point.

The term "unit" used in the above embodiments refers to software or to a hardware component such as an FPGA (field-programmable gate array) or an ASIC, and a "unit" performs certain roles. However, a "unit" is not limited to software or hardware. A "unit" may be configured to reside in an addressable storage medium, or may be configured to run on one or more processors. Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

The functionality provided within the components and "units" may be combined into a smaller number of components and "units", or may be separated from additional components and "units".

In addition, the components and "units" may be implemented to run on one or more CPUs within a device or a secure multimedia card.

The image feature point enhancement method according to the embodiment described with reference to FIGS. 2 and 3 may also be implemented in the form of a computer-readable medium that stores instructions and data executable by a computer. The instructions and data may be stored in the form of program code and, when executed by a processor, may generate a predetermined program module to perform a predetermined operation. The computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media and removable and non-removable media. The computer-readable medium may also be a computer recording medium, which may include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. For example, the computer recording medium may be a magnetic storage medium such as an HDD or an SSD, an optical recording medium such as a CD, a DVD, or a Blu-ray disc, or a memory included in a server accessible over a network.

The image feature point enhancement method according to the embodiment described with reference to FIGS. 2 and 3 may also be implemented as a computer program (or computer program product) including instructions executable by a computer. The computer program includes programmable machine instructions processed by a processor, and may be implemented in a high-level programming language, an object-oriented programming language, an assembly language, a machine language, or the like. The computer program may also be recorded on a tangible computer-readable recording medium (for example, a memory, a hard disk, a magnetic/optical medium, or an SSD (solid-state drive)).

Accordingly, the image feature point enhancement method according to the embodiment described with reference to FIGS. 2 and 3 may be implemented by a computing device executing the computer program described above. The computing device may include at least some of a processor, a memory, a storage device, a high-speed interface connected to the memory and a high-speed expansion port, and a low-speed interface connected to a low-speed bus and the storage device. These components are connected to one another by various buses and may be mounted on a common motherboard or in another suitable manner.

Here, the processor may process instructions within the computing device. Such instructions include, for example, instructions stored in the memory or the storage device for displaying graphic information that provides a GUI (graphical user interface) on an external input/output device, such as a display connected to the high-speed interface. In other embodiments, multiple processors and/or multiple buses may be used, as appropriate, together with multiple memories and types of memory. The processor may also be implemented as a chipset made up of chips that include multiple independent analog and/or digital processors.

The memory stores information within the computing device. In one example, the memory may consist of a volatile memory unit or a set of such units. In another example, the memory may consist of a non-volatile memory unit or a set of such units. The memory may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device may provide a large amount of storage space to the computing device. The storage device may be a computer-readable medium or a configuration that includes such a medium. It may include, for example, devices within a SAN (storage area network) or other configurations, and may be a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory, or another similar semiconductor memory device or device array.

The embodiments described above are illustrative, and a person of ordinary skill in the art to which they pertain will understand that they can easily be modified into other specific forms without changing their technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope to be protected by this specification is indicated by the claims below rather than by the detailed description above, and should be construed as including all changes or modifications derived from the meaning and scope of the claims and from their equivalents.

10: image feature point enhancement apparatus
110: input/output unit
120: control unit
121: visual feature processing unit 122: semantic feature processing unit
123: location estimation unit 124: map update unit
130: communication unit
140: memory

Claims

An apparatus for enhancing feature points of an image in SLAM, the apparatus comprising:
a visual feature processing unit that extracts feature points from a frame constituting an image acquired through a camera, and generates a visual word vector recording a distribution of descriptors that include information on characteristics of the extracted feature points;
a semantic feature processing unit that recognizes, from the frame, an object located in a region of the feature points, and extracts semantic information about the recognized object; and
a location estimation unit that identifies whether a current location is a previously visited point based on a visual semantic word vector in which the semantic information is added to the visual word vector,
wherein the location estimation unit determines a candidate key frame from among key frames of a pre-stored image using a label of the object and the visual semantic word vector, and identifies whether the current location is a previously visited point by calculating a similarity between the candidate key frame and a frame of the current location.

The apparatus of claim 1,
wherein the semantic feature processing unit
sets, as the semantic information, a label among pre-stored labels that corresponds to the recognized object, and generates the visual semantic word vector by adding a distribution of object labels in the frame to the visual word vector.

(Deleted)

The apparatus of claim 1,
wherein the location estimation unit
extracts, from among the key frames of the pre-stored image, candidate frames for which the similarity is to be calculated, using a label of an object included in the frame of the current location.

The apparatus of claim 4,
wherein the location estimation unit
determines the candidate key frame from among the extracted candidate frames based on a similarity to the visual semantic word vector of the frame of the current location.

The apparatus of claim 5,
wherein the location estimation unit
identifies whether the current location is a previously visited point by calculating the similarity based on differences between values constituting the visual semantic word vectors of the determined candidate key frame and of the frame of the current location.

The apparatus of claim 6,
wherein the apparatus
further comprises a map update unit that updates a map generated based on the image when the current location is identified as the previously visited point.

A method by which an image feature point enhancement apparatus enhances feature points of an image in SLAM, the method comprising:
extracting feature points from a frame constituting an image acquired through a camera, and generating a visual word vector recording a distribution of descriptors that include information on characteristics of the extracted feature points;
recognizing, from the frame, an object located in a region of the feature points, and extracting semantic information about the recognized object; and
identifying whether a current location is a previously visited point based on a visual semantic word vector in which the semantic information is added to the visual word vector,
wherein the identifying of whether the current location is a previously visited point comprises:
determining a candidate key frame from among key frames of a pre-stored image using a label of the object and the visual semantic word vector; and
identifying whether the current location is a previously visited point by calculating a similarity between the candidate key frame and a frame of the current location.

The method of claim 8,
wherein the extracting of the semantic information comprises:
setting, as the semantic information, a label among pre-stored labels that corresponds to the recognized object; and
generating the visual semantic word vector by adding a distribution of object labels in the frame to the visual word vector.

(Deleted)

The method of claim 8,
wherein the determining of the candidate key frame comprises:
extracting, from among the key frames of the pre-stored image, candidate frames for which the similarity is to be calculated, using a label of an object included in the frame of the current location; and
determining the candidate key frame from among the extracted candidate frames based on a similarity to the visual semantic word vector of the frame of the current location.

(Deleted)

The method of claim 11,
wherein the identifying of whether the current location is a previously visited point by calculating the similarity
comprises calculating the similarity based on differences between values constituting the visual semantic word vectors of the determined candidate key frame and of the frame of the current location, thereby identifying whether the current location is a previously visited point.

The method of claim 13,
wherein the method
further comprises updating a map generated based on the image when the current location is identified as the previously visited point.

A computer-readable recording medium on which a program for performing the method of claim 8 is recorded.

A computer program stored on a recording medium, executed by an image feature point enhancement apparatus, for performing the method of claim 8.