CN108981698B

CN108981698B - Visual positioning method based on multi-mode data

Info

Publication number: CN108981698B
Application number: CN201810534761.3A
Authority: CN
Inventors: 程瑞琦; 林书妃; 杨恺伦; 汪凯巍; 于红雷
Original assignee: Hangzhou Kr Vision Technology Co ltd
Current assignee: Hangzhou Kr Vision Technology Co ltd
Priority date: 2018-05-29
Filing date: 2018-05-29
Publication date: 2020-07-14
Anticipated expiration: 2038-05-29
Also published as: CN108981698A

Abstract

The invention discloses a visual positioning method based on multi-modal data. The method utilizes multimode data acquired by a GNSS and a camera, utilizes a small-sized processor to process the acquired data and outputs a positioning result. The method can be used for positioning under different illumination conditions such as day and night, and has the advantages of low false detection rate, low omission factor, good real-time performance and good cross-platform performance. Can well meet the application requirement of accurate positioning of the visually impaired.

Description

Visual positioning method based on multi-mode data

Technical Field

The invention belongs to the technical fields of image processing technology, signal processing technology and computer vision, and relates to a visual positioning method based on multi-modal data.

Background

Visual information is the most important information source for human beings to recognize the surrounding environment, and about 80% of information obtained by human beings is input from a visual system. According to the statistics of the world health organization, 2.85 hundred million people with visual impairment exist in the world. The visually impaired person loses normal vision and has difficulty in understanding the color and shape. Many of them now use white canes or guide dogs to assist their daily lives. White canes are not sufficient to solve all the difficulties during travel. The guide dogs can guide visually impaired people to avoid danger when walking on the road, but they cannot be used for all visually impaired people because of the great cost required for training the guide dogs. Therefore, the traditional tools such as walking sticks and guide dogs cannot provide sufficient assistance for traveling. Since the development of various Electronic Travel Aid (ETA) devices, it has been considered as an effective method for assisting visually impaired people to travel under various conditions. To help users find access, many auxiliary systems deploy depth cameras to detect accessible paths and obstacles. However, in these systems, there is little integration of accurate positioning system detection. Due to limited vision, the visually impaired cannot be accurately positioned when the person goes outdoors. The positioning accuracy of GNSS devices is typically several meters or more to more than ten and several meters. Therefore, it is important for the visually impaired to perform outdoor trip to realize accurate positioning.

Much work has been devoted to solving the visual localization problem. However, most current solutions apply to automated navigation of robots or unmanned vehicles. In automatic navigation, a camera is static relative to a bearer, and the shooting direction is single; the blind-person assistance application is quite different, the image captured by the handheld camera is unstable, and the camera can shoot any direction image. Therefore, the visual localization method for visually impaired people must be robust enough to cope with various environments. As an aid to visually impaired people, the visual positioning method should have a very low false alarm rate, which is very important for the safety of the user. In addition, real-time is also a requirement of the algorithm. Because the algorithms must be implemented in portable platforms, limited system resources require efficient algorithms to maintain moderate frame rates.

Disclosure of Invention

The invention aims to provide a visual positioning method based on multi-modal data, aiming at overcoming the defects of the prior art.

The purpose of the invention is realized by the following technical scheme: a visual localization method based on multi-modal data comprises the following steps:

(1) establishing a position P-feature database W, wherein the feature data W comprises longitude data L on, latitude data L at and three features of GIST, L DB and BoW, the three features of GIST, L DB and BoW are extracted from image information, the image information comprises a Color image Color, a Depth image Depth and an infrared image IR which are collected at a position P, and the extraction method comprises the steps of extracting GIST features from the Color, respectively extracting L DB features from the Color, Depth and IR and splicing the three L DB features into L DB features;

(2) when positioning is needed, acquiring a Color image Color ', a Depth image Depth ' and an infrared image IR ' at a position to be positioned, longitude data L on ' and latitude data L at ', extracting GIST characteristics from the Color ', respectively extracting L DB characteristics from the Color ', Depth ' and IR ', and splicing the three L DB characteristics into one L DB characteristic;

(3) and (3) screening out the approximate position P according to L on 'and L at', and meeting the following conditions:

where t is an approximate threshold, and 0< t <1, L on, L at are longitude data L on and latitude data L at, respectively, of the position P stored in the database.

(4) Respectively searching the nearest positions from the approximate positions screened out in the step 3 according to three characteristics of GIST ', L DB ' and BoW ', and respectively obtaining three nearest positions P_GIST,P_LDB,P_BoWWherein the distance between GIST features of two locations is calculated using euclidean distance, the distance between two L DB features is calculated using Hamming distance, and the distance between two BoW features is calculated using L1 fraction;

(5) if P_GIST,P_LDB,P_BoWThe three points are overlapped, the final positioning result P is obtained₀＝P_GIST＝P_LDB＝P_BoWIf the three points are not coincident, the final positioning result is the central point position of the three most adjacent positions.

Further, the three L DB features are spliced into a L DB feature, specifically, L DB features are extracted from Color, Depth and IR respectively and are marked as L DBc, L DBd and L DBi respectively, and the three L DB features are spliced into a L DB feature in an end-to-end mode.

Further, the center point positions of the three nearest neighboring positions are obtained by:

wherein (L on₀,Lat₀) As a result of positioning P₀(L on)_GIST,Lat_GIST) Is P_GIST(L on)_LDB,Lat_LDB) Is P_LDB(L on)_BoW,Lat_BoW) Is P_BoWThe coordinates of (a).

Compared with the conventional sidewalk traffic light detection method, the method has the following advantages that:

1. the positioning accuracy is high, and compared with the positioning only by using a GNSS positioning module, the positioning accuracy can be improved by vision-assisted positioning.

2. And (4) environmental suitability. The method can be used for positioning by utilizing visual image information under different illumination conditions such as strong light, weak light and the like.

3. The real-time performance is good. The method can be used for positioning on a mobile platform (such as a mobile phone) in real time without delay under various conditions.

4. The portability is good. The core part of the method is a camera, a processor and an earphone returning device, and the method can be conveniently transplanted to intelligent devices such as mobile phones and tablets.

Drawings

1. FIG. 1 is a flow chart of a visual positioning method for visually impaired people based on multimodal data;

2. fig. 2 is a schematic diagram of an original image acquired by a camera.

Detailed Description

A visual localization method based on multi-modal data comprises the following steps:

(1) establishing a position P-characteristic database W, wherein the characteristic data W comprises longitude data L on, latitude data L at and three characteristics of GIST, L DB and BoW, the latitude and longitude data format is a format specified by an NMEA-0183 protocol, as shown in the following table, the three characteristics of GIST, L DB and BoW are extracted from image information, the image information comprises a Color image Color, a Depth image Depth and an infrared image IR which are collected at a position P, and a plurality of common key positions are stored in the database under the normal condition;

wherein t is an approximate threshold value, t is more than 0 and less than 1, t is selected according to the positioning accuracy of the GNSS module, when the GNSS positioning accuracy is influenced by building sheltering, rainy and foggy weather and the like, a larger numerical value is selected, and vice versa, L on and L at are longitude data L on and latitude data L at of the position P stored in the database respectively.

(5) if P_GIST,P_LDB,P_BoWThe three points are overlapped, the final positioning result P is obtained₀＝P_GIST＝P_LDB＝P_BoWIf the three points are not coincident, the final positioning is performedThe result is the center point location of the three nearest neighbors, namely:

Claims

1. A visual positioning method based on multi-modal data is characterized by comprising the following steps:

(1) establishing a position P-feature data W database, wherein the feature data W comprises longitude data L on, latitude data L at and three features of GIST, L DB and BoW, the three features of GIST, L DB and BoW are extracted from image information, the image information comprises a Color image Color, a Depth image Depth and an infrared image IR which are collected at a position P, and the extraction method comprises the steps of extracting GIST features from the Color, extracting L DB features from the Color, Depth and IR respectively, and splicing the three L DB features into L DB features;

(2) when positioning is needed, acquiring a Color image Color ', a Depth image Depth ' and an infrared image IR ' at a position to be positioned, longitude data L on ' and latitude data L at ', extracting GIST ' features from the Color ', respectively extracting L DB ' features from the Color ', Depth ' and IR ', splicing the three L DB ' features into one L DB ' feature, and extracting BoW ' features from the Color ';

(3) and (5) screening out approximate positions according to L on 'and L at', and meeting the following conditions:

where t is an approximate threshold, and 0< t <1, L on, L at are longitude data L on and latitude data L at of the position P stored in the database, respectively;

(4) respectively searching the nearest positions from the approximate positions screened out in the step (3) according to three characteristics of GIST ', L DB ' and BoW ', and respectively obtaining three nearest positions P_GIST,P_LDB,P_BoWWherein the distance between GIST and GIST ' features is calculated by using Euclidean distance, the distance between L DB and L DB ' features is calculated by using Hamming distance, and the distance between BoW and BoW ' features is calculated by using L1 fraction;

2. The method of claim 1, wherein the three L DB features are spliced into one L DB feature, and specifically, L DB features are extracted from Color, Depth and IR respectively and are respectively marked as L DBc, L DBd and L DBi, and the three L DB features are spliced end to end into one L DB feature.

3. The method of claim 1, wherein the location of the center point of the three nearest neighbor locations is obtained by: