CN112562081A - Visual map construction method for visual layered positioning - Google Patents

Visual map construction method for visual layered positioning

Info

Publication number
CN112562081A
CN112562081A (application CN202110175262.1A)
Authority
CN
China
Prior art keywords
frame
visual
superpoint
points
key
Prior art date
Legal status
Granted
Application number
CN202110175262.1A
Other languages
Chinese (zh)
Other versions
CN112562081B (en)
Inventor
朱世强
钟心亮
顾建军
姜峰
李特
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202110175262.1A
Publication of CN112562081A
Application granted
Publication of CN112562081B
Legal status: Active (granted)

Classifications

    • G06T 17/05 — Geographic models (three-dimensional [3D] modelling; data description of 3D objects)
    • G06N 3/045 — Combinations of neural networks
    • G06N 3/08 — Neural network learning methods
    • G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2200/04 — Indexing scheme involving 3D image data
    • G06T 2207/20081 — Training; Learning
    • G06T 2207/20084 — Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visual map construction method for visual layered positioning, comprising the following steps: 1) sequentially acquiring binocular image frame data collected by a binocular camera and determining the motion trajectory; 2) extracting a NetVLAD global descriptor, SuperPoint feature points and local descriptors from each image frame; 3) finding the matching key points of the feature points of each image frame and incrementally recovering the 3D position of each feature point; 4) determining the optimal co-visibility key frames of each frame according to the 2D observations of its 3D feature points; 5) generating the visual map information finally used for visual layered positioning. By combining deep learning features, the method enhances the descriptive power of the visual map, and the generated map for visual positioning contains multi-level description information and can be used for globally robust visual positioning in a consistent coordinate system.

Description

Visual map construction method for visual layered positioning
Technical Field
The invention relates to the technical field of computer vision, in particular to a visual map construction method for visual layered positioning.
Background
In recent years, with the continuous development of computer vision, SLAM technology has been widely applied in fields such as virtual reality, augmented reality, robotics, and unmanned aerial vehicles. With the continuous development of computer hardware, real-time processing of visual information has become possible; real-time localization and map construction using visual information greatly increases the amount of acquired information while reducing the cost of intelligent robot products.
However, most visual SLAM systems focus on online pose estimation, lack a localization capability in a global coordinate system, and their outputs usually do not include an explicit visual-map representation. Traditional methods rely on bag-of-words models and matching of hand-crafted feature points, which have limited adaptability to the environment and a low localization success rate.
Disclosure of Invention
In view of the above, the invention provides a visual map construction method for visual layered positioning, which combines deep learning with a visual map construction method and representation, can construct a globally consistent map, and supports vision-based layered, globally consistent positioning.
The invention adopts the following technical scheme: a visual map construction method for visual layered positioning comprises the following steps:
(1) Sequentially acquire the binocular image frames collected by the binocular camera and record the motion trajectory of the camera, obtaining a binocular image sequence containing the motion trajectory F = {(I_i^l, I_i^r, T_ex, T_i)}, i = 1, ..., N, where I_i^l denotes the left image of the i-th frame, I_i^r denotes the right image of the i-th frame, T_ex denotes the extrinsic parameters of the binocular camera, T_i denotes the pose of the left camera of the i-th frame relative to the world coordinate system, and N is the number of binocular image frames. Take the first frame as a key frame, and select subsequent key frames from the binocular image sequence F to form a key frame sequence KF = {KF_j}, j = 1, ..., M, where the number of key frames M does not exceed the number of binocular image frames N.
(2) For each key frame, respectively extract a NetVLAD global descriptor V_j, SuperPoint feature points P_j and the corresponding local descriptors D_j. Sort the SuperPoint feature points P_j by response value from high to low and keep the first 2000 feature points and their corresponding local descriptors, so that for the j-th key frame the triple description information (V_j, P_j, D_j) is obtained. The NetVLAD global descriptor V_j is a feature vector of fixed dimension, and each local descriptor in D_j is a vector of fixed dimension.
(3) Find the matching points of the SuperPoint feature points P_j of each key frame and incrementally recover the 3D position X of each SuperPoint feature point.
(4) Traverse each 3D position X obtained in step (3) and count all key frames that observe X; for each key frame, sort the other key frames in descending order of the number of co-observed 3D points and take the first 5 as its optimal co-visibility key frames. Finally, take the key frame sequence and the 3D position X corresponding to each SuperPoint feature point as the visual map information.
Further, the selection of subsequent key frames must satisfy the following condition: the Euclidean distance between adjacent left (or right) frame images is greater than 0.3 m and their rotation angle is greater than 3 degrees.
Further, the step (3) comprises the following sub-steps:
(3.1) For the SuperPoint feature point p in a certain triple description information (V_j, P_j, D_j), select the candidate frame set C of key frames that can simultaneously observe p, where each candidate frame A is selected from the key frames and satisfies: the pose T_A of the key frame in the candidate frame set C and the pose T_j of the current key frame differ by a translation of less than 10 m and a rotation angle of less than 45 degrees;
(3.2) Traverse each frame in the candidate frame set C. For the local descriptors D_A corresponding to the feature points of candidate frame A, compute the distance between the local descriptor d of p and each descriptor in D_A, and obtain the two nearest local descriptors d_1 and d_2. If they satisfy the nearest-neighbour ratio test (the nearest distance is sufficiently smaller than the second-nearest distance), the feature point corresponding to d_1 and the SuperPoint feature point p are mutually matching points. After the traversal is finished, the set of all matching points of p in the candidate frame set C is obtained. The matching information associated with the 3D position X of p consists of the matched 2D observations and the poses of the key frames that observe them. According to the intrinsic parameters K of the binocular camera, a constraint equation is established to solve X; for each matched observation (u_k, v_k) in key frame k with pose (R_k, t_k), the constraint equation is s_k [u_k, v_k, 1]^T = K (R_k X + t_k). If the solved X has a negative depth Z or a depth Z greater than 40 m, X is discarded.
(3.3) Traverse each SuperPoint feature point p and obtain the 3D position corresponding to each SuperPoint feature point.
Compared with the prior art, the invention has the following beneficial effects. The method combines the NetVLAD global image descriptor used for image retrieval with the convolutional-neural-network-based SuperPoint local feature points and descriptors; their combination enhances the descriptive power of the visual map and decouples visual localization into global and local localization. The generated map for visual localization contains multi-level description information: the global information includes the size information of the map, the number of feature points and the corresponding 3D point positions, the number of key frames and the NetVLAD descriptor of each key frame; the local information includes the pose of each key frame, the positions of the feature points in each key frame and the corresponding SuperPoint descriptors, as well as the optimal co-visibility key frame indices of each key frame. Combined, the two can be used for globally robust visual positioning in a consistent coordinate system. On the one hand, this solves the problem that traditional SLAM systems cannot reuse maps; on the other hand, it improves the robustness of global positioning.
Drawings
FIG. 1 is a flow chart of a visual map construction method for visual layered positioning according to the present invention;
FIG. 2 is a schematic diagram of multi-view observation recovery 3D feature points according to the present invention;
FIG. 3 is a schematic diagram of a visual map containing information according to the present invention;
FIG. 4 is a schematic view of the layered global positioning based on the visual map according to the present invention.
Detailed Description
The principles and aspects of the present invention will be further explained with reference to the drawings and the detailed description, it being understood that the illustrated embodiments are only some examples and the specific embodiments described herein are merely illustrative of the relevant invention and are not intended to limit the invention.
Fig. 1 is a schematic flow chart of a visual map construction method for visual layered positioning according to the present invention, where the visual map construction method includes the following steps:
(1) Install the binocular camera on the robot body, move the robot and start collecting data. Sequentially acquire the binocular image frames collected by the binocular camera and record the motion trajectory of the camera, obtaining a binocular image sequence containing the motion trajectory F = {(I_i^l, I_i^r, T_ex, T_i)}, i = 1, ..., N, where I_i^l denotes the left image of the i-th frame, I_i^r denotes the right image of the i-th frame, T_ex denotes the extrinsic parameters of the binocular camera, T_i denotes the pose of the left camera of the i-th frame relative to the world coordinate system, and N is the number of binocular image frames. Take the first frame as a key frame, and select subsequent key frames from the binocular image sequence F to form a key frame sequence KF = {KF_j}, j = 1, ..., M, where the number of key frames M does not exceed the number of binocular image frames N. The selection of subsequent key frames must satisfy the following condition: the Euclidean distance between adjacent left (or right) frame images is greater than 0.3 m and their rotation angle is greater than 3 degrees.
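For illustration, the key-frame selection rule above can be sketched in Python as follows. This is a minimal sketch, not part of the patent: comparing each frame against the last selected key frame is an assumption, and SciPy is used only to compute the rotation angle.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def select_keyframes(poses, t_thresh=0.3, r_thresh_deg=3.0):
    """poses: list of 4x4 left-camera-to-world matrices T_i, one per binocular frame.
    Returns indices of selected key frames; the first frame is always a key frame."""
    keyframes = [0]
    for i in range(1, len(poses)):
        T_ref, T_cur = poses[keyframes[-1]], poses[i]
        # Euclidean distance between camera centres
        dist = np.linalg.norm(T_cur[:3, 3] - T_ref[:3, 3])
        # Relative rotation angle in degrees
        dR = T_ref[:3, :3].T @ T_cur[:3, :3]
        angle = np.degrees(np.linalg.norm(R.from_matrix(dR).as_rotvec()))
        # Thresholds (0.3 m, 3 degrees) follow the condition stated in the text
        if dist > t_thresh and angle > r_thresh_deg:
            keyframes.append(i)
    return keyframes
```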
(2) For each key frame, extract a NetVLAD global descriptor V_j based on a convolutional neural network; the NetVLAD global descriptor V_j is a feature vector of fixed dimension. Extract the SuperPoint feature points P_j of the key frame and the corresponding local descriptors D_j based on a convolutional neural network; each local descriptor in D_j is a vector of fixed dimension. Sort the SuperPoint feature points P_j by response value from high to low and keep the first 2000 feature points and their corresponding local descriptors, so that for the j-th key frame the triple description information (V_j, P_j, D_j) is obtained.
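A minimal sketch of this per-key-frame description step, assuming `netvlad_model` and `superpoint_model` are pre-trained network wrappers (hypothetical interfaces, since the patent does not fix an implementation) that return a global vector and (keypoints, responses, descriptors) respectively:

```python
import numpy as np

MAX_POINTS = 2000  # the 2000 highest-response SuperPoint features are kept

def describe_keyframe(image, netvlad_model, superpoint_model):
    """Build the triple description (global descriptor, keypoints, local descriptors)."""
    v_global = netvlad_model(image)                    # fixed-dimensional global descriptor
    kpts, responses, descs = superpoint_model(image)   # Nx2 points, N responses, NxD descriptors
    order = np.argsort(-responses)[:MAX_POINTS]        # sort by response, high to low, keep the top 2000
    return v_global, kpts[order], descs[order]
```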
(3) To find the matching points of the SuperPoint feature points P_j of the key frames and incrementally recover the 3D position X of each SuperPoint feature point, it is necessary to find, within the key frame sequence KF, the 2D point locations that observe the same 3D position X, as shown in fig. 2. This comprises the following sub-steps:
(3.1) For the SuperPoint feature point p in a certain triple description information (V_j, P_j, D_j), select the candidate frame set C of key frames that can simultaneously observe p, where each candidate frame A is selected from the key frames and satisfies: the pose T_A of the key frame in the candidate frame set C and the pose T_j of the current key frame differ by a translation of less than 10 m and a rotation angle of less than 45 degrees;
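A sketch of this candidate-frame selection, using the 10 m / 45° thresholds stated above (function and variable names are illustrative, not from the patent):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def candidate_frames(kf_poses, j, max_trans=10.0, max_rot_deg=45.0):
    """Return indices of key frames whose pose is close enough to key frame j
    to plausibly observe the same SuperPoint feature points."""
    T_j = kf_poses[j]
    cands = []
    for a, T_a in enumerate(kf_poses):
        if a == j:
            continue
        trans = np.linalg.norm(T_a[:3, 3] - T_j[:3, 3])
        dR = T_j[:3, :3].T @ T_a[:3, :3]
        rot = np.degrees(np.linalg.norm(R.from_matrix(dR).as_rotvec()))
        if trans < max_trans and rot < max_rot_deg:
            cands.append(a)
    return cands
```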
(3.2) Traverse each frame in the candidate frame set C. For the local descriptors D_A corresponding to the feature points of candidate frame A, compute the distance between the local descriptor d of p and each descriptor in D_A, and obtain the two nearest local descriptors d_1 and d_2. If they satisfy the nearest-neighbour ratio test (the nearest distance is sufficiently smaller than the second-nearest distance), the feature point corresponding to d_1 and the SuperPoint feature point p are mutually matching points. After the traversal is finished, the set of all matching points of p in the candidate frame set C is obtained, from which the corresponding 3D point set is recovered. The matching information associated with the 3D position X of the SuperPoint feature point p consists of the matched 2D observations and the poses of the key frames that observe them. According to the intrinsic parameters K of the binocular camera, a constraint equation is established to solve X; for each matched observation (u_k, v_k) in key frame k with pose (R_k, t_k), the constraint equation is s_k [u_k, v_k, 1]^T = K (R_k X + t_k). If the solved X has a negative depth Z or a depth Z greater than 40 m, X is discarded.
(3.3) Traverse each SuperPoint feature point p and obtain the 3D position corresponding to each SuperPoint feature point;
(3.4) Traverse each feature point of every key frame in the whole key frame sequence KF to obtain the corresponding 3D value of each feature point, finally obtaining the basic information of the whole map.
(4) The 3D position X obtained for each SuperPoint feature point will be observed by multiple frames. Traverse each X, count all key frames that observe the 3D position X, and for each key frame sort the other key frames in descending order of the number of co-observed 3D points, taking the first 5 as its optimal co-visibility key frames. Finally, the key frame sequence and the 3D position X corresponding to each SuperPoint feature point are taken as the visual map information. As shown in fig. 3, the final map information (taking the left camera of the binocular camera as an example) can be represented as map basic information and feature information: the basic information includes the map size information, the number of key frames, the number of feature points and the number of 3D points; the feature information includes the key frames, the key frame global descriptors, the co-visibility key frames, the feature points, the feature point descriptors and the set of 3D points corresponding to the feature points.
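A compact sketch of the resulting map structure described above, using Python dataclasses; the field names and layout are illustrative, since the patent specifies the content of the map rather than a storage format:

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class KeyFrame:
    pose: np.ndarray           # 4x4 pose of the left camera in the world frame
    global_desc: np.ndarray    # NetVLAD global descriptor
    keypoints: np.ndarray      # Nx2 SuperPoint feature point locations
    descriptors: np.ndarray    # NxD SuperPoint local descriptors
    point3d_ids: List[int] = field(default_factory=list)  # 3D point id per keypoint, -1 if untriangulated
    covis_ids: List[int] = field(default_factory=list)    # indices of the 5 best co-visibility key frames

@dataclass
class VisualMap:
    keyframes: List[KeyFrame]  # key frames with poses, descriptors and co-visibility links
    points3d: np.ndarray       # Mx3 array of triangulated SuperPoint feature positions
```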
The visual map information obtained by the visual map construction method is stored and loaded on the robot; the robot is then restarted and placed in the environment for positioning, as shown in fig. 4. First, the features of the collected image are extracted: a NetVLAD global descriptor, SuperPoint feature points and descriptors. The NetVLAD global descriptor of the current frame is compared with the set of global descriptors in the map information by computing Euclidean distances, yielding the closest key frame in the map; this completes the first layer of the hierarchical positioning, i.e. the global coarse positioning, and indicates that the current frame is located near the closest key frame. Then the 5 optimal co-visibility key frames corresponding to that key frame are found, the current frame is feature-matched against the closest key frame and its 5 optimal co-visibility key frames by the method of step (3.2), 3D-2D matches are obtained according to the data association of step (3.3), and finally the 6-DoF pose is obtained by a PnP solution, completing the fine positioning function of the layered positioning.
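A sketch of the two-layer localization procedure just described: coarse retrieval of the nearest key frame by Euclidean distance between NetVLAD descriptors, then fine 6-DoF pose estimation by PnP over 2D-3D matches. It reuses the illustrative `match_ratio_test`, `KeyFrame` and `VisualMap` helpers sketched earlier and OpenCV's solvePnPRansac for the PnP step; it is a sketch under those assumptions, not the patent's implementation.

```python
import numpy as np
import cv2

def localize(query_global_desc, query_kpts, query_descs, vmap, K):
    """Hierarchical localization of one query frame against a VisualMap.
    Returns (rvec, tvec) of the query camera, or None on failure."""
    # Layer 1: global coarse localization by NetVLAD retrieval
    dists = [np.linalg.norm(kf.global_desc - query_global_desc) for kf in vmap.keyframes]
    best = int(np.argmin(dists))
    frames = [best] + list(vmap.keyframes[best].covis_ids)   # nearest key frame + its 5 co-visibility frames

    # Layer 2: local fine localization by 2D-3D matching and PnP
    pts3d, pts2d = [], []
    for f in frames:
        kf = vmap.keyframes[f]
        for q_idx, d in enumerate(query_descs):
            m = match_ratio_test(d, kf.descriptors)           # ratio-test matcher from the earlier sketch
            if m is not None and kf.point3d_ids[m] >= 0:
                pts3d.append(vmap.points3d[kf.point3d_ids[m]])
                pts2d.append(query_kpts[q_idx])
    if len(pts3d) < 4:
        return None
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        np.asarray(pts3d, dtype=np.float64),
        np.asarray(pts2d, dtype=np.float64),
        K, None)
    return (rvec, tvec) if ok else None
```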
Table 1 gives a comparison of the visual mapping method of the present invention with some previously disclosed SLAM methods. The sparse/dense criterion in Table 1 refers to the number of 3D points in the map: if the map is constructed using only a small number of feature points, such as the 2000 feature points selected in this scheme, the map is sparse; if all pixel points of the selected images are reconstructed, the map is considered dense. In Table 1, the coarse positioning mode DBoW is a bag-of-words model, which requires pre-training a feature dictionary for image retrieval and finally outputs a word vector for coarse positioning for each image.
According to the comparison in Table 1, the prior-art schemes basically do not consider map reuse, which means that they cannot perform globally consistent positioning and that each positioning result depends on the initial position at which the data were acquired.
TABLE 1 Comparison of the performance of the positioning method of the present invention with prior-art methods
In terms of robustness, Table 2 compares the present invention with VINS-Mono, the scheme closest to the present invention, on the collected data set. Because the positioning precision of the invention is tied to the camera poses used for map construction, only the coarse positioning result can be compared with VINS-Mono. The total number of frames of 7 sequences is counted, and the number of frames that pass the same place and are captured at least twice in a sequence is counted and recorded as the number of loop frames. The 7 image sequences contain challenging factors such as motion blur, occlusion and scene illumination changes. As can be seen from the results in Table 2, the present invention greatly improves the success rate and robustness of coarse positioning; since coarse positioning precedes fine positioning, this benefits the whole positioning system.
TABLE 2 Comparison of the coarse positioning of the present invention with the VINS-Mono method
Therefore, the visual map construction method of the invention solves, on the one hand, the problem that traditional SLAM systems cannot reuse maps, and on the other hand improves the robustness of global positioning.
The above are merely preferred embodiments of the present invention, and the scope of the invention is not limited thereto. Any equivalent replacement or modification made by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.

Claims (3)

1. A visual map construction method for visual layered positioning, characterized in that the method comprises the following steps:
(1) sequentially acquiring the binocular image frames collected by the binocular camera and recording the motion trajectory of the camera to obtain a binocular image sequence containing the motion trajectory F = {(I_i^l, I_i^r, T_ex, T_i)}, i = 1, ..., N, where I_i^l denotes the left image of the i-th frame, I_i^r denotes the right image of the i-th frame, T_ex denotes the extrinsic parameters of the binocular camera, T_i denotes the pose of the left camera of the i-th frame relative to the world coordinate system, and N is the number of binocular image frames; taking the first frame as a key frame and selecting subsequent key frames from the binocular image sequence F to form a key frame sequence KF = {KF_j}, j = 1, ..., M, where the number of key frames M does not exceed the number of binocular image frames N;
(2) for each key frame, respectively extracting a NetVLAD global descriptor V_j, SuperPoint feature points P_j and the corresponding local descriptors D_j; sorting the SuperPoint feature points P_j by response value from high to low and keeping the first 2000 feature points and their corresponding local descriptors, so that for the j-th key frame the triple description information (V_j, P_j, D_j) is obtained; the NetVLAD global descriptor V_j is a feature vector of fixed dimension, and each local descriptor in D_j is a vector of fixed dimension;
(3) finding the matching points of the SuperPoint feature points P_j of each key frame and incrementally recovering the 3D position X of each SuperPoint feature point;
(4) traversing each 3D position X obtained in step (3) and counting all key frames that observe X; for each key frame, sorting the other key frames in descending order of the number of co-observed 3D points and taking the first 5 as its optimal co-visibility key frames; finally, taking the key frame sequence and the 3D position X corresponding to each SuperPoint feature point as the visual map information.
2. The visual map construction method for visual layered positioning according to claim 1, wherein the selection of subsequent key frames must satisfy the following condition: the Euclidean distance between adjacent left (or right) frame images is greater than 0.3 m and their rotation angle is greater than 3 degrees.
3. The visual map construction method for visual layered positioning according to claim 1, wherein step (3) comprises the following sub-steps:
(3.1) for the SuperPoint feature point p in a certain triple description information (V_j, P_j, D_j), selecting the candidate frame set C of key frames that can simultaneously observe p, where each candidate frame A is selected from the key frames and satisfies: the pose T_A of the key frame in the candidate frame set C and the pose T_j of the current key frame differ by a translation of less than 10 m and a rotation angle of less than 45 degrees;
(3.2) traversing each frame in the candidate frame set C; for the local descriptors D_A corresponding to the feature points of candidate frame A, computing the distance between the local descriptor d of p and each descriptor in D_A and obtaining the two nearest local descriptors d_1 and d_2; if they satisfy the nearest-neighbour ratio test (the nearest distance is sufficiently smaller than the second-nearest distance), the feature point corresponding to d_1 and the SuperPoint feature point p are mutually matching points; after the traversal is finished, the set of all matching points of p in the candidate frame set C is obtained; the matching information associated with the 3D position X of the SuperPoint feature point p consists of the matched 2D observations and the poses of the key frames that observe them; according to the intrinsic parameters K of the binocular camera, a constraint equation is established to solve X, where for each matched observation (u_k, v_k) in key frame k with pose (R_k, t_k) the constraint equation is s_k [u_k, v_k, 1]^T = K (R_k X + t_k); if the solved X has a negative depth Z or a depth Z greater than 40 m, X is discarded;
(3.3) traversing each SuperPoint feature point p and obtaining the 3D position corresponding to each SuperPoint feature point.
CN202110175262.1A 2021-02-07 2021-02-07 Visual map construction method for visual layered positioning Active CN112562081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110175262.1A CN112562081B (en) 2021-02-07 2021-02-07 Visual map construction method for visual layered positioning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110175262.1A CN112562081B (en) 2021-02-07 2021-02-07 Visual map construction method for visual layered positioning

Publications (2)

Publication Number Publication Date
CN112562081A true CN112562081A (en) 2021-03-26
CN112562081B CN112562081B (en) 2021-05-11

Family

ID=75035905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110175262.1A Active CN112562081B (en) 2021-02-07 2021-02-07 Visual map construction method for visual layered positioning

Country Status (1)

Country Link
CN (1) CN112562081B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781563A (en) * 2021-09-14 2021-12-10 中国民航大学 Mobile robot loop detection method based on deep learning
CN114639006A (en) * 2022-03-15 2022-06-17 北京理工大学 Loop detection method and device and electronic equipment
CN114674328A (en) * 2022-03-31 2022-06-28 北京百度网讯科技有限公司 Map generation method, map generation device, electronic device, storage medium, and vehicle
CN114694013A (en) * 2022-04-11 2022-07-01 北京理工大学 Distributed multi-machine cooperative vision SLAM method and system
CN115049731A (en) * 2022-06-17 2022-09-13 感知信息科技(浙江)有限责任公司 Visual mapping and positioning method based on binocular camera

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070615A (en) * 2019-04-12 2019-07-30 北京理工大学 A kind of panoramic vision SLAM method based on polyphaser collaboration
CN111292420A (en) * 2020-02-28 2020-06-16 北京百度网讯科技有限公司 Method and device for constructing map
CN111652934A (en) * 2020-05-12 2020-09-11 Oppo广东移动通信有限公司 Positioning method, map construction method, device, equipment and storage medium
CN111768498A (en) * 2020-07-09 2020-10-13 中国科学院自动化研究所 Visual positioning method and system based on dense semantic three-dimensional map and mixed features

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070615A (en) * 2019-04-12 2019-07-30 北京理工大学 A kind of panoramic vision SLAM method based on polyphaser collaboration
CN111292420A (en) * 2020-02-28 2020-06-16 北京百度网讯科技有限公司 Method and device for constructing map
CN111652934A (en) * 2020-05-12 2020-09-11 Oppo广东移动通信有限公司 Positioning method, map construction method, device, equipment and storage medium
CN111768498A (en) * 2020-07-09 2020-10-13 中国科学院自动化研究所 Visual positioning method and system based on dense semantic three-dimensional map and mixed features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAO HAN et al.: "SuperPointVO: A Lightweight Visual Odometry based on CNN Feature Extraction", IEEE *
唐灿 et al.: "A Survey of Image Feature Detection and Matching Methods", Journal of Nanjing University of Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781563A (en) * 2021-09-14 2021-12-10 中国民航大学 Mobile robot loop detection method based on deep learning
CN113781563B (en) * 2021-09-14 2023-10-24 中国民航大学 Mobile robot loop detection method based on deep learning
CN114639006A (en) * 2022-03-15 2022-06-17 北京理工大学 Loop detection method and device and electronic equipment
CN114639006B (en) * 2022-03-15 2023-09-26 北京理工大学 Loop detection method and device and electronic equipment
CN114674328A (en) * 2022-03-31 2022-06-28 北京百度网讯科技有限公司 Map generation method, map generation device, electronic device, storage medium, and vehicle
CN114694013A (en) * 2022-04-11 2022-07-01 北京理工大学 Distributed multi-machine cooperative vision SLAM method and system
CN114694013B (en) * 2022-04-11 2022-11-15 北京理工大学 Distributed multi-machine cooperative vision SLAM method and system
CN115049731A (en) * 2022-06-17 2022-09-13 感知信息科技(浙江)有限责任公司 Visual mapping and positioning method based on binocular camera

Also Published As

Publication number Publication date
CN112562081B (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN112562081B (en) Visual map construction method for visual layered positioning
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN103577793B (en) Gesture identification method and device
Li et al. Object detection in the context of mobile augmented reality
CN107016319B (en) Feature point positioning method and device
CN110555408B (en) Single-camera real-time three-dimensional human body posture detection method based on self-adaptive mapping relation
CN112598775B (en) Multi-view generation method based on contrast learning
CN109272577B (en) Kinect-based visual SLAM method
Laga A survey on deep learning architectures for image-based depth reconstruction
CN110119768B (en) Visual information fusion system and method for vehicle positioning
CN110942476A (en) Improved three-dimensional point cloud registration method and system based on two-dimensional image guidance and readable storage medium
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
Dharmasiri et al. MO-SLAM: Multi object slam with run-time object discovery through duplicates
CN115147599A (en) Object six-degree-of-freedom pose estimation method for multi-geometric feature learning of occlusion and truncation scenes
Wang et al. Joint head pose and facial landmark regression from depth images
CN113592015B (en) Method and device for positioning and training feature matching network
CN111402331A (en) Robot repositioning method based on visual word bag and laser matching
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
Shin et al. Loop closure detection in simultaneous localization and mapping using descriptor from generative adversarial network
CN111626417A (en) Closed loop detection method based on unsupervised deep learning
CN110070626B (en) Three-dimensional object retrieval method based on multi-view classification
Lu et al. Model and exemplar-based robust head pose tracking under occlusion and varying expression
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning
Song et al. ConcatNet: A deep architecture of concatenation-assisted network for dense facial landmark alignment
Ma et al. Capsule-based regression tracking via background inpainting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant