CN112001970A - Monocular vision odometer method based on point-line characteristics - Google Patents


Info

Publication number
CN112001970A
CN112001970A (Application CN202010866388.9A)
Authority
CN
China
Prior art keywords
feature
point
depth
monocular
current frame
Prior art date
Legal status: Pending
Application number
CN202010866388.9A
Other languages
Chinese (zh)
Inventor
GUO Jifeng (郭继峰)
BAI Chengchao (白成超)
ZHENG Hongxing (郑红星)
GUO Shuang (郭爽)
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202010866388.9A
Publication of CN112001970A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/70 — Determining position or orientation of objects or cameras
    • G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/50 — Depth or shape recovery
    • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 — Image acquisition modality
    • G06T 2207/10016 — Video; Image sequence


Abstract

A monocular visual odometry method based on point-line features is directed at the scale-loss problem commonly faced by conventional monocular-camera visual navigation schemes. Relying on only a single monocular camera, the method meets visual navigation requirements in real environments, computes the position and attitude of the camera body in real time, and avoids the scale-loss problem. In addition, the added line-segment features make the invention more accurate and robust in man-made environments typified by indoor scenes and city streets.

Description

Monocular visual odometry method based on point-line features
Technical Field
The invention relates to the technical field of visual navigation, and in particular to a monocular visual odometry method based on point-line features.
Background
In recent years, with the rapid development of industries such as autonomous driving, AR/VR, and autonomous navigation of mobile robots, demand for visual navigation technology keeps increasing. Against this background, visual navigation techniques have emerged in great variety. Distinguished by the sensor hardware used, there are monocular cameras, binocular (stereo) cameras, RGB-D cameras, and so on; distinguished by the algorithm, there are the direct method, the feature-point method, and so on. Because of their stereoscopic perception capability, binocular and RGB-D cameras are currently used in more practical scenarios. However, the depth range of a binocular camera is determined by its baseline length, and extending that range brings unacceptable size and weight; RGB-D cameras, owing to their measurement principle, are likewise limited to indoor use. By comparison, a monocular camera has the advantages of small volume, low cost, and low power consumption, but because of its physical limitation it cannot observe the depth of the scene. Conventional monocular visual navigation schemes therefore commonly face the scale-loss problem: only the attitude can be estimated, while the displacement lacks scale, making them difficult to use in practical scenarios.
Disclosure of Invention
The purpose of the invention is to provide a monocular visual odometry method based on point-line features, directed at the scale-loss problem commonly faced by conventional monocular-camera visual navigation schemes.
The technical scheme adopted by the invention to solve the above technical problem is as follows:
A monocular visual odometry method based on point-line features comprises the following steps:
Step one: acquire, in real time, an image sequence captured by a monocular camera;
Step two: coarsely estimate the relative pose between two adjacent frames of the image sequence;
Step three: reproject the point-line features in the map onto the current frame according to the relative pose, and optimize their projected positions by minimizing the photometric error;
Step four: optimize the reprojection error of the point-line features; when the reprojection error converges, a trajectory estimate is obtained, comprising positions and attitudes at different times;
Step five: judge whether the current frame is a keyframe; if so, extract feature points and feature line segments from it and initialize their depth values with a monocular depth estimation method; if not, update the depth distributions of the seed points using the correspondence between the 2D point-line features in the current frame and the 3D points in the map;
Step six: when a seed point's depth distribution has converged, insert it into the map for use by the tracking thread.
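The six steps above form a tracking-and-mapping loop. The Python skeleton below sketches only its control flow on dummy data; every function body is a stubbed placeholder, and all names, thresholds, and feature counts are illustrative assumptions rather than the patent's implementation.

```python
import numpy as np

def estimate_relative_pose(prev_frame, frame):
    """Step two: coarse relative pose by minimizing photometric error (stubbed)."""
    return np.eye(4)                      # 4x4 identity = "no motion" placeholder

def align_features(rel_pose, frame, mapped_features):
    """Step three: reproject map point/line features and refine by feature alignment."""
    return mapped_features                # placeholder: positions unchanged

def optimize_reprojection(pose, features):
    """Step four: minimize reprojection error; returns the refined pose."""
    return pose

def is_keyframe(frame, n_tracked, threshold=50):
    """Step five: keyframe if too few tracked keypoints remain."""
    return n_tracked < threshold

trajectory = []
prev, pose = None, np.eye(4)
frames = [np.zeros((8, 8)) for _ in range(5)]   # dummy image sequence (step one)
n_tracked = [80, 60, 40, 70, 30]                # pretend tracked-feature counts

for frame, n in zip(frames, n_tracked):
    if prev is not None:
        rel = estimate_relative_pose(prev, frame)
        feats = align_features(rel, frame, [])
        pose = optimize_reprojection(pose @ rel, feats)
    trajectory.append(pose)
    if is_keyframe(frame, n):
        pass  # extract FAST points / LSD segments, init depths (steps five-six)
    prev = frame

print(len(trajectory))  # one pose per frame
```

With the stubs above, the loop simply emits one (identity) pose per frame; the point is the division of labor between the per-frame tracking path and the keyframe-only mapping path.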
Further, the photometric error optimization in step three is performed through feature alignment.
Further, the reprojection error of the point-line features in step four is optimized with the Levenberg-Marquardt (L-M) method or the Gauss-Newton method.
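As a concrete illustration of the Gauss-Newton option, the sketch below recovers a camera translation by iteratively minimizing the reprojection error of synthetic 3D points (rotation fixed at the identity; the focal length and point cloud are made-up test data). The real method optimizes a full pose over both point and line features.

```python
import numpy as np

f = 500.0  # assumed focal length in pixels (illustrative)

def project(X, t):
    """Pinhole projection of 3D points X after translating the camera by t."""
    p = X - t
    return f * p[:, :2] / p[:, 2:3]

rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
t_true = np.array([0.3, -0.2, 0.1])
u_obs = project(X, t_true)          # "observed" pixels, noise-free for clarity

t = np.zeros(3)                     # initial guess
for _ in range(20):
    p = X - t
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    r = (project(X, t) - u_obs).ravel()          # stacked 2D residuals
    J = np.zeros((2 * len(X), 3))                # Jacobian of residuals w.r.t. t
    J[0::2, 0] = -f / z
    J[0::2, 2] = f * x / z**2
    J[1::2, 1] = -f / z
    J[1::2, 2] = f * y / z**2
    delta = np.linalg.solve(J.T @ J, -J.T @ r)   # normal equations
    t = t + delta
    if np.linalg.norm(delta) < 1e-12:
        break

print(np.round(t, 4))   # converges to t_true = [0.3, -0.2, 0.1]
```

Levenberg-Marquardt differs only in damping the normal equations (solving `(JᵀJ + λI)δ = -Jᵀr`), which makes the iteration robust far from the minimum.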
Further, convergence of the reprojection error in step four is determined as follows: a threshold is set, and convergence is declared when the reprojection error falls below the threshold.
Further, the feature points in step five are extracted with the FAST feature point algorithm.
Further, the feature line segments in step five are extracted with the LSD (Line Segment Detector) algorithm.
Further, in step five, the depth values of the extracted feature points and feature line segments are initialized from the depth map output by the monocular depth estimation module.
Further, the condition for judging whether the current frame is a keyframe in step five is: a threshold is set, and the current frame is judged to be a keyframe when the number of remaining keypoints in it is less than the threshold.
Further, convergence of a seed point's depth distribution in step six is determined as follows: when the uncertainty of the depth distribution has decreased to one sixth of its initial value, the depth distribution of that seed point is considered converged.
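The convergence rule above can be illustrated with a minimal Gaussian depth filter for a single seed point: each observation is fused as a product of Gaussians, and the seed is declared converged once its standard deviation drops to one sixth of the initial uncertainty. The true depth, initial guess, and measurement variance are invented test values; a production filter (e.g. an SVO-style Gaussian-plus-uniform model) would also handle outlier measurements.

```python
import numpy as np

def fuse(mu, sigma2, z, tau2):
    """Fuse measurement z (variance tau2) into the N(mu, sigma2) depth estimate."""
    mu_new = (tau2 * mu + sigma2 * z) / (sigma2 + tau2)
    sigma2_new = sigma2 * tau2 / (sigma2 + tau2)
    return mu_new, sigma2_new

rng = np.random.default_rng(1)
true_depth = 4.2
mu, sigma2 = 5.0, 1.0          # initial guess, e.g. from monocular depth estimation
sigma_init = np.sqrt(sigma2)
tau2 = 0.05                    # assumed per-measurement variance

converged_at = None
for i in range(100):
    z = true_depth + rng.normal(0.0, np.sqrt(tau2))   # noisy depth observation
    mu, sigma2 = fuse(mu, sigma2, z, tau2)
    if np.sqrt(sigma2) <= sigma_init / 6:             # the one-sixth rule above
        converged_at = i + 1
        break

print(converged_at, round(mu, 2))
```

Note that with Gaussian fusion the variance shrinks deterministically, so the number of measurements needed to hit the one-sixth threshold depends only on the initial and per-measurement variances, not on the measured values.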
The beneficial effects of the invention are:
Relying on only a monocular camera, the invention meets visual navigation requirements in real environments and computes the position and attitude of the camera body in real time, avoiding the scale-loss problem. In addition, the added line-segment features make the invention more accurate and robust in man-made environments typified by indoor scenes and city streets.
Drawings
FIG. 1 is a schematic diagram of the network architecture of the monocular depth estimation module;
FIG. 2 is a test image for the monocular depth estimation module;
FIG. 3 is the depth map output by the monocular depth estimation module;
FIG. 4 shows how the error of the depth values predicted by the monocular depth estimation module varies with true distance;
FIG. 5 is a schematic diagram of the visual odometry method based on monocular depth estimation with point-line features incorporated;
FIG. 6 is a schematic diagram of model-based sparse image alignment;
FIG. 7 is a schematic diagram of feature alignment;
FIG. 8 is a schematic diagram of camera pose and scene structure optimization;
FIG. 9 is a diagram of the apparatus in accordance with an embodiment;
FIG. 10 is a plot of the accuracy of the present invention over time.
Detailed Description
To solve the problems in the prior art, the invention provides a method that uses only a monocular camera as the sensor, recovers the scale of the trajectory and the map structure by adding an auxiliary deep-learning-based monocular depth estimation module, and adds line features to improve the robustness of the system.
Embodiment one: this embodiment is described in detail with reference to FIG. 1. As shown in FIG. 5, the monocular visual odometry method based on point-line features of this embodiment comprises the following steps:
Step one: acquire, in real time, an image sequence captured by a monocular camera;
Step two: as shown in FIG. 6, for the two adjacent frames acquired in step one, coarsely estimate the relative pose between them by minimizing the photometric error of corresponding pixels;
Step three: as shown in FIG. 7, using the relative pose estimated in step two, reproject the point-line features in the map onto the current frame and optimize their positions through feature alignment;
Step four: as shown in FIG. 8, optimize the reprojection error of the point-line features, thereby refining the pose of the current frame and the map structure;
Step five: judge whether the current frame is a keyframe; if so, extract feature points and feature line segments and initialize their depths with the depth map output by the monocular depth estimation module; if not, update the depth distributions of the seed points using the correspondence between the 2D point-line features in the current frame and the 3D points in the map;
Step six: when the uncertainty of a seed point's depth distribution has become small enough, its depth distribution is considered converged and the point is inserted into the map for use by the tracking thread.
Steps two, three, and four together constitute the tracking thread; steps five and six constitute the mapping thread; the two threads run in parallel.
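A one-dimensional toy version of the photometric minimization in step two: find the shift (standing in for the relative pose) that minimizes the photometric error between two synthetic "frames". Brute force over candidate shifts is used for clarity; the actual method performs iterative optimization over the 6-DoF pose. All data here is synthetic.

```python
import numpy as np

x = np.arange(100)
frame0 = np.exp(-0.5 * ((x - 40) / 6.0) ** 2)   # a bright blob at pixel 40
frame1 = np.exp(-0.5 * ((x - 43) / 6.0) ** 2)   # same blob, shifted by 3 px

def photometric_error(shift):
    """Sum of squared intensity differences over a sparse patch, for one shift."""
    lo, hi = 20, 80
    return np.sum((frame1[lo + shift:hi + shift] - frame0[lo:hi]) ** 2)

best = min(range(-10, 11), key=photometric_error)
print(best)   # the shift that best explains frame1 given frame0
```

The recovered shift is 3, matching the displacement used to generate the second frame; in the full method the same photometric objective is evaluated over sparse patches around the point-line features.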
However, owing to the physical limitation of a monocular camera, the depth of the scene cannot be observed, so conventional monocular visual navigation schemes commonly face the scale-loss problem: only the attitude can be estimated, while the displacement lacks scale, making them difficult to use in practical scenarios. The invention initializes the depth values of the extracted feature points and feature line segments with a monocular depth estimation method and, compared with the prior art, recovers the scale of the motion trajectory.
Monocular depth estimation principle:
Given a monocular image I at test time, the goal is to learn a function f that predicts the depth of every pixel in the scene, i.e. a dense depth map d̂ = f(I).
During training, the depth estimation problem is converted into a left-right image reconstruction problem. The problem thus becomes: given a set of rectified binocular image pairs, if a mapping that recovers one image from the other can be learned through training, then the three-dimensional information of the captured scene, which contains the required depth, has been extracted.
Specifically, training simultaneously uses the left image I_l and the right image I_r captured at the same instant in the raw data. It should be emphasized that the algorithm does not regress depth directly: the neural network predicts the disparity maps d_l and d_r of the left and right images, which are combined with the original images I_l and I_r in the dataset to reconstruct the predicted left and right images I_l(d_r) and I_r(d_l). By the geometry of a rectified stereo pair, with the known baseline length b and focal length f of the binocular camera, the depth is computed from the disparity d as

    depth = (b · f) / d
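The reconstruction-by-disparity idea and the depth formula above can be sketched on a toy one-row "image". The baseline and focal length are assumed KITTI-like values, and nearest-neighbour sampling replaces the differentiable bilinear sampler a real network would use.

```python
import numpy as np

def reconstruct_left(right_row, disp_row):
    """Synthesize the left view by sampling the right view at x - d
    (rectified pair: a pixel at x in the left image appears at x - d
    in the right image). Nearest-neighbour sampling, clamped borders."""
    x = np.arange(len(right_row))
    src = np.clip(x - np.round(disp_row).astype(int), 0, len(right_row) - 1)
    return right_row[src]

left = np.arange(8, dtype=float)        # toy 1-row left image
disp = np.full(8, 2.0)                  # uniform predicted disparity of 2 px
right = np.roll(left, -2)               # right view: content shifted by the disparity
recon = reconstruct_left(right, disp)   # matches `left` away from the border

# depth = b * f / d, with baseline b and focal length f
b, f = 0.54, 721.0                      # assumed KITTI-like values (metres, pixels)
depth = b * f / disp[0]
print(recon, depth)
```

If the predicted disparity is correct, the reconstructed row matches the original left row (except at the clamped border), which is exactly the training signal; the same disparity then yields metric depth through b·f/d.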
The following embodiment demonstrates the beneficial effects of the invention:
Embodiment one:
1) Experimental design
To verify the correctness and rationality of the visual odometry, the experiment first verifies the effectiveness of the monocular depth estimation module, and then verifies the effectiveness of the point-line-feature visual odometry with the monocular depth estimation module embedded.
2) Results and analysis of the experiments
Images from the public KITTI dataset are selected to verify the depth estimation. The test image is shown in FIG. 2, and the depth map output by the network is shown in FIG. 3. One row of pixels is selected, and the variation of the predicted-depth error with true distance is computed, as shown in FIG. 4.
To test the visual odometry system, a camera is mounted on a cart platform and the estimated trajectory is compared against GPS positioning data over a satellite map, as shown in FIG. 9. Note that the GPS accuracy is not high here because of signal interference between tall buildings, so it serves only as a reference. The results show that the invention achieves sufficient navigation and positioning accuracy with only a monocular camera, enabling visual navigation of mobile platforms (vehicles, unmanned aerial vehicles, and the like).
It should be noted that the detailed description above only explains the technical solution of the invention and does not limit the scope of protection of the claims; modifications and variations that do not depart from the invention are intended to fall within the scope of the claims and the description.

Claims (10)

1. A monocular visual odometry method based on point-line features, characterized by comprising the following steps:
Step one: acquiring, in real time, an image sequence captured by a monocular camera;
Step two: coarsely estimating the relative pose between two adjacent frames of the image sequence;
Step three: reprojecting the point-line features in the map onto the current frame according to the relative pose, and optimizing their projected positions by minimizing the photometric error;
Step four: optimizing the reprojection error of the point-line features; when the reprojection error converges, obtaining a trajectory estimate comprising positions and attitudes at different times;
Step five: judging whether the current frame is a keyframe; if so, extracting feature points and feature line segments from it and initializing their depth values with a monocular depth estimation method; if not, updating the depth distributions of the seed points using the correspondence between the 2D point-line features in the current frame and the 3D points in the map;
Step six: when a seed point's depth distribution has converged, inserting it into the map for use by the tracking thread.
2. The method of claim 1, wherein in step two the relative pose between two adjacent frames of the image sequence is estimated by minimizing the photometric error of corresponding pixels.
3. The method of claim 1, wherein the photometric error optimization in step three is performed through feature alignment.
4. The method of claim 1, wherein the reprojection error of the point-line features in step four is optimized with the Levenberg-Marquardt method or the Gauss-Newton method.
5. The method of claim 1, wherein convergence of the reprojection error in step four is determined as follows: a threshold is set, and convergence is declared when the reprojection error falls below the threshold.
6. The method of claim 1, wherein the feature points in step five are extracted with the FAST feature point algorithm.
7. The method of claim 1, wherein the feature line segments in step five are extracted with the LSD algorithm.
8. The method of claim 1, wherein in step five the depth values of the extracted feature points and feature line segments are initialized from the depth map output by the monocular depth estimation module.
9. The method of claim 1, wherein the condition for judging whether the current frame is a keyframe in step five is: a threshold is set, and the current frame is judged to be a keyframe when the number of remaining keypoints in it is less than the threshold.
10. The method of claim 1, wherein convergence of a seed point's depth distribution in step six is determined as follows: when the uncertainty of the depth distribution has decreased to one sixth of its initial value, the depth distribution of that seed point is considered converged.
CN202010866388.9A 2020-08-25 2020-08-25 Monocular vision odometer method based on point-line characteristics Pending CN112001970A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010866388.9A CN112001970A (en) 2020-08-25 2020-08-25 Monocular vision odometer method based on point-line characteristics

Publications (1)

Publication Number Publication Date
CN112001970A true CN112001970A (en) 2020-11-27

Family

ID=73472124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010866388.9A Pending CN112001970A (en) 2020-08-25 2020-08-25 Monocular vision odometer method based on point-line characteristics

Country Status (1)

Country Link
CN (1) CN112001970A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113376669A (en) * 2021-06-22 2021-09-10 东南大学 Monocular VIO-GNSS fusion positioning algorithm based on dotted line characteristics

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106846417A (en) * 2017-02-06 2017-06-13 东华大学 The monocular infrared video three-dimensional rebuilding method of view-based access control model odometer
CN109544636A (en) * 2018-10-10 2019-03-29 广州大学 A kind of quick monocular vision odometer navigation locating method of fusion feature point method and direct method
CN110108258A (en) * 2019-04-09 2019-08-09 南京航空航天大学 A kind of monocular vision odometer localization method
CN110631554A (en) * 2018-06-22 2019-12-31 北京京东尚科信息技术有限公司 Robot posture determining method and device, robot and readable storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Dan et al.: "Simultaneous localization and mapping algorithm for monocular vision based on point-line features", Robot (《机器人》) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
