CN112085790A - Point-line combined multi-camera visual SLAM method, equipment and storage medium - Google Patents


Info

Publication number
CN112085790A
Authority
CN
China
Prior art keywords
line
features
point
camera
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010819166.1A
Other languages
Chinese (zh)
Inventor
史文中
李铁维
王牧阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Research Institute HKPU
Original Assignee
Shenzhen Research Institute HKPU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Research Institute HKPU filed Critical Shenzhen Research Institute HKPU
Priority to CN202010819166.1A priority Critical patent/CN112085790A/en
Publication of CN112085790A publication Critical patent/CN112085790A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05Geographic models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/80Geometric correction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/08Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a point-line combined multi-camera visual SLAM method, device, and storage medium. Multi-angle image data of a target scene are collected; point features and line features in the multi-angle image data are extracted and matched, and their position information in three-dimensional space is obtained; a preliminary camera pose estimate is computed for each frame of the multi-angle image data, and a graph structure is constructed by combining the extracted and matched point features, line features, and the preliminary camera pose estimates; a three-dimensional map is then determined from the extracted point features, line features, and the graph structure. Because point and line features are computed jointly and line features carry more information, tracking stability and accuracy are improved, and the sparse feature map constructed with line features gives a clearer and more intuitive abstract description of the scene.

Description

Point-line combined multi-camera visual SLAM method, equipment and storage medium
Technical Field
The invention relates to the technical field of computer vision, and in particular to a point-line combined multi-camera visual SLAM method, device, and storage medium.
Background
The SLAM algorithm is widely applied to autonomous navigation and environment recognition of robots in scenes such as augmented reality, aerospace, and underwater operation, and has important theoretical significance and practical value. Among SLAM variants, visual SLAM based on sequential video data has been a research hotspot owing to advantages such as low cost and portability. How to extract representative, trackable features from the image data, how to describe them with a suitable algebraic language, and how to fully exploit this information to recover the camera pose and the scene structure are the key concerns of a SLAM algorithm.
The feature-based method is the mainstream approach in conventional visual SLAM: abstract geometric features are extracted from images, data association is performed to solve the relative pose relationships between frames, the camera trajectory is recovered, and a sparse feature map is constructed. Conventional feature-based visual SLAM has the following shortcomings: (1) the camera used in a typical visual SLAM system has a narrow field of view and acquires limited information within a single frame; (2) the point features used in most algorithms are low-dimensional, contain less scene information than higher-dimensional features, and are easily lost when image quality is poor or the camera moves too fast, so the tracking result is unstable and the constructed map is not intuitive and hardly reflects the real scene. The feature-based visual SLAM of the prior art therefore cannot meet the requirements of stable and highly accurate feature tracking.
Therefore, the prior art is subject to further improvement.
Disclosure of Invention
In view of the defects of the prior art, the present invention aims to provide a point-line combined multi-camera visual SLAM method, device, and storage medium, so as to overcome the defect that feature tracking with the feature-based method in the prior art cannot meet the requirements of stability and precision.
The technical scheme of the invention is as follows:
in a first aspect, the present embodiment discloses a point-line combined multi-camera visual SLAM method, including:
collecting multi-angle image data of a target scene;
extracting and matching point features and line features in the multi-angle image data, and obtaining position information of the point features and the line features in a three-dimensional space;
performing camera pose preliminary estimation on each frame of image in the multi-angle image data, and constructing a graph structure by combining the extracted and matched point features, line features and the camera pose preliminary estimation result;
determining a three-dimensional map from the extracted point features, line features, and the graph structure.
Optionally, before the step of extracting the point feature and the line feature in the multi-angle image data, the method further includes:
carrying out distortion correction processing on the acquired multi-angle image data to obtain distortion-corrected intermediate image data;
and performing mask processing on the intermediate image data by using a preset camera mask to obtain the multi-angle image data after the preprocessing.
Optionally, the step of extracting the point feature and the line feature in the multi-angle image data includes:
extracting corner features in the multi-angle image data by using a feature point extraction algorithm, describing the corner features with the rBRIEF descriptor, and performing feature matching based on the corner descriptions to obtain the extracted point features;
extracting line segment features in the image by using a line feature extraction algorithm, describing the extracted line segment features with the LBD (Line Band Descriptor) descriptor, and performing line feature matching based on the line segment descriptions to obtain the extracted line features.
Optionally, the step of extracting line segment features in the image by using a line feature extraction algorithm, describing the extracted line segment features by using an LBD descriptor, and performing line feature matching based on the line segment feature description includes:
extracting a structural line segment from the multi-angle image data by using a line feature extraction algorithm;
selecting a preset number of structural line segments, performing image feature description on the selected structural line segments by using a descriptor, and performing structural line segment matching according to the image feature description;
and describing the straight lines obtained by structural line segment matching with the four-dimensional orthogonal representation based on Plücker coordinates to obtain a plurality of extracted line features.
Optionally, the step of performing structural line segment matching according to the image feature description includes:
respectively matching structural line segments in a plurality of channels of the image, and marking the structural line segments which meet preset matching conditions in at least one channel as effective line segments to obtain a first effective line segment set;
matching each line segment marked as the effective line segment again by using a k nearest neighbor method to obtain an effective line segment obtained by matching again and obtain a second effective line segment set;
and carrying out bidirectional matching on each effective line segment in the second effective line segment set to obtain an optimal matching pair.
Optionally, the step of performing a camera pose preliminary estimation on each frame of image in the multi-angle image data, and combining the extracted and matched point features, line features, and the camera pose preliminary estimation result to construct a graph structure includes:
determining the relative pose relationship of the initial frame by using epipolar geometry;
estimating the pose relationship of the frame to be solved relative to the known frame by a camera motion state estimation method and an EPnP method to obtain the relative pose relationship of each frame to be solved;
and constructing a graph structure by taking the point features, the line features and the camera positions as vertices and the projection relations of the point features and line features as edges.
Optionally, the step of constructing a graph structure by using the point feature, the line feature and the camera position as vertices and using the projection relationship between the point feature and the line feature includes:
adding line features and point features associated with the pose of the camera to be solved into a graph structure as vertexes;
respectively constructing multi-vertex edges in the g2o open-source library according to the visibility relationships of the point features and line features in the frame being solved, where a multi-vertex edge encodes the corresponding relative pose relationship among a point feature, a line feature and the camera;
calculating, according to the relative pose relationships encoded in the multi-vertex edges, the Jacobian matrix of the point-feature reprojection errors with respect to the camera pose in each frame and the Jacobian matrix of the line-feature reprojection errors with respect to the camera pose in each frame;
and iteratively solving the camera positions and the feature coordinates with an optimization algorithm according to the computed reprojection errors.
Optionally, the step of determining a three-dimensional map according to the extracted point features, line features and the graph structure further includes:
based on the common view relation and the similarity between adjacent frames, screening the adjacent frames in the multi-angle image data to obtain a closed-loop standby group;
detecting a matching area corresponding to the closed loop standby group by using a bag-of-words closed loop detection method, correcting the co-view relation of the current frame according to a detection result, updating coordinate values of feature points in the matching area, and correcting the position of the closed loop in a world coordinate system;
updating the public time domain and connection relation between the past frames in the graph structure according to the detected closed loop;
and cutting the extracted straight line by combining the updated graph structure and using the end point coordinates of the pre-stored line features on the multi-angle image, and determining the three-dimensional map by using the cut line segment.
In a second aspect, the present embodiment discloses an information processing device, comprising a processor and a storage medium communicatively coupled to the processor, the storage medium adapted to store a plurality of instructions; the processor is adapted to invoke the instructions in the storage medium to perform the steps of the point-line combined multi-camera visual SLAM method described above.
In a third aspect, the present embodiment discloses a computer-readable storage medium having one or more programs stored thereon that are executable by one or more processors to implement the steps of the point-line combined multi-camera visual SLAM method described above.
Advantageous effects: the invention provides a point-line combined multi-camera visual SLAM method, system and device. Multi-angle image data of a target scene are collected; point features and line features in the multi-angle image data are extracted and matched, and their position information in three-dimensional space is obtained; a preliminary camera pose estimate is computed for each frame of the multi-angle image data, and a graph structure is constructed by combining the extracted and matched point features, line features, and the preliminary camera pose estimates; a three-dimensional map is determined from the extracted point features, line features, and the graph structure. Because point and line features are computed jointly and line features carry more information, tracking stability and accuracy are improved, and the sparse feature map constructed with line features gives a clearer and more intuitive abstract description of the scene.
Drawings
FIG. 1 is a flow chart of the steps of the point-line combined multi-camera visual SLAM method of the present invention;
FIG. 2 is a schematic diagram illustrating the use of a camera mask to capture image data in an embodiment of the present invention;
FIG. 3(a) is a schematic diagram of the geometric meaning of the parameter θ1 in the line orthogonal description method according to an embodiment of the present invention;
FIG. 3(b) is a schematic diagram of the geometric meaning of the parameter θ2 in the line orthogonal description method according to an embodiment of the present invention;
FIG. 3(c) is a schematic diagram of the geometric meaning of the parameter θ3 in the line orthogonal description method according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a result of a map with sparse features corresponding to a local scene according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an information processing apparatus in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The inventors have found that the camera used in prior-art SLAM algorithms has a narrow field of view, so the information that a single frame can acquire is limited, and that recovering the camera trajectory from point features alone makes the tracking result unstable, so the constructed map cannot reflect the real scene.
To overcome the above problems in the prior art, the present embodiment discloses a point-line combined multi-camera visual SLAM method, device and storage medium. Point features and line features are extracted from multi-angle image data, their position descriptions are obtained, and the correspondences of point and line features between frames are established with a matching algorithm. The point-line features are then described mathematically in the two-dimensional image plane and in the three-dimensional world with suitable models, their projection relationships are established, a graph structure is constructed from the point features, line features, and the preliminary camera pose estimates, and the camera trajectory and the three-dimensional feature map are visualized.
The method, system, and apparatus of the present invention are described in further detail below with reference to the following figures and examples.
Exemplary method
In a first aspect, the present embodiment discloses a point-line combined multi-camera visual SLAM method, as shown in fig. 1, including:
and step S1, collecting multi-angle image data of the target scene.
Multi-angle image data within the target scene area are collected first. In order to obtain panoramic data containing more information, a Ladybug5+ multi-camera panoramic device is used in this step to collect the multi-angle image data within the target scene area.
The 6 cameras mounted on the Ladybug panoramic device simultaneously acquire 360-degree scene data. Compared with a system that acquires data with a single ordinary camera, using several wide-angle cameras captures more scene information over the same period, and the trajectory solution integrates the information from every angle, which improves the stability of the system.
In order to facilitate processing of the acquired multi-angle image data, after the multi-angle image data are acquired, preprocessing is also performed on the acquired multi-angle image data. The step of pre-treating comprises:
and step S01, distortion correction processing is carried out on the acquired multi-angle image data to obtain distortion corrected intermediate image data.
When the step is performed by using a Ladybug5+ multi-camera panoramic device, distortion correction can be directly performed on the image by using a Ladybug SDK kit.
And step S02, performing masking processing on the intermediate image data by using a preset camera mask to obtain the pre-processed multi-angle image data.
During data acquisition, the outline of the operator's head and the outlines of other sensors carried on the backpack appear in the image. To avoid extracting features by mistake at the image border or on the photographed interfering objects, which would affect subsequent feature tracking and trajectory solving, the image is first distortion-corrected with the Ladybug SDK tool, and then the images captured by the different lenses are masked separately; the masks corresponding to the corrected images are shown in FIG. 2. Because the lenses occupy different positions on the data-acquisition backpack, the occluded parts of their images differ, so a mask is made for each lens. Each lens is unique and fixed, and the mask of each lens applies to all images captured by that lens.
Further, the method also comprises the following steps: and acquiring internal and external parameters of the Ladybug device by using a Ladybug SDK toolkit, wherein the internal and external parameters comprise focal lengths of 6 cameras carried by the Ladybug, image center coordinates and rotation and translation parameters relative to the center of the device.
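For illustration, the preprocessing described above (distortion correction followed by per-lens masking) can be sketched with generic OpenCV calls. The patent itself uses the Ladybug SDK for this, so the pinhole distortion model, the mask format and all names below are assumptions.

```python
import cv2

def preprocess_frame(raw_img, K, dist_coeffs, lens_mask):
    """Undistort one lens image and apply that lens's fixed binary mask.

    K           : assumed 3x3 pinhole intrinsic matrix of this lens
    dist_coeffs : assumed radial/tangential distortion coefficients
    lens_mask   : uint8 mask, 255 where features may be extracted, 0 elsewhere
    """
    # Step S01: distortion correction (done with the Ladybug SDK in the patent).
    undistorted = cv2.undistort(raw_img, K, dist_coeffs)
    # Step S02: mask out the operator's head and the other backpack sensors.
    masked = cv2.bitwise_and(undistorted, undistorted, mask=lens_mask)
    return masked

# Each of the 6 lenses keeps its own fixed mask, reused for every frame it captures.
```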
And S2, extracting and matching the point features and the line features in the multi-angle image data, and obtaining the position information of the point features and the line features in the three-dimensional space.
The method comprises the steps of extracting point features and line features from multi-angle image data collected by a camera, and acquiring position information of the point features and the line features in a three-dimensional space in the extracted multi-angle image data.
Specifically, the method comprises the steps of respectively carrying out extraction and matching operation of point-line characteristics on the obtained sequence image data, and carrying out geometric description on the positions of the point-line characteristics in the image and the space by using a proper method.
Further, the extraction and matching of the point-line features includes: extracting corner features in the multi-angle image data with a feature point extraction algorithm, describing the corner features with the rBRIEF descriptor, and performing feature matching based on the corner descriptions to obtain the extracted point features; and extracting line segment features in the image with a line feature extraction algorithm, describing the extracted line segment features with the LBD (Line Band Descriptor) descriptor, and performing line feature matching based on the line segment descriptions to obtain the extracted line features.
Specifically, the method comprises the following steps:
and step S21, extracting point features in the sequence images, and performing matching and geometric description. The specific process is as follows:
and S21.1, extracting point features in the image by using an ORB algorithm. The steps for extracting point features using other algorithms are similar to ORB. The method comprises the steps of ensuring that point features are uniformly distributed in an image, dividing the image into a plurality of grids before extracting the features, extracting FAST angular points in each grid by using an initial threshold, if extraction fails, extracting by using a minimum threshold slightly larger than the initial threshold, and adjusting the threshold according to effects to enable the point features with similar quantities to be extracted in each grid.
First, a set of corners is extracted from the image with the FAST corner detection algorithm, the Harris response of each corner is computed, and the N points with the largest Harris response are kept as the best features. An image pyramid is then built, usually with 8 levels: the scale of level 0 is 1, of level 1 is 1.2, of level 2 is 1.2², and so on. Corners are extracted from each of the 8 pyramid images, the pyramid level of each corner is recorded as the scale information of that point, and the direction of each corner is determined using image moments.
For a corner to be processed, the intensity-weighted centroid within a radius r around the corner is computed first, and the direction of the vector from the corner to the centroid is taken as the direction of the corner.
Using I(x, y) to denote the intensity at position (x, y) in the corner neighbourhood, the moments of the neighbourhood are

m_pq = Σ_{x,y} x^p y^q I(x, y), with p, q ∈ {0, 1},

i.e. m_00 = Σ I(x, y), m_10 = Σ x I(x, y), m_01 = Σ y I(x, y).

The coordinates of the centroid are

(x_c, y_c) = (m_10 / m_00, m_01 / m_00).

The direction of the corner point is θ = atan2(y_c, x_c).
And selecting a 5 x 5 neighborhood as a judgment window for each extracted point feature, and judging the feature as a valid feature only when all pixels in the window are in the range of the mask passing part.
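The intensity-centroid orientation defined by the moment formulas above can be sketched in a few lines of NumPy; the patch size and the handling of the circular neighbourhood are illustrative assumptions rather than the patent's exact implementation.

```python
import numpy as np

def corner_orientation(patch):
    """Intensity-centroid orientation of a corner.

    patch : square grayscale array centred on the corner; x and y are taken
            relative to the corner, matching the moment definition above.
    """
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs = xs - (w - 1) / 2.0
    ys = ys - (h - 1) / 2.0
    r = (min(h, w) - 1) / 2.0
    inside = (xs ** 2 + ys ** 2) <= r ** 2       # radius-r circular neighbourhood

    I = patch.astype(np.float64) * inside
    m00 = I.sum()
    m10 = (xs * I).sum()
    m01 = (ys * I).sum()
    xc, yc = m10 / m00, m01 / m00                # intensity centroid (x_c, y_c)
    return np.arctan2(yc, xc)                    # theta = atan2(y_c, x_c)
```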
Step S21.2, using rBRIEF descriptor to generate 256-dimensional binary descriptor to describe the image characteristics around the corner, and estimating the similarity of the corresponding characteristic point image expression by calculating the Hamming Distance (Hamming Distance) between the descriptors to carry out point characteristic matching.
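A minimal OpenCV sketch of this extraction-description-matching chain, assuming hypothetical input frames and default parameter values, might look as follows; the patent's own grid-based extraction and threshold adaptation are not reproduced here.

```python
import cv2

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file names
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# 8-level pyramid with scale factor 1.2, as in the embodiment above.
orb = cv2.ORB_create(nfeatures=1000, scaleFactor=1.2, nlevels=8)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# rBRIEF descriptors are binary, so similarity is the Hamming distance between them.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
point_matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
```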
Step S21.3: the position of a point feature in the world coordinate system is described by its three-dimensional space coordinates P = (x, y, z)^T or by its homogeneous coordinates P̃ = (x, y, z, 1)^T. Using homogeneous coordinates makes it more convenient to convert points between different coordinate systems.
Further, the method for extracting line features in the multi-angle image data comprises the following steps:
the method comprises the steps of extracting line segment features in an image by using a line feature extraction algorithm, describing the extracted line segment features by using an LBD (local binary decomposition) description operator, and matching line features based on the line segment feature description, and comprises the following steps:
extracting a structural line segment from the multi-angle image data by using a line feature extraction algorithm;
selecting a preset number of structural line segments, performing image feature description on the selected structural line segments by using a descriptor, and performing structural line segment matching according to the image feature description;
and describing the straight lines obtained by structural line segment matching with the four-dimensional orthogonal representation based on Plücker coordinates to obtain a plurality of extracted line features.
Step S22: line features in the sequence images are extracted and matched, and their positions in three-dimensional space are described geometrically with the four-dimensional orthogonal representation based on Plücker coordinates. Unlike the usual coordinate conversions, the way line-feature coordinates are converted among the world space, the camera space and the image plane is described in detail. The specific process is as follows:
Step S22.1: structural line segments are extracted from the image with the LSD (Line Segment Detector) method. The segments are screened by a length threshold; the two endpoints p_s, p_e and the midpoint p_m of each segment are taken, and it is checked whether all points of their 3×3 neighbourhoods lie inside the mask-passing region obtained in the preprocessing step. An extracted line segment is accepted as a valid feature only if all three points pass the check.
And S22.2, after sufficient Line segments (not less than 30 in texture-rich areas and not less than 10 in texture-poor areas) are extracted, generating 256-dimensional binary descriptors by using a LBD (Line Band Descriptor) based method, carrying out image feature description on the Line segments extracted from different images, and matching by calculating Hamming distances.
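A hedged sketch of the segment extraction and validity test of steps S22.1-S22.2 is given below, assuming an OpenCV build that ships cv2.createLineSegmentDetector; the length threshold is an illustrative value, and the LBD description/Hamming matching stage is only indicated in comments because its Python bindings vary between OpenCV builds.

```python
import cv2
import numpy as np

def extract_valid_segments(gray, mask, min_len=20.0):
    """LSD extraction plus the endpoint/midpoint mask test described above.

    A segment is kept only if every pixel of the 3x3 neighbourhoods around its
    two endpoints and its midpoint lies inside the mask-passing region."""
    lsd = cv2.createLineSegmentDetector()        # availability depends on the OpenCV build
    lines = lsd.detect(gray)[0]
    valid = []
    if lines is None:
        return valid
    for x1, y1, x2, y2 in lines.reshape(-1, 4):
        if np.hypot(x2 - x1, y2 - y1) < min_len:           # illustrative length screen
            continue
        pts = ((x1, y1), (x2, y2), ((x1 + x2) / 2, (y1 + y2) / 2))
        ok = True
        for px, py in pts:
            c, r = int(round(px)), int(round(py))
            win = mask[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if win.size < 9 or (win == 0).any():
                ok = False
                break
        if ok:
            valid.append((x1, y1, x2, y2))
    # LBD descriptors (256-bit binary) would then be computed for each valid segment
    # and matched between frames by Hamming distance.
    return valid
```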
Step S22.3: a line segment feature in the image is described by the homogeneous coordinates of its endpoints, x1 = (u1, v1, 1)^T and x2 = (u2, v2, 1)^T. A straight-line feature in space is described with the four-dimensional orthogonal representation based on Plücker coordinates.
Given the homogeneous coordinates of two points in space, X1 = (x̃1^T, w1)^T and X2 = (x̃2^T, w2)^T, the Plücker coordinates of the spatial line through them can be represented by a 6-dimensional vector L = (n^T, v^T)^T, in which

n = x̃1 × x̃2,  v = w1·x̃2 − w2·x̃1,

where n is the normal vector of the plane spanned by the line and the coordinate origin, and v is the direction vector of the line. The orthogonal coordinates of the straight line are the pair (U, W) defined below.

Let

U = [ n/‖n‖, v/‖v‖, (n×v)/‖n×v‖ ],  W = (1/√(σ1²+σ2²)) · [[σ1, −σ2], [σ2, σ1]],  with σ1 = ‖n‖, σ2 = ‖v‖.

Then U ∈ SO(3) and W ∈ SO(2), and the line parameters reduce to the combination of the logarithm-map vector θ = (θ1, θ2, θ3)^T of the matrix U and the angle θ corresponding to W, i.e. a four-dimensional vector (θ^T, θ)^T.

Each parameter has its own geometric meaning. The matrix W contains the ratio information σ1/σ2, which is the distance d from the coordinate origin O to the straight line, and the parameter θ is the angle that encodes this ratio. The matrix U contains the three-dimensional attitude information of the straight line L; θ1, θ2, θ3 correspond to the following geometric relations:

1. θ1 represents rotation of the straight line while it stays tangent, within the OL plane, to the circle centred at O with radius d;

2. θ2 represents rotation of the straight line while it stays tangent to the circle centred at O with radius d that is perpendicular to the OL plane and intersects the line at the point P;

3. θ3 represents rotation of the straight line about the axis OP.

The change of the line position is described by these components of the line orthogonal coordinates, whose geometric meanings are shown in FIGS. 3(a) to 3(c).
In the subsequent graph-optimization calculations, the matrices (U, W) have to be updated according to the increment of the four-dimensional parameter (θ^T, θ)^T. The update is carried out through the exponential and logarithmic maps on SO(3) and SO(2):

U ← U · exp([δθ]_×),  W ← W · R(δθ),

where [·]_× denotes the skew-symmetric matrix of a 3-vector and R(δθ) is the planar rotation by the angle δθ.
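The Plücker construction and the orthonormal (U, W) representation described above can be illustrated with the following NumPy sketch, which assumes ordinary (non-homogeneous) 3-D points; it illustrates the standard representation rather than the patent's exact code.

```python
import numpy as np

def plucker_from_points(X1, X2):
    """6-D Plücker coordinates L = (n, v) of the line through two 3-D points."""
    n = np.cross(X1, X2)   # normal of the plane spanned by the line and the origin
    v = X2 - X1            # direction vector of the line
    return n, v

def orthonormal_rep(n, v):
    """4-parameter orthonormal representation (U, W) of a spatial line."""
    s1, s2 = np.linalg.norm(n), np.linalg.norm(v)
    U = np.column_stack([n / s1, v / s2, np.cross(n, v) / (s1 * s2)])  # U in SO(3)
    W = np.array([[s1, -s2], [s2, s1]]) / np.hypot(s1, s2)             # W in SO(2)
    d = s1 / s2            # distance from the coordinate origin to the line
    return U, W, d
```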
Step S22.4: let T_cw denote the coordinate transformation matrix from the world frame to the camera frame, with rotation R_cw ∈ SO(3) and translation vector t_cw. The Plücker line L_w = (n_w^T, v_w^T)^T is converted from the world coordinate system to the line L_c in the camera coordinate system according to

L_c = (n_c^T, v_c^T)^T = H_cw · L_w,  H_cw = [[R_cw, [t_cw]_× R_cw], [0, R_cw]],

where [t_cw]_× is the skew-symmetric matrix of t_cw and H_cw represents T_cw in the form adapted to Plücker coordinates.

The line is then converted from the camera coordinate system to the straight line l' in the image plane according to

l' = K_L · n_c,  K_L = [[f_v, 0, 0], [0, f_u, 0], [−f_v·u_c, −f_u·v_c, f_u·f_v]],

where K_L is the camera intrinsic matrix for Plücker coordinates, (u_c, v_c) are the image coordinates of the camera's principal optical axis, f_u and f_v are the focal lengths in the u and v directions, and n_c is the normal component of the line's Plücker coordinates in the camera coordinate system.
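A NumPy sketch of this world-to-camera-to-image chain, under the block-matrix and K_L forms reconstructed above, could look as follows; the variable names and interface are assumptions.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x of a 3-vector."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def line_world_to_image(n_w, v_w, R_cw, t_cw, fu, fv, uc, vc):
    """Move a Plücker line from the world frame to the camera frame and project
    it to image-line coefficients l' = (l1, l2, l3)."""
    # World -> camera: block action of H_cw on (n_w, v_w).
    n_c = R_cw @ n_w + skew(t_cw) @ (R_cw @ v_w)
    v_c = R_cw @ v_w
    # Camera -> image plane: only the normal component n_c is needed.
    K_L = np.array([[fv, 0.0, 0.0],
                    [0.0, fu, 0.0],
                    [-fv * uc, -fu * vc, fu * fv]])
    l_img = K_L @ n_c
    return l_img, n_c, v_c
```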
In this step, in order to achieve a better line feature matching effect, the step of matching the structural line segment according to the image feature description includes:
and step 22.21, respectively performing structural line segment matching in a plurality of channels of the image, and marking the structural line segment which meets the preset matching condition in at least one channel as an effective line segment to obtain a first effective line segment set.
For a multi-channel colour image, line segment matching is carried out in each channel separately, and a segment pair is marked as a valid match as long as the matching condition is satisfied in at least one channel; this makes full use of the information of the colour image in every channel. The first valid line segment set is obtained after this matching step.
And step 22.22, carrying out re-matching on each line segment marked as the effective line segment by using a k nearest neighbor method to obtain an effective line segment obtained by re-matching, and obtaining a second effective line segment set.
For each match, the k-Nearest-Neighbour (kNN) method is used to find the line segment with the second-smallest descriptor distance; the match is marked as valid only when the difference between the two distances is larger than a set threshold, otherwise the match is discarded. This screening reduces mismatches between similar line segments and increases the matching accuracy. The second valid line segment set is obtained after this matching step.
And step 22.23, performing bidirectional matching on each effective line segment in the second effective line segment set to obtain an optimal matching pair.
The positions of the feature sequence to be matched and of the new input frame's feature sequence are exchanged, matching is performed in both directions, and only the feature pairs that are best matches in both directions are kept. For example, suppose that for a line segment l1 in frame A the best matching segment found in frame B after calculation and screening is l2; the roles of A and B are then swapped, and only if the best match of l2 in frame A is still l1 is the pair {l1, l2} accepted as a valid matching pair.
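The per-channel matching, kNN distance-gap screening and bidirectional check can be sketched as below; the gap threshold and the descriptor arrays are illustrative assumptions, and the binary line descriptors are assumed to be LBD-style 256-bit strings.

```python
import cv2

def match_lbd(desc_a, desc_b, gap=25):
    """Hamming matching of binary line descriptors with the kNN distance-gap test."""
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    kept = {}
    for cand in bf.knnMatch(desc_a, desc_b, k=2):
        if len(cand) < 2:
            continue
        best, second = cand
        if second.distance - best.distance > gap:     # illustrative gap threshold
            kept[best.queryIdx] = best.trainIdx
    return kept

def bidirectional_match(desc_a, desc_b):
    """Keep only pairs that are each other's best match in both directions."""
    fwd = match_lbd(desc_a, desc_b)
    bwd = match_lbd(desc_b, desc_a)
    return [(i, j) for i, j in fwd.items() if bwd.get(j) == i]

# For a colour image the same matching runs once per channel, and a segment pair is
# marked valid if it satisfies the condition in at least one channel.
```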
And S3, performing camera pose preliminary estimation on each frame of image in the multi-angle image data, and constructing a graph structure by combining the extracted and matched point features, line features and the camera pose preliminary estimation result.
In this step, with the feature association information obtained by the foregoing method, the system is initialized using epipolar geometry and the pose is preliminarily estimated with the EPnP method. Specifically, the relative pose relationship of the initial frames is recovered with epipolar geometry and the system is initialized, mainly including the following steps:
step S31, determining the relative pose relationship of the initial frame using the epipolar geometry.
Point features are extracted from the input image. When the number of point features is sufficient (greater than a threshold τ_d), the frame is taken as a candidate initial frame and the next step is carried out; otherwise the frame is skipped and calculation continues. After the candidate initial frame is created, features are extracted from the next frame of the image sequence and matched, and the matching result is screened by the nearest-neighbour and bidirectional checks. If the number of matched features is sufficient (greater than τ_m), the two frames are regarded as a candidate initial frame pair, the fundamental matrix F is solved with epipolar geometry, the essential matrix E is then obtained, and the relative pose of the two camera frames is preliminarily estimated.
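A minimal two-view initialisation sketch with OpenCV, under an assumed value for τ_m and an assumed pinhole intrinsic matrix K, is shown below; the patent's additional checks (grid extraction, τ_d gating of the first frame, bidirectional screening) are omitted.

```python
import cv2
import numpy as np

TAU_M = 60   # assumed value of the matched-feature threshold tau_m

def try_initialize(pts1, pts2, K):
    """Candidate initial frame pair -> fundamental matrix F, essential matrix E,
    and preliminary relative pose (R, t) of the two frames."""
    if len(pts1) < TAU_M:
        return None                                   # not enough matches, wait for another pair
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    E = K.T @ F @ K                                   # essential matrix from F and the intrinsics
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)    # preliminary relative pose
    return F, E, R, t, mask
```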
And step S32, estimating the pose relationship of the frame to be solved relative to the known frame by the camera motion state estimation and the EPnP method to obtain the relative pose relationship of each frame to be solved.
And carrying out system initialization. And adding the initial frame as a key frame into the system, converting the matched characteristic points and lines in the initial frame into a world coordinate system, and adding the world coordinate system into a map. And then, carrying out global graph optimization on the camera pose, the point features and the line features by using a graph optimization method, and adjusting the pose relationship. When the multi-camera sequence image data is used for actual operation, the initialization result of the camera No. 0 is used as a standard, the initialization pose is converted into the pose of Ladybug equipment according to the camera calibration result obtained in the preprocessing, and the pose is mapped to other cameras.
And step S33, constructing a graph structure by using the projection relation of the point features and the line features and the camera position as vertexes.
And roughly estimating the pose relationship of the frame to be solved relative to the known frame by using a camera motion state estimation method and an EPnP method, and taking the pose relationship as an initial value of graph optimization.
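The EPnP-based rough pose used as the initial value can be sketched with OpenCV's solvePnP; the wrapper below and its argument names are assumptions.

```python
import cv2
import numpy as np

def initial_pose_epnp(pts_3d, pts_2d, K):
    """Rough pose of the frame to be solved, used as the graph-optimization initial value.

    pts_3d : Nx3 world coordinates of already-mapped point features
    pts_2d : Nx2 observations of those features in the new frame"""
    ok, rvec, tvec = cv2.solvePnP(np.asarray(pts_3d, dtype=np.float64),
                                  np.asarray(pts_2d, dtype=np.float64),
                                  K, None, flags=cv2.SOLVEPNP_EPNP)
    R, _ = cv2.Rodrigues(rvec)     # rotation matrix R_cw of the frame
    return ok, R, tvec
```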
A graph is constructed according to the point-line feature matching relations; it contains a number of point-feature vertices, a number of line-feature vertices, several camera vertices (for local or global estimation) or a single camera vertex (for single-frame estimation), and the feature-camera edge relations.
And step S4, determining a three-dimensional map according to the extracted point features, the line features and the graph structure.
In this step, based on the point features and line features extracted in the above steps S2 and S3, and the constructed graph structure, trajectory tracking of the camera is realized, and a constructed three-dimensional map is obtained.
Further, this step includes performing graph optimization on the graph constructed in step S3 with the g2o tool. The position and attitude information of the camera at each captured frame is optimized by the graph optimization algorithm, and the three-dimensional world coordinates of the point features and line features in the graph structure are optimized, so as to obtain a more accurate three-dimensional map.
Specifically, the step of constructing the graph structure by using the point feature, the line feature and the camera position as vertices and the projection relationship between the point feature and the line feature includes:
adding line features and point features associated with the pose of the camera to be solved into a graph structure as vertexes;
respectively constructing multi-vertex edges in the g2o open-source library according to the visibility relationships of the point features and line features in the frame being solved, where a multi-vertex edge encodes the corresponding relative pose relationship among a point feature, a line feature and the camera;
calculating, according to the relative pose relationships encoded in the multi-vertex edges, the Jacobian matrix of the point-feature reprojection errors with respect to the camera pose in each frame and the Jacobian matrix of the line-feature reprojection errors with respect to the camera pose in each frame;
and iteratively solving the camera positions and the feature coordinates with an optimization algorithm according to the computed reprojection errors.
The specific steps of pose estimation using the G2O diagram structure are as follows:
and step S41, adding the line features and point features associated with the camera pose to be solved into the graph as vertexes. The method comprises the steps of describing vectors of geometric positions of features required to be contained in a vertex structure, and optimizing and updating the pose according to the increment estimated by Jacobian calculation in the optimization process. The description vector of the point feature is three-dimensional space coordinate
Figure BDA0002633853960000141
Updating according to vector addition; the description vector of the line features is:
Figure BDA0002633853960000142
and (4) converting the increment into a special Euclidean group by using exponential mapping, and then converting the increment into an orthogonal description coordinate by using logarithmic mapping for updating.
Step S42: multi-vertex edges of the form point feature - line feature - camera are constructed according to the visibility relationships of the features in the frame to be solved. This arrangement is convenient for the solution; because the pose relations of the individual features are mutually independent, empty features are used as padding during actual operation, i.e. an (empty point)-line-camera edge describes a line-feature-camera correspondence and a point-(empty line)-camera edge describes a point-feature-camera correspondence. The edge structure contains the reprojection error and the Jacobian matrices, which are solved as follows:
Step S42.1: in the k-th frame, let the world coordinate of point feature i be P_i, its corresponding image coordinate be p_{k,i}, the camera intrinsic matrix be K, and the transformation from the world coordinate system to the camera coordinate system be T_{k,cw}. The reprojection error is then

e_{p,k,i} = p_{k,i} − K T_{k,cw} P_i.

From this, the Jacobian matrix of the point-feature reprojection error with respect to the camera pose ξ is derived.
Step S42.2: in the k-th frame, let the world Plücker coordinate of line feature j be L_j and let the corresponding measured line segment l_{k,j} in the frame be represented by the homogeneous coordinates of its endpoints. The reprojection error of line feature j is

e_{l,k,j} = ( d(x1, l'), d(x2, l') )^T,  with l' = K_L [H_{k,cw} L_j]_{1-3},

where [·]_{1-3} denotes the first three dimensions of the vector and d(·,·) is the point-to-line distance function

d(x, l') = x^T l' / √(l'_1² + l'_2²).

Here l' = (l'_1, l'_2, l'_3)^T is the projection of the straight line from the world coordinate system to the image, expressed by its equation coefficients, and x1 = (u1, v1, 1)^T and x2 = (u2, v2, 1)^T are the endpoint coordinates of the actually measured line segment in homogeneous form. According to the chain rule, the Jacobian matrix of the line-feature reprojection error with respect to the camera pose is derived accordingly.
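The two reprojection errors reconstructed above can be evaluated with the short NumPy sketch below (point error in pixels, line error as signed endpoint-to-line distances); it is illustrative only and omits the Jacobians.

```python
import numpy as np

def point_reproj_error(p_obs, P_w, K, R_cw, t_cw):
    """e_p = p - pi(K (R_cw P_w + t_cw)): point-feature reprojection error in pixels."""
    P_c = R_cw @ P_w + t_cw
    proj = K @ P_c
    return p_obs - proj[:2] / proj[2]

def line_reproj_error(x1, x2, l_img):
    """Distances of the two measured endpoints (homogeneous x1, x2) to the
    projected image line l' = (l1, l2, l3)."""
    return np.array([x1 @ l_img, x2 @ l_img]) / np.hypot(l_img[0], l_img[1])
```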
Step S42.3: the robustness of the optimization process is enhanced in two ways. (1) A robust kernel function is set for each edge to reduce the influence of outliers. (2) The optimization is repeated in groups: after each group of iterations, the features whose edges have a reprojection error larger than the threshold th are marked as outliers and those edges are removed, and only the edges within the threshold range (inliers) are kept for the next group of iterations, so that a relatively stable pose estimate is obtained. The thresholds follow the chi-squared distribution with the corresponding degrees of freedom: th_p = 5.991 for point features and th_l = 7.815 for line features. If the number of non-outlier features after the initial pose calculation is sufficient and a certain interval has elapsed since the last local optimization, local graph optimization is performed to further stabilize the poses and features and reduce the accumulated error.
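The grouped optimization with chi-squared outlier gating can be sketched as follows; the edge objects, their chi2() accessor and the number of rounds are hypothetical, standing in for the corresponding g2o structures.

```python
TH_POINT, TH_LINE = 5.991, 7.815     # chi-squared thresholds for point and line edges

def optimize_with_gating(edges, run_iterations, n_groups=4):
    """Grouped optimization: after each group, edges whose error exceeds the
    chi-squared threshold are marked as outliers and dropped from the next group."""
    inliers = list(edges)
    for _ in range(n_groups):
        run_iterations(inliers)      # e.g. a batch of Levenberg-Marquardt iterations
        inliers = [e for e in inliers
                   if e.chi2() <= (TH_POINT if e.is_point else TH_LINE)]
    return inliers
```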
When local optimization is carried out, past key frames with a common view field with the current key frame are searched through map search, the positions and the features of the common view key frames are used for constructing vertexes, corresponding edges are constructed according to a visual relation, an optimization graph is added, and meanwhile optimization is carried out. After the closed loop is detected, global optimization is needed.
And step S43, when the pose of a single frame is estimated in the Ladybug multi-camera panoramic shooting system, the pose of each frame relative to the initialized local coordinate system needs to be solved, and finally the pose is unified into the Ladybug coordinate system, the coordinates of each local camera are converted to the center and the mean value is solved, so that the coordinate values of the Ladybug equipment in the scene are obtained.
Step S44: apart from the initial frame, which is always a key frame, the conditions used to decide whether a video frame is a key frame are: (1) N ordinary frames have passed since the last key frame was added to the map (N = 9 in the implementation); (2) the number of inlier point/line features obtained after solving lies within a certain range, i.e. less than nf_max (nf_max = 90) and greater than nf_min (nf_min = 70); (3) the ratio of the number of inlier point/line features to the number of matched features of the previous frame is less than r_kf (r_kf = 0.9) and greater than 50%. Conditions (2) and (3) indicate that the matched features are decreasing but the frame solution is still stable. If the calculation result of a frame satisfies condition (1) together with either condition (2) or condition (3), the frame is added to the map as a key frame. To reduce redundant feature information among key frames, key frames that are too dense are removed: through the common-view relation, the other key frames sharing a field of view with the current key frame and their feature information are retrieved, and if 90% or more of the features of the current key frame can be observed in other key frames, the frame is a redundant key frame and is deleted from the map.
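A sketch of the key-frame decision, using the embodiment's values for N, nf_min, nf_max and r_kf and assuming that the lower ratio bound "greater than 50" means 50%, is given below.

```python
N_GAP, NF_MIN, NF_MAX, R_KF = 9, 70, 90, 0.9   # values stated in the embodiment

def is_keyframe(frames_since_last_kf, n_inliers, n_matched_prev):
    """Key-frame test: condition (1) together with either (2) or (3)."""
    cond1 = frames_since_last_kf >= N_GAP
    cond2 = NF_MIN < n_inliers < NF_MAX
    ratio = n_inliers / max(n_matched_prev, 1)
    cond3 = 0.5 < ratio < R_KF                 # assuming "greater than 50" means 50%
    return cond1 and (cond2 or cond3)
```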
Further, the method also comprises the following steps: bag-of-words closed-loop detection is performed using a dictionary (dictionary) generated by DBoW2 to check whether the camera has reached a place that has been passed before, and closed-loop correction is performed to apply constraints to the trajectory to eliminate or reduce cumulative errors. And then cutting the optimized line segment to finally generate a scene sparse feature map.
And S44.1, screening adjacent frames in the multi-angle image data based on the common view relation and the similarity between the adjacent frames to obtain a closed loop standby group.
The common-view relation is screened first: adjacent frames that share a field of view with the current frame are excluded, and among the remaining key frames in the map those that share words with the current frame are listed as candidate frames. The number of common words between each candidate frame and the current frame is counted, and 80% of the maximum common-word count is taken as the screening threshold; a candidate must exceed this threshold, and its similarity to the current frame must exceed the lowest similarity between the current frame and its adjacent frames. The candidate key frames are grouped with their 10 preceding adjacent key frames, the sum of the similarities between each frame in a group and the current frame is computed as the group score, 75% of the highest group score is taken as a threshold, and in every group whose score is above the threshold the key frame most similar to the current frame is selected as a candidate. All frames in the candidate set are then checked for consistency (whether they have a common neighbour with the current frame) to screen out the final closed-loop candidate group.
And S44.2, detecting the matching area corresponding to the closed loop standby group by using a bag-of-words closed loop detection method, correcting the common-view relation of the current frame according to the detection result, updating the coordinate values of the feature points in the matching area, and correcting the position of the closed loop in a world coordinate system.
The candidate group is screened by the number of matching points, and a sim3 solution is computed between each remaining frame in the candidate group and the current frame, i.e. the rotation matrix, translation vector and scale factor between the matched points are obtained through the similarity transformation group; with the obtained sim3 relation, a rough matching area is screened out using the number of matched inliers as the criterion. After the camera pose and the features of the matching area are corrected with the sim3 relation, the nearby area is searched for further matches. If the number of matching points exceeds 40, a closed loop is considered detected and the solving thread is notified to stop inserting new key frames and proceed to the next step; otherwise the candidate group is cleared and the next detection is awaited.
And S44.3, updating the common time domain and connection relation between the past frames in the graph structure according to the detected closed loop.
The common-view relation of the current frame is updated according to the closed-loop detection result, the sim3 relations of the frames adjacent to the current frame are solved, the coordinate values of the feature points in the region are updated according to the closed-loop sim3, and the sim3 relation is converted into T ∈ SE(3);
the positions of the closed loops in the world coordinate system are corrected, and then the common view and the connection relation between the past key frames in the map are updated according to the newly found closed loops. And finally, reconstructing a Graph structure according to the updated relationship, firstly optimizing a key part (Essential Graph) of a new connection relationship generated by adding a closed loop in the Graph, and finally performing global optimization on all key frames and the feature point lines.
And S44.4, combining the updated graph structure, cutting the extracted straight line by using the end point coordinates of the pre-stored line features on the multi-angle image, and determining the three-dimensional map by using the cut line segment.
When line features are used in graph optimization, the orthogonal parameters describe the position of the infinite line on which the segment lies after projection into the world coordinate system; this reduces the number of description parameters and hence the amount of computation, and it also reduces the influence of the drift of line-segment endpoints between frames along the line direction, caused by image-frame truncation or by the line segment detector. When the sparse scene feature map is constructed, the line is cut back using the endpoint coordinates of the line features stored beforehand on the images, and the cut segments are used to build the sparse scene map. The structure of a local sparse feature map is shown in fig. 4.
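One plausible way to cut the optimized infinite line back to a drawable segment, consistent with the description above, is to project the 3-D positions corresponding to the stored endpoints onto the optimized line; the helper below is an assumption, not the patent's implementation.

```python
import numpy as np

def trim_line(p0, direction, endpoints_3d):
    """Cut the optimized infinite line (point p0, direction) back to a finite segment
    spanned by the projections of the stored endpoint positions onto the line."""
    d = direction / np.linalg.norm(direction)
    s = [float((np.asarray(e) - p0) @ d) for e in endpoints_3d]
    return p0 + min(s) * d, p0 + max(s) * d
```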
Exemplary device
The embodiment discloses an information processing device, as shown in fig. 5, comprising a processor and a storage medium communicatively connected to the processor, the storage medium adapted to store a plurality of instructions; the processor is adapted to invoke the instructions in the storage medium to perform the steps of the point-line combined multi-camera visual SLAM method described above.
Furthermore, the logic instructions in the memory 22 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product.
The memory 22, which is a computer-readable storage medium, may be configured to store a software program, a computer-executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 30 executes the functional application and data processing, i.e. implements the method in the above-described embodiments, by executing the software program, instructions or modules stored in the memory 22.
The memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and application programs required by at least one function, and the data storage area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include high-speed random access memory and may also include non-volatile memory, for example various media that can store program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk; it may also be a transient storage medium.
In another aspect, the present embodiment also provides a computer-readable storage medium storing one or more programs that are executable by one or more processors to implement the steps of the point-line combined multi-camera visual SLAM method.
Compared with existing feature-based visual SLAM, the embodiments of the method disclosed by the invention make comprehensive use of point-line features and multi-angle data captured by multiple cameras:
1) The 6 cameras mounted on the Ladybug panoramic device acquire 360-degree scene data simultaneously. Compared with a system that acquires data with a single ordinary camera, using several wide-angle cameras captures more scene information over the same period; the trajectory solution integrates the information from every angle, which improves the stability of the system.
2) On the basis of the LBD line feature matching method, optimization strategies such as multi-channel matching, bidirectional verification and kNN conditional constraints are added, improving matching accuracy.
3) A method of joint calculation of points and line characteristics is adopted. The line features have higher dimensionality than the point features and contain more scene structure information, the point-line features are added into the optimization graph at the same time for joint calculation, and the tracking stability and accuracy can be improved. The sparse feature map constructed by using the line features can more clearly and intuitively describe the scene in an abstract way.
Through the three points, the embodiment of the invention can obtain better track tracking and map construction results.
Tables 1 to 3 respectively give the results of experiments on single-camera sequence images using the point-line combined feature algorithm compared with the ORB algorithm that uses point features only, the camera parameter settings of the experiment using multi-camera panoramic data, and the comparison of trajectory-solution accuracy for each single camera in the multi-camera panoramic data experiment.
TABLE 1 (provided as image BDA0002633853960000201 in the original publication; contents not reproduced)
TABLE 2 (provided as image BDA0002633853960000202 in the original publication; contents not reproduced)
TABLE 3 (provided as image BDA0002633853960000203 in the original publication; contents not reproduced)
Table 1 shows the results of experiments on single-camera sequence images using the point-line combined feature algorithm, compared with the results obtained with the ORB algorithm that uses point features only. In the table, trajectory length is given in unit lengths, RPE RMSE is the root-mean-square value of the Relative Pose Error, and the percentage is the ratio of that error to the trajectory length, which is used to measure trajectory-tracking accuracy.
Table 2 shows the results of the experiments using multi-camera panoramic data, in which each camera captures images of size 1232 × 1024 and the sequence images are taken by a backpack-mounted Ladybug5+ device. Table 3 shows the trajectory-solution accuracy of each individual camera in the multi-camera panoramic data experiment. It can be seen from Tables 2 and 3 that, when shooting from multiple angles, the trajectory-tracking accuracy is clearly better than that obtained with a single camera.
In summary, when the experiment is run on monocular fisheye camera data, the number of tracked frames and the tracked trajectory length after adding line features are higher than those of the ORB implementation that uses point features only, and the accuracy is also improved.
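For reference, the RPE RMSE metric used in Table 1 can be computed along the following lines. This is a minimal sketch assuming time-aligned estimated and ground-truth poses given as 4×4 homogeneous matrices; the exact evaluation settings (such as the frame interval) used in the experiments are not specified in the text.

```python
# Minimal sketch of a translational relative pose error (RPE) RMSE computation.
# Not the exact evaluation code used for Table 1.
import numpy as np

def rpe_rmse(gt_poses, est_poses, delta=1):
    errors = []
    for i in range(len(gt_poses) - delta):
        gt_rel = np.linalg.inv(gt_poses[i]) @ gt_poses[i + delta]
        est_rel = np.linalg.inv(est_poses[i]) @ est_poses[i + delta]
        err = np.linalg.inv(gt_rel) @ est_rel
        errors.append(np.linalg.norm(err[:3, 3]))   # translational part of the error
    return float(np.sqrt(np.mean(np.square(errors))))

# Toy check: identical trajectories give zero error.
poses = [np.eye(4) for _ in range(5)]
print(rpe_rmse(poses, poses))   # 0.0
```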
The invention provides a point-line combined multi-camera visual SLAM method, system and device. Multi-angle image data of a target scene are collected; point features and line features in the multi-angle image data are extracted and matched, and their positions in three-dimensional space are obtained; a preliminary camera pose estimate is computed for each frame of the multi-angle image data, and a graph structure is constructed from the extracted and matched point features, line features and the preliminary camera pose estimates; a three-dimensional map is then determined from the extracted point features, line features and the graph structure. Because points and line features are computed jointly and line features carry more information, tracking stability and accuracy are improved, and the sparse feature map built from line features describes the scene more clearly and intuitively in an abstract form.
It should be understood that equivalents and modifications of the technical solution and inventive concept thereof may occur to those skilled in the art, and all such modifications and alterations should fall within the scope of the appended claims.

Claims (10)

1. A point-line combined multi-camera visual SLAM method, comprising:
collecting multi-angle image data of a target scene area;
extracting and matching point features and line features in the multi-angle image data, and obtaining position information of the point features and the line features in a three-dimensional space;
performing camera pose preliminary estimation on each frame of image in the multi-angle image data, and constructing a graph structure by combining the extracted and matched point features, line features and the camera pose preliminary estimation result;
determining a three-dimensional map from the extracted point features, line features, and the graph structure.
2. The point-line combined multi-camera visual SLAM method of claim 1, wherein the step of extracting point features and line features in the multi-angle image data is preceded by the steps of:
carrying out distortion correction processing on the acquired multi-angle image data to obtain distortion-corrected intermediate image data;
and performing mask processing on the intermediate image data using a preset camera mask to obtain the preprocessed multi-angle image data.
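A minimal OpenCV sketch of this preprocessing step might look as follows. The calibration values, mask and image are placeholders, and a fisheye model would use the cv2.fisheye functions instead of cv2.undistort.

```python
# Sketch of claim 2's preprocessing: undistort each camera's image, then apply a
# per-camera mask that blanks out regions (e.g. the rig body) before feature extraction.
import cv2
import numpy as np

def preprocess(img, K, dist_coeffs, mask):
    undistorted = cv2.undistort(img, K, dist_coeffs)              # distortion-corrected intermediate image
    return cv2.bitwise_and(undistorted, undistorted, mask=mask)   # keep only unmasked pixels

img = np.random.randint(0, 256, (1024, 1232, 3), dtype=np.uint8)  # stand-in for one camera frame
K = np.array([[400.0, 0, 616], [0, 400.0, 512], [0, 0, 1]])        # illustrative intrinsics
dist = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])                        # illustrative distortion terms
mask = np.full(img.shape[:2], 255, dtype=np.uint8)
mask[900:, :] = 0                                                   # e.g. blank out the rig visible at the bottom
out = preprocess(img, K, dist, mask)
```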
3. The point-line combined multi-camera visual SLAM method of claim 1 or 2, wherein the step of extracting the point features and the line features in the multi-angle image data comprises:
extracting corner features in the multi-angle image data by using a feature point extraction algorithm, performing corner feature description by using an rBRIEF descriptor, and performing feature matching based on the corner description to obtain extracted point features;
and extracting line segment features in the image using a line feature extraction algorithm, describing the extracted line segment features with an LBD (Line Band Descriptor) descriptor, and performing line feature matching based on the line segment descriptions to obtain the extracted line features.
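The point-feature branch of claim 3 can be sketched with OpenCV as below; ORB already uses the rBRIEF descriptor internally, and matching uses Hamming distance. The line branch (e.g. LSD segments described with LBD) relies on OpenCV's contrib line_descriptor module, whose Python bindings are not always available, so it is omitted here; the images are random stand-ins.

```python
# Sketch of ORB corner extraction, rBRIEF description (built into ORB) and matching.
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1500)

img1 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)   # stand-ins for two frames
img2 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)      # mutual best match only
matches = matcher.match(des1, des2) if des1 is not None and des2 is not None else []
matches = sorted(matches, key=lambda m: m.distance)
```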
4. The point-line combined multi-camera visual SLAM method of claim 3, wherein the steps of using a line feature extraction algorithm to extract line segment features in the image, using an LBD descriptor to describe the extracted line segment features, and performing line feature matching based on the line segment descriptions comprise:
extracting a structural line segment from the multi-angle image data by using a line feature extraction algorithm;
selecting a preset number of structural line segments, performing image feature description on the selected structural line segments by using a descriptor, and performing structural line segment matching according to the image feature description;
and describing the straight lines obtained from the matched structural line segments with the four-parameter orthonormal representation of Plücker coordinates to obtain a plurality of extracted line features.
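A small NumPy sketch of Plücker coordinates and the four-parameter orthonormal representation (U in SO(3), W in SO(2)) commonly used for lines in SLAM is given below. It is an assumption-labelled illustration of the representation named in claim 4, not the patent's exact formulation, and it assumes the line does not pass through the origin.

```python
# Plücker coordinates of a 3D line and their orthonormal (4-DOF) representation.
import numpy as np

def plucker_from_endpoints(P, Q):
    """Plücker coordinates (n, v): v is the direction, n the moment P x Q."""
    v = Q - P
    n = np.cross(P, Q)
    return n, v

def orthonormal_from_plucker(n, v):
    """U (3x3 rotation) and W (2x2 rotation) encoding the 4 DOF of the line."""
    nn, nv = np.linalg.norm(n), np.linalg.norm(v)   # assumes nn > 0 (line not through origin)
    U = np.column_stack([n / nn, v / nv, np.cross(n, v) / (nn * nv)])
    W = np.array([[nn, -nv], [nv, nn]]) / np.hypot(nn, nv)
    return U, W

n, v = plucker_from_endpoints(np.array([1.0, 0.0, 2.0]), np.array([1.0, 1.0, 2.0]))
U, W = orthonormal_from_plucker(n, v)
# U and W are orthonormal, so U.T @ U and W.T @ W are (numerically) identity matrices.
```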
5. The point-line combined multi-camera visual SLAM method of claim 4, wherein the step of performing structural line segment matching according to the image feature description comprises:
matching the structural line segments separately in a plurality of channels of the image, and marking the structural line segments that meet preset matching conditions in at least one channel as valid line segments to obtain a first valid line segment set;
re-matching each line segment marked as valid using a k-nearest-neighbor method to obtain the re-matched valid line segments and form a second valid line segment set;
and performing bidirectional matching on each valid line segment in the second valid line segment set to obtain the best matching pairs.
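The kNN constraint and bidirectional (mutual) verification of claim 5 can be sketched as follows for generic binary line descriptors. The multi-channel step is assumed to have already produced the two descriptor sets, and the descriptor arrays here are random placeholders.

```python
# kNN ratio test plus mutual (bidirectional) verification for binary descriptors.
import cv2
import numpy as np

def knn_mutual_matches(desc_a, desc_b, ratio=0.8):
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    fwd = bf.knnMatch(desc_a, desc_b, k=2)
    bwd = bf.knnMatch(desc_b, desc_a, k=2)

    def best_passing_ratio(pairs):
        out = {}
        for p in pairs:
            if len(p) == 2 and p[0].distance < ratio * p[1].distance:
                out[p[0].queryIdx] = p[0].trainIdx
        return out

    f, b = best_passing_ratio(fwd), best_passing_ratio(bwd)
    # Keep only pairs that agree in both matching directions.
    return [(i, j) for i, j in f.items() if b.get(j) == i]

desc_a = np.random.randint(0, 256, (40, 32), dtype=np.uint8)   # toy descriptors
desc_b = np.random.randint(0, 256, (40, 32), dtype=np.uint8)
print(len(knn_mutual_matches(desc_a, desc_b)))
```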
6. The point-line combined multi-camera visual SLAM method of claim 1, wherein the step of performing a camera pose preliminary estimation on each frame of image in the multi-angle image data, and combining the extracted and matched point features, line features and the camera pose preliminary estimation result to construct a graph structure comprises:
determining the relative pose relationship of the initial frame by using epipolar geometry;
estimating the pose relationship of the frame to be solved relative to the known frame by a camera motion state estimation method and an EPnP method to obtain the relative pose relationship of each frame to be solved;
and constructing a graph structure with the point features, the line features and the camera poses as vertices and the projection relations of the point features and line features as edges.
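A hedged OpenCV sketch of the pose bootstrapping in claim 6: epipolar geometry (essential matrix) for the initial frame pair, then EPnP from 3D-2D correspondences for later frames. The scene, intrinsics and noise-free correspondences below are synthetic placeholders.

```python
# Initial relative pose from epipolar geometry, then EPnP for subsequent frames.
import cv2
import numpy as np

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])

# Synthetic scene: 3D points seen from two cameras with a known relative pose.
pts3d = np.random.rand(60, 3) * [2.0, 2.0, 4.0] + [-1.0, -1.0, 4.0]
rvec_true = np.array([0.0, 0.05, 0.0])
tvec_true = np.array([0.3, 0.0, 0.0])
pts1, _ = cv2.projectPoints(pts3d, np.zeros(3), np.zeros(3), K, None)
pts2, _ = cv2.projectPoints(pts3d, rvec_true, tvec_true, K, None)

# Initial frame pair: essential matrix and pose recovery (epipolar geometry).
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
_, R0, t0, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

# Later frames: EPnP from 3D landmarks and their 2D observations in the new frame.
ok, rvec, tvec = cv2.solvePnP(pts3d, pts2, K, None, flags=cv2.SOLVEPNP_EPNP)
```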
7. The point-line combined multi-camera visual SLAM method of claim 6, wherein the step of constructing a graph structure with the point features, line features and camera poses as vertices and the projection relations of the point features and line features as edges comprises:
adding the line features and point features associated with the camera pose to be solved into the graph structure as vertices;
constructing multi-vertex edges in the g2o open-source library according to the visibility relations of the point features and line features in the frame being solved, where a multi-vertex edge encodes the corresponding relative pose relation among a point feature, a line feature and a camera;
calculating the Jacobian matrix of the point-feature reprojection errors with respect to the camera pose in each frame of image, and calculating the Jacobian matrix of the line-feature reprojection errors with respect to the camera pose in each frame of image, according to the relative pose relations among the point features, line features and cameras in the multi-vertex edges;
and iteratively solving the camera poses and the feature coordinates with an optimization algorithm according to the computed reprojection errors.
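As a stand-in for the g2o-based optimization of claim 7, the last step (iteratively solving the pose from reprojection errors) can be sketched with a generic nonlinear least-squares solver. Here the Jacobians are obtained numerically by the solver rather than analytically as in the claim, only point residuals are shown, and line residuals would simply be appended to the residual vector.

```python
# Iterative pose refinement from reprojection errors; scipy stands in for the g2o optimizer.
import numpy as np
from scipy.optimize import least_squares

def rodrigues(rvec):
    """Rotation matrix from an axis-angle (Rodrigues) vector."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * (Kx @ Kx)

def project_all(x, K_mat, pts3d):
    R, t = rodrigues(x[:3]), x[3:]
    proj = (K_mat @ (pts3d @ R.T + t).T).T
    return proj[:, :2] / proj[:, 2:3]

def residuals(x, K_mat, pts3d, pts2d):
    return (project_all(x, K_mat, pts3d) - pts2d).ravel()

K_mat = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pts3d = np.random.rand(30, 3) * [2.0, 2.0, 2.0] + [-1.0, -1.0, 3.0]
x_true = np.array([0.02, -0.01, 0.0, 0.1, 0.0, 0.05])          # rotation vector + translation
pts2d = project_all(x_true, K_mat, pts3d)                       # synthetic observations

sol = least_squares(residuals, x0=np.zeros(6), args=(K_mat, pts3d, pts2d))
# sol.x approximates x_true.
```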
8. The point-line combined multi-camera visual SLAM method of claim 7, wherein the step of determining a three-dimensional map from the extracted point features, line features and the graph structure further comprises:
screening adjacent frames in the multi-angle image data based on the co-visibility relation and the similarity between adjacent frames to obtain a loop-closure candidate group;
detecting the matching area corresponding to the loop-closure candidate group with a bag-of-words loop-closure detection method, correcting the co-visibility relation of the current frame according to the detection result, updating the coordinate values of the feature points in the matching area, and correcting the position of the loop closure in the world coordinate system;
updating the common field of view and the connection relations among past frames in the graph structure according to the detected loop closure;
and clipping the extracted straight lines, in combination with the updated graph structure, using the stored endpoint coordinates of the line features on the multi-angle images, and determining the three-dimensional map from the clipped line segments.
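One reasonable way to realize the final clipping step of claim 8 (an assumption, not necessarily the patent's exact method) is to back-project the stored 2D endpoints as viewing rays and trim the infinite 3D line to the points on it that are closest to those rays:

```python
# Clip an infinite 3D line to a finite segment using the stored 2D endpoint observations.
import numpy as np

def closest_point_on_line_to_ray(p0, d, c, r):
    """Point on line p0 + t*d closest to ray c + s*r (directions are normalized here).
    Undefined when the line and the ray are parallel (denominator goes to zero)."""
    d, r = d / np.linalg.norm(d), r / np.linalg.norm(r)
    w = p0 - c
    b = d @ r
    denom = 1.0 - b * b
    t = (b * (r @ w) - (d @ w)) / denom
    return p0 + t * d

def clip_line(p0, d, cam_center, K_inv, uv_a, uv_b, R_wc):
    """Clip the 3D line (p0, d) to the segment seen between pixels uv_a and uv_b."""
    ends = []
    for uv in (uv_a, uv_b):
        ray_cam = K_inv @ np.array([uv[0], uv[1], 1.0])   # viewing ray in the camera frame
        ray_world = R_wc @ ray_cam                          # rotate into the world frame
        ends.append(closest_point_on_line_to_ray(p0, d, cam_center, ray_world))
    return ends

K_inv = np.linalg.inv(np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]]))
seg = clip_line(np.array([0.0, 0.0, 3.0]), np.array([1.0, 0.0, 0.0]),
                np.zeros(3), K_inv, (220.0, 240.0), (420.0, 240.0), np.eye(3))
```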
9. An information processing apparatus, comprising a processor and a storage medium communicatively coupled to the processor, the storage medium being adapted to store a plurality of instructions; the processor is adapted to invoke the instructions in the storage medium to perform the steps of the point-line combined multi-camera visual SLAM method of any one of claims 1-8.
10. A computer-readable storage medium having one or more programs stored thereon, the one or more programs being executable by one or more processors to perform the steps of the point-line combined multi-camera visual SLAM method of any one of claims 1-8.
CN202010819166.1A 2020-08-14 2020-08-14 Point-line combined multi-camera visual SLAM method, equipment and storage medium Pending CN112085790A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010819166.1A CN112085790A (en) 2020-08-14 2020-08-14 Point-line combined multi-camera visual SLAM method, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010819166.1A CN112085790A (en) 2020-08-14 2020-08-14 Point-line combined multi-camera visual SLAM method, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112085790A true CN112085790A (en) 2020-12-15

Family

ID=73727925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010819166.1A Pending CN112085790A (en) 2020-08-14 2020-08-14 Point-line combined multi-camera visual SLAM method, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112085790A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106909877A * 2016-12-13 2017-06-30 浙江大学 Simultaneous visual mapping and localization method based on combined point-line features
CN110490085A * 2019-07-24 2019-11-22 西北工业大学 Fast pose estimation algorithm for a point-line feature visual SLAM system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALBERT PUMAROLA et al.: "PL-SLAM: Real-time monocular visual SLAM with points and lines", 2017 IEEE International Conference on Robotics and Automation (ICRA) *
刘康 (Liu Kang): "Research on vision-based real-time indoor localization and mapping technology for mobile robots", China Master's Theses Full-text Database, Information Science and Technology *
秦梓杰 (Qin Zijie): "Research on a SLAM system based on a multi-lens combined panoramic camera", China Master's Theses Full-text Database, Basic Sciences *
谢晓佳 (Xie Xiaojia): "A binocular visual SLAM method based on combined point-line features", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112902950A (en) * 2021-01-21 2021-06-04 武汉大学 Novel initial alignment method for MEMS-level IMU in low-speed motion carrier
CN112902950B (en) * 2021-01-21 2022-10-21 武汉大学 Initial alignment method for MEMS-level IMU in low-speed motion carrier
CN113450412A (en) * 2021-07-15 2021-09-28 北京理工大学 Visual SLAM method based on linear features
CN113886402A (en) * 2021-12-08 2022-01-04 成都飞机工业(集团)有限责任公司 Aviation wire harness information integration method based on branches and readable storage medium
CN113886402B (en) * 2021-12-08 2022-03-15 成都飞机工业(集团)有限责任公司 Aviation wire harness information integration method based on branches and readable storage medium
CN114170366A (en) * 2022-02-08 2022-03-11 荣耀终端有限公司 Three-dimensional reconstruction method based on dotted line feature fusion and electronic equipment
CN114170366B (en) * 2022-02-08 2022-07-12 荣耀终端有限公司 Three-dimensional reconstruction method based on dotted line feature fusion and electronic equipment
CN115311353A (en) * 2022-08-29 2022-11-08 上海鱼微阿科技有限公司 Multi-sensor multi-handle controller graph optimization tight coupling tracking method and system
CN115311353B (en) * 2022-08-29 2023-10-10 玩出梦想(上海)科技有限公司 Multi-sensor multi-handle controller graph optimization tight coupling tracking method and system
CN116681733A (en) * 2023-08-03 2023-09-01 南京航空航天大学 Near-distance real-time pose tracking method for space non-cooperative target
CN116681733B (en) * 2023-08-03 2023-11-07 南京航空航天大学 Near-distance real-time pose tracking method for space non-cooperative target

Similar Documents

Publication Publication Date Title
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN112085790A (en) Point-line combined multi-camera visual SLAM method, equipment and storage medium
CN110135455B (en) Image matching method, device and computer readable storage medium
US10395383B2 (en) Method, device and apparatus to estimate an ego-motion of a video apparatus in a SLAM type algorithm
CN108805917B (en) Method, medium, apparatus and computing device for spatial localization
CN112444242B (en) Pose optimization method and device
US11210804B2 (en) Methods, devices and computer program products for global bundle adjustment of 3D images
CN108171791B (en) Dynamic scene real-time three-dimensional reconstruction method and device based on multi-depth camera
CN110070564B (en) Feature point matching method, device, equipment and storage medium
CN111354042A (en) Method and device for extracting features of robot visual image, robot and medium
WO2021043213A1 (en) Calibration method, device, aerial photography device, and storage medium
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
CN111144349B (en) Indoor visual relocation method and system
CN112880687A (en) Indoor positioning method, device, equipment and computer readable storage medium
CN109785373B (en) Speckle-based six-degree-of-freedom pose estimation system and method
CN108519102A (en) A kind of binocular vision speedometer calculation method based on reprojection
Frohlich et al. Absolute pose estimation of central cameras using planar regions
CN111798373A (en) Rapid unmanned aerial vehicle image stitching method based on local plane hypothesis and six-degree-of-freedom pose optimization
Zhu et al. Robust plane-based calibration of multiple non-overlapping cameras
CN115082617A (en) Pipeline three-dimensional reconstruction method and device based on multi-view optimization and storage medium
Fang et al. Self-supervised camera self-calibration from video
CN111899345A (en) Three-dimensional reconstruction method based on 2D visual image
Tsaregorodtsev et al. Extrinsic camera calibration with semantic segmentation
Bergmann et al. Gravity alignment for single panorama depth inference
CN117197333A (en) Space target reconstruction and pose estimation method and system based on multi-view vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201215)