CN112016612A - Monocular depth estimation-based multi-sensor fusion SLAM method - Google Patents
Info
- Publication number: CN112016612A
- Application number: CN202010871686.7A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/25 — Pattern recognition: fusion techniques
- G01C22/00 — Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers
- G01S11/12 — Systems for determining distance or velocity not using reflection or reradiation, using electromagnetic waves other than radio waves
- G01S17/86 — Combinations of lidar systems with systems other than lidar, radar or sonar
- G01S17/93 — Lidar systems specially adapted for anti-collision purposes
- G06F18/22 — Pattern recognition: matching criteria, e.g. proximity measures
- G06N3/045 — Neural networks: combinations of networks
- G06N3/08 — Neural networks: learning methods
- G06T7/55 — Depth or shape recovery from multiple images
- G06V10/44 — Local feature extraction by analysis of parts of the pattern (edges, contours, corners); connectivity analysis
- G06V10/56 — Extraction of image or video features relating to colour
- G06T2207/10016 — Image acquisition modality: video; image sequence
- G06T2207/10024 — Image acquisition modality: color image
- G06T2207/10032 — Image acquisition modality: satellite or aerial image; remote sensing
Abstract
The application provides a monocular depth estimation-based multi-sensor fusion SLAM method, which comprises the following steps: firstly, acquiring an RGB image, laser point cloud information and mileage information of the current environment, and performing monocular depth estimation on the RGB image to acquire corresponding depth information; secondly, synchronizing the topics corresponding to the depth information, the laser point cloud information and the mileage information in the ROS system; thirdly, establishing new nodes according to the information transmitted in each synchronized topic to obtain a local grid map; then, carrying out loop detection on the local grid map; and finally, fusing the local grid maps after loop detection to create a global map. By creating the global map with a sensor combination of a laser radar and a pseudo-RGBD camera obtained through monocular depth estimation, the method reduces the economic cost and equipment size compared with directly using an RGBD camera, or a combination of a binocular camera and a laser radar sensor.
Description
Technical Field
The application relates to the technical field of mobile robots, in particular to a multi-sensor fusion SLAM method based on monocular depth estimation.
Background
Multi-sensor fusion is one of the future development directions in robot positioning and navigation, and within multi-sensor fusion, the fusion of laser radar and visual information is an important part. In existing approaches, visual images are obtained through an RGBD camera, a binocular camera, or the like. However, RGBD cameras and binocular cameras, which can directly obtain depth information, are expensive and bulky; for a low-cost robot or a small-sized robot, they pose considerable problems in terms of economic cost and structural design. Although a monocular camera is small in size and low in cost, it cannot directly acquire depth information. It is therefore desirable to provide a solution to this problem.
Disclosure of Invention
The application aims to provide a monocular depth estimation-based multi-sensor fusion SLAM method, which is used for achieving the technical effects of obtaining depth information of surrounding environment through a monocular sensor and reducing cost and equipment size.
The embodiment of the application provides a multi-sensor fusion SLAM method based on monocular depth estimation, which comprises the following steps:
step 1, acquiring an RGB image, laser point cloud information and mileage information of a current environment, and performing monocular depth estimation on the RGB image to acquire corresponding depth information;
step 2, synchronizing topics corresponding to the depth information, the laser point cloud information and the mileage information in an ROS system;
step 3, establishing new nodes according to the information transmitted in each synchronized topic to obtain a local grid map;
step 4, carrying out loop detection on the local grid map;
and step 5, fusing the local grid maps subjected to loop detection to create a global map.
The beneficial effects that the application can realize are as follows: by creating the global map with a sensor combination of a laser radar and a pseudo-RGBD camera obtained through monocular depth estimation, the application reduces the economic cost and equipment size compared with directly using an RGBD camera, or a combination of a binocular camera and a laser radar sensor.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a multi-sensor fusion SLAM method based on monocular depth estimation according to an embodiment of the present disclosure;
fig. 2 is a schematic view of a monocular depth estimation process according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a new node creation process provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of the results of an analysis comparison experiment provided in the examples of the present application;
fig. 5 is a schematic diagram of a loopback detection result provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a multi-sensor fusion SLAM method based on monocular depth estimation according to an embodiment of the present disclosure.
The embodiment of the application provides a multi-sensor fusion SLAM method based on monocular depth estimation, which comprises the following steps:
step 1, acquiring an RGB image, laser point cloud information and mileage information of a current environment, and performing monocular depth estimation on the RGB image to acquire corresponding depth information;
step 2, synchronizing topics corresponding to the depth information, the laser point cloud information and the mileage information in an ROS system;
step 3, establishing new nodes according to the information transmitted in each synchronized topic to obtain a local grid map;
step 4, carrying out loop detection on the local grid map;
and step 5, fusing the local grid maps subjected to loop detection to create a global map.
In one embodiment, the RGB image of the current environment can be acquired through the monocular sensor, the depth information of the RGB image is analyzed, and then the RGB image is transmitted to the ROS system; acquiring laser point cloud information of the current environment through a laser radar and transmitting the laser point cloud information to an ROS system; and acquiring the mileage information of the robot through an encoder and transmitting the mileage information to the ROS system. The RGB image may be directly obtained by the monocular sensor, and the depth information (including depth information of each pixel point in the depth image) matched with the RGB image obtained by the monocular sensor may be obtained by learning through the CNN neural network.
Considering that different sensors transmit in the ROS system through their respective topics, and that each sensor publishes its topic at a different frequency, a topic synchronization scheme with a certain policy must be used to synchronize the sensor topics. When topic synchronization is performed in step 2, the output sets must first be non-overlapping: assuming two sets M and N, with i denoting a certain moment in the topic, then for all i the two sets should satisfy either Mi ≤ Ni or Ni ≤ Mi. Secondly, between two groups of messages there should be at least one topic for which no other messages are discarded, i.e. the sets should be contiguous. Finally, the set of output topics should depend on the ROS time stamps, not on the actual arrival time of the messages.
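As a rough illustration of such a policy, the following pure-Python sketch greedily matches message time stamps across topics within a fixed tolerance. It is a simplification for exposition only, not the actual ROS message_filters implementation; the function name and the `slop` tolerance parameter are assumptions:

```python
from bisect import bisect_left

def sync_topics(*stamp_lists, slop=0.05):
    """Greedy approximate-time matching across topics.

    Each input is a sorted list of message time stamps for one topic;
    returns tuples whose stamps all lie within `slop` of each other.
    """
    matched = []
    ref, others = stamp_lists[0], stamp_lists[1:]
    for t in ref:
        group = [t]
        for stamps in others:
            i = bisect_left(stamps, t)  # locate nearest candidates by stamp
            best = min(stamps[max(i - 1, 0):i + 1], key=lambda s: abs(s - t))
            group.append(best)
        if max(group) - min(group) <= slop:  # accept only tightly grouped sets
            matched.append(tuple(group))
    return matched
```

Groups whose spread exceeds the tolerance are simply dropped, which mirrors the idea that a synchronized set must be contiguous in time rather than stitched from distant messages.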
Please refer to fig. 2, fig. 2 is a schematic view illustrating a monocular depth estimation process according to an embodiment of the present application.
Monocular depth estimation is to estimate, through a function, the depth information of each pixel point in a monocular image input to that function. Such a function can be briefly described as:

d = f(I)

where d is the depth information of the pixel points and I is the input image. The training step of the function comprises:
firstly, acquiring a real left image and a real right image of a surrounding environment;
secondly, estimating a right disparity map and a left disparity map according to the real left map through a CNN neural network;
and finally, overlapping the right disparity map and the real left map to obtain a right map estimation map, and overlapping the left disparity map and the real right map to obtain a left map estimation map.
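The superposition step above amounts to warping one image with a disparity map. A minimal sketch follows, using nearest-neighbour sampling for brevity where practical implementations use differentiable bilinear sampling; the function name and the sign convention of the disparity are assumptions:

```python
import numpy as np

def warp_with_disparity(src, disp):
    """Synthesize a view by sampling `src` at horizontally shifted positions.

    src:  H x W image
    disp: H x W disparity in pixels
    """
    h, w = src.shape
    xs = np.arange(w)[None, :] + disp              # shifted sample positions
    xs = np.clip(np.round(xs).astype(int), 0, w - 1)  # nearest neighbour + border clamp
    return np.take_along_axis(src, xs, axis=1)
```

For example, a uniform disparity of one pixel shifts every column of the source image by one, with the border value repeated.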
In the above process, the loss functions of the right and left image estimation maps in the estimation process can be analyzed, and the loss functions are reduced to expected values by adjusting the parameters of the CNN neural network.
In particular, the loss function comprises an image reconstruction loss function C_ap, a parallax smoothing loss function C_ds, and a left-right consistency loss function C_lr. Specifically, taking the real left image as an example, its image reconstruction loss function C^l_ap is calculated as follows:

C^l_ap = (1/N) Σ_{i,j} [ α·(1 − SSIM(I^l_ij, Î^l_ij))/2 + (1 − α)·|I^l_ij − Î^l_ij| ]

where C^l_ap represents the first image reconstruction loss function; N represents the number of pixel points; I^l_ij represents the true left image; Î^l_ij represents the left image estimation map; and α weights the SSIM term against the absolute-difference term. In one embodiment, a single-scale SSIM (structural similarity index) with simplified 3 × 3 filtering may be employed, with α taken to be 0.82. The second image reconstruction loss function C^r_ap can be obtained in the same way.
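A minimal NumPy sketch of this reconstruction loss, using a simplified single-window SSIM in place of the 3 × 3-filtered per-pixel SSIM described above; the function names and the stabilizing constants c1, c2 are assumptions:

```python
import numpy as np

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Single-window SSIM over the whole patch -- a simplification of the
    # filtered per-pixel SSIM used in practice.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def reconstruction_loss(real, est, alpha=0.82):
    """C_ap: weighted sum of an SSIM dissimilarity term and an L1 term."""
    ssim_term = (1.0 - ssim(real, est)) / 2.0
    l1_term = np.abs(real - est).mean()
    return alpha * ssim_term + (1.0 - alpha) * l1_term
```

Identical images give a loss of zero, and the loss grows as the estimated view drifts from the real one.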
The parallax smoothing loss function C_ds is calculated as follows:

C^l_ds = (1/N) Σ_{i,j} [ |∂_x d^l_ij|·e^{−|∂_x I^l_ij|} + |∂_y d^l_ij|·e^{−|∂_y I^l_ij|} ]

where C^l_ds represents the first parallax smoothing loss function; ∂_x d^l_ij represents the parallax gradient of the left parallax map in the x direction; ∂_x I^l_ij represents the image gradient of the true left image in the x direction; ∂_y d^l_ij represents the parallax gradient of the left parallax map in the y direction; and ∂_y I^l_ij represents the image gradient of the true left image in the y direction. The second parallax smoothing loss function C^r_ds can be obtained in the same way.
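A minimal NumPy sketch of this edge-aware smoothness term, with finite differences standing in for the gradients; the function name is an assumption:

```python
import numpy as np

def disparity_smoothness_loss(disp, img):
    """C_ds: disparity gradients are penalized, but the penalty is
    down-weighted across image edges via exp(-|image gradient|)."""
    dx_d = np.abs(np.diff(disp, axis=1))  # disparity gradient, x direction
    dy_d = np.abs(np.diff(disp, axis=0))  # disparity gradient, y direction
    dx_i = np.abs(np.diff(img, axis=1))   # image gradient, x direction
    dy_i = np.abs(np.diff(img, axis=0))   # image gradient, y direction
    return (dx_d * np.exp(-dx_i)).mean() + (dy_d * np.exp(-dy_i)).mean()
```

A perfectly flat disparity map incurs zero smoothness loss regardless of the image content, which is the intended behaviour of the term.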
In practice, the left-to-right parallax and the right-to-left parallax of a binocular camera are completely equivalent, yet the CNN neural network receives only a left image as input while simultaneously predicting a left parallax map and a right parallax map. To keep the two predictions coordinated, a penalty for left-right parallax inconsistency is added to the loss function, forcing the left parallax map to equal the projected right parallax map. Adding this strong prior supervision can effectively improve the accuracy of the network's parallax predictions. The left-right consistency loss function C_lr is calculated as follows:

C^l_lr = (1/N) Σ_{i,j} |d^l_ij − d^r_{ij+d^l_ij}|

where C^l_lr represents the first left-right consistency loss function; d^l_ij represents the left parallax map; and d^r_{ij+d^l_ij} represents the reconstructed (projected) disparity map of the left parallax map. The second left-right consistency loss function C^r_lr can be obtained in the same way.
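A minimal NumPy sketch of this consistency term, sampling the right disparity map at the positions indicated by the left disparities; the nearest-neighbour indexing and the sign convention are assumptions:

```python
import numpy as np

def lr_consistency_loss(disp_l, disp_r):
    """C_lr: compare each left disparity with the right disparity sampled
    at the horizontally shifted position the left disparity points to."""
    h, w = disp_l.shape
    xs = np.arange(w)[None, :] + disp_l                # shifted sample positions
    xs = np.clip(np.round(xs).astype(int), 0, w - 1)   # nearest neighbour + clamp
    projected = np.take_along_axis(disp_r, xs, axis=1)
    return np.abs(disp_l - projected).mean()
```

When the two disparity maps are mutually consistent, the projected right disparity matches the left one and the loss vanishes.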
The three loss functions are weighted to obtain the total loss function:

C_s = α_ap·(C^l_ap + C^r_ap) + α_ds·(C^l_ds + C^r_ds) + α_lr·(C^l_lr + C^r_lr)

where C_s represents the total loss function; α_ap represents the weight of the image reconstruction loss function; α_ds represents the weight of the parallax smoothing loss function; and α_lr represents the weight of the left-right consistency loss function. In one embodiment, α_ap may be 0.9, α_ds may be 1, and α_lr may be 0.01.
As shown in fig. 3, fig. 3 is a schematic diagram of a new node creation process provided in the embodiment of the present application.
In one embodiment, the new node may be obtained by a visual odometer, and specifically, the establishment of the new node is performed in sequence according to the following procedures:
first, feature detection is performed. Specifically, a frame of image is acquired, and the GFTT feature is used to perform feature detection on the frame of image, thereby acquiring a feature map.
Secondly, feature matching is performed. Specifically, the features in the feature map are extracted as Binary Robust Independent Elementary Features (BRIEF) descriptors and matched via the NNDR (nearest neighbor distance ratio) test; the feature map contains 3D features with descriptors from the last key frame.
Again, motion estimation is performed. Specifically, a PnP algorithm is used to calculate the feature transformation of the current frame corresponding to the feature map, and the implementation manner is as follows:
λ_i·v_i = R·P_i + t    (1)

where λ_i is the depth of the 2D reference point v_i in the camera coordinate system, with v_i satisfying ||v_i|| = 1; R is the rotation matrix and t is the translation vector; P_i is the corresponding spatial 3D reference point in the world coordinate system.
Next, local BA optimization is performed. Specifically, the displacement result is refined by using the features of all key frames in the feature map.
And then, the pose is updated. Specifically, the output odometry and the coordinate transformation are updated using the feature transformation obtained from motion estimation; that is, the output odometry and the ROS tf-tree transform from the odometry coordinate system to the "base_link" coordinate system are updated. When the covariance is calculated, the median absolute deviation (MAD) of the 3D features is adopted.
finally, motion prediction is performed. Specifically, according to the result of pose updating, motion prediction is carried out on the position of the feature map in the current frame, and then feature matching is carried out again; specifically, after the pose is updated, the position of the feature of the motion model feature map in the current frame can be used for motion prediction, and then feature matching is performed again.
In one embodiment, the present application uses an improved PnP algorithm for motion estimation, as follows. Let a space point P_i in the world coordinate system project to P'_i = [x', y']^T in the image coordinate system. Since there is some error and uncertainty in measuring the location of the projected point, this uncertainty can be expressed by a two-dimensional covariance matrix (where P' corresponds to P'_i):

Σ_P' = [ σ_x²  σ_xy ; σ_xy  σ_y² ]

To introduce this uncertainty into the whole calculation process, two-dimensional covariance matrices are likewise used to express the normalization of the coordinates and the transformation of the coordinate point's space vector in the camera coordinate system. The uncertainty of the reference point is finally expressed as a matrix C.
Writing equation (1) in the form of a homogeneous system of linear equations:

B·u = 0

where B is the coefficient matrix of the homogeneous system and u is the column vector of unknowns formed from the entries of R and t.

The uncertainty of the reference point described above is added to the resulting homogeneous linear system:

B^T·C·B·u = N·u = 0

where u satisfies the constraint ||u|| = 1. Singular value decomposition is then performed on the coefficient matrix N:

N = U·D·V^T

where U, D, and V^T are the factors of the singular value decomposition, with D containing the singular values. The decomposition yields the eigenvector (the right singular vector associated with the smallest singular value), from which the rotation matrix and the translation vector are obtained. The estimated translation t̂ reflects only the translation direction, without distance information, so the distance information is added; the estimated rotation is projected back onto a valid rotation matrix:

R = U_R·V_R^T
and finally, obtaining initial values of a rotation matrix and a translation vector of the camera, constructing an error function, and iteratively searching for an optimal solution by adopting a Gauss-Newton method.
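The core linear-algebra step above, solving N·u = 0 subject to ||u|| = 1, can be sketched with an SVD: the solution is the right singular vector of N associated with its smallest singular value. The function name is an assumption:

```python
import numpy as np

def solve_homogeneous(N):
    """Solve N u = 0 with ||u|| = 1: u is the right singular vector
    associated with the smallest singular value of N."""
    _, _, Vt = np.linalg.svd(N)  # singular values come back in descending order
    return Vt[-1]                # last row of V^T spans the (near-)null space
```

A quick check: building N with a known unit null vector and recovering it (up to sign) confirms the construction.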
In one embodiment, the step of establishing a new node according to the information transmitted in each synchronized topic to obtain the local grid map further includes: checking whether the number of inliers computed during motion estimation falls below a set fixed threshold, and if so, updating the key frames used in feature matching and local BA optimization.

Specifically, if the number of inliers computed during motion estimation is below the fixed threshold, the key frames or feature maps are updated. For the feature map, the update adds the matching features of the new frame and refines their positions through the local BA optimization module. When the size of the feature map exceeds a fixed threshold, the oldest features that do not match the current frame are deleted. If a key frame no longer has any feature in the feature map, it is discarded.
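The feature-map bookkeeping described above can be sketched as follows; the class name, the dict-based storage, and the insertion-order eviction are illustrative assumptions:

```python
from collections import OrderedDict

class FeatureMap:
    """Sketch of feature-map maintenance: add matched features from new
    frames, and evict the oldest features not seen in the current frame
    once a size threshold is exceeded."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.features = OrderedDict()  # feature id -> 3D position, insertion-ordered

    def update(self, matches, current_frame_ids):
        self.features.update(matches)  # add/refresh matched features
        for fid in list(self.features):
            if len(self.features) <= self.max_size:
                break
            if fid not in current_frame_ids:  # oldest feature absent from current frame
                del self.features[fid]
```

Features that still match the current frame survive eviction even if they are old, mirroring the rule that only unmatched features are deleted.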
In one embodiment, the process of fusing the local grid maps after loop detection to create the global map uses a Bayesian filtering algorithm to fuse the local grid maps:

p(state_i | z_{t+1}) = p_obs(state_i | z_{t+1})·p(state_i | z_t) / Σ_state p_obs(state_i | z_{t+1})·p(state_i | z_t)

where z represents an observed value of the local map; state_i is the state of the i-th grid cell; p(state_i | z_{t+1}) and p(state_i | z_t) are the state distributions of the i-th grid cell at time t+1 and time t, respectively; and p_obs(state_i | z_{t+1}) is a set constant. Illustratively, when the state of the i-th grid cell at time t+1 is occupied, p_obs(state_i | z_{t+1}) = 0.8; when the state of the i-th grid cell at time t+1 is free, p_obs(state_i | z_{t+1}) = 0.3.
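A minimal sketch of this per-cell Bayesian fusion for a binary (occupied/free) grid, using the p_obs values given above; the function name is an assumption:

```python
def bayes_update(prior, p_obs):
    """Binary Bayes filter for one grid cell: fuse the prior occupancy
    probability with the inverse sensor model p_obs (e.g. 0.8 for an
    'occupied' observation, 0.3 for a 'free' observation)."""
    num = p_obs * prior
    return num / (num + (1.0 - p_obs) * (1.0 - prior))
```

Repeated 'occupied' observations (p_obs = 0.8) drive a cell's probability monotonically toward 1, while 'free' observations (p_obs = 0.3) pull it back down.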
As shown in fig. 4, fig. 4 is a schematic diagram of the results of the analysis and comparison experiment provided in the embodiment of the present application.
In one embodiment, the application performs algorithmic verification with the freiburg2_pioneer_360 dataset of the Computer Vision Group of the Technical University of Munich (TUM). The dataset is recorded by a Kinect camera mounted on top of a Pioneer robot, which performs circular motion under the control of a remote handle; the dataset also includes lidar data and odometry data of the robot. The experiment uses only the RGB images of the Kinect camera.
Firstly, data playback is carried out in the ROS system using the rosbag-format file of the dataset: the RGB images of the Kinect camera, the laser point cloud information of the lidar, and the mileage information of the odometer are input into the ROS system. The monocular depth estimation module estimates a depth image sequence from the obtained RGB image sequence and inputs it into the ROS system, and the motion of the camera is obtained through fused estimation of the obtained image information and the lidar point cloud information. As can be seen from fig. 4, the cumulative error of the path obtained by using the method of the present application is smaller, and the path is closer to the real path.
As shown in fig. 5, fig. 5 is a schematic diagram of a loop detection result provided in the embodiment of the present application.
As the robot moves, its estimated path still deviates from the actual path, so a loop detection module is needed to reduce the influence of accumulated errors on the robot's path. After loop detection is performed, the robot path is re-adjusted, and a more accurate final result is obtained.
In one embodiment, a bag-of-words method may be used for loop detection. When a new node is created, the program extracts visual features from the RGB image (the same GFTT features as in node building can be used) and quantizes them into an incremental visual vocabulary. The GFTT detector can extract a large amount of feature information in a short time with rich feature diversity; to improve loop-detection efficiency, only the subset of odometry features with the highest response is quantized into visual words. The other features remain in the node so that the loop-closure transformation can be computed when needed. The newly created node is then compared with previous nodes to detect a loop.
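A minimal sketch of the bag-of-words comparison: descriptors are quantized to their nearest visual words, and nodes are compared by the similarity of their word histograms (a cosine score here). The vocabulary is a fixed toy array rather than an incrementally built one, and the function names are assumptions:

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each feature descriptor to its nearest visual word and
    return the node's normalized word histogram."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                                  # nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

def similarity(h1, h2):
    """Cosine similarity between word histograms; a high score between the
    new node and an old one flags a loop-closure candidate."""
    return float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2)))
```

Two nodes viewing the same scene produce near-identical histograms and score close to 1, while nodes with different word distributions score lower.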
In summary, the present application provides a monocular depth estimation-based multi-sensor fusion SLAM method, including: acquiring RGB images, laser point cloud information, and mileage information of the current environment, and performing monocular depth estimation on the RGB images to acquire corresponding depth information; synchronizing the topics corresponding to the depth information, the laser point cloud information, and the mileage information in the ROS system; establishing new nodes according to the information transmitted in each synchronized topic to obtain a local grid map; carrying out loop detection on the local grid map; and fusing the local grid maps after loop detection to create a global map. By creating the global map with a sensor combination of a laser radar and a pseudo-RGBD camera obtained through monocular depth estimation, the method reduces the economic cost and equipment size compared with directly using an RGBD camera, or a combination of a binocular camera and a laser radar sensor.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (6)
1. A multi-sensor fusion SLAM method based on monocular depth estimation is characterized by comprising the following steps:
step 1, acquiring an RGB image, laser point cloud information and mileage information of a current environment, and performing monocular depth estimation on the RGB image to acquire corresponding depth information;
step 2, synchronizing topics corresponding to the depth information, the laser point cloud information and the mileage information in an ROS system;
step 3, establishing new nodes according to the information transmitted in each synchronized topic to obtain a local grid map;
step 4, carrying out loop detection on the local grid map;
and 5, fusing the local grid maps subjected to loop detection to create a global map.
2. The method of claim 1, wherein step 1 comprises:
step 11, acquiring a real left image and a real right image of the surrounding environment;
step 12, estimating a right disparity map and a left disparity map according to the real left map through a CNN neural network;
and step 13, overlapping the right disparity map and the real left map to obtain a right map estimation map, and overlapping the left disparity map and the real right map to obtain a left map estimation map.
3. The method of claim 2, wherein step 1 further comprises:
step 14, analyzing the loss functions of the right graph estimation graph and the left graph estimation graph, and adjusting the parameters of the CNN neural network to reduce the total loss function to an expected value; the method comprises the following steps:
analyzing a first image reconstruction loss function of the estimation process from the true left image and the left image estimation map; analyzing a second image reconstruction loss function of the estimation process from the true right image and the right image estimation map; wherein the first image reconstruction loss function expression is:

C^l_ap = (1/N) Σ_{i,j} [ α·(1 − SSIM(I^l_ij, Î^l_ij))/2 + (1 − α)·|I^l_ij − Î^l_ij| ]

in the formula, C^l_ap represents the first image reconstruction loss function; N represents the number of pixel points; I^l_ij represents the true left image; Î^l_ij represents the left image estimation map; α is the weight between the basic reconstruction error and the similarity error. The second image reconstruction loss function C^r_ap can be obtained in the same way;
analyzing a first parallax smoothness loss function from the disparity gradients of the left disparity map and the image gradients of the real left image, and a second parallax smoothness loss function from the disparity gradients of the right disparity map and the image gradients of the real right image; the first parallax smoothness loss function is

C_ds^l = (1/N) Σ_{i,j} ( |∂_x d_{ij}^l| · e^{−||∂_x I_{ij}^l||} + |∂_y d_{ij}^l| · e^{−||∂_y I_{ij}^l||} )

where C_ds^l is the first parallax smoothness loss function; ∂_x d^l and ∂_y d^l are the disparity gradients of the left disparity map in the x and y directions; and ∂_x I^l and ∂_y I^l are the image gradients of the real left image in the x and y directions. The second parallax smoothness loss function C_ds^r is obtained in the same way;
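This edge-aware smoothness term penalizes disparity gradients except where the image itself has strong edges. A sketch using forward differences as the gradient operator (an assumption; the claim does not fix the discretization):

```python
import numpy as np

def smoothness_loss(disp, img):
    """C_ds: disparity gradients, down-weighted by e^{-|image gradient|}."""
    dx_d = np.abs(np.diff(disp, axis=1))       # |disparity gradient|, x direction
    dy_d = np.abs(np.diff(disp, axis=0))       # |disparity gradient|, y direction
    wx = np.exp(-np.abs(np.diff(img, axis=1))) # edge-aware weights from the image
    wy = np.exp(-np.abs(np.diff(img, axis=0)))
    return (dx_d * wx).mean() + (dy_d * wy).mean()
```

A perfectly constant disparity map incurs zero loss regardless of the image content.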
reconstructing a disparity map from the left disparity map and the right disparity map, and analyzing a first left-right consistency loss function for the left disparity map and a second left-right consistency loss function for the right disparity map; the first left-right consistency loss function is

C_lr^l = (1/N) Σ_{i,j} | d_{ij}^l − d_{ij + d_{ij}^l}^r |

where C_lr^l is the first left-right consistency loss function; d^l is the left disparity map; and d^r_{ij + d^l_ij} is the reconstructed disparity map, i.e. the right disparity map sampled at the position indicated by the left disparity. The second left-right consistency loss function C_lr^r is obtained in the same way;
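A sketch of the consistency check: each left disparity is compared against the right disparity sampled at the position it points to. Nearest-neighbour sampling and the sign of the shift are assumptions for illustration:

```python
import numpy as np

def lr_consistency_loss(disp_l, disp_r):
    """C_lr: |d^l - d^r projected through d^l|, averaged over all pixels."""
    h, w = disp_l.shape
    xs = np.clip(np.round(np.arange(w)[None, :] - disp_l).astype(int), 0, w - 1)
    rows = np.arange(h)[:, None]
    return np.abs(disp_l - disp_r[rows, xs]).mean()
```

Two mutually consistent constant disparity fields yield zero loss; a mismatch shows up directly as the mean absolute difference.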
weighting the first and second image reconstruction loss functions, the first and second parallax smoothness loss functions, and the first and second left-right consistency loss functions to obtain the total loss function

C_s = α_ap (C_ap^l + C_ap^r) + α_ds (C_ds^l + C_ds^r) + α_lr (C_lr^l + C_lr^r)

where C_s is the total loss function; α_ap is the weight of the image reconstruction loss functions; α_ds is the weight of the parallax smoothness loss functions; and α_lr is the weight of the left-right consistency loss functions.
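The weighted combination itself is a one-liner; the default weight values below are placeholders, not taken from the claim:

```python
def total_loss(c_ap, c_ds, c_lr, a_ap=1.0, a_ds=0.1, a_lr=1.0):
    """C_s: weighted sum over both views; each c_* is a (left, right) pair."""
    return a_ap * sum(c_ap) + a_ds * sum(c_ds) + a_lr * sum(c_lr)
```

Training then consists of adjusting the CNN parameters to drive this scalar below the expected value from step 14.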
4. The method of claim 1, wherein step 3 comprises:
step 31, feature detection: acquiring a frame of image and detecting features in it with the GFTT (good features to track) detector to obtain a feature map;
step 32, feature matching: matching features extracted from the feature map, described with BRIEF (Binary Robust Independent Elementary Features) descriptors, using the nearest-neighbour distance ratio (NNDR) test; the feature map contains the 3D features with descriptors from the last key frame;
step 33, motion estimation: computing the transformation of the current frame with respect to the feature map using the PnP (perspective-n-point) algorithm, based on

λ_i v_i = R P_i + t

where λ_i is the depth of the 2D reference point v_i in the camera coordinate system, with ||v_i|| = 1; R is the rotation matrix and t the translation vector; and P_i is the corresponding spatial 3D reference point in the world coordinate system;
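The constraint above can be checked numerically: given a known pose (R, t) and a world point P_i, the camera-frame point R P_i + t decomposes into a depth λ_i and a unit ray v_i. PnP solves the inverse problem (recover R, t from many such pairs); this sketch only illustrates the relation itself:

```python
import numpy as np

def to_unit_ray(P, R, t):
    """Decompose R @ P + t into depth lambda_i and unit ray v_i,
    so that lambda_i * v_i = R @ P + t with ||v_i|| = 1."""
    p_cam = R @ P + t
    lam = np.linalg.norm(p_cam)
    return lam, p_cam / lam
```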
step 34, local BA optimization: refining the estimated displacement using the features of all key frames in the feature map;
step 35, pose updating: updating the output odometry and the coordinate transformation using the feature transformation from motion estimation;
and step 36, motion prediction: predicting feature positions in the current frame from the pose update result using the features of the feature map, then returning to step 32.
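Step 35's pose update is a composition of rigid-body transforms. A sketch with 4x4 homogeneous matrices (the representation is an assumption; the claim does not specify one):

```python
import numpy as np

def se3(R, t):
    """Pack rotation R (3x3) and translation t (3,) into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def update_pose(T_odom, delta):
    # step 35: accumulate the motion-estimation transform into the odometry pose
    return T_odom @ delta
```

Composing two pure translations simply adds them, which gives a quick sanity check.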
5. The method of claim 4, wherein step 3 further comprises:
and step 37, checking whether the inlier count computed during motion estimation falls below a set fixed threshold; if it does, updating the key frames used in step 32 and step 34.
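The keyframe-refresh logic of step 37 reduces to a count and a comparison. Both numeric values below (the 3-pixel reprojection tolerance and the threshold of 20 inliers) are hypothetical; the claim only says the threshold is fixed:

```python
import numpy as np

def count_inliers(reproj_err_px, max_err=3.0):
    # inliers = correspondences whose reprojection error is small enough
    return int(np.sum(np.asarray(reproj_err_px) < max_err))

def needs_new_keyframe(n_inliers, threshold=20):
    # step 37: refresh the keyframe when inlier support drops below the threshold
    return n_inliers < threshold
```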
6. The method according to claim 1, wherein step 5 fuses the local grid maps using a Bayesian filtering algorithm, where z represents an observation of the local map; state_i is the state of the i-th grid cell; p(state_i | z_{t+1}) and p(state_i | z_t) are the state distributions of the i-th cell at times t+1 and t, respectively; and p_obs(state_i | z_{t+1}) is a preset constant.
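The claim names the quantities but omits the update expression. The standard binary Bayes filter for occupancy grids, with a constant inverse-sensor value as described, combines prior and observation in odds form; this is an assumed reconstruction, not the patent's exact formula:

```python
def bayes_update(p_prior, p_obs):
    """One recursive update of a grid cell's occupancy probability:
    fuse the prior p(state_i | z_t) with the fixed inverse-sensor value
    p_obs(state_i | z_{t+1}) by multiplying their odds."""
    odds = (p_prior / (1.0 - p_prior)) * (p_obs / (1.0 - p_obs))
    return odds / (1.0 + odds)
```

With an uninformative prior of 0.5 the update returns the sensor value itself, and repeated consistent observations push the probability monotonically toward certainty.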
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010871686.7A CN112016612A (en) | 2020-08-26 | 2020-08-26 | Monocular depth estimation-based multi-sensor fusion SLAM method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112016612A true CN112016612A (en) | 2020-12-01 |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112446905A (en) * | 2021-01-29 | 2021-03-05 | 中国科学院自动化研究所 | Three-dimensional real-time panoramic monitoring method based on multi-degree-of-freedom sensing association |
CN112561824A (en) * | 2020-12-21 | 2021-03-26 | 中国科学院自动化研究所 | Data alignment restoration method and system for fusion of 2D laser and depth image |
CN113219492A (en) * | 2021-03-30 | 2021-08-06 | 苏州市卫航智能技术有限公司 | Method and system for positioning and navigating river course ship driving |
CN113256807A (en) * | 2021-06-04 | 2021-08-13 | 武汉工控仪器仪表有限公司 | Effective map merging method based on context description conforming to corresponding expansion |
CN114624688A (en) * | 2022-03-15 | 2022-06-14 | 电子科技大学 | Tracking and positioning method based on multi-sensor combination |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120121161A1 (en) * | 2010-09-24 | 2012-05-17 | Evolution Robotics, Inc. | Systems and methods for vslam optimization |
CN106127739A (en) * | 2016-06-16 | 2016-11-16 | 华东交通大学 | A kind of RGB D SLAM method of combination monocular vision |
CN107193279A (en) * | 2017-05-09 | 2017-09-22 | 复旦大学 | Robot localization and map structuring system based on monocular vision and IMU information |
US20170337470A1 (en) * | 2016-05-20 | 2017-11-23 | Magic Leap, Inc. | Method and system for performing convolutional image transformation estimation |
EP3451288A1 (en) * | 2017-09-04 | 2019-03-06 | Universität Zürich | Visual-inertial odometry with an event camera |
CN110443843A (en) * | 2019-07-29 | 2019-11-12 | 东北大学 | A kind of unsupervised monocular depth estimation method based on generation confrontation network |
CN110501017A (en) * | 2019-08-12 | 2019-11-26 | 华南理工大学 | A kind of Mobile Robotics Navigation based on ORB_SLAM2 ground drawing generating method |
US20190377952A1 (en) * | 2016-06-24 | 2019-12-12 | Robert Bosch Gmbh | RGB-D Camera Based Tracking System and Method Thereof |
CN111045017A (en) * | 2019-12-20 | 2020-04-21 | 成都理工大学 | Method for constructing transformer substation map of inspection robot by fusing laser and vision |
CN111089585A (en) * | 2019-12-30 | 2020-05-01 | 哈尔滨理工大学 | Mapping and positioning method based on sensor information fusion |
WO2020111844A2 * | 2018-11-28 | 2020-06-04 | Seoul National University R&DB Foundation | Method and apparatus for enhancing image feature point in visual slam by using object label
CN111462207A (en) * | 2020-03-30 | 2020-07-28 | 重庆邮电大学 | RGB-D simultaneous positioning and map creation method integrating direct method and feature method |
Non-Patent Citations (5)
Title |
---|
Zhuang Yan, Wang Wei, Wang Ke, Xu Xiaodong: "Indoor simultaneous localization and mapping for mobile robots based on laser ranging and monocular vision", Acta Automatica Sinica, no. 06 *
Li Shuang: "Research on vision-based indoor localization algorithms", China Master's Theses Full-text Database, Information Science and Technology Series, pages 49 - 51 *
Wang Huayou; Dai Bo; He Yuqing: "CFD-SLAM: a fast and robust SLAM*** fusing the feature-based and direct methods", High Technology Letters, no. 12 *
Wang Xiaowei; He Lile; Zhao Tao: "Research on mobile robot SLAM based on lidar and binocular vision", Chinese Journal of Sensors and Actuators, no. 03 *
Luo Yanyan; Chen Long: "Laser localization and mapping fused with visual information", Industrial Control Computer, no. 12 *
Legal Events

Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |