CN111369608A - Visual odometer method based on image depth estimation - Google Patents

Visual odometer method based on image depth estimation

Info

Publication number
CN111369608A
Authority
CN
China
Prior art keywords
image
depth
estimation
loss
pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010478460.0A
Other languages
Chinese (zh)
Inventor
王燕清
陈长伟
王寅同
石朝侠
杨鑫
徐创
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xiaozhuang University
Original Assignee
Nanjing Xiaozhuang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xiaozhuang University filed Critical Nanjing Xiaozhuang University
Priority to CN202010478460.0A priority Critical patent/CN111369608A/en
Publication of CN111369608A publication Critical patent/CN111369608A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a visual odometry method based on image depth estimation. To address the scale ambiguity problem common in monocular visual odometry, it proposes an algorithmic approach that combines a depth image with the monocular image to realize a scale consistency constraint. In the network design, long short-term memory units are fused into the convolutional neural units, the depth estimation network is trained on monocular images, a photometric consistency loss is introduced into the loss function, and a smoothness loss term is added to capture more image features and produce more accurate depth images. The estimated depth image is then combined with the original monocular image to realize the scale consistency constraint and train the pose estimation network. Experiments and result analyses are carried out for the depth estimation network and the pose estimation network separately, and the results show that a visual odometry system combined with depth image estimation can alleviate the scale ambiguity problem of monocular visual odometry to a certain extent.

Description

Visual odometer method based on image depth estimation
Technical Field
The invention relates to the technical field of visual odometry, in particular to a visual odometry method based on image depth estimation.
Background
A monocular visual odometry system (i.e., ego-motion estimation of a vehicle or robot from an image sequence captured from a single viewpoint) takes RGB images as input. In computer vision and robotics, however, depth image (depth map) information also provides vital cues for applications such as autonomous driving, virtual reality (VR) and augmented reality (AR). A depth image, also known as a range image, is an image or image channel that contains the distance from the viewpoint to object surfaces in the scene; each pixel value represents the actual distance between the sensor and the object. Compared with binocular visual odometry, traditional monocular visual odometry suffers from an obvious drawback in pose estimation: scale ambiguity. Scale ambiguity means that a monocular visual odometry system cannot determine the absolute length of a translational motion, i.e., the scale factor, simply from correlations between features. Most existing remedies fuse the image measurements with information from other sensors, such as an inertial navigation system (INS) or a GNSS receiver. Although adding other sensors can resolve the scale ambiguity, doing so sacrifices one of the biggest advantages of the monocular visual odometry setup, namely its small size and low cost.
Moreover, most methods based on convolutional neural networks (CNNs) treat depth estimation purely as a single-view task and neglect the important temporal information in monocular or binocular video. Single-view depth estimation is motivated by the fact that humans can perceive depth from a single image, but it ignores motion, which is even more important to humans when inferring distance. In addition, moving objects violate the static-scene assumption made during geometric image reconstruction in monocular visual odometry and therefore degrade its performance.
Disclosure of Invention
1. Network architecture design
The network architecture comprises neural networks that take a monocular depth image and a monocular RGB image as input. The depth image estimation network and the self-motion estimation network are trained with a monocular image frame sequence to realize the scale consistency constraint. The whole process comprises the following steps:
S1: Given two consecutive frame images $I_t$ and $I_{t+1}$, estimate their depth separately with the depth estimation network to obtain the corresponding depth images $D_t$ and $D_{t+1}$.
S2: Take the original image $I_t$ together with its corresponding depth image estimate $D_t$ as the joint input of the self-motion estimation network, which outputs the camera pose prediction $P_t$ at time t.
S3: Convert the estimated pose into a 4 × 4 pose transformation matrix $T_{t \to t+1}$, compute from this transformation matrix and $D_t$ the predicted depth image of the next frame $D_{t+1}^{t}$, and use the consistency loss between $D_{t+1}^{t}$ and $D_{t+1}$ for model training to improve the scale consistency of the pose prediction, as shown in Fig. 1.
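As an illustration only, steps S1-S3 can be sketched as a single training iteration in PyTorch-style code. This is a minimal sketch under stated assumptions; `depth_net`, `pose_net`, `pose_vec_to_mat`, `warp_depth` and `total_loss` are placeholders for the components described above, not the actual implementation of the invention.

```python
import torch

def training_step(depth_net, pose_net, pose_vec_to_mat, warp_depth, total_loss,
                  I_t, I_t1, optimizer):
    """One training iteration following steps S1-S3 (illustrative sketch)."""
    # S1: estimate depth for the two consecutive frames
    D_t = depth_net(I_t)                          # depth image for frame t
    D_t1 = depth_net(I_t1)                        # depth image for frame t+1

    # S2: the RGB frame and its depth estimate are fed jointly
    #     into the self-motion network to predict the pose at time t
    P_t = pose_net(torch.cat([I_t, D_t], dim=1))  # 6-D pose vector

    # S3: convert the pose to a 4x4 transform, warp D_t towards frame t+1
    #     and penalise the inconsistency with D_t1
    T_t_t1 = pose_vec_to_mat(P_t)                 # 4x4 pose transformation matrix
    D_t1_from_t = warp_depth(D_t, T_t_t1)         # predicted depth of the next frame
    loss = total_loss(D_t, D_t1, D_t1_from_t, P_t, I_t)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```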
2. Depth estimation network
The depth estimation network adopts a self-encoding-decoding U-shaped network architecture, as shown in Fig. 2. In the invention, a recurrent neural unit is fused with the encoder unit to form a long short-term memory unit that serves as the encoding part of the network, so that spatial and temporal information are utilized simultaneously. The spatio-temporal features computed by the encoder are then fed into the decoder network for accurate depth image estimation and reconstruction; the decoder fuses low-level feature representations from different levels of the encoder through skip connections. Fig. 3 shows the specific parameter settings of the neural network architecture for depth estimation.
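As a rough sketch of this encoder-decoder idea, the following PyTorch code fuses a convolutional LSTM cell into a small U-shaped network with one skip connection. The channel sizes, number of levels and output activation are illustrative assumptions and do not reproduce the parameter settings of Fig. 3.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """A convolutional LSTM cell: a recurrent unit fused with a convolutional encoder block."""
    def __init__(self, in_ch, hidden_ch):
        super().__init__()
        # a single convolution produces the input, forget, output and cell gates
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch, 3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class DepthNet(nn.Module):
    """Toy U-shaped depth estimator with a recurrent bottleneck (assumes H, W divisible by 4)."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.rnn = ConvLSTMCell(64, 64)                        # spatio-temporal features
        self.up2 = nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1)
        self.up1 = nn.ConvTranspose2d(64, 16, 4, stride=2, padding=1)  # 64 = 32 + 32 from the skip
        self.out = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, img, state=None):
        e1 = self.enc1(img)
        e2 = self.enc2(e1)
        if state is None:
            h = torch.zeros(img.size(0), 64, e2.size(2), e2.size(3), device=img.device)
            state = (h, h.clone())
        feat, state = self.rnn(e2, state)                      # temporal information enters here
        d = torch.relu(self.up2(feat))
        d = torch.relu(self.up1(torch.cat([d, e1], dim=1)))    # skip connection from the encoder
        depth = torch.sigmoid(self.out(d))                     # normalised depth-style output
        return depth, state
```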
3. Pose estimation network
The neural network for pose estimation uses a VGG16 convolutional neural network architecture fused with a recurrent neural unit. The visual odometry network has the following characteristics: 1) in this scheme, the input of the visual odometry includes the depth image information of the current frame, which ensures scale consistency of the scene between depth and pose; 2) the input of the visual odometry is the joint representation of the image frame and the depth image at a single time point, while information from previous frames is stored in the hidden layer; 3) the visual odometry network is able to maintain the same scene scale when run over the entire image sequence.
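The pose network can be sketched along the same lines: VGG16 convolutional features extracted from the joint RGB-depth input, a recurrent cell whose hidden state carries the previous-frame information, and a six-dimensional pose head. The adapted first convolution, the pooling and the layer sizes below are assumptions of this sketch, not the settings of Figs. 4 and 5.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PoseNet(nn.Module):
    """Toy pose estimator: VGG16 features on the RGB-D input, an LSTM cell that
    keeps past-frame context in its hidden state, and a 6-D pose output head."""
    def __init__(self, hidden=256):
        super().__init__()
        backbone = vgg16(weights=None).features
        # the joint input is RGB (3 channels) + estimated depth (1 channel),
        # so the first convolution is replaced accordingly (an assumption of this sketch)
        backbone[0] = nn.Conv2d(4, 64, kernel_size=3, padding=1)
        self.backbone = backbone
        self.pool = nn.AdaptiveAvgPool2d(1)          # collapse the spatial dimensions
        self.rnn = nn.LSTMCell(512, hidden)          # hidden state stores previous-frame information
        self.head = nn.Linear(hidden, 6)             # 3-D position + 3-D attitude

    def forward(self, rgb, depth, state=None):
        x = torch.cat([rgb, depth], dim=1)           # joint representation at a single time point
        f = self.pool(self.backbone(x)).flatten(1)   # (B, 512) feature vector
        if state is None:
            h = f.new_zeros(f.size(0), self.rnn.hidden_size)
            state = (h, h.clone())
        h, c = self.rnn(f, state)
        pose = self.head(h)                          # 6-D pose vector for time t
        return pose, (h, c)
```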
4. Loss function
The photometric consistency loss between the predicted depth image $\hat{D}_t$ and the known depth image data is computed to perform supervised training of the depth estimation neural network. Because the photometric loss provides little information in low-texture environments, a smoothness loss term is also added to the depth estimation. In the visual odometry part, the pose estimation loss is computed from the pose information estimated by the network and the ground-truth information provided in the dataset, realizing supervised training of the pose estimation network. A geometric consistency loss is further introduced: the depth image estimated for the previous frame is warped according to the pose transformation matrix, and its difference from the depth image estimated for the next frame is computed. The overall objective loss function is calculated as

$L_{total} = \alpha L_{pc} + \beta L_{smooth} + \gamma L_{pose} + \mu L_{gc}$ (1)

where $L_{pc}$ and $L_{smooth}$ denote the photometric consistency loss and the smoothness loss respectively, $L_{pose}$ denotes the pose estimation loss and $L_{gc}$ the geometric consistency loss. To balance the scale and magnitude of each loss calculation result, a corresponding weight parameter is added for each category of loss; a parameter is also added to control the degree of smoothing of the depth image.
4.1 photometric consistency loss and smoothness loss
The brightness consistency and spatial smoothness priors used in dense-association algorithms are adopted: the photometric difference between the estimated depth image and the actually acquired depth image information is computed and used as a loss function for network training. The photometric consistency loss function is expressed as

$L_{pc} = \frac{1}{|V|} \sum_{p \in V} \rho\left(\hat{D}_t(p), D_t(p)\right)$ (2)

where $|V|$ denotes the number of pixel points in the image and $V$ the set of all pixel points in the image. The L1 norm loss function is selected for the loss calculation. The L1 norm loss, also called least absolute deviation or least absolute error, is computed as the sum of the absolute values of the differences between the estimated values and the target values, which is then minimized. Compared with the L2 norm loss, which sums the squares of the differences, the L1 loss is more robust to outliers. The L1 norm loss in the photometric consistency difference is computed as

$\rho\left(\hat{D}_t(p), D_t(p)\right) = \left|\hat{D}_t(p) - D_t(p)\right|$ (3)
the luminance loss is less in the information quantity provided when the scene is uniformly distributed and the texture is less, more information is generated by calculating multiple differences, and the calculation of smoothness loss is introduced, so that the network can more sensitively sense the edge information in the image, and the accuracy of the output result in a low-texture environment is ensured; the smoothness loss is calculated as follows:
Figure 100002_DEST_PATH_IMAGE018
(4)
wherein
Figure 100002_DEST_PATH_IMAGE019
Representing the first derivative along the spatial direction by which it is ensured that the smoothness is guided by edges in the image.
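Assuming depth tensors of shape (B, 1, H, W) and images of shape (B, 3, H, W), Eqs. (2)-(4) might be implemented as below; the exact normalization and weighting used in the invention may differ.

```python
import torch

def photometric_consistency_loss(pred_depth, gt_depth):
    """L1 consistency loss, Eqs. (2)-(3): mean absolute difference between
    the estimated depth image and the acquired depth image data."""
    return torch.mean(torch.abs(pred_depth - gt_depth))

def smoothness_loss(pred_depth, image):
    """Edge-aware smoothness loss, Eq. (4): depth gradients are down-weighted
    where the image itself has strong gradients, so smoothing is guided by edges."""
    # first derivatives of the depth along the two spatial directions
    d_dx = torch.abs(pred_depth[:, :, :, 1:] - pred_depth[:, :, :, :-1])
    d_dy = torch.abs(pred_depth[:, :, 1:, :] - pred_depth[:, :, :-1, :])
    # image gradients (averaged over colour channels) act as edge weights
    i_dx = torch.mean(torch.abs(image[:, :, :, 1:] - image[:, :, :, :-1]), dim=1, keepdim=True)
    i_dy = torch.mean(torch.abs(image[:, :, 1:, :] - image[:, :, :-1, :]), dim=1, keepdim=True)
    return torch.mean((d_dx * torch.exp(-i_dx)) ** 2) + torch.mean((d_dy * torch.exp(-i_dy)) ** 2)
```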
4.2 pose estimation loss
The pose estimation loss represents the estimated absolute pose as a six-dimensional vector, which consists of a three-dimensional vector representing the position and a three-dimensional vector representing the attitude. The provided ground-truth pose vector $[\mathbf{t}, \boldsymbol{\varphi}]$ is fitted by the estimated pose vector $[\hat{\mathbf{t}}, \hat{\boldsymbol{\varphi}}]$, and the error between the two is computed as the loss function of pose estimation:

$L_{pose} = \left\| \hat{\mathbf{t}} - \mathbf{t} \right\|_2^2 + \kappa \left\| \hat{\boldsymbol{\varphi}} - \boldsymbol{\varphi} \right\|_2^2$ (5)

where the parameter $\kappa$ is a scale factor that balances the displacement error and the rotation error.
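With the 6-D pose vectors laid out as translation followed by rotation, Eq. (5) could be implemented as follows; the value of the balance factor is an arbitrary placeholder.

```python
import torch

def pose_estimation_loss(pred_pose, gt_pose, kappa=100.0):
    """Supervised pose loss, Eq. (5): the 6-D pose splits into a 3-D position part
    and a 3-D attitude part; kappa balances displacement and rotation errors."""
    t_err = torch.sum((pred_pose[:, :3] - gt_pose[:, :3]) ** 2, dim=1)  # displacement error
    r_err = torch.sum((pred_pose[:, 3:] - gt_pose[:, 3:]) ** 2, dim=1)  # rotation error
    return torch.mean(t_err + kappa * r_err)
```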
4.3 loss of geometric consistency
The geometric consistency loss enhances the geometric consistency of the predicted results. It requires that the depth images of two adjacent frames, $D_t$ and $D_{t+1}$, conform to the same scene structure and minimizes the difference between them. This improves the geometric consistency between sample images within the same training batch, and the consistency is propagated to the whole image sequence through transitivity: for example, if the depth images of $I_t$ and $I_{t+1}$ are kept consistent within one training batch and those of $I_{t+1}$ and $I_{t+2}$ are consistent within another batch, then the consistency of the depth images of $I_t$ and $I_{t+2}$ is also ensured even though they are not necessarily in the same training batch, so that the depth images of the whole image sequence become consistent. During training, the pose estimation network and the depth estimation network are naturally coupled and can generate predictions with a consistent scale over the whole image sequence. According to this constraint, the inconsistency of the depth images of adjacent frames is computed. For any pixel point $p$ in the depth image, the depth difference between adjacent frames $D_{diff}(p)$ is defined as

$D_{diff}(p) = \frac{\left| D_{t+1}^{t}(p) - D_{t+1}(p) \right|}{D_{t+1}^{t}(p) + D_{t+1}(p)}$ (6)
where $D_{t+1}$ denotes the depth image of the image frame at time t+1 computed by the depth estimation neural network, and $D_{t+1}^{t}$ denotes the depth image obtained by transforming the depth estimate $D_t$ of the image frame at time t with the pose transformation matrix $T_{t \to t+1}$ from the current time to the next time output by the self-motion estimation network, i.e.

$D_{t+1}^{t} = T_{t \to t+1}\left(D_t\right)$ (7)
Because the camera is in continuous motion, the acquired image scene changes continuously; the validity of the pixel points used in the inconsistency computation is therefore ensured by cropping the depth image, and the $D_{diff}$ values computed for the valid pixel points are summed to normalize the depth image difference. During optimization, points with different absolute depths are treated equally, which is more intuitive than computing absolute distances; the function is symmetric and its value lies between 0 and 1, which keeps the training values stable. Based on the above inconsistency map, the proposed geometric consistency loss is defined as

$L_{gc} = \frac{1}{|V|} \sum_{p \in V} D_{diff}(p)$ (8)
where $V$ represents all pixel points remaining after the matrix transformation and cropping of the depth image, and $|V|$ the number of pixel points in $V$. This formulation guarantees scale consistency between adjacent image pairs by minimizing the geometric distance of the predicted depths, and propagates the consistency to the whole image sequence through training. The self-motion estimation network and the depth estimation network are thus closely linked, and the self-motion estimation network can finally predict trajectories with a globally consistent scale.
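Assuming the warped depth $D_{t+1}^{t}$ of Eq. (7) and a validity mask for the cropped region are already available (the warping itself is not shown), Eqs. (6) and (8) reduce to a few lines:

```python
import torch

def geometric_consistency_loss(warped_depth, next_depth, valid_mask):
    """Geometric consistency loss, Eqs. (6) and (8): a symmetric inconsistency ratio
    bounded between 0 and 1, averaged over the valid (cropped) pixel points."""
    diff = torch.abs(warped_depth - next_depth) / (warped_depth + next_depth)  # Eq. (6)
    valid = valid_mask.float()
    return torch.sum(diff * valid) / valid.sum().clamp(min=1.0)                # Eq. (8)
```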
Advantageous effects
The invention discloses a visual odometry method based on image depth estimation. To address the scale ambiguity problem common in monocular visual odometry, it proposes an algorithmic approach that combines a depth image with the monocular image to realize a scale consistency constraint. In the network design, long short-term memory units are fused into the convolutional neural units, the depth estimation network is trained on monocular images, a photometric consistency loss is introduced into the loss function, and a smoothness loss term is added to capture more image features and produce more accurate depth images. The estimated depth image is then combined with the original monocular image to realize the scale consistency constraint and train the pose estimation network. Experiments and result analyses are carried out for the depth estimation network and the pose estimation network separately, and the results show that a visual odometry system combined with depth image estimation can alleviate the scale ambiguity problem of monocular visual odometry to a certain extent.
Drawings
FIG. 1 is a diagram of a visual odometry network architecture incorporating depth image estimation.
Fig. 2 is a structural design diagram of a depth estimation network.
Fig. 3 is a parameter setting diagram of the depth estimation network.
Fig. 4 is a pose estimation network architecture diagram.
Fig. 5 is a parameter setting diagram of the pose estimation network.
FIG. 6 is a graph of test results under the Eigen split data set.
Fig. 7 is a graph of test results under the KITTI Odometry data set.
Fig. 8 is the trajectory reconstruction result of the pose estimation network model on sequence 01.
Fig. 9 is the trajectory reconstruction result of the pose estimation network model on sequence 05.
Fig. 10 is the trajectory reconstruction result of the pose estimation network model on sequence 09.
Detailed Description
1. Introduction to data set
The performance of the proposed framework is analyzed and evaluated through the experimental results presented here and compared with prior work on depth estimation and visual-odometry pose estimation. Training is mainly carried out on the KITTI raw dataset, which was collected at 10 Hz and contains the original binocular color and grayscale image sequences (unsynchronized and unrectified), the synchronized and rectified binocular color and grayscale image sequences, 3D point cloud information (about 100,000 points per frame, stored as binary floating-point matrices), 3D GPS/IMU data (txt files containing position, velocity, acceleration and meta information), the related camera calibration information, and label information for 3D objects. The whole dataset contains 61 video sequences. For the monocular depth image estimation experiments, both the Eigen split and the KITTI Odometry split of the dataset are used. For evaluating the visual odometry part, the experiments are based on the KITTI Odometry dataset, and the network model is trained with the images of this dataset combined with the depth images generated by the depth estimation network. Note that there are overlapping portions between the Eigen and Odometry splits; the two segmentation methods are described below.
2. Dataset segmentation methods
In the split of Eigen et al., a total of 697 frame images from 28 image sequences are selected as the test set for monocular depth estimation. The remaining 33 scene sequences, comprising 23488 binocular image pairs, form the training set, with the two binocular images treated as images acquired by two separate monocular cameras. Because the image re-projection loss relies on the parallax produced by motion, all static frames whose motion relative to the baseline is less than 0.3 meters are discarded during the data preparation stage.
The Odometry dataset contains 11 image sequences with ground-truth camera poses. When evaluating the pose estimation network, image sequences 00-08 (excluding 03) of this dataset are used as the training set, and sequences 09-10 as the test set.
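The sequence split just described can be written down directly as a small configuration snippet (the variable names are arbitrary):

```python
# KITTI Odometry split used for the pose estimation network:
# sequences 00-08 (excluding 03) for training, 09-10 for evaluation
TRAIN_SEQUENCES = ["00", "01", "02", "04", "05", "06", "07", "08"]
TEST_SEQUENCES = ["09", "10"]
```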
3. Results and analysis of the experiments
Fig. 6 shows the depth estimation results output for different input images; the model outputs fairly accurate depth estimates in different scenes. To demonstrate the robustness of the model, two images of the same object in the same scene under different illumination conditions are input in Fig. 7 (the circled vehicle is under direct sunlight in one image and shaded by trees in the other); the corresponding outputs show that the model can still accurately detect the object in the image under different illumination conditions.
When evaluating the performance of the depth estimation neural network and comparing it with other existing methods, two test-set splits are used: the Eigen split and the KITTI split. The comparison indices are divided into an error part and an accuracy part. The error part comprises the absolute relative error (Abs Rel), squared relative error (Sq Rel), root mean square error (RMSE) and root mean square logarithmic error (RMSE log), for which smaller values indicate better performance; for the accuracy indices, larger values indicate better performance. In the KITTI split, the test set contains 200 images collected from 28 different scenes, each with corresponding ground-truth data. Fig. 6 and Fig. 7 show the test results of the depth estimation network under the Eigen split and the KITTI Odometry split, respectively, compared with existing methods.
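For reference, the listed error metrics follow the standard monocular depth evaluation definitions; the sketch below also includes the usual threshold accuracies δ < 1.25^k as the accuracy indices, which is an assumption since the original text does not enumerate them. `pred` and `gt` are assumed to be depth tensors already restricted to valid ground-truth pixels.

```python
import torch

def depth_error_metrics(pred, gt):
    """Abs Rel, Sq Rel, RMSE, RMSE log and threshold accuracies for depth evaluation."""
    abs_rel = torch.mean(torch.abs(pred - gt) / gt)                            # absolute relative error
    sq_rel = torch.mean((pred - gt) ** 2 / gt)                                 # squared relative error
    rmse = torch.sqrt(torch.mean((pred - gt) ** 2))                            # root mean square error
    rmse_log = torch.sqrt(torch.mean((torch.log(pred) - torch.log(gt)) ** 2))  # RMSE in log space
    ratio = torch.max(pred / gt, gt / pred)
    accuracies = [torch.mean((ratio < 1.25 ** k).float()) for k in (1, 2, 3)]
    return abs_rel, sq_rel, rmse, rmse_log, accuracies
```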
After the depth estimation network has been trained, the pose estimation network, i.e., the visual odometry part, is trained with the output depth images combined with the original RGB images. Figs. 8-10 show the final trajectory reconstruction results, which indicate that the visual odometry combined with depth image estimation can alleviate the scale ambiguity problem of monocular visual odometry to some extent.
It is to be noted that, in the present invention, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A visual odometry method based on image depth estimation is characterized in that: the network architecture comprises neural networks taking a monocular depth image and a monocular RGB image as input, the depth image estimation network and the self-motion estimation network are trained by using a monocular image frame sequence to realize the scale consistency constraint, and the whole process comprises the following steps:
S1, given two consecutive frame images $I_t$ and $I_{t+1}$, estimating their depth separately with the depth estimation network to obtain the corresponding depth images $D_t$ and $D_{t+1}$;
S2, taking the original image $I_t$ together with its corresponding depth image estimate $D_t$ as the joint input of the self-motion estimation network, which outputs the camera pose prediction $P_t$ at time t;
S3, converting the predicted pose into a 4 × 4 pose transformation matrix $T_{t \to t+1}$, computing from this transformation matrix and $D_t$ the predicted depth image of the next frame $D_{t+1}^{t}$, and computing the consistency loss between $D_{t+1}^{t}$ and $D_{t+1}$ for model training to improve the scale consistency of the pose prediction.
2. The visual odometry method based on image depth estimation according to claim 1, characterized in that: the depth estimation network adopts a self-encoding-decoding U-shaped network architecture, a recurrent neural unit is fused with the encoder unit to form a long short-term memory unit serving as the encoding part of the network, so that spatial and temporal information are utilized simultaneously; the spatio-temporal features computed by the encoder are input into the decoder network for accurate depth image estimation and reconstruction, and the decoder fuses low-level feature representations from different levels of the encoder through skip connections.
3. The visual odometry method based on image depth estimation according to claim 1, characterized in that: the neural network for pose estimation uses a VGG16 convolutional neural network architecture fused with a recurrent neural unit, and the visual odometry network is characterized in that: 1) the input of the visual odometry includes the depth image information of the current frame, which ensures scale consistency of the scene between depth and pose; 2) the input of the visual odometry is the joint representation of the image frame and the depth image at a single time point, while information from previous frames is stored in the hidden layer; 3) the visual odometry network is able to maintain the same scene scale when run over the entire image sequence.
4. The visual odometry method based on image depth estimation according to claim 1, characterized in that: the photometric consistency loss between the predicted depth image $\hat{D}_t$ and the known depth image data is computed to perform supervised training of the depth estimation neural network; since the photometric loss provides little information in low-texture environments, a smoothness loss term is also added to the depth estimation; in the visual odometry part, the pose estimation loss is computed from the pose information estimated by the network and the ground-truth information provided in the dataset, realizing supervised training of the pose estimation network; a geometric consistency loss is introduced, the depth image estimated for the previous frame is warped according to the pose transformation matrix, and its difference from the depth image estimated for the next frame is computed; the overall objective loss function is calculated as

$L_{total} = \alpha L_{pc} + \beta L_{smooth} + \gamma L_{pose} + \mu L_{gc}$ (1)

where $L_{pc}$ and $L_{smooth}$ denote the photometric consistency loss and the smoothness loss respectively, $L_{pose}$ denotes the pose estimation loss and $L_{gc}$ the geometric consistency loss; in order to balance the scale and magnitude of each loss calculation result, a corresponding weight parameter is added for each category of loss, and a parameter is also added to control the degree of smoothing of the depth image.
5. The visual odometry method based on image depth estimation according to claim 4, characterized in that: for the photometric consistency loss and smoothness loss, the brightness consistency and spatial smoothness priors used in dense-association algorithms are adopted, the photometric difference between the estimated depth image and the actually acquired depth image information is computed and used as the loss function for network training, and the photometric consistency loss function is expressed as

$L_{pc} = \frac{1}{|V|} \sum_{p \in V} \rho\left(\hat{D}_t(p), D_t(p)\right)$ (2)

where $|V|$ denotes the number of pixel points in the image and $V$ the set of all pixel points in the image; the L1 norm loss function is selected for the loss calculation; the L1 norm loss, also called least absolute deviation or least absolute error, is computed as the sum of the absolute values of the differences between the estimated values and the target values and is minimized; compared with the L2 norm loss, which sums the squares of the differences, the L1 loss is more robust to outliers, and the L1 norm loss in the photometric consistency difference is computed as

$\rho\left(\hat{D}_t(p), D_t(p)\right) = \left|\hat{D}_t(p) - D_t(p)\right|$ (3)

the photometric loss provides little information when the scene is uniform and has little texture, so a smoothness loss term is introduced to make the network more sensitive to edge information in the image and to ensure accurate output in low-texture environments; the smoothness loss is computed as

$L_{smooth} = \sum_{p \in V} \left( e^{-\nabla I_t(p)} \cdot \nabla \hat{D}_t(p) \right)^2$ (4)

where $\nabla$ denotes the first derivative along the spatial directions and $p$ is an arbitrary pixel in the image.
6. The visual odometry method based on image depth estimation according to claim 4, characterized in that: the pose estimation loss represents the estimated absolute pose as a six-dimensional vector, the six-dimensional pose vector consisting of a three-dimensional vector representing the position and a three-dimensional vector representing the attitude; the provided ground-truth pose vector $[\mathbf{t}, \boldsymbol{\varphi}]$ is fitted by the estimated pose vector $[\hat{\mathbf{t}}, \hat{\boldsymbol{\varphi}}]$, and the error between the two is computed as the loss function of pose estimation:

$L_{pose} = \left\| \hat{\mathbf{t}} - \mathbf{t} \right\|_2^2 + \kappa \left\| \hat{\boldsymbol{\varphi}} - \boldsymbol{\varphi} \right\|_2^2$ (5)

where the parameter $\kappa$ is a scale factor that balances the displacement error and the rotation error.
7. The visual odometry method based on image depth estimation according to claim 4, characterized in that: the geometric consistency loss enhances the geometric consistency of the predicted results, requiring the depth images of two adjacent frames $D_t$ and $D_{t+1}$ to conform to the same scene structure and minimizing the difference between them; the geometric consistency between sample images of the same training batch is improved, and the geometric consistency of the whole image sequence is realized through transitivity: for example, if the depth images of $I_t$ and $I_{t+1}$ are kept consistent in one training batch and those of $I_{t+1}$ and $I_{t+2}$ are consistent in another batch, then the consistency of the depth images of $I_t$ and $I_{t+2}$ is also ensured even though they are not necessarily in the same training batch, realizing the consistency of the depth images of the whole image sequence; during training, the pose estimation network and the depth estimation network are naturally coupled and can generate predictions with a consistent scale over the whole image sequence; according to this constraint, the inconsistency of the depth images of adjacent frames is computed, and for any pixel point $p$ in the depth image, the depth difference between adjacent frames $D_{diff}(p)$ is defined as

$D_{diff}(p) = \frac{\left| D_{t+1}^{t}(p) - D_{t+1}(p) \right|}{D_{t+1}^{t}(p) + D_{t+1}(p)}$ (6)

where $D_{t+1}$ denotes the depth image of the image frame at time t+1 computed by the depth estimation neural network, and $D_{t+1}^{t}$ denotes the depth image obtained by transforming the depth estimate $D_t$ of the image frame at time t with the pose transformation matrix $T_{t \to t+1}$ from the current time to the next time output by the self-motion estimation network, i.e.

$D_{t+1}^{t} = T_{t \to t+1}\left(D_t\right)$ (7)

because the camera is in continuous motion, the acquired image scene changes continuously; the validity of the pixel points used in the inconsistency computation is ensured by cropping the depth image, and the $D_{diff}$ values computed for the pixel points are summed to normalize the depth image difference; during optimization, points with different absolute depths are treated equally, which is more intuitive than computing absolute distances; the function is symmetric and its value lies between 0 and 1, which keeps the training values stable; based on the above inconsistency map, the proposed geometric consistency loss is defined as

$L_{gc} = \frac{1}{|V|} \sum_{p \in V} D_{diff}(p)$ (8)

where $V$ represents all pixel points remaining after the matrix transformation and cropping of the depth image and $|V|$ the number of pixel points in $V$; this formulation guarantees scale consistency between adjacent image pairs by minimizing the geometric distance of the predicted depths and propagates the consistency to the whole image sequence through training.
CN202010478460.0A 2020-05-29 2020-05-29 Visual odometer method based on image depth estimation Pending CN111369608A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010478460.0A CN111369608A (en) 2020-05-29 2020-05-29 Visual odometer method based on image depth estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010478460.0A CN111369608A (en) 2020-05-29 2020-05-29 Visual odometer method based on image depth estimation

Publications (1)

Publication Number Publication Date
CN111369608A true CN111369608A (en) 2020-07-03

Family

ID=71211134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010478460.0A Pending CN111369608A (en) 2020-05-29 2020-05-29 Visual odometer method based on image depth estimation

Country Status (1)

Country Link
CN (1) CN111369608A (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111899280A (en) * 2020-07-13 2020-11-06 哈尔滨工程大学 Monocular vision odometer method adopting deep learning and mixed pose estimation
CN111899280B (en) * 2020-07-13 2023-07-25 哈尔滨工程大学 Monocular vision odometer method adopting deep learning and mixed pose estimation
CN112052626A (en) * 2020-08-14 2020-12-08 杭州未名信科科技有限公司 Automatic neural network design system and method
CN112052626B (en) * 2020-08-14 2024-01-19 杭州未名信科科技有限公司 Automatic design system and method for neural network
CN112102399B (en) * 2020-09-11 2022-07-19 成都理工大学 Visual mileage calculation method based on generative antagonistic network
CN112102399A (en) * 2020-09-11 2020-12-18 成都理工大学 Visual mileage calculation method based on generative antagonistic network
CN112150531A (en) * 2020-09-29 2020-12-29 西北工业大学 Robust self-supervised learning single-frame image depth estimation method
CN112308918A (en) * 2020-10-26 2021-02-02 杭州电子科技大学 Unsupervised monocular vision odometer method based on pose decoupling estimation
CN112308918B (en) * 2020-10-26 2024-03-29 杭州电子科技大学 Non-supervision monocular vision odometer method based on pose decoupling estimation
CN112184611A (en) * 2020-11-03 2021-01-05 支付宝(杭州)信息技术有限公司 Image generation model training method and device
CN112561978B (en) * 2020-12-18 2023-11-17 北京百度网讯科技有限公司 Training method of depth estimation network, depth estimation method of image and equipment
CN112561978A (en) * 2020-12-18 2021-03-26 北京百度网讯科技有限公司 Training method of depth estimation network, depth estimation method of image and equipment
CN112819853B (en) * 2021-02-01 2023-07-25 太原理工大学 Visual odometer method based on semantic priori
CN112819853A (en) * 2021-02-01 2021-05-18 太原理工大学 Semantic prior-based visual odometer method
CN113012191B (en) * 2021-03-11 2022-09-02 中国科学技术大学 Laser mileage calculation method based on point cloud multi-view projection graph
CN113012191A (en) * 2021-03-11 2021-06-22 中国科学技术大学 Laser mileage calculation method based on point cloud multi-view projection graph
CN113160294A (en) * 2021-03-31 2021-07-23 中国科学院深圳先进技术研究院 Image scene depth estimation method and device, terminal equipment and storage medium
CN113570658A (en) * 2021-06-10 2021-10-29 西安电子科技大学 Monocular video depth estimation method based on depth convolutional network
WO2023109221A1 (en) * 2021-12-14 2023-06-22 北京地平线信息技术有限公司 Method and apparatus for determining homography matrix, medium, device, and program product
CN114526728A (en) * 2022-01-14 2022-05-24 浙江大学 Monocular vision inertial navigation positioning method based on self-supervision deep learning
CN114526728B (en) * 2022-01-14 2023-12-05 浙江大学 Monocular vision inertial navigation positioning method based on self-supervision deep learning
CN114463420A (en) * 2022-01-29 2022-05-10 北京工业大学 Visual mileage calculation method based on attention convolution neural network
WO2023165093A1 (en) * 2022-03-01 2023-09-07 上海商汤智能科技有限公司 Training method for visual inertial odometer model, posture estimation method and apparatuses, electronic device, computer-readable storage medium, and program product
CN114663509A (en) * 2022-03-23 2022-06-24 北京科技大学 Self-supervision monocular vision odometer method guided by key point thermodynamic diagram
CN114998411A (en) * 2022-04-29 2022-09-02 中国科学院上海微***与信息技术研究所 Self-supervision monocular depth estimation method and device combined with space-time enhanced luminosity loss
CN114998411B (en) * 2022-04-29 2024-01-09 中国科学院上海微***与信息技术研究所 Self-supervision monocular depth estimation method and device combining space-time enhancement luminosity loss
WO2024012405A1 (en) * 2022-07-11 2024-01-18 华为技术有限公司 Calibration method and apparatus
CN117197229A (en) * 2023-09-22 2023-12-08 北京科技大学顺德创新学院 Multi-stage estimation monocular vision odometer method based on brightness alignment
CN117197229B (en) * 2023-09-22 2024-04-19 北京科技大学顺德创新学院 Multi-stage estimation monocular vision odometer method based on brightness alignment
CN117456531B (en) * 2023-12-25 2024-03-19 乐山职业技术学院 Multi-view pure rotation anomaly identification and automatic mark training method, equipment and medium
CN117456531A (en) * 2023-12-25 2024-01-26 乐山职业技术学院 Multi-view pure rotation anomaly identification and automatic mark training method, equipment and medium

Similar Documents

Publication Publication Date Title
CN111369608A (en) Visual odometer method based on image depth estimation
Shamwell et al. Unsupervised deep visual-inertial odometry with online error correction for RGB-D imagery
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN108961327B (en) Monocular depth estimation method and device, equipment and storage medium thereof
US20210142095A1 (en) Image disparity estimation
US20180308240A1 (en) Method for estimating the speed of movement of a camera
CN111311666A (en) Monocular vision odometer method integrating edge features and deep learning
CN110689562A (en) Trajectory loop detection optimization method based on generation of countermeasure network
US11082633B2 (en) Method of estimating the speed of displacement of a camera
CN110675418A (en) Target track optimization method based on DS evidence theory
CN110009674A (en) Monocular image depth of field real-time computing technique based on unsupervised deep learning
CN112233179B (en) Visual odometer measuring method
CN113963240A (en) Comprehensive detection method for multi-source remote sensing image fusion target
CN112802096A (en) Device and method for realizing real-time positioning and mapping
Chen et al. A stereo visual-inertial SLAM approach for indoor mobile robots in unknown environments without occlusions
CN111998862A (en) Dense binocular SLAM method based on BNN
CN114581571A (en) Monocular human body reconstruction method and device based on IMU and forward deformation field
CN117274515A (en) Visual SLAM method and system based on ORB and NeRF mapping
CN117367427A (en) Multi-mode slam method applicable to vision-assisted laser fusion IMU in indoor environment
CN103839280B (en) A kind of human body attitude tracking of view-based access control model information
CN116188550A (en) Self-supervision depth vision odometer based on geometric constraint
CN115482252A (en) Motion constraint-based SLAM closed loop detection and pose graph optimization method
Pirvu et al. Depth distillation: unsupervised metric depth estimation for UAVs by finding consensus between kinematics, optical flow and deep learning
CN112731503A (en) Pose estimation method and system based on front-end tight coupling
CN112307917A (en) Indoor positioning method integrating visual odometer and IMU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200703