CN113108771B - Movement pose estimation method based on closed-loop direct sparse visual odometer

Movement pose estimation method based on closed-loop direct sparse visual odometer

Info

Publication number
CN113108771B
Authority
CN
China
Prior art keywords
frame
pose
point
points
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110245806.7A
Other languages
Chinese (zh)
Other versions
CN113108771A (en)
Inventor
李奎霖
魏武
曾锦秀
肖文煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110245806.7A priority Critical patent/CN113108771B/en
Publication of CN113108771A publication Critical patent/CN113108771A/en
Application granted granted Critical
Publication of CN113108771B publication Critical patent/CN113108771B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C22/00 - Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)
  • Manipulator (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a movement pose estimation method based on a closed-loop direct sparse visual odometer, which comprises the following steps: acquiring an image sequence of the robot's surroundings; initializing the front end of the visual odometer with the first two frames of the image sequence and determining a reference frame; at the front end, solving the relative pose change between frames by a direct method that minimizes the photometric error, and judging whether each frame is a keyframe; adding each new keyframe to the back end of the visual odometer for optimization, where the poses of all keyframes and the activated points are jointly combined to optimize a total photometric error function; marginalizing old keyframes, performing closed-loop detection on a global pose graph, and determining the optimal inter-frame motion pose estimate. The invention enables the robot to solve its motion pose accurately and rapidly in complex environments; pose graph optimization markedly reduces accumulated errors such as rotation, translation and scale drift; and the method is less time-consuming, offers better real-time performance, and is more robust in regions lacking corner points.

Description

Movement pose estimation method based on closed-loop direct sparse visual odometer
Technical Field
The invention relates to the technical field of machine vision and simultaneous localization, and in particular to a movement pose estimation method based on a closed-loop direct sparse visual odometer.
Background
The research field of mobile robots comprises many sub-problems, among which localization and mapping is a key problem that must be solved to realize autonomous motion and obstacle avoidance. Outdoors, a robot can rely to a great extent on information provided by the Global Positioning System (GPS), but GPS accuracy and coverage are limited, so localization and mapping in indoor scenes usually depends on a Simultaneous Localization and Mapping (SLAM) algorithm. Since the 1980s, SLAM has been an active research field in computer vision and robotics, and SLAM systems have become a basic module for many applications requiring real-time localization, such as mobile robots, automatic driving, and virtual and augmented reality. Visual SLAM is popular in part because cameras are readily available in consumer products and passively acquire rich information about the environment.
A SLAM system consists of a visual odometer front end and back-end optimization: the back end creates and maintains a keyframe map and reduces global drift through loop-closure detection and map optimization, while the front end solves the inter-frame motion pose from the incoming video frames. Pose solving in the visual odometer can be divided into two types: indirect (feature-based) methods, which minimize the reprojection error between feature points matched across frames, and direct methods, which jointly estimate the pose by minimizing the photometric error between frames.
Although feature-based methods have long been the mainstay, recent advances in direct methods have shown greater accuracy and robustness, particularly when the image does not contain enough well-defined corner features. Closed-loop detection, on the other hand, generally relies on the traditional feature-based Bag-of-Words (BoW) model.
At present, most visual inertial odometers are based on the feature point method and have the following problems:
the feature point method needs to extract a large number of feature points in each image, perform feature point matching between adjacent image frames, and perform pose estimation according to the matched feature points. Because a large number of feature points are extracted and matched in each frame, the calculation amount of the system is large, and certain influence is exerted on the real-time performance. And the method is very dependent on environmental information, and can hardly work when the acquired image has no good texture features.
Compared with the feature point method, the sparse direct method applied by the invention saves the time otherwise spent on feature point extraction and matching, and therefore offers better real-time performance. It performs better in weak-texture regions lacking corner points, and the application of closed-loop detection markedly reduces accumulated errors such as rotation, translation and scale drift.
Disclosure of Invention
The invention aims to provide a movement pose estimation method based on a closed-loop direct sparse visual odometer that, on the premise of saving cost and ensuring reliability, applies a direct visual odometer calculation method to indoor robot visual localization, so that the robot can solve its pose accurately and rapidly in complex environments.
The invention is realized by at least one of the following technical schemes.
A mobile pose estimation method based on a closed-loop direct sparse visual odometer comprises the following steps:
Step S1, acquiring an image sequence of the robot's surroundings;
Step S2, initializing based on the first two frames of images and constructing a reference frame for the subsequent image sequence;
Step S3, acquiring the relative pose change between frames and judging whether the current frame is a keyframe;
Step S4, adding the new keyframe to the back-end sliding window, jointly constructing a total photometric error function from the poses of all keyframes in the window and the key points extracted from them, and optimizing the initial value of the relative pose estimated by the front end;
Step S5, marginalizing old keyframes, constructing a global pose graph based on the descriptors of all keyframes for closed-loop detection, and estimating the pose of the robot.
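For orientation only, the following minimal sketch (in Python, with hypothetical class and function names that are not part of the invention) shows how steps S1 to S5 can be wired together as one processing loop; the tracking and keyframe-decision bodies are stand-ins, not the direct-method solver itself.

    import numpy as np

    class SlidingWindow:
        """Back-end sliding window of keyframes (steps S4/S5), capped at max_frames."""
        def __init__(self, max_frames=7):
            self.max_frames = max_frames
            self.keyframes = []                          # list of (image, 4x4 world pose) tuples

        def add_keyframe(self, image, pose):
            self.keyframes.append((image, pose))

        def marginalize_if_full(self):
            # Step S5: drop the oldest keyframe once the window exceeds its size limit.
            if len(self.keyframes) > self.max_frames:
                self.keyframes.pop(0)

    def process_sequence(frames):
        """Steps S1 to S5 as one loop; the tracking body is a stand-in, not the direct solver."""
        window = SlidingWindow()
        pose = np.eye(4)                                 # current camera pose in the world frame
        reference = None
        for image in frames:                             # S1: incoming image sequence
            if reference is None:                        # S2: the first frame becomes the reference
                reference = image
                window.add_keyframe(image, pose.copy())
                continue
            increment = np.eye(4)                        # S3: stand-in for the pose change that
            pose = pose @ increment                      #     minimizing the photometric error yields
            is_keyframe = True                           # stand-in keyframe decision
            if is_keyframe:
                window.add_keyframe(image, pose.copy())  # S4: insert into the back-end window
                window.marginalize_if_full()             # S5: marginalize old keyframes
                reference = image
        return pose, window

    if __name__ == "__main__":
        dummy_frames = [np.zeros((8, 8)) for _ in range(10)]
        final_pose, win = process_sequence(dummy_frames)
        print(len(win.keyframes), final_pose[0, 0])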
Preferably, step S2 further comprises initializing the camera sensor input at the front end of the odometer. Initialization requires two frames of images. The first frame serves as the reference frame, i.e. the world coordinate system is set to the camera coordinate system of the first frame; a Gaussian pyramid is constructed for the image, the gradient histogram of all pixel points in level 0 of the pyramid is computed, and a threshold derived from this histogram is used as the gradient threshold to extract gradient points level by level. A nearest-neighbor index (using FLANN, the Fast Library for Approximate Nearest Neighbors) is constructed for each gradient point, the s closest points in the same level are found to construct a neighborhood relation, and the point with the minimum distance to the current point is found in the nearest-neighbor index of the next level and set as the associated point, so that the current level is associated with the next level, which facilitates the propagation of the optical-flow pyramid and the inverse depth values.
Preferably, in step S2, let the second frame be I_v and the first frame be I_u. Based on the key points with the largest gradients selected in I_u, an inter-frame photometric error function is constructed for I_v:

E_{u,v,w} = ω_p (I_v[p'] - a_{vu} I_u[p] - b_{vu})

where ω_p is a compensation weighting parameter, p is a gradient point in I_u, p' denotes the back-projected coordinates of p in the current frame I_v, a_{vu} is the exposure-time correction parameter, and b_{vu} is the image brightness baseline correction parameter;
the photometric error is iteratively minimized by the Gauss-Newton method; if the optimization does not converge, initialization is judged to have failed, the first frame is reset and the process is repeated; otherwise initialization succeeds, the first frame is set as a keyframe, the gradient points extracted from it are activated and added to the sliding-window optimization, and the second frame enters the sliding-window optimization after initialization.
Preferably, the gradient points are divided into activated points and immature points; an immature point is a point that has just been extracted from the image or whose depth has not yet converged. Immature points are continuously filtered between frames by a depth filter so that their depths gradually converge, and the resulting points are activated and added to the sliding window.
Preferably, step S3 further comprises solving the pose of each newly input frame with respect to the latest keyframe, i.e. projecting the gradient points of the latest keyframe into the current frame to construct the photometric error function:

E_{u,v,w} = ω_p (I_v[p'] - a_{vu} I_u[p] - b_{vu})

where ω_p is a compensation weighting parameter, p is a gradient point in I_u, p' denotes the back-projected coordinates of p in the current frame I_v, a_{vu} is the exposure-time correction parameter, and b_{vu} is the image brightness baseline correction parameter; the photometric error is iteratively minimized by the Gauss-Newton method, and if the optimization converges, the new frame is taken as a keyframe and added to the sliding window for optimization.
Preferably, in step S4, each point is projected into all keyframes in the inter-frame pose solution by the sparse direct method or a direct optical-flow matching method, and its residual in each frame is calculated; the back end uses a sliding window composed of several system keyframes, and the poses of all keyframes and the activated points in the sliding window are jointly combined to optimize a total photometric error function. When the front-end thread judges that a frame meets the conditions and inserts it into the back end, the minimum photometric error to be optimized is defined as follows:
let the image frame set S ═ T 1 ,T 2 …T j ,p 1 ,p 2 …p k In which T is j Representing the pose, p, of the keyframe j k Representing coordinates of pixel points in a keyframe, minimizing photometric errors
Figure BDA0002964042400000041
The method comprises the following steps:
Figure BDA0002964042400000042
wherein, t u 、t v Is an image I u 、I v A corresponding exposure time;
Figure BDA0002964042400000043
is a collection of pixel points, p w Is a pixel point in the set, p is I v Middle pixel point, a v Is I v Exposure time correction parameter of a u Is I u Exposure time correction parameter of (b) v Is I v Luminance floor correction parameter, b u Is I u A luminance floor correction parameter; omega p To compensate for the weighting parameter; p' is a constant value containing an inverse depth parameter d p In the image I at the pixel coordinate p v The reprojection coordinates of (c).
Preferably, the reprojection coordinate p' is given by:

p' = Π(R Π^{-1}(p, d_p) + t)

where R and t are the rotation matrix and translation vector, computed from the transformation matrices T_u and T_v, and Π and Π^{-1} denote the projection and back-projection transforms, respectively.
Preferably, in step S5, old frames and old points are removed by a marginalization strategy: if a point is no longer within the camera's field of view, the point is marginalized; if the number of frames in the sliding window exceeds a set threshold, one frame is selected and marginalized; when a frame is marginalized, the map points hosted by that frame are removed.
Preferably, step S5 performs closed-loop detection using pose graph optimization: the global poses are added to the sliding window to maintain the connections between keyframes, and the relative pose transformations of the keyframes in the sliding window are taken as the pose-pair measurement data. Based on a bag-of-words approach, a matching method that combines ORB features with the original point-selection strategy is used; after a loop matching candidate is selected, the similarity transformation from the candidate to the current keyframe is calculated and added to the global pose graph, and the pose of the camera is estimated through the optimization.
Preferably, 3 pairs of matching points are used to solve the similarity transformation, i.e. the rotation matrix, translation vector and scale between the two coordinate systems (a Sim(3) transformation). Let the similarity transformation from the matching candidate to the current keyframe be S_cr; then S_cr is obtained by minimizing the cost function E_l:

E_l = ω (S_cr Π^{-1}(p, d_p) - Π^{-1}(q, d_q))

where p and q represent matched pixel points in the matching candidate and the current keyframe, d_p and d_q represent the inverse depths corresponding to p and q, and ω is a balance weight parameter.
Compared with the prior art, the invention has the beneficial effects that:
the sparse direct method applied by the invention saves the time for subsequent feature point extraction and feature point matching, and has better real-time performance. The method has better performance in weak texture regions lacking corner points, and meanwhile, the accumulated errors of rotation, translation, scale drift and the like are remarkably reduced by applying closed-loop detection. The real-time performance is better, and the robustness is stronger in the area lacking the corner points.
Drawings
FIG. 1 is a flow chart of a method for estimating a mobile pose based on a closed-loop direct sparse visual odometer according to the present embodiment;
fig. 2 is a closed-loop detection framework diagram of the present embodiment.
Detailed Description
The invention is further described below with reference to the accompanying drawings. It should be noted that the following description gives detailed embodiments and specific operation procedures, and is intended to describe the present application in detail, but the scope of the present invention is not limited to this embodiment.
As shown in FIG. 1, the movement pose estimation method based on a closed-loop direct sparse visual odometer comprises the following steps:
s1, acquiring a robot peripheral image sequence through a Kinect camera;
in order to further verify the effectiveness and accuracy of the invention, the scheme of the invention is transplanted to a mobile robot platform Turtlens and a Kinect camera sensor is carried. Field experiments were performed in an indoor corridor environment.
As another example, the Kinect camera may be replaced with a MYNT EYE smart camera (depth edition) as the vision sensor.
Step S2, initializing based on the first two frames of images, and constructing a reference frame of a subsequent image sequence;
the method comprises the steps that firstly, input of a camera sensor needs to be initialized, two frames of images are needed in an initialization stage, the first frame is used as a reference frame, namely a Gaussian pyramid is constructed for the images according to a world coordinate system of a camera coordinate system where the first frame is located, gradient histograms of all pixel points in a 0 th layer of the image pyramid are counted, the number of bits is used as a gradient threshold value, points with larger gradient are extracted from the images layer by layer, a Nearest neighbor (Fast Library for application neighbor Neighbors) index is constructed for each gradient point, and 10 points closest to the gradient points are found in the same layer to construct a neighborhood relation.
As another embodiment, 20 points closest to the points can be found to construct a neighborhood relationship, so that the neighborhood range is improved, and the matching accuracy is improved.
Finding a point with the minimum distance from the current point in the nearest neighbor index of the next layer, and setting the point as an associated point, wherein the aim is to associate the current layer with the next layer, so that the propagation of an optical flow pyramid and an inverse depth value is facilitated;
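A minimal sketch of this selection step is given below, under two stated assumptions: the gradient threshold is taken as a fixed percentile of the gradient-magnitude histogram, and a KD-tree stands in for the FLANN index; both are illustrative placeholders rather than the exact implementation.

    import numpy as np
    from scipy.spatial import cKDTree

    def select_gradient_points(image, percentile=90.0, k_neighbors=10):
        """Pick high-gradient pixels of one pyramid level and index their neighborhoods.

        The threshold is derived from the gradient-magnitude histogram (here a fixed
        percentile) and each selected point is linked to its k_neighbors closest
        selected points; a KD-tree stands in for the FLANN index.
        """
        gy, gx = np.gradient(image.astype(np.float64))
        magnitude = np.hypot(gx, gy)
        threshold = np.percentile(magnitude, percentile)     # histogram-derived gradient threshold
        vs, us = np.nonzero(magnitude > threshold)           # candidate high-gradient pixels
        points = np.stack([us, vs], axis=1).astype(np.float64)
        tree = cKDTree(points)                               # nearest-neighbor index for this level
        k = min(k_neighbors + 1, len(points))
        _, idx = tree.query(points, k=k)
        neighbors = idx[:, 1:]                               # drop each point itself from its neighborhood
        return points, neighbors, tree

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        img = rng.random((120, 160))
        pts, nbrs, _ = select_gradient_points(img)
        print(pts.shape, nbrs.shape)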
let the second frame be I v The first frame is I u ,I v Based on I u And constructing an interframe luminosity error function by using the selected key points with larger gradients:
E u,v,w =ω p (I v [p`]-a vu I u [p]-b vu )
wherein ω is p To compensate for the weighting parameter, p is I u The point with larger gradient in the middle, p' represents that p is in the current frame I v Back-projected coordinates of (1) vu Correcting the parameters for the exposure time, b vu Correcting parameters for the image brightness basis;
iteratively minimizing the photometric error by a gauss-newton method; if the error is optimized and not converged, judging that the initialization fails, resetting the first frame and repeating the process; otherwise, the initialization is successful, the first frame is set as a key frame, the gradient points extracted from the first frame are activated, and the gradient points are added into the optimization of the sliding window; the second frame enters sliding window optimization after initialization.
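To make the minimization concrete, the following sketch evaluates the residual I_v[p'] - a·I_u[p] - b over a set of selected points and runs a few Gauss-Newton iterations. It is a heavily simplified illustration, not the implementation of the invention: the rotation is held at the identity so only a translation and the brightness parameters a, b are estimated, the weight ω_p is taken as 1, the intrinsics and test images are synthetic, and the Jacobian is formed by finite differences.

    import numpy as np

    K = np.array([[300.0, 0.0, 80.0],
                  [0.0, 300.0, 60.0],
                  [0.0, 0.0, 1.0]])                      # illustrative pinhole intrinsics

    def bilinear(img, u, v):
        """Sample img at the sub-pixel location (u, v) with bilinear interpolation."""
        u = np.clip(u, 0.0, img.shape[1] - 1.001)
        v = np.clip(v, 0.0, img.shape[0] - 1.001)
        u0, v0 = int(u), int(v)
        du, dv = u - u0, v - v0
        return ((1 - du) * (1 - dv) * img[v0, u0] + du * (1 - dv) * img[v0, u0 + 1]
                + (1 - du) * dv * img[v0 + 1, u0] + du * dv * img[v0 + 1, u0 + 1])

    def residuals(params, I_u, I_v, pts, inv_depths, K):
        """Photometric residuals I_v[p'] - a*I_u[p] - b under a pure-translation motion model."""
        tx, ty, tz, a, b = params
        res = []
        for (u, v), d in zip(pts, inv_depths):
            z = 1.0 / d
            X = np.array([(u - K[0, 2]) / K[0, 0] * z,
                          (v - K[1, 2]) / K[1, 1] * z, z])
            X = X + np.array([tx, ty, tz])               # rotation held at the identity for brevity
            up = K[0, 0] * X[0] / X[2] + K[0, 2]         # p' = Pi(Pi^-1(p, d_p) + t)
            vp = K[1, 1] * X[1] / X[2] + K[1, 2]
            res.append(bilinear(I_v, up, vp) - a * I_u[int(v), int(u)] - b)
        return np.array(res)

    def gauss_newton(I_u, I_v, pts, inv_depths, K, iters=10):
        """Iteratively minimize the summed squared photometric error over (t, a, b)."""
        x = np.array([0.0, 0.0, 0.0, 1.0, 0.0])          # start: zero translation, a = 1, b = 0
        for _ in range(iters):
            r = residuals(x, I_u, I_v, pts, inv_depths, K)
            J = np.zeros((len(r), len(x)))
            for j in range(len(x)):                      # numerical Jacobian, column by column
                dx = np.zeros_like(x)
                dx[j] = 1e-5
                J[:, j] = (residuals(x + dx, I_u, I_v, pts, inv_depths, K) - r) / 1e-5
            x = x - np.linalg.solve(J.T @ J + 1e-8 * np.eye(len(x)), J.T @ r)
        return x, float(r @ r)

    if __name__ == "__main__":
        I_u = np.fromfunction(lambda v, u: np.sin(u / 12.0) + np.cos(v / 9.0), (120, 160))
        I_v = np.roll(I_u, 2, axis=1)                    # crude synthetic inter-frame motion
        pts = [(float(u), float(v)) for u in range(20, 140, 20) for v in range(20, 100, 20)]
        inv_depths = [0.5] * len(pts)
        params, err = gauss_newton(I_u, I_v, pts, inv_depths, K)
        print(np.round(params, 3), round(err, 4))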
In the invention, pixel points are divided into two classes: activated points and immature points. Points that have just been extracted from the image, or whose depths have not converged, are called immature points and have a very large depth range. A depth filter continuously filters them between frames so that their depths gradually converge; the resulting points are activated and added to the sliding window, and these are called activated points.
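One common realization of such a depth filter is a per-point Gaussian filter on the inverse depth, in which every new observation is fused with the current estimate and the point is activated once its variance falls below a threshold. The sketch below follows that assumption; it is one possible realization, not necessarily the filter used by the invention.

    class ImmaturePoint:
        """Immature point: inverse-depth estimate refined frame-to-frame until it converges."""
        def __init__(self, inv_depth=1.0, variance=4.0):
            self.inv_depth = inv_depth
            self.variance = variance

        def update(self, observed_inv_depth, observation_variance):
            # Fuse the new inverse-depth observation with the current Gaussian estimate.
            k = self.variance / (self.variance + observation_variance)
            self.inv_depth += k * (observed_inv_depth - self.inv_depth)
            self.variance *= (1.0 - k)

        def is_converged(self, threshold=1e-3):
            # Activate the point (add it to the sliding window) once the variance is small enough.
            return self.variance < threshold

    if __name__ == "__main__":
        pt = ImmaturePoint()
        for z in [0.8, 0.82, 0.79, 0.81, 0.80]:          # simulated inverse-depth observations
            pt.update(z, observation_variance=0.01)
        print(round(pt.inv_depth, 3), pt.is_converged(threshold=0.01))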
Step S3, solving the relative pose change between frames by the direct method based on minimizing the photometric error, and judging whether the current frame is a keyframe;
similar to the process of establishing inter-frame associated pixel points by minimizing luminosity errors in the initialization process of a first frame and a second frame, the algorithm carries out pose calculation on each input frame of new images based on the nearest key frame, namely, the gradient points of the nearest key frame are projected to the current frame to establish a luminosity error function:
E u,v,w =ω p (I v [p`]-a vu I u [p]-b vu )
wherein ω is p To compensate for the weighting parameter, p is I u The point with larger gradient in the middle, p' represents that p is in the current frame I v Back-projected coordinates of (1) vu Correcting the parameters for the exposure time, b vu Correcting parameters for the image brightness basis; and iteratively minimizing the luminosity error through a Gauss-Newton method, and if the error is optimized and converged, taking the new frame as a key frame and adding the key frame into a sliding window for optimization.
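In terms of such a Gauss-Newton solver, the per-frame processing of step S3 reduces to a small decision: track the frame, then promote it to a keyframe when the photometric error has converged. The convergence test below (the error stops decreasing by more than a small fraction) is an illustrative assumption, since the text above only states that a converged optimization yields a keyframe.

    def track_and_decide(errors_per_iteration, convergence_ratio=0.999):
        """Decide whether a tracked frame becomes a keyframe.

        errors_per_iteration holds the photometric error after each Gauss-Newton step
        (for example as produced by a solver like the one sketched above). The frame is
        treated as converged, and hence promoted to a keyframe, when the error has
        essentially stopped decreasing; criterion and threshold are illustrative.
        """
        if len(errors_per_iteration) < 2:
            return False
        return errors_per_iteration[-1] > convergence_ratio * errors_per_iteration[-2]

    if __name__ == "__main__":
        print(track_and_decide([10.0, 4.2, 3.9, 3.899]))   # error has flattened -> keyframe
        print(track_and_decide([10.0, 9.0]))               # still improving a lot -> keep tracking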
Step S4, adding the new keyframe to the back-end sliding window, jointly constructing a total photometric error function from the poses of all keyframes in the window and the key points extracted from them, and optimizing the initial value of the relative pose estimated by the front end;
projecting each point to all key frames in the interframe pose calculation by using a sparse direct method, calculating residual errors in each frame, using a sliding window consisting of a plurality of system key frames at the rear end, and optimizing all key frame poses and activation points in the sliding window into a total luminosity error function in a combined manner, wherein when a front end process judges that a certain frame meets conditions and is inserted into the rear end, the minimum luminosity error to be optimized is defined as follows:
let the image frame set S ═ T 1 ,T 2 …T j ,p 1 ,p 2 …p k In which T is j Representing the pose, p, of the keyframe j k Representing coordinates of pixel points in keyframes, minimizing photometric errors
Figure BDA0002964042400000081
The method comprises the following steps:
Figure BDA0002964042400000082
wherein, t u 、t v As an image I u 、I v A corresponding exposure time;
Figure BDA0002964042400000083
is a collection of pixel points, p w Is a pixel point in the set, p is I v Middle pixel point, a v Is I v Exposure time correction parameter of a u Is I u Exposure time correction parameter of (b) v Is shown as I v Luminance floor correction parameter, b u Is I u Luminance floor correction parameter, ω p To compensate for the weighting parameter; p' is a depth parameter d containing the inverse p In the image I at the pixel coordinate p v The formula p' is as follows:
p`=Π(RΠ -1 (p,d p )+t)
r, T is a rotation matrix and a translation vector, and is transformed by a transformation matrix T u 、T v Calculated result, pi and pi -1 Representing the reprojection and inverse reprojection transforms, respectively. Compared with the constraint between two frames, more constraint conditions are provided in the sliding window, and the initial value of the relative pose estimated by the front end can be optimized.
As another example, the sparse direct method may be replaced with a direct optical flow matching method.
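The following sketch accumulates the value of this windowed energy for fixed poses (no optimization step), assuming a pinhole camera, a single-pixel neighborhood N_p, unit weights ω_p, nearest-pixel sampling and a squared residual in place of a robust norm; the keyframe data in the usage example are made up for illustration.

    import numpy as np

    def reproject(p, inv_depth, R, t, K):
        """p' = Pi(R * Pi^-1(p, d_p) + t) for a pinhole camera."""
        u, v = p
        z = 1.0 / inv_depth
        X = R @ np.array([(u - K[0, 2]) / K[0, 0] * z,
                          (v - K[1, 2]) / K[1, 1] * z, z]) + t
        return K[0, 0] * X[0] / X[2] + K[0, 2], K[1, 1] * X[1] / X[2] + K[1, 2]

    def total_photometric_error(keyframes, K):
        """Accumulate the windowed energy over host keyframes, their points and the observing frames.

        Each keyframe is a dict with keys image, R, t (camera-to-world pose), exposure, a, b and
        points (a list of ((u, v), inverse_depth) pairs hosted by that keyframe).
        """
        E = 0.0
        for u_idx, host in enumerate(keyframes):
            for (pu, pv), d in host["points"]:
                for v_idx, obs in enumerate(keyframes):
                    if v_idx == u_idx:
                        continue
                    R_rel = obs["R"].T @ host["R"]                 # relative motion host -> observer
                    t_rel = obs["R"].T @ (host["t"] - obs["t"])
                    up, vp = reproject((pu, pv), d, R_rel, t_rel, K)
                    h, w = obs["image"].shape
                    if not (0 <= up < w - 1 and 0 <= vp < h - 1):
                        continue                                   # point left the observer's view
                    ratio = (obs["exposure"] * np.exp(obs["a"])) / (host["exposure"] * np.exp(host["a"]))
                    r = (obs["image"][int(vp), int(up)] - obs["b"]) \
                        - ratio * (host["image"][int(pv), int(pu)] - host["b"])
                    E += r * r                                     # squared residual, no robust norm
        return E

    if __name__ == "__main__":
        K = np.array([[300.0, 0.0, 80.0], [0.0, 300.0, 60.0], [0.0, 0.0, 1.0]])
        img = np.fromfunction(lambda v, u: np.sin(u / 10.0) + np.cos(v / 7.0), (120, 160))
        def kf(tx):
            return {"image": img, "R": np.eye(3), "t": np.array([tx, 0.0, 0.0]),
                    "exposure": 1.0, "a": 0.0, "b": 0.0,
                    "points": [((80.0, 60.0), 0.5), ((40.0, 30.0), 0.8)]}
        print(round(total_photometric_error([kf(0.0), kf(0.1)], K), 4))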
Step S5, marginalizing old keyframes and constructing a global pose graph based on the descriptors of all keyframes for closed-loop detection. Old frames and old points are removed by a marginalization strategy: if a point is no longer within the camera's field of view, the point is marginalized; if the number of frames in the sliding window exceeds a set threshold, one frame is selected and marginalized; when a frame is marginalized, the map points hosted by that frame are removed.
Meanwhile, closed-loop detection is performed using pose graph optimization: the global poses maintaining the connections between keyframes are added to the sliding window, the relative pose transformations of the keyframes in the sliding window are taken as the pose-pair measurement data, and a matching method combining ORB features with the original point-selection strategy is adopted on the basis of a bag-of-words approach; after a loop matching candidate is selected, the pose constraint of the candidate on the current keyframe is calculated and added to the global pose graph, and the pose of the camera is estimated through the optimization.
As shown in FIG. 2, in the sliding-window optimization the window keeps growing as new keyframes and points are added, and it would eventually lose its flexibility and efficiency. In order to maintain a fixed-size window, the invention uses a marginalization strategy to cull old frames and old points. A point is marginalized once it is no longer within the camera's field of view; if the number of frames in the sliding window exceeds a set threshold of 7 frames (the threshold can be changed to 10 frames to improve the pose estimation accuracy), one of the frames is selected and marginalized; when a frame is marginalized, the map points hosted by that frame are removed and no longer participate in the calculation, which further improves the pose estimation accuracy.
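A minimal bookkeeping sketch of this culling policy is given below. It assumes that dropping a keyframe simply removes it together with its hosted points; a full implementation would instead fold the removed states into a prior (for example via the Schur complement), and always dropping the oldest frame is an illustrative simplification of the selection rule.

    def marginalize(window, camera_sees, max_keyframes=7):
        """Apply the culling policy: first drop out-of-view points, then an excess keyframe.

        window is a list of keyframes, each a dict with a "points" list; camera_sees(point)
        returns False once a point has left the current field of view. max_keyframes=7
        follows the threshold given above (10 is mentioned as an alternative).
        """
        for kf in window:
            # 1) marginalize points that are no longer inside the camera's field of view
            kf["points"] = [p for p in kf["points"] if camera_sees(p)]
        if len(window) > max_keyframes:
            # 2) the window is too large: marginalize one keyframe; always dropping the
            #    oldest one is an illustrative simplification of the selection rule
            removed = window.pop(0)
            removed["points"].clear()        # map points hosted by that frame leave the problem
        return window

    if __name__ == "__main__":
        win = [{"id": i, "points": [(i, j) for j in range(3)]} for i in range(9)]
        win = marginalize(win, camera_sees=lambda p: p[1] < 2)   # pretend points with index 2 left the view
        print(len(win), [len(kf["points"]) for kf in win])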
meanwhile, closed-loop detection is carried out by using pose graph optimization, a global pose is added in a sliding window to maintain connection between key frames, relative pose transformation of the key frames in the sliding window is taken as measurement data of pose pairing, a matching method combining ORB characteristics with an original point selection strategy is provided based on a bag-of-words method, when a loop matching sub-tree is selected, 3 pairs of matching points are used for solving similarity transformation (similarity transformation), and then a rotation matrix, a translation vector and a scale between two coordinate systems are solved, namely Sim (3) transformation is carried out, so that the similarity transformation of the matching sub-tree to the current key frame is S cr Then S is cr By minimizing a cost function E l To obtain E l =ω(S cr Π -1 (p,d p )-Π -1 (q,d q ))
Wherein p and q represent matched pixel points in the matched sub-frame and the key frame, d p 、d q Representing the inverse depth corresponding to the pixel points p and q, and omega is a balance weight parameter. And mixing S cr And adding the pose estimation result into a global pose graph, and obtaining accurate camera pose estimation along with optimization.
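The similarity transformation S_cr between the matched, back-projected points Π^{-1}(p, d_p) and Π^{-1}(q, d_q) can also be obtained in closed form. The sketch below uses the Umeyama alignment to recover scale, rotation and translation; this closed-form route is an assumption standing in for the iterative minimization of E_l described above, and the point values in the usage example are synthetic.

    import numpy as np

    def umeyama_sim3(src, dst):
        """Closed-form similarity alignment src -> dst: returns scale s, rotation R, translation t.

        src and dst are (N, 3) arrays of matched 3D points, e.g. the back-projections
        Pi^-1(p, d_p) and Pi^-1(q, d_q) of matched pixels.
        """
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        xs, xd = src - mu_s, dst - mu_d
        cov = xd.T @ xs / len(src)
        U, D, Vt = np.linalg.svd(cov)
        S = np.eye(3)
        if np.linalg.det(U) * np.linalg.det(Vt) < 0:
            S[2, 2] = -1.0                              # keep R a proper rotation (det = +1)
        R = U @ S @ Vt
        var_s = (xs ** 2).sum() / len(src)
        s = np.trace(np.diag(D) @ S) / var_s            # optimal scale factor
        t = mu_d - s * R @ mu_s
        return s, R, t

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        P = rng.random((6, 3))                          # six matched 3D points (three pairs is the minimum)
        Q = 1.5 * P + np.array([0.2, -0.1, 0.3])        # identity rotation in this toy example
        s, R, t = umeyama_sim3(P, Q)
        print(round(s, 3), np.round(t, 3))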
As another embodiment, the scheme of the invention can be ported to an AGV (Automated Guided Vehicle) with Mecanum wheels as the research platform, namely a PIBOT Hades robot body equipped with a Kinect camera sensor, and field experiments can be performed in an indoor corridor environment.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Claims (9)

1. A mobile pose estimation method based on a closed-loop direct sparse visual odometer is characterized by comprising the following steps:
Step S1, acquiring an image sequence of the robot's surroundings;
Step S2, initializing based on the first two frames of images and constructing a reference frame for the subsequent image sequence: let the second frame be I_v and the first frame be I_u; based on the key point with the largest gradient selected in I_u, an inter-frame photometric error function is constructed for I_v:

E_{u,v,w} = ω_p (I_v[p'] - a_{vu} I_u[p] - b_{vu})

where ω_p is a compensation weighting parameter, p is a gradient point in I_u, p' denotes the back-projected coordinates of p in the current frame I_v, a_{vu} is the exposure-time correction parameter, and b_{vu} is the image brightness baseline correction parameter;

the photometric error is iteratively minimized by the Gauss-Newton method; if the optimization does not converge, initialization is judged to have failed, the first frame is reset and the process is repeated; otherwise initialization succeeds, the first frame is set as a keyframe, the gradient points extracted from it are activated and added to the sliding-window optimization, and the second frame enters the sliding-window optimization after initialization;
Step S3, acquiring the relative pose change between frames and judging whether the current frame is a keyframe;
Step S4, adding the new keyframe to the back-end sliding window, jointly constructing a total photometric error function from the poses of all keyframes in the window and the key points extracted from them, and optimizing the initial value of the relative pose estimated by the front end;
and Step S5, marginalizing old keyframes, constructing a global pose graph based on the descriptors of all keyframes for closed-loop detection, and estimating the pose of the robot.
2. The mobile pose estimation method based on a closed-loop direct sparse visual odometer according to claim 1, wherein step S2 further comprises initializing the camera sensor input at the front end of the odometer; the initialization requires two frames of images, the first frame being used as the reference frame; a Gaussian pyramid is constructed for the image according to the world coordinate system set to the camera coordinate system of the first frame, the gradient histogram of all pixel points in level 0 of the pyramid is computed, a threshold derived from this histogram is used as the gradient threshold to extract gradient points level by level, a nearest-neighbor index is constructed for each gradient point, the s closest points in the same level are found to construct a neighborhood relation, and the point with the minimum distance to the current point is found in the nearest-neighbor index of the next level and set as the associated point, so that the current level is associated with the next level, which facilitates the propagation of the optical-flow pyramid and the inverse depth values.
3. The mobile pose estimation method based on a closed-loop direct sparse visual odometer according to claim 2, wherein the gradient points are divided into activated points and immature points, an immature point being a point that has just been extracted from the image or whose depth has not converged; immature points are continuously filtered between frames by a depth filter so that their depths gradually converge, and the resulting points are activated and added to the sliding window.
4. The mobile pose estimation method based on a closed-loop direct sparse visual odometer according to claim 3, wherein step S3 further comprises solving the pose of each newly input frame with respect to the latest keyframe, projecting the gradient points of the latest keyframe into the current frame to construct the photometric error function:

E_{u,v,w} = ω_p (I_v[p'] - a_{vu} I_u[p] - b_{vu})

where ω_p is a compensation weighting parameter, p is a gradient point in I_u, p' denotes the back-projected coordinates of p in the current frame I_v, a_{vu} is the exposure-time correction parameter, and b_{vu} is the image brightness baseline correction parameter; the photometric error is iteratively minimized by the Gauss-Newton method, and if the optimization converges, the new frame is taken as a keyframe and added to the sliding window for optimization.
5. The mobile pose estimation method based on a closed-loop direct sparse visual odometer according to claim 4, wherein in step S4 each point is projected into all keyframes in the inter-frame pose solution by the sparse direct method or a direct optical-flow matching method and its residual in each frame is calculated; the back end uses a sliding window composed of several system keyframes, and the poses of all keyframes and the activated points in the sliding window are jointly combined to optimize a total photometric error function; when the front-end thread judges that a frame meets the conditions and inserts it into the back end, the minimum photometric error to be optimized is defined as follows:

let the image frame set be S = {T_1, T_2, ..., T_j, p_1, p_2, ..., p_k}, where T_j represents the pose of keyframe j and p_k represents the coordinates of a pixel point in a keyframe; the total photometric error to be minimized is

E_photo = Σ_u Σ_{p∈P_u} Σ_{v∈obs(p)} E_{p,v}

where the outer sum runs over the keyframes u in the sliding window, P_u denotes the activated points hosted in keyframe u, obs(p) denotes the keyframes observing point p, and the per-point term is

E_{p,v} = Σ_{p_w∈N_p} ω_p ( (I_v[p'] - b_v) - (t_v e^{a_v})/(t_u e^{a_u}) (I_u[p_w] - b_u) )

where t_u and t_v are the exposure times of images I_u and I_v; N_p is a set of pixel points and p_w is a pixel point in this set; a_v and a_u are the exposure-time correction parameters of I_v and I_u; b_v and b_u are the brightness baseline correction parameters of I_v and I_u; ω_p is a compensation weighting parameter; and p', which contains the inverse depth parameter d_p, is the reprojection, in image I_v, of the pixel coordinate p.
6. The mobile pose estimation method based on a closed-loop direct sparse visual odometer according to claim 5, wherein the reprojection coordinate p' is given by:

p' = Π(R Π^{-1}(p, d_p) + t)

where R and t are the rotation matrix and translation vector computed from the transformation matrices T_u and T_v, and Π and Π^{-1} denote the projection and back-projection transforms, respectively.
7. The mobile pose estimation method based on a closed-loop direct sparse visual odometer according to claim 6, wherein in step S5 old frames and old points are removed by a marginalization strategy: if a point is no longer within the camera's field of view, the point is marginalized; if the number of frames in the sliding window exceeds a set threshold, one frame is selected and marginalized; when a frame is marginalized, the map points hosted by that frame are removed.
8. The mobile pose estimation method based on a closed-loop direct sparse visual odometer according to claim 7, wherein step S5 performs closed-loop detection using pose graph optimization: the global poses are added to the sliding window to maintain the connections between keyframes, the relative pose transformations of the keyframes in the sliding window are taken as the pose-pair measurement data, and a matching method combining ORB features with the original point-selection strategy is used on the basis of a bag-of-words approach; after a loop matching candidate is selected, the similarity transformation from the candidate to the current keyframe is calculated and added to the global pose graph, and the pose of the camera is estimated through the optimization.
9. The mobile pose estimation method based on a closed-loop direct sparse visual odometer according to claim 8, wherein:

3 pairs of matching points are used to solve the similarity transformation, i.e. the rotation matrix, translation vector and scale between the two coordinate systems (a Sim(3) transformation); let the similarity transformation from the matching candidate to the current keyframe be S_cr; then S_cr is obtained by minimizing the cost function E_l:

E_l = ω (S_cr Π^{-1}(p, d_p) - Π^{-1}(q, d_q))

where p and q represent matched pixel points in the matching candidate and the current keyframe, d_p and d_q represent the inverse depths corresponding to p and q, and ω is a balance weight parameter.
CN202110245806.7A 2021-03-05 2021-03-05 Movement pose estimation method based on closed-loop direct sparse visual odometer Active CN113108771B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110245806.7A CN113108771B (en) 2021-03-05 2021-03-05 Movement pose estimation method based on closed-loop direct sparse visual odometer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110245806.7A CN113108771B (en) 2021-03-05 2021-03-05 Movement pose estimation method based on closed-loop direct sparse visual odometer

Publications (2)

Publication Number Publication Date
CN113108771A CN113108771A (en) 2021-07-13
CN113108771B true CN113108771B (en) 2022-08-16

Family

ID=76710961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110245806.7A Active CN113108771B (en) 2021-03-05 2021-03-05 Movement pose estimation method based on closed-loop direct sparse visual odometer

Country Status (1)

Country Link
CN (1) CN113108771B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673462B (en) * 2021-08-27 2023-12-19 西安电子科技大学广州研究院 Logistics AGV positioning method based on lane lines
CN113790728A (en) * 2021-09-29 2021-12-14 佛山市南海区广工大数控装备协同创新研究院 Loosely-coupled multi-sensor fusion positioning algorithm based on visual odometer
CN114581517A (en) * 2022-02-10 2022-06-03 北京工业大学 Improved VINS method for complex illumination environment
CN114663496B (en) * 2022-03-23 2022-10-18 北京科技大学 Monocular vision odometer method based on Kalman pose estimation network
CN115451996B (en) * 2022-08-30 2024-03-29 华南理工大学 Homography visual odometer method facing indoor environment
CN115170992B (en) * 2022-09-07 2022-12-06 山东水发达丰再生资源有限公司 Image identification method and system for scattered blanking of scrap steel yard
CN115937011B (en) * 2022-09-08 2023-08-04 安徽工程大学 Key frame pose optimization visual SLAM method, storage medium and equipment based on time lag feature regression
CN117826141A (en) * 2023-12-29 2024-04-05 广东工业大学 Collaborative positioning method for distributed unmanned aerial vehicle group in complex environment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106846417A (en) * 2017-02-06 2017-06-13 东华大学 The monocular infrared video three-dimensional rebuilding method of view-based access control model odometer
CN107610175A (en) * 2017-08-04 2018-01-19 华南理工大学 The monocular vision SLAM algorithms optimized based on semi-direct method and sliding window
CN108986037B (en) * 2018-05-25 2020-06-16 重庆大学 Monocular vision odometer positioning method and positioning system based on semi-direct method
CN109631896B (en) * 2018-07-23 2020-07-28 同济大学 Parking lot autonomous parking positioning method based on vehicle vision and motion information
CN109544636B (en) * 2018-10-10 2022-03-15 广州大学 Rapid monocular vision odometer navigation positioning method integrating feature point method and direct method
DE102019208216A1 (en) * 2019-06-05 2020-12-10 Conti Temic Microelectronic Gmbh Detection, 3D reconstruction and tracking of several rigid objects moving relative to one another
CN110866496B (en) * 2019-11-14 2023-04-07 合肥工业大学 Robot positioning and mapping method and device based on depth image
CN111462207A (en) * 2020-03-30 2020-07-28 重庆邮电大学 RGB-D simultaneous positioning and map creation method integrating direct method and feature method
CN111693047B (en) * 2020-05-08 2022-07-05 中国航空工业集团公司西安航空计算技术研究所 Visual navigation method for micro unmanned aerial vehicle in high-dynamic scene
CN111780754B (en) * 2020-06-23 2022-05-20 南京航空航天大学 Visual inertial odometer pose estimation method based on sparse direct method
CN112419497A (en) * 2020-11-13 2021-02-26 天津大学 Monocular vision-based SLAM method combining feature method and direct method

Also Published As

Publication number Publication date
CN113108771A (en) 2021-07-13

Similar Documents

Publication Publication Date Title
CN113108771B (en) Movement pose estimation method based on closed-loop direct sparse visual odometer
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN108537848B (en) Two-stage pose optimization estimation method for indoor scene reconstruction
CN109029433B (en) Method for calibrating external parameters and time sequence based on vision and inertial navigation fusion SLAM on mobile platform
CN111210463B (en) Virtual wide-view visual odometer method and system based on feature point auxiliary matching
Kneip et al. Robust real-time visual odometry with a single camera and an IMU
CN110570453B (en) Binocular vision-based visual odometer method based on closed-loop tracking characteristics
CN110533720B (en) Semantic SLAM system and method based on joint constraint
CN111932674A (en) Optimization method of line laser vision inertial system
CN111882602B (en) Visual odometer implementation method based on ORB feature points and GMS matching filter
CN112258409A (en) Monocular camera absolute scale recovery method and device for unmanned driving
CN112418288A (en) GMS and motion detection-based dynamic vision SLAM method
CN110390685A (en) Feature point tracking method based on event camera
CN114001733A (en) Map-based consistency efficient visual inertial positioning algorithm
CN115936029A (en) SLAM positioning method and device based on two-dimensional code
CN114494150A (en) Design method of monocular vision odometer based on semi-direct method
CN112541423A (en) Synchronous positioning and map construction method and system
CN114964276A (en) Dynamic vision SLAM method fusing inertial navigation
CN115147344A (en) Three-dimensional detection and tracking method for parts in augmented reality assisted automobile maintenance
CN113362377B (en) VO weighted optimization method based on monocular camera
Zhu et al. PairCon-SLAM: Distributed, online, and real-time RGBD-SLAM in large scenarios
CN113345032A (en) Wide-angle camera large-distortion image based initial image construction method and system
CN117367427A (en) Multi-mode slam method applicable to vision-assisted laser fusion IMU in indoor environment
Lebegue et al. Extraction and interpretation of semantically significant line segments for a mobile robot
CN112419411A (en) Method for realizing visual odometer based on convolutional neural network and optical flow characteristics

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant