CN114549634A - Camera pose estimation method and system based on panoramic image - Google Patents

Camera pose estimation method and system based on panoramic image

Info

Publication number
CN114549634A
CN114549634A (application CN202111634998.7A)
Authority
CN
China
Prior art keywords
image
point
points
pixel
radius
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111634998.7A
Other languages
Chinese (zh)
Inventor
黄昊宇
王之丰
冯逸鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Huanjun Technology Co ltd
Original Assignee
Hangzhou Huanjun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Huanjun Technology Co ltd filed Critical Hangzhou Huanjun Technology Co ltd
Priority to CN202111634998.7A priority Critical patent/CN114549634A/en
Publication of CN114549634A publication Critical patent/CN114549634A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/40 Image enhancement or restoration using histogram techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/61 Scene description

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Studio Devices (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of visual SLAM processing and relates to a camera pose estimation method and system based on panoramic images. The camera pose estimation method comprises the following steps: S1, collecting a panoramic image and performing image preprocessing on it; S2, performing distortion correction on the preprocessed panoramic image to obtain a corrected image; S3, extracting feature points from the corrected image with an ORB feature extraction algorithm and calculating descriptors; and S4, matching feature points of two adjacent frames of images, adding weights to the reprojection errors based on the radii of the imaging positions of the matched feature points on the two image planes and the difference between the radii to construct a loss function, and estimating the camera pose by minimizing the loss function. The method assigns a different weight to each matched feature point pair and constructs the loss function by adding these weights to the original reprojection error, which can effectively improve the accuracy of camera pose estimation.

Description

Camera pose estimation method and system based on panoramic image
Technical Field
The invention belongs to the technical field of visual SLAM processing, and relates to a camera pose estimation method and system based on a panoramic image.
Background
As robots are increasingly incorporated into everyday life, there is growing concern about their robustness in the real world. Smaller, more powerful computers and sensors, combined with more efficient algorithms, have driven the rise of mobile robots such as autonomous vehicles, aerial photography drones, and search and rescue drones. For a mobile robot, localizing itself is an essential task. The method commonly used to estimate the position of a robot is GPS, but GPS requires wireless communication and is unavailable in many environments, especially indoors, so other sensors must be used to estimate the state of the robot and map the environment around it. Estimating the pose of the robot while building a map of the surrounding obstacles is called Simultaneous Localization and Mapping (SLAM).
An important parameter of a vision system is the camera field of view (FOV). The field of view of a traditional lens is generally less than 120 degrees, while omnidirectional cameras, represented by fisheye cameras and panoramic annular cameras, have fields of view of 180 degrees or more; the ultra-large field of view of such cameras is very helpful for reducing the number of sensors and the size of a mobile robot. However, wide-angle images suffer from optical distortion: physical straight lines appear curved in the image, and the field angle gradually increases with radial distance from the center of the image toward the edges. Traditional simple camera models, such as the pinhole model, cannot describe such deformation, so traditional computer vision has focused on perspective cameras with minimal deformation and has not considered larger FOVs. However, using a wider field of view in navigation tasks has many potential benefits: visual SLAM relies on tracking the environment around the camera, and a larger field of view intuitively allows more of the environment to be seen in a single image. Panoramic cameras are therefore an ideal device for visual SLAM.
At present, mainstream feature point extraction mainly uses FAST corners, Harris corners and the like, and descriptors are computed with methods such as BRIEF, but these methods are all designed for traditional images formed by a pinhole model. For panoramic images, a robust and effective extraction method is still lacking.
Conventional SLAM technology mainly uses conventional cameras, and the imaging model is mainly a pinhole model, as in, for example, the two-dimensional panoramic image acquisition method disclosed in patent document CN110197455A and the garbage collection robot based on visual semantic SLAM disclosed in patent document CN111360780A. Traditional feature point extraction and loss function calculation are usually designed around the pinhole model, and for a panoramic image with large distortion the traditional feature point descriptors often cause matching failure. Because image distortion can make feature matching algorithms fail, and the difficulty of matching panoramic camera images leads to poor robustness, few methods use panoramic camera images as the input of visual SLAM, with the result that the panoramic annular camera cannot be applied to SLAM as a large-field-of-view, low-cost sensor.
Disclosure of Invention
In order to overcome the defects of existing visual SLAM technology while exploiting the field-of-view advantage of the panoramic annular camera, the invention aims to provide a camera pose estimation method and system based on panoramic images, so as to address the difficulty of extracting feature points from panoramic images and to accurately estimate the camera pose using the proposed loss function.
In order to achieve the purpose, the invention adopts the following technical scheme:
a camera pose estimation method based on a panoramic image comprises the following steps:
s1, collecting a panoramic image, and carrying out image preprocessing on the panoramic image;
s2, distortion correction is carried out on the panoramic image after image preprocessing to obtain a corrected image;
s3, extracting feature points of the corrected image by adopting an ORB feature extraction algorithm and calculating a descriptor;
and S4, matching feature points of two adjacent frames of images, adding weights to the reprojection errors based on the radius of the imaging positions of the matched feature points on the two image planes and the difference between the radii to construct a loss function, and estimating the pose of the camera by minimizing the loss function.
Preferably, in step S1, the image preprocessing is histogram equalization.
Preferably, in step S2, the distortion correction process includes:
let P be a spatial point with coordinates (x, y, z)^T and U its projection on the image plane with coordinates (u, v)^T; (x, y)^T and (u, v)^T are proportional and satisfy the following relationship:

λ·(x, y, z)^T = (u, v, f_b(ρ))^T

wherein f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N, ρ = √(u^2 + v^2), λ is a scale factor, and α_0, α_2, …, α_N are polynomial coefficients;
using the back-projection function π^(-1)(U), the mapping relation from the image coordinates to the corresponding three-dimensional coordinates of the object point is as follows:

P = π^(-1)(U) = λ^(-1)·g(u)

where g(u) = (u, v, f_b(ρ))^T;
accordingly, the projection function is:

U = π(P) = f_p(θ)·h(P)

wherein h(P) = (x, y)^T / √(x^2 + y^2), θ = arctan(z / √(x^2 + y^2)), f_p(θ) = β_0 + β_2·θ^2 + … + β_N·θ^N, and β_0, β_2, …, β_N are polynomial coefficients;
the calibration coefficients f_b(ρ) and f_p(θ) are obtained so as to obtain the projection function or the back-projection function and thereby achieve distortion correction of the panoramic image.
Preferably, in step S3, the feature point extraction process includes:
setting the search radius of the edge pixel points of the corrected image as 2 and their actual radius as R, the product of the two is 2R; correspondingly, for any other pixel point in the corrected image, the search radius is

r_s = 2R / r'

wherein r' is the actual radius of the pixel point, and the actual radius is defined as the distance from the pixel point to the center point of the image;

the center of the panoramic annular camera has an actual radius r_0, so the minimum actual radius corresponds to a search radius of

2R / r_0

which is rounded down to give the maximum search radius

r_max = floor(2R / r_0)

and the number of pixels used for comparison is 4·(r_max + 1); correspondingly, the other search radii are also rounded down;

if the number of pixels available for comparison within a search radius is less than 4·(r_max + 1), target pixel points are inserted by linear interpolation, so that the number of pixels used for comparison within the search radius equals 4·(r_max + 1);

each pixel point is taken as a center point and compared with the 4·(r_max + 1) surrounding pixel points used for comparison; if the absolute differences between 3·(r_max + 1) consecutive comparison pixel points and the center pixel are all larger than a set threshold, the corresponding center point is extracted as a feature point.
Preferably, in step S3, after the feature points are extracted, descriptions of scale and rotation are added to the feature points by using a grayscale centroid method.
Preferably, in step S3, the pixel value of the target pixel point interpolated by linear interpolation is an average value of the pixel values of the left and right pixel points.
Preferably, in step S3, the descriptor calculation of the feature point includes:
setting the actual radius of the feature point as r and taking the feature point as the center, eight corner points (upper, lower, left, right, upper left, upper right, lower left and lower right) are selected on the circle with search radius 2; starting from each corner point, with a rounded-down step length determined by the image edge radius R and the feature point radius r, eight pixel points are taken along the ray direction from the corner point toward the center point of the image, according to the corner point position and the step length, for pixel value comparison, so as to calculate a BRIEF descriptor;

by analogy, the same BRIEF descriptor calculation is executed for search radii 3, 4 and 5, so that each feature point corresponds to a 256-bit binary string used as its descriptor.
Preferably, in step S4, the feature point matching includes:
descriptor Hamming distances are calculated one by one between the two adjacent frames of images, and the feature points whose Hamming distance is smaller than a target threshold are selected as matching feature points.
Preferably, in step S4, the constructing of the loss function includes:
the original reprojection error obtained from the gray scale difference is

E = Σ_{i=1}^{n} f(I_i^1 − I_i^2)

wherein I_i^1 and I_i^2 are the brightness values of the matched i-th feature point pair in the two adjacent frames of images, f(·) represents a function operating on the brightness difference, and n is the number of matched feature point pairs;

the actual radii of the matched i-th feature point pair in the two adjacent frames of images are denoted r_i^1 and r_i^2 respectively; the weight factor w_i is then a function of r_i^1 and r_i^2, together with the image edge radius R and the minimum imaging radius r_0, that increases with the imaging radii and decreases with their difference;

accordingly, adding the weights to the original reprojection error yields the loss function

L = Σ_{i=1}^{n} w_i·f(I_i^1 − I_i^2)
the invention also provides a camera pose estimation system based on panoramic images, which applies the camera pose estimation method according to any one of the above schemes, and the camera pose estimation system comprises:
the image acquisition module is used for acquiring a panoramic image;
the image preprocessing module is used for preprocessing the panoramic image;
the distortion correction module is used for carrying out distortion correction on the panoramic image after image preprocessing to obtain a corrected image;
the characteristic point extraction module is used for extracting characteristic points of the corrected image by adopting an ORB characteristic extraction algorithm;
the descriptor computation module is used for performing descriptor computation on the extracted feature points;
the characteristic point matching module is used for matching the characteristic points of two adjacent frames of images;
the loss function building module is used for adding weight to the reprojection error based on the radius of the imaging position of the matched feature point on the two image planes and the difference of the radii so as to build a loss function;
a camera pose estimation module to estimate a camera pose by minimizing a loss function.
Compared with the prior art, the invention has the beneficial effects that:
(1) The feature point extraction fully considers the imaging model of the panoramic annular lens and the distortion it introduces; different extraction radii are adopted at different positions from the edge to the center, and the sampling of the traditional BRIEF descriptor is simplified from random sampling to sequential sampling at fixed intervals, which accelerates the descriptor calculation.
(2) The loss function calculation method designed by the invention makes better use of the good imaging quality of the edge field of the panoramic annular lens and, as far as possible, of the feature points whose positions differ little between the two images, so that the distortion introduced by the panoramic camera affects the result as little as possible.
(3) The method assigns a different weight to each matched feature point pair and constructs the loss function by adding these weights to the original reprojection error, which can effectively improve the accuracy of camera pose estimation.
(4) The method designed by the invention can effectively utilize the strong perception capability of the panoramic annular camera with respect to the surrounding environment, thereby markedly improving the positioning speed of visual SLAM, enhancing the usefulness of the system, and improving the overall detection efficiency.
Drawings
FIG. 1 is a flow chart of a panoramic image based camera pose estimation method of an embodiment of the invention;
FIG. 2 is a schematic view of a panoramic annular camera imaging model according to an embodiment of the invention;
FIG. 3 is a flow chart of feature point extraction according to an embodiment of the present invention;
fig. 4 is an architecture diagram of a panoramic image-based camera pose estimation system according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention, the following description will explain the embodiments of the present invention with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
As shown in fig. 1, the method for estimating the pose of a camera based on a panoramic image according to the embodiment of the present invention includes the following steps:
s1, collecting a panoramic image, and carrying out image preprocessing on the panoramic image;
acquiring a panoramic image by using a panoramic annular camera;
The image preprocessing process is as follows: a panoramic image is input from the panoramic annular camera and histogram equalization is performed to improve the contrast of the image and facilitate subsequent feature point extraction.
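As a concrete illustration of this preprocessing step (a minimal sketch only; the function name and the use of OpenCV on a grayscale image are assumptions, not part of the patent text):

```python
import cv2

def preprocess_panoramic(image_bgr):
    """Histogram equalization of the captured panoramic frame (sketch)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)  # subsequent feature extraction works on intensities
    return cv2.equalizeHist(gray)                       # boosts contrast before feature extraction
```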
S2, distortion correction is carried out on the panoramic image after image preprocessing to obtain a corrected image;
Panoramic image distortion correction is performed on the image after histogram equalization; the distortion correction is based on the Taylor model proposed by Scaramuzza et al. As shown in FIG. 2, P is a spatial point with coordinates (x, y, z)^T and U is the projection of P on the image plane with coordinates (u, v)^T; (x, y)^T and (u, v)^T are proportional and satisfy the following relationship:

λ·(x, y, z)^T = (u, v, f_b(ρ))^T

wherein f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N, ρ = √(u^2 + v^2), λ is a scale factor, and α_0, α_2, …, α_N are polynomial coefficients;
thus, using the back-projection function π^(-1)(U), the mapping relation from the image coordinate U to the corresponding three-dimensional object point coordinates (2D to 3D) can be obtained as

P = π^(-1)(U) = λ^(-1)·g(u)

where g(u) = (u, v, f_b(ρ))^T;
accordingly, the projection function can be expressed as

U = π(P) = f_p(θ)·h(P)

wherein h(P) = (x, y)^T / √(x^2 + y^2), θ = arctan(z / √(x^2 + y^2)), f_p(θ) = β_0 + β_2·θ^2 + … + β_N·θ^N, and β_0, β_2, …, β_N are polynomial coefficients.
Thus, once either of the calibration coefficient sets f_b(ρ) and f_p(θ) is obtained by calibration, a complete projection and back-projection model can be obtained, completing the distortion correction of the panoramic image.
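The following sketch illustrates how such a projection/back-projection pair could be evaluated once the polynomial coefficients are known (the coefficient values below are made-up placeholders, the angle convention in project() is an assumption, and the code is illustrative rather than the patent's implementation):

```python
import numpy as np

# Illustrative (made-up) calibration coefficients; real values come from calibrating the camera.
ALPHA = np.array([-180.0, 0.0, 8.0e-4, -3.0e-7, 6.0e-10])  # f_b(ρ): α0 + α2·ρ² + ...
BETA = np.array([300.0, 0.0, 50.0, -10.0, 2.0])            # f_p(θ): β0 + β2·θ² + ...

def back_project(u, v):
    """Image point (u, v), centered on the image center -> unit ray, P = λ⁻¹·g(u) with g(u) = (u, v, f_b(ρ))ᵀ."""
    rho = np.hypot(u, v)
    z = np.polyval(ALPHA[::-1], rho)        # f_b(ρ); polyval expects highest degree first
    ray = np.array([u, v, z], dtype=float)
    return ray / np.linalg.norm(ray)        # scale (depth) is unobservable from a single view

def project(point_3d):
    """3-D point -> image point, U = f_p(θ)·h(P); assumes the point is not on the optical axis."""
    x, y, z = point_3d
    norm_xy = np.hypot(x, y)
    theta = np.arctan2(z, norm_xy)          # ray angle (assumed convention)
    rho = np.polyval(BETA[::-1], theta)     # f_p(θ)
    return np.array([x, y]) / norm_xy * rho
```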
S3, extracting feature points of the corrected image by adopting an ORB feature extraction algorithm and calculating a descriptor;
the specific process of feature point extraction comprises the following steps:
Feature point extraction is performed on the histogram-equalized and corrected image using the panoramic ORB features proposed in the embodiment of the invention; the panoramic ORB features are mainly composed of panoramic Oriented FAST key points and modified, panoramic-image-based BRIEF (Binary Robust Independent Elementary Features) descriptors.
Specifically, traditional FAST feature points are extracted according to the pinhole model and are used to detect places where the local pixel gray level changes obviously. The flow is mainly as follows: a pixel point p is selected in the image and its brightness value (i.e. pixel value) is denoted I_p; then a threshold T is set (a brightness value, for example 30% of I_p); then, taking pixel point p as the center, 16 pixel points are selected on a circle of radius 3; if the brightness of N consecutive points on the selected circle is greater than I_p + T or less than I_p − T, the pixel p is considered a feature point (for example, when N is taken as 12, 12 consecutive points are required, i.e. FAST-12); the above four steps are executed in a loop over all pixel points to complete the FAST feature point extraction for the whole image.
For the panoramic image with an ultra-large field of view in the embodiment of the invention, the imaging model is no longer a pinhole model, and the corresponding feature point extraction method must also be modified. According to the imaging model of the panoramic annular camera, different (u, v) on the image plane correspond to different positions on a spherical surface, and hence to different fields of view in space. For (u, v) separated by the same distance on the image plane, the closer to the image plane center, the more scene information in space they correspond to, according to the Taylor model used for imaging. In other words, if at a certain moment a spatial element is observed at (u_1, v_1) far from the image plane center, and at the next moment, owing to the rotation of the panoramic camera, it is imaged at (u_2, v_2) close to the circle center, then the image at (u_2, v_2) will be stretched to some extent. Therefore, feature point extraction is modified according to the imaging model. From the model f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N with ρ = √(u^2 + v^2), it can be seen that the imaging plane radius r = √(u^2 + v^2) acts as a parameter. Taking two radial positions r_1 and r_2, the difference in height in space is (α_2·r_2^2 + … + α_N·r_2^N) − (α_2·r_1^2 + … + α_N·r_1^N); writing r_2 = r_1 + Δr, this approximately satisfies

(α_2·r_2^2 + … + α_N·r_2^N) − (α_2·r_1^2 + … + α_N·r_1^N) ≈ 2·α_2·r_1·Δr + 3·α_3·r_1^2·Δr + … + N·α_N·r_1^(N−1)·Δr

Since Δr is a relatively small quantity, the lowest-order term, 2·α_2·r_1·Δr, can be taken as the main factor; here α_2 is a calibration parameter and can be ignored, so the main factor influencing the spatial height difference is the radius r_1. Therefore, when extracting feature points, the radius is taken as a parameter, and a position with a small radius corresponds to a small spatial field of view. The radius search range is designed to guarantee the same extent of corresponding space, i.e. the product of the search radius r_s and the current radius must be a constant. Meanwhile, considering that a large search radius corresponds to more candidate points, a pyramid-like scaling (essentially a linear-interpolation method) is applied to the points on the search radius relative to the edge pixel points, to ensure that the number of comparisons is the same for all feature points when comparing against the threshold.
The specific steps are as follows: the search radius of the edge pixel points of the image is set as 2 pixels, and the actual edge radius of the edge pixel points is set as R pixels, so the product is 2R; correspondingly, a pixel point at radius r_1 has a search radius of

2R / r_1

which guarantees consistency of scale for all pixel points;

the center of the panoramic annular camera has a radius r_0, so the minimum radius corresponds to a search radius of

2R / r_0

which, rounded down, gives the maximum search radius

r_max = floor(2R / r_0)

and the number of pixels used for comparison is 4·(r_max + 1). The same rounding down is also done for the other search radii; if fewer than 4·(r_max + 1) points are available for comparison, pixel points are inserted at equally spaced positions on each side of the search circle by linear interpolation (for example, if three points need to be inserted, each side is divided into four equal parts and the points are inserted at the division positions), and for convenience of calculation the pixel value of an inserted point is taken as the average of the left and right pixel points.

After interpolation in the above manner, every pixel has 4·(r_max + 1) surrounding pixels for comparison; if the pixel differences between 3·(r_max + 1) consecutive comparison pixel points and the central pixel are all larger than the set threshold T, the point is considered a panoramic feature point.
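A rough sketch of this radius-dependent corner test is given below (assumptions: the grayscale image is indexed [row, column], the comparison circle is sampled by angle rather than by the per-pixel interpolation described above, and bounds checking is omitted; only the structure of the test is meant to follow the description):

```python
import numpy as np

def panoramic_fast_test(gray, u, v, cx, cy, R_edge, r0, threshold):
    """FAST-style test whose search radius shrinks as the pixel moves toward the image edge."""
    r_actual = np.hypot(u - cx, v - cy)
    if r_actual < r0 or r_actual > R_edge:
        return False                                  # central dead zone / outside the annulus
    r_max = int(np.floor(2.0 * R_edge / r0))          # largest search radius (at the center)
    rs = int(np.floor(2.0 * R_edge / r_actual))       # search radius of this pixel
    n = 4 * (r_max + 1)                               # same comparison count for every pixel
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    ring = np.array([gray[int(round(v + rs * np.sin(a))), int(round(u + rs * np.cos(a)))]
                     for a in angles], dtype=int)
    exceeds = np.abs(ring - int(gray[v, u])) > threshold
    # Feature point if 3·(r_max+1) consecutive ring samples all exceed the threshold (circular run).
    need = 3 * (r_max + 1)
    run, best = 0, 0
    for flag in np.concatenate([exceeds, exceeds]):
        run = run + 1 if flag else 0
        best = max(best, run)
    return best >= need
```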
Meanwhile, the direction of the feature point of each image block is calculated according to the gray centroid method, specifically as follows:

in a small image block B, the moments of the image block are defined as m_pq = Σ_{x,y∈B} x^p·y^q·I(x, y), with p, q ∈ {0, 1}; the centroid of the image block can then be found from the moments as

C = (m_10 / m_00, m_01 / m_00)

Connecting the geometric center O and the centroid C of the image block gives a direction vector OC, and the direction of the feature point can be defined as

θ = arctan(m_01 / m_10)

Thus, a description of scale and rotation is added to the panoramic feature points.
The flow of feature point extraction according to the embodiment of the present invention is shown in fig. 3.
In addition, the descriptor computation of the embodiment of the present invention includes:
After the panoramic feature points are extracted, a descriptor is calculated for each of them. The conventional BRIEF descriptor takes 128 pairs of random pixels p and q around the feature point; if p is greater than q the bit is set to 1, otherwise 0, giving a 128-dimensional vector composed of 0s and 1s, i.e. a binary descriptor used for image matching. In the conventional pinhole imaging model, the BRIEF descriptor at any position uses 128 pairs of random pixels p and q at the same relative positions around it; obviously, given the distortion of the panoramic image, the conventional approach of using the same random pixel positions at every location is not applicable, so the BRIEF descriptor is modified here into a panoramic BRIEF descriptor.
Because the search radius of edge pixel points is smaller (interpolation is carried out during the search) while the search radius of the central point is larger, random pixels at the same relative positions cannot be used at both the edge and the center of the image plane when calculating the panoramic BRIEF descriptor. Therefore, in the embodiment of the invention, different comparison intervals are adopted according to the image plane radius: as before, the edge radius is denoted R and the radius of the current feature point is denoted r; eight corner points (upper, lower, left, right, upper left, upper right, lower left, lower right) are selected on a circle of radius 2 centered on the feature point, and starting from each corner point, with a rounded-down step length determined by the edge radius R and the feature point radius r, eight pixel points are taken along the ray direction from the corner point toward the image center point, according to the corner point position and the step length, and compared. That is, depending on the corner point position, the comparison is made with the following eight points in the horizontal, vertical, or 45-degree direction (for example, the upper point is compared downward with the following eight points, the upper-right point is compared toward the lower left with the following eight points, and the lower-right point is compared toward the upper left with the following eight points; the step length is the comparison interval: if the step length is 1 the points are taken consecutively, if it is 2 then every other pixel point is taken, and so on);

a similar operation is then performed for radii 3, 4 and 5, which results in a 256-bit binary string for each feature point as its descriptor.
The descriptor calculation method provided by the embodiment of the invention takes into account that the center of the image plane actually corresponds to a larger spatial field of view, stretches the calculation range of the descriptor accordingly, and thus adapts better to the distortion of the panoramic image.
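The following sketch captures the structure of this panoramic BRIEF computation (assumptions: the exact step-length expression is supplied by the caller as step_fn, since only its inputs R and r are stated here; the pairwise comparison along each ray and the absence of bounds checking are further simplifications):

```python
import numpy as np

ANCHOR_DIRS = [(0, -1), (0, 1), (-1, 0), (1, 0), (-1, -1), (1, -1), (-1, 1), (1, 1)]

def panoramic_brief(gray, u, v, cx, cy, R_edge, step_fn):
    """256-bit descriptor: 4 circle radii × 8 anchor points × 8 comparisons along the ray to the center."""
    r_feat = max(np.hypot(u - cx, v - cy), 1e-6)
    step = max(1, int(step_fn(R_edge, r_feat)))              # rounded-down step length
    to_center = np.array([cx - u, cy - v]) / r_feat          # unit vector toward the image center
    bits = []
    for radius in (2, 3, 4, 5):
        for dx, dy in ANCHOR_DIRS:
            norm = np.hypot(dx, dy)
            ax = u + radius * dx / norm                      # anchor point on the circle
            ay = v + radius * dy / norm
            prev = int(gray[int(round(ay)), int(round(ax))])
            for k in range(1, 9):                            # eight samples spaced by `step`
                px = int(round(ax + k * step * to_center[0]))
                py = int(round(ay + k * step * to_center[1]))
                cur = int(gray[py, px])
                bits.append(1 if cur > prev else 0)
                prev = cur
    return np.array(bits, dtype=np.uint8)                    # length 4 * 8 * 8 = 256
```

For experimentation one might pass, say, step_fn = lambda R, r: R // (4 * r) as a purely hypothetical stand-in for the patent's step-length formula.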
And S4, matching feature points of two adjacent frames of images, adding weights to the reprojection errors based on the radius of the imaging positions of the matched feature points on the two image planes and the difference between the radii to construct a loss function, and estimating the pose of the camera by minimizing the loss function.
After the feature points and descriptors are extracted, feature point matching is performed: the Hamming distance between descriptors is calculated for the two images one by one, and the feature points whose Hamming distance is smaller than a set threshold are selected as matching feature points.
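A brute-force matcher sketch for this step (assuming descriptors are stored as (N, 256) arrays of 0/1 bits as in the descriptor sketch above; a packed-byte implementation would instead use OpenCV's Hamming-norm BFMatcher):

```python
import numpy as np

def match_by_hamming(descs1, descs2, max_distance):
    """For each descriptor of frame 1, keep the nearest frame-2 descriptor if its Hamming distance is small enough."""
    matches = []
    for i, d1 in enumerate(descs1):
        dists = np.count_nonzero(descs2 != d1, axis=1)   # Hamming distance to every candidate
        j = int(np.argmin(dists))
        if dists[j] < max_distance:
            matches.append((i, j, int(dists[j])))
    return matches
```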
After the feature point matching is completed, the re-projection error is calculated, and the loss function of the re-projection error is correspondingly designed in the embodiment of the invention.
During imaging with the panoramic annular camera, for the same radial interval on the image plane, the edge pixel points occupy a small actual spatial field of view but the same number of pixels on the sensor, so the imaging quality of edge pixel points on the image plane is considered superior to that of the central point; therefore, when matching feature points, the edge pixel points should be trusted more and given a large weight. Meanwhile, if the radial positions of a pair of matched feature points on the two images are far apart, the pair is considered to have moved a lot and to be relatively unreliable, and is given a small weight. When features are matched, the reprojection error of the three-dimensional points is generally minimized, and the reprojection error is calculated using the matched feature points. The loss function designed in the embodiment of the invention adds a weight based on the radii of the imaging positions of a feature point on the two image planes and the difference between the radii: if the radius of a matched feature on image 1 is r_1 and its radius on image 2 is r_2, and the radius of the image edge is denoted R and the minimum imaging radius r_0 (i.e., the central dead zone has radius r_0), the weight factor can be written as a function of r_1, r_2, R and r_0 that increases with the imaging radii and decreases with |r_1 − r_2|.

By designing the loss function with this weight factor, the edge feature points and the feature points that move less between the two frame images can be better utilized.
The loss function of the embodiment of the invention is constructed as follows:
The original reprojection error obtained from the gray scale difference is

E = Σ_{i=1}^{n} f(I_i^1 − I_i^2)

wherein I_i^1 and I_i^2 are the brightness values of the matched i-th feature point pair in the two adjacent frames of images respectively, and f(·) represents a function operating on the brightness difference, which may for example be a linear or quadratic functional relation; n is the number of matched feature point pairs;

the actual radii of the matched i-th feature point pair in the two adjacent frames of images are denoted r_i^1 and r_i^2 respectively; the weight factor w_i is then obtained from r_i^1, r_i^2, the image edge radius R and the minimum imaging radius r_0 as described above;

accordingly, adding the weights to the original reprojection error yields the loss function

L = Σ_{i=1}^{n} w_i·f(I_i^1 − I_i^2)
In the embodiment of the invention, the loss function is constructed by giving each matched feature point pair its own weight and adding these weights to the original reprojection error, which can effectively improve the accuracy of the subsequent camera pose estimation.
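A sketch of the weighted loss follows (the weight below is an illustrative stand-in that only reflects the qualitative rule above, i.e. larger for edge points and smaller when the two radii differ; it is not the patent's formula, and f(·) is taken here as the squared brightness difference):

```python
import numpy as np

def weight_factor(r1, r2, R_edge, r0):
    """Illustrative stand-in for the weight: favors edge points, penalizes large radial motion."""
    edge_term = max(((r1 + r2) / 2.0 - r0) / (R_edge - r0), 0.0)
    motion_term = 1.0 / (1.0 + abs(r1 - r2) / R_edge)
    return edge_term * motion_term

def weighted_loss(I1, I2, radii1, radii2, R_edge, r0):
    """L = Σ_i w_i · f(I_i^1 − I_i^2), with f chosen here as the squared difference."""
    loss = 0.0
    for a, b, r1, r2 in zip(I1, I2, radii1, radii2):
        loss += weight_factor(r1, r2, R_edge, r0) * float(a - b) ** 2
    return loss
```

The camera pose would then be taken as the pose that minimizes this loss over all matched pairs, for example with a nonlinear least-squares solver.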
The embodiment of the invention also provides a camera pose estimation system based on the panoramic image, which comprises an image acquisition module, an image preprocessing module, a distortion correction module, a feature point extraction module, a descriptor calculation module, a feature point matching module, a loss function construction module and a camera pose estimation module, as shown in fig. 4.
The image acquisition module is used for acquiring panoramic images. For example, a panoramic image is captured using a panoramic annular camera.
The image preprocessing module is used for performing image preprocessing on the panoramic image. The image preprocessing process is as follows: a panoramic image is input from the panoramic annular camera and histogram equalization is performed to improve the contrast of the image and facilitate subsequent feature point extraction.
The distortion correction module is used for performing distortion correction on the panoramic image after image preprocessing to obtain a corrected image. Specifically, panoramic image distortion correction is performed on the image after histogram equalization; the distortion correction is based on the Taylor model proposed by Scaramuzza et al. As shown in FIG. 2, P is a spatial point with coordinates (x, y, z)^T and U is the projection of P on the image plane with coordinates (u, v)^T; (x, y)^T and (u, v)^T are proportional and satisfy the following relationship:

λ·(x, y, z)^T = (u, v, f_b(ρ))^T

wherein f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N, ρ = √(u^2 + v^2), λ is a scale factor, and α_0, α_2, …, α_N are polynomial coefficients;
thus, using the back-projection function π^(-1)(U), the mapping relation from the image coordinate U to the corresponding three-dimensional object point coordinates (2D to 3D) can be obtained as

P = π^(-1)(U) = λ^(-1)·g(u)

where g(u) = (u, v, f_b(ρ))^T;
accordingly, the projection function can be expressed as

U = π(P) = f_p(θ)·h(P)

wherein h(P) = (x, y)^T / √(x^2 + y^2), θ = arctan(z / √(x^2 + y^2)), f_p(θ) = β_0 + β_2·θ^2 + … + β_N·θ^N, and β_0, β_2, …, β_N are polynomial coefficients.
Thus, once either of the calibration coefficient sets f_b(ρ) and f_p(θ) is obtained by calibration, a complete projection and back-projection model can be obtained, completing the distortion correction of the panoramic image.
The feature point extraction module is used for extracting the feature points of the corrected image with an ORB feature extraction algorithm. The specific process of feature point extraction is as follows:
Feature point extraction is performed on the histogram-equalized and corrected image using the panoramic ORB features proposed in the embodiment of the invention; the panoramic ORB features are mainly composed of panoramic Oriented FAST key points and modified, panoramic-image-based BRIEF (Binary Robust Independent Elementary Features) descriptors.
Specifically, traditional FAST feature points are extracted according to the pinhole model and are used to detect places where the local pixel gray level changes obviously. The flow is mainly as follows: a pixel point p is selected in the image and its brightness value (i.e. pixel value) is denoted I_p; then a threshold T is set (a brightness value, for example 30% of I_p); then, taking pixel point p as the center, 16 pixel points are selected on a circle of radius 3; if the brightness of N consecutive points on the selected circle is greater than I_p + T or less than I_p − T, the pixel p is considered a feature point (for example, when N is taken as 12, 12 consecutive points are required, i.e. FAST-12); the above four steps are executed in a loop over all pixel points to complete the FAST feature point extraction for the whole image.
For the panoramic image with an ultra-large field of view in the embodiment of the invention, the imaging model is no longer a pinhole model, and the corresponding feature point extraction method must also be modified. According to the imaging model of the panoramic annular camera, different (u, v) on the image plane correspond to different positions on a spherical surface, and hence to different fields of view in space. For (u, v) separated by the same distance on the image plane, the closer to the image plane center, the more scene information in space they correspond to, according to the Taylor model used for imaging. In other words, if at a certain moment a spatial element is observed at (u_1, v_1) far from the image plane center, and at the next moment, owing to the rotation of the panoramic camera, it is imaged at (u_2, v_2) close to the circle center, then the image at (u_2, v_2) will be stretched to some extent. Therefore, feature point extraction is modified according to the imaging model. From the model f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N with ρ = √(u^2 + v^2), it can be seen that the imaging plane radius r = √(u^2 + v^2) acts as a parameter. Taking two radial positions r_1 and r_2, the difference in height in space is (α_2·r_2^2 + … + α_N·r_2^N) − (α_2·r_1^2 + … + α_N·r_1^N); writing r_2 = r_1 + Δr, this approximately satisfies

(α_2·r_2^2 + … + α_N·r_2^N) − (α_2·r_1^2 + … + α_N·r_1^N) ≈ 2·α_2·r_1·Δr + 3·α_3·r_1^2·Δr + … + N·α_N·r_1^(N−1)·Δr

Since Δr is a relatively small quantity, the lowest-order term, 2·α_2·r_1·Δr, can be taken as the main factor; here α_2 is a calibration parameter and can be ignored, so the main factor influencing the spatial height difference is the radius r_1. Therefore, when extracting feature points, the radius is taken as a parameter, and a position with a small radius corresponds to a small spatial field of view. The radius search range is designed to guarantee the same extent of corresponding space, i.e. the product of the search radius r_s and the current radius must be a constant. Meanwhile, considering that a large search radius corresponds to more candidate points, a pyramid-like scaling (essentially a linear-interpolation method) is applied to the points on the search radius relative to the edge pixel points, to ensure that the number of comparisons is the same for all feature points when comparing against the threshold.
The specific steps are as follows: the search radius of the edge pixel points of the image is set as 2 pixels, and the actual edge radius of the edge pixel points is set as R pixels, so the product is 2R; correspondingly, a pixel point at radius r_1 has a search radius of

2R / r_1

which guarantees consistency of scale for all pixel points;

the center of the panoramic annular camera has a radius r_0, so the minimum radius corresponds to a search radius of

2R / r_0

which, rounded down, gives the maximum search radius

r_max = floor(2R / r_0)

and the number of pixels used for comparison is 4·(r_max + 1). The same rounding down is also done for the other search radii; if fewer than 4·(r_max + 1) points are available for comparison, pixel points are inserted at equally spaced positions on each side of the search circle by linear interpolation (for example, if three points need to be inserted, each side is divided into four equal parts and the points are inserted at the division positions), and for convenience of calculation the pixel value of an inserted point is taken as the average of the left and right pixel points.

After interpolation in the above manner, every pixel has 4·(r_max + 1) surrounding pixels for comparison; if the pixel differences between 3·(r_max + 1) consecutive comparison pixel points and the central pixel are all larger than the set threshold T, the point is considered a panoramic feature point; otherwise, the point is not a panoramic feature point and is eliminated.
For the points retained as panoramic feature points, the direction of the feature point of each image block is calculated according to the gray centroid method, specifically as follows:

in a small image block B, the moments of the image block are defined as m_pq = Σ_{x,y∈B} x^p·y^q·I(x, y), with p, q ∈ {0, 1}; the centroid of the image block can then be found from the moments as

C = (m_10 / m_00, m_01 / m_00)

Connecting the geometric center O and the centroid C of the image block gives a direction vector OC, and the direction of the feature point can be defined as

θ = arctan(m_01 / m_10)

Thus, a description of scale and rotation is added to the panoramic feature points.
The flow of feature point extraction according to the embodiment of the present invention is shown in fig. 3.
And the descriptor computation module is used for performing descriptor computation on the extracted feature points. Specifically, the descriptor computation of the embodiment of the present invention includes:
After the panoramic feature points are extracted, a descriptor is calculated for each of them. The conventional BRIEF descriptor takes 128 pairs of random pixels p and q around the feature point; if p is greater than q the bit is set to 1, otherwise 0, giving a 128-dimensional vector composed of 0s and 1s, i.e. a binary descriptor used for image matching. In the conventional pinhole imaging model, the BRIEF descriptor at any position uses 128 pairs of random pixels p and q at the same relative positions around it; obviously, given the distortion of the panoramic image, the conventional approach of using the same random pixel positions at every location is not applicable, so the BRIEF descriptor is modified here into a panoramic BRIEF descriptor.
Because the search radius of edge pixel points is smaller (interpolation is carried out during the search) while the search radius of the central point is larger, random pixels at the same relative positions cannot be used at both the edge and the center of the image plane when calculating the panoramic BRIEF descriptor. Therefore, in the embodiment of the invention, different comparison intervals are adopted according to the image plane radius: as before, the edge radius is denoted R and the radius of the current feature point is denoted r; eight corner points (upper, lower, left, right, upper left, upper right, lower left, lower right) are selected on a circle of radius 2 centered on the feature point, and starting from each corner point, with a rounded-down step length determined by the edge radius R and the feature point radius r, eight pixel points are taken along the ray direction from the corner point toward the image center point, according to the corner point position and the step length, and compared. That is, depending on the corner point position, the comparison is made with the following eight points in the horizontal, vertical, or 45-degree direction (for example, the upper point is compared downward with the following eight points, the upper-right point is compared toward the lower left with the following eight points, and the lower-right point is compared toward the upper left with the following eight points; the step length is the comparison interval: if the step length is 1 the points are taken consecutively, if it is 2 then every other pixel point is taken, and so on);

a similar operation is then performed for radii 3, 4 and 5, which results in a 256-bit binary string for each feature point as its descriptor.
The descriptor calculation method provided by the embodiment of the invention takes into account that the center of the image plane actually corresponds to a larger spatial field of view, stretches the calculation range of the descriptor accordingly, and thus adapts better to the distortion of the panoramic image.
The feature point matching module is used for matching the feature points of two adjacent frames of images. Specifically, the Hamming distance between descriptors is calculated for the two images one by one, and the feature points whose Hamming distance is smaller than a set threshold are selected as matching feature points.
The loss function building module is used for adding a weight to the reprojection error based on the radii of the imaging positions of the matched feature points on the two image planes and the difference between the radii, so as to build a loss function. Specifically, during imaging with the panoramic annular camera, for the same radial interval on the image plane, the edge pixel points occupy a small actual spatial field of view but the same number of pixels on the sensor, so the imaging quality of edge pixel points on the image plane is considered superior to that of the central point; therefore, when matching feature points, the edge pixel points should be trusted more and given a large weight. Meanwhile, if the radial positions of a pair of matched feature points on the two images are far apart, the pair is considered to have moved a lot and to be relatively unreliable, and is given a small weight. When features are matched, the reprojection error of the three-dimensional points is generally minimized, and the reprojection error is calculated using the matched feature points. The loss function designed in the embodiment of the invention adds a weight based on the radii of the imaging positions of a feature point on the two image planes and the difference between the radii: if the radius of a matched feature on image 1 is r_1 and its radius on image 2 is r_2, and the radius of the image edge is denoted R and the minimum imaging radius r_0 (i.e., the central dead zone has radius r_0), the weight factor can be written as a function of r_1, r_2, R and r_0 that increases with the imaging radii and decreases with |r_1 − r_2|.

By designing the loss function with this weight factor, the edge feature points and the feature points that move less between the two frame images can be better utilized.
The loss function of the embodiment of the invention is constructed as follows:
The original reprojection error obtained from the gray scale difference is

E = Σ_{i=1}^{n} f(I_i^1 − I_i^2)

wherein I_i^1 and I_i^2 are the brightness values of the matched i-th feature point pair in the two adjacent frames of images respectively, and f(·) represents a function operating on the brightness difference, which may for example be a linear or quadratic functional relation; n is the number of matched feature point pairs;

the actual radii of the matched i-th feature point pair in the two adjacent frames of images are denoted r_i^1 and r_i^2 respectively; the weight factor w_i is then obtained from r_i^1, r_i^2, the image edge radius R and the minimum imaging radius r_0 as described above;

accordingly, adding the weights to the original reprojection error yields the loss function

L = Σ_{i=1}^{n} w_i·f(I_i^1 − I_i^2)
In the embodiment of the invention, the loss function is constructed by giving each matched feature point pair its own weight and adding these weights to the original reprojection error, which can effectively improve the accuracy of the subsequent camera pose estimation.
And the camera pose estimation module is used for estimating the camera pose by minimizing the loss function so as to realize fine estimation of the camera pose.
The foregoing has outlined rather broadly the preferred embodiments and principles of the present invention and it will be appreciated that those skilled in the art may devise variations of the present invention that are within the spirit and scope of the appended claims.

Claims (10)

1. A camera pose estimation method based on a panoramic image is characterized by comprising the following steps:
s1, collecting a panoramic image, and carrying out image preprocessing on the panoramic image;
s2, distortion correction is carried out on the panoramic image after image preprocessing to obtain a corrected image;
s3, extracting feature points of the corrected image by adopting an ORB feature extraction algorithm and calculating a descriptor;
and S4, matching feature points of two adjacent frames of images, adding weights to the reprojection errors based on the radius of the imaging positions of the matched feature points on the two image planes and the difference between the radii to construct a loss function, and estimating the pose of the camera by minimizing the loss function.
2. The method according to claim 1, characterized in that in step S1, the image preprocessing is histogram equalization.
3. The panoramic image-based camera pose estimation method according to claim 1, wherein in the step S2, the distortion correction process comprises:
let P be a spatial point with coordinates (x, y, z)^T and U its projection on the image plane with coordinates (u, v)^T; (x, y)^T and (u, v)^T are proportional and satisfy the following relationship:

λ·(x, y, z)^T = (u, v, f_b(ρ))^T

wherein f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N, ρ = √(u^2 + v^2), λ is a scale factor, and α_0, α_2, …, α_N are polynomial coefficients;
using the back-projection function π^(-1)(U), the mapping relation from the image coordinates to the corresponding three-dimensional coordinates of the object point is as follows:

P = π^(-1)(U) = λ^(-1)·g(u)

where g(u) = (u, v, f_b(ρ))^T;
accordingly, the projection function is:

U = π(P) = f_p(θ)·h(P)

wherein h(P) = (x, y)^T / √(x^2 + y^2), θ = arctan(z / √(x^2 + y^2)), f_p(θ) = β_0 + β_2·θ^2 + … + β_N·θ^N, and β_0, β_2, …, β_N are polynomial coefficients;
the calibration coefficients f_b(ρ) and f_p(θ) are obtained so as to obtain the projection function or the back-projection function and thereby achieve distortion correction of the panoramic image.
4. The method according to claim 1, wherein in step S3, the process of extracting the feature points includes:
setting the search radius of the edge pixel points of the corrected image as 2 and their actual radius as R, the product of the two is 2R; correspondingly, for any other pixel point in the corrected image, the search radius is

r_s = 2R / r'

wherein r' is the actual radius of the pixel point, and the actual radius is defined as the distance from the pixel point to the center point of the image;

the center of the panoramic annular camera has an actual radius r_0, so the minimum actual radius corresponds to a search radius of

2R / r_0

which is rounded down to give the maximum search radius

r_max = floor(2R / r_0)

and the number of pixels used for comparison is 4·(r_max + 1); correspondingly, the other search radii are also rounded down;

if the number of pixels available for comparison within a search radius is less than 4·(r_max + 1), target pixel points are inserted by linear interpolation, so that the number of pixels used for comparison within the search radius equals 4·(r_max + 1);

each pixel point is taken as a center point and compared with the 4·(r_max + 1) surrounding pixel points used for comparison; if the pixel differences between 3·(r_max + 1) consecutive comparison pixel points and the center point are all greater than a set threshold, the corresponding center point is extracted as a feature point.
5. The method for estimating the pose of a camera based on a panoramic image according to claim 4, wherein in the step S3, after the feature points are extracted, the description of the scale and the rotation is added to the feature points by using a gray centroid method.
6. The method according to claim 4, wherein in step S3, the pixel value of the target pixel point interpolated by linear interpolation is an average of the pixel values of the left and right pixel points.
7. The panoramic image-based camera pose estimation method according to claim 4 or 5, wherein in the step S3, the descriptor calculation of the feature points includes:
setting the actual radius of the feature point as r and taking the feature point as the center, eight corner points (upper, lower, left, right, upper left, upper right, lower left and lower right) are selected on the circle with search radius 2; starting from each corner point, with a rounded-down step length determined by the image edge radius R and the feature point radius r, eight pixel points are taken along the ray direction from the corner point toward the center point of the image, according to the corner point position and the step length, for pixel value comparison, so as to calculate a BRIEF descriptor;

by analogy, the same BRIEF descriptor calculation is executed for search radii 3, 4 and 5, so that each feature point corresponds to a 256-bit binary string used as its descriptor.
8. The panoramic image-based camera pose estimation method according to claim 7, wherein in the step S4, the feature point matching includes:
descriptor Hamming distances are calculated one by one between the two adjacent frames of images, and the feature points whose Hamming distance is smaller than a target threshold are selected as matching feature points.
9. The panoramic image-based camera pose estimation method according to claim 8, wherein in the step S4, the constructing of the loss function includes:
the original reprojection error obtained from the gray scale difference is

E = Σ_{i=1}^{n} f(I_i^1 − I_i^2)

wherein I_i^1 and I_i^2 are the brightness values of the matched i-th feature point pair in the two adjacent frames of images respectively, f(·) represents a function operating on the brightness difference, and n is the number of matched feature point pairs;

the actual radii of the matched i-th feature point pair in the two adjacent frames of images are denoted r_i^1 and r_i^2 respectively; the weight factor w_i is then a function of r_i^1 and r_i^2, together with the image edge radius R and the minimum imaging radius r_0, that increases with the imaging radii and decreases with their difference;

accordingly, adding the weights to the original reprojection error yields the loss function

L = Σ_{i=1}^{n} w_i·f(I_i^1 − I_i^2)
10. a camera pose estimation system based on a panoramic image, to which the camera pose estimation method according to any one of claims 1 to 9 is applied, the camera pose estimation system comprising:
the image acquisition module is used for acquiring a panoramic image;
the image preprocessing module is used for preprocessing the panoramic image;
the distortion correction module is used for carrying out distortion correction on the panoramic image after image preprocessing to obtain a corrected image;
the characteristic point extraction module is used for extracting characteristic points of the corrected image by adopting an ORB characteristic extraction algorithm;
the descriptor computation module is used for performing descriptor computation on the extracted feature points;
the characteristic point matching module is used for matching the characteristic points of two adjacent frames of images;
the loss function building module is used for adding weight to the reprojection error based on the radius of the imaging position of the matched feature point on the two image planes and the difference of the radii so as to build a loss function;
a camera pose estimation module to estimate a camera pose by minimizing a loss function.
CN202111634998.7A 2021-12-27 2021-12-27 Camera pose estimation method and system based on panoramic image Pending CN114549634A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111634998.7A CN114549634A (en) 2021-12-27 2021-12-27 Camera pose estimation method and system based on panoramic image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111634998.7A CN114549634A (en) 2021-12-27 2021-12-27 Camera pose estimation method and system based on panoramic image

Publications (1)

Publication Number Publication Date
CN114549634A true CN114549634A (en) 2022-05-27

Family

ID=81670554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111634998.7A Pending CN114549634A (en) 2021-12-27 2021-12-27 Camera pose estimation method and system based on panoramic image

Country Status (1)

Country Link
CN (1) CN114549634A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824183A (en) * 2023-07-10 2023-09-29 北京大学 Image feature matching method and device based on multiple feature descriptors
CN116824183B (en) * 2023-07-10 2024-03-12 北京大学 Image feature matching method and device based on multiple feature descriptors


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination