CN114549634A - Camera pose estimation method and system based on panoramic image - Google Patents

Camera pose estimation method and system based on panoramic image

Info

Publication number
CN114549634A
CN114549634A (application CN202111634998.7A)
Authority
CN
China
Prior art keywords
image
point
points
pixel
radius
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111634998.7A
Other languages
Chinese (zh)
Inventor
黄昊宇
王之丰
冯逸鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Huanjun Technology Co ltd
Original Assignee
Hangzhou Huanjun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Huanjun Technology Co ltd filed Critical Hangzhou Huanjun Technology Co ltd
Priority to CN202111634998.7A priority Critical patent/CN114549634A/en
Publication of CN114549634A publication Critical patent/CN114549634A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/40 Image enhancement or restoration using histogram techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/61 Scene description

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Studio Devices (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of visual SLAM processing and relates to a camera pose estimation method and system based on panoramic images. The camera pose estimation method comprises the following steps: S1, collecting a panoramic image and performing image preprocessing on it; S2, performing distortion correction on the preprocessed panoramic image to obtain a corrected image; S3, extracting feature points from the corrected image with an ORB feature extraction algorithm and calculating descriptors; and S4, matching feature points of two adjacent frames of images, adding weights to the reprojection errors based on the radii of the imaging positions of the matched feature points on the two image planes and the difference between the radii to construct a loss function, and estimating the camera pose by minimizing the loss function. The method assigns a different weight to each matched feature point pair and constructs the loss function by adding these weights to the original reprojection error, which can effectively improve the accuracy of camera pose estimation.

Description

Camera pose estimation method and system based on panoramic image
Technical Field
The invention belongs to the technical field of visual SLAM processing, and relates to a camera pose estimation method and system based on a panoramic image.
Background
As robots are increasingly incorporated into everyday life, there is growing concern about their robustness in the real world. Smaller, more powerful computers and sensors, combined with more efficient algorithms, have driven the rise of mobile robots such as autonomous vehicles, aerial photography drones, and search and rescue drones. For a mobile robot, localizing itself is an essential task. The method commonly used to estimate the position of a robot is GPS, but GPS requires wireless communication and is unavailable in many environments, especially indoors, so other sensors must be used to estimate the state of the robot and map the environment around it. Estimating the pose of the robot while building a map of the surrounding obstacles is called Simultaneous Localization and Mapping (SLAM).
An important parameter of a vision system is the camera field of view (FOV). The field of view of a traditional lens is generally less than 120 degrees, while omnidirectional cameras, represented by fisheye cameras and panoramic annular cameras, have fields of view of 180 degrees or more; the ultra-large field of view of such cameras is very helpful for reducing the number of sensors and the size of a mobile robot. However, wide-angle images suffer from optical distortion: physical straight lines appear curved in the image, and the field angle gradually increases with radial distance from the center of the image toward the edges. Traditional simple camera models, such as the pinhole model, cannot describe such deformation, so traditional computer vision has focused on perspective cameras with minimal deformation and has not considered larger FOVs. However, using a wider field of view in navigation tasks has many potential benefits: visual SLAM relies on tracking the environment around the camera, and a larger field of view intuitively allows more of the environment to be seen in a single image. Panoramic cameras are therefore an ideal device for visual SLAM.
At present, mainstream feature point extraction mainly uses FAST corners, Harris corners and the like, and descriptors are computed with methods such as BRIEF, but these methods are all designed for traditional images formed by a pinhole model. For panoramic images, a robust and effective extraction method is still lacking.
Conventional SLAM technology mainly uses conventional cameras, and the imaging model is mainly a pinhole model, as in, for example, the two-dimensional panoramic image acquisition method disclosed in patent document CN110197455A and the garbage collection robot based on visual semantic SLAM disclosed in patent document CN111360780A. Traditional feature point extraction and loss function calculation are usually designed around the pinhole model, and for a panoramic image with large distortion the traditional feature point descriptors often cause matching failure. Because image distortion can make feature matching algorithms fail, and the difficulty of matching panoramic camera images leads to poor robustness, few methods use panoramic camera images as the input of visual SLAM, with the result that the panoramic annular camera cannot be applied to SLAM as a large-field-of-view, low-cost sensor.
Disclosure of Invention
In order to overcome the defects of existing visual SLAM technology while exploiting the field-of-view advantage of the panoramic annular camera, the invention aims to provide a camera pose estimation method and system based on panoramic images, so as to address the difficulty of extracting feature points from panoramic images and to accurately estimate the camera pose using the proposed loss function.
In order to achieve the purpose, the invention adopts the following technical scheme:
a camera pose estimation method based on a panoramic image comprises the following steps:
s1, collecting a panoramic image, and carrying out image preprocessing on the panoramic image;
s2, distortion correction is carried out on the panoramic image after image preprocessing to obtain a corrected image;
s3, extracting feature points of the corrected image by adopting an ORB feature extraction algorithm and calculating a descriptor;
and S4, matching feature points of two adjacent frames of images, adding weights to the reprojection errors based on the radius of the imaging positions of the matched feature points on the two image planes and the difference between the radii to construct a loss function, and estimating the pose of the camera by minimizing the loss function.
Preferably, in step S1, the image preprocessing is histogram equalization.
Preferably, in step S2, the distortion correction process includes:
let P be a spatial point with coordinates (x, y, z)^T and U its projection on the image plane with coordinates (u, v)^T; (x, y)^T and (u, v)^T are proportional and satisfy the following relationship:

λ·(x, y, z)^T = (u, v, f_b(ρ))^T

wherein f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N, ρ = √(u^2 + v^2), λ is a scale factor, and α_0, α_2, …, α_N are polynomial coefficients;
using the back-projection function π^(-1)(U), the mapping relation from the image coordinates to the corresponding three-dimensional coordinates of the object point is as follows:

P = π^(-1)(U) = λ^(-1)·g(u)

where g(u) = (u, v, f_b(ρ))^T;
accordingly, the projection function is:

U = π(P) = f_p(θ)·h(P)

wherein h(P) = (x, y)^T / √(x^2 + y^2), θ = arctan(z / √(x^2 + y^2)), f_p(θ) = β_0 + β_2·θ^2 + … + β_N·θ^N, and β_0, β_2, …, β_N are polynomial coefficients;
the calibration coefficients f_b(ρ) and f_p(θ) are obtained so as to obtain the projection function or the back-projection function and thereby achieve distortion correction of the panoramic image.
Preferably, in step S3, the feature point extraction process includes:
setting the search radius of the edge pixel points of the corrected image as 2 and their actual radius as R, the product of the two is 2R; correspondingly, for any other pixel point in the corrected image, the search radius is

r_s = 2R / r'

wherein r' is the actual radius of the pixel point, and the actual radius is defined as the distance from the pixel point to the center point of the image;

the center of the panoramic annular camera has an actual radius r_0, so the minimum actual radius corresponds to a search radius of

2R / r_0

which is rounded down to give the maximum search radius

r_max = floor(2R / r_0)

and the number of pixels used for comparison is 4·(r_max + 1); correspondingly, the other search radii are also rounded down;

if the number of pixels available for comparison within a search radius is less than 4·(r_max + 1), target pixel points are inserted by linear interpolation, so that the number of pixels used for comparison within the search radius equals 4·(r_max + 1);

each pixel point is taken as a center point and compared with the 4·(r_max + 1) surrounding pixel points used for comparison; if the absolute differences between 3·(r_max + 1) consecutive comparison pixel points and the center pixel are all larger than a set threshold, the corresponding center point is extracted as a feature point.
Preferably, in step S3, after the feature points are extracted, descriptions of scale and rotation are added to the feature points by using a grayscale centroid method.
Preferably, in step S3, the pixel value of the target pixel point interpolated by linear interpolation is an average value of the pixel values of the left and right pixel points.
Preferably, in step S3, the descriptor calculation of the feature point includes:
setting the actual radius of the feature point as r and taking the feature point as the center, eight corner points (upper, lower, left, right, upper left, upper right, lower left and lower right) are selected on the circle with search radius 2; starting from each corner point, with a rounded-down step length determined by the image edge radius R and the feature point radius r, eight pixel points are taken along the ray direction from the corner point toward the center point of the image, according to the corner point position and the step length, for pixel value comparison, so as to calculate a BRIEF descriptor;

by analogy, the same BRIEF descriptor calculation is executed for search radii 3, 4 and 5, so that each feature point corresponds to a 256-bit binary string used as its descriptor.
Preferably, in step S4, the feature point matching includes:
descriptor Hamming distances are calculated one by one between the two adjacent frames of images, and the feature points whose Hamming distance is smaller than a target threshold are selected as matching feature points.
Preferably, in step S4, the constructing of the loss function includes:
the original reprojection error obtained from the gray scale difference is

E = Σ_{i=1}^{n} f(I_i^1 − I_i^2)

wherein I_i^1 and I_i^2 are the brightness values of the matched i-th feature point pair in the two adjacent frames of images, f(·) represents a function operating on the brightness difference, and n is the number of matched feature point pairs;

the actual radii of the matched i-th feature point pair in the two adjacent frames of images are denoted r_i^1 and r_i^2 respectively; the weight factor w_i is then a function of r_i^1 and r_i^2, together with the image edge radius R and the minimum imaging radius r_0, that increases with the imaging radii and decreases with their difference;

accordingly, adding the weights to the original reprojection error yields the loss function

L = Σ_{i=1}^{n} w_i·f(I_i^1 − I_i^2)
the invention also provides a camera pose estimation system based on panoramic images, which applies the camera pose estimation method according to any one of the above schemes, and the camera pose estimation system comprises:
the image acquisition module is used for acquiring a panoramic image;
the image preprocessing module is used for preprocessing the panoramic image;
the distortion correction module is used for carrying out distortion correction on the panoramic image after image preprocessing to obtain a corrected image;
the characteristic point extraction module is used for extracting characteristic points of the corrected image by adopting an ORB characteristic extraction algorithm;
the descriptor computation module is used for performing descriptor computation on the extracted feature points;
the characteristic point matching module is used for matching the characteristic points of two adjacent frames of images;
the loss function building module is used for adding weight to the reprojection error based on the radius of the imaging position of the matched feature point on the two image planes and the difference of the radii so as to build a loss function;
a camera pose estimation module to estimate a camera pose by minimizing a loss function.
Compared with the prior art, the invention has the beneficial effects that:
(1) The feature point extraction fully considers the imaging model of the panoramic annular lens and the distortion it introduces; different extraction radii are adopted at different positions from the edge to the center, and the sampling of the traditional BRIEF descriptor is simplified from random sampling to sequential sampling at fixed intervals, which accelerates the descriptor calculation.
(2) The loss function calculation method designed by the invention makes better use of the good imaging quality of the edge field of the panoramic annular lens and, as far as possible, of the feature points whose positions differ little between the two images, so that the distortion introduced by the panoramic camera affects the result as little as possible.
(3) The method assigns a different weight to each matched feature point pair and constructs the loss function by adding these weights to the original reprojection error, which can effectively improve the accuracy of camera pose estimation.
(4) The method designed by the invention can effectively utilize the strong perception capability of the panoramic annular camera with respect to the surrounding environment, thereby markedly improving the positioning speed of visual SLAM, enhancing the usefulness of the system, and improving the overall detection efficiency.
Drawings
FIG. 1 is a flow chart of a panoramic image based camera pose estimation method of an embodiment of the invention;
FIG. 2 is a schematic view of a panoramic annular camera imaging model according to an embodiment of the invention;
FIG. 3 is a flow chart of feature point extraction according to an embodiment of the present invention;
fig. 4 is an architecture diagram of a panoramic image-based camera pose estimation system according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention, the following description will explain the embodiments of the present invention with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
As shown in fig. 1, the method for estimating the pose of a camera based on a panoramic image according to the embodiment of the present invention includes the following steps:
s1, collecting a panoramic image, and carrying out image preprocessing on the panoramic image;
acquiring a panoramic image by using a panoramic annular camera;
The image preprocessing process is as follows: a panoramic image is input from the panoramic annular camera and histogram equalization is performed to improve the contrast of the image and facilitate subsequent feature point extraction.
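As a concrete illustration of this preprocessing step (a minimal sketch only; the function name and the use of OpenCV on a grayscale image are assumptions, not part of the patent text):

```python
import cv2

def preprocess_panoramic(image_bgr):
    """Histogram equalization of the captured panoramic frame (sketch)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)  # subsequent feature extraction works on intensities
    return cv2.equalizeHist(gray)                       # boosts contrast before feature extraction
```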
S2, distortion correction is carried out on the panoramic image after image preprocessing to obtain a corrected image;
Panoramic image distortion correction is performed on the image after histogram equalization; the distortion correction is based on the Taylor model proposed by Scaramuzza et al. As shown in FIG. 2, P is a spatial point with coordinates (x, y, z)^T and U is the projection of P on the image plane with coordinates (u, v)^T; (x, y)^T and (u, v)^T are proportional and satisfy the following relationship:

λ·(x, y, z)^T = (u, v, f_b(ρ))^T

wherein f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N, ρ = √(u^2 + v^2), λ is a scale factor, and α_0, α_2, …, α_N are polynomial coefficients;
thus, using the back-projection function π^(-1)(U), the mapping relation from the image coordinate U to the corresponding three-dimensional object point coordinates (2D to 3D) can be obtained as

P = π^(-1)(U) = λ^(-1)·g(u)

where g(u) = (u, v, f_b(ρ))^T;
accordingly, the projection function can be expressed as

U = π(P) = f_p(θ)·h(P)

wherein h(P) = (x, y)^T / √(x^2 + y^2), θ = arctan(z / √(x^2 + y^2)), f_p(θ) = β_0 + β_2·θ^2 + … + β_N·θ^N, and β_0, β_2, …, β_N are polynomial coefficients.
Thus, once either of the calibration coefficient sets f_b(ρ) and f_p(θ) is obtained by calibration, a complete projection and back-projection model can be obtained, completing the distortion correction of the panoramic image.
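The following sketch illustrates how such a projection/back-projection pair could be evaluated once the polynomial coefficients are known (the coefficient values below are made-up placeholders, the angle convention in project() is an assumption, and the code is illustrative rather than the patent's implementation):

```python
import numpy as np

# Illustrative (made-up) calibration coefficients; real values come from calibrating the camera.
ALPHA = np.array([-180.0, 0.0, 8.0e-4, -3.0e-7, 6.0e-10])  # f_b(ρ): α0 + α2·ρ² + ...
BETA = np.array([300.0, 0.0, 50.0, -10.0, 2.0])            # f_p(θ): β0 + β2·θ² + ...

def back_project(u, v):
    """Image point (u, v), centered on the image center -> unit ray, P = λ⁻¹·g(u) with g(u) = (u, v, f_b(ρ))ᵀ."""
    rho = np.hypot(u, v)
    z = np.polyval(ALPHA[::-1], rho)        # f_b(ρ); polyval expects highest degree first
    ray = np.array([u, v, z], dtype=float)
    return ray / np.linalg.norm(ray)        # scale (depth) is unobservable from a single view

def project(point_3d):
    """3-D point -> image point, U = f_p(θ)·h(P); assumes the point is not on the optical axis."""
    x, y, z = point_3d
    norm_xy = np.hypot(x, y)
    theta = np.arctan2(z, norm_xy)          # ray angle (assumed convention)
    rho = np.polyval(BETA[::-1], theta)     # f_p(θ)
    return np.array([x, y]) / norm_xy * rho
```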
S3, extracting feature points of the corrected image by adopting an ORB feature extraction algorithm and calculating a descriptor;
the specific process of feature point extraction comprises the following steps:
Feature point extraction is performed on the histogram-equalized and corrected image using the panoramic ORB features proposed in the embodiment of the invention; the panoramic ORB features are mainly composed of panoramic Oriented FAST key points and modified, panoramic-image-based BRIEF (Binary Robust Independent Elementary Features) descriptors.
Specifically, traditional FAST feature points are extracted according to the pinhole model and are used to detect places where the local pixel gray level changes obviously. The flow is mainly as follows: a pixel point p is selected in the image and its brightness value (i.e. pixel value) is denoted I_p; then a threshold T is set (a brightness value, for example 30% of I_p); then, taking pixel point p as the center, 16 pixel points are selected on a circle of radius 3; if the brightness of N consecutive points on the selected circle is greater than I_p + T or less than I_p − T, the pixel p is considered a feature point (for example, when N is taken as 12, 12 consecutive points are required, i.e. FAST-12); the above four steps are executed in a loop over all pixel points to complete the FAST feature point extraction for the whole image.
For the panoramic image with an ultra-large field of view in the embodiment of the invention, the imaging model is no longer a pinhole model, and the corresponding feature point extraction method must also be modified. According to the imaging model of the panoramic annular camera, different (u, v) on the image plane correspond to different positions on a spherical surface, and hence to different fields of view in space. For (u, v) separated by the same distance on the image plane, the closer to the image plane center, the more scene information in space they correspond to, according to the Taylor model used for imaging. In other words, if at a certain moment a spatial element is observed at (u_1, v_1) far from the image plane center, and at the next moment, owing to the rotation of the panoramic camera, it is imaged at (u_2, v_2) close to the circle center, then the image at (u_2, v_2) will be stretched to some extent. Therefore, feature point extraction is modified according to the imaging model. From the model f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N with ρ = √(u^2 + v^2), it can be seen that the imaging plane radius r = √(u^2 + v^2) acts as a parameter. Taking two radial positions r_1 and r_2, the difference in height in space is (α_2·r_2^2 + … + α_N·r_2^N) − (α_2·r_1^2 + … + α_N·r_1^N); writing r_2 = r_1 + Δr, this approximately satisfies

(α_2·r_2^2 + … + α_N·r_2^N) − (α_2·r_1^2 + … + α_N·r_1^N) ≈ 2·α_2·r_1·Δr + 3·α_3·r_1^2·Δr + … + N·α_N·r_1^(N−1)·Δr

Since Δr is a relatively small quantity, the lowest-order term, 2·α_2·r_1·Δr, can be taken as the main factor; here α_2 is a calibration parameter and can be ignored, so the main factor influencing the spatial height difference is the radius r_1. Therefore, when extracting feature points, the radius is taken as a parameter, and a position with a small radius corresponds to a small spatial field of view. The radius search range is designed to guarantee the same extent of corresponding space, i.e. the product of the search radius r_s and the current radius must be a constant. Meanwhile, considering that a large search radius corresponds to more candidate points, a pyramid-like scaling (essentially a linear-interpolation method) is applied to the points on the search radius relative to the edge pixel points, to ensure that the number of comparisons is the same for all feature points when comparing against the threshold.
The specific steps are as follows: the search radius of the edge pixel points of the image is set as 2 pixels, and the actual edge radius of the edge pixel points is set as R pixels, so the product is 2R; correspondingly, a pixel point at radius r_1 has a search radius of

2R / r_1

which guarantees consistency of scale for all pixel points;

the center of the panoramic annular camera has a radius r_0, so the minimum radius corresponds to a search radius of

2R / r_0

which, rounded down, gives the maximum search radius

r_max = floor(2R / r_0)

and the number of pixels used for comparison is 4·(r_max + 1). The same rounding down is also done for the other search radii; if fewer than 4·(r_max + 1) points are available for comparison, pixel points are inserted at equally spaced positions on each side of the search circle by linear interpolation (for example, if three points need to be inserted, each side is divided into four equal parts and the points are inserted at the division positions), and for convenience of calculation the pixel value of an inserted point is taken as the average of the left and right pixel points.

After interpolation in the above manner, every pixel has 4·(r_max + 1) surrounding pixels for comparison; if the pixel differences between 3·(r_max + 1) consecutive comparison pixel points and the central pixel are all larger than the set threshold T, the point is considered a panoramic feature point.
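A rough sketch of this radius-dependent corner test is given below (assumptions: the grayscale image is indexed [row, column], the comparison circle is sampled by angle rather than by the per-pixel interpolation described above, and bounds checking is omitted; only the structure of the test is meant to follow the description):

```python
import numpy as np

def panoramic_fast_test(gray, u, v, cx, cy, R_edge, r0, threshold):
    """FAST-style test whose search radius shrinks as the pixel moves toward the image edge."""
    r_actual = np.hypot(u - cx, v - cy)
    if r_actual < r0 or r_actual > R_edge:
        return False                                  # central dead zone / outside the annulus
    r_max = int(np.floor(2.0 * R_edge / r0))          # largest search radius (at the center)
    rs = int(np.floor(2.0 * R_edge / r_actual))       # search radius of this pixel
    n = 4 * (r_max + 1)                               # same comparison count for every pixel
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    ring = np.array([gray[int(round(v + rs * np.sin(a))), int(round(u + rs * np.cos(a)))]
                     for a in angles], dtype=int)
    exceeds = np.abs(ring - int(gray[v, u])) > threshold
    # Feature point if 3·(r_max+1) consecutive ring samples all exceed the threshold (circular run).
    need = 3 * (r_max + 1)
    run, best = 0, 0
    for flag in np.concatenate([exceeds, exceeds]):
        run = run + 1 if flag else 0
        best = max(best, run)
    return best >= need
```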
Meanwhile, the direction of the feature point of each image block is calculated according to the gray centroid method, specifically as follows:

in a small image block B, the moments of the image block are defined as m_pq = Σ_{x,y∈B} x^p·y^q·I(x, y), with p, q ∈ {0, 1}; the centroid of the image block can then be found from the moments as

C = (m_10 / m_00, m_01 / m_00)

Connecting the geometric center O and the centroid C of the image block gives a direction vector OC, and the direction of the feature point can be defined as

θ = arctan(m_01 / m_10)

Thus, a description of scale and rotation is added to the panoramic feature points.
The flow of feature point extraction according to the embodiment of the present invention is shown in fig. 3.
In addition, the descriptor computation of the embodiment of the present invention includes:
After the panoramic feature points are extracted, a descriptor is calculated for each of them. The conventional BRIEF descriptor takes 128 pairs of random pixels p and q around the feature point; if p is greater than q the bit is set to 1, otherwise 0, giving a 128-dimensional vector composed of 0s and 1s, i.e. a binary descriptor used for image matching. In the conventional pinhole imaging model, the BRIEF descriptor at any position uses 128 pairs of random pixels p and q at the same relative positions around it; obviously, given the distortion of the panoramic image, the conventional approach of using the same random pixel positions at every location is not applicable, so the BRIEF descriptor is modified here into a panoramic BRIEF descriptor.
Because the search radius of edge pixel points is smaller (interpolation is carried out during the search) while the search radius of the central point is larger, random pixels at the same relative positions cannot be used at both the edge and the center of the image plane when calculating the panoramic BRIEF descriptor. Therefore, in the embodiment of the invention, different comparison intervals are adopted according to the image plane radius: as before, the edge radius is denoted R and the radius of the current feature point is denoted r; eight corner points (upper, lower, left, right, upper left, upper right, lower left, lower right) are selected on a circle of radius 2 centered on the feature point, and starting from each corner point, with a rounded-down step length determined by the edge radius R and the feature point radius r, eight pixel points are taken along the ray direction from the corner point toward the image center point, according to the corner point position and the step length, and compared. That is, depending on the corner point position, the comparison is made with the following eight points in the horizontal, vertical, or 45-degree direction (for example, the upper point is compared downward with the following eight points, the upper-right point is compared toward the lower left with the following eight points, and the lower-right point is compared toward the upper left with the following eight points; the step length is the comparison interval: if the step length is 1 the points are taken consecutively, if it is 2 then every other pixel point is taken, and so on);

a similar operation is then performed for radii 3, 4 and 5, which results in a 256-bit binary string for each feature point as its descriptor.
The descriptor calculation method provided by the embodiment of the invention takes into account that the center of the image plane actually corresponds to a larger spatial field of view, stretches the calculation range of the descriptor accordingly, and thus adapts better to the distortion of the panoramic image.
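The following sketch captures the structure of this panoramic BRIEF computation (assumptions: the exact step-length expression is supplied by the caller as step_fn, since only its inputs R and r are stated here; the pairwise comparison along each ray and the absence of bounds checking are further simplifications):

```python
import numpy as np

ANCHOR_DIRS = [(0, -1), (0, 1), (-1, 0), (1, 0), (-1, -1), (1, -1), (-1, 1), (1, 1)]

def panoramic_brief(gray, u, v, cx, cy, R_edge, step_fn):
    """256-bit descriptor: 4 circle radii × 8 anchor points × 8 comparisons along the ray to the center."""
    r_feat = max(np.hypot(u - cx, v - cy), 1e-6)
    step = max(1, int(step_fn(R_edge, r_feat)))              # rounded-down step length
    to_center = np.array([cx - u, cy - v]) / r_feat          # unit vector toward the image center
    bits = []
    for radius in (2, 3, 4, 5):
        for dx, dy in ANCHOR_DIRS:
            norm = np.hypot(dx, dy)
            ax = u + radius * dx / norm                      # anchor point on the circle
            ay = v + radius * dy / norm
            prev = int(gray[int(round(ay)), int(round(ax))])
            for k in range(1, 9):                            # eight samples spaced by `step`
                px = int(round(ax + k * step * to_center[0]))
                py = int(round(ay + k * step * to_center[1]))
                cur = int(gray[py, px])
                bits.append(1 if cur > prev else 0)
                prev = cur
    return np.array(bits, dtype=np.uint8)                    # length 4 * 8 * 8 = 256
```

For experimentation one might pass, say, step_fn = lambda R, r: R // (4 * r) as a purely hypothetical stand-in for the patent's step-length formula.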
And S4, matching feature points of two adjacent frames of images, adding weights to the reprojection errors based on the radius of the imaging positions of the matched feature points on the two image planes and the difference between the radii to construct a loss function, and estimating the pose of the camera by minimizing the loss function.
After the feature points and descriptors are extracted, feature point matching is performed: the Hamming distance between descriptors is calculated for the two images one by one, and the feature points whose Hamming distance is smaller than a set threshold are selected as matching feature points.
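A brute-force matcher sketch for this step (assuming descriptors are stored as (N, 256) arrays of 0/1 bits as in the descriptor sketch above; a packed-byte implementation would instead use OpenCV's Hamming-norm BFMatcher):

```python
import numpy as np

def match_by_hamming(descs1, descs2, max_distance):
    """For each descriptor of frame 1, keep the nearest frame-2 descriptor if its Hamming distance is small enough."""
    matches = []
    for i, d1 in enumerate(descs1):
        dists = np.count_nonzero(descs2 != d1, axis=1)   # Hamming distance to every candidate
        j = int(np.argmin(dists))
        if dists[j] < max_distance:
            matches.append((i, j, int(dists[j])))
    return matches
```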
After the feature point matching is completed, the re-projection error is calculated, and the loss function of the re-projection error is correspondingly designed in the embodiment of the invention.
During imaging with the panoramic annular camera, for the same radial interval on the image plane, the edge pixel points occupy a small actual spatial field of view but the same number of pixels on the sensor, so the imaging quality of edge pixel points on the image plane is considered superior to that of the central point; therefore, when matching feature points, the edge pixel points should be trusted more and given a large weight. Meanwhile, if the radial positions of a pair of matched feature points on the two images are far apart, the pair is considered to have moved a lot and to be relatively unreliable, and is given a small weight. When features are matched, the reprojection error of the three-dimensional points is generally minimized, and the reprojection error is calculated using the matched feature points. The loss function designed in the embodiment of the invention adds a weight based on the radii of the imaging positions of a feature point on the two image planes and the difference between the radii: if the radius of a matched feature on image 1 is r_1 and its radius on image 2 is r_2, and the radius of the image edge is denoted R and the minimum imaging radius r_0 (i.e., the central dead zone has radius r_0), the weight factor can be written as a function of r_1, r_2, R and r_0 that increases with the imaging radii and decreases with |r_1 − r_2|.

By designing the loss function with this weight factor, the edge feature points and the feature points that move less between the two frame images can be better utilized.
The loss function of the embodiment of the invention is constructed as follows:
The original reprojection error obtained from the gray scale difference is

E = Σ_{i=1}^{n} f(I_i^1 − I_i^2)

wherein I_i^1 and I_i^2 are the brightness values of the matched i-th feature point pair in the two adjacent frames of images respectively, and f(·) represents a function operating on the brightness difference, which may for example be a linear or quadratic functional relation; n is the number of matched feature point pairs;

the actual radii of the matched i-th feature point pair in the two adjacent frames of images are denoted r_i^1 and r_i^2 respectively; the weight factor w_i is then obtained from r_i^1, r_i^2, the image edge radius R and the minimum imaging radius r_0 as described above;

accordingly, adding the weights to the original reprojection error yields the loss function

L = Σ_{i=1}^{n} w_i·f(I_i^1 − I_i^2)
In the embodiment of the invention, the loss function is constructed by giving each matched feature point pair its own weight and adding these weights to the original reprojection error, which can effectively improve the accuracy of the subsequent camera pose estimation.
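A sketch of the weighted loss follows (the weight below is an illustrative stand-in that only reflects the qualitative rule above, i.e. larger for edge points and smaller when the two radii differ; it is not the patent's formula, and f(·) is taken here as the squared brightness difference):

```python
import numpy as np

def weight_factor(r1, r2, R_edge, r0):
    """Illustrative stand-in for the weight: favors edge points, penalizes large radial motion."""
    edge_term = max(((r1 + r2) / 2.0 - r0) / (R_edge - r0), 0.0)
    motion_term = 1.0 / (1.0 + abs(r1 - r2) / R_edge)
    return edge_term * motion_term

def weighted_loss(I1, I2, radii1, radii2, R_edge, r0):
    """L = Σ_i w_i · f(I_i^1 − I_i^2), with f chosen here as the squared difference."""
    loss = 0.0
    for a, b, r1, r2 in zip(I1, I2, radii1, radii2):
        loss += weight_factor(r1, r2, R_edge, r0) * float(a - b) ** 2
    return loss
```

The camera pose would then be taken as the pose that minimizes this loss over all matched pairs, for example with a nonlinear least-squares solver.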
The embodiment of the invention also provides a camera pose estimation system based on the panoramic image, which comprises an image acquisition module, an image preprocessing module, a distortion correction module, a feature point extraction module, a descriptor calculation module, a feature point matching module, a loss function construction module and a camera pose estimation module, as shown in fig. 4.
The image acquisition module is used for acquiring panoramic images. For example, a panoramic image is captured using a panoramic annular camera.
The image preprocessing module is used for performing image preprocessing on the panoramic image. The image preprocessing process is as follows: a panoramic image is input from the panoramic annular camera and histogram equalization is performed to improve the contrast of the image and facilitate subsequent feature point extraction.
The distortion correction module is used for performing distortion correction on the panoramic image after image preprocessing to obtain a corrected image. Specifically, panoramic image distortion correction is performed on the image after histogram equalization; the distortion correction is based on the Taylor model proposed by Scaramuzza et al. As shown in FIG. 2, P is a spatial point with coordinates (x, y, z)^T and U is the projection of P on the image plane with coordinates (u, v)^T; (x, y)^T and (u, v)^T are proportional and satisfy the following relationship:

λ·(x, y, z)^T = (u, v, f_b(ρ))^T

wherein f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N, ρ = √(u^2 + v^2), λ is a scale factor, and α_0, α_2, …, α_N are polynomial coefficients;
thus, using the back-projection function π^(-1)(U), the mapping relation from the image coordinate U to the corresponding three-dimensional object point coordinates (2D to 3D) can be obtained as

P = π^(-1)(U) = λ^(-1)·g(u)

where g(u) = (u, v, f_b(ρ))^T;
accordingly, the projection function can be expressed as

U = π(P) = f_p(θ)·h(P)

wherein h(P) = (x, y)^T / √(x^2 + y^2), θ = arctan(z / √(x^2 + y^2)), f_p(θ) = β_0 + β_2·θ^2 + … + β_N·θ^N, and β_0, β_2, …, β_N are polynomial coefficients.
Thus, once either of the calibration coefficient sets f_b(ρ) and f_p(θ) is obtained by calibration, a complete projection and back-projection model can be obtained, completing the distortion correction of the panoramic image.
The feature point extraction module is used for extracting the feature points of the corrected image with an ORB feature extraction algorithm. The specific process of feature point extraction is as follows:
Feature point extraction is performed on the histogram-equalized and corrected image using the panoramic ORB features proposed in the embodiment of the invention; the panoramic ORB features are mainly composed of panoramic Oriented FAST key points and modified, panoramic-image-based BRIEF (Binary Robust Independent Elementary Features) descriptors.
Specifically, traditional FAST feature points are extracted according to the pinhole model and are used to detect places where the local pixel gray level changes obviously. The flow is mainly as follows: a pixel point p is selected in the image and its brightness value (i.e. pixel value) is denoted I_p; then a threshold T is set (a brightness value, for example 30% of I_p); then, taking pixel point p as the center, 16 pixel points are selected on a circle of radius 3; if the brightness of N consecutive points on the selected circle is greater than I_p + T or less than I_p − T, the pixel p is considered a feature point (for example, when N is taken as 12, 12 consecutive points are required, i.e. FAST-12); the above four steps are executed in a loop over all pixel points to complete the FAST feature point extraction for the whole image.
For the panoramic image with an ultra-large field of view in the embodiment of the invention, the imaging model is no longer a pinhole model, and the corresponding feature point extraction method must also be modified. According to the imaging model of the panoramic annular camera, different (u, v) on the image plane correspond to different positions on a spherical surface, and hence to different fields of view in space. For (u, v) separated by the same distance on the image plane, the closer to the image plane center, the more scene information in space they correspond to, according to the Taylor model used for imaging. In other words, if at a certain moment a spatial element is observed at (u_1, v_1) far from the image plane center, and at the next moment, owing to the rotation of the panoramic camera, it is imaged at (u_2, v_2) close to the circle center, then the image at (u_2, v_2) will be stretched to some extent. Therefore, feature point extraction is modified according to the imaging model. From the model f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N with ρ = √(u^2 + v^2), it can be seen that the imaging plane radius r = √(u^2 + v^2) acts as a parameter. Taking two radial positions r_1 and r_2, the difference in height in space is (α_2·r_2^2 + … + α_N·r_2^N) − (α_2·r_1^2 + … + α_N·r_1^N); writing r_2 = r_1 + Δr, this approximately satisfies

(α_2·r_2^2 + … + α_N·r_2^N) − (α_2·r_1^2 + … + α_N·r_1^N) ≈ 2·α_2·r_1·Δr + 3·α_3·r_1^2·Δr + … + N·α_N·r_1^(N−1)·Δr

Since Δr is a relatively small quantity, the lowest-order term, 2·α_2·r_1·Δr, can be taken as the main factor; here α_2 is a calibration parameter and can be ignored, so the main factor influencing the spatial height difference is the radius r_1. Therefore, when extracting feature points, the radius is taken as a parameter, and a position with a small radius corresponds to a small spatial field of view. The radius search range is designed to guarantee the same extent of corresponding space, i.e. the product of the search radius r_s and the current radius must be a constant. Meanwhile, considering that a large search radius corresponds to more candidate points, a pyramid-like scaling (essentially a linear-interpolation method) is applied to the points on the search radius relative to the edge pixel points, to ensure that the number of comparisons is the same for all feature points when comparing against the threshold.
The specific steps are as follows: the search radius of the edge pixel points of the image is set as 2 pixels, and the actual edge radius of the edge pixel points is set as R pixels, so the product is 2R; correspondingly, a pixel point at radius r_1 has a search radius of

2R / r_1

which guarantees consistency of scale for all pixel points;

the center of the panoramic annular camera has a radius r_0, so the minimum radius corresponds to a search radius of

2R / r_0

which, rounded down, gives the maximum search radius

r_max = floor(2R / r_0)

and the number of pixels used for comparison is 4·(r_max + 1). The same rounding down is also done for the other search radii; if fewer than 4·(r_max + 1) points are available for comparison, pixel points are inserted at equally spaced positions on each side of the search circle by linear interpolation (for example, if three points need to be inserted, each side is divided into four equal parts and the points are inserted at the division positions), and for convenience of calculation the pixel value of an inserted point is taken as the average of the left and right pixel points.

After interpolation in the above manner, every pixel has 4·(r_max + 1) surrounding pixels for comparison; if the pixel differences between 3·(r_max + 1) consecutive comparison pixel points and the central pixel are all larger than the set threshold T, the point is considered a panoramic feature point; otherwise, the point is not a panoramic feature point and is eliminated.
For the points retained as panoramic feature points, the direction of the feature point of each image block is calculated according to the gray centroid method, specifically as follows:

in a small image block B, the moments of the image block are defined as m_pq = Σ_{x,y∈B} x^p·y^q·I(x, y), with p, q ∈ {0, 1}; the centroid of the image block can then be found from the moments as

C = (m_10 / m_00, m_01 / m_00)

Connecting the geometric center O and the centroid C of the image block gives a direction vector OC, and the direction of the feature point can be defined as

θ = arctan(m_01 / m_10)

Thus, a description of scale and rotation is added to the panoramic feature points.
The flow of feature point extraction according to the embodiment of the present invention is shown in fig. 3.
And the descriptor computation module is used for performing descriptor computation on the extracted feature points. Specifically, the descriptor computation of the embodiment of the present invention includes:
After the panoramic feature points are extracted, a descriptor is calculated for each of them. The conventional BRIEF descriptor takes 128 pairs of random pixels p and q around the feature point; if p is greater than q the bit is set to 1, otherwise 0, giving a 128-dimensional vector composed of 0s and 1s, i.e. a binary descriptor used for image matching. In the conventional pinhole imaging model, the BRIEF descriptor at any position uses 128 pairs of random pixels p and q at the same relative positions around it; obviously, given the distortion of the panoramic image, the conventional approach of using the same random pixel positions at every location is not applicable, so the BRIEF descriptor is modified here into a panoramic BRIEF descriptor.
Because the search radius of edge pixel points is smaller (interpolation is carried out during the search) while the search radius of the central point is larger, random pixels at the same relative positions cannot be used at both the edge and the center of the image plane when calculating the panoramic BRIEF descriptor. Therefore, in the embodiment of the invention, different comparison intervals are adopted according to the image plane radius: as before, the edge radius is denoted R and the radius of the current feature point is denoted r; eight corner points (upper, lower, left, right, upper left, upper right, lower left, lower right) are selected on a circle of radius 2 centered on the feature point, and starting from each corner point, with a rounded-down step length determined by the edge radius R and the feature point radius r, eight pixel points are taken along the ray direction from the corner point toward the image center point, according to the corner point position and the step length, and compared. That is, depending on the corner point position, the comparison is made with the following eight points in the horizontal, vertical, or 45-degree direction (for example, the upper point is compared downward with the following eight points, the upper-right point is compared toward the lower left with the following eight points, and the lower-right point is compared toward the upper left with the following eight points; the step length is the comparison interval: if the step length is 1 the points are taken consecutively, if it is 2 then every other pixel point is taken, and so on);

a similar operation is then performed for radii 3, 4 and 5, which results in a 256-bit binary string for each feature point as its descriptor.
The descriptor calculation method provided by the embodiment of the invention takes into account that the center of the image plane actually corresponds to a larger spatial field of view, stretches the calculation range of the descriptor accordingly, and thus adapts better to the distortion of the panoramic image.
The feature point matching module is used for matching the feature points of two adjacent frames of images. Specifically, the Hamming distance between descriptors is calculated for the two images one by one, and the feature points whose Hamming distance is smaller than a set threshold are selected as matching feature points.
The loss function building module is used for adding a weight to the reprojection error based on the radii of the imaging positions of the matched feature points on the two image planes and the difference between the radii, so as to build a loss function. Specifically, during imaging with the panoramic annular camera, for the same radial interval on the image plane, the edge pixel points occupy a small actual spatial field of view but the same number of pixels on the sensor, so the imaging quality of edge pixel points on the image plane is considered superior to that of the central point; therefore, when matching feature points, the edge pixel points should be trusted more and given a large weight. Meanwhile, if the radial positions of a pair of matched feature points on the two images are far apart, the pair is considered to have moved a lot and to be relatively unreliable, and is given a small weight. When features are matched, the reprojection error of the three-dimensional points is generally minimized, and the reprojection error is calculated using the matched feature points. The loss function designed in the embodiment of the invention adds a weight based on the radii of the imaging positions of a feature point on the two image planes and the difference between the radii: if the radius of a matched feature on image 1 is r_1 and its radius on image 2 is r_2, and the radius of the image edge is denoted R and the minimum imaging radius r_0 (i.e., the central dead zone has radius r_0), the weight factor can be written as a function of r_1, r_2, R and r_0 that increases with the imaging radii and decreases with |r_1 − r_2|.

By designing the loss function with this weight factor, the edge feature points and the feature points that move less between the two frame images can be better utilized.
The loss function of the embodiment of the invention is constructed as follows:
The original reprojection error obtained from the gray scale difference is

E = Σ_{i=1}^{n} f(I_i^1 − I_i^2)

wherein I_i^1 and I_i^2 are the brightness values of the matched i-th feature point pair in the two adjacent frames of images respectively, and f(·) represents a function operating on the brightness difference, which may for example be a linear or quadratic functional relation; n is the number of matched feature point pairs;

the actual radii of the matched i-th feature point pair in the two adjacent frames of images are denoted r_i^1 and r_i^2 respectively; the weight factor w_i is then obtained from r_i^1, r_i^2, the image edge radius R and the minimum imaging radius r_0 as described above;

accordingly, adding the weights to the original reprojection error yields the loss function

L = Σ_{i=1}^{n} w_i·f(I_i^1 − I_i^2)
In the embodiment of the invention, the loss function is constructed by giving each matched feature point pair its own weight and adding these weights to the original reprojection error, which can effectively improve the accuracy of the subsequent camera pose estimation.
And the camera pose estimation module is used for estimating the camera pose by minimizing the loss function so as to realize fine estimation of the camera pose.
The foregoing has outlined rather broadly the preferred embodiments and principles of the present invention and it will be appreciated that those skilled in the art may devise variations of the present invention that are within the spirit and scope of the appended claims.

Claims (10)

1. A camera pose estimation method based on a panoramic image is characterized by comprising the following steps:
s1, collecting a panoramic image, and carrying out image preprocessing on the panoramic image;
s2, distortion correction is carried out on the panoramic image after image preprocessing to obtain a corrected image;
s3, extracting feature points of the corrected image by adopting an ORB feature extraction algorithm and calculating a descriptor;
and S4, matching feature points of two adjacent frames of images, adding weights to the reprojection errors based on the radius of the imaging positions of the matched feature points on the two image planes and the difference between the radii to construct a loss function, and estimating the pose of the camera by minimizing the loss function.
2. The method according to claim 1, characterized in that in step S1, the image preprocessing is histogram equalization.
3. The panoramic image-based camera pose estimation method according to claim 1, wherein in the step S2, the distortion correction process comprises:
let P be a spatial point with coordinates (x, y, z)^T and U its projection on the image plane with coordinates (u, v)^T; (x, y)^T and (u, v)^T are proportional and satisfy the following relationship:

λ·(x, y, z)^T = (u, v, f_b(ρ))^T

wherein f_b(ρ) = α_0 + α_2·ρ^2 + … + α_N·ρ^N, ρ = √(u^2 + v^2), λ is a scale factor, and α_0, α_2, …, α_N are polynomial coefficients;
using the back-projection function π^(-1)(U), the mapping relation from the image coordinates to the corresponding three-dimensional coordinates of the object point is as follows:

P = π^(-1)(U) = λ^(-1)·g(u)

where g(u) = (u, v, f_b(ρ))^T;
accordingly, the projection function is:

U = π(P) = f_p(θ)·h(P)

wherein h(P) = (x, y)^T / √(x^2 + y^2), θ = arctan(z / √(x^2 + y^2)), f_p(θ) = β_0 + β_2·θ^2 + … + β_N·θ^N, and β_0, β_2, …, β_N are polynomial coefficients;
the calibration coefficients f_b(ρ) and f_p(θ) are obtained so as to obtain the projection function or the back-projection function and thereby achieve distortion correction of the panoramic image.
4. The method according to claim 1, wherein in step S3, the process of extracting the feature points includes:
setting the search radius of the edge pixel points of the corrected image as 2 and their actual radius as R, the product of the two is 2R; correspondingly, for any other pixel point in the corrected image, the search radius is

r_s = 2R / r'

wherein r' is the actual radius of the pixel point, and the actual radius is defined as the distance from the pixel point to the center point of the image;

the center of the panoramic annular camera has an actual radius r_0, so the minimum actual radius corresponds to a search radius of

2R / r_0

which is rounded down to give the maximum search radius

r_max = floor(2R / r_0)

and the number of pixels used for comparison is 4·(r_max + 1); correspondingly, the other search radii are also rounded down;

if the number of pixels available for comparison within a search radius is less than 4·(r_max + 1), target pixel points are inserted by linear interpolation, so that the number of pixels used for comparison within the search radius equals 4·(r_max + 1);

each pixel point is taken as a center point and compared with the 4·(r_max + 1) surrounding pixel points used for comparison; if the pixel differences between 3·(r_max + 1) consecutive comparison pixel points and the center point are all greater than a set threshold, the corresponding center point is extracted as a feature point.
5. The method for estimating the pose of a camera based on a panoramic image according to claim 4, wherein in the step S3, after the feature points are extracted, the description of the scale and the rotation is added to the feature points by using a gray centroid method.
6. The method according to claim 4, wherein in step S3, the pixel value of the target pixel point interpolated by linear interpolation is an average of the pixel values of the left and right pixel points.
7. The panoramic image-based camera pose estimation method according to claim 4 or 5, wherein in the step S3, the descriptor calculation of the feature points includes:
setting the actual radius of the feature point as r and taking the feature point as the center, eight corner points (upper, lower, left, right, upper left, upper right, lower left and lower right) are selected on the circle with search radius 2; starting from each corner point, with a rounded-down step length determined by the image edge radius R and the feature point radius r, eight pixel points are taken along the ray direction from the corner point toward the center point of the image, according to the corner point position and the step length, for pixel value comparison, so as to calculate a BRIEF descriptor;

by analogy, the same BRIEF descriptor calculation is executed for search radii 3, 4 and 5, so that each feature point corresponds to a 256-bit binary string used as its descriptor.
8. The panoramic image-based camera pose estimation method according to claim 7, wherein in the step S4, the feature point matching includes:
descriptor Hamming distances are calculated one by one between the two adjacent frames of images, and the feature points whose Hamming distance is smaller than a target threshold are selected as matching feature points.
9. The panoramic image-based camera pose estimation method according to claim 8, wherein in the step S4, the constructing of the loss function includes:
the original reprojection error obtained from the gray scale difference is

E = Σ_{i=1}^{n} f(I_i^1 − I_i^2)

wherein I_i^1 and I_i^2 are the brightness values of the matched i-th feature point pair in the two adjacent frames of images respectively, f(·) represents a function operating on the brightness difference, and n is the number of matched feature point pairs;

the actual radii of the matched i-th feature point pair in the two adjacent frames of images are denoted r_i^1 and r_i^2 respectively; the weight factor w_i is then a function of r_i^1 and r_i^2, together with the image edge radius R and the minimum imaging radius r_0, that increases with the imaging radii and decreases with their difference;

accordingly, adding the weights to the original reprojection error yields the loss function

L = Σ_{i=1}^{n} w_i·f(I_i^1 − I_i^2)
10. a camera pose estimation system based on a panoramic image, to which the camera pose estimation method according to any one of claims 1 to 9 is applied, the camera pose estimation system comprising:
the image acquisition module is used for acquiring a panoramic image;
the image preprocessing module is used for preprocessing the panoramic image;
the distortion correction module is used for carrying out distortion correction on the panoramic image after image preprocessing to obtain a corrected image;
the characteristic point extraction module is used for extracting characteristic points of the corrected image by adopting an ORB characteristic extraction algorithm;
the descriptor computation module is used for performing descriptor computation on the extracted feature points;
the characteristic point matching module is used for matching the characteristic points of two adjacent frames of images;
the loss function building module is used for adding weight to the reprojection error based on the radius of the imaging position of the matched feature point on the two image planes and the difference of the radii so as to build a loss function;
a camera pose estimation module to estimate a camera pose by minimizing a loss function.
CN202111634998.7A 2021-12-27 2021-12-27 Camera pose estimation method and system based on panoramic image Pending CN114549634A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111634998.7A CN114549634A (en) 2021-12-27 2021-12-27 Camera pose estimation method and system based on panoramic image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111634998.7A CN114549634A (en) 2021-12-27 2021-12-27 Camera pose estimation method and system based on panoramic image

Publications (1)

Publication Number Publication Date
CN114549634A true CN114549634A (en) 2022-05-27

Family

ID=81670554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111634998.7A Pending CN114549634A (en) 2021-12-27 2021-12-27 Camera pose estimation method and system based on panoramic image

Country Status (1)

Country Link
CN (1) CN114549634A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824183A (en) * 2023-07-10 2023-09-29 北京大学 Image feature matching method and device based on multiple feature descriptors
CN116824183B (en) * 2023-07-10 2024-03-12 北京大学 Image feature matching method and device based on multiple feature descriptors


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination