CN117036488B

CN117036488B - Binocular vision positioning method based on geometric constraint

Info

Publication number: CN117036488B
Application number: CN202311280755.7A
Authority: CN
Inventors: 冯冠元; 刘雨; 蒋振刚; 师为礼; 苗语; 何飞; 张琛皓
Original assignee: Changchun University of Science and Technology
Current assignee: Changchun University of Science and Technology
Priority date: 2023-10-07
Filing date: 2023-10-07
Publication date: 2024-01-02
Anticipated expiration: 2043-10-07
Also published as: CN117036488A

Abstract

The invention discloses a binocular vision positioning method based on geometric constraints, comprising the following steps: S1, solving the normalized image feature point position coordinates; S2, solving the rotation matrix and translation vector; S3, calculating the distance between the query camera and the target point; and S4, constructing the geometric constraint relation. By using the geometric constraint relation established from one database image and one query image, the method avoids solving for the scale coefficient and thereby resolves the scale ambiguity problem common in 2D-2D indoor positioning. Reducing the number of matched images lowers the uncertainty and computational complexity of image retrieval and improves the stability, accuracy, and real-time performance of the positioning algorithm. The geometric constraint conditions yield the global position of the query camera, so that the position of the query camera is estimated accurately. The method therefore has clear advantages in resolving scale ambiguity, improving positioning accuracy, and obtaining a global position estimate.

Description

Binocular vision positioning method based on geometric constraint

Technical Field

The invention relates to the technical field of visual positioning methods, in particular to a binocular visual positioning method based on geometric constraint.

Background

With the rapid development of the internet and the spread of wearable devices, demand for self-positioning keeps growing. About 80% of people currently spend most of each day indoors, so indoor positioning has attracted great attention. A visual indoor positioning system works much as a person estimates their own position by eye: the user captures a query image with a handheld smart mobile terminal and uploads it to a network server. The server estimates the user's position from the query image and sends the position information back to the user's terminal. Compared with other positioning systems, visual positioning has the significant advantage of perceiving the user's surroundings from images alone and estimating the user's position in complex indoor environments. It avoids the signal interference and transmission limitations that other sensors may suffer and provides more reliable and accurate positioning results. These advantages have driven intensive research on, and wide application of, indoor visual positioning. Existing indoor visual positioning systems generally involve two main phases: an offline phase and an online phase.

Before visual positioning is carried out (the offline phase), the indoor scene must be modeled and a visual map created. The map is created with a 3D stereo vision acquisition device that has two RGB color lenses; a typical device is the ZED 2i binocular stereo depth camera from STEREOLABS. The device captures binocular stereo images of the indoor scene simultaneously and computes the parallax between the images with a stereo matching algorithm, from which the depth of each pixel is derived. Based on the principle of triangulation and the known intrinsic and extrinsic camera parameters, the three-dimensional position of each pixel relative to the camera is estimated, and these positions are assembled into a point cloud that represents the geometric structure of the indoor scene. Stable feature points are then extracted from the point cloud data with a feature extraction and description algorithm, and the map is built by establishing local and global feature descriptors. Finally, a three-dimensional dense map (3D Dense Map) containing high-density geometric information is generated. During creation of the visual feature map, the key-frame database images must be saved and the shooting pose (position and orientation) of each frame recorded at the same time. The invention does not discuss how the visual map is created in the offline phase: the visual map is assumed to exist, and both the poses of the database images and the spatial positions of the database image pixels are known. The visual map created in the offline phase is stored on the server and supports visual positioning.

In the actual visual positioning process (the online phase), the query image is uploaded to the server and the database image that matches it is retrieved from the visual map. Once the matching database image has been obtained, precise positioning can be performed. Precise positioning methods generally fall into three categories: 2D-2D, 3D-2D, and 3D-3D methods. The 2D-2D method is the most commonly used in indoor visual positioning; it estimates the user's position from two-dimensional image information only and generally relies on position estimation based on the epipolar constraint. The epipolar geometric constraint allows the relative position between the query camera and the database camera to be estimated. Note that an epipolar constraint established from one query image and one database image yields only the relative position of the query camera; because of the scale ambiguity problem, the absolute position cannot be obtained. One common remedy is to establish several epipolar constraints from several matching database images, which avoids solving for the scale coefficient. However, not every query image retrieves several matching database images. Another common remedy is to solve for the scale coefficient in the epipolar constraint by iteratively reweighted least squares, using the spatial positions of the matched feature points. Weighting the spatial positions of the feature points adjusts their importance in the scale estimate and thus reduces the influence of outliers. However, because of noise and matching errors, this approach cannot guarantee that the iteration converges to an accurate result; it can only approach the optimal solution as closely as possible.
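The scale ambiguity described above can be illustrated numerically. The following is a minimal sketch, assuming the common convention that a point in camera 2 coordinates equals `R` times its camera 1 coordinates plus `t`; the pose, the scene point, and all helper names are hypothetical and not taken from the invention. It shows that the epipolar residual stays zero when the translation is multiplied by any scale factor, so the constraint alone cannot fix the length of the translation.

```python
import math

def skew(t):
    """Skew-symmetric matrix [t]x such that [t]x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def epipolar_residual(x2, essential, x1):
    """Value of x2^T E x1, which is zero for a correct correspondence."""
    ex1 = matvec(essential, x1)
    return sum(x2[i] * ex1[i] for i in range(3))

# Hypothetical relative pose and scene point (assumed values for illustration).
R = rot_z(0.1)
t = [0.5, 0.0, 0.1]
point_cam1 = [0.3, -0.2, 4.0]
rotated = matvec(R, point_cam1)
point_cam2 = [rotated[i] + t[i] for i in range(3)]

# Normalized image coordinates: divide by depth.
x1 = [c / point_cam1[2] for c in point_cam1]
x2 = [c / point_cam2[2] for c in point_cam2]

# The same correspondence satisfies the constraint for every scaled translation.
residuals = []
for scale in (1.0, 5.0, 0.2):
    E = matmul(skew([scale * c for c in t]), R)
    residuals.append(epipolar_residual(x2, E, x1))
```

All three residuals vanish up to floating-point rounding, which is why an additional metric measurement is needed to recover an absolute position.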

Disclosure of Invention

The invention aims to provide a binocular vision positioning method based on geometric constraints that resolves the scale ambiguity problem, improves positioning accuracy, reduces the influence of abnormal disturbances, and offers clear advantages in global position estimation, so as to solve the problems described in the Background.

In order to achieve the above purpose, the present invention provides the following technical solutions: a binocular vision positioning method based on geometric constraints, comprising:

S1, solving the feature point position coordinates of the normalized image;

S2, solving a rotation matrix and a translation vector;

S3, calculating the distance between the query camera and the target point;

S4, constructing a geometric constraint relation.

Preferably, step S1, solving the feature point position coordinates of the normalized image: SIFT (Scale-Invariant Feature Transform) feature points are extracted from the left image $I_L$ and right image $I_R$ of the query camera and from the database image $I_D$ matched with the query image, giving the SIFT feature point position matrices of the three images. A BF (Brute Force) feature point matching algorithm then yields the matched feature point pairs between the query camera left image $I_L$ and right image $I_R$, between the left image $I_L$ and the matching database image $I_D$, and between the right image $I_R$ and the matching database image $I_D$. The coordinate matrices of the matched feature points common to all three views are $X_L$, $X_R$, and $X_D$ for the left image, the right image, and the matching database image, respectively. The matched feature point positions must be normalized to obtain the normalized position coordinate matrices $\hat{X}_L$, $\hat{X}_R$, and $\hat{X}_D$:

$$\hat{X}_L = K_L^{-1} X_L \tag{1}$$

$$\hat{X}_R = K_R^{-1} X_R \tag{2}$$

$$\hat{X}_D = K_D^{-1} X_D \tag{3}$$

where $K_L$, $K_R$, and $K_D$ are the intrinsic matrices of the left camera of the query camera, the right camera of the query camera, and the database camera, and each column of a coordinate matrix is one feature point in homogeneous pixel coordinates.
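The normalization in S1 can be sketched as follows, assuming a standard pinhole intrinsic matrix of the form `[[fx, s, cx], [0, fy, cy], [0, 0, 1]]`. The function name `normalize_points` and the numeric values are hypothetical. The function applies the inverse intrinsic matrix to each pixel in closed form instead of inverting the matrix numerically.

```python
def normalize_points(pixels, K):
    """Map pixel coordinates (u, v) to normalized homogeneous coordinates (x, y, 1).

    K is a 3x3 pinhole intrinsic matrix [[fx, s, cx], [0, fy, cy], [0, 0, 1]].
    The result equals K^-1 applied to (u, v, 1).
    """
    fx, s, cx = K[0]
    _, fy, cy = K[1]
    normalized = []
    for u, v in pixels:
        y = (v - cy) / fy
        x = (u - cx - s * y) / fx
        normalized.append((x, y, 1.0))
    return normalized

# Assumed intrinsics: 700 px focal length, principal point at (640, 360), zero skew.
K_example = [[700.0, 0.0, 640.0], [0.0, 700.0, 360.0], [0.0, 0.0, 1.0]]
normalized_example = normalize_points([(640.0, 360.0), (990.0, 220.0)], K_example)
```

With these assumed values the principal point maps to the origin of the normalized plane, and the second pixel maps to roughly (0.5, -0.2).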

Preferably, step S2, solving the rotation matrix and translation vector: for each pair of corresponding points $\hat{x}_{L,i}$, $\hat{x}_{D,i}$ (the $i$-th columns of the normalized position matrices $\hat{X}_L$, $\hat{X}_D$), the essential matrix $E_L$ satisfies the epipolar constraint:

$$\hat{x}_{D,i}^{\top} E_L \, \hat{x}_{L,i} = 0 \tag{4}$$

Likewise, for each pair of corresponding points $\hat{x}_{R,i}$, $\hat{x}_{D,i}$ of the normalized position matrices $\hat{X}_R$, $\hat{X}_D$, the essential matrix $E_R$ satisfies the epipolar constraint:

$$\hat{x}_{D,i}^{\top} E_R \, \hat{x}_{R,i} = 0 \tag{5}$$

From the epipolar constraints (4) and (5) between each of the two query camera images and the database camera image, the essential matrices $E_L$ and $E_R$ can be obtained. $E_L$ and $E_R$ reflect the relative position between the left and right images of the query camera, respectively, and the database camera. This relation is described by the rotation matrices $R_L$, $R_R$ and the translation vectors $t_L$, $t_R$, and the essential matrices are related to them by:

$$E_L = [t_L]_{\times} R_L \tag{6}$$

$$E_R = [t_R]_{\times} R_R \tag{7}$$

where $[t_L]_{\times}$ and $[t_R]_{\times}$ are the skew-symmetric matrices of the vectors $t_L$ and $t_R$. Applying singular value decomposition (Singular Value Decomposition, SVD) to $E_L$ and $E_R$ gives the rotation matrices $R_L$, $R_R$ and the translation vectors $t_L$, $t_R$ between the cameras;

The rotation matrix is a 3 × 3 matrix composed of its elements as follows:

（8）

The translation vector is a 3 × 1 vector composed of its elements as follows:

（9）

（10）

（11）。
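The SVD step in S2 can be sketched with NumPy as below. This is the textbook decomposition of an essential matrix into four candidate rotation and translation pairs, not necessarily the exact procedure used by the inventors; the function names and test pose are hypothetical. The translation is recovered only as a unit direction, which is the scale ambiguity that S3 and S4 are meant to remove.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]x of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Return the four (R, t) candidates encoded by an essential matrix E = [t]x R."""
    U, _, Vt = np.linalg.svd(E)
    # Flip signs so that the recovered matrices are proper rotations (det = +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]  # unit-length translation direction, sign unknown
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# Assumed ground-truth pose used only to exercise the function.
angle = 0.3
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([1.0, 0.2, -0.1])
candidates = decompose_essential(skew(t_true) @ R_true)
```

In practice the correct candidate among the four is usually selected by checking that triangulated points lie in front of both cameras; that check is omitted here to keep the sketch short.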

Preferably, step S3, calculating the distance between the query camera and the target point: binocular ranging imitates the principle of biological binocular vision. A binocular camera captures a left picture and a right picture, the images are passed to a computer that analyzes them and computes the parallax, and the three-dimensional spatial information of the target object is then obtained. Assume $P$ is the object to be measured, $O_L$ and $O_R$ are the optical centers of the left and right cameras, $b$ is the distance between the optical centers of the left and right cameras, also called the baseline distance, $f$ is the focal length of the camera, $p_l$ and $p_r$ are the coordinates of point $P$ in the left and right camera image coordinate systems, and $D$ is the projection distance from point $P$ to the camera;

By the principle of similar triangles:

$$\frac{b - (x_l - x_r)}{b} = \frac{D - f}{D} \tag{12}$$

From this, the expression for the distance $D$ between the object to be measured and the camera follows, as shown in formula (13):

$$D = \frac{f\,b}{x_l - x_r} = \frac{f\,b}{d} \tag{13}$$

In formula (13), $x_l$ and $x_r$ are the abscissas of the pixel points of $P$ in the left and right images, $d = x_l - x_r$ is the parallax between the left and right cameras, that is, the difference between the image positions of the target point in the left and right cameras, and the focal length $f$ and baseline distance $b$ are obtained by calibration.
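The ranging relation of S3 (distance equals focal length times baseline divided by parallax) can be sketched as follows. The function name and the numeric values are assumptions chosen for illustration; the focal length is expressed in pixels so that it is in the same unit as the parallax.

```python
def stereo_distance(x_left, x_right, focal_px, baseline):
    """Distance to a point from its horizontal pixel positions in a rectified stereo pair.

    focal_px is the focal length in pixels and baseline is the lens spacing;
    the result is in the same unit as baseline.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("parallax must be positive for a point in front of the cameras")
    return focal_px * baseline / disparity

# Assumed values: 700 px focal length, 0.12 m baseline, 28 px parallax.
distance_example = stereo_distance(660.0, 632.0, 700.0, 0.12)
```

With these assumed numbers the point lies about 3 m from the camera. Because the distance is inversely proportional to the parallax, a one-pixel matching error matters far more for distant points than for near ones.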

Preferably, step S4, constructing the geometric constraint relation:

$C_1 = (x_1, y_1)$ and $C_2 = (x_2, y_2)$ denote the position coordinates of the left camera and the right camera in the world coordinate system. From the baseline length $b$ of the query camera:

$$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} = b \tag{14}$$

From the relative positions between the database camera and the left and right lenses of the query camera, obtained from the rotation matrices and translation vectors solved in S2, the positional relation between the left camera $C_1$ and the database camera $C_D$ follows from the proportional relation:

（15）

Similarly, the positional relation between the right camera $C_2$ and the database camera $C_D$:

（16）

where the four offset terms in formulas (15) and (16) denote the offsets of the left camera $C_1$ and the right camera $C_2$ from the database camera $C_D$ along the $x$ axis and the $y$ axis;

From the distance between the query camera and the target point calculated in S3, the distance between the binocular camera and the spatial point $P$ can be measured and is denoted $D$. Let the equation of the straight line through cameras $C_1$ and $C_2$ be $Ax + By + C = 0$; then the distance from point $P = (x_P, y_P)$ to this line is:

$$D = \frac{\lvert A x_P + B y_P + C \rvert}{\sqrt{A^2 + B^2}} \tag{17}$$

where $A = y_2 - y_1$, $B = x_1 - x_2$, and $C = x_2 y_1 - x_1 y_2$. Solving formulas (14), (15), (16), and (17) simultaneously gives $C_1$ and $C_2$, that is, the position of the query camera in the world coordinate system.
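Two of the constraints in S4, the baseline length and the point-to-line distance, can be written as residual functions that a nonlinear solver would drive to zero. The sketch below assumes planar (x, y) camera positions and covers only these two constraints; the proportional relations of formulas (15) and (16) are not reproduced here. All function names and numeric values are hypothetical.

```python
import math

def line_through(c1, c2):
    """Coefficients (A, B, C) of the line A*x + B*y + C = 0 through points c1 and c2."""
    (x1, y1), (x2, y2) = c1, c2
    return y2 - y1, x1 - x2, x2 * y1 - x1 * y2

def point_line_distance(p, c1, c2):
    """Perpendicular distance from point p to the line through c1 and c2."""
    a, b, c = line_through(c1, c2)
    return abs(a * p[0] + b * p[1] + c) / math.hypot(a, b)

def constraint_residuals(c1, c2, p, baseline, measured_distance):
    """Residuals of the baseline constraint and the point-to-line distance constraint."""
    baseline_residual = math.dist(c1, c2) - baseline
    distance_residual = point_line_distance(p, c1, c2) - measured_distance
    return baseline_residual, distance_residual

# Assumed configuration: lenses 0.12 m apart, target point 3 m in front of the baseline.
residual_example = constraint_residuals((1.0, 2.0), (1.12, 2.0), (1.05, 5.0), 0.12, 3.0)
```

Both residuals are zero (up to rounding) for this consistent configuration. A candidate pair of lens positions that violates either constraint produces a nonzero residual, which is what a solver would minimize together with the remaining relations.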

Compared with the prior art, the invention has the beneficial effects that:

The invention improves on existing binocular vision positioning methods based on multiple geometric constraints. First, by using the epipolar geometric constraint established from one query image and one database image, the method avoids solving for the scale coefficient and so resolves the scale ambiguity problem. Second, compared with the conventional approach of establishing epipolar constraints from several matching database images, the method uses only one database image, which reduces the dependence on matching images and the computational complexity, while the multiple geometric constraints deliver accurate visual positioning results and improve positioning precision. In addition, the invention reduces the influence of noise and matching errors and, through the multiple geometric constraints, provides a more stable and robust positioning algorithm. The method obtains the global position of the query camera rather than only its relative position, so the position of the query camera is estimated accurately. In summary, the method has clear advantages in resolving scale ambiguity, improving positioning accuracy, reducing the influence of matching errors, and estimating global position, and it addresses the scale ambiguity and limited positioning accuracy common in 2D-2D precise visual positioning methods.

Drawings

FIG. 1 is a binocular vision ranging schematic diagram in the present invention;

FIG. 2 is a schematic diagram of a multiple geometry constraint in accordance with the present invention;

FIG. 3 is a flowchart of the algorithm of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

Referring to FIG. 3, a binocular vision positioning method based on geometric constraints comprises solving the feature point position coordinates of the normalized image, solving the rotation matrix and translation vector, calculating the distance between the query camera and the target point, and constructing the geometric constraint relation. The specific steps are as follows:

S1, solving the feature point position coordinates of the normalized image;

S2, solving a rotation matrix and a translation vector;

S3, calculating the distance between the query camera and the target point;

S4, constructing a geometric constraint relation.

S1, solving the feature point position coordinates of the normalized image: SIFT (Scale-Invariant Feature Transform) feature points are extracted from the left image $I_L$ and right image $I_R$ of the query camera and from the database image $I_D$ matched with the query image, giving the SIFT feature point position matrices of the three images. A BF (Brute Force) feature point matching algorithm then yields the matched feature point pairs between the query camera left image $I_L$ and right image $I_R$, between the left image $I_L$ and the matching database image $I_D$, and between the right image $I_R$ and the matching database image $I_D$. The coordinate matrices of the matched feature points common to all three views are $X_L$, $X_R$, and $X_D$ for the left image, the right image, and the matching database image, respectively. The matched feature point positions must be normalized to obtain the normalized position coordinate matrices $\hat{X}_L$, $\hat{X}_R$, and $\hat{X}_D$:

$$\hat{X}_L = K_L^{-1} X_L \tag{1}$$

$$\hat{X}_R = K_R^{-1} X_R \tag{2}$$

$$\hat{X}_D = K_D^{-1} X_D \tag{3}$$

where $K_L$, $K_R$, and $K_D$ are the intrinsic matrices of the left camera of the query camera, the right camera of the query camera, and the database camera, and each column of a coordinate matrix is one feature point in homogeneous pixel coordinates.
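Selecting the feature points common to all three views can be sketched as an intersection of the three pairwise match lists. The sketch assumes that each pairwise matcher returns pairs of feature indices and that a common point is one whose three pairwise matches agree with each other; this consistency rule and the function name are assumptions, since the text does not spell out how the common points are chosen.

```python
def common_three_view_matches(matches_lr, matches_ld, matches_rd):
    """Keep feature triples that are matched consistently across all three image pairs.

    matches_lr: (left index, right index) pairs
    matches_ld: (left index, database index) pairs
    matches_rd: (right index, database index) pairs
    Returns a list of (left index, right index, database index) triples.
    """
    left_to_right = dict(matches_lr)
    left_to_db = dict(matches_ld)
    right_to_db = dict(matches_rd)
    triples = []
    for left_idx in sorted(left_to_right.keys() & left_to_db.keys()):
        right_idx = left_to_right[left_idx]
        db_idx = left_to_db[left_idx]
        # Accept the triple only if the right-to-database match closes the loop.
        if right_to_db.get(right_idx) == db_idx:
            triples.append((left_idx, right_idx, db_idx))
    return triples

# Assumed toy match lists: only feature 0 is consistent across all three pairs.
triples_example = common_three_view_matches(
    [(0, 10), (1, 11), (2, 12)],
    [(0, 20), (1, 21), (3, 23)],
    [(10, 20), (11, 29), (12, 22)],
)
```

The surviving triples index the rows or columns from which the three common coordinate matrices would be assembled.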

S2, solving a rotation matrix and a translation vector:

（4）

（5）

（6）

（7）

（8）

（9）

（10）

（11）。

S3, calculating the distance between the query camera and the target point:

Binocular ranging imitates the principle of biological binocular vision. A binocular camera captures a left picture and a right picture, the images are passed to a computer that analyzes them and computes the parallax, and the three-dimensional spatial information of the target object is then obtained. The schematic diagram is shown in FIG. 1: assume $P$ is the object to be measured, $O_L$ and $O_R$ are the optical centers of the left and right cameras, $b$ is the distance between the optical centers of the left and right cameras, also called the baseline distance, $f$ is the focal length of the camera, $p_l$ and $p_r$ are the coordinates of point $P$ in the left and right camera image coordinate systems, with abscissas $x_l$ and $x_r$, and $D$ is the projection distance from point $P$ to the camera;

As shown in FIG. 1, by the principle of similar triangles:

$$\frac{b - (x_l - x_r)}{b} = \frac{D - f}{D} \tag{12}$$

$$D = \frac{f\,b}{x_l - x_r} \tag{13}$$

S4, constructing a geometric constraint relation:

As shown in FIG. 2, $C_1 = (x_1, y_1)$ and $C_2 = (x_2, y_2)$ denote the position coordinates of the left camera and the right camera in the world coordinate system. From the baseline length $b$ of the query camera:

$$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} = b \tag{14}$$

（15）

Similarly, the positional relation between the right camera $C_2$ and the database camera $C_D$:

（16）

（17）

The inputs of the method are the left image of the query camera, the right image of the query camera, the matching database image, the shooting position matrix of the database camera, the spatial positions of the database image pixels, the intrinsic parameter matrices of the query camera and of the database camera, and the baseline length of the query camera (the spacing between the left and right lenses). Specifically, the method first determines the feature point position coordinates of the left and right images of the query camera and of the matching image of the database camera with a three-view feature matching algorithm, and then solves the relative positions between the database camera and the left and right lenses of the query camera from the epipolar geometric constraints. Next, the projection distance from the binocular camera to the target point is calculated by the triangulation principle, and the global coordinates of the corresponding database image feature points are obtained. Finally, the absolute positions of the left and right lenses of the query camera are calculated by solving a set of nonlinear equations.

The known conditions of the invention (the input variables) are: the left image $I_L$ and right image $I_R$ of the query camera; the database image $I_D$ matching the query image; the shooting position matrix $C_D$ of the database camera; the pixel spatial position coordinate matrix $M_D$ of the database image; the intrinsic matrix $K_L$ of the left camera of the query camera, the intrinsic matrix $K_R$ of the right camera, and the intrinsic matrix $K_D$ of the database camera; and the baseline length $b$ of the query camera (the spacing between the left and right lenses).

The variables to be solved: the positions $C_1$, $C_2$ of the left and right lenses of the query camera.

(Description of the known conditions: the query image consists of the color image $I_L$ taken by the left camera and the image $I_R$ taken by the right camera of the query camera. The basic idea of visual positioning is to estimate the shooting position of the query camera and thereby locate the user. The matching database image $I_D$ is obtained by a retrieval algorithm; it has a certain visual feature similarity with the query image, and a certain number of visual feature matching points exist between the query image and the database image. The shooting position $C_D$ of the database camera is the position from which the matching database image $I_D$ was taken and is used to estimate the absolute position of the query camera. The pixel spatial position coordinate matrix $M_D$ of the database image $I_D$ is an $N \times 3$ matrix containing $N$ three-dimensional position coordinates, where $N$ is the total number of pixels of the matching database image. The three-dimensional position coordinates stored in $M_D$ correspond one-to-one to the pixels of the matching database image, so the spatial position of each pixel of the matching database image can be looked up in $M_D$. The intrinsic matrices $K_L$, $K_R$ of the query camera, the intrinsic matrix $K_D$ of the database camera, and the baseline length $b$ of the query camera must be obtained by camera calibration before positioning.)
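The one-to-one correspondence between database image pixels and rows of the pixel spatial position coordinate matrix can be sketched as a simple index lookup. The sketch assumes the rows are stored in row-major pixel order (row `v * width + u` holds the position of pixel `(u, v)`); the storage order is not stated in the text, so this layout and the function name are assumptions.

```python
def pixel_world_position(position_matrix, image_width, u, v):
    """Return the stored 3D position [X, Y, Z] of pixel (u, v).

    position_matrix holds N = width * height rows in row-major pixel order (assumed).
    """
    if not 0 <= u < image_width:
        raise IndexError("column index outside the image")
    index = v * image_width + u
    if not 0 <= index < len(position_matrix):
        raise IndexError("row index outside the image")
    return position_matrix[index]

# Assumed toy map for a 3 x 2 image: each row stores [column, row, 1.0].
width_example, height_example = 3, 2
matrix_example = [[float(col), float(row), 1.0]
                  for row in range(height_example)
                  for col in range(width_example)]
position_example = pixel_world_position(matrix_example, width_example, 2, 1)
```

A matched feature point in the database image can be looked up this way (after rounding its sub-pixel location to the nearest pixel) to obtain its coordinates in the world coordinate system.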

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A binocular vision positioning method based on geometric constraints, comprising:

S1, solving the feature point position coordinates of the normalized image;

S2, solving a rotation matrix and a translation vector;

S3, calculating the distance between the query camera and the target point;

S4, constructing a geometric constraint relation;

wherein S1, solving the feature point position coordinates of the normalized image, comprises: extracting SIFT feature points from the left image $I_L$ and right image $I_R$ of the query camera and from the database image $I_D$ matched with the query image, to obtain the SIFT feature point position matrices of the three images; then obtaining, through a BF feature point matching algorithm, the matched feature point pairs between the query camera left image $I_L$ and right image $I_R$, between the left image $I_L$ and the matching database image $I_D$, and between the right image $I_R$ and the matching database image $I_D$; the coordinate matrices of the matched feature points common to the three views being $X_L$, $X_R$, and $X_D$ for the left image, the right image, and the matching database image, respectively; and normalizing the matched feature point positions to obtain the normalized position coordinate matrices $\hat{X}_L$, $\hat{X}_R$, and $\hat{X}_D$:

$$\hat{X}_L = K_L^{-1} X_L \tag{1}$$

$$\hat{X}_R = K_R^{-1} X_R \tag{2}$$

$$\hat{X}_D = K_D^{-1} X_D \tag{3}$$

wherein $K_L$ is the intrinsic matrix of the left camera of the query camera, $K_R$ is the intrinsic matrix of the right camera of the query camera, and $K_D$ is the intrinsic matrix of the database camera;

S2, solving a rotation matrix and a translation vector: each pair of corresponding points $\hat{x}_{L,i}$, $\hat{x}_{D,i}$ of the normalized position matrices $\hat{X}_L$, $\hat{X}_D$, and each pair of corresponding points $\hat{x}_{R,i}$, $\hat{x}_{D,i}$ of the normalized position matrices $\hat{X}_R$, $\hat{X}_D$, satisfy the epipolar constraints with the essential matrices $E_L$ and $E_R$:

$$\hat{x}_{D,i}^{\top} E_L \, \hat{x}_{L,i} = 0 \tag{4}$$

$$\hat{x}_{D,i}^{\top} E_R \, \hat{x}_{R,i} = 0 \tag{5}$$

from the epipolar constraints (4) and (5) between each of the two query camera images and the database camera image, the essential matrices $E_L$ and $E_R$ are obtained; $E_L$ and $E_R$ reflect the relative position between the left and right images of the query camera, respectively, and the database camera; this relation is described by the rotation matrices $R_L$, $R_R$ and the translation vectors $t_L$, $t_R$, and the essential matrices are related to them by:

$$E_L = [t_L]_{\times} R_L \tag{6}$$

$$E_R = [t_R]_{\times} R_R \tag{7}$$

wherein $[t_L]_{\times}$ and $[t_R]_{\times}$ are the skew-symmetric matrices of the vectors $t_L$ and $t_R$; applying singular value decomposition (Singular Value Decomposition, SVD) to $E_L$ and $E_R$ gives the rotation matrices $R_L$, $R_R$ and the translation vectors $t_L$, $t_R$ between the cameras;

（8）

（9）

（10）

（11）；

S3, calculating the distance between the query camera and the target point: binocular ranging imitates the principle of biological binocular vision; a binocular camera captures a left picture and a right picture, the images are passed to a computer that analyzes them and computes the parallax, and the three-dimensional spatial information of the target object is then obtained; assume $P$ is the object to be measured, $O_L$ and $O_R$ are the optical centers of the left and right cameras, $b$ is the distance between the optical centers of the left and right cameras, also called the baseline distance, $f$ is the focal length of the camera, $p_l$ and $p_r$ are the coordinates of point $P$ in the left and right camera image coordinate systems, and $D$ is the projection distance from point $P$ to the camera;

by the principle of similar triangles:

$$\frac{b - (x_l - x_r)}{b} = \frac{D - f}{D} \tag{12}$$

from which the expression for the distance $D$ between the object to be measured and the camera follows, as shown in formula (13):

$$D = \frac{f\,b}{x_l - x_r} = \frac{f\,b}{d} \tag{13}$$

in formula (13), $x_l$ and $x_r$ are the abscissas of the pixel points of $P$ in the left and right images, $d = x_l - x_r$ is the parallax between the left and right cameras, that is, the difference between the image positions of the target point in the left and right cameras, and the focal length $f$ and baseline distance $b$ are obtained by calibration;

S4, constructing a geometric constraint relation:

（14）

（15）

similarly, the positional relation between the right camera $C_2$ and the database camera $C_D$:

（16）

from the distance between the query camera and the target point calculated in S3, the distance between the binocular camera and the spatial point $P$ is measured and is denoted $D$; let the equation of the straight line through cameras $C_1$ and $C_2$ be $Ax + By + C = 0$; then the distance from point $P = (x_P, y_P)$ to this line is:

$$D = \frac{\lvert A x_P + B y_P + C \rvert}{\sqrt{A^2 + B^2}} \tag{17}$$