CN112686953A - Visual positioning method and device based on inverse depth parameter and electronic equipment - Google Patents


Info

Publication number
CN112686953A
Authority
CN
China
Prior art keywords
image frame, frame, landmark points, reference frame
Legal status
Pending
Application number
CN202011522724.4A
Other languages
Chinese (zh)
Inventor
郎小明 (Lang Xiaoming)
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Application filed by Beijing Sankuai Online Technology Co Ltd

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a visual positioning method based on inverse depth parameters, belonging to the field of computer technology, which helps improve the accuracy of visual positioning. The positioning method comprises the following steps: in the process of performing visual positioning based on inverse depth parameters, in response to acquiring a current image frame captured by a camera, determining the feature points corresponding to the landmark points observed in the current image frame; determining whether the current image frame is a key frame; in response to the current image frame being a key frame, setting the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous key frame as the current image frame; otherwise, setting the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous image frame as the current image frame; and updating the information of the feature points corresponding to the landmark points observed in the image frame sequence according to the information of the feature points corresponding to the landmark points observed in the current image frame, so as to position the camera.

Description

Visual positioning method and device based on inverse depth parameter and electronic equipment
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a visual positioning method and device based on inverse depth parameters, an electronic device, and a computer-readable storage medium.
Background
In the prior art, visual positioning technology can generally be divided into two parts, front-end image processing and back-end parameter optimization, which together estimate the position of the positioning object and construct a map. In back-end optimization, all variables to be estimated are treated as parameters to be solved, an error function is constructed, and the parameter values that minimize the error function are sought. For example, in SLAM (simultaneous localization and mapping) technology, when solving the back-end optimization problem, the parameters to be estimated are divided into the camera poses {Ci} at all times, where i = 1, 2, 3, ... indexes the camera position at the ith time, and the landmark point positions {Lj}, where j = 1, 2, 3, ... indexes the jth landmark point, and the visual residuals between image frames having a co-visibility relationship are calculated. One prior-art approach computes the visual residual by taking the reciprocal of the depth corresponding to a landmark point as the parameter, i.e., an inverse depth parameterization. In this scheme, i denotes the first image frame in which a certain landmark point is observed, and j denotes a subsequent image frame in which the same landmark point is observed; the position of the landmark point is projected into the jth frame and then onto the image plane through the camera's projection model, so that the visual residual of the landmark point observed in different image frames can be solved for visual positioning.
However, in a visual positioning system, to ensure that the computational complexity does not grow with the number of image frames acquired by the camera, only a number of the most recent image frames are retained for camera pose optimization; that is, an image frame sequence consisting of the latest frames is maintained. When the number of image frames in the sequence reaches a set value, the first frame is marginalized according to different strategies, so the image information of some landmark points is lost, the constraints on the related poses are lost, and the positioning becomes inaccurate.
It is clear that the visual positioning method in the prior art needs to be improved.
Disclosure of Invention
Embodiments of the present application provide a visual positioning method based on inverse depth parameters, which helps improve the accuracy of visual positioning.
In order to solve the above problem, in a first aspect, an embodiment of the present application provides a visual positioning method based on an inverse depth parameter, including:
in the process of performing visual positioning based on inverse depth parameters, in response to acquiring a current image frame captured by a camera, determining the feature points corresponding to the landmark points observed in the current image frame;
determining whether the current image frame is a key frame;
in response to the current image frame being a key frame, setting the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous key frame as the current image frame;
in response to the current image frame not being a key frame, setting the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous image frame as the current image frame;
and updating the information of the feature points corresponding to the landmark points observed in the image frame sequence according to the information of the feature points corresponding to the landmark points observed in the current image frame, so that a preset visual positioning engine positions the camera based on the updated information of the feature points corresponding to the landmark points.
In a second aspect, an embodiment of the present application provides an apparatus for visual positioning based on an inverse depth parameter, including:
a current image frame acquisition module, configured to determine, in the process of performing visual positioning based on inverse depth parameters and in response to acquiring a current image frame captured by a camera, the feature points corresponding to the landmark points observed in the current image frame;
a key frame judging module, configured to determine whether the current image frame is a key frame;
a first reference frame setting module, configured to set, in response to the current image frame being a key frame, the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous key frame as the current image frame;
a second reference frame setting module, configured to set, in response to the current image frame not being a key frame, the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous image frame as the current image frame;
and a feature point information updating module, configured to update the information of the feature points corresponding to the landmark points observed in the image frame sequence according to the information of the feature points corresponding to the landmark points observed in the current image frame, so that a preset visual positioning engine positions the camera based on the updated information of the feature points corresponding to the landmark points.
In a third aspect, an embodiment of the present application further discloses an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the inverse depth parameter-based visual positioning method described in the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the visual positioning method based on inverse depth parameters disclosed in the embodiments of the present application.
In the visual positioning method based on inverse depth parameters of the embodiments of the present application, in the process of performing visual positioning based on inverse depth parameters, in response to acquiring a current image frame captured by a camera, the feature points corresponding to the landmark points observed in the current image frame are determined; whether the current image frame is a key frame is determined; in response to the current image frame being a key frame, the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous key frame is set as the current image frame; in response to the current image frame not being a key frame, the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous image frame is set as the current image frame; and the information of the feature points corresponding to the landmark points observed in the image frame sequence is updated according to the information of the feature points corresponding to the landmark points observed in the current image frame, so that a preset visual positioning engine positions the camera based on the updated information of the feature points corresponding to the landmark points, which improves the accuracy of visual positioning.
The foregoing description is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be understood more clearly and implemented according to the content of the specification, and in order to make the above and other objects, features, and advantages of the present application more comprehensible, detailed embodiments of the present application are set forth below.
Drawings
In order to make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the drawings required for describing the embodiments are briefly introduced below.
FIG. 1 is a flowchart of a visual positioning method based on inverse depth parameters according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of a camera acquiring an image frame according to an embodiment of the present application;
FIG. 3 is a schematic diagram of feature points of a landmark point in different image frames according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a visual positioning apparatus based on inverse depth parameters according to a second embodiment of the present application;
FIG. 5 schematically shows a block diagram of an electronic device for performing a method according to the present application; and
FIG. 6 schematically shows a storage unit for holding or carrying program code implementing a method according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example one
As shown in FIG. 1, the visual positioning method based on inverse depth parameters disclosed in an embodiment of the present application includes steps 110 to 150.
Step 110, in the process of performing visual positioning based on inverse depth parameters, in response to acquiring a current image frame captured by the camera, determining the feature points corresponding to the landmark points observed in the current image frame.
Taking visual positioning as an example, the visual positioning problem (VI-SLAM or VIO) can generally be divided into two parts: front-end image processing, which estimates the camera's own position, and back-end optimization.
In back-end optimization, all variables to be estimated are treated as parameters to be solved, an error function is constructed, and the parameter values that minimize it are sought. For example, in the SLAM problem, the parameters to be estimated are divided into the camera poses {Ci} at all times, where i = 1, 2, 3, ... indexes the camera position at the ith time, and the landmark point positions {Lj}, where j = 1, 2, 3, ... The whole optimization problem is converted into the following formula:

$$\min_{\{C_i\},\{L_j\}} \; \|r_p\|^2 \;+\; \sum_i \|r_I(C_i, C_{i+1})\|^2 \;+\; \sum_i \sum_{j \in V_i} \|r_C(C_i, C_j)\|^2$$

where $r_p$ denotes the residual of the prior information generated for the (marginalized) first frame, $r_C$ denotes the visual residual, $r_I$ denotes the position, velocity, and attitude constraint between two adjacent frames generated by the navigation system, i.e., the residual generated by the navigation system, and $V_i$ denotes the set of other image frames having a co-visibility relationship with the ith frame. In one visual residual representation, the image frame in which a landmark point is first observed is used as the reference frame, and the reciprocal of the corresponding depth is used as the parameter, i.e., the inverse depth parameter. When this inverse depth parameterization is chosen, in the visual residual term, i denotes the first image frame in which a certain landmark point is observed and j denotes a subsequent image frame in which the same landmark point is observed; the position of the landmark point is projected into the jth frame and then onto the image plane through the camera's projection model, yielding the visual residual term, i.e., the reprojection error. This relationship can generally be represented in the form shown in FIG. 2.
As shown in FIG. 2, L1 and L2 denote two landmark points: landmark point L1 is observed by the camera in the image frames acquired at times C0, C1, C2, and C3, and landmark point L2 in the image frames acquired at times C2, C3, C4, and C5. When the reprojection error is represented with the inverse depth parameter, each residual relates the camera positions and poses at two times. For example, for landmark point L1, one residual is generated between the image frames acquired at C0 and C1, one between those acquired at C0 and C2, and one between those acquired at C0 and C3, three residuals in total. The visual residuals generated by the landmark points across image frames acquired at different times are used as terms in the optimization for positioning the camera.
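To make the residual bookkeeping concrete, the following minimal Python sketch (not taken from the patent; all names are illustrative) enumerates the reprojection residual pairs implied by the inverse depth parameterization, reproducing the FIG. 2 count for landmark point L1.

```python
def residual_pairs(observations):
    """observations: landmark id -> ordered list of frame indices observing
    that landmark; the first entry is its reference frame."""
    pairs = []
    for lm_id, frames in observations.items():
        ref = frames[0]              # frame where the landmark was first seen
        for j in frames[1:]:         # every subsequent co-visible frame
            pairs.append((lm_id, ref, j))
    return pairs

# L1 is seen at times C0..C3, L2 at C2..C5 (the FIG. 2 configuration).
obs = {"L1": [0, 1, 2, 3], "L2": [2, 3, 4, 5]}
for lm, i, j in residual_pairs(obs):
    print(f"residual of {lm} between C{i} and C{j}")
# L1 yields (C0,C1), (C0,C2), (C0,C3): three residuals, as stated above.
```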
In the process of performing visual positioning based on inverse depth parameters, a preset visual positioning engine positions the camera according to the information of the feature points corresponding to a series of landmark points observed in the image frame sequence, acquired by the camera, that contains those landmark points.
In some embodiments of the present application, a feature point corresponding to a landmark point is the image of that landmark point in an image frame, and the information of the landmark point's feature point based on a specified image frame includes: the plane position coordinates of the feature point in the specified image frame and an inverse depth value, where the inverse depth value is the reciprocal of the depth of the landmark point relative to the camera position.
For example, if a landmark point p is observed in the ith image frame, denote its image coordinate point in that frame by qi. The point qi is first back-projected onto the unit plane (i.e., the plane at depth 1) to obtain the point qi', i.e., the feature point of the landmark point p, whose coordinates can be expressed as [x, y, 1]^T, where x and y are the plane position coordinates of the feature point and the superscript T denotes transpose. If the inverse depth of the landmark point p is λ, the landmark point corresponding to the feature point can be represented as [x/λ, y/λ, 1/λ]^T.
According to the above method, an expression for the coordinates of the landmark point can be established; that is, the spatial position of the landmark point can be determined from the inverse depth λ and the camera pose corresponding to the ith image frame.
In the process of visual positioning, the camera collects image frames containing the landmark points in real time and stores them into the image frame sequence, and the information of the feature points corresponding to the landmark points in the newly acquired current image frame is determined in real time according to the above method.
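As a hedged illustration of this parameterization, the sketch below back-projects a pixel onto the unit plane and lifts it back to its 3D position from an inverse depth value. The intrinsic matrix K and its values are assumptions made for the example; the patent does not specify the camera model's parameters.

```python
import numpy as np

# Hypothetical pinhole intrinsics; not given in the patent.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

def back_project_to_unit_plane(q, K):
    """Back-project a pixel q = (u, v) onto the unit-depth plane, giving the
    feature point qi' = [x, y, 1]^T of the landmark in this frame."""
    u, v = q
    x = (u - K[0, 2]) / K[0, 0]
    y = (v - K[1, 2]) / K[1, 1]
    return np.array([x, y, 1.0])

def landmark_in_camera(q_unit, lam):
    """Lift the feature point back to its true depth: with inverse depth λ,
    the landmark in camera coordinates is [x/λ, y/λ, 1/λ]^T."""
    return q_unit / lam

q_unit = back_project_to_unit_plane((400.0, 300.0), K)
print(landmark_in_camera(q_unit, lam=0.25))   # landmark at depth 1/0.25 = 4
```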
Step 120, determining whether the current image frame is a key frame.
In the process of capturing image frames by the camera, a key frame usually occurs in every certain number of image frames. Specifically, whether the currently captured image frame is a key frame may be determined according to the definition of the image capturing system or according to the video transmission protocol.
Step 130, in response to the current image frame being a key frame, setting the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous key frame as the current image frame.
In general, in order to ensure that the computational complexity does not increase with the number of camera poses and to save storage space while maintaining positioning accuracy, an upper length limit is set for the image frame sequence used for visual positioning. When the number of image frames in the sequence reaches the set upper limit, the image frames in the sequence are marginalized in first-in-first-out order, that is, the earliest acquired image frame is deleted. Specifically, for example, the marginalization upper threshold may be set to 20; that is, the preset visual positioning engine positions the camera according to the landmark points observed in the 20 most recently acquired image frames.
In the process of performing visual positioning, after a camera acquires image frames containing a landmark point in real time and stores the image frames into the image frame sequence, if the length of the image frame sequence reaches the marginalization processing upper limit threshold, the image frame sequence needs to be maintained, for example, marginalization processing is performed to delete the earliest acquired image frame.
Then, the preset visual positioning engine iteratively solves for the parameters to be estimated by solving the optimization problem, obtaining the position of the camera at the specified time and thereby positioning the camera.
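A minimal sketch of the sliding-window maintenance just described, assuming a simple first-in-first-out policy with the example threshold of 20; class and method names are illustrative, not from the patent.

```python
from collections import deque

MARGINALIZATION_THRESHOLD = 20   # window length from the example above

class SlidingWindow:
    """Maintains the most recent image frames; when the window is full, the
    earliest acquired frame is marginalized (first-in first-out)."""

    def __init__(self, size=MARGINALIZATION_THRESHOLD):
        self.frames = deque()
        self.size = size

    def add(self, frame):
        """Insert a new frame; return the marginalized frame, if any, so the
        caller can re-anchor landmark feature points that referenced it."""
        self.frames.append(frame)
        if len(self.frames) > self.size:
            return self.frames.popleft()   # oldest frame leaves the window
        return None
```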
In the prior art, when performing visual positioning, the feature point information of each landmark point observed in the image frame sequence is represented based on the first image frame in which that landmark point was observed. As can be seen from the foregoing description, when the image frame acquired at time C0 is marginalized, since the landmark point L1 depends on that frame (each of its visual residuals is related to the image frame acquired at time C0), the landmark point L1 must be marginalized at the same time to ensure that the optimized residuals can be retained. This leads to the situation that a well-tracked landmark point cannot always provide good constraints: in the image frame acquired at time C0, the landmark point L1 is tracked well, yet it must be marginalized, losing the opportunity to update the landmark point L1 as well as the other constraints between the poses associated with it.
In some embodiments of the present application, in order to improve the accuracy of visual positioning, the feature points of landmark points in a good tracking state are retained as much as possible. In the process of collecting image frames, for the feature points of a newly added landmark point that is first observed in an image frame between two keyframes and is also observed in a later keyframe, the reference frame of those feature points is set to the later keyframe, i.e., the latest keyframe in which the feature points of the landmark point are observed.
In some embodiments of the present application, after determining the landmark points observed in the current image frame, in response to the current image frame being a key frame, setting the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous key frame as the current image frame includes: determining the landmark points that were not observed in the previous image frame but are observed in the current image frame as first newly added landmark points; setting the reference frame of the feature points corresponding to the first newly added landmark points as the current image frame; determining the landmark points that were first observed in an image frame acquired between the previous keyframe and the current image frame and are observed in the current image frame as second newly added landmark points; and converting the reference frame of the feature points corresponding to the second newly added landmark points to the current image frame.
The feature points of newly added landmark points thus include the feature points of landmark points observed for the first time in the current image frame, and the feature points of landmark points first observed in an image frame acquired after the latest keyframe and before the current image frame that are also observed (i.e., co-visible) in the current image frame.
For a feature point of a landmark point first observed in the current image frame, its reference frame is directly set to the current image frame (i.e., the current keyframe). For a feature point of a landmark point first observed in an ordinary image frame, its reference frame was already set, when it was first observed, to the image frame observing it, and therefore needs to be converted to the current image frame (i.e., the current keyframe).
Take an image frame sequence for visual positioning containing 20 image frames as an example. Suppose the current time is t and the image frame acquired by the camera at the current time is a keyframe, recorded as the current keyframe; suppose the image frame acquired by the camera at time t-n (where n is an integer greater than 2) is a keyframe, and at times t-n+1 to t-1 the camera acquires n-1 image frames in total. If landmark points L1 and L2 are observed in the keyframe acquired at time t-n, landmark points L1 and L2 are also observed in the image frames acquired at times t-n+1 to t-2, and landmark points L1, L2, and L3 are observed in the image frames acquired at time t-1 and the current time, then the landmark point L3 is a newly added landmark point. For a newly added landmark point, the positioning system records the first image frame observing it as the reference frame of its corresponding feature points. Specifically, in this example, after the image frame acquired by the camera at time t-1 is obtained, it is recorded that a new landmark point L3 is observed in that frame, and the reference frame of the feature point of the landmark point L3 is set to the image frame acquired at time t-1.
If the currently acquired image frame is a keyframe, the reference frame of the feature point of the landmark point L3 is then converted to the current image frame, that is, the image frame acquired at time t, which is a keyframe.
Alternatively, suppose the image frame acquired by the camera at time t-n (where n is an integer greater than 2) is a keyframe, and at times t-n+1 to t-1 the camera acquires n-1 image frames in total. If landmark points L1 and L2 are observed in the keyframe acquired at time t-n, landmark points L1 and L2 are also observed in the image frames acquired at times t-n+1 to t-1, and landmark points L1, L2, and L3 are all observed in the image frame acquired at the current time (i.e., time t), then the landmark point L3 is a newly added landmark point. For this newly added landmark point, the positioning system records the first image frame observing it as the reference frame of its corresponding feature points; that is, the image frame collected at the current time (time t) is set as the reference frame of the feature point of the newly added landmark point L3.
By mapping the landmark points observed in each collected image frame to the most recently collected keyframe in this way, and carrying the landmark point information through keyframes, the situation is avoided in which the image frame observing a landmark point is marginalized during marginalization processing, the landmark point information is lost, and the positioning constraints corresponding to the landmark point are lost with it.
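The keyframe-time bookkeeping of step 130 could be sketched as follows, under the assumption of a simple dictionary-based track table; all names are illustrative, and the re-expression of the inverse depth value and its variance is covered by the conversion function described below.

```python
def on_keyframe(current, prev_frame, intermediate_frames, tracks):
    """Reference-frame bookkeeping when the current frame is a keyframe
    (a sketch of step 130; names are illustrative, not from the patent).
    tracks: landmark id -> {'ref': frame id, 'obs': set of frame ids}.
    intermediate_frames: ordinary frames acquired since the last keyframe.
    Assumes the front end has already recorded the current observations."""
    for rec in tracks.values():
        if current not in rec["obs"]:
            continue                    # landmark not seen in current frame
        if rec["ref"] in intermediate_frames:
            # Second newly added landmark point: first observed in an ordinary
            # frame between the previous keyframe and now. Its reference frame
            # is converted to the current keyframe; the inverse depth value
            # and its variance must be re-expressed accordingly (see the
            # conversion function below).
            rec["ref"] = current
        elif prev_frame not in rec["obs"]:
            # First newly added landmark point: not observed in the previous
            # image frame, observed in the current one.
            rec["ref"] = current
```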
When the reference frame of the feature points of a landmark point is converted to another image frame, the inverse depth value of those feature points and the variance of that inverse depth value need to be updated at the same time, so that the preset visual positioning engine can position the camera based on the updated feature point information.
In some embodiments of the present application, setting the reference frame of the feature points corresponding to a newly added landmark point as the current image frame includes: taking the current reference frame of those feature points as the first reference frame and the current image frame as the second reference frame, and updating the feature points from an inverse depth value based on the first reference frame to an inverse depth value based on the second reference frame.
Taking the reference frame switching diagram shown in FIG. 3 as an example, P is a landmark point whose three-dimensional position can be expressed as [x/λ, y/λ, 1/λ]^T; the image frame C0 observing the landmark point is acquired on the left, and the image frame C1 observing it on the right. The visual residual e of the landmark point between the two image frames, i.e., the corresponding parameter term in the optimization problem, can be represented by the projection formula as:

e = p2′ − p2 = p2′ − K · [x/λ, y/λ, 1/λ]^T

where K is the attitude rotation matrix of the camera, p1 is the plane position coordinate of the landmark point p observed in image frame C0, p2 is the plane position coordinate obtained by mapping p1 into image frame C1, and p2′ is the plane position coordinate of the landmark point p observed in image frame C1. It follows that the visual residual can be represented by the inverse depth value, where the plane position coordinates correspond to the pose of the camera.
In some embodiments of the present application, updating the feature points of a landmark point from an inverse depth value based on the first reference frame to an inverse depth value based on the second reference frame includes: converting the feature points from the inverse depth value based on the first reference frame to an inverse depth value based on the second reference frame through a preset inverse depth conversion function, where the preset inverse depth conversion function performs the following operations: determining the spatial position information of the feature points based on the second reference frame according to their spatial position based on the first reference frame, the camera attitude rotation matrix corresponding to the first reference frame, and the visual residual of the feature points between the first reference frame and the second reference frame; and determining the inverse depth value of the feature points based on the second reference frame according to that spatial position information and the camera attitude rotation matrix corresponding to the second reference frame.
In some embodiments of the present application, the preset inverse depth conversion function may be expressed as:
λ′ = 1 / [ ( R1^T ( R0 · [x/λ, y/λ, 1/λ]^T + p0 − p1 ) ).z() ];
wherein [ x/λ, y/λ,1/λ]TThe landmark corresponding feature points are based on the spatial position of the first reference frame, λ is the inverse depth value of the landmark corresponding feature points based on the first reference frame, R0 is the camera pose rotation matrix corresponding to the first reference frame, R1 is the camera pose rotation matrix corresponding to the second reference frame, P0 is the camera pose corresponding to the first reference frame, P1 is the camera pose corresponding to the second reference frame, T is the transpose matrix, and z () is a function for taking the depth value. Wherein p0-p1 is a visual residual of the landmark corresponding feature points between the first reference frame and the second reference frame.
In some embodiments of the present application, after updating the feature points of a landmark point from an inverse depth value based on the first reference frame to an inverse depth value based on the second reference frame, the method further includes: determining the variance of the feature points' inverse depth value based on the second reference frame according to the variance of their inverse depth value based on the first reference frame and the Jacobian matrix of the preset inverse depth conversion function with respect to the inverse depth value; and updating the variance of the feature points of the landmark point in the parameters of the preset visual positioning engine with the variance based on the second reference frame.
As can be seen from the above positioning problem, the variance of the inverse depth value is also a parameter of the preset visual positioning engine; therefore, after the inverse depth value is updated, its variance is updated as well, for example by the formula v′ = J(λ) · v · J(λ)^T, where λ is the inverse depth value of the landmark point's feature point based on the first reference frame, v is the variance of the inverse depth value in the first reference frame, v′ is the variance of the inverse depth value in the second reference frame, J(λ) is the partial derivative of the preset inverse depth conversion function with respect to the inverse depth value (i.e., a Jacobian matrix), and the superscript T denotes transpose.
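Building on the conversion sketch above, the variance update could look like the following; the patent's J(λ) is the analytic Jacobian of the conversion function, while this illustration estimates it by a central difference for brevity.

```python
def propagate_inverse_depth_variance(q_unit, lam, var, R0, p0, R1, p1,
                                     eps=1e-6):
    """Propagate the variance of the inverse depth through the conversion
    function: v' = J(λ) v J(λ)^T. For a scalar inverse depth, J reduces to
    dλ'/dλ, estimated here by a central difference."""
    f = lambda l: convert_inverse_depth(q_unit, l, R0, p0, R1, p1)
    J = (f(lam + eps) - f(lam - eps)) / (2.0 * eps)
    return J * var * J
```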
Step 140, in response to the current image frame not being a key frame, setting the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous image frame as the current image frame.
In some embodiments of the present application, after determining the landmark points observed in the current image frame, in response to the current image frame not being a key frame, setting the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous image frame as the current image frame includes: determining the feature points of the landmark points that were not observed in the previous image frame but are observed in the current image frame as the feature points of newly added landmark points; and setting the reference frame of those feature points as the current image frame. That is, for the feature points of landmark points observed for the first time in the current image frame, the reference frame is directly set as the current image frame.
In some embodiments of the present application, the feature point information of each landmark point observed in each image frame acquired during positioning may be stored in a feature point tracking list. For example, the list may store the correspondence between each image frame observing a landmark point (identified by, e.g., the acquisition timestamp and a category flag indicating whether it is a keyframe) and that landmark point, the feature point information of the landmark point based on the corresponding image frame (such as the plane position coordinates and the inverse depth value), and other information used for positioning (such as the variance of the inverse depth value). When judging whether a newly added landmark point is observed in an image frame, the lost or added landmark points can be determined from the correspondence between adjacent image frames and the landmark points, as in the sketch below.
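One possible shape for such a tracking list, assuming Python dataclasses; every field name here is illustrative rather than prescribed by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    """One entry of the feature point tracking list."""
    landmark_id: int
    ref_frame: int        # current reference frame of the feature point
    inv_depth: float      # inverse depth value λ, expressed in ref_frame
    inv_depth_var: float  # variance of the inverse depth value
    obs: dict = field(default_factory=dict)  # frame id -> plane coordinates

@dataclass
class FrameInfo:
    """Identification of an image frame in the sequence."""
    frame_id: int
    timestamp: float
    is_keyframe: bool

tracks: dict[int, Track] = {}   # landmark id -> its tracking entry
```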
Step 150, updating the information of the feature points corresponding to the landmark points observed in the image frame sequence according to the information of the feature points corresponding to the landmark points observed in the current image frame, so that the preset visual positioning engine positions the camera based on the updated information of the feature points corresponding to the landmark points.
Through the above steps, the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame can be set. Moreover, if the current image frame is a keyframe, for the feature points of landmark points that were first observed in other image frames and are co-visible in the current image frame, their reference frame is converted to the current image frame. That is, the current image frame can carry the information of the feature points of landmark points in a good tracking state. Next, the current image frame is stored into the image frame sequence, and at the same time the feature point information of the landmark points observed in each image frame in the sequence is updated.
As described above, in order to ensure that the computational complexity does not increase with the number of camera poses and to save storage space, an upper length limit is set for the image frame sequence used for visual positioning, and when the number of image frames in the sequence reaches the set upper limit, the image frames are marginalized in first-in-first-out order, that is, the earliest acquired frame is deleted. Therefore, when inserting the current image frame into the image frame sequence, it is necessary to determine whether the length of the sequence reaches the marginalization upper threshold, and if it does, to maintain the sequence and the landmark feature point information corresponding to each image frame in it.
In some embodiments of the present application, updating the information of the landmark points observed in the image frame sequence according to the information of the landmark points observed in the current image frame includes: adding the current image frame to the image frame sequence; in response to the length of the image frame sequence reaching the marginalization upper threshold, determining the image frame of the sequence to be marginalized; and converting the reference frame of the feature points that use the marginalized image frame as their reference frame to other image frames in the sequence, where the other image frames are the image frames in the sequence other than the marginalized one.
After the reference frame setting based on the current image frame, the acquired current image frame is finally added to the image frame sequence used for visual positioning. It is then further judged whether the length of the sequence, after adding the current image frame, reaches the set length threshold; if so, the sequence is optimized, for example by marginalization.
In some embodiments of the present application, the image frame to be marginalized is usually the one with the earliest acquisition time in the image frame sequence; therefore, it can be determined from the timestamps of the image frames in the sequence.
For the marginalized image frame, the positioning process uses the feature point information of the landmark points first observed in that frame, that is, the information of the landmark feature points that use the frame as their reference frame. To retain the positioning constraints of those landmark points, their feature point information must be converted to use other image frames as reference frames and recalculated.
In specific implementation, the update of the feature point information of a landmark point is performed according to the type attribute of the marginalized image frame and the tracking state of the feature points that use the marginalized frame as their reference frame. Several methods for updating the feature point information are described below.
In some embodiments of the present application, converting the reference frame of the feature points that use the marginalized image frame as their reference frame to other image frames in the image frame sequence includes: for each feature point of a landmark point that uses the marginalized image frame as its reference frame, determining whether the other image frames in the sequence include a keyframe in which the landmark point is observed; and, in response to the other image frames including such a keyframe, ending the operation of updating the information of that landmark point's feature points in the image frame sequence.
Take as an example a marginalized image frame acquired at time t-n in which the landmark point L1 was observed, i.e., the marginalized frame is the reference frame of the feature point p1 of the landmark point L1. For this feature point, first determine whether any keyframe among the image frames acquired from time t-n+1 to time t co-observes the landmark point L1 (for example, whether the reference frame of the feature point p1 is already a keyframe among them). If such a keyframe exists, then by the reference frame update performed when that keyframe was acquired as the current image frame, the reference frame of the feature points that used the marginalized frame has already been converted to the most recently acquired keyframe; therefore, in this step, no operation needs to be performed on the feature point information that used the marginalized image frame as its reference frame.
In some embodiments of the present application, the step of determining whether the current image frame is a keyframe may not be executed when the current image frame is acquired. Correspondingly, during marginalization processing, converting the reference frame of the feature points that use the marginalized image frame as their reference frame to other image frames in the image frame sequence includes: for each feature point of a landmark point that uses the marginalized image frame as its reference frame, determining whether the other image frames in the sequence include a keyframe in which the landmark point is observed; in response to the other image frames including such a keyframe, determining the keyframe with the earliest acquisition time among those keyframes as the second reference frame and taking the marginalized image frame as the first reference frame; and updating the feature points of the landmark point from an inverse depth value based on the first reference frame to an inverse depth value based on the second reference frame.
Still taking the current time as t and the marginalized image frame as the one acquired at time t-n, suppose the landmark point L1 was observed in the marginalized frame, i.e., the marginalized frame is the reference frame of the feature point p1 of the landmark point L1. For this feature point, first determine the keyframes among the image frames acquired from time t-n+1 to time t that observe the landmark point L1. If there is a keyframe that co-observes the landmark point L1 with the marginalized frame, then the keyframe closest in acquisition time to the marginalized frame, i.e., the earliest such keyframe, serves as the new reference frame for the feature point of the landmark point L1.
When modifying the reference frame of the feature points of a landmark point, the reference frame with the earlier acquisition time (e.g., the marginalized image frame) is taken as the first reference frame and the determined keyframe as the second reference frame, and the inverse depth value of the feature points based on the first reference frame is updated to an inverse depth value based on the second reference frame, thereby updating the inverse depth parameter used by the preset visual positioning engine.
In some embodiments of the present application, converting the reference frame of the feature points that use the marginalized image frame as their reference frame to other image frames in the image frame sequence includes: for each feature point of a landmark point that uses the marginalized image frame as its reference frame, determining whether the other image frames in the sequence include a keyframe in which the landmark point is observed; in response to the other image frames not including such a keyframe, determining the image frame with the earliest acquisition time among the other image frames observing the landmark point as the second reference frame and taking the marginalized image frame as the first reference frame; and updating the feature points of the landmark point from an inverse depth value based on the first reference frame to an inverse depth value based on the second reference frame.
Taking the current time as t and the marginalized image frame as the one acquired at time t-n, suppose the landmark point L1 was observed in the marginalized frame, i.e., the marginalized frame is the reference frame of the feature point p1 of the landmark point L1. For this feature point, first determine whether any keyframe among the image frames acquired from time t-n+1 to time t co-observes the landmark point L1. If there is no such keyframe, further determine the ordinary image frames acquired from time t-n+1 to time t that co-observe the landmark point L1 with the marginalized frame. If such an image frame exists, the image frame closest in acquisition time to the marginalized frame that co-observes the landmark point L1 serves as the new reference frame for the feature point of the landmark point L1.
When modifying the reference frame of the feature points of a landmark point, the reference frame with the earlier acquisition time (e.g., the marginalized image frame) is taken as the first reference frame, the reference frame with the later acquisition time (e.g., the image frame closest in acquisition time to the marginalized frame that co-observes the landmark point) as the second reference frame, and the inverse depth value of the feature points based on the first reference frame is updated to an inverse depth value based on the second reference frame, thereby updating the inverse depth parameter used by the preset visual positioning engine.
In some embodiments of the present application, converting the reference frame of the feature points that use the marginalized image frame as their reference frame to other image frames in the image frame sequence includes: for each feature point of a landmark point that uses the marginalized image frame as its reference frame, determining whether the other image frames in the sequence include an image frame in which the landmark point is observed; and, in response to the other image frames not including any image frame in which the landmark point is observed, deleting the feature points corresponding to that landmark point.
Taking the current time as t and the marginalized image frame as the one acquired at time t-n, suppose the landmark point L1 was observed in the marginalized frame, i.e., the marginalized frame is the reference frame of the feature point p1 of the landmark point L1. For this feature point, first determine the image frames acquired from time t-n+1 to time t that co-observe the landmark point L1 with the marginalized frame. If there is no image frame that co-observes the landmark point L1, the landmark point L1 has been lost, and its corresponding feature points are deleted. The different re-referencing cases are summarized in the sketch below.
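The marginalization-time re-referencing cases described above could be combined as in the following sketch, which follows the variant in which no keyframe-time conversion was performed (prefer the earliest co-visible keyframe, then the closest co-visible ordinary frame, else delete); it builds on the Track and FrameInfo sketches above, and all names are illustrative.

```python
def re_reference_on_marginalization(marg_id, window, tracks):
    """Re-anchor every track whose reference frame is the marginalized frame.
    window: the remaining FrameInfo objects in acquisition order.
    tracks: landmark id -> Track (as sketched earlier)."""
    for lm_id in list(tracks):
        rec = tracks[lm_id]
        if rec.ref_frame != marg_id:
            continue
        co_visible = [f for f in window if f.frame_id in rec.obs]
        keyframes = [f for f in co_visible if f.is_keyframe]
        if keyframes:
            new_ref = keyframes[0]     # earliest co-visible keyframe
        elif co_visible:
            new_ref = co_visible[0]    # ordinary frame closest in time to
                                       # the marginalized frame (the earliest)
        else:
            del tracks[lm_id]          # landmark no longer observed: delete
            continue
        # λ and its variance must be re-expressed in the new reference frame
        # via convert_inverse_depth / propagate_inverse_depth_variance above.
        rec.ref_frame = new_ref.frame_id
```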
For the specific implementation of updating the feature points of a landmark point from an inverse depth value based on the first reference frame to an inverse depth value based on the second reference frame during marginalization processing, see the foregoing steps; the details are not repeated here.
The specific way of performing visual positioning based on the updated feature point information of the landmark points is prior art and is not repeated in the embodiments of the present application.
In the visual positioning method based on inverse depth parameters of the embodiments of the present application, in the process of performing visual positioning based on inverse depth parameters, in response to acquiring a current image frame captured by a camera, the feature points corresponding to the landmark points observed in the current image frame are determined; whether the current image frame is a key frame is determined; in response to the current image frame being a key frame, the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous key frame is set as the current image frame; in response to the current image frame not being a key frame, the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous image frame is set as the current image frame; and the information of the feature points corresponding to the landmark points observed in the image frame sequence is updated according to the information of the feature points corresponding to the landmark points observed in the current image frame, so that the preset visual positioning engine positions the camera based on the updated information of the feature points corresponding to the landmark points, which improves the accuracy of visual positioning.
According to the visual positioning method based on inverse depth parameters of the embodiments of the present application, when a keyframe is collected, the reference frame of the feature points of a landmark point is converted from the image frame in which the landmark point was first observed to a subsequently collected keyframe or ordinary image frame having a co-visibility relationship with it. In this way, when the positioning system optimizes the landmark point data for positioning, the feature point information of landmark points with a good tracking state can be retained, the constraints provided by the landmark points are used to the maximum extent, and the positioning accuracy is improved.
On the other hand, in the embodiments of the present application, the landmark points newly observed in non-keyframes are also tracked and updated, which can further improve the positioning accuracy.
Example two
An embodiment of the present application discloses a visual positioning device based on inverse depth parameters. As shown in FIG. 4, the device includes:
a current image frame acquiring module 410, configured to determine, in the process of performing visual positioning based on inverse depth parameters and in response to acquiring a current image frame captured by a camera, the feature points corresponding to the landmark points observed in the current image frame;
a key frame determining module 420, configured to determine whether the current image frame is a key frame;
a first reference frame setting module 430, configured to set, in response to the current image frame being a key frame, the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous key frame as the current image frame;
a second reference frame setting module 440, configured to set, in response to the current image frame not being a key frame, the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous image frame as the current image frame;
a feature point information updating module 450, configured to update the information of the feature points corresponding to the landmark points observed in the image frame sequence according to the information of the feature points corresponding to the landmark points observed in the current image frame, so that the preset visual positioning engine positions the camera based on the updated information of the feature points corresponding to the landmark points.
In some embodiments of the present application, the feature point information updating module 450 is further configured to:
adding the current image frame to the sequence of image frames;
in response to a length of the image frame sequence reaching an marginalization processing upper threshold, determining a marginalized image frame of the image frame sequence;
and converting the reference frame of the feature points that use the marginalized image frame as their reference frame to other image frames in the image frame sequence, where the other image frames are the image frames in the sequence other than the marginalized image frame.
In some embodiments of the present application, converting the reference frame of the feature points that use the marginalized image frame as their reference frame to other image frames in the image frame sequence includes:
for each feature point of a landmark point that uses the marginalized image frame as its reference frame, determining whether the other image frames in the image frame sequence include a keyframe in which the landmark point is observed;
and, in response to the other image frames including a keyframe in which the landmark point is observed, ending the operation of updating the information of the feature points corresponding to that landmark point in the image frame sequence.
In some embodiments of the present application, converting the reference frame of the feature points that use the marginalized image frame as their reference frame to other image frames in the image frame sequence includes:
for each feature point of a landmark point that uses the marginalized image frame as its reference frame, determining whether the other image frames in the image frame sequence include a keyframe in which the landmark point is observed;
in response to the other image frames not including a keyframe in which the landmark point is observed, determining the image frame with the earliest acquisition time among the other image frames observing the landmark point as the second reference frame, and taking the marginalized image frame as the first reference frame;
and updating the feature points of the landmark point from an inverse depth value based on the first reference frame to an inverse depth value based on the second reference frame.
In some embodiments of the present application, converting the reference frame of the feature points corresponding to the landmark points that take the marginalized image frame as their reference frame to another image frame in the image frame sequence includes:
for each feature point corresponding to a landmark point whose reference frame is the marginalized image frame, determining whether the other image frames in the image frame sequence include an image frame in which the landmark point is observed;
and in response to the other image frames not including any image frame in which the landmark point is observed, deleting the feature points corresponding to the landmark point.
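The three cases above can be combined into a single re-anchoring routine. The sketch below continues the earlier Python types; convert_inverse_depth is the conversion function described in the following paragraphs and is sketched after them, and it additionally needs the camera poses of the two frames:

def reanchor_features(marginalized: Frame, window: List[Frame],
                      features: Dict[int, Feature]) -> None:
    others = [f for f in window if f is not marginalized]
    for lid, feat in list(features.items()):
        if feat.reference_frame is not marginalized:
            continue
        observers = [f for f in others if lid in f.observed]
        if any(f.is_keyframe for f in observers):
            # Case 1: a key frame already observes the landmark point, so no
            # further update of this feature's information is performed.
            continue
        if observers:
            # Case 2: no key frame observes it; re-anchor to the observer
            # with the earliest acquisition time (the second reference frame)
            # and convert the inverse depth from the first reference frame.
            second_ref = min(observers, key=lambda f: f.fid)
            feat.inverse_depth = convert_inverse_depth(
                lid, feat.inverse_depth, marginalized, second_ref)
            feat.reference_frame = second_ref
        else:
            # Case 3: the landmark point is no longer observed anywhere in
            # the sequence, so its corresponding feature point is deleted.
            del features[lid]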
In some embodiments of the present application, updating the feature points corresponding to the landmark points from inverse depth values based on the first reference frame to inverse depth values based on the second reference frame includes:
converting the feature points corresponding to the landmark points from inverse depth values based on the first reference frame to inverse depth values based on the second reference frame through a preset inverse depth conversion function, where the preset inverse depth conversion function performs the following operations:
determining the spatial position information of the feature points corresponding to the landmark points based on the second reference frame according to the spatial position of the feature points based on the first reference frame, the camera attitude rotation matrix corresponding to the first reference frame, and the visual residual of the feature points corresponding to the landmark points between the first reference frame and the second reference frame;
and determining the inverse depth values of the feature points corresponding to the landmark points based on the second reference frame according to the spatial position information of the feature points based on the second reference frame and the camera attitude rotation matrix corresponding to the second reference frame.
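Under common visual SLAM conventions this conversion can be written out explicitly. The sketch below goes beyond what the claims state by assuming a pinhole model in which each reference frame carries a camera-to-world attitude rotation matrix R, a camera position t, and the normalized image coordinates (x, y, 1) of its observations; all names are illustrative:

import numpy as np
from dataclasses import dataclass
from typing import Dict

@dataclass
class PosedFrame:
    R: np.ndarray                  # camera-to-world attitude rotation matrix (3x3)
    t: np.ndarray                  # camera position in the world frame (3,)
    obs: Dict[int, np.ndarray]     # landmark id -> normalized coords (x, y, 1)

def convert_inverse_depth(lid: int, inv_depth_1: float,
                          first_ref: PosedFrame,
                          second_ref: PosedFrame) -> float:
    # Spatial position based on the first reference frame: back-project the
    # normalized observation by the depth 1 / inverse depth.
    p_cam1 = first_ref.obs[lid] / inv_depth_1
    # The camera attitude rotation matrix and position of the first
    # reference frame take the point into the world frame.
    p_world = first_ref.R @ p_cam1 + first_ref.t
    # Express the point in the second reference frame's camera coordinates
    # using that frame's camera attitude rotation matrix.
    p_cam2 = second_ref.R.T @ (p_world - second_ref.t)
    # The inverse depth based on the second reference frame is the
    # reciprocal of the depth along its optical axis.
    return 1.0 / p_cam2[2]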
In some embodiments of the present application, after updating the feature points corresponding to the landmark points from inverse depth values based on the first reference frame to inverse depth values based on the second reference frame, the method further includes:
determining the variance of the feature points corresponding to the landmark points based on the inverse depth values of the second reference frame according to the variance of the feature points based on the inverse depth values of the first reference frame and the Jacobian matrix of the preset inverse depth conversion function with respect to the inverse depth value;
and updating the variance of the feature points corresponding to the landmark points in the preset visual positioning engine parameters with the variance based on the inverse depth values of the second reference frame.
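Because the inverse depth is a scalar, the Jacobian of the conversion function reduces to the single derivative d(rho_2)/d(rho_1), and the first-order propagation is var_2 = J * var_1 * J. The sketch below approximates that derivative numerically purely for illustration; a closed-form Jacobian is equally possible:

def propagate_variance(lid: int, inv_depth_1: float, var_1: float,
                       first_ref: PosedFrame, second_ref: PosedFrame,
                       eps: float = 1e-8) -> float:
    # Jacobian of the preset inverse depth conversion function with respect
    # to the inverse depth value, approximated by a forward difference.
    rho2 = convert_inverse_depth(lid, inv_depth_1, first_ref, second_ref)
    rho2_eps = convert_inverse_depth(lid, inv_depth_1 + eps,
                                     first_ref, second_ref)
    jac = (rho2_eps - rho2) / eps
    # Linearized propagation of the variance to the second reference frame.
    return jac * var_1 * jac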
The visual positioning device based on the inverse depth parameter disclosed in the embodiment of the present application is used to implement the visual positioning method based on the inverse depth parameter described in the first embodiment of the present application. The specific implementation of each module of the device is not described again here; reference may be made to the specific implementation of the corresponding steps in the method embodiment.
In the visual positioning device based on the inverse depth parameter disclosed in the embodiment of the present application, in the process of performing visual positioning based on the inverse depth parameter, in response to receiving a current image frame captured by the camera, the feature points corresponding to the landmark points observed in the current image frame are determined; whether the current image frame is a key frame is determined; in response to the current image frame being a key frame, the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous key frame is set as the current image frame; in response to the current image frame not being a key frame, the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous image frame is set as the current key frame; and the information of the feature points corresponding to the landmark points observed in the image frame sequence is updated according to the information of the feature points corresponding to the landmark points observed in the current image frame, so that the preset visual positioning engine positions the camera based on the updated information of the feature points corresponding to the landmark points, thereby improving the accuracy of visual positioning.
The visual positioning device based on the inverse depth parameter disclosed in the embodiment of the present application converts, when a key frame is collected, the reference frame of the feature points corresponding to a landmark point from the image frame in which the landmark point was first observed to a subsequently collected key frame, or an ordinary image frame, that shares a common-view relationship with it. Therefore, when the positioning system optimizes the landmark point data for positioning, the information of the feature points corresponding to landmark points in a good tracking state is retained, the constraints provided by the landmark points are utilized to the fullest, and the positioning accuracy is improved.
In addition, in the embodiment of the present application, the newly added landmark points observed in non-key frames are also tracked and updated, which further improves the positioning accuracy.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts, the embodiments may be referred to one another. Since the device embodiment is substantially similar to the method embodiment, its description is brief; for the relevant points, refer to the description in the method embodiment.
The foregoing is a detailed description of a visual positioning method and device based on an inverse depth parameter. Specific examples are used herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is intended only to help understand the method and its core idea. Meanwhile, for a person of ordinary skill in the art, there may be variations in the specific implementation and the scope of application according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.
The above-described device embodiments are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without inventive effort.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in an electronic device according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 5 shows an electronic device that may implement the method according to the present application. The electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, or the like. The electronic device conventionally comprises a processor 510, a memory 520, and program code 530 stored on the memory 520 and executable on the processor 510; the processor 510 implements the method described in the above embodiments when executing the program code 530. The memory 520 may be a computer program product or a computer readable medium. The memory 520 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 520 has a storage space 5201 for the program code 530 of the computer program for performing any of the method steps of the above-described method. For example, the storage space 5201 for the program code 530 may include respective computer programs for implementing the respective steps of the above methods. The program code 530 is computer readable code. The computer programs may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a compact disc (CD), a memory card, or a floppy disk. The computer program comprises computer readable code which, when run on an electronic device, causes the electronic device to perform the method according to the above embodiments.
The embodiment of the present application also discloses a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the visual positioning method based on the inverse depth parameter according to the first embodiment of the present application.
Such a computer program product may be a computer-readable storage medium having memory segments, memory spaces, etc. arranged similarly to the memory 520 in the electronic device shown in fig. 5. The program code may be stored in the computer readable storage medium, for example, compressed in a suitable form. The computer readable storage medium is typically a portable or fixed storage unit as described with reference to fig. 6. Typically, the storage unit comprises computer readable code 530', i.e., code that can be read by a processor; when executed by the processor, the code performs the steps of the method described above.
Reference herein to "one embodiment," "an embodiment," or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Moreover, it is noted that instances of the phrase "in one embodiment" are not necessarily all referring to the same embodiment.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A visual positioning method based on an inverse depth parameter is characterized by comprising the following steps:
in the process of performing visual positioning based on the inverse depth parameter, in response to receiving a current image frame captured by a camera, determining the feature points corresponding to the landmark points observed in the current image frame;
determining whether the current image frame is a key frame;
in response to the current image frame being a key frame, setting the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous key frame as the current image frame;
in response to the current image frame not being a key frame, setting the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous image frame as the current key frame;
and updating the information of the feature points corresponding to the landmark points observed in the image frame sequence according to the information of the feature points corresponding to the landmark points observed in the current image frame, so that a preset visual positioning engine positions the camera based on the updated information of the feature points corresponding to the landmark points.
2. The method of claim 1, wherein the step of updating the information of the feature points corresponding to the landmark points observed in the image frame sequence according to the information of the feature points corresponding to the landmark points observed in the current image frame comprises:
adding the current image frame to the image frame sequence;
in response to the length of the image frame sequence reaching a marginalization processing upper threshold, determining the marginalized image frame of the image frame sequence;
and converting the reference frame of the feature points corresponding to the landmark points that take the marginalized image frame as their reference frame to another image frame in the image frame sequence, where the other image frames refer to the image frames in the image frame sequence other than the marginalized image frame.
3. The method of claim 2, wherein the step of converting the reference frame of the feature points corresponding to the landmark points that take the marginalized image frame as their reference frame to another image frame in the image frame sequence comprises:
for each feature point corresponding to a landmark point whose reference frame is the marginalized image frame, determining whether the other image frames in the image frame sequence include a key frame in which the landmark point is observed;
and in response to the other image frames including a key frame in which the landmark point is observed, ending the operation of updating the information of the feature points corresponding to the landmark points observed in the image frame sequence.
4. The method of claim 2, wherein the step of converting the reference frame of the feature points corresponding to the landmark points that take the marginalized image frame as their reference frame to another image frame in the image frame sequence comprises:
for each feature point corresponding to a landmark point whose reference frame is the marginalized image frame, determining whether the other image frames in the image frame sequence include a key frame in which the landmark point is observed;
in response to the other image frames not including a key frame in which the landmark point is observed, determining the image frame with the earliest acquisition time among the other image frames in which the landmark point is observed as a second reference frame, and taking the marginalized image frame as a first reference frame;
and updating the feature points corresponding to the landmark points from inverse depth values based on the first reference frame to inverse depth values based on the second reference frame.
5. The method of claim 2, wherein the step of converting the reference frame of the feature points corresponding to the landmark points that take the marginalized image frame as their reference frame to another image frame in the image frame sequence comprises:
for each feature point corresponding to a landmark point whose reference frame is the marginalized image frame, determining whether the other image frames in the image frame sequence include an image frame in which the landmark point is observed;
and in response to the other image frames not including any image frame in which the landmark point is observed, deleting the feature points corresponding to the landmark point.
6. The method of claim 4, wherein the step of updating the feature points corresponding to the landmark points from inverse depth values based on the first reference frame to inverse depth values based on the second reference frame comprises:
converting the feature points corresponding to the landmark points from inverse depth values based on the first reference frame to inverse depth values based on the second reference frame through a preset inverse depth conversion function, where the preset inverse depth conversion function performs the following operations:
determining the spatial position information of the feature points corresponding to the landmark points based on the second reference frame according to the spatial position of the feature points based on the first reference frame, the camera attitude rotation matrix corresponding to the first reference frame, and the visual residual of the feature points corresponding to the landmark points between the first reference frame and the second reference frame;
and determining the inverse depth values of the feature points corresponding to the landmark points based on the second reference frame according to the spatial position information of the feature points based on the second reference frame and the camera attitude rotation matrix corresponding to the second reference frame.
7. The method of claim 4, wherein after the step of updating the feature points corresponding to the landmark points from inverse depth values based on the first reference frame to inverse depth values based on the second reference frame, the method further comprises:
determining the variance of the feature points corresponding to the landmark points based on the inverse depth values of the second reference frame according to the variance of the feature points based on the inverse depth values of the first reference frame and the Jacobian matrix of the preset inverse depth conversion function with respect to the inverse depth value;
and updating the variance of the feature points corresponding to the landmark points in the preset visual positioning engine parameters with the variance based on the inverse depth values of the second reference frame.
8. A visual positioning device based on an inverse depth parameter, comprising:
a current image frame acquiring module, configured to, in the process of performing visual positioning based on the inverse depth parameter, determine the feature points corresponding to the landmark points observed in a current image frame in response to receiving the current image frame captured by a camera;
a key frame determining module, configured to determine whether the current image frame is a key frame;
a first reference frame setting module, configured to, in response to the current image frame being a key frame, set the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous key frame as the current image frame;
a second reference frame setting module, configured to, in response to the current image frame not being a key frame, set the reference frame of the feature points corresponding to the landmark points newly observed in the current image frame relative to the previous image frame as the current key frame;
and a feature point information updating module, configured to update the information of the feature points corresponding to the landmark points observed in the image frame sequence according to the information of the feature points corresponding to the landmark points observed in the current image frame, so that a preset visual positioning engine positions the camera based on the updated information of the feature points corresponding to the landmark points.
9. An electronic device comprising a memory, a processor, and program code stored on the memory and executable on the processor, wherein the processor implements the visual positioning method based on the inverse depth parameter of any one of claims 1 to 7 when executing the program code.
10. A computer readable storage medium having program code stored thereon, wherein the program code, when executed by a processor, implements the steps of the visual positioning method based on the inverse depth parameter of any one of claims 1 to 7.
CN202011522724.4A 2020-12-21 2020-12-21 Visual positioning method and device based on inverse depth parameter and electronic equipment Pending CN112686953A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011522724.4A CN112686953A (en) 2020-12-21 2020-12-21 Visual positioning method and device based on inverse depth parameter and electronic equipment


Publications (1)

Publication Number Publication Date
CN112686953A true CN112686953A (en) 2021-04-20

Family

ID=75450126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011522724.4A Pending CN112686953A (en) 2020-12-21 2020-12-21 Visual positioning method and device based on inverse depth parameter and electronic equipment

Country Status (1)

Country Link
CN (1) CN112686953A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023215418A1 (en) * 2022-05-04 2023-11-09 Qualcomm Incorporated Estimating and transmitting objects captured by a camera

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110349213A (en) * 2019-06-28 2019-10-18 Oppo广东移动通信有限公司 Method, apparatus, medium and electronic equipment are determined based on the pose of depth information
US20200005487A1 (en) * 2018-06-28 2020-01-02 Ubtech Robotics Corp Ltd Positioning method and robot using the same
CN111288989A (en) * 2020-02-25 2020-06-16 浙江大学 Visual positioning method for small unmanned aerial vehicle
CN111311708A (en) * 2020-01-20 2020-06-19 北京航空航天大学 Visual SLAM method based on semantic optical flow and inverse depth filtering
US20200240793A1 (en) * 2019-01-28 2020-07-30 Qfeeltech (Beijing) Co., Ltd. Methods, apparatus, and systems for localization and mapping


Similar Documents

Publication Publication Date Title
CN109242913B (en) Method, device, equipment and medium for calibrating relative parameters of collector
CN111951397B (en) Method, device and storage medium for multi-machine cooperative construction of three-dimensional point cloud map
CN110310326B (en) Visual positioning data processing method and device, terminal and computer readable storage medium
Ventura et al. Global localization from monocular slam on a mobile phone
CN108805917B (en) Method, medium, apparatus and computing device for spatial localization
US20190370989A1 (en) Method and apparatus for 3-dimensional point cloud reconstruction
WO2018205803A1 (en) Pose estimation method and apparatus
CN112270736B (en) Augmented reality processing method and device, storage medium and electronic equipment
AU2019216594A1 (en) Fast motion-based video inpainting
Tompkin et al. Videoscapes: exploring sparse, unstructured video collections
KR20080051015A (en) Depth estimation apparatus for depth consistency between frames and its method
JP2011008687A (en) Image processor
US11195297B2 (en) Method and system for visual localization based on dual dome cameras
CN113673400A (en) Real scene three-dimensional semantic reconstruction method and device based on deep learning and storage medium
TW202309834A (en) Model reconstruction method, electronic device and computer-readable storage medium
JP2022509329A (en) Point cloud fusion methods and devices, electronic devices, computer storage media and programs
CN110310325B (en) Virtual measurement method, electronic device and computer readable storage medium
WO2022110877A1 (en) Depth detection method and apparatus, electronic device, storage medium and program
CN111179408B (en) Three-dimensional modeling method and equipment
CN112686953A (en) Visual positioning method and device based on inverse depth parameter and electronic equipment
CN110245643A (en) Target following image pickup method, device, electronic equipment
CN115937002B (en) Method, apparatus, electronic device and storage medium for estimating video rotation
WO2020155024A1 (en) Method and apparatus for missing data processing of three dimensional trajectory data
CN112288817B (en) Three-dimensional reconstruction processing method and device based on image
CN110211239B (en) Augmented reality method, apparatus, device and medium based on label-free recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination