CN114998743A - Method, device, equipment and medium for constructing visual map points - Google Patents

Method, device, equipment and medium for constructing visual map points

Info

Publication number
CN114998743A
CN114998743A (Application CN202210795743.7A)
Authority
CN
China
Prior art keywords
key frame
visual
map
map point
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210795743.7A
Other languages
Chinese (zh)
Inventor
邢志伟
魏伟
赵信宇
魏金生
李骥
龙建睿
颜世龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Dadao Zhichuang Technology Co ltd
Original Assignee
Guangdong Dadao Zhichuang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Dadao Zhichuang Technology Co ltd filed Critical Guangdong Dadao Zhichuang Technology Co ltd
Priority to CN202210795743.7A priority Critical patent/CN114998743A/en
Publication of CN114998743A publication Critical patent/CN114998743A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 - Matching configurations of points or features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a method, a device, equipment and a medium for constructing visual map points, relating to the field of map point generation, and comprising the following steps: a real-time visual frame is analyzed and screened to generate a visual key frame, and a first map point and a common view are determined based on the visual key frame. If a feature point of the visual key frame is associated, the first map point and the common view are updated; if a feature point of the visual key frame has a depth value and is not associated, a second map point is constructed; if a feature point of the visual key frame has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. Preset screening is performed on the second, third and fourth map points to obtain map points that reach the standard, rejected map points and moving map points, and the map points that reach the standard are merged into the first map points to obtain the visual map points. The method and the device have the effect of improving the robustness of the visual map points.

Description

Method, device, equipment and medium for constructing visual map points
Technical Field
The present application relates to the field of map point construction technologies, and in particular, to a method, an apparatus, a device, and a medium for constructing visual map points.
Background
The core task of a visual SLAM (Simultaneous Localization and Mapping) algorithm is the construction of a visual map, which is generally composed of a set of visual map points carrying descriptive information.
Currently, an RGBD camera or a binocular camera may be used to acquire visual frames of a target scene, from which a visual map of the target scene is constructed. However, the construction of visual map points is often a one-shot process, so the constructed visual map contains map points generated from dynamic objects as well as noise points; during subsequent loop detection or relocalization, such map points reduce the accuracy of loop detection and relocalization.
Disclosure of Invention
In order to improve the robustness of visual map points, the present application provides a method, an apparatus, a device, and a medium for constructing visual map points.
In a first aspect, the present application provides a method for constructing a visual map point, which is implemented by the following technical solutions:
a method for constructing a visual map point, comprising:
when a map building instruction is detected, acquiring a real-time visual frame, analyzing and screening the real-time visual frame, and generating a visual key frame;
determining a first map point and a common view based on the visual key frame, and judging whether the feature point of the visual key frame has a depth value and/or the first map point is associated, wherein the first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point;
if the feature point of the visual key frame is associated with the first map point, updating the first map point and the common view based on the feature point of the visual key frame;
if the feature points of the visual key frame have depth values and are not associated with the first map point, constructing a second map point based on the feature points;
if the feature points of the visual key frame do not have depth values, triangularizing construction is carried out on the visual key frame, the first common-view key frame and the second common-view key frame to obtain a third map point;
performing preset screening on the second map points, the third map points and the fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points; rejecting the rejected map points, and carrying the moving map points into the preset screening of the next key frame after the visual key frame, the fourth map points being the moving map points obtained from the preset screening of the map points of the previous key frame of the visual key frame;
and merging the standard map points into the first map points to obtain the visual map points.
By adopting the above technical scheme, when map points of the visual map are constructed, a real-time visual frame is acquired after a map construction instruction is detected, and the real-time visual frame is analyzed and screened to generate a visual key frame. A first map point and a common view are then determined based on the visual key frame, and it is judged whether a feature point of the visual key frame has a depth value and/or is associated with the first map point, where the first map point is an original map point of the visual key frame and the common view is the common view corresponding to the first map point. When a feature point of the visual key frame is associated with the first map point, the first map point and the common view are updated based on the feature point; when a feature point has a depth value and is not associated with the first map point, a second map point is constructed based on the feature point; when a feature point has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. Preset screening is then performed on the second, third and fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points; the rejected map points are removed, the moving map points are carried into the preset screening of the next key frame, and the fourth map points are the moving map points obtained from the preset screening of the previous key frame. Finally, the map points that reach the standard are merged into the first map points to obtain the visual map points, so that robust visual map points are obtained and the robustness of the visual map is improved.
In another possible implementation manner, determining the first map point and the co-view based on the visual key frame includes:
judging whether the visual key frame is an initial key frame;
if so, extracting the feature points with the depth values in the real-time visual frame, constructing map points based on the feature points, and initializing the map points to obtain a first map point and a common view.
According to the above technical scheme, when the visual key frame is initialized, whether it is the initial key frame is judged; if so, the feature points with depth values in the real-time visual frame are extracted, map points are constructed based on the feature points, and the map points are initialized to obtain the first map point and the common view, laying a foundation for the construction of the subsequent visual map.
In another possible implementation manner, performing analysis screening on the real-time visual frames to generate the visual key frames includes:
acquiring equipment displacement information, wherein the equipment displacement information is the movement information of the shooting equipment;
and determining whether the shooting equipment generates a preset displacement distance or not based on the equipment displacement information, and if so, comparing and analyzing the real-time visual frame and a previous key frame of the real-time visual frame to generate a visual key frame.
According to the above technical scheme, when the visual key frame is generated, the device displacement information, i.e. the movement information of the shooting device, is acquired; whether the shooting device has produced the preset displacement distance is then determined based on the device displacement information, and if so, the real-time visual frame is compared with and analyzed against the previous key frame of the real-time visual frame to generate the visual key frame. This reduces the amount of data to be processed and improves the accuracy of the map data features of the visual key frames.
In another possible implementation manner, extracting feature points with depth values in the real-time visual frame includes:
determining an image pyramid based on the real-time visual frame, and extracting feature points from each layer of image of the image pyramid to obtain feature points with depth values;
and carrying out homogenization treatment on the characteristic points to obtain the treated characteristic points.
According to the above technical scheme, when the feature points are extracted, an image pyramid is determined based on the real-time visual frame, feature points are extracted from each layer of the image pyramid to obtain feature points with depth values, and the feature points are then homogenized to obtain the processed feature points. Feature point aggregation is thereby reduced, improving the accuracy and real-time performance of the map points and the common view.
In another possible implementation manner, the homogenizing processing is performed on the feature points to obtain processed feature points, and then the method further includes:
calculating the rotation main direction of the characteristic points to obtain direction variable quantity;
and binding the direction variable quantity and the characteristic points to obtain the bound characteristic points.
Through the above technical scheme, after the feature points are extracted, the rotation principal direction of each feature point is calculated to obtain the direction variation, which is then bound to the feature point to obtain the bound feature point; this facilitates subsequent feature point matching and improves matching accuracy.
In another possible implementation manner, triangulating the visual key frame, the first common view key frame, and the second common view key frame to obtain a third map point includes:
acquiring a first common-view key frame and a second common-view key frame, wherein the first common-view key frame is a key frame meeting the common-view degree requirement of the visual key frames, and the second common-view key frame is a key frame meeting the common-view degree requirement of the first common-view key frames;
respectively carrying out feature point matching on the visual key frame, the first common-view key frame and the second common-view key frame to obtain matched feature points;
and constructing the matched feature points by triangulation to obtain a third map point.
According to the above technical scheme, when the third map point is constructed, the first common-view key frame and the second common-view key frame are acquired, where the first common-view key frame is a key frame meeting the common-view degree requirement of the visual key frame, and the second common-view key frame is a key frame meeting the common-view degree requirement of the first common-view key frame. The visual key frame is then matched against the first common-view key frame and the second common-view key frame respectively by feature points to obtain matched feature points, and the matched feature points are constructed by triangulation to obtain the third map point, thereby improving the accuracy of the third map point.
In another possible implementation manner, after merging the map points that reach the standard into the first map points to obtain the visual map points, the method further comprises:
determining a historical key frame set based on the visual key frame, and screening historical map points in the historical key frame set to obtain a historical first map point;
and carrying out map construction on the historical first map points and the first map points to obtain the visual map.
According to the above technical scheme, when the visual map is constructed, the historical key frame set is determined from the visual key frames, and the historical map points in the historical key frame set are screened to obtain the historical first map points; the visual map is then constructed based on the historical first map points and the first map points, improving the robustness of the visual map.
In a second aspect, the present application provides a device for constructing a visual map point, which adopts the following technical solutions:
an apparatus for constructing a visual map point, comprising:
the acquisition and generation module is used for acquiring a real-time visual frame after a map construction instruction is detected, analyzing and screening the real-time visual frame and generating a visual key frame;
a determination and judgment module, configured to determine a first map point and a common view based on the visual key frame, and judge whether a feature point of the visual key frame has a depth value and/or the first map point is associated, where the first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point;
an updating module, configured to update the first map point and the co-view based on the feature point of the visual key frame when the feature point of the visual key frame is associated with the first map point;
a first construction module for constructing second map points based on feature points of the visual key frame when the feature points have depth values and are not associated with the first map points;
the second construction module is used for triangularizing the visual key frame, the first common-view key frame and the second common-view key frame to obtain a third map point when the feature points of the visual key frame do not have depth values;
the processing module is used for performing preset screening on the second map points, the third map points and the fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points, rejecting the rejected map points, and carrying the moving map points into the preset screening of the next key frame after the visual key frame, the fourth map points being the moving map points obtained from the preset screening of the map points of the previous key frame of the visual key frame;
and the fusion construction module is used for merging the standard map points into the first map points to obtain the visual map points.
By adopting the above technical scheme, when map points of the visual map are constructed, a real-time visual frame is acquired after a map construction instruction is detected, and the real-time visual frame is analyzed and screened to generate a visual key frame. A first map point and a common view are then determined based on the visual key frame, and it is judged whether a feature point of the visual key frame has a depth value and/or is associated with the first map point, where the first map point is an original map point of the visual key frame and the common view is the common view corresponding to the first map point. When a feature point of the visual key frame is associated with the first map point, the first map point and the common view are updated based on the feature point; when a feature point has a depth value and is not associated with the first map point, a second map point is constructed based on the feature point; when a feature point has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. Preset screening is then performed on the second, third and fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points; the rejected map points are removed, the moving map points are carried into the preset screening of the next key frame, and the fourth map points are the moving map points obtained from the preset screening of the previous key frame. Finally, the map points that reach the standard are merged into the first map points to obtain the visual map points, so that robust visual map points are obtained.
In a possible implementation manner, when determining the first map point and the co-view based on the visual key frame, the determination module is specifically configured to:
judging whether the visual key frame is an initial key frame;
if yes, extracting feature points with depth values in the real-time visual frame, constructing map points based on the feature points, and initializing the map points to obtain a first map point and a common view.
In another possible implementation manner, the obtaining and generating module is specifically configured to, when performing analysis and screening on a real-time visual frame and generating a visual key frame:
acquiring equipment displacement information, wherein the equipment displacement information is the movement information of the shooting equipment;
and determining whether the shooting equipment generates a preset displacement distance or not based on the equipment displacement information, and if so, comparing and analyzing the real-time visual frame and a previous key frame of the real-time visual frame to generate a visual key frame.
In another possible implementation manner, when extracting the feature points with depth values in the real-time visual frame, the determination module is specifically configured to:
determining an image pyramid based on the real-time visual frame, and extracting feature points from each layer of image of the image pyramid to obtain feature points with depth values;
and carrying out homogenization treatment on the characteristic points to obtain the treated characteristic points.
In another possible implementation manner, the apparatus further includes: a direction calculation module and a data binding module, wherein,
the direction calculation module is used for calculating the main rotation direction of the characteristic points to obtain direction variation;
and the data binding module is used for binding the direction variable quantity and the characteristic points to obtain the bound characteristic points.
In another possible implementation manner, when triangulating the visual key frame, the first common-view key frame, and the second common-view key frame to obtain a third map point, the second construction module is specifically configured to:
acquiring a first common-view key frame and a second common-view key frame, wherein the first common-view key frame is a key frame meeting the common-view degree requirement of the visual key frame, and the second common-view key frame is a key frame meeting the common-view degree requirement of the first common-view key frame;
respectively carrying out feature point matching on the visual key frame, the first common-view key frame and the second common-view key frame to obtain matched feature points;
and constructing the matched feature points through triangulation to obtain a third map point.
In another possible implementation manner, the apparatus further includes: a data screening module and a map construction module, wherein,
the data screening module is used for determining a historical key frame set based on the visual key frame and screening historical map points in the historical key frame set to obtain a historical first map point;
and the map construction module is used for carrying out map construction on the historical first map points and the first map points to obtain the visual map.
In a third aspect, the present application provides an electronic device, which adopts the following technical solutions:
an electronic device, comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method for constructing a visual map point according to any one of claims 1 to 7.
In a fourth aspect, the present application provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of constructing a visual map point according to any one of claims 1 to 7.
In summary, the present application includes at least one of the following beneficial technical effects:
1. when map points of a visual map are constructed, a real-time visual frame is acquired after a map construction instruction is detected, and the real-time visual frame is analyzed and screened to generate a visual key frame. A first map point and a common view are then determined based on the visual key frame, and it is judged whether a feature point of the visual key frame has a depth value and/or is associated with the first map point, where the first map point is an original map point of the visual key frame and the common view is the common view corresponding to the first map point. When a feature point of the visual key frame is associated with the first map point, the first map point and the common view are updated based on the feature point; when a feature point has a depth value and is not associated with the first map point, a second map point is constructed based on the feature point; when a feature point has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. Preset screening is then performed on the second, third and fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points; the rejected map points are removed, the moving map points are carried into the preset screening of the next key frame of the visual key frame, and the fourth map points are the moving map points obtained from the preset screening of the map points of the previous key frame of the visual key frame. The map points that reach the standard are then merged into the first map points to obtain the visual map points, so that robust visual map points are obtained and the robustness of the visual map is improved;
2. when the visual key frame is initialized, whether it is the initial key frame is judged; if so, the feature points with depth values in the real-time visual frame are extracted, map points are constructed based on the feature points, and the map points are initialized to obtain the first map points and the common view, laying a foundation for the construction of the subsequent visual map.
Drawings
Fig. 1 is a schematic flowchart of a method for constructing a visual map point according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a construction apparatus for a visual map point according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to figures 1-3.
After reading the present specification, a person skilled in the art may make modifications to the present embodiments as necessary without inventive contribution; however, such modifications are protected by patent law only insofar as they fall within the scope of the claims of the present application.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In addition, the term "and/or" herein is only one kind of association relationship describing an associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship, unless otherwise specified.
The terms used in this scheme are explained further:
SLAM: simultaneous localization and mapping, instantaneous localization and mapping.
Feature points: feature points are pixels in a picture that differ markedly from other pixels, together with the texture information formed by their surrounding pixels. They are the main processing objects of both the human visual system and a robot visual SLAM system, and have characteristics that can be extracted and matched for association. Common feature points typically exhibit extremely high or low brightness, a distinctive distribution of surrounding pixels, extreme pixel gradients in multiple directions, and the like. Feature point extraction means finding such special pixels in the picture and recording their coordinate information.
Describing a feature point means assigning an extracted feature point a specific, recognizable ID, usually related to the brightness of the surrounding image, which can be likened to issuing an identification number to each feature point. The description of a feature point generally also includes an intensity value indicating how prominent the feature is: the higher the value, the less sensitive the feature point is to ambient illumination and the more stable it is; the lower the value, the less reliable it is. Each extracted feature point can therefore be represented as
F = ((u, v), s, d)
where (u, v) represents the position coordinates of the feature point, s is its intensity value, and d is a descriptor (typically a fixed-size matrix).
Common feature point extraction algorithms include Harris, SIFT, SURF, ORB, and the like. In engineering practice, ORB features have a small computation cost and simple descriptors, making them suitable for scenarios where quantity and speed take priority. ORB feature points are assumed in everything that follows.
Depth value: the distance from the imaged object to the center of the camera can be directly measured by the RGBD camera, and the distance from the imaged object to the center of the camera can be obtained by stereo vision calculation by the binocular camera.
Visual frame: a binocular image, or an RGBD image.
Key frame: constructed from a visual frame; it mainly comprises {pose, feature point set, map point set corresponding to the feature points (where a feature point has an associated map point)}, where the pose is calculated by the PnP algorithm.
Visual map point: map point for short; it mainly comprises {world coordinates, best descriptor, observation key frame set, observation distance range, average observation direction}.
Observation key frame: a key frame one of whose feature points is associated with (observes) the map point.
Observation distance range: the minimum and maximum of the distances between the observation key frames and the map point.
Observation direction: the direction of the straight line formed by an observation key frame and the map point in the world coordinate system.
Common view: a graph whose vertices are key frames; an edge indicates that the connected key frames have a common-view relationship (there exist commonly viewed map points), and the weight of the edge is the total number of commonly viewed map points.
First map point: a map point that finally forms the visual map.
Second map point: a candidate map point constructed from a feature point of a new key frame that is not associated with a first map point but has a depth value.
Third map point: a candidate map point constructed by trigonometry from a feature point of the new key frame that is not associated with a first map point and has no depth value, matched against similar feature points (likewise not associated with a first map point and without depth values) of the first and second common-view key frames of that key frame.
Fourth map point: a candidate map point that, after screening of the second and third map points, has neither been merged into the first map points nor eliminated and needs to be screened further.
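To make these definitions concrete, the following is a minimal sketch, in Python, of how the structures named above could be laid out; the field names are assumptions chosen to mirror the text, not code from the patent.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass(eq=False)           # eq=False keeps instances hashable as dict keys
class KeyFrame:
    pose: np.ndarray           # 4x4 camera pose computed by PnP
    keypoints: list            # feature point set
    map_points: dict = field(default_factory=dict)  # feature index -> MapPoint

@dataclass(eq=False)
class MapPoint:
    world_xyz: np.ndarray      # world coordinates
    best_descriptor: np.ndarray
    observations: dict = field(default_factory=dict)  # KeyFrame -> feature index
    dist_range: tuple = (float("inf"), 0.0)           # (min, max) observation distance
    mean_view_dir: np.ndarray = None                  # average observation direction

# The common view as an adjacency map: for each key frame, its neighbours
# and the edge weight (number of commonly viewed map points).
coview: dict = {}
```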
The embodiment of the application provides a method for constructing visual map points, executed by an electronic device. The electronic device may be a server or a terminal device, where the server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal device may be a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like, but is not limited thereto; the terminal device and the server may be connected directly or indirectly through wired or wireless communication, and the embodiment of the present application is not limited in this respect. As shown in fig. 1, the method includes step S10, step S11, step S12, step S13, step S14, step S15, and step S16, wherein,
and step S10, acquiring a real-time visual frame after detecting the map building instruction, and analyzing and screening the real-time visual frame to generate a visual key frame.
Specifically, the real-time visual frame is an image captured at the current moment and can be acquired by the shooting device; the visual key frame is the key frame of the current moment, extracted from the visual frames of the preceding period. A visual frame differs from a key frame in that visual frames are generated at every moment, while a key frame is the most representative one among locally similar visual frames.
Specifically, a piece of animation is essentially a number of pictures played continuously over a period of time. Each such picture is called a "visual frame", because the pictures carry the changing relationship of time and position, and when many frames are played continuously and quickly within a certain time the human eye perceives animation. Within the same time span, the more visual frames are played, the smoother the picture looks; key frames are the frames selected from the visual frames and played in combination.
In particular, generating visual key frames has the following advantages:
the information redundancy among the close frames is high, and the key frame is the most representative frame in the local close frames, so that the information redundancy can be reduced. For example, the camera is left in place, normal frames are still to be recorded, but the key frames are not increased.
The quality of pictures, the quality of characteristic points and the like are also considered when the key frames are selected, the depth of common frames is often projected onto the key frames to optimize the depth map in RGB-D SLAM related schemes such as Bundle Fusion, RKD SLAM and the like, and the key frames are the result of filtering and optimization of the common frames to a certain extent, so that useless or wrong information is prevented from entering the optimization process to damage the accuracy of positioning and mapping.
Step S11, determining the first map point and the co-view based on the visual key frame, and determining whether the feature point of the visual key frame has a depth value and/or the first map point is associated.
The first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point.
Specifically, a map point corresponding to the image is determined according to the current visual key frame, and the map point is a first map point.
In step S12, if the feature point of the visual key frame is associated with the first map point, the first map point and the co-view are updated based on the feature point of the visual key frame.
In step S13, if the feature points of the visual key frame have depth values and are not associated with the first map point, constructing a second map point based on the feature points.
Specifically, a new key frame (denoted K_c) has already undergone feature point matching with the previous key frame (denoted K_p). Let p denote a feature point in K_c. There are the following cases:
1. p has a matching feature point in K_p, and that matching feature point has an associated first map point (denoted M1). Then K_c and p are associated with M1, M1 is updated, and the common view is updated.
2. p has a matching feature point in K_p that has no associated first map point but has an associated fourth map point M4. Then K_c is added to the observation key frame set of M4. If the matching feature point has no associated fourth map point either, but p has a depth value, p can be constructed as a second map point; if p has no depth value, a third map point may be constructed by triangulation with the matching feature point (an operation included in step S14).
3. p has no matching feature point in K_p but has a depth value. Then p may be constructed as a second map point through the pinhole imaging model.
4. p has no matching feature point in K_p and no depth value. Then p proceeds to step S14.
Step S12 comprises steps S121 and S122, wherein,
S121: K_c and p are associated with M1, and M1 is updated.
Specifically, p is associated to M1, and K_c is added to the observation key frame set of M1. The world coordinates of M1 remain unchanged. The descriptors of the feature points corresponding to M1 in all of its current observation key frames (denoted K_obs, which includes K_c) are taken out and denoted D. For each descriptor d_i in D, its distance dist(d_i, d_j) to every other descriptor d_j is calculated, together with its average distance to the other descriptors:
avg_i = (1 / (n - 1)) * Σ_{j ≠ i} dist(d_i, d_j)
where n is the number of descriptors. The descriptor with the minimum average distance avg_i is updated to be the best descriptor. The distance between K_c and M1 (denoted l) is calculated, and the observation distance range of M1 (denoted [l_min, l_max]) is extended so that [l_min, l_max] includes l. The observation direction of each observation key frame in K_obs towards M1 (denoted v_k) is calculated, and the average of the v_k is updated to be the average observation direction of M1.
S122: the common view is updated.
Specifically, K_c is added to the common view as a new vertex. Each first map point M associated with K_c has an observation key frame set; all key frames in these sets (other than K_c itself) are common-view key frames of K_c. The number of times a common-view key frame K_o appears across these sets is the weight of the edge connecting the corresponding vertices of the two key frames in the common view, i.e., the number of map points commonly viewed by the two key frames.
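A sketch of this S122 update, built on the assumed structures from the earlier sketch: the edge weight between the new key frame and another key frame is the number of first map points they both observe.

```python
def update_coview(coview, new_kf):
    weights = {}
    for mp in new_kf.map_points.values():   # first map points associated with new_kf
        for kf in mp.observations:          # their observation key frames
            if kf is not new_kf:
                weights[kf] = weights.get(kf, 0) + 1
    coview[new_kf] = weights                # new vertex with its weighted edges
    for kf, w in weights.items():
        coview.setdefault(kf, {})[new_kf] = w   # keep the graph symmetric
```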
Step S14, if the feature points of the visual key frame do not have depth values, triangularization is performed on the visual key frame, the first common-view key frame, and the second common-view key frame to obtain a third map point.
Specifically, after steps S12 and S13, a considerable portion of the feature points of K_c are still neither associated with a first map point nor possess depth values, yet there is a high probability that corresponding feature points can be matched in the common-view key frames. These can then be associated with the matched first map points, or new candidate map points, namely third map points, can be generated by trigonometry.
In step S15, preset screening is performed on the second, third, and fourth map points respectively to obtain map points that reach the standard, rejected map points, and moving map points; the rejected map points are removed, and the moving map points are carried into the preset screening of the next key frame after the visual key frame.
The fourth map points are the moving map points obtained from the preset screening of the map points of the previous key frame of the visual key frame.
In the embodiment of the present application, the second, third, and fourth map points are screened, eliminated, or merged: map points that pass screening are merged into the first map points, those meeting the elimination condition are removed, and the remainder enter the processing flow of the next key frame. The second, third, and fourth map points are hereinafter collectively referred to as candidate map points M_c. Each candidate map point M_c has a corresponding generating key frame. Suppose n key frames have been generated since M_c was produced. If the number of observation key frames of M_c is less than a lower threshold, M_c is removed; if the number of observation key frames of M_c exceeds an upper threshold, M_c passes screening and is merged into the first map points; if the number of observation key frames of M_c lies between the two thresholds, M_c enters the processing flow of the next key frame as a fourth map point.
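The screening rule can be summarized as the small decision function below; min_age, t_low, and t_high stand in for thresholds the text leaves unspecified.

```python
def screen_candidate(mp, frames_since_created, min_age=2, t_low=2, t_high=3):
    """Classify a candidate map point per the S15 rule sketched above."""
    if frames_since_created < min_age:
        return "carry_over"          # too young to judge; stays a candidate
    n_obs = len(mp.observations)
    if n_obs < t_low:
        return "reject"              # eliminated
    if n_obs > t_high:
        return "promote"             # merged into the first map points
    return "carry_over"              # becomes a fourth map point next key frame
```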
And step S16, merging the standard map points into the first map points to obtain the visual map points.
Specifically, there are two ways for M_c to be incorporated into the first map points: fusion and addition. It is first judged whether M_c can be fused with an existing first map point; if not, it is added as a new first map point. The judgment for fusion is as follows. First, an M_c to be merged into the first map points must be associated with K_c; the fusion objects are the first map points M1 associated with the first and second common-view key frames K_cv of K_c. Specifically, a candidate map point M_c is projected in turn onto each key frame K_k in K_cv to match feature points. If a matched feature point exists and is associated with a first map point M1, and M_c and M1 simultaneously satisfy: 1. their world coordinate distance is within a certain range; 2. their average observation direction difference is within a certain range; 3. their observation distance ranges overlap; then M_c and M1 are fused: the world coordinates of the map point with the larger number of observation key frames are retained, the observation key frame sets of the two are merged, and the best descriptor, observation distance range, and average observation direction are updated.
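The three fusion conditions can be checked as below; the tolerances are illustrative values, since the patent only says "within a certain range", and the average observation directions are assumed to be unit vectors.

```python
import numpy as np

def can_fuse(cand, first_mp, max_dist=0.05, max_angle_deg=10.0):
    close = np.linalg.norm(cand.world_xyz - first_mp.world_xyz) < max_dist
    # average observation direction difference within a tolerance
    cos_a = float(np.dot(cand.mean_view_dir, first_mp.mean_view_dir))
    similar_dir = cos_a > np.cos(np.deg2rad(max_angle_deg))
    # observation distance ranges overlap
    overlap = (cand.dist_range[0] <= first_mp.dist_range[1] and
               first_mp.dist_range[0] <= cand.dist_range[1])
    return close and similar_dir and overlap
```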
The embodiment of the application provides a method for constructing visual map points. When map points of a visual map are constructed, a real-time visual frame is acquired after a map construction instruction is detected, and the real-time visual frame is analyzed and screened to generate a visual key frame. A first map point and a common view are then determined based on the visual key frame, and it is judged whether a feature point of the visual key frame has a depth value and/or is associated with the first map point, where the first map point is an original map point of the visual key frame and the common view is the common view corresponding to the first map point. When a feature point of the visual key frame is associated with the first map point, the first map point and the common view are updated based on the feature point; when a feature point has a depth value and is not associated with the first map point, a second map point is constructed based on the feature point; when a feature point has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. Preset screening is then performed on the second, third and fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points; the rejected map points are removed, the moving map points are carried into the preset screening of the next key frame of the visual key frame, and the fourth map points are the moving map points obtained from the preset screening of the map points of the previous key frame of the visual key frame. The map points that reach the standard are then merged into the first map points to obtain the visual map points, so that robust visual map points are obtained and the robustness of the visual map is further improved.
In a possible implementation manner of the embodiment of the present application, step S11 specifically includes step S111 (not shown in the figure) and step S112 (not shown in the figure), wherein,
step S111, determining whether the visual key frame is an initial key frame.
Specifically, when a map building instruction is detected, a visual key frame is generated from the first acquired visual frame, and the visual key frame is an initial key frame.
Step S112, if yes, extracting feature points with depth values in the real-time visual frame, constructing map points based on the feature points, and initializing the map points to obtain a first map point and a common view.
Specifically, methods for extracting feature points from the real-time visual frame include Harris, ORB, SURF, SIFT, and the like. On top of feature point selection, the embodiment of the application adds two steps, image pyramid construction and feature point homogenization, and attaches rotation principal direction information to the feature points, which is used for feature point matching between two images; finally, the first map points constructed based on the feature points and the common view are obtained.
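For reference, extracting ORB feature points with an image pyramid is a one-liner in OpenCV; this is a minimal sketch (the patent does not name a library), with the pyramid parameters matching the example given later.

```python
import cv2

def extract_orb(gray, n_features=1000, scale=1.2, levels=3):
    orb = cv2.ORB_create(nfeatures=n_features, scaleFactor=scale, nlevels=levels)
    # keypoints carry pixel coordinates, pyramid level (octave) and a
    # rotation principal direction (angle); descriptors are 32-byte binary
    return orb.detectAndCompute(gray, None)
```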
In another possible implementation manner of the embodiment of the present application, step S10 specifically includes: step S101 (not shown), and step S102 (not shown), wherein,
step S101, obtaining equipment displacement information.
Wherein the device displacement information is movement information of the photographing device.
And S102, determining whether the shooting equipment generates a preset displacement distance or not based on the equipment displacement information, and if the shooting equipment generates the preset displacement distance, comparing and analyzing the real-time visual frame and a previous key frame of the real-time visual frame to generate a visual key frame.
Specifically, motion filtering determines, based on motion information, whether the current visual frame needs to be processed: if the change between the pose of the current visual frame and the pose of the previous processed visual frame is below a certain range, the visual frame is skipped; otherwise it is taken as a visual frame to be processed for further processing. The pose change includes displacement and attitude change, and the motion information can be acquired by a motion sensor such as an IMU or a wheel encoder.
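A sketch of this motion gate, assuming 4x4 homogeneous pose matrices; the thresholds are illustrative, not values from the patent.

```python
import numpy as np

def should_process(curr_pose, last_pose, min_trans=0.05, min_rot_deg=5.0):
    rel = np.linalg.inv(last_pose) @ curr_pose         # relative pose change
    trans = np.linalg.norm(rel[:3, 3])                 # displacement
    cos_theta = np.clip((np.trace(rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_theta))           # attitude change
    return trans > min_trans or angle > min_rot_deg    # skip frame if False
```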
The visual frame to be processed is compared with and analyzed against the previous key frame to judge whether a new key frame needs to be generated; if so, one is generated, otherwise the frame is skipped. The comparison method is as follows: the feature points of the visual frame to be processed are extracted and matched with the feature points of the previous key frame; if the ratio of matched feature points is below a certain threshold, the current visual frame differs greatly from the previous key frame, i.e. the viewed scene has changed substantially, and a new key frame needs to be generated; otherwise no new key frame is needed. For feature point matching between frames (visual frame to key frame, or key frame to key frame), a screening step based on the rotation principal direction is added to the general matching result. Specifically, the change in rotation principal direction of every matched feature point pair is calculated and placed into a number of intervals; the matched feature points falling in the 3 intervals with the largest counts are kept and the rest are removed. In general, the changes in rotation principal direction of correctly matched feature points between two frames are similar; the interval size, the number of intervals, and the number of retained intervals can be configured as required.
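The rotation-principal-direction screening can be sketched as a histogram filter; angles1/angles2 are the principal directions of the matched feature points in the two frames, and the bin count and retained-interval count are configurable, as the text notes.

```python
import numpy as np

def filter_by_rotation(matches, angles1, angles2, n_bins=12, keep=3):
    delta = (np.asarray(angles1) - np.asarray(angles2)) % 360.0
    bins = (delta / (360.0 / n_bins)).astype(int)   # interval index per match
    counts = np.bincount(bins, minlength=n_bins)
    best = set(np.argsort(counts)[-keep:])          # the `keep` fullest intervals
    return [m for m, b in zip(matches, bins) if b in best]
```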
In a possible implementation manner of the embodiment of the present application, step S112 (not shown in the figure) specifically includes step Sa (not shown in the figure) and step Sb (not shown in the figure), wherein,
and step Sa, determining an image pyramid based on the real-time visual frame, and extracting feature points from each layer of image of the image pyramid to obtain feature points with depth values.
Specifically, the image of the visual frame is scaled over multiple levels. For example, with the 480 × 640 resolution original image as the layer 0 image, scaling by a factor of 1.2 yields a 400 × 533 layer 1 image and a 333 × 444 layer 2 image; the successively scaled layers form the image pyramid of the visual frame, and feature point extraction can be performed on each layer according to the scaling factor and the number of pyramid layers set for the project. The image pyramid is a more complete description of the current visual frame. Suppose the visual frame image acquired near a target object is denoted frame 1 and the one acquired far from the target object is denoted frame 2. The feature points of the target object may fail to match between the original images of frame 1 and frame 2, or between images at the same pyramid level, and yet match between a high-level image of frame 1 and a low-level image of frame 2; this can be understood as the near view being easier to match with the distant view once it has been scaled down.
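The pyramid resolutions in the example follow directly from the scale factor:

```python
def pyramid_resolutions(height=480, width=640, scale=1.2, n_levels=3):
    """Level 0 is the original image; each further level divides both
    dimensions by the scale factor (matches the 480x640 example above)."""
    return [(round(height / scale ** i), round(width / scale ** i))
            for i in range(n_levels)]

# pyramid_resolutions() -> [(480, 640), (400, 533), (333, 444)]
```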
And step Sb, carrying out homogenization treatment on the characteristic points to obtain the treated characteristic points.
Specifically, after feature points are extracted from an ordinary image, feature point aggregation easily occurs; generating map points directly from them brings a large error to pose calculation, so the feature points are homogenized to reduce aggregation. For example, the 480 × 640 resolution image can be divided into 100 regions of 10 × 10, each region being 48 × 64, and the best 10 feature points of each region are selected. The way the regions are divided and the number of optimal feature points per region can be configured according to engineering requirements.
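A sketch of the grid homogenization with the example numbers, assuming OpenCV-style keypoints whose score is kp.response:

```python
def homogenize(keypoints, rows=10, cols=10, per_cell=10, height=480, width=640):
    cells = {}
    for kp in keypoints:                      # bucket by 48x64 grid cell
        key = (int(kp.pt[1] * rows / height), int(kp.pt[0] * cols / width))
        cells.setdefault(key, []).append(kp)
    kept = []
    for pts in cells.values():                # keep the strongest per cell
        pts.sort(key=lambda k: k.response, reverse=True)
        kept.extend(pts[:per_cell])
    return kept
```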
In another possible implementation manner of the embodiment of the present application, step Sb (not shown in the figure) further includes: step Sb1 (not shown), and step Sb2 (not shown), wherein,
step Sb1, the rotation principal direction of the feature point is calculated to obtain the direction variation.
Specifically, an image block within a certain distance range centered on the pixel of the feature point is extracted. The weighted sum of the grey values of all pixels of the image block gives the grey centre; the pixel coordinate of the feature point is the geometric centre of the image block; and the direction of the vector from the geometric centre to the grey centre is the rotation principal direction of the feature point. Its function is as follows: when feature points of two images are matched, the changes in rotation principal direction of most correctly matched feature points are similar, so the matching result can be screened by it, improving matching accuracy.
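This is the classic intensity-centroid rule; a sketch, assuming the patch lies fully inside the image and an illustrative radius:

```python
import numpy as np

def principal_direction(gray, cx, cy, radius=15):
    patch = gray[cy - radius:cy + radius + 1,
                 cx - radius:cx + radius + 1].astype(float)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    m10 = (xs * patch).sum()                  # grey-weighted x moment
    m01 = (ys * patch).sum()                  # grey-weighted y moment
    # angle of the vector from the geometric centre to the grey centre
    return np.degrees(np.arctan2(m01, m10))
```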
And step Sb2, binding the direction variation quantity with the feature points to obtain the bound feature points.
In another possible implementation manner of the embodiment of the present application, step S14 specifically includes step S141 (not shown in the figure), step S142 (not shown in the figure), and step S143 (not shown in the figure), wherein,
in step S141, a first common view key frame and a second common view key frame are obtained.
The first common-view key frame is a key frame meeting the common-view degree requirement with respect to the visual key frame, and the second common-view key frame is a key frame meeting the common-view degree requirement with respect to the first common-view key frame.
Specifically, take the common-view degree requirement to be the ten frames with the highest common-view degree. The first common-view key frames are then the ten key frames with the highest common-view degree with the visual key frame, and the second common-view key frames are, for each first common-view key frame, the ten key frames with the highest common-view degree with it. The total number of first and second common-view key frames is therefore at most 10 + 10 × 10 = 110. All first and second common-view key frames together are denoted K_cov, with an individual member denoted K_i.
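A minimal sketch of gathering the set K_cov follows; get_top_covisible() is a hypothetical accessor standing in for whatever common-view bookkeeping the map maintains.

```python
def gather_covisible_keyframes(keyframe, top_n=10):
    """Collect first and second common-view key frames (at most 110 here)."""
    first = get_top_covisible(keyframe, top_n)           # up to 10 frames
    candidates = list(first)
    for kf in first:
        candidates.extend(get_top_covisible(kf, top_n))  # up to 10 x 10 more
    seen, result = set(), []
    for kf in candidates:                                # dedupe, drop self
        if kf is not keyframe and id(kf) not in seen:
            seen.add(id(kf))
            result.append(kf)
    return result
```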
And step S142, respectively carrying out feature point matching on the visual key frame, the first common-view key frame and the second common-view key frame to obtain matched feature points.
Specifically, the visual key frame is matched for feature points against each common-view key frame K_i in K_cov in turn, the matching being performed in the same way as the inter-frame feature point matching in step S102. The only addition is a scene depth check before matching, specifically: for each K_i, the median of the depth values of all its feature points that have depth values is calculated as the scene depth median of K_i, along with the relative displacement between K_i and the visual key frame (which serves as the baseline in the triangulation); if the ratio of this baseline to the scene depth median of K_i is small, K_i is skipped, because map points triangulated from such a short baseline are unreliable.
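A minimal sketch of this per-K_i depth check follows; the feature and depth attribute names, the camera position t, and the 0.01 ratio threshold are all assumptions.

```python
import numpy as np

def passes_depth_check(visual_kf, coview_kf, min_ratio=0.01):
    """Skip frames whose baseline is tiny relative to their scene depth."""
    depths = [f.depth for f in coview_kf.features if f.depth is not None]
    if not depths:
        return False
    scene_depth_median = float(np.median(depths))
    baseline = float(np.linalg.norm(visual_kf.t - coview_kf.t))
    return baseline / scene_depth_median >= min_ratio
```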
And S143, constructing the matched feature points through triangulation to obtain third map points.
Specifically, let p be a feature point of the visual key frame and q its matched feature point in a common-view key frame K_i. If q is associated with a first map point P, P is reprojected onto the visual key frame and the reprojection error is checked with a chi-square test; if the chi-square test passes, p is directly associated with P. If the chi-square test fails, or q has no associated first map point, a map point is constructed by triangulation; it is reprojected onto both the visual key frame and K_i, the reprojection errors are calculated and checked with the chi-square test, and if the test passes the map point is kept as a third map point, otherwise it is discarded.
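The associate-or-triangulate flow just described can be sketched as follows; reproject(), chi2_passes() and triangulate() are hypothetical stand-ins for the operations the text names, not calls from a concrete library.

```python
def build_third_map_points(visual_kf, coview_kf, matches):
    """Associate existing map points where possible, triangulate otherwise."""
    new_points = []
    for p, q in matches:                  # p in visual_kf, q in coview_kf
        mp = q.map_point                  # associated first map point, if any
        if mp is not None:
            err = reproject(mp, visual_kf) - p.pixel
            if chi2_passes(err):          # reuse the existing map point
                p.map_point = mp
                continue
        # Chi-square failed, or q has no map point: triangulate a new one.
        mp = triangulate(p, q, visual_kf, coview_kf)
        if (chi2_passes(reproject(mp, visual_kf) - p.pixel) and
                chi2_passes(reproject(mp, coview_kf) - q.pixel)):
            new_points.append(mp)         # kept as a third map point
    return new_points
```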
The reprojection-error chi-square test here covers two cases. A map point P (either a first map point or a newly triangulated map point) is reprojected onto a key frame K at reprojection coordinates u′, while the feature point corresponding to P in K has coordinates u. If u has no depth value, a 2-degree-of-freedom chi-square test is performed, i.e. only the error between u and u′ in the two pixel directions is calculated. If u has a depth value, a 3-degree-of-freedom chi-square test is performed, adding the error in the depth direction; this error must be normalized, under the assumption that the errors in the three directions are independent and normally distributed.
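A minimal sketch of the two test cases follows; the 95% chi-square critical values (5.991 for 2 degrees of freedom, 7.815 for 3) are standard, while the per-direction standard deviations used for normalization are assumed settings.

```python
import numpy as np

CHI2_2DOF, CHI2_3DOF = 5.991, 7.815       # 95% critical values

def chi2_passes(pixel_err, depth_err=None, sigma_px=1.0, sigma_depth=1.0):
    """2-DoF test on pixel error alone; 3-DoF once a depth error is added."""
    e = np.asarray(pixel_err, dtype=np.float64) / sigma_px
    chi2 = float(e @ e)                   # error in the two pixel directions
    if depth_err is None:
        return chi2 <= CHI2_2DOF
    chi2 += (depth_err / sigma_depth) ** 2  # normalized depth-direction error
    return chi2 <= CHI2_3DOF
```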
In another possible implementation manner of the embodiment of the present application, step S16 is followed by step S161 (not shown in the figure) and step S162 (not shown in the figure), wherein,
step S161, determining a historical keyframe set based on the visual keyframe, and screening historical map points in the historical keyframe set to obtain a historical first map point.
And step S162, carrying out map construction on the historical first map point and the first map point to obtain a visual map.
The above embodiments describe a method for constructing a visual map point from the perspective of a method flow; the following embodiments describe a device for constructing a visual map point from the perspective of virtual modules or virtual units, as detailed below.
An embodiment of the present application provides a device for constructing a visual map point, as shown in fig. 2, the device 20 for constructing a visual map point may specifically include: an acquisition generation module 21, a determination judgment module 22, an update module 23, a first construction module 24, a second construction module 25, a processing module 26, and a fusion construction module 27, wherein,
the acquisition and generation module 21 is configured to acquire a real-time visual frame after detecting a map construction instruction, and analyze and screen the real-time visual frame to generate a visual key frame;
the determination and judgment module 22 is configured to determine a first map point and a common view based on the visual key frame, and judge whether a feature point of the visual key frame has a depth value and/or the first map point is associated, where the first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point;
an updating module 23, configured to update the first map point and the common view based on the feature point of the visual key frame when the feature point of the visual key frame is associated with the first map point;
a first construction module 24, configured to construct a second map point based on the feature points when the feature points of the visual key frame have depth values and are not associated with the first map point;
the second construction module 25 is configured to perform triangularization construction on the visual key frame, the first common-view key frame and the second common-view key frame to obtain a third map point when the feature points of the visual key frame do not have depth values;
a processing module 26, configured to perform preset processing on the second map point, the third map point and the fourth map point respectively to obtain map points that meet the standard, rejected map points and moving map points; to reject the rejected map points; and to move the moving map points into the preset processing of the map points of the key frame following the visual key frame, where the fourth map point is a moving map point obtained from the preset processing of the map points of the key frame preceding the visual key frame;
and the fusion construction module 27 is configured to merge the standard map points into the first map points to obtain visual map points.
In a possible implementation manner of the embodiment of the present application, when determining the first map point and the common view based on the visual key frame, the determination and judgment module 22 is specifically configured to:
judging whether the visual key frame is an initial key frame or not;
if yes, extracting feature points with depth values in the real-time visual frame, constructing map points based on the feature points, and initializing the map points to obtain a first map point and a common view.
In another possible implementation manner of the embodiment of the present application, when analyzing and screening the real-time visual frame to generate the visual key frame, the acquisition and generation module 21 is specifically configured to:
acquiring equipment displacement information, wherein the equipment displacement information is the movement information of the shooting equipment;
and determining whether the shooting equipment generates a preset displacement distance or not based on the equipment displacement information, and if so, comparing and analyzing the real-time visual frame and a previous key frame of the real-time visual frame to generate a visual key frame.
In another possible implementation manner of the embodiment of the present application, when extracting the feature points with depth values in the real-time visual frame, the determination and judgment module 22 is specifically configured to:
determining an image pyramid based on the real-time visual frame, and extracting feature points from each layer of image of the image pyramid to obtain feature points with depth values;
and carrying out homogenization treatment on the characteristic points to obtain the treated characteristic points.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: a direction calculation module and a data binding module, wherein,
the direction calculation module is used for calculating the main rotation direction of the characteristic points to obtain direction variation;
and the data binding module is used for binding the direction variable quantity and the characteristic points to obtain the bound characteristic points.
In another possible implementation manner of the embodiment of the present application, when triangulating the visual key frame, the first common-view key frame and the second common-view key frame to obtain the third map point, the second construction module 25 is specifically configured to:
acquiring a first common-view key frame and a second common-view key frame, wherein the first common-view key frame is a key frame meeting the common-view degree requirement of the visual key frames, and the second common-view key frame is a key frame meeting the common-view degree requirement of the first common-view key frames;
respectively carrying out feature point matching on the visual key frame, the first common-view key frame and the second common-view key frame to obtain matched feature points;
and constructing the matched feature points through triangulation to obtain a third map point.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: a data filtering module and a map building module, wherein,
the data screening module is used for determining a historical key frame set based on the visual key frame and screening historical map points in the historical key frame set to obtain a historical first map point;
and the map construction module is used for carrying out map construction on the historical first map points and the first map points to obtain the visual map.
In an embodiment of the present application, an electronic device is provided, as shown in fig. 3, where the electronic device 300 shown in fig. 3 includes: a processor 301 and a memory 303. Wherein processor 301 is coupled to memory 303, such as via bus 302. Optionally, the electronic device 300 may also include a transceiver 304. It should be noted that the transceiver 304 is not limited to one in practical applications, and the structure of the electronic device 300 is not limited to the embodiment of the present application.
The Processor 301 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 301 may also be a combination of computing functions, e.g., comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 302 may include a path that transfers information between the above components. The bus 302 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 302 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.
The Memory 303 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
The memory 303 is used for storing application program codes for executing the scheme of the application, and the processor 301 controls the execution. The processor 301 is configured to execute application program code stored in the memory 303 to implement the aspects illustrated in the foregoing method embodiments.
Among them, electronic devices include but are not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. But also a server, etc. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
The present application provides a computer-readable storage medium on which a computer program is stored; when the program runs on a computer, the computer can execute the corresponding content in the foregoing method embodiments. Compared with the prior art, when map points of a visual map are constructed in the embodiments of the present application, a real-time visual frame is acquired after a map construction instruction is detected, and the real-time visual frame is analyzed and screened to generate a visual key frame. A first map point and a common view are then determined based on the visual key frame, and it is judged whether a feature point of the visual key frame has a depth value and/or is associated with the first map point, where the first map point is an original map point of the visual key frame and the common view is the common view corresponding to the first map point. When the feature point of the visual key frame is associated with the first map point, the first map point and the common view are updated based on the feature point. When the feature point has a depth value but is not associated with the first map point, a second map point is constructed based on the feature point. When the feature point has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. The second, third and fourth map points are then each given preset processing to obtain map points that meet the standard, rejected map points and moving map points; the rejected map points are discarded, the moving map points are moved into the preset processing of the key frame following the visual key frame, and the standard map points are merged into the first map points to obtain visual map points. Robust visual map points are thereby obtained, improving the robustness of the visual map.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not bound to a strict order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which need not be completed at the same time but may be performed at different times, and need not be performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present application. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A method for constructing a visual map point, comprising:
when a map building instruction is detected, acquiring a real-time visual frame, analyzing and screening the real-time visual frame, and generating a visual key frame;
determining a first map point and a common view based on the visual key frame, and judging whether a feature point of the visual key frame has a depth value and/or the first map point is associated, wherein the first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point;
if the feature point of the visual key frame is associated with the first map point, updating the first map point and the common view based on the feature point of the visual key frame;
if the feature points of the visual key frame have depth values and do not have relevance with the first map point, constructing a second map point based on the feature points;
if the feature points of the visual key frame do not have depth values, triangularization construction is carried out on the visual key frame, the first common-view key frame and the second common-view key frame to obtain a third map point;
performing preset processing on the second map point, the third map point and a fourth map point respectively to obtain a map point meeting the standard, a rejected map point and a moving map point; rejecting the rejected map point, and moving the moving map point into the preset processing of the map points of the key frame following the visual key frame, wherein the fourth map point is a moving map point obtained by preset processing of the map points of the key frame preceding the visual key frame;
and merging the standard map point into the first map point to obtain a visual map point.
2. The method of claim 1, wherein determining a first map point and a co-view based on the visual keyframe comprises:
judging whether the visual key frame is an initial key frame;
if yes, extracting feature points with depth values in the real-time visual frame, constructing map points based on the feature points, and initializing the map points to obtain a first map point and a common view.
3. The method of claim 1, wherein the performing analysis screening on the real-time visual frames to generate visual key frames comprises:
acquiring equipment displacement information, wherein the equipment displacement information is the movement information of shooting equipment;
and determining whether the shooting equipment generates a preset displacement distance or not based on the equipment displacement information, and if so, comparing and analyzing the real-time visual frame and a previous key frame of the real-time visual frame to generate a visual key frame.
4. The method of claim 2, wherein extracting feature points with depth values in the real-time visual frame comprises:
determining an image pyramid based on the real-time visual frame, and extracting feature points from each layer of image of the image pyramid to obtain feature points with depth values;
and carrying out homogenization treatment on the characteristic points to obtain the treated characteristic points.
5. The method of claim 4, wherein the homogenizing the feature points to obtain processed feature points, further comprising:
calculating the rotation main direction of the characteristic points to obtain direction variation;
and binding the direction variable quantity and the characteristic point to obtain the bound characteristic point.
6. The method of claim 1, wherein triangulating the visual key frame, the first common view key frame and the second common view key frame to obtain a third map point comprises:
acquiring a first common-view key frame and a second common-view key frame, wherein the first common-view key frame is a key frame meeting the common-view degree requirement of the visual key frame, and the second common-view key frame is a key frame meeting the common-view degree requirement of the first common-view key frame;
respectively carrying out feature point matching on the visual key frame, the first common-view key frame and the second common-view key frame to obtain matched feature points;
and constructing the matched feature points through triangulation to obtain a third map point.
7. The method of claim 1, wherein said merging the qualifying map point into the first map point results in a visual map point, and thereafter further comprising:
determining a historical key frame set based on the visual key frame, and screening historical map points in the historical key frame set to obtain a historical first map point;
and carrying out map construction on the historical first map point and the first map point to obtain a visual map.
8. An apparatus for constructing a visual map point, comprising:
the acquisition and generation module is used for acquiring a real-time visual frame after a map construction instruction is detected, analyzing and screening the real-time visual frame and generating a visual key frame;
a determination and judgment module, configured to determine a first map point and a common view based on the visual key frame, and judge whether a feature point of the visual key frame has a depth value and/or the first map point is associated, where the first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point;
an updating module, configured to update the first map point and the co-view based on the feature point of the visual key frame when the feature point of the visual key frame is associated with the first map point;
a first construction module for constructing a second map point based on feature points of the visual key frame when the feature points have depth values and are not associated with the first map point;
the second construction module is used for triangularizing the visual key frame, the first common-view key frame and the second common-view key frame to obtain a third map point when the feature points of the visual key frame do not have depth values;
the processing module is used for respectively performing preset processing on the second map point, the third map point and the fourth map point to obtain a map point meeting the standard, a rejected map point and a moving map point, rejecting the rejected map point, and moving the moving map point into the preset processing of the map points of the key frame following the visual key frame, wherein the fourth map point is a moving map point obtained by preset processing of the map points of the key frame preceding the visual key frame;
and the fusion construction module is used for merging the standard map points into the first map points to obtain the visual map points.
9. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to: perform the method of constructing a visual map point according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the method for constructing a visual map point according to any one of claims 1 to 7.