CN114998743A - Method, device, equipment and medium for constructing visual map points - Google Patents

Method, device, equipment and medium for constructing visual map points

Info

Publication number
CN114998743A
CN114998743A (Application CN202210795743.7A)
Authority
CN
China
Prior art keywords
key frame
visual
map
map point
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210795743.7A
Other languages
Chinese (zh)
Inventor
邢志伟
魏伟
赵信宇
魏金生
李骥
龙建睿
颜世龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Dadao Zhichuang Technology Co ltd
Original Assignee
Guangdong Dadao Zhichuang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Dadao Zhichuang Technology Co ltd filed Critical Guangdong Dadao Zhichuang Technology Co ltd
Priority to CN202210795743.7A priority Critical patent/CN114998743A/en
Publication of CN114998743A publication Critical patent/CN114998743A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 - Matching configurations of points or features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a method, a device, equipment and a medium for constructing visual map points, relating to the field of map point generation, and comprising the following steps: a real-time visual frame is analyzed and screened to generate a visual key frame, and a first map point and a common view are determined based on the visual key frame. If a feature point of the visual key frame is associated, the first map point and the common view are updated; if a feature point of the visual key frame has a depth value and is not associated, a second map point is constructed; if a feature point of the visual key frame has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. Preset screening is performed on the second, third and fourth map points to obtain map points that reach the standard, rejected map points and moving map points, and the map points that reach the standard are merged into the first map points to obtain the visual map points. The method and the device have the effect of improving the robustness of the visual map points.

Description

Method, device, equipment and medium for constructing visual map points
Technical Field
The present application relates to the field of map point construction technologies, and in particular, to a method, an apparatus, a device, and a medium for constructing visual map points.
Background
The core task of a visual SLAM (Simultaneous Localization and Mapping) algorithm is the construction of a visual map, which is generally composed of a set of visual map points carrying descriptive information.
Currently, an RGBD camera or a binocular camera may be used to acquire visual frames of a target scene, from which a visual map of the target scene is constructed. However, the construction of visual map points is often a one-shot process, so the constructed visual map contains map points generated from dynamic objects as well as noise points; during subsequent loop detection or relocalization, such map points reduce the accuracy of loop detection and relocalization.
Disclosure of Invention
In order to improve the robustness of visual map points, the present application provides a method, an apparatus, a device, and a medium for constructing visual map points.
In a first aspect, the present application provides a method for constructing a visual map point, which is implemented by the following technical solutions:
a method for constructing a visual map point, comprising:
when a map building instruction is detected, acquiring a real-time visual frame, analyzing and screening the real-time visual frame, and generating a visual key frame;
determining a first map point and a common view based on the visual key frame, and judging whether the feature point of the visual key frame has a depth value and/or the first map point is associated, wherein the first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point;
if the feature point of the visual key frame is associated with the first map point, updating the first map point and the common view based on the feature point of the visual key frame;
if the feature points of the visual key frame have depth values and are not associated with the first map point, constructing a second map point based on the feature points;
if the feature points of the visual key frame do not have depth values, triangularizing construction is carried out on the visual key frame, the first common-view key frame and the second common-view key frame to obtain a third map point;
performing preset screening on the second map points, the third map points and the fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points; rejecting the rejected map points, and carrying the moving map points into the preset screening of the next key frame after the visual key frame, the fourth map points being the moving map points obtained from the preset screening of the map points of the previous key frame of the visual key frame;
and merging the standard map points into the first map points to obtain the visual map points.
By adopting the above technical scheme, when map points of the visual map are constructed, a real-time visual frame is acquired after a map construction instruction is detected, and the real-time visual frame is analyzed and screened to generate a visual key frame. A first map point and a common view are then determined based on the visual key frame, and it is judged whether a feature point of the visual key frame has a depth value and/or is associated with the first map point, where the first map point is an original map point of the visual key frame and the common view is the common view corresponding to the first map point. When a feature point of the visual key frame is associated with the first map point, the first map point and the common view are updated based on the feature point; when a feature point has a depth value and is not associated with the first map point, a second map point is constructed based on the feature point; when a feature point has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. Preset screening is then performed on the second, third and fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points; the rejected map points are removed, the moving map points are carried into the preset screening of the next key frame, and the fourth map points are the moving map points obtained from the preset screening of the previous key frame. Finally, the map points that reach the standard are merged into the first map points to obtain the visual map points, so that robust visual map points are obtained and the robustness of the visual map is improved.
In another possible implementation manner, determining the first map point and the co-view based on the visual key frame includes:
judging whether the visual key frame is an initial key frame;
if so, extracting the feature points with the depth values in the real-time visual frame, constructing map points based on the feature points, and initializing the map points to obtain a first map point and a common view.
According to the above technical scheme, when the visual key frame is initialized, whether it is the initial key frame is judged; if so, the feature points with depth values in the real-time visual frame are extracted, map points are constructed based on the feature points, and the map points are initialized to obtain the first map point and the common view, laying a foundation for the construction of the subsequent visual map.
In another possible implementation manner, performing analysis screening on the real-time visual frames to generate the visual key frames includes:
acquiring equipment displacement information, wherein the equipment displacement information is the movement information of the shooting equipment;
and determining whether the shooting equipment generates a preset displacement distance or not based on the equipment displacement information, and if so, comparing and analyzing the real-time visual frame and a previous key frame of the real-time visual frame to generate a visual key frame.
According to the above technical scheme, when the visual key frame is generated, the device displacement information, i.e. the movement information of the shooting device, is acquired; whether the shooting device has produced the preset displacement distance is then determined based on the device displacement information, and if so, the real-time visual frame is compared with and analyzed against the previous key frame of the real-time visual frame to generate the visual key frame. This reduces the amount of data to be processed and improves the accuracy of the map data features of the visual key frames.
In another possible implementation manner, extracting feature points with depth values in the real-time visual frame includes:
determining an image pyramid based on the real-time visual frame, and extracting feature points from each layer of image of the image pyramid to obtain feature points with depth values;
and carrying out homogenization treatment on the characteristic points to obtain the treated characteristic points.
According to the above technical scheme, when the feature points are extracted, an image pyramid is determined based on the real-time visual frame, feature points are extracted from each layer of the image pyramid to obtain feature points with depth values, and the feature points are then homogenized to obtain the processed feature points. Feature point aggregation is thereby reduced, improving the accuracy and real-time performance of the map points and the common view.
In another possible implementation manner, the homogenizing processing is performed on the feature points to obtain processed feature points, and then the method further includes:
calculating the rotation main direction of the characteristic points to obtain direction variable quantity;
and binding the direction variable quantity and the characteristic points to obtain the bound characteristic points.
Through the above technical scheme, after the feature points are extracted, the rotation principal direction of each feature point is calculated to obtain the direction variation, which is then bound to the feature point to obtain the bound feature point; this facilitates subsequent feature point matching and improves matching accuracy.
In another possible implementation manner, triangulating the visual key frame, the first common view key frame, and the second common view key frame to obtain a third map point includes:
acquiring a first common-view key frame and a second common-view key frame, wherein the first common-view key frame is a key frame meeting the common-view degree requirement of the visual key frames, and the second common-view key frame is a key frame meeting the common-view degree requirement of the first common-view key frames;
respectively carrying out feature point matching on the visual key frame, the first common-view key frame and the second common-view key frame to obtain matched feature points;
and constructing the matched feature points by triangulation to obtain a third map point.
According to the above technical scheme, when the third map point is constructed, the first common-view key frame and the second common-view key frame are acquired, where the first common-view key frame is a key frame meeting the common-view degree requirement of the visual key frame, and the second common-view key frame is a key frame meeting the common-view degree requirement of the first common-view key frame. The visual key frame is then matched against the first common-view key frame and the second common-view key frame respectively by feature points to obtain matched feature points, and the matched feature points are constructed by triangulation to obtain the third map point, thereby improving the accuracy of the third map point.
In another possible implementation manner, after merging the map points that reach the standard into the first map points to obtain the visual map points, the method further comprises:
determining a historical key frame set based on the visual key frame, and screening historical map points in the historical key frame set to obtain a historical first map point;
and carrying out map construction on the historical first map points and the first map points to obtain the visual map.
According to the above technical scheme, when the visual map is constructed, the historical key frame set is determined from the visual key frames, and the historical map points in the historical key frame set are screened to obtain the historical first map points; the visual map is then constructed based on the historical first map points and the first map points, improving the robustness of the visual map.
In a second aspect, the present application provides a device for constructing a visual map point, which adopts the following technical solutions:
an apparatus for constructing a visual map point, comprising:
the acquisition and generation module is used for acquiring a real-time visual frame after a map construction instruction is detected, analyzing and screening the real-time visual frame and generating a visual key frame;
a determination and judgment module, configured to determine a first map point and a common view based on the visual key frame, and judge whether a feature point of the visual key frame has a depth value and/or the first map point is associated, where the first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point;
an updating module, configured to update the first map point and the co-view based on the feature point of the visual key frame when the feature point of the visual key frame is associated with the first map point;
a first construction module for constructing second map points based on feature points of the visual key frame when the feature points have depth values and are not associated with the first map points;
the second construction module is used for triangularizing the visual key frame, the first common-view key frame and the second common-view key frame to obtain a third map point when the feature points of the visual key frame do not have depth values;
the processing module is used for performing preset screening on the second map points, the third map points and the fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points, rejecting the rejected map points, and carrying the moving map points into the preset screening of the next key frame after the visual key frame, the fourth map points being the moving map points obtained from the preset screening of the map points of the previous key frame of the visual key frame;
and the fusion construction module is used for merging the standard map points into the first map points to obtain the visual map points.
By adopting the above technical scheme, when map points of the visual map are constructed, a real-time visual frame is acquired after a map construction instruction is detected, and the real-time visual frame is analyzed and screened to generate a visual key frame. A first map point and a common view are then determined based on the visual key frame, and it is judged whether a feature point of the visual key frame has a depth value and/or is associated with the first map point, where the first map point is an original map point of the visual key frame and the common view is the common view corresponding to the first map point. When a feature point of the visual key frame is associated with the first map point, the first map point and the common view are updated based on the feature point; when a feature point has a depth value and is not associated with the first map point, a second map point is constructed based on the feature point; when a feature point has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. Preset screening is then performed on the second, third and fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points; the rejected map points are removed, the moving map points are carried into the preset screening of the next key frame, and the fourth map points are the moving map points obtained from the preset screening of the previous key frame. Finally, the map points that reach the standard are merged into the first map points to obtain the visual map points, so that robust visual map points are obtained.
In a possible implementation manner, when determining the first map point and the co-view based on the visual key frame, the determination module is specifically configured to:
judging whether the visual key frame is an initial key frame;
if yes, extracting feature points with depth values in the real-time visual frame, constructing map points based on the feature points, and initializing the map points to obtain a first map point and a common view.
In another possible implementation manner, the obtaining and generating module is specifically configured to, when performing analysis and screening on a real-time visual frame and generating a visual key frame:
acquiring equipment displacement information, wherein the equipment displacement information is the movement information of the shooting equipment;
and determining whether the shooting equipment generates a preset displacement distance or not based on the equipment displacement information, and if so, comparing and analyzing the real-time visual frame and a previous key frame of the real-time visual frame to generate a visual key frame.
In another possible implementation manner, when extracting the feature points with depth values in the real-time visual frame, the determination module is specifically configured to:
determining an image pyramid based on the real-time visual frame, and extracting feature points from each layer of image of the image pyramid to obtain feature points with depth values;
and carrying out homogenization treatment on the characteristic points to obtain the treated characteristic points.
In another possible implementation manner, the apparatus further includes: a direction calculation module and a data binding module, wherein,
the direction calculation module is used for calculating the main rotation direction of the characteristic points to obtain direction variation;
and the data binding module is used for binding the direction variable quantity and the characteristic points to obtain the bound characteristic points.
In another possible implementation manner, when triangulating the visual key frame, the first common-view key frame, and the second common-view key frame to obtain a third map point, the second construction module is specifically configured to:
acquiring a first common-view key frame and a second common-view key frame, wherein the first common-view key frame is a key frame meeting the common-view degree requirement of the visual key frame, and the second common-view key frame is a key frame meeting the common-view degree requirement of the first common-view key frame;
respectively carrying out feature point matching on the visual key frame, the first common-view key frame and the second common-view key frame to obtain matched feature points;
and constructing the matched feature points through triangulation to obtain a third map point.
In another possible implementation manner, the apparatus further includes: a data screening module and a map construction module, wherein,
the data screening module is used for determining a historical key frame set based on the visual key frame and screening historical map points in the historical key frame set to obtain a historical first map point;
and the map construction module is used for carrying out map construction on the historical first map points and the first map points to obtain the visual map.
In a third aspect, the present application provides an electronic device, which adopts the following technical solutions:
an electronic device, comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method for constructing a visual map point according to any one of claims 1 to 7.
In a fourth aspect, the present application provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of constructing a visual map point according to any one of claims 1 to 7.
In summary, the present application includes at least one of the following beneficial technical effects:
1. when map points of a visual map are constructed, a real-time visual frame is acquired after a map construction instruction is detected, and the real-time visual frame is analyzed and screened to generate a visual key frame. A first map point and a common view are then determined based on the visual key frame, and it is judged whether a feature point of the visual key frame has a depth value and/or is associated with the first map point, where the first map point is an original map point of the visual key frame and the common view is the common view corresponding to the first map point. When a feature point of the visual key frame is associated with the first map point, the first map point and the common view are updated based on the feature point; when a feature point has a depth value and is not associated with the first map point, a second map point is constructed based on the feature point; when a feature point has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. Preset screening is then performed on the second, third and fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points; the rejected map points are removed, the moving map points are carried into the preset screening of the next key frame of the visual key frame, and the fourth map points are the moving map points obtained from the preset screening of the map points of the previous key frame of the visual key frame. The map points that reach the standard are then merged into the first map points to obtain the visual map points, so that robust visual map points are obtained and the robustness of the visual map is improved;
2. when the visual key frame is initialized, whether it is the initial key frame is judged; if so, the feature points with depth values in the real-time visual frame are extracted, map points are constructed based on the feature points, and the map points are initialized to obtain the first map points and the common view, laying a foundation for the construction of the subsequent visual map.
Drawings
Fig. 1 is a schematic flowchart of a method for constructing a visual map point according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a construction apparatus for a visual map point according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to figures 1-3.
After reading the present specification, a person skilled in the art may make modifications to the present embodiments as necessary without inventive contribution; however, such modifications are protected by patent law only insofar as they fall within the scope of the claims of the present application.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In addition, the term "and/or" herein is only one kind of association relationship describing an associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship, unless otherwise specified.
The terms used in this scheme are explained further:
SLAM: simultaneous localization and mapping, instantaneous localization and mapping.
Feature points: feature points are pixels in a picture that differ markedly from other pixels, together with the texture information formed by their surrounding pixels. They are the main processing objects of both the human visual system and a robot visual SLAM system, and have characteristics that can be extracted and matched for association. Common feature points typically exhibit extremely high or low brightness, a distinctive distribution of surrounding pixels, extreme pixel gradients in multiple directions, and the like. Feature point extraction means finding such special pixels in the picture and recording their coordinate information.
Describing a feature point means assigning an extracted feature point a specific, recognizable ID, usually related to the brightness of the surrounding image, which can be likened to issuing an identification number to each feature point. The description of a feature point generally also includes an intensity value indicating how prominent the feature is: the higher the value, the less sensitive the feature point is to ambient illumination and the more stable it is; the lower the value, the less reliable it is. Each extracted feature point can therefore be represented as
F = ((u, v), s, d)
where (u, v) represents the position coordinates of the feature point, s is its intensity value, and d is a descriptor (typically a fixed-size matrix).
Common feature point extraction algorithms include Harris, SIFT, SURF, ORB, and the like. In engineering practice, ORB features have a small computation cost and simple descriptors, making them suitable for scenarios where quantity and speed take priority. ORB feature points are assumed in everything that follows.
Depth value: the distance from the imaged object to the center of the camera can be directly measured by the RGBD camera, and the distance from the imaged object to the center of the camera can be obtained by stereo vision calculation by the binocular camera.
Visual frame: a binocular image, or an RGBD image.
Key frame: constructed from a visual frame; it mainly comprises {pose, feature point set, map point set corresponding to the feature points (where a feature point has an associated map point)}, where the pose is calculated by the PnP algorithm.
Visual map point: map point for short; it mainly comprises {world coordinates, best descriptor, observation key frame set, observation distance range, average observation direction}.
Observation key frame: a key frame one of whose feature points is associated with (observes) the map point.
Observation distance range: the minimum and maximum of the distances between the observation key frames and the map point.
Observation direction: the direction of the straight line formed by an observation key frame and the map point in the world coordinate system.
Common view: a graph whose vertices are key frames; an edge indicates that the connected key frames have a common-view relationship (there exist commonly viewed map points), and the weight of the edge is the total number of commonly viewed map points.
First map point: a map point that finally forms the visual map.
Second map point: a candidate map point constructed from a feature point of a new key frame that is not associated with a first map point but has a depth value.
Third map point: a candidate map point constructed by trigonometry from a feature point of the new key frame that is not associated with a first map point and has no depth value, matched against similar feature points (likewise not associated with a first map point and without depth values) of the first and second common-view key frames of that key frame.
Fourth map point: a candidate map point that, after screening of the second and third map points, has neither been merged into the first map points nor eliminated and needs to be screened further.
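To make these definitions concrete, the following is a minimal sketch, in Python, of how the structures named above could be laid out; the field names are assumptions chosen to mirror the text, not code from the patent.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass(eq=False)           # eq=False keeps instances hashable as dict keys
class KeyFrame:
    pose: np.ndarray           # 4x4 camera pose computed by PnP
    keypoints: list            # feature point set
    map_points: dict = field(default_factory=dict)  # feature index -> MapPoint

@dataclass(eq=False)
class MapPoint:
    world_xyz: np.ndarray      # world coordinates
    best_descriptor: np.ndarray
    observations: dict = field(default_factory=dict)  # KeyFrame -> feature index
    dist_range: tuple = (float("inf"), 0.0)           # (min, max) observation distance
    mean_view_dir: np.ndarray = None                  # average observation direction

# The common view as an adjacency map: for each key frame, its neighbours
# and the edge weight (number of commonly viewed map points).
coview: dict = {}
```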
The embodiment of the application provides a method for constructing visual map points, executed by an electronic device. The electronic device may be a server or a terminal device, where the server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal device may be a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like, but is not limited thereto; the terminal device and the server may be connected directly or indirectly through wired or wireless communication, and the embodiment of the present application is not limited in this respect. As shown in fig. 1, the method includes step S10, step S11, step S12, step S13, step S14, step S15, and step S16, wherein,
and step S10, acquiring a real-time visual frame after detecting the map building instruction, and analyzing and screening the real-time visual frame to generate a visual key frame.
Specifically, the real-time visual frame is an image captured at the current moment and can be acquired by the shooting device; the visual key frame is the key frame of the current moment, extracted from the visual frames of the preceding period. A visual frame differs from a key frame in that visual frames are generated at every moment, while a key frame is the most representative one among locally similar visual frames.
Specifically, a piece of animation is essentially a number of pictures played continuously over a period of time. Each such picture is called a "visual frame", because the pictures carry the changing relationship of time and position, and when many frames are played continuously and quickly within a certain time the human eye perceives animation. Within the same time span, the more visual frames are played, the smoother the picture looks; key frames are the frames selected from the visual frames and played in combination.
In particular, generating visual key frames has the following advantages:
the information redundancy among the close frames is high, and the key frame is the most representative frame in the local close frames, so that the information redundancy can be reduced. For example, the camera is left in place, normal frames are still to be recorded, but the key frames are not increased.
The quality of pictures, the quality of characteristic points and the like are also considered when the key frames are selected, the depth of common frames is often projected onto the key frames to optimize the depth map in RGB-D SLAM related schemes such as Bundle Fusion, RKD SLAM and the like, and the key frames are the result of filtering and optimization of the common frames to a certain extent, so that useless or wrong information is prevented from entering the optimization process to damage the accuracy of positioning and mapping.
Step S11, determining the first map point and the co-view based on the visual key frame, and determining whether the feature point of the visual key frame has a depth value and/or the first map point is associated.
The first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point.
Specifically, a map point corresponding to the image is determined according to the current visual key frame, and the map point is a first map point.
In step S12, if the feature point of the visual key frame is associated with the first map point, the first map point and the co-view are updated based on the feature point of the visual key frame.
In step S13, if the feature points of the visual key frame have depth values and are not associated with the first map point, constructing a second map point based on the feature points.
Specifically, a new key frame (denoted K_c) has already undergone feature point matching with the previous key frame (denoted K_p). Let p denote a feature point in K_c. There are the following cases:
1. p has a matching feature point in K_p, and that matching feature point has an associated first map point (denoted M1). Then K_c and p are associated with M1, M1 is updated, and the common view is updated.
2. p has a matching feature point in K_p that has no associated first map point but has an associated fourth map point M4. Then K_c is added to the observation key frame set of M4. If the matching feature point has no associated fourth map point either, but p has a depth value, p can be constructed as a second map point; if p has no depth value, a third map point may be constructed by triangulation with the matching feature point (an operation included in step S14).
3. p has no matching feature point in K_p but has a depth value. Then p may be constructed as a second map point through the pinhole imaging model.
4. p has no matching feature point in K_p and no depth value. Then p proceeds to step S14.
Step S12 comprises steps S121 and S122, wherein,
S121: K_c and p are associated with M1, and M1 is updated.
Specifically, p is associated to M1, and K_c is added to the observation key frame set of M1. The world coordinates of M1 remain unchanged. The descriptors of the feature points corresponding to M1 in all of its current observation key frames (denoted K_obs, which includes K_c) are taken out and denoted D. For each descriptor d_i in D, its distance dist(d_i, d_j) to every other descriptor d_j is calculated, together with its average distance to the other descriptors:
avg_i = (1 / (n - 1)) * Σ_{j ≠ i} dist(d_i, d_j)
where n is the number of descriptors. The descriptor with the minimum average distance avg_i is updated to be the best descriptor. The distance between K_c and M1 (denoted l) is calculated, and the observation distance range of M1 (denoted [l_min, l_max]) is extended so that [l_min, l_max] includes l. The observation direction of each observation key frame in K_obs towards M1 (denoted v_k) is calculated, and the average of the v_k is updated to be the average observation direction of M1.
S122: the common view is updated.
Specifically, K_c is added to the common view as a new vertex. Each first map point M associated with K_c has an observation key frame set; all key frames in these sets (other than K_c itself) are common-view key frames of K_c. The number of times a common-view key frame K_o appears across these sets is the weight of the edge connecting the corresponding vertices of the two key frames in the common view, i.e., the number of map points commonly viewed by the two key frames.
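A sketch of this S122 update, built on the assumed structures from the earlier sketch: the edge weight between the new key frame and another key frame is the number of first map points they both observe.

```python
def update_coview(coview, new_kf):
    weights = {}
    for mp in new_kf.map_points.values():   # first map points associated with new_kf
        for kf in mp.observations:          # their observation key frames
            if kf is not new_kf:
                weights[kf] = weights.get(kf, 0) + 1
    coview[new_kf] = weights                # new vertex with its weighted edges
    for kf, w in weights.items():
        coview.setdefault(kf, {})[new_kf] = w   # keep the graph symmetric
```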
Step S14, if the feature points of the visual key frame do not have depth values, triangularization is performed on the visual key frame, the first common-view key frame, and the second common-view key frame to obtain a third map point.
Specifically, after steps S12 and S13, a considerable portion of the feature points of K_c are still neither associated with a first map point nor possess depth values, yet there is a high probability that corresponding feature points can be matched in the common-view key frames. These can then be associated with the matched first map points, or new candidate map points, namely third map points, can be generated by trigonometry.
In step S15, preset screening is performed on the second, third, and fourth map points respectively to obtain map points that reach the standard, rejected map points, and moving map points; the rejected map points are removed, and the moving map points are carried into the preset screening of the next key frame after the visual key frame.
The fourth map points are the moving map points obtained from the preset screening of the map points of the previous key frame of the visual key frame.
In the embodiment of the present application, the second, third, and fourth map points are screened, eliminated, or merged: map points that pass screening are merged into the first map points, those meeting the elimination condition are removed, and the remainder enter the processing flow of the next key frame. The second, third, and fourth map points are hereinafter collectively referred to as candidate map points M_c. Each candidate map point M_c has a corresponding generating key frame. Suppose n key frames have been generated since M_c was produced. If the number of observation key frames of M_c is less than a lower threshold, M_c is removed; if the number of observation key frames of M_c exceeds an upper threshold, M_c passes screening and is merged into the first map points; if the number of observation key frames of M_c lies between the two thresholds, M_c enters the processing flow of the next key frame as a fourth map point.
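The screening rule can be summarized as the small decision function below; min_age, t_low, and t_high stand in for thresholds the text leaves unspecified.

```python
def screen_candidate(mp, frames_since_created, min_age=2, t_low=2, t_high=3):
    """Classify a candidate map point per the S15 rule sketched above."""
    if frames_since_created < min_age:
        return "carry_over"          # too young to judge; stays a candidate
    n_obs = len(mp.observations)
    if n_obs < t_low:
        return "reject"              # eliminated
    if n_obs > t_high:
        return "promote"             # merged into the first map points
    return "carry_over"              # becomes a fourth map point next key frame
```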
And step S16, merging the standard map points into the first map points to obtain the visual map points.
Specifically, there are two ways for M_c to be incorporated into the first map points: fusion and addition. It is first judged whether M_c can be fused with an existing first map point; if not, it is added as a new first map point. The judgment for fusion is as follows. First, an M_c to be merged into the first map points must be associated with K_c; the fusion objects are the first map points M1 associated with the first and second common-view key frames K_cv of K_c. Specifically, a candidate map point M_c is projected in turn onto each key frame K_k in K_cv to match feature points. If a matched feature point exists and is associated with a first map point M1, and M_c and M1 simultaneously satisfy: 1. their world coordinate distance is within a certain range; 2. their average observation direction difference is within a certain range; 3. their observation distance ranges overlap; then M_c and M1 are fused: the world coordinates of the map point with the larger number of observation key frames are retained, the observation key frame sets of the two are merged, and the best descriptor, observation distance range, and average observation direction are updated.
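The three fusion conditions can be checked as below; the tolerances are illustrative values, since the patent only says "within a certain range", and the average observation directions are assumed to be unit vectors.

```python
import numpy as np

def can_fuse(cand, first_mp, max_dist=0.05, max_angle_deg=10.0):
    close = np.linalg.norm(cand.world_xyz - first_mp.world_xyz) < max_dist
    # average observation direction difference within a tolerance
    cos_a = float(np.dot(cand.mean_view_dir, first_mp.mean_view_dir))
    similar_dir = cos_a > np.cos(np.deg2rad(max_angle_deg))
    # observation distance ranges overlap
    overlap = (cand.dist_range[0] <= first_mp.dist_range[1] and
               first_mp.dist_range[0] <= cand.dist_range[1])
    return close and similar_dir and overlap
```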
The embodiment of the application provides a method for constructing visual map points. When map points of a visual map are constructed, a real-time visual frame is acquired after a map construction instruction is detected, and the real-time visual frame is analyzed and screened to generate a visual key frame. A first map point and a common view are then determined based on the visual key frame, and it is judged whether a feature point of the visual key frame has a depth value and/or is associated with the first map point, where the first map point is an original map point of the visual key frame and the common view is the common view corresponding to the first map point. When a feature point of the visual key frame is associated with the first map point, the first map point and the common view are updated based on the feature point; when a feature point has a depth value and is not associated with the first map point, a second map point is constructed based on the feature point; when a feature point has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. Preset screening is then performed on the second, third and fourth map points respectively to obtain map points that reach the standard, rejected map points and moving map points; the rejected map points are removed, the moving map points are carried into the preset screening of the next key frame of the visual key frame, and the fourth map points are the moving map points obtained from the preset screening of the map points of the previous key frame of the visual key frame. The map points that reach the standard are then merged into the first map points to obtain the visual map points, so that robust visual map points are obtained and the robustness of the visual map is further improved.
In a possible implementation manner of the embodiment of the present application, step S11 specifically includes step S111 (not shown in the figure) and step S112 (not shown in the figure), wherein,
step S111, determining whether the visual key frame is an initial key frame.
Specifically, when a map building instruction is detected, a visual key frame is generated from the first acquired visual frame, and the visual key frame is an initial key frame.
Step S112, if yes, extracting feature points with depth values in the real-time visual frame, constructing map points based on the feature points, and initializing the map points to obtain a first map point and a common view.
Specifically, methods for extracting feature points from the real-time visual frame include Harris, ORB, SURF, SIFT, and the like. On top of feature point selection, the embodiment of the application adds two steps, image pyramid construction and feature point homogenization, and attaches rotation principal direction information to the feature points, which is used for feature point matching between two images; finally, the first map points constructed based on the feature points and the common view are obtained.
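For reference, extracting ORB feature points with an image pyramid is a one-liner in OpenCV; this is a minimal sketch (the patent does not name a library), with the pyramid parameters matching the example given later.

```python
import cv2

def extract_orb(gray, n_features=1000, scale=1.2, levels=3):
    orb = cv2.ORB_create(nfeatures=n_features, scaleFactor=scale, nlevels=levels)
    # keypoints carry pixel coordinates, pyramid level (octave) and a
    # rotation principal direction (angle); descriptors are 32-byte binary
    return orb.detectAndCompute(gray, None)
```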
In another possible implementation manner of the embodiment of the present application, step S10 specifically includes: step S101 (not shown), and step S102 (not shown), wherein,
step S101, obtaining equipment displacement information.
Wherein the device displacement information is movement information of the photographing device.
And S102, determining whether the shooting equipment generates a preset displacement distance or not based on the equipment displacement information, and if the shooting equipment generates the preset displacement distance, comparing and analyzing the real-time visual frame and a previous key frame of the real-time visual frame to generate a visual key frame.
Specifically, motion filtering determines, based on motion information, whether the current visual frame needs to be processed: if the change between the pose of the current visual frame and the pose of the previous processed visual frame is below a certain range, the visual frame is skipped; otherwise it is taken as a visual frame to be processed for further processing. The pose change includes displacement and attitude change, and the motion information can be acquired by a motion sensor such as an IMU or a wheel encoder.
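A sketch of this motion gate, assuming 4x4 homogeneous pose matrices; the thresholds are illustrative, not values from the patent.

```python
import numpy as np

def should_process(curr_pose, last_pose, min_trans=0.05, min_rot_deg=5.0):
    rel = np.linalg.inv(last_pose) @ curr_pose         # relative pose change
    trans = np.linalg.norm(rel[:3, 3])                 # displacement
    cos_theta = np.clip((np.trace(rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_theta))           # attitude change
    return trans > min_trans or angle > min_rot_deg    # skip frame if False
```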
The visual frame to be processed is compared with and analyzed against the previous key frame to judge whether a new key frame needs to be generated; if so, one is generated, otherwise the frame is skipped. The comparison method is as follows: the feature points of the visual frame to be processed are extracted and matched with the feature points of the previous key frame; if the ratio of matched feature points is below a certain threshold, the current visual frame differs greatly from the previous key frame, i.e. the viewed scene has changed substantially, and a new key frame needs to be generated; otherwise no new key frame is needed. For feature point matching between frames (visual frame to key frame, or key frame to key frame), a screening step based on the rotation principal direction is added to the general matching result. Specifically, the change in rotation principal direction of every matched feature point pair is calculated and placed into a number of intervals; the matched feature points falling in the 3 intervals with the largest counts are kept and the rest are removed. In general, the changes in rotation principal direction of correctly matched feature points between two frames are similar; the interval size, the number of intervals, and the number of retained intervals can be configured as required.
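The rotation-principal-direction screening can be sketched as a histogram filter; angles1/angles2 are the principal directions of the matched feature points in the two frames, and the bin count and retained-interval count are configurable, as the text notes.

```python
import numpy as np

def filter_by_rotation(matches, angles1, angles2, n_bins=12, keep=3):
    delta = (np.asarray(angles1) - np.asarray(angles2)) % 360.0
    bins = (delta / (360.0 / n_bins)).astype(int)   # interval index per match
    counts = np.bincount(bins, minlength=n_bins)
    best = set(np.argsort(counts)[-keep:])          # the `keep` fullest intervals
    return [m for m, b in zip(matches, bins) if b in best]
```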
In a possible implementation manner of the embodiment of the present application, step S112 (not shown in the figure) specifically includes step Sa (not shown in the figure) and step Sb (not shown in the figure), wherein,
and step Sa, determining an image pyramid based on the real-time visual frame, and extracting feature points from each layer of image of the image pyramid to obtain feature points with depth values.
Specifically, the image of the visual frame is scaled over multiple levels. For example, with the 480 × 640 resolution original image as the layer 0 image, scaling by a factor of 1.2 yields a 400 × 533 layer 1 image and a 333 × 444 layer 2 image; the successively scaled layers form the image pyramid of the visual frame, and feature point extraction can be performed on each layer according to the scaling factor and the number of pyramid layers set for the project. The image pyramid is a more complete description of the current visual frame. Suppose the visual frame image acquired near a target object is denoted frame 1 and the one acquired far from the target object is denoted frame 2. The feature points of the target object may fail to match between the original images of frame 1 and frame 2, or between images at the same pyramid level, and yet match between a high-level image of frame 1 and a low-level image of frame 2; this can be understood as the near view being easier to match with the distant view once it has been scaled down.
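The pyramid resolutions in the example follow directly from the scale factor:

```python
def pyramid_resolutions(height=480, width=640, scale=1.2, n_levels=3):
    """Level 0 is the original image; each further level divides both
    dimensions by the scale factor (matches the 480x640 example above)."""
    return [(round(height / scale ** i), round(width / scale ** i))
            for i in range(n_levels)]

# pyramid_resolutions() -> [(480, 640), (400, 533), (333, 444)]
```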
And step Sb, carrying out homogenization treatment on the characteristic points to obtain the treated characteristic points.
Specifically, after feature points are extracted from an ordinary image, feature point aggregation easily occurs; generating map points directly from them brings a large error to pose calculation, so the feature points are homogenized to reduce aggregation. For example, the 480 × 640 resolution image can be divided into 100 regions of 10 × 10, each region being 48 × 64, and the best 10 feature points of each region are selected. The way the regions are divided and the number of optimal feature points per region can be configured according to engineering requirements.
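A sketch of the grid homogenization with the example numbers, assuming OpenCV-style keypoints whose score is kp.response:

```python
def homogenize(keypoints, rows=10, cols=10, per_cell=10, height=480, width=640):
    cells = {}
    for kp in keypoints:                      # bucket by 48x64 grid cell
        key = (int(kp.pt[1] * rows / height), int(kp.pt[0] * cols / width))
        cells.setdefault(key, []).append(kp)
    kept = []
    for pts in cells.values():                # keep the strongest per cell
        pts.sort(key=lambda k: k.response, reverse=True)
        kept.extend(pts[:per_cell])
    return kept
```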
In another possible implementation manner of the embodiment of the present application, step Sb (not shown in the figure) further includes: step Sb1 (not shown), and step Sb2 (not shown), wherein,
step Sb1, the rotation principal direction of the feature point is calculated to obtain the direction variation.
Specifically, an image block within a certain distance range centered on the pixel of the feature point is extracted. The weighted sum of the grey values of all pixels of the image block gives the grey centre; the pixel coordinate of the feature point is the geometric centre of the image block; and the direction of the vector from the geometric centre to the grey centre is the rotation principal direction of the feature point. Its function is as follows: when feature points of two images are matched, the changes in rotation principal direction of most correctly matched feature points are similar, so the matching result can be screened by it, improving matching accuracy.
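This is the classic intensity-centroid rule; a sketch, assuming the patch lies fully inside the image and an illustrative radius:

```python
import numpy as np

def principal_direction(gray, cx, cy, radius=15):
    patch = gray[cy - radius:cy + radius + 1,
                 cx - radius:cx + radius + 1].astype(float)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    m10 = (xs * patch).sum()                  # grey-weighted x moment
    m01 = (ys * patch).sum()                  # grey-weighted y moment
    # angle of the vector from the geometric centre to the grey centre
    return np.degrees(np.arctan2(m01, m10))
```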
And step Sb2, binding the direction variation quantity with the feature points to obtain the bound feature points.
In another possible implementation manner of the embodiment of the present application, step S14 specifically includes step S141 (not shown in the figure), step S142 (not shown in the figure), and step S143 (not shown in the figure), wherein,
in step S141, a first common view key frame and a second common view key frame are obtained.
The first common-view key frame is a key frame meeting the common-view degree requirement with respect to the visual key frame, and the second common-view key frame is a key frame meeting the common-view degree requirement with respect to the first common-view key frame.
Specifically, take the common-view degree requirement to be the ten frames with the highest common-view degree. The first common-view key frames are then the ten key frames with the highest common-view degree with the visual key frame, and the second common-view key frames are, for each first common-view key frame, the ten key frames with the highest common-view degree with it. The total number of first and second common-view key frames is therefore at most 10 + 10 × 10 = 110. All first and second common-view key frames together are denoted K_cov, with an individual member denoted K_i.
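A minimal sketch of gathering the set K_cov follows; get_top_covisible() is a hypothetical accessor standing in for whatever common-view bookkeeping the map maintains.

```python
def gather_covisible_keyframes(keyframe, top_n=10):
    """Collect first and second common-view key frames (at most 110 here)."""
    first = get_top_covisible(keyframe, top_n)           # up to 10 frames
    candidates = list(first)
    for kf in first:
        candidates.extend(get_top_covisible(kf, top_n))  # up to 10 x 10 more
    seen, result = set(), []
    for kf in candidates:                                # dedupe, drop self
        if kf is not keyframe and id(kf) not in seen:
            seen.add(id(kf))
            result.append(kf)
    return result
```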
And step S142, respectively carrying out feature point matching on the visual key frame, the first common-view key frame and the second common-view key frame to obtain matched feature points.
Specifically, the visual key frame is matched for feature points against each common-view key frame K_i in K_cov in turn, the matching being performed in the same way as the inter-frame feature point matching in step S102. The only addition is a scene depth check before matching, specifically: for each K_i, the median of the depth values of all its feature points that have depth values is calculated as the scene depth median of K_i, along with the relative displacement between K_i and the visual key frame (which serves as the baseline in the triangulation); if the ratio of this baseline to the scene depth median of K_i is small, K_i is skipped, because map points triangulated from such a short baseline are unreliable.
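A minimal sketch of this per-K_i depth check follows; the feature and depth attribute names, the camera position t, and the 0.01 ratio threshold are all assumptions.

```python
import numpy as np

def passes_depth_check(visual_kf, coview_kf, min_ratio=0.01):
    """Skip frames whose baseline is tiny relative to their scene depth."""
    depths = [f.depth for f in coview_kf.features if f.depth is not None]
    if not depths:
        return False
    scene_depth_median = float(np.median(depths))
    baseline = float(np.linalg.norm(visual_kf.t - coview_kf.t))
    return baseline / scene_depth_median >= min_ratio
```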
And S143, constructing the matched feature points through triangulation to obtain third map points.
Specifically, let p be a feature point of the visual key frame and q its matched feature point in a common-view key frame K_i. If q is associated with a first map point P, P is reprojected onto the visual key frame and the reprojection error is checked with a chi-square test; if the chi-square test passes, p is directly associated with P. If the chi-square test fails, or q has no associated first map point, a map point is constructed by triangulation; it is reprojected onto both the visual key frame and K_i, the reprojection errors are calculated and checked with the chi-square test, and if the test passes the map point is kept as a third map point, otherwise it is discarded.
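The associate-or-triangulate flow just described can be sketched as follows; reproject(), chi2_passes() and triangulate() are hypothetical stand-ins for the operations the text names, not calls from a concrete library.

```python
def build_third_map_points(visual_kf, coview_kf, matches):
    """Associate existing map points where possible, triangulate otherwise."""
    new_points = []
    for p, q in matches:                  # p in visual_kf, q in coview_kf
        mp = q.map_point                  # associated first map point, if any
        if mp is not None:
            err = reproject(mp, visual_kf) - p.pixel
            if chi2_passes(err):          # reuse the existing map point
                p.map_point = mp
                continue
        # Chi-square failed, or q has no map point: triangulate a new one.
        mp = triangulate(p, q, visual_kf, coview_kf)
        if (chi2_passes(reproject(mp, visual_kf) - p.pixel) and
                chi2_passes(reproject(mp, coview_kf) - q.pixel)):
            new_points.append(mp)         # kept as a third map point
    return new_points
```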
The reprojection-error chi-square test here covers two cases. A map point P (either a first map point or a newly triangulated map point) is reprojected onto a key frame K at reprojection coordinates u′, while the feature point corresponding to P in K has coordinates u. If u has no depth value, a 2-degree-of-freedom chi-square test is performed, i.e. only the error between u and u′ in the two pixel directions is calculated. If u has a depth value, a 3-degree-of-freedom chi-square test is performed, adding the error in the depth direction; this error must be normalized, under the assumption that the errors in the three directions are independent and normally distributed.
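A minimal sketch of the two test cases follows; the 95% chi-square critical values (5.991 for 2 degrees of freedom, 7.815 for 3) are standard, while the per-direction standard deviations used for normalization are assumed settings.

```python
import numpy as np

CHI2_2DOF, CHI2_3DOF = 5.991, 7.815       # 95% critical values

def chi2_passes(pixel_err, depth_err=None, sigma_px=1.0, sigma_depth=1.0):
    """2-DoF test on pixel error alone; 3-DoF once a depth error is added."""
    e = np.asarray(pixel_err, dtype=np.float64) / sigma_px
    chi2 = float(e @ e)                   # error in the two pixel directions
    if depth_err is None:
        return chi2 <= CHI2_2DOF
    chi2 += (depth_err / sigma_depth) ** 2  # normalized depth-direction error
    return chi2 <= CHI2_3DOF
```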
In another possible implementation manner of the embodiment of the present application, step S16 is followed by step S161 (not shown in the figure) and step S162 (not shown in the figure), wherein,
step S161, determining a historical keyframe set based on the visual keyframe, and screening historical map points in the historical keyframe set to obtain a historical first map point.
And step S162, carrying out map construction on the historical first map point and the first map point to obtain a visual map.
The above embodiments describe a method for constructing a visual map point from the perspective of a method flow; the following embodiments describe a device for constructing a visual map point from the perspective of virtual modules or virtual units, as detailed below.
An embodiment of the present application provides a device for constructing a visual map point, as shown in fig. 2, the device 20 for constructing a visual map point may specifically include: an acquisition generation module 21, a determination judgment module 22, an update module 23, a first construction module 24, a second construction module 25, a processing module 26, and a fusion construction module 27, wherein,
the acquisition and generation module 21 is configured to acquire a real-time visual frame after detecting a map construction instruction, and analyze and screen the real-time visual frame to generate a visual key frame;
the determination and judgment module 22 is configured to determine a first map point and a common view based on the visual key frame, and judge whether a feature point of the visual key frame has a depth value and/or the first map point is associated, where the first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point;
an updating module 23, configured to update the first map point and the common view based on the feature point of the visual key frame when the feature point of the visual key frame is associated with the first map point;
a first construction module 24, configured to construct a second map point based on the feature points when the feature points of the visual key frame have depth values and are not associated with the first map point;
the second construction module 25 is configured to perform triangularization construction on the visual key frame, the first common-view key frame and the second common-view key frame to obtain a third map point when the feature points of the visual key frame do not have depth values;
a processing module 26, configured to perform preset processing on the second map point, the third map point and the fourth map point respectively to obtain map points that meet the standard, rejected map points and moving map points; to reject the rejected map points; and to move the moving map points into the preset processing of the map points of the key frame following the visual key frame, where the fourth map point is a moving map point obtained from the preset processing of the map points of the key frame preceding the visual key frame;
and the fusion construction module 27 is configured to merge the standard map points into the first map points to obtain visual map points.
In a possible implementation manner of the embodiment of the present application, when determining the first map point and the common view based on the visual key frame, the determination and judgment module 22 is specifically configured to:
judging whether the visual key frame is an initial key frame or not;
if yes, extracting feature points with depth values in the real-time visual frame, constructing map points based on the feature points, and initializing the map points to obtain a first map point and a common view.
In another possible implementation manner of the embodiment of the present application, when analyzing and screening the real-time visual frame to generate the visual key frame, the acquisition and generation module 21 is specifically configured to:
acquiring equipment displacement information, wherein the equipment displacement information is the movement information of the shooting equipment;
and determining whether the shooting equipment generates a preset displacement distance or not based on the equipment displacement information, and if so, comparing and analyzing the real-time visual frame and a previous key frame of the real-time visual frame to generate a visual key frame.
In another possible implementation manner of the embodiment of the present application, when extracting the feature points with depth values in the real-time visual frame, the determination and judgment module 22 is specifically configured to:
determining an image pyramid based on the real-time visual frame, and extracting feature points from each layer of image of the image pyramid to obtain feature points with depth values;
and carrying out homogenization treatment on the characteristic points to obtain the treated characteristic points.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: a direction calculation module and a data binding module, wherein,
the direction calculation module is used for calculating the main rotation direction of the characteristic points to obtain direction variation;
and the data binding module is used for binding the direction variable quantity and the characteristic points to obtain the bound characteristic points.
In another possible implementation manner of the embodiment of the present application, when triangulating the visual key frame, the first common-view key frame and the second common-view key frame to obtain the third map point, the second construction module 25 is specifically configured to:
acquiring a first common-view key frame and a second common-view key frame, wherein the first common-view key frame is a key frame meeting the common-view degree requirement of the visual key frames, and the second common-view key frame is a key frame meeting the common-view degree requirement of the first common-view key frames;
respectively carrying out feature point matching on the visual key frame, the first common-view key frame and the second common-view key frame to obtain matched feature points;
and constructing the matched feature points through triangulation to obtain a third map point.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: a data filtering module and a map building module, wherein,
the data screening module is used for determining a historical key frame set based on the visual key frame and screening historical map points in the historical key frame set to obtain a historical first map point;
and the map construction module is used for carrying out map construction on the historical first map points and the first map points to obtain the visual map.
In an embodiment of the present application, an electronic device is provided, as shown in fig. 3, where the electronic device 300 shown in fig. 3 includes: a processor 301 and a memory 303. Wherein processor 301 is coupled to memory 303, such as via bus 302. Optionally, the electronic device 300 may also include a transceiver 304. It should be noted that the transceiver 304 is not limited to one in practical applications, and the structure of the electronic device 300 is not limited to the embodiment of the present application.
The Processor 301 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 301 may also be a combination of computing functions, e.g., comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 302 may include a path that transfers information between the above components. The bus 302 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 302 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.
The Memory 303 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
The memory 303 is used for storing application program codes for executing the scheme of the application, and the processor 301 controls the execution. The processor 301 is configured to execute application program code stored in the memory 303 to implement the aspects illustrated in the foregoing method embodiments.
Among them, electronic devices include but are not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. But also a server, etc. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
The present application provides a computer-readable storage medium on which a computer program is stored; when the program runs on a computer, the computer can execute the corresponding content in the foregoing method embodiments. Compared with the prior art, when map points of a visual map are constructed in the embodiments of the present application, a real-time visual frame is acquired after a map construction instruction is detected, and the real-time visual frame is analyzed and screened to generate a visual key frame. A first map point and a common view are then determined based on the visual key frame, and it is judged whether a feature point of the visual key frame has a depth value and/or is associated with the first map point, where the first map point is an original map point of the visual key frame and the common view is the common view corresponding to the first map point. When the feature point of the visual key frame is associated with the first map point, the first map point and the common view are updated based on the feature point. When the feature point has a depth value but is not associated with the first map point, a second map point is constructed based on the feature point. When the feature point has no depth value, the visual key frame, the first common-view key frame and the second common-view key frame are triangulated to obtain a third map point. The second, third and fourth map points are then each given preset processing to obtain map points that meet the standard, rejected map points and moving map points; the rejected map points are discarded, the moving map points are moved into the preset processing of the key frame following the visual key frame, and the standard map points are merged into the first map points to obtain visual map points. Robust visual map points are thereby obtained, improving the robustness of the visual map.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not bound to a strict order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which need not be completed at the same time but may be performed at different times, and need not be performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present application. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A method for constructing a visual map point, comprising:
when a map building instruction is detected, acquiring a real-time visual frame, analyzing and screening the real-time visual frame, and generating a visual key frame;
determining a first map point and a common view based on the visual key frame, and judging whether a feature point of the visual key frame has a depth value and/or the first map point is associated, wherein the first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point;
if the feature point of the visual key frame is associated with the first map point, updating the first map point and the common view based on the feature point of the visual key frame;
if the feature points of the visual key frame have depth values and do not have relevance with the first map point, constructing a second map point based on the feature points;
if the feature points of the visual key frame do not have depth values, triangularization construction is carried out on the visual key frame, the first common-view key frame and the second common-view key frame to obtain a third map point;
performing preset processing on the second map point, the third map point and a fourth map point respectively to obtain a map point meeting the standard, a rejected map point and a moving map point; rejecting the rejected map point, and moving the moving map point into the preset processing of the map points of the key frame following the visual key frame, wherein the fourth map point is a moving map point obtained by preset processing of the map points of the key frame preceding the visual key frame;
and merging the standard map point into the first map point to obtain a visual map point.
2. The method of claim 1, wherein determining a first map point and a co-view based on the visual keyframe comprises:
judging whether the visual key frame is an initial key frame;
if yes, extracting feature points with depth values in the real-time visual frame, constructing map points based on the feature points, and initializing the map points to obtain a first map point and a common view.
3. The method of claim 1, wherein the performing analysis screening on the real-time visual frames to generate visual key frames comprises:
acquiring equipment displacement information, wherein the equipment displacement information is the movement information of shooting equipment;
and determining whether the shooting equipment generates a preset displacement distance or not based on the equipment displacement information, and if so, comparing and analyzing the real-time visual frame and a previous key frame of the real-time visual frame to generate a visual key frame.
4. The method of claim 2, wherein extracting feature points with depth values in the real-time visual frame comprises:
determining an image pyramid based on the real-time visual frame, and extracting feature points from each layer of image of the image pyramid to obtain feature points with depth values;
and carrying out homogenization treatment on the characteristic points to obtain the treated characteristic points.
5. The method of claim 4, wherein the homogenizing the feature points to obtain processed feature points, further comprising:
calculating the rotation main direction of the characteristic points to obtain direction variation;
and binding the direction variable quantity and the characteristic point to obtain the bound characteristic point.
6. The method of claim 1, wherein triangulating the visual key frame, the first common view key frame and the second common view key frame to obtain a third map point comprises:
acquiring a first common-view key frame and a second common-view key frame, wherein the first common-view key frame is a key frame meeting the common-view degree requirement of the visual key frame, and the second common-view key frame is a key frame meeting the common-view degree requirement of the first common-view key frame;
respectively carrying out feature point matching on the visual key frame, the first common-view key frame and the second common-view key frame to obtain matched feature points;
and constructing the matched feature points through triangulation to obtain a third map point.
7. The method of claim 1, wherein said merging the qualifying map point into the first map point results in a visual map point, and thereafter further comprising:
determining a historical key frame set based on the visual key frame, and screening historical map points in the historical key frame set to obtain a historical first map point;
and carrying out map construction on the historical first map point and the first map point to obtain a visual map.
8. An apparatus for constructing a visual map point, comprising:
the acquisition and generation module is used for acquiring a real-time visual frame after a map construction instruction is detected, analyzing and screening the real-time visual frame and generating a visual key frame;
a determination and judgment module, configured to determine a first map point and a common view based on the visual key frame, and judge whether a feature point of the visual key frame has a depth value and/or the first map point is associated, where the first map point is an original map point of the visual key frame, and the common view is a common view corresponding to the first map point;
an updating module, configured to update the first map point and the co-view based on the feature point of the visual key frame when the feature point of the visual key frame is associated with the first map point;
a first construction module for constructing a second map point based on feature points of the visual key frame when the feature points have depth values and are not associated with the first map point;
the second construction module is used for triangularizing the visual key frame, the first common-view key frame and the second common-view key frame to obtain a third map point when the feature points of the visual key frame do not have depth values;
the processing module is used for respectively performing preset processing on the second map point, the third map point and the fourth map point to obtain a map point meeting the standard, a rejected map point and a moving map point, rejecting the rejected map point, and moving the moving map point into the preset processing of the map points of the key frame following the visual key frame, wherein the fourth map point is a moving map point obtained by preset processing of the map points of the key frame preceding the visual key frame;
and the fusion construction module is used for merging the standard map points into the first map points to obtain the visual map points.
9. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to: perform the method of constructing a visual map point according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the method for constructing a visual map point according to any one of claims 1 to 7.