CN114494612A - Method, device and equipment for constructing point cloud map

Info

Publication number
CN114494612A
Authority
CN
China
Prior art keywords
image
key frame
mapping
point cloud
camera
Prior art date
Legal status
Pending
Application number
CN202011154289.4A
Other languages
Chinese (zh)
Inventor
Liu Yi (刘毅)
Zhang Teng (张腾)
Current Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shirui Electronics Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shirui Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd, Guangzhou Shirui Electronics Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN202011154289.4A
Publication of CN114494612A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05Geographic models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/38Registration of image sequences
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of computer vision, and provides a method, an apparatus, equipment and a storage medium for constructing a point cloud map that improve the efficiency of point cloud map construction while preserving mapping performance. The method comprises the following steps: at least two mapping key frame images are acquired from a first image sequence shot by a first camera, and an image shot synchronously with a target mapping key frame image is acquired from a second image sequence shot by a second camera to serve as a scale reference image, where the target mapping key frame image can be one of the at least two mapping key frame images. The mapping scale of the first camera is determined from a first spatial position of the common feature points in space, obtained from the at least two mapping key frame images, and a second spatial position of the same feature points, obtained from the target mapping key frame image and the scale reference image. Finally, a point cloud map based on the target mapping key frame is constructed from the mapping scale and the first image sequence.

Description

Method, device and equipment for constructing point cloud map
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a method, an apparatus, a device, and a storage medium for constructing a point cloud map.
Background
With the development of computer vision technology, techniques for constructing point cloud maps based on computer vision have emerged. Such techniques can recover the three-dimensional spatial structure of a real scene from an image sequence, and are among the key technologies used in application scenarios such as intelligent robot navigation.
Structure from Motion (SfM) can gradually recover camera positions and key point coordinates using multi-view geometry, based on the association of two-dimensional key points between images; the incremental mapping framework is widely adopted for its robustness.
Current techniques for constructing a point cloud map generally treat the images captured by multiple cameras at each shot as a series of discrete images for incremental mapping, and then add constraints according to the fixed positions between the cameras for bundle adjustment. However, if, for example, a binocular camera shoots 100 times, 200 images must be composed individually during construction, and bundle adjustment must be run many times because of the incremental composition. Since bundle adjustment has a complexity of O(n³) in the number of images n, adding cameras greatly increases mapping time under this technique, the map scale cannot be effectively constrained, and the efficiency of point cloud map construction is low.
Disclosure of Invention
Based on this, it is necessary to provide a method, an apparatus, a device and a storage medium for constructing a point cloud map, aiming at the technical problem that the efficiency of constructing the point cloud map is low in the conventional technology.
A method of constructing a point cloud map, the method comprising:
acquiring at least two mapping key frame images from a first image sequence; the first image sequence is an image sequence shot by a first camera for constructing a point cloud map;
acquiring an image in a second image sequence that is shot synchronously with a target mapping key frame image, as a scale reference image; the second image sequence is an image sequence shot by a second camera for constructing the point cloud map; the second camera and the first camera are separated by a preset distance when synchronously shooting their image sequences; the target mapping key frame image is one of the at least two mapping key frame images;
obtaining a corresponding first spatial position of common feature points in space based on the at least two mapping key frame images; the common feature points are feature points common to the at least two mapping key frame images and the scale reference image;
obtaining a corresponding second spatial position of the common feature points in space based on the target mapping key frame image and the scale reference image;
determining a mapping scale of the first camera according to the first spatial position and the second spatial position;
and constructing, based on the mapping scale and the first image sequence, a point cloud map based on the point cloud corresponding in space to the feature points on the target mapping key frame image.
An apparatus for constructing a point cloud map, comprising:
the first image acquisition module is used for acquiring at least two mapping key frame images from the first image sequence; the first image sequence is an image sequence shot by a first camera for constructing a point cloud map;
the second image acquisition module is used for acquiring an image in the second image sequence that is shot synchronously with the target mapping key frame image, as a scale reference image; the second image sequence is an image sequence shot by a second camera for constructing the point cloud map; the second camera and the first camera are separated by a preset distance when synchronously shooting their image sequences; the target mapping key frame image is one of the at least two mapping key frame images;
a first position obtaining module, configured to obtain, based on the at least two mapping key frame images, a corresponding first spatial position of the common feature points in space; the common feature points are feature points common to the at least two mapping key frame images and the scale reference image;
a second position obtaining module, configured to obtain a corresponding second spatial position of the common feature points in space based on the target mapping key frame image and the scale reference image;
the scale determining module is used for determining the mapping scale of the first camera according to the first spatial position and the second spatial position;
and the map building module is used for constructing, based on the mapping scale and the first image sequence, a point cloud map based on the point cloud corresponding in space to the feature points on the target mapping key frame image.
Equipment for constructing a point cloud map, comprising a first camera, a second camera and a processor; the processor is configured to acquire the image sequences synchronously shot by the first camera and the second camera and to construct a point cloud map according to the above method; the first camera and the second camera are separated by a preset distance when synchronously shooting the image sequences.
An electronic device comprising a memory storing a computer program and a processor implementing the steps of the method as described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth above.
The method, the apparatus, the equipment and the storage medium for constructing the point cloud map acquire at least two mapping key frame images from a first image sequence shot by a first camera, and then acquire, from a second image sequence shot by a second camera, an image shot synchronously with a target mapping key frame image as a scale reference image, where the target mapping key frame image can be one of the at least two mapping key frame images. A first spatial position of the common feature points in space is then obtained based on the at least two mapping key frame images, and a second spatial position of the common feature points in space is obtained based on the target mapping key frame image and the scale reference image, so that the mapping scale of the first camera is determined from the first and second spatial positions; finally, a point cloud map based on the point cloud corresponding in space to the feature points on the target mapping key frame image is constructed based on the mapping scale and the first image sequence shot by the first camera. With this scheme, a second camera can be added to observe and constrain the map scale of the first camera with almost no change in complexity, so that the point cloud map can be constructed from the images shot by the first camera at the true mapping scale; this improves the efficiency of point cloud map construction while preserving mapping performance, and also helps to improve the positioning accuracy of intelligent robots.
Drawings
FIG. 1 is a diagram of an application environment of a method for constructing a point cloud map in one embodiment;
FIG. 2 is a schematic flow chart diagram of a method for constructing a point cloud map, according to one embodiment;
FIG. 3 is a schematic diagram of determining a mapping scale for a first camera in one embodiment;
FIG. 4 is a flowchart illustrating the steps of acquiring mapping key frame images in one embodiment;
FIG. 5 is a schematic flow chart diagram illustrating the steps for optimizing mapping dimensions in one embodiment;
FIG. 6 is a schematic flow chart diagram illustrating the steps of constructing a point cloud map in one embodiment;
FIG. 7 is a flowchart illustrating the steps of determining a next key frame image for a map in one embodiment;
FIG. 8 is a schematic flow chart diagram illustrating the steps of superimposing a newly generated point cloud onto an initial point cloud map in one embodiment;
FIG. 9 is a flow diagram of the mapping front end in an application example;
FIG. 10 is a flow diagram of the mapping back end in an application example;
FIG. 11 is a diagram illustrating a comparison of mapping effects in an example application;
FIG. 12 is a block diagram of an apparatus for constructing a point cloud map in one embodiment;
FIG. 13 is a block diagram of an apparatus for constructing a point cloud map, in one embodiment;
FIG. 14 is a diagram illustrating the internal architecture of an electronic device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method for constructing the point cloud map can be applied to the application environment shown in fig. 1. The application environment may include a mapping front-end device 110 and a mapping back-end device 120. The mapping front-end device 110 may be in communication connection with the mapping back-end device 120 through a network, and may further be connected to a first camera and a second camera; it may be configured to control the first camera and the second camera to capture live-action images, the two cameras may capture the live-action images synchronously, and when doing so the first camera and the second camera are separated by a preset distance. The mapping front-end device 110 may also acquire, in real time or offline, the image sequences captured by the first camera and the second camera, where the image sequence captured by the first camera may be denoted as a first image sequence and the image sequence captured by the second camera as a second image sequence.
The method for constructing the point cloud map provided by the application can be executed by the mapping front-end device 110 or the mapping back-end device 120 alone, or executed by the mapping front-end device 110 and the mapping back-end device 120 in a matching manner.
In an exemplary embodiment, the method for constructing a point cloud map provided by the present application is described as implemented by the mapping front-end device 110 alone: the mapping front-end device 110 acquires at least two mapping key frame images from the first image sequence, and acquires an image in the second image sequence shot synchronously with the target mapping key frame image as a scale reference image; the target mapping key frame image can be one of the at least two mapping key frame images. Then, the mapping front-end device 110 obtains a corresponding first spatial position of the common feature points in space based on the at least two mapping key frame images, and obtains a corresponding second spatial position of the common feature points in space based on the target mapping key frame image and the scale reference image; the common feature points are feature points common to the at least two mapping key frame images and the scale reference image. The mapping scale is the scale adopted in constructing the point cloud map. Because the two mapping key frame images in the first image sequence are shot by the first camera alone, and the depth of a point cloud determined by a monocular camera is uncertain in the actual physical space due to scale ambiguity, the first spatial positions calculated only from at least two mapping key frame images of the first image sequence usually cannot reflect the actual positions of the common feature points in physical space, for lack of actual physical scale information. For example, although the distance between a certain common feature point and the first camera can be calculated to be 1, it cannot be determined whether that distance is 1 meter or 1 decimeter; that is, the mapping scale is missing. However, there exists a real number (i.e., the mapping scale) that aligns the first spatial positions of the common feature points calculated from the at least two mapping key frame images with their actual spatial positions.
Then, the mapping front-end device 110 determines the mapping scale of the first camera according to the first spatial position and the second spatial position, and based on this, the mapping front-end device 110 may construct a point cloud map based on a point cloud corresponding to a feature point in space on the target mapping key frame image based on the mapping scale and the first image sequence.
In the exemplary embodiment, the point cloud map building method is executed by the mapping front-end device 110 alone, and the point cloud map of the actual physical scene can be built in real time under the condition that the mapping front-end device 110 mounts the first camera and the second camera.
In an exemplary embodiment, the method for constructing a point cloud map provided by the present application is described as implemented by the mapping back-end device 120 alone: the mapping back-end device 120 may store the first image sequence and the second image sequence sent by the mapping front-end device 110; when the point cloud map needs to be constructed, the mapping back-end device 120 extracts the two sequences and then obtains the point cloud map through a step flow similar to that of the above method. In this exemplary embodiment, the point cloud map construction method is executed by the mapping back-end device 120 alone, and offline construction of point cloud maps for stored image sequences, including but not limited to public image datasets, can be realized at the back end.
In an exemplary embodiment, the method for constructing a point cloud map provided by the present application is described as an example implemented by the mapping front-end device 110 and the mapping back-end device 120 in cooperation: the mapping front-end device 110 may obtain at least two mapping key frame images from the first image sequence and a scale reference image synchronously captured with the target mapping key frame image from the second image sequence, and obtain a corresponding first spatial position of the common feature point in the space based on the at least two mapping key frame images, and obtain a corresponding second spatial position of the common feature point in the space based on the target mapping key frame image and the scale reference image, so that the mapping front-end device 110 may determine a mapping scale of the first camera, the mapping front-end device 110 may further transmit the mapping scale and the first image sequence to the mapping back-end device 120, and the mapping back-end device 120 may construct, based on the mapping scale and the first image sequence, a point cloud map based on a point cloud corresponding to the feature point in the space on the target mapping key frame image. In the exemplary embodiment, the step of determining the mapping scale of the first camera may be performed by the mapping front-end device 110, and the subsequent step of constructing the point cloud map based on the mapping scale of the first camera and the first image sequence captured by the first camera may be performed by the mapping back-end device 120 which may have more computational resources, so as to improve the efficiency and accuracy of constructing the point cloud map.
In the above application scenarios, the mapping front-end device 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and other electronic devices with at least two cameras, such as mobile robots; the mapping back-end device 120 may be implemented by an independent server or a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a method for constructing a point cloud map is provided, which is exemplified by applying the method to the mapping backend device 120 in fig. 1, and the method may include the following steps:
step S201, at least two mapping key frame images are acquired from the first image sequence.
In this step, the mapping back-end device 120 may obtain the image sequence shot by the first camera to construct the point cloud map, referred to as the first image sequence; the first image sequence may include multiple frames of images, arranged in time order, obtained by the first camera shooting the real scene. The mapping back-end device 120 obtains at least two mapping key frame images from the first image sequence, where the mapping key frame images are key frame images used for constructing the point cloud map; one of the at least two mapping key frame images may serve as the first frame image for constructing the point cloud map in subsequent steps and may be marked as the target mapping key frame image.
Step S202, acquiring images which are shot synchronously with the target mapping key frame image in the second image sequence as a scale reference image.
In this step, after obtaining at least two mapping key frame images from the first image sequence, the mapping back-end device 120 may further obtain an image captured synchronously with the target mapping key frame image from the second image sequence as a scale reference image. Specifically, the second image sequence is the image sequence shot by the second camera for constructing the point cloud map, likewise a time-ordered multi-frame sequence obtained by shooting the real scene. The distance between the second camera and the first camera is known while the corresponding image sequences are shot synchronously; for example, the distance between the two cameras can be calibrated by their optical center distance or baseline length, and the baseline length can be kept at about 15 centimeters. The two cameras can be calibrated in advance to determine their camera intrinsics, and their exposure times and shooting timestamps can be synchronized, so that the mapping back-end device 120 can accurately extract the image shot synchronously with the target mapping key frame image from the second image sequence.
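As an illustration only, a minimal sketch of how such a scale reference image might be looked up, assuming each frame carries a capture timestamp on a clock shared by the two cameras; the function name and tolerance value are assumptions, not part of the patent:

```python
from bisect import bisect_left

def find_scale_reference(target_ts, second_sequence, tolerance=0.005):
    """second_sequence: list of (timestamp, frame) pairs sorted by timestamp."""
    timestamps = [ts for ts, _ in second_sequence]
    i = bisect_left(timestamps, target_ts)
    # Inspect the neighbours of the insertion point and keep the closer one.
    candidates = [k for k in (i - 1, i) if 0 <= k < len(timestamps)]
    if not candidates:
        return None
    best = min(candidates, key=lambda k: abs(timestamps[k] - target_ts))
    if abs(timestamps[best] - target_ts) > tolerance:
        return None  # nothing close enough to count as a synchronous shot
    return second_sequence[best][1]
```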
Step S203, obtaining a first space position corresponding to the common feature point in the space based on at least two mapping key frame images.
In this step, the mapping back-end device 120 determines the feature points common to at least three images, namely the at least two mapping key frame images and the scale reference image, as common feature points; these can be determined by feature point matching across the at least three images. Specifically, for the calculation of the first spatial position of the common feature points in space, the feature points common to the at least two mapping key frame images and the scale reference image may be determined first, for example by matching pixel points across the at least three images according to features such as their gray values and gray gradient values; the first spatial position of the common feature points is then calculated from their two-dimensional position coordinates on the at least two mapping key frame images and the rotation-translation relationship of the first camera when those images were shot.
After the common feature points are determined, the mapping back-end device 120 further obtains, based on the at least two mapping key frame images, the corresponding spatial positions of the common feature points in space, recorded as the first spatial positions; a first spatial position may be represented by three-dimensional space coordinates and characterizes the position of a common feature point in real space as calculated from the mapping key frame images. However, since the at least two mapping key frame images are captured by the first camera alone, the first spatial positions calculated from them lack actual physical scale information and cannot reflect the real positions of the common feature points in real space. For example, the distance between a certain common feature point and the first camera may be calculated to be 1, yet it cannot be determined whether that distance is 1 meter or 1 decimeter. Nevertheless, there exists a real number, the mapping scale, that aligns the first spatial positions of the common feature points calculated from the at least two mapping key frame images with their actual spatial positions; this alignment is performed in steps S204 to S205.
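A hedged sketch of one way to obtain the common feature points across the three images. The patent mentions matching on features such as gray values and gray gradients; this sketch substitutes ORB descriptors with brute-force matching as a concrete stand-in, so the detector choice and helper name are assumptions:

```python
import cv2

def common_feature_points(img_kf1, img_kf2, img_ref):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img_kf1, None)
    kp2, des2 = orb.detectAndCompute(img_kf2, None)
    kpr, desr = orb.detectAndCompute(img_ref, None)
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    m12 = {m.queryIdx: m.trainIdx for m in bf.match(des1, des2)}
    m1r = {m.queryIdx: m.trainIdx for m in bf.match(des1, desr)}
    common = []
    for i in m12:
        if i in m1r:  # feature visible in all three images
            common.append((kp1[i].pt, kp2[m12[i]].pt, kpr[m1r[i]].pt))
    return common  # list of (pt_kf1, pt_kf2, pt_ref) pixel coordinates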
And step S204, obtaining a corresponding second spatial position of the common characteristic point in the space based on the target mapping key frame image and the scale reference image.
In this step, the mapping back-end device 120 may further obtain the corresponding second spatial position of the common feature points in space based on the target mapping key frame image and the scale reference image. Specifically, the first camera and the second camera may be regarded as a binocular camera pair, with the target mapping key frame image and the scale reference image captured synchronously by that pair; since the distance relationship of the binocular pair is known, the mapping back-end device 120 can calculate the spatial position of the common feature points in real space through binocular matching and triangulation of the feature points, recorded as the second spatial position, which may also be represented by three-dimensional space coordinates. Because the physical scale between the binocular cameras is known, the obtained second spatial position has a 1:1 relationship with the real physical world; that is, the second spatial position can accurately reflect the actual spatial position of the common feature points in real space.
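A minimal sketch of this binocular triangulation, assuming the stereo intrinsics K1, K2 and extrinsics (R, t), for example from a calibrated ~15 cm baseline, are known; it is one possible implementation, not the patent's own:

```python
import cv2
import numpy as np

def second_spatial_positions(pts_target, pts_ref, K1, K2, R, t):
    """pts_target/pts_ref: Nx2 pixel coordinates of the common feature points."""
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])  # first camera at the origin
    P2 = K2 @ np.hstack([R, t.reshape(3, 1)])           # calibrated stereo extrinsics
    X_h = cv2.triangulatePoints(P1, P2,
                                np.asarray(pts_target, np.float64).T,
                                np.asarray(pts_ref, np.float64).T)
    return (X_h[:3] / X_h[3]).T  # Nx3 metric coordinates in the first camera frame
```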
And S205, determining a mapping scale of the first camera according to the first space position and the second space position.
As shown in fig. 3, in this step the mapping back-end device 120 may perform consistent single-point sampling on the common feature points, for example taking common feature points P1, P2 and P3, obtain the first three-dimensional space coordinates characterizing the first spatial positions of these points and the second three-dimensional space coordinates characterizing their second spatial positions, and divide the two sets of space coordinates to obtain the mapping scale of the first camera, completing the alignment of the first spatial positions with the actual spatial positions of the common feature points.
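A small sketch of one way to compute this scale. The patent describes dividing the two sets of coordinates; taking per-point distance ratios and a median over the sampled points is our illustrative, slightly robustified variant, not the patent's exact formula:

```python
import numpy as np

def mapping_scale(first_positions, second_positions):
    """first_positions: Nx3 up-to-scale coordinates from the monocular solve;
    second_positions: Nx3 metric coordinates from the stereo solve."""
    d1 = np.linalg.norm(np.asarray(first_positions), axis=1)   # scale-free distances
    d2 = np.linalg.norm(np.asarray(second_positions), axis=1)  # metric distances
    return float(np.median(d2 / np.maximum(d1, 1e-12)))        # robust ratio
```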
And step S206, constructing a point cloud map based on the point cloud corresponding to the characteristic points in the space on the target mapping key frame image based on the mapping scale and the first image sequence.
In this step, after obtaining the mapping scale of the first camera, the mapping back-end device 120 may solve for information such as the camera pose at the time each frame image was shot based on that mapping scale; the resulting camera poses and the point clouds corresponding to the feature points in space can then all correspond to the actual physical scale of the real scene. The mapping back-end device 120 may therefore construct, based on the mapping scale and the first image sequence, a point cloud map based on the point cloud corresponding in space to the feature points on the target mapping key frame image, using the target mapping key frame image as the first mapping key frame image. Specifically, taking the point cloud corresponding to the feature points on the target mapping key frame image as the initial point cloud, a second frame image can be found from the remaining images of the first image sequence, its corresponding point cloud calculated and superimposed onto the initial point cloud, and so on, until all images of the first image sequence have been applied to the construction of the point cloud map, at which point the construction of the point cloud map of the real scene can be considered complete. That is, the mapping scale only needs to be obtained once; the mapping back-end device 120 can complete the subsequent mapping process based on the obtained mapping scale and the image sequence shot by the first camera, realizing efficient construction of the point cloud map while accurately restoring the real scale of the map.
In the method for constructing the point cloud map, at least two mapping key frame images are acquired from a first image sequence shot by a first camera, and an image shot synchronously with a target mapping key frame image is acquired from a second image sequence shot by a second camera as a scale reference image, where the target mapping key frame image can be one of the at least two mapping key frame images. A first spatial position of the common feature points in space is then obtained based on the at least two mapping key frame images, and a second spatial position of the common feature points in space is obtained based on the target mapping key frame image and the scale reference image, so that the mapping scale of the first camera is determined from the first and second spatial positions; finally, a point cloud map based on the point cloud corresponding in space to the feature points on the target mapping key frame image is constructed based on the mapping scale and the first image sequence shot by the first camera. With this scheme, a second camera can be added to observe and constrain the map scale of the first camera with almost no change in complexity, so that the point cloud map can be constructed from the images shot by the first camera at the true mapping scale; this improves the efficiency of point cloud map construction while preserving mapping performance, and also helps to improve the positioning accuracy of intelligent robots.
As to the manner of determining at least two mapping key frame images in step S201, in some embodiments, at least two mapping key frame images may be selected from consecutive multi-frame images of the first image sequence, that is, the at least two mapping key frame images may be consecutive frame images of the first image sequence. For example, taking two mapping key frame images as an example, each group of two adjacent frame images of the first image sequence may be used as candidate images of the mapping key frame image, and then one group with the largest number of feature point matches in the two adjacent frame images is determined as the two mapping key frame images.
In other embodiments, the at least two mapping key frame images selected from the first image sequence may not be consecutive frame images in the first image sequence. One of the images can be selected from the first image sequence as a mapping key frame image, and then other mapping key frame images are determined according to the matching number of the feature points of the other images in the first image sequence and the mapping key frame image. Specifically, with reference to fig. 4, the acquiring at least two mapping key frame images from the first image sequence in step S201 may include:
step S401, acquiring the feature point matching number of each frame image of the first image sequence and the adjacent frame image;
in this step, the mapping back-end device 120 may perform feature point matching on each frame image of the first image sequence and its adjacent frame image respectively according to the image capturing time sequence, so as to obtain the feature point matching number of each frame image and its adjacent frame image.
Step S402, taking the frame image whose feature point matching number with its adjacent frame image is the largest among the frame images of the first image sequence as the target mapping key frame image of the at least two mapping key frame images;
specifically, the mapping back-end device 120 counts the number of feature point matches between each frame image in the first image sequence and its adjacent frame image, and takes the frame image with the largest number of feature point matches as the target mapping key frame image of the at least two mapping key frame images; the target mapping key frame image may serve as the first frame image for constructing the point cloud map.
Step S403, according to the feature point matching number of the other frame images of the first image sequence and the target mapping key frame image, selecting at least one frame image meeting the preset feature point matching number condition from the other frame images as the other mapping key frame images in the at least two mapping key frame images to obtain at least two mapping key frame images.
In this step, after obtaining the target mapping key frame image, the mapping back-end device 120 may select at least one frame image from the remaining frame images of the first image sequence as the other mapping key frame images of the at least two mapping key frame images. The mapping back-end device 120 may select this at least one frame image based on the feature point matching number, so as to construct the point cloud map more accurately. Specifically, the mapping back-end device 120 determines the feature point matching number between each remaining frame image of the first image sequence and the target mapping key frame image, and then takes at least one frame image satisfying a preset feature point matching number condition as the other mapping key frame images. The preset condition may be that the feature point matching number is greater than or equal to a preset threshold, or that the feature point matching number with the target mapping key frame image is the largest. In an embodiment, the mapping back-end device 120 may select, from the remaining frame images, the one frame image with the largest number of feature point matches with the target mapping key frame image as one of the at least two mapping key frame images, so as to perform mapping initialization, obtaining the mapping scale and so on, in combination with the target mapping key frame image.
According to the technical scheme of this embodiment, the mapping back-end device 120 may select, based on the feature point matching numbers of adjacent frames in the first image sequence, a first mapping key frame image and a second mapping key frame image having the largest feature point matching number with it, so that subsequent steps can determine the mapping scale of the first camera more accurately from the feature point matches between the two mapping key frame images, further improving mapping accuracy.
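As an illustration of steps S401 to S403, a sketch of the selection logic, assuming a match_count helper such as the ORB matcher shown earlier; the function and its tie-breaking are ours, not prescribed by the patent:

```python
def select_mapping_keyframes(frames, match_count):
    """match_count(a, b) -> number of feature point matches between two frames."""
    adjacent = [match_count(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    best = max(range(len(adjacent)), key=adjacent.__getitem__)
    target = frames[best]  # frame with the most matches against its neighbour
    others = [f for k, f in enumerate(frames) if k != best]
    partner = max(others, key=lambda f: match_count(target, f))
    return target, partner  # target mapping key frame and its best-matching peer
```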
In one embodiment, as shown in fig. 5, the determining the mapping scale of the first camera according to the first spatial position and the second spatial position in step S205 may include:
step S501, obtaining an initial mapping scale of a first camera according to a first space position and a second space position;
and step S502, optimizing the initial image construction scale to obtain an image construction scale based on the minimization processing of the point cloud corresponding to the common characteristic points in the space to the first reprojection error of the first camera and the second reprojection error of the second camera.
In this embodiment, the mapping back-end device 120 obtains the mapping scale of the first camera from the corresponding first and second spatial positions of the common feature points in space, takes it as the initial mapping scale to be optimized, and then optimizes this initial mapping scale according to the reprojection errors of the binocular camera pair, so as to improve the accuracy of the mapping scale of the first camera. Specifically, the mapping back-end device 120 may optimize the initial mapping scale s in a unit-vector manner, with the formula:

$$s^{*}=\arg\min_{s}\sum_{i}\left\|d\big(h_{l}(sR_{l}X_{i}+T_{l})\big)-uv_{i}^{l}\right\|^{2}+\left\|d\big(h_{r}(sR_{r}X_{i}+T_{r})\big)-uv_{i}^{r}\right\|^{2}$$

where R and T denote the camera poses, s denotes the initial mapping scale to be optimized, $d(\cdot)$ denotes the normalization operator, $uv_{i}^{l}$ and $uv_{i}^{r}$ denote the observation coordinates of the three-dimensional point cloud numbered i under the first camera and the second camera, $X_{i}$ denotes the three-dimensional coordinates of the point cloud numbered i in the world coordinate system, and $h_{l}(\cdot)$ and $h_{r}(\cdot)$ are the camera observation model equations. Each residual is the difference between the projection value d(h(sRX + T)), computed from the camera state and the point cloud state under the observation model, and the actual observation value uv.
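A minimal numerical sketch of this refinement, assuming the observations are stored as unit bearing vectors and using SciPy's least_squares as the solver; only the scalar s is optimized, with poses and points held fixed, and all names are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares

def normalize(v):  # d(.): unit-vector normalization, row-wise
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def scale_residuals(params, X, R_l, T_l, R_r, T_r, uv_l, uv_r):
    s = params[0]
    pred_l = normalize(s * (X @ R_l.T) + T_l)  # d(h_l(s R_l X_i + T_l)) per row
    pred_r = normalize(s * (X @ R_r.T) + T_r)
    return np.concatenate([(pred_l - uv_l).ravel(),
                           (pred_r - uv_r).ravel()])

def refine_scale(s0, X, R_l, T_l, R_r, T_r, uv_l, uv_r):
    sol = least_squares(scale_residuals, x0=[s0],
                        args=(X, R_l, T_l, R_r, T_r, uv_l, uv_r))
    return float(sol.x[0])
```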
In one embodiment, as shown in fig. 6, constructing a point cloud map based on a point cloud corresponding to a feature point in space on a target mapping key frame image based on the mapping scale and the first image sequence in step S206 may include:
step S601, determining a point cloud corresponding to a feature point in a space on a target mapping key frame image as an initial point cloud according to the mapping scale, the target mapping key frame image and at least one other mapping key frame image in at least two mapping key frame images to obtain an initial point cloud map;
after obtaining the mapping scale, the mapping back-end device 120 determines, according to the mapping scale, the target mapping key frame image and at least one other mapping key frame image of the at least two mapping key frame images, a point cloud corresponding to a feature point in space on the target mapping key frame image as an initial point cloud, and obtains an initial point cloud map. Specifically, the mapping back-end device 120 may obtain, according to the mapping scale, the target mapping key frame image, and the at least one other mapping key frame image, a camera pose estimation and feature point matching relationship when the corresponding image is captured by using homography matrix decomposition, and calculate, by using a triangulation method, a three-dimensional space coordinate of a point cloud corresponding to a feature point on the target mapping key frame image in a real-world space, where the point cloud is referred to as an initial point cloud, and the corresponding point cloud map is referred to as an initial point cloud map.
For example, assume the at least two mapping key frame images comprise two mapping key frame images, a first mapping key frame image and a second mapping key frame image, with the first taken as the target mapping key frame image. The first mapping key frame image contains feature points A1, B1, C1 and D1; matching points A2, B2, C2 and D2 corresponding to A1 to D1 can be determined in the second mapping key frame image by inter-image feature point matching, and the three-dimensional space coordinates of A1 to D1 can then be calculated by triangulation. Triangulation computes the three-dimensional coordinates of a feature point from the pixel coordinates of its matched observations in the two frame images and the relative pose of the camera between the two shots, where the relative pose comprises a rotation R and a translation t. Once the mapping scale is determined, R and t carry actual physical scale, so the three-dimensional coordinates of feature points A1 to D1 derived from them also carry actual physical scale. Taking feature point A1 as an example, let the camera coordinate system of the first camera when shooting the first mapping key frame image be the first coordinate system, and that when shooting the second mapping key frame image be the second coordinate system. The spatial coordinates of the three-dimensional point $P_1$ corresponding to feature point A1 in the first coordinate system can then be expressed as

$$s_{1}K^{-1}p_{1}=P_{1}$$

where $s_1$ is the depth of $P_1$ in the first coordinate system, K is the camera intrinsic matrix, and $p_1$ is the coordinate of feature point A1, the projection of $P_1$ on the first mapping key frame image. Writing $x_1=K^{-1}p_1$ and $x_2=K^{-1}p_2$, where $p_2$ is the coordinate of the matching point A2, the projection of $P_1$ on the second mapping key frame image, the depth $s_1$ satisfies

$$s_{1}\,x_{2}^{\wedge}R\,x_{1}=-\,x_{2}^{\wedge}t$$

where $x_{2}^{\wedge}$ denotes the antisymmetric matrix of $x_2$; this overdetermined equation is solved for $s_1$ in the least-squares sense. In this way, the three-dimensional coordinates in real-scene space of the point clouds corresponding to the feature points on the first mapping key frame image, including feature points A1, B1, C1 and D1, can be obtained; these point clouds can be taken as the initial point cloud, and an initial point cloud map containing them can be constructed.
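A small numeric sketch of this depth solve under the notation above; the helper names are ours:

```python
import numpy as np

def skew(v):  # antisymmetric matrix v^ such that (v^) w = v x w
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def triangulate_depth(p1, p2, K, R, t):
    """p1, p2: matched pixel coordinates (x, y); (R, t): relative camera pose."""
    x1 = np.linalg.inv(K) @ np.append(p1, 1.0)
    x2 = np.linalg.inv(K) @ np.append(p2, 1.0)
    A = skew(x2) @ R @ x1              # coefficient vector of s1
    b = -skew(x2) @ t                  # right-hand side
    s1 = float(A @ b) / float(A @ A)   # least-squares solution of A * s1 = b
    return s1 * x1                     # 3D point in the first camera frame
```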
Step S602, acquiring observation feature points corresponding to the initial point cloud on other frames of images of the first image sequence;
In this step, the mapping back-end device 120 may obtain the observation feature points corresponding to the initial point cloud on the other frame images of the first image sequence; the other frame images may include the frame images in the first image sequence other than the target mapping key frame image. That is, after obtaining the initial point cloud, the mapping back-end device 120 can read all known three-dimensional point clouds on the point cloud map and, through the feature point matching relationships between images of the first image sequence, calculate the observations of the known three-dimensional point clouds by the other frame images, thereby obtaining the observation feature points corresponding to the initial point cloud on those images. Calculating an image's observation of a known three-dimensional point cloud means obtaining the two-dimensional pixel points at which the known three-dimensional point cloud projects onto the image; these projected two-dimensional pixel points are the observation feature points. Specifically, in this step the mapping back-end device 120 obtains the two-dimensional pixel points at which the initial point clouds corresponding to the feature points A1 to D1 in real-scene space project onto each of the other frame images.
Step S603, selecting frame images meeting the preset uniformity condition from other frame images as the next drawing key frame image according to the uniformity of the distribution of the observation feature points on the respective frame images;
In this step, the mapping back-end device 120 may calculate the uniformity of the distribution of the observation feature points on each frame image; an image with good uniformity ensures that the camera pose is calculated more accurately. As one way of measuring uniformity, the image may be uniformly divided into grids, for example 16 × 16, the number of feature points in each grid cell counted, and the variance of the counts calculated: a smaller variance indicates a more uniform distribution. Based on this, the mapping back-end device 120 may select, from the other frame images, a frame image satisfying a preset uniformity condition, for example that the variance is smaller than a preset variance threshold, as the next mapping key frame image.
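A minimal sketch of this grid-variance uniformity measure; the grid size matches the 16 × 16 example above, while the function name is illustrative:

```python
import numpy as np

def grid_uniformity_variance(points, width, height, grid=16):
    """points: iterable of (x, y) observation feature points on one image."""
    counts = np.zeros((grid, grid))
    for x, y in points:
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        counts[row, col] += 1
    return float(counts.var())  # smaller variance = more uniform distribution
```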
Step S604, based on the target mapping key frame image and the next mapping key frame image, determining a point cloud corresponding to the feature points in the space on the next mapping key frame image, and overlapping the point cloud to the initial point cloud map to construct a point cloud map.
In this step, the mapping backend device 120 may superimpose the point cloud corresponding to the feature point in the space on the next mapping key frame image as a newly generated point cloud into the initial point cloud map to gradually construct the point cloud map.
According to the technical scheme of this embodiment, the mapping back-end device 120 may select, based on the distribution uniformity of the observation feature points corresponding to the known point clouds of the initial point cloud map on each remaining frame image of the first image sequence, a next mapping key frame image satisfying the uniformity condition, so that a new point cloud accurately generated from a next mapping key frame image with better uniformity is superimposed on the existing point cloud to gradually construct the point cloud map.
In an embodiment, as shown in fig. 7, further, the selecting, in step S603, a frame image satisfying a preset uniformity condition from other frame images as a next mapping key frame image specifically includes:
step S701, if at least two frame images meeting a preset uniformity condition exist in other frame images, taking the at least two frame images meeting the preset uniformity condition as candidate images of a next image construction key frame image to obtain at least two candidate images;
step S702, determining the matching number of the characteristic points between the candidate image and the synchronous shot image; wherein the synchronously-captured image refers to an image in the second image sequence that is captured synchronously with the candidate image;
step S703 is to use the candidate image with the largest matching number of feature points in the at least two candidate images as the next image-creating key frame image.
In this embodiment, when the mapping back-end device 120 detects that at least two of the other frame images satisfy the preset uniformity condition, that is, when at least two images have comparable uniformity, those images may be taken as candidate images for the next mapping key frame image, yielding at least two candidate images. The mapping back-end device 120 then determines the feature point matching number between each candidate image and its synchronously-captured image, that is, it compares the binocular feature point matching number corresponding to each candidate image, and takes the candidate image with the largest binocular feature point matching number as the next mapping key frame image. With this technical scheme, binocular feature point matching provides a further criterion when the feature point distribution uniformity is comparable: the more binocular feature point matches there are, the more stable the shot and the more accurate the feature points on the image tend to be, so taking the candidate image with the largest binocular feature point matching number as the next mapping key frame image improves the accuracy of point cloud map construction.
In an embodiment, the determining, based on the target mapping key frame image and the next mapping key frame image in step S604, a point cloud corresponding to a feature point in a space on the next mapping key frame image specifically includes:
determining a corresponding camera pose when the first camera shoots the next mapping key frame image according to the initial point cloud and observation feature points on the next mapping key frame image; and determining the point cloud corresponding to the feature points in the space on the next mapping key frame image based on the camera pose, the target mapping key frame image and the next mapping key frame image.
This embodiment provides a way of determining the point cloud corresponding in space to the feature points on the next mapping key frame image, based on the target mapping key frame image and the next mapping key frame image. Specifically, after determining the next mapping key frame image, the mapping back-end device 120 may add it to the current map as input for solving the camera pose corresponding to that image. When the next mapping key frame image is input for camera pose calculation, the Perspective-n-Point (PnP) method can be used to obtain the camera pose of the first camera when it shot the next mapping key frame image. After the camera pose is obtained, a new point cloud is generated by triangulation again, which can be expressed as:

$$M=\arg\min_{P}\sum\left\|h(RP+T)-uv\right\|^{2}$$

where R denotes the rotation of the camera, T the translation of the camera, uv the observation coordinates of the point cloud in the camera, P the three-dimensional coordinates of the new point cloud to be solved, $\|h(RP+T)-uv\|^{2}$ the reprojection error to be minimized, and M the new point cloud map corresponding to the newly generated point cloud; this new point cloud map can be superimposed onto the initial point cloud map to gradually complete the construction of the point cloud map.
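A hedged sketch of the pose step, using OpenCV's RANSAC PnP solver as one possible implementation; the patent only names the PnP method, so the solver choice and function name are assumptions:

```python
import cv2
import numpy as np

def solve_next_keyframe_pose(points3d, observations2d, K):
    """points3d: Nx3 known point cloud; observations2d: Nx2 observation
    feature points of that cloud on the next mapping key frame image."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points3d, np.float64),
        np.asarray(observations2d, np.float64),
        K, distCoeffs=None)
    if not ok:
        raise RuntimeError("PnP failed: too few or inconsistent observations")
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix from the rotation vector
    return R, tvec.reshape(3)
```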
In one embodiment, further, as shown in fig. 8, the overlaying to the initial point cloud map to construct the point cloud map in step S604 may include:
step S801, acquiring multiple common-view images of point clouds corresponding to feature points in space on a next mapping key frame image from a first image sequence;
in the step-by-step mapping process, the mapping backend device 120 may use the point cloud corresponding to the feature point on the next mapping key frame image determined in step S604 in the real space as a newly generated point cloud for being superimposed on the initial point cloud map so as to build the point cloud map step by step. In this step, after determining the newly generated point cloud, the mapping backend device 120 may obtain, from the first image sequence, a plurality of common-view images for the newly generated point cloud, where the common-view images refer to images in the first image sequence that have a common-view relationship with the newly generated point cloud, and the common-view relationship may be determined according to the number of pixel points that are matched on each frame image in the first image sequence by the feature points that correspond to the newly generated point cloud on the next mapping key frame image.
Step S802, based on the minimization processing of the reprojection error of each co-view image by the point cloud corresponding to the feature point in the space on the key frame image of the next image construction, optimizing the point cloud corresponding to the feature point in the space on the key frame image of the next image construction;
in this step, the point cloud corresponding to the feature point in the space on the next mapping key frame image may be used as the newly generated point cloud, and the mapping backend device 120 performs the minimization process on the reprojection error of each co-view image based on the newly generated point cloud to optimize the newly generated point cloud. The mapping back-end device 120 can perform local optimization on the camera pose related to the local map formed by the common view images and the three-dimensional coordinates of the newly generated point cloud in the mapping process, the local optimization is to perform minimization processing on the reprojection error of the common view images based on the newly generated point cloud, and the optimized newly generated point cloud can be superposed into the existing point cloud map again, so that the point cloud map can be constructed more accurately by the local optimization.
And step S803, superimposing the optimized point cloud corresponding in space to the feature points on the next mapping key frame image onto the initial point cloud map to construct the point cloud map.
In this embodiment, the mapping back-end device 120 can locally optimize the camera poses and the point cloud three-dimensional coordinates related to the local map while constructing the point cloud map, and the optimized point cloud can be superimposed into the existing point cloud map again, so that the local optimization method makes the construction of the point cloud map more accurate.
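As a small illustration of step S801, the following Python sketch selects common-view images by counting matched pixel points as described above; the threshold min_matches is an assumption, since the text does not state the exact criterion for a common-view relationship:

def find_common_view_images(matches_per_image, min_matches=20):
    """Return the indices of frame images in the first image sequence that have
    a common-view relationship with the newly generated point cloud.

    matches_per_image -- matches_per_image[i] is the number of pixel points on
                         frame i that match the feature points corresponding to
                         the newly generated point cloud on the next mapping
                         key frame image
    min_matches       -- assumed threshold defining a common-view relationship
    """
    return [i for i, n in enumerate(matches_per_image) if n >= min_matches]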
To describe the local optimization process specifically: the mapping back-end device 120 may obtain, from the first image sequence, a plurality of common-view images of the point cloud corresponding in space to the feature points on the next mapping key frame image. That is, for the point cloud newly generated from the next mapping key frame image, the mapping back-end device 120 may determine the common-view images of the newly generated point cloud from the first image sequence by means such as feature point matching; there are generally several common-view images, and the common-view images together with the newly generated point cloud form a local map of the entire point cloud map. After obtaining the local map, the mapping back-end device 120 may input the camera poses of the first camera for all images related to the local map, including the common-view images, and the three-dimensional coordinates of the point cloud of the local map into the following local optimization function for local optimization:
min_{R_i, T_i, P_j} Σ_{i∈L} Σ_j ||h(R_i P_j + T_i) - uv_ij||^2
where L denotes the local map, j denotes the point cloud number, and i denotes the common-view image number. By superimposing the locally optimized new point cloud onto the existing point cloud map, the point cloud map can be constructed step by step.
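To make the local optimization function concrete, here is a minimal Python sketch using SciPy and OpenCV; the Rodrigues-vector parameterization of the poses and the function name local_ba are assumptions for illustration, and a production system would exploit the sparsity of the problem:

import numpy as np
import cv2
from scipy.optimize import least_squares

def local_ba(K, poses, points, observations):
    """Minimize sum over i in L, j of ||h(R_i P_j + T_i) - uv_ij||^2.

    K            -- 3x3 camera intrinsics
    poses        -- list of (rvec, tvec) for the common-view images of the local map
    points       -- Jx3 newly generated point cloud
    observations -- list of (i, j, uv) tuples: point j observed at pixel uv in image i
    """
    n_cam, n_pts = len(poses), len(points)
    x0 = np.hstack([np.hstack([np.ravel(r), np.ravel(t)]) for r, t in poses]
                   + [np.ravel(np.asarray(points, dtype=np.float64))])

    def residuals(x):
        cams = x[:6 * n_cam].reshape(n_cam, 6)
        pts = x[6 * n_cam:].reshape(n_pts, 3)
        res = []
        for i, j, uv in observations:
            # h(R P + T): project point j into common-view image i
            proj, _ = cv2.projectPoints(pts[j:j + 1], cams[i, :3], cams[i, 3:], K, None)
            res.append(proj.ravel() - np.asarray(uv, dtype=np.float64))
        return np.concatenate(res)

    sol = least_squares(residuals, x0)  # default 'trf'; 'lm' also works for small local maps
    cams = sol.x[:6 * n_cam].reshape(n_cam, 6)
    pts = sol.x[6 * n_cam:].reshape(n_pts, 3)
    return cams, pts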
In an embodiment, the method may further perform global optimization processing on the point cloud map through the following steps:
determining external points in the existing point cloud according to the reprojection error of the existing point cloud in the point cloud map on each frame image in the first image sequence; and, if the ratio of the observation feature points of the external points on each frame image in the first image sequence to the observation feature points of the existing point cloud on each frame image in the first image sequence is greater than or equal to a preset ratio, optimizing the existing point cloud by minimizing the reprojection error of the existing point cloud on each frame image in the first image sequence.
In this embodiment, the mapping back-end device 120 may perform global optimization during the construction of the point cloud, so that the constructed point cloud is more accurate and smooth. Specifically, the mapping back-end device 120 may determine the external points in the existing point cloud according to the reprojection error of the existing point cloud on each frame image in the first image sequence. For example, the reprojection error calculated by the mapping back-end device 120 may use the following equation:
reproj = h(RP + T) - uv
where reproj represents the difference between the theoretical observed coordinates of the point cloud projected into the camera and the actual observed coordinates, that is, the reprojection error. An external point may be a point cloud whose reprojection error reproj is greater than or equal to a certain threshold. If the mapping back-end device 120 determines that the ratio of the observation feature points of the external points on each frame image in the first image sequence to the observation feature points of the existing point cloud on each frame image in the first image sequence is greater than or equal to a certain ratio, that is, the preset ratio, the existing point cloud may be optimized using the following formula:
min_{R_i, T_i, P_j} Σ_{i∈G} Σ_j ||h(R_i P_j + T_i) - uv_ij||^2
where G denotes the global map, j denotes the point cloud number, and i denotes the frame image number. This embodiment can perform global optimization during the construction of the point cloud so as to construct a more accurate point cloud map.
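The decision logic described here can be sketched as follows in Python; the pixel threshold and the preset ratio are illustrative values, since the patent leaves both configurable:

import numpy as np

def should_global_optimize(reproj_errors, obs_per_point, pix_threshold=2.0, preset_ratio=0.1):
    """Decide whether to trigger the global optimization.

    reproj_errors -- per-point magnitudes of reproj = h(RP + T) - uv, e.g. the
                     maximum over the frames observing each existing point
    obs_per_point -- number of observation feature points of each existing point
                     over the frames of the first image sequence
    """
    errs = np.asarray(reproj_errors, dtype=np.float64)
    obs = np.asarray(obs_per_point, dtype=np.float64)
    external = errs >= pix_threshold          # external points
    external_obs = obs[external].sum()        # their observation feature points
    total_obs = obs.sum()                     # all observation feature points
    return bool(total_obs > 0 and external_obs / total_obs >= preset_ratio)

When this returns True, the global minimization above is run with i ranging over all frame images, i.e., the same objective as the local optimization but over G instead of L.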
In order to explain the method for constructing the point cloud map provided by the present application more clearly, this application example applies the method to a binocular camera. In this application example the method may include a front-end mapping process and a back-end mapping process, which are described in detail below with reference to fig. 9 and fig. 10:
for the front-end mapping process, as shown in fig. 9: in step S901, the mapping front-end device 110 may select each frame of binocular images in the order of image capture time, where a frame of binocular images consists of the images captured by the left-eye camera and the right-eye camera of the binocular camera. In step S902, the mapping front-end device 110 may perform feature point matching between each frame image and its adjacent frame image. After feature point matching between adjacent frames is completed for all images, the number of matched feature points between each frame image and its adjacent frame image can be counted over the whole image sequence, and the frame with the largest number of feature point matches with its adjacent frame images is used as the first frame image for constructing the point cloud map. In parallel with feature point matching, the mapping front-end device 110 may perform step S903, that is, loop detection every certain number of images; when a closed loop is detected, step S904 is performed, in which closed-loop image matching matches the feature points of the newly captured image against historical frame images with a long time interval. After each loop detection, the mapping front-end device 110 performs step S905, continuing binocular feature point matching to ensure that all images captured by the cameras undergo binocular feature matching, where binocular feature point matching refers to feature point matching between the two images captured synchronously by the left-eye camera and the right-eye camera. After binocular matching, the mapping front-end device 110 may execute step S906 to determine whether there remain images for which feature point matching has not been performed; if not, the front-end mapping process ends, and if so, the mapping front-end device 110 returns to step S901 to continue feature point matching.
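As an illustration of the first-frame selection in steps S901 to S902, the following Python sketch picks the frame with the most feature-point matches to its adjacent frames; summing the matches with the previous and next frame is an assumption about how "matches with its adjacent frame images" is aggregated:

import numpy as np

def pick_first_frame(adjacent_matches):
    """Select the first frame image for constructing the point cloud map.

    adjacent_matches -- adjacent_matches[k] is the number of matched feature
                        points between frame k and frame k + 1
    """
    n = len(adjacent_matches) + 1
    scores = [
        (adjacent_matches[k - 1] if k > 0 else 0)      # matches with previous frame
        + (adjacent_matches[k] if k < n - 1 else 0)    # matches with next frame
        for k in range(n)
    ]
    return int(np.argmax(scores))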
For the back-end mapping process, as shown in fig. 10: in step S1001, the mapping back-end device 120 may select, for the first frame image, the frame image with the largest number of feature point matches with the first frame image as the second frame image, take the first frame image and the second frame image as mapping key frame images, and perform mapping initialization using steps S201 to S206 of the above embodiment to obtain the mapping scale of the camera and the three-dimensional spatial position coordinates of the common feature points, thereby obtaining the initial point cloud and generating the initial point cloud map. After obtaining the initial point cloud map, the mapping back-end device 120 may perform step S1002, that is, select the next mapping key frame image for mapping: the mapping back-end device 120 may read all known three-dimensional point clouds in the initial point cloud map and enter step S1003, in which it calculates the observation feature points of the known three-dimensional point clouds in all images through the feature point matching relationships between the images, calculates the degree of uniformity of the observation feature points in each image using a grid (a sketch of this grid check is given after the triangulation formula below), and determines whether the observation feature points are uniformly distributed in each image. If so, the mapping back-end device 120 compares the binocular feature point matching numbers of the candidate images, selects the image with the largest binocular feature point matching number as the next mapping image, and adds that image into the current map as the input of step S1004, the camera pose solving for the next mapping image; when the next mapping image is input for camera pose solving, the camera pose solving is carried out directly using the PnP method to obtain a camera pose to be optimized. If the mapping back-end device 120 determines in step S1003 that the observation feature points are not uniformly distributed, it returns to step S1002 to reselect the next frame for mapping. After the mapping back-end device 120 obtains the camera pose in step S1004, step S1005 is performed, in which the mapping back-end device 120 may triangulate again to generate a new point cloud; specifically, the new point cloud may be generated according to the following formula:
M = { P* | P* = argmin_P Σ ||h(RP + T) - uv||^2 }
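Returning to the grid-based uniformity check of step S1003 mentioned above, a minimal Python sketch could look like the following; the grid size and the occupancy criterion are assumptions, as the text only states that a grid is used to measure the degree of uniformity:

import numpy as np

def observations_are_uniform(points_uv, width, height, grid=8, min_occupancy=0.5):
    """Grid-based uniformity check for the observation feature points of one image.

    points_uv     -- Nx2 pixel coordinates of the observation feature points
    grid          -- the image is divided into grid x grid cells
    min_occupancy -- assumed fraction of cells that must contain at least one point
    """
    pts = np.asarray(points_uv, dtype=np.float64)
    cols = np.clip((pts[:, 0] / width * grid).astype(int), 0, grid - 1)
    rows = np.clip((pts[:, 1] / height * grid).astype(int), 0, grid - 1)
    occupied = len(set(zip(rows.tolist(), cols.tolist())))
    return occupied / (grid * grid) >= min_occupancy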
after generating a new point cloud, the mapping back-end device 120 proceeds to step S1006 to perform local smoothing on the map: the mapping back-end device 120 takes the newly generated point cloud and the camera pose to be optimized as input again, and finds the common-view images corresponding to the newly generated point cloud through feature point matching; these common-view images form a local map. After the local map is obtained, the camera poses and the point cloud three-dimensional spatial position coordinates related to the whole local map are put into the following local optimization function for local optimization:
min_{R_i, T_i, P_j} Σ_{i∈L} Σ_j ||h(R_i P_j + T_i) - uv_ij||^2
Further, the mapping back-end device 120 may enter step S1007, that is, perform outlier rejection based on the reprojection error, where the reprojection error may use the following formula:
reproj=h(RP+T)-uv。
Next, the mapping back-end device 120 proceeds to step S1008 to detect whether the proportion of external points exceeds a threshold. If the proportion of the observation feature points corresponding to the detected external points relative to the observation feature points corresponding to the existing point cloud is greater than or equal to the proportion threshold, the mapping back-end device 120 proceeds to step S1009 to perform global smoothing, that is, global optimization, on the constructed point cloud map, and after the global optimization returns to step S1002 to select the next frame for mapping; otherwise, the mapping back-end device 120 may directly return to step S1002 to select the next frame for mapping. The global optimization formula may be:
min_{R_i, T_i, P_j} Σ_{i∈G} Σ_j ||h(R_i P_j + T_i) - uv_ij||^2
fig. 11 compares the effect of the mapping scheme provided in this embodiment with that of a mapping scheme in the conventional technology, showing a demonstration of mapping a public data set. The public data set can be regarded as a series of images captured with a calibrated binocular camera; after the series of images is acquired, it can be sampled at intervals at a frequency of 5 Hz to keep redundant key frame images as few as possible, and mapping can then be completed based on the mapping scheme provided in this embodiment. The first effect diagram 1110 shows the result of the mapping scheme of the conventional technology, and the second effect diagram 1120 shows the result of the mapping scheme of this embodiment; the comparison shows that the mapping scheme of this embodiment has better mapping performance. Specifically, the scheme of this embodiment differs from the conventional mapping scheme in that each pair of binocular images is treated as a single image with sparse depth, which greatly reduces algorithm complexity and accelerates the algorithm flow; meanwhile, using the binocular camera for links such as mapping initialization and the binocular mapping sequence ensures mapping accuracy, effectively balancing mapping time against mapping performance and improving the positioning accuracy of an intelligent robot to a greater extent.
It should be understood that, although the steps in the flowcharts of fig. 1 to 10 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise herein, the steps are not strictly limited to this order and may be performed in other orders. Moreover, at least some of the steps in fig. 1 to 10 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; the order of performing these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 12, there is provided an apparatus for constructing a point cloud map, and the apparatus 1200 may include:
a first image obtaining module 1201, configured to obtain at least two mapping key frame images from a first image sequence; the first image sequence is an image sequence captured by a first camera for constructing a point cloud map;
a second image obtaining module 1202, configured to obtain, as a scale reference image, an image in the second image sequence that was captured in synchronization with the target mapping key frame image; the second image sequence is an image sequence captured by a second camera for constructing the point cloud map; the second camera and the first camera are separated from each other by a preset distance when synchronously capturing image sequences; the target mapping key frame image is one of the at least two mapping key frame images;
a first position obtaining module 1203, configured to obtain, based on the at least two mapping key frame images, a first spatial position of the common feature points in space; the common feature points are the feature points common to the at least two mapping key frame images and the scale reference image;
a second position obtaining module 1204, configured to obtain, based on the target mapping key frame image and the scale reference image, a corresponding second spatial position of the common feature point in the space;
a scale determining module 1205, configured to determine a mapping scale of the first camera according to the first spatial position and the second spatial position;
a map construction module 1206, configured to construct, based on the mapping scale and the first image sequence, a point cloud map from the point cloud corresponding in the space to the feature points on the target mapping key frame image.
In one embodiment, the first image obtaining module 1201 is further configured to obtain the feature point matching number between each frame image of the first image sequence and its adjacent frame image; take, among the frame images of the first image sequence, the frame image with the largest feature point matching number with its adjacent frame image as the target mapping key frame image among the at least two mapping key frame images; and, according to the feature point matching numbers between the other frame images of the first image sequence and the target mapping key frame image, select from the other frame images at least one frame image satisfying a preset feature point matching number condition as the other mapping key frame images among the at least two mapping key frame images, so as to obtain the at least two mapping key frame images.
In one embodiment, the preset feature point matching number condition includes having the largest feature point matching number with the target mapping key frame image.
In an embodiment, the map building module 1206 is further configured to determine, according to the mapping scale, the target mapping key frame image, and at least one other mapping key frame image of the at least two mapping key frame images, the point cloud corresponding in the space to the feature points on the target mapping key frame image as an initial point cloud, so as to obtain an initial point cloud map; acquire observation feature points corresponding to the initial point cloud on the other frame images of the first image sequence, the other frame images comprising the frame images in the first image sequence except the target mapping key frame image; select, according to the degree of uniformity of the observation feature points distributed on the respective frame images, a frame image satisfying a preset uniformity condition from the other frame images as the next mapping key frame image; and determine, based on the target mapping key frame image and the next mapping key frame image, the point cloud corresponding in the space to the feature points on the next mapping key frame image, and superimpose it onto the initial point cloud map to construct the point cloud map.
In an embodiment, the map building module 1206 is further configured to, if there are at least two frame images satisfying the preset uniformity condition among the other frame images, take the at least two frame images satisfying the preset uniformity condition as candidate images for the next mapping key frame image to obtain at least two candidate images; determine the feature point matching number between each candidate image and its synchronously captured image, the synchronously captured image being the image in the second image sequence captured in synchronization with the candidate image; and take the candidate image with the largest feature point matching number among the at least two candidate images as the next mapping key frame image.
In one embodiment, the map construction module 1206 is further configured to determine, according to the observation feature points included in the initial point cloud and the next mapping key frame image, a corresponding camera pose when the first camera captures the next mapping key frame image; and determining a point cloud corresponding to the feature points on the next mapping key frame image in the space based on the camera pose, the target mapping key frame image and the next mapping key frame image.
In one embodiment, the map building module 1206 is further configured to obtain, from the first image sequence, a plurality of common-view images of the point cloud corresponding in the space to the feature points on the next mapping key frame image; optimize the point cloud corresponding in the space to the feature points on the next mapping key frame image based on minimizing the reprojection error of that point cloud on each common-view image; and superimpose the optimized point cloud corresponding in the space to the feature points on the next mapping key frame image onto the initial point cloud map to construct the point cloud map.
In one embodiment, the apparatus 1200 may further include: a global optimization unit, configured to determine external points in the existing point cloud according to the reprojection error of the existing point cloud in the point cloud map on each frame image in the first image sequence; and, if the ratio of the observation feature points of the external points on each frame image in the first image sequence to the observation feature points of the existing point cloud on each frame image in the first image sequence is greater than or equal to a preset ratio, optimize the existing point cloud by minimizing the reprojection error of the existing point cloud on each frame image in the first image sequence.
In one embodiment, the scale determining module 1205 is further configured to: acquire an initial mapping scale of the first camera according to the first spatial position and the second spatial position; and optimize the initial mapping scale to obtain the mapping scale, based on minimizing the first reprojection error of the point cloud corresponding to the common feature points in the space on the first camera and the second reprojection error of that point cloud on the second camera.
For specific limitations of the apparatus for constructing the point cloud map, reference may be made to the above limitations of the method for constructing the point cloud map, which are not repeated here. The modules in the apparatus for constructing the point cloud map can be wholly or partially realized by software, hardware, or a combination thereof. The modules can be embedded in hardware form in, or be independent of, a processor in the electronic device, or can be stored in software form in a memory in the electronic device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, there is provided an apparatus for constructing a point cloud map, as shown in fig. 13, the apparatus may include: a first camera and a second camera, and a processor; the processor can be used for acquiring an image sequence synchronously shot by the first camera and the second camera and constructing a point cloud map according to the method in any one of the above embodiments; and the first camera and the second camera are separated from each other by a preset distance when synchronously shooting the image sequence.
In one embodiment, an electronic device is provided, which may be a terminal serving as the mapping front-end device or a server serving as the mapping back-end device, and its internal structure may be as shown in fig. 14. The electronic device includes a processor, a memory, and a communication interface connected by a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the electronic device is used for wired or wireless communication with external devices; the wireless communication can be realized through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program is executed by the processor to implement a method of constructing a point cloud map.
It will be appreciated by those skilled in the art that the configuration shown in fig. 14 is a block diagram of only a portion of the configuration relevant to the present application, and does not constitute a limitation on the electronic device to which the present application is applied, and a particular electronic device may include more or less components than those shown in the drawings, or may combine certain components, or have a different arrangement of components.
In one embodiment, an electronic device is further provided, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method of constructing a point cloud map, the method comprising:
acquiring at least two mapping key frame images from a first image sequence; the first image sequence is an image sequence captured by a first camera for constructing a point cloud map;
acquiring, as a scale reference image, an image in a second image sequence captured in synchronization with a target mapping key frame image; the second image sequence is an image sequence captured by a second camera for constructing the point cloud map; the second camera and the first camera are separated from each other by a preset distance when synchronously capturing image sequences; the target mapping key frame image is one of the at least two mapping key frame images;
obtaining a first spatial position of common feature points in space based on the at least two mapping key frame images; the common feature points are the feature points common to the at least two mapping key frame images and the scale reference image;
obtaining a second spatial position of the common feature points in the space based on the target mapping key frame image and the scale reference image;
determining a mapping scale of the first camera according to the first spatial position and the second spatial position;
and constructing, based on the mapping scale and the first image sequence, a point cloud map from the point cloud corresponding in the space to the feature points on the target mapping key frame image.
2. The method of claim 1, wherein the obtaining at least two mapping key frame images from the first image sequence comprises:
acquiring the feature point matching number between each frame image of the first image sequence and its adjacent frame image;
taking, among the frame images of the first image sequence, the frame image with the largest feature point matching number with its adjacent frame image as the target mapping key frame image among the at least two mapping key frame images;
and, according to the feature point matching numbers between the other frame images of the first image sequence and the target mapping key frame image, selecting from the other frame images at least one frame image satisfying a preset feature point matching number condition as the other mapping key frame images among the at least two mapping key frame images, so as to obtain the at least two mapping key frame images.
3. The method according to claim 2, wherein the preset feature point matching number condition comprises having the largest feature point matching number with the target mapping key frame image.
4. The method of claim 1, wherein the constructing, based on the mapping scale and the first image sequence, a point cloud map from the point cloud corresponding in the space to the feature points on the target mapping key frame image comprises:
determining, according to the mapping scale, the target mapping key frame image, and at least one other mapping key frame image of the at least two mapping key frame images, the point cloud corresponding in the space to the feature points on the target mapping key frame image as an initial point cloud, and obtaining an initial point cloud map;
acquiring observation feature points corresponding to the initial point cloud on other frame images of the first image sequence; the other frame images comprise the frame images in the first image sequence except the target mapping key frame image;
selecting, according to the degree of uniformity of the observation feature points distributed on the respective frame images, a frame image satisfying a preset uniformity condition from the other frame images as a next mapping key frame image;
and determining, based on the target mapping key frame image and the next mapping key frame image, the point cloud corresponding in the space to the feature points on the next mapping key frame image, and superimposing it onto the initial point cloud map to construct the point cloud map.
5. The method according to claim 4, wherein the selecting the frame image satisfying the preset uniformity condition from the other frame images as the next mapping key frame image comprises:
if at least two frame images satisfying the preset uniformity condition exist among the other frame images, taking the at least two frame images satisfying the preset uniformity condition as candidate images for the next mapping key frame image to obtain at least two candidate images;
determining the feature point matching number between each candidate image and its synchronously captured image; the synchronously captured image is the image in the second image sequence captured in synchronization with the candidate image;
and taking the candidate image with the largest feature point matching number among the at least two candidate images as the next mapping key frame image.
6. The method of claim 4, wherein the determining, based on the target mapping key frame image and the next mapping key frame image, the point cloud corresponding in the space to the feature points on the next mapping key frame image comprises:
determining, according to the observation feature points of the initial point cloud on the next mapping key frame image, the camera pose of the first camera when it captured the next mapping key frame image;
and determining, based on the camera pose, the target mapping key frame image, and the next mapping key frame image, the point cloud corresponding in the space to the feature points on the next mapping key frame image.
7. The method according to any one of claims 4 to 6, wherein
the superimposing onto the initial point cloud map to construct the point cloud map comprises:
acquiring, from the first image sequence, a plurality of common-view images of the point cloud corresponding in the space to the feature points on the next mapping key frame image;
optimizing the point cloud corresponding in the space to the feature points on the next mapping key frame image based on minimizing the reprojection error of that point cloud on each common-view image;
superimposing the optimized point cloud corresponding in the space to the feature points on the next mapping key frame image onto the initial point cloud map to construct the point cloud map;
the method further comprises the following steps:
determining external points in the existing point cloud according to the reprojection error of the existing point cloud in the point cloud map on each frame image in the first image sequence;
and if the ratio of the observation feature points of the external points on each frame image in the first image sequence to the observation feature points of the existing point cloud on each frame image in the first image sequence is greater than or equal to a preset ratio, optimizing the existing point cloud by minimizing the reprojection error of the existing point cloud on each frame image in the first image sequence.
8. The method of claim 1, wherein determining the mapping scale for the first camera based on the first and second spatial locations comprises:
acquiring an initial mapping scale of the first camera according to the first spatial position and the second spatial position;
and optimizing the initial mapping scale to obtain the mapping scale, based on minimizing the first reprojection error of the point cloud corresponding to the common feature points in the space on the first camera and the second reprojection error of that point cloud on the second camera.
9. An apparatus for constructing a point cloud map, comprising:
the first image acquisition module is used for acquiring at least two mapping key frame images from a first image sequence; the first image sequence is an image sequence captured by a first camera for constructing a point cloud map;
the second image acquisition module is used for acquiring, as a scale reference image, an image in the second image sequence captured in synchronization with the target mapping key frame image; the second image sequence is an image sequence captured by a second camera for constructing the point cloud map; the second camera and the first camera are separated from each other by a preset distance when synchronously capturing image sequences; the target mapping key frame image is one of the at least two mapping key frame images;
a first position obtaining module, configured to obtain, based on the at least two mapping key frame images, a first spatial position of common feature points in space; the common feature points are the feature points common to the at least two mapping key frame images and the scale reference image;
a second position obtaining module, configured to obtain a second spatial position of the common feature points in the space based on the target mapping key frame image and the scale reference image;
the scale determining module is used for determining the mapping scale of the first camera according to the first spatial position and the second spatial position;
and the map construction module is used for constructing, based on the mapping scale and the first image sequence, a point cloud map from the point cloud corresponding in the space to the feature points on the target mapping key frame image.
10. An apparatus for constructing a point cloud map, comprising a first camera and a second camera, and a processor; wherein,
the processor is used for acquiring the image sequence synchronously shot by the first camera and the second camera and constructing a point cloud map according to the method of any one of claims 1 to 8; and the first camera and the second camera are separated from each other by a preset distance when synchronously shooting the image sequence.