CN117252999A - Dense map construction method based on semantic tags and sparse point cloud

Info

Publication number: CN117252999A
Application number: CN202311064044.6A
Authority: CN
Inventors: 李健华; 王茂帅; 谢恩鹏; 秦西运; 王辉
Original assignee: Shandong New Generation Information Industry Technology Research Institute Co Ltd
Current assignee: Shandong New Generation Information Industry Technology Research Institute Co Ltd
Priority date: 2023-08-23
Filing date: 2023-08-23
Publication date: 2023-12-19

Abstract

The invention discloses a dense map construction method based on semantic tags and sparse point clouds, and relates to the technical field of three-dimensional map construction. The method comprises the following steps: step 1: collecting data; step 2: performing semantic segmentation and point cloud registration; step 3: performing joint optimization of the local map based on the point cloud information; step 4: fusing the local map, and performing closed loop detection and optimization; step 5: completing the dense map. The dense map is thereby constructed rapidly and accurately.

Description

Dense map construction method based on semantic tags and sparse point cloud

Technical Field

The invention relates to the technical field of three-dimensional map construction, and in particular to a dense map construction method based on semantic tags and sparse point clouds.

Background

Existing map construction methods mainly focus on geometric accuracy and acquire geometric information about the environment, such as point clouds and lidar data, from sensor data. However, geometric information usually lacks semantic attributes and cannot provide detailed descriptions of object features and scene layout in the environment. Because semantic information is not combined with map construction, a robot cannot adequately understand its surroundings or perform tasks such as obstacle recognition, target tracking, and path planning.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a dense map construction method based on semantic tags and sparse point clouds. The method combines semantic tag segmentation with point cloud registration and interpolation of the sparse point cloud map, generates new key frames based on a semantic-plane pixel point reselection strategy and detection of the motion consistency of semantic points, optimizes the data of the local map, and realizes dense map construction through semantic plane fitting and pixel point depth estimation.

The specific scheme provided by the invention is as follows:

the invention provides a dense map construction method based on semantic tags and sparse point clouds, which comprises the following steps:

step 1: collecting data: collecting key frames and sparse point cloud data, and defining semantic tags of pixel points of key frame pictures, wherein the semantic tags comprise static semantic tags and dynamic semantic tags;

step 2: performing semantic segmentation and point cloud registration: dividing the key frame into area blocks according to semantic tags by using a semantic segmentation model, and assigning each point in the sparse point cloud data to the corresponding semantic tag according to the area blocks, thereby completing the correspondence between the sparse point cloud data and the pixel points,

registering and aligning the sparse point cloud data under different coordinate systems by using a registration algorithm to obtain globally consistent point cloud information;

step 3: performing joint optimization of the local map based on the point cloud information: detecting and deleting redundant key frames to obtain the latest key frame, and filtering out the pixel points with dynamic tags in the latest key frame,

constructing a semantic plane by using a plane fitting algorithm: calculating the distance between pixel points of the same static semantic tag, and aggregating the tracked pixel points in the marginalized key frame into a plurality of pixel point groups according to the distance and the corresponding static semantic tag, wherein the pixel points in each pixel point group have the same static semantic tag and are close to one another, so that they form the semantic plane of that pixel point group, and filtering out pixel points inconsistent with the semantic plane distribution;

step 4: fusing the local map, and performing closed loop detection and optimization;

step 5: constructing a semantic plane from the fused map by using the plane fitting algorithm, and completing the dense map through an interpolation algorithm based on the semantic plane, combining the pixel point groups and the static semantic tags.

Further, in the dense map construction method based on semantic tags and sparse point clouds, the static semantic tags in step 1 include road, sidewalk, building, wall, railing, pole, traffic signal light, traffic sign, vegetation, and terrain,

the dynamic semantic tags include people, automobiles, bicycles, motorcycles, dogs, and cats.

Further, in the dense map construction method based on semantic tags and sparse point clouds, detecting and deleting redundant key frames in step 3 comprises: setting a redundancy quantity, and deleting a key frame if more than the redundancy quantity of its pixel points are observed by at least 3 key frames.

Further, in the step 3 of the dense map construction method based on semantic tags and sparse point clouds, a plane fitting algorithm is used to calculate a combined distance according to euclidean distances of pixels in a key frame and differences of gray scales of the pixels, the combined distance represents a distance between pixels of the same static semantic tag,

the tracked pixel points of the same static semantic tag in the marginalized key frame are aggregated into point pairs according to the combined distance, the point pairs are aggregated into pixel point groups according to the Euclidean distance between the pixel points within each point pair and the average gray-level difference of the point pair, and a distance threshold is set so that the pixel points in a pixel point group do not cross the static semantic tag beyond the distance threshold,

according to the three-dimensional coordinates of the pixel points in the pixel point group, a RANSAC method is utilized to form a semantic plane corresponding to the pixel point group.
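The RANSAC plane estimation mentioned above can be sketched as follows. This is a minimal illustration assuming NumPy; the function name `fit_plane_ransac`, the inlier threshold, and the iteration count are hypothetical choices, since the patent does not specify them.

```python
import numpy as np

def fit_plane_ransac(points, threshold=0.05, iterations=200, seed=0):
    """Fit a plane n.x + d = 0 to Nx3 points; returns (normal, d, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(iterations):
        # Three points define a candidate plane.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # the three samples are (nearly) collinear
            continue
        normal = normal / norm
        d = -normal.dot(sample[0])
        # Points closer to the plane than the threshold count as inliers.
        inliers = np.abs(points @ normal + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, d)
    if best_model is None:
        raise ValueError("no non-degenerate sample found")
    return best_model[0], best_model[1], best_inliers
```

Points outside the returned inlier mask correspond to the pixel points "inconsistent with the semantic plane distribution" that the method filters out.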

In the dense map construction method based on semantic tags and sparse point clouds, in step 3, the grid interval for collecting key frame pixel points is set to S, and the pixel points in the key frame picture are collected; the Euclidean distance disO between pixel points, the gray-level difference disG of the pixel points, and the combined distance disC are expressed by the following formula:

The combined distance disC is calculated by the above formula, where N_c represents the maximum color distance between pixel points; a constant is set so that N_s equals S, representing the maximum spatial distance between collected pixel points within the grid interval; (x_i, y_i) and (x_j, y_j) are the coordinates of the two pixel points; and [l_i, a_i, b_i] and [l_j, a_j, b_j] are the colors of the two pixel points in the CIELAB color space.
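The formula itself is not reproduced in this text. One form consistent with the symbol definitions above is the normalized color-plus-spatial distance used in SLIC superpixel clustering. This is an assumption, not the patent's confirmed formula. It writes disO for the Euclidean distance, disG for the gray-level (color) difference, and disC for the combined distance:

```latex
\begin{aligned}
\mathrm{disG} &= \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2},\\
\mathrm{disO} &= \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2},\\
\mathrm{disC} &= \sqrt{\left(\frac{\mathrm{disG}}{N_c}\right)^2 + \left(\frac{\mathrm{disO}}{N_s}\right)^2}.
\end{aligned}
```

Under this reading, N_c and N_s normalize the color and spatial terms so that neither dominates the combined distance.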

The invention provides a dense map construction device based on semantic tags and sparse point clouds, which comprises an acquisition module, a semantic segmentation and point cloud registration module, a joint optimization processing module, a fusion and detection module and an updating module,

the acquisition module acquires data: collecting key frames and sparse point cloud data, and defining semantic tags of pixel points of key frame pictures, wherein the semantic tags comprise static semantic tags and dynamic semantic tags;

the semantic segmentation and point cloud registration module performs semantic segmentation and point cloud registration: dividing the key frame into area blocks according to semantic labels by utilizing a semantic division model, distributing and mapping each point in the sparse point cloud data to a corresponding semantic label according to the area blocks to finish the correspondence of the sparse point cloud data and the pixel points,

the joint optimization processing module performs local map joint optimization processing based on the point cloud information: detecting redundant key frames and deleting the redundant key frames to obtain the latest key frames, filtering out pixel points with dynamic labels in the latest key frames,

the fusion and detection module fuses the local map and performs closed loop detection and optimization;

and the updating module constructs a semantic plane by utilizing a plane fitting algorithm according to the fused map, and perfects the dense map by combining the pixel point group and the static semantic label through an interpolation algorithm based on the semantic plane.

Further, the static semantic tags defined by the acquisition module in the dense map construction device based on semantic tags and sparse point clouds include road, sidewalk, building, wall, railing, pole, traffic signal light, traffic sign, vegetation, and terrain,

Further, the method for detecting and deleting redundant key frames by the joint optimization processing module in the dense map construction device based on the semantic tags and the sparse point cloud comprises the following steps: setting the redundancy quantity, and deleting one key frame if the pixel points exceeding the redundancy quantity in the one key frame are observed by at least 3 key frames.

Further, in the dense map construction device based on the semantic tags and the sparse point cloud, a plane fitting algorithm is utilized to calculate a combined distance according to Euclidean distance of the pixel points in the key frame and the difference of gray level of the pixel points, the combined distance represents the distance between the pixel points of the same static semantic tags,

Further, in the dense map construction device based on semantic labels and sparse point clouds, grid intervals for collecting pixels of a key frame are set to be S, pixels in a key frame picture, euclidean distances disO among the pixels, gray differences disG of the pixels and combined distances disC are collected, and the method is expressed by the following formula:

The invention has the advantages that:

the invention provides a dense map construction method based on semantic tags and sparse point clouds, which is characterized in that the dense map construction is rapidly and accurately realized through semantic tag segmentation features, point cloud registration and interpolation operation of the sparse point cloud map, a new key frame is generated based on a pixel point reselection strategy of a semantic plane, and data of a local map is optimized through semantic plane fitting and pixel point depth estimation.

Drawings

FIG. 1 is a schematic flow chart of the method of the invention.

FIG. 2 is a schematic diagram of the layout of the application of the method of the present invention.

Detailed Description

The present invention will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the invention and practice it.

The method enriches the semantic information available for map construction: a map construction method based on semantic information can assign a semantic tag, such as road, building, tree, or pedestrian, to each map point, which enriches the meaning and expressive capacity of the map.

Environmental awareness is enhanced: semantic information enables the robot to better understand and perceive its surroundings. Through semantic tags, the robot can recognize different objects and scenes, and can therefore perform tasks such as localization, obstacle detection, and target tracking more accurately.

The efficiency of navigation and path planning is further improved: the robot can conduct path planning and navigation according to semantic attributes of the target, and the robot can reach the destination faster. For example, it is possible to select a path suitable for a pedestrian, instead of a vehicle road, or to pass through a pedestrian area while avoiding a vehicle area.

Optimizing decision and interaction capabilities: the map based on semantic information can provide more context and semantic information for the robot, helping it to make more accurate decisions. For example, the robot may distinguish between different types of obstacles based on semantic tags, selecting an appropriate detour strategy.

Better man-machine interaction and interpretability: the semantic information is helpful for intelligent decision making of robots, and can provide richer expression capability for human-computer interaction. Through the semantic map, people can more intuitively understand the environment perception and decision process of the robot.

In specific applications, in some embodiments of the method of the present invention, when constructing a dense map based on semantic tags and sparse point clouds, the following procedure may be referred to:

Step 1: collecting data: key frames and sparse point cloud data are acquired using sensors such as a lidar, and semantic tags are defined for the pixel points of the key frame pictures, wherein the semantic tags comprise static semantic tags and dynamic semantic tags. After data acquisition, an initialization process may be performed.

Further, the static semantic tags include road, sidewalk, building, wall, railing, pole, traffic sign, vegetation, and terrain,

the dynamic semantic tags include people, automobiles, bicycles, motorcycles, dogs, and cats. The invention uses the static semantic tags to reduce the consumption of computing resources on the semantic segmentation network.
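The two tag classes can be represented as simple lookup sets. The sketch below is illustrative only: the exact label strings and the helper `split_by_motion_class` are assumptions, and the static set follows the fuller list given in the disclosure (including the traffic signal light).

```python
# Label strings are hypothetical spellings of the tags listed in the text.
STATIC_LABELS = frozenset({
    "road", "sidewalk", "building", "wall", "railing",
    "pole", "traffic light", "traffic sign", "vegetation", "terrain",
})
DYNAMIC_LABELS = frozenset({"person", "car", "bicycle", "motorcycle", "dog", "cat"})

def split_by_motion_class(labeled_points):
    """Split (point, label) pairs into static and dynamic lists; unknown labels are dropped."""
    static = [(p, label) for p, label in labeled_points if label in STATIC_LABELS]
    dynamic = [(p, label) for p, label in labeled_points if label in DYNAMIC_LABELS]
    return static, dynamic
```

Only the static list would feed the plane fitting in step 3; the dynamic list marks the regions to be excluded from tracking.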

Step 2: performing semantic segmentation and point cloud registration. The semantic segmentation model may be an existing semantic segmentation model; it divides the key frame into area blocks according to semantic tags, and each point in the sparse point cloud data is assigned to the corresponding semantic tag according to the area blocks. The sparse point cloud data under different coordinate systems are then registered and aligned by a registration algorithm to obtain globally consistent point cloud information.
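One way to assign each sparse point to a semantic tag is to project it into the segmented key frame and read the label at that pixel. The sketch below assumes a pinhole camera with intrinsic matrix `K`, points already expressed in the camera frame, and an integer label mask produced by the segmentation model; none of these details are stated in the patent, and `label_sparse_points` is a hypothetical name.

```python
import numpy as np

def label_sparse_points(points_cam, K, label_mask, unlabeled=-1):
    """Assign each 3D point (camera frame, Nx3) the label of the pixel it projects to."""
    labels = np.full(len(points_cam), unlabeled, dtype=int)
    height, width = label_mask.shape
    in_front = points_cam[:, 2] > 0               # only points in front of the camera
    uvw = (K @ points_cam[in_front].T).T          # homogeneous pixel coordinates
    uv = np.rint(uvw[:, :2] / uvw[:, 2:3]).astype(int)
    inside = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    idx = np.flatnonzero(in_front)[inside]
    labels[idx] = label_mask[uv[inside, 1], uv[inside, 0]]  # mask is indexed [row, col]
    return labels
```

Points that fall behind the camera or outside the image keep the `unlabeled` value and can be skipped in later steps.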

Specifically, the point clouds acquired at different positions are aligned by a registration algorithm such as Iterative Closest Point (ICP), so that globally consistent point cloud information is obtained in a single coordinate system.
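The patent names ICP without giving details. A minimal point-to-point sketch follows, assuming NumPy, brute-force nearest-neighbour matching, and a small initial misalignment; the function names `best_rigid_transform` and `icp` are illustrative.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with R @ src_i + t ~= dst_i (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

def icp(src, dst, iterations=30, tol=1e-10):
    """Point-to-point ICP with brute-force nearest neighbours; returns (R, t)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = src.copy()
    prev_err = np.inf
    for _ in range(iterations):
        # Squared distance from every current point to every target point.
        d2 = ((current[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        R, t = best_rigid_transform(current, dst[nearest])
        current = current @ R.T + t
        # Compose the incremental transform with the accumulated one.
        R_total, t_total = R @ R_total, R @ t_total + t
        err = d2.min(axis=1).mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total
```

The brute-force matching is quadratic in the number of points; a spatial index such as a k-d tree would be the usual substitute for larger clouds.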

Step 3: performing joint optimization of the local map based on the point cloud information: redundant key frames are detected and deleted. Detecting and deleting redundant key frames comprises: setting a redundancy quantity, and deleting a key frame if more than the redundancy quantity of its pixel points are observed by at least 3 key frames. The redundancy quantity is typically 90% of the pixel points. The key frames can be obtained by bundle adjustment; the spanning tree and the bag-of-words (BoW) model of the key frames are computed, and the different positions of the same pixel point in two key frames are tracked.
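The redundancy rule can be sketched as below. The sketch assumes that "observed by at least 3 key frames" means three key frames other than the one being tested, and that observations are available as a mapping from key frame id to the set of tracked point ids; both are assumptions, and `find_redundant_keyframes` is a hypothetical name.

```python
def find_redundant_keyframes(observations, redundancy_ratio=0.9, min_observers=3):
    """observations maps keyframe id -> set of tracked point ids.

    A key frame is flagged when more than `redundancy_ratio` of its points are
    also observed by at least `min_observers` other key frames.
    """
    redundant = []
    for kf, points in observations.items():
        if not points:
            continue
        covered = 0
        for p in points:
            # Count the other key frames that also observe point p.
            observers = sum(
                1 for other, other_pts in observations.items()
                if other != kf and p in other_pts
            )
            if observers >= min_observers:
                covered += 1
        if covered > redundancy_ratio * len(points):
            redundant.append(kf)
    return redundant
```

The function only flags candidates against the current set. A practical pipeline would presumably delete one key frame at a time and re-evaluate, so that mutually redundant key frames are not all removed together.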

The latest key frame is then obtained, and the pixel points with dynamic tags in it are filtered out. Motion-consistent pixel points are detected as follows: the tracked pixel points are divided into a plurality of pixel point groups using their semantic tags and their positions in the image; the pixel points with dynamic tags are filtered out based on the pixel point groups and the epipolar geometry constraint; and, in the latest key frame, the pixel points in the areas corresponding to dynamically tagged pixel points are set as untracked, which reduces the influence of targets with dynamic semantic tags.
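A common way to apply an epipolar constraint is to measure how far a tracked point lies from the epipolar line predicted by the fundamental matrix. The sketch below follows that approach under stated assumptions: a known fundamental matrix `F` between two key frames, a pixel threshold `max_dist`, and hypothetical function names. The patent does not describe its check at this level of detail.

```python
import numpy as np

def epipolar_distance(F, pts1, pts2):
    """Distance (pixels) from each pts2[i] to the epipolar line F @ pts1[i]."""
    ones = np.ones((len(pts1), 1))
    p1 = np.hstack([pts1, ones])
    p2 = np.hstack([pts2, ones])
    lines = p1 @ F.T                      # row i is the line F @ p1[i] = (a, b, c)
    num = np.abs(np.sum(lines * p2, axis=1))
    den = np.hypot(lines[:, 0], lines[:, 1])
    return num / den

def keep_static_points(F, pts1, pts2, is_dynamic_label, max_dist=1.0):
    """Mask of points that are not dynamically tagged and satisfy the epipolar constraint."""
    return (~is_dynamic_label) & (epipolar_distance(F, pts1, pts2) <= max_dist)
```

A point with a static tag that still violates the constraint is likely moving or mismatched, so it is dropped along with the dynamically tagged points.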

Constructing a semantic plane by using a plane fitting algorithm: calculating the distance between pixels of the same static semantic label, aggregating the tracked pixels in the marginalized key frame into a plurality of pixel groups according to the distance and the corresponding static semantic label, wherein the pixels in each pixel group have the same static semantic label and are mutually close to each other to form a semantic plane of the pixel group, and filtering pixels inconsistent with the semantic plane distribution.

Further, in step 3, the plane fitting algorithm calculates a combined distance from the Euclidean distance between pixel points in the key frame and the difference in their gray levels, and the combined distance represents the distance between pixel points of the same static semantic tag,

according to the three-dimensional coordinates of the pixel points in the pixel point group, a RANSAC method is used to form the semantic plane corresponding to the pixel point group. When the plane fitting algorithm is executed, the grid interval for collecting key frame pixel points is first set to S, and the pixel points in the key frame picture are collected; the Euclidean distance disO between pixel points, the gray-level difference disG of the pixel points, and the combined distance disC are expressed by the following formula:

The combined distance disC is calculated by the above formula, where N_c represents the maximum color distance between pixel points; a constant is set so that N_s equals S, representing the maximum spatial distance between collected pixel points within the grid interval; (x_i, y_i) and (x_j, y_j) are the coordinates of the two pixel points; and [l_i, a_i, b_i] and [l_j, a_j, b_j] are the colors of the two pixel points in the CIELAB color space. The invention uses the difference in pixel gray level to distinguish points at different positions on the same object, such as the wall and the top surface of the same house. First, the tracked pixel points with the same semantic tag in the marginalized frame are aggregated into point pairs using the combined distance disC; then, using the Euclidean distance disO between the points within each point pair and the average gray-level difference disG of the point pair, the nearest and next-nearest point pairs are aggregated into a pixel point group. A distance threshold disthp is then set to prevent a generated pixel point group from extending beyond the same static semantic tag. After duplicate points are removed, each pixel point group may contain at least four points that are not collinear; the points in a pixel point group belong to the same semantic tag and are close to one another, and the semantic plane corresponding to the pixel point group, such as a road plane or a building plane, is estimated from the three-dimensional coordinates of its points using the RANSAC method. Each semantic plane generated in this way has a corresponding identifier carrying position, gray-level, and semantic tag information.
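The pairing step can be sketched in Python. Because the formula is not reproduced in this text, the sketch assumes a SLIC-style combination in which the color and spatial terms are normalized by N_c and N_s and summed in quadrature. The function names and the tuple layout of `pixels` are hypothetical.

```python
import math

def combined_distance(p_i, p_j, lab_i, lab_j, n_s, n_c):
    """Assumed SLIC-style combined distance; returns (disO, disG, disC)."""
    dis_o = math.dist(p_i, p_j)          # Euclidean distance between pixel coordinates
    dis_g = math.dist(lab_i, lab_j)      # color difference in CIELAB
    dis_c = math.sqrt((dis_g / n_c) ** 2 + (dis_o / n_s) ** 2)
    return dis_o, dis_g, dis_c

def nearest_same_label_pairs(pixels, n_s, n_c):
    """pixels: list of (xy, lab, label). Pair each pixel with its nearest same-label pixel by disC."""
    pairs = set()
    for i, (xy_i, lab_i, label_i) in enumerate(pixels):
        candidates = [
            (combined_distance(xy_i, xy_j, lab_i, lab_j, n_s, n_c)[2], j)
            for j, (xy_j, lab_j, label_j) in enumerate(pixels)
            if j != i and label_j == label_i
        ]
        if candidates:
            pairs.add(tuple(sorted((i, min(candidates)[1]))))
    return sorted(pairs)
```

Grouping the resulting pairs into pixel point groups, and applying the threshold disthp, would follow as separate steps that this sketch does not cover.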

Step 4: fusing the local map, and performing closed loop detection and optimization. A plurality of dense point cloud maps are fused through closed loop detection, closed loop correction, and global pose optimization, and conflicts in overlapping areas are eliminated to obtain a more complete map representation. The fused map is optimized with a graph optimization algorithm to eliminate noise and estimation errors; graph optimization tools include g2o, iSAM, HOG-Man, SPA2d, and the like. The vertices of the optimization graph are generally defined as camera poses and the edges as transformation relations between camera poses, which generally serve as error terms. After the optimization graph is constructed, initial values are selected for iteration; the Jacobian matrix and Hessian matrix corresponding to the current estimate are calculated, the sparse linear system H_k Δx = -b_k is solved to obtain the update direction, and the iteration continues with the Gauss-Newton or Levenberg-Marquardt (L-M) method. When the iteration finishes, the optimized values are returned.
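The iteration described above can be illustrated on a deliberately small problem. The sketch below runs Gauss-Newton on a one-dimensional pose graph with unit edge weights and the first pose held fixed. It is a toy illustration of solving H Δx = -b, not the g2o or iSAM implementation, and the function name is hypothetical.

```python
import numpy as np

def optimize_pose_graph_1d(initial, edges, iterations=10):
    """Gauss-Newton on 1D poses. edges: (i, j, measured x_j - x_i). Pose 0 is held fixed."""
    x = np.asarray(initial, dtype=float).copy()
    n = len(x)
    for _ in range(iterations):
        H = np.zeros((n, n))
        b = np.zeros(n)
        for i, j, z in edges:
            e = (x[j] - x[i]) - z          # residual of this edge
            J = np.zeros(n)
            J[i], J[j] = -1.0, 1.0         # Jacobian of the residual
            H += np.outer(J, J)            # Gauss-Newton approximation of the Hessian
            b += J * e
        dx = np.zeros(n)
        # Drop the first row and column so that pose 0 stays fixed (gauge freedom).
        dx[1:] = np.linalg.solve(H[1:, 1:], -b[1:])
        x += dx
        if np.linalg.norm(dx) < 1e-9:
            break
    return x
```

For example, with three odometry edges of 1.1 each and a closed loop edge measuring 3.0 between the first and last pose, the 0.3 discrepancy is spread evenly over the four edges:

```python
edges = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, 1.1), (0, 3, 3.0)]
print(optimize_pose_graph_1d([0.0, 1.1, 2.2, 3.3], edges))  # approximately [0, 1.025, 2.05, 3.075]
```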

Step 5: constructing a semantic plane from the fused map by using the plane fitting algorithm, and completing the dense map through an interpolation algorithm based on the semantic plane, combining the pixel point groups and the static semantic tags. The dense map is expressed as a three-dimensional grid. The tracked pixel points in the marginalized frame are divided into a plurality of pixel point groups using their positions in the image and the corresponding semantic tags; the points in each pixel point group have the same static semantic tag and are close to one another, so they can be regarded as lying on the same semantic plane. A semantic plane is fitted to the three-dimensional coordinates of the pixel points, the point positions and static semantic tags are combined through interpolation algorithms such as nearest-neighbor interpolation and Gaussian process interpolation, and a dense semantic map of the urban road environment is recovered from the semantic planes, so that the dense map is finally completed from the sparse point cloud. The method can be used in applications such as navigation and path planning, where the semantic information in the map improves the navigation and decision-making capability of the robot.
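Once a semantic plane is known, one plausible way to estimate depth for untracked pixels of the same tag is to intersect each pixel's viewing ray with that plane. The sketch below uses this ray-plane intersection instead of the nearest-neighbor or Gaussian process interpolation named above, and it assumes a pinhole camera with intrinsic matrix `K` and a plane expressed in the camera frame. The function name is hypothetical.

```python
import numpy as np

def backproject_onto_plane(K, normal, d, pixels):
    """Intersect the viewing rays of Nx2 pixels with the plane normal.X + d = 0 (camera frame)."""
    ones = np.ones((len(pixels), 1))
    rays = np.hstack([pixels, ones]) @ np.linalg.inv(K).T   # rays scaled so that z = 1
    denom = rays @ normal
    depth = np.full(len(pixels), np.nan)
    ok = np.abs(denom) > 1e-9                               # skip rays parallel to the plane
    depth[ok] = -d / denom[ok]
    depth[depth <= 0] = np.nan                              # intersection behind the camera
    return rays * depth[:, None]
```

Applying this to every pixel inside a segmented area block would yield dense 3D points that carry the block's static semantic tag; pixels with no valid intersection come back as NaN.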

the semantic segmentation and point cloud registration module performs semantic segmentation and point cloud registration: dividing the key frame into area blocks according to semantic tags by using a semantic segmentation model, and assigning each point in the sparse point cloud data to the corresponding semantic tag according to the area blocks,

The content of information interaction and execution process between the modules in the device is based on the same conception as the embodiment of the method of the present invention, and specific content can be referred to the description in the embodiment of the method of the present invention, which is not repeated here.

Similarly, the device of the invention generates a new key frame based on a pixel point reselection strategy of a semantic plane by combining semantic label segmentation characteristics, point cloud registration and interpolation operation of a sparse point cloud map, optimizes data of a local map, and rapidly and accurately realizes construction of a dense map by semantic plane fitting and pixel point depth estimation.

It should be noted that not all the steps and modules in the above processes and the structures of the devices are necessary, and some steps or modules may be omitted according to actual needs. The execution sequence of the steps is not fixed and can be adjusted as required. The system structure described in the above embodiments may be a physical structure or a logical structure, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by multiple physical entities, or may be implemented jointly by some components in multiple independent devices.

The above-described embodiments are merely preferred embodiments for fully explaining the present invention, and the scope of the present invention is not limited thereto. Equivalent substitutions and modifications will occur to those skilled in the art based on the present invention, and are intended to be within the scope of the present invention. The protection scope of the invention is subject to the claims.

Claims

1. A dense map construction method based on semantic tags and sparse point clouds is characterized by comprising the following steps:

2. The dense map construction method based on semantic tags and sparse point clouds according to claim 1, wherein the static semantic tags in step 1 include road, sidewalk, building, wall, railing, pole, traffic signal light, traffic sign, vegetation, and terrain,

3. The dense map construction method based on semantic tags and sparse point clouds according to claim 1, wherein the step 3 of detecting and deleting redundant key frames comprises: setting the redundancy quantity, and deleting one key frame if the pixel points exceeding the redundancy quantity in the one key frame are observed by at least 3 key frames.

4. The method for constructing a dense map based on semantic tags and sparse point clouds according to claim 1, wherein in the step 3, a plane fitting algorithm is used to calculate a combined distance from Euclidean distances of pixels in a key frame and differences in gray levels of the pixels, the combined distance represents a distance between pixels of the same static semantic tag,

5. The dense map construction method based on semantic tags and sparse point clouds as claimed in claim 4, wherein in step 3, grid intervals of collecting keyframe pixels are set to be S, pixels in a keyframe picture, euclidean distances disO among the pixels, gray level differences disG of the pixels, and combined distances disC are set, and are expressed by the following formula:

6. The dense map construction device based on semantic tags and sparse point clouds is characterized by comprising an acquisition module, a semantic segmentation and point cloud registration module, a joint optimization processing module, a fusion and detection module and an updating module,

7. The dense map construction apparatus based on semantic tags and sparse point clouds according to claim 6, wherein the static semantic tags defined by the acquisition module include road, sidewalk, building, wall, railing, pole, traffic signal light, traffic sign, vegetation, and terrain,

8. The dense map construction apparatus based on semantic tags and sparse point clouds according to claim 6, wherein the joint optimization processing module detects redundant key frames and deletes the same, comprising: setting the redundancy quantity, and deleting one key frame if the pixel points exceeding the redundancy quantity in the one key frame are observed by at least 3 key frames.

9. The dense map construction apparatus based on semantic tags and sparse point clouds as claimed in claim 6, wherein the joint optimization processing module calculates a combined distance from Euclidean distances of pixels in the key frame and differences in gray level of the pixels using a plane fitting algorithm, the combined distance representing a distance between pixels of the same static semantic tag,

10. The dense map construction apparatus based on semantic tags and sparse point clouds as claimed in claim 9, wherein in the joint optimization processing module, the grid interval for collecting key frame pixel points is set to S, the pixel points in the key frame picture are collected, and the Euclidean distance disO between pixel points, the gray-level difference disG of the pixel points, and the combined distance disC are expressed by the following formula: