CN117542008A - Semantic point cloud fusion automatic driving scene identification method and storage medium - Google Patents

Semantic point cloud fusion automatic driving scene identification method and storage medium

Info

Publication number
CN117542008A
Authority
CN
China
Prior art keywords
point cloud
semantic
semantic space
target
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311324311.9A
Other languages
Chinese (zh)
Inventor
汪洋
钟煜
赵雄伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology (Shenzhen); Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology (Shenzhen); Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology (Shenzhen) and Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Priority to CN202311324311.9A
Publication of CN117542008A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/60 - Analysis of geometric attributes
    • G06T 7/62 - Analysis of geometric attributes of area, perimeter, diameter or volume
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V 10/763 - Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20112 - Image segmentation details
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30248 - Vehicle exterior or interior
    • G06T 2207/30252 - Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an automatic driving scene recognition method fusing semantic point clouds and a storage medium. The scene recognition method comprises the following steps: acquiring an original point cloud, generating a semantic space point cloud based on the original point cloud, and calculating semantic space features of the semantic space point cloud; generating semantic space enhancement features based on the semantic space features and other features of the original point cloud, and generating a semantic space enhancement pseudo image based on the semantic space enhancement features; and performing loop detection based on the semantic space enhancement pseudo image. A corresponding scene recognition system is capable of executing the scene recognition method described above.

Description

Semantic point cloud fusion automatic driving scene identification method and storage medium
Technical Field
The invention relates to the fields of robotics and autonomous driving, and in particular to an automatic driving scene recognition method fusing semantic point clouds and a storage medium.
Background
With the continued development of autonomous driving, vehicles are expected to possess the ability to fully perceive their environment. Mapping the surrounding environment while locating the vehicle's own position in real time (Simultaneous Localization and Mapping, SLAM) has therefore become a key issue in this field. When a lidar is used to map the surrounding environment and locate the vehicle in real time, loop closure detection is an indispensable step, and scene recognition technology is its core. For example, in real driving scenarios the vehicle often has to be localized over long times and distances, but the pose error estimated by the front-end odometry grows as the distance travelled increases, so back-end loop detection is required to correct this error. That is, when the vehicle returns to a previously visited position, scene recognition must identify, among the historical map frames, the frame corresponding to the current scene, so that the relative pose between the current frame and the historical frame can be computed to optimize the vehicle's historical trajectory. In addition, when localizing the vehicle with high accuracy in a known map, relocalization based on scene recognition is required to obtain the pose of the current frame.
At present, loop detection in lidar-based real-time mapping and localization is widely based on the Scan-Context algorithm. The algorithm is simple, easy to deploy, and achieves relatively high detection accuracy. It first projects the point cloud into polar coordinates in the top view, converting the three-dimensional point cloud into two-dimensional image data, takes a feature of the point cloud such as intensity or height as the pixel value of the corresponding region of the pseudo image, computes descriptors of the image row by row, and measures the similarity of two point cloud frames through a similarity calculation. However, the inventors have found that this processing loses part of the information in the point cloud, and describing the scene with only one feature is not comprehensive, so that many false detections and missed detections occur in practice, which is a drawback in real automatic driving scenes. Therefore, the existing scene recognition method described above needs to be improved.
Disclosure of Invention
To address the low accuracy of existing scene recognition methods in identifying closed-loop point cloud frames in complex environments, the technical problem mainly solved by the invention is to provide a scene recognition method and system that fuse the semantic spatial topology of the point cloud.
According to a first aspect, in one embodiment, an automatic driving scene recognition method for fusing semantic point clouds is provided. The scene recognition method comprises the following steps: acquiring an original point cloud, generating a semantic space point cloud based on the original point cloud, and calculating semantic space characteristics of the semantic space point cloud; generating semantic space enhancement features based on the semantic space features and other features of the original point cloud, and generating semantic space enhancement pseudo images based on the semantic space enhancement features; and performing loop detection based on the semantic space enhanced pseudo image.
In some embodiments, the generating, based on the original point cloud, a semantic space point cloud corresponding to the original point cloud, and calculating semantic space features of the semantic space point cloud include:
performing semantic segmentation on the original point cloud, clustering the original point cloud subjected to the semantic segmentation to obtain semantic space point cloud corresponding to the original point cloud, wherein the semantic space point cloud comprises a plurality of semantic point cloud clusters, each semantic point cloud cluster is provided with a corresponding semantic label, and calculating and storing the mass center of each semantic point cloud cluster;
and carrying out the process on each semantic point cloud cluster: determining the semantic point cloud cluster as a target point cloud cluster, determining k original adjacent point cloud clusters with semantic labels identical to the target point cloud clusters from other semantic point cloud clusters in the semantic space point cloud, determining areas or volumes of a target graph and the target graph based on the target point cloud clusters and the k original adjacent point cloud clusters, and determining semantic space features of the target point cloud clusters based on the areas or volumes of the target graph.
In some embodiments, the determining the area or volume of the target graph and the target graph based on the target point cloud cluster and the k original neighboring point cloud clusters includes: determining a target centroid of the target point cloud cluster and neighbor centroids of the k original neighbor point cloud clusters; determining a plurality of target graphs based on the target centroid of the target point cloud cluster and the neighbor centroids of k original neighbor point cloud clusters; and calculating the areas or volumes of the target graphs so as to take the sum of the areas or volumes of the target graphs as the semantic space characteristics of the target point cloud cluster.
In some embodiments, the determining the area or volume of the target graph and the target graph based on the target point cloud cluster and the k original neighboring point cloud clusters includes: determining a target centroid of the target point cloud cluster and neighbor centroids of the k original neighbor point cloud clusters; determining a nearest neighbor centroid and (k-1) secondary nearest neighbor centroids from the neighbor centroids of the k original nearest neighbor cloud clusters; determining a plurality of target graphs based on a target centroid, a nearest neighbor centroid and a next nearest neighbor centroid of the target point cloud cluster; calculating (k-1) distances between the secondary neighbor centroids and the target centroids, and an area or volume of the target graph, respectively; and weighting areas or volumes of a plurality of target graphs based on the distances corresponding to the target graphs, and taking the weighted areas or weighted volumes obtained by weighting as semantic space features of the target point cloud cluster.
In some embodiments, generating semantic space enhancement features based on the semantic space features and other features of the original point cloud, generating semantic space enhancement pseudo-images based on the semantic space enhancement features, comprises: acquiring the original point cloud or semantic space point cloud; setting a radius of an interested region, and carrying out space division on point clouds in the radius of the interested region according to polar coordinates on a top view to obtain a plurality of unit regions; assigning and normalizing the unit area by utilizing other characteristics of the point cloud corresponding to the unit area; superposing semantic space features corresponding to the unit areas with other features corresponding to the unit areas respectively and normalizing again to obtain semantic space enhancement features corresponding to the unit areas; and generating a semantic space enhancement pseudo image based on the semantic space enhancement features corresponding to the unit areas.
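As an illustration of the pseudo-image construction described above, the following is a minimal numpy sketch. It assumes a Scan-Context-style polar grid, uses the maximum height per cell as the "other feature", and propagates each cluster's semantic spatial feature to its points; all function names, grid sizes and the region-of-interest radius are illustrative assumptions rather than values fixed by this disclosure.

```python
import numpy as np

def semantic_space_enhanced_image(points, feat_spatial, roi_radius=80.0,
                                  n_ring=20, n_sector=60):
    """Build a semantic-space-enhanced pseudo image from a point cloud.

    points       : (N, 3) array of x, y, z coordinates.
    feat_spatial : (N,) per-point semantic spatial feature (the weighted
                   area/volume of the cluster the point belongs to).
    Cells are defined in polar coordinates on the top (bird's eye) view.
    """
    rho = np.hypot(points[:, 0], points[:, 1])             # range on the ground plane
    phi = np.arctan2(points[:, 1], points[:, 0]) + np.pi   # angle in [0, 2*pi]

    keep = rho < roi_radius                                 # region-of-interest filter
    rho, phi = rho[keep], phi[keep]
    z, s = points[keep, 2], feat_spatial[keep]

    ring = np.minimum((rho / roi_radius * n_ring).astype(int), n_ring - 1)
    sector = np.minimum((phi / (2 * np.pi) * n_sector).astype(int), n_sector - 1)

    other = np.zeros((n_ring, n_sector))    # "other feature": max height per cell
    spatial = np.zeros((n_ring, n_sector))  # semantic spatial feature per cell
    for r, c, zz, ss in zip(ring, sector, z, s):
        other[r, c] = max(other[r, c], zz)
        spatial[r, c] = max(spatial[r, c], ss)

    def normalize(img):
        rng = img.max() - img.min()
        return (img - img.min()) / rng if rng > 0 else img

    # superpose the normalized feature maps and normalize again
    return normalize(normalize(other) + normalize(spatial))
```

The returned matrix plays the role of the semantic space enhancement pseudo image used in the subsequent loop detection steps.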
In some embodiments, generating semantic space enhancement features based on the semantic space features and other features of the original point cloud, generating semantic space enhancement pseudo-images based on the semantic space enhancement features, comprises: acquiring the original point cloud or semantic space point cloud; setting a radius of an interested region, and carrying out space division on point clouds in the radius of the interested region according to polar coordinates on a top view to obtain a plurality of unit regions; assigning and normalizing the unit area by utilizing other characteristics of the point cloud corresponding to the unit area; correcting semantic space features corresponding to the unit area according to semantic tags of the target point cloud clusters corresponding to the unit area so as to obtain corrected semantic space features corresponding to the unit area; superposing the corrected semantic space features corresponding to the unit areas with other features corresponding to the unit areas and normalizing again to obtain corrected semantic space enhancement features corresponding to the unit areas; and generating a semantic space enhancement pseudo image based on the corrected semantic space enhancement features corresponding to the unit areas.
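The semantic correction described above can be illustrated by a very small sketch; the concrete weight values below are assumptions, since the disclosure only states that different weights are set per semantic label so that dynamic targets are weakened.

```python
# Hypothetical per-label correction factors: dynamic clusters are down-weighted
# so that potentially moving objects contribute less to the descriptor.
LABEL_WEIGHT = {"static": 1.0, "dynamic": 0.2}   # assumed values, not from the patent

def corrected_spatial_feature(spatial_feature, semantic_label):
    """Correct the semantic spatial feature of a cell/cluster by its label."""
    return LABEL_WEIGHT.get(semantic_label, 1.0) * spatial_feature
```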
In some embodiments, determining k original neighboring point cloud clusters having the same semantic label as the target point cloud cluster from the other semantic point cloud clusters in the semantic space point cloud includes: searching, among the other semantic point cloud clusters in the semantic space point cloud, for a plurality of other semantic point cloud clusters having the same semantic label as the target point cloud cluster; respectively calculating the centroid distances from the centroids of the plurality of other semantic point cloud clusters to the target centroid of the target point cloud cluster; and determining, based on the centroid distances, the k other semantic point cloud clusters closest to the target centroid as the original neighboring point cloud clusters corresponding to the target point cloud cluster.
In some embodiments, performing subsequent loop detection based on the semantic space enhancement features and the semantic space enhancement pseudo image comprises: taking the semantic space enhancement pseudo image as a first descriptor; and carrying out subsequent loop detection based on the first descriptor.
In some embodiments, performing subsequent loop detection based on the semantic space enhancement features and the semantic space enhancement pseudo image comprises: calculating a global descriptor of the semantic space enhancement pseudo image; converting the global descriptor into a one-dimensional feature vector, and taking the one-dimensional feature vector as a second descriptor; and carrying out subsequent loop detection based on the second descriptor.
In some embodiments, performing subsequent loop detection based on the semantic space enhancement features and the semantic space enhancement pseudo image comprises: calculating global descriptors of the semantic space enhancement pseudo images; carrying out weighted adaptive fusion of the global descriptors of a plurality of semantic space enhancement pseudo images to obtain a third descriptor; and carrying out subsequent loop detection based on the third descriptor.
In some embodiments, performing subsequent loop detection based on the first descriptor, the second descriptor or the third descriptor comprises: constructing a KD-Tree based on the first descriptor, the second descriptor or the third descriptor to generate a historical map set, and determining key frames among the historical frames corresponding to the historical map set according to a time interval and a similarity value; searching and matching the current frame in the historical map set, and if the similarity value between the current frame and a key frame is larger than a preset threshold value, judging that a loop frame is detected; and if the similarity value between the current frame and the key frame is smaller than or equal to the preset threshold value, judging that the current frame does not constitute a loop, and generating a new key frame using the KD-Tree.
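The loop detection flow of this embodiment can be sketched with SciPy's KD-Tree as follows; the cosine-similarity measure, the threshold value and the function names are assumptions, and key-frame selection by time interval as well as new key-frame generation are omitted for brevity.

```python
import numpy as np
from scipy.spatial import cKDTree

def detect_loop(history_descriptors, current_descriptor, sim_threshold=0.85):
    """Search the historical key-frame descriptors for a loop candidate.

    history_descriptors : (M, D) array, one 1-D descriptor per key frame.
    current_descriptor  : (D,) descriptor of the current frame.
    Returns (is_loop, index of best matching key frame).
    """
    tree = cKDTree(history_descriptors)            # KD-Tree over the map set
    _, idx = tree.query(current_descriptor, k=1)   # nearest key frame

    cand = history_descriptors[idx]
    # similarity value: cosine similarity between the two descriptors
    sim = float(np.dot(cand, current_descriptor) /
                (np.linalg.norm(cand) * np.linalg.norm(current_descriptor) + 1e-12))
    return sim > sim_threshold, idx
```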
According to a second aspect, in one embodiment a computer-readable storage medium is provided. The computer-readable storage medium includes a program, which is executable by a processor to implement the scene recognition method according to any of the embodiments of the present application.
The beneficial effects of this application are:
according to some embodiments of the present disclosure, compared with other loop detection algorithms, the scene recognition method provided herein describes a scene by introducing spatial topology information (i.e., the target graphs and the semantic spatial features obtained from them), so that the descriptions of different scenes are more distinguishable while the similarity between identical scenes is enhanced. The area or volume of the target graph formed by the target centroid, the nearest neighbor centroid and the next-nearest neighbor centroids is calculated, and the semantic spatial features are derived from this area or volume. The semantic spatial features are then superposed/fused with other features of the point cloud (such as the height feature or the reflection intensity feature) to obtain semantic space enhancement features, which describe the global characteristics of the point cloud while retaining the local features of the same class. In addition, a semantic constraint obtained through semantic segmentation and clustering is introduced into the semantic space enhancement features: different weights are set according to the semantic labels, the semantic spatial features are corrected accordingly, and subsequent detection is performed on the corrected semantic spatial features. This weakens the features of dynamic targets, reduces their interference with scene recognition, and improves the robustness of subsequent loop detection in complex scenes.
Drawings
FIG. 1 is an overall flowchart of an automatic driving scene recognition method of fusing semantic point clouds according to one embodiment;
FIG. 2 is a flow chart of a first descriptor-based scene recognition method according to one embodiment;
FIG. 3 is a flow chart of a scene recognition method based on a second descriptor according to one embodiment;
FIG. 4 is a flow chart of a third descriptor-based scene recognition method according to one embodiment;
FIG. 5 is a flow chart of determining semantic spatial features of a target point cloud cluster based on the area or volume of a target graphic, according to one embodiment;
FIG. 6 is a schematic diagram of a target graphic of an embodiment;
FIG. 7 is a flow diagram of generating semantic space enhanced pseudo-images based on semantic space features and other features of an original point cloud for one embodiment;
FIG. 8 is a flow chart of generating semantic space enhanced pseudo-images based on semantic space features and other features of an original point cloud according to another embodiment;
FIG. 9 is a flow diagram of a global descriptor for computing semantic space-enhanced pseudo-images, according to one embodiment;
FIG. 10 is a flow diagram of acquiring a third descriptor by adaptive fusion, according to one embodiment.
Detailed Description
The invention will be described in further detail below with reference to the drawings by means of specific embodiments, in which like elements in different embodiments share associated numbering. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of these features may be omitted, or replaced by other elements, materials, or methods, in different situations. In some instances, operations related to the present application are not shown or described in the specification in order to avoid obscuring its core parts; a detailed description of such operations is not necessary for a person skilled in the art, who can fully understand them from the description herein and from general knowledge in the art.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
Literature gossmatch: graph-of-semantics Matching for Detecting Loop Closures in3d Lidar Data, using RangeNet++ (RangeNet++ is a semantic segmentation network based on laser point clouds, and can be applied to the field of automatic driving in real time) to obtain semantic information of the point clouds, clustering the semantic point clouds, storing the clustered point clouds by adopting a Graph structure, and matching by using vertex descriptors based on histograms. The precision of the method depends on the number of categories used by semantics, but with the improvement of categories, the error of semantic segmentation is increased, and the time consumption of matching is greatly improved.
Literature SSC: semantic Scan Context for Large-scale Place Recognition, the most occurring object semantics in the region are used to assign values to pixels in ScanContext, and semantics are directly used as main features, and are used for fast ICP (Iterative Closest Point short for iterative closest point, used in SLAM for point cloud matching) computation. The method uses truth data in a data set to carry out semantic assignment, is inconsistent with the actual model reasoning situation, cannot guarantee accuracy, and ignores the spatial distribution characteristics of objects in a scene.
The literature OverlapNet: A Siamese Network for Computing LiDAR Scan Similarity with Applications to Loop Closing and Localization projects the point cloud into a front view, combines several feature modality maps such as a semantic map, a range map and reflection intensity, encodes and stacks the features through a weight-sharing convolutional neural network, and estimates the overlap ratio and yaw angle between point clouds.
Literature SGPR: semantic Graph Based Place Recognition for 3A closed loop detection algorithm based on a graph neural network is proposed, a scene is modeled into a semantic graph form, semantic and spatial topology information of Point cloud is reserved, a scene recognition problem is modeled into a graph matching problem, and similarity values are learned through the graph neural network. However, the semantic tags used in the closed loop detection algorithm come from RangeNet++, so that the lower mIoU (Mean Intersection over Union, namely average cross-over ratio) in multiple categories can influence the matching of the subsequent semantic graphs, the method does not distinguish between dynamic objects and static objects, and the accuracy of a dynamic scene is difficult to ensure.
In summary, a comprehensive analysis of the related literature and patents shows that existing scene recognition methods cannot meet the detection requirements of scene recognition in outdoor dynamic scenes, and the robustness and real-time performance of scene recognition still need to be improved. Therefore, the existing scene recognition methods described above need to be improved.
The conception process of the application is as follows. The existing Scan-Context-based methods describe the scene with only one modality of feature, and data are lost when the point cloud is converted from three dimensions to two dimensions; a single feature such as the Z value of the highest point in each divided cell (i.e., the height feature of the point cloud) or the reflection intensity feature therefore cannot describe the current scene effectively, and searching and matching against the map set with such a single feature produces many mismatches. In addition, existing scene recognition methods lack a perception of the overall spatial environment, isolate the relationships among the objects in space, and at the same time introduce dynamic objects, so the scene recognition accuracy is low and the robustness over long scales is poor. Through a study of the prior literature, the inventors found that methods describing scene features with spatial triangles already exist in point cloud registration and in the closed-loop detection of visual SLAM, and that introducing the spatial topological relationships of the point cloud improves scene recognition accuracy. In existing point cloud closed-loop detection, some methods also pool the point cloud features of spatial nearest neighbors or use spatial topological features in graph form, but none directly uses the area of a spatial triangle as a feature for assignment. Therefore, the application is guided by the idea of adding spatial topological relationships to the point cloud pseudo image, introduces semantic segmentation of the point cloud to improve the distinguishability of each point cloud, and weakens potential dynamic point cloud targets. That is, the point cloud is semantically segmented, the segmented points are clustered into point cloud clusters, neighboring point cloud clusters with the same semantic label as the current point cloud cluster are searched, the area or volume of the target graph formed by the centroid of the current point cloud cluster and the centroids of the neighboring point cloud clusters is calculated, the area or volume is weighted based on the distance between the neighboring centroid and the current centroid, and the weighted area/volume is used as the pixel value of the pseudo image region corresponding to the point cloud. Because the point cloud obtained through this clustering focuses on local features, the resulting pseudo image is superposed with the pseudo images corresponding to other features of the point cloud (such as the height feature and the reflection intensity feature) and normalized, yielding the semantically enhanced semantic space enhancement features and the semantic space enhancement pseudo image. Finally, subsequent loop detection is performed based on the semantic space enhancement features and the semantic space enhancement pseudo image.
For a better understanding of the invention, some schemes and terms will be described.
(1) Scan-Context
Scan-Context is a global localization method based on structural information; it does not rely on histogram-based descriptors or machine learning methods, but directly records the 3D structural information of the point cloud. Scan-Context is a 3D point cloud descriptor for outdoor scene recognition that converts the 3D point cloud information into a 2D matrix, and it has the following advantages: an efficient encoding function, preservation of the point cloud structure information, and a two-step matching algorithm.
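For reference, a compact sketch of the two ingredients mentioned above, the row-wise (per-ring) descriptor and a column-shift similarity between two Scan-Context-style matrices, is given below; the implementation details are simplified assumptions, not the original Scan-Context code.

```python
import numpy as np

def ring_key(scan_context):
    """Row-wise (per-ring) descriptor of a Scan-Context-style pseudo image."""
    return scan_context.mean(axis=1)

def column_shifted_similarity(sc_a, sc_b):
    """Similarity of two pseudo images, maximised over column (yaw) shifts."""
    best = 0.0
    for shift in range(sc_a.shape[1]):
        shifted = np.roll(sc_b, shift, axis=1)
        num = (sc_a * shifted).sum(axis=0)                 # per-column dot products
        den = np.linalg.norm(sc_a, axis=0) * np.linalg.norm(shifted, axis=0) + 1e-12
        best = max(best, float((num / den).mean()))        # mean column cosine
    return best
```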
(2) KD-Tree
A KD-Tree (k-dimensional tree) is a tree data structure that stores points in a k-dimensional space for fast retrieval. It is mainly used for searching multidimensional key data (such as range searches and nearest neighbor searches), and is a special case of binary space partitioning trees.
To address the low accuracy of existing point cloud scene recognition methods in identifying closed-loop point cloud frames in complex environments, some embodiments of the application disclose an automatic driving scene recognition method fusing semantic point clouds. The scene recognition method can be used for relocalization of outdoor mobile robots and/or autonomous vehicles, among other scenarios. It can not only describe the features of each scene robustly, distinctively and efficiently, but also match the current scene against the map library accurately and quickly, enabling scene relocalization for robots, autonomous vehicles and the like, and thereby improving their positioning accuracy during long-term motion.
Referring to fig. 1, the automatic driving scene recognition method for fusing semantic point clouds includes:
s100: acquiring an original point cloud, generating a semantic space point cloud based on the original point cloud, and calculating semantic space characteristics of the semantic space point cloud;
s200: generating semantic space enhancement features based on the semantic space features and other features of the original point cloud, and generating semantic space enhancement pseudo images based on the semantic space enhancement features;
s300: and performing loop detection based on the semantic space enhanced pseudo image.
In some embodiments, in the step S100, generating a semantic space point cloud corresponding to the original point cloud based on the original point cloud, and calculating semantic space features of the semantic space point cloud includes:
s110: performing semantic segmentation on the original point cloud, clustering the original point cloud subjected to the semantic segmentation to obtain semantic space point cloud corresponding to the original point cloud, wherein the semantic space point cloud comprises a plurality of semantic point cloud clusters, each semantic point cloud cluster is provided with a corresponding semantic label, and calculating and storing the mass center of each semantic point cloud cluster;
s120: and carrying out the process on each semantic point cloud cluster:
S121: determining the semantic point cloud cluster as a target point cloud cluster, and determining k original adjacent point cloud clusters with the same semantic label as the target point cloud cluster from other semantic point cloud clusters in the semantic space point cloud;
s122: and determining the areas or volumes of a target graph and the target graph based on the target point cloud clusters and the k original adjacent point cloud clusters, and determining the semantic space characteristics of the target point cloud clusters based on the areas or volumes of the target graph.
In some embodiments, the semantic space point cloud is comprised of a plurality of semantic point cloud clusters. Each semantic point cloud cluster is provided with a corresponding semantic label. The semantic space point cloud is obtained by carrying out semantic segmentation and clustering treatment on the original point cloud. The semantic space features are used to characterize local features of the point cloud.
In some embodiments, step S100: an original point cloud is obtained. The raw point cloud may be acquired by one or more of a laser scanner, a depth camera, and/or a binocular camera, etc. disposed on the robotic/autopilot vehicle. The original point cloud may also be obtained through other channels, and the obtaining way of the original point cloud is not limited here. The original point cloud obtained in step S100 may be point cloud data corresponding to a single key frame in loop detection.
In some embodiments, the original point cloud may be semantically segmented using a Cylinder3D semantic segmentation model. Cylinder3D is an effective three-dimensional framework for driving scene lidar semantic segmentation.
In some embodiments, other semantic segmentation algorithms may also be used to semantically segment the original point cloud. For example, other semantic segmentation algorithms may include: random sampling coincidence method, conditional European cluster segmentation, segmentation based on region growth, and segmentation based on region growth of color.
A person skilled in the art can select an appropriate semantic segmentation model/algorithm according to the requirements of the actual application scene, and the semantic segmentation model/algorithm adopted by the semantic segmentation of the original point cloud is not limited.
In some embodiments, semantic segmentation of the original point cloud comprises: semantic segmentation of dynamic and static objects. The specific types of the point clouds (such as dynamic objects and static objects) obtained after the original point clouds are subjected to semantic segmentation are not limited.
For semantic segmentation of dynamic and static objects, the semantic segmentation dataset of KITTI may be used as the training set or validation set of the semantic segmentation model. The KITTI dataset is a public dataset jointly sponsored by the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago, collected from real traffic scenes with fully equipped data collection vehicles.
In addition, a laser radar equipped on a robot/autopilot vehicle can also be used to collect the point cloud of the outdoor scene, and the point cloud of the outdoor scene can be used as a verification set of the semantic segmentation model.
Those skilled in the art may select an appropriate semantic segmentation data set according to the requirements of the actual application scenario, and the semantic segmentation data set is not limited herein.
In some embodiments, when the Cylinder3D semantic segmentation model is used to segment the original point cloud into dynamic and static classes, the classification loss function in the Cylinder3D model can be modified into a combination of the Lovász loss (a loss function commonly used in this technical field) and a weighted binary cross entropy, and the semantic segmentation model is then trained as a binary (dynamic/static) segmentation network using datasets such as the KITTI semantic segmentation dataset.
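A hedged PyTorch sketch of such a combined loss is shown below; `lovasz_hinge` stands in for any publicly available Lovász loss implementation and is assumed to be importable, and the mixing coefficient and positive-class weight are assumed values, not parameters stated in this disclosure.

```python
import torch
import torch.nn as nn

# Assumed external implementation (e.g. a public Lovász-loss module);
# it is only referenced here as a placeholder, not defined by the patent.
from lovasz_losses import lovasz_hinge  # hypothetical import

class BinarySegLoss(nn.Module):
    """Combined Lovász + weighted binary cross-entropy loss for the
    dynamic/static binary semantic segmentation described above."""

    def __init__(self, pos_weight=2.0, lovasz_weight=0.5):
        super().__init__()
        # pos_weight raises the cost of missing the (rarer) dynamic class
        self.bce = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(pos_weight))
        self.lovasz_weight = lovasz_weight

    def forward(self, logits, labels):
        # logits: (N,) per-point scores, labels: (N,) in {0, 1}
        return (self.lovasz_weight * lovasz_hinge(logits, labels)
                + (1.0 - self.lovasz_weight) * self.bce(logits, labels.float()))
```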
In some embodiments, other semantic segmentation algorithms may be used to perform semantic segmentation on the dynamic class and the static class on the original point cloud to obtain semantic tags of the static target and the dynamic target. Dynamic classes include objects, such as vehicles and/or pedestrians, that are temporarily stationary or in motion. The dynamic class may also include other objects that are temporarily stationary or in motion, and the specific class of dynamic class is not limited herein. Static classes include stationary objects such as trees, signal lights, and the like. The static class may also include other stationary objects in a stationary state, and the specific class of static class is not limited herein.
In some embodiments, the semantic segmentation process simplifies multi-category semantic segmentation into binary semantic segmentation, that is, binary semantic segmentation of dynamic and static categories is performed on the original point cloud, based on the point cloud semantic segmentation model obtained through training.
It can be seen that in some embodiments, the simplified binary classification can greatly improve the accuracy of semantic segmentation, and reduce the error of semantic segmentation on samples difficult to classify such as bicycles, trucks and the like.
In some embodiments, the points obtained by semantic segmentation that belong to the dynamic class and/or the points that belong to the static class are clustered to obtain the semantic space point cloud corresponding to the original point cloud. For example, Euclidean clustering may be employed; other clustering methods may also be used, and the specific clustering method is not limited here. When Euclidean clustering is employed, the search radius may be set to 0.13 meters and the distance threshold to 1 meter; both can be adjusted according to actual requirements and are not limited here.
It can be seen that in some embodiments, clustering of point clouds is performed on point clouds belonging to a dynamic class and/or point clouds belonging to a static class, so as to obtain a semantic space point cloud corresponding to an original point cloud at an object level.
In some embodiments, the semantic space point cloud comprises a plurality of semantic point cloud clusters. Each semantic point cloud cluster is provided with a corresponding semantic label. The semantic tags include: dynamic class labels or static class labels.
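A simple Euclidean-clustering sketch consistent with the parameters above (0.13 m search radius) is given below; the minimum cluster size and the dict-based storage of labels and centroids are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_cluster(points, labels, radius=0.13, min_points=10):
    """Cluster semantically segmented points into semantic point cloud clusters.

    points : (N, 3) coordinates; labels : (N,) semantic label per point
             (e.g. 0 = static, 1 = dynamic).
    Returns a list of dicts, one per cluster, with its semantic label and centroid.
    """
    clusters = []
    for lab in np.unique(labels):
        pts = points[labels == lab]
        tree = cKDTree(pts)
        unvisited = set(range(len(pts)))
        while unvisited:
            queue = [unvisited.pop()]      # grow one cluster by region growing
            members = []
            while queue:
                i = queue.pop()
                members.append(i)
                for j in tree.query_ball_point(pts[i], radius):
                    if j in unvisited:
                        unvisited.remove(j)
                        queue.append(j)
            if len(members) >= min_points:
                cluster_pts = pts[members]
                clusters.append({"label": int(lab),
                                 "centroid": cluster_pts.mean(axis=0)})
    return clusters
```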
In some embodiments, in step S120, the following is performed for each semantic point cloud cluster: the semantic point cloud cluster is determined as the target point cloud cluster; k original neighboring point cloud clusters whose semantic labels are identical to that of the target point cloud cluster are searched among the other semantic point cloud clusters in the semantic space point cloud; the target graphs and their areas or volumes are determined based on the target point cloud cluster and the k original neighboring point cloud clusters; and the semantic spatial features of the target point cloud cluster are determined based on those areas or volumes. This process may be carried out on the semantic point cloud clusters one after another, from the first to the last, or on all semantic point cloud clusters simultaneously; the timing of the process is not limited here.
It should be noted that, a certain semantic point cloud cluster is designated as a target point cloud cluster, which is only for convenience in distinguishing and describing the subsequent original adjacent point cloud cluster and the target graph, and does not mean that a specific target point cloud cluster exists in the point cloud.
In some embodiments, the other features include one or more of an average height feature, a reflection intensity feature, a highest height feature and a normal vector feature; the other feature pseudo images include one or more of an average height feature pseudo image, an average reflection intensity feature pseudo image, a highest height feature pseudo image and a normal vector feature pseudo image; and the semantic segmentation of the original point cloud includes semantic segmentation of dynamic objects and static objects on the original point cloud.
In some embodiments, for the other modalities in the multi-modal fusion, the average height feature along the z-axis of the point cloud or the reflection intensity feature of the point cloud may be selected; other features, such as the highest height along the z-axis or the normal vectors of the point cloud, may also be used.
It should be noted that, since the acquisition of the other features belongs to common knowledge in the art, a description of the acquisition process of the other features is not repeated here.
In some embodiments, determining k original neighboring point cloud clusters having the same semantic tag as the target point cloud cluster from other semantic point cloud clusters in the spatial point cloud clusters includes:
searching a plurality of other semantic point cloud clusters with the same semantic label as the target point cloud cluster from the other semantic point cloud clusters in the space point cloud clusters;
respectively calculating the centroid distances from the centroids of the plurality of other semantic point cloud clusters to the target centroids of the target point cloud clusters;
and determining k other semantic point cloud clusters closest to the target centroid as original adjacent point cloud clusters corresponding to the target point cloud cluster based on the centroid distance.
In some embodiments, after the semantic space point cloud is obtained, the centroid of each semantic point cloud cluster within the semantic space point cloud is calculated separately and stored. For example, a hash table or other means may be employed to store the centroid of each semantic point cloud cluster.
In some embodiments, a corresponding calculation may be performed on every semantic point cloud cluster having the same semantic label as the target point cloud cluster to obtain its centroid, and the centroids are then stored in a hash table or an adjacency list.
In some embodiments, the corresponding calculation may be performed only on the k original neighboring point cloud clusters to obtain their neighbor centroids, which are then stored in a hash table or an adjacency list.
In some embodiments, a plurality of other semantic point cloud clusters having the same semantic label as the target point cloud cluster may be searched directly based on the semantic label of the target point cloud cluster. The semantic tags may be dynamic class tags or static class tags. For example, when the semantic tag of the target point cloud cluster is a static type tag, a search is performed on the surrounding space (such as within a threshold distance range) of the target point cloud cluster to search out all semantic point cloud clusters whose semantic tags are also static type tags.
In some embodiments, since the centroid of each semantic point cloud cluster in the semantic space point cloud is calculated and stored before the original neighboring point cloud cluster is determined, coordinates of the centroid of the target point cloud cluster and the centroids of the searched plurality of other semantic point cloud clusters having the same semantic label as the target point cloud cluster can be directly obtained, and the centroid distance from the centroids of the other semantic point cloud clusters to the target centroid can be calculated based on the coordinates. Each of the other semantic point cloud clusters has a centroid distance corresponding to the centroid of the target.
In some embodiments, k is a positive integer. The value of k can be flexibly adjusted according to the requirements of actual application scenes. For example, k may be equal to 5.
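Building on the cluster list produced by the clustering sketch above, the same-label nearest-neighbor search can be sketched as follows (k = 5 by default; the data structure and function name are assumptions).

```python
import numpy as np

def k_same_label_neighbors(clusters, target_idx, k=5):
    """Find the k clusters sharing the target cluster's semantic label whose
    centroids are closest to the target centroid.

    clusters : list of dicts with "label" and "centroid" keys (see above).
    Returns the indices of the k neighboring clusters, nearest first.
    """
    target = clusters[target_idx]
    cand = [(i, np.linalg.norm(c["centroid"] - target["centroid"]))
            for i, c in enumerate(clusters)
            if i != target_idx and c["label"] == target["label"]]
    cand.sort(key=lambda t: t[1])          # sort by centroid distance
    return [i for i, _ in cand[:k]]
```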
In some embodiments, a certain semantic point cloud cluster in the semantic space point cloud may be designated as a first semantic point cloud cluster or a last semantic point cloud cluster according to actual application requirements.
It should be noted that, the first semantic point cloud cluster or the last semantic point cloud cluster is not a point cloud cluster representing a semantic space point cloud, but only for convenience of description of the subsequent target point cloud cluster and the original neighboring point cloud cluster.
In some embodiments, the determining the area or volume of the target graph and the target graph based on the target point cloud cluster and the k original neighboring point cloud clusters, and determining the semantic spatial feature of the target point cloud cluster based on the area or volume of the target graph includes:
determining a target centroid of a target point cloud cluster and neighbor centroids of k original neighbor point cloud clusters;
determining a plurality of target graphs based on the target centroid of the target point cloud cluster and the neighbor centroids of k original neighbor point cloud clusters;
and calculating the areas or volumes of the target graphs so as to take the sum of the areas or volumes of the target graphs as the semantic space characteristics of the target point cloud cluster.
In some embodiments, the target centroid of the target point cloud cluster and the neighbor centroids of the k original neighboring point cloud clusters are determined. Because the centroid of each semantic point cloud cluster in the semantic space point cloud is calculated and stored before the original neighboring point cloud clusters are determined, when a semantic point cloud cluster in the semantic space point cloud is determined as the target point cloud cluster, its target centroid is already available; likewise, when k semantic point cloud clusters other than the target point cloud cluster are determined as the k original neighboring point cloud clusters, their neighbor centroids are already available.
In some embodiments, a plurality of the target patterns are determined based on target centroids of the target point cloud clusters and neighbor centroids of k of the original neighbor point cloud clusters. For example, one nearest neighbor centroid and (k-1) next nearest neighbor centroids may be determined from the neighbor centroids of the k original nearest neighbor cloud clusters. Then, a target graph is determined based on one or more of the target centroid, nearest neighbor centroid and (k-1) next-nearest neighbor centroid.
In some embodiments, the target graph may be a planar figure. For example, the target graph may be a triangle. When the target graph is a triangle, one target graph (i.e., a target triangle) may be determined from the target centroid, the nearest neighbor centroid and any one of the (k-1) next-nearest neighbor centroids, i.e., these three centroids are connected to form one target graph. For another example, the target graph may be a quadrilateral. When the target graph is a quadrilateral, one target graph (i.e., a target quadrilateral) may be determined from the target centroid, the nearest neighbor centroid and any two of the (k-1) next-nearest neighbor centroids.
In some embodiments, the target graph may be a stereoscopic figure. For example, the target graph may be a triangular pyramid. When the target graph is a triangular pyramid, its base may be determined from the target centroid, the nearest neighbor centroid and one of the (k-1) next-nearest neighbor centroids, and another next-nearest neighbor centroid is then taken as the apex of the target graph, thereby determining the target graph.
In some embodiments, the area or volume of the plurality of target graphics is calculated, so that the sum of the areas or volumes of the plurality of target graphics is used as the semantic space feature of the target point cloud cluster. Since the area or volume of each target pattern is easily calculated by those skilled in the art after the target pattern is determined, a description of a process for calculating the sum of the areas or volumes of each target pattern will not be repeated here.
In some embodiments, referring to FIG. 6, where the target graph is a triangle, (k-1) target graphs may be determined based on the target point cloud cluster and the k original neighboring point cloud clusters, and the sum of the areas of these (k-1) target graphs can be used directly as the semantic spatial feature of the target point cloud cluster. Here P0 represents the centroid of the current target point cloud cluster, and P1 represents the nearest neighbor centroid of the nearest neighboring point cloud cluster; in the case where k equals 5, the four next-nearest neighbor centroids in FIG. 6 are P2, P3, P4 and P5, and S1, S2, S3 and S4 are the areas of the first to fourth target graphs, respectively. That is, the sum of S1, S2, S3 and S4 can be used directly as the semantic spatial feature of the current target point cloud cluster.
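The triangle areas S1 to S4 in FIG. 6 can be computed from the centroids with a cross product, as in the following sketch (the array layout and the function name are assumptions).

```python
import numpy as np

def triangle_areas(p0, neighbors):
    """Areas S_1..S_{k-1} of the triangles (P0, P1, Pi), i = 2..k, as in FIG. 6.

    p0        : (3,) target centroid P0.
    neighbors : (k, 3) neighbor centroids sorted by distance to p0, so that
                neighbors[0] is the nearest neighbor centroid P1.
    """
    p1 = neighbors[0]
    areas = []
    for pi in neighbors[1:]:                   # next-nearest neighbor centroids
        cross = np.cross(p1 - p0, pi - p0)     # cross product spans the triangle
        areas.append(0.5 * np.linalg.norm(cross))
    return np.array(areas)                     # sum(areas) = unweighted feature
```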
It should be noted that, since the calculation of the volumes of the plurality of target graphs corresponding to the target point cloud cluster is similar in principle to the calculation of their areas, the calculation process for the volumes of the target graphs is not repeated here.
It should be noted that, a person skilled in the art may determine the shape and the number of the target graphics corresponding to the target point cloud cluster according to the requirements of the actual application scenario, and the shape and the number of the target graphics are not limited herein.
In some embodiments, please refer to fig. 5, S122: the determining the area or volume of the target graph and the target graph based on the target point cloud cluster and the k original adjacent point cloud clusters, and determining the semantic space characteristics of the target point cloud cluster based on the area or volume of the target graph includes:
S122a: determining a target centroid of a target point cloud cluster and neighbor centroids of k original neighbor point cloud clusters;
s122b: determining a nearest neighbor centroid and (k-1) secondary nearest neighbor centroids from the neighbor centroids of the k original nearest neighbor cloud clusters;
s122c: determining (k-1) the target graphs based on a target centroid, a nearest neighbor centroid and a next nearest neighbor centroid of the target point cloud cluster;
s122d: calculating (k-1) distances between the secondary neighbor centroids and the target centroids, and (k-1) areas or volumes of the target patterns, respectively;
s122e: and weighting the areas or volumes of the (k-1) target graphs based on the distances corresponding to the target graphs, and taking the weighted areas or weighted volumes obtained by weighting as semantic space features of the target point cloud cluster.
In some embodiments, the distances from the neighbor centroids of the k original neighboring point cloud clusters to the target centroid may be calculated; the neighbor centroid closest to the target centroid is then determined as the nearest neighbor centroid, and the remaining (k-1) neighbor centroids are determined as the next-nearest neighbor centroids. A target graph can then be determined from the target centroid, the nearest neighbor centroid and one or more of the next-nearest neighbor centroids. For example, a target graph determined by the target centroid, the nearest neighbor centroid and one next-nearest neighbor centroid is a triangle, and a target graph determined by the target centroid, the nearest neighbor centroid and two next-nearest neighbor centroids is a quadrilateral. In this way, a plurality of target graphs can be determined based on the target centroid, the nearest neighbor centroid and the next-nearest neighbor centroids of the target point cloud cluster.
In some embodiments, the imaging of the point cloud (e.g., semantic spatial point cloud or original point cloud) is sparse as it is farther from the target centroid; at the same time, the error between the measured value and the true value of the spatial position of the point cloud farther from the target centroid is also larger, and therefore, the measurement uncertainty of the spatial position of the point cloud or the target pattern farther from the target centroid increases as the distance from the target centroid increases, compared with the target pattern closer to the target centroid, among the plurality of target patterns composed of the target centroid, the nearest neighbor centroid, and the next neighbor centroid. Thus, in some embodiments, it is desirable to reduce the area or volume of the target pattern that is farther from the target centroid by weighting the inverse of the distance of the center (e.g., centroid or geometric center) of the target pattern from the current target centroid. Therefore, it is also necessary to calculate the distances between the centroids of the (k-1) target patterns and the target centroids, and the areas or volumes of the plurality of target patterns, respectively. And then, weighting the areas or volumes of the plurality of target graphs based on the distances corresponding to the target graphs, and taking the weighted areas or weighted volumes obtained by weighting as semantic space features of the target point cloud cluster. Wherein the calculation formula for weighting the area of the target graph based on the distance from the center of the target graph (such as the centroid or geometric center) to the current target centroid is as follows:
In formula (1), S_spa is the weighted area obtained by weighting the areas of the target graphs corresponding to the target point cloud cluster; S_i are the areas of the individual target graphs corresponding to the target point cloud cluster; d_i are the distances from the centers (e.g., centroids or geometric centers) of the (k-1) target graphs to the current target centroid; τ is a distance threshold; α is a power transform coefficient; d_min is a preset minimum distance, generally 0.5 meter, which a person skilled in the art may adjust according to the actual application scenario. In some embodiments, the value of d_min is less than 5 meters. In some embodiments, k is 5, τ is 10, α is 1.5, and i is a positive integer ranging from 1 to (k-1). Of course, the values of k, τ and α may be flexibly adjusted based on the actual requirements of the application scenario.
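The image of formula (1) itself is not reproduced here; the following is only one plausible reconstruction that is consistent with the symbol definitions above (the indicator on the distance threshold τ and the use of d_min as a lower bound on the distance are assumptions, not the literal formula of the application):

S_{spa} \;=\; \sum_{i=1}^{k-1} \mathbf{1}\!\left[d_i \le \tau\right]\, \frac{S_i}{\left(\max\left(d_i,\, d_{\min}\right)\right)^{\alpha}}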
For example, referring to fig. 6, in the case where the target graph is a triangle and k is equal to 5, a nearest neighbor centroid P_1 and four secondary neighbor centroids (i.e., P_2, P_3, P_4 and P_5 in fig. 6) may be determined from the neighbor centroids of the 5 original neighboring point cloud clusters. Four target graphs (i.e., target triangles) may then be determined based on the target centroid, the nearest neighbor centroid and the secondary neighbor centroids of the target point cloud cluster. After that, the distances between the centroids of the four target graphs and the target centroid (i.e., P_0 in fig. 6), as well as the areas of the four target graphs, are calculated respectively, where S_1, S_2, S_3 and S_4 are the areas of the first to fourth target graphs. Finally, the area of each target graph is weighted based on the distance from the centroid of that target graph to the target centroid, and the weighted area obtained by the weighting is taken as the semantic space feature of the target point cloud cluster.
The purpose of setting the distance threshold τ in formula (1) is to consider only those secondary neighbor centroids whose distance to the current target centroid lies within the distance threshold of the target point cloud cluster. That is, in some embodiments only the target graphs whose secondary neighbor centroid lies within the distance threshold τ of the target point cloud cluster contribute, through their weighted area or weighted volume together with the target centroid and nearest neighbor centroid, to the local feature of the current target point cloud cluster.
It should be noted that, the calculation of the weighted volume is similar to the calculation of the weighted area, and thus, a description of the calculation of the weighted volume is omitted.
It can be seen that, compared with other point-cloud-based scene recognition methods, some embodiments introduce the area or volume of the target graphs as enhancement information to describe the spatial structure of the scene, while other embodiments introduce the weighted area or weighted volume of the target graphs as enhancement information. That is, the areas or volumes of the target graphs formed by the current target centroid, the nearest neighbor centroid and the secondary neighbor centroids are calculated, and either the sum of these areas or volumes is used directly as the semantic space feature of the target point cloud cluster, or the areas or volumes are weighted based on the distance from the center of each target graph to the current target centroid and the resulting weighted area or weighted volume is used as the semantic space feature, so that the local scene information of the current target point cloud cluster is described by its semantic space feature. The semantic space pseudo image corresponding to the semantic space features is then superimposed on the two-dimensional pseudo image generated by projecting the original point cloud, producing a semantic space enhanced pseudo image that carries global point cloud information while also containing local information.
It can also be seen that, in visual SLAM loop detection algorithms, the triangles formed by key points are usually not computed as an additional feature, and in point cloud registration algorithms the areas of triangles formed by point cloud feature points are used directly for similarity judgment. In some embodiments of the present application, by contrast, no feature points are extracted from the point cloud; instead, the point cloud is clustered and the centroid of each point cloud cluster is computed, only the nearest and next-nearest centroids of other point cloud clusters with the same semantic label are selected to construct the target graphs, the area/volume or weighted area/weighted volume of the target graphs is then calculated and used as a local feature of the scene (i.e., the semantic space feature), and subsequent computation is performed based on this semantic space feature.
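As an illustrative sketch only (not the claimed implementation): the clustering, the choice of nearest and next-nearest same-label centroids, and the parameter values k=5, τ=10, α=1.5, d_min=0.5 m are taken from the description above, while the triangle-area computation and the exact weighting form are assumptions. The semantic space feature of one target point cloud cluster could then be computed roughly as follows:

import numpy as np

def triangle_area(p0, p1, p2):
    # Area of the triangle spanned by three centroids (3-D points).
    return 0.5 * np.linalg.norm(np.cross(p1 - p0, p2 - p0))

def semantic_spatial_feature(target_centroid, same_label_centroids,
                             k=5, tau=10.0, alpha=1.5, d_min=0.5):
    """Weighted-area semantic space feature of one target point cloud cluster.

    target_centroid: (3,) centroid of the target cluster.
    same_label_centroids: (M, 3) centroids of the other clusters sharing the
    target cluster's semantic label."""
    target_centroid = np.asarray(target_centroid, dtype=float)
    same_label_centroids = np.asarray(same_label_centroids, dtype=float)

    # k original neighboring clusters: the k same-label centroids closest to the target centroid.
    dists = np.linalg.norm(same_label_centroids - target_centroid, axis=1)
    neighbors = same_label_centroids[np.argsort(dists)[:k]]

    # Nearest neighbor centroid plus (k-1) secondary (next-nearest) neighbor centroids.
    nearest, secondary = neighbors[0], neighbors[1:]

    s_spa = 0.0
    for p in secondary:
        # Target triangle: target centroid, nearest neighbor centroid, one secondary centroid.
        area = triangle_area(target_centroid, nearest, p)
        center = (target_centroid + nearest + p) / 3.0     # triangle centroid
        d = np.linalg.norm(center - target_centroid)
        if d > tau:                                         # distance threshold τ
            continue
        # Inverse-distance weighting with power transform α and distance floor d_min
        # (one plausible reading of formula (1)).
        s_spa += area / max(d, d_min) ** alpha
    return s_spa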
Referring to fig. 7, in the step S200, a semantic space enhancement feature is generated based on the semantic space feature and other features of the original point cloud, and a semantic space enhancement pseudo image is generated based on the semantic space enhancement feature, including:
s210a, acquiring the original point cloud or the semantic space point cloud;
s220a, setting a radius of a region of interest, and carrying out space division on point clouds in the radius of the region of interest according to polar coordinates on a top view to obtain a plurality of unit areas;
S230a, assigning and normalizing the unit area by utilizing other characteristics of the point cloud corresponding to the unit area;
s240a, overlapping the semantic space features corresponding to the unit area with other features corresponding to the unit area and normalizing again to obtain semantic space enhancement features corresponding to the unit area;
s250a, generating a semantic space enhancement pseudo image based on the semantic space enhancement features corresponding to the unit areas.
In some embodiments, the point cloud acquired in step S210a may be original point cloud data of a single key frame, or may be a spatial semantic point cloud of a single key frame. The point cloud acquired in step S210a is not limited here.
It should be noted that when the point cloud within the radius of the region of interest (i.e., the ROI radius) is spatially segmented according to polar coordinates on the top view, the spatial positions of the same point cloud cluster in the original point cloud and in the semantic space point cloud are identical, and the segmentation step merely projects the point cloud from three-dimensional space onto a two-dimensional plane; therefore, the point cloud data acquired in step S210a may be either the original point cloud or the semantic space point cloud. The original point cloud contains relatively more points, whereas the semantic space point cloud, being obtained by clustering the original point cloud, contains relatively fewer points.
In some embodiments, the information contained in a semantic space point cloud or original point cloud cluster Q_k can be expressed as:

Q_k = (x_i, y_i, z_i, l_i, m_i)    (2)

In formula (2), the subscript k of Q_k denotes the number of the centroid of the point cloud cluster in the semantic space point cloud or the original point cloud, and the subscript i denotes the i-th point cloud cluster; the value of k may be the same as or different from the value of i, i.e., the value range of k can be determined by a person skilled in the art. x_i, y_i and z_i are the coordinates, in the initial coordinate system, of the centroid P_k of the point cloud cluster of the semantic space point cloud or original point cloud; l_i is the semantic label of the i-th point cloud cluster in the semantic space point cloud; and m_i is the semantic space feature F_spa of the i-th point cloud cluster in the semantic space point cloud, or the other feature F_raw of the i-th point cloud cluster in the original point cloud.
In some embodiments, the step of setting a radius of the region of interest and spatially dividing the point cloud within that radius according to polar coordinates on the top view to obtain a plurality of unit regions is roughly as follows: within the region-of-interest radius, the radial distance is divided into L equal bins according to a set perceived distance resolution L_S, and the 360-degree polar angle is divided into N equal bins according to a horizontal angular resolution N_S, so that the polar plane is divided into L×N unit regions. Each unit region may have a fan shape (an annular sector).
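A minimal sketch of this polar-grid segmentation is given below; the ROI radius, L_S and N_S values are only the example numbers quoted later in the description, and the binning convention is an assumption:

import numpy as np

def polar_cell_index(points, l_roi=50.0, l_s=1.56, n_s_deg=0.4):
    """Map each point (x, y, z) inside the ROI to a (row, col) unit-region index.

    Rows follow the radial (perceived-distance) bins and columns the angular bins,
    as in the top-view segmentation described above."""
    points = np.asarray(points, dtype=float)
    rho = np.hypot(points[:, 0], points[:, 1])                           # polar radius on the top view
    theta = np.degrees(np.arctan2(points[:, 1], points[:, 0])) % 360.0   # polar angle in [0, 360)

    inside = rho < l_roi                                   # keep only points inside the ROI radius
    row = np.floor(rho[inside] / l_s).astype(int)          # radial bin index
    col = np.floor(theta[inside] / n_s_deg).astype(int)    # angular bin index
    return row, col, inside

# Example: with l_roi=50 m, l_s=1.56 m and n_s_deg=0.4 deg this yields
# roughly 32 radial rows by 900 angular columns.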
In some embodiments, from the coordinates of the i-th point cloud cluster on the point cloud top view (i.e., x_i, y_i, z_i in formula (2)), the polar coordinates of the centroid P_k of the i-th point cloud cluster in the semantic space point cloud or original point cloud can be obtained as:

ρ_k = sqrt(x_i² + y_i²)    (3)

θ_k = arctan(y_i / x_i)    (4)
For a preset horizontal angular resolution N_S and perceived distance resolution L_S, the range of the point cloud intercepted by each unit region is:

R_ij = { Q_k | (i-1)·L_S ≤ ρ_k < i·L_S, (j-1)·N_S ≤ θ_k < j·N_S, ρ_k ≤ L_ROI }    (5)

In formula (5), L_ROI is the preset radius of the region of interest; N_S is the horizontal angular resolution; L_S is the perceived distance resolution; i and j denote, respectively, the i-th row and the j-th column of the semantic space enhanced pseudo image or other feature pseudo image (in some embodiments, the upper limit of i is 64 and the upper limit of j is 900); and R_ij denotes the point cloud intercepted by a single unit region.
The subscript i in formula (2) and the i in R_ij in formula (5) have different meanings. The subscript i in formula (2) denotes the i-th point cloud cluster, whereas the i in R_ij denotes the i-th row of the projected pseudo image. For example, if there are 1000 point clouds in total, the projection of the i-th point cloud (subscript i in formula (2)) does not necessarily fall in the i-th row of the pseudo image (i in R_ij of formula (5)); the value of i for the point clouds ranges from 0 to 999, whereas the value of i in the pseudo image generally ranges from 0 to 64.
It should be noted that the resolution of the semantic space enhanced pseudo image or other feature pseudo image can be expressed as (L_ROI / L_S) × (360° / N_S), where i denotes the row index of the generated pseudo image, with value range 1 ≤ i ≤ L_ROI / L_S, and j denotes the column index, with value range 1 ≤ j ≤ 360° / N_S. For example, when L_ROI is 50 meters, L_S is 1.56 meters and N_S is 0.4 degrees, the resulting pseudo image has a resolution of 900×32 (900 columns by about 32 rows).
In some embodiments, in S230a, the unit region is assigned a value and normalized using the other features of the point cloud corresponding to that unit region. First, each modal feature (such as the semantic space feature, the corrected semantic space feature or another feature) is normalized according to the preset upper and lower limits of the feature region of interest. The normalization may take the following form:

F*(i, j) = (F(i, j) - F_down) / (F_up - F_down)    (6)

In formula (6), F_down is the lower limit of the feature region of interest and F_up is its upper limit; i and j denote, respectively, the row and column coordinates of the modal feature in its corresponding two-dimensional pseudo image; F(i, j) denotes the modal feature before the first normalization (such as the semantic space feature F_spa, the corrected semantic space feature, or another feature); and F*(i, j) denotes the modal feature after the first normalization (such as the semantic space feature F_spa, the corrected semantic space feature, or the other feature F_raw).
It should be noted that a person skilled in the art may set different F_up and F_down for different modal features. For example, when the modal feature in formula (6) is the height feature of the original point cloud, F_up is 10 and F_down is -2; when the modal feature is the reflection intensity feature of the point cloud, F_up and F_down are 255 and 0, respectively; and when the modal feature is the semantic space feature of the semantic space point cloud, F_up and F_down are 15 and 0, respectively.
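A sketch of this first normalization step, using the per-feature bounds quoted above; the clipping to [0, 1] for values outside the feature region of interest is an assumption:

import numpy as np

# Upper/lower bounds of the feature region of interest for each modality,
# as listed above (height, reflection intensity, semantic space feature).
FEATURE_BOUNDS = {
    "height":            (-2.0, 10.0),    # (F_down, F_up)
    "reflect_intensity": (0.0, 255.0),
    "semantic_spatial":  (0.0, 15.0),
}

def normalize_feature(pseudo_image, modality):
    """Min-max normalize one modality pseudo-image with its (F_down, F_up) bounds."""
    f_down, f_up = FEATURE_BOUNDS[modality]
    out = (np.asarray(pseudo_image, dtype=float) - f_down) / (f_up - f_down)
    return np.clip(out, 0.0, 1.0)          # restrict to the set range so metrics are comparable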
Other features of the point cloud within the cell region (such as an average height feature or an average reflected intensity feature within the cell region, etc.) serve as pixel values for the cell region in the other feature pseudo-image.
In some embodiments, one cell region corresponds to one pixel in the pseudo-image corresponding to the cell region.
In some embodiments, S240a, the semantic space features corresponding to the unit region are respectively overlapped with other features corresponding to the unit region and normalized again to obtain semantic space enhancement features. The step of respectively superposing and normalizing the semantic space features with other features of the original point cloud may include:
First, the unit areas generated by the aforementioned spatial division are assigned with other features (such as height features or reflection intensity features) of the original point cloud, and normalized to obtain two-dimensional pseudo images corresponding to the other features (such as height features or reflection intensity features).
It should be noted that, it is common knowledge in the art to generate other feature pseudo images corresponding to the original point cloud by using other features (such as the height feature or the reflection intensity feature) of the point cloud, so that a description of generating the pseudo images corresponding to the other features based on the other features is not repeated here.
Then, the normalized other feature of each unit region is superimposed with the semantic space feature corresponding to that unit region to obtain the semantic space enhancement feature (such as the semantic space enhanced average height feature (SSE-HSC) or the semantic space enhanced average reflection intensity feature (SSE-ISC)), and a second normalization is performed; that is, the unit region is reassigned using the result of superimposing its other feature with its corresponding semantic space feature. For example, an image coordinate system is established with the polar radius (i.e., ρ_k in formula (3)) direction as the y-axis and the polar angle (i.e., θ_k in formula (4)) direction as the x-axis. The semantic space enhancement feature value corresponding to each unit region is used as the pixel value of that unit region in the semantic space enhanced pseudo image.
It should be noted that after the point cloud is spatially segmented, one unit region may contain several point cloud clusters, while one unit region corresponds to only one pixel in the two-dimensional pseudo image, so the multiple points within it need to be feature-pooled. For example, if the other feature is the height feature, the average height feature of the unit region needs to be computed; and since the height distribution of the point cloud differs between scenes, the other feature must first be normalized to unify the metric and constrain it to a set height range for comparison. The other feature after the first normalization is then added to the semantic space feature, i.e., the addition is performed under the same metric; without the normalization the two would have different scales, so normalizing makes the other feature and the semantic space feature dimensionally comparable.
In some embodiments, the semantic space enhancement features of the point cloud clusters included in one unit area serve as pixel values of the unit area in the semantic space enhancement pseudo-image. Wherein, a unit area may correspond to a part or all of a point cloud cluster. One unit area may correspond to a part or all of the plurality of point cloud clusters. A point cloud cluster may also be divided into a plurality of cell areas.
In some embodiments, the semantic space feature or the corrected semantic space feature after the first normalization is superimposed or fused with the other feature corresponding to the unit region in the manner of formula (7), so as to obtain the semantic space enhancement feature corresponding to the unit region. In formula (7), F_spa denotes the semantic space feature or the corrected semantic space feature, F_raw denotes the other feature, and F denotes the semantic space enhancement feature obtained by fusing the (corrected) semantic space feature with the other feature at the pixel level.
In some embodiments, one skilled in the art may select other features according to actual needs, for example, the other features may be reflection intensity features, height features, normal vector features, or the like. For example, the semantic space enhancement features may be SSE-HSC (semantic space enhanced mean height features) or SSE-ISC (semantic space enhanced mean reflectance intensity features), etc. The semantic space enhanced average height features are obtained by superposing semantic space features and average height features. The semantic space enhanced average reflection intensity characteristic is obtained by superposing the semantic space characteristic and the reflection intensity characteristic.
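Since the exact form of formula (7) is not reproduced above, the sketch below simply adds the normalized other feature to the normalized semantic space feature and re-normalizes the result; the additive form and the min-max second normalization are assumptions used only to illustrate the pixel-level superposition that yields SSE-HSC or SSE-ISC:

import numpy as np

def semantic_space_enhance(other_norm, spa_norm):
    """Pixel-level fusion of a normalized other-feature pseudo-image (e.g. average
    height) with the normalized semantic space feature pseudo-image.

    Both inputs share the same (rows, cols) polar grid; the output is the
    semantic space enhancement feature per unit region (e.g. SSE-HSC)."""
    fused = np.asarray(other_norm, dtype=float) + np.asarray(spa_norm, dtype=float)
    lo, hi = fused.min(), fused.max()
    if hi > lo:                                   # second normalization, back to [0, 1]
        fused = (fused - lo) / (hi - lo)
    return fused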
In some embodiments, a Scan-Context method may be employed to generate the pseudo-image. In some embodiments, a feature map calculation method such as Iris (i.e., a point cloud Iris descriptor) may also be selected to generate the pseudo image.
It should be noted that, since the generation of the semantic space enhancement pseudo-image and the other feature pseudo-image based on the semantic space enhancement feature and the other feature in the step S250a is common knowledge in the art, a specific process for generating the semantic space enhancement pseudo-image and the other feature pseudo-image is not repeated here.
In some embodiments, please refer to fig. 8, S200, generating semantic space enhancement features based on the semantic space features and other features of the original point cloud, generating semantic space enhancement pseudo-images based on the semantic space enhancement features includes:
s210b, acquiring the original point cloud or the semantic space point cloud;
s220b, setting a radius of an interested region, and carrying out space division on point clouds in the radius of the interested region according to polar coordinates on a top view to obtain a plurality of unit regions;
s230b, assigning and normalizing the unit area by utilizing other characteristics of the point cloud corresponding to the unit area;
S240b, correcting semantic space features corresponding to the unit area according to semantic tags of the target point cloud clusters corresponding to the unit area so as to obtain corrected semantic space features corresponding to the unit area;
s250b, overlapping and normalizing the corrected semantic space features corresponding to the unit areas with other features corresponding to the unit areas respectively to obtain corrected semantic space enhancement features corresponding to the unit areas;
and S260b, generating a semantic space enhancement pseudo image based on the corrected semantic space enhancement features corresponding to the unit areas.
Note that the generation flow of the semantic space enhanced pseudo image based on the corrected semantic space enhancement features (S210b to S260b) is substantially the same as the generation flow of the semantic space enhanced pseudo image described above (S210a to S250a), so the repeated portions are not described again.
In some embodiments, dynamic objects can cause considerable interference to loop detection: for example, when the semantic label of a secondary neighboring point cloud cluster far from the target centroid is a dynamic object, the shape of the corresponding target graph changes as the object moves, and the distances between the target centroid and the other secondary neighbor centroids also change considerably; static objects, by contrast, better represent the invariant features of a scene. Therefore, when generating the two-dimensional pseudo image, the semantic space features are corrected based on different dynamic weights corresponding to the semantic labels to obtain the corrected semantic space feature of each unit region, and this corrected semantic space feature is used as the pixel value of the unit region. The correction of the semantic space feature Q_ij based on the dynamic weights corresponding to different semantic labels may be performed as follows:
P_ij = (1 + λ) · Q_ij    (8)
In formula (8), P_ij denotes the corrected semantic space feature after semantic weighted correction, i and j denote, respectively, the row and column coordinates of the semantic space feature in the corresponding semantic space enhanced pseudo image, and λ is the dynamic weight corresponding to the semantic label. For example, if the semantic label of the point cloud cluster represents a static object, λ in formula (8) is 0.3; if the semantic label represents a dynamic object, λ in formula (8) is -0.3.
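A sketch of the semantic weighting of formula (8), using the example weights quoted above (+0.3 for static labels, -0.3 for dynamic labels); the concrete set of dynamic labels is a hypothetical assumption, since in practice it comes from the semantic segmentation model:

# Hypothetical set of dynamic semantic labels (assumption, for illustration only).
DYNAMIC_LABELS = {"car", "truck", "pedestrian", "cyclist"}

def correct_semantic_feature(q_ij, label):
    """Apply P_ij = (1 + lambda) * Q_ij with a label-dependent dynamic weight."""
    lam = -0.3 if label in DYNAMIC_LABELS else 0.3   # weaken dynamic objects, highlight static ones
    return (1.0 + lam) * q_ij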
It can be seen that, compared with existing lidar scene recognition algorithms, in dynamic scenes and long-distance scenes some embodiments correct the semantic space features based on the different weights corresponding to the semantic labels, use the corrected semantic space features of the unit regions as the pixel values of those unit regions in the corresponding pseudo images, and perform subsequent loop detection with these corrected features; the subsequent detection therefore yields more robust scene recognition results, with higher recognition accuracy and better resistance to interference from dynamic targets.
It should be noted that, since the generation of the semantic space enhanced pseudo image based on the corrected semantic space enhanced feature in the step S260b is common knowledge in the art, a specific process of generating the semantic space enhanced pseudo image is not described here.
In some embodiments, please refer to fig. 2, S300: the subsequent loop detection based on the semantic space enhancement pseudo image comprises the following steps:
s310a: taking the semantic space enhancement pseudo image as a first descriptor;
s320a: and carrying out subsequent loop detection based on the first descriptor.
Subsequent loop-back detection based on the first descriptor includes:
constructing a KD-Tree based on the first descriptor to generate a historical map set, and determining a key frame in a historical frame corresponding to the historical map set according to a time interval and a similarity value;
searching and matching a current frame in the historical map set, and judging that loop-back is detected if the similarity value of the current frame and the key frame is larger than a preset threshold value;
and if the similarity value of the current frame and the key frame is smaller than or equal to a preset threshold value, judging that the current frame does not meet loop-back, and generating a new key frame by using the KD-Tree.
In some embodiments, the semantic space enhanced pseudo image in step S310a may be a semantic space enhanced reflection intensity pseudo image or a semantic space enhanced average intensity pseudo image, etc.
It should be noted that, the subsequent loop detection based on the first descriptor in step S320a belongs to common knowledge in the art, so a detailed description of the subsequent loop detection based on the first descriptor is not repeated.
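Even so, a minimal sketch of the KD-Tree based retrieval flow described above may be useful. Here the pseudo-image descriptor is assumed to be flattened into a one-dimensional vector, and the cosine similarity measure and the threshold value are assumptions; only the overall flow (historical map set, similarity threshold, new key-frame generation) follows the steps above:

import numpy as np
from scipy.spatial import cKDTree

class LoopDetector:
    def __init__(self, sim_threshold=0.9):
        self.key_descriptors = []        # historical map set (key-frame descriptors)
        self.tree = None                 # KD-Tree over the key-frame descriptors
        self.sim_threshold = sim_threshold

    @staticmethod
    def _cosine_sim(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def query(self, descriptor):
        """Search the current frame in the historical map set.

        Returns the matched key-frame index if a loop is detected; otherwise the
        frame is registered as a new key frame and the KD-Tree is rebuilt."""
        descriptor = np.asarray(descriptor, dtype=float).ravel()
        if self.tree is not None:
            _, idx = self.tree.query(descriptor, k=1)              # nearest key frame
            sim = self._cosine_sim(descriptor, self.key_descriptors[idx])
            if sim > self.sim_threshold:                           # similarity above preset threshold
                return int(idx)                                    # loop closure detected
        # No loop: generate a new key frame and rebuild the KD-Tree.
        self.key_descriptors.append(descriptor)
        self.tree = cKDTree(np.vstack(self.key_descriptors))
        return None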
In some embodiments, please refer to fig. 3, S300: the subsequent loop detection based on the semantic space enhancement features and the semantic space enhancement pseudo-images comprises:
s310b: calculating a global descriptor of the semantic space enhancement pseudo image;
s320b: converting the global descriptor into a one-dimensional feature vector, and taking the one-dimensional feature vector as a second descriptor; and carrying out subsequent loop detection based on the second descriptor.
Subsequent loop-back detection based on the second descriptor includes:
constructing a KD-Tree based on a second descriptor to generate a historical map set, and determining a key frame in a historical frame corresponding to the historical map set according to a time interval and a similarity value;
searching and matching a current frame in the historical map set, and judging that loop-back is detected if the similarity value of the current frame and the key frame is larger than a preset threshold value;
And if the similarity value of the current frame and the key frame is smaller than or equal to a preset threshold value, judging that the current frame does not meet loop-back, and generating a new key frame by using the KD-Tree.
In some embodiments, GIST (an image global information feature that can be used to recognize and classify scenes) may be used to compute the global descriptor of the semantic space enhanced pseudo image, i.e., to extract its global features. The GIST computation proceeds as follows: first, Gabor filters (a Gabor function is a linear filter used for edge extraction; it is well suited to texture expression and separation, and its frequency and orientation responses are similar to those of the human visual system) are set up in 4 scales and 8 orientations, and the image is filtered to obtain 32 filtered images; then each filtered image is divided into 4×4 regions and the mean pixel value of each region is computed; finally, the feature vector is formed by concatenating the 4×8×4×4 = 512 region means, i.e., 512 feature descriptors. The GIST feature may be computed in the following form:
Gist(i, j) = cat( P(i, j) · g_mn(i, j) )    (9)
In formula (9), P(i, j) denotes the pixel value of the unit region in the i-th row and j-th column of the semantic space enhanced pseudo image; for example, P(i, j) may be the corrected semantic space feature P_ij of formula (8). g_mn(i, j) is a two-dimensional Gabor function, and cat denotes the concatenation operation.
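A sketch of the GIST computation just described (4 scales × 8 orientations of Gabor filters, 4×4 grid averaging, 512-dimensional output); the concrete Gabor parameters (kernel size, sigma, wavelength per scale) are illustrative assumptions, not values given in the description:

import cv2
import numpy as np

def gist_descriptor(pseudo_image, n_scales=4, n_orient=8, grid=4):
    """GIST-style global descriptor of a (rows, cols) pseudo-image.

    n_scales * n_orient filtered images, each averaged over a grid x grid
    partition, are concatenated into an n_scales*n_orient*grid*grid vector
    (4*8*4*4 = 512 by default)."""
    img = np.asarray(pseudo_image, dtype=np.float32)
    h, w = img.shape
    feats = []
    for s in range(n_scales):
        lambd = 4.0 * (2 ** s)                        # assumed wavelength per scale
        for o in range(n_orient):
            theta = o * np.pi / n_orient
            kernel = cv2.getGaborKernel((15, 15), sigma=3.0, theta=theta,
                                        lambd=lambd, gamma=0.5, psi=0.0)
            filtered = cv2.filter2D(img, -1, kernel)  # one of the 32 filtered images
            # Average the pixels of each of the grid x grid regions.
            for i in range(grid):
                for j in range(grid):
                    block = filtered[i * h // grid:(i + 1) * h // grid,
                                     j * w // grid:(j + 1) * w // grid]
                    feats.append(float(block.mean()))
    return np.asarray(feats)                          # 512-dimensional GIST feature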
It should be noted that, those skilled in the art may also use other methods (such as VFH (viewpoint feature histogram), PFH (point feature histogram), and FPFH (fast point feature histogram)) to extract the global features of the semantic space enhanced pseudo-image, and the specific method for extracting the global features is not limited herein.
It should be noted that the specific process of converting the global descriptor into a one-dimensional feature vector, using the one-dimensional feature vector as the second descriptor, and performing subsequent loop detection based on the second descriptor belongs to common general knowledge in the art.
In some embodiments, please refer to fig. 4, S300: the subsequent loop detection based on the semantic space enhancement features and the semantic space enhancement pseudo-images comprises:
s310c: calculating a global descriptor of the semantic space enhancement pseudo image;
s320c: and carrying out weighted self-adaptive fusion on the global descriptors of the semantic space enhancement features to obtain a third descriptor, and carrying out subsequent loop detection based on the third descriptor.
In some embodiments, the global angle of the semantic space enhancement pseudo-image may be modified prior to extracting the global features of the semantic space enhancement pseudo-image in order to ensure that the semantic space enhancement pseudo-image has a globally consistent point cloud angle. Thus, referring to FIG. 9, the global descriptor of the semantic space enhanced pseudo-image may be calculated as follows:
S311c: the method comprises the steps of fusing the self-vehicle global pose estimated by a laser radar odometer, a GPS (Global Positioning System, namely a global positioning system) and an IMU (Inertial Measurement Unit, namely an inertial measurement unit) through factor graph optimization to estimate the rotation angle (namely a global correction angle) of the point cloud;
s312c: unifying the semantic space enhanced pseudo images to a coordinate system at the initial moment based on the global correction angle, namely, shifting the pseudo images according to columns;
s313c: global features of the pseudo-image are enhanced by using GIST to extract semantic space.
The data input sources of the factor graph optimization are the global pose of the vehicle estimated by the GPS, the IMU and the laser odometer respectively, and then the data of the various sensors (namely the global pose of the vehicle) are fused through the factor graph optimization so as to obtain the optimized global pose of the vehicle. The optimized global pose of the vehicle comprises a global yaw angle, namely a global correction angle, optimized at the current moment. Then, based on the global correction angle, yaw angle correction is performed on the semantic space enhanced pseudo image. For example, an image correction amount of the pseudo image displacement may be calculated based on the angular resolution used to generate the aforementioned pseudo image, and then the image correction amount may be used to unify the reference angles of the semantic space enhanced pseudo image into the same coordinate system, so as to ensure rotational invariance of the point cloud features.
In the pseudo images described above, an angular difference between scenes manifests itself as a column-wise displacement of the image. This arises because the vehicle coordinate system has rotated; the vehicle coordinate system therefore needs to be corrected back to the coordinate system of the initial moment, which requires the optimized global yaw angle. The rotation of the vehicle coordinate system is then corrected with the global yaw angle, and the column displacement of the image (i.e., the image correction amount described above) is corrected accordingly. The process of correcting the global angle of the pseudo image belongs to common knowledge in the art, so a detailed description of the global angle correction is omitted here.
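Even so, a minimal sketch of the column-shift correction is given below: the optimized global yaw angle is converted into a whole-column displacement using the angular resolution used to generate the pseudo image, and the image is shifted cyclically along its columns. The rounding convention and the cyclic shift are assumptions:

import numpy as np

def correct_yaw(pseudo_image, yaw_deg, n_s_deg=0.4):
    """Shift the pseudo-image by whole columns so that all frames share the
    reference angle of the initial coordinate system.

    yaw_deg: optimized global yaw angle from the factor-graph fusion of
    LiDAR odometry, GPS and IMU; n_s_deg: horizontal angular resolution."""
    shift_cols = int(round(yaw_deg / n_s_deg))        # image correction amount, in columns
    return np.roll(pseudo_image, shift_cols, axis=1)  # cyclic shift preserves all pixels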
Fusing the global descriptors corresponding to feature modalities such as the semantic space enhancement features at the pixel level is time-consuming, and the global descriptor obtained by pixel-level fusion does not significantly improve the accuracy of subsequent loop detection. Therefore, in some embodiments, the global descriptors corresponding to the feature modalities (such as the various semantic space enhancement features) are fused at the feature level, so that the third descriptor obtained by feature-level fusion offers better real-time performance while preserving the accuracy of subsequent loop detection.
In some embodiments, referring to fig. 10, the one-dimensional global descriptors of the individual feature modalities (for example, the one-dimensional global descriptor SSE-HSC-GIST of the semantic space enhanced average height feature acquired in step S313a and the one-dimensional global descriptor SSE-ISC-GIST of the semantic space enhanced average reflection intensity feature acquired in step S313b) may, in step S320c, be adaptively fused using the Softmax function to obtain the fused descriptor (SSE-MMSC), which is used as the third descriptor. The adaptive fusion may take the form of formula (10), whose terms are defined as follows:
In formula (10), G_i is the one-dimensional global descriptor corresponding to a single modality (e.g., the global descriptor SSE-HSC-GIST of the semantic space enhanced height feature, or the global descriptor SSE-ISC-GIST of the semantic space enhanced reflection intensity feature); G_k runs over each of the one-dimensional global descriptors; and F is the third descriptor obtained by the adaptive fusion.
Softmax is also known as the normalized exponential function. It is the generalization of the two-class sigmoid function to multiple classes, and its purpose is to present the multi-class result in the form of a probability distribution.
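Since formula (10) is not reproduced above, the sketch below fuses the per-modality one-dimensional global descriptors with element-wise Softmax weights; this element-wise weighting is an assumed reading of the adaptive fusion described here, not the literal formula:

import numpy as np

def softmax_fuse(descriptors):
    """Adaptively fuse several one-dimensional global descriptors
    (e.g. SSE-HSC-GIST and SSE-ISC-GIST) into a single third descriptor.

    descriptors: list of 1-D arrays of identical length."""
    G = np.vstack([np.asarray(d, dtype=float) for d in descriptors])  # (n_modalities, dim)
    expG = np.exp(G - G.max(axis=0))            # numerically stable Softmax over modalities
    weights = expG / expG.sum(axis=0)           # per-element fusion weights
    return (weights * G).sum(axis=0)            # weighted adaptive fusion (SSE-MMSC)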
Subsequent loop-back detection based on the third descriptor includes:
Constructing a KD-Tree based on a third descriptor to generate a historical map set, and determining a key frame in a historical frame corresponding to the historical map set according to a time interval and a similarity value;
searching and matching a current frame in the historical map set, and judging that loop-back is detected if the similarity value of the current frame and the key frame is larger than a preset threshold value;
and if the similarity value of the current frame and the key frame is smaller than or equal to a preset threshold value, judging that the current frame does not meet loop-back, and generating a new key frame by using the KD-Tree.
It should be noted that, constructing KD-Tree based on the first descriptor or the second descriptor or the third descriptor to generate the historical atlas belongs to common general knowledge in the art, and will not be described in detail.
In some embodiments, map key frames are selectively generated in all historical frames corresponding to the historical atlas according to time intervals and similarity values. For example, map key frames may be generated based on similarity values from frame to frame in historical frames. Map key frames are generated when the similarity value between frames in the historical frames is less than a certain threshold.
In some embodiments, the similarity value between frames in the historical frames may be calculated according to formulas (11) to (14), whose terms are as follows. In formula (11), P denotes the current frame and P' denotes the key frame; in formula (13), lat denotes latitude, lng denotes longitude, and r denotes the earth radius; in formula (14), σ_DOP denotes the GPS positioning precision factor, ω is the GPS confidence weight, and Sim_F is the similarity value calculated from the features.
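The exact forms of formulas (11) to (14) are not reproduced above; the sketch below combines a feature-based similarity with a GPS distance prior via the haversine great-circle formula. The way the feature similarity, σ_DOP and ω are combined, and the distance limit, are assumptions made only for illustration:

import numpy as np

def haversine(lat1, lng1, lat2, lng2, r=6371000.0):
    """Great-circle distance (meters) between two GPS fixes; r is the earth radius."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2.0 * r * np.arcsin(np.sqrt(a))

def keyframe_similarity(feat_p, feat_q, gps_p, gps_q,
                        sigma_dop=1.0, omega=0.5, max_dist=20.0):
    """Similarity between the current frame P and a key frame P'.

    feat_*: one-dimensional descriptors; gps_*: (lat, lng) fixes.
    Sim_F is taken as cosine similarity; the GPS term down-weights candidates
    whose true distance exceeds max_dist (the a-priori distance limit)."""
    sim_f = float(np.dot(feat_p, feat_q) /
                  (np.linalg.norm(feat_p) * np.linalg.norm(feat_q) + 1e-12))
    d = haversine(gps_p[0], gps_p[1], gps_q[0], gps_q[1])
    gps_term = np.exp(-(d / (max_dist * sigma_dop)) ** 2)   # confidence from GPS positioning
    return omega * gps_term + (1.0 - omega) * sim_f         # GPS confidence weight omega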
It can be seen that, compared with other loop detection algorithms, some embodiments introduce spatial topology information (i.e., the aforementioned target graphs and the semantic space features derived from them) to describe the scene, which makes the description and recognition of different scenes more distinctive while also strengthening the similarity between identical scenes. Drawing on point cloud registration and on methods used in visual loop closure, the area or volume of the target graph formed by the target centroid, the nearest neighbor centroid and the next-nearest neighbor centroids is calculated, and the semantic space feature is computed from this area or volume. The semantic space feature is then superimposed/fused with other features (such as the height feature or the reflection intensity feature of the point cloud) to obtain the semantic space enhancement feature, so that the semantic space enhancement feature can describe the global features of the point cloud while retaining local features of the same class. In addition, semantic constraints are introduced into the semantic space enhancement features obtained through semantic segmentation and clustering: different weights are set according to the semantic labels, the semantic space features are corrected to obtain corrected semantic space features, and subsequent detection is performed on this basis, thereby weakening the features of dynamic targets, reducing their interference with scene recognition, and improving the robustness of subsequent loop detection in complex scenes.
It should be noted that, in the description of the scene, the dynamic object is not a key description of the scene, and the static object is a key description of the scene, so that the features of the static object need to be highlighted, and the features of the dynamic object are weakened.
It can be seen that, unlike prior-art scene recognition methods that describe the current scene with only one modal feature, some embodiments design a multi-modal fusion algorithm: the global descriptors of several semantic space enhancement features are fused by weighted adaptive fusion into a third descriptor, and subsequent loop detection is performed based on this third descriptor, so that the scene description is strengthened by a descriptor that fuses global feature descriptors of several different modalities. In the multi-modal fusion process, the fusion algorithm highlights the modal features that are most distinctive in the current scene, so that subsequent loop detection can adaptively adjust the modal fusion weights according to the characteristics of different scenes and ultimately distinguish different scenes more accurately. Moreover, unlike some prior-art scene recognition methods that use two-dimensional descriptors, some embodiments use a one-dimensional multi-modal fused descriptor (i.e., the third descriptor), which reduces the amount of computation in key-frame matching and makes the scene recognition method of the present application better suited to large-scale automatic driving scenarios and the like.
It can be seen that, compared with some prior-art two-stage loop detection methods that require coarse matching followed by fine matching, some embodiments of the present application use single-stage loop detection: with the globally consistent angle corrected during motion, the closed-loop frame description, the similarity comparison and the retrieved features are unified, which avoids the large time cost of the brute-force search performed in the fine-matching stage of two-stage methods.
It can be seen that in some embodiments, global coordinates of GPS (i.e., longitude and latitude in equation (13) above) are introduced into the similarity measure of the closed-loop keyframes, and the screening capability of the similarity measure can be further enhanced by a priori limitation on the true distance, so that the occurrence of false detection is significantly reduced.
Compared with the prior art that only one type of feature description of the point cloud is used for loop detection, in some embodiments of the present application, features (such as semantic space features and various other features) of various point clouds are fused, and a feature fusion algorithm is designed to adaptively fuse multi-mode features obtained by feature extraction to obtain a third descriptor, so that a loop detection process based on the third descriptor can be well adapted to various outdoor scenes.
The above are some descriptions of the automatic driving scene recognition method fusing semantic point clouds. A computer-readable storage medium is also disclosed in some embodiments of the present application. The computer-readable storage medium described above includes a program. The above-described program can be executed by a processor to implement the automatic driving scene recognition method as any of the embodiments herein.
Those skilled in the art will appreciate that all or part of the functions of the methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions are implemented by a computer program, the program may be stored in a computer-readable storage medium, which may include a read-only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, or the like; the functions are realized when the program is executed by a computer. For example, the program may be stored in a memory of the device, and all or part of the functions can be realized when the program in the memory is executed by a processor. The program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk or a removable hard disk, and downloaded or copied into a memory of a local device, or used to update the system version of the local device; when the program in that memory is executed by a processor, all or part of the functions in the above embodiments can likewise be realized.
The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the invention pertains, based on the idea of the invention.

Claims (10)

1. An automatic driving scene recognition method integrating semantic point clouds is characterized by comprising the following steps:
acquiring an original point cloud, generating a semantic space point cloud based on the original point cloud, and calculating semantic space characteristics of the semantic space point cloud;
generating semantic space enhancement features based on the semantic space features and other features of the original point cloud, and generating semantic space enhancement pseudo images based on the semantic space enhancement features;
and performing loop detection based on the semantic space enhanced pseudo image.
2. The automatic driving scene recognition method according to claim 1, characterized in that: generating a semantic space point cloud based on the original point cloud, and calculating semantic space characteristics of the semantic space point cloud, wherein the semantic space characteristics comprise:
performing semantic segmentation on the original point cloud, clustering the original point cloud subjected to the semantic segmentation to obtain semantic space point cloud corresponding to the original point cloud, wherein the semantic space point cloud comprises a plurality of semantic point cloud clusters, each semantic point cloud cluster is provided with a corresponding semantic label, and calculating and storing the mass center of each semantic point cloud cluster;
And carrying out the process on each semantic point cloud cluster: determining the semantic point cloud cluster as a target point cloud cluster, determining k original adjacent point cloud clusters with semantic labels identical to the target point cloud clusters from other semantic point cloud clusters in the semantic space point cloud, determining areas or volumes of a target graph and the target graph based on the target point cloud clusters and the k original adjacent point cloud clusters, and determining semantic space features of the target point cloud clusters based on the areas or volumes of the target graph.
3. The automatic driving scene recognition method according to claim 2, characterized in that: the determining the area or volume of the target graph and the target graph based on the target point cloud cluster and the k original adjacent point cloud clusters, and determining the semantic space characteristics of the target point cloud cluster based on the area or volume of the target graph includes:
determining a target centroid of the target point cloud cluster and neighbor centroids of the k original neighbor point cloud clusters;
determining a plurality of target graphs based on the target centroid of the target point cloud cluster and the neighbor centroids of k original neighbor point cloud clusters;
and calculating the areas or volumes of the target graphs so as to take the sum of the areas or volumes of the target graphs as the semantic space characteristics of the target point cloud cluster.
4. The automatic driving scene recognition method according to claim 2, characterized in that: the determining the area or volume of the target graph and the target graph based on the target point cloud cluster and the k original adjacent point cloud clusters, and determining the semantic space characteristics of the target point cloud cluster based on the area or volume of the target graph includes:
determining a target centroid of the target point cloud cluster and neighbor centroids of the k original neighbor point cloud clusters;
determining a nearest neighbor centroid and (k-1) secondary nearest neighbor centroids from the neighbor centroids of the k original nearest neighbor cloud clusters;
determining a plurality of target graphs based on a target centroid, a nearest neighbor centroid and a next nearest neighbor centroid of the target point cloud cluster;
calculating (k-1) distances between the secondary neighbor centroids and the target centroids, and an area or volume of the target graph, respectively;
and weighting areas or volumes of a plurality of target graphs based on the distances corresponding to the target graphs, and taking the weighted areas or weighted volumes obtained by weighting as semantic space features of the target point cloud cluster.
5. The automatic driving scene recognition method according to any one of claims 1 to 4, characterized in that:
Generating semantic space enhancement features based on the semantic space features and other features of the original point cloud, generating semantic space enhancement pseudo-images based on the semantic space enhancement features, comprising:
acquiring the original point cloud or semantic space point cloud;
setting a radius of an interested region, and carrying out space division on point clouds in the radius of the interested region according to polar coordinates on a top view to obtain a plurality of unit regions;
assigning and normalizing the unit area by utilizing other characteristics of the point cloud corresponding to the unit area;
superposing semantic space features corresponding to the unit areas with other features corresponding to the unit areas respectively and normalizing again to obtain semantic space enhancement features corresponding to the unit areas;
and generating a semantic space enhancement pseudo image based on the semantic space enhancement features corresponding to the unit areas.
6. The automatic driving scene recognition method according to any one of claims 1 to 4, characterized in that: generating semantic space enhancement features based on the semantic space features and other features of the original point cloud, generating semantic space enhancement pseudo-images based on the semantic space enhancement features, comprising:
Acquiring the original point cloud or semantic space point cloud;
setting a radius of an interested region, and carrying out space division on point clouds in the radius of the interested region according to polar coordinates on a top view to obtain a plurality of unit regions;
assigning and normalizing the unit area by utilizing other characteristics of the point cloud corresponding to the unit area;
correcting semantic space features corresponding to the unit area according to semantic tags of the target point cloud clusters corresponding to the unit area so as to obtain corrected semantic space features corresponding to the unit area;
superposing the corrected semantic space features corresponding to the unit areas with other features corresponding to the unit areas and normalizing again to obtain corrected semantic space enhancement features corresponding to the unit areas;
and generating a semantic space enhancement pseudo image based on the corrected semantic space enhancement features corresponding to the unit areas.
7. The automatic driving scene recognition method according to claim 1, characterized in that: performing loop detection based on the semantic space enhanced pseudo image includes:
taking the semantic space enhancement pseudo image as a first descriptor;
Constructing a KD-Tree based on the first descriptor to generate a historical map set, and determining a key frame in a historical frame corresponding to the historical map set according to a time interval and a similarity value;
calculating the similarity value of the current frame and the key frame, and judging that a loop frame is detected if the similarity value of the current frame and the key frame is larger than a preset threshold value; and if the similarity value of the current frame and the key frame is smaller than or equal to a preset threshold value, judging that the current frame does not meet loop-back, and generating a new key frame by using the KD-Tree.
8. The automatic driving scene recognition method according to claim 1, characterized in that: performing loop detection based on the semantic space enhanced pseudo image includes:
calculating a global descriptor of the semantic space enhancement pseudo image;
converting the global descriptor into a one-dimensional feature vector, and taking the one-dimensional feature vector as a second descriptor;
constructing a KD-Tree based on the second descriptor to generate a historical map set, and determining a key frame in a historical frame corresponding to the historical map set according to a time interval and a similarity value;
calculating the similarity value of the current frame and the key frame, and judging that a loop frame is detected if the similarity value of the current frame and the key frame is larger than a preset threshold value; and if the similarity value of the current frame and the key frame is smaller than or equal to a preset threshold value, judging that the current frame does not meet loop-back, and generating a new key frame by using the KD-Tree.
9. The automatic driving scene recognition method according to claim 1, characterized in that: performing loop detection based on the semantic space enhanced pseudo image includes:
calculating a global descriptor of the semantic space enhancement pseudo image;
performing weighted self-adaptive fusion on the global descriptors of the semantic space enhancement pseudo images to obtain third descriptors;
constructing a KD-Tree based on the third descriptor to generate a historical map set, and determining a key frame in a historical frame corresponding to the historical map set according to a time interval and a similarity value;
calculating the similarity value of the current frame and the key frame, and judging that a loop frame is detected if the similarity value of the current frame and the key frame is larger than a preset threshold value; and if the similarity value of the current frame and the key frame is smaller than or equal to a preset threshold value, judging that the current frame does not meet loop-back, and generating a new key frame by using the KD-Tree.
10. A computer-readable storage medium, characterized by comprising a program executable by a processor to implement the automatic driving scene recognition method according to any one of claims 1 to 9.