WO2020258297A1 - Image semantic segmentation method, movable platform, and storage medium - Google Patents

Image semantic segmentation method, movable platform, and storage medium

Info

Publication number
WO2020258297A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
scene category
point cloud
cloud data
model
Prior art date
Application number
PCT/CN2019/093865
Other languages
French (fr)
Chinese (zh)
Inventor
王涛
李鑫超
李思晋
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2019/093865 priority Critical patent/WO2020258297A1/en
Priority to CN201980012353.4A priority patent/CN111742344A/en
Publication of WO2020258297A1 publication Critical patent/WO2020258297A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Definitions

  • the embodiment of the present invention relates to the field of image processing, in particular to an image semantic segmentation method, a movable platform and a storage medium.
  • Image semantic segmentation is a basic task in computer vision.
  • In semantic segmentation, an image needs to be divided into different semantically interpretable categories, that is, the content present in the image and its location must be identified. Semantic segmentation is widely used in fields such as autonomous driving, medical image analysis, and robotics.
  • In the prior art, an image is usually captured by a camera and then input into a neural network model, which performs semantic segmentation on the image.
  • However, prior-art image semantic segmentation methods cannot achieve accurate semantic segmentation for images of some scenes. For images with complex background information, such as scenes containing cars or aircraft, it is difficult for the prior art to separate the various targets; for images containing occluded or overlapping objects with similar textures, the prior art also has difficulty distinguishing them.
  • the embodiment of the present invention provides an image semantic segmentation method, a movable platform and a storage medium, so as to improve the accuracy of image semantic segmentation.
  • The first aspect of the embodiments of the present invention is to provide an image semantic segmentation method, including: acquiring point cloud data collected by a lidar and an image taken by a camera, where the point cloud data corresponds to the image; obtaining a depth map according to the point cloud data; acquiring a fusion image of the depth map and the image; identifying the scene category information of each image block in the fusion image; and marking the scene category information of each image block into the fusion image.
  • the second aspect of the embodiments of the present invention is to provide a movable platform, including: a lidar, a camera, a memory, and a processor;
  • the memory is used to store program codes
  • the processor calls the program code, and when the program code is executed, it is used to perform the following operations:
  • acquiring point cloud data collected by the lidar and an image taken by the camera, where the point cloud data corresponds to the image; obtaining a depth map according to the point cloud data; acquiring a fusion image of the depth map and the image; identifying the scene category information of each image block in the fusion image; and marking the scene category information of each image block into the fusion image.
  • a third aspect of the embodiments of the present invention is to provide a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the method as described in the first aspect.
  • According to the embodiments, the point cloud data collected by the lidar and the image taken by the camera are acquired, where the point cloud data corresponds to the image; a depth map is obtained according to the point cloud data; a fusion image of the depth map and the image is acquired; the scene category information of each image block in the fusion image is identified; and the scene category information of each image block is marked into the fusion image.
  • By fusing the depth map obtained from the lidar point cloud data with the image taken by the camera, various targets can be accurately separated during image semantic segmentation. Semantic segmentation of images with complex background information, as well as of images containing occluded or overlapping objects with similar textures, achieves better results, which can improve the accuracy of image semantic segmentation.
  • FIG. 1 is a flowchart of an image semantic segmentation method provided by an embodiment of the present invention
  • FIG. 2a is a schematic diagram of an application scenario of an image semantic segmentation method provided by an embodiment of the present invention
  • Figure 2b is a schematic diagram of the depth map and image of the scene in Figure 2a;
  • FIG. 3a is a schematic diagram of an application scenario of an image semantic segmentation method provided by another embodiment of the present invention.
  • Figure 3b is a schematic diagram of the depth map and image of the scene in Figure 3a;
  • FIG. 4a is a schematic diagram of an application scenario of an image semantic segmentation method provided by another embodiment of the present invention.
  • Figure 4b is a schematic diagram of the depth map and image of the scene in Figure 4a;
  • FIG. 5 is a flowchart of an image semantic segmentation method provided by another embodiment of the present invention.
  • FIG. 6 is a flowchart of an image semantic segmentation method provided by another embodiment of the present invention.
  • FIG. 7 is a flowchart of an image semantic segmentation method provided by another embodiment of the present invention.
  • Fig. 8 is a structural diagram of a movable platform provided by an embodiment of the present invention.
  • It should be noted that when a component is said to be "fixed to" another component, it can be directly on the other component, or an intervening component may also be present. When a component is considered to be "connected to" another component, it can be directly connected to the other component, or an intervening component may be present at the same time.
  • FIG. 1 is a flowchart of an image semantic segmentation method provided by an embodiment of the present invention. As shown in Figure 1, the image semantic segmentation method in this embodiment may include:
  • Step S101 Obtain the point cloud data collected by the lidar and the image taken by the camera, where the point cloud data corresponds to the image.
  • In this embodiment, a lidar and a camera are installed on a movable platform such as a vehicle, a drone, or a robot.
  • The lidar can collect point cloud data, and the camera can take images (such as RGB images), where the point cloud data corresponds to the image.
  • That is, the point cloud data can overlap with the image when projected into it: the lidar and the camera scan the same target area, and the line of sight of the lidar is the same as or close to the shooting direction of the camera.
  • This can be achieved through joint calibration of the lidar and the camera. That is, before acquiring the point cloud data collected by the lidar and the image taken by the camera in S101, the lidar and the camera are calibrated so that the point cloud data collected by the lidar corresponds to the image taken by the camera. Alternatively, after the point cloud data and the image are acquired, the point cloud data can be made to correspond to the image through rotation and translation processing.
  • the lidar and the camera may be fixedly arranged on a movable platform, or may be arranged on the movable platform through a rotating part to achieve point cloud data and image collection within a certain range.
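  • As an illustrative aid (not part of the original disclosure), the following Python sketch shows how lidar points might be projected into the camera image once joint calibration has produced extrinsic parameters R, t and an intrinsic matrix K; all function and variable names here are hypothetical.

```python
import numpy as np

def project_points_to_image(points_lidar, R, t, K, image_shape):
    """Project lidar points (N, 3) into the camera image plane.

    R, t : extrinsic rotation (3x3) and translation (3,) from the lidar frame to the
           camera frame, assumed to come from a prior joint calibration step.
    K    : camera intrinsic matrix (3x3).
    Returns pixel coordinates (u, v) and depths for points that fall inside the image.
    """
    # Transform points from the lidar frame into the camera frame.
    points_cam = points_lidar @ R.T + t
    # Keep only points in front of the camera.
    points_cam = points_cam[points_cam[:, 2] > 0]
    # Perspective projection with the pinhole model.
    uvw = points_cam @ K.T
    u = uvw[:, 0] / uvw[:, 2]
    v = uvw[:, 1] / uvw[:, 2]
    depth = points_cam[:, 2]
    # Discard projections outside the image bounds.
    h, w = image_shape[:2]
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return u[valid], v[valid], depth[valid]
```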
  • Step S102 Obtain a depth map according to the point cloud data.
  • In this embodiment, after the point cloud data is acquired, a depth map can be obtained from the point cloud data, where each pixel value of the depth map represents the distance from an object to the lidar. Since the point cloud data contains the position and distance information of objects, the depth map can be obtained according to that position and distance information.
  • In this embodiment, the point cloud data can be converted into a depth map through an existing algorithm, or through a pre-trained convolutional neural network model.
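  • For illustration only, the sketch below rasterizes the projected points from the previous example into a sparse depth map, keeping the nearest return where several points fall on the same pixel; a pre-trained convolutional model, as mentioned above, could replace this step.

```python
import numpy as np

def depth_map_from_points(u, v, depth, image_shape):
    """Rasterize projected lidar points into a sparse depth map.

    Where several points land on the same pixel, the nearest (smallest) depth wins.
    Pixels with no lidar return stay at 0 (unknown).
    """
    h, w = image_shape[:2]
    depth_map = np.zeros((h, w), dtype=np.float32)
    cols = u.astype(np.int32)
    rows = v.astype(np.int32)
    # Process points from far to near so that nearer points overwrite farther ones.
    order = np.argsort(-depth)
    depth_map[rows[order], cols[order]] = depth[order]
    return depth_map
```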
  • Step S103 Obtain a fusion image of the depth map and the image.
  • the depth map and the image can be fused, so that a fused image of the depth map and the image can be obtained.
  • the depth map can be directly projected into the image to form the fused image.
  • In this embodiment, feature points can be extracted from the depth map and the image separately, and mutually corresponding feature points in the depth map and the image can be obtained through comparison.
  • Based on these corresponding feature points, the coordinate system of the depth map is converted into the coordinate system of the image through operations such as rotation, translation, and cropping; the depth map is thereby projected into the image coordinate system so that its pixels are aligned with the image, and data-level fusion with the image is then performed to obtain the fused image.
  • the point cloud data can be aligned with the image first.
  • For example, registration of the point cloud data with the image can be realized by a gray-scale region-based registration method, a feature-based registration method, or a registration method combining line features and point features.
  • After the point cloud data and the image are registered, the depth map is obtained according to the point cloud data; the depth map obtained in this way is already aligned with the image, and the depth map and the image can then be directly fused to obtain the fused image.
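  • The fusion step can be pictured as stacking the pixel-aligned depth map onto the RGB image as an extra channel. The following sketch assumes the depth map is already aligned with the image; the channel-concatenation approach and the max_depth normalization constant are illustrative assumptions, not the patented method.

```python
import numpy as np

def fuse_depth_and_image(rgb_image, depth_map, max_depth=100.0):
    """Fuse an aligned depth map with an RGB image into a 4-channel RGB-D array.

    Assumes the depth map has already been projected into the image coordinate
    system so that it is pixel-aligned with the RGB image.
    """
    depth_norm = np.clip(depth_map / max_depth, 0.0, 1.0)        # normalize depth to [0, 1]
    rgb_norm = rgb_image.astype(np.float32) / 255.0              # normalize RGB
    fused = np.dstack([rgb_norm, depth_norm.astype(np.float32)]) # H x W x 4
    return fused
```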
  • Step S104 Identify scene category information of each image block in the fused image.
  • In this embodiment, multiple preset scene categories can be defined in advance; the scene category of any image block in the fused image is identified to determine which preset scene category the image block belongs to, and the scene category information of the image block can thereby be obtained.
  • More specifically, the probability that the image block belongs to each of the multiple preset scene categories can be obtained, and the preset scene category corresponding to the largest probability is used as the scene category information of the image block.
  • In this embodiment, the preset scene categories can include categories such as car, sky, road, static obstacle, and dynamic obstacle; of course, they can also include more detailed scene categories, for example, the car category can be further divided into sedans, trucks, buses, trains, RVs, etc.
  • Static obstacles can be specifically divided into buildings, walls, guardrails, telephone poles, traffic lights, traffic signs, etc.
  • Dynamic obstacles can include pedestrians, bicycles, motorcycles, etc.
  • In this embodiment, the above categories such as car, sky, road, static obstacle, and dynamic obstacle can be used as first-level scene categories, and the more detailed scene categories as second-level scene categories; that is, the multiple preset scene categories may include at least one first-level scene category and at least one second-level scene category, where any first-level scene category has at least one second-level scene category as a subcategory.
  • Further, the recognition of the scene category information of an image block in this embodiment can be implemented by using a convolutional neural network; of course, other methods can also be used for the recognition, which will not be repeated here.
  • the depth map and the image may be aligned first, and then according to the corresponding relationship between the depth map and the image, the scene category information of each image block in the image is identified, and then the depth map and the image are fused to obtain a fused image.
  • the scene category information of each image block is marked into the fused image.
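  • A minimal sketch of the per-block decision described above: given the per-category probabilities produced by some recognition model, the category with the largest probability becomes the block's scene category information. The category list below is a hypothetical example, not the categories defined in the disclosure.

```python
import numpy as np

# Hypothetical set of preset scene categories.
PRESET_CATEGORIES = ["car", "sky", "road", "static_obstacle", "dynamic_obstacle"]

def classify_block(block_probs):
    """Pick the scene category for one image block from its per-category probabilities.

    block_probs : array of shape (num_categories,) produced by the recognition model
                  (e.g. a softmax output); the category with the largest probability
                  is taken as the block's scene category information.
    """
    idx = int(np.argmax(block_probs))
    return PRESET_CATEGORIES[idx], float(block_probs[idx])
```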
  • Step S105 Mark the scene category information of each image block into the fusion image.
  • the scene category information can be marked in the fused image to generate a semantic map, thereby completing image semantic segmentation.
  • the labeling of the scene category information may only label the identifier corresponding to the scene category, or the image block may be marked with a corresponding color according to the scene category information.
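  • The color-based labeling option can be sketched as painting each block of the semantic map with a color looked up from its scene category. The color table and block layout below are illustrative assumptions only.

```python
import numpy as np

# Hypothetical color table: one RGB color per preset scene category.
CATEGORY_COLORS = {
    "car": (0, 0, 255),
    "sky": (135, 206, 235),
    "road": (128, 64, 128),
    "static_obstacle": (153, 153, 153),
    "dynamic_obstacle": (220, 20, 60),
}

def annotate_fused_image(block_labels, block_size, image_shape):
    """Render per-block scene category information as a color-coded semantic map.

    block_labels : 2D grid of category names, one entry per image block.
    """
    h, w = image_shape[:2]
    semantic_map = np.zeros((h, w, 3), dtype=np.uint8)
    for i, row in enumerate(block_labels):
        for j, label in enumerate(row):
            color = CATEGORY_COLORS.get(label, (0, 0, 0))
            semantic_map[i * block_size:(i + 1) * block_size,
                         j * block_size:(j + 1) * block_size] = color
    return semantic_map
```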
  • the image semantic segmentation method provided in this embodiment can be applied to semantic segmentation of occluded images.
  • As shown in the top view of Figure 2a, when car A and car B are in the positions shown, then from the observation angle (the arrow direction) of the movable platform equipped with the lidar and the camera, car A is partially blocked by car B.
  • After the lidar collects point cloud data along the observation angle, the depth map shown in the upper part of Figure 2b can be obtained (different filling patterns indicate different depths).
  • After the camera takes a photo along the observation angle, the image shown in the lower part of Figure 2b can be obtained; by fusing the depth map with the image and performing semantic segmentation, it can be determined that there are two vehicles at different distances ahead, and the type of each vehicle (that is, the scene category information) can be further distinguished.
  • this embodiment can also distinguish more complicated occlusion situations.
  • As shown in the top view of Figure 3a, from the observation angle (the arrow direction) of the movable platform equipped with the lidar and the camera, car C blocks part of car B and part of car A, and car B blocks part of car A.
  • After the lidar collects point cloud data along the observation angle, the depth map shown in the upper part of Figure 3b can be obtained.
  • After the camera takes a photo along the observation angle, the image shown in the lower part of Figure 3b can be obtained.
  • By fusing the depth map with the image and performing semantic segmentation, the different occlusion relationships ahead can be distinguished, and the type of each vehicle (that is, the scene category information) can be further distinguished.
  • the image semantic segmentation method provided by this embodiment can also be applied to semantic segmentation of images with objects with similar textures.
  • the movable platform equipped with lidar and camera is in front of a wall with corners.
  • the wall surface D is closer to the movable platform than the wall surface E, and the wall surface D and the wall surface E have similar textures.
  • After the lidar collects point cloud data along the observation angle, the depth map shown in the upper part of Figure 4b can be obtained.
  • After the camera takes a photo along the observation angle, the image shown in the lower part of Figure 4b can be obtained.
  • By fusing the depth map with the image and performing semantic segmentation, the front-and-back relationship between wall D and wall E can be distinguished, and it can further be determined that the scene category information is a wall with a corner.
  • The image semantic segmentation method provided in this embodiment obtains the point cloud data collected by the lidar and the image taken by the camera, where the point cloud data corresponds to the image; obtains a depth map according to the point cloud data; acquires a fusion image of the depth map and the image; identifies the scene category information of each image block in the fusion image; and marks the scene category information of each image block into the fusion image.
  • By fusing the depth map obtained from the lidar point cloud data with the image taken by the camera, various targets can be accurately separated during image semantic segmentation. Semantic segmentation of images with complex background information, as well as of images containing occluded or overlapping objects with similar textures, achieves better results, which can improve the accuracy of image semantic segmentation.
  • FIGS. 5 and 6 are flowcharts of an image semantic segmentation method provided by another embodiment of the present invention.
  • As shown in FIGS. 5 and 6, on the basis of the foregoing embodiment, the image semantic segmentation method in this embodiment may include:
  • Step S201 Obtain the point cloud data collected by the lidar and the image taken by the camera, where the point cloud data corresponds to the image.
  • Step S202 Obtain a depth map according to the point cloud data.
  • steps S201 and S202 can be referred to S101 and S102 in the foregoing embodiment, and details are not described herein again.
  • the following second model is used to obtain a depth map based on the point cloud data.
  • Step S203 Input the depth map and the image into a pre-trained first model, and obtain a fusion image of the depth map and the image.
  • Step S204 Obtain, through the first model, the probability that any image block in the fused image belongs to each preset scene category among multiple preset scene categories, and use the preset scene category corresponding to the largest probability as the scene category information of the image block.
  • In this embodiment, the first model may be a convolutional neural network model. As shown in FIG. 7, the first model includes a plurality of first processing units connected in sequence, and each first processing unit includes a convolutional layer, a batch normalization layer, and an activation layer. Further, the multiple sequentially connected first processing units in the first model can use skip connections to prevent the gradient from vanishing.
  • In this embodiment, after the mutually corresponding depth map and image are input into the first model, the first model first fuses the depth map and the image to obtain the fused image of the two, and then performs scene category recognition on each image block of the fused image: for any image block, the probability that it belongs to each preset scene category among the multiple preset scene categories is obtained, and the preset scene category corresponding to the largest probability is used as the scene category information of that image block.
  • It should be noted that, to train the first model, multiple fused images labeled with the preset scene categories can first be obtained as the training set and the test set; the characteristics of the image blocks of each preset scene category are learned through training. Thus, when a new fused image is input, the probability that each image block in the new fused image belongs to each of the multiple preset scene categories can be analyzed, and the preset scene category corresponding to the largest probability is used as the scene category information of that image block. The specific training process will not be repeated here.
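  • The description of the first model (sequentially connected units of convolution, batch normalization and activation, with skip connections) can be sketched roughly as follows in PyTorch; the channel counts, number of units, and output head are assumptions for illustration and do not reproduce the actual trained model.

```python
import torch
import torch.nn as nn

class ProcessingUnit(nn.Module):
    """One processing unit: convolution, batch normalization, and activation,
    with a skip (residual) connection to help keep gradients from vanishing."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.act(self.bn(self.conv(x)))
        return out + x  # skip connection


class FirstModelSketch(nn.Module):
    """Minimal sketch of the first model: a stack of processing units followed by a
    1x1 convolution producing per-pixel scores for each preset scene category."""

    def __init__(self, in_channels=4, channels=64, num_categories=5, num_units=4):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, channels, kernel_size=3, padding=1)
        self.units = nn.Sequential(*[ProcessingUnit(channels) for _ in range(num_units)])
        self.head = nn.Conv2d(channels, num_categories, kernel_size=1)

    def forward(self, fused_image):          # fused_image: (B, 4, H, W) RGB-D input
        scores = self.head(self.units(self.stem(fused_image)))
        return torch.softmax(scores, dim=1)  # per-category probabilities
```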
  • Optionally, the multiple preset scene categories include at least one first-level scene category and at least one second-level scene category, where any one of the first-level scene categories has at least one of the second-level scene categories as a subcategory.
  • step S204 may specifically include:
  • acquiring, through the first model, the probability that any image block in the fused image belongs to each of the second-level scene categories, and using the second-level scene category corresponding to the largest probability, or the first-level scene category to which that second-level scene category belongs, as the scene category information of the image block;
  • or acquiring, through the first model, the probability that any image block in the fused image belongs to each of the first-level scene categories, and using the first-level scene category corresponding to the largest probability as the scene category information of the image block.
  • That is, when the preset scene categories include both first-level and second-level scene categories, the second-level scene categories can be used as the measurement standard: the probability that the image block belongs to each second-level scene category is determined, and then either the second-level scene category corresponding to the maximum probability, or the first-level scene category to which that second-level scene category belongs, is used as the scene category information of the image block.
  • For example, if the second-level scene category corresponding to the maximum probability is pedestrian, then either pedestrian or dynamic obstacle can be used as the scene category information of the image block.
  • Alternatively, the first-level scene categories can be directly used as the measurement standard: the probability that the image block belongs to each first-level scene category is determined, and the first-level scene category corresponding to the largest probability is used as the scene category information of the image block.
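  • A possible reading of the two-level scheme is sketched below with a hypothetical mapping from second-level to first-level categories: the block is scored over the second-level categories, and either the winning second-level category or its parent first-level category is reported.

```python
import numpy as np

# Hypothetical hierarchy: each second-level category belongs to one first-level category.
SECOND_TO_FIRST = {
    "sedan": "car", "truck": "car", "bus": "car",
    "pedestrian": "dynamic_obstacle", "bicycle": "dynamic_obstacle",
    "building": "static_obstacle", "traffic_light": "static_obstacle",
}

def classify_block_hierarchical(second_level_probs, second_level_names,
                                report_first_level=False):
    """Classify a block over second-level categories, optionally reporting the
    first-level category that the winning second-level category belongs to."""
    idx = int(np.argmax(second_level_probs))
    second = second_level_names[idx]
    return SECOND_TO_FIRST[second] if report_first_level else second
```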
  • Step S205 Mark the scene category information of each image block into the fused image.
  • step S205 can be referred to S105 in the foregoing embodiment, which will not be repeated here.
  • Optionally, after the preset scene category corresponding to the largest probability is used as the scene category information of the image block, the method may further include: if the maximum probability is lower than a preset probability threshold, ignoring the scene category information of the image block in the fused image.
  • In this embodiment, the judgment of misrecognition can be made by comparing the above maximum probability with the preset probability threshold; that is, when the maximum probability is lower than the preset probability threshold, it is determined that the image block may be misrecognized, and for such a misrecognized image block, its scene category information can be directly ignored to avoid affecting the semantic segmentation result.
  • the method further includes:
  • comparing the scene category information of any image block with the scene category information of multiple adjacent image blocks, and if the similarity is lower than a preset similarity threshold, ignoring the scene category information of that image block in the fused image.
  • In this embodiment, the scene category information of any image block is compared with the scene category information of multiple adjacent image blocks; when the difference is large, that is, when the similarity is lower than the preset similarity threshold, it is judged that the scene category information of the image block is misrecognized, and the scene category information of that image block is then ignored to avoid affecting the semantic segmentation result.
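  • Both misrecognition checks can be sketched together as follows: a block label is dropped when its maximum probability is below a probability threshold, or when too few neighboring blocks carry the same label (used here as a stand-in for the similarity comparison); the thresholds and the similarity measure are illustrative assumptions.

```python
def filter_block_label(max_prob, label, neighbor_labels,
                       prob_threshold=0.5, similarity_threshold=0.5):
    """Decide whether to keep a block's scene category information.

    The label is ignored (None is returned) if its maximum probability is below
    prob_threshold, or if too few neighboring blocks share the same label.
    """
    if max_prob < prob_threshold:
        return None
    if neighbor_labels:
        similarity = sum(n == label for n in neighbor_labels) / len(neighbor_labels)
        if similarity < similarity_threshold:
            return None
    return label
```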
  • The image semantic segmentation method provided in this embodiment obtains the point cloud data collected by the lidar and the image taken by the camera, where the point cloud data corresponds to the image; obtains a depth map according to the point cloud data; acquires a fusion image of the depth map and the image; identifies the scene category information of each image block in the fusion image; and marks the scene category information of each image block into the fusion image.
  • By fusing the depth map obtained from the lidar point cloud data with the image taken by the camera, various targets can be accurately separated during image semantic segmentation. Semantic segmentation of images with complex background information, as well as of images containing occluded or overlapping objects with similar textures, achieves better results, which can improve the accuracy of image semantic segmentation.
  • the obtaining of the depth map according to the point cloud data in S102 and S202 may specifically include:
  • the point cloud data is input into a pre-trained second model to generate the depth map.
  • In this embodiment, the second model may be a convolutional neural network model. As shown in FIG. 7, the second model includes a plurality of second processing units connected in sequence, and each second processing unit includes a convolutional layer, a batch normalization layer, and an activation layer. Further, the multiple sequentially connected second processing units in the second model can use skip connections to prevent the gradient from vanishing.
  • the pre-trained second model is used to transform the point cloud data into the depth map.
  • It should be noted that, to train the second model, multiple sets of mutually corresponding point cloud data and depth maps can be obtained as the training set and the test set; the specific training process will not be repeated here.
  • Optionally, inputting the point cloud data into the pre-trained second model to generate the depth map includes:
  • densifying the point cloud data through the second model, and converting the densified point cloud data into the depth map.
  • In this embodiment, the density of the point cloud data affects the accuracy of the depth information in the depth map, and therefore greatly affects the accuracy of image semantic segmentation. Accordingly, when converting the point cloud data into a depth map, it is necessary to determine whether the density of the point cloud data meets the requirement. When the density of the point cloud data is higher than a preset density threshold, the density meets the requirement, and the point cloud data can be directly converted into the depth map through the second model; when the density of the point cloud data is not higher than the preset density threshold, the density does not meet the requirement, and the point cloud data needs to be densified first.
  • In this embodiment, the second model can be trained so that it has the function of densifying the point cloud data; when the point cloud data is converted into a depth map through the second model, the point cloud data is first densified and then converted into the depth map.
  • other algorithms can also be used to achieve point cloud data densification, such as a point cloud densification algorithm based on sparse matching, etc., which will not be repeated here.
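  • As a stand-in for the densification step, the sketch below fills empty pixels of a sparse depth map from valid depths in a small neighborhood; the learned second model or a sparse-matching based algorithm mentioned above would normally perform this task, so this is only an illustrative placeholder.

```python
import numpy as np

def densify_depth_map(sparse_depth, kernel=5):
    """Naive densification of a sparse lidar depth map by filling empty pixels
    with the nearest (smallest) valid depth found inside a small window."""
    h, w = sparse_depth.shape
    dense = sparse_depth.copy()
    half = kernel // 2
    empty_rows, empty_cols = np.where(sparse_depth == 0)
    for r, c in zip(empty_rows, empty_cols):
        window = sparse_depth[max(0, r - half):r + half + 1,
                              max(0, c - half):c + half + 1]
        valid = window[window > 0]
        if valid.size:
            dense[r, c] = valid.min()  # take the nearest depth in the window
    return dense
```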
  • FIG. 8 is a structural diagram of a movable platform provided by an embodiment of the present invention.
  • the movable platform 30 includes: a laser radar 31, a camera 32, a processor 33 and a memory 34.
  • the memory 34 is used to store program codes
  • the processor 33 calls the program code, and when the program code is executed, it is used to perform the following operations:
  • acquiring point cloud data collected by the lidar 31 and an image taken by the camera 32, where the point cloud data corresponds to the image; obtaining a depth map according to the point cloud data; acquiring a fusion image of the depth map and the image; identifying the scene category information of each image block in the fusion image; and marking the scene category information of each image block into the fusion image.
  • Optionally, when the processor 33 acquires the fused image of the depth map and the image, the processor 33 is configured to:
  • the depth map and the image are input into a pre-trained first model, and a fusion image of the depth map and the image is obtained.
  • Optionally, when the processor 33 recognizes the scene category information of each image block in the fused image, the processor 33 is configured to:
  • the probability that any image block in the fusion image belongs to each preset scene category among multiple preset scene categories is acquired through the first model, and the preset scene category corresponding to the largest probability is used as the scene category information of the image block.
  • Optionally, the multiple preset scene categories include at least one first-level scene category and at least one second-level scene category, where any one of the first-level scene categories has at least one of the second-level scene categories as a subcategory.
  • Optionally, when the processor 33 obtains, through the first model, the probability that any image block in the fused image belongs to each preset scene category among multiple preset scene categories and uses the preset scene category corresponding to the maximum probability as the scene category information of the image block, the processor 33 is configured to:
  • acquire, through the first model, the probability that any image block in the fused image belongs to each of the second-level scene categories, and use the second-level scene category corresponding to the largest probability, or the first-level scene category to which that second-level scene category belongs, as the scene category information of the image block;
  • or acquire, through the first model, the probability that any image block in the fused image belongs to each of the first-level scene categories, and use the first-level scene category corresponding to the largest probability as the scene category information of the image block.
  • Optionally, after the processor 33 uses the preset scene category corresponding to the maximum probability as the scene category information of the image block, the processor 33 is further configured to:
  • if the maximum probability is lower than a preset probability threshold, ignore the scene category information of the image block in the fused image.
  • the processor 33 is further configured to:
  • compare the scene category information of any image block with the scene category information of multiple adjacent image blocks, and if the similarity is lower than a preset similarity threshold, ignore the scene category information of that image block in the fused image.
  • Optionally, the first model is a convolutional neural network model; the first model includes a plurality of first processing units connected in sequence, and each first processing unit includes a convolutional layer, a batch normalization layer, and an activation layer.
  • Optionally, when the processor 33 acquires a depth map according to the point cloud data, the processor 33 is configured to:
  • the point cloud data is input into a pre-trained second model to generate the depth map.
  • Optionally, when the processor 33 inputs the point cloud data into the pre-trained second model to generate the depth map, the processor 33 is configured to:
  • densify the point cloud data through the second model, and convert the densified point cloud data into the depth map.
  • Optionally, the second model is a convolutional neural network model; the second model includes a plurality of second processing units connected in sequence, and each second processing unit includes a convolutional layer, a batch normalization layer, and an activation layer.
  • Optionally, before the processor 33 acquires the point cloud data collected by the lidar 31 and the image captured by the camera 32, the processor 33 is further configured to:
  • the lidar 31 and the camera 32 are calibrated so that the point cloud data collected by the lidar 31 corresponds to the image taken by the camera 32.
  • the movable platform 30 includes at least one of a vehicle, a drone, and a robot.
  • The movable platform provided in this embodiment acquires the point cloud data collected by the lidar and the image taken by the camera, where the point cloud data corresponds to the image; obtains a depth map according to the point cloud data; acquires a fusion image of the depth map and the image; identifies the scene category information of each image block in the fusion image; and marks the scene category information of each image block into the fusion image.
  • By fusing the depth map obtained from the lidar point cloud data with the image taken by the camera, various targets can be accurately separated during image semantic segmentation. Semantic segmentation of images with complex background information, as well as of images containing occluded or overlapping objects with similar textures, achieves better results, which can improve the accuracy of image semantic segmentation.
  • this embodiment also provides a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the image semantic segmentation method described in the foregoing embodiment.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • The division of the units is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium.
  • The above-mentioned software functional unit is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor execute part of the steps of the method described in each embodiment of the present invention.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

Embodiments of the present invention provide an image semantic segmentation method, a movable platform, and a storage medium. The method comprises: acquiring point cloud data collected by a lidar and an image captured by a camera, wherein the point cloud data corresponds to the image; acquiring a depth map according to the point cloud data; acquiring a fused image of the depth map and the image; identifying scene category information of respective image blocks in the fused image; and marking the scene category information of the respective image blocks on the fused image. The depth map acquired by means of the point cloud data of the lidar is fused with the image captured by the camera, such that various targets can be accurately separated from each other during semantic image segmentation, thus improving semantic segmentation in images having complex background information and images including blocked or overlapping targets and similar textures, and improving the accuracy of image semantic segmentation.

Description

Image semantic segmentation method, movable platform and storage medium
Technical field
The embodiments of the present invention relate to the field of image processing, and in particular to an image semantic segmentation method, a movable platform, and a storage medium.
Background
Image semantic segmentation is a basic task in computer vision. In semantic segmentation, an image needs to be divided into different semantically interpretable categories, that is, the content present in the image and its location must be identified. Semantic segmentation is widely used in fields such as autonomous driving, medical image analysis, and robotics.
In the prior art, an image is usually captured by a camera and then input into a neural network model, which performs semantic segmentation on the image. However, prior-art image semantic segmentation methods cannot achieve accurate semantic segmentation for images of some scenes. For images with complex background information, such as scenes containing cars or aircraft, it is difficult for the prior art to separate the various targets; for images containing occluded or overlapping objects with similar textures, the prior art also has difficulty distinguishing them.
Summary of the invention
The embodiments of the present invention provide an image semantic segmentation method, a movable platform, and a storage medium, so as to improve the accuracy of image semantic segmentation.
The first aspect of the embodiments of the present invention is to provide an image semantic segmentation method, including:
acquiring point cloud data collected by a lidar and an image taken by a camera, where the point cloud data corresponds to the image;
obtaining a depth map according to the point cloud data;
acquiring a fusion image of the depth map and the image;
identifying the scene category information of each image block in the fusion image;
marking the scene category information of each image block into the fusion image.
The second aspect of the embodiments of the present invention is to provide a movable platform, including: a lidar, a camera, a memory, and a processor;
the memory is used to store program code;
the processor calls the program code, and when the program code is executed, it is used to perform the following operations:
acquiring point cloud data collected by the lidar and an image taken by the camera, where the point cloud data corresponds to the image;
obtaining a depth map according to the point cloud data;
acquiring a fusion image of the depth map and the image;
identifying the scene category information of each image block in the fusion image;
marking the scene category information of each image block into the fusion image.
The third aspect of the embodiments of the present invention is to provide a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the method described in the first aspect.
According to the image semantic segmentation method, movable platform, and storage medium provided by the embodiments of the present invention, the point cloud data collected by the lidar and the image taken by the camera are acquired, where the point cloud data corresponds to the image; a depth map is obtained according to the point cloud data; a fusion image of the depth map and the image is acquired; the scene category information of each image block in the fusion image is identified; and the scene category information of each image block is marked into the fusion image. By fusing the depth map obtained from the lidar point cloud data with the image taken by the camera, various targets can be accurately separated during image semantic segmentation. Semantic segmentation of images with complex background information, as well as of images containing occluded or overlapping objects with similar textures, achieves better results, which can improve the accuracy of image semantic segmentation.
Description of the drawings
In order to explain the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the drawings needed in the description of the embodiments. Obviously, the drawings described below illustrate some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an image semantic segmentation method provided by an embodiment of the present invention;
FIG. 2a is a schematic diagram of an application scenario of an image semantic segmentation method provided by an embodiment of the present invention;
FIG. 2b is a schematic diagram of the depth map and image of the scene in FIG. 2a;
FIG. 3a is a schematic diagram of an application scenario of an image semantic segmentation method provided by another embodiment of the present invention;
FIG. 3b is a schematic diagram of the depth map and image of the scene in FIG. 3a;
FIG. 4a is a schematic diagram of an application scenario of an image semantic segmentation method provided by another embodiment of the present invention;
FIG. 4b is a schematic diagram of the depth map and image of the scene in FIG. 4a;
FIG. 5 is a flowchart of an image semantic segmentation method provided by another embodiment of the present invention;
FIG. 6 is a flowchart of an image semantic segmentation method provided by another embodiment of the present invention;
FIG. 7 is a flowchart of an image semantic segmentation method provided by another embodiment of the present invention;
FIG. 8 is a structural diagram of a movable platform provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention will be clearly described below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
It should be noted that when a component is said to be "fixed to" another component, it can be directly on the other component, or an intervening component may also be present. When a component is considered to be "connected to" another component, it can be directly connected to the other component, or an intervening component may be present at the same time.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in the description of the present invention are only for the purpose of describing specific embodiments and are not intended to limit the present invention. The term "and/or" as used herein includes any and all combinations of one or more of the related listed items.
Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the case of no conflict, the following embodiments and the features in the embodiments can be combined with each other.
The embodiments of the present invention provide an image semantic segmentation method. FIG. 1 is a flowchart of an image semantic segmentation method provided by an embodiment of the present invention. As shown in FIG. 1, the image semantic segmentation method in this embodiment may include:
Step S101: Obtain the point cloud data collected by the lidar and the image taken by the camera, where the point cloud data corresponds to the image.
In this embodiment, a lidar and a camera are installed on a movable platform such as a vehicle, a drone, or a robot. The lidar can collect point cloud data, and the camera can take images (such as RGB images), where the point cloud data corresponds to the image: the point cloud data can overlap with the image when projected into it, that is, the lidar and the camera scan the same target area, and the line of sight of the lidar is the same as or close to the shooting direction of the camera. This can be achieved through joint calibration of the lidar and the camera. That is, before acquiring the point cloud data collected by the lidar and the image taken by the camera in S101, the lidar and the camera are calibrated so that the point cloud data collected by the lidar corresponds to the image taken by the camera. Of course, it is also possible to make the point cloud data correspond to the image through rotation and translation processing after the point cloud data and the image are acquired.
In this embodiment, the lidar and the camera may be fixedly arranged on the movable platform, or may be arranged on the movable platform through a rotating part to achieve point cloud data and image collection within a certain range.
Step S102: Obtain a depth map according to the point cloud data.
In this embodiment, after the point cloud data is acquired, a depth map can be obtained from the point cloud data, where each pixel value of the depth map represents the distance from an object to the lidar. Since the point cloud data contains the position and distance information of objects, the depth map can be obtained according to that position and distance information. In this embodiment, the point cloud data can be converted into a depth map through an existing algorithm, or through a pre-trained convolutional neural network model.
Step S103: Obtain a fusion image of the depth map and the image.
In this embodiment, after the depth map is acquired, the depth map and the image can be fused to obtain a fused image of the depth map and the image. Specifically, since the point cloud data corresponds to the image, the depth map can be directly projected into the image to form the fused image. In this embodiment, feature points can be extracted from the depth map and the image separately, and mutually corresponding feature points can be obtained through comparison; based on these corresponding feature points, the coordinate system of the depth map is converted into the coordinate system of the image through operations such as rotation, translation, and cropping, the depth map is thereby projected into the image coordinate system so that its pixels are aligned with the image, and data-level fusion with the image is then performed to obtain the fused image.
In addition, in this embodiment, the point cloud data can also be aligned with the image first, for example through a gray-scale region-based registration method, a feature-based registration method, or a registration method combining line features and point features, and the depth map is then obtained according to the registered point cloud data. The depth map obtained in this way is already aligned with the image, and the depth map and the image can then be directly fused to obtain the fused image.
Step S104: Identify the scene category information of each image block in the fused image.
In this embodiment, multiple preset scene categories can be defined in advance; the scene category of any image block in the fused image is identified to determine which preset scene category the image block belongs to, and the scene category information of the image block can thereby be obtained. More specifically, the probability that the image block belongs to each of the multiple preset scene categories can be obtained, and the preset scene category corresponding to the largest probability is used as the scene category information of the image block.
In this embodiment, the preset scene categories can include categories such as car, sky, road, static obstacle, and dynamic obstacle; of course, they can also include more detailed scene categories, for example, the car category can be further divided into sedans, trucks, buses, trains, RVs, etc., static obstacles can be further divided into buildings, walls, guardrails, telephone poles, traffic lights, traffic signs, etc., and dynamic obstacles can include pedestrians, bicycles, motorcycles, etc. In this embodiment, the above categories such as car, sky, road, static obstacle, and dynamic obstacle can be used as first-level scene categories, and the more detailed scene categories as second-level scene categories; that is, the multiple preset scene categories may include at least one first-level scene category and at least one second-level scene category, where any first-level scene category has at least one second-level scene category as a subcategory.
Further, the recognition of the scene category information of an image block in this embodiment can be implemented by using a convolutional neural network; of course, other methods can also be used, which will not be repeated here.
In another embodiment, the depth map and the image may be aligned first, the scene category information of each image block in the image is then identified according to the correspondence between the depth map and the image, the depth map and the image are then fused to obtain the fused image, and the scene category information of each image block is marked into the fused image.
Step S105: Mark the scene category information of each image block into the fused image.
In this embodiment, after the scene category information of each image block in the fused image is identified, the scene category information can be marked in the fused image to generate a semantic map, thereby completing image semantic segmentation. The labeling of the scene category information may only label the identifier corresponding to the scene category, or each image block may be marked with a corresponding color according to its scene category information.
The image semantic segmentation method provided in this embodiment can be applied to semantic segmentation of images containing occlusion. As shown in the top view of Figure 2a, when car A and car B are in the positions shown, then from the observation angle (the arrow direction) of the movable platform equipped with the lidar and the camera, car A is partially blocked by car B. After the lidar collects point cloud data along the observation angle, the depth map shown in the upper part of Figure 2b can be obtained (different filling patterns indicate different depths). After the camera takes a photo along the observation angle, the image shown in the lower part of Figure 2b can be obtained. By fusing the depth map with the image and performing semantic segmentation, it can be determined that there are two vehicles at different distances ahead, and the type of each vehicle (that is, the scene category information) can be further distinguished.
Of course, this embodiment can also distinguish more complicated occlusion situations. As shown in the top view of Figure 3a, from the observation angle (the arrow direction) of the movable platform equipped with the lidar and the camera, car C blocks part of car B and part of car A, and car B blocks part of car A. After the lidar collects point cloud data along the observation angle, the depth map shown in the upper part of Figure 3b can be obtained; after the camera takes a photo along the observation angle, the image shown in the lower part of Figure 3b can be obtained. By fusing the depth map with the image and performing semantic segmentation, the different occlusion relationships ahead can be distinguished, and the type of each vehicle (that is, the scene category information) can be further distinguished.
The image semantic segmentation method provided in this embodiment can also be applied to semantic segmentation of images containing objects with similar textures. For example, as shown in the top view of Figure 4a, in front of the movable platform equipped with the lidar and the camera is a wall with a corner; wall D is closer to the movable platform than wall E, and wall D and wall E have similar textures. After the lidar collects point cloud data along the observation angle, the depth map shown in the upper part of Figure 4b can be obtained; after the camera takes a photo along the observation angle, the image shown in the lower part of Figure 4b can be obtained. By fusing the depth map with the image and performing semantic segmentation, the front-and-back relationship between wall D and wall E can be distinguished, and it can further be determined that the scene category information is a wall with a corner.
The image semantic segmentation method provided in this embodiment obtains the point cloud data collected by the lidar and the image taken by the camera, where the point cloud data corresponds to the image; obtains a depth map according to the point cloud data; acquires a fusion image of the depth map and the image; identifies the scene category information of each image block in the fusion image; and marks the scene category information of each image block into the fusion image. By fusing the depth map obtained from the lidar point cloud data with the image taken by the camera, various targets can be accurately separated during image semantic segmentation. Semantic segmentation of images with complex background information, as well as of images containing occluded or overlapping objects with similar textures, achieves better results, which can improve the accuracy of image semantic segmentation.
An embodiment of the present invention provides an image semantic segmentation method. Figures 5 and 6 are flowcharts of the image semantic segmentation method provided by another embodiment of the present invention. As shown in Figures 5 and 6, on the basis of the foregoing embodiment, the image semantic segmentation method in this embodiment may include:
Step S201: Obtain point cloud data collected by the lidar and an image taken by the camera, where the point cloud data corresponds to the image.
Step S202: Obtain a depth map according to the point cloud data.
For steps S201 and S202, reference may be made to S101 and S102 in the foregoing embodiment; details are not repeated here. In this embodiment, the second model described below is used to obtain the depth map from the point cloud data.
Step S203: Input the depth map and the image into a pre-trained first model to obtain a fused image of the depth map and the image.
Step S204: Through the first model, obtain the probability that any image block in the fused image belongs to each of multiple preset scene categories, and take the preset scene category corresponding to the maximum probability as the scene category information of that image block.
In this embodiment, the first model may be a convolutional neural network model. As shown in Figure 7, the first model includes a plurality of first processing units connected in sequence, and each first processing unit includes a convolution layer, a batch normalization layer, and an activation layer. Further, the sequentially connected first processing units in the first model may use skip connections to prevent the gradient from vanishing.
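As an illustration only, below is a minimal PyTorch-style sketch of such a processing unit; the framework, the channel width of 64, and the exact placement of the residual (skip) connection are assumptions made for illustration and are not prescribed by this embodiment.

```python
import torch
import torch.nn as nn

class ProcessingUnit(nn.Module):
    """One processing unit: convolution + batch normalization + activation,
    with a skip (residual) connection to help gradients flow."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: add the input back onto the unit's output.
        return self.act(self.bn(self.conv(x))) + x

# Several processing units connected in sequence, as the first model is described.
first_model_backbone = nn.Sequential(*[ProcessingUnit(64) for _ in range(4)])
```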
In this embodiment, after the corresponding depth map and image are input into the first model, the first model first fuses the depth map with the image to obtain a fused image of the two, and then performs scene category recognition on each image block of the fused image: for any image block, the probability that it belongs to each of the multiple preset scene categories is obtained, and the preset scene category corresponding to the maximum probability is taken as the scene category information of that image block.
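As one plausible reading of the fusion step (the embodiment leaves the fusion mechanism to the first model itself), the sketch below simply stacks the single-channel depth map onto the RGB image as a fourth input channel; the tensor shapes are assumptions for illustration.

```python
import torch

rgb = torch.rand(1, 3, 480, 640)    # camera image, N x C x H x W
depth = torch.rand(1, 1, 480, 640)  # depth map obtained from the point cloud

# One simple fusion: concatenate depth as an extra channel; the resulting
# 4-channel tensor can then be passed to the first model's input layer.
fused_input = torch.cat([rgb, depth], dim=1)
print(fused_input.shape)  # torch.Size([1, 4, 480, 640])
```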
It should be noted that, to train the first model, multiple fused images annotated with preset scene categories may first be obtained as a training set and a test set. Through training, the features of image blocks of each preset scene category can be learned, so that when a new fused image is input, the probability that each image block in the new fused image belongs to each of the multiple preset scene categories can be computed, and the preset scene category corresponding to the maximum probability can then be taken as the scene category information of that image block. The specific training process is not described here again.
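As a hedged illustration of this step, the sketch below takes per-block class probabilities (for example, the softmax output of the first model) and keeps the most probable preset scene category for each block; the array shapes and category names are assumptions for illustration only.

```python
import numpy as np

# Assumed shape: (num_blocks, num_categories) softmax output of the first model.
probs = np.array([
    [0.10, 0.75, 0.15],   # block 0
    [0.60, 0.25, 0.15],   # block 1
])
categories = ["road", "vehicle", "pedestrian"]   # hypothetical preset categories

best = probs.argmax(axis=1)              # index of the maximum probability per block
labels = [categories[i] for i in best]   # scene category information per block
max_prob = probs.max(axis=1)             # kept for the confidence check described later
print(labels, max_prob)
```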
On the basis of any of the foregoing embodiments, the multiple preset scene categories include at least one first-level scene category and at least one second-level scene category, where each first-level scene category has at least one second-level scene category as a subcategory.
Further, step S204 may specifically include:
obtaining, through the first model, the probability that any image block in the fused image belongs to each of the second-level scene categories, and taking the second-level scene category corresponding to the maximum probability, or the first-level scene category to which that second-level scene category belongs, as the scene category information of that image block; or
obtaining, through the first model, the probability that any image block in the fused image belongs to each of the first-level scene categories, and taking the first-level scene category corresponding to the maximum probability as the scene category information of that image block.
In this embodiment, since the preset scene categories include first-level and second-level scene categories, when the probability that an image block belongs to each of the multiple preset scene categories is obtained through the first model, the second-level scene categories may be used as the measure: the probability that the image block belongs to each second-level scene category is determined, and the second-level scene category corresponding to the maximum probability may then be taken as the scene category information of the image block; alternatively, the first-level scene category to which that second-level scene category belongs may be used as the scene category information. For example, if the second-level scene category with the maximum probability is pedestrian, then either pedestrian or dynamic obstacle may serve as the scene category information of the image block. In addition, in this embodiment the first-level scene categories may also be used directly as the measure: the probability that the image block belongs to each first-level scene category is determined, and the first-level scene category corresponding to the maximum probability is taken as the scene category information of the image block.
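The snippet below is a small sketch of how a second-level prediction might be mapped back to its first-level parent category. Apart from the pedestrian/dynamic-obstacle pairing mentioned above, the hierarchy shown here is a hypothetical example, not taken from this embodiment.

```python
# Hypothetical two-level hierarchy: second-level category -> first-level category.
PARENT = {
    "pedestrian": "dynamic obstacle",
    "vehicle": "dynamic obstacle",
    "wall": "static obstacle",
    "tree": "static obstacle",
}

def block_label(second_level: str, report_first_level: bool = False) -> str:
    """Return either the second-level category itself or its first-level parent."""
    return PARENT[second_level] if report_first_level else second_level

print(block_label("pedestrian"))                           # -> "pedestrian"
print(block_label("pedestrian", report_first_level=True))  # -> "dynamic obstacle"
```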
Step S205: Mark the scene category information of each image block in the fused image.
In this embodiment, for step S205 reference may be made to S105 in the foregoing embodiment, which is not repeated here.
On the basis of any of the foregoing embodiments, after obtaining the probability that any image block in the fused image belongs to each of the multiple preset scene categories and taking the preset scene category corresponding to the maximum probability as the scene category information of that image block, the method may further include:
if the maximum probability is lower than a preset probability threshold, ignoring the scene category information of that image block in the fused image.
In this embodiment, since the recognition of scene category information may contain a certain amount of error, misrecognition can be detected by comparing the above maximum probability with a preset probability threshold. That is, when the maximum probability is lower than the preset probability threshold, it is determined that the image block may have been misrecognized, and for such an image block the scene category information can simply be ignored, so as to avoid affecting the semantic segmentation result.
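A minimal sketch of this confidence check is given below; the threshold value of 0.5 and the use of None to mark an ignored block are assumptions for illustration.

```python
PROBABILITY_THRESHOLD = 0.5  # assumed value for illustration

def filter_low_confidence(labels, max_probs, threshold=PROBABILITY_THRESHOLD):
    """Drop (set to None) the scene category of blocks whose maximum
    probability falls below the preset probability threshold."""
    return [
        label if p >= threshold else None   # None = ignored in the fused image
        for label, p in zip(labels, max_probs)
    ]

print(filter_low_confidence(["vehicle", "pedestrian"], [0.92, 0.41]))
# -> ['vehicle', None]
```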
On the basis of any of the foregoing embodiments, the method further includes:
comparing the scene category information of any image block with the scene category information of its multiple adjacent image blocks in terms of similarity, and if the similarity is lower than a preset similarity threshold, ignoring the scene category information of that image block in the fused image.
In this embodiment, if the recognized scene category information of a certain image block differs greatly from the scene category information of its multiple adjacent image blocks, misrecognition is also possible, and the scene category information of that image block likewise needs to be ignored. Accordingly, the scene category information of any image block is compared with that of its multiple adjacent image blocks for similarity; when the difference is large, that is, when the similarity is lower than the preset similarity threshold, the scene category information of that image block is judged to be a misrecognition and is ignored, so as to avoid affecting the semantic segmentation result.
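Purely as a sketch, the snippet below interprets the neighbour comparison as the fraction of 4-connected neighbouring blocks that share a block's label; both this similarity measure and the threshold are assumptions, since the embodiment does not fix how similarity is computed.

```python
import numpy as np

def neighbor_consistency_filter(label_grid: np.ndarray, min_agreement: float = 0.5) -> np.ndarray:
    """Compare each block's label with its 4-connected neighbours; if the share
    of neighbours carrying the same label falls below the threshold, mark the
    block's scene category as ignored (-1)."""
    h, w = label_grid.shape
    out = label_grid.copy()
    for y in range(h):
        for x in range(w):
            neighbours = [
                label_grid[ny, nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w
            ]
            if not neighbours:
                continue
            agreement = sum(n == label_grid[y, x] for n in neighbours) / len(neighbours)
            if agreement < min_agreement:
                out[y, x] = -1   # scene category information ignored for this block
    return out

labels = np.array([[1, 1, 1],
                   [1, 2, 1],
                   [1, 1, 1]])
print(neighbor_consistency_filter(labels))  # the isolated "2" is ignored
```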
In the image semantic segmentation method provided in this embodiment, point cloud data collected by a lidar and an image taken by a camera are obtained, where the point cloud data corresponds to the image; a depth map is obtained according to the point cloud data; a fused image of the depth map and the image is obtained; the scene category information of each image block in the fused image is identified; and the scene category information of each image block is marked in the fused image. By fusing the depth map obtained from the lidar point cloud data with the image taken by the camera, various targets can be accurately separated during image semantic segmentation. The method works well for images with complex background information and for images containing occluded or overlapping objects with similar textures, and can therefore improve the accuracy of image semantic segmentation.
On the basis of any of the foregoing embodiments, obtaining the depth map according to the point cloud data as described in S102 and S202 may specifically include:
inputting the point cloud data into a pre-trained second model to generate the depth map.
The second model may be a convolutional neural network model. As shown in Figure 7, the second model includes a plurality of second processing units connected in sequence, and each second processing unit includes a convolution layer, a batch normalization layer, and an activation layer. Further, the sequentially connected second processing units in the second model may use skip connections to prevent the gradient from vanishing.
In this embodiment, the conversion from point cloud data to a depth map is implemented by the pre-trained second model. To train the second model, multiple sets of corresponding point cloud data and depth maps may be obtained as a training set and a test set; the specific training process is not described here again.
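For intuition, the sketch below projects calibrated lidar points into the camera image plane to form a sparse depth map. This geometric projection is one common way of pairing point clouds with depth maps (for example, when preparing training data) and is an assumption for illustration; it is not the second model itself.

```python
import numpy as np

def project_to_depth_map(points_lidar, T_cam_lidar, K, height, width):
    """Project lidar points (N, 3) into the camera image and keep, for each
    pixel, the nearest depth. T_cam_lidar is a 4x4 extrinsic matrix from the
    lidar frame to the camera frame; K is the 3x3 camera intrinsic matrix."""
    depth = np.zeros((height, width), dtype=np.float32)
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]            # keep points in front of the camera
    uvz = (K @ pts_cam.T).T
    u = (uvz[:, 0] / uvz[:, 2]).astype(int)
    v = (uvz[:, 1] / uvz[:, 2]).astype(int)
    z = pts_cam[:, 2]
    for ui, vi, zi in zip(u, v, z):
        if 0 <= ui < width and 0 <= vi < height:
            if depth[vi, ui] == 0 or zi < depth[vi, ui]:
                depth[vi, ui] = zi                   # nearest point wins
    return depth
```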
Further, inputting the point cloud data into the pre-trained second model to generate the depth map includes:
if the density of the point cloud data is higher than a preset density threshold, converting the point cloud data into the depth map through the second model; or
if the density of the point cloud data is not higher than the preset density threshold, performing densification processing on the point cloud data through the second model, and converting the densified point cloud data into the depth map.
In this embodiment, the density of the point cloud data affects the precision of the depth information in the depth map and thus largely determines the accuracy of image semantic segmentation. Therefore, when converting the point cloud data into a depth map, it is necessary to judge whether the density of the point cloud data meets the requirement. When the density of the point cloud data is higher than the preset density threshold, the density meets the requirement and the point cloud data can be converted directly into the depth map through the second model. When the density of the point cloud data is not higher than the preset density threshold, the density does not meet the requirement and the point cloud data needs to be densified; in this embodiment the second model can be trained so that it also performs point cloud densification, so that when the point cloud data is converted into a depth map through the second model, the point cloud data is first densified and then converted into the depth map. Of course, other algorithms can also be used to densify the point cloud data, such as a densification algorithm based on sparse matching, which is not described here again.
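A small sketch of this branching logic is given below; the density estimate (points per cubic metre of the cloud's bounding box), the threshold value, and the second model's convert/densify interface are all hypothetical and used only to illustrate the flow.

```python
import numpy as np

DENSITY_THRESHOLD = 1000.0  # assumed: points per cubic metre

def point_cloud_density(points: np.ndarray) -> float:
    """Rough density estimate: point count divided by the volume of the
    axis-aligned bounding box (illustrative only)."""
    extent = points.max(axis=0) - points.min(axis=0)
    volume = float(np.prod(np.maximum(extent, 1e-6)))
    return points.shape[0] / volume

def to_depth_map(points, second_model):
    """Convert point cloud data to a depth map, densifying it first when the
    density does not reach the preset threshold."""
    if point_cloud_density(points) > DENSITY_THRESHOLD:
        return second_model.convert(points)          # hypothetical interface
    dense_points = second_model.densify(points)      # hypothetical densification step
    return second_model.convert(dense_points)
```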
An embodiment of the present invention provides a movable platform. Figure 8 is a structural diagram of the movable platform provided by an embodiment of the present invention. As shown in Figure 8, the movable platform 30 includes a lidar 31, a camera 32, a processor 33, and a memory 34.
The memory 34 is used to store program code.
The processor 33 calls the program code, and when the program code is executed, the processor 33 is used to perform the following operations:
obtaining point cloud data collected by the lidar 31 and an image taken by the camera 32, where the point cloud data corresponds to the image;
obtaining a depth map according to the point cloud data;
obtaining a fused image of the depth map and the image;
identifying the scene category information of each image block in the fused image;
marking the scene category information of each image block in the fused image.
On the basis of any of the foregoing embodiments, when the processor 33 obtains the fused image of the depth map and the image, the processor 33 is configured to:
input the depth map and the image into a pre-trained first model to obtain a fused image of the depth map and the image.
On the basis of any of the foregoing embodiments, when the processor 33 identifies the scene category information of each image block in the fused image, the processor 33 is configured to:
obtain, through the first model, the probability that any image block in the fused image belongs to each of multiple preset scene categories, and take the preset scene category corresponding to the maximum probability as the scene category information of that image block.
On the basis of any of the foregoing embodiments, the multiple preset scene categories include at least one first-level scene category and at least one second-level scene category, where each first-level scene category has at least one second-level scene category as a subcategory.
On the basis of any of the foregoing embodiments, when the processor 33 obtains, through the first model, the probability that any image block in the fused image belongs to each of the multiple preset scene categories and takes the preset scene category corresponding to the maximum probability as the scene category information of that image block, the processor 33 is configured to:
obtain, through the first model, the probability that any image block in the fused image belongs to each of the second-level scene categories, and take the second-level scene category corresponding to the maximum probability, or the first-level scene category to which that second-level scene category belongs, as the scene category information of that image block; or
obtain, through the first model, the probability that any image block in the fused image belongs to each of the first-level scene categories, and take the first-level scene category corresponding to the maximum probability as the scene category information of that image block.
On the basis of any of the foregoing embodiments, after the processor 33 takes the preset scene category corresponding to the maximum probability as the scene category information of that image block, the processor 33 is further configured to:
if the maximum probability is lower than a preset probability threshold, ignore the scene category information of that image block in the fused image.
On the basis of any of the foregoing embodiments, the processor 33 is further configured to:
compare the scene category information of any image block with the scene category information of its multiple adjacent image blocks in terms of similarity, and if the similarity is lower than a preset similarity threshold, ignore the scene category information of that image block in the fused image.
On the basis of any of the foregoing embodiments, the first model is a convolutional neural network model; the first model includes a plurality of first processing units connected in sequence, and each first processing unit includes a convolution layer, a batch normalization layer, and an activation layer.
On the basis of any of the foregoing embodiments, when the processor 33 obtains a depth map according to the point cloud data, the processor 33 is configured to:
input the point cloud data into a pre-trained second model to generate the depth map.
On the basis of any of the foregoing embodiments, when the processor 33 inputs the point cloud data into the pre-trained second model to generate the depth map, the processor 33 is configured to:
if the density of the point cloud data is higher than a preset density threshold, convert the point cloud data into the depth map through the second model; or
if the density of the point cloud data is not higher than the preset density threshold, perform densification processing on the point cloud data through the second model, and convert the densified point cloud data into the depth map.
On the basis of any of the foregoing embodiments, the second model is a convolutional neural network model; the second model includes a plurality of second processing units connected in sequence, and each second processing unit includes a convolution layer, a batch normalization layer, and an activation layer.
On the basis of any of the foregoing embodiments, before the processor 33 obtains the point cloud data collected by the lidar 31 and the image taken by the camera 32, the processor 33 is further configured to:
calibrate the lidar 31 and the camera 32 so that the point cloud data collected by the lidar 31 corresponds to the image taken by the camera 32.
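Purely as an illustrative sketch of what such a calibration can provide, the snippet below uses OpenCV's solvePnP to recover a lidar-to-camera extrinsic transform from a handful of matched 3D-2D correspondences; the numeric values are made up, and this embodiment does not prescribe any particular calibration procedure.

```python
import numpy as np
import cv2

# Hypothetical example: recover the lidar-to-camera extrinsics from matched
# lidar points (metres) and their pixel locations, e.g. corners of a target.
K = np.array([[800., 0., 640.], [0., 800., 360.], [0., 0., 1.]])
dist = np.zeros(5)

object_points = np.array([[0., 0., 5.], [1., 0., 5.], [0., 1., 5.],
                          [1., 1., 6.], [-1., 0., 6.], [0., -1., 7.]])
image_points = np.array([[640., 360.], [800., 360.], [640., 520.],
                         [773., 493.], [507., 360.], [640., 246.]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)   # rotation from the lidar frame to the camera frame
# R and tvec define the extrinsic transform with which lidar points are projected
# into the camera image, so that the collected point cloud corresponds to the image.
```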
On the basis of any of the foregoing embodiments, the movable platform 30 includes at least one of a vehicle, an unmanned aerial vehicle, and a robot.
The specific principles and implementation of the movable platform 30 provided in this embodiment are similar to those of the foregoing embodiments and are not repeated here.
The movable platform provided in this embodiment obtains point cloud data collected by a lidar and an image taken by a camera, where the point cloud data corresponds to the image; obtains a depth map according to the point cloud data; obtains a fused image of the depth map and the image; identifies the scene category information of each image block in the fused image; and marks the scene category information of each image block in the fused image. By fusing the depth map obtained from the lidar point cloud data with the image taken by the camera, various targets can be accurately separated during image semantic segmentation. The method works well for images with complex background information and for images containing occluded or overlapping objects with similar textures, and can therefore improve the accuracy of image semantic segmentation.
In addition, this embodiment also provides a computer-readable storage medium on which a computer program is stored; the computer program is executed by a processor to implement the image semantic segmentation method described in the foregoing embodiments.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a logical functional division, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect coupling or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the apparatus described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or replace some or all of the technical features with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (26)

  1. An image semantic segmentation method, characterized by comprising:
    obtaining point cloud data collected by a lidar and an image taken by a camera, wherein the point cloud data corresponds to the image;
    obtaining a depth map according to the point cloud data;
    obtaining a fused image of the depth map and the image;
    identifying scene category information of each image block in the fused image;
    marking the scene category information of each image block in the fused image.
  2. The method according to claim 1, wherein the obtaining a fused image of the depth map and the image comprises:
    inputting the depth map and the image into a pre-trained first model to obtain a fused image of the depth map and the image.
  3. The method according to claim 1 or 2, wherein the identifying scene category information of each image block in the fused image comprises:
    obtaining, through a first model, the probability that any image block in the fused image belongs to each of multiple preset scene categories, and taking the preset scene category corresponding to the maximum probability as the scene category information of that image block.
  4. The method according to claim 3, wherein the multiple preset scene categories comprise at least one first-level scene category and at least one second-level scene category, wherein each first-level scene category has at least one second-level scene category as a subcategory.
  5. The method according to claim 4, wherein the obtaining, through the first model, the probability that any image block in the fused image belongs to each of the multiple preset scene categories, and taking the preset scene category corresponding to the maximum probability as the scene category information of that image block, comprises:
    obtaining, through the first model, the probability that any image block in the fused image belongs to each of the second-level scene categories, and taking the second-level scene category corresponding to the maximum probability, or the first-level scene category to which that second-level scene category belongs, as the scene category information of that image block; or
    obtaining, through the first model, the probability that any image block in the fused image belongs to each of the first-level scene categories, and taking the first-level scene category corresponding to the maximum probability as the scene category information of that image block.
  6. The method according to any one of claims 2-5, wherein after the taking the preset scene category corresponding to the maximum probability as the scene category information of that image block, the method further comprises:
    if the maximum probability is lower than a preset probability threshold, ignoring the scene category information of that image block in the fused image.
  7. The method according to any one of claims 2-6, further comprising:
    comparing the scene category information of any image block with the scene category information of its multiple adjacent image blocks in terms of similarity, and if the similarity is lower than a preset similarity threshold, ignoring the scene category information of that image block in the fused image.
  8. The method according to any one of claims 2-7, wherein the first model is a convolutional neural network model, the first model comprises a plurality of first processing units connected in sequence, and the first processing unit comprises a convolution layer, a batch normalization layer, and an activation layer.
  9. The method according to any one of claims 1-8, wherein the obtaining a depth map according to the point cloud data comprises:
    inputting the point cloud data into a pre-trained second model to generate the depth map.
  10. The method according to claim 9, wherein the inputting the point cloud data into a pre-trained second model to generate the depth map comprises:
    if the density of the point cloud data is higher than a preset density threshold, converting the point cloud data into the depth map through the second model; or
    if the density of the point cloud data is not higher than the preset density threshold, performing densification processing on the point cloud data through the second model, and converting the densified point cloud data into the depth map.
  11. The method according to claim 9 or 10, wherein the second model is a convolutional neural network model, the second model comprises a plurality of second processing units connected in sequence, and the second processing unit comprises a convolution layer, a batch normalization layer, and an activation layer.
  12. The method according to any one of claims 1-11, wherein before the obtaining point cloud data collected by a lidar and an image taken by a camera, the method further comprises:
    calibrating the lidar and the camera so that the point cloud data collected by the lidar corresponds to the image taken by the camera.
  13. A movable platform, characterized by comprising: a lidar, a camera, a memory, and a processor;
    the memory is used to store program code;
    the processor calls the program code, and when the program code is executed, the processor is used to perform the following operations:
    obtaining point cloud data collected by the lidar and an image taken by the camera, wherein the point cloud data corresponds to the image;
    obtaining a depth map according to the point cloud data;
    obtaining a fused image of the depth map and the image;
    identifying scene category information of each image block in the fused image;
    marking the scene category information of each image block in the fused image.
  14. The movable platform according to claim 13, wherein when the processor obtains the fused image of the depth map and the image, the processor is configured to:
    input the depth map and the image into a pre-trained first model to obtain a fused image of the depth map and the image.
  15. The movable platform according to claim 13 or 14, wherein when the processor identifies the scene category information of each image block in the fused image, the processor is configured to:
    obtain, through a first model, the probability that any image block in the fused image belongs to each of multiple preset scene categories, and take the preset scene category corresponding to the maximum probability as the scene category information of that image block.
  16. The movable platform according to claim 15, wherein the multiple preset scene categories comprise at least one first-level scene category and at least one second-level scene category, wherein each first-level scene category has at least one second-level scene category as a subcategory.
  17. The movable platform according to claim 16, wherein when the processor obtains, through the first model, the probability that any image block in the fused image belongs to each of the multiple preset scene categories and takes the preset scene category corresponding to the maximum probability as the scene category information of that image block, the processor is configured to:
    obtain, through the first model, the probability that any image block in the fused image belongs to each of the second-level scene categories, and take the second-level scene category corresponding to the maximum probability, or the first-level scene category to which that second-level scene category belongs, as the scene category information of that image block; or
    obtain, through the first model, the probability that any image block in the fused image belongs to each of the first-level scene categories, and take the first-level scene category corresponding to the maximum probability as the scene category information of that image block.
  18. The movable platform according to any one of claims 14-17, wherein after the processor takes the preset scene category corresponding to the maximum probability as the scene category information of that image block, the processor is further configured to:
    if the maximum probability is lower than a preset probability threshold, ignore the scene category information of that image block in the fused image.
  19. The movable platform according to any one of claims 14-18, wherein the processor is further configured to:
    compare the scene category information of any image block with the scene category information of its multiple adjacent image blocks in terms of similarity, and if the similarity is lower than a preset similarity threshold, ignore the scene category information of that image block in the fused image.
  20. The movable platform according to any one of claims 14-19, wherein the first model is a convolutional neural network model, the first model comprises a plurality of first processing units connected in sequence, and the first processing unit comprises a convolution layer, a batch normalization layer, and an activation layer.
  21. The movable platform according to any one of claims 13-20, wherein when the processor obtains a depth map according to the point cloud data, the processor is configured to:
    input the point cloud data into a pre-trained second model to generate the depth map.
  22. The movable platform according to claim 21, wherein when the processor inputs the point cloud data into the pre-trained second model to generate the depth map, the processor is configured to:
    if the density of the point cloud data is higher than a preset density threshold, convert the point cloud data into the depth map through the second model; or
    if the density of the point cloud data is not higher than the preset density threshold, perform densification processing on the point cloud data through the second model, and convert the densified point cloud data into the depth map.
  23. The movable platform according to claim 21 or 22, wherein the second model is a convolutional neural network model, the second model comprises a plurality of second processing units connected in sequence, and the second processing unit comprises a convolution layer, a batch normalization layer, and an activation layer.
  24. The movable platform according to any one of claims 13-23, wherein before the processor obtains the point cloud data collected by the lidar and the image taken by the camera, the processor is further configured to:
    calibrate the lidar and the camera so that the point cloud data collected by the lidar corresponds to the image taken by the camera.
  25. The movable platform according to any one of claims 13-24, wherein the movable platform comprises at least one of a vehicle, an unmanned aerial vehicle, and a robot.
  26. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program is executed by a processor to implement the method according to any one of claims 1-12.
PCT/CN2019/093865 2019-06-28 2019-06-28 Image semantic segmentation method, movable platform, and storage medium WO2020258297A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/093865 WO2020258297A1 (en) 2019-06-28 2019-06-28 Image semantic segmentation method, movable platform, and storage medium
CN201980012353.4A CN111742344A (en) 2019-06-28 2019-06-28 Image semantic segmentation method, movable platform and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/093865 WO2020258297A1 (en) 2019-06-28 2019-06-28 Image semantic segmentation method, movable platform, and storage medium

Publications (1)

Publication Number Publication Date
WO2020258297A1 true WO2020258297A1 (en) 2020-12-30

Family

ID=72646078

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/093865 WO2020258297A1 (en) 2019-06-28 2019-06-28 Image semantic segmentation method, movable platform, and storage medium

Country Status (2)

Country Link
CN (1) CN111742344A (en)
WO (1) WO2020258297A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112312113B (en) * 2020-10-29 2022-07-15 贝壳技术有限公司 Method, device and system for generating three-dimensional model
WO2022126522A1 (en) * 2020-12-17 2022-06-23 深圳市大疆创新科技有限公司 Object recognition method, apparatus, movable platform, and storage medium
CN112925310B (en) * 2021-01-22 2023-08-08 广州大学 Control method, device, equipment and storage medium of intelligent deinsectization system
CN113570617B (en) * 2021-06-24 2022-08-23 荣耀终端有限公司 Image processing method and device and electronic equipment
CN113902927B (en) * 2021-12-09 2022-04-12 北京车网科技发展有限公司 Comprehensive information processing method fusing image and point cloud information
CN115909272A (en) * 2022-11-09 2023-04-04 杭州枕石智能科技有限公司 Method for acquiring obstacle position information, terminal device and computer medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3156942A1 (en) * 2015-10-16 2017-04-19 Thomson Licensing Scene labeling of rgb-d data with interactive option
CN107403430A (en) * 2017-06-15 2017-11-28 中山大学 A kind of RGBD image, semantics dividing method
CN107563388A (en) * 2017-09-18 2018-01-09 东北大学 A kind of convolutional neural networks object identification method based on depth information pre-segmentation
CN108895981A (en) * 2018-05-29 2018-11-27 南京怀萃智能科技有限公司 A kind of method for three-dimensional measurement, device, server and storage medium
CN109271990A (en) * 2018-09-03 2019-01-25 北京邮电大学 A kind of semantic segmentation method and device for RGB-D image

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205549A (en) * 2021-05-07 2021-08-03 深圳市商汤科技有限公司 Depth estimation method and device, electronic equipment and storage medium
CN113205549B (en) * 2021-05-07 2023-11-28 深圳市商汤科技有限公司 Depth estimation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111742344A (en) 2020-10-02


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 19935542
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 19935542
    Country of ref document: EP
    Kind code of ref document: A1