CN111742344A - Image semantic segmentation method, movable platform and storage medium - Google Patents

Image semantic segmentation method, movable platform and storage medium

Info

Publication number
CN111742344A
Authority
CN
China
Prior art keywords
image
point cloud
cloud data
scene
model
Prior art date
Legal status
Pending
Application number
CN201980012353.4A
Other languages
Chinese (zh)
Inventor
王涛
李鑫超
李思晋
Current Assignee
Shenzhen Zhuoyu Technology Co ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd
Publication of CN111742344A

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06F - ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 - Pattern recognition
            • G06F 18/20 - Analysing
              • G06F 18/24 - Classification techniques
                • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
              • G06F 18/25 - Fusion techniques
        • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 - Image analysis
            • G06T 7/10 - Segmentation; Edge detection
              • G06T 7/11 - Region-based segmentation
            • G06T 7/50 - Depth or shape recovery
          • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 - Image acquisition modality
              • G06T 2207/10028 - Range image; Depth image; 3D point clouds

Abstract

The embodiment of the invention provides an image semantic segmentation method, a movable platform and a storage medium. Point cloud data collected by a laser radar and an image shot by a camera are acquired, wherein the point cloud data corresponds to the image; a depth map is acquired according to the point cloud data; a fusion image of the depth map and the image is acquired; scene category information of each image block in the fusion image is identified; and the scene category information of each image block is labeled into the fusion image. By fusing the depth map obtained from the laser radar point cloud data with the image shot by the camera, different targets can be accurately separated during image semantic segmentation. This works well both for images with complex background information and for images containing occluded or overlapping objects with similar textures, and therefore improves the accuracy of image semantic segmentation.

Description

Image semantic segmentation method, movable platform and storage medium
Technical Field
The embodiment of the invention relates to the field of image processing, in particular to an image semantic segmentation method, a movable platform and a storage medium.
Background
Image semantic segmentation is a fundamental task in computer vision. In semantic segmentation, an image is divided into different semantically interpretable categories, that is, the content present in the image and its position information are identified. It is therefore widely applied in fields such as automatic driving, medical image analysis, and robotics.
In the prior art, a camera is generally used to capture an image, which is then input into a neural network model that performs semantic segmentation on it. However, the prior-art image semantic segmentation method cannot perform accurate semantic segmentation on images of certain scenes: for images with complex background information, such as vehicle scenes and airplane scenes, the prior art has difficulty separating the various targets; furthermore, objects that are occluded or overlapping and have similar textures are difficult for the prior art to distinguish.
Disclosure of Invention
The embodiment of the invention provides an image semantic segmentation method, a movable platform and a storage medium, which are used for improving the accuracy of image semantic segmentation.
The first aspect of the embodiments of the present invention provides an image semantic segmentation method, including:
acquiring point cloud data acquired by a laser radar and an image shot by a camera, wherein the point cloud data corresponds to the image;
acquiring a depth map according to the point cloud data;
acquiring a fusion image of the depth map and the image;
identifying scene category information of each image block in the fusion image;
and labeling the scene type information of each image block into the fusion image.
A second aspect of an embodiment of the present invention provides a movable platform, including: a lidar, a camera, a memory, and a processor;
the memory is used for storing program codes;
the processor calls the program code, and when the program code is executed, performs the following:
acquiring point cloud data acquired by a laser radar and an image shot by a camera, wherein the point cloud data corresponds to the image;
acquiring a depth map according to the point cloud data;
acquiring a fusion image of the depth map and the image;
identifying scene category information of each image block in the fusion image;
and labeling the scene type information of each image block into the fusion image.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the method according to the first aspect.
According to the image semantic segmentation method, the movable platform and the storage medium provided by the embodiments of the present invention, point cloud data collected by a laser radar and an image shot by a camera are acquired, wherein the point cloud data corresponds to the image; a depth map is acquired according to the point cloud data; a fusion image of the depth map and the image is acquired; scene category information of each image block in the fusion image is identified; and the scene category information of each image block is labeled into the fusion image. By fusing the depth map obtained from the laser radar point cloud data with the image shot by the camera, different targets can be accurately separated during image semantic segmentation. This works well both for images with complex background information and for images containing occluded or overlapping objects with similar textures, and therefore improves the accuracy of image semantic segmentation.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed for the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
FIG. 1 is a flowchart of an image semantic segmentation method according to an embodiment of the present invention;
FIG. 2a is a schematic view of an application scenario of the image semantic segmentation method according to an embodiment of the present invention;
FIG. 2b is a schematic illustration of a depth map and an image of the scene of FIG. 2a;
FIG. 3a is a schematic view of an application scenario of an image semantic segmentation method according to another embodiment of the present invention;
FIG. 3b is a schematic illustration of a depth map and an image of the scene of FIG. 3a;
FIG. 4a is a schematic view of an application scenario of an image semantic segmentation method according to another embodiment of the present invention;
FIG. 4b is a schematic illustration of a depth map and an image of the scene of FIG. 4a;
FIG. 5 is a flowchart of an image semantic segmentation method according to another embodiment of the present invention;
FIG. 6 is a flowchart of an image semantic segmentation method according to another embodiment of the present invention;
FIG. 7 is a flowchart of an image semantic segmentation method according to another embodiment of the present invention;
FIG. 8 is a block diagram of a movable platform according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When a component is referred to as being "connected" to another component, it can be directly connected to the other component or intervening components may also be present.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
The embodiment of the invention provides an image semantic segmentation method. Fig. 1 is a flowchart of an image semantic segmentation method according to an embodiment of the present invention. As shown in fig. 1, the image semantic segmentation method in this embodiment may include:
s101, point cloud data collected by a laser radar and an image shot by a camera are obtained, wherein the point cloud data correspond to the image.
In this embodiment, a laser radar and a camera are mounted on a movable platform such as a vehicle, an unmanned aerial vehicle, or a robot. The laser radar collects point cloud data and the camera captures an image (for example, an RGB image), where the point cloud data corresponds to the image: when the point cloud data is projected into the image, it coincides with the image. In other words, the laser radar and the camera scan the same target area, and the line-of-sight direction of the laser radar is the same as, or close to, the shooting direction of the camera, which can be achieved by jointly calibrating the laser radar and the camera. That is, before the point cloud data collected by the laser radar and the image shot by the camera are acquired in S101, the laser radar and the camera are calibrated so that the point cloud data collected by the laser radar corresponds to the image shot by the camera. Of course, the point cloud data and the image may also be made to correspond by applying a rotation-and-translation transformation after they are acquired.
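A minimal sketch of how such a correspondence can be used in practice is given below: laser radar points are transformed into the camera frame with an extrinsic rotation R and translation t obtained from the joint calibration, and then projected onto the image plane with the camera intrinsic matrix K under a standard pinhole model. The function and parameter names are illustrative assumptions and are not part of the original disclosure.

```python
import numpy as np

def project_points_to_image(points_xyz, R, t, K):
    """Project lidar points (N, 3) into pixel coordinates using the
    extrinsic rotation R (3x3) and translation t (3,) from the joint
    calibration and the camera intrinsic matrix K (3x3)."""
    # Transform points from the lidar frame into the camera frame.
    pts_cam = points_xyz @ R.T + t
    # Keep only points in front of the camera.
    in_front = pts_cam[:, 2] > 0
    pts_cam = pts_cam[in_front]
    # Perspective projection onto the image plane.
    uv = pts_cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]
    depths = pts_cam[:, 2]
    return uv, depths, in_front
```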
In this embodiment, the laser radar and the camera may be fixedly mounted on the movable platform, or may be mounted on the movable platform through a rotating part so as to collect point cloud data and images within a certain range.
Step S102, acquiring a depth map according to the point cloud data.
In this embodiment, after the point cloud data is obtained, a depth map may be obtained from the point cloud data. Each pixel value of the depth map represents the distance from an object to the laser radar; since the point cloud data contains the orientation and distance information of objects, the depth map can be obtained according to this information. In this embodiment, the point cloud data may be converted into a depth map by using an existing algorithm, or by using a pre-trained convolutional neural network model.
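The following sketch shows one way the projected points could be rasterized into a sparse depth map; the splatting strategy and the choice of keeping the nearest return per pixel are assumptions, not details given in the text.

```python
import numpy as np

def rasterize_depth_map(uv, depths, height, width):
    """Build a sparse depth map by splatting projected lidar points onto
    the image grid; pixels without a lidar return stay at 0. uv and
    depths are assumed to come from a projection step such as
    project_points_to_image above."""
    depth_map = np.zeros((height, width), dtype=np.float32)
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    valid = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    for r, c, d in zip(rows[valid], cols[valid], depths[valid]):
        # Keep the nearest return when several points land on one pixel.
        if depth_map[r, c] == 0 or d < depth_map[r, c]:
            depth_map[r, c] = d
    return depth_map
```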
Step S103, acquiring a fusion image of the depth map and the image.
In this embodiment, after the depth map is acquired, the depth map and the image may be fused to obtain a fusion image of the depth map and the image. Specifically, since the point cloud data corresponds to the image, the depth map may be directly projected into the image to form the fusion image. In this embodiment, feature points may be extracted from the depth map and the image, the mutually corresponding feature points in the depth map and the image are obtained by comparison, and, based on these corresponding feature points, the coordinate system of the depth map is converted into the coordinate system of the image through operations such as rotation, translation, and cropping. The depth map is thereby projected onto the coordinate system of the image, each pixel of the depth map is aligned with the image, and data-level fusion with the image is then performed to obtain the fusion image.
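One simple form of such data-level fusion is to append the aligned depth map as an extra channel of the RGB image, producing a four-channel RGB-D input. The sketch below assumes this channel-concatenation scheme; the normalization constant max_depth is an assumed choice.

```python
import numpy as np

def fuse_depth_and_image(depth_map, rgb_image, max_depth=100.0):
    """Data-level fusion sketch: normalize the aligned depth map and
    concatenate it with the RGB image as a fourth channel."""
    depth_norm = np.clip(depth_map / max_depth, 0.0, 1.0)[..., None]  # (H, W, 1)
    rgb_norm = rgb_image.astype(np.float32) / 255.0                   # (H, W, 3)
    return np.concatenate([rgb_norm, depth_norm], axis=-1)            # (H, W, 4)
```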
In addition, in this embodiment, the point cloud data may also be aligned with the image first, for example by using a gray-level region-based registration method, a feature-based registration method, a registration method combining line features and point features, or the like. A depth map is then obtained from the aligned point cloud data; since this depth map is already aligned with the image, the depth map and the image can be fused directly to obtain the fusion image.
Step S104, identifying scene category information of each image block in the fusion image.
In this embodiment, multiple preset scene categories may be predefined. By identifying the scene category of any image block in the fusion image, it is determined which preset scene category the image block belongs to, so that the scene category information of the image block is obtained. More specifically, the probability that the image block belongs to each of the multiple preset scene categories may be obtained, and the preset scene category corresponding to the maximum probability is taken as the scene category information of the image block.
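A sketch of this step is given below, assuming the recognition network outputs per-pixel class logits that are converted into probabilities with a softmax; the category with the maximum probability is kept, together with that probability for the confidence filtering described later. The function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def predict_scene_categories(model, fused_image):
    """Per-image-block prediction sketch: fused_image is a (4, H, W)
    tensor; returns the predicted category id and its probability
    for every position."""
    model.eval()
    with torch.no_grad():
        logits = model(fused_image.unsqueeze(0))   # (1, num_classes, H, W)
        probs = F.softmax(logits, dim=1)           # per-position class probabilities
        max_probs, labels = probs.max(dim=1)       # (1, H, W) each
    return labels.squeeze(0), max_probs.squeeze(0)
```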
In this embodiment, the preset scene categories may include, for example, categories such as vehicle, sky, road, static obstacle, and dynamic obstacle. Of course, more detailed scene categories may also be included: for example, vehicles may be further classified as cars, trucks, buses, trains, motor homes, and the like; static obstacles may be further classified as buildings, walls, guardrails, utility poles, traffic lights, traffic signs, and the like; and dynamic obstacles may include pedestrians, bicycles, motorcycles, and the like. In this embodiment, the categories of vehicle, sky, road, static obstacle, dynamic obstacle, and the like may be used as primary scene categories, and the more detailed scene categories may be used as secondary scene categories. That is, the preset scene categories may include at least one primary scene category and at least one secondary scene category, where any primary scene category has at least one secondary scene category as a sub-category.
Further, in this embodiment, the identification of the scene category information of an image block may be implemented by using a convolutional neural network; of course, other methods may also be used for the identification, which are not described herein again.
In another embodiment, the depth map and the image may be aligned first, then the scene category information of each image block in the image is identified according to the corresponding relationship between the depth map and the image, then the depth map and the image are fused to obtain a fused image, and the scene category information of each image block is labeled in the fused image.
Step S105, labeling the scene category information of each image block into the fusion image.
In this embodiment, after the scene category information of each image block in the fusion image is identified, the scene category information may be labeled into the fusion image to generate a semantic map, thereby completing the image semantic segmentation. In this embodiment, the labeling of the scene category information may simply attach the identifier corresponding to the scene category, or may color each image block with a color corresponding to its scene category information.
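A sketch of the color-labeling variant, assuming an integer category id per image block and an illustrative color palette (the specific categories and colors are examples, not taken from the original text):

```python
import numpy as np

# Assumed palette: one RGB color per category id (illustrative only).
PALETTE = np.array([
    [128,  64, 128],   # road
    [ 70, 130, 180],   # sky
    [  0,   0, 142],   # vehicle
    [ 70,  70,  70],   # static obstacle
    [220,  20,  60],   # dynamic obstacle
], dtype=np.uint8)

def colorize_labels(label_map):
    """Turn an (H, W) map of category ids into an (H, W, 3) color image,
    i.e. the semantic map overlaid on the fusion image."""
    return PALETTE[label_map]
```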
The image semantic segmentation method provided by this embodiment may be applied to semantic segmentation of images with occlusion. For example, as shown in the top view of fig. 2a, vehicle A and vehicle B are located at the positions shown, and a movable platform carrying a laser radar and a camera observes them along the arrow direction, so that vehicle A is partially occluded by vehicle B. After the laser radar collects point cloud data along this viewing angle, the depth map shown in the upper part of fig. 2b can be obtained (where different filling patterns represent different depths); after the camera takes a picture along the same viewing angle, the image shown in the lower part of fig. 2b can be obtained. By fusing the depth map with the image and performing semantic segmentation, the two vehicles at different distances in front can be distinguished, and the type of each vehicle (that is, its scene category information) can be further identified.
Of course, this embodiment can also resolve more complicated occlusion situations. For example, as shown in the top view of fig. 3a, when observed from the movable platform carrying the laser radar and the camera along the arrow direction, vehicle C occludes part of vehicle B and part of vehicle A, and vehicle B occludes part of vehicle A. After the laser radar collects point cloud data along this viewing angle, the depth map shown in the upper part of fig. 3b can be obtained; after the camera takes a picture along the same viewing angle, the image shown in the lower part of fig. 3b can be obtained. By fusing the depth map with the image and performing semantic segmentation, the different occlusion relationships in front can be resolved, and the type of each vehicle (that is, its scene category information) can be identified.
The image semantic segmentation method provided in this embodiment may also be applied to semantic segmentation of images of objects with similar textures. For example, as shown in fig. 4a, a wall with a corner is located in front of the movable platform carrying the laser radar and the camera; wall surface D is closer to the movable platform than wall surface E, and the two wall surfaces have similar textures. After the laser radar collects point cloud data along the viewing angle, the depth map shown in the upper part of fig. 4b can be obtained; after the camera takes a picture along the same viewing angle, the image shown in the lower part of fig. 4b can be obtained. By fusing the depth map with the image and performing semantic segmentation, the front-to-back relationship between wall surface D and wall surface E can be recognized, and the scene category information of each wall surface can be further identified.
According to the image semantic segmentation method provided by this embodiment, point cloud data collected by a laser radar and an image shot by a camera are acquired, wherein the point cloud data corresponds to the image; a depth map is acquired according to the point cloud data; a fusion image of the depth map and the image is acquired; scene category information of each image block in the fusion image is identified; and the scene category information of each image block is labeled into the fusion image. By fusing the depth map obtained from the laser radar point cloud data with the image shot by the camera, different targets can be accurately separated during image semantic segmentation. This works well both for images with complex background information and for images containing occluded or overlapping objects with similar textures, and therefore improves the accuracy of image semantic segmentation.
The embodiment of the invention provides an image semantic segmentation method. Fig. 5 and fig. 6 are flowcharts of an image semantic segmentation method according to another embodiment of the present invention. As shown in fig. 5 and fig. 6, on the basis of the foregoing embodiments, the image semantic segmentation method in this embodiment may include:
step S201, point cloud data collected by a laser radar and an image shot by a camera are obtained, wherein the point cloud data correspond to the image.
Step S202, acquiring a depth map according to the point cloud data.
Steps S201 and S202 may refer to S101 and S102 in the above embodiments and are not described herein again. In this embodiment, the second model described below is used to obtain the depth map from the point cloud data.
Step S203, inputting the depth map and the image into a pre-trained first model, and acquiring a fusion image of the depth map and the image.
Step S204, the probability that any image block in the fusion image belongs to each preset scene category in multiple preset scene categories is obtained through the first model, and the preset scene category corresponding to the maximum probability is used as the scene category information of the image block.
In this embodiment, the first model may be a convolutional neural network model. As shown in fig. 7, the first model includes a plurality of sequentially connected first processing units, and each first processing unit includes a convolutional layer, a batch normalization layer, and an activation layer. Further, the plurality of sequentially connected first processing units in the first model may use skip connections to prevent the gradient from vanishing.
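A sketch of such a first processing unit, and of a first model assembled from several of them, is given below in PyTorch. The channel counts, the number of units, and the number of output categories are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class ProcessingUnit(nn.Module):
    """One processing unit: convolution, batch normalization, activation,
    with a skip (residual) connection to ease gradient flow."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x))) + x  # skip connection

# Assumed overall layout: a stem for the 4-channel fused RGB-D input,
# several processing units, and a per-pixel classification head.
first_model = nn.Sequential(
    nn.Conv2d(4, 64, 3, padding=1),
    *[ProcessingUnit(64) for _ in range(4)],
    nn.Conv2d(64, 20, 1),   # 20 = assumed number of preset scene categories
)
```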
In this embodiment, after the mutually corresponding depth map and image are input into the first model, the first model first fuses them to obtain a fusion image of the depth map and the image. The first model then identifies the scene category of each image block of the fusion image: for any image block, the probability that the image block belongs to each of the multiple preset scene categories is obtained, and the preset scene category corresponding to the maximum probability is used as the scene category information of the image block.
It should be noted that, to train the first model, a plurality of fusion images labeled with the preset scene categories may first be obtained as a training set and a test set, and the features of image blocks of each preset scene category are learned through training. In this way, when a new fusion image is input, the probability that each image block in the new fusion image belongs to each of the multiple preset scene categories can be computed, and the preset scene category corresponding to the maximum probability can then be used as the scene category information of the image block. The specific training process is not described herein.
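A hedged sketch of such a training loop is shown below, assuming fused RGB-D inputs and per-pixel category labels; the optimizer, loss function, and hyperparameters are ordinary choices and are not prescribed by the text.

```python
import torch
import torch.nn as nn

def train_first_model(model, loader, num_epochs=10, lr=1e-3, device="cpu"):
    """Training sketch: loader yields (fused, labels) with fused of shape
    (B, 4, H, W) and labels a long tensor of shape (B, H, W)."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss(ignore_index=-1)  # skip unlabeled pixels
    for _ in range(num_epochs):
        for fused, labels in loader:
            fused, labels = fused.to(device), labels.to(device)
            logits = model(fused)            # (B, num_classes, H, W)
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```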
On the basis of any of the above embodiments, the plurality of preset scene categories include at least one primary scene category and at least one secondary scene category, where any of the primary scene categories has at least one secondary scene category as a sub-category.
Further, step S204 may specifically include:
the probability that any image block in the fusion image belongs to each secondary scene category is obtained through the first model, and the secondary scene category corresponding to the maximum probability or the primary scene category to which the secondary scene category belongs is used as the scene category information of the image block; or
And acquiring the probability that any image block in the fusion image belongs to each primary scene category through the first model, and taking the primary scene category corresponding to the maximum probability as the scene category information of the image block.
In this embodiment, since the preset scene categories include primary scene categories and secondary scene categories, when the probability that any image block belongs to each of the multiple preset scene categories is obtained through the first model, the secondary scene categories may be used as the measurement standard: the probability that the image block belongs to each secondary scene category is determined, and then either the secondary scene category corresponding to the maximum probability, or the primary scene category to which that secondary scene category belongs, may be used as the scene category information of the image block. For example, if the secondary scene category corresponding to the maximum probability is pedestrian, then either pedestrian or dynamic obstacle may be used as the scene category information of the image block. In addition, in this embodiment, the primary scene categories may also be used directly as the measurement standard: the probability that the image block belongs to each primary scene category is determined, and the primary scene category corresponding to the maximum probability is used as the scene category information of the image block.
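The two-level selection can be sketched as follows, with an illustrative mapping from secondary to primary scene categories; the taxonomy shown is an assumption and only mirrors the examples above.

```python
# Assumed secondary-to-primary mapping (illustrative only).
PRIMARY_OF = {
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
    "building": "static obstacle", "wall": "static obstacle",
    "guardrail": "static obstacle", "traffic light": "static obstacle",
    "pedestrian": "dynamic obstacle", "bicycle": "dynamic obstacle",
    "motorcycle": "dynamic obstacle",
    "sky": "sky", "road": "road",
}

def label_from_probs(secondary_probs, use_primary=False):
    """Pick the secondary category with maximum probability; optionally
    report the primary category it belongs to instead."""
    best = max(secondary_probs, key=secondary_probs.get)
    return PRIMARY_OF[best] if use_primary else best

# Example: a pedestrian detection reported at either level.
probs = {"pedestrian": 0.7, "car": 0.2, "road": 0.1}
print(label_from_probs(probs))                    # "pedestrian"
print(label_from_probs(probs, use_primary=True))  # "dynamic obstacle"
```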
Step S205, labeling the scene category information of each image block into the fusion image.
In this embodiment, step S205 can refer to step S105 in the above embodiments, which is not described herein again.
On the basis of any of the above embodiments, after obtaining the probability that any image block in the fusion image belongs to each preset scene category in multiple preset scene categories, and taking the preset scene category corresponding to the maximum probability as the scene category information of the image block, the method may further include:
and if the maximum probability is lower than a preset probability threshold, ignoring the scene category information of the image block in the fusion image.
In this embodiment, since there may be some error in the recognition of the scene category information, misrecognition can be detected by comparing the maximum probability with a preset probability threshold: when the maximum probability is lower than the preset probability threshold, it is determined that the image block may have been misrecognized, and the scene category information of such an image block can be ignored directly, so as to avoid affecting the semantic segmentation result.
On the basis of any of the above embodiments, the method further comprises:
and comparing the similarity of the scene type information of any image block with the scene type information of a plurality of adjacent image blocks, and if the similarity is lower than a preset similarity threshold, ignoring the scene type information of the image block in the fusion image.
In this embodiment, if the scene category information recognized for a certain image block is significantly different from the scene category information of a plurality of adjacent image blocks, there is a possibility of misrecognition, and the scene category information of that image block needs to be ignored. The scene category information of any image block is therefore compared for similarity with the scene category information of a plurality of adjacent image blocks; when the similarity is lower than a preset similarity threshold, the scene category information of the image block is judged to be a misrecognition and is ignored, so as to avoid affecting the semantic segmentation result.
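A combined sketch of the two post-filters described above (the probability threshold and the neighbor-similarity check) is given below; the threshold values, the 4-neighborhood, and the agreement measure are assumed details.

```python
import numpy as np

def filter_predictions(probs, labels, prob_thresh=0.5, sim_thresh=0.5, ignore_id=-1):
    """probs: (H, W) maximum class probability per image block;
    labels: (H, W) predicted category id per image block.
    Blocks whose confidence is below prob_thresh, or whose label agrees
    with fewer than sim_thresh of their 4-neighbors, are marked ignored."""
    out = labels.copy()
    h, w = labels.shape
    for r in range(h):
        for c in range(w):
            if probs[r, c] < prob_thresh:
                out[r, c] = ignore_id
                continue
            neighbors = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    neighbors.append(labels[rr, cc])
            if neighbors:
                agreement = np.mean([n == labels[r, c] for n in neighbors])
                if agreement < sim_thresh:
                    out[r, c] = ignore_id
    return out
```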
According to the image semantic segmentation method provided by this embodiment, point cloud data collected by a laser radar and an image shot by a camera are acquired, wherein the point cloud data corresponds to the image; a depth map is acquired according to the point cloud data; a fusion image of the depth map and the image is acquired; scene category information of each image block in the fusion image is identified; and the scene category information of each image block is labeled into the fusion image. By fusing the depth map obtained from the laser radar point cloud data with the image shot by the camera, different targets can be accurately separated during image semantic segmentation. This works well both for images with complex background information and for images containing occluded or overlapping objects with similar textures, and therefore improves the accuracy of image semantic segmentation.
On the basis of any of the above embodiments, the acquiring a depth map according to the point cloud data in S102 and S202 may specifically include:
and inputting the point cloud data into a pre-trained second model to generate the depth map.
The second model may be a convolutional neural network model. As shown in fig. 7, the second model includes a plurality of sequentially connected second processing units, and each second processing unit includes a convolutional layer, a batch normalization layer, and an activation layer. Further, the plurality of sequentially connected second processing units in the second model may use skip connections to prevent the gradient from vanishing.
In this embodiment, the conversion from the point cloud data to the depth map is implemented by the pre-trained second model. To train the second model, multiple sets of corresponding point cloud data and depth maps can be obtained as a training set and a test set; the specific training process is not repeated here.
Further, the inputting the point cloud data into the pre-trained second model to generate the depth map includes:
if the density of the point cloud data is higher than a preset density threshold value, converting the point cloud data into the depth map through the second model; or
And if the density of the point cloud data is not higher than the preset density threshold value, performing densification processing on the point cloud data through the second model, and converting the point cloud data after the densification processing into the depth map.
In this embodiment, the density of the point cloud data affects the precision of the depth information in the depth map and, to a great extent, the accuracy of the image semantic segmentation. Therefore, when the point cloud data is converted into the depth map, it is necessary to judge whether the density of the point cloud data meets the requirement. When the density of the point cloud data is higher than a preset density threshold, the density meets the requirement, and the point cloud data can be converted into the depth map directly through the second model; when the density of the point cloud data is not higher than the preset density threshold, the density does not meet the requirement, and the point cloud data needs to be densified first. Of course, other algorithms may also be adopted to densify the point cloud data, for example a point cloud densification algorithm based on sparse matching, which is not described herein again.
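A sketch of this density check is shown below; the definition of density as laser radar points per image pixel and the threshold value are assumptions, and model / densify_model are placeholders standing in for the pre-trained second model and its densification step.

```python
def point_cloud_to_depth(points_xyz, model, densify_model, image_shape,
                         density_thresh=0.05):
    """Convert point cloud data to a depth map, densifying first when the
    point density is not higher than the preset threshold."""
    height, width = image_shape
    density = len(points_xyz) / float(height * width)   # points per pixel (assumed metric)
    if density > density_thresh:
        return model(points_xyz)              # dense enough: convert directly
    dense_points = densify_model(points_xyz)  # densify the sparse point cloud first
    return model(dense_points)
```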
The embodiment of the invention provides a movable platform. Fig. 8 is a structural diagram of a movable platform according to an embodiment of the present invention, and as shown in fig. 8, the movable platform 30 includes: laser radar 31, camera 32, processor 33 and memory 34.
The memory 34 is used for storing program codes;
the processor 33 invokes the program code, which when executed, performs the following:
acquiring point cloud data acquired by a laser radar 31 and an image shot by a camera 32, wherein the point cloud data corresponds to the image;
acquiring a depth map according to the point cloud data;
acquiring a fusion image of the depth map and the image;
identifying scene category information of each image block in the fusion image;
and labeling the scene type information of each image block into the fusion image.
On the basis of any of the above embodiments, when the processor 33 acquires the fused image of the depth map and the image, the processor 33 is configured to:
and inputting the depth map and the image into a pre-trained first model to obtain a fusion image of the depth map and the image.
On the basis of any of the above embodiments, when the processor 33 identifies the scene type information of each image block in the fused image, the processor 33 is configured to:
and acquiring the probability that any image block in the fusion image belongs to each preset scene category in multiple preset scene categories through a first model, and taking the preset scene category corresponding to the maximum probability as the scene category information of the image block.
On the basis of any of the above embodiments, the plurality of preset scene categories include at least one primary scene category and at least one secondary scene category, where any of the primary scene categories has at least one secondary scene category as a sub-category.
On the basis of any of the above embodiments, when the processor 33 obtains, through the first model, a probability that any image block in the fused image belongs to each of multiple preset scene categories, and takes a preset scene category corresponding to a maximum probability as scene category information of the image block, the processor 33 is configured to:
the probability that any image block in the fusion image belongs to each secondary scene category is obtained through the first model, and the secondary scene category corresponding to the maximum probability or the primary scene category to which the secondary scene category belongs is used as the scene category information of the image block; or
And acquiring the probability that any image block in the fusion image belongs to each primary scene category through the first model, and taking the primary scene category corresponding to the maximum probability as the scene category information of the image block.
On the basis of any of the above embodiments, after the processor 33 uses the preset scene category corresponding to the maximum probability as the scene category information of the image block, the processor 33 is further configured to:
and if the maximum probability is lower than a preset probability threshold, ignoring the scene category information of the image block in the fusion image.
On the basis of any of the above embodiments, the processor 33 is further configured to:
and comparing the similarity of the scene type information of any image block with the scene type information of a plurality of adjacent image blocks, and if the similarity is lower than a preset similarity threshold, ignoring the scene type information of the image block in the fusion image.
On the basis of any one of the above embodiments, the first model is a convolutional neural network model, the first model includes a plurality of sequentially connected first processing units, and the first processing units include a convolutional layer, a batch normalization layer, and an activation layer.
On the basis of any of the above embodiments, when the processor 33 acquires a depth map from the point cloud data, the processor 33 is configured to:
and inputting the point cloud data into a pre-trained second model to generate the depth map.
On the basis of any of the above embodiments, when the processor 33 inputs the point cloud data into the pre-trained second model to generate the depth map, the processor 33 is configured to:
if the density of the point cloud data is higher than a preset density threshold value, converting the point cloud data into the depth map through the second model; or
And if the density of the point cloud data is not higher than the preset density threshold value, performing densification processing on the point cloud data through the second model, and converting the point cloud data after the densification processing into the depth map.
On the basis of any one of the above embodiments, the second model is a convolutional neural network model, the second model includes a plurality of second processing units connected in sequence, and the second processing unit includes a convolutional layer, a batch normalization layer, and an activation layer.
On the basis of any of the above embodiments, before the processor 33 acquires the point cloud data acquired by the laser radar 31 and the image captured by the camera 32, the processor 33 is further configured to:
calibrating the laser radar 31 and the camera 32 so that the point cloud data collected by the laser radar 31 corresponds to the image shot by the camera 32.
On the basis of any of the above embodiments, the movable platform 30 includes: at least one of a vehicle, a drone, and a robot.
The specific principle and implementation of the movable platform 30 provided in this embodiment are similar to those of the above embodiments, and are not described herein again.
The movable platform provided by this embodiment acquires point cloud data collected by a laser radar and an image shot by a camera, wherein the point cloud data corresponds to the image; acquires a depth map according to the point cloud data; acquires a fusion image of the depth map and the image; identifies scene category information of each image block in the fusion image; and labels the scene category information of each image block into the fusion image. By fusing the depth map obtained from the laser radar point cloud data with the image shot by the camera, different targets can be accurately separated during image semantic segmentation. This works well both for images with complex background information and for images containing occluded or overlapping objects with similar textures, and therefore improves the accuracy of image semantic segmentation.
In addition, the present embodiment also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the image semantic segmentation method described in the above embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (26)

1. An image semantic segmentation method, comprising:
acquiring point cloud data acquired by a laser radar and an image shot by a camera, wherein the point cloud data corresponds to the image;
acquiring a depth map according to the point cloud data;
acquiring a fusion image of the depth map and the image;
identifying scene category information of each image block in the fusion image;
and labeling the scene type information of each image block into the fusion image.
2. The method of claim 1, wherein the obtaining a fused image of the depth map and the image comprises:
and inputting the depth map and the image into a pre-trained first model to obtain a fusion image of the depth map and the image.
3. The method according to claim 1 or 2, wherein the identifying scene category information of each image block in the fused image comprises:
and acquiring the probability that any image block in the fusion image belongs to each preset scene category in multiple preset scene categories through a first model, and taking the preset scene category corresponding to the maximum probability as the scene category information of the image block.
4. The method of claim 3, wherein the plurality of preset scene categories include at least one primary scene category and at least one secondary scene category, wherein any of the primary scene categories is sub-categorized by at least one of the secondary scene categories.
5. The method according to claim 4, wherein the obtaining, by the first model, a probability that any image block in the fused image belongs to each of a plurality of preset scene categories, and taking a preset scene category corresponding to a maximum probability therein as the scene category information of the image block includes:
the probability that any image block in the fusion image belongs to each secondary scene category is obtained through the first model, and the secondary scene category corresponding to the maximum probability or the primary scene category to which the secondary scene category belongs is used as the scene category information of the image block; or
And acquiring the probability that any image block in the fusion image belongs to each primary scene category through the first model, and taking the primary scene category corresponding to the maximum probability as the scene category information of the image block.
6. The method according to any one of claims 2 to 5, wherein after the taking the preset scene category corresponding to the maximum probability as the scene category information of the image block, the method further comprises:
and if the maximum probability is lower than a preset probability threshold, ignoring the scene category information of the image block in the fusion image.
7. The method of any of claims 2-6, further comprising:
and comparing the similarity of the scene type information of any image block with the scene type information of a plurality of adjacent image blocks, and if the similarity is lower than a preset similarity threshold, ignoring the scene type information of the image block in the fusion image.
8. The method of any one of claims 2-7, wherein the first model is a convolutional neural network model, the first model comprising a plurality of sequentially connected first processing units, the first processing units comprising a convolutional layer, a batch normalization layer, and an activation layer.
9. The method of any one of claims 1-8, wherein the obtaining a depth map from the point cloud data comprises:
and inputting the point cloud data into a pre-trained second model to generate the depth map.
10. The method of claim 9, wherein inputting the point cloud data into a pre-trained second model to generate the depth map comprises:
if the density of the point cloud data is higher than a preset density threshold value, converting the point cloud data into the depth map through the second model; or
And if the density of the point cloud data is not higher than the preset density threshold value, performing densification processing on the point cloud data through the second model, and converting the point cloud data after the densification processing into the depth map.
11. The method of claim 9 or 10, wherein the second model is a convolutional neural network model, the second model comprises a plurality of sequentially connected second processing units, and the second processing units comprise a convolutional layer, a batch normalization layer, and an activation layer.
12. The method according to any one of claims 1-11, wherein before acquiring the point cloud data collected by the lidar and the image captured by the camera, the method further comprises:
and calibrating the laser radar and the camera so as to enable point cloud data acquired by the laser radar to correspond to an image shot by the camera.
13. A movable platform, comprising: a lidar, a camera, a memory, and a processor;
the memory is used for storing program codes;
the processor calls the program code, and when the program code is executed, performs the following:
acquiring point cloud data acquired by a laser radar and an image shot by a camera, wherein the point cloud data corresponds to the image;
acquiring a depth map according to the point cloud data;
acquiring a fusion image of the depth map and the image;
identifying scene category information of each image block in the fusion image;
and labeling the scene type information of each image block into the fusion image.
14. The movable platform of claim 13, wherein, when the processor acquires a fused image of the depth map and the image, the processor is configured to:
and inputting the depth map and the image into a pre-trained first model to obtain a fusion image of the depth map and the image.
15. The movable platform of claim 13 or 14, wherein when the processor identifies scene category information for image blocks in the fused image, the processor is configured to:
and acquiring the probability that any image block in the fusion image belongs to each preset scene category in multiple preset scene categories through a first model, and taking the preset scene category corresponding to the maximum probability as the scene category information of the image block.
16. The movable platform of claim 15, wherein the plurality of preset scene categories include at least one primary scene category and at least one secondary scene category, wherein any of the primary scene categories is sub-categorized by at least one of the secondary scene categories.
17. The movable platform according to claim 16, wherein when the processor obtains, through the first model, a probability that any image block in the fused image belongs to each of a plurality of preset scene categories, and takes a preset scene category corresponding to a maximum probability as the scene category information of the image block, the processor is configured to:
the probability that any image block in the fusion image belongs to each secondary scene category is obtained through the first model, and the secondary scene category corresponding to the maximum probability or the primary scene category to which the secondary scene category belongs is used as the scene category information of the image block; or
And acquiring the probability that any image block in the fusion image belongs to each primary scene category through the first model, and taking the primary scene category corresponding to the maximum probability as the scene category information of the image block.
18. The movable platform of any one of claims 14-17, wherein after the processor uses the preset scene class corresponding to the maximum probability as the scene class information of the image block, the processor is further configured to:
and if the maximum probability is lower than a preset probability threshold, ignoring the scene category information of the image block in the fusion image.
19. The movable platform of any one of claims 14-18, wherein the processor is further configured to:
and comparing the similarity of the scene type information of any image block with the scene type information of a plurality of adjacent image blocks, and if the similarity is lower than a preset similarity threshold, ignoring the scene type information of the image block in the fusion image.
20. The movable platform of any one of claims 14-19, wherein the first model is a convolutional neural network model, the first model comprising a plurality of sequentially connected first processing units, the first processing units comprising a convolutional layer, a batch normalization layer, and an activation layer.
21. The movable platform of any one of claims 13-20, wherein, when the processor acquires a depth map from the point cloud data, the processor is configured to:
and inputting the point cloud data into a pre-trained second model to generate the depth map.
22. The movable platform of claim 21, wherein, when the processor inputs the point cloud data into a pre-trained second model to generate the depth map, the processor is configured to:
if the density of the point cloud data is higher than a preset density threshold value, converting the point cloud data into the depth map through the second model; or
And if the density of the point cloud data is not higher than the preset density threshold value, performing densification processing on the point cloud data through the second model, and converting the point cloud data after the densification processing into the depth map.
23. The movable platform of claim 21 or 22, wherein the second model is a convolutional neural network model, the second model comprising a plurality of sequentially connected second processing units, the second processing units comprising a convolutional layer, a batch normalization layer, and an activation layer.
24. The movable platform of any one of claims 13-23, wherein prior to the processor acquiring the lidar collected point cloud data and the camera captured image, the processor is further configured to:
and calibrating the laser radar and the camera so as to enable point cloud data acquired by the laser radar to correspond to an image shot by the camera.
25. The movable platform of any one of claims 13-24, wherein the movable platform comprises: at least one of a vehicle, a drone, and a robot.
26. A computer-readable storage medium, having stored thereon a computer program for execution by a processor to perform the method of any one of claims 1-12.
CN201980012353.4A 2019-06-28 2019-06-28 Image semantic segmentation method, movable platform and storage medium Pending CN111742344A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/093865 WO2020258297A1 (en) 2019-06-28 2019-06-28 Image semantic segmentation method, movable platform, and storage medium

Publications (1)

Publication Number Publication Date
CN111742344A true CN111742344A (en) 2020-10-02

Family

ID=72646078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980012353.4A Pending CN111742344A (en) 2019-06-28 2019-06-28 Image semantic segmentation method, movable platform and storage medium

Country Status (2)

Country Link
CN (1) CN111742344A (en)
WO (1) WO2020258297A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205549B (en) * 2021-05-07 2023-11-28 深圳市商汤科技有限公司 Depth estimation method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108895981B (en) * 2018-05-29 2020-10-09 南京怀萃智能科技有限公司 Three-dimensional measurement method, device, server and storage medium
CN109271990A (en) * 2018-09-03 2019-01-25 北京邮电大学 A kind of semantic segmentation method and device for RGB-D image

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3156942A1 (en) * 2015-10-16 2017-04-19 Thomson Licensing Scene labeling of rgb-d data with interactive option
CN107403430A (en) * 2017-06-15 2017-11-28 中山大学 A kind of RGBD image, semantics dividing method
CN107563388A (en) * 2017-09-18 2018-01-09 东北大学 A kind of convolutional neural networks object identification method based on depth information pre-segmentation

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112312113A (en) * 2020-10-29 2021-02-02 贝壳技术有限公司 Method, device and system for generating three-dimensional model
WO2022126522A1 (en) * 2020-12-17 2022-06-23 深圳市大疆创新科技有限公司 Object recognition method, apparatus, movable platform, and storage medium
CN112925310A (en) * 2021-01-22 2021-06-08 广州大学 Control method, device, equipment and storage medium of intelligent deinsectization system
CN112925310B (en) * 2021-01-22 2023-08-08 广州大学 Control method, device, equipment and storage medium of intelligent deinsectization system
CN113570617A (en) * 2021-06-24 2021-10-29 荣耀终端有限公司 Image processing method and device and electronic equipment
CN113902927A (en) * 2021-12-09 2022-01-07 北京车网科技发展有限公司 Comprehensive information processing method fusing image and point cloud information
CN115909272A (en) * 2022-11-09 2023-04-04 杭州枕石智能科技有限公司 Method for acquiring obstacle position information, terminal device and computer medium

Also Published As

Publication number Publication date
WO2020258297A1 (en) 2020-12-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240515

Address after: Building 3, Xunmei Science and Technology Plaza, No. 8 Keyuan Road, Science and Technology Park Community, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province, 518057, 1634

Applicant after: Shenzhen Zhuoyu Technology Co.,Ltd.

Country or region after: China

Address before: 518057 Shenzhen Nanshan High-tech Zone, Shenzhen, Guangdong Province, 6/F, Shenzhen Industry, Education and Research Building, Hong Kong University of Science and Technology, No. 9 Yuexingdao, South District, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: SZ DJI TECHNOLOGY Co.,Ltd.

Country or region before: China