CN113298781B - Mars surface three-dimensional terrain detection method based on image and point cloud fusion - Google Patents

Mars surface three-dimensional terrain detection method based on image and point cloud fusion

Info

Publication number
CN113298781B
CN113298781B (application number CN202110565199.2A)
Authority
CN
China
Prior art keywords
dimensional
point cloud
image
mars
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110565199.2A
Other languages
Chinese (zh)
Other versions
CN113298781A (en)
Inventor
高�浩
黄卫
胡海东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110565199.2A priority Critical patent/CN113298781B/en
Publication of CN113298781A publication Critical patent/CN113298781A/en
Application granted granted Critical
Publication of CN113298781B publication Critical patent/CN113298781B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 7/0002 Image analysis; Inspection of images, e.g. flaw detection
    • G06F 18/241 Pattern recognition; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/047 Neural networks; Probabilistic or stochastic networks
    • G06N 3/08 Neural networks; Learning methods
    • G06T 7/10 Image analysis; Segmentation; Edge detection
    • G06T 7/80 Image analysis; Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/10024 Image acquisition modality; Color image
    • G06T 2207/10028 Image acquisition modality; Range image; Depth image; 3D point clouds
    • G06T 2207/20081 Special algorithmic details; Training; Learning
    • G06T 2207/20084 Special algorithmic details; Artificial neural networks [ANN]
    • G06T 2207/20221 Special algorithmic details; Image combination; Image fusion; Image merging
    • G06T 2207/30204 Subject of image; Context of image processing; Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a Mars surface three-dimensional terrain detection method based on image and point cloud fusion, which comprises the following steps: acquiring image data of a region to be detected on the Mars surface; taking the image data of the region to be detected as the input of a trained three-dimensional target detection network; determining the terrain detection result of the region to be detected on the Mars surface according to the output of the three-dimensional target detection network; and visually outputting the terrain detection result. By fusing image and point cloud information through a deep learning method, the invention effectively solves the problem of three-dimensional spatial detection of target terrain on the Mars surface.

Description

Mars surface three-dimensional terrain detection method based on image and point cloud fusion
Technical Field
The invention relates to a Mars surface three-dimensional terrain detection method based on image and point cloud fusion, and belongs to the technical field of Mars intelligent visual three-dimensional target detection and identification.
Background
Knowledge of the landform and topography of Mars is one of the foundations of Mars scientific research, and detecting the composition and distribution of surface material is the key to characterizing that landform and topography. Scientists can obtain a great deal of information simply from the distribution and physical properties of rocks. Therefore, in a rover exploration task, the Mars rover first needs to recognize and assess the environment around it: it acquires information through sensors such as cameras, detects the spatial information and category information of the target terrain, then plans a route to determine the direction of travel, and uses its detection instruments for the next stage of exploration and research.
Compared with 2D object detection, 3D object detection has obvious advantages. To ensure that the Mars rover travels safely and to further survey a specific target terrain, three-dimensional pose information of the terrain is required, whereas a two-dimensional pose in an image carries no depth information. Traditional 2D object detection obtains the category of a target object and a detection box in the image plane; its parameters comprise the category, the center coordinates and the length and width of the detection box, and these parameters describe only the position and size of the target in the image. The absolute scale and position of an object cannot be recovered from a 2D image alone, so collisions cannot be effectively avoided and safety hazards remain. Therefore, 3D detection of Mars surface terrain plays an important role in the rover's exploration, locomotion and path planning, and is of great significance for targeted exploration of the Mars surface.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a Mars surface three-dimensional terrain detection method based on image and point cloud fusion.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
the invention provides a Mars surface three-dimensional terrain detection method based on image and point cloud fusion, which comprises the following steps:
acquiring image data of a region to be detected on the surface of the Mars;
taking the image data of the area to be detected as the input of the trained three-dimensional target detection network;
determining a terrain detection result of a region to be detected on the surface of the Mars according to the output of the three-dimensional target detection network;
visually outputting a terrain detection result;
the image data of the region to be detected on the surface of the Mars are obtained by driving a calibrated camera set to move on the surface of the Mars through a trolley robot, and the calibrating of the camera set comprises calibrating a depth camera by using a calibration plate and obtaining camera internal parameters and camera external parameters;
the camera set comprises a color camera and a depth camera, and the image data comprises an RGB color image and a depth image.
Further, the training of the three-dimensional target detection network includes:
acquiring image data of a Mars simulation field, namely an RGB color image and a depth image;
converting the depth image of the Mars simulation field into a three-dimensional point cloud, and preprocessing the three-dimensional point cloud data;
labeling an RGB color image of a Mars simulation field to obtain a color image data set, and labeling the preprocessed three-dimensional point cloud data to obtain a three-dimensional point cloud data set;
the color image data set is used as the input of a two-dimensional target detection network, the Mars simulation field is detected, and a two-dimensional detection result is output;
taking the two-dimensional detection result and the three-dimensional point cloud data set as input, and extracting conical point cloud of a target area in the three-dimensional point cloud;
using the conical point cloud as the input of a three-dimensional target detection network, detecting the Mars simulation field and outputting a three-dimensional detection result;
and performing iterative training on the three-dimensional target detection network based on the three-dimensional detection result.
The image data of the surface of the Mars simulation field is obtained by driving a calibrated camera set to move in the Mars simulation field through a trolley robot.
Further, the acquiring of the image data includes:
adjusting the acquisition angle and the illumination environment of the camera set; and controlling the trolley robot to move by using a ros system according to preset parameters and routes, recording the terrain environment by the camera group to obtain a bag file, transmitting and storing the bag file, and analyzing the bag file according to the timestamp to obtain an RGB (red, green and blue) color image and a depth image of each frame.
Further, the converting the depth image into a three-dimensional point cloud comprises:
the camera internal reference is used as the constraint adjustment of coordinate transformation to convert the depth image into three-dimensional point cloud, and the formula is as follows:
Figure BDA0003080447250000031
wherein x, y, z are three-dimensional point cloud coordinates, x 'and y' are depth image coordinates, f x And f y Is the focal length in the camera parameters, and D is the depth value of the depth image.
Further, the three-dimensional point cloud coordinates are saved in the pcd point cloud format and as binary bin files.
Further, the taking of the color image data set as the input of the two-dimensional target detection network and outputting a two-dimensional detection result includes:
the color image data set is input into a two-dimensional target detection network Yolov5 to be trained to obtain weights, the trained weight files are used for predicting and identifying the target area to obtain the category and the position of the target area in the two-dimensional image, and the category and the position are output as two-dimensional detection results.
Further, the extracting the cone-shaped point cloud of the target area in the three-dimensional point cloud by using the two-dimensional detection result and the three-dimensional point cloud data set as input includes:
converting a three-dimensional point cloud data set under a depth camera coordinate system into a two-dimensional image under a color camera coordinate system based on coordinate conversion of the depth camera and the color camera;
taking the position in the two-dimensional detection result as a target area in the two-dimensional image, and converting the two-dimensional image in the target area into a three-dimensional point cloud data set, namely a cone-shaped point cloud, in the target area based on the coordinate conversion of the depth camera and the color camera;
the formula for converting the coordinates of the depth camera and the color camera is as follows:
P_rgb = R · P_ir + T
where P_rgb and P_ir are the projected coordinate points under the color camera and the depth camera respectively, and R and T are the rotation matrix and the translation matrix of the camera extrinsic parameters.
Further, the preprocessing of the three-dimensional point cloud data comprises format conversion and data normalization;
the marking of the RGB color image of the mars simulation field comprises marking operation by using label img marking software, wherein the marking comprises positions and types;
and the marking of the preprocessed three-dimensional point cloud data comprises marking operation by adopting a ros system, wherein the marking comprises a three-dimensional boundary bounding box and a spatial position.
Further, the step of using the cone-shaped point cloud as an input of a three-dimensional target detection network, detecting the mars simulation site and outputting a three-dimensional detection result comprises:
extracting the characteristics of the conical point cloud by adopting an improved PointSIFT network, thereby performing three-dimensional semantic segmentation;
performing feature extraction and prediction on the segmented object point cloud by using a T-Net sub-network so as to obtain the spatial position of the terrain of a target area;
performing network classification and regression on the object point cloud obtained by segmentation by using a box regression subnetwork, thereby obtaining a three-dimensional boundary bounding box of the target area terrain;
and outputting the space position and the three-dimensional boundary bounding box as a three-dimensional detection result.
Further, the size of the RGB color image is 1920x1080 pixels with three RGB channels; the size of the depth image is 512x424 pixels, each pixel being 16 bits (2 bytes) and storing the depth value, i.e. the actual distance, in millimeters.
Compared with the prior art, the invention has the following beneficial effects:
the Mars surface three-dimensional terrain detection method based on image and point cloud fusion comprises sub-modules of camera calibration, data acquisition, data preprocessing, data annotation, two-dimensional image terrain detection and three-dimensional terrain detection, result visualization and the like, image and point cloud information are fused, a neural network is used for being faster, more accurate and more stable than a traditional method, and the automation degree and the accuracy of terrain detection are improved; according to the method, the image and the point cloud information are fused by a deep learning method and applied to a Mars three-dimensional terrain detection scene, an original semantic segmentation network in a Frustum PointNets network is replaced, point cloud semantic segmentation is more accurately realized by using an improved PointSIFT-based network, meanwhile, the network is simplified, the detection speed is improved, the T-Net network and a box estimation sub-network are reserved to realize target space positioning and bounding box parameter regression, the space information of the target terrain is finally obtained and the result is visualized, and the problems of Mars surface terrain detection and positioning in the detection process of a Mars vehicle are effectively solved.
Drawings
Fig. 1 is a flowchart of a method for detecting a three-dimensional terrain on a surface of a mars according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a data acquisition device according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a three-dimensional target detection network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an improved PointSIFT network of the present invention;
FIG. 5 is a diagram of a three-dimensional object detection network architecture in accordance with the present invention;
labeled in the figure as:
1. steering wheel; 2. motor control system; 3. camera set; 4. laser radar.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The embodiment provides a mars surface three-dimensional terrain detection method based on image and point cloud fusion, as shown in fig. 1, the method comprises the following steps:
step 101, acquiring image data of a region to be detected on the surface of a Mars;
step 102, taking image data of a region to be detected as input of a trained three-dimensional target detection network;
Step 103, determining a terrain detection result of the region to be detected on the Mars surface according to the output of the three-dimensional target detection network;
Step 104, visually outputting the terrain detection result: after the spatial position and the three-dimensional bounding box of the Mars terrain are obtained, the prediction result is visualized in the two-dimensional image and in the three-dimensional point cloud respectively.
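As an illustration of this visualization step, the sketch below draws a predicted 2D box on the color image with OpenCV and renders the point cloud together with a predicted 3D bounding box as an Open3D line set. It is a minimal sketch rather than the patent's implementation; file names, the example box values and the class label are assumptions.

```python
# Hedged visualization sketch: OpenCV for the 2D result, Open3D for the 3D result.
# All file names and numeric box values below are illustrative placeholders.
import cv2
import numpy as np
import open3d as o3d

# --- 2D: draw the predicted class and box on the RGB image ---
img = cv2.imread("frame_000123_rgb.png")                  # hypothetical frame
x1, y1, x2, y2, label = 850, 400, 1100, 620, "rock"       # example prediction
cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(img, label, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
cv2.imwrite("frame_000123_det2d.png", img)

# --- 3D: show the point cloud with the predicted 3D bounding box ---
points = np.fromfile("frame_000123.bin", dtype=np.float32).reshape(-1, 3)
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points.astype(np.float64))

center = np.array([1.2, 0.1, 3.5])       # predicted box centre (cx, cy, cz), example
extent = np.array([0.4, 0.6, 0.5])       # illustrative box extents along x, y, z
# Axis-aligned example box; the real box may additionally carry a heading angle.
corners = np.array([center + extent * np.array(s) / 2.0
                    for s in [(-1, -1, -1), (1, -1, -1), (1, 1, -1), (-1, 1, -1),
                              (-1, -1, 1), (1, -1, 1), (1, 1, 1), (-1, 1, 1)]])
edges = [[0, 1], [1, 2], [2, 3], [3, 0], [4, 5], [5, 6], [6, 7], [7, 4],
         [0, 4], [1, 5], [2, 6], [3, 7]]
box = o3d.geometry.LineSet()
box.points = o3d.utility.Vector3dVector(corners)
box.lines = o3d.utility.Vector2iVector(edges)
box.paint_uniform_color([1.0, 0.0, 0.0])
o3d.visualization.draw_geometries([pcd, box])
```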
As shown in fig. 2, the image data of the region to be detected on the Mars surface are obtained by a trolley robot carrying a calibrated camera set and moving over the Mars surface; calibrating the camera set comprises calibrating the depth camera with a calibration plate and obtaining the camera intrinsic and extrinsic parameters. The camera set includes a color camera and a depth camera, and the image data include RGB color images and depth images. The RGB color image is 1920x1080 pixels with three RGB channels; the depth image is 512x424 pixels, each pixel being 16 bits (2 bytes) and storing the depth value, i.e. the actual distance, in millimeters.
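For the calibration step described above, the following sketch shows one standard way to estimate the camera intrinsic parameters from images of a chessboard-style calibration plate with OpenCV; the board dimensions, square size and image folder are assumptions, and the extrinsics between the depth and color cameras would be obtained by an analogous stereo calibration.

```python
# Hedged calibration sketch: chessboard corners -> camera intrinsics with OpenCV.
# Board geometry (9x6 inner corners, 25 mm squares) and the image glob are assumed.
import glob
import cv2
import numpy as np

pattern = (9, 6)                      # inner corners per row / column (assumed)
square = 0.025                        # square size in metres (assumed)

# 3D coordinates of the board corners in the board's own plane (Z = 0).
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):                 # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern, None)
    if not found:
        continue
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    obj_points.append(objp)
    img_points.append(corners)

# K holds fx, fy, cx, cy used later for the depth-to-point-cloud conversion.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
print("intrinsic matrix K:\n", K)
```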
The data acquisition device calibrates the camera set 3 with the calibration plate, obtains the camera intrinsic and extrinsic parameters, and adjusts the acquisition angle and illumination environment of the camera set. The camera set 3 is carried by the trolley robot to collect raw data in the Mars simulation field; the cart mainly comprises a steering wheel 1, a chassis, a motor control system 2, a laser radar 4 and the like. During data collection, hardware such as the motor control system 2 and the laser radar 4 is used, the ROS system is used to control and move the trolley robot, the camera set records bag files which are transmitted and stored, and the bag files are parsed by timestamp to complete the preliminary data collection.
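A minimal sketch of the bag-parsing step is given below, assuming the ROS 1 rosbag and cv_bridge Python APIs; the topic names, bag file name and output layout are assumptions. RGB and depth frames recorded close in time can afterwards be paired by nearest timestamp before conversion to point clouds.

```python
# Hedged sketch: extract per-frame RGB and depth images from a recorded bag file.
# Topic and file names are hypothetical, not taken from the patent.
import os
import rosbag
from cv_bridge import CvBridge
import cv2

RGB_TOPIC = "/kinect2/hd/image_color"      # hypothetical color topic
DEPTH_TOPIC = "/kinect2/sd/image_depth"    # hypothetical depth topic

os.makedirs("rgb", exist_ok=True)
os.makedirs("depth", exist_ok=True)

bridge = CvBridge()
bag = rosbag.Bag("mars_field.bag")         # hypothetical recorded bag file

for topic, msg, stamp in bag.read_messages(topics=[RGB_TOPIC, DEPTH_TOPIC]):
    t = str(stamp.to_nsec())               # timestamp used as the frame key
    if topic == RGB_TOPIC:
        img = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        cv2.imwrite(f"rgb/{t}.png", img)
    else:
        # 16-bit depth in millimetres, saved losslessly as a 16-bit PNG.
        depth = bridge.imgmsg_to_cv2(msg, desired_encoding="16UC1")
        cv2.imwrite(f"depth/{t}.png", depth)

bag.close()
```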
As can be seen from the above steps, the three-dimensional target detection network needs to be trained first, as shown in fig. 3, the method includes the following steps:
step 201, acquiring image data of a mars simulation field, namely an RGB (red, green and blue) color image and a depth image; in the embodiment, a Mars simulation field of a certain space yard is used for collecting a simulated Mars terrain, and the acquisition of the image data of the Mars simulation field is the same as the acquisition principle of the image data of the region to be detected on the surface of the Mars.
Step 202, converting the depth image of the mars simulation field into a three-dimensional point cloud, and performing preprocessing operation on the three-dimensional point cloud data, wherein the preprocessing comprises format conversion and data normalization;
the conversion of the depth image into a three-dimensional point cloud comprises: the camera internal reference is used as the constraint adjustment of coordinate transformation to convert the depth image into three-dimensional point cloud, and the formula is as follows:
Figure BDA0003080447250000071
where x, y, z are three-dimensional point cloud coordinates, x 'and y' are depth image coordinates, f x And f y The three-dimensional point cloud coordinate is stored in a pcd point cloud format and a two-level bin file mode.
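The conversion above can be written in a few lines of NumPy. The sketch below assumes a 16-bit depth image in millimetres and illustrative intrinsic values, and saves the result both as a .pcd file (via Open3D) and as a raw binary .bin file, mirroring the two storage formats mentioned above.

```python
# Hedged sketch of depth-image back-projection using the formula above.
# Intrinsics (fx, fy, cx, cy) and the input file name are placeholder values.
import numpy as np
import cv2
import open3d as o3d

fx, fy = 365.0, 365.0          # focal lengths of the depth camera (assumed)
cx, cy = 256.0, 212.0          # principal point of the depth camera (assumed)

depth = cv2.imread("depth/000123.png", cv2.IMREAD_UNCHANGED).astype(np.float32)
depth_m = depth / 1000.0       # millimetres -> metres

v, u = np.indices(depth.shape)           # u = x' (column index), v = y' (row index)
z = depth_m
x = (u - cx) * z / fx
y = (v - cy) * z / fy

points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
points = points[points[:, 2] > 0]        # drop pixels with no depth reading

# Save as pcd and as a flat binary file.
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points.astype(np.float64))
o3d.io.write_point_cloud("cloud_000123.pcd", pcd)
points.astype(np.float32).tofile("cloud_000123.bin")
```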
Step 203, labeling the RGB color image of the mars simulation field by using labelImg labeling software to obtain a color image data set, wherein the labeling comprises positions and categories, and labeling the preprocessed three-dimensional point cloud data by using a ros system to obtain a three-dimensional point cloud data set, wherein the labeling comprises a three-dimensional boundary bounding box and a space position.
Step 204, taking the color image data set as the input of the two-dimensional target detection network, detecting the Mars simulation field and outputting a two-dimensional detection result; specifically, the color image data set is input into the two-dimensional target detection network Yolov5 for training to obtain weights, the trained weight file is used to predict and identify the target area, the category and position of the target area in the two-dimensional image are obtained, and the category and position are output as the two-dimensional detection result.
Yolov5 still uses the Multi-Scale Training idea from Yolov2, fine-tuning the input size of the network every few iterations. It uses Darknet-53 with the fully connected layer removed, keeping the front 52 layers. The reason is that this network structure achieves good classification results on ImageNet, which shows that it learns good features. Compared with ResNet-152 and ResNet-101, Darknet-53 shows little difference in classification accuracy while being faster to compute and having a more concise structure. Darknet-53 adopts the skip-connection scheme of residual networks, and it outperforms the deeper ResNet-152 and ResNet-101 because its basic units differ: with fewer layers it needs fewer parameters and less computation. For the class prediction, Yolov5 replaces the Softmax function with a Logistic function, so that the classes are predicted independently of each other and the categories are decoupled. For the position prediction, Yolov5 does not predict the absolute coordinates of the bounding box center directly; instead it predicts an offset relative to the upper-left corner of the grid cell containing the target.
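As an illustration of this 2D detection step, the sketch below runs YOLOv5 inference through the public torch.hub interface; the pretrained model is used only as a stand-in, since the patent's weights trained on the labelled Mars-simulation RGB data set are not available, and the frame path is hypothetical.

```python
# Hedged sketch: 2D terrain detection with YOLOv5 via torch.hub.
# A weight file trained on the Mars data would be loaded instead with e.g.
# torch.hub.load("ultralytics/yolov5", "custom", path="mars_terrain.pt") (hypothetical).
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s")   # stand-in pretrained weights
model.conf = 0.4                                          # confidence threshold

results = model("rgb/000123.png")        # hypothetical frame from the data set
detections = results.xyxy[0]             # tensor rows: x1, y1, x2, y2, conf, class

for x1, y1, x2, y2, conf, cls in detections.tolist():
    print(f"class={model.names[int(cls)]} conf={conf:.2f} "
          f"box=({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f})")
```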
Step 205, taking the two-dimensional detection result and the three-dimensional point cloud data set as input and extracting the cone-shaped point cloud of the target area from the three-dimensional point cloud, which specifically comprises the following steps:
converting a three-dimensional point cloud data set under a depth camera coordinate system into a two-dimensional image under a color camera coordinate system based on coordinate conversion of the depth camera and the color camera;
taking the position in the two-dimensional detection result as a target area in the two-dimensional image, and converting the two-dimensional image in the target area into a three-dimensional point cloud data set, namely a cone-shaped point cloud, in the target area based on the coordinate conversion of the depth camera and the color camera;
the formula for converting the coordinates of the depth camera and the color camera is as follows:
P_rgb = R · P_ir + T
where P_rgb and P_ir are the projected coordinate points under the color camera and the depth camera respectively, and R and T are the rotation matrix and the translation matrix of the camera extrinsic parameters.
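A sketch of this coordinate conversion is given below: points in the depth-camera frame are mapped into the color-camera frame with the extrinsic R and T, then projected with the color-camera intrinsics to find their pixel locations in the RGB image. All numeric parameter values are placeholders, not values from the patent.

```python
# Hedged sketch of P_rgb = R * P_ir + T followed by pinhole projection into
# the color image. R, T and the color intrinsics below are assumed values.
import numpy as np

R = np.eye(3)                                   # rotation depth -> color (assumed)
T = np.array([0.052, 0.0, 0.0])                 # translation in metres (assumed)
fx_c, fy_c, cx_c, cy_c = 1060.0, 1060.0, 960.0, 540.0   # color intrinsics (assumed)

points_ir = np.fromfile("cloud_000123.bin", dtype=np.float32).reshape(-1, 3)

points_rgb = points_ir @ R.T + T                # P_rgb = R * P_ir + T, row-wise
z = points_rgb[:, 2]
u = fx_c * points_rgb[:, 0] / z + cx_c          # pixel column in the RGB image
v = fy_c * points_rgb[:, 1] / z + cy_c          # pixel row in the RGB image

# Keep only points that actually land inside the 1920x1080 color image.
valid = (z > 0) & (u >= 0) & (u < 1920) & (v >= 0) & (v < 1080)
print("projected points inside the color image:", int(valid.sum()))
```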
The center pixel position and the size of the candidate bounding box of the target-area terrain on the two-dimensional image are given by:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
where (b_x, b_y) are the coordinates of the center point of the bounding box to be predicted, b_w is the width of the bounding box and b_h is its height; (t_x, t_y) are the predicted coordinate offsets of the bounding box center, which are passed through the sigmoid function σ to give offsets between 0 and 1 and are added to c_x and c_y to obtain the X-axis and Y-axis coordinates of the bounding box center; (c_x, c_y) are the coordinates of the upper-left corner of the grid cell in which the center falls; t_w and t_h are scale factors; p_w and p_h are the width and height of the preset candidate box mapped onto the feature map, i.e. the manually set anchor box width and height; t_w acting on p_w gives the bounding box width b_w, and t_h acting on p_h gives the bounding box height b_h. The coordinates of the top-left and bottom-right vertices of the candidate box are then obtained by the following formulas.
x_1 = b_x - b_w / 2
y_1 = b_y - b_h / 2
x_2 = b_x + b_w / 2
y_2 = b_y + b_h / 2
where (x_1, y_1) and (x_2, y_2) are the top-left and bottom-right vertices of the candidate box, respectively.
Knowing the projection matrix of the camera, the two-dimensional bounding box can thus be lifted to a frustum (cone) that defines the three-dimensional search space for the object.
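Putting the last two steps together, the sketch below first decodes a predicted box with the formulas above and then keeps only the points whose color-image projections (the u, v values from the previous sketch) fall inside the pixel box, which yields the cone-shaped point cloud passed to the 3D network. All offsets, anchor sizes, pixel coordinates and file names are illustrative assumptions.

```python
# Hedged sketch: YOLO-style box decoding followed by frustum point extraction.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# --- decode one predicted box with the formulas above (example numbers) ---
tx, ty, tw, th = 0.3, -0.1, 0.2, 0.4      # raw network outputs (illustrative)
cx_cell, cy_cell = 12.0, 7.0              # grid-cell top-left corner (illustrative)
pw, ph = 3.5, 2.8                         # preset anchor width / height (illustrative)

bx, by = sigmoid(tx) + cx_cell, sigmoid(ty) + cy_cell
bw, bh = pw * np.exp(tw), ph * np.exp(th)
print("decoded box corners:", bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2)

# --- frustum extraction: keep points whose image projections fall in the box ---
# (x1, y1, x2, y2) would be the detector's box in pixel coordinates; u, v and
# points_rgb come from the projection sketch above (assumed intermediate files).
x1, y1, x2, y2 = 850.0, 400.0, 1100.0, 620.0
points_rgb = np.load("points_rgb.npy")    # (N, 3) points in the color-camera frame
u, v = np.load("u.npy"), np.load("v.npy") # (N,) projected pixel coordinates

in_box = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2) & (points_rgb[:, 2] > 0)
frustum_points = points_rgb[in_box]       # cone-shaped point cloud for the 3D network
print("frustum points:", frustum_points.shape[0])
```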
Step 206, using the cone-shaped point cloud as an input of a three-dimensional target detection network, detecting the mars simulation site and outputting a three-dimensional detection result, specifically comprising:
extracting the characteristics of the conical point cloud by adopting an improved PointSIFT network, thereby performing three-dimensional semantic segmentation;
performing feature extraction and prediction on the segmented object point cloud by using a T-Net sub-network so as to obtain the spatial position of the terrain of the target area;
performing network classification and regression on the object point cloud obtained by segmentation by using a box regression subnetwork, thereby obtaining a three-dimensional boundary bounding box of the target area terrain;
and outputting the space position and the three-dimensional boundary bounding box as a three-dimensional detection result.
As shown in FIG. 4, the result of the semantic segmentation network in the F-PointNet framework directly affects the precision of localization and bounding-box regression. Therefore, point cloud semantic segmentation is performed with a PointSIFT-based sub-network, which learns the target point cloud and its spatial features better and improves point cloud classification accuracy. The advantage of PointSIFT is that it takes both the orientation encoding (OE) and the scale awareness of the point cloud into account. Orientation encoding lets each point perceive information in different directions and fuse the features of the surrounding points. Scale awareness lets the network keep adjusting its weight parameters during learning so as to learn the scale best suited for extracting point cloud features. The basic module of PointSIFT is the orientation-encoding unit, which extracts features in 8 directions (up, down, left, right, up-left, down-left, up-right, down-right). By stacking multiple orientation-encoding units, OE units on different layers perceive information at different scales, which provides the scale-awareness capability. Meanwhile, the network uses the SA (set abstraction) and FP (feature propagation) modules of the PointNet++ network, whose main tasks are down-sampling and up-sampling respectively. Finally, the output of the decoder is connected to the fully connected layer to predict the probability of each class.
Meanwhile, because the input is the frustum obtained in the previous step, the number of points fed into the semantic segmentation network is greatly reduced; to increase the overall inference speed the network is further simplified, and a model compression method markedly improves both the speed and the precision of the semantic segmentation network. Specifically, the raw point cloud is used as the network input: an n x D matrix is given as input, describing a point set of size n in which each point has D-dimensional features; only the X, Y, Z coordinates of the 3D points are considered, so D = 3. In fig. 3, the numbers below the layers give the shape of the corresponding output point set; for example, a first 1024x64 layer means 1024 points, each with 64 feature channels. The output of the PointSIFT module is then taken as the input of the down-sampling stage of the SA module to extract higher-dimensional features. In the up-sampling stage, the FP module serves as the decoder, and its output is fed to PointSIFT so that the network learns feature information in different directions and at different scales. Finally, the output of the decoder is connected to the fully connected layer to predict the probabilities of the classes.
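The exact layer configuration of the improved network is defined by the patent's figures; purely as a structural sketch of the module ordering described above (PointSIFT, SA, PointSIFT, SA, FP, PointSIFT, FP, PointSIFT, fully connected layer), the PyTorch code below uses simplified stand-ins for the orientation-encoding, set-abstraction and feature-propagation blocks. It reproduces the data flow and tensor shapes only; the stand-in operators and all channel sizes are assumptions, not the original implementation.

```python
# Structural sketch of the improved PointSIFT-style segmentation backbone.
# OE/SA/FP blocks are simplified stand-ins (pointwise MLPs, naive down/up-sampling).
import torch
import torch.nn as nn

class PointwiseMLP(nn.Module):
    """Shared MLP applied to every point: (B, N, C_in) -> (B, N, C_out)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(c_in, c_out), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class PointSIFTBlock(nn.Module):
    """Stand-in for an orientation-encoding unit: here just a pointwise MLP."""
    def __init__(self, c):
        super().__init__()
        self.mlp = PointwiseMLP(c, c)
    def forward(self, xyz, feat):
        return self.mlp(feat)

class SADown(nn.Module):
    """Stand-in set abstraction: keep the first N/4 points and lift the channels."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.mlp = PointwiseMLP(c_in, c_out)
    def forward(self, xyz, feat):
        n = xyz.shape[1] // 4
        return xyz[:, :n], self.mlp(feat[:, :n])

class FPUp(nn.Module):
    """Stand-in feature propagation: nearest-neighbour upsampling plus an MLP."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.mlp = PointwiseMLP(c_in, c_out)
    def forward(self, xyz_dense, xyz_sparse, feat_sparse):
        d = torch.cdist(xyz_dense, xyz_sparse)          # (B, Nd, Ns) distances
        idx = d.argmin(dim=-1)                          # nearest sparse point per dense point
        up = torch.gather(feat_sparse, 1,
                          idx.unsqueeze(-1).expand(-1, -1, feat_sparse.shape[-1]))
        return self.mlp(up)

class FrustumSegNet(nn.Module):
    """PointSIFT -> SA -> PointSIFT -> SA -> FP -> PointSIFT -> FP -> PointSIFT -> FC."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.embed = PointwiseMLP(3, 64)
        self.sift1 = PointSIFTBlock(64)
        self.sa1   = SADown(64, 128)
        self.sift2 = PointSIFTBlock(128)
        self.sa2   = SADown(128, 256)
        self.fp1   = FPUp(256, 128)
        self.sift3 = PointSIFTBlock(128)
        self.fp2   = FPUp(128, 64)
        self.sift4 = PointSIFTBlock(64)
        self.head  = nn.Linear(64, num_classes)        # per-point class logits
    def forward(self, xyz):                            # xyz: (B, N, 3) frustum points
        f0 = self.sift1(xyz, self.embed(xyz))
        xyz1, f1 = self.sa1(xyz, f0)
        f1 = self.sift2(xyz1, f1)
        xyz2, f2 = self.sa2(xyz1, f1)
        u1 = self.sift3(xyz1, self.fp1(xyz1, xyz2, f2))
        u0 = self.sift4(xyz,  self.fp2(xyz, xyz1, u1))
        return self.head(u0)                           # (B, N, num_classes)

logits = FrustumSegNet()(torch.rand(2, 1024, 3))
print(logits.shape)  # torch.Size([2, 1024, 2])
```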
After three-dimensional semantic segmentation, as shown in fig. 5, a probability score is predicted for each point, and the point cloud of the object of interest can be extracted and classified. After these segmented target points are obtained, their coordinates are further normalized to improve the translational invariance of the algorithm. F-PointNet converts the point cloud to local coordinates by subtracting the centroid's XYZ values and estimates the center of the real object with a lightweight T-Net. Finally, the F-PointNet network estimates an oriented three-dimensional bounding box of the object through the box regression PointNet together with the preceding coordinate-normalization networks. For a given object, the box estimation network takes the object point cloud in the three-dimensional object coordinate frame and outputs not the object class score but the parameters of the three-dimensional box. A three-dimensional bounding box is parameterized by its center (cx, cy, cz), size (h, w, l), etc.
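As a small illustration of the coordinate normalization and center regression described above, the sketch below subtracts the centroid from the segmented object points and lets a tiny PointNet-style network predict a residual offset to the box center. It is a simplified stand-in for the patent's T-Net, with illustrative layer widths.

```python
# Hedged sketch of T-Net-style centre regression on the segmented object points.
# Layer sizes are illustrative; the real T-Net configuration is not reproduced here.
import torch
import torch.nn as nn

class TNetCenter(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                 nn.Linear(128, 256), nn.ReLU())
        self.fc = nn.Linear(256, 3)                    # residual centre offset

    def forward(self, pts):                            # pts: (B, N, 3) object points
        centroid = pts.mean(dim=1, keepdim=True)       # (B, 1, 3)
        local = pts - centroid                         # translate to local coordinates
        feat = self.mlp(local).max(dim=1).values       # global max-pooled feature
        return centroid.squeeze(1) + self.fc(feat)     # predicted box centre (B, 3)

center = TNetCenter()(torch.rand(4, 512, 3))
print(center.shape)  # torch.Size([4, 3])
```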
And step 207, performing iterative training on the three-dimensional target detection network based on the three-dimensional detection result.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (8)

1. A Mars surface three-dimensional terrain detection method based on image and point cloud fusion is characterized by comprising the following steps:
acquiring image data of a region to be detected on the surface of the Mars;
taking the image data of the area to be detected as the input of the trained three-dimensional target detection network;
determining a terrain detection result of a region to be detected on the surface of the Mars according to the output of the three-dimensional target detection network;
visually outputting a terrain detection result;
the image data of the region to be detected on the surface of the Mars are obtained by driving a calibrated camera set to move on the surface of the Mars through a trolley robot, and the calibrating of the camera set comprises calibrating a depth camera by using a calibration plate and obtaining camera internal parameters and camera external parameters;
the camera set comprises a color camera and a depth camera, and the image data comprises an RGB color image and a depth image;
wherein the training of the three-dimensional target detection network comprises:
acquiring image data of a Mars simulation field, namely an RGB color image and a depth image;
converting the depth image of the Mars simulation field into a three-dimensional point cloud, and preprocessing the three-dimensional point cloud data;
labeling an RGB color image of a mars simulation field to obtain a color image data set, and labeling the preprocessed three-dimensional point cloud data to obtain a three-dimensional point cloud data set;
the color image data set is used as the input of a two-dimensional target detection network, the Mars simulation field is detected, and a two-dimensional detection result is output;
taking the two-dimensional detection result and the three-dimensional point cloud data set as input, and extracting conical point cloud of a target area in the three-dimensional point cloud;
using the conical point cloud as the input of a three-dimensional target detection network, detecting the Mars simulation field and outputting a three-dimensional detection result;
performing iterative training on the three-dimensional target detection network based on the three-dimensional detection result;
the image data of the surface of the Mars simulation field is obtained by driving a calibrated camera group to move in the Mars simulation field through a trolley robot;
the cone-shaped point cloud is used as the input of the three-dimensional target detection network, the detection of the Mars simulation field and the output of the three-dimensional detection result comprise:
extracting the characteristics of the conical point cloud by adopting an improved PointSIFT network, thereby performing three-dimensional semantic segmentation;
performing feature extraction and prediction on the segmented object point cloud by using a T-Net sub-network so as to obtain the spatial position of the terrain of the target area;
performing network classification and regression on the object point cloud obtained by segmentation by using a box regression subnetwork, thereby obtaining a three-dimensional boundary bounding box of the target area terrain;
outputting the space position and the three-dimensional boundary bounding box as a three-dimensional detection result;
the improved PointSIFT network comprises a first PointSIFT module, a first SA module, a second PointSIFT module, a second SA module, a first FP module, a third PointSIFT module, a second FP module, a fourth PointSIFT module and a full connection layer which are connected in sequence;
the first PointSIFT module, the second PointSIFT module, the third PointSIFT module and the fourth PointSIFT module respectively comprise a direction coding unit and a scale sensing unit, and the direction coding unit and the scale sensing unit are used for extracting features in different directions and different scales;
the first SA module and the second SA module are used for down-sampling, and the first FP module and the second FP module are used for up-sampling; and the full connection layer is used for receiving the processed characteristic information.
2. The Mars surface three-dimensional terrain detection method based on image and point cloud fusion as claimed in claim 1, wherein the acquisition of the image data comprises:
adjusting the acquisition angle and the illumination environment of the camera set; and controlling the trolley robot to move by using a ros system according to preset parameters and routes, recording the terrain environment by the camera group to obtain a bag file, transmitting and storing the bag file, and analyzing the bag file according to the timestamp to obtain an RGB (red, green and blue) color image and a depth image of each frame.
3. The Mars surface three-dimensional terrain detection method based on image and point cloud fusion of claim 1, characterized in that the depth image conversion into a three-dimensional point cloud comprises:
the camera intrinsic parameters are used as constraints of the coordinate transformation to convert the depth image into a three-dimensional point cloud, and the formula is as follows:
z = D
x = (x' - c_x) · D / f_x
y = (y' - c_y) · D / f_y
wherein x, y, z are the three-dimensional point cloud coordinates, x' and y' are the depth image pixel coordinates, f_x and f_y are the focal lengths and (c_x, c_y) is the principal point in the camera intrinsic parameters, and D is the depth value of the depth image.
4. The Mars surface three-dimensional terrain detection method based on image and point cloud fusion of claim 3, characterized in that the three-dimensional point cloud coordinates are saved in the pcd point cloud format and as binary bin files.
5. The Mars surface three-dimensional terrain detection method based on image and point cloud fusion as claimed in claim 1, wherein the taking of the color image data set as the input of the two-dimensional target detection network and outputting a two-dimensional detection result comprises:
inputting the color image data set into a two-dimensional target detection network Yolov5 to be trained to obtain weights, predicting and identifying a target area by using the trained weight file to obtain the category and the position of the target area in a two-dimensional image, and outputting the category and the position as a two-dimensional detection result.
6. The Mars surface three-dimensional terrain detection method based on image and point cloud fusion as claimed in claim 1, wherein the extracting of the cone-shaped point cloud of the target area in the three-dimensional point cloud by taking the two-dimensional detection result and the three-dimensional point cloud data set as input comprises:
converting a three-dimensional point cloud data set under a depth camera coordinate system into a two-dimensional image under a color camera coordinate system based on coordinate conversion of the depth camera and the color camera;
taking the position in the two-dimensional detection result as a target area in the two-dimensional image, and converting the two-dimensional image in the target area into a three-dimensional point cloud data set, namely a cone-shaped point cloud, in the target area based on the coordinate conversion of the depth camera and the color camera;
the formula for converting the coordinates of the depth camera and the color camera is as follows:
P_rgb = R · P_ir + T
wherein P_rgb and P_ir are the projected coordinate points under the color camera and the depth camera respectively, and R and T are the rotation matrix and the translation matrix of the camera extrinsic parameters.
7. The Mars surface three-dimensional terrain detection method based on image and point cloud fusion as claimed in claim 1,
the preprocessing of the three-dimensional point cloud data comprises format conversion and data normalization;
the marking of the RGB color image of the mars simulation field comprises marking operation by using label img marking software, wherein the marking comprises positions and types;
and the marking of the preprocessed three-dimensional point cloud data comprises marking operation by adopting a ros system, wherein the marking comprises a three-dimensional boundary bounding box and a spatial position.
8. The Mars surface three-dimensional terrain detection method based on image and point cloud fusion of claim 1, wherein the size of the RGB color image is 1920x1080 pixels with three RGB channels; the size of the depth image is 512x424 pixels, each pixel being 16 bits (2 bytes) and storing the depth value, i.e. the actual distance, in millimeters.
CN202110565199.2A 2021-05-24 2021-05-24 Mars surface three-dimensional terrain detection method based on image and point cloud fusion Active CN113298781B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110565199.2A CN113298781B (en) 2021-05-24 2021-05-24 Mars surface three-dimensional terrain detection method based on image and point cloud fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110565199.2A CN113298781B (en) 2021-05-24 2021-05-24 Mars surface three-dimensional terrain detection method based on image and point cloud fusion

Publications (2)

Publication Number Publication Date
CN113298781A CN113298781A (en) 2021-08-24
CN113298781B true CN113298781B (en) 2022-09-16

Family

ID=77324256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110565199.2A Active CN113298781B (en) 2021-05-24 2021-05-24 Mars surface three-dimensional terrain detection method based on image and point cloud fusion

Country Status (1)

Country Link
CN (1) CN113298781B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114078151B (en) * 2022-01-19 2022-04-22 季华实验室 Point cloud fusion method and device, electronic equipment and storage medium
CN117944059B (en) * 2024-03-27 2024-05-31 南京师范大学 Track planning method based on vision and radar feature fusion

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109523552A (en) * 2018-10-24 2019-03-26 青岛智能产业技术研究院 Three-dimension object detection method based on cone point cloud
CN110264572A (en) * 2019-06-21 2019-09-20 哈尔滨工业大学 A kind of terrain modeling method and system merging geometrical property and mechanical characteristic
CN112287939A (en) * 2020-10-29 2021-01-29 平安科技(深圳)有限公司 Three-dimensional point cloud semantic segmentation method, device, equipment and medium

Also Published As

Publication number Publication date
CN113298781A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN110175576B (en) Driving vehicle visual detection method combining laser point cloud data
CN110956651B (en) Terrain semantic perception method based on fusion of vision and vibrotactile sense
US10915793B2 (en) Method and system for converting point cloud data for use with 2D convolutional neural networks
US20180307921A1 (en) Image-Based Pedestrian Detection
CN110706248A (en) Visual perception mapping algorithm based on SLAM and mobile robot
CN117441197A (en) Laser radar point cloud dynamic segmentation and fusion method based on driving safety risk field
CN113298781B (en) Mars surface three-dimensional terrain detection method based on image and point cloud fusion
CN108711172B (en) Unmanned aerial vehicle identification and positioning method based on fine-grained classification
CN115187964A (en) Automatic driving decision-making method based on multi-sensor data fusion and SoC chip
CN114821507A (en) Multi-sensor fusion vehicle-road cooperative sensing method for automatic driving
CN114325634A (en) Method for extracting passable area in high-robustness field environment based on laser radar
CN117058646B (en) Complex road target detection method based on multi-mode fusion aerial view
CN113592905A (en) Monocular camera-based vehicle running track prediction method
CN106845458A (en) A kind of rapid transit label detection method of the learning machine that transfinited based on core
CN116597122A (en) Data labeling method, device, electronic equipment and storage medium
Liao et al. Lr-cnn: Local-aware region cnn for vehicle detection in aerial imagery
CN113538585B (en) High-precision multi-target intelligent identification, positioning and tracking method and system based on unmanned aerial vehicle
CN113219472B (en) Ranging system and method
Zhao et al. Improving autonomous vehicle visual perception by fusing human gaze and machine vision
CN117423077A (en) BEV perception model, construction method, device, equipment, vehicle and storage medium
Bruno et al. Computer vision system with 2d and 3d data fusion for detection of possible auxiliaries routes in stretches of interdicted roads
Hadzic et al. Rasternet: Modeling free-flow speed using lidar and overhead imagery
CN112395956A (en) Method and system for detecting passable area facing complex environment
CN113836975A (en) Binocular vision unmanned aerial vehicle obstacle avoidance method based on YOLOV3
Liu et al. A lightweight lidar-camera sensing method of obstacles detection and classification for autonomous rail rapid transit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant