CN113139602A - 3D target detection method and system based on monocular camera and laser radar fusion - Google Patents

3D target detection method and system based on monocular camera and laser radar fusion

Info

Publication number
CN113139602A
CN113139602A
Authority
CN
China
Prior art keywords
point cloud
cloud data
monocular camera
point
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110447403.0A
Other languages
Chinese (zh)
Inventor
张宇轩
郝洁
陈兵
邓海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202110447403.0A
Publication of CN113139602A
Legal status: Pending

Classifications

    • G: PHYSICS
        • G01: MEASURING; TESTING
            • G01S: RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
                • G01S 17/00: Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
                    • G01S 17/66: Tracking systems using electromagnetic waves other than radio waves
                    • G01S 17/86: Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
                    • G01S 17/88: Lidar systems specially adapted for specific applications
                        • G01S 17/93: Lidar systems specially adapted for anti-collision purposes
                            • G01S 17/931: Lidar systems specially adapted for anti-collision purposes of land vehicles
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06F: ELECTRIC DIGITAL DATA PROCESSING
                • G06F 18/00: Pattern recognition
                    • G06F 18/20: Analysing
                        • G06F 18/25: Fusion techniques
                        • G06F 18/24: Classification techniques
                            • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                                • G06F 18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
            • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 7/00: Image analysis
                    • G06T 7/10: Segmentation; Edge detection
                        • G06T 7/136: Segmentation; Edge detection involving thresholding
                        • G06T 7/181: Segmentation; Edge detection involving edge growing; involving edge linking
                        • G06T 7/194: Segmentation; Edge detection involving foreground-background segmentation
                    • G06T 7/60: Analysis of geometric attributes
                        • G06T 7/66: Analysis of geometric attributes of image moments or centre of gravity
                    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
                • G06T 2207/00: Indexing scheme for image analysis or image enhancement
                    • G06T 2207/10: Image acquisition modality
                        • G06T 2207/10028: Range image; Depth image; 3D point clouds
                        • G06T 2207/10032: Satellite or aerial image; Remote sensing
                            • G06T 2207/10044: Radar image
                    • G06T 2207/20: Special algorithmic details
                        • G06T 2207/20081: Training; Learning
                        • G06T 2207/20084: Artificial neural networks [ANN]
            • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
                    • G06V 2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Geometry (AREA)
  • Traffic Control Systems (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention relates to a 3D target detection method and system based on the fusion of a monocular camera and a laser radar. The method comprises the following steps: acquiring an image captured by a monocular camera; calculating an instance segmentation score for each pixel point in the image based on an instance segmentation network; acquiring 3D point cloud data from a laser radar; fusing the instance segmentation scores with the 3D point cloud data to obtain fused 3D point cloud data; and performing 3D target detection on the fused 3D point cloud data by adopting a point cloud depth model algorithm to obtain a 3D bounding box of each detected object. Through its data fusion process, the invention effectively resolves the mismatch between the viewpoint of the monocular camera and that of the laser radar during fusion, and achieves higher fusion efficiency than the prior art.

Description

3D target detection method and system based on monocular camera and laser radar fusion
Technical Field
The invention relates to the technical field of laser radar and camera fusion, and in particular to a 3D target detection method and system based on the fusion of a monocular camera and a laser radar.
Background
With the rapid development of artificial intelligence and big data, automatic driving technology has advanced greatly, which places higher demands on the environment perception capability of autonomous vehicles. Multi-sensor fusion can overcome the inherent shortcomings of any single sensor and improve the stability and safety of an automatic driving system.
An image sensor has high resolution but poor depth estimation accuracy, stability and robustness; a laser radar has low resolution but very high point cloud ranging accuracy and strong resistance to interference from the outdoor environment. Combining the sparse depth data of the laser radar with the dense image data can therefore form an effective complementarity of advantages, and such fusion is currently the mainstream and focus of sensor fusion research in the automatic driving field.
However, existing fusion methods for image sensors and laser radars cannot effectively resolve the differences in viewpoint and data characteristics between the two sensors. Their fusion efficiency is low, their computational cost is much higher than that of detection methods using a single laser radar sensor, and the resulting improvement in detection performance is not yet satisfactory.
Disclosure of Invention
The invention aims to provide a 3D target detection method and system based on the fusion of a monocular camera and a laser radar, which solve the viewpoint and data-characteristic differences involved in fusing image and laser radar data, improve detection efficiency and accuracy, and enable accurate and rapid 3D target detection.
In order to achieve the purpose, the invention provides the following scheme:
A 3D target detection method based on monocular camera and laser radar fusion comprises the following steps:
acquiring an image acquired by a monocular camera;
calculating an instance segmentation score of each pixel point in the image based on an instance segmentation network;
acquiring 3D point cloud data of a laser radar;
fusing the instance segmentation scores and the 3D point cloud data to obtain fused 3D point cloud data;
and performing 3D target detection on the fused 3D point cloud data by adopting a point cloud depth model algorithm to obtain a 3D bounding box of the detected object.
Optionally, the output of the instance segmentation network includes a classification branch and a mask branch, where the classification branch is used to predict the semantic category of an object and obtain the corresponding probability value, and the mask branch is used to calculate the instance mask of the object.
Optionally, calculating the instance segmentation score of each pixel point in the image based on the instance segmentation network specifically includes:
obtaining a prediction probability value of the classification branch;
judging whether the prediction probability value is larger than a first threshold value or not;
if so, acquiring a position index corresponding to the prediction probability value;
dividing the mask branch into an X direction and a Y direction;
calculating a mask according to the position index, the X direction and the Y direction;
acquiring the masks which are larger than a second threshold value;
performing a local maximum search on the masks which are larger than the second threshold value to obtain the mask maxima;
and scaling the mask maxima according to the size of the original image to obtain the instance segmentation scores.
Optionally, fusing the instance segmentation scores with the 3D point cloud data to obtain fused 3D point cloud data specifically includes:
acquiring extrinsic parameters of the monocular camera and the laser radar, wherein the extrinsic parameters comprise a rotation matrix and a translation matrix;
projecting the 3D point cloud data of the laser radar into the three-dimensional coordinate system of the monocular camera according to the extrinsic parameters;
acquiring intrinsic parameters of the monocular camera, wherein the intrinsic parameters comprise an intrinsic parameter matrix and a distortion parameter matrix;
projecting the points in the three-dimensional coordinate system of the monocular camera onto an imaging plane according to the intrinsic parameters to obtain the correspondence between the 3D point cloud data of the laser radar and the image pixels;
and adding the instance segmentation score of each pixel point in the image to the 3D point cloud data according to the correspondence between the 3D point cloud data of the laser radar and the image pixels to obtain the fused 3D point cloud data.
Optionally, performing 3D target detection on the fused 3D point cloud data by adopting the point cloud depth model algorithm to obtain a 3D bounding box of the detected object specifically includes:
segmenting the fused 3D point cloud data by learning point-wise features to obtain segmented foreground points;
generating 3D proposals from the segmented foreground points;
pooling the fused 3D point cloud data and the corresponding point features according to the 3D proposals;
and generating a 3D bounding box of the detected object from the pooled 3D point cloud data and the corresponding point features.
Optionally, the first threshold is 0.1, and the second threshold is 0.5.
A 3D target detection system based on monocular camera and laser radar fusion comprises:
the image acquisition module is used for acquiring an image acquired by the monocular camera;
the instance segmentation score calculation module is used for calculating the instance segmentation score of each pixel point in the image based on an instance segmentation network;
the 3D point cloud data acquisition module is used for acquiring 3D point cloud data of the laser radar;
the data fusion module is used for fusing the instance segmentation scores with the 3D point cloud data to obtain fused 3D point cloud data;
and the target detection module is used for performing 3D target detection on the fused 3D point cloud data by adopting a point cloud depth model algorithm to obtain a 3D bounding box of the detected object.
Optionally, the instance segmentation score calculation module specifically includes:
the classification branch unit is used for acquiring the prediction probability value of the classification branch;
the first judging unit is used for judging whether the prediction probability value is larger than a first threshold value or not;
the position index unit is used for acquiring a position index corresponding to the prediction probability value when the prediction probability value is larger than a first threshold value;
a mask branching unit for dividing the mask branch into an X direction and a Y direction;
a mask calculation unit for calculating a mask according to the position index, the X direction and the Y direction;
a second judging unit, configured to obtain a mask that is greater than a second threshold value from among the masks;
the local search unit is used for performing local maximum search on the mask which is greater than the second threshold value to obtain the maximum value of the mask;
and the instance segmentation score calculation unit is used for scaling the mask maxima according to the original image size to obtain the instance segmentation scores.
Optionally, the data fusion module specifically includes:
the extrinsic parameter acquisition unit is used for acquiring extrinsic parameters of the monocular camera and the laser radar, the extrinsic parameters comprising a rotation matrix and a translation matrix;
the first projection unit is used for projecting the 3D point cloud data of the laser radar into the three-dimensional coordinate system of the monocular camera according to the extrinsic parameters;
the intrinsic parameter acquisition unit is used for acquiring intrinsic parameters of the monocular camera, the intrinsic parameters comprising an intrinsic parameter matrix and a distortion parameter matrix;
the second projection unit is used for projecting the points in the three-dimensional coordinate system of the monocular camera onto an imaging plane according to the intrinsic parameters to obtain the correspondence between the 3D point cloud data of the laser radar and the image pixels;
and the data fusion unit is used for adding the instance segmentation score of each pixel point in the image to the 3D point cloud data according to the correspondence between the 3D point cloud data of the laser radar and the image pixels to obtain the fused 3D point cloud data.
Optionally, the target detection module specifically includes:
the foreground extraction unit is used for segmenting the fused 3D point cloud data by learning point-wise features to obtain segmented foreground points;
the 3D proposal generation unit is used for generating 3D proposals from the segmented foreground points;
the point cloud data pooling unit is used for pooling the fused 3D point cloud data and the corresponding point features according to the 3D proposals;
and the 3D bounding box refinement unit is used for generating a 3D bounding box of the detected object from the pooled 3D point cloud data and the corresponding point features.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects:
The 3D target detection method based on the fusion of a monocular camera and a laser radar effectively solves, through its data fusion process, the mismatch between the viewpoint of the monocular camera and that of the laser radar during fusion, and achieves higher fusion efficiency than the prior art.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a 3D target detection method based on monocular camera and laser radar fusion according to the present invention;
FIG. 2 is a block diagram of a 3D target detection system based on the fusion of a monocular camera and a laser radar.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention aims to provide a 3D target detection method and system based on the fusion of a monocular camera and a laser radar, which solve the viewpoint and data-characteristic differences involved in fusing image and laser radar data, improve detection efficiency and accuracy, and enable accurate and rapid 3D target detection.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
FIG. 1 is a flowchart of the 3D target detection method based on monocular camera and laser radar fusion according to the present invention. As shown in FIG. 1, the 3D target detection method based on monocular camera and laser radar fusion includes:
step 101: acquiring an image acquired by a monocular camera;
step 102: calculating an instance segmentation score of each pixel point in the image based on an instance segmentation network;
Here, the instance segmentation network divides the picture into n × n grids. If the centroid of an object falls into a certain grid, that grid has two tasks: (1) predicting the semantic category of the object; (2) generating the instance mask of the object.
These two tasks are realized by the classification branch and the mask branch of the instance segmentation network. At the same time, a feature pyramid network assigns objects of different sizes to feature maps at different levels, which in turn serve as the size categories of the objects.
It should be noted that the instance segmentation network is designed as follows. The network output is split into two branches: a classification branch and a mask branch. The picture is divided evenly into S × S grids; the classification branch has size S × S × C, where C is the number of categories; the mask branch has size H × W × S², where S² is the maximum number of predicted instances and the channels correspond to positions in the original image from top to bottom and from left to right. When the center of a target object falls into grid (i, j), the corresponding position of the classification branch and the corresponding channel of the mask branch are responsible for predicting that object.
The specific steps are as follows: (1) for each grid, the classification branch predicts a C-dimensional output representing the probabilities of the semantic categories. The prediction probability values of all classification branches are extracted and filtered with a first threshold (for example, 0.1), keeping the probability values larger than the first threshold; (2) the position indices i, j corresponding to the classifications remaining after filtering are obtained; (3) the mask branch is split into an X direction and a Y direction, and the mask of a category is obtained by element-wise multiplication of the i-th X channel and the j-th Y channel, thereby establishing a one-to-one correspondence between semantic categories and masks; (4) the masks are screened with a second threshold (for example, 0.5), keeping the masks larger than the second threshold; (5) a local maximum search is performed on all the retained masks; (6) the final masks obtained after the local maximum search are scaled to the original image size to obtain the instance segmentation score of each pixel point.
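For illustration only, the following NumPy/OpenCV sketch walks through steps (1) to (6); the tensor layout (a classification output of size S × S × C and decoupled X/Y mask branches of size H × W × S each), the function name and the per-pixel maximum used to merge overlapping instances are assumptions of the sketch, not details fixed by the patent:

```python
import numpy as np
import cv2
from scipy.ndimage import maximum_filter

def instance_score_map(cls_pred, mask_x, mask_y, img_hw, cls_thresh=0.1, mask_thresh=0.5):
    """Per-pixel instance segmentation scores, following steps (1)-(6) above."""
    out_h, out_w = img_hw
    score_map = np.zeros((out_h, out_w), dtype=np.float32)
    # (1)-(2): keep grid cells whose class probability exceeds the first threshold
    i_idx, j_idx, _ = np.nonzero(cls_pred > cls_thresh)
    for i, j in zip(i_idx, j_idx):
        # (3): element-wise product of the i-th X-direction and j-th Y-direction channels
        mask = mask_x[:, :, i] * mask_y[:, :, j]
        # (4): keep only responses above the second threshold
        mask = np.where(mask > mask_thresh, mask, 0.0)
        # (5): local-maximum search (approximated here by a 3x3 maximum filter)
        mask = np.where(mask == maximum_filter(mask, size=3), mask, 0.0)
        # (6): rescale to the original image size; keep the best score seen at each pixel
        mask = cv2.resize(mask, (out_w, out_h), interpolation=cv2.INTER_LINEAR)
        score_map = np.maximum(score_map, mask)
    return score_map
```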
Step 103: acquiring 3D point cloud data of a laser radar;
step 104: fusing the instance segmentation scores and the 3D point cloud data to obtain fused 3D point cloud data;
the data fusion process is mainly characterized by 1) joint calibration: finding a space conversion relation from the laser radar to the camera, and projecting laser radar points to an image through a conversion matrix; 2) information fusion: and adding the example segmentation score obtained by each pixel point to the laser radar point, and realizing the two steps.
The method comprises the following specific steps: (1) acquiring external parameters (a rotation matrix and a translation matrix) of the monocular camera and the laser radar, and projecting points under a three-dimensional coordinate system of the laser radar point cloud to the three-dimensional coordinate system of the monocular camera; (2) obtaining monocular camera internal parameters (an internal parameter matrix and a distortion parameter matrix) through monocular camera calibration, and projecting points under a monocular camera three-dimensional coordinate system to an imaging plane so as to establish a corresponding relation between laser radar point cloud and image pixels; (3) and adding the instance segmentation score obtained by each pixel point to the laser radar point according to the corresponding relation.
It should be noted that if there are overlapping fields of view of multiple monocular cameras, there may be a case where the lidar points are projected on multiple images simultaneously, and at this time, the instance division scores in one image are randomly selected.
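A hedged sketch of the joint calibration and information fusion steps; the function name, the NumPy interface and the assumption that lens distortion has already been corrected are ours:

```python
import numpy as np

def fuse_scores_with_points(points, score_map, R, t, K):
    """Project laser radar points into the image and append each point's instance score.

    points    : (N, 3) points in the laser radar coordinate system
    score_map : (H, W) per-pixel instance segmentation scores
    R, t      : extrinsic rotation (3x3) and translation (3,) from laser radar to camera
    K         : 3x3 intrinsic parameter matrix (distortion assumed already corrected)
    """
    cam = points @ R.T + t                      # (1) laser radar frame -> camera frame
    in_front = cam[:, 2] > 0.0                  # only points in front of the camera can project
    cam, kept = cam[in_front], points[in_front]
    uv = cam @ K.T                              # (2) camera frame -> imaging plane
    uv = uv[:, :2] / uv[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    h, w = score_map.shape
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    scores = score_map[v[valid], u[valid]]      # (3) look up each pixel's instance score
    return np.hstack([kept[valid], scores[:, None]])   # (M, 4): x, y, z, instance score
```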
Step 105: and performing 3D target detection on the fused 3D point cloud data by adopting a point cloud depth model algorithm to obtain a 3D boundary frame of the detected object.
The specific steps are as follows: 1) 3D proposal generation: point-wise features are learned from the laser radar points carrying the fused instance segmentation scores of the image, the original point cloud is segmented, and 3D proposals are generated from the segmented foreground points;
2) point cloud region pooling: each point and its features are pooled according to the position of each 3D proposal;
3) 3D bounding box optimization: the pooled points of each 3D proposal are converted into canonical coordinates, and local spatial features and global semantic features are learned and used for 3D bounding box optimization and confidence prediction.
The depth model in the embodiment of the invention mainly comprises three modules: a 3D proposal generation module, a point cloud region pooling module, and a canonical 3D bounding box refinement module. The 3D proposal generation module segments the original point cloud (that is, the laser radar point cloud data fused with the image instance segmentation scores) by learning point-wise features, and generates 3D proposals from the segmented foreground points. The point cloud region pooling module pools the 3D points from the previous stage and their corresponding point features according to each 3D proposal, with the aim of learning more specific local features of each proposal; in other words, the pooled object is the fused 3D point cloud data, and the pooling serves to retain the key features. The canonical 3D bounding box refinement module receives the pooled points of each 3D proposal and their associated features, fine-tunes the position of the 3D box and the confidence of the foreground object, and generates the refined 3D bounding box of the detected object. A skeleton of this three-module flow is sketched below.
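Purely as an illustration of how the three modules chain together, the following skeleton uses placeholder callables (backbone, proposal_head, pool_points_in_box, refine_head) standing in for the networks described below; none of these names come from the patent:

```python
def detect_3d(fused_points, backbone, proposal_head, pool_points_in_box, refine_head):
    """Chain the three modules: proposal generation, region pooling, box refinement."""
    point_feats = backbone(fused_points)                # point-wise feature learning (PointNet++)
    fg_mask, proposals = proposal_head(point_feats)     # foreground segmentation + 3D proposals
    results = []
    for box in proposals:
        pts, feats = pool_points_in_box(fused_points, point_feats, box)   # region pooling
        results.append(refine_head(pts, feats, box))    # refined 3D bounding box + confidence
    return results
```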
Specifically, the 3D proposal generation module includes the following parts:
(1) Learning the point cloud representation: in order to learn discriminative point-wise features describing the original point cloud, the embodiment of the invention uses PointNet++ with multi-scale grouping as the backbone network;
(2) Foreground point segmentation: foreground points provide sufficient information for predicting the location and orientation of the objects they belong to. To learn how to segment foreground points, the point cloud network needs to capture context in order to make accurate point-wise predictions. The 3D proposal generation method designed in the embodiment of the invention generates 3D box proposals directly from the foreground points, that is, foreground segmentation and 3D box proposal generation are performed simultaneously. On top of the point-wise features encoded by the backbone network, the embodiment of the invention adds a segmentation head for estimating the foreground mask and a bounding box regression head for generating the 3D proposals. For large-scale outdoor scenes, the number of foreground points is much smaller than the number of background points, so the embodiment of the invention uses the focal loss to handle this class-imbalance problem:
L_{\mathrm{focal}}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)
where p_t = p if the point belongs to the foreground and p_t = 1 - p otherwise, p being the predicted foreground probability. During the training of the point cloud segmentation, α_t = 0.25 and γ = 2.
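A minimal NumPy sketch of this focal loss, evaluated per point (the function name, boolean mask interface and numerical epsilon are our assumptions):

```python
import numpy as np

def focal_loss(p, is_foreground, alpha_t=0.25, gamma=2.0, eps=1e-6):
    """Focal loss on the predicted per-point foreground probability p (values in (0, 1))."""
    p_t = np.where(is_foreground, p, 1.0 - p)     # p_t as defined in the formula above
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)
```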
(3) Bin-based three-dimensional bounding box generation: a 3D bounding box is represented in the laser radar coordinate system as (x, y, z, h, w, l, θ), where (x, y, z) is the center position of the target, (h, w, l) is the size of the target, and θ is the orientation of the target in the top view. To constrain the generated 3D box proposals, the embodiment of the invention estimates the 3D bounding box of an object with a bin-based regression loss function. To estimate the center position of an object, the embodiment of the invention splits the region around each foreground point into a series of discrete bins along the X and Z axes. Specifically, a search range S is set along each of the X and Z axes of the current foreground point, and each one-dimensional search range is divided into bins of equal length δ to represent the possible centers (x, z) of the object in the X-Z plane. The localization loss for the X and Z axes therefore consists of two parts: bin classification along each axis, and residual regression within the classified bin. The center position y along the Y axis is regressed directly with the smooth L1 loss. The localization targets are as follows:
\mathrm{bin}_x^{(p)} = \lfloor (x^p - x^{(p)} + S)/\delta \rfloor
\mathrm{bin}_z^{(p)} = \lfloor (z^p - z^{(p)} + S)/\delta \rfloor
\mathrm{res}_u^{(p)} = \frac{1}{\iota}\Big( u^p - u^{(p)} + S - \big( \mathrm{bin}_u^{(p)} \cdot \delta + \frac{\delta}{2} \big) \Big), \quad u \in \{x, z\}
\mathrm{res}_y^{(p)} = y^p - y^{(p)}
where (x^(p), y^(p), z^(p)) are the coordinates of the foreground point of interest, (x^p, y^p, z^p) are the center coordinates of its corresponding object, bin_x^(p) and bin_z^(p) are the ground-truth bin assignments along the X and Z axes, res_u^(p) is the ground-truth residual used for further localization refinement within the assigned bin, and ι is the bin length used for normalization.
(4) Setting the object orientation and size targets: the embodiment of the invention divides the orientation range 2π into n bins and computes the bin classification target bin_θ^(p) and the residual regression target res_θ^(p) in the same way as for the x and z directions. The dimensions (h, w, l) of the object are regressed directly with respect to the average object size of each class computed over the entire training set.
(5) Parameter decoding at inference: in the inference phase, for the bin-based parameters x, z and θ, the bin center with the highest predicted confidence is selected first and the predicted residual is added to obtain the refined value. For the directly regressed parameters, namely y, h, w and l, the predicted residuals are added to their initial values. A small encoding/decoding sketch follows.
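The following sketch illustrates the bin-based encoding and the inference-time decoding for the X axis (the Z axis is handled identically); the search range S and bin length δ used as defaults are example values, not values fixed by the patent:

```python
import numpy as np

def encode_center_x(x_fg, x_obj, S=3.0, delta=0.5):
    """Bin/residual targets for the object-centre offset along X, as in the formulas above."""
    offset = x_obj - x_fg + S
    bin_idx = int(np.floor(offset / delta))
    res = (offset - (bin_idx * delta + delta / 2.0)) / delta   # residual, normalised by the bin length
    return bin_idx, res

def decode_center_x(x_fg, bin_scores, residuals, S=3.0, delta=0.5):
    """Inference: pick the most confident bin, then add back its predicted residual."""
    bin_idx = int(np.argmax(bin_scores))
    return x_fg - S + bin_idx * delta + delta / 2.0 + residuals[bin_idx] * delta
```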
(6) Training loss function: the regression loss L_reg of the entire 3D bounding box is the sum of the different training loss terms:
L_{\mathrm{bin}}^{(p)} = \sum_{u \in \{x, z, \theta\}} \Big( F_{\mathrm{cls}}\big(\widehat{\mathrm{bin}}_u^{(p)}, \mathrm{bin}_u^{(p)}\big) + F_{\mathrm{reg}}\big(\widehat{\mathrm{res}}_u^{(p)}, \mathrm{res}_u^{(p)}\big) \Big)
L_{\mathrm{res}}^{(p)} = \sum_{v \in \{y, h, w, l\}} F_{\mathrm{reg}}\big(\widehat{\mathrm{res}}_v^{(p)}, \mathrm{res}_v^{(p)}\big)
L_{\mathrm{reg}} = \frac{1}{N_{\mathrm{pos}}} \sum_{p \in \mathrm{pos}} \big( L_{\mathrm{bin}}^{(p)} + L_{\mathrm{res}}^{(p)} \big)
where N_pos is the number of foreground points, \widehat{\mathrm{bin}}_u^{(p)} and \widehat{\mathrm{res}}_u^{(p)} are the bin assignment and residual predicted for foreground point p, bin_u^(p) and res_u^(p) are the corresponding computed ground-truth targets, F_cls is the cross-entropy classification loss, and F_reg is the smooth L1 loss.
(7) Non-maximum suppression during training and inference: to remove redundant proposals, oriented non-maximum suppression based on the bird's-eye-view IoU is used to keep a small number (no specific number is required) of high-quality proposals. During training, the bird's-eye-view IoU threshold is 0.85 and non-maximum suppression retains the top 300 proposals for training the subsequent sub-network. During inference, the bird's-eye-view IoU threshold is set to 0.8 and non-maximum suppression retains the top 100 proposals for the subsequent refinement sub-network.
The point cloud area pooling module specifically comprises the following parts:
(1) Enlarging the 3D proposal box: each 3D proposal box b_i = (x_i, y_i, z_i, h_i, w_i, l_i, θ_i) is enlarged to create a new 3D box b_i^e = (x_i, y_i, z_i, h_i + η, w_i + η, l_i + η, θ_i) so as to encode additional information from its surroundings, where η is a fixed value used to enlarge the size of the box.
(2) Determining whether a point lies in the enlarged box: for each point p = (x^(p), y^(p), z^(p)), an inside/outside test is performed to determine whether the point lies in the enlarged proposal box b_i^e. If so, the point and its features are retained for refining b_i. The features associated with an inside point p include: its 3D coordinates (x^(p), y^(p), z^(p)) ∈ R^3, its laser reflection intensity r^(p) ∈ R, its predicted segmentation mask m^(p) ∈ {0, 1} from the previous stage, and its C-dimensional learned point feature representation f^(p) ∈ R^C from the previous stage. The segmentation mask m^(p) is included to distinguish the foreground points from the background points inside the enlarged box, while the learned point feature f^(p) encodes valuable information learned for segmentation and proposal generation.
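For illustration, a hedged sketch of the enlargement and the inside/outside test; it assumes the box centre is the geometric centre of the box, that l, h and w lie along the local X, Y and Z axes respectively, and an example margin η = 0.2, none of which are fixed by the patent:

```python
import numpy as np

def pool_points_in_box(points, feats, box, eta=0.2):
    """Keep the points (and their features) that fall inside the enlarged proposal box."""
    cx, cy, cz, h, w, l, theta = box
    he, we, le = h + eta, w + eta, l + eta                 # enlarged box b_i^e
    local = points[:, :3] - np.array([cx, cy, cz])
    c, s = np.cos(-theta), np.sin(-theta)
    rot = np.array([[c, 0.0, s],                           # rotate about the vertical axis by -theta
                    [0.0, 1.0, 0.0],                       # so the enlarged box becomes axis-aligned
                    [-s, 0.0, c]])
    local = local @ rot.T
    inside = (np.abs(local[:, 0]) <= le / 2) & \
             (np.abs(local[:, 1]) <= he / 2) & \
             (np.abs(local[:, 2]) <= we / 2)
    return points[inside], feats[inside]
```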
The 3D bounding box refinement module specifically comprises the following parts:
(1) Canonical transformation: to exploit the high-recall proposal boxes generated by the 3D proposal generation module and to estimate the residuals of the proposal box parameters, the embodiment of the invention converts the pooled points belonging to each proposal into the canonical coordinate system of the corresponding 3D proposal. The canonical coordinate system of a 3D proposal is defined as follows: the origin lies at the center of the proposal box; the local X and Z axes are approximately parallel to the ground plane, with X pointing in the heading direction of the proposal and Z perpendicular to X; and the Y axis remains coincident with that of the laser radar coordinate system.
(2) Composition of the refinement sub-network: the refinement sub-network combines the transformed local spatial point features with their global semantic features f^(p) from the 3D proposal generation module, and uses them to refine the box and the confidence.
(3) Drawback of the canonical transformation and its remedy: although the canonical transformation enables robust learning of local spatial features, it inevitably loses the depth information of each object. For example, owing to the fixed angular scanning resolution of the laser radar sensor, distant objects typically contain far fewer points than nearby objects. To compensate for the lost depth information, the embodiment of the invention appends the distance d^(p) = sqrt((x^(p))^2 + (y^(p))^2 + (z^(p))^2) to the features of point p.
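As an illustration, a sketch of the canonical transformation with the depth compensation d^(p) appended; it reuses the axis conventions assumed in the pooling sketch above, and the function name is ours:

```python
import numpy as np

def canonical_points_with_depth(points, box):
    """Canonically transform the pooled points of one proposal and append d^(p)."""
    cx, cy, cz, _, _, _, theta = box
    d = np.linalg.norm(points[:, :3], axis=1, keepdims=True)   # d^(p), measured before the transform
    local = points[:, :3] - np.array([cx, cy, cz])             # origin moved to the box centre
    c, s = np.cos(-theta), np.sin(-theta)
    rot = np.array([[c, 0.0, s],                                # local X follows the proposal heading,
                    [0.0, 1.0, 0.0],                            # Y stays aligned with the sensor frame
                    [-s, 0.0, c]])
    return np.hstack([local @ rot.T, d])                        # (N, 4): canonical x, y, z plus d^(p)
```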
(4) Merging the features: after all the features of a proposal are obtained, the local spatial features of its associated points are first concatenated with the additional features [r^(p), m^(p), d^(p)] and passed through several fully connected layers (the exact number depends on the situation), which encode these local features to the same dimension as the global features f^(p). The local and global features are then concatenated and fed into the network to obtain discriminative feature vectors, which are used for confidence classification and box refinement.
(5) Loss for box proposal refinement: the refinement also adopts the bin-based regression loss. If the IoU between a ground-truth box and a 3D box proposal is greater than 0.55, the ground-truth box is assigned to that proposal for learning box refinement. The overall loss function of the module is:
L_{\mathrm{refine}} = \frac{1}{\|\beta\|} \sum_{i \in \beta} F_{\mathrm{cls}}(\mathrm{prob}_i, \mathrm{label}_i) + \frac{1}{\|\beta_{\mathrm{pos}}\|} \sum_{i \in \beta_{\mathrm{pos}}} \big( \tilde{L}_{\mathrm{bin}}^{(i)} + \tilde{L}_{\mathrm{res}}^{(i)} \big)
where β is the set of 3D proposals output by the 3D proposal generation module, β_pos is the subset of positive proposals kept for regression, prob_i is the estimated confidence of the i-th proposal, and label_i is its corresponding label. Finally, oriented non-maximum suppression with a bird's-eye-view IoU threshold of 0.01 is applied to remove the overlapping bounding boxes and generate the 3D bounding boxes of the detected objects.
In addition, corresponding to the above method, the present invention further provides a 3D target detection system based on monocular camera and laser radar fusion, as shown in FIG. 2, which specifically includes:
an image acquisition module 201, configured to acquire an image acquired by a monocular camera;
an instance segmentation score calculation module 202, configured to calculate the instance segmentation score of each pixel point in the image based on an instance segmentation network;
a 3D point cloud data obtaining module 203, configured to obtain 3D point cloud data of the laser radar;
a data fusion module 204, configured to fuse the instance segmentation scores with the 3D point cloud data to obtain fused 3D point cloud data;
and the target detection module 205 is configured to perform 3D target detection on the fused 3D point cloud data by using a point cloud depth model algorithm to obtain a 3D bounding box of the detected object.
By adopting the above technical solution, the invention has the following advantages:
The 3D target detection method based on the fusion of a monocular camera and a laser radar effectively solves the mismatch between the viewpoint of the monocular camera and that of the laser radar during fusion, and achieves higher fusion efficiency than the prior art.
Compared with the prior art, the 3D target detection method based on the fusion of a monocular camera and a laser radar improves the detection accuracy for small objects by adding the detailed information provided by instance segmentation.
The 3D target detection method based on the fusion of a monocular camera and a laser radar adopts instance segmentation as the fusion medium, and its output can serve not only 3D target detection but also other tasks in automatic driving, such as depth estimation and multi-target tracking.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A 3D target detection method based on monocular camera and laser radar fusion, characterized by comprising the following steps:
acquiring an image acquired by a monocular camera;
calculating an instance segmentation score of each pixel point in the image based on an instance segmentation network;
acquiring 3D point cloud data of a laser radar;
fusing the instance segmentation scores and the 3D point cloud data to obtain fused 3D point cloud data;
and performing 3D target detection on the fused 3D point cloud data by adopting a point cloud depth model algorithm to obtain a 3D bounding box of the detected object.
2. The monocular camera and lidar fusion based 3D target detection method of claim 1, wherein the output of the instance segmentation network comprises a classification branch for predicting semantic categories of objects and deriving corresponding probability values, and a mask branch for calculating instance masks of objects.
3. The monocular camera and lidar fusion based 3D target detection method of claim 2, wherein the calculating an instance segmentation score for each pixel point in the image based on an instance segmentation network specifically comprises:
obtaining a prediction probability value of the classification branch;
judging whether the prediction probability value is larger than a first threshold value or not;
if so, acquiring a position index corresponding to the prediction probability value;
dividing the mask branch into an X direction and a Y direction;
calculating a mask according to the position index, the X direction and the Y direction;
acquiring the masks which are larger than a second threshold value;
performing a local maximum search on the masks which are larger than the second threshold value to obtain the mask maxima;
and scaling the mask maxima according to the size of the original image to obtain the instance segmentation scores.
4. The monocular camera and lidar fusion based 3D target detection method of claim 1, wherein fusing the instance segmentation score with the 3D point cloud data to obtain fused 3D point cloud data specifically comprises:
acquiring extrinsic parameters of the monocular camera and the laser radar, wherein the extrinsic parameters comprise a rotation matrix and a translation matrix;
projecting the 3D point cloud data of the laser radar into the three-dimensional coordinate system of the monocular camera according to the extrinsic parameters;
acquiring intrinsic parameters of the monocular camera, wherein the intrinsic parameters comprise an intrinsic parameter matrix and a distortion parameter matrix;
projecting the points in the three-dimensional coordinate system of the monocular camera onto an imaging plane according to the intrinsic parameters to obtain the correspondence between the 3D point cloud data of the laser radar and the image pixels;
and adding the instance segmentation score of each pixel point in the image to the 3D point cloud data according to the correspondence between the 3D point cloud data of the laser radar and the image pixels to obtain the fused 3D point cloud data.
5. The monocular camera and lidar fusion based 3D target detection method of claim 1, wherein performing 3D target detection on the fused 3D point cloud data by adopting the point cloud depth model algorithm to obtain a 3D bounding box of the detected object specifically comprises:
segmenting the fused 3D point cloud data by learning point-wise features to obtain segmented foreground points;
generating 3D proposals from the segmented foreground points;
pooling the fused 3D point cloud data and the corresponding point features according to the 3D proposals;
and generating a 3D bounding box of the detected object from the pooled 3D point cloud data and the corresponding point features.
6. The monocular camera and lidar fusion based 3D target detection method of claim 1, wherein the first threshold is 0.1 and the second threshold is 0.5.
7. A 3D target detection system based on monocular camera and laser radar fusion, characterized by comprising:
the image acquisition module is used for acquiring an image acquired by the monocular camera;
the instance segmentation score calculation module is used for calculating the instance segmentation score of each pixel point in the image based on an instance segmentation network;
the 3D point cloud data acquisition module is used for acquiring 3D point cloud data of the laser radar;
the data fusion module is used for fusing the instance segmentation scores with the 3D point cloud data to obtain fused 3D point cloud data;
and the target detection module is used for performing 3D target detection on the fused 3D point cloud data by adopting a point cloud depth model algorithm to obtain a 3D bounding box of the detected object.
8. The monocular camera and lidar fusion based 3D target detection system of claim 7, wherein the instance segmentation score calculation module specifically comprises:
the classification branch unit is used for acquiring the prediction probability value of the classification branch;
the first judging unit is used for judging whether the prediction probability value is larger than a first threshold value or not;
the position index unit is used for acquiring a position index corresponding to the prediction probability value when the prediction probability value is larger than a first threshold value;
a mask branching unit for dividing the mask branch into an X direction and a Y direction;
a mask calculation unit for calculating a mask according to the position index, the X direction and the Y direction;
a second judging unit, configured to obtain a mask that is greater than a second threshold value from among the masks;
the local search unit is used for performing local maximum search on the mask which is greater than the second threshold value to obtain the maximum value of the mask;
and the instance segmentation score calculation unit is used for scaling the mask maxima according to the original image size to obtain the instance segmentation scores.
9. The monocular camera and lidar fusion based 3D target detection system of claim 7, wherein the data fusion module specifically comprises:
the extrinsic parameter acquisition unit is used for acquiring extrinsic parameters of the monocular camera and the laser radar, the extrinsic parameters comprising a rotation matrix and a translation matrix;
the first projection unit is used for projecting the 3D point cloud data of the laser radar into the three-dimensional coordinate system of the monocular camera according to the extrinsic parameters;
the intrinsic parameter acquisition unit is used for acquiring intrinsic parameters of the monocular camera, the intrinsic parameters comprising an intrinsic parameter matrix and a distortion parameter matrix;
the second projection unit is used for projecting the points in the three-dimensional coordinate system of the monocular camera onto an imaging plane according to the intrinsic parameters to obtain the correspondence between the 3D point cloud data of the laser radar and the image pixels;
and the data fusion unit is used for adding the instance segmentation score of each pixel point in the image to the 3D point cloud data according to the correspondence between the 3D point cloud data of the laser radar and the image pixels to obtain the fused 3D point cloud data.
10. The monocular camera and lidar fusion based 3D target detection system of claim 7, wherein the target detection module specifically comprises:
the foreground extraction unit is used for segmenting the fused 3D point cloud data by learning point-wise features to obtain segmented foreground points;
the 3D proposal generation unit is used for generating 3D proposals from the segmented foreground points;
the point cloud data pooling unit is used for pooling the fused 3D point cloud data and the corresponding point features according to the 3D proposals;
and the 3D bounding box refinement unit is used for generating a 3D bounding box of the detected object from the pooled 3D point cloud data and the corresponding point features.
CN202110447403.0A 2021-04-25 2021-04-25 3D target detection method and system based on monocular camera and laser radar fusion Pending CN113139602A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110447403.0A CN113139602A (en) 2021-04-25 2021-04-25 3D target detection method and system based on monocular camera and laser radar fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110447403.0A CN113139602A (en) 2021-04-25 2021-04-25 3D target detection method and system based on monocular camera and laser radar fusion

Publications (1)

Publication Number Publication Date
CN113139602A true CN113139602A (en) 2021-07-20

Family

ID=76811961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110447403.0A Pending CN113139602A (en) 2021-04-25 2021-04-25 3D target detection method and system based on monocular camera and laser radar fusion

Country Status (1)

Country Link
CN (1) CN113139602A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113985445A (en) * 2021-08-24 2022-01-28 中国北方车辆研究所 3D target detection algorithm based on data fusion of camera and laser radar
CN114359181A (en) * 2021-12-17 2022-04-15 上海应用技术大学 Intelligent traffic target fusion detection method and system based on image and point cloud
CN114913209A (en) * 2022-07-14 2022-08-16 南京后摩智能科技有限公司 Multi-target tracking network construction method and device based on overlook projection
CN116265862A (en) * 2021-12-16 2023-06-20 动态Ad有限责任公司 Vehicle, system and method for a vehicle, and storage medium
CN117994504A (en) * 2024-04-03 2024-05-07 国网江苏省电力有限公司常州供电分公司 Target detection method and target detection device
CN118015411A (en) * 2024-02-27 2024-05-10 北京化工大学 Automatic driving-oriented large vision language model increment learning method and device
CN117994504B (en) * 2024-04-03 2024-07-02 国网江苏省电力有限公司常州供电分公司 Target detection method and target detection device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472534A (en) * 2019-07-31 2019-11-19 厦门理工学院 3D object detection method, device, equipment and storage medium based on RGB-D data

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472534A (en) * 2019-07-31 2019-11-19 厦门理工学院 3D object detection method, device, equipment and storage medium based on RGB-D data

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113985445A (en) * 2021-08-24 2022-01-28 中国北方车辆研究所 3D target detection algorithm based on data fusion of camera and laser radar
CN116265862A (en) * 2021-12-16 2023-06-20 动态Ad有限责任公司 Vehicle, system and method for a vehicle, and storage medium
CN114359181A (en) * 2021-12-17 2022-04-15 上海应用技术大学 Intelligent traffic target fusion detection method and system based on image and point cloud
CN114359181B (en) * 2021-12-17 2024-01-26 上海应用技术大学 Intelligent traffic target fusion detection method and system based on image and point cloud
CN114913209A (en) * 2022-07-14 2022-08-16 南京后摩智能科技有限公司 Multi-target tracking network construction method and device based on overlook projection
CN118015411A (en) * 2024-02-27 2024-05-10 北京化工大学 Automatic driving-oriented large vision language model increment learning method and device
CN117994504A (en) * 2024-04-03 2024-05-07 国网江苏省电力有限公司常州供电分公司 Target detection method and target detection device
CN117994504B (en) * 2024-04-03 2024-07-02 国网江苏省电力有限公司常州供电分公司 Target detection method and target detection device

Similar Documents

Publication Publication Date Title
CN111797716B (en) Single target tracking method based on Siamese network
CN110675418B (en) Target track optimization method based on DS evidence theory
CN110929692B (en) Three-dimensional target detection method and device based on multi-sensor information fusion
CN113139602A (en) 3D target detection method and system based on monocular camera and laser radar fusion
CN113159151B (en) Multi-sensor depth fusion 3D target detection method for automatic driving
CN109903331B (en) Convolutional neural network target detection method based on RGB-D camera
CN111830502B (en) Data set establishing method, vehicle and storage medium
CN110689562A (en) Trajectory loop detection optimization method based on generation of countermeasure network
CN111563415A (en) Binocular vision-based three-dimensional target detection system and method
CN114022830A (en) Target determination method and target determination device
CN109919026B (en) Surface unmanned ship local path planning method
CN110941996A (en) Target and track augmented reality method and system based on generation of countermeasure network
CN113643345A (en) Multi-view road intelligent identification method based on double-light fusion
CN114114312A (en) Three-dimensional target detection method based on fusion of multi-focal-length camera and laser radar
CN111292369A (en) Pseudo-point cloud data generation method for laser radar
CN114399734A (en) Forest fire early warning method based on visual information
CN111260687A (en) Aerial video target tracking method based on semantic perception network and related filtering
CN115115690A (en) Video residual decoding device and associated method
CN112529917A (en) Three-dimensional target segmentation method, device, equipment and storage medium
CN116958927A (en) Method and device for identifying short column based on BEV (binary image) graph
CN116664851A (en) Automatic driving data extraction method based on artificial intelligence
CN115861709A (en) Intelligent visual detection equipment based on convolutional neural network and method thereof
CN115249269A (en) Object detection method, computer program product, storage medium, and electronic device
CN114758087A (en) Method and device for constructing city information model
CN113569803A (en) Multi-mode data fusion lane target detection method and system based on multi-scale convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination