CN112464905B - 3D target detection method and device - Google Patents

3D target detection method and device

Info

Publication number
CN112464905B
CN112464905B (application CN202011494753.4A)
Authority
CN
China
Prior art keywords
target
point cloud
shape
module
feature
Prior art date
Legal status
Active
Application number
CN202011494753.4A
Other languages
Chinese (zh)
Other versions
CN112464905A (en)
Inventor
刘彩苹
易子越
李智勇
Current Assignee
Hunan University
Original Assignee
Hunan University
Priority date
Filing date
Publication date
Application filed by Hunan University
Priority to CN202011494753.4A
Publication of CN112464905A
Application granted
Publication of CN112464905B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/2431 - Multiple classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 - Image mosaicing, e.g. composing plane images from plane sub-images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/56 - Extraction of image or video features relating to colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a 3D target detection method comprising: acquiring an original RGB image; performing 2D target detection on the RGB image to obtain a 2D bounding box and a target category; segmenting and resampling the point cloud with the 2D bounding box to obtain frustum point cloud data containing the target; cropping the RGB image with the 2D bounding box to obtain a target RGB image; feeding the target RGB image into a feature extraction network to obtain RGB depth features; feeding the frustum point cloud data and the RGB depth features into a segmentation network to obtain a segmentation mask and converting the mask into a target point cloud; and resampling the target point cloud and feeding it into a 3D box prediction network to obtain the final target 3D bounding box. The invention also provides a device for implementing the 3D target detection method. By fusing RGB depth features with frustum point cloud data, the method achieves higher reliability and better accuracy.

Description

3D target detection method and device
Technical Field
The invention belongs to the field of image processing, and particularly relates to a 3D target detection method and device.
Background
With the development of economic technology and the wide application of intelligent technology, autonomous driving has become a research hotspot.
Multi-modal perception fusion is an important component of an autonomous driving system: such a system often needs to fuse the data of several sensors and detect targets in three-dimensional space, so as to provide the planning module with a truthful, reliable and reasonable representation of the vehicle's surroundings.
A frustum point cloud is formed by using the target's 2D bounding box on the image plane, together with the mapping between the LiDAR coordinate system and the camera coordinate system, to select the laser points lying inside the spatial viewing frustum of the 2D target.
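As an illustration of this frustum construction (not part of the patent text), a minimal NumPy sketch is given below; the KITTI-style calibration inputs, a 3 × 4 camera projection matrix and a homogeneous LiDAR-to-camera transform, are assumptions.

```python
import numpy as np

def frustum_points(points_lidar, box2d, P, Tr_velo_to_cam):
    """Keep the LiDAR points whose image projection falls inside the 2D box.

    points_lidar:   (N, 4) array of [x, y, z, reflectance] in LiDAR coordinates.
    box2d:          (xmin, ymin, xmax, ymax) in pixel coordinates.
    P:              (3, 4) camera projection matrix (assumed KITTI-style calibration).
    Tr_velo_to_cam: (4, 4) homogeneous LiDAR-to-camera transform (assumed).
    """
    n = points_lidar.shape[0]
    xyz1 = np.hstack([points_lidar[:, :3], np.ones((n, 1))])      # homogeneous LiDAR coordinates
    cam = (Tr_velo_to_cam @ xyz1.T).T                             # points in the camera frame
    uvw = (P @ np.hstack([cam[:, :3], np.ones((n, 1))]).T).T      # project onto the image plane
    u = uvw[:, 0] / uvw[:, 2]
    v = uvw[:, 1] / uvw[:, 2]
    xmin, ymin, xmax, ymax = box2d
    in_box = (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax)
    in_front = cam[:, 2] > 0                                      # keep only points in front of the camera
    return points_lidar[in_box & in_front]
```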
Currently there are many 3D target detection methods based on frustum point clouds:
Method a: detect 2D target boxes on the RGB image; feed the segmented frustum point cloud into a segmentation network for binary target/non-target classification, output a segmentation mask and segment out the target point cloud; feed the target point cloud into a 3D bounding box prediction network that regresses the target center coordinates and classifies and regresses the size and heading angle, finally outputting a target 3D bounding box expressed as a vector (x, y, z, w, l, h, θ);
Method b: on the basis of method a, a Mask R-CNN is introduced to directly output a 2D mask of the target on the image plane, and this 2D mask is used to segment the original point cloud into the frustum point cloud, instead of segmenting it in three-dimensional coordinates as in method a;
Method c: on the basis of method a, an attention mechanism is introduced into the target/non-target classification to find the spatial points and feature channels in the point cloud data that deserve attention, so as to effectively strengthen the target information, and Focal Loss is used to address the class imbalance between target and background points in the point cloud data.
However, existing 3D target detection methods still suffer from poor accuracy and low reliability, which limits the application of multi-modal perception fusion.
Disclosure of Invention
The first objective of the invention is to provide a 3D target detection method with high reliability and good accuracy.
The second objective of the invention is to provide a device for implementing this 3D target detection method.
The 3D target detection method provided by the invention comprises the following steps (a minimal end-to-end sketch in code follows the list):
S1, acquiring an original RGB image;
S2, performing 2D target detection on the RGB image acquired in step S1 to obtain a 2D bounding box and a target category;
S3, segmenting and resampling the point cloud with the 2D bounding box obtained in step S2 to obtain frustum point cloud data containing the target;
S4, cropping the RGB image acquired in step S1 with the 2D bounding box obtained in step S2 to obtain a target RGB image;
S5, feeding the target RGB image obtained in step S4 into a feature extraction network to obtain RGB depth features;
S6, feeding the frustum point cloud data obtained in step S3 and the RGB depth features obtained in step S5 into a segmentation network to obtain a segmentation mask, and converting the segmentation mask into a target point cloud;
S7, resampling the target point cloud obtained in step S6 and feeding it into a 3D box prediction network to obtain the final target 3D bounding box.
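To make the data flow of steps S1-S7 concrete, the sketch below wires the stages together; every callable passed in is a hypothetical stand-in for a component detailed in the following paragraphs, not an interface defined by the patent.

```python
def detect_3d(rgb_image, lidar_points, calib,
              detect_2d, extract_frustum, crop_target, rgb_features,
              segment_points, predict_box):
    """End-to-end flow of steps S1-S7; all callables are illustrative stand-ins."""
    boxes2d, classes = detect_2d(rgb_image)                      # S2: 2D boxes and categories
    results = []
    for box2d, cls in zip(boxes2d, classes):
        frustum = extract_frustum(lidar_points, box2d, calib)    # S3: frustum segmentation + resampling
        target_rgb = crop_target(rgb_image, box2d)               # S4: crop, pad, resize
        gamma = rgb_features(target_rgb)                         # S5: ResNet-50 + 1x1 convolution
        obj_points = segment_points(frustum, gamma, cls)         # S6: mask -> target point cloud
        results.append(predict_box(obj_points, gamma, cls))      # S7: final 3D bounding box
    return results
```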
In step S3, the 2D bounding box obtained in step S2 is used for segmentation and resampling to obtain the frustum point cloud data containing the target; specifically, the point cloud is segmented with the 2D bounding box obtained in step S2 and resampled to 1024 points, yielding the frustum point cloud data containing the target.
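A minimal sketch of the 1024-point resampling; sampling with replacement when the frustum holds fewer points is an assumption, as the text only fixes the output count.

```python
import numpy as np

def resample_points(points, n=1024, rng=None):
    """Resample a point set to exactly n points along the first dimension."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(points), size=n, replace=len(points) < n)
    return points[idx]
```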
In step S4, the RGB image acquired in step S1 is cropped with the 2D bounding box obtained in step S2 to obtain the target RGB image; specifically, the copyMakeBorder function of the OpenCV library is used to pad the edges with the gray value (128, 128, 128), producing a square image with an aspect ratio of 1:1, which is then resized to a fixed [224 × 224].
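The cropping and padding can be sketched with OpenCV as follows; centering the padding is an assumption, since the text only specifies the gray fill value and the final square size.

```python
import cv2

def crop_and_square(image, box2d, size=224):
    """Crop the 2D box, pad to a square with gray (128, 128, 128), resize to size x size."""
    xmin, ymin, xmax, ymax = [int(round(v)) for v in box2d]
    patch = image[ymin:ymax, xmin:xmax]
    h, w = patch.shape[:2]
    diff = abs(h - w)
    top = bottom = left = right = 0
    if h > w:
        left, right = diff // 2, diff - diff // 2     # pad the narrow side to reach 1:1
    else:
        top, bottom = diff // 2, diff - diff // 2
    patch = cv2.copyMakeBorder(patch, top, bottom, left, right,
                               cv2.BORDER_CONSTANT, value=(128, 128, 128))
    return cv2.resize(patch, (size, size))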
In step S5, the target RGB image obtained in step S4 is fed into the feature extraction network to obtain the RGB depth features; specifically, the target RGB image is input to a ResNet-50 network, which outputs a feature of shape [1 × 1 × 2048], and a [1 × 1, 128] convolution reduces its dimensionality, yielding the RGB depth feature γ of shape [1 × 1 × 128].
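A sketch of this feature extractor in PyTorch; the batch dimension and the exact grouping of the backbone layers are assumptions beyond what the text states.

```python
import torch.nn as nn
import torchvision

class RGBFeatureExtractor(nn.Module):
    """ResNet-50 backbone followed by a [1 x 1, 128] convolution producing gamma."""
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet50()
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])  # drop the FC head -> [B, 2048, 1, 1]
        self.reduce = nn.Conv2d(2048, 128, kernel_size=1)               # dimensionality reduction -> [B, 128, 1, 1]

    def forward(self, rgb):                       # rgb: [B, 3, 224, 224]
        return self.reduce(self.backbone(rgb))    # gamma: [B, 128, 1, 1]
```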
In step S6, the frustum point cloud data obtained in step S3 and the RGB depth features obtained in step S5 are fed into the segmentation network to obtain a segmentation mask, which is then converted into the target point cloud; specifically, the target point cloud is obtained by the following steps (a sketch of these steps in code follows the list):
A. expanding the dimensionality of the frustum point cloud tensor of shape [1024 × 4] obtained in step S3 and passing it through 3 convolution layers of [1 × 1, 64] to obtain the point-wise feature α of shape [1024 × 1 × 64];
B. passing the point-wise feature α obtained in step A through two convolution layers, [1 × 1, 128] and [1 × 1, 1024], to obtain a feature of shape [1024 × 1 × 1024], and max-pooling it over the first dimension to obtain the global feature β of shape [1 × 1 × 1024];
C. since there are three target categories, expressing the target category obtained in step S2 as a tensor of shape [3] and expanding its dimensions to obtain the category feature δ of shape [1 × 1 × 3];
D. concatenating the global feature β obtained in step B, the RGB depth feature γ obtained in step S5 and the category feature δ obtained in step C along the third dimension, and replicating along the first dimension to obtain the feature ε of shape [1024 × 1 × 1155];
E. concatenating the point-wise feature α obtained in step A and the feature ε obtained in step D along the third dimension to obtain a feature of shape [1024 × 1 × 1219];
F. passing the [1024 × 1 × 1219] feature obtained in step E through one [1 × 1, 512] convolution layer, one [1 × 1, 256] convolution layer, two [1 × 1, 128] convolution layers and one [1 × 1, 2] convolution layer in turn, then deleting the second dimension to obtain the segmentation mask of shape [1024 × 2];
G. the segmentation mask of shape [1024 × 2] obtained in step F gives the two-class score of each of the 1024 points in the input frustum point cloud; the target points are segmented out accordingly, resampled along the first dimension, and output as the target point cloud of shape [1024 × 3].
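A PyTorch sketch of steps A-G is given below. It uses channels-first layout ([batch, channels, points, 1]) whereas the shapes in the text are channels-last, and the BatchNorm/ReLU inside each convolution block as well as the fallback behavior in step G are assumptions; the channel widths follow the text.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # [1 x 1, c_out] convolution with an assumed BatchNorm + ReLU
    return nn.Sequential(nn.Conv2d(c_in, c_out, 1), nn.BatchNorm2d(c_out), nn.ReLU())

class FrustumSegmentationNet(nn.Module):
    """Per-point binary segmentation over the 1024-point frustum cloud (steps A-F)."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.point_feat = nn.Sequential(conv_block(4, 64), conv_block(64, 64), conv_block(64, 64))  # step A
        self.global_feat = nn.Sequential(conv_block(64, 128), conv_block(128, 1024))                 # step B
        self.head = nn.Sequential(conv_block(64 + 1024 + 128 + num_classes, 512),                    # step F
                                  conv_block(512, 256), conv_block(256, 128), conv_block(128, 128),
                                  nn.Conv2d(128, 2, 1))

    def forward(self, pts, gamma, onehot):
        # pts: [B, 4, 1024, 1]; gamma: [B, 128, 1, 1]; onehot: [B, num_classes]
        alpha = self.point_feat(pts)                                     # [B, 64, 1024, 1]
        beta = self.global_feat(alpha).max(dim=2, keepdim=True).values   # [B, 1024, 1, 1]
        delta = onehot[:, :, None, None]                                 # [B, num_classes, 1, 1] (step C)
        eps = torch.cat([beta, gamma, delta], dim=1)                     # [B, 1155, 1, 1] (step D)
        eps = eps.expand(-1, -1, alpha.shape[2], -1)                     # replicate over the 1024 points
        logits = self.head(torch.cat([alpha, eps], dim=1))               # [B, 2, 1024, 1] (steps E-F)
        return logits.squeeze(3)                                         # segmentation logits [B, 2, 1024]

def mask_to_target_cloud(pts_xyz, logits, n=1024):
    """Step G: keep the points predicted as target and resample back to n points."""
    keep = logits.argmax(dim=1) == 1                 # [B, 1024] boolean foreground mask
    clouds = []
    for b in range(pts_xyz.shape[0]):                # pts_xyz: [B, 1024, 3]
        fg = pts_xyz[b, keep[b]]                     # [m, 3] foreground points
        if fg.shape[0] == 0:                         # assumed fallback when nothing is kept
            fg = pts_xyz[b]
        idx = torch.randint(fg.shape[0], (n,))       # resample along the first dimension
        clouds.append(fg[idx])
    return torch.stack(clouds)                       # target point cloud [B, n, 3]
```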
In step S7, the target point cloud obtained in step S6 is resampled and fed into the 3D box prediction network to obtain the final target 3D bounding box; specifically, the final target 3D bounding box is obtained by the following steps (a sketch of this network in code follows the list):
a. resampling the input target point cloud of shape [1024 × 3] to [512 × 3] and expanding one dimension at the second dimension to obtain a tensor of shape [512 × 1 × 3];
b. passing the [512 × 1 × 3] tensor obtained in step a through two [1 × 1, 128] convolution layers, one [1 × 1, 256] convolution layer and one [1 × 1, 512] convolution layer in turn to obtain a feature of shape [512 × 1 × 512];
c. max-pooling the [512 × 1 × 512] feature obtained in step b over the first dimension to obtain the feature ξ of shape [1 × 1 × 512];
d. reducing the RGB depth feature γ obtained in step S5 to [1 × 1 × 64] with a [1 × 1, 64] convolution, concatenating it with the [1 × 1 × 512] feature ξ obtained in step c along the third dimension, and deleting the first two dimensions to obtain a tensor of shape [576];
e. concatenating the [576] tensor obtained in step d with the target category tensor of shape [3] to obtain a feature of shape [579];
f. feeding the [579] feature obtained in step e through three fully connected layers of widths 512, 256 and 59 in sequence, finally outputting a vector of length 59;
g. in the length-59 vector obtained in step f, the items have the following meaning: the first three items are the target center x, y and z coordinates; the next 24 items are [angle score, corresponding residual] for 12 predefined angles; the next 32 items are [size score, height residual, width residual, length residual] for 8 predefined sizes;
h. restoring the final target 3D bounding box according to the definitions of step g.
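A PyTorch sketch of steps a-f follows, with the same channels-first caveat and the same assumed conv_block helper as in the segmentation sketch.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # same [1 x 1] convolution helper as in the segmentation sketch (BatchNorm + ReLU assumed)
    return nn.Sequential(nn.Conv2d(c_in, c_out, 1), nn.BatchNorm2d(c_out), nn.ReLU())

NUM_HEADING_BINS, NUM_SIZE_CLUSTERS = 12, 8

class BoxPredictionNet(nn.Module):
    """Regresses the 59-dimensional box vector of step g from 512 resampled
    target points, the reduced RGB depth feature and the category one-hot vector."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.point_feat = nn.Sequential(conv_block(3, 128), conv_block(128, 128),    # step b
                                        conv_block(128, 256), conv_block(256, 512))
        self.reduce_gamma = nn.Conv2d(128, 64, kernel_size=1)                         # step d
        out_dim = 3 + 2 * NUM_HEADING_BINS + 4 * NUM_SIZE_CLUSTERS                    # = 59 (step g)
        self.mlp = nn.Sequential(nn.Linear(512 + 64 + num_classes, 512), nn.ReLU(),   # step f
                                 nn.Linear(512, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))

    def forward(self, obj_pts, gamma, onehot):
        # obj_pts: [B, 3, 512, 1]; gamma: [B, 128, 1, 1]; onehot: [B, num_classes]
        xi = self.point_feat(obj_pts).max(dim=2).values.flatten(1)   # step c: [B, 512]
        g = self.reduce_gamma(gamma).flatten(1)                      # step d: [B, 64]
        feat = torch.cat([xi, g, onehot], dim=1)                     # step e: [B, 579]
        return self.mlp(feat)                                        # [B, 59]
```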
The invention also provides a device for implementing the above 3D target detection method, which comprises an image acquisition module, a 2D target detection module, a segmentation-and-resampling module, a cropping module, a feature extraction module, a segmentation network module and a 3D box prediction network module, connected in series in that order. The image acquisition module is used for acquiring the original RGB image; the 2D target detection module is used for performing 2D target detection on the acquired RGB image to obtain a 2D bounding box and a target category; the segmentation-and-resampling module is used for segmenting and resampling the point cloud with the obtained 2D bounding box to obtain the frustum point cloud data containing the target; the cropping module is used for cropping the RGB image with the obtained 2D bounding box to obtain the target RGB image; the feature extraction module is used for extracting features from the target RGB image to obtain the RGB depth features; the segmentation network module is used for segmenting the frustum point cloud data together with the RGB depth features to obtain a segmentation mask and converting it into the target point cloud; and the 3D box prediction network module is used for resampling the target point cloud and feeding it into the 3D box prediction network to obtain the final target 3D bounding box.
The 3D target detection method provided by the invention fuses the RGB depth features with the frustum point cloud data during detection and computation to obtain the final target 3D bounding box; because both modalities are fused, the method achieves higher reliability and better accuracy.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a functional block diagram of the apparatus of the present invention.
Detailed Description
FIG. 1 shows the schematic flow chart of the method of the invention. The 3D target detection method provided by the invention comprises the following steps:
S1, acquiring an original RGB image;
S2, performing 2D target detection on the RGB image acquired in step S1 to obtain a 2D bounding box and a target category;
S3, segmenting and resampling the point cloud with the 2D bounding box obtained in step S2 to obtain the frustum point cloud data containing the target; specifically, the point cloud is segmented with the 2D bounding box obtained in step S2 and resampled to 1024 points, yielding the frustum point cloud data containing the target;
S4, cropping the RGB image acquired in step S1 with the 2D bounding box obtained in step S2 to obtain the target RGB image; specifically, the copyMakeBorder function of the OpenCV library is used to pad the edges with the gray value (128, 128, 128), producing a square image with an aspect ratio of 1:1, which is then resized to a fixed [224 × 224];
S5, feeding the target RGB image obtained in step S4 into the feature extraction network to obtain the RGB depth features; specifically, the target RGB image is input to a ResNet-50 network, which outputs a feature of shape [1 × 1 × 2048], and a [1 × 1, 128] convolution reduces its dimensionality, yielding the RGB depth feature γ of shape [1 × 1 × 128];
S6, feeding the frustum point cloud data obtained in step S3 and the RGB depth features obtained in step S5 into the segmentation network to obtain a segmentation mask, and converting the segmentation mask into the target point cloud; specifically, the target point cloud is obtained by the following steps:
A. expanding the dimensionality of the frustum point cloud tensor of shape [1024 × 4] obtained in step S3 and passing it through 3 convolution layers of [1 × 1, 64] to obtain the point-wise feature α of shape [1024 × 1 × 64];
B. passing the point-wise feature α obtained in step A through two convolution layers, [1 × 1, 128] and [1 × 1, 1024], to obtain a feature of shape [1024 × 1 × 1024], and max-pooling it over the first dimension to obtain the global feature β of shape [1 × 1 × 1024];
C. since there are three target categories, expressing the target category obtained in step S2 as a tensor of shape [3] and expanding its dimensions to obtain the category feature δ of shape [1 × 1 × 3];
D. concatenating the global feature β obtained in step B, the RGB depth feature γ obtained in step S5 and the category feature δ obtained in step C along the third dimension, and replicating along the first dimension to obtain the feature ε of shape [1024 × 1 × 1155];
E. concatenating the point-wise feature α obtained in step A and the feature ε obtained in step D along the third dimension to obtain a feature of shape [1024 × 1 × 1219];
F. passing the [1024 × 1 × 1219] feature obtained in step E through one [1 × 1, 512] convolution layer, one [1 × 1, 256] convolution layer, two [1 × 1, 128] convolution layers and one [1 × 1, 2] convolution layer in turn, then deleting the second dimension to obtain the segmentation mask of shape [1024 × 2];
G. the segmentation mask of shape [1024 × 2] obtained in step F gives the two-class score of each of the 1024 points in the input frustum point cloud, and the target point cloud of shape [1024 × 3] is obtained by segmentation;
S7, resampling the target point cloud obtained in step S6 and feeding it into the 3D box prediction network to obtain the final target 3D bounding box; specifically, the final target 3D bounding box is obtained by the following steps (a decoding sketch follows the list):
a. resampling the input target point cloud of shape [1024 × 3] to [512 × 3] and expanding one dimension at the second dimension to obtain a tensor of shape [512 × 1 × 3];
b. passing the [512 × 1 × 3] tensor obtained in step a through two [1 × 1, 128] convolution layers, one [1 × 1, 256] convolution layer and one [1 × 1, 512] convolution layer in turn to obtain a feature of shape [512 × 1 × 512];
c. max-pooling the [512 × 1 × 512] feature obtained in step b over the first dimension to obtain the feature ξ of shape [1 × 1 × 512];
d. reducing the RGB depth feature γ obtained in step S5 to [1 × 1 × 64] with a [1 × 1, 64] convolution, concatenating it with the [1 × 1 × 512] feature ξ obtained in step c along the third dimension, and deleting the first two dimensions to obtain a tensor of shape [576];
e. concatenating the [576] tensor obtained in step d with the target category tensor of shape [3] to obtain a feature of shape [579];
f. feeding the [579] feature obtained in step e through three fully connected layers of widths 512, 256 and 59 in sequence, finally outputting a vector of length 59;
g. in the length-59 vector obtained in step f, the items have the following meaning: the first three items are the target center x, y and z coordinates; the next 24 items are [angle score, corresponding residual] for 12 predefined angles; the next 32 items are [size score, height residual, width residual, length residual] for 8 predefined sizes;
h. restoring the final target 3D bounding box according to the definitions of step g.
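As an illustration of the decoding in step h, a NumPy sketch is given below; the pairwise ordering of scores and residuals within each group, the even spacing of the 12 angle bins over 2π, and the placeholder anchor sizes are assumptions not fixed by the text.

```python
import numpy as np

NUM_HEADING_BINS, NUM_SIZE_CLUSTERS = 12, 8
# Hypothetical per-cluster anchor sizes (h, w, l); the patent does not list the 8 predefined sizes.
SIZE_ANCHORS = np.ones((NUM_SIZE_CLUSTERS, 3))

def decode_box(pred):
    """Turn the length-59 vector of step f into (x, y, z, w, l, h, theta)."""
    x, y, z = pred[0:3]                                                       # target center
    heading = pred[3:3 + 2 * NUM_HEADING_BINS].reshape(NUM_HEADING_BINS, 2)   # [score, residual] per angle bin
    sizes = pred[3 + 2 * NUM_HEADING_BINS:].reshape(NUM_SIZE_CLUSTERS, 4)     # [score, dh, dw, dl] per size
    k = int(np.argmax(heading[:, 0]))
    theta = k * (2.0 * np.pi / NUM_HEADING_BINS) + heading[k, 1]              # chosen bin center + residual
    m = int(np.argmax(sizes[:, 0]))
    h, w, l = SIZE_ANCHORS[m] + sizes[m, 1:4]                                 # chosen anchor size + residuals
    return x, y, z, w, l, h, theta
```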
On the val split of the KITTI dataset, the method of the invention was tested with the same set of 2D detection results as method a of the background art; the resulting comparison of average precision (AP) is shown in Table 1 below:
Table 1. Comparison of average precision (AP) results (the table is reproduced as an image in the original patent).
As can be seen from Table 1, the method of the invention achieves better precision, higher reliability and better accuracy.
FIG. 2 shows the functional modules of the device of the invention. The device for implementing the above 3D target detection method comprises an image acquisition module, a 2D target detection module, a segmentation-and-resampling module, a cropping module, a feature extraction module, a segmentation network module and a 3D box prediction network module, connected in series in that order. The image acquisition module is used for acquiring the original RGB image; the 2D target detection module is used for performing 2D target detection on the acquired RGB image to obtain a 2D bounding box and a target category; the segmentation-and-resampling module is used for segmenting and resampling the point cloud with the obtained 2D bounding box to obtain the frustum point cloud data containing the target; the cropping module is used for cropping the RGB image with the obtained 2D bounding box to obtain the target RGB image; the feature extraction module is used for extracting features from the target RGB image to obtain the RGB depth features; the segmentation network module is used for segmenting the frustum point cloud data together with the RGB depth features to obtain a segmentation mask and converting it into the target point cloud; and the 3D box prediction network module is used for resampling the target point cloud and feeding it into the 3D box prediction network to obtain the final target 3D bounding box.

Claims (5)

1. A 3D target detection method, comprising the following steps:
S1, acquiring an original RGB image;
S2, performing 2D target detection on the RGB image acquired in step S1 to obtain a 2D bounding box and a target category;
S3, segmenting and resampling the point cloud with the 2D bounding box obtained in step S2 to obtain frustum point cloud data containing a target;
S4, cropping the RGB image acquired in step S1 with the 2D bounding box obtained in step S2 to obtain a target RGB image;
S5, feeding the target RGB image obtained in step S4 into a feature extraction network to obtain RGB depth features;
S6, feeding the frustum point cloud data obtained in step S3 and the RGB depth features obtained in step S5 into a segmentation network to obtain a segmentation mask, and converting the segmentation mask into a target point cloud; specifically, the target point cloud is obtained by the following steps:
A. expanding the dimensionality of the frustum point cloud tensor of shape [1024 × 4] obtained in step S3 and passing it through 3 convolution layers of [1 × 1, 64] to obtain a point-wise feature α of shape [1024 × 1 × 64];
B. passing the point-wise feature α obtained in step A through two convolution layers, [1 × 1, 128] and [1 × 1, 1024], to obtain a feature of shape [1024 × 1 × 1024], and max-pooling it over the first dimension to obtain a global feature β of shape [1 × 1 × 1024];
C. since there are three target categories, expressing the target category obtained in step S2 as a tensor of shape [3] and expanding its dimensions to obtain a category feature δ of shape [1 × 1 × 3];
D. concatenating the global feature β obtained in step B, the RGB depth feature γ obtained in step S5 and the category feature δ obtained in step C along the third dimension, and replicating along the first dimension to obtain a feature ε of shape [1024 × 1 × 1155];
E. concatenating the point-wise feature α obtained in step A and the feature ε obtained in step D along the third dimension to obtain a feature of shape [1024 × 1 × 1219];
F. passing the [1024 × 1 × 1219] feature obtained in step E through one [1 × 1, 512] convolution layer, one [1 × 1, 256] convolution layer, two [1 × 1, 128] convolution layers and one [1 × 1, 2] convolution layer in turn, then deleting the second dimension to obtain a segmentation mask of shape [1024 × 2];
G. the segmentation mask of shape [1024 × 2] obtained in step F giving the two-class score of each of the 1024 points in the input frustum point cloud, the target point cloud of shape [1024 × 3] being obtained by segmentation;
S7, resampling the target point cloud obtained in step S6 and feeding it into a 3D box prediction network to obtain a final target 3D bounding box; specifically, the final target 3D bounding box is obtained by the following steps:
a. resampling the input target point cloud of shape [1024 × 3] to [512 × 3] and expanding one dimension at the second dimension to obtain a tensor of shape [512 × 1 × 3];
b. passing the [512 × 1 × 3] tensor obtained in step a through two [1 × 1, 128] convolution layers, one [1 × 1, 256] convolution layer and one [1 × 1, 512] convolution layer in turn to obtain a feature of shape [512 × 1 × 512];
c. max-pooling the [512 × 1 × 512] feature obtained in step b over the first dimension to obtain a feature ξ of shape [1 × 1 × 512];
d. reducing the RGB depth feature γ obtained in step S5 to [1 × 1 × 64] with a [1 × 1, 64] convolution, concatenating it with the [1 × 1 × 512] feature ξ obtained in step c along the third dimension, and deleting the first two dimensions to obtain a tensor of shape [576];
e. concatenating the [576] tensor obtained in step d with the target category tensor of shape [3] to obtain a feature of shape [579];
f. feeding the [579] feature obtained in step e through three fully connected layers of widths 512, 256 and 59 in sequence, finally outputting a vector of length 59;
g. in the length-59 vector obtained in step f, the items having the following meaning: the first three items are the target center x, y and z coordinates; the next 24 items are [angle score, corresponding residual] for 12 predefined angles; the next 32 items are [size score, height residual, width residual, length residual] for 8 predefined sizes;
h. restoring the final target 3D bounding box according to the definitions of step g.
2. The 3D target detection method according to claim 1, wherein in step S3, the 2D bounding box obtained in step S2 is used for segmentation and resampling to obtain the frustum point cloud data containing the target; specifically, the point cloud is segmented with the 2D bounding box obtained in step S2 and resampled to 1024 points, yielding the frustum point cloud data containing the target.
3. The 3D target detection method according to claim 2, wherein in step S4, the RGB image acquired in step S1 is cropped with the 2D bounding box obtained in step S2 to obtain the target RGB image; specifically, the copyMakeBorder function of the OpenCV library is used to pad the edges with the gray value (128, 128, 128), producing a square image with an aspect ratio of 1:1, which is then resized to a fixed [224 × 224].
4. The 3D target detection method according to claim 3, wherein in step S5, the target RGB image obtained in step S4 is fed into the feature extraction network to obtain the RGB depth features; specifically, the target RGB image is input to a ResNet-50 network, which outputs a feature of shape [1 × 1 × 2048], and a [1 × 1, 128] convolution reduces its dimensionality, yielding the RGB depth feature γ of shape [1 × 1 × 128].
5. A device for implementing the 3D target detection method of claim 1, characterized by comprising an image acquisition module, a 2D target detection module, a segmentation-and-resampling module, a cropping module, a feature extraction module, a segmentation network module and a 3D box prediction network module, connected in series in that order; the image acquisition module is used for acquiring an original RGB image; the 2D target detection module is used for performing 2D target detection on the acquired RGB image to obtain a 2D bounding box and a target category; the segmentation-and-resampling module is used for segmenting and resampling the point cloud with the obtained 2D bounding box to obtain the frustum point cloud data containing the target; the cropping module is used for cropping the RGB image with the obtained 2D bounding box to obtain the target RGB image; the feature extraction module is used for extracting features from the target RGB image to obtain the RGB depth features; the segmentation network module is used for segmenting the frustum point cloud data together with the RGB depth features to obtain a segmentation mask and converting it into the target point cloud; and the 3D box prediction network module is used for resampling the target point cloud and feeding it into the 3D box prediction network to obtain the final target 3D bounding box.
CN202011494753.4A 2020-12-17 2020-12-17 3D target detection method and device Active CN112464905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011494753.4A CN112464905B (en) 2020-12-17 2020-12-17 3D target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011494753.4A CN112464905B (en) 2020-12-17 2020-12-17 3D target detection method and device

Publications (2)

Publication Number Publication Date
CN112464905A CN112464905A (en) 2021-03-09
CN112464905B (en) 2022-07-26

Family

ID=74803668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011494753.4A Active CN112464905B (en) 2020-12-17 2020-12-17 3D target detection method and device

Country Status (1)

Country Link
CN (1) CN112464905B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115496923B (en) * 2022-09-14 2023-10-20 北京化工大学 Multi-mode fusion target detection method and device based on uncertainty perception

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10824862B2 (en) * 2017-11-14 2020-11-03 Nuro, Inc. Three-dimensional object detection for autonomous robotic systems using image proposals

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390302A (en) * 2019-07-24 2019-10-29 厦门大学 A kind of objective detection method
CN110472534A (en) * 2019-07-31 2019-11-19 厦门理工学院 3D object detection method, device, equipment and storage medium based on RGB-D data
CN110879994A (en) * 2019-12-02 2020-03-13 中国科学院自动化研究所 Three-dimensional visual inspection detection method, system and device based on shape attention mechanism
CN111145174A (en) * 2020-01-02 2020-05-12 南京邮电大学 3D target detection method for point cloud screening based on image semantic features
CN111723721A (en) * 2020-06-15 2020-09-29 中国传媒大学 Three-dimensional target detection method, system and device based on RGB-D

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Charles R. Qi et al. "Frustum PointNets for 3D Object Detection from RGB-D Data." 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. *

Also Published As

Publication number Publication date
CN112464905A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN111027401B (en) End-to-end target detection method with integration of camera and laser radar
CN110264416B (en) Sparse point cloud segmentation method and device
CN114708585B (en) Attention mechanism-based millimeter wave radar and vision fusion three-dimensional target detection method
CN110674829B (en) Three-dimensional target detection method based on graph convolution attention network
CN111145174B (en) 3D target detection method for point cloud screening based on image semantic features
CN111563415B (en) Binocular vision-based three-dimensional target detection system and method
CN110298884B (en) Pose estimation method suitable for monocular vision camera in dynamic environment
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN112613378B (en) 3D target detection method, system, medium and terminal
JP2007527569A (en) Imminent collision detection based on stereoscopic vision
CN111797836B (en) Depth learning-based obstacle segmentation method for extraterrestrial celestial body inspection device
CN115116049B (en) Target detection method and device, electronic equipment and storage medium
CN111768415A (en) Image instance segmentation method without quantization pooling
CN113657409A (en) Vehicle loss detection method, device, electronic device and storage medium
TWI745204B (en) High-efficiency LiDAR object detection method based on deep learning
CN115035296B (en) Flying car 3D semantic segmentation method and system based on aerial view projection
Gluhaković et al. Vehicle detection in the autonomous vehicle environment for potential collision warning
CN114792416A (en) Target detection method and device
CN115641322A (en) Robot grabbing method and system based on 6D pose estimation
CN112464905B (en) 3D target detection method and device
CN117058646A (en) Complex road target detection method based on multi-mode fusion aerial view
CN115115917A (en) 3D point cloud target detection method based on attention mechanism and image feature fusion
CN113221957A (en) Radar information fusion characteristic enhancement method based on Centernet
EP3905107A1 (en) Computer-implemented method for 3d localization of an object based on image data and depth data
CN115035492B (en) Vehicle identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant