CN116051633B - 3D point cloud target detection method and device based on weighted relation perception - Google Patents

3D point cloud target detection method and device based on weighted relation perception

Info

Publication number
CN116051633B
CN116051633B (application CN202211618478.1A)
Authority
CN
China
Prior art keywords
processing
target
target candidate
voting
matrix
Prior art date
Legal status
Active
Application number
CN202211618478.1A
Other languages
Chinese (zh)
Other versions
CN116051633A (en)
Inventor
李骏
张新钰
王力
谢涛
陆晓敏
邓富强
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority claimed from CN202211618478.1A
Publication of CN116051633A
Application granted
Publication of CN116051633B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics


Abstract

The application provides a 3D point cloud target detection method and device based on weighted relation perception. The method comprises the following steps: acquiring original 3D point cloud data; processing the original 3D point cloud data with a PointNet++ network to obtain a plurality of seed points; processing the seed points with a pre-trained shared voting model to obtain a voting cluster comprising a plurality of votes; sampling and grouping the voting cluster to obtain a plurality of target candidates; updating the target candidates with a pre-trained weighted relation-aware proposal generation model to obtain updated target candidates; processing the updated target candidates with a multi-layer perceptron to obtain object proposals; and decoding the object proposals to obtain a target detection result. The method and device can improve the accuracy of target detection on 3D point cloud data.

Description

3D point cloud target detection method and device based on weighted relation perception
Technical Field
The application relates to the technical field of automatic driving, in particular to a 3D point cloud target detection method and device based on weighted relation sensing.
Background
When convolution is applied to target detection on 3D point cloud data, there are two common approaches: one projects the original point cloud onto an aligned structure, such as a voxel grid, to which three-dimensional convolution can be naturally applied; the other directly fuses the information of the irregular point cloud with max pooling. These methods perform well when the input scene is complete and clear, but real data is often incomplete and noisy, which makes it difficult for such implicit context fusion to extract the critical information.
To further exploit context information, a relationship graph between targets can be established and inference over the scene graph used to enhance understanding of the 3D scene; however, constructing a correct scene graph requires introducing additional regression supervision. Alternatively, all possible relationships in the scene can be exploited to avoid introducing additional labels, but even when a hierarchical structure is used to maintain context relationships, the full set of relationships remains redundant and may contain too many noise points. Furthermore, most methods that explicitly exploit context are specialized network architectures, which makes it difficult for them to improve existing detection methods.
Disclosure of Invention
In view of the above, the present application provides a method and apparatus for detecting a 3D point cloud target based on weighted relation sensing, so as to solve the above technical problems.
In a first aspect, an embodiment of the present application provides a 3D point cloud target detection method based on weighted relation awareness, including:
acquiring original 3D point cloud data;
processing the original 3D point cloud data by utilizing a PointNet++ network to obtain a plurality of seed points;
processing a plurality of seed points by utilizing a shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes;
sampling and grouping the voting clusters to obtain a plurality of target candidates;
updating the target candidate by using a weighting relation perception proposal generation model which is trained in advance to obtain an updated target candidate;
processing the updated target candidates by using a first multi-layer perceptron which is trained in advance to obtain an object proposal; decoding the object proposal to obtain a target detection result.
Further, the shared voting model adopts a second multi-layer perceptron, and processing the plurality of seed points with the pre-trained shared voting model to obtain a voting cluster comprising a plurality of votes comprises the following steps:
processing the i-th seed point with the second multi-layer perceptron to obtain the position offset Δp_i and the feature offset Δf_i of the i-th seed point; the i-th seed point is s_i = [p_i; f_i], where p_i is the three-dimensional position of the seed point and f_i is the point cloud feature of the seed point;
calculating the corrected three-dimensional position y_i and point cloud feature g_i of the i-th seed point:
y_i = p_i + Δp_i, g_i = f_i + Δf_i;
the vote of the i-th seed point is v_i = [y_i; g_i];
the voting cluster is V = {v_i}, i = 1, ..., M, where M is the number of seed points.
Further, sampling and grouping the voting cluster to obtain a plurality of target candidates comprises the following steps:
sampling the voting cluster with a farthest point sampling algorithm to obtain K center points;
taking each center point as a sphere center, acquiring the local neighborhood of each sphere center with a ball query, and taking each local neighborhood, i.e. all votes falling within the sphere, as one target candidate.
Further, the weighted relation perception proposal generation model comprises: the system comprises a weighted self-attention processing unit, a splicing unit, a third multi-layer perceptron and an addition unit; the weighted self-attention processing unit comprises a processing unit and a weighted self-attention layer; the processing unit includes: a first processing branch, a second processing branch, a third processing branch, and a multi-layer perceptron;
updating the target candidate by using a weighting relation perception proposal generation model which is trained in advance to obtain an updated target candidate; comprising the following steps:
processing the K target candidates with the first processing branch to obtain a matrix Q:
Q = W_Q · O(C)
where W_Q is a parameterized multi-layer perceptron and O(C) denotes the K target candidates;
processing the K target candidates with the second processing branch to obtain a matrix P:
P = W_P · O(C)
where W_P is a parameterized multi-layer perceptron;
processing the K target candidates with the third processing branch to obtain a matrix V:
V = W_V · O(C)
where W_V is a parameterized multi-layer perceptron;
predicting with the multi-layer perceptron MLP_1(·) whether each of the K target candidates produces a positive or negative effect, obtaining a prediction score vector w:
w = MLP_1(O(C))
processing the matrix Q, the matrix P, the matrix V and the prediction score vector w with the weighted self-attention layer to obtain the association Δ that characterizes the relations between different target candidates:
Δ = softmax(Q·P^T)·W·V
where softmax(·) is the weighted self-attention function, and the matrix W is the diagonal matrix whose diagonal is the prediction score vector w;
concatenating the association Δ with the K target candidates O(C) using the splicing unit to obtain the expanded target candidates O(C)‖Δ, where ‖ denotes the concatenation operation;
learning the expanded target candidates with the third multi-layer perceptron to obtain the corrections of the target candidates;
adding the target candidates and their corrections with the addition unit to obtain the updated target candidates O_r(C).
Further, processing the updated target candidates with the first multi-layer perceptron to obtain object proposals and decoding the object proposals to obtain a target detection result comprises the following steps:
processing the updated target candidates O_r(C) with the first multi-layer perceptron MLP_s(·) to obtain the object proposals P(C):
P(C) = MLP_s(O_r(C))
where P(C) is expressed as a multidimensional vector comprising objectness scores, bounding box parameters and semantic classification scores;
decoding the object proposals P(C) to obtain the target detection result.
Further, the method further comprises: jointly training the shared voting model, the weighted relation-aware proposal generation model, and the first multi-layer perceptron.
In a second aspect, an embodiment of the present application provides a 3D point cloud object detection apparatus based on weighted relation awareness, including:
the acquisition unit is used for acquiring original 3D point cloud data;
the seed point generation unit is used for processing the original 3D point cloud data by utilizing the PointNet++ network to obtain a plurality of seed points;
the voting unit is used for processing the plurality of seed points by utilizing the shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes;
the target candidate generation unit is used for sampling and grouping the voting clusters to obtain a plurality of target candidates;
the target candidate updating unit is used for updating the target candidate by utilizing the weighting relation perception proposal generating model which is trained in advance to obtain an updated target candidate;
the detection unit is used for processing the updated target candidates by using a first multi-layer perceptron which is trained in advance to obtain an object proposal; decoding the object proposal to obtain a target detection result.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the methods of the embodiments of the present application when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when executed by a processor, implement a method of embodiments of the present application.
According to the method and the device, the accuracy of target detection of the 3D point cloud data can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a 3D point cloud target detection method based on weighted relation awareness provided in an embodiment of the present application;
fig. 2 is a functional block diagram of a 3D point cloud object detection device based on weighted relation sensing according to an embodiment of the present application;
fig. 3 is a functional block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
First, the design concept of the embodiment of the present application will be briefly described.
In real applications such as augmented reality, detecting targets directly from a 3D point cloud is challenging. Three-dimensional target detection is to locate all target objects and simultaneously identify their semantic labels, which puts high demands on understanding the whole input scene. With the rapid development of deep learning and the continuous growth of online three-dimensional datasets, data-driven methods such as CNNs have been widely used for target detection. A key finding of these methods is that, for accurate detection, the context is as important as the detected object itself. However, the additional information provided in three dimensions also introduces noise and irregularity, which makes it more difficult to use convolution to collect the correct context information for detection.
To avoid the irregularity problem when convolution is applied to three-dimensional target detection, two typical approaches have recently appeared: one projects the original point cloud onto an aligned structure, such as a voxel grid, to which three-dimensional convolution can be naturally applied; the other directly fuses the information of the irregular point cloud with max pooling. These methods perform well when the input scene is complete and clear, but real data is often incomplete and noisy, which makes it difficult for such implicit context fusion to extract the critical information. To further exploit context information, some approaches explicitly use context for target detection. Establishing a relationship graph between targets is a conventional way to use context information; some of these methods use inference over a scene graph to enhance understanding of the 3D scene, but require additional regression supervision in order to construct a correct scene graph. Other approaches use all possible relationships in the scene to avoid introducing additional labels. However, even when a hierarchical structure is proposed to maintain context relationships, the full set of relationships remains redundant and may contain too many noise points. Furthermore, most methods that explicitly exploit context are specialized network architectures, which makes it difficult for them to improve existing detection methods.
For the problems, in order to better utilize the context information and weaken the influence of noise and redundant candidate objects on the context information, the application provides a 3D point cloud target detection method based on weighted relation perception, which can weaken the influence of noise or redundant candidate objects on the context information and improve the target detection precision.
After the application scenario and the design idea of the embodiment of the present application are introduced, the technical solution provided by the embodiment of the present application is described below.
As shown in fig. 1, the implementation of the present application provides a 3D point cloud target detection method based on weighted relation sensing, which includes:
step 101: acquiring original 3D point cloud data;
step 102: processing the original 3D point cloud data by utilizing a PointNet++ network to obtain a plurality of seed points;
PointNet++ is used as the network backbone. The backbone has a plurality of set-abstraction layers and feature-propagation (upsampling) layers with skip connections, which output a subset of the input points with XYZ position coordinates and a d-dimensional feature vector; the result is M seed points of dimension (3+d). The set of seed points is S = {s_i}, i = 1, ..., M, where the i-th seed point s_i = [p_i; f_i], p_i is the three-dimensional position of the seed point, and f_i is the point cloud feature of the seed point.
Step 103: processing a plurality of seed points by utilizing a shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes;
the shared voting model adopts the second multi-layer perceptron; the steps are as follows:
processing the i-th seed point with the second multi-layer perceptron to obtain the position offset Δp_i and the feature offset Δf_i of the i-th seed point;
calculating the corrected three-dimensional position y_i and point cloud feature g_i of the i-th seed point:
y_i = p_i + Δp_i, g_i = f_i + Δf_i;
the vote of the i-th seed point is v_i = [y_i; g_i]; the voting cluster is V = {v_i}, i = 1, ..., M.
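As an illustrative sketch of this shared voting step, the fragment below uses a tiny, randomly initialized two-layer network in place of the pre-trained second multi-layer perceptron (all weights, dimensions, and the ReLU choice are placeholder assumptions, not values from the application): each seed s_i = [p_i; f_i] is mapped to offsets (Δp_i, Δf_i), and the vote v_i = [p_i + Δp_i; f_i + Δf_i] is formed.

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, hidden = 1024, 256, 256            # number of seeds, feature dim, hidden width

p = rng.standard_normal((M, 3))          # seed positions p_i
f = rng.standard_normal((M, d))          # seed features f_i

# Placeholder weights standing in for the pre-trained second MLP.
W1 = rng.standard_normal((3 + d, hidden)) * 0.01
W2 = rng.standard_normal((hidden, 3 + d)) * 0.01

s = np.concatenate([p, f], axis=1)       # s_i = [p_i; f_i]
offsets = np.maximum(s @ W1, 0.0) @ W2   # shared MLP with ReLU
dp, df = offsets[:, :3], offsets[:, 3:]  # position offset Δp_i, feature offset Δf_i

y = p + dp                               # corrected position y_i
g = f + df                               # corrected feature g_i
votes = np.concatenate([y, g], axis=1)   # vote v_i = [y_i; g_i]; the M rows form the voting cluster
```

Because the voting MLP is shared across all seeds, a single pair of weight matrices processes every row at once.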
step 104: sampling and grouping the voting clusters to obtain a plurality of target candidates;
sampling the voting clusters by using a furthest point sampling algorithm to obtain K center points;
taking each center point as a sphere center, acquiring the local neighborhood of each sphere center with a ball query, and taking each local neighborhood, i.e. all votes falling within the sphere, as one target candidate.
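The two grouping operations above can be sketched in plain NumPy as follows; the function names, K = 8, and the query radius are illustrative choices, not values specified in the application.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Pick k indices from points (N, 3) that are mutually far apart."""
    n = points.shape[0]
    chosen = np.zeros(k, dtype=int)
    dist = np.full(n, np.inf)
    chosen[0] = 0  # start from an arbitrary point
    for i in range(1, k):
        # distance from every point to the nearest already-chosen center
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        chosen[i] = int(np.argmax(dist))  # next center: farthest remaining point
    return chosen

def ball_query(points, centers, radius):
    """For each center, return the indices of all points inside its sphere."""
    groups = []
    for c in centers:
        d = np.linalg.norm(points - c, axis=1)
        groups.append(np.nonzero(d <= radius)[0])
    return groups

rng = np.random.default_rng(1)
votes = rng.random((256, 3))                 # vote positions y_i
idx = farthest_point_sampling(votes, 8)      # K = 8 cluster centers
clusters = ball_query(votes, votes[idx], radius=0.2)  # one target candidate per sphere
```

Each entry of `clusters` is one target candidate: the set of all votes whose corrected positions fall within the sphere around a sampled center.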
Step 105: updating the target candidate by using a weighting relation perception proposal generation model which is trained in advance to obtain an updated target candidate;
the weighted relation perception proposal generation model comprises the following steps: the system comprises a weighted self-attention processing unit, a splicing unit, a third multi-layer perceptron and an addition unit; the weighted self-attention processing unit comprises a processing unit and a weighted self-attention layer; the processing unit includes: a first processing branch, a second processing branch, a third processing branch, and a multi-layer perceptron;
updating the target candidate by using a weighting relation perception proposal generation model which is trained in advance to obtain an updated target candidate; comprising the following steps:
processing the K target candidates with the first processing branch to obtain a matrix Q:
Q = W_Q · O(C)
where W_Q is a parameterized multi-layer perceptron and O(C) denotes the K target candidates;
processing the K target candidates with the second processing branch to obtain a matrix P:
P = W_P · O(C)
where W_P is a parameterized multi-layer perceptron;
processing the K target candidates with the third processing branch to obtain a matrix V:
V = W_V · O(C)
where W_V is a parameterized multi-layer perceptron;
predicting with the multi-layer perceptron MLP_1(·) whether each of the K target candidates produces a positive or negative effect, obtaining a prediction score vector w:
w = MLP_1(O(C))
processing the matrix Q, the matrix P, the matrix V and the prediction score vector w with the weighted self-attention layer to obtain the association Δ that characterizes the relations between different target candidates:
Δ = softmax(Q·P^T)·W·V
where softmax(·) is the weighted self-attention function, and the matrix W is the diagonal matrix whose diagonal is the prediction score vector w. In this way, a positive candidate will retrieve and aggregate more important information from the other candidates that are predicted to be positive.
concatenating the association Δ with the K target candidates O(C) using the splicing unit to obtain the expanded target candidates O(C)‖Δ, where ‖ denotes the concatenation operation;
learning the expanded target candidates with the third multi-layer perceptron MLP_2(·) to obtain the corrections of the target candidates: MLP_2(O(C)‖Δ);
adding the target candidates and their corrections with the addition unit to obtain the updated target candidates O_r(C):
O_r(C) = O(C) + MLP_2(O(C)‖Δ)
O_r(C) contains both the candidate itself and the context information from the surrounding candidates that produce a positive effect.
By learning the prediction score vector, clearer dependency information is allowed to pass between the target candidates, so that a target candidate learns more context information from other target candidates whose prediction scores are positive.
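The weighted relation-aware update can be condensed into a short NumPy sketch. All weight matrices below (and the single-linear-layer stand-ins with a tanh score head for W_Q, W_P, W_V, MLP_1, and MLP_2, as well as the dimensions K and c) are illustrative assumptions; the pre-trained branches in the application are full perceptrons.

```python
import numpy as np

rng = np.random.default_rng(0)
K, c = 16, 32                      # number of target candidates, channel dimension
O = rng.standard_normal((K, c))    # target candidates O(C), one per row

Wq = rng.standard_normal((c, c))   # stand-in for branch W_Q
Wp = rng.standard_normal((c, c))   # stand-in for branch W_P
Wv = rng.standard_normal((c, c))   # stand-in for branch W_V
w_head = rng.standard_normal((c, 1))        # stand-in for score head MLP_1
W2a = rng.standard_normal((2 * c, c)) * 0.1  # stand-in for MLP_2

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

Q, P, V = O @ Wq, O @ Wp, O @ Wv
w = np.tanh(O @ w_head).ravel()    # predicted positive/negative effect per candidate
W = np.diag(w)                     # prediction score vector placed on the diagonal
Delta = softmax(Q @ P.T) @ W @ V   # Δ = softmax(Q Pᵀ) W V
O_r = O + np.concatenate([O, Delta], axis=1) @ W2a  # O_r(C) = O(C) + MLP_2(O(C)‖Δ)
```

The diagonal matrix W rescales each candidate's contribution by its predicted score, so candidates judged negative contribute little (or with reversed sign) to the aggregated context.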
Step 106: processing the updated target candidates by using a first multi-layer perceptron to obtain an object proposal; decoding the object proposal to obtain a target detection result;
processing the updated target candidates O_r(C) with the first multi-layer perceptron MLP_s(·) to obtain the object proposals P(C):
P(C) = MLP_s(O_r(C))
where P(C) is expressed as a multidimensional vector comprising objectness scores, bounding box parameters and semantic classification scores. Each object proposal P(C) has 5 + 2NH + 4NS + NC channels, where NH is the number of heading bins, each contributing a classification score and a deviation from that bin's standard heading; NS is the number of size templates, each contributing a classification score and deviations from the template's standard length, width and height; and NC is the number of semantic classes.
Decoding the object proposal P (C) to obtain a target detection result.
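As a sketch of how one proposal vector with the 5 + 2NH + 4NS + NC channel layout described above might be decoded, the fragment below slices a dummy vector into its components. The exact channel ordering, and the values NH = 12, NS = 10, NC = 18, are assumptions for illustration only.

```python
import numpy as np

NH, NS, NC = 12, 10, 18
C = 5 + 2 * NH + 4 * NS + NC       # total channels per proposal
prop = np.arange(C, dtype=float)   # one proposal P(C) row, dummy values

i = 0
objectness = prop[i:i + 2]; i += 2                           # 2 objectness scores
center = prop[i:i + 3]; i += 3                               # 3 box-center coordinates
heading_scores = prop[i:i + NH]; i += NH                     # score per heading bin
heading_residuals = prop[i:i + NH]; i += NH                  # deviation from each bin's standard heading
size_scores = prop[i:i + NS]; i += NS                        # score per size template
size_residuals = prop[i:i + 3 * NS].reshape(NS, 3); i += 3 * NS  # L/W/H deviations per template
class_scores = prop[i:i + NC]; i += NC                       # semantic class scores
assert i == C  # every channel consumed exactly once
```

The decoded box takes the highest-scoring heading bin and size template, then applies the corresponding residuals to obtain the final oriented bounding box.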
Furthermore, the method comprises: jointly training the shared voting model, the weighted relation-aware proposal generation model, and the first multi-layer perceptron.
Based on the foregoing embodiments, the present application provides a 3D point cloud target detection device based on weighted relation sensing, and referring to fig. 2, the 3D point cloud target detection device 200 based on weighted relation sensing provided in the present application at least includes:
an acquiring unit 201, configured to acquire original 3D point cloud data;
a seed point generating unit 202, configured to process the original 3D point cloud data by using a pointnet++ network, so as to obtain a plurality of seed points;
a voting unit 203, configured to process a plurality of seed points by using a shared voting model that is trained in advance, so as to obtain a voting cluster that includes a plurality of votes;
a target candidate generating unit 204, configured to sample and group the voting clusters to obtain a plurality of target candidates;
a target candidate updating unit 205, configured to update a target candidate by using a weighted relation perception proposal generation model that is trained in advance, so as to obtain an updated target candidate;
a detection unit 206, configured to process the updated target candidate by using a first multi-layer perceptron that is trained in advance, so as to obtain an object proposal; decoding the object proposal to obtain a target detection result.
It should be noted that, the principle of the 3D point cloud target detection device 200 based on weighted relation sensing provided in the embodiment of the present application to solve the technical problem is similar to the 3D point cloud target detection method based on weighted relation sensing provided in the embodiment of the present application, so the implementation of the 3D point cloud target detection device 200 based on weighted relation sensing provided in the embodiment of the present application may refer to the implementation of the 3D point cloud target detection method based on weighted relation sensing provided in the embodiment of the present application, and the repetition is omitted.
Based on the foregoing embodiments, an embodiment of the present application further provides an electronic device. As shown in fig. 3, the electronic device 300 provided in the embodiment of the present application includes at least: a processor 301, a memory 302, and a computer program stored in the memory 302 and runnable on the processor 301, wherein the processor 301, when executing the computer program, implements the 3D point cloud target detection method based on weighted relation sensing.
The electronic device 300 provided by the embodiments of the present application may also include a bus 303 that connects the different components, including the processor 301 and the memory 302. Bus 303 represents one or more of several types of bus structures, including a memory bus, a peripheral bus, a local bus, and so forth.
The Memory 302 may include readable media in the form of volatile Memory, such as random access Memory (Random Access Memory, RAM) 3021 and/or cache Memory 3022, and may further include Read Only Memory (ROM) 3023.
The memory 302 may also include a program tool 3025 having a set (at least one) of program modules 3024, the program modules 3024 including, but not limited to: an operating subsystem, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The electronic device 300 may also communicate with one or more external devices 304 (e.g., keyboard, remote control, etc.), one or more devices that enable a user to interact with the electronic device 300 (e.g., cell phone, computer, etc.), and/or any device that enables the electronic device 300 to communicate with one or more other electronic devices 300 (e.g., router, modem, etc.). Such communication may occur through an Input/Output (I/O) interface 305. Also, electronic device 300 may communicate with one or more networks such as a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN), and/or a public network such as the internet via network adapter 306. As shown in fig. 3, the network adapter 306 communicates with other modules of the electronic device 300 over the bus 303. It should be appreciated that although not shown in fig. 3, other hardware and/or software modules may be used in connection with electronic device 300, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, disk array (Redundant Arrays of Independent Disks, RAID) subsystems, tape drives, data backup storage subsystems, and the like.
It should be noted that the electronic device 300 shown in fig. 3 is only an example, and should not impose any limitation on the functions and application scope of the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium, and the computer readable storage medium stores computer instructions which are executed by a processor to realize the 3D point cloud target detection method based on weighted relation sensing. Specifically, the executable program may be built-in or installed in the electronic device 300, so that the electronic device 300 may implement the 3D point cloud target detection method based on weighted relation sensing provided in the embodiments of the present application by executing the built-in or installed executable program.
The method provided by the embodiments of the present application may also be implemented as a program product comprising program code for causing an electronic device 300 to perform the weighted relation awareness based 3D point cloud object detection method provided by the embodiments of the present application when the program product is executable on the electronic device 300.
The program product provided by the embodiments of the present application may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product provided by the embodiments of the present application may be implemented as a CD-ROM and include program code that may also be run on a computing device. However, the program product provided by the embodiments of the present application is not limited thereto, and in the embodiments of the present application, the readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solution of the present application and are not limiting. Although the present application has been described in detail with reference to the embodiments, those skilled in the art should understand that modifications and equivalent substitutions may be made to the technical solutions of the present application without departing from the spirit and scope thereof, and all such modifications and equivalents are intended to be encompassed in the scope of the claims of the present application.

Claims (7)

1. A 3D point cloud target detection method based on weighted relation perception, characterized by comprising the following steps:
acquiring original 3D point cloud data;
processing the original 3D point cloud data by utilizing a PointNet++ network to obtain a plurality of seed points;
processing a plurality of seed points by utilizing a shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes;
sampling and grouping the voting clusters to obtain a plurality of target candidates;
updating the target candidates by using a weighted relation perception proposal generation model which is trained in advance to obtain updated target candidates;
processing the updated target candidates by using a first multi-layer perceptron which is trained in advance to obtain an object proposal; decoding the object proposal to obtain a target detection result;
the sampling and grouping the voting clusters to obtain a plurality of target candidates comprises:
sampling the voting cluster by using a furthest point sampling algorithm to obtain K center points;
taking each center point as a sphere center, acquiring the local neighborhood of each sphere center by using a ball query, and taking all votes within the local neighborhood of each sphere center as one target candidate;
the weighted relation perception proposal generation model comprises: a weighted self-attention processing unit, a splicing unit, a third multi-layer perceptron and an addition unit; the weighted self-attention processing unit comprises a processing unit and a weighted self-attention layer; and the processing unit comprises: a first processing branch, a second processing branch, a third processing branch, and a multi-layer perceptron;
the updating the target candidates by using the weighted relation perception proposal generation model which is trained in advance to obtain updated target candidates comprises:
processing the K target candidates by using the first processing branch to obtain a matrix Q:
Q = W_Q O(C)
wherein W_Q is a parameterized multi-layer perceptron, and O(C) represents the K target candidates;
processing the K target candidates by using the second processing branch to obtain a matrix P:
P = W_P O(C)
wherein W_P is a parameterized multi-layer perceptron;
processing the K target candidates by using the third processing branch to obtain a matrix V:
V = W_V O(C)
wherein W_V is a parameterized multi-layer perceptron;
predicting whether the K target candidates produce a positive or a negative effect by using the multi-layer perceptron MLP_1(·), to obtain a prediction score vector w:
w = MLP_1(O(C))
processing the matrix Q, the matrix P, the matrix V and the prediction score vector w by using the weighted self-attention layer to obtain a correlation Δ characterizing the relations between the different target candidates:
Δ = softmax(QP^T)WV
wherein softmax(·) is the weighted self-attention function, and the matrix W is a diagonal matrix with the prediction score vector w as its diagonal;
splicing the correlation Δ with the K target candidates O(C) by using the splicing unit to obtain an expanded target candidate O(C)⊕Δ, wherein ⊕ denotes a concatenation (juxtaposition) operation;
learning the expanded target candidate by using the third multi-layer perceptron to obtain a correction quantity of the target candidates;
adding the target candidates and the correction quantity of the target candidates by using the addition unit to obtain the updated target candidates O_r(C).
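As an illustrative aid only (not part of the claims), the weighted self-attention update of claim 1 can be sketched in NumPy. Plain linear maps stand in for the parameterized multi-layer perceptrons W_Q, W_P and W_V, and a toy correction head stands in for the third multi-layer perceptron; all shapes and names below are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def weighted_self_attention_update(O, Wq, Wp, Wv, w, mlp3):
    """One weighted self-attention refinement over K target candidates.

    O    : (K, d) candidate features O(C)
    Wq/Wp/Wv : (d, d) linear stand-ins for the three branch MLPs
    w    : (K,) prediction score vector from MLP_1
    mlp3 : stand-in for the third MLP, mapping (K, 2d) -> (K, d)
    """
    Q = O @ Wq                          # first branch:  Q = W_Q O(C)
    P = O @ Wp                          # second branch: P = W_P O(C)
    V = O @ Wv                          # third branch:  V = W_V O(C)
    W = np.diag(w)                      # score vector on the diagonal
    delta = softmax(Q @ P.T) @ W @ V    # Δ = softmax(Q P^T) W V
    expanded = np.concatenate([O, delta], axis=-1)  # O(C) ⊕ Δ
    correction = mlp3(expanded)         # third MLP learns the correction
    return O + correction               # residual update: O_r(C)
```

Note that when w is all zeros, Δ vanishes and the candidates pass through unchanged, which is the mechanism by which candidates predicted to have a negative effect are suppressed.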
2. The method of claim 1, wherein the shared voting model employs a second multi-layer perceptron, and the processing the plurality of seed points by utilizing the shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes comprises:
processing the ith seed point by using the second multi-layer perceptron to obtain a position offset Δp_i and a feature offset Δf_i of the ith seed point, wherein the ith seed point s_i = [p_i; f_i], p_i is the three-dimensional position of the seed point, and f_i is the point cloud feature of the seed point;
calculating the corrected three-dimensional position y_i and point cloud feature g_i of the ith seed point:
y_i = p_i + Δp_i, g_i = f_i + Δf_i
the vote of the ith seed point is v_i = [y_i, g_i];
the voting cluster is {v_i | i = 1, …, M}, wherein M is the number of seed points.
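For illustration only, the voting step of claim 2 can be sketched as follows; the shared `offset_head` below stands in for the second multi-layer perceptron, and the dimensions are assumptions.

```python
import numpy as np

def cast_votes(p, f, offset_head):
    """Each seed s_i = [p_i; f_i] predicts a position offset Δp_i and a
    feature offset Δf_i; its vote is v_i = [y_i, g_i] with
    y_i = p_i + Δp_i and g_i = f_i + Δf_i.

    p : (M, 3) seed positions; f : (M, F) seed features.
    """
    dp, df = offset_head(p, f)   # second MLP, shared across all M seeds
    y = p + dp                   # corrected 3D positions y_i
    g = f + df                   # corrected point-cloud features g_i
    return np.concatenate([y, g], axis=-1)  # voting cluster {v_i}, i = 1..M
```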
3. The method of claim 1, wherein the processing the updated target candidates with the first multi-layer perceptron to obtain an object proposal and decoding the object proposal to obtain the target detection result comprises:
processing the updated target candidates O_r(C) by using the first multi-layer perceptron MLP_s(·) to obtain an object proposal P(C):
P(C) = MLP_s(O_r(C))
wherein P(C) is expressed as a multidimensional vector comprising an objectness score, bounding box parameters and semantic classification scores;
decoding the object proposal P(C) to obtain the target detection result.
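For illustration only, decoding the proposal vector P(C) of claim 3 might look as follows. The 2 + 6 + num_classes layout (objectness logits, box center and size, class scores) is an assumed parameterization; the claim only states that P(C) contains an objectness score, bounding box parameters and semantic classification scores.

```python
import numpy as np

def decode_proposals(P, num_classes):
    """Split each row of the (K, 8 + num_classes) proposal matrix into its
    assumed components."""
    objectness = P[:, :2]                  # objectness logits
    bbox = P[:, 2:8]                       # box center (3) + box size (3)
    cls_scores = P[:, 8:8 + num_classes]   # semantic classification scores
    return objectness, bbox, cls_scores
```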
4. The method according to claim 1, wherein the method further comprises: performing joint training on the shared voting model, the weighted relation perception proposal generation model and the first multi-layer perceptron.
5. A 3D point cloud target detection apparatus based on weighted relation awareness, comprising:
the acquisition unit is used for acquiring original 3D point cloud data;
the seed point generation unit is used for processing the original 3D point cloud data by utilizing the PointNet++ network to obtain a plurality of seed points;
the voting unit is used for processing the plurality of seed points by utilizing the shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes;
the target candidate generation unit is used for sampling and grouping the voting clusters to obtain a plurality of target candidates;
the target candidate updating unit is used for updating the target candidates by utilizing a weighted relation perception proposal generation model which is trained in advance to obtain updated target candidates;
the detection unit is used for processing the updated target candidates by using a first multi-layer perceptron which is trained in advance to obtain an object proposal; decoding the object proposal to obtain a target detection result;
the sampling and grouping the voting clusters to obtain a plurality of target candidates comprises:
sampling the voting cluster by using a furthest point sampling algorithm to obtain K center points;
taking each center point as a sphere center, acquiring the local neighborhood of each sphere center by using a ball query, and taking all votes within the local neighborhood of each sphere center as one target candidate;
the weighted relation perception proposal generation model comprises: a weighted self-attention processing unit, a splicing unit, a third multi-layer perceptron and an addition unit; the weighted self-attention processing unit comprises a processing unit and a weighted self-attention layer; and the processing unit comprises: a first processing branch, a second processing branch, a third processing branch, and a multi-layer perceptron;
the updating the target candidates by using the weighted relation perception proposal generation model which is trained in advance to obtain updated target candidates comprises:
processing the K target candidates by using the first processing branch to obtain a matrix Q:
Q = W_Q O(C)
wherein W_Q is a parameterized multi-layer perceptron, and O(C) represents the K target candidates;
processing the K target candidates by using the second processing branch to obtain a matrix P:
P = W_P O(C)
wherein W_P is a parameterized multi-layer perceptron;
processing the K target candidates by using the third processing branch to obtain a matrix V:
V = W_V O(C)
wherein W_V is a parameterized multi-layer perceptron;
predicting whether the K target candidates produce a positive or a negative effect by using the multi-layer perceptron MLP_1(·), to obtain a prediction score vector w:
w = MLP_1(O(C))
processing the matrix Q, the matrix P, the matrix V and the prediction score vector w by using the weighted self-attention layer to obtain a correlation Δ characterizing the relations between the different target candidates:
Δ = softmax(QP^T)WV
wherein softmax(·) is the weighted self-attention function, and the matrix W is a diagonal matrix with the prediction score vector w as its diagonal;
splicing the correlation Δ with the K target candidates O(C) by using the splicing unit to obtain an expanded target candidate O(C)⊕Δ, wherein ⊕ denotes a concatenation (juxtaposition) operation;
learning the expanded target candidate by using the third multi-layer perceptron to obtain a correction quantity of the target candidates;
adding the target candidates and the correction quantity of the target candidates by using the addition unit to obtain the updated target candidates O_r(C).
6. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any of claims 1-4 when the computer program is executed.
7. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1-4.
CN202211618478.1A 2022-12-15 2022-12-15 3D point cloud target detection method and device based on weighted relation perception Active CN116051633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211618478.1A CN116051633B (en) 2022-12-15 2022-12-15 3D point cloud target detection method and device based on weighted relation perception


Publications (2)

Publication Number Publication Date
CN116051633A CN116051633A (en) 2023-05-02
CN116051633B true CN116051633B (en) 2024-02-13

Family

ID=86119129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211618478.1A Active CN116051633B (en) 2022-12-15 2022-12-15 3D point cloud target detection method and device based on weighted relation perception

Country Status (1)

Country Link
CN (1) CN116051633B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767447A (en) * 2021-01-25 2021-05-07 电子科技大学 Time-sensitive single-target tracking method based on depth Hough optimization voting, storage medium and terminal
CN112785526A (en) * 2021-01-28 2021-05-11 南京大学 Three-dimensional point cloud repairing method for graphic processing
CN113095205A (en) * 2021-04-07 2021-07-09 北京航空航天大学 Point cloud target detection method based on improved Hough voting
WO2022141718A1 (en) * 2020-12-31 2022-07-07 罗普特科技集团股份有限公司 Method and system for assisting point cloud-based object detection
CN114882495A (en) * 2022-04-02 2022-08-09 华南理工大学 3D target detection method based on context-aware feature aggregation
CN114972654A (en) * 2022-06-15 2022-08-30 清华大学 Three-dimensional target detection method based on roadside point cloud completion
CN115222954A (en) * 2022-06-09 2022-10-21 江汉大学 Weak perception target detection method and related equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deep Hough voting for 3D object detection in point clouds; C. R. Qi et al.; in Proceedings of the IEEE/CVF International Conference on Computer Vision; 9277–9286 *
Deep multi-modal fusion technology for autonomous-driving object detection; Zhang Xinyu et al.; 《智能***学报》; Vol. 15, No. 04; 758-771 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant