CN116051633B - 3D point cloud target detection method and device based on weighted relation perception - Google Patents

3D point cloud target detection method and device based on weighted relation perception

Info

Publication number
CN116051633B
CN116051633B (application CN202211618478.1A)
Authority
CN
China
Prior art keywords
processing
target
target candidate
voting
matrix
Prior art date
Legal status
Active
Application number
CN202211618478.1A
Other languages
Chinese (zh)
Other versions
CN116051633A (en)
Inventor
李骏
张新钰
王力
谢涛
陆晓敏
邓富强
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority claimed from CN202211618478.1A
Publication of CN116051633A
Application granted
Publication of CN116051633B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics


Abstract

The application provides a 3D point cloud target detection method and device based on weighted relation perception. The method comprises the following steps: acquiring original 3D point cloud data; processing the original 3D point cloud data with a PointNet++ network to obtain a plurality of seed points; processing the seed points with a pre-trained shared voting model to obtain a voting cluster comprising a plurality of votes; sampling and grouping the voting cluster to obtain a plurality of target candidates; updating the target candidates with a pre-trained weighted relation-aware proposal generation model to obtain updated target candidates; processing the updated target candidates with a multi-layer perceptron to obtain object proposals; and decoding the object proposals to obtain a target detection result. The method and device can improve the accuracy of target detection on 3D point cloud data.

Description

3D point cloud target detection method and device based on weighted relation perception
Technical Field
The application relates to the technical field of automatic driving, in particular to a 3D point cloud target detection method and device based on weighted relation sensing.
Background
When convolution is applied to target detection on 3D point cloud data, there are two common approaches: one projects the original point cloud onto an aligned structure, such as a voxel grid, to which three-dimensional convolution can be naturally applied; the other directly fuses the information of the irregular point cloud with max pooling. These methods perform well when the input scene is complete and clear, but real data is often incomplete and noisy, which makes it difficult for such implicit context fusion to extract the critical information.
To further exploit context information, a relationship graph between targets can be established and inference over the scene graph used to enhance understanding of the 3D scene; however, constructing a correct scene graph requires introducing additional regression supervision. Alternatively, all possible relationships in the scene can be exploited to avoid introducing additional labels, but even when a hierarchical structure is used to maintain context relationships, the full set of relationships remains redundant and may contain too many noise points. Furthermore, most methods that explicitly exploit context are specialized network architectures, which makes it difficult for them to improve existing detection methods.
Disclosure of Invention
In view of the above, the present application provides a method and apparatus for detecting a 3D point cloud target based on weighted relation sensing, so as to solve the above technical problems.
In a first aspect, an embodiment of the present application provides a 3D point cloud target detection method based on weighted relation awareness, including:
acquiring original 3D point cloud data;
processing the original 3D point cloud data by utilizing a PointNet++ network to obtain a plurality of seed points;
processing a plurality of seed points by utilizing a shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes;
sampling and grouping the voting clusters to obtain a plurality of target candidates;
updating the target candidate by using a weighting relation perception proposal generation model which is trained in advance to obtain an updated target candidate;
processing the updated target candidates by using a first multi-layer perceptron which is trained in advance to obtain an object proposal; decoding the object proposal to obtain a target detection result.
Further, the shared voting model adopts a second multi-layer perceptron, and processing the plurality of seed points with the pre-trained shared voting model to obtain a voting cluster comprising a plurality of votes comprises the following steps:
processing the i-th seed point with the second multi-layer perceptron to obtain the position offset Δp_i and the feature offset Δf_i of the i-th seed point; the i-th seed point is s_i = [p_i; f_i], where p_i is the three-dimensional position of the seed point and f_i is the point cloud feature of the seed point;
calculating the corrected three-dimensional position y_i and point cloud feature g_i of the i-th seed point:
y_i = p_i + Δp_i, g_i = f_i + Δf_i;
the vote of the i-th seed point is v_i = [y_i; g_i];
the voting cluster is V = {v_i}, i = 1, ..., M, where M is the number of seed points.
Further, sampling and grouping the voting cluster to obtain a plurality of target candidates comprises the following steps:
sampling the voting cluster with a farthest point sampling algorithm to obtain K center points;
taking each center point as a sphere center, acquiring the local neighborhood of each sphere center with a ball query, and taking each local neighborhood, i.e. all votes falling within the sphere, as one target candidate.
Further, the weighted relation perception proposal generation model comprises: the system comprises a weighted self-attention processing unit, a splicing unit, a third multi-layer perceptron and an addition unit; the weighted self-attention processing unit comprises a processing unit and a weighted self-attention layer; the processing unit includes: a first processing branch, a second processing branch, a third processing branch, and a multi-layer perceptron;
updating the target candidate by using a weighting relation perception proposal generation model which is trained in advance to obtain an updated target candidate; comprising the following steps:
processing the K target candidates with the first processing branch to obtain a matrix Q:
Q = W_Q · O(C)
where W_Q is a parameterized multi-layer perceptron and O(C) denotes the K target candidates;
processing the K target candidates with the second processing branch to obtain a matrix P:
P = W_P · O(C)
where W_P is a parameterized multi-layer perceptron;
processing the K target candidates with the third processing branch to obtain a matrix V:
V = W_V · O(C)
where W_V is a parameterized multi-layer perceptron;
predicting with the multi-layer perceptron MLP_1(·) whether each of the K target candidates produces a positive or negative effect, obtaining a prediction score vector w:
w = MLP_1(O(C))
processing the matrix Q, the matrix P, the matrix V and the prediction score vector w with the weighted self-attention layer to obtain the association Δ that characterizes the relations between different target candidates:
Δ = softmax(Q·P^T)·W·V
where softmax(·) is the weighted self-attention function, and the matrix W is the diagonal matrix whose diagonal is the prediction score vector w;
concatenating the association Δ with the K target candidates O(C) using the splicing unit to obtain the expanded target candidates O(C)‖Δ, where ‖ denotes the concatenation operation;
learning the expanded target candidates with the third multi-layer perceptron to obtain the corrections of the target candidates;
adding the target candidates and their corrections with the addition unit to obtain the updated target candidates O_r(C).
Further, processing the updated target candidates with the first multi-layer perceptron to obtain object proposals and decoding the object proposals to obtain a target detection result comprises the following steps:
processing the updated target candidates O_r(C) with the first multi-layer perceptron MLP_s(·) to obtain the object proposals P(C):
P(C) = MLP_s(O_r(C))
where P(C) is expressed as a multidimensional vector comprising objectness scores, bounding box parameters and semantic classification scores;
decoding the object proposals P(C) to obtain the target detection result.
Further, the method further comprises: jointly training the shared voting model, the weighted relation-aware proposal generation model, and the first multi-layer perceptron.
In a second aspect, an embodiment of the present application provides a 3D point cloud object detection apparatus based on weighted relation awareness, including:
the acquisition unit is used for acquiring original 3D point cloud data;
the seed point generation unit is used for processing the original 3D point cloud data by utilizing the PointNet++ network to obtain a plurality of seed points;
the voting unit is used for processing the plurality of seed points by utilizing the shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes;
the target candidate generation unit is used for sampling and grouping the voting clusters to obtain a plurality of target candidates;
the target candidate updating unit is used for updating the target candidate by utilizing the weighting relation perception proposal generating model which is trained in advance to obtain an updated target candidate;
the detection unit is used for processing the updated target candidates by using a first multi-layer perceptron which is trained in advance to obtain an object proposal; decoding the object proposal to obtain a target detection result.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the methods of the embodiments of the present application when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when executed by a processor, implement a method of embodiments of the present application.
According to the method and the device, the accuracy of target detection of the 3D point cloud data can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a 3D point cloud target detection method based on weighted relation awareness provided in an embodiment of the present application;
fig. 2 is a functional block diagram of a 3D point cloud object detection device based on weighted relation sensing according to an embodiment of the present application;
fig. 3 is a functional block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
First, the design concept of the embodiment of the present application will be briefly described.
In real applications such as augmented reality, detecting targets directly from a 3D point cloud is challenging. Three-dimensional target detection is to locate all target objects and simultaneously identify their semantic labels, which puts high demands on understanding the whole input scene. With the rapid development of deep learning and the continuous growth of online three-dimensional datasets, data-driven methods such as CNNs have been widely used for target detection. A key finding of these methods is that, for accurate detection, the context is as important as the detected object itself. However, the additional information provided in three dimensions also introduces noise and irregularity, which makes it more difficult to use convolution to collect the correct context information for detection.
To avoid the irregularity problem when convolution is applied to three-dimensional target detection, two typical approaches have recently appeared: one projects the original point cloud onto an aligned structure, such as a voxel grid, to which three-dimensional convolution can be naturally applied; the other directly fuses the information of the irregular point cloud with max pooling. These methods perform well when the input scene is complete and clear, but real data is often incomplete and noisy, which makes it difficult for such implicit context fusion to extract the critical information. To further exploit context information, some approaches explicitly use context for target detection. Establishing a relationship graph between targets is a conventional way to use context information; some of these methods use inference over a scene graph to enhance understanding of the 3D scene, but require additional regression supervision in order to construct a correct scene graph. Other approaches use all possible relationships in the scene to avoid introducing additional labels. However, even when a hierarchical structure is proposed to maintain context relationships, the full set of relationships remains redundant and may contain too many noise points. Furthermore, most methods that explicitly exploit context are specialized network architectures, which makes it difficult for them to improve existing detection methods.
For the problems, in order to better utilize the context information and weaken the influence of noise and redundant candidate objects on the context information, the application provides a 3D point cloud target detection method based on weighted relation perception, which can weaken the influence of noise or redundant candidate objects on the context information and improve the target detection precision.
After the application scenario and the design idea of the embodiment of the present application are introduced, the technical solution provided by the embodiment of the present application is described below.
As shown in fig. 1, the implementation of the present application provides a 3D point cloud target detection method based on weighted relation sensing, which includes:
step 101: acquiring original 3D point cloud data;
step 102: processing the original 3D point cloud data by utilizing a PointNet++ network to obtain a plurality of seed points;
PointNet++ is used as the network backbone. The backbone has a plurality of set-abstraction layers and feature-propagation (upsampling) layers with skip connections, which output a subset of the input points with XYZ position coordinates and a d-dimensional feature vector; the result is M seed points of dimension (3+d). The set of seed points is S = {s_i}, i = 1, ..., M, where the i-th seed point s_i = [p_i; f_i], p_i is the three-dimensional position of the seed point, and f_i is the point cloud feature of the seed point.
Step 103: processing a plurality of seed points by utilizing a shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes;
the shared voting model adopts the second multi-layer perceptron; the steps are as follows:
processing the i-th seed point with the second multi-layer perceptron to obtain the position offset Δp_i and the feature offset Δf_i of the i-th seed point;
calculating the corrected three-dimensional position y_i and point cloud feature g_i of the i-th seed point:
y_i = p_i + Δp_i, g_i = f_i + Δf_i;
the vote of the i-th seed point is v_i = [y_i; g_i]; the voting cluster is V = {v_i}, i = 1, ..., M.
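As an illustrative sketch of this shared voting step, the fragment below uses a tiny, randomly initialized two-layer network in place of the pre-trained second multi-layer perceptron (all weights, dimensions, and the ReLU choice are placeholder assumptions, not values from the application): each seed s_i = [p_i; f_i] is mapped to offsets (Δp_i, Δf_i), and the vote v_i = [p_i + Δp_i; f_i + Δf_i] is formed.

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, hidden = 1024, 256, 256            # number of seeds, feature dim, hidden width

p = rng.standard_normal((M, 3))          # seed positions p_i
f = rng.standard_normal((M, d))          # seed features f_i

# Placeholder weights standing in for the pre-trained second MLP.
W1 = rng.standard_normal((3 + d, hidden)) * 0.01
W2 = rng.standard_normal((hidden, 3 + d)) * 0.01

s = np.concatenate([p, f], axis=1)       # s_i = [p_i; f_i]
offsets = np.maximum(s @ W1, 0.0) @ W2   # shared MLP with ReLU
dp, df = offsets[:, :3], offsets[:, 3:]  # position offset Δp_i, feature offset Δf_i

y = p + dp                               # corrected position y_i
g = f + df                               # corrected feature g_i
votes = np.concatenate([y, g], axis=1)   # vote v_i = [y_i; g_i]; the M rows form the voting cluster
```

Because the voting MLP is shared across all seeds, a single pair of weight matrices processes every row at once.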
step 104: sampling and grouping the voting clusters to obtain a plurality of target candidates;
sampling the voting clusters by using a furthest point sampling algorithm to obtain K center points;
taking each center point as a sphere center, acquiring the local neighborhood of each sphere center with a ball query, and taking each local neighborhood, i.e. all votes falling within the sphere, as one target candidate.
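The two grouping operations above can be sketched in plain NumPy as follows; the function names, K = 8, and the query radius are illustrative choices, not values specified in the application.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Pick k indices from points (N, 3) that are mutually far apart."""
    n = points.shape[0]
    chosen = np.zeros(k, dtype=int)
    dist = np.full(n, np.inf)
    chosen[0] = 0  # start from an arbitrary point
    for i in range(1, k):
        # distance from every point to the nearest already-chosen center
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        chosen[i] = int(np.argmax(dist))  # next center: farthest remaining point
    return chosen

def ball_query(points, centers, radius):
    """For each center, return the indices of all points inside its sphere."""
    groups = []
    for c in centers:
        d = np.linalg.norm(points - c, axis=1)
        groups.append(np.nonzero(d <= radius)[0])
    return groups

rng = np.random.default_rng(1)
votes = rng.random((256, 3))                 # vote positions y_i
idx = farthest_point_sampling(votes, 8)      # K = 8 cluster centers
clusters = ball_query(votes, votes[idx], radius=0.2)  # one target candidate per sphere
```

Each entry of `clusters` is one target candidate: the set of all votes whose corrected positions fall within the sphere around a sampled center.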
Step 105: updating the target candidate by using a weighting relation perception proposal generation model which is trained in advance to obtain an updated target candidate;
the weighted relation perception proposal generation model comprises the following steps: the system comprises a weighted self-attention processing unit, a splicing unit, a third multi-layer perceptron and an addition unit; the weighted self-attention processing unit comprises a processing unit and a weighted self-attention layer; the processing unit includes: a first processing branch, a second processing branch, a third processing branch, and a multi-layer perceptron;
updating the target candidate by using a weighting relation perception proposal generation model which is trained in advance to obtain an updated target candidate; comprising the following steps:
processing the K target candidates with the first processing branch to obtain a matrix Q:
Q = W_Q · O(C)
where W_Q is a parameterized multi-layer perceptron and O(C) denotes the K target candidates;
processing the K target candidates with the second processing branch to obtain a matrix P:
P = W_P · O(C)
where W_P is a parameterized multi-layer perceptron;
processing the K target candidates with the third processing branch to obtain a matrix V:
V = W_V · O(C)
where W_V is a parameterized multi-layer perceptron;
predicting with the multi-layer perceptron MLP_1(·) whether each of the K target candidates produces a positive or negative effect, obtaining a prediction score vector w:
w = MLP_1(O(C))
processing the matrix Q, the matrix P, the matrix V and the prediction score vector w with the weighted self-attention layer to obtain the association Δ that characterizes the relations between different target candidates:
Δ = softmax(Q·P^T)·W·V
where softmax(·) is the weighted self-attention function, and the matrix W is the diagonal matrix whose diagonal is the prediction score vector w. In this way, a positive candidate will retrieve and aggregate more important information from the other candidates that are predicted to be positive.
concatenating the association Δ with the K target candidates O(C) using the splicing unit to obtain the expanded target candidates O(C)‖Δ, where ‖ denotes the concatenation operation;
learning the expanded target candidates with the third multi-layer perceptron MLP_2(·) to obtain the corrections of the target candidates: MLP_2(O(C)‖Δ);
adding the target candidates and their corrections with the addition unit to obtain the updated target candidates O_r(C):
O_r(C) = O(C) + MLP_2(O(C)‖Δ)
O_r(C) contains both the candidate itself and the context information from the surrounding candidates that produce a positive effect.
By learning the prediction score vector, clearer dependency information is allowed to pass between the target candidates, so that a target candidate learns more context information from other target candidates whose prediction scores are positive.
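The weighted relation-aware update can be condensed into a short NumPy sketch. All weight matrices below (and the single-linear-layer stand-ins with a tanh score head for W_Q, W_P, W_V, MLP_1, and MLP_2, as well as the dimensions K and c) are illustrative assumptions; the pre-trained branches in the application are full perceptrons.

```python
import numpy as np

rng = np.random.default_rng(0)
K, c = 16, 32                      # number of target candidates, channel dimension
O = rng.standard_normal((K, c))    # target candidates O(C), one per row

Wq = rng.standard_normal((c, c))   # stand-in for branch W_Q
Wp = rng.standard_normal((c, c))   # stand-in for branch W_P
Wv = rng.standard_normal((c, c))   # stand-in for branch W_V
w_head = rng.standard_normal((c, 1))        # stand-in for score head MLP_1
W2a = rng.standard_normal((2 * c, c)) * 0.1  # stand-in for MLP_2

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

Q, P, V = O @ Wq, O @ Wp, O @ Wv
w = np.tanh(O @ w_head).ravel()    # predicted positive/negative effect per candidate
W = np.diag(w)                     # prediction score vector placed on the diagonal
Delta = softmax(Q @ P.T) @ W @ V   # Δ = softmax(Q Pᵀ) W V
O_r = O + np.concatenate([O, Delta], axis=1) @ W2a  # O_r(C) = O(C) + MLP_2(O(C)‖Δ)
```

The diagonal matrix W rescales each candidate's contribution by its predicted score, so candidates judged negative contribute little (or with reversed sign) to the aggregated context.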
Step 106: processing the updated target candidates by using a first multi-layer perceptron to obtain an object proposal; decoding the object proposal to obtain a target detection result;
processing the updated target candidates O_r(C) with the first multi-layer perceptron MLP_s(·) to obtain the object proposals P(C):
P(C) = MLP_s(O_r(C))
where P(C) is expressed as a multidimensional vector comprising objectness scores, bounding box parameters and semantic classification scores. Each object proposal P(C) has 5 + 2NH + 4NS + NC channels, where NH is the number of heading bins, each contributing a classification score and a deviation from that bin's standard heading; NS is the number of size templates, each contributing a classification score and deviations from the template's standard length, width and height; and NC is the number of semantic classes.
Decoding the object proposal P (C) to obtain a target detection result.
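As a sketch of how one proposal vector with the 5 + 2NH + 4NS + NC channel layout described above might be decoded, the fragment below slices a dummy vector into its components. The exact channel ordering, and the values NH = 12, NS = 10, NC = 18, are assumptions for illustration only.

```python
import numpy as np

NH, NS, NC = 12, 10, 18
C = 5 + 2 * NH + 4 * NS + NC       # total channels per proposal
prop = np.arange(C, dtype=float)   # one proposal P(C) row, dummy values

i = 0
objectness = prop[i:i + 2]; i += 2                           # 2 objectness scores
center = prop[i:i + 3]; i += 3                               # 3 box-center coordinates
heading_scores = prop[i:i + NH]; i += NH                     # score per heading bin
heading_residuals = prop[i:i + NH]; i += NH                  # deviation from each bin's standard heading
size_scores = prop[i:i + NS]; i += NS                        # score per size template
size_residuals = prop[i:i + 3 * NS].reshape(NS, 3); i += 3 * NS  # L/W/H deviations per template
class_scores = prop[i:i + NC]; i += NC                       # semantic class scores
assert i == C  # every channel consumed exactly once
```

The decoded box takes the highest-scoring heading bin and size template, then applies the corresponding residuals to obtain the final oriented bounding box.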
Furthermore, the method comprises: jointly training the shared voting model, the weighted relation-aware proposal generation model, and the first multi-layer perceptron.
Based on the foregoing embodiments, the present application provides a 3D point cloud target detection device based on weighted relation sensing, and referring to fig. 2, the 3D point cloud target detection device 200 based on weighted relation sensing provided in the present application at least includes:
an acquiring unit 201, configured to acquire original 3D point cloud data;
a seed point generating unit 202, configured to process the original 3D point cloud data by using a pointnet++ network, so as to obtain a plurality of seed points;
a voting unit 203, configured to process a plurality of seed points by using a shared voting model that is trained in advance, so as to obtain a voting cluster that includes a plurality of votes;
a target candidate generating unit 204, configured to sample and group the voting clusters to obtain a plurality of target candidates;
a target candidate updating unit 205, configured to update a target candidate by using a weighted relation perception proposal generation model that is trained in advance, so as to obtain an updated target candidate;
a detection unit 206, configured to process the updated target candidate by using a first multi-layer perceptron that is trained in advance, so as to obtain an object proposal; decoding the object proposal to obtain a target detection result.
It should be noted that, the principle of the 3D point cloud target detection device 200 based on weighted relation sensing provided in the embodiment of the present application to solve the technical problem is similar to the 3D point cloud target detection method based on weighted relation sensing provided in the embodiment of the present application, so the implementation of the 3D point cloud target detection device 200 based on weighted relation sensing provided in the embodiment of the present application may refer to the implementation of the 3D point cloud target detection method based on weighted relation sensing provided in the embodiment of the present application, and the repetition is omitted.
Based on the foregoing embodiments, an embodiment of the present application further provides an electronic device. As shown in fig. 3, the electronic device 300 provided in the embodiment of the present application includes at least: a processor 301, a memory 302, and a computer program stored in the memory 302 and runnable on the processor 301, wherein the processor 301, when executing the computer program, implements the 3D point cloud target detection method based on weighted relation sensing.
The electronic device 300 provided by the embodiments of the present application may also include a bus 303 that connects the different components, including the processor 301 and the memory 302. Bus 303 represents one or more of several types of bus structures, including a memory bus, a peripheral bus, a local bus, and so forth.
The Memory 302 may include readable media in the form of volatile Memory, such as random access Memory (Random Access Memory, RAM) 3021 and/or cache Memory 3022, and may further include Read Only Memory (ROM) 3023.
The memory 302 may also include a program tool 3025 having a set (at least one) of program modules 3024, the program modules 3024 including, but not limited to: an operating subsystem, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The electronic device 300 may also communicate with one or more external devices 304 (e.g., keyboard, remote control, etc.), one or more devices that enable a user to interact with the electronic device 300 (e.g., cell phone, computer, etc.), and/or any device that enables the electronic device 300 to communicate with one or more other electronic devices 300 (e.g., router, modem, etc.). Such communication may occur through an Input/Output (I/O) interface 305. Also, electronic device 300 may communicate with one or more networks such as a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN), and/or a public network such as the internet via network adapter 306. As shown in fig. 3, the network adapter 306 communicates with other modules of the electronic device 300 over the bus 303. It should be appreciated that although not shown in fig. 3, other hardware and/or software modules may be used in connection with electronic device 300, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, disk array (Redundant Arrays of Independent Disks, RAID) subsystems, tape drives, data backup storage subsystems, and the like.
It should be noted that the electronic device 300 shown in fig. 3 is only an example, and should not impose any limitation on the functions and application scope of the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium, and the computer readable storage medium stores computer instructions which are executed by a processor to realize the 3D point cloud target detection method based on weighted relation sensing. Specifically, the executable program may be built-in or installed in the electronic device 300, so that the electronic device 300 may implement the 3D point cloud target detection method based on weighted relation sensing provided in the embodiments of the present application by executing the built-in or installed executable program.
The method provided by the embodiments of the present application may also be implemented as a program product comprising program code for causing an electronic device 300 to perform the weighted relation awareness based 3D point cloud object detection method provided by the embodiments of the present application when the program product is executable on the electronic device 300.
The program product provided by the embodiments of the present application may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product provided by the embodiments of the present application may be implemented as a CD-ROM and include program code that may also be run on a computing device. However, the program product provided by the embodiments of the present application is not limited thereto, and in the embodiments of the present application, the readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solution of the present application and are not limiting. Although the present application has been described in detail with reference to the embodiments, those skilled in the art should understand that modifications and equivalent substitutions may be made to the technical solutions of the present application without departing from the spirit and scope thereof, and all such modifications and equivalents are intended to be encompassed in the scope of the claims of the present application.

Claims (7)

1. A 3D point cloud target detection method based on weighted relation perception, characterized by comprising the following steps:
acquiring original 3D point cloud data;
processing the original 3D point cloud data by utilizing a PointNet++ network to obtain a plurality of seed points;
processing a plurality of seed points by utilizing a shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes;
sampling and grouping the voting clusters to obtain a plurality of target candidates;
updating the target candidates by using a weighted relation perception proposal generation model which is trained in advance to obtain updated target candidates;
processing the updated target candidates by using a first multi-layer perceptron which is trained in advance to obtain an object proposal; decoding the object proposal to obtain a target detection result;
the sampling and grouping the voting clusters to obtain a plurality of target candidates comprises:
sampling the voting cluster by using a furthest point sampling algorithm to obtain K center points;
taking each center point as a sphere center, acquiring the local neighborhood of each sphere center by using a ball query, and taking all votes within the local neighborhood of each sphere center as one target candidate;
the weighted relation perception proposal generation model comprises: a weighted self-attention processing unit, a splicing unit, a third multi-layer perceptron and an addition unit; the weighted self-attention processing unit comprises a processing unit and a weighted self-attention layer; and the processing unit comprises: a first processing branch, a second processing branch, a third processing branch, and a multi-layer perceptron;
the updating the target candidates by using the weighted relation perception proposal generation model which is trained in advance to obtain updated target candidates comprises:
processing the K target candidates by using the first processing branch to obtain a matrix Q:
Q = W_Q O(C)
wherein W_Q is a parameterized multi-layer perceptron, and O(C) represents the K target candidates;
processing the K target candidates by using the second processing branch to obtain a matrix P:
P = W_P O(C)
wherein W_P is a parameterized multi-layer perceptron;
processing the K target candidates by using the third processing branch to obtain a matrix V:
V = W_V O(C)
wherein W_V is a parameterized multi-layer perceptron;
predicting whether the K target candidates produce a positive or a negative effect by using the multi-layer perceptron MLP_1(·), to obtain a prediction score vector w:
w = MLP_1(O(C))
processing the matrix Q, the matrix P, the matrix V and the prediction score vector w by using the weighted self-attention layer to obtain a correlation Δ characterizing the relations between the different target candidates:
Δ = softmax(QP^T)WV
wherein softmax(·) is the weighted self-attention function, and the matrix W is a diagonal matrix with the prediction score vector w as its diagonal;
splicing the correlation Δ with the K target candidates O(C) by using the splicing unit to obtain an expanded target candidate O(C)⊕Δ, wherein ⊕ denotes a concatenation (juxtaposition) operation;
learning the expanded target candidate by using the third multi-layer perceptron to obtain a correction quantity of the target candidates;
adding the target candidates and the correction quantity of the target candidates by using the addition unit to obtain the updated target candidates O_r(C).
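As an illustrative aid only (not part of the claims), the weighted self-attention update of claim 1 can be sketched in NumPy. Plain linear maps stand in for the parameterized multi-layer perceptrons W_Q, W_P and W_V, and a toy correction head stands in for the third multi-layer perceptron; all shapes and names below are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def weighted_self_attention_update(O, Wq, Wp, Wv, w, mlp3):
    """One weighted self-attention refinement over K target candidates.

    O    : (K, d) candidate features O(C)
    Wq/Wp/Wv : (d, d) linear stand-ins for the three branch MLPs
    w    : (K,) prediction score vector from MLP_1
    mlp3 : stand-in for the third MLP, mapping (K, 2d) -> (K, d)
    """
    Q = O @ Wq                          # first branch:  Q = W_Q O(C)
    P = O @ Wp                          # second branch: P = W_P O(C)
    V = O @ Wv                          # third branch:  V = W_V O(C)
    W = np.diag(w)                      # score vector on the diagonal
    delta = softmax(Q @ P.T) @ W @ V    # Δ = softmax(Q P^T) W V
    expanded = np.concatenate([O, delta], axis=-1)  # O(C) ⊕ Δ
    correction = mlp3(expanded)         # third MLP learns the correction
    return O + correction               # residual update: O_r(C)
```

Note that when w is all zeros, Δ vanishes and the candidates pass through unchanged, which is the mechanism by which candidates predicted to have a negative effect are suppressed.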
2. The method of claim 1, wherein the shared voting model employs a second multi-layer perceptron, and the processing the plurality of seed points by utilizing the shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes comprises:
processing the ith seed point by using the second multi-layer perceptron to obtain a position offset Δp_i and a feature offset Δf_i of the ith seed point, wherein the ith seed point s_i = [p_i; f_i], p_i is the three-dimensional position of the seed point, and f_i is the point cloud feature of the seed point;
calculating the corrected three-dimensional position y_i and point cloud feature g_i of the ith seed point:
y_i = p_i + Δp_i, g_i = f_i + Δf_i
the vote of the ith seed point is v_i = [y_i, g_i];
the voting cluster is {v_i | i = 1, …, M}, wherein M is the number of seed points.
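For illustration only, the voting step of claim 2 can be sketched as follows; the shared `offset_head` below stands in for the second multi-layer perceptron, and the dimensions are assumptions.

```python
import numpy as np

def cast_votes(p, f, offset_head):
    """Each seed s_i = [p_i; f_i] predicts a position offset Δp_i and a
    feature offset Δf_i; its vote is v_i = [y_i, g_i] with
    y_i = p_i + Δp_i and g_i = f_i + Δf_i.

    p : (M, 3) seed positions; f : (M, F) seed features.
    """
    dp, df = offset_head(p, f)   # second MLP, shared across all M seeds
    y = p + dp                   # corrected 3D positions y_i
    g = f + df                   # corrected point-cloud features g_i
    return np.concatenate([y, g], axis=-1)  # voting cluster {v_i}, i = 1..M
```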
3. The method of claim 1, wherein the processing the updated target candidates with the first multi-layer perceptron to obtain an object proposal and decoding the object proposal to obtain the target detection result comprises:
processing the updated target candidates O_r(C) by using the first multi-layer perceptron MLP_s(·) to obtain an object proposal P(C):
P(C) = MLP_s(O_r(C))
wherein P(C) is expressed as a multidimensional vector comprising an objectness score, bounding box parameters and semantic classification scores;
decoding the object proposal P(C) to obtain the target detection result.
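For illustration only, decoding the proposal vector P(C) of claim 3 might look as follows. The 2 + 6 + num_classes layout (objectness logits, box center and size, class scores) is an assumed parameterization; the claim only states that P(C) contains an objectness score, bounding box parameters and semantic classification scores.

```python
import numpy as np

def decode_proposals(P, num_classes):
    """Split each row of the (K, 8 + num_classes) proposal matrix into its
    assumed components."""
    objectness = P[:, :2]                  # objectness logits
    bbox = P[:, 2:8]                       # box center (3) + box size (3)
    cls_scores = P[:, 8:8 + num_classes]   # semantic classification scores
    return objectness, bbox, cls_scores
```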
4. The method according to claim 1, wherein the method further comprises: performing joint training on the shared voting model, the weighted relation perception proposal generation model and the first multi-layer perceptron.
5. A 3D point cloud target detection apparatus based on weighted relation awareness, comprising:
the acquisition unit is used for acquiring original 3D point cloud data;
the seed point generation unit is used for processing the original 3D point cloud data by utilizing the PointNet++ network to obtain a plurality of seed points;
the voting unit is used for processing the plurality of seed points by utilizing the shared voting model which is trained in advance to obtain a voting cluster comprising a plurality of votes;
the target candidate generation unit is used for sampling and grouping the voting clusters to obtain a plurality of target candidates;
the target candidate updating unit is used for updating the target candidates by utilizing a weighted relation perception proposal generation model which is trained in advance to obtain updated target candidates;
the detection unit is used for processing the updated target candidates by using a first multi-layer perceptron which is trained in advance to obtain an object proposal; decoding the object proposal to obtain a target detection result;
the sampling and grouping the voting clusters to obtain a plurality of target candidates comprises:
sampling the voting cluster by using a furthest point sampling algorithm to obtain K center points;
taking each center point as a sphere center, acquiring the local neighborhood of each sphere center by using a ball query, and taking all votes within the local neighborhood of each sphere center as one target candidate;
the weighted relation perception proposal generation model comprises: a weighted self-attention processing unit, a splicing unit, a third multi-layer perceptron and an addition unit; the weighted self-attention processing unit comprises a processing unit and a weighted self-attention layer; and the processing unit comprises: a first processing branch, a second processing branch, a third processing branch, and a multi-layer perceptron;
the updating the target candidates by using the weighted relation perception proposal generation model which is trained in advance to obtain updated target candidates comprises:
processing the K target candidates by using the first processing branch to obtain a matrix Q:
Q = W_Q O(C)
wherein W_Q is a parameterized multi-layer perceptron, and O(C) represents the K target candidates;
processing the K target candidates by using the second processing branch to obtain a matrix P:
P = W_P O(C)
wherein W_P is a parameterized multi-layer perceptron;
processing the K target candidates by using the third processing branch to obtain a matrix V:
V = W_V O(C)
wherein W_V is a parameterized multi-layer perceptron;
predicting whether the K target candidates produce a positive or a negative effect by using the multi-layer perceptron MLP_1(·), to obtain a prediction score vector w:
w = MLP_1(O(C))
processing the matrix Q, the matrix P, the matrix V and the prediction score vector w by using the weighted self-attention layer to obtain a correlation Δ characterizing the relations between the different target candidates:
Δ = softmax(QP^T)WV
wherein softmax(·) is the weighted self-attention function, and the matrix W is a diagonal matrix with the prediction score vector w as its diagonal;
splicing the correlation Δ with the K target candidates O(C) by using the splicing unit to obtain an expanded target candidate O(C)⊕Δ, wherein ⊕ denotes a concatenation (juxtaposition) operation;
learning the expanded target candidate by using the third multi-layer perceptron to obtain a correction quantity of the target candidates;
adding the target candidates and the correction quantity of the target candidates by using the addition unit to obtain the updated target candidates O_r(C).
6. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any of claims 1-4 when the computer program is executed.
7. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1-4.
CN202211618478.1A 2022-12-15 2022-12-15 3D point cloud target detection method and device based on weighted relation perception Active CN116051633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211618478.1A CN116051633B (en) 2022-12-15 2022-12-15 3D point cloud target detection method and device based on weighted relation perception


Publications (2)

Publication Number Publication Date
CN116051633A CN116051633A (en) 2023-05-02
CN116051633B true CN116051633B (en) 2024-02-13

Family

ID=86119129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211618478.1A Active CN116051633B (en) 2022-12-15 2022-12-15 3D point cloud target detection method and device based on weighted relation perception

Country Status (1)

Country Link
CN (1) CN116051633B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767447A (en) * 2021-01-25 2021-05-07 电子科技大学 Time-sensitive single-target tracking method based on depth Hough optimization voting, storage medium and terminal
CN112785526A (en) * 2021-01-28 2021-05-11 南京大学 Three-dimensional point cloud repairing method for graphic processing
CN113095205A (en) * 2021-04-07 2021-07-09 北京航空航天大学 Point cloud target detection method based on improved Hough voting
WO2022141718A1 (en) * 2020-12-31 2022-07-07 罗普特科技集团股份有限公司 Method and system for assisting point cloud-based object detection
CN114882495A (en) * 2022-04-02 2022-08-09 华南理工大学 3D target detection method based on context-aware feature aggregation
CN114972654A (en) * 2022-06-15 2022-08-30 清华大学 Three-dimensional target detection method based on roadside point cloud completion
CN115222954A (en) * 2022-06-09 2022-10-21 江汉大学 Weak perception target detection method and related equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deep Hough voting for 3D object detection in point clouds; C. R. Qi et al.; in Proceedings of the IEEE/CVF International Conference on Computer Vision; 9277–9286 *
Deep multi-modal fusion technology for autonomous-driving object detection; Zhang Xinyu et al.; 《智能***学报》; Vol. 15, No. 04; 758-771 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant