CN115049786A - Task-oriented point cloud data down-sampling method and system


Info

Publication number
CN115049786A
CN115049786A (application CN202210689275.5A)
Authority
CN
China
Prior art keywords
point cloud
sampling
task
network
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210689275.5A
Other languages
Chinese (zh)
Other versions
CN115049786B (en)
Inventor
金一
王旭
岑翼刚
刘柏甫
王涛
李浥东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202210689275.5A priority Critical patent/CN115049786B/en
Publication of CN115049786A publication Critical patent/CN115049786A/en
Application granted granted Critical
Publication of CN115049786B publication Critical patent/CN115049786B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention provides a task-oriented point cloud data down-sampling method and system, belonging to the technical field of point cloud data processing. The method adjusts the resource-intensive structures in a converter network: position embedding is removed, the input data embedding layer is simplified, the mapping matrix operations of the self-attention mechanism are deleted, and a scaling strategy is introduced in the feedforward neural network layer. Based on a sampling loss function, the coverage area of the down-sampled point cloud and its attention to key regions are expanded, and the generated point cloud is encouraged to be a proper subset of the original point cloud. The down-sampling module is then combined with the task network, and the weight parameters of the down-sampling network are updated using both the sampling loss and the task loss. The invention reduces the consumption of computing and storage resources; the designed sampling loss function promotes the acquisition of proper-subset point cloud data with more uniform point distribution and more comprehensive key-point coverage; and the universal down-sampling module is combined with a three-dimensional classification task network, achieving an effective balance between task-network performance optimization and resource-overhead minimization.

Description

Task-oriented point cloud data down-sampling method and system
Technical Field
The invention relates to the technical field of point cloud data processing, in particular to a task-oriented point cloud data down-sampling method and system based on a converter neural network.
Background
In recent years, as the price of three-dimensional sensors has continued to fall, sensor modules such as lidar have become common in daily life, in fields such as rail transit, smart traffic, unmanned systems, three-dimensional vision robots, augmented reality, smart cities, and point cloud data processing systems. Meanwhile, with the continued development of deep learning, point cloud data acquired by three-dimensional sensors is widely applied in traffic scenarios such as intelligent rail transit and intelligent urban traffic, providing data support for plans such as infrastructure intelligentization, digitized standard design, and safe travel. In practice, to obtain refined modeling data of a real scene, a large amount of dense point cloud data is usually collected on object surfaces to improve modeling accuracy. However, as three-dimensional point cloud processing devices evolve into miniaturized, hand-held devices, processing large-scale dense point clouds on mobile devices or terminals becomes a significant challenge. On the other hand, large-scale point cloud data provides complete global scene information, but for specific tasks such as three-dimensional object classification, three-dimensional scene segmentation, and three-dimensional point cloud registration, processing it increases resource overhead, and the scale and density of the points do not correlate positively with task performance or precision; that is, the performance of a task model does not keep improving as the number of points grows. Three-dimensional down-sampling techniques have been proposed to address this problem.
Existing down-sampling methods fall mainly into traditional methods and deep learning methods. Traditional down-sampling is represented by Farthest Point Sampling, Random Sampling, and voxelization (Voxel). Although traditional methods can complete the point cloud down-sampling task, they are data-oriented and do not fully consider the deep geometric characteristics of the point cloud data or the network requirements of downstream tasks, so they often yield sub-optimal sampling results. Moreover, traditional methods usually need to down-sample the input point cloud several times and select the best result in order to guarantee satisfactory task network accuracy. It should be emphasized that such repeated down-sampling multiplies the resource cost, which runs counter to the goal of reducing resource cost through point cloud down-sampling. The performance of traditional methods therefore remains to be optimized.
Down-sampling methods based on deep learning have also been proposed in recent years. The currently popular ones can be divided into two categories: (1) specific down-sampling layers, and (2) generic down-sampling modules. In the specific down-sampling layer approach, an embedded point cloud down-sampling network layer is designed and combined with a specific task neural network, continuously filtering out redundant point cloud information while learning features. Although such methods can effectively reduce the point cloud scale, they are unfriendly to predefined networks with fixed model structures, for two reasons: (1) for a predefined network structure, even a slight structural change may degrade output performance; (2) for a predefined neural network with a complex structure and high precision, the resource cost of retraining is enormous.
The principle of the task-oriented generic down-sampling module is to design a down-sampling module independent of the task network, so that it can be combined with any task network that needs down-sampling without changing that network's structure. Notably, existing task-oriented generic down-sampling modules all adopt point-based (PointNet-like) deep learning frameworks; although the resource overhead of such structures is low, they process the points of a point cloud individually and ignore the correlation and geometric relationships between points, so model performance still needs optimization. The success of converter (Transformer) networks in machine vision tasks offers a new idea for processing three-dimensional point cloud data. Mainstream converter networks enhance model depth and width by stacking multiple converter modules and introducing a multi-head attention network in each module, exploiting abundant learnable parameters to fit visual tasks such as three-dimensional classification and three-dimensional segmentation and achieve high output performance. However, a conventional converter network model has a complex structure and enormous computation and storage overhead, whereas the purpose of the point cloud down-sampling task is precisely to save computation and storage. Existing converter networks are therefore difficult to apply directly to point cloud down-sampling: the resource consumption of the down-sampling network itself may exceed the resources saved by reducing the point cloud, defeating the purpose of saving resources.
In summary, in the field of point cloud down-sampling, the recently proposed converter network framework has not been well incorporated into the design of deep models, and there is as yet no effective method that, for the existing resource-intensive converter structures, reduces resource usage while mitigating the performance degradation caused by shrinking the point cloud.
Disclosure of Invention
The invention aims to provide a task-oriented point cloud data down-sampling method and system for solving at least one of the technical problems in the background art.
In order to achieve the purpose, the invention adopts the following technical scheme:
in one aspect, the invention provides a task-oriented point cloud data down-sampling method, which comprises the following steps:
adjusting a resource-intensive structure in a converter network: removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer; the scaling strategy means that the scale of the feedforward neural network layer is increased or decreased according to the requirements of the actual task, i.e., expansion increases the scale of the feedforward neural network and contraction decreases it.
Based on a sampling loss function, expanding the coverage area of the down-sampling point cloud and the attention capacity of a key area, and promoting the generation of the point cloud to be a proper subset of the original point cloud;
and combining the down-sampling module with the task network, and updating the weight parameters of the down-sampling network by using the sampling loss and the task loss.
Optionally, adjusting a resource-intensive structure in the transformer network, removing position embedding, simplifying an input data embedding layer structure, deleting a mapping matrix operation of a self-attention mechanism, and introducing a scaling strategy in a feedforward neural network layer includes:
constructing a lightweight input data embedding layer;
deleting the mapping steps of the query vector and the key vector in the self-attention mechanism, only keeping the dot product operation of the input data, and constructing a lightweight self-correlation attention mechanism;
a scalable feed-forward neural network is constructed, expanding the network scale according to prior knowledge of the data set size and the scene complexity. The feed-forward neural network is a fully-connected neural network (MLP), and scaling it means adjusting the number of MLP layers and the number of neurons in each layer. After the structure is adjusted, testing and fine-tuning are carried out until the network converges to a suitable precision.
Optionally, the mathematical formula of the sampling loss is expressed as:
L_sampling = L_CD + α·L_repl + β·L_soft,
where α and β are regularization parameters, L_CD denotes the chamfer loss function, L_repl denotes the repulsion loss function, and L_soft denotes the nonlinear mapping loss.
Optionally, the chamfer loss function L_CD(Q, P) is:
L_CD(Q, P) = (1/|Q|) Σ_{q∈Q} min_{p∈P} ‖q − p‖₂² + (1/|P|) Σ_{p∈P} min_{q∈Q} ‖p − q‖₂²,
where Q denotes the generated point cloud and q a point in the generated point cloud; P denotes the original point cloud and p a point in the original point cloud.
Optionally, the repulsion loss function is:
L_repl(Q) = (1/(K·|Q|)) Σ_{q∈Q} Σ_{q′∈N_K(q)} η(‖q − q′‖),
where η(r) = max(0, h² − r²) is a function that keeps q at a certain distance from the other points in Q, h denotes the average separation distance between the generated points, and K denotes the number of k-nearest neighbors of q.
Optionally, the nonlinear mapping loss includes:
representing q by a soft projection point z, the weighted average of the k nearest neighbors of q with weights w:
z = Σ_{p_i∈N_k(q)} w_i · p_i;
next, the Gumbel-Softmax trick is used to optimize the weights w:
w_i = exp(−‖q − p_i‖²/t) / Σ_{p_j∈N_k(q)} exp(−‖q − p_j‖²/t),
where t is a learnable temperature coefficient that controls the distribution shape of the weights w; as t approaches 0, the point z approximates a proper subset of the input point cloud;
finally, a mapping loss is added to the sampling loss to optimize the nonlinearity and convergence of the soft projection:
L_soft = T(t), t ∈ [0, +∞),
where T(·) is a function of t that introduces a nonlinear relationship.
In a second aspect, the present invention provides a task-oriented point cloud data down-sampling system, comprising:
the converter module is used for adjusting a resource-intensive structure in a converter network, removing position embedding, simplifying an input data embedding layer structure, deleting mapping matrix operation of a self-attention mechanism, and introducing a scaling strategy in a feedforward neural network layer;
the sampling loss module is used for expanding the coverage area and the attention capacity of a key area of the down-sampling point cloud based on a sampling loss function and promoting the generation of the point cloud to be a proper subset of the original point cloud;
and the task guide module is used for combining the down-sampling module with the task network and updating the weight parameters of the down-sampling network by using the sampling loss and the task loss.
In a third aspect, the present invention provides a computer apparatus comprising a memory and a processor, the processor and the memory being in communication with each other, the memory storing program instructions executable by the processor, the processor invoking the program instructions to perform a task-oriented point cloud data down-sampling method as described above.
In a fourth aspect, the present invention provides an electronic device comprising a memory and a processor, the processor and the memory being in communication with each other, the memory storing program instructions executable by the processor, the processor invoking the program instructions to perform the task-oriented point cloud data down-sampling method as described above.
In a fifth aspect, the invention provides a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the task-oriented point cloud data down-sampling method as described above.
Interpretation of terms:
a converter: the Transformer (Transformer) is a new deep learning framework proposed in 2017 by the article "Attention is All YouNeed" of the *** machine translation team. The converter in the deep learning domain has an encoder-decoder (encoder-decoder) structure, which contains three main modules: an input data embedding module (input embedding), a position encoding module (positional encoding), and a self-attention module (self-attention).
Point cloud data: the point cloud data in the rail transit system is a set of vectors in a three-dimensional coordinate system acquired by three-dimensional acquisition equipment, such as a laser radar, a stereo camera and the like, wherein each point contains a three-dimensional coordinate, and some point cloud data also comprise information such as color, depth, reflection intensity and the like.
Down-sampling (downsampling): the point cloud data acquired in a rail transit system is often large in scale; for example, a single point cloud frame can contain hundreds of thousands to millions of points, and, constrained by time and energy-consumption budgets, existing embedded devices can hardly process such large-scale data directly. Meanwhile, owing to weather, road bumps, illumination changes, and the like, the point cloud data often contains a large number of noise points, which can seriously affect data accuracy and thereby degrade the precision of unmanned driving and other analysis systems that depend on large-scale data. Therefore, a practical point cloud data processing system often includes a point cloud down-sampling operation, i.e., the removal of noise points and redundant points from the point cloud data.
The invention has the beneficial effects that: the resource-intensive structures in the converter network are adjusted in a lightweight manner, reducing the consumption of computing and storage resources as much as possible; a sampling loss function is designed to promote the acquisition of proper-subset point cloud data with more uniform point distribution and more comprehensive key-point coverage; and finally, the universal down-sampling module is combined with the three-dimensional classification task network, achieving an effective balance between task-network performance optimization and resource-overhead minimization.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a processing flow chart of a task-oriented point cloud data down-sampling method based on a lightweight transformer neural network according to an embodiment of the present invention.
Fig. 2 is a specific instantiation structural diagram of a task-oriented lightweight converter network model according to an embodiment of the present invention.
Fig. 3 is a specific instantiation structure diagram of a lightweight autocorrelation attention model according to an embodiment of the present invention.
Fig. 4 is a specific instantiation structure diagram of a target task model for constructing task guidance according to an embodiment of the present invention.
FIG. 5 is a portion of a training sample and a corresponding downsampled point cloud diagram according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by way of the drawings are illustrative only and are not to be construed as limiting the invention.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
For the purpose of facilitating an understanding of the present invention, the present invention will be further explained by way of specific embodiments with reference to the accompanying drawings, which are not intended to limit the present invention.
It should be understood by those skilled in the art that the drawings are merely schematic representations of embodiments and that the elements shown in the drawings are not necessarily required to practice the invention.
Example 1
This embodiment 1 provides a task-oriented point cloud data down-sampling system, including:
the converter module is used for adjusting a resource-intensive structure in a converter network, removing position embedding, simplifying an input data embedding layer structure, deleting mapping matrix operation of a self-attention mechanism, and introducing a scaling strategy in a feedforward neural network layer;
the sampling loss module is used for expanding the coverage area and the attention capacity of a key area of the down-sampling point cloud based on a sampling loss function and promoting the generation of the point cloud to be a proper subset of the original point cloud;
and the task guide module is used for combining the down-sampling module with the task network and updating the weight parameters of the down-sampling network by utilizing the sampling loss and the task loss.
In this embodiment 1, with the above system, a task-oriented point cloud data down-sampling method is implemented, including:
adjusting a resource-intensive structure in a converter network: removing position embedding, simplifying the input data embedding layer structure, deleting the mapping matrix operations of the self-attention mechanism, and introducing a scaling strategy in the feedforward neural network layer; the scaling strategy means that the scale of the feedforward neural network layer is increased or decreased according to the requirements of the actual task, i.e., expansion increases the scale of the feedforward neural network and contraction decreases it.
Based on a sampling loss function, expanding the coverage area of the down-sampling point cloud and the attention capacity of a key area, and promoting the generation of the point cloud to be a proper subset of the original point cloud;
and combining the down-sampling module with the task network, and updating the weight parameters of the down-sampling network by using the sampling loss and the task loss.
The method comprises the following steps of adjusting a resource-intensive structure in a converter network, removing position embedding, simplifying an input data embedding layer structure, deleting mapping matrix operation of a self-attention mechanism, and introducing a scaling strategy in a feedforward neural network layer, wherein the scaling strategy comprises the following steps:
constructing a lightweight input data embedding layer;
deleting the mapping steps of the query vector and the key vector in the self-attention mechanism, only keeping the dot product operation of the input data, and constructing a lightweight self-correlation attention mechanism;
a scalable feed-forward neural network is constructed, expanding the network scale according to prior knowledge of the data set size and the scene complexity. The feed-forward neural network is a fully-connected neural network (MLP), and scaling it means adjusting the number of MLP layers and the number of neurons in each layer. After the structure is adjusted, testing and fine-tuning are performed until the network converges to a suitable precision.
Wherein, the mathematical formula of the sampling loss is expressed as:
L_sampling = L_CD + α·L_repl + β·L_soft,
where α and β are regularization parameters, L_CD denotes the chamfer loss function, L_repl denotes the repulsion loss function, and L_soft denotes the nonlinear mapping loss.
The chamfer loss function L_CD(Q, P) is:
L_CD(Q, P) = (1/|Q|) Σ_{q∈Q} min_{p∈P} ‖q − p‖₂² + (1/|P|) Σ_{p∈P} min_{q∈Q} ‖p − q‖₂²,
where Q denotes the generated point cloud and q a point in the generated point cloud; P denotes the original point cloud and p a point in the original point cloud.
The repulsion loss function is:
L_repl(Q) = (1/(K·|Q|)) Σ_{q∈Q} Σ_{q′∈N_K(q)} η(‖q − q′‖),
where η(r) = max(0, h² − r²) is a function that keeps q at a certain distance from the other points in Q, h denotes the average separation distance between the generated points, and K denotes the number of k-nearest neighbors of q.
The nonlinear mapping loss includes:
representing q by a soft projection point z, the weighted average of the k nearest neighbors of q with weights w:
z = Σ_{p_i∈N_k(q)} w_i · p_i;
next, the Gumbel-Softmax trick is used to optimize the weights w:
w_i = exp(−‖q − p_i‖²/t) / Σ_{p_j∈N_k(q)} exp(−‖q − p_j‖²/t),
where t is a learnable temperature coefficient that controls the distribution shape of the weights w; as t approaches 0, the point z approximates a proper subset of the input point cloud;
finally, a mapping loss is added to the sampling loss to optimize the nonlinearity and convergence of the soft projection:
L_soft = T(t), t ∈ [0, +∞),
where T(·) is a function of t that introduces a nonlinear relationship.
Example 2
This embodiment 2 provides a task-oriented point cloud data down-sampling method based on a lightweight converter neural network, which redesigns the existing converter network structure, simplifying the model structure while preserving, as far as possible, sufficiently strong learning ability.
As shown in fig. 1, the processing flow of the method specifically includes the following steps:
step S1: and constructing a lightweight converter model, wherein the model mainly carries out lightweight adjustment on all modules in the traditional converter network, and reduces the expenditure of calculation and storage resources as far as possible while ensuring the learning capability of the model. The specific structure is shown in fig. 2.
Step S1-1: building lightweight input data embedding layers
Firstly, a point cloud data set for the three-dimensional point cloud classification task is collected with lidar equipment and divided into a training set and a testing set. Secondly, the input data embedding layer maps the input point cloud data to a high-dimensional feature space in preparation for subsequent feature extraction. Given point cloud data comprising N points, each containing three-dimensional coordinate information, and in contrast to the traditional embedding layer built from multiple shared linear layers, a single shared linear layer is used here to map the raw data to the high-dimensional feature space, yielding the output
F_o ∈ ℝ^{N×d_o},
where F_o denotes the output features and d_o the output feature dimension.
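For illustration only, a minimal PyTorch-style sketch of such a single shared linear embedding layer is given below; the class name and the default width d_o = 128 are assumptions for the example, not values fixed by this embodiment.

```python
import torch
import torch.nn as nn

class LightweightEmbedding(nn.Module):
    """Lightweight input data embedding: one shared linear layer whose
    weights are applied identically to every point in the cloud."""
    def __init__(self, d_o: int = 128):
        super().__init__()
        self.shared_linear = nn.Linear(3, d_o)  # maps (x, y, z) -> d_o features

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> F_o: (B, N, d_o)
        return self.shared_linear(points)
```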
Step S1-2: constructing lightweight autocorrelation attention modules
Self-attention was originally designed for feature extraction in natural language processing. Since word order is an important carrier of meaning in natural language tasks, the input sequence must be converted into vectors by mapping matrices before the attention scores are computed. The traditional multi-head attention model consists of several single-head attention modules whose feature extraction is executed in parallel. The conventional single-head dot-product attention function can be formalized as:
SA(P) = FC_out(Atten(FC_Q(P), FC_K(P), FC_V(P))),
Atten(Q, K, V) = softmax(Q·K^T/√D)·V,
Q, K ∈ ℝ^{N×(D/a)}, V ∈ ℝ^{N×D},
where P denotes the input point cloud, FC(·) denotes a linear transformation through a projection matrix, Q, K, V denote the vector representations of the input point cloud after linear transformation, softmax is the activation function, the dot product Q·K^T is scaled by √D to improve network stability, and D is the dimension of the Q and K vectors. Note that a is a scaling factor that keeps the computational overhead of the multi-head attention mechanism close to that of a single-head mechanism.
In contrast, a point cloud is unordered: exchanging the positions of two points does not affect the point cloud representation. This design therefore deletes the mapping steps of the query vector (Q) and key vector (K) in the traditional self-attention mechanism and retains only the dot-product operation on the input data. In theory, this deletion is better suited to computing the attention score matrix of point cloud data because it satisfies permutation invariance, i.e., a_ij = a_ji, where a_ij denotes the attention score between any two points i and j in the point cloud. In addition, to further reduce the computation and storage overhead of the self-attention mechanism, the operation on the value vector (V) is also removed in this embodiment. Since all remaining operations are autocorrelation operations on the input data, this layer is named the autocorrelation attention layer. The autocorrelation attention function is formalized as:
SA(X) = FC_out(C(X)),
C(X) = softmax(X·X^T/√D)·X,
where X is the output feature of the lightweight input data embedding layer, FC_out(·) denotes the linear projection matrix, softmax denotes the normalization function, and D denotes the feature dimension of X. The concrete instantiation structure is shown in fig. 3.
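A minimal sketch of this autocorrelation attention layer, assuming PyTorch; the module and variable names are illustrative, and only the FC_out projection is kept as a learnable matrix:

```python
import math
import torch
import torch.nn as nn

class AutoCorrelationAttention(nn.Module):
    """Autocorrelation attention: the Q/K/V projections are removed and the
    attention scores are scaled dot products of the input with itself."""
    def __init__(self, dim: int):
        super().__init__()
        self.fc_out = nn.Linear(dim, dim)  # the retained FC_out projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D). The raw scores x_i . x_j are symmetric (a_ij = a_ji);
        # a row-wise softmax then normalizes them into attention weights.
        scores = torch.softmax(x @ x.transpose(1, 2) / math.sqrt(x.size(-1)), dim=-1)
        return self.fc_out(scores @ x)  # C(X) = softmax(XX^T / sqrt(D)) X
```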
Step S1-3: constructing scalable feed-forward neural networks
To compensate for the reduction in learnable network parameters caused by the lightweight autocorrelation attention module of step S1-2, the invention designs a scalable feedforward neural network, whose main feature is that the scale and depth of the feedforward network are dynamically adjusted according to the required task output performance, thereby strengthening the network's learning capability.
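As a sketch of what such scaling might look like in PyTorch (the width and depth defaults are assumed, since the embodiment leaves them to prior knowledge of the data set size and scene complexity):

```python
import torch.nn as nn

def scalable_ffn(dim: int, hidden: int = 256, depth: int = 2) -> nn.Sequential:
    """Scalable feed-forward network: `hidden` and `depth` are the knobs
    that are enlarged or shrunk to match the task's performance needs."""
    layers, d_in = [], dim
    for _ in range(depth):
        layers += [nn.Linear(d_in, hidden), nn.ReLU()]
        d_in = hidden
    layers.append(nn.Linear(d_in, dim))  # project back to the model dimension
    return nn.Sequential(*layers)
```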
Step S2: constructing a sampling loss function
Through the operation of step S1, this embodiment obtains high-dimensional point cloud features containing rich geometric information. A loss function evaluates the degree of inconsistency between a model's predictions and the ground truth; generally, the better the loss function, the higher the performance of the trained model. Therefore, this embodiment designs a sampling loss function comprising a chamfer distance loss, a repulsion loss, and a nonlinear soft mapping loss.
Step S2-1: chamfer loss function
In order to ensure that the points generated by the point cloud down-sampling network are a proper subset of the original data, the invention first introduces the chamfer loss function L_CD(Q, P):
L_CD(Q, P) = (1/|Q|) Σ_{q∈Q} min_{p∈P} ‖q − p‖₂² + (1/|P|) Σ_{p∈P} min_{q∈Q} ‖p − q‖₂²,
where Q denotes the generated point cloud and q a point in the generated point cloud; P denotes the original point cloud and p a point in the original point cloud.
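A sketch of this chamfer loss for batched point clouds, assuming PyTorch (a brute-force pairwise-distance computation, adequate for illustration):

```python
import torch

def chamfer_loss(Q: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    """Two-sided chamfer loss between generated points Q (B, M, 3)
    and original points P (B, N, 3), using squared distances."""
    d = torch.cdist(Q, P)  # (B, M, N) pairwise Euclidean distances
    q_to_p = (d.min(dim=2).values ** 2).mean()  # each q to its nearest p
    p_to_q = (d.min(dim=1).values ** 2).mean()  # each p to its nearest q
    return q_to_p + p_to_q
```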
Step S2-2: repulsion loss function
The main limitation of the chamfer loss function of step S2-1 is that it ignores the uniformity of the point distribution, making it difficult for the simplified point set to effectively cover the entire surface and the critical areas of the object. To alleviate this problem, the invention introduces a repulsion loss function to encourage uniformity of the generated points and coverage of critical areas. The concrete mathematical formula is L_repl(Q):
L_repl(Q) = (1/(K·|Q|)) Σ_{q∈Q} Σ_{q′∈N_K(q)} η(‖q − q′‖),
where η(r) = max(0, h² − r²) is a function that keeps q at a certain distance from the other points in Q, h denotes the average separation distance between the generated points, and K denotes the number of k-nearest neighbors of q.
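A sketch of the repulsion loss under the same assumptions; the neighborhood size k and the spacing h are hyperparameters the embodiment does not fix, so the defaults here are illustrative:

```python
import torch

def repulsion_loss(Q: torch.Tensor, h: float = 0.05, k: int = 4) -> torch.Tensor:
    """Penalizes generated points Q (B, M, 3) whose k nearest neighbours
    crowd within distance h, using eta(r) = max(0, h^2 - r^2)."""
    d = torch.cdist(Q, Q)  # (B, M, M) pairwise distances within Q
    # take the k+1 smallest and drop the first column (each point itself)
    knn = d.topk(k + 1, largest=False).values[..., 1:]  # (B, M, k)
    eta = torch.clamp(h ** 2 - knn ** 2, min=0.0)
    return eta.mean()  # mean over B*M*k terms, i.e. the 1/(K|Q|) normalization
```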
Step S2-3: nonlinear soft mapping loss
The set of key points generated by the task-oriented down-sampling module cannot be guaranteed to be a proper subset of the original point cloud. In this case, the generated point set will inevitably lose geometric information. In some studies, additional matching operations are introduced, such as nearest neighbor searches, that map each generated point to a nearest neighbor point in the original point cloud. However, the matching step limits further improvement of the downsampling model performance because the conventional matching operation is not differentiable, i.e. cannot be optimized by network training. Therefore, there is a need for an improved matching algorithm.
To solve the above problem, this embodiment proposes a nonlinear soft projection method to achieve differentiable matching. First, q is represented by a soft projection point z, the weighted average of the k nearest neighbors of q with weights w:
z = Σ_{p_i∈N_k(q)} w_i · p_i.
Next, the Gumbel-Softmax trick is used to optimize the weights w:
w_i = exp(−‖q − p_i‖²/t) / Σ_{p_j∈N_k(q)} exp(−‖q − p_j‖²/t),
where t is a learnable temperature coefficient that controls the distribution shape of the weights w. Clearly, as t tends towards 0, the point z can be approximated as a proper subset of the input point cloud.
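A sketch of this soft projection, assuming PyTorch; in training, t would typically be an nn.Parameter so that it is learnable, and the neighborhood size k is an assumed hyperparameter:

```python
import torch

def soft_project(Q: torch.Tensor, P: torch.Tensor, t: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Replace each generated point q in Q (B, M, 3) by the weighted average z
    of its k nearest original points in P (B, N, 3); the softmax over negative
    squared distances sharpens towards a hard nearest neighbour as t -> 0."""
    d = torch.cdist(Q, P)                               # (B, M, N)
    dk, idx = d.topk(k, largest=False)                  # k nearest distances/indices
    w = torch.softmax(-dk ** 2 / t, dim=-1)             # (B, M, k) weights
    neigh = torch.gather(                               # gather neighbour coordinates
        P.unsqueeze(1).expand(-1, Q.size(1), -1, -1),   # (B, M, N, 3)
        2, idx.unsqueeze(-1).expand(-1, -1, -1, 3))     # -> (B, M, k, 3)
    return (w.unsqueeze(-1) * neigh).sum(dim=2)         # z: (B, M, 3)
```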
Finally, the mapping loss is added to the sampling loss to optimize the nonlinearity and convergence of the soft projection:
L_soft = T(t), t ∈ [0, +∞),
where T(·) is a function of t that introduces a nonlinear relationship. In summary, the mathematical formula for the sampling loss is:
L_sampling = L_CD + α·L_repl + β·L_soft,
where α and β are regularization parameters.
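Combining the three terms, a hedged sketch of the full sampling loss; α, β, h, and the choice T(t) = log(1 + t) are assumptions, since the text only requires T to be a nonlinear function of t:

```python
import torch

def sampling_loss(P: torch.Tensor, Q: torch.Tensor, t: torch.Tensor,
                  alpha: float = 0.01, beta: float = 0.01, h: float = 0.05) -> torch.Tensor:
    """L_sampling = L_CD + alpha * L_repl + beta * L_soft, reusing the
    chamfer_loss and repulsion_loss sketches above."""
    l_soft = torch.log1p(t)  # one possible nonlinear T(t) on t in [0, +inf)
    return chamfer_loss(Q, P) + alpha * repulsion_loss(Q, h=h) + beta * l_soft
```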
Step S3: constructing task-oriented target task models
The lightweight converter neural network provided by the design is a plug-and-play down-sampling module, and is combined with a three-dimensional classification task network to form an end-to-end task-oriented point cloud down-sampling model.
An instantiation of the overall network structure in this embodiment is shown in fig. 4. Firstly, step S1 is to construct a lightweight transformer model for extracting the high-dimensional global geometric features of the point cloud. Next, in step S2, a sampling loss function is designed to optimize the update of the weight parameters in the network training. And finally, inputting the simplified point cloud into a task network, combining the sampling loss and the task loss, and jointly optimizing the weight updating of the down-sampling network. All loss functions are aggregated together and minimized:
argmin L_sampling(P, Q) + δ·L_task(Q),
where δ is the balance parameter. A portion of the training samples and the corresponding down-sampled point clouds are shown in fig. 5, where the gray points are the original point cloud and the bold points are the down-sampled point cloud.
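A sketch of one joint optimization step under this formula; the names (sampler, task_net, task_criterion) are placeholders, and the optimizer is assumed to hold only the down-sampling network's parameters so that the task network stays fixed:

```python
import torch

def train_step(sampler, task_net, optimizer, P, labels, task_criterion, delta=1.0):
    """One end-to-end step: sample Q from P, score it with the task network,
    and update the sampler with L_sampling(P, Q) + delta * L_task(Q)."""
    Q = sampler(P)           # simplified point cloud from the down-sampling network
    t = sampler.temperature  # assumed learnable temperature exposed by the sampler
    loss = sampling_loss(P, Q, t) + delta * task_criterion(task_net(Q), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()         # only parameters held by `optimizer` (the sampler's) move
    return loss.item()
```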
In summary, embodiment 2 constructs a lightweight converter network model consisting of five modules: (1) a lightweight input data embedding layer; (2) a lightweight autocorrelation attention layer; (3) a scalable feedforward neural network; (4) layer normalization; and (5) skip connections. The lightweight input embedding layer maps the point cloud data to a high-dimensional feature space in preparation for subsequent deep feature learning; the lightweight autocorrelation layer extracts refined global feature information; and the scalable feedforward neural network introduces a scaling mechanism on top of the traditional feedforward network, increasing the width and depth of the down-sampling network under a limited resource budget and raising the number of learnable parameters, thereby improving the learning capability of the down-sampling network. Layer normalization and skip connections optimize the network training process, preventing gradient explosion and overfitting. A sampling loss function is constructed from three parts: the chamfer distance loss, the repulsion loss, and the nonlinear soft mapping loss, which together improve the training performance of the down-sampling network. A task-oriented target task model is then constructed: the lightweight converter neural network proposed in this design is a plug-and-play universal sampling module that can, in principle, be combined with any task network or model requiring point cloud down-sampling. Finally, the lightweight converter network, the sampling loss function, and the task neural network are cascaded to form an end-to-end task-oriented point cloud down-sampling model.
Example 3
Embodiment 3 of the present invention provides an electronic device, including a memory and a processor, where the processor and the memory are in communication with each other, the memory stores program instructions executable by the processor, and the processor invokes the program instructions to execute a task-oriented point cloud data down-sampling method based on a transformer neural network, where the method includes the following steps:
adjusting a resource-intensive structure in a converter network, removing position embedding, simplifying an input data embedding layer structure, deleting mapping matrix operation of a self-attention mechanism, and introducing a scaling strategy in a feedforward neural network layer;
based on a sampling loss function, expanding the coverage area of the down-sampling point cloud and the attention capacity of a key area, and promoting the generation of the point cloud to be a proper subset of the original point cloud;
and combining the down-sampling module with the task network, and updating the weight parameters of the down-sampling network by using the sampling loss and the task loss.
Example 4
An embodiment 4 of the present invention provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements a task-oriented point cloud data down-sampling method based on a transformer neural network, where the method includes the following steps:
adjusting a resource-intensive structure in a converter network, removing position embedding, simplifying an input data embedding layer structure, deleting mapping matrix operation of a self-attention mechanism, and introducing a scaling strategy in a feedforward neural network layer;
based on a sampling loss function, the coverage area of the down-sampled point cloud and its attention to key areas are expanded, and the generated point cloud is encouraged to be a proper subset of the original point cloud;
and combining the down-sampling module with the task network, and updating the weight parameters of the down-sampling network by using the sampling loss and the task loss.
Example 5
Embodiment 5 of the present invention provides a computer device, including a memory and a processor, where the processor and the memory are in communication with each other, the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute a task-oriented point cloud data down-sampling method based on a transformer neural network, where the method includes the following steps:
adjusting a resource-intensive structure in a converter network, removing position embedding, simplifying an input data embedding layer structure, deleting mapping matrix operation of a self-attention mechanism, and introducing a scaling strategy in a feedforward neural network layer;
based on a sampling loss function, expanding the coverage area of the down-sampling point cloud and the attention capacity of a key area, and promoting the generation of the point cloud to be a proper subset of the original point cloud;
and combining the down-sampling module with the task network, and updating the weight parameters of the down-sampling network by using the sampling loss and the task loss.
In summary, the point cloud down-sampling method based on a lightweight converter neural network provided by the embodiments of the invention can be used in rail transit, intelligent traffic, unmanned systems, three-dimensional vision robots, augmented reality, smart cities, and point cloud data processing systems. The method constructs a lightweight autocorrelation attention mechanism to extract refined global geometric information from the point cloud; combines hardware resource requirements with the global geometric information and uses a scalable feedforward neural network to generate down-sampled point cloud data of a specified scale; optimizes the generated point cloud with the designed sampling loss function, ensuring that it is a proper subset of the original point cloud and accelerating model convergence; and finally cascades the target task network to complete the specific target task. By exploiting the powerful global feature extraction capability of the converter network, a brand-new lightweight framework completes refined down-sampling of the original point cloud input. Specifically: a lightweight autocorrelation attention mechanism is designed to optimize the extraction of point cloud geometric feature information while compressing the model's computation and parameter requirements; under a limited resource budget, a lightweight scalable feedforward neural network is designed to adjust network depth and width and strengthen learning capability; to improve down-sampling performance, a new sampling loss function is designed, yielding down-sampled point clouds with more uniform distribution and more comprehensive key-point coverage; and by combining these modules, a plug-and-play task-guided point cloud down-sampling model is designed that can be combined with a three-dimensional classification task neural network, preserving the geometric information of the original point cloud as completely as possible while reducing its scale, and achieving an effective balance between performance optimization and resource-overhead minimization for the three-dimensional classification task network.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts based on the technical solutions disclosed in the present invention.

Claims (10)

1. A task-oriented point cloud data down-sampling method is characterized by comprising the following steps:
adjusting a resource-intensive structure in a converter network, removing position embedding, simplifying an input data embedding layer structure, deleting mapping matrix operation of a self-attention mechanism, and introducing a scaling strategy in a feedforward neural network layer;
based on a sampling loss function, the coverage area of the down-sampled point cloud and its attention to key areas are expanded, and the generated point cloud is encouraged to be a proper subset of the original point cloud;
and combining the down-sampling module with the task network, and updating the weight parameters of the down-sampling network by using the sampling loss and the task loss.
2. The task-oriented point cloud data down-sampling method of claim 1, wherein adjusting resource-intensive structures in the transformer network, removing position embedding, simplifying input data embedding layer structures, deleting mapping matrix operations from attention mechanism, introducing scaling strategies at the feedforward neural network layer, comprises:
constructing a lightweight input data embedding layer;
deleting the mapping steps of the query vector and the key vector in the self-attention mechanism, only keeping the dot product operation of the input data, and constructing a lightweight self-correlation attention mechanism;
a scalable feed-forward neural network is constructed.
3. The task-oriented point cloud data downsampling method according to claim 1, wherein a mathematical formulation of sampling loss is represented as:
L_sampling = L_CD + α·L_repl + β·L_soft,
where α and β are regularization parameters, L_CD denotes the chamfer loss function, L_repl denotes the repulsion loss function, and L_soft denotes the nonlinear mapping loss.
4. The task-oriented point cloud data down-sampling method of claim 3, wherein the chamfer loss function L_CD(Q, P) is:
L_CD(Q, P) = (1/|Q|) Σ_{q∈Q} min_{p∈P} ‖q − p‖₂² + (1/|P|) Σ_{p∈P} min_{q∈Q} ‖p − q‖₂²,
where Q denotes the generated point cloud and q a point in the generated point cloud; P denotes the original point cloud and p a point in the original point cloud.
5. The task-oriented point cloud data down-sampling method of claim 3, wherein the repulsion loss function is:
L_repl(Q) = (1/(K·|Q|)) Σ_{q∈Q} Σ_{q′∈N_K(q)} η(‖q − q′‖),
where η(r) = max(0, h² − r²) is a function that keeps q at a certain distance from the other points in Q, h denotes the average separation distance between the generated points, and K denotes the number of k-nearest neighbors of q.
6. The task-oriented point cloud data down-sampling method of claim 3, wherein the nonlinear mapping loss includes:
representing q by a soft projection point z, the weighted average of the k nearest neighbors of q with weights w:
z = Σ_{p_i∈N_k(q)} w_i · p_i;
next, using the Gumbel-Softmax trick to optimize the weights w:
w_i = exp(−‖q − p_i‖²/t) / Σ_{p_j∈N_k(q)} exp(−‖q − p_j‖²/t),
where t is a learnable temperature coefficient that controls the distribution shape of the weights w, and as t approaches 0, the point z approximates a proper subset of the input point cloud;
finally, adding the mapping loss to the sampling loss to optimize the nonlinearity and convergence of the soft projection:
L_soft = T(t), t ∈ [0, +∞),
where T(·) is a function of t that introduces a nonlinear relationship.
7. A task-oriented point cloud data down-sampling system, comprising:
the converter module is used for adjusting a resource-intensive structure in a converter network, removing position embedding, simplifying an input data embedding layer structure, deleting mapping matrix operation of a self-attention mechanism, and introducing a scaling strategy in a feedforward neural network layer;
the sampling loss module is used for expanding the coverage area and the attention capacity of a key area of the down-sampling point cloud based on a sampling loss function and promoting the generation of the point cloud to be a proper subset of the original point cloud;
and the task guide module is used for combining the down-sampling module with the task network and updating the weight parameters of the down-sampling network by utilizing the sampling loss and the task loss.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a task-oriented point cloud data down-sampling method according to any one of claims 1 to 6.
9. A computer device comprising a memory and a processor, the processor and the memory in communication with one another, the memory storing program instructions executable by the processor, the processor invoking the program instructions to perform the task-oriented point cloud data down-sampling method of any of claims 1-6.
10. An electronic device comprising a memory and a processor, the processor and the memory in communication with one another, the memory storing program instructions executable by the processor, the processor invoking the program instructions to perform the task-oriented point cloud data down-sampling method of any of claims 1-6.
CN202210689275.5A 2022-06-17 2022-06-17 Task-oriented point cloud data downsampling method and system Active CN115049786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210689275.5A CN115049786B (en) 2022-06-17 2022-06-17 Task-oriented point cloud data downsampling method and system

Publications (2)

Publication Number Publication Date
CN115049786A true CN115049786A (en) 2022-09-13
CN115049786B CN115049786B (en) 2023-07-18

Family

ID=83160762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210689275.5A Active CN115049786B (en) 2022-06-17 2022-06-17 Task-oriented point cloud data downsampling method and system

Country Status (1)

Country Link
CN (1) CN115049786B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116029022A (en) * 2022-12-23 2023-04-28 内蒙古自治区交通建设工程质量监测鉴定站(内蒙古自治区交通运输科学发展研究院) Three-dimensional visualization temperature field construction method for tunnel and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382496A (en) * 2018-12-29 2020-07-07 达索***公司 Learning neural networks for inferring editable feature trees
CN113870160A (en) * 2021-09-10 2021-12-31 北京交通大学 Point cloud data processing method based on converter neural network
CN114445280A (en) * 2022-01-21 2022-05-06 太原科技大学 Point cloud down-sampling method based on attention mechanism


Also Published As

Publication number Publication date
CN115049786B (en) 2023-07-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant