WO2022194035A1 - Three-dimensional model construction method, neural network training method, and apparatus - Google Patents

Three-dimensional model construction method, neural network training method, and apparatus

Info

Publication number
WO2022194035A1
WO2022194035A1 (PCT/CN2022/080295, CN2022080295W)
Authority
WO
WIPO (PCT)
Prior art keywords
point
points
manifold
sub
cloud data
Prior art date
Application number
PCT/CN2022/080295
Other languages
English (en)
French (fr)
Inventor
黄经纬
张彦峰
孙明伟
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022194035A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10 Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to a three-dimensional model construction method, a neural network training method, and an apparatus.
  • The input 3D reconstruction data are reconstructed into a vectorized 3D model, which can be used for fast rendering and interaction on a terminal.
  • 3D models are acquired in two ways: manual modeling and reconstruction from data captured by acquisition equipment.
  • Reconstruction from acquisition equipment can faithfully restore the scene, but the resulting data are often noisy and very large.
  • Manual modeling usually fits the real environment abstractly with basic shapes (such as planes, cylinders, spheres, and cones), so the model is compact but less accurate, while offering a better structured representation.
  • The features of the point cloud can be extracted through the PointNet++ network, which outputs the instance ID of each point, the normal vector, the type of instance primitive, and so on.
  • However, this method can only identify and reconstruct a small number of objects; in scenarios with many instances, the implementation cost is high or the method is infeasible, resulting in weak generalization ability.
  • The present application provides a three-dimensional model construction method, a neural network training method, and an apparatus, which implement instance segmentation at the primitive level to obtain a simplified three-dimensional model.
  • The present application provides a method for constructing a three-dimensional model. First, point cloud data are acquired; the point cloud data include a plurality of points and information corresponding to each point, such as depth, pixel value, brightness value, or intensity value. The point cloud data are then input into a sub-manifold prediction network to obtain prediction results for the plurality of points, the prediction results identifying whether each point and its adjacent points belong to the same sub-manifold. The sub-manifold prediction network extracts features from the point cloud data to obtain a feature corresponding to each point, and determines, according to each point's feature, whether that point and its adjacent points belong to the same sub-manifold.
  • A sub-manifold prediction network can thus be used to predict whether a pair of points lies in the same sub-manifold; then, according to the predictions over all points, corner points on the boundaries of the sub-manifolds are screened out, the shape of each sub-manifold is constructed from its corner points, and the shapes are combined into a simplified three-dimensional model.
  • The present application predicts whether point pairs lie in the same sub-manifold through a trained sub-manifold prediction network, thereby dividing the points in the point cloud data into different sub-manifolds, or primitives. Instance segmentation at the primitive level can therefore be achieved very accurately, which improves the quality of the final 3D model and adds detail on top of the simplification. Even in the presence of noise, the provided method can adapt to different noise levels by training the sub-manifold prediction network accordingly, improving the accuracy of the output 3D model.
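The grouping implied above (turning pairwise same-sub-manifold predictions into per-point instance labels) can be sketched with a union-find pass. The pair-list input and the function names below are illustrative assumptions, not the patent's actual interface:

```python
def find(parent, x):
    # Path-compressing find: walk up to the root representative.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def group_submanifolds(num_points, same_pairs):
    """Merge points connected by 'same sub-manifold' pair predictions."""
    parent = list(range(num_points))
    for a, b in same_pairs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra
    # Relabel roots as compact instance ids in first-seen order.
    labels, out = {}, []
    for p in range(num_points):
        r = find(parent, p)
        out.append(labels.setdefault(r, len(labels)))
    return out

# Points 0-2 form one sub-manifold, 3-4 another, 5 is isolated.
print(group_submanifolds(6, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 1, 1, 2]
```

Transitivity comes for free here: if (0, 1) and (1, 2) are each predicted "same", points 0 and 2 end up in one instance even without a direct prediction for that pair.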
  • The aforementioned feature extraction from point cloud data by the sub-manifold prediction network may include: extracting features in units of each point and its adjacent first preset number of points, to obtain a local feature for each point; downsampling the point cloud data to obtain downsampled data whose resolution is lower than that of the point cloud data; extracting features from the downsampled data to obtain a global feature for each point; and fusing the local feature and the global feature to obtain the feature corresponding to each point.
  • In this way, local features that fuse local information can be extracted, and global information can be fused over a larger range, so the feature corresponding to each point carries richer information. Features of higher complexity describe each point and its surroundings more accurately, which in turn makes the subsequent predictions more accurate.
  • The aforementioned downsampling of the point cloud data may include dividing the point cloud data into a plurality of voxels, each voxel including at least one point and the local features of each of those points. The aforementioned feature extraction from the downsampled data may include extracting features in units of the points in each voxel and in the adjacent second preset number of voxels, to obtain the global feature, where the number of points in the second preset number of voxels is not less than the first preset number.
  • Downsampling thus expands the range over which features are extracted, yielding global features with stronger correlation to the surroundings, so the feature corresponding to each point contains more information.
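The voxel-based downsampling step can be roughly illustrated as follows: bucket points into voxels and reduce each voxel to a centroid. The voxel size and the averaging rule are assumptions of this sketch, not details fixed by the application:

```python
from collections import defaultdict

def voxelize(points, voxel_size):
    """Assign each 3D point to a voxel and average the points per voxel.

    Returns the downsampled centroids; resolution drops as voxel_size grows.
    """
    buckets = defaultdict(list)
    for p in points:
        key = tuple(int(c // voxel_size) for c in p)  # integer voxel index
        buckets[key].append(p)
    centroids = []
    for pts in buckets.values():
        n = len(pts)
        centroids.append(tuple(sum(c[i] for c in pts) / n for i in range(3)))
    return centroids

pts = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (1.5, 0.0, 0.0)]
down = voxelize(pts, 1.0)
print(len(down))  # 2: one voxel near the origin, one at x around 1.5
```

A real pipeline would carry each voxel's aggregated local features along with the centroid, so that the coarser grid can be convolved to produce the global features.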
  • The aforementioned determination of whether each point and its adjacent points belong to the same sub-manifold may include: determining the normal vector corresponding to each point according to that point's feature, and then deciding, based on each point's feature, its normal vector, and the normal vectors of its adjacent points, whether the point and its adjacent points belong to the same sub-manifold.
  • Selecting a plurality of corner points from the points according to the prediction results includes: triangulating the points to form at least one triangular mesh; extracting from the mesh the boundaries belonging to the same sub-manifold; and extracting the corner points from the points on those boundaries.
  • In this way, the points in the point cloud data can be triangulated, the boundary of each sub-manifold can be extracted from the triangular mesh according to the output of the sub-manifold prediction network, and points on that boundary can be taken as corner points, so a simplified 3D model can be constructed from the corner points.
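The boundary-extraction step can be illustrated as follows, assuming each point already carries a sub-manifold label: edges of the triangular mesh whose endpoints fall in different sub-manifolds are taken as boundary edges. The data layout here is hypothetical:

```python
def boundary_edges(triangles, labels):
    """Return edges of a triangle mesh whose endpoints have different labels.

    triangles: list of (i, j, k) vertex-index triples.
    labels: labels[i] is the sub-manifold id of vertex i.
    """
    edges = set()
    for i, j, k in triangles:
        for a, b in ((i, j), (j, k), (k, i)):
            if labels[a] != labels[b]:
                edges.add((min(a, b), max(a, b)))  # undirected, deduplicated
    return sorted(edges)

# Two triangles sharing edge (1, 2); vertex 3 lies on a different sub-manifold.
tris = [(0, 1, 2), (1, 3, 2)]
labels = [0, 0, 0, 1]
print(boundary_edges(tris, labels))  # [(1, 3), (2, 3)]
```

Vertices incident to these boundary edges are then candidates from which corner points can be picked.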
  • Constructing a three-dimensional model from the corner points may include: constructing at least one Delaunay triangular mesh using the corner points and the geodesic distances between them, and merging the Delaunay meshes to obtain the 3D model.
  • A Delaunay triangular mesh can thus be built on geodesic distances, yielding a simplified three-dimensional model efficiently and accurately.
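The geodesic distance between corner points can be approximated as the shortest-path distance over the mesh's edge graph; a standard Dijkstra sketch follows, with an assumed adjacency-list format (a generic illustration, not the application's prescribed algorithm):

```python
import heapq

def geodesic(adj, src):
    """Shortest-path ('geodesic') distances from src over a weighted edge graph.

    adj: {vertex: [(neighbor, edge_length), ...]}
    """
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# A tiny 3-vertex mesh edge graph: the direct 0-2 edge is longer than 0-1-2.
adj = {0: [(1, 1.0), (2, 2.5)], 1: [(0, 1.0), (2, 1.0)], 2: [(0, 2.5), (1, 1.0)]}
print(geodesic(adj, 0))  # {0: 0.0, 1: 1.0, 2: 2.0}
```

Triangulating on such distances rather than straight-line distances keeps the connectivity faithful to the surface rather than cutting through empty space.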
  • The present application provides a neural network training method. First, training data are acquired; the training data include a plurality of points and a label corresponding to each point, the label indicating whether that point and its adjacent points belong to the same sub-manifold.
  • The training data are input into a sub-manifold prediction network to obtain prediction results indicating, for each of the plurality of points, whether it and its adjacent points belong to the same sub-manifold.
  • The sub-manifold prediction network extracts features from the point cloud data, obtains the feature corresponding to each point, and determines from each point's feature whether that point and its adjacent points belong to the same sub-manifold. A loss value is calculated from the prediction results and the labels, and the sub-manifold prediction network is updated according to the loss value to obtain an updated network.
  • In this way, a sub-manifold prediction network can be trained to output whether point pairs in the point cloud data lie in the same sub-manifold, so that during inference a three-dimensional model can be constructed from the network's predictions.
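The loss computation can be illustrated with an ordinary binary cross-entropy between predicted same-sub-manifold probabilities and the pairwise labels; the application does not fix the loss form, so this is only one plausible choice with illustrative names:

```python
import math

def bce_loss(pred_probs, labels):
    """Mean binary cross-entropy between predicted pair probabilities and 0/1 labels."""
    eps = 1e-12  # clip probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(pred_probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# Confident correct predictions yield a smaller loss than uncertain ones.
print(bce_loss([0.9, 0.1], [1, 0]) < bce_loss([0.6, 0.4], [1, 0]))  # True
```

During training this scalar would be backpropagated to update the network's weights, as described in the steps above.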
  • The aforementioned feature extraction from point cloud data by the sub-manifold prediction network may include: extracting features in units of each point and its adjacent first preset number of points to obtain local features; downsampling the point cloud data at least once to obtain downsampled data whose resolution is lower than that of the point cloud data; extracting features from the downsampled data to obtain global features; and fusing the local and global features to obtain the feature corresponding to each point.
  • In this way, local features that fuse local information can be extracted, and global information can be fused over a larger range, so the feature corresponding to each point carries richer information and describes each point and its surroundings more accurately, which in turn makes the subsequent predictions more accurate.
  • Performing one of the at least one downsampling steps on the point cloud data may include dividing the point cloud data into a plurality of voxels, each voxel including at least one point and the local features of each of those points. Extracting features from the downsampled data then includes performing at least one feature extraction in units of the points in each voxel and in the adjacent second preset number of voxels to obtain the global features, where the number of points in the second preset number of voxels is not less than the first preset number.
  • Downsampling thus expands the range over which features are extracted, yielding global features with stronger correlation to the surroundings, so the feature corresponding to each point contains more information.
  • The aforementioned determination of whether each point and its adjacent points belong to the same sub-manifold may include: determining a predicted normal vector for each point according to that point's feature, and then deciding, from each point's feature, its predicted normal vector, and the predicted normal vectors of its adjacent points, whether the point and its adjacent points belong to the same sub-manifold.
  • The prediction result further includes a normal vector corresponding to each point, and the label of each point further includes a ground-truth normal vector. Calculating the loss value from the prediction results and the labels may then include calculating the loss value from the normal vector corresponding to each point and the corresponding ground-truth normal vector.
  • By defining the prediction result to include the normal vector, the trained sub-manifold prediction network also outputs normal vectors, which makes the determination of whether a point pair lies in the same sub-manifold more accurate.
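One common way to score normal-vector error, shown here purely as an illustration (the application does not specify a formula), is one minus the absolute cosine between the predicted and ground-truth normals, which treats opposite-sign normals as equivalent:

```python
import math

def normal_loss(pred, gt):
    """1 - |cos angle| between two 3D normals; 0 when parallel (up to sign)."""
    dot = sum(a * b for a, b in zip(pred, gt))
    norm = math.sqrt(sum(a * a for a in pred)) * math.sqrt(sum(b * b for b in gt))
    return 1.0 - abs(dot) / norm

print(normal_loss((0, 0, 1), (0, 0, -1)))  # 0.0: same line, opposite sign
print(normal_loss((1, 0, 0), (0, 1, 0)))   # 1.0: perpendicular normals
```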
  • An embodiment of the present application provides a three-dimensional model construction apparatus that has the function of implementing the three-dimensional model construction method of the first aspect.
  • This function can be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above function.
  • An embodiment of the present application provides a neural network training apparatus that has the function of implementing the neural network training method of the second aspect.
  • This function can be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above function.
  • An embodiment of the present application provides a three-dimensional model construction apparatus, including a processor and a memory interconnected through a line, where the processor invokes program code in the memory to execute the processing-related functions of the three-dimensional model construction method of any one of the above first aspects.
  • The apparatus may be a chip.
  • An embodiment of the present application provides a neural network training apparatus, including a processor and a memory interconnected through a line, where the processor invokes program code in the memory to execute the processing-related functions of the neural network training method of any one of the above second aspects.
  • The apparatus may be a chip.
  • an embodiment of the present application provides a three-dimensional model construction device.
  • the three-dimensional model construction device may also be called a digital processing chip or a chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface.
  • the instructions are executed by a processing unit, and the processing unit is configured to perform processing-related functions as in the first aspect or any of the optional embodiments of the first aspect.
  • an embodiment of the present application provides a neural network training device.
  • the neural network training device may also be called a digital processing chip or a chip.
  • the chip includes a processing unit and a communication interface.
  • The processing unit obtains program instructions through the communication interface; the program instructions are executed by the processing unit, and the processing unit is configured to perform processing-related functions as described in the second aspect or any of the optional embodiments of the second aspect.
  • An embodiment of the present application provides a computer-readable storage medium including instructions which, when executed on a computer, cause the computer to execute the method in any optional implementation of the first aspect or the second aspect.
  • an embodiment of the present application provides a computer program product including instructions, which, when run on a computer, enables the computer to execute the method in any optional implementation manner of the first aspect or the second aspect.
  • Fig. 1 is a schematic diagram of the main framework of artificial intelligence to which the present application is applied.
  • FIG. 2 is a schematic diagram of a system architecture provided by the application.
  • FIG. 3 is a schematic diagram of another system architecture provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a neural network training method provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a method for constructing a three-dimensional model provided by the application
  • FIG. 6 is a schematic structural diagram of a sub-manifold prediction network provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of specific steps performed by a submanifold prediction network provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of another three-dimensional model construction method provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of comparison of models output in various ways provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of an instance segmentation and a three-dimensional model provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of a three-dimensional model building apparatus provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of another neural network training device provided by the application.
  • FIG. 14 is a schematic structural diagram of another three-dimensional model construction device provided by the application.
  • FIG. 15 is a schematic diagram of the hardware execution flow when the server provided by an embodiment of the application executes the neural network training method.
  • FIG. 16 is a schematic structural diagram of a terminal according to an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of another terminal provided by an embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of a server according to an embodiment of the present application.
  • FIG. 19 is a schematic structural diagram of another server provided by an embodiment of the present application.
  • FIG. 20 is a schematic structural diagram of a chip according to an embodiment of the present application.
  • Figure 1 shows a schematic structural diagram of the main framework of artificial intelligence.
  • The artificial intelligence framework above is explained along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data go through a "data-information-knowledge-wisdom" refinement.
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (providing and processing technology) to the system's industrial ecology.
  • The infrastructure provides computing-power support for the artificial intelligence system, enables communication with the outside world, and is supported by the basic platform. Communication with the outside world is carried out through sensors. Computing power is provided by intelligent chips, that is, hardware acceleration chips such as the central processing unit (CPU), the neural-network processing unit (NPU), the graphics processing unit (GPU), the application-specific integrated circuit (ASIC), or the field-programmable gate array (FPGA). The basic platform includes a distributed computing framework, networks, and related platform guarantees and support, and can include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside to obtain data, and the data are provided for computation to the intelligent chips in the distributed computing system provided by the basic platform.
  • the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human's intelligent reasoning method in a computer or intelligent system, using formalized information to carry out machine thinking and solving problems according to the reasoning control strategy, and the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image identification, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution and productize intelligent information decision-making to achieve practical applications. The main application areas include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, smart cities, and so on.
  • The point cloud data of a map can be collected by laser, and an AR map can then be built from the point cloud data.
  • the point cloud data of the scene currently captured can be collected by a camera, and then a 3D model in the current scene can be constructed based on the point cloud data, and then applied to the terminal's image processing or games, thereby improving user experience.
  • the methods provided in this application have value in AI systems, terminal applications or cloud services.
  • The grid is reconstructed through various algorithms: for each point, the quadric energy is calculated from its neighbors' coordinates; the edge or point with the lowest energy is then selected through a priority queue, the selected element is deleted, and the local topology is maintained so that the local surface remains a manifold. After enough points are deleted, the simplified mesh is output, and the combination of meshes is the simplified 3D model.
  • The quadric describes local energy, so there is no guarantee that repeatedly removing the local element with the lowest energy will not destroy the overall structure. When a high simplification rate is required, the overall structure often has to be destroyed, so the geometric shape differs too much from the original data. This method is hereinafter referred to as common method 1.
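The greedy remove-the-cheapest strategy of common method 1 can be sketched on a 2D polyline, with the perpendicular distance to the neighbors' chord standing in for the quadric energy. This is a deliberate simplification of the real quadric error metric, and brute-force re-scoring replaces the incremental priority queue for brevity:

```python
def chord_error(a, b, c):
    """Perpendicular distance from b to the line a-c: the 'energy' of removing b."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    num = abs((cx - ax) * (ay - by) - (ax - bx) * (cy - ay))
    den = ((cx - ax) ** 2 + (cy - ay) ** 2) ** 0.5
    return num / den if den else 0.0

def simplify(points, keep):
    """Greedily remove interior points with the lowest removal energy."""
    pts = list(points)
    while len(pts) > keep:
        # Re-score every interior point each pass; a real implementation
        # maintains a priority queue and updates only affected neighbors.
        i, _ = min(
            ((i, chord_error(pts[i - 1], pts[i], pts[i + 1]))
             for i in range(1, len(pts) - 1)),
            key=lambda t: t[1],
        )
        del pts[i]
    return pts

line = [(0, 0), (1, 0.01), (2, 0), (3, 2), (4, 0)]
print(simplify(line, 4))  # [(0, 0), (2, 0), (3, 2), (4, 0)]
```

The nearly collinear point (1, 0.01) is removed first because its removal barely changes the shape, while the sharp corner at (3, 2) survives. The same local-only scoring also exhibits the weakness noted above: nothing prevents a long run of individually cheap removals from eroding the global structure.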
  • Large planes are selected by geometric plane fitting; each extracted plane is extended along its boundary until it intersects other planes, and the space partitioned by the planes is tetrahedralized.
  • Energy optimization based on graph cut is then performed, and all faces adjacent to both an inner and an outer tetrahedron are extracted as the final mesh.
  • This method assumes that the scanned point cloud consists of closed surfaces, whereas in practical application scenarios the 3D model may not be entirely closed; and if the point cloud is assumed to consist of planes, curved surfaces may actually be present. The method therefore generalizes weakly and its plane detection is not robust. It is hereinafter referred to as common method 2.
  • The features of the point cloud data are extracted through the PointNet++ network, which outputs, for each point, the instance, the normal vector, and the primitive type of the instance.
  • This method only reconstructs objects, so its generalization ability is weak; large scenes contain a large number of instances, so the method consumes a large amount of computing power and is difficult to implement. It is hereinafter referred to as common method 4.
  • The present application provides a neural network training method and a three-dimensional model construction method: by judging whether each point and its adjacent points lie in the same sub-manifold, the points on the boundaries of the sub-manifolds are screened out, and a simplified 3D model is then constructed from them.
  • a neural network training method provided in this application can be used to train a sub-manifold prediction network, and the sub-manifold prediction network can be used to identify whether each point in the input point cloud data and adjacent points are in the same in a submanifold.
  • the sub-manifold prediction network is obtained by training the neural network training method provided in this application, and then the point cloud data can be reconstructed in 3D based on the prediction result of the sub-manifold prediction network to obtain a reconstructed simplified 3D model.
  • the neural network training method and the three-dimensional model construction method provided by the present application are respectively a training phase and an inference phase, and in the inference phase, a step of using the prediction result of the neural network to perform three-dimensional reconstruction is added.
  • the neural network training method and the three-dimensional model construction method provided in this application can be applied to terminals, servers, cloud platforms, and the like.
  • For example, the sub-manifold prediction network can be trained in a server and then deployed in a terminal, which executes the three-dimensional model construction method provided by this application; or the network can be trained in the terminal, deployed in the terminal, and then used by the terminal to execute the method; or the network can be trained in a server, deployed in the server, and the server executes the three-dimensional model construction method provided by the present application.
  • the system architecture includes a database 230 , a client device 240 , a training device 220 and an execution device 210 .
  • the data collection device 260 is used to collect data and store it in the database 230
  • the training device 220 obtains the target model/rule 201 by training based on the data maintained in the database 230 .
  • The execution device 210 uses the target model/rule 201 trained by the training device 220 to process the data input by the client device 240, and feeds the output result back to the client device 240.
  • the training device 220 can be used to train the neural network and output the target model/rule 201 .
  • the execution device 210 can call data, codes, etc. in the data storage system 250 , and can also store data, instructions, etc. in the data storage system 250 .
  • the target model/rule 201 is the sub-manifold prediction network obtained by training in the following embodiments of the present application, please refer to the relevant descriptions in FIGS. 4-10 below for details.
  • the execution device 210 may further include a calculation module 211 for processing the input data using the target model/rule 201 .
  • the target model/rule 201 obtained by the training device 220 can be applied to different systems or devices. As shown in FIG. 2 , the target model/rule 201 can be deployed in the execution device 210 .
  • the execution device 210 is configured with a transceiver 212 (taking the I/O interface as an example) to perform data interaction with external devices, and the “user” can input data to the I/O interface 212 through the client device 240, for example,
  • the client device 240 may send the point cloud data that needs to be reconstructed for the three-dimensional model to the execution device 210 .
  • the transceiver 212 returns the three-dimensional model output by the computing module 211 to the client device 240, so that the client device 240 or other devices can use the three-dimensional model for other operations, such as image processing or use in games.
  • the training device 220 can obtain corresponding target models/rules 201 based on different data for different tasks, so as to provide users with better results.
  • the data input into the execution device 210 can be determined according to the input data of the user, for example, the user can operate in the interface provided by the transceiver 212 .
  • the client device 240 can automatically input data to the transceiver 212 and obtain the result. If the client device 240 automatically inputs data and needs to obtain the authorization of the user, the user can set the corresponding permission in the client device 240 .
  • the user can view the result output by the execution device 210 on the client device 240, and the specific presentation form can be a specific manner such as display, sound, and action.
  • the client device 240 can also act as a data collection end to store the collected data associated with the target task into the database 230 .
  • FIG. 2 is only an exemplary schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, etc. shown in the figure do not constitute any limitation.
  • the data storage system 250 is an external memory relative to the execution device 210 . In other scenarios, the data storage system 250 may also be placed in the execution device 210 .
  • the training process of a neural network is the process of learning how to control the spatial transformation, and more specifically, of learning the weight matrices.
  • the purpose of training a neural network is to make its output as close as possible to the expected value, so the predicted value of the current network can be compared with the expected value, and the weight vector of each layer of the neural network can then be updated according to the difference between the two (of course, the weight vectors are usually initialized before the first update; that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weights in the weight matrix are adjusted to reduce the predicted value; after continuous adjustment, the value output by the neural network approaches or equals the expected value.
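  • As an illustration only (not part of the patent), the weight-adjustment loop described above can be sketched as repeated gradient-descent steps on a single hypothetical weight:

```python
# Toy sketch of the update described above: when the predicted value is too
# high, the weight is adjusted downward, and repeated adjustment drives the
# network output toward the expected value. All names and values here are
# illustrative, not the patent's training code.
def train_step(w, x, expected, lr=0.1):
    predicted = w * x                        # forward pass of a one-weight "network"
    grad = 2.0 * (predicted - expected) * x  # derivative of squared error w.r.t. w
    return w - lr * grad                     # gradient-descent weight update

w = 2.0  # pre-configured (initialized) parameter before the first update
for _ in range(50):
    w = train_step(w, x=1.0, expected=0.5)
# after continuous adjustment, w * x is close to the expected value 0.5
```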
  • the difference between the predicted value and the expected value of the neural network can be measured by a loss function or an objective function.
  • taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference.
  • the training of a neural network can therefore be understood as the process of reducing this loss as much as possible. For the processes of updating the weights of the starting-point network and training the serial network in the following embodiments of the present application, reference may be made to this process, which will not be repeated below.
  • the target model/rule 201 is obtained by training with the training device 220, and in this embodiment of the present application the target model/rule 201 may be the sub-manifold prediction network mentioned in the present application.
  • the device that trains the sub-manifold prediction network and the device where the sub-manifold prediction network is deployed may be the same device; that is, the training device 220 and the execution device 210 shown in FIG. 2 may be the same device, or may be set in the same device.
  • the training device may be a terminal and the execution device may be a server, or the training device may be a server and the execution device may be that same server.
  • the neural network training method provided by the present application can be executed by the server cluster 310; that is, the sub-manifold prediction network is trained there, and the trained sub-manifold prediction network is then sent through the communication network to the terminal 301, so as to deploy the sub-manifold prediction network in the terminal 301.
  • the point cloud data collected by the camera or lidar of the terminal can be used as the input of the sub-manifold prediction network; the terminal processes the output of the sub-manifold prediction network and outputs the reconstructed, simplified 3D model.
  • the 3D model can be used for image processing on the terminal to identify the type of each object in the image, or the reconstructed 3D model can be applied to an AR game on the terminal, so that the AR game can be combined with the real scene where the user is located to improve the user experience.
  • the neural network training method provided in this application may be executed by a server, and the submanifold prediction network obtained by training may be deployed in the server.
  • the server can be used to execute the three-dimensional model construction method provided by this application: it can receive point cloud data sent by a client, or extract point cloud data from locally stored data, and then build a simplified 3D model using the three-dimensional model construction method provided by this application. If the point cloud data was sent by a client, the server can feed the simplified 3D model back to that client.
  • the embodiments of the present application involve applications related to neural networks and images.
  • some related terms and concepts of neural networks that may be involved in the embodiments of the present application are first introduced below.
  • a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes x_s and an intercept 1 as inputs; the output of the operation unit can be expressed as formula (1-1): h_{W,b}(x) = f(W^T x) = f(Σ_s W_s·x_s + b), where:
  • W_s is the weight of x_s;
  • b is the bias of the neural unit;
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network, converting the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
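  • The neural unit of formula (1-1) with a sigmoid activation can be sketched as follows (an illustrative example, not taken from the patent; the input and weight values are arbitrary):

```python
import math

def neural_unit(xs, ws, b):
    # output = f(sum_s W_s * x_s + b), where f is the sigmoid activation
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid maps the signal into (0, 1)

out = neural_unit(xs=[0.5, -1.0, 2.0], ws=[0.8, 0.2, 0.1], b=0.1)
```

The output can then feed the next layer, as described above.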
  • a neural network is a network formed by connecting a plurality of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • a convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers, and this feature extractor can be viewed as a filter.
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • in a convolutional layer of a convolutional neural network, a neuron can be connected to only some of the neurons in the adjacent layers.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels. Sharing weights can be understood to mean that the way image information is extracted is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights by learning during the training process of the convolutional neural network.
  • the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • the network for extracting features mentioned below in this application may include one or more layers of convolutional layers.
  • the network for extracting features may be implemented by using CNN.
  • Loss function: also known as a cost function, a measure of the difference between the prediction output of a machine learning model for a sample and the real value (also known as the supervision value) of the sample; that is, it is used to measure the difference between the predicted output of a machine learning model for a sample and the true value of that sample.
  • the loss function can generally include loss functions such as mean square error, cross entropy, logarithm, and exponential.
  • the mean squared error can be used as a loss function, defined as MSE = (1/n)·Σ_{i=1}^{n} (ŷ_i − y_i)², where ŷ_i is the predicted value and y_i the true value. A specific loss function can be selected according to the actual application scenario.
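  • For illustration (not from the patent text), the mean squared error between predicted and supervision values can be computed as:

```python
def mse(predicted, expected):
    # mean of the squared differences between prediction and supervision value
    n = len(predicted)
    return sum((p - t) ** 2 for p, t in zip(predicted, expected)) / n

loss = mse([0.9, 0.2, 0.4], [1.0, 0.0, 0.5])  # three toy sample predictions
```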
  • Stochastic gradient: the number of samples in machine learning is large, so each computed loss function is calculated from randomly sampled data, and the corresponding gradient is called a stochastic gradient.
  • BP (back propagation): an algorithm that propagates the loss from the output layer backward through the network to compute the gradient of each layer's weights, which are then updated accordingly.
  • Manifold: a geometric shape that is locally homeomorphic to Euclidean space in the neighborhood of every point.
  • Submanifold A subset of a manifold, which itself has the structure of a manifold.
  • a submanifold can include one or more primitives; for example, a primitive can be understood as a submanifold.
  • Point cloud: data formed by multiple points, each of which has corresponding information, such as depth, brightness, or intensity.
  • the present application provides a neural network training method and a three-dimensional model construction method. It can be understood that these correspond respectively to a training phase and an inference phase, and that in the inference phase a step is added in which the prediction result of the neural network is used for 3D reconstruction.
  • the method provided by the present application is divided into different stages, such as a training stage and an inference stage, for introduction.
  • the training phase is the neural network training method provided by the present application
  • the reasoning phase is the process of the three-dimensional model construction method provided by the present application.
  • FIG. 4 is a schematic flowchart of the neural network training method provided by the present application, as follows.
  • the training data may include point cloud data and labels corresponding to multiple points in the point cloud data.
  • the point cloud data may include data formed by a plurality of points.
  • the point cloud data can be collected by a camera, collected by a lidar, or read from stored data.
  • the label corresponding to each point may include an indication of whether the point pair formed by that point and any other point is in the same submanifold; that is, it indicates whether a certain point in the point cloud data and other points are in the same submanifold.
  • the label corresponding to each point may also include information on the tangent plane of the submanifold where each point is located, such as a normal vector.
  • the true value data of the point cloud data can be marked.
  • the true value data can include: taking each point together with a certain number of its adjacent points as point pairs, and marking with an identifier whether each point pair is in the same submanifold.
  • the ground truth data may also include: the tangent plane information of the primitive corresponding to each point in the point cloud data, where the primitive may be the one formed by each point and its nearest points; for example, each point may be represented by a coordinate o_i and the corresponding normal vector n_i.
  • the sub-manifold prediction network can be supervised with training data, so that the output of the sub-manifold prediction network is closer to the labels of the point cloud data.
  • the sub-manifold prediction network may be a pre-trained network that outputs a prediction result corresponding to each of the multiple points. This prediction can be used to indicate whether each point and its neighbors belong to the same submanifold.
  • the sub-manifold prediction network can extract features from the input point cloud data, and then identify whether each point and adjacent points belong to the same sub-manifold according to the extracted features. For example, features can be extracted in units of each point and a certain number of points around it, and then based on the extracted features, it can be identified whether each point and its neighbors belong to the same sub-manifold.
  • features can be extracted from the point cloud data in units of each point and its adjacent first preset number of points to obtain the local feature corresponding to each point; the point cloud data is also down-sampled to obtain down-sampled data with lower resolution, features are extracted from the down-sampled data to obtain the global feature of each point, and the feature corresponding to each point can be obtained by fusing the global feature and the local feature.
  • the point cloud data can be down-sampled multiple times, and after each down-sampling, features can be extracted from the corresponding feature map, so that feature extraction is performed iteratively; in this way the finally obtained feature can draw more on the features of adjacent points, improving the accuracy and richness of the feature and increasing the implicit information it contains.
  • the specific manner of down-sampling may include: dividing the point cloud data to obtain a plurality of voxels, where each voxel may include at least one point and the local feature of each point. This is equivalent to dividing the points into multiple grids, each grid including one or more points, each with its corresponding local feature. Feature extraction can then be performed in units of the points within each voxel and a second preset number of adjacent voxels to obtain the global feature corresponding to each point, where the second preset number is not less than the first preset number.
  • through down-sampling, features can be extracted over a larger range, combining the feature maps of each point and more adjacent points, so that the extracted global feature of each point includes more information.
  • the manner of fusing the local features and the global features may include: splicing the local features and the global features to form a feature of each point. For example, if the local feature is a 16-dimensional feature, and the global feature is also a 16-dimensional feature, a 32-dimensional feature can be obtained by splicing.
  • by combining the local features and the global features, features that better describe the information of each point can be obtained, increasing the information included in the feature of each point, so that the subsequent prediction result for each point is more accurate.
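  • The fusion by splicing can be sketched with NumPy (illustrative shapes only; the 16-dimensional sizes follow the example in the text, and the random values stand in for real network outputs):

```python
import numpy as np

N = 4
local_feats = np.random.rand(N, 16)   # per-point local features
global_feats = np.random.rand(N, 16)  # per-point global features

# splice (concatenate) along the feature axis: 16 + 16 -> 32 dims per point
fused = np.concatenate([local_feats, global_feats], axis=1)
```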
  • the normal vector of the plane where each point is located can also be output through the sub-manifold prediction network.
  • part of the extracted features can be used as the information of the tangent plane corresponding to each point, so that after the feature of each point is extracted, the information of the predicted tangent plane can be obtained based on some of the features.
  • the information of this tangent plane can be used to determine whether a pair of points is in the same submanifold.
  • the information of the tangent plane may include the normal vector of the tangent plane, and it may be determined whether the point pair is in the same submanifold through the offset between the normal vectors of the two points in the point pair.
  • the loss value between the prediction result corresponding to each point and the label of each point can be calculated, and the sub-manifold prediction network can then be updated backward using the loss value to obtain the updated sub-manifold prediction network.
  • the loss function may adopt a loss function such as mean square error, cross entropy, logarithm, and exponential.
  • the loss between the predicted result and the true value can be calculated through the loss function; backpropagation based on the loss then yields the gradient, which is the vector of derivatives of the loss with respect to the parameters of the sub-manifold prediction network, and the gradient is used to update the parameters of the sub-manifold prediction network.
  • step 404 Determine whether the convergence condition is met, if yes, go to step 405, if not, go to step 402.
  • after the updated sub-manifold prediction network is obtained, it can be judged whether the convergence conditions are met. If they are met, the updated sub-manifold prediction network can be output; that is, training of the sub-manifold prediction network is completed. If they are not met, the sub-manifold prediction network can continue to be trained; that is, step 402 is repeated until the convergence conditions are met.
  • the convergence condition may include one or more of the following: the number of times the sub-manifold prediction network has been trained reaches a preset number, the output accuracy of the sub-manifold prediction network is higher than a preset accuracy value, the average accuracy of the sub-manifold prediction network is higher than a preset average, or the training duration of the sub-manifold prediction network exceeds a preset duration.
  • the updated sub-manifold prediction network can be output.
  • if the neural network training method provided by the present application is executed by a server, then after a converged sub-manifold prediction network is obtained, the sub-manifold prediction network can be deployed in the server or in a terminal.
  • the sub-manifold prediction network can be trained to output whether the point pairs in the point cloud data are in the same sub-manifold, so that during inference the prediction results of the sub-manifold prediction network can be used to construct the boundaries of the sub-manifolds corresponding to the point cloud data and then build a simplified 3-D model; the accuracy of the obtained 3-D model can thus be improved, so that the sub-manifolds included in the 3-D model are richer in detail.
  • the updated sub-manifold prediction network is trained to output whether the point pairs in the point cloud data are in the same sub-manifold.
  • the trained sub-manifold prediction network can be used to predict whether the point pairs in the input point cloud data are in the same sub-manifold; the boundaries of the sub-manifolds formed by the point cloud data can then be determined according to the prediction results, and a simplified 3D model can be constructed from the points on the boundaries of the sub-manifolds. The flow of the three-dimensional model construction method provided by the present application is described in detail below.
  • FIG. 5 a schematic flowchart of a method for constructing a three-dimensional model provided by the present application is as follows.
  • the point cloud data includes data formed by a plurality of points, and can refer to the point cloud data mentioned in the aforementioned step 401; the difference is that the point cloud data in this step does not have labels, which will not be repeated here.
  • the sub-manifold prediction network may be a network obtained by training the method steps in the aforementioned FIG. 4 , using point cloud data as the input of the sub-manifold prediction network, and outputting prediction results of multiple points.
  • the prediction result may include an identifier indicating whether the point pairs in the plurality of points are in the same sub-manifold.
  • the prediction result may also include information about the tangent plane corresponding to each point, such as the normal vector of the tangent plane.
  • Information on this tangent plane can be used to help identify pairs of points in the same submanifold.
  • the normal vector can be used to identify whether the tangent planes of two points are the same or parallel, so as to help determine whether the two points are in the same submanifold.
  • a plurality of corner points can be screened out from the plurality of points, where the corner points include points on the boundaries of the sub-manifolds formed by the plurality of points.
  • from the prediction results, the points in the same sub-manifold can be known, and the points on the boundary of each sub-manifold can be selected as corner points.
  • for example, if the sub-manifold is a polygon, the points at both ends of each side of the polygon can be selected as corner points, and the connections between the corner points form a simplified outline of the sub-manifold.
  • the specific method of screening corner points may include: performing triangular meshing on the multiple points, that is, connecting every three adjacent points to form one or more triangular meshes; selecting from these meshes the points on the boundary of the same sub-manifold and extracting corner points from them. For example, when the sub-manifold is a polygon, the corner points of the polygon can be extracted; when the sub-manifold is a circle, a point can be selected as a corner point at fixed intervals along the boundary of the circle. Therefore, in the embodiments of the present application, a simplified sub-manifold shape can be obtained by selecting corner points, facilitating the subsequent construction of a simplified three-dimensional model.
  • after multiple corner points are obtained, the shape of each submanifold can be determined according to the corner points and then combined into a simplified three-dimensional model.
  • one or more Delaunay triangle meshes may be constructed by using the selected corner points and the geodesic distances between the corner points.
  • a simplified three-dimensional model can be obtained by merging the plurality of Delaunay triangular meshes.
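  • A minimal sketch of Delaunay triangulation over selected corner points, assuming SciPy is available (the patent triangulates using geodesic distances on the surface; this 2D example uses Euclidean distances for illustration only):

```python
import numpy as np
from scipy.spatial import Delaunay  # assumption: SciPy is available

# four corner points of a quadrilateral sub-manifold (2D for illustration)
corners = np.array([[0.0, 0.0], [1.0, 0.0], [1.2, 1.0], [0.0, 1.0]])
tri = Delaunay(corners)
# the quadrilateral is covered by two triangles sharing a diagonal;
# merging such meshes over all sub-manifolds yields the simplified model
```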
  • the sub-manifold prediction network can be used to predict whether point pairs are in the same sub-manifold; then, according to the prediction results of all points, a plurality of corner points on the boundaries of the sub-manifolds are screened out, so that the shape of each sub-manifold can be constructed from the corner points and the sub-manifolds combined into a simplified three-dimensional model, which can thus be constructed efficiently.
  • the present application predicts whether point pairs are in the same sub-manifold through the trained sub-manifold prediction network, thereby dividing the multiple points in the point cloud data into different sub-manifolds or primitives. Instance segmentation at the primitive level can thus be achieved very accurately, which improves the quality of the final 3D model and makes it more detailed while remaining simplified. Even in the presence of noise, the method provided by this application can adapt to different noise levels by training the sub-manifold prediction network, improving the accuracy of the output 3D model.
  • the structure of the sub-manifold prediction network can be shown in Fig. 6 .
  • the sub-manifold prediction network may include a PointNet++ network (hereinafter referred to as a PN network) 601 , a sparse three-dimensional convolution (spconv) 602 and a boundary discrimination network 603 .
  • PN networks can be used to extract features from point cloud data to obtain low-resolution local features.
  • Sparse 3D convolution can be used for feature extraction based on low-resolution local features to obtain global features.
  • UNet can be composed of one or more spconvs, which perform multiple convolutions and corresponding deconvolutions on the input features, so as to combine the features of more points near each point and output the global feature corresponding to each point.
  • the global feature and the local feature can be combined to be the feature corresponding to each point.
  • the network formed by the PN network and spconv can be called a sub-manifold nested network; that is, the sub-manifold nested network can be used to extract features from the point cloud data to obtain low-resolution local features, perform feature extraction on those local features to obtain global features, and combine the global and local features to obtain the feature corresponding to each point.
  • the input of the boundary discriminant network includes the features corresponding to each point (that is, the features composed of local features and global features), which are used to judge whether the point pairs composed of each point and adjacent points are in the same submanifold.
  • the input point cloud data can include N points, represented by {p_i} as an N×3 tensor, which identifies the position of each point in the point cloud data; each point also has corresponding information, such as a pixel value, brightness value, depth value, or intensity.
  • the point cloud data also has corresponding ground-truth data, such as manually labeled data, which includes: taking the j adjacent points of each point i as point pairs, for example taking each point and its 16 adjacent points as point pairs, and marking whether each point pair is in the same primitive or submanifold, for example with True/False.
  • the ground truth data may also include information about the tangent plane corresponding to the primitive or submanifold where each point is located, such as the closest point coordinate o_i and the corresponding normal vector n_i.
  • the N points are respectively paired with their adjacent points; as shown in Figure 7, each point is paired with multiple adjacent points through the K-Nearest Neighbor (KNN) classification algorithm: each point i is paired with its j adjacent points, yielding point pairs <i, j>.
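  • The KNN pairing step can be sketched with NumPy (illustrative only; a real implementation would use an accelerated KNN search, and the text's example uses k = 16 — k = 2 here just keeps the sketch small):

```python
import numpy as np

def knn_pairs(points, k):
    # pairwise squared distances between all points (N x N matrix)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbours
    # point pairs <i, j>: each point i paired with each of its k neighbours
    return [(i, int(j)) for i in range(len(points)) for j in nbrs[i]]

pts = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [5, 5, 5]], dtype=float)
pairs = knn_pairs(pts, k=2)
```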
  • the PN network can include a two-layer point cloud network (PointNet++, PN).
  • for each point, the information of the point and its 16 adjacent points is obtained; features can be extracted by applying a conv1D convolution, batchnorm, and maxpool operation over the 16 nearest neighbors.
  • because the number of adjacent points is 16, the output feature of each point at this layer is 16-dimensional. This is equivalent to extracting features in units of each point and its 16 adjacent points, obtaining the local feature corresponding to each point.
  • the N points are voxelized, and the N points are divided into M voxels, and each voxel includes one or more points. It is equivalent to down-sampling N points to obtain M voxels with lower resolution.
  • the feature of each voxel can be set to the average value of the features of the points it includes, or to the most frequent value, etc., and can be adjusted according to the actual application scenario. Assuming the N points are mapped to M voxels, the voxelized feature tensor is M*16.
  • UNet can perform multiple convolutions and deconvolutions on the input features: for example, a convolution operation is performed on the input M*16-dimensional features to extract features, the extracted features are then deconvolved, the result of the deconvolution serves as the input of the next convolution, and so on, finally outputting M*16-dimensional global features. This is equivalent to extracting features through UNet in units of a larger number (i.e., the second preset number) of points, obtaining global features that draw more on adjacent points.
  • inverse voxelization, or devoxelization, is then performed; that is, the M*16 features are fed back to each point to obtain the feature corresponding to each point.
  • the N points are divided into M voxels, and after the M*16 features are obtained, the feature of each point within a voxel is set to the feature of that voxel, outputting N*16 global features.
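  • The voxelization/devoxelization round trip can be sketched as follows (an illustrative NumPy version; voxel features are averaged, which is one of the options mentioned above, and all sizes are toy values):

```python
import numpy as np

def voxelize_devoxelize(coords, feats, voxel_size=1.0):
    # quantize coordinates so that nearby points share a voxel id
    keys = np.floor(coords / voxel_size).astype(int)
    _, vox_id = np.unique(keys, axis=0, return_inverse=True)
    vox_id = vox_id.ravel()
    m = vox_id.max() + 1
    # voxel feature = average of the features of the points it contains
    vox_feats = np.stack([feats[vox_id == v].mean(axis=0) for v in range(m)])
    # devoxelization: feed each voxel's feature back to its points
    return vox_feats[vox_id]

coords = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [3.0, 3.0, 3.0]])
feats = np.array([[1.0, 3.0], [3.0, 1.0], [5.0, 5.0]])
per_point = voxelize_devoxelize(coords, feats)
```

Here the first two points fall in the same voxel, so both receive that voxel's averaged feature.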
  • the combination can be done by splicing; that is, the local features N*16 and the global features N*16 are spliced together to obtain the final N*32 features.
  • a two-layer multilayer perceptron (MLP) can be set to increase the feature dimension from 32 to 64 and then reduce it back to 32, outputting the updated N*32 features. This is equivalent to fitting more complex data through the MLP and increasing the parameters of the sub-manifold prediction network, so that the output of the trained sub-manifold prediction network is more accurate.
  • if the label of each point also includes the information of the tangent plane of the submanifold where it is located, some of the N*32-dimensional features can also be used as the predicted tangent plane output by the submanifold prediction network.
  • take as an example the first 6 dimensions of the N*32-dimensional features being used as the predicted tangent-plane information output by the sub-manifold prediction network: the first 6 dimensions are defined as the predicted tangent-plane information, that is, the coordinates o_i of the closest point of each point and the corresponding normal vector n_i, while the last 26 dimensions can be understood as an implicit feature vector X_i of the local information of the primitive corresponding to the point.
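  • Splitting the N*32 features into tangent-plane information and latent dimensions can be sketched as (illustrative only; random values stand in for real network outputs):

```python
import numpy as np

N = 5
feats = np.random.rand(N, 32)  # stand-in for the per-point N*32 network features

tangent = feats[:, :6]   # first 6 dims: closest point o_i (3) + normal n_i (3)
latent = feats[:, 6:]    # last 26 dims: implicit local-primitive feature X_i
o_i, n_i = tangent[:, :3], tangent[:, 3:6]
```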
  • the loss function can be set to train the sub-manifold prediction network, so that the information of the first 6 dimensions in the N*32 dimension output by the sub-manifold prediction network is closer to or the same as the information of the tangent plane in the true value.
  • the loss function can be, for example, a regression loss function (L2-Loss) or mean-square error (MSE). This is equivalent to calculating the error between the features output by the sub-manifold prediction network and the normal vectors included in the ground truth data, then calculating the gradient of the sub-manifold prediction network according to the error, and updating the parameters of the sub-manifold prediction network according to the gradient.
  • the N points in the point cloud data form N*16 point pairs, and through the aforementioned steps the features corresponding to each point can be output as <o_i, n_i, X_i, o_j>, where o_i represents the position of the point, n_i represents the normal vector of the tangent plane of the primitive or submanifold where the point is located, X_i represents the feature corresponding to the point, and o_j represents the position of the other point that forms the point pair with o_i.
  • the input of the boundary discrimination network can be fed into a three-layer MLP (the output dimensions of the three layers are 64, 32, and 2, respectively), and the output is the score of whether each point is on a boundary.
• a loss value is calculated from the labeled boundaries using binary cross-entropy, and the sub-manifold prediction network is then updated by back-propagation based on the loss value.
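The three-layer MLP and the binary cross-entropy loss above can be sketched as a forward pass in NumPy. The input dimension (41 = o_i 3 + n_i 3 + X_i 32 + o_j 3) and the random weight initialisation are illustrative assumptions; only the layer widths 64, 32 and 2 come from the text.

```python
import numpy as np

def boundary_mlp(pair_feat, w1, w2, w3):
    """Three-layer MLP (output dims 64, 32, 2) scoring each point pair."""
    h = np.maximum(pair_feat @ w1, 0.0)   # layer 1 -> 64, ReLU
    h = np.maximum(h @ w2, 0.0)           # layer 2 -> 32, ReLU
    return h @ w3                         # layer 3 -> 2 (boundary / not)

def binary_cross_entropy(logits, labels):
    """BCE over the softmax probability of the 'boundary' class."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)          # softmax over the 2 classes
    p_b = p[:, 1]
    return float(-np.mean(labels * np.log(p_b + 1e-9)
                          + (1 - labels) * np.log(1 - p_b + 1e-9)))

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 41))          # 8 point pairs; 41-dim input is assumed
w1, w2, w3 = (rng.normal(size=s) * 0.1 for s in [(41, 64), (64, 32), (32, 2)])
labels = rng.integers(0, 2, size=8)       # ground-truth boundary labels
loss = binary_cross_entropy(boundary_mlp(feats, w1, w2, w3), labels)
```

In training, the gradient of `loss` would be back-propagated to update the boundary discrimination network, and through it the sub-manifold prediction network.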
• the above steps S1-1 to S1-8 can be repeated until the sub-manifold prediction network satisfies a convergence condition, which may include one or more of: the number of iterations reaching a preset number, the iteration duration reaching a preset duration, the output accuracy of the sub-manifold prediction network being higher than a first threshold, or the average output accuracy of the sub-manifold prediction network being higher than a second threshold. A sub-manifold prediction network that meets the requirements is thus obtained, to facilitate subsequent 3D model construction.
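One illustrative way to combine the convergence conditions named above into a single check (all threshold values below are assumptions, not values from the text):

```python
def converged(iteration, elapsed_s, accuracy, mean_accuracy,
              max_iters=10_000, max_seconds=3_600.0,
              acc_threshold=0.95, mean_acc_threshold=0.90):
    """Return True when any of the convergence conditions is met:
    iteration count, training duration, per-evaluation output accuracy
    (first threshold), or average output accuracy (second threshold)."""
    return (iteration >= max_iters
            or elapsed_s >= max_seconds
            or accuracy > acc_threshold
            or mean_accuracy > mean_acc_threshold)
```

The training loop of steps S1-1 to S1-8 would repeat while `converged(...)` is False.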
• combining PN and Spconv to form a sub-manifold nested network is equivalent to a feature extraction network with high resolution, which improves the recognition accuracy of the subsequent boundary discrimination network, so that the prediction of which points lie in the same sub-manifold is more accurate.
• a large number of real laser-collected point cloud data sets and the corresponding manually drawn 3D models can be constructed as the ground truth data.
• the sub-manifold prediction network trained in this way can accurately generate primitive-level prediction results for point cloud data using the nearest-point scheme, which improves the accuracy of predicting whether points are in the same sub-manifold, and in turn improves the accuracy of the subsequent 3D model construction, yielding a more accurate and cleaner 3D model.
• the training process of the sub-manifold prediction network is introduced above by way of example; the inference process is exemplified below.
• FIG. 8 is another schematic flowchart of the three-dimensional model construction method provided by the present application.
• the point cloud data includes N points and is similar to the point cloud data input to the sub-manifold prediction network in the aforementioned FIG. 7; the difference is that this point cloud data does not include ground truth data, which is not repeated here.
  • the point cloud data may include data collected by a lidar or a camera, which includes multiple points, and the multiple points form each instance in the collection scene.
• the three-dimensional model construction method provided by the present application can be executed by a terminal, and the point cloud data can be collected by a lidar or a depth-capable camera provided on the terminal; the point cloud data includes N points, which form each instance in the current scene.
  • the terminal may be a smart car.
  • the smart car is provided with multiple cameras and lidars.
• the cameras or lidars provided in the car can collect point cloud data of the surrounding environment and construct a simplified three-dimensional model through the following steps, so that the vehicle can quickly acquire surrounding environment information, such as the position and shape of obstacles, thereby improving the driving safety of the vehicle.
  • the steps performed by the sub-manifold prediction network can refer to the aforementioned steps S1-1 to S1-8, the difference is that the point cloud data does not have corresponding ground-truth data, and there is no need to train the sub-manifold prediction network, that is, there is no need to calculate the loss value, and similar steps are not repeated here.
• This step is equivalent to constructing a mesh based on the N points, that is, connecting the N points to form multiple triangular meshes, and, according to the sub-manifold prediction network's output on whether the point pairs formed by the N points are in the same sub-manifold, selecting from the multiple triangular meshes the edges adjacent to at least two instances, or adjacent to only a single triangle, as the boundaries of the sub-manifolds.
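The boundary-edge selection described above can be sketched as follows. The `(i, j, k)` triangle representation and the `instance_of` mapping are illustrative assumptions; the selection rule (edges touching at least two instances, or only a single triangle) is the one stated in the text.

```python
from collections import defaultdict

def boundary_edges(triangles, instance_of):
    """Select boundary edges from a triangular mesh.

    triangles:   list of (i, j, k) vertex-index triples
    instance_of: dict mapping triangle index -> predicted instance id
    Returns the edges adjacent to triangles of at least two different
    instances, plus edges adjacent to only a single triangle.
    """
    edge_instances = defaultdict(set)
    edge_count = defaultdict(int)
    for t_idx, (i, j, k) in enumerate(triangles):
        for a, b in ((i, j), (j, k), (k, i)):
            e = (min(a, b), max(a, b))           # undirected edge key
            edge_instances[e].add(instance_of[t_idx])
            edge_count[e] += 1
    return [e for e in edge_count
            if len(edge_instances[e]) >= 2 or edge_count[e] == 1]
```

An interior edge shared by two triangles of the same instance is thus excluded, while instance borders and mesh borders are kept.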
• if the distance exceeds the threshold, the corresponding point is added to the selected points and step 2 is repeated; otherwise, the polyline connecting the selected points is the final simplified polyline, which ensures that the distance between the simplified polyline and the original polyline does not exceed the threshold.
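This is the Ramer-Douglas-Peucker scheme mentioned later in the text; a minimal 2D sketch (the 2D restriction and function names are illustrative):

```python
import math

def rdp(points, threshold):
    """Ramer-Douglas-Peucker simplification: keep the endpoints, find the
    point farthest from the chord, and recurse while that distance exceeds
    the threshold, so the simplified polyline stays within `threshold` of
    the original."""
    def point_line_dist(p, a, b):
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            return math.hypot(px - ax, py - ay)
        return abs(dy * (px - ax) - dx * (py - ay)) / norm

    if len(points) < 3:
        return list(points)
    dists = [point_line_dist(p, points[0], points[-1]) for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] > threshold:
        # farthest point kept: recurse on both halves, merging at idx
        left = rdp(points[:idx + 1], threshold)
        right = rdp(points[idx:], threshold)
        return left[:-1] + right
    return [points[0], points[-1]]
```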
• the points in the same sub-manifold can thus be determined, that is, which points belong to each sub-manifold.
• the boundary of the sub-manifold is extracted, the points inside the sub-manifold are deleted, and the corner points on the boundary of the sub-manifold are preserved.
• an undirected graph is established through the 16 nearest neighbors (that is, each point is a vertex of the graph, and each point is connected by an edge to each of its 16 nearest points), the boundary discrimination network in the sub-manifold prediction network is used to determine whether each edge of the undirected graph connects different primitive instances, and the edges that do not belong to the same instance are deleted.
  • the undirected graph is divided into connected components using a standard flood fill algorithm.
  • the point set in each connected component is a primitive instance, and a submanifold can include one or more primitives. It can be understood that after the boundary of the submanifold is determined, the set of points within the boundary constitutes a primitive.
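The flood-fill extraction of connected components above can be sketched with a breadth-first search; the edge-list representation is an illustrative assumption.

```python
from collections import deque

def connected_components(n_points, edges):
    """Flood-fill (BFS) labelling of the undirected k-NN graph after
    cross-instance edges have been deleted.  Each connected component is
    one primitive instance; returns a component id per point."""
    adj = [[] for _ in range(n_points)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    comp = [-1] * n_points
    n_comp = 0
    for start in range(n_points):
        if comp[start] != -1:
            continue
        comp[start] = n_comp
        queue = deque([start])
        while queue:           # flood outward from the seed point
            u = queue.popleft()
            for v in adj[u]:
                if comp[v] == -1:
                    comp[v] = n_comp
                    queue.append(v)
        n_comp += 1
    return comp
```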
• the triangular mesh mentioned in step S2 can be used as an undirected graph, whose vertex set is all corner points in the triangular mesh and whose edge set is all the edges in the triangular mesh; the weight of each edge is the Euclidean distance between its two endpoints.
• the shortest path from any of the N points to the corner points extracted in S2 can be calculated, and the nearest corner point of each of the N points can be recorded, so that a Voronoi diagram can be output.
  • the corners can be connected to form a Delaunay triangular mesh based on the geodesic distance.
• the Dijkstra algorithm can be used: for example, given the N points and the edges connecting them, find the shortest path from each of the N points to each corner point. The shortest-path distance of each corner point is set to 0 and the corner points are added to the completed set; the distance of the remaining points is initialized to infinity. The corner points are added to a priority queue. The point with the smallest distance is repeatedly taken from the queue and all its adjacent edges are traversed; if the point's distance plus the edge length is less than an adjacent point's distance, the adjacent point's distance is updated and it is added to the priority queue. The algorithm ends when the queue is empty.
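The multi-source Dijkstra procedure described above can be sketched as follows; the `(u, v, length)` edge layout is an illustrative assumption. It returns both the geodesic distances and, per point, the nearest corner, i.e. the geodesic Voronoi labelling mentioned earlier.

```python
import heapq

def nearest_corner(n_points, edges, corners):
    """Multi-source Dijkstra: corner points start at distance 0, all
    other points at infinity, and the priority queue is drained until
    empty.  Returns (dist, owner) where owner[p] is the nearest corner
    of point p."""
    adj = [[] for _ in range(n_points)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [float("inf")] * n_points
    owner = [-1] * n_points
    heap = []
    for c in corners:                     # corners seed the queue at 0
        dist[c], owner[c] = 0.0, c
        heapq.heappush(heap, (0.0, c))
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale queue entry, skip
        for v, w in adj[u]:
            if d + w < dist[v]:           # relax the adjacent edge
                dist[v], owner[v] = d + w, owner[u]
                heapq.heappush(heap, (d + w, v))
    return dist, owner
```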
  • a simplified three-dimensional model can be obtained by combining the plurality of Delaunay triangle meshes.
• the sub-manifold prediction network can be used to determine whether the point pairs formed by multiple points in the point cloud data are in the same sub-manifold, which is equivalent to identifying the points in the same sub-manifold and extracting primitive instances; the boundaries of each sub-manifold are then determined, and Delaunay triangle meshes are obtained through geodesic-distance division.
• a simplified 3D model in units of primitives is thus obtained; that is, a geometric analysis system from mesh construction to vectorization of point clouds, combining tetrahedral mesh construction, boundary simplification and geodesic-distance Delaunay triangulation, yields a simplified 3D model based on primitives. Therefore, with the method provided in the present application, a simplified three-dimensional model can be output efficiently and accurately, realizing a lightweight three-dimensional model.
• FIG. 9 is a schematic diagram comparing the commonly used simplified three-dimensional model based on spatial division, the locally simplified model obtained through geometry optimization, and the model output by the three-dimensional model construction method provided by the present application.
  • the plane can be structured based on the method of spatial division to obtain a simplified three-dimensional model, that is, the aforementioned common method 2.
  • a simplified mesh can be reconstructed to form a simplified three-dimensional model, that is, the aforementioned common method 1.
• the sub-manifold prediction network is used to predict whether each point pair of the multiple points is in the same sub-manifold, the multiple points are connected to construct multiple triangular meshes, the boundary is selected based on the output of the sub-manifold prediction network, and the corner points are selected from the boundary, so as to realize simplification at the primitive level and obtain a more simplified and more accurate 3D model.
• the model structure can obviously be described both more simply and more accurately by the resulting three-dimensional model.
  • instance segmentation can be accurately achieved, as shown in Figure 10.
  • the instance formed by each point in the point cloud data can be identified through the sub-manifold prediction network, and the simplified 3D model can be obtained by combining the tetrahedral meshing, boundary simplification and geodesic distance Delaunay triangulation.
• with the sub-manifold prediction network, the method provided by the present application can further reduce the simplification ratio to 0.15%, thereby obtaining a simplified three-dimensional model that can accurately describe the instances.
• the point cloud data may contain noise, and the sub-manifold prediction network can be trained to improve the prediction effect in a noisy environment and improve the robustness of the solution.
• the method provided in this application uses the output of the sub-manifold prediction network and the flood fill algorithm to identify each primitive; even when there are a large number of primitives, it can still accurately build a 3D model, showing strong generalization ability.
  • the simplified three-dimensional model can be applied to various scenarios.
• the surrounding environment information can be collected by the lidar provided in the vehicle, a simplified three-dimensional model can be constructed by the method provided in this application, and the simplified three-dimensional model can be displayed on the display screen of the vehicle, so that the user can quickly learn the surrounding environment of the vehicle from the displayed simplified 3D model, improving the user experience.
• when the user uses an AR map in the terminal for navigation, the user can photograph the surrounding environment in real time with the terminal, quickly complete the construction of the three-dimensional model through the method provided in this application, and quickly identify instances on the display screen, so that a navigation path is displayed on the display screen based on the identified instances and the user can walk along the navigation path.
  • the method provided in this application can be deployed in a cloud platform.
• the data collected by the camera or lidar can be sent to the cloud platform through the client, and the cloud platform can quickly build a simplified 3D model and feed it back to the client, so that users can quickly obtain a simplified 3D model of a certain area.
  • the flow of the neural network training method and the three-dimensional model construction method provided by the present application is described in detail above, and the training apparatus and the three-dimensional model construction apparatus provided by the present application are introduced below.
  • the training device can be used to perform the steps of the neural network training method mentioned in the aforementioned FIGS. 4-8
  • the three-dimensional model building device can be used to perform the three-dimensional model building method mentioned in the aforementioned FIGS. 5-8 .
• FIG. 11 is a schematic structural diagram of a neural network training apparatus provided by the present application, as described below.
  • the neural network training device may include:
• the acquisition module 1101 is used for acquiring training data; the training data includes multiple points and a label corresponding to each point, and the label corresponding to each point includes the real result of whether that point and its adjacent points belong to the same sub-manifold;
• the output module 1102 is configured to use the multiple points as the input of the sub-manifold prediction network and output the prediction results of the multiple points; the prediction results include whether each point and its adjacent points among the multiple points belong to the same sub-manifold, wherein the sub-manifold prediction network extracts features from the point cloud data, obtains the feature corresponding to each of the multiple points, and determines whether each point and its adjacent points belong to the same sub-manifold according to the feature corresponding to each point;
  • the loss module 1103 is used to calculate the loss value according to the prediction result and the label corresponding to each point;
  • the updating module 1104 is configured to update the sub-manifold prediction network according to the loss value to obtain the updated sub-manifold prediction network.
• the output module 1102 is specifically configured to: extract features from the point cloud data in units of each point and an adjacent first preset number of points to obtain local features; down-sample the point cloud data at least once to obtain down-sampled data; extract features from the down-sampled data to obtain global features; and fuse the local features and global features to obtain the feature corresponding to each of the multiple points.
• the output module 1102 is specifically configured to: divide the point cloud data to obtain multiple voxels, each voxel including at least one point and the local feature of each of the corresponding at least one point; and perform feature extraction at least once, in units of each voxel and the points in an adjacent second preset number of voxels, to obtain a global feature.
• the output module 1102 is specifically configured to: determine the predicted normal vector corresponding to each point according to the feature corresponding to each point; and determine, according to the feature of each point and the predicted normal vectors of each point and its neighbors, whether each point and its neighbors belong to the same sub-manifold.
• the prediction result further includes the normal vector corresponding to each point, and the label of each point also includes the ground-truth normal vector corresponding to each point; the loss module 1103 is specifically configured to calculate the loss value according to the normal vector corresponding to each point and the ground-truth normal vector corresponding to each point.
• FIG. 12 is a schematic structural diagram of a three-dimensional model building apparatus provided by the present application.
  • the three-dimensional model building device may include:
  • the transceiver module 1201 is used to obtain point cloud data, and the point cloud data includes data formed by a plurality of points;
• the prediction module 1202 is used to input the point cloud data into the sub-manifold prediction network and output the prediction results of the multiple points; the prediction results are used to identify whether each point and its adjacent points among the multiple points belong to the same sub-manifold, wherein the sub-manifold prediction network extracts features from the point cloud data, obtains the feature corresponding to each of the multiple points, and determines whether each point and its adjacent points belong to the same sub-manifold according to the feature corresponding to each point;
  • the screening module 1203 is used to screen out a plurality of corner points from a plurality of points according to the prediction results of the plurality of points, and the plurality of corner points include points on the boundary of each sub-manifold formed by the plurality of points;
  • the building module 1204 is configured to build a three-dimensional model according to a plurality of corner points, and a grid formed by the plurality of corner points constitutes a manifold in the three-dimensional model.
• the prediction module 1202 is specifically configured to perform the following steps through the sub-manifold prediction network: extract features from the point cloud data in units of each point and a first preset number of adjacent points to obtain the local feature corresponding to each point; down-sample the point cloud data to obtain down-sampled data; extract features from the down-sampled data to obtain the global feature corresponding to each point; and fuse the local features and global features to obtain the feature corresponding to each of the multiple points.
  • the prediction module 1202 is specifically configured to: divide the point cloud data to obtain a plurality of voxels, and each voxel includes at least one point and a local feature of each point in the corresponding at least one point; Using each voxel in the plurality of voxels and points in the adjacent second preset number of voxels as units, feature extraction is performed to obtain a global feature.
  • the prediction module 1202 is specifically configured to: determine the normal vector corresponding to each point according to the feature corresponding to each point; according to the feature of each point, the normal vector of each point and the adjacent points The normal vector of , determines whether each point and its neighbors belong to the same submanifold.
• the screening module 1203 is specifically configured to: perform triangular meshing on the multiple points to form at least one triangular mesh; extract the boundary of each sub-manifold from the at least one triangular mesh; and extract the multiple corner points from the points on the boundary of the same sub-manifold extracted from the at least one triangular mesh.
  • the construction module 1204 is specifically configured to: construct at least one Delaunay triangle mesh by using multiple corner points and geodesic distances between the multiple corner points; merge at least one Delaunay triangle mesh to get a 3D model.
  • FIG. 13 is a schematic structural diagram of another neural network training apparatus provided by the present application, as described below.
  • the neural network training apparatus may include a processor 1301 and a memory 1302 .
  • the processor 1301 and the memory 1302 are interconnected by wires.
  • the memory 1302 stores program instructions and data.
  • the memory 1302 stores program instructions and data corresponding to the steps in the foregoing FIG. 4 to FIG. 10 .
  • the processor 1301 is configured to perform the method steps performed by the training apparatus shown in any of the foregoing embodiments in FIG. 4 to FIG. 10 .
  • the neural network training apparatus may further include a transceiver 1303 for receiving or sending data.
• Embodiments of the present application also provide a computer-readable storage medium in which a program is stored; when the program runs on a computer, the computer is caused to execute the steps in the neural network training method described in the embodiments shown in FIG. 4 to FIG. 10.
  • the aforementioned neural network training apparatus shown in FIG. 13 is a chip.
  • FIG. 14 a schematic structural diagram of another three-dimensional model building apparatus provided by the present application.
  • the three-dimensional model building apparatus may include a processor 1401 and a memory 1402 .
  • the processor 1401 and the memory 1402 are interconnected by wires.
  • the memory 1402 stores program instructions and data.
  • the memory 1402 stores program instructions and data corresponding to the steps in the foregoing FIG. 4 to FIG. 10 .
  • the processor 1401 is configured to execute the method steps executed by the three-dimensional model building apparatus shown in any of the foregoing embodiments in FIG. 4 to FIG. 10 .
  • the three-dimensional model building apparatus may further include a transceiver 1403 for receiving or sending data.
• Embodiments of the present application also provide a computer-readable storage medium in which a program is stored; when the program runs on a computer, the computer is caused to execute the steps in the three-dimensional model construction method described in the embodiments shown in FIG. 4 to FIG. 10.
  • the aforementioned three-dimensional model building device shown in FIG. 14 is a chip.
• the neural network training method can be deployed on a server, and of course can also be deployed on a terminal; the present application is described below taking deployment on a server as an example.
  • the hardware execution flow when the server executes the neural network training method is as shown in FIG. 15 .
  • the server 1500 may include neural network running hardware, such as a GPU/Ascend chip 1501, a CPU 1502, and a memory 1503 as shown in FIG. 15 .
  • the server-side GPU/Ascend chip 1501 can be used to read the ground truth data from the memory during training, and use the read ground truth data to train the neural network.
• the GPU/Ascend 1501 can use the CPU 1502 to read the ground truth data corresponding to the point cloud data from the database, wherein the data of each point can be divided into tangent plane information (such as the normal vector) and discriminant information on whether point pairs are in the same sub-manifold.
• the GPU/Ascend 1501 can train the sub-manifold nesting network 1504 using the tangent plane information in the ground truth data, and use the output of the sub-manifold nesting network 1504 together with the ground-truth discriminant information on whether point pairs are in the same sub-manifold to train the boundary discrimination network 1505, thereby realizing the training of the sub-manifold nesting network and the boundary discrimination network.
  • the trained submanifold prediction network can be deployed on a server or terminal for use in the inference phase.
• the hardware deployment of the two modules, the sub-manifold nesting network 1504 and the boundary discrimination network 1505, is determined by the application environment: if inference is performed on the server side, the sub-manifold nesting network 1504 and the boundary discrimination network 1505 are deployed on the server's GPU/Ascend chip; if inference is performed on the terminal, they are deployed on the terminal's GPU/D chip.
  • the three-dimensional model construction method provided by the present application can be deployed on a terminal or a server, and an exemplary introduction will be given below.
  • the terminal 1600 may include a GPU/D chip 1601 and a CPU 1602 .
  • the GPU/D chip 1601 can be used to run the sub-manifold prediction network, that is, the sub-manifold nesting network and the boundary discrimination network as shown in FIG.
  • the prediction result is equivalent to dividing multiple points in the point cloud data, and dividing each point into a corresponding sub-manifold.
  • the CPU 1602 can perform instance segmentation on the point cloud data according to the prediction result fed back by the GPU/D chip 1601, and identify each sub-manifold in the point cloud data.
• the CPU 1602 can also use the multiple points in the input point cloud data to construct a mesh, forming multiple triangular meshes; it then extracts the triangular meshes forming the boundary of the sub-manifold based on the instance segmentation result and filters out the corner points. The corner points are then triangulated based on geodesic distance to form multiple Delaunay triangles, and the sets are combined to obtain the output simplified 3D model.
  • the terminal can apply the simplified three-dimensional model to an AR map, an AR game, or other scenes.
• visual or depth information may be collected through a camera or a laser (Dtof) provided on the terminal, and the corresponding point cloud data is obtained through the SLAM algorithm. The GPU/D chip 1601 can then output, through the sub-manifold prediction network, the prediction of whether each point pair in the point cloud data is in the same sub-manifold, which is equivalent to outputting the instance segmentation result of the point cloud data.
• the CPU 1602 performs instance segmentation according to the output of the sub-manifold prediction network, that is, identifies the points in the same sub-manifold or the same primitive according to the network's output, and obtains the vectorized 3D model through the three-dimensional model construction method provided by this application.
• the 3D model can be directly sent to the GPU for rendering, and the instance segmentation results can be used as information in AR applications; for example, in AR games, structural information (ground, walls, etc.) can be used to build a reasonable AR game scene.
  • more functions can be endowed on the terminal, providing direct vectorized 3D model and primitive-level instance segmentation capability for 3D data of the terminal, bringing more terminal applications and generating value.
• mobile phones can provide ordinary users with furniture and indoor modeling capabilities, allowing users to share lightweight 3D data, and provide furniture design ideas or AR games based on environmental information to improve the user experience.
  • the server may include a GPU/Ascend chip 1801, a CPU 1802, and the like.
  • the processing flow of the server is similar to the processing flow of the terminal in the aforementioned FIG. 16, the difference is that the steps performed by the GPU/D chip 1601 are replaced by the GPU/Ascend chip 1801 to perform, and the steps performed by the CPU 1602 are replaced by the CPU 1802. It is not repeated here.
  • the server may also include a transceiver, such as an I/O interface, an antenna or other wired or wireless communication interfaces, etc.
• the input point cloud data can be received through the transceiver 1803, and the simplified three-dimensional model can be output.
• the server may be a server of a cloud platform or one of the servers in a server cluster, loaded with a GPU/Ascend 1801 for supporting neural network operations and a CPU 1802 for supporting instance segmentation and vectorized 3D models, etc. Users can transmit the collected point cloud data to the cloud platform through the network interface; the sub-manifold prediction network deployed in the server is run through the GPU/Ascend 1801 to predict local sub-manifold information, such as the points in the same sub-manifold, which is transferred to the CPU.
  • the CPU performs instance segmentation based on the sub-manifold information fed back by the GPU/Ascend, and constructs a vectorized simplified 3D model, and then feeds back the instance segmentation result and/or the simplified 3D model to the network interface according to user requirements.
• this is equivalent to a server that provides users with primitive-level instance segmentation and 3D model vectorization through the cloud platform, so that users can easily and efficiently realize primitive-level instance segmentation and 3D model vectorization through the cloud platform.
  • cloud platforms can provide automated 3D modeling services for 3D data producers. The manufacturer only needs to upload the point cloud reconstructed by laser scanning or photographing to the server of the cloud platform, and the server can automatically complete the algorithm process of vectorized CAD, thereby outputting the vectorized 3D model.
  • the server can receive point cloud data from the network interface, build tensors, and transfer the tensors to GPU/Ascend1801.
• the input point cloud data can include N points, represented by {p_i}, establishing an N×3 tensor.
• the GPU/Ascend chip 1801 and the CPU 1802 are used to perform instance segmentation at the primitive level on the input point cloud data. Specifically, the 16 nearest neighbors of each point in the point set are taken to form N*16 point pairs, and the GPU/Ascend 1801 then inputs the point cloud data into the sub-manifold nesting network to predict the local sub-manifolds.
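The 16-nearest-neighbor pairing step can be sketched as follows. A production system would use an accelerated nearest-neighbor structure (e.g. a KD-tree); this brute-force O(N^2) version and its function name are illustrative only.

```python
def knn_pairs(points, k=16):
    """For each of the N points, take its k nearest neighbours under
    Euclidean distance, yielding N*k point pairs as (i, j) index tuples."""
    pairs = []
    for i, p in enumerate(points):
        # squared Euclidean distance to every other point, then sort
        d = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i)
        pairs.extend((i, j) for _, j in d[:k])
    return pairs
```

With N points and k=16 this produces exactly the N*16 point pairs mentioned in the text.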
• the GPU/Ascend 1801 inputs each pair of adjacent sub-manifold features into the boundary discrimination network to determine whether it is a boundary, that is, to determine whether each edge of the undirected graph composed of the N points belongs to different primitives or sub-manifolds, delete the edges that do not belong to the same primitive or sub-manifold, or filter out the points that belong to the same primitive or sub-manifold. The CPU 1802 can then identify the points in the same sub-manifold or primitive according to the boundary output by the sub-manifold prediction network, and use the flood fill algorithm to perform primitive-level instance segmentation on the point cloud.
• the CPU 1802 triangulates the multiple points in the point cloud data, and then extracts from the triangular mesh the set of triangle edges corresponding to the boundary, that is, the edges adjacent to two different instances or adjacent to only a single triangle.
  • the set is then simplified by the Ramer-Douglas-Peucker algorithm to extract the corners.
  • the CPU 1802 triangulates the corner points of each primitive according to the prediction results transmitted by the GPU/Ascend 1801, as well as the triangle mesh and the corner points.
• the Dijkstra algorithm is used to find the nearest corner point of every point of the mesh, and for each triangle whose three nearest corner points differ from one another, the three related corner points are connected into a triangle, thus realizing the triangulation of the corner points and obtaining multiple Delaunay triangle meshes.
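Given the nearest-corner labelling from the Dijkstra step, forming the geodesic Delaunay triangles can be sketched as below; the list/dict layout is an illustrative assumption.

```python
def geodesic_delaunay(triangles, nearest):
    """Connect corner points into Delaunay triangles: for every mesh
    triangle (i, j, k) whose three vertices have three pairwise different
    nearest corners, connect those corners into one (deduplicated)
    triangle.  `nearest` maps a vertex index to its nearest corner id."""
    out = set()
    for i, j, k in triangles:
        ci, cj, ck = nearest[i], nearest[j], nearest[k]
        if ci != cj and cj != ck and ci != ck:
            out.add(tuple(sorted((ci, cj, ck))))   # canonical order
    return sorted(out)
```

Mesh triangles whose vertices fall inside a single geodesic Voronoi cell (or straddle only two cells) contribute nothing, so the result is a corner-level triangulation.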
  • the three-dimensional model construction method provided by the present application can be deployed in the server, so that the server can convert the input point cloud data into a vectorized three-dimensional model, and can realize instance segmentation at the primitive level, It has high robustness and efficiently obtains a better 3D model.
  • the embodiments of the present application also provide a neural network training device, which may also be called a digital processing chip or a chip.
  • the chip includes a processing unit and a communication interface.
• the processing unit obtains program instructions through the communication interface; the program instructions are executed by the processing unit, and the processing unit is configured to execute the method steps executed by the neural network training apparatus shown in any of the foregoing embodiments in FIG. 4 to FIG. 10.
  • the embodiment of the present application also provides a three-dimensional model construction device.
  • the three-dimensional model construction device may also be called a digital processing chip or a chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface; the program instructions are executed by the processing unit, and the processing unit is configured to execute the method steps performed by the three-dimensional model construction apparatus in any of the foregoing embodiments of FIG. 4 to FIG. 10.
  • the embodiments of the present application also provide a digital processing chip.
  • the digital processing chip integrates circuits and one or more interfaces for implementing the functions of the above-mentioned processor 1301/1401.
  • the digital processing chip can perform the method steps of any one or more of the foregoing embodiments.
  • if the digital processing chip does not integrate memory, it can be connected to external memory through a communication interface.
  • the digital processing chip implements the actions performed by the neural network training apparatus in the above embodiments according to the program code stored in the external memory.
  • Embodiments of the present application also provide a computer program product that, when running on a computer, causes the computer to execute the steps performed by the neural network training apparatus in the methods described in the embodiments shown in FIG. 4 to FIG. 10 .
  • Embodiments of the present application also provide a computer program product that, when running on a computer, causes the computer to execute the steps performed by the three-dimensional model building apparatus in the method described in the embodiments shown in FIGS. 4 to 10 .
  • the neural network training apparatus or three-dimensional model construction apparatus provided in the embodiments of this application may be a chip. The chip may include a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executed instructions stored in the storage unit, so that the chip in the server executes the neural network training method or the three-dimensional model building method described in the embodiments shown in FIG. 4 to FIG. 10 .
  • the storage unit may be a storage unit inside the chip, such as a register or a cache;
  • the storage unit may also be a storage unit located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • a general-purpose processor may be a microprocessor or any conventional processor.
  • FIG. 20 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • the chip can be implemented as a neural network processor NPU 200; the NPU 200 is mounted as a coprocessor on the host CPU, which allocates tasks to it.
  • the core part of the NPU is the arithmetic circuit 2003; the controller 2004 controls the arithmetic circuit 2003 to extract matrix data from memory and perform multiplication operations.
  • the arithmetic circuit 2003 includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 2003 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 2003 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 2002 and buffers it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches the data of matrix A from the input memory 2001, performs a matrix operation with matrix B, and stores the partial or final result of the matrix in the accumulator 2008.
  • Unified memory 2006 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 2002 through the direct memory access controller (DMAC) 2005.
  • Input data is also transferred to unified memory 2006 via the DMAC.
  • the bus interface unit (BIU) 2010 is used for interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 2009;
  • the bus interface unit 2010 is used by the instruction fetch memory 2009 to obtain instructions from external memory, and by the storage unit access controller 2005 to obtain the original data of input matrix A or weight matrix B from external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 2006 , the weight data to the weight memory 2002 , or the input data to the input memory 2001 .
  • the vector calculation unit 2007 includes multiple operation processing units and, when necessary, further processes the output of the arithmetic circuit with vector multiplication, vector addition, exponential operations, logarithmic operations, magnitude comparison, and so on. It is mainly used for non-convolutional/fully-connected layer computation in neural networks, such as batch normalization, pixel-level summation, and upsampling of feature planes.
  • the vector computation unit 2007 can store the processed output vectors to the unified memory 2006 .
  • the vector calculation unit 2007 may apply a linear and/or nonlinear function to the output of the arithmetic circuit 2003, for example linear interpolation of the feature planes extracted by a convolutional layer, or applying an activation to a vector of accumulated values to generate activation values.
  • the vector computation unit 2007 generates normalized values, pixel-level summed values, or both.
  • the vector of processed outputs can be used as activation input to the arithmetic circuit 2003, eg, for use in subsequent layers in a neural network.
  • the instruction fetch memory (instruction fetch buffer) 2009 connected to the controller 2004 is used to store the instructions used by the controller 2004;
  • the unified memory 2006, input memory 2001, weight memory 2002, and instruction fetch memory 2009 are all on-chip memories; the external memory is memory outside the NPU hardware architecture.
  • each layer in the recurrent neural network can be performed by the operation circuit 2003 or the vector calculation unit 2007 .
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the above-mentioned methods in FIGS. 4-10 .
  • the device embodiments described above are merely schematic; the units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, and can be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc;
  • a computer device (which may be a personal computer, a server, a network device, etc.) executes the methods described in the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, solid state disks (SSDs)), and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a three-dimensional model construction method, a neural network training method, and apparatuses in the field of artificial intelligence, used to achieve primitive-level instance segmentation and obtain a simplified three-dimensional model. The method includes: first, acquiring point cloud data, the point cloud data including multiple points and information corresponding to each point; then, inputting the point cloud data into a sub-manifold prediction network to obtain prediction results for the multiple points, the prediction results identifying whether each of the multiple points and its neighboring points belong to the same sub-manifold, where the sub-manifold prediction network extracts a feature corresponding to each point from the point cloud data and determines, based on each point's feature, whether the point and its neighboring points belong to the same sub-manifold; selecting multiple corner points from the multiple points according to the prediction results, the corner points including points on the boundaries of the sub-manifolds formed by the multiple points; and constructing a three-dimensional model from the corner points, the meshes formed by the corner points constituting the manifolds in the three-dimensional model.

Description

Three-dimensional model construction method, neural network training method, and apparatus
This application claims priority to Chinese patent application No. 202110280138.1, filed with the China National Intellectual Property Administration on March 16, 2021 and entitled "Three-dimensional model construction method, neural network training method, and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a three-dimensional model construction method, a neural network training method, and corresponding apparatuses.
Background
Reconstructing input three-dimensional reconstruction data into a vectorized three-dimensional model enables fast rendering and interaction on terminals. Three-dimensional models can be obtained in two ways: manual modeling and reconstruction using acquisition devices. Reconstruction from acquisition devices can faithfully restore a scene, but the data are often noisy and enormous in volume. Manual modeling typically fits the real environment abstractly with basic shapes (such as planes, cylinders, spheres, and cones), so the resulting models are small in volume and less precise but have a good structured representation.
For example, a PointNet++ network can extract features from a point cloud and output each point's instance ID, normal vector, primitive type of the instance, and so on. However, this approach can only recognize and reconstruct a small number of objects; for scenes with many instances the cost becomes prohibitive or the task infeasible, resulting in weak generalization.
Summary of the Invention
This application provides a three-dimensional model construction method, a neural network training method, and apparatuses, used to achieve primitive-level instance segmentation and obtain a simplified three-dimensional model.
In view of this, in a first aspect, this application provides a three-dimensional model construction method, including: first, acquiring point cloud data, the point cloud data including multiple points and information corresponding to each point, such as depth, pixel value, brightness value, or intensity value; then, inputting the point cloud data into a sub-manifold prediction network to obtain prediction results for the multiple points, the prediction results identifying whether each of the multiple points and its neighboring points belong to the same sub-manifold, where the sub-manifold prediction network extracts features from the point cloud data to obtain a feature corresponding to each of the multiple points and determines, based on each point's feature, whether the point and its neighboring points belong to the same sub-manifold; selecting multiple corner points from the multiple points according to the prediction results, the corner points including points on the boundaries of the sub-manifolds formed by the multiple points; and constructing a three-dimensional model from the corner points, the meshes formed by the corner points constituting the manifolds in the three-dimensional model.
Therefore, in the embodiments of this application, the sub-manifold prediction network can predict whether a point pair lies in the same sub-manifold, and corner points on the sub-manifold boundaries can then be selected from the multiple points according to the prediction results for all points, so that the shapes of the sub-manifolds can be constructed from the corner points and combined into a simplified three-dimensional model. In more detail, because the trained sub-manifold prediction network assigns the points of the point cloud to different sub-manifolds or primitives, primitive-level instance segmentation can be achieved with high precision, which improves the quality of the final three-dimensional model: the model is simplified yet richer in detail. Even in the presence of noise, the method provided by this application can adapt to different noise levels by training the sub-manifold prediction network, improving the accuracy of the output three-dimensional model.
In a possible implementation, extracting features from the point cloud data by the sub-manifold prediction network may include: extracting features from the point cloud data in units of each point and a first preset number of neighboring points, obtaining each point's local feature; downsampling the point cloud data to obtain downsampled data with a lower resolution than the point cloud data; extracting features from the downsampled data to obtain each point's global feature; and fusing the local and global features to obtain the feature corresponding to each of the multiple points.
In this embodiment, local features fused with local information and global features covering a larger range can both be extracted, so that the feature of each point carries more information and higher complexity, describing each point and its surroundings more accurately and making subsequent predictions more accurate.
In a possible implementation, downsampling the point cloud data may include: partitioning the point cloud data into multiple voxels, each voxel including at least one point and the local feature of each such point. Extracting features from the downsampled data may include: extracting features in units of the points within each voxel and the points within a second preset number of neighboring voxels, obtaining the global features, where the number of points within the second preset number of voxels is not smaller than the first preset number.
In this embodiment, downsampling enlarges the range of feature extraction, producing global features more strongly correlated with the surroundings, so that each point's feature carries more information.
In a possible implementation, determining from each point's feature whether that point and its neighboring points belong to the same sub-manifold may include: determining a normal vector for each point from its feature; and determining, from each point's feature, its normal vector, and the normal vectors of its neighbors, whether the point and its neighbors belong to the same sub-manifold.
Therefore, in this embodiment, the offset between normal vectors can identify whether a point pair lies in the same sub-manifold, which amounts to a geometric way of accurately deciding whether two points lie in the same sub-manifold, improving recognition accuracy.
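A minimal sketch of such a normal-based test, assuming both thresholds and the exact decision rule (the text only says the normal offset is used, not how):

```python
# Illustrative sketch (not the patent's exact rule): decide whether two
# points lie on the same sub-manifold by comparing their predicted normals
# and checking the point offset against the local tangent plane.
import numpy as np

def same_submanifold(p_i, n_i, p_j, n_j, angle_thresh=0.9, dist_thresh=0.05):
    n_i = n_i / np.linalg.norm(n_i)
    n_j = n_j / np.linalg.norm(n_j)
    # Normals should point (almost) the same way ...
    if abs(float(n_i @ n_j)) < angle_thresh:
        return False
    # ... and the offset p_j - p_i should lie (almost) in the tangent plane.
    return abs(float(n_i @ (p_j - p_i))) < dist_thresh

p = np.array([0.0, 0.0, 0.0])
n = np.array([0.0, 0.0, 1.0])
print(same_submanifold(p, n, np.array([1.0, 0.0, 0.0]), n))   # same plane
print(same_submanifold(p, n, np.array([1.0, 0.0, 0.5]), n))   # off the plane
```

The first pair lies in the z = 0 plane and passes; the second point is offset along the normal and is rejected.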
In a possible implementation, selecting corner points from the multiple points according to the prediction results includes: triangulating the multiple points to form at least one triangle mesh; extracting, from the at least one triangle mesh according to the prediction results, the boundaries belonging to the same sub-manifold; and extracting the corner points from the points on those boundaries.
Therefore, in this embodiment, the points of the point cloud data can be triangulated, the boundaries belonging to the same sub-manifold extracted from the triangle mesh according to the output of the sub-manifold prediction network, and points on those boundaries taken as corner points, so that a simplified three-dimensional model can be constructed from the corner points.
In a possible implementation, constructing the three-dimensional model from the corner points may include: constructing at least one Delaunay triangle mesh using the corner points and the geodesic distances between them; and merging the at least one Delaunay triangle mesh to obtain the three-dimensional model.
Therefore, in this embodiment, Delaunay triangle meshes can be constructed based on geodesic distance, yielding a simplified three-dimensional model efficiently and accurately.
In a second aspect, this application provides a neural network training method, including: first, acquiring training data, the training data including multiple points and a label for each point, each point's label including an identifier indicating whether that point and its neighboring points belong to the same sub-manifold; feeding the multiple points into a sub-manifold prediction network to obtain prediction results for the multiple points, the prediction results including whether each of the multiple points and its neighboring points belong to the same sub-manifold, where the sub-manifold prediction network extracts features from the point cloud data to obtain a feature for each of the multiple points and determines from each point's feature whether that point and its neighbors belong to the same sub-manifold; computing a loss value from the prediction results and each point's label; and updating the sub-manifold prediction network according to the loss value to obtain an updated sub-manifold prediction network.
Therefore, in this embodiment, a sub-manifold prediction network can be trained to output whether point pairs in point cloud data lie in the same sub-manifold, so that at inference time the boundaries of the sub-manifolds corresponding to the point cloud data can be constructed from the network's predictions, a simplified three-dimensional model can then be built, and the accuracy of the resulting model is improved, with richer sub-manifolds and finer detail.
In a possible implementation, extracting features from the point cloud data by the sub-manifold prediction network may include: extracting features from the point cloud data in units of each point and a first preset number of neighboring points, obtaining local features; downsampling the point cloud data at least once to obtain downsampled data with a lower resolution than the point cloud data; extracting features from the downsampled data to obtain global features; and fusing the local and global features to obtain the feature corresponding to each of the multiple points.
In this embodiment, local features fused with local information and global features covering a larger range can both be extracted, so that the feature of each point carries more information and higher complexity, describing each point and its surroundings more accurately and making subsequent predictions more accurate.
In a possible implementation, one of the at least one downsampling of the point cloud data may include: partitioning the point cloud data into multiple voxels, each voxel including at least one point and the local feature of each such point. Extracting features from the downsampled data includes: performing at least one feature extraction in units of the points within each voxel and the points within a second preset number of neighboring voxels, obtaining the global features, where the number of points within the second preset number of voxels is not smaller than the first preset number.
In this embodiment, downsampling enlarges the range of feature extraction, producing global features more strongly correlated with the surroundings, so that each point's feature carries more information.
In a possible implementation, determining from each point's feature whether that point and its neighboring points belong to the same sub-manifold may include: determining each point's predicted normal vector from its feature; and determining, from each point's feature, its predicted normal vector, and the predicted normal vectors of its neighbors, whether the point and its neighbors belong to the same sub-manifold.
Therefore, in this embodiment, the offset between normal vectors can identify whether a point pair lies in the same sub-manifold, which amounts to a geometric way of accurately deciding whether two points lie in the same sub-manifold, improving recognition accuracy.
In a possible implementation, the prediction results further include a normal vector for each point, and each point's label further includes that point's ground-truth normal vector; computing the loss value from the prediction results and each point's label may include: computing the loss value from each point's predicted normal vector and its ground-truth normal vector.
In this embodiment, by defining the prediction results to include normal vectors, the output of the trained sub-manifold prediction network can include normal vectors, enabling more accurate identification of whether a point pair lies in the same sub-manifold.
In a third aspect, an embodiment of this application provides a three-dimensional model construction apparatus having the function of implementing the three-dimensional model construction method of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.
In a fourth aspect, an embodiment of this application provides a neural network training apparatus having the function of implementing the neural network training method of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.
In a fifth aspect, an embodiment of this application provides a three-dimensional model construction apparatus, including a processor and a memory interconnected by a line, where the processor invokes program code in the memory to perform the processing-related functions of the three-dimensional model construction method of any implementation of the first aspect. Optionally, the three-dimensional model construction apparatus may be a chip.
In a sixth aspect, an embodiment of this application provides a neural network training apparatus, including a processor and a memory interconnected by a line, where the processor invokes program code in the memory to perform the processing-related functions of the neural network training method of any implementation of the second aspect. Optionally, the neural network training apparatus may be a chip.
In a seventh aspect, an embodiment of this application provides a three-dimensional model construction apparatus, which may also be called a digital processing chip or chip; the chip includes a processing unit and a communication interface, the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the processing-related functions of the first aspect or any optional implementation of the first aspect.
In an eighth aspect, an embodiment of this application provides a neural network training apparatus, which may also be called a digital processing chip or chip; the chip includes a processing unit and a communication interface, the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the processing-related functions of the second aspect or any optional implementation of the second aspect.
In a ninth aspect, an embodiment of this application provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform the method of any optional implementation of the first or second aspect.
In a tenth aspect, an embodiment of this application provides a computer program product containing instructions that, when run on a computer, causes the computer to perform the method of any optional implementation of the first or second aspect.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an artificial intelligence main framework to which this application applies;
FIG. 2 is a schematic diagram of a system architecture provided by this application;
FIG. 3 is a schematic diagram of another system architecture provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of a neural network training method provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of a three-dimensional model construction method provided by this application;
FIG. 6 is a schematic structural diagram of a sub-manifold prediction network provided by an embodiment of this application;
FIG. 7 is a schematic diagram of the specific steps performed by the sub-manifold prediction network provided by an embodiment of this application;
FIG. 8 is a schematic flowchart of another three-dimensional model construction method provided by an embodiment of this application;
FIG. 9 is a schematic comparison of models output by various approaches, provided by an embodiment of this application;
FIG. 10 is a schematic diagram of instance segmentation and a three-dimensional model provided by an embodiment of this application;
FIG. 11 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of this application;
FIG. 12 is a schematic structural diagram of a three-dimensional model construction apparatus provided by an embodiment of this application;
FIG. 13 is a schematic structural diagram of another neural network training apparatus provided by this application;
FIG. 14 is a schematic structural diagram of another three-dimensional model construction apparatus provided by this application;
FIG. 15 is the hardware execution flow when a server performs the neural network training method, provided by an embodiment of this application;
FIG. 16 is a schematic structural diagram of a terminal provided by an embodiment of this application;
FIG. 17 is a schematic structural diagram of another terminal provided by an embodiment of this application;
FIG. 18 is a schematic structural diagram of a server provided by an embodiment of this application;
FIG. 19 is a schematic structural diagram of another server provided by an embodiment of this application;
FIG. 20 is a schematic structural diagram of a chip provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. Clearly, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
First, the overall workflow of an artificial intelligence system is described. Referring to FIG. 1, which shows a schematic structural diagram of an artificial intelligence main framework, the framework is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the sequence of processes from data acquisition to processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; in this process the data undergo a refinement of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (technology for providing and processing information) to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, enables communication with the external world, and provides support through a base platform. It communicates with the outside through sensors. Computing power is provided by intelligent chips, i.e., hardware acceleration chips such as central processing units (CPUs), neural-network processing units (NPUs), graphics processing units (GPUs), application-specific integrated circuits (ASICs), or field-programmable gate arrays (FPGAs). The base platform includes platform assurance and support such as distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside to acquire data, and these data are provided to the intelligent chips in the distributed computing system of the base platform for computation.
(2) Data
The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involve graphics, images, speech, and text, as well as Internet-of-Things data from traditional devices, including business data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, and so on.
Machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, and so on of data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to conduct machine thinking and solve problems according to reasoning control strategies; typical functions are search and matching.
Decision-making refers to the process of making decisions after intelligent information has been reasoned about, usually providing functions such as classification, ranking, and prediction.
(4) General capabilities
After the data have undergone the above data processing, some general capabilities can further be formed based on the results of the processing, for example algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
(5) Intelligent products and industry applications
Intelligent products and industry applications are the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical applications. Their application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, and so on.
In some scenarios, such as intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, and smart cities, models need to be constructed. For example, when building an AR map, the point cloud data of the map can be collected by laser and then used to construct the AR map. Alternatively, on an intelligent terminal, the point cloud data of the currently photographed scene can be collected by a camera, a three-dimensional model of the current scene constructed based on that point cloud data, and the model then applied to image processing or games on the terminal, improving the user experience. That is, the method provided by this application has value in AI systems, terminal applications, cloud services, and so on.
For example, after the point cloud data to be reconstructed are obtained, a mesh can be reconstructed with various algorithms: for each point, the quadric energy is computed from its neighbor coordinates; the edge/point with the lowest energy is selected through a priority queue; the selected element is deleted while the local topology is maintained so that the local structure remains a manifold; after enough points have been deleted, the simplified mesh is output, and the combination is the simplified three-dimensional model. However, the quadric describes local energy, so there is no guarantee that deleting the locally lowest-energy element each time will not damage the overall structure. When a high simplification rate is required, the overall structure often has to be broken, making the geometry deviate too much from the original data. This approach is referred to below as common approach one.
As another example, after the point cloud data to be reconstructed are obtained, large planes are selected by geometric plane fitting; the extracted planes are extended along their boundaries until they intersect other planes, and the space partitioned by the planes is tetrahedralized. Based on the normal information of the point cloud and spatial continuity, graph-cut-based energy optimization is performed, and all faces adjacent to both interior and exterior tetrahedra are extracted as the final mesh. However, this approach assumes that the manifold formed by the point cloud is a closed surface, whereas in practical scenarios the three-dimensional model may not consist entirely of closed surfaces; and if the point cloud is assumed to be planar while curved surfaces actually exist, generalization is weak and plane detection is not robust. This approach is referred to below as common approach two.
As another example, after the point cloud data to be reconstructed are obtained, primitives are detected with the random sample consensus (RANSAC) algorithm; points close to a detected primitive are removed to obtain simplified primitives; if no new plane can be detected, all simplified primitives are output to compose the three-dimensional model. However, when the point cloud data are noisy, the three-dimensional model constructed this way is not robust, and the difficulty of detecting curved surfaces depends on the accuracy of the normal vectors, which are hard to predict accurately on real data. This approach is referred to below as common approach three.
As yet another example, after the point cloud data to be reconstructed are obtained, a PointNet++ network extracts the features of the point cloud data and outputs each point's instance, normal vector, and primitive type of the instance. However, this approach only reconstructs objects, generalizes poorly, and for larger scenes with a large number of instances it consumes substantial computing power and is difficult to implement. This approach is referred to below as common approach four.
Therefore, this application provides a neural network training method and a three-dimensional model construction method: by judging whether each point and its neighboring points lie in the same sub-manifold, the points on the sub-manifold boundaries are selected, and a simplified three-dimensional model is constructed from them.
Specifically, a sub-manifold prediction network can be trained with the neural network training method provided by this application; the network can identify whether each point in the input point cloud data and its neighboring points lie in the same sub-manifold. After the sub-manifold prediction network has been trained with the neural network training method provided by this application, three-dimensional reconstruction can be performed on the point cloud data from the network's predictions to obtain a reconstructed, simplified three-dimensional model.
It can be understood that the neural network training method and the three-dimensional model construction method provided by this application correspond to the training phase and the inference phase, respectively, and that the inference phase adds the step of using the neural network's predictions to perform three-dimensional reconstruction.
The neural network training method and three-dimensional model construction method provided by this application can be applied to terminals, servers, cloud platforms, and so on. For example, the sub-manifold prediction network can be trained on a server and then deployed on a terminal, with the terminal performing the three-dimensional model construction method provided by this application; or the network can be trained on a terminal and deployed on the terminal, which then uses it to perform the three-dimensional model construction method provided by this application; or the network can be trained on a server and deployed on a server, with the server performing the three-dimensional model construction method provided by this application, and so on.
First, referring to FIG. 2, a system architecture provided by this application is introduced.
The system architecture includes a database 230, a client device 240, a training device 220, and an execution device 210. A data collection device 260 collects data and stores it in the database 230; the training device 220 trains a target model/rule 201 based on the data maintained in the database 230. The execution device 210 uses the target model/rule 201 trained by the training device 220 to process the data input by the client device 240, and feeds the output result back to the client device 240.
The training device 220 can be used for neural network training and outputs the target model/rule 201.
The execution device 210 can call data, code, and the like in a data storage system 250, and can also store data, instructions, and the like in the data storage system 250.
As for how the training device 220 obtains the target model/rule 201 based on the data, where the target model/rule 201 is the sub-manifold prediction network trained in the following embodiments of this application, refer to the related descriptions of FIG. 4 to FIG. 10 below.
The execution device 210 may further include a computation module 211 for processing the input data with the target model/rule 201.
Specifically, the target model/rule 201 obtained by the training device 220 can be applied in different systems or devices; as shown in FIG. 2, it can be deployed in the execution device 210. In FIG. 2, the execution device 210 is configured with a transceiver 212 (an I/O interface is taken as an example) for data interaction with external devices. A "user" can input data to the I/O interface 212 through the client device 240; for example, in the following embodiments of this application, the client device 240 can send to the execution device 210 the point cloud data for which three-dimensional model reconstruction is needed.
Finally, the transceiver 212 returns the three-dimensional model produced by the computation module 211 to the client device 240, so that the client device 240 or other devices can use the model for other operations, such as image processing or games.
Going deeper, the training device 220 can obtain corresponding target models/rules 201 based on different data for different tasks, to provide users with better results.
In the situation shown in FIG. 2, the data input to the execution device 210 can be determined from the user's input; for example, the user can operate in the interface provided by the transceiver 212. In another case, the client device 240 can automatically input data to the transceiver 212 and obtain results; if automatic input requires the user's authorization, the user can set the corresponding permission in the client device 240. The user can view the results output by the execution device 210 on the client device 240, presented for example as display, sound, or action. The client device 240 can also act as a data collection end and store collected data associated with the target task in the database 230.
It should be noted that FIG. 2 is only an exemplary schematic diagram of a system architecture provided by an embodiment of this application, and the positional relationships between the devices, components, and modules shown therein do not constitute any limitation. For example, in FIG. 2 the data storage system 250 is external memory relative to the execution device 210, but in other scenarios the data storage system 250 can also be placed inside the execution device 210.
It can be understood that training a neural network is the process of learning how to control spatial transformations, more specifically learning the weight matrices. The purpose of training is to make the network's output as close as possible to the expected value; therefore the prediction of the current network can be compared with the expected value, and the weight vectors of each layer of the network updated according to the difference between them (of course, before the first update the weight vectors are usually initialized, i.e., parameters are preconfigured for each layer of the deep neural network). For example, if the network's prediction is too high, the weight values in the weight matrices are adjusted to lower the prediction, and adjustments continue until the network's output approaches or equals the expected value. Specifically, the difference between the prediction and the expected value can be measured by a loss function or an objective function. Taking the loss function as an example, a higher loss value indicates a greater difference, and training the neural network can be understood as the process of reducing the loss as much as possible. The processes of updating the weights of the starting network and of training the serial network in the following embodiments of this application can refer to this process and are not repeated below.
As shown in FIG. 2, the target model/rule 201 is obtained by training with the training device 220; in the embodiments of this application, the target model/rule 201 can be the sub-manifold prediction network mentioned in this application.
Optionally, the device that trains the sub-manifold prediction network and the device on which it is deployed may be the same device; that is, the training device 220 and the execution device 210 shown in FIG. 2 may be the same device or be set in the same device. For example, the training device may be a terminal and the execution device a server, or the training device may be a server and the execution device the same server, and so on.
Some possible system architectures provided by this application are introduced below by way of example.
Exemplarily, as shown in FIG. 3, the neural network training method provided by this application can be executed by a server cluster 310, i.e., the sub-manifold prediction network is trained there, and the trained sub-manifold prediction network is sent through a communication network to a terminal 301 so that it is deployed on the terminal 301. Point cloud data collected by the terminal's camera or lidar can be taken as input to the sub-manifold prediction network; the terminal processes the network's output and outputs the reconstructed, simplified three-dimensional model. The model can be used by the terminal for image processing, to recognize the types of objects in an image, or applied in the terminal's AR games, so that AR games can be combined with the real scene where the user is located, improving the user experience.
Exemplarily, the neural network training method provided by this application can be executed by a server, and the trained sub-manifold prediction network can be deployed on the server. The server can be used to perform the three-dimensional model construction method provided by this application: it can receive point cloud data sent by a client, or extract point cloud data from locally stored data, and then construct the simplified three-dimensional model through the three-dimensional model construction method provided by this application. If the point cloud data were sent by a client to the server, the server can feed the simplified three-dimensional model back to the client.
The embodiments of this application involve some applications related to neural networks and the image field. To better understand the solutions of the embodiments of this application, the relevant terms and concepts of neural networks that may be involved are introduced first.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept of 1 as inputs; the output of the operation unit may be as shown in formula (1-1):
h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s · x_s + b)    (1-1)
where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinearity into the neural network and convert the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next convolutional layer; the activation function can be a sigmoid function. A neural network is a network formed by joining many such single neural units together, i.e., the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field; the local receptive field can be a region composed of several neural units.
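As a minimal numeric illustration of formula (1-1), a sketch (not part of the patent) of one unit's computation with a sigmoid activation:

```python
# One neural unit: f(sum_s W_s * x_s + b) with a sigmoid activation f.
import math

def neuron(x, w, b):
    z = sum(ws * xs for ws, xs in zip(w, x)) + b   # weighted sum plus bias
    return 1.0 / (1.0 + math.exp(-z))              # sigmoid activation

print(round(neuron([1.0, 2.0], [0.5, -0.25], 0.0), 4))   # z = 0 -> 0.5
```

With the weighted sum canceling to zero, the sigmoid returns its midpoint value 0.5.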
(2) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers; the feature extractor can be regarded as a filter. A convolutional layer is a layer of neurons in the convolutional neural network that convolves the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights can be understood as the way image information is extracted being independent of position. A convolution kernel can be initialized as a matrix of random size, and during training of the convolutional neural network it can learn reasonable weights. In addition, the direct benefit of weight sharing is reducing the connections between the layers of the network while also lowering the risk of overfitting.
The feature-extraction networks mentioned below in this application may include one or more convolutional layers; exemplarily, such a network may be implemented with a CNN.
(3) Loss function: also called the cost function, a metric for comparing the difference between the prediction of a machine learning model for a sample and the ground truth; the smaller the difference, the better the model. Loss functions include mean squared error, cross-entropy, logarithmic, and exponential losses, among others. For example, mean squared error can be used as the loss function, defined as
L = (1/N) ∑_{i=1}^{N} (ŷ_i − y_i)².
The specific loss function can be chosen according to the actual application scenario.
(4) Gradient: the vector of derivatives of the loss function with respect to the parameters.
(5) Stochastic gradient: in machine learning the number of samples is large, so each loss is computed on randomly sampled data, and the corresponding gradient is called the stochastic gradient.
(6) Back propagation (BP): an algorithm that computes the gradients of the model parameters from the loss function and updates the model parameters.
(7) Manifold: a geometric object every local neighborhood of which is homeomorphic to Euclidean space.
(8) Sub-manifold: a subset of a manifold that itself has the structure of a manifold.
(9) Primitive: a basic shape, such as a plane, cylinder, cone, or sphere; a sub-manifold may include one or more primitives, e.g., one primitive may be understood as one sub-manifold.
(10) Embedding: a mapping from a high-dimensional space to a continuous low-dimensional space.
(11) Point cloud: data formed by multiple points, each point having corresponding information, such as depth, brightness, or intensity.
This application provides a neural network training method and a three-dimensional model construction method. It can be understood that they correspond to the training phase and the inference phase, respectively, and that the inference phase adds the step of using the neural network's predictions to perform three-dimensional reconstruction. The method provided by this application is introduced below divided into the training phase, the inference phase, and so on. The training phase is the neural network training method provided by this application, and the inference phase is the process of the three-dimensional model construction method provided by this application.
I. Training phase
Referring to FIG. 4, a schematic flowchart of the neural network training method provided by this application is as follows.
401. Acquire training data.
The training data may include point cloud data and labels corresponding to the multiple points in the point cloud data.
The point cloud data may include data formed by multiple points. The point cloud data may be collected by a camera or by lidar, or may be data read from stored data.
The label corresponding to each point may include an identifier of whether the point pair formed by that point and any other point lies in the same sub-manifold, i.e., it indicates whether a given point in the point cloud data and other points lie in the same sub-manifold.
Optionally, each point's label may further include information about the tangent plane of the sub-manifold where the point lies, such as a normal vector.
For example, after the point cloud data are collected, the ground-truth data of the point cloud data can be annotated. The ground truth may include: taking each point and a certain number of its neighboring points as point pairs, and marking with an identifier whether each pair lies in the same sub-manifold. The ground truth may further include: the tangent-plane information of the primitive corresponding to each point in the point cloud data, the primitive including the one formed by each point and its nearest point; for example, each point can be represented by a coordinate o_i and a corresponding normal vector n_i.
Usually, the training data can be used for supervised training of the sub-manifold prediction network so that the network's output gets closer to the labels of the point cloud data.
402. Input the point cloud data in the training data into the sub-manifold prediction network and output prediction results for the multiple points.
The sub-manifold prediction network may be a pre-trained network that outputs a prediction result for each of the multiple points; the prediction result can indicate whether each point and its neighboring points belong to the same sub-manifold.
Specifically, the sub-manifold prediction network can extract features from the input point cloud data and then identify from the extracted features whether each point and its neighboring points belong to the same sub-manifold. For example, features can be extracted in units of each point and a certain number of surrounding points, and the extracted features then used to identify whether each point and its neighbors belong to the same sub-manifold.
In a possible implementation, features can be extracted from the point cloud data in units of each point and a first preset number of neighboring points, yielding each point's local feature; the point cloud data can also be downsampled to obtain lower-resolution downsampled data, features extracted from the downsampled data to obtain each point's global feature, and the global and local features fused to obtain the feature corresponding to each point.
In extracting global features, the point cloud data can be downsampled multiple times, and after each downsampling, features can be extracted from the corresponding feature map of the current downsampling, so that feature extraction proceeds iteratively; the final features can thus draw on the features of more neighboring points, improving feature accuracy and complexity and increasing the implicit information the features carry.
Optionally, the specific downsampling may include: partitioning the point cloud data into multiple voxels, each voxel including at least one point and each such point's local feature; this amounts to dividing the multiple points into a grid of cells, each cell containing one or more points with their corresponding local features. Then features can be extracted in units of the points within each voxel and a second preset number of neighboring voxels, yielding each point's global feature, the number of points within the second preset number of voxels being not smaller than the first preset number.
Therefore, in this embodiment, downsampling allows features to be extracted over a larger range, combining the features of each point and more of its neighbors, so that the extracted global feature of each point carries more information.
Optionally, fusing local and global features may include concatenating them into each point's feature; for example, if the local feature is 16-dimensional and the global feature is also 16-dimensional, concatenation yields a 32-dimensional feature.
Therefore, in this embodiment, combining local and global features yields features that better describe each point, increasing the information each point's feature carries and making the subsequent prediction for each point more accurate.
In a possible implementation, the sub-manifold prediction network can also output the normal vector of the plane where each point lies. For example, part of the extracted feature can be taken as the tangent-plane information corresponding to each point, so that after each point's feature is extracted, the predicted tangent-plane information can be obtained from part of that feature. The tangent-plane information can be used to judge whether a point pair lies in the same sub-manifold; for example, it can include the tangent plane's normal vector, and the offset between the normal vectors of the two points in a pair can determine whether the pair lies in the same sub-manifold.
403. Update the sub-manifold prediction network according to the loss value between the prediction results and each point's label.
After the prediction results are obtained, the loss value between each point's prediction and each point's label can be computed, and the sub-manifold prediction network updated backward with that loss value, yielding the updated sub-manifold prediction network.
Specifically, the loss function can be mean squared error, cross-entropy, logarithmic, exponential, or another loss. For example, the loss between the prediction and the ground truth can be computed with the loss function, back propagation then performed based on that loss to compute the gradient, which corresponds to the derivative vector of the sub-manifold prediction network's parameters, and the gradient used to update the network's parameters.
404. Judge whether the convergence condition is met; if yes, perform step 405; otherwise perform step 402.
After the updated sub-manifold prediction network is obtained, whether the convergence condition is met can be judged. If it is met, the updated network can be output, completing the training of the sub-manifold prediction network. If it is not met, training of the sub-manifold prediction network can continue, i.e., step 402 is repeated until the convergence condition is satisfied.
The convergence condition may include one or more of the following: the number of training iterations of the sub-manifold prediction network reaches a preset number; the output accuracy of the network is higher than a preset accuracy value; the average accuracy of the network is higher than a preset average; the training duration exceeds a preset duration; and so on.
405. Output the updated sub-manifold prediction network.
After the convergence condition is met, the updated sub-manifold prediction network can be output. For example, if the neural network training method provided by this application is executed by a server, then after a converged sub-manifold prediction network is obtained, it can be deployed on a server or a terminal.
Therefore, in this embodiment, a sub-manifold prediction network can be trained to output whether point pairs in point cloud data lie in the same sub-manifold, so that at inference time the sub-manifold boundaries corresponding to the point cloud data can be constructed from the network's predictions, a simplified three-dimensional model then built, and the accuracy of the resulting model improved, with richer sub-manifolds and finer detail.
II. Inference phase
In the training phase, the updated sub-manifold prediction network was trained to output whether point pairs in point cloud data lie in the same sub-manifold. In the inference phase, the trained network can be used to predict whether point pairs in the input point cloud data lie in the same sub-manifold, the boundaries of the sub-manifolds formed by the point cloud data determined from the predictions, and a simplified three-dimensional model constructed from the points on those boundaries. The flow of the three-dimensional model construction method provided by this application is introduced in detail below.
Referring to FIG. 5, a schematic flowchart of the three-dimensional model construction method provided by this application is as follows.
501. Acquire point cloud data.
The point cloud data include data formed by multiple points; refer to the point cloud data mentioned in step 401 above, the difference being that the point cloud data in this step have no labels, which is not repeated here.
502. Input the point cloud data into the sub-manifold prediction network and output prediction results for the multiple points.
The sub-manifold prediction network may be the network trained through the method steps of FIG. 4 above; the point cloud data are taken as the input of the sub-manifold prediction network, and prediction results for the multiple points are output. The prediction results may include identifiers indicating whether point pairs among the multiple points lie in the same sub-manifold.
Optionally, the prediction results may also include tangent-plane information for each point, such as the normal vector of the tangent plane. The tangent-plane information can help identify whether a point pair lies in the same sub-manifold; for example, the normal vectors can be used to tell whether the tangent planes of two points are identical or parallel, helping to judge whether the two points lie in the same sub-manifold.
503. Select multiple corner points from the multiple points according to the prediction results.
After the prediction results output by the sub-manifold prediction network determine whether each point and its neighbors belong to the same sub-manifold, multiple corner points can be selected from the multiple points; the corner points include the points on the boundaries of the sub-manifolds formed by the multiple points.
Specifically, once the prediction results for the multiple points are obtained, the points lying within the same sub-manifold are known, and points on the sub-manifold boundary can be chosen as corner points. For example, if the sub-manifold is a polygon, the points at the two ends of each edge of the polygon can be chosen as corner points; the connections between these corner points can form the pattern of the simplified sub-manifold.
In a possible implementation, the specific way of selecting corner points may include: triangulating the multiple points, i.e., connecting every three adjacent points to form one or more triangle meshes; selecting, from the one or more triangle meshes according to the prediction results, the points on the boundary of the same sub-manifold; and extracting the corner points from them. For example, when the sub-manifold is a polygon, the corner points of the polygon can be extracted; when the sub-manifold is a circle, points spaced at intervals along the circle's boundary can be chosen as corner points. Therefore, in this embodiment, selecting corner points yields the shape of the simplified sub-manifold, facilitating the subsequent construction of the simplified three-dimensional model.
504. Construct a three-dimensional model from the corner points.
After the multiple corner points are obtained, the shape of each sub-manifold can be determined from them and combined into the simplified three-dimensional model.
In a possible implementation, one or more Delaunay triangle meshes can be constructed using the selected corner points and the geodesic distances between them; when there are multiple Delaunay triangle meshes, merging them yields the simplified three-dimensional model.
Therefore, in this embodiment, the sub-manifold prediction network can predict whether point pairs lie in the same sub-manifold, corner points on the sub-manifold boundaries can be selected from the multiple points according to the predictions for all points, the sub-manifold shapes constructed from the corner points, and the simplified three-dimensional model obtained efficiently by combination. In more detail, because the trained sub-manifold prediction network assigns the points of the point cloud to different sub-manifolds or primitives, primitive-level instance segmentation can be achieved very precisely, improving the quality of the final three-dimensional model: the model is simplified yet richer in detail. Even in the presence of noise, the method provided by this application can adapt to different noise levels by training the sub-manifold prediction network, improving the accuracy of the output three-dimensional model.
The flows of the neural network training method and three-dimensional model construction method provided by this application were introduced above; for ease of understanding, they are introduced in more detail below with a more concrete example.
First, exemplarily, the structure of the sub-manifold prediction network can be as shown in FIG. 6.
The sub-manifold prediction network may include a PointNet++ network (hereinafter, PN network) 601, sparse 3D convolution (spconv) 602, and a boundary discrimination network 603.
The PN network can be used to extract features from the point cloud data, yielding low-resolution local features.
The sparse 3D convolution can be used to extract features based on the low-resolution local features, yielding global features. For example, one or more spconv layers can form a UNet that performs multiple convolutions and corresponding deconvolutions on the input features, combining the features of more points around each point and outputting each point's global feature.
The combination of the global and local features can be the feature corresponding to each point.
For ease of understanding, the network formed by the PN network and spconv can be called the sub-manifold embedding network: it extracts features from the point cloud data to obtain low-resolution local features, extracts features from the low-resolution local features to obtain global features, and combines the global and local features to obtain each point's feature.
The input of the boundary discrimination network includes each point's feature (i.e., the feature composed of the local and global features), and it judges whether the point pair formed by each point and a neighboring point lies in the same sub-manifold.
With the sub-manifold prediction network of FIG. 6, and referring to FIG. 7, the specific steps performed by the network are illustrated below together with the training process. For ease of understanding, the output steps of the network are divided into multiple steps, denoted S1-1 to S1-8 below.
S1-1:
First, the input point cloud data may include N points, denoted {p_i}, as an N x 3 tensor. The point cloud data identify each point's position, and each point also has corresponding information, such as pixel value, brightness value, depth value, or intensity.
The point cloud data also have corresponding ground-truth data, such as manually annotated data, including: taking the j neighboring points of each point i as point pairs, e.g., taking each point and its 16 nearest neighbors as pairs, and marking whether each pair lies in the same primitive or sub-manifold, e.g., indicating with True/False whether the pair lies in the same sub-manifold.
Optionally, the ground truth may also include the tangent-plane information of the primitive or sub-manifold where each point lies, e.g., represented by the nearest-point coordinate o_i and the corresponding normal vector n_i.
The N points are each paired with their neighboring points; as shown in FIG. 7, the K-nearest-neighbor (KNN) classification algorithm pairs each point with multiple neighboring points, each point i taking j neighboring points to form point pairs <i, j>. For ease of understanding, the embodiments of this application take the 16 points around each point to form 16 pairs, i.e., the point cloud data form N*16 point pairs, as an example.
The PN network may include two PointNet++ (PN) layers. In each PN layer, the information of each point and its 16 neighboring points is acquired, and the output feature of each point at that layer can be extracted with conv1D convolution, batchnorm, and a maxpool over the 16 neighbors; since the number of neighboring points taken is 16, the output dimension is 16. This amounts to extracting features in units of each point and its 16 neighbors, yielding each point's local feature.
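The pairing step (each point with its 16 nearest neighbors) can be sketched as follows; this is an illustrative brute-force NumPy sketch, not the patent's implementation (a KD-tree would be used at scale):

```python
# Sketch: form the <i, j> point pairs by taking each point's k nearest
# neighbors; k = 16 as in the example above.
import numpy as np

def knn_pairs(points, k=16):
    # Brute-force pairwise squared distances; fine for a sketch.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # exclude each point itself
    nbrs = np.argsort(d2, axis=1)[:, :k]    # k nearest neighbors per point
    return [(i, int(j)) for i in range(len(points)) for j in nbrs[i]]

pts = np.random.rand(100, 3)
pairs = knn_pairs(pts, k=16)
print(len(pairs))   # N * 16 = 1600 pairs
```

With N = 100 points and k = 16, exactly N*16 ordered pairs are produced, matching the N*16 pair count in the text.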
S1-2:
The N points are then voxelized, dividing the N points into M voxels, each voxel containing one or more points. This amounts to downsampling the N points into M lower-resolution voxels. For example, the voxelization resolution can be set to r = 0.1 meters; a point with coordinates [x, y, z] is mapped to the voxel [floor(x/r), floor(y/r), floor(z/r)] (floor(d) denotes the largest integer not exceeding d). After voxelization, each voxel's feature can be set to the mean of the features of the points it contains, or to the most frequently occurring value, etc., which can be adjusted to the actual application scenario. Assuming the N points are mapped to M voxels, the voxelized feature tensor is M*16.
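The voxelization step above (floor(x/r) binning with mean pooling of features) can be sketched as follows; a NumPy sketch under assumed array shapes, not the patent's implementation:

```python
# Sketch: map each point [x, y, z] to voxel [floor(x/r), floor(y/r),
# floor(z/r)] and average the features of points in the same voxel.
import numpy as np

def voxelize(points, feats, r=0.1):
    keys = np.floor(points / r).astype(np.int64)          # N x 3 voxel ids
    uniq, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.ravel()
    vox_feats = np.zeros((len(uniq), feats.shape[1]))
    np.add.at(vox_feats, inv, feats)                      # sum per voxel
    counts = np.bincount(inv, minlength=len(uniq))
    vox_feats /= counts[:, None]                          # mean per voxel
    return uniq, vox_feats, inv   # inv maps points back ("devoxelize")

pts = np.array([[0.01, 0.0, 0.0], [0.05, 0.0, 0.0], [0.15, 0.0, 0.0]])
f = np.array([[1.0], [3.0], [5.0]])
_, vf, _ = voxelize(pts, f)
print(vf.ravel().tolist())   # first voxel averages 1 and 3 -> [2.0, 5.0]
```

The returned inverse index is exactly what the later devoxelization step (S1-4) needs to copy each voxel's feature back to its points.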
S1-3:
Then the UNet composed of spconv outputs each point's corresponding feature. Exemplarily, the UNet can perform multiple convolutions and deconvolutions on the input features: for example, it convolves the input M*16-dimensional features to extract features from the input data, deconvolves the extracted features, feeds the deconvolution result into the next convolution, and so on, finally outputting M*16-dimensional global features. This amounts to extracting features through the UNet in units of a larger number (the second preset number) of points, yielding global features more strongly correlated with the neighboring points.
S1-4:
Devoxelization (also called un-voxelization) follows, i.e., the M*16 features are fed back to each point, yielding each point's corresponding feature. For example, in the voxelization step above, the N points were divided into M voxels; after the M*16 features are obtained, the feature of each point within a voxel is set to that voxel's feature, outputting N*16 global features.
S1-5:
The global and local features are then combined; concatenation can be chosen as the combination method, i.e., the local features N*16 and global features N*16 are concatenated into the final N*32 features.
S1-6:
Optionally, to further fuse the local and global features, a two-layer multilayer perceptron (MLP) can be set up to lift the feature dimension from 32 to 64 and then reduce it back to 32, outputting updated N*32 features. This amounts to fitting more complex data with the MLP and increasing the sub-manifold prediction network's parameters, making the trained network's output more accurate.
Optionally, if each point's label also includes the tangent-plane information of the sub-manifold where it lies, part of the N*32 dimensions can serve as the predicted tangent-plane information output by the sub-manifold prediction network; here, taking the first 6 of the N*32 dimensions as the predicted tangent-plane information is used as an example. The first 6 dimensions are defined as the predicted tangent-plane information, i.e., the coordinate o_i of each point's nearest point and the corresponding normal vector n_i, while the last 26 dimensions can be understood as the implicit feature vector X_i of the local information of the primitive corresponding to the point. Correspondingly, in the training phase, a loss function can be set to train the sub-manifold prediction network so that the first 6 of the N*32 dimensions it outputs get closer to, or match, the ground-truth tangent-plane information; the loss function can be, for example, a regression loss (L2 loss) or mean-square error (MSE). This amounts to computing the error between the features output by the sub-manifold prediction network and the normal vectors included in the ground truth, computing from that error the gradient for updating the network, and updating the network's parameters with the gradient.
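The 32 → 64 → 32 fusion MLP of S1-6 can be sketched as follows; a plain-NumPy sketch with random placeholder weights (not trained parameters, and the ReLU activation is an assumption):

```python
# Sketch of the two-layer fusion MLP: lift the concatenated N x 32
# features to 64 dimensions, then project back to 32.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (32, 64)), np.zeros(64)   # placeholder weights
W2, b2 = rng.normal(0, 0.1, (64, 32)), np.zeros(32)

def fuse(features):                           # features: N x 32
    h = np.maximum(features @ W1 + b1, 0.0)   # lift to 64 dims, ReLU
    return h @ W2 + b2                        # project back to 32 dims

x = rng.normal(size=(5, 32))                  # e.g., N = 5 fused features
print(fuse(x).shape)                          # (5, 32)
```

The input and output widths match, so the MLP can be dropped in after the concatenation of S1-5 without changing downstream shapes.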
S1-7:
The N*32 features are then taken as the input of the boundary discrimination network, which outputs a prediction of whether each point pair lies in the same sub-manifold. The steps performed by the boundary discrimination network are, for example: the N points of the point cloud data form N*16 point pairs; the preceding steps output each point's corresponding feature, e.g., represented as <o_i, n_i, X_i, o_j>, where o_i is the point's position, n_i the normal vector of the tangent plane of the primitive or sub-manifold where the point lies, X_i the point's corresponding feature, and o_j the position of the other point forming a pair with o_i. The information to be discriminated is obtained, e.g., represented as <n_i, X_i, o_j - o_i>: for a sub-manifold centered at the origin with orientation n_i and feature X_i, judge whether the point with offset o_j - o_i lies in that sub-manifold. It can be understood as taking some point as the origin and marking out a certain range, e.g., forming a sub-manifold or primitive with radius equal to the distance to the farthest of the 16 neighboring points, and judging whether the remaining N-1 points lie in that sub-manifold or primitive; for example, outputting 0 indicates the pair is not in the same sub-manifold and 1 that it is, or conversely, outputting 1 indicates the pair is not in the same sub-manifold and 0 that it is.
S1-8:
During training, the output of the boundary discrimination network can be fed into a three-layer MLP (with layer output dimensions 64, 32, and 2), which outputs a score for whether each point is a boundary; the loss value is computed from the boundaries marked in the ground truth with binary cross-entropy, and the sub-manifold prediction network is then updated backward based on that loss value.
Steps S1-1 to S1-8 above can be repeated until the sub-manifold prediction network meets a convergence condition, which may include one or more of: the number of iterations reaching a preset number, the iteration duration reaching a preset duration, the network's output accuracy exceeding a first threshold, or the network's average output accuracy exceeding a second threshold; a sub-manifold prediction network meeting the requirements is thus obtained for the subsequent three-dimensional model construction.
Therefore, in this embodiment, combining PN and spconv to form the sub-manifold embedding network amounts to having a high-resolution feature extraction network, improving the recognition accuracy of the subsequent boundary discrimination network and making the predicted points lying in the same sub-manifold more accurate. Moreover, a large amount of real laser-scanned point cloud data and corresponding manually drawn three-dimensional models can be constructed as ground truth; analyzing the face structure of such a three-dimensional model amounts to analyzing the primitive instances of the hand-drawn model. A sub-manifold prediction network trained this way can accurately generate primitive-level predictions for point cloud data in a nearest-neighbor manner, improving the accuracy of predicting whether points lie in the same sub-manifold, and hence the accuracy of the subsequent three-dimensional model construction, yielding a more accurate and clearer three-dimensional model.
The training process of the sub-manifold prediction network was illustrated above; the inference process is illustrated below.
Referring to FIG. 8, another schematic flowchart of the three-dimensional model construction method provided by this application.
S0: Acquire the input point cloud data.
The point cloud data include N points; they are similar to the point cloud data input to the sub-manifold prediction network in FIG. 7 above, the difference being that these point cloud data include no ground-truth data, which is not repeated here.
The point cloud data may include data collected by lidar or a camera, containing multiple points that form the instances in the captured scene.
For example, the three-dimensional model construction method provided by this application can be executed by a terminal; the lidar or depth-capable camera on the terminal can collect point cloud data containing N points, the N points forming the instances in the current scene.
As another example, the terminal may be an intelligent vehicle equipped with multiple cameras and lidar; the point cloud data of the surrounding environment can be collected by the onboard cameras or lidar, and a simplified three-dimensional model constructed through the following steps, so that the vehicle quickly obtains information about the surrounding environment, such as the positions and shapes of obstacles, improving driving safety.
S1: Sub-manifold embedding.
The sub-manifold prediction network judges whether each point pair formed by the N points lies in the same sub-manifold or the same primitive, which amounts to identifying whether the N points lie in the same surface. The steps performed by the network can refer to steps S1-1 to S1-8 above, the differences being that the point cloud data have no corresponding ground truth and the network need not be trained, i.e., no loss value needs to be computed; similar steps are not repeated here.
S2: Mesh construction and polyline simplification.
This step amounts to constructing a mesh over the N points, i.e., connecting the N points to form multiple triangle meshes; according to the sub-manifold prediction network's predictions of whether the point pairs formed by the N points lie in the same sub-manifold, the edges adjacent to at least two instances or adjacent to only a single triangle are selected from the multiple triangle meshes as the sub-manifold boundaries.
The Ramer-Douglas-Peucker algorithm can then be used to extract the simplified corner points, which amounts to taking the endpoints on the boundaries as corner points; or, if the sub-manifold is a circle, a point can be chosen every certain distance along the circle's boundary as a corner point, and so on. The corner-extraction steps can be, for example: 1. For N points connected end to end, initially take only the two end points. 2. Connect the currently selected points end to end into a polyline, and compute the farthest distance from the unselected points to the polyline. 3. If the farthest distance exceeds a threshold, add the corresponding point to the selected points and repeat step 2; otherwise the polyline connecting the selected points end to end is the final simplified polyline, which guarantees that the simplified polyline deviates from the original polyline by no more than the threshold.
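The three steps above can be sketched as a recursive Ramer-Douglas-Peucker implementation (2-D version for brevity; an illustrative sketch, not the patent's code):

```python
# Sketch of Ramer-Douglas-Peucker polyline simplification: keep the
# farthest point from the end-to-end segment whenever it exceeds eps,
# and recurse on both halves.
import numpy as np

def rdp(points, eps):
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points.tolist()
    a, b = points[0], points[-1]
    ab = b - a
    mid = points[1:-1]
    # Perpendicular distance of every interior point to segment a-b.
    d = np.abs(ab[0] * (mid[:, 1] - a[1]) - ab[1] * (mid[:, 0] - a[0]))
    d = d / (np.hypot(ab[0], ab[1]) + 1e-12)
    i = int(np.argmax(d)) + 1
    if d[i - 1] > eps:                   # farthest point kept; recurse
        return rdp(points[: i + 1], eps)[:-1] + rdp(points[i:], eps)
    return [a.tolist(), b.tolist()]      # all interior points dropped

print(rdp([[0, 0], [1, 0.1], [2, -0.1], [3, 0.05], [4, 0]], eps=0.5))
print(rdp([[0, 0], [2, 2], [4, 0]], eps=0.5))
```

The first polyline is nearly straight and collapses to its two endpoints; the second has a peak farther than eps from the chord, so the peak is retained.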
It can be understood that, from the sub-manifold prediction network's predictions of whether the point pairs formed by the N points lie in the same sub-manifold, the points lying within the same sub-manifold can be determined, along with which points lie on the sub-manifold boundary, so that the sub-manifold boundaries can be extracted, or the interior points of the sub-manifold deleted while the corner points on its boundary are retained.
For example, after each point p_i of the point cloud data is moved to the o_i output by the sub-manifold prediction network, an undirected graph is built over 16 nearest neighbors (i.e., each point is a graph vertex, connected by an edge to each of its 16 nearest points), and the boundary discrimination network within the sub-manifold prediction network judges whether each edge of the undirected graph belongs to different primitive instances; edges not belonging to the same instance are deleted. Next, the standard flood fill algorithm partitions the undirected graph into connected components; the set of points within each connected component is one primitive instance, and one sub-manifold may include one or more primitives. It can be understood that, once the sub-manifold boundary is determined, the point set within the boundary composes one primitive.
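The flood-fill partitioning described above can be sketched as a BFS over the pruned graph (an illustrative sketch; the patent only names the standard flood fill algorithm):

```python
# Sketch: after boundary-crossing edges have been deleted, each connected
# component of the remaining undirected graph is one primitive instance.
from collections import deque

def flood_fill_components(n, edges):
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    comp = [-1] * n
    c = 0
    for s in range(n):
        if comp[s] != -1:
            continue
        comp[s] = c
        q = deque([s])
        while q:                       # BFS flood fill from seed s
            u = q.popleft()
            for v in adj[u]:
                if comp[v] == -1:
                    comp[v] = c
                    q.append(v)
        c += 1
    return comp

# Two instances: {0, 1, 2} and {3, 4}; the boundary edge (2, 3) was removed.
print(flood_fill_components(5, [(0, 1), (1, 2), (3, 4)]))   # [0, 0, 0, 1, 1]
```

Each distinct label in the output is one primitive instance.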
S3: Geodesic-distance triangulation.
Specifically, the triangle mesh mentioned in step S2 can be taken as an undirected graph whose vertex set is all the vertices in the triangle mesh, whose edge set is all the edges in the triangle mesh, and where the distance along an edge is the Euclidean distance between its two endpoints. For each primitive, the shortest path from any of the N points to the corner points extracted in S2 can be computed, and the nearest corner point of each of the N points recorded, from which a Voronoi diagram can be output. For each triangle in the Voronoi diagram, if the nearest corner points of its three vertices are mutually different, the corner points can be connected to form a geodesic-distance-based Delaunay triangle mesh.
Computing the shortest path from any of the N points to the corner points extracted in S2 can use the Dijkstra algorithm: for example, given the N points and the edges connecting them, find the shortest paths from the N points to each corner point. Set the shortest-path distance of the corner points to 0 and add them to the completed point set; the shortest-path distances of the remaining points are infinite. Add the corner points to a priority queue; repeatedly pop the point with the smallest distance from the queue and traverse all adjacent edges; if that point's distance plus the edge length is less than a neighbor's distance, update the neighbor's distance and add it to the priority queue. The algorithm ends when the queue is empty.
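The multi-source Dijkstra described above (all corner points seeded at distance 0, each vertex recording its nearest corner) can be sketched as follows; an illustrative sketch with assumed data structures, not the patent's implementation:

```python
# Sketch: multi-source Dijkstra over the mesh graph. Every corner point
# starts at distance 0; each vertex inherits the corner its shortest
# path originated from, yielding the geodesic nearest-corner labeling.
import heapq

def nearest_corners(n, edges, corners):
    adj = [[] for _ in range(n)]
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = [float("inf")] * n
    near = [-1] * n
    heap = []
    for c in corners:                   # corner points seed the queue
        dist[c] = 0.0
        near[c] = c
        heapq.heappush(heap, (0.0, c))
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                 # stale queue entry
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                near[v] = near[u]       # inherit the source corner
                heapq.heappush(heap, (d + w, v))
    return near

# Path graph 0-1-2-3-4 with unit edges and corners 0 and 4.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
print(nearest_corners(5, edges, corners=[0, 4]))   # [0, 0, 0, 4, 4]
```

The resulting labeling is exactly the geodesic Voronoi partition used in S3 to decide which triangles connect three distinct corners.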
S4: Plane combination.
After the multiple Delaunay triangle meshes are obtained from the simplified corner points in step S3, combining the multiple Delaunay triangle meshes yields the simplified three-dimensional model.
S5: Output the vectorized three-dimensional model.
Through S0-S4 above, the simplified three-dimensional model is obtained and output as the final three-dimensional model.
Therefore, in this embodiment, the sub-manifold prediction network can judge whether the point pairs formed by the multiple points of the point cloud data lie in the same sub-manifold, which amounts to identifying the points lying in the same sub-manifold and extracting the primitive instances; the boundary of each sub-manifold is then determined, Delaunay triangle meshes are obtained by geodesic-distance partitioning, and combining the Delaunay triangle meshes yields a simplified three-dimensional model in units of primitives. In other words, this is a geometric analysis system from point-cloud meshing to vectorization that combines tetrahedral meshing, boundary simplification, and geodesic-distance Delaunay triangulation to obtain a primitive-based simplified three-dimensional model. Hence, through the method provided by this application, a simplified three-dimensional model can be output efficiently and accurately, achieving a lightweight three-dimensional model.
为进一步便于理解,下面对本申请提供的三维模型构建方法的输出进行更详细的介绍。
如图9所示,为常用的基于空间剖分简化的三维模型、通过几何最优的局部简化得到的模型和通过本申请提供的三维模型构建方法输出的模型的对比示意图。
其中,在得到点云数据之后,常用地,可以基于空间剖分的方式来结构化平面,得到简化的三维模型,即前述的常用方式二。或者,还可以基于几何最优的局部简化,重建得到简化后的网格,组成简化后的三维模型,即前述的常用方式一。
而本申请通过子流形预测网络,预测多个点中的各个点对是否处于同一个子流形中,并将多个点进行连接构建多个三角形网格,并使用子流形预测网络的输出结果来选择边界,并从边界上选取角点,从而实现基于图元级别的简化,得到更为简化且表达更准确的三维模型。如图9中所示,通过本申请提供的方式,可以得到明显简化、且能更准确描述模型结构的三维模型。例如,通过本申请提供的子流形预测网络,可以准确地实现实例分割,如图10所示。可以通过子流形预测网络识别出点云数据中各个点形成的实例,并结合四面体剖分构网、边界简化和测地距离Delaunay三角化,得到简化后的三维模型。相对于前述的常用方式三,本申请提供的方式可以通过子流形预测网络,将简化比进一步下降到0.15%,从而得到简化而又可以准确描述实例的三维模型。
并且,本申请提供的方法中,即使点云数据中具有噪声,也可以通过对子流形预测网络进行训练来提高在具有噪声的环境下的预测效果,提高本方案的鲁棒性。此外,即使点云数据中具有大量的图元实例,也可以通过本申请提供的方式,如使用子流形预测网络的输出,并采用flood fill算法来识别各个图元,准确构建出三维模型,泛化能力强。
并且,在得到简化的三维模型之后,可以将该简化的三维模型应用在各种场景。
例如,在自动驾驶场景中,可以通过车辆中设置的激光雷达采集周围的环境信息,并通过本申请提供的方法,来构建简化的三维模型,在车辆的显示屏中显示该简化的三维模型,使用户可以通过显示的简化的三维模型,快速了解车辆的周边环境,提高用户体验。
又例如,用户可以在终端中使用AR地图,当用户使用AR地图进行导航时,可以使用终端实时拍摄周边的环境,并通过本申请提供的方法快速完成三维模型的构建,并在显示屏中快速识别出实例,从而基于识别出的实例在显示屏中显示导航路径,使用户可以按照导航路径来行走。
还例如,本申请提供的方法可以部署于云平台中,当用户需要构建某个区域内的简化的三维模型时,可以通过客户端向云平台发送相机或者激光雷达采集到的数据,云平台可以快速构建简化的三维模型,并反馈至客户端,使用户快速得到某一区域内的简化的三维模型。
前述对本申请提供的神经网络训练方法和三维模型构建方法的流程进行了详细介绍,下面对本申请提供的训练装置和三维模型构建装置进行介绍。该训练装置可以用于执行前述图4-图8中所提及的神经网络训练方法的步骤,该三维模型构建装置可以用于执行前述图5-图8中所提及的三维模型构建方法的步骤。
参阅图11,本申请提供的一种神经网络训练装置的结构示意图,如下所述。
该神经网络训练装置可以包括:
获取模块1101,用于获取训练数据,训练数据中包括多个点和每个点对应的标签,每个点对应的标签中包括每个点和相邻的点是否属于同一个子流形的真实结果;
输出模块1102,用于将多个点作为子流形预测网络的输入,输出多个点的预测结果,预测结果包括多个点中的每个点和相邻点是否属于同一个子流形,其中,子流形预测网络从点云数据提取特征,得到多个点中每个点对应的特征,并根据每个点对应的特征确定每个点和相邻点是否属于同一子流形;
损失模块1103,用于根据预测结果和每个点对应的标签计算损失值;
更新模块1104,用于根据损失值更新子流形预测网络,得到更新后的子流形预测网络。
在一种可能的实施方式中,输出模块1102,具体用于:从点云数据中,以每个点以及相邻的第一预设数量的点为单位提取特征,得到局部特征;对点云数据进行至少一次下采样,得到下采样数据;从下采样数据中提取特征,得到全局特征;融合局部特征和全局特征,得到多个点中每个点对应的特征。
在一种可能的实施方式中,输出模块1102,具体用于:对点云数据进行划分,得到多个体素,每个体素包括至少一个点以及对应的至少一个点中每个点的局部特征;以多个体素中每个体素以及相邻的第二预设数量的体素内的点为单位,进行至少一次特征提取,得到全局特征。
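上述对点云数据进行划分得到多个体素的步骤,可以示意性地实现为按体素边长对点坐标取整后分组(假设点云为N x 3的数组,voxel_size为体素边长,名称均为说明性假设;实际实施方式中各体素还会携带点的局部特征):

```python
import numpy as np

def voxelize(points, voxel_size):
    """对点云进行体素划分的示意实现。

    points: N x 3 的点坐标数组
    voxel_size: 体素边长
    返回: 体素整数坐标 -> 该体素内点的下标列表(每个体素包括至少一个点)
    """
    # 每个点所在体素的整数坐标
    coords = np.floor(points / voxel_size).astype(np.int64)
    voxels = {}
    for i, c in enumerate(map(tuple, coords)):
        voxels.setdefault(c, []).append(i)
    return voxels
```

在得到体素划分之后,即可以每个体素及相邻预设数量的体素内的点为单位进行特征提取,得到全局特征。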
在一种可能的实施方式中,输出模块1102,具体用于:根据每个点对应的特征确定每个点对应的预测法向量;根据每个点的特征、每个点的预测法向量和相邻点的预测法向量,确定每个点和相邻点是否属于同一子流形。
在一种可能的实施方式中,预测结果中还包括每个点对应的法向量,每个点的标签中还包括每个点对应的真值法向量;损失模块1103,具体用于根据每个点对应的法向量和每个点对应的真值法向量计算损失值。
参阅图12,本申请提供的一种三维模型构建装置的结构示意图。
该三维模型构建装置可以包括:
收发模块1201,用于获取点云数据,点云数据包括多个点形成的数据;
预测模块1202,用于将点云数据输入至子流形预测网络,输出多个点的预测结果,预测结果用于标识多个点中的每个点和相邻点是否属于同一个子流形,其中,子流形预测网络从点云数据提取特征,得到多个点中每个点对应的特征,并根据每个点对应的特征确定每个点和相邻点是否属于同一子流形;
筛选模块1203,用于根据多个点的预测结果从多个点中筛选出多个角点,多个角点包括多个点形成的各个子流形的边界上的点;
构建模块1204,用于根据多个角点构建三维模型,多个角点形成的网格组成三维模型中的流形。
在一种可能的实施方式中,预测模块1202,具体用于通过子流形预测网络执行以下步骤:从点云数据中,以每个点以及相邻的第一预设数量的点为单位提取特征,得到每个点对应的局部特征;对点云数据进行下采样,得到下采样数据;从下采样数据中提取特征,得到每个点对应的全局特征;融合局部特征和全局特征,得到多个点中每个点对应的特征。
在一种可能的实施方式中,预测模块1202,具体用于:对点云数据进行划分,得到多个体素,每个体素包括至少一个点以及对应的至少一个点中每个点的局部特征;以多个体素中每个体素以及相邻的第二预设数量的体素内的点为单位,进行特征提取,得到全局特征。
在一种可能的实施方式中,预测模块1202,具体用于:根据每个点对应的特征确定每个点对应的法向量;根据每个点的特征、每个点的法向量和相邻点的法向量,确定每个点和相邻点是否属于同一子流形。
在一种可能的实施方式中,筛选模块1203,具体用于:对多个点进行三角形构网,形成至少一个三角形网格;根据预测结果从至少一个三角形网格中提取属于同一子流形的边界;从至少一个三角形网格中提取到的属于同一子流形的边界上的点中,提取多个角点。
在一种可能的实施方式中,构建模块1204,具体用于:使用多个角点和多个角点之间的测地距离构建至少一个德劳内三角形网格;合并至少一个德劳内三角形网格,得到三维模型。
请参阅图13,本申请提供的另一种神经网络训练装置的结构示意图,如下所述。
该神经网络训练装置可以包括处理器1301和存储器1302。该处理器1301和存储器1302通过线路互联。其中,存储器1302中存储有程序指令和数据。
存储器1302中存储了前述图4至图10中的步骤对应的程序指令以及数据。
处理器1301用于执行前述图4至图10中任一实施例所示的训练装置执行的方法步骤。
可选地,该神经网络训练装置还可以包括收发器1303,用于接收或者发送数据。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有程序,当其在计算机上运行时,使得计算机执行如前述图4至图10所示实施例描述的神经网络训练方法中的步骤。
可选地,前述的图13中所示的神经网络训练装置为芯片。
参阅图14,本申请提供的另一种三维模型构建装置的结构示意图。
该三维模型构建装置可以包括处理器1401和存储器1402。该处理器1401和存储器1402通过线路互联。其中,存储器1402中存储有程序指令和数据。
存储器1402中存储了前述图4至图10中的步骤对应的程序指令以及数据。
处理器1401用于执行前述图4至图10中任一实施例所示的三维模型构建装置执行的方法步骤。
可选地,该三维模型构建装置还可以包括收发器1403,用于接收或者发送数据。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有程序,当其在计算机上运行时,使得计算机执行如前述图4至图10所示实施例描述的三维模型构建方法中的步骤。
可选地,前述的图14中所示的三维模型构建装置为芯片。
前述对本申请提供的方法和装置分别进行了介绍,为便于理解,下面结合前述的方法和装置,对本申请提供的方法在硬件上运行时由各个硬件分别执行的步骤进行示例性介绍。
一、针对神经网络训练方法,
其中,该神经网络训练方法可以部署于服务器上,当然也可以部署于终端上,本申请以下以部署于服务器为例进行示例性说明。
其中,示例性地,服务器在执行神经网络训练方法时的硬件执行流程如图15所示。
其中,服务器1500中可以包括神经网络运行硬件,如图15中所示的GPU/Ascend芯片1501、CPU1502以及存储器1503等。
服务器端GPU/Ascend芯片1501可以用于在训练时,从存储器中读取真值数据,以及使用读取到的真值数据训练神经网络。
具体地,GPU/Ascend1501可以通过CPU1502从数据库中读取点云数据对应的真值数据,其中,每个点的数据可以分为切平面信息(如法向量)与点对是否处在同一个子流形的判别信息。GPU/Ascend1501可以通过真值数据中的切平面信息来训练子流形嵌套网络1504,使用子流形嵌套网络1504的输出结果和真值数据中的点对是否处在同一个子流形的判别信息来训练边界判别网络1505,从而实现对子流形嵌套网络和边界判别网络的训练。
训练得到的子流形预测网络可以部署于服务器或者终端,以在推理阶段使用。子流形嵌套网络1504和边界判别网络1505这两个模块根据服务器或者终端的应用环境决定硬件部署:如果在服务器端做推理,则在服务器端的GPU/Ascend芯片上部署子流形嵌套网络1504和边界判别网络1505;如果在终端做推理,则在终端的GPU/D芯片上部署子流形嵌套网络1504和边界判别网络1505。
二、针对三维模型构建方法
其中,本申请提供的三维模型构建方法可以部署于终端也可以部署于服务器,下面分别进行示例性介绍。
1、部署于终端
如图16所示,终端1600可以包括GPU/D芯片1601和CPU1602。
GPU/D芯片1601可以用于运行子流形预测网络,即如图16中所示出的子流形嵌套网络和边界判别网络,输出点云数据中各个点对是否处于同一个子流形的预测结果,相当于对点云数据中的多个点进行划分,将每个点划分至对应的子流形中。
CPU1602可以根据GPU/D芯片1601反馈的预测结果,对点云数据进行实例分割,识别出点云数据中的各个子流形。CPU1602还可以使用输入的点云数据中的多个点进行构网,形成多个三角形网格,然后基于实例分割结果提取三角形网格作为子流形的边界,并筛选出角点。然后基于测地距离对角点进行三角化,形成多个德劳内三角形,组合这些三角形即可得到输出的简化的三维模型。
终端在得到简化后的三维模型之后,即可将该简化的三维模型应用于AR地图、AR游戏或者其他场景等。
具体例如,如图17所示,可以通过终端上设置的相机或者激光(Dtof)采集视觉或者深度信息。然后通过SLAM算法得到对应的点云数据。然后GPU/D芯片1601即可通过子流形预测网络输出点云数据中各个点对是否处于同一个子流形的预测结果,即相当于输出点云数据的实例分割结果。
CPU1602通过子流形预测网络的输出结果来进行实例分割,即按照子流形预测网络的输出结果,识别出处于同一个子流形或者同一个图元的点,并通过本申请提供的三维模型构建方法得到矢量化后的三维模型。该三维模型可以直接送入GPU端进行渲染,而实例分割结果可以作为信息应用于AR事务,比如在AR游戏中,可以通过本文的方法自动提取结构信息(地面、墙面等),并根据信息建立合理的AR游戏场景。
因此,在本申请实施方式中,可以赋予终端更多的功能,为终端的3D数据提供直接的矢量化三维模型和图元级的实例分割能力,带来更多的终端应用并产生价值。例如,手机可以为普通用户提供家具、室内建模能力,允许用户分享轻量级的三维数据,根据环境信息提供家具设计思路或者AR游戏等,提高用户体验。
2、部署于服务器
如图18所示,该服务器可以包括GPU/Ascend芯片1801和CPU1802等。其中,服务器的处理流程与前述图16中终端的处理流程类似,区别在于,GPU/D芯片1601所执行的步骤替换为GPU/Ascend芯片1801来执行,CPU1602执行的步骤替换为CPU1802来执行,此处不再赘述。
此外,服务器还可以包括收发器,如I/O接口,天线或者其他有线或者无线通信接口等,如图19所示,可以通过该收发器1803接收输入的点云数据,并输出简化后的三维模型。
示例性地,该服务器可以是云平台的服务器或者服务器集群中的其中一个服务器,装载了GPU/Ascend1801,用于支持神经网络的运算,同时装载了CPU1802,用于支持实例分割和矢量化三维模型等。用户可以将采集到的点云数据通过网络接口传送至云平台,通过GPU/Ascend1801运行部署于服务器中的子流形预测网络来预测局部的子流形的信息,如处于同一个子流形中的点,并传输至CPU。CPU基于GPU/Ascend反馈的子流形的信息来进行实例分割,并构建矢量化的简化后的三维模型,然后根据用户需求,将实例分割的结果和/或简化的三维模型通过网络接口反馈至用户。相当于可以通过云平台为用户提供图元级别的实例分割和三维模型矢量化的服务,从而使用户可以方便高效地通过云平台实现图元级别的实例分割和三维模型矢量化。例如,云平台可以为三维数据生产商提供自动化三维建模的服务。生产商只需要将激光扫描或者拍照重建的点云上传到云平台的服务器上,服务器可以自动完成矢量化CAD的算法流程,从而输出矢量化的三维模型。
下面结合前述图4-图10的方法步骤,对服务器进行图元级的实例分割和矢量化三维模型的具体流程进行示例性说明。
首先,服务器可以从网络接口接收到点云数据,建立张量,并将张量传输至GPU/Ascend1801。如输入的点云数据中可以包括N个点,通过{p i}表示,建立Nx3的张量。
然后,利用GPU/Ascend芯片1801和CPU1802对输入的点云数据进行图元级的实例分割。具体地,将点集的每个点取16个最近邻,组成N*16个点对,接下来GPU/Ascend1801将点云数据输入子流形嵌套网络预测局部子流形。接着,GPU/Ascend1801将每对近邻的子流形特征输入边界判别网络判断其是否为边界,即判断N个点组成的无向图的每条边是否属于不同的图元或者子流形,将不属于同一图元或者子流形的边删除,或者筛选出属于同一个图元或者子流形的点。随后CPU1802可以根据子流形预测网络输出的边界和识别出的处于同一个子流形或者图元中的点,使用flood fill算法对点云进行图元级的实例分割。
随后,CPU1802对点云数据中的多个点进行三角化构网,随后从三角形网格中提取边界对应的三角形网格,即为两个不同实例相邻或者只和唯一的三角形相邻的边界的集合。然后通过Ramer-Douglas-Peucker算法简化该集合,提取到角点。
随后CPU1802根据GPU/Ascend1801传输的预测结果以及三角形网格与角点,对每个图元进行角点三角化。如使用Dijkstra算法求网格任意点的最近角点,并对于每个三点最近角点互不相同的三角形,将其相关的三个角点连成三角形,从而实现角点三角化,得到多个德劳内三角形网格。
将角点构成的三角形网格合并,即可输出最终的矢量化的三维模型,如CAD模型。
因此,本申请实施方式中,可以在服务器中部署本申请提供的三维模型构建方法,从而使服务器可以将输入的点云数据转换为矢量化的三维模型,且可以实现图元级别的实例分割,具有很高的鲁棒性,高效地得到效果更好的三维模型。
本申请实施例还提供了一种神经网络训练装置,该神经网络训练装置也可以称为数字处理芯片或者芯片,芯片包括处理单元和通信接口,处理单元通过通信接口获取程序指令,程序指令被处理单元执行,处理单元用于执行前述图4-图10中任一实施例所示的神经网络训练装置执行的方法步骤。
本申请实施例还提供了一种三维模型构建装置,该三维模型构建装置也可以称为数字处理芯片或者芯片,芯片包括处理单元和通信接口,处理单元通过通信接口获取程序指令,程序指令被处理单元执行,处理单元用于执行前述图4-图10中任一实施例所示的目标检测装置执行的方法步骤。
本申请实施例还提供一种数字处理芯片。该数字处理芯片中集成了用于实现上述处理器1301/1401,或者处理器1301/1401的功能的电路和一个或者多个接口。当该数字处理芯片中集成了存储器时,该数字处理芯片可以完成前述实施例中的任一个或多个实施例的方法步骤。当该数字处理芯片中未集成存储器时,可以通过通信接口与外置的存储器连接。该数字处理芯片根据外置的存储器中存储的程序代码来实现上述实施例中神经网络训练装置或者三维模型构建装置执行的动作。
本申请实施例中还提供一种计算机程序产品,当其在计算机上运行时,使得计算机执行如前述图4至图10所示实施例描述的方法中神经网络训练装置所执行的步骤。
本申请实施例中还提供一种计算机程序产品,当其在计算机上运行时,使得计算机执行如前述图4至图10所示实施例描述的方法中三维模型构建装置所执行的步骤。
本申请实施例提供的神经网络训练装置或者三维模型构建装置可以为芯片,该芯片可以包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使服务器内的芯片执行上述图4至图10所示实施例描述的神经网络训练方法或者三维模型构建方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
具体地,前述的处理单元或者处理器可以是中央处理器(central processing unit,CPU)、神经网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程逻辑门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者也可以是任何常规的处理器等。
示例性地,请参阅图20,图20为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 200,NPU 200作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路2003,通过控制器2004控制运算电路2003提取存储器中的矩阵数据并进行乘法运算。
在一些实现中,运算电路2003内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路2003是二维脉动阵列。运算电路2003还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路2003是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器2002中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器2001中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)2008中。
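上述分块取数并在累加器中累加部分结果的过程,可以用如下数值上等价的软件示意帮助理解(仅为说明性示意,并非运算电路的真实硬件实现,tile等参数均为假设):

```python
import numpy as np

def tiled_matmul(A, B, tile=2):
    """按块计算矩阵乘法并在累加器中累加部分结果的示意实现。

    A: 输入矩阵 (n x k); B: 权重矩阵 (k x m); tile: 每次处理的块宽
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))  # 累加器: 保存矩阵的部分结果与最终结果
    for t in range(0, k, tile):
        # 每次取权重矩阵B与输入矩阵A的一个分块, 计算部分结果并累加
        C += A[:, t:t + tile] @ B[t:t + tile, :]
    return C
```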
统一存储器2006用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器(direct memory access controller,DMAC)2005被搬运到权重存储器2002中。输入数据也通过DMAC被搬运到统一存储器2006中。
总线接口单元(bus interface unit,BIU)2010,用于AXI总线与DMAC和取指存储器(Instruction Fetch Buffer,IFB)2009的交互。
总线接口单元2010(bus interface unit,BIU),用于取指存储器2009从外部存储器获取指令,还用于存储单元访问控制器2005从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器2006,或将权重数据搬运到权重存储器2002中,或将输入数据搬运到输入存储器2001中。
向量计算单元2007包括多个运算处理单元,在需要的情况下,对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。主要用于神经网络中非卷积/全连接层网络计算,如批归一化(batch normalization),像素级求和,对特征平面进行上采样等。
在一些实现中,向量计算单元2007能将经处理的输出的向量存储到统一存储器2006。例如,向量计算单元2007可以将线性函数和/或非线性函数应用到运算电路2003的输出,例如对卷积层提取的特征平面进行线性插值,再例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元2007生成归一化的值、像素级求和的值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路2003的激活输入,例如用于在神经网络中的后续层中的使用。
控制器2004连接的取指存储器(instruction fetch buffer)2009,用于存储控制器2004使用的指令;
统一存储器2006,输入存储器2001,权重存储器2002以及取指存储器2009均为On-Chip存储器。外部存储器私有于该NPU硬件架构。
其中,循环神经网络中各层的运算可以由运算电路2003或向量计算单元2007执行。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述图4-图10的方法的程序执行的集成电路。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、***、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
最后应说明的是:以上,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。

Claims (26)

  1. 一种三维模型构建方法,其特征在于,包括:
    获取点云数据,所述点云数据包括多个点以及每个点对应的信息;
    将所述点云数据输入至子流形预测网络,得到所述多个点的预测结果,所述预测结果用于标识所述多个点中的每个点和相邻点是否属于同一个子流形,其中,所述子流形预测网络从所述点云数据提取特征,得到所述多个点中每个点对应的特征,并根据所述每个点对应的特征确定所述每个点和相邻点是否属于同一子流形;
    根据所述多个点的预测结果从所述多个点中筛选出多个角点,所述多个角点包括所述多个点形成的各个子流形的边界上的点;
    根据所述多个角点构建三维模型,所述多个角点形成的网格组成所述三维模型中的流形。
  2. 根据权利要求1所述的方法,其特征在于,所述子流形预测网络从所述点云数据提取特征,包括:
    从所述点云数据中,以每个点以及相邻的第一预设数量的点为单位提取特征,得到所述每个点对应的局部特征;
    对所述点云数据进行下采样,得到下采样数据,所述下采样数据的分辨率低于所述点云数据的分辨率;
    从所述下采样数据中提取特征,得到所述每个点对应的全局特征;
    融合所述局部特征和所述全局特征,得到所述多个点中每个点对应的特征。
  3. 根据权利要求2所述的方法,其特征在于,所述对所述点云数据进行下采样,包括:
    对所述点云数据进行划分,得到多个体素,每个体素包括至少一个点以及对应的所述至少一个点中每个点的局部特征;
    所述从所述下采样数据中提取特征,包括:
    以所述多个体素中每个体素内的点以及相邻的第二预设数量的体素内的点为单位,进行特征提取,得到所述全局特征,所述第二预设数量的体素内的点的数量不小于所述第一预设数量。
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,所述根据所述每个点的特征确定所述每个点和相邻的点是否属于同一子流形,包括:
    根据所述每个点对应的特征确定所述每个点对应的法向量;
    根据所述每个点的特征、所述每个点的法向量和相邻点的法向量,确定所述每个点和相邻点是否属于同一子流形。
  5. 根据权利要求1-4中任一项所述的方法,其特征在于,所述根据所述预测结果从所述多个点中筛选出多个角点,包括:
    对所述多个点进行三角形构网,形成至少一个三角形网格;
    根据所述预测结果从所述至少一个三角形网格中提取属于同一子流形的边界;
    从所述至少一个三角形网格中提取到的属于同一子流形的边界上的点中,提取所述多个角点。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述多个角点构建三维模型,包括:
    使用所述多个角点和所述多个角点之间的测地距离构建至少一个德劳内三角形网格;
    合并所述至少一个德劳内三角形网格,得到所述三维模型。
  7. 一种神经网络训练方法,其特征在于,包括:
    获取训练数据,所述训练数据中包括多个点和每个点对应的标签,所述每个点对应的标签中包括用于指示所述每个点和相邻的点是否属于同一个子流形的标识;
    将所述多个点作为子流形预测网络的输入,得到所述多个点的预测结果,所述预测结果包括所述多个点中的每个点和相邻点是否属于同一个子流形,其中,所述子流形预测网络从所述点云数据提取特征,得到所述多个点中每个点对应的特征,并根据所述每个点对应的特征确定所述每个点和相邻点是否属于同一子流形;
    根据所述预测结果和所述每个点对应的标签计算损失值;
    根据所述损失值更新所述子流形预测网络,得到更新后的子流形预测网络。
  8. 根据权利要求7所述的方法,其特征在于,所述子流形预测网络从所述点云数据提取特征,包括:
    从所述点云数据中,以每个点以及相邻的第一预设数量的点为单位提取特征,得到局部特征;
    对所述点云数据进行至少一次下采样,得到下采样数据,所述下采样数据的分辨率低于所述点云数据的分辨率;
    从所述下采样数据中提取特征,得到全局特征;
    融合所述局部特征和所述全局特征,得到所述多个点中每个点对应的特征。
  9. 根据权利要求8所述的方法,其特征在于,所述对所述点云数据进行至少一次下采样中的其中一次,包括:
    对所述点云数据进行划分,得到多个体素,每个体素包括至少一个点以及对应的所述至少一个点中每个点的局部特征;
    所述从所述下采样数据中提取特征,包括:
    以所述多个体素中每个体素内的点以及相邻的第二预设数量的体素内的点为单位,进行至少一次特征提取,得到所述全局特征,所述第二预设数量的体素内的点的数量不小于所述第一预设数量。
  10. 根据权利要求7-9中任一项所述的方法,其特征在于,所述根据所述每个点的特征确定所述每个点和相邻的点是否属于同一子流形,包括:
    根据所述每个点对应的特征确定所述每个点对应的预测法向量;
    根据所述每个点的特征、所述每个点的预测法向量和相邻点的预测法向量,确定所述每个点和相邻点是否属于同一子流形。
  11. 根据权利要求10所述的方法,其特征在于,所述预测结果中还包括所述每个点对应的法向量,所述每个点的标签中还包括所述每个点对应的真值法向量;
    所述根据所述预测结果和所述每个点对应的标签计算损失值,包括:
    根据所述每个点对应的法向量和所述每个点对应的真值法向量计算所述损失值。
  12. 一种三维模型构建装置,其特征在于,包括:
    收发模块,用于获取点云数据,所述点云数据包括多个点以及每个点对应的信息;
    预测模块,用于将所述点云数据输入至子流形预测网络,得到所述多个点的预测结果,所述预测结果用于标识所述多个点中的每个点和相邻点是否属于同一个子流形,其中,所述子流形预测网络从所述点云数据提取特征,得到所述多个点中每个点对应的特征,并根据所述每个点对应的特征确定所述每个点和相邻点是否属于同一子流形;
    筛选模块,用于根据所述多个点的预测结果从所述多个点中筛选出多个角点,所述多个角点包括所述多个点形成的各个子流形的边界上的点;
    构建模块,用于根据所述多个角点构建三维模型,所述多个角点形成的网格组成所述三维模型中的流形。
  13. 根据权利要求12所述的装置,其特征在于,所述预测模块,具体用于通过所述子流形预测网络执行以下步骤:
    从所述点云数据中,以每个点以及相邻的第一预设数量的点为单位提取特征,得到所述每个点对应的局部特征;
    对所述点云数据进行下采样,得到下采样数据,所述下采样数据的分辨率低于所述点云数据的分辨率;
    从所述下采样数据中提取特征,得到所述每个点对应的全局特征;
    融合所述局部特征和所述全局特征,得到所述多个点中每个点对应的特征。
  14. 根据权利要求13所述的装置,其特征在于,所述预测模块,具体用于:
    对所述点云数据进行划分,得到多个体素,每个体素包括至少一个点以及对应的所述至少一个点中每个点的局部特征;
    以所述多个体素中每个体素内的点以及相邻的第二预设数量的体素内的点为单位,进行特征提取,得到所述全局特征,所述第二预设数量的体素内的点的数量不小于所述第一预设数量。
  15. 根据权利要求12-14中任一项所述的装置,其特征在于,所述预测模块,具体用于:
    根据所述每个点对应的特征确定所述每个点对应的法向量;
    根据所述每个点的特征、所述每个点的法向量和相邻点的法向量,确定所述每个点和相邻点是否属于同一子流形。
  16. 根据权利要求12-15中任一项所述的装置,其特征在于,所述筛选模块,具体用于:
    对所述多个点进行三角形构网,形成至少一个三角形网格;
    根据所述预测结果从所述至少一个三角形网格中提取属于同一子流形的边界;
    从所述至少一个三角形网格中提取到的属于同一子流形的边界上的点中,提取所述多个角点。
  17. 根据权利要求16所述的装置,其特征在于,所述构建模块,具体用于:
    使用所述多个角点和所述多个角点之间的测地距离构建至少一个德劳内三角形网格;
    合并所述至少一个德劳内三角形网格,得到所述三维模型。
  18. 一种神经网络训练装置,其特征在于,包括:
    获取模块,用于获取训练数据,所述训练数据中包括多个点和每个点对应的标签,所述每个点对应的标签中包括用于指示所述每个点和相邻的点是否属于同一个子流形的标识;
    输出模块,用于将所述多个点作为子流形预测网络的输入,得到所述多个点的预测结果,所述预测结果包括所述多个点中的每个点和相邻点是否属于同一个子流形,其中,所述子流形预测网络从所述点云数据提取特征,得到所述多个点中每个点对应的特征,并根据所述每个点对应的特征确定所述每个点和相邻点是否属于同一子流形;
    损失模块,用于根据所述预测结果和所述每个点对应的标签计算损失值;
    更新模块,用于根据所述损失值更新所述子流形预测网络,得到更新后的子流形预测网络。
  19. 根据权利要求18所述的装置,其特征在于,所述输出模块,具体用于:
    从所述点云数据中,以每个点以及相邻的第一预设数量的点为单位提取特征,得到局部特征;
    对所述点云数据进行至少一次下采样,得到下采样数据,所述下采样数据的分辨率低于所述点云数据的分辨率;
    从所述下采样数据中提取特征,得到全局特征;
    融合所述局部特征和所述全局特征,得到所述多个点中每个点对应的特征。
  20. 根据权利要求19所述的装置,其特征在于,所述输出模块,具体用于:
    对所述点云数据进行划分,得到多个体素,每个体素包括至少一个点以及对应的所述至少一个点中每个点的局部特征;
    以所述多个体素中每个体素内的点以及相邻的第二预设数量的体素内的点为单位,进行至少一次特征提取,得到所述全局特征,所述第二预设数量的体素内的点的数量不小于所述第一预设数量。
  21. 根据权利要求18-20中任一项所述的装置,其特征在于,所述输出模块,具体用于:
    根据所述每个点对应的特征确定所述每个点对应的预测法向量;
    根据所述每个点的特征、所述每个点的预测法向量和相邻点的预测法向量,确定所述每个点和相邻点是否属于同一子流形。
  22. 根据权利要求21所述的装置,其特征在于,所述预测结果中还包括所述每个点对应的法向量,所述每个点的标签中还包括所述每个点对应的真值法向量;
    所述损失模块,具体用于根据所述每个点对应的法向量和所述每个点对应的真值法向量计算所述损失值。
  23. 一种三维模型构建装置,其特征在于,包括处理器,所述处理器和存储器耦合,所述存储器存储有程序,当所述存储器存储的程序指令被所述处理器执行时实现权利要求1至6中任一项所述的方法。
  24. 一种神经网络训练装置,其特征在于,包括处理器,所述处理器和存储器耦合,所述存储器存储有程序,当所述存储器存储的程序指令被所述处理器执行时实现权利要求7-11中任一项所述的方法。
  25. 一种计算机可读存储介质,包括程序,当其被处理单元所执行时,执行如权利要求1至6或者7至11中任一项所述的方法。
  26. 一种装置,其特征在于,包括处理单元和通信接口,所述处理单元通过所述通信接口获取程序指令,当所述程序指令被所述处理单元执行时实现权利要求1至6或者7至11中任一项所述的方法。
PCT/CN2022/080295 2021-03-16 2022-03-11 一种三维模型构建方法、神经网络训练方法以及装置 WO2022194035A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110280138.1 2021-03-16
CN202110280138.1A CN115147564A (zh) 2021-03-16 2021-03-16 一种三维模型构建方法、神经网络训练方法以及装置

Publications (1)

Publication Number Publication Date
WO2022194035A1 true WO2022194035A1 (zh) 2022-09-22

Family

ID=83321590

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/080295 WO2022194035A1 (zh) 2021-03-16 2022-03-11 一种三维模型构建方法、神经网络训练方法以及装置

Country Status (2)

Country Link
CN (1) CN115147564A (zh)
WO (1) WO2022194035A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937644A (zh) * 2022-12-15 2023-04-07 清华大学 一种基于全局及局部融合的点云特征提取方法及装置

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118096906B (zh) * 2024-04-29 2024-07-05 中国铁路设计集团有限公司 基于矢量绑定和形态学的点云抽稀方法及***

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013177586A1 (en) * 2012-05-25 2013-11-28 The Johns Hopkins University An integrated real-time tracking system for normal and anomaly tracking and the methods therefor
CN111199206A (zh) * 2019-12-30 2020-05-26 上海眼控科技股份有限公司 三维目标检测方法、装置、计算机设备及存储介质
CN111615706A (zh) * 2017-11-17 2020-09-01 脸谱公司 基于子流形稀疏卷积神经网络分析空间稀疏数据
CN112288709A (zh) * 2020-10-28 2021-01-29 武汉大学 一种基于点云的三维目标检测方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SCHMOHL S., SÖRGEL U.: "SUBMANIFOLD SPARSE CONVOLUTIONAL NETWORKS FOR SEMANTIC SEGMENTATION OF LARGE-SCALE ALS POINT CLOUDS", ISPRS ANNALS OF THE PHOTOGRAMMETRY, REMOTE SENSING AND SPATIAL INFORMATION SCIENCES, vol. IV-2/W5, 14 June 2019 (2019-06-14), pages 77 - 84, XP055968117, DOI: 10.5194/isprs-annals-IV-2-W5-77-2019 *
XIAOGANG WANG; YUELANG XU; KAI XU; ANDREA TAGLIASACCHI; BIN ZHOU; ALI MAHDAVI-AMIRI; HAO ZHANG: "PIE-NET: Parametric Inference of Point Cloud Edges", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 25 October 2020 (2020-10-25), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081798951 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937644A (zh) * 2022-12-15 2023-04-07 清华大学 一种基于全局及局部融合的点云特征提取方法及装置
CN115937644B (zh) * 2022-12-15 2024-01-02 清华大学 一种基于全局及局部融合的点云特征提取方法及装置

Also Published As

Publication number Publication date
CN115147564A (zh) 2022-10-04

Similar Documents

Publication Publication Date Title
CN108496127B (zh) 集中于对象的有效三维重构
CN111242041B (zh) 基于伪图像技术的激光雷达三维目标快速检测方法
CN112488210A (zh) 一种基于图卷积神经网络的三维点云自动分类方法
CN111401517B (zh) 一种感知网络结构搜索方法及其装置
WO2022194035A1 (zh) 一种三维模型构建方法、神经网络训练方法以及装置
WO2021218786A1 (zh) 一种数据处理***、物体检测方法及其装置
CN113362382A (zh) 三维重建方法和三维重建装置
WO2022179587A1 (zh) 一种特征提取的方法以及装置
CN109902702A (zh) 目标检测的方法和装置
CN112529015A (zh) 一种基于几何解缠的三维点云处理方法、装置及设备
CN111368972B (zh) 一种卷积层量化方法及其装置
CN112990010B (zh) 点云数据处理方法、装置、计算机设备和存储介质
WO2021203865A9 (zh) 分子结合位点检测方法、装置、电子设备及存储介质
WO2023164933A1 (zh) 一种建筑物建模方法以及相关装置
CN110569926B (zh) 一种基于局部边缘特征增强的点云分类方法
WO2022100607A1 (zh) 一种神经网络结构确定方法及其装置
CN112258565B (zh) 图像处理方法以及装置
EP4053734A1 (en) Hand gesture estimation method and apparatus, device, and computer storage medium
CN115222896B (zh) 三维重建方法、装置、电子设备及计算机可读存储介质
CN108367436A (zh) 针对三维空间中的物***置和范围的主动相机移动确定
CN111950702A (zh) 一种神经网络结构确定方法及其装置
CN114140841A (zh) 点云数据的处理方法、神经网络的训练方法以及相关设备
CN115147798A (zh) 可行驶区域预测方法、模型、装置及车辆
CN113553943B (zh) 目标实时检测方法以及装置、存储介质、电子装置
EP3965071A2 (en) Method and apparatus for pose identification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22770390

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22770390

Country of ref document: EP

Kind code of ref document: A1