CN117953177A - Differentiable realistic point cloud data generation method, generation device and storage medium

Differentiable realistic point cloud data generation method, generation device and storage medium

Info

Publication number
CN117953177A
CN117953177A (application CN202311751711.8A)
Authority
CN
China
Prior art keywords
point cloud
dimensional
differentiable
rendering
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311751711.8A
Other languages
Chinese (zh)
Inventor
王子豪
李艳雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202311751711.8A priority Critical patent/CN117953177A/en
Publication of CN117953177A publication Critical patent/CN117953177A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a differentiable realistic point cloud data generation method, a differentiable realistic point cloud data generation device and a storage medium, belonging to the technical field of computer vision. The method comprises the following steps: acquiring a synthetic three-dimensional model dataset and real point cloud data samples, and preprocessing them; constructing and initializing a point cloud generator based on differentiable rendering; constructing and initializing a point cloud discriminator; constructing and initializing a downstream task point cloud depth network; training and optimizing the relevant parameters of the point cloud generator and the downstream task point cloud depth network; and testing and using the optimized point cloud generator and the downstream task point cloud depth network. The invention not only preserves the rendering engine's ability to generate complex, near-realistic noise, but also generates synthetic point cloud data that approximates the provided real point cloud samples in a targeted way.

Description

Differentiable realistic point cloud data generation method, generation device and storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular to a differentiable realistic point cloud data generation method and device, and a storage medium.
Background
As an intuitive three-dimensional representation, a point cloud describes the three-dimensional structure of an observed object better than a traditional RGB color image, greatly advancing the perception and understanding of the real physical world by various algorithms, and it has been widely applied in industrial scenarios such as classification, segmentation and detection. Meanwhile, with the rise of deep learning, a great deal of research has been devoted to deep neural networks based on three-dimensional point clouds. Training such a network for a given downstream task requires a large amount of point cloud data labeled for that task as a dataset for supervised learning. However, compared with RGB images, acquiring three-dimensional point clouds requires specialized personnel and dedicated three-dimensional sensing equipment, and annotating them is difficult. In current real industrial scenarios it is therefore hard to acquire and label large-scale, high-quality real point cloud data for downstream model training; usually only a small number of real data samples can be obtained. Existing solutions therefore tend to train models on large-scale synthetic point cloud data and then fine-tune them with the small number of real samples that are available.
However, it has been shown that existing synthetic point cloud datasets and synthetic point cloud generation methods have inherent defects: deep neural networks trained on synthetic point clouds perform poorly when migrated to real scenes and are difficult to deploy. The main problem is that although synthetic datasets are large and easy to label, their point clouds are clean, noise-free, uniformly distributed and completely observed, whereas point clouds collected in real scenes contain a large amount of complex and diverse corruption, such as noise, non-uniform density and incomplete observation. The resulting distribution gap between synthetic and real point cloud data severely degrades the performance of deep learning networks.
Existing synthetic point cloud datasets are generated in the following ways: (1) uniform surface-point sampling directly on a three-dimensional model, whose output is always too clean and noise-free and differs greatly from real conditions; (2) theoretical modeling of real noise, which experiments show cannot fully cover the more complex and diverse noise found in reality; (3) introducing a rendering engine into data generation, which increases fidelity but requires experienced staff to tune complex parameters for each downstream task and only supports offline generation; (4) directly generating point cloud data with a generative network, whose effectiveness is likewise limited by the number of available training samples.
More specifically, as noted above, a small batch of real point cloud data samples can often be obtained in a real industrial scenario, but no existing work generates large-scale, labeled synthetic point cloud data that approximates a given set of real point cloud samples; such synthetic point clouds would facilitate training of the network model and improve migration performance.
Disclosure of Invention
In order to solve, at least to some extent, at least one of the technical problems existing in the prior art, the invention aims to provide a differentiable realistic point cloud data generation method, a generation device and a storage medium.
The technical scheme adopted by the invention is as follows:
A differentiable realistic point cloud data generation method comprising the steps of:
Acquiring a synthetic three-dimensional model data set and a real point cloud data sample, and preprocessing the synthetic three-dimensional model data set and the real point cloud data sample;
constructing a point cloud generator based on differentiable rendering, and initializing the point cloud generator based on differentiable rendering;
Constructing a point cloud discriminator, and initializing the point cloud discriminator;
constructing a downstream task point cloud depth network, and initializing the downstream task point cloud depth network;
Training and optimizing relevant parameters of a point cloud generator and a downstream task point cloud depth network;
And testing and using the optimized point cloud generator and the downstream task point cloud depth network.
Further, the constructing a point cloud generator based on differentiable rendering, and initializing the point cloud generator based on differentiable rendering, includes:
constructing and initializing a virtual camera based on differentiable rendering;
constructing and initializing a differentiable stereo matching algorithm;
and constructing and initializing a differentiable point cloud generation and post-processing module.
Further, the constructing and initializing a differentiable rendering based virtual camera includes:
Building a virtual scene in a differentiable rendering engine;
Creating a monocular camera in a differentiable rendering engine;
a point source is created in the differentiable rendering engine.
Further, the constructing the point cloud discriminator and initializing the point cloud discriminator includes:
Constructing and initializing a feature extractor based on a graph neural network;
the discriminator classification head is constructed and initialized.
Further, the training and optimizing relevant parameters of the point cloud generator and the downstream task point cloud depth network includes:
Generating a realistic synthetic point cloud by the differentiable rendering-based point cloud generator according to the synthetic three-dimensional model dataset;
And according to the vivid synthesized point cloud and the real point cloud, adopting a preset loss function to monitor and train the point cloud discriminator and the downstream task point cloud depth network.
Further, the generating, by the differentiable rendering-based point cloud generator, a realistic synthetic point cloud from the synthetic three-dimensional model dataset includes:
for each three-dimensional model in the synthetic three-dimensional model dataset, importing the three-dimensional model into the constructed virtual scene, and rendering an image with the differentiable rendering-based virtual camera;
performing pixel-by-pixel depth calculation on the generated image with the differentiable stereo matching algorithm to obtain a corresponding depth map;
and, with the differentiable point cloud generation and post-processing module, back-projecting the pixels of the depth map into three-dimensional space to obtain the corresponding synthetic point cloud, which serves as the realistic synthetic point cloud.
Further, the importing, for each three-dimensional model in the synthetic three-dimensional model dataset, the three-dimensional model into the constructed virtual scene and rendering an image with the differentiable rendering-based virtual camera includes:
importing a three-dimensional model and placing it at the origin of the world coordinate system of the constructed virtual scene; randomly generating a virtual monocular camera pointing at the world origin according to the virtual camera initialization principle, and setting the relevant parameters of the point light source according to the point light source initialization principle;
by means of the deferred rendering of the differentiable rendering engine, for each pixel of an image of size H x W, casting a ray from the optical center of the constructed virtual camera through the pixel center; the cast ray intersects the three-dimensional model in the virtual scene, the specific three-dimensional coordinates of the intersection point in the world coordinate system are acquired, and a three-dimensional coordinate map is output;
converting the three-dimensional coordinate map into the camera coordinate system of the point light source according to the camera extrinsics of the point light source;
and projecting the updated three-dimensional coordinate map according to the camera intrinsics of the point light source to generate a two-dimensional sampling map, which serves as the monocular image.
Further, the performing pixel-by-pixel depth calculation on the generated image based on the differentiable stereo matching algorithm to obtain a corresponding depth map includes:
generating a cost volume with the differentiable stereo matching algorithm from the rendered monocular image and a pre-provided matching template image;
performing a maximum operation along the third dimension of the cost volume to obtain the maximum similarity for each pixel; if that similarity is below a preset cost threshold, setting the mask to 0, otherwise setting it to 1, finally generating a two-dimensional mask of shape H x W;
generating a two-dimensional index map corresponding to the cost volume through a differentiable maximum index equation;
for the generated two-dimensional index map, calculating a disparity value according to the matching relation: given a matching point pair (u_left, v_left) and (u_right, v_right), the disparity value is u_right - u_left, thereby generating a two-dimensional disparity map; for the generated two-dimensional disparity map, generating a two-dimensional depth map using a preset baseline parameter (baseline) and focal length parameter (focal length); the specific conversion formula is: depth = baseline * focal_length / disparity.
The invention adopts another technical scheme that:
A differentiable realistic point cloud data generation apparatus comprising:
At least one processor;
At least one memory for storing at least one program;
The at least one program, when executed by the at least one processor, causes the at least one processor to implement the method as described above.
The invention adopts another technical scheme that:
A computer readable storage medium, in which a processor executable program is stored, which when executed by a processor is adapted to carry out the method as described above.
The beneficial effects of the invention are as follows: the invention constructs a point cloud generator by means of differentiable rendering, performs differentiable modeling and simulation of the working principle of a real three-dimensional sensor, and converts the parameters that originally had to be set manually into optimizable objects. By constructing a point cloud discriminator and a downstream task point cloud depth network, computing loss functions and using gradient back-propagation to iteratively update these parameters, the generated point clouds are guaranteed to contain complex and diverse noise similar to real data, and synthetic point cloud data that more closely approximates the provided real point clouds can be generated in a targeted way. This improves the migration performance of the network, meets the requirements of real industrial scenarios, alleviates the poor generalization of current synthetic datasets, and provides more possibilities for industrial deployment. Meanwhile, thanks to the fully end-to-end differentiable design, this point cloud generation module can be embedded directly into any three-dimensional point cloud deep learning pipeline, providing a new solution for point cloud generation research.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description refers to the accompanying drawings of the embodiments of the present invention or of the related prior art. It should be understood that the drawings described below are provided only for convenience and clarity in describing some embodiments of the technical solutions of the present invention, and that other drawings may be obtained from these drawings by those skilled in the art without inventive labor.
FIG. 1 is a flowchart illustrating steps of a method for generating differentiable real-life point cloud data according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
In the description of the present invention, it should be understood that references to orientation descriptions such as upper, lower, front, rear, left, right, etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of description of the present invention and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more and "a plurality of" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including it. The terms "first" and "second" are used only to distinguish technical features and should not be construed as indicating or implying relative importance, the number of the indicated technical features, or their precedence.
In addition, in the description of the present invention, "and/or" describing the association relationship of the association object means that there may be three relationships, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
In the description of the present invention, unless explicitly defined otherwise, terms such as arrangement, installation, connection, etc. should be construed broadly and the specific meaning of the terms in the present invention can be reasonably determined by a person skilled in the art in combination with the specific contents of the technical scheme.
As shown in FIG. 1, this embodiment provides a differentiable realistic point cloud data generation method. The method constructs a point cloud generator using differentiable rendering, performs differentiable modeling and simulation of the working principle of a real three-dimensional sensor, and converts the parameters that originally had to be set manually into optimizable objects; by constructing a point cloud discriminator and a downstream task point cloud depth network, loss functions are computed and gradient back-propagation is used to iteratively update those parameters. This ensures that the generated point clouds contain complex and diverse noise similar to real data and that synthetic point cloud data closely approximating the provided real point clouds can be generated in a targeted way, improving the migration performance of the network. The method specifically comprises the following steps:
S1, acquiring a synthetic three-dimensional model data set and a real point cloud data sample, and preprocessing the synthetic three-dimensional model data set and the real point cloud data sample.
In this embodiment, the synthetic three-dimensional model dataset ModelNet and the real point cloud data samples DEPTHSCANNET are preprocessed. The three-dimensional models are Mesh data, composed mainly of vertices and faces, and describe the geometric structure of objects; the point cloud samples consist only of vertices and describe the surface information of the observed objects.
The preprocessing is as follows: for each three-dimensional Mesh model in ModelNet, we apply a random rotation around the model's own z-axis as data augmentation, then normalize its size into a unit cube, and finally compute its three-dimensional bounding box and assign the downstream task label. For each real point cloud sample in DEPTHSCANNET, we first apply farthest point sampling to downsample the original point cloud to 1024 points, then likewise normalize the downsampled point cloud into a unit cube, apply a random rotation around the z-axis of the point cloud, and add randomly generated Gaussian noise directly to the point coordinates as a coordinate perturbation for data augmentation. A sketch of this preprocessing is given below.
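The following is a minimal sketch, with NumPy, of the preprocessing just described (farthest point sampling, unit-cube normalization, random z-rotation and Gaussian jitter). Function names and the jitter standard deviation are illustrative assumptions, not values from the patent.

```python
import numpy as np

def farthest_point_sampling(points, n_samples=1024):
    """Iteratively pick the point farthest from the already selected set."""
    selected = [np.random.randint(len(points))]
    dists = np.full(len(points), np.inf)
    for _ in range(n_samples - 1):
        dists = np.minimum(dists, np.linalg.norm(points - points[selected[-1]], axis=1))
        selected.append(int(dists.argmax()))
    return points[selected]

def normalize_to_unit_cube(points):
    """Center the cloud and scale it so that it fits inside a unit cube."""
    points = points - points.mean(axis=0)
    return points / (2.0 * np.abs(points).max() + 1e-8)

def augment(points, sigma=0.01):
    """Random rotation about the z-axis plus Gaussian coordinate jitter."""
    theta = np.random.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot_z.T + np.random.normal(0.0, sigma, points.shape)

def preprocess_real_sample(points):
    return augment(normalize_to_unit_cube(farthest_point_sampling(points, 1024)))
```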
S2, constructing a point cloud generator based on differentiable rendering, and initializing the point cloud generator based on differentiable rendering.
As an alternative embodiment, step S2 specifically includes steps S21-S23:
S21, constructing and initializing a virtual camera based on differentiable rendering.
Specifically, a differentiable rendering engine is a special graphics rendering engine that generates realistic synthetic images from input parameters and primitives such as models, materials, illumination and cameras. Because it is differentiable, gradients at the synthesized image can be propagated back through the engine to those input parameters and elements, so they can be optimized automatically via gradient back-propagation in cooperation with a subsequent deep learning network and loss function. In this embodiment we build the virtual camera on top of the differentiable rendering engine redner.
The step S21 specifically includes steps S211-S213:
S211, building a virtual scene in the differentiable rendering engine.
In this embodiment, we first initialize a virtual scene instance in the differentiable rendering engine; three-dimensional Mesh models can be imported into this scene, and cameras and light sources can be created in it; all subsequent differentiable rendering operations are based on this virtual scene.
S212, creating a monocular camera in the differentiable rendering engine.
In this embodiment, a monocular camera is created in the constructed virtual scene instance, and the differentiable rendering engine renders an image according to the virtual observation of this camera. The monocular camera refers to a camera model based on the common perspective (pinhole) projection principle: using the configured camera extrinsics and intrinsics, the observed three-dimensional space is projected onto a two-dimensional image according to the perspective principle. The camera intrinsics are mainly used for the perspective transform that converts the camera three-dimensional coordinate system into the image two-dimensional coordinate system; the specific parameters comprise the optical center coordinates (cx, cy) and the focal length parameters (fx, fy). Specifically, given a point (x, y, z) in the camera three-dimensional coordinate system, the conversion relationship to its corresponding point (u, v) in the image two-dimensional coordinate system is:
u = fx*x/z + cx, v = fy*y/z + cy
The camera extrinsics are mainly used to convert the world three-dimensional coordinate system into the camera three-dimensional coordinate system; the specific parameters comprise a rotation matrix R and a translation vector t. Specifically, given a point P_w = (x, y, z) in the world three-dimensional coordinate system, the relationship to its corresponding point P_uv = (u, v) in the image two-dimensional coordinate system is:
z*P_uv = K(R*P_w + t)
where K is the camera intrinsic matrix.
In this embodiment, the resolution of the virtual camera is set to 540×540 and its field of view (fov) is set to 45.0; the corresponding camera intrinsics are generated automatically by the rendering engine. For the extrinsics, we specify that the virtual camera always faces the origin of the world coordinate system, and the specific camera position is generated from a spherical coordinate system controlled by the radius r, elevation θ and azimuth φ; the conversion between the spherical coordinate system and Euclidean space is:
x = r*sinθ*cosφ, y = r*sinθ*sinφ, z = r*cosθ
More specifically, we take the maximum and minimum values of the radius, elevation and azimuth as the parameters to be optimized, and random camera positions are generated by sampling uniformly within these ranges. Specifically, we initialize the radius range to [3.0, 3.5], the elevation range to [0.0, 180.0] and the azimuth range to [30.0, 50.0]. A sketch of this pose sampling is given below.
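The following is a minimal PyTorch sketch of the random camera pose sampling described above. The range tensors are the optimizable parameters; how they are handed to the differentiable renderer (redner) is omitted, and all names are illustrative assumptions.

```python
import torch

radius_range = torch.nn.Parameter(torch.tensor([3.0, 3.5]))
elev_range   = torch.nn.Parameter(torch.tensor([0.0, 180.0]))   # degrees
azim_range   = torch.nn.Parameter(torch.tensor([30.0, 50.0]))   # degrees

def sample_camera_position():
    """Sample a camera position on a sphere; the camera always looks at the world origin."""
    u = torch.rand(3)  # one uniform sample per spherical parameter
    r     = radius_range[0] + u[0] * (radius_range[1] - radius_range[0])
    theta = torch.deg2rad(elev_range[0] + u[1] * (elev_range[1] - elev_range[0]))
    phi   = torch.deg2rad(azim_range[0] + u[2] * (azim_range[1] - azim_range[0]))
    x = r * torch.sin(theta) * torch.cos(phi)
    y = r * torch.sin(theta) * torch.sin(phi)
    z = r * torch.cos(theta)
    return torch.stack([x, y, z])  # look-at target is the origin (0, 0, 0)
```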
S213, creating a point light source in the differentiable rendering engine.
In this embodiment, we create a point light source in the constructed virtual scene instance. Specifically, the point light source is also modeled as a perspective camera: besides setting its intrinsics and extrinsics, an illumination intensity pattern must be provided to illuminate the observed scene. We use the same intrinsics as the created monocular camera, and derive the light source's extrinsics from those of the virtual camera: the point light source has the same orientation as the camera and is displaced only along the x-axis in the camera-centered coordinate system; this displacement is called the baseline, and we set its length to 0.1. At the same time we use a pre-defined speckle pattern as the luminance pattern, a projection pattern commonly used in indoor three-dimensional sensors that consists of a number of random dots alternating between black and white. A sketch of such a pattern is given below.
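The following is a minimal sketch of a random binary speckle pattern like the one described above; the dot density and pattern size are illustrative assumptions, not values from the patent.

```python
import numpy as np

def make_speckle_pattern(height=540, width=540, dot_density=0.15, rng=None):
    """Return an H x W float image of random bright dots on a dark background."""
    rng = rng or np.random.default_rng(0)
    pattern = (rng.random((height, width)) < dot_density).astype(np.float32)
    return pattern  # 1.0 = bright speckle, 0.0 = dark background
```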
S22, constructing and initializing a differentiable stereo matching algorithm.
In this embodiment, we construct and initialize a differentiable stereo matching algorithm (Differentiable Stereo Matching) to match the synthesized rendered image against a preset matching template and thereby generate a depth map. Stereo matching is a common and important algorithm in computer vision: given two images, a cost volume is built through a series of similarity computations, and solving this cost volume yields the matching relation of each pixel between the two images. In our embodiment, for each pixel of the synthesized rendering, the pixel with the highest similarity in the matching template image is found. However, existing general stereo matching algorithms are essentially non-differentiable. To guarantee that this module can propagate gradients correctly, we construct a differentiable stereo matching algorithm, which includes building a differentiable cost volume function and initializing its parameters, building a differentiable maximum-index function and initializing its parameters, and incorporating the internal optimizable parameters into the set of parameters to be optimized.
Specifically, when this module is initialized, several matching templates are generated in advance and stored in the module. For the differentiable cost volume function, we choose zero-mean normalized cross-correlation (Zero-Mean Normalized Cross-Correlation, ZNCC) as the similarity descriptor; for a pair of pixels compared over an n×n window, the corresponding formula is:
ZNCC(u1, v1, u2, v2) = Σ_{i,j} (I1(u1+i, v1+j) - Ī1)(I2(u2+i, v2+j) - Ī2) / sqrt( Σ_{i,j} (I1(u1+i, v1+j) - Ī1)^2 * Σ_{i,j} (I2(u2+i, v2+j) - Ī2)^2 )
wherein I1 and I2 are the left and right images, (u1, v1) and (u2, v2) are the left and right pixels, Ī1 and Ī2 are the mean intensities inside the window, and n is the window size. A sketch of such a similarity computation is given below.
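The following is a minimal PyTorch sketch of a ZNCC-based row-wise cost volume between one row of the rendered image and the same row of the matching template, written so that gradients flow through it. Window handling is simplified (1D windows along the row instead of full n×n patches), so it is illustrative only and not the patent's exact implementation.

```python
import torch

def zncc_cost_volume_row(render_row, template_row, window=11):
    """render_row, template_row: tensors of shape (W,). Returns a (W, W) similarity matrix."""
    def windows(row):
        pad = window // 2
        padded = torch.nn.functional.pad(row[None, None], (pad, pad), mode="replicate")[0, 0]
        w = padded.unfold(0, window, 1)              # (W, window) sliding windows
        w = w - w.mean(dim=1, keepdim=True)          # zero-mean per window
        return w / (w.norm(dim=1, keepdim=True) + 1e-8)
    a, b = windows(render_row), windows(template_row)
    return a @ b.t()                                 # entry (i, j): ZNCC of pixel i vs pixel j
```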
For the differentiable maximum-index function, we implement a differentiable argmax with the help of the softmax function, as sketched below.
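The following is a minimal sketch of a softmax-based differentiable argmax ("soft-argmax"): instead of taking the hard index of the maximum, indices are averaged with softmax weights so gradients can flow back into the cost volume. The temperature value is an illustrative assumption.

```python
import torch

def soft_argmax(cost_volume, temperature=0.01):
    """cost_volume: (H, W, W) similarities. Returns (H, W) sub-pixel matching indices."""
    weights = torch.softmax(cost_volume / temperature, dim=-1)        # (H, W, W)
    indices = torch.arange(cost_volume.shape[-1], dtype=cost_volume.dtype,
                           device=cost_volume.device)                 # (W,)
    return (weights * indices).sum(dim=-1)                            # expected index per pixel
```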
S23, constructing and initializing a differentiable point cloud generation and post-processing module.
In this embodiment, we construct and initialize the differentiable point cloud generation and post-processing module. Specifically, we need to convert the two-dimensional depth map obtained above into a three-dimensional point cloud and perform post-processing at the point cloud level to obtain the final realistic synthetic point cloud, incorporating the internal optimizable parameters into the set of parameters to be optimized.
S3, constructing a point cloud discriminator, and initializing the point cloud discriminator.
In this embodiment, a point cloud discriminator is built and initialized. Since our goal is to generate realistic synthetic point clouds as close as possible to the provided real point clouds, we need a way to measure the difference between synthetic and real point clouds and to use gradient back-propagation of this difference to adaptively adjust the core parameters of the point cloud generator. However, unlike the usual supervised training of deep networks, which requires accurate input/output data pairs and computes the difference between a prediction and its ground truth, we cannot construct accurate pairs of real point clouds and input Mesh models. Therefore, in this embodiment we follow the approach of image generation tasks and narrow the distribution distance between synthetic and real point clouds by constructing a discriminator. Specifically, in our embodiment the input of the point cloud discriminator is a point cloud and its output is the corresponding classification result, whose purpose is to judge whether the input point cloud is a real point cloud.
As an alternative embodiment, step S3 specifically includes steps S31-S32:
S31, constructing and initializing a feature extractor based on a graph neural network.
In this embodiment, feature extraction is performed on the input point cloud. Specifically, we choose DGCNN as the point cloud feature extractor; DGCNN is a mature graph-based point cloud neural network that enables the use of two-dimensional convolution on a point cloud by building a graph structure over it.
Specifically, the DGCNN point cloud feature extractor comprises several point cloud graph descriptors, several two-dimensional convolution layers and several pooling layers. The point cloud graph descriptor is used to capture and describe the local features of the current discrete point set: for each point in the current point set, its K nearest neighbors are computed with the KNN (K-Nearest Neighbor) algorithm and used as the graph structure of that point; the input point cloud has shape B×C×N and the output graph has shape B×C×N×K. In this embodiment there are 5 independent two-dimensional convolution layers, which apply 2D convolution to the graphs output by the point cloud graph descriptors to obtain global, high-dimensional point cloud features; and there are 2 kinds of pooling layers, a max pooling layer and an average pooling layer, which perform dimension-reducing fusion of the final point cloud features.
Specifically, the DGCNN feature extraction flow is as follows: given an input of shape B×C×N, a basic feature processing module first passes it through the point cloud graph descriptor to output a B×C×N×K tensor, feeds this into a two-dimensional convolution layer, and then takes the maximum over the last dimension to reduce it back to B×C×N. This feature processing module is applied several times in succession, the intermediate representations are concatenated and fed into a two-dimensional convolution to output the complete point cloud features, and finally the features pooled by the max pooling layer and the average pooling layer are concatenated as the final point cloud feature representation. A sketch of one such edge-convolution block is given below.
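The following is a minimal PyTorch sketch of one DGCNN-style edge-convolution block (graph descriptor + 2D convolution + max over neighbors), following the flow described above. Channel sizes and the value of K are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

def knn_graph(x, k=20):
    """x: (B, C, N). Returns (B, 2C, N, K): each point paired with (neighbor - point, point)."""
    b, c, n = x.shape
    dist = torch.cdist(x.transpose(1, 2), x.transpose(1, 2))          # (B, N, N) pairwise dists
    idx = dist.topk(k, dim=-1, largest=False).indices                 # (B, N, K) nearest indices
    neighbors = torch.gather(
        x.transpose(1, 2).unsqueeze(1).expand(b, n, n, c), 2,
        idx.unsqueeze(-1).expand(b, n, k, c))                         # (B, N, K, C)
    center = x.transpose(1, 2).unsqueeze(2).expand_as(neighbors)
    feat = torch.cat([neighbors - center, center], dim=-1)            # (B, N, K, 2C)
    return feat.permute(0, 3, 1, 2)                                   # (B, 2C, N, K)

class EdgeConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=20):
        super().__init__()
        self.k = k
        self.conv = nn.Sequential(nn.Conv2d(in_ch * 2, out_ch, 1),
                                  nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2))

    def forward(self, x):                       # x: (B, C, N)
        g = knn_graph(x, self.k)                # (B, 2C, N, K)
        return self.conv(g).max(dim=-1).values  # max over the K neighbors -> (B, out_ch, N)
```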
S32, constructing and initializing a discriminator classification head.
In this embodiment, a discriminator classification head is constructed to output the discrimination result for the point cloud features. Specifically, the classification head comprises several fully connected layers, activation functions and several dropout layers; the input features repeatedly pass through a fully connected layer, an activation function and a dropout layer, and the discrimination label is finally output by a last fully connected layer, as sketched below.
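The following is a minimal sketch of such a classification head (fully connected layers, activation and dropout, ending in a single real/synthetic logit). The layer widths and the 2048-dimensional input (max-pooled plus average-pooled DGCNN features) are illustrative assumptions.

```python
import torch.nn as nn

def make_discriminator_head(feat_dim=2048, hidden=(512, 256), dropout=0.5):
    layers, in_dim = [], feat_dim
    for h in hidden:
        layers += [nn.Linear(in_dim, h), nn.LeakyReLU(0.2), nn.Dropout(dropout)]
        in_dim = h
    layers.append(nn.Linear(in_dim, 1))   # single logit, fed to BCEWithLogitsLoss in training
    return nn.Sequential(*layers)
```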
S4, constructing a downstream task point cloud depth network, and initializing the downstream task point cloud depth network.
S5, training and optimizing relevant parameters of the point cloud generator and the downstream task point cloud depth network.
As an alternative embodiment, step S5 specifically includes steps S51-S52:
S51, generating a realistic synthetic point cloud with the differentiable rendering-based point cloud generator from the synthetic three-dimensional model dataset.
Step S51 includes steps S511-S513:
S511, for each three-dimensional Mesh model in the synthetic model dataset, importing the model into the constructed virtual scene and rendering an image with the virtual camera based on differentiable rendering.
In this embodiment, we need to render the imported three-dimensional model according to the specified rendering mode using the differentiable rendering engine to obtain the synthesized two-dimensional image, and the specific steps include A1-A5:
A1, a three-dimensional Mesh model is first imported and placed at the origin of the world coordinate system of the constructed virtual scene, and a uniform gray material is assigned to it; a virtual monocular camera pointing at the world origin is randomly generated according to the virtual camera initialization principle, and the relevant parameters of the point light source are set according to the point light source initialization principle.
A2, by means of the deferred rendering (deferred rendering) of the differentiable rendering engine, for each pixel (u, v) in an image of size H×W, a ray is cast from the optical center of the constructed virtual camera through the pixel center; the cast ray intersects the three-dimensional Mesh model in the virtual scene, the specific three-dimensional coordinates (x, y, z) of the intersection point in the world coordinate system are acquired, and a three-dimensional coordinate map (H×W×3) is output.
A3, the three-dimensional coordinate map is transformed according to the camera extrinsics of the point light source: for the three-dimensional point coordinate corresponding to each pixel, the extrinsics are used to convert it from the world coordinate system into the camera coordinate system of the point light source, and the coordinate map is updated.
A4, the updated three-dimensional coordinate map is projected according to the camera intrinsics of the point light source to generate a two-dimensional sampling map: for the three-dimensional point coordinate corresponding to each pixel, the intrinsics are used to project it from the camera coordinate system into the image coordinate system of the point light source according to the perspective model, yielding a two-dimensional point coordinate, and a two-dimensional sampling map (H×W×2) is output.
A5, the luminance pattern of the point light source is sampled according to the two-dimensional sampling map to output the final monocular camera rendering: for the sampling point (u, v) stored at each pixel of the two-dimensional sampling map, bilinear interpolation is used to sample the preset luminance pattern and obtain the corresponding color (luminance value), and the final two-dimensional rendering (H×W×1) is generated and output. A sketch of steps A3-A5 is given below.
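The following is a minimal PyTorch sketch of steps A3-A5: transform a world-space coordinate map into the light source's camera frame, project it with the intrinsics, and bilinearly sample the speckle pattern with grid_sample so every step stays differentiable. All names and the tensor layout are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def render_from_light(coord_map, R, t, fx, fy, cx, cy, pattern):
    """coord_map: (H, W, 3) world coords; R: (3, 3), t: (3,) world-to-light; pattern: (Hp, Wp)."""
    h, w, _ = coord_map.shape
    cam = coord_map.reshape(-1, 3) @ R.t() + t           # A3: world -> light camera frame
    z = cam[:, 2].clamp(min=1e-6)
    u = fx * cam[:, 0] / z + cx                           # A4: perspective projection
    v = fy * cam[:, 1] / z + cy
    # A5: bilinear sampling of the speckle pattern; grid_sample expects coords in [-1, 1]
    grid = torch.stack([2.0 * u / (pattern.shape[1] - 1) - 1.0,
                        2.0 * v / (pattern.shape[0] - 1) - 1.0], dim=-1).view(1, h, w, 2)
    img = F.grid_sample(pattern[None, None], grid, mode="bilinear", align_corners=True)
    return img.view(h, w, 1)                              # final monocular rendering (H, W, 1)
```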
S512, carrying out pixel-by-pixel depth calculation on the generated image based on the differentiable stereo matching algorithm to obtain a depth map.
In this embodiment, we need to generate a two-dimensional depth map from the two-dimensional rendering map through a constructed differentiable stereo matching algorithm, and the specific steps include B1-B5:
B1, a cost volume is generated by the constructed differentiable stereo matching function from the rendered two-dimensional image and a pre-provided matching template image. Specifically, given a two-dimensional rendering of shape H×W×1 and a two-dimensional matching template of shape H×W×1, for each pixel of the rendering we compute the degree of similarity with every pixel of the matching template in the same row, and output a cost volume of shape H×W×W.
B2, some pixels of the rendering have very low similarity with every pixel of the same row in the matching template; such noisy points strongly affect the final generation quality, so a preset cost threshold is used to generate a corresponding two-dimensional cost mask for filtering. Specifically, we first take the maximum over the third dimension of the cost volume to obtain the maximum similarity for each pixel; if this maximum similarity is below the cost threshold we set, the mask is set to 0, otherwise to 1, finally producing a two-dimensional mask of shape H×W.
B3, the two-dimensional index map corresponding to the cost volume is generated by the differentiable maximum-index equation. The purpose of building the cost volume is to find the matching relation between the rendering and the template: we regard a pixel (u_left, v_left) in the rendering and a pixel (u_right, v_right) in the matching template as a matching pair, i.e. as corresponding to the same point in 3D space, when their similarity score is the highest. However, directly taking the index of the maximum value is a non-differentiable operation that cannot propagate gradients, so the index of the matching point is obtained differentiably through our differentiable maximum-index equation, and a two-dimensional index map of shape H×W is output.
B4, for the generated two-dimensional index map, a disparity value (disparity) is calculated according to the matching relation: given a matching point pair (u_left, v_left) and (u_right, v_right), the disparity value is u_right - u_left, thereby generating a two-dimensional disparity map; for the generated two-dimensional disparity map, a two-dimensional depth map is generated using the preset baseline parameter (baseline) and focal length parameter (focal length); the specific conversion formula is:
depth = baseline * focal_length / disparity
B5, for each two-dimensional rendering, a corresponding two-dimensional depth map and two-dimensional denoising mask are thus obtained in a differentiable way. It should be noted that, to improve the quality of the generated depth map, we may choose to match the same rendering against several different matching templates, generate several depth maps and corresponding masks, and fuse them into a final two-dimensional depth map and mask; the resulting depth map can reach sub-pixel precision. A sketch of steps B3-B4 is given below.
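The following is a minimal PyTorch sketch of steps B3-B4 (and the B2 mask), using the soft-argmax idea sketched earlier to obtain sub-pixel indices that are converted to disparity and then to depth. The focal length default is derived from the 540-pixel, 45-degree fov camera above, and all parameter values are illustrative assumptions.

```python
import torch

def depth_from_cost_volume(cost_volume, baseline=0.1, focal_length=651.8,
                           cost_threshold=0.8, temperature=0.01):
    """cost_volume: (H, W, W). Returns a depth map (H, W) and a validity mask (H, W)."""
    weights = torch.softmax(cost_volume / temperature, dim=-1)
    idx = torch.arange(cost_volume.shape[-1], dtype=cost_volume.dtype,
                       device=cost_volume.device)
    u_right = (weights * idx).sum(dim=-1)                        # B3: differentiable index map
    u_left = idx.view(1, -1).expand_as(u_right)                  # each pixel's own column index
    disparity = u_right - u_left                                 # B4: disparity per pixel
    depth = baseline * focal_length / disparity.abs().clamp(min=1e-6)
    mask = (cost_volume.max(dim=-1).values >= cost_threshold).float()  # B2: cost mask
    return depth, mask
```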
S513, based on the point cloud generation and post-processing module, back projecting the pixel points in the depth map back to a three-dimensional space to obtain corresponding synthesized point clouds.
In this embodiment, we need to back-project the two-dimensional depth map into the three-dimensional coordinate system of the camera using the constructed intrinsics and extrinsics of the monocular camera, and we only generate points where the two-dimensional mask map equals 1. Specifically, for each pixel (u, v) where the two-dimensional mask is 1, we take the depth value depth at the corresponding location of the two-dimensional depth map, and the corresponding three-dimensional point cloud coordinates (x, y, z) are computed as:
x = (u - cx)*depth/fx, y = (v - cy)*depth/fy, z = depth
A three-dimensional point cloud of shape N×3 in the monocular camera space is output, where N is the number of ones in the two-dimensional mask. Regarding the differentiable post-processing modules: because a three-dimensional point cloud is far more sensitive to various kinds of noise than a two-dimensional depth map, and in order to ensure that the finally generated point cloud is highly realistic and usable for normal deep network training, several differentiable post-processing modules are constructed. Specifically, the point cloud in the camera coordinate system first undergoes a coordinate transform, using the extrinsics of the monocular camera to move it into the world coordinate system; then we downsample the point cloud to 1024 points with a farthest point sampling strategy; finally we apply the same preprocessing as for the real point cloud samples, including normalizing the point cloud into a unit cube, applying a random rotation about its z-axis, and adding random Gaussian noise to perturb the point positions. It should be noted that we can optionally fuse the point clouds generated from the viewpoints of several virtual cameras to increase the coverage of the Mesh model by the point cloud; since post-processing moves every cloud back to the world coordinate system, the generation results of all viewpoints can be merged directly. A sketch of the back-projection is given below.
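The following is a minimal PyTorch sketch of the back-projection in S513: every masked depth pixel is lifted into the camera frame and then moved into the world frame with the camera extrinsics. The camera-to-world convention of R and t here is an assumption, as are all names.

```python
import torch

def backproject(depth, mask, fx, fy, cx, cy, R, t):
    """depth, mask: (H, W); R: (3, 3), t: (3,) camera-to-world. Returns (N, 3) world points."""
    h, w = depth.shape
    v, u = torch.meshgrid(torch.arange(h, dtype=depth.dtype),
                          torch.arange(w, dtype=depth.dtype), indexing="ij")
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    cam_points = torch.stack([x, y, z], dim=-1)[mask.bool()]   # keep only pixels with mask == 1
    return cam_points @ R.t() + t                               # camera frame -> world frame
```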
S52, supervising and training the point cloud discriminator and the downstream task point cloud depth network with preset loss functions, based on the realistic synthetic point clouds and the real point clouds.
In this embodiment, we feed the differentiably synthesized point clouds and the real point clouds to the point cloud discriminator and the downstream task point cloud network, and supervise the learning with preset loss functions so as to optimize the relevant parameters, including the parameters of the point cloud generator as well as the network parameters of the point cloud discriminator and the downstream task network; the specific training process comprises C1-C3:
C1, the synthetic point clouds and the real point clouds are simultaneously input to the point cloud discriminator, while the generated synthetic point clouds are also input to the downstream task point cloud network and supervised with their known semantic labels; gradient-based optimization thus both trains the network parameters that perform the specified task and optimizes the parameters of the point cloud generator so that the synthetic point clouds further approach the given real point clouds.
C2, since the point cloud discriminator serves to close the distance between the synthetic and real point clouds, supervision follows the strategy of two-dimensional image GAN-based generation networks. Specifically, we use BCEWithLogitsLoss as the loss function of the binary discriminator, and training alternates as in two-dimensional generation tasks. When training and optimizing the point cloud discriminator, the optimizable parameters of the point cloud generator are frozen; real point clouds are given the "real" label 1 and synthetic point clouds the "fake" label 0, and learning through the BCEWithLogitsLoss loss enables the discriminator to distinguish the domain difference. When training and optimizing the point cloud generator, the optimizable parameters of the point cloud discriminator are frozen and serve as a strong prior; unlike discriminator training, the synthetic point clouds are given the "real" label 1, and the BCEWithLogitsLoss loss pushes the generator so that the discriminator regards the synthetic point clouds as real, making the generated point clouds closer to the provided real ones.
C3, in addition we provide a downstream task point cloud network for semantic learning: since in practical applications the synthesized realistic point clouds are ultimately used for various downstream semantic tasks, such as classification and segmentation, the specified downstream task can be used to supervise the synthetic point clouds and preserve their basic semantic information. Specifically, in this embodiment we train a point cloud classification network with the known class labels, using CrossEntropyLoss as the preset supervision function. A sketch of the alternating training loop is given below.
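The following is a minimal sketch of the alternating training in C1-C3, assuming generator, discriminator and classifier are nn.Modules and that generate_synthetic() runs the differentiable point cloud generator end to end. Optimizer setup, data loading and all names are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

def train_step(generator, discriminator, classifier, mesh_batch, labels, real_pc,
               opt_d, opt_g):
    ones = lambda n: torch.ones(n, 1, device=real_pc.device)
    zeros = lambda n: torch.zeros(n, 1, device=real_pc.device)

    # --- discriminator step: generator frozen, real point clouds -> 1, synthetic -> 0 ---
    with torch.no_grad():
        fake_pc = generator.generate_synthetic(mesh_batch)
    d_loss = bce(discriminator(real_pc), ones(len(real_pc))) + \
             bce(discriminator(fake_pc), zeros(len(fake_pc)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- generator + downstream step: synthetic labeled as "real", classifier supervised ---
    # (opt_g holds only generator and classifier parameters, so the discriminator stays fixed)
    fake_pc = generator.generate_synthetic(mesh_batch)
    g_loss = bce(discriminator(fake_pc), ones(len(fake_pc))) + ce(classifier(fake_pc), labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```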
And S6, testing and using the optimized point cloud generator and the downstream task point cloud depth network.
In this embodiment, the trained components can be used in two specific ways for the configured downstream task. The core of the method is to construct a realistic point cloud synthesizer which, for a specified point cloud downstream task, generates synthetic samples close to the provided real point clouds, and to use this large-scale labeled realistic synthetic data to train or fine-tune the downstream point cloud network, thereby improving the migration performance of existing point cloud networks in real-data environments and accelerating the practical deployment of related deep learning algorithms. Specifically, we provide two ways in which this embodiment can be tested and used:
1) Directly using the downstream task point cloud depth network already optimized during training. Since we additionally provide a downstream task point cloud depth network branch during training to preserve the semantic information of the synthetic point clouds, this network has throughout been trained only on realistic synthetic point clouds and can be applied directly to subsequent downstream tasks; compared with existing point cloud networks trained only on synthetic datasets, its performance is improved.
2) Regenerating large-scale labeled realistic synthetic point clouds with the trained point cloud generator. After the above training, the optimized point cloud generator can better generate synthetic point clouds that approximate the properties of the provided real point clouds; a synthetic point cloud dataset can be regenerated offline at large scale according to the requirements of the downstream task, and a new downstream task point cloud network can then be retrained on this labeled synthetic dataset and applied in the real downstream environment.
In summary, compared with the prior art, the method of the embodiment has the following advantages and beneficial effects:
(1) The method of this embodiment alleviates the poor generalization, on real point cloud data, of deep network algorithms trained on synthetic point cloud datasets. Using a point cloud generator based on differentiable rendering, the settings that determine the quality of the synthetic point clouds are exposed as parameters to be optimized, and by constructing a point cloud discriminator and using gradient back-propagation these parameters are adaptively optimized and adjusted. This preserves the rendering engine's ability to generate near-realistic noise while generating synthetic point cloud data that approximates the provided real point cloud samples in a targeted way, for training the point cloud networks of subsequent downstream tasks, improving migration performance and accelerating deployment.
(2) The core parameters are updated adaptively by the fully differentiable point cloud generator, removing the need for manual tuning of complex parameters for each specific scene. This matches the requirements of real industrial scenarios, where a small number of unlabeled real point cloud samples can be captured for a specific task, and fully exploits the advantages of both the available real data samples and differentiable rendering.
(3) The fully differentiable point cloud generator allows the whole point cloud generation process to be used directly as a module inside a deep learning framework, in a plug-and-play manner, in any point cloud downstream task.
(4) In the testing and usage stage, the downstream task point cloud network trained on the synthetic dataset generated by the point cloud generator shows a clear performance improvement, while also providing more options for the mixed use of real and synthetic point cloud data.
The embodiment also provides a device for generating differentiable lifelike point cloud data, which comprises:
At least one processor;
At least one memory for storing at least one program;
The at least one program, when executed by the at least one processor, causes the at least one processor to implement the method illustrated in fig. 1.
The generation device of the differentiable lifelike point cloud data can execute any combination implementation steps of the method embodiments, and has corresponding functions and beneficial effects.
Embodiments of the present application also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the method shown in fig. 1.
The embodiment also provides a storage medium which stores instructions or programs for executing the method for generating the differentiable lifelike point cloud data, and when the instructions or programs are run, the method can be executed according to any combination of the embodiments, and the method has the corresponding functions and beneficial effects.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber device, and a portable Compact Disc Read-Only Memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.
In the description of this specification, reference to the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that many changes, modifications, substitutions, and variations may be made to these embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present application have been described in detail, the present application is not limited to the above embodiments, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present application; such equivalent modifications and substitutions are intended to be included within the scope of the present application as defined in the appended claims.

Claims (10)

1. A method for generating differentiable realistic point cloud data, comprising the steps of:
acquiring a synthetic three-dimensional model dataset and a real point cloud data sample, and preprocessing the synthetic three-dimensional model dataset and the real point cloud data sample;
constructing a point cloud generator based on differentiable rendering, and initializing the point cloud generator based on differentiable rendering;
constructing a point cloud discriminator, and initializing the point cloud discriminator;
constructing a downstream task point cloud depth network, and initializing the downstream task point cloud depth network;
training and optimizing relevant parameters of the point cloud generator and the downstream task point cloud depth network;
and testing and using the optimized point cloud generator and the downstream task point cloud depth network.
2. The method for generating differentiable realistic point cloud data according to claim 1, wherein the constructing and initializing the differentiable rendering-based point cloud generator comprises:
constructing and initializing a virtual camera based on differentiable rendering;
constructing and initializing a differentiable stereo matching algorithm;
and constructing and initializing a differentiable point cloud generation and post-processing module.
3. The method for generating differentiable realistic point cloud data according to claim 2, wherein the constructing and initializing the differentiable rendering-based virtual camera comprises:
building a virtual scene in a differentiable rendering engine;
creating a monocular camera in the differentiable rendering engine;
and creating a point light source in the differentiable rendering engine.
4. The method for generating differentiable realistic point cloud data of claim 1, wherein the constructing and initializing the point cloud discriminator comprises:
constructing and initializing a feature extractor based on a graph neural network;
and constructing and initializing a discriminator classification head.
5. The method for generating differentiable realistic point cloud data of claim 2, wherein the training and optimizing relevant parameters of the point cloud generator and the downstream task point cloud depth network comprises:
generating a realistic synthetic point cloud by the differentiable rendering-based point cloud generator according to the synthetic three-dimensional model dataset;
and, according to the realistic synthetic point cloud and the real point cloud, supervising and training the point cloud discriminator and the downstream task point cloud depth network with a preset loss function.
6. The method for generating differentiable realistic point cloud data of claim 5, wherein the generating a realistic synthetic point cloud from the synthetic three-dimensional model dataset by the differentiable rendering-based point cloud generator comprises:
for each three-dimensional model in the synthetic three-dimensional model dataset, importing the three-dimensional model into a built virtual scene, and generating an image through rendering by the differentiable rendering-based virtual camera;
performing pixel-by-pixel depth calculation on the generated image based on the differentiable stereo matching algorithm to obtain a corresponding depth map;
and, based on the differentiable point cloud generation and post-processing module, back-projecting the pixels in the depth map into three-dimensional space to obtain a corresponding synthetic point cloud serving as the realistic synthetic point cloud.
7. The method for generating differentiable realistic point cloud data according to claim 6, wherein the importing the three-dimensional model into the built virtual scene for each three-dimensional model in the synthetic three-dimensional model dataset and generating an image through rendering by the differentiable rendering-based virtual camera comprises:
importing a three-dimensional model and placing it at the origin of the world coordinate system of the built virtual scene; randomly generating a virtual monocular camera pointing at the origin of the world coordinate system according to a virtual camera initialization principle, and setting relevant parameters of a point light source according to a point light source initialization principle;
by means of deferred rendering in the differentiable rendering engine, for each pixel in an image of size H × W, casting a ray from the optical center of the built virtual camera through the center of that pixel, the cast ray intersecting the three-dimensional model in the virtual scene; acquiring the three-dimensional coordinates, in the world coordinate system, of the intersection point of the ray and the three-dimensional model, and outputting a three-dimensional coordinate map;
converting the three-dimensional coordinate map into the camera coordinate system of the point light source according to the camera extrinsic parameters of the point light source; and projecting the converted three-dimensional coordinate map according to the camera intrinsic parameters of the point light source to generate a two-dimensional sampling map serving as a monocular image.
8. The method for generating differentiable realistic point cloud data according to claim 6, wherein the performing pixel-by-pixel depth calculation on the generated image based on the differentiable stereo matching algorithm to obtain the corresponding depth map comprises:
generating a cost volume with the differentiable stereo matching algorithm for the rendered monocular image and a pre-provided matching model layout;
taking the maximum over the third dimension of the cost volume to obtain the maximum similarity; if the similarity is greater than a preset cost threshold, setting the corresponding mask value to 0, otherwise setting it to 1, thereby generating a two-dimensional mask of shape H × W;
generating a two-dimensional index map corresponding to the cost volume through a differentiable maximum-index equation;
for the generated two-dimensional index map, calculating a disparity value according to the matching relation: given a matching point pair (u_left, v_left) and (u_right, v_right), the disparity value is u_right - u_left, thereby generating a two-dimensional disparity map; for the generated two-dimensional disparity map, generating a two-dimensional depth map using a preset baseline parameter and a preset focal length parameter, with the conversion formula depth = (baseline × focal length) / disparity.
9. A differentiable realistic point cloud data generation apparatus, comprising:
at least one processor; and
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any one of claims 1-8.
10. A computer-readable storage medium storing a processor-executable program, wherein the processor-executable program, when executed by a processor, performs the method according to any one of claims 1-8.
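As an editorial illustration of the joint training recited in claim 5, the following minimal PyTorch sketch treats the generator, discriminator, and downstream network as callable modules. The function and variable names, the binary cross-entropy objective for the discriminator, and the cross-entropy task loss are assumptions standing in for the unspecified "preset loss function", not details taken from the patent.

```python
import torch
import torch.nn.functional as F

def training_step(generator, discriminator, task_net, mesh_batch, real_pc, labels,
                  opt_g, opt_d, opt_task):
    """One joint update: render synthetic point clouds from meshes with the
    differentiable generator, train the discriminator to separate them from
    real point clouds, and train the downstream task network on them."""
    fake_pc = generator(mesh_batch)  # realistic synthetic point cloud, e.g. (B, N, 3)

    # Discriminator update: real point clouds -> 1, synthetic point clouds -> 0.
    d_real = discriminator(real_pc)
    d_fake = discriminator(fake_pc.detach())
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: gradients flow back through the differentiable renderer.
    d_fake_for_g = discriminator(fake_pc)
    g_loss = F.binary_cross_entropy_with_logits(d_fake_for_g,
                                                torch.ones_like(d_fake_for_g))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    # Downstream task network trained on the synthetic point clouds.
    task_loss = F.cross_entropy(task_net(fake_pc.detach()), labels)
    opt_task.zero_grad()
    task_loss.backward()
    opt_task.step()
    return d_loss.item(), g_loss.item(), task_loss.item()
```

The generator receives gradients only through the discriminator term here; any additional supervision the patent's "preset loss function" may include would be added in the same way.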
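The back-projection step of claim 6 can be sketched as follows, assuming a pinhole camera with a 3 × 3 intrinsic matrix K and an H × W depth map expressed in the camera coordinate system; the tensor shapes and names are illustrative assumptions.

```python
import torch

def depth_to_point_cloud(depth, K, mask=None):
    """Back-project an (H, W) depth map into a 3D point cloud.

    depth: (H, W) per-pixel depth; K: (3, 3) camera intrinsic matrix;
    mask: optional (H, W) validity mask (1 = keep).
    Returns an (N, 3) point cloud in the camera coordinate system.
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Pixel grid (v = row index, u = column index).
    v, u = torch.meshgrid(torch.arange(H, dtype=depth.dtype),
                          torch.arange(W, dtype=depth.dtype),
                          indexing="ij")
    x = (u - cx) * depth / fx   # inverse pinhole projection
    y = (v - cy) * depth / fy
    points = torch.stack((x, y, depth), dim=-1).reshape(-1, 3)

    if mask is not None:
        points = points[mask.reshape(-1) > 0]
    return points
```

Because every operation is a tensor operation, gradients can flow from the resulting points back to the depth values, which is what keeps the point cloud generation differentiable.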
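The coordinate-system conversion and projection at the end of claim 7 amount to a standard extrinsic transform followed by a pinhole projection. The sketch below assumes a rotation matrix R and translation vector t as the extrinsic parameters and a 3 × 3 intrinsic matrix K; names, shapes, and the absence of lens distortion are assumptions.

```python
import torch

def project_coordinate_map(xyz_world, R, t, K):
    """Transform an (H, W, 3) world-coordinate map into a camera frame and
    project it to 2D pixel coordinates (a "two-dimensional sampling map").

    xyz_world: (H, W, 3) world coordinates of ray/model intersections
    R: (3, 3) rotation, t: (3,) translation (camera extrinsic parameters)
    K: (3, 3) camera intrinsic matrix
    Returns (H, W, 2) pixel coordinates and (H, W) depth in the camera frame.
    """
    H, W, _ = xyz_world.shape
    pts = xyz_world.reshape(-1, 3)            # (H*W, 3)
    pts_cam = pts @ R.T + t                   # world -> camera coordinates
    depth = pts_cam[:, 2].clamp(min=1e-6)     # guard against division by zero
    uv_hom = pts_cam @ K.T                    # apply intrinsics
    uv = uv_hom[:, :2] / depth.unsqueeze(1)   # perspective division
    return uv.reshape(H, W, 2), depth.reshape(H, W)
```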
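For claim 8, the sketch below uses a temperature-scaled softmax expectation as one possible realization of the "differentiable maximum-index equation", and keeps the mask convention exactly as written in the claim (similarity above the threshold maps to 0). The (H, W, D) cost-volume layout, the column-index convention for disparity, and the parameter names are assumptions, not the patent's own formulation.

```python
import torch

def depth_from_cost_volume(cost_volume, cost_threshold, baseline, focal_length, tau=10.0):
    """cost_volume: (H, W, D) similarity scores over D candidate matching columns."""
    H, W, D = cost_volume.shape

    # Maximum similarity per pixel and the threshold mask (convention as stated in claim 8).
    max_sim, _ = cost_volume.max(dim=2)                        # (H, W)
    mask = (max_sim <= cost_threshold).to(cost_volume.dtype)   # > threshold -> 0, otherwise -> 1

    # Differentiable maximum index: softmax-weighted expectation over candidate indices.
    weights = torch.softmax(tau * cost_volume, dim=2)          # (H, W, D)
    idx = torch.arange(D, dtype=cost_volume.dtype)
    index_map = (weights * idx).sum(dim=2)                     # (H, W), soft argmax

    # Disparity between the matched column u_right and the pixel's own column u_left.
    u_left = torch.arange(W, dtype=cost_volume.dtype).expand(H, W)
    disparity = index_map - u_left                             # u_right - u_left

    # Depth from disparity with the preset baseline and focal-length parameters.
    depth = baseline * focal_length / disparity.abs().clamp(min=1e-6)
    return depth, index_map, mask
```

Larger values of the assumed temperature tau make the soft argmax approach a hard maximum index while keeping the mapping differentiable.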
CN202311751711.8A 2023-12-18 2023-12-18 Differential lifelike point cloud data generation method, differential lifelike point cloud data generation device and storage medium Pending CN117953177A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311751711.8A CN117953177A (en) 2023-12-18 2023-12-18 Differential lifelike point cloud data generation method, differential lifelike point cloud data generation device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311751711.8A CN117953177A (en) 2023-12-18 2023-12-18 Differential lifelike point cloud data generation method, differential lifelike point cloud data generation device and storage medium

Publications (1)

Publication Number Publication Date
CN117953177A (en) 2024-04-30

Family

ID=90793620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311751711.8A Pending CN117953177A (en) 2023-12-18 2023-12-18 Differential lifelike point cloud data generation method, differential lifelike point cloud data generation device and storage medium

Country Status (1)

Country Link
CN (1) CN117953177A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination