CN112581597A - Three-dimensional reconstruction method and device, computer equipment and storage medium - Google Patents

Three-dimensional reconstruction method and device, computer equipment and storage medium

Info

Publication number
CN112581597A
Authority
CN
China
Prior art keywords
sample
neural network
space vector
training
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011407052.2A
Other languages
Chinese (zh)
Inventor
曹逸尘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd
Priority to CN202011407052.2A
Publication of CN112581597A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 15/00: 3D [Three Dimensional] image rendering
    • G06T 15/005: General purpose rendering architectures
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a three-dimensional reconstruction method and apparatus, a computer device and a storage medium. The method comprises the following steps: acquiring a target two-dimensional image, where the target two-dimensional image comprises a target object to be three-dimensionally reconstructed; and inputting the target two-dimensional image into a pre-trained target neural network to obtain a reconstruction result output by the target neural network, where the reconstruction result comprises three-dimensional point cloud data of the target object. The target neural network is obtained by training a neural network on sample two-dimensional images, sample rendering images and sample point cloud data. The method can reduce the cost of three-dimensional reconstruction and improve its efficiency.

Description

Three-dimensional reconstruction method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of three-dimensional reconstruction technologies, and in particular, to a three-dimensional reconstruction method, an apparatus, a computer device, and a storage medium.
Background
Three-dimensional reconstruction is a simulation of a three-dimensional object in the real world with a computer. At present, the three-dimensional reconstruction technology is widely applied to the fields of medical systems, autonomous navigation, aviation and remote sensing measurement, industrial automation and the like, and brings great convenience to the work and life of people.
In the related art, the three-dimensional reconstruction process includes: acquiring an image of the target object with a camera that captures depth information, and then performing three-dimensional reconstruction of the target object from the image and its depth information.
However, cameras with a depth information acquisition function are expensive and difficult to operate, so three-dimensional reconstruction is costly and inefficient.
Disclosure of Invention
In view of the above, it is necessary to provide a three-dimensional reconstruction method, an apparatus, a computer device, and a storage medium, which can reduce the cost of three-dimensional reconstruction and improve the efficiency.
A method of three-dimensional reconstruction, the method comprising:
acquiring a target two-dimensional image; the target two-dimensional image comprises a target object to be subjected to three-dimensional reconstruction;
inputting the target two-dimensional image into a pre-trained target neural network to obtain a reconstruction result output by the target neural network; the reconstruction result comprises three-dimensional point cloud data of the target object;
the target neural network is obtained by carrying out neural network training based on the sample two-dimensional image, the sample rendering image and the sample point cloud data.
In one embodiment, before the inputting the target two-dimensional image into the pre-trained target neural network and obtaining the reconstruction result output by the target neural network, the method further includes:
acquiring a training sample set; the training sample set comprises a plurality of sample two-dimensional images, a plurality of sample rendering images and sample point cloud data corresponding to each sample rendering image; the sample two-dimensional image comprises a training object;
inputting the sample two-dimensional image, the sample rendering image and sample point cloud data corresponding to the sample rendering image into a neural network to be trained to obtain a training result output by the neural network to be trained; the training result comprises three-dimensional point cloud data of a training object;
and training the neural network based on the training result to obtain the target neural network.
In one embodiment, the neural network to be trained includes a coding sub-network, a decision sub-network and a decoding sub-network; and the inputting of the sample two-dimensional image, the sample rendering image and the sample point cloud data corresponding to the sample rendering image into the neural network to be trained to obtain the training result output by the neural network to be trained includes:
inputting the sample two-dimensional image, the sample rendering image and sample point cloud data corresponding to the sample rendering image into a coding sub-network for dimension transformation to obtain a first hidden space vector corresponding to the sample two-dimensional image, a second hidden space vector corresponding to the sample rendering image and a third hidden space vector corresponding to the sample point cloud data, wherein the first hidden space vector, the second hidden space vector and the third hidden space vector are output by the coding sub-network;
inputting the first hidden space vector, the second hidden space vector and the third hidden space vector into a decision sub-network to obtain a decision result output by the decision sub-network; the decision result comprises whether the first hidden space vector comes from the sample rendering image and whether the second hidden space vector comes from the sample point cloud data;
and inputting the second hidden space vector into a decoding sub-network to obtain a training result output by the decoding sub-network.
In one embodiment, the encoding sub-network includes a first encoder, a second encoder, and a third encoder, and the inputting of the sample point cloud data corresponding to the sample two-dimensional image, the sample rendered image, and the sample rendered image into the encoding sub-network for dimensional transformation to obtain a first hidden space vector corresponding to the sample two-dimensional image, a second hidden space vector corresponding to the sample rendered image, and a third hidden space vector corresponding to the sample point cloud data output by the encoding sub-network includes:
inputting a sample two-dimensional image into a first encoder to obtain a first hidden space vector output by the first encoder;
inputting the sample rendering image into a second encoder to obtain a second hidden space vector output by the second encoder;
and inputting the sample point cloud data into a third encoder to obtain a third hidden space vector output by the third encoder.
In one embodiment, the decision sub-network includes a first decision device and a second decision device; the above inputting the first hidden space vector, the second hidden space vector and the third hidden space vector into the decision sub-network to obtain a decision result output by the decision sub-network includes:
inputting the first hidden space vector and the second hidden space vector into a first decision device to obtain a first decision result output by the first decision device; the first decision result comprises whether the first hidden space vector comes from the sample rendering image;
inputting the second hidden space vector and the third hidden space vector into a second decision device to obtain a second decision result output by the second decision device; the second decision result comprises whether the second hidden space vector comes from the sample point cloud data.
In one embodiment, the training of the neural network based on the training result to obtain the target neural network includes:
and training the neural network according to the first judgment result, the second judgment result and the training result to obtain the target neural network.
In one embodiment, the training of the neural network according to the first decision result, the second decision result and the training result to obtain the target neural network includes:
calculating, from the training result and the sample point cloud data, the chamfer distance between the training result and the sample point cloud data;
and adjusting the adjustable parameters in the neural network according to the chamfer distance, the first decision result and the second decision result until the chamfer distance, the first decision result and the second decision result meet a preset convergence condition, to obtain the target neural network.
A three-dimensional reconstruction apparatus, the apparatus comprising:
the image acquisition module is used for acquiring a target two-dimensional image; the target two-dimensional image comprises a target object to be subjected to three-dimensional reconstruction;
the reconstruction module is used for inputting the target two-dimensional image into a pre-trained target neural network to obtain a reconstruction result output by the target neural network; the reconstruction result comprises three-dimensional point cloud data of the target object;
the target neural network is obtained by carrying out neural network training based on the sample two-dimensional image, the sample rendering image and the sample point cloud data.
In one embodiment, the apparatus further comprises:
the sample acquisition module is used for acquiring a training sample set; the training sample set comprises a plurality of sample two-dimensional images, a plurality of sample rendering images and sample point cloud data corresponding to each sample rendering image; the sample two-dimensional image comprises a training object;
the training result obtaining module is used for inputting the sample two-dimensional image, the sample rendering image and the sample point cloud data corresponding to the sample rendering image into the neural network to be trained to obtain a training result output by the neural network to be trained; the training result comprises three-dimensional point cloud data of a training object;
and the training module is used for training the neural network based on the training result to obtain the target neural network.
In one embodiment, the neural network to be trained includes a coding sub-network, a decision sub-network and a decoding sub-network; the training result obtaining module comprises:
the coding sub-module is used for inputting the sample two-dimensional image, the sample rendering image and the sample point cloud data corresponding to the sample rendering image into the coding sub-network for dimension conversion to obtain a first hidden space vector corresponding to the sample two-dimensional image, a second hidden space vector corresponding to the sample rendering image and a third hidden space vector corresponding to the sample point cloud data, wherein the first hidden space vector, the second hidden space vector and the third hidden space vector are output by the coding sub-network;
the decision sub-module is used for inputting the first hidden space vector, the second hidden space vector and the third hidden space vector into the decision sub-network to obtain a decision result output by the decision sub-network; the decision result comprises whether the first hidden space vector comes from the sample rendering image and whether the second hidden space vector comes from the sample point cloud data;
and the decoding submodule is used for inputting the second hidden space vector into the decoding sub-network to obtain a training result output by the decoding sub-network.
In one embodiment, the coding sub-module is specifically configured to input the sample two-dimensional image into the first encoder to obtain a first hidden space vector output by the first encoder; inputting the sample rendering image into a second encoder to obtain a second hidden space vector output by the second encoder; and inputting the sample point cloud data into a third encoder to obtain a third hidden space vector output by the third encoder.
In one embodiment, the decision sub-network includes a first decision device and a second decision device; the decision sub-module is specifically configured to input the first hidden space vector and the second hidden space vector into the first decision device to obtain a first decision result output by the first decision device, where the first decision result comprises whether the first hidden space vector comes from the sample rendering image; and to input the second hidden space vector and the third hidden space vector into the second decision device to obtain a second decision result output by the second decision device, where the second decision result comprises whether the second hidden space vector comes from the sample point cloud data.
In one embodiment, the training module is specifically configured to perform training of the neural network according to the first decision result, the second decision result, and the training result, so as to obtain the target neural network.
In one embodiment, the training module is specifically configured to calculate the chamfer distance between the training result and the sample point cloud data from the training result and the sample point cloud data, and to adjust the adjustable parameters in the neural network according to the chamfer distance, the first decision result and the second decision result until the chamfer distance, the first decision result and the second decision result meet a preset convergence condition, to obtain the target neural network.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a target two-dimensional image; the target two-dimensional image comprises a target object to be subjected to three-dimensional reconstruction;
inputting the target two-dimensional image into a pre-trained target neural network to obtain a reconstruction result output by the target neural network; the reconstruction result comprises three-dimensional point cloud data of the target object;
the target neural network is obtained by carrying out neural network training based on the sample two-dimensional image, the sample rendering image and the sample point cloud data.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a target two-dimensional image; the target two-dimensional image comprises a target object to be subjected to three-dimensional reconstruction;
inputting the target two-dimensional image into a pre-trained target neural network to obtain a reconstruction result output by the target neural network; the reconstruction result comprises three-dimensional point cloud data of the target object;
the target neural network is obtained by carrying out neural network training based on the sample two-dimensional image, the sample rendering image and the sample point cloud data.
According to the three-dimensional reconstruction method and apparatus, the computer device and the storage medium, the server acquires the target two-dimensional image and inputs it into the pre-trained target neural network to obtain the reconstruction result output by the target neural network. According to the embodiments of the present disclosure, the target object can be three-dimensionally reconstructed from a two-dimensional image alone: the cost of three-dimensional reconstruction can be reduced because no expensive camera with a depth information acquisition function is needed, and the efficiency of three-dimensional reconstruction can be improved because the depth information of the target object does not need to be acquired.
Drawings
FIG. 1 is a diagram of an exemplary three-dimensional reconstruction method;
FIG. 2 is a schematic flow chart diagram of a three-dimensional reconstruction method in one embodiment;
FIG. 3 is a schematic flow chart diagram illustrating the steps for training a target neural network in one embodiment;
FIG. 4 is a diagram illustrating an exemplary architecture of a neural network to be trained;
FIG. 5 is a flowchart illustrating the steps of obtaining a training result output by a neural network to be trained in one embodiment;
FIG. 6 is a second schematic diagram illustrating the structure of a neural network to be trained according to an embodiment;
FIG. 7 is a flowchart illustrating steps of training a neural network according to a first decision result, a second decision result, and a training result to obtain a target neural network in one embodiment;
FIG. 8 is a block diagram showing the structure of a three-dimensional reconstruction apparatus according to an embodiment;
FIG. 9 is a second block diagram illustrating the structure of a three-dimensional reconstruction apparatus according to an embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The three-dimensional reconstruction method provided by the present application can be applied to the application environment shown in fig. 1. The application environment includes a terminal 102 and a server 104, with the terminal 102 communicating with the server 104 through a network. In one embodiment, the terminal 102 sends the target two-dimensional image selected by the user to the server 104, and the server receives the target two-dimensional image sent by the terminal 102. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a three-dimensional reconstruction method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
in step 201, a server acquires a target two-dimensional image.
Wherein the target two-dimensional image comprises a target object to be three-dimensionally reconstructed. The target object may be a pedestrian, a building, a vehicle, or the like. The embodiment of the present disclosure does not limit the target object.
The terminal can communicate with the server: a user selects a target two-dimensional image from a plurality of two-dimensional images stored in the terminal, and the terminal then sends the target two-dimensional image to the server, which receives it. Alternatively, the user sends a selection instruction to the server through the terminal, and the server receives the selection instruction and selects a target two-dimensional image from a plurality of locally stored two-dimensional images according to the instruction. The embodiments of the present disclosure do not limit the manner of acquiring the target two-dimensional image.
Step 202, inputting the target two-dimensional image into a pre-trained target neural network to obtain a reconstruction result output by the target neural network.
Wherein the reconstruction result comprises three-dimensional point cloud data of the target object; the target neural network is obtained by carrying out neural network training based on the sample two-dimensional image, the sample rendering image and the sample point cloud data.
The server trains the neural network in advance on the sample two-dimensional images, the sample rendering images and the sample point cloud data to obtain the target neural network. In practical applications, training the target neural network may combine the sample two-dimensional images with the data set ShapeNet or the data set ModelNet; such a data set includes sample rendering images and sample point cloud data. It can be understood that if the neural network were trained only on the data set, the trained neural network could perform three-dimensional reconstruction from rendered images to obtain point cloud data. However, because two-dimensional photographs and rendered images differ, the point cloud data obtained by inputting the target two-dimensional image into a neural network trained only on the data set may deviate from the actual situation. Training the neural network on the sample two-dimensional images combined with the data set makes the three-dimensional point cloud data of the target object output by the trained target neural network closer to the actual situation; that is, the reconstruction result output by the target neural network is more accurate.
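For illustration only, a training sample combining the three modalities could be represented as follows. This is a sketch in Python/PyTorch; the tensor shapes are assumptions not specified by the disclosure.

```python
from dataclasses import dataclass
import torch

@dataclass
class TrainingSample:
    image: torch.Tensor        # sample two-dimensional image, e.g. (3, H, W)
    rendering: torch.Tensor    # sample rendering image from ShapeNet/ModelNet, (3, H, W)
    point_cloud: torch.Tensor  # sample point cloud paired with the rendering, (N, 3)
```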
After the target neural network is trained, the server inputs the acquired target two-dimensional image into the target neural network, and the target neural network performs three-dimensional reconstruction of the target object from the target two-dimensional image and outputs the reconstruction result.
In the three-dimensional reconstruction method, the server acquires a target two-dimensional image and inputs it into a pre-trained target neural network to obtain the reconstruction result output by the target neural network. According to the embodiments of the present disclosure, the target object can be three-dimensionally reconstructed from a two-dimensional image alone: the cost of three-dimensional reconstruction can be reduced because no expensive camera with a depth information acquisition function is needed, and the efficiency can be improved because the depth information of the target object does not need to be acquired.
In an embodiment, as shown in fig. 3, before inputting the target two-dimensional image into the pre-trained target neural network to obtain the reconstruction result output by the target neural network, the method may further include the step of training the target neural network:
in step 301, the server obtains a training sample set.
The training sample set comprises a plurality of sample two-dimensional images, a plurality of sample rendering images and sample point cloud data corresponding to the sample rendering images; the sample two-dimensional image comprises a training object.
The server may obtain a plurality of sample two-dimensional images from the terminal. For example, the terminal acquires an image of a training object through an image acquisition device to obtain a sample two-dimensional image, and then the terminal transmits the sample two-dimensional image to the server, and the server receives the sample two-dimensional image transmitted by the terminal. Alternatively, the server selects a sample two-dimensional image from a plurality of two-dimensional images stored locally. The embodiments of the present disclosure do not limit this.
After the server obtains the multiple sample two-dimensional images, it combines them with the data set ShapeNet or the data set ModelNet to obtain the training sample set. The sample rendering images and the sample point cloud data are in one-to-one correspondence, and the training object contained in a sample two-dimensional image may be the same as or different from the object corresponding to a sample rendering image and its sample point cloud data. The embodiments of the present disclosure do not limit this.
Step 302, inputting the sample two-dimensional image, the sample rendering image and the sample point cloud data corresponding to the sample rendering image into the neural network to be trained, and obtaining a training result output by the neural network to be trained.
Wherein the training result comprises three-dimensional point cloud data of the training object.
The server inputs a sample two-dimensional image, a sample rendering image and sample point cloud data corresponding to the sample rendering image into a neural network to be trained, the neural network carries out three-dimensional reconstruction on a training object according to the input sample two-dimensional image, the sample rendering image and the sample point cloud data, and outputs a training result, namely the three-dimensional point cloud data of the training object.
In practical applications, the neural network may be structured as a generative adversarial network. The embodiments of the present disclosure do not limit this.
And 303, training the neural network based on the training result to obtain the target neural network.
The server evaluates the training result output by the neural network. If the training result meets a preset convergence condition, training is finished and the target neural network is obtained. If the training result does not meet the preset convergence condition, the adjustable parameters in the neural network are adjusted. The server then inputs another sample two-dimensional image, a sample rendering image and the sample point cloud data corresponding to that sample rendering image into the neural network with the adjusted parameters to obtain a second training result output by the neural network. The server evaluates the second training result, and so on, until a training result output by the neural network meets the preset convergence condition, yielding the target neural network.
In practical applications, the adjustable parameters in the neural network may be adjusted by computing a gradient from the training result and backpropagating it. The embodiments of the present disclosure do not limit this.
In the above embodiment, the server obtains a training sample set and inputs the sample two-dimensional image, the sample rendering image and the sample point cloud data corresponding to the sample rendering image into the neural network to be trained to obtain the training result output by the neural network to be trained. If the neural network were trained only on the sample rendering images and the sample point cloud data, the point cloud data obtained by inputting the target two-dimensional image into the trained neural network could deviate from the actual situation. In the embodiments of the present disclosure, the training sample set includes a plurality of sample two-dimensional images, a plurality of sample rendering images and the sample point cloud data corresponding to each sample rendering image, so the three-dimensional point cloud data of the target object output by the trained target neural network is closer to the actual situation; that is, the reconstruction result output by the target neural network is more accurate.
In one embodiment, as shown in fig. 4, the neural network to be trained includes a coding sub-network, a decision sub-network, and a decoding sub-network; as shown in fig. 5, the step of inputting the sample two-dimensional image, the sample rendering image, and the sample point cloud data corresponding to the sample rendering image into the neural network to be trained to obtain a training result output by the neural network to be trained may include:
step 401, the server inputs the sample two-dimensional image, the sample rendering image and the sample point cloud data corresponding to the sample rendering image into the coding sub-network for dimension transformation, so as to obtain a first hidden space vector corresponding to the sample two-dimensional image, a second hidden space vector corresponding to the sample rendering image and a third hidden space vector corresponding to the sample point cloud data, which are output by the coding sub-network.
The coding sub-network performs a dimension transformation on the sample two-dimensional image, the sample rendering image and the sample point cloud data, transforming each of them into a high-dimensional hidden space vector. The transformed dimension may be, for example, 124 or 512; the embodiments of the present disclosure do not limit the transformed dimension.
In one embodiment, as shown in fig. 6, the coding sub-network includes a first encoder, a second encoder and a third encoder. The sample two-dimensional image is input into the first encoder to obtain the first hidden space vector output by the first encoder; the sample rendering image is input into the second encoder to obtain the second hidden space vector output by the second encoder; and the sample point cloud data is input into the third encoder to obtain the third hidden space vector output by the third encoder.
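The disclosure does not specify the encoder architectures. As an illustrative sketch only, the image encoders (first and second) and a PointNet-style point cloud encoder (third) might look like this in PyTorch, with the 512-dimensional latent size taken from the example dimensions above; the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Maps an RGB image to a hidden space vector; usable for both the
    sample two-dimensional image (first encoder) and the sample
    rendering image (second encoder)."""
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> (B, 128, 1, 1)
        )
        self.fc = nn.Linear(128, latent_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(x).flatten(1))

class PointCloudEncoder(nn.Module):
    """PointNet-style third encoder: (B, N, 3) points -> hidden space vector."""
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # Max-pooling over the point dimension gives permutation invariance.
        return self.mlp(pts).max(dim=1).values
```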
Step 402, the first hidden space vector, the second hidden space vector and the third hidden space vector are input into the decision sub-network to obtain the decision result output by the decision sub-network.
The decision result comprises whether the first hidden space vector comes from the sample rendering image and whether the second hidden space vector comes from the sample point cloud data.
The server inputs the first hidden space vector, the second hidden space vector and the third hidden space vector into the decision sub-network. The decision sub-network decides, from the first and second hidden space vectors, whether the first hidden space vector comes from the sample rendering image, decides, from the second and third hidden space vectors, whether the second hidden space vector comes from the sample point cloud data, and outputs the decision results. This process achieves domain adaptation between the two-dimensional image and the rendered image in the hidden space, and between the rendered image and the point cloud data in the hidden space. That is, the decision sub-network drives the first hidden space vector toward the second, and the second toward the third, so that ultimately the first hidden space vector approaches the third; as a result, the point cloud data that the target neural network reconstructs from a two-dimensional image better matches the actual situation and is more accurate.
In one embodiment, as shown in fig. 6, the decision sub-network includes a first decision device and a second decision device. The first hidden space vector and the second hidden space vector are input into the first decision device to obtain a first decision result output by the first decision device, and the second hidden space vector and the third hidden space vector are input into the second decision device to obtain a second decision result output by the second decision device.
The first decision result comprises whether the first hidden space vector comes from the sample rendering image; the second decision result comprises whether the second hidden space vector comes from the sample point cloud data.
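A minimal sketch of such a decision device as a latent-space discriminator follows; the layer sizes are assumptions, not the disclosed design.

```python
import torch.nn as nn

class LatentDiscriminator(nn.Module):
    """Outputs the probability that a hidden space vector comes from the
    target domain (e.g., that a vector fed to the first decision device
    originates from the sample rendering image rather than the photograph)."""
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 64), nn.LeakyReLU(0.2),
            nn.Linear(64, 1), nn.Sigmoid(),  # probability in [0, 1]
        )

    def forward(self, z):
        return self.net(z)
```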
And 403, inputting the second hidden space vector into the decoding subnetwork to obtain a training result output by the decoding subnetwork.
The server inputs the second hidden space vector into the decoding sub-network, which reduces its dimension and outputs the training result. In practical applications, the training result output by the decoding sub-network is point cloud data in the form of an N x 3 matrix, where N is the number of points and the 3 columns are the x, y and z coordinates of each point.
In one embodiment, as shown in fig. 6, the decoding sub-network includes a decoder, and the second hidden space vector is input into the decoder to obtain the training result output by the decoder.
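A sketch of a decoder consistent with the N x 3 output described above; the fully connected architecture and the choice of N = 2048 points are assumptions.

```python
import torch.nn as nn

class PointCloudDecoder(nn.Module):
    """Maps a hidden space vector to an N x 3 point cloud, where each row
    holds the x, y and z coordinates of one point."""
    def __init__(self, latent_dim: int = 512, num_points: int = 2048):
        super().__init__()
        self.num_points = num_points
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, num_points * 3),
        )

    def forward(self, z):
        # Reshape the flat output into (batch, num_points, 3).
        return self.net(z).view(-1, self.num_points, 3)
```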
As shown in fig. 6, the first encoder, the second encoder, the first decision device and the decoder form one generative adversarial network, and the second encoder, the third encoder, the second decision device and the decoder form another generative adversarial network.
In the process of inputting the sample two-dimensional image, the sample rendering image and the sample point cloud data corresponding to the sample rendering image into the neural network to be trained to obtain the training result output by the neural network to be trained, the server inputs them into the coding sub-network for dimension transformation to obtain the first hidden space vector corresponding to the sample two-dimensional image, the second hidden space vector corresponding to the sample rendering image and the third hidden space vector corresponding to the sample point cloud data output by the coding sub-network; inputs the first, second and third hidden space vectors into the decision sub-network to obtain the decision results output by the decision sub-network; and inputs the second hidden space vector into the decoding sub-network to obtain the training result output by the decoding sub-network. The embodiments of the present disclosure thereby achieve domain adaptation between the two-dimensional image and the rendered image in the hidden space, and between the rendered image and the point cloud data in the hidden space, so that the point cloud data reconstructed by the target neural network from a two-dimensional image better matches the actual situation and is more accurate.
In one embodiment, training the neural network based on the training result to obtain the target neural network may include: training the neural network according to the first decision result, the second decision result and the training result to obtain the target neural network.
In practical applications, the first decision device outputs the first decision result, the second decision device outputs the second decision result, and the decoder outputs the training result. The server then checks whether the first decision result, the second decision result and the training result meet a preset convergence condition. If not, a gradient is computed and backpropagated to train the neural network; if the preset convergence condition is met, the target neural network is obtained. The embodiments of the present disclosure do not limit the preset convergence condition.
In one embodiment, as shown in fig. 7, the step of training the neural network according to the first decision result, the second decision result and the training result to obtain the target neural network may include:
step 501, the server calculates according to the training result and the sample point cloud data to obtain the chamfer distance between the training result and the sample point cloud data.
The chamfering distance of the three-dimensional space is mainly used for three-dimensional reconstruction work. The larger the chamfering distance is, the larger the difference between two groups of point cloud data is; the smaller the chamfering distance is, the smaller the difference between two groups of point cloud data is, namely the better the reconstruction effect is.
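For concreteness, the symmetric chamfer distance between two point sets can be computed as follows. This is the standard definition sketched in PyTorch, not code from the disclosure.

```python
import torch

def chamfer_distance(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """Symmetric chamfer distance between point sets p1 (N, 3) and p2 (M, 3):
    for each point, the squared distance to its nearest neighbour in the
    other set, averaged over both directions."""
    diff = p1.unsqueeze(1) - p2.unsqueeze(0)   # (N, M, 3) pairwise differences
    d2 = (diff ** 2).sum(dim=-1)               # (N, M) squared distances
    return d2.min(dim=1).values.mean() + d2.min(dim=0).values.mean()
```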
Step 502, the adjustable parameters in the neural network are adjusted according to the chamfer distance, the first decision result and the second decision result, until the chamfer distance, the first decision result and the second decision result meet a preset convergence condition, yielding the target neural network.
It can be understood that the first and second decision results meeting the preset convergence condition indicates that the first hidden space vector is close to the second, and the second is close to the third, i.e. the first hidden space vector is close to the third; and the chamfer distance meeting the preset convergence condition indicates that the point cloud data the decoder reconstructs from the sample rendering image differs little from the sample point cloud data, i.e. the reconstruction is good.
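Putting the hypothetical pieces above together, one generator-side update could look like the following sketch. The equal loss weighting, the optimizer and all module names are assumptions; the decision devices would be trained with opposite labels in an alternating step, as is usual for generative adversarial networks.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def generator_step(sample, enc_img, enc_render, dec, d1, d2, optimizer):
    """One training step for the encoders/decoder: chamfer reconstruction
    loss plus adversarial losses that pull z1 toward z2's domain and z2
    toward z3's domain."""
    z1 = enc_img(sample.image.unsqueeze(0))         # first hidden space vector
    z2 = enc_render(sample.rendering.unsqueeze(0))  # second hidden space vector
    recon = dec(z2)[0]                              # reconstructed (N, 3) points

    loss_cd = chamfer_distance(recon, sample.point_cloud)
    # Fool the decision devices: z1 should look like it came from the
    # rendering, z2 should look like it came from the point cloud.
    loss_adv = bce(d1(z1), torch.ones(1, 1)) + bce(d2(z2), torch.ones(1, 1))

    loss = loss_cd + loss_adv
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```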
In the process of training the neural network according to the first decision result, the second decision result and the training result to obtain the target neural network, the server calculates the chamfer distance between the training result and the sample point cloud data and adjusts the adjustable parameters in the neural network according to the chamfer distance, the first decision result and the second decision result until they meet the preset convergence condition, yielding the target neural network. Through the embodiments of the present disclosure, the target neural network achieves domain adaptation during three-dimensional reconstruction, and the reconstruction results it outputs are of good quality.
It should be understood that although the steps in the flowcharts of figs. 2-7 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not restricted to a strict order and may be performed in other orders. Moreover, at least some of the steps in figs. 2-7 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and not necessarily in sequence: they may be performed in turn or alternately with other steps or with at least some sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, there is provided a three-dimensional reconstruction apparatus including:
an image obtaining module 601, configured to obtain a target two-dimensional image; the target two-dimensional image comprises a target object to be subjected to three-dimensional reconstruction;
a reconstruction module 602, configured to input the target two-dimensional image into a pre-trained target neural network, so as to obtain a reconstruction result output by the target neural network; the reconstruction result comprises three-dimensional point cloud data of the target object;
the target neural network is obtained by carrying out neural network training based on the sample two-dimensional image, the sample rendering image and the sample point cloud data.
In one embodiment, as shown in fig. 9, the apparatus further comprises:
a sample obtaining module 603, configured to obtain a training sample set; the training sample set comprises a plurality of sample two-dimensional images, a plurality of sample rendering images and sample point cloud data corresponding to each sample rendering image; the sample two-dimensional image comprises a training object;
a training result obtaining module 604, configured to input the sample two-dimensional image, the sample rendering image, and sample point cloud data corresponding to the sample rendering image into a neural network to be trained, so as to obtain a training result output by the neural network to be trained; the training result comprises three-dimensional point cloud data of a training object;
and a training module 605, configured to perform training of the neural network based on the training result to obtain a target neural network.
In one embodiment, the neural network to be trained includes a coding sub-network, a decision sub-network and a decoding sub-network; the training result obtaining module 604 includes:
the coding sub-module is used for inputting the sample two-dimensional image, the sample rendering image and the sample point cloud data corresponding to the sample rendering image into the coding sub-network for dimension conversion to obtain a first hidden space vector corresponding to the sample two-dimensional image, a second hidden space vector corresponding to the sample rendering image and a third hidden space vector corresponding to the sample point cloud data, wherein the first hidden space vector, the second hidden space vector and the third hidden space vector are output by the coding sub-network;
the decision sub-module is used for inputting the first hidden space vector, the second hidden space vector and the third hidden space vector into the decision sub-network to obtain a decision result output by the decision sub-network; the decision result comprises whether the first hidden space vector comes from the sample rendering image and whether the second hidden space vector comes from the sample point cloud data;
and the decoding submodule is used for inputting the second hidden space vector into the decoding sub-network to obtain a training result output by the decoding sub-network.
In one embodiment, the coding sub-module is specifically configured to input the sample two-dimensional image into the first encoder to obtain a first hidden space vector output by the first encoder; inputting the sample rendering image into a second encoder to obtain a second hidden space vector output by the second encoder; and inputting the sample point cloud data into a third encoder to obtain a third hidden space vector output by the third encoder.
In one embodiment, the decision sub-network includes a first decision device and a second decision device; the decision sub-module is specifically configured to input the first hidden space vector and the second hidden space vector into the first decision device to obtain a first decision result output by the first decision device, where the first decision result comprises whether the first hidden space vector comes from the sample rendering image; and to input the second hidden space vector and the third hidden space vector into the second decision device to obtain a second decision result output by the second decision device, where the second decision result comprises whether the second hidden space vector comes from the sample point cloud data.
In one embodiment, the training module is specifically configured to perform training of the neural network according to the first decision result, the second decision result, and the training result, so as to obtain the target neural network.
In one embodiment, the training module is specifically configured to calculate the chamfer distance between the training result and the sample point cloud data from the training result and the sample point cloud data, and to adjust the adjustable parameters in the neural network according to the chamfer distance, the first decision result and the second decision result until the chamfer distance, the first decision result and the second decision result meet a preset convergence condition, to obtain the target neural network.
For specific limitations of the three-dimensional reconstruction apparatus, reference may be made to the above limitations of the three-dimensional reconstruction method, which are not described herein again. The modules in the three-dimensional reconstruction device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server whose internal structure is shown in fig. 10. The computer device includes a processor, a memory and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores three-dimensional reconstruction data. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a three-dimensional reconstruction method.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of part of the structure associated with the disclosed solution and does not limit the computer devices to which the disclosed solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring a target two-dimensional image; the target two-dimensional image comprises a target object to be subjected to three-dimensional reconstruction;
inputting the target two-dimensional image into a pre-trained target neural network to obtain a reconstruction result output by the target neural network; the reconstruction result comprises three-dimensional point cloud data of the target object;
the target neural network is obtained by carrying out neural network training based on the sample two-dimensional image, the sample rendering image and the sample point cloud data.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
acquiring a training sample set; the training sample set comprises a plurality of sample two-dimensional images, a plurality of sample rendering images and sample point cloud data corresponding to each sample rendering image; the sample two-dimensional image comprises a training object;
inputting the sample two-dimensional image, the sample rendering image and sample point cloud data corresponding to the sample rendering image into a neural network to be trained to obtain a training result output by the neural network to be trained; the training result comprises three-dimensional point cloud data of a training object;
and training the neural network based on the training result to obtain the target neural network.
In one embodiment, the neural network to be trained includes a coding sub-network, a decision sub-network and a decoding sub-network; the processor, when executing the computer program, further performs the steps of:
inputting the sample two-dimensional image, the sample rendering image and sample point cloud data corresponding to the sample rendering image into a coding sub-network for dimension transformation to obtain a first hidden space vector corresponding to the sample two-dimensional image, a second hidden space vector corresponding to the sample rendering image and a third hidden space vector corresponding to the sample point cloud data, wherein the first hidden space vector, the second hidden space vector and the third hidden space vector are output by the coding sub-network;
inputting the first hidden space vector, the second hidden space vector and the third hidden space vector into a decision sub-network to obtain a decision result output by the decision sub-network; the decision result comprises whether the first hidden space vector comes from the sample rendering image and whether the second hidden space vector comes from the sample point cloud data;
and inputting the second hidden space vector into a decoding sub-network to obtain a training result output by the decoding sub-network.
In one embodiment, the encoding subnetwork comprises a first encoder, a second encoder and a third encoder, and the processor when executing the computer program further performs the following steps:
inputting a sample two-dimensional image into a first encoder to obtain a first hidden space vector output by the first encoder;
inputting the sample rendering image into a second encoder to obtain a second hidden space vector output by the second encoder;
and inputting the sample point cloud data into a third encoder to obtain a third hidden space vector output by the third encoder.
In one embodiment, the decision sub-network comprises a first decision device and a second decision device; the processor, when executing the computer program, further performs the steps of:
inputting the first hidden space vector and the second hidden space vector into a first decision device to obtain a first decision result output by the first decision device; the first decision result comprises whether the first hidden space vector comes from the sample rendering image;
inputting the second hidden space vector and the third hidden space vector into a second decision device to obtain a second decision result output by the second decision device; the second decision result comprises whether the second hidden space vector comes from the sample point cloud data.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
and training the neural network according to the first judgment result, the second judgment result and the training result to obtain the target neural network.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
calculating, from the training result and the sample point cloud data, the chamfer distance between the training result and the sample point cloud data;
and adjusting the adjustable parameters in the neural network according to the chamfer distance, the first decision result and the second decision result until the chamfer distance, the first decision result and the second decision result meet a preset convergence condition, to obtain the target neural network.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a target two-dimensional image; the target two-dimensional image comprises a target object to be subjected to three-dimensional reconstruction;
inputting the target two-dimensional image into a pre-trained target neural network to obtain a reconstruction result output by the target neural network; the reconstruction result comprises three-dimensional point cloud data of the target object;
the target neural network is obtained by carrying out neural network training based on the sample two-dimensional image, the sample rendering image and the sample point cloud data.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a training sample set; the training sample set comprises a plurality of sample two-dimensional images, a plurality of sample rendering images and sample point cloud data corresponding to each sample rendering image; the sample two-dimensional image comprises a training object;
inputting the sample two-dimensional image, the sample rendering image and sample point cloud data corresponding to the sample rendering image into a neural network to be trained to obtain a training result output by the neural network to be trained; the training result comprises three-dimensional point cloud data of a training object;
and training the neural network based on the training result to obtain the target neural network.
In one embodiment, the neural network to be trained includes a coding sub-network, a decision sub-network and a decoding sub-network; the computer program when executed by the processor further realizes the steps of:
inputting the sample two-dimensional image, the sample rendering image and sample point cloud data corresponding to the sample rendering image into a coding sub-network for dimension transformation to obtain a first hidden space vector corresponding to the sample two-dimensional image, a second hidden space vector corresponding to the sample rendering image and a third hidden space vector corresponding to the sample point cloud data, wherein the first hidden space vector, the second hidden space vector and the third hidden space vector are output by the coding sub-network;
inputting the first hidden space vector, the second hidden space vector and the third hidden space vector into a decision sub-network to obtain a decision result output by the decision sub-network; the decision result comprises whether the first hidden space vector comes from the sample rendering image and whether the second hidden space vector comes from the sample point cloud data;
and inputting the second hidden space vector into a decoding sub-network to obtain a training result output by the decoding sub-network.
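Schematically, one forward pass through the three sub-networks could be wired as below; the module containers and concatenation scheme are assumptions for illustration, with the concrete encoders and decision devices sketched after the following embodiments.

import torch
import torch.nn as nn

def forward_pass(encoders: dict, decision_devices: dict, decoder: nn.Module,
                 photo: torch.Tensor, render: torch.Tensor, points: torch.Tensor):
    z_photo = encoders["image"](photo)     # first latent space vector
    z_render = encoders["render"](render)  # second latent space vector
    z_points = encoders["points"](points)  # third latent space vector
    # Decision sub-network: score whether each vector could come from the other domain.
    d1 = decision_devices["photo_vs_render"](torch.cat([z_photo, z_render], dim=0))
    d2 = decision_devices["render_vs_points"](torch.cat([z_render, z_points], dim=0))
    # Decoding sub-network: reconstruct the point cloud from the second latent vector.
    reconstruction = decoder(z_render)
    return reconstruction, d1, d2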
In one embodiment, the coding sub-network comprises a first encoder, a second encoder, and a third encoder; the computer program, when executed by the processor, further implements the steps of:
inputting the sample two-dimensional image into the first encoder, to obtain the first latent space vector output by the first encoder;
inputting the sample rendered image into the second encoder, to obtain the second latent space vector output by the second encoder;
inputting the sample point cloud data into the third encoder, to obtain the third latent space vector output by the third encoder.
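Sketches of the three encoders under the same assumptions: the two image encoders share a small convolutional design, and the point-cloud encoder uses a PointNet-style per-point MLP with max pooling. None of this detail is prescribed by the application.

import torch
import torch.nn as nn

LATENT_DIM = 128  # illustrative latent dimensionality

def image_encoder() -> nn.Module:
    # Architecture shared by the first (photo) and second (render) encoders.
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, LATENT_DIM),
    )

class PointEncoder(nn.Module):
    # Third encoder: per-point MLP followed by a symmetric max-pooling aggregation.
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        return self.mlp(points).max(dim=1).values  # (B, N, 3) -> (B, LATENT_DIM)

first_encoder = image_encoder()    # sample 2D image -> first latent space vector
second_encoder = image_encoder()   # sample rendered image -> second latent space vector
third_encoder = PointEncoder()     # sample point cloud -> third latent space vector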
In one embodiment, the decision sub-network comprises a first decision device and a second decision device; the computer program, when executed by the processor, further implements the steps of:
inputting the first latent space vector and the second latent space vector into the first decision device, to obtain a first decision result output by the first decision device; the first decision result comprises whether the first latent space vector comes from the sample rendered image;
inputting the second latent space vector and the third latent space vector into the second decision device, to obtain a second decision result output by the second decision device; the second decision result comprises whether the second latent space vector comes from the sample point cloud data.
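Each decision device can be read as a discriminator over latent space vectors; a minimal MLP sketch, again with invented sizes:

import torch.nn as nn

def decision_device(latent_dim: int = 128) -> nn.Module:
    # Outputs a probability that the input latent vector comes from the "reference"
    # domain (the rendered image for the first device, the point cloud for the second).
    return nn.Sequential(
        nn.Linear(latent_dim, 64), nn.LeakyReLU(0.2),
        nn.Linear(64, 1), nn.Sigmoid(),
    )

first_decision_device = decision_device()   # first vs. second latent space vector
second_decision_device = decision_device()  # second vs. third latent space vector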
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
training the neural network according to the first decision result, the second decision result, and the training result to obtain the target neural network.
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
calculating a chamfer distance between the training result and the sample point cloud data;
adjusting the adjustable parameters of the neural network according to the chamfer distance, the first decision result, and the second decision result until the chamfer distance, the first decision result, and the second decision result meet a preset convergence condition, thereby obtaining the target neural network.
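One way to combine these signals in a single generator-side loss, assuming the chamfer_distance and decision-device sketches above and a non-saturating BCE adversarial term; the weight lambda_adv and the GAN formulation are assumptions, not the application's prescription, and the decision devices would be updated with the opposite labels in a separate step.

import torch
import torch.nn.functional as F

def generator_loss(z_photo, z_render, reconstruction, gt_points,
                   first_dd, second_dd, lambda_adv: float = 0.1):
    cd = chamfer_distance(reconstruction, gt_points)  # reconstruction term
    # Push the photo vector toward the render domain and the render vector
    # toward the point-cloud domain, so the three latent spaces align.
    ones1 = torch.ones(z_photo.size(0), 1)
    ones2 = torch.ones(z_render.size(0), 1)
    adv1 = F.binary_cross_entropy(first_dd(z_photo), ones1)
    adv2 = F.binary_cross_entropy(second_dd(z_render), ones2)
    return cd + lambda_adv * (adv1 + adv2)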
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing related hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, or optical storage. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application; their description is specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of three-dimensional reconstruction, the method comprising:
acquiring a target two-dimensional image; the target two-dimensional image comprises a target object to be subjected to three-dimensional reconstruction;
inputting the target two-dimensional image into a pre-trained target neural network to obtain a reconstruction result output by the target neural network; the reconstruction result comprises three-dimensional point cloud data of the target object;
wherein the target neural network is obtained by training a neural network based on a sample two-dimensional image, a sample rendered image, and sample point cloud data.
2. The method of claim 1, wherein, before inputting the target two-dimensional image into the pre-trained target neural network to obtain the reconstruction result output by the target neural network, the method further comprises:
acquiring a training sample set; the training sample set comprises a plurality of sample two-dimensional images, a plurality of sample rendered images, and sample point cloud data corresponding to each sample rendered image; each sample two-dimensional image comprises a training object;
inputting the sample two-dimensional image, the sample rendered image, and the sample point cloud data corresponding to the sample rendered image into a neural network to be trained, to obtain a training result output by the neural network to be trained; the training result comprises three-dimensional point cloud data of the training object;
training the neural network based on the training result to obtain the target neural network.
3. The method of claim 2, wherein the neural network to be trained comprises a coding sub-network, a decision sub-network, and a decoding sub-network; and said inputting the sample two-dimensional image, the sample rendered image, and the sample point cloud data corresponding to the sample rendered image into the neural network to be trained, to obtain the training result output by the neural network to be trained, comprises:
inputting the sample two-dimensional image, the sample rendered image, and the sample point cloud data corresponding to the sample rendered image into the coding sub-network for dimension transformation, to obtain a first latent space vector corresponding to the sample two-dimensional image, a second latent space vector corresponding to the sample rendered image, and a third latent space vector corresponding to the sample point cloud data, each output by the coding sub-network;
inputting the first latent space vector, the second latent space vector, and the third latent space vector into the decision sub-network, to obtain a decision result output by the decision sub-network; the decision result comprises whether the first latent space vector comes from the sample rendered image and whether the second latent space vector comes from the sample point cloud data;
inputting the second latent space vector into the decoding sub-network, to obtain the training result output by the decoding sub-network.
4. The method of claim 3, wherein the coding sub-network comprises a first encoder, a second encoder, and a third encoder; and said inputting the sample two-dimensional image, the sample rendered image, and the sample point cloud data corresponding to the sample rendered image into the coding sub-network for dimension transformation comprises:
inputting the sample two-dimensional image into the first encoder, to obtain the first latent space vector output by the first encoder;
inputting the sample rendered image into the second encoder, to obtain the second latent space vector output by the second encoder;
inputting the sample point cloud data into the third encoder, to obtain the third latent space vector output by the third encoder.
5. The method of claim 3, wherein the decision sub-network comprises a first decision device and a second decision device; and said inputting the first latent space vector, the second latent space vector, and the third latent space vector into the decision sub-network to obtain the decision result output by the decision sub-network comprises:
inputting the first latent space vector and the second latent space vector into the first decision device, to obtain a first decision result output by the first decision device; the first decision result comprises whether the first latent space vector comes from the sample rendered image;
inputting the second latent space vector and the third latent space vector into the second decision device, to obtain a second decision result output by the second decision device; the second decision result comprises whether the second latent space vector comes from the sample point cloud data.
6. The method of claim 5, wherein said training the neural network based on the training result to obtain the target neural network comprises:
training the neural network according to the first decision result, the second decision result, and the training result to obtain the target neural network.
7. The method of claim 6, wherein said training the neural network according to the first decision result, the second decision result, and the training result to obtain the target neural network comprises:
calculating a chamfer distance between the training result and the sample point cloud data;
adjusting adjustable parameters of the neural network according to the chamfer distance, the first decision result, and the second decision result until the chamfer distance, the first decision result, and the second decision result meet a preset convergence condition, thereby obtaining the target neural network.
8. A three-dimensional reconstruction apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring a target two-dimensional image; the target two-dimensional image comprises a target object to be subjected to three-dimensional reconstruction;
the reconstruction module is used for inputting the target two-dimensional image into a pre-trained target neural network to obtain a reconstruction result output by the target neural network; the reconstruction result comprises three-dimensional point cloud data of the target object;
wherein the target neural network is obtained by training a neural network based on a sample two-dimensional image, a sample rendered image, and sample point cloud data.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202011407052.2A 2020-12-04 2020-12-04 Three-dimensional reconstruction method and device, computer equipment and storage medium Pending CN112581597A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011407052.2A CN112581597A (en) 2020-12-04 2020-12-04 Three-dimensional reconstruction method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112581597A 2021-03-30

Family

ID=75127397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011407052.2A Pending CN112581597A (en) 2020-12-04 2020-12-04 Three-dimensional reconstruction method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112581597A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670411A (en) * 2018-11-30 2019-04-23 武汉理工大学 Based on the inland navigation craft point cloud data depth image processing method and system for generating confrontation network
US20190130562A1 (en) * 2017-11-02 2019-05-02 Siemens Healthcare Gmbh 3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes
CN110689008A (en) * 2019-09-17 2020-01-14 大连理工大学 Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction
CN111161364A (en) * 2019-12-24 2020-05-15 东南大学 Real-time shape completion and attitude estimation method for single-view depth map
CN111598998A (en) * 2020-05-13 2020-08-28 腾讯科技(深圳)有限公司 Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium
CN111899328A (en) * 2020-07-10 2020-11-06 西北工业大学 Point cloud three-dimensional reconstruction method based on RGB data and generation countermeasure network



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination