CN115797565A - Three-dimensional reconstruction model training method, three-dimensional reconstruction device and electronic equipment

Info

Publication number
CN115797565A
Authority
CN
China
Prior art keywords
density distribution
distribution information
voxel density
ray
dimensional reconstruction
Prior art date
Legal status
Granted
Application number
CN202211649135.1A
Other languages
Chinese (zh)
Other versions
CN115797565B (en)
Inventor
孟庆月
刘星
吴进波
沈铮阳
赵晨
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211649135.1A
Publication of CN115797565A
Application granted
Publication of CN115797565B
Legal status: Active

Landscapes

  • Image Generation (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a three-dimensional reconstruction model training method, a three-dimensional reconstruction device and electronic equipment, relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like, and can be applied to scenes such as three-dimensional reconstruction and the metaverse. The specific implementation scheme is as follows: obtaining street view image sample data; generating a view ray according to the shooting pose of the street view image sample data; calculating first voxel density distribution information of the view ray based on building data associated with the street view image sample data; inputting the view ray into a model to be trained for prediction to obtain second voxel density distribution information of the view ray; and updating parameters of the model to be trained based on target information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction, wherein the target information comprises the first voxel density distribution information and the second voxel density distribution information. The method and the device can improve the accuracy of the three-dimensional reconstruction model.

Description

Three-dimensional reconstruction model training method, three-dimensional reconstruction device and electronic equipment
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like, which can be applied to scenes such as three-dimensional reconstruction and the metaverse, and particularly relates to a three-dimensional reconstruction model training method, a three-dimensional reconstruction device and electronic equipment.
Background
With the development of neural network technology, three-dimensional reconstruction of some scenes is now performed based on neural networks; however, the three-dimensional reconstruction models used for such reconstruction are currently trained mainly in a traditional model training manner.
Disclosure of Invention
The disclosure provides a three-dimensional reconstruction model training method, a three-dimensional reconstruction device and electronic equipment.
According to an aspect of the present disclosure, there is provided a three-dimensional reconstruction model training method, including:
obtaining street view image sample data;
generating a view ray according to the shooting pose of the street view image sample data;
calculating first voxel density distribution information of the view ray based on building data associated with the street view image sample data, wherein the building data is used for representing a building associated with the street view image sample data;
inputting the view ray into a model to be trained for prediction to obtain second voxel density distribution information of the view ray;
updating parameters of the model to be trained based on target information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction, wherein the target information comprises: the first voxel density distribution information and the second voxel density distribution information.
According to an aspect of the present disclosure, there is provided a three-dimensional reconstruction method including:
obtaining street view image data;
generating a view ray according to the shooting pose of the street view image data;
inputting the view ray into a three-dimensional reconstruction model for prediction to obtain voxel density distribution information and color information of the view ray, wherein the three-dimensional reconstruction model is a three-dimensional reconstruction model for three-dimensional reconstruction obtained by updating parameters of a model to be trained based on target information, and the target information comprises: first voxel density distribution information and second voxel density distribution information, wherein the first voxel density distribution information is first voxel density distribution information of a view ray sample calculated based on building data associated with street view image sample data, the view ray sample is a view ray corresponding to the street view image sample data, and the second voxel density distribution information is voxel density distribution information obtained by the model to be trained predicting the view ray sample;
and performing three-dimensional reconstruction based on the voxel density distribution information and the color information of the view ray.
According to another aspect of the present disclosure, there is provided a three-dimensional reconstruction model training apparatus including:
the acquisition module is used for acquiring street view image sample data;
the generating module is used for generating a view ray according to the shooting pose of the street view image sample data;
the calculation module is used for calculating first voxel density distribution information of the view ray based on building data related to the street view image sample data, wherein the building data is used for representing a building related to the street view image sample data;
the first prediction module is used for inputting the view ray into a model to be trained for prediction to obtain second voxel density distribution information of the view ray;
an updating module, configured to update parameters of the model to be trained based on target information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction, where the target information includes: the first voxel density distribution information and the second voxel density distribution information.
According to another aspect of the present disclosure, there is provided a three-dimensional reconstruction apparatus including:
the acquisition module is used for acquiring street view image data;
the generating module is used for generating a view ray according to the shooting pose of the street view image data;
the prediction module is configured to input the view ray into a three-dimensional reconstruction model for prediction to obtain voxel density distribution information and color information of the view ray, where the three-dimensional reconstruction model is a three-dimensional reconstruction model for three-dimensional reconstruction obtained by updating parameters of a model to be trained based on target information, and the target information includes: first voxel density distribution information and second voxel density distribution information, wherein the first voxel density distribution information is first voxel density distribution information of a view ray sample calculated based on building data associated with street view image sample data, the view ray sample is a view ray corresponding to the street view image sample data, and the second voxel density distribution information is voxel density distribution information obtained by the model to be trained predicting the view ray sample;
and the reconstruction module is used for performing three-dimensional reconstruction based on the voxel density distribution information and the color information of the view ray.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a three-dimensional reconstruction model training method or a three-dimensional reconstruction method provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute a three-dimensional reconstruction model training method or a three-dimensional reconstruction method provided by the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the three-dimensional reconstruction model training method or the three-dimensional reconstruction method provided by the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a three-dimensional reconstruction model training method provided by the present disclosure;
FIG. 2 is a flow chart of a three-dimensional reconstruction method provided by the present disclosure;
FIGS. 3a to 3c are block diagrams of a three-dimensional reconstruction model training apparatus provided by the present disclosure;
FIG. 4 is a block diagram of a three-dimensional reconstruction apparatus provided by the present disclosure;
FIG. 5 is a block diagram of an electronic device used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to fig. 1, fig. 1 is a flowchart of a three-dimensional reconstruction model training method provided by the present disclosure, as shown in fig. 1, including the following steps:
s101, obtaining street view image sample data.
The street view image sample data may be image data obtained by shooting a street view with a shooting device.
Step S102, generating a view ray according to the shooting pose of the street view image sample data.
The generating of the view ray according to the shooting pose of the street view image sample data may be generating the view ray according to the pose of the shooting device when the street view image sample data was shot.
The view ray may be a ray extending from the viewpoint at which the street view image sample data was shot by the shooting device.
In some embodiments, the view ray may be five-dimensional data, for example (x, y, z, θ, φ), where (x, y, z) are the three-dimensional coordinates of the ray's starting point and (θ, φ) are the two angles that define the ray's direction; that is, (x, y, z, θ, φ) represents the ray that starts from the point (x, y, z) in space and points in the direction (θ, φ).
The view ray may be all or some of the view rays of the street view image sample data, or any one view ray of the street view image sample data.
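As an illustration of the five-dimensional representation described above, the following is a minimal Python sketch of generating one view ray from a shooting pose. The pinhole-camera model, function name, and parameters are illustrative assumptions and are not taken from the disclosure.

```python
import numpy as np

def generate_view_ray(cam_pos, cam_rot, pixel, intrinsics):
    """Generate one (x, y, z, theta, phi) view ray from a shooting pose.

    cam_pos:    (3,) camera position in world coordinates
    cam_rot:    (3, 3) camera-to-world rotation matrix
    pixel:      (u, v) pixel coordinates in the street view image
    intrinsics: (fx, fy, cx, cy) pinhole camera intrinsics
    """
    fx, fy, cx, cy = intrinsics
    u, v = pixel
    # Ray direction through this pixel, in camera coordinates.
    d_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    # Rotate into world coordinates and normalize.
    d = cam_rot @ d_cam
    d = d / np.linalg.norm(d)
    # Convert the direction vector to the two angles (theta, phi).
    theta = np.arccos(np.clip(d[2], -1.0, 1.0))  # polar angle
    phi = np.arctan2(d[1], d[0])                 # azimuth
    x, y, z = cam_pos
    return np.array([x, y, z, theta, phi])
```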
Step S103, calculating first voxel density distribution information of the view ray based on building data associated with the street view image sample data, wherein the building data is used for representing buildings associated with the street view image sample data.
The building associated with the street view image sample data may be a building included in the street view image sample data, and the building data may represent data such as a shape and coordinates of the building associated with the street view image sample data.
In some embodiments, the building data may be building block data, i.e. data representing a building.
In some embodiments, the building data includes data on roads, plants, and the like between buildings, in addition to the relevant data of the buildings.
The building data may be obtained by adding building-related data to the street view image sample data.
The first voxel density distribution information may be information representing the probability distribution of an object being present along the view ray within the building data, for example, the three-dimensional coordinate in the building data at which an object is most likely to be present on the view ray, or the probabilities that objects are present at a plurality of three-dimensional coordinates of the view ray in the building data, where the objects are items such as buildings, traffic lights, and plants in the building data. Alternatively, the first voxel density distribution information may be the probability that the view ray is terminated in the building data, the three-dimensional coordinate with the highest termination probability on the view ray in the building data, or the termination probabilities of a plurality of three-dimensional coordinates of the view ray in the building data.
Step S104, inputting the view ray into a model to be trained for prediction to obtain second voxel density distribution information of the view ray.
In some embodiments, the model to be trained may be a Multilayer Perceptron (MLP) model.
In some embodiments, the model to be trained may be a model including two MLPs.
In some embodiments, the model to be trained may be a Neural Radiance Fields (NeRF) model.
The second voxel density distribution information is the voxel density distribution information of the view ray predicted by the model to be trained, such as a predicted probability distribution of an object being present along the view ray, or a predicted probability that the view ray is terminated.
Step S105, updating the parameters of the model to be trained based on target information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction, wherein the target information comprises: the first voxel density distribution information and the second voxel density distribution information.
The updating of the parameters of the model to be trained based on the target information to obtain the three-dimensional reconstruction model for three-dimensional reconstruction may be updating the parameters of the model to be trained based on a loss between the first voxel density distribution information and the second voxel density distribution information. For example: a loss value is calculated using the first voxel density distribution information as the label and the second voxel density distribution information as the predicted value, and the parameters of the model to be trained are updated according to the loss value.
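As a hedged sketch of this update step, the following assumes the first voxel density distribution information serves as the label and uses the KL-divergence loss that the disclosure names later as one option; the function name and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, rays, sigma_gt):
    """One parameter update of the model to be trained (step S105 sketch).

    rays:     (N, 5) view rays generated in step S102
    sigma_gt: (N, K) first voxel density distribution information computed
              from the building data in step S103, used here as the label
    """
    sigma_pred = model(rays)  # second voxel density distribution information
    # KL(sigma_gt || sigma_pred); F.kl_div expects log-probabilities as input.
    loss = F.kl_div((sigma_pred + 1e-8).log(), sigma_gt, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```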
In the method, the parameters of the model to be trained can be updated based on the voxel density distribution information calculated by the building data and the voxel density distribution information predicted by the model to be trained, so that the accuracy of the three-dimensional reconstruction model can be improved.
In the present disclosure, the above method may be applied to an electronic device, that is, all the steps included in the method are performed by the electronic device, and the electronic device may be an electronic device such as a computer, a server, a mobile phone, and the like.
In one embodiment, the building data is in the form of a mesh (Mesh) and includes shape data and coordinate data of a building associated with the street view image sample data;
step S103 in the embodiment shown in fig. 1 includes:
and calculating first voxel density distribution information of the view ray based on the shape data and the coordinate data of the building associated with the street view image sample data.
The shape data of the building may be shape pattern data of the building, for example: the buildings in the street view image sample data are replaced by the shape pattern data, or the shape pattern data is added to the periphery of the buildings in the street view image sample data.
The coordinate data may be coordinate data of a building associated with the street view image sample data in a coordinate system corresponding to the street view image sample data.
In the embodiment, the building data is data in a Mesh form, so that the calculated first voxel density distribution information is more reliable, and the training efficiency of the three-dimensional reconstruction model is improved.
It should be noted that, in the present disclosure, the building data is not limited to be data in a Mesh format, and may be data in other formats, such as depth map data.
In an embodiment, the calculating first voxel density distribution information of the view ray based on the shape data and the coordinate data of the building associated with the street view image sample data includes:
calculating voxel densities of a plurality of three-dimensional coordinate points on the view ray based on shape data and coordinate data of a building associated with the street view image sample data, wherein the first voxel density distribution information comprises: the voxel density of the three-dimensional coordinate point with the maximum voxel density among the plurality of three-dimensional coordinate points, where the voxel density represents the probability that an object is present at the three-dimensional coordinate point.
In this embodiment, based on the shape data and the coordinate data of the building associated with the street view image sample data, the voxel densities of the three-dimensional coordinates associated with the building can be obtained; for example, the voxel density of a three-dimensional coordinate on the surface of the building indicates that an object is present. When the view ray is known, the voxel densities of the three-dimensional coordinate points on the view ray can be calculated based on the shape data and the coordinate data, so as to obtain the first voxel density distribution information.
The three-dimensional coordinate point having the highest voxel density may represent a three-dimensional coordinate point of an object, such as a building, existing on the view ray.
In this embodiment, the first voxel density distribution information can be accurately calculated from the above-described shape data and coordinate data.
It should be noted that the present disclosure does not limit the first voxel density distribution information to being calculated in the above manner. For example: in some embodiments, the first voxel density distribution information may be calculated directly based on depth values of a depth map of the building data; for instance, a depth value of 6 (in an arbitrary unit) on the view ray indicates that the voxel density is highest at the three-dimensional coordinate point at a distance of 6 from the ray's starting point. Alternatively, based on the shape data and the coordinate data of the building associated with the street view image sample data, the three-dimensional coordinate point with the maximum voxel density on the view ray may be predicted directly by a neural network model.
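For illustration only, here is a minimal sketch of the depth-based variant just mentioned: the depth at which a view ray meets the building data is converted into a first voxel density distribution over sample points along the ray. The softmax form and the sharpness parameter are assumptions, not the patent's method.

```python
import numpy as np

def first_voxel_density(hit_depth, t_samples, sharpness=50.0):
    """First voxel density distribution along one view ray (a sketch).

    hit_depth: depth at which the ray meets the building data, e.g. read
               from a depth map of the building data or a ray-mesh test
    t_samples: (K,) depths of K three-dimensional sample points on the ray
    Returns a (K,) probability distribution that peaks at the sample point
    closest to hit_depth, i.e. where an object is most likely present.
    """
    scores = -sharpness * np.abs(t_samples - hit_depth)
    scores -= scores.max()                 # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()             # softmax-normalized distribution
```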
In an embodiment, the updating the parameters of the model to be trained based on the target information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction includes:
calculating a cross entropy of the first voxel density distribution information and the second voxel density distribution information, and calculating difference information of the cross entropy and an entropy of the first voxel density distribution information;
and updating the parameters of the model to be trained based on the difference information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction.
It should be noted that the cross entropy and the calculation manner of the entropy are not limited in this disclosure.
The above-mentioned updating the parameters of the model to be trained based on the difference information to obtain the three-dimensional reconstruction model for three-dimensional reconstruction may be that the parameters of the model to be trained are updated based on the difference information until the model to be trained converges to obtain the three-dimensional reconstruction model for three-dimensional reconstruction.
In this embodiment, the parameters of the model to be trained are updated based on the difference information, so that the accuracy of the three-dimensional reconstruction model can be improved, because the difference information can better reflect the difference between the first voxel density distribution information and the second voxel density distribution information.
It should be noted that, in the present disclosure, the parameters of the model to be trained are not limited to be updated in the above manner. For example:
in some embodiments, the following divergence loss function may be employed directly to update the parameters of the model to be trained: L_σ = KL(σ_gt || σ), where σ is the second voxel density distribution information and σ_gt is the first voxel density distribution information.
In some embodiments, a mean square loss function L_σ = MSE(σ, σ_gt) may also be employed to update the parameters of the model to be trained, where σ is the second voxel density distribution information and σ_gt is the first voxel density distribution information.
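Note that the cross entropy of the two distributions minus the entropy of the first is exactly the KL divergence KL(σ_gt || σ), so the cross-entropy formulation above and the divergence loss coincide. A hedged sketch of both loss options follows; the tensor shapes are assumptions.

```python
import torch

def density_loss(sigma_gt, sigma, mode="kl"):
    """First-loss options for the density term (a sketch, shapes assumed).

    sigma_gt: (N, K) first voxel density distribution information (label)
    sigma:    (N, K) second voxel density distribution information (prediction)
    Both are probability distributions over the K sample points of each ray.
    """
    eps = 1e-8
    if mode == "kl":
        # Cross entropy H(sigma_gt, sigma) minus entropy H(sigma_gt)
        # equals the KL divergence KL(sigma_gt || sigma).
        cross_entropy = -(sigma_gt * torch.log(sigma + eps)).sum(dim=-1)
        entropy = -(sigma_gt * torch.log(sigma_gt + eps)).sum(dim=-1)
        return (cross_entropy - entropy).mean()
    # Mean square alternative: L_sigma = MSE(sigma, sigma_gt).
    return ((sigma - sigma_gt) ** 2).mean()
```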
In one embodiment, the target information further comprises: real color information of the view ray in the street view image sample data and predicted color information of the view ray predicted by the model to be trained;
the updating the parameters of the model to be trained based on the target information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction includes:
updating the parameters of the model to be trained based on a target loss function to obtain a three-dimensional reconstruction model for three-dimensional reconstruction;
wherein the target loss function comprises a first loss function and a second loss function, and the input of the first loss function comprises: the first voxel density distribution information and the second voxel density distribution information, the input of the second loss function comprising: the true color information and the predicted color information.
The real color information of the view ray in the street view image sample data may be real color information of the view ray directly identified from the street view image sample data.
In some embodiments, the true color information of the view ray may be true color information of a three-dimensional coordinate point of the view ray at which the voxel density is the greatest.
In the present disclosure, the color information may be red, green, and blue (RGB) information.
In some embodiments, the first loss function may be the divergence loss function L_σ = KL(σ_gt || σ) or the mean square loss function L_σ = MSE(σ, σ_gt);
In some embodiments, the second loss function may be the divergence loss function L_c = KL(c_g || c) or the mean square loss function L_c = MSE(c, c_g), where c is the above true color information and c_g is the predicted color information.
In some embodiments, the target loss function may be L = L_σ + L_c.
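A minimal sketch of this target loss, shown with the MSE variants of both terms for concreteness; the choice of variant and the tensor shapes are assumptions.

```python
import torch

def target_loss(sigma_gt, sigma, c_true, c_pred):
    """Target loss L = L_sigma + L_c, using the MSE variants as an example.

    c_true: (N, 3) real RGB color of the view rays in the sample data
    c_pred: (N, 3) RGB color predicted by the model to be trained
    """
    l_sigma = ((sigma - sigma_gt) ** 2).mean()  # first loss (density term)
    l_c = ((c_pred - c_true) ** 2).mean()       # second loss (color term)
    return l_sigma + l_c
```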
In the embodiment, the parameters of the model to be trained can be updated based on the two dimensions of the voxel density distribution information and the color information, so that the accuracy of the three-dimensional reconstruction model can be further improved, and the training efficiency of the three-dimensional reconstruction model can be further improved.
It is noted that the present disclosure, in some embodiments, may train the three-dimensional reconstruction model based only on voxel density distribution information.
In one embodiment, the model to be trained includes a first network and a second network, and the inputting the view ray into the model to be trained for prediction to obtain the second voxel density distribution information of the view ray includes:
inputting the view ray into the first network for prediction to obtain second voxel density distribution information and intermediate characteristic information of the view ray, wherein the intermediate characteristic information is characteristic information which is output by the first network and is associated with the view ray;
the method further comprises the following steps:
and inputting the intermediate characteristic information and the view ray into the second network for prediction to obtain the predicted color information of the view ray.
The first network and the second network may be MLPs, that is, the three-dimensional reconstruction model includes two MLPs. In some embodiments, at least one of the first network and the second network may be another network, for example: single layer perceptrons or other classification models.
The intermediate characteristic information may be a result other than the second voxel density distribution information that is output by the first network after the first network receives the view ray and performs prediction on it.
The prediction process of the first network and the second network may be represented by the following formula:
F: (d) → (c, σ)
where d denotes the view ray, c denotes the predicted color information, and σ denotes the second voxel density distribution information.
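The following is a hedged Python sketch of such a two-network model; the layer sizes, activations, and per-ray sampling scheme are illustrative assumptions and are not specified by the disclosure.

```python
import torch
import torch.nn as nn

class TwoNetworkModel(nn.Module):
    """Sketch of the model to be trained: F: (d) -> (c, sigma)."""

    def __init__(self, ray_dim=5, feat_dim=128, n_samples=64):
        super().__init__()
        # First network: view ray -> second voxel density distribution + features.
        self.first_net = nn.Sequential(
            nn.Linear(ray_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(256, n_samples)  # density over ray samples
        self.feat_head = nn.Linear(256, feat_dim)    # intermediate characteristic info
        # Second network: intermediate features + view ray -> predicted color.
        self.second_net = nn.Sequential(
            nn.Linear(feat_dim + ray_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),         # RGB in [0, 1]
        )

    def forward(self, d):
        h = self.first_net(d)
        sigma = torch.softmax(self.sigma_head(h), dim=-1)
        feat = self.feat_head(h)
        c = self.second_net(torch.cat([feat, d], dim=-1))
        return c, sigma
```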
In this embodiment, the parameters of the first network may be updated based on the first voxel density distribution information and the second voxel density distribution information, or the parameters of the first network and the second network may be updated based on the first voxel density distribution information and the second voxel density distribution information; and updating the first network and the second network based on the true color information and the predicted color information, or updating the second network based on the true color information and the predicted color information.
In this embodiment, the voxel density distribution information can be predicted by the first network and the color information by the second network, without requiring the second network to also predict voxel density distribution information, so the model training efficiency may be improved.
It should be noted that, the present disclosure does not limit the model to be trained to include a first network and a second network, for example: in some embodiments, the model to be trained may include only one network, and the output of the network may include voxel density distribution information and color information.
According to the method and the device, the parameters of the model to be trained can be updated based on the voxel density distribution information calculated from the building data and the voxel density distribution information predicted by the model to be trained, so that the accuracy of the three-dimensional reconstruction model can be improved.
Referring to fig. 2, fig. 2 is a flowchart of a three-dimensional reconstruction method provided by the present disclosure, as shown in fig. 2, including the following steps:
step S201, street view image data are obtained.
Step S202, generating a view ray according to the shooting pose of the street view image data.
The street view image data and the view ray may refer to the corresponding description of the above embodiments, and are not described herein again.
Step S203, inputting the view ray into a three-dimensional reconstruction model for prediction to obtain voxel density distribution information and color information of the view ray, wherein the three-dimensional reconstruction model is a three-dimensional reconstruction model for three-dimensional reconstruction obtained by updating parameters of a model to be trained based on target information, and the target information comprises: first voxel density distribution information and second voxel density distribution information, wherein the first voxel density distribution information is first voxel density distribution information of a view ray sample calculated based on building data associated with street view image sample data, the view ray sample is a view ray corresponding to the street view image sample data, and the second voxel density distribution information is voxel density distribution information obtained by the model to be trained predicting the view ray sample.
the three-dimensional reconstruction model may refer to the corresponding description of the above embodiments, which is not described herein again. It should be noted that, in this embodiment, the three-dimensional reconstruction model may be a three-dimensional reconstruction model trained by any one of the three-dimensional reconstruction model training methods provided in this disclosure.
Step S204, performing three-dimensional reconstruction based on the voxel density distribution information and the color information of the view ray.
The three-dimensional reconstruction based on the voxel density distribution information and the color information of the view ray may be performed by constructing three-dimensional image data corresponding to the street view image data based on the voxel density distribution information and the color information of the view ray. For example: constructing three-dimensional image data corresponding to the street view image data based on the voxel density distribution information and the color information of a plurality of view rays corresponding to the street view image data.
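As one concrete possibility (a sketch, not necessarily the disclosure's exact procedure), the predicted per-sample densities and colors of each ray can be composed into pixel colors and surface depths with standard NeRF-style volume rendering, and the three-dimensional image data assembled from the results over all rays:

```python
import torch

def render_ray(sigma, color, t_samples):
    """Compose density and color along one view ray (step S204 sketch).

    sigma:     (K,) per-sample voxel densities predicted for the ray
    color:     (K, 3) per-sample RGB predicted for the ray
    t_samples: (K,) depths of the K sample points on the ray
    Returns the rendered pixel color and the expected surface depth,
    following the standard NeRF-style volume rendering equations.
    """
    deltas = torch.diff(t_samples, append=t_samples[-1:] + 1e10)
    alpha = 1.0 - torch.exp(-sigma * deltas)           # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                            # ray termination weights
    pixel_rgb = (weights[:, None] * color).sum(dim=0)  # rendered pixel color
    depth = (weights * t_samples).sum(dim=0)           # expected hit depth
    return pixel_rgb, depth
```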
In this embodiment, the three-dimensional reconstruction model is obtained by updating the parameters of the model to be trained based on the target information, so that the voxel density distribution information and the color information predicted by the three-dimensional reconstruction model are more accurate, and the accuracy of the three-dimensional reconstruction is further improved.
Referring to fig. 3a, fig. 3a is a three-dimensional reconstruction model training apparatus provided by the present disclosure, and as shown in fig. 3a, the three-dimensional reconstruction model training apparatus 300 includes:
an obtaining module 301, configured to obtain street view image sample data;
a generating module 302, configured to generate a view ray according to a shooting pose of the street view image sample data;
a calculating module 303, configured to calculate first voxel density distribution information of the view ray based on building data associated with the street view image sample data, where the building data is used to represent a building associated with the street view image sample data;
the first prediction module 304 is configured to input the view ray to a model to be trained for prediction, so as to obtain second voxel density distribution information of the view ray;
an updating module 305, configured to update parameters of the model to be trained based on target information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction, where the target information includes: the first voxel density distribution information and the second voxel density distribution information.
In one embodiment, the building data is in the form of a Mesh, and the building data includes shape data and coordinate data of a building associated with the street view image sample data;
the calculation module 303 is configured to:
and calculating first voxel density distribution information of the view ray based on the shape data and the coordinate data of the building associated with the street view image sample data.
In one embodiment, the calculation module 303 is configured to:
calculating voxel densities of a plurality of three-dimensional coordinate points on the view ray based on shape data and coordinate data of a building associated with the street view image sample data, wherein the first voxel density distribution information comprises: the voxel density of the three-dimensional coordinate point with the maximum voxel density among the plurality of three-dimensional coordinate points, where the voxel density represents the probability that an object is present at the three-dimensional coordinate point.
In one embodiment, as shown in fig. 3b, the update module 305 comprises:
a calculation unit 3051, configured to calculate a cross entropy of the first voxel density distribution information and the second voxel density distribution information, and calculate difference information between the cross entropy and an entropy of the first voxel density distribution information;
an updating unit 3052, configured to update parameters of the model to be trained based on the difference information, so as to obtain a three-dimensional reconstruction model for three-dimensional reconstruction.
In one embodiment, the target information further comprises: real color information of the view ray in the street view image sample data and predicted color information of the view ray predicted by the model to be trained;
the update module 305 is configured to:
updating the parameters of the model to be trained based on a target loss function to obtain a three-dimensional reconstruction model for three-dimensional reconstruction;
wherein the target loss function comprises a first loss function and a second loss function, and the input of the first loss function comprises: the first voxel density distribution information and the second voxel density distribution information, the input of the second loss function comprising: the true color information and the predicted color information.
In one embodiment, the model to be trained includes a first network and a second network, and the first prediction module 304 is configured to:
inputting the view ray into the first network for prediction to obtain second voxel density distribution information and intermediate characteristic information of the view ray, wherein the intermediate characteristic information is characteristic information which is output by the first network and is associated with the view ray;
as shown in fig. 3c, the apparatus further comprises:
a second prediction module 306, configured to input the intermediate feature information and the view ray into the second network for prediction, so as to obtain the predicted color information of the view ray.
The three-dimensional reconstruction model training device provided by the disclosure can realize each process realized by the three-dimensional reconstruction model training method provided by the disclosure, and achieve the same technical effect, and is not repeated here for avoiding repetition.
Referring to fig. 4, fig. 4 is a three-dimensional reconstruction apparatus provided by the present disclosure, and as shown in fig. 4, a three-dimensional reconstruction apparatus 400 includes:
an obtaining module 401, configured to obtain street view image data;
a generating module 402, configured to generate a view ray according to a shooting pose of the street view image data;
the predicting module 403 is configured to input the view ray into a three-dimensional reconstruction model for prediction, so as to obtain voxel density distribution information and color information of the view ray, where the three-dimensional reconstruction model is a three-dimensional reconstruction model for three-dimensional reconstruction obtained by updating parameters of a model to be trained based on target information, and the target information includes: first voxel density distribution information and second voxel density distribution information, wherein the first voxel density distribution information is first voxel density distribution information of a view ray sample calculated based on building data associated with street view image sample data, the view ray sample is a view ray corresponding to the street view image sample data, and the second voxel density distribution information is voxel density distribution information obtained by the model to be trained predicting the view ray sample;
and a reconstruction module 404 configured to perform three-dimensional reconstruction based on the voxel density distribution information and the color information of the view ray.
The three-dimensional reconstruction device provided by the disclosure can realize each process realized by the three-dimensional reconstruction method provided by the disclosure, and achieve the same technical effect, and for avoiding repetition, the details are not repeated here.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information of the related user all accord with the regulations of related laws and regulations, and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
The above electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the three-dimensional reconstruction model training method or the three-dimensional reconstruction method provided by the present disclosure.
The readable storage medium stores computer instructions for causing the computer to execute the three-dimensional reconstruction model training method or the three-dimensional reconstruction method provided by the present disclosure.
The computer program product includes a computer program, and the computer program realizes the three-dimensional reconstruction model training method or the three-dimensional reconstruction method provided by the present disclosure when being executed by a processor.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501 which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the respective methods and processes described above, such as the three-dimensional reconstruction model training method or the three-dimensional reconstruction method. For example, in some embodiments, the three-dimensional reconstruction model training method or the three-dimensional reconstruction method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the three-dimensional reconstruction model training method or the three-dimensional reconstruction method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the three-dimensional reconstruction model training method or the three-dimensional reconstruction method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A three-dimensional reconstruction model training method comprises the following steps:
obtaining street view image sample data;
generating a view ray according to the shooting pose of the street view image sample data;
calculating first voxel density distribution information of the view ray based on building data associated with the street view image sample data, wherein the building data is used for representing a building associated with the street view image sample data;
inputting the view ray into a model to be trained for prediction to obtain second voxel density distribution information of the view ray;
updating parameters of the model to be trained based on target information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction, wherein the target information comprises: the first voxel density distribution information and the second voxel density distribution information.
2. The method of claim 1, wherein the building data is in the form of a Mesh and includes shape data and coordinate data of a building associated with the street view image sample data;
the calculating the first voxel density distribution information of the view ray based on the building data associated with the street view image sample data comprises:
and calculating first voxel density distribution information of the view ray based on the shape data and the coordinate data of the building associated with the street view image sample data.
3. The method of claim 2, wherein the calculating first voxel density distribution information of the view ray based on shape data and coordinate data of a building associated with the street view image sample data comprises:
calculating voxel densities of a plurality of three-dimensional coordinate points on the view ray based on shape data and coordinate data of a building associated with the street view image sample data, wherein the first voxel density distribution information comprises: the voxel density of the three-dimensional coordinate point with the maximum voxel density among the plurality of three-dimensional coordinate points, where the voxel density represents the probability that an object is present at the three-dimensional coordinate point.
4. The method according to any one of claims 1 to 3, wherein the updating parameters of the model to be trained based on the target information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction includes:
calculating a cross entropy of the first voxel density distribution information and the second voxel density distribution information, and calculating difference information of the cross entropy and an entropy of the first voxel density distribution information;
and updating the parameters of the model to be trained on the basis of the difference information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction.
5. The method of any of claims 1-3, wherein the target information further comprises: real color information of the view ray in the street view image sample data and predicted color information of the view ray predicted by the model to be trained;
the updating of the parameters of the model to be trained based on the target information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction includes:
updating the parameters of the model to be trained based on a target loss function to obtain a three-dimensional reconstruction model for three-dimensional reconstruction;
wherein the target loss function comprises a first loss function and a second loss function, and the input of the first loss function comprises: the first voxel density distribution information and the second voxel density distribution information, the input of the second loss function comprising: the true color information and the predicted color information.
6. The method of claim 5, wherein the model to be trained includes a first network and a second network, and the inputting the view ray into the model to be trained for prediction to obtain the second voxel density distribution information of the view ray includes:
inputting the view ray into the first network for prediction to obtain second voxel density distribution information and intermediate characteristic information of the view ray, wherein the intermediate characteristic information is characteristic information which is output by the first network and is associated with the view ray;
the method further comprises the following steps:
and inputting the intermediate characteristic information and the view ray into the second network for prediction to obtain the predicted color information of the view ray.
7. A method of three-dimensional reconstruction, comprising:
obtaining street view image data;
generating a view ray according to the shooting pose of the street view image data;
inputting the view ray into a three-dimensional reconstruction model for prediction to obtain voxel density distribution information and color information of the view ray, wherein the three-dimensional reconstruction model is a three-dimensional reconstruction model for three-dimensional reconstruction obtained by updating parameters of a model to be trained based on target information, and the target information comprises: first voxel density distribution information and second voxel density distribution information, wherein the first voxel density distribution information is first voxel density distribution information of a view ray sample calculated based on building data associated with street view image sample data, the view ray sample is a view ray corresponding to the street view image sample data, and the second voxel density distribution information is voxel density distribution information obtained by the model to be trained predicting the view ray sample;
and performing three-dimensional reconstruction based on the voxel density distribution information and the color information of the view ray.
8. A three-dimensional reconstruction model training apparatus comprising:
the acquisition module is used for acquiring street view image sample data;
the generating module is used for generating a view ray according to the shooting pose of the street view image sample data;
the calculation module is used for calculating first voxel density distribution information of the view ray based on building data related to the street view image sample data, wherein the building data is used for representing a building related to the street view image sample data;
the first prediction module is used for inputting the view ray into a model to be trained for prediction to obtain second voxel density distribution information of the view ray;
an updating module, configured to update parameters of the model to be trained based on target information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction, where the target information includes: the first voxel density distribution information and the second voxel density distribution information.
9. The apparatus of claim 8, wherein the building data is in the form of a Mesh and includes shape data and coordinate data of a building associated with the street view image sample data;
the calculation module is configured to:
and calculating first voxel density distribution information of the view ray based on the shape data and the coordinate data of the building associated with the street view image sample data.
10. The apparatus of claim 9, wherein the calculation module is configured to:
calculate voxel densities of a plurality of three-dimensional coordinate points on the view ray based on the shape data and the coordinate data of the building associated with the street view image sample data, wherein the first voxel density distribution information comprises: the voxel density of the three-dimensional coordinate point with the maximum voxel density among the plurality of three-dimensional coordinate points, and the voxel density represents the probability that an object exists at the three-dimensional coordinate point.
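(Illustrative sketch, not part of the claims.) One way to realise claim 10 is to sample three-dimensional coordinate points along the view ray, run a point-in-mesh test against the building mesh for each, and keep the maximum. A sketch assuming a watertight mesh loaded with the trimesh library, whose Trimesh.contains method performs such a test; the sampling bounds and count are hypothetical:

    import numpy as np
    import trimesh

    def first_density_for_ray(mesh, origin, direction, n_samples=64,
                              near=0.1, far=100.0):
        # Sample 3D coordinate points along the view ray.
        t = np.linspace(near, far, n_samples)
        points = origin[None, :] + t[:, None] * direction[None, :]
        # Occupancy as the probability that an object (the building) exists
        # at each point: 1.0 inside the mesh, 0.0 outside.
        occupancy = mesh.contains(points).astype(np.float32)
        # Claim 10 keeps the voxel density of the point with the maximum density.
        return occupancy.max()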
11. The apparatus of any of claims 8 to 10, wherein the update module comprises:
a calculation unit configured to calculate a cross entropy of the first voxel density distribution information and the second voxel density distribution information, and to calculate difference information between the cross entropy and an entropy of the first voxel density distribution information;
and the updating unit is used for updating the parameters of the model to be trained based on the difference information to obtain a three-dimensional reconstruction model for three-dimensional reconstruction.
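(Clarifying note, not part of the claims.) The difference information of claim 11 has a standard information-theoretic reading: subtracting the entropy of the first distribution P from the cross entropy of P and the second distribution Q yields the Kullback-Leibler divergence,

    H(P, Q) - H(P) = -\sum_x P(x) \log Q(x) + \sum_x P(x) \log P(x)
                   = \sum_x P(x) \log \frac{P(x)}{Q(x)} = D_{KL}(P \| Q),

which is non-negative and vanishes exactly when the predicted distribution matches the mesh-derived one, making the difference a natural training signal.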
12. The apparatus of any of claims 8 to 10, wherein the target information further comprises: real color information of the view ray in the street view image sample data and predicted color information of the view ray predicted by the model to be trained;
the update module is to:
updating the parameters of the model to be trained based on a target loss function to obtain a three-dimensional reconstruction model for three-dimensional reconstruction;
wherein the target loss function comprises a first loss function and a second loss function, the input of the first loss function comprising: the first voxel density distribution information and the second voxel density distribution information, and the input of the second loss function comprising: the real color information and the predicted color information.
13. The apparatus of claim 12, wherein the model to be trained comprises a first network and a second network, the first prediction module to:
input the view ray into the first network for prediction to obtain the second voxel density distribution information and intermediate feature information of the view ray, wherein the intermediate feature information is feature information output by the first network and associated with the view ray;
the device further comprises:
and the second prediction module is used for inputting the intermediate feature information and the view ray into the second network for prediction to obtain the predicted color information of the view ray.
14. A three-dimensional reconstruction apparatus comprising:
the acquisition module is used for acquiring street view image data;
the generating module is used for generating a view ray according to the shooting pose of the street view image data;
the prediction module is configured to input the view ray into a three-dimensional reconstruction model for prediction to obtain voxel density distribution information and color information of the view ray, wherein the three-dimensional reconstruction model is obtained by updating parameters of a model to be trained based on target information, and the target information comprises: first voxel density distribution information and second voxel density distribution information, wherein the first voxel density distribution information is calculated for a view ray sample based on building data associated with street view image sample data, the view ray sample is a view ray corresponding to the street view image sample data, and the second voxel density distribution information is voxel density distribution information obtained by the model to be trained predicting the view ray sample;
and the reconstruction module is used for performing three-dimensional reconstruction based on the voxel density distribution information and the color information of the view ray.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6 or to enable the at least one processor to perform the method of claim 7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6 or causing the computer to perform the method of claim 7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6, or which, when executed by a processor, implements the method according to claim 7.
CN202211649135.1A 2022-12-20 2022-12-20 Three-dimensional reconstruction model training method, three-dimensional reconstruction device and electronic equipment Active CN115797565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211649135.1A CN115797565B (en) 2022-12-20 2022-12-20 Three-dimensional reconstruction model training method, three-dimensional reconstruction device and electronic equipment

Publications (2)

Publication Number Publication Date
CN115797565A (en) 2023-03-14
CN115797565B CN115797565B (en) 2023-10-27

Family

ID=85426214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211649135.1A Active CN115797565B (en) 2022-12-20 2022-12-20 Three-dimensional reconstruction model training method, three-dimensional reconstruction device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115797565B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022116423A1 (en) * 2020-12-01 2022-06-09 平安科技(深圳)有限公司 Object posture estimation method and apparatus, and electronic device and computer storage medium
WO2022121220A1 (en) * 2020-12-10 2022-06-16 浙江大学 Three-dimensional reconstruction and angle of view synthesis method for moving human body
US20220189100A1 (en) * 2020-12-16 2022-06-16 Nvidia Corporation Three-dimensional tomography reconstruction pipeline
CN115018979A (en) * 2022-05-26 2022-09-06 上海商汤临港智能科技有限公司 Image reconstruction method, apparatus, electronic device, storage medium, and program product
CN115082639A (en) * 2022-06-15 2022-09-20 北京百度网讯科技有限公司 Image generation method and device, electronic equipment and storage medium
CN115457206A (en) * 2022-09-16 2022-12-09 东风汽车有限公司东风日产乘用车公司 Three-dimensional model generation method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HOU Yun; YI Weidong: "Three-dimensional Scene Reconstruction Method for Buildings", Computer Engineering (计算机工程), no. 04 *
LIU Danbai: "Research on Three-dimensional Simulation Modeling of Visual Random Street View", Computer Simulation (计算机仿真), no. 04 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091709A (en) * 2023-04-10 2023-05-09 北京百度网讯科技有限公司 Three-dimensional reconstruction method and device for building, electronic equipment and storage medium
CN116091709B (en) * 2023-04-10 2023-08-01 北京百度网讯科技有限公司 Three-dimensional reconstruction method and device for building, electronic equipment and storage medium
CN116580212A (en) * 2023-05-16 2023-08-11 北京百度网讯科技有限公司 Image generation method, training method, device and equipment of image generation model
CN116580212B (en) * 2023-05-16 2024-02-06 北京百度网讯科技有限公司 Image generation method, training method, device and equipment of image generation model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant