CN115359170A - Scene data generation method and device, electronic equipment and storage medium - Google Patents

Scene data generation method and device, electronic equipment and storage medium

Info

Publication number
CN115359170A
Authority
CN
China
Prior art keywords
data
voxel
scene
target
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211276170.3A
Other languages
Chinese (zh)
Other versions
CN115359170B (en)
Inventor
郑学兴
赵晨
孙昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211276170.3A priority Critical patent/CN115359170B/en
Publication of CN115359170A publication Critical patent/CN115359170A/en
Application granted granted Critical
Publication of CN115359170B publication Critical patent/CN115359170B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/04 Texture mapping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Generation (AREA)

Abstract

The disclosure provides a scene data generation method and apparatus, an electronic device, and a storage medium, relates to the field of artificial intelligence, in particular to the technical fields of deep learning, computer vision, augmented reality, virtual reality, and the like, and can be applied to scenarios such as the metaverse. The specific implementation scheme of the scene data generation method is as follows: sampling voxel grid data output by a neural radiance field at least twice at different granularities to obtain at least two sampling results, wherein the neural radiance field is constructed from a plurality of images of a target scene at a plurality of viewing angles; extracting texture features from each of the at least two sampling results to obtain at least two texture features respectively corresponding to the at least two sampling results; and generating a set of scene data expressing the target scene from each sampling result and its corresponding texture feature, thereby obtaining at least two sets of scene data expressing the target scene at different granularities.

Description

Scene data generation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to the technical fields of deep learning, computer vision, augmented reality, virtual reality, and the like, and can be applied to scenarios such as the metaverse.
Background
With the development of computer technology and network technology, image rendering technology and neural rendering technology, which improves image rendering by integrating it with neural networks, are developing rapidly. Image rendering aims to generate two-dimensional images from three-dimensional models so as to give the user a visual perception closer to the real world. Before image rendering, scene data representing a three-dimensional model of the scene must be generated; a two-dimensional image can then be obtained by loading the scene data and performing image rendering.
Disclosure of Invention
The present disclosure is directed to a method and an apparatus for generating scene data, an electronic device, and a storage medium, so that loading of scene data may be compatible with devices with different processing capabilities.
According to an aspect of the present disclosure, there is provided a method for generating scene data, including: sampling voxel grid data output by a neural radiance field at least twice at different granularities to obtain at least two sampling results, wherein the neural radiance field is constructed from a plurality of images of a target scene at a plurality of viewing angles; extracting texture features from each of the at least two sampling results to obtain at least two texture features respectively corresponding to the at least two sampling results; and generating a set of scene data expressing the target scene from each sampling result and the texture feature corresponding to that sampling result, thereby obtaining at least two sets of scene data expressing the target scene at different granularities.
According to another aspect of the present disclosure, there is provided a scene data generation apparatus including: a sampling module, configured to sample voxel grid data output by a neural radiance field at least twice at different granularities to obtain at least two sampling results, wherein the neural radiance field is constructed from a plurality of images of a target scene at a plurality of viewing angles; a texture feature extraction module, configured to extract texture features from each of the at least two sampling results to obtain at least two texture features respectively corresponding to the at least two sampling results; and a scene data generation module, configured to generate a set of scene data expressing the target scene from each sampling result and the texture feature corresponding to that sampling result, thereby obtaining at least two sets of scene data expressing the target scene at different granularities.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method of generating scene data provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method of generating scene data provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the scene data generation method provided by the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is an application scenario diagram of a method and apparatus for generating scenario data according to an embodiment of the present disclosure;
fig. 2 is a flowchart schematic diagram of a scene data generation method according to an embodiment of the disclosure;
fig. 3 is an implementation schematic diagram of a generation method of scene data according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the principle of generating scene data representing a target scene in accordance with an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of obtaining target scene data according to an embodiment of the present disclosure;
FIG. 6 is a diagram of an example application of the generation and loading of scene data, according to an embodiment of the present disclosure;
fig. 7 is a block diagram of a configuration of a scene data generation apparatus according to an embodiment of the present disclosure; and
fig. 8 is a block diagram of an electronic device for implementing a scene data generation method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used in the present disclosure is explained below.
Neural radiance field, referred to as NeRF for short, is a technique for reconstructing three-dimensional scenes from multi-view images. NeRF uses a set of multi-view images to obtain a complete three-dimensional scene by optimizing an underlying continuous volumetric scene function.
Sparse neural radiance grid, SNeRG for short, combines a neural radiance field (NeRF) with a precomputation and storage technique (which may be referred to as a baking (bake) technique) intended to bake the continuous neural volumetric scene representation of NeRF into a discrete sparse neural radiance grid for real-time rendering. The network is implemented by combining a reshaped NeRF architecture with a sparse voxel grid representation based on learned feature vectors.
Voxel is short for volume element (volume pixel). A solid containing voxels can be represented by volume rendering or by extracting a polygonal isosurface of a given threshold contour. Voxels allow 3D space to be gridded, with features assigned to each grid cell. A voxel may be a volumetric point that stores sampled data, including material, color, volume density, and the like.
A voxel grid is a data structure that represents three-dimensional objects using fixed-size cubes as the minimum unit.
Region of interest, ROI for short, is a region to be processed, delineated from the image being processed by a box, circle, ellipse, irregular polygon, or the like, in computer vision and image processing.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 is an application scenario diagram of a method and an apparatus for generating scenario data according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include an electronic device 110, and the electronic device 110 may be various electronic devices with processing functionality, including but not limited to a smartphone, a tablet, a laptop, a desktop computer, a server, and so on.
The electronic device 110 may, for example, provide a human-machine interface. For example, the electronic device 110 may present the rendered image 130 of the target scene to the user 120 in response to an operation by the user 120. The rendered image 130 may be rendered according to the scene data of the target scene and the target viewing angle. The target view angle may be obtained in response to an input operation of the user 120, or may be determined in response to a drag operation of the user 120 on the human-computer interaction interface, and the like, which is not limited in this disclosure.
In an embodiment, as shown in fig. 1, the application scenario 100 may further include a server 140, and the server 140 may be, for example, a background management server supporting the running of the client application in the electronic device 110. The electronic device 110 may be communicatively connected to the server 140 via a network, which may include wired or wireless communication links.
For example, the server 140 may construct a neural radiance field from a plurality of images of the target scene at a plurality of viewing angles, and generate scene data in advance from voxel grid data output by the neural radiance field for the electronic device 110 to load. For example, the server 140 may send the generated scene data 150 to the electronic device 110 in response to a scene data loading request sent by the electronic device 110, to serve as the basis for rendering the rendered image 130 by the electronic device. Alternatively, the server 140 may determine, according to the loading request, the part of the scene data of the target scene that corresponds to the target view angle, and send that part to the electronic device 110, so as to reduce the amount of scene data loaded by the electronic device 110.
In an embodiment, the server 140 may further generate scene data of a target scene at different granularities according to the voxel grid data, for example, to send the scene data meeting the accuracy requirement and the memory limitation to the electronic device 110 according to the actual requirement of the electronic device 110, or send the scene data at different granularities to the electronic device 110 and another device with different computing capabilities from the electronic device 110, so that multiple devices with different computing capabilities may render and obtain a rendered image.
It should be noted that the scene data generation method provided by the present disclosure may be executed by the server 140. Accordingly, the generating device of the scene data provided by the present disclosure may be provided in the server 140.
It should be understood that the number and type of electronic devices 110 and servers 140 in fig. 1 are merely illustrative. There may be any number and type of electronic devices 110 and servers 140, as desired for an implementation.
The method for generating scene data provided by the present disclosure will be described in detail below with reference to fig. 2 to 6.
Fig. 2 is a flowchart illustrating a method of generating scene data according to an embodiment of the present disclosure.
As shown in FIG. 2, the method 200 for generating scene data of the embodiment may include operations S210 to S230.
In operation S210, the voxel grid data output by the neural radiance field is sampled at least twice at different granularities, and at least two sampling results are obtained.
According to embodiments of the present disclosure, a neural radiance field may be constructed from a plurality of images of a target scene taken at a plurality of viewing angles. The voxel grid data may include data for each of a plurality of voxel grids. This embodiment may down-sample the plurality of voxel grids by different multiples to obtain at least two sampling results.
For example, the embodiment may downsample the voxel grid data by a plurality of different multiples to obtain at least two sampling results. Supposing the voxel grid data output by the neural radiance field is the data of an N×N×N voxel grid, down-sampling can yield, for example, the data of a voxel grid of size N/k×N/k×N/k, where k is a down-sampling coefficient whose value is a natural number. This embodiment may select at least two values of k and use the data of the voxel grids down-sampled according to the at least two sampling coefficients as the at least two sampling results.
In the down-sampling process, a maximum pooling algorithm, an average pooling algorithm, or the like may be used. For example, the voxel grid data may include color data (e.g., RGB values) and volume density. When k is 2, the average of the color data of adjacent voxel grids may be used as the color data of the voxel grid at the corresponding position in the down-sampled result, and the average of their volume densities may be used as the volume density of the voxel grid at the corresponding position in the down-sampled result.
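As a concrete illustration of the down-sampling step, the following is a minimal sketch assuming the voxel grid data is held as NumPy arrays of per-voxel color and volume density; the array shapes and the choice of average pooling are assumptions for illustration, not the claimed implementation:

```python
import numpy as np

def downsample_voxel_grid(rgb, sigma, k):
    """Average-pool an N x N x N voxel grid by a down-sampling coefficient k.

    rgb:   (N, N, N, 3) per-voxel color data
    sigma: (N, N, N)    per-voxel volume density
    Returns the coarse (N/k, N/k, N/k, ...) grid; N must be divisible by k.
    """
    n = rgb.shape[0]
    assert n % k == 0, "grid size must be divisible by the sampling coefficient"
    m = n // k
    # Group voxels into k x k x k cells and average within each cell.
    rgb_coarse = rgb.reshape(m, k, m, k, m, k, 3).mean(axis=(1, 3, 5))
    sigma_coarse = sigma.reshape(m, k, m, k, m, k).mean(axis=(1, 3, 5))
    return rgb_coarse, sigma_coarse

# Two sampling results at different granularities (e.g. k = 2 and k = 4).
rgb = np.random.rand(128, 128, 128, 3).astype(np.float32)
sigma = np.random.rand(128, 128, 128).astype(np.float32)
results = [downsample_voxel_grid(rgb, sigma, k) for k in (2, 4)]
```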
In an embodiment, a sparse neural radiance grid combining the neural radiance field with a sparse voxel grid representation based on learned feature vectors may be employed to output the voxel grid data. In this embodiment, the voxel grid data may include, in addition to color data and volume density, view-dependent feature data, which may be, for example, a 4-dimensional feature vector; this learned feature vector can encode view-dependent effects. It should be noted that, in this embodiment, the color data may represent the view-independent diffuse color of the voxel grid. The 4-dimensional feature vector and the view-direction information are processed by a tiny neural network to obtain a specular color. Superimposing the specular color on the diffuse color gives the color of the voxel grid under the view angle represented by the view-direction information.
In operation S220, a texture feature is extracted according to each of the at least two sampling results, and at least two texture features respectively corresponding to the at least two sampling results are obtained.
According to the embodiment of the present disclosure, the process of extracting the texture features may be understood as a baking (bake) process, which can be seen as an inverse of the rendering process and is used to decompose a set of Gaussian-distributed positions within each voxel grid; the voxel grid data may then be averaged over this set of Gaussian-distributed positions so as to obtain an antialiased estimate.
For example, the set of Gaussian-distributed positions may indicate the weighting of the voxel grid data of a set of sampling points corresponding to each voxel grid. The actual voxel grid data used for each voxel grid when rendering an image may be obtained by weighting the voxel grid data of the set of sampling points and superimposing the result on the voxel grid data of that voxel grid output by the neural radiance field.
In this embodiment, extracting texture features from each sampling result during scene data generation allows storage space to be allocated to the voxel grids of the target scene, reducing storage cost and rendering time.
In an embodiment, each sampling result may be used as the input of a deep learning model, and the texture feature corresponding to each sampling result may be output by the deep learning model. The deep learning model may be built on a convolutional neural network, for example on ResNet50, which is not limited by the present disclosure. Alternatively, in this embodiment, a gray-level co-occurrence matrix (GLCM) method, a local binary pattern (LBP) operator, or the like may be used to extract texture features, which is likewise not limited by the present disclosure.
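As an illustration of the convolutional option mentioned above, the following is a minimal per-voxel texture feature extractor sketched in PyTorch; the network structure, channel counts, and input layout are assumptions for illustration and are not the model used by this disclosure (which may instead be based on ResNet50, GLCM, or LBP):

```python
import torch
from torch import nn

class TextureFeatureNet(nn.Module):
    """Minimal per-voxel texture feature extractor (illustrative stand-in)."""

    def __init__(self, in_channels=4, feat_channels=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(16, feat_channels, kernel_size=3, padding=1),
        )

    def forward(self, voxels):
        # voxels: (batch, channels, D, H, W), e.g. RGB + density per voxel.
        # The output keeps the spatial shape, so each voxel grid receives its
        # own texture feature vector.
        return self.net(voxels)

# One texture feature tensor per sampling result.
model = TextureFeatureNet()
sample = torch.rand(1, 4, 64, 64, 64)   # one down-sampled result
texture_features = model(sample)         # (1, 8, 64, 64, 64)
```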
According to an embodiment of the present disclosure, the texture feature corresponding to each sampling result may include a texture feature corresponding to each voxel grid in each sampling result.
In operation S230, a set of scene data expressing the target scene is generated according to each sampling result and the texture feature corresponding to each sampling result, and at least two sets of scene data expressing the target scene at different granularities are obtained.
According to the embodiment of the disclosure, the voxel grid data included in each sampling result can be fused with the corresponding texture features, so as to obtain scene data expressing a target scene.
For example, the voxel grid data of a plurality of sampling points for each voxel grid may be weighted according to the texture features corresponding to each sampling result and superimposed on the voxel grid data of each voxel grid output by the neural radiance field, giving the actual voxel grid data of each voxel grid used when rendering the image. This embodiment may use the actual voxel grid data of all voxel grids in each sampling result as the scene data expressing the target scene.
In an embodiment, isosurface extraction may be performed according to each sampling result to obtain a patch model of the target scene, and a correspondence between the patch model and the voxel grid is established. The embodiment may use the correspondence and actual voxel grid data of all voxel grids as scene data expressing the target scene.
By sampling the voxel grid data at different granularities, the disclosed embodiments can generate scene data of the target scene at different granularities (e.g., at different resolutions). Electronic devices with different computing capabilities can therefore perform scene rendering from scene data at different granularities, and an electronic device can load scene data of different granularities according to different resolution requirements. For example, downsampling clearly reduces the amount of generated scene data of the target scene, which makes it convenient for an electronic device with limited computing capability to load the scene data and render an image of the target scene. Furthermore, because the texture features are fused into the scene data by the method of the embodiments of the present disclosure, the electronic device does not need to extract texture features while rendering the image, which reduces the amount of computation during rendering, lowers the demands on the device's computing capability, shortens the rendering time, and improves the user experience.
According to the embodiment of the disclosure, each sampling result can be partitioned, sub-texture features corresponding to each voxel block obtained through partitioning are determined according to the extracted texture features, and a mapping relation between each voxel block and scene data is established. Therefore, when the electronic equipment loads the scene data, only part of the scene data can be loaded according to actual requirements, and the scene data of the whole target scene does not need to be loaded. Therefore, the requirements on the computing capacity of the electronic equipment are further reduced, and the rendering time is shortened. The implementation principle of the generation method of the scene data in this embodiment will be described below with reference to fig. 3.
Fig. 3 is an implementation schematic diagram of a scene data generation method according to an embodiment of the present disclosure.
As shown in fig. 3, in generating scene data of a target scene, this embodiment 300 may first construct a neural radiance field NeRF 320 for the target scene from a plurality of images 310 of the target scene acquired at a plurality of view angles. The data output by the trained NeRF 320 is then used as voxel grid data 330.
In an embodiment, an image capturing device may be used to capture images of the target scene from multiple viewing angles, resulting in the plurality of images 310. Meanwhile, a mapping relation between each obtained scene image and the camera pose corresponding to its acquisition view angle can be established, yielding a plurality of mapping relations for the plurality of view angles. The embodiment may construct the neural radiance field 320 for the target scene from the plurality of mapping relations.
For example, when the neural radiance field 320 for the target scene is constructed in this embodiment, the image in each mapping relation may be sampled according to the camera pose in that mapping relation, and three-dimensional sampling points may be obtained from the positions of pixel points in the sampled image and the depths of those pixel points. For example, a plurality of three-dimensional sampling points may be obtained for each mapping relation, and these points constitute a group of three-dimensional sampling points for one scene image. The camera pose can be represented by the pitch angle, roll angle, and yaw angle of the camera, and the coordinate values of each sampling point can be expressed in the world coordinate system. In this way, the embodiment can construct a neural radiance field from a plurality of groups of three-dimensional sampling points for a plurality of scene images and the plurality of camera poses having mapping relations with those scene images.
Specifically, the embodiment can construct training data from the coordinate values of the plurality of groups of three-dimensional sampling points and the plurality of camera poses, obtaining a plurality of training data. Each training data comprises a plurality of groups of data, and each group of data comprises the coordinate values of a three-dimensional sampling point and the corresponding camera pose information; the camera pose information included in the groups of data within one training data is the same. The camera pose information may include the pitch angle, roll angle, and yaw angle described above and may be used as view angle information. For example, the coordinate values of a sampling point may be represented as (x, y, z), the camera pose information may be represented as (pitch, roll, yaw), and a group of data may be represented as (x, y, z, pitch, roll, yaw). This embodiment may use the plurality of training data as raw data for the target scene. In training the neural radiance field 320, operation S310 may be performed to load the raw data of the target scene, input the raw data into an initial neural radiance field, and output color data, volume density, and 4-dimensional feature vectors from the initial neural radiance field. From the color data, the volume density, and the 4-dimensional feature vectors, images at the multiple view angles corresponding to the camera pose information in the training data can then be obtained using a voxel rendering technique. The embodiment may then determine the loss of the initial neural radiance field by comparing the images obtained at the multiple viewing angles by the voxel rendering technique with the scene images acquired at those viewing angles, respectively, and adjust the network parameters of the initial neural radiance field with the goal of minimizing the loss, completing one round of training. The embodiment may perform multiple rounds of training until the loss converges, and use the neural radiance field obtained after the multiple rounds of training as the neural radiance field 320 for the target scene. The network parameters of the initial neural radiance field can be set empirically.
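The assembly of the (x, y, z, pitch, roll, yaw) groups described above can be sketched as follows; this is a hedged illustration in which the function name and array layout are assumptions, and the actual NeRF training loop with voxel rendering and loss minimization is omitted:

```python
import numpy as np

def build_training_data(sample_points, camera_poses):
    """Assemble (x, y, z, pitch, roll, yaw) groups as described above.

    sample_points: list of (P_i, 3) arrays, one group of 3D sampling points
                   per scene image, in world coordinates.
    camera_poses:  list of (pitch, roll, yaw) tuples, one per scene image.
    Returns one training sample per image: an array of shape (P_i, 6) whose
    rows all share the same camera pose information.
    """
    training_data = []
    for points, (pitch, roll, yaw) in zip(sample_points, camera_poses):
        pose = np.broadcast_to(
            np.array([pitch, roll, yaw], dtype=np.float32),
            (points.shape[0], 3))
        training_data.append(
            np.concatenate([points.astype(np.float32), pose], axis=1))
    return training_data
```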
After the neural radiance field 320 for the target scene is obtained through training, the voxel grid data 330 output by the neural radiance field 320 may be baked hierarchically (operation S320): at least two sampling results are obtained through the sampling described above in operation S210, and one sampling granularity corresponds to one level of baking. By hierarchically baking the at least two sampling results, at least two texture features 340 corresponding to the at least two sampling results may be obtained.
Before or after obtaining the texture features, or while obtaining them, the embodiment 300 may, for example, first block each sampling result according to the blocking parameters corresponding to that sampling result, so as to obtain a plurality of voxel blocks. The blocking parameters may include, for example, the number of voxel blocks obtained by blocking and/or the size of the voxel blocks obtained by blocking. For example, if the sampling result includes voxel grid data of an M×M×M voxel grid and the size of a voxel block obtained by blocking is 2×2×2, M/2×M/2×M/2 voxel blocks can be obtained in total.
After obtaining the plurality of voxel blocks and the texture feature corresponding to each sampling result, the embodiment may extract a plurality of sub-texture features respectively corresponding to the plurality of voxel blocks from the texture feature. It will be appreciated that if the sampling result includes voxel grid data of an M×M×M voxel grid, the extracted texture feature can also be represented by an M×M×M tensor, where each element in the tensor corresponds to the voxel grid at the corresponding position in the M×M×M voxel grid. The embodiment may extract the features at the corresponding positions in the texture feature according to the position of each voxel block in the M×M×M voxel grid, so as to obtain the sub-texture feature corresponding to each voxel block.
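A minimal sketch of this blocking step and of extracting the sub-texture feature at the corresponding positions, assuming the sampling result and its texture feature are aligned M×M×M arrays; the names and the dictionary layout are illustrative:

```python
import numpy as np

def split_into_voxel_blocks(grid, features, block_size=2):
    """Partition an M x M x M sampling result and its texture features into
    voxel blocks, pairing each block with its sub-texture feature.

    grid:     (M, M, M, C) voxel grid data of one sampling result
    features: (M, M, M, F) texture features aligned with the same grid
    Returns a dict mapping a block index (i, j, k) to (block_data, sub_feature).
    """
    m = grid.shape[0]
    assert m % block_size == 0
    blocks = {}
    for i in range(0, m, block_size):
        for j in range(0, m, block_size):
            for k in range(0, m, block_size):
                sl = (slice(i, i + block_size),
                      slice(j, j + block_size),
                      slice(k, k + block_size))
                # The sub-texture feature is the feature tensor taken at the
                # same positions as the voxel block.
                blocks[(i // block_size, j // block_size, k // block_size)] = (
                    grid[sl], features[sl])
    return blocks
```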
In embodiment 300, after obtaining a plurality of voxel blocks and sub-texture features corresponding to the plurality of voxel blocks, scene data corresponding to each voxel block may be generated, for example, from voxel grid data and sub-texture features for each voxel block. The plurality of scene data respectively corresponding to the plurality of voxel blocks obtained by each sampling result block constitute scene data of the target scene together.
In this embodiment 300, a plurality of scene data blocks corresponding to a plurality of voxel blocks obtained by each sampling result block may be stored in a storage space in blocks (operation S330), so that the electronic device may load the scene data on a voxel block basis as needed without loading the entire scene data of the target scene.
According to an embodiment of the present disclosure, the blocking parameter corresponding to each sampling result may differ according to the sampling granularity. For example, if the blocking parameter includes the number of voxel blocks obtained by blocking, the number for a sampling result obtained by coarse-grained sampling is smaller than the number for a sampling result obtained by fine-grained sampling. Therefore, when the electronic device needs to render a clear image of a certain local area of the target scene, the voxel blocks within the view angle range in the sampling result obtained by fine-grained sampling can be accurately located, reducing the loading of scene data of unnecessary voxel blocks. The electronic device can thus ensure the accuracy of the loaded scene data while loading as little data as possible and reducing unnecessary consumption of computing resources. When the electronic device needs to render a larger area of the target scene, loading the corresponding scene data ensures that the rendered image shows that larger area.
For example, the number of voxel blocks for the sampling result obtained with a down-sampling coefficient of 4 may be 4, and the number of voxel blocks for the sampling result obtained with a down-sampling coefficient of 2 may be 8, which is not limited by the present disclosure.
According to the embodiment of the disclosure, when extracting the texture features, for example, the voxel grid data may be first screened to remove the voxel grid data corresponding to the position where there is no object in the target scene, and only useful target grid data is retained. Texture features are then extracted from the target mesh data. The voxel grid data to be eliminated may be, for example, voxel grid data with opacity smaller than a predetermined value. The predetermined value may be any value close to 0 but larger than 0, for example. In one embodiment, the voxel grid data with opacity of 0 may be used as the data to be culled. It will be appreciated that the principle of extraction of the texture features in this embodiment is similar to that of extraction of the texture features in SNeRG. By the embodiment, the calculation amount in the texture feature extraction process can be reduced, and the rendering speed can be improved.
It will be appreciated that the opacity of a voxel grid may be determined, for example, from the volume density included in the voxel grid data. For example, if the volume density value is σ, the opacity value is α = 1 − exp(−σ·v), where v is the width of the voxel grid.
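The culling rule of the two preceding paragraphs can be illustrated as follows; the threshold value is an assumption, and only the opacity formula itself is taken from the text above:

```python
import numpy as np

def opacity_from_density(sigma, voxel_width):
    """alpha = 1 - exp(-sigma * v), where v is the voxel grid width."""
    return 1.0 - np.exp(-sigma * voxel_width)

def is_empty(sigma, voxel_width, threshold=1e-3):
    """Voxel grids whose opacity falls below the threshold are culled before
    texture feature extraction (the threshold value here is illustrative)."""
    return opacity_from_density(sigma, voxel_width) < threshold
```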
Fig. 4 is a schematic diagram of the principle of generating scene data representing a target scene according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, as shown in fig. 4, after determining the voxel grid data 410 output by the neural radiance field, the embodiment 400 may block the voxel grid data 410 to obtain a voxel block 420 and the sub-texture feature 430 of the voxel block 420. The embodiment may then use the sub-texture feature 430 to adjust the voxel grid data 440 of the voxel block 420, resulting in adjusted voxel grid data 450. The embodiment may then store the adjusted voxel grid data 450, as the scene data corresponding to the voxel block 420, in the form of a file in the predetermined storage space 460. For example, the scene data stored in the predetermined storage space 460 may include scene data a, scene data b, …, scene data c, etc. for the plurality of voxel blocks obtained by blocking each sampling result. One scene data corresponds to one voxel block, and the plurality of scene data constitute a set of scene data expressing the target scene.
According to the embodiment of the present disclosure, when the voxel grid data 440 is adjusted according to the sub-texture feature 430, the sub-texture feature 430 and the voxel grid data 440 may be fused using an alpha compositing algorithm to obtain the adjusted voxel grid data. Alternatively, only the color data in the voxel grid data 440 may be adjusted according to the sub-texture feature 430: for example, the influence weights of the colors of the plurality of sampling points corresponding to each voxel grid on the color of the pixel corresponding to that voxel grid are determined from the sub-texture feature, the color data of the plurality of sampling points are weighted accordingly, and the weighted color data are fused (e.g., added) with the color data of each voxel grid, thereby obtaining the adjusted color data.
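A minimal sketch of the second adjustment option (weighting the sampling-point colors by weights derived from the sub-texture feature and adding them to the block's own color data); the array shapes and the assumption that the weights are already normalized are illustrative:

```python
import numpy as np

def adjust_block_color(block_rgb, sample_rgb, sub_feature_weights):
    """Fuse a voxel block's color data with its sub-texture feature.

    block_rgb:           (B, B, B, 3) color data of the voxel block
    sample_rgb:          (B, B, B, S, 3) colors of the S sampling points
                         associated with each voxel grid in the block
    sub_feature_weights: (B, B, B, S) per-sampling-point influence weights
                         derived from the sub-texture feature (assumed to
                         sum to 1 over S)
    """
    # Weight the sampling-point colors and collapse the S dimension.
    weighted = (sample_rgb * sub_feature_weights[..., None]).sum(axis=-2)
    # Fuse (here: add) the weighted colors with the block's own color data
    # to obtain the adjusted color data.
    return block_rgb + weighted
```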
In an embodiment, in order to facilitate loading of scene data corresponding to the voxel blocks, the embodiment may further allocate index values to a plurality of voxel blocks included in each sampling result. And then determining the mapping relation between the index values of the voxel blocks and the scene data corresponding to the voxel blocks according to the corresponding relation between the plurality of voxel blocks and the scene data of the target scene.
In an embodiment, the voxel grid data may include color data, view-dependent feature data, and volume density. The view-dependent feature data may be the 4-dimensional feature vector described above. In this embodiment, the mapping relation between the index values of the multiple voxel blocks obtained by blocking each sampling result and the scene data may be stored as a file named atlas_indices (for example only), the color data and volume density in the voxel grid data of each voxel block may be stored as a file named rgba (for example only), and the view-dependent feature data of each voxel block may be stored as a file named feature (for example only).
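For illustration, the block-wise scene data could be written out roughly as follows; the file formats, paths, and use of JSON/NPZ containers are assumptions, and only the three file names follow the example given above:

```python
import json
import numpy as np

def save_block_scene_data(out_dir, block_index_map, rgba, sigma, features):
    """Store one sampling result's scene data as separate files.

    block_index_map: dict mapping voxel block index values to scene data ids
    rgba:            per-block color data
    sigma:           per-block volume density
    features:        per-block view-dependent feature data (4-D vectors)
    """
    # Mapping between voxel block index values and scene data.
    with open(f"{out_dir}/atlas_indices.json", "w") as f:
        json.dump({str(k): v for k, v in block_index_map.items()}, f)
    # Color data and volume density of each voxel block.
    np.savez(f"{out_dir}/rgba.npz", rgba=rgba, sigma=sigma)
    # View-dependent feature data of each voxel block.
    np.savez(f"{out_dir}/feature.npz", feature=features)
```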
In an embodiment, a mapping relation may also be established between the voxel blocks and the model blocks in a model of the target scene generated from the voxel grid, and the scene data composed of the adjusted voxel grid data may be represented in a block-sparse format using two dense arrays. One of the two dense arrays represents a 3D texture atlas. The 3D texture atlas includes a plurality of "macroblocks" corresponding to the plurality of voxel blocks, the macroblocks holding the color data, view-dependent feature data, and volume density of the sparse volume. The other of the two dense arrays represents low-resolution indirection grid data that marks which macroblocks are empty in value, such as the voxel grid data described above whose opacity is less than the predetermined value. Thus, during rendering, the model blocks within the view angle range can be determined first, the corresponding voxel blocks determined according to the mapping relation, and the scene data whose opacity is greater than or equal to the predetermined value loaded from the 3D texture atlas, guided by the marking of macroblocks with empty values.
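A sketch of the block-sparse representation described above (two dense arrays: a 3D texture atlas of macroblocks plus a low-resolution indirection grid); the packing layout and the convention that the last channel holds opacity are assumptions:

```python
import numpy as np

def build_block_sparse_representation(blocks, empty_thresh=1e-3):
    """Pack non-empty voxel blocks into a dense 3D texture atlas plus a
    low-resolution indirection grid (the layout here is illustrative).

    blocks: dict mapping block index (i, j, k) -> (B, B, B, C) block data,
            where the last channel is assumed to hold opacity.
    Returns (atlas, indirection); indirection[i, j, k] is -1 for empty blocks,
    otherwise the block's slot in the atlas.
    """
    grid_dim = max(max(idx) for idx in blocks) + 1
    indirection = -np.ones((grid_dim, grid_dim, grid_dim), dtype=np.int32)
    macroblocks = []
    for idx, data in blocks.items():
        if data[..., -1].max() < empty_thresh:   # entirely empty block
            continue
        indirection[idx] = len(macroblocks)
        macroblocks.append(data)
    atlas = np.stack(macroblocks) if macroblocks else np.empty((0,))
    return atlas, indirection
```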
Fig. 5 is a schematic diagram of a principle of obtaining target scene data according to an embodiment of the present disclosure.
According to the embodiment of the disclosure, after the scene data of the target scene at different granularities is obtained in the manner described above, upon receiving a loading request including granularity information, the embodiment may search the scene data at different granularities for the scene data at the target granularity indicated by the granularity information, use the found scene data as the target scene data, and send the target scene data to the electronic device that sent the loading request, so that the electronic device loads the target scene data and renders it to obtain an image of the target scene. In this way, an electronic device can load scene data of a granularity suited to its actual computing capability, so that the scene data of the target scene can be loaded by electronic devices with different computing capabilities.
As shown in fig. 5, in an embodiment 500, the loading request may also include, for example, view angle information 510. In this embodiment, when obtaining the target scene data, the plurality of voxel blocks 520 included in the target sampling result corresponding to the target granularity may be determined first. Then, a voxel block matching the view angle information among the plurality of voxel blocks 520 is determined as the target voxel block 521, where the target sampling result is the sampling result obtained by sampling at the target granularity. Subsequently, the embodiment 500 may determine the scene data corresponding to the target voxel block in the set of scene data of the target scene at the target granularity as the target scene data 530.
For example, the embodiment may perform ray tracing according to the view angle information, and determine the voxel blocks located within the coverage of the rays among the plurality of voxel blocks 520 as the target voxel blocks 521. The embodiment may then obtain the target scene data 530 according to the mapping relation between the index values of the target voxel blocks and the scene data.
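A rough sketch of selecting target voxel blocks by ray coverage; this is a coarse illustrative ray march, not the patented selection procedure, and the coordinate conventions and step count are assumptions:

```python
import numpy as np

def select_target_blocks(ray_origins, ray_dirs, block_size, grid_dim, n_steps=64):
    """Collect indices of voxel blocks touched by rays cast from the
    requested view.

    ray_origins, ray_dirs: (R, 3) arrays in the voxel grid's coordinate frame.
    Returns the set of (i, j, k) block indices whose scene data should be loaded.
    """
    hit = set()
    t_values = np.linspace(0.0, grid_dim * block_size, n_steps)
    for origin, direction in zip(ray_origins, ray_dirs):
        # Sample points along the ray and map them to block indices.
        points = origin[None, :] + t_values[:, None] * direction[None, :]
        idx = np.floor(points / block_size).astype(int)
        for i, j, k in idx:
            if 0 <= i < grid_dim and 0 <= j < grid_dim and 0 <= k < grid_dim:
                hit.add((i, j, k))
    return hit
```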
According to the embodiment, the voxel block matched with the visual angle information is determined according to the visual angle information, and only the scene data corresponding to the voxel block is sent to the electronic equipment, so that the amount of the scene data loaded by the electronic equipment can be obviously reduced, the electronic equipment can be loaded as required, the scene data of the whole target scene does not need to be loaded, the requirement on the computing capacity of the electronic equipment is favorably reduced, and the rendering efficiency is improved.
Fig. 6 is an application example diagram of generation and loading of scene data according to an embodiment of the present disclosure.
As shown in fig. 6, in an embodiment 600, in the process of generating the scene data, a series of processes such as sampling, texture feature extraction, and blocking may be performed on the voxel grid data 610 output by the neural radiance field using a Level-of-Detail (LOD) baking mechanism (LOD-baker mechanism) 620, and the scene data of the target scene is generated according to the processing results.
For example, sampling and baking may be set to three levels L1 to L3. For the L1 level, coarse sampling may be performed on the voxel grid data, but the sampled result is not blocked (operation S621 is performed: coarse sampling baking). For example, for the L1 level, the down-sampling coefficient may be 4. The texture features obtained from the L1-level coarse sampling bake are for the entire target scene, and the finally generated scene data is the 3D overall scene data 621 of the target scene. For the L2 level, the voxel grid data may also be coarsely sampled, but at a granularity finer than the L1 level, and the sampled result is blocked (operation S622 is performed: coarse blocked sampling baking). For example, for the L2 level, the down-sampling coefficient may be 2, and the number of voxel blocks obtained by blocking may be 1/4 of the number of the entire voxel grids. The texture features from the L2-level coarse sampling bake are for each voxel block, and the resulting scene data includes a plurality of coarse-grained local scene feature data 622. In addition, local scene location data 623 corresponding to the plurality of voxel blocks may be obtained, which may be the index values assigned to the plurality of voxel blocks described above. For the L3 level, the voxel grid data may be blocked directly (operation S623 is performed: blocked baking). For example, for the L3 level, the down-sampling coefficient may be 1, and the number of voxel blocks obtained by blocking may be 1/2 of the number of the entire voxel grids. The texture features from the L3-level bake are for each voxel block, and the resulting scene data includes a plurality of fine-grained local scene feature data 624. In addition, local scene location data 625 corresponding to the plurality of voxel blocks may be obtained, which may be the index values assigned to the plurality of voxel blocks described above.
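The three baking levels could be configured along the following lines; this is a hedged sketch in which the field names are assumptions, and it reuses the downsample_voxel_grid helper sketched earlier in this description:

```python
# Illustrative configuration for the three baking levels described above
# (field names are assumptions, not a prescribed data format).
LOD_LEVELS = {
    "L1": {"downsample_k": 4, "blocked": False},  # coarse bake of the whole scene
    "L2": {"downsample_k": 2, "blocked": True},   # coarse block-wise bake
    "L3": {"downsample_k": 1, "blocked": True},   # full-resolution block-wise bake
}

def bake_all_levels(rgb, sigma):
    """Produce one sampling result per LOD level (blocking and texture
    extraction steps are omitted here)."""
    outputs = {}
    for name, cfg in LOD_LEVELS.items():
        k = cfg["downsample_k"]
        if k > 1:
            level_rgb, level_sigma = downsample_voxel_grid(rgb, sigma, k)
        else:
            level_rgb, level_sigma = rgb, sigma
        outputs[name] = {"rgb": level_rgb, "sigma": level_sigma,
                         "blocked": cfg["blocked"]}
    return outputs
```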
By the embodiment, scene data of the target scene with three different granularities can be obtained.
When the electronic device needs scene data for rendering the target scene, it may, for example, send a loading request to the server executing the LOD-baker mechanism. For example, the electronic device may load scene data based on the scene loading and presentation mechanism 630 and render the presented image. For example, the electronic device may open a client application that renders the target scene in response to a user operation. The electronic device may send a loading request to the server in response to the start of the client application; for the first rendering of an image of the target scene, the granularity information in the loading request may be the coarsest granularity, so that the scene data of the entire target scene can be loaded without high computational pressure. The server may determine the overall scene data 621 as the target scene data in response to the loading request and send it to the electronic device. The electronic device may then load the overall scene data 621, render an image of the entire target scene from it, and display the L1 coarse-grained initial scene 631.
Subsequently, the electronic device may, for example, respond to a user operation (e.g., a local zoom-in operation), send a load request to the server again, where the granularity information in the load request is a fine granularity smaller than the coarsest granularity, for example, may be a granularity corresponding to an L2 level, and the load request may further include perspective information, where the perspective information is determined according to a zoom-in magnification selected by the user and a local scene that may be displayed. The server may determine, in response to the loading request, scene data that matches the view angle information in the local scene coarse feature data obtained in operation S622, and send the scene data to the electronic device as target scene data. Thus, the electronic device may load the feature data 622 of the local scene coarse granularity, and render an image of a local, fine-grained target scene according to the data to display the L2 fine-grained local scene 632.
Subsequently, the electronic device may, for example in response to a further local zoom-in operation by the user, send a loading request to the server again, in which the granularity information is a granularity finer than the previous one, for example the granularity corresponding to the L3 level; the loading request may also include view angle information, determined according to the zoom-in magnification selected by the user and the local scene to be displayed. The server may determine, in response to the loading request, the scene data that matches the view angle information in the fine-grained local scene feature data obtained in operation S623, and send it to the electronic device as the target scene data. In this way, the electronic device may load the fine-grained local scene feature data 624, render an image of the local, fine-grained target scene from it, and display the L3 fine-grained local scene 633.
According to this embodiment, by combining the LOD-baker mechanism and the scene loading and presentation mechanism, the electronic device can obtain scene data of a coarse-grained whole scene or of a fine-grained local scene as needed, without loading the scene data of the fine-grained whole scene all at once. The scene can thus be rendered and displayed by loading scene data even on devices with low computing power, such as smartphones and tablet computers, which reduces the amount of computation in the rendering process and improves display efficiency and user experience.
Based on the scene data generation method provided by the present disclosure, the present disclosure also provides a scene data generation device, which will be described in detail below with reference to fig. 7.
Fig. 7 is a block diagram of a configuration of a scene data generation apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the scene data generation apparatus 700 of this embodiment may include a sampling module 710, a texture feature extraction module 720, and a scene data generation module 730.
The sampling module 710 is configured to sample the voxel grid data output by the neural radiance field at least twice at different granularities to obtain at least two sampling results, wherein the neural radiance field is constructed from a plurality of images of the target scene at a plurality of viewing angles. In an embodiment, the sampling module 710 may be configured to perform operation S210 described above, which is not described here again.
The texture feature extracting module 720 is configured to extract a texture feature according to each of the at least two sampling results, so as to obtain at least two texture features respectively corresponding to the at least two sampling results. In an embodiment, the texture feature extraction module 720 may be configured to perform the operation S220 described above, which is not described herein again.
The scene data generating module 730 is configured to generate a set of scene data expressing the target scene according to each sampling result and the texture feature corresponding to each sampling result, so as to obtain at least two sets of scene data expressing the target scene at different granularities. In an embodiment, the scene data generating module 730 may be configured to perform the operation S230 described above, which is not described herein again.
According to an embodiment of the present disclosure, the apparatus 700 may further include a blocking module and a sub-feature extraction module. And the blocking module is used for blocking each sampling result according to the blocking parameter corresponding to each sampling result to obtain a plurality of voxel blocks. The sub-feature extraction module is used for extracting a plurality of sub-texture features respectively corresponding to the plurality of voxel blocks from the texture features corresponding to each sampling result.
According to the embodiment of the disclosure, the blocking parameter includes the number of voxel blocks obtained by blocking, and the number corresponding to the sampling result obtained by coarse-grained sampling is smaller than the number corresponding to the sampling result obtained by fine-grained sampling.
According to an embodiment of the present disclosure, the above-mentioned texture feature extraction module 720 may include a first data determination submodule and a feature extraction submodule. The first data determination submodule is configured to determine, among the voxel grid data included in each sampling result, the voxel grid data whose opacity is not smaller than a predetermined value as the target grid data corresponding to the target scene. The feature extraction submodule is configured to extract texture features from the target grid data to obtain the texture feature corresponding to each sampling result. The voxel grid data includes a volume density, and the opacity is determined from the volume density.
According to an embodiment of the present disclosure, the scene data generation module 730 may include a data adjustment sub-module and a second data determination sub-module. And the data adjusting submodule is used for adjusting the voxel grid data of each voxel block according to the sub-texture characteristics corresponding to each voxel block for each voxel block included in each sampling result to obtain adjusted voxel grid data. The second data determination submodule is configured to determine that the scene data corresponding to each voxel block includes adjusted voxel grid data. Wherein the set of scene data representing the target scene includes: a plurality of scene data corresponding to the plurality of voxel blocks.
According to an embodiment of the present disclosure, the voxel grid data comprises feature data, color data and volume density related to the viewing angle. The data adjusting submodule is used for: and adjusting the color data of each voxel block according to the sub-texture features corresponding to each voxel block to obtain adjusted voxel grid data.
According to an embodiment of the present disclosure, the apparatus 700 may further include an index assignment module and a mapping relationship determination module. The index assignment module is used for assigning index values to a plurality of voxel blocks included in each sampling result. The mapping relation determining module is used for determining the mapping relation between a plurality of index values of a plurality of voxel blocks and a plurality of scene data according to the corresponding relation between the plurality of voxel blocks and the scene data of the target scene.
According to an embodiment of the present disclosure, the apparatus 700 may further include a target data determination module and a data transmission module. The target data determination module is used for determining a group of scene data expressing a target scene at a target granularity indicated by the granularity information in response to receiving a loading request comprising the granularity information, so as to obtain target scene data. The data sending module is used for sending the target scene data.
According to an embodiment of the present disclosure, the load request further includes perspective information. The target data determination module may include a target block determination submodule and a third data determination submodule. And the target block determination submodule is used for determining a voxel block matched with the view angle information in a plurality of voxel blocks included in the target sampling result as a target voxel block, wherein the target sampling result is obtained by sampling at a target granularity. The third data determination submodule is configured to determine, as target scene data, scene data corresponding to a target voxel block in a set of scene data that represents a target scene at a target granularity.
In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the user's personal information all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good custom are not violated. In the technical scheme of the disclosure, the user's authorization or consent is obtained before the personal information of the user is obtained or collected.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement the method of generation of scene data of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 may also store various programs and data necessary for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806 such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 801 performs the methods and processes described above, such as the scene data generation method. For example, in some embodiments, the scene data generation method may be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the scene data generation method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the scene data generation method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from a storage system, at least one input device, and at least one output device, and transmits data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the drawbacks of difficult management and weak service scalability in conventional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps may be reordered, added, or deleted in the various forms of flows shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, which is not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (20)

1. A method of generating scene data, comprising:
sampling voxel grid data output by a neural radiance field at least twice at different granularities to obtain at least two sampling results, wherein the neural radiance field is constructed from a plurality of images of a target scene at a plurality of viewing angles;
extracting texture features according to each sampling result of the at least two sampling results to obtain at least two texture features respectively corresponding to the at least two sampling results; and
generating a group of scene data expressing the target scene according to each sampling result and the texture feature corresponding to the sampling result, so as to obtain at least two groups of scene data expressing the target scene at different granularities.
2. The method of claim 1, further comprising:
blocking each sampling result according to a blocking parameter corresponding to the sampling result to obtain a plurality of voxel blocks; and
extracting a plurality of sub-texture features respectively corresponding to the plurality of voxel blocks from the texture feature corresponding to each sampling result.
3. The method of claim 2, wherein:
the blocking parameters comprise the number of voxel blocks obtained by blocking; and
the number of voxel blocks corresponding to the sampling result obtained by coarse-grained sampling is smaller than the number of voxel blocks corresponding to the sampling result obtained by fine-grained sampling.
4. The method according to claim 2, wherein the extracting texture features according to each sampling result of the at least two sampling results to obtain at least two texture features respectively corresponding to the at least two sampling results comprises:
determining voxel grid data with opacity smaller than a preset value in the voxel grid data included in each sampling result as target grid data corresponding to the target scene; and
extracting texture features according to the target grid data to obtain texture features corresponding to each sampling result,
wherein the voxel grid data comprises a volume density from which the opacity is determined.
5. The method of claim 2, wherein the generating a group of scene data expressing the target scene according to each sampling result and the texture feature corresponding to the sampling result comprises:
for each voxel block included in each sampling result, adjusting voxel grid data of each voxel block according to sub-texture features corresponding to each voxel block to obtain adjusted voxel grid data; and
determining that the scene data corresponding to each voxel block includes the adjusted voxel grid data,
wherein the group of scene data expressing the target scene comprises: a plurality of scene data corresponding to the plurality of voxel blocks.
6. The method of claim 5, wherein the voxel grid data comprises view-angle-related feature data, color data, and volume density; the adjusting the voxel grid data of each voxel block according to the sub-texture features corresponding to each voxel block to obtain adjusted voxel grid data comprises:
adjusting the color data of each voxel block according to the sub-texture feature corresponding to the voxel block to obtain the adjusted voxel grid data.
7. The method of claim 5, further comprising:
assigning index values to the plurality of voxel blocks included in each sampling result; and
determining a mapping relation between the plurality of index values of the plurality of voxel blocks and the plurality of scene data according to the correspondence between the plurality of voxel blocks and the scene data of the target scene.
8. The method of claim 2, further comprising:
in response to receiving a loading request comprising granularity information, determining a group of scene data expressing the target scene at the target granularity indicated by the granularity information to obtain target scene data; and
sending the target scene data.
9. The method of claim 8, wherein the loading request further comprises perspective information; and the determining a group of scene data expressing the target scene at the target granularity indicated by the granularity information comprises:
determining, as a target voxel block, a voxel block matching the perspective information among a plurality of voxel blocks included in a target sampling result, wherein the target sampling result is obtained by sampling at the target granularity; and
determining scene data corresponding to the target voxel block in a group of scene data expressing the target scene at the target granularity as the target scene data.
10. An apparatus for generating scene data, comprising:
the sampling module is used for sampling voxel grid data output by a neural radiance field at least twice at different granularities to obtain at least two sampling results, wherein the neural radiance field is constructed from a plurality of images of a target scene at a plurality of viewing angles;
the texture feature extraction module is used for extracting texture features according to each sampling result of the at least two sampling results to obtain at least two texture features respectively corresponding to the at least two sampling results; and
the scene data generation module is used for generating a group of scene data expressing the target scene according to each sampling result and the texture feature corresponding to the sampling result, so as to obtain at least two groups of scene data expressing the target scene at different granularities.
11. The apparatus of claim 10, further comprising:
the blocking module is used for blocking each sampling result according to the blocking parameters corresponding to each sampling result to obtain a plurality of voxel blocks; and
the sub-feature extraction module is used for extracting a plurality of sub-texture features respectively corresponding to the plurality of voxel blocks from the texture feature corresponding to each sampling result.
12. The apparatus of claim 11, wherein:
the blocking parameters comprise the number of voxel blocks obtained by blocking; and
the number of voxel blocks corresponding to the sampling result obtained by coarse-grained sampling is smaller than the number of voxel blocks corresponding to the sampling result obtained by fine-grained sampling.
13. The apparatus of claim 11, wherein the texture feature extraction module comprises:
a first data determining sub-module, configured to determine, as target grid data corresponding to the target scene, voxel grid data with an opacity smaller than a predetermined value in the voxel grid data included in each sampling result; and
a feature extraction submodule for extracting texture features according to the target grid data to obtain texture features corresponding to each sampling result,
wherein the voxel grid data comprises a volume density from which the opacity is determined.
14. The apparatus of claim 11, wherein the scene data generation module comprises:
the data adjusting submodule is used for, for each voxel block included in each sampling result, adjusting the voxel grid data of the voxel block according to the sub-texture feature corresponding to the voxel block, so as to obtain adjusted voxel grid data; and
a second data determination sub-module for determining that the scene data corresponding to each voxel block includes the adjusted voxel grid data,
wherein the group of scene data expressing the target scene comprises: a plurality of scene data corresponding to the plurality of voxel blocks.
15. The apparatus of claim 14, wherein the voxel grid data comprises view-angle dependent feature data, color data, and volume density; the data adjustment submodule is used for:
adjusting the color data of each voxel block according to the sub-texture feature corresponding to the voxel block to obtain the adjusted voxel grid data.
16. The apparatus of claim 14, further comprising:
an index assignment module, configured to assign index values to the plurality of voxel blocks included in each sampling result; and
a mapping relation determining module, configured to determine a mapping relation between the plurality of index values of the plurality of voxel blocks and the plurality of scene data according to the correspondence between the plurality of voxel blocks and the scene data of the target scene.
17. The apparatus of claim 11, further comprising:
the target data determination module is used for, in response to receiving a loading request comprising granularity information, determining a group of scene data expressing the target scene at the target granularity indicated by the granularity information, so as to obtain target scene data; and
the data sending module is used for sending the target scene data.
18. The apparatus of claim 17, wherein the loading request further comprises perspective information; and the target data determination module comprises:
a target block determination sub-module, configured to determine, as a target voxel block, a voxel block that matches the perspective information from among a plurality of voxel blocks included in a target sampling result, wherein the target sampling result is obtained by sampling at the target granularity; and
a third data determination submodule, configured to determine, as the target scene data, scene data corresponding to the target voxel block in the group of scene data expressing the target scene at the target granularity.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 9.
CN202211276170.3A 2022-10-19 2022-10-19 Scene data generation method and device, electronic equipment and storage medium Active CN115359170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211276170.3A CN115359170B (en) 2022-10-19 2022-10-19 Scene data generation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115359170A true CN115359170A (en) 2022-11-18
CN115359170B CN115359170B (en) 2023-03-03

Family

ID=84007809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211276170.3A Active CN115359170B (en) 2022-10-19 2022-10-19 Scene data generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115359170B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210279943A1 (en) * 2020-03-05 2021-09-09 Magic Leap, Inc. Systems and methods for end to end scene reconstruction from multiview images
WO2022104299A1 (en) * 2020-11-16 2022-05-19 Google Llc Deformable neural radiance fields
CN114549723A (en) * 2021-03-30 2022-05-27 完美世界(北京)软件科技发展有限公司 Rendering method, device and equipment for illumination information in game scene
CN113706714A (en) * 2021-09-03 2021-11-26 中科计算技术创新研究院 New visual angle synthesis method based on depth image and nerve radiation field
CN114119838A (en) * 2022-01-24 2022-03-01 阿里巴巴(中国)有限公司 Voxel model and image generation method, equipment and storage medium
CN114511662A (en) * 2022-01-28 2022-05-17 北京百度网讯科技有限公司 Method and device for rendering image, electronic equipment and storage medium
CN114998548A (en) * 2022-05-31 2022-09-02 北京非十科技有限公司 Image reconstruction method and system
CN115082639A (en) * 2022-06-15 2022-09-20 北京百度网讯科技有限公司 Image generation method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
常远 et al.: "基于神经辐射场的视点合成算法综述" [A survey of view synthesis algorithms based on neural radiance fields], 《图学学报》 [Journal of Graphics] *
钱郁 et al.: "基于深度学习的三维物体重建方法研究综述" [A review of deep-learning-based three-dimensional object reconstruction methods], 《江苏理工学院学报》 [Journal of Jiangsu University of Technology] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036639A (en) * 2023-08-21 2023-11-10 北京大学 Multi-view geometric scene establishment method and device oriented to limited space
CN117036639B (en) * 2023-08-21 2024-04-30 北京大学 Multi-view geometric scene establishment method and device oriented to limited space

Also Published As

Publication number Publication date
CN115359170B (en) 2023-03-03

Similar Documents

Publication Publication Date Title
CN115082639B (en) Image generation method, device, electronic equipment and storage medium
CN115100339B (en) Image generation method, device, electronic equipment and storage medium
CN111369681A (en) Three-dimensional model reconstruction method, device, equipment and storage medium
CN114820906B (en) Image rendering method and device, electronic equipment and storage medium
CN114842121B (en) Method, device, equipment and medium for generating mapping model training and mapping
CN115330940B (en) Three-dimensional reconstruction method, device, equipment and medium
CN112967381A (en) Three-dimensional reconstruction method, apparatus, and medium
EP4092629A2 (en) Method and apparatus for displaying objects, and storage medium
CN114820905A (en) Virtual image generation method and device, electronic equipment and readable storage medium
CN115578515B (en) Training method of three-dimensional reconstruction model, three-dimensional scene rendering method and device
CN115359170B (en) Scene data generation method and device, electronic equipment and storage medium
CN115439543B (en) Method for determining hole position and method for generating three-dimensional model in meta universe
CN115100337A (en) Whole body portrait video relighting method and device based on convolutional neural network
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN114708374A (en) Virtual image generation method and device, electronic equipment and storage medium
CN108986210B (en) Method and device for reconstructing three-dimensional scene
CN114092673A (en) Image processing method and device, electronic equipment and storage medium
CN112013820B (en) Real-time target detection method and device for deployment of airborne platform of unmanned aerial vehicle
CN115375847B (en) Material recovery method, three-dimensional model generation method and model training method
CN116883573A (en) Map building rendering method and system based on WebGL
CN115861510A (en) Object rendering method, device, electronic equipment, storage medium and program product
CN114140320B (en) Image migration method and training method and device of image migration model
CN113421335B (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN115082624A (en) Human body model construction method and device, electronic equipment and storage medium
CN108520259A (en) A kind of extracting method of foreground target, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant