CN116843808A - Rendering, model training and virtual image generating method and device based on point cloud - Google Patents

Rendering, model training and virtual image generating method and device based on point cloud

Info

Publication number
CN116843808A
Authority
CN
China
Prior art keywords
point cloud
rendering
information
feature
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310802479.XA
Other languages
Chinese (zh)
Other versions
CN116843808B (en)
Inventor
李杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310802479.XA priority Critical patent/CN116843808B/en
Publication of CN116843808A publication Critical patent/CN116843808A/en
Application granted granted Critical
Publication of CN116843808B publication Critical patent/CN116843808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G06T 13/20 - 3D [Three Dimensional] animation
    • G06T 13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G06T 11/40 - Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G06T 13/80 - 2D [Two Dimensional] animation, e.g. using sprites
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G06T 15/04 - Texture mapping
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G06T 15/10 - Geometric effects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G06T 15/50 - Lighting effects
    • G06T 15/55 - Radiosity

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Geometry (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a point cloud-based rendering, model training and avatar generation method and device, relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like, and can be applied to scenes such as the metaverse and digital humans. The specific implementation scheme is as follows: according to three-dimensional point cloud information of an object, obtaining a first deep learning feature representing a first attribute feature of the object and a second deep learning feature representing a second attribute feature of the object; performing first feature enhancement processing on the first deep learning feature to obtain a first rendering feature vector; performing second feature enhancement processing on the second deep learning feature to obtain a second rendering feature vector; and obtaining object rendering information of the object according to the first rendering feature vector and the second rendering feature vector.

Description

Rendering, model training and virtual image generating method and device based on point cloud
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, augmented reality, virtual reality, deep learning and the like, can be applied to scenes such as the metaverse and digital humans, and particularly relates to a point cloud-based rendering, model training and avatar generation method and device.
Background
Virtual digital humans are one of the key elements in creating a metaverse virtual world. According to different business requirements, digital humans can be divided into 2D, 3D, cartoon, realistic, hyper-realistic and other styles. In a real scenario, a basic avatar adapted to the business needs to be built for a virtual digital human.
Disclosure of Invention
The disclosure provides a method and a device for rendering, model training and avatar generation based on point cloud.
According to an aspect of the present disclosure, there is provided a point cloud-based rendering method, including: obtaining, according to three-dimensional point cloud information of an object, a first deep learning feature representing a first attribute feature of the object and a second deep learning feature representing a second attribute feature of the object; performing first feature enhancement processing on the first deep learning feature to obtain a first rendering feature vector; performing second feature enhancement processing on the second deep learning feature to obtain a second rendering feature vector; and obtaining object rendering information of the object according to the first rendering feature vector and the second rendering feature vector.
According to another aspect of the present disclosure, there is provided a training method of a deep learning model, including: inputting sample point cloud information of a sample object into a first neural network of the deep learning model to obtain a sample first deep learning feature representing a first attribute feature of the sample object, wherein the sample point cloud information has a real object geometric label; inputting the sample point cloud information into a second neural network of the deep learning model to obtain a sample second deep learning feature representing a second attribute feature of the sample object; inputting the sample first deep learning feature into a third neural network of the deep learning model to obtain a sample first rendering feature vector; inputting the sample second deep learning feature into a fourth neural network of the deep learning model to obtain a sample second rendering feature vector; inputting the sample first rendering feature vector and the sample second rendering feature vector into a fifth neural network of the deep learning model to obtain sample object rendering information; determining a sample object rendering result according to the sample object rendering information and the real object geometric label; and training the deep learning model according to the real object geometric label and the sample object rendering result to obtain a trained deep learning model.
According to another aspect of the present disclosure, there is provided a point cloud-based rendering method, including: determining object point cloud information to be processed according to the geometric information of the object to be processed; inputting the point cloud information of the object to be processed into a deep learning model to obtain rendering information of the object to be processed; rendering the geometric information of the object to be processed according to the rendering information of the object to be processed to obtain a rendering result of the object to be processed; wherein the deep learning model is trained by using the training method of the deep learning model.
According to another aspect of the present disclosure, there is provided an avatar generation method including: determining target object point cloud information according to target object geometric information of the target object; processing the target object point cloud information based on the point cloud-based rendering method to obtain target object rendering information; and rendering the geometric information of the target object according to the rendering information of the target object to generate the virtual image of the target object.
According to another aspect of the present disclosure, there is provided a point cloud-based rendering apparatus including: the deep learning feature obtaining module is used for obtaining a first deep learning feature representing a first attribute feature of the object and a second deep learning feature representing a second attribute feature of the object according to the three-dimensional point cloud information of the object; the first feature enhancement module is used for carrying out first feature enhancement processing on the first deep learning features to obtain first rendering feature vectors; the second feature enhancement module is used for carrying out second feature enhancement processing on the second deep learning feature to obtain a second rendering feature vector; and a rendering information obtaining module, configured to obtain object rendering information of the object according to the first rendering feature vector and the second rendering feature vector.
According to another aspect of the present disclosure, there is provided a training apparatus of a deep learning model, including: the first network module is used for inputting sample point cloud information of a sample object into a first neural network of the deep learning model to obtain a sample first deep learning feature representing a first attribute feature of the sample object, wherein the sample point cloud information has a real object geometric label; the second network module is used for inputting the sample point cloud information into a second neural network of the deep learning model to obtain sample second deep learning features representing second attribute features of the sample object; the third network module is used for inputting the first deep learning feature of the sample into a third neural network of the deep learning model to obtain a first rendering feature vector of the sample; a fourth network module, configured to input the second deep learning feature of the sample into a fourth neural network of the deep learning model, to obtain a second rendering feature vector of the sample; a fifth network module, configured to input the sample first rendering feature vector and the sample second rendering feature vector into a fifth neural network of the deep learning model, to obtain sample object rendering information; the rendering result determining module is used for determining a sample object rendering result according to the sample object rendering information and the real object geometric tag; and the training module is used for training the deep learning model according to the real object geometric label and the sample object rendering result to obtain a trained deep learning model.
According to another aspect of the present disclosure, there is provided a point cloud-based rendering apparatus including: the first point cloud information determining module is used for determining point cloud information of an object to be processed according to geometric information of the object to be processed; the deep learning module is used for inputting the point cloud information of the object to be processed into a deep learning model to obtain rendering information of the object to be processed; the second rendering module is used for rendering the geometric information of the object to be processed according to the rendering information of the object to be processed to obtain a rendering result of the object to be processed; wherein the deep learning model is trained by a training device of the deep learning model according to the disclosure.
According to another aspect of the present disclosure, there is provided an avatar generating apparatus including: the second point cloud information determining module is used for determining target object point cloud information according to the target object geometric information of the target object; the processing module is used for processing the target object point cloud information based on the point cloud-based rendering device disclosed by the disclosure to obtain target object rendering information; and the generating module is used for rendering the geometric information of the target object according to the rendering information of the target object and generating the virtual image of the target object.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform at least one of the point cloud based rendering method, the training method of the deep learning model, and the avatar generation method of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform at least one of the point cloud-based rendering method, the training method of the deep learning model, and the avatar generation method of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, which when executed by a processor, implements at least one of the point cloud based rendering method, the training method of a deep learning model, and the avatar generation method of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically illustrates an exemplary system architecture to which at least one of a point cloud-based rendering method, a training method of a deep learning model, and an avatar generation method and corresponding apparatuses may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a point cloud based rendering method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a training method of a deep learning model according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a training process of a deep learning model according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a method for implementing a point cloud based rendering based on a deep learning model according to an embodiment of the disclosure;
fig. 6 schematically illustrates a flowchart of an avatar generation method according to an embodiment of the present disclosure;
Fig. 7 schematically illustrates a block diagram of a point cloud based rendering apparatus according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure;
fig. 9 schematically illustrates a block diagram of a point cloud based rendering apparatus according to an embodiment of the present disclosure;
fig. 10 schematically illustrates a block diagram of an avatar generating apparatus according to an embodiment of the present disclosure; and
fig. 11 illustrates a schematic block diagram of an example electronic device 1100 that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, application and other processing of the user's personal information all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
When designing a high-quality avatar, a professional animator is required to perform professional optimization design on the geometric modeling, texture mapping, illumination mapping and the like of the avatar, so as to construct a basic avatar that meets business requirements. For example, due to the stylization requirements of a stylized avatar, fine-grained modeling of the digital human's materials, lighting models, 3D models, bone bindings and the like is required. When designing the stylized rendering maps of the avatar, professional designers must be relied upon, and iterative optimization design must be carried out according to business requirements.
In the process of realizing the concept of the present disclosure, the inventors found that professional designers need to rely on professional software to carry out professional design of geometry, textures and other aspects, so both the hardware cost and the design cost are high. In addition, the scalability is weak, and it is difficult to achieve low-cost migration.
Fig. 1 schematically illustrates an exemplary system architecture to which at least one of a point cloud-based rendering method, a training method of a deep learning model, and an avatar generation method and corresponding apparatuses may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which at least one of a point cloud based rendering method, a deep learning model training method, and an avatar generation method and corresponding apparatuses may be applied may include a terminal device, but the terminal device may implement at least one of a point cloud based rendering method, a deep learning model training method, and an avatar generation method and corresponding apparatuses provided by embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102 or the third terminal device 103, to receive or send messages and the like. Various communication client applications, such as knowledge-reading applications, web browser applications, search applications, instant messaging tools, mailbox clients and/or social platform software (by way of example only), may be installed on the first terminal device 101, the second terminal device 102 and the third terminal device 103.
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (merely an example) providing support for content browsed by the user with the first terminal device 101, the second terminal device 102 and the third terminal device 103. The background management server may analyze and process received data such as user requests, and feed back the processing results (e.g., web pages, information, or data obtained or generated according to the user requests) to the terminal device. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak business scalability found in traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be noted that, at least one of the point cloud based rendering method, the training method of the deep learning model, and the avatar generation method provided in the embodiments of the present disclosure may be generally executed by the first terminal device 101, the second terminal device 102, or the third terminal device 103. Accordingly, at least one of the point cloud based rendering device, the training device of the deep learning model, and the avatar generating device provided in the embodiments of the present disclosure may also be provided in the first terminal device 101, the second terminal device 102, or the third terminal device 103.
Alternatively, at least one of the point cloud-based rendering method, the training method of the deep learning model, and the avatar generation method provided by the embodiments of the present disclosure may also be generally performed by the server 105. Accordingly, at least one of the point cloud based rendering device, the training device of the deep learning model, and the avatar generation device provided in the embodiments of the present disclosure may be generally provided in the server 105. At least one of the point cloud based rendering method, the training method of the deep learning model, and the avatar generation method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, at least one of the point cloud based rendering device, the training device of the deep learning model, and the avatar generating device provided in the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
For example, when rendering is performed based on the point cloud, the first terminal device 101, the second terminal device 102, and the third terminal device 103 may acquire three-dimensional point cloud information of the object and then send the acquired three-dimensional point cloud information to the server 105. The server 105 obtains, according to the three-dimensional point cloud information, a first deep learning feature characterizing a first attribute feature of the object and a second deep learning feature characterizing a second attribute feature of the object; performs first feature enhancement processing on the first deep learning feature to obtain a first rendering feature vector; performs second feature enhancement processing on the second deep learning feature to obtain a second rendering feature vector; and obtains object rendering information of the object according to the first rendering feature vector and the second rendering feature vector. Alternatively, a server or server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105 may perform the foregoing processing on the three-dimensional point cloud information to obtain the object rendering information of the object.
For example, when training the deep learning model, the first terminal device 101, the second terminal device 102, and the third terminal device 103 may acquire sample point cloud information of a sample object, where the sample point cloud information has a real object geometric label, and then send the acquired sample point cloud information to the server 105. The server 105 inputs the sample point cloud information of the sample object into a first neural network of the deep learning model to obtain a sample first deep learning feature that characterizes a first attribute feature of the sample object; inputs the sample point cloud information into a second neural network of the deep learning model to obtain a sample second deep learning feature characterizing a second attribute feature of the sample object; inputs the sample first deep learning feature into a third neural network of the deep learning model to obtain a sample first rendering feature vector; inputs the sample second deep learning feature into a fourth neural network of the deep learning model to obtain a sample second rendering feature vector; inputs the sample first rendering feature vector and the sample second rendering feature vector into a fifth neural network of the deep learning model to obtain sample object rendering information; determines a sample object rendering result according to the sample object rendering information and the real object geometric label; and trains the deep learning model according to the real object geometric label and the sample object rendering result to obtain a trained deep learning model. Alternatively, the foregoing processing may be performed by a server or server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105 to obtain the trained deep learning model.
For example, when rendering is performed based on the point cloud, the first terminal device 101, the second terminal device 102, and the third terminal device 103 may acquire the geometric information of the object to be processed and then send the acquired geometric information to the server 105. The server 105 determines the point cloud information of the object to be processed according to the geometric information of the object to be processed; inputs the point cloud information of the object to be processed into a deep learning model to obtain rendering information of the object to be processed; and renders the geometric information of the object to be processed according to the rendering information of the object to be processed to obtain the rendering result of the object to be processed. Alternatively, a server or server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105 may perform the foregoing processing on the geometric information of the object to be processed and determine the rendering result of the object to be processed.
For example, when an avatar is generated, the first terminal device 101, the second terminal device 102, and the third terminal device 103 may acquire target object geometric information of the target object and then transmit the acquired target object geometric information to the server 105. The server 105 determines target object point cloud information according to the target object geometric information; processes the target object point cloud information based on the point cloud-based rendering method to obtain target object rendering information; and renders the target object geometric information according to the target object rendering information to generate the avatar of the target object. Alternatively, a server or server cluster capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105 may perform the foregoing processing on the target object geometric information to generate the avatar of the target object.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flow chart of a point cloud based rendering method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S240.
In operation S210, according to the three-dimensional point cloud information of the object, a first deep learning feature characterizing a first attribute feature of the object and a second deep learning feature characterizing a second attribute feature of the object are obtained.
In operation S220, a first feature enhancement process is performed on the first deep learning feature to obtain a first rendering feature vector.
In operation S230, a second feature enhancement process is performed on the second deep learning feature to obtain a second rendering feature vector.
In operation S240, object rendering information of the object is obtained according to the first rendering feature vector and the second rendering feature vector.
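As an illustrative aid only (not part of the disclosed embodiments), the following minimal PyTorch-style sketch shows one possible way to wire operations S210 to S240 together; the module structure, tensor shapes and the additive fusion step are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class PointCloudRenderer(nn.Module):
    """Hypothetical sketch of operations S210-S240: two attribute branches,
    per-branch feature enhancement, then fusion into rendering information."""

    def __init__(self, point_dim: int = 6, feat_dim: int = 128, out_dim: int = 3):
        super().__init__()
        # S210: two encoders, one per attribute (e.g. diffuse / highlight)
        self.first_encoder = nn.Sequential(nn.Linear(point_dim, feat_dim), nn.ReLU(),
                                           nn.Linear(feat_dim, feat_dim))
        self.second_encoder = nn.Sequential(nn.Linear(point_dim, feat_dim), nn.ReLU(),
                                            nn.Linear(feat_dim, feat_dim))
        # S220 / S230: per-branch feature enhancement
        self.first_enhancer = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.second_enhancer = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        # S240: fuse both rendering feature vectors into rendering information
        self.head = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                  nn.Linear(feat_dim, out_dim))

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (N, point_dim), e.g. 3D positions and normals, no colour
        first_feat = self.first_encoder(points)          # S210, first attribute
        second_feat = self.second_encoder(points)        # S210, second attribute
        first_vec = self.first_enhancer(first_feat)      # S220
        second_vec = self.second_enhancer(second_feat)   # S230
        return self.head(first_vec + second_vec)         # S240, additive fusion

# Usage: 1024 points, each with an xyz position and a normal vector.
renderer = PointCloudRenderer()
rendering_info = renderer(torch.randn(1024, 6))
```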
According to embodiments of the present disclosure, a three-dimensional laser scanner or a photographic scanner may be used to scan an object to obtain three-dimensional point cloud information of the object. Alternatively, one or more object images may first be acquired for the object. Then, the three-dimensional point cloud information of the object, containing the position information of the acquired points of the object, is determined according to the one or more object images. The three-dimensional point cloud information may include three-dimensional position information and normal information of the acquired points of the object, and may not include color information.
According to embodiments of the present disclosure, feature extraction may be performed on the first attribute information of the three-dimensional point cloud information based on a first attribute feature extraction module to obtain the first deep learning feature, and feature extraction may be performed on the second attribute information of the three-dimensional point cloud information based on a second attribute feature extraction module to obtain the second deep learning feature. The deep learning features of the three-dimensional point cloud information may also be learned based on a CodeBook algorithm, so as to obtain the first deep learning feature and the second deep learning feature.
It should be noted that the method for obtaining the first deep learning feature and the second deep learning feature is only an exemplary embodiment, but not limited thereto, and may include other methods known in the art, as long as the first deep learning feature characterizing the first attribute feature of the object and the second deep learning feature characterizing the second attribute feature of the object can be obtained according to the three-dimensional point cloud information of the object.
According to an embodiment of the present disclosure, the first attribute feature may include at least one of: diffuse reflection features, specular features, normal features, etc., and may not be limited thereto. The second attribute feature may include at least one of: diffuse reflection features, specular features, normal features, etc., and may not be limited thereto. The first attribute feature and the second attribute feature characterize different attribute features. The normal feature may be obtained by geometric calculation.
According to embodiments of the present disclosure, the deep learning features and the rendering feature vectors may have the same or different data representations. The feature enhancement processing may include at least one of: convolution processing, upsampling processing, downsampling processing, texture analysis, and the like, and may not be limited thereto. For example, the first deep learning feature may be processed using a convolutional network to obtain the first rendering feature vector. The second deep learning feature may be processed using the same convolutional network as, or a different convolutional network from, the one that processed the first deep learning feature, to obtain the second rendering feature vector. The convolutional network may include, for example, a U-Net network, and may not be limited thereto.
It should be noted that the method for obtaining the first rendering feature vector and the second rendering feature vector is only an exemplary embodiment, but not limited thereto, and may also include other networks, models, or algorithms known in the art, as long as obtaining the first rendering feature vector according to the first deep learning feature and obtaining the second rendering feature vector according to the second deep learning feature can be achieved.
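As a hedged illustration of the convolutional feature enhancement described above, the sketch below shows a minimal U-Net-style enhancer with a single downsampling/upsampling stage and one skip connection; it assumes the deep learning feature has been rasterized into a 2D feature map, which is an assumption for illustration rather than a requirement of the disclosure.

```python
import torch
import torch.nn as nn

class TinyUNetEnhancer(nn.Module):
    """Hypothetical single-skip U-Net-style enhancer: a 2D feature map is
    downsampled, upsampled, and combined with a skip connection to produce
    a rendering feature map."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(channels, channels * 2, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.Sequential(
            nn.ConvTranspose2d(channels * 2, channels, 4, stride=2, padding=1), nn.ReLU())
        self.out = nn.Conv2d(channels * 2, channels, 3, padding=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        skip = feat                       # keep the original-resolution features
        x = self.up(self.down(feat))      # encode then decode
        return self.out(torch.cat([x, skip], dim=1))  # skip connection, then project

enhancer = TinyUNetEnhancer()
first_rendering_feature = enhancer(torch.randn(1, 32, 64, 64))
```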
According to an embodiment of the present disclosure, the object rendering information may include information characterizing the first attribute feature and the second attribute feature. For example, if the first rendering feature vector is a feature vector characterizing the first attribute feature and the second rendering feature vector is a feature vector characterizing the second attribute feature, object rendering information including the first attribute feature and the second attribute feature may be obtained from the first rendering feature vector and the second rendering feature vector. As another example, if the first rendering feature vector is a feature vector characterizing the second attribute feature and the second rendering feature vector is a feature vector characterizing the first attribute feature, object rendering information including the first attribute feature and the second attribute feature may likewise be obtained from the first rendering feature vector and the second rendering feature vector.
Through the embodiments of the present disclosure, different attribute features of the object can be decoupled at the point cloud level, the deep features of the various attributes can be learned separately, which facilitates self-supervision, and the obtained object rendering information can include the deep features of the various attributes, thereby improving the rendering effect.
The method shown in fig. 2 is further described below in connection with the specific examples.
According to an embodiment of the present disclosure, the three-dimensional point cloud information may include at least one of: sparse point cloud information and dense point cloud information.
According to embodiments of the present disclosure, the three-dimensional point cloud information may include point cloud information of a plurality of different densities acquired for the same object. Both the sparse point cloud information and the dense point cloud information may represent point cloud information of a plurality of different densities. The criterion for distinguishing sparse point cloud information from dense point cloud information may be a predetermined threshold, or may be determined by comparison. For example, if there are two pieces of point cloud information with different densities, the relatively sparse one may be determined as the sparse point cloud information, and the relatively dense one may be determined as the dense point cloud information.
According to the embodiment of the disclosure, the method is performed on sparse point cloud information and dense point cloud information corresponding to the same object, so that the same or similar object rendering information can be obtained, and the same or similar rendering effect can be achieved.
According to an embodiment of the present disclosure, the first attribute feature may include a diffuse reflectance feature. The second attribute feature may comprise a highlight feature.
According to embodiments of the present disclosure, diffuse reflection features may characterize texture information, and highlight features may characterize illumination information. Where deep learning features are required, for example, a first deep learning feature characterizing the diffuse reflection feature may be learned based on an Albedo-CodeBook module, and a second deep learning feature characterizing the specular (highlight) feature may be learned based on a Specular-CodeBook module. The Albedo-CodeBook module and the Specular-CodeBook module may have the same network structure while extracting different feature information.
Through the above embodiments of the present disclosure, the decoupling of diffuse reflection and highlight materials can be achieved, which is beneficial to improving the visual texture and facilitates self-supervision.
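The disclosure names the Albedo-CodeBook and Specular-CodeBook modules without prescribing their internals; the following sketch shows one plausible reading, a minimal vector-quantization-style codebook in which each per-point feature is replaced by its nearest learnable code word. The class name, code count and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class CodeBookModule(nn.Module):
    """Minimal VQ-style codebook: each per-point feature is replaced by its
    nearest learnable code word. One instance could serve as an Albedo-CodeBook,
    another (with its own code words) as a Specular-CodeBook."""

    def __init__(self, num_codes: int = 256, code_dim: int = 128):
        super().__init__()
        self.codes = nn.Embedding(num_codes, code_dim)

    def forward(self, point_features: torch.Tensor) -> torch.Tensor:
        # point_features: (N, code_dim); distances to every code word: (N, num_codes)
        distances = torch.cdist(point_features, self.codes.weight)
        nearest = distances.argmin(dim=1)   # index of the closest code word per point
        return self.codes(nearest)          # quantized deep learning feature

albedo_codebook = CodeBookModule()
specular_codebook = CodeBookModule()
features = torch.randn(1024, 128)
first_deep_feature = albedo_codebook(features)     # diffuse-reflection branch
second_deep_feature = specular_codebook(features)  # highlight branch
```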
According to an embodiment of the present disclosure, after obtaining the first deep learning feature, the above operation S220 may include: obtaining a first conditional feature vector, wherein the first conditional feature vector characterizes features of a first three-dimensional point cloud presented under at least one first view angle, and the first three-dimensional point cloud is the point cloud represented by the three-dimensional point cloud information; and, with the first conditional feature vector as a constraint, performing first convolution processing on the first deep learning feature to obtain the first rendering feature vector.
According to an embodiment of the present disclosure, after obtaining the second deep learning feature, the above operation S230 may include: obtaining a second conditional feature vector, wherein the second conditional feature vector characterizes features of a second three-dimensional point cloud presented under at least one second view angle, and the second three-dimensional point cloud is the point cloud represented by the three-dimensional point cloud information; and, with the second conditional feature vector as a constraint, performing second convolution processing on the second deep learning feature to obtain the second rendering feature vector.
According to the embodiment of the disclosure, the conditional feature vector can be obtained by encoding or feature extraction of the three-dimensional point cloud information. The three-dimensional point cloud information can be mapped to at least one image space to obtain a two-dimensional point cloud image of at least one corresponding view angle. And then, coding or extracting the features of at least one two-dimensional point cloud image to obtain a conditional feature vector. Based on the mode, the obtained conditional feature vector can contain features represented by the three-dimensional point cloud information in at least one corresponding view angle.
According to the embodiment of the disclosure, the conditional feature vector can be used as a constraint in the feature processing process, and can be used for constraining the feature represented by the obtained rendering feature vector to be consistent with the feature represented by the initial three-dimensional point cloud information.
For example, the first three-dimensional point cloud information may be encoded or feature extracted based on at least one first view angle, resulting in a first conditional feature vector. Then, in the process of obtaining the first rendering feature vector according to the first deep learning feature, the first conditional feature vector is taken as a constraint, so that the first rendering feature vector with similar feature performance to that of the three-dimensional point cloud information is obtained.
For example, the second three-dimensional point cloud information may be encoded or feature extracted based on at least one second view angle, resulting in a second conditional feature vector. Then, in the process of obtaining the second rendering feature vector according to the second deep learning feature, the second rendering feature vector with the feature performance similar to the feature performance of the three-dimensional point cloud information can be obtained by taking the second conditional feature vector as a constraint.
The first convolution process and the second convolution process may be convolution processes implemented based on the same or different convolution networks, and are not limited herein.
By the embodiment of the disclosure, the conditional feature vector is introduced as constraint, so that more complete feature expression can be combined, and the accuracy of the obtained rendering feature vector is improved, so that the rendering result is more realistic.
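One common way to use a conditional feature vector as a constraint is to broadcast it and concatenate it with the feature map before convolution; the sketch below illustrates this under that assumption, which the disclosure does not prescribe, and the channel and condition dimensions are chosen arbitrarily.

```python
import torch
import torch.nn as nn

class ConditionedEnhancer(nn.Module):
    """Sketch of conditioning: the view-dependent conditional feature vector is
    broadcast spatially and concatenated to the deep learning feature map before
    the convolutions that produce the rendering feature map."""

    def __init__(self, feat_channels: int = 32, cond_dim: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(feat_channels + cond_dim, feat_channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
        )

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); cond: (B, cond_dim) derived from the multi-view point cloud
        b, _, h, w = feat.shape
        cond_map = cond[:, :, None, None].expand(b, cond.shape[1], h, w)
        return self.conv(torch.cat([feat, cond_map], dim=1))

enhancer = ConditionedEnhancer()
first_rendering_feature = enhancer(torch.randn(2, 32, 64, 64), torch.randn(2, 16))
```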
According to an embodiment of the present disclosure, the acquiring the first conditional feature vector may include: and acquiring at least one first target point cloud obtained after rotating the first three-dimensional point cloud by at least one first angle. And extracting the characteristics of the first target point cloud to obtain a first conditional characteristic vector.
According to an embodiment of the present disclosure, the at least one first perspective may represent a perspective that the first three-dimensional point cloud exhibits after rotating the first three-dimensional point cloud by at least one first angle.
According to an embodiment of the present disclosure, the first angle may be a fixed angle, such as 10°, 20° or 30°, and may not be limited thereto. For example, if the first angle is 20°, rotations by at least one first angle may be 20°, 40°, 60°, ..., 340°, etc. The at least one first angle may also be at least one random angle, for example 10°, 50°, 110°, 310°, etc., and may not be limited thereto. Corresponding to this embodiment, rotation by at least one angle may be represented by at least one of a rotation of 10°, a rotation of 50°, a rotation of 110°, a rotation of 310°, and the like, and may not be limited thereto.
According to an embodiment of the present disclosure, the acquiring the first conditional feature vector may further include: and determining a first image obtained by projecting the first three-dimensional point cloud in at least one direction of the X-axis direction, the Y-axis direction and the Z-axis direction of the first preset three-dimensional rectangular coordinate system according to the first preset three-dimensional rectangular coordinate system and the three-dimensional point cloud information. And extracting the features of the first image to obtain a first conditional feature vector.
According to an embodiment of the disclosure, the at least one first viewing angle may further represent a viewing angle corresponding to at least one of an X-axis direction, a Y-axis direction, and a Z-axis direction of the first preset three-dimensional rectangular coordinate system. The orientation of the first preset three-dimensional rectangular coordinate system may be set in a self-defined manner, which is not limited herein. The first conditional feature vector may be obtained by feature extraction of a first image projected in the respective directions.
According to an embodiment of the disclosure, for each first target point cloud, a first preset three-dimensional rectangular coordinate system may be combined, and the first target point cloud may be projected in at least one direction of an X-axis direction, a Y-axis direction, and a Z-axis direction of the first preset three-dimensional rectangular coordinate system, to obtain a first image corresponding to the first target point cloud. And then, a first conditional feature vector of the first three-dimensional point cloud which is initially represented under a plurality of view angles can be obtained by combining a feature extraction mode.
According to the embodiment of the disclosure, since the illumination effect and the represented texture feature of the same object at different angles may be different, at least one first target point cloud obtained after rotating the first three-dimensional point cloud by at least one first angle may correspond to at least one highlight feature and at least one texture feature. The resulting first conditional feature vector may include highlight features and texture features for the corresponding viewing angle.
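The rotation and axis-projection procedure described above can be illustrated with a short NumPy sketch; the rotation axis (Z), the fixed 20° step and the idea of dropping one coordinate to project along each axis of the preset coordinate system are assumptions made purely for illustration.

```python
import numpy as np

def rotate_z(points: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate an (N, 3) point cloud about the Z axis by angle_deg degrees."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
    return points @ rot.T

def axis_projections(points: np.ndarray):
    """Project the point cloud along the X, Y and Z axes of the preset
    three-dimensional rectangular coordinate system by dropping one coordinate."""
    return [points[:, [1, 2]],   # view along the X axis
            points[:, [0, 2]],   # view along the Y axis
            points[:, [0, 1]]]   # view along the Z axis

# First target point clouds: the first three-dimensional point cloud rotated by
# multiples of a fixed first angle (20 degrees in this illustration).
cloud = np.random.rand(1024, 3)
targets = [rotate_z(cloud, angle) for angle in range(0, 360, 20)]
views = [axis_projections(target) for target in targets]
# A feature extractor (e.g. a small CNN over rasterized views) would then turn
# these projections into the first conditional feature vector.
```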
According to an embodiment of the present disclosure, the acquiring the second conditional feature vector may include: and acquiring at least one second target point cloud obtained after rotating the second three-dimensional point cloud by at least one second angle. And extracting features of the second target point cloud to obtain a second conditional feature vector.
According to an embodiment of the present disclosure, the obtaining the second conditional feature vector may further include: and determining a second image obtained by projecting the second three-dimensional point cloud in at least one direction of the X-axis direction, the Y-axis direction and the Z-axis direction of the second preset three-dimensional rectangular coordinate system according to the second preset three-dimensional rectangular coordinate system and the three-dimensional point cloud information. And extracting the features of the second image to obtain a second conditional feature vector.
According to embodiments of the present disclosure, the second angle may have the same or similar features as the first angle described previously, and the second angle and the first angle may be the same or different. The second target point cloud may have the same or similar characteristics as the first target point cloud described previously. The second conditional feature vector may have the same or similar features as the first conditional feature vector described above. The second preset three-dimensional rectangular coordinate system may have the same or similar characteristics as the aforementioned first preset three-dimensional rectangular coordinate system, and the set orientation of the second preset three-dimensional rectangular coordinate system and the set orientation of the first preset three-dimensional rectangular coordinate system may be the same or different. For the description of the second angle, the second target point cloud, the second conditional feature vector, the second preset three-dimensional rectangular coordinate system, the second conditional feature vector, and the like, reference may be made to the foregoing embodiments for the first angle, the first target point cloud, the first conditional feature vector, the first preset three-dimensional rectangular coordinate system, the first conditional feature vector, and the like, which are not described herein again.
According to an embodiment of the present disclosure, since lighting effects and the exhibited texture features of the same object at different angles may be different, based on the foregoing one or more embodiments, at least one second target point cloud obtained after rotating the second three-dimensional point cloud by at least one second angle may correspond to at least one highlight feature and at least one texture feature. The resulting second conditional feature vector may include highlight features and texture features for the corresponding viewing angle.
It should be noted that the highlight features and texture features are merely exemplary descriptions, and other attribute features that can be obtained in the feature extraction process may also be included in the second conditional feature vector, which is not limited herein.
Through the embodiments of the present disclosure, the view angles along the X, Y and Z directions and the 360-degree rotation view angles can be decoupled, so that the feature representation of the three-dimensional point cloud at each view angle is obtained, more complete feature information can be learned, and the rendering effect is improved.
According to an embodiment of the present disclosure, after obtaining the first rendering feature vector and the second rendering feature vector based on the aforementioned method, the aforementioned operation S240 may include: and fusing the first rendering feature vector and the second rendering feature vector to obtain a fused feature vector. And determining object rendering information according to the fusion feature vector.
According to embodiments of the present disclosure, the first rendering feature vector and the second rendering feature vector may be added or multiplied to achieve fusion, so as to obtain the fused feature vector. The fused feature vector incorporates both the first attribute feature and the second attribute feature. The fused feature vector may be decoded to obtain object rendering information representing the first attribute feature and the second attribute feature.
For example, the first attribute feature is a texture feature characterized by diffuse reflection information and the second attribute feature is a highlight feature characterized by highlight information. The fused feature vector may include texture features and highlight features. The object rendering information may include texture information and highlight information.
It should be noted that the above-described addition or multiplication operation is only an exemplary embodiment, but is not limited thereto, and other methods of fusion calculation known in the art may be also included as long as determination of the fusion feature vector from the first rendering feature vector and the second rendering feature vector can be achieved.
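A minimal sketch of the fusion and decoding step, assuming element-wise addition (or multiplication) of the two rendering feature vectors followed by a small MLP decoder; the feature and output dimensions are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical fusion of the two rendering feature vectors followed by an MLP
# decoder that maps the fused vector to rendering information (e.g. per-point
# texture/diffuse colour plus a highlight term).
decoder = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 4))

first_vec = torch.randn(1024, 128)   # characterizes the first attribute feature
second_vec = torch.randn(1024, 128)  # characterizes the second attribute feature

fused_add = first_vec + second_vec          # additive fusion
fused_mul = first_vec * second_vec          # multiplicative fusion (alternative)
object_rendering_info = decoder(fused_add)  # e.g. RGB texture + highlight weight
```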
Through the embodiments of the present disclosure, rendering information that is independent of any specific object can be obtained by the point cloud-based rendering method, and such object rendering information can be applied to a variety of objects to achieve the rendering of those objects.
According to embodiments of the present disclosure, after object rendering information is obtained, object geometry information of the object may be first acquired. And then, rendering the object geometric information according to the object rendering information to obtain an object rendering result of the object.
According to embodiments of the present disclosure, the object geometry information may be in the form of a two-dimensional object image or a three-dimensional object model. The three-dimensional object model may be in the form of a three-dimensional object point cloud, and may not be limited thereto.
For example, when determining an object to be rendered, a three-dimensional object model of the object to be rendered may be obtained first. Then, first target object rendering information corresponding to each region to be rendered in the three-dimensional object model may be determined according to the object rendering information. And then, according to the first target object rendering information and the three-dimensional object model, obtaining a three-dimensional object rendering result of the object.
For example, when determining an object to be rendered, a two-dimensional object image of the object to be rendered may also be obtained first. Then, second target object rendering information corresponding to the two-dimensional object image may be determined according to the object rendering information. And then, according to the second target object rendering information and the two-dimensional object image, obtaining a rendering result of the two-dimensional object image, or obtaining an object rendering result of the object under the view angle represented by the two-dimensional object image.
According to the embodiment of the disclosure, based on the obtained object rendering information, the object is rendered, and the obtained object rendering result can be closer to the rendering effect of the real environment.
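As a simple illustration, and assuming per-point rendering information, the predicted rendering information can be attached to the object geometry before a renderer projects it to a chosen view; the shapes below are assumptions for illustration.

```python
import torch

# Hypothetical application of object rendering information to object geometry:
# each point of a three-dimensional object point cloud is paired with the
# rendering information (here a colour plus a highlight weight) predicted for it.
object_points = torch.rand(2048, 3)      # object geometry information (xyz)
rendering_info = torch.rand(2048, 4)     # per-point colour (3) + highlight (1)
rendered_points = torch.cat([object_points, rendering_info], dim=1)  # (2048, 7)
# A rasterizer or point-based renderer would then project these attributed points
# to a view to obtain the object rendering result.
```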
According to the embodiment of the disclosure, the deep learning model can be trained in combination with the implementation process, so that the rendering method based on the point cloud can be realized.
Fig. 3 schematically illustrates a flowchart of a training method of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 3, the method includes operations S310 to S370.
In operation S310, sample point cloud information of a sample object is input into a first neural network of a deep learning model, and a sample first deep learning feature characterizing a first attribute feature of the sample object is obtained, where the sample point cloud information has a real object geometric label.
In operation S320, the sample point cloud information is input into a second neural network of the deep learning model, resulting in sample second deep learning features that characterize second attribute features of the sample object.
In operation S330, the sample first deep learning feature is input into a third neural network of the deep learning model to obtain a sample first rendering feature vector.
In operation S340, the sample second deep learning feature is input into the fourth neural network of the deep learning model to obtain a sample second rendering feature vector.
In operation S350, the sample first rendering feature vector and the sample second rendering feature vector are input into a fifth neural network of the deep learning model to obtain sample object rendering information.
In operation S360, a sample object rendering result is determined according to the sample object rendering information and the real object geometric tag.
In operation S370, the deep learning model is trained according to the real object geometric label and the sample object rendering result, and a trained deep learning model is obtained.
According to embodiments of the present disclosure, the sample point cloud information may have the same or similar features as the aforementioned three-dimensional point cloud information. The sample first deep learning feature may have the same or similar features as the first deep learning feature described previously. The sample second deep learning feature may have the same or similar features as the aforementioned second deep learning feature. The sample first rendering feature vector may have the same or similar features as the aforementioned first rendering feature vector. The sample second rendering feature vector may have the same or similar features as the aforementioned second rendering feature vector. The sample object rendering information may have the same or similar characteristics as the aforementioned object rendering information. The real object geometric tags may have the same or similar characteristics as the object geometric information described previously. The sample object rendering results may have the same or similar characteristics as the aforementioned object rendering results. For the description of the embodiments of the sample point cloud information, the sample first deep learning feature, the sample second deep learning feature, the sample first rendering feature vector, the sample second rendering feature vector, the sample object rendering information, the real object geometric tag, and the sample object rendering result, reference may be made to the foregoing embodiments for the three-dimensional point cloud information, the first deep learning feature, the second deep learning feature, the first rendering feature vector, the second rendering feature vector, the object rendering information, the object geometric information, and the object rendering result, which are not described herein again.
According to embodiments of the present disclosure, both the first neural network and the second neural network may be constructed based on the CodeBook algorithm. For example, in the case where the first attribute feature is a diffuse reflection feature and the second attribute feature is a highlight feature, the first neural network may be constructed from an Albedo-CodeBook module and the second neural network from a Specular-CodeBook module. The third neural network and the fourth neural network may each be a U-Net network. The fifth neural network may be, for example, an MLP (multi-layer perceptron).
It should be noted that the above choices for the first, second, third, fourth and fifth neural networks are only exemplary; other networks known in the art may be used for each of them, as long as the corresponding method can be implemented. For the learning process of the first to fifth neural networks during training, reference may be made to the description of the point cloud based rendering method embodiments, which is not repeated here.
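To make the data flow between the five networks concrete, the sketch below wires up stand-in PyTorch modules in the layout described above. The codebook size, feature dimensions, per-point input layout (position, pseudo-normal, diffuse colour) and the replacement of the U-Nets by small point-wise convolution blocks are all assumptions made to keep the example short; they do not reproduce the exact networks of the disclosure.

```python
import torch
from torch import nn

class CodebookEncoder(nn.Module):
    """Stand-in for the Albedo-/Specular-CodeBook modules: projects per-point
    inputs and snaps them to the nearest entry of a learned codebook."""
    def __init__(self, in_dim=9, code_dim=64, num_codes=512):
        super().__init__()
        self.proj = nn.Linear(in_dim, code_dim)
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, x):                                   # x: (B, N, in_dim)
        z = self.proj(x)                                    # (B, N, code_dim)
        dist = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)  # (B, N, num_codes)
        return self.codebook(dist.argmin(dim=-1))           # quantised deep learning feature

class PointCloudRenderingModel(nn.Module):
    """First to fifth neural networks, with 1x1 convolutions standing in for the U-Nets."""
    def __init__(self, code_dim=64, out_dim=3):
        super().__init__()
        self.albedo_codebook = CodebookEncoder()            # first neural network
        self.specular_codebook = CodebookEncoder()          # second neural network
        self.albedo_net = nn.Sequential(                    # third neural network (U-Net stand-in)
            nn.Conv1d(code_dim, code_dim, 1), nn.ReLU(), nn.Conv1d(code_dim, code_dim, 1))
        self.specular_net = nn.Sequential(                  # fourth neural network (U-Net stand-in)
            nn.Conv1d(code_dim, code_dim, 1), nn.ReLU(), nn.Conv1d(code_dim, code_dim, 1))
        self.mlp = nn.Sequential(                           # fifth neural network
            nn.Linear(2 * code_dim, 128), nn.ReLU(), nn.Linear(128, out_dim), nn.Sigmoid())

    def forward(self, point_cloud):                         # (B, N, 9): xyz + pseudo-normal + diffuse
        f1 = self.albedo_codebook(point_cloud)              # first deep learning feature
        f2 = self.specular_codebook(point_cloud)            # second deep learning feature
        v1 = self.albedo_net(f1.transpose(1, 2)).transpose(1, 2)    # first rendering feature vector
        v2 = self.specular_net(f2.transpose(1, 2)).transpose(1, 2)  # second rendering feature vector
        fused = torch.cat([v1, v2], dim=-1)                 # fuse the two rendering feature vectors
        return self.mlp(fused)                              # per-point object rendering information
```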
According to embodiments of the present disclosure, during training an RGB loss, a perceptual loss, a mask loss, an eikonal-sdf loss and the like may be constructed to supervise the networks, and training is completed once these losses converge. The RGB supervision constrains the sample object rendering result to approach the appearance of the sample object in a real environment. The perceptual loss establishes a coarse-grained constraint between the sample object rendering result and the real object geometric label, and may be used to constrain renderings from different perspectives to describe the same sample object. The mask loss constrains the semantic information of the sample rendering result for a given perspective to approach the semantic information of the real object geometric label for the same perspective. The eikonal-sdf loss constrains the surface characterized by the sample rendering result to tend to be smooth.
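The sketch below combines the four supervision terms in the way described above. The weights, the L1/BCE choices and the tensor layouts are assumptions; a real training setup would obtain the signed distance field gradients and the perceptual distance (e.g. from a pretrained feature extractor) from the actual pipeline.

```python
import torch
import torch.nn.functional as F

def total_training_loss(pred_rgb, gt_rgb, pred_mask, gt_mask, sdf_gradients,
                        perceptual_fn=None,
                        w_rgb=1.0, w_perc=0.1, w_mask=0.5, w_eik=0.1):
    """Weighted sum of RGB, perceptual, mask and eikonal-sdf supervision.

    pred_rgb / gt_rgb:   (B, 3, H, W) rendered image vs. real-environment image.
    pred_mask / gt_mask: (B, 1, H, W) predicted vs. ground-truth silhouettes in [0, 1].
    sdf_gradients:       (M, 3) gradients of the signed distance field at sampled points.
    perceptual_fn:       optional callable returning a feature-space distance.
    """
    rgb_loss = F.l1_loss(pred_rgb, gt_rgb)                           # RGB supervision
    mask_loss = F.binary_cross_entropy(pred_mask, gt_mask)           # same-view semantic consistency
    eikonal_loss = ((sdf_gradients.norm(dim=-1) - 1.0) ** 2).mean()  # encourages a smooth surface
    perc_loss = (perceptual_fn(pred_rgb, gt_rgb) if perceptual_fn is not None
                 else torch.zeros((), device=pred_rgb.device))       # coarse-grained cross-view constraint
    return w_rgb * rgb_loss + w_perc * perc_loss + w_mask * mask_loss + w_eik * eikonal_loss
```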
Through the embodiments of the present disclosure, a model suitable for carrying out the point cloud based rendering method can be obtained; the model is highly extensible and can be applied to a variety of avatar generation scenarios.
According to an embodiment of the present disclosure, before operations S310 and S320 are performed, sample point cloud information may first be acquired. This may include: acquiring sparse point cloud information of the sample object; upsampling the sparse point cloud information to obtain dense point cloud information of the sample object; and determining the sample point cloud information from at least one of the sparse point cloud information and the dense point cloud information.
According to embodiments of the present disclosure, the above upsampling may be implemented based on, for example, a KNN-mean algorithm. The position of each point in the sparse point cloud is first determined from the sparse point cloud information. The average position of several neighbouring points is then computed, giving the position of an insertion point. Inserting new points at these insertion points yields a denser point cloud, which completes one upsampling pass. In this process, the pseudo-normal information and the diffuse reflection information of a newly inserted point may be set to the averages of the pseudo-normal information and the diffuse reflection information of the neighbouring original points.
It should be noted that the upsampling may be performed multiple times, each pass producing dense point cloud information of higher density. Each training stage may use one or more point clouds of different densities.
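A minimal sketch of one KNN-mean upsampling pass follows, under the assumption that each point carries its pseudo-normal and diffuse colour as an attribute vector; the value of k and the one-new-point-per-original-point policy are illustrative choices, not values taken from the disclosure.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_mean_upsample(points, attributes, k=4):
    """Insert one new point per original point at the mean position of its k
    nearest neighbours; the attributes (e.g. pseudo-normal, diffuse colour) of
    each new point are the means of the neighbours' attributes.

    points:     (N, 3) sparse point cloud positions.
    attributes: (N, C) per-point attributes carried along with each point.
    Returns the densified (2N, 3) positions and (2N, C) attributes.
    """
    tree = cKDTree(points)
    # Query k + 1 neighbours because each point is its own nearest neighbour.
    _, idx = tree.query(points, k=k + 1)
    neighbour_idx = idx[:, 1:]                          # drop the point itself
    new_points = points[neighbour_idx].mean(axis=1)     # position mean -> insertion point
    new_attrs = attributes[neighbour_idx].mean(axis=1)  # pseudo-normal / diffuse mean
    return (np.concatenate([points, new_points], axis=0),
            np.concatenate([attributes, new_attrs], axis=0))
```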
Fig. 4 schematically illustrates a schematic diagram of a training process of a deep learning model according to an embodiment of the present disclosure.
As shown in FIG. 4, the deep learning model 400 includes an Albedo-Codebook module 410, a Specular-Codebook module 420, a first U-Net network 430, a second U-Net network 440, and an MLP 450. Initial point cloud information 401 may be acquired for a sample object. Sparse point cloud information 402 may be obtained by denoising and coordinate normalization of the initial point cloud information, and first dense point cloud information 403 may be obtained by upsampling the sparse point cloud information 402. Second dense point cloud information 404 may in turn be obtained by upsampling the sparse point cloud information 402 or the first dense point cloud information 403.
For example, at the start of training, the sparse point cloud information 402 may be input into the Albedo-Codebook module 410 and the Specular-Codebook module 420, yielding a first deep learning feature 412 characterizing diffuse reflection information and a second deep learning feature 422 characterizing highlight information, respectively. The two deep learning features are then processed by the first U-Net network 430 and the second U-Net network 440, respectively, to obtain a first rendering feature vector 432 containing diffuse reflection information and a second rendering feature vector 442 containing highlight information. FIG. 4 shows a first rendering effect 4321 for the sample object geometry information 409 obtained from the first rendering feature vector, and a second rendering effect 4421 obtained from the second rendering feature vector. The object rendering information output by the deep learning model in the current training stage is obtained by fusing the first rendering feature vector 432 and the second rendering feature vector 442 and processing the fused feature vector with the MLP 450; an object rendering result 452 for the sample object geometry information 409 based on this object rendering information is also shown in FIG. 4. Parameters of the deep learning model 400 are then adjusted according to the object rendering result 452 and the real object geometric information acquired for the sample object, until the output of the model trained on the sparse point cloud information 402 converges, which completes this training stage. The same procedure may then be carried out with the first dense point cloud information 403 to obtain the model of the next stage, and again with the second dense point cloud information 404 for the stage after that; the process may be repeated for further stages.
It should be noted that, for each training stage, at least two of the sparse point cloud information 402, the first dense point cloud information 403, the second dense point cloud information 404, and point clouds of other sparsities or densities may be used together as the sample point cloud information in the above training process. The sample object geometry information 409 may be the information represented by the real object geometric label, which is not limited here.
According to an embodiment of the present disclosure, as shown in FIG. 4, during each training stage a first conditional feature vector 439 obtained by the aforementioned method may additionally be input into the first U-Net network 430, and a second conditional feature vector 449 into the second U-Net network 440, so that a more complete feature expression is learned during training and the output accuracy of the trained deep learning model is improved.
Through the embodiments of the present disclosure, the deep learning model is trained on point cloud information of different sparsities and therefore produces accurate outputs for point clouds of different sparsities; this effectively improves the generalization capability of the model and allows it to be applied to point clouds of various sparsities.
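Putting the pieces together, the hypothetical driver below trains the same model stage by stage on progressively denser sample point clouds. The gt_renderer callable (which turns per-point rendering information into an image and a mask for the supervised view), the fixed epoch budget per stage, and the use of a loss function such as the total_training_loss sketch above are all assumptions.

```python
import torch

def train_coarse_to_fine(model, stages, gt_renderer, loss_fn, epochs_per_stage=100, lr=1e-3):
    """Train the same model stage by stage on progressively denser point clouds.

    stages:      list of (point_cloud, gt_image, gt_mask, sdf_gradients) tuples
                 ordered from sparse to dense, all describing the same sample object.
    gt_renderer: callable turning (point_cloud, rendering_info) into (image, mask).
    loss_fn:     callable such as the total_training_loss sketch above.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for stage, (point_cloud, gt_image, gt_mask, sdf_gradients) in enumerate(stages):
        for _ in range(epochs_per_stage):
            rendering_info = model(point_cloud)                       # per-point colours
            pred_image, pred_mask = gt_renderer(point_cloud, rendering_info)
            loss = loss_fn(pred_image, gt_image, pred_mask, gt_mask, sdf_gradients)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"stage {stage} finished, last loss {loss.item():.4f}")
    return model
```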
According to an embodiment of the present disclosure, after the above-described trained deep learning model is obtained, a point cloud-based rendering method may be performed based on the trained deep learning model.
Fig. 5 schematically illustrates a flowchart of a method for implementing a point cloud based rendering based on a deep learning model according to an embodiment of the present disclosure.
As shown in FIG. 5, the method includes operations S510-S530.
In operation S510, the object point cloud information to be processed is determined according to the object geometric information of the object to be processed.
In operation S520, the object point cloud information to be processed is input into the deep learning model to obtain rendering information of the object to be processed.
In operation S530, rendering is performed on the geometric information of the object to be processed according to the rendering information of the object to be processed, so as to obtain a rendering result of the object to be processed.
According to embodiments of the present disclosure, the object geometry information to be processed may have the same or similar features as the aforementioned object geometry information. The object point cloud information to be processed may have the same or similar features as the aforementioned three-dimensional point cloud information. The object rendering information to be processed may have the same or similar features as the aforementioned object rendering information. The object rendering result to be processed may have the same or similar features as the aforementioned object rendering result. For descriptions of the object geometry information to be processed, the object point cloud information to be processed, the object rendering information to be processed, the object rendering result to be processed, and operations S510 to S530, reference may be made to the foregoing embodiments, which are not repeated here.
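A compact sketch of operations S510 to S530 follows. The two helper callables, which convert object geometry into point cloud information and apply the predicted rendering information back onto the geometry, are assumptions standing in for steps described elsewhere in the disclosure.

```python
import torch

def render_object(model, geometry, geometry_to_point_cloud, apply_rendering):
    """Point cloud based rendering with a trained deep learning model."""
    point_cloud = geometry_to_point_cloud(geometry)        # S510: object point cloud information
    with torch.no_grad():
        rendering_info = model(point_cloud)                # S520: rendering information to be processed
    return apply_rendering(geometry, rendering_info)       # S530: rendering result to be processed
```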
Through the embodiments of the present disclosure, the point cloud based rendering method is carried out by a trained deep learning model, so it can be easily migrated to different scenarios and offers strong extensibility.
According to an embodiment of the present disclosure, when an avatar needs to be generated, it may be generated based on the point cloud based rendering method described above, or based on the trained deep learning model that implements that method.
Fig. 6 schematically illustrates a flowchart of an avatar generation method according to an embodiment of the present disclosure.
As shown in fig. 6, the method includes operations S610 to S630.
In operation S610, target object point cloud information is determined according to target object geometric information of the target object.
In operation S620, the target object point cloud information is processed based on the point cloud based rendering method, to obtain target object rendering information.
In operation S630, the target object geometry information is rendered according to the target object rendering information, and an avatar of the target object is generated.
According to embodiments of the present disclosure, the target object geometry information may have the same or similar features as the aforementioned object geometry information. The target object point cloud information may have the same or similar characteristics as the aforementioned three-dimensional point cloud information. The target object rendering information may have the same or similar characteristics as the aforementioned object rendering information. For the description of the embodiments of the target object geometry information, the target object point cloud information, the target object rendering information, and the steps of operations S610 to S630, reference may be made to the foregoing embodiments, and details are not repeated herein.
According to an embodiment of the present disclosure, operation S630 may include: determining a target object rendering result according to the target object geometry information and the target object rendering information; and determining the target object rendering result as the avatar generated for the target object.
According to embodiments of the present disclosure, the target object rendering result may have the same or similar characteristics as the aforementioned object rendering result. For an embodiment description of the rendering result of the target object, reference may be made to the foregoing embodiment, and details are not repeated herein.
Through the above embodiments of the present disclosure, an avatar generation and driving method implemented by means of point cloud based rendering is provided, which offers significant advantages over other methods in terms of computing cost, hardware cost, terminal suitability, rendering engine adaptation, and convergence speed. It is suitable not only for metaverse virtual digital human interaction scenarios, but also for avatar generation on most current terminals, and is expected to become a standard form of multi-terminal digital human display and interaction in the metaverse.
Fig. 7 schematically illustrates a block diagram of a point cloud based rendering apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the point cloud based rendering apparatus 700 includes a deep learning feature obtaining module 710, a first feature enhancing module 720, a second feature enhancing module 730, and a rendering information obtaining module 740.
The deep learning feature obtaining module 710 is configured to obtain, according to the three-dimensional point cloud information of the object, a first deep learning feature that characterizes a first attribute feature of the object and a second deep learning feature that characterizes a second attribute feature of the object.
The first feature enhancement module 720 is configured to perform a first feature enhancement process on the first deep learning feature to obtain a first rendering feature vector.
And a second feature enhancement module 730, configured to perform a second feature enhancement process on the second deep learning feature to obtain a second rendering feature vector.
The rendering information obtaining module 740 is configured to obtain object rendering information of the object according to the first rendering feature vector and the second rendering feature vector.
According to an embodiment of the present disclosure, the three-dimensional point cloud information includes at least one of: sparse point cloud information and dense point cloud information.
According to an embodiment of the present disclosure, the first attribute features comprise diffuse reflectance features and the second attribute features comprise highlight features.
According to an embodiment of the present disclosure, the first feature enhancement module includes a first conditional feature vector acquisition unit and a first convolution unit.
The first conditional feature vector obtaining unit is configured to obtain a first conditional feature vector, where the first conditional feature vector characterizes features of a first three-dimensional point cloud represented by at least one first view angle, and the first three-dimensional point cloud is a point cloud represented by three-dimensional point cloud information.
The first convolution unit is configured to perform a first convolution processing on the first deep learning feature, with the first conditional feature vector as a constraint, to obtain the first rendering feature vector.
According to an embodiment of the present disclosure, the first conditional feature vector acquisition unit includes a first target point cloud acquisition subunit and a first feature extraction subunit.
The first target point cloud acquisition subunit is configured to acquire at least one first target point cloud obtained by rotating the first three-dimensional point cloud by at least one first angle; and
the first feature extraction subunit is configured to perform feature extraction on the first target point cloud to obtain the first conditional feature vector.
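For illustration, the sketch below rotates the first three-dimensional point cloud about the Z axis; in practice the at least one first angle and the rotation axis would be chosen by the pipeline, and a downstream feature extractor (not shown) would turn the rotated point clouds into the first conditional feature vector.

```python
import numpy as np

def rotate_point_cloud_z(points, angle_rad):
    """Rotate an (N, 3) point cloud about the Z axis by angle_rad radians."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return points @ rot.T

# One first target point cloud per first angle (the angles here are illustrative):
# first_target_clouds = [rotate_point_cloud_z(points, a) for a in (np.pi / 6, np.pi / 3)]
```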
According to an embodiment of the present disclosure, the first conditional feature vector obtaining unit includes a first image determining subunit and a second feature extracting subunit.
The first image determining subunit is configured to determine, according to the first preset three-dimensional rectangular coordinate system and the three-dimensional point cloud information, a first image obtained by projecting the first three-dimensional point cloud in at least one direction of an X-axis direction, a Y-axis direction, and a Z-axis direction of the first preset three-dimensional rectangular coordinate system.
And the second feature extraction subunit is used for carrying out feature extraction on the first image to obtain a first conditional feature vector.
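The projection variant can be sketched as below, assuming the point cloud is normalised to [-1, 1] in the first preset three-dimensional rectangular coordinate system; the binary occupancy images and the chosen resolution are illustrative simplifications of the projected first images.

```python
import numpy as np

def project_point_cloud(points, image_size=128):
    """Project an (N, 3) point cloud along the X, Y and Z axes of the preset
    coordinate system, producing one occupancy image per axis direction."""
    pixel = np.clip(((points + 1) / 2 * (image_size - 1)).astype(int), 0, image_size - 1)
    images = {}
    for axis, name in enumerate(("x", "y", "z")):
        kept = [i for i in range(3) if i != axis]          # coordinates kept in the image plane
        img = np.zeros((image_size, image_size), dtype=np.float32)
        img[pixel[:, kept[0]], pixel[:, kept[1]]] = 1.0
        images[name] = img
    return images
```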
According to an embodiment of the present disclosure, the second feature enhancement module includes a second conditional feature vector acquisition unit and a second convolution unit.
The second conditional feature vector acquisition unit is configured to acquire a second conditional feature vector, where the second conditional feature vector characterizes features of a second three-dimensional point cloud represented by at least one second view angle, and the second three-dimensional point cloud is the point cloud represented by the three-dimensional point cloud information.
The second convolution unit is configured to perform a second convolution processing on the second deep learning feature, with the second conditional feature vector as a constraint, to obtain the second rendering feature vector.
According to an embodiment of the present disclosure, the second conditional feature vector acquisition unit includes a second target point cloud acquisition subunit and a third feature extraction subunit.
The second target point cloud acquisition subunit is configured to acquire at least one second target point cloud obtained after rotating the second three-dimensional point cloud by at least one second angle.
And the third feature extraction subunit is used for carrying out feature extraction on the second target point cloud to obtain a second conditional feature vector.
According to an embodiment of the present disclosure, the second conditional feature vector obtaining unit includes a second image determining subunit and a fourth feature extracting subunit.
The second image determining subunit is configured to determine, according to the second preset three-dimensional rectangular coordinate system and the three-dimensional point cloud information, a second image obtained by projecting the second three-dimensional point cloud in at least one direction of an X-axis direction, a Y-axis direction, and a Z-axis direction of the second preset three-dimensional rectangular coordinate system.
And the fourth feature extraction subunit is used for carrying out feature extraction on the second image to obtain a second conditional feature vector.
According to an embodiment of the present disclosure, a rendering information obtaining module includes a fusion unit and a rendering information determining unit.
And the fusion unit is used for fusing the first rendering feature vector and the second rendering feature vector to obtain a fusion feature vector.
And the rendering information determining unit is used for determining object rendering information according to the fusion feature vector.
According to an embodiment of the disclosure, the point cloud based rendering device further includes a geometric information acquisition module and a first rendering module.
And the geometric information acquisition module is used for acquiring the object geometric information of the object.
The first rendering module is used for rendering the object geometric information according to the object rendering information to obtain an object rendering result of the object.
Fig. 8 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 8, the training apparatus 800 of the deep learning model includes a first network module 810, a second network module 820, a third network module 830, a fourth network module 840, a fifth network module 850, a third rendering result determination module 860, and a third training module 870.
The first network module 810 is configured to input sample point cloud information of a sample object into a first neural network of a deep learning model, and obtain a sample first deep learning feature characterizing a first attribute feature of the sample object, where the sample point cloud information has a real object geometric tag.
The second network module 820 is configured to input the sample point cloud information into a second neural network of the deep learning model, and obtain a sample second deep learning feature that characterizes a second attribute feature of the sample object.
The third network module 830 is configured to input the first sample deep learning feature into a third neural network of the deep learning model to obtain a first sample rendering feature vector.
The fourth network module 840 is configured to input the sample second deep learning feature into a fourth neural network of the deep learning model, to obtain a sample second rendering feature vector.
The fifth network module 850 is configured to input the sample first rendering feature vector and the sample second rendering feature vector into a fifth neural network of the deep learning model, to obtain sample object rendering information.
The rendering result determining module 860 is configured to determine a sample object rendering result according to the sample object rendering information and the real object geometric tag.
The training module 870 is configured to train the deep learning model according to the real object geometric label and the sample object rendering result, and obtain a trained deep learning model.
According to the embodiment of the disclosure, the training device of the deep learning model further comprises a sparse point cloud acquisition module, an up-sampling module and a sample point cloud determination module.
The sparse point cloud acquisition module is used for acquiring sparse point cloud information of the sample object.
And the up-sampling module is used for up-sampling the sparse point cloud information to obtain dense point cloud information of the sample object.
The sample point cloud determining module is used for determining sample point cloud information according to at least one of sparse point cloud information and dense point cloud information.
Fig. 9 schematically illustrates a block diagram of a point cloud based rendering apparatus according to an embodiment of the present disclosure.
As shown in fig. 9, the point cloud based rendering apparatus 900 includes a first point cloud information determination module 910, a deep learning module 920, and a second rendering module 930.
The first point cloud information determining module 910 is configured to determine object point cloud information to be processed according to geometric information of an object to be processed.
The deep learning module 920 is configured to input the object point cloud information to be processed into a deep learning model, so as to obtain rendering information of the object to be processed.
The second rendering module 930 is configured to render the geometric information of the object to be processed according to the rendering information of the object to be processed, to obtain a rendering result of the object to be processed. The deep learning model is trained using the above-described training apparatus for the deep learning model.
Fig. 10 schematically illustrates a block diagram of an avatar generating apparatus according to an embodiment of the present disclosure.
As shown in fig. 10, the avatar generating apparatus 1000 includes a second point cloud information determining module 1010, a processing module 1020, and a generating module 1030.
The second point cloud information determining module 1010 is configured to determine target object point cloud information according to target object geometric information of the target object.
The processing module 1020 is configured to process the target object point cloud information by using a point cloud-based rendering device, so as to obtain target object rendering information.
And the generating module 1030 is configured to render the geometric information of the target object according to the rendering information of the target object, and generate an avatar of the target object.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform at least one of the point cloud based rendering method, the deep learning model training method, and the avatar generation method of the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform at least one of the point cloud based rendering method, the training method of the deep learning model, and the avatar generation method of the present disclosure.
According to an embodiment of the present disclosure, a computer program product includes a computer program stored on a readable storage medium or in an electronic device, which, when executed by a processor, implements at least one of the point cloud based rendering method, the training method of the deep learning model, and the avatar generation method of the present disclosure.
FIG. 11 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the apparatus 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
Various components in device 1100 are connected to an input/output (I/O) interface 1105, including: an input unit 1106 such as a keyboard, a mouse, etc.; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, etc.; and a communication unit 1109 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 1101 performs the respective methods and processes described above, such as at least one of a point cloud-based rendering method, a training method of a deep learning model, and an avatar generation method. For example, in some embodiments, at least one of the point cloud based rendering method, the deep learning model training method, and the avatar generation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, some or all of the computer programs may be loaded and/or installed onto device 1100 via ROM 1102 and/or communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of at least one of the above-described point cloud-based rendering method, the training method of the deep learning model, and the avatar generation method may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform at least one of a point cloud based rendering method, a training method of a deep learning model, and an avatar generation method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (33)

1. A point cloud based rendering method, comprising:
according to the three-dimensional point cloud information of the object, obtaining a first deep learning feature representing a first attribute feature of the object and a second deep learning feature representing a second attribute feature of the object;
performing first feature enhancement processing on the first deep learning features to obtain first rendering feature vectors;
performing second feature enhancement processing on the second deep learning features to obtain second rendering feature vectors; and
And obtaining object rendering information of the object according to the first rendering feature vector and the second rendering feature vector.
2. The method of claim 1, wherein the three-dimensional point cloud information comprises at least one of: sparse point cloud information and dense point cloud information.
3. The method of claim 1 or 2, wherein the first attribute features comprise diffuse reflectance features and the second attribute features comprise highlight features.
4. The method of any of claims 1-3, wherein the performing a first feature enhancement process on the first deep learning feature to obtain a first rendered feature vector comprises:
acquiring a first conditional feature vector, wherein the first conditional feature vector characterizes features of a first three-dimensional point cloud represented by at least one first view angle, and the first three-dimensional point cloud is a point cloud characterized by the three-dimensional point cloud information; and
performing a first convolution processing on the first deep learning feature, with the first conditional feature vector as a constraint, to obtain the first rendering feature vector.
5. The method of claim 4, wherein the obtaining a first conditional feature vector comprises:
Acquiring at least one first target point cloud obtained after rotating the first three-dimensional point cloud by at least one first angle; and
and extracting the characteristics of the first target point cloud to obtain the first conditional feature vector.
6. The method of claim 4 or 5, wherein the acquiring a first conditional feature vector comprises:
determining a first image obtained by projecting the first three-dimensional point cloud in at least one direction of an X-axis direction, a Y-axis direction and a Z-axis direction of a first preset three-dimensional rectangular coordinate system according to the first preset three-dimensional rectangular coordinate system and the three-dimensional point cloud information; and
and extracting the characteristics of the first image to obtain the first conditional characteristic vector.
7. The method of any of claims 1-6, wherein performing a second feature enhancement process on the second deep learning feature to obtain a second rendered feature vector comprises:
acquiring a second conditional feature vector, wherein the second conditional feature vector characterizes features of a second three-dimensional point cloud represented by at least one second view angle, and the second three-dimensional point cloud is a point cloud characterized by the three-dimensional point cloud information; and
performing a second convolution processing on the second deep learning feature, with the second conditional feature vector as a constraint, to obtain the second rendering feature vector.
8. The method of claim 7, wherein the obtaining a second conditional feature vector comprises:
acquiring at least one second target point cloud obtained after rotating the second three-dimensional point cloud by at least one second angle; and
and extracting the characteristics of the second target point cloud to obtain the second conditional feature vector.
9. The method of claim 7 or 8, wherein the acquiring a second conditional feature vector comprises:
determining a second image obtained by projecting the second three-dimensional point cloud in at least one direction of an X-axis direction, a Y-axis direction and a Z-axis direction of a second preset three-dimensional rectangular coordinate system according to the second preset three-dimensional rectangular coordinate system and the three-dimensional point cloud information; and
and extracting the characteristics of the second image to obtain the second conditional feature vector.
10. The method of any of claims 1-9, wherein the deriving object rendering information for the object from the first rendering feature vector and the second rendering feature vector comprises:
fusing the first rendering feature vector and the second rendering feature vector to obtain a fused feature vector; and
And determining the object rendering information according to the fusion feature vector.
11. The method of any of claims 1-10, further comprising:
obtaining object geometric information of the object; and
and rendering the object geometric information according to the object rendering information to obtain an object rendering result of the object.
12. A training method of a deep learning model, comprising:
inputting sample point cloud information of a sample object into a first neural network of a deep learning model to obtain a sample first deep learning feature representing a first attribute feature of the sample object, wherein the sample point cloud information has a real object geometric label;
inputting the sample point cloud information into a second neural network of the deep learning model to obtain sample second deep learning features representing second attribute features of the sample object;
inputting the first sample deep learning feature into a third neural network of the deep learning model to obtain a first sample rendering feature vector;
inputting the sample second deep learning feature into a fourth neural network of the deep learning model to obtain a sample second rendering feature vector;
Inputting the first rendering feature vector of the sample and the second rendering feature vector of the sample into a fifth neural network of the deep learning model to obtain sample object rendering information;
determining a sample object rendering result according to the sample object rendering information and the real object geometric label; and
and training the deep learning model according to the real object geometric label and the sample object rendering result to obtain a trained deep learning model.
13. The method of claim 12, further comprising: before the sample point cloud information of the sample object is input into the first neural network of the deep learning model to obtain a sample first deep learning feature representing a first attribute feature of the sample object, and the sample point cloud information is input into the second neural network of the deep learning model to obtain a sample second deep learning feature representing a second attribute feature of the sample object,
acquiring sparse point cloud information of the sample object;
upsampling the sparse point cloud information to obtain dense point cloud information of the sample object; and
and determining the sample point cloud information according to at least one of the sparse point cloud information and the dense point cloud information.
14. A point cloud based rendering method, comprising:
determining object point cloud information to be processed according to the geometric information of the object to be processed;
inputting the point cloud information of the object to be processed into a deep learning model to obtain rendering information of the object to be processed; and
rendering the geometric information of the object to be processed according to the rendering information of the object to be processed to obtain a rendering result of the object to be processed;
wherein the deep learning model is trained using the method according to any one of claims 12-13.
15. An avatar generation method, comprising:
determining target object point cloud information according to target object geometric information of the target object;
processing the target object point cloud information based on the method of claims 1-11 or claim 14 to obtain target object rendering information; and
and rendering the geometric information of the target object according to the rendering information of the target object to generate the virtual image of the target object.
16. A point cloud based rendering device, comprising:
the deep learning feature obtaining module is used for obtaining a first deep learning feature representing a first attribute feature of the object and a second deep learning feature representing a second attribute feature of the object according to the three-dimensional point cloud information of the object;
The first feature enhancement module is used for carrying out first feature enhancement processing on the first deep learning features to obtain first rendering feature vectors;
the second feature enhancement module is used for carrying out second feature enhancement processing on the second deep learning features to obtain second rendering feature vectors; and
and the rendering information obtaining module is used for obtaining object rendering information of the object according to the first rendering feature vector and the second rendering feature vector.
17. The apparatus of claim 16, wherein the three-dimensional point cloud information comprises at least one of: sparse point cloud information and dense point cloud information.
18. The apparatus of claim 16 or 17, wherein the first attribute feature comprises a diffuse reflectance feature and the second attribute feature comprises a highlight feature.
19. The apparatus of any of claims 16-18, wherein the first feature enhancement module comprises:
the first conditional feature vector acquisition unit is used for acquiring a first conditional feature vector, wherein the first conditional feature vector represents the feature represented by a first three-dimensional point cloud in at least one first view angle, and the first three-dimensional point cloud is the point cloud represented by the three-dimensional point cloud information; and
the first convolution unit is configured to perform a first convolution processing on the first deep learning feature, with the first conditional feature vector as a constraint, to obtain the first rendering feature vector.
20. The apparatus of claim 19, wherein the first conditional feature vector acquisition unit comprises:
a first target point cloud obtaining subunit, configured to obtain at least one first target point cloud obtained after rotating the first three-dimensional point cloud by at least one first angle; and
and the first feature extraction subunit is used for extracting features of the first target point cloud to obtain the first conditional feature vector.
21. The apparatus according to claim 19 or 20, wherein the first conditional feature vector acquisition unit includes:
the first image determining subunit is used for determining a first image obtained by projecting the first three-dimensional point cloud in at least one direction of the X-axis direction, the Y-axis direction and the Z-axis direction of a first preset three-dimensional rectangular coordinate system according to the first preset three-dimensional rectangular coordinate system and the three-dimensional point cloud information; and
and the second feature extraction subunit is used for carrying out feature extraction on the first image to obtain the first conditional feature vector.
22. The apparatus of any of claims 16-21, wherein the second feature enhancement module comprises:
a second conditional feature vector obtaining unit, configured to obtain a second conditional feature vector, where the second conditional feature vector characterizes features of a second three-dimensional point cloud that are represented by the three-dimensional point cloud information in at least one second view angle; and
the second convolution unit is configured to perform a second convolution processing on the second deep learning feature, with the second conditional feature vector as a constraint, to obtain the second rendering feature vector.
23. The apparatus of claim 22, wherein the second conditional feature vector acquisition unit comprises:
a second target point cloud obtaining subunit, configured to obtain at least one second target point cloud obtained after rotating the second three-dimensional point cloud by at least one second angle; and
and the third feature extraction subunit is used for carrying out feature extraction on the second target point cloud to obtain the second conditional feature vector.
24. The apparatus according to claim 22 or 23, wherein the second conditional feature vector acquisition unit includes:
A second image determining subunit, configured to determine, according to a second preset three-dimensional rectangular coordinate system and the three-dimensional point cloud information, a second image obtained by projecting the second three-dimensional point cloud in at least one direction of an X-axis direction, a Y-axis direction, and a Z-axis direction of the second preset three-dimensional rectangular coordinate system; and
and the fourth feature extraction subunit is used for carrying out feature extraction on the second image to obtain the second conditional feature vector.
25. The apparatus of any of claims 16-24, wherein the rendering information obtaining module comprises:
the fusion unit is used for fusing the first rendering feature vector and the second rendering feature vector to obtain a fusion feature vector; and
and the rendering information determining unit is used for determining the object rendering information according to the fusion feature vector.
26. The apparatus of any of claims 16-25, further comprising:
the geometric information acquisition module is used for acquiring the object geometric information of the object; and
and the first rendering module is used for rendering the object geometric information according to the object rendering information to obtain an object rendering result of the object.
27. A training device for a deep learning model, comprising:
the first network module is used for inputting sample point cloud information of a sample object into a first neural network of the deep learning model to obtain a sample first deep learning feature representing a first attribute feature of the sample object, wherein the sample point cloud information has a real object geometric label;
the second network module is used for inputting the sample point cloud information into a second neural network of the deep learning model to obtain sample second deep learning features representing second attribute features of the sample object;
the third network module is used for inputting the first deep learning feature of the sample into a third neural network of the deep learning model to obtain a first rendering feature vector of the sample;
a fourth network module, configured to input the second deep learning feature of the sample into a fourth neural network of the deep learning model, to obtain a second rendering feature vector of the sample;
a fifth network module, configured to input the sample first rendering feature vector and the sample second rendering feature vector into a fifth neural network of the deep learning model, to obtain sample object rendering information;
The rendering result determining module is used for determining a sample object rendering result according to the sample object rendering information and the real object geometric tag; and
and the training module is used for training the deep learning model according to the real object geometric label and the sample object rendering result to obtain a trained deep learning model.
28. The apparatus of claim 27, further comprising:
the sparse point cloud acquisition module is used for acquiring sparse point cloud information of the sample object;
the up-sampling module is used for up-sampling the sparse point cloud information to obtain dense point cloud information of the sample object; and
and the sample point cloud determining module is used for determining the sample point cloud information according to at least one of the sparse point cloud information and the dense point cloud information.
29. A point cloud based rendering device, comprising:
the first point cloud information determining module is used for determining point cloud information of an object to be processed according to geometric information of the object to be processed;
the deep learning module is used for inputting the point cloud information of the object to be processed into a deep learning model to obtain rendering information of the object to be processed; and
The second rendering module is used for rendering the geometric information of the object to be processed according to the rendering information of the object to be processed to obtain a rendering result of the object to be processed;
wherein the deep learning model is trained using the apparatus according to any one of claims 27-28.
30. An avatar generation apparatus comprising:
the second point cloud information determining module is used for determining target object point cloud information according to the target object geometric information of the target object;
a processing module, configured to process the target object point cloud information based on the apparatus according to claims 16-26 or claim 29, to obtain target object rendering information; and
and the generating module is used for rendering the geometric information of the target object according to the rendering information of the target object and generating the virtual image of the target object.
31. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-15.
32. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-15.
33. A computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, which, when executed by a processor, implements the method according to any one of claims 1-15.
CN202310802479.XA 2023-06-30 2023-06-30 Rendering, model training and virtual image generating method and device based on point cloud Active CN116843808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310802479.XA CN116843808B (en) 2023-06-30 2023-06-30 Rendering, model training and virtual image generating method and device based on point cloud

Publications (2)

Publication Number Publication Date
CN116843808A true CN116843808A (en) 2023-10-03
CN116843808B CN116843808B (en) 2024-06-18

Family

ID=88161149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310802479.XA Active CN116843808B (en) 2023-06-30 2023-06-30 Rendering, model training and virtual image generating method and device based on point cloud

Country Status (1)

Country Link
CN (1) CN116843808B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210209340A1 (en) * 2019-09-03 2021-07-08 Zhejiang University Methods for obtaining normal vector, geometry and material of three-dimensional objects based on neural network
CN114820905A (en) * 2022-06-24 2022-07-29 北京百度网讯科技有限公司 Virtual image generation method and device, electronic equipment and readable storage medium
CN114913224A (en) * 2021-02-07 2022-08-16 浙江舜宇智能光学技术有限公司 Composition method for mobile robot based on visual SLAM
US20220351456A1 (en) * 2019-07-02 2022-11-03 Pcms Holdings, Inc. System and method for sparse distributed rendering
CN115409931A (en) * 2022-10-31 2022-11-29 苏州立创致恒电子科技有限公司 Three-dimensional reconstruction method based on image and point cloud data fusion
US20220383041A1 (en) * 2021-05-23 2022-12-01 Jingdong Digits Technology Holding Co., Ltd. Data augmentation for object detection via differential neural rendering
CN115937394A (en) * 2022-12-05 2023-04-07 百果园技术(新加坡)有限公司 Three-dimensional image rendering method and system based on nerve radiation field
CN116228943A (en) * 2023-05-10 2023-06-06 深圳市腾讯计算机***有限公司 Virtual object face reconstruction method, face reconstruction network training method and device
CN116245998A (en) * 2023-05-09 2023-06-09 北京百度网讯科技有限公司 Rendering map generation method and device, and model training method and device

Also Published As

Publication number Publication date
CN116843808B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
JP7373554B2 (en) Cross-domain image transformation
JP7142162B2 (en) Posture variation 3D facial attribute generation
CN113327278B (en) Three-dimensional face reconstruction method, device, equipment and storage medium
CN115345980B (en) Generation method and device of personalized texture map
CN112785674A (en) Texture map generation method, rendering method, device, equipment and storage medium
CN114820905B (en) Virtual image generation method and device, electronic equipment and readable storage medium
CN114549710A (en) Virtual image generation method and device, electronic equipment and storage medium
CN116612204B (en) Image generation method, training device, electronic equipment and storage medium
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN114708374A (en) Virtual image generation method and device, electronic equipment and storage medium
CN113052962A (en) Model training method, information output method, device, equipment and storage medium
CN113870439A (en) Method, apparatus, device and storage medium for processing image
CN113962845B (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN115965735B (en) Texture map generation method and device
CN115775300B (en) Human body model reconstruction method, human body model reconstruction training method and device
CN115761123B (en) Three-dimensional model processing method, three-dimensional model processing device, electronic equipment and storage medium
CN116863078A (en) Three-dimensional human body model reconstruction method, three-dimensional human body model reconstruction device, electronic equipment and readable medium
CN116843808B (en) Rendering, model training and virtual image generating method and device based on point cloud
CN114549303B (en) Image display method, image processing method, image display device, image processing apparatus, image display device, image processing program, and storage medium
CN115082298A (en) Image generation method, image generation device, electronic device, and storage medium
CN116012666B (en) Image generation, model training and information reconstruction methods and devices and electronic equipment
CN116385643B (en) Virtual image generation method, virtual image model training method, virtual image generation device, virtual image model training device and electronic equipment
CN113781653A (en) Object model generation method and device, electronic equipment and storage medium
CN116843807A (en) Virtual image generation method, virtual image model training method, virtual image generation device, virtual image model training device and electronic equipment
CN114820908B (en) Virtual image generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant