CN117496036A - Method and device for generating texture map, electronic equipment and storage medium


Info

Publication number: CN117496036A
Application number: CN202311532994.7A
Authority: CN (China)
Legal status: Pending
Original language: Chinese (zh)
Inventors: 曾仙芳, 祁忠琪, 杨凌波, 陈欣
Applicant and current assignee: Tencent Technology (Shenzhen) Co., Ltd.


Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06T: Image data processing or generation, in general
    • G06T 15/00: 3D [three-dimensional] image rendering
    • G06T 15/04: Texture mapping
    • G06V: Image or video recognition or understanding
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features


Abstract

The present disclosure relates to the field of computer technologies, and in particular to a method and apparatus for generating a texture map, an electronic device, and a storage medium, which are used to improve the efficiency of producing texture maps for three-dimensional objects. The method comprises: acquiring a three-dimensional object white model and texture description information corresponding to an object to be processed; performing image rendering on the three-dimensional object white model from a plurality of preset acquisition view angles to obtain, under each acquisition view angle, a normal image, a depth image and a view angle correspondence map of the three-dimensional object white model; generating a target texture image for each acquisition view angle according to the depth image and view angle correspondence map under each acquisition view angle together with the texture description information; and performing texture fusion on the obtained target texture images according to the normal image under each acquisition view angle to obtain a target texture map corresponding to the object to be processed. Because texture fusion is performed on the target texture images from the plurality of acquisition view angles, the texture images generated under the plurality of acquisition view angles are consistent with one another.

Description

Method and device for generating texture map, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for generating a texture map, an electronic device, and a storage medium.
Background
The production of a three-dimensional object generally involves steps such as original art drawing, geometric modeling of the three-dimensional object, drawing of the three-dimensional object map, and skeleton binding. In the map-drawing step, a designer is required to draw the texture map manually, so the production cycle is long and the cost is high.
To address this problem, in the related art, noise data may be added to a rendered image on the basis of image depth information to obtain a noise image, and texture resources are initialized from the noise image; alternatively, texture analysis is performed on the texture mesh to extract defective texture resources, and the depth information of the three-dimensional object is used as guide information to optimize the texture resources under a specific view angle, thereby generating a texture image under that specific view angle.
However, because such methods generate the texture image under each single view angle independently, data information from different view angles cannot be fused, and the generated texture images therefore often suffer from inconsistent textures across multiple views.
In summary, how to achieve consistency of texture images across multiple view angles while improving the production efficiency of texture maps for three-dimensional objects is a problem that needs to be solved.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, an electronic device and a storage medium for generating a texture map, which are used to improve the production efficiency of texture maps for three-dimensional objects.
The method for generating a texture map provided by the embodiments of the present application comprises the following steps:
acquiring a three-dimensional object white model and texture description information corresponding to an object to be processed;
performing image rendering processing on the three-dimensional object white model according to a plurality of preset acquisition view angles, respectively, to obtain a normal image, a depth image and a view angle correspondence map of the three-dimensional object white model under each acquisition view angle; the normal image describes the normal direction of each surface element of the object to be processed, and the view angle correspondence map describes the index value of each surface element of the object to be processed under the corresponding acquisition view angle;
generating a target texture image for each acquisition view angle according to the depth image and the view angle correspondence map under each acquisition view angle and the texture description information;
and performing texture fusion processing on the obtained target texture images according to the normal image under each acquisition view angle to obtain a target texture map corresponding to the object to be processed.
The apparatus for generating a texture map provided by the embodiments of the present application comprises:
an acquisition unit, configured to acquire a three-dimensional object white model and texture description information corresponding to an object to be processed;
a rendering unit, configured to perform image rendering processing on the three-dimensional object white model according to a plurality of preset acquisition view angles, respectively, to obtain a normal image, a depth image and a view angle correspondence map of the three-dimensional object white model under each acquisition view angle; the normal image describes the normal direction of each surface element of the object to be processed, and the view angle correspondence map describes the index value of each surface element of the object to be processed under the corresponding acquisition view angle;
a texture generation unit, configured to generate a target texture image for each acquisition view angle according to the depth image and the view angle correspondence map under each acquisition view angle and the texture description information;
and a texture fusion unit, configured to perform texture fusion processing on the obtained target texture images according to the normal image under each acquisition view angle to obtain the target texture map corresponding to the object to be processed.
Optionally, the texture generation unit is specifically configured to:
perform at least one round of iterative denoising processing on the noisy texture image under each acquisition view angle according to the texture description information and the depth image and view angle correspondence map under each acquisition view angle, and take the fused texture image under each acquisition view angle obtained by the last round of iterative denoising processing as the target texture image under the corresponding view angle;
wherein the fused texture image is obtained by fusing the denoised features under the acquisition view angles based on the view angle correspondence maps under the acquisition view angles, and the depth images and the texture description information serve as denoising conditions for obtaining the denoised features during the iterative denoising processing.
Optionally, the texture generation unit is configured to perform each round of iterative denoising processing as follows:
for each acquisition view angle, denoising the noisy texture image under that acquisition view angle according to the texture description information and the depth image under that acquisition view angle to obtain the denoised features under that acquisition view angle, wherein the noisy texture image is obtained by random initialization during the first round of iterative denoising processing;
performing multi-view fusion processing on the denoised features under the acquisition view angles according to the view angle correspondence maps under the acquisition view angles to obtain a fused texture image under each acquisition view angle;
and taking the fused texture image under each acquisition view angle as the noisy texture image under the corresponding view angle in the next round of iterative denoising processing.
Optionally, each acquisition view angle corresponds to a trained diffusion model, and the texture generation unit is specifically configured to:
input the texture description information, the depth image under one acquisition view angle and the noisy texture image under that acquisition view angle into the corresponding diffusion model;
and extract deep neural network features from the texture description information, the depth image and the noisy texture image based on the diffusion model, and denoise the deep neural network features to obtain the denoised features under that acquisition view angle.
Optionally, the texture generation unit is specifically configured to:
perform multi-view fusion processing on the denoised features under the acquisition view angles based on a preset mapping relation to obtain candidate fusion features under each acquisition view angle;
construct an objective function based on the view angle correspondence map, the denoised features and the candidate fusion features under each acquisition view angle;
adjust the preset mapping relation by minimizing the objective function to obtain a target mapping relation;
perform multi-view fusion processing on the denoised features under the acquisition view angles based on the target mapping relation to obtain fused features under each acquisition view angle;
and for each acquisition view angle, perform feature decoding processing on the fused features under that acquisition view angle to obtain the fused texture image under that acquisition view angle.
Optionally, the texture generation unit is specifically configured to:
for each surface element, determine the surface element weight of that surface element under each acquisition view angle according to the view angle correspondence map under that acquisition view angle;
for each acquisition view angle, obtain the information difference between the denoised feature and the candidate fusion feature of each surface element under that acquisition view angle, and perform a weighted summation of the corresponding information differences according to the respective surface element weights under that acquisition view angle to obtain the information difference sum under that acquisition view angle;
and sum the information difference sums over the acquisition view angles to obtain the objective function.
Optionally, the texture generation unit is specifically configured to:
if it is determined, according to the view angle correspondence map under one acquisition view angle, that a surface element is visible under that acquisition view angle, determine the surface element weight of that surface element under that acquisition view angle to be a preset first surface element weight;
if it is determined, according to the view angle correspondence map under that acquisition view angle, that the surface element is not visible under that acquisition view angle, determine the surface element weight of that surface element under that acquisition view angle to be a preset second surface element weight;
wherein the first surface element weight is greater than the second surface element weight.
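To make the construction above concrete, the following minimal numpy sketch sums, over the acquisition view angles and surface elements, the weighted difference between each denoised feature and its candidate fusion feature, using the visibility-dependent surface element weights just described. It is one possible reading of the objective, not the embodiment's exact formulation; the function name, tensor layout and squared-difference measure are assumptions made for illustration.

```python
import numpy as np

def fusion_objective(denoised, candidates, visible, w_first=1.0, w_second=0.1):
    """Hypothetical objective for the multi-view fusion step.

    denoised:   (V, N, C) denoised feature of each of N surface elements under V views
    candidates: (V, N, C) candidate fusion feature of each surface element under each view
    visible:    (V, N) boolean visibility derived from the view angle correspondence maps
    w_first:    preset first surface element weight (visible)
    w_second:   preset second surface element weight (not visible), smaller than w_first
    """
    total = 0.0
    for v in range(denoised.shape[0]):
        # surface element weights under this acquisition view angle
        weights = np.where(visible[v], w_first, w_second)              # (N,)
        # information difference between denoised and candidate fusion features
        diff = np.sum((denoised[v] - candidates[v]) ** 2, axis=-1)     # (N,)
        # weighted summation gives the information difference sum for this view
        total += np.sum(weights * diff)
    # summing over the acquisition view angles gives the objective
    return total
```

The preset mapping relation would then be adjusted, for example by gradient descent, until such an objective is minimized, yielding the target mapping relation used to produce the fused features.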
Optionally, the texture fusion unit is specifically configured to:
for each acquisition view angle, perform inverse rendering processing on the texture image under that acquisition view angle to obtain a defect texture map with texture holes under that acquisition view angle, and perform inverse rendering processing on the normal image under that acquisition view angle to obtain a fusion weight set under that acquisition view angle, wherein the fusion weight set comprises the fusion weight of each surface element under that acquisition view angle;
and perform texture fusion processing on the defect texture maps under the acquisition view angles according to the fusion weight sets under the acquisition view angles to obtain the target texture map corresponding to the object to be processed.
Optionally, each acquisition view angle corresponds to one acquisition device, and the texture fusion unit is specifically configured to:
for each surface element under one acquisition view angle, determine the fusion weight of that surface element under that acquisition view angle according to the included angle between the normal direction of the surface element and the principal axis direction of the acquisition device, wherein the included angle is inversely related to the fusion weight;
and combine the fusion weights of the surface elements under that acquisition view angle to obtain the fusion weight set under that acquisition view angle.
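The inverse relation between the included angle and the fusion weight can be realized in many ways; the sketch below assumes a clamped cosine of the included angle, which is a common choice but only an illustrative assumption here, not a formula fixed by the embodiment.

```python
import numpy as np

def fusion_weight(normal, principal_axis):
    """Hypothetical fusion weight of one surface element under one acquisition view angle.

    normal:         (3,) unit normal direction of the surface element (from the normal image)
    principal_axis: (3,) unit vector along the acquisition device's principal axis,
                    pointing from the device toward the scene
    """
    # cosine of the included angle between the normal and the reversed principal axis:
    # a surface element facing the device head-on gets ~1, a grazing one gets ~0
    cos_angle = float(np.dot(normal, -np.asarray(principal_axis)))
    # clamp so that back-facing surface elements contribute nothing
    return max(cos_angle, 0.0)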
Optionally, the texture fusion unit is specifically configured to:
normalize the fusion weight sets under the acquisition view angles to obtain normalized weight sets under the corresponding view angles;
and perform texture fusion processing on the defect texture maps under the acquisition view angles based on the normalized weight sets under the acquisition view angles to obtain the target texture map corresponding to the object to be processed.
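Under the same illustrative assumptions, the normalization and fusion steps could be organized as follows: the per-texel weights are normalized across the acquisition view angles and used to blend the per-view defect texture maps into the complete target texture map, with texels missing in a view (texture holes) assumed to carry weight 0 there and therefore filled from views where they are visible. The data layout and function name are assumptions for illustration.

```python
import numpy as np

def fuse_defect_texture_maps(defect_maps, weight_sets, eps=1e-8):
    """Hypothetical weighted fusion of per-view defect texture maps in UV space.

    defect_maps: (V, H, W, 3) defect texture map obtained from each acquisition view angle;
                 texels not covered by a view are assumed to carry weight 0 in that view
    weight_sets: (V, H, W) fusion weight of each texel under each acquisition view angle
    """
    # normalize the fusion weights across the V acquisition view angles per texel
    normalized = weight_sets / (weight_sets.sum(axis=0, keepdims=True) + eps)
    # weighted blending of the defect texture maps yields the target texture map
    return np.sum(normalized[..., None] * defect_maps, axis=0)
```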
An electronic device provided in an embodiment of the present application includes a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to execute the steps of any one of the texture map generation methods described above.
Embodiments of the present application provide a computer readable storage medium including a computer program for causing an electronic device to execute the steps of any one of the above-described texture map generation methods when the computer program is run on the electronic device.
Embodiments of the present application provide a computer program product comprising a computer program stored in a computer readable storage medium; when the processor of the electronic device reads the computer program from the computer readable storage medium, the processor executes the computer program, so that the electronic device performs the steps of any one of the texture map generation methods described above.
The beneficial effects of the present application are as follows:
Embodiments of the present application provide a method, an apparatus, an electronic device and a storage medium for generating a texture map. In the embodiments of the present application, image rendering processing is performed on the three-dimensional object white model from a plurality of preset acquisition view angles to obtain a normal image, a depth image and a view angle correspondence map of the three-dimensional object under each of these acquisition view angles. The view angle correspondence maps reflect the correspondence of each surface element of the three-dimensional object surface across the different acquisition view angles; using them as guide information, and according to the depth images of the three-dimensional object under the acquisition view angles and the texture description information of the three-dimensional object input by the user, a target texture image is generated for each acquisition view angle.
Then, according to the normal images under the acquisition view angles, texture fusion is performed on the target texture images under the plurality of acquisition view angles to obtain the complete target texture map of the three-dimensional object. In this process, fusing the target texture images from the plurality of acquisition view angles according to the normal images generates, on the white-model surface of the three-dimensional object, a target texture map whose textures are consistent across the acquisition view angles; the texture holes that exist under individual acquisition view angles due to self-occlusion of the three-dimensional object are filled in, a complete texture map is obtained, and the production efficiency of three-dimensional object texture maps is improved.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a schematic diagram of multi-view texture inconsistency according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 3 is a flowchart of an implementation of a method for generating a texture map according to an embodiment of the present application;
FIG. 4 is a schematic visualization of a normal image according to an embodiment of the present application;
FIG. 5 is a schematic visualization of a depth image according to an embodiment of the present application;
FIG. 6 is a schematic visualization of a view angle correspondence map according to an embodiment of the present application;
FIG. 7 is a flowchart of an iterative denoising process according to an embodiment of the present application;
FIG. 8 is a schematic diagram of iterative denoising processing logic according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another iterative denoising processing logic according to an embodiment of the present application;
FIG. 10 is a flowchart of a multi-view fusion process provided by the present application;
FIG. 11 is a schematic diagram of an image diffusion module for multi-view feature fusion provided by the present application;
FIG. 12 is a schematic visualization of a fused texture image according to an embodiment of the present application;
FIG. 13 is a flowchart of generating a target texture map according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a defect texture map according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a texture fusion module according to an embodiment of the present application;
FIG. 16A is a schematic diagram of texture map generation according to an embodiment of the present application;
FIG. 16B is a logic diagram of texture map generation according to an embodiment of the present application;
FIG. 17 is a diagram of evaluation results of texture map generation according to an embodiment of the present application;
FIG. 18 is a schematic structural diagram of an apparatus for generating a texture map according to an embodiment of the present application;
FIG. 19 is a schematic diagram of a hardware composition structure of an electronic device according to an embodiment of the present application;
FIG. 20 is a schematic diagram of a hardware composition structure of another electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to the drawings of the embodiments of the present application. It is apparent that the described embodiments are only some, not all, of the embodiments of the technical solutions of the present application. All other embodiments obtained by a person of ordinary skill in the art without inventive effort on the basis of the embodiments described in the present application fall within the scope of protection of the technical solutions of the present application.
Some of the concepts involved in the embodiments of the present application are described below.
Diffusion model: also referred to as a diffusion probability model, a generative model in the field of machine learning that can be used to generate data such as text, speech and images; what is described herein is a diffusion model for generating image data. After the diffusion model has been trained on natural images via the reverse diffusion process, a new natural image can be generated by starting from a random noise image and denoising it iteratively.
Texture map: a two-dimensional image obtained by unfolding the surface textures of a three-dimensional model onto a plane through UV mapping, where U and V refer to the horizontal and vertical axes of the two-dimensional space. A texture map describes the texture information of all surfaces of the three-dimensional model.
White model (3D white model): a three-dimensional model whose surfaces are all white; its corresponding texture map is a pure white picture.
Rendering: the process in which a renderer generates a two-dimensional image under a specific view angle from information of a three-dimensional object such as geometry, texture and illumination. In the embodiments of the present application, the corresponding normal image, depth image and view angle correspondence map can be obtained by rendering the three-dimensional object white model.
Normal image: describes the normal direction of each surface element of the three-dimensional object.
Depth image: describes the depth value of each surface element of the three-dimensional object.
View angle correspondence map: describes the index value of each surface element of the three-dimensional object; the index value of a surface element determines the correspondence of that surface element across the acquisition view angles.
Inverse rendering: the inverse of rendering; given the image information under a specific view angle, the three-dimensional object information required to generate the image is estimated, which may be one or more of geometry, texture and illumination.
Texture description information: information describing the surface texture of the three-dimensional object, which may take the form of text, pictures and the like; texture description information includes, but is not limited to, the color, material, pattern and other information of the three-dimensional object.
Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing and machine learning/deep learning.
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics; research in this field involves natural language, i.e. the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graph techniques and the like.
Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
In the embodiments of the present application, the texture description information can be analyzed based on natural language processing and machine learning techniques, so that the noisy texture image is denoised under the guidance of the texture description information and a target texture image conforming to the texture description information is generated. This allows the user to conveniently complete the drawing of the three-dimensional object map and improves the production efficiency of three-dimensional object texture maps.
With the research and advancement of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned vehicles, autonomous driving, drones, robots, smart healthcare and smart customer service. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and realize increasingly important value.
In addition, the method for generating the texture map in the embodiment of the application also relates to database technology.
A database can be regarded as an electronic filing cabinet, i.e. a place for storing electronic files, in which users can add, query, update and delete data. A database is a collection of data that is stored together in a way that can be shared by multiple users, has as little redundancy as possible, and is independent of applications.
For example, the various images generated in the embodiments of the present application (normal images, depth images, view angle correspondence maps, target texture images, target texture maps, etc.) may be stored in a database for later use.
The design concept of the embodiments of the present application is briefly described below.
The production of a three-dimensional object generally involves steps such as original art drawing, geometric modeling of the three-dimensional object, drawing of the three-dimensional object map and skeleton binding. In the map-drawing step, a designer needs to draw the texture pattern on the surface of the three-dimensional object according to the original art draft; because this step must be completed manually, the production cycle is long and the cost is high.
To solve the above problem, in the related art, noise data may be added to a rendered image based on image depth information to obtain a noise image and texture resources are initialized from the noise image; alternatively, defective texture resources are extracted by performing texture analysis on the texture mesh, the depth information of the three-dimensional object is used as guide information, and the texture resources under a specific view angle are optimized using the image depth information to generate a texture image under that specific view angle.
However, in this manner only a single pair of texture resource and depth information can be processed in one optimization, and a texture image can only be generated independently for each single acquisition view angle; data information under different view angles cannot be fused, so when the generated texture images are applied to a three-dimensional object, the problem of inconsistent multi-view textures often arises.
Fig. 1 is a schematic diagram of multi-view texture inconsistency according to an embodiment of the present application. Fig. 1 shows the texture maps of a three-dimensional object under 3 acquisition view angles. Among these three texture maps, the texture of region S101 in texture image 1 is inconsistent with the texture of region S102 in texture image 2: judging from the lattice texture of region S102 in texture image 2, region S101 in texture image 1 does not correspond to it. Likewise, the texture of region S104 in texture image 3 is inconsistent with the texture of region S103 in texture image 2: judging from the lattice texture of region S103 in texture image 2, region S104 in texture image 3 does not fully correspond to it and contains only part of the lattice texture.
In view of this, embodiments of the present application provide a method, an apparatus, an electronic device and a storage medium for generating a texture map. In the embodiments of the present application, the view angle correspondence maps of the three-dimensional object under the plurality of acquisition view angles are used as guide information, and a target texture image is generated for each acquisition view angle according to the depth images of the three-dimensional object under the acquisition view angles and the texture description information of the three-dimensional object input by the user, so that the target texture images generated under the plurality of acquisition view angles are consistent with one another and conform to the texture description information input by the user. Then, according to the normal images under the acquisition view angles, texture fusion is performed on the target texture images under the acquisition view angles, and the complete target texture map of the three-dimensional object is obtained by weighted fusion of the target texture images under the acquisition view angles. In this way, according to the texture description information given by the user and the plurality of preset acquisition view angles, the target texture images under the plurality of acquisition view angles are fused and a target texture map whose textures are consistent across the acquisition view angles is generated on the surface of the white model of the three-dimensional object, thereby improving the production efficiency of three-dimensional object texture maps.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and are not intended to limit the present application, and embodiments and features of embodiments of the present application may be combined with each other without conflict.
Fig. 2 is a schematic view of an application scenario in an embodiment of the present application. The application scenario diagram includes two terminal devices 210 and a server 220.
In the embodiments of the present application, the terminal device 210 includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a desktop computer, an e-book reader, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal and the like. The terminal device may be provided with a client related to texture map generation; the client may be software (such as a browser or computer-aided design software), a web page, an applet, etc. The server 220 may be the background server corresponding to the software, web page or applet, or a server specifically used for generating texture maps, which is not specifically limited in this application. The server 220 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), big data and artificial intelligence platforms.
It should be noted that the method for generating a texture map in the embodiments of the present application may be performed by an electronic device, which may be the terminal device 210 or the server 220; that is, the method may be performed by the terminal device 210 or the server 220 alone, or by the terminal device 210 and the server 220 together. For example, when the terminal device 210 and the server 220 perform the method together, the user inputs the texture description information, the three-dimensional object white model and the plurality of preset acquisition view angles at the terminal device 210, and the terminal device 210 sends the input information to the server 220. The server 220 performs image rendering processing on the three-dimensional object white model according to the plurality of preset acquisition view angles to obtain the normal image, depth image and view angle correspondence map under each acquisition view angle; generates a target texture image for each acquisition view angle according to the depth images, the view angle correspondence maps and the texture description information; and then performs texture fusion processing on the obtained target texture images according to the normal images under the acquisition view angles to obtain the target texture map corresponding to the three-dimensional object. Finally, the server 220 may send the target texture map to the terminal device 210 for presentation to the user for viewing and use.
In an alternative embodiment, the terminal device 210 and the server 220 may communicate via a communication network.
In an alternative embodiment, the communication network is a wired network or a wireless network.
It should be noted that, the number of terminal devices and servers shown in fig. 2 is merely illustrative, and the number of terminal devices and servers is not limited in practice, and is not specifically limited in the embodiments of the present application.
In the embodiments of the present application, when there are multiple servers, the servers may be configured as a blockchain, with each server being a node on the blockchain. The image data involved in the method for generating a texture map according to the embodiments of the present application, such as the normal images, depth images, view angle correspondence maps, target texture images and target texture maps, may be stored on the blockchain.
In addition, the embodiment of the application can be applied to various scenes, including not only 3D content generation scenes, but also scenes such as cloud technology, artificial intelligence, intelligent transportation, driving assistance and the like.
The method for generating texture maps according to the exemplary embodiments of the present application will be described below with reference to the accompanying drawings in conjunction with the application scenario described above, and it should be noted that the application scenario described above is only shown for the convenience of understanding the spirit and principles of the present application, and embodiments of the present application are not limited in this respect.
Referring to fig. 3, which is a flowchart of an implementation of the method for generating a texture map according to an embodiment of the present application, with the server as the execution body, the specific implementation flow of the method comprises steps S31 to S34:
S31: acquire a three-dimensional object white model and texture description information corresponding to an object to be processed.
The object to be processed may be a three-dimensional object in any scene (such as a game or a Virtual Reality (VR) scene), which is not specifically limited herein. For example, the object to be processed may be a character, a building or the like in a game scene.
The three-dimensional object white model is a three-dimensional model of the object to be processed whose surfaces are all white.
In the embodiments of the present application, the texture description information describes the surface texture of the three-dimensional object. Taking textual description information as an example, it includes, but is not limited to, information such as the color, material and pattern of the surface of the three-dimensional object; for example, the texture description information may be "yellow silk material with a blue vertical stripe pattern".
In the embodiments of the present application, for the object to be processed, the corresponding three-dimensional object white model can be obtained through modeling software, digital sculpting software and the like, and the corresponding texture description information can be obtained by means of manual input by the user, 3D scanning and the like.
Of course, the three-dimensional object white model and the texture description information corresponding to the object to be processed can also be obtained in other ways; the above is only a simple example and is not described in detail here.
S32: perform image rendering processing on the three-dimensional object white model according to a plurality of preset acquisition view angles, respectively, to obtain a normal image, a depth image and a view angle correspondence map of the three-dimensional object white model under each acquisition view angle.
In the embodiments of the present application, an acquisition view angle may be the view angle of an acquisition device, such as a digital camera or video camera, relative to the object to be processed. Each acquisition view angle corresponds to one acquisition device, which is used to acquire the surface texture of the three-dimensional object white model under the corresponding view angle. The acquisition devices corresponding to different acquisition view angles may be the same or different, which is not specifically limited herein.
Considering that there may be invisible regions when the three-dimensional object white model is acquired from a single view angle, a plurality of acquisition view angles are set in the present application; specifically, the number of acquisition view angles only needs to be greater than or equal to 2. The plurality of preset acquisition view angles may be 4, 6, etc., which is not specifically limited herein.
Under the different acquisition view angles, the surface texture of the three-dimensional object white model is converted into a two-dimensional pure white texture map by rendering the white model, and the normal image, depth image and view angle correspondence map under each acquisition view angle can then be obtained.
Specifically, after the server obtains the three-dimensional object white model of the object to be processed and the texture description information input by the user, it can perform image rendering on the three-dimensional object white model according to the preset acquisition view angles of the plurality of acquisition devices to obtain the normal image, depth image and view angle correspondence map under each acquisition view angle.
For example, in one image rendering process, the three-dimensional object white model input by the user is a jacket, the texture description information of the jacket is "yellow silk material with a blue vertical stripe pattern", and image rendering is performed on the white model according to 3 acquisition view angles, yielding 3 normal images, 3 depth images and 3 view angle correspondence maps corresponding to the 3 acquisition view angles, where each acquisition view angle corresponds to one normal image, one depth image and one view angle correspondence map.
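The per-view rendering could be organized as in the following sketch. `render_geometry_buffers` is a hypothetical stand-in for whatever renderer is used, since the embodiment does not prescribe a particular one, and the data layout is an assumption made for illustration.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class GeometryBuffers:
    normal_image: np.ndarray         # (H, W, 3) normal direction of each visible surface element
    depth_image: np.ndarray          # (H, W)   distance of each surface element to the device
    correspondence_map: np.ndarray   # (H, W)   surface element index, -1 where nothing is visible

def render_geometry_buffers(white_model, view) -> GeometryBuffers:
    """Placeholder for the renderer: any rasterizer that outputs per-pixel normals,
    depths and face indices for a given view would fit here."""
    raise NotImplementedError("plug in the renderer of your choice")

def render_all_views(white_model, acquisition_views):
    """Render the three-dimensional object white model under every preset acquisition view angle,
    yielding one normal image, one depth image and one view angle correspondence map per view."""
    return {view_id: render_geometry_buffers(white_model, view)
            for view_id, view in enumerate(acquisition_views)}
```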
The normal image is used to describe the normal direction of each surface element of the object to be processed.
Fig. 4 is a schematic visualization of normal images according to an embodiment of the present application. Specifically, fig. 4 shows the normal images under 3 acquisition view angles obtained by performing image rendering processing on the three-dimensional object white model of the object to be processed according to 3 preset acquisition view angles during texture map generation. Normal image 1, normal image 2 and normal image 3 each describe the normal direction of each surface element of the three-dimensional object under the corresponding acquisition view angle.
The depth image is used to describe the depth value of each surface element, the depth value representing the distance of the corresponding surface element of the object to be processed from the acquisition device.
Fig. 5 is a schematic visualization of depth images according to an embodiment of the present application. Specifically, fig. 5 shows the depth images under 3 acquisition view angles obtained by performing image rendering processing on the three-dimensional object white model of the object to be processed according to 3 preset acquisition view angles. In depth image 1, depth image 2 and depth image 3, regions of the object to be processed that are closer to the acquisition device appear lighter in the depth image, and regions that are farther from the acquisition device appear darker.
The view angle correspondence map is used to describe the index values of the visible surface elements of the object to be processed under the corresponding acquisition view angle. Because a given surface element has the same index value under different acquisition view angles, the correspondence of surface elements across view angles can be computed from the surface element index values.
Fig. 6 is a schematic visualization of view angle correspondence maps according to an embodiment of the present application. Fig. 6 shows the view angle correspondence maps under 3 acquisition view angles obtained by performing image rendering processing on the three-dimensional object white model of the object to be processed according to 3 preset acquisition view angles. Take surface elements A and B as examples: surface element A is visible under all 3 acquisition view angles, while surface element B is visible under only 2 of them. Surface element A appears as A1 in view angle correspondence map 1, A2 in view angle correspondence map 2 and A3 in view angle correspondence map 3; surface element B appears as B1 in view angle correspondence map 1 and B2 in view angle correspondence map 2, and is not visible in view angle correspondence map 3. Thus A1, A2 and A3 reflect the correspondence of surface element A across the 3 view angle correspondence maps, and B1 and B2 reflect the correspondence of surface element B across the first 2.
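Because a surface element keeps the same index value in every view angle correspondence map, cross-view correspondences can be recovered by a simple lookup. The sketch below is an illustrative reading of the maps described above, assuming -1 marks pixels where no surface element is visible; for surface element A in fig. 6 it would return the positions of A1, A2 and A3, and for surface element B only those of B1 and B2.

```python
import numpy as np

def positions_of_surface_element(correspondence_maps, element_index):
    """Locate one surface element in each view angle correspondence map.

    correspondence_maps: list of (H, W) integer maps, one per acquisition view angle,
                         where each pixel stores the index of the visible surface element
                         (-1 where no surface element is visible)
    Returns a dict mapping view id -> (row, col), omitting views where the element is occluded.
    """
    positions = {}
    for view_id, index_map in enumerate(correspondence_maps):
        rows, cols = np.nonzero(index_map == element_index)
        if rows.size > 0:                      # visible under this acquisition view angle
            positions[view_id] = (int(rows[0]), int(cols[0]))
    return positions
```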
After the normal image, depth image and view angle correspondence map under each acquisition view angle are obtained, fused texture images under the plurality of acquisition view angles can be generated according to the texture description information together with the normal image, depth image and view angle correspondence map under each acquisition view angle.
S33: generate a target texture image for each acquisition view angle according to the depth image and view angle correspondence map under each acquisition view angle and the texture description information.
In the embodiments of the present application, S33 may be implemented using a machine learning model; that is, the depth images, the view angle correspondence maps and the texture description information are input into a machine learning model to generate target texture images that are semantically related to the texture description information.
For example, a diffusion model may be used to perform iterative denoising processing on the noisy texture images to obtain the target texture image under each acquisition view angle. An optional embodiment is as follows:
perform at least one round of iterative denoising processing on the noisy texture image under each acquisition view angle according to the texture description information and the depth image and view angle correspondence map under each acquisition view angle, and take the fused texture image under each acquisition view angle obtained by the last round of iterative denoising processing as the target texture image under the corresponding view angle.
The fused texture image is obtained by fusing the denoised features under the acquisition view angles based on the view angle correspondence maps under the acquisition view angles; the depth images and the texture description information serve as denoising conditions for obtaining the denoised features during the iterative denoising processing.
Specifically, when iterative denoising processing is performed, the depth images and the texture description information serve as denoising conditions from which the denoised features can be obtained. However, to ensure the consistency of the texture images generated under multiple view angles of the three-dimensional object, simple per-view denoising is not sufficient; on this basis, the view angle correspondence maps are further combined to generate the target texture images of the view angles.
Optionally, the iterative denoising processing may be implemented based on diffusion models, and the flow S331 to S333 shown in fig. 7 is executed in each round of iterative denoising processing:
S331: for each acquisition view angle, denoise the noisy texture image under that acquisition view angle according to the texture description information and the depth image under that acquisition view angle to obtain the denoised features under that acquisition view angle; the noisy texture image is obtained by random initialization during the first round of iterative denoising processing.
For example, the noisy texture image is compressed by an image encoder so as to be converted from pixel space into a latent space that captures the essential information of the image. The input texture description information is converted into a condition for the denoising process using a text encoder such as a bag-of-words model (count vectors) or term frequency-inverse document frequency (TF-IDF) vectors.
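As a minimal illustration of the text-encoder options mentioned above (the embodiment does not fix a particular encoder), the texture description information could be converted into a TF-IDF condition vector as follows; the sample descriptions are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# a toy corpus of texture descriptions; the last entry plays the role of the user's input
descriptions = [
    "red cotton fabric with white polka dots",
    "matte black leather with embossed pattern",
    "yellow silk material with blue vertical stripe pattern",
]
vectorizer = TfidfVectorizer()
vectorizer.fit(descriptions)

# condition vector derived from the texture description information of the object to be processed
condition = vectorizer.transform(["yellow silk material with blue vertical stripe pattern"])
print(condition.shape)   # (1, vocabulary_size)
```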
Further, the noisy texture image is denoised conditioned on the texture description information and the depth image to obtain a latent representation of the generated image.
In the embodiments of the present application, the denoising process can be flexibly adjusted according to conditions in text, image or other forms. The texture description information may, for example, be text-based texture description information input by the user in text form, or image-based texture description information input by the user in image form; the depth image is an image condition. Finally, the image is converted back from the latent space to pixel space by an image decoder to generate the final image.
Specifically, in the process of denoising the noisy texture image conditioned on the texture description information and the depth image, the distances between the surface elements of the object to be processed and the acquisition device provided by the depth image allow the denoising result to better restore the depth details of the object to be processed, and the texture information provided by the texture description information, including color, pattern and other information, allows the denoising result to gradually fit the described texture; the finally generated denoised features therefore match both the features of the object to be processed and the texture characteristics of the texture description information.
In the embodiments of the present application, the above steps may be implemented based on a diffusion model for generating image data. For each acquisition view angle, after the diffusion model has been trained via the reverse diffusion process, deep neural network features can be extracted from the noisy texture image and depth image under that acquisition view angle and the texture description information input by the user, and the noise components in the deep neural network features are estimated, so that the denoised features under that acquisition view angle are output.
Further, when implementing S331, in order to improve the efficiency of image processing, a trained diffusion model may be configured for each acquisition view angle.
Optionally, with each acquisition view angle corresponding to one trained diffusion model, an optional implementation of S331 is as follows, comprising S3311 to S3312 (not shown in fig. 7):
S3311: input the texture description information, the depth image under one acquisition view angle and the noisy texture image under that acquisition view angle into the corresponding diffusion model.
S3312: based on the diffusion model, extract deep neural network features from the texture description information, the depth image and the noisy texture image, and denoise the deep neural network features to obtain the denoised features under that acquisition view angle.
In summary, each acquisition view angle corresponds to one diffusion model; the input of each diffusion model is the depth image and the noisy texture image under the corresponding acquisition view angle, together with the texture description information shared by the plurality of acquisition view angles, and the output is the denoised features under that acquisition view angle. The diffusion models under different acquisition view angles can be processed in parallel so as to quickly realize image processing under multiple view angles, which helps improve the efficiency of texture map generation.
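A single per-view denoising step could look roughly like the sketch below. `noise_predictor` stands in for the trained diffusion model of one acquisition view angle, and the update is a generic DDPM-style estimate of the denoised features; both the interface and the update rule are assumptions for illustration rather than the embodiment's exact implementation. Because each view's step depends only on its own inputs and the shared texture description, the per-view models can run in parallel as described above.

```python
import numpy as np

def denoise_one_view(noisy_latent, depth_image, text_condition, noise_predictor, alpha_bar_t):
    """One denoising step for one acquisition view angle (simplified DDPM-style estimate).

    noisy_latent:    latent of the noisy texture image under this view
    depth_image:     depth image under this view (image condition)
    text_condition:  encoded texture description information, shared by all views
    noise_predictor: callable(latent, depth, text) -> predicted noise; stands in for the
                     trained diffusion model of this acquisition view angle
    alpha_bar_t:     cumulative noise-schedule value at the current timestep
    Returns the denoised feature estimate and the predicted noise.
    """
    eps = noise_predictor(noisy_latent, depth_image, text_condition)
    # denoised features implied by the predicted noise at this timestep
    denoised = (noisy_latent - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    return denoised, eps
```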
S332: and carrying out multi-view fusion processing on the denoised characteristics under each acquisition view according to the view corresponding diagram under each acquisition view to obtain a fused texture image under each acquisition view.
S333: and respectively taking the fused texture images under all the acquisition view angles as noisy texture images under the corresponding view angles in the next iterative denoising process.
In the iterative denoising process, the noisy texture image is randomly initialized during the first iterative denoising process. And then the noisy texture image with the denoising processing of each iteration is the fused texture image obtained by the last iteration.
Specifically, for each iterative denoising process, denoising the noisy texture image according to texture description information input by an object and according to a depth image under each acquisition view angle obtained after image rendering process, through a diffusion model corresponding to the acquisition view angle trained under the corresponding acquisition view angle, so as to obtain denoised characteristics under the acquisition view angle; specifically, corresponding denoised characteristics can be obtained based on the mode under each acquisition view angle; and then, for each acquisition view angle, carrying out multi-view fusion on the denoised characteristics of each acquisition view angle through a multi-view characteristic fusion algorithm according to the view angle corresponding diagram under the acquisition view angle, so as to obtain a fused texture image. And taking the fusion texture image obtained after the iterative denoising processing as the input of the next iterative denoising processing, namely the noisy texture image of the next iterative denoising processing.
Fig. 8 is a schematic diagram of iterative denoising processing logic according to an embodiment of the present application. Fig. 8 illustrates one round of denoising of the noisy texture images under 3 acquisition view angles for the object to be processed. According to the texture description information corresponding to the object to be processed, input by the object, and the depth images under the 3 acquisition view angles, the noisy texture images under the 3 acquisition view angles are denoised through the trained diffusion models to obtain 3 denoised features corresponding to the 3 acquisition view angles. The 3 denoised features are taken as the input of the multi-view feature fusion algorithm, and the view angle corresponding map under each acquisition view angle is taken as guiding information; by fusing the denoised features under the 3 view angles, the fused texture images corresponding to the 3 acquisition view angles are obtained respectively.
In the embodiment of the application, the denoising process is performed iteratively, and the fused texture image obtained after the denoising of the last iteration is the target texture image.
For example, the iterative denoising process may be represented by states at times [T, …, 0], corresponding to an optimization process of T iterations. The value of T may be set empirically, or according to the input noisy texture image, for example within the range of 20-50, which is not specifically limited in this application. The image at time T is the randomly initialized noisy image, and the image at time 0 is the finally generated target texture image, wherein the fused texture image at time t can be used as the noisy texture image at time t-1. Reference may be made to the iterative denoising process schematic shown in fig. 9.
Fig. 9 is a schematic diagram of another iterative denoising process according to an embodiment of the present application. Fig. 9 takes the iterative denoising of a randomly initialized noisy texture image under one acquisition view angle as an example. At time T, i.e. when denoising is performed for the first time, the input texture image is the randomly initialized noisy texture image shown at S901 in fig. 9; the noisy texture image is denoised according to the depth image under the acquisition view angle and the texture description information input by the object, so that it fits the shape of the object to be processed more closely and gradually forms textures conforming to the texture description information input by the object. At time t, the fused texture image generated at that time, i.e. the fused texture image shown at S902 in fig. 9, is used as the input of the denoising process at time t-1, i.e. the noisy texture image at time t-1. Assuming that the number of iterations is 20, after 20 denoising processes the output at time 0 is obtained, i.e. S903 in fig. 9, which is the target texture image obtained under the acquisition view angle.
In this embodiment of the present application, the process shown in fig. 9 may be used to obtain a corresponding target texture image at each acquisition view angle, and the process uses the view angle corresponding map at each acquisition view angle as guiding information to fuse the features at multiple acquisition view angles, so that the obtained target texture image has consistency at multiple acquisition view angles.
It should be noted that the embodiment of the present application does not limit the type of diffusion model; the diffusion model is a model with basic image generating capability, including but not limited to any one of the following: a Diffusion Probabilistic Model (which may be simply referred to as a Diffusion Model), a Denoising Diffusion Probabilistic Model (DDPM), and a Stable Diffusion model.
In addition, the diffusion model according to the embodiment of the present application is a model having basic image generation capability. Having basic image generation capability means that, based on given texture description information, the image generation model is able to generate an image related to the semantics of the given texture description information. It can be understood that the diffusion model provided in the embodiment of the present application is a pre-trained diffusion model, and training the diffusion model in the embodiment of the present application can be understood as fine-tuning the diffusion model: by means of the basic image generating capability of the diffusion model, after it is trained in the reverse diffusion process on natural images, the fused texture image can be generated starting from a randomly initialized noisy texture image through the iterative denoising method.
The training manner of the diffusion model in the present application may be a conventional model training manner, which is not specifically limited herein. Satisfying the diffusion model training termination condition may include any one of the following: the number of iterative training rounds reaches a number threshold (the number threshold may be a value set according to an empirical value for controlling the number of training rounds, for example 20 or 50), or the loss information of the diffusion model is smaller than a loss threshold (the loss threshold may be a value set according to an empirical value for controlling the loss information of the diffusion model).
It should be noted that the diffusion model according to the present embodiment may be a machine learning model related to the field of computer vision technology of artificial intelligence.
When the denoised features are processed by the multi-view feature information fusion algorithm to obtain the fused texture images, that is, when step S332 shown in fig. 7 is performed, the step may be implemented based on the flowchart shown in fig. 10, including S3321 to S3324:
s3321: and carrying out multi-view fusion processing on the denoised features under each acquisition view based on a preset mapping relation to obtain candidate fusion features under each acquisition view.
S3322: and constructing an objective function based on the view angle corresponding graph under each acquisition view angle, the denoised characteristic under each acquisition view angle and the candidate fusion characteristic under each acquisition view angle.
S3323: and adjusting a preset mapping relation by minimizing an objective function to obtain the objective mapping relation.
The preset mapping relationship may take the form of a mapping function, a mapping set, a mapping table, etc., which is not specifically limited herein.
For example, the preset mapping relationship may be expressed in the form of a mapping function. In the embodiment of the application, the denoised characteristics under each acquisition view angle are used as the input of the mapping function, so that the candidate fusion characteristics under each acquisition view angle can be obtained.
When the denoised features under the acquisition view angles are given different input weights, the mapping function changes; the input weights are in one-to-one correspondence with the mapping function, and the candidate fusion features therefore change along with the input weights. Consequently, when the objective function is constructed according to the view angle corresponding map, the denoised features and the candidate fusion features under each acquisition view angle, different candidate fusion features lead to different corresponding objective functions.
In the embodiment of the application, by minimizing the objective function, the objective input weight uniquely corresponding to the minimum objective function can be determined, and according to the objective input weight, a mapping function can be uniquely determined, that is, the objective mapping relation in the application.
Under the condition of determining the target mapping relation, based on the target mapping relation, the denoised features under each acquisition view angle are respectively subjected to multi-view fusion processing to obtain candidate fusion features, namely the fusion features under the corresponding acquisition view angles.
Finally, decoding the fused features to obtain corresponding images, namely fused texture images, wherein the method comprises the following steps:
s3324: for each acquisition view angle, the fused features under the acquisition view angle are respectively subjected to feature decoding processing, so that the fused texture image under the acquisition view angle can be obtained.
The construction process of the objective function is described in detail as follows:
an alternative embodiment is to determine the objective function by:
for each surface element, determining the surface element weight of one surface element under the corresponding view angle according to the view angle corresponding graph under each acquisition view angle;
for each acquisition view angle, acquiring information difference between the denoised characteristic and the candidate fusion characteristic of each face element under one acquisition view angle; according to the respective bin weights of each bin under one acquisition view angle, carrying out weighted summation on the corresponding information differences to obtain information difference sums under one acquisition view angle;
And summing the information difference sum under each acquisition view angle again to obtain an objective function.
Specifically, in the embodiment of the present application, the objective function may be determined by the following formula:
Formula 1:  L_FTD(J) = Σ_{i=1}^{M} Σ_j W_i^j · ‖ ε_t^{i,j} − f_t^{i,j} ‖

wherein the index i denotes the acquisition view index, the index j denotes the bin index, and t denotes the t-th iterative denoising process; M is the number of acquisition views; L_FTD(J) denotes the objective function; ψ(·) denotes the diffusion model; ε_t^i = ψ(x_t^i; D_i) denotes the feature extracted by the diffusion model under acquisition view i during the t-th iterative denoising process, i.e. the denoised feature under acquisition view i, and ε_t^{i,j} denotes its component for bin j; f_t^i denotes the candidate fusion feature under acquisition view i during the t-th iterative denoising process, and f_t^{i,j} denotes its component for bin j; W_i^j denotes the bin weight of bin j under acquisition view i; x_t^i denotes the noisy texture image under acquisition view i during the t-th iterative denoising process; D_i denotes the depth image under acquisition view i.
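As a non-limiting illustration, the objective of Formula 1 can be evaluated as a bin-weighted sum of per-view feature differences; the squared-difference form and the (H, W) array layout used below are assumptions made for this sketch:

    # Simplified sketch of Formula 1: a bin-weighted sum, over all M views, of the
    # differences between each view's denoised feature and its candidate fusion feature.
    import numpy as np

    def objective_ftd(denoised, fused, bin_weights):
        # denoised[i], fused[i]: (H, W) feature maps of view i; bin_weights[i]: (H, W)
        return sum(
            float((w * (d - f) ** 2).sum())
            for d, f, w in zip(denoised, fused, bin_weights)
        )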
Formula 1 describes the weighted summation of the differences between the features before fusion and the features after fusion under the M acquisition view angles. The minimization process of the objective function constrains the fused features to approach the denoised features under all the acquisition view angles simultaneously, reconciling the plurality of denoised features to obtain the fused features.
F(·) is the mapping function representing the preset mapping relation. During the t-th iterative denoising process, the denoised features of all the acquisition view angles serve as the input of F(·). When the denoised feature of each acquisition view angle is given a different input weight, F(·) changes accordingly, and L_FTD(J) changes with it; each assignment of input weights to the denoised features therefore corresponds to a different L_FTD(J). After the minimization processing over these candidate L_FTD(J), the smallest L_FTD(J) is obtained, and the candidate fusion feature corresponding to the smallest L_FTD(J) is taken as the fused feature.
For example, in one texture mapping process, during the 3rd iterative denoising process the denoised features corresponding to the 3 acquisition view angles are weighted with three different weight combinations, 0.2:0.2:0.6, 0.1:0.3:0.6 and 0.3:0.1:0.6, as inputs of F(·). Three different F(·), and thus three different L_FTD(J), are obtained. After the minimization processing over these three L_FTD(J), the smallest L_FTD(J) is obtained; according to this L_FTD(J), the input weights of the corresponding F(·) are determined to be 0.2:0.2:0.6, and the candidate fusion feature produced by that weight combination is taken as the fused feature.
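The worked example above can be sketched as a small search over candidate input weights, keeping the weighted combination whose objective is smallest; the candidate weight triples are taken from the example, while the data layout (one NumPy array per view) is an assumption:

    # Simplified sketch of the weight search: try each candidate weight triple as the
    # input weights of F(.), and keep the combination with the smallest objective.
    def select_fused_feature(denoised, bin_weights,
                             candidates=((0.2, 0.2, 0.6), (0.1, 0.3, 0.6), (0.3, 0.1, 0.6))):
        # denoised[i], bin_weights[i]: NumPy arrays of matching shape for view i
        best = None
        for weights in candidates:
            fused = sum(w * d for w, d in zip(weights, denoised))
            # Same bin-weighted objective as in the sketch above.
            obj = sum(float((bw * (d - fused) ** 2).sum())
                      for bw, d in zip(bin_weights, denoised))
            if best is None or obj < best[0]:
                best = (obj, fused, weights)
        return best[1], best[2]   # fused feature and the selected target input weights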
In this embodiment of the present application, since different acquisition view angles are set, it cannot be guaranteed that each bin is visible under each acquisition view angle, and the view angle corresponding map may reflect whether one bin is visible under the corresponding acquisition view angle, so when determining the bin weight, it may be determined according to the view angle corresponding map.
For the bin weight W_i^j, an alternative embodiment is as follows:
if it is determined that a bin is visible at an acquisition view according to a view map at the acquisition view, determining that a bin weight of the bin at the acquisition view is a preset first bin weight.
If it is determined that a bin is not visible at an acquisition view angle according to the view angle map at the acquisition view angle, determining that the bin weight of the bin at the acquisition view angle is a preset second bin weight.
In the embodiment of the present application, the values of the first bin weight and the second bin weight are not specifically limited; it is only ensured that the first bin weight is greater than the second bin weight, that is, visible bins are given more emphasis in the fusion process.
An alternative embodiment is to use 1 as the first bin weight and 0 as the second bin weight. Taking the assumption of fig. 5, bin A in fig. 5 is visible in view corresponding map 1, view corresponding map 2 and view corresponding map 3, so the bin weights of bin A under all three acquisition view angles take the value 1. Bin B in fig. 5 is a visible bin in view corresponding map 1 and view corresponding map 2, and an invisible bin in view corresponding map 3, so the bin weights of bin B under the first two acquisition view angles take the value 1, and the bin weight of bin B under the third acquisition view angle takes the value 0.
It should be noted that the bin weights may take other values; for example, 1 may be used as the first bin weight and 0.5 as the second bin weight. In addition, the bin weights may also be determined in other manners, for example, the bin weights may be assigned according to the different areas of the object to be processed that the bins belong to, and the application is not specifically limited in this respect.
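A simplified sketch of deriving the bin weights from one view angle corresponding map follows; the convention that an invisible bin is marked with -1 in the map is an assumption made for illustration:

    # Simplified sketch: per-bin weights from a view corresponding map. Assumption:
    # the map stores a valid bin index where the bin is visible and -1 where it is not;
    # 1 and 0 are the first and second bin weights discussed in the text.
    import numpy as np

    def bin_weights_from_view_map(view_map: np.ndarray,
                                  first_weight: float = 1.0,
                                  second_weight: float = 0.0) -> np.ndarray:
        return np.where(view_map >= 0, first_weight, second_weight)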
In addition to obtaining the fused features by minimizing the objective function as listed above, the fused features can also be obtained by simple weighted averaging of the denoised features under the acquisition view angles, as follows:
In an alternative embodiment, when the denoised features under multiple acquisition view angles are subjected to multi-view fusion processing, the features of each bin under the different acquisition view angles may be weighted and averaged, and the feature of that bin under each acquisition view angle is replaced by the weighted-averaged feature. For example, under the assumption of fig. 5, when multi-view fusion is performed on the denoised feature of bin A in fig. 5, the denoised features at A1 in view corresponding map 1, A2 in view corresponding map 2 and A3 in view corresponding map 3 are weighted and averaged, and the resulting feature replaces A1, A2 and A3. Similarly, for bin B in fig. 5, the denoised features at B1 in view corresponding map 1 and B2 in view corresponding map 2 are weighted and averaged, and the resulting feature replaces B1 and B2.
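A simplified sketch of this averaging alternative follows, assuming the denoised features have already been gathered into a per-bin layout (one row per bin) for each view; this layout, the 0/1 visibility arrays, and the use of a plain mean (equal weights) are assumptions:

    # Simplified sketch: for every bin, average its denoised features over the views
    # in which it is visible, and write the average back to each view.
    import numpy as np

    def average_fuse(denoised, visibility):
        feats = np.stack(denoised)                  # (num_views, num_bins, channels)
        vis = np.stack(visibility)[..., None]       # (num_views, num_bins, 1), values 0/1
        counts = np.clip(vis.sum(axis=0), 1, None)  # avoid division by zero
        mean = (feats * vis).sum(axis=0) / counts   # per-bin mean over visible views
        return [np.where(v > 0, mean, f) for v, f in zip(vis, feats)]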
It should be noted that, the above process of obtaining the fused features by performing iterative denoising on the noisy texture image may be modularized, for example, implemented by an image diffusion module based on multi-view feature fusion as listed below:
Fig. 11 is a schematic diagram of an image diffusion module with multi-view feature fusion according to an embodiment of the present application. Fig. 11 is a schematic diagram of one denoising process in the iterative denoising of a noisy texture image: according to the texture description information input by the object and the depth image under each acquisition view angle, deep neural network features can be extracted from the randomly initialized noisy texture image under the acquisition view angle through the trained diffusion model, and the noise components in the deep neural network features are estimated, so that the denoised features under the acquisition view angle are output.
Taking the denoised features under each acquisition view angle as the input of a multi-view feature information fusion algorithm, taking the view angle corresponding map under each acquisition view angle as the guiding information of multi-view fusion, and carrying out multi-view fusion on the denoised features to obtain the fused features.
Then, the fused features under each acquisition view angle are feature-decoded by the feature decoding module to obtain the fused texture image, which can be used as the noisy texture image for the next iterative denoising process. In this way, the iterative denoising of the randomly initialized noisy texture image is realized, and the target texture image is obtained, namely the fused texture image obtained by the last iterative denoising.
In the embodiment of the application, as the view angle corresponding diagram under each acquisition view angle is used as the guide information, the characteristics under a plurality of acquisition view angles are fused, and then the characteristics are decoded, so that the obtained fused texture image has consistency under a plurality of acquisition view angles. For example, reference may be made to the fused texture image schematic diagram shown in fig. 12.
Fig. 12 is a schematic diagram of visualization of a texture image after fusion according to an embodiment of the present application. In fig. 12, in the process of generating a texture map for an object to be processed, the fused texture image 1, the fused texture image 2, and the fused texture image 3 respectively corresponding to the 3 acquisition view angles obtained by the image diffusion module for multi-view feature fusion are consistent as can be seen from fig. 12.
In the embodiment of the application, the noisy texture image is subjected to denoising processing according to the texture description information input by the object, and the denoised features under the plurality of acquisition view angles are subjected to multi-view fusion processing, so that the fused features under the plurality of acquisition view angles have consistency, and the efficiency of generating the texture map is improved.
S34: and carrying out texture fusion processing on each obtained target texture image according to the normal image under each acquisition view angle to obtain a target texture map corresponding to the object to be processed.
In the embodiment of the application, the texture fusion processing is performed on the target texture image under each acquisition view angle by combining the normal direction of each bin described by the normal image under each acquisition view angle with the main axis direction of each acquisition view angle, so as to obtain a complete target texture map.
An alternative embodiment is to obtain the target texture map corresponding to the object to be processed through flowcharts S341 to S342 (not shown in fig. 3) shown in fig. 13:
s341: for each acquisition view angle, respectively performing reverse rendering treatment on texture images under one acquisition view angle to obtain a defect texture map with texture holes under one acquisition view angle; performing reverse rendering processing on a normal image under one acquisition view angle to obtain a fusion weight set under one acquisition view angle, wherein the fusion weight set comprises fusion weights of each surface element under one acquisition view angle;
s342: and carrying out texture fusion processing on the defect texture map under each acquisition view angle according to the fusion weight set under each acquisition view angle to obtain a target texture map corresponding to the object to be processed.
Because the object to be processed is a three-dimensional object, part of its surface is occluded by the object itself, so not all surfaces of the object to be processed can be acquired from one acquisition view angle. Therefore, when the texture image under one acquisition view angle is reversely rendered to obtain a texture map, the texture map under that acquisition view angle has texture holes in the regions invisible under that acquisition view angle, i.e. it is a defect texture map. See the defect texture map diagram of fig. 14.
Fig. 14 is a schematic diagram of a defect texture map according to an embodiment of the present application. Fig. 14 shows that, after the 2 texture images of the object to be processed corresponding to 2 acquisition view angles are reversely rendered, a defect texture map is obtained under each acquisition view angle, namely defect texture map A and defect texture map B. Because the object to be processed is a three-dimensional object, part of its surface is occluded, so texture holes appear: the texture region shown at S1401 in defect texture map A and the region shown at S1402 in defect texture map B exhibit texture holes; S1403 in defect texture map A corresponds to a texture region, while S1404 in defect texture map B represents a texture hole; and in the texture region shown at S1405 in defect texture map A, the corresponding region S1406 in defect texture map B shows a texture hole.
In addition, according to the normal image under each acquisition view angle, a fusion weight map under each acquisition view angle is obtained through reverse rendering processing. An alternative embodiment is:
for each bin under one acquisition view angle, determining the fusion weight of the bin under the acquisition view angle according to the included angle between the normal direction of the bin and the main axis direction of the acquisition equipment; and combining the fusion weights of all the bins under the acquisition view angle to obtain the fusion weight set under the acquisition view angle.
Wherein the included angle is inversely related to the fusion weight. If the included angle is larger, the corresponding fusion weight is smaller.
Specifically, the normal image under each view angle describes the normal direction of the visible surface element of the object to be processed under the acquisition view angle, and the fusion weight of the surface element under the acquisition view angle can be calculated by utilizing the included angle between the normal direction and the main axis direction of the acquisition view angle of the acquisition device. The smaller the included angle, the higher the confidence that the bin is acquired, and therefore, the greater the fusion weight of the bin during texture fusion. And combining the fusion weights of all the surface elements under the acquisition view angle to obtain a corresponding fusion weight set under the acquisition view angle.
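A simplified sketch of computing one bin's fusion weight from the included angle between its normal and the principal axis of the acquisition device follows; the cosine form, and the assumption that the principal axis points from the device into the scene, are illustrative choices only:

    # Simplified sketch: fusion weight of one visible bin from the angle between its
    # normal and the acquisition device's principal axis.
    import numpy as np

    def fusion_weight(normal: np.ndarray, principal_axis: np.ndarray) -> float:
        n = normal / np.linalg.norm(normal)
        a = principal_axis / np.linalg.norm(principal_axis)
        cos_angle = float(np.clip(np.dot(n, -a), 0.0, 1.0))  # smaller angle -> closer to 1
        return cos_angle                                     # larger angle -> smaller weight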
And fusing the defect texture maps under each view angle according to a corresponding fusion weight set under each view angle and a texture map fusion algorithm to obtain a complete texture map. An alternative embodiment is as follows:
respectively carrying out normalization processing on the fusion weight sets under all the acquisition view angles to obtain normalized weight sets under the corresponding view angles; and further, based on the normalized weight set under each acquisition view angle, performing texture fusion processing on the defect texture map under each acquisition view angle to obtain a target texture map corresponding to the object to be processed.
Specifically, in the embodiment of the present application, texture fusion of the defect texture map under each acquisition view angle may be achieved by the following formula:
Formula 2:  T_fuse = Σ_{i=1}^{M} softmax(N_i) ⊙ T_i

wherein the subscript i denotes the acquisition view index, and M denotes the number of acquisition views; softmax(·) denotes the normalization processing, and softmax(N_i) is the normalized weight set under acquisition view i; N_i denotes the fusion weight set under acquisition view i; T_i denotes the defect texture map under acquisition view i; and T_fuse denotes the complete texture map, i.e. the target texture map.
The above formula 2 describes a process of weighting and fusing the defect texture maps under each acquisition view angle according to the normalized fusion weight set under the M acquisition view angles, and finally obtaining the target texture map.
Of course, the calculation method listed in the above formula 2 is only a simple example, and other related calculation methods are also applicable to the embodiments of the present application, and are not described herein in detail.
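A simplified sketch of the weighted fusion in Formula 2 follows; the per-texel softmax across views and the assumed array shapes are illustrative only:

    # Simplified sketch of Formula 2: per-texel softmax over the M fusion weight sets,
    # then a weighted sum of the defect texture maps in UV space.
    import numpy as np

    def fuse_texture_maps(defect_maps, weight_sets):
        weights = np.stack(weight_sets)                   # (M, H, W)
        weights = np.exp(weights - weights.max(axis=0))   # numerically stable softmax
        weights = weights / weights.sum(axis=0, keepdims=True)
        maps = np.stack(defect_maps)                      # (M, H, W, C)
        return (maps * weights[..., None]).sum(axis=0)    # target texture map T_fuse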
In summary, in the process of generating the complete texture map, on one hand, the texture image under each acquisition view angle is subjected to inverse rendering processing to obtain the corresponding defect texture map under each acquisition view angle; on the other hand, the normal image under each acquisition view angle is subjected to inverse rendering processing to obtain the corresponding fusion weight set under each acquisition view angle. The fusion weight set under each acquisition view angle is then subjected to normalization processing to obtain the normalized weight set under each acquisition view angle, and texture fusion processing is performed on the defect texture map under the corresponding acquisition view angle according to the normalized weight set, to obtain the complete target texture map. In particular, reference may be made to the texture fusion module schematic diagram shown in fig. 15.
Fig. 15 is a schematic diagram of a texture fusion module according to an embodiment of the present application. In fig. 15, the texture image under each acquisition view angle is subjected to inverse rendering processing by an inverse rendering module to obtain the corresponding defect texture map under each acquisition view angle; meanwhile, the normal image under each acquisition view angle is subjected to inverse rendering processing to obtain the corresponding fusion weight set under each acquisition view angle. Normalization processing is performed on the fusion weight set under each acquisition view angle to obtain the normalized weight set under each acquisition view angle, and texture map fusion processing is performed on the defect texture map under the corresponding acquisition view angle according to the normalized weight set, to obtain the complete target texture map.
According to the embodiment, texture map fusion is performed on the defect texture maps through the normal images under the acquisition view angles, which overcomes the defect that the texture map under each single acquisition view angle has texture holes in the regions of the three-dimensional object invisible from that view angle, and makes the texture maps under the acquisition view angles consistent.
In summary, in the embodiment of the present application, the generation process of the texture map may be divided into the following three modules:
referring to fig. 16A, a schematic diagram of generating a texture map according to an embodiment of the present application is shown. The generation of the texture map can be divided into an image rendering module, an image diffusion module for multi-view feature fusion and a texture fusion module.
And an image rendering module: and performing image rendering processing on the three-dimensional object white model according to a plurality of preset acquisition visual angles, and respectively obtaining a normal image, a depth image and a visual angle corresponding map corresponding to the three-dimensional object white model under each acquisition visual angle.
Image diffusion module of multi-view feature fusion: and respectively generating target texture images corresponding to the acquisition view angles according to the depth images and the view angle corresponding images under the acquisition view angles and the texture description information.
Texture fusion module: and carrying out texture fusion processing on each obtained target texture image according to the normal image under each acquisition view angle to obtain a target texture map corresponding to the object to be processed.
According to the method, depth information under multiple view angles can be extracted and processed simultaneously in a single inference process to generate texture images under the multiple view angles, and finally the texture images under the multiple view angles are combined to obtain the complete texture map of the three-dimensional object.
An alternative embodiment is to generate a texture map of the object to be processed by means of the logical block diagram shown in fig. 16B:
As shown in FIG. 16B, a logical block diagram of texture map generation is provided in an embodiment of the present application. FIG. 16B depicts the process in which, after the object inputs the three-dimensional object white model and texture description information of the object to be processed, the server generates the texture map from the three-dimensional object white model and the texture description information.
Specifically, after the object inputs the three-dimensional object white mould and texture description information of the object to be processed, the image rendering module is used for performing image rendering processing on the three-dimensional object white mould according to 3 preset acquisition view angles to obtain a view angle corresponding diagram, a depth image and a normal image corresponding to each acquisition view angle.
And generating a fused texture map based on the texture description information by an image diffusion module fused by multi-view features and the view corresponding map and the depth image under each acquisition view.
And finally, generating a target texture map based on the fused texture map and the normal image under each acquisition view angle through a texture fusion module.
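The overall three-module flow of fig. 16B can be summarized, in non-limiting sketch form, as follows; all module functions are assumed interfaces rather than concrete implementations:

    # Simplified sketch of the three-module flow: image rendering, multi-view-fused
    # image diffusion, and texture fusion.
    def generate_texture_map(white_model, text_prompt, views,
                             render_fn, diffuse_fn, texture_fuse_fn):
        normals, depths, view_maps = render_fn(white_model, views)   # image rendering module
        texture_images = diffuse_fn(depths, view_maps, text_prompt)  # image diffusion module
        return texture_fuse_fn(texture_images, normals)              # texture fusion module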
It should be noted that, in addition to the above-listed several modules, the modules may be further divided into other modules, which are not specifically limited herein.
The target texture map can be applied to various 3D content generation scenarios such as 3D material generation, game character production and VR scene synthesis. For example, in a game character production process, the texture map generation technology can be adopted for the task of drawing three-dimensional game character maps: texture information conforming to the text description is generated on the surface of the three-dimensional object, realizing the process of generating the texture map directly from the text description, which further improves the efficiency with which designers produce texture maps.
In the embodiment of the application, image rendering processing is carried out on a three-dimensional white mold by utilizing a plurality of preset acquisition view angles, so as to obtain a normal image, a depth image and a view angle corresponding diagram of the three-dimensional object under the corresponding plurality of acquisition view angles; taking a view angle corresponding diagram of the three-dimensional object under a plurality of corresponding acquisition view angles as guiding information, carrying out multi-view angle fusion on denoised features under each acquisition view angle according to depth images of the three-dimensional object under the plurality of corresponding acquisition view angles and texture description information of the three-dimensional object input by an object, generating fused features corresponding to each acquisition view angle, and decoding the fused features to ensure that target texture images generated under the plurality of acquisition view angles have consistency and the target texture images conform to the texture description information input by the object. And carrying out texture fusion on each target texture image under a plurality of acquisition view angles according to the normal image under each acquisition view angle, and obtaining the complete target texture map of the three-dimensional object through weighting fusion on the target texture maps under the plurality of acquisition view angles.
Referring to fig. 17, which is an evaluation result diagram of texture map generation provided by an embodiment of the present application, fig. 17 shows a user evaluation result of the aesthetic quality of the texture maps generated by the present application. Among the 1000 samples under investigation, a total of 569 samples considered that the texture maps generated by the present application have a better effect, a ratio of 56.9%; 311 samples considered the mergy scheme better, a ratio of about 31.1%; and 120 samples considered the Texture scheme better, a ratio of about 12.0%.
Obviously, the above investigation further shows that the texture map generated in the embodiment of the application has better use effect.
Based on the same inventive concept, the embodiment of the application also provides a device for generating the texture map. As shown in fig. 18, which is a schematic structural diagram of a texture map generating apparatus 1800, may include:
an acquiring unit 1801, configured to acquire three-dimensional object white model and texture description information corresponding to an object to be processed;
the rendering unit 1802 is configured to perform image rendering processing on a three-dimensional object white model according to a plurality of preset acquisition viewing angles, and obtain a normal image, a depth image, and a viewing angle corresponding map corresponding to the three-dimensional object white model under each acquisition viewing angle; the normal image is used for describing the normal direction of each surface element in the object to be processed, and the view angle corresponding graph is used for describing the index value of each surface element in the object to be processed under the corresponding acquisition view angle;
The texture generation unit 1803 is configured to generate target texture images corresponding to the acquired view angles according to the depth image and the view angle corresponding map under the acquired view angles, and the texture description information;
the texture fusion unit 1804 is configured to perform texture fusion processing on each obtained target texture image according to the normal image under each acquisition view angle, so as to obtain a target texture map corresponding to the object to be processed.
Optionally, the texture generating unit 1803 is specifically configured to:
according to texture description information, depth images under all acquired view angles and view angle corresponding diagrams, carrying out at least one iteration denoising treatment on noisy texture images under all acquired view angles, and taking fused texture images under all acquired view angles obtained by the last iteration denoising treatment as target texture images under corresponding view angles;
the texture image after fusion is obtained by fusing denoised features under all the acquisition view angles based on view angle corresponding diagrams under all the acquisition view angles; the depth image and the texture description information are denoising conditions of the denoised features obtained in the iterative denoising processing.
Optionally, the texture generation unit 1803 performs the denoising process per iteration by:
For each acquisition view angle, denoising the noisy texture image under one acquisition view angle according to texture description information and a depth image under one acquisition view angle to obtain denoised characteristics under one acquisition view angle; the noisy texture image is obtained by random initialization during the first iterative denoising process;
according to the view angle corresponding graph under each acquisition view angle, carrying out multi-view angle fusion processing on the denoised characteristics under each acquisition view angle to obtain a fused texture image under each acquisition view angle;
and respectively taking the fused texture images under all the acquisition view angles as noisy texture images under the corresponding view angles in the next iterative denoising process.
Optionally, each acquisition view corresponds to a trained diffusion model;
the texture generation unit 1803 is specifically configured to:
inputting texture description information, a depth image under an acquisition view angle and a texture image with noise under the acquisition view angle into a corresponding diffusion model;
based on the diffusion model, deep neural network characteristics are extracted from texture description information, depth images and noisy texture images, and denoising is carried out on the deep neural network characteristics to obtain denoised characteristics under an acquisition view angle.
Optionally, the texture generating unit 1803 is specifically configured to:
based on a preset mapping relation, carrying out multi-view fusion processing on the denoised characteristics under each acquisition view angle to obtain candidate fusion characteristics under each acquisition view angle;
constructing an objective function based on the view angle corresponding graph under each acquisition view angle, the denoised characteristic under each acquisition view angle and the candidate fusion characteristic under each acquisition view angle;
adjusting a preset mapping relation by minimizing an objective function to obtain an objective mapping relation;
based on the target mapping relation, carrying out multi-view fusion processing on the denoised characteristics under each acquisition view angle to obtain fused characteristics under each acquisition view angle;
and for each acquisition view angle, respectively carrying out feature decoding processing on the fused features under one acquisition view angle to obtain a fused texture image under one acquisition view angle.
Optionally, the texture generating unit 1803 is specifically configured to:
for each surface element, determining the surface element weight of one surface element under the corresponding view angle according to the view angle corresponding graph under each acquisition view angle;
for each acquisition view angle, acquiring information difference between the denoised characteristic and the candidate fusion characteristic of each face element under one acquisition view angle; according to the respective bin weights of each bin under one acquisition view angle, carrying out weighted summation on the corresponding information differences to obtain information difference sums under one acquisition view angle;
And summing the information difference sum under each acquisition view angle again to obtain an objective function.
Optionally, the texture generating unit 1803 is specifically configured to:
if a face element is determined to be visible under one acquisition view angle according to the view angle corresponding diagram under the one acquisition view angle, determining the face element weight of the face element under the one acquisition view angle as a preset first face element weight;
if the face element is not visible under the acquisition view angle according to the view angle corresponding diagram under the acquisition view angle, determining the face element weight of the face element under the acquisition view angle as a preset second face element weight;
wherein the first primitive weight is greater than the second primitive weight.
Optionally, the texture fusion unit 1804 is specifically configured to:
for each acquisition view angle, respectively performing reverse rendering treatment on texture images under one acquisition view angle to obtain a defect texture map with texture holes under one acquisition view angle; performing reverse rendering processing on a normal image under one acquisition view angle to obtain a fusion weight set under one acquisition view angle, wherein the fusion weight set comprises fusion weights of each surface element under one acquisition view angle;
and carrying out texture fusion processing on the defect texture map under each acquisition view angle according to the fusion weight set under each acquisition view angle to obtain a target texture map corresponding to the object to be processed.
Optionally, each acquisition view corresponds to one acquisition device;
the texture fusion unit 1804 is specifically configured to:
for each bin under one acquisition view angle, determining the fusion weight of the bin under the one acquisition view angle according to the included angle between the normal direction of the bin and the main axis direction of the acquisition device; wherein the included angle is inversely related to the fusion weight;
and combining the fusion weights of each bin under one acquisition view angle to obtain a fusion weight set under one acquisition view angle.
Optionally, the texture fusion unit 1804 is specifically configured to:
respectively carrying out normalization processing on the fusion weight sets under all the acquisition view angles to obtain normalized weight sets under the corresponding view angles;
and carrying out texture fusion processing on the defect texture map under each acquisition view angle based on the normalized weight set under each acquisition view angle to obtain a target texture map corresponding to the object to be processed.
Based on the above embodiment, in the embodiment of the present application, image rendering processing is performed on the three-dimensional white mold by using a plurality of preset acquisition view angles, so as to obtain a normal image, a depth image and a view angle corresponding diagram of the three-dimensional object under the corresponding plurality of acquisition view angles; and taking the view angle corresponding graph of the three-dimensional object under the corresponding multiple acquisition view angles as guiding information, and generating target texture images corresponding to the acquisition view angles respectively according to depth images of the three-dimensional object under the corresponding multiple acquisition view angles and texture description information of the three-dimensional object input by the object, so that the target texture images generated under the multiple acquisition view angles have consistency, and the target texture images conform to the texture description information input by the object. And carrying out texture fusion on each target texture image under a plurality of acquisition view angles according to the normal image under each acquisition view angle, and obtaining the complete target texture map of the three-dimensional object through weighting fusion on the target texture maps under the plurality of acquisition view angles. In the processing process, according to texture description information given by an object, a target texture image under a plurality of acquisition view angles is fused by utilizing a plurality of preset acquisition view angles, and a target texture map with consistent textures under the plurality of acquisition view angles is generated on the white mold surface of the three-dimensional object, so that the manufacturing efficiency of the three-dimensional object texture map is improved.
For convenience of description, the above parts are described as being functionally divided into modules (or units) respectively. Of course, the functions of each module (or unit) may be implemented in the same piece or pieces of software or hardware when implementing the present application.
Having described the method and apparatus for generating texture maps of exemplary embodiments of the present application, an electronic device according to another exemplary embodiment of the present application is next described.
Those skilled in the art will appreciate that the various aspects of the present application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.) or an embodiment combining hardware and software aspects may be referred to herein as a "circuit," module "or" system.
The embodiment of the application also provides electronic equipment based on the same inventive concept as the embodiment of the method. In one embodiment, the electronic device may be a server, such as server 220 shown in FIG. 2. In this embodiment, the electronic device may be configured as shown in fig. 19, including a memory 1901, a communication module 1903, and one or more processors 1902.
A memory 1901 for storing computer programs for execution by the processor 1902. The memory 1901 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, programs required for running an instant communication function, and the like; the storage data area can store various instant messaging information, operation instruction sets and the like.
The memory 1901 may be a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 1901 may also be a nonvolatile memory (non-volatile memory), such as a read-only memory, a flash memory (flash memory), a hard disk (HDD) or a solid-state drive (SSD); or the memory 1901 may be any other medium that can be used to carry or store a desired computer program in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1901 may also be a combination of the above memories.
The processor 1902 may include one or more central processing units (central processing unit, CPU) or digital processing units, or the like. A processor 1902, configured to implement the above-described method for generating texture maps when calling a computer program stored in a memory 1901.
The communication module 1903 is used for communicating with a terminal device and other servers.
The specific connection medium between the memory 1901, the communication module 1903, and the processor 1902 is not limited in the embodiments of the present application. In the embodiment of the present application, the memory 1901 and the processor 1902 are connected by a bus 1904 in fig. 19; the bus 1904 is depicted by a thick line in fig. 19, and the connection manner between other components is merely illustrative and not limiting. The bus 1904 may be divided into an address bus, a data bus, a control bus, and the like. For ease of description, only one thick line is depicted in fig. 19, but this does not mean that there is only one bus or only one type of bus.
The memory 1901 stores therein a computer storage medium having stored therein computer executable instructions for implementing the method of generating a texture map according to the embodiment of the present application. The processor 1902 is configured to perform the method for generating a texture map described above, as shown in fig. 3.
In another embodiment, the electronic device may also be other electronic devices, such as the terminal device 210 shown in fig. 2. In this embodiment, the structure of the electronic device may include, as shown in fig. 20: communication assembly 2010, memory 2020, display unit 2030, camera 2040, sensor 2050, audio circuit 2060, bluetooth module 2070, processor 2080 and the like.
The communication component 2010 is for communicating with a server. In some embodiments, a circuit wireless fidelity (Wireless Fidelity, wiFi) module may be included, where the WiFi module belongs to a short-range wireless transmission technology, and the electronic device may help the user to send and receive information through the WiFi module.
Memory 2020 may be used for storing software programs and data. The processor 2080 executes various functions of the terminal device 210 and data processing by executing software programs or data stored in the memory 2020. The memory 2020 may include high-speed random access memory and may also include non-volatile memory such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. The memory 2020 stores an operating system that enables the terminal device 210 to operate. The memory 2020 may store an operating system and various application programs, and may also store a computer program for executing the method for generating a texture map according to the embodiment of the present application.
The display unit 2030 may also be used to display information input by a user or information provided to the user and a graphical user interface (graphical user interface, GUI) of various menus of the terminal device 210. Specifically, the display unit 2030 may include a display screen 2032 provided on the front surface of the terminal apparatus 210. The display 2032 may be configured in the form of a liquid crystal display, light emitting diodes, or the like. The display unit 2030 may be configured to display an interface of the object to be processed in the embodiment of the application, for example, an interface that presents a white model of the object to be processed corresponding to a three-dimensional object, a target texture map, and the like.
The display unit 2030 may also be used for receiving input numeric or character information, generating signal inputs related to user settings and function control of the terminal device 210, and in particular, the display unit 2030 may include a touch screen 2031 provided on the front surface of the terminal device 210, and may collect touch operations on or near the user, such as clicking buttons, dragging scroll boxes, and the like.
The touch screen 2031 may be covered on the display screen 2032, or the touch screen 2031 and the display screen 2032 may be integrated to implement input and output functions of the terminal device 210, and after integration, the touch screen may be simply referred to as a touch screen. The display unit 2030 may display an application program and corresponding operation steps in this application.
The camera 2040 may be used to capture still images, and a user may post images captured by the camera 2040 through an application. The camera 2040 may be one or a plurality of cameras. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the processor 2080 for conversion into a digital image signal.
The terminal device may also include at least one sensor 2050, such as an acceleration sensor 2051, a distance sensor 2052, a fingerprint sensor 2053, a temperature sensor 2054. The terminal device may also be configured with other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, light sensors, motion sensors, and the like.
The audio circuitry 2060, speaker 2061, microphone 2062 may provide an audio interface between the user and the terminal device 210. The audio circuit 2060 may transmit the received electrical signal converted from audio data to the speaker 2061, and be converted into a sound signal by the speaker 2061 to be output. The terminal device 210 may also be configured with a volume button for adjusting the volume of the sound signal. On the other hand, the microphone 2062 converts the collected sound signal into an electrical signal, receives it by the audio circuit 2060 and converts it into audio data, which is then output to the communication component 2010 for transmission to, for example, another terminal device 210, or to the memory 2020 for further processing.
The bluetooth module 2070 is used for exchanging information with other bluetooth devices having a bluetooth module through a bluetooth protocol. For example, the terminal device may establish a bluetooth connection with a wearable electronic device (e.g., a smart watch) that also has a bluetooth module through the bluetooth module 2070, so as to perform data interaction.
The processor 2080 is a control center of the terminal device, and connects various parts of the entire terminal using various interfaces and lines, and performs various functions of the terminal device and processes data by running or executing software programs stored in the memory 2020, and calling data stored in the memory 2020. In some embodiments, the processor 2080 may include one or more processing units; the processor 2080 may also integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., and a baseband processor that primarily handles wireless communications. It will be appreciated that the baseband processor described above may not be integrated into the processor 2080. The processor 2080 may run an operating system, an application program, a user interface display, a touch response, and a method for generating a texture map according to an embodiment of the present application. In addition, the processor 2080 is coupled to the display unit 2030.
In some possible embodiments, aspects of the method for generating a texture map provided herein may also be implemented in the form of a program product comprising a computer program for causing an electronic device to perform the steps in the method for generating a texture map according to the various exemplary embodiments of the present application described herein above, when the program product is run on an electronic device, e.g. the electronic device may perform the steps as shown in fig. 3.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and comprise a computer program and may be run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
A computer program embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer programs for performing the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer program may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of a remote electronic device, the remote electronic device may be connected to the consumer electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (for example, through the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having a computer-usable program embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (15)

1. A method of generating a texture map, the method comprising:
acquiring a three-dimensional object white model and texture description information corresponding to an object to be processed;
performing image rendering processing on the three-dimensional object white model according to a plurality of preset acquisition view angles respectively, to obtain a normal image, a depth image and a view angle correspondence map of the three-dimensional object white model under each acquisition view angle; wherein the normal image is used for describing the normal direction of each surface element of the object to be processed, and the view angle correspondence map is used for describing the index value of each surface element of the object to be processed under the corresponding acquisition view angle;
generating a target texture image corresponding to each acquisition view angle according to the depth image and the view angle correspondence map under each acquisition view angle and the texture description information;
and performing texture fusion processing on the obtained target texture images according to the normal image under each acquisition view angle to obtain a target texture map corresponding to the object to be processed.
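For orientation only, the following is a minimal, non-normative Python sketch of the flow recited in this claim, with the rendering and per-view texture generation replaced by random stand-ins; the function name generate_texture_map, the array shapes and the cosine-based blending are assumptions of the sketch, not features of the claimed method.

```python
import numpy as np

def generate_texture_map(white_model, text_prompt, views, tex_size=64):
    """Illustrative pipeline: render per-view images, generate per-view
    texture images, then fuse them into a single texture map."""
    normals, depths, corr_maps, textures = [], [], [], []
    for view in views:
        # Stand-ins for the rendering step: a real system would rasterize the
        # white model under this acquisition view angle.
        normal = np.random.rand(tex_size, tex_size, 3)                 # normal image
        depth = np.random.rand(tex_size, tex_size)                     # depth image
        corr = np.random.randint(0, 1000, (tex_size, tex_size))        # correspondence map (surface-element indices)
        normals.append(normal); depths.append(depth); corr_maps.append(corr)
        # Stand-in for per-view texture generation conditioned on the depth
        # image, the correspondence map and the text description.
        textures.append(np.random.rand(tex_size, tex_size, 3))

    # Normal-based fusion: weight each view by how directly each surface
    # element faces the view axis, then blend the per-view textures.
    view_axis = np.array([0.0, 0.0, 1.0])
    weights = np.stack([np.clip(n @ view_axis, 0.0, None) for n in normals])
    weights = weights / np.maximum(weights.sum(axis=0, keepdims=True), 1e-8)
    return np.einsum('vhw,vhwc->hwc', weights, np.stack(textures))

texture_map = generate_texture_map(white_model=None, text_prompt="a wooden chair", views=range(4))
print(texture_map.shape)
```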
2. The method of claim 1, wherein the generating a target texture image corresponding to each acquisition view angle according to the depth image and the view angle correspondence map under each acquisition view angle and the texture description information comprises:
performing at least one round of iterative denoising processing on the noisy texture image under each acquisition view angle according to the texture description information and the depth image and view angle correspondence map under each acquisition view angle, and taking the fused texture image under each acquisition view angle obtained in the last round of iterative denoising processing as the target texture image under the corresponding view angle;
wherein the fused texture image is obtained by fusing the denoised features under the acquisition view angles based on the view angle correspondence map under each acquisition view angle; and the depth image and the texture description information are denoising conditions for obtaining the denoised features in the iterative denoising processing.
3. The method of claim 2, wherein each round of the iterative denoising processing is performed by:
for each acquisition view angle, denoising the noisy texture image under one acquisition view angle according to the texture description information and the depth image under the one acquisition view angle to obtain denoised features under the one acquisition view angle; wherein the noisy texture image is obtained by random initialization in the first round of iterative denoising processing;
performing multi-view fusion processing on the denoised features under the acquisition view angles according to the view angle correspondence maps under the acquisition view angles, to obtain fused texture images under the acquisition view angles;
and taking the fused texture image under each acquisition view angle as the noisy texture image under the corresponding view angle in the next round of iterative denoising processing.
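A minimal Python sketch of the loop structure in this claim; denoise_step and multi_view_fuse are hypothetical names introduced here, the denoiser is a toy stand-in for the diffusion model of claim 4, and simple averaging stands in for the correspondence-based fusion of claim 5.

```python
import numpy as np

def denoise_step(noisy, depth, prompt_emb):
    # Stand-in for per-view denoising conditioned on text and depth.
    return noisy * 0.9 + 0.1 * np.random.rand(*noisy.shape)

def multi_view_fuse(features, corr_maps):
    # Stand-in for correspondence-based multi-view fusion: a plain average.
    return [np.mean(features, axis=0)] * len(features)

def iterative_denoising(depths, corr_maps, prompt_emb, num_steps=4, size=64):
    # Noisy texture images are randomly initialised in the first round.
    noisy = [np.random.randn(size, size, 4) for _ in depths]
    for _ in range(num_steps):
        # 1) per-view denoising conditioned on the text and the depth image
        denoised = [denoise_step(x, d, prompt_emb) for x, d in zip(noisy, depths)]
        # 2) multi-view fusion based on the correspondence maps
        fused = multi_view_fuse(np.stack(denoised), corr_maps)
        # 3) the fused images become the noisy inputs of the next round
        noisy = fused
    return fused  # the last round's fused images serve as the target texture images

depths = [np.random.rand(64, 64) for _ in range(4)]
corr_maps = [np.random.randint(0, 1000, (64, 64)) for _ in range(4)]
targets = iterative_denoising(depths, corr_maps, prompt_emb=None)
```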
4. The method of claim 3, wherein each acquisition view angle corresponds to a trained diffusion model; and the denoising the noisy texture image under one acquisition view angle according to the texture description information and the depth image under the one acquisition view angle to obtain denoised features under the one acquisition view angle comprises:
inputting the texture description information, the depth image under the one acquisition view angle and the noisy texture image under the one acquisition view angle into the corresponding diffusion model;
and extracting deep neural network features from the texture description information, the depth image and the noisy texture image based on the diffusion model, and denoising the deep neural network features to obtain the denoised features under the one acquisition view angle.
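An illustrative, non-normative sketch of the data flow of this claim using PyTorch; TinyDenoiser is a toy placeholder for the trained diffusion model, and the channel counts, tensor shapes and the way the text embedding is injected are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Placeholder for the trained diffusion model: it takes the noisy texture
    image, the depth image and a text embedding, and predicts denoised features."""
    def __init__(self, text_dim=16):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, 8)
        self.backbone = nn.Sequential(
            nn.Conv2d(4 + 1 + 8, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 4, 3, padding=1),
        )

    def forward(self, noisy, depth, text_emb):
        b, _, h, w = noisy.shape
        # Broadcast the text embedding spatially and stack all conditions.
        text_map = self.text_proj(text_emb).view(b, 8, 1, 1).expand(b, 8, h, w)
        x = torch.cat([noisy, depth, text_map], dim=1)   # joint deep features from all three inputs
        return self.backbone(x)                          # denoised features for this view

model = TinyDenoiser()
noisy = torch.randn(1, 4, 64, 64)    # noisy texture image under one acquisition view angle
depth = torch.rand(1, 1, 64, 64)     # depth image under the same view angle
text_emb = torch.randn(1, 16)        # embedding of the texture description information
denoised = model(noisy, depth, text_emb)
print(denoised.shape)                # torch.Size([1, 4, 64, 64])
```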
5. The method of claim 3, wherein the performing multi-view fusion processing on the denoised features under the acquisition view angles according to the view angle correspondence maps under the acquisition view angles, to obtain the fused texture images under the acquisition view angles, comprises:
based on a preset mapping relation, carrying out multi-view fusion processing on the denoised features under each acquisition view angle to obtain candidate fusion features under each acquisition view angle;
constructing an objective function based on the view angle correspondence map under each acquisition view angle, the denoised features under each acquisition view angle and the candidate fusion features under each acquisition view angle;
adjusting the preset mapping relation by minimizing the objective function to obtain an objective mapping relation;
based on the target mapping relation, carrying out multi-view fusion processing on the denoised features under each acquisition view angle to obtain fused features under each acquisition view angle;
and for each acquisition view angle, respectively carrying out feature decoding processing on the fused features under one acquisition view angle to obtain a fused texture image under the one acquisition view angle.
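One possible realization of this claim, sketched under the simplifying assumptions that the "mapping relation" is a shared per-surface-element feature table, that projecting it into a view is a lookup through that view's correspondence map, and that the objective is the weighted squared difference of claims 6 and 7; the feature decoding step is omitted.

```python
import torch

num_elements, feat_dim, num_views, hw = 500, 4, 4, 32 * 32

# Correspondence maps: for each view, the surface-element index of every pixel.
corr = [torch.randint(0, num_elements, (hw,)) for _ in range(num_views)]
# Denoised per-view features from the previous denoising step.
denoised = [torch.randn(hw, feat_dim) for _ in range(num_views)]
# Visibility-based surface element weights (all visible here for simplicity).
weights = [torch.ones(hw) for _ in range(num_views)]

# "Preset mapping relation": a shared feature per surface element, to be adjusted.
shared = torch.zeros(num_elements, feat_dim, requires_grad=True)
opt = torch.optim.Adam([shared], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    loss = 0.0
    for v in range(num_views):
        candidate = shared[corr[v]]                       # candidate fusion features in view v
        diff = ((candidate - denoised[v]) ** 2).sum(-1)   # information difference per pixel
        loss = loss + (weights[v] * diff).sum()           # weighted sum, then summed over views
    loss.backward()
    opt.step()                                            # adjust the mapping relation by minimizing the objective

# Fused features per view are read back through the adjusted mapping; a feature
# decoder (not shown) would turn them into fused texture images.
fused = [shared.detach()[corr[v]] for v in range(num_views)]
```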
6. The method of claim 5, wherein the constructing an objective function based on the view angle correspondence map under each acquisition view angle, the denoised features under each acquisition view angle and the candidate fusion features under each acquisition view angle comprises:
for each surface element, determining the surface element weight of the surface element under the corresponding view angle according to the view angle correspondence map under each acquisition view angle;
for each acquisition view angle, obtaining the information difference between the denoised feature and the candidate fusion feature of each surface element under one acquisition view angle, and performing weighted summation on the corresponding information differences according to the respective surface element weights of the surface elements under the one acquisition view angle, to obtain an information difference sum under the one acquisition view angle;
and summing the information difference sums under the acquisition view angles to obtain the objective function.
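In editorial shorthand (the symbols are not terms of the claim), with v indexing acquisition view angles, i indexing surface elements, w_{v,i} the surface element weight of claim 7, d_{v,i} the denoised feature and c_{v,i} the candidate fusion feature of surface element i under view angle v, one consistent reading of this claim is the objective

L = \sum_{v} \sum_{i} w_{v,i} \, \lVert d_{v,i} - c_{v,i} \rVert^{2}

that is, the weighted information difference sum of each view angle is summed again over all acquisition view angles, and the preset mapping relation of claim 5 is adjusted so as to minimize L.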
7. The method of claim 6, wherein the determining the surface element weight of the surface element under the corresponding view angle according to the view angle correspondence map under each acquisition view angle comprises:
if it is determined, according to the view angle correspondence map under one acquisition view angle, that the surface element is visible under the one acquisition view angle, determining the surface element weight of the surface element under the one acquisition view angle to be a preset first surface element weight;
if it is determined, according to the view angle correspondence map under the one acquisition view angle, that the surface element is not visible under the one acquisition view angle, determining the surface element weight of the surface element under the one acquisition view angle to be a preset second surface element weight;
wherein the first surface element weight is greater than the second surface element weight.
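A small sketch of this visibility rule, assuming the correspondence map stores the indices of the surface elements visible in the view; the particular weight values below are arbitrary, since the claim only requires the first weight to exceed the second.

```python
import numpy as np

FIRST_WEIGHT, SECOND_WEIGHT = 1.0, 0.0   # preset weights; only FIRST_WEIGHT > SECOND_WEIGHT is required

def surface_element_weights(corr_map, element_ids):
    """Weights of the given surface elements under one acquisition view angle:
    elements appearing in this view's correspondence map are visible and get
    the first weight; the others get the (smaller) second weight."""
    visible = np.isin(element_ids, corr_map)
    return np.where(visible, FIRST_WEIGHT, SECOND_WEIGHT)

corr_map = np.random.randint(0, 100, (32, 32))        # indices of the surface elements seen in this view
weights = surface_element_weights(corr_map, np.arange(100))
```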
8. The method according to any one of claims 1 to 7, wherein the performing texture fusion processing on the target texture image under each acquisition view angle according to the normal image under each acquisition view angle to obtain the target texture map corresponding to the object to be processed comprises:
for each acquisition view angle, performing inverse rendering processing on the target texture image under one acquisition view angle, to obtain a defective texture map with texture holes under the one acquisition view angle, and performing inverse rendering processing on the normal image under the one acquisition view angle, to obtain a fusion weight set under the one acquisition view angle, wherein the fusion weight set comprises the fusion weight of each surface element under the one acquisition view angle;
and performing texture fusion processing on the defective texture map under each acquisition view angle according to the fusion weight set under each acquisition view angle to obtain the target texture map corresponding to the object to be processed.
9. The method of claim 8, wherein each acquisition view angle corresponds to an acquisition device; and the performing inverse rendering processing on the normal image under the one acquisition view angle to obtain the fusion weight set under the one acquisition view angle comprises:
for each surface element under the one acquisition view angle, determining the fusion weight of the surface element under the one acquisition view angle according to the included angle between the normal direction of the surface element and the principal axis direction of the acquisition device; wherein the included angle is inversely related to the fusion weight;
and combining the fusion weights of the surface elements under the one acquisition view angle to obtain the fusion weight set under the one acquisition view angle.
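An illustrative weighting consistent with this claim, using the cosine of the included angle so that a larger angle yields a smaller weight; the cosine itself is an assumption of the sketch, since the claim only requires the inverse relation between angle and weight.

```python
import numpy as np

def fusion_weights(normal_image, principal_axis):
    """Per-pixel fusion weights for one acquisition view angle: the cosine of
    the angle between each surface-element normal and the acquisition device's
    principal axis, clipped to be non-negative."""
    axis = np.asarray(principal_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    cos_angle = np.clip(normal_image @ axis, 0.0, 1.0)   # larger angle -> smaller cosine -> smaller weight
    return cos_angle

normals = np.random.rand(64, 64, 3) * 2 - 1
normals = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
weight_set = fusion_weights(normals, principal_axis=[0.0, 0.0, -1.0])
```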
10. The method of claim 8, wherein the performing texture fusion processing on the defective texture map under each acquisition view angle according to the fusion weight set under each acquisition view angle to obtain the target texture map corresponding to the object to be processed comprises:
respectively carrying out normalization processing on the fusion weight sets under the acquisition view angles to obtain normalized weight sets under the corresponding view angles;
and performing texture fusion processing on the defective texture map under each acquisition view angle based on the normalized weight set under each acquisition view angle to obtain the target texture map corresponding to the object to be processed.
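A minimal sketch of the normalization and blending described in this claim; the array shapes and per-texel normalization are assumptions of the sketch, and holes would in practice receive zero weight from the views that do not cover them.

```python
import numpy as np

def fuse_texture_maps(defective_maps, weight_sets, eps=1e-8):
    """Normalize the per-view fusion weights so they sum to one at every texel,
    then blend the defective (hole-containing) texture maps into the target map."""
    w = np.stack(weight_sets)                       # (num_views, H, W)
    w = w / np.maximum(w.sum(axis=0, keepdims=True), eps)
    maps = np.stack(defective_maps)                 # (num_views, H, W, 3)
    return np.einsum('vhw,vhwc->hwc', w, maps)

num_views = 4
defective_maps = [np.random.rand(64, 64, 3) for _ in range(num_views)]
weight_sets = [np.random.rand(64, 64) for _ in range(num_views)]
target_texture_map = fuse_texture_maps(defective_maps, weight_sets)
```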
11. A texture map generation apparatus, comprising:
an acquisition unit, configured to acquire a three-dimensional object white model and texture description information corresponding to an object to be processed;
a rendering unit, configured to perform image rendering processing on the three-dimensional object white model according to a plurality of preset acquisition view angles respectively, to obtain a normal image, a depth image and a view angle correspondence map of the three-dimensional object white model under each acquisition view angle, wherein the normal image is used for describing the normal direction of each surface element of the object to be processed, and the view angle correspondence map is used for describing the index value of each surface element of the object to be processed under the corresponding acquisition view angle;
a texture generation unit, configured to generate a target texture image corresponding to each acquisition view angle according to the depth image and the view angle correspondence map under each acquisition view angle and the texture description information;
and a texture fusion unit, configured to perform texture fusion processing on the obtained target texture images according to the normal image under each acquisition view angle to obtain a target texture map corresponding to the object to be processed.
12. The apparatus of claim 11, wherein the texture generation unit is specifically configured to:
perform at least one round of iterative denoising processing on the noisy texture image under each acquisition view angle according to the texture description information and the depth image and view angle correspondence map under each acquisition view angle, and take the fused texture image under each acquisition view angle obtained in the last round of iterative denoising processing as the target texture image under the corresponding view angle.
13. An electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 10.
14. A computer readable storage medium, characterized in that it comprises a computer program for causing an electronic device to perform the steps of the method according to any one of claims 1-10 when said computer program is run on the electronic device.
15. A computer program product comprising a computer program, the computer program being stored on a computer readable storage medium; when the computer program is read from the computer readable storage medium by a processor of an electronic device, the processor executes the computer program, causing the electronic device to perform the steps of the method of any one of claims 1-10.
CN202311532994.7A 2023-11-15 2023-11-15 Method and device for generating texture map, electronic equipment and storage medium Pending CN117496036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311532994.7A CN117496036A (en) 2023-11-15 2023-11-15 Method and device for generating texture map, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311532994.7A CN117496036A (en) 2023-11-15 2023-11-15 Method and device for generating texture map, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117496036A true CN117496036A (en) 2024-02-02

Family

ID=89668938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311532994.7A Pending CN117496036A (en) 2023-11-15 2023-11-15 Method and device for generating texture map, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117496036A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118154826A (en) * 2024-05-13 2024-06-07 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN118247411A (en) * 2024-05-28 2024-06-25 淘宝(中国)软件有限公司 Material map generation method and device


Similar Documents

Publication Publication Date Title
JP7373554B2 (en) Cross-domain image transformation
CN117496036A (en) Method and device for generating texture map, electronic equipment and storage medium
CN112562019A (en) Image color adjusting method and device, computer readable medium and electronic equipment
US20220198731A1 (en) Pixel-aligned volumetric avatars
JP2022553252A (en) IMAGE PROCESSING METHOD, IMAGE PROCESSING APPARATUS, SERVER, AND COMPUTER PROGRAM
CN112598780B (en) Instance object model construction method and device, readable medium and electronic equipment
CN116957932A (en) Image generation method and device, electronic equipment and storage medium
CN109920016A (en) Image generating method and device, electronic equipment and storage medium
WO2024041235A1 (en) Image processing method and apparatus, device, storage medium and program product
CN115249304A (en) Training method and device for detecting segmentation model, electronic equipment and storage medium
CN117688204A (en) Training method and device for video recommendation model, electronic equipment and storage medium
CN116977547A (en) Three-dimensional face reconstruction method and device, electronic equipment and storage medium
CN117011415A (en) Method and device for generating special effect text, electronic equipment and storage medium
CN116958852A (en) Video and text matching method and device, electronic equipment and storage medium
CN111652831B (en) Object fusion method and device, computer-readable storage medium and electronic equipment
CN113362260A (en) Image optimization method and device, storage medium and electronic equipment
CN117576245B (en) Method and device for converting style of image, electronic equipment and storage medium
Li et al. Deep surface normal estimation on the 2-sphere with confidence guided semantic attention
CN116978079A (en) Image recognition method and device, electronic equipment and storage medium
CN112733731B (en) Monocular-based multi-modal depth map generation method, system, device and storage medium
CN113240796B (en) Visual task processing method and device, computer readable medium and electronic equipment
US20230099463A1 (en) Window, Door, and Opening Detection for 3D Floor Plans
CN116977195A (en) Method, device, equipment and storage medium for adjusting restoration model
He Interactive Virtual Reality Indoor Space Roaming System Based on 3D Vision
JP2024501958A (en) Pixel aligned volumetric avatar

Legal Events

Date Code Title Description
PB01 Publication