CN114419335A - Training and texture migration method of texture recognition model and related device - Google Patents

Info

Publication number
CN114419335A
CN114419335A (Application CN202210010675.9A)
Authority
CN
China
Prior art keywords
texture
image data
detection network
mapping
inputting
Prior art date
Legal status
Pending
Application number
CN202210010675.9A
Other languages
Chinese (zh)
Inventor
胡忠冰
张彤
Current Assignee
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd filed Critical Bigo Technology Pte Ltd
Priority to CN202210010675.9A priority Critical patent/CN114419335A/en
Publication of CN114419335A publication Critical patent/CN114419335A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
                • G06N 3/045 Combinations of networks
                • G06N 3/048 Activation functions
              • G06N 3/08 Learning methods
                • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for training a texture recognition model, a texture migration method, and a related device. The training method includes: acquiring first image data; calling a mapping detection network to extract multiple frames of texture maps from the first image data; calling a parameter detection network to extract texture parameters from the first image data; differentiably rendering the texture maps and the texture parameters to the scene in the first image data to obtain second image data; and training the mapping detection network and the parameter detection network with the second image data as a supervision signal. The mapping detection network and the parameter detection network operate automatically, so the user has no perception of them and the learning threshold is low, and they are able to render textures into a scene. Designers therefore need fewer texture-processing operations, which improves simplicity of operation, reduces the time and effort spent on modeling, improves efficiency, and reduces cost.

Description

Training and texture migration method of texture recognition model and related device
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method for training a texture recognition model, a texture migration method, and a related device.
Background
In scenarios such as games and video entertainment, characters, props, buildings and other objects are modeled, that is, the objects are designed according to the proportions of the corresponding characters, props, buildings, and so on.
At present, modeling is usually done manually by a designer using a modeling engine. The learning threshold of a modeling engine is high and its operation is cumbersome, so the designer must spend a large amount of time and effort on modeling, which makes the cost high and the efficiency low.
Disclosure of Invention
The invention provides a method for training a texture recognition model, a texture migration method, and a related device, aiming to solve the problems of high modeling cost and low efficiency.
In a first aspect, an embodiment of the present invention provides a training method for a texture recognition model, where the texture recognition model includes a mapping detection network and a parameter detection network, and the method includes:
acquiring first image data;
calling the mapping detection network to extract a multi-frame texture mapping from the first image data;
calling the parameter detection network to extract texture parameters from the first image data;
differentially rendering the texture map and the texture parameters to a scene in the first image data to obtain second image data;
and training the mapping detection network and the parameter detection network by taking the second image data as a supervision signal.
In a second aspect, an embodiment of the present invention further provides a texture migration method, including:
loading a texture recognition model, wherein the texture recognition model comprises a mapping detection network and a parameter detection network, and acquiring first image data;
calling the mapping detection network to extract a multi-frame texture mapping from the first image data;
calling the parameter detection network to extract texture parameters from the first image data;
differentiably rendering the texture map and the texture parameters into a scene independent of the first image data to obtain second image data.
In a third aspect, an embodiment of the present invention further provides a training apparatus for a texture recognition model, where the texture recognition model includes a mapping detection network and a parameter detection network, and the apparatus includes:
the image data acquisition module is used for acquiring first image data;
the texture mapping extraction module is used for calling the mapping detection network to extract a plurality of frames of texture mapping from the first image data;
the texture parameter extraction module is used for calling the parameter detection network to extract texture parameters from the first image data;
an image data rendering module, configured to render the texture map and the texture parameter differentially to a scene in the first image data, so as to obtain second image data;
and the network training module is used for training the mapping detection network and the parameter detection network by taking the second image data as a supervision signal.
In a fourth aspect, an embodiment of the present invention further provides a texture migration apparatus, including:
a texture recognition model loading module for loading a texture recognition model, wherein the texture recognition model comprises a mapping detection network and a parameter detection network,
the image data acquisition module is used for acquiring first image data;
the texture mapping extraction module is used for calling the mapping detection network to extract a plurality of frames of texture mapping from the first image data;
the texture parameter extraction module is used for calling the parameter detection network to extract texture parameters from the first image data;
and the image data rendering module is used for rendering the texture map and the texture parameter into a scene independent of the first image data in a differentiable manner to obtain second image data.
In a fifth aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for training a texture recognition model according to the first aspect or the method for texture migration according to the second aspect.
In a sixth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for training a texture recognition model according to the first aspect or the method for texture migration according to the second aspect.
In this embodiment, first image data is acquired; a mapping detection network is called to extract multiple frames of texture maps from the first image data; a parameter detection network is called to extract texture parameters from the first image data; the texture maps and the texture parameters are differentiably rendered to the scene in the first image data to obtain second image data; and the mapping detection network and the parameter detection network are trained with the second image data as a supervision signal. The mapping detection network and the parameter detection network operate automatically, so the user has no perception of them and the learning threshold is low, and they are able to render textures into a scene. Designers therefore need fewer texture-processing operations, which improves simplicity of operation, reduces the time and effort spent on modeling, improves efficiency, and reduces cost. In addition, because the second image data serves as the supervision signal, real texture parameters are not required as supervision, which reduces the difficulty of training the mapping detection network and the parameter detection network.
Drawings
Fig. 1 is a flowchart of a training method of a texture recognition model according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a map detection network according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a parameter detection network according to an embodiment of the present invention;
FIG. 4 is a flowchart of a texture migration method according to a second embodiment of the present invention;
FIG. 5 is a diagram illustrating a texture migration process according to a second embodiment of the present invention;
fig. 6 is a schematic structural diagram of a training apparatus for a texture recognition model according to a third embodiment of the present invention;
fig. 7 is a schematic structural diagram of a texture migration apparatus according to a fourth embodiment of the present invention;
fig. 8 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a training method for a texture recognition model according to an embodiment of the present invention. This embodiment is applicable to training the texture recognition model in a self-supervised manner. The method can be executed by a training apparatus for the texture recognition model, which can be implemented in software and/or hardware and configured in a computer device such as a server, a workstation, or a personal computer, and specifically includes the following steps:
step 101, acquiring first image data.
In this embodiment, multiple frames of first image data may be collected from public data sets or similar sources. The first image data is two-dimensional image data captured in a real scene containing objects such as people, animals, tools, and buildings, and these objects have real textures.
Step 102, calling a mapping detection network to extract a multi-frame texture mapping from the first image data.
In this embodiment, the texture recognition model includes a map detection network, which is a reconstruction network such as a fully convolutional network and is configured to parse multiple frames of texture maps of an object from the first image data; that is, the input of the map detection network is the first image data and the output is the multi-frame texture maps of the object.
In computer graphics, texture mapping is a technique that uses images, functions, or other data sources to change the appearance of an object's surface. For example, a color image of a brick wall may be applied to a polygon without an accurate representation of the wall's geometry. When the polygon is viewed, the color image appears at the position of the polygon. Unless the viewer gets close to the wall, they will not usually notice the missing geometric detail (for instance, that the image of bricks and mortar is actually displayed on a smooth surface). Combining the image and the object surface in this way saves a lot of resources in terms of modeling, storage space, and speed.
The type of texture map may be set according to the requirements of the service, which is not limited in this embodiment.
In an embodiment of the present invention, as shown in fig. 2, the map detection network includes an Encoder 210, a Pooling layer 220, and a Decoder 230. In this embodiment, step 102 may include the following steps:
step 1021, inputting the first image data into an encoder to execute encoding operation, and obtaining the target encoding characteristic.
The encoder is used to convert an input sequence of indefinite length into a context variable of fixed length, encoding the information of the input sequence in that context variable. Its structure may be a multilayer RNN (Recurrent Neural Network), separable convolution and residual modules, and so on. The encoder performs the conversion in hidden layers, transforming the hidden states of each time step into the context variable through a custom function.
In the present embodiment, as shown in fig. 2, for each frame of the first image data 200, the Encoder210 is invoked to encode the first image data into features, which are denoted as target encoding features, that is, the first image data 200 is input into the Encoder210, and the Encoder210 performs an encoding operation on the first image data 200, and outputs the target encoding features.
In one example, as shown in fig. 2, the Encoder 210 includes a first Encoder Block 211, a second Encoder Block 212, and a third Encoder Block 213. Each Encoder Block is a packaged abstraction of structures commonly used in deep learning, which makes it easy to reuse structures from other projects and reduces development cost.
Further, the structures of the first Encoder Block 211, the second Encoder Block 212, and the third Encoder Block 213 may be the same or different, which is not limited in this example.
In a specific implementation, the first image data 200 is input into a first encoding Block Encoder Block 211 for down-sampling, a first candidate encoding feature is obtained, that is, the first encoding Block Encoder Block 211 performs down-sampling on the first image data 200, extracts a feature in the first image data 200, and records the feature as the first candidate encoding feature, and outputs the first candidate encoding feature.
And inputting the first candidate coding features into a second coding Block Encoder Block 212 for downsampling to obtain second candidate coding features, namely the second coding Block Encoder Block 212 downsamples the first candidate coding features, extracts the features in the first candidate coding features, records the features as the second candidate coding features, and outputs the second candidate coding features.
And inputting the second candidate coding features into a third coding Block Encoder Block 213 for down-sampling to obtain target coding features, namely, the third coding Block Encoder Block 213 performs down-sampling on the second candidate coding features, extracts features in the second candidate coding features, records the features as the target coding features, and outputs the target coding features.
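As a concrete illustration, the following PyTorch sketch shows one possible form of such a three-block encoder; the channel widths, kernel sizes, and use of stride-2 convolutions for downsampling are assumptions made for illustration and are not prescribed by this embodiment.

import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    # One downsampling stage: a stride-2 convolution followed by a refining convolution.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class Encoder(nn.Module):
    # Three stacked encoder blocks, as in Fig. 2 (blocks 211, 212, 213), for RGB input.
    def __init__(self, in_ch=3, widths=(64, 128, 256)):
        super().__init__()
        self.block1 = EncoderBlock(in_ch, widths[0])
        self.block2 = EncoderBlock(widths[0], widths[1])
        self.block3 = EncoderBlock(widths[1], widths[2])

    def forward(self, x):
        f1 = self.block1(x)   # first candidate encoding feature
        f2 = self.block2(f1)  # second candidate encoding feature
        f3 = self.block3(f2)  # target encoding feature
        return f1, f2, f3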
Step 1022, inputting the target coding features into the pooling layer to perform pooling operation, so as to obtain target pooling features.
In this embodiment, as shown in fig. 2, with the goal of improving the receptive field, the Pooling layer Pooling220 is called to perform Pooling operation on the target coding feature, so as to obtain the target Pooling feature, that is, the input of the Pooling layer Pooling220 is the target coding feature, and the output is the target Pooling feature.
The receptive field describes the region of the original input (here, the target coding feature) that neurons at different positions in the network can respond to. The larger a neuron's receptive field, the larger the region of the input it can access, which also means its features tend to be more global and of a higher semantic level; the smaller the receptive field, the more local and detailed its features tend to be. The receptive field can therefore be used to roughly judge the abstraction level of each layer.
Illustratively, as shown in fig. 2, the Pooling layer Pooling 220 includes an Atrous Spatial Pyramid Pooling (ASPP) module 221. In this example, the target coding feature may be input into the ASPP module 221 to perform atrous (dilated) convolution at multiple scales, obtaining multiple candidate pooling features, which are then fused into the target pooling feature. That is, the target coding feature is convolved with atrous convolutions at different dilation rates to obtain information at different scales (the candidate pooling features); the candidate pooling features are concatenated together, which increases the number of channels, and a convolution with a predetermined kernel (e.g., 1 × 1) reduces the number of channels to the desired value. The result captures the context of the target coding feature at multiple scales.
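A minimal PyTorch sketch of such an ASPP module follows; the dilation rates (1, 6, 12, 18) are common choices assumed here for illustration rather than values fixed by this embodiment.

import torch
import torch.nn as nn

class ASPP(nn.Module):
    # Parallel dilated (atrous) convolutions at several rates, concatenated and
    # projected back to the desired number of channels with a 1x1 convolution.
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch,
                      kernel_size=3 if r > 1 else 1,
                      padding=r if r > 1 else 0,
                      dilation=r)
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        pooled = [branch(x) for branch in self.branches]   # candidate pooling features
        return self.project(torch.cat(pooled, dim=1))      # fused target pooling feature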
Step 1023, inputting the target pooling feature into a decoding layer to execute decoding operation, and obtaining the multi-frame texture map.
At the initial time step, the decoder takes a special start-of-sequence symbol as input, and the output sequence is considered complete when the decoder produces the end-of-sequence symbol at some time step.
The context variable output by the encoder encodes the information of the entire input sequence. Given the output sequence in the training samples, the conditional probability of the decoder output at each time step is computed based on the previously generated outputs and the context variable.
The decoder, which is typically built from multilayer RNNs, separable convolutions, residual modules, and the like, takes the output of the previous time step and the context variable as input at each time step of the output sequence, and transforms the hidden state of the previous time step into the hidden state of the current time step.
In this embodiment, as shown in fig. 2, the Decoder 230 may be invoked to decode the target pooled feature into a multi-frame texture map 240, i.e., the target pooled feature is input into the Decoder 230, and the Decoder 230 decodes the target pooled feature and outputs the multi-frame texture map 240.
Further, the map detection network is a network with a residual linking (skip connection), the target pooled feature and the feature generated when the encoding operation is performed on the first image data may be input into a decoding layer through the residual linking, and the decoding layer performs the decoding operation on the target pooled feature and the feature generated when the encoding operation is performed on the first image data, so as to obtain the multi-frame texture map.
Residual links, also called skip links, alleviate the vanishing-gradient and exploding-gradient problems during training when added to a network, so the network can be trained more easily. Intuitively, gradients at deep layers of the network can flow back to shallow layers more easily, which makes choosing the number of layers in the network easier.
In computer vision, each layer of a network extracts features at a different level, including low, middle, and high levels; the deeper the network, the more levels of features are extracted and the more combinations of information across layers there are, and the level of the features rises as the network gets deeper. Network depth is thus one of the factors behind good performance. However, vanishing and exploding gradients become obstacles to training deep networks and can prevent convergence. In this embodiment, residual links are introduced as in residual networks: when the input signal propagates forward it can be passed directly from any lower layer to a higher layer, and because the link contains an identity mapping, the network degradation problem can be alleviated to some extent; the error signal can likewise be propagated directly to lower layers without passing through any intermediate weight matrix, which relieves gradient vanishing to some extent. Forward and backward propagation of information thus become smoother, so the problems of vanishing and exploding gradients during training can be effectively mitigated and an accurate training result can be obtained without having to increase the number of layers in the network.
In one example, as shown in fig. 2, the Decoder 230 includes a first Decoder Block 231, a second Decoder Block 232, and a third Decoder Block 233. Each Decoder Block is a packaged abstraction of structures commonly used in deep learning, which makes it easy to reuse structures from other projects and reduces development cost.
Further, the first Decoder Block 231, the second Decoder Block 232, and the third Decoder Block 233 may have the same or different structures, which is not limited in this example.
In a specific implementation, the target pooling feature is input into the first Decoder Block 231 for up-sampling to obtain a first decoding feature.
The first decoding feature and the second candidate coding feature are input into the second Decoder Block 232 for up-sampling to obtain a second decoding feature; that is, the first decoding feature and the second candidate coding feature are combined into a first combined feature, the first combined feature is input into the second Decoder Block 232, and the second Decoder Block 232 up-samples the first combined feature and outputs the second decoding feature.
The second decoding feature and the first candidate coding feature are input into the third Decoder Block 233 for up-sampling to obtain the multi-frame texture maps; that is, the second decoding feature and the first candidate coding feature are combined into a second combined feature, the second combined feature is input into the third Decoder Block 233, and the third Decoder Block up-samples the second combined feature and outputs the multi-frame texture maps.
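The following PyTorch sketch illustrates a decoder of this kind, where each block up-samples its input and the second and third blocks concatenate the corresponding candidate encoding features through skip links; the channel widths (matching the encoder sketch above) and the 10-channel output head (3 + 3 + 3 + 1 channels for diffuse, highlight, normal, and roughness maps) are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    # One upsampling stage; the skip feature from the encoder is concatenated
    # with the upsampled feature before the convolution.
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip=None):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        if skip is not None:
            x = torch.cat([x, skip], dim=1)   # residual/skip link from the encoder
        return self.conv(x)

class MapDecoder(nn.Module):
    # Three decoder blocks producing a stack of texture maps
    # (diffuse 3 + highlight 3 + normal 3 + roughness 1 = 10 channels).
    def __init__(self, widths=(256, 128, 64), out_maps=10):
        super().__init__()
        self.block1 = DecoderBlock(widths[0], 0, widths[1])
        self.block2 = DecoderBlock(widths[1], widths[1], widths[2])
        self.block3 = DecoderBlock(widths[2], widths[2], widths[2])
        self.head = nn.Conv2d(widths[2], out_maps, kernel_size=1)

    def forward(self, pooled, f1, f2):
        x = self.block1(pooled)   # first decoding feature
        x = self.block2(x, f2)    # fused with the second candidate encoding feature
        x = self.block3(x, f1)    # fused with the first candidate encoding feature
        return torch.sigmoid(self.head(x))  # texture maps in [0, 1]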
In this example, as shown in fig. 2, for a service such as virtual fitting, the texture map 240 includes at least one of the following:
1. Diffuse reflection map diffuse 241
The diffuse reflection map diffuse 241 represents the base color of the object surface, called Albedo in Unity. In the specular/glossiness workflow, metallic materials have no (or very little) diffuse reflection, so they are filled with black. Non-metallic materials reflect less light than metals; the refracted light is partly absorbed and generally scattered back out of the surface, so non-metallic materials are filled with their diffuse color. It is important that the diffuse texture does not contain any illumination information, because illumination will be added on top of the object's texture on an ambient basis.
2. Highlight map specular 242
The highlight (specular) map 242 indicates the range, intensity, and color of specular highlights. In the specular-map workflow, a brighter color indicates a stronger highlight and black indicates no highlight.
3. Normal map normal 243
The normal map normal 243 contains angle information but not any altitude information, and its R, G, B stores information indicating the direction and steepness of the slope.
4. Roughness map roughness 244
The roughness map roughness 244 defines the roughness of the material: 0 (black, 0 in sRGB) indicates smooth and 1 (white, 255 in sRGB) indicates rough. Roughness refers to surface irregularities that cause light to diffuse; the reflection direction varies with the surface roughness while the light intensity remains constant. The rougher the surface, the more diffuse and darker the highlight; the smoother the surface, the more concentrated the specular reflection, so although the total amount of reflected light is the same, the highlight appears brighter and more intense.
The roughness value sampled from the roughness map 244 affects the statistical orientation of the microfacets of a surface. A rougher surface gives a broader, more blurred specular highlight, while a smoother surface gives a concentrated, sharp specular reflection.
It should be understood that the structure of the above map detection network and the texture maps it outputs are only examples. When implementing the embodiment of the present invention, other map detection network structures and output texture maps may be set according to the actual situation, for example, 4 encoding blocks in the encoder, 4 decoding blocks in the decoder, a Receptive Field Block (RFB) to enlarge the receptive field, or additional outputs such as a self-emissive map or a transparency map. In addition, those skilled in the art may also adopt other map detection network structures and texture maps according to actual needs, which is not limited in the embodiment of the present invention.
Step 103, invoking a parameter detection network to extract texture parameters from the first image data.
In this embodiment, the texture recognition model includes a parameter detection network, which belongs to a regression network, and is configured to parse texture parameters texture _ params of an object from the first image data, that is, the input of the parameter detection network is the first image data, and the output is the texture parameters texture _ params of the object.
In an embodiment of the present invention, as shown in fig. 3, the parameter detection network includes a feature extraction network 310 and a Multilayer Perceptron (MLP) 320, and in this embodiment, step 103 may include the following steps:
and step 1031, inputting the first image data into a feature extraction network to extract texture features.
As shown in fig. 3, the feature extraction network 310 is used to extract high-dimensional features from the first image data 200, and the features are recorded as texture features, that is, the input of the feature extraction network 310 is the first image data 200, and the output is the texture features.
In one example, as shown in fig. 3, the feature extraction network 310 includes a first Residual Block 311, a second Residual Block 312, and a third Residual Block 313. Each Residual Block is a packaged abstraction of a residual structure, which adds a skip connection on top of the plain forward path to improve the performance of deep networks; packaging them this way makes it easy to reuse structures from other projects and reduces development cost.
Further, the structures of the first Residual Block 311, the second Residual Block312, and the third Residual Block 313 may be the same or different, and this example is not limited in this respect.
In a specific implementation, the first image data 200 is input into the first Residual Block 311 to extract a first residual feature, the first residual feature is input into the second Residual Block 312 to extract a second residual feature, and the second residual feature is input into the third Residual Block 313 to extract the texture feature.
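A minimal PyTorch sketch of such a residual feature extraction network is shown below; the channel widths and the use of max pooling for downsampling are illustrative assumptions.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Plain residual block: two convolutions with a skip connection added on top
    # of the forward path, followed by stride-2 downsampling.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()
        self.relu = nn.ReLU(inplace=True)
        self.down = nn.MaxPool2d(2)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out = self.relu(out + self.proj(x))   # skip connection
        return self.down(out)

class FeatureExtractor(nn.Module):
    # Three stacked residual blocks (311, 312, 313) producing the texture feature.
    def __init__(self, in_ch=3, widths=(64, 128, 256)):
        super().__init__()
        self.blocks = nn.Sequential(
            ResidualBlock(in_ch, widths[0]),
            ResidualBlock(widths[0], widths[1]),
            ResidualBlock(widths[1], widths[2]),
        )

    def forward(self, x):
        return self.blocks(x)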
And step 1032, inputting the texture features into the multilayer perceptron to be mapped into texture parameters.
As shown in fig. 3, the multi-layered perceptron MLP 320 is configured to map texture features into texture parameters 330, i.e. the input of the multi-layered perceptron MLP 320 is texture features and the output is texture parameters 330.
A multilayer perceptron consists of multiple layers: the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. The number of hidden layers is not fixed, so a suitable number can be chosen according to the requirements of the service; likewise, the number of neurons in the output layer is not limited, so it can be chosen according to the service requirements (that is, the number of texture parameters).
In one example, as shown in fig. 3, the multi-layered perceptron MLP 320 includes a first fully-connected layer 321 and a second fully-connected layer 322, both of the first fully-connected layer 321 and the second fully-connected layer 322 belong to fully connected layers (FCs), and the fully-connected layer FC functions to map the learned feature representation to the sample label space. In practical use, the fully-connected layer may be implemented by a convolution operation: a fully connected layer with a fully connected preceding layer can be converted into a convolution with a convolution kernel of 1 × 1; and the fully-connected layer of which the front layer is the convolution layer can be converted into the global convolution with the convolution kernel h multiplied by w, wherein h and w are the height and width of the convolution result of the front layer respectively.
In this example, the texture features are input into the first fully-connected layer FC 321 and mapped as candidate parameters, and activation processing is performed on the candidate parameters using an activation function such as sigmoid, tanh, Relu, or the like.
The candidate parameters are input into the second fully-connected layer FC 322 and mapped to texture parameters 330, and the texture parameters are activated by using an activation function such as sigmoid, tanh, Relu, or the like.
Activation functions introduce non-linear factors into the neurons, so that the network can approximate arbitrary non-linear functions and the neural network can be used in a wider range of non-linear models.
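The sketch below shows one possible form of this two-layer perceptron in PyTorch; the global average pooling used to flatten the texture feature (output by the feature extraction network sketched above), the hidden width, the number of texture parameters, and the choice of ReLU and sigmoid activations are all assumptions made for illustration.

import torch
import torch.nn as nn

class TextureParamMLP(nn.Module):
    # Two fully connected layers mapping the texture feature to the texture parameter vector.
    def __init__(self, feat_ch=256, hidden=128, num_params=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # collapse spatial dims to a vector
        self.fc1 = nn.Linear(feat_ch, hidden)      # first fully connected layer
        self.fc2 = nn.Linear(hidden, num_params)   # second fully connected layer

    def forward(self, texture_feature):
        x = self.pool(texture_feature).flatten(1)
        x = torch.relu(self.fc1(x))        # candidate parameters with activation
        return torch.sigmoid(self.fc2(x))  # texture parameters in [0, 1]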
Of course, the structure of the parameter detection network is only an example, and when the embodiment of the present invention is implemented, the structure of other parameter detection networks may be set according to actual situations, for example, 4 Residual blocks are set in the feature extraction network, 3 full connection layers are set in the multilayer perceptron, and the like, which is not limited in this embodiment of the present invention. In addition, besides the structure of the parameter detection network, a person skilled in the art may also adopt other structures of the parameter detection network according to actual needs, and the embodiment of the present invention is not limited to this.
Step 104, rendering the texture map and the texture parameters to a scene in the first image data in a differentiable manner to obtain second image data.
Step 105, training the map detection network and the parameter detection network by taking the second image data as a supervision signal.
For the parameter detection network, since the real texture parameters of the first image data cannot be obtained, the real texture parameters cannot be used as supervision to directly train the parameter detection network.
Therefore, in this embodiment, the map detection network and the parameter detection network are treated as a whole. Taking Seq2Seq as an analogy, the map detection network and the parameter detection network play the role of the encoder, and a renderer is added as the decoder; the texture maps output by the map detection network and the texture parameters output by the parameter detection network are the intermediate quantities, which are input into the renderer to differentiably render the second image data. The second image data is then used as the supervision signal for the whole, so the parameters of the map detection network and the parameter detection network can be learned without requiring real texture parameters as a supervision signal, thereby realizing self-supervised learning.
Further, the texture maps and the texture parameters are input into a renderer under a differentiable framework such as PyTorch3D or redner to render the texture. During rendering, the texture is rendered onto the scene in the first image data to obtain second image data, which has a real texture that is theoretically the same as that of the first image data.
Rendering is the process of converting a three-dimensional scene into two-dimensional image data on a screen. Traditional rendering models are divided into local illumination models and global illumination models: a local illumination model only considers the illumination of the light source on the object surface, whereas a global illumination model considers both the illumination of the light source on the objects and the mutual reflection of light between objects, so it can simulate real-world lighting effects well but at a higher time and computation cost.
Inverse rendering is the opposite of rendering: the shape, material, illumination information, camera parameters, and so on of the objects in a scene are recovered from two-dimensional image data, which is more difficult than forward rendering. Differentiable rendering can solve the inverse rendering problem well while reducing the dependence on training data. It is a rendering process that can be differentiated and is divided into a forward pass and a backward pass: the forward pass is the same as traditional rendering, taking the scene and the corresponding geometric parameters as input and producing image data; the backward pass computes the derivatives of the pixels with respect to the scene's geometric parameters.
During rendering, geometric parameters of a scene, such as the height, width, length, and the like of a certain object, may be queried in the first image data, and these geometric parameters may be provided in a public data set, may also be learned from the first image data through deep learning, and may also be entered together when the first image data is entered, which is not limited in this embodiment.
The texture maps, the texture parameters, and the geometric parameters are input into a renderer to be differentiably rendered into second image data.
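As an illustration of this step, the sketch below uses PyTorch3D to differentiably render a mesh textured with the predicted diffuse map. It is a simplified stand-in for the full method: only the diffuse map is wired in through TexturesUV, mapping one texture parameter to the material shininess is an assumption made here for illustration, and a full physically based treatment of the highlight, normal, and roughness maps would require a custom shader, which is omitted.

import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (
    FoVPerspectiveCameras, RasterizationSettings, MeshRasterizer,
    MeshRenderer, SoftPhongShader, PointLights, Materials, TexturesUV,
)

def differentiable_render(verts, faces, verts_uvs, faces_uvs,
                          diffuse_map, texture_params, device="cpu"):
    # diffuse_map: (1, H, W, 3) tensor in [0, 1] predicted by the map detection network
    # texture_params: 1-D tensor predicted by the parameter detection network
    textures = TexturesUV(maps=diffuse_map,
                          faces_uvs=[faces_uvs],
                          verts_uvs=[verts_uvs])
    mesh = Meshes(verts=[verts], faces=[faces], textures=textures)

    cameras = FoVPerspectiveCameras(device=device)
    lights = PointLights(device=device, location=[[0.0, 0.0, 3.0]])
    # One assumed mapping from a texture parameter to a material property.
    materials = Materials(device=device, shininess=texture_params[:1] * 100.0)

    renderer = MeshRenderer(
        rasterizer=MeshRasterizer(
            cameras=cameras,
            raster_settings=RasterizationSettings(image_size=256),
        ),
        shader=SoftPhongShader(device=device, cameras=cameras,
                               lights=lights, materials=materials),
    )
    # (1, 256, 256, 4) RGBA image; gradients flow back to the predicted diffuse map.
    return renderer(mesh)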
In one embodiment of the present invention, step 105 may include the steps of:
step 1051, calculate the difference between the first image data and the second image data as a loss value.
In this embodiment, the data of the first image data and the data of the second image data may be input into a predetermined Loss Function (Loss Function), and the difference between the first image data and the second image data may be measured and recorded as a Loss value Loss.
Illustratively, the L2 norm distance between each pixel point in the first image data and the corresponding pixel point in the second image data is calculated, and the average of the L2 distances over all pixel points is taken as the loss value.
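A minimal sketch of this per-pixel loss in PyTorch:

import torch

def reconstruction_loss(first_image, second_image):
    # Both tensors are (N, C, H, W); the L2 norm is taken over the channel
    # dimension for each pixel and then averaged over all pixels.
    per_pixel = torch.norm(first_image - second_image, p=2, dim=1)
    return per_pixel.mean()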
Step 1052, updating the parameters in the parameter detection network and the parameters in the map detection network based on the loss value.
In this embodiment, back-propagation is performed on the parameter detection network and the map detection network, and the parameters in the parameter detection network and the parameters in the map detection network are updated based on the loss value.
In some cases, during back-propagation for the parameter detection network and the map detection network, the loss value is fed into algorithms such as SGD (stochastic gradient descent) or Adam (adaptive moment estimation) to calculate the update amplitudes of the parameters in the parameter detection network and of the parameters in the map detection network, and the parameters in each network are then updated according to their respective update amplitudes.
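The sketch below shows one way this update could be wired up in PyTorch, with a single Adam optimizer over the parameters of both networks; the network objects, the rendering function, and the loss function are assumed names standing in for the components described above.

import itertools
import torch

def build_optimizer(map_net, param_net, lr=1e-4):
    # One optimizer over the parameters of both networks (Adam here; SGD works too).
    return torch.optim.Adam(
        itertools.chain(map_net.parameters(), param_net.parameters()), lr=lr
    )

def training_step(map_net, param_net, optimizer, first_image, render_fn, loss_fn):
    # Forward through both networks, render, compute the loss, and update both networks.
    texture_maps = map_net(first_image)
    texture_params = param_net(first_image)
    second_image = render_fn(texture_maps, texture_params)  # differentiable rendering
    loss = loss_fn(first_image, second_image)
    optimizer.zero_grad()
    loss.backward()   # back-propagates through the renderer into both networks
    optimizer.step()
    return loss.item()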
Step 1053, judging whether the number of training rounds completed is equal to a preset threshold; if yes, go to step 1054, otherwise, go back to step 102.
Step 1054, outputting the parameters in the parameter detection network and the parameters in the map detection network.
In this embodiment, a threshold may be set in advance for the number of iterations, and as a stop condition, in each iteration training, the number of current iterations is counted, so as to determine whether the threshold is reached.
If the threshold is reached, the training of the parameter detection network and the mapping detection network can be considered to be completed, and at the moment, the weights in the parameter detection network and the weights in the mapping detection network are output and persisted to the database.
Further, considering that the loss value may oscillate during training and the parameter detection network and the map detection network may overfit, the loss values of each round of training can be compared, and the parameters in the parameter detection network and the map detection network corresponding to the minimum loss value are output; at that point the performance of the parameter detection network and the map detection network is optimal.
If the threshold is not reached, the next round of iterative training is entered and steps 102 to 104 are executed again; this iterative training loops until the parameter detection network and the map detection network are fully trained.
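Putting the pieces together, a training loop of the following shape (reusing the training_step sketched above) stops at the preset threshold and keeps the weights that achieved the smallest loss; the function names and the data-loader interface are assumptions made for illustration.

import copy

def train(map_net, param_net, optimizer, data_loader, render_fn, loss_fn,
          max_rounds=10000):
    best_loss, best_state = float("inf"), None
    rounds = 0
    while rounds < max_rounds:                       # preset threshold (step 1053)
        for first_image in data_loader:
            loss = training_step(map_net, param_net, optimizer,
                                 first_image, render_fn, loss_fn)
            rounds += 1
            if loss < best_loss:                     # keep the best weights, guarding against oscillation
                best_loss = loss
                best_state = (copy.deepcopy(map_net.state_dict()),
                              copy.deepcopy(param_net.state_dict()))
            if rounds >= max_rounds:
                break
    return best_state                                # weights to output and persist (step 1054)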
In this embodiment, the parameter detection network and the map detection network can be trained offline; their structures, weights, and so on are recorded and can be distributed in various ways to the service terminals that apply the parameter detection network and the map detection network.
In this embodiment, first image data is acquired; a map detection network is called to extract multiple frames of texture maps from the first image data; a parameter detection network is called to extract texture parameters from the first image data; the texture maps and the texture parameters are differentiably rendered to the scene in the first image data to obtain second image data; and the map detection network and the parameter detection network are trained with the second image data as a supervision signal. The map detection network and the parameter detection network operate automatically, so the user has no perception of them and the learning threshold is low, and they are able to render textures into a scene. Designers therefore need fewer texture-processing operations, which improves simplicity of operation, reduces the time and effort spent on modeling, improves efficiency, and reduces cost. In addition, because the second image data serves as the supervision signal, real texture parameters are not required as supervision, which reduces the difficulty of training the map detection network and the parameter detection network.
Example two
Fig. 4 is a flowchart of a texture migration method according to a second embodiment of the present invention. This embodiment is applicable to migrating the texture in image data. The method may be executed by a texture migration apparatus, which can be implemented in software and/or hardware and configured in a computer device, for example a server, a workstation, a personal computer, or a mobile terminal (e.g., a mobile phone or tablet computer), and specifically includes the following steps:
step 401, loading a texture recognition model.
In this embodiment, a texture recognition model may be trained in advance, as shown in fig. 5, where the texture recognition model includes a mapping detection network and a parameter detection network, and the training method is as follows:
acquiring first image data;
calling a mapping detection network to extract a multi-frame texture mapping from first image data;
calling a parameter detection network to extract texture parameters from the first image data;
differentially rendering the texture map and the texture parameters to a scene in the first image data to obtain second image data;
and training a mapping detection network and a parameter detection network by taking the second image data as a supervision signal.
In the embodiment of the present invention, since the method for training the texture recognition model is basically similar to the application of the first embodiment, the description is simple, and reference may be made to a part of the description of the first embodiment for relevant points, which is not described in detail herein.
The structure of the texture recognition model (the mapping detection network and the parameter detection network) and the parameters thereof are persisted in the database, and the texture recognition model (the mapping detection network and the parameter detection network) and the parameters thereof can be loaded into the memory for operation during texture migration.
Step 402, acquiring first image data.
In this embodiment, the first image data is two-dimensional image data having a real scene in which objects such as a person, an animal, a tool, and a building are present, and the objects have real textures.
For the designer, the first image data is the material from which the texture is derived, and the texture is to be migrated to another scene.
Step 403, invoking a map detection network to extract a multi-frame texture map from the first image data.
As shown in fig. 5, the first image data is input into a map detection network, which extracts a multi-frame texture map from the first image data.
In an embodiment of the present invention, the map detection network includes an encoder, a pooling layer, and a decoder, and in this embodiment, the first image data is input into the encoder to perform an encoding operation, so as to obtain a target encoding characteristic; inputting the target coding features into a pooling layer to execute pooling operation, and obtaining target pooling features; inputting the target pooling feature into a decoding layer to execute decoding operation, and obtaining a multi-frame texture mapping.
Illustratively, the encoder comprises a first encoding block, a second encoding block, and a third encoding block, the pooling layer comprises an atrous spatial pyramid pooling (ASPP) module, and the decoder comprises a first decoding block, a second decoding block, and a third decoding block.
In the encoding operation, first image data is input into a first encoding block to be down-sampled, and a first candidate encoding characteristic is obtained; inputting the first candidate coding features into a second coding block for down-sampling to obtain second candidate coding features; and inputting the second candidate coding features into a third coding block for down sampling to obtain target coding features.
During the pooling operation, the target coding features are input into the atrous spatial pyramid pooling module to perform atrous convolution at multiple scales, obtaining multiple candidate pooling features, which are fused into the target pooling feature.
In the decoding operation, inputting the target pooling characteristic into a first decoding block for up-sampling to obtain a first decoding characteristic; inputting the first decoding characteristic and the second candidate coding characteristic into a second decoding block for up-sampling to obtain a second decoding characteristic; inputting the second decoding characteristic and the first candidate coding characteristic into a third decoding block for up-sampling to obtain a multi-frame texture mapping;
wherein the texture map comprises at least one of:
diffuse reflection mapping, highlight mapping, normal mapping, roughness mapping.
In the embodiment of the present invention, since the extracted texture map is basically similar to the application of the first embodiment, the description is simple, and reference may be made to part of the description of the first embodiment for relevant points.
Step 404, invoking a parameter detection network to extract texture parameters from the first image data.
As shown in fig. 5, the first image data is input into a parameter detection network, which extracts texture parameters from the first image data.
In one embodiment of the invention, the parameter detection network comprises a feature extraction network and a multilayer perceptron; in the embodiment, the first image data is input into a feature extraction network to extract texture features; inputting the texture features into a multi-layer perceptron to be mapped into texture parameters.
Illustratively, the feature extraction network comprises a first residual block, a second residual block and a third residual block, and the multilayer perceptron comprises a first fully-connected layer and a second fully-connected layer.
When extracting the texture features, inputting first image data into a first residual error block to extract first residual error features; inputting the first residual error characteristics into a second residual error block to extract second residual error characteristics; and inputting the second residual error feature into a third residual error block to extract texture features.
When mapping the texture parameters, inputting texture features into a first full-connection layer to be mapped into candidate parameters; and inputting the candidate parameters into the second fully-connected layer to be mapped into texture parameters.
In the embodiment of the present invention, since the extraction of the texture parameter is substantially similar to the application of the first embodiment, the description is relatively simple, and reference may be made to part of the description of the first embodiment for relevant points.
Step 405, rendering the texture map and the texture parameters differentially to a scene independent of the first image data, and obtaining second image data.
In this embodiment, as shown in fig. 5, a scene independent of the first image data, that is, a scene other than the scene of the first image data, may be provided as a target of texture migration, and for a designer, the scene is a scene to be designed, such as a virtual character, a virtual prop, a virtual building, a virtual garment, and the like.
As shown in fig. 5, a texture map and texture parameters in first image data are differentially rendered into a scene independent of the first image data, and second image data is obtained such that the texture of the first image data is migrated into the second image data, the scene of the second image data being different from the scene of the first image data, and the texture of the second image data being the same as or similar to the texture of the first image data.
In a specific implementation, geometric parameters of the scene independent of the first image data may be queried; for the designer, these are the geometric parameters of the design object. The texture maps, texture parameters, and geometric parameters are input into a renderer under a differentiable framework such as PyTorch3D or redner to be differentiably rendered into the second image data.
Taking person reconstruction as an example, the designer provides the first image data, in which person A appears, and the geometric parameters of person B, where person A and person B are not the same person.
The first image data is processed in two paths, the first path is to input the first image data into a map detection network to extract a diffuse reflection map, a highlight map, a normal map and a roughness map of a person A, and the second path is to input the first image data into a parameter detection network to extract texture parameters of the person A.
The diffuse reflection map, highlight map, normal map, and roughness map of person A, the texture parameters of person A, and the geometric parameters of person B are input into a renderer such as PyTorch3D or redner and rendered into second image data; the second image data shows person B, whose texture is the same as or similar to the texture of person A.
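A minimal sketch of this inference-time texture migration, reusing the trained networks together with a renderer; the function and argument names are assumptions made for illustration.

import torch

@torch.no_grad()
def migrate_texture(map_net, param_net, render_fn, first_image, target_geometry):
    # Extract person A's texture maps and texture parameters from the source image,
    # then render them onto person B's geometry to obtain the second image data.
    map_net.eval()
    param_net.eval()
    texture_maps = map_net(first_image)      # diffuse, highlight, normal, roughness maps
    texture_params = param_net(first_image)
    return render_fn(texture_maps, texture_params, target_geometry)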
In this embodiment, a texture recognition model comprising a map detection network and a parameter detection network is loaded; first image data is acquired; the map detection network is called to extract multiple frames of texture maps from the first image data; the parameter detection network is called to extract texture parameters from the first image data; and the texture maps and the texture parameters are differentiably rendered into a scene independent of the first image data to obtain second image data. The map detection network and the parameter detection network operate automatically, so the user has no perception of them and the learning threshold is low, and they are able to render textures into a scene. Designers therefore need fewer texture-processing operations, which improves simplicity of operation, reduces the time and effort spent on modeling, improves efficiency, and reduces cost.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
EXAMPLE III
Fig. 6 is a block diagram of a structure of a training apparatus for a texture recognition model according to a third embodiment of the present invention, where the texture recognition model includes a mapping detection network and a parameter detection network, and the apparatus may specifically include the following modules:
an image data obtaining module 601, configured to obtain first image data;
a texture map extracting module 602, configured to invoke the map detection network to extract multiple frames of texture maps from the first image data;
a texture parameter extracting module 603, configured to invoke the parameter detection network to extract a texture parameter from the first image data;
an image data rendering module 604, configured to render the texture map and the texture parameter differentially to a scene in the first image data, so as to obtain second image data;
a network training module 605, configured to train the map detection network and the parameter detection network by using the second image data as a supervision signal.
In one embodiment of the invention, the map detection network comprises an encoder, a pooling layer, a decoder;
the texture parameter extraction module 603 includes:
the encoding operation module is used for inputting the first image data into the encoder to execute encoding operation so as to obtain target encoding characteristics;
the pooling operation module is used for inputting the target coding features into the pooling layer to execute pooling operation so as to obtain target pooling features;
and the decoding operation module is used for inputting the target pooling characteristic into the decoding layer to execute decoding operation so as to obtain a multi-frame texture mapping.
In one example of the embodiment of the present invention, the encoder includes a first encoding block, a second encoding block, and a third encoding block, the pooling layer includes an atrous spatial pyramid pooling (ASPP) module, and the decoder includes a first decoding block, a second decoding block, and a third decoding block;
the encoding operation module includes:
a first down-sampling module, configured to input the first image data into the first coding block for down-sampling to obtain a first candidate coding feature;
the second down-sampling module is used for inputting the first candidate coding features into the second coding block for down-sampling to obtain second candidate coding features;
the third down-sampling module is used for inputting the second candidate coding feature into the third coding block for down-sampling to obtain a target coding feature;
the pooling operation module is further configured to:
inputting the target coding features into the atrous spatial pyramid pooling module to perform atrous convolution at multiple scales, obtaining multiple candidate pooling features, and fusing the multiple candidate pooling features into the target pooling feature;
the decoding operation module includes:
the first upsampling module is used for inputting the target pooling characteristic into the first decoding block for upsampling to obtain a first decoding characteristic;
a second upsampling module, configured to input the first decoding feature and the second candidate coding feature into the second decoding block for upsampling to obtain a second decoding feature;
a third upsampling module, configured to input the second decoding feature and the first candidate coding feature into the third decoding block for upsampling, so as to obtain a multi-frame texture map;
wherein the texture map comprises at least one of:
a diffuse reflection map, a highlight (specular) map, a normal map, and a roughness map.
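For readers who want a concrete picture of the encoder / pooling / decoder structure described above, the following PyTorch sketch shows one possible map detection network of this shape: three downsampling coding blocks, an ASPP layer, and three upsampling decoding blocks with skip connections, ending in a head that emits several texture maps. The channel counts, dilation rates, and the assumption of four three-channel output maps are illustrative choices, not values taken from the patent.

```python
# Illustrative sketch of a map detection network with the structure described
# above. All sizes are assumptions; input height/width should be divisible by 8.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch, stride=1, dilation=1):
    # Conv -> BatchNorm -> ReLU; stride 2 halves the spatial resolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride,
                  padding=dilation, dilation=dilation, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class ASPP(nn.Module):
    """Atrous convolutions at several rates, fused by a 1x1 convolution."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [conv_block(in_ch, out_ch, dilation=r) for r in rates])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # Candidate pooling features from each dilation rate, then fusion.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


class MapDetectionNet(nn.Module):
    def __init__(self, out_maps=4):
        super().__init__()
        self.out_maps = out_maps
        # Encoder: three coding blocks, each downsampling by 2.
        self.enc1 = conv_block(3, 32, stride=2)    # first candidate coding feature
        self.enc2 = conv_block(32, 64, stride=2)   # second candidate coding feature
        self.enc3 = conv_block(64, 128, stride=2)  # target coding feature
        self.aspp = ASPP(128, 128)                 # target pooling feature
        # Decoder: three decoding blocks, each followed by 2x upsampling;
        # the later two also take encoder features as skip connections.
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.dec1 = conv_block(128, 64)
        self.dec2 = conv_block(64 + 64, 32)
        self.dec3 = conv_block(32 + 32, 32)
        # One RGB image per texture map (e.g. diffuse, specular, normal, roughness).
        self.head = nn.Conv2d(32, out_maps * 3, 1)

    def forward(self, x):
        e1 = self.enc1(x)                                    # H/2
        e2 = self.enc2(e1)                                   # H/4
        e3 = self.enc3(e2)                                   # H/8
        p = self.aspp(e3)
        d1 = self.up(self.dec1(p))                           # first decoding feature
        d2 = self.up(self.dec2(torch.cat([d1, e2], dim=1)))  # second decoding feature
        d3 = self.up(self.dec3(torch.cat([d2, e1], dim=1)))  # full resolution
        maps = self.head(d3)                                 # (B, out_maps * 3, H, W)
        return torch.chunk(maps, self.out_maps, dim=1)       # tuple of texture maps
```

In this sketch the fusion of the ASPP branches is a plain concatenation followed by a 1x1 convolution; the patent only states that the candidate pooling features are fused, so other fusion choices are equally consistent with the description.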
In one embodiment of the invention, the parameter detection network comprises a feature extraction network and a multilayer perceptron;
the texture parameter extraction module 603 includes:
the texture feature extraction module is used for inputting the first image data into the feature extraction network to extract texture features;
and the multilayer perceptron module is used for inputting the texture features into the multilayer perceptron and mapping the texture features into texture parameters.
In one example of the embodiment of the present invention, the feature extraction network includes a first residual block, a second residual block, and a third residual block, and the multi-layer perceptron includes a first fully-connected layer and a second fully-connected layer;
the texture feature extraction module comprises:
the first residual processing module is used for inputting the first image data into the first residual block to extract a first residual feature;
the second residual processing module is used for inputting the first residual feature into the second residual block to extract a second residual feature;
a third residual processing module, configured to input the second residual feature into the third residual block to extract texture features;
the multilayer perceptron module includes:
the first mapping module is used for inputting the texture features into the first fully-connected layer to be mapped into candidate parameters;
and the second mapping module is used for inputting the candidate parameters into the second fully-connected layer and mapping the candidate parameters into texture parameters.
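The parameter detection network described above (three residual blocks feeding a two-layer perceptron) can likewise be sketched in a few lines. The channel widths, the global average pooling used to collapse the spatial dimensions, and the choice of eight output texture parameters are assumptions made for illustration only.

```python
# Illustrative sketch of a parameter detection network: three residual blocks
# followed by a two-layer perceptron. All sizes are assumptions.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the shortcut matches the main branch.
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=stride)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))


class ParameterDetectionNet(nn.Module):
    def __init__(self, num_params=8):
        super().__init__()
        self.res1 = ResidualBlock(3, 32)      # first residual feature
        self.res2 = ResidualBlock(32, 64)     # second residual feature
        self.res3 = ResidualBlock(64, 128)    # texture features
        self.pool = nn.AdaptiveAvgPool2d(1)   # collapse spatial dimensions
        self.fc1 = nn.Linear(128, 64)         # texture features -> candidate parameters
        self.fc2 = nn.Linear(64, num_params)  # candidate parameters -> texture parameters

    def forward(self, x):
        f = self.res3(self.res2(self.res1(x)))
        f = self.pool(f).flatten(1)
        return self.fc2(torch.relu(self.fc1(f)))
```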
In one embodiment of the present invention, the image data rendering module 604 comprises:
the geometric parameter query module is used for querying geometric parameters of a scene in the first image data;
a differentiable renderer module, configured to input the texture map, the texture parameters, and the geometric parameters into a renderer to be differentiably rendered into second image data.
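The differentiable rendering step can be pictured as ordinary tensor arithmetic that combines the texture maps, the texture parameters, and the queried geometry, so that gradients flow back into both detection networks. The patent does not disclose a specific shading model or renderer; the sketch below uses a simple Lambert-plus-Blinn-Phong shading rule, and every function and variable name in it (render, light_dir, view_dir, the use of one texture parameter as a shininess value) is an assumption for illustration.

```python
# Simplified, differentiable shading sketch. The shading model and tensor
# names are assumptions; a production system would use a full differentiable
# renderer driven by the scene's geometric parameters.
import torch
import torch.nn.functional as F


def render(diffuse, specular, normal_map, roughness, params, light_dir, view_dir):
    """Map tensors are (B, 3, H, W); params is (B, K); light_dir and view_dir
    are per-pixel direction tensors (B, 3, H, W) taken from the scene geometry."""
    # Decode normals from [0, 1] map values to unit vectors.
    n = F.normalize(normal_map * 2.0 - 1.0, dim=1)
    l = F.normalize(light_dir, dim=1)
    v = F.normalize(view_dir, dim=1)
    h = F.normalize(l + v, dim=1)                       # half vector

    n_dot_l = (n * l).sum(1, keepdim=True).clamp(min=0.0)
    n_dot_h = (n * h).sum(1, keepdim=True).clamp(min=0.0)

    # Assume one texture parameter controls the sharpness of the specular lobe.
    shininess = 1.0 + 127.0 * torch.sigmoid(params[:, :1]).view(-1, 1, 1, 1)
    spec = n_dot_h.pow(shininess) * (1.0 - roughness.mean(1, keepdim=True))

    image = diffuse * n_dot_l + specular * spec
    return image.clamp(0.0, 1.0)                        # second image data
```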
In one embodiment of the present invention, the network training module 605 comprises:
a loss value calculation module for calculating a difference between the first image data and the second image data as a loss value;
a parameter updating module for updating the parameters in the parameter detection network and the parameters in the map detection network based on the loss values;
the threshold value judging module is used for judging whether the number of the current training round is equal to a preset threshold; if yes, the parameter output module is called; if not, the texture map extraction module 602 is called;
and the parameter output module is used for outputting the parameters in the parameter detection network and the parameters in the mapping detection network.
In one embodiment of the present invention, the loss value calculation module includes:
a norm distance calculation module, configured to calculate a norm distance between each pixel point in the first image data and the corresponding pixel point in the second image data;
and the average value calculating module is used for calculating the average value of all the norm distances as a loss value.
In one embodiment of the present invention, the parameter output module includes:
the loss value comparison module is used for comparing the loss values in each round of training;
and the optimal parameter output module is used for outputting the parameters in the parameter detection network and the parameters in the mapping detection network corresponding to the minimum loss value.
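Putting the modules of this embodiment together, the training loop is self-supervised: the rendered second image data is compared with the first image data through a per-pixel norm distance whose mean serves as the loss, the loss is back-propagated into both networks, and the parameters associated with the smallest loss are kept. The sketch below assumes an L1 norm, an Adam optimizer, and a generic differentiable renderer callable; these are illustrative choices rather than details fixed by the patent.

```python
# Illustrative training loop; renderer is assumed to be any differentiable
# callable taking the texture maps, the texture parameters, and the queried
# geometric parameters of the scene in the first image data.
import copy
import torch


def train(map_net, param_net, renderer, images, geometry, rounds=1000, lr=1e-4):
    opt = torch.optim.Adam(
        list(map_net.parameters()) + list(param_net.parameters()), lr=lr)
    best_loss, best_state = float("inf"), None
    for _ in range(rounds):
        maps = map_net(images)                       # multiple frames of texture maps
        params = param_net(images)                   # texture parameters
        rendered = renderer(maps, params, geometry)  # second image data
        # Mean per-pixel L1 norm distance between first and second image data.
        loss = (rendered - images).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        if loss.item() < best_loss:                  # keep the minimum-loss parameters
            best_loss = loss.item()
            best_state = (copy.deepcopy(map_net.state_dict()),
                          copy.deepcopy(param_net.state_dict()))
    return best_loss, best_state
```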
The training device of the texture recognition model provided by the embodiment of the invention can execute the training method of the texture recognition model provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE IV
Fig. 7 is a block diagram of a texture migration apparatus according to a fourth embodiment of the present invention, which may specifically include the following modules:
a texture recognition model loading module 701, configured to load a texture recognition model, where the texture recognition model includes a map detection network and a parameter detection network;
an image data obtaining module 702, configured to obtain first image data;
a texture map extracting module 703, configured to invoke the map detection network to extract multiple frames of texture maps from the first image data;
a texture parameter extracting module 704, configured to invoke the parameter detecting network to extract a texture parameter from the first image data;
an image data rendering module 705, configured to differentiably render the texture map and the texture parameters into a scene independent of the first image data, to obtain second image data.
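Once the texture recognition model has been trained, texture migration as described by these modules reduces to a short inference routine: extract the texture maps and texture parameters from a source image and render them onto the geometry of a different, independent scene. The function and variable names below (migrate_texture, target_geometry, the generic renderer callable) are assumptions for illustration.

```python
# Illustrative texture migration flow using the trained networks.
import torch


@torch.no_grad()
def migrate_texture(map_net, param_net, renderer, source_image, target_geometry):
    maps = map_net(source_image)        # multiple frames of texture maps
    params = param_net(source_image)    # texture parameters
    # Render onto a scene that is independent of the source image.
    return renderer(maps, params, target_geometry)
```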
In one embodiment of the present invention, the training method of the texture recognition model is as follows:
acquiring first image data;
calling the mapping detection network to extract a multi-frame texture mapping from the first image data;
calling the parameter detection network to extract texture parameters from the first image data;
differentiably rendering the texture map and the texture parameters to a scene in the first image data to obtain second image data;
and training the mapping detection network and the parameter detection network by taking the second image data as a supervision signal.
In one embodiment of the invention, the map detection network comprises an encoder, a pooling layer, and a decoder;
the texture map extraction module 703 includes:
the encoding operation module is used for inputting the first image data into the encoder to perform an encoding operation to obtain a target coding feature;
the pooling operation module is used for inputting the target coding feature into the pooling layer to perform a pooling operation to obtain a target pooling feature;
and the decoding operation module is used for inputting the target pooling feature into the decoder to perform a decoding operation to obtain multiple frames of texture maps.
In one example of the embodiment of the present invention, the encoder includes a first coding block, a second coding block, and a third coding block, the pooling layer includes an atrous spatial pyramid pooling (ASPP) module, and the decoder includes a first decoding block, a second decoding block, and a third decoding block;
the encoding operation module includes:
a first down-sampling module, configured to input the first image data into the first coding block for down-sampling to obtain a first candidate coding feature;
the second down-sampling module is used for inputting the first candidate coding features into the second coding block for down-sampling to obtain second candidate coding features;
the third down-sampling module is used for inputting the second candidate coding feature into the third coding block for down-sampling to obtain a target coding feature;
the pooling operation module is further configured to:
input the target coding feature into the ASPP module to perform atrous convolution at multiple scales to obtain multiple candidate pooling features, and fuse the multiple candidate pooling features into a target pooling feature;
the decoding operation module includes:
the first upsampling module is used for inputting the target pooling feature into the first decoding block for upsampling to obtain a first decoding feature;
a second upsampling module, configured to input the first decoding feature and the second candidate coding feature into the second decoding block for upsampling to obtain a second decoding feature;
a third upsampling module, configured to input the second decoding feature and the first candidate coding feature into the third decoding block for upsampling, so as to obtain a multi-frame texture map;
wherein the texture map comprises at least one of:
a diffuse reflection map, a highlight (specular) map, a normal map, and a roughness map.
In one embodiment of the invention, the parameter detection network comprises a feature extraction network and a multilayer perceptron;
the texture parameter extraction module 704 includes:
the texture feature extraction module is used for inputting the first image data into the feature extraction network to extract texture features;
and the multilayer perceptron module is used for inputting the texture features into the multilayer perceptron and mapping the texture features into texture parameters.
In one example of the embodiment of the present invention, the feature extraction network includes a first residual block, a second residual block, and a third residual block, and the multi-layer perceptron includes a first fully-connected layer and a second fully-connected layer;
the texture feature extraction module comprises:
the first residual processing module is used for inputting the first image data into the first residual block to extract a first residual feature;
the second residual processing module is used for inputting the first residual feature into the second residual block to extract a second residual feature;
a third residual processing module, configured to input the second residual feature into the third residual block to extract texture features;
the multilayer perceptron module includes:
the first mapping module is used for inputting the texture features into the first fully-connected layer to be mapped into candidate parameters;
and the second mapping module is used for inputting the candidate parameters into the second fully-connected layer and mapping the candidate parameters into texture parameters.
In one embodiment of the present invention, the image data rendering module 705 includes:
the geometric parameter query module is used for querying geometric parameters of a scene independent of the first image data;
and a differentiable renderer module, configured to input the texture map, the texture parameters, and the geometric parameters into a renderer to be differentiably rendered into second image data.
The texture migration device provided by the embodiment of the invention can execute the texture migration method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE V
Fig. 8 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention. Fig. 8 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in Fig. 8 is only an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present invention.
As shown in FIG. 8, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 8, and commonly referred to as a "hard drive"). Although not shown in FIG. 8, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes programs stored in the system memory 28 to perform various functional applications and data processing, for example, to implement the training method of a texture recognition model or the texture migration method provided by the embodiments of the present invention.
EXAMPLE VI
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned training method for a texture recognition model or the texture migration method, and can achieve the same technical effect, and is not described herein again to avoid repetition.
A computer readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (14)

1. A training method for a texture recognition model, wherein the texture recognition model comprises a mapping detection network and a parameter detection network, the method comprising:
acquiring first image data;
calling the mapping detection network to extract a multi-frame texture mapping from the first image data;
calling the parameter detection network to extract texture parameters from the first image data;
differentiably rendering the texture map and the texture parameters to a scene in the first image data to obtain second image data;
and training the mapping detection network and the parameter detection network by taking the second image data as a supervision signal.
2. The method of claim 1, wherein the mapping detection network comprises an encoder, a pooling layer, and a decoder;
the calling the mapping detection network to extract a multi-frame texture mapping from the first image data comprises:
inputting the first image data into the encoder to perform an encoding operation to obtain a target coding feature;
inputting the target coding feature into the pooling layer to perform a pooling operation to obtain a target pooling feature;
and inputting the target pooling feature into the decoder to perform a decoding operation to obtain the multi-frame texture mapping.
3. The method of claim 2, wherein the encoder comprises a first coding block, a second coding block, and a third coding block, the pooling layer comprises an atrous spatial pyramid pooling (ASPP) module, and the decoder comprises a first decoding block, a second decoding block, and a third decoding block;
the inputting the first image data into the encoder to perform an encoding operation to obtain a target coding feature comprises:
inputting the first image data into the first coding block for down-sampling to obtain a first candidate coding feature;
inputting the first candidate coding feature into the second coding block for down-sampling to obtain a second candidate coding feature;
and inputting the second candidate coding feature into the third coding block for down-sampling to obtain a target coding feature;
the inputting the target coding feature into the pooling layer to perform a pooling operation to obtain a target pooling feature comprises:
inputting the target coding feature into the ASPP module to perform atrous convolution at multiple scales to obtain multiple candidate pooling features, and fusing the multiple candidate pooling features into a target pooling feature;
the inputting the target pooling feature into the decoder to perform a decoding operation to obtain the multi-frame texture mapping comprises:
inputting the target pooling feature into the first decoding block for upsampling to obtain a first decoding feature;
inputting the first decoding feature and the second candidate coding feature into the second decoding block for upsampling to obtain a second decoding feature;
and inputting the second decoding feature and the first candidate coding feature into the third decoding block for upsampling to obtain the multi-frame texture mapping;
wherein the texture map comprises at least one of:
a diffuse reflection map, a highlight (specular) map, a normal map, and a roughness map.
4. The method of claim 1, wherein the parameter detection network comprises a feature extraction network and a multilayer perceptron;
the invoking the parameter detection network to extract texture parameters from the first image data includes:
inputting the first image data into the feature extraction network to extract texture features;
and inputting the texture features into the multilayer perceptron to be mapped into texture parameters.
5. The method of claim 4, wherein the feature extraction network comprises a first residual block, a second residual block, and a third residual block, and the multilayer perceptron comprises a first fully-connected layer and a second fully-connected layer;
the inputting the first image data into the feature extraction network to extract texture features comprises:
inputting the first image data into the first residual block to extract a first residual feature;
inputting the first residual feature into the second residual block to extract a second residual feature;
and inputting the second residual feature into the third residual block to extract texture features;
the inputting the texture features into the multilayer perceptron to be mapped into texture parameters comprises:
inputting the texture features into the first fully-connected layer to be mapped into candidate parameters;
and inputting the candidate parameters into the second fully-connected layer to be mapped into texture parameters.
6. The method according to any one of claims 1-5, wherein the differentiably rendering the texture map and the texture parameters to a scene in the first image data to obtain second image data comprises:
querying geometric parameters of the scene in the first image data;
and inputting the texture map, the texture parameters, and the geometric parameters into a renderer to be differentiably rendered into the second image data.
7. The method according to any one of claims 1-5, wherein the training the mapping detection network and the parameter detection network by taking the second image data as a supervision signal comprises:
calculating a difference between the first image data and the second image data as a loss value;
updating parameters in the parameter detection network and parameters in the mapping detection network based on the loss value;
judging whether the number of the current training round is equal to a preset threshold;
if yes, outputting the parameters in the parameter detection network and the parameters in the mapping detection network;
if not, returning to the step of calling the mapping detection network to extract the multi-frame texture mapping from the first image data.
8. The method of claim 7, wherein the calculating a difference between the first image data and the second image data as a loss value comprises:
calculating a norm distance between each pixel point in the first image data and the corresponding pixel point in the second image data;
and calculating the average value of all the norm distances as a loss value.
9. The method of claim 7, wherein the outputting the parameters in the parameter detection network and the parameters in the mapping detection network comprises:
comparing the loss values of each round of training;
and outputting the parameters in the parameter detection network and the parameters in the mapping detection network corresponding to the minimum loss value.
10. A method of texture migration, comprising:
loading a texture recognition model, wherein the texture recognition model comprises a mapping detection network and a parameter detection network;
acquiring first image data;
calling the mapping detection network to extract a multi-frame texture mapping from the first image data;
calling the parameter detection network to extract texture parameters from the first image data;
differentiably rendering the texture map and the texture parameters into a scene independent of the first image data to obtain second image data.
11. An apparatus for training a texture recognition model, wherein the texture recognition model comprises a mapping detection network and a parameter detection network, the apparatus comprising:
the image data acquisition module is used for acquiring first image data;
the texture mapping extraction module is used for calling the mapping detection network to extract a plurality of frames of texture mapping from the first image data;
the texture parameter extraction module is used for calling the parameter detection network to extract texture parameters from the first image data;
an image data rendering module, configured to differentiably render the texture map and the texture parameters to a scene in the first image data, so as to obtain second image data;
and the network training module is used for training the mapping detection network and the parameter detection network by taking the second image data as a supervision signal.
12. A texture migration apparatus, comprising:
a texture recognition model loading module for loading a texture recognition model, wherein the texture recognition model comprises a mapping detection network and a parameter detection network;
the image data acquisition module is used for acquiring first image data;
the texture mapping extraction module is used for calling the mapping detection network to extract a plurality of frames of texture mapping from the first image data;
the texture parameter extraction module is used for calling the parameter detection network to extract texture parameters from the first image data;
and the image data rendering module is used for rendering the texture map and the texture parameter into a scene independent of the first image data in a differentiable manner to obtain second image data.
13. A computer device, characterized in that the computer device comprises:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method of training a texture recognition model according to any one of claims 1-9 or a method of texture migration according to claim 10.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of training a texture recognition model according to any one of claims 1 to 9 or a method of texture migration according to claim 10.
CN202210010675.9A 2022-01-06 2022-01-06 Training and texture migration method of texture recognition model and related device Pending CN114419335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210010675.9A CN114419335A (en) 2022-01-06 2022-01-06 Training and texture migration method of texture recognition model and related device


Publications (1)

Publication Number Publication Date
CN114419335A (en) 2022-04-29

Family

ID=81270539


Country Status (1)

Country Link
CN (1) CN114419335A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663552A (en) * 2022-05-25 2022-06-24 武汉纺织大学 Virtual fitting method based on 2D image
CN114663552B (en) * 2022-05-25 2022-08-16 武汉纺织大学 Virtual fitting method based on 2D image
CN114842121A (en) * 2022-06-30 2022-08-02 北京百度网讯科技有限公司 Method, device, equipment and medium for generating mapping model training and mapping
CN114842121B (en) * 2022-06-30 2022-09-09 北京百度网讯科技有限公司 Method, device, equipment and medium for generating mapping model training and mapping
CN116433644A (en) * 2023-04-22 2023-07-14 深圳市江机实业有限公司 Eye image dynamic diagnosis method based on recognition model
CN116433644B (en) * 2023-04-22 2024-03-08 深圳市江机实业有限公司 Eye image dynamic diagnosis method based on recognition model
CN117746069A (en) * 2024-02-18 2024-03-22 浙江啄云智能科技有限公司 Graph searching model training method and graph searching method
CN117746069B (en) * 2024-02-18 2024-05-14 浙江啄云智能科技有限公司 Graph searching model training method and graph searching method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination