WO2022042470A1 - Image decomposition method and related apparatus and device - Google Patents

Image decomposition method and related apparatus and device

Info

Publication number: WO2022042470A1
Application number: PCT/CN2021/114023
Authority: WIPO (PCT)
Prior art keywords: image, normal vector, feature map, decomposed, scene
Other languages: French (fr), Chinese (zh)
Inventors: 章国锋, 鲍虎军, 罗俊丹, 黄昭阳, 李易瑾, 周晓巍
Original Assignee: 浙江商汤科技开发有限公司
Application filed by 浙江商汤科技开发有限公司
Publication of WO2022042470A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to an image decomposition method and related devices and equipment.
  • Intrinsic image decomposition is one of the important problems in computer vision and computer graphics. It refers to decomposing an original image into a shading image and a reflectance/albedo image. Intrinsic images have a wide range of applications in 3D reconstruction, photorealistic image editing, augmented reality, semantic segmentation and other fields.
  • the embodiments of the present disclosure provide at least an image decomposition method and related apparatus and equipment.
  • a first aspect of the embodiments of the present disclosure provides an image decomposition method, the method including: acquiring an image to be decomposed; using a normal vector estimation model to acquire normal vector information of the image to be decomposed; and, based on the normal vector information, using an image decomposition model to decompose the image to be decomposed to obtain an intrinsic image of the image to be decomposed.
  • the image decomposition model can use the normal vector information to better understand the environmental conditions of the scene in the image to be decomposed, so that the intrinsic image produced by the decomposition better matches the scene of the image to be decomposed, which improves the decomposition effect of the intrinsic image;
  • moreover, the normal vector information of the image to be decomposed is obtained by a normal vector estimation model that is independent of the image decomposition model, so a dedicated model can be used to obtain accurate normal vector information, which further improves the degree to which the intrinsic image obtained by the subsequent decomposition matches the scene of the image to be decomposed.
  • the above-mentioned intrinsic image includes an illumination rate image; the above-mentioned decomposing of the image to be decomposed with the image decomposition model based on the normal vector information to obtain the intrinsic image includes: using the image decomposition model to process the image to be decomposed to obtain scene illumination condition information of the image to be decomposed; and obtaining the illumination rate image of the image to be decomposed based on the scene illumination condition information and the normal vector information.
  • in this way, the effect of the image decomposition model for intrinsic image decomposition in scenes with complex illumination environments can be improved.
  • the above-mentioned scene lighting condition information is a normal vector adaptation map that includes normal vector adaptation vectors for different pixels of the image to be decomposed, and the above-mentioned normal vector information is a normal vector map that includes the normal vectors of different pixels of the image to be decomposed.
  • the above-mentioned obtaining of the illumination rate image of the image to be decomposed based on the scene illumination condition information and the normal vector information includes: performing a per-pixel dot product of the normal vector adaptation map and the normal vector map to obtain the illumination rate image of the image to be decomposed.
  • in this way, illumination conditions that vary across space can be modeled, which improves the effect of the image decomposition model for intrinsic image decomposition in scenes with complex illumination environments.
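As a concrete illustration of the per-pixel dot product described above, the sketch below computes an illumination rate image from a normal vector map and a normal vector adaptation map. Treating both maps as three-channel arrays is an assumption for illustration; the patent only requires that the two maps be dotted pixel by pixel:

```python
import numpy as np

def illumination_rate_image(normal_map: np.ndarray,
                            adaptation_map: np.ndarray) -> np.ndarray:
    """Per-pixel dot product of the normal vector map (H, W, C) with the
    normal vector adaptation map (H, W, C), yielding an (H, W) illumination
    rate image. C = 3 is an illustrative assumption."""
    return np.einsum('hwc,hwc->hw', normal_map, adaptation_map)

# Toy 1x2 example: both pixels face the camera ([0, 0, 1]); the adaptation
# vectors encode a brighter left pixel and a darker right pixel.
normals = np.array([[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])
adaptation = np.array([[[0.0, 0.0, 0.9], [0.0, 0.0, 0.3]]])
shading = illumination_rate_image(normals, adaptation)  # → [[0.9, 0.3]]
```

Because the adaptation vector differs per pixel, the same surface normal can receive a different illumination value at each location, which is how spatially varying lighting is modeled.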
  • the above-mentioned image decomposition model includes a shared encoder and an illumination rate decoder.
  • the above-mentioned use of the image decomposition model to process the image to be decomposed to obtain the scene illumination condition information of the image to be decomposed includes: using the shared encoder to perform feature extraction on the image to be decomposed to obtain an image feature map, and fusing the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model to obtain a first fusion feature map; and decoding the first fusion feature map with the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed.
  • in this way, the image decomposition model can use the structural feature information in the first scene structure feature map to improve the decomposition effect of the intrinsic image.
  • the above-mentioned shared encoder comprises at least one coding unit connected in sequence, each coding unit comprising a normal vector adaptor.
  • the above-mentioned fusion of the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model to obtain the first fusion feature map includes: outputting the image feature map to the first coding unit; for each coding unit, using the normal vector adaptor to fuse the feature map output by the previous coding unit with the first scene structure feature map to obtain the second fusion feature map corresponding to that coding unit, wherein the feature richness of the scene structure feature map corresponding to each coding unit differs; and obtaining the first fusion feature map based on the second fusion feature map of the last coding unit.
  • in this way, the image decomposition model can subsequently use the scene structure information carried by the scene structure feature map about the scene in the image to be decomposed, realizing the effect of passing the feature information obtained by the normal vector estimation model to the image decomposition model for use.
  • before using the normal vector adaptor to fuse the feature map output by the previous coding unit with the scene structure feature map to obtain the second fusion feature map corresponding to the coding unit, the method further includes: performing down-sampling on the feature map output by the previous coding unit; and/or, the fusion includes using the normal vector adaptor to: adjust the scene structure feature map to a preset scale, and concatenate and convolve the adjusted scene structure feature map with the feature map output by the previous coding unit to obtain the second fusion feature map corresponding to the coding unit.
  • that is, the normal vector adaptor concatenates and convolves the scene structure feature map with the feature map output by the previous coding unit, thereby realizing the fusion of the two.
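The adjust-concatenate-convolve behaviour of the normal vector adaptor can be sketched as follows. The nearest-neighbour resizing and the 1x1 convolution kernel (implemented here as a per-pixel matrix multiply) are illustrative assumptions; the patent does not fix these details:

```python
import numpy as np

def nfa_fuse(prev_feat: np.ndarray, scene_feat: np.ndarray,
             weight: np.ndarray) -> np.ndarray:
    """Sketch of a normal vector adaptor (NFA): resize the scene structure
    feature map to the scale of the previous coding unit's output, concatenate
    along the channel axis, then apply a 1x1 convolution (a per-pixel matrix
    multiply). Nearest-neighbour resizing and the 1x1 kernel are assumptions."""
    h, w, _ = prev_feat.shape
    sh, sw, _ = scene_feat.shape
    rows = np.arange(h) * sh // h
    cols = np.arange(w) * sw // w
    resized = scene_feat[rows][:, cols]                     # resize to (h, w)
    fused = np.concatenate([prev_feat, resized], axis=-1)   # channel concat
    return fused @ weight                                   # 1x1 convolution

rng = np.random.default_rng(0)
prev = rng.normal(size=(8, 8, 16))     # feature map from previous coding unit
scene = rng.normal(size=(16, 16, 8))   # scene structure feature map
w = rng.normal(size=(24, 32))          # 1x1 conv: 16 + 8 -> 32 channels
out = nfa_fuse(prev, scene, w)         # second fusion feature map, (8, 8, 32)
```

The 1x1 convolution lets the adaptor mix the image features with the resized structural features channel by channel, which is one simple way to realize the "concatenate and convolve" fusion described above.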
  • using the illumination rate decoder to decode the first fusion feature map to obtain the scene illumination condition information of the image to be decomposed includes: using the illumination rate decoder to decode the first fusion feature map and the second fusion feature map of at least one normal vector adaptor to obtain the scene illumination condition information of the image to be decomposed.
  • the illumination rate decoder can obtain the scene illumination condition information of the image to be decomposed by using the first fusion feature map and the second fusion feature map output by the normal vector adaptor.
  • the above-mentioned image decomposition model further includes a reflectivity decoder; decomposing the image to be decomposed with the image decomposition model based on the normal vector information to obtain the intrinsic image of the image to be decomposed further includes: using the reflectivity decoder to decode the first fusion feature map to obtain a reflectivity image of the image to be decomposed.
  • the reflectivity decoder can obtain the reflectivity image of the image to be decomposed by using the first fusion feature map.
  • using the reflectivity decoder to decode the first fusion feature map to obtain the reflectivity image of the image to be decomposed includes: using the reflectivity decoder to decode the first fusion feature map and the second fusion feature map of at least one normal vector adaptor to obtain the reflectivity image of the image to be decomposed.
  • the reflectivity decoder can obtain the reflectivity image of the image to be decomposed by using the first fused feature map and the second fused feature map of the at least one normal vector adaptor.
  • the above-mentioned normal vector estimation model includes a normal vector encoder, a normal vector decoder and a refinement sub-network.
  • the above-mentioned use of the normal vector estimation model to obtain the normal vector information of the image to be decomposed includes: using the normal vector encoder to encode the image to be decomposed to obtain a first scene structure feature map; using the normal vector decoder to decode the first scene structure feature map to obtain a decoded feature map; and using the refinement sub-network to fuse the first scene structure feature map and the decoded feature map to obtain the normal vector information of the image to be decomposed.
  • in this way, the normal vector information of the image to be decomposed can be obtained by processing the image to be decomposed with the normal vector encoder, the normal vector decoder and the refinement sub-network of the normal vector estimation model.
  • using the normal vector encoder to encode the image to be decomposed to obtain the first scene structure feature map includes: using the normal vector encoder to perform multi-layer encoding on the image to be decomposed to obtain the first scene structure feature map corresponding to each layer.
  • the above-mentioned use of the refinement sub-network to fuse the first scene structure feature map and the decoded feature map to obtain the normal vector information of the image to be decomposed includes using the refinement sub-network to: concatenate the first scene structure feature maps corresponding to each layer to obtain a second scene structure feature map; concatenate the second scene structure feature map with the decoded feature map to obtain a third scene structure feature map; and obtain the normal vector information of the image to be decomposed based on the third scene structure feature map.
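The two concatenation steps above can be sketched as channel-wise concatenations. The sketch assumes the per-layer maps have already been brought to a common resolution (e.g., by the up-projection blocks described later) and omits the final decoding into a normal vector map:

```python
import numpy as np

def refine(first_maps: list, decoded: np.ndarray) -> np.ndarray:
    """Refinement sub-network fusion sketch: concatenate the per-layer first
    scene structure feature maps into the second scene structure feature map,
    then concatenate that with the decoded feature map to form the third
    scene structure feature map."""
    second = np.concatenate(first_maps, axis=-1)
    third = np.concatenate([second, decoded], axis=-1)
    return third

# Toy shapes: four 16-channel per-layer maps plus a 64-channel decoded map.
maps = [np.zeros((32, 32, 16)) for _ in range(4)]
decoded = np.zeros((32, 32, 64))
third = refine(maps, decoded)  # (32, 32, 128)
```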
  • the above-mentioned normal vector estimation model and image decomposition model are obtained by training separately.
  • before using the normal vector estimation model to obtain the normal vector information of the image to be decomposed, the method further includes: training a normal vector estimation model using a first sample set, wherein the images in the first sample set are annotated with normal vector information; and using the trained normal vector estimation model to obtain sample normal vector information for the images in a second sample set, and training the image decomposition model using the second sample set and the sample normal vector information.
  • the above-mentioned second sample set includes a first sub-sample set and a second sub-sample set, and training the image decomposition model using the second sample set and the sample normal vector information includes: training the image decomposition model with the first sub-sample set and the sample normal vector information corresponding to the first sub-sample set, so as to adjust the parameters of the shared encoder and the illumination rate decoder in the image decomposition model; and training the image decomposition model with the second sub-sample set and the sample normal vector information corresponding to the second sub-sample set, so as to adjust the parameters of the shared encoder and the reflectivity decoder in the image decomposition model.
  • in this way, the image decomposition model can obtain better illumination rate images and reflectivity images when decomposing the image to be decomposed.
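A schematic driver for this two-phase training scheme is sketched below. The `step` callback stands in for one optimizer update restricted to the named parameter groups; the group names, losses, and data format are illustrative assumptions, not the patent's actual training procedure:

```python
def train_decomposition_model(params, subset_a, subset_b, step):
    """Two-phase training driver: the first sub-sample set updates only the
    shared encoder and illumination rate decoder; the second updates only the
    shared encoder and reflectivity decoder. `step` is a placeholder for one
    optimizer step over the named parameter groups."""
    for sample in subset_a:
        step(params, groups=("shared_encoder", "illumination_decoder"),
             sample=sample)
    for sample in subset_b:
        step(params, groups=("shared_encoder", "reflectivity_decoder"),
             sample=sample)

# Toy usage: count how often each parameter group would be updated.
counts = {"shared_encoder": 0, "illumination_decoder": 0,
          "reflectivity_decoder": 0}
def step(params, groups, sample):
    for g in groups:
        counts[g] += 1

train_decomposition_model({}, subset_a=range(3), subset_b=range(2), step=step)
```

Note that the shared encoder participates in both phases, so it sees every sample, while each decoder is adjusted only by its own sub-sample set.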
  • a second aspect of the embodiments of the present disclosure provides an image decomposition device, the device includes an acquisition module, a normal vector estimation module, and a decomposition module; the acquisition module is configured to acquire an image to be decomposed; the normal vector estimation module is configured to obtain an image by using a normal vector estimation model normal vector information of the image to be decomposed; the decomposition module is configured to decompose the image to be decomposed by using an image decomposition model based on the normal vector information to obtain an intrinsic image of the image to be decomposed.
  • a third aspect of the embodiments of the present disclosure provides an electronic device, including a mutually coupled memory and a processor, where the processor is configured to execute program instructions stored in the memory to implement the image decomposition method in the first aspect.
  • a fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which program instructions are stored, and when the program instructions are executed by a processor, the image decomposition method in the first aspect is implemented.
  • a fifth aspect of the embodiments of the present disclosure provides a computer program, including computer-readable code; when the computer-readable code is executed in an electronic device, a processor in the electronic device implements the image decomposition method in the above-mentioned first aspect.
  • the image decomposition model can use the normal vector information to better understand the environment of the scene in the image to be decomposed, so that the intrinsic image produced by the decomposition better matches the scene of the image to be decomposed, which improves the decomposition effect of the intrinsic image;
  • moreover, the normal vector information of the image to be decomposed is obtained by a normal vector estimation model that is independent of the image decomposition model, so a dedicated model can be used to obtain accurate normal vector information, which further improves the degree to which the intrinsic image obtained by the subsequent decomposition matches the scene of the image to be decomposed.
  • FIG. 1 is a first schematic flowchart of an image decomposition method according to an embodiment of the present disclosure
  • FIG. 2 is a second schematic flowchart of an image decomposition method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of obtaining normal vector information of an image to be decomposed by using a normal vector estimation model in an image decomposition method according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a framework of a normal vector estimation model in an image decomposition method according to an embodiment of the present disclosure
  • FIG. 5 is a schematic flowchart of obtaining a first fusion feature map in an image decomposition method according to an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of a framework of an image decomposition model in an image decomposition method according to an embodiment of the present disclosure
  • FIG. 7 is a schematic frame diagram of an image decomposition apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a framework of an electronic device according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a framework of a computer-readable storage medium according to an embodiment of the present disclosure.
  • Intrinsic image decomposition aims to estimate the illumination rate of the scene and the reflectivity of the material from a single input image, that is, to obtain the illumination rate image and the reflectivity image.
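This decomposition is commonly modeled under the Lambertian assumption, in which the input image is the per-pixel product of the reflectivity image and the illumination rate (shading) image. The sketch below illustrates that composition; the multiplicative formula is the standard intrinsic-image assumption rather than something the patent states explicitly:

```python
import numpy as np

def compose_intrinsic(reflectivity: np.ndarray,
                      shading: np.ndarray) -> np.ndarray:
    """Recompose an image from its reflectivity (albedo) component and its
    illumination rate (shading) component by per-pixel multiplication
    (I = A * S, the standard Lambertian intrinsic-image model)."""
    return reflectivity * shading

# Toy 2x2 RGB reflectivity under non-uniform, colour-free shading.
reflectivity = np.array([[[0.8, 0.2, 0.1], [0.1, 0.7, 0.3]],
                         [[0.5, 0.5, 0.5], [0.9, 0.9, 0.1]]])
shading = np.array([[[1.0], [1.0]],
                    [[0.5], [0.5]]])  # bottom row half as bright
image = compose_intrinsic(reflectivity, shading)
```

Intrinsic decomposition runs this relation in reverse: given only `image`, estimate `reflectivity` and `shading`, which is ill-posed and is why extra cues such as normal vector information help.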
  • a device for implementing the image decomposition method may be a computer or a server or other device.
  • the image decomposition method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • FIG. 1 is a first schematic flowchart of an image decomposition method according to an embodiment of the present disclosure. As shown in FIG. 1, the following steps may be included:
  • Step S11 Acquire an image to be decomposed.
  • the image to be decomposed is used as the original input image, from which the corresponding intrinsic image is decomposed.
  • the image to be decomposed may be a color image, or a depth image or the like.
  • Step S12 Obtain normal vector information of the image to be decomposed by using the normal vector estimation model.
  • the normal vector estimation model is a neural network built based on deep learning, which is used to extract feature information from the image to be decomposed to obtain the normal vector information of the image to be decomposed.
  • the normal vector estimation model can obtain several feature maps by extracting feature information from the image to be decomposed.
  • the normal vector information is, for example, the normal vector information of each pixel in the image to be decomposed. Through the normal vector information, environmental information in the input image can be obtained, such as the structure information of the scene in the image to be decomposed.
  • the normal vector estimation model is a fully convolutional neural network, which can be composed of a coarse-grained to fine-grained two-level network structure.
  • the two-level network can fuse feature maps of multiple scales (different numbers of features, different image resolutions) to obtain outputs with higher resolution, richer details, and more accurate object boundaries in the image.
  • Step S13 Based on the normal vector information, use an image decomposition model to decompose the image to be decomposed to obtain an intrinsic image of the image to be decomposed.
  • the image decomposition model can use the normal vector information to decompose the input image.
  • the image decomposition model can decompose the image to be decomposed based on the normal vector of each pixel in the normal vector information and the structure information of the scene contained in the normal vector information to obtain an intrinsic image, that is, to obtain an illumination rate image and a reflectivity image.
  • the image decomposition model is a fully convolutional neural network.
  • the image decomposition model can use the normal vector information to better understand the environmental conditions of the scene in the image to be decomposed, so that the intrinsic image produced by the decomposition better matches the scene of the image to be decomposed, which improves the decomposition effect of the intrinsic image;
  • moreover, the normal vector information of the image to be decomposed is obtained by a normal vector estimation model that is independent of the image decomposition model, so a dedicated model can be used to obtain accurate normal vector information, which further improves the degree to which the intrinsic image obtained by the subsequent decomposition matches the scene of the image to be decomposed.
  • FIG. 2 is a second schematic flowchart of an image decomposition method according to an embodiment of the present disclosure. As shown in Figure 2, the following steps may be included:
  • Step S21 Acquire the image to be decomposed.
  • Step S22 Obtain normal vector information of the image to be decomposed by using the normal vector estimation model.
  • the normal vector information is a normal vector map including normal vectors of different pixels of the image to be decomposed, that is, each pixel in the image to be decomposed has a corresponding normal vector.
  • the normal vector estimation model includes a normal vector encoder, a normal vector decoder and a refinement sub-network.
  • the normal vector encoder can perform feature extraction on the image to be decomposed;
  • the normal vector decoder can decode the features and output a feature map;
  • the refinement sub-network can refine the output of the decoder.
  • FIG. 3 is a schematic flowchart of obtaining normal vector information of an image to be decomposed by using a normal vector estimation model in an image decomposition method according to an embodiment of the present disclosure.
  • using the normal vector estimation model to obtain the normal vector information of the image to be decomposed may include the following steps S221 to S223 .
  • Step S221 use a normal vector encoder to encode the image to be decomposed to obtain a first scene structure feature map.
  • the normal vector encoder of the normal vector estimation model can be used to encode the to-be-decomposed image, and to extract feature information in the to-be-decomposed image.
  • the feature information obtained by encoding the image to be decomposed by the normal vector encoder is, for example, structural feature information of the scene in the image to be decomposed, and the structural feature information includes, for example, plane information and object boundary information.
  • the normal vector encoder can output the first scene structure feature map, that is, the structure feature map about the scene in the image to be decomposed.
  • the normal vector encoder can be used to perform multi-layer encoding (i.e., feature extraction) on the image to be decomposed, and the feature map obtained by each layer of the encoder is a first scene structure feature map.
  • the first layer of encoding blocks encodes the to-be-decomposed image and outputs the first scene structure feature map.
  • the second-layer coding block takes the first scene structure feature map output by the first-layer coding block as input, performs coding again, and outputs the corresponding first scene structure feature map.
  • the feature richness in the first scene structure feature map output by the coding block of each layer of the normal vector encoder can also be set to be different.
  • the feature richness may include the resolution of the first scene structure feature map, the dimension of feature information, and the like.
  • the first scene structure feature map corresponding to the coding block of the last layer is output to the normal vector decoder.
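The layer-by-layer encoding described above can be sketched as a chain of blocks, each consuming the previous block's first scene structure feature map. The toy blocks below downsample by 2 and double the channel count, an illustrative assumption mirroring the 1/4 to 1/32 resolution progression described later for the encoder:

```python
import numpy as np

def multi_layer_encode(image: np.ndarray, blocks: list) -> list:
    """Each coding block takes the previous block's first scene structure
    feature map as input and emits its own; all per-layer maps are kept for
    the refinement sub-network, and the last one goes to the decoder."""
    feature_maps = []
    x = image
    for block in blocks:
        x = block(x)
        feature_maps.append(x)
    return feature_maps

def make_block():
    def block(x):
        # Toy stand-in: halve spatial resolution, double channel count.
        pooled = x[::2, ::2]
        return np.concatenate([pooled, pooled], axis=-1)
    return block

maps = multi_layer_encode(np.zeros((64, 64, 3)),
                          [make_block() for _ in range(4)])
```

Each successive map is spatially coarser but has more channels, which is the "different feature richness per layer" property the fusion steps later rely on.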
  • Step S222 Use the normal vector decoder to decode the first scene structure feature map to obtain the decoded feature map.
  • the normal vector decoder may be used to decode the first scene structure feature map to obtain the decoded feature map.
  • when the normal vector decoder decodes the first scene structure feature map, it can decode the feature information extracted by the normal vector encoder and reconstruct a decoded feature map of a preset dimension and a preset resolution.
  • for example, the dimension of the feature information in the decoded feature map may be 64, and the resolution may be 1/2 that of the image to be decomposed.
  • when the normal vector decoder has a multi-layer structure, it performs multi-layer decoding on the first scene structure feature map: the first-layer decoder decodes the first scene structure feature map and outputs a corresponding pre-decoded feature map; the second-layer decoder decodes the pre-decoded feature map output by the first-layer decoder and outputs its corresponding pre-decoded feature map; and so on. The pre-decoded feature map output by the last layer is the decoded feature map.
  • Step S223 Fuse the first scene structure feature map and the decoded feature map by using the refinement sub-network to obtain the normal vector information of the image to be decomposed.
  • the refinement sub-network can be used to fuse the first scene structure feature map and the decoded feature map to obtain the normal vector information of the image to be decomposed.
  • the feature information in the first scene structure feature map and the feature information in the decoded feature map may be fused to obtain normal vector information of the image to be decomposed.
  • the feature information in the first scene structure feature map and the feature information in the decoded feature map are both 64-dimensional, and the normal vector information obtained after fusion may be 128-dimensional.
  • the normal vector information is a normal vector map including normal vectors of different pixels of the image to be decomposed, that is, each pixel in the image to be decomposed has a corresponding normal vector.
  • in the case where the normal vector encoder has a multi-layer structure, the refinement sub-network can be used to concatenate the first scene structure feature maps corresponding to each layer to obtain the second scene structure feature map, and to concatenate the second scene structure feature map with the decoded feature map to obtain a third scene structure feature map.
  • alternatively, the refinement sub-network may be configured to perform the concatenation using only the first scene structure feature maps output by some of the encoding layers of the normal vector encoder.
  • the refinement sub-network may be used to process the first scene structure feature map output by each layer of the encoder, so that the processed feature maps have the same feature dimension and resolution before concatenation.
  • the refinement sub-network may further decode based on the feature information of the third scene structure feature map to obtain normal vector information of the image to be decomposed, such as a normal vector map.
  • FIG. 4 is a schematic diagram of a framework of a normal vector estimation model in an image decomposition method according to an embodiment of the present disclosure.
  • the normal vector estimation model 400 includes: a normal vector encoder 401 , a normal vector decoder 402 and a refinement sub-network 403 .
  • the normal vector encoder 401 includes an initial convolution block 4011 (denoted conv1, which may include three convolution layers and a max pooling layer) and four coding blocks 4012 that include squeeze-and-excitation (SE) blocks.
  • the initial convolution block 4011 can perform preliminary coding on the to-be-decomposed image, and output the feature map to the coding block 4012 .
  • the encoding block 4012 compresses the resolution of the feature map to 1/4, 1/8, 1/16, and 1/32 of the original input image while gradually extracting features of higher dimensions.
  • Each encoding block 4012 outputs the first scene structure feature map, and the first scene structure feature map output by the last encoding block 4012 is output to the normal vector decoder 402 .
  • the normal vector decoder 402 includes a convolution block 4021 (denoted conv2) and four up-projection blocks 4022 (denoted up-projection block 5 to up-projection block 8). The four up-projection blocks 4022 decode the features step by step and reconstruct a decoded feature map with a dimension of 64 and a resolution 1/2 that of the image to be decomposed.
  • the refinement sub-network 403 includes 4 up-projection blocks 4031 (denoted as up-projection block 1 to up-projection block 4) and 4 convolutional layers 4032 (denoted as conv3 to conv6).
  • the first scene structure feature maps extracted by the coding blocks 4012 are concatenated via skip connections and the up-projection blocks 4031 to obtain the second scene structure feature map.
  • the second scene structure feature map and the decoded feature map are then concatenated to obtain a third scene structure feature map.
  • four convolution layers 4032 are used to perform layer-by-layer decoding, and finally the normal vector information of the image to be decomposed, that is, the normal vector map, is obtained.
  • after obtaining the normal vector information of the image to be decomposed, the image to be decomposed can be decomposed by using the obtained normal vector information to obtain the intrinsic image of the image to be decomposed.
  • the above-mentioned steps of "decomposing an image to be decomposed by using an image decomposition model based on normal vector information to obtain an intrinsic image of the image to be decomposed" include the following steps.
  • Step S23 Use the image decomposition model to process the image to be decomposed to obtain scene lighting condition information of the image to be decomposed.
  • the image decomposition model can be, for example, a fully convolutional neural network.
  • the image decomposition model can perform a feature extraction operation on the image to be decomposed, and obtain scene illumination condition information of the image to be decomposed.
  • the scene illumination condition information can be understood as the illumination condition of the scene in the image to be decomposed.
  • the scene lighting condition information is a normal vector adaptation map including normal vector adaptation vectors of different pixels of the image to be decomposed. Normal vector adaptation maps can be used to encode scene lighting conditions.
  • the image decomposition model includes a shared encoder, an illumination rate decoder, and a reflectivity decoder. Use the image decomposition model to process the image to be decomposed to obtain scene lighting condition information of the image to be decomposed, which may specifically include the following steps:
  • Step S231 Use the shared encoder to perform feature extraction on the image to be decomposed to obtain an image feature map, and fuse the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model to obtain a first fusion feature map.
  • the feature information extracted by the shared encoder is used to obtain both the illuminance image and the reflectance image.
  • the first fusion feature map obtained after fusion may include structural feature information and other feature information of the scene in the image to be decomposed.
  • the shared encoder includes at least one coding unit connected in sequence, and each coding unit includes a normal vector adaptor (Normal Feature Adapter, NFA).
  • FIG. 5 is a schematic flowchart of obtaining a first fusion feature map in an image decomposition method according to an embodiment of the present disclosure.
  • the image feature map and the scene structure feature map output by the normal vector encoder of the normal vector estimation model are fused to obtain the first fused feature map, which may specifically include the following steps S2311 to S2313 .
  • Step S2311 Output the image feature map to the first coding unit.
  • other encoders of the image decomposition model can first perform feature extraction on the image to be decomposed to obtain an image feature map. Then, output the image feature map to the first coding unit. The image feature map is further processed by the first coding unit.
  • Step S2312 Each coding unit uses a normal vector adaptor to fuse the feature map output by the previous coding unit with the scene structure feature map, to obtain the second fusion feature map corresponding to the coding unit; wherein the feature richness of the scene structure feature map corresponding to each coding unit differs.
  • a normal vector adaptor may be used to fuse the feature map output by the previous coding unit and the scene structure feature map to obtain a second fused feature map corresponding to the coding unit.
  • the feature richness of the scene structure feature map corresponding to each coding unit is different. Different feature richness can be understood as differences in the resolution of the scene structure feature map and in the dimension of its feature information.
  • For the first coding unit, the input it obtains is the image feature map produced by feature extraction in the preceding convolution blocks.
  • the acquired image feature map is the second fusion feature map output by the first coding unit.
  • When the normal vector encoder of the normal vector estimation model has only one layer, the normal vector encoder outputs a single first scene structure feature map. In this case, every coding unit can fuse this unique first scene structure feature map with the feature map output by the previous coding unit. When the normal vector encoder of the normal vector estimation model has multiple layers, the first scene structure feature maps output by the different layers can each be fused with the feature maps output by the coding units.
  • For example, the first scene structure feature map obtained by the first-layer normal vector encoder is output to the first coding unit, and the first scene structure feature map obtained by the second-layer normal vector encoder is output to the second coding unit, so that the second coding unit can fuse the feature map output by the previous coding unit with the first scene structure feature map obtained by the second-layer normal vector encoder.
  • The normal vector adaptor fuses the feature map output by the previous coding unit with the scene structure feature map to obtain a second fused feature map corresponding to the coding unit. Specifically, the normal vector adaptor first adjusts the scene structure feature map to a scene structure feature map of a preset scale, for example by adjusting its resolution and the dimension of its feature information.
  • the normal vector adaptor then concatenates and convolves the adjusted scene structure feature map and the feature map output by the previous coding unit to obtain the second fusion feature map corresponding to the coding unit.
  • the normal vector adaptor of the second coding unit can concatenate and convolve the second fusion feature map output by the first coding unit and the scene structure feature map input to it, so as to obtain a The corresponding second fusion feature map.
  • the normal vector adaptor also concatenates and convolves the scene structure feature map with the feature map output by the previous coding unit, so as to realize the fusion of the scene structure feature map and the feature map output by the previous coding unit.
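A minimal sketch of a normal vector adaptor (NFA) following the steps above: adjust the scene structure feature map to a preset scale, then concatenate and convolve it with the feature map from the previous coding unit. The module name, channel arguments, and use of a 1x1 convolution plus bilinear interpolation for the scale adjustment are illustrative assumptions, not the embodiment's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalFeatureAdapter(nn.Module):
    """Sketch of an NFA: (1) adjust the scene structure feature map to a
    preset scale (resolution and feature dimension), (2) concatenate it
    with the feature map output by the previous coding unit, (3) convolve
    the result into the second fusion feature map."""

    def __init__(self, struct_ch, feat_ch, out_ch):
        super().__init__()
        # 1x1 conv: adjust the feature-information dimension
        self.adjust = nn.Conv2d(struct_ch, feat_ch, kernel_size=1)
        # 3x3 conv over the concatenation: produce the second fusion map
        self.fuse = nn.Conv2d(feat_ch * 2, out_ch, kernel_size=3, padding=1)

    def forward(self, prev_feat, struct_feat):
        # adjust the resolution of the scene structure feature map
        struct_feat = F.interpolate(struct_feat, size=prev_feat.shape[-2:],
                                    mode="bilinear", align_corners=False)
        struct_feat = self.adjust(struct_feat)
        # concatenate and convolve -> second fusion feature map
        return self.fuse(torch.cat([prev_feat, struct_feat], dim=1))
```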
  • Before each coding unit uses the normal vector adaptor to fuse the feature map output by the previous coding unit with the scene structure feature map to obtain its corresponding second fused feature map, it may first perform down-sampling on the feature map output by the previous coding unit.
  • For example, the second coding unit performs down-sampling on the second fusion feature map output by the first coding unit. The down-sampling reduces the second fusion feature map so that it meets the scale requirements.
  • Step S2313 Obtain a first fused feature map based on the second fused feature map of the last coding unit.
  • In some cases, the last layer of the shared encoder of the image decomposition model is not the last coding unit; that is, several coding blocks follow the last coding unit of the shared encoder.
  • After the last coding unit outputs its second fused feature map, these coding blocks continue to encode it to further process the fused feature information.
  • the image output after the last layer of shared encoder processing is the first fusion feature map.
  • For example, the second fused feature map output by the last coding unit can be down-sampled to further reduce it, and then a coding block is used to encode it again to extract feature information; the feature map output at this point is the first fusion feature map.
  • the second fusion feature map may also be directly used as the first fusion feature map.
  • In this way, the image decomposition model can subsequently use the scene structure feature information about the scene in the image to be decomposed. This passes the feature information obtained by the normal vector estimation model to the image decomposition model for use, and improves the decomposition effect of the intrinsic image.
  • the image to be decomposed can be decomposed by using the first fusion feature map to obtain an intrinsic image.
  • Step S232 Decode the first fusion feature map by using an illumination rate decoder to obtain scene illumination condition information of the image to be decomposed.
  • an illumination rate decoder can be used to decode the first fused feature map to obtain the scene illumination condition information of the image to be decomposed, for example, to obtain the normal vector adaptation map containing the normal vector adaptation vector of each pixel in the image to be decomposed.
  • the normal vector adaptation vector is defined as follows: the three components of the normal vector adaptation vector are represented by x, y, and z,
  • Decoding the first fusion feature map with the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed includes: using the illumination rate decoder to decode the first fusion feature map and the second fusion feature map of at least one normal vector adaptor, to obtain the scene illumination condition information of the image to be decomposed.
  • the illumination rate decoder can simultaneously obtain the first fused feature map output by the last layer of the shared encoder of the image decomposition model and the second fused feature map of at least one normal vector adaptor, and decode the two feature maps to obtain the scene lighting condition information of the image to be decomposed.
  • When the last layer of the shared encoder is a coding unit, the first fused feature map output by the last coding unit and the second fused feature maps output by the normal vector adaptors of the other coding units can be obtained for decoding.
  • the illumination rate decoder can simultaneously acquire the second fused feature maps output by multiple coding units for decoding. For example, if the illumination rate decoder obtains the second fusion feature maps output by 3 sequentially connected coding units, then 3 sequentially connected convolutional layers (such as up-projection blocks) can be set in the illumination rate decoder to respectively obtain the second fusion feature maps output by the 3 coding units for decoding.
  • the first convolutional layer of the illumination rate decoder can obtain the first fused feature map output by the shared encoder and the second fused feature map output by the first normal vector adaptor for decoding, and output the feature map.
  • the second convolutional layer of the illumination rate decoder can use the feature map output from the previous convolutional layer and the second fused feature map output from the second normal vector adaptor for decoding.
  • After the convolutional layers of the illumination rate decoder decode the first fusion feature map and the second fusion feature maps, several further convolutional layers may be used for decoding to adjust the illumination rate map finally output by the illumination rate decoder.
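The decoder structure described above (each decoding layer fusing in one NFA's second fusion feature map) can be sketched as follows. The class name, channel counts, and use of interpolation plus a plain convolution in place of real up-projection blocks are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IlluminationRateDecoder(nn.Module):
    """Sketch: the first decoding layer takes the first fused feature map
    from the shared encoder plus the first NFA's second fused feature map;
    each subsequent layer takes the previous layer's output plus the next
    NFA's second fused feature map; a final conv adjusts the output
    (here, a 3-channel normal vector adaptation map)."""

    def __init__(self, in_ch=256, skip_chs=(128, 64, 32)):
        super().__init__()
        layers, ch = [], in_ch
        for s in skip_chs:  # one layer per skip-connected NFA output
            layers.append(nn.Conv2d(ch + s, s, 3, padding=1))
            ch = s
        self.layers = nn.ModuleList(layers)
        self.out = nn.Conv2d(ch, 3, 3, padding=1)  # final adjusting conv

    def forward(self, first_fused, second_fused_maps):
        x = first_fused
        for conv, skip in zip(self.layers, second_fused_maps):
            # upsample the running map to the skip's resolution, then fuse
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False)
            x = torch.relu(conv(torch.cat([x, skip], dim=1)))
        return self.out(x)
```

The reflectivity decoder described later has the same shape, differing only in its final output (a reflectivity map).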
  • By obtaining scene illumination condition information about the image to be decomposed, for example the normal vector adaptation map containing the normal vector adaptation vector of each pixel, illumination conditions that vary across space can be modeled, which improves the image decomposition.
  • Step S24 Based on the scene illumination condition information and the normal vector information, an illumination rate image of the to-be-decomposed image is obtained.
  • the image to be decomposed can be decomposed based on the scene illumination condition information and the normal vector information output by the normal vector estimation model to obtain the illumination rate image of the image to be decomposed.
  • a normal vector adaptation map and a normal vector map can be used to obtain the illumination rate image of the image to be decomposed.
  • a dot product of the normal vector adaptive map and the normal vector map may be performed to obtain an illumination rate image of the image to be decomposed.
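The per-pixel dot product of the normal vector adaptation map and the normal vector map can be written in a few lines of NumPy. The function name and the H x W x 3 array layout are assumptions for illustration; the operation itself is the dot product the embodiment describes.

```python
import numpy as np

def illumination_rate_image(adaptation_map, normal_map):
    """Per-pixel dot product of the normal vector adaptation map A and the
    normal vector map N, yielding the illumination rate image S, where
    S(p) = A(p) . N(p). Both inputs are H x W x 3 arrays; the result
    is H x W."""
    return np.sum(adaptation_map * normal_map, axis=-1)

# usage sketch with random maps (values are purely illustrative)
A = np.random.rand(4, 4, 3)   # normal vector adaptation map
N = np.random.rand(4, 4, 3)   # normal vector map
S = illumination_rate_image(A, N)   # shape (4, 4)
```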
  • The normal vector adaptation map makes full use of the plane information and object boundary information in the scene structure feature information provided by the normal vector estimation model, so that the illumination rate image decomposed by the image decomposition model suffers less from texture residue on planar areas, objects can have clear and sharp outlines, and the scene of the reflectivity image can better match the scene of the image to be decomposed.
  • The image decomposition model may further include a reflectivity decoder. Because the feature information extracted by the shared encoder can also be used to obtain the reflectivity image, after the shared encoder performs feature extraction on the image to be decomposed, that is, after step S231, the following step 1 can be performed:
  • Step 1 Decode the first fusion feature map with a reflectivity decoder to obtain a reflectivity image of the image to be decomposed.
  • the output of the last layer of the shared encoder is the first fusion feature map
  • the first fusion feature map includes scene structure feature information of the scene in the image to be decomposed. Therefore, a reflectivity decoder can be used to decode the first fusion feature map to obtain a reflectivity image of the image to be decomposed.
  • Using the reflectivity decoder to decode the first fusion feature map to obtain the reflectivity image of the image to be decomposed includes: using the reflectivity decoder to decode the first fusion feature map and the second fusion feature map of at least one normal vector adaptor, to obtain the reflectivity image of the image to be decomposed.
  • the reflectivity decoder can simultaneously obtain the first fused feature map output by the last layer of the shared encoder of the image decomposition model, and the second fused feature map of at least one normal vector adaptor, and decode the two feature maps, to obtain the reflectance image of the image to be decomposed.
  • When the last layer of the shared encoder is a coding unit, the first fused feature map output by the last coding unit and the second fused feature maps output by the normal vector adaptors of the other coding units can be obtained for decoding.
  • the reflectivity decoder can simultaneously acquire the second fused feature maps output by the multiple coding units for decoding.
  • For example, if the reflectivity decoder obtains the second fusion feature maps output by three sequentially connected coding units, three sequentially connected convolutional layers (such as up-projection blocks) can be set in the reflectivity decoder to respectively obtain the second fusion feature maps output by the three coding units for decoding.
  • the first convolutional layer of the reflectivity decoder can obtain the first fused feature map output by the shared encoder and the second fused feature map output by the first normal vector adaptor for decoding, and output the feature map.
  • the second convolutional layer of the reflectivity decoder can use the feature map output from the previous convolutional layer and the second fused feature map output from the second normal vector adaptor for decoding.
  • After the convolutional layers of the reflectivity decoder decode the first fused feature map and the second fused feature maps, several further convolutional layers may be used for decoding to adjust the reflectivity map finally output by the reflectivity decoder.
  • The image to be decomposed is decomposed using the first fusion feature map, which contains the scene structure feature information of the scene in the image to be decomposed. By using this scene structure feature information, each object of the scene in the image to be decomposed is assigned a more consistent reflectivity, which improves the decomposition effect of the intrinsic image.
  • FIG. 6 is a schematic frame diagram of an image decomposition model in an image decomposition method according to an embodiment of the present disclosure.
  • the image decomposition model 60 includes: a shared encoder 61 , an illumination rate decoder 62 and a reflectance decoder 63 .
  • the image decomposition model 60 is, for example, a fully convolutional neural network.
  • the shared encoder 61 includes a convolution block 611 (conv1 as shown) and several coding units 612 .
  • the encoding unit 612 includes a normal vector adaptor 6121 (eg, NFA1, NFA2, NFA3).
  • the normal vector adaptor 6121 may be linked with some of the coding blocks of the normal vector estimation model.
  • the illumination rate decoder 62 includes several convolutional blocks 621, some of which are up-projection blocks (up-projection block 1 to up-projection block 4 as shown in the figure).
  • the reflectivity decoder 63 includes several convolution blocks 631, some of which are up-projection blocks (up-projection block 5 to up-projection block 8 as shown in the figure).
  • the normal vector adaptor 6121 is skip-linked to the partial convolution block 621 of the illumination rate decoder 62 and the partial convolution block 631 of the reflectivity decoder 63, respectively.
  • the image decomposition model 60 may process the to-be-decomposed image to obtain scene illumination condition information of the to-be-decomposed image.
  • the image decomposition model 60 may also obtain the illumination rate image of the image to be decomposed based on the scene illumination condition information and the normal vector information output by the normal vector estimation model.
  • the image decomposition model 60 may also output a reflectance image.
  • the shared encoder 61 can perform feature extraction on the image to be decomposed to obtain an image feature map, and fuse the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model, and output the first fused feature map.
  • the convolution block located before the encoding unit 612 may perform feature extraction on the to-be-decomposed image to obtain the image feature map mentioned in the above embodiments.
  • the coding unit 612 can use the normal vector adaptor 6121 to fuse the feature map output by the previous coding unit and the scene structure feature map output by the encoder of the normal vector estimation model to obtain a second fused feature map corresponding to the coding unit.
  • Y represents the scene structure feature map output by the encoder of the normal vector estimation model.
  • the convolution block located after the coding unit 612 may further encode the second fused feature map output by the last coding unit, and finally output the first fused feature map.
  • the coding unit 612 may further include a down-sampling convolution block 6122 (referred to as a down-sampling block; down-sampling block 1 to down-sampling block 4 as shown in the figure) for down-sampling the feature map output by the previous coding unit.
  • Illumination rate decoder 62 may include 5 convolutional blocks.
  • the last layer of convolution block 621 (conv4 as shown in the figure) outputs scene lighting condition information of the image to be decomposed, such as a normal vector adaptation map.
  • A represents the normal vector adaptation map
  • N represents the normal vector map output by the refinement sub-network.
  • the image decomposition model 60 takes the dot product of the normal vector adaptation map A and the normal vector map N to obtain the illumination rate map.
  • the reflectivity decoder 63 may include five convolution blocks 631. The convolution blocks 631 perform layer-by-layer decoding on the first fusion feature map output by the shared encoder, and the last convolution block 631 (conv6 as shown in the figure) directly outputs the reflectivity image.
  • the embodiments of the present disclosure also provide the training methods for the normal vector estimation model and the image decomposition model mentioned in the above-mentioned image decomposition method embodiments.
  • the normal vector estimation model and the image decomposition model may be trained first.
  • the normal vector estimation model contains an independent normal vector encoder, normal vector decoder, and sub-network. Therefore, the normal vector estimation model can be trained separately; likewise, the image decomposition model can also be trained separately.
  • the normal vector estimation model and the image decomposition model are obtained by training separately. That is, when training the normal vector estimation model and the image decomposition model, the normal vector estimation model and the image decomposition model can be trained separately.
  • The normal vector estimation model can be trained separately, so that it can be trained using only normal vector sample data. The model is then used to improve the decomposition effect of the intrinsic image and to reduce the impact that the lack of intrinsic image sample data has on the decomposition effect.
  • When the normal vector estimation model is trained, it may be obtained by training with a first sample set, wherein the images in the first sample set are marked with normal vector information.
  • the normal vector information is, for example, that each pixel in the image has a corresponding normal vector.
  • the first sample set includes, for example, the NYUv2 dataset and the Dense Indoor and Outdoor DEpth (DIODE) dataset.
  • the sample normal vector information of the images in the second sample set can be obtained by using the trained normal vector estimation model, and the image decomposition model can be trained by using the second sample set and the sample normal vector information.
  • the images of the second sample set may be annotated with the ground truth illuminance map and the ground truth reflectance map.
  • the second sample set is, for example, a CGI data set.
  • the second sample set includes a first sub-sample set and a second sub-sample set.
  • the images of the first sub-sample set may be marked with the ground truth value of the illuminance map, and the images of the second subset of samples may be marked with the ground truth value of the reflectivity map.
  • steps 1 and 2 may be specifically performed.
  • Step 1 The image decomposition model is trained by using the first sub-sample set and the sample normal vector information corresponding to the first sub-sample set, so as to adjust the parameters of the shared encoder and the illumination rate decoder in the image decomposition model.
  • the normal vector information corresponding to the first sub-sample set is obtained by using the trained normal vector estimation model.
  • the training of the shared encoder and the illuminance decoder in the image decomposition model can be achieved by utilizing the first subset of samples annotated with the ground truth of the illuminance map.
  • Step 2 The image decomposition model is trained by using the second sub-sample set and the sample normal vector information corresponding to the second sub-sample set, so as to adjust the parameters of the shared encoder and the reflectivity decoder in the image decomposition model.
  • the normal vector information corresponding to the second sub-sample set is obtained by using the trained normal vector estimation model.
  • After training the shared encoder and illuminance decoder in the image decomposition model with the first sub-sample set marked with the ground truth value of the illuminance map, the shared encoder and reflectivity decoder in the image decomposition model can be further trained on this basis. Specifically, the shared encoder and the reflectivity decoder in the image decomposition model can be trained using the second sub-sample set marked with the ground truth value of the reflectivity map.
  • the training effect can be judged according to the relevant loss function, and then the network parameters of each model can be adjusted according to the size of the loss value to complete the training.
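The two-stage training procedure above can be sketched as follows. This is a hedged illustration only: the function names, the tuple-returning model interface, and the choice of an L1 loss are assumptions not stated by the embodiment; the embodiment only says a "relevant loss function" is used.

```python
import torch

def train_image_decomposition(model, loader_shading, loader_reflectance,
                              normal_model, optimizer,
                              loss_fn=torch.nn.functional.l1_loss):
    """Sketch of the two-stage training. Stage 1 fits the shared encoder +
    illumination rate decoder on samples annotated with ground-truth
    illumination rate maps; stage 2 fits the shared encoder + reflectivity
    decoder on samples annotated with ground-truth reflectivity maps.
    The normal vector estimation model is already trained and only supplies
    sample normal vector information (it is not updated here)."""
    normal_model.eval()

    # Stage 1: adjust shared encoder + illumination rate decoder parameters
    for image, gt_shading in loader_shading:
        with torch.no_grad():
            normals = normal_model(image)      # sample normal vector info
        shading, _ = model(image, normals)     # (illumination, reflectance)
        loss = loss_fn(shading, gt_shading)
        optimizer.zero_grad(); loss.backward(); optimizer.step()

    # Stage 2: adjust shared encoder + reflectivity decoder parameters
    for image, gt_reflectance in loader_reflectance:
        with torch.no_grad():
            normals = normal_model(image)
        _, reflectance = model(image, normals)
        loss = loss_fn(reflectance, gt_reflectance)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
```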
  • the image decomposition model can obtain better illuminance maps and reflectance maps when decomposing the to-be-decomposed image.
  • the image decomposition model can use the normal vector information to better understand the environment of the scene in the image to be decomposed, so that the intrinsic image decomposed by the image decomposition model better matches the scene of the image to be decomposed, which improves the decomposition effect of the intrinsic image;
  • the normal vector information of the image to be decomposed is obtained using a normal vector estimation model independent of the image decomposition model, so a targeted model can be used to obtain accurate normal vector information, which further improves the degree of matching between the intrinsic image obtained by subsequent decomposition and the scene of the image to be decomposed.
  • the writing order of the steps does not imply a strict execution order, nor does it constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • FIG. 7 is a schematic frame diagram of an image decomposition apparatus according to an embodiment of the present disclosure.
  • the image decomposition apparatus 70 includes an acquisition module 71 , a normal vector estimation module 72 and a decomposition module 73 .
  • the acquisition module 71 is configured to perform the acquisition of the image to be decomposed;
  • the normal vector estimation module 72 is configured to obtain the normal vector information of the image to be decomposed by using the normal vector estimation model;
  • the decomposition module 73 is configured to decompose the image to be decomposed by using the image decomposition model based on the normal vector information, to obtain the intrinsic image of the image to be decomposed.
  • the above-mentioned intrinsic images include illuminance images.
  • the above-mentioned decomposition module 73 is configured to utilize the image decomposition model to process the image to be decomposed to obtain scene illumination condition information of the image to be decomposed; based on the scene illumination condition information and normal vector information, obtain the illumination rate image of the image to be decomposed.
  • the above-mentioned scene lighting condition information is a normal vector adaptation map including normal vector adaptation vectors of different pixels of the image to be decomposed, and the normal vector information is a normal vector map including normal vectors of different pixels of the image to be decomposed.
  • the above-mentioned decomposition module 73 is configured to: perform a dot product on the normal vector adaptive map and the normal vector map to obtain an illumination rate image of the image to be decomposed.
  • the above-mentioned image decomposition model includes a shared encoder and an illumination rate decoder.
  • the above-mentioned decomposition module 73 is configured to: use the shared encoder to perform feature extraction on the image to be decomposed to obtain an image feature map, fuse the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model to obtain the first fusion feature map, and decode the first fusion feature map with the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed.
  • the above-mentioned shared encoder includes at least one coding unit connected in sequence, each coding unit includes a normal vector adaptor, and the above-mentioned decomposition module 73 is configured to: output the image feature map to the first coding unit; for each coding unit, use the normal vector adaptor to fuse the feature map output by the previous coding unit with the first scene structure feature map to obtain the second fusion feature map corresponding to the coding unit, wherein the feature richness of the scene structure feature map corresponding to each coding unit differs; and obtain the first fused feature map based on the second fused feature map of the last coding unit.
  • the above-mentioned decomposition module 73 is configured to perform down-sampling on the feature map output by the previous coding unit before the normal vector adaptor is used to fuse that feature map with the scene structure feature map to obtain the second fusion feature map corresponding to the coding unit. And/or, the above-mentioned decomposition module 73 is configured to use the normal vector adaptor to: adjust the scene structure feature map to a scene structure feature map of a preset scale, and concatenate and convolve the adjusted scene structure feature map with the feature map output by the previous coding unit to obtain the second fusion feature map corresponding to the coding unit.
  • the above-mentioned decomposition module 73 is configured to: use an illumination rate decoder to decode the first fused feature map and the second fused feature map of at least one normal vector adaptor to obtain the scene of the image to be decomposed Lighting condition information.
  • the above-mentioned image decomposition model further includes a reflectivity decoder.
  • the above-mentioned decomposition module 73 is configured to: decode the first fusion feature map by using a reflectivity decoder to obtain a reflectivity image of the image to be decomposed.
  • the above-mentioned decomposition module 73 is configured to: use the reflectivity decoder to decode the first fused feature map and the second fused feature map of at least one normal vector adaptor, to obtain the reflectivity image of the image to be decomposed.
  • the above-mentioned normal vector estimation model includes a normal vector encoder, a normal vector decoder, and a refinement sub-network.
  • the above-mentioned normal vector estimation module 72 is configured to: use the normal vector encoder to encode the image to be decomposed to obtain the first scene structure feature map; use the normal vector decoder to decode the first scene structure feature map to obtain the decoded feature map; and use the refinement sub-network to fuse the first scene structure feature map and the decoded feature map to obtain the normal vector information of the image to be decomposed.
  • the above-mentioned normal vector estimation module 72 is configured to: use the normal vector encoder to perform multi-layer encoding on the image to be decomposed to obtain a first scene structure feature map corresponding to each layer, wherein the feature richness of the first scene structure feature maps corresponding to different layers differs, and output the first scene structure feature map corresponding to the last layer to the normal vector decoder.
  • the above-mentioned normal vector estimation module 72 is configured to use the sub-network to: concatenate the first scene structure feature maps corresponding to the layers to obtain the second scene structure feature map, concatenate the second scene structure feature map with the decoded feature map to obtain the third scene structure feature map, and obtain the normal vector information of the image to be decomposed based on the third scene structure feature map.
  • the above-mentioned normal vector estimation model and image decomposition model are obtained by training separately.
  • the image decomposition apparatus 70 further includes a training module, which is configured to: before the normal vector estimation module 72 uses the normal vector estimation model to obtain the normal vector information of the image to be decomposed, train the normal vector estimation model with the first sample set, wherein the images in the first sample set are marked with normal vector information; and use the trained normal vector estimation model to obtain the sample normal vector information of the images in the second sample set, and train the image decomposition model with the second sample set and the sample normal vector information.
  • the above-mentioned second sample set includes a first sub-sample set and a second sub-sample set.
  • the above-mentioned training module is configured to: train the image decomposition model using the first sub-sample set and the sample normal vector information corresponding to the first sub-sample set, so as to adjust the parameters of the shared encoder and the illumination rate decoder in the image decomposition model; and train the image decomposition model using the second sub-sample set and the sample normal vector information corresponding to the second sub-sample set, so as to adjust the parameters of the shared encoder and the reflectivity decoder in the image decomposition model.
  • FIG. 8 is a schematic diagram of a frame of an electronic device according to an embodiment of the present disclosure.
  • the electronic device 80 includes a memory 81 and a processor 82 coupled to each other, and the processor 82 is configured to execute program instructions stored in the memory 81 to implement the steps of any of the above image decomposition method embodiments.
  • the electronic device 80 may include, but is not limited to, a microcomputer and a server.
  • the electronic device 80 may also include mobile devices such as a notebook computer and a tablet computer, which are not limited herein.
  • the processor 82 is configured to control itself and the memory 81 to implement the steps of any of the above image decomposition method embodiments.
  • the processor 82 may also be referred to as a central processing unit (Central Processing Unit, CPU).
  • the processor 82 may be an integrated circuit chip with signal processing capability.
  • the processor 82 may also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the processor 82 may also be implemented jointly by multiple integrated circuit chips.
  • FIG. 9 is a schematic diagram of a framework of a computer-readable storage medium according to an embodiment of the disclosure.
  • the computer-readable storage medium 90 stores program instructions 901 that can be executed by the processor, and the program instructions 901 are used to implement the steps of any of the above image decomposition method embodiments.
  • the image decomposition model can use the normal vector information to better understand the environment of the scene in the image to be decomposed, so that the intrinsic image produced by the image decomposition model matches the scene of the image to be decomposed more closely, which improves the decomposition effect of the intrinsic image;
  • the normal vector information of the image to be decomposed is obtained using a normal vector estimation model independent of the image decomposition model, and this dedicated model can obtain accurate normal vector information, which further improves the match between the subsequently decomposed intrinsic image and the scene of the image to be decomposed.
  • the functions or modules included in the apparatuses provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments.
  • the disclosed method and apparatus may be implemented in other manners.
  • the device implementations described above are only illustrative.
  • the division of modules or units is only a logical function division. In actual implementation, there may be other divisions.
  • units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application, in essence, or the parts contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Provided are an image decomposition method and related apparatus and device. The method comprises: obtaining an image to be decomposed (S11); using a normal vector estimation model to obtain normal vector information of the image to be decomposed (S12); on the basis of the normal vector information, using an image decomposition model to decompose the image to be decomposed, to obtain an intrinsic image of the image to be decomposed (S13).

Description

Image decomposition method and related apparatus and device

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure is based on, and claims priority to, Chinese patent application No. 202010898798.1, filed on August 31, 2020, the entire contents of which are hereby incorporated into the present disclosure by reference.

Technical Field

The present disclosure relates to the technical field of image processing, and in particular to an image decomposition method and related apparatus and device.

Background

Intrinsic image decomposition is one of the important problems in computer vision and computer graphics. It refers to decomposing an original image into a shading image and a reflectance (albedo) image. Intrinsic images are widely used in fields such as 3D reconstruction, photorealistic image editing, augmented reality, and semantic segmentation, and have a significant impact.

At present, how to perform intrinsic image decomposition so that the intrinsic image better matches the scene information in the original image is of great significance.
SUMMARY OF THE INVENTION

The embodiments of the present disclosure provide at least an image decomposition method and related apparatus and device.

A first aspect of the embodiments of the present disclosure provides an image decomposition method. The method includes: acquiring an image to be decomposed; obtaining normal vector information of the image to be decomposed using a normal vector estimation model; and, based on the normal vector information, decomposing the image to be decomposed using an image decomposition model to obtain an intrinsic image of the image to be decomposed.

Therefore, by obtaining the normal vector information of the image to be decomposed, the image decomposition model can use the normal vector information to better understand the environment of the scene in the image, so that the intrinsic image it produces matches the scene of the image to be decomposed more closely, which improves the decomposition effect. In addition, the normal vector information is obtained with a normal vector estimation model that is independent of the image decomposition model; using such a dedicated model yields accurate normal vector information, which further improves the match between the subsequently decomposed intrinsic image and the scene of the image to be decomposed.
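The three steps of the method (S11: acquire the image; S12: estimate normals; S13: decompose) can be sketched end to end as follows. The two model functions are hypothetical stand-ins for the trained networks, not the patent's actual architectures:

```python
import numpy as np

def normal_vector_estimation_model(image):
    # Stand-in for the trained normal vector estimation network:
    # pretend every surface faces the camera (unit normal along z).
    h, w, _ = image.shape
    normals = np.zeros((h, w, 3))
    normals[..., 2] = 1.0
    return normals

def image_decomposition_model(image, normals):
    # Stand-in for the trained decomposition network: derive shading
    # from the normals and a fixed light direction, then recover the
    # reflectance (albedo) image as image / shading.
    light = np.array([0.0, 0.0, 1.0])
    shading = np.clip(normals @ light, 1e-3, None)   # (h, w)
    albedo = image / shading[..., None]              # (h, w, 3)
    return shading, albedo

image = np.random.rand(4, 4, 3)                      # S11: image to be decomposed
normals = normal_vector_estimation_model(image)      # S12: normal vector information
shading, albedo = image_decomposition_model(image, normals)  # S13: intrinsic images
```

With the frontal-normal stand-in, the shading image is uniformly 1, so the recovered albedo equals the input image; a trained model would of course produce spatially varying shading.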
In some optional embodiments, the intrinsic image includes an illumination rate (shading) image. Decomposing the image to be decomposed using the image decomposition model based on the normal vector information includes: processing the image to be decomposed with the image decomposition model to obtain scene illumination condition information of the image; and obtaining the illumination rate image of the image based on the scene illumination condition information and the normal vector information.

Therefore, by obtaining the scene illumination condition information of the image to be decomposed, the intrinsic image decomposition effect of the image decomposition model in scenes with complex illumination can be improved.

In some optional embodiments, the scene illumination condition information is a normal vector adaptation map containing a normal vector adaptation vector for each pixel of the image to be decomposed, and the normal vector information is a normal vector map containing a normal vector for each pixel. Obtaining the illumination rate image based on the scene illumination condition information and the normal vector information includes: taking the dot product of the normal vector adaptation map and the normal vector map to obtain the illumination rate image of the image to be decomposed.

Therefore, through the normal vector adaptation map, illumination conditions that vary across space can be modeled, which improves the intrinsic image decomposition effect of the model in scenes with complex illumination.
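The per-pixel dot product described above can be sketched with arrays; the shapes and random inputs here are illustrative assumptions:

```python
import numpy as np

H, W = 4, 4

# Normal vector map: one unit normal per pixel, shape (H, W, 3).
normals = np.random.randn(H, W, 3)
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)

# Normal vector adaptation map: one lighting-dependent vector per pixel,
# shape (H, W, 3), predicted by the decomposition model to encode
# spatially varying illumination.
adaptive = np.random.randn(H, W, 3)

# Per-pixel dot product yields the illumination rate (shading) image (H, W).
shading = np.einsum('hwc,hwc->hw', adaptive, normals)
```

Because the adaptation vector differs per pixel, this formulation can express lighting that changes across the scene, unlike a single global light direction.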
In some optional embodiments, the image decomposition model includes a shared encoder and an illumination rate decoder. Processing the image to be decomposed with the image decomposition model to obtain the scene illumination condition information includes: performing feature extraction on the image to be decomposed with the shared encoder to obtain an image feature map, and fusing the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model to obtain a first fused feature map; and decoding the first fused feature map with the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed.

Therefore, by using the shared encoder to fuse the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model, the image decomposition model can exploit the structural feature information in the first scene structure feature map, which improves the intrinsic image decomposition effect.

In some optional embodiments, the shared encoder includes at least one coding unit connected in sequence, and each coding unit includes a normal vector adaptor. Fusing the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model to obtain the first fused feature map includes: outputting the image feature map to the first coding unit; for each coding unit, using its normal vector adaptor to fuse the feature map output by the previous coding unit with the first scene structure feature map to obtain a second fused feature map corresponding to the coding unit, where the feature richness of the scene structure feature map corresponding to each coding unit is different; and obtaining the first fused feature map based on the second fused feature map of the last coding unit.

Therefore, by using the normal vector adaptor to fuse the scene structure feature map output by the normal vector estimation model with the image feature map extracted from the image to be decomposed by the image decomposition model, the image decomposition model can subsequently use the scene structure information about the scene in the image to be decomposed, achieving the effect of passing the feature information obtained by the normal vector estimation model to the image decomposition model for use.
In some optional embodiments, before using the normal vector adaptor to fuse the feature map output by the previous coding unit with the scene structure feature map to obtain the second fused feature map corresponding to the coding unit, the method further includes: performing down-sampling on the feature map output by the previous coding unit; and/or, the fusion performed by the normal vector adaptor includes: resizing the scene structure feature map to a preset scale, concatenating the resized scene structure feature map with the feature map output by the previous coding unit, and convolving the result to obtain the second fused feature map corresponding to the coding unit.

Therefore, the down-sampling reduces the size of the feature map output by the previous coding unit. In addition, the normal vector adaptor fuses the scene structure feature map and the feature map output by the previous coding unit by concatenating and convolving them.
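A minimal sketch of such a normal vector adaptor, assuming nearest-neighbour resizing to the preset scale and a 1x1 convolution written as a channel-wise matrix product (all shapes and the specific resizing method are illustrative assumptions):

```python
import numpy as np

def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map."""
    c, h, w = fmap.shape
    ys = np.arange(out_h) * h // out_h
    xs = np.arange(out_w) * w // out_w
    return fmap[:, ys][:, :, xs]

def normal_vector_adaptor(prev_feat, scene_feat, weights):
    """Resize the scene structure feature map to the scale of the
    previous coding unit's output, concatenate along channels, and
    apply a 1x1 convolution (a matrix product over channels)."""
    _, h, w = prev_feat.shape
    scene_resized = resize_nearest(scene_feat, h, w)
    fused = np.concatenate([prev_feat, scene_resized], axis=0)
    return np.einsum('oc,chw->ohw', weights, fused)   # second fused feature map

prev = np.random.randn(8, 16, 16)      # feature map from the previous coding unit
scene = np.random.randn(4, 32, 32)     # first scene structure feature map (other scale)
w = np.random.randn(8, 8 + 4)          # 1x1 conv weights: (out_channels, in_channels)
out = normal_vector_adaptor(prev, scene, w)   # (8, 16, 16)
```

Concatenation followed by a learned convolution lets the network decide, per channel, how much structural information to mix into the decomposition features.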
In some optional embodiments, decoding the first fused feature map with the illumination rate decoder to obtain the scene illumination condition information includes: decoding the first fused feature map and the second fused feature map of at least one normal vector adaptor with the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed.

Therefore, by using the first fused feature map and the second fused feature map output by the normal vector adaptor, the illumination rate decoder can obtain the scene illumination condition information of the image to be decomposed.

In some optional embodiments, the image decomposition model further includes a reflectivity decoder. Decomposing the image to be decomposed using the image decomposition model based on the normal vector information further includes: decoding the first fused feature map with the reflectivity decoder to obtain a reflectivity image of the image to be decomposed.

Therefore, by using the first fused feature map, the reflectivity decoder can obtain the reflectivity image of the image to be decomposed.

In some optional embodiments, decoding the first fused feature map with the reflectivity decoder to obtain the reflectivity image includes: decoding the first fused feature map and the second fused feature map of at least one normal vector adaptor with the reflectivity decoder to obtain the reflectivity image of the image to be decomposed.

Therefore, by using the first fused feature map and the second fused feature map of at least one normal vector adaptor, the reflectivity decoder can obtain the reflectivity image of the image to be decomposed.
In some optional embodiments, the normal vector estimation model includes a normal vector encoder, a normal vector decoder, and a refinement sub-network. Obtaining the normal vector information of the image to be decomposed using the normal vector estimation model includes: encoding the image to be decomposed with the normal vector encoder to obtain a first scene structure feature map; decoding the first scene structure feature map with the normal vector decoder to obtain a decoded feature map; and fusing the first scene structure feature map and the decoded feature map with the refinement sub-network to obtain the normal vector information of the image to be decomposed.

Therefore, by processing the image to be decomposed with the normal vector encoder, the normal vector decoder, and the refinement sub-network of the normal vector estimation model, the normal vector information of the image to be decomposed can be obtained.

In some optional embodiments, encoding the image to be decomposed with the normal vector encoder to obtain the first scene structure feature map includes: performing multi-layer encoding on the image to be decomposed with the normal vector encoder to obtain a first scene structure feature map corresponding to each layer, where the feature richness of the first scene structure feature map differs between layers, and the first scene structure feature map corresponding to the last encoder layer is output to the normal vector decoder. Fusing the first scene structure feature map and the decoded feature map with the refinement sub-network includes: concatenating the first scene structure feature maps corresponding to the layers to obtain a second scene structure feature map, concatenating the second scene structure feature map with the decoded feature map to obtain a third scene structure feature map, and obtaining the normal vector information of the image to be decomposed based on the third scene structure feature map.

Therefore, by performing multi-layer encoding on the image to be decomposed, higher-dimensional feature information can be extracted step by step, so that the structural features of the scene of the image to be decomposed can be obtained more accurately.
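The two concatenation steps of this refinement sub-network can be sketched as follows; the upsampling method, channel counts, and final 1x1 projection head are illustrative assumptions:

```python
import numpy as np

def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    _, h, w = fmap.shape
    ys = np.arange(out_h) * h // out_h
    xs = np.arange(out_w) * w // out_w
    return fmap[:, ys][:, :, xs]

def refine_normals(layer_feats, decoded, head_w):
    """Concatenate the per-layer first scene structure feature maps
    (second scene structure feature map), concatenate the result with
    the decoded feature map (third scene structure feature map), then
    project to a 3-channel normal map and normalise per pixel."""
    _, h, w = decoded.shape
    second = np.concatenate(
        [upsample_nearest(f, h, w) for f in layer_feats], axis=0)
    third = np.concatenate([second, decoded], axis=0)
    normals = np.einsum('oc,chw->ohw', head_w, third)  # 1x1 conv head
    return normals / (np.linalg.norm(normals, axis=0, keepdims=True) + 1e-8)

rng = np.random.default_rng(0)
feats = [rng.normal(size=(4, 8, 8)), rng.normal(size=(8, 4, 4))]  # per-layer maps
decoded = rng.normal(size=(16, 16, 16))                           # decoder output
head = rng.normal(size=(3, 4 + 8 + 16))
normal_map = refine_normals(feats, decoded, head)                 # (3, 16, 16)
```

Fusing shallow (high-resolution) and deep (high-level) encoder features with the decoder output is what lets the refinement stage sharpen the coarse normal estimate.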
In some optional embodiments, the normal vector estimation model and the image decomposition model are trained separately.

In some optional embodiments, before obtaining the normal vector information of the image to be decomposed using the normal vector estimation model, the method further includes: training the normal vector estimation model using a first sample set, where the images in the first sample set are annotated with normal vector information; obtaining sample normal vector information of the images in a second sample set using the trained normal vector estimation model; and training the image decomposition model using the second sample set and the sample normal vector information.

Therefore, by training the normal vector estimation model separately, the normal vector estimation model can be trained using only normal vector sample data, which improves the intrinsic image decomposition effect and reduces the impact of the lack of intrinsic image sample data on the decomposition effect.

In some optional embodiments, the second sample set includes a first sub-sample set and a second sub-sample set, and training the image decomposition model using the second sample set and the sample normal vector information includes: training the image decomposition model using the first sub-sample set and its corresponding sample normal vector information, so as to adjust the parameters of the shared encoder and the illumination rate decoder in the image decomposition model; and training the image decomposition model using the second sub-sample set and its corresponding sample normal vector information, so as to adjust the parameters of the shared encoder and the reflectivity decoder in the image decomposition model.

Therefore, by separately training the shared encoder with the illumination rate decoder and the shared encoder with the reflectivity decoder, the image decomposition model can obtain an illumination rate map and a reflectivity map with better effect when decomposing the image to be decomposed.
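The two-phase schedule can be sketched as follows. The parameter shapes, the toy quadratic loss, and the update rule are assumptions for demonstration only; the point illustrated is which parameter groups are adjusted (and which stay frozen) in each phase:

```python
import numpy as np

rng = np.random.default_rng(0)
params = {
    'shared_encoder':  rng.normal(size=4),
    'shading_decoder': rng.normal(size=4),   # illumination rate decoder
    'albedo_decoder':  rng.normal(size=4),   # reflectivity decoder
}

def train_phase(names, lr=0.1, steps=5):
    """Gradient-descend a toy loss ||p||^2, updating only `names`;
    all other parameter groups stay frozen."""
    for _ in range(steps):
        for name in names:
            params[name] -= lr * 2.0 * params[name]

initial = {k: v.copy() for k, v in params.items()}

# Phase 1: first sub-sample set adjusts shared encoder + illumination rate decoder.
train_phase(['shared_encoder', 'shading_decoder'])
albedo_after_phase1 = params['albedo_decoder'].copy()

# Phase 2: second sub-sample set adjusts shared encoder + reflectivity decoder.
train_phase(['shared_encoder', 'albedo_decoder'])
```

Note that the shared encoder is updated in both phases, while each decoder only sees gradients from its own sub-sample set.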
A second aspect of the embodiments of the present disclosure provides an image decomposition apparatus. The apparatus includes an acquisition module, a normal vector estimation module, and a decomposition module. The acquisition module is configured to acquire an image to be decomposed; the normal vector estimation module is configured to obtain normal vector information of the image to be decomposed using a normal vector estimation model; and the decomposition module is configured to decompose the image to be decomposed using an image decomposition model based on the normal vector information, to obtain an intrinsic image of the image to be decomposed.

A third aspect of the embodiments of the present disclosure provides an electronic device, including a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory to implement the image decomposition method of the first aspect.

A fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium storing program instructions which, when executed by a processor, implement the image decomposition method of the first aspect.

A fifth aspect of the embodiments of the present disclosure provides a computer program, including computer-readable code which, when run in an electronic device, causes a processor in the electronic device to implement the image decomposition method of the first aspect.

With the above solution, by obtaining the normal vector information of the image to be decomposed, the image decomposition model can use the normal vector information to better understand the environment of the scene in the image, so that the intrinsic image it produces matches the scene of the image to be decomposed more closely, which improves the decomposition effect. In addition, the normal vector information is obtained with a normal vector estimation model that is independent of the image decomposition model; using such a dedicated model yields accurate normal vector information, which further improves the match between the subsequently decomposed intrinsic image and the scene of the image to be decomposed.

It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present application.
Description of Drawings

The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the technical solutions of the present application.

FIG. 1 is a first schematic flowchart of an image decomposition method according to an embodiment of the present disclosure;

FIG. 2 is a second schematic flowchart of an image decomposition method according to an embodiment of the present disclosure;

FIG. 3 is a schematic flowchart of obtaining normal vector information of an image to be decomposed using a normal vector estimation model in an image decomposition method according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of the framework of a normal vector estimation model in an image decomposition method according to an embodiment of the present disclosure;

FIG. 5 is a schematic flowchart of obtaining a first fused feature map in an image decomposition method according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of the framework of an image decomposition model in an image decomposition method according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of the framework of an image decomposition apparatus according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of the framework of an electronic device according to an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of the framework of a computer-readable storage medium according to an embodiment of the present disclosure.
Detailed Description

The solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.

In the following description, specific details such as particular system structures, interfaces, and techniques are set forth for the purpose of illustration rather than limitation, in order to provide a thorough understanding of the present application.

The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects. Furthermore, "multiple" herein means two or more. The term "at least one" herein means any one of multiple items, or any combination of at least two of multiple items; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.

Intrinsic image decomposition aims to estimate the shading of the scene and the reflectance of materials from a single input image, that is, to obtain an illumination rate (shading) image and a reflectivity image. In the embodiments of the present disclosure, the device for implementing the image decomposition method may be a computer, a server, or other device. In some possible implementations, the image decomposition method may be implemented by a processor invoking computer-readable instructions stored in a memory.
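The image formation model commonly assumed in intrinsic decomposition (a standard formulation, stated here for illustration rather than quoted from the patent) treats each pixel of the input image as the product of reflectance and shading, I = R * S:

```python
import numpy as np

rng = np.random.default_rng(1)
reflectance = rng.random((4, 4, 3))            # per-pixel RGB albedo
shading = 0.1 + 0.9 * rng.random((4, 4))       # per-pixel grayscale shading, > 0
image = reflectance * shading[..., None]       # I = R * S

# Given the image and one factor, the other is recovered exactly.
recovered_reflectance = image / shading[..., None]
```

Because infinitely many (R, S) pairs multiply to the same image, the decomposition is ill-posed, which is why learned priors such as the normal vector information described here are needed.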
FIG. 1 is a first schematic flowchart of an image decomposition method according to an embodiment of the present disclosure. As shown in FIG. 1, the method may include the following steps:
Step S11: acquire an image to be decomposed.
The image to be decomposed serves as the original input image from which the corresponding intrinsic image is decomposed. The image to be decomposed may be a color image, a depth image, or the like.
Step S12: obtain normal vector information of the image to be decomposed by using a normal vector estimation model.
The normal vector estimation model is a neural network built based on deep learning, and is used to extract feature information from the image to be decomposed so as to obtain its normal vector information. By extracting feature information from the image to be decomposed, the normal vector estimation model can produce several feature maps. The normal vector information is, for example, the normal vector of each pixel in the image to be decomposed; from it, environmental information about the input image can be obtained, such as the structure information of the scene in the image to be decomposed.
In an optional embodiment, the normal vector estimation model is a fully convolutional neural network, which may consist of a coarse-to-fine two-level network structure. The two-level network can fuse feature maps of multiple scales (different numbers of features, different image resolutions), so as to obtain intrinsic images with higher resolution, richer details, and more accurate object boundaries.
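The multi-scale fusion described above can be illustrated with a minimal sketch. The shapes are illustrative, and NumPy nearest-neighbour upsampling stands in for the learned layers of the actual network:

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_multiscale(fine, coarse):
    """Fuse feature maps of two scales by upsampling the coarse map to the
    fine resolution and concatenating along the channel axis."""
    factor = fine.shape[1] // coarse.shape[1]
    coarse_up = upsample_nearest(coarse, factor)
    return np.concatenate([fine, coarse_up], axis=0)

# Example shapes: a 32-channel map at 1/2 resolution and a 64-channel map at 1/4.
fine = np.random.rand(32, 64, 64)
coarse = np.random.rand(64, 32, 32)
fused = fuse_multiscale(fine, coarse)
print(fused.shape)  # (96, 64, 64): all channels now at the finer resolution
```

The fused map keeps the fine map's spatial detail while carrying the coarse map's higher-level features, which is what allows sharper object boundaries in the output.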
Step S13: decompose the image to be decomposed by using an image decomposition model based on the normal vector information, to obtain the intrinsic image of the image to be decomposed.
After the normal vector information of the image to be decomposed is obtained, the image decomposition model can use it to decompose the input image. Exemplarily, the image decomposition model may decompose the image based on the per-pixel normal vectors and the scene structure information contained in the normal vector information, to obtain the intrinsic image, that is, the illumination rate image and the reflectance image.
In an optional embodiment, the image decomposition model is a fully convolutional neural network.
Therefore, by obtaining the normal vector information of the image to be decomposed, the image decomposition model can use this information to better understand the environment of the scene in the image, so that the intrinsic image it produces matches the scene of the image to be decomposed more closely, improving the decomposition quality. In addition, the normal vector information is obtained by a normal vector estimation model that is independent of the image decomposition model; using such a dedicated model yields accurate normal vector information, which further improves how well the subsequently decomposed intrinsic image matches the scene of the image to be decomposed.
FIG. 2 is a second schematic flowchart of an image decomposition method according to an embodiment of the present disclosure. As shown in FIG. 2, the method may include the following steps:
Step S21: acquire the image to be decomposed.
Refer to step S11 above; details are not repeated here.
Step S22: obtain normal vector information of the image to be decomposed by using the normal vector estimation model.
Refer to step S12 above; details are not repeated here.
In an optional embodiment, the normal vector information is a normal vector map containing the normal vectors of the different pixels of the image to be decomposed, that is, each pixel in the image to be decomposed has a corresponding normal vector.
In an optional embodiment, the normal vector estimation model includes a normal vector encoder, a normal vector decoder, and a refinement sub-network. The normal vector encoder performs feature extraction on the image to be decomposed, the normal vector decoder decodes the features and outputs a feature map, and the refinement sub-network refines the decoder's output.
FIG. 3 is a schematic flowchart of obtaining the normal vector information of the image to be decomposed by using the normal vector estimation model in the image decomposition method according to an embodiment of the present disclosure. As shown in FIG. 3, this may include the following steps S221 to S223.
Step S221: encode the image to be decomposed by using the normal vector encoder to obtain a first scene structure feature map.
The normal vector encoder of the normal vector estimation model can be used to encode the image to be decomposed and extract feature information from it. The feature information obtained by this encoding is, for example, structural feature information of the scene in the image to be decomposed, such as plane information and object boundary information. Finally, the normal vector encoder outputs the first scene structure feature map, that is, a feature map describing the structure of the scene in the image to be decomposed.
In the case where the normal vector encoder has a multi-layer structure, it can perform multi-layer encoding (that is, feature extraction) on the image to be decomposed, and the feature map obtained by each encoder layer is a first scene structure feature map. For example, when the encoder includes four encoding blocks, the first encoding block encodes the image to be decomposed and outputs a first scene structure feature map; the second encoding block takes that map as input, encodes it again, and outputs its own first scene structure feature map; and so on. In addition, the first scene structure feature maps output by the different encoding blocks may be set to differ in feature richness, where feature richness may include the resolution of the feature map, the dimension of the feature information, and so on. The first scene structure feature map of the last encoding block is output to the normal vector decoder. By encoding the image to be decomposed in multiple layers, higher-dimensional feature information can be extracted step by step, so that the structural features of the scene are captured more accurately.
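A shape-level sketch of such multi-layer encoding follows. The channel counts are hypothetical, and 2×2 average pooling with a random per-pixel linear map stands in for the learned encoding blocks; the point is only that each block halves the resolution while increasing the feature dimension:

```python
import numpy as np

def avg_pool2(feat):
    """2x2 average pooling on a (C, H, W) feature map."""
    c, h, w = feat.shape
    return feat.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def encode_block(feat, out_channels, rng):
    """Stand-in for one encoding block: pool to half resolution, then apply
    a random 1x1 'convolution' (per-pixel linear map) to change channels."""
    pooled = avg_pool2(feat)
    weight = rng.standard_normal((out_channels, pooled.shape[0]))
    return np.einsum('oc,chw->ohw', weight, pooled)

rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 128, 128))   # output of an initial conv block
shapes = []
for out_c in (128, 256, 512, 1024):          # four encoding blocks
    feat = encode_block(feat, out_c, rng)
    shapes.append(feat.shape)
print(shapes)
```

Each entry of `shapes` is one first scene structure feature map's shape, showing how the maps differ in feature richness (resolution and dimension) from block to block.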
Step S222: decode the first scene structure feature map by using the normal vector decoder to obtain a decoded feature map.
After the normal vector encoder has encoded the image to be decomposed and output the first scene structure feature map, the normal vector decoder can decode this map to obtain a decoded feature map. Exemplarily, when decoding the first scene structure feature map, the normal vector decoder may operate on the feature information extracted by the normal vector encoder and reconstruct a decoded feature map of a preset dimension and a preset resolution. For example, the feature information in the decoded feature map may be 64-dimensional, with a resolution of 1/2 that of the image to be decomposed.
In an optional embodiment, when the normal vector decoder has a multi-layer structure, it likewise performs multi-layer decoding on the first scene structure feature map: the first decoder layer decodes the first scene structure feature map and outputs a corresponding pre-decoded feature map; the second decoder layer decodes the pre-decoded feature map output by the first layer and outputs its own pre-decoded feature map; and so on. The pre-decoded feature map output by the last layer is the decoded feature map.
Step S223: fuse the first scene structure feature map and the decoded feature map by using the refinement sub-network to obtain the normal vector information of the image to be decomposed.
After decoding by the normal vector decoder, in order to further refine the feature information output by the normal vector encoder and obtain more accurate scene structure information of the image to be decomposed, the refinement sub-network can fuse the first scene structure feature map with the decoded feature map to obtain the normal vector information of the image to be decomposed. Exemplarily, the feature information of the first scene structure feature map and that of the decoded feature map may be fused. For example, if both feature maps are 64-dimensional, the fused normal vector information may be 128-dimensional.
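As a sketch of this fusion step, two 64-dimensional maps are concatenated into a 128-dimensional one, from which a unit-length normal vector can be read out per pixel. The 3-channel projection head and its random weights are hypothetical; the real refinement sub-network uses learned convolutions:

```python
import numpy as np

rng = np.random.default_rng(1)
scene = rng.standard_normal((64, 32, 32))    # first scene structure feature map
decoded = rng.standard_normal((64, 32, 32))  # decoded feature map

fused = np.concatenate([scene, decoded], axis=0)  # 128-dimensional fused features

# Hypothetical per-pixel linear head projecting 128 features to a 3-vector,
# followed by L2 normalization so each pixel carries a unit normal vector.
head = rng.standard_normal((3, 128))
normals = np.einsum('oc,chw->ohw', head, fused)
normals /= np.linalg.norm(normals, axis=0, keepdims=True)
print(fused.shape, normals.shape)  # (128, 32, 32) (3, 32, 32)
```

The `normals` array plays the role of the normal vector map: one 3-vector per pixel.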
In an optional embodiment, the normal vector information is a normal vector map containing the normal vectors of the different pixels of the image to be decomposed, that is, each pixel in the image to be decomposed has a corresponding normal vector.
In an optional embodiment, when the normal vector encoder has a multi-layer structure, the refinement sub-network may concatenate the first scene structure feature maps corresponding to the layers to obtain a second scene structure feature map, and concatenate the second scene structure feature map with the decoded feature map to obtain a third scene structure feature map.
In some optional embodiments, only the first scene structure feature maps output by some of the encoding layers of the normal vector encoder may be used for the concatenation.
In a possible implementation, the refinement sub-network may first process the first scene structure feature map output by each encoder layer so that all of the resulting second scene structure feature maps have the same feature dimension and resolution. After the third scene structure feature map is obtained, the refinement sub-network may perform further decoding based on its feature information to obtain the normal vector information of the image to be decomposed, for example a normal vector map.
FIG. 4 is a schematic diagram of the framework of the normal vector estimation model in the image decomposition method according to an embodiment of the present disclosure. As shown in FIG. 4, in an optional embodiment, the normal vector estimation model 400 includes a normal vector encoder 401, a normal vector decoder 402, and a refinement sub-network 403.
The normal vector encoder 401 includes at least one initial convolution block 4011 (denoted conv1, which may include three convolutional layers and one max-pooling layer) and four encoding blocks 4012 containing Squeeze-and-Excitation (SE) blocks. The initial convolution block 4011 performs preliminary encoding on the image to be decomposed and outputs a feature map to the encoding blocks 4012. While progressively extracting higher-dimensional features, the encoding blocks 4012 compress the resolution of the feature maps to 1/4, 1/8, 1/16, and 1/32 of the original input image. Each encoding block 4012 outputs a first scene structure feature map, and the first scene structure feature map output by the last encoding block 4012 is passed to the normal vector decoder 402.
The normal vector decoder 402 includes one convolution block 4021 (denoted conv2) and four up-projection blocks 4022 (denoted up-projection block 5 to up-projection block 8). The four up-projection blocks 4022 progressively decode the features and reconstruct a decoded feature map with a dimension of 64 and a resolution of 1/2 that of the image to be decomposed.
The refinement sub-network 403 includes four up-projection blocks 4031 (denoted up-projection block 1 to up-projection block 4) and four convolutional layers 4032 (denoted conv3 to conv6). Skip-connections and the up-projection blocks 4031 are used to concatenate the first scene structure feature maps extracted by the encoding blocks 4012, yielding the second scene structure feature map. The second scene structure feature map is then concatenated with the decoded feature map to obtain the third scene structure feature map. Finally, the four convolutional layers 4032 decode layer by layer to produce the normal vector information of the image to be decomposed, that is, the normal vector map.
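The data flow through model 400 can be sketched at the level of shapes. Channel counts other than the decoder's 64 are hypothetical, and nearest-neighbour upsampling stands in for the learned up-projection blocks:

```python
import numpy as np

def up(feat, factor):
    """Nearest-neighbour stand-in for an up-projection block."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

H = W = 64  # resolution of the image to be decomposed
rng = np.random.default_rng(2)

# First scene structure feature maps from the four encoding blocks, at
# 1/4, 1/8, 1/16 and 1/32 of the input resolution (hypothetical channels).
enc_maps = [rng.standard_normal((c, H // s, W // s))
            for c, s in zip((64, 128, 256, 512), (4, 8, 16, 32))]

# Decoded feature map from the normal vector decoder: 64 channels at 1/2.
decoded = rng.standard_normal((64, H // 2, W // 2))

# Skip-connections + up-projection: bring every encoder map to 1/2 resolution
# and concatenate them into the second scene structure feature map.
second = np.concatenate([up(m, (H // 2) // m.shape[1]) for m in enc_maps], axis=0)

# Concatenate with the decoded map to form the third scene structure feature map.
third = np.concatenate([second, decoded], axis=0)
print(second.shape, third.shape)  # (960, 32, 32) (1024, 32, 32)
```

The conv3 to conv6 layers would then decode `third` layer by layer into the final normal vector map.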
After the normal vector information of the image to be decomposed is obtained, the image to be decomposed can be decomposed by using this information to obtain its intrinsic image. When an illumination rate image is required, the above step of "decomposing the image to be decomposed by using the image decomposition model based on the normal vector information to obtain the intrinsic image of the image to be decomposed" includes the following steps.
Step S23: process the image to be decomposed by using the image decomposition model to obtain scene lighting condition information of the image to be decomposed.
The image decomposition model may be, for example, a fully convolutional neural network. It can perform feature extraction on the image to be decomposed to obtain its scene lighting condition information, which can be understood as the lighting situation of the scene in the image to be decomposed. Exemplarily, the scene lighting condition information is a normal vector adaptation map containing a normal vector adaptation vector for each pixel of the image to be decomposed; the normal vector adaptation map can be used to encode the scene lighting conditions.
In an optional embodiment, the image decomposition model includes a shared encoder, an illumination rate decoder, and a reflectance decoder. Processing the image to be decomposed by using the image decomposition model to obtain its scene lighting condition information may include the following steps:
Step S231: perform feature extraction on the image to be decomposed by using the shared encoder to obtain an image feature map, and fuse the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model to obtain a first fused feature map.
The feature information extracted by the shared encoder is used for obtaining both the illumination rate image and the reflectance image. The first fused feature map obtained after fusion may include the structural feature information of the scene in the image to be decomposed as well as other feature information.
In a possible implementation, the shared encoder includes at least one coding unit connected in sequence, and each coding unit includes a Normal Feature Adapter (NFA).
FIG. 5 is a schematic flowchart of obtaining the first fused feature map in the image decomposition method according to an embodiment of the present disclosure. As shown in FIG. 5, fusing the image feature map with the scene structure feature map output by the normal vector encoder of the normal vector estimation model to obtain the first fused feature map may include the following steps S2311 to S2313.
Step S2311: output the image feature map to the first coding unit.
First, other encoder layers of the image decomposition model may perform feature extraction on the image to be decomposed to obtain an image feature map. The image feature map is then output to the first coding unit, which processes it further.
Step S2312: each coding unit fuses the feature map output by the previous coding unit with a scene structure feature map by using its normal feature adapter to obtain a second fused feature map corresponding to that coding unit, where the scene structure feature maps corresponding to the respective coding units differ in feature richness.
After a coding unit obtains the image feature map, its normal feature adapter can fuse the feature map output by the previous coding unit with the scene structure feature map to obtain the second fused feature map corresponding to that coding unit. The scene structure feature maps corresponding to the respective coding units differ in feature richness, which can be understood as differing in the resolution of the scene structure feature map, in the dimension of its feature information, or in both.
For the first coding unit, the input is the image feature map obtained after feature extraction by other convolution blocks. For the second coding unit, the input image feature map is the second fused feature map output by the first coding unit.
When the normal vector encoder of the normal vector estimation model has only one layer, it outputs only one first scene structure feature map, and every coding unit fuses this single map with the feature map output by the previous coding unit. When the normal vector encoder has multiple layers, the first scene structure feature maps output by the respective layers can be fused with the feature maps output by the corresponding coding units. For example, the first scene structure feature map obtained by the first encoder layer is output to the first coding unit, and the one obtained by the second encoder layer is output to the second coding unit, so that the second coding unit fuses the feature map output by the previous coding unit with the first scene structure feature map obtained by the second encoder layer.
In a possible implementation, the normal feature adapter fuses the feature map output by the previous coding unit with the scene structure feature map as follows: the normal feature adapter adjusts the scene structure feature map to a preset scale, for example by adjusting its resolution and the dimension of its feature information; it then concatenates the adjusted scene structure feature map with the feature map output by the previous coding unit and convolves the result to obtain the second fused feature map corresponding to the coding unit. For example, the normal feature adapter of the second coding unit may concatenate and convolve the second fused feature map output by the first coding unit with the scene structure feature map input to it, thereby obtaining the second fused feature map corresponding to the second coding unit.
Thus, by concatenating and convolving the scene structure feature map with the feature map output by the previous coding unit, the normal feature adapter realizes the fusion of the two.
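A minimal sketch of the adapter's fusion follows. Channel counts are hypothetical, average pooling stands in for the scale adjustment, and a random 1×1 convolution (per-pixel linear map) stands in for the learned convolution:

```python
import numpy as np

def avg_pool(feat, factor):
    """Average-pool a (C, H, W) feature map by an integer factor."""
    c, h, w = feat.shape
    return feat.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def normal_feature_adapter(prev_feat, scene_feat, out_channels, rng):
    """Adjust the scene structure feature map to the working scale,
    concatenate it with the previous coding unit's output, and apply
    a 1x1 convolution (per-pixel linear map) to fuse the two."""
    factor = scene_feat.shape[1] // prev_feat.shape[1]
    scene_adj = avg_pool(scene_feat, factor)          # match resolutions
    cat = np.concatenate([prev_feat, scene_adj], axis=0)
    weight = rng.standard_normal((out_channels, cat.shape[0]))
    return np.einsum('oc,chw->ohw', weight, cat)

rng = np.random.default_rng(3)
prev_feat = rng.standard_normal((128, 32, 32))   # output of the previous coding unit
scene_feat = rng.standard_normal((64, 64, 64))   # first scene structure feature map
second_fused = normal_feature_adapter(prev_feat, scene_feat, 128, rng)
print(second_fused.shape)  # (128, 32, 32)
```

The output is one coding unit's second fused feature map, at the same scale as the incoming feature map.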
In an optional embodiment, for each coding unit, before the normal feature adapter fuses the feature map output by the previous coding unit with the scene structure feature map to obtain the second fused feature map, the feature map output by the previous coding unit may be down-sampled. For example, the second coding unit down-samples the second fused feature map output by the first coding unit. The down-sampling reduces the size of the feature map so that the resulting second fused feature map meets the requirements.
Step S2313: obtain the first fused feature map based on the second fused feature map of the last coding unit.
In an optional embodiment, when the last layer of the shared encoder of the image decomposition model is not the last coding unit, that is, when the shared encoder contains several further encoding blocks after the last coding unit, the second fused feature map output by the last coding unit continues to be encoded so that the fused feature information is processed further; the map output by the last layer of the shared encoder is then the first fused feature map. For example, the second fused feature map output by the last coding unit may be down-sampled to reduce it further and then encoded again by an encoding block to extract feature information; the feature map output at this point is the first fused feature map.
In an optional embodiment, the second fused feature map may also be used directly as the first fused feature map.
Therefore, by using the normal feature adapters to fuse the scene structure feature maps output by the normal vector estimation model with the image feature maps obtained by the image decomposition model's own feature extraction, the image decomposition model can subsequently exploit the scene structure information about the scene in the image to be decomposed contained in those maps. This passes the feature information obtained by the normal vector estimation model on to the image decomposition model, improving the quality of the intrinsic image decomposition.
After the first fused feature map is obtained, it can be used to further decompose the image to be decomposed into its intrinsic image.
Step S232: decode the first fused feature map by using the illumination rate decoder to obtain the scene lighting condition information of the image to be decomposed.
Because the first fused feature map contains the structural feature information of the scene as well as other feature information of the image to be decomposed, the illumination rate decoder can decode it to obtain the scene lighting condition information of the image to be decomposed, for example a normal vector adaptation map containing a normal vector adaptation vector for each pixel of the image to be decomposed.
In an optional embodiment, the normal vector adaptation vector is defined as follows, with x, y, and z denoting its three components:
[equation given in the original as image PCTCN2021114023-appb-000001]
where [image PCTCN2021114023-appb-000002] is the light vector distribution function, [image PCTCN2021114023-appb-000003] are spherical coordinates, and [images PCTCN2021114023-appb-000004 and PCTCN2021114023-appb-000005] give the remaining terms of the definition.
在一个可选实施例中,所述利用所述光照率解码器对所述第一融合特征图进行解码,得到所述待分解图像的场景光照条件信息,包括:利用光照率解码器对第一融合特征图和至少一个法向量自适应器的第二融合特征图进行解码,得到待分解图像的场景光照条件信息。光照率解码器可以同时获取图像分解模型的共享编码器最后一层输出的第一融合特征图,以及至少一个法向量自适应器的第二融合特征图,并对这两个特征图进行解码,以得到待分解图像的场景光照条件信息。当共享编码器的最后一层是编码单元时,则可以是获取最后一个编码单元输出的第一融合特征图,以及其他编码单元的法向量自适应器输出的第二融合特征图进行解码。In an optional embodiment, the decoding the first fusion feature map by the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed includes: using the illumination rate decoder to decode the first fusion feature map by using the illumination rate decoder The fusion feature map and the second fusion feature map of at least one normal vector adaptor are decoded to obtain scene illumination condition information of the image to be decomposed. The illumination rate decoder can simultaneously obtain the first fused feature map output by the last layer of the shared encoder of the image decomposition model, and the second fused feature map of at least one normal vector adaptor, and decode the two feature maps, In order to obtain the scene lighting condition information of the image to be decomposed. When the last layer of the shared encoder is a coding unit, the first fused feature map output by the last coding unit and the second fused feature map output by the normal vector adaptors of other coding units can be obtained for decoding.
在一个可能的实施方式中,共享编码器的编码单元的数量有多个,光照率解码器可以同时获取多个编码单元输出的第二融合特征图来进行解码。例如,光照率解码器获取了3个连接的编码单元输出的第二融合特征图,则可以在光照率解码器中,设置3个连接的卷积层(例如是up-projection块)分别获取3个编码单元输出的第二融合特征图,来进行解码。例如,光照率解码器的第一个卷积层可以获取共享编码器输出的第一融合特征图和第一个法向量自适应器输出的第二融合特征图来进行解码,并输出特征图。光照率解码器的第二个卷积层可以利用上一个卷积层输出的特征图以及第二个法向量自适应器输出的第二融合特征图来进行解码。In a possible implementation manner, there are multiple coding units of the shared encoder, and the illumination rate decoder can simultaneously acquire the second fused feature maps output by the multiple coding units for decoding. For example, if the illumination rate decoder obtains the second fusion feature map output by 3 connected coding units, then in the illumination rate decoder, you can set 3 connected convolutional layers (such as up-projection blocks) to obtain 3 The second fusion feature map output by each coding unit is used for decoding. For example, the first convolutional layer of the illumination rate decoder can obtain the first fused feature map output by the shared encoder and the second fused feature map output by the first normal vector adaptor for decoding, and output the feature map. The second convolutional layer of the illumination rate decoder can use the feature map output from the previous convolutional layer and the second fused feature map output from the second normal vector adaptor for decoding.
In a possible implementation, after the convolutional layers of the illumination rate decoder have decoded the first fused feature map and the second fused feature maps, several further convolutional layers may be used for decoding, so as to refine the illumination rate map finally output by the illumination rate decoder.
Therefore, by obtaining the scene illumination condition information of the image to be decomposed, for example a normal vector adaptation map containing a normal vector adaptation vector for each pixel, illumination conditions that vary across space can be modeled, which improves the intrinsic image decomposition of the image decomposition model in scenes with complex lighting.
Step S24: obtain the illumination rate image of the image to be decomposed based on the scene illumination condition information and the normal vector information.
After the scene illumination condition information of the scene in the image to be decomposed is obtained, the image to be decomposed can be decomposed based on the scene illumination condition information and the normal vector information output by the normal vector estimation model, so as to obtain the illumination rate image of the image to be decomposed. For example, the normal vector adaptation map and the normal vector map can be used to obtain the illumination rate image of the image to be decomposed.
In an optional embodiment, a dot product of the normal vector adaptation map and the normal vector map may be computed to obtain the illumination rate image of the image to be decomposed.
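The per-pixel dot product above can be sketched in a few lines of NumPy. The shapes and random data are illustrative assumptions: A stands for the normal vector adaptation map and N for the unit normal map, each storing a 3-vector per pixel.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 4, 6

# Hypothetical per-pixel maps: A is the normal vector adaptation map, N the
# normal vector map; both have shape (3, H, W), one 3-vector per pixel.
A = rng.standard_normal((3, H, W))
N = rng.standard_normal((3, H, W))
N = N / np.linalg.norm(N, axis=0, keepdims=True)  # normalise to unit normals

# Per-pixel dot product A(p) . N(p) collapses the channel axis, producing a
# single-channel H x W illumination rate (shading) map.
S = np.sum(A * N, axis=0)

print(S.shape)  # (4, 6)
```

Because the adaptation vector is a full 3-vector per pixel rather than a single global light direction, the dot product can express illumination that changes from pixel to pixel.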
Therefore, the normal vector adaptation map makes full use of the plane information and object boundary information in the scene structure feature information provided by the normal vector estimation model, so that the illumination rate image produced by the image decomposition model suffers less from texture residue in planar regions, while objects retain clear, sharp contours and the scene of the reflectivity image matches the scene of the image to be decomposed well.
In some optional embodiments of the present disclosure, the image decomposition model may further include a reflectivity decoder, since the feature information extracted by the shared encoder can also be used to obtain the reflectivity image. Therefore, after the shared encoder performs feature extraction on the image to be decomposed, that is, after step S231, the following step 1 may be performed:
Step 1: decode the first fused feature map with the reflectivity decoder to obtain the reflectivity image of the image to be decomposed.
As can be seen from the above embodiments, the last layer of the shared encoder outputs the first fused feature map, which contains the scene structure feature information of the scene in the image to be decomposed. Therefore, the reflectivity decoder can decode the first fused feature map to obtain the reflectivity image of the image to be decomposed.
In an optional embodiment, decoding the first fused feature map with the reflectivity decoder to obtain the reflectivity image of the image to be decomposed includes: decoding, with the reflectivity decoder, the first fused feature map and the second fused feature map of at least one normal vector adaptor to obtain the reflectivity image of the image to be decomposed. The reflectivity decoder can simultaneously obtain the first fused feature map output by the last layer of the shared encoder of the image decomposition model and the second fused feature map of at least one normal vector adaptor, and decode both feature maps to obtain the reflectivity image of the image to be decomposed. When the last layer of the shared encoder is a coding unit, the first fused feature map output by the last coding unit and the second fused feature maps output by the normal vector adaptors of the other coding units may be obtained for decoding.
In a possible implementation, the shared encoder has multiple coding units, and the reflectivity decoder can simultaneously obtain the second fused feature maps output by these coding units for decoding. For example, if the reflectivity decoder obtains the second fused feature maps output by three sequentially connected coding units, three sequentially connected convolutional layers (for example, up-projection blocks) may be provided in the reflectivity decoder to receive the three second fused feature maps respectively. For example, the first convolutional layer of the reflectivity decoder may take the first fused feature map output by the shared encoder and the second fused feature map output by the first normal vector adaptor, decode them, and output a feature map. The second convolutional layer may then decode the feature map output by the previous convolutional layer together with the second fused feature map output by the second normal vector adaptor.
In a possible implementation, after the convolutional layers of the reflectivity decoder have decoded the first fused feature map and the second fused feature maps, several further convolutional layers may be used for decoding, so as to refine the reflectivity map finally output by the reflectivity decoder.
Therefore, by decomposing the image to be decomposed with the first fused feature map, which contains the scene structure feature information of the scene in the image to be decomposed, the scene structure feature information is exploited so that a more consistent reflectivity is assigned to each object of the scene in the image to be decomposed, improving the intrinsic image decomposition.
FIG. 6 is a schematic diagram of the framework of the image decomposition model in the image decomposition method of an embodiment of the present disclosure. As shown in FIG. 6, in an optional embodiment, the image decomposition model 60 includes a shared encoder 61, an illumination rate decoder 62 and a reflectivity decoder 63. The image decomposition model 60 is, for example, a fully convolutional neural network.
The shared encoder 61 includes a convolution block 611 (conv1 in the figure) and several coding units 612. Each coding unit 612 includes a normal vector adaptor 6121 (for example, NFA1, NFA2, NFA3). The normal vector adaptor 6121 may be linked to part of the encoder of the normal vector estimation model. The illumination rate decoder 62 includes several convolution blocks 621, some of which are up-projection blocks (up-projection blocks 1 to 4 in the figure). The reflectivity decoder 63 includes several convolution blocks 631, some of which are up-projection blocks (up-projection blocks 5 to 8 in the figure). The normal vector adaptors 6121 are skip-connected to some of the convolution blocks 621 of the illumination rate decoder 62 and to some of the convolution blocks 631 of the reflectivity decoder 63, respectively.
The image decomposition model 60 can process the image to be decomposed to obtain its scene illumination condition information. The image decomposition model 60 can further obtain the illumination rate image of the image to be decomposed based on the scene illumination condition information and the normal vector information output by the normal vector estimation model. In addition, the image decomposition model 60 can also output the reflectivity image.
Specifically, the shared encoder 61 can perform feature extraction on the image to be decomposed to obtain an image feature map, fuse the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model, and output the first fused feature map. Exemplarily, the convolution block before the coding units 612 performs feature extraction on the image to be decomposed to obtain the image feature map mentioned in the above embodiments. Each coding unit 612 can use its normal vector adaptor 6121 to fuse the feature map output by the previous coding unit with the scene structure feature map output by the encoder of the normal vector estimation model, obtaining the second fused feature map corresponding to that coding unit. In the figure, Y denotes the scene structure feature map output by the encoder of the normal vector estimation model. The convolution block after the coding units 612 can further encode the second fused feature map output by the last coding unit and finally output the first fused feature map. The coding unit 612 may further include a down-sampling convolution block 6122 (denoted down-sampling block; down-sampling blocks 1 to 4 in the figure) for down-sampling the feature map output by the previous coding unit.
The illumination rate decoder 62 may include five convolution blocks. The last convolution block 621 (conv4 in the figure) outputs the scene illumination condition information of the image to be decomposed, for example the normal vector adaptation map. In the figure, A denotes the normal vector adaptation map and N denotes the normal vector map output by the refinement sub-network. The image decomposition model 60 computes the dot product of the normal vector adaptation map A and the normal vector map N to obtain the illumination rate map.
The reflectivity decoder 63 may include five convolution blocks 631, which decode the first fused feature map output by the shared encoder layer by layer; the last convolution block 631 (conv6 in the figure) directly outputs the reflectivity image.
Embodiments of the present disclosure further provide training methods for the normal vector estimation model and the image decomposition model mentioned in the above image decomposition method embodiments.
Before obtaining the normal vector information of the image to be decomposed with the normal vector estimation model, the normal vector estimation model and the image decomposition model may first be trained.
Because the normal vector estimation model contains an independent normal vector encoder, normal vector decoder and refinement sub-network, the normal vector estimation model can be trained separately. Likewise, the image decomposition model can be trained separately.
Therefore, in an optional embodiment, the normal vector estimation model and the image decomposition model are trained separately. That is, when training the normal vector estimation model and the image decomposition model, each can be trained on its own.
By providing the normal vector estimation model with an independent normal vector encoder, normal vector decoder and refinement sub-network, the normal vector estimation model can be trained on its own, using only normal vector sample data. This improves the intrinsic image decomposition and reduces the impact that the scarcity of intrinsic image sample data would otherwise have on the decomposition.
In an optional embodiment, when training the normal vector estimation model, a first sample set may be used, where the images in the first sample set are annotated with normal vector information, for example a corresponding normal vector for each pixel in the image. The first sample set includes, for example, the NYUv2 dataset and the Dense Indoor and Outdoor DEpth (DIODE) dataset.
After the normal vector estimation model is trained, the trained normal vector estimation model can be used to obtain the sample normal vector information of the images in a second sample set, and the image decomposition model can then be trained with the second sample set and the sample normal vector information. The images of the second sample set may be annotated with ground-truth illumination rate maps and ground-truth reflectivity maps. The second sample set is, for example, the CGI dataset.
In an optional embodiment, the second sample set includes a first sub-sample set and a second sub-sample set. The images of the first sub-sample set may be annotated with ground-truth illumination rate maps, and the images of the second sub-sample set with ground-truth reflectivity maps.
When training the image decomposition model with the second sample set and the sample normal vector information, steps 1 and 2 below may be performed.
Step 1: train the image decomposition model with the first sub-sample set and its corresponding sample normal vector information, so as to adjust the parameters of the shared encoder and the illumination rate decoder in the image decomposition model.
The normal vector information corresponding to the first sub-sample set is obtained with the trained normal vector estimation model. By using the first sub-sample set annotated with ground-truth illumination rate maps, the shared encoder and the illumination rate decoder in the image decomposition model can be trained.
Step 2: train the image decomposition model with the second sub-sample set and its corresponding sample normal vector information, so as to adjust the parameters of the shared encoder and the reflectivity decoder in the image decomposition model.
The normal vector information corresponding to the second sub-sample set is obtained with the trained normal vector estimation model. After the shared encoder and the illumination rate decoder have been trained with the first sub-sample set annotated with ground-truth illumination rate maps, the shared encoder and the reflectivity decoder in the image decomposition model can be further trained on that basis, specifically by using the second sub-sample set annotated with ground-truth reflectivity maps.
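The two-stage schedule of steps 1 and 2 can be made concrete with a small sketch. Everything here is hypothetical scaffolding — the parameter-group names and the trivial "update" stand in for a real optimizer step — but it shows which parameter groups each stage touches: the shared encoder is updated in both stages, each decoder only in its own stage.

```python
def trainable_params(stage):
    # The shared encoder is updated in both stages; each decoder only in
    # the stage that has ground truth for its output.
    groups = {"shared_encoder"}
    if stage == 1:
        groups.add("illumination_decoder")
    elif stage == 2:
        groups.add("reflectivity_decoder")
    return groups

def train(model_params, sample_set, stage, update_step):
    active = trainable_params(stage)
    for batch in sample_set:
        for name in active:
            # Only parameters of the active groups receive updates; the
            # other decoder's parameters are left untouched.
            model_params[name] = update_step(model_params[name], batch)
    return model_params

params = {"shared_encoder": 0.0,
          "illumination_decoder": 0.0,
          "reflectivity_decoder": 0.0}
bump = lambda p, batch: p + 1.0  # stand-in for one gradient step

params = train(params, range(3), stage=1, update_step=bump)  # first sub-sample set
params = train(params, range(2), stage=2, update_step=bump)  # second sub-sample set

print(params)
# shared_encoder updated 5 times, illumination_decoder 3 times,
# reflectivity_decoder 2 times
```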
In the above training of the normal vector estimation model and the image decomposition model, the training effect can be judged according to the relevant loss functions, and the network parameters of each model can then be adjusted according to the loss values so as to complete the training.
In this way, by training the shared encoder together with the illumination rate decoder and the shared encoder together with the reflectivity decoder, the image decomposition model can obtain good illumination rate maps and reflectivity maps when decomposing the image to be decomposed.
With the above solution, by obtaining the normal vector information of the image to be decomposed, the image decomposition model can use the normal vector information to better understand the environment of the scene in the image to be decomposed, so that the intrinsic images produced by the image decomposition model match the scene of the image to be decomposed well, improving the intrinsic image decomposition. In addition, the normal vector information of the image to be decomposed is obtained with a normal vector estimation model independent of the image decomposition model; using such a dedicated model yields accurate normal vector information and further improves how well the subsequently decomposed intrinsic images match the scene of the image to be decomposed.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
FIG. 7 is a schematic diagram of the framework of an image decomposition apparatus of an embodiment of the present disclosure. As shown in FIG. 7, the image decomposition apparatus 70 includes an acquisition module 71, a normal vector estimation module 72 and a decomposition module 73.
The acquisition module 71 is configured to acquire the image to be decomposed; the normal vector estimation module 72 is configured to obtain the normal vector information of the image to be decomposed with the normal vector estimation model; the decomposition module 73 is configured to decompose the image to be decomposed with the image decomposition model based on the normal vector information, obtaining the intrinsic images of the image to be decomposed.
In some optional embodiments, the above intrinsic images include an illumination rate image. The above decomposition module 73 is configured to process the image to be decomposed with the image decomposition model to obtain the scene illumination condition information of the image to be decomposed, and to obtain the illumination rate image of the image to be decomposed based on the scene illumination condition information and the normal vector information.
In some optional embodiments, the above scene illumination condition information is a normal vector adaptation map containing the normal vector adaptation vectors of the different pixels of the image to be decomposed, and the normal vector information is a normal vector map containing the normal vectors of the different pixels of the image to be decomposed. The above decomposition module 73 is configured to compute the dot product of the normal vector adaptation map and the normal vector map to obtain the illumination rate image of the image to be decomposed.
In some optional embodiments, the above image decomposition model includes a shared encoder and an illumination rate decoder. The above decomposition module 73 is configured to: perform feature extraction on the image to be decomposed with the shared encoder to obtain an image feature map, and fuse the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model to obtain the first fused feature map; and decode the first fused feature map with the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed.
In some optional embodiments, the above shared encoder includes at least one sequentially connected coding unit, each coding unit including a normal vector adaptor. The above decomposition module 73 is configured to: output the image feature map to the first coding unit; for each coding unit, fuse the feature map output by the previous coding unit with the first scene structure feature map using the normal vector adaptor, obtaining the second fused feature map corresponding to that coding unit, where the feature richness of the scene structure feature map corresponding to each coding unit differs; and obtain the first fused feature map based on the second fused feature map of the last coding unit.
In some optional embodiments, the above decomposition module 73 is configured to down-sample the feature map output by the previous coding unit before the normal vector adaptor fuses it with the scene structure feature map to obtain the second fused feature map corresponding to the coding unit. And/or, the above decomposition module 73 is configured to use the normal vector adaptor to: adjust the scene structure feature map to a scene structure feature map of a preset scale, and concatenate and convolve the adjusted scene structure feature map with the feature map output by the previous coding unit, obtaining the second fused feature map corresponding to the coding unit.
In some optional embodiments, the above decomposition module 73 is configured to decode the first fused feature map and the second fused feature map of at least one normal vector adaptor with the illumination rate decoder, obtaining the scene illumination condition information of the image to be decomposed.
In some optional embodiments, the above image decomposition model further includes a reflectivity decoder. The above decomposition module 73 is configured to decode the first fused feature map with the reflectivity decoder to obtain the reflectivity image of the image to be decomposed.
In some optional embodiments, the above decomposition module 73 is configured to decode the first fused feature map and the second fused feature map of at least one normal vector adaptor with the reflectivity decoder, obtaining the reflectivity image of the image to be decomposed.
In some optional embodiments, the above normal vector estimation model includes a normal vector encoder, a normal vector decoder and a refinement sub-network. The above normal vector estimation module 72 is configured to: encode the image to be decomposed with the normal vector encoder to obtain the first scene structure feature map; decode the first scene structure feature map with the normal vector decoder to obtain a decoded feature map; and fuse the first scene structure feature map and the decoded feature map with the refinement sub-network to obtain the normal vector information of the image to be decomposed.
In some optional embodiments, the above normal vector estimation module 72 is configured to perform multi-layer encoding of the image to be decomposed with the normal vector encoder, obtaining a first scene structure feature map for each layer, where the feature richness of the first scene structure feature map differs from layer to layer and the first scene structure feature map of the last layer is output to the normal vector decoder. The above normal vector estimation module 72 is configured to use the refinement sub-network to: concatenate the first scene structure feature maps corresponding to the layers to obtain a second scene structure feature map, concatenate the second scene structure feature map with the decoded feature map to obtain a third scene structure feature map, and obtain the normal vector information of the image to be decomposed based on the third scene structure feature map.
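The two concatenations performed by the refinement sub-network can be sketched as follows. The shapes, the nearest-neighbour resize, and the final random 1×1 projection to a 3-channel unit-normal map are illustrative assumptions, not the patented network.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample_to(x, h, w):
    # Nearest-neighbour resize so the per-layer maps share one spatial grid.
    rows = np.arange(h) * x.shape[1] // h
    cols = np.arange(w) * x.shape[2] // w
    return x[:, rows][:, :, cols]

H, W = 16, 16
# Hypothetical first scene structure feature maps from three encoder layers
layer_maps = [rng.standard_normal((8, 16, 16)),
              rng.standard_normal((16, 8, 8)),
              rng.standard_normal((32, 4, 4))]
decoded = rng.standard_normal((16, 16, 16))  # decoded feature map from the normal vector decoder

# Second scene structure feature map: per-layer maps concatenated on channels
second = np.concatenate([upsample_to(m, H, W) for m in layer_maps], axis=0)
# Third scene structure feature map: second map concatenated with the decoded map
third = np.concatenate([second, decoded], axis=0)

# Stand-in 1x1 convolution down to 3 channels, then normalisation so each
# pixel carries a unit normal vector.
w = rng.standard_normal((3, third.shape[0]))
normals = np.tensordot(w, third, axes=([1], [0]))
normals = normals / np.linalg.norm(normals, axis=0, keepdims=True)

print(normals.shape)  # (3, 16, 16)
```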
In some optional embodiments, the above normal vector estimation model and image decomposition model are trained separately.
In an optional embodiment, the image decomposition apparatus 70 further includes a training module configured to, before the normal vector estimation module 72 obtains the normal vector information of the image to be decomposed with the normal vector estimation model: train the normal vector estimation model with a first sample set, where the images in the first sample set are annotated with normal vector information; and obtain the sample normal vector information of the images in a second sample set with the trained normal vector estimation model, and train the image decomposition model with the second sample set and the sample normal vector information.
In some optional embodiments, the above second sample set includes a first sub-sample set and a second sub-sample set. The above training module is configured to: train the image decomposition model with the first sub-sample set and its corresponding sample normal vector information, so as to adjust the parameters of the shared encoder and the illumination rate decoder in the image decomposition model; and train the image decomposition model with the second sub-sample set and its corresponding sample normal vector information, so as to adjust the parameters of the shared encoder and the reflectivity decoder in the image decomposition model.
FIG. 8 is a schematic diagram of the framework of an electronic device of an embodiment of the present disclosure. As shown in FIG. 8, the electronic device 80 includes a memory 81 and a processor 82 coupled to each other, and the processor 82 is configured to execute the program instructions stored in the memory 81 to implement the steps of any of the above image decomposition method embodiments. In an optional implementation scenario, the electronic device 80 may include, but is not limited to, a microcomputer or a server; in addition, the electronic device 80 may also include mobile devices such as a notebook computer or a tablet computer, which is not limited here.
Exemplarily, the processor 82 is configured to control itself and the memory 81 to implement the steps of any of the above image decomposition method embodiments. The processor 82 may also be referred to as a central processing unit (CPU). The processor 82 may be an integrated circuit chip with signal processing capability. The processor 82 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 82 may be jointly implemented by multiple integrated circuit chips.
FIG. 9 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present disclosure. As shown in FIG. 9, the computer-readable storage medium 90 stores program instructions 901 executable by a processor, and the program instructions 901 are used to implement the steps of any of the above image decomposition method embodiments.
With the above solution, by acquiring the normal vector information of the image to be decomposed, the image decomposition model can use the normal vector information to better understand the environment of the scene in the image to be decomposed, so that the intrinsic images obtained by the image decomposition model better match the scene of the image to be decomposed, which improves the intrinsic-image decomposition effect. In addition, the normal vector information of the image to be decomposed is obtained by a normal vector estimation model that is independent of the image decomposition model; using such a dedicated model yields accurate normal vector information, which further improves the matching degree between the intrinsic images obtained by the subsequent decomposition and the scene of the image to be decomposed.
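The two-stage flow summarized above, in which a dedicated normal vector estimation model produces normal vector information that then conditions a separate image decomposition model, can be sketched as follows. This is a minimal illustration only: the callables, names, and toy data are assumptions, not the disclosed network implementation.

```python
# Hypothetical sketch of the two-stage pipeline: a dedicated normal vector
# estimation model runs first, and its output conditions the image
# decomposition model. Both "models" below are placeholder callables.

def decompose_image(image, normal_model, decomposition_model):
    normals = normal_model(image)               # per-pixel normal vectors
    return decomposition_model(image, normals)  # intrinsic images

# Toy stand-ins for the two trained networks (assumptions, not the patent's):
fake_normal_model = lambda img: [[0.0, 0.0, 1.0] for _ in img]
fake_decomposition_model = lambda img, normals: {
    "reflectivity": img,                         # placeholder pass-through
    "illumination_rate": [n[2] for n in normals],
}

result = decompose_image([0.5, 0.8], fake_normal_model, fake_decomposition_model)
# result["illumination_rate"] == [1.0, 1.0]
```

The point of the sketch is the data flow: the decomposition model receives the normal vector information as an extra input rather than estimating it itself.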
In some embodiments, the functions or modules of the apparatuses provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments. For specific implementations, reference may be made to the descriptions of the above method embodiments, which are not repeated here for brevity.
The above descriptions of the embodiments tend to emphasize the differences between them; for their same or similar parts, reference may be made to one another, and details are not repeated herein for brevity.
In the several embodiments provided in this application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus implementations described above are merely illustrative. For instance, the division of modules or units is only a division by logical function; in actual implementation, other divisions are possible. For example, units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (29)

  1. An image decomposition method, comprising:
    acquiring an image to be decomposed;
    obtaining normal vector information of the image to be decomposed by using a normal vector estimation model; and
    decomposing, based on the normal vector information, the image to be decomposed by using an image decomposition model to obtain an intrinsic image of the image to be decomposed.
  2. The method according to claim 1, wherein the intrinsic image comprises an illumination rate image, and the decomposing, based on the normal vector information, the image to be decomposed by using the image decomposition model to obtain the intrinsic image of the image to be decomposed comprises:
    processing the image to be decomposed by using the image decomposition model to obtain scene illumination condition information of the image to be decomposed; and
    obtaining the illumination rate image of the image to be decomposed based on the scene illumination condition information and the normal vector information.
  3. The method according to claim 2, wherein the scene illumination condition information is a normal vector adaptation map containing normal vector adaptation vectors of different pixels of the image to be decomposed, and the normal vector information is a normal vector map containing normal vectors of different pixels of the image to be decomposed; and the obtaining the illumination rate image of the image to be decomposed based on the scene illumination condition information and the normal vector information comprises:
    performing a dot product on the normal vector adaptation map and the normal vector map to obtain the illumination rate image of the image to be decomposed.
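The per-pixel dot product recited in claim 3 can be illustrated with a minimal sketch, assuming both maps are H x W grids of 3-vectors; the function name and toy data are hypothetical and not part of the claimed implementation.

```python
# Hypothetical sketch of claim 3's per-pixel dot product: the illumination
# rate at each pixel is the inner product of that pixel's normal-adaptation
# vector with its surface normal. Array shapes and names are assumptions.

def dot_product_shading(adapt_map, normal_map):
    """adapt_map, normal_map: H x W grids of 3-vectors (nested lists)."""
    return [
        [sum(a * n for a, n in zip(adapt_vec, normal_vec))
         for adapt_vec, normal_vec in zip(adapt_row, normal_row)]
        for adapt_row, normal_row in zip(adapt_map, normal_map)
    ]

# A 1 x 2 toy image: one pixel facing the lighting vector, one perpendicular.
adapt = [[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]]    # per-pixel adaptation vectors
normals = [[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]]  # per-pixel unit normals
shading = dot_product_shading(adapt, normals)   # [[1.0, 0.0]]
```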
  4. The method according to claim 2 or 3, wherein the image decomposition model comprises a shared encoder and an illumination rate decoder; and the processing the image to be decomposed by using the image decomposition model to obtain the scene illumination condition information of the image to be decomposed comprises:
    performing feature extraction on the image to be decomposed by using the shared encoder to obtain an image feature map, and fusing the image feature map with a first scene structure feature map output by a normal vector encoder of the normal vector estimation model to obtain a first fused feature map; and
    decoding the first fused feature map by using the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed.
  5. The method according to claim 4, wherein the shared encoder comprises at least one coding unit connected in sequence, and each coding unit comprises a normal vector adaptor; and the fusing the image feature map with the first scene structure feature map output by the normal vector encoder of the normal vector estimation model to obtain the first fused feature map comprises:
    outputting the image feature map to the first coding unit;
    for each coding unit: fusing, by using the normal vector adaptor, a feature map output by the previous coding unit with the first scene structure feature map to obtain a second fused feature map corresponding to the coding unit, wherein the feature richness of the scene structure feature map corresponding to each coding unit is different; and
    obtaining the first fused feature map based on the second fused feature map of the last coding unit.
  6. The method according to claim 5, wherein before the fusing, by using the normal vector adaptor, the feature map output by the previous coding unit with the scene structure feature map to obtain the second fused feature map corresponding to the coding unit, the method further comprises:
    performing down-sampling processing on the feature map output by the previous coding unit;
    and/or, the fusing, by using the normal vector adaptor, the feature map output by the previous coding unit with the scene structure feature map to obtain the second fused feature map corresponding to the coding unit comprises:
    performing, by using the normal vector adaptor: adjusting the scene structure feature map to a scene structure feature map of a preset scale, and concatenating and convolving the adjusted scene structure feature map with the feature map output by the previous coding unit, to obtain the second fused feature map corresponding to the coding unit.
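The adaptor step recited in claim 6 (rescale the scene structure feature map to a preset scale, concatenate it with the previous coding unit's output along the channel axis, then convolve) can be sketched as follows. For brevity, a nearest-neighbor resize and a 1x1 convolution (a per-pixel linear map) stand in for the unspecified operators; all names and shapes are assumptions, not the disclosed layers.

```python
# Hypothetical sketch of the normal vector adaptor in claim 6: resize the
# scene structure feature map to the previous unit's scale, concatenate the
# channels, then apply a 1x1 convolution (per-pixel linear map).

def nearest_resize(fmap, out_h, out_w):
    """Nearest-neighbor resize of an H x W grid of channel lists."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

def adaptor_fuse(prev_fmap, scene_fmap, weights):
    """weights: rows of the 1x1-conv matrix, one row per output channel."""
    h, w = len(prev_fmap), len(prev_fmap[0])
    scene = nearest_resize(scene_fmap, h, w)          # match the preset scale
    fused = []
    for r in range(h):
        row = []
        for c in range(w):
            channels = prev_fmap[r][c] + scene[r][c]  # channel concatenation
            row.append([sum(wgt * x for wgt, x in zip(out_row, channels))
                        for out_row in weights])      # 1x1 convolution
        fused.append(row)
    return fused

# 1x1 feature maps with one channel each; summing weights give 0.2 + 0.5.
out = adaptor_fuse([[[0.2]]], [[[0.5]], [[0.5]]], [[1.0, 1.0]])
# out[0][0][0] is approximately 0.7
```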
  7. The method according to claim 5 or 6, wherein the decoding the first fused feature map by using the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed comprises:
    decoding the first fused feature map and the second fused feature map of at least one normal vector adaptor by using the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed.
  8. The method according to any one of claims 4 to 7, wherein the image decomposition model further comprises a reflectivity decoder; and the decomposing, based on the normal vector information, the image to be decomposed by using the image decomposition model to obtain the intrinsic image of the image to be decomposed further comprises:
    decoding the first fused feature map by using the reflectivity decoder to obtain a reflectivity image of the image to be decomposed.
  9. The method according to claim 8, wherein the decoding the first fused feature map by using the reflectivity decoder to obtain the reflectivity image of the image to be decomposed comprises:
    decoding the first fused feature map and the second fused feature map of at least one normal vector adaptor by using the reflectivity decoder to obtain the reflectivity image of the image to be decomposed.
  10. The method according to any one of claims 1 to 9, wherein the normal vector estimation model comprises a normal vector encoder, a normal vector decoder, and a subdivision sub-network; and
    the obtaining the normal vector information of the image to be decomposed by using the normal vector estimation model comprises:
    encoding the image to be decomposed by using the normal vector encoder to obtain a first scene structure feature map;
    decoding the first scene structure feature map by using the normal vector decoder to obtain a decoded feature map; and
    fusing the first scene structure feature map and the decoded feature map by using the subdivision sub-network to obtain the normal vector information of the image to be decomposed.
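The three-stage normal vector estimation model recited in claim 10 (normal vector encoder, normal vector decoder, subdivision sub-network) can be illustrated with placeholder callables; the stand-in functions below are assumptions that show only the shape of the data flow, not the disclosed networks.

```python
# Hypothetical sketch of claim 10's normal vector estimation model: the
# encoder output (the scene structure feature map) is consumed twice, once by
# the decoder and once more by the subdivision sub-network that fuses it with
# the decoded feature map.

def estimate_normals(image, encoder, decoder, subdivision_net):
    scene_features = encoder(image)                  # first scene structure feature map
    decoded = decoder(scene_features)                # decoded feature map
    return subdivision_net(scene_features, decoded)  # fused normal vector info

# Toy stand-ins: encode by doubling, decode by halving, fuse by averaging.
enc = lambda img: [2 * v for v in img]
dec = lambda feats: [v / 2 for v in feats]
fuse = lambda a, b: [(x + y) / 2 for x, y in zip(a, b)]

normals = estimate_normals([1.0, 3.0], enc, dec, fuse)
# encoder -> [2.0, 6.0]; decoder -> [1.0, 3.0]; fused -> [1.5, 4.5]
```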
  11. The method according to any one of claims 1 to 10, wherein the normal vector estimation model and the image decomposition model are obtained by separate training.
  12. The method according to claim 11, wherein before the obtaining the normal vector information of the image to be decomposed by using the normal vector estimation model, the method further comprises:
    training the normal vector estimation model by using a first sample set, wherein images in the first sample set are annotated with normal vector information; and
    obtaining sample normal vector information of images in a second sample set by using the trained normal vector estimation model, and training the image decomposition model by using the second sample set and the sample normal vector information.
  13. The method according to claim 12, wherein the second sample set comprises a first sub-sample set and a second sub-sample set, and the training the image decomposition model by using the second sample set and the sample normal vector information comprises:
    training the image decomposition model by using the first sub-sample set and the sample normal vector information corresponding to the first sub-sample set, so as to adjust parameters of a shared encoder and an illumination rate decoder in the image decomposition model; and
    training the image decomposition model by using the second sub-sample set and the sample normal vector information corresponding to the second sub-sample set, so as to adjust parameters of the shared encoder and a reflectivity decoder in the image decomposition model.
  14. An image decomposition apparatus, comprising:
    an acquisition module, configured to acquire an image to be decomposed;
    a normal vector estimation module, configured to obtain normal vector information of the image to be decomposed by using a normal vector estimation model; and
    a decomposition module, configured to decompose, based on the normal vector information, the image to be decomposed by using an image decomposition model to obtain an intrinsic image of the image to be decomposed.
  15. The apparatus according to claim 14, wherein the intrinsic image comprises an illumination rate image; and
    the decomposition module is configured to: process the image to be decomposed by using the image decomposition model to obtain scene illumination condition information of the image to be decomposed; and obtain the illumination rate image of the image to be decomposed based on the scene illumination condition information and the normal vector information.
  16. The apparatus according to claim 15, wherein the scene illumination condition information is a normal vector adaptation map containing normal vector adaptation vectors of different pixels of the image to be decomposed, and the normal vector information is a normal vector map containing normal vectors of different pixels of the image to be decomposed; and
    the decomposition module is configured to perform a dot product on the normal vector adaptation map and the normal vector map to obtain the illumination rate image of the image to be decomposed.
  17. The apparatus according to claim 15 or 16, wherein the image decomposition model comprises a shared encoder and an illumination rate decoder; and
    the decomposition module is configured to: perform feature extraction on the image to be decomposed by using the shared encoder to obtain an image feature map, and fuse the image feature map with a first scene structure feature map output by a normal vector encoder of the normal vector estimation model to obtain a first fused feature map; and decode the first fused feature map by using the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed.
  18. The apparatus according to claim 17, wherein the shared encoder comprises at least one coding unit connected in sequence, and each coding unit comprises a normal vector adaptor; and
    the decomposition module is configured to: output the image feature map to the first coding unit; for each coding unit, fuse, by using the normal vector adaptor, a feature map output by the previous coding unit with the first scene structure feature map to obtain a second fused feature map corresponding to the coding unit, wherein the feature richness of the scene structure feature map corresponding to each coding unit is different; and obtain the first fused feature map based on the second fused feature map of the last coding unit.
  19. The apparatus according to claim 18, wherein the decomposition module is configured to: before the fusing, by using the normal vector adaptor, the feature map output by the previous coding unit with the scene structure feature map to obtain the second fused feature map corresponding to the coding unit, perform down-sampling processing on the feature map output by the previous coding unit; and/or perform, by using the normal vector adaptor: adjusting the scene structure feature map to a scene structure feature map of a preset scale, and concatenating and convolving the adjusted scene structure feature map with the feature map output by the previous coding unit, to obtain the second fused feature map corresponding to the coding unit.
  20. The apparatus according to claim 18 or 19, wherein the decomposition module is configured to decode the first fused feature map and the second fused feature map of at least one normal vector adaptor by using the illumination rate decoder to obtain the scene illumination condition information of the image to be decomposed.
  21. The apparatus according to any one of claims 17 to 20, wherein the image decomposition model further comprises a reflectivity decoder; and
    the decomposition module is configured to decode the first fused feature map by using the reflectivity decoder to obtain a reflectivity image of the image to be decomposed.
  22. The apparatus according to claim 21, wherein the decomposition module is configured to decode the first fused feature map and the second fused feature map of at least one normal vector adaptor by using the reflectivity decoder to obtain the reflectivity image of the image to be decomposed.
  23. The apparatus according to any one of claims 14 to 22, wherein the normal vector estimation model comprises a normal vector encoder, a normal vector decoder, and a subdivision sub-network; and
    the normal vector estimation module is configured to: encode the image to be decomposed by using the normal vector encoder to obtain a first scene structure feature map; decode the first scene structure feature map by using the normal vector decoder to obtain a decoded feature map; and fuse the first scene structure feature map and the decoded feature map by using the subdivision sub-network to obtain the normal vector information of the image to be decomposed.
  24. The apparatus according to any one of claims 14 to 23, wherein the normal vector estimation model and the image decomposition model are obtained by separate training.
  25. The apparatus according to claim 24, wherein the apparatus further comprises a training module, configured to: before the normal vector estimation module obtains the normal vector information of the image to be decomposed by using the normal vector estimation model, train the normal vector estimation model by using a first sample set, wherein images in the first sample set are annotated with normal vector information; and obtain sample normal vector information of images in a second sample set by using the trained normal vector estimation model, and train the image decomposition model by using the second sample set and the sample normal vector information.
  26. The apparatus according to claim 25, wherein the second sample set comprises a first sub-sample set and a second sub-sample set; and
    the training module is configured to: train the image decomposition model by using the first sub-sample set and the sample normal vector information corresponding to the first sub-sample set, so as to adjust parameters of a shared encoder and an illumination rate decoder in the image decomposition model; and train the image decomposition model by using the second sub-sample set and the sample normal vector information corresponding to the second sub-sample set, so as to adjust parameters of the shared encoder and a reflectivity decoder in the image decomposition model.
  27. An electronic device, comprising a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the image decomposition method according to any one of claims 1 to 13.
  28. A computer-readable storage medium, having program instructions stored thereon, wherein the program instructions, when executed by a processor, implement the image decomposition method according to any one of claims 1 to 13.
  29. A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions configured to implement the method according to any one of claims 1 to 13.
PCT/CN2021/114023 2020-08-31 2021-08-23 Image decomposition method and related apparatus and device WO2022042470A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010898798.1 2020-08-31
CN202010898798.1A CN112053338A (en) 2020-08-31 2020-08-31 Image decomposition method and related device and equipment

Publications (1)

Publication Number Publication Date
WO2022042470A1

Family

ID=73608057


Country Status (2)

Country Link
CN (1) CN112053338A (en)
WO (1) WO2022042470A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095158A (en) * 2023-08-23 2023-11-21 广东工业大学 Terahertz image dangerous article detection method based on multi-scale decomposition convolution

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112053338A (en) * 2020-08-31 2020-12-08 浙江商汤科技开发有限公司 Image decomposition method and related device and equipment
CN115222930B (en) * 2022-09-02 2022-11-29 四川蜀天信息技术有限公司 WebGL-based 3D model arrangement and combination method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447906A (en) * 2015-11-12 2016-03-30 浙江大学 Method for calculating lighting parameters and carrying out relighting rendering based on image and model
US20160328630A1 (en) * 2015-05-08 2016-11-10 Samsung Electronics Co., Ltd. Object recognition apparatus and method
CN106296749A (en) * 2016-08-05 2017-01-04 天津大学 RGB D image eigen decomposition method based on L1 norm constraint
CN110428491A (en) * 2019-06-24 2019-11-08 北京大学 Three-dimensional facial reconstruction method, device, equipment and medium based on single-frame images
CN110647859A (en) * 2019-09-29 2020-01-03 浙江商汤科技开发有限公司 Face image decomposition method and device, electronic equipment and storage medium
CN111445582A (en) * 2019-01-16 2020-07-24 南京大学 Single-image human face three-dimensional reconstruction method based on illumination prior
CN112053338A (en) * 2020-08-31 2020-12-08 浙江商汤科技开发有限公司 Image decomposition method and related device and equipment



Also Published As

Publication number Publication date
CN112053338A (en) 2020-12-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21860305

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21860305

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.09.2023)