CN113810597B - Rapid image bokeh rendering method based on semi-predictive filtering


Info

Publication number: CN113810597B
Authority: CN (China)
Application number: CN202110914290.0A
Other versions: CN113810597A (Chinese)
Inventors: 颜成钢, 陈泉, 马立栋, 郑博仑, 孙垚棋, 张继勇, 李宗鹏
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority/filing date: 2021-08-10
Publication of CN113810597A: 2021-12-17
Publication of CN113810597B (grant): 2022-12-13
Legal status: Active

Classifications

    • H: ELECTRICITY
        • H04: ELECTRIC COMMUNICATION TECHNIQUE
            • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N23/00: Cameras or camera modules comprising electronic image sensors; control thereof
                    • H04N23/95: Computational photography systems, e.g. light-field imaging systems
                        • H04N23/951: Computational photography systems using two or more images to influence resolution, frame rate or aspect ratio
    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00: Computing arrangements based on biological models
                    • G06N3/02: Neural networks
                        • G06N3/04: Architecture, e.g. interconnection topology
                            • G06N3/045: Combinations of networks
                            • G06N3/048: Activation functions
                        • G06N3/08: Learning methods
                            • G06N3/084: Backpropagation, e.g. using gradient descent
            • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T3/00: Geometric image transformations in the plane of the image
                    • G06T3/04: Context-preserving transformations, e.g. by using an importance map
                    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
                        • G06T3/4007: Scaling based on interpolation, e.g. bilinear interpolation


Abstract

A fast image bokeh rendering method based on semi-predictive filtering comprises the following steps. First, pairs of pictures are captured under different scenes with a single-lens reflex (SLR) camera, all pictures of the dataset are interpolated to a size of 1024 × 1472 using bicubic interpolation, and coordinate values are assigned to the processed all-in-focus pictures to produce a coordinate map. A fast image bokeh rendering network model based on semi-predictive filtering is then constructed and trained; the network model comprises an attention module, a residual module, a semi-filtering kernel module and an image generation module. Finally, the trained neural network model receives the pictures requiring bokeh rendering and outputs the pictures once bokeh rendering is complete. The method achieves fast bokeh rendering of images while guaranteeing bokeh rendering quality, and innovatively provides a coordinate map for assisting the training of the network model and improving the network model's ability to distinguish the important content of the input image.

Description

Rapid image bokeh rendering method based on semi-predictive filtering
Technical Field
The invention relates to a fast image bokeh rendering method based on semi-predictive filtering, and in particular to the field of bokeh effect processing based on deep learning technology.
Background
The bokeh effect is generally regarded as one of the aesthetic standards in photography. With existing technology it is easy for a photographer to achieve with a single-lens reflex (SLR) camera: the camera is set to a large-aperture shooting mode so that the regions of no interest in the image are blurred. With the popularization of smartphones, manufacturers have tried to achieve bokeh at the hardware level by adding complex hardware and extra cameras to the handset, but the high manufacturing cost is unfriendly to both vendors and consumers. Bokeh rendering algorithms implemented in software have therefore become a research hotspot: they rely only on the computing power of the phone, require comparatively little hardware cost, and suit most smartphones on the market. At present most algorithms are based on deep learning and build an end-to-end network to render the bokeh effect. However, when a deep learning algorithm is integrated into a phone, shortening the running time becomes a major problem; running speed and rendering quality constrain each other, and how to reconcile them is the question this invention considers.
Disclosure of Invention
The technical problem to be solved is as follows: aiming at the high cost of hardware-based implementations and at the mutual constraint between running speed and rendering quality in software-based implementations, the invention provides a fast image bokeh rendering method based on semi-predictive filtering.
The implementation steps are as follows: the invention provides a fast image bokeh rendering method based on semi-predictive filtering, which comprises the following basic steps:
Step 1: create the dataset;
Step 1.1: capture data under different scenes with a single-lens reflex (SLR) camera. The data for each scene is a pair of pictures: an all-in-focus picture I_org shot by the SLR, and a picture I_gt actually shot by the SLR with a large aperture, which carries a real bokeh rendering effect. The all-in-focus picture I_org serves as the input image data during model training, while the picture I_gt with the real bokeh effect serves as the reference data compared against the model output images during training.
Step 1.2: all pictures of the data set are interpolated to a size of 1024 x 1472 in height by using a bicubic linear interpolation method.
Step 1.3: and (5) making a coordinate graph. For the full focus picture I processed in the step 1.2 org And (3) carrying out coordinate assignment, wherein the specific calculation method comprises the following steps:
Figure BDA0003205091940000021
Figure BDA0003205091940000022
Here X holds, for each pixel, its coordinate along the height dimension of the picture, and Y its coordinate along the width dimension. The X and Y maps are combined with the all-in-focus picture I_org to reconstruct a 5-channel all-in-focus picture I_org+c, which is the final input picture of the network model.
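Because the published assignment formulas are only available as images, the sketch below assumes CoordConv-style coordinate channels normalized to [-1, 1]; the exact values used by the patent may differ.

```python
import torch

def make_coord_input(i_org: torch.Tensor) -> torch.Tensor:
    """Build the 5-channel input I_org+c from a 3-channel picture I_org of shape (3, H, W)."""
    _, h, w = i_org.shape
    # Assumed normalization: each pixel gets its row/column position scaled to [-1, 1]
    x_map = torch.linspace(-1.0, 1.0, h).view(h, 1).expand(h, w)  # height-dimension coords X
    y_map = torch.linspace(-1.0, 1.0, w).view(1, w).expand(h, w)  # width-dimension coords Y
    return torch.cat([i_org, x_map.unsqueeze(0), y_map.unsqueeze(0)], dim=0)  # (5, H, W)
```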
Step 2: constructing a fast image scene rendering network model based on semi-prediction filtering;
Step 2.1: theoretical derivation of the fast image bokeh rendering task based on semi-predictive filtering. Suppose the input is the all-in-focus picture I_org+c. A saliency detection algorithm divides I_org+c into two parts: the salient-feature part of the picture, I_focus, and the background part of the picture, I_defocus. A semi-filtering blur algorithm blurs the background picture I_defocus into the background-blurred picture I_blur while preserving the salient-feature part I_focus; finally the background-blurred picture I_blur and the salient-feature part I_focus are fused to obtain the required bokeh-rendered picture I_bokeh. The theoretical model of the bokeh rendering task is formulated as follows:

I_bokeh = I_focus ⊕ SF(I_defocus), with (I_focus, I_defocus) = SD(I_org+c)

where SD(·) represents the saliency detection algorithm, SF(·) represents the semi-filtering blur algorithm, and ⊕ denotes the fusion of the preserved salient part with the blurred background.
Step 2.2: constructing a fast image scene rendering network based on semi-prediction filtering;
the fast image scene rendering network based on semi-prediction filtering comprises an attention module, a residual error module, a semi-filtering kernel module and an image generation module. Wherein the attention module is used for detecting the input full focus picture I org+c For assisting the operation of a subsequent restrictive prediction filtering module; the residual error module is used for carrying out deep feature enhancement on the input data; the semi-filtering kernel module is used for generating a required filtering kernel, and is used for carrying out filtering operation on an input image and blurring partial content of a picture so as to generate a shot rendering effect, wherein the filtering kernel consists of a self-adaptive filtering kernel generated by a network and a small number of Gabor filtering kernels manually defined with parameters, the self-adaptive filtering kernel generated by the network is used for self-adaptively blurring the input image, and the Gabor filtering kernel manually defined with parameters is used for reserving and enhancing salient region details and edge details of the image; the image generation module is used for generating a picture which needs to be filtered by using the filtering kernel generated by the half filtering kernel module.
The residual module has the following structure: the input feature map X_res of the residual module passes sequentially through convolution layers with 64 kernels of size 3 × 3, giving the output feature map X'_res. Finally the output X'_res and the input X_res are added element-wise to obtain the final output feature map of the residual module, X_res-out. Every convolution layer is followed by a ReLU nonlinear activation function.
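A PyTorch sketch of the residual module follows; the patent does not state how many 3 × 3 convolution layers are stacked, so two are assumed.

```python
import torch.nn as nn

class ResidualModule(nn.Module):
    """Residual module sketch: 3x3 convolutions with 64 kernels, each followed by ReLU,
    plus an element-wise skip connection (X_res-out = X_res + X'_res)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x_res):
        return x_res + self.body(x_res)
```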
Attention module structure: the input feature map X_att of the attention module has dimensions height H × width W × channels C. The module splits into three branches: up, mid and down. On the up branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_up of shape HW × 64. On the mid branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_mid of shape 64 × HW. X_up and X_mid are matrix-multiplied and then activated with a Softmax function, giving a feature map X_act of shape HW × HW. On the down branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_down of shape HW × 64. X_act and X_down are matrix-multiplied and reshaped again, giving a feature map X_final of shape H × W × 64. X_final passes through a convolution layer with 64 kernels of size 3 × 3 and is then added element-wise to the input feature map X_att, giving the attention module's final output feature map X_att-out. Every convolution layer is followed by a ReLU nonlinear activation function.
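A sketch of this non-local-style attention follows; it assumes the channel count C equals 64 so that the final skip addition is shape-compatible, and adds a batch dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionModule(nn.Module):
    """Three-branch (up / mid / down) attention sketch; C = 64 is assumed."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv_up = nn.Conv2d(channels, 64, 3, padding=1)
        self.conv_mid = nn.Conv2d(channels, 64, 3, padding=1)
        self.conv_down = nn.Conv2d(channels, 64, 3, padding=1)
        self.conv_final = nn.Conv2d(64, channels, 3, padding=1)

    def forward(self, x_att):
        b, c, h, w = x_att.shape
        x_up = F.relu(self.conv_up(x_att)).flatten(2).transpose(1, 2)      # (B, HW, 64)
        x_mid = F.relu(self.conv_mid(x_att)).flatten(2)                    # (B, 64, HW)
        x_act = torch.softmax(torch.bmm(x_up, x_mid), dim=-1)              # (B, HW, HW)
        x_down = F.relu(self.conv_down(x_att)).flatten(2).transpose(1, 2)  # (B, HW, 64)
        x_final = torch.bmm(x_act, x_down).transpose(1, 2).reshape(b, 64, h, w)
        return x_att + F.relu(self.conv_final(x_final))                    # X_att-out
```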
The semi-filtering kernel module has the following structure: the input feature map X_filter of the semi-filtering kernel module has dimensions height H × width W × channels C. X_filter passes through a residual module with 64 kernels to obtain deep feature information X_deep, then sequentially through a convolution layer with 64 kernels of size 3 × 3 and an upsampling layer with factor 2, giving the feature map X'_deep of size 2H × 2W × 64 from which the filtering kernels are generated.
X'_deep is divided along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16. X_A generates the adaptive filtering kernel: it passes sequentially through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function, giving the adaptive filtering kernel X_adp-f of size 2H × 2W × k², where k is the predefined filter kernel size. X_B is used to generate, by combination, an edge filtering kernel from Gabor kernels with fixed parameters: X_B undergoes a custom filtering operation with 16 Gabor kernels of given parameters, and the 16 responses are linearly combined to obtain the required edge filtering kernel X_gabor-f of size 2H × 2W × k², which rapidly enhances the edge information that the picture must retain. The 16 Gabor kernels cover 8 orientations, and the kernels sharing an orientation use 2 different Sigma parameters, so all 16 kernels have distinct parameters. Finally, the adaptive filtering kernel X_adp-f and the edge filtering kernel X_gabor-f are added element-wise to obtain the required semi-filtering kernel X_filter-out.
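The sketch below reuses the ResidualModule sketched above. The Gabor parameter values (sigma pairs, wavelength, aspect ratio) and the per-pixel weighted combination of the 16 fixed kernels are assumptions; the patent fixes these kernels but does not publish their numeric parameters.

```python
import math
import cv2
import numpy as np
import torch
import torch.nn as nn

def gabor_bank(k: int = 3) -> torch.Tensor:
    """16 fixed Gabor kernels: 8 orientations x 2 sigma values (assumed parameter values)."""
    kernels = []
    for i in range(8):
        for sigma in (1.0, 2.0):
            # ksize, sigma, theta, lambda (wavelength), gamma (aspect ratio)
            g = cv2.getGaborKernel((k, k), sigma, i * math.pi / 8, 4.0, 0.5)
            kernels.append(torch.from_numpy(g.astype(np.float32)))
    return torch.stack(kernels)  # (16, k, k)

class SemiFilterKernelModule(nn.Module):
    """Semi-filtering kernel sketch: X_A -> adaptive kernel, X_B -> Gabor-based edge kernel."""
    def __init__(self, channels: int = 64, k: int = 3):
        super().__init__()
        self.res = ResidualModule(channels)               # deep features X_deep
        self.expand = nn.Sequential(                      # 3x3 conv (64 kernels) + 2x upsample
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        )
        self.to_adaptive = nn.Conv2d(48, k * k, 3, padding=1)
        self.register_buffer("gabor", gabor_bank(k).reshape(16, k * k))  # (16, k^2)

    def forward(self, x_filter):
        x_deep = self.expand(self.res(x_filter))          # X'_deep: (B, 64, 2H, 2W)
        x_a, x_b = x_deep[:, :48], x_deep[:, 48:]         # channel split 48 / 16
        x_adp = torch.softmax(self.to_adaptive(x_a), dim=1)      # X_adp-f: (B, k^2, 2H, 2W)
        # Per-pixel linear combination of the 16 fixed Gabor kernels, weighted by X_B
        x_gab = torch.einsum("bghw,gn->bnhw", x_b, self.gabor)   # X_gabor-f: (B, k^2, 2H, 2W)
        return x_adp + x_gab                              # semi-filter kernel X_filter-out
```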
The image generation module has the following structure: it takes three inputs, an input feature map X_1 at the same scale, an input feature map X_2 upsampled from the lower scale, and the semi-filtering kernel X_filter-out generated by the semi-filtering kernel module. The input feature map X_1 passes sequentially through a convolution layer with 3 kernels of size 3 × 3 and an upsampling layer with factor 2, and the result is added element-wise to the input feature map X_2, giving the feature map X_gen of size H × W × 3 on which the filtering operation is finally performed. The semi-filtering kernel X_filter-out and the feature map X_gen then undergo the custom filtering-kernel convolution operation, giving the final feature map X_out of size H × W × 3; X_out is the required bokeh-rendered picture.
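A sketch of the spatially-variant filtering step follows, using unfold to apply a different k × k kernel at every pixel; shapes are simplified so that the predicted kernel is assumed to match the resolution of X_gen.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageGenerationModule(nn.Module):
    """Fuse X_1 (3x3 conv + 2x upsample) with X_2, then apply the per-pixel semi-filter kernel."""
    def __init__(self, in_channels: int = 64, k: int = 3):
        super().__init__()
        self.k = k
        self.to_rgb = nn.Sequential(
            nn.Conv2d(in_channels, 3, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        )

    def forward(self, x1, x2, kernel):
        x_gen = self.to_rgb(x1) + x2                          # X_gen: (B, 3, H, W)
        b, _, h, w = x_gen.shape
        k = self.k
        # Gather the k x k neighbourhood of every pixel, then take the inner product
        # with that pixel's predicted kernel (spatially-variant filtering).
        patches = F.unfold(x_gen, k, padding=k // 2).view(b, 3, k * k, h * w)
        weights = kernel.view(b, k * k, h * w).unsqueeze(1)   # (B, 1, k^2, HW)
        return (patches * weights).sum(2).view(b, 3, h, w)    # X_out: bokeh-rendered picture
```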
The structure of the complete network: the network is divided into 2 branches, each processing information at a different scale. The initial input of the network is the all-in-focus picture I_org+c produced in step 1.3. Branch 1 contains a residual module, a semi-filtering kernel module and an image generation module, while branch 2 contains an attention module, a semi-filtering kernel module and an image generation module. To strengthen the information correlation between the branches, the input of branch 2 consists of downsampled intermediate information from branch 1 together with the downsampled all-in-focus picture information, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation.
Step 3: train the fast image bokeh rendering network model based on semi-predictive filtering.
The network model is trained as follows:
First, the 5-channel all-in-focus picture I_org+c produced in step 1.3 is input. Then the saliency detection module and the semi-predictive filtering module preserve the salient features of the image and blur the background. Finally, the loss function continuously optimizes the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh effect in the dataset constructed in step 1.
During training, the loss function L combines an L1 function with an LS function in order to raise the structural similarity between the model output picture I_bokeh and the reference picture I_gt. Using the back-propagation of deep learning, the loss between the model output picture I_bokeh and the reference picture I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model. Specifically:

L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)
where L1(I_bokeh, I_gt) is the reconstruction loss between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure loss between them, expressed as follows:

LS(I_bokeh, I_gt) = (1/N) Σ | Sobel(I_bokeh) − Sobel(I_gt) |

where Sobel denotes gradient computation on the picture in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N denotes the total number of pixels in the picture, i.e. the picture width W multiplied by the height H.
Step 4: the trained neural network model receives the pictures requiring bokeh rendering and outputs the pictures once bokeh rendering is complete;
Load the weights of the bokeh rendering network model trained in step 3 and update the parameters in the model. Then feed the resized picture from step 1.2, extended to the 5-channel input I_org+c, into the bokeh rendering network model; passing sequentially through the saliency detection module and the semi-predictive filtering module, it yields the model output picture I_bokeh with the bokeh rendering effect.
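An illustrative inference sketch, chaining the helpers sketched earlier; BokehNet and the checkpoint filename are assumptions, not the published implementation.

```python
import torch

model = BokehNet()                                   # hypothetical full two-branch network
model.load_state_dict(torch.load("bokeh_net.pth", map_location="cpu"))
model.eval()

with torch.no_grad():
    img = resize_dataset_image("input.jpg")          # step 1.2: resize to 1024 x 1472
    x = torch.from_numpy(img[..., ::-1].copy()).permute(2, 0, 1).float() / 255.0  # BGR -> RGB
    x = make_coord_input(x)                          # step 1.3: add coordinate channels
    i_bokeh = model(x.unsqueeze(0))                  # bokeh-rendered output I_bokeh
```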
The invention has the following beneficial effects:
1. A fast image bokeh rendering method based on semi-predictive filtering is innovatively provided, achieving fast bokeh rendering of images while guaranteeing bokeh rendering quality.
2. A coordinate map is innovatively provided for assisting the training of the network model and improving the network model's ability to distinguish the important content of the input image.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a flowchart of bokeh rendering for a single image;
FIG. 3 is a diagram of the bokeh rendering network based on semi-predictive filtering;
FIG. 4 is an effect diagram of bokeh rendering for an automobile;
FIG. 5 is an effect diagram of bokeh rendering for a streetlight.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The following notation is first defined and explained:
I_org: all-in-focus picture
I_org+c: 5-channel all-in-focus picture containing the coordinate-map information; the actual input of the network model
I_gt: picture with the real bokeh rendering effect
I_bokeh: model output picture with the bokeh rendering effect
FIG. 2 shows the flow of bokeh rendering for a single image.
As shown in fig. 1, the present invention provides a fast image bokeh rendering method based on semi-predictive filtering, comprising the following basic steps:
Step 1: dataset creation.
Step 1.1: the method comprises the steps of obtaining data shot under different scenes through shooting by a single lens reflex, wherein the data shot under the different scenes are a pair of pictures, namely, all-focus pictures I shot by the single lens reflex org And a picture I which is really shot by the single lens reflex camera by utilizing a large aperture and has a shot scene rendering effect gt . Wherein the picture I is in full focus org Picture I with real shot rendering effect as input image data in model training process gt As comparison data for comparison with the model output images during the model training process.
Step 1.2: all pictures of the data set are interpolated into the size of 1024 x 1472 by a bicubic linear interpolation method, and the size of the data set is unified, so that the operation time required by a training network is reduced.
Step 1.3: and (5) making a coordinate graph. For the full focus picture I processed in the step 1.2 org And (3) carrying out coordinate assignment, wherein the specific calculation method comprises the following steps:
Figure BDA0003205091940000071
Figure BDA0003205091940000072
Here X holds, for each pixel, its coordinate along the height dimension of the picture, and Y its coordinate along the width dimension. The X and Y maps are combined with the all-in-focus picture I_org to reconstruct a 5-channel all-in-focus picture I_org+c, which is the final input picture of the network model.
Step 2: constructing a fast image scene rendering network model based on semi-prediction filtering;
Step 2.1: theoretical derivation of the fast image bokeh rendering task based on semi-predictive filtering. Suppose the input is the all-in-focus picture I_org+c. A saliency detection algorithm divides I_org+c into two parts: the salient-feature part of the picture, I_focus, and the background part of the picture, I_defocus. A semi-filtering blur algorithm blurs the background picture I_defocus into the background-blurred picture I_blur while preserving the salient-feature part I_focus; finally the background-blurred picture I_blur and the salient-feature part I_focus are fused to obtain the required bokeh-rendered picture I_bokeh. The theoretical model of the bokeh rendering task is formulated as follows:

I_bokeh = I_focus ⊕ SF(I_defocus), with (I_focus, I_defocus) = SD(I_org+c)

where SD(·) represents the saliency detection algorithm, SF(·) represents the semi-filtering blur algorithm, and ⊕ denotes the fusion of the preserved salient part with the blurred background.
Step 2.2: constructing a fast image scene rendering network based on semi-prediction filtering:
the fast image scene rendering network based on semi-prediction filtering comprises an attention module, a residual error module, a semi-filtering kernel module and an image generation module. Wherein the attention module is used for detecting an input all-in-focus picture I org+c For assisting the operation of a subsequent restrictive prediction filtering module; residual mouldThe block is used for carrying out deep feature enhancement on input data; the semi-filtering kernel module is used for generating a required filtering kernel and is used for carrying out filtering operation on an input image and blurring partial content of a picture so as to generate a shot rendering effect, wherein the filtering kernel consists of a self-adaptive filtering kernel generated by a network and a small amount of Gabor filtering kernels manually defined with parameters, the self-adaptive filtering kernel generated by the network is used for self-adaptively blurring the input image, and the Gabor filtering kernel manually defined with parameters is used for reserving and enhancing salient region details and edge details of the image; the image generation module is used for generating a picture which needs to be filtered by using the filtering kernel generated by the half filtering kernel module.
The residual module has the following structure: the input feature map X_res of the residual module passes sequentially through convolution layers with 64 kernels of size 3 × 3, giving the output feature map X'_res. Finally the output X'_res and the input X_res are added element-wise to obtain the final output feature map of the residual module, X_res-out. Every convolution layer is followed by a ReLU nonlinear activation function.
The attention module has the following structure: the input feature map X_att of the attention module has dimensions height H × width W × channels C. The module splits into three branches: up, mid and down. On the up branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_up of shape HW × 64. On the mid branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_mid of shape 64 × HW. X_up and X_mid are matrix-multiplied and then activated with a Softmax function, giving a feature map X_act of shape HW × HW. On the down branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_down of shape HW × 64. X_act and X_down are matrix-multiplied and reshaped again, giving a feature map X_final of shape H × W × 64. X_final passes through a convolution layer with 64 kernels of size 3 × 3 and is then added element-wise to the input feature map X_att, giving the attention module's final output feature map X_att-out. Every convolution layer is followed by a ReLU nonlinear activation function.
The semi-filtering kernel module has the following structure: the input feature map X_filter of the semi-filtering kernel module has dimensions height H × width W × channels C. X_filter passes through a residual module with 64 kernels to obtain deep feature information X_deep, then sequentially through a convolution layer with 64 kernels of size 3 × 3 and an upsampling layer with factor 2, giving the feature map X'_deep of size 2H × 2W × 64 from which the filtering kernels are generated.
X'_deep is divided along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16. X_A generates the adaptive filtering kernel: it passes sequentially through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function, giving the adaptive filtering kernel X_adp-f of size 2H × 2W × k², where k is the predefined filter kernel size. X_B is used to generate, by combination, an edge filtering kernel from Gabor kernels with fixed parameters: X_B undergoes a custom filtering operation with 16 Gabor kernels of given parameters, and the 16 responses are linearly combined to obtain the required edge filtering kernel X_gabor-f of size 2H × 2W × k², which rapidly enhances the edge information that the picture must retain. The 16 Gabor kernels cover 8 orientations, and the kernels sharing an orientation use 2 different Sigma parameters, so all 16 kernels have distinct parameters. Finally, the adaptive filtering kernel X_adp-f and the edge filtering kernel X_gabor-f are added element-wise to obtain the required semi-filtering kernel X_filter-out.
The image generation module has the following structure: it takes three inputs, an input feature map X_1 at the same scale, an input feature map X_2 upsampled from the lower scale, and the semi-filtering kernel X_filter-out generated by the semi-filtering kernel module. The input feature map X_1 passes sequentially through a convolution layer with 3 kernels of size 3 × 3 and an upsampling layer with factor 2, and the result is added element-wise to the input feature map X_2, giving the feature map X_gen of size H × W × 3 on which the filtering operation is finally performed. The semi-filtering kernel X_filter-out and the feature map X_gen then undergo the custom filtering-kernel convolution operation, giving the final feature map X_out of size H × W × 3; X_out is the required bokeh-rendered picture.
The structure of the complete network: the network is divided into 2 branches, each processing information at a different scale. The initial input of the network is the all-in-focus picture I_org+c produced in step 1.3. Branch 1 contains a residual module, a semi-filtering kernel module and an image generation module, while branch 2 contains an attention module, a semi-filtering kernel module and an image generation module. To strengthen the information correlation between the branches, the input of branch 2 consists of downsampled intermediate information from branch 1 together with the downsampled all-in-focus picture information, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation.
FIG. 3 shows the bokeh rendering network based on semi-predictive filtering.
and 3, step 3: and training a semi-prediction filtering-based fast image scene rendering network model.
The network model is trained as follows:
First, the 5-channel all-in-focus picture I_org+c produced in step 1.3 is input. Then the saliency detection module and the semi-predictive filtering module preserve the salient features of the image and blur the background. Finally, the loss function continuously optimizes the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh effect in the dataset constructed in step 1.
During training, the loss function L combines an L1 function with an LS function in order to raise the structural similarity between the model output picture I_bokeh and the reference picture I_gt. Using the back-propagation of deep learning, the loss between the model output picture I_bokeh and the reference picture I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model. Specifically:

L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)
where L1(I_bokeh, I_gt) is the reconstruction loss between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure loss between them, expressed as follows:

LS(I_bokeh, I_gt) = (1/N) Σ | Sobel(I_bokeh) − Sobel(I_gt) |

where Sobel denotes gradient computation on the picture in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N denotes the total number of pixels in the picture, i.e. the picture width W multiplied by the height H.
Step 4: the trained neural network model receives the pictures requiring bokeh rendering and outputs the pictures once bokeh rendering is complete.
First, load the weights of the bokeh rendering network model trained in step 3 and update the parameters in the model. Then feed the resized picture from step 1.2, extended to the 5-channel input I_org+c, into the bokeh rendering network model; passing sequentially through the saliency detection module and the semi-predictive filtering module, it yields the model output picture I_bokeh with the bokeh rendering effect.
FIG. 4 shows the bokeh rendering effect generated for an automobile;
FIG. 5 shows the bokeh rendering effect generated for a streetlight.

Claims (1)

1. A fast image bokeh rendering method based on semi-predictive filtering, characterized by comprising the following steps:
step 1: making a data set;
step 2: constructing a fast image bokeh rendering network model based on semi-predictive filtering;
and step 3: training the fast image bokeh rendering network model based on semi-predictive filtering;
and step 4: receiving, by the trained neural network model, the pictures requiring bokeh rendering, and outputting the pictures once bokeh rendering is complete;
the specific method of the step 1 is as follows:
step 1.1: obtaining data shot under different scenes with a single-lens reflex (SLR) camera, the data for each scene being a pair of pictures, namely an all-in-focus picture I_org shot by the SLR and a picture I_gt actually shot by the SLR with a large aperture and carrying a real bokeh rendering effect; wherein the all-in-focus picture I_org serves as the input image data in the model training process, and the picture I_gt with the real bokeh effect serves as the reference data compared against the model output image in the model training process;
step 1.2: interpolating all pictures of the dataset to a size of height 1024 × width 1472 by bicubic interpolation;
step 1.3: making the coordinate map; assigning coordinate values to the all-in-focus picture I_org processed in step 1.2, the specific calculation being as follows:
[The coordinate-assignment formulas for X and Y are given only as images in the original publication.]
x represents the pixel point coordinate corresponding to the high dimension of the picture, and Y represents the pixel point coordinate corresponding to the wide dimension of the picture; combining the information of X and Y with the full focus picture I org Combining to reconstruct a 5-channel full-focus picture I org+c As the final input picture of the network model;
the specific method of the step 2 is as follows:
step 2.1: theoretical derivation of the fast image bokeh rendering task based on semi-predictive filtering; suppose the input is the all-in-focus picture I_org+c; a saliency detection algorithm divides the all-in-focus picture I_org+c into two parts, the salient-feature part of the picture I_focus and the background part of the picture I_defocus; a semi-filtering blur algorithm blurs the background picture I_defocus into the background-blurred picture I_blur while preserving the salient-feature part I_focus; finally the background-blurred picture I_blur and the salient-feature part I_focus are fused to obtain the required bokeh-rendered picture I_bokeh; the theoretical model of the bokeh rendering task is formulated as follows:
I_bokeh = I_focus ⊕ SF(I_defocus), with (I_focus, I_defocus) = SD(I_org+c)
wherein SD(·) represents the saliency detection algorithm, SF(·) represents the semi-filtering blur algorithm, and ⊕ denotes the fusion of the preserved salient part with the blurred background;
step 2.2: constructing the fast image bokeh rendering network based on semi-predictive filtering;
the fast image bokeh rendering network based on semi-predictive filtering comprises an attention module, a residual module, a semi-filtering kernel module and an image generation module; wherein the attention module is used for detecting the salient features of the input all-in-focus picture I_org+c and assisting the operation of the subsequent semi-predictive filtering stage; the residual module is used for performing deep feature enhancement on the input data; the semi-filtering kernel module is used for generating the required filtering kernel, which performs the filtering operation on the input image and blurs part of the picture content to produce the bokeh rendering effect, wherein the filtering kernel consists of an adaptive filtering kernel generated by the network and a small number of Gabor filtering kernels with manually defined parameters, the network-generated adaptive filtering kernel adaptively blurring the input image and the fixed-parameter Gabor filtering kernels preserving and enhancing the salient-region details and edge details of the image; the image generation module is used for generating the picture to be filtered, using the filtering kernel produced by the semi-filtering kernel module;
the complete network is divided into 2 branches, each branch processing information at a different scale; the initial input of the network is the all-in-focus picture I_org+c produced in step 1.3; branch 1 contains a residual module, a semi-filtering kernel module and an image generation module, while branch 2 contains an attention module, a semi-filtering kernel module and an image generation module; in order to strengthen the information correlation between the branches, the input of branch 2 consists of downsampled intermediate information from branch 1 together with the downsampled all-in-focus picture information, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation;
the specific structure of the residual module is as follows: the input feature map X_res of the residual module passes sequentially through convolution layers with 64 kernels of size 3 × 3, giving the output feature map X'_res; finally the output X'_res and the input X_res are added element-wise to obtain the final output feature map of the residual module, X_res-out; wherein every convolution layer is followed by a ReLU nonlinear activation function;
attention module specific structure: the input feature map X_att of the attention module has dimensions height H × width W × channels C; the attention module is divided into three branches, up, mid and down; the input feature map X_att passes through a convolution layer of the up branch with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_up of shape HW × 64; the input feature map X_att passes through a convolution layer of the mid branch with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_mid of shape 64 × HW; the feature maps X_up and X_mid are matrix-multiplied and then activated with a Softmax function, giving a feature map X_act of shape HW × HW; the input feature map X_att passes through a convolution layer of the down branch with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_down of shape HW × 64; the feature maps X_act and X_down are matrix-multiplied and reshaped again, giving a feature map X_final of shape H × W × 64; X_final passes through a convolution layer with 64 kernels of size 3 × 3 and is added element-wise to the input feature map X_att, giving the final output feature map of the attention module, X_att-out; wherein every convolution layer is followed by a ReLU nonlinear activation function;
the semi-filtering kernel module has the following specific structure: the input feature map X_filter of the semi-filtering kernel module has dimensions height H × width W × channels C; the input feature map X_filter passes through a residual module with 64 kernels to obtain deep feature information X_deep, and then sequentially through a convolution layer with 64 kernels of size 3 × 3 and an upsampling layer with factor 2, giving the feature map X'_deep of size 2H × 2W × 64 from which the filtering kernels are generated;
X'_deep is divided along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16; X_A is used to generate the adaptive filtering kernel, i.e. X_A passes sequentially through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function, giving the adaptive filtering kernel X_adp-f of size 2H × 2W × k², where k is the predefined filter kernel size; X_B is used to generate, by combination, an edge filtering kernel from Gabor filtering kernels with fixed parameters, i.e. X_B undergoes a custom filtering operation with 16 Gabor filtering kernels of given parameters, and the 16 Gabor responses are linearly combined to obtain the required edge filtering kernel X_gabor-f of size 2H × 2W × k², used to rapidly enhance the edge information that the picture must retain, wherein the 16 Gabor filtering kernels cover 8 orientations and the kernels of the same orientation use 2 different Sigma parameters, so that the parameters of all 16 Gabor filtering kernels differ; finally the adaptive filtering kernel X_adp-f and the edge filtering kernel X_gabor-f are added element-wise to obtain the final required semi-filtering kernel X_filter-out;
The image generation module has the specific structure that: the image generation module comprises three inputs, and an input feature map X with the same scale 1 Low-scale up-sampled input feature map X 2 Half-filter kernel X generated by input half-filter kernel module filter-out (ii) a Input feature map X 1 Sequentially passing through convolution layers with convolution kernel number of 3 and convolution kernel size of 3X 3 and up-sampling layers with multiple of 2, and outputting the output result and the input characteristic diagram X 2 Adding element by element to obtain a feature pattern X with the size of H W3 and finally needing filtering operation gen (ii) a Half-filtered kernel X filter-out And feature map X gen Performing convolution operation of the custom filtering kernel to obtain a final feature diagram X with the size of H X W X3 out Feature map X out The picture is the picture which is required to be subjected to the shot rendering;
the specific method of step 3 is as follows:
the network model is trained as follows:
first, the 5-channel all-in-focus picture I_org+c produced in step 1.3 is input; then the saliency detection module and the semi-predictive filtering module preserve the salient features of the image and blur the background; finally, the loss function continuously optimizes the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh effect in the dataset constructed in step 1;
in the training process, the loss function L combines an L1 function with an LS function in order to raise the structural similarity between the model output picture I_bokeh and the reference picture I_gt; using the back-propagation of deep learning, the loss between the model output picture I_bokeh and the reference picture I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model, specifically expressed as:
L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)
wherein L1(I_bokeh, I_gt) is the reconstruction loss between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure loss between them, expressed as follows:
LS(I_bokeh, I_gt) = (1/N) Σ | Sobel(I_bokeh) − Sobel(I_gt) |
wherein Sobel denotes gradient computation on the picture in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N denotes the total number of pixels in the picture, i.e. the picture width W multiplied by the height H;
the specific method of the step 4 is as follows:
loading the weights of the bokeh rendering network model trained in step 3 and updating the parameters in the model; then feeding the resized picture from step 1.2, extended to the 5-channel input I_org+c, into the bokeh rendering network model, where it passes sequentially through the saliency detection module and the semi-predictive filtering module to obtain the model output picture I_bokeh with the bokeh rendering effect.
CN202110914290.0A 2021-08-10 2021-08-10 Rapid image bokeh rendering method based on semi-predictive filtering Active CN113810597B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110914290.0A (CN113810597B) | 2021-08-10 | 2021-08-10 | Rapid image bokeh rendering method based on semi-predictive filtering

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110914290.0A (CN113810597B) | 2021-08-10 | 2021-08-10 | Rapid image bokeh rendering method based on semi-predictive filtering

Publications (2)

Publication Number | Publication Date
CN113810597A | 2021-12-17
CN113810597B | 2022-12-13

Family

ID=78893425

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
CN202110914290.0A (CN113810597B) | Active | 2021-08-10 | 2021-08-10 | Rapid image bokeh rendering method based on semi-predictive filtering

Country Status (1)

Country Link
CN (1) CN113810597B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049780A (en) * 2022-05-26 2022-09-13 北京京东尚科信息技术有限公司 Deep rendering model training method and device, and target rendering method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665494A (en) * 2017-03-27 2018-10-16 北京中科视维文化科技有限公司 Depth of field real-time rendering method based on quick guiding filtering
CN112073632A (en) * 2020-08-11 2020-12-11 联想(北京)有限公司 Image processing method, apparatus and storage medium
CN112184586A (en) * 2020-09-29 2021-01-05 中科方寸知微(南京)科技有限公司 Method and system for rapidly blurring monocular visual image background based on depth perception

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108055452B (en) * 2017-11-01 2020-09-18 Oppo广东移动通信有限公司 Image processing method, device and equipment
CN109345449B (en) * 2018-07-17 2020-11-10 西安交通大学 Image super-resolution and non-uniform blur removing method based on fusion network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665494A (en) * 2017-03-27 2018-10-16 北京中科视维文化科技有限公司 Depth of field real-time rendering method based on quick guiding filtering
CN112073632A (en) * 2020-08-11 2020-12-11 联想(北京)有限公司 Image processing method, apparatus and storage medium
CN112184586A (en) * 2020-09-29 2021-01-05 中科方寸知微(南京)科技有限公司 Method and system for rapidly blurring monocular visual image background based on depth perception

Also Published As

Publication number Publication date
CN113810597A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN109493350B (en) Portrait segmentation method and device
TWI728465B (en) Method, device and electronic apparatus for image processing and storage medium thereof
Wang et al. Real-esrgan: Training real-world blind super-resolution with pure synthetic data
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
US20230080693A1 (en) Image processing method, electronic device and readable storage medium
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN111669514B (en) High dynamic range imaging method and apparatus
US20230146181A1 (en) Integrated machine learning algorithms for image filters
CN111372006B (en) High dynamic range imaging method and system for mobile terminal
CN112164011A (en) Motion image deblurring method based on self-adaptive residual error and recursive cross attention
CN112419191B (en) Image motion blur removing method based on convolution neural network
CN113344773B (en) Single picture reconstruction HDR method based on multi-level dual feedback
CN111861886B (en) Image super-resolution reconstruction method based on multi-scale feedback network
Liu et al. Face super-resolution reconstruction based on self-attention residual network
Zhang et al. Multi-branch networks for video super-resolution with dynamic reconstruction strategy
CN116152128A (en) High dynamic range multi-exposure image fusion model and method based on attention mechanism
CN113810597B (en) 2021-08-10 2022-12-13 Rapid image bokeh rendering method based on semi-predictive filtering
Zhao et al. Deep pyramid generative adversarial network with local and nonlocal similarity features for natural motion image deblurring
CN112819705A (en) Real image denoising method based on mesh structure and long-distance correlation
CN112184550B (en) Neural network training method, image fusion method, device, equipment and medium
CN117952883A (en) Backlight image enhancement method based on bilateral grid and significance guidance
Raimundo et al. LAN: Lightweight attention-based network for RAW-to-RGB smartphone image processing
Wang et al. Self-supervised multi-scale pyramid fusion networks for realistic bokeh effect rendering
CN111953888B (en) Dim light imaging method and device, computer readable storage medium and terminal equipment
CN116485654A (en) Lightweight single-image super-resolution reconstruction method combining convolutional neural network and transducer

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant