CN113810597A - Rapid image bokeh rendering method based on semi-predictive filtering - Google Patents

Rapid image bokeh rendering method based on semi-predictive filtering

Info

Publication number: CN113810597A
Authority: CN (China)
Prior art keywords: picture, filtering, module, kernel, semi
Legal status: Granted; currently Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202110914290.0A
Other languages: Chinese (zh)
Other versions: CN113810597B (en)
Inventors: 颜成钢, 陈泉, 马立栋, 郑博仑, 孙垚棋, 张继勇, 李宗鹏
Current Assignee: Hangzhou Dianzi University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Hangzhou Dianzi University
Events: application filed by Hangzhou Dianzi University; priority to CN202110914290.0A; publication of CN113810597A; application granted; publication of CN113810597B; anticipated expiration status noted

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95: Computational photography systems, e.g. light-field imaging systems
    • H04N23/951: Computational photography systems by using two or more images to influence resolution, frame rate or aspect ratio
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/048: Activation functions
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/04: Context-preserving transformations, e.g. by using an importance map
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007: Scaling based on interpolation, e.g. bilinear interpolation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

A fast image bokeh rendering method based on semi-predictive filtering comprises the following steps. First, pairs of photographs are captured with a single-lens reflex (SLR) camera under different scenes, and all pictures in the dataset are resized to 1024 × 1472 using bicubic interpolation; coordinates are then assigned to each processed full-focus picture to produce a coordinate map. Next, a fast image bokeh rendering network model based on semi-predictive filtering is constructed and trained; the network model comprises an attention module, a residual module, a semi-filter kernel module and an image generation module. Finally, the trained neural network model receives the pictures that require bokeh rendering and outputs them once the bokeh rendering is complete. The method achieves fast bokeh rendering of images while guaranteeing bokeh rendering quality, and innovatively introduces a coordinate map that assists the training of the network model and improves its ability to distinguish the important content of the input image.

Description

Rapid image bokeh rendering method based on semi-predictive filtering
Technical Field
The invention relates to a fast image bokeh rendering method based on semi-predictive filtering, and in particular to the field of bokeh effect processing based on deep learning technology.
Background
The bokeh rendering effect is generally regarded as one of the aesthetic standards in the field of photography. With existing technology it is easily achieved by a photographer using an SLR camera: the camera is set to a large-aperture shooting mode so that the uninteresting parts of the image are blurred. With the popularization of smartphones, manufacturers have tried to achieve this effect at the hardware level by adding complex hardware and extra cameras to the phone, but the high manufacturing cost is unfriendly to both vendors and consumers. Developing a software-level bokeh rendering algorithm for images has therefore become a research hotspot: such a method relies only on the computing performance of the phone, requires relatively little hardware cost, and suits most smartphones on the market. At present, most algorithms are based on deep learning and build an end-to-end network to render the bokeh effect of images. However, when a deep learning algorithm is integrated into a phone, shortening the running time is a difficult problem: running speed and rendering quality constrain each other, and how to unify them is a problem that must be considered.
Disclosure of Invention
The technical problem to be solved is as follows: aiming at the high cost of hardware-based implementation methods and the speed-versus-quality trade-off of software-based implementation methods, the invention provides a fast image bokeh rendering method based on semi-predictive filtering.
The implementation steps are as follows: the invention provides a fast image bokeh rendering method based on semi-predictive filtering, which comprises the following basic steps:
Step 1: making a data set.
Step 1.1: Capture data under different scenes with an SLR camera. The data for each scene is a pair of pictures: a full-focus picture I_org taken by the SLR, and a picture I_gt with a real bokeh rendering effect taken by the SLR using a large aperture. The full-focus picture I_org serves as the input image data during model training, and the picture I_gt with the real bokeh rendering effect serves as the reference data compared against the model output images during training.
Step 1.2: Resize all pictures of the dataset to height 1024 × width 1472 using bicubic interpolation.
Step 1.3: Make the coordinate map. Assign coordinates to the full-focus picture I_org processed in step 1.2. [The coordinate-assignment equations appear only as images in the original: BDA0003205091940000021 and BDA0003205091940000022.] X denotes the pixel coordinate along the height dimension of the picture and Y the pixel coordinate along the width dimension. The X and Y channels are combined with the full-focus picture I_org to reconstruct a 5-channel full-focus picture I_org+c, which serves as the final input picture of the network model. A data-preparation sketch follows below.
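The following is a minimal PyTorch sketch of steps 1.2 and 1.3. Since the coordinate-assignment equations survive only as images, normalizing X and Y to [0, 1] is an assumption, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def make_5ch_input(img: torch.Tensor) -> torch.Tensor:
    """Resize a full-focus picture with bicubic interpolation (step 1.2) and
    append X/Y coordinate channels (step 1.3), yielding I_org+c.

    img: float tensor of shape (3, H, W); returns (5, 1024, 1472).
    """
    target_h, target_w = 1024, 1472
    img = F.interpolate(img.unsqueeze(0), size=(target_h, target_w),
                        mode="bicubic", align_corners=False).squeeze(0)
    # X follows the height dimension, Y the width dimension, as in step 1.3;
    # normalization to [0, 1] is assumed, not taken from the patent equations.
    xs = torch.linspace(0.0, 1.0, target_h).view(-1, 1).expand(target_h, target_w)
    ys = torch.linspace(0.0, 1.0, target_w).view(1, -1).expand(target_h, target_w)
    return torch.cat([img, xs.unsqueeze(0), ys.unsqueeze(0)], dim=0)
```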
Step 2: constructing a fast image scene rendering network model based on semi-prediction filtering;
step 2.1: semi-prediction filtering-based fast graph and scene rendering task theoryAnd (6) derivation. Suppose the input is a full focus picture Iorg+cUsing significance detection algorithm to convert the full focus picture Iorg+cDivided into two parts including a significant characteristic part I in a picturefocusAnd background characteristics I of the picturedefocus. Background region picture I by utilizing semi-filtering fuzzy algorithmdefocusBlurring to obtain picture I with blurred backgroundblurThe semi-filtering fuzzy algorithm divides the salient feature part IfocusPreserving and finally obtaining the picture I with fuzzy backgroundblurAnd salient feature portion IfocusFusing to obtain the required picture I with the shot renderingbokeh. The theoretical model of the scene rendering task is formulated as follows:
Figure BDA0003205091940000023
wherein
Figure BDA0003205091940000024
Representing a saliency detection algorithm;
Figure BDA0003205091940000025
representing a semi-filtered blurring algorithm.
Step 2.2: constructing a fast image scene rendering network based on semi-prediction filtering;
the fast image scene rendering network based on semi-prediction filtering comprises an attention module, a residual error module, a semi-filtering kernel module and an image generation module. Wherein the attention module is used for detecting the input full focus picture Iorg+cFor assisting the operation of a subsequent restrictive prediction filtering module; the residual error module is used for carrying out deep feature enhancement on the input data; the semi-filter kernel module is used for generating a needed filter kernel, and is used for carrying out filter operation on an input image and blurring partial content of the image so as to generate a shot rendering effect, wherein the filter kernel consists of a network generated self-adaptive filter kernel and a small amount of Gabor filter kernels with artificially defined parameters, and the network generated self-adaptive filter kernel is used for self-adaptingBlurring an input image, wherein a Gabor filtering kernel of artificially defined parameters is used for reserving and enhancing salient region details and edge details of the image; the image generation module is used for generating a picture which needs to be filtered by using the filtering kernel generated by the half filtering kernel module.
The residual module has the following specific structure: the input feature map X_res of the residual module passes sequentially through 3 convolution layers, each with 64 convolution kernels of size 3 × 3, producing the output feature map X'_res. Finally, the output X'_res and the input X_res are added element-wise to obtain the final output feature map X_res-out of the residual module. All convolution layers are followed by a ReLU nonlinear activation function. A minimal sketch follows below.
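A PyTorch sketch of this residual module follows; same-padding (padding=1) is an assumption needed so that the element-wise skip addition is shape-valid.

```python
import torch.nn as nn

class ResidualModule(nn.Module):
    """Three 3x3 conv layers with 64 kernels, each followed by ReLU,
    plus an identity skip connection (X'_res + X_res)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        blocks = []
        for _ in range(3):
            blocks += [nn.Conv2d(channels, channels, 3, padding=1),
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*blocks)

    def forward(self, x):
        return self.body(x) + x  # X_res-out
```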
The attention module has the following specific structure: its input feature map X_att has dimensions height H × width W × channels C. The module is divided into three branches: up, middle (mid) and down. In the up branch, the input X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, producing the feature map X_up of shape HW × 64. In the mid branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape, producing the feature map X_mid of shape 64 × HW. X_up and X_mid are matrix-multiplied and activated with a Softmax function, producing the feature map X_act of shape HW × HW. In the down branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape, producing the feature map X_down of shape HW × 64. X_act and X_down are matrix-multiplied and reshaped again, producing the feature map X_final of shape H × W × 64. X_final then passes through a convolution layer with 64 kernels of size 3 × 3 and is added element-wise to the input feature map X_att, producing the final output feature map X_att-out of the attention module. All convolution layers are followed by a ReLU nonlinear activation function. A sketch follows below.
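A PyTorch sketch of this attention module (a non-local-style block) follows. The HW × HW affinity matrix is quadratic in the pixel count, which fits the network layout described later, where the attention branch runs at a reduced scale; taking C = 64 so the final skip addition is shape-valid is an assumption.

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Up/mid/down 3x3 conv branches (64 kernels each, ReLU), a Softmax
    HW x HW affinity, a closing 3x3 conv, and a skip back to the input."""
    def __init__(self, channels: int = 64):
        super().__init__()
        def conv():
            return nn.Sequential(nn.Conv2d(channels, 64, 3, padding=1),
                                 nn.ReLU(inplace=True))
        self.up, self.mid, self.down = conv(), conv(), conv()
        self.final = nn.Sequential(nn.Conv2d(64, channels, 3, padding=1),
                                   nn.ReLU(inplace=True))

    def forward(self, x):
        b, _, h, w = x.shape
        x_up = self.up(x).flatten(2).transpose(1, 2)       # B x HW x 64
        x_mid = self.mid(x).flatten(2)                     # B x 64 x HW
        x_act = torch.softmax(x_up @ x_mid, dim=-1)        # B x HW x HW
        x_down = self.down(x).flatten(2).transpose(1, 2)   # B x HW x 64
        x_final = (x_act @ x_down).transpose(1, 2).reshape(b, 64, h, w)
        return self.final(x_final) + x                     # X_att-out
```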
The semi-filter kernel module has the following specific structure: its input feature map X_filter has dimensions height H × width W × channels C. X_filter first passes through a residual module with 64 kernels to obtain the deep feature information X_deep, then sequentially through a convolution layer with 64 kernels of size 3 × 3 and a 2× upsampling layer, producing the feature map X'_deep of size 2H × 2W × 64 from which the filters are generated.

X'_deep is split along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16. X_A is used to generate the adaptive filter kernel: it passes sequentially through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function, producing the adaptive filter kernel X_adp-f of size 2H × 2W × k², where k is a predefined filter kernel size. X_B is used to generate the edge filter kernel by combining Gabor filter kernels with fixed parameters: X_B undergoes a custom filtering operation with 16 Gabor kernels with given parameters, and the 16 responses are linearly combined into the required edge filter kernel X_gabor-f of size 2H × 2W × k², used to rapidly enhance the edge information that the picture should retain. The 16 Gabor kernels cover 8 orientations, and the kernels sharing an orientation use 2 different sigma parameters, so all 16 kernels have distinct parameters. Finally, the adaptive kernel X_adp-f and the edge kernel X_gabor-f are added element-wise to obtain the final semi-filter kernel X_filter-out. A sketch follows below.
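A PyTorch sketch of this module follows, reusing the ResidualModule above. The Gabor wavelength and aspect-ratio values, the fixed depthwise Gabor bank, and the 1 × 1 convolution used for the linear combination are assumptions; the patent specifies only 8 orientations and 2 sigma values.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def gabor_kernel(ksize, sigma, theta, lam=4.0, gamma=0.5):
    """Real part of a Gabor filter; one fixed (non-learned) edge kernel.
    lam (wavelength) and gamma (aspect ratio) are assumed values."""
    half = ksize // 2
    ys, xs = torch.meshgrid(torch.arange(-half, half + 1, dtype=torch.float32),
                            torch.arange(-half, half + 1, dtype=torch.float32),
                            indexing="ij")
    x_t = xs * math.cos(theta) + ys * math.sin(theta)
    y_t = -xs * math.sin(theta) + ys * math.cos(theta)
    return torch.exp(-(x_t ** 2 + gamma ** 2 * y_t ** 2) / (2 * sigma ** 2)) \
        * torch.cos(2 * math.pi * x_t / lam)

class SemiFilterKernelModule(nn.Module):
    """Residual block, 3x3 conv, 2x upsampling, then a 48/16 channel split:
    the 48-channel part predicts a Softmax-normalized adaptive k*k kernel per
    pixel; the 16-channel part is filtered by 16 fixed Gabor kernels
    (8 orientations x 2 sigmas) and linearly combined into an edge kernel."""
    def __init__(self, k: int = 5, gabor_size: int = 5):
        super().__init__()
        self.res = ResidualModule(64)  # input assumed to have 64 channels
        self.conv = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1),
                                  nn.ReLU(inplace=True))
        self.adaptive = nn.Conv2d(48, k * k, 3, padding=1)
        bank = torch.stack([gabor_kernel(gabor_size, sigma, i * math.pi / 8)
                            for i in range(8) for sigma in (1.0, 2.0)])
        self.register_buffer("gabor", bank.unsqueeze(1))  # 16 x 1 x ks x ks
        self.combine = nn.Conv2d(16, k * k, 1)  # assumed linear combination

    def forward(self, x_filter):
        deep = self.conv(self.res(x_filter))              # X_deep
        deep = F.interpolate(deep, scale_factor=2,
                             mode="bilinear", align_corners=False)  # X'_deep
        x_a, x_b = deep.split([48, 16], dim=1)
        adp = torch.softmax(self.adaptive(x_a), dim=1)    # X_adp-f
        edges = F.conv2d(x_b, self.gabor,
                         padding=self.gabor.shape[-1] // 2, groups=16)
        return adp + self.combine(edges)                  # X_filter-out
```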
The image generation module has the following specific structure: it takes three inputs, namely the same-scale input feature map X_1, the low-scale upsampled input feature map X_2, and the semi-filter kernel X_filter-out generated by the semi-filter kernel module. X_1 passes sequentially through a convolution layer with 3 kernels of size 3 × 3 and a 2× upsampling layer, and the result is added element-wise to X_2, producing the feature map X_gen of size H × W × 3 on which the filtering operation is finally performed. The semi-filter kernel X_filter-out and the feature map X_gen then undergo the custom filter-kernel convolution, producing the final feature map X_out of size H × W × 3; X_out is the required bokeh-rendered picture. A sketch follows below.
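A sketch of the image generation module follows. Implementing the custom filter-kernel convolution as per-pixel kernel application via F.unfold is an assumption about a step the patent does not spell out.

```python
import torch.nn as nn
import torch.nn.functional as F

class ImageGenerationModule(nn.Module):
    """Fuses a same-scale feature map with an upsampled lower-scale input
    into a 3-channel image X_gen, then filters X_gen with the per-pixel
    k*k kernels from the semi-filter kernel module."""
    def __init__(self, in_channels: int = 64, k: int = 5):
        super().__init__()
        self.k = k
        self.to_rgb = nn.Conv2d(in_channels, 3, 3, padding=1)

    def forward(self, x1, x2, kernels):
        # x1: B x C x H/2 x W/2 features; x2: B x 3 x H x W;
        # kernels: B x k^2 x H x W (X_filter-out).
        up = F.interpolate(self.to_rgb(x1), scale_factor=2,
                           mode="bilinear", align_corners=False)
        x_gen = up + x2                                         # B x 3 x H x W
        b, _, h, w = x_gen.shape
        patches = F.unfold(x_gen, self.k, padding=self.k // 2)  # B x 3k^2 x HW
        patches = patches.view(b, 3, self.k * self.k, h * w)
        weights = kernels.view(b, 1, self.k * self.k, h * w)
        return (patches * weights).sum(dim=2).view(b, 3, h, w)  # X_out
```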
The complete network has the following specific structure: it is divided into 2 branches, each processing information at a different scale. The initial input of the network is the full-focus picture I_org+c generated in step 1.3. Branch 1 contains a residual module, a semi-filter kernel module and an image generation module, while branch 2 contains an attention module, a semi-filter kernel module and an image generation module. To strengthen the information correlation between the branches, the input of branch 2 consists of downsampled intermediate information of branch 1 together with downsampled full-focus picture information, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation. A structural sketch follows below.
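The following sketch wires the modules above into the two-branch layout. The specific scales and the exact cross-branch connections are assumptions consistent with the prose; FIG. 3, which defines them precisely, is not reproduced here.

```python
import torch.nn as nn
import torch.nn.functional as F

class BokehNet(nn.Module):
    """Two-branch layout from step 2.2, reusing the module sketches above.
    Branch 2 (attention-guided) runs on downsampled branch-1 features plus
    a downsampled copy of I_org+c; its output guides branch 1's generator."""
    def __init__(self, k: int = 5):
        super().__init__()
        self.stem1 = nn.Conv2d(5, 64, 3, padding=1)
        self.stem2 = nn.Conv2d(5, 64, 3, padding=1)
        self.res1 = ResidualModule(64)
        self.att2 = AttentionModule(64)
        self.kern1 = SemiFilterKernelModule(k)
        self.kern2 = SemiFilterKernelModule(k)
        self.gen1 = ImageGenerationModule(64, k)
        self.gen2 = ImageGenerationModule(64, k)

    def forward(self, i_orgc):                              # B x 5 x H x W
        def down(t, s):
            return F.interpolate(t, scale_factor=1.0 / s,
                                 mode="bilinear", align_corners=False)
        f1 = self.res1(self.stem1(down(i_orgc, 2)))         # branch 1, H/2
        f2 = self.att2(self.stem2(down(i_orgc, 4)) + down(f1, 2))  # H/4
        out2 = self.gen2(f2, down(i_orgc[:, :3], 2), self.kern2(f2))  # H/2
        guide = F.interpolate(out2, scale_factor=2,
                              mode="bilinear", align_corners=False)
        return self.gen1(f1, guide, self.kern1(f1))         # I_bokeh, H x W
```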
Step 3: Train the fast image bokeh rendering network model based on semi-predictive filtering.
The network model is trained as follows:
First, the 5-channel full-focus picture I_org+c produced in step 1.3 is fed in. Then, the saliency detection module and the restrictive prediction filtering module preserve the salient features of the image and blur the background. Finally, the loss function is used to continuously optimize the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh rendering effect in the dataset constructed in step 1.
During training, the loss function L combines an L1 term with an LS term to improve the structural similarity between the model output picture I_bokeh and the reference picture I_gt; using the backpropagation of deep learning, the difference between the model output I_bokeh and the reference I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model. The loss is expressed as:

L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)

where L1(I_bokeh, I_gt) is the reconstruction term between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure term between them:

LS(I_bokeh, I_gt) = (1/N) Σ |Sobel(I_bokeh) − Sobel(I_gt)|

where Sobel denotes gradient computation in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N is the total number of pixels of the picture, i.e. width W × height H. A sketch of this loss follows below.
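A PyTorch sketch of this loss follows; averaging over channels as well as pixels is an assumption about the normalization.

```python
import torch
import torch.nn.functional as F

def sobel(img: torch.Tensor) -> torch.Tensor:
    """Horizontal and vertical Sobel gradients of a B x C x H x W image,
    stacked along the channel dimension (2C output channels)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device)
    weight = torch.stack([kx, kx.t()]).unsqueeze(1)   # 2 x 1 x 3 x 3
    c = img.shape[1]
    return F.conv2d(img, weight.repeat(c, 1, 1, 1), padding=1, groups=c)

def bokeh_loss(i_bokeh: torch.Tensor, i_gt: torch.Tensor) -> torch.Tensor:
    """L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt), with LS comparing Sobel
    gradient maps as described in step 3."""
    return F.l1_loss(i_bokeh, i_gt) + F.l1_loss(sobel(i_bokeh), sobel(i_gt))
```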
Step 4: The trained neural network model receives the pictures requiring bokeh rendering and outputs them after the bokeh rendering is finished.
First, load the weights of the bokeh rendering network model trained in step 3 and update the parameters in the model. Then pass the resized 5-channel full-focus picture I_org+c from steps 1.2 and 1.3 into the bokeh rendering network model as input data; after passing sequentially through the saliency detection module and the restrictive prediction filtering module, it yields the model output picture I_bokeh with the bokeh rendering effect. A usage sketch follows below.
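A minimal inference sketch using the classes above; the checkpoint filename is hypothetical, not one named in the patent.

```python
import torch

model = BokehNet(k=5)
model.load_state_dict(torch.load("bokeh_net.pth", map_location="cpu"))
model.eval()

img = torch.rand(3, 768, 1024)                   # stand-in full-focus photo
with torch.no_grad():
    i_orgc = make_5ch_input(img).unsqueeze(0)    # 1 x 5 x 1024 x 1472
    i_bokeh = model(i_orgc)                      # 1 x 3 x 1024 x 1472
```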
The invention has the following beneficial effects:
1. A fast image bokeh rendering method based on semi-predictive filtering is innovatively proposed, achieving fast bokeh rendering of images while guaranteeing bokeh rendering quality.
2. A coordinate map is innovatively proposed for assisting the training of the network model and improving the network model's ability to distinguish the important content of the input image.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a flowchart of the bokeh rendering process for a single image;
FIG. 3 is a diagram of the bokeh rendering network based on semi-predictive filtering;
FIG. 4 shows the bokeh rendering effect generated for a car;
FIG. 5 shows the bokeh rendering effect generated for a street lamp.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention is first defined and explained below:
I_org: full-focus picture
I_org+c: 5-channel full-focus picture containing the coordinate-map information; the actual input of the network model
I_gt: picture with the real bokeh rendering effect
I_bokeh: model output picture with the bokeh rendering effect
FIG. 2 shows the flow of bokeh rendering for a single image.
As shown in FIG. 1, the present invention provides a fast image bokeh rendering method based on semi-predictive filtering, which comprises the following basic steps:
Step 1: Data set production.
Step 1.1: Capture data under different scenes with an SLR camera. The data for each scene is a pair of pictures: a full-focus picture I_org taken by the SLR, and a picture I_gt with a real bokeh rendering effect taken by the SLR using a large aperture. The full-focus picture I_org serves as the input image data during model training, and the picture I_gt serves as the reference data compared against the model output images during training.
Step 1.2: Resize all pictures of the dataset to 1024 × 1472 using bicubic interpolation; unifying the sizes of the dataset reduces the computation time required to train the network.
Step 1.3: Make the coordinate map. Assign coordinates to the full-focus picture I_org processed in step 1.2. [The coordinate-assignment equations appear only as images in the original.] X denotes the pixel coordinate along the height dimension of the picture and Y the pixel coordinate along the width dimension. The X and Y channels are combined with the full-focus picture I_org to reconstruct the 5-channel full-focus picture I_org+c as the final input picture of the network model.
Step 2: constructing a fast image scene rendering network model based on semi-prediction filtering;
step 2.1: and (3) carrying out theoretical derivation on a fast image scene rendering task based on semi-predictive filtering. Suppose the input is a full focus picture Iorg+cUsing significance detection algorithm to convert the full focus picture Iorg+cDivided into two parts including a significant characteristic part I in a picturefocusAnd background characteristics I of the picturedefocus. Background region picture I by utilizing semi-filtering fuzzy algorithmdefocusBlurring to obtain picture I with blurred backgroundblurThe semi-filtering fuzzy algorithm divides the salient feature part IfocusPreserving and finally obtaining the picture I with fuzzy backgroundblurAnd salient feature portion IfocusFusing to obtain the required picture I with the shot renderingbokeh. The theoretical model of the scene rendering task is formulated as follows:
Figure BDA0003205091940000073
wherein
Figure BDA0003205091940000074
Representing a saliency detection algorithm;
Figure BDA0003205091940000075
representing a semi-filtered blurring algorithm.
Step 2.2: constructing a fast image scene rendering network based on semi-prediction filtering:
the fast image scene rendering network based on semi-prediction filtering comprises an attention module, a residual error module, a semi-filtering kernel module and an image generation module. Wherein the attention module is used for detecting the input full focus picture Iorg+cFor assisting the operation of a subsequent restrictive prediction filtering module; the residual error module is used for carrying out deep feature enhancement on the input data; the semi-filtering kernel module is used for generating a required filtering kernel and is used for carrying out filtering operation on an input image and blurring partial content of a picture so as to generate a shot rendering effect, wherein the filtering kernel consists of a self-adaptive filtering kernel generated by a network and a small amount of Gabor filtering kernels manually defined with parameters, the self-adaptive filtering kernel generated by the network is used for self-adaptively blurring the input image, and the Gabor filtering kernel manually defined with parameters is used for reserving and enhancing salient region details and edge details of the image; the image generation module is used for generating a picture which needs to be filtered by using the filtering kernel generated by the half filtering kernel module.
The specific structure of the residual error module is as follows: input feature map X of residual moduleresSequentially obtaining an output feature map X 'after 3 convolution layers with the convolution kernel number of 64 and the convolution kernel size of 3X 3'res. Finally, will output X'resAnd input XresAdding element by element to obtain the final output characteristic diagram X of the residual error moduleres-out. Wherein all convolutional layers are followed by a ReLU nonlinear activation function.
Attention module specific structure: of attention modulesInput feature map XattThe dimension of (a) is height H width W channel C. The attention module is divided into three branches of up, middle mid and down, and a characteristic diagram X is inputattPerforming Reshape operation after convolution layer with convolution kernel number of 64 and convolution kernel size of 3X 3 after up branching to obtain feature diagram X with shape of HW X64up(ii) a Input feature map XattPerforming Reshape operation after convolution layers with the number of the convolution kernels passing through mid branches being 64 and the convolution kernel size being 3X 3 to obtain a feature map X with the shape of 64X HWmid(ii) a Will feature diagram XupAnd feature map XmidActivating by adopting a Softmax function after matrix multiplication is carried out to obtain a characteristic diagram X with the shape of HW and HWact(ii) a Input feature map XattAfter the convolution layer with the convolution kernel number of 64 and the convolution kernel size of 3X 3 is processed by Reshape operation, the characteristic diagram X with the shape of HW X64 is obtaineddown(ii) a Will feature diagram XactAnd feature map XdownAfter matrix multiplication, Reshape operation is carried out again to obtain a feature diagram X with the shape of H X W X64final(ii) a Characteristic diagram XfinalAfter passing through a convolution layer with the number of 3 convolution kernels being 64 and the convolution kernel size being 3X 3, the convolution layer is compared with the input feature map XattAdding element by element to obtain the final output characteristic diagram X of the attention moduleatt-out. Wherein all convolutional layers are followed by a ReLU nonlinear activation function.
The semi-filtering kernel module has the specific structure that: input feature map X of half-filter kernel modulefilterThe dimension of (a) is height H width W channel C. Input feature map XfilterObtaining deep characteristic information X through a residual error module with 64 filtering kernelsdeep(ii) a Sequentially passing through convolution layers with convolution kernel number of 64 and convolution kernel size of 3X 3 and up-sampling layers with multiple of 2 to obtain a feature map X 'of required generated filtering'deep2H x 2W x 64 in size;
will feature picture X'deepDivided by channel dimensions into a feature pattern X of size 2H X2W X48AAnd a feature pattern X of size 2H X2W X16B(ii) a Characteristic diagram XAFor generating adaptive filtering kernels, i.e. feature maps XASequentially passes through a convolution kernel by the number ofk2Convolution layer with convolution kernel size of 3 x 3 and Softmax activation function to obtain size of 2H x 2W x k2Adaptive filtering kernel Xadp-fWhere k is a predefined filter kernel size; characteristic diagram XBEdge filter kernels Gabor filter kernels for combined generation of fixed filter kernel parameters, i.e. feature maps XBPerforming custom filtering operation with 16 Gabor filtering kernels with given parameters, and performing linear combination on the 16 Gabor filtering kernels to obtain the required size of 2H x 2W x k2Edge filtering kernel Xgabor-fThe method is used for rapidly enhancing the edge information of the picture to be reserved, wherein 16 Gabor filtering kernels comprise 8 directions, and the Gabor filtering kernels in the same direction comprise 2 Sigma parameters, so that the parameters of the 16 Gabor filtering kernels are different; finally, the adaptive filter kernel X is processedadp-fAnd an edge filtering kernel Xgabot-fAdding element by element to obtain the final needed semi-filter kernel Xfilter-out
The image generation module has the specific structure that: the image generation module comprises three inputs, and an input feature map X with the same scale1Low-scale up-sampled input feature map X2Input half-filter kernel X generated by half-filter kernel modulefilter-out. Input feature map X1Sequentially passing through convolution layers with convolution kernel number of 3 and convolution kernel size of 3X 3 and up-sampling layer with multiple of 2, and outputting the result and inputting characteristic diagram X2Adding element by element to obtain a feature diagram X with the size H W3 and finally needing filtering operationgen(ii) a Half-filtered kernel Xfilter-outAnd feature map XgenPerforming convolution operation of the custom filtering kernel to obtain a final feature diagram X with the size of H X W X3outFeature map XoutThe picture is the required picture which is processed by the shot rendering.
The specific structure of the complete network: the complete network is divided into 2 branches, and each branch processes information with different scales; the initial input to the network is the full focus picture I generated in step 1.3org+c(ii) a Branch 1 contains a residual module, a half-filter kernel module and an image generation module, while branch 2 contains an attention module, a half-filter kernel module and an image generation module; to strengthenWith the information relevance between the branches, the input of the branch 2 is composed of the intermediate information and the full focus picture information of the branch 1 which are all subjected to down sampling, and the output result of the branch 2 is fed back to the image generation module of the branch 1 for guiding the operation of the image generation module.
FIG. 3 is a diagram of the bokeh rendering network based on semi-predictive filtering.
Step 3: Train the fast image bokeh rendering network model based on semi-predictive filtering.
The network model is trained as follows:
First, the 5-channel full-focus picture I_org+c produced in step 1.3 is fed in. Then, the saliency detection module and the restrictive prediction filtering module preserve the salient features of the image and blur the background. Finally, the loss function is used to continuously optimize the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh rendering effect in the dataset constructed in step 1.
During training, the loss function L combines an L1 term with an LS term to improve the structural similarity between the model output picture I_bokeh and the reference picture I_gt; using the backpropagation of deep learning, the difference between the model output I_bokeh and the reference I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model. The loss is expressed as:

L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)

where L1(I_bokeh, I_gt) is the reconstruction term between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure term between them:

LS(I_bokeh, I_gt) = (1/N) Σ |Sobel(I_bokeh) − Sobel(I_gt)|

where Sobel denotes gradient computation in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N is the total number of pixels of the picture, i.e. width W × height H.
Step 4: The trained neural network model receives the pictures requiring bokeh rendering and outputs them after the bokeh rendering is finished.
First, load the weights of the bokeh rendering network model trained in step 3 and update the parameters in the model. Then pass the resized 5-channel full-focus picture I_org+c into the bokeh rendering network model as input data; after passing sequentially through the saliency detection module and the restrictive prediction filtering module, it yields the model output picture I_bokeh with the bokeh rendering effect.
FIG. 4 shows the bokeh rendering effect generated for a car;
FIG. 5 shows the bokeh rendering effect generated for a street lamp.

Claims (9)

1. A fast image bokeh rendering method based on semi-predictive filtering, characterized by comprising the following steps:
step 1: making a data set;
step 2: constructing a fast image bokeh rendering network model based on semi-predictive filtering;
step 3: training the fast image bokeh rendering network model based on semi-predictive filtering;
step 4: receiving, by the trained neural network model, the pictures requiring bokeh rendering, and outputting the pictures after the bokeh rendering is finished.
2. The fast image bokeh rendering method based on semi-predictive filtering according to claim 1, characterized in that the specific method of step 1 is as follows:
step 1.1: capturing data under different scenes with an SLR camera, the data of each scene being a pair of pictures, namely a full-focus picture I_org taken by the SLR and a picture I_gt with a real bokeh rendering effect taken by the SLR using a large aperture; the full-focus picture I_org serves as the input image data in the model training process, and the picture I_gt with the real bokeh rendering effect serves as the reference data compared with the model output images in the model training process;
step 1.2: resizing all pictures of the data set to a size of height 1024 × width 1472 by bicubic interpolation;
step 1.3: making the coordinate map: assigning coordinates to the full-focus picture I_org processed in step 1.2 [the coordinate-assignment equations appear only as images in the original], where X denotes the pixel coordinate along the height dimension of the picture and Y the pixel coordinate along the width dimension; the X and Y information is combined with the full-focus picture I_org to reconstruct the 5-channel full-focus picture I_org+c as the final input picture of the network model.
3. The fast image bokeh rendering method based on semi-predictive filtering according to claim 2, characterized in that the specific method of step 2 is as follows:
step 2.1: theoretical derivation of the bokeh rendering task based on semi-predictive filtering: suppose the input is a full-focus picture I_org+c; a saliency detection algorithm Φ divides the full-focus picture I_org+c into two parts, the salient-feature part I_focus of the picture and the background part I_defocus; a semi-filtering blur algorithm Ψ blurs the background part I_defocus to obtain the picture I_blur with a blurred background while preserving the salient-feature part I_focus; finally, the blurred background I_blur and the salient-feature part I_focus are fused to obtain the required bokeh-rendered picture I_bokeh; the theoretical model of the bokeh rendering task is formulated as:
I_bokeh = I_focus + Ψ(I_defocus), where (I_focus, I_defocus) = Φ(I_org+c),
Φ denoting the saliency detection algorithm and Ψ the semi-filtering blur algorithm;
step 2.2: constructing the fast image bokeh rendering network based on semi-predictive filtering;
the fast image bokeh rendering network based on semi-predictive filtering comprises an attention module, a residual module, a semi-filter kernel module and an image generation module; the attention module detects the salient content of the input full-focus picture I_org+c and assists the operation of the subsequent restrictive prediction filtering module; the residual module performs deep feature enhancement on the input data; the semi-filter kernel module generates the required filter kernels, used to filter the input image and blur part of the picture content so as to produce the bokeh rendering effect, each kernel consisting of a network-generated adaptive filter kernel and a small number of Gabor filter kernels with manually defined parameters, the network-generated adaptive kernel adaptively blurring the input image, and the Gabor kernels with manually defined parameters preserving and enhancing the salient-region details and edge details of the image; the image generation module generates the picture to be filtered using the filter kernels generated by the semi-filter kernel module;
the complete network is divided into 2 branches, each branch processing information at a different scale; the initial input of the network is the full-focus picture I_org+c generated in step 1.3; branch 1 contains a residual module, a semi-filter kernel module and an image generation module, while branch 2 contains an attention module, a semi-filter kernel module and an image generation module; to strengthen the information correlation between different branches, the input of branch 2 consists of downsampled intermediate information of branch 1 together with downsampled full-focus picture information, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation.
4. The fast image bokeh rendering method based on semi-predictive filtering according to claim 3, characterized in that the residual module has the following specific structure: the input feature map X_res of the residual module passes sequentially through 3 convolution layers, each with 64 convolution kernels of size 3 × 3, producing the output feature map X'_res; finally, the output X'_res and the input X_res are added element-wise to obtain the final output feature map X_res-out of the residual module; all convolution layers are followed by a ReLU nonlinear activation function.
5. The fast image bokeh rendering method based on semi-predictive filtering according to claim 4, characterized in that the attention module has the following specific structure: the input feature map X_att of the attention module has dimensions height H × width W × channels C; the module is divided into three branches, up, middle (mid) and down; in the up branch, the input feature map X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, producing the feature map X_up of shape HW × 64; in the mid branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape, producing the feature map X_mid of shape 64 × HW; X_up and X_mid are matrix-multiplied and activated with a Softmax function, producing the feature map X_act of shape HW × HW; in the down branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape, producing the feature map X_down of shape HW × 64; X_act and X_down are matrix-multiplied and reshaped again, producing the feature map X_final of shape H × W × 64; X_final then passes through a convolution layer with 64 kernels of size 3 × 3 and is added element-wise to the input feature map X_att, producing the final output feature map X_att-out of the attention module; all convolution layers are followed by a ReLU nonlinear activation function.
6. The fast image bokeh rendering method based on semi-predictive filtering according to claim 5, characterized in that the semi-filter kernel module has the following specific structure: the input feature map X_filter of the semi-filter kernel module has dimensions height H × width W × channels C; X_filter first passes through a residual module with 64 kernels to obtain the deep feature information X_deep, then sequentially through a convolution layer with 64 kernels of size 3 × 3 and a 2× upsampling layer, producing the feature map X'_deep of size 2H × 2W × 64 from which the filters are generated;
X'_deep is split along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16; X_A is used to generate the adaptive filter kernel: it passes sequentially through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function, producing the adaptive filter kernel X_adp-f of size 2H × 2W × k², where k is a predefined filter kernel size; X_B is used to generate the edge filter kernel by combining Gabor filter kernels with fixed parameters: X_B undergoes a custom filtering operation with 16 Gabor kernels with given parameters, and the 16 responses are linearly combined into the required edge filter kernel X_gabor-f of size 2H × 2W × k², used to rapidly enhance the edge information that the picture should retain, wherein the 16 Gabor kernels cover 8 orientations and the kernels sharing an orientation use 2 different sigma parameters, so the parameters of the 16 Gabor kernels all differ; finally, the adaptive kernel X_adp-f and the edge kernel X_gabor-f are added element-wise to obtain the final semi-filter kernel X_filter-out.
7. The fast image bokeh rendering method based on semi-predictive filtering according to claim 6, characterized in that the image generation module has the following specific structure: the image generation module takes three inputs, namely the same-scale input feature map X_1, the low-scale upsampled input feature map X_2, and the semi-filter kernel X_filter-out generated by the semi-filter kernel module; X_1 passes sequentially through a convolution layer with 3 kernels of size 3 × 3 and a 2× upsampling layer, and the result is added element-wise to X_2, producing the feature map X_gen of size H × W × 3 on which the filtering operation is finally performed; the semi-filter kernel X_filter-out and the feature map X_gen undergo the custom filter-kernel convolution, producing the final feature map X_out of size H × W × 3; X_out is the required bokeh-rendered picture.
8. The fast image bokeh rendering method based on semi-predictive filtering according to any one of claims 3-7, characterized in that the specific method of step 3 is as follows:
the network model is trained as follows:
first, the 5-channel full-focus picture I_org+c produced in step 1.3 is fed in; then, the saliency detection module and the restrictive prediction filtering module preserve the salient features of the image and blur the background; finally, the loss function is used to continuously optimize the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh rendering effect in the dataset constructed in step 1;
during training, the loss function L combines an L1 term with an LS term to improve the structural similarity between the model output picture I_bokeh and the reference picture I_gt, and the backpropagation of deep learning continuously reduces the difference between the model output I_bokeh and the reference I_gt, thereby optimizing the bokeh-rendered picture I_bokeh output by the model; it is expressed as:
L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)
where L1(I_bokeh, I_gt) is the reconstruction term between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure term between them:
LS(I_bokeh, I_gt) = (1/N) Σ |Sobel(I_bokeh) − Sobel(I_gt)|
where Sobel denotes gradient computation in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N is the total number of pixels of the picture, i.e. width W × height H.
9. The fast image bokeh rendering method based on semi-predictive filtering according to claim 8, characterized in that the specific method of step 4 is as follows:
loading the weights of the bokeh rendering network model trained in step 3 and updating the parameters in the model; then passing the resized 5-channel full-focus picture I_org+c into the bokeh rendering network model as input data, which passes sequentially through the saliency detection module and the restrictive prediction filtering module to obtain the model output picture I_bokeh with the bokeh rendering effect.
CN202110914290.0A 2021-08-10 2021-08-10 Rapid image bokeh rendering method based on semi-predictive filtering Active CN113810597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110914290.0A CN113810597B (en) Rapid image bokeh rendering method based on semi-predictive filtering

Publications (2)

Publication Number Publication Date
CN113810597A true CN113810597A (en) 2021-12-17
CN113810597B CN113810597B (en) 2022-12-13

Family

ID=78893425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110914290.0A Active CN113810597B (en) Rapid image bokeh rendering method based on semi-predictive filtering

Country Status (1)

Country Link
CN (1) CN113810597B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023226479A1 (en) * 2022-05-26 2023-11-30 北京京东尚科信息技术有限公司 Deep rendering model training method and apparatus, and target rendering method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665494A (en) * 2017-03-27 2018-10-16 北京中科视维文化科技有限公司 Depth of field real-time rendering method based on quick guiding filtering
US20190130532A1 (en) * 2017-11-01 2019-05-02 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image-processing method, apparatus and device
CN112073632A (en) * 2020-08-11 2020-12-11 联想(北京)有限公司 Image processing method, apparatus and storage medium
CN112184586A (en) * 2020-09-29 2021-01-05 中科方寸知微(南京)科技有限公司 Method and system for rapidly blurring monocular visual image background based on depth perception
US20210166350A1 (en) * 2018-07-17 2021-06-03 Xi'an Jiaotong University Fusion network-based method for image super-resolution and non-uniform motion deblurring

Also Published As

Publication number Publication date
CN113810597B (en) 2022-12-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant