CN113810597A - Rapid image and scene rendering method based on semi-prediction filtering - Google Patents
- Publication number: CN113810597A (application CN202110914290.0A)
- Authority
- CN
- China
- Legal status: Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/95—Computational photography systems, e.g. light-field imaging systems
- H04N23/951—Computational photography systems, e.g. light-field imaging systems by using two or more images to influence resolution, frame rate or aspect ratio
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
Abstract
A fast image bokeh rendering method based on semi-prediction filtering comprises the following steps. First, pairs of pictures of different scenes are captured with a single-lens reflex (SLR) camera, every picture in the data set is interpolated to a size of 1024 × 1472 by bicubic interpolation, and coordinate values are assigned to the processed full-focus pictures to produce a coordinate map. Next, a fast image bokeh rendering network model based on semi-prediction filtering is constructed and trained; the network model comprises an attention module, a residual module, a semi-filter kernel module and an image generation module. Finally, the trained neural network model receives the pictures that require bokeh rendering and outputs them once bokeh rendering is finished. The method achieves fast bokeh rendering of an image while guaranteeing bokeh rendering quality, and innovatively introduces a coordinate map that assists the training of the network model and improves its ability to distinguish the important content of the input image.
Description
Technical Field
The invention relates to a fast image bokeh rendering method based on semi-prediction filtering, in particular in the field of bokeh-effect processing based on deep-learning technology.
Background
The bokeh rendering effect is generally regarded as one of the aesthetic standards in the field of photography. With existing technology it is easily achieved by a photographer using a single-lens reflex camera: the photographer sets the camera to a large-aperture shooting mode so that the uninteresting parts of the image are blurred. With the popularization of the smartphone, manufacturers have tried to achieve the bokeh effect at the hardware level by adding complex hardware and cameras to the phone, but the high manufacturing cost is unfriendly to both vendors and consumers. Bokeh rendering algorithms developed at the software level have therefore become a research hotspot: such methods rely on the computing performance of the phone itself, require comparatively little hardware cost, and suit most smartphones on the market. At present most algorithms are based on deep learning, building an end-to-end network to render the bokeh effect of an image. However, when a deep-learning algorithm is integrated into a phone, shortening the running time is a hard problem; running speed and rendering quality constrain each other, and how to unify them is a question that must be considered.
Disclosure of Invention
The technical problem to be solved is as follows: aiming at the high cost of hardware-based implementations and the competing demands of running speed and rendering quality in software-based implementations, the invention provides a fast image bokeh rendering method based on semi-prediction filtering.
The implementation steps are as follows: the invention provides a fast image bokeh rendering method based on semi-prediction filtering, which comprises the following basic steps:
Step 1: making a data set.
Step 1.1: pairs of pictures of different scenes are captured with a single-lens reflex camera: a full-focus picture I_org taken by the SLR camera, and a picture I_gt with a real bokeh rendering effect taken by the SLR camera with a large aperture. The full-focus picture I_org serves as the input image data during model training; the picture I_gt with the real bokeh effect serves as the comparison data against which the model output images are compared during training.
Step 1.2: all pictures of the data set are interpolated to a size of 1024 × 1472 using bicubic interpolation.
Step 1.3: making the coordinate map. Coordinate values are assigned to the full-focus picture I_org processed in step 1.2, where X denotes the pixel coordinate along the height dimension of the picture and Y denotes the pixel coordinate along the width dimension. The X and Y information is combined with the full-focus picture I_org to reconstruct a 5-channel full-focus picture I_org+c, which is the final input picture of the network model.
Step 2: constructing a fast image scene rendering network model based on semi-prediction filtering;
step 2.1: semi-prediction filtering-based fast graph and scene rendering task theoryAnd (6) derivation. Suppose the input is a full focus picture Iorg+cUsing significance detection algorithm to convert the full focus picture Iorg+cDivided into two parts including a significant characteristic part I in a picturefocusAnd background characteristics I of the picturedefocus. Background region picture I by utilizing semi-filtering fuzzy algorithmdefocusBlurring to obtain picture I with blurred backgroundblurThe semi-filtering fuzzy algorithm divides the salient feature part IfocusPreserving and finally obtaining the picture I with fuzzy backgroundblurAnd salient feature portion IfocusFusing to obtain the required picture I with the shot renderingbokeh. The theoretical model of the scene rendering task is formulated as follows:
Step 2.2: constructing a fast image scene rendering network based on semi-prediction filtering;
the fast image scene rendering network based on semi-prediction filtering comprises an attention module, a residual error module, a semi-filtering kernel module and an image generation module. Wherein the attention module is used for detecting the input full focus picture Iorg+cFor assisting the operation of a subsequent restrictive prediction filtering module; the residual error module is used for carrying out deep feature enhancement on the input data; the semi-filter kernel module is used for generating a needed filter kernel, and is used for carrying out filter operation on an input image and blurring partial content of the image so as to generate a shot rendering effect, wherein the filter kernel consists of a network generated self-adaptive filter kernel and a small amount of Gabor filter kernels with artificially defined parameters, and the network generated self-adaptive filter kernel is used for self-adaptingBlurring an input image, wherein a Gabor filtering kernel of artificially defined parameters is used for reserving and enhancing salient region details and edge details of the image; the image generation module is used for generating a picture which needs to be filtered by using the filtering kernel generated by the half filtering kernel module.
The specific structure of the residual module is as follows: the input feature map X_res of the residual module passes through 3 convolution layers, each with 64 convolution kernels of size 3 × 3, to obtain the output feature map X'_res. Finally, the output X'_res and the input X_res are added element-wise to obtain the final output feature map X_res-out of the residual module. All convolution layers are followed by a ReLU nonlinear activation function.
Specific structure of the attention module: the input feature map X_att of the attention module has dimensions height H × width W × channels C. The attention module is divided into three branches, up, mid and down. In the up branch, the input feature map X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_up of shape HW × 64. In the mid branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_mid of shape 64 × HW. The feature maps X_up and X_mid are matrix-multiplied and then activated with a Softmax function to obtain a feature map X_act of shape HW × HW. In the down branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_down of shape HW × 64. The feature maps X_act and X_down are matrix-multiplied and reshaped again to obtain a feature map X_final of shape H × W × 64. After X_final passes through a convolution layer with 64 kernels of size 3 × 3, it is added element-wise to the input feature map X_att to obtain the final output feature map X_att-out of the attention module. All convolution layers are followed by a ReLU nonlinear activation function.
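The branch arithmetic described above can be sketched in NumPy as follows. For brevity the 3 × 3 convolutions are replaced by per-pixel projection matrices and the final convolution plus residual addition is omitted, so this is an illustrative assumption rather than the patented layer configuration:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_module(x_att, w_up, w_mid, w_down):
    # x_att: H x W x C feature map; w_*: C x 64 projection matrices standing in
    # for the up/mid/down convolution branches (an illustrative simplification).
    h, w, c = x_att.shape
    flat = x_att.reshape(h * w, c)
    x_up = flat @ w_up                       # HW x 64
    x_mid = (flat @ w_mid).T                 # 64 x HW
    x_act = softmax(x_up @ x_mid, axis=-1)   # HW x HW affinity map
    x_down = flat @ w_down                   # HW x 64
    return (x_act @ x_down).reshape(h, w, 64)  # H x W x 64, i.e. X_final

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 8))
ws = [rng.normal(size=(8, 64)) for _ in range(3)]
out = attention_module(x, *ws)
print(out.shape)  # (4, 5, 64)
```

The HW × HW matrix is the usual non-local affinity map: each output pixel becomes a softmax-weighted mixture of all input pixels, which is what lets the module highlight globally salient content.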
Specific structure of the semi-filter kernel module: the input feature map X_filter of the semi-filter kernel module has dimensions height H × width W × channels C. The input feature map X_filter passes through a residual module with 64 filter kernels to obtain the deep feature information X_deep; it then passes through a convolution layer with 64 kernels of size 3 × 3 and an up-sampling layer with factor 2 to obtain the feature map X'_deep from which the filters are generated, of size 2H × 2W × 64;
The feature map X'_deep is split along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16. The feature map X_A generates the adaptive filter kernels: X_A passes through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function to obtain the adaptive filter kernel X_adp-f of size 2H × 2W × k², where k is the predefined filter-kernel size. The feature map X_B generates the edge filter kernels by combining Gabor filter kernels with fixed parameters: X_B undergoes a custom filtering operation with 16 Gabor filter kernels with given parameters, and the 16 Gabor kernels are linearly combined to obtain the required edge filter kernel X_gabor-f of size 2H × 2W × k², used to rapidly enhance the edge information that the picture must retain. The 16 Gabor filter kernels cover 8 orientations, and the kernels of the same orientation use 2 different Sigma parameters, so the parameters of all 16 Gabor kernels differ. Finally, the adaptive filter kernel X_adp-f and the edge filter kernel X_gabor-f are added element-wise to obtain the final required semi-filter kernel X_filter-out.
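The Gabor bank described above (16 kernels: 8 orientations × 2 Sigma values) can be sketched as follows; the wavelength, aspect-ratio and phase parameters are assumed defaults, since their exact values are not listed in this text:

```python
import numpy as np

def gabor_kernel(k, theta, sigma, lam=4.0, gamma=0.5, psi=0.0):
    # Standard Gabor filter kernel of size k x k; lam/gamma/psi defaults are
    # illustrative assumptions, not the patent's exact parameters.
    half = k // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    xr = xs * np.cos(theta) + ys * np.sin(theta)
    yr = -xs * np.sin(theta) + ys * np.cos(theta)
    return np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam + psi)

def gabor_bank(k=5, sigmas=(1.0, 2.0)):
    # 8 orientations x 2 sigma values = 16 kernels with pairwise-different parameters.
    thetas = [i * np.pi / 8 for i in range(8)]
    return np.stack([gabor_kernel(k, t, s) for t in thetas for s in sigmas])

bank = gabor_bank()
print(bank.shape)  # (16, 5, 5)

# A per-pixel edge kernel X_gabor-f would then be a linear combination of these
# 16 fixed kernels, with combination weights taken from the 16-channel map X_B.
weights = np.full(16, 1 / 16)
combined = np.tensordot(weights, bank, axes=1)  # one combined 5 x 5 kernel
print(combined.shape)  # (5, 5)
```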
Specific structure of the image generation module: the image generation module has three inputs: an input feature map X_1 at the same scale, a low-scale up-sampled input feature map X_2, and the input semi-filter kernel X_filter-out generated by the semi-filter kernel module. The input feature map X_1 passes through a convolution layer with 3 kernels of size 3 × 3 and an up-sampling layer with factor 2, and the output is added element-wise to the input feature map X_2 to obtain the feature map X_gen of size H × W × 3 on which the filtering operation is finally performed. The semi-filter kernel X_filter-out and the feature map X_gen undergo the custom filter-kernel convolution operation to obtain the final feature map X_out of size H × W × 3; the feature map X_out is the required bokeh-rendered picture.
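The custom filter-kernel convolution above applies a different k × k kernel at every pixel of the picture to be filtered. A naive NumPy sketch (function names are illustrative, and a practical implementation would vectorize the loops):

```python
import numpy as np

def apply_per_pixel_kernels(img, kernels, k):
    # img: H x W x 3; kernels: H x W x k*k, one k x k filter per pixel
    # (the "custom filter-kernel convolution" of the image generation module).
    h, w, c = img.shape
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + k, j:j + k, :]          # k x k x 3 neighbourhood
            kern = kernels[i, j].reshape(k, k, 1)        # this pixel's kernel
            out[i, j] = (patch * kern).sum(axis=(0, 1))  # filtered pixel
    return out

# With uniform kernels this reduces to a k x k box blur everywhere.
k = 3
img = np.random.default_rng(1).random((6, 6, 3))
uniform = np.full((6, 6, k * k), 1 / (k * k))
blurred = apply_per_pixel_kernels(img, uniform, k)
print(blurred.shape)  # (6, 6, 3)
```

Spatially varying kernels are what let one pass blur the background heavily while leaving salient pixels sharp: a kernel close to a delta function preserves its pixel, a broad kernel blurs it.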
The specific structure of the complete network: the complete network is divided into 2 branches, each processing information at a different scale. The initial input of the network is the full-focus picture I_org+c generated in step 1.3. Branch 1 contains a residual module, a semi-filter kernel module and an image generation module, while branch 2 contains an attention module, a semi-filter kernel module and an image generation module. To strengthen the information correlation between the branches, the input of branch 2 consists of the intermediate information of branch 1 and the full-focus picture information, both down-sampled, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation.
Step 3: training the fast image bokeh rendering network model based on semi-prediction filtering.
The network model is trained as follows:
First, the 5-channel full-focus picture I_org+c produced in step 1.3 is input. Then the saliency-detection module and the semi-prediction filtering module preserve the salient features of the image and blur the background. Finally, the loss function is used to continuously optimize the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh rendering effect in the data set constructed in step 1.
During training, the loss function L combines an L1 function and an LS function to improve the structural similarity between the model output picture I_bokeh and the comparison picture I_gt. Using the back-propagation of deep learning, the difference between the model output picture I_bokeh and the comparison picture I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model. This is specifically expressed as:
L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)
where L1(I_bokeh, I_gt) denotes the reconstruction loss between the bokeh-rendered picture I_bokeh output by the model and the comparison picture I_gt, and LS(I_bokeh, I_gt) denotes the structural loss between the bokeh-rendered picture I_bokeh output by the model and the comparison picture I_gt, expressed as follows:
Here Sobel denotes computing the horizontal and vertical gradients of a picture, used to extract the contour structure of the picture content, and N denotes the total number of pixels in the picture, i.e. the picture width W multiplied by the height H.
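The explicit LS formula is likewise not reproduced in this text. The sketch below (single-channel pictures for brevity) assumes LS is an L1 distance between Sobel gradient maps, consistent with the Sobel description above but an assumption as to the exact form:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2_same(img, kern):
    # Naive 'same' 2-D filtering for the 3 x 3 Sobel kernels (edge padding).
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = (padded[i:i + 3, j:j + 3] * kern).sum()
    return out

def l1_loss(a, b):
    return np.abs(a - b).mean()

def ls_loss(a, b):
    # Assumed form of LS: L1 distance between Sobel gradient magnitudes,
    # penalizing differences in contour structure between the two pictures.
    ga = np.abs(conv2_same(a, SOBEL_X)) + np.abs(conv2_same(a, SOBEL_Y))
    gb = np.abs(conv2_same(b, SOBEL_X)) + np.abs(conv2_same(b, SOBEL_Y))
    return l1_loss(ga, gb)

def total_loss(i_bokeh, i_gt):
    # L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt), averaged over the N pixels.
    return l1_loss(i_bokeh, i_gt) + ls_loss(i_bokeh, i_gt)

a = np.random.default_rng(2).random((8, 8))
print(total_loss(a, a))  # 0.0 when output equals ground truth
```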
Step 4: the trained neural network model receives the pictures that require bokeh rendering and outputs the pictures once bokeh rendering is finished;
and (3) loading the weight of the shot rendering network model trained in the step (2) and updating parameters in the model. Secondly, the full focus picture I with the modified size in step 1.2 is takenorg+cThe input data is transmitted into a shot rendering network model and sequentially passes through a significance detection module and a restrictive prediction filtering module to obtain a model output picture I with a shot rendering effectbokeh。
The invention has the following beneficial effects:
1. A fast image bokeh rendering method based on semi-prediction filtering is innovatively proposed, achieving fast bokeh rendering of an image while guaranteeing bokeh rendering quality.
2. A coordinate map is innovatively proposed for assisting the training of the network model and improving its ability to distinguish the important content of the input image.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a flowchart of the bokeh rendering process for a single image;
FIG. 3 is a diagram of the bokeh rendering network based on semi-prediction filtering;
FIG. 4 shows the bokeh rendering effect generated for an automobile;
FIG. 5 shows the bokeh rendering effect generated for a streetlight.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The notation of the invention is first defined and explained below:
I_org: full-focus picture
I_org+c: the 5-channel full-focus picture containing coordinate-map information; the actual input of the network model
I_gt: picture with the real bokeh rendering effect
I_bokeh: model output picture with the bokeh rendering effect
FIG. 2 is a flowchart of the bokeh rendering process for a single image.
As shown in FIG. 1, the present invention provides a fast image bokeh rendering method based on semi-prediction filtering, which comprises the following basic steps:
Step 1: data set production.
Step 1.1: pairs of pictures of different scenes are captured with a single-lens reflex camera: a full-focus picture I_org taken by the SLR camera, and a picture I_gt with a real bokeh rendering effect taken by the SLR camera with a large aperture. The full-focus picture I_org serves as the input image data during model training; the picture I_gt with the real bokeh effect serves as the comparison data against which the model output images are compared during training.
Step 1.2: all pictures of the data set are interpolated to a size of 1024 × 1472 using bicubic interpolation; unifying the picture sizes of the data set reduces the computation time required to train the network.
Step 1.3: making the coordinate map. Coordinate values are assigned to the full-focus picture I_org processed in step 1.2, where X denotes the pixel coordinate along the height dimension of the picture and Y denotes the pixel coordinate along the width dimension. The X and Y information is combined with the full-focus picture I_org to reconstruct a 5-channel full-focus picture I_org+c, which is the final input picture of the network model.
Step 2: constructing a fast image scene rendering network model based on semi-prediction filtering;
step 2.1: and (3) carrying out theoretical derivation on a fast image scene rendering task based on semi-predictive filtering. Suppose the input is a full focus picture Iorg+cUsing significance detection algorithm to convert the full focus picture Iorg+cDivided into two parts including a significant characteristic part I in a picturefocusAnd background characteristics I of the picturedefocus. Background region picture I by utilizing semi-filtering fuzzy algorithmdefocusBlurring to obtain picture I with blurred backgroundblurThe semi-filtering fuzzy algorithm divides the salient feature part IfocusPreserving and finally obtaining the picture I with fuzzy backgroundblurAnd salient feature portion IfocusFusing to obtain the required picture I with the shot renderingbokeh. The theoretical model of the scene rendering task is formulated as follows:
Step 2.2: constructing the fast image bokeh rendering network based on semi-prediction filtering:
The fast image bokeh rendering network based on semi-prediction filtering comprises an attention module, a residual module, a semi-filter kernel module and an image generation module. The attention module detects the salient content of the input full-focus picture I_org+c and assists the operation of the subsequent semi-prediction filtering module. The residual module performs deep feature enhancement on the input data. The semi-filter kernel module generates the required filter kernels, which filter the input image and blur part of the picture content so as to produce the bokeh rendering effect; the filter kernel consists of a network-generated adaptive filter kernel and a small number of Gabor filter kernels with manually defined parameters, where the network-generated adaptive filter kernel adaptively blurs the input image and the manually parameterized Gabor filter kernels preserve and enhance the salient-region details and edge details of the image. The image generation module generates the picture to be filtered and filters it with the kernels produced by the semi-filter kernel module.
The specific structure of the residual module is as follows: the input feature map X_res of the residual module passes through 3 convolution layers, each with 64 convolution kernels of size 3 × 3, to obtain the output feature map X'_res. Finally, the output X'_res and the input X_res are added element-wise to obtain the final output feature map X_res-out of the residual module. All convolution layers are followed by a ReLU nonlinear activation function.
Specific structure of the attention module: the input feature map X_att of the attention module has dimensions height H × width W × channels C. The attention module is divided into three branches, up, mid and down. In the up branch, the input feature map X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_up of shape HW × 64. In the mid branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_mid of shape 64 × HW. The feature maps X_up and X_mid are matrix-multiplied and then activated with a Softmax function to obtain a feature map X_act of shape HW × HW. In the down branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_down of shape HW × 64. The feature maps X_act and X_down are matrix-multiplied and reshaped again to obtain a feature map X_final of shape H × W × 64. After X_final passes through a convolution layer with 64 kernels of size 3 × 3, it is added element-wise to the input feature map X_att to obtain the final output feature map X_att-out of the attention module. All convolution layers are followed by a ReLU nonlinear activation function.
Specific structure of the semi-filter kernel module: the input feature map X_filter of the semi-filter kernel module has dimensions height H × width W × channels C. The input feature map X_filter passes through a residual module with 64 filter kernels to obtain the deep feature information X_deep; it then passes through a convolution layer with 64 kernels of size 3 × 3 and an up-sampling layer with factor 2 to obtain the feature map X'_deep from which the filters are generated, of size 2H × 2W × 64;
The feature map X'_deep is split along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16. The feature map X_A generates the adaptive filter kernels: X_A passes through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function to obtain the adaptive filter kernel X_adp-f of size 2H × 2W × k², where k is the predefined filter-kernel size. The feature map X_B generates the edge filter kernels by combining Gabor filter kernels with fixed parameters: X_B undergoes a custom filtering operation with 16 Gabor filter kernels with given parameters, and the 16 Gabor kernels are linearly combined to obtain the required edge filter kernel X_gabor-f of size 2H × 2W × k², used to rapidly enhance the edge information that the picture must retain. The 16 Gabor filter kernels cover 8 orientations, and the kernels of the same orientation use 2 different Sigma parameters, so the parameters of all 16 Gabor kernels differ. Finally, the adaptive filter kernel X_adp-f and the edge filter kernel X_gabor-f are added element-wise to obtain the final required semi-filter kernel X_filter-out.
Specific structure of the image generation module: the image generation module has three inputs: an input feature map X_1 at the same scale, a low-scale up-sampled input feature map X_2, and the input semi-filter kernel X_filter-out generated by the semi-filter kernel module. The input feature map X_1 passes through a convolution layer with 3 kernels of size 3 × 3 and an up-sampling layer with factor 2, and the output is added element-wise to the input feature map X_2 to obtain the feature map X_gen of size H × W × 3 on which the filtering operation is finally performed. The semi-filter kernel X_filter-out and the feature map X_gen undergo the custom filter-kernel convolution operation to obtain the final feature map X_out of size H × W × 3; the feature map X_out is the required bokeh-rendered picture.
The specific structure of the complete network: the complete network is divided into 2 branches, each processing information at a different scale. The initial input of the network is the full-focus picture I_org+c generated in step 1.3. Branch 1 contains a residual module, a semi-filter kernel module and an image generation module, while branch 2 contains an attention module, a semi-filter kernel module and an image generation module. To strengthen the information correlation between the branches, the input of branch 2 consists of the intermediate information of branch 1 and the full-focus picture information, both down-sampled, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation.
FIG. 3 is a diagram of the bokeh rendering network based on semi-predictive filtering.
Step 3: train the fast image bokeh rendering network model based on semi-predictive filtering.
The network model is trained as follows:
First, the 5-channel full-focus picture I_org+c produced in step 1.3 is fed in. Then, the saliency detection module and the restrictive prediction filtering module preserve the salient features of the image and blur the background. Finally, the loss function is used to continuously optimize the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh rendering effect in the data set constructed in step 1.
During training, the loss function L combines an L1 term and an LS term to improve the structural similarity between the model output picture I_bokeh and the reference picture I_gt. Through the back-propagation of deep learning, the loss between the model output picture I_bokeh and the reference picture I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model. Specifically:
L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)
where L1(I_bokeh, I_gt) is the reconstruction loss between the bokeh-rendered picture I_bokeh output by the model and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structural loss between the bokeh-rendered picture I_bokeh output by the model and the reference picture I_gt. The loss terms are expressed as follows:

L1(I_bokeh, I_gt) = (1/N) Σ |I_bokeh − I_gt|
LS(I_bokeh, I_gt) = (1/N) Σ |Sobel(I_bokeh) − Sobel(I_gt)|
the Sobel represents that gradient calculation in the horizontal direction and the vertical direction is carried out on the picture and is used for calculating the outline structure of the picture content, and N represents the sum of the number of pixel points of the picture, namely the width W multiplied by the height H of the picture.
Step 4: the trained neural network model receives pictures requiring bokeh rendering and outputs the pictures after the bokeh rendering is complete.
First, the weights of the bokeh rendering network model trained in step 3 are loaded and the parameters in the model are updated. Then, the resized full-focus picture I_org+c from step 1.2 is fed into the bokeh rendering network model and passes sequentially through the saliency detection module and the restrictive prediction filtering module, producing the model output picture I_bokeh with the bokeh rendering effect.
FIG. 4 is an effect diagram of bokeh rendering generated for an automobile;
FIG. 5 is an effect diagram of bokeh rendering generated for a streetlight.
Claims (9)
1. A fast image bokeh rendering method based on semi-predictive filtering, characterized by comprising the following steps:
step 1: making a data set;
step 2: constructing a fast image bokeh rendering network model based on semi-predictive filtering;
step 3: training the fast image bokeh rendering network model based on semi-predictive filtering;
step 4: receiving, by the trained neural network model, pictures requiring bokeh rendering, and outputting the pictures after the bokeh rendering is complete.
2. The fast image bokeh rendering method based on semi-predictive filtering according to claim 1, wherein step 1 specifically comprises:
step 1.1: acquiring data captured in different scenes with a single-lens reflex (SLR) camera, wherein the data for each scene form a pair of pictures: a full-focus picture I_org taken by the SLR camera, and a picture I_gt with a real bokeh rendering effect taken by the SLR camera using a large aperture; the full-focus picture I_org serves as the input image data during model training, and the picture I_gt with the real bokeh rendering effect serves as the reference data compared against the model output image during training;
step 1.2: interpolating all pictures of the data set to a size of height 1024 × width 1472 using bicubic interpolation;
step 1.3: producing a coordinate map; coordinates are assigned to the full-focus picture I_org processed in step 1.2, computed as follows:
x represents the pixel point coordinate corresponding to the high dimension of the picture, and Y represents the pixel point coordinate corresponding to the wide dimension of the picture; combining the X and Y information with the full focus picture IorgCombining to reconstruct a 5-channel full-focus picture Iorg+cAs the final input picture of the network model.
3. The fast image bokeh rendering method based on semi-predictive filtering according to claim 2, wherein step 2 specifically comprises:
step 2.1: deriving the theory of the fast image bokeh rendering task based on semi-predictive filtering; suppose the input is the full-focus picture I_org+c; a saliency detection algorithm divides the full-focus picture I_org+c into two parts: the salient feature part I_focus of the picture and the background feature part I_defocus of the picture; a semi-filtering blur algorithm blurs the background region picture I_defocus to obtain the background-blurred picture I_blur while preserving the salient feature part I_focus; finally, the background-blurred picture I_blur and the salient feature part I_focus are fused to obtain the required bokeh-rendered picture I_bokeh; the theoretical model of the bokeh rendering task is formulated as follows:
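A formulation consistent with the decomposition just described, with B(·) denoting the semi-filtering blur (the exact operator notation is an assumption, as the original equation does not appear in this text):

```latex
I_{org+c} = I_{focus} + I_{defocus}, \qquad
I_{blur} = B(I_{defocus}), \qquad
I_{bokeh} = I_{focus} + I_{blur}
```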
step 2.2: constructing the fast image bokeh rendering network based on semi-predictive filtering;
the fast image scene rendering network based on semi-prediction filtering comprises an attention module, a residual error module, a semi-filtering kernel module and an image generation module; wherein the attention module is used for detecting the input full focus picture Iorg+cFor assisting the operation of a subsequent restrictive prediction filtering module; the residual error module is used for carrying out deep feature enhancement on the input data; the semi-filtering kernel module is used for generating a required filtering kernel and is used for carrying out filtering operation on an input image and blurring partial content of a picture so as to generate a shot rendering effect, wherein the filtering kernel consists of a self-adaptive filtering kernel generated by a network and a small amount of Gabor filtering kernels manually defined with parameters, the self-adaptive filtering kernel generated by the network is used for self-adaptively blurring the input image, and the Gabor filtering kernel manually defined with parameters is used for reserving and enhancing salient region details and edge details of the image; the image generation module is used for generating a picture which needs to be filtered by using the filtering kernel generated by the half filtering kernel module;
the complete network is divided into 2 branches, and each branch processes information with different scales; the initial input to the network is the full focus picture I generated in step 1.3org+c(ii) a Branch 1 contains a residual module, a half-filter kernel module and an image generation module, while branch 2 contains an attention module, a half-filter kernel module and an image generation module; in order to enhance the information correlation degree between different branches, the input of the branch 2 is composed of the intermediate information and the full focus picture information of the branch 1 which are all subjected to down sampling, and the output result of the branch 2 is fed back to the image generation module of the branch 1 for guiding the operation of the image generation module.
4. The fast image bokeh rendering method based on semi-predictive filtering according to claim 3, wherein the residual module has the following specific structure: the input feature map X_res of the residual module passes sequentially through 3 convolution layers, each with 64 convolution kernels of size 3 × 3, to obtain the output feature map X'_res; finally, the output X'_res and the input X_res are added element-wise to obtain the final output feature map X_res-out of the residual module; all convolution layers are followed by a ReLU nonlinear activation function.
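A NumPy sketch of the residual module's structure (three 3 × 3 convolutions with 64 kernels, ReLU after each, plus the skip connection). The random weights are placeholders for the learned parameters:

```python
import numpy as np

def conv3x3(x, w):
    """'Same' 3x3 convolution: x is (H, W, Cin), w is (3, 3, Cin, Cout)."""
    H, W, _ = x.shape
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((H, W, w.shape[-1]))
    for di in range(3):
        for dj in range(3):
            out += p[di:di + H, dj:dj + W, :] @ w[di, dj]  # (H,W,Cin) @ (Cin,Cout)
    return out

def residual_module(x, weights):
    """Three conv(64, 3x3) + ReLU layers, then the element-wise skip connection."""
    h = x
    for w in weights:                        # 3 convolution layers
        h = np.maximum(conv3x3(h, w), 0.0)   # ReLU after every conv layer
    return h + x                             # X'_res + X_res -> X_res-out

rng = np.random.default_rng(2)
C = 64
x = rng.standard_normal((8, 8, C))
weights = [rng.standard_normal((3, 3, C, C)) * 0.01 for _ in range(3)]
y = residual_module(x, weights)
print(y.shape)
```

With all-zero weights the module reduces to the identity, which is the usual motivation for the skip connection: the layers only need to learn a residual correction.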
5. The fast image bokeh rendering method based on semi-predictive filtering according to claim 4, wherein the attention module has the following specific structure: the input feature map X_att of the attention module has height H, width W, and C channels; the attention module is divided into three branches: up, mid, and down; in the up branch, the input feature map X_att passes through a convolution layer with 64 convolution kernels of size 3 × 3 followed by a Reshape operation, yielding a feature map X_up of shape HW × 64; in the mid branch, the input feature map X_att passes through a convolution layer with 64 convolution kernels of size 3 × 3 followed by a Reshape operation, yielding a feature map X_mid of shape 64 × HW; the feature maps X_up and X_mid are matrix-multiplied and activated with a Softmax function to obtain a feature map X_act of shape HW × HW; in the down branch, the input feature map X_att passes through a convolution layer with 64 convolution kernels of size 3 × 3 followed by a Reshape operation, yielding a feature map X_down of shape HW × 64; the feature maps X_act and X_down are matrix-multiplied and reshaped again to obtain a feature map X_final of shape H × W × 64; X_final passes through a convolution layer with 64 convolution kernels of size 3 × 3 and is added element-wise to the input feature map X_att to obtain the final output feature map X_att-out of the attention module; all convolution layers are followed by a ReLU nonlinear activation function.
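The heart of the attention module is the matrix-multiply/Softmax sequence between the three branches. A NumPy sketch of just that core, with the convolutions omitted and random stand-ins for X_up, X_mid, and X_down:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def attention_core(x_up, x_mid, x_down, H, W):
    """Matrix-multiply/Softmax core of the attention module (convolutions omitted).

    x_up:   (HW, 64) from the up branch
    x_mid:  (64, HW) from the mid branch
    x_down: (HW, 64) from the down branch
    """
    x_act = softmax(x_up @ x_mid, axis=-1)        # (HW, HW) attention map
    x_final = (x_act @ x_down).reshape(H, W, 64)  # weighted aggregation + Reshape
    return x_final

H, W = 6, 5
rng = np.random.default_rng(3)
x_up = rng.standard_normal((H * W, 64))
x_mid = rng.standard_normal((64, H * W))
x_down = rng.standard_normal((H * W, 64))
out = attention_core(x_up, x_mid, x_down, H, W)
print(out.shape)  # (6, 5, 64)
```

Each output position is a Softmax-weighted combination of all HW positions of the down branch, which is what lets the module capture long-range saliency cues.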
6. The fast image bokeh rendering method based on semi-predictive filtering according to claim 5, wherein the semi-filtering kernel module has the following specific structure: the input feature map X_filter of the semi-filtering kernel module has height H, width W, and C channels; the input feature map X_filter passes through a residual module with 64 filtering kernels to obtain deep feature information X_deep, which then passes sequentially through a convolution layer with 64 convolution kernels of size 3 × 3 and an up-sampling layer with factor 2, yielding the feature map X'_deep of size 2H × 2W × 64 from which the filtering kernels are generated;
the feature map X'_deep is divided along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16; the feature map X_A is used to generate the adaptive filtering kernel: X_A passes sequentially through a convolution layer with k² convolution kernels of size 3 × 3 and a Softmax activation function, yielding an adaptive filtering kernel X_adp-f of size 2H × 2W × k², where k is the predefined filter kernel size; the feature map X_B is combined with Gabor filtering kernels having fixed parameters to generate the edge filtering kernel: X_B undergoes a custom filtering operation with 16 Gabor filtering kernels with given parameters, and the 16 Gabor filtering kernels are linearly combined to obtain the required edge filtering kernel X_gabor-f of size 2H × 2W × k², which rapidly enhances and preserves the edge information of the picture, wherein the 16 Gabor filtering kernels cover 8 orientations and the kernels sharing an orientation use 2 different sigma parameters, so all 16 kernels have distinct parameters; finally, the adaptive filtering kernel X_adp-f and the edge filtering kernel X_gabor-f are added element-wise to obtain the final required semi-filtering kernel X_filter-out.
7. The fast image bokeh rendering method based on semi-predictive filtering according to claim 6, wherein the image generation module has the following specific structure: the image generation module has three inputs, namely an input feature map X_1 at the same scale, a low-scale up-sampled input feature map X_2, and the semi-filtering kernel X_filter-out generated by the semi-filtering kernel module; the input feature map X_1 passes sequentially through a convolution layer with 3 convolution kernels of size 3 × 3 and an up-sampling layer with factor 2, and the result is added element-wise to the input feature map X_2 to obtain the feature map X_gen of size H × W × 3 that finally requires the filtering operation; the semi-filtering kernel X_filter-out and the feature map X_gen undergo a convolution operation with the custom filtering kernel to obtain the final feature map X_out of size H × W × 3; the feature map X_out is the required bokeh-rendered picture.
8. The fast image bokeh rendering method based on semi-predictive filtering according to any one of claims 3-7, wherein step 3 specifically comprises:
the network model is trained as follows:
first, inputting the 5-channel full-focus picture I_org+c produced in step 1.3; then, using the saliency detection module and the restrictive prediction filtering module to preserve the salient features of the image and blur the background; finally, using the loss function to continuously optimize the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh rendering effect in the data set constructed in step 1;
during training, the loss function L combines an L1 term and an LS term to improve the structural similarity between the model output picture I_bokeh and the reference picture I_gt; through the back-propagation of deep learning, the loss between the model output picture I_bokeh and the reference picture I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model, specifically expressed as:
L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)
wherein L1(I_bokeh, I_gt) is the reconstruction loss between the bokeh-rendered picture I_bokeh output by the model and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structural loss between the bokeh-rendered picture I_bokeh output by the model and the reference picture I_gt; the loss terms are expressed as follows:

L1(I_bokeh, I_gt) = (1/N) Σ |I_bokeh − I_gt|
LS(I_bokeh, I_gt) = (1/N) Σ |Sobel(I_bokeh) − Sobel(I_gt)|
the Sobel represents that gradient calculation in the horizontal direction and the vertical direction is carried out on the picture and is used for calculating the outline structure of the picture content, and N represents the sum of the number of pixel points of the picture, namely the width W multiplied by the height H of the picture.
9. The fast image bokeh rendering method based on semi-predictive filtering according to claim 8, wherein step 4 specifically comprises:
first, loading the weights of the bokeh rendering network model trained in step 3 and updating the parameters in the model; secondly, feeding the resized full-focus picture I_org+c from step 1.2 into the bokeh rendering network model, which passes sequentially through the saliency detection module and the restrictive prediction filtering module to obtain the model output picture I_bokeh with the bokeh rendering effect.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110914290.0A CN113810597B (en) | 2021-08-10 | 2021-08-10 | Rapid image and scene rendering method based on semi-predictive filtering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110914290.0A CN113810597B (en) | 2021-08-10 | 2021-08-10 | Rapid image and scene rendering method based on semi-predictive filtering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113810597A true CN113810597A (en) | 2021-12-17 |
CN113810597B CN113810597B (en) | 2022-12-13 |
Family
ID=78893425
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110914290.0A Active CN113810597B (en) | 2021-08-10 | 2021-08-10 | Rapid image and scene rendering method based on semi-predictive filtering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113810597B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023226479A1 (en) * | 2022-05-26 | 2023-11-30 | 北京京东尚科信息技术有限公司 | Deep rendering model training method and apparatus, and target rendering method and apparatus |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108665494A (en) * | 2017-03-27 | 2018-10-16 | 北京中科视维文化科技有限公司 | Depth of field real-time rendering method based on quick guiding filtering |
US20190130532A1 (en) * | 2017-11-01 | 2019-05-02 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Image-processing method, apparatus and device |
CN112073632A (en) * | 2020-08-11 | 2020-12-11 | 联想(北京)有限公司 | Image processing method, apparatus and storage medium |
CN112184586A (en) * | 2020-09-29 | 2021-01-05 | 中科方寸知微(南京)科技有限公司 | Method and system for rapidly blurring monocular visual image background based on depth perception |
US20210166350A1 (en) * | 2018-07-17 | 2021-06-03 | Xi'an Jiaotong University | Fusion network-based method for image super-resolution and non-uniform motion deblurring |
Also Published As
Publication number | Publication date |
---|---|
CN113810597B (en) | 2022-12-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||