CN113810597B - Rapid image bokeh rendering method based on semi-predictive filtering


Info

Publication number: CN113810597B
Authority: CN (China)
Application number: CN202110914290.0A
Other versions: CN113810597A (Chinese)
Inventors: 颜成钢, 陈泉, 马立栋, 郑博仑, 孙垚棋, 张继勇, 李宗鹏
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority/filing date: 2021-08-10
Publication of CN113810597A: 2021-12-17
Publication of CN113810597B (grant): 2022-12-13
Legal status: Active

Classifications

    • H: ELECTRICITY
        • H04: ELECTRIC COMMUNICATION TECHNIQUE
            • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N23/00: Cameras or camera modules comprising electronic image sensors; control thereof
                    • H04N23/95: Computational photography systems, e.g. light-field imaging systems
                        • H04N23/951: Computational photography systems using two or more images to influence resolution, frame rate or aspect ratio
    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00: Computing arrangements based on biological models
                    • G06N3/02: Neural networks
                        • G06N3/04: Architecture, e.g. interconnection topology
                            • G06N3/045: Combinations of networks
                            • G06N3/048: Activation functions
                        • G06N3/08: Learning methods
                            • G06N3/084: Backpropagation, e.g. using gradient descent
            • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T3/00: Geometric image transformations in the plane of the image
                    • G06T3/04: Context-preserving transformations, e.g. by using an importance map
                    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
                        • G06T3/4007: Scaling based on interpolation, e.g. bilinear interpolation


Abstract

A fast image bokeh rendering method based on semi-predictive filtering comprises the following steps. First, pairs of pictures are captured under different scenes with a single-lens reflex (SLR) camera, all pictures of the dataset are interpolated to a size of 1024 × 1472 using bicubic interpolation, and coordinate values are assigned to the processed all-in-focus pictures to produce a coordinate map. A fast image bokeh rendering network model based on semi-predictive filtering is then constructed and trained; the network model comprises an attention module, a residual module, a semi-filtering kernel module and an image generation module. Finally, the trained neural network model receives the pictures requiring bokeh rendering and outputs the pictures once bokeh rendering is complete. The method achieves fast bokeh rendering of images while guaranteeing bokeh rendering quality, and innovatively provides a coordinate map for assisting the training of the network model and improving the network model's ability to distinguish the important content of the input image.

Description

Rapid image bokeh rendering method based on semi-predictive filtering
Technical Field
The invention relates to a fast image bokeh rendering method based on semi-predictive filtering, and in particular to the field of bokeh effect processing based on deep learning technology.
Background
The bokeh effect is generally regarded as one of the aesthetic standards in photography. With existing technology it is easy for a photographer to achieve with a single-lens reflex (SLR) camera: the camera is set to a large-aperture shooting mode so that the regions of no interest in the image are blurred. With the popularization of smartphones, manufacturers have tried to achieve bokeh at the hardware level by adding complex hardware and extra cameras to the handset, but the high manufacturing cost is unfriendly to both vendors and consumers. Bokeh rendering algorithms implemented in software have therefore become a research hotspot: they rely only on the computing power of the phone, require comparatively little hardware cost, and suit most smartphones on the market. At present most algorithms are based on deep learning and build an end-to-end network to render the bokeh effect. However, when a deep learning algorithm is integrated into a phone, shortening the running time becomes a major problem; running speed and rendering quality constrain each other, and how to reconcile them is the question this invention considers.
Disclosure of Invention
The technical problem to be solved is as follows: aiming at the high cost of hardware-based implementations and at the mutual constraint between running speed and rendering quality in software-based implementations, the invention provides a fast image bokeh rendering method based on semi-predictive filtering.
The implementation steps are as follows: the invention provides a fast image bokeh rendering method based on semi-predictive filtering, which comprises the following basic steps:
Step 1: create the dataset;
Step 1.1: capture data under different scenes with a single-lens reflex (SLR) camera. The data for each scene is a pair of pictures: an all-in-focus picture I_org shot by the SLR, and a picture I_gt actually shot by the SLR with a large aperture, which carries a real bokeh rendering effect. The all-in-focus picture I_org serves as the input image data during model training, while the picture I_gt with the real bokeh effect serves as the reference data compared against the model output images during training.
Step 1.2: all pictures of the data set are interpolated to a size of 1024 x 1472 in height by using a bicubic linear interpolation method.
Step 1.3: and (5) making a coordinate graph. For the full focus picture I processed in the step 1.2 org And (3) carrying out coordinate assignment, wherein the specific calculation method comprises the following steps:
Figure BDA0003205091940000021
Figure BDA0003205091940000022
Here X holds, for each pixel, its coordinate along the height dimension of the picture, and Y its coordinate along the width dimension. The X and Y maps are combined with the all-in-focus picture I_org to reconstruct a 5-channel all-in-focus picture I_org+c, which is the final input picture of the network model.
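Because the published assignment formulas are only available as images, the sketch below assumes CoordConv-style coordinate channels normalized to [-1, 1]; the exact values used by the patent may differ.

```python
import torch

def make_coord_input(i_org: torch.Tensor) -> torch.Tensor:
    """Build the 5-channel input I_org+c from a 3-channel picture I_org of shape (3, H, W)."""
    _, h, w = i_org.shape
    # Assumed normalization: each pixel gets its row/column position scaled to [-1, 1]
    x_map = torch.linspace(-1.0, 1.0, h).view(h, 1).expand(h, w)  # height-dimension coords X
    y_map = torch.linspace(-1.0, 1.0, w).view(1, w).expand(h, w)  # width-dimension coords Y
    return torch.cat([i_org, x_map.unsqueeze(0), y_map.unsqueeze(0)], dim=0)  # (5, H, W)
```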
Step 2: constructing a fast image scene rendering network model based on semi-prediction filtering;
Step 2.1: theoretical derivation of the fast image bokeh rendering task based on semi-predictive filtering. Suppose the input is the all-in-focus picture I_org+c. A saliency detection algorithm divides I_org+c into two parts: the salient-feature part of the picture, I_focus, and the background part of the picture, I_defocus. A semi-filtering blur algorithm blurs the background picture I_defocus into the background-blurred picture I_blur while preserving the salient-feature part I_focus; finally the background-blurred picture I_blur and the salient-feature part I_focus are fused to obtain the required bokeh-rendered picture I_bokeh. The theoretical model of the bokeh rendering task is formulated as follows:

I_bokeh = I_focus ⊕ SF(I_defocus), with (I_focus, I_defocus) = SD(I_org+c)

where SD(·) represents the saliency detection algorithm, SF(·) represents the semi-filtering blur algorithm, and ⊕ denotes the fusion of the preserved salient part with the blurred background.
Step 2.2: constructing a fast image scene rendering network based on semi-prediction filtering;
the fast image scene rendering network based on semi-prediction filtering comprises an attention module, a residual error module, a semi-filtering kernel module and an image generation module. Wherein the attention module is used for detecting the input full focus picture I org+c For assisting the operation of a subsequent restrictive prediction filtering module; the residual error module is used for carrying out deep feature enhancement on the input data; the semi-filtering kernel module is used for generating a required filtering kernel, and is used for carrying out filtering operation on an input image and blurring partial content of a picture so as to generate a shot rendering effect, wherein the filtering kernel consists of a self-adaptive filtering kernel generated by a network and a small number of Gabor filtering kernels manually defined with parameters, the self-adaptive filtering kernel generated by the network is used for self-adaptively blurring the input image, and the Gabor filtering kernel manually defined with parameters is used for reserving and enhancing salient region details and edge details of the image; the image generation module is used for generating a picture which needs to be filtered by using the filtering kernel generated by the half filtering kernel module.
The residual module has the following structure: the input feature map X_res of the residual module passes sequentially through convolution layers with 64 kernels of size 3 × 3, giving the output feature map X'_res. Finally the output X'_res and the input X_res are added element-wise to obtain the final output feature map of the residual module, X_res-out. Every convolution layer is followed by a ReLU nonlinear activation function.
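A PyTorch sketch of the residual module follows; the patent does not state how many 3 × 3 convolution layers are stacked, so two are assumed.

```python
import torch.nn as nn

class ResidualModule(nn.Module):
    """Residual module sketch: 3x3 convolutions with 64 kernels, each followed by ReLU,
    plus an element-wise skip connection (X_res-out = X_res + X'_res)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x_res):
        return x_res + self.body(x_res)
```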
Attention module structure: the input feature map X_att of the attention module has dimensions height H × width W × channels C. The module splits into three branches: up, mid and down. On the up branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_up of shape HW × 64. On the mid branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_mid of shape 64 × HW. X_up and X_mid are matrix-multiplied and then activated with a Softmax function, giving a feature map X_act of shape HW × HW. On the down branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_down of shape HW × 64. X_act and X_down are matrix-multiplied and reshaped again, giving a feature map X_final of shape H × W × 64. X_final passes through a convolution layer with 64 kernels of size 3 × 3 and is then added element-wise to the input feature map X_att, giving the attention module's final output feature map X_att-out. Every convolution layer is followed by a ReLU nonlinear activation function.
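A sketch of this non-local-style attention follows; it assumes the channel count C equals 64 so that the final skip addition is shape-compatible, and adds a batch dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionModule(nn.Module):
    """Three-branch (up / mid / down) attention sketch; C = 64 is assumed."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv_up = nn.Conv2d(channels, 64, 3, padding=1)
        self.conv_mid = nn.Conv2d(channels, 64, 3, padding=1)
        self.conv_down = nn.Conv2d(channels, 64, 3, padding=1)
        self.conv_final = nn.Conv2d(64, channels, 3, padding=1)

    def forward(self, x_att):
        b, c, h, w = x_att.shape
        x_up = F.relu(self.conv_up(x_att)).flatten(2).transpose(1, 2)      # (B, HW, 64)
        x_mid = F.relu(self.conv_mid(x_att)).flatten(2)                    # (B, 64, HW)
        x_act = torch.softmax(torch.bmm(x_up, x_mid), dim=-1)              # (B, HW, HW)
        x_down = F.relu(self.conv_down(x_att)).flatten(2).transpose(1, 2)  # (B, HW, 64)
        x_final = torch.bmm(x_act, x_down).transpose(1, 2).reshape(b, 64, h, w)
        return x_att + F.relu(self.conv_final(x_final))                    # X_att-out
```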
The semi-filtering kernel module has the following structure: the input feature map X_filter of the semi-filtering kernel module has dimensions height H × width W × channels C. X_filter passes through a residual module with 64 kernels to obtain deep feature information X_deep, then sequentially through a convolution layer with 64 kernels of size 3 × 3 and an upsampling layer with factor 2, giving the feature map X'_deep of size 2H × 2W × 64 from which the filtering kernels are generated.
X'_deep is divided along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16. X_A generates the adaptive filtering kernel: it passes sequentially through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function, giving the adaptive filtering kernel X_adp-f of size 2H × 2W × k², where k is the predefined filter kernel size. X_B is used to generate, by combination, an edge filtering kernel from Gabor kernels with fixed parameters: X_B undergoes a custom filtering operation with 16 Gabor kernels of given parameters, and the 16 responses are linearly combined to obtain the required edge filtering kernel X_gabor-f of size 2H × 2W × k², which rapidly enhances the edge information that the picture must retain. The 16 Gabor kernels cover 8 orientations, and the kernels sharing an orientation use 2 different Sigma parameters, so all 16 kernels have distinct parameters. Finally, the adaptive filtering kernel X_adp-f and the edge filtering kernel X_gabor-f are added element-wise to obtain the required semi-filtering kernel X_filter-out.
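The sketch below reuses the ResidualModule sketched above. The Gabor parameter values (sigma pairs, wavelength, aspect ratio) and the per-pixel weighted combination of the 16 fixed kernels are assumptions; the patent fixes these kernels but does not publish their numeric parameters.

```python
import math
import cv2
import numpy as np
import torch
import torch.nn as nn

def gabor_bank(k: int = 3) -> torch.Tensor:
    """16 fixed Gabor kernels: 8 orientations x 2 sigma values (assumed parameter values)."""
    kernels = []
    for i in range(8):
        for sigma in (1.0, 2.0):
            # ksize, sigma, theta, lambda (wavelength), gamma (aspect ratio)
            g = cv2.getGaborKernel((k, k), sigma, i * math.pi / 8, 4.0, 0.5)
            kernels.append(torch.from_numpy(g.astype(np.float32)))
    return torch.stack(kernels)  # (16, k, k)

class SemiFilterKernelModule(nn.Module):
    """Semi-filtering kernel sketch: X_A -> adaptive kernel, X_B -> Gabor-based edge kernel."""
    def __init__(self, channels: int = 64, k: int = 3):
        super().__init__()
        self.res = ResidualModule(channels)               # deep features X_deep
        self.expand = nn.Sequential(                      # 3x3 conv (64 kernels) + 2x upsample
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        )
        self.to_adaptive = nn.Conv2d(48, k * k, 3, padding=1)
        self.register_buffer("gabor", gabor_bank(k).reshape(16, k * k))  # (16, k^2)

    def forward(self, x_filter):
        x_deep = self.expand(self.res(x_filter))          # X'_deep: (B, 64, 2H, 2W)
        x_a, x_b = x_deep[:, :48], x_deep[:, 48:]         # channel split 48 / 16
        x_adp = torch.softmax(self.to_adaptive(x_a), dim=1)      # X_adp-f: (B, k^2, 2H, 2W)
        # Per-pixel linear combination of the 16 fixed Gabor kernels, weighted by X_B
        x_gab = torch.einsum("bghw,gn->bnhw", x_b, self.gabor)   # X_gabor-f: (B, k^2, 2H, 2W)
        return x_adp + x_gab                              # semi-filter kernel X_filter-out
```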
The image generation module has the following structure: it takes three inputs, an input feature map X_1 at the same scale, an input feature map X_2 upsampled from the lower scale, and the semi-filtering kernel X_filter-out generated by the semi-filtering kernel module. The input feature map X_1 passes sequentially through a convolution layer with 3 kernels of size 3 × 3 and an upsampling layer with factor 2, and the result is added element-wise to the input feature map X_2, giving the feature map X_gen of size H × W × 3 on which the filtering operation is finally performed. The semi-filtering kernel X_filter-out and the feature map X_gen then undergo the custom filtering-kernel convolution operation, giving the final feature map X_out of size H × W × 3; X_out is the required bokeh-rendered picture.
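A sketch of the spatially-variant filtering step follows, using unfold to apply a different k × k kernel at every pixel; shapes are simplified so that the predicted kernel is assumed to match the resolution of X_gen.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageGenerationModule(nn.Module):
    """Fuse X_1 (3x3 conv + 2x upsample) with X_2, then apply the per-pixel semi-filter kernel."""
    def __init__(self, in_channels: int = 64, k: int = 3):
        super().__init__()
        self.k = k
        self.to_rgb = nn.Sequential(
            nn.Conv2d(in_channels, 3, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        )

    def forward(self, x1, x2, kernel):
        x_gen = self.to_rgb(x1) + x2                          # X_gen: (B, 3, H, W)
        b, _, h, w = x_gen.shape
        k = self.k
        # Gather the k x k neighbourhood of every pixel, then take the inner product
        # with that pixel's predicted kernel (spatially-variant filtering).
        patches = F.unfold(x_gen, k, padding=k // 2).view(b, 3, k * k, h * w)
        weights = kernel.view(b, k * k, h * w).unsqueeze(1)   # (B, 1, k^2, HW)
        return (patches * weights).sum(2).view(b, 3, h, w)    # X_out: bokeh-rendered picture
```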
The structure of the complete network: the network is divided into 2 branches, each processing information at a different scale. The initial input of the network is the all-in-focus picture I_org+c produced in step 1.3. Branch 1 contains a residual module, a semi-filtering kernel module and an image generation module, while branch 2 contains an attention module, a semi-filtering kernel module and an image generation module. To strengthen the information correlation between the branches, the input of branch 2 consists of downsampled intermediate information from branch 1 together with the downsampled all-in-focus picture information, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation.
Step 3: train the fast image bokeh rendering network model based on semi-predictive filtering.
The network model is trained as follows:
First, the 5-channel all-in-focus picture I_org+c produced in step 1.3 is input. Then the saliency detection module and the semi-predictive filtering module preserve the salient features of the image and blur the background. Finally, the loss function continuously optimizes the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh effect in the dataset constructed in step 1.
During training, the loss function L combines an L1 function with an LS function in order to raise the structural similarity between the model output picture I_bokeh and the reference picture I_gt. Using the back-propagation of deep learning, the loss between the model output picture I_bokeh and the reference picture I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model. Specifically:

L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)
where L1(I_bokeh, I_gt) is the reconstruction loss between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure loss between them, expressed as follows:

LS(I_bokeh, I_gt) = (1/N) Σ | Sobel(I_bokeh) − Sobel(I_gt) |

where Sobel denotes gradient computation on the picture in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N denotes the total number of pixels in the picture, i.e. the picture width W multiplied by the height H.
Step 4: the trained neural network model receives the pictures requiring bokeh rendering and outputs the pictures once bokeh rendering is complete;
Load the weights of the bokeh rendering network model trained in step 3 and update the parameters in the model. Then feed the resized picture from step 1.2, extended to the 5-channel input I_org+c, into the bokeh rendering network model; passing sequentially through the saliency detection module and the semi-predictive filtering module, it yields the model output picture I_bokeh with the bokeh rendering effect.
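An illustrative inference sketch, chaining the helpers sketched earlier; BokehNet and the checkpoint filename are assumptions, not the published implementation.

```python
import torch

model = BokehNet()                                   # hypothetical full two-branch network
model.load_state_dict(torch.load("bokeh_net.pth", map_location="cpu"))
model.eval()

with torch.no_grad():
    img = resize_dataset_image("input.jpg")          # step 1.2: resize to 1024 x 1472
    x = torch.from_numpy(img[..., ::-1].copy()).permute(2, 0, 1).float() / 255.0  # BGR -> RGB
    x = make_coord_input(x)                          # step 1.3: add coordinate channels
    i_bokeh = model(x.unsqueeze(0))                  # bokeh-rendered output I_bokeh
```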
The invention has the following beneficial effects:
1. A fast image bokeh rendering method based on semi-predictive filtering is innovatively provided, achieving fast bokeh rendering of images while guaranteeing bokeh rendering quality.
2. A coordinate map is innovatively provided for assisting the training of the network model and improving the network model's ability to distinguish the important content of the input image.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a flowchart of bokeh rendering for a single image;
FIG. 3 is a diagram of the bokeh rendering network based on semi-predictive filtering;
FIG. 4 is an effect diagram of bokeh rendering for an automobile;
FIG. 5 is an effect diagram of bokeh rendering for a streetlight.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The following notation is first defined and explained:
I_org: all-in-focus picture
I_org+c: 5-channel all-in-focus picture containing the coordinate-map information; the actual input of the network model
I_gt: picture with the real bokeh rendering effect
I_bokeh: model output picture with the bokeh rendering effect
FIG. 2 shows the flow of bokeh rendering for a single image.
As shown in fig. 1, the present invention provides a fast image bokeh rendering method based on semi-predictive filtering, comprising the following basic steps:
Step 1: dataset creation.
Step 1.1: the method comprises the steps of obtaining data shot under different scenes through shooting by a single lens reflex, wherein the data shot under the different scenes are a pair of pictures, namely, all-focus pictures I shot by the single lens reflex org And a picture I which is really shot by the single lens reflex camera by utilizing a large aperture and has a shot scene rendering effect gt . Wherein the picture I is in full focus org Picture I with real shot rendering effect as input image data in model training process gt As comparison data for comparison with the model output images during the model training process.
Step 1.2: all pictures of the data set are interpolated into the size of 1024 x 1472 by a bicubic linear interpolation method, and the size of the data set is unified, so that the operation time required by a training network is reduced.
Step 1.3: and (5) making a coordinate graph. For the full focus picture I processed in the step 1.2 org And (3) carrying out coordinate assignment, wherein the specific calculation method comprises the following steps:
Figure BDA0003205091940000071
Figure BDA0003205091940000072
Here X holds, for each pixel, its coordinate along the height dimension of the picture, and Y its coordinate along the width dimension. The X and Y maps are combined with the all-in-focus picture I_org to reconstruct a 5-channel all-in-focus picture I_org+c, which is the final input picture of the network model.
Step 2: constructing a fast image scene rendering network model based on semi-prediction filtering;
Step 2.1: theoretical derivation of the fast image bokeh rendering task based on semi-predictive filtering. Suppose the input is the all-in-focus picture I_org+c. A saliency detection algorithm divides I_org+c into two parts: the salient-feature part of the picture, I_focus, and the background part of the picture, I_defocus. A semi-filtering blur algorithm blurs the background picture I_defocus into the background-blurred picture I_blur while preserving the salient-feature part I_focus; finally the background-blurred picture I_blur and the salient-feature part I_focus are fused to obtain the required bokeh-rendered picture I_bokeh. The theoretical model of the bokeh rendering task is formulated as follows:

I_bokeh = I_focus ⊕ SF(I_defocus), with (I_focus, I_defocus) = SD(I_org+c)

where SD(·) represents the saliency detection algorithm, SF(·) represents the semi-filtering blur algorithm, and ⊕ denotes the fusion of the preserved salient part with the blurred background.
Step 2.2: constructing a fast image scene rendering network based on semi-prediction filtering:
the fast image scene rendering network based on semi-prediction filtering comprises an attention module, a residual error module, a semi-filtering kernel module and an image generation module. Wherein the attention module is used for detecting an input all-in-focus picture I org+c For assisting the operation of a subsequent restrictive prediction filtering module; residual mouldThe block is used for carrying out deep feature enhancement on input data; the semi-filtering kernel module is used for generating a required filtering kernel and is used for carrying out filtering operation on an input image and blurring partial content of a picture so as to generate a shot rendering effect, wherein the filtering kernel consists of a self-adaptive filtering kernel generated by a network and a small amount of Gabor filtering kernels manually defined with parameters, the self-adaptive filtering kernel generated by the network is used for self-adaptively blurring the input image, and the Gabor filtering kernel manually defined with parameters is used for reserving and enhancing salient region details and edge details of the image; the image generation module is used for generating a picture which needs to be filtered by using the filtering kernel generated by the half filtering kernel module.
The residual module has the following structure: the input feature map X_res of the residual module passes sequentially through convolution layers with 64 kernels of size 3 × 3, giving the output feature map X'_res. Finally the output X'_res and the input X_res are added element-wise to obtain the final output feature map of the residual module, X_res-out. Every convolution layer is followed by a ReLU nonlinear activation function.
The attention module has the following structure: the input feature map X_att of the attention module has dimensions height H × width W × channels C. The module splits into three branches: up, mid and down. On the up branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_up of shape HW × 64. On the mid branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_mid of shape 64 × HW. X_up and X_mid are matrix-multiplied and then activated with a Softmax function, giving a feature map X_act of shape HW × HW. On the down branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_down of shape HW × 64. X_act and X_down are matrix-multiplied and reshaped again, giving a feature map X_final of shape H × W × 64. X_final passes through a convolution layer with 64 kernels of size 3 × 3 and is then added element-wise to the input feature map X_att, giving the attention module's final output feature map X_att-out. Every convolution layer is followed by a ReLU nonlinear activation function.
The semi-filtering kernel module has the following structure: the input feature map X_filter of the semi-filtering kernel module has dimensions height H × width W × channels C. X_filter passes through a residual module with 64 kernels to obtain deep feature information X_deep, then sequentially through a convolution layer with 64 kernels of size 3 × 3 and an upsampling layer with factor 2, giving the feature map X'_deep of size 2H × 2W × 64 from which the filtering kernels are generated.
X'_deep is divided along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16. X_A generates the adaptive filtering kernel: it passes sequentially through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function, giving the adaptive filtering kernel X_adp-f of size 2H × 2W × k², where k is the predefined filter kernel size. X_B is used to generate, by combination, an edge filtering kernel from Gabor kernels with fixed parameters: X_B undergoes a custom filtering operation with 16 Gabor kernels of given parameters, and the 16 responses are linearly combined to obtain the required edge filtering kernel X_gabor-f of size 2H × 2W × k², which rapidly enhances the edge information that the picture must retain. The 16 Gabor kernels cover 8 orientations, and the kernels sharing an orientation use 2 different Sigma parameters, so all 16 kernels have distinct parameters. Finally, the adaptive filtering kernel X_adp-f and the edge filtering kernel X_gabor-f are added element-wise to obtain the required semi-filtering kernel X_filter-out.
The image generation module has the following structure: it takes three inputs, an input feature map X_1 at the same scale, an input feature map X_2 upsampled from the lower scale, and the semi-filtering kernel X_filter-out generated by the semi-filtering kernel module. The input feature map X_1 passes sequentially through a convolution layer with 3 kernels of size 3 × 3 and an upsampling layer with factor 2, and the result is added element-wise to the input feature map X_2, giving the feature map X_gen of size H × W × 3 on which the filtering operation is finally performed. The semi-filtering kernel X_filter-out and the feature map X_gen then undergo the custom filtering-kernel convolution operation, giving the final feature map X_out of size H × W × 3; X_out is the required bokeh-rendered picture.
The structure of the complete network: the network is divided into 2 branches, each processing information at a different scale. The initial input of the network is the all-in-focus picture I_org+c produced in step 1.3. Branch 1 contains a residual module, a semi-filtering kernel module and an image generation module, while branch 2 contains an attention module, a semi-filtering kernel module and an image generation module. To strengthen the information correlation between the branches, the input of branch 2 consists of downsampled intermediate information from branch 1 together with the downsampled all-in-focus picture information, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation.
FIG. 3 shows the bokeh rendering network based on semi-predictive filtering.
and 3, step 3: and training a semi-prediction filtering-based fast image scene rendering network model.
The network model is trained as follows:
First, the 5-channel all-in-focus picture I_org+c produced in step 1.3 is input. Then the saliency detection module and the semi-predictive filtering module preserve the salient features of the image and blur the background. Finally, the loss function continuously optimizes the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh effect in the dataset constructed in step 1.
During training, the loss function L combines an L1 function with an LS function in order to raise the structural similarity between the model output picture I_bokeh and the reference picture I_gt. Using the back-propagation of deep learning, the loss between the model output picture I_bokeh and the reference picture I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model. Specifically:

L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)
where L1(I_bokeh, I_gt) is the reconstruction loss between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure loss between them, expressed as follows:

LS(I_bokeh, I_gt) = (1/N) Σ | Sobel(I_bokeh) − Sobel(I_gt) |

where Sobel denotes gradient computation on the picture in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N denotes the total number of pixels in the picture, i.e. the picture width W multiplied by the height H.
Step 4: the trained neural network model receives the pictures requiring bokeh rendering and outputs the pictures once bokeh rendering is complete.
First, load the weights of the bokeh rendering network model trained in step 3 and update the parameters in the model. Then feed the resized picture from step 1.2, extended to the 5-channel input I_org+c, into the bokeh rendering network model; passing sequentially through the saliency detection module and the semi-predictive filtering module, it yields the model output picture I_bokeh with the bokeh rendering effect.
FIG. 4 shows the bokeh rendering effect generated for an automobile;
FIG. 5 shows the bokeh rendering effect generated for a streetlight.

Claims (1)

1. A fast image bokeh rendering method based on semi-predictive filtering, characterized by comprising the following steps:
step 1: making a data set;
step 2: constructing a fast image bokeh rendering network model based on semi-predictive filtering;
and step 3: training the fast image bokeh rendering network model based on semi-predictive filtering;
and step 4: receiving, by the trained neural network model, the pictures requiring bokeh rendering, and outputting the pictures once bokeh rendering is complete;
the specific method of the step 1 is as follows:
step 1.1: obtaining data shot under different scenes with a single-lens reflex (SLR) camera, the data for each scene being a pair of pictures, namely an all-in-focus picture I_org shot by the SLR and a picture I_gt actually shot by the SLR with a large aperture and carrying a real bokeh rendering effect; wherein the all-in-focus picture I_org serves as the input image data in the model training process, and the picture I_gt with the real bokeh effect serves as the reference data compared against the model output image in the model training process;
step 1.2: interpolating all pictures of the dataset to a size of height 1024 × width 1472 by bicubic interpolation;
step 1.3: making the coordinate map; assigning coordinate values to the all-in-focus picture I_org processed in step 1.2, the specific calculation being as follows:
[The coordinate-assignment formulas for X and Y are given only as images in the original publication.]
x represents the pixel point coordinate corresponding to the high dimension of the picture, and Y represents the pixel point coordinate corresponding to the wide dimension of the picture; combining the information of X and Y with the full focus picture I org Combining to reconstruct a 5-channel full-focus picture I org+c As the final input picture of the network model;
the specific method of the step 2 is as follows:
step 2.1: theoretical derivation of the fast image bokeh rendering task based on semi-predictive filtering; suppose the input is the all-in-focus picture I_org+c; a saliency detection algorithm divides the all-in-focus picture I_org+c into two parts, the salient-feature part of the picture I_focus and the background part of the picture I_defocus; a semi-filtering blur algorithm blurs the background picture I_defocus into the background-blurred picture I_blur while preserving the salient-feature part I_focus; finally the background-blurred picture I_blur and the salient-feature part I_focus are fused to obtain the required bokeh-rendered picture I_bokeh; the theoretical model of the bokeh rendering task is formulated as follows:
I_bokeh = I_focus ⊕ SF(I_defocus), with (I_focus, I_defocus) = SD(I_org+c)
wherein SD(·) represents the saliency detection algorithm, SF(·) represents the semi-filtering blur algorithm, and ⊕ denotes the fusion of the preserved salient part with the blurred background;
step 2.2: constructing the fast image bokeh rendering network based on semi-predictive filtering;
the fast image bokeh rendering network based on semi-predictive filtering comprises an attention module, a residual module, a semi-filtering kernel module and an image generation module; wherein the attention module is used for detecting the salient features of the input all-in-focus picture I_org+c and assisting the operation of the subsequent semi-predictive filtering stage; the residual module is used for performing deep feature enhancement on the input data; the semi-filtering kernel module is used for generating the required filtering kernel, which performs the filtering operation on the input image and blurs part of the picture content to produce the bokeh rendering effect, wherein the filtering kernel consists of an adaptive filtering kernel generated by the network and a small number of Gabor filtering kernels with manually defined parameters, the network-generated adaptive filtering kernel adaptively blurring the input image and the fixed-parameter Gabor filtering kernels preserving and enhancing the salient-region details and edge details of the image; the image generation module is used for generating the picture to be filtered, using the filtering kernel produced by the semi-filtering kernel module;
the complete network is divided into 2 branches, each branch processing information at a different scale; the initial input of the network is the all-in-focus picture I_org+c produced in step 1.3; branch 1 contains a residual module, a semi-filtering kernel module and an image generation module, while branch 2 contains an attention module, a semi-filtering kernel module and an image generation module; in order to strengthen the information correlation between the branches, the input of branch 2 consists of downsampled intermediate information from branch 1 together with the downsampled all-in-focus picture information, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation;
the specific structure of the residual module is as follows: the input feature map X_res of the residual module passes sequentially through convolution layers with 64 kernels of size 3 × 3, giving the output feature map X'_res; finally the output X'_res and the input X_res are added element-wise to obtain the final output feature map of the residual module, X_res-out; wherein every convolution layer is followed by a ReLU nonlinear activation function;
attention module specific structure: the input feature map X_att of the attention module has dimensions height H × width W × channels C; the attention module is divided into three branches, up, mid and down; the input feature map X_att passes through a convolution layer of the up branch with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_up of shape HW × 64; the input feature map X_att passes through a convolution layer of the mid branch with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_mid of shape 64 × HW; the feature maps X_up and X_mid are matrix-multiplied and then activated with a Softmax function, giving a feature map X_act of shape HW × HW; the input feature map X_att passes through a convolution layer of the down branch with 64 kernels of size 3 × 3 followed by a Reshape operation, giving a feature map X_down of shape HW × 64; the feature maps X_act and X_down are matrix-multiplied and reshaped again, giving a feature map X_final of shape H × W × 64; X_final passes through a convolution layer with 64 kernels of size 3 × 3 and is added element-wise to the input feature map X_att, giving the final output feature map of the attention module, X_att-out; wherein every convolution layer is followed by a ReLU nonlinear activation function;
the semi-filtering kernel module has the following specific structure: the input feature map X_filter of the semi-filtering kernel module has dimensions height H × width W × channels C; the input feature map X_filter passes through a residual module with 64 kernels to obtain deep feature information X_deep, and then sequentially through a convolution layer with 64 kernels of size 3 × 3 and an upsampling layer with factor 2, giving the feature map X'_deep of size 2H × 2W × 64 from which the filtering kernels are generated;
X'_deep is divided along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16; X_A is used to generate the adaptive filtering kernel, i.e. X_A passes sequentially through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function, giving the adaptive filtering kernel X_adp-f of size 2H × 2W × k², where k is the predefined filter kernel size; X_B is used to generate, by combination, an edge filtering kernel from Gabor filtering kernels with fixed parameters, i.e. X_B undergoes a custom filtering operation with 16 Gabor filtering kernels of given parameters, and the 16 Gabor responses are linearly combined to obtain the required edge filtering kernel X_gabor-f of size 2H × 2W × k², used to rapidly enhance the edge information that the picture must retain, wherein the 16 Gabor filtering kernels cover 8 orientations and the kernels of the same orientation use 2 different Sigma parameters, so that the parameters of all 16 Gabor filtering kernels differ; finally the adaptive filtering kernel X_adp-f and the edge filtering kernel X_gabor-f are added element-wise to obtain the final required semi-filtering kernel X_filter-out;
The image generation module has the specific structure that: the image generation module comprises three inputs, and an input feature map X with the same scale 1 Low-scale up-sampled input feature map X 2 Half-filter kernel X generated by input half-filter kernel module filter-out (ii) a Input feature map X 1 Sequentially passing through convolution layers with convolution kernel number of 3 and convolution kernel size of 3X 3 and up-sampling layers with multiple of 2, and outputting the output result and the input characteristic diagram X 2 Adding element by element to obtain a feature pattern X with the size of H W3 and finally needing filtering operation gen (ii) a Half-filtered kernel X filter-out And feature map X gen Performing convolution operation of the custom filtering kernel to obtain a final feature diagram X with the size of H X W X3 out Feature map X out The picture is the picture which is required to be subjected to the shot rendering;
the specific method of step 3 is as follows:
the network model is trained as follows:
first, the 5-channel all-in-focus picture I_org+c produced in step 1.3 is input; then the saliency detection module and the semi-predictive filtering module preserve the salient features of the image and blur the background; finally, the loss function continuously optimizes the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh effect in the dataset constructed in step 1;
in the training process, the loss function L combines an L1 function with an LS function in order to raise the structural similarity between the model output picture I_bokeh and the reference picture I_gt; using the back-propagation of deep learning, the loss between the model output picture I_bokeh and the reference picture I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model, specifically expressed as:
L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)
wherein L1(I_bokeh, I_gt) is the reconstruction loss between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure loss between them, expressed as follows:
LS(I_bokeh, I_gt) = (1/N) Σ | Sobel(I_bokeh) − Sobel(I_gt) |
wherein Sobel denotes gradient computation on the picture in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N denotes the total number of pixels in the picture, i.e. the picture width W multiplied by the height H;
the specific method of the step 4 is as follows:
loading the weights of the bokeh rendering network model trained in step 3 and updating the parameters in the model; then feeding the resized picture from step 1.2, extended to the 5-channel input I_org+c, into the bokeh rendering network model, where it passes sequentially through the saliency detection module and the semi-predictive filtering module to obtain the model output picture I_bokeh with the bokeh rendering effect.
CN202110914290.0A 2021-08-10 2021-08-10 Rapid image bokeh rendering method based on semi-predictive filtering Active CN113810597B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110914290.0A (CN113810597B) | 2021-08-10 | 2021-08-10 | Rapid image bokeh rendering method based on semi-predictive filtering

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110914290.0A (CN113810597B) | 2021-08-10 | 2021-08-10 | Rapid image bokeh rendering method based on semi-predictive filtering

Publications (2)

Publication Number | Publication Date
CN113810597A | 2021-12-17
CN113810597B | 2022-12-13

Family

ID=78893425

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
CN202110914290.0A (CN113810597B) | Active | 2021-08-10 | 2021-08-10 | Rapid image bokeh rendering method based on semi-predictive filtering

Country Status (1)

Country Link
CN (1) CN113810597B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049780A (en) * 2022-05-26 2022-09-13 北京京东尚科信息技术有限公司 Deep rendering model training method and device, and target rendering method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665494A (en) * 2017-03-27 2018-10-16 北京中科视维文化科技有限公司 Depth of field real-time rendering method based on quick guiding filtering
CN112073632A (en) * 2020-08-11 2020-12-11 联想(北京)有限公司 Image processing method, apparatus and storage medium
CN112184586A (en) * 2020-09-29 2021-01-05 中科方寸知微(南京)科技有限公司 Method and system for rapidly blurring monocular visual image background based on depth perception

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108055452B (en) * 2017-11-01 2020-09-18 Oppo广东移动通信有限公司 Image processing method, device and equipment
CN109345449B (en) * 2018-07-17 2020-11-10 西安交通大学 Image super-resolution and non-uniform blur removing method based on fusion network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665494A (en) * 2017-03-27 2018-10-16 北京中科视维文化科技有限公司 Depth of field real-time rendering method based on quick guiding filtering
CN112073632A (en) * 2020-08-11 2020-12-11 联想(北京)有限公司 Image processing method, apparatus and storage medium
CN112184586A (en) * 2020-09-29 2021-01-05 中科方寸知微(南京)科技有限公司 Method and system for rapidly blurring monocular visual image background based on depth perception

Also Published As

Publication number Publication date
CN113810597A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN109493350B (en) Portrait segmentation method and device
TWI728465B (en) Method, device and electronic apparatus for image processing and storage medium thereof
Wang et al. Real-esrgan: Training real-world blind super-resolution with pure synthetic data
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
US20230080693A1 (en) Image processing method, electronic device and readable storage medium
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN111669514B (en) High dynamic range imaging method and apparatus
US20230146181A1 (en) Integrated machine learning algorithms for image filters
CN111372006B (en) High dynamic range imaging method and system for mobile terminal
CN112164011A (en) Motion image deblurring method based on self-adaptive residual error and recursive cross attention
CN112419191B (en) Image motion blur removing method based on convolution neural network
CN113344773B (en) Single picture reconstruction HDR method based on multi-level dual feedback
CN111861886B (en) Image super-resolution reconstruction method based on multi-scale feedback network
Liu et al. Face super-resolution reconstruction based on self-attention residual network
Zhang et al. Multi-branch networks for video super-resolution with dynamic reconstruction strategy
CN116152128A (en) High dynamic range multi-exposure image fusion model and method based on attention mechanism
CN113810597B (en) 2021-08-10 2022-12-13 Rapid image bokeh rendering method based on semi-predictive filtering
Zhao et al. Deep pyramid generative adversarial network with local and nonlocal similarity features for natural motion image deblurring
CN112819705A (en) Real image denoising method based on mesh structure and long-distance correlation
CN112184550B (en) Neural network training method, image fusion method, device, equipment and medium
CN117952883A (en) Backlight image enhancement method based on bilateral grid and significance guidance
Raimundo et al. LAN: Lightweight attention-based network for RAW-to-RGB smartphone image processing
Wang et al. Self-supervised multi-scale pyramid fusion networks for realistic bokeh effect rendering
CN111953888B (en) Dim light imaging method and device, computer readable storage medium and terminal equipment
CN116485654A (en) Lightweight single-image super-resolution reconstruction method combining convolutional neural network and transducer

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant