CN113810597A - Rapid image bokeh rendering method based on semi-predictive filtering - Google Patents

Rapid image bokeh rendering method based on semi-predictive filtering

Info

Publication number: CN113810597A
Authority: CN (China)
Prior art keywords: picture, filtering, module, kernel, semi
Legal status: Granted; currently Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202110914290.0A
Other languages: Chinese (zh)
Other versions: CN113810597B (en)
Inventors: 颜成钢, 陈泉, 马立栋, 郑博仑, 孙垚棋, 张继勇, 李宗鹏
Current Assignee: Hangzhou Dianzi University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Hangzhou Dianzi University
Events: application filed by Hangzhou Dianzi University; priority to CN202110914290.0A; publication of CN113810597A; application granted; publication of CN113810597B; anticipated expiration status noted

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95: Computational photography systems, e.g. light-field imaging systems
    • H04N23/951: Computational photography systems by using two or more images to influence resolution, frame rate or aspect ratio
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/048: Activation functions
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/04: Context-preserving transformations, e.g. by using an importance map
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007: Scaling based on interpolation, e.g. bilinear interpolation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

A fast image bokeh rendering method based on semi-predictive filtering comprises the following steps. First, pairs of photographs are captured with a single-lens reflex (SLR) camera under different scenes, and all pictures in the dataset are resized to 1024 × 1472 using bicubic interpolation; coordinates are then assigned to each processed full-focus picture to produce a coordinate map. Next, a fast image bokeh rendering network model based on semi-predictive filtering is constructed and trained; the network model comprises an attention module, a residual module, a semi-filter kernel module and an image generation module. Finally, the trained neural network model receives the pictures that require bokeh rendering and outputs them once the bokeh rendering is complete. The method achieves fast bokeh rendering of images while guaranteeing bokeh rendering quality, and innovatively introduces a coordinate map that assists the training of the network model and improves its ability to distinguish the important content of the input image.

Description

Rapid image bokeh rendering method based on semi-predictive filtering
Technical Field
The invention relates to a fast image bokeh rendering method based on semi-predictive filtering, and in particular to the field of bokeh effect processing based on deep learning technology.
Background
The bokeh rendering effect is generally regarded as one of the aesthetic standards in the field of photography. With existing technology it is easily achieved by a photographer using an SLR camera: the camera is set to a large-aperture shooting mode so that the uninteresting parts of the image are blurred. With the popularization of smartphones, manufacturers have tried to achieve this effect at the hardware level by adding complex hardware and extra cameras to the phone, but the high manufacturing cost is unfriendly to both vendors and consumers. Developing a software-level bokeh rendering algorithm for images has therefore become a research hotspot: such a method relies only on the computing performance of the phone, requires relatively little hardware cost, and suits most smartphones on the market. At present, most algorithms are based on deep learning and build an end-to-end network to render the bokeh effect of images. However, when a deep learning algorithm is integrated into a phone, shortening the running time is a difficult problem: running speed and rendering quality constrain each other, and how to unify them is a problem that must be considered.
Disclosure of Invention
The technical problem to be solved is as follows: aiming at the high cost of hardware-based implementation methods and the speed-versus-quality trade-off of software-based implementation methods, the invention provides a fast image bokeh rendering method based on semi-predictive filtering.
The implementation steps are as follows: the invention provides a fast image bokeh rendering method based on semi-predictive filtering, which comprises the following basic steps:
Step 1: making a data set.
Step 1.1: Capture data under different scenes with an SLR camera. The data for each scene is a pair of pictures: a full-focus picture I_org taken by the SLR, and a picture I_gt with a real bokeh rendering effect taken by the SLR using a large aperture. The full-focus picture I_org serves as the input image data during model training, and the picture I_gt with the real bokeh rendering effect serves as the reference data compared against the model output images during training.
Step 1.2: Resize all pictures of the dataset to height 1024 × width 1472 using bicubic interpolation.
Step 1.3: Make the coordinate map. Assign coordinates to the full-focus picture I_org processed in step 1.2. [The coordinate-assignment equations appear only as images in the original: BDA0003205091940000021 and BDA0003205091940000022.] X denotes the pixel coordinate along the height dimension of the picture and Y the pixel coordinate along the width dimension. The X and Y channels are combined with the full-focus picture I_org to reconstruct a 5-channel full-focus picture I_org+c, which serves as the final input picture of the network model. A data-preparation sketch follows below.
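The following is a minimal PyTorch sketch of steps 1.2 and 1.3. Since the coordinate-assignment equations survive only as images, normalizing X and Y to [0, 1] is an assumption, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def make_5ch_input(img: torch.Tensor) -> torch.Tensor:
    """Resize a full-focus picture with bicubic interpolation (step 1.2) and
    append X/Y coordinate channels (step 1.3), yielding I_org+c.

    img: float tensor of shape (3, H, W); returns (5, 1024, 1472).
    """
    target_h, target_w = 1024, 1472
    img = F.interpolate(img.unsqueeze(0), size=(target_h, target_w),
                        mode="bicubic", align_corners=False).squeeze(0)
    # X follows the height dimension, Y the width dimension, as in step 1.3;
    # normalization to [0, 1] is assumed, not taken from the patent equations.
    xs = torch.linspace(0.0, 1.0, target_h).view(-1, 1).expand(target_h, target_w)
    ys = torch.linspace(0.0, 1.0, target_w).view(1, -1).expand(target_h, target_w)
    return torch.cat([img, xs.unsqueeze(0), ys.unsqueeze(0)], dim=0)
```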
Step 2: constructing a fast image scene rendering network model based on semi-prediction filtering;
step 2.1: semi-prediction filtering-based fast graph and scene rendering task theoryAnd (6) derivation. Suppose the input is a full focus picture Iorg+cUsing significance detection algorithm to convert the full focus picture Iorg+cDivided into two parts including a significant characteristic part I in a picturefocusAnd background characteristics I of the picturedefocus. Background region picture I by utilizing semi-filtering fuzzy algorithmdefocusBlurring to obtain picture I with blurred backgroundblurThe semi-filtering fuzzy algorithm divides the salient feature part IfocusPreserving and finally obtaining the picture I with fuzzy backgroundblurAnd salient feature portion IfocusFusing to obtain the required picture I with the shot renderingbokeh. The theoretical model of the scene rendering task is formulated as follows:
Figure BDA0003205091940000023
wherein
Figure BDA0003205091940000024
Representing a saliency detection algorithm;
Figure BDA0003205091940000025
representing a semi-filtered blurring algorithm.
Step 2.2: constructing a fast image scene rendering network based on semi-prediction filtering;
the fast image scene rendering network based on semi-prediction filtering comprises an attention module, a residual error module, a semi-filtering kernel module and an image generation module. Wherein the attention module is used for detecting the input full focus picture Iorg+cFor assisting the operation of a subsequent restrictive prediction filtering module; the residual error module is used for carrying out deep feature enhancement on the input data; the semi-filter kernel module is used for generating a needed filter kernel, and is used for carrying out filter operation on an input image and blurring partial content of the image so as to generate a shot rendering effect, wherein the filter kernel consists of a network generated self-adaptive filter kernel and a small amount of Gabor filter kernels with artificially defined parameters, and the network generated self-adaptive filter kernel is used for self-adaptingBlurring an input image, wherein a Gabor filtering kernel of artificially defined parameters is used for reserving and enhancing salient region details and edge details of the image; the image generation module is used for generating a picture which needs to be filtered by using the filtering kernel generated by the half filtering kernel module.
The residual module has the following specific structure: the input feature map X_res of the residual module passes sequentially through 3 convolution layers, each with 64 convolution kernels of size 3 × 3, producing the output feature map X'_res. Finally, the output X'_res and the input X_res are added element-wise to obtain the final output feature map X_res-out of the residual module. All convolution layers are followed by a ReLU nonlinear activation function. A minimal sketch follows below.
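A PyTorch sketch of this residual module follows; same-padding (padding=1) is an assumption needed so that the element-wise skip addition is shape-valid.

```python
import torch.nn as nn

class ResidualModule(nn.Module):
    """Three 3x3 conv layers with 64 kernels, each followed by ReLU,
    plus an identity skip connection (X'_res + X_res)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        blocks = []
        for _ in range(3):
            blocks += [nn.Conv2d(channels, channels, 3, padding=1),
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*blocks)

    def forward(self, x):
        return self.body(x) + x  # X_res-out
```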
The attention module has the following specific structure: its input feature map X_att has dimensions height H × width W × channels C. The module is divided into three branches: up, middle (mid) and down. In the up branch, the input X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, producing the feature map X_up of shape HW × 64. In the mid branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape, producing the feature map X_mid of shape 64 × HW. X_up and X_mid are matrix-multiplied and activated with a Softmax function, producing the feature map X_act of shape HW × HW. In the down branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape, producing the feature map X_down of shape HW × 64. X_act and X_down are matrix-multiplied and reshaped again, producing the feature map X_final of shape H × W × 64. X_final then passes through a convolution layer with 64 kernels of size 3 × 3 and is added element-wise to the input feature map X_att, producing the final output feature map X_att-out of the attention module. All convolution layers are followed by a ReLU nonlinear activation function. A sketch follows below.
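A PyTorch sketch of this attention module (a non-local-style block) follows. The HW × HW affinity matrix is quadratic in the pixel count, which fits the network layout described later, where the attention branch runs at a reduced scale; taking C = 64 so the final skip addition is shape-valid is an assumption.

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Up/mid/down 3x3 conv branches (64 kernels each, ReLU), a Softmax
    HW x HW affinity, a closing 3x3 conv, and a skip back to the input."""
    def __init__(self, channels: int = 64):
        super().__init__()
        def conv():
            return nn.Sequential(nn.Conv2d(channels, 64, 3, padding=1),
                                 nn.ReLU(inplace=True))
        self.up, self.mid, self.down = conv(), conv(), conv()
        self.final = nn.Sequential(nn.Conv2d(64, channels, 3, padding=1),
                                   nn.ReLU(inplace=True))

    def forward(self, x):
        b, _, h, w = x.shape
        x_up = self.up(x).flatten(2).transpose(1, 2)       # B x HW x 64
        x_mid = self.mid(x).flatten(2)                     # B x 64 x HW
        x_act = torch.softmax(x_up @ x_mid, dim=-1)        # B x HW x HW
        x_down = self.down(x).flatten(2).transpose(1, 2)   # B x HW x 64
        x_final = (x_act @ x_down).transpose(1, 2).reshape(b, 64, h, w)
        return self.final(x_final) + x                     # X_att-out
```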
The semi-filter kernel module has the following specific structure: its input feature map X_filter has dimensions height H × width W × channels C. X_filter first passes through a residual module with 64 kernels to obtain the deep feature information X_deep, then sequentially through a convolution layer with 64 kernels of size 3 × 3 and a 2× upsampling layer, producing the feature map X'_deep of size 2H × 2W × 64 from which the filters are generated.

X'_deep is split along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16. X_A is used to generate the adaptive filter kernel: it passes sequentially through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function, producing the adaptive filter kernel X_adp-f of size 2H × 2W × k², where k is a predefined filter kernel size. X_B is used to generate the edge filter kernel by combining Gabor filter kernels with fixed parameters: X_B undergoes a custom filtering operation with 16 Gabor kernels with given parameters, and the 16 responses are linearly combined into the required edge filter kernel X_gabor-f of size 2H × 2W × k², used to rapidly enhance the edge information that the picture should retain. The 16 Gabor kernels cover 8 orientations, and the kernels sharing an orientation use 2 different sigma parameters, so all 16 kernels have distinct parameters. Finally, the adaptive kernel X_adp-f and the edge kernel X_gabor-f are added element-wise to obtain the final semi-filter kernel X_filter-out. A sketch follows below.
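A PyTorch sketch of this module follows, reusing the ResidualModule above. The Gabor wavelength and aspect-ratio values, the fixed depthwise Gabor bank, and the 1 × 1 convolution used for the linear combination are assumptions; the patent specifies only 8 orientations and 2 sigma values.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def gabor_kernel(ksize, sigma, theta, lam=4.0, gamma=0.5):
    """Real part of a Gabor filter; one fixed (non-learned) edge kernel.
    lam (wavelength) and gamma (aspect ratio) are assumed values."""
    half = ksize // 2
    ys, xs = torch.meshgrid(torch.arange(-half, half + 1, dtype=torch.float32),
                            torch.arange(-half, half + 1, dtype=torch.float32),
                            indexing="ij")
    x_t = xs * math.cos(theta) + ys * math.sin(theta)
    y_t = -xs * math.sin(theta) + ys * math.cos(theta)
    return torch.exp(-(x_t ** 2 + gamma ** 2 * y_t ** 2) / (2 * sigma ** 2)) \
        * torch.cos(2 * math.pi * x_t / lam)

class SemiFilterKernelModule(nn.Module):
    """Residual block, 3x3 conv, 2x upsampling, then a 48/16 channel split:
    the 48-channel part predicts a Softmax-normalized adaptive k*k kernel per
    pixel; the 16-channel part is filtered by 16 fixed Gabor kernels
    (8 orientations x 2 sigmas) and linearly combined into an edge kernel."""
    def __init__(self, k: int = 5, gabor_size: int = 5):
        super().__init__()
        self.res = ResidualModule(64)  # input assumed to have 64 channels
        self.conv = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1),
                                  nn.ReLU(inplace=True))
        self.adaptive = nn.Conv2d(48, k * k, 3, padding=1)
        bank = torch.stack([gabor_kernel(gabor_size, sigma, i * math.pi / 8)
                            for i in range(8) for sigma in (1.0, 2.0)])
        self.register_buffer("gabor", bank.unsqueeze(1))  # 16 x 1 x ks x ks
        self.combine = nn.Conv2d(16, k * k, 1)  # assumed linear combination

    def forward(self, x_filter):
        deep = self.conv(self.res(x_filter))              # X_deep
        deep = F.interpolate(deep, scale_factor=2,
                             mode="bilinear", align_corners=False)  # X'_deep
        x_a, x_b = deep.split([48, 16], dim=1)
        adp = torch.softmax(self.adaptive(x_a), dim=1)    # X_adp-f
        edges = F.conv2d(x_b, self.gabor,
                         padding=self.gabor.shape[-1] // 2, groups=16)
        return adp + self.combine(edges)                  # X_filter-out
```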
The image generation module has the following specific structure: it takes three inputs, namely the same-scale input feature map X_1, the low-scale upsampled input feature map X_2, and the semi-filter kernel X_filter-out generated by the semi-filter kernel module. X_1 passes sequentially through a convolution layer with 3 kernels of size 3 × 3 and a 2× upsampling layer, and the result is added element-wise to X_2, producing the feature map X_gen of size H × W × 3 on which the filtering operation is finally performed. The semi-filter kernel X_filter-out and the feature map X_gen then undergo the custom filter-kernel convolution, producing the final feature map X_out of size H × W × 3; X_out is the required bokeh-rendered picture. A sketch follows below.
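A sketch of the image generation module follows. Implementing the custom filter-kernel convolution as per-pixel kernel application via F.unfold is an assumption about a step the patent does not spell out.

```python
import torch.nn as nn
import torch.nn.functional as F

class ImageGenerationModule(nn.Module):
    """Fuses a same-scale feature map with an upsampled lower-scale input
    into a 3-channel image X_gen, then filters X_gen with the per-pixel
    k*k kernels from the semi-filter kernel module."""
    def __init__(self, in_channels: int = 64, k: int = 5):
        super().__init__()
        self.k = k
        self.to_rgb = nn.Conv2d(in_channels, 3, 3, padding=1)

    def forward(self, x1, x2, kernels):
        # x1: B x C x H/2 x W/2 features; x2: B x 3 x H x W;
        # kernels: B x k^2 x H x W (X_filter-out).
        up = F.interpolate(self.to_rgb(x1), scale_factor=2,
                           mode="bilinear", align_corners=False)
        x_gen = up + x2                                         # B x 3 x H x W
        b, _, h, w = x_gen.shape
        patches = F.unfold(x_gen, self.k, padding=self.k // 2)  # B x 3k^2 x HW
        patches = patches.view(b, 3, self.k * self.k, h * w)
        weights = kernels.view(b, 1, self.k * self.k, h * w)
        return (patches * weights).sum(dim=2).view(b, 3, h, w)  # X_out
```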
The complete network has the following specific structure: it is divided into 2 branches, each processing information at a different scale. The initial input of the network is the full-focus picture I_org+c generated in step 1.3. Branch 1 contains a residual module, a semi-filter kernel module and an image generation module, while branch 2 contains an attention module, a semi-filter kernel module and an image generation module. To strengthen the information correlation between the branches, the input of branch 2 consists of downsampled intermediate information of branch 1 together with downsampled full-focus picture information, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation. A structural sketch follows below.
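The following sketch wires the modules above into the two-branch layout. The specific scales and the exact cross-branch connections are assumptions consistent with the prose; FIG. 3, which defines them precisely, is not reproduced here.

```python
import torch.nn as nn
import torch.nn.functional as F

class BokehNet(nn.Module):
    """Two-branch layout from step 2.2, reusing the module sketches above.
    Branch 2 (attention-guided) runs on downsampled branch-1 features plus
    a downsampled copy of I_org+c; its output guides branch 1's generator."""
    def __init__(self, k: int = 5):
        super().__init__()
        self.stem1 = nn.Conv2d(5, 64, 3, padding=1)
        self.stem2 = nn.Conv2d(5, 64, 3, padding=1)
        self.res1 = ResidualModule(64)
        self.att2 = AttentionModule(64)
        self.kern1 = SemiFilterKernelModule(k)
        self.kern2 = SemiFilterKernelModule(k)
        self.gen1 = ImageGenerationModule(64, k)
        self.gen2 = ImageGenerationModule(64, k)

    def forward(self, i_orgc):                              # B x 5 x H x W
        def down(t, s):
            return F.interpolate(t, scale_factor=1.0 / s,
                                 mode="bilinear", align_corners=False)
        f1 = self.res1(self.stem1(down(i_orgc, 2)))         # branch 1, H/2
        f2 = self.att2(self.stem2(down(i_orgc, 4)) + down(f1, 2))  # H/4
        out2 = self.gen2(f2, down(i_orgc[:, :3], 2), self.kern2(f2))  # H/2
        guide = F.interpolate(out2, scale_factor=2,
                              mode="bilinear", align_corners=False)
        return self.gen1(f1, guide, self.kern1(f1))         # I_bokeh, H x W
```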
Step 3: Train the fast image bokeh rendering network model based on semi-predictive filtering.
The network model is trained as follows:
First, the 5-channel full-focus picture I_org+c produced in step 1.3 is fed in. Then, the saliency detection module and the restrictive prediction filtering module preserve the salient features of the image and blur the background. Finally, the loss function is used to continuously optimize the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh rendering effect in the dataset constructed in step 1.
During training, the loss function L combines an L1 term with an LS term to improve the structural similarity between the model output picture I_bokeh and the reference picture I_gt; using the backpropagation of deep learning, the difference between the model output I_bokeh and the reference I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model. The loss is expressed as:

L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)

where L1(I_bokeh, I_gt) is the reconstruction term between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure term between them:

LS(I_bokeh, I_gt) = (1/N) Σ |Sobel(I_bokeh) − Sobel(I_gt)|

where Sobel denotes gradient computation in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N is the total number of pixels of the picture, i.e. width W × height H. A sketch of this loss follows below.
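A PyTorch sketch of this loss follows; averaging over channels as well as pixels is an assumption about the normalization.

```python
import torch
import torch.nn.functional as F

def sobel(img: torch.Tensor) -> torch.Tensor:
    """Horizontal and vertical Sobel gradients of a B x C x H x W image,
    stacked along the channel dimension (2C output channels)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device)
    weight = torch.stack([kx, kx.t()]).unsqueeze(1)   # 2 x 1 x 3 x 3
    c = img.shape[1]
    return F.conv2d(img, weight.repeat(c, 1, 1, 1), padding=1, groups=c)

def bokeh_loss(i_bokeh: torch.Tensor, i_gt: torch.Tensor) -> torch.Tensor:
    """L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt), with LS comparing Sobel
    gradient maps as described in step 3."""
    return F.l1_loss(i_bokeh, i_gt) + F.l1_loss(sobel(i_bokeh), sobel(i_gt))
```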
Step 4: The trained neural network model receives the pictures requiring bokeh rendering and outputs them after the bokeh rendering is finished.
First, load the weights of the bokeh rendering network model trained in step 3 and update the parameters in the model. Then pass the resized 5-channel full-focus picture I_org+c from steps 1.2 and 1.3 into the bokeh rendering network model as input data; after passing sequentially through the saliency detection module and the restrictive prediction filtering module, it yields the model output picture I_bokeh with the bokeh rendering effect. A usage sketch follows below.
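A minimal inference sketch using the classes above; the checkpoint filename is hypothetical, not one named in the patent.

```python
import torch

model = BokehNet(k=5)
model.load_state_dict(torch.load("bokeh_net.pth", map_location="cpu"))
model.eval()

img = torch.rand(3, 768, 1024)                   # stand-in full-focus photo
with torch.no_grad():
    i_orgc = make_5ch_input(img).unsqueeze(0)    # 1 x 5 x 1024 x 1472
    i_bokeh = model(i_orgc)                      # 1 x 3 x 1024 x 1472
```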
The invention has the following beneficial effects:
1. A fast image bokeh rendering method based on semi-predictive filtering is innovatively proposed, achieving fast bokeh rendering of images while guaranteeing bokeh rendering quality.
2. A coordinate map is innovatively proposed for assisting the training of the network model and improving the network model's ability to distinguish the important content of the input image.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a flowchart of the bokeh rendering process for a single image;
FIG. 3 is a diagram of the bokeh rendering network based on semi-predictive filtering;
FIG. 4 shows the bokeh rendering effect generated for a car;
FIG. 5 shows the bokeh rendering effect generated for a street lamp.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention is first defined and explained below:
I_org: full-focus picture
I_org+c: 5-channel full-focus picture containing the coordinate-map information; the actual input of the network model
I_gt: picture with the real bokeh rendering effect
I_bokeh: model output picture with the bokeh rendering effect
FIG. 2 shows the flow of bokeh rendering for a single image.
As shown in FIG. 1, the present invention provides a fast image bokeh rendering method based on semi-predictive filtering, which comprises the following basic steps:
Step 1: Data set production.
Step 1.1: Capture data under different scenes with an SLR camera. The data for each scene is a pair of pictures: a full-focus picture I_org taken by the SLR, and a picture I_gt with a real bokeh rendering effect taken by the SLR using a large aperture. The full-focus picture I_org serves as the input image data during model training, and the picture I_gt serves as the reference data compared against the model output images during training.
Step 1.2: Resize all pictures of the dataset to 1024 × 1472 using bicubic interpolation; unifying the sizes of the dataset reduces the computation time required to train the network.
Step 1.3: Make the coordinate map. Assign coordinates to the full-focus picture I_org processed in step 1.2. [The coordinate-assignment equations appear only as images in the original.] X denotes the pixel coordinate along the height dimension of the picture and Y the pixel coordinate along the width dimension. The X and Y channels are combined with the full-focus picture I_org to reconstruct the 5-channel full-focus picture I_org+c as the final input picture of the network model.
Step 2: constructing a fast image scene rendering network model based on semi-prediction filtering;
step 2.1: and (3) carrying out theoretical derivation on a fast image scene rendering task based on semi-predictive filtering. Suppose the input is a full focus picture Iorg+cUsing significance detection algorithm to convert the full focus picture Iorg+cDivided into two parts including a significant characteristic part I in a picturefocusAnd background characteristics I of the picturedefocus. Background region picture I by utilizing semi-filtering fuzzy algorithmdefocusBlurring to obtain picture I with blurred backgroundblurThe semi-filtering fuzzy algorithm divides the salient feature part IfocusPreserving and finally obtaining the picture I with fuzzy backgroundblurAnd salient feature portion IfocusFusing to obtain the required picture I with the shot renderingbokeh. The theoretical model of the scene rendering task is formulated as follows:
Figure BDA0003205091940000073
wherein
Figure BDA0003205091940000074
Representing a saliency detection algorithm;
Figure BDA0003205091940000075
representing a semi-filtered blurring algorithm.
Step 2.2: constructing a fast image scene rendering network based on semi-prediction filtering:
the fast image scene rendering network based on semi-prediction filtering comprises an attention module, a residual error module, a semi-filtering kernel module and an image generation module. Wherein the attention module is used for detecting the input full focus picture Iorg+cFor assisting the operation of a subsequent restrictive prediction filtering module; the residual error module is used for carrying out deep feature enhancement on the input data; the semi-filtering kernel module is used for generating a required filtering kernel and is used for carrying out filtering operation on an input image and blurring partial content of a picture so as to generate a shot rendering effect, wherein the filtering kernel consists of a self-adaptive filtering kernel generated by a network and a small amount of Gabor filtering kernels manually defined with parameters, the self-adaptive filtering kernel generated by the network is used for self-adaptively blurring the input image, and the Gabor filtering kernel manually defined with parameters is used for reserving and enhancing salient region details and edge details of the image; the image generation module is used for generating a picture which needs to be filtered by using the filtering kernel generated by the half filtering kernel module.
The specific structure of the residual error module is as follows: input feature map X of residual moduleresSequentially obtaining an output feature map X 'after 3 convolution layers with the convolution kernel number of 64 and the convolution kernel size of 3X 3'res. Finally, will output X'resAnd input XresAdding element by element to obtain the final output characteristic diagram X of the residual error moduleres-out. Wherein all convolutional layers are followed by a ReLU nonlinear activation function.
Attention module specific structure: of attention modulesInput feature map XattThe dimension of (a) is height H width W channel C. The attention module is divided into three branches of up, middle mid and down, and a characteristic diagram X is inputattPerforming Reshape operation after convolution layer with convolution kernel number of 64 and convolution kernel size of 3X 3 after up branching to obtain feature diagram X with shape of HW X64up(ii) a Input feature map XattPerforming Reshape operation after convolution layers with the number of the convolution kernels passing through mid branches being 64 and the convolution kernel size being 3X 3 to obtain a feature map X with the shape of 64X HWmid(ii) a Will feature diagram XupAnd feature map XmidActivating by adopting a Softmax function after matrix multiplication is carried out to obtain a characteristic diagram X with the shape of HW and HWact(ii) a Input feature map XattAfter the convolution layer with the convolution kernel number of 64 and the convolution kernel size of 3X 3 is processed by Reshape operation, the characteristic diagram X with the shape of HW X64 is obtaineddown(ii) a Will feature diagram XactAnd feature map XdownAfter matrix multiplication, Reshape operation is carried out again to obtain a feature diagram X with the shape of H X W X64final(ii) a Characteristic diagram XfinalAfter passing through a convolution layer with the number of 3 convolution kernels being 64 and the convolution kernel size being 3X 3, the convolution layer is compared with the input feature map XattAdding element by element to obtain the final output characteristic diagram X of the attention moduleatt-out. Wherein all convolutional layers are followed by a ReLU nonlinear activation function.
The semi-filtering kernel module has the specific structure that: input feature map X of half-filter kernel modulefilterThe dimension of (a) is height H width W channel C. Input feature map XfilterObtaining deep characteristic information X through a residual error module with 64 filtering kernelsdeep(ii) a Sequentially passing through convolution layers with convolution kernel number of 64 and convolution kernel size of 3X 3 and up-sampling layers with multiple of 2 to obtain a feature map X 'of required generated filtering'deep2H x 2W x 64 in size;
will feature picture X'deepDivided by channel dimensions into a feature pattern X of size 2H X2W X48AAnd a feature pattern X of size 2H X2W X16B(ii) a Characteristic diagram XAFor generating adaptive filtering kernels, i.e. feature maps XASequentially passes through a convolution kernel by the number ofk2Convolution layer with convolution kernel size of 3 x 3 and Softmax activation function to obtain size of 2H x 2W x k2Adaptive filtering kernel Xadp-fWhere k is a predefined filter kernel size; characteristic diagram XBEdge filter kernels Gabor filter kernels for combined generation of fixed filter kernel parameters, i.e. feature maps XBPerforming custom filtering operation with 16 Gabor filtering kernels with given parameters, and performing linear combination on the 16 Gabor filtering kernels to obtain the required size of 2H x 2W x k2Edge filtering kernel Xgabor-fThe method is used for rapidly enhancing the edge information of the picture to be reserved, wherein 16 Gabor filtering kernels comprise 8 directions, and the Gabor filtering kernels in the same direction comprise 2 Sigma parameters, so that the parameters of the 16 Gabor filtering kernels are different; finally, the adaptive filter kernel X is processedadp-fAnd an edge filtering kernel Xgabot-fAdding element by element to obtain the final needed semi-filter kernel Xfilter-out
The image generation module has the specific structure that: the image generation module comprises three inputs, and an input feature map X with the same scale1Low-scale up-sampled input feature map X2Input half-filter kernel X generated by half-filter kernel modulefilter-out. Input feature map X1Sequentially passing through convolution layers with convolution kernel number of 3 and convolution kernel size of 3X 3 and up-sampling layer with multiple of 2, and outputting the result and inputting characteristic diagram X2Adding element by element to obtain a feature diagram X with the size H W3 and finally needing filtering operationgen(ii) a Half-filtered kernel Xfilter-outAnd feature map XgenPerforming convolution operation of the custom filtering kernel to obtain a final feature diagram X with the size of H X W X3outFeature map XoutThe picture is the required picture which is processed by the shot rendering.
The specific structure of the complete network: the complete network is divided into 2 branches, and each branch processes information with different scales; the initial input to the network is the full focus picture I generated in step 1.3org+c(ii) a Branch 1 contains a residual module, a half-filter kernel module and an image generation module, while branch 2 contains an attention module, a half-filter kernel module and an image generation module; to strengthenWith the information relevance between the branches, the input of the branch 2 is composed of the intermediate information and the full focus picture information of the branch 1 which are all subjected to down sampling, and the output result of the branch 2 is fed back to the image generation module of the branch 1 for guiding the operation of the image generation module.
FIG. 3 is a diagram of the bokeh rendering network based on semi-predictive filtering.
Step 3: Train the fast image bokeh rendering network model based on semi-predictive filtering.
The network model is trained as follows:
First, the 5-channel full-focus picture I_org+c produced in step 1.3 is fed in. Then, the saliency detection module and the restrictive prediction filtering module preserve the salient features of the image and blur the background. Finally, the loss function is used to continuously optimize the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh rendering effect in the dataset constructed in step 1.
During training, the loss function L combines an L1 term with an LS term to improve the structural similarity between the model output picture I_bokeh and the reference picture I_gt; using the backpropagation of deep learning, the difference between the model output I_bokeh and the reference I_gt is continuously reduced, thereby optimizing the bokeh-rendered picture I_bokeh output by the model. The loss is expressed as:

L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)

where L1(I_bokeh, I_gt) is the reconstruction term between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure term between them:

LS(I_bokeh, I_gt) = (1/N) Σ |Sobel(I_bokeh) − Sobel(I_gt)|

where Sobel denotes gradient computation in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N is the total number of pixels of the picture, i.e. width W × height H.
Step 4: The trained neural network model receives the pictures requiring bokeh rendering and outputs them after the bokeh rendering is finished.
First, load the weights of the bokeh rendering network model trained in step 3 and update the parameters in the model. Then pass the resized 5-channel full-focus picture I_org+c into the bokeh rendering network model as input data; after passing sequentially through the saliency detection module and the restrictive prediction filtering module, it yields the model output picture I_bokeh with the bokeh rendering effect.
FIG. 4 shows the bokeh rendering effect generated for a car;
FIG. 5 shows the bokeh rendering effect generated for a street lamp.

Claims (9)

1. A fast image bokeh rendering method based on semi-predictive filtering, characterized by comprising the following steps:
step 1: making a data set;
step 2: constructing a fast image bokeh rendering network model based on semi-predictive filtering;
step 3: training the fast image bokeh rendering network model based on semi-predictive filtering;
step 4: receiving, by the trained neural network model, the pictures requiring bokeh rendering, and outputting the pictures after the bokeh rendering is finished.
2. The fast image bokeh rendering method based on semi-predictive filtering according to claim 1, characterized in that the specific method of step 1 is as follows:
step 1.1: capturing data under different scenes with an SLR camera, the data of each scene being a pair of pictures, namely a full-focus picture I_org taken by the SLR and a picture I_gt with a real bokeh rendering effect taken by the SLR using a large aperture; the full-focus picture I_org serves as the input image data in the model training process, and the picture I_gt with the real bokeh rendering effect serves as the reference data compared with the model output images in the model training process;
step 1.2: resizing all pictures of the data set to a size of height 1024 × width 1472 by bicubic interpolation;
step 1.3: making the coordinate map: assigning coordinates to the full-focus picture I_org processed in step 1.2 [the coordinate-assignment equations appear only as images in the original], where X denotes the pixel coordinate along the height dimension of the picture and Y the pixel coordinate along the width dimension; the X and Y information is combined with the full-focus picture I_org to reconstruct the 5-channel full-focus picture I_org+c as the final input picture of the network model.
3. The fast image bokeh rendering method based on semi-predictive filtering according to claim 2, characterized in that the specific method of step 2 is as follows:
step 2.1: theoretical derivation of the bokeh rendering task based on semi-predictive filtering: suppose the input is a full-focus picture I_org+c; a saliency detection algorithm Φ divides the full-focus picture I_org+c into two parts, the salient-feature part I_focus of the picture and the background part I_defocus; a semi-filtering blur algorithm Ψ blurs the background part I_defocus to obtain the picture I_blur with a blurred background while preserving the salient-feature part I_focus; finally, the blurred background I_blur and the salient-feature part I_focus are fused to obtain the required bokeh-rendered picture I_bokeh; the theoretical model of the bokeh rendering task is formulated as:
I_bokeh = I_focus + Ψ(I_defocus), where (I_focus, I_defocus) = Φ(I_org+c),
Φ denoting the saliency detection algorithm and Ψ the semi-filtering blur algorithm;
step 2.2: constructing the fast image bokeh rendering network based on semi-predictive filtering;
the fast image bokeh rendering network based on semi-predictive filtering comprises an attention module, a residual module, a semi-filter kernel module and an image generation module; the attention module detects the salient content of the input full-focus picture I_org+c and assists the operation of the subsequent restrictive prediction filtering module; the residual module performs deep feature enhancement on the input data; the semi-filter kernel module generates the required filter kernels, used to filter the input image and blur part of the picture content so as to produce the bokeh rendering effect, each kernel consisting of a network-generated adaptive filter kernel and a small number of Gabor filter kernels with manually defined parameters, the network-generated adaptive kernel adaptively blurring the input image, and the Gabor kernels with manually defined parameters preserving and enhancing the salient-region details and edge details of the image; the image generation module generates the picture to be filtered using the filter kernels generated by the semi-filter kernel module;
the complete network is divided into 2 branches, each branch processing information at a different scale; the initial input of the network is the full-focus picture I_org+c generated in step 1.3; branch 1 contains a residual module, a semi-filter kernel module and an image generation module, while branch 2 contains an attention module, a semi-filter kernel module and an image generation module; to strengthen the information correlation between different branches, the input of branch 2 consists of downsampled intermediate information of branch 1 together with downsampled full-focus picture information, and the output of branch 2 is fed back to the image generation module of branch 1 to guide its operation.
4. The fast image bokeh rendering method based on semi-predictive filtering according to claim 3, characterized in that the residual module has the following specific structure: the input feature map X_res of the residual module passes sequentially through 3 convolution layers, each with 64 convolution kernels of size 3 × 3, producing the output feature map X'_res; finally, the output X'_res and the input X_res are added element-wise to obtain the final output feature map X_res-out of the residual module; all convolution layers are followed by a ReLU nonlinear activation function.
5. The fast image bokeh rendering method based on semi-predictive filtering according to claim 4, characterized in that the attention module has the following specific structure: the input feature map X_att of the attention module has dimensions height H × width W × channels C; the module is divided into three branches, up, middle (mid) and down; in the up branch, the input feature map X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape operation, producing the feature map X_up of shape HW × 64; in the mid branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape, producing the feature map X_mid of shape 64 × HW; X_up and X_mid are matrix-multiplied and activated with a Softmax function, producing the feature map X_act of shape HW × HW; in the down branch, X_att passes through a convolution layer with 64 kernels of size 3 × 3 followed by a Reshape, producing the feature map X_down of shape HW × 64; X_act and X_down are matrix-multiplied and reshaped again, producing the feature map X_final of shape H × W × 64; X_final then passes through a convolution layer with 64 kernels of size 3 × 3 and is added element-wise to the input feature map X_att, producing the final output feature map X_att-out of the attention module; all convolution layers are followed by a ReLU nonlinear activation function.
6. The fast image bokeh rendering method based on semi-predictive filtering according to claim 5, characterized in that the semi-filter kernel module has the following specific structure: the input feature map X_filter of the semi-filter kernel module has dimensions height H × width W × channels C; X_filter first passes through a residual module with 64 kernels to obtain the deep feature information X_deep, then sequentially through a convolution layer with 64 kernels of size 3 × 3 and a 2× upsampling layer, producing the feature map X'_deep of size 2H × 2W × 64 from which the filters are generated;
X'_deep is split along the channel dimension into a feature map X_A of size 2H × 2W × 48 and a feature map X_B of size 2H × 2W × 16; X_A is used to generate the adaptive filter kernel: it passes sequentially through a convolution layer with k² kernels of size 3 × 3 and a Softmax activation function, producing the adaptive filter kernel X_adp-f of size 2H × 2W × k², where k is a predefined filter kernel size; X_B is used to generate the edge filter kernel by combining Gabor filter kernels with fixed parameters: X_B undergoes a custom filtering operation with 16 Gabor kernels with given parameters, and the 16 responses are linearly combined into the required edge filter kernel X_gabor-f of size 2H × 2W × k², used to rapidly enhance the edge information that the picture should retain, wherein the 16 Gabor kernels cover 8 orientations and the kernels sharing an orientation use 2 different sigma parameters, so the parameters of the 16 Gabor kernels all differ; finally, the adaptive kernel X_adp-f and the edge kernel X_gabor-f are added element-wise to obtain the final semi-filter kernel X_filter-out.
7. The fast image bokeh rendering method based on semi-predictive filtering according to claim 6, characterized in that the image generation module has the following specific structure: the image generation module takes three inputs, namely the same-scale input feature map X_1, the low-scale upsampled input feature map X_2, and the semi-filter kernel X_filter-out generated by the semi-filter kernel module; X_1 passes sequentially through a convolution layer with 3 kernels of size 3 × 3 and a 2× upsampling layer, and the result is added element-wise to X_2, producing the feature map X_gen of size H × W × 3 on which the filtering operation is finally performed; the semi-filter kernel X_filter-out and the feature map X_gen undergo the custom filter-kernel convolution, producing the final feature map X_out of size H × W × 3; X_out is the required bokeh-rendered picture.
8. The fast image bokeh rendering method based on semi-predictive filtering according to any one of claims 3-7, characterized in that the specific method of step 3 is as follows:
the network model is trained as follows:
first, the 5-channel full-focus picture I_org+c produced in step 1.3 is fed in; then, the saliency detection module and the restrictive prediction filtering module preserve the salient features of the image and blur the background; finally, the loss function is used to continuously optimize the bokeh-rendered picture I_bokeh output by the model so that it gradually resembles the picture I_gt with the real bokeh rendering effect in the dataset constructed in step 1;
during training, the loss function L combines an L1 term with an LS term to improve the structural similarity between the model output picture I_bokeh and the reference picture I_gt, and the backpropagation of deep learning continuously reduces the difference between the model output I_bokeh and the reference I_gt, thereby optimizing the bokeh-rendered picture I_bokeh output by the model; it is expressed as:
L = L1(I_bokeh, I_gt) + LS(I_bokeh, I_gt)
where L1(I_bokeh, I_gt) is the reconstruction term between the model's bokeh-rendered output I_bokeh and the reference picture I_gt, and LS(I_bokeh, I_gt) is the structure term between them:
LS(I_bokeh, I_gt) = (1/N) Σ |Sobel(I_bokeh) − Sobel(I_gt)|
where Sobel denotes gradient computation in the horizontal and vertical directions, used to extract the contour structure of the picture content, and N is the total number of pixels of the picture, i.e. width W × height H.
9. The fast image bokeh rendering method based on semi-predictive filtering according to claim 8, characterized in that the specific method of step 4 is as follows:
loading the weights of the bokeh rendering network model trained in step 3 and updating the parameters in the model; then passing the resized 5-channel full-focus picture I_org+c into the bokeh rendering network model as input data, which passes sequentially through the saliency detection module and the restrictive prediction filtering module to obtain the model output picture I_bokeh with the bokeh rendering effect.
CN202110914290.0A 2021-08-10 2021-08-10 Rapid image bokeh rendering method based on semi-predictive filtering Active CN113810597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110914290.0A CN113810597B (en) Rapid image bokeh rendering method based on semi-predictive filtering

Publications (2)

Publication Number Publication Date
CN113810597A true CN113810597A (en) 2021-12-17
CN113810597B CN113810597B (en) 2022-12-13

Family

ID=78893425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110914290.0A Active CN113810597B (en) Rapid image bokeh rendering method based on semi-predictive filtering

Country Status (1)

Country Link
CN (1) CN113810597B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023226479A1 (en) * 2022-05-26 2023-11-30 北京京东尚科信息技术有限公司 Deep rendering model training method and apparatus, and target rendering method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665494A (en) * 2017-03-27 2018-10-16 北京中科视维文化科技有限公司 Depth of field real-time rendering method based on quick guiding filtering
US20190130532A1 (en) * 2017-11-01 2019-05-02 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image-processing method, apparatus and device
CN112073632A (en) * 2020-08-11 2020-12-11 联想(北京)有限公司 Image processing method, apparatus and storage medium
CN112184586A (en) * 2020-09-29 2021-01-05 中科方寸知微(南京)科技有限公司 Method and system for rapidly blurring monocular visual image background based on depth perception
US20210166350A1 (en) * 2018-07-17 2021-06-03 Xi'an Jiaotong University Fusion network-based method for image super-resolution and non-uniform motion deblurring

Also Published As

Publication number Publication date
CN113810597B (en) 2022-12-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant