CN112183727A - Generative adversarial network model, and bokeh effect rendering method and system based on the generative adversarial network model - Google Patents


Info

Publication number
CN112183727A
Authority
CN
China
Prior art keywords
shot
effect
picture
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011053131.8A
Other languages
Chinese (zh)
Inventor
冷聪
李成华
林嘉珉
程健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences
Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Original Assignee
Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences
Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Artificial Intelligence Chip Innovation Institute (Institute of Automation, Chinese Academy of Sciences) and Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Priority to CN202011053131.8A
Publication of CN112183727A
Legal status: Pending

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a bokeh effect rendering method based on a generative adversarial network. A lightweight network is designed, and instance normalization is re-implemented with operators supported by the TensorFlow Lite framework, so that every operator of the bokeh rendering model trained by the generative adversarial network, which consists of a glasses-shaped end-to-end generator and a multi-receptive-field discriminator, can be computed on a smartphone GPU without consuming large resources. Without relying on priors, the invention can still clearly detect the region to be kept in focus, renders the out-of-focus region with a natural blur, and handles a wide variety of scenes rather than only specific subjects such as portraits.

Description

Generative adversarial network model, and bokeh effect rendering method and system based on the generative adversarial network model
Technical Field
The invention relates to a bokeh effect rendering method and system based on a generative adversarial network, to general image data processing and image reconstruction techniques based on deep learning, and in particular to the field of bokeh effect processing and analysis built on neural networks.
Background
As living standards and aesthetic expectations rise, recording everyday life with image-capture devices has become a universal demand. In photography, the bokeh effect is regarded as one of the most important aesthetic criteria. With current technology, a single-lens reflex camera with a large-aperture lens can easily render an image with natural bokeh, but for the general public who record images with mobile phones, a smartphone can hardly be fitted with a large-aperture lens or other dedicated sensors, so existing smartphones struggle to capture photos with a bokeh effect.
In the development of synthetic bokeh rendering, a semantic segmentation method is usually used to segment people from the image and then blur the remaining regions; such approaches only address portrait photos, are strongly limited, and cannot handle photos of richer scenes. Synthetic bokeh rendering on smartphones has also been realized by relying on special or expensive hardware, but that approach is unsuitable for the low-end smartphone market.
Disclosure of Invention
The purpose of the invention is as follows: one object is to provide a method for building a bokeh effect rendering model based on a generative adversarial network model, so as to solve the problems in the prior art described above. A further object is to propose a system implementing the above method.
The technical scheme is as follows: a generative adversarial network model for training bokeh rendering comprises a discriminator and a generator. The discriminator receives pictures, takes part in the neural network training, and supervises the difference between the generated image blocks and the bokeh image blocks in the data set; the generator is an end-to-end convolutional neural network that outputs the bokeh-rendered picture.
In a further embodiment, the discriminator is further configured as:
a multi-receptive-field discriminator that receives a picture data set with rich scenes for training the neural network, and that supervises the difference between the generated image blocks and the corresponding bokeh image blocks in the data set while attending to details of image blocks at the same position but of different sizes.
The picture set contains rich scenes and the pictures appear in pairs: each picture without a blurring effect corresponds to one picture with a bokeh effect, where the pictures without the bokeh effect serve as the training set and the pictures with the bokeh effect serve as the label set for supervised learning.
The generator is further a two-stage network, and both stages adopt an encoder-decoder structure. In the first stage, the network learns the mapping from the image without a blurring effect to the residual between the input image and the corresponding bokeh image; the second stage refines the result to produce a realistic bokeh effect.
In the first stage, the base number of channels of the network is 16 and the maximum is 128. With input image I and the corresponding label-set output image O (the image with the bokeh effect), the residual R satisfies R = I - O, so I - R represents a coarse bokeh-rendered picture. The residual R produced by this stage of the network already carries a certain amount of depth information, so no extra depth information is needed as prior knowledge.
In the second stage, the base number of channels of the network is 32 and the maximum is 256; the coarse bokeh-rendered picture produced in the first stage is refined into a realistic bokeh picture.
In the generator, an encoder block consists of convolutional layers with a preset stride, comprises three downsampling layers, and outputs a feature map.
In the generator, a decoder block consists of three transposed convolutional layers with a preset stride and receives the feature map transformed by the residual blocks; the transformed feature map is obtained by passing the feature map output by the encoder block through a preset number of residual blocks. Each residual block connects conv, ReLU, instance norm, conv and ReLU layers in sequence, with an additive connection between its input and output.
The convolutional layers of the encoder block and the transposed convolutional layers of the decoder block are all activated by ReLU; the output layer of each of the two stages is realized by a convolution with a preset stride followed by a tanh function, and skip connections between the convolutional layers and their mirrored transposed convolutional layers are used to enhance the details of the output image.
A method for building a bokeh effect rendering model based on a generative adversarial network comprises the following steps:
step one, obtaining pictures for training;
step two, feeding the obtained pictures into the neural network for network training;
step three, obtaining the generative adversarial network model trained for bokeh rendering.
In a further embodiment, step one is further: for supervised learning, each training picture corresponds one-to-one to a label picture in the constructed training set, and the label picture is a picture with the bokeh rendering effect.
In a further embodiment, step two is further: the neural network used for training is a generative adversarial network composed of a glasses-shaped end-to-end generator and a multi-receptive-field discriminator. The discriminator receives the pictures, takes part in the neural network training, and supervises the difference between the generated image blocks and the bokeh image blocks in the data set. The generator is an end-to-end convolutional neural network that outputs the bokeh-rendered picture.
The generator is further a two-stage network, and both stages adopt an encoder-decoder structure. In the first stage, the network learns the mapping from the image without a blurring effect to the residual between the input image and the corresponding bokeh image; the second stage refines the result to produce a realistic bokeh effect.
In the first stage, the base number of channels of the network is 16 and the maximum is 128. With input image I and the corresponding label-set output image O (the image with the bokeh effect), the residual R satisfies R = I - O, so I - R represents a coarse bokeh-rendered picture. The residual R produced by this stage of the network already carries a certain amount of depth information, so no extra depth information is needed as prior knowledge. In the second stage, the base number of channels of the network is 32 and the maximum is 256; the coarse bokeh-rendered picture produced in the first stage is refined into a realistic bokeh picture.
In the generator, an encoder block consists of convolutional layers with a preset stride, comprises three downsampling layers, and outputs a feature map. A decoder block consists of a set number of transposed convolutional layers with a preset stride and receives the feature map transformed by the residual blocks. Each residual block connects conv, ReLU, instance norm, conv and ReLU layers in sequence, with an additive connection between its input and output. The transformed feature map is obtained by passing the feature map output by the encoder block through 9 residual blocks.
The convolutional layers of the encoder block and the transposed convolutional layers of the decoder block are all activated by ReLU; the output layers of both stages are realized by a convolution with a preset stride followed by tanh, and skip connections between the convolutional layers and their mirrored transposed convolutional layers are used to enhance the details of the output image.
During network training, the instance normalization inside the generator's residual blocks is handled with the lightweight framework TensorFlow Lite so that the image-to-image translation task can be carried out, and the instance normalization is further re-implemented with operators supported by the TensorFlow Lite framework; that is, it is computed per channel of a single sample. Instance normalization is expressed as:
y_{tijk} = (x_{tijk} - μ_{ti}) / sqrt(σ_{ti}^2 + ε)
where
μ_{ti} = (1/(H·W)) Σ_{l=1}^{W} Σ_{m=1}^{H} x_{tilm},  σ_{ti}^2 = (1/(H·W)) Σ_{l=1}^{W} Σ_{m=1}^{H} (x_{tilm} - μ_{ti})^2,
x_{tijk} denotes the tijk-th element, in which k and j index the spatial dimensions (height and width), i is the feature channel, and t is the index of the image in the batch; μ_{ti} denotes the mean and σ_{ti}^2 the variance. Because the feature map of each layer has a constant size, tf.nn.avg_pool2d is used to compute μ_{ti} and σ_{ti}^2.
The loss function involved in training the network is:
L_total = 0.5·L_1 + 0.05·L_SSIM + 0.1·L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural-similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
L_1 = (1/(H·W·C)) Σ_{i,j,k} | G(I)_{i,j,k} - C_{i,j,k} |
L_VGG = (1/(H·W·C)) Σ_{i,j,k} | F(G(I))_{i,j,k} - F(C)_{i,j,k} |
where H is the height of the image, W its width and C its number of channels, F(·) is the feature map output by the 34th layer of a VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
In a further embodiment, step three is further: the parameters of the network model are optimized through supervised learning on a large set of training pictures until the loss function converges, yielding a generative model that obtains a picture with the bokeh rendering effect from a single input picture.
A bokeh effect rendering system based on a generative adversarial network works as follows: first, the generative adversarial network model trained for bokeh rendering is saved and converted into a tflite file; second, the tflite file is deployed to a mobile phone; third, a picture without a background-blurring effect is input to the deployed phone; then the phone's GPU (graphics processing unit) is invoked; finally, a bokeh-rendered picture is obtained through the neural network. The system specifically comprises the following modules:
a first module for obtaining the bokeh rendering model based on the generative adversarial network;
a second module for deploying the bokeh rendering model based on the generative adversarial network to the mobile phone, which saves the model obtained by the first module, converts it into a tflite file that can be deployed to the phone, and deploys the tflite file to the user's phone;
a third module for invoking the phone's GPU: after the model deployment is completed, a picture to be bokeh-rendered is input; once the picture is received, the phone's GPU is invoked to start computing and the bokeh rendering of the picture begins;
a fourth module for obtaining and presenting the picture with the bokeh rendering effect: the picture generated by the third module is presented visually.
Beneficial effects: the invention provides a bokeh effect rendering method based on a generative adversarial network and a system implementing the method. A lightweight network is designed, and instance normalization is re-implemented with operators supported by the TensorFlow Lite framework, so that every operator of the bokeh rendering model trained by the generative adversarial network, which consists of a glasses-shaped end-to-end generator and a multi-receptive-field discriminator, can be computed on a smartphone GPU without consuming large resources. Without relying on priors, the invention can still clearly detect the region to be kept in focus, renders the out-of-focus region with a natural blur, and handles a wide variety of scenes rather than only specific subjects such as portraits.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a block diagram of the generator of the adversarial network of the present invention.
FIG. 3 is a diagram of the multi-receptive-field discriminator of the present invention.
FIG. 4 compares the bokeh rendering of the present invention with recent algorithms in the field.
FIG. 5 shows the effect of the present invention applied on a mobile phone.
Detailed Description
The applicant believes that, although smartphone synthetic bokeh rendering has been realized by relying on special or expensive hardware, this is not suitable for the low-end smartphone market.
To solve the problems in the prior art and to make bokeh rendering deployable on low-end smartphone devices, the invention proposes a bokeh rendering method based on a generative adversarial network and a system implementing the method.
The present invention will be further described in detail with reference to the following examples and accompanying drawings.
In the present application, we propose a bokeh rendering method based on a generative adversarial network and a system implementing the method; the method specifically includes the following steps:
Step one, obtaining pictures for training. A data set of paired pictures with rich scenes is adopted, in which each picture without a blurring effect corresponds to a picture with a bokeh effect in the picture training set.
Step two, feeding the obtained pictures into the neural network for network training. The neural network used for training is a generative adversarial network composed of a glasses-shaped end-to-end generator and a multi-receptive-field discriminator. The discriminator receives the pictures, takes part in the neural network training, and supervises the difference between the generated image blocks and the bokeh image blocks in the data set. The generator is an end-to-end convolutional neural network that outputs the bokeh-rendered picture.
The discriminator used by the invention serves as a strategy for generating more realistic bokeh images and adopts multi-receptive-field supervision; its structure is shown in FIG. 3. The discriminator supervises the difference between generated image blocks of size 70 × 70 and the corresponding bokeh image blocks in the data set. The depth of the PatchGAN discriminator is also reconsidered and modified during the design, so that the network attends to details of image blocks at the same position but of different sizes. In the adversarial part, the network combines PatchGAN discriminators of different depths into a multi-receptive-field discriminator. This multi-receptive-field supervision helps the generator produce results that better match human visual perception. A sketch of such a discriminator is given below.
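As a rough illustration of the multi-receptive-field supervision just described, the following sketch builds PatchGAN-style discriminators of several depths and collects them so that each one scores the same generated crop at a different scale of detail. It is a minimal sketch only: the depths (2, 3, 4), the channel widths, the 4×4 kernels and the LeakyReLU slope are assumptions for illustration, not values taken from this description.

```python
import tensorflow as tf
from tensorflow.keras import layers


def patch_discriminator(depth, base_channels=64):
    """A PatchGAN-style discriminator of a given depth; a deeper variant covers a
    larger receptive field, a shallower one focuses on finer local detail."""
    inp = layers.Input(shape=(None, None, 3))
    x = inp
    channels = base_channels
    for _ in range(depth):
        x = layers.Conv2D(channels, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        channels = min(channels * 2, 512)
    # One real/fake score per patch rather than a single score per image.
    score_map = layers.Conv2D(1, 4, strides=1, padding="same")(x)
    return tf.keras.Model(inp, score_map)


# Combining PatchGAN discriminators of different (assumed) depths gives the
# multi-receptive-field discriminator: each sub-discriminator judges the same
# generated image block at a different scale during adversarial training.
multi_rf_discriminators = [patch_discriminator(d) for d in (2, 3, 4)]
```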
The invention uses fewer generator parameters and a smaller model, so it runs easily on portable devices such as smartphones. As shown in FIG. 2, the generator is further a two-stage network, and both stages adopt an encoder-decoder structure. In the first stage, the network learns the mapping from the image without a blurring effect to the residual between the input image and the corresponding bokeh image; the second stage refines the result to produce a realistic bokeh effect.
In the first stage, the base number of channels of the network is 16 and the maximum is 128. With input image I and the corresponding label-set output image O (the image with the bokeh effect), the residual R satisfies R = I - O, so I - R represents a coarse bokeh-rendered picture. The residual R produced by this stage of the network already carries a certain amount of depth information, so no extra depth information is needed as prior knowledge. In the second stage, the base number of channels of the network is 32 and the maximum is 256; the coarse picture I - R with the bokeh rendering effect produced in the first stage is refined into a realistic bokeh picture.
In the generator, an encoder block consists of convolutional layers with a stride of 2, comprises three downsampling layers, and outputs a feature map. A decoder block consists of three transposed convolutional layers with a stride of 2 and receives the feature map transformed by the residual blocks. Each residual block connects conv, ReLU, instance norm, conv and ReLU layers in sequence, with an additive connection between its input and output. The transformed feature map is obtained by passing the feature map output by the encoder block through 9 residual blocks.
The convolutional layers of the encoder block and the transposed convolutional layers of the decoder block are all activated by ReLU; the output layers of both stages are realized by a stride-1 convolution plus tanh, and skip connections between the convolutional layers and their mirrored transposed convolutional layers are used to enhance the details of the output image. A sketch of one such stage is given below.
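To make the structure above concrete, the following sketch builds one encoder-decoder stage of the generator: three stride-2 convolutions, nine residual blocks of the conv-ReLU-instance norm-conv-ReLU form with an additive skip, three stride-2 transposed convolutions with skip connections to the mirrored encoder layers, and a stride-1 convolution plus tanh output. The kernel sizes, the exact channel progression and the fixed 256×256 input are assumptions for illustration; the second stage would reuse the same structure with a base of 32 and a maximum of 256 channels, taking the coarse result I - R of the first stage as its input.

```python
import tensorflow as tf
from tensorflow.keras import layers


class InstanceNorm(layers.Layer):
    """Plain instance normalization over the spatial dimensions of each sample/channel."""
    def call(self, x):
        mean, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        return (x - mean) * tf.math.rsqrt(var + 1e-5)


def residual_block(x, channels):
    """conv -> ReLU -> instance norm -> conv -> ReLU with an additive skip connection."""
    y = layers.Conv2D(channels, 3, padding="same", activation="relu")(x)
    y = InstanceNorm()(y)
    y = layers.Conv2D(channels, 3, padding="same", activation="relu")(y)
    return layers.add([x, y])


def generator_stage(base_channels=16, max_channels=128, num_res_blocks=9, size=256):
    """One stage of the generator; size must be divisible by 8 so the mirrored skips align."""
    inp = layers.Input(shape=(size, size, 3))
    # Encoder: three stride-2 downsampling convolutions.
    e1 = layers.Conv2D(base_channels, 3, strides=2, padding="same", activation="relu")(inp)
    e2 = layers.Conv2D(min(base_channels * 2, max_channels), 3, strides=2,
                       padding="same", activation="relu")(e1)
    e3 = layers.Conv2D(min(base_channels * 4, max_channels), 3, strides=2,
                       padding="same", activation="relu")(e2)
    # Residual blocks transform the bottleneck feature map.
    x = e3
    for _ in range(num_res_blocks):
        x = residual_block(x, x.shape[-1])
    # Decoder: three stride-2 transposed convolutions with skips to the mirrored encoder layers.
    x = layers.Conv2DTranspose(e2.shape[-1], 3, strides=2, padding="same", activation="relu")(x)
    x = layers.add([x, e2])
    x = layers.Conv2DTranspose(e1.shape[-1], 3, strides=2, padding="same", activation="relu")(x)
    x = layers.add([x, e1])
    x = layers.Conv2DTranspose(base_channels, 3, strides=2, padding="same", activation="relu")(x)
    # Output layer: stride-1 convolution followed by tanh.
    out = layers.Conv2D(3, 3, strides=1, padding="same", activation="tanh")(x)
    return tf.keras.Model(inp, out)
```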
During network training, instance normalization is present in the residual blocks of the generator; if it is removed, the generated images can no longer reach the natural, realistic bokeh of a network with instance normalization. Meanwhile, to deploy the model to mobile phones more conveniently, the invention adopts the lightweight framework TensorFlow Lite.
Under the lightweight framework TensorFlow Lite, the instance normalization in the generator's residual blocks used for the image-to-image translation task is computed per channel of a single sample, and is expressed as:
y_{tijk} = (x_{tijk} - μ_{ti}) / sqrt(σ_{ti}^2 + ε)
where
μ_{ti} = (1/(H·W)) Σ_{l=1}^{W} Σ_{m=1}^{H} x_{tilm},  σ_{ti}^2 = (1/(H·W)) Σ_{l=1}^{W} Σ_{m=1}^{H} (x_{tilm} - μ_{ti})^2,
x_{tijk} denotes the tijk-th element, in which k and j index the spatial dimensions (height and width), i is the feature channel, and t is the index of the image in the batch; μ_{ti} denotes the mean and σ_{ti}^2 the variance. Because the feature map of each layer has a constant size, tf.nn.avg_pool2d is used to compute μ_{ti} and σ_{ti}^2.
The TensorFlow Lite framework does not support accelerating instance normalization on the phone GPU, so using the stock instance normalization adds extra memory overhead for CPU-to-GPU synchronization and greatly increases the image-processing time. To solve this problem, instance normalization is re-implemented with operators that the TensorFlow Lite framework does support, so that all operations of the final model can run on the smartphone's GPU. Verified in the application, the re-implemented instance normalization makes the model built by the invention run nearly 6 times faster on the phone.
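A minimal sketch of such a re-implementation is given below, using only operators that TensorFlow Lite can delegate to the mobile GPU (average pooling, subtraction, multiplication and rsqrt). The function name and epsilon value are illustrative, and learnable scale/shift parameters are omitted; the fixed feature-map size of each layer is what allows the pooling window to be set statically, as noted above.

```python
import tensorflow as tf


def instance_norm_tflite_friendly(x, epsilon=1e-5):
    """Instance normalization expressed with TFLite-supported ops.

    x: feature map of shape [batch, height, width, channels]; height and width
    must be static, which holds here because each layer's feature map has a
    constant size.
    """
    h, w = int(x.shape[1]), int(x.shape[2])
    # Per-sample, per-channel mean via average pooling over the full spatial extent.
    mean = tf.nn.avg_pool2d(x, ksize=[1, h, w, 1], strides=[1, 1, 1, 1], padding="VALID")
    # Per-sample, per-channel variance, computed the same way.
    variance = tf.nn.avg_pool2d((x - mean) ** 2, ksize=[1, h, w, 1],
                                strides=[1, 1, 1, 1], padding="VALID")
    return (x - mean) * tf.math.rsqrt(variance + epsilon)
```

Because every operation in this function has a TensorFlow Lite GPU kernel, a converted model that uses it avoids the CPU-GPU synchronization that the stock instance-normalization op would otherwise force.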
The loss function involved in training the network is:
L_total = 0.5·L_1 + 0.05·L_SSIM + 0.1·L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural-similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss. L_VGG passes the input and the output through a VGG19 model pre-trained on the ImageNet data set and computes the mean absolute error on the feature maps of the 34th layer of VGG19; L_adv denotes the adversarial loss, which improves the final output of the generator through the adversary between the generator and the discriminator.
L_1 = (1/(H·W·C)) Σ_{i,j,k} | G(I)_{i,j,k} - C_{i,j,k} |
L_VGG = (1/(H·W·C)) Σ_{i,j,k} | F(G(I))_{i,j,k} - F(C)_{i,j,k} |
where F(·) is the feature map output by the 34th layer of the VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
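The weighted sum above can be sketched in TensorFlow as follows. Only the 0.5/0.05/0.1/1 weights come from the description; the choice of the Keras layer 'block5_conv4' as a stand-in for the "34th layer" of VGG19, the omission of VGG preprocessing, the SSIM settings and the non-saturating form of the adversarial term are assumptions for illustration.

```python
import tensorflow as tf

# Assumed feature extractor standing in for the "34th layer" of VGG19.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg_features = tf.keras.Model(vgg.input, vgg.get_layer("block5_conv4").output)
vgg_features.trainable = False


def total_loss(generated, target, disc_score_on_generated):
    """L_total = 0.5*L1 + 0.05*L_SSIM + 0.1*L_VGG + L_adv for images scaled to [0, 1].

    Note: VGG19 input preprocessing (mean subtraction) is omitted in this sketch.
    """
    l1 = tf.reduce_mean(tf.abs(generated - target))
    l_ssim = 1.0 - tf.reduce_mean(tf.image.ssim(generated, target, max_val=1.0))
    # Perceptual loss: mean absolute error between VGG19 feature maps.
    l_vgg = tf.reduce_mean(tf.abs(vgg_features(generated) - vgg_features(target)))
    # Adversarial term: push the discriminator's patch scores on generated images toward "real".
    l_adv = tf.reduce_mean(tf.keras.losses.binary_crossentropy(
        tf.ones_like(disc_score_on_generated), disc_score_on_generated, from_logits=True))
    return 0.5 * l1 + 0.05 * l_ssim + 0.1 * l_vgg + l_adv
```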
Step three, obtaining the generative adversarial network model trained for bokeh rendering. After supervised learning on a large set of training pictures, a network model whose parameters have been optimized until the loss function converges is obtained; this network model can obtain a picture with the bokeh rendering effect from a single input picture.
Based on the above method, a system implementing the method can be constructed. The implementing system first saves the generative adversarial network model trained for bokeh rendering and converts it into a tflite file; second, the tflite file is deployed to a mobile phone; third, a picture without a background-blurring effect is input to the deployed phone; then the phone's GPU (graphics processing unit) is invoked; finally, a bokeh-rendered picture is obtained through the neural network. The system specifically comprises the following modules:
a first module for obtaining the bokeh rendering model based on the generative adversarial network, which extracts the model trained in the preceding stage;
a second module for deploying the bokeh rendering model based on the generative adversarial network to the mobile phone, which saves the model obtained by the first module, converts it into a tflite file that can be deployed to the phone, and deploys the tflite file to the user's phone;
a third module mainly for issuing the GPU computation instruction after an input picture is received: specifically, the phone's input end receives a picture to be bokeh-rendered, the GPU computation instruction is triggered upon receipt, and the phone's GPU then starts computing and begins the bokeh rendering of the picture;
a fourth module for obtaining and presenting the picture with the bokeh rendering effect generated by the third module on the visual interface of the phone. A sketch of the conversion and deployment flow is given after this list.
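The conversion and deployment flow handled by the first and second modules can be sketched as follows; the file names and the saved-model path are placeholders, and on the phone the .tflite file would be executed through the TensorFlow Lite runtime with the GPU delegate rather than the Python interpreter used here for a quick desktop check.

```python
import tensorflow as tf

# First/second module: save the trained generator and convert it to a .tflite file.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_bokeh_generator")  # placeholder path
tflite_model = converter.convert()
with open("bokeh_generator.tflite", "wb") as f:
    f.write(tflite_model)

# Quick desktop sanity check of the converted model; on the phone the same file
# is loaded by the TensorFlow Lite runtime and delegated to the GPU.
interpreter = tf.lite.Interpreter(model_path="bokeh_generator.tflite")
interpreter.allocate_tensors()
print(interpreter.get_input_details()[0]["shape"])
```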
As shown in FIG. 4, comparing the effect produced by the bokeh rendering method of the present invention with the algorithms proposed by Dutta and PyNET, it can be clearly seen that the bokeh image generated by the invention is the most natural: objects in the in-focus region of the generated bokeh image are clearly visible and the foreground and background are well separated, whereas the other two results are somewhat blurred and fail to separate foreground from background. From left to right, FIG. 4 shows the input picture, the result of the algorithm proposed by Dutta, the result of PyNET, and the result after applying the present invention. Because the PyNET and Dutta methods rely heavily on the generated MegaDepth map, both methods produce poor results once the depth map fails to provide accurate depth information.
The invention can be applied to mobile phones and realizes real-time bokeh rendering on the phone. With the bokeh effect rendering system based on the generative adversarial network, the phone needs only an ordinary photo taken by its camera, without relying on an expensive camera module or a multi-camera system and without estimating a depth map in advance, to finish processing the photo, highlight the main subject, and blur the background. FIG. 5 shows the results of processing photos taken by some mobile phones with the invention: the left side is the original photo taken by the phone, and the right side is the photo with the bokeh effect produced by the algorithm. It can be seen that the method provided by the invention achieves a natural bokeh effect while keeping the subject sharp.
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A generative adversarial network model, comprising a discriminator and a generator; the discriminator is configured to receive pictures, take part in the neural network training, and supervise the difference between the generated image blocks and the bokeh image blocks in the data set; the generator is an end-to-end convolutional neural network configured to output the bokeh-rendered picture.
2. The generative adversarial network model according to claim 1, wherein the discriminator is further configured as:
a multi-receptive-field discriminator configured to receive a picture data set with rich scenes for training the neural network, and to supervise the difference between the generated image blocks and the corresponding bokeh image blocks in the data set while attending to details of image blocks at the same position but of different sizes;
wherein the picture set contains rich scenes and the pictures appear in pairs: each picture without a blurring effect corresponds to one picture with a bokeh effect, the pictures without the bokeh effect serving as the training set and the pictures with the bokeh effect serving as the label set for supervised learning.
3. The generative adversarial network model according to claim 1, wherein the generator is further a two-stage network, and both stages adopt an encoder-decoder structure; in the first stage, the network learns the mapping from the image without a blurring effect to the residual between the input image and the corresponding bokeh image; in the second stage, the result is refined to produce a realistic bokeh effect;
in the first stage, the base number of channels of the network is 16 and the maximum is 128; with input image I and the corresponding label-set output image O (the image with the bokeh effect), the residual R satisfies R = I - O, so I - R represents a coarse bokeh-rendered picture; the residual R produced by this stage of the network already carries a certain amount of depth information, so no extra depth information is needed as prior knowledge;
in the second stage, the base number of channels of the network is 32 and the maximum is 256; the coarse bokeh-rendered picture produced in the first stage is refined into a realistic bokeh picture;
in the generator, an encoder block consists of convolutional layers with a preset stride, comprises three downsampling layers, and outputs a feature map;
in the generator, a decoder block consists of a preset number of transposed convolutional layers with a preset stride and receives the feature map transformed by the residual blocks; the transformed feature map is obtained by passing the feature map output by the encoder block through a preset number of residual blocks; each residual block connects conv, ReLU, instance norm, conv and ReLU layers in sequence, with an additive connection between its input and output;
the convolutional layers of the encoder block and the transposed convolutional layers of the decoder block are all activated by ReLU, the output layers of both stages are realized by a convolution with a preset stride plus tanh, and skip connections between the convolutional layers and their mirrored transposed convolutional layers are used to enhance the details of the output image.
4. A bokeh effect rendering method based on the generative adversarial network model according to any one of claims 1 to 3, comprising:
first, acquiring pictures for training; then feeding the obtained pictures into the generative adversarial network model for training bokeh rendering according to any one of claims 1-3 for network training; finally, obtaining a network model that generates the bokeh rendering effect; wherein, in the constructed training set, each training picture corresponds one-to-one to a label picture, the label picture being the corresponding picture with the bokeh effect for supervised learning; during network training, the instance normalization present in the residual blocks of the generator is realized with operators supported by the TensorFlow Lite framework; and the loss function involved in the training process is L_total.
5. The bokeh effect rendering method of claim 4, wherein the neural network is further configured as follows:
the neural network used for training is a generative adversarial network composed of a glasses-shaped end-to-end generator and a multi-receptive-field discriminator; the discriminator receives the pictures, takes part in the neural network training, and supervises the difference between the generated image blocks and the bokeh image blocks in the data set; the generator is an end-to-end convolutional neural network that outputs the bokeh-rendered picture;
the generator is further a two-stage network, and both stages adopt an encoder-decoder structure; in the first stage, the network learns the mapping from the image without a blurring effect to the residual between the input image and the corresponding bokeh image; in the second stage, the result is refined to produce a realistic bokeh effect;
in the first stage, the base number of channels of the network is 16 and the maximum is 128; with input image I and the corresponding label-set output image O (the image with the bokeh effect), the residual R satisfies R = I - O, so I - R represents a coarse bokeh-rendered picture; the residual R produced by this stage of the network already carries a certain amount of depth information, so no extra depth information is needed as prior knowledge;
in the second stage, the base number of channels of the network is 32 and the maximum is 256; the coarse bokeh-rendered picture produced in the first stage is refined into a realistic bokeh picture;
in the generator, an encoder block consists of convolutional layers with a preset stride, comprises three downsampling layers, and outputs a feature map;
in the generator, a decoder block consists of a preset number of transposed convolutional layers with a preset stride and receives the feature map transformed by the residual blocks; the transformed feature map is obtained by passing the feature map output by the encoder block through a preset number of residual blocks; each residual block connects conv, ReLU, instance norm, conv and ReLU layers in sequence, with an additive connection between its input and output;
the convolutional layers of the encoder block and the transposed convolutional layers of the decoder block are all activated by ReLU, the output layers of both stages are realized by a stride-1 convolution plus tanh, and skip connections between the convolutional layers and their mirrored transposed convolutional layers are used to enhance the details of the output image.
6. The bokeh effect rendering method of claim 4, wherein the instance normalization is further:
normalizing the instances in the residual blocks of the generator with the lightweight framework TensorFlow Lite so as to perform the image-to-image translation task, the instance normalization being further re-implemented with operators supported by the TensorFlow Lite framework, i.e. computed per channel of a single sample, wherein the instance normalization is expressed as:
y_{tijk} = (x_{tijk} - μ_{ti}) / sqrt(σ_{ti}^2 + ε)
where
μ_{ti} = (1/(H·W)) Σ_{l=1}^{W} Σ_{m=1}^{H} x_{tilm},  σ_{ti}^2 = (1/(H·W)) Σ_{l=1}^{W} Σ_{m=1}^{H} (x_{tilm} - μ_{ti})^2,
x_{tijk} denotes the tijk-th element, in which k and j index the spatial dimensions (height and width), i is the feature channel, and t is the index of the image in the batch; μ_{ti} denotes the mean and σ_{ti}^2 the variance; and because the feature map of each layer has a constant size, tf.nn.avg_pool2d is used to compute μ_{ti} and σ_{ti}^2.
7. The bokeh effect rendering method of claim 4, wherein the loss function is further:
the loss function used in the process of training the network is:
L_total = 0.5·L_1 + 0.05·L_SSIM + 0.1·L_VGG + L_adv
where L_1 is the mean absolute error, L_SSIM the structural-similarity loss, L_VGG the perceptual loss, and L_adv the adversarial loss;
L_1 = (1/(H·W·C)) Σ_{i,j,k} | G(I)_{i,j,k} - C_{i,j,k} |
L_VGG = (1/(H·W·C)) Σ_{i,j,k} | F(G(I))_{i,j,k} - F(C)_{i,j,k} |
where H is the height of the image, W its width and C its number of channels, F(·) is the feature map output by the 34th layer of the VGG19 network pre-trained on ImageNet, G(I)_{i,j,k} is the picture generated by the generator, C_{i,j,k} is the corresponding original picture with the bokeh effect, and D(·) is the output of the discriminator.
8. A bokeh effect rendering system based on the generative adversarial network model, for implementing the method according to any one of claims 4-7, characterized in that: first, the generative adversarial network model trained for bokeh rendering according to claim 1 is saved and converted into a tflite file, and the tflite file is then deployed to a mobile phone; next, a picture without a background-blurring effect is input to the deployed phone; then the phone's GPU (graphics processing unit) is invoked; finally, a bokeh-rendered picture is obtained through the neural network.
CN202011053131.8A 2020-09-29 2020-09-29 Countermeasure generation network model, and shot effect rendering method and system based on countermeasure generation network model Pending CN112183727A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011053131.8A CN112183727A (en) 2020-09-29 2020-09-29 Countermeasure generation network model, and shot effect rendering method and system based on countermeasure generation network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011053131.8A CN112183727A (en) 2020-09-29 2020-09-29 Countermeasure generation network model, and shot effect rendering method and system based on countermeasure generation network model

Publications (1)

Publication Number Publication Date
CN112183727A true CN112183727A (en) 2021-01-05

Family

ID=73946759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011053131.8A Pending CN112183727A (en) 2020-09-29 2020-09-29 Countermeasure generation network model, and shot effect rendering method and system based on countermeasure generation network model

Country Status (1)

Country Link
CN (1) CN112183727A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
US20190251401A1 (en) * 2018-02-15 2019-08-15 Adobe Inc. Image composites using a generative adversarial neural network
CN110324603A (en) * 2018-03-29 2019-10-11 三星电子株式会社 For handling method, electronic equipment and the medium of image
WO2020117657A1 (en) * 2018-12-03 2020-06-11 Google Llc Enhancing performance capture with real-time neural rendering
CN109712203A (en) * 2018-12-29 2019-05-03 福建帝视信息科技有限公司 A kind of image rendering methods based on from attention generation confrontation network
CN110211192A (en) * 2019-05-13 2019-09-06 南京邮电大学 A kind of rendering method based on the threedimensional model of deep learning to two dimensional image
CN111476783A (en) * 2020-04-13 2020-07-31 腾讯科技(深圳)有限公司 Image processing method, device and equipment based on artificial intelligence and storage medium
CN111625608A (en) * 2020-04-20 2020-09-04 中国地质大学(武汉) Method and system for generating electronic map according to remote sensing image based on GAN model
CN111583135A (en) * 2020-04-24 2020-08-25 华南理工大学 Nuclear prediction neural network Monte Carlo rendering image denoising method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANDREY IGNATOV等: "AIM 2019 Challenge on Bokeh Effect Synthesis: Methods and Results", 《2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOP (ICCVW)》, 5 March 2020 (2020-03-05), pages 3591 - 3598 *
MING QIAN等: "BGGAN: Bokeh-Glass Generative Adversarial Network for Rendering Realistic Bokeh", 《ECCV 2020: COMPUTER VISION – ECCV 2020 WORKSHOPS》, vol. 12537, 30 January 2021 (2021-01-30), pages 229 *
XINGE ZHU等: "Generative Adversarial Frontal View to Bird View Synthesis", 《2018 INTERNATIONAL CONFERENCE ON 3D VISION (3DV)》, 14 October 2018 (2018-10-14), pages 454 - 463 *
袁琳君等: "基于生成对抗网络的人像修复", 《计算机应用》, vol. 40, no. 03, 20 November 2019 (2019-11-20), pages 842 - 846 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022217470A1 (en) * 2021-04-13 2022-10-20 Shanghaitech University Hair rendering system based on deep neural network

Similar Documents

Publication Publication Date Title
CN108345892B (en) Method, device and equipment for detecting significance of stereo image and storage medium
CN109948721B (en) Video scene classification method based on video description
CN111985281B (en) Image generation model generation method and device and image generation method and device
CN113507627B (en) Video generation method and device, electronic equipment and storage medium
CN100505840C (en) Method and device for transmitting face synthesized video
CN112200732B (en) Video deblurring method with clear feature fusion
CN110751649A (en) Video quality evaluation method and device, electronic equipment and storage medium
CN110599411A (en) Image restoration method and system based on condition generation countermeasure network
CN116863320B (en) Underwater image enhancement method and system based on physical model
CN112597824A (en) Behavior recognition method and device, electronic equipment and storage medium
CN113949808A (en) Video generation method and device, readable medium and electronic equipment
CN111597978A (en) Method for automatically generating pedestrian re-identification picture based on StarGAN network model
CN112183727A (en) Countermeasure generation network model, and shot effect rendering method and system based on countermeasure generation network model
CN106778576A (en) A kind of action identification method based on SEHM feature graphic sequences
CN113902636A (en) Image deblurring method and device, computer readable medium and electronic equipment
CN111524060B (en) System, method, storage medium and device for blurring portrait background in real time
CN112184586A (en) Method and system for rapidly blurring monocular visual image background based on depth perception
CN117097853A (en) Real-time image matting method and system based on deep learning
CN113409331B (en) Image processing method, image processing device, terminal and readable storage medium
CN113254713B (en) Multi-source emotion calculation system and method for generating emotion curve based on video content
WO2022235785A1 (en) Neural network architecture for image restoration in under-display cameras
CN113920023A (en) Image processing method and device, computer readable medium and electronic device
CN112232302A (en) Face recognition method
CN114005157A (en) Micro-expression recognition method of pixel displacement vector based on convolutional neural network
CN112200816A (en) Method, device and equipment for segmenting region of video image and replacing hair

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: Room 203b, building 3, artificial intelligence Industrial Park, 266 Chuangyan Road, Qilin science and Technology Innovation Park, Jiangning District, Nanjing City, Jiangsu Province, 211000

Applicant after: Zhongke Fangcun Zhiwei (Nanjing) Technology Co.,Ltd.

Applicant after: Zhongke Nanjing artificial intelligence Innovation Research Institute

Address before: Room 203b, building 3, artificial intelligence Industrial Park, 266 Chuangyan Road, Qilin science and Technology Innovation Park, Jiangning District, Nanjing City, Jiangsu Province, 211000

Applicant before: Zhongke Fangcun Zhiwei (Nanjing) Technology Co.,Ltd.

Applicant before: NANJING ARTIFICIAL INTELLIGENCE CHIP INNOVATION INSTITUTE, INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES