CN114022360B - Rendered image super-resolution system based on deep learning - Google Patents

Rendered image super-resolution system based on deep learning

Info

Publication number
CN114022360B
Authority
CN
China
Prior art keywords
image
rendered
closer
resolution
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111305312.XA
Other languages
Chinese (zh)
Other versions
CN114022360A (en)
Inventor
Ren Zhipeng
Zhao Jianping
Chen Chunyi
Lou Yan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University of Science and Technology
Priority to CN202111305312.XA
Publication of CN114022360A
Application granted
Publication of CN114022360B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20Indexing scheme for editing of 3D models
    • G06T2219/2012Colour editing, changing, or manipulating; Use of colour codes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20Indexing scheme for editing of 3D models
    • G06T2219/2016Rotation, translation, scaling
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a deep-learning-based rendered image super-resolution system. The rendered low-resolution image carries feature information such as depth, texture and normal vectors. Through deep learning, the image is simulated and trained, its feature information is extracted, and an optimized MFSR network model is obtained, so that the low-resolution image is converted into a high-resolution image. The reconstructed high-resolution image is evaluated with several indices, including peak signal-to-noise ratio (PSNR) and structural similarity (SSIM): the larger the PSNR value, the closer the reconstructed image is to the real sharp image; likewise, the closer the SSIM value is to 1, the closer the reconstructed image is to the real sharp image, and vice versa. The invention can enhance video image quality, improve video quality and improve the visual experience of users, while shortening rendering time, saving cost and bringing considerable economic benefit.

Description

Rendered image super-resolution system based on deep learning
Technical Field
The invention relates to the technical field of artificial intelligence and deep learning, in particular to a rendered image super-resolution system based on deep learning.
Background
At present, much research on image super-resolution has been carried out at home and abroad, and many algorithms and models have been proposed for super-resolution of general images. For rendered images, however, the sharper the image to be rendered, the higher the time and cost of rendering, and super-resolution that exploits information features such as edges, textures, depth and normal vectors to improve the accuracy of the prediction result has not yet been studied domestically; research on extracting feature information from rendered images is still at an early stage. The invention is based on part of the research and development results of the Jilin Province Science and Technology Center Natural Science Foundation project 20190201271JC. A deep-learning-based rendered image super-resolution system can shorten rendering time, save cost and bring considerable economic benefit.
Disclosure of Invention
The invention provides a deep-learning-based rendered image super-resolution system aimed at the image distortion problem in film and television production. A key technology for intelligently extracting image features and a key technology for dynamically dividing images for storage are studied, an image feature recognition and optimization algorithm based on a deep convolutional neural network is proposed, and a multi-scale, multi-feature super-resolution network model system is created. The model system fully extracts features such as image edges, textures and depth using deep learning, strengthening the network's ability to reconstruct image feature information. The created network model is compared, by objective evaluation indices, with existing interpolation algorithms on a self-built data set, and a high-resolution, high-definition image is obtained.
The technical scheme adopted by the invention is a deep-learning-based rendered image super-resolution system, characterized in that it comprises: an image data feature extraction module that collects and sorts pictures in the classical Disney rendered image data set to find more than 100 sharp rendered images, wherein each rendered image contains 10-dimensional information, the 10-dimensional information comprising R, G, B, normal.R, normal.G, normal.B, albedo.R, albedo.G, albedo.B and depth Z, the first three dimensions (the RGB channels) are downsampled by factors of 2, 3 and 4, restored to the original size by bicubic interpolation, and then cropped into 64 x 64 patches after horizontal and vertical flipping, so that blurred images are obtained as input data;
Creating a multi-feature super-resolution (MFSR) network model, wherein every 64 groups of blurred-image data form one batch, the 64 x 64 cropped pictures with 10 channels in each batch enter the training network and are learned by a 22-layer deep convolutional network, a 3-channel target image is generated after the data features of each channel of the blurred image are extracted, and an optimized MFSR network model is obtained by computing the loss function between the target image and the real image and back-propagating to update the weights;
and (3) carrying out result analysis, namely peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) evaluation of the reconstructed high-definition image, wherein the larger the PSNR value is, the closer the reconstructed image is to the real sharp image, and the closer the SSIM value is to 1, the closer the reconstructed image is to the real sharp image.
Further, the rendered image is an EXR-format multi-channel file.
Further, the mathematical expression of SSIM is: SSIM(X_ori, X_res) = [(2·μ_ori·μ_res + c_1)(2·σ_ori,res + c_2)] / [(μ_ori^2 + μ_res^2 + c_1)(σ_ori^2 + σ_res^2 + c_2)], where μ and σ denote the means, variances and covariance of the compared images and c_1, c_2 are stabilizing constants. The mathematical expression of PSNR is: PSNR = 10·log10(255^2 / MSE), with MSE = (1/(m·n))·ΣΣ (X_ori(i,j) − X_res(i,j))^2.
Wherein X_ori and X_res represent the true sharp image and the reconstructed image, MSE (Mean Squared Error) represents the error between X_ori and X_res, and m and n represent the number of rows and columns of the image, respectively.
The beneficial effects of the invention are as follows: using deep learning, an optimized network model is built from the rendered low-resolution image and its internal feature information, such as edges, texture, depth and normal vectors, is acquired, so that a high-resolution image can be obtained more accurately. The invention can shorten rendering time, save cost and bring considerable economic benefit.
Drawings
Fig. 1 is a network model diagram of the MFSR network of the deep-learning-based rendered image super-resolution system according to the present invention.
Fig. 2 is a block diagram of the modules of the deep-learning-based rendered image super-resolution system according to the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein can be arranged and designed in a wide variety of different configurations.
The main technical points of the invention lie in the method for extracting image data features and the optimized model system, which are as follows:
Data acquisition: the rendered image has information features such as edges, textures, depth and normal vectors; acquiring the data of these image features further reduces the data loss rate and further improves the corresponding accuracy.
By collecting and sorting pictures in the classical Disney rendered image data set, more than 100 sharp rendered images are obtained. The rendered images are EXR files containing 10-dimensional information, namely R, G, B, normal.R, normal.G, normal.B, albedo.R, albedo.G, albedo.B and depth Z, where R, G and B represent the red, green and blue color channels, normal.R, normal.G and normal.B represent the red, green and blue components of the normal vector, albedo.R, albedo.G and albedo.B represent the red, green and blue components of the albedo, and Z represents the depth of the image. The images are read in, the first three dimensions (the RGB channels) are downsampled by factors of 2, 3 and 4 and then restored to the original size by bicubic interpolation, and the images are flipped horizontally and vertically and cropped into 64 x 64 patches; only these cropped patches are used, so that blurred images are obtained as input data. A minimal data-preparation sketch is given after this paragraph.
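The data-preparation step described above can be sketched as follows. This is a minimal illustration rather than the patent's own code: it assumes the 10 EXR channels have already been read into a NumPy array (for example with an EXR reader such as OpenEXR or imageio), and the function names degrade_rgb and make_patches are introduced here purely for illustration.

```python
# Minimal sketch of the data-preparation step, assuming the 10 EXR channels have
# already been loaded into a float32 NumPy array of shape (H, W, 10) in the order
# R, G, B, normal.R, normal.G, normal.B, albedo.R, albedo.G, albedo.B, Z.
# Function and variable names are illustrative, not taken from the patent.
import numpy as np
import cv2


def degrade_rgb(channels: np.ndarray, scale: int) -> np.ndarray:
    """Downsample the first three (RGB) channels by `scale`, then restore them
    to the original size with bicubic interpolation, keeping the other channels."""
    h, w = channels.shape[:2]
    rgb = channels[:, :, :3]
    small = cv2.resize(rgb, (w // scale, h // scale), interpolation=cv2.INTER_CUBIC)
    blurred_rgb = cv2.resize(small, (w, h), interpolation=cv2.INTER_CUBIC)
    return np.concatenate([blurred_rgb, channels[:, :, 3:]], axis=2)


def make_patches(image: np.ndarray, patch: int = 64):
    """Flip horizontally/vertically and cut non-overlapping 64 x 64 patches."""
    variants = [image, image[:, ::-1], image[::-1, :]]
    patches = []
    for v in variants:
        for y in range(0, v.shape[0] - patch + 1, patch):
            for x in range(0, v.shape[1] - patch + 1, patch):
                patches.append(v[y:y + patch, x:x + patch])
    return patches


# Example: build blurred 10-channel inputs for the x2, x3 and x4 settings.
# exr_channels = ...  # (H, W, 10) array loaded from one Disney EXR file
# inputs = [p for s in (2, 3, 4) for p in make_patches(degrade_rgb(exr_channels, s))]
```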
Training: on the basis of in-depth study of network models such as VDSR, DRCN, LapSRN, DRRN, MemNet, IKN and MSICF, an innovative and optimized network model is proposed, so as to achieve dynamic scheduling and allocation of computing resources, maximize utilization and task-completion benefit, and ensure accurate and effective extraction of image feature information.
A multi-feature super-resolution network model, the MFSR network model, is proposed. Every 64 groups of blurred-image data form one batch; the 64 x 64 cropped pictures with 10 channels in each batch enter the training network, a 22-layer deep convolutional network with 3 x 3 convolution kernels, and a 3-channel target image is obtained after the data features of all channels of the blurred input image are extracted. The loss function of deep learning measures the difference between the network output and the real target image; by computing this loss and back-propagating to update the weights, the optimized network model is finally obtained. We set the momentum parameter to 0.9 and the weight decay to 10^-4. The learning rate is initialized to 0.1 and then reduced by a factor of 10 every 10 training cycles. Training is iterated 80 times; the loss function has converged to a considerable degree by the 50th iteration, and the training model of the 80th generation is used for testing, where the test results are better than those of the VDSR network model. A minimal sketch of such a network and training loop is given after this paragraph.
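The patent does not publish the exact layer composition of the 22-layer MFSR network, so the following PyTorch sketch is only an assumption modelled on VDSR-style convolutional networks with the stated hyper-parameters: 10 input channels, 3 x 3 kernels, a 3-channel output, SGD with momentum 0.9, weight decay 10^-4, and a learning rate of 0.1 reduced by a factor of 10 every 10 cycles. The class name MFSR, the feature width of 64, and the use of an MSE loss are assumptions, not taken from the text.

```python
# Hypothetical sketch of an MFSR-style network and its training setup.
import torch
import torch.nn as nn


class MFSR(nn.Module):
    def __init__(self, in_channels: int = 10, out_channels: int = 3,
                 features: int = 64, depth: int = 22):
        super().__init__()
        layers = [nn.Conv2d(in_channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):                       # middle convolutions
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(features, out_channels, 3, padding=1))  # 3-channel output
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Predict the sharp RGB image from the 10-channel blurred input.
        return self.body(x)


model = MFSR()
criterion = nn.MSELoss()                                  # assumed pixel-wise loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

# One illustrative training step on a batch of 64 patches of size 64 x 64.
inputs = torch.randn(64, 10, 64, 64)                      # blurred 10-channel patches
targets = torch.randn(64, 3, 64, 64)                      # real sharp RGB patches
loss = criterion(model(inputs), targets)
optimizer.zero_grad()
loss.backward()                                           # back-propagate
optimizer.step()                                          # update weights
scheduler.step()                                          # decay lr every 10 cycles
```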
Analysis of results: we performed a comparative study on the self-built data sets Car, Classroom, Bathroom and House, each of which contains 5 images. Index evaluation is performed on the reconstructed high-definition images, including peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and running time, ensuring that the results of this system are superior to those of the existing baseline methods.
The mathematical expression of SSIM is: SSIM(X_ori, X_res) = [(2·μ_ori·μ_res + c_1)(2·σ_ori,res + c_2)] / [(μ_ori^2 + μ_res^2 + c_1)(σ_ori^2 + σ_res^2 + c_2)], where μ_ori and μ_res are the means, σ_ori^2 and σ_res^2 the variances, and σ_ori,res the covariance of the two compared images, and c_1 and c_2 are small constants that stabilize the division. SSIM evaluates image quality by comparing the structural information of the compared images: the closer the SSIM value is to 1, the closer the reconstructed image is to the true sharp image, and vice versa.
The mathematical expression of PSNR is: PSNR = 10·log10(255^2 / MSE), with MSE = (1/(m·n))·ΣΣ (X_ori(i,j) − X_res(i,j))^2, wherein X_ori and X_res represent the true sharp image and the reconstructed image, MSE (Mean Squared Error) represents the error between X_ori and X_res, and m and n represent the number of rows and columns of the image, respectively. PSNR is given in dB. Notably, the larger the value of PSNR, the closer the reconstructed image is to the true sharp image. See Table 1. A minimal sketch of both metric computations follows.
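The evaluation described above can be reproduced with the standard PSNR and SSIM definitions. The sketch below uses scikit-image for both metrics and also computes PSNR manually from the MSE formula, assuming 8-bit images with a data range of 255; the array names x_ori and x_res are illustrative, not from the patent.

```python
# Sketch of the evaluation step using standard PSNR/SSIM definitions.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def psnr_db(x_ori: np.ndarray, x_res: np.ndarray) -> float:
    """PSNR in dB for 8-bit images: 10 * log10(255^2 / MSE)."""
    mse = np.mean((x_ori.astype(np.float64) - x_res.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)


# x_ori: true sharp image, x_res: reconstructed image, both uint8 arrays (H, W, 3).
x_ori = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
x_res = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

print("PSNR (manual):", psnr_db(x_ori, x_res))
print("PSNR (skimage):", peak_signal_noise_ratio(x_ori, x_res, data_range=255))
# channel_axis requires scikit-image >= 0.19; older versions use multichannel=True.
print("SSIM:", structural_similarity(x_ori, x_res, channel_axis=-1, data_range=255))
```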

Claims (3)

1. A deep-learning-based rendered image super-resolution system, characterized in that it comprises: an image data feature extraction module that collects and sorts pictures in the classical Disney rendered image data set to find more than 100 sharp rendered images, wherein each rendered image contains 10-dimensional information, the 10-dimensional information comprising R, G, B, normal.R, normal.G, normal.B, albedo.R, albedo.G, albedo.B and depth Z, the first three dimensions (the RGB channels) are downsampled by factors of 2, 3 and 4, restored to the original size by bicubic interpolation, and then cropped into 64 x 64 patches after horizontal and vertical flipping, so that blurred images are obtained as input data;
Creating a multi-feature super-resolution (MFSR) network model, wherein every 64 groups of blurred-image data form one batch, the 64 x 64 cropped pictures with 10 channels in each batch enter the training network and are learned by a 22-layer deep convolutional network, a 3-channel target image is generated after the data features of each channel of the blurred image are extracted, and an optimized MFSR network model is obtained by computing the loss function between the target image and the real image and back-propagating to update the weights;
and (3) carrying out result analysis, namely peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) evaluation of the reconstructed high-definition image, wherein the larger the PSNR value is, the closer the reconstructed image is to the real sharp image, and the closer the SSIM value is to 1, the closer the reconstructed image is to the real sharp image.
2. The deep-learning-based rendered image super-resolution system of claim 1, wherein: the rendered image is an EXR-format multi-channel file.
3. The deep-learning-based rendered image super-resolution system of claim 1, wherein: the mathematical expression of SSIM is: SSIM(X_ori, X_res) = [(2·μ_ori·μ_res + c_1)(2·σ_ori,res + c_2)] / [(μ_ori^2 + μ_res^2 + c_1)(σ_ori^2 + σ_res^2 + c_2)]; the mathematical expression of PSNR is: PSNR = 10·log10(255^2 / MSE), with MSE = (1/(m·n))·ΣΣ (X_ori(i,j) − X_res(i,j))^2; wherein X_ori and X_res represent the true sharp image and the reconstructed image, MSE (Mean Squared Error) represents the error between X_ori and X_res, and m and n represent the number of rows and columns of the image, respectively.
CN202111305312.XA 2021-11-05 2021-11-05 Rendered image super-resolution system based on deep learning Active CN114022360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111305312.XA CN114022360B (en) 2021-11-05 2021-11-05 Rendered image super-resolution system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111305312.XA CN114022360B (en) 2021-11-05 2021-11-05 Rendered image super-resolution system based on deep learning

Publications (2)

Publication Number Publication Date
CN114022360A CN114022360A (en) 2022-02-08
CN114022360B (en) 2024-05-03

Family

ID=80061316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111305312.XA Active CN114022360B (en) 2021-11-05 2021-11-05 Rendered image super-resolution system based on deep learning

Country Status (1)

Country Link
CN (1) CN114022360B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020015167A1 (en) * 2018-07-17 2020-01-23 西安交通大学 Image super-resolution and non-uniform blur removal method based on fusion network
WO2020037965A1 (en) * 2018-08-21 2020-02-27 北京大学深圳研究生院 Method for multi-motion flow deep convolutional network model for video prediction
CN109559276A (en) * 2018-11-14 2019-04-02 武汉大学 A kind of image super-resolution rebuilding method based on reference-free quality evaluation and characteristic statistics
CN111583109A (en) * 2020-04-23 2020-08-25 华南理工大学 Image super-resolution method based on generation countermeasure network
CN111754403A (en) * 2020-06-15 2020-10-09 南京邮电大学 Image super-resolution reconstruction method based on residual learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on super-resolution reconstruction based on CT images; Cao Hongyu; Liu Dongmei; Fu Xiuhua; Zhang Jing; Yue Pengfei; Journal of Changchun University of Science and Technology (Natural Science Edition); 2020-02-15 (No. 01); full text *
Image super-resolution algorithm based on reciprocal unit cell feature enhancement; Zhao Liling; Sun Quansen; Zhang Zelin; Journal of Graphics; 2017-12-31; Vol. 38 (No. 4); full text *

Also Published As

Publication number Publication date
CN114022360A (en) 2022-02-08

Similar Documents

Publication Publication Date Title
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
CN113362223B (en) Image super-resolution reconstruction method based on attention mechanism and two-channel network
CN101950365B (en) Multi-task super-resolution image reconstruction method based on KSVD dictionary learning
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN110675321A (en) Super-resolution image reconstruction method based on progressive depth residual error network
CN102156875A (en) Image super-resolution reconstruction method based on multitask KSVD (K singular value decomposition) dictionary learning
Luo et al. Lattice network for lightweight image restoration
CN113837946B (en) Lightweight image super-resolution reconstruction method based on progressive distillation network
CN111986085A (en) Image super-resolution method based on depth feedback attention network system
CN115880158A (en) Blind image super-resolution reconstruction method and system based on variational self-coding
CN116486074A (en) Medical image segmentation method based on local and global context information coding
CN113139904A (en) Image blind super-resolution method and system
CN110288529B (en) Single image super-resolution reconstruction method based on recursive local synthesis network
CN112288626A (en) Face illusion method and system based on dual-path depth fusion
Shi et al. Structure-aware deep networks and pixel-level generative adversarial training for single image super-resolution
CN113096015B (en) Image super-resolution reconstruction method based on progressive perception and ultra-lightweight network
CN114359039A (en) Knowledge distillation-based image super-resolution method
CN115511705A (en) Image super-resolution reconstruction method based on deformable residual convolution neural network
CN113160198A (en) Image quality enhancement method based on channel attention mechanism
Yu et al. A review of single image super-resolution reconstruction based on deep learning
CN112862946B (en) Gray rock core image three-dimensional reconstruction method for generating countermeasure network based on cascade condition
CN114022360B (en) Rendered image super-resolution system based on deep learning
Yu et al. Single image super-resolution based on improved WGAN
CN116703719A (en) Face super-resolution reconstruction device and method based on face 3D priori information
Yang et al. Deep networks for image super-resolution using hierarchical features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant