CN112991231A - Single-image super-resolution and perceptual image enhancement joint task learning system - Google Patents

Single-image super-resolution and perceptual image enhancement joint task learning system

Info

Publication number
CN112991231A
CN112991231A (application CN202110466163.9A)
Authority
CN
China
Prior art keywords
image
network
color correction
learning system
joint task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110466163.9A
Other languages
Chinese (zh)
Other versions
CN112991231B (en)
Inventor
袁峰
李晓
张越皖
徐亦飞
李浬
桑葛楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Oying Network Technology Co., Ltd.
Original Assignee
Hangzhou Oying Network Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Oying Network Technology Co., Ltd.
Publication of CN112991231A
Application granted
Publication of CN112991231B
Legal status: Active

Classifications

    • G06T 5/00 Image enhancement or restoration → G06T 5/90 Dynamic range modification of images or parts thereof
    • G06N 3/00 Computing arrangements based on biological models → G06N 3/02 Neural networks → G06N 3/04 Architecture, e.g. interconnection topology → G06N 3/045 Combinations of networks
    • G06T 7/00 Image analysis → G06T 7/90 Determination of colour characteristics
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement → G06T 2207/20 Special algorithmic details → G06T 2207/20081 Training; Learning
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement → G06T 2207/20 Special algorithmic details → G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a single-image super-resolution and perceptual image enhancement joint task learning system, an electronic device, and a storage medium. The learning system comprises a deep network model, and the deep network model comprises a first network, a second network, and a third network. The first network comprises a large encoder-decoder residual network, a small encoder-decoder residual network, and a local residual network built from stacked lightweight multi-scale residual blocks with different kernel sizes, and describes local and global information simultaneously through a multi-path learning strategy. The second network comprises two groups of convolutional layers that are connected in parallel and share information, and is used for enhancing the high-frequency details of the image. The third network comprises a U-net network and a color correction module that seeks an optimal fused color correction matrix to learn color and tone mapping. The application thereby addresses the inability of prior joint image super-resolution and perceptual image enhancement methods to substantially improve the visual quality of images.

Description

Single-image super-resolution and perceptual image enhancement joint task learning system
Technical Field
The present application relates to the field of computer vision, and in particular to a single-image super-resolution and perceptual image enhancement joint task learning system, an electronic device, and a storage medium.
Background
Image super-resolution and perceptual image enhancement are among the core technologies in computer vision and image processing. In recent years, deep learning has achieved considerable performance on a variety of computer vision tasks and has greatly advanced both image super-resolution and perceptual image enhancement. For image super-resolution, there are various deep learning methods based on conventional convolutional neural networks and on generative adversarial networks (GANs). For perceptual image enhancement, a series of automatic processing methods address color restoration, image sharpness, brightness, contrast, and the like. As for the combination of image super-resolution and perceptual image enhancement, prior attempts to generate an enhanced perceptual image from an original low-resolution image execute the super-resolution and enhancement methods sequentially; however, because errors propagate through the cascade, such sequential execution is inefficient and its accuracy is difficult to guarantee. When the two tasks are executed under a joint scheme, by contrast, their outputs can complement each other and produce better results.
At present, the following solutions exist for the joint task of image super-resolution and perceptual image enhancement. E. Schwartz, R. Giryes, and A. M. Bronstein, 2018, "DeepISP: Toward learning an end-to-end image processing pipeline", use a deep neural network to learn the color correction mapping of a specific digital camera. X. Xu, Y. Ma, and W. Sun, 2019, "Towards real scene super-resolution with raw images", design a dual network that exploits raw data and color images simultaneously to super-resolve real scenes and that transfers well across different cameras. K. Mei, J. Li, J. Zhang, H. Wu, J. Li, and R. Huang, 2019, "Higher-resolution network for image demosaicing and enhancing", learn image features at different resolutions along two parallel paths. In such joint tasks, however, these methods treat perceptual image enhancement merely as a by-product of solving real-scene image super-resolution, and most of them attend more to details than to colors, so they cannot substantially improve the visual quality of the image.
For the problem in the related art that the joint task of image super-resolution and perceptual image enhancement cannot substantially improve the visual quality of images, no effective solution has yet been proposed.
Disclosure of Invention
In this embodiment, a single-image super-resolution and perceptual image enhancement joint task learning system, an electronic device, and a storage medium are provided to solve the problem in the related art that the joint task of image super-resolution and perceptual image enhancement cannot substantially improve the visual quality of images.
In a first aspect, in this embodiment, a single-image super-resolution and perceptual image enhancement joint task learning system is provided, including:
a learning module comprising a deep network model comprising a first network, a second network, and a third network;
the first network comprises a large encoder-decoder residual network, a small encoder-decoder residual network, and a local residual network, wherein the large encoder-decoder residual network uses more RRDBs than the small encoder-decoder residual network, and the local residual network comprises stacked lightweight multi-scale residual blocks with different kernel sizes;
the second network is used for upsampling the image and comprises two groups of convolutional layers that are connected in parallel and share information;
the third network comprises a U-net network and a color correction module, wherein the U-net network is used for extracting image features, the color correction module is used for generating a color correction matrix from the extracted image features, and the color correction matrix is used for performing color correction on the output images of the first network and the second network;
the input of the deep network model is a training image pair comprising a single original image and a single corrected image, and the original image serves simultaneously as the input of the first network, the second network, and the third network.
In some of these embodiments, the learning module further comprises an image preprocessing unit;
the image preprocessing unit is used for, before the original image is input into the second network, filtering the original image to obtain a base information layer image, performing an element-wise division of the original image by the base information layer image to obtain a detail information layer image, and finally superimposing the original image and the detail information layer image to obtain a result image, which is input into the second network.
In some of these embodiments, the two groups of convolutional layers of the second network employ different convolution kernel sizes.
In some of these embodiments, the large encoder-decoder residual network and the small encoder-decoder residual network employ different downsampling scales.
In some of these embodiments, the configuration of the residual blocks in the large encoder-decoder residual network and the small encoder-decoder residual network comprises: deleting the batch normalization layer, replacing the PReLU layer with an RReLU layer, and/or deleting the channel attention module.
In some embodiments, in the encoding stage, the U-net network down-samples the original image to a first image and processes the first image according to a first strategy and a second strategy, respectively;
the first strategy includes: down-sampling the first image to obtain a second image and copying the second image;
the second strategy includes: performing global average pooling on the first image to obtain a third image and copying the third image;
and in the decoding stage, the U-net network merges the copies of the second image and the copies of the third image into a first feature map and concatenates the first feature map with the first image to obtain a second feature map.
In some of these embodiments, the color correction module includes a concatenation unit, a global color correction unit, and a local color correction unit; the concatenation unit is used for concatenating the output images of the large encoder-decoder residual network, the small encoder-decoder residual network, and the local residual network to obtain a fifth image, feeding the fifth image into the third network, and resizing and upsampling the fifth image to obtain a third feature map;
the global color correction unit is used for inputting the third image into a fully connected layer, which outputs a global color correction matrix;
and the local color correction unit is used for applying the global color correction matrix to the third feature map to obtain a local color correction matrix, which is applied to the output images of the first network and the second network.
In some of these embodiments, the optimization goal of the deep network model is the minimization of a loss function, the loss function being a linear superposition of content loss, total variation loss, color loss, multi-scale structural similarity loss, and pixel loss.
In some of these embodiments, a skip connection is provided between the decoding stage of the large encoder-decoder residual network and the encoding stage of the small encoder-decoder residual network.
In some of these embodiments, the system further comprises a training module for inputting a set of training images into the learning module to train the deep network model.
In a second aspect, in this embodiment, an electronic device is provided, comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute the computer program to implement the single-image super-resolution and perceptual image enhancement joint task learning system according to the first aspect.
In a third aspect, in this embodiment, a storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the single-image super-resolution and perceptual image enhancement joint task learning system described in the first aspect above.
Compared with the prior art, in the single-image super-resolution and perceptual image enhancement joint task learning system, the electronic device, and the storage medium provided herein, the learning system trains a deep network model on a training image set. The model contains three sub-networks: the first network describes local and global information simultaneously through a multi-path learning strategy; the second network upsamples the image and enhances high-frequency details using two groups of convolutional layers that are connected in parallel and share information; and the third network seeks a color correction matrix to learn color and tone mapping. The invention can recover more image detail and achieve better contrast; endow the image with vivid, natural colors so that the reconstruction result is more realistic; and remove noise and artifacts, producing a more visually pleasing result. It thus solves the problem in the prior art that the joint task of image super-resolution and perceptual image enhancement cannot substantially improve the visual quality of images.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a block diagram of the hardware structure of a terminal running the single-image super-resolution and perceptual image enhancement joint task learning system provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of the operation of the deep network model in the single-image super-resolution and perceptual image enhancement joint task learning system according to an embodiment;
FIG. 3 is a schematic diagram of the second network in the single-image super-resolution and perceptual image enhancement joint task learning system according to an embodiment;
FIG. 4 is a schematic diagram of the image preprocessing unit in the single-image super-resolution and perceptual image enhancement joint task learning system according to an embodiment.
Detailed Description
For a clearer understanding of the objects, aspects and advantages of the present application, reference is made to the following description and accompanying drawings.
Unless defined otherwise, technical or scientific terms used herein shall have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a", "an", "the", and similar referents in this application do not denote a limitation of quantity and may be singular or plural. The terms "comprises," "comprising," "has," "having," and any variations thereof are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or modules but may include other steps or modules (elements) not listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not limited to physical or mechanical connections but may include electrical connections, whether direct or indirect. "A plurality" in this application means two or more. "And/or" describes an association relationship of associated objects and covers three cases; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In general, the character "/" indicates an "or" relationship between the associated objects. The terms "first," "second," "third," and the like in this application are used to distinguish similar items and do not necessarily describe a particular sequential or chronological order.
The system embodiments provided herein may be implemented in a terminal, a computer, or a similar computing device. Taking implementation on a terminal as an example, fig. 1 is a block diagram of the hardware structure of a terminal running the single-image super-resolution and perceptual image enhancement joint task learning system of this embodiment. As shown in fig. 1, the terminal may include one or more processors 102 (only one is shown in fig. 1) and a memory 104 for storing data, wherein the processor 102 may include, but is not limited to, a processing device such as a microcontroller (MCU) or a field-programmable gate array (FPGA). The terminal may also include an input/output device 108. It will be understood by those of ordinary skill in the art that the structure shown in fig. 1 is merely illustrative and does not limit the structure of the terminal; for example, the terminal may include more or fewer components than shown in fig. 1, or have a different configuration.
The memory 104 may be used for storing computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the single-image super-resolution and perceptual image enhancement joint task learning system of this embodiment. The processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, thereby implementing the system described above. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input/output device 108 is used for interaction between a user and the terminal, for example, displaying the learning result of the single-image super-resolution and perceptual image enhancement joint task learning system of this embodiment to the user.
In this embodiment, a single-image super-resolution and perceptual image enhancement joint task learning system is provided. The terms "module", "unit", "subunit", and the like used below may be implemented as software and/or hardware realizing a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The single-image super-resolution and perceptual image enhancement joint task learning system comprises a learning module, and the learning module comprises a deep network model. Fig. 2 shows the working principle of the deep network model; as shown in fig. 2, the deep network model comprises a first network, a second network, and a third network.
the first network comprises a large coding and decoding residual error network, a small coding and decoding residual error network and a local residual error network. Wherein the large codec Residual network uses a larger number of RRDB (Residual-in-Residual Block) than the small codec Residual network. The local residual network includes superimposed lightweight multi-scale residual blocks of different kernel sizes.
The large and small encoder-decoder residual networks extract the global information of the original image, while the local residual network extracts its local information. Specifically, the original image is input simultaneously into the large encoder-decoder residual network, the small encoder-decoder residual network, and the local residual network, and the output images of the three are concatenated to serve as the output of the first network.
Further, the large and small encoder-decoder residual networks adopt different downsampling scales to increase the diversity of the feature maps.
Optionally, a skip connection is established between the decoding stage of the large encoder-decoder residual network and the encoding stage of the small encoder-decoder residual network to prevent the gradient from vanishing during backpropagation.
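As an illustration, a minimal PyTorch sketch of this three-branch topology follows. The patent does not disclose concrete layer configurations, so the channel width, the simplified RRDB body, the pooling-based encoding/decoding, the tail convolution, and all class names here are assumptions; only the overall structure (two encoder-decoder branches with different RRDB counts and downsampling scales plus a local multi-scale branch, concatenated at the output) follows the description, and the cross-branch skip connection is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleRRDB(nn.Module):
    """Illustrative stand-in for a Residual-in-Residual Dense Block."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.RReLU(),
            nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return x + 0.2 * self.body(x)   # residual-scaled skip, ESRGAN-style

class CodecBranch(nn.Module):
    """Encoder-decoder residual branch: downsample, apply RRDBs, upsample."""
    def __init__(self, c, n_rrdb, scale):
        super().__init__()
        self.scale = scale
        self.blocks = nn.Sequential(*[SimpleRRDB(c) for _ in range(n_rrdb)])
    def forward(self, x):
        h = F.avg_pool2d(x, self.scale)                   # encode (downsample)
        h = self.blocks(h)
        return F.interpolate(h, scale_factor=self.scale)  # decode (upsample)

class LocalBranch(nn.Module):
    """One lightweight multi-scale residual block (3x3 and 5x5 kernels)."""
    def __init__(self, c):
        super().__init__()
        self.k3 = nn.Conv2d(c, c, 3, padding=1)
        self.k5 = nn.Conv2d(c, c, 5, padding=2)
        self.fuse = nn.Conv2d(2 * c, c, 1)
    def forward(self, x):
        return x + self.fuse(torch.cat([self.k3(x), self.k5(x)], dim=1))

class FirstNetwork(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.head = nn.Conv2d(3, c, 3, padding=1)
        self.large = CodecBranch(c, n_rrdb=4, scale=4)  # more RRDBs, 1/4 scale
        self.small = CodecBranch(c, n_rrdb=2, scale=2)  # fewer RRDBs, 1/2 scale
        self.local = LocalBranch(c)
        self.tail = nn.Conv2d(3 * c, 3, 3, padding=1)   # back to a 3-ch image
    def forward(self, x):
        f = self.head(x)
        # concatenate the three branch outputs as the first network's output
        cat = torch.cat([self.large(f), self.small(f), self.local(f)], dim=1)
        return self.tail(cat)
```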
The second network is used for upsampling the image. In one implementation, as shown in fig. 3, the second network comprises two groups of convolutional layers that are connected in parallel and share information (in fig. 3, conv denotes a convolutional layer; one group uses 3 × 3 kernels and the other 5 × 5 kernels). The two groups exchange and fuse feature information through cross connections, perform upsampling through a pixel shuffle layer, concatenate the two groups of output features (concat in fig. 3 denotes feature concatenation), and feed them into a 1 × 1 convolutional layer.
Optionally, the two groups of convolutional layers of the second network use convolution kernels of different sizes to mitigate the limitation of any single kernel size and supplement more detail.
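A minimal sketch of this two-branch design, assuming a 3-channel input, a x2 upscaling factor, and a single cross-connected stage (the real network may stack several such stages):

```python
import torch
import torch.nn as nn

class SecondNetwork(nn.Module):
    """Two parallel, information-sharing conv groups (3x3 and 5x5 kernels),
    pixel-shuffle upsampling, feature concatenation, and a 1x1 conv."""
    def __init__(self, c=64, scale=2):
        super().__init__()
        self.in3 = nn.Conv2d(3, c, 3, padding=1)
        self.in5 = nn.Conv2d(3, c, 5, padding=2)
        # cross connection: each branch consumes both branches' features
        self.mix3 = nn.Conv2d(2 * c, c * scale ** 2, 3, padding=1)
        self.mix5 = nn.Conv2d(2 * c, c * scale ** 2, 5, padding=2)
        self.shuffle = nn.PixelShuffle(scale)  # upsample via channel rearrange
        self.out = nn.Conv2d(2 * c, 3, 1)      # final 1x1 convolution
    def forward(self, x):
        a, b = self.in3(x), self.in5(x)
        shared = torch.cat([a, b], dim=1)      # exchange and fuse features
        a = self.shuffle(self.mix3(shared))
        b = self.shuffle(self.mix5(shared))
        return self.out(torch.cat([a, b], dim=1))
```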
The third network comprises a U-net network and a color correction module, wherein the U-net network is used for extracting image features and the color correction module is used for generating, from the extracted image features, a color correction matrix for color-correcting the output images of the first network and the second network.
Specifically, the third network is used for recovering the perceptual quality lost in the original image. By learning color and tone mapping it generates a color correction matrix, which is used to color-correct the output images of the first network and the second network so that the recovered image has good spatial consistency both locally and globally.
The input of the deep network model is a training image pair comprising a single original image and a single corrected image, and the original image serves simultaneously as the input of the first network, the second network, and the third network. Specifically, the original image is input into the first network, and the output image of the first network is then fed into the second network; meanwhile, the (preprocessed) original image is also input into the second network. The two output images of the second network are concatenated and fed into a convolutional layer to produce an intermediate image, and the intermediate image is corrected with the color correction matrix generated by the third network to obtain the output image of the whole deep network model.
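Putting the pieces together, the overall dataflow can be sketched as follows. This assumes the first network emits a 3-channel image, the second network accepts a 3-channel image (as in the sketches above), and the third network returns per-pixel 3 × 3 color correction matrices at the output resolution; the fusion convolution is likewise an assumption.

```python
import torch
import torch.nn as nn

class JointModel(nn.Module):
    """Sketch of the overall pipeline: first network -> second network,
    preprocessed original -> second network, fuse the two outputs, then
    apply the third network's per-pixel color correction matrices."""
    def __init__(self, first_net, second_net, third_net):
        super().__init__()
        self.first, self.second, self.third = first_net, second_net, third_net
        self.fuse = nn.Conv2d(6, 3, 3, padding=1)  # merges the two outputs

    def forward(self, x, x_pre):
        # x: original image; x_pre: preprocessed original (see fig. 4)
        sr_a = self.second(self.first(x))   # path through the first network
        sr_b = self.second(x_pre)           # path from the preprocessed input
        mid = self.fuse(torch.cat([sr_a, sr_b], dim=1))  # intermediate image
        ccm = self.third(x)                 # (B, H, W, 3, 3) color matrices
        # apply a 3x3 color correction matrix at every spatial position
        return torch.einsum('bhwij,bjhw->bihw', ccm, mid)
```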
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
According to the single-image super-resolution and perceptual image enhancement joint task learning system provided in this embodiment, the first network describes local and global information simultaneously through a multi-path learning strategy to achieve image super-resolution, the second network captures high-frequency details using two groups of convolutional layers that are connected in parallel and share information, and the third network learns color and tone mapping to endow the image with vivid, natural colors and better contrast, making the reconstruction result more realistic. The system thus solves the problem in the prior art that the joint task of image super-resolution and perceptual image enhancement cannot substantially improve the visual quality of images.
In some embodiments, a single-image super-resolution and perceptual image enhancement joint task learning system is provided in which, on the basis of the above embodiments, the learning module further includes an image preprocessing unit. Fig. 4 shows the working principle of the image preprocessing unit. As shown in fig. 4, before the original image I is input into the second network, the image preprocessing unit filters I to obtain a base information layer image, divides the original image element-wise by the base information layer image to obtain a detail information layer image I_d, and finally superimposes the original image and the detail information layer image I_d to obtain a result image I_{i+d}, which serves as the input to the second network. By filtering the original image, the image preprocessing unit preserves its edges and textures and thereby better retains the high-frequency information of the image.
Specifically, the original image may be used as the guide map, and a guided filter may be used to perform the filtering.
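A sketch of this preprocessing step, using a self-guided filter implemented with box filtering (the classic guided-filter formulation; the radius and epsilon values are assumptions):

```python
import torch
import torch.nn.functional as F

def box_filter(x, r):
    """Mean filter over a (2r+1) x (2r+1) window."""
    return F.avg_pool2d(x, 2 * r + 1, stride=1, padding=r,
                        count_include_pad=False)

def guided_filter(I, p, r=8, eps=1e-4):
    """Guided filter with guide I; here the image guides itself."""
    mean_I, mean_p = box_filter(I, r), box_filter(p, r)
    var_I = box_filter(I * I, r) - mean_I * mean_I
    cov_Ip = box_filter(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return box_filter(a, r) * I + box_filter(b, r)

def preprocess(I, eps=1e-6):
    """Base layer by filtering, detail layer I_d by element-wise division,
    result image I_{i+d} by superposition, as described above."""
    base = guided_filter(I, I)     # base information layer image
    detail = I / (base + eps)      # detail information layer image I_d
    return I + detail              # result image I_{i+d} fed to network 2
```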
In some embodiments, a single-image super-resolution and perceptual image enhancement joint task learning system is provided which, on the basis of the above embodiments, improves the residual blocks in the large and small encoder-decoder residual networks over the prior art as follows: the batch normalization layer is deleted, the PReLU layer is replaced with an RReLU layer, and/or the channel attention module is deleted.
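A sketch of such a modified residual block (the channel width is an assumption):

```python
import torch.nn as nn

class ModifiedResBlock(nn.Module):
    """Residual block as modified above: no batch normalization, RReLU
    instead of PReLU, and no channel attention module."""
    def __init__(self, c=64):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)  # no BatchNorm2d here
        self.act = nn.RReLU()                       # randomized leaky ReLU
        self.conv2 = nn.Conv2d(c, c, 3, padding=1)
    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))  # plain residual add
```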
In some embodiments, a single-image super-resolution and perceptual image enhancement joint task learning system is provided in which, on the basis of the above embodiments, the U-net network of the third network is designed as follows, illustrated with an original image of size W × H × 3:
In the encoding stage, the original image is first down-sampled to a first image of size W/4 × H/4 × 64, and the first image is then processed according to a first strategy and a second strategy, respectively.
The first strategy includes: down-sampling the first image to a second image of size 1 × 1 × 64 and copying the second image W/4 × H/4 times. Specifically, the first image is first down-sampled to W/16 × H/16 × 64 and then reduced to 1 × 1 × 64 by an RReLU layer and a fully connected layer.
The second strategy includes: performing global average pooling on the first image to obtain a third image of size 1 × 1 × 64 and copying the third image W/4 × H/4 times.
In the decoding stage, the copies of the second image and the copies of the third image obtained by the two strategies are merged into a first feature map of size W/4 × H/4 × 64, and the first feature map is concatenated with the first image to obtain a second feature map of size W/4 × H/4 × 128 that captures local and global features simultaneously.
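The scheme can be sketched in PyTorch as follows. The encoder stack is illustrative, the vectors are tiled by broadcasting rather than explicit copying, and the two sets of copies are merged by addition, which is one plausible reading of "merged"; all of these are assumptions beyond the sizes stated above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UNetGlobalBranch(nn.Module):
    """Two-strategy encoding/decoding for a W x H x 3 input (see text)."""
    def __init__(self, c=64):
        super().__init__()
        self.enc = nn.Sequential(              # W x H x 3 -> W/4 x H/4 x 64
            nn.Conv2d(3, c, 3, stride=2, padding=1), nn.RReLU(),
            nn.Conv2d(c, c, 3, stride=2, padding=1), nn.RReLU())
        self.down = nn.Conv2d(c, c, 3, stride=4, padding=1)  # ~W/16 x H/16
        self.fc = nn.Linear(c, c)              # reduce to a 1 x 1 x 64 vector
    def forward(self, x):
        f1 = self.enc(x)                       # first image: W/4 x H/4 x 64
        b, c, h, w = f1.shape
        # strategy 1: downsample further, then RReLU + FC -> 1 x 1 x 64
        s1 = self.fc(F.rrelu(
            F.adaptive_avg_pool2d(self.down(f1), 1).flatten(1)))
        # strategy 2: global average pooling -> 1 x 1 x 64
        s2 = f1.mean(dim=(2, 3))
        # decoding: tile both vectors over W/4 x H/4 and merge them
        m = (s1 + s2).view(b, c, 1, 1).expand(b, c, h, w)  # first feature map
        return torch.cat([m, f1], dim=1)       # second feature map (128 ch)
```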
In some embodiments, a single-image super-resolution and perceptual image enhancement joint task learning system is provided in which, on the basis of the above embodiments, the color correction module of the third network includes a concatenation unit, a global color correction unit, and a local color correction unit. The following description continues the above example in which the original image size is W × H × 3:
the splicing unit is used for splicing output images of the large coding and decoding residual error network, the small coding and decoding residual error network and the local residual error network to obtain a fifth image, feeding the fifth image into a convolution layer of a third network, and then carrying out size adjustment and up-sampling on the fifth image through a reverse convolution layer to obtain a third characteristic diagram with the size of 2 Wx 2 Hx 3 x 3.
A color transformation is then learned by the global color correction unit and the local color correction unit:
the global color correction unit is used for inputting the third image into a fully connected layer, which outputs a global color correction matrix;
and the local color correction unit is used for applying the global color correction matrix to the third feature map to obtain a local color correction matrix, which performs color correction at each spatial position of the image; the local color correction matrix is applied to the output images of the first network and the second network.
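A sketch of the two correction units; the modulation of the per-pixel feature map by the global matrix (an element-wise product here) and the exact tensor shapes are assumptions consistent with the sizes above.

```python
import torch
import torch.nn as nn

class ColorCorrection(nn.Module):
    """Global 3x3 matrix from a fully connected layer; local per-pixel
    matrices by applying it to the third feature map; correction applied
    at every spatial position of the image."""
    def __init__(self, c=64):
        super().__init__()
        self.fc = nn.Linear(c, 9)   # outputs the global 3x3 color matrix

    def forward(self, third_image, third_fmap, image):
        # third_image: (B, 64) pooled vector; third_fmap: (B, 2H, 2W, 3, 3);
        # image: (B, 3, 2H, 2W) output of the first and second networks
        g = self.fc(third_image).view(-1, 1, 1, 3, 3)  # global matrix
        local = third_fmap * g                         # local matrices
        # per-pixel 3x3 matrix applied to the RGB vector at each position
        return torch.einsum('bhwij,bjhw->bihw', local, image)
```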
In some embodiments, a single-image super-resolution and perceptual image enhancement joint task learning system is provided in which, on the basis of the above embodiments, the optimization goal of the deep network model is to minimize a loss function L that is a linear superposition of content loss, total variation loss, color loss, multi-scale structural similarity loss, and pixel loss. Specifically, L is defined as follows:
L = ω1·Lcon + ω2·Ltv + ω3·Lcolor + ω4·LMSSIM + ω5·L1
where Lcon denotes the content loss, Ltv the total variation loss, Lcolor the color loss, LMSSIM the multi-scale structural similarity loss, and L1 the pixel loss; ω1, ω2, ω3, ω4, and ω5 are the weight coefficients of the respective losses.
It should be noted that the content loss measures the consistency of content between the output image of the deep network model and the target image (i.e., the corrected image in the training image pair); the total variation loss measures the noise of the output image relative to the target image; the color loss measures the color difference between the output image and the target image; and the pixel loss measures the per-pixel difference between the output image and the target image. Multi-scale structural similarity is an index measuring the degree of similarity between two digital images; using it as a loss term improves the quality of the output image and makes it more realistic. The content loss, total variation loss, multi-scale structural similarity loss, and pixel loss may each be defined according to existing algorithms or models.
In the above embodiment, a loss function L mixing multiple loss terms is adopted, which effectively removes noise and artifacts from the image.
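A sketch of such a composite loss. The TV and color terms below are common formulations rather than ones the patent specifies; the content and MS-SSIM terms are passed in as callables (e.g., a VGG feature distance and the pytorch-msssim package), and 1 − MS-SSIM is used as the similarity loss term. Default weights follow the preferred embodiment described later.

```python
import torch
import torch.nn.functional as F

def tv_loss(x):
    """Total variation: mean absolute difference of neighboring pixels."""
    return ((x[..., :, 1:] - x[..., :, :-1]).abs().mean()
            + (x[..., 1:, :] - x[..., :-1, :]).abs().mean())

def color_loss(out, target, k=9):
    """Color difference after blurring away fine detail (one common choice)."""
    blur = lambda t: F.avg_pool2d(t, k, stride=1, padding=k // 2)
    return F.mse_loss(blur(out), blur(target))

def total_loss(out, target, content_fn, msssim_fn,
               w=(0.001, 1.0, 0.0005, 300.0, 0.05)):
    """L = w1*Lcon + w2*Ltv + w3*Lcolor + w4*LMSSIM + w5*L1."""
    w1, w2, w3, w4, w5 = w
    return (w1 * content_fn(out, target)
            + w2 * tv_loss(out)
            + w3 * color_loss(out, target)
            + w4 * (1.0 - msssim_fn(out, target))  # similarity -> loss
            + w5 * F.l1_loss(out, target))
```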
In some embodiments, a single-image super-resolution and perceptual image enhancement joint task learning system is provided which, on the basis of the above embodiments, further includes a training module. The training module inputs a training image set into the learning module to train the deep network model. A single low-resolution image input into the fully trained deep network model is then super-resolved and perceptually enhanced, finally yielding a high-resolution enhanced image.
Optionally, the training image set may be prepared as follows:
raw low-resolution images are acquired from different scenes and stored by scene type, e.g., city, building, nature, etc. For these original low-resolution images, images are enhanced manually using image processing tools such as Photoshop or Lightroom to obtain training image pairs.
Optionally, during training, the training module randomly draws image pairs from different scene classes to avoid high correlation between the images.
Optionally, a stochastic gradient descent method with the Adam optimizer may be adopted to optimize the objective function during training of the deep network. For example, the model is trained for 140 epochs: for the first 50 epochs the patch size is set to 64 × 64, the learning rate to 1 × 10⁻⁴, and the image batch size to 16; for the last 70 epochs the patch size is 88 × 88, the learning rate 1 × 10⁻⁵, and the image batch size 4.
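The schedule can be sketched as below; the exact phase boundary and the data loaders are assumptions here.

```python
import torch

def train(model, loader_64, loader_88, step_fn, epochs=140, switch=70):
    """Two-phase training: 64x64 patches, lr 1e-4, batch 16 first; then
    88x88 patches, lr 1e-5, batch 4 (loaders configured accordingly)."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(epochs):
        if epoch == switch:                 # second phase: lower the lr
            for g in opt.param_groups:
                g['lr'] = 1e-5
        loader = loader_64 if epoch < switch else loader_88
        for batch in loader:
            step_fn(model, opt, batch)      # forward/backward/opt.step()
```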
In a preferred embodiment, a single-image super-resolution and perceptual image enhancement joint task learning system is provided. The system comprises a learning module, the learning module comprises a deep network model and an image preprocessing unit, and the deep network model comprises a first network, a second network, and a third network.
The input of the deep network model is a training image pair comprising a single original image and a single corrected image, and the original image serves simultaneously as the input of the first network, the second network, and the third network.
Specifically, the first network includes a large encoder-decoder residual network, a small encoder-decoder residual network, and a local residual network. The large encoder-decoder residual network uses 4 RRDBs and a downsampling scale of 1/4; the small encoder-decoder residual network uses 2 RRDBs and a downsampling scale of 1/2.
In this embodiment, the residual blocks in the large and small encoder-decoder residual networks are improved as follows: the batch normalization layer is deleted, the PReLU layer is replaced with an RReLU layer, and the channel attention module is deleted. A skip connection is established between the decoding stage of the large encoder-decoder residual network and the encoding stage of the small encoder-decoder residual network to prevent the gradient from vanishing during backpropagation.
The local residual network includes stacked lightweight multi-scale residual blocks with different kernel sizes.
The second network comprises two groups of convolutional layers that are connected in parallel and share information, the two groups adopting convolution kernels of different sizes. Before the original image is input into the second network, the image preprocessing unit filters the original image to obtain a base information layer image, divides the original image element-wise by the base information layer image to obtain a detail information layer image, and finally superimposes the original image and the detail information layer image to obtain a result image, which is input into the second network.
The third network comprises a U-net network and a color correction module, wherein the U-net network is used for extracting image features and the color correction module is used for generating, from the extracted image features, a color correction matrix for color-correcting the output images of the first network and the second network.
Specifically, the U-net network of the third network in this embodiment is designed as follows:
representing the size of an original image as W multiplied by H multiplied by 3, firstly carrying out down sampling on the original image to a first image with the size of W/4 multiplied by H/4 multiplied by 64 through a series of convolution operations in an encoding stage, and then respectively processing the first image according to a first strategy and a second strategy; wherein the first policy comprises: the first image is down-sampled to a second image with a size of 1 × 1 × 64, and the second image is copied to be W/4 × H/8 × 64 copies. Specifically, the first image is down-sampled to W/16 XH/16 × 64, and then down-sampled to 1 × 1 × 64 by the RRelu layer and the full link layer. The second policy includes: the first image is subjected to global average pooling to obtain a third image with the size of 1 × 1 × 64, and the third image is copied by W/4 × H/8 × 64 parts.
In the decoding stage, the second image copied according to the two measurements and the third image copied are merged into a first feature map with the size of W/4 XH/64, and the first feature map is spliced with the first image to obtain a second feature map with the size of W/4 XH 128, wherein the second feature map captures local and global features simultaneously.
The color correction module comprises a concatenation unit, a global color correction unit, and a local color correction unit. Specifically, the concatenation unit concatenates the output images of the large encoder-decoder residual network, the small encoder-decoder residual network, and the local residual network to obtain a fifth image, feeds the fifth image into a convolutional layer of the third network, and then resizes and upsamples it through a deconvolution layer to obtain a third feature map of size 2W × 2H × 3 × 3.
A color transformation is then learned by the global color correction unit and the local color correction unit: the global color correction unit inputs the third image into a fully connected layer, which outputs a global color correction matrix; the local color correction unit applies the global color correction matrix to the third feature map to obtain a local color correction matrix, which performs color correction at each spatial position of the image and is applied to the output images of the first network and the second network.
Specifically, the optimization goal of the deep network model of the system is to minimize a loss function L, which is defined as follows:
L = ω1·Lcon + ω2·Ltv + ω3·Lcolor + ω4·LMSSIM + ω5·L1
where Lcon denotes the content loss, Ltv the total variation loss, Lcolor the color loss, LMSSIM the multi-scale structural similarity loss, and L1 the pixel loss, and where ω1, ω2, ω3, ω4, and ω5 are 0.001, 1, 0.0005, 300, and 0.05, respectively.
According to the single-image super-resolution and perceptual image enhancement joint task learning system provided in this embodiment, the first network describes local and global information simultaneously through a multi-path learning strategy to achieve image super-resolution, the second network captures high-frequency details using two groups of convolutional layers that are connected in parallel and share information, and the third network learns color and tone mapping, endowing the image with vivid, natural colors and better contrast, effectively removing noise and artifacts, and making the reconstruction result more realistic. The system thus solves the problem in the prior art that the joint task of image super-resolution and perceptual image enhancement cannot substantially improve the visual quality of images. In addition, the invention runs efficiently and, compared with the prior art, achieves the best enhancement on the PSNR (peak signal-to-noise ratio) and SSIM (structural similarity) metrics.
In this embodiment, an electronic device is also provided, comprising a memory in which a computer program is stored and a processor configured to execute the computer program to implement the single-image super-resolution and perceptual image enhancement joint task learning system defined in any of the above embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, both connected to the processor.
In addition, in combination with the single-image super-resolution and perceptual image enhancement joint task learning system provided in the foregoing embodiments, a storage medium may also be provided in this embodiment. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements the single-image super-resolution and perceptual image enhancement joint task learning system of any one of the above embodiments.
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, and details are not described again in this embodiment.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art from the examples provided herein without inventive effort shall fall within the scope of protection of the present application.
It is obvious that the drawings are only examples or embodiments of the present application, and it is obvious to those skilled in the art that the present application can be applied to other similar cases according to the drawings without creative efforts. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
The term "embodiment" is used herein to mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly or implicitly understood by one of ordinary skill in the art that the embodiments described in this application may be combined with other embodiments without conflict.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the patent protection. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (12)

1. A single-image super-resolution and perceptual image enhancement joint task learning system, comprising:
a learning module comprising a deep network model comprising a first network, a second network, and a third network;
the first network comprises a large encoder-decoder residual network, a small encoder-decoder residual network, and a local residual network, wherein the large encoder-decoder residual network uses more RRDBs than the small encoder-decoder residual network, and the local residual network comprises stacked lightweight multi-scale residual blocks with different kernel sizes;
the second network is used for upsampling the image and comprises two groups of convolutional layers that are connected in parallel and share information;
the third network comprises a U-net network and a color correction module, wherein the U-net network is used for extracting image features, the color correction module is used for generating a color correction matrix from the extracted image features, and the color correction matrix is used for performing color correction on the output images of the first network and the second network;
the input of the deep network model is a training image pair comprising a single original image and a single corrected image, and the original image serves simultaneously as the input of the first network, the second network, and the third network.
2. The single-image super-resolution and perceptual image enhancement joint task learning system of claim 1, wherein the learning module further comprises an image preprocessing unit;
the image preprocessing unit is used for, before the original image is input into the second network, filtering the original image to obtain a base information layer image, performing an element-wise division of the original image by the base information layer image to obtain a detail information layer image, and finally superimposing the original image and the detail information layer image to obtain a result image, which is input into the second network.
3. The single-image super-resolution and perceptual image enhancement joint task learning system of claim 1, wherein the two groups of convolutional layers of the second network employ different convolution kernel sizes.
4. The single-image super-resolution and perceptual image enhancement joint task learning system of claim 1, wherein the large encoder-decoder residual network and the small encoder-decoder residual network employ different downsampling scales.
5. The single-image super-resolution and perceptual image enhancement joint task learning system of claim 1, wherein the configuration of the residual blocks in the large encoder-decoder residual network and the small encoder-decoder residual network comprises: deleting the batch normalization layer, replacing the PReLU layer with an RReLU layer, and/or deleting the channel attention module.
6. The single image hyper-and perceptual image enhancement joint task learning system of any of claims 1 to 5, wherein:
in the encoding stage, the U-net network down-samples the original image to a first image and processes the first image according to a first strategy and a second strategy, respectively;
the first strategy includes: down-sampling the first image to obtain a second image and copying the second image;
the second strategy includes: performing global average pooling on the first image to obtain a third image and copying the third image;
and in the decoding stage, the U-net network merges the copies of the second image and the copies of the third image into a first feature map and concatenates the first feature map with the first image to obtain a second feature map.
7. The single-image super-resolution and perceptual image enhancement joint task learning system of claim 6, wherein the color correction module comprises a concatenation unit, a global color correction unit, and a local color correction unit; the concatenation unit is used for concatenating the output images of the large encoder-decoder residual network, the small encoder-decoder residual network, and the local residual network to obtain a fifth image, feeding the fifth image into the third network, and resizing and upsampling the fifth image to obtain a third feature map;
the global color correction unit is used for inputting the third image into a fully connected layer, which outputs a global color correction matrix;
and the local color correction unit is used for applying the global color correction matrix to the third feature map to obtain a local color correction matrix, which is applied to the output images of the first network and the second network.
8. The single-image super-resolution and perceptual image enhancement joint task learning system of any of claims 1 to 5, wherein the optimization goal of the deep network model is the minimization of a loss function, the loss function being a linear superposition of content loss, total variation loss, color loss, multi-scale structural similarity loss, and pixel loss.
9. The single-image super-resolution and perceptual image enhancement joint task learning system of any of claims 1 to 5, wherein a skip connection is provided between the decoding stage of the large encoder-decoder residual network and the encoding stage of the small encoder-decoder residual network.
10. The single-image super-resolution and perceptual image enhancement joint task learning system of any of claims 1 to 5, further comprising a training module for inputting a set of training images into the learning module to train the deep network model.
11. An electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute the computer program to implement the single-image super-resolution and perceptual image enhancement joint task learning system of any of claims 1 to 9.
12. A computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the single-image super-resolution and perceptual image enhancement joint task learning system of any one of claims 1 to 9.
CN202110466163.9A 2020-07-23 2021-04-28 Single-image super-resolution and perceptual image enhancement joint task learning system Active CN112991231B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010716171X 2020-07-23
CN202010716171.XA CN112381722A (en) 2020-07-23 2020-07-23 Single-image super-resolution and perceptual image enhancement joint task learning method

Publications (2)

Publication Number Publication Date
CN112991231A true CN112991231A (en) 2021-06-18
CN112991231B CN112991231B (en) 2021-11-16

Family

ID=74586366

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010716171.XA Withdrawn CN112381722A (en) 2020-07-23 2020-07-23 Single-image super-resolution and perceptual image enhancement joint task learning method
CN202110466163.9A Active CN112991231B (en) 2020-07-23 2021-04-28 Single-image super-resolution and perceptual image enhancement joint task learning system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010716171.XA Withdrawn CN112381722A (en) 2020-07-23 2020-07-23 Single-image super-resolution and perceptual image enhancement joint task learning method

Country Status (1)

Country Link
CN (2) CN112381722A (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160055A (en) * 2021-04-07 2021-07-23 哈尔滨理工大学 Image super-resolution reconstruction method based on deep learning
CN113298744B (en) * 2021-06-07 2022-10-28 长春理工大学 End-to-end infrared and visible light image fusion method
CN113822830B (en) * 2021-08-30 2023-06-06 天津大学 Multi-exposure image fusion method based on depth perception enhancement

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200034948A1 (en) * 2018-07-27 2020-01-30 Washington University Ml-based methods for pseudo-ct and hr mr image estimation
CN109671018A (en) * 2018-12-12 2019-04-23 华东交通大学 A kind of image conversion method and system based on production confrontation network and ResNets technology
CN110322530A (en) * 2019-06-21 2019-10-11 湖南大学 It is a kind of based on depth residual error network can interaction figure picture coloring
CN110288598A (en) * 2019-07-05 2019-09-27 杭州喔影网络科技有限公司 A kind of fuzzy photo detection method based on deep learning
CN110443867A (en) * 2019-08-01 2019-11-12 太原科技大学 Based on the CT image super-resolution reconstructing method for generating confrontation network
CN110570353A (en) * 2019-08-27 2019-12-13 天津大学 Dense connection generation countermeasure network single image super-resolution reconstruction method
CN111080533A (en) * 2019-10-21 2020-04-28 南京航空航天大学 Digital zooming method based on self-supervision residual error perception network
CN111340738A (en) * 2020-03-24 2020-06-26 武汉大学 Image rain removing method based on multi-scale progressive fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JUNCHENG LI et al., "Multi-scale Residual Network for Image Super-Resolution", ECCV 2018: Computer Vision *
K. MEI et al., "Higher-resolution network for image demosaicing and enhancing", 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW) *
YECAI GUO et al., "Underwater Image Enhancement Using a Multiscale Dense Generative Adversarial Network", IEEE Journal of Oceanic Engineering *
WU Congzhong et al., "Remote sensing image denoising combining a residual encoder-decoder network and edge enhancement", Journal of Remote Sensing *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115936983A (en) * 2022-11-01 2023-04-07 青岛哈尔滨工程大学创新发展中心 Method and device for super-resolution of nuclear magnetic image based on style migration and computer storage medium

Also Published As

Publication number Publication date
CN112381722A (en) 2021-02-19
CN112991231B (en) 2021-11-16

Similar Documents

Publication Publication Date Title
CN112991231B (en) Single-image super-resolution and perceptual image enhancement joint task learning system
Lutz et al. Alphagan: Generative adversarial networks for natural image matting
CN110163801B (en) Image super-resolution and coloring method, system and electronic equipment
CN112801901A (en) Image deblurring algorithm based on block multi-scale convolution neural network
CN110310229A (en) Image processing method, image processing apparatus, terminal device and readable storage medium storing program for executing
CN109325928A (en) A kind of image rebuilding method, device and equipment
Chen et al. Cross parallax attention network for stereo image super-resolution
CN111754438A (en) Underwater image restoration model based on multi-branch gating fusion and restoration method thereof
CN112184585B (en) Image completion method and system based on semantic edge fusion
CN108875900A (en) Method of video image processing and device, neural network training method, storage medium
Mei et al. Higher-resolution network for image demosaicing and enhancing
CN111681195B (en) Fusion method and device of infrared image and visible light image and readable storage medium
Huang et al. Hybrid image enhancement with progressive laplacian enhancing unit
CN111768466B (en) Image filling method, device, equipment and storage medium
CN115841420A (en) Polarization image super-resolution reconstruction method based on deep learning
CN116385305A (en) Cross-region transducer-based image shadow removing method and system for nerve radiation field
CN115293968A (en) Super-light-weight high-efficiency single-image super-resolution method
CN115170388A (en) Character line draft generation method, device, equipment and medium
CN111951171A (en) HDR image generation method and device, readable storage medium and terminal equipment
CN113298740A (en) Image enhancement method and device, terminal equipment and storage medium
CN107729885A (en) A kind of face Enhancement Method based on the study of multiple residual error
CN116977191A (en) Training method of image quality improvement model and image quality improvement method of video conference system
CN111383171B (en) Picture processing method, system and terminal equipment
CN115760658A (en) Image processing method, image processing device, storage medium and electronic equipment
CN112418336B (en) Dense matching method for power line inspection images

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant