CN116703792A - Method for enhancing low-light image by using generating network, training method and training equipment for generating network - Google Patents

Method for enhancing low-light image by using generating network, training method and training equipment for generating network

Info

Publication number
CN116703792A
Authority
CN
China
Prior art keywords
network
image
generating
low-light image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310396091.4A
Other languages
Chinese (zh)
Inventor
郭皓明
李想
李威
郭崎
张知行
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Nanqili Technology Co ltd
Original Assignee
Shanghai Processor Technology Innovation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Processor Technology Innovation Center filed Critical Shanghai Processor Technology Innovation Center
Priority to CN202310396091.4A priority Critical patent/CN116703792A/en
Publication of CN116703792A publication Critical patent/CN116703792A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/094 Adversarial learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for enhancing a low-light image by using a generating network, and a training method and device for the generating network. The method comprises: inputting the low-light image into the generating network; the generating network enhancing the low-light image to obtain a generated image; and the generating network outputting the generated image. The generating network adopts a residual structure: a main branch generates the residual between the low-light image and the generated image, and a shortcut branch carries the low-light image to the output end of the main branch, where it is added to the residual to obtain the generated image. The main branch of the generating network consists of at least two residual learning units arranged in cascade, and each residual learning unit progressively generates the residual. By setting the residual structure, the learning task of a single generator is reduced, the learning and generation difficulty is lowered, and the training objective is more easily reached. By setting the cascade structure, the learning and generation burden of each generator is further lightened, making the training objective easier still to reach.

Description

Method for enhancing low-light image by using generating network, training method and training equipment for generating network
Technical Field
The present invention relates to the field of image processing, and more particularly, to a method for enhancing a low-light image by using a generating network, and a training method and apparatus for the generating network.
Background
A low-light image is an image captured under low illumination. It has low quality and poor recognizability, contains a large amount of noise, makes details hard to distinguish, and therefore has limited practical value. Because of insufficient light, low-light images have low visual quality, which degrades the performance of many computer vision systems that process them. Low-light enhancement refers to processing a low-light image with the aim of improving the perceived quality of images acquired in low-light environments, so that it approaches the visual effect of an image captured under sufficient illumination.
Existing schemes for low-light image enhancement fall into two categories: conventional image-processing methods and deep-learning-based methods. Conventional methods include histogram equalization (HE) and methods based on the Retinex theoretical model.
Histogram equalization theory assumes that the RGB pixel values of an image captured under normal illumination occur with roughly equal probability over the entire dynamic range, so that, apart from a few prominent values, the distribution of the three RGB channels is approximately uniform; such an image has a larger color dynamic range and higher contrast. Based on this, after obtaining the histogram of the input image, a transformation function maps the RGB pixel distribution of the original input closer to a uniform distribution, which completes histogram equalization and achieves low-light image enhancement. However, some regions of an image enhanced in this way are over-enhanced, and the result contains many noise points.
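For reference, a minimal NumPy sketch of per-channel histogram equalization of the kind described above; the function name and the assumption of an 8-bit RGB input are illustrative and not taken from the patent.

```python
import numpy as np

def equalize_histogram_rgb(img: np.ndarray) -> np.ndarray:
    """Per-channel histogram equalization for a uint8 RGB image of shape (H, W, 3)."""
    out = np.empty_like(img)
    for c in range(3):
        channel = img[..., c]
        hist = np.bincount(channel.ravel(), minlength=256)
        cdf = hist.cumsum()
        # Map pixel values so the cumulative distribution becomes approximately uniform.
        cdf_min = cdf[cdf > 0].min()
        lut = np.round((cdf - cdf_min) / max(cdf[-1] - cdf_min, 1) * 255).astype(np.uint8)
        out[..., c] = lut[channel]
    return out
```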
The Retinex theory is based on color constancy: the perceived color of an object is determined by its ability to reflect light in different wavelength bands, rather than by the absolute intensity of the reflected light. The basic assumption of the Retinex theory is that the original image is the product of an illumination component and a reflectance component. Image enhancement methods based on the Retinex theory estimate the illumination component from the original image and decompose out the reflectance component, and can usually perform a degree of adaptive enhancement on images that are not extremely dark. However, taking the reflectance component as the enhancement result is not always a reasonable prior, especially under different ambient lighting conditions, where this prior can lead to unrealistic enhancement such as distorted feature details and color errors. Moreover, Retinex models often ignore the effect of noise, with the consequence that noise is amplified in the enhanced image.
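A minimal single-scale Retinex sketch under the stated assumption that the original image is the product of an illumination component and a reflectance component; the Gaussian estimate of the illumination, the sigma value, and the final rescaling step are illustrative choices, not part of the patent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img: np.ndarray, sigma: float = 80.0) -> np.ndarray:
    """Estimate reflectance as log(image) - log(illumination), then rescale to 8-bit range."""
    img = img.astype(np.float64) + 1.0                      # avoid log(0)
    illumination = gaussian_filter(img, sigma=(sigma, sigma, 0))  # blur within each channel
    reflectance = np.log(img) - np.log(illumination + 1.0)
    # Stretch the result back to a displayable [0, 255] range.
    r_min, r_max = reflectance.min(), reflectance.max()
    return ((reflectance - r_min) / (r_max - r_min + 1e-8) * 255).astype(np.uint8)
```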
In recent years, deep-learning-based algorithms have found many applications in low-light image enhancement. Deep-learning schemes mainly involve supervised learning (SL) and unsupervised learning (UL). However, in real scenes it is difficult to obtain paired low-light and sufficient-light images of the same scene, which limits the application of supervised learning to low-light image enhancement. Unsupervised learning, in turn, has limited generalization ability and still suffers from unstable networks, complex network structures, low learning efficiency, and unrealistic local details in the enhanced image.
Based on this, the technique of low-light image enhancement needs to be further improved.
Disclosure of Invention
In order to solve the problems in existing low-light image enhancement methods, the present invention provides a method for enhancing a low-light image by using a generating network, comprising: inputting the low-light image into the generating network 1; the generating network 1 enhancing the low-light image to obtain a generated image; and the generating network 1 outputting the generated image. The generating network 1 adopts a residual structure: the main branch of the generating network 1 generates the residual between the low-light image and the generated image, and the shortcut branch of the generating network 1 carries the low-light image to the output end of the main branch, where it is added to the residual to obtain the generated image. The main branch of the generating network 1 consists of at least two residual learning units 11 arranged in cascade, and each residual learning unit progressively generates the residual.
According to one embodiment of the present invention, the residual learning unit 11 adopts a residual structure, the main branch of the residual learning unit 11 is a generator 111, and the shortcut branch of the residual learning unit 11 is connected to the input and output of the generator 111.
According to one embodiment of the invention, the generating network 1 further comprises a channel attention branch connecting the input of the first-stage generator and the output of the last-stage generator 111; the channel attention branch takes out the three RGB channels of the low-light image, inverts them, and multiplies them channel by channel with the output of the last-stage generator 111.
According to one embodiment of the invention, the generator 111 employs an encoder-residual block-decoder architecture, comprising a downsampling layer 1111, a residual block 1112, an upsampling layer 1113, and a dimension reduction layer 1114, the encoder and decoder being symmetrically arranged.
According to one embodiment of the invention, the generator 111 comprises three downsampling layers 1111, nine residual blocks 1112, two upsampling layers 1113 and one dimension reduction layer 1114; the three downsampling layers 1111 constitute the encoder, and the two upsampling layers 1113 and the dimension reduction layer 1114 constitute the decoder; the downsampling layer 1111 has a CBR structure; the upsampling layer 1113 has a TBR structure.
According to another aspect of the invention, there is provided a method for training a generating network based on an unsupervised generative adversarial network, comprising: acquiring a low-light image and a sufficient-light image as training samples, where the semantic contents of the low-light image and the sufficient-light image need not be strictly consistent; inputting the low-light image into the generating network to obtain a generated image; inputting the generated image and the sufficient-light image into a discrimination network to obtain a discrimination result; and iteratively adjusting the parameters of the generating network or the discrimination network until the generating network and the discrimination network reach Nash equilibrium. The parameters of the generating network are adjusted based on the semantic-consistency loss between the low-light image and the generated image and the adversarial loss between the generated image and the sufficient-light image; the parameters of the discrimination network are adjusted based on the adversarial loss. The generating network adopts a residual structure, and the shortcut branch of the generating network 1 connects the input end and the output end of the generating network 1; the main branch of the generating network 1 consists of at least two residual learning units 11 arranged in cascade; the residual learning unit 11 adopts a residual structure, the main branch of the residual learning unit 11 is a generator 111, and the shortcut branch of the residual learning unit 11 connects the input end and the output end of the generator 111.
According to one embodiment of the invention, consistency of VGG19 features is used to keep the semantic content from drifting, and the semantic-content consistency loss $\mathrm{Loss}_{VGG}$ is defined as follows:
$$\mathrm{Loss}_{VGG}=\frac{1}{W\,H}\sum_{i=1}^{W}\sum_{j=1}^{H}\left\|V(X)_{(w_i,h_j)}-V(G(X))_{(w_i,h_j)}\right\|_{2}$$
where $W$ and $H$ represent the width and length of the image; $w_i$ represents the i-th pixel point in the width direction; $h_j$ represents the j-th pixel point in the length direction; $V(X)$ represents the VGG features of the input low-light image; $V(G(X))$ represents the VGG features of the enhanced generated image; and $\|\cdot\|_{2}$ represents the L2 norm measure.
According to one embodiment of the invention, the adversarial loss is defined as follows:
$$\mathcal{L}_{D}^{global}=\mathbb{E}_{X_{r}\sim real}\!\left[\left\|D(X_{r})-1\right\|_{2}\right]+\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g})\right\|_{2}\right]$$
$$\mathcal{L}_{G}^{global}=\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g})-1\right\|_{2}\right]$$
$$\mathcal{L}_{D}^{local}=\frac{1}{n}\sum_{k=1}^{n}\left(\mathbb{E}_{X_{r}\sim real}\!\left[\left\|D(X_{r}^{(k)})-1\right\|_{2}\right]+\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g}^{(k)})\right\|_{2}\right]\right)$$
$$\mathcal{L}_{G}^{local}=\frac{1}{n}\sum_{k=1}^{n}\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g}^{(k)})-1\right\|_{2}\right]$$
where $\mathcal{L}_{D}^{global}$ and $\mathcal{L}_{D}^{local}$ are the loss functions of the discriminator at the global and local scales; $\mathcal{L}_{G}^{global}$ and $\mathcal{L}_{G}^{local}$ are the loss functions of the generator at the global and local scales; $X_{r}\sim real$ denotes that the input image follows the distribution of the real image domain; $X_{g}\sim generate$ denotes that the input image follows the distribution of the generated image domain; $D$ denotes the discriminator network; $\|\cdot\|_{2}$ denotes the L2 norm measure; $\mathbb{E}$ denotes the expectation; the local discriminator uses the average of the discrimination results of $n$ random sub-blocks ($X^{(k)}$ denoting the k-th sub-block), n being an integer greater than 1.
According to one embodiment of the invention, n=6.
For the generating network in the invention, a channel attention mechanism is provided to help the generating network automatically focus on the regions of the input that need enhancement. By adding a local scale to the discrimination network, the regions of the input low-light image that need enhancement are adaptively enhanced while the generating network is trained. Together, the architecture of the generating network and its training process increase the emphasis on the regions to be enhanced during image enhancement and improve the quality of the generated image.
In the invention, the residual structure of the generating network decomposes the task of generating the residual, so each generator carries a lighter task and the training objective is easier to reach during training. The cascade structure ensures that each generator only undertakes part of the task, further reducing the generation and learning difficulty.
Drawings
FIG. 1 is a schematic diagram of the architecture of the low-light image enhancement model of the present invention based on an unsupervised generative adversarial network;
FIG. 2 is a schematic diagram of the structure of a generating network portion of a low-light image enhancement model;
FIG. 3 is a schematic diagram of a structure of a generation network and residual learning unit;
FIG. 4 is a schematic diagram of a generation network including channel attention branches;
FIG. 5 is a schematic diagram of the structure of the generator;
FIG. 6 is a schematic diagram of a global arbiter;
FIG. 7 is a schematic diagram of a local arbiter;
FIG. 8 is a schematic diagram of the steps of a method for training a generating network based on an unsupervised generative adversarial network;
FIG. 9 is a schematic diagram of steps of a method for enhancing a low-light image using a generating network;
FIG. 10 is an exemplary block diagram of an apparatus for training a model.
Detailed Description
The principles and spirit of the present invention will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable those skilled in the art to better understand and practice the invention and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 shows a schematic structure of the low-light image enhancement model of the present invention based on an unsupervised generative adversarial network.
As shown in fig. 1, the generative adversarial network includes two sub-networks: a generation network 1 and a discrimination network 2. The task of the generation network 1 is to enhance the input low-light image and produce a generated image that looks natural and real, similar to the original data. The task of the discrimination network 2 is to take the generated image and the sufficient-light image as inputs and judge whether the generated image is close to the sufficient-light image, i.e., to output a discrimination result.
In the invention, the low-light image and the sufficient-light image are a pair of relative concepts: the visual effect of the sufficient-light image is better than that of the low-light image. The invention places no quantitative limit on the illumination intensity of the low-light image or the sufficient-light image; regardless of its absolute illumination level, any picture that needs to be enhanced during image enhancement can serve as the low-light image.
The training process of the generative adversarial network first trains the discrimination network 2: the sufficient-light image is labeled as true, the generated image produced by the generation network 1 is labeled as false, and both are input into the discrimination network 2 to train it. When the loss is computed, the discrimination network 2 is pushed to judge the sufficient-light image as true and the generated image produced by the generation network 1 as false; during this step only the parameters of the discrimination network 2 are updated, and the parameters of the generation network 1 are not updated. Then the generation network 1 is trained: the low-light image is input into the generation network 1, and the generated image produced by the generation network 1 is labeled as true and sent to the discrimination network 2. When the loss is computed, the discrimination network 2 is pushed to judge the generated image as true; in this step only the parameters of the generation network 1 are updated, and the parameters of the discrimination network 2 are not updated. The discrimination network 2 and the generation network 1 are then updated cyclically according to these steps, repeatedly, until the generation network 1 and the discrimination network 2 reach Nash equilibrium.
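A hedged PyTorch sketch of one round of the alternating update described above, using a least-squares form of the real/fake objective; the generator G, discriminator D, optimizers, and loss form are placeholders for illustration rather than the patent's exact implementation.

```python
import torch

def train_step(G, D, low_light, well_lit, opt_G, opt_D):
    """One alternating update: first the discriminator, then the generator."""
    # --- Train discriminator: push D(well_lit) toward 1 (true), D(G(low_light)) toward 0 (false) ---
    fake = G(low_light).detach()                      # do not backpropagate into G in this step
    loss_D = ((D(well_lit) - 1) ** 2).mean() + (D(fake) ** 2).mean()
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- Train generator: push D(G(low_light)) toward 1, keeping D's parameters fixed ---
    fake = G(low_light)
    loss_G = ((D(fake) - 1) ** 2).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```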
Based on this unsupervised learning network model, the low-light image and the sufficient-light image can be selected as an image pair whose semantic content does not need to be strictly consistent. That is, when the low-light image is input, a sufficient-light image whose semantic content is not strictly identical is assigned as the reference image of the target domain.
Fig. 2 shows a schematic structural diagram of a generating network part of the low-light image enhancement model.
As shown in fig. 2, in the low-light image enhancement model based on an unsupervised generative adversarial network, the generation network 1 of the generative adversarial network adopts a residual structure, and the main branch of the generation network 1 consists of at least two residual learning units 11 arranged in cascade; the shortcut branch of the generation network 1 connects the input end and the output end of the generation network 1; the residual learning unit 11 adopts a residual structure, the main branch of the residual learning unit 11 is a generator 111, and the shortcut branch of the residual learning unit 11 connects the input and the output of the generator 111.
When a low-light image is enhanced by the generation network, the low-light image is input into the generation network 1; the generation network 1 enhances the low-light image to obtain the generated image and outputs it. The main branch of the generation network 1 generates the residual between the low-light image and the generated image, and the shortcut branch of the generation network 1 carries the low-light image to the output end of the main branch, where it is added to the residual to obtain the generated image.
The residual structure includes a main branch and a shortcut branch; the shortcut branch provides a skip connection that superimposes the input of the main branch onto its output. In the present invention, at least two residual learning units 11 are cascaded to form the main branch of the generation network 1; the input of the main branch is the low-light image, and its output is the residual image between the low-light image and the sufficient-light image. The shortcut branch of the generation network 1 superimposes the low-light image onto the output of the main branch, yielding the generated image. Because the main branch of the generation network 1 only generates the residual during enhancement and only learns the residual during training, both the learning difficulty during training and the workload during enhancement are reduced.
The cascade of two or more residual learning units 11 lets each residual learning unit 11 complete only part of the residual learning and generation task. For example, the input of the first residual learning unit is the low-light image and its output is a first intermediate image; the input of the second residual learning unit is the first intermediate image and its output is a second intermediate image; this proceeds step by step until the last residual learning unit outputs the residual image. The difficulty of the learning and generation task of each residual learning unit 11 is thus reduced, and the complexity of the generator 111 inside the residual learning unit 11 can be simplified. Preferably, the generator 111 in each residual learning unit 11 adopts the same structure, so that residual learning and generation advance steadily. A minimal sketch of this nested residual structure is given below.
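A minimal PyTorch sketch of one reading of the nested residual structure: each residual learning unit wraps a generator with an inner shortcut, the units are cascaded into the main branch, and an outer shortcut adds the low-light input back at the output. The class names, the factory argument make_generator, and the default of two units are assumptions, not details specified in the patent.

```python
import torch.nn as nn

class ResidualLearningUnit(nn.Module):
    """Main branch is a generator; the shortcut adds the unit's input to its output."""
    def __init__(self, generator: nn.Module):
        super().__init__()
        self.generator = generator

    def forward(self, x):
        return x + self.generator(x)

class GeneratingNetwork(nn.Module):
    """Cascade of residual learning units, plus an outer shortcut from input to output."""
    def __init__(self, make_generator, num_units: int = 2):
        super().__init__()
        self.units = nn.Sequential(*[ResidualLearningUnit(make_generator())
                                     for _ in range(num_units)])

    def forward(self, low_light):
        residual = self.units(low_light)   # main branch progressively builds up the residual
        return low_light + residual        # outer shortcut adds the low-light input back
```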
Fig. 3 shows a schematic diagram of the structure of a generation network comprising a residual learning unit.
As shown in fig. 3, the generator 111 is the key component for feature extraction, conversion and reconstruction of the low-light image. In the invention, the residual learning unit 11 is built by taking the generator 111 as its main branch, with the shortcut branch connecting the input end and the output end of the generator 111, so that the task learned and generated by the generator 111 is only a small part of the task of the corresponding residual learning unit. The residual structure of the residual learning unit 11 greatly reduces the complexity required of the generator 111, so the generator 111 can be trained more easily and reach an ideal result.
In the present invention, the whole generation network 1 has a residual structure, and a plurality of residual learning units 11 are nested inside this residual structure. The cascaded residual learning units 11 form the main branch of the generation network 1 and, as a whole, learn the residual between the low-light image and the sufficient-light image, which further reduces the learning task of each generator 111 and simplifies its structure.
Fig. 4 shows a schematic diagram of a generation network comprising channel attention branches.
As shown in fig. 4, the generating network 1 further includes a channel attention branch that connects the input end of the first-stage generator and the output end of the last-stage generator; the channel attention branch takes out the three RGB channels of the input low-light image, inverts them, and multiplies them channel by channel with the output of the last-stage generator.
The channel attention branch extracts the features of the input low-light image per RGB channel; after inversion, these features emphasize the darker regions of the low-light image, strengthening the learning of those regions and making the output more realistic.
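A sketch of the channel attention branch as described: the RGB channels of the input are inverted, so darker regions receive larger weights, and the result is multiplied channel by channel with the last-stage generator output. The assumption that pixel values are normalized to [0, 1] is illustrative and not stated in the patent.

```python
import torch

def channel_attention(low_light: torch.Tensor, last_stage_out: torch.Tensor) -> torch.Tensor:
    """low_light and last_stage_out are (N, 3, H, W) tensors with values in [0, 1]."""
    attention = 1.0 - low_light            # invert RGB: dark regions get weights close to 1
    return attention * last_stage_out      # element-wise, channel-by-channel product
```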
Fig. 5 shows a schematic diagram of the structure of the generator.
As shown in fig. 5, the generator 111 employs an encoder-residual block-decoder architecture, including a downsampling layer 1111, a residual block 1112, an upsampling layer 1113, and a dimension reduction layer 1114, with the encoder and decoder being symmetrically arranged.
Each generator 111 uses an encoder-decoder architecture with residual block 1112, downsampling layer 1111 constitutes an encoder, upsampling layer 1113 and dimension reduction layer 1114 constitutes a decoder. The encoder performs feature extraction on the input image to obtain a feature map, the feature map is input into the residual block for conversion, a new feature representation is obtained, and finally the decoder reconstructs the features into a new image as the output of the generator 111.
As shown in fig. 5, the generator 111 includes three downsampling layers 1111, nine residual blocks 1112, two upsampling layers 1113 and one dimension reduction layer 1114; the three downsampling layers 1111 constitute the encoder, and the two upsampling layers 1113 and the dimension reduction layer 1114 constitute the decoder; the downsampling layer 1111 has a CBR structure; the upsampling layer 1113 has a TBR structure; the dimension reduction layer 1114 restores the output image to the format of the input image.
The CBR structure is composed of a two-dimensional convolution, batch normalization and a ReLU activation function; through convolutional downsampling, this structure learns and extracts the features of the input image.
The addition of the residual blocks 1112 alleviates the drawbacks of a plain convolutional network, namely that it has many parameters and high complexity, which makes training difficult, and that gradient descent easily suffers from vanishing gradients during backpropagation. The local skip connection of the residual block 1112 allows the gradient to flow directly back to the preceding structure without being blocked.
The TBR structure consists of a two-dimensional transposed convolution (deconvolution), batch normalization and a ReLU activation function; through deconvolutional upsampling it maps the feature layers back to the three-channel image domain.
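A hedged sketch of the generator with three CBR downsampling layers, nine residual blocks, two TBR upsampling layers, and a final dimension reduction layer. The channel widths, kernel sizes, and the choice of making the first CBR stride-1 (so that the two upsampling steps restore the input resolution) are assumptions, since the patent text does not specify them.

```python
import torch.nn as nn

def cbr(in_ch, out_ch, stride=2):
    """CBR: Conv2d + BatchNorm + ReLU (downsampling when stride=2)."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride, 1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

def tbr(in_ch, out_ch):
    """TBR: ConvTranspose2d + BatchNorm + ReLU (upsampling by a factor of 2)."""
    return nn.Sequential(nn.ConvTranspose2d(in_ch, out_ch, 4, 2, 1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, 1, 1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)    # local skip lets gradients bypass the convolutions

class Generator(nn.Module):
    """Encoder (3 CBR) -> 9 residual blocks -> decoder (2 TBR + dimension reduction)."""
    def __init__(self):
        super().__init__()
        # First CBR keeps the resolution so two TBR layers can restore the input size (an assumption).
        self.encoder = nn.Sequential(cbr(3, 64, stride=1), cbr(64, 128), cbr(128, 256))
        self.blocks = nn.Sequential(*[ResidualBlock(256) for _ in range(9)])
        self.decoder = nn.Sequential(tbr(256, 128), tbr(128, 64),
                                     nn.Conv2d(64, 3, 3, 1, 1))   # dimension reduction to 3 channels

    def forward(self, x):
        return self.decoder(self.blocks(self.encoder(x)))
```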
Fig. 6 shows a schematic diagram of a global arbiter.
Fig. 7 shows a schematic diagram of a local arbiter.
As shown in fig. 6 and 7, the discrimination network 2 of the generative adversarial network includes a global discriminator 21 and a local discriminator 22; the global discriminator 21 discriminates the whole image, while the local discriminator 22 discriminates local regions of the image.
The discriminator operates at two scales, a global scale and a local (small) scale. Its network components are CBR structures, and the last convolution layer is a single-channel output map, which completes the 0/1 judgment of whether the image is real or fake. Under global-scale discrimination, the input of the discriminator is the whole image. Under local small-scale discrimination, the inputs of the discriminator are several image patches taken from corresponding positions of the whole image.
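A sketch of a CBR-based discriminator whose last convolution produces a single-channel output map, plus a helper that crops n patches from the same random positions of the generated and reference images for local-scale discrimination. The channel widths, patch size, and helper name are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Stack of CBR layers ending in a single-channel map of real/fake scores."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 4, 1, 1))    # single-channel output map

    def forward(self, x):
        return self.net(x)

def paired_random_patches(generated, reference, n=6, size=64):
    """Crop n patches from the same random positions in both images (assumes H, W >= size)."""
    _, _, h, w = generated.shape
    gen_patches, ref_patches = [], []
    for _ in range(n):
        top = torch.randint(0, h - size + 1, (1,)).item()
        left = torch.randint(0, w - size + 1, (1,)).item()
        gen_patches.append(generated[:, :, top:top + size, left:left + size])
        ref_patches.append(reference[:, :, top:top + size, left:left + size])
    return gen_patches, ref_patches
```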
In the invention, the cascaded residual learning units 11 only generate residual images, which greatly reduces the complexity of the learning and generation task, so the generating network can reach an ideal result more easily during training. Using generators 111 of identical structure, each step of residual learning and generation advances steadily, strengthening the stability of the generating network. The multi-scale discriminator makes the enhanced image closer to a real image and makes its local details more realistic. The channel attention mechanism helps the generation network 1 automatically focus on the regions of the input that need enhancement, and provides more learnable parameters for feature extraction, so the model can better complete visual tasks.
Fig. 8 shows a schematic diagram of the steps of a method for training a generating network based on an unsupervised generative adversarial network.
As shown in fig. 8, a method for training a generating network based on an unsupervised generative adversarial network includes: step S1, acquiring a low-light image and a sufficient-light image as training samples, where the semantic contents of the low-light image and the sufficient-light image need not be strictly consistent; step S2, inputting the low-light image into the generating network to obtain a generated image; step S3, inputting the generated image and the sufficient-light image into the discrimination network to obtain a discrimination result; and step S4, iteratively adjusting the parameters of the generating network or the discrimination network until the generating network and the discrimination network reach Nash equilibrium. The parameters of the generating network are adjusted based on the semantic-consistency loss between the low-light image and the generated image and the adversarial loss between the generated image and the sufficient-light image; the parameters of the discrimination network are adjusted based on the adversarial loss. The generating network adopts a residual structure, and the shortcut branch of the generating network 1 connects the input end and the output end of the generating network 1; the main branch of the generating network 1 consists of at least two residual learning units 11 arranged in cascade; the residual learning unit 11 adopts a residual structure, the main branch of the residual learning unit 11 is a generator 111, and the shortcut branch of the residual learning unit 11 connects the input end and the output end of the generator 111.
In the invention, the generating network and the discrimination network are trained alternately, that is, their parameters are adjusted iteratively in turn.
The loss function for training the generation network 1 comprises two parts: an adversarial loss part and a semantic-consistency loss part. The parameters of the generation network 1 are adjusted according to the semantic-consistency loss between the low-light image and the generated image and the adversarial loss between the generated image and the sufficient-light image. Because the invention is an unsupervised scheme, the input low-light image and the sufficient-light image are not strictly matched in content during training, so semantic consistency between the low-light image and the generated image cannot be enforced through the discrimination process alone. For this reason, the present invention constrains the consistency of VGG19 features so that the semantic content of the enhanced generated image stays consistent with that of the input low-light image.
VGG19 uses a model pre-trained on ImageNet. Because VGG19 features are insensitive to the pixel dynamic range of the input image, they can be used to constrain the consistency of the image content features of the low-light image and the enhanced generated image, i.e., to keep their semantic content consistent. The semantic-content consistency loss $\mathrm{Loss}_{VGG}$ is defined as follows:
$$\mathrm{Loss}_{VGG}=\frac{1}{W\,H}\sum_{i=1}^{W}\sum_{j=1}^{H}\left\|V(X)_{(w_i,h_j)}-V(G(X))_{(w_i,h_j)}\right\|_{2}$$
where $W$ and $H$ represent the width and length of the image over which the feature-consistency loss is computed; $w_i$ represents the i-th pixel point in the width direction; $h_j$ represents the j-th pixel point in the length direction; $V(X)$ and $V(G(X))$ represent the VGG features of the input low-light image and of the enhanced generated image, respectively; and $\|\cdot\|_{2}$ represents the L2 norm measure.
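A sketch of the semantic-consistency term using the torchvision VGG19 model pretrained on ImageNet (torchvision >= 0.13 weights API). The patent does not say which intermediate layer to compare or whether ImageNet normalization is applied, so the truncation at features[:16] and the omission of normalization are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class VGGConsistencyLoss(nn.Module):
    """Mean L2 distance between VGG19 features of the low-light input and the enhanced output."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features[:16].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)        # VGG stays fixed; only the generator is trained
        self.vgg = vgg

    def forward(self, low_light, generated):
        diff = self.vgg(low_light) - self.vgg(generated)
        # L2 norm of the feature difference at each spatial position, averaged over the map.
        return diff.pow(2).sum(dim=1).sqrt().mean()
```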
The discrimination network is trained only with the adversarial loss.
In the invention, the discrimination network adopts a global-plus-local multi-scale discrimination scheme. Specifically, in addition to discrimination at the scale of the whole image, local-scale discrimination of n random regions is added, where n is an integer greater than 1. For example, with n = 6 the main features can be largely covered even though the number of randomly selected regions is small.
The adversarial loss is defined as follows:
$$\mathcal{L}_{D}^{global}=\mathbb{E}_{X_{r}\sim real}\!\left[\left\|D(X_{r})-1\right\|_{2}\right]+\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g})\right\|_{2}\right]$$
$$\mathcal{L}_{G}^{global}=\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g})-1\right\|_{2}\right]$$
$$\mathcal{L}_{D}^{local}=\frac{1}{n}\sum_{k=1}^{n}\left(\mathbb{E}_{X_{r}\sim real}\!\left[\left\|D(X_{r}^{(k)})-1\right\|_{2}\right]+\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g}^{(k)})\right\|_{2}\right]\right)$$
$$\mathcal{L}_{G}^{local}=\frac{1}{n}\sum_{k=1}^{n}\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g}^{(k)})-1\right\|_{2}\right]$$
where $\mathcal{L}_{D}^{global}$ and $\mathcal{L}_{D}^{local}$ are the loss functions of the discriminator at the global and local scales; $\mathcal{L}_{G}^{global}$ and $\mathcal{L}_{G}^{local}$ are the loss functions of the generator at the global and local scales; $X_{r}\sim real$ denotes that the input image follows the distribution of the real image domain; $X_{g}\sim generate$ denotes that the input image follows the distribution of the generated image domain; $D$ denotes the discriminator network; $\|\cdot\|_{2}$ denotes the L2 norm measure; $\mathbb{E}$ denotes the expectation; the local discriminator uses the average of the discrimination results of $n$ random sub-blocks ($X^{(k)}$ denoting the k-th sub-block).
The n random regions are cropped from the generated image and the sufficient-light image at corresponding positions, and the discriminator judges whether each patch belongs to the real distribution or to the distribution of the generated image.
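Under the least-squares reading of the adversarial loss given above, a sketch combining the global terms with local terms averaged over n = 6 patches cropped at matching positions; it reuses the paired_random_patches helper from the discriminator sketch above. The least-squares form and the equal weighting of global and local terms are assumptions.

```python
import torch

def discriminator_loss(D_global, D_local, well_lit, generated, n=6):
    """Global term plus the average of n local terms; real patches -> 1, generated patches -> 0."""
    loss = ((D_global(well_lit) - 1) ** 2).mean() + (D_global(generated.detach()) ** 2).mean()
    gen_patches, ref_patches = paired_random_patches(generated.detach(), well_lit, n=n)
    for g, r in zip(gen_patches, ref_patches):
        loss = loss + (((D_local(r) - 1) ** 2).mean() + (D_local(g) ** 2).mean()) / n
    return loss

def generator_adversarial_loss(D_global, D_local, generated, well_lit, n=6):
    """Generator tries to make both the global and the local discriminator output 'real'."""
    loss = ((D_global(generated) - 1) ** 2).mean()
    gen_patches, _ = paired_random_patches(generated, well_lit, n=n)
    for g in gen_patches:
        loss = loss + ((D_local(g) - 1) ** 2).mean() / n
    return loss
```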
The global-plus-local multi-scale structure helps ensure that both the whole enhanced image and its local regions are as close as possible to a real sufficient-light image, and that the regions of the input low-light image that need enhancement are adaptively enhanced rather than the image being enhanced purely as a whole.
For the generating network in the invention, a channel attention mechanism is provided so that the generating network automatically focuses on the regions of the input that need enhancement. The local scale of the discrimination network ensures that those regions of the input low-light image are adaptively enhanced during training. Together, these two designs increase the emphasis on the regions to be enhanced during image enhancement and better improve the quality of the generated image.
In the invention, for the generating network, the nested residual structure lightens the learning and generation task of each single generator and lowers the task difficulty, so the training objective is easier to reach. The cascade structure further reduces the learning difficulty during training, making the training objective easier still to reach.
In the invention, during training of the generating network, not only is the adversarial loss used as a constraint, but the semantic-consistency loss also requires the content of the images before and after enhancement to remain consistent, which enhances the robustness of the model and mitigates the poor generalization that an unsupervised training mode can cause.
Fig. 9 shows a schematic of the steps of a method for enhancing a low-light image using a generating network.
As shown in fig. 9, step S10 includes inputting a low-light image into the generation network 1. In step S20, the generation network 1 enhances the low-light image: the main branch of the generation network 1 generates the residual between the low-light image and the generated image, and the shortcut branch of the generation network 1 carries the low-light image to the output end of the main branch, where it is added to the residual to obtain the generated image. In step S30, the generation network 1 outputs the generated image.
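A minimal usage sketch of steps S10 to S30, assuming a trained generating network saved as a state dict and reusing the GeneratingNetwork and Generator classes sketched earlier; the file names and the [0, 1] normalization are illustrative assumptions.

```python
import torch
from torchvision import io

# Step S10: read the low-light image and form a (1, 3, H, W) batch in [0, 1].
low_light = io.read_image("low_light.png").float().unsqueeze(0) / 255.0

# Step S20: the generating network produces the enhanced (generated) image.
generator = GeneratingNetwork(Generator, num_units=2)   # classes from the sketches above
generator.load_state_dict(torch.load("generating_network.pt", map_location="cpu"))
generator.eval()
with torch.no_grad():
    generated = generator(low_light).clamp(0.0, 1.0)

# Step S30: output the generated image.
io.write_png((generated[0] * 255).to(torch.uint8), "enhanced.png")
```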
FIG. 10 is an exemplary block diagram of a device for training a generating network or enhancing low-light images with a generating network.
It is to be appreciated that device 300 may be a single device (e.g., a computing device) or a multi-function device including various peripheral devices.
As shown in fig. 10, the device 300 may include a central processing unit ("CPU") 311, which may be a general-purpose CPU, a special-purpose CPU, or another information-processing and program-execution unit. Further, device 300 may also include a mass memory 312 and a read-only memory ("ROM") 313, where mass memory 312 may be configured to store various types of data, including data used for image enhancement, algorithm data, intermediate results, and the various programs needed to operate device 300. ROM 313 may be configured to store the data and instructions necessary for the power-on self-test of device 300, the initialization of functional modules in the system, drivers for the system's basic input/output, and booting the operating system. Optionally, the device 300 may also include other hardware platforms or components, such as a tensor processing unit ("TPU") 314, a graphics processing unit ("GPU") 315, a field-programmable gate array ("FPGA") 316, and a machine learning unit ("MLU") 317, as shown. It will be appreciated that while various hardware platforms or components are shown in device 300, this is by way of example only and not limitation, and one of ordinary skill in the art may add or remove hardware as needed. For example, device 300 may include only a CPU, associated storage devices, and interface devices to implement the method of the present invention for low-light image enhancement with a generating network and for training the generating network. In some embodiments, to facilitate the transfer and interaction of data with external networks, the device 300 further comprises a communication interface 318, whereby a local area network/wireless local area network ("LAN/WLAN") 305 may be connected through the communication interface 318, and a local server 306 or the Internet 307 may in turn be reached through the LAN/WLAN. Alternatively or additionally, device 300 may connect directly to the Internet or a cellular network via communication interface 318 based on wireless communication technology, such as 3rd-generation ("3G"), 4th-generation ("4G"), or 5th-generation ("5G") technology. In some application scenarios, the device 300 may also access a server 308 and a database 309 of an external network as needed to obtain various known algorithms, data, and modules, and may store various data remotely, such as the various types of data or instructions used for image enhancement.
Peripheral devices of the apparatus 300 may include a display device 302, an input device 303, and a data transmission interface 304. In one embodiment, display device 302 may include, for example, one or more speakers and/or one or more visual displays, configured for voice prompts or video display of images when enhancing low-light images with the generating network or training the image enhancement model. The input device 303 may include, for example, a keyboard, a mouse, a microphone, a gesture-capture camera, or other input buttons and controls, configured to receive audio data and/or user instructions. The data transfer interface 304 may include, for example, a serial interface, a parallel interface, a universal serial bus interface ("USB"), a small computer system interface ("SCSI"), serial ATA, FireWire, PCI Express, or a high-definition multimedia interface ("HDMI"), configured for data transfer and interaction with other devices or systems. In accordance with aspects of the present invention, the data transfer interface 304 may receive low-light images and sufficient-light images and transmit various data or results to the device 300.
The above-described CPU 311, mass memory 312, ROM 313, TPU 314, GPU 315, FPGA 316, MLU 317, and communication interface 318 of the device 300 may be connected to each other via a bus 319, through which data interaction with the peripheral devices is also achieved. In one embodiment, CPU 311 may control other hardware components of device 300 and its peripherals via this bus 319.
Those skilled in the art will also appreciate from the foregoing description, taken in conjunction with the accompanying drawings, that embodiments of the present invention may also be implemented in software programs. The invention thus also provides a computer program product. The computer program product may be used to implement the method for training a model described in connection with fig. 8 of the present invention.
It should be noted that although the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order or that all of the illustrated operations be performed in order to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
It should be understood that when the terms "first," "second," "third," and "fourth," etc. are used in the claims, the specification and the drawings of the present invention, they are used merely to distinguish between different objects, and not to describe a particular order. The terms "comprises" and "comprising" when used in the specification and claims of the present invention are taken to specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification and claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present specification and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Although the embodiments of the present invention are described above, the descriptions are merely examples for facilitating understanding of the present invention, and are not intended to limit the scope and application of the present invention. Any person skilled in the art can make any modification and variation in form and detail without departing from the spirit and scope of the present disclosure, but the scope of the present disclosure is defined by the appended claims.

Claims (14)

1. A method for enhancing a low-light image using a generating network, comprising,
inputting the low-light image into a generating network (1);
the generation network (1) enhances the low-illumination image to obtain a generated image;
the generation network (1) outputs the generated image;
the generating network (1) adopts a residual error structure, a main branch of the generating network (1) generates a residual error between a low-light image and a generated image, and a shortcut branch of the generating network (1) transmits the low-light image to an output end of the main branch of the generating network (1) to be added with the residual error to obtain the generated image;
the main branch of the generating network (1) is at least two residual error learning units (11) which are arranged in a cascading way, and each residual error learning unit gradually and progressively generates the residual error.
2. The method for enhancing a low-light image using a generating network as recited in claim 1, wherein,
the residual error learning unit (11) adopts a residual error structure, a main branch of the residual error learning unit (11) is a generator (111), and a shortcut branch of the residual error learning unit (11) is connected with an input end and an output end of the generator (111).
3. The method for enhancing a low-light image using a generating network as recited in claim 2, wherein,
the generating network (1) further comprises a channel attention branch connecting the input of the first stage generator and the output of the last stage generator (111);
the channel attention branch takes out the RGB three channels of the low-illumination image, inverts them, and multiplies them in turn, channel by channel, with the output of the final stage generator (111).
4. The method for enhancing a low-light image using a generating network as recited in claim 2, wherein,
the generator (111) adopts an encoder-residual block-decoder architecture, comprising a downsampling layer (1111), a residual block (1112), an upsampling layer (1113) and a dimension reduction layer (1114), and the encoder and the decoder are symmetrically arranged.
5. The method for enhancing a low-light image using a generating network as recited in claim 4, wherein,
the generator (111) comprises three downsampling layers (1111), nine residual blocks (1112), two upsampling layers (1113) and one dimension reduction layer (1114);
three downsampling layers (1111) constitute an encoder, and two upsampling layers (1113) and one dimension reduction layer (1114) constitute a decoder;
the downsampling layer (1111) is of a CBR structure;
the upsampling layer (1113) is a TBR structure.
6. A method for training a generating network based on an unsupervised generative adversarial network, comprising,
acquiring a low-light image and a sufficient-light image as training samples, wherein semantic contents of the low-light image and the sufficient-light image are not required to be strictly consistent;
inputting the low-illumination image into a generation network to obtain a generated image;
inputting the generated image and the sufficient illumination image into a discrimination network to obtain a discrimination result;
iteratively adjusting parameters of the generating network or the judging network until the generating network and the judging network reach Nash equilibrium;
wherein the parameters of the generating network are adjusted based on the semantic-consistency loss between the low-light image and the generated image, and the adversarial loss between the generated image and the sufficient-light image;
adjusting parameters of the discrimination network based on the adversarial loss;
the generating network adopts a residual structure, and a shortcut branch of the generating network (1) is connected with an input end and an output end of the generating network (1); the main branch of the generating network (1) is at least two residual error learning units (11) which are arranged in cascade; the residual error learning unit (11) adopts a residual error structure, a main branch of the residual error learning unit (11) is a generator (111), and a shortcut branch of the residual error learning unit (11) is connected with an input end and an output end of the generator (111).
7. The method for training a generating network based on an unsupervised generative adversarial network of claim 6,
the discrimination network (2) of the generative adversarial network includes a global discriminator (21) and a local discriminator (22),
the global discriminator (21) is used for discriminating the whole image;
the local discriminator (22) is used for discriminating local areas of the image.
8. The method for training a generating network based on an unsupervised generative adversarial network of claim 7,
the global discriminant (21) and the local discriminant (22) are in a CBR structure, and the final convolution layer is a single-channel output spectrum.
9. The method for training a generating network based on an unsupervised generative adversarial network of claim 6,
wherein consistency of VGG19 features is used to keep the semantic content from drifting, and the semantic-content consistency loss $\mathrm{Loss}_{VGG}$ is defined as follows:
$$\mathrm{Loss}_{VGG}=\frac{1}{W\,H}\sum_{i=1}^{W}\sum_{j=1}^{H}\left\|V(X)_{(w_i,h_j)}-V(G(X))_{(w_i,h_j)}\right\|_{2}$$
wherein $W$ and $H$ represent the width and length of the image;
$w_i$ represents the i-th pixel point in the width direction;
$h_j$ represents the j-th pixel point in the length direction;
$V(X)$ represents the VGG features of the input low-light image;
$V(G(X))$ represents the VGG features of the enhanced generated image;
$\|\cdot\|_{2}$ represents the L2 norm measure.
10. The method for training a generating network based on an unsupervised generative adversarial network of claim 6,
wherein the adversarial loss is defined as follows:
$$\mathcal{L}_{D}^{global}=\mathbb{E}_{X_{r}\sim real}\!\left[\left\|D(X_{r})-1\right\|_{2}\right]+\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g})\right\|_{2}\right]$$
$$\mathcal{L}_{G}^{global}=\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g})-1\right\|_{2}\right]$$
$$\mathcal{L}_{D}^{local}=\frac{1}{n}\sum_{k=1}^{n}\left(\mathbb{E}_{X_{r}\sim real}\!\left[\left\|D(X_{r}^{(k)})-1\right\|_{2}\right]+\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g}^{(k)})\right\|_{2}\right]\right)$$
$$\mathcal{L}_{G}^{local}=\frac{1}{n}\sum_{k=1}^{n}\mathbb{E}_{X_{g}\sim generate}\!\left[\left\|D(X_{g}^{(k)})-1\right\|_{2}\right]$$
wherein,
$\mathcal{L}_{D}^{global}$ is the loss function of the discriminator at the global scale;
$\mathcal{L}_{D}^{local}$ is the loss function of the discriminator at the local scale;
$\mathcal{L}_{G}^{global}$ is the loss function of the generator at the global scale;
$\mathcal{L}_{G}^{local}$ is the loss function of the generator at the local scale;
$X_{r}\sim real$ represents that the input image obeys the distribution of the real image domain;
$X_{g}\sim generate$ represents that the input image obeys the distribution of the generated image domain;
$D$ represents the discriminator network;
$\mathbb{E}$ represents the expectation;
$\|\cdot\|_{2}$ represents the L2 norm measure;
the local discriminator uses the average of the discrimination results of n random sub-blocks ($X^{(k)}$ denoting the k-th sub-block), n being an integer greater than 1.
11. An apparatus for enhancing a low-light image using a generating network, comprising:
a processor; and
a memory storing program instructions for a method of enhancing a low-light image with a generating network, which when executed by the processor, cause the apparatus to implement the method of any of claims 1-5.
12. A computer readable storage medium having stored thereon program instructions for a method for enhancing a low-light image with a generating network, which computer readable instructions, when executed by one or more processors, implement the method according to any of claims 1-5.
13. An apparatus for training a generating network based on an unsupervised generative adversarial network, comprising:
a processor; and
a memory storing program instructions for training a generating network based on an unsupervised generative adversarial network, which, when executed by the processor, cause the apparatus to implement the method according to any one of claims 6-10.
14. A computer readable storage medium having stored thereon program instructions for training a generating network based on an unsupervised generative adversarial network, which computer readable instructions, when executed by one or more processors, implement the method according to any one of claims 6-10.
CN202310396091.4A 2023-04-13 2023-04-13 Method for enhancing low-light image by using generating network, training method and training equipment for generating network Pending CN116703792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310396091.4A CN116703792A (en) 2023-04-13 2023-04-13 Method for enhancing low-light image by using generating network, training method and training equipment for generating network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310396091.4A CN116703792A (en) 2023-04-13 2023-04-13 Method for enhancing low-light image by using generating network, training method and training equipment for generating network

Publications (1)

Publication Number Publication Date
CN116703792A true CN116703792A (en) 2023-09-05

Family

ID=87844122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310396091.4A Pending CN116703792A (en) 2023-04-13 2023-04-13 Method for enhancing low-light image by using generating network, training method and training equipment for generating network

Country Status (1)

Country Link
CN (1) CN116703792A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798400A (en) * 2020-07-20 2020-10-20 福州大学 Non-reference low-illumination image enhancement method and system based on generation countermeasure network
CN113313657A (en) * 2021-07-29 2021-08-27 北京航空航天大学杭州创新研究院 Unsupervised learning method and system for low-illumination image enhancement



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230925

Address after: Room 508, No. 3, Lane 555, Huanke Road, Pudong New Area Free Trade Pilot Zone, Shanghai, 200120

Applicant after: Shanghai Nanqili Technology Co.,Ltd.

Address before: 200120 Building 3, Lane 555, Zhangjiang Huanke Road, Pudong New Area, Shanghai

Applicant before: Shanghai Processor Technology Innovation Center