CN115272527A - Image colorization method based on a palette adversarial network - Google Patents

Image colorization method based on a palette adversarial network

Info

Publication number
CN115272527A
CN115272527A (application CN202210924523.XA)
Authority
CN
China
Prior art keywords
palette
color
image
generator
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210924523.XA
Other languages
Chinese (zh)
Inventor
Wang Yi (王毅)
Qiao Yu (乔宇)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai AI Innovation Center
Original Assignee
Shanghai AI Innovation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai AI Innovation Center filed Critical Shanghai AI Innovation Center
Priority to CN202210924523.XA priority Critical patent/CN115272527A/en
Publication of CN115272527A publication Critical patent/CN115272527A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G06T 11/40 - Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/56 - Extraction of image or video features relating to colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image colorization, and provides an image colorization method based on a palette adversarial network, comprising the following steps: constructing a palette adversarial network, wherein the palette adversarial network comprises a palette generator G_H, a palette allocation generator G_C, and a color discriminator D; inputting a grayscale image L into the palette adversarial network; generating an estimated color map Ĉ by the palette generator G_H and the palette allocation generator G_C; generating an output image Î from the grayscale image L and the estimated color map Ĉ; and judging the authenticity of the output image Î by the color discriminator D.

Description

Image colorization method based on a palette adversarial network
Technical Field
The present invention relates generally to the field of image colorization. In particular, the invention relates to an image colorization method based on a palette generative adversarial network (PalGAN).
Background
Image colorization refers to predicting the missing color information of a grayscale image, and is widely used in old-photo restoration and other visual editing applications. In addition, since image colorization depends heavily on scene understanding, it also serves as a pretext task for self-supervised learning. In the colorization task, even with ground-truth color available for supervision, predicting pixel colors from a grayscale image remains very challenging, since one gray input may correspond to multiple plausible color variants.
The following describes an existing image coloring method:
User-guided colorization methods include reference-guided colorization, in which color statistics are transferred from a reference image to the given grayscale image; deep-learning variants introduce semantic consistency in a neural feature space. When the reference image and the input grayscale image share similar semantics, reference-guided colorization performs well, but its applicability is limited by the quality of reference retrieval, a drawback that is especially apparent for complex scenes. Besides reference-image guidance, user-guided methods also include colorization guided by local color hints and by language. Methods guided by local color hints require the user to provide sufficient local color cues, for example in the form of scribbles, and propagate the given colors according to the local affinities of those cues. Language-guided methods specify by text which colors to use and how the colors are distributed.
Learning-based colorization methods produce color images from gray inputs by learning a pixel-to-pixel mapping, converting color images to grayscale to build training pairs from large-scale datasets in a self-supervised manner. Among them, Iizuka et al. (Iizuka S, Simo-Serra E, Ishikawa H. Let there be color!: Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification [J]. ACM Transactions on Graphics (TOG), 2016, 35(4): 1-11) propose using image-level labels to associate predicted colors with global semantics via global and local convolutional neural networks. Larsson et al. (Larsson G, Maire M, Shakhnarovich G. Learning representations for automatic colorization [C]// European Conference on Computer Vision. Springer, Cham, 2016) learn deep representations that predict per-pixel color distributions. In addition, extra input hints can be integrated into a learning system by simulation, providing both automatic and semi-automatic colorization; models can also build on the expressiveness of non-local modeling and the Transformer architecture.
Among methods that exploit additional priors from pre-trained models, Su et al. (Su J W, Chu H K, Huang J B. Instance-aware image colorization [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 7968-7977) propose using an off-the-shelf detector with instance-level annotations (e.g., instance bounding boxes and classes), so that the colorization model can focus on color rendering without having to recognize high-level semantics. Beyond pre-trained discriminative models, pre-trained generative models can also improve the diversity of colorization. Wu et al. (Wu Y, Wang X, Li Y, et al. Towards vivid and diverse image colorization with generative color prior [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 14377-14386) propose exploiting the color prior of a pre-trained BigGAN to help a deep model produce diverse color results: an additional encoder projects the given grayscale image into a latent code, a color image is estimated from BigGAN, and this preliminary prediction is further refined using intermediate BigGAN features. Afifi et al. (Afifi M, Brubaker M A, Brown M S. HistoGAN: Controlling colors of GAN-generated and real images via color histograms [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 7941-7950) use a pre-trained StyleGAN for image recoloring, where color is controlled by histogram features.
However, the prior art still has the following problems. It typically formulates colorization as a pixel-level regression task, which suffers from the multi-modal nature of color. For example, deep models trained end-to-end on extensive data learn color priors such as the green of vegetation or human skin tones, but these methods tend to predict an averaged brownish color for objects with inherent color ambiguity (e.g., human clothing, cars, and other artifacts). To address this multi-modality, prior work formulates color prediction as pixel-level color classification, assigning each pixel multiple colors according to posterior probabilities; however, such methods suffer from regional color inconsistency due to the independent pixel-wise sampling mechanism. While the sampling problem can be partially alleviated by sequential modeling, the one-way sequential dependency over flattened two-dimensional pixels leads to error accumulation and hinders learning efficiency.
Disclosure of Invention
To at least partially solve the above problems in the prior art, the present invention provides an image colorization method based on a palette adversarial network, comprising the following steps:

constructing a palette adversarial network, wherein the palette adversarial network comprises a palette generator G_H, a palette allocation generator G_C, and a color discriminator D;

inputting a grayscale image L into the palette adversarial network;

generating an estimated color map Ĉ by the palette generator G_H and the palette allocation generator G_C;

generating an output image Î from the grayscale image L and the estimated color map Ĉ; and

judging the authenticity of the output image Î by the color discriminator D.
In one embodiment of the invention, it is provided that the palette generator G_H generates a palette histogram Ĥ from the grayscale image L, expressed as:

Ĥ = G_H(L),  Ĥ ∈ [0, 1]^{N_a×N_b},  Σ_{a,b} Ĥ(a, b) = 1

where the palette histogram Ĥ represents the palette probability, and a and b denote the a-axis and b-axis in CIE Lab color space; and

the palette allocation generator G_C generates the a and b values in CIE Lab color space from the palette histogram Ĥ and a latent code z, expressed as:

Ĉ = G_C(L, Ĥ, z)
in one embodiment of the invention it is provided that the palette allocation generator
Figure BDA00037777275500000320
Comprises a residual block, a palette normalization layer, and a chroma attention moduleThe palette normalization layer normalizes and normalizes input features
Figure BDA00037777275500000321
Parameterized affine transformation, g (-) represents the fully connected layer.
In one embodiment of the invention, it is provided that the chroma attention module comprises a global interaction module and a local description module, wherein the chroma attention module takes as input a feature map F, a high-level feature map S and the grayscale image L; the global interaction module performs a global interaction operation to generate a first feature map F_g, the local description module performs a local description operation to generate a second feature map F_l, the first feature map F_g and the second feature map F_l are fused to generate a feature map residual F′, and the feature map residual F′ is added back to the feature map F, expressed as:

F_g = CA_g(F, S),  F_l = CA_l(F, L)
F′ = f(F_g ⊕ F_l),  F̂ = F + F′

where f(·) denotes a non-linear fusion operation, ⊕ denotes a channel-dimension concatenation operation, CA_g denotes the global interaction operation, and CA_l denotes the local description operation.
In one embodiment of the present invention, it is provided that the global interaction operation comprises reconstructing each regional feature point in the feature map F as a weighted sum of the other regional feature points, with the local weights computed from the semantic similarity between regional feature points, expressed as:

F_g(p) = Σ_q w_pq F(q),  w_pq = softmax_q(⟨S_K(p), S_Q(q)⟩ / (‖S_K(p)‖ ‖S_Q(q)‖))

where p and q denote patches centered at pixel positions p and q within the feature map F, and S_K and S_Q denote feature maps transformed from the high-level feature map S using convolutions; and

the local description operation comprises mapping the grayscale image L to the corresponding ab feature map by a local affine transformation {A, B}, expressed as:

F_l = A ⊙ L↓ + B
A = Ψ(cov(F, L↓) / (var(L↓) + ε)),  B = F̄ − A ⊙ L̄↓

where ⊙ denotes the element-wise multiplication operator, ↓ denotes a down-sampling operation, Ψ denotes a learnable transform, cov(·,·) denotes the local covariance, var(·) denotes the local variance of a given feature map, F̄ and L̄ denote F and L smoothed with a mean filter, respectively, and ε denotes a small positive parameter.
In one embodiment of the invention, it is provided that palette optimization is carried out, wherein the palette histogram Ĥ is expressed as a kernel-weighted sum:

Ĥ(a, b) = (1/Z) Σ_x k(C_a(x) − a) k(C_b(x) − b),  k(u) = 1 / (1 + (u/σ)²)

where C_a(x) and C_b(x) denote the values of pixel x in the a and b channels, respectively, k denotes the kernel function, Z denotes a normalization factor, and σ denotes a parameter controlling the smoothness over neighboring bins; and

palette regularization is carried out, wherein the entropy Ent(Ĥ) of the palette histogram Ĥ is maximized to increase color diversity, expressed as:

Ent(Ĥ) = −Σ_{a,b} Ĥ(a, b) log Ĥ(a, b)
In one embodiment of the invention, it is provided that the color discriminator D converts the output image Î into a one-dimensional feature g ∈ R^{256×1}, fuses the one-dimensional feature with the palette by an inner product, and computes the authenticity of the output image Î, expressed as:

D(Ĉ, L) = vec(Ĥ)ᵀ W g,  g = φ([Ĉ, Î])

where W denotes a learnable linear projection and φ denotes the convolutional feature extractor of the discriminator.
In one embodiment of the invention, it is provided that the palette generator G_H and the palette allocation generator G_C are trained, wherein the optimization objective for training the palette generator G_H is expressed as:

L_{G_H} = λ_rec1 ‖Ĥ − H‖₁ − λ_rg Ent(Ĥ)

where the reconstruction term represents the learning of palette reconstruction, the regularization term represents the learning of palette regularization, H denotes the ground-truth palette histogram, and λ_rec1 and λ_rg denote balance parameters; and

the optimization objective for training the palette allocation generator G_C is expressed as:

L_{G_C} = λ_reg ‖Ĉ − C‖₁ + λ_rec2 ‖Ĥ_Ĉ − H‖₁ + λ_adv L_adv^G
L_adv^G = −E_{L∼P_L}[D(Ĉ, L)]
L_adv^D = E_{I∼P_I}[max(0, 1 − D(C, L))] + E_{L∼P_L}[max(0, 1 + D(Ĉ, L))]

where the regression term is used for learning pixel-level regression, the reconstruction term is used for learning palette reconstruction, and the adversarial term is used for learning adversarial training; Ĥ_Ĉ denotes the palette histogram extracted from the estimated color map Ĉ by the kernel-weighted sum above; C denotes the ground-truth color map; λ_reg, λ_rec2 and λ_adv denote balance parameters; L_adv^G denotes the generator training objective of the adversarial loss, and L_adv^D denotes the discriminator training objective of the adversarial loss, where P_I denotes the RGB image distribution and P_L denotes the grayscale image distribution.
In one embodiment of the invention, it is provided that the palette generator G_H and the palette allocation generator G_C are jointly trained in a progressive manner.
The present invention also provides a computer system, comprising:
a processor configured to execute machine-executable instructions; and
a memory having stored thereon machine executable instructions which, when executed by the processor, perform steps according to the method.
The invention has at least the following beneficial effects. The invention provides an image colorization method based on a palette adversarial network in which colorization is decomposed into palette estimation and pixel assignment, which effectively avoids the challenges of color ambiguity and regional homogeneity and supports naturally diverse and controllable colorization. For color affinity, which is little studied in the prior art, the invention provides a chroma attention module that considers semantic and local-detail correspondences and applies these correlations to color generation, effectively reducing color bleeding. In addition, the invention enhances the fidelity and realism of colorization through palette normalization, and further promotes the diversity of the generated colors.
Drawings
To further clarify the advantages and features that may be present in various embodiments of the present invention, a more particular description of various embodiments of the invention will be rendered by reference to the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. In the drawings, the same or corresponding parts will be denoted by the same or similar reference numerals for clarity.
Fig. 1 shows a computer system implementing the method according to the invention.
Fig. 2 shows an image colorization method based on a palette adversarial network in an embodiment of the present invention.
Fig. 3 shows a schematic diagram of the framework of the palette adversarial network in an embodiment of the invention.
Fig. 4 shows a block diagram of the chroma attention module in one embodiment of the invention.
Detailed Description
It should be noted that the components in the figures may be exaggerated and not necessarily to scale for illustrative purposes. In the figures, identical or functionally identical components are provided with the same reference symbols.
In the present invention, "disposed on" \\8230 "", "disposed over" \8230 "", and "disposed over" \8230 "", do not exclude the presence of an intermediate therebetween, unless otherwise specified. Furthermore, "arranged above or 8230that" on "merely indicates the relative positional relationship between the two components, but in certain cases, for example after reversing the product direction, can also be switched to" arranged below or below "8230, and vice versa.
In the present invention, the embodiments are only intended to illustrate the aspects of the present invention, and should not be construed as limiting.
In the present invention, the terms "a" and "an" do not exclude the presence of a plurality of elements, unless otherwise specified.
It is further noted herein that in embodiments of the present invention, only a portion of the components or assemblies may be shown for clarity and simplicity, but those of ordinary skill in the art will appreciate that, given the teachings of the present invention, required components or assemblies may be added as needed in a particular scenario. Furthermore, features from different embodiments of the invention may be combined with each other, unless otherwise indicated. For example, a feature of the second embodiment may be substituted for a corresponding or functionally equivalent or similar feature of the first embodiment, and the resulting embodiments are likewise within the scope of the disclosure or recitation of the present application.
It is also noted herein that, within the scope of the present invention, the terms "same", "equal", and the like do not mean that two values are absolutely equal, but allow some reasonable error; that is, the terms also encompass "substantially the same" and "substantially equal". By analogy, directional terms such as "perpendicular" and "parallel" also encompass "substantially perpendicular" and "substantially parallel".
The numbering of the steps of the methods of the present invention does not limit the order of execution of the steps of the methods. Unless specifically stated, the method steps may be performed in a different order.
The invention is further elucidated with reference to the drawings in conjunction with the detailed description.
Fig. 1 shows a computer system 100 implementing the method according to the invention. Unless specifically stated otherwise, the method according to the present invention may be performed in the computer system 100 shown in FIG. 1 to achieve the objectives of the present invention, or the present invention may be distributively implemented in a plurality of computer systems 100 according to the present invention through a network, such as a local area network or the Internet. The computer system 100 of the present invention may include various types of computer systems, such as hand-held devices, laptop computers, personal Digital Assistants (PDAs), multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, network servers, tablet computers, and the like.
As shown in FIG. 1, computer system 100 includes a processor 111, a system bus 101, a system memory 102, a video adapter 105, an audio adapter 107, a hard disk drive interface 109, an optical drive interface 113, a network interface 114, and a Universal Serial Bus (USB) interface 112. The system bus 101 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system bus 101 is used for communication between the respective bus devices. Bus devices or interfaces other than those shown in FIG. 1 are also contemplated. The system memory 102 includes a read-only memory (ROM) 103 and a random access memory (RAM) 104, where the ROM 103 may store, for example, basic input/output system (BIOS) data implementing the basic routines for information transfer at start-up, and the RAM 104 provides fast-access operating memory for the system. The computer system 100 further includes the hard disk drive interface 109 for reading from and writing to a hard disk 110, and the optical drive interface 113 for reading from or writing to optical media such as a CD-ROM. The hard disk 110 may store, for example, an operating system and application programs. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computer system 100. The computer system 100 may also include the video adapter 105 for image processing and/or image output, for connecting an output device such as a display 106, and the audio adapter 107 for audio processing and/or audio output, for connecting output devices such as speakers 108. In addition, the computer system 100 may include the network interface 114 for network connections, where the network interface 114 may connect to the Internet 116 through a network device such as a router 115, using a wired or wireless connection. Furthermore, the computer system 100 may include the Universal Serial Bus (USB) interface 112 for connecting peripheral devices, including, for example, a keyboard 117, a mouse 118, and other peripherals such as a microphone and a camera.
When the present invention is implemented on the computer system 100 described in fig. 1, the coloring can be decomposed into palette estimation and pixel allocation, effectively circumventing the challenges of color blur and region homogeneity, and supporting natural diversity and controllable coloring.
Furthermore, embodiments may be provided as a computer program product that may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines performing operations according to embodiments of the present invention. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc read-only memories), and magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
Moreover, embodiments may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of one or more data signals embodied in and/or modulated by a carrier wave or other propagation medium via a communication link (e.g., a modem and/or network connection). Thus, a machine-readable medium as used herein may include, but is not necessarily required to be, such a carrier wave.
Fig. 2 is a schematic flowchart illustrating an image colorization method based on a palette adversarial network according to an embodiment of the present invention. As shown in Fig. 2, the method may include the following steps:
step 201, constructing a color palette countermeasure network (PalGAN), wherein the color palette countermeasure network comprises a palette generator
Figure BDA0003777727550000091
Palette allocation generator
Figure BDA0003777727550000092
And a color discriminator D.
Step 202, inputting a grayscale image L into the palette adversarial network.
Step 203, generating an estimated color map Ĉ by the palette generator G_H and the palette allocation generator G_C.
Step 204, generating an output color image Î from the grayscale image L and the estimated color map Ĉ.
Step 205, judging the authenticity of the output image Î by the color discriminator D.
Fig. 3 shows a schematic diagram of the framework of the palette adversarial network in an embodiment of the invention. As shown in Fig. 3, PalGAN aims at colorizing grayscale images by decomposing colorization into palette prediction and color assignment. Compared with directly learning the pixel-to-pixel mapping from gray to color, as most learning-based methods do, this design not only improves colorization quality but also enables manipulation of global color by adjusting or regularizing the palette.
For PalGAN, the input is a grayscale image (e.g., the luminance channel of a color image) L, and the output is an estimated color map Ĉ (the ab chroma channels), which is combined with L to complement it into a full color image.
The palette generator G_H estimates a global palette probability for the given grayscale image, expressed as Ĥ = G_H(L). A 2D chroma histogram Ĥ over the a and b axes may be used to express the palette probability, modeling the statistics of color information rather than learning deterministic statistics. The palette generator G_H is an encoder network with multiple convolutional layers followed by several multilayer perceptrons (MLPs), ending with a sigmoid function; the convolutional layers extract features, and the MLPs convert the spatial features into a histogram (in vector form). Explicitly representing the palette as a histogram not only makes the global color distribution more predictable, it also allows proper regularization to be introduced.
In the prior art, user-guided colorization has shown that the color histogram of a reference image provides effective guidance for colorizing an image. In contrast, the present invention synthesizes a palette histogram conditioned on the input gray image rather than obtaining it from a user-specified reference image. This makes the method a self-contained, fully automatic colorization system that does not rely on any external guidance (i.e., reference images) to work. In addition, estimating the palette histogram for each particular gray input can provide more accurate and meaningful information to the colorization process than retrieving a reference image.
The palette allocation generator G_C performs color assignment by conditional image generation: given the palette histogram Ĥ and an additional latent code z (sampled from a normal distribution), it generates the corresponding ab map, denoted Ĉ = G_C(L, Ĥ, z). G_C is a convolutional generator consisting of the common residual blocks used in image translation together with a custom Palette Normalization (PN) layer and a Chroma Attention (CA) module. Palette normalization aims at promoting consistency between the generated color channels and the palette guidance Ĥ; it is applied alongside each batch normalization layer. Specifically, the PN layer first normalizes its input features and then applies an affine transformation parameterized by g(Ĥ), where g(·) is a fully connected layer.
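A PN layer of this kind might be sketched as follows; the use of BatchNorm2d as the base normalization and the layer sizes are assumptions:

```python
import torch.nn as nn

class PaletteNorm(nn.Module):
    """Sketch of the PN layer: normalize input features, then apply an affine
    transform whose scale and shift are predicted from the palette by g(.)."""
    def __init__(self, channels, palette_dim):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels, affine=False)
        self.g = nn.Linear(palette_dim, 2 * channels)    # g(H) -> (gamma, beta)

    def forward(self, x, H_hat):
        gamma, beta = self.g(H_hat.flatten(1)).chunk(2, dim=1)
        gamma = gamma[:, :, None, None]                  # broadcast over H and W
        beta = beta[:, :, None, None]
        return self.norm(x) * (1 + gamma) + beta
```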
Fig. 4 shows a block diagram of the chroma attention module in one embodiment of the invention. The chroma attention module explicitly aligns color affinities with their corresponding semantic and low-level feature affinities, effectively mitigating potential color bleeding and semantic misinterpretation.
The chroma attention module incorporates semantic and low-level similarities into the construction of color relationships through its global interaction and local description submodules. Specifically, the input to CA is a high-resolution feature map F (from an intermediate layer of G_C), a high-level feature map S, and the resized gray input L. It outputs two feature maps, F_g from global interaction and F_l from local description, fuses them into a feature map residual, and adds the residual back to the input feature map:

F_g = CA_g(F, S),  F_l = CA_l(F, L)
F′ = f(F_g ⊕ F_l),  F̂ = F + F′

where f(·) is a non-linear fusion operation formed by two successive convolutional layers, ⊕ is a channel-dimension concatenation operation, and CA_g and CA_l denote global interaction and local description, respectively.
In the global interaction submodule, each regional feature point of the input feature map is reconstructed as a weighted sum of the other feature points, with the weights computed from their semantic similarity. Formally,

F_g(p) = Σ_q w_pq F(q)

where p and q denote patches centered at pixel positions p and q within F, and w_pq is computed from the region interactions in the high-level feature map learned from the input grayscale image. The region feature interaction is measured by the normalized cosine similarity between regional features:

w_pq = softmax_q(⟨S_K(p), S_Q(q)⟩ / (‖S_K(p)‖ ‖S_Q(q)‖))

where S denotes the high-level feature map, extracted from an intermediate representation of the encoder, and S_K and S_Q denote two feature maps converted from S using convolutions.
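A sketch of the global interaction operation follows; it works on individual feature points rather than patches for brevity, which is a simplifying assumption:

```python
import torch
import torch.nn.functional as nnF

def global_interaction(feat, S_K, S_Q):
    """Sketch of CA_g: reconstruct each feature point as a weighted sum of all
    others, weighting by normalized cosine similarity of high-level features."""
    B, C, H, W = feat.shape
    k = nnF.normalize(S_K.flatten(2), dim=1)             # (B, Cs, HW), unit norm
    q = nnF.normalize(S_Q.flatten(2), dim=1)
    w = torch.softmax(torch.bmm(k.transpose(1, 2), q), dim=1)  # w[q, p] sums to 1 over q
    F_g = torch.bmm(feat.flatten(2), w)                  # F_g(p) = sum_q w_qp F(q)
    return F_g.view(B, C, H, W)
```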
Although the color variations over textures and edges are subtle, ignoring these subtle differences causes significant visual degradation. To preserve these details, global interaction is supplemented by the local description submodule. Assuming that local color affinities are linearly related to their corresponding intensities, this local relationship can be learned in a guided-filter fashion that preserves edges well. The local description submodule computes a learnable local affine transformation {A, B} that maps the grayscale image L to its corresponding ab feature map, expressed as:

F_l = A ⊙ L↓ + B

where ⊙ is the element-wise multiplication operator and ↓ is a down-sampling operation that matches the spatial size of L to F_l. {A, B} are parameterized by the learnable local correlation between L and F:

A = Ψ(cov(F, L↓) / (var(L↓) + ε)),  B = F̄ − A ⊙ L̄↓

where Ψ is a learnable transform parameterized by a small convolutional network, cov(·,·) computes the local covariance between two feature maps (within a fixed window size), var(·) computes the local variance of a given feature map, F̄ and L̄↓ denote F and L↓ smoothed with a mean filter, respectively, and ε is a small positive number for numerical stability.
To further ensure that the proposed palette allocation generator responds to a given palette, the difference between the palette extracted from the predicted color channels and the palette extracted from the corresponding ground truth may be minimized. However, since the hard binning of common image histograms is not differentiable, the palette histogram is instead treated as a joint distribution over a and b, represented by a weighted sum of kernels.
Formally, the palette histogram is represented as:

Ĥ(a, b) = (1/Z) Σ_x k(C_a(x) − a) k(C_b(x) − b)

where C_a(x) and C_b(x) denote the values of pixel x in the a and b channels, respectively, k is a kernel function measuring the difference between (C_a(x), C_b(x)) and a given bin (a, b), and Z is a normalization factor. An inverse-quadratic kernel can be used:

k(u) = 1 / (1 + (u/σ)²)

where σ controls the smoothness over neighboring bins; empirically, σ = 0.1 works best.
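A differentiable version of this kernel-weighted histogram can be sketched as follows, assuming ab values scaled to [-1, 1] and an illustrative bin count:

```python
import torch

def soft_palette(C_ab, bins=16, sigma=0.1):
    """Soft palette histogram: H(a, b) = (1/Z) sum_x k_a * k_b, with the
    inverse-quadratic kernel k(u) = 1 / (1 + (u / sigma)^2)."""
    B = C_ab.size(0)
    ab = C_ab.view(B, 2, -1)                             # (B, 2, num_pixels)
    centers = torch.linspace(-1, 1, bins, device=C_ab.device)
    k_a = 1.0 / (1.0 + ((ab[:, 0:1] - centers.view(1, -1, 1)) / sigma) ** 2)
    k_b = 1.0 / (1.0 + ((ab[:, 1:2] - centers.view(1, -1, 1)) / sigma) ** 2)
    hist = torch.bmm(k_a, k_b.transpose(1, 2))           # (B, bins, bins)
    return hist / hist.sum(dim=(1, 2), keepdim=True)     # normalize by Z
```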
To diversify the predicted colors, palette regularization may be introduced to combat dull colors caused by color-distribution imbalance. On the one hand, the ab histogram in the form of a probability palette measures the color distribution in both the predicted color map and the ground truth; minimizing their difference explicitly accounts for the different color ratios and avoids collapsing to a few dominant ones. On the other hand, the generated colors can be diversified by increasing the probability of rare colors (statistically rare in the training samples). The entropy of the probability palette can be used to control this diversity. Formally, the entropy of Ĥ is expressed as:

Ent(Ĥ) = −Σ_{a,b} Ĥ(a, b) log Ĥ(a, b)

and maximizing Ent(Ĥ) increases the color diversity of Ĥ.
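As a loss term, the entropy can be computed directly on the soft palette; the small eps for numerical stability is an implementation detail:

```python
def palette_entropy(H_hat, eps=1e-8):
    """Entropy of a probability palette of shape (B, bins, bins); maximizing
    it (minimizing its negative) raises the probability of rare colors."""
    return -(H_hat * (H_hat + eps).log()).sum(dim=(1, 2)).mean()

# usage sketch: loss_rg = -palette_entropy(soft_palette(C_hat))
```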
The color discriminator improves the adversarial training result by incorporating the palette into the discriminator in a conditional-projection manner. The input (the concatenation of the ab map and its converted RGB image) is transformed by a convolutional discriminator D into a one-dimensional feature g ∈ R^{256×1}. This feature is then fused with the palette by an inner product. The likelihood that the input is real is given by:

D(Ĉ, L) = vec(Ĥ)ᵀ W g

where W is a learnable linear projection and Î is the RGB version converted from Ĉ and L.
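A sketch of such a projection discriminator; the trunk layers are assumptions, as is the unconditional head psi, which is standard in projection discriminators but not spelled out above:

```python
import torch
import torch.nn as nn

class ColorDiscriminator(nn.Module):
    """Sketch of D: a conv trunk maps the concatenated (ab, RGB) input to a
    256-d feature g, fused with the palette by an inner product through W."""
    def __init__(self, palette_dim, feat_dim=256):
        super().__init__()
        self.trunk = nn.Sequential(                      # 2 ab + 3 RGB = 5 channels
            nn.Conv2d(5, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, feat_dim, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.W = nn.Linear(palette_dim, feat_dim, bias=False)  # projection W
        self.psi = nn.Linear(feat_dim, 1)                # unconditional head (assumed)

    def forward(self, ab, rgb, H_hat):
        g = self.trunk(torch.cat([ab, rgb], dim=1))      # g in R^256
        proj = (self.W(H_hat.flatten(1)) * g).sum(dim=1, keepdim=True)
        return proj + self.psi(g)                        # realness score
```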
Different optimization objectives are used to train palette estimation and palette assignment. For palette estimation, the objective covers palette reconstruction and regularization:

L_{G_H} = λ_rec1 ‖Ĥ − H‖₁ − λ_rg Ent(Ĥ)

where λ_rec1 and λ_rg balance the two terms and are set to 5.0 and 1.0, respectively.
The optimization objective of palette assignment is formed by pixel-level regression, palette reconstruction, and adversarial training:

L_{G_C} = λ_reg ‖Ĉ − C‖₁ + λ_rec2 ‖Ĥ_Ĉ − H‖₁ + λ_adv L_adv^G

where Ĥ_Ĉ is the palette extracted from Ĉ using the kernel-weighted histogram above. λ_reg, λ_rec2 and λ_adv balance the respective terms; λ_reg and λ_rec2 are set to 5.0 and 1.0.
For the adversarial loss, the hinge version can be adopted. The generator training target is expressed as:

L_adv^G = −E_{L∼P_L}[D(Ĉ, L)]

where Ĉ is generated from L, Î is the RGB version converted from Ĉ and L, and P_L denotes the grayscale image distribution. The optimization objective of the discriminator is expressed as:

L_adv^D = E_{I∼P_I}[max(0, 1 − D(C, L))] + E_{L∼P_L}[max(0, 1 + D(Ĉ, L))]

where P_I denotes the RGB image distribution and C is the chroma map converted from I.
The palette generator G_H and the palette allocation generator G_C may be jointly trained in a progressive manner. Specifically, for an input sample L_i, the palette fed to G_C is

H̃_i = 1(p_h < τ) H_i + 1(p_h ≥ τ) Ĥ_i

where 1(·) is the indicator function, taking the value 1 if its condition is true and 0 otherwise, H_i is the ground-truth palette, Ĥ_i = G_H(L_i) is the estimated palette, and p_h is sampled from the uniform distribution U(0, 1). Training starts from τ = 1, and τ is then reduced linearly to 0 near the end of learning.
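This scheduled sampling of the palette fed to G_C takes only a few lines:

```python
import torch

def palette_for_training(H_gt, H_pred, tau):
    """Use the ground-truth palette with probability tau, else the predicted
    one; tau is annealed linearly from 1 to 0 over training, as above."""
    p_h = torch.rand(())                                 # p_h ~ U(0, 1)
    return H_gt if p_h < tau else H_pred
```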
Spectral normalization and the two time-scale update rule can be used in training (the learning rates of the generator and discriminator are 1e-4 and 4e-4, respectively, to stabilize learning), with an Adam optimizer with β₁ = 0 and β₂ = 0.9. For batch normalization, a synchronized version may be employed. The method is trained on the ImageNet training set for 40 epochs using 8 NVIDIA 2080 Ti GPUs with a batch size of 64. Training images are randomly cropped from resized images to a fixed size (256 × 256) with the aspect ratio unchanged. At test time, images are resized to 256 × 256 for evaluation.
The method and representative prior works can be evaluated on ImageNet and COCO-Stuff. On ImageNet, two evaluation protocols are adopted. One evaluates all methods on ctest10k (10K images), a standard subset of its validation data (50K images), following the protocol of (Larsson G, Maire M, Shakhnarovich G. Learning representations for automatic colorization [C]// European Conference on Computer Vision. Springer, Cham, 2016). The other runs on the complete validation set. For COCO-Stuff, all methods can be tested on its 5K validation images.
The method can be compared with existing learning-based colorization methods, including DeOldify, CIColor, UGColor, video colorization, InstColor, ColTran, and GPColor. Note that InstColor is learned with a pre-trained object detection model (requiring labels and bounding boxes), while GPColor utilizes a BigGAN pre-trained on labeled ImageNet. The other methods, including the present one, use only paired gray-color images for training; the fully automatic version of UGColor without color hints is used.
The colorization results can be evaluated quantitatively with the pixel-level similarity metrics PSNR and SSIM, the image-level perceptual metric LPIPS, and the Fréchet Inception Distance (FID). LPIPS and FID agree better with human evaluation than PSNR and SSIM.
Compared with other methods, the proposed PalGAN does not utilize any annotation or hint on ImageNet, yet leads in the perceptual metrics (FID: 4.60 and 2.78 under the two protocols, and LPIPS). The method also achieves competitive fidelity scores (PSNR and SSIM), showing good color recovery. Given a ground-truth palette, the method delivers impressive fidelity as well as generation performance, indicating its upper-bound performance for reference. Considering the trade-off between fidelity and perceptual quality, it achieves fair results on all benchmarks.
In addition, the colorization results of the method give natural, diverse, and fine-grained color predictions, taking semantic correspondence and local gradient changes into account. Thanks to chroma attention, the method suffers less from common color bleeding than other methods.
Furthermore, the existing methods and the present method can be evaluated by human raters, with colorization tests performed according to the protocols in (Zhang R, Isola P, Efros A. Colorful image colorization [C]// European Conference on Computer Vision. Springer, Cham, 2016: 649-666) and (Kumar M, Weissenborn D, Kalchbrenner N. Colorization transformer [J]. arXiv preprint arXiv:2102.04432, 2021). Specifically, ground-truth color images and their corresponding colorization results (from the present method or others) are shown to 20 participants in random order. The participants must decide, within no more than 2 seconds, which of the two looks more realistic. Each method contributes 40 colorization predictions, with labels randomly selected from the ImageNet ctest10k. The method beats the competitors by a large margin in this test.
Although the method is trained in a self-supervised manner on synthetic pairs, it also handles real-world black-and-white historical photographs well: color boundaries and consistency are well preserved, with good results on objects and portraits. Furthermore, reference-based (or exemplar-based) colorization can be performed by using palettes from reference color images; even when an image palette has no semantic relevance to the input, PalGAN can still adapt the given color distribution to the semantics of the given image, keeping color regions consistent.
In other embodiments of the present invention, chroma attention may be replaced with the commonly used global self-attention. The palette (statistics of the ab channels of the image in Lab color space) may be replaced by other color features, such as a color coherence vector: the color vectors of all pixels are clustered into k classes, and the median or mean of each of the k classes is used as the color feature. The palette generator may be changed from a generic image encoder with progressive down-sampling to an encoder that maintains resolution until the final dimension reduction, or to a U-shaped network structure. The palette allocation generator may likewise be changed from a typical progressively up-sampling image decoder to a U-shaped network structure, where the upsample-then-convolve operations may be replaced by transposed convolutions.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various combinations, modifications, and changes can be made thereto without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention disclosed herein should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (10)

1. An image colorization method based on a palette adversarial network, characterized by comprising the following steps:
constructing a palette adversarial network, wherein the palette adversarial network comprises a palette generator G_H, a palette allocation generator G_C, and a color discriminator D;
inputting a grayscale image L into the palette adversarial network;
generating an estimated color map Ĉ by the palette generator G_H and the palette allocation generator G_C;
generating an output color image Î from the grayscale image L and the estimated color map Ĉ; and
judging the authenticity of the output image Î by the color discriminator D.
2. The image colorization method based on a palette adversarial network of claim 1, characterized in that the palette generator G_H generates a palette histogram Ĥ from the grayscale image L, expressed as:

Ĥ = G_H(L),  Ĥ ∈ [0, 1]^{N_a×N_b},  Σ_{a,b} Ĥ(a, b) = 1

where the palette histogram Ĥ represents the palette probability, and a and b denote the a-axis and b-axis in CIE Lab color space; and

the palette allocation generator G_C generates the a and b values in CIE Lab color space from the palette histogram Ĥ and a latent code z, expressed as:

Ĉ = G_C(L, Ĥ, z)
3. The image colorization method based on a palette adversarial network of claim 2, characterized in that the palette allocation generator G_C comprises residual blocks, a palette normalization layer and a chroma attention module, wherein the palette normalization layer normalizes the input features and applies an affine transformation parameterized by g(Ĥ), where g(·) denotes a fully connected layer.
4. The image colorization method based on a palette adversarial network of claim 3, characterized in that the chroma attention module comprises a global interaction module and a local description module, wherein the chroma attention module takes as input a feature map F, a high-level feature map S and the grayscale image L, the global interaction module performs a global interaction operation to generate a first feature map F_g, the local description module performs a local description operation to generate a second feature map F_l, the first feature map F_g and the second feature map F_l are fused to generate a feature map residual F′, and the feature map residual F′ is added back to the feature map F, expressed as:

F_g = CA_g(F, S),  F_l = CA_l(F, L)
F′ = f(F_g ⊕ F_l),  F̂ = F + F′

where f(·) denotes a non-linear fusion operation, ⊕ denotes a channel-dimension concatenation operation, CA_g denotes the global interaction operation, and CA_l denotes the local description operation.
5. The image colorization method based on a palette adversarial network of claim 4, characterized in that the global interaction operation comprises reconstructing each regional feature point in the feature map F as a weighted sum of the other regional feature points, with the local weights computed from the semantic similarity between regional feature points, expressed as:

F_g(p) = Σ_q w_pq F(q),  w_pq = softmax_q(⟨S_K(p), S_Q(q)⟩ / (‖S_K(p)‖ ‖S_Q(q)‖))

where p and q denote patches centered at pixel positions p and q within the feature map F, and S_K and S_Q denote feature maps transformed from the high-level feature map S using convolutions; and

the local description operation comprises mapping the grayscale image L to the corresponding ab feature map by a local affine transformation {A, B}, expressed as:

F_l = A ⊙ L↓ + B
A = Ψ(cov(F, L↓) / (var(L↓) + ε)),  B = F̄ − A ⊙ L̄↓

where ⊙ denotes the element-wise multiplication operator, ↓ denotes a down-sampling operation, Ψ denotes a learnable transform, cov(·,·) denotes the local covariance, var(·) denotes the local variance of a given feature map, F̄ and L̄ denote F and L smoothed with a mean filter, respectively, and ε denotes a small positive parameter.
6. The image colorization method based on a palette adversarial network of claim 5, characterized in that palette optimization is carried out, wherein the palette histogram Ĥ is expressed as a kernel-weighted sum:

Ĥ(a, b) = (1/Z) Σ_x k(C_a(x) − a) k(C_b(x) − b),  k(u) = 1 / (1 + (u/σ)²)

where C_a(x) and C_b(x) denote the values of pixel x in the a and b channels, respectively, k denotes the kernel function, Z denotes a normalization factor, and σ denotes a parameter controlling the smoothness over neighboring bins; and

palette regularization is carried out, wherein the entropy Ent(Ĥ) of the palette histogram Ĥ is maximized to increase color diversity, expressed as:

Ent(Ĥ) = −Σ_{a,b} Ĥ(a, b) log Ĥ(a, b)
7. The image colorization method based on a palette adversarial network of claim 6, characterized in that the color discriminator D converts the output image Î into a one-dimensional feature g ∈ R^{256×1}, fuses the one-dimensional feature with the palette by an inner product, and computes the authenticity of the output image Î, expressed as:

D(Ĉ, L) = vec(Ĥ)ᵀ W g,  g = φ([Ĉ, Î])

where W denotes a learnable linear projection and φ denotes the convolutional feature extractor of the discriminator.
8. The image colorization method based on a palette adversarial network of claim 7, characterized in that the palette generator G_H and the palette allocation generator G_C are trained, wherein the optimization objective for training the palette generator G_H is expressed as:

L_{G_H} = λ_rec1 ‖Ĥ − H‖₁ − λ_rg Ent(Ĥ)

where the reconstruction term represents the learning of palette reconstruction, the regularization term represents the learning of palette regularization, H denotes the ground-truth palette histogram, and λ_rec1 and λ_rg denote balance parameters; and

the optimization objective for training the palette allocation generator G_C is expressed as:

L_{G_C} = λ_reg ‖Ĉ − C‖₁ + λ_rec2 ‖Ĥ_Ĉ − H‖₁ + λ_adv L_adv^G
L_adv^G = −E_{L∼P_L}[D(Ĉ, L)]
L_adv^D = E_{I∼P_I}[max(0, 1 − D(C, L))] + E_{L∼P_L}[max(0, 1 + D(Ĉ, L))]

where the regression term is used for learning pixel-level regression, the reconstruction term is used for learning palette reconstruction, and the adversarial term is used for learning adversarial training; Ĥ_Ĉ denotes the palette histogram extracted from the estimated color map Ĉ by the kernel-weighted sum of claim 6; C denotes the ground-truth color map; λ_reg, λ_rec2 and λ_adv denote balance parameters; L_adv^G denotes the generator training objective of the adversarial loss, and L_adv^D denotes the discriminator training objective of the adversarial loss, where P_I denotes the RGB image distribution and P_L denotes the grayscale image distribution.
9. The image colorization method based on a palette adversarial network of claim 8, characterized in that the palette generator G_H and the palette allocation generator G_C are jointly trained in a progressive manner.
10. A computer system, comprising:
a processor configured to execute machine-executable instructions; and
a memory having stored thereon machine-executable instructions which, when executed by the processor, perform the steps of the method according to any one of claims 1 to 9.
CN202210924523.XA 2022-08-02 2022-08-02 Image colorization method based on a palette adversarial network Pending CN115272527A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210924523.XA CN115272527A (en) 2022-08-02 Image colorization method based on a palette adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210924523.XA CN115272527A (en) 2022-08-02 Image colorization method based on a palette adversarial network

Publications (1)

Publication Number Publication Date
CN115272527A true CN115272527A (en) 2022-11-01

Family

ID=83746591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210924523.XA Pending CN115272527A (en) Image colorization method based on a palette adversarial network

Country Status (1)

Country Link
CN (1) CN115272527A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3142024A1 (en) * 2022-11-14 2024-05-17 Lynred METHOD FOR COLORING AN INFRARED IMAGE

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107730568A (en) * 2017-10-31 2018-02-23 Shandong Normal University Colorization method and device based on weight learning
CN109859288A (en) * 2018-12-25 2019-06-07 Beijing Feisou Technology Co., Ltd. Image colorization method and device based on generative adversarial network
WO2019153741A1 (en) * 2018-02-07 2019-08-15 BOE Technology Group Co., Ltd. Image coloring method and apparatus
CN111524205A (en) * 2020-04-23 2020-08-11 Beijing Information Science and Technology University Image colorization method and device based on cycle generative adversarial network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107730568A (en) * 2017-10-31 2018-02-23 Shandong Normal University Colorization method and device based on weight learning
WO2019153741A1 (en) * 2018-02-07 2019-08-15 BOE Technology Group Co., Ltd. Image coloring method and apparatus
CN109859288A (en) * 2018-12-25 2019-06-07 Beijing Feisou Technology Co., Ltd. Image colorization method and device based on generative adversarial network
CN111524205A (en) * 2020-04-23 2020-08-11 Beijing Information Science and Technology University Image colorization method and device based on cycle generative adversarial network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3142024A1 (en) * 2022-11-14 2024-05-17 Lynred METHOD FOR COLORING AN INFRARED IMAGE
WO2024105312A1 (en) * 2022-11-14 2024-05-23 Lynred Method for colorizing an infrared image

Similar Documents

Publication Publication Date Title
Li et al. PDR-Net: Perception-inspired single image dehazing network with refinement
Fu et al. Uncertainty inspired underwater image enhancement
Lim et al. DSLR: Deep stacked Laplacian restorer for low-light image enhancement
CN107818554B (en) Information processing apparatus and information processing method
Bellavia et al. Dissecting and reassembling color correction algorithms for image stitching
Kolesnikov et al. PixelCNN models with auxiliary variables for natural image modeling
CN112581370A (en) Training and reconstruction method of super-resolution reconstruction model of face image
CN112132739A (en) 3D reconstruction and human face posture normalization method, device, storage medium and equipment
Saleh et al. Adaptive uncertainty distribution in deep learning for unsupervised underwater image enhancement
Hu et al. Face hallucination from low quality images using definition-scalable inference
Wang et al. PalGAN: Image colorization with palette generative adversarial networks
Zheng et al. Truncated low-rank and total p variation constrained color image completion and its moreau approximation algorithm
CN114444565A (en) Image tampering detection method, terminal device and storage medium
Ahmed et al. PIQI: perceptual image quality index based on ensemble of Gaussian process regression
Liu et al. Hallucinating color face image by learning graph representation in quaternion space
Chaurasiya et al. Deep dilated CNN based image denoising
CN112651333A (en) Silence living body detection method and device, terminal equipment and storage medium
Yang et al. Blind image quality assessment of natural distorted image based on generative adversarial networks
Xu et al. Generative image completion with image-to-image translation
Liu et al. Attentive semantic and perceptual faces completion using self-attention generative adversarial networks
CN115272527A (en) Image colorization method based on a palette adversarial network
Bugeau et al. Influence of color spaces for deep learning image colorization
Pajot et al. Unsupervised adversarial image inpainting
Chen et al. Face super resolution based on parent patch prior for VLQ scenarios
Prodan et al. Comprehensive evaluation of metrics for image resemblance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination