CN115272527A - Image colorization method based on a palette adversarial network - Google Patents

Image colorization method based on a palette adversarial network

Info

Publication number
CN115272527A
CN115272527A (application CN202210924523.XA)
Authority
CN
China
Prior art keywords
palette
color
image
generator
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210924523.XA
Other languages
Chinese (zh)
Inventor
Wang Yi (王毅)
Qiao Yu (乔宇)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai AI Innovation Center
Original Assignee
Shanghai AI Innovation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai AI Innovation Center filed Critical Shanghai AI Innovation Center
Priority to CN202210924523.XA priority Critical patent/CN115272527A/en
Publication of CN115272527A publication Critical patent/CN115272527A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G06T 11/40 - Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/56 - Extraction of image or video features relating to colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image colorization, and provides an image colorization method based on a palette adversarial network, comprising the following steps: constructing a palette adversarial network, wherein the palette adversarial network comprises a palette generator G_H, a palette allocation generator G_C, and a color discriminator D; inputting a grayscale image L into the palette adversarial network; generating an estimated color map Ĉ by the palette generator G_H and the palette allocation generator G_C; generating an output image Î from the grayscale image L and the estimated color map Ĉ; and judging the authenticity of the output image Î by the color discriminator D.

Description

Image colorization method based on a palette adversarial network
Technical Field
The present invention relates generally to the field of image colorization. In particular, the invention relates to an image colorization method based on a palette generative adversarial network (PalGAN).
Background
Image colorization refers to predicting the missing color information of a grayscale image, and is widely used in old-photo restoration and other visual editing applications. In addition, since image colorization depends heavily on scene understanding, it also serves as a pretext task for self-supervised learning. In the colorization task, even with ground-truth color available for supervision, predicting pixel colors from a grayscale image remains very challenging, since one gray input may correspond to multiple plausible color variants.
The following describes an existing image coloring method:
User-guided colorization methods include reference-guided colorization, in which color statistics are transferred from a reference image to the given grayscale image; deep-learning variants introduce semantic consistency in a neural feature space. When the reference image and the input grayscale image share similar semantics, reference-guided colorization performs well, but its applicability is limited by the quality of reference retrieval, a drawback that is especially apparent for complex scenes. Besides reference-image guidance, user-guided methods also include colorization guided by local color hints and by language. Methods guided by local color hints require the user to provide sufficient local color cues, for example in the form of scribbles, and propagate the given colors according to the local affinities of those cues. Language-guided methods specify by text which colors to use and how the colors are distributed.
Learning-based colorization methods produce color images from gray inputs by learning a pixel-to-pixel mapping, converting color images to grayscale to build training pairs from large-scale datasets in a self-supervised manner. Among them, Iizuka et al. (Iizuka S, Simo-Serra E, Ishikawa H. Let there be color!: Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification [J]. ACM Transactions on Graphics (TOG), 2016, 35(4): 1-11) propose using image-level labels to associate predicted colors with global semantics via global and local convolutional neural networks. Larsson et al. (Larsson G, Maire M, Shakhnarovich G. Learning representations for automatic colorization [C]// European Conference on Computer Vision. Springer, Cham, 2016) learn deep representations that predict per-pixel color distributions. In addition, extra input hints can be integrated into a learning system by simulation, providing both automatic and semi-automatic colorization; models can also build on the expressiveness of non-local modeling and the Transformer architecture.
Among methods that exploit additional priors from pre-trained models, Su et al. (Su J W, Chu H K, Huang J B. Instance-aware image colorization [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 7968-7977) propose using an off-the-shelf detector with instance-level annotations (e.g., instance bounding boxes and classes), so that the colorization model can focus on color rendering without having to recognize high-level semantics. Beyond pre-trained discriminative models, pre-trained generative models can also improve the diversity of colorization. Wu et al. (Wu Y, Wang X, Li Y, et al. Towards vivid and diverse image colorization with generative color prior [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 14377-14386) propose exploiting the color prior of a pre-trained BigGAN to help a deep model produce diverse color results: an additional encoder projects the given grayscale image into a latent code, a color image is estimated from BigGAN, and this preliminary prediction is further refined using intermediate BigGAN features. Afifi et al. (Afifi M, Brubaker M A, Brown M S. HistoGAN: Controlling colors of GAN-generated and real images via color histograms [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 7941-7950) use a pre-trained StyleGAN for image recoloring, where color is controlled by histogram features.
However, the prior art still has the following problems. It typically formulates colorization as a pixel-level regression task, which suffers from the multi-modal nature of color. For example, deep models trained end-to-end on extensive data learn color priors such as the green of vegetation or human skin tones, but these methods tend to predict an averaged brownish color for objects with inherent color ambiguity (e.g., human clothing, cars, and other artifacts). To address this multi-modality, prior work formulates color prediction as pixel-level color classification, assigning each pixel multiple colors according to posterior probabilities; however, such methods suffer from regional color inconsistency due to the independent pixel-wise sampling mechanism. While the sampling problem can be partially alleviated by sequential modeling, the one-way sequential dependency over flattened two-dimensional pixels leads to error accumulation and hinders learning efficiency.
Disclosure of Invention
To at least partially solve the above problems in the prior art, the present invention provides an image colorization method based on a palette adversarial network, comprising the following steps:

constructing a palette adversarial network, wherein the palette adversarial network comprises a palette generator G_H, a palette allocation generator G_C, and a color discriminator D;

inputting a grayscale image L into the palette adversarial network;

generating an estimated color map Ĉ by the palette generator G_H and the palette allocation generator G_C;

generating an output image Î from the grayscale image L and the estimated color map Ĉ; and

judging the authenticity of the output image Î by the color discriminator D.
In one embodiment of the invention, it is provided that the palette generator G_H generates a palette histogram Ĥ from the grayscale image L, expressed as:

Ĥ = G_H(L),  Ĥ ∈ [0, 1]^{N_a×N_b},  Σ_{a,b} Ĥ(a, b) = 1

where the palette histogram Ĥ represents the palette probability, and a and b denote the a-axis and b-axis in CIE Lab color space; and

the palette allocation generator G_C generates the a and b values in CIE Lab color space from the palette histogram Ĥ and a latent code z, expressed as:

Ĉ = G_C(L, Ĥ, z)
in one embodiment of the invention it is provided that the palette allocation generator
Figure BDA00037777275500000320
Comprises a residual block, a palette normalization layer, and a chroma attention moduleThe palette normalization layer normalizes and normalizes input features
Figure BDA00037777275500000321
Parameterized affine transformation, g (-) represents the fully connected layer.
In one embodiment of the invention, it is provided that the chroma attention module comprises a global interaction module and a local description module, wherein the chroma attention module takes as input a feature map F, a high-level feature map S and the grayscale image L; the global interaction module performs a global interaction operation to generate a first feature map F_g, the local description module performs a local description operation to generate a second feature map F_l, the first feature map F_g and the second feature map F_l are fused to generate a feature map residual F′, and the feature map residual F′ is added back to the feature map F, expressed as:

F_g = CA_g(F, S),  F_l = CA_l(F, L)
F′ = f(F_g ⊕ F_l),  F̂ = F + F′

where f(·) denotes a non-linear fusion operation, ⊕ denotes a channel-dimension concatenation operation, CA_g denotes the global interaction operation, and CA_l denotes the local description operation.
In one embodiment of the present invention, it is provided that the global interaction operation comprises reconstructing each regional feature point in the feature map F as a weighted sum of the other regional feature points, with the local weights computed from the semantic similarity between regional feature points, expressed as:

F_g(p) = Σ_q w_pq F(q),  w_pq = softmax_q(⟨S_K(p), S_Q(q)⟩ / (‖S_K(p)‖ ‖S_Q(q)‖))

where p and q denote patches centered at pixel positions p and q within the feature map F, and S_K and S_Q denote feature maps transformed from the high-level feature map S using convolutions; and

the local description operation comprises mapping the grayscale image L to the corresponding ab feature map by a local affine transformation {A, B}, expressed as:

F_l = A ⊙ L↓ + B
A = Ψ(cov(F, L↓) / (var(L↓) + ε)),  B = F̄ − A ⊙ L̄↓

where ⊙ denotes the element-wise multiplication operator, ↓ denotes a down-sampling operation, Ψ denotes a learnable transform, cov(·,·) denotes the local covariance, var(·) denotes the local variance of a given feature map, F̄ and L̄ denote F and L smoothed with a mean filter, respectively, and ε denotes a small positive parameter.
In one embodiment of the invention, it is provided that palette optimization is carried out, wherein the palette histogram Ĥ is expressed as a kernel-weighted sum:

Ĥ(a, b) = (1/Z) Σ_x k(C_a(x) − a) k(C_b(x) − b),  k(u) = 1 / (1 + (u/σ)²)

where C_a(x) and C_b(x) denote the values of pixel x in the a and b channels, respectively, k denotes the kernel function, Z denotes a normalization factor, and σ denotes a parameter controlling the smoothness over neighboring bins; and

palette regularization is carried out, wherein the entropy Ent(Ĥ) of the palette histogram Ĥ is maximized to increase color diversity, expressed as:

Ent(Ĥ) = −Σ_{a,b} Ĥ(a, b) log Ĥ(a, b)
In one embodiment of the invention, it is provided that the color discriminator D converts the output image Î into a one-dimensional feature g ∈ R^{256×1}, fuses the one-dimensional feature with the palette by an inner product, and computes the authenticity of the output image Î, expressed as:

D(Ĉ, L) = vec(Ĥ)ᵀ W g,  g = φ([Ĉ, Î])

where W denotes a learnable linear projection and φ denotes the convolutional feature extractor of the discriminator.
In one embodiment of the invention, it is provided that the palette generator G_H and the palette allocation generator G_C are trained, wherein the optimization objective for training the palette generator G_H is expressed as:

L_{G_H} = λ_rec1 ‖Ĥ − H‖₁ − λ_rg Ent(Ĥ)

where the reconstruction term represents the learning of palette reconstruction, the regularization term represents the learning of palette regularization, H denotes the ground-truth palette histogram, and λ_rec1 and λ_rg denote balance parameters; and

the optimization objective for training the palette allocation generator G_C is expressed as:

L_{G_C} = λ_reg ‖Ĉ − C‖₁ + λ_rec2 ‖Ĥ_Ĉ − H‖₁ + λ_adv L_adv^G
L_adv^G = −E_{L∼P_L}[D(Ĉ, L)]
L_adv^D = E_{I∼P_I}[max(0, 1 − D(C, L))] + E_{L∼P_L}[max(0, 1 + D(Ĉ, L))]

where the regression term is used for learning pixel-level regression, the reconstruction term is used for learning palette reconstruction, and the adversarial term is used for learning adversarial training; Ĥ_Ĉ denotes the palette histogram extracted from the estimated color map Ĉ by the kernel-weighted sum above; C denotes the ground-truth color map; λ_reg, λ_rec2 and λ_adv denote balance parameters; L_adv^G denotes the generator training objective of the adversarial loss, and L_adv^D denotes the discriminator training objective of the adversarial loss, where P_I denotes the RGB image distribution and P_L denotes the grayscale image distribution.
In one embodiment of the invention, it is provided that the palette generator G_H and the palette allocation generator G_C are jointly trained in a progressive manner.
The present invention also provides a computer system, comprising:
a processor configured to execute machine-executable instructions; and
a memory having stored thereon machine executable instructions which, when executed by the processor, perform steps according to the method.
The invention has at least the following beneficial effects. The invention provides an image colorization method based on a palette adversarial network in which colorization is decomposed into palette estimation and pixel assignment, which effectively avoids the challenges of color ambiguity and regional homogeneity and supports naturally diverse and controllable colorization. For color affinity, which is little studied in the prior art, the invention provides a chroma attention module that considers semantic and local-detail correspondences and applies these correlations to color generation, effectively reducing color bleeding. In addition, the invention enhances the fidelity and realism of colorization through palette normalization, and further promotes the diversity of the generated colors.
Drawings
To further clarify the advantages and features that may be present in various embodiments of the present invention, a more particular description of various embodiments of the invention will be rendered by reference to the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. In the drawings, the same or corresponding parts will be denoted by the same or similar reference numerals for clarity.
Fig. 1 shows a computer system implementing the method according to the invention.
Fig. 2 shows an image colorization method based on a palette adversarial network in an embodiment of the present invention.
Fig. 3 shows a schematic diagram of the framework of the palette adversarial network in an embodiment of the invention.
Fig. 4 shows a block diagram of the chroma attention module in one embodiment of the invention.
Detailed Description
It should be noted that the components in the figures may be exaggerated and not necessarily to scale for illustrative purposes. In the figures, identical or functionally identical components are provided with the same reference symbols.
In the present invention, "disposed on" \\8230 "", "disposed over" \8230 "", and "disposed over" \8230 "", do not exclude the presence of an intermediate therebetween, unless otherwise specified. Furthermore, "arranged above or 8230that" on "merely indicates the relative positional relationship between the two components, but in certain cases, for example after reversing the product direction, can also be switched to" arranged below or below "8230, and vice versa.
In the present invention, the embodiments are only intended to illustrate the aspects of the present invention, and should not be construed as limiting.
In the present invention, the terms "a" and "an" do not exclude the presence of a plurality of elements, unless otherwise specified.
It is further noted herein that in embodiments of the present invention, only a portion of the components or assemblies may be shown for clarity and simplicity, but those of ordinary skill in the art will appreciate that, given the teachings of the present invention, required components or assemblies may be added as needed in a particular scenario. Furthermore, features from different embodiments of the invention may be combined with each other, unless otherwise indicated. For example, a feature of the second embodiment may be substituted for a corresponding or functionally equivalent or similar feature of the first embodiment, and the resulting embodiments are likewise within the scope of the disclosure or recitation of the present application.
It is also noted herein that, within the scope of the present invention, the terms "same", "equal", and the like do not mean that two values are absolutely equal, but allow some reasonable error; that is, the terms also encompass "substantially the same" and "substantially equal". By analogy, directional terms such as "perpendicular" and "parallel" also encompass "substantially perpendicular" and "substantially parallel".
The numbering of the steps of the methods of the present invention does not limit the order of execution of the steps of the methods. Unless specifically stated, the method steps may be performed in a different order.
The invention is further elucidated with reference to the drawings in conjunction with the detailed description.
Fig. 1 shows a computer system 100 implementing the method according to the invention. Unless specifically stated otherwise, the method according to the present invention may be performed in the computer system 100 shown in FIG. 1 to achieve the objectives of the present invention, or the present invention may be distributively implemented in a plurality of computer systems 100 according to the present invention through a network, such as a local area network or the Internet. The computer system 100 of the present invention may include various types of computer systems, such as hand-held devices, laptop computers, personal Digital Assistants (PDAs), multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, network servers, tablet computers, and the like.
As shown in FIG. 1, computer system 100 includes a processor 111, a system bus 101, a system memory 102, a video adapter 105, an audio adapter 107, a hard disk drive interface 109, an optical drive interface 113, a network interface 114, and a Universal Serial Bus (USB) interface 112. The system bus 101 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system bus 101 is used for communication between the respective bus devices. Bus devices or interfaces other than those shown in FIG. 1 are also contemplated. The system memory 102 includes a read-only memory (ROM) 103 and a random access memory (RAM) 104, where the ROM 103 may store, for example, basic input/output system (BIOS) data implementing the basic routines for information transfer at start-up, and the RAM 104 provides fast-access operating memory for the system. The computer system 100 further includes the hard disk drive interface 109 for reading from and writing to a hard disk 110, and the optical drive interface 113 for reading from or writing to optical media such as a CD-ROM. The hard disk 110 may store, for example, an operating system and application programs. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computer system 100. The computer system 100 may also include the video adapter 105 for image processing and/or image output, for connecting an output device such as a display 106, and the audio adapter 107 for audio processing and/or audio output, for connecting output devices such as speakers 108. In addition, the computer system 100 may include the network interface 114 for network connections, where the network interface 114 may connect to the Internet 116 through a network device such as a router 115, using a wired or wireless connection. Furthermore, the computer system 100 may include the Universal Serial Bus (USB) interface 112 for connecting peripheral devices, including, for example, a keyboard 117, a mouse 118, and other peripherals such as a microphone and a camera.
When the present invention is implemented on the computer system 100 described in fig. 1, the coloring can be decomposed into palette estimation and pixel allocation, effectively circumventing the challenges of color blur and region homogeneity, and supporting natural diversity and controllable coloring.
Furthermore, embodiments may be provided as a computer program product that may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines performing operations according to embodiments of the present invention. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc read-only memories), and magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
Moreover, embodiments may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of one or more data signals embodied in and/or modulated by a carrier wave or other propagation medium via a communication link (e.g., a modem and/or network connection). Thus, a machine-readable medium as used herein may include, but is not necessarily required to be, such a carrier wave.
Fig. 2 is a schematic flowchart illustrating an image colorization method based on a palette adversarial network according to an embodiment of the present invention. As shown in Fig. 2, the method may include the following steps:
step 201, constructing a color palette countermeasure network (PalGAN), wherein the color palette countermeasure network comprises a palette generator
Figure BDA0003777727550000091
Palette allocation generator
Figure BDA0003777727550000092
And a color discriminator D.
Step 202, inputting a grayscale image L into the palette adversarial network.
Step 203, generating an estimated color map Ĉ by the palette generator G_H and the palette allocation generator G_C.
Step 204, generating an output color image Î from the grayscale image L and the estimated color map Ĉ.
Step 205, judging the authenticity of the output image Î by the color discriminator D.
Fig. 3 shows a schematic diagram of the framework of the palette adversarial network in an embodiment of the invention. As shown in Fig. 3, PalGAN aims at colorizing grayscale images by decomposing colorization into palette prediction and color assignment. Compared with directly learning the pixel-to-pixel mapping from gray to color, as most learning-based methods do, this design not only improves colorization quality but also enables manipulation of global color by adjusting or regularizing the palette.
For PalGAN, the input is a grayscale image (e.g., the luminance channel of a color image) L, and the output is an estimated color map Ĉ (the ab chroma channels), which is combined with L to complement it into a full color image.
The palette generator G_H estimates a global palette probability for the given grayscale image, expressed as Ĥ = G_H(L). A 2D chroma histogram Ĥ over the a and b axes may be used to express the palette probability, modeling the statistics of color information rather than learning deterministic statistics. The palette generator G_H is an encoder network with multiple convolutional layers followed by several multilayer perceptrons (MLPs), ending with a sigmoid function; the convolutional layers extract features, and the MLPs convert the spatial features into a histogram (in vector form). Explicitly representing the palette as a histogram not only makes the global color distribution more predictable, it also allows proper regularization to be introduced.
In the prior art, user-guided colorization has shown that the color histogram of a reference image provides effective guidance for colorizing an image. In contrast, the present invention synthesizes a palette histogram conditioned on the input gray image rather than obtaining it from a user-specified reference image. This makes the method a self-contained, fully automatic colorization system that does not rely on any external guidance (i.e., reference images) to work. In addition, estimating the palette histogram for each particular gray input can provide more accurate and meaningful information to the colorization process than retrieving a reference image.
The palette allocation generator G_C performs color assignment by conditional image generation: given the palette histogram Ĥ and an additional latent code z (sampled from a normal distribution), it generates the corresponding ab map, denoted Ĉ = G_C(L, Ĥ, z). G_C is a convolutional generator consisting of the common residual blocks used in image translation together with a custom Palette Normalization (PN) layer and a Chroma Attention (CA) module. Palette normalization aims at promoting consistency between the generated color channels and the palette guidance Ĥ; it is applied alongside each batch normalization layer. Specifically, the PN layer first normalizes its input features and then applies an affine transformation parameterized by g(Ĥ), where g(·) is a fully connected layer.
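A PN layer of this kind might be sketched as follows; the use of BatchNorm2d as the base normalization and the layer sizes are assumptions:

```python
import torch.nn as nn

class PaletteNorm(nn.Module):
    """Sketch of the PN layer: normalize input features, then apply an affine
    transform whose scale and shift are predicted from the palette by g(.)."""
    def __init__(self, channels, palette_dim):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels, affine=False)
        self.g = nn.Linear(palette_dim, 2 * channels)    # g(H) -> (gamma, beta)

    def forward(self, x, H_hat):
        gamma, beta = self.g(H_hat.flatten(1)).chunk(2, dim=1)
        gamma = gamma[:, :, None, None]                  # broadcast over H and W
        beta = beta[:, :, None, None]
        return self.norm(x) * (1 + gamma) + beta
```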
Fig. 4 shows a block diagram of the chroma attention module in one embodiment of the invention. The chroma attention module explicitly aligns color affinities with their corresponding semantic and low-level feature affinities, effectively mitigating potential color bleeding and semantic misinterpretation.
The chroma attention module incorporates semantic and low-level similarities into the construction of color relationships through its global interaction and local description submodules. Specifically, the input to CA is a high-resolution feature map F (from an intermediate layer of G_C), a high-level feature map S, and the resized gray input L. It outputs two feature maps, F_g from global interaction and F_l from local description, fuses them into a feature map residual, and adds the residual back to the input feature map:

F_g = CA_g(F, S),  F_l = CA_l(F, L)
F′ = f(F_g ⊕ F_l),  F̂ = F + F′

where f(·) is a non-linear fusion operation formed by two successive convolutional layers, ⊕ is a channel-dimension concatenation operation, and CA_g and CA_l denote global interaction and local description, respectively.
In the global interaction submodule, each regional feature point of the input feature map is reconstructed as a weighted sum of the other feature points, with the weights computed from their semantic similarity. Formally,

F_g(p) = Σ_q w_pq F(q)

where p and q denote patches centered at pixel positions p and q within F, and w_pq is computed from the region interactions in the high-level feature map learned from the input grayscale image. The region feature interaction is measured by the normalized cosine similarity between regional features:

w_pq = softmax_q(⟨S_K(p), S_Q(q)⟩ / (‖S_K(p)‖ ‖S_Q(q)‖))

where S denotes the high-level feature map, extracted from an intermediate representation of the encoder, and S_K and S_Q denote two feature maps converted from S using convolutions.
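A sketch of the global interaction operation follows; it works on individual feature points rather than patches for brevity, which is a simplifying assumption:

```python
import torch
import torch.nn.functional as nnF

def global_interaction(feat, S_K, S_Q):
    """Sketch of CA_g: reconstruct each feature point as a weighted sum of all
    others, weighting by normalized cosine similarity of high-level features."""
    B, C, H, W = feat.shape
    k = nnF.normalize(S_K.flatten(2), dim=1)             # (B, Cs, HW), unit norm
    q = nnF.normalize(S_Q.flatten(2), dim=1)
    w = torch.softmax(torch.bmm(k.transpose(1, 2), q), dim=1)  # w[q, p] sums to 1 over q
    F_g = torch.bmm(feat.flatten(2), w)                  # F_g(p) = sum_q w_qp F(q)
    return F_g.view(B, C, H, W)
```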
Although the color variations over textures and edges are subtle, ignoring these subtle differences causes significant visual degradation. To preserve these details, global interaction is supplemented by the local description submodule. Assuming that local color affinities are linearly related to their corresponding intensities, this local relationship can be learned in a guided-filter fashion that preserves edges well. The local description submodule computes a learnable local affine transformation {A, B} that maps the grayscale image L to its corresponding ab feature map, expressed as:

F_l = A ⊙ L↓ + B

where ⊙ is the element-wise multiplication operator and ↓ is a down-sampling operation that matches the spatial size of L to F_l. {A, B} are parameterized by the learnable local correlation between L and F:

A = Ψ(cov(F, L↓) / (var(L↓) + ε)),  B = F̄ − A ⊙ L̄↓

where Ψ is a learnable transform parameterized by a small convolutional network, cov(·,·) computes the local covariance between two feature maps (within a fixed window size), var(·) computes the local variance of a given feature map, F̄ and L̄↓ denote F and L↓ smoothed with a mean filter, respectively, and ε is a small positive number for numerical stability.
To further ensure that the proposed palette allocation generator responds to a given palette, the difference between the palette extracted from the predicted color channels and the palette extracted from the corresponding ground truth may be minimized. However, since the hard binning of common image histograms is not differentiable, the palette histogram is instead treated as a joint distribution over a and b, represented by a weighted sum of kernels.
Formally, the palette histogram is represented as:

Ĥ(a, b) = (1/Z) Σ_x k(C_a(x) − a) k(C_b(x) − b)

where C_a(x) and C_b(x) denote the values of pixel x in the a and b channels, respectively, k is a kernel function measuring the difference between (C_a(x), C_b(x)) and a given bin (a, b), and Z is a normalization factor. An inverse-quadratic kernel can be used:

k(u) = 1 / (1 + (u/σ)²)

where σ controls the smoothness over neighboring bins; empirically, σ = 0.1 works best.
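A differentiable version of this kernel-weighted histogram can be sketched as follows, assuming ab values scaled to [-1, 1] and an illustrative bin count:

```python
import torch

def soft_palette(C_ab, bins=16, sigma=0.1):
    """Soft palette histogram: H(a, b) = (1/Z) sum_x k_a * k_b, with the
    inverse-quadratic kernel k(u) = 1 / (1 + (u / sigma)^2)."""
    B = C_ab.size(0)
    ab = C_ab.view(B, 2, -1)                             # (B, 2, num_pixels)
    centers = torch.linspace(-1, 1, bins, device=C_ab.device)
    k_a = 1.0 / (1.0 + ((ab[:, 0:1] - centers.view(1, -1, 1)) / sigma) ** 2)
    k_b = 1.0 / (1.0 + ((ab[:, 1:2] - centers.view(1, -1, 1)) / sigma) ** 2)
    hist = torch.bmm(k_a, k_b.transpose(1, 2))           # (B, bins, bins)
    return hist / hist.sum(dim=(1, 2), keepdim=True)     # normalize by Z
```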
To diversify the predicted colors, palette regularization may be introduced to combat dull colors caused by color-distribution imbalance. On the one hand, the ab histogram in the form of a probability palette measures the color distribution in both the predicted color map and the ground truth; minimizing their difference explicitly accounts for the different color ratios and avoids collapsing to a few dominant ones. On the other hand, the generated colors can be diversified by increasing the probability of rare colors (statistically rare in the training samples). The entropy of the probability palette can be used to control this diversity. Formally, the entropy of Ĥ is expressed as:

Ent(Ĥ) = −Σ_{a,b} Ĥ(a, b) log Ĥ(a, b)

and maximizing Ent(Ĥ) increases the color diversity of Ĥ.
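As a loss term, the entropy can be computed directly on the soft palette; the small eps for numerical stability is an implementation detail:

```python
def palette_entropy(H_hat, eps=1e-8):
    """Entropy of a probability palette of shape (B, bins, bins); maximizing
    it (minimizing its negative) raises the probability of rare colors."""
    return -(H_hat * (H_hat + eps).log()).sum(dim=(1, 2)).mean()

# usage sketch: loss_rg = -palette_entropy(soft_palette(C_hat))
```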
The color discriminator improves the adversarial training result by incorporating the palette into the discriminator in a conditional-projection manner. The input (the concatenation of the ab map and its converted RGB image) is transformed by a convolutional discriminator D into a one-dimensional feature g ∈ R^{256×1}. This feature is then fused with the palette by an inner product. The likelihood that the input is real is given by:

D(Ĉ, L) = vec(Ĥ)ᵀ W g

where W is a learnable linear projection and Î is the RGB version converted from Ĉ and L.
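A sketch of such a projection discriminator; the trunk layers are assumptions, as is the unconditional head psi, which is standard in projection discriminators but not spelled out above:

```python
import torch
import torch.nn as nn

class ColorDiscriminator(nn.Module):
    """Sketch of D: a conv trunk maps the concatenated (ab, RGB) input to a
    256-d feature g, fused with the palette by an inner product through W."""
    def __init__(self, palette_dim, feat_dim=256):
        super().__init__()
        self.trunk = nn.Sequential(                      # 2 ab + 3 RGB = 5 channels
            nn.Conv2d(5, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, feat_dim, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.W = nn.Linear(palette_dim, feat_dim, bias=False)  # projection W
        self.psi = nn.Linear(feat_dim, 1)                # unconditional head (assumed)

    def forward(self, ab, rgb, H_hat):
        g = self.trunk(torch.cat([ab, rgb], dim=1))      # g in R^256
        proj = (self.W(H_hat.flatten(1)) * g).sum(dim=1, keepdim=True)
        return proj + self.psi(g)                        # realness score
```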
Different optimization objectives are used to train palette estimation and palette assignment. For palette estimation, the objective covers palette reconstruction and regularization:

L_{G_H} = λ_rec1 ‖Ĥ − H‖₁ − λ_rg Ent(Ĥ)

where λ_rec1 and λ_rg balance the two terms and are set to 5.0 and 1.0, respectively.
The optimization objective of palette assignment is formed by pixel-level regression, palette reconstruction, and adversarial training:

L_{G_C} = λ_reg ‖Ĉ − C‖₁ + λ_rec2 ‖Ĥ_Ĉ − H‖₁ + λ_adv L_adv^G

where Ĥ_Ĉ is the palette extracted from Ĉ using the kernel-weighted histogram above. λ_reg, λ_rec2 and λ_adv balance the respective terms; λ_reg and λ_rec2 are set to 5.0 and 1.0.
For the adversarial loss, the hinge version can be adopted. The generator training target is expressed as:

L_adv^G = −E_{L∼P_L}[D(Ĉ, L)]

where Ĉ is generated from L, Î is the RGB version converted from Ĉ and L, and P_L denotes the grayscale image distribution. The optimization objective of the discriminator is expressed as:

L_adv^D = E_{I∼P_I}[max(0, 1 − D(C, L))] + E_{L∼P_L}[max(0, 1 + D(Ĉ, L))]

where P_I denotes the RGB image distribution and C is the chroma map converted from I.
The palette generator G_H and the palette allocation generator G_C may be jointly trained in a progressive manner. Specifically, for an input sample L_i, the palette fed to G_C is

H̃_i = 1(p_h < τ) H_i + 1(p_h ≥ τ) Ĥ_i

where 1(·) is the indicator function, taking the value 1 if its condition is true and 0 otherwise, H_i is the ground-truth palette, Ĥ_i = G_H(L_i) is the estimated palette, and p_h is sampled from the uniform distribution U(0, 1). Training starts from τ = 1, and τ is then reduced linearly to 0 near the end of learning.
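This scheduled sampling of the palette fed to G_C takes only a few lines:

```python
import torch

def palette_for_training(H_gt, H_pred, tau):
    """Use the ground-truth palette with probability tau, else the predicted
    one; tau is annealed linearly from 1 to 0 over training, as above."""
    p_h = torch.rand(())                                 # p_h ~ U(0, 1)
    return H_gt if p_h < tau else H_pred
```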
Spectral normalization and the two time-scale update rule can be used in training (the learning rates of the generator and discriminator are 1e-4 and 4e-4, respectively, to stabilize learning), with an Adam optimizer with β₁ = 0 and β₂ = 0.9. For batch normalization, a synchronized version may be employed. The method is trained on the ImageNet training set for 40 epochs using 8 NVIDIA 2080 Ti GPUs with a batch size of 64. Training images are randomly cropped from resized images to a fixed size (256 × 256) with the aspect ratio unchanged. At test time, images are resized to 256 × 256 for evaluation.
The method and representative prior works can be evaluated on ImageNet and COCO-Stuff. On ImageNet, two evaluation protocols are adopted. One evaluates all methods on ctest10k (10K images), a standard subset of its validation data (50K images), following the protocol of (Larsson G, Maire M, Shakhnarovich G. Learning representations for automatic colorization [C]// European Conference on Computer Vision. Springer, Cham, 2016). The other runs on the complete validation set. For COCO-Stuff, all methods can be tested on its 5K validation images.
The method can be compared with existing learning-based colorization methods, including DeOldify, CIColor, UGColor, video colorization, InstColor, ColTran, and GPColor. Note that InstColor is learned with a pre-trained object detection model (requiring labels and bounding boxes), while GPColor utilizes a BigGAN pre-trained on labeled ImageNet. The other methods, including the present one, use only paired gray-color images for training; the fully automatic version of UGColor without color hints is used.
The colorization results can be evaluated quantitatively with the pixel-level similarity metrics PSNR and SSIM, the image-level perceptual metric LPIPS, and the Fréchet Inception Distance (FID). LPIPS and FID agree better with human evaluation than PSNR and SSIM.
Compared with other methods, the proposed PalGAN does not utilize any annotation or hint on ImageNet, yet leads in the perceptual metrics (FID: 4.60 and 2.78 under the two protocols, and LPIPS). The method also achieves competitive fidelity scores (PSNR and SSIM), showing good color recovery. Given a ground-truth palette, the method delivers impressive fidelity as well as generation performance, indicating its upper-bound performance for reference. Considering the trade-off between fidelity and perceptual quality, it achieves fair results on all benchmarks.
In addition, the colorization results of the method give natural, diverse, and fine-grained color predictions, taking semantic correspondence and local gradient changes into account. Thanks to chroma attention, the method suffers less from common color bleeding than other methods.
Furthermore, the existing methods and the present method can be evaluated by human raters, with colorization tests performed according to the protocols in (Zhang R, Isola P, Efros A. Colorful image colorization [C]// European Conference on Computer Vision. Springer, Cham, 2016: 649-666) and (Kumar M, Weissenborn D, Kalchbrenner N. Colorization transformer [J]. arXiv preprint arXiv:2102.04432, 2021). Specifically, ground-truth color images and their corresponding colorization results (from the present method or others) are shown to 20 participants in random order. The participants must decide, within no more than 2 seconds, which of the two looks more realistic. Each method contributes 40 colorization predictions, with labels randomly selected from the ImageNet ctest10k. The method beats the competitors by a large margin in this test.
Although the method is trained in a self-supervised manner on synthetic pairs, it also handles real-world black-and-white historical photographs well: color boundaries and consistency are well preserved, with good results on objects and portraits. Furthermore, reference-based (or exemplar-based) colorization can be performed by using palettes from reference color images; even when an image palette has no semantic relevance to the input, PalGAN can still adapt the given color distribution to the semantics of the given image, keeping color regions consistent.
In other embodiments of the present invention, chroma attention may be replaced with the commonly used global self-attention. The palette (statistics of the ab channels of the image in Lab color space) may be replaced by other color features, such as a color coherence vector: the color vectors of all pixels are clustered into k classes, and the median or mean of each of the k classes is used as the color feature. The palette generator may be changed from a generic image encoder with progressive down-sampling to an encoder that maintains resolution until the final dimension reduction, or to a U-shaped network structure. The palette allocation generator may likewise be changed from a typical progressively up-sampling image decoder to a U-shaped network structure, where the upsample-then-convolve operations may be replaced by transposed convolutions.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various combinations, modifications, and changes can be made thereto without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention disclosed herein should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (10)

1. An image colorization method based on a palette adversarial network, characterized by comprising the following steps:
constructing a palette adversarial network, wherein the palette adversarial network comprises a palette generator G_H, a palette allocation generator G_C, and a color discriminator D;
inputting a grayscale image L into the palette adversarial network;
generating an estimated color map Ĉ by the palette generator G_H and the palette allocation generator G_C;
generating an output color image Î from the grayscale image L and the estimated color map Ĉ; and
judging the authenticity of the output image Î by the color discriminator D.
2. The image colorization method based on a palette adversarial network of claim 1, characterized in that the palette generator G_H generates a palette histogram Ĥ from the grayscale image L, expressed as:

Ĥ = G_H(L),  Ĥ ∈ [0, 1]^{N_a×N_b},  Σ_{a,b} Ĥ(a, b) = 1

where the palette histogram Ĥ represents the palette probability, and a and b denote the a-axis and b-axis in CIE Lab color space; and

the palette allocation generator G_C generates the a and b values in CIE Lab color space from the palette histogram Ĥ and a latent code z, expressed as:

Ĉ = G_C(L, Ĥ, z)
3. The image colorization method based on a palette adversarial network of claim 2, characterized in that the palette allocation generator G_C comprises residual blocks, a palette normalization layer and a chroma attention module, wherein the palette normalization layer normalizes the input features and applies an affine transformation parameterized by g(Ĥ), where g(·) denotes a fully connected layer.
4. The image colorization method based on a palette adversarial network of claim 3, characterized in that the chroma attention module comprises a global interaction module and a local description module, wherein the chroma attention module takes as input a feature map F, a high-level feature map S and the grayscale image L, the global interaction module performs a global interaction operation to generate a first feature map F_g, the local description module performs a local description operation to generate a second feature map F_l, the first feature map F_g and the second feature map F_l are fused to generate a feature map residual F′, and the feature map residual F′ is added back to the feature map F, expressed as:

F_g = CA_g(F, S),  F_l = CA_l(F, L)
F′ = f(F_g ⊕ F_l),  F̂ = F + F′

where f(·) denotes a non-linear fusion operation, ⊕ denotes a channel-dimension concatenation operation, CA_g denotes the global interaction operation, and CA_l denotes the local description operation.
5. The image colorization method based on a palette adversarial network of claim 4, characterized in that the global interaction operation comprises reconstructing each regional feature point in the feature map F as a weighted sum of the other regional feature points, with the local weights computed from the semantic similarity between regional feature points, expressed as:

F_g(p) = Σ_q w_pq F(q),  w_pq = softmax_q(⟨S_K(p), S_Q(q)⟩ / (‖S_K(p)‖ ‖S_Q(q)‖))

where p and q denote patches centered at pixel positions p and q within the feature map F, and S_K and S_Q denote feature maps transformed from the high-level feature map S using convolutions; and

the local description operation comprises mapping the grayscale image L to the corresponding ab feature map by a local affine transformation {A, B}, expressed as:

F_l = A ⊙ L↓ + B
A = Ψ(cov(F, L↓) / (var(L↓) + ε)),  B = F̄ − A ⊙ L̄↓

where ⊙ denotes the element-wise multiplication operator, ↓ denotes a down-sampling operation, Ψ denotes a learnable transform, cov(·,·) denotes the local covariance, var(·) denotes the local variance of a given feature map, F̄ and L̄ denote F and L smoothed with a mean filter, respectively, and ε denotes a small positive parameter.
6. The image colorization method based on a palette adversarial network of claim 5, characterized in that palette optimization is carried out, wherein the palette histogram Ĥ is expressed as a kernel-weighted sum:

Ĥ(a, b) = (1/Z) Σ_x k(C_a(x) − a) k(C_b(x) − b),  k(u) = 1 / (1 + (u/σ)²)

where C_a(x) and C_b(x) denote the values of pixel x in the a and b channels, respectively, k denotes the kernel function, Z denotes a normalization factor, and σ denotes a parameter controlling the smoothness over neighboring bins; and

palette regularization is carried out, wherein the entropy Ent(Ĥ) of the palette histogram Ĥ is maximized to increase color diversity, expressed as:

Ent(Ĥ) = −Σ_{a,b} Ĥ(a, b) log Ĥ(a, b)
7. The image colorization method based on a palette adversarial network of claim 6, characterized in that the color discriminator D converts the output image Î into a one-dimensional feature g ∈ R^{256×1}, fuses the one-dimensional feature with the palette by an inner product, and computes the authenticity of the output image Î, expressed as:

D(Ĉ, L) = vec(Ĥ)ᵀ W g,  g = φ([Ĉ, Î])

where W denotes a learnable linear projection and φ denotes the convolutional feature extractor of the discriminator.
8. The image colorization method based on a palette adversarial network of claim 7, characterized in that the palette generator G_H and the palette allocation generator G_C are trained, wherein the optimization objective for training the palette generator G_H is expressed as:

L_{G_H} = λ_rec1 ‖Ĥ − H‖₁ − λ_rg Ent(Ĥ)

where the reconstruction term represents the learning of palette reconstruction, the regularization term represents the learning of palette regularization, H denotes the ground-truth palette histogram, and λ_rec1 and λ_rg denote balance parameters; and

the optimization objective for training the palette allocation generator G_C is expressed as:

L_{G_C} = λ_reg ‖Ĉ − C‖₁ + λ_rec2 ‖Ĥ_Ĉ − H‖₁ + λ_adv L_adv^G
L_adv^G = −E_{L∼P_L}[D(Ĉ, L)]
L_adv^D = E_{I∼P_I}[max(0, 1 − D(C, L))] + E_{L∼P_L}[max(0, 1 + D(Ĉ, L))]

where the regression term is used for learning pixel-level regression, the reconstruction term is used for learning palette reconstruction, and the adversarial term is used for learning adversarial training; Ĥ_Ĉ denotes the palette histogram extracted from the estimated color map Ĉ by the kernel-weighted sum of claim 6; C denotes the ground-truth color map; λ_reg, λ_rec2 and λ_adv denote balance parameters; L_adv^G denotes the generator training objective of the adversarial loss, and L_adv^D denotes the discriminator training objective of the adversarial loss, where P_I denotes the RGB image distribution and P_L denotes the grayscale image distribution.
9. The image colorization method based on a palette adversarial network of claim 8, characterized in that the palette generator G_H and the palette allocation generator G_C are jointly trained in a progressive manner.
10. A computer system, comprising:
a processor configured to execute machine-executable instructions; and
a memory having stored thereon machine-executable instructions which, when executed by the processor, perform the steps of the method according to any one of claims 1 to 9.
CN202210924523.XA 2022-08-02 2022-08-02 Image colorization method based on a palette adversarial network Pending CN115272527A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210924523.XA CN115272527A (en) 2022-08-02 Image colorization method based on a palette adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210924523.XA CN115272527A (en) 2022-08-02 Image colorization method based on a palette adversarial network

Publications (1)

Publication Number Publication Date
CN115272527A true CN115272527A (en) 2022-11-01

Family

ID=83746591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210924523.XA Pending CN115272527A (en) Image colorization method based on a palette adversarial network

Country Status (1)

Country Link
CN (1) CN115272527A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3142024A1 (en) * 2022-11-14 2024-05-17 Lynred METHOD FOR COLORING AN INFRARED IMAGE

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107730568A (en) * 2017-10-31 2018-02-23 Shandong Normal University Colorization method and device based on weight learning
CN109859288A (en) * 2018-12-25 2019-06-07 Beijing Feisou Technology Co., Ltd. Image colorization method and device based on generative adversarial network
WO2019153741A1 (en) * 2018-02-07 2019-08-15 BOE Technology Group Co., Ltd. Image coloring method and apparatus
CN111524205A (en) * 2020-04-23 2020-08-11 Beijing Information Science and Technology University Image colorization method and device based on cycle generative adversarial network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107730568A (en) * 2017-10-31 2018-02-23 Shandong Normal University Colorization method and device based on weight learning
WO2019153741A1 (en) * 2018-02-07 2019-08-15 BOE Technology Group Co., Ltd. Image coloring method and apparatus
CN109859288A (en) * 2018-12-25 2019-06-07 Beijing Feisou Technology Co., Ltd. Image colorization method and device based on generative adversarial network
CN111524205A (en) * 2020-04-23 2020-08-11 Beijing Information Science and Technology University Image colorization method and device based on cycle generative adversarial network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3142024A1 (en) * 2022-11-14 2024-05-17 Lynred METHOD FOR COLORING AN INFRARED IMAGE
WO2024105312A1 (en) * 2022-11-14 2024-05-23 Lynred Method for colorizing an infrared image

Similar Documents

Publication Publication Date Title
Li et al. PDR-Net: Perception-inspired single image dehazing network with refinement
Fu et al. Uncertainty inspired underwater image enhancement
Lim et al. DSLR: Deep stacked Laplacian restorer for low-light image enhancement
CN107818554B (en) Information processing apparatus and information processing method
Bellavia et al. Dissecting and reassembling color correction algorithms for image stitching
Kolesnikov et al. PixelCNN models with auxiliary variables for natural image modeling
CN112581370A (en) Training and reconstruction method of super-resolution reconstruction model of face image
CN112132739A (en) 3D reconstruction and human face posture normalization method, device, storage medium and equipment
Saleh et al. Adaptive uncertainty distribution in deep learning for unsupervised underwater image enhancement
Hu et al. Face hallucination from low quality images using definition-scalable inference
Wang et al. PalGAN: Image colorization with palette generative adversarial networks
Zheng et al. Truncated low-rank and total p variation constrained color image completion and its moreau approximation algorithm
CN114444565A (en) Image tampering detection method, terminal device and storage medium
Ahmed et al. PIQI: perceptual image quality index based on ensemble of Gaussian process regression
Liu et al. Hallucinating color face image by learning graph representation in quaternion space
Chaurasiya et al. Deep dilated CNN based image denoising
CN112651333A (en) Silence living body detection method and device, terminal equipment and storage medium
Yang et al. Blind image quality assessment of natural distorted image based on generative adversarial networks
Xu et al. Generative image completion with image-to-image translation
Liu et al. Attentive semantic and perceptual faces completion using self-attention generative adversarial networks
CN115272527A (en) Image colorization method based on a palette adversarial network
Bugeau et al. Influence of color spaces for deep learning image colorization
Pajot et al. Unsupervised adversarial image inpainting
Chen et al. Face super resolution based on parent patch prior for VLQ scenarios
Prodan et al. Comprehensive evaluation of metrics for image resemblance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination