CN112184582B - Attention mechanism-based image completion method and device - Google Patents

Attention mechanism-based image completion method and device

Info

Publication number
CN112184582B
Authority
CN
China
Prior art keywords
image
loss function
binary mask
completion
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011038187.6A
Other languages
Chinese (zh)
Other versions
CN112184582A (en
Inventor
赫然 (He Ran)
马鑫 (Ma Xin)
侯峦轩 (Hou Luanxuan)
黄怀波 (Huang Huaibo)
王海滨 (Wang Haibin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cas Artificial Intelligence Research Qingdao Co ltd
Original Assignee
Cas Artificial Intelligence Research Qingdao Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cas Artificial Intelligence Research Qingdao Co ltd filed Critical Cas Artificial Intelligence Research Qingdao Co ltd
Priority to CN202011038187.6A priority Critical patent/CN112184582B/en
Publication of CN112184582A publication Critical patent/CN112184582A/en
Application granted granted Critical
Publication of CN112184582B publication Critical patent/CN112184582B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/77 Retouching; Inpainting; Scratch removal
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image completion method and device based on an attention mechanism, belonging to the technical field of computer image processing. The method comprises the following steps: step S1, preprocessing the database image data, synthesizing a damaged image by using a binary mask, and taking the damaged image and the corresponding binary mask as the input of a network model; step S2, obtaining, through training, a generative adversarial network model capable of image completion; and step S3, using the trained generative adversarial network model to perform completion processing on the test data. The invention provides a generative adversarial network model based on an attention mechanism for the image completion problem. The binary mask is used as additional guiding information and is combined with the input image during training, so that the model can produce completion results that contain rich detail information while maintaining structural continuity.

Description

Attention mechanism-based image completion method and device
Technical Field
The disclosure belongs to the technical field of computer image processing, and particularly relates to an image completion method and device based on an attention mechanism.
Background
The statements herein merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Image inpainting refers to generating substitute content for the missing regions of a given damaged image so that the repaired image is visually realistic and semantically reasonable. The image completion task can also serve other applications such as image editing: when an image contains scene elements that distract human attention, such as people or objects (which are often unavoidable), it allows a user to remove the unwanted elements and fill the resulting blank areas with visually and semantically plausible content.
The inventor finds that, with the continuous development of science and technology, user demands in different fields, including film and advertisement animation production and online games, have risen accordingly, and realistic image restoration technology is of great significance to a good user experience. Against this background, developing an image completion method based on an attention mechanism that makes the repaired image visually realistic and semantically reasonable is of great importance.
Disclosure of Invention
Aiming at the technical problems in the prior art, the invention provides an image completion method and device based on an attention mechanism.
At least one embodiment of the present disclosure provides an image completion method based on an attention mechanism, including the following steps:
step S1, preprocessing the database image data, synthesizing a damaged image by using a binary mask, and taking the damaged image and the corresponding binary mask as the input of a network model;
step S2, training on the input data to obtain a generative adversarial network model capable of image completion;
and step S3, using the trained generative adversarial network model to perform completion processing on the test data.
Further, the database face image and the natural image after the preprocessing in the step S1 are consistent in size; in the image completion task, a damaged image and a corresponding binary mask are combined as input, and an undamaged image is used as a real image label.
Further, the process of obtaining the generative adversarial network model in step S2 includes:
Step S21: initializing the network weight parameters for the image completion task, where the loss function of the generator is L_total and the loss function of the discriminator is L_D;
Step S22: inputting the damaged image and the binary mask into the generator network G for the image completion task, inputting the generated completed image and the target image into the discriminator network D, and performing iterative training until both the generator loss L_total and the discriminator loss L_D decrease and become stable;
Step S23: training the expression generation and removal tasks simultaneously until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
Further, the damaged image is denoted as x, the generated image as x̂, the target image as y, and the binary mask as M.
Further, the output value of the local convolution layer in the attention mechanism depends only on the undamaged region, which is mathematically described as follows:

F' = W^T (F ⊙ M) · (sum(1)/sum(M)) + b,  if sum(M) > 0
F' = 0,                                  otherwise

where ⊙ denotes pixel-level (element-wise) multiplication and 1 denotes a matrix whose elements are all 1 and whose shape is the same as M. W denotes the parameters of the convolutional layer, F the output feature map of the previous convolutional layer, b the bias of the convolutional layer, and M the corresponding binary mask map; sum(1)/sum(M) is a scaling factor that adjusts the weight of the known region.
The binary mask map M also needs to be updated after the local convolution is performed, which is mathematically described as follows:

M' = 1, if sum(M) > 0;  M' = 0, otherwise

That is, if the convolutional layer can produce an output from at least one valid input, the corresponding position in the binary mask is marked as 1.
Further, a dual attention fusion module in the attention mechanism fuses the known regions and the generated regions together, as follows. First, the channel-level statistics are obtained:

z_c = H_GP(f_c) = (1/(H·W)) Σ_{i=1..H} Σ_{j=1..W} f_c(i, j)

where z_c is the value of the c-th dimension of z, H_GP denotes the global pooling layer, and f_c denotes the c-th channel of the feature map F;
then, the dependencies between channels are obtained:

ω = f(W_U · δ(W_D · z))

where f and δ denote the sigmoid and ReLU activation functions, respectively, and W_U and W_D are parameters of convolutional layers. The obtained channel-wise information ω is used to adjust the weight of the input:

f̂_c = ω_c · f_c

where ω_c and f_c denote the scaling factor and the feature map of the c-th channel, respectively;
next, the attention map α is obtained by:

α = f(A(cat(x̂, x')))

where x' is the damaged image x rescaled to the current scale, A is a learnable transformation composed of several convolutional layers, x̂ and x' are first concatenated (cat) and then fed into the convolutional layers, and f is the sigmoid function that turns the response into the attention map α;
finally, the image completion result x̃ is obtained:

x̃ = B(α ⊙ x', (1 - α) ⊙ x̂)

where ⊙ and B denote the Hadamard product and a combining function, respectively.
Further, the loss function is divided into a structural loss and a texture loss:

L_struct^k = λ_rec · L_rec^k
L_text^k = λ_per · L_per^k + λ_style · L_style^k + λ_tv · L_tv^k + λ_adv · L_adv^k

where the superscript k indicates that the loss is computed at the k-th layer of the decoder. L_struct denotes the structural loss, L_text the texture loss, L_rec the L_1 norm between images, L_per the perceptual loss, L_style the style loss, L_tv the total variation loss, and L_adv the adversarial loss; λ_rec, λ_per, λ_style, λ_tv and λ_adv are weighting factors.
At least one embodiment of the present disclosure provides an image completion apparatus based on an attention mechanism, the apparatus including:
a data processing module, configured to preprocess the database image data, synthesize a damaged image by using a binary mask, and combine the damaged image and the corresponding binary mask as the input of the network model;
a model generation module, configured to obtain, through training, a generative adversarial network model capable of image completion;
an image completion module, configured to use the trained generative adversarial network model to perform completion processing on the test data.
Further, the data processing module is configured so that the database face images and natural images have a consistent size after preprocessing; in the image completion task, the damaged image and the corresponding binary mask are combined as the input, and the undamaged image is used as the real image label.
Further, the model generation module is configured to: initialize the network weight parameters for the image completion task, where the loss function of the generator is L_total and the loss function of the discriminator is L_D; input the damaged image and the binary mask into the generator network G for the image completion task, input the generated completed image and the target image into the discriminator network D, and perform iterative training until both the generator loss L_total and the discriminator loss L_D decrease and become stable; and train the expression generation and removal tasks simultaneously until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
The beneficial effects of this disclosure are as follows:
(1) In order to improve the generation quality of images in the image completion task (including rich texture details and structural continuity), an image completion method based on the attention mechanism is provided. Through the local convolution layers, the generative adversarial network can exploit the prior information in the binary mask, which improves the quality of the generated images. The dual attention fusion modules form a multi-scale decoder that can progressively generate high-resolution images.
(2) The image completion method introduces a reconstruction loss, a style loss, a total variation loss and an adversarial loss as constraints at both the image level and the feature level, which improves the robustness and accuracy of the network.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is a flowchart of the attention-based image completion method provided by the embodiment of the present disclosure;
FIG. 2 is a flowchart of the dual attention fusion module provided by the embodiment of the present disclosure;
FIG. 3 shows image completion results on public data sets according to the embodiment of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
The embodiment of the disclosure provides an attention mechanism-based image completion method, which comprises the following steps:
step S1, preprocessing the database image data, synthesizing the damaged image by using a binary mask, and taking the damaged image and the corresponding binary mask as the input of the network model;
specifically, a binary mask map is first generated offline using a binary mask algorithm. For the face image, normalizing the image according to the positions of the two eyes and cutting the image to be 256 × 256 with a uniform size; for natural images, the image size is first enlarged to 350 × 350, and then the enlarged image is randomly cropped to a uniform size 256 × 256. And randomly selecting an off-line generated binary mask image, and multiplying the off-line generated binary mask image by the damaged image to obtain the damaged image.
Further, in step S1, the preprocessed database face images and natural images have a consistent size; meanwhile, in the subsequent image completion task, the damaged image and the corresponding binary mask are combined as the input, and the undamaged image is used as the real image label.
Step S2: training the attention-based generative adversarial network model with the training input data so as to accomplish the image completion task.
It should be noted that, in this step, in order to enlarge the sample size of the input data and improve the generalization ability of the network, this embodiment may employ data augmentation operations, including random flipping, so as to increase the amount of training data.
Specifically, step S2 includes:
Step S21: initializing the network weight parameters for the image completion task, where the loss function of the generator is L_total and the loss function of the discriminator is L_D;
Step S22: inputting the damaged image and the binary mask into the generator network G for the image completion task, inputting the generated completed image and the target image into the discriminator network D, and performing iterative training until both the generator loss L_total and the discriminator loss L_D decrease and become stable;
Step S23: training the expression generation and removal tasks simultaneously until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
Further, assuming that the damaged image is denoted as x, the generated image as x̂, the target image as y, and the binary mask as M, the output value of the local convolution layer in the above attention mechanism depends only on the undamaged region, which is mathematically described as follows:

F' = W^T (F ⊙ M) · (sum(1)/sum(M)) + b,  if sum(M) > 0
F' = 0,                                  otherwise

where ⊙ denotes pixel-level (element-wise) multiplication and 1 denotes a matrix whose elements are all 1 and whose shape is the same as M. W denotes the parameters of the convolutional layer, F the output feature map of the previous convolutional layer, b the bias of the convolutional layer, and M the corresponding binary mask map; sum(1)/sum(M) is a scaling factor that adjusts the weight of the known region.
In this embodiment the binary mask map M also needs to be updated after the local convolution is performed, which is mathematically described as follows:

M' = 1, if sum(M) > 0;  M' = 0, otherwise

That is, if the convolutional layer can produce an output from at least one valid input, the corresponding position in the binary mask is marked as 1.
Further, in step S2, the dual attention fusion module in the attention mechanism fuses the known regions and the generated regions together, as follows.
First, the channel-level statistics are obtained:

z_c = H_GP(f_c) = (1/(H·W)) Σ_{i=1..H} Σ_{j=1..W} f_c(i, j)

where z_c is the value of the c-th dimension of z, H_GP denotes the global pooling layer, and f_c denotes the c-th channel of the feature map F.
Then, the dependencies between channels are obtained:

ω = f(W_U · δ(W_D · z))

where f and δ denote the sigmoid and ReLU activation functions, respectively, and W_U and W_D are parameters of convolutional layers. The obtained channel-wise information ω is used to adjust the weight of the input:

f̂_c = ω_c · f_c

where ω_c and f_c denote the scaling factor and the feature map of the c-th channel, respectively.
Next, the attention map α is obtained by:

α = f(A(cat(x̂, x')))

where x' is the damaged image x rescaled to the current scale, A is a learnable transformation composed of several convolutional layers, x̂ and x' are first concatenated (cat) and then fed into the convolutional layers, and f is the sigmoid function that turns the response into the attention map α.
Finally, the image completion result x̃ is obtained:

x̃ = B(α ⊙ x', (1 - α) ⊙ x̂)

where ⊙ and B denote the Hadamard product and a combining function, respectively.
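The sketch below illustrates the attention map α and one possible fusion step. The text does not spell out the combining function B, so a simple α-weighted blend of the rescaled input x' and the generated content x̂ is assumed here; the layer sizes are likewise illustrative.

    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        # α = sigmoid(A(cat(x_hat, x_scaled))); the blend below stands in for the combining function B.
        def __init__(self, channels=3):
            super().__init__()
            self.A = nn.Sequential(                           # learnable transformation A (assumed depth/width)
                nn.Conv2d(2 * channels, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 1, 3, padding=1),
            )
            self.sigmoid = nn.Sigmoid()                       # f

        def forward(self, x_hat, x_scaled):
            alpha = self.sigmoid(self.A(torch.cat([x_hat, x_scaled], dim=1)))   # attention map α
            # Assumed fusion: keep rescaled-input content where α is high, generated content elsewhere.
            return alpha * x_scaled + (1.0 - alpha) * x_hat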
Therefore, this embodiment provides a more broadly applicable solution to the image completion problem. Through the local convolution layers, the method can use the prior information of the binary mask to complete the damaged image more accurately; in addition, the dual attention fusion module allows the resolution of the generated image to be increased progressively, so that rich detail information is generated continuously.
Further, the objective function of the image completion task in this embodiment is divided into a structural loss and a texture loss, which are expressed as follows:

L_struct^k = λ_rec · L_rec^k
L_text^k = λ_per · L_per^k + λ_style · L_style^k + λ_tv · L_tv^k + λ_adv · L_adv^k

where the superscript k indicates that the loss is computed at the k-th layer of the decoder. L_struct denotes the structural loss, L_text the texture loss, L_rec the L_1 norm between images, L_per the perceptual loss, L_style the style loss, L_tv the total variation loss, and L_adv the adversarial loss; λ_rec, λ_per, λ_style, λ_tv and λ_adv are weighting factors.
The reconstruction loss in the structural loss is expressed as:

L_rec = || x̂ - y ||_1

where ||·||_1 denotes the L_1 norm and cat denotes the concatenation (linking) operation.
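As a minimal sketch, the reconstruction term can be computed as a mean absolute error between the generated and target images; the function name is an illustrative assumption.

    import torch

    def reconstruction_loss(pred, target):
        # L_rec: L1 distance between the generated image and the target image.
        return torch.mean(torch.abs(pred - target))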
The perceptual loss in the texture loss is expressed as:

L_per = Σ_i || φ_i(x̂) - φ_i(y) ||_1

where φ is the pre-trained VGG-16 network and φ_i is the feature map output by its i-th pooling layer. The pool-1, pool-2 and pool-3 layers of VGG-16 are used in the present invention.
The style loss in the texture loss is expressed as:

L_style = Σ_i (1/C_i²) · || φ_i(x̂)^T φ_i(x̂) - φ_i(y)^T φ_i(y) ||_1

where C_i is the number of channels of the feature map output by the i-th layer of the pre-trained VGG-16 model, and the products φ_i(·)^T φ_i(·) are the corresponding Gram matrices.
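A sketch of the perceptual and style terms built on the pool-1/2/3 features of a pre-trained VGG-16 is shown below (PyTorch/torchvision assumed); the Gram-matrix normalization and the class name are assumptions, not the exact formulation of the patent.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class VGGPerceptualStyleLoss(nn.Module):
        def __init__(self):
            super().__init__()
            vgg = models.vgg16(pretrained=True).features.eval()
            for p in vgg.parameters():
                p.requires_grad_(False)
            self.vgg = vgg
            self.pool_ids = {4, 9, 16}            # indices of pool-1, pool-2, pool-3 in vgg16.features

        def _features(self, x):
            feats = []
            for i, layer in enumerate(self.vgg):
                x = layer(x)
                if i in self.pool_ids:
                    feats.append(x)
            return feats

        @staticmethod
        def _gram(f):
            b, c, h, w = f.shape
            f = f.view(b, c, h * w)
            return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)   # normalization is an assumption

        def forward(self, pred, target):
            fp, ft = self._features(pred), self._features(target)
            l_per = sum(torch.mean(torch.abs(a - b)) for a, b in zip(fp, ft))      # perceptual term
            l_style = sum(torch.mean(torch.abs(self._gram(a) - self._gram(b)))     # style (Gram) term
                          for a, b in zip(fp, ft))
            return l_per, l_style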
The total variation loss in the texture loss is expressed as:

L_tv = Σ_{(i,j)∈Ω} || x̂^{i,j+1} - x̂^{i,j} ||_1 + Σ_{(i,j)∈Ω} || x̂^{i+1,j} - x̂^{i,j} ||_1

where Ω denotes the damaged region of the image. The total variation loss is a smoothness penalty defined on the one-pixel dilation of the missing region.
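A sketch of this smoothness penalty restricted to the missing region is given below; the mask convention (1 inside the hole) is an assumption.

    import torch

    def total_variation_loss(pred, hole_mask):
        # Differences between horizontally and vertically adjacent pixels.
        dh = torch.abs(pred[:, :, :, 1:] - pred[:, :, :, :-1])
        dv = torch.abs(pred[:, :, 1:, :] - pred[:, :, :-1, :])
        # Keep only pixel pairs that touch the missing region Ω (hole_mask = 1 inside the hole).
        mh = torch.clamp(hole_mask[:, :, :, 1:] + hole_mask[:, :, :, :-1], max=1.0)
        mv = torch.clamp(hole_mask[:, :, 1:, :] + hole_mask[:, :, :-1, :], max=1.0)
        return (dh * mh).mean() + (dv * mv).mean()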
The adversarial loss in the texture loss is expressed as:

L_adv = E[D(x̂)] - E[D(y)] + λ · E[ ( || ∇_{y'} D(y') ||_2 - 1 )² ]

where D denotes the discriminator and y' is obtained by randomly interpolating between a generated sample x̂ and a real sample y. In the present invention, λ is set to 10.
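The gradient-penalty term can be sketched as follows, using the usual random interpolation between real and generated samples; treating y' this way is an assumption about how the interpolated sample is formed.

    import torch

    def gradient_penalty(discriminator, real, fake, weight=10.0):
        eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
        y_prime = (eps * real + (1.0 - eps) * fake).requires_grad_(True)   # interpolated sample y'
        d_out = discriminator(y_prime)
        grads = torch.autograd.grad(outputs=d_out.sum(), inputs=y_prime, create_graph=True)[0]
        grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
        return weight * ((grad_norm - 1.0) ** 2).mean()                    # λ (||∇ D(y')|| - 1)^2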
The total loss function of this embodiment is defined as:

L_total = Σ_{p∈P} L_struct^p + Σ_{q∈Q} L_text^q

where P and Q are the sets of decoder layers at which the structural and texture losses are computed.
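Combining the per-layer terms is then a simple sum over the selected decoder layers; the dictionary-based interface below is an illustrative assumption.

    def total_loss(struct_losses, texture_losses, P=(1, 2, 3, 4, 5, 6), Q=(1, 2, 3)):
        # L_total = sum of structural losses over layers in P plus texture losses over layers in Q.
        return sum(struct_losses[p] for p in P) + sum(texture_losses[q] for q in Q)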
The attention-based generative adversarial network mainly accomplishes the image completion task, and its final goal is to make the loss function L_total decrease and become stable.
The attention-based generative adversarial network is trained as follows:
Step S21: initializing the weight parameters of the network, where λ_rec, λ_per, λ_style, λ_tv and λ_adv are set to 6, 0.1, 240, 0.1 and 0.001, respectively, the batch size is 32, the learning rate is 10^-4, and P and Q are {1, 2, 3, 4, 5, 6} and {1, 2, 3}, respectively.
Step S22: inputting the combined damaged image and binary mask into the generator G for image completion, inputting the generated completed image and the real target image into the discriminator D, and iterating in turn until the total network loss L_total decreases and becomes stable.
It should be noted that, in the embodiment of the present disclosure, an encoder is used to extract features from the input data, a decoder decodes the obtained latent code into an image, and the dual attention fusion module outputs the final completed image. In this example, the encoder and the decoder each consist of 8 convolutional layers. The filter sizes of the convolutional layers in the encoder are 7, 5, 3, 3, 3, 3, 3 and 3, respectively; the filter sizes of the convolutional layers in the decoder are all 3. In this example, the feature maps are upsampled using conventional methods. The number of convolutional layers and the number and size of the filters in each convolutional layer can be selected and set according to the actual situation. The discriminator adopts a convolutional neural network structure that takes the real images and the generated completed images as input, and its output uses a patch-based adversarial loss to judge whether they are real or fake.
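A minimal alternating-training sketch consistent with steps S21 and S22 is shown below. The names generator, discriminator, loader and compute_l_total are placeholders assumed to be defined elsewhere, and gradient_penalty is the helper sketched above; none of these names come from the patent.

    import torch

    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

    for damaged, mask, target in loader:                 # batch size 32 in the described setup
        completed = generator(damaged, mask)

        # Discriminator step: real vs. completed images plus the gradient penalty.
        d_loss = discriminator(completed.detach()).mean() - discriminator(target).mean()
        d_loss = d_loss + gradient_penalty(discriminator, target, completed.detach())
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: weighted structural + texture objective (L_total).
        g_loss = compute_l_total(completed, target, mask, discriminator)
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()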
The embodiment of the present disclosure uses the strong nonlinear fitting ability of the attention-based generative adversarial network for the image completion task: the local convolution layers exploit the prior information in the binary mask map. Second, the embodiment of the present disclosure provides a dual attention fusion module, which forms a multi-scale decoder that gradually increases the texture detail in the generated image. In particular, under the constraint of the applied loss functions, the network is able to produce high-quality images. Thus, a model capable of image completion can be trained with the network shown in FIG. 1. In the testing stage, the binary mask and the damaged image are likewise used as the input of the model, and the generated image completion result is obtained, as shown in FIG. 3.
Step S3: performing completion processing on the test data by using the trained attention-based generative adversarial network model.
To illustrate the specific implementation of the disclosed embodiment in detail and to verify the validity of the disclosed method, the method proposed in this embodiment is applied to four public databases (one face database and three natural-image databases): CelebA-HQ, ImageNet, Places2 and Paris StreetView. CelebA-HQ contains 30,000 high-quality face images. Places2 contains 365 scene categories, with more than 8,000,000 images in total. Paris StreetView contains 15,000 street-view images of Paris. ImageNet is a large data set with more than 14,000,000 images. For Places2, Paris StreetView and ImageNet, the original validation and test sets are used in the present invention. For CelebA-HQ, 28,000 images are randomly selected for training and the remaining images are used for testing. 60,000 binary mask maps are generated offline using a binary mask algorithm; 55,000 of them are randomly selected for training and the remaining 5,000 are used for testing (the binary mask maps are used to generate the damaged images). Using the attention-based generative adversarial network and the objective function designed in the present invention, the damaged image and the corresponding binary mask map are taken as input, and the deep neural network is trained through the adversarial game between the generator and the discriminator with gradient back-propagation. The weights of the different tasks are adjusted continuously during training until the network finally converges, yielding the image completion model.
To test the effectiveness of the model, the image completion operation is performed on the test set data, and the visualization results are shown in FIG. 3. This embodiment effectively demonstrates that the proposed method can generate high-quality images.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present disclosure and not for limiting, and although the present disclosure is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present disclosure without departing from the spirit and scope of the technical solutions, which should be covered by the claims of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (8)

1. An image completion method based on an attention mechanism, characterized by comprising the following steps:
step S1, preprocessing the database image data, synthesizing a damaged image by using a binary mask, and combining the damaged image and the corresponding binary mask as the input data of a network model;
step S2, training on the input data to obtain a generative adversarial network model capable of image completion;
step S3, using the trained generative adversarial network model to perform completion processing on the test data;
wherein step S2 includes:
step S21: initializing network weight parameters for the image completion task, where the loss function of the generator is L_total and the loss function of the discriminator is L_D;
step S22: inputting the damaged image and the binary mask into the generator network G for the image completion task, inputting the generated completed image and the target image into the discriminator network D, and performing iterative training until both the generator loss L_total and the discriminator loss L_D decrease and become stable;
step S23: training the expression generation and removal tasks simultaneously until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model;
wherein a dual attention fusion module in the attention mechanism fuses the known regions and the generated regions together, as follows: first, the channel-level statistics are obtained:

z_c = H_GP(f_c) = (1/(H·W)) Σ_{i=1..H} Σ_{j=1..W} f_c(i, j)

wherein z_c is the value of the c-th dimension of z, H_GP denotes the global pooling layer, and f_c denotes the c-th channel of the feature map F;
then, the dependencies between channels are obtained:

ω = f(W_U · δ(W_D · z))

wherein f and δ denote the sigmoid and ReLU activation functions, respectively, and W_U and W_D are parameters of convolutional layers; the obtained channel-wise information ω is used to adjust the weight of the input:

f̂_c = ω_c · f_c

wherein ω_c and f_c denote the scaling factor and the feature map of the c-th channel, respectively;
next, the attention map α is obtained by:

α = f(A(cat(x̂, x')))

wherein x' is the damaged image x rescaled to the current scale, A is a learnable transformation composed of several convolutional layers, x̂ and x' are first concatenated and then fed into the convolutional layers, and f is the sigmoid function that turns the response into the attention map α;
finally, the image completion result x̃ is obtained:

x̃ = B(α ⊙ x', (1 - α) ⊙ x̂)

wherein ⊙ and B denote the Hadamard product and a combining function, respectively.
2. The attention-based image completion method according to claim 1, wherein the database face image is identical in size to the natural image after the preprocessing in step S1; in the image completion task, a damaged image and a corresponding binary mask are combined as input, and an undamaged image is used as a real image label.
3. The attention-based image completion method according to claim 1, wherein the damaged image is denoted as x, the generated image as x̂, the target image as y, and the binary mask as M.
4. The attention-based image completion method according to claim 1, wherein the output value of the local convolution layer in the attention mechanism depends only on the undamaged region, which is mathematically described as follows:

F' = W^T (F ⊙ M) · (sum(1)/sum(M)) + b,  if sum(M) > 0
F' = 0,                                  otherwise

wherein ⊙ denotes pixel-level multiplication, 1 denotes a matrix whose elements are all 1 and whose shape is the same as M, W denotes the parameters of the convolutional layer, F the output feature map of the previous convolutional layer, b the bias of the convolutional layer, and M the corresponding binary mask map; sum(1)/sum(M) is a scaling factor that adjusts the weight of the known region;
the binary mask map M also needs to be updated after the local convolution is performed, which is mathematically described as follows:

M' = 1, if sum(M) > 0;  M' = 0, otherwise

that is, if the convolutional layer can produce an output from at least one valid input, the corresponding position in the binary mask is marked as 1.
5. The attention-based image completion method according to claim 1, wherein the loss function L_total of the generator is divided into a structural loss and a texture loss:

L_struct^k = λ_rec · L_rec^k
L_text^k = λ_per · L_per^k + λ_style · L_style^k + λ_tv · L_tv^k + λ_adv · L_adv^k

wherein the superscript k indicates that the loss is computed at the k-th layer of the decoder, L_struct denotes the structural loss, L_text the texture loss, L_rec the L_1 norm between images, L_per the perceptual loss, L_style the style loss, L_tv the total variation loss, L_adv the adversarial loss, and λ_rec, λ_per, λ_style, λ_tv and λ_adv are weighting factors.
6. An image completion apparatus based on an attention mechanism, comprising:
a data processing module, configured to preprocess the database image data, synthesize a damaged image by using a binary mask, and combine the damaged image and the corresponding binary mask as the input of a network model;
a model generation module, configured to obtain, through training, a generative adversarial network model capable of image completion;
an image completion module, configured to use the trained generative adversarial network model to perform completion processing on the test data;
wherein the generative adversarial network model is obtained through the following steps:
step S21: initializing network weight parameters for the image completion task, where the loss function of the generator is L_total and the loss function of the discriminator is L_D;
step S22: inputting the damaged image and the binary mask into the generator network G for the image completion task, inputting the generated completed image and the target image into the discriminator network D, and performing iterative training until both the generator loss L_total and the discriminator loss L_D decrease and become stable;
step S23: training the expression generation and removal tasks simultaneously until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model;
wherein a dual attention fusion module in the attention mechanism fuses the known regions and the generated regions together, as follows: first, the channel-level statistics are obtained:

z_c = H_GP(f_c) = (1/(H·W)) Σ_{i=1..H} Σ_{j=1..W} f_c(i, j)

wherein z_c is the value of the c-th dimension of z, H_GP denotes the global pooling layer, and f_c denotes the c-th channel of the feature map F;
then, the dependencies between channels are obtained:

ω = f(W_U · δ(W_D · z))

wherein f and δ denote the sigmoid and ReLU activation functions, respectively, and W_U and W_D are parameters of convolutional layers; the obtained channel-wise information ω is used to adjust the weight of the input:

f̂_c = ω_c · f_c

wherein ω_c and f_c denote the scaling factor and the feature map of the c-th channel, respectively;
next, the attention map α is obtained by:

α = f(A(cat(x̂, x')))

wherein x' is the damaged image x rescaled to the current scale, A is a learnable transformation composed of several convolutional layers, x̂ and x' are first concatenated and then fed into the convolutional layers, and f is the sigmoid function that turns the response into the attention map α;
finally, the image completion result x̃ is obtained:

x̃ = B(α ⊙ x', (1 - α) ⊙ x̂)

wherein ⊙ and B denote the Hadamard product and a combining function, respectively.
7. The attention-based image completion apparatus according to claim 6, wherein the data processing module is configured to pre-process the database face image and the natural image to have the same size; in the image completion task, a damaged image and a corresponding binary mask are combined to be used as input, and an undamaged image is used as a real image label.
8. The attention-based image completion apparatus according to claim 6, wherein the model generation module is configured to: initialize network weight parameters for the image completion task, where the loss function of the generator is L_total and the loss function of the discriminator is L_D; input the damaged image and the binary mask into the generator network G for the image completion task, input the generated completed image and the target image into the discriminator network D, and perform iterative training until both the generator loss L_total and the discriminator loss L_D decrease and become stable; and train the expression generation and removal tasks simultaneously until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
CN202011038187.6A 2020-09-28 2020-09-28 Attention mechanism-based image completion method and device Active CN112184582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011038187.6A CN112184582B (en) 2020-09-28 2020-09-28 Attention mechanism-based image completion method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011038187.6A CN112184582B (en) 2020-09-28 2020-09-28 Attention mechanism-based image completion method and device

Publications (2)

Publication Number Publication Date
CN112184582A CN112184582A (en) 2021-01-05
CN112184582B true CN112184582B (en) 2022-08-19

Family

ID=73944421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011038187.6A Active CN112184582B (en) 2020-09-28 2020-09-28 Attention mechanism-based image completion method and device

Country Status (1)

Country Link
CN (1) CN112184582B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884673A (en) * 2021-03-11 2021-06-01 西安建筑科技大学 Reconstruction method for missing information between coffin chamber mural blocks of improved loss function SinGAN
CN113129234B (en) * 2021-04-20 2022-11-01 河南科技学院 Incomplete image fine restoration method based on intra-field and extra-field feature fusion
CN113221757B (en) * 2021-05-14 2022-09-02 上海交通大学 Method, terminal and medium for improving accuracy rate of pedestrian attribute identification
CN113962893B (en) * 2021-10-27 2024-07-09 山西大学 Face image restoration method based on multiscale local self-attention generation countermeasure network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288537A (en) * 2019-05-20 2019-09-27 湖南大学 Facial image complementing method based on the depth production confrontation network from attention
CN110458939A (en) * 2019-07-24 2019-11-15 大连理工大学 The indoor scene modeling method generated based on visual angle
CN111127346A (en) * 2019-12-08 2020-05-08 复旦大学 Multi-level image restoration method based on partial-to-integral attention mechanism

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018053340A1 (en) * 2016-09-15 2018-03-22 Twitter, Inc. Super resolution using a generative adversarial network
TWI682359B (en) * 2018-01-29 2020-01-11 國立清華大學 Image completion method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288537A (en) * 2019-05-20 2019-09-27 湖南大学 Facial image complementing method based on the depth production confrontation network from attention
CN110458939A (en) * 2019-07-24 2019-11-15 大连理工大学 The indoor scene modeling method generated based on visual angle
CN111127346A (en) * 2019-12-08 2020-05-08 复旦大学 Multi-level image restoration method based on partial-to-integral attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dual Attention Matching Network for Context-Aware Feature Sequence Based Person Re-identification;Jianlou Si et al.;《2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition》;20181217;第5363-5372页 *
Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation;Hao Tang et al.;《2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)》;20200805;第7867-7876页 *

Also Published As

Publication number Publication date
CN112184582A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN112184582B (en) Attention mechanism-based image completion method and device
CN111681252B (en) Medical image automatic segmentation method based on multipath attention fusion
CN112686817B (en) Image completion method based on uncertainty estimation
CN112686816A (en) Image completion method based on content attention mechanism and mask code prior
CN111815523A (en) Image restoration method based on generation countermeasure network
CN112818764B (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN110728219A (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN111861945B (en) Text-guided image restoration method and system
CN111986075B (en) Style migration method for target edge clarification
CN111242841A (en) Image background style migration method based on semantic segmentation and deep learning
CN111915522A (en) Image restoration method based on attention mechanism
CN109447897B (en) Real scene image synthesis method and system
CN112801914A (en) Two-stage image restoration method based on texture structure perception
CN116704079B (en) Image generation method, device, equipment and storage medium
CN117788629B (en) Image generation method, device and storage medium with style personalization
CN116777764A (en) Diffusion model-based cloud and mist removing method and system for optical remote sensing image
CN114821050A (en) Named image segmentation method based on transformer
CN110415261B (en) Expression animation conversion method and system for regional training
CN109829857B (en) Method and device for correcting inclined image based on generation countermeasure network
CN111368734A (en) Micro expression recognition method based on normal expression assistance
CN110599495A (en) Image segmentation method based on semantic information mining
CN110782503B (en) Face image synthesis method and device based on two-branch depth correlation network
CN117611428A (en) Fashion character image style conversion method
Yu et al. MagConv: Mask-guided convolution for image inpainting
CN114331894A (en) Face image restoration method based on potential feature reconstruction and mask perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant