CN112184582A - Attention mechanism-based image completion method and device - Google Patents

Attention mechanism-based image completion method and device

Info

Publication number
CN112184582A
CN112184582A
Authority
CN
China
Prior art keywords
image
loss function
completion
attention
binary mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011038187.6A
Other languages
Chinese (zh)
Other versions
CN112184582B (en)
Inventor
赫然
马鑫
侯峦轩
黄怀波
王海滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cas Artificial Intelligence Research Qingdao Co ltd
Original Assignee
Cas Artificial Intelligence Research Qingdao Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cas Artificial Intelligence Research Qingdao Co ltd filed Critical Cas Artificial Intelligence Research Qingdao Co ltd
Priority to CN202011038187.6A priority Critical patent/CN112184582B/en
Publication of CN112184582A publication Critical patent/CN112184582A/en
Application granted granted Critical
Publication of CN112184582B publication Critical patent/CN112184582B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/77Retouching; Inpainting; Scratch removal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an attention mechanism-based image completion method and device, belonging to the technical field of computer image processing. The method comprises the following steps: step S1, preprocessing database image data, synthesizing damaged images with binary masks, and taking each damaged image and its corresponding binary mask as the input of a network model; step S2, obtaining, through training, a generative adversarial network model capable of image completion; and step S3, using the trained generative adversarial network model to perform completion processing on test data. The invention provides an attention mechanism-based generative adversarial network model for the image completion problem. The binary mask serves as additional guiding information and is combined with the input image during training, so that the completion results of the model contain rich detail information while preserving structural continuity.

Description

Attention mechanism-based image completion method and device
Technical Field
The disclosure belongs to the technical field of computer image processing, and particularly relates to an image completion method and device based on an attention mechanism.
Background
The statements herein merely provide background related to the present disclosure and may not necessarily constitute prior art.
Image inpainting refers to generating substitute content for the missing regions of a given damaged image so that the repaired image is visually realistic and semantically reasonable. Image completion can also support other applications such as image editing: when distracting scene elements such as people or objects (which are often unavoidable) appear in an image, a user can remove the unwanted elements and fill the resulting blank areas with visually and semantically reasonable content.
The inventors have found that, with the continuous development of science and technology, user demands in different fields, including film and advertising animation production and online games, are rising accordingly, and realistic image restoration technology is important for a good user experience. Against this background, developing an image completion method based on an attention mechanism, so that the repaired image is visually vivid and semantically reasonable, is of great significance.
Disclosure of Invention
Aiming at the technical problems in the prior art, the disclosure provides an attention mechanism-based image completion method and device.
At least one embodiment of the present disclosure provides an image completion method based on an attention mechanism, including the following steps:
step S1, preprocessing the database image data, synthesizing a damaged image by using a binary mask, and taking the damaged image and the corresponding binary mask as the input of a network model;
step S2: training on the input data to obtain a generative adversarial network model capable of image completion;
step S3: using the trained generative adversarial network model to perform completion processing on the test data.
Further, the database face images and natural images are of consistent size after the preprocessing in step S1; in the image completion task, a damaged image and its corresponding binary mask are combined as the input, and the undamaged image is used as the real image label.
Further, the process of obtaining the generative adversarial network model in step S2 includes:
step S21: initializing network weight parameters in the image completion task, wherein the loss function of the generator is L_total and the loss function of the discriminator is L_D;
Step S22: inputting the damaged image and the binary mask into the generator network G for the image completion task, inputting the generated completed image and the target image into the discriminator network D, and performing iterative training until both the generator loss L_total and the discriminator loss L_D decrease and become stable;
step S23: training the expression generation and removal tasks simultaneously until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
Further, the damaged image is denoted as x, the generated image as ŷ, the target image as y, and the binary mask as M.
Further, the output value of the partial convolution layer in the attention mechanism depends only on the undamaged area, which is described mathematically as follows:

F' = W^T (F ⊙ M) · ( sum(1) / sum(M) ) + b,  if sum(M) > 0;   F' = 0,  otherwise

where ⊙ denotes pixel-level multiplication, 1 denotes a matrix whose elements are all 1 and whose shape is the same as M, W denotes the parameters of the convolution layer, F denotes the output feature map of the previous convolution layer, b denotes the bias of the convolution layer, and M denotes the corresponding binary mask; the factor sum(1)/sum(M) can be regarded as a scaling factor that adjusts the weight of the known region.

After each partial convolution, the binary mask M also needs to be updated, which is described mathematically as follows:

m' = 1,  if sum(M) > 0;   m' = 0,  otherwise

that is, if the convolution layer can obtain an output from at least one valid input, the corresponding position in the binary mask is marked as 1.
Further, the dual attention fusion module in the attention mechanism fuses the known regions and the generated regions together, as follows. First, channel-level statistics are obtained:

z_c = H_GP(f_c) = ( 1 / (H × W) ) Σ_{i=1..H} Σ_{j=1..W} f_c(i, j)

where z_c is the c-th element of z, H_GP denotes the global pooling layer, and f_c denotes the c-th channel of the feature map F.

Then, the dependency between channels is obtained:

ω = f( W_U δ( W_D z ) )

where f and δ denote the sigmoid function and the ReLU activation function, respectively, and W_U and W_D are parameters of convolution layers. The obtained channel-dimension information ω is used to adjust the weights of the input:

f̂_c = ω_c · f_c

where ω_c and f_c denote the scaling factor and the feature map channel, respectively.

Second, an attention map α is obtained by:

α = f( A( [x̂, x'] ) )

where x' is a version of the damaged image x at a different scale, A is a learnable transformation function composed of several convolutions, and x̂ denotes the generated content at the corresponding scale; x̂ and x' are first concatenated and then fed into the convolution layers, and f is a sigmoid function that turns the result into an attention map.

Finally, the final image completion result ŷ is obtained by combining x̂ and x' under the attention map α, where ⊙ and B denote the Hadamard product and the combining function, respectively.
Further, the loss function is divided into a structure loss function and a texture loss function:

L_struct^k = λ_rec L_rec^k + λ_per L_per^k

L_text^k = λ_style L_style^k + λ_tv L_tv^k + λ_adv L_adv

where k indicates that the loss is computed at the k-th layer of the decoder, L_struct denotes the structure loss function, L_text denotes the texture loss function, L_rec denotes the L1 norm between images, L_per denotes the perceptual loss function, L_style denotes the style loss function, L_tv denotes the total variation loss function, and L_adv denotes the adversarial loss function; λ_rec, λ_per, λ_style, λ_tv and λ_adv are weighting factors.
At least one embodiment of the present disclosure provides an attention mechanism-based image completion device, comprising:
a data processing module configured to preprocess database image data, synthesize damaged images with binary masks, and take each damaged image and its corresponding binary mask as the input of a network model;
a model generation module configured to obtain, through training, a generative adversarial network model capable of image completion; and
an image completion module configured to use the trained generative adversarial network model to perform completion processing on test data.
Further, the data processing module is configured such that the preprocessed database face images and natural images are of consistent size; in the image completion task, a damaged image and its corresponding binary mask are combined as the input, and the undamaged image is used as the real image label.
Further, the model generation module is configured to: initialize network weight parameters for the image completion task, wherein the loss function of the generator is L_total and the loss function of the discriminator is L_D; input the damaged image and the binary mask into the generator network G for the image completion task, input the generated completed image and the target image into the discriminator network D, and perform iterative training until both the generator loss L_total and the discriminator loss L_D decrease and become stable; and train the expression generation and removal tasks simultaneously until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
The beneficial effects of this disclosure are as follows:
(1) In order to improve the generation quality of images (including rich texture details and structural continuity) in the image completion task, an attention mechanism-based image completion method is provided. Through the partial convolution layer, the generative adversarial network can exploit the prior information of the binary mask, improving the quality of the generated image. The dual attention fusion module forms a multi-scale decoder that can progressively generate high-resolution images.
(2) The image completion method introduces a reconstruction loss function, a style loss function, a total variation loss function and an adversarial loss function as constraints at both the image level and the feature level, improving the robustness and accuracy of the network.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and, together with the description, serve to explain the disclosure without limiting it.
FIG. 1 is a flow chart of an attention-based image completion method provided by an embodiment of the present disclosure;
FIG. 2 is a flow diagram of a dual attention module provided by embodiments of the present disclosure;
FIG. 3 is a diagram illustrating the effect of image completion on a public data set provided by an embodiment of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
The embodiment of the disclosure provides an image completion method based on an attention mechanism, which comprises the following steps:
step S1, preprocessing the database image data, synthesizing the damaged image by using a binary mask, and taking the damaged image and the corresponding binary mask as the input of the network model;
specifically, a binary mask map is first generated offline using a binary mask algorithm. For the face image, normalizing the image according to the positions of the two eyes and cutting the image to be 256 × 256 with a uniform size; for natural images, the image size is first enlarged to 350 × 350, and then the enlarged image is randomly cropped to a uniform size of 256 × 256. And randomly selecting an off-line generated binary mask image, and multiplying the binary mask image by the damaged image to obtain the damaged image.
Further, in step S1 the preprocessed database face images and natural images have a consistent size; in the subsequent image completion task, the damaged image and its corresponding binary mask are combined as the input, and the undamaged image is used as the real image label.
Step S2: training the attention mechanism-based generative adversarial network model with the training input data so as to perform the image completion task.
It should be noted that, in this step, in order to enlarge the sample size of the input data and improve the generalization capability of the network, this embodiment may employ data augmentation operations, including random flipping, to increase the amount of training data.
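A small sketch of such an augmentation step, assuming arrays shaped H × W (× C) as produced above; the 50% flip probability is an illustrative choice, not a value given in the text:

```python
import numpy as np

def augment(damaged, mask, target, rng=None):
    """Random horizontal flip applied consistently to the damaged image,
    the binary mask and the ground-truth label."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        damaged, mask, target = damaged[:, ::-1], mask[:, ::-1], target[:, ::-1]
    return damaged.copy(), mask.copy(), target.copy()
```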
Specifically, the step S2 includes:
step S21: initializing network weight parameters in the image completion task, wherein the loss function of the generator is L_total and the loss function of the discriminator is L_D;
Step S22: inputting the damaged image and the binary mask into the generator network G for the image completion task, inputting the generated completed image and the target image into the discriminator network D, and performing iterative training until both the generator loss L_total and the discriminator loss L_D decrease and become stable;
step S23: training the expression generation and removal tasks simultaneously until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
Further, assuming the damaged image is denoted as x, the generated image as ŷ, the target image as y, and the binary mask as M, the output value of the partial convolution layer in the above attention mechanism depends only on the undamaged area, which is described mathematically as follows:

F' = W^T (F ⊙ M) · ( sum(1) / sum(M) ) + b,  if sum(M) > 0;   F' = 0,  otherwise

where ⊙ denotes pixel-level multiplication, 1 denotes a matrix whose elements are all 1 and whose shape is the same as M, W denotes the parameters of the convolution layer, F denotes the output feature map of the previous convolution layer, b denotes the bias of the convolution layer, and M denotes the corresponding binary mask; the factor sum(1)/sum(M) can be regarded as a scaling factor that adjusts the weight of the known region.

After each partial convolution, this embodiment also updates the binary mask M, which is described mathematically as follows:

m' = 1,  if sum(M) > 0;   m' = 0,  otherwise

that is, if the convolution layer can obtain an output from at least one valid input, the corresponding position in the binary mask is marked as 1.
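The following PyTorch sketch shows one way to realize such a partial convolution layer consistent with the two formulas above; the class name, the bias handling and the default hyper-parameters are assumptions for illustration rather than the patent's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    """Convolution whose output depends only on valid (undamaged) inputs:
    the feature map is masked before convolution, the result is rescaled by
    sum(1)/sum(M) per window, and the mask is updated so that a position
    becomes valid once it has seen at least one valid input."""

    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, bias=True)
        # Fixed all-ones kernel used only to count valid pixels per window.
        self.register_buffer("mask_kernel",
                             torch.ones(1, 1, kernel_size, kernel_size))
        self.stride, self.padding = stride, padding

    def forward(self, feat, mask):
        # feat: (N, C, H, W); mask: (N, 1, H, W) with 1 = known, 0 = missing.
        with torch.no_grad():
            valid = F.conv2d(mask, self.mask_kernel,
                             stride=self.stride, padding=self.padding)
        out = self.conv(feat * mask)                              # W^T (F ⊙ M) + b
        scale = self.mask_kernel.numel() / valid.clamp(min=1.0)   # sum(1) / sum(M)
        bias = self.conv.bias.view(1, -1, 1, 1)
        out = (out - bias) * scale + bias            # rescale the convolution term only
        new_mask = (valid > 0).float()               # mask update: m' = 1 if sum(M) > 0
        return out * new_mask, new_mask
```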
Further, in step S2, the dual attention fusion module in the attention mechanism fuses the known regions and the generated regions together, as follows.

The channel-level statistics are obtained first:

z_c = H_GP(f_c) = ( 1 / (H × W) ) Σ_{i=1..H} Σ_{j=1..W} f_c(i, j)

where z_c is the c-th element of z, H_GP denotes the global pooling layer, and f_c denotes the c-th channel of the feature map F.

Then, the dependency between channels is obtained:

ω = f( W_U δ( W_D z ) )

where f and δ denote the sigmoid function and the ReLU activation function, respectively, and W_U and W_D are parameters of convolution layers. The obtained channel-dimension information ω is used to adjust the weights of the input:

f̂_c = ω_c · f_c

where ω_c and f_c denote the scaling factor and the feature map channel, respectively.

Second, an attention map α is obtained by:

α = f( A( [x̂, x'] ) )

where x' is a version of the damaged image x at a different scale, A is a learnable transformation function composed of several convolutions, and x̂ denotes the generated content at the corresponding scale; x̂ and x' are first concatenated and then fed into the convolution layers, and f is a sigmoid function that turns the result into an attention map.

Finally, the final image completion result ŷ is obtained by combining x̂ and x' under the attention map α, where ⊙ and B denote the Hadamard product and the combining function, respectively.
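The sketch below illustrates the two attention branches in PyTorch. The channel-attention reduction ratio, the depth of the learnable transform A and, in particular, the convex combination standing in for the unspecified combining function B are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel branch: global pooling (H_GP), then W_D -> ReLU -> W_U -> sigmoid,
    then per-channel rescaling of the feature map (f_hat_c = omega_c * f_c)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # channel statistics z
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),  # W_D
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),  # W_U
            nn.Sigmoid())                                   # f

    def forward(self, feat):
        return feat * self.fc(self.pool(feat))              # omega ⊙ F

class DualAttentionFusion(nn.Module):
    """Spatial branch: alpha = sigmoid(A([x_hat, x_prime])); the generated content
    x_hat and the rescaled damaged input x_prime are then fused. The convex
    combination below stands in for the combining function B, which the text
    does not spell out."""

    def __init__(self, img_ch=3, hidden=32):
        super().__init__()
        self.A = nn.Sequential(                              # learnable transform A
            nn.Conv2d(2 * img_ch, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, img_ch, 3, padding=1))

    def forward(self, x_hat, x_prime):
        alpha = torch.sigmoid(self.A(torch.cat([x_hat, x_prime], dim=1)))
        return alpha * x_prime + (1.0 - alpha) * x_hat       # illustrative fusion
```

In this reading, the channel attention acts on intermediate decoder feature maps, while the fusion module acts at each output scale of the multi-scale decoder.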
Therefore, this embodiment provides a method of wider applicability for the image completion problem. Through the partial convolution layer, the method can use the prior information in the binary mask to complete the damaged image more accurately; in addition, the dual attention fusion module can gradually increase the resolution of the generated image, continually producing rich detail information.
Further, the objective function in the image completion task in this embodiment is divided into a structure loss function and a texture loss function, expressed as follows:

L_struct^k = λ_rec L_rec^k + λ_per L_per^k

L_text^k = λ_style L_style^k + λ_tv L_tv^k + λ_adv L_adv

where k indicates that the loss is computed at the k-th layer of the decoder, L_struct denotes the structure loss function, L_text denotes the texture loss function, L_rec denotes the L1 norm between images, L_per denotes the perceptual loss function, L_style denotes the style loss function, L_tv denotes the total variation loss function, and L_adv denotes the adversarial loss function; λ_rec, λ_per, λ_style, λ_tv and λ_adv are weighting factors.
The reconstruction loss function in the structure loss function is expressed as:

L_rec^k = ‖ ŷ^k − y^k ‖_1

where ‖·‖_1 denotes the L1 norm, ŷ^k denotes the completion result output at the k-th decoder layer, y^k denotes the target image scaled to the corresponding resolution, and cat denotes the concatenation (linking) operation.
The perceptual loss function in the texture loss function is expressed as:

L_per = Σ_i ( 1 / N_i ) ‖ φ_i(ŷ) − φ_i(y) ‖_1

where φ is the pre-trained VGG-16 network, φ_i is the output feature map of its i-th pooling layer, and N_i is the number of elements in φ_i. The pool-1, pool-2 and pool-3 layers of VGG-16 are used in the present disclosure.
The style loss function in the texture loss function is expressed as:

L_style = Σ_i ( 1 / (C_i × C_i) ) ‖ G( φ_i(ŷ) ) − G( φ_i(y) ) ‖_1

where G(·) denotes the Gram matrix of a feature map and C_i denotes the number of channels of the feature map output by the i-th layer of the pre-trained VGG-16 model.
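A sketch of the perceptual and style terms using the pool-1 to pool-3 features of a pre-trained VGG-16 as described above; the L1 distance on features, the Gram-matrix normalization and the torchvision weights API (which depends on the installed version) are assumptions that may differ from the patent's exact formulation:

```python
import torch
import torch.nn as nn
from torchvision import models

class VGGFeatures(nn.Module):
    """Feature maps after pool-1, pool-2 and pool-3 of a frozen VGG-16."""

    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        # Slices end at the first three MaxPool2d layers of torchvision's vgg16.
        self.blocks = nn.ModuleList([vgg[:5], vgg[5:10], vgg[10:17]])

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)
        return feats

def gram(phi):
    n, c, h, w = phi.shape
    f = phi.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # normalized Gram matrix

def perceptual_and_style_loss(generated, target, extractor):
    l_per, l_style = 0.0, 0.0
    for p_g, p_t in zip(extractor(generated), extractor(target)):
        l_per += nn.functional.l1_loss(p_g, p_t)                 # perceptual term
        l_style += nn.functional.l1_loss(gram(p_g), gram(p_t))   # style term
    return l_per, l_style
```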
The total variation loss function in the texture loss function is expressed as:

L_tv = Σ_{(i,j)∈Ω} ( ‖ ŷ^{i,j+1} − ŷ^{i,j} ‖_1 + ‖ ŷ^{i+1,j} − ŷ^{i,j} ‖_1 )

where Ω denotes the damaged region in the image. The total variation loss function is a smoothness penalty term defined on a one-pixel dilation of the missing region.
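A compact sketch of such a smoothness penalty restricted to the damaged region Ω; the one-pixel neighbourhood and the normalization are illustrative choices:

```python
import torch

def tv_loss(image, region_mask):
    """Total variation over the damaged region (region_mask: 1 inside Ω, 0 outside),
    using absolute differences between horizontally and vertically adjacent pixels."""
    m = region_mask[..., :-1, :-1]
    dh = (image[..., :-1, 1:] - image[..., :-1, :-1]).abs() * m
    dv = (image[..., 1:, :-1] - image[..., :-1, :-1]).abs() * m
    return (dh.sum() + dv.sum()) / m.sum().clamp(min=1.0)
```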
The adversarial loss function in the texture loss function is expressed as:

L_adv = E[ D(ŷ) ] − E[ D(y) ] + λ E[ ( ‖ ∇_ȳ D(ȳ) ‖_2 − 1 )^2 ]

where D denotes the discriminator and ȳ is obtained by random interpolation between samples of ŷ and y. In the present disclosure, λ is set to 10.
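The random interpolation between generated and real samples together with λ = 10 matches a WGAN-GP style objective; the sketch below implements that reading and should be taken as an interpretation, not the patent's exact loss:

```python
import torch

def gradient_penalty(discriminator, real, fake, lam=10.0):
    """Gradient penalty computed on random interpolations between real and generated images."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    mixed = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    grad, = torch.autograd.grad(discriminator(mixed).sum(), mixed, create_graph=True)
    return lam * ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

def discriminator_loss(discriminator, real, fake):
    fake = fake.detach()
    return (discriminator(fake).mean() - discriminator(real).mean()
            + gradient_penalty(discriminator, real, fake))

def generator_adv_loss(discriminator, fake):
    # Generator side of L_adv: raise the discriminator score of completed images.
    return -discriminator(fake).mean()
```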
The total loss function of this embodiment is defined as:

L_total = Σ_{k∈P} L_struct^k + Σ_{k∈Q} L_text^k

where P and Q are the sets of decoder layers selected for the structure loss and the texture loss, respectively.
The attention mechanism-based generative adversarial network mainly performs the image completion task, and its final goal is for the L_total loss function to be minimized and stabilized.
The attention mechanism-based generative adversarial network is trained as follows:
step S21: initializing the weight parameters of the network, wherein λ_rec, λ_per, λ_style, λ_tv and λ_adv are set to 6, 0.1, 240, 0.1 and 0.001 respectively, the batch size is 32, the learning rate is 10^-4, and P and Q are {1, 2, 3, 4, 5, 6} and {1, 2, 3}, respectively.
step S22: inputting the damaged image and the binary mask into the generator G for image completion; the generated completed image and the real target image are input into the discriminator D, and iterations are carried out in sequence until the total network loss function L_total decreases and becomes stable.
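A minimal training-loop sketch under these settings; the Adam optimizer, the generator call signature and the loss-function callables are assumptions not specified in the text:

```python
import torch

def train(generator, discriminator, loader, total_loss_fn, adv_loss_fn,
          epochs=100, lr=1e-4, device="cuda"):
    """Alternate discriminator and generator updates until L_total and L_D
    stop decreasing; the learning rate follows the value given above."""
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
    for _ in range(epochs):
        for damaged, mask, target in loader:       # batch size set by the DataLoader
            damaged, mask, target = (t.to(device) for t in (damaged, mask, target))
            completed = generator(damaged, mask)

            # Discriminator step: real target vs. generated completion.
            opt_d.zero_grad()
            adv_loss_fn(discriminator, target, completed).backward()
            opt_d.step()

            # Generator step: structure + texture losses (L_total).
            opt_g.zero_grad()
            total_loss_fn(completed, target, mask, discriminator).backward()
            opt_g.step()
```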
It should be noted that, in the embodiment of the present disclosure, an encoder is used to extract features from the input data, a decoder decodes the obtained latent code into an image, and the dual attention fusion module outputs the final completed image. In this example, the encoder and decoder each consist of 8 convolution layers; the filter sizes of the convolution layers in the encoder are 7, 5, 3, 3, 3, 3, 3 and 3, and the filters of the convolution layers in the decoder are all of size 3. Feature maps are upsampled using conventional methods. The number of convolution layers and the number and size of filters in each layer can be selected according to actual requirements. The discriminator adopts a convolutional neural network structure that takes the real image pair and the generated completed image pair as input, and its output uses a patch-based adversarial loss function to judge real from fake.
Exploiting the strong nonlinear fitting capability of the attention mechanism-based generative adversarial network, the embodiment of the present disclosure lets the partial convolution layers use the prior information in the binary mask for the image completion task. Secondly, the embodiment of the present disclosure provides a dual attention fusion module that forms a multi-scale decoder, which can gradually increase the texture detail in the generated image. In particular, under the constraint of the applied loss functions, the network is encouraged to produce high-quality images. Thus, a model capable of image completion can be trained with the network shown in FIG. 1. In the testing stage, the binary mask and the damaged image are likewise used as the input of the model to obtain the generated image completion result, as shown in FIG. 3.
Step S3: performing completion processing on the test data by using the trained attention mechanism-based generative adversarial network model.
To illustrate the specific implementation of the disclosed embodiment in detail and to verify the validity of the disclosed method, the method proposed in this embodiment is applied to four public databases (one face database and three natural image databases): CelebA-HQ, ImageNet, Places2 and Paris Street View. CelebA-HQ contains 30,000 high-quality face images. Places2 contains 365 scene categories with more than 8,000,000 images in total. Paris Street View contains 15,000 Paris street view images. ImageNet is a large-scale data set with more than 14 million images. For Places2, Paris Street View and ImageNet, the original validation and test sets are used in the present disclosure. For CelebA-HQ, 28,000 images are randomly selected for training and the remaining images are used for testing. 60,000 binary mask maps are generated offline with the binary mask algorithm; 55,000 of them are randomly selected for training and the remaining 5,000 are used for testing (the binary masks are used to generate the damaged images). Using the attention mechanism-based generative adversarial network and the objective function designed in the present disclosure, the damaged image and its corresponding binary mask are taken as input, and the deep neural network is trained through the adversarial game between the generator and the discriminator with gradient back-propagation. The weights of the different tasks are adjusted continuously during training until the network finally converges, yielding the image completion model.
In order to test the effectiveness of the model, the image completion operation is performed with the test set data, and the visualization results are shown in FIG. 3. This embodiment effectively demonstrates that the proposed method can generate high-quality images.
Finally, the above embodiments are only intended to illustrate, rather than limit, the technical solutions of the present disclosure. Although the present disclosure has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical solutions of the present disclosure without departing from the spirit and scope of those solutions, and all such modifications should be covered by the claims of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. An image completion method based on an attention mechanism is characterized by comprising the following processes:
step S1: preprocessing the database image data, synthesizing a damaged image by using a binary mask, and taking the damaged image and the corresponding binary mask as input data of a network model;
step S2: training on the input data to obtain a generative adversarial network model capable of image completion;
step S3: using the trained generative adversarial network model to perform completion processing on the test data.
2. The attention mechanism-based image completion method according to claim 1, wherein the database face images and natural images are of consistent size after the preprocessing in step S1; in the image completion task, a damaged image and its corresponding binary mask are combined as the input, and the undamaged image is used as the real image label.
3. The attention-based image completion method according to claim 1, wherein the step S2 includes:
step S21: initializing network weight parameters in the image completion task, wherein the loss function of the generator is L_total and the loss function of the discriminator is L_D;
Step S22: inputting the damaged image and the binary mask into the generator network G for the image completion task, inputting the generated completed image and the target image into the discriminator network D, and performing iterative training until both the generator loss L_total and the discriminator loss L_D decrease and become stable;
step S23: training the expression generation and removal tasks simultaneously until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
4. The attention mechanism-based image completion method according to claim 3, wherein the damaged image is denoted as x, the generated image as ŷ, the target image as y, and the binary mask as M.
5. The attention mechanism-based image completion method according to claim 3, wherein the output value of the partial convolution layer in the attention mechanism depends only on the undamaged area, which is described mathematically as follows:

F' = W^T (F ⊙ M) · ( sum(1) / sum(M) ) + b,  if sum(M) > 0;   F' = 0,  otherwise

where ⊙ denotes pixel-level multiplication, 1 denotes a matrix whose elements are all 1 and whose shape is the same as M, W denotes the parameters of the convolution layer, F denotes the output feature map of the previous convolution layer, b denotes the bias of the convolution layer, and M denotes the corresponding binary mask; the factor sum(1)/sum(M) can be regarded as a scaling factor that adjusts the weight of the known region;

after each partial convolution, the binary mask M also needs to be updated, which is described mathematically as follows:

m' = 1,  if sum(M) > 0;   m' = 0,  otherwise

that is, if the convolution layer can obtain an output from at least one valid input, the corresponding position in the binary mask is marked as 1.
6. The attention mechanism-based image completion method according to claim 3, wherein the dual attention fusion module in the attention mechanism fuses the known regions and the generated regions together, comprising: first, obtaining channel-level statistics:

z_c = H_GP(f_c) = ( 1 / (H × W) ) Σ_{i=1..H} Σ_{j=1..W} f_c(i, j)

where z_c is the c-th element of z, H_GP denotes the global pooling layer, and f_c denotes the c-th channel of the feature map F;

then, obtaining the dependency between channels:

ω = f( W_U δ( W_D z ) )

where f and δ denote the sigmoid function and the ReLU activation function, respectively, and W_U and W_D are parameters of convolution layers; the obtained channel-dimension information ω is used to adjust the weights of the input:

f̂_c = ω_c · f_c

where ω_c and f_c denote the scaling factor and the feature map channel, respectively;

second, obtaining an attention map α by:

α = f( A( [x̂, x'] ) )

where x' is a version of the damaged image x at a different scale, A is a learnable transformation function composed of several convolutions, and x̂ denotes the generated content at the corresponding scale; x̂ and x' are first concatenated and then fed into the convolution layers, and f is a sigmoid function that turns the result into an attention map;

finally, obtaining the final image completion result ŷ by combining x̂ and x' under the attention map α, where ⊙ and B denote the Hadamard product and the combining function, respectively.
7. The attention mechanism-based image completion method according to claim 3, wherein the loss function is divided into a structure loss function and a texture loss function:

L_struct^k = λ_rec L_rec^k + λ_per L_per^k

L_text^k = λ_style L_style^k + λ_tv L_tv^k + λ_adv L_adv

where k indicates that the loss is computed at the k-th layer of the decoder, L_struct denotes the structure loss function, L_text denotes the texture loss function, L_rec denotes the L1 norm between images, L_per denotes the perceptual loss function, L_style denotes the style loss function, L_tv denotes the total variation loss function, and L_adv denotes the adversarial loss function; λ_rec, λ_per, λ_style, λ_tv and λ_adv are weighting factors.
8. An attention mechanism-based image completion device, characterized by comprising:
a data processing module configured to preprocess database image data, synthesize damaged images with binary masks, and take each damaged image and its corresponding binary mask as the input of a network model;
a model generation module configured to obtain, through training, a generative adversarial network model capable of image completion; and
an image completion module configured to use the trained generative adversarial network model to perform completion processing on test data.
9. The attention mechanism-based image completion device according to claim 8, wherein the data processing module is configured such that the preprocessed database face images and natural images are of consistent size; in the image completion task, a damaged image and its corresponding binary mask are combined as the input, and the undamaged image is used as the real image label.
10. The attention mechanism-based image completion device according to claim 8, wherein the model generation module is configured to: initialize network weight parameters for the image completion task, wherein the loss function of the generator is L_total and the loss function of the discriminator is L_D; input the damaged image and the binary mask into the generator network G for the image completion task, input the generated completed image and the target image into the discriminator network D, and perform iterative training until both the generator loss L_total and the discriminator loss L_D decrease and become stable; and train the expression generation and removal tasks simultaneously until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
CN202011038187.6A 2020-09-28 2020-09-28 Attention mechanism-based image completion method and device Active CN112184582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011038187.6A CN112184582B (en) 2020-09-28 2020-09-28 Attention mechanism-based image completion method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011038187.6A CN112184582B (en) 2020-09-28 2020-09-28 Attention mechanism-based image completion method and device

Publications (2)

Publication Number Publication Date
CN112184582A true CN112184582A (en) 2021-01-05
CN112184582B CN112184582B (en) 2022-08-19

Family

ID=73944421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011038187.6A Active CN112184582B (en) 2020-09-28 2020-09-28 Attention mechanism-based image completion method and device

Country Status (1)

Country Link
CN (1) CN112184582B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884673A (en) * 2021-03-11 2021-06-01 西安建筑科技大学 Reconstruction method for missing information between coffin chamber mural blocks of improved loss function SinGAN
CN113129234A (en) * 2021-04-20 2021-07-16 河南科技学院 Incomplete image fine repairing method based on intra-field and extra-field feature fusion
CN113221757A (en) * 2021-05-14 2021-08-06 上海交通大学 Method, terminal and medium for improving accuracy rate of pedestrian attribute identification
CN113962893A (en) * 2021-10-27 2022-01-21 山西大学 Face image restoration method based on multi-scale local self-attention generation countermeasure network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
US20190236759A1 (en) * 2018-01-29 2019-08-01 National Tsing Hua University Method of image completion
CN110288537A (en) * 2019-05-20 2019-09-27 湖南大学 Facial image complementing method based on the depth production confrontation network from attention
CN110458939A (en) * 2019-07-24 2019-11-15 大连理工大学 The indoor scene modeling method generated based on visual angle
CN111127346A (en) * 2019-12-08 2020-05-08 复旦大学 Multi-level image restoration method based on partial-to-integral attention mechanism

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
US20190236759A1 (en) * 2018-01-29 2019-08-01 National Tsing Hua University Method of image completion
CN110288537A (en) * 2019-05-20 2019-09-27 湖南大学 Facial image complementing method based on the depth production confrontation network from attention
CN110458939A (en) * 2019-07-24 2019-11-15 大连理工大学 The indoor scene modeling method generated based on visual angle
CN111127346A (en) * 2019-12-08 2020-05-08 复旦大学 Multi-level image restoration method based on partial-to-integral attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAO TANG ET AL.: "Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
JIANLOU SI ET AL.: "Dual Attention Matching Network for Context-Aware Feature Sequence Based Person Re-identification", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884673A (en) * 2021-03-11 2021-06-01 西安建筑科技大学 Reconstruction method for missing information between coffin chamber mural blocks of improved loss function SinGAN
CN113129234A (en) * 2021-04-20 2021-07-16 河南科技学院 Incomplete image fine repairing method based on intra-field and extra-field feature fusion
CN113129234B (en) * 2021-04-20 2022-11-01 河南科技学院 Incomplete image fine restoration method based on intra-field and extra-field feature fusion
CN113221757A (en) * 2021-05-14 2021-08-06 上海交通大学 Method, terminal and medium for improving accuracy rate of pedestrian attribute identification
CN113221757B (en) * 2021-05-14 2022-09-02 上海交通大学 Method, terminal and medium for improving accuracy rate of pedestrian attribute identification
CN113962893A (en) * 2021-10-27 2022-01-21 山西大学 Face image restoration method based on multi-scale local self-attention generation countermeasure network

Also Published As

Publication number Publication date
CN112184582B (en) 2022-08-19

Similar Documents

Publication Publication Date Title
CN112184582B (en) Attention mechanism-based image completion method and device
CN112686817B (en) Image completion method based on uncertainty estimation
CN111160085A (en) Human body image key point posture estimation method
CN112686816A (en) Image completion method based on content attention mechanism and mask code prior
CN109903236B (en) Face image restoration method and device based on VAE-GAN and similar block search
CN111815523A (en) Image restoration method based on generation countermeasure network
CN110728219A (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN111242841A (en) Image background style migration method based on semantic segmentation and deep learning
CN113989129A (en) Image restoration method based on gating and context attention mechanism
CN112102303A (en) Semantic image analogy method for generating countermeasure network based on single image
Li et al. Learning efficient gans for image translation via differentiable masks and co-attention distillation
CN109447897B (en) Real scene image synthesis method and system
CN111986075A (en) Style migration method for target edge clarification
CN110097615B (en) Stylized and de-stylized artistic word editing method and system
CN112149563A (en) Method and system for estimating postures of key points of attention mechanism human body image
CN112801914A (en) Two-stage image restoration method based on texture structure perception
CN112949553A (en) Face image restoration method based on self-attention cascade generation countermeasure network
CN117788629B (en) Image generation method, device and storage medium with style personalization
Wang et al. Diverse image inpainting with normalizing flow
CN114092354A (en) Face image restoration method based on generation countermeasure network
CN113160081A (en) Depth face image restoration method based on perception deblurring
CN117611428A (en) Fashion character image style conversion method
Sanjay et al. Early Renaissance Art Generation Using Deep Convolutional Generative Adversarial Networks
CN113111906B (en) Method for generating confrontation network model based on condition of single pair image training
CN114140317A (en) Image animation method based on cascade generation confrontation network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant