CN112184582A - Attention mechanism-based image completion method and device - Google Patents
- Publication number
- CN112184582A (application CN202011038187.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- loss function
- completion
- attention
- binary mask
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Abstract
The invention relates to an attention mechanism-based image completion method and device, belonging to the technical field of computer image processing. The method comprises the following steps: step S1, preprocessing database image data, synthesizing a damaged image with a binary mask, and taking the damaged image and the corresponding binary mask as the input of a network model; step S2, obtaining, through training, a generative adversarial network model capable of image completion; and step S3, using the trained generative adversarial network model to perform completion processing on test data. The invention provides an attention-based generative adversarial network model for the image completion problem. The binary mask serves as additional guiding information and is learned jointly with the input image, so that the completion result contains rich detail information while preserving structural continuity.
Description
Technical Field
The disclosure belongs to the technical field of computer image processing, and particularly relates to an image completion method and device based on an attention mechanism.
Background
The statements herein merely provide background related to the present disclosure and may not necessarily constitute prior art.
Image inpainting refers to generating substitute content for the missing regions of a damaged image so that the repaired image is visually realistic and semantically reasonable. Image completion can also serve other applications such as image editing: when distracting scene elements such as people or objects (which are often unavoidable) appear in an image, it allows a user to remove the unwanted elements and fill the blank areas with visually and semantically reasonable content.
The inventor finds that, with the continuous development of science and technology, user expectations in fields such as film and advertising animation production and online games keep rising, and realistic image restoration is important to a good user experience. Against this background, developing an attention-based image completion method that makes the repaired image visually realistic and semantically reasonable is of significant value.
Disclosure of Invention
Aiming at the technical problems in the prior art, the disclosure provides an attention mechanism-based image completion method and device.
At least one embodiment of the present disclosure provides an image completion method based on an attention mechanism, including the following steps:
step S1, preprocessing the database image data, synthesizing a damaged image by using a binary mask, and taking the damaged image and the corresponding binary mask as the input of a network model;
step S2: training on the input data to obtain a generative adversarial network model capable of performing image completion;
step S3: using the trained generative adversarial network model to perform completion processing on the test data.
Further, after the preprocessing in step S1, the database face images and natural images have a consistent size; in the image completion task, the damaged image and the corresponding binary mask are combined as input, and the undamaged image is used as the real image label.
Further, the process of training the generative adversarial network model in step S2 includes:
step S21: initializing the network weight parameters for the image completion task, wherein the loss function of the generator is L_total and the loss function of the discriminator is L_D;
step S22: inputting the damaged image and the binary mask into the generator network G for the image completion task, inputting the generated completed image and the target image into the discriminator network D, and iterating the training so that both the generator loss L_total and the discriminator loss L_D decrease and become stable;
step S23: continuing the training until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
Further, the damaged image is denoted as x, the generated image as x̂, the target image as y, and the binary mask as M.
Further, in the attention mechanism the output value of the partial convolution layer depends on the undamaged area. Consistent with the definitions above, the rule can be reconstructed mathematically as:

F' = Wᵀ(F ⊙ M) · sum(1)/sum(M) + b, if sum(M) > 0; F' = 0 otherwise,

wherein ⊙ denotes pixel-level (element-wise) multiplication, and 1 denotes a matrix whose elements are all 1 and whose shape is the same as M. W represents the parameters of the convolution layer, F the output feature map of the previous convolution layer, b the bias of the convolution layer, and M the corresponding binary mask; the term sum(1)/sum(M) can be regarded as a scaling factor that adjusts the weight of the known region.
After each partial convolution, the binary mask M also needs to be updated:

m' = 1, if sum(M) > 0; m' = 0 otherwise,

that is, if the convolution layer can produce an output from at least one valid input, the corresponding position in the binary mask is marked as 1.
Further, a dual attention fusion module in the attention mechanism fuses the known region and the generated region together, comprising the following steps. First, channel-level statistics are obtained by global pooling (equation reconstructed from the surrounding definitions):

z_c = H_GP(f_c) = 1/(H×W) Σᵢ Σⱼ f_c(i, j),

wherein z_c is the value of the c-th dimension of z, H_GP represents the global pooling layer, and f_c represents the c-th channel of the feature map F;
then, the dependency between channels is acquired:

ω = f(W_U δ(W_D z))

where f and δ denote the sigmoid and ReLU activation functions, respectively, and W_U and W_D are the parameters of the convolution layers. The obtained channel-wise weights ω are used to rescale the input:

f̃_c = ω_c · f_c,

wherein ω_c and f_c respectively represent a scaling factor and a feature-map channel;
second, an attention map α is obtained by:

α = f(A([F̃, x'])),

where x' is a rescaled version of the damaged image x at the current scale, A is a learnable transformation composed of several convolution operations, F̃ and x' are first concatenated and then fed into the convolution layers, and f is the sigmoid function that turns the response into an attention map. The known and generated contents are then fused, wherein ⊙ and B denote the Hadamard product and the combining function, respectively.
Further, the loss function is divided into a structural loss and a texture loss (the per-layer combination below is reconstructed from the listed terms and weights):

L_struct^k = λ_rec L_rec^k,
L_text^k = λ_per L_per^k + λ_style L_style^k + λ_tv L_tv^k + λ_adv L_adv^k,

where k denotes that the loss is computed at the k-th layer of the decoder, L_struct represents the structural loss function, L_text the texture loss function, L_rec the L1 norm between images, L_per the perceptual loss function, L_style the style loss function, L_tv the total variation loss function, and L_adv the adversarial loss function; λ_rec, λ_per, λ_style, λ_tv and λ_adv are weighting factors.
At least one embodiment of the present disclosure provides an attention mechanism-based image completion apparatus, the apparatus comprising:
a data processing module, configured to preprocess the database image data, synthesize damaged images with binary masks, and take the damaged images and the corresponding binary masks as the input of the network model;
a model generation module, configured to obtain, through training, a generative adversarial network model capable of image completion;
an image completion module, configured to use the trained generative adversarial network model to perform completion processing on the test data.
Further, after preprocessing by the data processing module, the database face images and natural images have a consistent size; in the image completion task, the damaged image and the corresponding binary mask are combined as input, and the undamaged image is used as the real image label.
Further, the model generation module is configured to: initialize the network weight parameters for the image completion task, wherein the loss function of the generator is L_total and the loss function of the discriminator is L_D; input the damaged image and the binary mask into the generator network G for the image completion task, input the generated completed image and the target image into the discriminator network D, and iterate the training so that both the generator loss L_total and the discriminator loss L_D decrease and become stable; and continue the training until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
The beneficial effects of this disclosure are as follows:
(1) In order to improve the generation quality of images in the image completion task (rich texture details and structural continuity), an attention-based image completion method is provided. Through the partial convolution layer, the generative adversarial network can exploit the prior information in the binary mask, improving the quality of the generated image. The dual attention modules form a multi-scale decoder that can progressively generate high-resolution images.
(2) The image completion method introduces a reconstruction loss function, a style loss function, a total variation loss function and an adversarial loss function as constraints at both the image level and the feature level, improving the robustness and accuracy of the network.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is a flow chart of an attention-based image completion method provided by an embodiment of the present disclosure;
FIG. 2 is a flow diagram of a dual attention module provided by embodiments of the present disclosure;
fig. 3 is a diagram illustrating the effect of image completion on a public data set provided by an embodiment of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
The embodiment of the disclosure provides an image completion method based on an attention mechanism, which comprises the following steps:
step S1, preprocessing the database image data, synthesizing the damaged image by using a binary mask, and taking the damaged image and the corresponding binary mask as the input of the network model;
specifically, a binary mask map is first generated offline using a binary mask algorithm. For the face image, normalizing the image according to the positions of the two eyes and cutting the image to be 256 × 256 with a uniform size; for natural images, the image size is first enlarged to 350 × 350, and then the enlarged image is randomly cropped to a uniform size of 256 × 256. And randomly selecting an off-line generated binary mask image, and multiplying the binary mask image by the damaged image to obtain the damaged image.
Further, in step S1, the size of the preprocessed database face image is consistent with that of the natural image, and meanwhile, in the next image completion task, the damaged image and the corresponding binary mask are combined as input, and the undamaged image is used as a real image label.
Step S2: training the attention-based generative adversarial network model with the training input data so as to complete the image completion task.
It should be noted that, in this step, in order to expand the sample size of the input data and improve the generalization capability of the network, this embodiment may employ data augmentation operations, such as random flipping, to increase the amount of training data.
Specifically, the step S2 includes:
step S21: initializing the network weight parameters for the image completion task, wherein the loss function of the generator is L_total and the loss function of the discriminator is L_D;
step S22: inputting the damaged image and the binary mask into the generator network G for the image completion task, inputting the generated completed image and the target image into the discriminator network D, and iterating the training so that both the generator loss L_total and the discriminator loss L_D decrease and become stable;
step S23: continuing the training until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
Further, assuming that the damaged image is denoted as x, the generated image as x̂, the target image as y, and the binary mask as M, the output value of the partial convolution layer in the above attention mechanism depends on the undamaged area. Consistent with these definitions, it can be reconstructed mathematically as:

F' = Wᵀ(F ⊙ M) · sum(1)/sum(M) + b, if sum(M) > 0; F' = 0 otherwise,

wherein ⊙ denotes pixel-level (element-wise) multiplication, and 1 denotes a matrix whose elements are all 1 and whose shape is the same as M. W represents the parameters of the convolution layer, F the output feature map of the previous convolution layer, b the bias of the convolution layer, and M the corresponding binary mask; the term sum(1)/sum(M) can be regarded as a scaling factor that adjusts the weight of the known region.
This embodiment also updates the binary mask M after each partial convolution:

m' = 1, if sum(M) > 0; m' = 0 otherwise,

that is, if the convolution layer can produce an output from at least one valid input, the corresponding position in the binary mask is marked as 1.
Further, in step S2, the dual attention fusion module in the attention mechanism fuses the known region and the generated region together, as follows.
First, channel-level statistics are obtained by global pooling (equation reconstructed from the definitions below):

z_c = H_GP(f_c) = 1/(H×W) Σᵢ Σⱼ f_c(i, j),

where z_c is the value of the c-th dimension of z, H_GP represents the global pooling layer, and f_c represents the c-th channel of the feature map F.
Then, the dependency between channels is obtained:

ω = f(W_U δ(W_D z))

where f and δ denote the sigmoid and ReLU activation functions, respectively, and W_U and W_D are the parameters of the convolution layers. The obtained channel-wise weights ω are used to rescale the input:

f̃_c = ω_c · f_c,

where ω_c and f_c respectively represent a scaling factor and a feature-map channel.
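The channel-attention step just described (global pooling, a two-layer bottleneck with ReLU then sigmoid, per-channel rescaling) can be sketched as follows. W_D, W_U and the reduction ratio r are hypothetical stand-ins for the learned convolution parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def channel_attention(F, W_D, W_U):
    """F: (C, H, W) feature map. Global average pool -> z (C,),
    then omega = sigmoid(W_U @ relu(W_D @ z)),
    then rescale each channel: f~_c = omega_c * f_c."""
    C = F.shape[0]
    z = F.reshape(C, -1).mean(axis=1)        # z_c = H_GP(f_c)
    omega = sigmoid(W_U @ relu(W_D @ z))     # inter-channel dependency
    return omega[:, None, None] * F, omega

rng = np.random.default_rng(0)
C, r = 8, 2                                  # r: hypothetical reduction ratio
F = rng.standard_normal((C, 4, 4))
W_D = rng.standard_normal((C // r, C))       # bottleneck ("squeeze") weights
W_U = rng.standard_normal((C, C // r))       # expansion ("excite") weights
F_scaled, omega = channel_attention(F, W_D, W_U)
```

Each ω_c lies in (0, 1) because of the sigmoid, so the module can only attenuate or pass through channels, matching its role as a scaling factor.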
Second, an attention map α is obtained by:

α = f(A([F̃, x'])),

where x' is a rescaled version of the damaged image x at the current scale, A is a learnable transformation composed of several convolution operations, F̃ and x' are first concatenated and then fed into the convolution layers, and f is the sigmoid function that turns the response into an attention map. The known and generated contents are then fused, wherein ⊙ and B denote the Hadamard product and the combining function, respectively.
Therefore, this embodiment provides a broadly applicable approach to the image completion problem. Through the partial convolution layer, the method can use the prior information in the binary mask to complete the damaged image more accurately; in addition, the dual attention fusion module gradually increases the resolution of the generated image, continuously generating rich detail information.
Further, the objective function of the image completion task in this embodiment is divided into a structural loss function and a texture loss function (the per-layer combination below is reconstructed from the listed terms and weights):

L_struct^k = λ_rec L_rec^k,
L_text^k = λ_per L_per^k + λ_style L_style^k + λ_tv L_tv^k + λ_adv L_adv^k,

where k denotes that the loss is computed at the k-th layer of the decoder, L_struct represents the structural loss function, L_text the texture loss function, L_rec the L1 norm between images, L_per the perceptual loss function, L_style the style loss function, L_tv the total variation loss function, and L_adv the adversarial loss function; λ_rec, λ_per, λ_style, λ_tv and λ_adv are weighting factors.
The reconstruction loss function in the structural loss is the L1 distance between the generated and target images (reconstructed from the definition above):

L_rec = ‖x̂ − y‖₁.
The perceptual loss function in the texture loss compares features of a pre-trained VGG-16 network φ (reconstructed form):

L_per = Σᵢ ‖φᵢ(x̂) − φᵢ(y)‖₁,

where φᵢ is the output feature map of the i-th pooling layer; the pool-1, pool-2 and pool-3 layers of VGG-16 are used in the present invention.
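As a small illustration of this perceptual term, with the VGG-16 feature maps replaced by arbitrary arrays (the mean-L1 aggregation per layer is an assumption, since the exact normalization is not recoverable from the text):

```python
import numpy as np

def perceptual_loss(feats_gen, feats_tgt):
    """Mean L1 distance between corresponding feature maps, summed over layers
    (pool-1, pool-2, pool-3 of VGG-16 in the text)."""
    return sum(np.abs(a - b).mean() for a, b in zip(feats_gen, feats_tgt))

rng = np.random.default_rng(0)
feats_y = [rng.standard_normal((4, 8, 8)) for _ in range(3)]  # stand-ins for phi_i(y)
feats_x = [f + 0.5 for f in feats_y]                          # stand-ins for phi_i(x_hat)
loss = perceptual_loss(feats_x, feats_y)
```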
The style loss function in the texture loss is computed on Gram matrices of the same VGG-16 feature maps (reconstructed form):

L_style = Σᵢ ‖Gᵢ(x̂) − Gᵢ(y)‖₁, with Gᵢ(·) built from φᵢ(·),

where Cᵢ is the number of channels of the feature map output by the i-th layer of the pre-trained VGG-16 model and enters the normalization of Gᵢ.
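The style term can be illustrated with Gram matrices of feature maps; the 1/(C·H·W) normalization used here is an assumption, as the exact normalization is not recoverable from the text:

```python
import numpy as np

def gram(F):
    """F: (C, H, W) feature map -> (C, C) Gram matrix, 1/(C*H*W) normalization assumed."""
    C, H, W = F.shape
    f = F.reshape(C, -1)
    return (f @ f.T) / (C * H * W)

def style_loss(feats_gen, feats_tgt):
    """L1 distance between Gram matrices, summed over the selected layers."""
    return sum(np.abs(gram(a) - gram(b)).sum() for a, b in zip(feats_gen, feats_tgt))

rng = np.random.default_rng(1)
feats = [rng.standard_normal((4, 8, 8)) for _ in range(3)]  # stand-ins for VGG features
```

Because the Gram matrix discards spatial layout and keeps only channel correlations, this term constrains texture statistics rather than exact pixel positions.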
The total variation loss function in the texture loss sums differences between adjacent pixels (reconstructed form):

L_tv = Σ_{(i,j)∈Ω} ( ‖x̂_{i,j+1} − x̂_{i,j}‖₁ + ‖x̂_{i+1,j} − x̂_{i,j}‖₁ ),

where Ω represents the damaged region of the image. The total variation loss is a smoothness penalty defined on the one-pixel dilation of the missing region.
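A minimal sketch of such a total-variation penalty restricted to the hole region (here the one-pixel dilation is approximated by counting any adjacent pixel pair that touches the hole; this approximation is an assumption):

```python
import numpy as np

def tv_loss(img, hole_mask):
    """L1 total variation restricted to pixel pairs touching the hole region Omega.
    img: (H, W) completed image; hole_mask: (H, W) with 1 inside the hole."""
    hole = hole_mask.astype(bool)
    # horizontal / vertical neighbor differences, kept only where a pair touches the hole
    dh = np.abs(img[:, 1:] - img[:, :-1]) * (hole[:, 1:] | hole[:, :-1])
    dv = np.abs(img[1:, :] - img[:-1, :]) * (hole[1:, :] | hole[:-1, :])
    return float(dh.sum() + dv.sum())

img = np.zeros((4, 4))
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0  # 2x2 hole
```

A constant completion incurs zero penalty; any isolated spike inside the hole is penalized once per adjacent pair, which encourages smooth fills.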
The adversarial loss function in the texture loss follows a gradient-penalty formulation, as suggested by the weight below (reconstructed form):

L_adv = E[D(x̂)] − E[D(y)] + λ E[(‖∇_{y'} D(y')‖₂ − 1)²],

where D denotes the discriminator and y' is sampled along the line between a generated sample x̂ and a real sample y. In the present invention, λ is set to 10.
The total loss function of this embodiment is defined as:

L_total = Σ_{k∈P} L_struct^k + Σ_{k∈Q} L_text^k,

where P and Q are the sets of selected decoder layers.
The attention-based generative adversarial network mainly completes the image completion task, and its final goal is to minimize and stabilize the L_total loss function.
The attention-based generative adversarial network is trained as follows:
step S21: initializing the weight parameters of the network, wherein λ_rec, λ_per, λ_style, λ_tv and λ_adv are set to 6, 0.1, 240, 0.1 and 0.001 respectively, the batch size is 32, the learning rate is 10⁻⁴, and P and Q are {1, 2, 3, 4, 5, 6} and {1, 2, 3};
step S22: inputting the damaged image and the binary mask into the generator G for image completion, inputting the generated completed image and the real target image into the discriminator D, and iterating so that the total network loss L_total decreases and becomes stable.
It should be noted that, in the embodiment of the present disclosure, an encoder is used to extract features from the input data, and a decoder decodes the obtained latent code into an image; the dual attention fusion module outputs the final completed image. In this example, the encoder and decoder each consist of 8 convolution layers. The convolution filter sizes in the encoder are 7, 5, 3, 3, 3, 3, 3 and 3, respectively; the convolution filters in the decoder are all of size 3. Feature maps are upsampled using conventional interpolation. The number of convolution layers and the number and size of filters in each layer can be set according to the actual situation. The discriminator adopts a convolutional neural network that takes the real image pair and the generated completed image pair as input, and its output uses a patch-based adversarial loss to judge real from fake.
The embodiment of the present disclosure exploits the strong nonlinear fitting capability of the attention-based generative adversarial network: the partial convolution layer uses the prior information in the binary mask for the image completion task. Second, the embodiment provides a dual attention fusion module, which forms a multi-scale decoder that gradually increases the texture detail of the generated image. In particular, under the constraint of the applied loss functions, the network produces high-quality images. Thus, a model capable of image completion can be trained with the network shown in Fig. 1. In the testing stage, the binary mask and the damaged image are likewise used as model input to obtain the image completion result, as shown in Fig. 3.
Step S3: performing completion processing on the test data with the trained attention-based generative adversarial network model.
To illustrate the specific implementation of the disclosed embodiment and to verify its validity, the proposed method is applied to four public databases (one face database and three natural-image databases): CelebA-HQ, ImageNet, Places2 and Paris StreetView. CelebA-HQ contains 30,000 high-quality face images. Places2 contains 365 scene categories with more than 8,000,000 images in total. Paris StreetView contains 15,000 Paris street-view images. ImageNet is a large data set with over 14,000,000 images. For Places2, Paris StreetView and ImageNet, the original validation and test sets are used in the present invention. For CelebA-HQ, 28,000 images are randomly selected for training and the remaining images are used for testing. 60,000 binary mask maps are generated offline with the binary mask algorithm; 55,000 are randomly selected for training and the remaining 5,000 for testing (the binary masks are used to synthesize damaged images). Using the attention-based generative adversarial network and the objective function designed in the present invention, the damaged image and the corresponding binary mask are taken as input, and the deep neural network is trained through the adversarial game between the generator and the discriminator with gradient back-propagation. The weights of the different loss terms are adjusted during training until the network finally converges, yielding the image completion model.
To test the effectiveness of the model, image completion is performed on the test-set data; the visualization results are shown in Fig. 3. This effectively demonstrates that the proposed method can generate high-quality images.
Finally, the above embodiments are only intended to illustrate, not limit, the technical solutions of the present disclosure. Although the present disclosure has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made without departing from the spirit and scope of the technical solutions, all of which should be covered by the claims of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.
Claims (10)
1. An image completion method based on an attention mechanism is characterized by comprising the following processes:
step S1: preprocessing the database image data, synthesizing a damaged image by using a binary mask, and taking the damaged image and the corresponding binary mask as input data of a network model;
step S2: training on the input data to obtain a generative adversarial network model capable of performing image completion;
step S3: using the trained generative adversarial network model to perform completion processing on the test data.
2. The attention-based image completion method according to claim 1, wherein, after the preprocessing in step S1, the database face images and natural images are of consistent size; in the image completion task, the damaged image and the corresponding binary mask are combined as input, and the undamaged image is used as the real image label.
3. The attention-based image completion method according to claim 1, wherein the step S2 includes:
step S21: initializing the network weight parameters for the image completion task, wherein the loss function of the generator is L_total and the loss function of the discriminator is L_D;
step S22: inputting the damaged image and the binary mask into the generator network G for the image completion task, inputting the generated completed image and the target image into the discriminator network D, and iterating the training so that both the generator loss L_total and the discriminator loss L_D decrease and become stable;
step S23: continuing the training until none of the loss functions decreases further, thereby obtaining the final generative adversarial network model.
5. The attention-based image completion method according to claim 3, wherein the output value of the partial convolution layer in the attention mechanism depends on the undamaged area, mathematically (reconstructed form):

F' = Wᵀ(F ⊙ M) · sum(1)/sum(M) + b, if sum(M) > 0; F' = 0 otherwise,

wherein ⊙ denotes pixel-level multiplication, 1 denotes a matrix whose elements are all 1 and whose shape is the same as M, W represents the parameters of the convolution layer, F the output feature map of the previous convolution layer, b the bias of the convolution layer, and M the corresponding binary mask; sum(1)/sum(M) can be regarded as a scaling factor that adjusts the weight of the known region.
After each partial convolution, the binary mask M also needs to be updated:

m' = 1, if sum(M) > 0; m' = 0 otherwise,

that is, if the convolution layer can produce an output from a valid input, the corresponding position in the binary mask is marked as 1.
6. The method of image completion based on attention mechanism as claimed in claim 3, wherein the dual attention fusion module in the attention mechanism fuses the known region and the generation together, comprising: firstly, channel-level statistical information is obtained:
wherein z isc(i, j) is the value of the c-th dimension of z. HGPRepresenting a global pooling layer. f. ofcRepresenting the c-th dimension in the feature map F;
then, acquiring the dependency relationship between the channels:
ω = f(W_U δ(W_D z))

where f and δ denote the sigmoid function and the ReLU activation function, respectively, and W_U and W_D are parameters of the convolutional layers. The obtained channel-dimension information ω is used to adjust the weight of the input:

f̂_c = ω_c · f_c
wherein ω_c and f_c denote the scaling factor and the feature map, respectively;
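The channel-attention step above (global pooling, squeeze, excite, re-weight) can be sketched in plain Python. The tiny dense weight matrices `W_D` and `W_U` below stand in for the patent's convolutional layers, whose shapes are not given; this is an illustration of the formulas, not the patented implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_maps, W_D, W_U):
    """feature_maps: list of C 2-D maps f_c. Returns (omega, re-weighted maps)."""
    C = len(feature_maps)
    # z_c = H_GP(f_c): global average pooling per channel
    z = [sum(sum(row) for row in f) / (len(f) * len(f[0]))
         for f in feature_maps]
    # squeeze: ReLU(W_D z)
    hidden = [max(0.0, sum(W_D[r][c] * z[c] for c in range(C)))
              for r in range(len(W_D))]
    # excite: omega = sigmoid(W_U hidden)
    omega = [sigmoid(sum(W_U[c][r] * hidden[r] for r in range(len(hidden))))
             for c in range(C)]
    # re-weight each channel: f_hat_c = omega_c * f_c
    scaled = [[[omega[c] * v for v in row] for row in feature_maps[c]]
              for c in range(C)]
    return omega, scaled


# Usage: two channels, hand-picked weights so the result is easy to check.
fm = [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]]
omega, scaled = channel_attention(fm, [[1.0, 1.0]], [[1.0], [0.0]])
```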
then, the second attention map α is obtained as follows:
where x' is a rescaled version of the damaged image x; A is a learnable transformation function composed of several convolutional layers; the inputs and x' are first concatenated and then fed into the convolutional layers; f is a sigmoid function that converts the result into the attention map α;
wherein ⊙ and B denote the Hadamard product and the combining function, respectively.
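The claim does not specify the form of the combining function B. One plausible choice, sketched here purely as an illustration, is a per-pixel convex combination in which the attention map α selects between generated and known content via the Hadamard product:

```python
def fuse(alpha, generated, known):
    """Per-pixel fusion: alpha ⊙ generated + (1 - alpha) ⊙ known.

    All arguments are 2-D lists of the same shape; alpha values lie in [0, 1].
    This convex combination is an assumed instance of the combining function B.
    """
    H, W = len(alpha), len(alpha[0])
    return [[alpha[i][j] * generated[i][j] + (1 - alpha[i][j]) * known[i][j]
             for j in range(W)] for i in range(H)]


# Usage: alpha = 1 keeps the generated pixel, alpha = 0 keeps the known pixel.
out = fuse([[1.0, 0.0]], [[5.0, 5.0]], [[9.0, 9.0]])
```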
7. The attention mechanism-based image completion method according to claim 3, wherein the loss function is divided into a structural loss function and a texture loss function:
where k denotes that the loss function is computed at the k-th layer of the decoder; L_struct denotes the structural loss function, L_text denotes the texture loss function, L_rec denotes the L1 norm between images, L_per denotes the perceptual loss function, L_style denotes the style loss function, L_tv denotes the total variation loss function, and L_adv denotes the adversarial loss function; λ_rec, λ_per, λ_style, λ_tv and λ_adv denote weighting factors.
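The weighted combination of these terms can be sketched as follows. The grouping of L_rec and L_per into the structural loss and of L_style, L_tv and L_adv into the texture loss is an assumption consistent with the symbol list above, and the default weights and loss values are placeholders, not values from the patent:

```python
def total_loss(L_rec, L_per, L_style, L_tv, L_adv,
               lam_rec=1.0, lam_per=0.1, lam_style=250.0,
               lam_tv=0.1, lam_adv=0.2):
    """Combine the individual loss terms with their weighting factors."""
    L_struct = lam_rec * L_rec + lam_per * L_per                 # structural terms
    L_text = lam_style * L_style + lam_tv * L_tv + lam_adv * L_adv  # texture terms
    return L_struct + L_text


# Usage: with explicit weights the result is just the weighted sum.
val = total_loss(1.0, 1.0, 0.0, 0.0, 0.0, lam_rec=2.0, lam_per=3.0)
```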
8. An attention mechanism-based image completion device, comprising:
a data processing module: configured to preprocess the database image data, synthesize damaged images with binary masks, and take the damaged images together with the corresponding binary masks as the input of the network model;
a model generation module: configured to obtain, through training, a generative adversarial network model capable of performing image completion;
an image completion module: configured to use the trained generative adversarial network model to perform completion processing on the test data.
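The data processing module's synthesis step (combining a clean image with a binary mask to obtain a damaged image) can be sketched as a pixel-wise product. Images are represented here as plain 2-D grayscale lists; the patent's actual tensor layout and channel count are not specified:

```python
def synthesize_damaged(image, mask):
    """Keep a pixel only where the binary mask is 1; zero out masked pixels."""
    return [[image[i][j] * mask[i][j] for j in range(len(image[0]))]
            for i in range(len(image))]


# Usage: the second pixel is masked out (mask value 0).
damaged = synthesize_damaged([[3, 4]], [[1, 0]])
```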
9. The attention mechanism-based image completion device according to claim 8, wherein the data processing module is configured to preprocess the database face images and natural images to the same size; in the image completion task, a damaged image combined with the corresponding binary mask serves as the input, and the undamaged image serves as the ground-truth label.
10. The attention mechanism-based image completion device according to claim 8, wherein the model generation module is configured to: initialize the network weight parameters in the image completion task, wherein the loss function of the generator is L_total and the loss function of the discriminator is L_D; input the damaged image and the binary mask map into the generator network G of the image completion task, input the generated completed image and the target image into the discriminator network D, and perform iterative training in turn, so that the loss function L_total of the generator and the loss function L_D of the discriminator both decrease and converge to stable values; and simultaneously train the expression generation and removal tasks until none of the loss functions decreases any further, thereby obtaining the final generative adversarial network model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011038187.6A CN112184582B (en) | 2020-09-28 | 2020-09-28 | Attention mechanism-based image completion method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112184582A true CN112184582A (en) | 2021-01-05 |
CN112184582B CN112184582B (en) | 2022-08-19 |
Family
ID=73944421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011038187.6A Active CN112184582B (en) | 2020-09-28 | 2020-09-28 | Attention mechanism-based image completion method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112184582B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180075581A1 (en) * | 2016-09-15 | 2018-03-15 | Twitter, Inc. | Super resolution using a generative adversarial network |
US20190236759A1 (en) * | 2018-01-29 | 2019-08-01 | National Tsing Hua University | Method of image completion |
CN110288537A (en) * | 2019-05-20 | 2019-09-27 | 湖南大学 | Facial image complementing method based on the depth production confrontation network from attention |
CN110458939A (en) * | 2019-07-24 | 2019-11-15 | 大连理工大学 | The indoor scene modeling method generated based on visual angle |
CN111127346A (en) * | 2019-12-08 | 2020-05-08 | 复旦大学 | Multi-level image restoration method based on partial-to-integral attention mechanism |
Non-Patent Citations (2)
Title |
---|
HAO TANG ET AL.: "Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
JIANLOU SI ET AL.: "Dual Attention Matching Network for Context-Aware Feature Sequence Based Person Re-identification", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112884673A (en) * | 2021-03-11 | 2021-06-01 | 西安建筑科技大学 | Reconstruction method for missing information between coffin chamber mural blocks of improved loss function SinGAN |
CN113129234A (en) * | 2021-04-20 | 2021-07-16 | 河南科技学院 | Incomplete image fine repairing method based on intra-field and extra-field feature fusion |
CN113129234B (en) * | 2021-04-20 | 2022-11-01 | 河南科技学院 | Incomplete image fine restoration method based on intra-field and extra-field feature fusion |
CN113221757A (en) * | 2021-05-14 | 2021-08-06 | 上海交通大学 | Method, terminal and medium for improving accuracy rate of pedestrian attribute identification |
CN113221757B (en) * | 2021-05-14 | 2022-09-02 | 上海交通大学 | Method, terminal and medium for improving accuracy rate of pedestrian attribute identification |
CN113962893A (en) * | 2021-10-27 | 2022-01-21 | 山西大学 | Face image restoration method based on multi-scale local self-attention generation countermeasure network |
Also Published As
Publication number | Publication date |
---|---|
CN112184582B (en) | 2022-08-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112184582B (en) | Attention mechanism-based image completion method and device | |
CN112686817B (en) | Image completion method based on uncertainty estimation | |
CN111160085A (en) | Human body image key point posture estimation method | |
CN112686816A (en) | Image completion method based on content attention mechanism and mask code prior | |
CN109903236B (en) | Face image restoration method and device based on VAE-GAN and similar block search | |
CN111815523A (en) | Image restoration method based on generation countermeasure network | |
CN110728219A (en) | 3D face generation method based on multi-column multi-scale graph convolution neural network | |
CN111242841A (en) | Image background style migration method based on semantic segmentation and deep learning | |
CN113989129A (en) | Image restoration method based on gating and context attention mechanism | |
CN112102303A (en) | Semantic image analogy method for generating countermeasure network based on single image | |
Li et al. | Learning efficient gans for image translation via differentiable masks and co-attention distillation | |
CN109447897B (en) | Real scene image synthesis method and system | |
CN111986075A (en) | Style migration method for target edge clarification | |
CN110097615B (en) | Stylized and de-stylized artistic word editing method and system | |
CN112149563A (en) | Method and system for estimating postures of key points of attention mechanism human body image | |
CN112801914A (en) | Two-stage image restoration method based on texture structure perception | |
CN112949553A (en) | Face image restoration method based on self-attention cascade generation countermeasure network | |
CN117788629B (en) | Image generation method, device and storage medium with style personalization | |
Wang et al. | Diverse image inpainting with normalizing flow | |
CN114092354A (en) | Face image restoration method based on generation countermeasure network | |
CN113160081A (en) | Depth face image restoration method based on perception deblurring | |
CN117611428A (en) | Fashion character image style conversion method | |
Sanjay et al. | Early Renaissance Art Generation Using Deep Convolutional Generative Adversarial Networks | |
CN113111906B (en) | Method for generating confrontation network model based on condition of single pair image training | |
CN114140317A (en) | Image animation method based on cascade generation confrontation network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||