CN114463209B - Image restoration method based on deep multi-feature collaborative learning - Google Patents

Image restoration method based on deep multi-feature collaborative learning

Info

Publication number
CN114463209B
Authority
CN
China
Prior art keywords
feature
image
texture
cte
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210089664.4A
Other languages
Chinese (zh)
Other versions
CN114463209A (en)
Inventor
王员根 (Wang Yuangen)
林嘉裕 (Lin Jiayu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202210089664.4A priority Critical patent/CN114463209B/en
Publication of CN114463209A publication Critical patent/CN114463209A/en
Application granted granted Critical
Publication of CN114463209B publication Critical patent/CN114463209B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/77: Retouching; Inpainting; Scratch removal
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of image processing, and in particular to an image restoration method based on deep multi-feature collaborative learning, comprising the following steps: S1, inputting an image to be restored into a preset image feature encoder, which extracts effective features from the image through deep neural network encoding to form an effective image feature set; S2, decoding and restoring the effective image feature set with a preset image decoder, and forming the restored image after a local discriminator and a global discriminator. The image feature encoder consists of six convolutional layers: three shallow convolutional layers reorganize texture features and three deep convolutional layers reorganize structural features, yielding a structural feature set and a texture feature set. The image decoder comprises a soft-gated dual feature fusion module for fusing the structural and texture features, and a bilateral propagation feature aggregation module for equalizing features between channel information, contextual attention and feature space. The technique effectively suppresses artifacts in the restored image, giving it detailed texture and a better overall appearance.

Description

Image restoration method based on deep multi-feature collaborative learning
Technical Field
The invention relates to the field of image processing, and in particular to an image restoration method based on deep multi-feature collaborative learning.
Background
With the advancement of information technology and the arrival of the digital age, digital images have become ubiquitous in daily life as carriers for recording and transferring visual data, and their volume has grown at a remarkable rate. However, digital images are often damaged during capture, storage, processing and transmission, or lose the integrity of the information they store due to occlusion. To recover the lost parts of damaged digital image information, current techniques restore it plausibly from the characteristics of the remaining image data; that is, the lost content is reconstructed as faithfully as possible from the image information that is not damaged or occluded. This is commonly called image restoration (inpainting).
Image restoration aims to reconstruct damaged regions or remove unwanted regions of an image while improving its visual quality. It is widely used in low-level vision tasks such as restoring damaged photographs or removing target regions. Conventional restoration methods are divided into diffusion-based methods and block-based methods.
For example, the mutual encoder-decoder restoration method based on feature equalization proposed by Liu Hongyu uses deep and shallow convolutional feature layers as the structure and texture of the image, respectively. Deep features are sent to the structure branch and shallow features to the texture branch. In each branch, holes are filled at multiple scales. Features from the two branches are then concatenated for channel equalization and feature equalization. Channel equalization adopts a squeeze-and-excitation network (SENet), and feature equalization uses a bilateral propagation activation function to re-balance channel attention and achieve spatial equalization. Finally, the output image is generated through skip connections.
Another technique is a two-stage image restoration algorithm based on a bidirectional cascade edge detection network (BDCN) and U-net incomplete-edge generation. In the first stage, image edge information is extracted by the BDCN network instead of the Canny operator to obtain the edges of the missing region; each network layer learns edge features at a specific scale, and multi-scale edge features are obtained by fusion. Edge features of the damaged image are then extracted with the contracting path of a U-net architecture, and the expanding path restores the image edge-texture information. In the second stage, dilated (hole) convolutions are used for downsampling and upsampling, and a residual network reconstructs a detail-rich version of the missing image.
The cascaded generative adversarial network inpainting algorithm proposed by He connects a coarse generation sub-network and a refinement generation sub-network in series. A parallel convolution module designed in the coarse generation network places three shallow convolution paths in parallel with one deep convolution path, which mitigates the vanishing-gradient problem when the convolution depth grows. A cascaded residual module in the deep convolution path cross-cascades two-layer convolutions over four channels to effectively strengthen feature reuse, and the convolution result is added element-wise to the module's input feature map for local residual learning, improving the expressive power of the network.
Existing diffusion-based methods propagate the appearance of neighboring content to fill the missing region and rely solely on a search mechanism over adjacent content, so they produce obvious artifacts when repairing large defects. Block-based methods fill the missing region by searching for the most similar block in the undamaged regions; this exploits long-range information but, lacking high-level structural understanding, struggles to generate semantically reasonable images. Although deep-learning-based methods can understand high-level semantics and generate plausible content, the actual restoration results of existing methods are still not natural and complete, owing to the lack of an effective multi-feature fusion technique.
Disclosure of Invention
To address the technical problems of artifacts and unnatural structure and texture in existing image restoration techniques, the invention provides an image restoration method based on deep multi-feature collaborative learning.
The image restoration method based on deep multi-feature collaborative learning comprises the following steps:
S1, inputting an image to be restored into a preset image feature encoder, which extracts effective features from the image through deep neural network encoding to form an effective image feature set;
S2, decoding and restoring the effective image feature set with a preset image decoder, and forming the restored image after a local discriminator and a global discriminator;
the image feature encoder consists of six convolutional layers, in which three shallow convolutional layers reorganize texture features representing image details and three deep convolutional layers reorganize structural features representing image semantics, yielding a structural feature set and a texture feature set;
the image decoder comprises a soft-gated dual feature fusion module for fusing the structural and texture features, and a bilateral propagation feature aggregation module for equalizing the features between channel information, contextual attention and feature space.
Preferably, the texture features and the structural features are each filled into the damaged area using three parallel streams with different kernel sizes; the three streams are combined to form an output feature map, which is then mapped back to the size of the input features.
Further, the outputs of the structure and texture branches satisfy:
L_rst = ||g(F_cst) - I_st||_1    (1-1)
L_rte = ||g(F_cte) - I_gt||_1    (1-2)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution with kernel size 1 that maps F_cst and F_cte to color images, and I_gt and I_st denote the real image and its structure image respectively; I_st is generated using an edge-preserving image smoothing method.
Preferably, the soft-gated dual feature fusion module comprises a structure-guided texture feature unit for executing the following algorithm:
G_te = σ(SE(h([F_cst, F_cte])))    (2-1)
F′_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte    (2-2)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, h(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te controls the degree of refinement of the texture information, F′_cte denotes the structure-aware texture feature, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
Preferably, the soft-gated dual feature fusion module comprises a texture-guided structural feature unit for executing the following algorithm:
G_st = σ(SE(k([F_cst, F_cte])))    (2-3)
F′_cst = γ(G_st ⊙ F_cte) ⊕ F_cst    (2-4)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, k(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_st controls the degree of refinement of the structural information, F′_cst denotes the texture-aware structural feature, γ is a learnable parameter, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
F_fu = v([F′_cst, F′_cte])    (2-5)
where F′_cte and F′_cst denote the structure-aware texture feature and the texture-aware structural feature respectively, v(·) is a convolution with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
Preferably, the bilateral propagation feature aggregation module comprises a channel information fusion unit, which captures channel information with a dynamic kernel selection network through adaptive kernel selection to obtain the feature map F′_fu.
Further, the bilateral propagation feature aggregation module comprises a contextual attention fusion unit for capturing the relations between input image blocks and computing their cosine similarity, executing the following algorithm:
s_{i,j} = ⟨ p_i / ||p_i||_2 , p_j / ||p_j||_2 ⟩
ŝ_{i,j} = exp(s_{i,j}) / Σ_{j=1..N} exp(s_{i,j})
p̂_i = Σ_{j=1..N} ŝ_{i,j} · p_j
where the feature F′_fu is divided into non-overlapping blocks of 3 × 3 pixels, s_{i,j} denotes the cosine similarity between feature blocks, ŝ_{i,j} denotes the attention score obtained with the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F′_fu, N is the total number of blocks of F′_fu, and the feature map reconstructed from the attention scores is obtained by recombining the blocks p̂_i.
Preferably, the bilateral propagation feature aggregation module comprises a spatial information fusion unit, which executes the following algorithm:
F^sp_i = (1 / C(x)) Σ_{j∈s} G(i, j) · f(x_i, x_j)
F^ra_i = (1 / C(x)) Σ_{j∈v} f(x_i, x_j)
where F^sp and F^ra denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature, x_j are the neighboring feature channels at positions j around channel i, G(i, j) is a Gaussian function that adjusts the spatial contributions of neighboring feature channels, C(x) is the number of positions in the input feature, and f(·) is a dot-product operation.
Further, the output feature channel is calculated as:
x̂_i = q([F^sp_i, F^ra_i])
where F^sp and F^ra denote the spatial and range similarity feature maps and q denotes a convolution layer with kernel size 1.
Further, the channel features are aggregated to obtain the reconstructed feature map F̂_fu; F′_fu and F̂_fu are then fused by concatenation and convolution to give F_sc:
F_sc = z([F′_fu, F̂_fu])
where F̂_fu is the recombined multi-channel feature, F′_fu is the feature obtained after re-weighing the channel information, F_sc is the final fused restoration feature, and z is a convolution with kernel size 1.
Preferably, the global and local discriminators each consist of five convolutional layers with kernel size 4 and stride 2, and all layers except the last use Leaky ReLU with a slope of 0.2.
Compared with the prior art, the image restoration method based on deep multi-feature collaborative learning has the following beneficial effects:
compared with the prior art, the method has the advantages that not only the relation between the image structure and the texture is considered, but also the relation between the image contexts is considered. The method adopts a single-stage network, and uses double branches to respectively learn the structure and the texture of the image, so that the generated structure and the texture are more consistent. And the image structure information is fully utilized, so that the generated image structure is more reasonable, and the visual image result is more real. Specifically, the consistency of the structure and the texture is enhanced through a soft gating dual feature fusion (SDFF) module, and the blurring and the artifacts around the hole area can be effectively reduced through a switching and recombination mode. The link from local features to overall consistency is enhanced through a Bilateral Propagation Feature Aggregation (BPFA) module, and the linkage between the contextual attention, the channel information and the feature space is considered, so that the repaired image has detailed textures and better image appearance.
Drawings
The present invention is further described with reference to the accompanying drawings; the embodiments shown in the drawings do not limit the invention in any way, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a block diagram of the multi-feature collaborative learning network provided by the invention;
FIG. 2 is a schematic diagram of a soft-gated dual feature fusion module;
FIG. 3 is a schematic diagram of a bilateral propagation feature aggregation module;
FIG. 4 is a comparison of the restoration results of the invention and existing deep-learning-based inpainting techniques on irregular holes;
FIG. 5 is a comparison of the restoration results of the invention and existing deep-learning-based inpainting techniques on central holes;
fig. 6 is a graph of the results of an image repair ablation experiment of the present invention.
Detailed Description
The image inpainting method based on deep multi-feature collaborative learning provided by the invention is further described below with reference to the accompanying drawings; note that the technical solution and design principle of the invention are explained in detail below using only a preferred technical solution.
The core of the image restoration method based on deep multi-feature collaborative learning is a multi-feature collaborative learning network for restoring damaged images. First, this patent proposes a soft-gated dual feature fusion (SDFF) module that enables coordinated information exchange between image structure and texture, strengthening the connection between them. Second, this patent uses a bilateral propagation feature aggregation (BPFA) module that further refines the generated structure and texture by enhancing the link from local features to global consistency through collaborative learning of contextual attention, channel information and feature space. In addition, the invention uses an end-to-end single-stage training scheme in which two branches learn the image structure and texture respectively within a single stage, which effectively reduces image artifacts and produces more realistic results.
Specifically, the overall backbone model of the image inpainting method based on deep multi-feature collaborative learning is shown in fig. 1 and comprises the following parts: (1) an encoder consisting of six convolutional layers, in which the three shallow features are reorganized into a texture feature representing image details while the three deep features are reorganized into a structural feature representing image semantics; (2) two branches that learn the structural and texture features respectively; (3) a soft-gated dual feature fusion module that fuses the structural and texture features generated by the two branches, see fig. 2; (4) a bilateral propagation feature aggregation module that equalizes the features between channel information, contextual attention and feature space, see fig. 3; specifically, a dynamic kernel selection network (SKNet) captures channel information through adaptive convolution-kernel selection, a contextual attention (CA) module captures the context relations within the image, and a bilateral propagation activation (BPA) module captures the relations between the spatial and range domains; (5) finally, the decoder is given guidance information through skip connections, synthesizing the structure and texture branches to produce more complete images; (6) local and global discriminators make the generated image more realistic. A schematic sketch of how these parts might be wired together follows below.
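The sketch below wires the six parts together in a single forward pass, assuming module interfaces like those sketched later in this description; the mask handling and the omitted skip connections are assumptions, not part of the patent text.

```python
# Schematic wiring of the single-stage pipeline of fig. 1 under assumed
# module interfaces; skip connections to the decoder are omitted.
import torch

def inpaint_forward(image, mask, encoder, texture_branch, structure_branch,
                    sdff, bpfa, decoder):
    # (1) encode the masked image into texture and structure features
    f_te, f_st = encoder(torch.cat([image * (1 - mask), mask], dim=1))
    # (2) fill the hole in each branch with multi-scale parallel streams
    f_cte = texture_branch(f_te)
    f_cst = structure_branch(f_st)
    # (3) soft-gated dual feature fusion
    f_fu = sdff(f_cst, f_cte)
    # (4) bilateral propagation feature aggregation
    f_sc = bpfa(f_fu)
    # (5) decode; encoder skip connections are omitted in this sketch
    return decoder(f_sc)
```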
Specifically, the image restoration method based on deep multi-feature collaborative learning comprises the following steps:
S1, inputting an image to be restored into a preset image feature encoder, which extracts effective features from the image through deep neural network encoding to form an effective image feature set;
S2, decoding and restoring the effective image feature set with a preset image decoder, and forming the restored image after a local discriminator and a global discriminator;
the image feature encoder consists of six convolutional layers, in which three shallow convolutional layers reorganize texture features representing image details and three deep convolutional layers reorganize structural features representing image semantics, yielding a structural feature set and a texture feature set;
the image decoder comprises a soft-gated dual feature fusion module for fusing the structural and texture features, and a bilateral propagation feature aggregation module for equalizing the features between channel information, contextual attention and feature space.
Preferably, the texture features and the structural features are each filled into the damaged area using three parallel streams with different kernel sizes; the three streams are combined to form an output feature map, which is then mapped back to the size of the input features.
Further, the outputs of the structure and texture branches satisfy:
L_rst = ||g(F_cst) - I_st||_1    (1-1)
L_rte = ||g(F_cte) - I_gt||_1    (1-2)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution with kernel size 1 that maps F_cst and F_cte to color images, and I_gt and I_st denote the real image and its structure image respectively; I_st is generated using an edge-preserving image smoothing method.
Preferably, the soft-gated dual feature fusion module comprises a structure-guided texture feature unit for executing the following algorithm:
G_te = σ(SE(h([F_cst, F_cte])))    (2-1)
F′_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte    (2-2)
where h(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te controls the degree of refinement of the texture information, F′_cte denotes the structure-aware texture feature, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
Preferably, the soft-gated dual feature fusion module comprises a texture-guided structural feature unit for executing the following algorithm:
G_st = σ(SE(k([F_cst, F_cte])))    (2-3)
F′_cst = γ(G_st ⊙ F_cte) ⊕ F_cst    (2-4)
where k(·) is a convolution with kernel size 3, G_st controls the degree of refinement of the structural information, F′_cst denotes the texture-aware structural feature, and γ is a learnable parameter;
F_fu = v([F′_cst, F′_cte])    (2-5)
where F′_cte and F′_cst denote the structure-aware texture feature and the texture-aware structural feature respectively, v(·) is a convolution with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
Preferably, the bilateral propagation feature aggregation module comprises a channel information fusion unit, which captures channel information with a dynamic kernel selection network through adaptive kernel selection to obtain the feature map F′_fu.
Further, the bilateral propagation feature aggregation module comprises a contextual attention fusion unit for capturing the relations between the input image blocks and computing their cosine similarity, executing the following algorithm:
s_{i,j} = ⟨ p_i / ||p_i||_2 , p_j / ||p_j||_2 ⟩
ŝ_{i,j} = exp(s_{i,j}) / Σ_{j=1..N} exp(s_{i,j})
p̂_i = Σ_{j=1..N} ŝ_{i,j} · p_j
where the feature F′_fu is divided into non-overlapping blocks of 3 × 3 pixels, s_{i,j} denotes the cosine similarity between feature blocks, ŝ_{i,j} denotes the attention score obtained with the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F′_fu, N is the total number of blocks of F′_fu, and the feature map reconstructed from the attention scores is obtained by recombining the blocks p̂_i.
Preferably, the bilateral propagation feature aggregation module comprises a spatial information fusion unit, which executes the following algorithm:
F^sp_i = (1 / C(x)) Σ_{j∈s} G(i, j) · f(x_i, x_j)
F^ra_i = (1 / C(x)) Σ_{j∈v} f(x_i, x_j)
where F^sp and F^ra denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature, x_j are the neighboring feature channels at positions j around channel i, G(i, j) is a Gaussian function that adjusts the spatial contributions of neighboring feature channels, C(x) is the number of positions in the input feature, and f(·) is a dot-product operation.
Further, the output feature channel is calculated as:
x̂_i = q([F^sp_i, F^ra_i])
where F^sp and F^ra denote the spatial and range similarity feature maps and q denotes a convolution layer with kernel size 1.
Further, the channel features are aggregated to obtain the reconstructed feature map F̂_fu; F′_fu and F̂_fu are then fused by concatenation and convolution to give F_sc:
F_sc = z([F′_fu, F̂_fu])
where F̂_fu is the recombined multi-channel feature, F′_fu is the feature obtained after re-weighing the channel information, F_sc is the final fused restoration feature, and z is a convolution with kernel size 1.
Preferably, the global and local discriminators each consist of five convolutional layers with kernel size 4 and stride 2, and all layers except the last use Leaky ReLU with a slope of 0.2.
The following points describe the core technical process in detail:
(1) Structure and texture branches
The texture feature reorganized from the shallow convolutions is denoted F_te, and the structural feature reorganized from the deep convolutions is denoted F_st. In each branch, three parallel streams with different kernel sizes are used to fill the damaged area at different scales. Finally, the output feature maps of the three streams are combined, and the combined features are mapped back to the size of the input features. Here, F_cst and F_cte denote the outputs of the structure and texture branches, respectively. To ensure that each branch focuses on structure and texture separately, two reconstruction losses are used, denoted L_rst and L_rte. The pixel-level losses are defined as:
L_rst = ||g(F_cst) - I_st||_1    (1-1)
L_rte = ||g(F_cte) - I_gt||_1    (1-2)
where g(·) is a convolution with kernel size 1 whose goal is to map F_cst and F_cte to color images. I_gt and I_st denote the real image and its structure image respectively; I_st is generated using an edge-preserving smoothing method.
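As an illustration of this branch design, the sketch below (in PyTorch) fills a feature map with three parallel convolution streams and applies the pixel-level L1 loss of Eqs. (1-1)/(1-2) after the 1×1 projection g(·). The kernel sizes (3/5/7) and the channel width are assumptions not fixed by the description above.

```python
# A hedged sketch of one filling branch and its reconstruction loss.
import torch
import torch.nn as nn

class MultiScaleFill(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.streams = nn.ModuleList(
            [nn.Conv2d(ch, ch, k, padding=k // 2) for k in (3, 5, 7)])
        self.merge = nn.Conv2d(3 * ch, ch, kernel_size=1)   # map back to input size
        self.to_rgb = nn.Conv2d(ch, 3, kernel_size=1)       # g(.) in Eqs. (1-1)/(1-2)

    def forward(self, f):
        # combine the three parallel streams into one output feature map
        return self.merge(torch.cat([s(f) for s in self.streams], dim=1))

    def reconstruction_loss(self, f_filled, target):
        # L_r = || g(F) - I ||_1, with I = I_st for the structure branch
        # and I = I_gt for the texture branch.
        return torch.mean(torch.abs(self.to_rgb(f_filled) - target))
```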
(2) Soft-gated dual feature fusion module
In this module, the structural feature F_cst and the texture feature F_cte generated by the two branches are combined more effectively. The two kinds of information are exchanged, and soft gating dynamically controls the mixing ratio so as to achieve a dynamic combination. Specifically, to construct the structure-guided texture feature, a soft gate G_te controls the refinement of the texture information.
It is defined as:
G_te = σ(SE(h([F_cst, F_cte])))    (2-1)
where h(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, and σ(·) is the Sigmoid activation function. Using the soft gate G_te, the structural information of F_cst can be dynamically incorporated into F_cte:
F′_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte    (2-2)
where α and β are learnable parameters, ⊙ denotes element-wise multiplication and ⊕ denotes element-wise addition.
Likewise, the texture-guided structural feature F′_cst is defined as:
G_st = σ(SE(k([F_cst, F_cte])))    (2-3)
F′_cst = γ(G_st ⊙ F_cte) ⊕ F_cst    (2-4)
where k performs the same operation as h, and γ is a learnable parameter.
Finally, F′_cte and F′_cst are combined, and the feature F_fu is generated with a convolution v of kernel size 1:
F_fu = v([F′_cst, F′_cte])    (2-5)
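A minimal sketch of the soft-gated dual feature fusion module following Eqs. (2-1) to (2-5) might look as follows. The concrete squeeze-and-excitation block and the channel width are assumptions; only the gating pattern mirrors the equations.

```python
# A minimal sketch of the SDFF module (Eqs. 2-1 to 2-5) under assumed widths.
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    def __init__(self, ch, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x)   # channel re-weighting, SE(.)

class SDFF(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.h = nn.Conv2d(2 * ch, ch, 3, padding=1)   # h(.) in Eq. (2-1)
        self.k = nn.Conv2d(2 * ch, ch, 3, padding=1)   # k(.) in Eq. (2-3)
        self.se_te = SqueezeExcite(ch)
        self.se_st = SqueezeExcite(ch)
        self.v = nn.Conv2d(2 * ch, ch, 1)              # v(.) in Eq. (2-5)
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, f_cst, f_cte):
        cat = torch.cat([f_cst, f_cte], dim=1)
        g_te = torch.sigmoid(self.se_te(self.h(cat)))                       # Eq. (2-1)
        f_cte2 = self.alpha * (self.beta * (g_te * f_cte) * f_cst) + f_cte  # Eq. (2-2)
        g_st = torch.sigmoid(self.se_st(self.k(cat)))                       # Eq. (2-3)
        f_cst2 = self.gamma * (g_st * f_cte) + f_cst                        # Eq. (2-4)
        return self.v(torch.cat([f_cst2, f_cte2], dim=1))                   # Eq. (2-5)
```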
(3) Bilateral propagation feature aggregation module
This module is proposed to re-weigh the channels and the spatial locations so that the image representation is more consistent. First, channel information is captured with a dynamic kernel selection network through adaptive kernel selection, yielding the feature map F′_fu; this strengthens the correlation between channels and maintains the consistency of the whole image. A contextual attention (CA) module is then introduced to capture the association between image blocks. Specifically, for a given input feature F′_fu, blocks of 3 × 3 pixels are extracted and their cosine similarity is computed:
s_{i,j} = ⟨ p_i / ||p_i||_2 , p_j / ||p_j||_2 ⟩
where p_i and p_j are the i-th and j-th blocks of the input feature.
The Softmax function gives the attention score between each pair of blocks:
ŝ_{i,j} = exp(s_{i,j}) / Σ_{j=1..N} exp(s_{i,j})
where N is the total number of blocks of the input feature F′_fu. Next, the feature map is reconstructed from the attention scores:
p̂_i = Σ_{j=1..N} ŝ_{i,j} · p_j
The reconstructed feature map is obtained by directly recombining the blocks p̂_i.
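The contextual attention step can be sketched as below: the feature map is split into non-overlapping 3×3 patches, pairwise cosine similarities are turned into attention scores with Softmax, and the patches are recombined from those scores. Extracting patches with unfold/fold, and requiring the spatial size to be divisible by the patch size, are implementation assumptions.

```python
# A hedged sketch of the contextual attention step on non-overlapping patches.
import torch
import torch.nn.functional as F

def contextual_attention(feat, patch=3):
    b, c, h, w = feat.shape                                      # assumes h, w divisible by patch
    patches = F.unfold(feat, kernel_size=patch, stride=patch)    # p_i, shape (B, C*p*p, N)
    p = F.normalize(patches, dim=1)                              # unit-norm patches
    sim = torch.bmm(p.transpose(1, 2), p)                        # s_{i,j}: cosine similarity
    attn = F.softmax(sim, dim=-1)                                # attention scores
    recon = torch.bmm(patches, attn.transpose(1, 2))             # p_hat_i = sum_j score * p_j
    return F.fold(recon, output_size=(h, w), kernel_size=patch, stride=patch)
```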
In the spatial and range domains, a bilateral propagation activation (BPA) module is introduced to generate response values based on range and spatial distance. The response values are calculated as follows:
F^sp_i = (1 / C(x)) Σ_{j∈s} G(i, j) · f(x_i, x_j)
F^ra_i = (1 / C(x)) Σ_{j∈v} f(x_i, x_j)
where x_i is the i-th feature channel of the input feature, x_j are the neighboring feature channels at positions j around channel i, G(i, j) is a Gaussian function that adjusts the spatial contributions of neighboring feature channels, C(x) is the number of positions in the input feature, and f(·) is a dot-product operation. In the spatial domain, j is explored within the neighborhood s for global propagation; in the experiments, s is set to the same size as the input feature. In the range domain, v is a neighborhood around position i, whose size is set to 3 × 3. The spatial and range similarity measurements therefore yield the feature maps F^sp and F^ra, respectively.
each feature channel can calculate:
Figure GDA0003888084410000128
where q represents a convolutional layer, the kernel size is 1.
Next, each channel is aggregated to obtain a reconstructed feature map
Figure GDA0003888084410000129
Finally, we concatenate then convolve F' fu And
Figure GDA00038880844100001210
to obtain F sc
Figure GDA00038880844100001211
Where z is a convolution operation with a convolution kernel size of 1.
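A rough sketch of the bilateral propagation and final fusion steps is given below. Each position accumulates dot-product affinities f(x_i, x_j) over a neighbourhood, weighted by a spatial Gaussian G(i, j) for the spatial-domain response and taken unweighted over a small 3×3 window for the range-domain response; the two maps are merged with a 1×1 convolution q(·) and fused with the channel-reweighted feature by z(·) to give F_sc. The window sizes, the Gaussian bandwidth and the exact normalization are assumptions, so this reflects only the overall flow described above.

```python
# A rough, assumption-laden sketch of the BPA responses and the F_sc fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_affinity_response(x, window, sigma=None):
    b, c, h, w = x.shape
    pad = window // 2
    neigh = F.unfold(x, kernel_size=window, padding=pad)          # (B, C*w*w, H*W)
    neigh = neigh.view(b, c, window * window, h * w)
    center = x.view(b, c, 1, h * w)
    aff = (neigh * center).sum(dim=1)                              # f(x_i, x_j), (B, w*w, H*W)
    if sigma is not None:                                          # spatial Gaussian G(i, j)
        idx = torch.arange(window, device=x.device) - pad
        d2 = (idx.view(-1, 1) ** 2 + idx.view(1, -1) ** 2).float()
        g = torch.exp(-d2 / (2 * sigma ** 2)).view(1, -1, 1)
        aff = aff * g
    resp = (neigh * aff.unsqueeze(1)).sum(dim=2) / aff.shape[1]    # 1/C(x)-style normalization
    return resp.view(b, c, h, w)

class BilateralFusion(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.q = nn.Conv2d(2 * ch, ch, kernel_size=1)  # q(.): merges the two response maps
        self.z = nn.Conv2d(2 * ch, ch, kernel_size=1)  # z(.): final fusion giving F_sc

    def forward(self, f_fu_prime, f_ca):
        f_sp = local_affinity_response(f_ca, window=7, sigma=2.0)  # spatial-domain response
        f_ra = local_affinity_response(f_ca, window=3)             # range-domain response
        f_hat = self.q(torch.cat([f_sp, f_ra], dim=1))             # aggregated channels
        return self.z(torch.cat([f_fu_prime, f_hat], dim=1))       # F_sc
```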
(4) Discriminators
The invention introduces global and local discriminators to keep the local and global image content consistent. Each discriminator consists of five convolutional layers with kernel size 4 and stride 2, and all layers except the last use Leaky ReLU with a slope of 0.2. In addition, spectral normalization is employed for stable training.
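A sketch of one such discriminator is shown below: five convolutions with kernel size 4 and stride 2, Leaky ReLU (slope 0.2) after all but the last layer, and spectral normalization for stable training. The channel widths and the single-channel output are assumptions.

```python
# A sketch of a global/local discriminator with spectral normalization.
import torch.nn as nn
from torch.nn.utils import spectral_norm

def make_discriminator(in_ch=3, base=64):
    widths = [base, base * 2, base * 4, base * 8, 1]   # assumed widths; 1-channel score map
    layers, prev = [], in_ch
    for i, w in enumerate(widths):
        layers.append(spectral_norm(
            nn.Conv2d(prev, w, kernel_size=4, stride=2, padding=1)))
        if i < len(widths) - 1:                         # no activation after the last layer
            layers.append(nn.LeakyReLU(0.2, inplace=True))
        prev = w
    return nn.Sequential(*layers)
```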
The above are only preferred embodiments of the invention; they should not be regarded as limiting the invention, and the scope of protection is defined by the claims. Those skilled in the art can make several modifications, substitutions and improvements to the steps without departing from the spirit and scope of the invention, and such modifications, substitutions and improvements also fall within the scope of the invention.

Claims (9)

1. An image restoration method based on deep multi-feature collaborative learning, comprising the following steps:
S1, inputting an image to be restored into a preset image feature encoder, which extracts effective features from the image through deep neural network encoding to form an effective image feature set;
S2, decoding and restoring the effective image feature set with a preset image decoder, and forming the restored image after a local discriminator and a global discriminator;
characterized in that the image feature encoder consists of six convolutional layers, in which three shallow convolutional layers reorganize texture features representing image details and three deep convolutional layers reorganize structural features representing image semantics, yielding a structural feature set and a texture feature set;
the image decoder comprises a soft-gated dual feature fusion module for fusing the structural and texture features and a bilateral propagation feature aggregation module for equalizing the features between channel information, contextual attention and feature space, the soft-gated dual feature fusion module comprising a structure-guided texture feature unit for executing the following algorithm:
G_te = σ(SE(h([F_cst, F_cte])))    (2-1)
F′_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte    (2-2)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, h(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te controls the degree of refinement of the texture information, F′_cte denotes the structure-aware texture feature, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
2. The image inpainting method of claim 1, characterized in that the texture features and the structural features are each first filled into the damaged area using three parallel streams with different kernel sizes, the three streams are combined to form an output feature map, and the output feature map is then mapped back to the size of the input features, the outputs of the structure and texture branches satisfying:
L_rst = ||g(F_cst) - I_st||_1    (1-1)
L_rte = ||g(F_cte) - I_gt||_1    (1-2)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution with kernel size 1 that maps F_cst and F_cte to color images, and I_gt and I_st denote the real image and its structure image respectively; I_st is generated using an edge-preserving image smoothing method.
3. The image inpainting method of claim 1, wherein the soft-gated dual feature fusion module comprises a texture-guided structural feature unit configured to execute the following algorithm:
G_st = σ(SE(k([F_cst, F_cte])))    (2-3)
F′_cst = γ(G_st ⊙ F_cte) ⊕ F_cst    (2-4)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, k(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_st controls the degree of refinement of the structural information, F′_cst denotes the texture-aware structural feature, γ is a learnable parameter, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition;
F_fu = v([F′_cst, F′_cte])    (2-5)
where F′_cte and F′_cst denote the structure-aware texture feature and the texture-aware structural feature respectively, v(·) is a convolution with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
4. The method of claim 1, wherein the bilateral propagation feature aggregation module comprises a channel information fusion unit for capturing channel information with a dynamic kernel selection network through adaptive kernel selection to obtain the feature map F′_fu.
5. The image inpainting method of claim 4, wherein the bilateral propagation feature aggregation module comprises a contextual attention fusion unit for capturing the relations between the input image blocks and computing their cosine similarity, executing the following algorithm:
s_{i,j} = ⟨ p_i / ||p_i||_2 , p_j / ||p_j||_2 ⟩
ŝ_{i,j} = exp(s_{i,j}) / Σ_{j=1..N} exp(s_{i,j})
p̂_i = Σ_{j=1..N} ŝ_{i,j} · p_j
where the feature F′_fu is divided into non-overlapping blocks, s_{i,j} denotes the cosine similarity between feature blocks, ŝ_{i,j} denotes the attention score obtained with the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F′_fu, N is the total number of blocks of F′_fu, and the feature map reconstructed from the attention scores is obtained by recombining the feature blocks p̂_i.
6. The image inpainting method of claim 1, wherein the bilateral propagation feature aggregation module comprises a spatial information fusion unit that executes the following algorithm:
F^sp_i = (1 / C(x)) Σ_{j∈s} G(i, j) · f(x_i, x_j)
F^ra_i = (1 / C(x)) Σ_{j∈v} f(x_i, x_j)
where F^sp and F^ra denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature, x_j are the neighboring feature channels at positions j around channel i, G(i, j) is a Gaussian function that adjusts the spatial contributions of neighboring feature channels, C(x) is the number of positions in the input feature, and f(·) is a dot-product operation; in the spatial domain, j is explored within the neighborhood s for global propagation, and in the range domain, v is the neighborhood of position i, whose size is set to 3 × 3.
7. The image inpainting method of claim 6, wherein the output feature channel is calculated as:
x̂_i = q([F^sp_i, F^ra_i])
where F^sp and F^ra denote the spatial and range similarity feature maps and q denotes a convolution layer with kernel size 1.
8. The image inpainting method of claim 7, wherein the channel features are aggregated to obtain a reconstructed feature map F̂_fu, and F′_fu and F̂_fu are then fused by concatenation and convolution to give F_sc:
F_sc = z([F′_fu, F̂_fu])
where F̂_fu is the recombined multi-channel feature, F′_fu is the feature obtained after re-weighing the channel information, F_sc is the final fused restoration feature, and z is a convolution with kernel size 1.
9. The image inpainting method of claim 1, wherein the global and local discriminators each consist of five convolutional layers with kernel size 4 and stride 2, all layers except the last use Leaky ReLU with a slope of 0.2, and spectral normalization is used for stable training.
CN202210089664.4A 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning Active CN114463209B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210089664.4A CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210089664.4A CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Publications (2)

Publication Number Publication Date
CN114463209A CN114463209A (en) 2022-05-10
CN114463209B true CN114463209B (en) 2022-12-16

Family

ID=81410572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210089664.4A Active CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Country Status (1)

Country Link
CN (1) CN114463209B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023225808A1 (en) * 2022-05-23 2023-11-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Learned image compression and decompression using long and short attention module
CN114897742B (en) * 2022-06-10 2023-05-23 重庆师范大学 Image restoration method with texture and structural features fused twice
CN115082743B (en) * 2022-08-16 2022-12-06 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method
CN115841625B (en) * 2023-02-23 2023-06-06 杭州电子科技大学 Remote sensing building image extraction method based on improved U-Net model
CN116681980B (en) * 2023-07-31 2023-10-20 北京建筑大学 Deep learning-based large-deletion-rate image restoration method, device and storage medium
CN117196981B (en) * 2023-09-08 2024-04-26 兰州交通大学 Bidirectional information flow method based on texture and structure reconciliation
CN117422911B (en) * 2023-10-20 2024-04-30 哈尔滨工业大学 Collaborative learning driven multi-category full-slice digital pathological image classification system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460746A (en) * 2018-04-10 2018-08-28 武汉大学 A kind of image repair method predicted based on structure and texture layer
CN112365422A (en) * 2020-11-17 2021-02-12 重庆邮电大学 Irregular missing image restoration method and system based on deep aggregation network
CN113298733A (en) * 2021-06-09 2021-08-24 华南理工大学 Implicit edge prior based scale progressive image completion method
WO2021232589A1 (en) * 2020-05-21 2021-11-25 平安国际智慧城市科技股份有限公司 Intention identification method, apparatus and device based on attention mechanism, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102640237B1 (en) * 2019-10-25 2024-02-27 삼성전자주식회사 Image processing methods, apparatus, electronic devices, and computer-readable storage media

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460746A (en) * 2018-04-10 2018-08-28 武汉大学 A kind of image repair method predicted based on structure and texture layer
WO2021232589A1 (en) * 2020-05-21 2021-11-25 平安国际智慧城市科技股份有限公司 Intention identification method, apparatus and device based on attention mechanism, and storage medium
CN112365422A (en) * 2020-11-17 2021-02-12 重庆邮电大学 Irregular missing image restoration method and system based on deep aggregation network
CN113298733A (en) * 2021-06-09 2021-08-24 华南理工大学 Implicit edge prior based scale progressive image completion method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Image Inpainting via Conditional Texture and Structure Dual Generation; Xiefan Guo et al.; 2021 IEEE/CVF International Conference on Computer Vision (ICCV); 2021-08-31; pp. 14114-14123 *
Selective Kernel Networks; Xiang Li et al.; 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020-01-09; pp. 510-519 *
Remote sensing scene classification based on bidirectional gated scale feature fusion (基于双向门控尺度特征融合的遥感场景分类); Song Zhongshan et al.; Journal of Computer Applications (计算机应用); 2021-02-22; pp. 1-12 *

Also Published As

Publication number Publication date
CN114463209A (en) 2022-05-10

Similar Documents

Publication Publication Date Title
CN114463209B (en) Image restoration method based on deep multi-feature collaborative learning
CN109447907B (en) Single image enhancement method based on full convolution neural network
CN111242238B (en) RGB-D image saliency target acquisition method
CN111754438A (en) Underwater image restoration model based on multi-branch gating fusion and restoration method thereof
CN110689495B (en) Image restoration method for deep learning
CN111787187B (en) Method, system and terminal for repairing video by utilizing deep convolutional neural network
CN110223251B (en) Convolution neural network underwater image restoration method suitable for artificial and natural light sources
CN113989129A (en) Image restoration method based on gating and context attention mechanism
CN112991231B (en) Single-image super-image and perception image enhancement joint task learning system
CN114897742B (en) Image restoration method with texture and structural features fused twice
CN110349087A (en) RGB-D image superior quality grid generation method based on adaptability convolution
CN115170915A (en) Infrared and visible light image fusion method based on end-to-end attention network
CN116958534A (en) Image processing method, training method of image processing model and related device
CN113554032A (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN116167920A (en) Image compression and reconstruction method based on super-resolution and priori knowledge
CN116485741A (en) No-reference image quality evaluation method, system, electronic equipment and storage medium
CN115829880A (en) Image restoration method based on context structure attention pyramid network
CN114677477A (en) Virtual viewpoint synthesis method, system, medium, device and terminal
CN116109510A (en) Face image restoration method based on structure and texture dual generation
CN117689592A (en) Underwater image enhancement method based on cascade self-adaptive network
CN117408924A (en) Low-light image enhancement method based on multiple semantic feature fusion network
CN117061760A (en) Video compression method and system based on attention mechanism
CN116523985A (en) Structure and texture feature guided double-encoder image restoration method
CN116703750A (en) Image defogging method and system based on edge attention and multi-order differential loss
JPS62131383A (en) Method and apparatus for evaluating movement of image train

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant