CN114463209B - Image restoration method based on deep multi-feature collaborative learning - Google Patents

Image restoration method based on deep multi-feature collaborative learning

Info

Publication number
CN114463209B
Authority
CN
China
Prior art keywords
feature
image
texture
cte
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210089664.4A
Other languages
Chinese (zh)
Other versions
CN114463209A (en)
Inventor
王员根 (Wang Yuangen)
林嘉裕 (Lin Jiayu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202210089664.4A priority Critical patent/CN114463209B/en
Publication of CN114463209A publication Critical patent/CN114463209A/en
Application granted granted Critical
Publication of CN114463209B publication Critical patent/CN114463209B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/77: Retouching; Inpainting; Scratch removal
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of image processing, and in particular to an image restoration method based on deep multi-feature collaborative learning, comprising the following steps: S1, inputting an image to be restored into a preset image feature encoder, which extracts effective features from the image through deep neural network encoding to form an effective image feature set; S2, decoding and restoring the effective image feature set with a preset image decoder, and forming the restored image after a local discriminator and a global discriminator. The image feature encoder consists of six convolutional layers: three shallow convolutional layers reorganize texture features and three deep convolutional layers reorganize structural features, yielding a structural feature set and a texture feature set. The image decoder comprises a soft-gated dual feature fusion module for fusing the structural and texture features, and a bilateral propagation feature aggregation module for equalizing features between channel information, contextual attention and feature space. The technique effectively suppresses artifacts in the restored image, giving it detailed texture and a better overall appearance.

Description

Image restoration method based on deep multi-feature collaborative learning
Technical Field
The invention relates to the field of image processing, and in particular to an image restoration method based on deep multi-feature collaborative learning.
Background
With the advancement of information technology and the arrival of the digital age, digital images have become ubiquitous in daily life as carriers for recording and transferring visual data, and their volume has grown at a remarkable rate. However, digital images are often damaged during capture, storage, processing and transmission, or lose the integrity of the information they store due to occlusion. To recover the lost parts of damaged digital image information, current techniques restore it plausibly from the characteristics of the remaining image data; that is, the lost content is reconstructed as faithfully as possible from the image information that is not damaged or occluded. This is commonly called image restoration (inpainting).
Image restoration aims to reconstruct damaged regions or remove unwanted regions of an image while improving its visual quality. It is widely used in low-level vision tasks such as restoring damaged photographs or removing target regions. Conventional restoration methods are divided into diffusion-based methods and block-based methods.
For example, the mutual encoder-decoder restoration method based on feature equalization proposed by Liu Hongyu uses deep and shallow convolutional feature layers as the structure and texture of the image, respectively. Deep features are sent to the structure branch and shallow features to the texture branch. In each branch, holes are filled at multiple scales. Features from the two branches are then concatenated for channel equalization and feature equalization. Channel equalization adopts a squeeze-and-excitation network (SENet), and feature equalization uses a bilateral propagation activation function to re-balance channel attention and achieve spatial equalization. Finally, the output image is generated through skip connections.
Another technique is a two-stage image restoration algorithm based on a bidirectional cascade edge detection network (BDCN) and U-net incomplete-edge generation. In the first stage, image edge information is extracted by the BDCN network instead of the Canny operator to obtain the edges of the missing region; each network layer learns edge features at a specific scale, and multi-scale edge features are obtained by fusion. Edge features of the damaged image are then extracted with the contracting path of a U-net architecture, and the expanding path restores the image edge-texture information. In the second stage, dilated (hole) convolutions are used for downsampling and upsampling, and a residual network reconstructs a detail-rich version of the missing image.
The cascaded generative adversarial network inpainting algorithm proposed by He connects a coarse generation sub-network and a refinement generation sub-network in series. A parallel convolution module designed in the coarse generation network places three shallow convolution paths in parallel with one deep convolution path, which mitigates the vanishing-gradient problem when the convolution depth grows. A cascaded residual module in the deep convolution path cross-cascades two-layer convolutions over four channels to effectively strengthen feature reuse, and the convolution result is added element-wise to the module's input feature map for local residual learning, improving the expressive power of the network.
Existing diffusion-based methods propagate the appearance of neighboring content to fill the missing region and rely solely on a search mechanism over adjacent content, so they produce obvious artifacts when repairing large defects. Block-based methods fill the missing region by searching for the most similar block in the undamaged regions; this exploits long-range information but, lacking high-level structural understanding, struggles to generate semantically reasonable images. Although deep-learning-based methods can understand high-level semantics and generate plausible content, the actual restoration results of existing methods are still not natural and complete, owing to the lack of an effective multi-feature fusion technique.
Disclosure of Invention
To address the technical problems of artifacts and unnatural structure and texture in existing image restoration techniques, the invention provides an image restoration method based on deep multi-feature collaborative learning.
The image restoration method based on deep multi-feature collaborative learning comprises the following steps:
S1, inputting an image to be restored into a preset image feature encoder, which extracts effective features from the image through deep neural network encoding to form an effective image feature set;
S2, decoding and restoring the effective image feature set with a preset image decoder, and forming the restored image after a local discriminator and a global discriminator;
the image feature encoder consists of six convolutional layers, in which three shallow convolutional layers reorganize texture features representing image details and three deep convolutional layers reorganize structural features representing image semantics, yielding a structural feature set and a texture feature set;
the image decoder comprises a soft-gated dual feature fusion module for fusing the structural and texture features, and a bilateral propagation feature aggregation module for equalizing the features between channel information, contextual attention and feature space.
Preferably, the texture features and the structural features are each filled into the damaged area using three parallel streams with different kernel sizes; the three streams are combined to form an output feature map, which is then mapped back to the size of the input features.
Further, the outputs of the structure and texture branches satisfy:
L_rst = ||g(F_cst) - I_st||_1    (1-1)
L_rte = ||g(F_cte) - I_gt||_1    (1-2)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution with kernel size 1 that maps F_cst and F_cte to color images, and I_gt and I_st denote the real image and its structure image respectively; I_st is generated using an edge-preserving image smoothing method.
Preferably, the soft-gated dual feature fusion module comprises a structure-guided texture feature unit for executing the following algorithm:
G_te = σ(SE(h([F_cst, F_cte])))    (2-1)
F′_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte    (2-2)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, h(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te controls the degree of refinement of the texture information, F′_cte denotes the structure-aware texture feature, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
Preferably, the soft-gated dual feature fusion module comprises a texture-guided structural feature unit for executing the following algorithm:
G_st = σ(SE(k([F_cst, F_cte])))    (2-3)
F′_cst = γ(G_st ⊙ F_cte) ⊕ F_cst    (2-4)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, k(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_st controls the degree of refinement of the structural information, F′_cst denotes the texture-aware structural feature, γ is a learnable parameter, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
F_fu = v([F′_cst, F′_cte])    (2-5)
where F′_cte and F′_cst denote the structure-aware texture feature and the texture-aware structural feature respectively, v(·) is a convolution with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
Preferably, the bilateral propagation feature aggregation module comprises a channel information fusion unit, which captures channel information with a dynamic kernel selection network through adaptive kernel selection to obtain the feature map F′_fu.
Further, the bilateral propagation feature aggregation module comprises a contextual attention fusion unit for capturing the relations between input image blocks and computing their cosine similarity, executing the following algorithm:
s_{i,j} = ⟨ p_i / ||p_i||_2 , p_j / ||p_j||_2 ⟩
ŝ_{i,j} = exp(s_{i,j}) / Σ_{j=1..N} exp(s_{i,j})
p̂_i = Σ_{j=1..N} ŝ_{i,j} · p_j
where the feature F′_fu is divided into non-overlapping blocks of 3 × 3 pixels, s_{i,j} denotes the cosine similarity between feature blocks, ŝ_{i,j} denotes the attention score obtained with the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F′_fu, N is the total number of blocks of F′_fu, and the feature map reconstructed from the attention scores is obtained by recombining the blocks p̂_i.
Preferably, the bilateral propagation feature aggregation module comprises a spatial information fusion unit, which executes the following algorithm:
F^sp_i = (1 / C(x)) Σ_{j∈s} G(i, j) · f(x_i, x_j)
F^ra_i = (1 / C(x)) Σ_{j∈v} f(x_i, x_j)
where F^sp and F^ra denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature, x_j are the neighboring feature channels at positions j around channel i, G(i, j) is a Gaussian function that adjusts the spatial contributions of neighboring feature channels, C(x) is the number of positions in the input feature, and f(·) is a dot-product operation.
Further, the output feature channel is calculated as:
x̂_i = q([F^sp_i, F^ra_i])
where F^sp and F^ra denote the spatial and range similarity feature maps and q denotes a convolution layer with kernel size 1.
Further, the channel features are aggregated to obtain the reconstructed feature map F̂_fu; F′_fu and F̂_fu are then fused by concatenation and convolution to give F_sc:
F_sc = z([F′_fu, F̂_fu])
where F̂_fu is the recombined multi-channel feature, F′_fu is the feature obtained after re-weighing the channel information, F_sc is the final fused restoration feature, and z is a convolution with kernel size 1.
Preferably, the global and local discriminators each consist of five convolutional layers with kernel size 4 and stride 2, and all layers except the last use Leaky ReLU with a slope of 0.2.
Compared with the prior art, the image restoration method based on deep multi-feature collaborative learning has the following beneficial effects:
compared with the prior art, the method has the advantages that not only the relation between the image structure and the texture is considered, but also the relation between the image contexts is considered. The method adopts a single-stage network, and uses double branches to respectively learn the structure and the texture of the image, so that the generated structure and the texture are more consistent. And the image structure information is fully utilized, so that the generated image structure is more reasonable, and the visual image result is more real. Specifically, the consistency of the structure and the texture is enhanced through a soft gating dual feature fusion (SDFF) module, and the blurring and the artifacts around the hole area can be effectively reduced through a switching and recombination mode. The link from local features to overall consistency is enhanced through a Bilateral Propagation Feature Aggregation (BPFA) module, and the linkage between the contextual attention, the channel information and the feature space is considered, so that the repaired image has detailed textures and better image appearance.
Drawings
The present invention is further described with reference to the accompanying drawings; the embodiments shown in the drawings do not limit the invention in any way, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a block diagram of the multi-feature collaborative learning network provided by the invention;
FIG. 2 is a schematic diagram of a soft-gated dual feature fusion module;
FIG. 3 is a schematic diagram of a bilateral propagation feature aggregation module;
FIG. 4 is a comparison of the restoration results of the invention and existing deep-learning-based inpainting techniques on irregular holes;
FIG. 5 is a comparison of the restoration results of the invention and existing deep-learning-based inpainting techniques on central holes;
fig. 6 is a graph of the results of an image repair ablation experiment of the present invention.
Detailed Description
The image inpainting method based on deep multi-feature collaborative learning provided by the invention is further described below with reference to the accompanying drawings; note that the technical solution and design principle of the invention are explained in detail below using only a preferred technical solution.
The core of the image restoration method based on deep multi-feature collaborative learning is a multi-feature collaborative learning network for restoring damaged images. First, this patent proposes a soft-gated dual feature fusion (SDFF) module that enables coordinated information exchange between image structure and texture, strengthening the connection between them. Second, this patent uses a bilateral propagation feature aggregation (BPFA) module that further refines the generated structure and texture by enhancing the link from local features to global consistency through collaborative learning of contextual attention, channel information and feature space. In addition, the invention uses an end-to-end single-stage training scheme in which two branches learn the image structure and texture respectively within a single stage, which effectively reduces image artifacts and produces more realistic results.
Specifically, the overall backbone model of the image inpainting method based on deep multi-feature collaborative learning is shown in fig. 1 and comprises the following parts: (1) an encoder consisting of six convolutional layers, in which the three shallow features are reorganized into a texture feature representing image details while the three deep features are reorganized into a structural feature representing image semantics; (2) two branches that learn the structural and texture features respectively; (3) a soft-gated dual feature fusion module that fuses the structural and texture features generated by the two branches, see fig. 2; (4) a bilateral propagation feature aggregation module that equalizes the features between channel information, contextual attention and feature space, see fig. 3; specifically, a dynamic kernel selection network (SKNet) captures channel information through adaptive convolution-kernel selection, a contextual attention (CA) module captures the context relations within the image, and a bilateral propagation activation (BPA) module captures the relations between the spatial and range domains; (5) finally, the decoder is given guidance information through skip connections, synthesizing the structure and texture branches to produce more complete images; (6) local and global discriminators make the generated image more realistic. A schematic sketch of how these parts might be wired together follows below.
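The sketch below wires the six parts together in a single forward pass, assuming module interfaces like those sketched later in this description; the mask handling and the omitted skip connections are assumptions, not part of the patent text.

```python
# Schematic wiring of the single-stage pipeline of fig. 1 under assumed
# module interfaces; skip connections to the decoder are omitted.
import torch

def inpaint_forward(image, mask, encoder, texture_branch, structure_branch,
                    sdff, bpfa, decoder):
    # (1) encode the masked image into texture and structure features
    f_te, f_st = encoder(torch.cat([image * (1 - mask), mask], dim=1))
    # (2) fill the hole in each branch with multi-scale parallel streams
    f_cte = texture_branch(f_te)
    f_cst = structure_branch(f_st)
    # (3) soft-gated dual feature fusion
    f_fu = sdff(f_cst, f_cte)
    # (4) bilateral propagation feature aggregation
    f_sc = bpfa(f_fu)
    # (5) decode; encoder skip connections are omitted in this sketch
    return decoder(f_sc)
```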
Specifically, the image restoration method based on deep multi-feature collaborative learning comprises the following steps:
S1, inputting an image to be restored into a preset image feature encoder, which extracts effective features from the image through deep neural network encoding to form an effective image feature set;
S2, decoding and restoring the effective image feature set with a preset image decoder, and forming the restored image after a local discriminator and a global discriminator;
the image feature encoder consists of six convolutional layers, in which three shallow convolutional layers reorganize texture features representing image details and three deep convolutional layers reorganize structural features representing image semantics, yielding a structural feature set and a texture feature set;
the image decoder comprises a soft-gated dual feature fusion module for fusing the structural and texture features, and a bilateral propagation feature aggregation module for equalizing the features between channel information, contextual attention and feature space.
Preferably, the texture features and the structural features are each filled into the damaged area using three parallel streams with different kernel sizes; the three streams are combined to form an output feature map, which is then mapped back to the size of the input features.
Further, the outputs of the structure and texture branches satisfy:
L_rst = ||g(F_cst) - I_st||_1    (1-1)
L_rte = ||g(F_cte) - I_gt||_1    (1-2)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution with kernel size 1 that maps F_cst and F_cte to color images, and I_gt and I_st denote the real image and its structure image respectively; I_st is generated using an edge-preserving image smoothing method.
Preferably, the soft-gated dual feature fusion module comprises a structure-guided texture feature unit for executing the following algorithm:
G_te = σ(SE(h([F_cst, F_cte])))    (2-1)
F′_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte    (2-2)
where h(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te controls the degree of refinement of the texture information, F′_cte denotes the structure-aware texture feature, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
Preferably, the soft-gated dual feature fusion module comprises a texture-guided structural feature unit for executing the following algorithm:
G_st = σ(SE(k([F_cst, F_cte])))    (2-3)
F′_cst = γ(G_st ⊙ F_cte) ⊕ F_cst    (2-4)
where k(·) is a convolution with kernel size 3, G_st controls the degree of refinement of the structural information, F′_cst denotes the texture-aware structural feature, and γ is a learnable parameter;
F_fu = v([F′_cst, F′_cte])    (2-5)
where F′_cte and F′_cst denote the structure-aware texture feature and the texture-aware structural feature respectively, v(·) is a convolution with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
Preferably, the bilateral propagation feature aggregation module comprises a channel information fusion unit, which captures channel information with a dynamic kernel selection network through adaptive kernel selection to obtain the feature map F′_fu.
Further, the bilateral propagation feature aggregation module comprises a contextual attention fusion unit for capturing the relations between the input image blocks and computing their cosine similarity, executing the following algorithm:
s_{i,j} = ⟨ p_i / ||p_i||_2 , p_j / ||p_j||_2 ⟩
ŝ_{i,j} = exp(s_{i,j}) / Σ_{j=1..N} exp(s_{i,j})
p̂_i = Σ_{j=1..N} ŝ_{i,j} · p_j
where the feature F′_fu is divided into non-overlapping blocks of 3 × 3 pixels, s_{i,j} denotes the cosine similarity between feature blocks, ŝ_{i,j} denotes the attention score obtained with the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F′_fu, N is the total number of blocks of F′_fu, and the feature map reconstructed from the attention scores is obtained by recombining the blocks p̂_i.
Preferably, the bilateral propagation feature aggregation module comprises a spatial information fusion unit, which executes the following algorithm:
F^sp_i = (1 / C(x)) Σ_{j∈s} G(i, j) · f(x_i, x_j)
F^ra_i = (1 / C(x)) Σ_{j∈v} f(x_i, x_j)
where F^sp and F^ra denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature, x_j are the neighboring feature channels at positions j around channel i, G(i, j) is a Gaussian function that adjusts the spatial contributions of neighboring feature channels, C(x) is the number of positions in the input feature, and f(·) is a dot-product operation.
Further, the output feature channel is calculated as:
x̂_i = q([F^sp_i, F^ra_i])
where F^sp and F^ra denote the spatial and range similarity feature maps and q denotes a convolution layer with kernel size 1.
Further, the channel features are aggregated to obtain the reconstructed feature map F̂_fu; F′_fu and F̂_fu are then fused by concatenation and convolution to give F_sc:
F_sc = z([F′_fu, F̂_fu])
where F̂_fu is the recombined multi-channel feature, F′_fu is the feature obtained after re-weighing the channel information, F_sc is the final fused restoration feature, and z is a convolution with kernel size 1.
Preferably, the global and local discriminators each consist of five convolutional layers with kernel size 4 and stride 2, and all layers except the last use Leaky ReLU with a slope of 0.2.
The following points describe the core technical process in detail:
(1) Structure and texture branches
The texture feature reorganized from the shallow convolutions is denoted F_te, and the structural feature reorganized from the deep convolutions is denoted F_st. In each branch, three parallel streams with different kernel sizes are used to fill the damaged area at different scales. Finally, the output feature maps of the three streams are combined, and the combined features are mapped back to the size of the input features. Here, F_cst and F_cte denote the outputs of the structure and texture branches, respectively. To ensure that each branch focuses on structure and texture separately, two reconstruction losses are used, denoted L_rst and L_rte. The pixel-level losses are defined as:
L_rst = ||g(F_cst) - I_st||_1    (1-1)
L_rte = ||g(F_cte) - I_gt||_1    (1-2)
where g(·) is a convolution with kernel size 1 whose goal is to map F_cst and F_cte to color images. I_gt and I_st denote the real image and its structure image respectively; I_st is generated using an edge-preserving smoothing method.
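As an illustration of this branch design, the sketch below (in PyTorch) fills a feature map with three parallel convolution streams and applies the pixel-level L1 loss of Eqs. (1-1)/(1-2) after the 1×1 projection g(·). The kernel sizes (3/5/7) and the channel width are assumptions not fixed by the description above.

```python
# A hedged sketch of one filling branch and its reconstruction loss.
import torch
import torch.nn as nn

class MultiScaleFill(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.streams = nn.ModuleList(
            [nn.Conv2d(ch, ch, k, padding=k // 2) for k in (3, 5, 7)])
        self.merge = nn.Conv2d(3 * ch, ch, kernel_size=1)   # map back to input size
        self.to_rgb = nn.Conv2d(ch, 3, kernel_size=1)       # g(.) in Eqs. (1-1)/(1-2)

    def forward(self, f):
        # combine the three parallel streams into one output feature map
        return self.merge(torch.cat([s(f) for s in self.streams], dim=1))

    def reconstruction_loss(self, f_filled, target):
        # L_r = || g(F) - I ||_1, with I = I_st for the structure branch
        # and I = I_gt for the texture branch.
        return torch.mean(torch.abs(self.to_rgb(f_filled) - target))
```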
(2) Soft-gated dual feature fusion module
In this module, the structural feature F_cst and the texture feature F_cte generated by the two branches are combined more effectively. The two kinds of information are exchanged, and soft gating dynamically controls the mixing ratio so as to achieve a dynamic combination. Specifically, to construct the structure-guided texture feature, a soft gate G_te controls the refinement of the texture information.
It is defined as:
G_te = σ(SE(h([F_cst, F_cte])))    (2-1)
where h(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, and σ(·) is the Sigmoid activation function. Using the soft gate G_te, the structural information of F_cst can be dynamically incorporated into F_cte:
F′_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte    (2-2)
where α and β are learnable parameters, ⊙ denotes element-wise multiplication and ⊕ denotes element-wise addition.
Likewise, the texture-guided structural feature F′_cst is defined as:
G_st = σ(SE(k([F_cst, F_cte])))    (2-3)
F′_cst = γ(G_st ⊙ F_cte) ⊕ F_cst    (2-4)
where k performs the same operation as h, and γ is a learnable parameter.
Finally, F′_cte and F′_cst are combined, and the feature F_fu is generated with a convolution v of kernel size 1:
F_fu = v([F′_cst, F′_cte])    (2-5)
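A minimal sketch of the soft-gated dual feature fusion module following Eqs. (2-1) to (2-5) might look as follows. The concrete squeeze-and-excitation block and the channel width are assumptions; only the gating pattern mirrors the equations.

```python
# A minimal sketch of the SDFF module (Eqs. 2-1 to 2-5) under assumed widths.
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    def __init__(self, ch, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x)   # channel re-weighting, SE(.)

class SDFF(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.h = nn.Conv2d(2 * ch, ch, 3, padding=1)   # h(.) in Eq. (2-1)
        self.k = nn.Conv2d(2 * ch, ch, 3, padding=1)   # k(.) in Eq. (2-3)
        self.se_te = SqueezeExcite(ch)
        self.se_st = SqueezeExcite(ch)
        self.v = nn.Conv2d(2 * ch, ch, 1)              # v(.) in Eq. (2-5)
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, f_cst, f_cte):
        cat = torch.cat([f_cst, f_cte], dim=1)
        g_te = torch.sigmoid(self.se_te(self.h(cat)))                       # Eq. (2-1)
        f_cte2 = self.alpha * (self.beta * (g_te * f_cte) * f_cst) + f_cte  # Eq. (2-2)
        g_st = torch.sigmoid(self.se_st(self.k(cat)))                       # Eq. (2-3)
        f_cst2 = self.gamma * (g_st * f_cte) + f_cst                        # Eq. (2-4)
        return self.v(torch.cat([f_cst2, f_cte2], dim=1))                   # Eq. (2-5)
```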
(3) Bilateral propagation feature aggregation module
This module is proposed to re-weigh the channels and the spatial locations so that the image representation is more consistent. First, channel information is captured with a dynamic kernel selection network through adaptive kernel selection, yielding the feature map F′_fu; this strengthens the correlation between channels and maintains the consistency of the whole image. A contextual attention (CA) module is then introduced to capture the association between image blocks. Specifically, for a given input feature F′_fu, blocks of 3 × 3 pixels are extracted and their cosine similarity is computed:
s_{i,j} = ⟨ p_i / ||p_i||_2 , p_j / ||p_j||_2 ⟩
where p_i and p_j are the i-th and j-th blocks of the input feature.
The Softmax function gives the attention score between each pair of blocks:
ŝ_{i,j} = exp(s_{i,j}) / Σ_{j=1..N} exp(s_{i,j})
where N is the total number of blocks of the input feature F′_fu. Next, the feature map is reconstructed from the attention scores:
p̂_i = Σ_{j=1..N} ŝ_{i,j} · p_j
The reconstructed feature map is obtained by directly recombining the blocks p̂_i.
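The contextual attention step can be sketched as below: the feature map is split into non-overlapping 3×3 patches, pairwise cosine similarities are turned into attention scores with Softmax, and the patches are recombined from those scores. Extracting patches with unfold/fold, and requiring the spatial size to be divisible by the patch size, are implementation assumptions.

```python
# A hedged sketch of the contextual attention step on non-overlapping patches.
import torch
import torch.nn.functional as F

def contextual_attention(feat, patch=3):
    b, c, h, w = feat.shape                                      # assumes h, w divisible by patch
    patches = F.unfold(feat, kernel_size=patch, stride=patch)    # p_i, shape (B, C*p*p, N)
    p = F.normalize(patches, dim=1)                              # unit-norm patches
    sim = torch.bmm(p.transpose(1, 2), p)                        # s_{i,j}: cosine similarity
    attn = F.softmax(sim, dim=-1)                                # attention scores
    recon = torch.bmm(patches, attn.transpose(1, 2))             # p_hat_i = sum_j score * p_j
    return F.fold(recon, output_size=(h, w), kernel_size=patch, stride=patch)
```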
In the spatial and range domains, a bilateral propagation activation (BPA) module is introduced to generate response values based on range and spatial distance. The response values are calculated as follows:
F^sp_i = (1 / C(x)) Σ_{j∈s} G(i, j) · f(x_i, x_j)
F^ra_i = (1 / C(x)) Σ_{j∈v} f(x_i, x_j)
where x_i is the i-th feature channel of the input feature, x_j are the neighboring feature channels at positions j around channel i, G(i, j) is a Gaussian function that adjusts the spatial contributions of neighboring feature channels, C(x) is the number of positions in the input feature, and f(·) is a dot-product operation. In the spatial domain, j is explored within the neighborhood s for global propagation; in the experiments, s is set to the same size as the input feature. In the range domain, v is a neighborhood around position i, whose size is set to 3 × 3. The spatial and range similarity measurements therefore yield the feature maps F^sp and F^ra, respectively.
each feature channel can calculate:
Figure GDA0003888084410000128
where q represents a convolutional layer, the kernel size is 1.
Next, each channel is aggregated to obtain a reconstructed feature map
Figure GDA0003888084410000129
Finally, we concatenate then convolve F' fu And
Figure GDA00038880844100001210
to obtain F sc
Figure GDA00038880844100001211
Where z is a convolution operation with a convolution kernel size of 1.
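A rough sketch of the bilateral propagation and final fusion steps is given below. Each position accumulates dot-product affinities f(x_i, x_j) over a neighbourhood, weighted by a spatial Gaussian G(i, j) for the spatial-domain response and taken unweighted over a small 3×3 window for the range-domain response; the two maps are merged with a 1×1 convolution q(·) and fused with the channel-reweighted feature by z(·) to give F_sc. The window sizes, the Gaussian bandwidth and the exact normalization are assumptions, so this reflects only the overall flow described above.

```python
# A rough, assumption-laden sketch of the BPA responses and the F_sc fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_affinity_response(x, window, sigma=None):
    b, c, h, w = x.shape
    pad = window // 2
    neigh = F.unfold(x, kernel_size=window, padding=pad)          # (B, C*w*w, H*W)
    neigh = neigh.view(b, c, window * window, h * w)
    center = x.view(b, c, 1, h * w)
    aff = (neigh * center).sum(dim=1)                              # f(x_i, x_j), (B, w*w, H*W)
    if sigma is not None:                                          # spatial Gaussian G(i, j)
        idx = torch.arange(window, device=x.device) - pad
        d2 = (idx.view(-1, 1) ** 2 + idx.view(1, -1) ** 2).float()
        g = torch.exp(-d2 / (2 * sigma ** 2)).view(1, -1, 1)
        aff = aff * g
    resp = (neigh * aff.unsqueeze(1)).sum(dim=2) / aff.shape[1]    # 1/C(x)-style normalization
    return resp.view(b, c, h, w)

class BilateralFusion(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.q = nn.Conv2d(2 * ch, ch, kernel_size=1)  # q(.): merges the two response maps
        self.z = nn.Conv2d(2 * ch, ch, kernel_size=1)  # z(.): final fusion giving F_sc

    def forward(self, f_fu_prime, f_ca):
        f_sp = local_affinity_response(f_ca, window=7, sigma=2.0)  # spatial-domain response
        f_ra = local_affinity_response(f_ca, window=3)             # range-domain response
        f_hat = self.q(torch.cat([f_sp, f_ra], dim=1))             # aggregated channels
        return self.z(torch.cat([f_fu_prime, f_hat], dim=1))       # F_sc
```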
(4) Discriminators
The invention introduces global and local discriminators to keep the local and global image content consistent. Each discriminator consists of five convolutional layers with kernel size 4 and stride 2, and all layers except the last use Leaky ReLU with a slope of 0.2. In addition, spectral normalization is employed for stable training.
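A sketch of one such discriminator is shown below: five convolutions with kernel size 4 and stride 2, Leaky ReLU (slope 0.2) after all but the last layer, and spectral normalization for stable training. The channel widths and the single-channel output are assumptions.

```python
# A sketch of a global/local discriminator with spectral normalization.
import torch.nn as nn
from torch.nn.utils import spectral_norm

def make_discriminator(in_ch=3, base=64):
    widths = [base, base * 2, base * 4, base * 8, 1]   # assumed widths; 1-channel score map
    layers, prev = [], in_ch
    for i, w in enumerate(widths):
        layers.append(spectral_norm(
            nn.Conv2d(prev, w, kernel_size=4, stride=2, padding=1)))
        if i < len(widths) - 1:                         # no activation after the last layer
            layers.append(nn.LeakyReLU(0.2, inplace=True))
        prev = w
    return nn.Sequential(*layers)
```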
The above are only preferred embodiments of the invention; they should not be regarded as limiting the invention, and the scope of protection is defined by the claims. Those skilled in the art can make several modifications, substitutions and improvements to the steps without departing from the spirit and scope of the invention, and such modifications, substitutions and improvements also fall within the scope of the invention.

Claims (9)

1. An image restoration method based on deep multi-feature collaborative learning, comprising the following steps:
S1, inputting an image to be restored into a preset image feature encoder, which extracts effective features from the image through deep neural network encoding to form an effective image feature set;
S2, decoding and restoring the effective image feature set with a preset image decoder, and forming the restored image after a local discriminator and a global discriminator;
characterized in that the image feature encoder consists of six convolutional layers, in which three shallow convolutional layers reorganize texture features representing image details and three deep convolutional layers reorganize structural features representing image semantics, yielding a structural feature set and a texture feature set;
the image decoder comprises a soft-gated dual feature fusion module for fusing the structural and texture features and a bilateral propagation feature aggregation module for equalizing the features between channel information, contextual attention and feature space, the soft-gated dual feature fusion module comprising a structure-guided texture feature unit for executing the following algorithm:
G_te = σ(SE(h([F_cst, F_cte])))    (2-1)
F′_cte = α(β(G_te ⊙ F_cte) ⊙ F_cst) ⊕ F_cte    (2-2)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, h(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_te controls the degree of refinement of the texture information, F′_cte denotes the structure-aware texture feature, α and β are learnable parameters, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
2. The image inpainting method of claim 1, characterized in that the texture features and the structural features are each first filled into the damaged area using three parallel streams with different kernel sizes, the three streams are combined to form an output feature map, and the output feature map is then mapped back to the size of the input features, the outputs of the structure and texture branches satisfying:
L_rst = ||g(F_cst) - I_st||_1    (1-1)
L_rte = ||g(F_cte) - I_gt||_1    (1-2)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, L_rst and L_rte denote the reconstruction losses of structure and texture respectively, g(·) is a convolution with kernel size 1 that maps F_cst and F_cte to color images, and I_gt and I_st denote the real image and its structure image respectively; I_st is generated using an edge-preserving image smoothing method.
3. The image inpainting method of claim 1, wherein the soft-gated dual feature fusion module comprises a texture-guided structural feature unit configured to execute the following algorithm:
G_st = σ(SE(k([F_cst, F_cte])))    (2-3)
F′_cst = γ(G_st ⊙ F_cte) ⊕ F_cst    (2-4)
where F_cst and F_cte denote the output features of structure and texture produced by concatenating the multi-scale filling stages, k(·) is a convolution with kernel size 3, SE(·) is a squeeze-and-excitation operation that captures important channel information, σ(·) is the Sigmoid activation function, G_st controls the degree of refinement of the structural information, F′_cst denotes the texture-aware structural feature, γ is a learnable parameter, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition;
F_fu = v([F′_cst, F′_cte])    (2-5)
where F′_cte and F′_cst denote the structure-aware texture feature and the texture-aware structural feature respectively, v(·) is a convolution with kernel size 1, and F_fu is the final output feature of the soft-gated dual feature fusion module.
4. The method of claim 1, wherein the bilateral propagation feature aggregation module comprises a channel information fusion unit for capturing channel information with a dynamic kernel selection network through adaptive kernel selection to obtain the feature map F′_fu.
5. The image inpainting method of claim 4, wherein the bilateral propagation feature aggregation module comprises a contextual attention fusion unit for capturing the relations between the input image blocks and computing their cosine similarity, executing the following algorithm:
s_{i,j} = ⟨ p_i / ||p_i||_2 , p_j / ||p_j||_2 ⟩
ŝ_{i,j} = exp(s_{i,j}) / Σ_{j=1..N} exp(s_{i,j})
p̂_i = Σ_{j=1..N} ŝ_{i,j} · p_j
where the feature F′_fu is divided into non-overlapping blocks, s_{i,j} denotes the cosine similarity between feature blocks, ŝ_{i,j} denotes the attention score obtained with the Softmax function, p_i and p_j are the i-th and j-th blocks of the input feature F′_fu, N is the total number of blocks of F′_fu, and the feature map reconstructed from the attention scores is obtained by recombining the feature blocks p̂_i.
6. The image inpainting method of claim 1, wherein the bilateral propagation feature aggregation module comprises a spatial information fusion unit that executes the following algorithm:
F^sp_i = (1 / C(x)) Σ_{j∈s} G(i, j) · f(x_i, x_j)
F^ra_i = (1 / C(x)) Σ_{j∈v} f(x_i, x_j)
where F^sp and F^ra denote the spatial and range similarity feature maps, x_i is the i-th feature channel of the input feature, x_j are the neighboring feature channels at positions j around channel i, G(i, j) is a Gaussian function that adjusts the spatial contributions of neighboring feature channels, C(x) is the number of positions in the input feature, and f(·) is a dot-product operation; in the spatial domain, j is explored within the neighborhood s for global propagation, and in the range domain, v is the neighborhood of position i, whose size is set to 3 × 3.
7. The image inpainting method of claim 6, wherein the output feature channel is calculated as:
x̂_i = q([F^sp_i, F^ra_i])
where F^sp and F^ra denote the spatial and range similarity feature maps and q denotes a convolution layer with kernel size 1.
8. The image inpainting method of claim 7, wherein the channel features are aggregated to obtain a reconstructed feature map F̂_fu, and F′_fu and F̂_fu are then fused by concatenation and convolution to give F_sc:
F_sc = z([F′_fu, F̂_fu])
where F̂_fu is the recombined multi-channel feature, F′_fu is the feature obtained after re-weighing the channel information, F_sc is the final fused restoration feature, and z is a convolution with kernel size 1.
9. The image inpainting method of claim 1, wherein the global and local discriminators each consist of five convolutional layers with kernel size 4 and stride 2, all layers except the last use Leaky ReLU with a slope of 0.2, and spectral normalization is used for stable training.
CN202210089664.4A 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning Active CN114463209B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210089664.4A CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210089664.4A CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Publications (2)

Publication Number Publication Date
CN114463209A CN114463209A (en) 2022-05-10
CN114463209B true CN114463209B (en) 2022-12-16

Family

ID=81410572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210089664.4A Active CN114463209B (en) 2022-01-25 2022-01-25 Image restoration method based on deep multi-feature collaborative learning

Country Status (1)

Country Link
CN (1) CN114463209B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023225808A1 (en) * 2022-05-23 2023-11-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Learned image compression and decompression using long and short attention module
CN114897742B (en) * 2022-06-10 2023-05-23 重庆师范大学 Image restoration method with texture and structural features fused twice
CN115082743B (en) * 2022-08-16 2022-12-06 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method
CN115841625B (en) * 2023-02-23 2023-06-06 杭州电子科技大学 Remote sensing building image extraction method based on improved U-Net model
CN116681980B (en) * 2023-07-31 2023-10-20 北京建筑大学 Deep learning-based large-deletion-rate image restoration method, device and storage medium
CN117196981B (en) * 2023-09-08 2024-04-26 兰州交通大学 Bidirectional information flow method based on texture and structure reconciliation
CN117422911B (en) * 2023-10-20 2024-04-30 哈尔滨工业大学 Collaborative learning driven multi-category full-slice digital pathological image classification system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460746A (en) * 2018-04-10 2018-08-28 武汉大学 A kind of image repair method predicted based on structure and texture layer
CN112365422A (en) * 2020-11-17 2021-02-12 重庆邮电大学 Irregular missing image restoration method and system based on deep aggregation network
CN113298733A (en) * 2021-06-09 2021-08-24 华南理工大学 Implicit edge prior based scale progressive image completion method
WO2021232589A1 (en) * 2020-05-21 2021-11-25 平安国际智慧城市科技股份有限公司 Intention identification method, apparatus and device based on attention mechanism, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102640237B1 (en) * 2019-10-25 2024-02-27 삼성전자주식회사 Image processing methods, apparatus, electronic devices, and computer-readable storage media

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460746A (en) * 2018-04-10 2018-08-28 武汉大学 A kind of image repair method predicted based on structure and texture layer
WO2021232589A1 (en) * 2020-05-21 2021-11-25 平安国际智慧城市科技股份有限公司 Intention identification method, apparatus and device based on attention mechanism, and storage medium
CN112365422A (en) * 2020-11-17 2021-02-12 重庆邮电大学 Irregular missing image restoration method and system based on deep aggregation network
CN113298733A (en) * 2021-06-09 2021-08-24 华南理工大学 Implicit edge prior based scale progressive image completion method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Image Inpainting via Conditional Texture and Structure Dual Generation; Xiefan Guo et al.; 2021 IEEE/CVF International Conference on Computer Vision (ICCV); 2021-08-31; pp. 14114-14123 *
Selective Kernel Networks; Xiang Li et al.; 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020-01-09; pp. 510-519 *
Remote sensing scene classification based on bidirectional gated scale feature fusion (基于双向门控尺度特征融合的遥感场景分类); Song Zhongshan et al.; Journal of Computer Applications (计算机应用); 2021-02-22; pp. 1-12 *

Also Published As

Publication number Publication date
CN114463209A (en) 2022-05-10

Similar Documents

Publication Publication Date Title
CN114463209B (en) Image restoration method based on deep multi-feature collaborative learning
CN109447907B (en) Single image enhancement method based on full convolution neural network
CN111242238B (en) RGB-D image saliency target acquisition method
CN111754438A (en) Underwater image restoration model based on multi-branch gating fusion and restoration method thereof
CN110689495B (en) Image restoration method for deep learning
CN111787187B (en) Method, system and terminal for repairing video by utilizing deep convolutional neural network
CN110223251B (en) Convolution neural network underwater image restoration method suitable for artificial and natural light sources
CN113989129A (en) Image restoration method based on gating and context attention mechanism
CN112991231B (en) Single-image super-image and perception image enhancement joint task learning system
CN114897742B (en) Image restoration method with texture and structural features fused twice
CN110349087A (en) RGB-D image superior quality grid generation method based on adaptability convolution
CN115170915A (en) Infrared and visible light image fusion method based on end-to-end attention network
CN116958534A (en) Image processing method, training method of image processing model and related device
CN113554032A (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN116167920A (en) Image compression and reconstruction method based on super-resolution and priori knowledge
CN116485741A (en) No-reference image quality evaluation method, system, electronic equipment and storage medium
CN115829880A (en) Image restoration method based on context structure attention pyramid network
CN114677477A (en) Virtual viewpoint synthesis method, system, medium, device and terminal
CN116109510A (en) Face image restoration method based on structure and texture dual generation
CN117689592A (en) Underwater image enhancement method based on cascade self-adaptive network
CN117408924A (en) Low-light image enhancement method based on multiple semantic feature fusion network
CN117061760A (en) Video compression method and system based on attention mechanism
CN116523985A (en) Structure and texture feature guided double-encoder image restoration method
CN116703750A (en) Image defogging method and system based on edge attention and multi-order differential loss
JPS62131383A (en) Method and apparatus for evaluating movement of image train

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant