CN114897738A - Image blind restoration method based on semantic inconsistency detection - Google Patents
Image blind restoration method based on semantic inconsistency detection
- Publication number
- CN114897738A (application number CN202210574618.3A / CN202210574618A)
- Authority
- CN
- China
- Prior art keywords
- image
- damaged
- region
- prediction
- residual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T 5/00 Image enhancement or restoration → G06T 5/77 Retouching; inpainting; scratch removal
- G06T 5/00 Image enhancement or restoration → G06T 5/70 Denoising; smoothing
- G06T 3/40 Scaling of whole images or parts thereof → G06T 3/4038 Image mosaicing
- G06N 3/04 Architecture, e.g. interconnection topology → G06N 3/045 Combinations of networks
- G06N 3/08 Learning methods → G06N 3/084 Backpropagation, e.g. using gradient descent
- G06T 2200/32 Indexing scheme involving image mosaicing
- G06T 2207/20081 Training; learning
- G06T 2207/20084 Artificial neural networks [ANN]
Abstract
The invention discloses an image blind restoration method based on semantic inconsistency detection, comprising the following steps: an image with noise pollution is preprocessed and used as input; a mask prediction network built from ring residual blocks amplifies the semantic difference between the contaminated region and the background and coarsely locates the degraded region in the polluted image; a mask refinement network then exploits the texture similarity among regions of different classes to obtain a fine prediction mask; the damaged image and the prediction mask are fed together into an image restoration network, which, guided by the confidence encoded in the mask, iteratively uses information from the valid region to complete the content of the damaged region; meanwhile, context attention aggregation modules at different scales improve structural consistency; finally the multi-scale features are fused and decoded back into an image, achieving blind repair of the degraded image. The method accurately detects noise pollution in real damaged images and meets the requirement of robust restoration across various kinds of degradation.
Description
Technical Field
The invention belongs to the field of computer graphics and image processing, and relates to an image blind restoration method based on semantic inconsistency detection.
Background
With the development of computer and multimedia technology, digital images have become important information carriers. Over time and under adverse storage conditions, photographs may suffer various kinds of degradation, such as ink contamination, creases and tears, mildew, and fading; in addition, mishaps at capture time, such as glare, unwanted objects entering the frame, or stains on the camera lens, can likewise spoil a picture. These degradations greatly impair the expression of image content. Image restoration techniques, which recover image content and improve image quality, have therefore developed rapidly in recent years and are widely used in image editing, object removal, biomedical image processing, criminal investigation, and other fields. Years of development have produced many important research results; widely used processing tools such as the Photoshop repair tool apply traditional restoration methods, exploiting the redundancy of image information to fill the damaged region with pixels from known regions. Such methods can produce good patches for scene images with repetitive textures, but cannot generate new content because they lack an understanding of image semantics.
Image restoration has become a major research focus in computer vision, and in recent years researchers have introduced deep learning methods into the field. Although these models can infer missing content from the provided valid pixels, they all assume that blank content in an image marks the damaged region, and they explicitly require a binary mask for calibration. Such methods can train a model to infer the content of the missing region; however, the degradation mode and location of a damaged image in real life are often unknown, and it is difficult to provide in advance an accurate mask delimiting the region to be repaired, which greatly limits the adoption of these methods in real scenes. How to identify and repair the damaged content given only the damaged image itself has therefore become an urgent open problem.
Disclosure of Invention
In order to overcome the above defects, the invention provides an image blind restoration method based on semantic inconsistency detection, comprising the following specific steps:
S1, inputting a damaged image I_m comprising a clean pixel region and a contaminated pixel region;
S2, constructing a mask prediction network from multiple layers of residual blocks and generating a single-channel coarse prediction soft mask that locates the damaged region;
S3, feeding the coarse prediction mask obtained in S2 together with the damaged image into a mask refinement network, which improves the prediction accuracy of regions such as boundaries through reinforcement learning and yields a fine damaged-region prediction mask;
S4, inputting the fine prediction mask obtained in S3 as prior information, together with the damaged image, into a shared encoder, which extracts features from the valid pixels under the guidance of the mask and propagates them to the damaged region;
S5, inputting the deep feature map extracted by the encoder into multi-task parallel decoding branches, which infer the content of the missing region through multiple convolution blocks and use context information to ensure global semantic consistency;
S6, fusing the features extracted by the different branches in S5 and decoding them through a decoder network to recover an image;
S7, using the fine prediction mask from S3 to cut out the pixels at the damaged positions in the S6 result, splicing them with the valid pixels of the damaged image, and outputting the final repaired image.
The technical scheme of the invention is characterized by comprising the following steps:
With respect to step S1, the invention first defines the damaged image: instead of simply using blank pixels to represent the region to be repaired as in prior work, the invention considers that a damaged image consists of clean valid pixels plus degraded and contaminated pixels of various types. Because no data set dedicated to blind-repair research currently exists, batch training data is first synthesized along these lines for model training, with the mathematical expression:
I_m = I_gt ⊙ (1 − M) + N ⊙ M (1)
In formula (1), I_m denotes the spliced damaged image, I_gt a completely clean image, N the contaminating noise content, M the binarization mask, and ⊙ element-wise multiplication. To improve the robustness of the method, N simulates graffiti, creases, text occlusion, randomly cropped content from other images, and so on, and is spliced onto I_gt to generate a damaged image I_m containing multiple types of contamination and degradation.
Preferably, in step S1, in order to blend the pollution noise more naturally into the original image, a Gaussian smoothing operation is applied:
I = I_m * G_σ (2)
In formula (2), I denotes the smoothed damaged image, I_m the directly spliced damaged image, and G_σ a two-dimensional Gaussian kernel with standard deviation σ; * denotes convolution.
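The synthesis in formulas (1) and (2) can be sketched in NumPy. This is a minimal illustration under the definitions above, not the patented implementation; the function names and the small 8 × 8 arrays are purely illustrative, and the separable blur stands in for the two-dimensional Gaussian kernel G_σ:

```python
import numpy as np

def compose_damaged(i_gt, noise, mask):
    """Formula (1): splice noise into the clean image where mask == 1."""
    return i_gt * (1 - mask) + noise * mask

def gaussian_kernel(sigma, radius=2):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def smooth(img, sigma=1.0):
    """Formula (2): separable 2-D Gaussian blur, edges padded by reflection."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    p = np.pad(img, pad, mode="reflect")
    # horizontal pass, then vertical pass
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

i_gt = np.ones((8, 8))                      # clean image I_gt
noise = np.zeros((8, 8))                    # contaminating content N
mask = np.zeros((8, 8)); mask[2:5, 2:5] = 1 # binarization mask M
i_m = smooth(compose_damaged(i_gt, noise, mask), sigma=1.0)
```

After smoothing, the hole is no longer a hard-edged patch but blends gradually into the surrounding clean pixels, which is the stated purpose of formula (2).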
For step S2, the invention uses an improved ring residual convolution block as the feature extractor, locating the damaged region by amplifying the difference between the valid pixel region and the contaminated region and comparing the intrinsic properties of different image regions. The ring residual block comprises three stages; it is designed after the recall-and-consolidation mechanism of the human brain and realized through residual propagation and feedback in a CNN. The first stage is forward residual propagation, which alleviates gradient degradation in deeper networks by recalling the input feature information:
y_f = F(x, {W_i}) + W_s * x (3)
In formula (3), x denotes the input feature map and y_f the propagated output feature. F(x, {W_i}) represents the learned residual mapping, whose structure comprises two convolution layers and an ELU activation function, and W_s is a 1 × 1 convolution. Residual propagation resembles the memory mechanism of the human brain: earlier knowledge may be forgotten as the model learns new knowledge, so a recall mechanism is needed to evoke those fading memories.
To further enhance the difference between the corrupted and valid content attributes, the second stage integrates the input feature information using residual feedback. A simple gating mechanism learns the nonlinear relations between distinguishable feature channels, avoiding the diffusion of feature information; a response value produced by an activation function is superimposed on the input feature, amplifying the difference in intrinsic image attributes between the noise region and the valid region:
y_b = (s(G(y_f)) + 1) * x (4)
In formula (4), x is the input feature map, y_b the residual feedback feature, G(·) a linear mapping, and s the activation function, here the sigmoid function. Unlike the recall mechanism simulated by residual propagation, residual feedback simulates the process by which the human brain consolidates knowledge into a new understanding of the features. The third stage repeats the operation of the first stage, propagating the residual over the new features so that the amplified feature differences are learned further. Two forward residual propagations combined with one backward residual feedback form the ring residual structure.
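The propagate–feedback–propagate cycle of formulas (3) and (4) can be sketched as a toy NumPy class. This is a hedged illustration only: the convolutions of F and the 1 × 1 shortcut W_s are modelled as per-channel linear maps, which keeps the three-stage ring structure visible without reproducing the actual network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

class RingResidualBlock:
    """Toy ring residual block: propagate -> feedback -> propagate.
    Convolutions are stood in for by per-channel linear maps (illustrative)."""
    def __init__(self, channels, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.1, (channels, channels))  # first conv of F
        self.w2 = rng.normal(0, 0.1, (channels, channels))  # second conv of F
        self.ws = np.eye(channels)                          # W_s shortcut
        self.g = rng.normal(0, 0.1, (channels, channels))   # linear map G

    def propagate(self, x):
        # formula (3): y_f = F(x, {W_i}) + W_s * x
        return elu(x @ self.w1) @ self.w2 + x @ self.ws

    def feedback(self, x, y_f):
        # formula (4): y_b = (s(G(y_f)) + 1) * x  — gate lies in (1, 2)
        return (sigmoid(y_f @ self.g) + 1.0) * x

    def forward(self, x):
        y_f = self.propagate(x)        # stage 1: recall
        y_b = self.feedback(x, y_f)    # stage 2: consolidate
        return self.propagate(y_b)     # stage 3: recall again

block = RingResidualBlock(channels=4)
x = np.random.default_rng(1).normal(size=(16, 4))  # 16 'pixels', 4 channels
y = block.forward(x)
```

Note the gate value (s(·) + 1) always lies strictly between 1 and 2, so the feedback stage can only amplify the input features, never suppress them to zero.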
For step S3, the invention introduces an attention mechanism to refine the coarse prediction mask, improving the recognition result at details such as contours by attending to similar textures across the whole image. In particular, if a region predicted as damaged with low confidence shares a similar texture with a high-confidence region, the low-confidence prediction should be revised. To this end, key features of the damaged content must be extracted from the high-confidence region to serve as the global visual features of that class. The method computes cosine similarity over the coarse prediction mask as a new bias and uses Softmax to suppress the score map of the predicted region; regions whose scores remain high after suppression can be regarded as sufficiently distinctive, so key features are extracted there as the global features of the damaged region:
X = CosSim(x'_sem), X ∈ R^{c×c} (5)
In formula (5), CosSim(·) denotes a modified cosine-similarity function, x'_sem the prediction weight matrix, and i and j the prediction classes, which divide into damaged and undamaged regions. X_{i,j} is the cosine similarity between pixels of different prediction classes, and x'^i_sem is the i-th channel of x'_sem, indicating for each pixel the prediction result belonging to class i. The closer X_{i,j} is to 1, the more similar the activation results of x'^i_sem and x'^j_sem, and the less trustworthy the prediction at that location. By setting the bias of same-class pixels to 0 and the bias of different-class pixels to the similarity score X_{i,j}, the regions that still keep a high activation value after classification are the key features; the whole process is called key-feature pooling.
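The channel-wise cosine similarity of formula (5) can be illustrated in NumPy. A hedged sketch under the definitions above: the two six-pixel score vectors are invented purely to show that overlapping class predictions yield a higher off-diagonal similarity, i.e. less trustworthy locations:

```python
import numpy as np

def cos_sim_matrix(x_sem):
    """Formula (5) sketch: pairwise cosine similarity between the c channel
    maps of a (c, h*w) prediction weight matrix, giving a (c, c) matrix."""
    n = x_sem / (np.linalg.norm(x_sem, axis=1, keepdims=True) + 1e-8)
    return n @ n.T

# two prediction channels over 6 pixels: damaged vs. background scores
damaged    = np.array([0.9, 0.8, 0.1, 0.1, 0.7, 0.2])
background = np.array([0.1, 0.2, 0.9, 0.9, 0.3, 0.8])
X = cos_sim_matrix(np.stack([damaged, background]))
```

Here X[0, 1] is small because the two channels mostly disagree; if a pixel scored highly in both channels, X[0, 1] would approach 1, flagging the prediction there as unreliable.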
Preferably, in step S3, the prediction weight matrix x'_sem and the feature map x_f are combined in a weighted sum to obtain the key feature v_k:
v_k = Σ_i x'^i_sem ⊙ x_f (6)
where i indexes the prediction classes. The merged key feature v_k serves as the Key and the feature x_f as the Query; highlighting the regions similar to v_k yields an attention map, which is convolved with the original features to predict the final refined prediction mask.
For step S4, the invention introduces a gated convolution mechanism to improve the residual convolution blocks: damaged regions are identified by learning, and the valid pixel content of the image is selected dynamically so that the convolution result depends only on valid pixels. This replaces the conventional residual convolution structure for feature extraction and integration over the valid region. The output of the gated convolution is calculated as:
Gating_{y,x} = Σ Σ W_g · I
Feature_{y,x} = Σ Σ W_f · I (7)
O_{y,x} = φ(Feature_{y,x}) ⊙ σ(Gating_{y,x})
In formula (7), I denotes the input feature, W_g and W_f two different convolution kernels, φ the LeakyReLU activation function, and σ the sigmoid function, which restricts all gating values to [0, 1] to indicate the importance of each local region; ⊙ denotes element-wise multiplication, and O_{y,x} is the output feature under the soft gating weights.
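Formula (7) can be sketched with 1 × 1 kernels, which reduce the two convolutions to matrix products while keeping the content/gating split intact. A minimal illustration, not the network's actual layers:

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv_1x1(feat, w_f, w_g):
    """Formula (7) with 1x1 kernels: O = phi(Feature) ⊙ sigma(Gating).
    feat: (h, w, c_in); w_f, w_g: (c_in, c_out)."""
    feature = feat @ w_f           # content branch (W_f)
    gating = feat @ w_g            # soft-mask branch (W_g), squashed to (0, 1)
    return leaky_relu(feature) * sigmoid(gating)

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 3))
w_f = rng.normal(size=(3, 2))
w_g = rng.normal(size=(3, 2))
out = gated_conv_1x1(feat, w_f, w_g)
```

Because the sigmoid gate lies in (0, 1), every output activation is strictly attenuated relative to the ungated content branch — this is exactly how the gate can learn to silence contaminated pixels.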
Preferably, in step S4, to avoid error accumulation in the prediction mask affecting the image restoration result, a new Probabilistic Context Normalization (PCN) performs statistical-information transfer at the end of the improved residual block, propagating statistics such as the mean and variance of the valid pixel region to the damaged region so that the feature distributions of the inner and outer regions remain consistent:
X̄ = β · T(X_P, X_Q) ⊙ H + X ⊙ (1 − H) + (1 − β) · X ⊙ H (8)
In formula (8), X denotes the output of the last convolution layer in the gated residual block, H the prediction mask downsampled to the same size as X, and β a learnable channel attention weight; T(·,·) denotes the information transfer, specifically:
T(X_P, X_Q) = σ(X_Q) · (X_P − μ(X_P)) / σ(X_P) + μ(X_Q) (9)
In formula (9), X_P and X_Q denote the contaminated region and the valid pixel region respectively, μ(·) the regional mean, and σ(·) the regional variance. For an image, the feature mean relates to global semantics while the variance relates to local texture.
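The transfer in formula (9) — renormalizing the contaminated region to the statistics of the valid region — can be sketched directly in NumPy. A hedged illustration (the full PCN with its learnable β is not reproduced; only the μ/σ transfer is shown):

```python
import numpy as np

def transfer_stats(x, mask, eps=1e-6):
    """Formula (9) sketch: give the contaminated region X_P (mask == True)
    the mean and standard deviation of the valid region X_Q."""
    p, q = x[mask], x[~mask]
    transferred = (p - p.mean()) / (p.std() + eps) * q.std() + q.mean()
    out = x.copy()
    out[mask] = transferred
    return out

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=(32, 32))
mask = np.zeros((32, 32), dtype=bool)
mask[8:20, 8:20] = True
x[mask] = x[mask] * 5.0 + 3.0   # corrupted region has shifted statistics
y = transfer_stats(x, mask)
```

After the transfer, the masked region matches the valid region's mean and variance while the valid pixels are untouched — the distribution-consistency property the text describes.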
For step S5, the invention obtains image context information with a multi-scale context attention aggregation branch. Context similarity is computed as the cosine similarity between patches inside and outside the missing region: for each patch of the region to be completed, the most similar content in the valid region is found and given a higher reference weight, so that the completed content stays consistent with its context in both semantics and texture. The similarity metric is:
s̃_{i,j} = ⟨ p_i / ||p_i||, p_j / ||p_j|| ⟩ (10)
In formula (10), p_i and p_j denote feature patches of the valid and missing regions respectively. The attention score of each patch is then obtained through a softmax function:
s_{i,j} = exp(s̃_{i,j}) / Σ_{k=1}^{N} exp(s̃_{k,j}) (11)
where N denotes the number of patches into which the valid region is divided. Through this calculation, each patch in the missing region finds the region of the valid pixels most worth attending to and assigns it a higher reference weight during feature fusion.
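Formulas (10) and (11) over flattened patches can be sketched as follows. A minimal NumPy illustration with invented toy patches; the second "missing" patch being a near-copy of valid patch 3 shows how the softmax concentrates attention on the best match:

```python
import numpy as np

def attention_scores(valid_patches, missing_patches):
    """Formulas (10)-(11) sketch: cosine similarity between flattened
    patches, then softmax over the N valid patches per missing patch."""
    v = valid_patches / (np.linalg.norm(valid_patches, axis=1, keepdims=True) + 1e-8)
    m = missing_patches / (np.linalg.norm(missing_patches, axis=1, keepdims=True) + 1e-8)
    sim = m @ v.T                                   # (n_missing, N) cosine scores
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)         # each row sums to 1

rng = np.random.default_rng(0)
valid = rng.normal(size=(10, 9))    # N = 10 valid 3x3 patches, flattened
missing = np.vstack([
    valid[3] + 0.01 * rng.normal(size=9),  # near-copy of valid patch 3
    rng.normal(size=9),                     # unrelated patch
])
s = attention_scores(valid, missing)
```

The first missing patch puts its largest attention weight on valid patch 3, which is exactly the "most worth attending to" region the completion branch copies from.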
Preferably, in step S5, to reduce computation and increase inference speed, the inter-patch attention similarity scores are propagated by context-information transfer. Specifically, the similarity scores are computed once on the deep feature map of size 32 × 32, and the attention scores are then propagated by context attention transfer to shallower layers of different scales for feature weighting:
p̂^l_j = Σ_{i=1}^{N} s_{i,j} · p^l_i (12)
In formula (12), l indexes the different shallow network layers, p̂^l_j denotes the missing-region patch at a given scale, p^l_i the valid-region patch of corresponding size, s_{i,j} the attention score, and N the number of patches in the background. Since the feature-map size varies across the hierarchy, the patch size must vary accordingly: the mapping region is enlarged by comparing the current feature-map size with the attention score map; for example, every four neighbouring pixels in a feature map of size 128 × 128 share one attention score value. Through this attention-score sharing, the model not only achieves better global semantic consistency but also markedly improves storage and computation efficiency.
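The score-sharing step can be sketched as a nearest-neighbour upsampling of the 32 × 32 score map: each deep-layer score is simply repeated over the block of shallow-layer pixels it maps to. A hedged illustration of the sharing scheme only, not the full attention-transfer module:

```python
import numpy as np

def share_attention(scores, target_size):
    """Sketch of context attention transfer: a score computed on the
    32x32 deep feature map is shared by the whole block of pixels it
    maps to at a larger scale (nearest-neighbour repetition)."""
    factor = target_size // scores.shape[0]
    return np.repeat(np.repeat(scores, factor, axis=0), factor, axis=1)

scores = np.arange(32 * 32, dtype=float).reshape(32, 32)  # per-patch scores
shared = share_attention(scores, 128)                     # weight a 128x128 layer
```

Because the scores are computed once and merely repeated, the shallow layers get their attention weights at the cost of an index lookup rather than a fresh similarity pass.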
The image blind restoration method based on semantic inconsistency detection solves two problems the prior art cannot: repairing damaged images with multiple degradation modes in real scenes, and operating without a calibration mask, which is difficult to obtain directly. It has the following advantages:
(1) Compared with existing repair methods, the invention designs an end-to-end network model that needs no mask calibrating the damaged region: it automatically identifies the polluted and damaged regions in the image and produces semantically consistent, visually complete repairs, handling various damage modes in real images with robustness and fidelity.
(2) The method readily extends to other image-processing research fields, such as object removal, highlight removal, and image deraining and dehazing, and has good transferability and applicability.
Drawings
FIG. 1 is a flow chart of blind image restoration based on semantic inconsistency detection according to the present invention.
FIG. 2 is a schematic diagram of a prediction mask refinement module according to the present invention.
FIG. 3 is a diagram illustrating a structure of a probabilistic context aggregation convolutional block according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 shows the flow chart of the image blind restoration method based on semantic inconsistency detection of the present invention; the method includes:
S1, data preprocessing: a damaged image I_m with noise pollution is read, uniformly resized to 256 × 256, normalized, and then fed into the network model. In the training stage, damaged images are synthesized by simulating the various degradation modes of real scenes, and an additional Gaussian smoothing operation makes the images more realistic and natural.
S2, coarse prediction of the damaged region: the processed degraded image is fed into a coarse mask prediction network built from six layers of ring residual blocks. The overall structure is an encoder-decoder network that integrates image context information by convolution to learn the intrinsic image attributes; the alternating ring structure of residual propagation and residual feedback amplifies the difference between the valid pixel region and the damaged region and generates a single-channel coarse damaged-region prediction mask. When computing the loss in the training stage, since each position only needs to be judged as belonging to the valid region or the damaged region, binary cross-entropy is used as the loss function:
L_mask = −T Σ_p log M̂_p − Σ_q log(1 − M̂_q) (13)
In formula (13), T is an adaptive weight, M̂ the predicted soft mask, p ∈ {p | M_p = 1} the real damaged region, and q ∈ {q | M_q = 0} the real valid region.
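The weighted binary cross-entropy of formula (13) can be sketched in NumPy. A hedged illustration under the symbol definitions above; the normalization by the total pixel count and the toy 4 × 4 masks are assumptions for the example:

```python
import numpy as np

def weighted_bce(pred, gt, t=1.0, eps=1e-7):
    """Formula (13) sketch: binary cross-entropy over the predicted soft
    mask, with adaptive weight t on the real damaged pixels (gt == 1)."""
    pred = np.clip(pred, eps, 1 - eps)     # avoid log(0)
    damaged = gt == 1
    loss_p = -t * np.log(pred[damaged])    # damaged pixels p
    loss_q = -np.log(1 - pred[~damaged])   # valid pixels q
    return (loss_p.sum() + loss_q.sum()) / gt.size

gt = np.zeros((4, 4)); gt[1:3, 1:3] = 1       # ground-truth mask M
good = np.where(gt == 1, 0.9, 0.1)            # confident, mostly correct
bad = np.where(gt == 1, 0.1, 0.9)             # confidently wrong
```

A confident correct prediction scores a far lower loss than a confidently wrong one, and a perfect mask drives the loss toward zero.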
S3, refining the prediction mask: the coarse prediction mask generated in S2 and the damaged image are fed into the mask refinement network. As shown in fig. 2, image features are first extracted by a simple encoder, the cosine similarity between pixels predicted as different classes is computed, and a softmax function limits the values to [0, 1]; the closer a value is to 1, the less reliable the prediction class of that area. Key features of the high-confidence damaged region are screened out as the Key, and the overall image feature serves as the Query; traversing it in the query fashion of the attention mechanism yields the global attention weights. Finally the updated feature information is integrated by deconvolution and restored to image resolution, giving a refined prediction mask with clearer and more accurate detail contours.
S4, content feature extraction: the damaged image is fed into the encoder; to avoid the effect of error accumulation in the prediction mask, the refined prediction mask is simultaneously scaled to the size of each feature map and fed into every encoder layer to guide the extraction of valid-pixel information and its transmission to the damaged region. The encoder consists of four layers of the newly designed gated residual convolution blocks, whose structure is shown in fig. 3: the outputs of two task-specific standard convolution layers are multiplied element by element, one followed by a LeakyReLU function and the other by a sigmoid function, so that a soft mask is learned and updated automatically from the input and the convolution is restricted to the valid pixel region. In addition, probabilistic context normalization replaces batch normalization, transferring the image statistics and keeping the feature distributions inside and outside the mask consistent.
S5, inferring the missing content: the invention proposes a multi-task parallel framework with two parallel decoding branches for feature reasoning and content propagation. As shown in fig. 1, the upper branch is formed by multiple dilated convolution layers with dilation rates of 2, 4 and 8, whose differing rates enlarge the receptive field to capture multi-scale context information; the lower branch uses the multi-scale context attention integration module, computing attention scores between patches on the 32 × 32 deep feature map and weighting the shallower layers of different scales through the context attention transfer module, thereby ensuring global structural and semantic consistency of the features.
S6, feature decoding and image restoration: the feature maps extracted by the different branches in S5 are concatenated along the channel dimension and fed into the decoder network. The decoder is structurally symmetric to the encoder, alternately fusing four layers of gated residual convolution blocks with upsampling, and finally a 3 × 3 plain convolution layer restores the predicted repaired image.
S7, outputting the final repair result: to make the result cleaner, the prediction mask selects the valid content of the input image and the content of the prediction result for splicing, and after smoothing, a clean repair result with complete structure and consistent semantics is output.
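The final splicing in S7 is the mask-guided composite of input and prediction. A minimal NumPy sketch (the smoothing pass is omitted, and the 4 × 4 arrays are illustrative):

```python
import numpy as np

def composite_output(damaged, restored, mask):
    """S7 sketch: keep valid pixels from the input image, take the network
    prediction only inside the predicted damaged region (mask == 1)."""
    return damaged * (1 - mask) + restored * mask

damaged = np.full((4, 4), 0.8); damaged[1:3, 1:3] = 0.0  # hole at the centre
restored = np.full((4, 4), 0.5)                          # network output
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0            # refined mask
final = composite_output(damaged, restored, mask)
```

Only the hole is filled from the prediction; every valid pixel of the input survives unchanged, which is why the output stays sharp outside the repaired region.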
In conclusion, the image blind restoration method based on semantic inconsistency detection is suited to repairing genuinely damaged images from real life: it needs no additional binarization mask marking the damaged regions, achieves high-quality restoration of degraded images through an end-to-end network, guarantees visual integrity and structural plausibility of the result, robustly handles diverse image degradations and contaminations across real scenes, and has broad application value.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.
Claims (2)
1. An image blind restoration method based on semantic inconsistency detection is characterized by comprising the following specific steps:
S1, inputting a damaged image I_m comprising a clean pixel region and a damaged pixel region;
S2, constructing a mask prediction network from multiple layers of residual blocks and generating a single-channel coarse prediction soft mask for locating the damaged region;
S3, feeding the coarse prediction mask obtained in S2 together with the damaged image into a mask refinement network, which improves the prediction accuracy of regions such as boundaries through reinforcement learning and yields a fine damaged-region prediction mask;
S4, inputting the fine prediction mask obtained in S3 as prior information and the damaged image into a shared encoder, extracting the characteristics of effective pixels according to the guidance of the mask and transmitting the characteristics to a damaged area;
s5, inputting the deep characteristic diagram extracted by the encoder network into a multi-task parallel decoding branch, speculating the content of the missing area through a plurality of layers of rolling blocks, and ensuring the global semantic consistency by using context information;
s6, fusing the features extracted from different branches in S5, decoding by a decoder network, and recovering into an image;
and S7, utilizing the fine prediction mask in S3, cutting out pixels at the position of the damaged area in the S6 result, splicing the pixels with effective pixels in the damaged image, and outputting a final repaired image.
2. The image blind restoration method based on semantic inconsistency detection according to claim 1, wherein for step S1 the damaged image is first defined. Unlike existing research, which simply uses blank pixels to represent the region to be restored, the present invention considers that a damaged image should consist of clean valid pixels and degraded or contaminated pixels of various types. Because no dataset dedicated to blind restoration research currently exists, batch training data are first synthesized along this line for model training, with the mathematical expression:

I_m = I_gt ⊙ (1 − M) + N ⊙ M (1)

In formula (1), I_m denotes the stitched damaged image, I_gt a completely clean image, N the contaminating noise content, M the binarization mask, and ⊙ element-wise multiplication. To improve the robustness of the method, N simulates graffiti, creases, text occlusion, randomly cropped content from other images, and the like, which is stitched onto I_gt to generate a damaged image I_m containing multiple types of contamination and degradation.
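A minimal numpy sketch of the composition described for formula (1): contamination content is pasted into a clean image wherever the binary mask is set. The function name and toy data are illustrative, not from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_damaged(clean, noise, mask):
    """I_m = I_gt * (1 - M) + N * M, as described for formula (1).

    clean: ground-truth image I_gt; noise: contamination content N
    (graffiti, text, patches from other images, ...); mask: M, with
    1 marking damaged pixels.
    """
    m = mask.astype(np.float32)
    return clean * (1.0 - m) + noise * m

clean = np.full((4, 4), 0.5, dtype=np.float32)
noise = rng.random((4, 4)).astype(np.float32)
mask = np.zeros((4, 4), dtype=np.float32)
mask[1:3, 1:3] = 1.0   # a small square of contamination
damaged = synthesize_damaged(clean, noise, mask)
```

Outside the masked square the result equals the clean image; inside it equals the noise content.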
Preferably, in step S1, to blend the contaminating noise into the original image more naturally, the present invention applies Gaussian smoothing:

I = I_m * G_σ (2)

In formula (2), I denotes the smoothed damaged image, I_m the directly stitched damaged image, and G_σ a two-dimensional Gaussian kernel with standard deviation σ; * denotes convolution.
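A minimal sketch of the smoothing in formula (2), implemented as a separable Gaussian convolution in plain numpy. The truncation radius of 3σ and the edge padding are assumptions for the sketch, not details fixed by the patent.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of width 2*radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth(image, sigma=1.0):
    """Separable 2-D Gaussian blur (formula (2)), edge-padded."""
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(image, radius, mode="edge")
    # filter rows, then columns; "valid" mode undoes the padding
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, tmp)

blurred = smooth(np.ones((5, 5)), sigma=1.0)
```

Because the kernel is normalized, a constant image passes through unchanged, which is a quick sanity check on the implementation.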
For step S2, the present invention uses an improved ring residual convolution block as the feature extractor, locating the damaged region by enlarging the difference between the valid pixel region and the contaminated region and comparing the intrinsic properties of different regions of the image. The ring residual block comprises three stages; its design borrows the recall-and-consolidation mechanism of the human brain, realized through the propagation and feedback of residuals in a CNN. The first stage is forward residual propagation, which alleviates gradient degradation in deeper networks by recalling the input feature information, defined as:

y_f = F(x, {W_i}) + W_s * x (3)

In formula (3), x denotes the input feature map and y_f the output of residual propagation. F(x, {W_i}) denotes the learned residual mapping, whose structure comprises two convolution layers and an ELU activation function, and W_s is a 1 × 1 convolution. Residual propagation resembles the memory mechanism of the human brain: earlier knowledge may be forgotten as the model learns more new knowledge, so a recall mechanism is needed to help evoke those previously blurred memories.
To further enhance the difference between the attributes of damaged and valid content, the second stage integrates the input feature information using residual feedback. A simple gating mechanism learns the nonlinear relations between distinguishable feature channels, preventing the feature information from diffusing; a response value produced by an activation function is superimposed on the input feature, amplifying the difference in intrinsic image attributes between the noise region and the valid region:

y_b = (s(G(y_f)) + 1) * x (4)

In formula (4), x is the residual mapping feature, y_b the residual feedback feature, G(·) a linear mapping, and s the activation function, here a sigmoid. Unlike the recall mechanism simulated by residual propagation, residual feedback resembles the human brain's process of consolidating knowledge, forming a new understanding of the features. The third stage repeats the operation of the first stage, performing residual propagation on the new features so that the amplified feature differences are learned further. Two forward residual propagations combined with one backward residual feedback form the ring residual structure.
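The three-stage ring (propagate, feed back, propagate again) can be sketched as follows. The learned layers F(x, {W_i}) and G(·) are passed in as plain callables, and W_s is taken as the identity shortcut; all names are stand-ins for the patent's learned convolutions, not its actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ring_residual_block(x, residual_fn, gate_fn):
    """Ring residual structure: propagate -> feedback -> propagate.

    residual_fn plays the role of F(x, {W_i}) (two convs + ELU in the
    patent) and gate_fn the linear mapping G of formula (4); both are
    illustrative stand-ins for learned layers.  W_s is the identity.
    """
    y_f = residual_fn(x) + x                   # formula (3): recall
    y_b = (sigmoid(gate_fn(y_f)) + 1.0) * x    # formula (4): consolidate
    return residual_fn(y_b) + y_b              # second propagation

# With zero residual and zero gate logits, the gate contributes
# sigmoid(0) + 1 = 1.5, so the block scales its input by 1.5.
out = ring_residual_block(np.ones(3), lambda t: 0 * t, lambda t: 0 * t)
```

The gate in formula (4) always outputs a factor in (1, 2), so residual feedback can only amplify, never suppress, the input feature, which matches the stated goal of enlarging region differences.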
For step S3, the present invention introduces an attention mechanism to refine the rough prediction result, attending to similar textures over the whole image and improving recognition at details such as contours. In particular, if a region predicted as damaged with low confidence shares a similar texture with a high-confidence region, the low-confidence prediction should be revised. To this end, key features of the damaged content must be extracted from the high-confidence region to serve as the global visual features of that class. The method computes cosine similarity on the rough prediction mask as a new bias and reduces the score map of the prediction region with Softmax; regions whose scores remain high after the reduction can be regarded as regions with sufficiently distinctive features, so key features can be extracted from them as the global features of the damaged region. The calculation formula is:

X_{i,j} = CosSim(x'^i_sem, x'^j_sem), X ∈ R^{c×c} (5)

In formula (5), CosSim(·) denotes the improved cosine-similarity function, x'_sem the prediction weight matrix, and i and j the prediction categories, which divide into damaged and undamaged regions. X_{i,j} denotes the cosine similarity between the activations of different prediction categories, where x'^i_sem, the i-th channel of x'_sem, indicates for each pixel the prediction result belonging to category i. The closer X_{i,j} is to 1, the more similar the activation results of x'^i_sem and x'^j_sem, and the less trustworthy the prediction at that location. By setting the bias of same-category pixels to 0 and the bias of different-category pixels to the similarity score X_{i,j}, the regions that still maintain high activation values under classification are the key features; the whole process is called key-feature pooling.
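One possible reading of the key-feature pooling step is sketched below: channel-wise cosine similarity supplies a bias that is subtracted from the class scores before a softmax, so only pixels with distinctive activations keep high scores. The exact construction of the penalty term is an assumption for illustration, not the patent's precise procedure.

```python
import numpy as np

def key_feature_pool(x_sem):
    """Suppress ambiguous predictions via a cosine-similarity bias.

    x_sem: (c, n) per-class activation maps flattened over n pixels.
    ASSUMPTION: the inter-class similarity of formula (5) is summed
    into a per-class penalty; the patent may combine it differently.
    """
    norm = x_sem / (np.linalg.norm(x_sem, axis=1, keepdims=True) + 1e-8)
    cos = norm @ norm.T                        # formula (5): X in R^{c x c}
    bias = cos - np.diag(np.diag(cos))         # same class -> 0
    penalty = bias.sum(axis=1, keepdims=True)  # per-class ambiguity
    logits = x_sem - penalty                   # reduce the score map
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)    # softmax over classes

# Two classes, three pixels; pixel 2 activates both classes equally.
x_sem = np.array([[2.0, 0.0, 1.0],
                  [0.0, 2.0, 1.0]])
scores = key_feature_pool(x_sem)
```

Pixels whose activation is distinctive for one class (the first two columns) keep a clearly dominant score after pooling; the ambiguous third pixel does not.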
Preferably, in step S3, the present invention computes a weighted sum of the prediction weight matrix x'_sem and the feature map x_f to obtain the key feature v_k:

v_k = Σ_i x'^i_sem ⊙ x_f (6)

where i denotes the prediction category. The merged key feature v_k is taken as the Key and the feature x_f as the Query; regions similar to the key feature v_k are highlighted to obtain an attention map, which is convolved with the original image to predict the final refined prediction mask.
For step S4, the present invention introduces a gated convolution mechanism to improve the residual convolution blocks, learning to identify damaged regions and dynamically selecting the valid pixel content of the image, so that the convolution result depends only on valid pixels; it replaces the conventional residual convolution structure for feature extraction and integration over valid regions. The output of the gated convolution is calculated as:

O_{y,x} = φ(W_f · I) ⊙ σ(W_g · I) (7)

In formula (7), I denotes the input feature, W_g and W_f two different convolution kernels, φ the LeakyReLU activation function, and σ the sigmoid function, which restricts all gating values to [0, 1] to indicate the importance of each local region; ⊙ denotes element-wise multiplication, and O_{y,x} the output feature weighted by the soft gate.
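A minimal sketch of the gated-convolution output in formula (7). To keep it short, the two "convolutions" are 1 × 1 channel-mixing matrix multiplications; a real gated convolution uses spatial kernels, and all names here are illustrative.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_conv(features, w_feat, w_gate):
    """O = LeakyReLU(W_f . I) * sigmoid(W_g . I)  (formula (7)).

    features: (c, h, w); w_feat, w_gate: (c_out, c).  The sigmoid gate
    soft-selects which spatial positions contribute to the output.
    """
    c, h, w = features.shape
    flat = features.reshape(c, -1)
    feat = leaky_relu(w_feat @ flat)
    gate = sigmoid(w_gate @ flat)      # soft mask in [0, 1]
    return (feat * gate).reshape(-1, h, w)

features = np.ones((1, 2, 2))
w_feat = np.array([[1.0]])
out_open = gated_conv(features, w_feat, np.array([[100.0]]))    # gate ~ 1
out_closed = gated_conv(features, w_feat, np.array([[-100.0]])) # gate ~ 0
```

Driving the gate logits strongly positive passes the feature through essentially unchanged; driving them strongly negative suppresses it, which is how the block ignores damaged pixels.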
Preferably, in step S4, to avoid errors accumulated in the prediction mask affecting the image restoration result, the invention applies a new Probability Context Normalization (PCN) at the end of the improved residual block to transfer statistical information, propagating statistics such as the mean and variance of the valid pixel region to the damaged region so that the feature distributions inside and outside the region stay consistent:

PCN(X) = [β · T(X_P, X_Q) + (1 − β) · X_P] ⊙ H + X ⊙ (1 − H) (8)

In formula (8), X denotes the output of the last convolution layer in the gated residual block, H the prediction mask downsampled to the same size as X, and β a learnable channel attention weight; T(·, ·) denotes the information transfer, defined as:

T(X_P, X_Q) = σ(X_Q) · (X_P − μ(X_P)) / σ(X_P) + μ(X_Q) (9)

In formula (9), X_P and X_Q denote the contaminated region and the valid pixel region respectively, μ(·) the regional mean, and σ(·) the regional standard deviation. For an image, the feature mean relates to the global semantics, while the variance relates to the local texture features.
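A sketch of the statistic transfer: the damaged region's features are renormalized with the valid region's mean and standard deviation, then blended back under the mask. β is a learnable channel weight in the network; here it is a fixed scalar, and all function names are illustrative.

```python
import numpy as np

def transfer_stats(x_damaged, x_valid, eps=1e-6):
    """Formula (9): give the damaged-region features the valid
    region's mean and standard deviation."""
    mu_p, std_p = x_damaged.mean(), x_damaged.std()
    mu_q, std_q = x_valid.mean(), x_valid.std()
    return std_q * (x_damaged - mu_p) / (std_p + eps) + mu_q

def pcn(x, mask, beta=0.5):
    """Probability Context Normalization sketch (formula (8)).

    Blends transferred statistics into the damaged region (mask == 1)
    with weight beta; valid pixels pass through unchanged.
    ASSUMPTION: beta is a scalar here; the patent learns it per channel.
    """
    out = x.copy()
    p, q = x[mask == 1], x[mask == 0]
    out[mask == 1] = beta * transfer_stats(p, q) + (1 - beta) * p
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 8))
mask = np.zeros((8, 8))
mask[2:5, 2:5] = 1
out = pcn(x, mask, beta=1.0)
```

With β = 1 the damaged region's mean matches the valid region's mean exactly, which is the "consistent distribution inside and outside the region" property the patent aims for.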
For step S5, the present invention obtains image context information with a multi-scale context attention aggregation branch. Context similarity is measured by the cosine similarity between patches inside and outside the missing region: for each patch of the region to be completed, the most similar content in the valid region is found and given a higher reference weight, so that the completed content stays consistent with its context in both semantics and texture. The similarity metric is:

sim(p_i, p_j) = ⟨ p_i / ‖p_i‖, p_j / ‖p_j‖ ⟩ (10)

In formula (10), p_i and p_j denote feature patches of the valid and missing regions respectively. The attention score of each patch is then obtained through a softmax function:

s_{i,j} = exp(sim(p_i, p_j)) / Σ_{k=1}^{N} exp(sim(p_k, p_j)) (11)

In formula (11), N denotes the number of patches into which the valid region is divided. Through this calculation, each patch in the missing region finds the region of valid pixels most worth attending to, and a higher reference weight is assigned during feature fusion.
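The scoring in formulas (10) and (11) can be sketched as follows: normalize the flattened patches, take inner products, and softmax over the valid patches for each missing patch. Names and toy data are illustrative.

```python
import numpy as np

def attention_scores(valid_patches, missing_patches):
    """Cosine similarity (formula (10)) softmaxed over valid patches
    (formula (11)).

    valid_patches: (N, d), missing_patches: (M, d) flattened patches.
    Returns an (M, N) score matrix whose rows sum to 1.
    """
    v = valid_patches / (np.linalg.norm(valid_patches, axis=1, keepdims=True) + 1e-8)
    m = missing_patches / (np.linalg.norm(missing_patches, axis=1, keepdims=True) + 1e-8)
    sim = m @ v.T                               # formula (10)
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)     # formula (11)

valid = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
missing = np.array([[2.0, 0.0]])   # same direction as valid patch 0
s = attention_scores(valid, missing)
```

Because cosine similarity ignores magnitude, the missing patch `[2, 0]` attends most strongly to the valid patch `[1, 0]`, its co-directional match.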
Preferably, in step S5, to reduce the amount of computation and increase inference speed, the invention propagates the inter-patch attention similarity scores by context information transfer. Specifically, the similarity scores are computed once on the deep feature map of size 32 × 32 and then propagated by context attention transfer to shallower layers of different scales for feature weighting:

p̂^l_j = Σ_{i=1}^{N} s_{i,j} · p^l_i (12)

In formula (12), l denotes the different shallow network layers, p^l_j the missing-region patch at a given scale, p^l_i the valid-region patch of corresponding size, s_{i,j} the attention score, and N the number of patches in the background. Since the feature-map size varies across the hierarchy, the patch size must also vary accordingly; concretely, the mapping region is enlarged by comparing the current feature size with the attention score map, so that, for example, every four neighboring pixels in a 128 × 128 feature map share an attention score value. Through this score-sharing scheme, the model's inference result not only achieves better global semantic consistency but also markedly improves storage and computation efficiency.
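The score-sharing step amounts to a nearest-neighbour upsampling of the coarse attention map, so that a block of neighbouring pixels at a finer scale reuses one coarse score value. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def share_attention_scores(scores, factor):
    """Propagate a coarse attention-score map to a finer level by
    nearest-neighbour repetition (context attention transfer sketch):
    each `factor x factor` block at the fine scale reuses one coarse
    score value, so the similarity is computed only once."""
    return np.repeat(np.repeat(scores, factor, axis=0), factor, axis=1)

coarse = np.array([[0.1, 0.9],
                   [0.7, 0.3]])
fine = share_attention_scores(coarse, 2)  # 4x4 map of shared scores
```

Scaling a 32 × 32 score map up to a 128 × 128 feature map would use `factor=4` in the same way; only one similarity computation is ever performed.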
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210574618.3A CN114897738A (en) | 2022-05-25 | 2022-05-25 | Image blind restoration method based on semantic inconsistency detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114897738A true CN114897738A (en) | 2022-08-12 |
Family
ID=82725567
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210574618.3A Pending CN114897738A (en) | 2022-05-25 | 2022-05-25 | Image blind restoration method based on semantic inconsistency detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114897738A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942439A (en) * | 2019-12-05 | 2020-03-31 | 北京华恒盛世科技有限公司 | Image restoration and enhancement method based on satellite picture defects |
CN110942439B (en) * | 2019-12-05 | 2023-09-19 | 北京华恒盛世科技有限公司 | Image restoration and enhancement method based on satellite picture defects |
US20230130772A1 (en) * | 2021-10-22 | 2023-04-27 | Suresoft Technologies Inc. | Method for Selecting the Last Patch from Among a Plurality Patches for Same Location and the Last Patch Selection Module |
US11822915B2 (en) * | 2021-10-22 | 2023-11-21 | Suresoft Technologies Inc. | Method for selecting the last patch from among a plurality patches for same location and the last patch selection module |
CN116705642A (en) * | 2023-08-02 | 2023-09-05 | 西安邮电大学 | Method and system for detecting silver plating defect of semiconductor lead frame and electronic equipment |
CN116705642B (en) * | 2023-08-02 | 2024-01-19 | 西安邮电大学 | Method and system for detecting silver plating defect of semiconductor lead frame and electronic equipment |
CN117376632A (en) * | 2023-12-06 | 2024-01-09 | 中国信息通信研究院 | Data recovery method and system based on intelligent depth synthesis |
CN117376632B (en) * | 2023-12-06 | 2024-02-06 | 中国信息通信研究院 | Data recovery method and system based on intelligent depth synthesis |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||