CN114897738A - Image blind restoration method based on semantic inconsistency detection - Google Patents

Image blind restoration method based on semantic inconsistency detection

Info

Publication number
CN114897738A
Authority
CN
China
Prior art keywords
image
damaged
region
prediction
residual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210574618.3A
Other languages
Chinese (zh)
Inventor
李昕
王志宽
刘航源
孙百乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China filed Critical China University of Petroleum East China
Priority to CN202210574618.3A priority Critical patent/CN114897738A/en
Publication of CN114897738A publication Critical patent/CN114897738A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/32 Indexing scheme for image data processing or generation, in general involving image mosaicing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image blind restoration method based on semantic inconsistency detection, which comprises the following steps: preprocessing a noise-polluted image as input; amplifying the semantic difference between contaminated regions and the background through a mask prediction network built from ring residual blocks, and coarsely locating the degraded regions in the polluted image; then, exploiting the texture similarity among regions of different classes, obtaining a fine prediction mask through a mask refinement network; feeding the damaged image together with the prediction mask into an image restoration network, which iteratively uses information from the valid region to complete the content of the damaged region according to the confidence encoded in the mask; meanwhile, improving structural consistency with a contextual attention aggregation module applied at different scales; and fusing the multi-scale feature information and decoding it back into an image, thereby achieving blind repair of the degraded image. The method can accurately detect noise pollution in real damaged images and meets the requirement of robust restoration for various kinds of degraded images.

Description

Image blind restoration method based on semantic inconsistency detection
Technical Field
The invention belongs to the field of computer graphics and image processing, and relates to an image blind restoration method based on semantic inconsistency detection.
Background
With the development of computer technology and multimedia technology, digital images have become important information carriers. Over time and under adverse conditions, preserved photographs may suffer various kinds of degradation, such as ink contamination, creases and tears, mildew and fading; in addition, accidents at capture time, such as unexpected objects intruding into the frame or stains on the camera lens, can also spoil the recorded moment. These various degradations greatly affect the expression of image content. Therefore, image restoration techniques, which restore image content and improve image quality, have developed rapidly in recent years and are widely used in image editing, object removal, biomedical image processing, criminal investigation, and other fields. Image restoration has accumulated many important research results over years of development; widely used processing tools, such as the Photoshop repair tools, apply traditional restoration methods that exploit the redundancy of image information to fill damaged areas with pixels from known regions. Such methods can produce good results on scene images with repetitive textures, but cannot generate new content because they lack an understanding of image semantics.
As a major research focus in computer vision, deep learning methods have in recent years been introduced into the field of image restoration. Although these models can infer missing content from the provided valid pixels, they all treat blank content in the image as the damaged region and explicitly require a binary mask for calibration. Such methods can train a model to infer the content of the missing region; however, the degradation mode and location of a damaged image in real life are often unknown, and it is difficult to provide an accurate mask in advance to indicate the region to be repaired, which greatly limits the deployment of these methods in real scenes. Therefore, how to identify and repair the damaged content given only the damaged image itself has become a pressing problem.
Disclosure of Invention
In order to overcome the above defects, the invention provides an image blind restoration method based on semantic inconsistency detection, comprising the following specific steps:
S1, inputting a damaged image $I_m$ comprising a clean pixel region and a contaminated pixel region;
S2, constructing a mask prediction network through multiple layers of residual blocks, and generating a single-channel coarse prediction soft mask $\hat{M}_{coarse}$ for locating the damaged region;
S3, feeding the coarse prediction mask obtained in S2 together with the damaged image into a mask refinement network, improving the prediction accuracy at boundaries and other fine-detail regions, and obtaining a fine damaged-region prediction mask $\hat{M}_{fine}$;
S4, inputting the fine prediction mask obtained in S3 as prior information together with the damaged image into a shared encoder, which extracts features from valid pixels under the guidance of the mask and propagates them to the damaged region;
S5, feeding the deep feature map extracted by the encoder network into multi-task parallel decoding branches, inferring the content of the missing region through multiple layers of convolution blocks, and using context information to ensure global semantic consistency;
S6, fusing the features extracted by the different branches in S5 and decoding them into an image through a decoder network;
and S7, using the fine prediction mask from S3, cutting out the pixels at the damaged-region positions in the S6 result, splicing them with the valid pixels of the damaged image, and outputting the final repaired image.
The technical scheme of the invention is characterized by comprising the following steps:
Regarding step S1, the invention first defines the damaged image. Unlike prior work that simply uses blank pixels to represent the region to be repaired, the invention considers that a damaged image should consist of clean valid pixels and various types of degraded and contaminated pixels. Because no dataset dedicated to blind-restoration research currently exists, batches of training data are first synthesized along this line for model training; the mathematical expression is:
$$I_m = I_{gt} \odot (1 - M) + N \odot M \quad (1)$$

In formula (1), $I_m$ denotes the stitched damaged image, $I_{gt}$ a completely clean image, $N$ the contaminating noise content, and $M$ the binary mask. To improve the robustness of the method, $N$ simulates graffiti, creases, text occlusion, randomly cropped content from other images, etc., which are spliced onto $I_{gt}$ to generate a damaged image $I_m$ containing multiple types of contamination and degradation.
Preferably, in step S1, in order to blend the contamination noise into the original image more naturally, a Gaussian smoothing function is applied, with the following formula:

$$I = I_m * G_\sigma \quad (2)$$

In formula (2), $I$ denotes the smoothed damaged image, $I_m$ the directly stitched damaged image, and $G_\sigma$ a two-dimensional Gaussian kernel with standard deviation $\sigma$.
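As an illustration of formulas (1) and (2), the following minimal Python sketch pastes noise content under a binary mask and then smooths the result; the patent specifies no implementation, so all function names, shapes, and the value of sigma are our own assumptions:

```python
# Hedged sketch of the training-data synthesis of formulas (1)-(2).
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_damaged(clean, noise, mask, sigma=1.0):
    """clean, noise: HxWx3 float arrays in [0, 1]; mask: HxW binary array.

    Formula (1): I_m = I_gt * (1 - M) + N * M, followed by
    formula (2): I = I_m * G_sigma (channel-wise Gaussian smoothing).
    """
    m = mask[..., None].astype(np.float32)
    damaged = clean * (1.0 - m) + noise * m                       # formula (1)
    smoothed = np.stack([gaussian_filter(damaged[..., c], sigma)  # formula (2)
                         for c in range(3)], axis=-1)
    return smoothed
```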
Regarding step S2, the invention uses improved ring residual convolution blocks as the feature extractor, locating the damaged region by amplifying the difference between the valid pixel region and the contaminated region and comparing the intrinsic properties of different image regions. The ring residual block comprises three stages and borrows a recall-and-consolidation mechanism from the human brain, realized through the propagation and feedback of residuals in a CNN. The first stage is forward residual propagation, which alleviates gradient degradation in deeper networks by recalling input feature information; it can be formulated as:

$$y_f = F(x, \{W_i\}) + W_s * x \quad (3)$$

In formula (3), $x$ denotes the input feature map and $y_f$ the output of forward residual propagation. $F(x, \{W_i\})$ is the learned residual mapping, whose structure comprises two convolution layers and an ELU activation function; $W_s$ is a $1 \times 1$ convolution. Residual propagation resembles the memory mechanism of the human brain: previous knowledge may be forgotten as the model learns more new knowledge, so a recall mechanism is needed to help evoke those faded memories.
To further enhance the difference between damaged and valid content attributes, the second stage integrates the input feature information using residual feedback. A simple gating mechanism learns the nonlinear relations between distinguishable feature channels, avoiding the diffusion of feature information; a response value is superposed on the input feature through an activation function, amplifying the difference in intrinsic image attributes between the noise region and the valid region. The formula is:

$$y_b = (s(G(y_f)) + 1) * x \quad (4)$$

In formula (4), $x$ is the residual-mapped feature, $y_b$ the residual feedback feature, $G(\cdot)$ a linear mapping, and $s$ the activation function, here sigmoid. Unlike the recall mechanism simulated by residual propagation, residual feedback mimics the process by which the human brain consolidates knowledge, forming a new understanding of the features. The third stage repeats the operation of the first stage, performing residual propagation on the new features to further learn the amplified feature differences. Two forward residual propagations combined with one backward residual feedback form the ring residual structure.
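The three-stage structure can be sketched in PyTorch as follows; the channel width, kernel sizes, and the reuse of a single 1×1 shortcut for both propagations are illustrative assumptions, not the patent's exact design:

```python
import torch
import torch.nn as nn

class RingResidualBlock(nn.Module):
    """Hedged sketch: propagation (3) -> gated feedback (4) -> propagation."""
    def __init__(self, ch):
        super().__init__()
        def residual_mapping():          # F(x, {W_i}): two convs + ELU
            return nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ELU(),
                nn.Conv2d(ch, ch, 3, padding=1))
        self.f1 = residual_mapping()
        self.f2 = residual_mapping()
        self.w_s = nn.Conv2d(ch, ch, 1)  # W_s, 1x1 shortcut convolution
        self.g = nn.Conv2d(ch, ch, 1)    # linear mapping G(.) of the gate

    def forward(self, x):
        y_f = self.f1(x) + self.w_s(x)                 # formula (3)
        y_b = (torch.sigmoid(self.g(y_f)) + 1.0) * x   # formula (4)
        return self.f2(y_b) + self.w_s(y_b)            # repeat stage one
```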
Regarding step S3, the invention introduces an attention mechanism to refine the coarse prediction mask, improving the recognition result at details such as contours by attending to similar textures over the whole image. In particular, if a low-confidence region predicted as damaged shares a similar texture with a high-confidence region, the low-confidence prediction should be revised. To this end, key features of the damaged content are extracted from the high-confidence region to serve as the global visual features of that class. The method computes cosine similarity on the coarse prediction mask as a new bias and uses Softmax to suppress the score map of the predicted region; regions whose scores remain high after suppression can be regarded as sufficiently distinctive, so key features are extracted from them as the global features of the damaged region. The calculation is:
$$\mathrm{CosSim}(x'_{sem}) = X \in \mathbb{R}^{c \times c}, \qquad X_{i,j} = \frac{x'^{(i)}_{sem} \cdot x'^{(j)}_{sem}}{\|x'^{(i)}_{sem}\| \, \|x'^{(j)}_{sem}\|} \quad (5)$$

In formula (5), $\mathrm{CosSim}(\cdot)$ denotes the modified cosine-similarity function and $x'_{sem}$ the prediction weight matrix; $i$ and $j$ index the prediction classes, which divide into damaged and non-damaged regions. $X_{i,j}$ is the cosine similarity between the activations of different prediction classes, where $x'^{(i)}_{sem}$ is the $i$-th channel of $x'_{sem}$ and indicates, for each pixel, the prediction score for class $i$. The closer $X_{i,j}$ is to 1, the more similar the activation results of $x'^{(i)}_{sem}$ and $x'^{(j)}_{sem}$, and the less trustworthy the prediction at that location. The bias for same-class pixels is set to 0 and that for different-class pixels to the similarity score $X_{i,j}$; the regions that still maintain high activation values in the classification are then the key features, and the whole process is called key-feature pooling.
Preferably, in step S3, the prediction weight matrix $x'_{sem}$ and the feature map $x_f$ are combined in a weighted sum to obtain the key feature $v_k$:

$$v_k = \sum x'^{(i)}_{sem} \odot x_f \quad (6)$$

where $i$ denotes the prediction class and the sum runs over spatial positions. Taking the key feature $v_k$ as Key and the feature $x_f$ as Query, an attention map is obtained that highlights the regions similar to $v_k$; a convolution with the original features then predicts the final refined prediction mask $\hat{M}_{fine}$.
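A hedged sketch of key-feature pooling, formulas (5) and (6), follows; the way the cross-class similarity penalizes the class scores before Softmax is our reading of the text, and all tensor shapes and names are assumptions:

```python
import torch
import torch.nn.functional as F

def key_feature_pooling(x_sem, x_feat):
    """x_sem: (B, C, H, W) per-class prediction weights; x_feat: (B, D, H, W)."""
    B, C, H, W = x_sem.shape
    flat = F.normalize(x_sem.flatten(2), dim=-1)       # unit-norm class maps
    X = flat @ flat.transpose(1, 2)                    # (B, C, C), formula (5)
    bias = X * (1.0 - torch.eye(C, device=X.device))   # 0 for the same class
    penalty = bias.sum(-1).view(B, C, 1, 1)            # cross-class similarity
    score = torch.softmax(x_sem - penalty, dim=1)      # suppressed score map
    # formula (6): weighted sum of features -> one key feature v_k per class
    v_k = score.flatten(2) @ x_feat.flatten(2).transpose(1, 2)   # (B, C, D)
    return v_k
```

The returned key feature then serves as the Key against which the full feature map is queried to build the attention map.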
Regarding step S4, the invention introduces a gated convolution mechanism to improve the residual convolution blocks, identifying damaged regions through learning and dynamically selecting the valid pixel content in the image, so that the convolution result depends only on valid pixels; it replaces the conventional residual convolution structure for feature extraction and integration of the valid region. The output of the gated convolution is computed as:
$$Gating_{y,x} = \sum\sum W_g \cdot I, \qquad Feature_{y,x} = \sum\sum W_f \cdot I, \qquad O_{y,x} = \phi(Feature_{y,x}) \odot \sigma(Gating_{y,x}) \quad (7)$$

In formula (7), $I$ denotes the input feature, $W_g$ and $W_f$ two different convolution kernels, $\phi$ the LeakyReLU activation function, and $\sigma$ the sigmoid function, which restricts all values to $[0, 1]$ to indicate the importance of each local region; $\odot$ denotes element-wise multiplication and $O_{y,x}$ the soft-gated output feature.
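Formula (7) corresponds to the following compact PyTorch sketch; the kernel size, stride, and negative slope are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv2d(nn.Module):
    """Hedged sketch of formula (7): O = phi(W_f * I) . sigma(W_g * I)."""
    def __init__(self, cin, cout, k=3, stride=1):
        super().__init__()
        self.w_f = nn.Conv2d(cin, cout, k, stride, k // 2)  # feature branch
        self.w_g = nn.Conv2d(cin, cout, k, stride, k // 2)  # gating branch

    def forward(self, x):
        feat = F.leaky_relu(self.w_f(x), 0.2)   # phi: LeakyReLU
        gate = torch.sigmoid(self.w_g(x))       # sigma: values in [0, 1]
        return feat * gate                      # element-wise soft gating
```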
Preferably, in step S4, in order to avoid error accumulation of the prediction mask affecting the image restoration result, the invention applies a new Probability Context Normalization (PCN) at the end of the improved residual block to transfer statistics, propagating statistical information such as the mean and variance of the valid pixel region to the damaged region and ensuring that the feature distributions of the inner and outer regions are consistent. It is written as:
$$\mathrm{PCN}(X) = \beta \, (X \circledast H) + (1 - \beta) \, X \quad (8)$$

In formula (8), $X$ denotes the output of the last convolution layer in the gated residual block, $H$ the prediction mask $\hat{M}_{fine}$ sampled to the same size as $X$, and $\beta$ a learnable channel attention weight; $\circledast$ denotes the information transfer, specifically:

$$X \circledast H = \mathcal{T}(X_P, X_Q) = \sigma(X_Q) \, \frac{X_P - \mu(X_P)}{\sigma(X_P)} + \mu(X_Q) \quad (9)$$

In formula (9), $X_P$ and $X_Q$ denote the contaminated region and the valid pixel region respectively, $\mu(\cdot)$ the region mean, and $\sigma(\cdot)$ the region variance. For an image, the feature mean relates to global semantics, while the variance relates to local texture features.
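A hedged sketch of the statistics transfer of formulas (8) and (9); the masked mean/variance computation and the blending with beta follow the reconstruction above and are assumptions:

```python
import torch

def pcn(X, H, beta, eps=1e-5):
    """X: (B, C, h, w) features; H: (B, 1, h, w) mask (1 = damaged);
    beta: (1, C, 1, 1) learnable channel attention weight."""
    valid = 1.0 - H
    def region_stats(m):                       # masked per-channel mean / std
        n = m.sum((2, 3), keepdim=True).clamp(min=eps)
        mu = (X * m).sum((2, 3), keepdim=True) / n
        var = ((X - mu) ** 2 * m).sum((2, 3), keepdim=True) / n
        return mu, var.clamp(min=eps).sqrt()
    mu_p, sd_p = region_stats(H)               # damaged region X_P
    mu_q, sd_q = region_stats(valid)           # valid region X_Q
    transferred = sd_q * (X - mu_p) / sd_p + mu_q          # formula (9)
    x_star = transferred * H + X * valid       # transfer only inside the mask
    return beta * x_star + (1.0 - beta) * X    # formula (8), reconstructed
```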
Regarding step S5, the invention obtains image context information with a multi-scale contextual attention aggregation branch. The context similarity is measured by the cosine similarity between patches inside and outside the missing region: for each patch of the region to be completed, the most similar content is found in the valid region and given a higher reference weight, so that the completed content remains consistent with its context in both semantics and texture. The similarity metric is:
$$s_{i,j} = \left\langle \frac{p_i}{\|p_i\|_2}, \frac{p_j}{\|p_j\|_2} \right\rangle \quad (10)$$

In formula (10), $p_i$ and $p_j$ denote feature patches of the valid region and the missing region respectively; the attention score of each patch is then obtained through a softmax function:

$$s'_{i,j} = \frac{\exp(s_{i,j})}{\sum_{i=1}^{N} \exp(s_{i,j})} \quad (11)$$

where $N$ denotes the number of patches into which the valid region is divided. Through this computation, each patch in the missing region finds the regions of the valid pixels most worth attending to, and these are given higher reference weights in the feature fusion.
Preferably, in step S5, to reduce computation and increase inference speed, the invention propagates the inter-patch attention similarity scores by context information transfer: the similarity scores are computed once on the deep feature map of size $32 \times 32$, and the attention scores are then propagated to shallower layers of different scales by contextual attention transfer for feature weighting, as follows:
$$\tilde{p}^{\,l}_j = \sum_{i=1}^{N} s'_{i,j} \, p^{\,l}_i \quad (12)$$

In formula (12), $l$ indexes the different shallow network layers, $\tilde{p}^{\,l}_j$ denotes the missing-region patch at the corresponding scale, $p^{\,l}_i$ the valid-region patch of matching size, $s'_{i,j}$ the attention score, and $N$ the number of patches in the background. Since the feature map size changes between levels, the size of a patch must change accordingly; specifically, the mapping region is enlarged according to the ratio between the current feature map and the attention score map, so that, for example, every four neighboring pixels in a $128 \times 128$ feature map share one attention score value. Through this score-sharing scheme, the model's inference result not only attains better global semantic consistency but also markedly improves memory and computation efficiency.
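The following sketch computes the attention scores once at 32 × 32, formulas (10)-(11), and re-applies them at a larger scale via score sharing, formula (12); the masking scheme, the 1 × 1 patch size at the score level, and the unfold/fold bookkeeping are assumptions:

```python
import torch
import torch.nn.functional as F

def contextual_attention_transfer(feat32, mask32, feat_l):
    """feat32: (B, C, 32, 32); mask32: (B, 1, 32, 32), 1 = missing;
    feat_l: (B, C_l, H_l, W_l) shallower feature map to be re-weighted."""
    B, C, H, W = feat32.shape
    N = H * W
    p = F.normalize(feat32.flatten(2), dim=1)              # unit-norm patches
    sim = p.transpose(1, 2) @ p                            # s_{i,j}, formula (10)
    valid = mask32.flatten(2) < 0.5                        # (B, 1, N) valid flags
    sim = sim.masked_fill(~valid.expand(-1, N, -1), -1e9)  # attend to valid only
    score = torch.softmax(sim, dim=-1)                     # formula (11)
    # formula (12): every r x r pixel block at scale l shares one score
    Bl, Cl, Hl, Wl = feat_l.shape
    r = Hl // H
    v = F.unfold(feat_l, r, stride=r)                      # (B, C_l*r*r, N)
    out = v @ score.transpose(1, 2)                        # weighted aggregation
    return F.fold(out, (Hl, Wl), r, stride=r)
```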
The image blind restoration method based on semantic inconsistency detection solves problems the prior art cannot: restoring damaged images with multiple degradation modes in real scenes, where a calibration mask is difficult to obtain directly. It has the following advantages:
(1) Compared with existing restoration methods, this method designs an end-to-end network model that needs no mask calibrating the damaged region: it automatically identifies contaminated and damaged regions in the image and produces semantically consistent, visually complete results, repairing diverse damage modes in real images with robustness and authenticity.
(2) The method can easily be extended to other image-processing research fields, such as object removal, highlight removal, image deraining and dehazing, and exhibits good transferability and applicability.
Drawings
FIG. 1 is a flow chart of blind image restoration based on semantic inconsistency detection according to the present invention.
FIG. 2 is a schematic diagram of a prediction mask refinement module according to the present invention.
FIG. 3 is a diagram illustrating a structure of a probabilistic context aggregation convolutional block according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 shows the flow of the image blind restoration method based on semantic inconsistency detection of the present invention; the method comprises:
S1, data preprocessing: a damaged image $I_m$ with noise contamination is read, its size is uniformly adjusted to $256 \times 256$, and it is normalized before being input into the network model. In the training stage, damaged images are synthesized by simulating the various degradation modes of real scenes, and an additional Gaussian smoothing operation is applied to make the images more realistic and natural.
S2, coarse prediction of the damaged region: the processed degraded image is input into a coarse mask prediction network constructed from six layers of ring residual blocks; the whole structure is an encoder-decoder network. Convolution integrates image context information to learn intrinsic image attributes, and the alternating computation of residual propagation and residual feedback in the ring structure amplifies the difference between the valid pixel region and the damaged region, generating a single-channel coarse damaged-region prediction mask $\hat{M}_{coarse}$.
When the loss is computed in the training stage, since each position only needs to be judged as belonging to the valid region or the damaged region, binary cross-entropy is used as the loss function, expressed as:

$$\mathcal{L}_{bce} = -\,T \sum_{p \in \{p \mid M_p = 1\}} \log \hat{M}_p \; - \sum_{q \in \{q \mid M_q = 0\}} \log\big(1 - \hat{M}_q\big) \quad (13)$$

In formula (13), $T$ is an adaptive weight, $p \in \{p \mid M_p = 1\}$ indexes the real damaged region, and $q \in \{q \mid M_q = 0\}$ indexes the real valid region.
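A hedged sketch of the loss of formula (13); taking the adaptive weight T as the valid-to-damaged pixel ratio is our assumption, since the patent does not define it:

```python
import torch

def mask_bce_loss(pred, target, eps=1e-7):
    """pred: predicted soft mask in (0, 1); target: ground-truth binary mask M."""
    pos, neg = target, 1.0 - target
    T = neg.sum().clamp(min=1.0) / pos.sum().clamp(min=1.0)  # adaptive weight
    pred = pred.clamp(eps, 1.0 - eps)
    loss_pos = -(pos * torch.log(pred)).sum() / pos.sum().clamp(min=1.0)
    loss_neg = -(neg * torch.log(1.0 - pred)).sum() / neg.sum().clamp(min=1.0)
    return T * loss_pos + loss_neg                           # formula (13)
```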
S3, refinement of the prediction mask: the coarse prediction mask generated in S2 and the damaged image are input into the mask refinement network. As shown in FIG. 2, image features are first extracted by a simple encoder, the cosine similarity between pixels predicted as different classes is computed, and a softmax function limits the values to $[0, 1]$; the closer a value is to 1, the less reliable the predicted class of that region. Key features of high-confidence damaged regions are screened out as Key, the overall image feature Query is traversed in the query fashion of the attention mechanism to obtain global attention weights, and finally the updated feature information is integrated through deconvolution and restored to image resolution, yielding a refined prediction mask $\hat{M}_{fine}$ with clearer and more accurate detail contours.
And S4, content feature extraction: the damaged image is input into the encoder; to avoid the influence of accumulated mask-prediction errors, the refined prediction mask is simultaneously scaled to the same size as the feature map and fed into every layer of the encoder, guiding the extraction of valid pixel information and its propagation to the damaged region. The encoder consists of four layers of the newly designed gated residual convolution blocks, whose structure is shown in FIG. 3: the outputs of two task-specific standard convolution layers are multiplied element by element, one followed by a LeakyReLU function and the other by a sigmoid function, so that a soft mask is automatically learned and updated from the input in a learnable way, restricting the convolution operation to the valid pixel region. In addition, probability context normalization is chosen instead of batch normalization to transfer image statistics and keep the feature distributions inside and outside the mask consistent.
S5, inference of missing-region content: the invention proposes a multi-task parallel framework with two parallel decoding branches for feature reasoning and content propagation. As shown in FIG. 1, the upper branch consists of multiple layers of dilated convolutions with dilation rates 2, 4 and 8, enlarging the receptive field at different rates to capture multi-scale context information; the lower branch uses the multi-scale contextual attention aggregation module, computing attention scores between patches on the deep $32 \times 32$ feature map and weighting the shallow features of different scales through the contextual attention transfer module, ensuring global structural and semantic consistency of the features.
And S6, feature decoding and image restoration: the feature maps extracted by the different branches in S5 are concatenated along the channel dimension and input into the decoder network for decoding. The decoder is designed symmetrically to the encoder, alternately fusing four layers of gated residual convolution blocks with upsampled features, and finally restoring the predicted repaired image through one layer of ordinary $3 \times 3$ convolution;
and S7, outputting a final repairing result, selecting effective contents of the input image and contents of the predicting result by using the predicting mask to splice in order to ensure that the result is clearer, and outputting a clean repairing result with complete structure and consistent semantics through smoothing.
In conclusion, the image blind restoration method based on semantic inconsistency detection is suitable for repairing genuinely damaged images from real life. It requires no additionally provided binary mask marking the damaged region, achieves high-quality restoration of degraded images through an end-to-end network, ensures that the repair result has visual integrity and structural rationality, robustly handles various kinds of image degradation and contamination in different real scenes, and has broad application value.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (2)

1. An image blind restoration method based on semantic inconsistency detection is characterized by comprising the following specific steps:
S1, inputting a damaged image $I_m$ comprising a clean pixel region and a damaged pixel region;
S2, constructing a mask prediction network through multiple layers of residual blocks, and generating a single-channel coarse prediction soft mask $\hat{M}_{coarse}$ for locating the damaged region;
S3, feeding the coarse prediction mask obtained in S2 together with the damaged image into a mask refinement network, improving the prediction accuracy at boundaries and other fine-detail regions, and obtaining a fine damaged-region prediction mask $\hat{M}_{fine}$;
S4, inputting the fine prediction mask obtained in S3 as prior information together with the damaged image into a shared encoder, which extracts features from valid pixels under the guidance of the mask and propagates them to the damaged region;
S5, feeding the deep feature map extracted by the encoder network into multi-task parallel decoding branches, inferring the content of the missing region through multiple layers of convolution blocks, and using context information to ensure global semantic consistency;
S6, fusing the features extracted by the different branches in S5 and decoding them into an image through a decoder network;
and S7, using the fine prediction mask from S3, cutting out the pixels at the damaged-region positions in the S6 result, splicing them with the valid pixels of the damaged image, and outputting the final repaired image.
2. The image blind restoration method based on semantic inconsistency detection according to claim 1, wherein, regarding step S1, the invention first defines the damaged image. Unlike prior work that simply uses blank pixels to represent the region to be repaired, the invention considers that a damaged image should consist of clean valid pixels and various types of degraded and contaminated pixels. Because no dataset dedicated to blind-restoration research currently exists, batches of training data are first synthesized along this line for model training; the mathematical expression is:

$$I_m = I_{gt} \odot (1 - M) + N \odot M \quad (1)$$

In formula (1), $I_m$ denotes the stitched damaged image, $I_{gt}$ a completely clean image, $N$ the contaminating noise content, and $M$ the binary mask. To improve the robustness of the method, $N$ simulates graffiti, creases, text occlusion, randomly cropped content from other images, etc., which are spliced onto $I_{gt}$ to generate a damaged image $I_m$ containing multiple types of contamination and degradation.
Preferably, in step S1, in order to blend the contamination noise into the original image more naturally, a Gaussian smoothing function is applied, with the following formula:

$$I = I_m * G_\sigma \quad (2)$$

In formula (2), $I$ denotes the smoothed damaged image, $I_m$ the directly stitched damaged image, and $G_\sigma$ a two-dimensional Gaussian kernel with standard deviation $\sigma$.
Regarding step S2, the invention uses improved ring residual convolution blocks as the feature extractor, locating the damaged region by amplifying the difference between the valid pixel region and the contaminated region and comparing the intrinsic properties of different image regions. The ring residual block comprises three stages and borrows a recall-and-consolidation mechanism from the human brain, realized through the propagation and feedback of residuals in a CNN. The first stage is forward residual propagation, which alleviates gradient degradation in deeper networks by recalling input feature information; it can be formulated as:

$$y_f = F(x, \{W_i\}) + W_s * x \quad (3)$$

In formula (3), $x$ denotes the input feature map and $y_f$ the output of forward residual propagation. $F(x, \{W_i\})$ is the learned residual mapping, whose structure comprises two convolution layers and an ELU activation function; $W_s$ is a $1 \times 1$ convolution. Residual propagation resembles the memory mechanism of the human brain: previous knowledge may be forgotten as the model learns more new knowledge, so a recall mechanism is needed to help evoke those faded memories.
To further enhance the difference between damaged and valid content attributes, the second stage integrates the input feature information using residual feedback. A simple gating mechanism learns the nonlinear relations between distinguishable feature channels, avoiding the diffusion of feature information; a response value is superposed on the input feature through an activation function, amplifying the difference in intrinsic image attributes between the noise region and the valid region. The formula is:

$$y_b = (s(G(y_f)) + 1) * x \quad (4)$$

In formula (4), $x$ is the residual-mapped feature, $y_b$ the residual feedback feature, $G(\cdot)$ a linear mapping, and $s$ the activation function, here sigmoid. Unlike the recall mechanism simulated by residual propagation, residual feedback mimics the process by which the human brain consolidates knowledge, forming a new understanding of the features. The third stage repeats the operation of the first stage, performing residual propagation on the new features to further learn the amplified feature differences. Two forward residual propagations combined with one backward residual feedback form the ring residual structure.
Regarding step S3, the invention introduces an attention mechanism to refine the coarse prediction result, improving the recognition result at details such as contours by attending to similar textures over the whole image. In particular, if a low-confidence region predicted as damaged shares a similar texture with a high-confidence region, the low-confidence prediction should be revised. To this end, key features of the damaged content are extracted from the high-confidence region to serve as the global visual features of that class. The method computes cosine similarity on the coarse prediction mask as a new bias and uses Softmax to suppress the score map of the predicted region; regions whose scores remain high after suppression can be regarded as sufficiently distinctive, so key features are extracted from them as the global features of the damaged region. The calculation is:
$$\mathrm{CosSim}(x'_{sem}) = X \in \mathbb{R}^{c \times c}, \qquad X_{i,j} = \frac{x'^{(i)}_{sem} \cdot x'^{(j)}_{sem}}{\|x'^{(i)}_{sem}\| \, \|x'^{(j)}_{sem}\|} \quad (5)$$

In formula (5), $\mathrm{CosSim}(\cdot)$ denotes the modified cosine-similarity function and $x'_{sem}$ the prediction weight matrix; $i$ and $j$ index the prediction classes, which divide into damaged and non-damaged regions. $X_{i,j}$ is the cosine similarity between the activations of different prediction classes, where $x'^{(i)}_{sem}$ is the $i$-th channel of $x'_{sem}$ and indicates, for each pixel, the prediction score for class $i$. The closer $X_{i,j}$ is to 1, the more similar the activation results of $x'^{(i)}_{sem}$ and $x'^{(j)}_{sem}$, and the less trustworthy the prediction at that location. The bias for same-class pixels is set to 0 and that for different-class pixels to the similarity score $X_{i,j}$; the regions that still maintain high activation values in the classification are then the key features, and the whole process is called key-feature pooling.
Preferably, in step S3, the prediction weight matrix $x'_{sem}$ and the feature map $x_f$ are combined in a weighted sum to obtain the key feature $v_k$:

$$v_k = \sum x'^{(i)}_{sem} \odot x_f \quad (6)$$

where $i$ denotes the prediction class and the sum runs over spatial positions. Taking the key feature $v_k$ as Key and the feature $x_f$ as Query, an attention map is obtained that highlights the regions similar to $v_k$; a convolution with the original features then predicts the final refined prediction mask $\hat{M}_{fine}$.
Regarding step S4, the invention introduces a gated convolution mechanism to improve the residual convolution blocks, identifying damaged regions through learning and dynamically selecting the valid pixel content in the image, so that the convolution result depends only on valid pixels; it replaces the conventional residual convolution structure for feature extraction and integration of the valid region. The output of the gated convolution is computed as:
$$Gating_{y,x} = \sum\sum W_g \cdot I, \qquad Feature_{y,x} = \sum\sum W_f \cdot I, \qquad O_{y,x} = \phi(Feature_{y,x}) \odot \sigma(Gating_{y,x}) \quad (7)$$

In formula (7), $I$ denotes the input feature, $W_g$ and $W_f$ two different convolution kernels, $\phi$ the LeakyReLU activation function, and $\sigma$ the sigmoid function, which restricts all values to $[0, 1]$ to indicate the importance of each local region; $\odot$ denotes element-wise multiplication and $O_{y,x}$ the soft-gated output feature.
Preferably, in step S4, in order to avoid error accumulation of the prediction mask affecting the image restoration result, the invention applies a new Probability Context Normalization (PCN) at the end of the improved residual block to transfer statistics, propagating statistical information such as the mean and variance of the valid pixel region to the damaged region and ensuring that the feature distributions of the inner and outer regions are consistent. It is written as:
$$\mathrm{PCN}(X) = \beta \, (X \circledast H) + (1 - \beta) \, X \quad (8)$$

In formula (8), $X$ denotes the output of the last convolution layer in the gated residual block, $H$ the prediction mask $\hat{M}_{fine}$ sampled to the same size as $X$, and $\beta$ a learnable channel attention weight; $\circledast$ denotes the information transfer, specifically:

$$X \circledast H = \mathcal{T}(X_P, X_Q) = \sigma(X_Q) \, \frac{X_P - \mu(X_P)}{\sigma(X_P)} + \mu(X_Q) \quad (9)$$
In formula (9), $X_P$ and $X_Q$ denote the contaminated region and the valid pixel region respectively, $\mu(\cdot)$ the region mean, and $\sigma(\cdot)$ the region variance. For an image, the feature mean relates to global semantics, while the variance relates to local texture features.
Regarding step S5, the invention obtains image context information with a multi-scale contextual attention aggregation branch. The context similarity is measured by the cosine similarity between patches inside and outside the missing region: for each patch of the region to be completed, the most similar content is found in the valid region and given a higher reference weight, so that the completed content remains consistent with its context in both semantics and texture. The similarity metric is:
$$s_{i,j} = \left\langle \frac{p_i}{\|p_i\|_2}, \frac{p_j}{\|p_j\|_2} \right\rangle \quad (10)$$

In formula (10), $p_i$ and $p_j$ denote feature patches of the valid region and the missing region respectively; the attention score of each patch is then obtained through a softmax function:

$$s'_{i,j} = \frac{\exp(s_{i,j})}{\sum_{i=1}^{N} \exp(s_{i,j})} \quad (11)$$

where $N$ denotes the number of patches into which the valid region is divided. Through this computation, each patch in the missing region finds the regions of the valid pixels most worth attending to, and these are given higher reference weights in the feature fusion.
Preferably, in step S5, to reduce computation and increase inference speed, the invention propagates the inter-patch attention similarity scores by context information transfer: the similarity scores are computed once on the deep feature map of size $32 \times 32$, and the attention scores are then propagated to shallower layers of different scales by contextual attention transfer for feature weighting, as follows:
$$\tilde{p}^{\,l}_j = \sum_{i=1}^{N} s'_{i,j} \, p^{\,l}_i \quad (12)$$

In formula (12), $l$ indexes the different shallow network layers, $\tilde{p}^{\,l}_j$ denotes the missing-region patch at the corresponding scale, $p^{\,l}_i$ the valid-region patch of matching size, $s'_{i,j}$ the attention score, and $N$ the number of patches in the background. Since the feature map size changes between levels, the size of a patch must change accordingly; specifically, the mapping region is enlarged according to the ratio between the current feature map and the attention score map, so that, for example, every four neighboring pixels in a $128 \times 128$ feature map share one attention score value. Through this score-sharing scheme, the model's inference result not only attains better global semantic consistency but also markedly improves memory and computation efficiency.
CN202210574618.3A 2022-05-25 2022-05-25 Image blind restoration method based on semantic inconsistency detection Pending CN114897738A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210574618.3A CN114897738A (en) 2022-05-25 2022-05-25 Image blind restoration method based on semantic inconsistency detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210574618.3A CN114897738A (en) 2022-05-25 2022-05-25 Image blind restoration method based on semantic inconsistency detection

Publications (1)

Publication Number Publication Date
CN114897738A true CN114897738A (en) 2022-08-12

Family

ID=82725567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210574618.3A Pending CN114897738A (en) 2022-05-25 2022-05-25 Image blind restoration method based on semantic inconsistency detection

Country Status (1)

Country Link
CN (1) CN114897738A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942439A (en) * 2019-12-05 2020-03-31 北京华恒盛世科技有限公司 Image restoration and enhancement method based on satellite picture defects
CN110942439B (en) * 2019-12-05 2023-09-19 北京华恒盛世科技有限公司 Image restoration and enhancement method based on satellite picture defects
US20230130772A1 (en) * 2021-10-22 2023-04-27 Suresoft Technologies Inc. Method for Selecting the Last Patch from Among a Plurality Patches for Same Location and the Last Patch Selection Module
US11822915B2 (en) * 2021-10-22 2023-11-21 Suresoft Technologies Inc. Method for selecting the last patch from among a plurality patches for same location and the last patch selection module
CN116705642A (en) * 2023-08-02 2023-09-05 西安邮电大学 Method and system for detecting silver plating defect of semiconductor lead frame and electronic equipment
CN116705642B (en) * 2023-08-02 2024-01-19 西安邮电大学 Method and system for detecting silver plating defect of semiconductor lead frame and electronic equipment
CN117376632A (en) * 2023-12-06 2024-01-09 中国信息通信研究院 Data recovery method and system based on intelligent depth synthesis
CN117376632B (en) * 2023-12-06 2024-02-06 中国信息通信研究院 Data recovery method and system based on intelligent depth synthesis

Similar Documents

Publication Publication Date Title
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN110738697A (en) Monocular depth estimation method based on deep learning
CN114897738A (en) Image blind restoration method based on semantic inconsistency detection
CN114120102A (en) Boundary-optimized remote sensing image semantic segmentation method, device, equipment and medium
CN111340738B (en) Image rain removing method based on multi-scale progressive fusion
CN111539887B (en) Channel attention mechanism and layered learning neural network image defogging method based on mixed convolution
WO2022127454A1 (en) Method and device for training cutout model and for cutout, equipment, and storage medium
CN111160407B (en) Deep learning target detection method and system
CN113313810B (en) 6D attitude parameter calculation method for transparent object
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN111882620A (en) Road drivable area segmentation method based on multi-scale information
CN112396039B (en) Mars grid terrain map generation method based on neighborhood relationship
CN112861785B (en) Instance segmentation and image restoration-based pedestrian re-identification method with shielding function
CN115546768A (en) Pavement marking identification method and system based on multi-scale mechanism and attention mechanism
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3
CN114926498A (en) Rapid target tracking method based on space-time constraint and learnable feature matching
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN113807185A (en) Data processing method and device
CN113378642A (en) Method for detecting illegal occupation buildings in rural areas
CN109255794B (en) Standard part depth full convolution characteristic edge detection method
CN116258877A (en) Land utilization scene similarity change detection method, device, medium and equipment
CN116229104A (en) Saliency target detection method based on edge feature guidance
CN116258937A (en) Small sample segmentation method, device, terminal and medium based on attention mechanism
CN115661451A (en) Deep learning single-frame infrared small target high-resolution segmentation method
CN113192018B (en) Water-cooled wall surface defect video identification method based on fast segmentation convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination