CN112750082B - Human face super-resolution method and system based on fusion attention mechanism - Google Patents
Face super-resolution method and system based on a fused attention mechanism
- Publication number: CN112750082B (application CN202110081811.9A)
- Authority: CN (China)
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06F18/213 — Feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/253 — Fusion techniques of extracted features
- G06V40/168 — Feature extraction; face representation
Abstract
The invention discloses a face super-resolution method and system based on a fused attention mechanism, belonging to the field of face image super-resolution. The method comprises the following steps: downsampling a high-resolution face image to a target low-resolution face image, performing a blocking operation to separate mutually overlapping image blocks, and extracting shallow features with a shallow feature extractor; fusing the features of pixel, channel, and spatial triple attention modules to enhance the structural details of the reconstructed face; constructing a fused attention network as a deep feature extractor and feeding the shallow facial features into it to obtain deep features, where the fused attention network comprises a plurality of fused attention groups and each group comprises a plurality of fused attention blocks; and upsampling the deep feature map and reconstructing the upsampled facial feature map into the target high-resolution face image. The invention outperforms other recent face image super-resolution algorithms and produces higher-quality high-resolution face images.
Description
Technical Field
The invention belongs to the field of computer vision face super-resolution, and particularly relates to a face super-resolution method and a face super-resolution system based on a fused attention mechanism.
Background
Face super-resolution (face hallucination), a special branch of super-resolution (SR), is a technique for inferring a high-resolution (HR) image from an input low-resolution (LR) face image, and can significantly enhance the detail of the low-resolution face. In real-world surveillance scenes, the distance between the imaging sensor and the face is often too large, resulting in low-resolution face images. Recovering a high-resolution face image via face super-resolution facilitates identification of the target person, and the technique plays an important role in applications such as face detection, face recognition, and face analysis.
In general, face super-resolution resembles generic image restoration, and methods can be divided into three classes according to the prior information used: interpolation-based, reconstruction-based, and learning-based. Interpolation-based methods enlarge the pixel grid of an image and compute the missing pixel values with mathematical formulas based on the surrounding pixels. Reconstruction-based face super-resolution depends on fusing the sub-pixel registration information of multiple LR input images. However, when the magnification factor is too large, the efficiency and performance of interpolation- and reconstruction-based methods degrade sharply. In recent decades, learning-based methods have been widely used for face super-resolution, because they can fully exploit the prior information in training samples, map LR images to HR images, and achieve satisfactory visual quality.
Recently, methods based on convolutional neural networks (CNNs) have improved significantly over traditional SR methods. Dong et al. proposed a deep convolutional network for image super-resolution (Learning a Deep Convolutional Network for Image Super-Resolution) realized with a three-layer CNN. Since then, as deep learning has developed, SR reconstruction performance has improved continuously, and face SR performance with it. Attention mechanisms have been introduced into face SR to focus on facial structure information. Wang et al. proposed a texture attention module (Face Super-Resolution by Learning Multi-view Texture Compensation) to obtain the correspondence between face images and multi-view face images. Song et al. proposed a two-stage face SR method (Learning to Hallucinate Face Images via Component Generation and Enhancement, LCGE) that performs SR separately on five facial component structures and then restores the reconstructed components into the face image, focusing the CNN's attention on local facial information. Zhang et al. proposed a channel attention mechanism (Image Super-Resolution Using Very Deep Residual Channel Attention Networks, RCAN) that adaptively rescales channel-wise features by modeling the interdependencies between channels.
Although the face SR methods above that use attention mechanisms achieve satisfactory results, most consider only a single attention mechanism, which limits the CNN's multi-feature extraction capability and lacks fusion and interaction of facial structure information. How to fully exploit multiple attention features to improve face SR reconstruction performance is therefore an important question.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides a face super-resolution method and system based on a fused attention mechanism, solving the technical problem that existing face super-resolution reconstruction algorithms cannot exploit multiple attention features simultaneously, which limits the reconstruction performance for face images.
In order to achieve the above object, according to one aspect of the present invention, there is provided a face super-resolution method based on a fused attention mechanism, including:
s1: a downsampling module is constructed to downsample the high-resolution face image to a target low-resolution face image;
s2: constructing a shallow feature extractor, performing blocking operation on the target low-resolution face image, separating out mutually overlapped image blocks, and extracting a shallow feature image by using the shallow feature extractor;
s3: constructing a fused attention block, fusing the characteristics of the pixel, the channel and the space triple attention module, generating the fused attention characteristic of the network, and enhancing the structural details of the reconstructed face;
s4: constructing a fused attention network as a deep feature extractor, and inputting shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
s5: an up-sampling module is constructed, and up-sampling is carried out on the obtained deep feature map of the human face;
s6: and constructing a face image reconstruction module, and reconstructing the up-sampled face feature image into a high-resolution face image of the target.
In some alternative embodiments, step S2 comprises:
constructing a shallow feature extractor using a convolution layer and extracting a shallow feature map, where the shallow feature map is expressed as: F_0 = f(I_LR), with F_0 denoting the shallow feature map, f a convolution operation, and I_LR the input low-resolution face image.
In some alternative embodiments, step S3 comprises:
constructing a fused attention block consisting of three parallel attentions: pixel attention, channel attention, and spatial attention;
for an input feature map X ∈ R^{H×W×C}, where H is the height of the feature map, W its width, and C the number of channels, X is fed into the three parallel attentions; the different features extracted by the different parallel attentions are fused, and finally one convolution reduces the dimension so that the input and output have consistent dimensions.
In some alternative embodiments, in pixel attention, a convolution first reduces the dimension to cut computation; the module then consists of three parallel branches, where the top and bottom branches each consist of one convolution and an activation function and produce dual pixel-attention features, while the middle branch consists of two convolutions and another activation function and produces residual features. The output features of the three branches are multiplied element-wise, and one final convolution yields the pixel attention feature PA = f(T_1 ⊗ T_2 ⊗ T_3), where T_1, T_2, and T_3 denote the three branch features, ⊗ denotes element-wise multiplication, and f denotes a convolution operation;
in channel attention, the global spatial information of each channel is first converted into a channel descriptor by global average pooling, producing a 1×1×C feature map; this is compressed by downsampling into a 1×1×(C/r) descriptor, where r is the channel scaling coefficient, and restored by upsampling to a 1×1×C feature map. An activation function then yields a 1×1×C descriptor representing the weight of each channel, and finally each channel weight is multiplied with the two-dimensional matrix of the corresponding channel of the original feature map;
in spatial attention, one convolution layer first reduces the channel size, and then a convolution layer and a max pooling layer enlarge the receptive field; a convolution group composed of several convolution layers follows; finally, an upsampling layer recovers the spatial dimension, a convolution recovers the channel dimension, and an activation function produces the final spatial attention feature.
In some alternative embodiments, step S4 comprises:
the fused attention network comprises a plurality of fused attention groups (FAG) and long skip connections (LSC), where each fused attention group further comprises a plurality of fused attention blocks with short skip connections (SSC); the m-th fused attention group is expressed as: F_m = H_m(F_{m-1}) = H_m(H_{m-1}(…H_1(F_0)…)), where H_m denotes the m-th fused attention group and F_{m-1} and F_m are its input and output;
stacking fused attention blocks within each fused attention group, the n-th fused attention block in the m-th fused attention group is expressed as: F_{m,n} = H_{m,n}(F_{m,n-1}) = H_{m,n}(H_{m,n-1}(…H_{m,1}(F_{m-1})…)), where F_{m,n-1} and F_{m,n} are the input and output of the n-th fused attention block in the m-th fused attention group.
In some alternative embodiments, step S5 comprises:
the upsampled features are expressed as: F_UP = H_UP(F_BF), where F_UP and H_UP denote the upsampled features and the upsampling module, respectively.
In some alternative embodiments, step S6 includes:
the face image reconstruction module is expressed as: I_SR = H_Recon(F_UP), where H_Recon denotes a reconstruction module consisting of one convolution and I_SR denotes the target high-resolution face image.
In some alternative embodiments, the loss function L(θ) of the entire network is expressed as: L(θ) = (1/N) Σ_{i=1}^{N} ‖ I_SR^(i) − I_HR^(i) ‖_1, where N denotes the size of the dataset, and I_SR^(i) and I_HR^(i) denote the i-th super-resolution face image and the i-th high-resolution face image in the dataset.
According to another aspect of the present invention, there is provided a face super-resolution system based on a fused attention mechanism, including:
the downsampling module is used for downsampling the high-resolution face image to a target low-resolution face image;
the shallow feature extractor module is used for performing blocking operation on the target low-resolution face image, and extracting a shallow feature image by using the shallow feature extractor after separating out image blocks which are overlapped with each other;
the deep feature extractor module is used for constructing a fused attention block, fusing the features of the pixel, the channel and the space triple attention module, generating the fused attention feature of the network and enhancing the structural details of the reconstructed face; constructing a fused attention network as a deep feature extractor, and inputting shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
the up-sampling module is used for up-sampling the obtained deep feature map of the face;
and the face image reconstruction module is used for reconstructing the up-sampled face feature image into a high-resolution face image of the target.
According to another aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of any of the methods described above.
In general, compared with the prior art, the above technical solutions of the invention achieve the following beneficial effects:
The invention provides a face super-resolution method and system based on a fused attention mechanism that fuses pixel, channel, and spatial attention features, so that the network can interact and fuse different attention features, enhancing its feature expression capability. The proposed fused attention network concentrates the network's multiple attention features on the interaction of facial structure information, thereby improving face image reconstruction performance.
Drawings
Fig. 1 is a schematic flow chart of a face super-resolution method based on a fused attention mechanism provided by an embodiment of the invention;
fig. 2 is a schematic diagram of a face super-resolution network structure based on a fused attention mechanism according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a face super-resolution system based on a fused attention mechanism according to an embodiment of the present invention;
fig. 4 is a graph showing a comparison of test results according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and embodiments, so that the objects, technical solutions, and advantages of the invention become more apparent. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.
Fig. 1 is a schematic flow chart of a face super-resolution method based on a fused attention mechanism, which is provided by the embodiment of the invention, and includes the following steps:
s1: a downsampling module is constructed to downsample the high-resolution face image to a target low-resolution face image;
wherein, in step S1, the high-resolution face image may be downsampled to the target low-resolution face image using bicubic interpolation.
In the embodiment of the invention, the FFHQ face dataset is used for the training, validation, and test sets: 850 images serve as the training set, 100 images as the validation set, and 50 images as the test set. The images in the dataset are 256×256 pixels. The dataset may be downsampled with a bicubic degradation model with a downsampling factor of 4, so the downsampled low-resolution images are 64×64 pixels.
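The ×4 bicubic downsampling described above can be sketched as follows. This is a sketch using PyTorch's bicubic interpolation; the patent does not specify an implementation library, and slightly different bicubic kernels (e.g. MATLAB's imresize) would give slightly different LR pixels.

```python
import torch
import torch.nn.functional as F

def make_lr(hr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Bicubic-downsample an HR batch of shape (N, C, H, W) by `scale`."""
    return F.interpolate(hr, scale_factor=1.0 / scale,
                         mode="bicubic", align_corners=False)

hr = torch.rand(1, 3, 256, 256)  # one 256x256 HR face image
lr = make_lr(hr)                 # -> (1, 3, 64, 64) LR input
```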
S2: constructing a shallow feature extractor, performing blocking operation on the target low-resolution face image, separating out mutually overlapped image blocks, and extracting a shallow feature image by using the shallow feature extractor;
As shown in fig. 2, in an embodiment of the present invention, a 3×3 convolution layer may be used to construct the shallow feature extractor and extract the shallow feature map, expressed as:

F_0 = f_{3×3}(I_LR)

where F_0 denotes the shallow feature map, f_{3×3} denotes a 3×3 convolution, and I_LR denotes the input low-resolution face image.
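The shallow extractor is a single convolution; a minimal sketch follows, in which the 64-channel width is an assumption (the patent does not fix the channel count at this point):

```python
import torch
import torch.nn as nn

# 64 output channels is an assumed width; the patent leaves it unspecified here.
shallow_extractor = nn.Conv2d(3, 64, kernel_size=3, padding=1)  # f_3x3

lr = torch.rand(1, 3, 64, 64)  # I_LR
f0 = shallow_extractor(lr)     # F_0 = f_3x3(I_LR), same spatial size
```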
S3: constructing a fused attention block, fusing the characteristics of the pixel, the channel and the space triple attention module, generating the fused attention characteristic of the network, and enhancing the structural details of the reconstructed face;
in the embodiment of the invention, the fusion attention block consists of three parallel attention points of pixel attention, channel attention and space attention. For input vector X H×W×C H represents the height of the feature map, W represents the width of the feature map, C represents the number of channels in which the feature map is positioned, the feature map is input into three parallel attentions, different features extracted by different parallel attentions are fused, and finally the dimension is reduced by convolution of one 1*1, so that the input and the output are consistent in dimension, and the dimension can be expressed by a formula:
F fusion =f 1×1 (concat(PA,CA,SA))
wherein f 1×1 Representing 1*1 convolution layers, concat represents a fusion operation, (PA, CA, SA) representing pixel attention, channel attention, and spatial attention features, respectively.
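The fusion step can be sketched as below. This is a PyTorch sketch under an assumed channel count; the identity branches stand in for the pixel, channel, and spatial attention modules so the shape contract of concat + 1×1 reduction can be exercised on its own.

```python
import torch
import torch.nn as nn

class FusionAttentionBlock(nn.Module):
    """Concatenate three parallel attention branches along the channel axis
    and reduce back to C channels with a 1x1 conv:
    F_fusion = f_1x1(concat(PA, CA, SA))."""
    def __init__(self, pa: nn.Module, ca: nn.Module, sa: nn.Module,
                 channels: int = 64):
        super().__init__()
        self.pa, self.ca, self.sa = pa, ca, sa
        self.reduce = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        fused = torch.cat([self.pa(x), self.ca(x), self.sa(x)], dim=1)
        return self.reduce(fused)  # output dims match the input

# Identity stand-ins for the three attention branches (hypothetical).
block = FusionAttentionBlock(nn.Identity(), nn.Identity(), nn.Identity(),
                             channels=64)
y = block(torch.rand(1, 64, 32, 32))
```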
In pixel attention, a 1×1 convolution first reduces the dimension to cut computation. The module then consists of three parallel branches: the top and bottom branches each consist of one 3×3 convolution and one Sigmoid activation function and produce dual pixel-attention features, while the middle branch consists of two 3×3 convolutions and a ReLU activation function and produces residual features. The output features of the three branches are multiplied element-wise, and one 3×3 convolution yields the final pixel attention feature, which can be expressed as:

PA = f_{3×3}(T_1 ⊗ T_2 ⊗ T_3)

where T_1, T_2, and T_3 denote the three branch features, ⊗ denotes element-wise multiplication, and f_{3×3} denotes a 3×3 convolution.
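A sketch of this pixel-attention branch follows; the channel counts and the reduction ratio are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class PixelAttention(nn.Module):
    """Sketch: 1x1 reduction, three parallel branches multiplied
    element-wise, then a final 3x3 conv (PA = f_3x3(T1 * T2 * T3))."""
    def __init__(self, channels: int = 64, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.reduce = nn.Conv2d(channels, mid, 1)
        # top / bottom: one 3x3 conv + Sigmoid (dual pixel-attention features)
        self.top = nn.Sequential(nn.Conv2d(mid, mid, 3, padding=1), nn.Sigmoid())
        self.bottom = nn.Sequential(nn.Conv2d(mid, mid, 3, padding=1), nn.Sigmoid())
        # middle: two 3x3 convs with ReLU (residual features)
        self.middle = nn.Sequential(
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1))
        self.out = nn.Conv2d(mid, channels, 3, padding=1)

    def forward(self, x):
        t = self.reduce(x)
        return self.out(self.top(t) * self.middle(t) * self.bottom(t))

pa = PixelAttention(channels=64)
y_pa = pa(torch.rand(1, 64, 32, 32))
```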
In channel attention, the global spatial information of each channel is first converted into a channel descriptor by global average pooling, yielding a 1×1×C feature map; this is compressed by downsampling into a 1×1×(C/r) descriptor, where r is the channel scaling coefficient, and then restored by upsampling to a 1×1×C feature map. A Sigmoid activation function finally produces a 1×1×C descriptor representing the weight of each channel, and each channel weight is multiplied with the two-dimensional matrix of the corresponding channel of the original feature map.
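This squeeze-and-excitation style channel attention might be sketched as follows (the 1×1 convolutions implement the C → C/r → C squeeze and restore; r = 16 is an assumed value):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global average pool to 1x1xC, squeeze to C/r, restore to C,
    Sigmoid weights, then rescale each channel of the input."""
    def __init__(self, channels: int = 64, r: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(self.pool(x))  # (N, C, 1, 1) per-channel weights
        return x * w               # broadcast multiply over H x W

ca = ChannelAttention(channels=64, r=16)
y_ca = ca(torch.rand(1, 64, 32, 32))
```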
In spatial attention, a 1×1 convolution layer first reduces the channel size; a convolution layer with stride 2 and a max pooling layer then enlarge the receptive field. A convolution group follows, composed of 3 convolution layers with stride 3 and 7×7 kernels. Finally, an upsampling layer recovers the spatial dimension, a 1×1 convolution recovers the channel dimension, and a Sigmoid activation function produces the final spatial attention feature.
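A sketch of the spatial-attention branch follows. The strides and kernel sizes are softened from the values above so the sketch runs on small feature maps (three stride-3, 7×7 convolutions would shrink a 64×64 map below 1×1); squeeze width and channel count are likewise assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Sketch: shrink channels (1x1 conv), enlarge receptive field
    (strided conv + max pool), apply a small conv group, then restore
    spatial size (upsample) and channels (1x1 conv) before a Sigmoid."""
    def __init__(self, channels: int = 64, mid: int = 16):
        super().__init__()
        self.squeeze = nn.Conv2d(channels, mid, 1)
        self.down = nn.Sequential(
            nn.Conv2d(mid, mid, 3, stride=2, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.group = nn.Sequential(
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1))
        self.restore = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        a = self.group(self.down(self.squeeze(x)))
        a = F.interpolate(a, size=(h, w), mode="nearest")  # upsampling layer
        return torch.sigmoid(self.restore(a))              # spatial weights

sa = SpatialAttention(channels=64)
y_sa = sa(torch.rand(1, 64, 32, 32))
```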
In the embodiment of the present invention, the size of the convolution layer and the number of convolution layers may also be other values, which are not limited in uniqueness.
S4: constructing a fused attention network as a deep feature extractor, and inputting shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
In the embodiment of the invention, the deep facial feature map is expressed as:

F_BF = H_FAN(F_0)

where F_BF denotes the deep facial feature map and H_FAN denotes the fused attention network.
The fused attention network comprises 10 fused attention groups (Fusion Attention Group, FAG) and a long skip connection (LSC). Each fused attention group in turn contains 10 fused attention blocks with short skip connections (SSC). The m-th fused attention group can be formulated as:

F_m = H_m(F_{m-1}) = H_m(H_{m-1}(…H_1(F_0)…))

where H_m denotes the m-th fused attention group and F_{m-1} and F_m are its input and output. The long skip connection is introduced to stabilize training of the network while allowing residual information to be learned. Fused attention blocks are stacked within each fused attention group, and the n-th fused attention block in the m-th fused attention group can be expressed as:

F_{m,n} = H_{m,n}(F_{m,n-1}) = H_{m,n}(H_{m,n-1}(…H_{m,1}(F_{m-1})…))

where F_{m,n-1} and F_{m,n} are the input and output of the n-th fused attention block in the m-th fused attention group, and H_{m,n} is that block.
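The group/block nesting with short and long skip connections might be sketched as below. The exact placement of the residual additions is an assumption, and a plain convolution stands in for the fused attention block to keep the sketch self-contained.

```python
import torch
import torch.nn as nn

class FusedAttentionGroup(nn.Module):
    """n_blocks stacked blocks plus a short skip connection (SSC)."""
    def __init__(self, block_factory, n_blocks: int = 10):
        super().__init__()
        self.blocks = nn.Sequential(*[block_factory() for _ in range(n_blocks)])

    def forward(self, x):
        return x + self.blocks(x)  # SSC: residual over the block stack

class FusedAttentionNetwork(nn.Module):
    """m_groups fused attention groups plus a long skip connection (LSC)."""
    def __init__(self, block_factory, m_groups: int = 10, n_blocks: int = 10):
        super().__init__()
        self.groups = nn.Sequential(
            *[FusedAttentionGroup(block_factory, n_blocks)
              for _ in range(m_groups)])

    def forward(self, f0):
        return f0 + self.groups(f0)  # LSC: F_BF = F_0 + H_FAN(F_0)

# Trivial stand-in block; the patent uses 10 groups of 10 blocks.
fan = FusedAttentionNetwork(lambda: nn.Conv2d(8, 8, 3, padding=1),
                            m_groups=2, n_blocks=2)
f_bf = fan(torch.rand(1, 8, 16, 16))
```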
S5: an up-sampling module is constructed, and up-sampling is carried out on the obtained deep feature map of the human face;
in the embodiment of the present invention, the up-sampled features are expressed as follows:
F UP =H UP (F BF )
wherein F is UP And H UP Representing the up-sampled features and up-sampling modules, respectively. The upsampling module may be implemented using sub-pixel convolution.
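Sub-pixel (PixelShuffle) upsampling by ×4 can be sketched as two ×2 stages, each expanding the channels by 4 and shuffling them into space; the channel count is an assumption:

```python
import torch
import torch.nn as nn

def make_upsampler(channels: int = 64, scale: int = 4) -> nn.Sequential:
    """x4 sub-pixel upsampler built from two x2 PixelShuffle stages."""
    layers = []
    for _ in range(scale // 2):
        layers += [nn.Conv2d(channels, channels * 4, 3, padding=1),
                   nn.PixelShuffle(2)]
    return nn.Sequential(*layers)

up = make_upsampler(64, scale=4)
f_up = up(torch.rand(1, 64, 64, 64))  # 64x64 features -> 256x256
```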
S6: and constructing a face image reconstruction module, and reconstructing the up-sampled face feature image into a high-resolution face image of the target.
In the embodiment of the invention, the face image reconstruction module is expressed as:

I_SR = H_Recon(F_UP)

where H_Recon denotes a reconstruction module formed by one 3×3 convolution and I_SR denotes the target high-resolution face image.
The loss function L(θ) of the entire network is expressed as:

L(θ) = (1/N) Σ_{i=1}^{N} ‖ I_SR^(i) − I_HR^(i) ‖_1

where N denotes the size of the dataset, and I_SR^(i) and I_HR^(i) denote the i-th super-resolution face image and the i-th high-resolution face image in the dataset.
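The rendered formula in the source is garbled, so the L1 norm used below is an assumption (it is the common choice for RCAN-style SR networks); a minimal sketch:

```python
import torch

def l1_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Mean absolute error averaged over all N pairs and pixels."""
    return (sr - hr).abs().mean()

# Every pixel differs by exactly 1, so the mean absolute error is 1.
loss = l1_loss(torch.zeros(2, 3, 8, 8), torch.ones(2, 3, 8, 8))
```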
The invention also provides a face super-resolution system based on the fused attention mechanism, used to implement the above face super-resolution method, as shown in fig. 3, comprising:
a downsampling module 101, configured to downsample a high-resolution face image to a target low-resolution face image;
the shallow feature extractor module 102 is configured to perform a blocking operation on the target low-resolution face image, and extract a shallow feature map by using the shallow feature extractor after separating image blocks that overlap each other;
the deep feature extractor module 103 is used for constructing a fused attention block, fusing the features of the pixel, channel and space triple attention module, generating the fused attention feature of the network, and enhancing the structural details of the reconstructed face; constructing a fused attention network as a deep feature extractor, and inputting shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
an up-sampling module 104, configured to up-sample the obtained deep feature map of the face;
the face image reconstruction module 105 is configured to reconstruct the up-sampled face feature map into a high-resolution face image of the target.
The invention also provides a computer storage medium storing a computer program executable by a processor, the computer program implementing the above face super-resolution method based on the fused attention mechanism when executed.
The invention finally provides a test embodiment that verifies the algorithm on the FFHQ face dataset: 850 images were used as the training set, 100 images as the validation set, and 50 images as the test set. The HR image size is 256×256 pixels and the downsampling factor is 4, so the LR images (produced with the bicubic degradation model) are 64×64 pixels. Note that all training, validation, and testing are based on the luminance channel in YCbCr color space and use a 4× magnification factor. The SR reconstruction results are evaluated with four indexes: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), feature similarity (FSIM), and visual information fidelity (VIF). The model is trained with the Adam optimizer, β_1 = 0.9, β_2 = 0.999, and ε = 10^{-8}. The initial learning rate is set to 10^{-4} and then halved every 50 epochs. Table 1 shows the comparison results at reconstruction factor 4 under the four evaluation indexes, and fig. 4 compares 4× face image reconstruction results.
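The training configuration described above (Adam with β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸, initial learning rate 10⁻⁴ halved every 50 epochs) maps onto PyTorch as below; the stand-in model is hypothetical.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the full network
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-4, betas=(0.9, 0.999), eps=1e-8)
# Halve the learning rate every 50 epochs ("halved every 50 cycles").
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)
```

In a training loop, `scheduler.step()` is called once per epoch after `optimizer.step()`.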
The face SR methods used for comparison are: Bicubic, LCGE, EDGAN, SRFBN, MTC, and RCAN. Bicubic is a classical image interpolation algorithm; LCGE is a classical two-step face SR method; EDGAN is a state-of-the-art deep-learning face SR algorithm using generative adversarial networks (Generative Adversarial Networks, GAN); SRFBN is a state-of-the-art deep-learning face SR algorithm using a feedback network; MTC is a recent face SR method based on multi-view texture compensation; RCAN is a classical SR method based on a deep residual channel attention network. In fig. 4, (a) shows the Bicubic result, (b) the experimental result of the invention, and (c) the original high-resolution image. As can be seen, the invention achieves a very good visual effect.
Table 1 comparison results table of the present invention with six excellent algorithms
Method | Bicubic | LCGE | EDGAN | RCAN | SRFBN | MTC | The invention
---|---|---|---|---|---|---|---
PSNR/dB | 29.81 | 31.12 | 30.87 | 32.67 | 32.42 | 32.01 | 32.85
SSIM | 0.8451 | 0.8668 | 0.8574 | 0.8977 | 0.8944 | 0.8885 | 0.9011
FSIM | 0.8889 | 0.9099 | 0.9231 | 0.9337 | 0.9305 | 0.9281 | 0.9359
VIF | 0.5246 | 0.5563 | 0.5386 | 0.6161 | 0.6077 | 0.5933 | 0.6219
From the experimental results in the table above, it can be seen that the present invention achieves significant advantages over the other six methods on all four indexes.
It should be noted that each step/component described in the present application may be split into more steps/components, or two or more steps/components or part of the operations of the steps/components may be combined into new steps/components, as needed for implementation, to achieve the object of the present invention.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (8)
1. The human face super-resolution method based on the fusion attention mechanism is characterized by comprising the following steps of:
s1: a downsampling module is constructed to downsample the high-resolution face image to a target low-resolution face image;
s2: constructing a shallow feature extractor, performing blocking operation on the target low-resolution face image, separating out mutually overlapped image blocks, and extracting a shallow feature image by using the shallow feature extractor;
s3: constructing a fused attention block, fusing the characteristics of the pixel, the channel and the space triple attention module, generating the fused attention characteristic of the network, and enhancing the structural details of the reconstructed face;
s4: constructing a fused attention network as a deep feature extractor, and inputting shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
s5: an up-sampling module is constructed, and up-sampling is carried out on the obtained deep feature map of the human face;
s6: a face image reconstruction module is constructed, and the face feature image after up-sampling is reconstructed into a high-resolution face image of the target;
the step S3 comprises the following steps:
constructing a fused attention block consisting of three parallel attentions: pixel attention, channel attention and spatial attention, wherein in pixel attention a convolution is first used for dimension reduction to reduce the computational cost, followed by three parallel branches: the uppermost and lowermost branches each consist of a convolution and an activation function and are used for obtaining dual pixel attention features; the middle branch consists of two convolutions and another activation function and is used for obtaining residual features; finally, the output features of the three branches are multiplied element-wise, and the final pixel attention feature is obtained by one convolution: F_PA = f(T_1 ⊙ T_2 ⊙ T_3), where T_1, T_2 and T_3 respectively represent the features of the three branches and f represents a convolution operation;
for an input feature map X of size H×W×C, where H represents the height of the feature map, W its width, and C its number of channels, X is input into the three parallel attentions; the different features extracted by the parallel attentions are fused, and dimension reduction is finally performed through one convolution so that the input and output dimensions are consistent;
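The three-branch pixel attention described above can be sketched in PyTorch as follows. The kernel sizes, the internal reduction ratio, and the sigmoid/ReLU activation choices are illustrative assumptions, since the claim fixes only the branch structure:

```python
import torch
import torch.nn as nn

class PixelAttention(nn.Module):
    """Sketch of the three-branch pixel attention: F_PA = f(T1 * T2 * T3).
    Widths, kernels and activations are assumptions, not from the patent."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.reduce = nn.Conv2d(channels, mid, 1)  # initial dimension reduction
        # uppermost / lowermost branches: one convolution + one activation each
        self.top = nn.Sequential(nn.Conv2d(mid, mid, 3, padding=1), nn.Sigmoid())
        self.bottom = nn.Sequential(nn.Conv2d(mid, mid, 3, padding=1), nn.Sigmoid())
        # middle branch: two convolutions and another activation (residual features)
        self.middle = nn.Sequential(nn.Conv2d(mid, mid, 3, padding=1),
                                    nn.ReLU(inplace=True),
                                    nn.Conv2d(mid, mid, 3, padding=1))
        self.fuse = nn.Conv2d(mid, channels, 1)    # the final convolution f

    def forward(self, x):
        t = self.reduce(x)
        t1, t2, t3 = self.top(t), self.middle(t), self.bottom(t)
        return self.fuse(t1 * t2 * t3)             # element-wise product, then f
```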
the step S4 includes:
the fused attention network comprises a plurality of fused attention groups (FAG) and a long skip connection (LSC), wherein each fused attention group further comprises a plurality of fused attention blocks with short skip connections (SSC); the mth fused attention group is expressed as: F_m = H_m(F_{m-1}) = H_m(H_{m-1}(···H_1(F_0)···)), where H_m denotes the mth fused attention group and F_{m-1} and F_m are its input and output, respectively;
stacking the fused attention blocks within each fused attention group, the nth fused attention block in the mth fused attention group being expressed as: F_{m,n} = H_{m,n}(F_{m,n-1}) = H_{m,n}(H_{m,n-1}(···H_{m,1}(F_{m-1})···)), where F_{m,n-1} and F_{m,n} are the input and output of the nth fused attention block in the mth fused attention group.
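The group/block stacking with short and long skip connections can be sketched as follows. The trailing convolution before each skip addition and the `block_factory` placeholder (standing in for a complete fused attention block) are assumptions:

```python
import torch
import torch.nn as nn

class FusedAttentionGroup(nn.Module):
    """F_m = H_m(F_{m-1}): n_blocks stacked fused attention blocks
    wrapped by a short skip connection (SSC)."""
    def __init__(self, channels, n_blocks, block_factory):
        super().__init__()
        self.blocks = nn.Sequential(*[block_factory(channels) for _ in range(n_blocks)])
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return x + self.conv(self.blocks(x))   # SSC

class FusedAttentionNetwork(nn.Module):
    """Deep feature extractor: n_groups fused attention groups
    wrapped by a long skip connection (LSC) from the shallow features F_0."""
    def __init__(self, channels, n_groups, n_blocks, block_factory):
        super().__init__()
        self.groups = nn.Sequential(*[FusedAttentionGroup(channels, n_blocks, block_factory)
                                      for _ in range(n_groups)])
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, f0):
        return f0 + self.conv(self.groups(f0))  # LSC
```

A plain convolution can stand in for `block_factory` when testing shapes, e.g. `lambda c: nn.Conv2d(c, c, 3, padding=1)`.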
2. The method according to claim 1, wherein step S2 comprises:
constructing a shallow feature extractor using a convolution layer, and extracting a shallow feature map, expressed as: F_0 = f(I_LR), where F_0 represents the shallow feature map, f represents a convolution operation, and I_LR represents the input low-resolution face image.
3. The method according to claim 2, wherein in channel attention, the global spatial information of each channel is first converted into a channel descriptor by global average pooling, yielding a 1×1×C feature map; this is compressed by downsampling into a 1×1×(C/r) feature map and then restored to 1×1×C by upsampling; an activation function then yields a 1×1×C descriptor representing the weight of each channel; finally, the weight of each channel is multiplied by the two-dimensional matrix of the corresponding channel of the original feature map, r being the channel scaling coefficient;
In spatial attention, the channel size is first reduced by one convolution layer, after which the receptive field is enlarged by a convolution layer and a max pooling layer; a convolution group consisting of a plurality of convolution layers follows; finally, an upsampling layer restores the spatial dimension, a convolution restores the channel dimension, and an activation function yields the final spatial attention feature.
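A minimal sketch of the channel and spatial attention branches described in this claim; the hidden widths, kernel sizes and nearest-neighbour upsampling are assumptions not fixed by the claim:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global average pool -> 1x1xC, squeeze to C/r, expand back to C,
    sigmoid weights, then rescale each channel of the input."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.body(self.pool(x))

class SpatialAttention(nn.Module):
    """Channel squeeze, receptive-field growth via conv + max pooling,
    a small conv group, then spatial/channel restoration and a sigmoid mask."""
    def __init__(self, channels, mid=16):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(channels, mid, 1),                         # reduce channel size
            nn.Conv2d(mid, mid, 3, padding=1), nn.MaxPool2d(2),  # enlarge receptive field
            nn.Conv2d(mid, mid, 3, padding=1),                   # convolution group
            nn.Conv2d(mid, mid, 3, padding=1))
    # upsample restores the spatial dimension; 1x1 conv restores the channel dimension
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(mid, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.up(self.down(x))
```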
4. A method according to claim 3, wherein step S5 comprises:
the upsampled features are expressed as: F_UP = H_UP(F_BF), where F_UP and H_UP represent the upsampled features and the upsampling module, respectively.
5. The method according to claim 4, wherein step S6 comprises:
the face image reconstruction module is expressed as: I_SR = H_Recon(F_UP), where H_Recon is a reconstruction module consisting of one convolution and I_SR is the target high-resolution face image.
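The upsampling and reconstruction modules H_UP and H_Recon can be sketched as follows. The patent does not name the upsampling operator, so sub-pixel convolution (PixelShuffle) is assumed here, with the scale restricted to powers of two:

```python
import torch
import torch.nn as nn

class Upsampler(nn.Module):
    """H_UP: sub-pixel upsampling via stacked x2 PixelShuffle stages
    (an assumption; scale must be a power of two)."""
    def __init__(self, channels, scale=4):
        super().__init__()
        layers = []
        for _ in range(int(scale).bit_length() - 1):   # scale=4 -> two x2 stages
            layers += [nn.Conv2d(channels, 4 * channels, 3, padding=1),
                       nn.PixelShuffle(2)]
        self.body = nn.Sequential(*layers)

    def forward(self, f_bf):
        return self.body(f_bf)                         # F_UP = H_UP(F_BF)

class Reconstructor(nn.Module):
    """H_Recon: a single convolution mapping features to the SR image
    (one output channel for the luminance-only setting of the embodiment)."""
    def __init__(self, channels, out_channels=1):
        super().__init__()
        self.conv = nn.Conv2d(channels, out_channels, 3, padding=1)

    def forward(self, f_up):
        return self.conv(f_up)                         # I_SR = H_Recon(F_UP)
```

With a 64×64 LR input and scale 4, the reconstructed output is 256×256, matching the embodiment.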
7. A fused attention mechanism-based face super-resolution system, comprising:
the downsampling module is used for downsampling the high-resolution face image to a target low-resolution face image;
the shallow feature extractor module is used for performing blocking operation on the target low-resolution face image, and extracting a shallow feature image by using the shallow feature extractor after separating out image blocks which are overlapped with each other;
the deep feature extractor module is used for constructing a fused attention block, fusing the features of the pixel, the channel and the space triple attention module, generating the fused attention feature of the network and enhancing the structural details of the reconstructed face; constructing a fused attention network as a deep feature extractor, and inputting shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
the up-sampling module is used for up-sampling the obtained deep feature map of the face; the face image reconstruction module is used for reconstructing the face feature image after up-sampling into a high-resolution face image of the target;
the deep feature extractor module is specifically configured to perform the following operations:
constructing a fused attention block consisting of three parallel attentions: pixel attention, channel attention and spatial attention, wherein in pixel attention a convolution is first used for dimension reduction to reduce the computational cost, followed by three parallel branches: the uppermost and lowermost branches each consist of a convolution and an activation function and are used for obtaining dual pixel attention features; the middle branch consists of two convolutions and another activation function and is used for obtaining residual features; finally, the output features of the three branches are multiplied element-wise, and the final pixel attention feature is obtained by one convolution: F_PA = f(T_1 ⊙ T_2 ⊙ T_3), where T_1, T_2 and T_3 respectively represent the features of the three branches and f represents a convolution operation; for an input feature map X of size H×W×C, where H represents the height of the feature map, W its width, and C its number of channels, X is input into the three parallel attentions; the different features extracted by the parallel attentions are fused, and dimension reduction is finally performed through one convolution so that the input and output dimensions are consistent; the fused attention network comprises a plurality of fused attention groups (FAG) and a long skip connection (LSC), wherein each fused attention group further comprises a plurality of fused attention blocks with short skip connections (SSC); the mth fused attention group is expressed as: F_m = H_m(F_{m-1}) = H_m(H_{m-1}(···H_1(F_0)···)), where H_m denotes the mth fused attention group and F_{m-1} and F_m are its input and output, respectively; the fused attention blocks are stacked within each fused attention group, and the nth fused attention block in the mth fused attention group is expressed as: F_{m,n} = H_{m,n}(F_{m,n-1}) = H_{m,n}(H_{m,n-1}(···H_{m,1}(F_{m-1})···)), where F_{m,n-1} and F_{m,n} are the input and output of the nth fused attention block in the mth fused attention group.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110081811.9A CN112750082B (en) | 2021-01-21 | 2021-01-21 | Human face super-resolution method and system based on fusion attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112750082A CN112750082A (en) | 2021-05-04 |
CN112750082B true CN112750082B (en) | 2023-05-16 |
Family
ID=75652773
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110081811.9A Active CN112750082B (en) | 2021-01-21 | 2021-01-21 | Human face super-resolution method and system based on fusion attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112750082B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113269685A (en) * | 2021-05-12 | 2021-08-17 | 南通大学 | Image defogging method integrating multi-attention machine system |
CN113255530B (en) * | 2021-05-31 | 2024-03-29 | 合肥工业大学 | Attention-based multichannel data fusion network architecture and data processing method |
CN113256496B (en) * | 2021-06-11 | 2021-09-21 | 四川省人工智能研究院(宜宾) | Lightweight progressive feature fusion image super-resolution system and method |
CN113658040A (en) * | 2021-07-14 | 2021-11-16 | 西安理工大学 | Face super-resolution method based on prior information and attention fusion mechanism |
CN113379667B (en) * | 2021-07-16 | 2023-03-24 | 浙江大华技术股份有限公司 | Face image generation method, device, equipment and medium |
CN113642415B (en) * | 2021-07-19 | 2024-06-04 | 南京南瑞信息通信科技有限公司 | Face feature expression method and face recognition method |
CN113361493B (en) * | 2021-07-21 | 2022-05-20 | 天津大学 | Facial expression recognition method robust to different image resolutions |
CN113658047A (en) * | 2021-08-18 | 2021-11-16 | 北京石油化工学院 | Crystal image super-resolution reconstruction method |
CN113806561A (en) * | 2021-10-11 | 2021-12-17 | 中国人民解放军国防科技大学 | Knowledge graph fact complementing method based on entity attributes |
CN114418853B (en) * | 2022-01-21 | 2022-09-20 | 杭州碧游信息技术有限公司 | Image super-resolution optimization method, medium and equipment based on similar image retrieval |
CN114529450B (en) * | 2022-01-25 | 2023-04-25 | 华南理工大学 | Face image super-resolution method based on improved depth iteration cooperative network |
CN115358932B (en) * | 2022-10-24 | 2023-03-24 | 山东大学 | Multi-scale feature fusion face super-resolution reconstruction method and system |
CN116311479B (en) * | 2023-05-16 | 2023-07-21 | 四川轻化工大学 | Face recognition method, system and storage medium for unlocking automobile |
CN117061790B (en) * | 2023-10-12 | 2024-01-30 | 深圳云天畅想信息科技有限公司 | Streaming media video frame rendering method and device and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111275643A (en) * | 2020-01-20 | 2020-06-12 | 西南科技大学 | True noise blind denoising network model and method based on channel and space attention |
CN111833246A (en) * | 2020-06-02 | 2020-10-27 | 天津大学 | Single-frame image super-resolution method based on attention cascade network |
CN111915592A (en) * | 2020-08-04 | 2020-11-10 | 西安电子科技大学 | Remote sensing image cloud detection method based on deep learning |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9965705B2 (en) * | 2015-11-03 | 2018-05-08 | Baidu Usa Llc | Systems and methods for attention-based configurable convolutional neural networks (ABC-CNN) for visual question answering |
KR20190113119A (en) * | 2018-03-27 | 2019-10-08 | 삼성전자주식회사 | Method of calculating attention for convolutional neural network |
CN111192200A (en) * | 2020-01-02 | 2020-05-22 | 南京邮电大学 | Image super-resolution reconstruction method based on fusion attention mechanism residual error network |
CN112070670B (en) * | 2020-09-03 | 2022-05-10 | 武汉工程大学 | Face super-resolution method and system of global-local separation attention mechanism |
Non-Patent Citations (1)
Title |
---|
Yanting Hu, Jie Li, Yuanfei Huang, and Xinbo Gao. Channel-Wise and Spatial Feature Modulation Network for Single Image Super-Resolution. https://arxiv.org/abs/1809.11130, 2018 (full text).
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||