CN112750082B - Human face super-resolution method and system based on fusion attention mechanism - Google Patents


Info

Publication number
CN112750082B
Authority
CN
China
Prior art keywords
attention
fused
resolution
convolution
fused attention
Prior art date
Legal status
Active
Application number
CN202110081811.9A
Other languages
Chinese (zh)
Other versions
CN112750082A (en)
Inventor
卢涛
赵康辉
张彦铎
吴云韬
金从元
张力
余晗
Current Assignee
Wuhan Institute of Technology
Wuhan Fiberhome Technical Services Co Ltd
Original Assignee
Wuhan Institute of Technology
Wuhan Fiberhome Technical Services Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Institute of Technology, Wuhan Fiberhome Technical Services Co Ltd filed Critical Wuhan Institute of Technology
Priority to CN202110081811.9A
Publication of CN112750082A
Application granted
Publication of CN112750082B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation


Abstract

The invention discloses a face super-resolution method and system based on a fused attention mechanism, belonging to the field of face image super-resolution. The method comprises the following steps: downsampling a high-resolution face image to a target low-resolution face image, dividing it into mutually overlapping image blocks, and extracting shallow features with a shallow feature extractor; fusing the features of pixel, channel and spatial triple attention modules to enhance the structural details of the reconstructed face; constructing a fused attention network as a deep feature extractor and feeding the shallow facial features into it to obtain deep features, wherein the fused attention network comprises a plurality of fused attention groups, each containing a plurality of fused attention blocks; and upsampling the deep feature map and reconstructing the upsampled facial feature map into the target high-resolution face image. The invention outperforms other recent face image super-resolution algorithms and can generate higher-quality high-resolution face images.

Description

Human face super-resolution method and system based on fusion attention mechanism
Technical Field
The invention belongs to the field of computer-vision face super-resolution, and particularly relates to a face super-resolution method and system based on a fused attention mechanism.
Background
Face Super-Resolution (face hallucination), a special branch of Super-Resolution (SR), is a technology for inferring a High Resolution (HR) image from an input Low Resolution (LR) face image, and can significantly enhance the detail information of the low-resolution face image. In real-world surveillance scenes, the distance between the imaging sensor and the face is often too large, resulting in low-resolution face images. Face super-resolution can recover the high-resolution face image and thereby facilitate identification of the target person. It plays an important role in many applications such as face detection, face recognition and face analysis.
In general, face super-resolution is similar to generic image restoration, and methods can be divided into three categories according to the source of prior information: interpolation-based, reconstruction-based and learning-based methods. Interpolation-based methods enlarge the pixel grid of an image without generating new content and compute the values of missing pixels from surrounding pixels using mathematical formulas. Reconstruction-based face super-resolution relies on fusing sub-pixel registration information from multiple LR input images. However, when the magnification factor is too large, the efficiency and performance of interpolation- and reconstruction-based methods degrade substantially. In recent decades, learning-based methods have been widely applied to face super-resolution, because they can fully exploit the prior information in training samples to map LR images to HR images and achieve satisfactory visual effects.
Recently, methods based on convolutional neural networks (Convolutional Neural Networks, CNN) have improved significantly over traditional SR methods. Dong et al. proposed a deep convolutional network for image super-resolution (Learning a Deep Convolutional Network for Image Super-Resolution) realized with a three-layer CNN. Since then, with the development of deep learning, the reconstruction performance of SR has improved continuously, and the performance of face SR along with it. Attention mechanisms have been introduced into face SR to focus on facial structure information. Wang et al. proposed a texture attention module (Face Super-Resolution by Learning Multi-view Texture Compensation) to obtain the correspondence between face images and multi-view face images. Song et al. proposed a two-stage face SR method (Learning to hallucinate face images via Component Generation and Enhancement, LCGE) that separately performs SR on five facial organ structures and then restores these reconstructed organs to the face image, focusing the CNN's attention on local face information. Zhang et al. proposed a channel attention mechanism (Image super-resolution using very deep residual channel attention networks, RCAN) to adaptively rescale channel-wise features by considering the interdependencies among channels.
Although the face SR methods using attention mechanisms described above achieve satisfactory results, most of them consider only a single attention mechanism, which limits the multi-feature extraction capability of the CNN and lacks fusion and interaction of facial structure information. It is therefore very important to fully exploit multiple attention features to improve the reconstruction performance of face SR.
Disclosure of Invention
In view of the above defects of or improvement needs in the prior art, the invention provides a face super-resolution method and system based on a fused attention mechanism, which solve the technical problem that existing face super-resolution reconstruction algorithms cannot exploit multiple attention features simultaneously, limiting the reconstruction performance on face images.
In order to achieve the above object, according to one aspect of the present invention, there is provided a face super-resolution method based on a fused attention mechanism, including:
S1: constructing a downsampling module to downsample the high-resolution face image to a target low-resolution face image;
S2: constructing a shallow feature extractor, dividing the target low-resolution face image into mutually overlapping image blocks, and extracting a shallow feature map using the shallow feature extractor;
S3: constructing a fused attention block that fuses the features of the pixel, channel and spatial triple attention modules, generating the fused attention features of the network and enhancing the structural details of the reconstructed face;
S4: constructing a fused attention network as a deep feature extractor, and inputting the shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
S5: constructing an up-sampling module and up-sampling the obtained deep facial feature map;
S6: constructing a face image reconstruction module and reconstructing the up-sampled facial feature map into the target high-resolution face image.
In some alternative embodiments, step S2 comprises:
constructing a shallow feature extractor using a convolution layer, and extracting a shallow feature map, which is expressed as: F_0 = f(I_LR), where F_0 denotes the shallow feature map, f denotes a convolution operation, and I_LR denotes the input low-resolution face image.
In some alternative embodiments, step S3 comprises:
constructing a fused attention block consisting of three parallel attention branches: pixel attention, channel attention and spatial attention;
for an input tensor X_{H×W×C}, where H denotes the height of the feature map, W its width and C the number of channels, feeding X_{H×W×C} into the three parallel attention branches, fusing the different features extracted by the parallel branches, and finally reducing the dimension through one convolution so that the input and output have consistent dimensions.
In some alternative embodiments, in pixel attention, a convolution is first used to reduce the dimension and thus the computation; the module then consists of three parallel branches, where the uppermost and lowermost branches each consist of one convolution and one activation function and are used to obtain dual pixel attention features, while the middle branch consists of two convolutions and another activation function and is used to obtain residual features; finally, the output features of the three branches are multiplied element-wise, and one more convolution yields the final pixel attention feature: PA = f(T_1 ⊗ T_2 ⊗ T_3), where T_1, T_2 and T_3 denote the three branch features, ⊗ denotes element-wise multiplication, and f denotes a convolution operation;
in channel attention, the global spatial information of the channels is first converted into a channel descriptor through a global average pool, yielding a 1×1×C feature map, which is then compressed by downsampling into a 1×1×(C/r) descriptor, where r is the channel scaling coefficient; upsampling then restores it to a 1×1×C feature map, an activation function yields a 1×1×C descriptor representing the weight of each channel, and finally the weight of each channel is multiplied by the two-dimensional matrix of the corresponding channel of the original feature map;
in spatial attention, one convolution layer first reduces the channel size, and then one convolution layer and a max-pooling layer enlarge the receptive field; a convolution group composed of a plurality of convolution layers follows; finally, an upsampling layer restores the spatial dimension, a convolution restores the channel dimension, and the final spatial attention feature is obtained through an activation function.
In some alternative embodiments, step S4 comprises:
the fused attention network comprises a plurality of fused attention groups FAG and a long skip connection LSC, wherein each fused attention group further comprises a plurality of fused attention blocks with short skip connections SSC, and the m-th fused attention group is expressed as: F_m = H_m(F_{m-1}) = H_m(H_{m-1}(…H_1(F_0)…)), where H_m denotes the m-th fused attention group, and F_{m-1} and F_m are the input and output of the m-th fused attention group;
stacking fused attention blocks within each fused attention group, the n-th fused attention block in the m-th fused attention group is expressed as: F_{m,n} = H_{m,n}(F_{m,n-1}) = H_{m,n}(H_{m,n-1}(…H_{m,1}(F_{m-1})…)), where F_{m,n-1} and F_{m,n} are the input and output of the n-th fused attention block in the m-th fused attention group.
In some alternative embodiments, step S5 comprises:
the upsampled features are expressed as: F_UP = H_UP(F_BF), where F_UP and H_UP denote the upsampled features and the upsampling module, respectively.
In some alternative embodiments, step S6 includes:
the face image reconstruction module is expressed as: I_SR = H_Recon(F_UP), where H_Recon denotes the reconstruction module consisting of one convolution, and I_SR denotes the target high-resolution face image.
In some alternative embodiments, the loss function L(θ) of the entire network is expressed as:
L(θ) = (1/N) Σ_{i=1}^{N} ‖ I_SR^(i) − I_HR^(i) ‖
where N denotes the size of the dataset, and I_SR^(i) and I_HR^(i) denote the i-th super-resolved face image and the i-th high-resolution face image in the dataset.
According to another aspect of the present invention, there is provided a face super-resolution system based on a fused attention mechanism, including:
the downsampling module is used for downsampling the high-resolution face image to a target low-resolution face image;
the shallow feature extractor module is used for dividing the target low-resolution face image into mutually overlapping image blocks and then extracting a shallow feature map using the shallow feature extractor;
the deep feature extractor module is used for constructing a fused attention block that fuses the features of the pixel, channel and spatial triple attention modules, generating the fused attention features of the network and enhancing the structural details of the reconstructed face; and for constructing a fused attention network as a deep feature extractor and inputting the shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
the up-sampling module is used for up-sampling the obtained deep facial feature map;
and the face image reconstruction module is used for reconstructing the up-sampled facial feature map into the target high-resolution face image.
According to another aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of any of the methods described above.
In general, the above technical solutions conceived by the present invention, compared with the prior art, enable the following beneficial effects to be obtained:
the invention provides a face super-resolution method and a face super-resolution system based on a fused attention mechanism, which are used for fusing pixel, channel and spatial attention characteristics, so that different attention span characteristics can be interacted and fused by a network, and the characteristic expression capability of the network is enhanced. The fused attention network provided by the invention can concentrate various attention characteristics of the network to interaction of facial structure information, so that the reconstruction performance of the facial image is improved.
Drawings
Fig. 1 is a schematic flow chart of a face super-resolution method based on a fused attention mechanism provided by an embodiment of the invention;
fig. 2 is a schematic diagram of a face super-resolution network structure based on a fused attention mechanism according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a face super-resolution system based on a fused attention mechanism according to an embodiment of the present invention;
fig. 4 is a graph showing a comparison of test results according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict.
Fig. 1 is a schematic flow chart of a face super-resolution method based on a fused attention mechanism, which is provided by the embodiment of the invention, and includes the following steps:
S1: a downsampling module is constructed to downsample the high-resolution face image to a target low-resolution face image;
In step S1, the high-resolution face image may be downsampled to the target low-resolution face image using bicubic interpolation (Bicubic interpolation).
In the embodiment of the invention, the FFHQ face dataset is used as the training, validation and test sets of the invention: 850 images serve as the training dataset, 100 images as the validation dataset, and 50 images as the test dataset. The images in the dataset are 256×256 pixels; in an embodiment of the invention the dataset may be downsampled using a bicubic degradation model with a downsampling factor of 4, so the downsampled low-resolution images are 64×64 pixels.
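For concreteness, this degradation step can be sketched in a few lines of Python. This is a minimal sketch, assuming Pillow is used; the file path is a hypothetical placeholder.

```python
from PIL import Image

def make_lr(hr_path, scale=4):
    """Bicubic degradation: downsample a 256x256 HR face to its 64x64 LR counterpart."""
    hr = Image.open(hr_path)
    w, h = hr.size
    lr = hr.resize((w // scale, h // scale), Image.BICUBIC)  # bicubic interpolation
    return hr, lr

# hr, lr = make_lr("ffhq/00001.png")  # hypothetical path into the FFHQ subset
```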
S2: constructing a shallow feature extractor, dividing the target low-resolution face image into mutually overlapping image blocks, and extracting a shallow feature map using the shallow feature extractor;
As shown in fig. 2, in an embodiment of the present invention, a 3×3 convolution layer may be used to construct the shallow feature extractor and extract a shallow feature map. The shallow feature map is expressed as:
F_0 = f_{3×3}(I_LR)
where F_0 denotes the shallow feature map, f_{3×3} denotes a 3×3 convolution, and I_LR denotes the input low-resolution face image.
S3: constructing a fused attention block that fuses the features of the pixel, channel and spatial triple attention modules, generating the fused attention features of the network and enhancing the structural details of the reconstructed face;
in the embodiment of the invention, the fusion attention block consists of three parallel attention points of pixel attention, channel attention and space attention. For input vector X H×W×C H represents the height of the feature map, W represents the width of the feature map, C represents the number of channels in which the feature map is positioned, the feature map is input into three parallel attentions, different features extracted by different parallel attentions are fused, and finally the dimension is reduced by convolution of one 1*1, so that the input and the output are consistent in dimension, and the dimension can be expressed by a formula:
F fusion =f 1×1 (concat(PA,CA,SA))
wherein f 1×1 Representing 1*1 convolution layers, concat represents a fusion operation, (PA, CA, SA) representing pixel attention, channel attention, and spatial attention features, respectively.
Wherein in pixel attention, one 1×1 convolution first reduces the dimension to cut the computation; the module then consists of three parallel branches. The uppermost and lowermost branches each consist of one 3×3 convolution and one Sigmoid activation function and are used to obtain dual pixel attention features; the middle branch consists of two 3×3 convolutions and a ReLU activation function and is used to obtain residual features. Finally, the output features of the three branches are multiplied element-wise, and one more 3×3 convolution yields the final pixel attention feature, which can be expressed as:
PA = f_{3×3}(T_1 ⊗ T_2 ⊗ T_3)
where T_1, T_2 and T_3 denote the three branch features, ⊗ denotes element-wise multiplication, and f_{3×3} denotes a 3×3 convolution.
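To make the three-branch structure concrete, the following PyTorch sketch gives one possible reading of the pixel attention module; the channel width c, the reduction ratio, and the exact placement of the ReLU inside the middle branch are assumptions not fixed by the text.

```python
import torch.nn as nn

class PixelAttention(nn.Module):
    """Three-branch pixel attention: PA = f_3x3(T1 * T2 * T3)."""
    def __init__(self, c=64, reduction=2):
        super().__init__()
        r = c // reduction
        self.reduce = nn.Conv2d(c, r, 1)  # 1x1 conv: reduce dimension to cut computation
        # uppermost/lowermost branches: one 3x3 conv + Sigmoid (dual pixel attention)
        self.upper = nn.Sequential(nn.Conv2d(r, r, 3, padding=1), nn.Sigmoid())
        self.lower = nn.Sequential(nn.Conv2d(r, r, 3, padding=1), nn.Sigmoid())
        # middle branch: two 3x3 convs + ReLU (residual features); ordering is an assumption
        self.middle = nn.Sequential(nn.Conv2d(r, r, 3, padding=1), nn.ReLU(inplace=True),
                                    nn.Conv2d(r, r, 3, padding=1))
        self.tail = nn.Conv2d(r, c, 3, padding=1)  # final 3x3 conv -> PA feature

    def forward(self, x):
        t = self.reduce(x)
        # element-wise product of the three branch outputs, then the final convolution
        return self.tail(self.upper(t) * self.middle(t) * self.lower(t))
```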
In channel attention, the global spatial information of the channels can first be converted into a channel descriptor by a global average pool, yielding a 1×1×C feature map, which is then compressed by downsampling into a 1×1×(C/r) descriptor, where r is the channel scaling coefficient. Upsampling then restores it to a 1×1×C feature map, a Sigmoid activation function yields a 1×1×C descriptor representing the weight of each channel, and finally the weight of each channel is multiplied by the two-dimensional matrix of the corresponding channel of the original feature map.
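A squeeze-and-excitation style sketch of the channel attention branch follows; realizing the descriptor "downsampling/upsampling" as two 1×1 convolutions, and inserting a ReLU between them, are assumptions.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: global average pool -> 1x1x(C/r) -> 1x1xC weights -> reweight."""
    def __init__(self, c=64, r=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # global average pool: 1x1xC descriptor
            nn.Conv2d(c, c // r, 1),          # compress to 1x1x(C/r); r = scaling coefficient
            nn.ReLU(inplace=True),            # non-linearity between the two steps (assumed)
            nn.Conv2d(c // r, c, 1),          # restore to 1x1xC
            nn.Sigmoid())                     # per-channel weights

    def forward(self, x):
        return x * self.body(x)               # multiply each channel map by its weight
```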
Wherein in spatial attention, a 1×1 convolution layer first reduces the channel size, and then a convolution layer with a stride of 2 and a max-pooling layer enlarge the receptive field; a convolution group follows, composed of three 7×7 convolution layers with a stride of 3; finally, an upsampling layer restores the spatial dimension, a 1×1 convolution restores the channel dimension, and the final spatial attention feature is obtained through a Sigmoid activation function.
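A sketch of the spatial attention branch, again as an assumption-laden illustration: stride-1 7×7 convolutions replace the stride-3 layers described above so the spatial size stays easy to restore, and the channel reduction ratio is arbitrary.

```python
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Spatial attention: shrink, widen the receptive field, then restore and gate."""
    def __init__(self, c=64, r=4):
        super().__init__()
        self.head = nn.Conv2d(c, c // r, 1)                         # 1x1 conv: fewer channels
        self.down = nn.Sequential(                                  # enlarge receptive field
            nn.Conv2d(c // r, c // r, 3, stride=2, padding=1),
            nn.MaxPool2d(2))
        self.group = nn.Sequential(*[nn.Conv2d(c // r, c // r, 7, padding=3)
                                     for _ in range(3)])            # convolution group
        self.tail = nn.Sequential(nn.Conv2d(c // r, c, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.group(self.down(self.head(x)))
        y = F.interpolate(y, size=x.shape[-2:], mode='bilinear',
                          align_corners=False)                      # restore spatial size
        return x * self.tail(y)                                     # spatial gating
```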
In the embodiment of the present invention, the sizes of the convolution kernels and the number of convolution layers may also take other values, and are not uniquely limited.
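With the three branches in place, the fused attention block itself reduces to a concatenation and a 1×1 convolution, directly following F_fusion = f_{1×1}(concat(PA, CA, SA)); the local residual connection around the block is an assumption. The sketch reuses the PixelAttention, ChannelAttention and SpatialAttention classes defined above.

```python
import torch
import torch.nn as nn

class FusionAttentionBlock(nn.Module):
    """Fused attention block: concatenate PA/CA/SA features, fuse with a 1x1 conv."""
    def __init__(self, c=64):
        super().__init__()
        self.pa, self.ca, self.sa = PixelAttention(c), ChannelAttention(c), SpatialAttention(c)
        self.fuse = nn.Conv2d(3 * c, c, 1)    # F_fusion = f_1x1(concat(PA, CA, SA))

    def forward(self, x):
        f = self.fuse(torch.cat([self.pa(x), self.ca(x), self.sa(x)], dim=1))
        return x + f                          # local residual path (an assumption)
```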
S4: constructing a fused attention network as a deep feature extractor, and inputting shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
In the embodiment of the invention, the deep facial feature map is expressed as:
F_BF = H_FAN(F_0)
where F_BF denotes the deep facial feature map and H_FAN denotes the fused attention network.
Wherein the fused attention network comprises 10 fused attention groups (Fusion Attention Group, FAG) and a Long Skip Connection (LSC). Each fused attention group in turn contains 10 fused attention blocks with Short Skip Connections (SSC). The m-th fused attention group can be formulated as:
F_m = H_m(F_{m-1}) = H_m(H_{m-1}(…H_1(F_0)…))
where H_m denotes the m-th fused attention group, and F_{m-1} and F_m are the input and output of the m-th fused attention group. In addition, the long skip connection LSC is introduced to stabilize training of the network while allowing residual information to be learned. Fused attention blocks are stacked within each fused attention group; the n-th fused attention block in the m-th fused attention group can be expressed as:
F_{m,n} = H_{m,n}(F_{m,n-1}) = H_{m,n}(H_{m,n-1}(…H_{m,1}(F_{m-1})…))
where F_{m,n-1} and F_{m,n} are the input and output of the n-th fused attention block in the m-th fused attention group, and H_{m,n} denotes that block.
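The group and network wrappers then follow directly from the stacking and skip connections described above; the trailing 3×3 convolution before each skip addition is an assumption borrowed from common residual-group designs.

```python
import torch.nn as nn

class FusionAttentionGroup(nn.Module):
    """One FAG: 10 fused attention blocks with a short skip connection (SSC)."""
    def __init__(self, c=64, n_blocks=10):
        super().__init__()
        self.blocks = nn.Sequential(*[FusionAttentionBlock(c) for _ in range(n_blocks)])
        self.conv = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x):
        return x + self.conv(self.blocks(x))    # short skip connection

class FusionAttentionNetwork(nn.Module):
    """H_FAN: 10 FAGs with a long skip connection (LSC); F_BF = H_FAN(F_0)."""
    def __init__(self, c=64, n_groups=10):
        super().__init__()
        self.groups = nn.Sequential(*[FusionAttentionGroup(c) for _ in range(n_groups)])
        self.conv = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, f0):
        return f0 + self.conv(self.groups(f0))  # long skip connection stabilizes training
```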
S5: an up-sampling module is constructed, and up-sampling is carried out on the obtained deep feature map of the human face;
in the embodiment of the present invention, the up-sampled features are expressed as follows:
F UP =H UP (F BF )
wherein F is UP And H UP Representing the up-sampled features and up-sampling modules, respectively. The upsampling module may be implemented using sub-pixel convolution.
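Sub-pixel convolution maps directly onto PyTorch's PixelShuffle; the sketch below realizes the ×4 upsampler as two ×2 stages, a common pattern rather than anything the text prescribes.

```python
import torch.nn as nn

class Upsampler(nn.Module):
    """H_UP: x4 sub-pixel upsampling as two x2 PixelShuffle stages (an assumption)."""
    def __init__(self, c=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, 4 * c, 3, padding=1), nn.PixelShuffle(2),  # x2
            nn.Conv2d(c, 4 * c, 3, padding=1), nn.PixelShuffle(2))  # x2 -> x4 total

    def forward(self, f_bf):
        return self.body(f_bf)  # F_UP = H_UP(F_BF)
```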
S6: constructing a face image reconstruction module, and reconstructing the up-sampled facial feature map into the target high-resolution face image.
In the embodiment of the invention, the face image reconstruction module is expressed as:
I_SR = H_Recon(F_UP)
where H_Recon denotes the reconstruction module formed by one 3×3 convolution, and I_SR denotes the target high-resolution face image.
Wherein the loss function L(θ) of the entire network is expressed as:
L(θ) = (1/N) Σ_{i=1}^{N} ‖ I_SR^(i) − I_HR^(i) ‖
where N denotes the size of the dataset, and I_SR^(i) and I_HR^(i) denote the i-th super-resolved face image and the i-th high-resolution face image in the dataset.
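Putting the pieces together, the end-to-end pipeline and the training loss can be sketched as follows. The single-channel input mirrors the luminance-channel training described in the test embodiment below; using an L1 pixel loss for ‖·‖ is an assumption, since the patent does not name the norm.

```python
import torch.nn as nn

class FusionAttentionSR(nn.Module):
    """End-to-end sketch: shallow conv -> H_FAN -> H_UP -> reconstruction conv."""
    def __init__(self, in_ch=1, c=64):
        super().__init__()
        self.shallow = nn.Conv2d(in_ch, c, 3, padding=1)  # F_0 = f_3x3(I_LR)
        self.fan = FusionAttentionNetwork(c)              # F_BF = H_FAN(F_0)
        self.up = Upsampler(c)                            # F_UP = H_UP(F_BF)
        self.recon = nn.Conv2d(c, in_ch, 3, padding=1)    # I_SR = H_Recon(F_UP)

    def forward(self, i_lr):
        return self.recon(self.up(self.fan(self.shallow(i_lr))))

loss_fn = nn.L1Loss()  # mean over the N training pairs of |I_SR_i - I_HR_i|; L1 is assumed
```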
The invention also provides a face super-resolution system based on the fused attention mechanism, which is used to implement the above face super-resolution method and, as shown in fig. 3, comprises:
a downsampling module 101, configured to downsample a high-resolution face image to a target low-resolution face image;
the shallow feature extractor module 102 is configured to divide the target low-resolution face image into mutually overlapping image blocks and then extract a shallow feature map using the shallow feature extractor;
the deep feature extractor module 103 is configured to construct a fused attention block that fuses the features of the pixel, channel and spatial triple attention modules, generating the fused attention features of the network and enhancing the structural details of the reconstructed face; and to construct a fused attention network as a deep feature extractor and input the shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
the up-sampling module 104 is configured to up-sample the obtained deep facial feature map;
the face image reconstruction module 105 is configured to reconstruct the up-sampled facial feature map into the target high-resolution face image.
The invention also provides a computer storage medium in which a computer program executable by a computer processor is stored, the computer program executing the above face super-resolution method based on the fused attention mechanism.
The invention finally provides a test embodiment, which uses the FFHQ face dataset to verify the algorithm. 850 images were used as the training dataset, 100 images as the validation dataset, and 50 images as the test dataset. The HR image size is 256×256 pixels and the downsampling factor is 4, so the LR images (generated with the bicubic degradation model) are 64×64 pixels. Note that all training, validation and testing are based on the luminance channel in YCbCr color space and use a 4× magnification factor for training and testing. The SR reconstruction results are evaluated with four indexes, Peak Signal-to-Noise Ratio (PSNR), Structural SIMilarity (SSIM), Feature SIMilarity (FSIM) and Visual Information Fidelity (VIF), to verify the performance of SR reconstruction on the luminance channel. The model is trained with the Adam optimizer, with β₁ = 0.9, β₂ = 0.999 and ε = 10⁻⁸. The initial learning rate is set to 10⁻⁴ and then halved every 50 epochs. Table 1 shows the comparison results under a reconstruction factor of 4 for the four evaluation indexes, and fig. 4 is a comparison chart of 4× face image reconstruction results.
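The optimizer settings quoted above translate directly into PyTorch; only the model variable is assumed to be the sketch assembled earlier.

```python
import torch

model = FusionAttentionSR()  # the end-to-end sketch assembled above (an assumption)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), eps=1e-8)
# Halve the learning rate every 50 epochs, matching the schedule described in the text.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)
```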
The face SR methods used for comparison are: Bicubic, LCGE, EDGAN, SRFBN, MTC and RCAN. Bicubic is a classical image interpolation algorithm; LCGE is a classical two-step face SR method; EDGAN is one of the most advanced deep learning face SR algorithms using generative adversarial networks (Generative Adversarial Networks, GAN); SRFBN is a state-of-the-art deep learning SR algorithm using a feedback network; MTC is a recent face SR method based on multi-view texture compensation; RCAN is a classical SR method based on a deep residual channel attention network. In fig. 4, (a) shows the Bicubic result; (b) is the experimental result of the invention; (c) is the original high-resolution image. As can be seen, the invention achieves a very good visual effect.
Table 1 Comparison results of the invention with six excellent algorithms

Method    Bicubic   LCGE      EDGAN     RCAN      SRFBN     MTC       Proposed
PSNR/dB   29.81     31.12     30.87     32.67     32.42     32.01     32.85
SSIM      0.8451    0.8668    0.8574    0.8977    0.8944    0.8885    0.9011
FSIM      0.8889    0.9099    0.9231    0.9337    0.9305    0.9281    0.9359
VIF       0.5246    0.5563    0.5386    0.6161    0.6077    0.5933    0.6219
From the experimental results of the table above, it can be seen that the present invention achieves significant advantages over the other six methods.
It should be noted that each step/component described in the present application may be split into more steps/components, or two or more steps/components or part of the operations of the steps/components may be combined into new steps/components, as needed for implementation, to achieve the object of the present invention.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (8)

1. A face super-resolution method based on a fused attention mechanism, characterized by comprising the following steps:
S1: constructing a downsampling module to downsample the high-resolution face image to a target low-resolution face image;
S2: constructing a shallow feature extractor, dividing the target low-resolution face image into mutually overlapping image blocks, and extracting a shallow feature map using the shallow feature extractor;
S3: constructing a fused attention block that fuses the features of the pixel, channel and spatial triple attention modules, generating the fused attention features of the network and enhancing the structural details of the reconstructed face;
S4: constructing a fused attention network as a deep feature extractor, and inputting the shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
S5: constructing an up-sampling module and up-sampling the obtained deep facial feature map;
S6: constructing a face image reconstruction module and reconstructing the up-sampled facial feature map into the target high-resolution face image;
the step S3 comprises the following steps:
constructing a fused attention block consisting of three parallel attention branches: pixel attention, channel attention and spatial attention, wherein in pixel attention a convolution is first used to reduce the dimension and thus the computation, after which three parallel branches follow, with the uppermost and lowermost branches each consisting of one convolution and one activation function and used to obtain dual pixel attention features; the middle branch consists of two convolutions and another activation function and is used to obtain residual features; finally, the output features of the three branches are multiplied element-wise, and one more convolution yields the final pixel attention feature: PA = f(T_1 ⊗ T_2 ⊗ T_3), where T_1, T_2 and T_3 denote the three branch features, ⊗ denotes element-wise multiplication, and f denotes a convolution operation;
for an input tensor X_{H×W×C}, where H denotes the height of the feature map, W its width and C the number of channels, feeding X_{H×W×C} into the three parallel attention branches, fusing the different features extracted by the parallel branches, and finally reducing the dimension through one convolution so that the input and output have consistent dimensions;
the step S4 includes:
the fused attention network comprises a plurality of fused attention groups FAG and a long skip connection LSC, wherein each fused attention group further comprises a plurality of fused attention blocks with short skip connections SSC, and the m-th fused attention group is expressed as: F_m = H_m(F_{m-1}) = H_m(H_{m-1}(…H_1(F_0)…)), where H_m denotes the m-th fused attention group, and F_{m-1} and F_m are the input and output of the m-th fused attention group;
stacking fused attention blocks within each fused attention group, and expressing the n-th fused attention block in the m-th fused attention group as: F_{m,n} = H_{m,n}(F_{m,n-1}) = H_{m,n}(H_{m,n-1}(…H_{m,1}(F_{m-1})…)), where F_{m,n-1} and F_{m,n} are the input and output of the n-th fused attention block in the m-th fused attention group.
2. The method according to claim 1, wherein step S2 comprises:
constructing a shallow feature extractor using a convolution layer, and extracting a shallow feature map, which is expressed as: F_0 = f(I_LR), where F_0 denotes the shallow feature map, f denotes a convolution operation, and I_LR denotes the input low-resolution face image.
3. The method according to claim 2, wherein in channel attention, the global spatial information of the channels is first converted into a channel descriptor by a global average pool, yielding a 1×1×C feature map, which is then compressed by downsampling into a 1×1×(C/r) descriptor, where r is the channel scaling coefficient; upsampling then restores it to a 1×1×C feature map, an activation function yields a 1×1×C descriptor representing the weight of each channel, and finally the weight of each channel is multiplied by the two-dimensional matrix of the corresponding channel of the original feature map;
in spatial attention, one convolution layer first reduces the channel size, and then one convolution layer and a max-pooling layer enlarge the receptive field; a convolution group composed of a plurality of convolution layers follows; finally, an upsampling layer restores the spatial dimension, a convolution restores the channel dimension, and the final spatial attention feature is obtained through an activation function.
4. A method according to claim 3, wherein step S5 comprises:
the upsampled features are expressed as: F_UP = H_UP(F_BF), where F_UP and H_UP denote the upsampled features and the upsampling module, respectively.
5. The method according to claim 4, wherein step S6 comprises:
the face image reconstruction module is expressed as: I_SR = H_Recon(F_UP), where H_Recon denotes the reconstruction module consisting of one convolution, and I_SR denotes the target high-resolution face image.
6. The method of claim 5, wherein the loss function L(θ) of the entire network is expressed as:
L(θ) = (1/N) Σ_{i=1}^{N} ‖ I_SR^(i) − I_HR^(i) ‖
where N denotes the size of the dataset, and I_SR^(i) and I_HR^(i) denote the i-th super-resolved face image and the i-th high-resolution face image in the dataset.
7. A fused attention mechanism-based face super-resolution system, comprising:
the downsampling module is used for downsampling the high-resolution face image to a target low-resolution face image;
the shallow feature extractor module is used for dividing the target low-resolution face image into mutually overlapping image blocks and then extracting a shallow feature map using the shallow feature extractor;
the deep feature extractor module is used for constructing a fused attention block that fuses the features of the pixel, channel and spatial triple attention modules, generating the fused attention features of the network and enhancing the structural details of the reconstructed face; and for constructing a fused attention network as a deep feature extractor and inputting the shallow facial features into the fused attention network to obtain a deep facial feature map, wherein the fused attention network comprises a plurality of fused attention groups, and each fused attention group comprises a plurality of fused attention blocks;
the upsampling module is used for upsampling the obtained deep facial feature map; and the face image reconstruction module is used for reconstructing the upsampled facial feature map into the target high-resolution face image;
the deep feature extractor module is specifically configured to perform the following operations:
constructing a fused attention block consisting of three parallel attention branches: pixel attention, channel attention and spatial attention, wherein in pixel attention a convolution is first used to reduce the computation, after which three parallel branches follow, with the uppermost and lowermost branches each consisting of one convolution and one activation function and used to obtain dual pixel attention features; the middle branch consists of two convolutions and another activation function and is used to obtain residual features; finally, the output features of the three branches are multiplied element-wise, and one more convolution yields the final pixel attention feature: PA = f(T_1 ⊗ T_2 ⊗ T_3), where T_1, T_2 and T_3 denote the three branch features, ⊗ denotes element-wise multiplication, and f denotes a convolution operation; for an input tensor X_{H×W×C}, where H denotes the height of the feature map, W its width and C the number of channels, feeding X_{H×W×C} into the three parallel attention branches, fusing the different features extracted by the parallel branches, and finally reducing the dimension through one convolution so that the input and output have consistent dimensions; the fused attention network comprises a plurality of fused attention groups FAG and a long skip connection LSC, wherein each fused attention group further comprises a plurality of fused attention blocks with short skip connections SSC, and the m-th fused attention group is expressed as: F_m = H_m(F_{m-1}) = H_m(H_{m-1}(…H_1(F_0)…)), where H_m denotes the m-th fused attention group, and F_{m-1} and F_m are the input and output of the m-th fused attention group; fused attention blocks are stacked within each fused attention group, and the n-th fused attention block in the m-th fused attention group is expressed as: F_{m,n} = H_{m,n}(F_{m,n-1}) = H_{m,n}(H_{m,n-1}(…H_{m,1}(F_{m-1})…)), where F_{m,n-1} and F_{m,n} are the input and output of the n-th fused attention block in the m-th fused attention group.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202110081811.9A 2021-01-21 2021-01-21 Human face super-resolution method and system based on fusion attention mechanism Active CN112750082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110081811.9A CN112750082B (en) 2021-01-21 2021-01-21 Human face super-resolution method and system based on fusion attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110081811.9A CN112750082B (en) 2021-01-21 2021-01-21 Human face super-resolution method and system based on fusion attention mechanism

Publications (2)

Publication Number Publication Date
CN112750082A CN112750082A (en) 2021-05-04
CN112750082B true CN112750082B (en) 2023-05-16

Family

ID=75652773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110081811.9A Active CN112750082B (en) 2021-01-21 2021-01-21 Human face super-resolution method and system based on fusion attention mechanism

Country Status (1)

Country Link
CN (1) CN112750082B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269685A (en) * 2021-05-12 2021-08-17 南通大学 Image defogging method integrating multi-attention machine system
CN113255530B (en) * 2021-05-31 2024-03-29 合肥工业大学 Attention-based multichannel data fusion network architecture and data processing method
CN113256496B (en) * 2021-06-11 2021-09-21 四川省人工智能研究院(宜宾) Lightweight progressive feature fusion image super-resolution system and method
CN113658040A (en) * 2021-07-14 2021-11-16 西安理工大学 Face super-resolution method based on prior information and attention fusion mechanism
CN113379667B (en) * 2021-07-16 2023-03-24 浙江大华技术股份有限公司 Face image generation method, device, equipment and medium
CN113642415B (en) * 2021-07-19 2024-06-04 南京南瑞信息通信科技有限公司 Face feature expression method and face recognition method
CN113361493B (en) * 2021-07-21 2022-05-20 天津大学 Facial expression recognition method robust to different image resolutions
CN113658047A (en) * 2021-08-18 2021-11-16 北京石油化工学院 Crystal image super-resolution reconstruction method
CN113806561A (en) * 2021-10-11 2021-12-17 中国人民解放军国防科技大学 Knowledge graph fact complementing method based on entity attributes
CN114418853B (en) * 2022-01-21 2022-09-20 杭州碧游信息技术有限公司 Image super-resolution optimization method, medium and equipment based on similar image retrieval
CN114529450B (en) * 2022-01-25 2023-04-25 华南理工大学 Face image super-resolution method based on improved depth iteration cooperative network
CN115358932B (en) * 2022-10-24 2023-03-24 山东大学 Multi-scale feature fusion face super-resolution reconstruction method and system
CN116311479B (en) * 2023-05-16 2023-07-21 四川轻化工大学 Face recognition method, system and storage medium for unlocking automobile
CN117061790B (en) * 2023-10-12 2024-01-30 深圳云天畅想信息科技有限公司 Streaming media video frame rendering method and device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275643A (en) * 2020-01-20 2020-06-12 西南科技大学 True noise blind denoising network model and method based on channel and space attention
CN111833246A (en) * 2020-06-02 2020-10-27 天津大学 Single-frame image super-resolution method based on attention cascade network
CN111915592A (en) * 2020-08-04 2020-11-10 西安电子科技大学 Remote sensing image cloud detection method based on deep learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9965705B2 (en) * 2015-11-03 2018-05-08 Baidu Usa Llc Systems and methods for attention-based configurable convolutional neural networks (ABC-CNN) for visual question answering
KR20190113119A (en) * 2018-03-27 2019-10-08 삼성전자주식회사 Method of calculating attention for convolutional neural network
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN112070670B (en) * 2020-09-03 2022-05-10 武汉工程大学 Face super-resolution method and system of global-local separation attention mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275643A (en) * 2020-01-20 2020-06-12 西南科技大学 True noise blind denoising network model and method based on channel and space attention
CN111833246A (en) * 2020-06-02 2020-10-27 天津大学 Single-frame image super-resolution method based on attention cascade network
CN111915592A (en) * 2020-08-04 2020-11-10 西安电子科技大学 Remote sensing image cloud detection method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yanting Hu, Jie Li, Yuanfei Huang, and Xinbo Gao. Channel-Wise and Spatial Feature Modulation Network for Single Image Super-Resolution. https://arxiv.org/abs/1809.11130, 2018, full text. *

Also Published As

Publication number Publication date
CN112750082A (en) 2021-05-04

Similar Documents

Publication Publication Date Title
CN112750082B (en) Human face super-resolution method and system based on fusion attention mechanism
CN108475415B (en) Method and system for image processing
CN112070670B (en) Face super-resolution method and system of global-local separation attention mechanism
CN111028150B (en) Rapid space-time residual attention video super-resolution reconstruction method
Sun et al. Lightweight image super-resolution via weighted multi-scale residual network
CN109389667B (en) High-efficiency global illumination drawing method based on deep learning
CN111696038A (en) Image super-resolution method, device, equipment and computer-readable storage medium
CN111768340B (en) Super-resolution image reconstruction method and system based on dense multipath network
US20230153946A1 (en) System and Method for Image Super-Resolution
CN113421187B (en) Super-resolution reconstruction method, system, storage medium and equipment
CN116486074A (en) Medical image segmentation method based on local and global context information coding
CN116757930A (en) Remote sensing image super-resolution method, system and medium based on residual separation attention mechanism
CN114926336A (en) Video super-resolution reconstruction method and device, computer equipment and storage medium
Shi et al. Exploiting multi-scale parallel self-attention and local variation via dual-branch transformer-CNN structure for face super-resolution
CN116468605A (en) Video super-resolution reconstruction method based on time-space layered mask attention fusion
Yang et al. MRDN: A lightweight Multi-stage residual distillation network for image Super-Resolution
CN116091315A (en) Face super-resolution reconstruction method based on progressive training and face semantic segmentation
Hua et al. Dynamic scene deblurring with continuous cross-layer attention transmission
Zheng et al. Depth image super-resolution using multi-dictionary sparse representation
CN116188272B (en) Two-stage depth network image super-resolution reconstruction method suitable for multiple fuzzy cores
CN111292237B (en) Face image super-resolution reconstruction method based on two-dimensional multi-set partial least square
Jeevan et al. WaveMixSR: Resource-efficient neural network for image super-resolution
CN116797456A (en) Image super-resolution reconstruction method, system, device and storage medium
CN116630152A (en) Image resolution reconstruction method and device, storage medium and electronic equipment
CN116485654A (en) Lightweight single-image super-resolution reconstruction method combining convolutional neural network and transducer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant