CN117788292B - Sub-pixel displacement-based super-resolution reconstruction system for sequence image


Info

Publication number
CN117788292B
Authority
CN
China
Legal status
Active
Application number
CN202410100291.5A
Other languages
Chinese (zh)
Other versions
CN117788292A
Inventor
夏豪杰
吴强
曾鸿飞
杨紫怡
许非凡
余鑫
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date: 2024-01-24
Filing date: 2024-01-24
Publication date: 2024-06-11
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202410100291.5A
Publication of CN117788292A
Application granted
Publication of CN117788292B
Status: Active



Abstract

The invention discloses a sub-pixel displacement-based super-resolution reconstruction system for sequence images, which comprises: 1. an upper computer processing module, used for sending displacement instructions and acquisition instructions and for receiving the acquired sequence images; 2. an imaging module, serving as the imaging core of the system and used for acquiring images; 3. a micro-displacement module, which receives the displacement instructions and drives the image sensor in the imaging module to perform precise fine adjustment to preset positions; 4. a network construction module, which constructs and optimizes a sub-pixel displacement-based sequence-image super-resolution network with training data; 5. a super-resolution module, which processes the acquired sequence images with the trained network to generate a high-resolution image magnified M times. Through precise hardware control, the invention provides sequence images with fixed sub-pixel displacements and combines them with deep-learning image processing, thereby significantly improving image resolution and providing a higher-quality imaging solution for fields such as medical imaging, remote sensing, and precision measurement.

Description

Sub-pixel displacement-based super-resolution reconstruction system for sequence image
Technical Field
The invention belongs to the field of super-resolution reconstruction, and particularly relates to a sub-pixel displacement-based super-resolution reconstruction system for a sequence image.
Background
With the continuous development of imaging technology, the demand for high-resolution images in fields such as vision measurement and medical imaging is increasingly urgent. Under a hardware strategy, the most straightforward way to increase image resolution is to design new imaging systems or retrofit existing imaging hardware, but the achievable resolution is limited by sensor array density. Under a software strategy, complementary information is fused into a low-resolution image or an image sequence by image processing methods to compensate detail and improve sharpness, but the result is limited by the quality of the original images and the techniques used. In recent years, the rapid development of deep learning has brought revolutionary changes to the field of image processing.
For single-image super-resolution, deep-learning-based models have received a great deal of attention and have shown great potential, from early convolutional neural networks to the later emerging Transformer frameworks. When the field of view is extended to multi-image super-resolution reconstruction, however, progress has been much slower, because the correlation and motion information between different images must be taken into account.
The inventors find that research on deep-learning-based multi-image super-resolution reconstruction, at home and abroad, is mainly realized with general-purpose algorithms rather than with sequences of images carrying specific, mutually staggered sub-pixel shifts. Because the main information exploited by existing multi-image super-resolution techniques derives from inter-image pixel displacement, most of which is produced by camera jitter, the displacement between images is not fixed and the sub-pixel information finally obtained is very limited. Meanwhile, existing super-resolution reconstruction algorithms mainly rely on low-precision motion estimation to recover the motion relation among multiple images, and therefore cannot meet the market's demand for high-resolution images.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a sub-pixel displacement-based super-resolution reconstruction system for sequence images, which uses a combined hardware-software strategy to markedly improve image resolution and quality, effectively breaking through the limitations of existing hardware and software strategies and achieving more accurate, higher-quality image reconstruction.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
the invention relates to a sub-pixel displacement-based super-resolution reconstruction system for a sequence image, which is characterized by comprising the following components: the device comprises an imaging module, a micro-displacement module, an upper computer processing module, a network construction module and a super-resolution module;
The imaging module includes: an image sensor and a lens;
the micro-displacement module comprises: a controller, a two-dimensional piezoelectric displacement table;
the image sensor is fixed on the two-dimensional piezoelectric displacement table, so that the central line of the image sensor coincides with the central line of the lens and is used as the initial position of the two-dimensional piezoelectric displacement table;
after the upper computer processing module sets a working mode, sending an acquisition instruction to the image sensor;
After the lens focuses on the object to be detected, the image sensor images the observed object through the lens at the initial position, and the acquired initial image I_1 of the observed object is sent to the upper computer processing module;
After receiving the initial image I_1 of the observed object, the upper computer processing module sends the (i-1)-th displacement instruction to the controller;
The controller generates the (i-1)-th displacement precision signal in the X and Y directions according to the (i-1)-th displacement instruction and sends it to the two-dimensional piezoelectric displacement table;
According to the (i-1)-th displacement precision signal, the two-dimensional piezoelectric displacement table, taking 1/M of the pixel size as one step in the X and Y directions respectively, drives the image sensor to perform the (i-1)-th displacement; after the (i-1)-th preset position is reached, the upper computer processing module sends an acquisition instruction to the image sensor again, so that the image sensor acquires the (i-1)-th frame displacement image I_(i-1) of the observed object at the (i-1)-th preset position and sends it to the upper computer processing module;
After receiving the (i-1)-th frame displacement image of the observed object, the upper computer processing module sends the i-th displacement instruction to the controller, so that the image sensor, driven by the two-dimensional piezoelectric displacement table, continuously acquires displacement images of the observed object at the different preset positions, finally yielding a group of sequence images I = {I_1, I_2, ..., I_i | i = M^2, M = 1, 2, ..., N}, where M represents the magnification factor and M^2 represents the total number of displacement images;
The network construction module constructs a sequence image super-resolution network based on sub-pixel displacement, trains the sequence image super-resolution network by utilizing a visible light data set, and obtains a trained sequence image super-resolution model;
And the super-resolution module inputs the sequence images I into the trained sequence-image super-resolution model for processing, thereby obtaining a reconstructed super-resolution image magnified M times.
The sequence-image super-resolution reconstruction system is also characterized in that the network construction module sets the reconstruction multiple to M and applies step displacements of 1 to M-1 pixels to each original image in the visible light dataset in the X-axis and Y-axis directions respectively, thereby obtaining M^2 frames of images mutually displaced by 1 pixel;
The M^2 frames with 1-pixel displacement are downsampled to obtain M^2 frames of low-resolution images with mutual displacement of 1/M pixel, denoted I_LR = {I_LR^(1), I_LR^(2), ..., I_LR^(M^2)}, where I_LR^(i) represents the i-th frame of the low-resolution sequence;
Any original image in the visible light dataset is taken as the high-resolution target image I_HR, whose dimension is H × W × C, where H represents height, W width, and C the number of channels;
The network construction module constructs a sequence-image super-resolution network based on attention-based residual learning, and the sequence-image super-resolution network comprises: a sub-pixel sampling module, a shallow feature extraction module, a deep feature extraction module, a dense feature fusion module and a reconstruction module;
The sub-pixel sampling module performs a sub-pixel convolution operation with upsampling factor M on I_LR to obtain a high-resolution feature map F_0;
The shallow feature extraction module performs a convolution operation on the high-resolution feature map F_0 and outputs a shallow feature map F_SF;
The deep feature extraction module processes the shallow feature map F_SF with k residual blocks, obtaining the residual feature map F_k output by the k-th residual block by formula (1):
F_k = H_HCTM^k(H_HCTM^(k-1)(... H_HCTM^1(F_SF) ...)) (1)
In formula (1), H_HCTM^k(·) represents the HCTM function in the k-th residual block;
The dense feature fusion module processes the residual feature maps [F_1, F_2, ..., F_k] output by the k residual blocks with formula (2) to obtain a dense feature map F_DF:
F_DF = W_2 * (W_1 * Concat([F_1, F_2, ..., F_k])) + F_0 (2)
In formula (2), Concat(·) represents channel-wise concatenation of the features, W_1 and W_2 represent the weights of the first and second convolution layers in the dense feature fusion module, respectively, and + represents the global feature residual connection;
The reconstruction module processes the dense feature map F_DF with formula (3), obtaining the super-resolution image I_SR reconstructed at magnification M:
I_SR = W_4 * (W_3 * F_DF) (3)
In formula (3), W_3 is the weight of the first convolution layer in the reconstruction module and W_4 is the weight of the second convolution layer in the reconstruction module;
The network construction module constructs a loss function based on I_HR and I_SR for training and updating the parameters of the sequence-image super-resolution network, thereby obtaining the trained sequence-image super-resolution model.
Any residual block of the deep feature extraction module comprises: a first self-calibrating pixel attention module H_SCPA^(1), a hybrid attention block H_HAB, a convolution layer conv, and a second self-calibrating pixel attention module H_SCPA^(2), and the output of each residual block is obtained by formula (4):
F_k = H_SCPA^(2)(W_5 * H_HAB(H_SCPA^(1)(F_(k-1)))) + F_(k-1) (4)
In formula (4), F_(k-1) is the residual feature output by the (k-1)-th residual block, and W_5 is the weight of the convolution layer conv.
Compared with the prior art, the invention actively provides sub-pixel information among the sequence images through the piezoelectric displacement table and further enhances the images with a deep learning algorithm. It can therefore greatly improve super-resolution reconstruction precision, makes notable progress in the reconstruction and sharpness of image detail, improves the accuracy and reliability of super-resolution reconstruction, and can provide advanced image solutions in key fields such as precision measurement and medical imaging, opening up new application prospects and meeting the growing demand for high-resolution image processing. The specific beneficial effects are as follows:
1. Precise sub-pixel displacement control: the invention precisely controls the position of the camera's image sensor with a two-dimensional piezoelectric micro-displacement table, realizing sequential image acquisition with fixed sub-pixel displacement. This control scheme solves the problem of inaccurate sub-pixel displacement caused by camera shake or imprecise motion estimation in traditional methods and guarantees the acquisition of high-precision sub-pixel-displaced sequence images, so that the finally reconstructed image is clearer and more accurate.
2. The core of the method is to fully exploit the sub-pixel displacement information encoded in the sequence images: the M^2 frames are flattened and input directly, and sub-pixel convolution is applied, which effectively compensates the errors caused by insufficient precision in traditional motion estimation, improves data-processing efficiency and detail-capturing capability, extracts the tiny sub-pixel-level changes between images more accurately, and enhances the capture and reconstruction of image detail.
3. Application of a deep learning algorithm: the invention adopts an attention-based residual learning algorithm for super-resolution reconstruction. By combining the local information processing capability of CNNs with the global information processing advantages of Transformers, the reconstruction of detail in sequence images is effectively improved, and the understanding and reconstruction of overall image content are enhanced, so that the invention can extract more useful information from images with tiny displacement differences and generate images of higher resolution and better quality. This is difficult to achieve with conventional methods, especially when processing sequence images with sub-pixel shifts.
4. Combination of hardware and software advantages: a key innovation of the invention is that it combines precise hardware control with advanced software processing. The hardware (a two-dimensional piezoelectric micro-displacement table and a CCD image sensor) provides accurate sub-pixel displacement between the sequence images, and the software (an attention-based residual learning algorithm) is responsible for efficient image processing and reconstruction using that sub-pixel information. This comprehensive application not only improves the resolution and quality of the reconstructed image but also reduces the dependence on imaging hardware, making the invention widely applicable to different high-resolution imaging requirements.
Drawings
FIG. 1 is a schematic diagram of a sub-pixel displacement-based sequential image super-resolution reconstruction system;
FIG. 2 is a schematic diagram of the structure of the sub-pixel shift based sequential image super-resolution network of the present invention;
FIG. 3 is a schematic diagram of the residual block of HCTM of the present invention;
FIG. 4 is a schematic diagram of the structure of the self-calibrating channel attention module of the present invention;
FIG. 5 is a schematic diagram of the structure of the hybrid attention module of the present invention;
FIG. 6 is a schematic diagram of the channel attention module of the present invention;
FIG. 7 is a diagram comparing the image generated by the invention with images from other reconstruction methods in this embodiment.
Detailed Description
In this embodiment, in order to break through the limitations of existing hardware and software strategies in improving image resolution, a sub-pixel displacement-based sequence-image super-resolution reconstruction system realizes more accurate, higher-quality image reconstruction. As shown in fig. 1, it includes: an imaging module, a micro-displacement module, an upper computer processing module, a network construction module and a super-resolution module.
An imaging module, comprising: an image sensor and a lens;
a micro-displacement module, comprising: a controller, a two-dimensional piezoelectric displacement table;
An image sensor is fixed on the two-dimensional piezoelectric displacement table, so that the central line of the image sensor coincides with the central line of the lens and is used as the initial position of the two-dimensional piezoelectric displacement table;
After the upper computer processing module sets a working mode, sending an acquisition instruction to the image sensor;
The lens is independent of the two-dimensional piezoelectric displacement table and is fixed on a camera support with a vibration-damping function, ensuring that the lens remains stable during imaging and avoiding any vibration or displacement caused by motion of the piezoelectric displacement table.
After the lens focuses on the object to be detected, the image sensor images the observed object through the lens at the initial position, and sends the acquired initial image I_1 of the observed object to the upper computer processing module;
The controllers are respectively connected with the two-dimensional piezoelectric displacement table and the upper computer processing module.
After receiving the initial image I_1 of the observed object, the upper computer processing module sends the (i-1)-th displacement instruction to the controller;
The controller generates the (i-1)-th displacement precision signal in the X and Y directions according to the (i-1)-th displacement instruction and sends it to the two-dimensional piezoelectric displacement table;
The two-dimensional piezoelectric displacement table is used to realize step displacements of 1/M pixel size in the X and Y directions;
The motion of the two-dimensional piezoelectric displacement table is transmitted to the image sensor.
According to the (i-1)-th displacement precision signal, the two-dimensional piezoelectric displacement table, taking 1/M of the pixel size as one step in the X and Y directions respectively, drives the image sensor to perform the (i-1)-th displacement; after the (i-1)-th preset position is reached, the upper computer processing module sends an acquisition instruction to the image sensor again, so that the image sensor acquires the (i-1)-th frame displacement image I_(i-1) of the observed object at the (i-1)-th preset position and sends it to the upper computer processing module;
After receiving the (i-1)-th frame displacement image of the observed object, the upper computer processing module sends the i-th displacement instruction to the controller, so that the image sensor, driven by the two-dimensional piezoelectric displacement table, continuously acquires displacement images of the observed object at the different preset positions, finally yielding a group of sequence images I = {I_1, I_2, ..., I_i | i = M^2, M = 1, 2, ..., N}, where M represents the magnification factor and M^2 represents the total number of displacement images.
Taking super-resolution reconstruction magnification M = 2 as an example, the number of sequence images M^2 required for acquisition and reconstruction is 4 frames, and the required sub-pixel displacements are (0, 0), (0, 0.5), (0.5, 0) and (0.5, 0.5): at the initial camera position the first frame is acquired; the two-dimensional piezoelectric displacement table then moves the image sensor rightwards by 0.5 pixel and the second image is acquired; from the initial position it moves upwards by 0.5 pixel and the third image is acquired; then from the initial position it moves upwards and rightwards by 0.5 pixel each and the fourth image is acquired; the images are labelled I = [I_1, I_2, I_3, I_4] in acquisition order, as sketched below;
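The following minimal Python sketch illustrates this acquisition loop for a general magnification M. The stage and camera objects and their method names (move_to, grab_frame) are hypothetical placeholders for a real controller driver, not an API defined by the patent.

def preset_positions(M, pixel_size_um):
    """Sub-pixel preset positions (x, y) in micrometres for an M x M grid.

    The step is 1/M of the pixel size in X and Y; for M = 2 this yields
    (0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5) in pixel units (axis ordering
    is a convention choice).
    """
    step = pixel_size_um / M
    return [(ix * step, iy * step) for iy in range(M) for ix in range(M)]

def acquire_sequence(stage, camera, M, pixel_size_um):
    """Drive the stage through all M^2 preset positions, grabbing one frame at each."""
    frames = []
    for (x, y) in preset_positions(M, pixel_size_um):
        stage.move_to(x, y)                 # displacement instruction
        frames.append(camera.grab_frame())  # acquisition instruction
    return frames                           # I_1 ... I_(M^2)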
The network construction module constructs a sequence image super-resolution network based on sub-pixel displacement, trains the sequence image super-resolution network by utilizing a visible light data set, and obtains a trained sequence image super-resolution model;
The network construction module sets the reconstruction multiple to M and applies step displacements of 1 to M-1 pixels to each original image in the visible light dataset in the X-axis and Y-axis directions respectively, thereby obtaining M^2 frames of images mutually displaced by 1 pixel;
The M^2 frames with 1-pixel displacement are downsampled to obtain M^2 frames of low-resolution images with mutual displacement of 1/M pixel, denoted I_LR = {I_LR^(1), I_LR^(2), ..., I_LR^(M^2)}, where I_LR^(i) represents the i-th frame of the low-resolution sequence.
Taking reconstruction multiple M = 3 as an example, each original image in the visible light dataset is displaced by 1 and 2 pixels in the X-axis and Y-axis directions, generating 9 frames with relative displacement. The 9 frames are then each downsampled 3-fold, finally yielding nine low-resolution sequence images I_LR^(1), ..., I_LR^(9) with a mutual shift of 1/3 pixel; a sketch of this procedure follows.
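A minimal PyTorch sketch of this training-data construction is given below; the whole-pixel shift via torch.roll (which wraps at the border, where a real pipeline would crop) and the plain decimation used for downsampling are simplifying assumptions.

import torch

def make_lr_sequence(hr, M):
    """hr: float tensor (C, H, W) with H, W divisible by M.

    Returns an (M^2, C, H/M, W/M) stack whose frames are mutually
    displaced by 1/M low-resolution pixel.
    """
    frames = []
    for dy in range(M):
        for dx in range(M):
            shifted = torch.roll(hr, shifts=(-dy, -dx), dims=(1, 2))  # 1-pixel steps
            frames.append(shifted[:, ::M, ::M])  # decimate by M
    return torch.stack(frames)  # the i-th entry plays the role of I_LR^(i)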
Any original image in the visible light dataset is taken as the high-resolution target image I_HR, whose dimension is H × W × C, where H represents height, W width, and C the number of channels.
The images in the training set are resized and preprocessed to accommodate the different magnifications: the high-resolution images are cropped to sizes 256×256, 192×192 and 128×128, corresponding to magnifications ×4, ×3 and ×2, respectively;
Each image in each group of low-resolution sequence images is randomly cropped at the corresponding pixel locations for the corresponding magnification, yielding sequence-image patches of size 64×64.
The network construction module constructs a sequence-image super-resolution network based on attention-based residual learning; its specific structure, shown in fig. 2, comprises: a sub-pixel sampling module, a shallow feature extraction module, a deep feature extraction module, a dense feature fusion module and a reconstruction module;
The sub-pixel sampling module flattens the sequence images I_LR along the channel dimension, converts the input pixel space into a feature space using the sub-pixel displacements encoded in the data, and forms a new high-resolution feature map F_0 from the pixel points at the same positions, i.e., it magnifies the image to M times the low-resolution sequence image (×2, ×3, ×4, etc.), as sketched below;
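A minimal sketch of this step, assuming the M^2 frames are concatenated along the channel axis; the feature width of 50 follows the embedding size stated later, while the kernel size is an assumption.

import torch.nn as nn

class SubPixelSampling(nn.Module):
    """Flatten an M^2-frame sequence channel-wise and lift it to HR feature space."""

    def __init__(self, M, in_ch=3, feat_ch=50):
        super().__init__()
        # Map the flattened sequence (M^2 * in_ch channels) to feat_ch * M^2
        # channels so that PixelShuffle(M) leaves feat_ch channels at M x size.
        self.conv = nn.Conv2d(M * M * in_ch, feat_ch * M * M, 3, padding=1)
        self.shuffle = nn.PixelShuffle(M)  # interleaves co-located pixels

    def forward(self, seq):  # seq: (B, M^2 * in_ch, H, W)
        return self.shuffle(self.conv(seq))  # F_0: (B, feat_ch, M*H, M*W)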
The shallow feature extraction module performs a convolution operation on the high-resolution feature map F_0, mapping it from a low-dimensional space to a high-dimensional space, and outputs a shallow feature map F_SF;
The deep feature extraction module processes the shallow feature map F_SF with k HCTM residual blocks, obtaining the residual feature map F_k output by the k-th HCTM by formula (1):
F_k = H_HCTM^k(H_HCTM^(k-1)(... H_HCTM^1(F_SF) ...)) (1)
In formula (1), H_HCTM^k(·) represents the k-th HCTM residual block function. Each subsequent HCTM builds on the features extracted by the previous residual block, progressively refining and enhancing the feature representation. This layered, sequential processing is critical to an in-depth understanding of the image content, enabling the network to effectively reconstruct fine detail and high-level visual elements.
Any HCTM residual block of the deep feature extraction module is based on a hybrid CNN-Transformer architecture. As shown in fig. 3, it comprises: a first self-calibrating pixel attention module H_SCPA^(1), a hybrid attention block H_HAB, a convolution layer conv, and a second self-calibrating pixel attention module H_SCPA^(2). The proposed H_HAB greatly improves the representation capability of the model, while H_SCPA allows the network to purposefully recover missing textures through pixel attention without introducing additional learnable parameters. Given the input feature F_(k-1), the k-th HCTM uses H_SCPA^(1) to purposefully select important features from the input and then extracts intermediate features through H_HAB; a subsequent convolution layer conv applied to the intermediate features preserves the translation equivariance of the network; finally, H_SCPA^(2) is introduced to obtain features that focus more on the region of interest. The output of each residual block is obtained by formula (2):
F_k = H_SCPA^(2)(W_5 * H_HAB(H_SCPA^(1)(F_(k-1)))) + F_(k-1) (2)
In formula (2), F_(k-1) is the residual feature output by the (k-1)-th residual block, and W_5 is the weight of the convolution layer conv.
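The data flow of formula (2) can be written as the following PyTorch sketch, assuming an SCPA class (sketched after the SCPA equations below) and a HybridAttentionBlock implementing H_HAB are available:

import torch.nn as nn

class HCTM(nn.Module):
    """One hybrid CNN-Transformer residual block (formula (2))."""

    def __init__(self, ch):
        super().__init__()
        self.scpa1 = SCPA(ch)                        # H_SCPA^(1)
        self.hab = HybridAttentionBlock(ch)          # H_HAB: STL + HTL in series
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)  # W_5
        self.scpa2 = SCPA(ch)                        # H_SCPA^(2)

    def forward(self, f):                            # f = F_(k-1)
        out = self.scpa2(self.conv(self.hab(self.scpa1(f))))
        return out + f                               # local residual -> F_k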
The first self-calibrating pixel attention module H_SCPA^(1) aims to enhance feature extraction capability and improve the overall performance of the model. As shown in fig. 4, it integrates two key techniques: self-calibrating convolution (SC) and pixel attention (PA). SC not only extracts multi-scale feature information but also effectively captures detailed elements such as contours and textures, while the PA module enhances the network's ability to identify local image features by assigning a different attention weight to each pixel; as shown, pixel attention typically comprises a convolution layer and a sigmoid function that produce an attention map, which is then multiplied with the input features. The self-calibrating convolution with pixel attention (SCPA) effectively combines the advantages of the SC and PA modules while overcoming their respective limitations: the SC module provides an efficient feature extraction and spatial-information adjustment mechanism, while the PA module strengthens the focus on local detail.
Specifically, the SCPA comprises two branches, each with a convolution layer. Given the input F_in, the SCPA upper-branch pixel convolution outputs the upper-branch feature F′_in, and the SCPA lower-branch pixel convolution outputs the lower-branch feature F″_in; notably, the number of channels of F′_in and F″_in is only half that of the input feature F_in.
The SCPA upper branch computes an attention feature map F_PA from the upper-branch feature F′_in through a pixel attention mechanism to strengthen the correlation representation of the features, and then convolves F_PA to generate the upper-branch output F′_n; the process can be expressed as:
F_PA = (W_5 * F′_in) ⊙ σ(H_pconv(F′_in)) (3)
F′_n = W_5 * F_PA (4)
In formula (3), H_pconv represents the first convolution operation of the SCPA upper branch, and σ and ⊙ denote the sigmoid function and element-wise multiplication, respectively;
The SCPA lower branch applies a convolution layer conv to the lower-branch feature F″_in to recover spatial-domain information and outputs the lower-branch feature F″_n;
The SCPA combines the output features F′_n and F″_n of the upper and lower branches, concatenating them along the channel dimension with the Concat(·) function, and then processes the combined features with a pixel convolution layer to effectively fuse the attention information with the spatial information and purposefully recover missing texture information. To speed up training, a local residual connection + is used to generate the final SCPA feature F_out:
F_out = H_pconv(Concat(F′_n, F″_n)) + F_in (5)
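A minimal sketch of the SCPA block built from formulas (3)-(5); treating the "pixel" convolutions as 1×1 and the remaining convolutions as 3×3 is an assumption:

import torch
import torch.nn as nn

class SCPA(nn.Module):
    """Self-calibrating convolution with pixel attention (formulas (3)-(5))."""

    def __init__(self, ch):
        super().__init__()
        half = ch // 2
        self.in_up = nn.Conv2d(ch, half, 1)             # upper-branch pixel conv -> F'_in
        self.in_low = nn.Conv2d(ch, half, 1)            # lower-branch pixel conv -> F''_in
        self.att = nn.Conv2d(half, half, 1)             # H_pconv for the attention map
        self.feat = nn.Conv2d(half, half, 3, padding=1)     # W_5 inside formula (3)
        self.up_out = nn.Conv2d(half, half, 3, padding=1)   # W_5 in formula (4) -> F'_n
        self.low_out = nn.Conv2d(half, half, 3, padding=1)  # conv -> F''_n
        self.fuse = nn.Conv2d(ch, ch, 1)                # pixel conv after Concat

    def forward(self, f_in):
        up, low = self.in_up(f_in), self.in_low(f_in)
        f_pa = self.feat(up) * torch.sigmoid(self.att(up))        # formula (3)
        f_up = self.up_out(f_pa)                                  # formula (4)
        f_low = self.low_out(low)                                 # spatial branch
        return self.fuse(torch.cat([f_up, f_low], dim=1)) + f_in  # formula (5)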
The hybrid attention block H_HAB employs two standard multi-head-attention Transformer modules in series; its structure, shown in fig. 5, comprises a Swin Transformer layer (STL) and a hybrid Transformer layer (HTL). H_HAB is a comprehensive feature extraction method that ensures balanced and thorough processing of image features by exploiting both local and global attention mechanisms, which is critical for reconstructing high-resolution images rich in detail and texture. The window size, embedding size and number of attention heads in H_HAB are set to 8, 50 and 5, respectively.
The Swin Transformer layer, shown in fig. 5 (a), comprises a first layer normalization (LN1), a multi-head self-attention mechanism (MSA), a second layer normalization (LN2) and a multi-layer perceptron (MLP). The input features are first processed by LN1 and then by the MSA; the attention outputs of the individual heads in the MSA are concatenated so that the combined output keeps the same size as the original input. The output of the MSA is then normalized by LN2 and fed into an MLP consisting of two linear layers with a GELU activation between them to introduce nonlinearity. Local residual connections are used around both the MSA and MLP stages: the output of each stage is added to its input features, which smooths training and helps avoid the vanishing-gradient problem. A simplified sketch follows.
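The sketch below shows the STL data flow (LN, then MSA with residual; LN, then MLP with residual); plain global multi-head attention stands in for Swin's shifted-window attention, which is omitted here for brevity, and the MLP expansion ratio is an assumption:

import torch.nn as nn

class SimpleSTL(nn.Module):
    """Simplified Swin-Transformer-layer data flow over token sequences."""

    def __init__(self, dim=50, heads=5, mlp_ratio=2):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim))

    def forward(self, x):                                 # x: (B, tokens, dim)
        h = self.ln1(x)
        x = x + self.msa(h, h, h, need_weights=False)[0]  # MSA residual
        return x + self.mlp(self.ln2(x))                  # MLP residual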
In the hybrid Transformer layer, shown in fig. 5 (b), a channel attention block (CAB) is added to the Swin Transformer layer: the CAB and MSA modules are inserted in parallel after the first layer normalization (LN1) to enhance the representation capability of the network.
The channel attention block (CAB) consists of a first convolution layer, a second convolution layer and one channel attention (CA) unit, with GELU as the activation function between the convolution layers. Transformer architectures typically require high channel dimensions for token embedding, and using convolution layers of constant width directly would incur significant computational cost. To address this, the CAB adopts a channel compression-and-expansion strategy combined with a standard CA module, as shown in fig. 6: for an input feature with C channels, the first convolution layer compresses the channel count by a factor β, and the second convolution layer then expands the channel dimension back to the original size C. This compress-then-expand scheme reduces computational load while still letting the network process cross-channel features efficiently. After the convolution layers, the channel features are adaptively re-weighted by the standard channel attention (CA) module.
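A sketch of the CAB under stated assumptions: the compression factor β and the reduction ratio inside the CA unit are illustrative values, not figures given in the text:

import torch.nn as nn

class CAB(nn.Module):
    """Channel attention block: compress -> GELU -> expand -> channel attention."""

    def __init__(self, ch, beta=3, reduction=16):
        super().__init__()
        squeezed = ch // beta
        self.body = nn.Sequential(
            nn.Conv2d(ch, squeezed, 3, padding=1),  # compress channels by beta
            nn.GELU(),
            nn.Conv2d(squeezed, ch, 3, padding=1),  # expand back to C
        )
        # Standard channel attention: global pooling, bottleneck, sigmoid gate.
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, max(ch // reduction, 1), 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(max(ch // reduction, 1), ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.body(x)
        return y * self.ca(y)  # adaptively re-weight channel features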
The dense feature fusion module processes the residual feature maps [F_1, F_2, ..., F_k] output by the k residual blocks with formula (6), introducing a global feature residual to obtain the dense feature map F_DF:
F_DF = W_2 * (W_1 * Concat([F_1, F_2, ..., F_k])) + F_0 (6)
In formula (6), Concat(·) represents channel-wise concatenation of the features, W_1 and W_2 represent the weights of the first and second convolution layers in the dense feature fusion module, respectively, and + represents the global feature residual connection;
The reconstruction module processes the dense feature map F_DF with formula (7), obtaining the super-resolution image I_SR reconstructed at magnification M:
I_SR = W_4 * (W_3 * F_DF) (7)
In formula (7), W_3 is the weight of the first convolution layer in the reconstruction module and W_4 is the weight of the second convolution layer in the reconstruction module;
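Formulas (6) and (7) together amount to the following head, sketched with assumed kernel sizes (1×1 for the fusing convolution W_1, 3×3 elsewhere):

import torch
import torch.nn as nn

class FusionAndReconstruction(nn.Module):
    """Dense feature fusion (formula (6)) followed by reconstruction (formula (7))."""

    def __init__(self, ch, k, out_ch=3):
        super().__init__()
        self.w1 = nn.Conv2d(k * ch, ch, 1)              # W_1: fuse concatenated maps
        self.w2 = nn.Conv2d(ch, ch, 3, padding=1)       # W_2
        self.w3 = nn.Conv2d(ch, ch, 3, padding=1)       # W_3
        self.w4 = nn.Conv2d(ch, out_ch, 3, padding=1)   # W_4 -> I_SR

    def forward(self, feats, f0):                       # feats: [F_1, ..., F_k]
        f_df = self.w2(self.w1(torch.cat(feats, dim=1))) + f0  # formula (6)
        return self.w4(self.w3(f_df))                           # formula (7)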
The network construction module constructs a loss function based on I_HR and I_SR for training and updating the parameters of the sequence-image super-resolution network, thereby obtaining the trained sequence-image super-resolution model.
During training, the network construction module adopts the Adam algorithm as the main optimizer, with parameters set to β_1 = 0.9 and β_2 = 0.999. The initial learning rate is set to 10^-4 and is halved after every 100 training epochs; the whole training process comprises 400 epochs. To trade off model size against performance, the network contains four HCTM modules. The network in this study was built on the PyTorch framework and trained on an NVIDIA RTX 3090 GPU; the training loop can be sketched as follows.
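A minimal training-loop sketch matching the stated schedule; the L1 loss and the DataLoader of (I_LR sequence, I_HR target) pairs are assumptions, since the text only states that a loss function is constructed from I_SR and I_HR:

import torch

def train(model, loader, epochs=400):
    """Adam at lr 1e-4, halved every 100 epochs, for 400 epochs in total."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)
    criterion = torch.nn.L1Loss()  # assumed loss between I_SR and I_HR
    for _ in range(epochs):
        for lr_seq, hr in loader:  # batches of (I_LR sequence, I_HR target)
            optimizer.zero_grad()
            loss = criterion(model(lr_seq), hr)
            loss.backward()
            optimizer.step()
        scheduler.step()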
The super-resolution module inputs the sequence images I into the trained sequence-image super-resolution model for processing, thereby obtaining a reconstructed super-resolution image magnified M times.
By skillfully combining precise hardware control with a deep-learning software algorithm, the invention provides an innovative sub-pixel displacement-based sequence-image super-resolution reconstruction system and method. The method rests on a hardware-implemented high-precision sub-pixel displacement technique that ensures each camera pixel corresponds accurately to the physical size of the imaging space in both the horizontal and vertical directions. The system drives the image sensor with a two-dimensional piezoelectric displacement table, capturing a series of sequence images with high-precision sub-pixel displacement. In the image reconstruction stage, the invention combines convolutional neural network and Transformer architectures from deep learning and adopts an attention-based residual learning module to train the model finely. This process makes full use of the sub-pixel displacement information encoded in the sequence images and successfully converts them into a super-resolution image in which not only is the resolution markedly improved, but image detail and useful information are richer and easier for the human eye to identify. The invention thus realizes high-resolution image reconstruction with an ordinary low-resolution camera, a technology of clear value and significance in many application fields such as remote sensing and defect detection.
Examples:
To verify the effectiveness of the scheme of the invention, the magnification M is set to 4 in this example, so the number of low-resolution sequence images M^2 required is 16 frames; comparison experiments are performed on five standard image test sets (Set5, Set14, Urban100, BSD100, Manga109), and the objective evaluation indexes are shown in Table 1.
Table 1. Comparison of evaluation indexes between the algorithm of the invention and the Bicubic and POCS algorithms
From the comparison in Table 1 it is evident that the invention is significantly superior to the other reconstruction methods in both the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) image quality indexes; a minimal PSNR computation is sketched below for reference.
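The following helper, an illustrative addition rather than part of the patent, computes PSNR for images assumed to be float tensors scaled to [0, 1]:

import torch

def psnr(sr, hr, eps=1e-12):
    """Peak signal-to-noise ratio in dB for [0, 1]-scaled image tensors."""
    mse = torch.mean((sr - hr) ** 2)
    return 10.0 * torch.log10(1.0 / (mse + eps))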
In the comparison study, images generated by the invention were compared with images produced by bicubic interpolation (Bicubic), projection onto convex sets (POCS), iterative back-projection (IBP) and a deep-learning-based single-image super-resolution method (SwinIR). This comparison shows intuitively that the algorithm of the invention has significant advantages in super-resolution image reconstruction: images reconstructed with traditional methods tend to lose high-frequency information, leaving the overall image blurred; deep-learning-based single-image super-resolution suffers unnatural distortion at high magnification, especially in edges and complex texture regions; the algorithm of the invention, by contrast, effectively recovers high-frequency information such as texture detail, generating clearer and finer images. As shown in fig. 7, the method of the invention effectively recovers high-frequency information such as the texture and contour details of a building, so the image appears clearer and truer.

Claims (3)

1. A sub-pixel displacement based sequential image super-resolution reconstruction system, comprising: the device comprises an imaging module, a micro-displacement module, an upper computer processing module, a network construction module and a super-resolution module;
The imaging module includes: an image sensor and a lens;
the micro-displacement module comprises: a controller, a two-dimensional piezoelectric displacement table;
the image sensor is fixed on the two-dimensional piezoelectric displacement table, so that the central line of the image sensor coincides with the central line of the lens and is used as the initial position of the two-dimensional piezoelectric displacement table;
after the upper computer processing module sets a working mode, sending an acquisition instruction to the image sensor;
After the lens focuses on the object to be detected, the image sensor images the observed object through the lens at the initial position, and the acquired initial image I_1 of the observed object is sent to the upper computer processing module;
After receiving the initial image I_1 of the observed object, the upper computer processing module sends the (i-1)-th displacement instruction to the controller;
The controller generates the (i-1)-th displacement precision signal in the X and Y directions according to the (i-1)-th displacement instruction and sends it to the two-dimensional piezoelectric displacement table;
According to the (i-1)-th displacement precision signal, the two-dimensional piezoelectric displacement table, taking 1/M of the pixel size as one step in the X and Y directions respectively, drives the image sensor to perform the (i-1)-th displacement; after the (i-1)-th preset position is reached, the upper computer processing module sends an acquisition instruction to the image sensor again, so that the image sensor acquires the (i-1)-th frame displacement image I_(i-1) of the observed object at the (i-1)-th preset position and sends it to the upper computer processing module;
After receiving the (i-1)-th frame displacement image of the observed object, the upper computer processing module sends the i-th displacement instruction to the controller, so that the image sensor, driven by the two-dimensional piezoelectric displacement table, continuously acquires displacement images of the observed object at the different preset positions, finally yielding a group of sequence images I = {I_1, I_2, ..., I_i | i = M^2, M = 1, 2, ..., N}, where M represents the magnification factor and M^2 represents the total number of displacement images;
The network construction module constructs a sequence image super-resolution network based on sub-pixel displacement, trains the sequence image super-resolution network by utilizing a visible light data set, and obtains a trained sequence image super-resolution model;
And the super-resolution module inputs the sequence images I into the trained sequence-image super-resolution model for processing, thereby obtaining a reconstructed super-resolution image magnified M times.
2. The super-resolution reconstruction system according to claim 1, wherein the network construction module sets the reconstruction multiple to M and applies step displacements of 1 to M-1 pixels to each original image in the visible light dataset in the X-axis and Y-axis directions respectively, thereby obtaining M^2 frames of images mutually displaced by 1 pixel;
The M^2 frames with 1-pixel displacement are downsampled to obtain M^2 frames of low-resolution images with mutual displacement of 1/M pixel, denoted I_LR = {I_LR^(1), I_LR^(2), ..., I_LR^(M^2)}, where I_LR^(i) represents the i-th frame of the low-resolution sequence;
Any original image in the visible light dataset is taken as the high-resolution target image I_HR, whose dimension is H × W × C, where H represents height, W width, and C the number of channels;
The network construction module constructs a sequence-image super-resolution network based on attention-based residual learning, and the sequence-image super-resolution network comprises: a sub-pixel sampling module, a shallow feature extraction module, a deep feature extraction module, a dense feature fusion module and a reconstruction module;
The sub-pixel sampling module performs a sub-pixel convolution operation with upsampling factor M on I_LR to obtain a high-resolution feature map F_0;
The shallow feature extraction module performs a convolution operation on the high-resolution feature map F_0 and outputs a shallow feature map F_SF;
The deep feature extraction module processes the shallow feature map F_SF with k residual blocks, obtaining the residual feature map F_k output by the k-th residual block by formula (1):
F_k = H_HCTM^k(H_HCTM^(k-1)(... H_HCTM^1(F_SF) ...)) (1)
In formula (1), H_HCTM^k(·) represents the HCTM function in the k-th residual block;
The dense feature fusion module processes the residual feature maps [F_1, F_2, ..., F_k] output by the k residual blocks with formula (2) to obtain a dense feature map F_DF:
F_DF = W_2 * (W_1 * Concat([F_1, F_2, ..., F_k])) + F_0 (2)
In formula (2), Concat(·) represents channel-wise concatenation of the features, W_1 and W_2 represent the weights of the first and second convolution layers in the dense feature fusion module, respectively, and + represents the global feature residual connection;
The reconstruction module processes the dense feature map F_DF with formula (3), obtaining the super-resolution image I_SR reconstructed at magnification M:
I_SR = W_4 * (W_3 * F_DF) (3)
In formula (3), W_3 is the weight of the first convolution layer in the reconstruction module and W_4 is the weight of the second convolution layer in the reconstruction module;
The network construction module constructs a loss function based on I_HR and I_SR for training and updating the parameters of the sequence-image super-resolution network, thereby obtaining the trained sequence-image super-resolution model.
3. The super-resolution reconstruction system according to claim 2, wherein any residual block of the deep feature extraction module comprises: a first self-calibrating pixel attention module H_SCPA^(1), a hybrid attention block H_HAB, a convolution layer conv, and a second self-calibrating pixel attention module H_SCPA^(2), and the output of each residual block is obtained by formula (4):
F_k = H_SCPA^(2)(W_5 * H_HAB(H_SCPA^(1)(F_(k-1)))) + F_(k-1) (4)
In formula (4), F_(k-1) is the residual feature output by the (k-1)-th residual block, and W_5 is the weight of the convolution layer conv.
CN202410100291.5A, priority date 2024-01-24, filing date 2024-01-24: Sub-pixel displacement-based super-resolution reconstruction system for sequence image (granted as CN117788292B, Active)


Publications (2)

Publication number / Publication date
CN117788292A: 2024-03-29
CN117788292B: 2024-06-11




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant