CN114881858A - Lightweight binocular image super-resolution method based on multi-attention mechanism fusion - Google Patents

Lightweight binocular image super-resolution method based on multi-attention mechanism fusion

Info

Publication number
CN114881858A
Authority
CN
China
Prior art keywords
resolution
fusion
attention
super
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210538803.7A
Other languages
Chinese (zh)
Inventor
裴文江
冯程晨
夏亦犁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202210538803.7A priority Critical patent/CN114881858A/en
Publication of CN114881858A publication Critical patent/CN114881858A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a lightweight binocular image super-resolution method based on multi-attention mechanism fusion, which mainly addresses the difficulty of balancing model performance and computational efficiency in the binocular image super-resolution task. First, a modified binarization feature fusion framework is introduced to fuse the multi-level image features extracted under the channel attention and spatial attention mechanisms; second, the global parallax information of the binocular image is extracted through a dual-channel attention mechanism, and a pyramid sampling mechanism is introduced to reduce the computation of the module. Experiments show that the invention achieves a substantial improvement in super-resolution performance with fewer parameters and confirms the portability of lightweight networks to the binocular image super-resolution task.

Description

Lightweight binocular image super-resolution method based on multi-attention mechanism fusion
Technical Field
The invention relates to a lightweight binocular image super-resolution method based on multi-attention mechanism fusion, and belongs to the technical field of image processing.
Background
Binocular vision is inspired by bionics: because the positions of the left and right human eyes differ, the difference between the scenes they see forms a three-dimensional spatial perception of the scene. Binocular stereo vision imitates this visual perception mechanism with a binocular camera, constructing a general binocular stereoscopic visual perception system similar to the human eyes.
Unlike single-image super-resolution, binocular super-resolution is essentially a multi-input, multi-output process: a low-resolution left view and a low-resolution right view are input, and the corresponding high-resolution binocular images must be reconstructed. If the binocular images are regarded as adjacent frames of a video, the task reduces to video super-resolution with two frames; however, the interaction between binocular images is expressed by parallax, which differs from the small motion offsets between video frames, so the research methods diverge considerably. Meanwhile, the second-view image can provide extra information for a single image, but if the task is treated as reference-based single-image super-resolution, the information provided is limited to the low-resolution scene and contributes little to reconstructing high-level features. Therefore, binocular image super-resolution not only exploits the correlated information between images on the basis of single-image super-resolution, but also adds a parallax compensation mechanism to the traditional multi-image super-resolution task.
In recent years, lightweight network designs have been proposed for computer vision tasks such as image classification and semantic segmentation, including the classic MobileNet and Xception structures, and many of them have also been introduced into image super-resolution with good results. With the addition of attention mechanisms, network performance improves further, and reducing the parameter count of efficient attention mechanisms has become one of the important research topics in this direction. For the binocular image super-resolution task, a lightweight model must, on the one hand, keep the feature extraction backbone efficient and, on the other hand, reduce the parameter overhead of the binocular feature matching stage while preserving performance as far as possible.
Disclosure of Invention
The technical problem is as follows: aiming at the problem that model performance and computational efficiency are difficult to balance in the binocular super-resolution task, the invention aims to provide a lightweight binocular image super-resolution method based on multi-attention mechanism fusion, and seeks a lightweight design approach suited to binocular image super-resolution networks by examining existing lightweight models for single-image super-resolution.
The technical scheme is as follows: aiming at the problem that model performance and computational efficiency are difficult to balance in the binocular super-resolution task, the invention provides a lightweight binocular image super-resolution method based on multi-attention mechanism fusion, built on a study of lightweight single-image super-resolution networks. The method comprises the following specific steps:
step 1: building a network model
The low-resolution left-view and right-view images are taken as network input, and super-resolution processing is performed on the left view to obtain a high-resolution left-view image; the network model comprises three sub-modules: a feature extraction module, a parallax attention extraction module and a feature reconstruction module.
First, a low-resolution binocular image pair I_L^LR and I_R^LR is input, and the shallow features of the left-view and right-view images are extracted by a 3×3 convolutional layer:

F_L^0 = H_sfe(I_L^LR),  F_R^0 = H_sfe(I_R^LR)

where H_sfe denotes the weight-shared 3×3 convolutional layer, and F_L^0 and F_R^0 denote the shallow features of the left view and the right view extracted from the low-resolution binocular image pair. These features are then fed into m weight-shared feature fusion groups to further extract deeper features:

F_L^m = H_FFG^m(H_FFG^(m-1)(…H_FFG^1(F_L^0))),  F_R^m = H_FFG^m(H_FFG^(m-1)(…H_FFG^1(F_R^0)))

where H_FFG^m denotes the m-th feature fusion group and, similarly, H_FFG^(m-1) and H_FFG^1 denote the (m-1)-th and the 1st feature fusion groups; F_L^m and F_R^m are the deeper feature tensors output after the shallow features pass through the m feature fusion groups.
Then, after the independent features of the low-resolution image pair have been extracted, binocular features are matched by a parallax attention module based on a multi-scale pyramid sampling mechanism, and the parallax-fused feature tensor of the left view is output:

F_LR = H_DCPAM(F_L^m, F_R^m)

where H_DCPAM denotes the dual-channel parallax attention module and F_LR denotes the feature tensor obtained from the parallax attention module. Subsequently, the preceding features are further extracted and fused by n feature fusion groups:

F_L^n = H_FFG^n(H_FFG^(n-1)(…H_FFG^1(F_LR)))

Similar to the feature extraction stage, H_FFG^n denotes the n-th feature fusion group of the feature reconstruction stage, H_FFG^(n-1) and H_FFG^1 denote the (n-1)-th and the 1st feature fusion groups, and F_L^n denotes the fused feature tensor of the left view obtained after the n feature fusion groups.
Finally, the bicubically upsampled left-view image is added pixel by pixel to the preceding output features to obtain the final left-view super-resolution reconstruction result:

I_L^SR = H_ps([λ1·H_5(F_L^n), λ2·H_3(F_L^n)]) + H_up(I_L^LR)

where H_5 and H_3 denote 5×5 and 3×3 convolutions respectively, H_ps denotes the pixel reconstruction layer, H_up denotes the bicubic upsampling operation, λ1 and λ2 are trainable scalar parameters, [·,·] denotes channel-wise cascading, and I_L^SR is the final super-resolved high-resolution left view.
Step 2: constructing a binocular image data set and setting training parameters for network training: the data set images are divided into a training set, a verification set and a test set, training parameters are set, and the network model is trained on the training set to obtain a trained network model.
Step 3: the binocular images to be processed are input into the trained network model, and binocular image super-resolution reconstruction is performed.
Wherein,
the feature fusion block in the step 1 takes a multi-attention fusion module as a basic module, integrates multi-level features extracted by channel attention, space attention and cavity convolution, and takes a corrected binarization feature fusion structure as a basic framework for building.
The parallax attention module in step 1 is a dual-channel attention module intended to extract local epipolar features and global parallax information.
In step 1, the convolution kernel sizes of the levels of the multi-scale pyramid sampling mechanism are [12, 15, 18, 21].
In step 1, m = 2 and n = 2, and the left-view and right-view branches use the same number of feature fusion groups in the feature extraction stage.
In step 1, a high-resolution right-view image can be reconstructed in the same way by exchanging the low-resolution left-view and right-view images.
In step 2, the super-resolution loss is used as the loss function when training the network model.
In the training of the network model in step 2, the Adam optimizer is used, the learning rate is initialized to 0.00002 and halved every 30 iterations during training, the batch size is set to 4, and the network converges after 120 training iterations.
In step 2, the network environment for training the network model is built on PyTorch 1.8 using an NVIDIA RTX 3090 Ti GPU.
Advantageous effects: by adopting the above technical scheme, the invention has the following advantages and beneficial effects compared with the prior art:
1. The disclosed method integrates the features of all levels extracted by the multi-attention mechanism using a modified binarization feature fusion framework, and improves super-resolution reconstruction performance while reducing model parameters and computation.
2. A dual-channel parallax attention mechanism is introduced, realizing feature extraction both along epipolar lines and over the global parallax, and providing effective compensation of the interactive information between the left and right views.
Drawings
FIG. 1 is a diagram of the overall network architecture;
FIG. 2(a) is a schematic diagram of the feature fusion group structure, FIG. 2(b) is a schematic diagram of the multi-attention fusion module, and FIG. 2(c) is a schematic diagram of the dual-channel attention module;
FIG. 3 compares the results of the present invention with the prior art on a low-resolution binocular image;
FIG. 4 compares the operating efficiency of the present invention with that of the prior art.
Detailed Description
The invention is explained in detail below with reference to the accompanying drawings and embodiments; the binocular image super-resolution method based on multi-attention mechanism fusion provided by the invention specifically comprises the following steps:
step 1: building a network model
As shown in FIG. 1, the network design of the present invention includes three sub-steps: feature extraction, parallax attention extraction, and feature reconstruction. In the feature extraction stage, the different hierarchical features of a single view extracted by the multi-attention mechanism are fused by a parameter-shared modified binarization feature fusion framework; in the parallax attention extraction stage, a dual-channel parallax attention mechanism (DCPAM) and a pyramid sampling mechanism are introduced to fuse local and global information between the two views; and in the feature reconstruction stage, the feature fusion groups of the feature extraction module are continued, and the high-resolution left-view image is reconstructed through an automatic parameter-weighting unit.
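For orientation only, the following is a minimal PyTorch sketch of this three-stage data flow. It is not the patented implementation: FeatureFusionGroupStub and DCPAMStub are simplified stand-ins for the richer modules described in steps 1.1-1.3, and all class and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Simplified stand-ins so the skeleton runs; the real blocks are far richer.
class FeatureFusionGroupStub(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
                                  nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)                        # residual stand-in for a feature fusion group

class DCPAMStub(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.fuse = nn.Conv2d(2 * c, c, 1)
    def forward(self, f_l, f_r):
        return self.fuse(torch.cat([f_l, f_r], dim=1))  # stand-in for dual-channel parallax attention

class StereoSRSkeleton(nn.Module):
    """Data flow only: shared 3x3 shallow conv -> m shared feature fusion groups
    -> parallax attention -> n feature fusion groups -> upsampling head + bicubic skip."""
    def __init__(self, c=64, m=2, n=2, scale=4):
        super().__init__()
        self.scale = scale
        self.sfe = nn.Conv2d(3, c, 3, padding=1)        # H_sfe, weights shared by both views
        self.extract = nn.Sequential(*[FeatureFusionGroupStub(c) for _ in range(m)])
        self.dcpam = DCPAMStub(c)                       # H_DCPAM
        self.rec = nn.Sequential(*[FeatureFusionGroupStub(c) for _ in range(n)])
        self.head = nn.Sequential(nn.Conv2d(c, 3 * scale * scale, 3, padding=1),
                                  nn.PixelShuffle(scale))   # H_ps

    def forward(self, i_l_lr, i_r_lr):
        f_l = self.extract(self.sfe(i_l_lr))            # F_L^m
        f_r = self.extract(self.sfe(i_r_lr))            # F_R^m (same weights)
        f_lr = self.dcpam(f_l, f_r)                     # F_LR
        sr = self.head(self.rec(f_lr))                  # reconstructed residual from F_L^n
        up = F.interpolate(i_l_lr, scale_factor=self.scale, mode='bicubic', align_corners=False)
        return sr + up                                  # I_L^SR

# e.g. StereoSRSkeleton()(torch.rand(1, 3, 30, 90), torch.rand(1, 3, 30, 90))
```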
Step 1.1: feature extraction
First, a low-resolution binocular image pair I_L^LR and I_R^LR is input, and the shallow features of the left-view and right-view images are extracted by a 3×3 convolutional layer:

F_L^0 = H_sfe(I_L^LR),  F_R^0 = H_sfe(I_R^LR)

where H_sfe denotes the weight-shared 3×3 convolutional layer, and F_L^0 and F_R^0 denote the shallow features of the left and right views extracted from the low-resolution binocular image pair. These features are then fed into m weight-shared feature fusion groups to further extract deeper features:

F_L^m = H_FFG^m(H_FFG^(m-1)(…H_FFG^1(F_L^0))),  F_R^m = H_FFG^m(H_FFG^(m-1)(…H_FFG^1(F_R^0)))

where H_FFG^m denotes the m-th feature fusion group, H_FFG^(m-1) and H_FFG^1 denote the (m-1)-th and the 1st feature fusion groups, and F_L^m and F_R^m are the deeper feature tensors output after the shallow features pass through the m feature fusion groups.
Specifically, as shown in FIG. 2(a), the feature fusion group is constructed with the multi-attention fusion module as its basic module and the modified binarization fusion framework as its feature fusion framework.
As shown in FIG. 2(b), the main branch of the multi-attention fusion module performs spatial feature extraction: the input features are first processed by stacked 3×3 convolutional layers, and a 1×1 convolution reduces the number of channels to 1/4. To enlarge the receptive field, the spatial dimensions are reduced by a 7×7 convolution with stride 2; after a pooling layer, two dilated convolution operations with dilation rates 1 and 2 are applied. After upsampling back to the original input feature dimensions, the spatial feature tensors of the left-view and right-view images are output through a 1×1 convolution and a Sigmoid normalization operation.
The auxiliary branch of the multi-attention fusion module adopts an efficient channel attention mechanism consisting of a 1×1 pointwise convolution and a depthwise convolution, yielding the channel feature tensors of the left-view and right-view images. The spatial and channel feature tensors are combined by batched matrix multiplication and merged into the output of the multi-attention fusion module, and the outputs of successive modules are cascaded within the modified binarization fusion framework.
The modified binarization fusion framework connects the currently generated output feature tensor with the input feature tensor of the next level and is constructed in a layer-by-layer recursive manner. A channel recombination module is also added: a 1×1 convolution reduces the number of channels to match that of the input feature tensor, and the result is added to the output feature tensor pixel by pixel.
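A hedged PyTorch sketch of one possible reading of the multi-attention fusion module described in the three preceding paragraphs follows. The exact layer ordering and the way the spatial and channel branches are combined (a simple element-wise re-weighting here, rather than batched matrix multiplication and the recursive fusion framework) are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiAttentionFusionSketch(nn.Module):
    """Main branch: channel reduction to C/4, stride-2 7x7 conv + pooling,
    dilated 3x3 convs (rates 1 and 2), upsampling, 1x1 conv + sigmoid spatial map.
    Auxiliary branch: depthwise + pointwise conv producing per-channel weights."""
    def __init__(self, c=64):
        super().__init__()
        r = c // 4
        self.reduce = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
                                    nn.Conv2d(c, r, 1))                        # 1x1 conv -> C/4 channels
        self.down = nn.Sequential(nn.Conv2d(r, r, 7, stride=2, padding=3),     # enlarge receptive field
                                  nn.MaxPool2d(2))
        self.dilated = nn.Sequential(nn.Conv2d(r, r, 3, padding=1, dilation=1), nn.ReLU(inplace=True),
                                     nn.Conv2d(r, r, 3, padding=2, dilation=2), nn.ReLU(inplace=True))
        self.spatial_out = nn.Conv2d(r, 1, 1)                                  # 1x1 conv before sigmoid
        self.channel = nn.Sequential(nn.Conv2d(c, c, 3, padding=1, groups=c),  # depthwise conv
                                     nn.Conv2d(c, c, 1),                       # pointwise 1x1 conv
                                     nn.AdaptiveAvgPool2d(1),
                                     nn.Sigmoid())

    def forward(self, x):
        h, w = x.shape[-2:]
        s = self.dilated(self.down(self.reduce(x)))
        s = F.interpolate(s, size=(h, w), mode='bilinear', align_corners=False)
        spatial = torch.sigmoid(self.spatial_out(s))   # H x W spatial attention map
        channel = self.channel(x)                      # C x 1 x 1 channel attention weights
        return x + x * spatial * channel               # re-weighted features with a residual connection
```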
Step 1.2: parallax attention extraction
After the independent features of the low-resolution image pair have been extracted, binocular features are matched by a parallax attention module based on a multi-scale pyramid sampling mechanism, and the parallax-fused feature tensor of the left view is output:

F_LR = H_DCPAM(F_L^m, F_R^m)

where H_DCPAM denotes the dual-channel parallax attention module and F_LR denotes the feature tensor obtained from the parallax attention module.
Specifically, as shown in FIG. 2(c), the DCPAM comprises a Local Parallax Attention Module (LPAM) on the left branch and a Global Parallax Attention Module (GPAM) on the right branch. In the LPAM, the left-view feature tensor F_L^m and the right-view feature tensor F_R^m extracted in step 1.1 are each passed through a 1×1 convolution and a transition residual block to produce the feature tensors K_l and Q_l, which are reshaped by merging their channel dimensions to obtain k_l and q_l. The matrix of contribution degrees (correlation matrix) M_{R→L} of the right view to the left view along epipolar lines is obtained by batched matrix multiplication of q_l with the transposed k_l^T, followed by a softmax operation:

M_{R→L} = softmax(q_l ⊗ k_l^T)

where ⊗ denotes batched matrix multiplication. The LPAM then generates a right-view mask V_l by a single convolution; it is reshaped into v_l and multiplied with M_{R→L} by batched matrix multiplication to generate the parallax compensation tensor F_{R→L} based on the right view:

F_{R→L} = M_{R→L} ⊗ v_l

The final local parallax attention feature tensor of the left view, F_L^LPAM, is formed by cascading three parts: the contribution matrix M_{R→L}, the right-view parallax compensation tensor F_{R→L}, and the deeper left-view feature tensor F_L^m:

F_L^LPAM = [M_{R→L}, F_{R→L}, F_L^m]
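The row-wise (epipolar) attention above can be sketched in PyTorch as below. This is an illustrative approximation under stated assumptions: 1×1 convolutions stand in for the transition residual blocks, and the contribution matrix itself is not cascaded into the output.

```python
import torch
import torch.nn as nn

class LocalParallaxAttentionSketch(nn.Module):
    """For each image row, attention over horizontal positions is computed between
    left-view queries and right-view keys by batched matrix multiplication + softmax,
    then used to warp right-view features into the left view."""
    def __init__(self, c=64):
        super().__init__()
        self.q = nn.Conv2d(c, c, 1)   # Q_l from left features
        self.k = nn.Conv2d(c, c, 1)   # K_l from right features
        self.v = nn.Conv2d(c, c, 1)   # right-view mask / value V_l

    def forward(self, f_left, f_right):
        b, c, h, w = f_left.shape
        # reshape to (B*H, W, C): one attention problem per epipolar line
        q = self.q(f_left).permute(0, 2, 3, 1).reshape(b * h, w, c)
        k = self.k(f_right).permute(0, 2, 3, 1).reshape(b * h, w, c)
        v = self.v(f_right).permute(0, 2, 3, 1).reshape(b * h, w, c)
        m = torch.softmax(torch.bmm(q, k.transpose(1, 2)), dim=-1)   # M_{R->L}: (B*H, W, W)
        comp = torch.bmm(m, v)                                       # F_{R->L}: warped right features
        comp = comp.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return torch.cat([comp, f_left], dim=1)                      # cascade with left features (2C channels)
```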
the generation mode of the global parallax attention feature tensor of the right-channel GPAM is similar to that of the LPAM, and in order to reduce the huge calculation amount of matrix multiplication, a pyramid sampling mechanism is adopted for data compression. First, Q is obtained by referring to the preceding stage feature tensor extraction method in the LPAM g 、K g And V g All sizes are
Figure BDA00036475339200000419
Figure BDA00036475339200000420
Providing feature information of the right view, Q g And K g Is stretched into
Figure BDA00036475339200000421
Form to obtain q g And k g Through k g K after transposition g T And q is g Similar matrix of global feature is generated through softmax operation after multiplication of batched matrixes
Figure BDA00036475339200000422
Figure BDA00036475339200000423
Based on this, introduction of [12,15,18,21 ]]The pyramid sampling mechanism under the arrangement of convolution kernels with different sizes converts K into g Is compressed into
Figure BDA00036475339200000424
Figure BDA00036475339200000425
Is modified into
Figure BDA00036475339200000426
The global feature aggregation is composed of three parts, a mask G and a similarity matrix M g By dot productAnd V g Postpooling remodeling v g Is transferred v g T Matrix multiplication is carried out, and a left view global parallax attention characteristic tensor is output
Figure BDA00036475339200000427
Figure BDA0003647533920000051
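A sketch of the global branch with pyramid-compressed keys and values is given below. Adaptive average pooling to bins of size [12, 15, 18, 21] is used here as an assumed realisation of the pyramid sampling mechanism, and the mask G is omitted; the class and variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalParallaxAttentionSketch(nn.Module):
    """Full-image attention from the right view to the left view; keys and values
    are compressed by a pooling pyramid so the similarity matrix is (H*W) x S
    instead of (H*W) x (H*W), with S the total number of pooled positions."""
    def __init__(self, c=64, bins=(12, 15, 18, 21)):
        super().__init__()
        self.q = nn.Conv2d(c, c, 1)   # Q_g from left features
        self.k = nn.Conv2d(c, c, 1)   # K_g from right features
        self.v = nn.Conv2d(c, c, 1)   # V_g from right features
        self.bins = bins

    def _pyramid(self, x):
        b, c = x.shape[:2]
        pooled = [F.adaptive_avg_pool2d(x, s).reshape(b, c, -1) for s in self.bins]
        return torch.cat(pooled, dim=2)                              # (B, C, S)

    def forward(self, f_left, f_right):
        b, c, h, w = f_left.shape
        q = self.q(f_left).reshape(b, c, h * w).transpose(1, 2)      # q_g: (B, HW, C)
        k = self._pyramid(self.k(f_right))                           # compressed k_g: (B, C, S)
        v = self._pyramid(self.v(f_right)).transpose(1, 2)           # compressed v_g: (B, S, C)
        m = torch.softmax(torch.bmm(q, k), dim=-1)                   # similarity matrix M_g: (B, HW, S)
        out = torch.bmm(m, v).transpose(1, 2).reshape(b, c, h, w)    # aggregated global features
        return out + f_left                                          # residual fusion with left features
```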
The dual-channel parallax attention tensor finally extracted by the two branches is obtained by adding the local and global parallax attention feature tensors pixel by pixel:

F_LR = F_L^LPAM + F_L^GPAM
step 1.3: feature reconstruction
The preceding features are further extracted and fused by n feature fusion groups:

F_L^n = H_FFG^n(H_FFG^(n-1)(…H_FFG^1(F_LR)))

Similar to the feature extraction stage, H_FFG^n denotes the n-th feature fusion group of the feature reconstruction stage, H_FFG^(n-1) and H_FFG^1 denote the (n-1)-th and the 1st feature fusion groups, and F_L^n denotes the fused feature tensor of the left-view image obtained after the n feature fusion groups. Finally, the fused feature tensor extracted by the n feature fusion groups is sent to the pixel recombination module, to which a unit for automatic parameter weighting is added: different feature weights are assigned to the 3×3 and 5×5 convolutional layers respectively, the automatically weighted feature tensors are cascaded, and the result is added pixel by pixel to the bicubically upsampled original left view to obtain the final high-resolution left-view image:

I_L^SR = H_ps([λ1·H_5(F_L^n), λ2·H_3(F_L^n)]) + H_up(I_L^LR)

where H_5 and H_3 denote the 5×5 and 3×3 convolutions respectively, H_ps denotes the pixel reconstruction layer, H_up denotes the bicubic upsampling operation, λ1 and λ2 are trainable scalar parameters, [·,·] denotes channel-wise cascading, and I_L^SR is the final super-resolved high-resolution left view.
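A possible realisation of this reconstruction stage is sketched below. Whether the λ-weighted 3×3/5×5 outputs are summed or cascaded, and whether they act before or after pixel shuffling, is not fully pinned down above, so the composition shown (shuffle first, then summed weighted branches) is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructionHeadSketch(nn.Module):
    """Pixel-shuffle upsampling followed by parallel 5x5 / 3x3 convolutions weighted
    by the trainable scalars lambda_1 and lambda_2, plus a bicubic skip connection."""
    def __init__(self, c=64, scale=4):
        super().__init__()
        self.scale = scale
        self.pre = nn.Conv2d(c, 3 * scale * scale, 3, padding=1)
        self.ps = nn.PixelShuffle(scale)               # H_ps
        self.conv5 = nn.Conv2d(3, 3, 5, padding=2)     # H_5
        self.conv3 = nn.Conv2d(3, 3, 3, padding=1)     # H_3
        self.lam1 = nn.Parameter(torch.ones(1))        # lambda_1
        self.lam2 = nn.Parameter(torch.ones(1))        # lambda_2

    def forward(self, f_left, i_left_lr):
        x = self.ps(self.pre(f_left))                                  # upsampled feature map
        out = self.lam1 * self.conv5(x) + self.lam2 * self.conv3(x)    # automatically weighted branches
        up = F.interpolate(i_left_lr, scale_factor=self.scale,
                           mode='bicubic', align_corners=False)        # H_up(I_L^LR)
        return out + up                                                # I_L^SR
```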
Step 2: constructing a binocular image data set, dividing images of the data set into a training set, a verification set and a test set, setting training parameters to train a network model, and performing network training on the training set to obtain the trained network model;
step 2.1: loss function setting
To measure the overall pixel-wise difference between the super-resolved image and the ground-truth image, the super-resolution loss is computed as the mean squared error between the left-view result I_L^SR reconstructed by the network and the ground-truth high-resolution left-view image I_L^HR:

L_SR = || I_L^SR − I_L^HR ||_2^2
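In PyTorch this loss reduces to a mean-squared-error term between the super-resolved and ground-truth left views, e.g.:

```python
import torch.nn.functional as F

def sr_loss(sr_left, hr_left):
    """Mean squared error between I_L^SR and the ground-truth I_L^HR."""
    return F.mse_loss(sr_left, hr_left)
```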
step 2.2: training and setting:
the number m of feature fusion groups at each stage in the network is 2, n is 2, Adam is used by a training optimizer, the learning rate is initialized to 0.00002, the learning rate is optimized in the training process, each 30 iterations is reduced to half of the original learning rate, the batch size is set to 4, and 120 iterations are trained and then gradually converged. A network environment was built based on the pytorch1.8 using the Nvidia RTX3090Ti GPU.
Step 3: the binocular images to be processed are input into the trained model, and binocular image super-resolution reconstruction is performed.
FIG. 3 shows that, compared with the super-resolution reconstruction results of existing binocular image super-resolution techniques (bicubic upsampling, StereoSR and PASSRnet), the reconstruction by the invention clearly shows the shape of the figure, and the background and the figure's body are distinguished distinctly without the influence of light spots and similar artifacts, indicating a good super-resolution result.
FIG. 4 compares the present invention with existing super-resolution reconstruction techniques (SRCNN, VDSR, CARN, StereoSR, PASSRnet) in terms of computational efficiency; the present invention has a shorter inference time for the same number of test samples, achieving an effective unification of performance and computational efficiency.

Claims (9)

1. A lightweight binocular image super-resolution method based on multi-attention mechanism fusion, characterized by comprising the following steps:
step 1: building a network model
the low-resolution left-view and right-view images are taken as network input, and super-resolution processing is performed on the left view to obtain a high-resolution left-view image; the network model comprises three sub-modules: a feature extraction module, a parallax attention extraction module and a feature reconstruction module;
first, a low-resolution binocular image pair I_L^LR and I_R^LR is input, and the shallow features of the left-view and right-view images are extracted by a 3×3 convolutional layer:

F_L^0 = H_sfe(I_L^LR),  F_R^0 = H_sfe(I_R^LR)

wherein H_sfe denotes the weight-shared 3×3 convolutional layer, and F_L^0 and F_R^0 denote the shallow features of the left view and the right view extracted from the low-resolution binocular image pair; these features are then fed into m weight-shared feature fusion groups to further extract deeper features:

F_L^m = H_FFG^m(H_FFG^(m-1)(…H_FFG^1(F_L^0))),  F_R^m = H_FFG^m(H_FFG^(m-1)(…H_FFG^1(F_R^0)))

wherein H_FFG^m denotes the m-th feature fusion group and, similarly, H_FFG^(m-1) and H_FFG^1 denote the (m-1)-th and the 1st feature fusion groups; F_L^m and F_R^m are the deeper feature tensors output after the shallow features pass through the m feature fusion groups;
then, after the independent features of the low-resolution image pair have been extracted, binocular features are matched by a parallax attention module based on a multi-scale pyramid sampling mechanism, and the parallax-fused feature tensor of the left view is output:

F_LR = H_DCPAM(F_L^m, F_R^m)

wherein H_DCPAM denotes the dual-channel parallax attention module and F_LR denotes the feature tensor obtained from the parallax attention module;
subsequently, the preceding features are further extracted and fused by n feature fusion groups:

F_L^n = H_FFG^n(H_FFG^(n-1)(…H_FFG^1(F_LR)))

similar to the feature extraction stage, H_FFG^n denotes the n-th feature fusion group of the feature reconstruction stage, H_FFG^(n-1) and H_FFG^1 denote the (n-1)-th and the 1st feature fusion groups, and F_L^n denotes the fused feature tensor of the left view obtained after the n feature fusion groups;
and finally, the bicubically upsampled left-view image is added pixel by pixel to the preceding output features to obtain the final left-view super-resolution reconstruction result:

I_L^SR = H_ps([λ1·H_5(F_L^n), λ2·H_3(F_L^n)]) + H_up(I_L^LR)

wherein H_5 and H_3 denote 5×5 and 3×3 convolutions respectively, H_ps denotes the pixel reconstruction layer, H_up denotes the bicubic upsampling operation, λ1 and λ2 are trainable scalar parameters, [·,·] denotes channel-wise cascading, and I_L^SR is the final super-resolved high-resolution left view;
step 2: constructing a binocular image data set and setting training parameters for network training: dividing the data set images into a training set, a verification set and a test set, setting training parameters, and training the network model on the training set to obtain a trained network model;
and step 3: inputting the binocular images to be processed into the trained network model and performing binocular image super-resolution reconstruction.
2. The lightweight binocular image super-resolution method based on multi-attention mechanism fusion of claim 1, wherein the feature fusion group of step 1 is constructed with the multi-attention fusion module as its basic module, integrates the multi-level features extracted by channel attention, spatial attention and dilated convolution, and takes the modified binarization feature fusion structure as its basic framework.
3. The lightweight binocular image super-resolution method based on multi-attention mechanism fusion according to claim 1, wherein the parallax attention module in step 1 is a dual-channel attention module intended to extract local epipolar features and global parallax information.
4. The lightweight binocular image super-resolution method based on multi-attention mechanism fusion according to claim 1, wherein the convolution kernel sizes of the levels of the multi-scale pyramid sampling mechanism in step 1 are [12, 15, 18, 21].
5. The lightweight binocular image super-resolution method based on multi-attention mechanism fusion according to claim 1, wherein in step 1, m = 2 and n = 2, and the left-view and right-view branches use the same number of feature fusion groups in the feature extraction stage.
6. The lightweight binocular image super-resolution method based on multi-attention mechanism fusion according to claim 1, wherein a high-resolution right-view image can be reconstructed in the same way by exchanging the low-resolution left-view and right-view images of step 1.
7. The lightweight binocular image super-resolution method based on multi-attention mechanism fusion according to claim 1, wherein the super-resolution loss is used as the loss function in the training of the network model in step 2.
8. The lightweight binocular image super-resolution method based on multi-attention mechanism fusion according to claim 1, wherein in the training of the network model in step 2, the Adam optimizer is used, the learning rate is initialized to 0.00002 and halved every 30 iterations during training, the batch size is set to 4, and the network converges after 120 training iterations.
9. The lightweight binocular image super-resolution method based on multi-attention mechanism fusion according to claim 1, wherein in the training of the network model in step 2, the network environment is built on PyTorch 1.8 using an NVIDIA RTX 3090 Ti GPU.
CN202210538803.7A 2022-05-17 2022-05-17 Lightweight binocular image super-resolution method based on multi-attention mechanism fusion Pending CN114881858A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210538803.7A CN114881858A (en) 2022-05-17 2022-05-17 Lightweight binocular image super-resolution method based on multi-attention mechanism fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210538803.7A CN114881858A (en) 2022-05-17 2022-05-17 Lightweight binocular image super-resolution method based on multi-attention mechanism fusion

Publications (1)

Publication Number Publication Date
CN114881858A true CN114881858A (en) 2022-08-09

Family

ID=82675306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210538803.7A CN114881858A (en) Lightweight binocular image super-resolution method based on multi-attention mechanism fusion

Country Status (1)

Country Link
CN (1) CN114881858A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710215A (en) * 2024-01-09 2024-03-15 西南科技大学 Binocular image super-resolution method based on polar line windowing attention
CN117710215B (en) * 2024-01-09 2024-06-04 西南科技大学 Binocular image super-resolution method based on polar line windowing attention

Similar Documents

Publication Publication Date Title
CN111652966B (en) Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle
CN111275618B (en) Depth map super-resolution reconstruction network construction method based on double-branch perception
CN111739082B (en) Stereo vision unsupervised depth estimation method based on convolutional neural network
CN112991350B (en) RGB-T image semantic segmentation method based on modal difference reduction
CN112767253B (en) Multi-scale feature fusion binocular image super-resolution reconstruction method
CN114581560B (en) Multi-scale neural network infrared image colorization method based on attention mechanism
CN110930500A (en) Dynamic hair modeling method based on single-view video
CN108924528B (en) Binocular stylized real-time rendering method based on deep learning
CN112785502B (en) Light field image super-resolution method of hybrid camera based on texture migration
CN113096239B (en) Three-dimensional point cloud reconstruction method based on deep learning
Lin et al. Steformer: Efficient stereo image super-resolution with transformer
CN112819951A (en) Three-dimensional human body reconstruction method with shielding function based on depth map restoration
CN113538243A (en) Super-resolution image reconstruction method based on multi-parallax attention module combination
CN116486074A (en) Medical image segmentation method based on local and global context information coding
CN114881858A (en) Lightweight binocular image super-resolution method based on multi-attention mechanism fusion
CN112184555B (en) Stereo image super-resolution reconstruction method based on deep interactive learning
CN114359039A (en) Knowledge distillation-based image super-resolution method
CN116934972A (en) Three-dimensional human body reconstruction method based on double-flow network
CN117036171A (en) Blueprint separable residual balanced distillation super-resolution reconstruction model and blueprint separable residual balanced distillation super-resolution reconstruction method for single image
CN116912727A (en) Video human behavior recognition method based on space-time characteristic enhancement network
CN116957057A (en) Multi-view information interaction-based light field image super-resolution network generation method
CN116309072A (en) Binocular image super-resolution method for feature channel separation and fusion
CN116703719A (en) Face super-resolution reconstruction device and method based on face 3D priori information
CN116152060A (en) Double-feature fusion guided depth image super-resolution reconstruction method
Zhang et al. Unsupervised learning of depth estimation based on attention model from monocular images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination