CN115293968A - Super-light-weight high-efficiency single-image super-resolution method - Google Patents
Super-light-weight high-efficiency single-image super-resolution method
- Publication number: CN115293968A
- Application number: CN202210861170.3A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
Abstract
The invention relates to an ultra-lightweight, high-efficiency single-image super-resolution method that takes a low-resolution image as input, so that the algorithm can be deployed on ultra-lightweight or high-speed devices. The method mainly comprises the following: a more lightweight super-resolution backbone network is provided, which greatly reduces the parameter count of the network while maintaining its performance; a dual-branch multi-scale receptive-field fusion module is provided, which increases the feature diversity among distillation information and the overall receptive field of the network, removes information overlap, and improves network performance; a lightweight attention mechanism based on learnable hash mapping is provided, which decouples the input information and enhances region-boundary reconstruction and intra-region relation learning; and a second-order coordinate attention mechanism based on global-information supervision is provided, which improves the accuracy of the network's feature expression. The strategy provided by the invention can be used in many fields, such as public security, medical imaging, remote-sensing detection, and video compression and transmission.
Description
Technical Field
The invention relates to the field of computer vision, in particular to an ultra-light high-efficiency single-image super-resolution method.
Background
The single-image super-resolution task aims to reconstruct detail information from a low-resolution image to obtain a high-resolution image. However, the same low-resolution image can be produced by degrading different high-resolution images, which makes the task ill-posed and difficult.
To solve this problem, many single-image super-resolution algorithms based on deep learning have been proposed; they far exceed the performance of traditional algorithms and have become the current mainstream method. At present, in many scenarios, for example on various mobile devices and wearable devices with weak computing power, image super-resolution algorithms based on traditional methods are not accurate enough, while those based on deep learning are too complex and computationally expensive, so the models cannot be deployed normally.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides an ultra-lightweight, high-efficiency single-image super-resolution method, so as to solve the technical problem that existing deep-learning methods are difficult to deploy on various mobile devices and wearable devices with weak computing power.
According to a first aspect of the present invention, there is provided an ultra-lightweight and efficient single-image super-resolution method, comprising:
Step 1, inputting a low-resolution image Il to be processed, and performing shallow feature extraction on Il with a 3 × 3 convolution to obtain a shallow feature F0 containing coarse-grained information;
Step 2, performing deep feature extraction on the shallow feature F0 to obtain a deep feature F6;
Step 3, building a residual connection between the shallow feature F0 and the deep feature F6 to obtain a residual feature Fr;
Step 4, converting the residual feature Fr into a feature Fu with a 3 × 3 convolution, then upsampling the feature Fu to a high-resolution image Ih with the Sub-pixel module.
On the basis of the technical scheme, the invention can be improved as follows.
Optionally, the process of performing deep feature extraction in step 2 includes: inputting the shallow feature F0 into a dual-branch multi-scale receptive-field fusion module for extraction and fusion;
the dual-branch multi-scale receptive-field fusion module comprises an information distillation branch and a feature extraction branch;
the feature extraction branch extracts features with 1 × 1, 3 × 3, 5 × 5 and 7 × 7 convolutions respectively; the information distillation branch extracts features with 1 × 1, 3 × 3, 5 × 5 and 7 × 7 depthwise convolutions, after which fusion features are obtained.
Optionally, the process of performing deep feature extraction in step 2 includes: inputting the fusion features into a second-order coordinate attention module based on global information supervision;
the second order coordinate attention module includes: a global mean information coding branch and a global variance information coding branch;
inputting the fusion characteristics into the global mean information coding branch to carry out learnable weight coding to obtain global mean information after weight adjustment;
inputting the fusion characteristics into the global variance information coding branch to carry out learnable weight coding to obtain global variance information after weight adjustment;
and adding the global mean information and the global variance information after the weighting adjustment, and performing nonlinear mapping on the global mean information and the global variance information by using a Sigmoid function to obtain comprehensive coding global information.
Optionally, the process of performing deep feature extraction in step 2 includes: repeatedly inputting the shallow feature into the dual-branch multi-scale receptive-field fusion module and the second-order coordinate attention module three times to obtain deep features {F1, F2, F3};
inputting the deep features F1 and F3 into an adjacent feature fusion module for feature fusion to obtain a deep fusion feature, the adjacent feature fusion module comprising a 1 × 1 dimension-reducing convolution and an activation function.
Optionally, the process of performing deep feature extraction in step 2 further includes: inputting the deep fusion feature into a reinforcement learning module, which strengthens the deep fusion feature using a decoupling-enhanced lightweight Non-local attention mechanism based on learnable hash mapping.
Optionally, the reinforcement learning module includes a feature transformation branch and a global semantic information coding branch;
the feature transformation branch processes the deep fusion feature as follows: a whitening module is appended after the 3 × 3 convolution with channel compression ratio r, and the deep fusion feature is passed through it to obtain a whitening feature FB;
similarity-map calculation on the whitening feature FB yields bucket-distributed inter-point similarity information Fbs;
the similarity information Fbs and the whitening feature FB jointly undergo an in-bucket self-similar feature correction operation to obtain a corrected bucket-distributed feature FBR;
according to the inverse mapping H^-1, the bucket-distributed feature FBR is restored to a self-similarity-corrected feature FP of size C/2 × H × W;
the global semantic information coding branch processes the deep fusion feature as follows: the deep fusion feature is passed through a 1 × 1 convolution with channel compression ratio C to obtain pixel-wise saliency information Fi of size 1 × H × W;
the pixel-wise saliency information Fi is fed into a Softmax module for nonlinear mapping to obtain pixel-wise relative saliency information F'i distributed in the range 0-1;
using the pixel-wise relative saliency information F'i as weights, a pixel-wise weighted feature summation is performed on the transformed features produced by the feature transformation branch, yielding global semantic information FG of size C/2 × 1 × 1;
two 1 × 1 convolutions perform channel coding on the global semantic information FG to obtain channel-sensitive global semantic information F'G.
Optionally, the process of performing deep feature extraction in step 2 further includes: simply adding the self-similarity-corrected pixel-wise feature FP and the channel-sensitive global semantic information F'G to obtain a fusion feature Ff of size C/2 × H × W, and restoring the fusion feature Ff to a feature Fo of size C × H × W with a 1 × 1 dimension-raising convolution.
Optionally, the process of performing deep feature extraction in step 2 further includes:
passing the feature Fo through the dual-branch multi-scale receptive-field fusion module and the second-order coordinate attention module to obtain a feature F4, and passing the feature F4 through the dual-branch multi-scale receptive-field fusion module and the second-order coordinate attention module to obtain a depth feature F5;
using the adjacent feature fusion module to fuse the depth features Fo and F5, then obtaining the depth feature F6 through the dual-branch multi-scale receptive-field fusion module.
The embodiment of the invention provides an ultra-lightweight, high-efficiency single-image super-resolution method whose suitability for deployment on mobile devices with limited computing power comes mainly from the following aspects. First, a lighter super-resolution backbone network is obtained by removing and replacing a large number of redundant structures of previous super-resolution algorithms, greatly reducing the parameter count of the network while maintaining its performance. On this basis, the dual-branch multi-scale receptive-field fusion module further increases the feature diversity among distillation information and the overall receptive field of the network, and removes information overlap, thereby improving network performance. The decoupling-enhanced lightweight attention mechanism based on learnable hash mapping then decouples the input information and uses the two kinds of decoupled information to enhance region-boundary reconstruction and intra-region relation learning, respectively. Finally, effective global information is extracted to supervise and correct the network, improving the accuracy of network feature expression and yielding a high-quality high-resolution image reconstructed from detail information. The method guarantees accuracy while minimizing the model, solving the problem that previous super-resolution algorithms could not be deployed on ultra-lightweight or high-speed devices. The strategy provided by the invention can be used in many fields, such as public security, medical imaging, remote-sensing detection, and video compression and transmission.
Drawings
FIG. 1 is a flow chart of a super-resolution method for single image with ultra-light weight and high efficiency provided by the present invention;
fig. 2 is a schematic overall architecture diagram of an embodiment of an ultra-lightweight and efficient single-image super-resolution method provided by the present invention;
fig. 3 is a schematic diagram of a dual-branch multi-scale receptive field fusion module architecture according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a lightweight attention mechanism architecture based on learnable hash mapping with decoupling enhancement according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the drawings, which are provided by way of illustration and are not to be construed as limiting the scope of the invention.
To overcome the defects and problems in the background art, the invention provides a lightweight super-resolution backbone network that removes and replaces a large number of redundant structures of previous super-resolution algorithms, greatly reducing the parameter count of the network while maintaining its performance. Fig. 1 is a flowchart of the ultra-lightweight, high-efficiency single-image super-resolution method provided by the present invention; as shown in fig. 1, the method includes:
Step 1, inputting a low-resolution image Il to be processed, and performing shallow feature extraction on Il with a 3 × 3 convolution to obtain a shallow feature F0 containing coarse-grained information.
Step 2, performing deep feature extraction on the shallow feature F0 to obtain a deep feature F6.
Step 3, building a residual connection between the shallow feature F0 and the deep feature F6 to obtain a residual feature Fr.
Step 4, converting the residual feature Fr into a feature Fu more suitable for upsampling with a 3 × 3 convolution, then upsampling the low-resolution feature Fu to a high-resolution image Ih with the Sub-pixel module.
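The Sub-pixel upsampling in step 4 rearranges channels into spatial positions. As an illustration of that rearrangement only (a minimal numpy sketch, not the patented network), assuming the standard (C·s², H, W) → (C, H·s, W·s) layout used by common pixel-shuffle implementations:

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, scale: int) -> np.ndarray:
    """Rearrange a (C*s^2, H, W) tensor into (C, H*s, W*s), as in the
    Sub-pixel upsampling step."""
    c2, h, w = x.shape
    s = scale
    assert c2 % (s * s) == 0, "channel count must be divisible by scale^2"
    c = c2 // (s * s)
    # (C, s, s, H, W) -> (C, H, s, W, s) -> (C, H*s, W*s)
    x = x.reshape(c, s, s, h, w)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * s, w * s)

# A 4-channel 2x2 feature map Fu becomes a 1-channel 4x4 image Ih at scale 2.
f_u = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
i_h = pixel_shuffle(f_u, 2)
print(i_h.shape)  # (1, 4, 4)
```

This is why Fu must carry s² times the output channel count: each group of s² channels supplies the s × s sub-pixel grid of one output channel.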
The single-image super-resolution method provided by the invention guarantees accuracy while minimizing the parameter count of the model, enabling deployment and use on various mobile devices.
Example 1
To overcome the defects and problems in the background art, embodiment 1 of the present invention provides a lighter super-resolution backbone network that removes and replaces a large number of redundant structures of previous super-resolution algorithms, greatly reducing the parameter count of the network while maintaining its performance. On this basis, a dual-branch multi-scale receptive-field fusion module is provided, which further increases the feature diversity among distillation information and the overall receptive field of the network, removes information overlap, and improves network performance. A lightweight attention mechanism based on learnable hash mapping is provided, which decouples the input information and uses the two kinds of decoupled information to enhance region-boundary reconstruction and intra-region relation learning, respectively. A second-order coordinate attention mechanism based on global-information supervision is provided, which supervises and corrects the network by extracting effective global information, improving the accuracy of network feature expression.
Fig. 2 is a schematic overall architecture diagram of an embodiment of an ultra-lightweight and efficient single-image super-resolution method provided by the present invention, and as can be seen from fig. 1 and fig. 2, the embodiment of the ultra-lightweight and efficient single-image super-resolution method includes:
Step 1, inputting a low-resolution image Il to be processed, and performing shallow feature extraction on Il with a 3 × 3 convolution to obtain a shallow feature F0 containing coarse-grained information.
Step 2, performing deep feature extraction on the shallow feature F0 to obtain a deep feature F6.
In a possible embodiment, the deep feature extraction in step 2 includes:
Step 201, inputting the shallow feature F0 into the dual-branch multi-scale receptive-field fusion module for extraction and fusion.
Fig. 3 is a schematic diagram of a dual-branch multi-scale receptive field fusion module architecture according to an embodiment of the present invention, and it can be seen from fig. 3 that the dual-branch multi-scale receptive field fusion module includes an information distillation branch and a feature extraction branch.
In the feature extraction branch, features are extracted with 1 × 1, 3 × 3, 5 × 5 and 7 × 7 convolutions respectively; in the information distillation branch, features are extracted with 1 × 1, 3 × 3, 5 × 5 and 7 × 7 depthwise convolutions to obtain fusion features.
The feature entering the information distillation branch each time is guaranteed to be the feature before the residual connection, so that the two branches not only effectively enlarge the network receptive field but also reduce the information redundancy among distillation features.
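To illustrate the depthwise ("depth-by-depth") multi-scale extraction described above, a minimal numpy sketch follows; the kernel sizes match the text, but the random weights are stand-ins for the learned filters, and the channel-wise concatenation is one simple way to fuse the scales:

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Per-channel (depthwise) convolution with 'same' zero padding.
    x: (C, H, W); kernels: (C, k, k) -- one filter per channel."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for ci in range(c):
        for i in range(h):
            for j in range(w):
                out[ci, i, j] = np.sum(xp[ci, i:i + k, j:j + k] * kernels[ci])
    return out

def multiscale_branch(x, sizes=(1, 3, 5, 7)):
    """Apply depthwise convolutions at several receptive-field sizes and
    fuse the results by channel-wise concatenation (illustrative weights)."""
    rng = np.random.default_rng(0)
    outs = [depthwise_conv(x, rng.standard_normal((x.shape[0], k, k)) / (k * k))
            for k in sizes]
    return np.concatenate(outs, axis=0)

x = np.ones((2, 8, 8), dtype=np.float64)
fused = multiscale_branch(x)
print(fused.shape)  # (8, 8, 8): 2 channels x 4 scales
```

Because each filter touches only its own channel, the parameter cost grows with k² per channel rather than k² times C², which is the usual reason depthwise convolutions appear in lightweight distillation branches.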
Step 202, inputting the fusion features into the second-order coordinate attention module based on global information supervision, enhancing the network's overall control of the image.
The second order coordinate attention module includes: a global mean information coding branch and a global variance information coding branch.
And inputting the fusion characteristics into a global mean information coding branch to carry out learnable weight coding to obtain the global mean information after weight adjustment.
And inputting the fusion characteristics into a global variance information coding branch to carry out learnable weight coding to obtain the global variance information after weight adjustment.
The method enables the features to adaptively adjust the weight distribution of the one-dimensional feature vector according to the task in the network training process, thereby obtaining more robust and reliable global information.
And at the end of the module, adding the global mean information and the global variance information after the weighting adjustment, and carrying out nonlinear mapping on the global mean information and the global variance information by using a Sigmoid function to obtain comprehensive coding global information.
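A minimal numpy sketch of the idea behind this second-order gating is given below. Per-channel scalar weights stand in for the learnable weight coding, and the gate is applied channel-wise; the patent's exact coding layers are not specified here, so this is an assumption-laden illustration, not the module itself:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def second_order_attention(x, w_mean, w_var):
    """Gate each channel by a sigmoid of (weighted global mean + weighted
    global variance). x: (C, H, W); w_mean, w_var: (C,) learnable weights."""
    mu = x.mean(axis=(1, 2))                  # first-order (mean) statistics
    var = x.var(axis=(1, 2))                  # second-order (variance) statistics
    gate = sigmoid(w_mean * mu + w_var * var)  # (C,), in (0, 1)
    return x * gate[:, None, None]

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 6, 6))
y = second_order_attention(x, np.ones(4), np.ones(4))
print(y.shape)  # (4, 6, 6)
```

The variance branch is what makes the attention "second-order": channels whose content varies strongly across the image can be weighted differently from channels with the same mean but flat content.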
Step 203, inputting the shallow feature into the dual-branch multi-scale receptive-field fusion module and the second-order coordinate attention module three times to obtain deep features {F1, F2, F3}.
Inputting the deep features F1 and F3 into the adjacent feature fusion module for feature fusion to obtain a deep fusion feature; the adjacent feature fusion module comprises a 1 × 1 dimension-reducing convolution and an activation function, enhancing the flow of shallower information in the network.
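The adjacent feature fusion module (concatenation, 1 × 1 dimension-reducing convolution, activation) can be sketched as follows; a 1 × 1 convolution is just a matrix multiplication over the channel dimension, and the uniform weights here are illustrative, not the learned ones (ReLU is assumed as the activation, which the patent does not name):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def adjacent_fusion(f_a, f_b, w):
    """Fuse two (C, H, W) feature maps: concatenate along channels, then a
    1x1 dimension-reducing convolution (a matmul over channels) + ReLU.
    w: (C, 2C) learnable weights."""
    cat = np.concatenate([f_a, f_b], axis=0)   # (2C, H, W)
    c2, h, wd = cat.shape
    out = w @ cat.reshape(c2, h * wd)          # 1x1 conv == channel matmul
    return relu(out).reshape(-1, h, wd)        # back to (C, H, W)

c, h, wd = 3, 4, 4
f1 = np.ones((c, h, wd))                       # stand-in for deep feature F1
f3 = 2 * np.ones((c, h, wd))                   # stand-in for deep feature F3
w = np.full((c, 2 * c), 1.0 / (2 * c))         # illustrative averaging weights
fused = adjacent_fusion(f1, f3, w)
print(fused.shape, float(fused[0, 0, 0]))  # (3, 4, 4) 1.5
```

With averaging weights each output pixel is the mean of the six stacked channel values, so the fused map sits between the two inputs; a trained 1 × 1 convolution would instead learn which of the two stages to emphasize per channel.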
Step 204, inputting the deep fusion feature into the reinforcement learning module, which strengthens the deep fusion feature using the decoupling-enhanced lightweight Non-local attention mechanism based on learnable hash mapping.
Fig. 4 is a schematic diagram of the decoupling-enhanced lightweight attention mechanism architecture based on learnable hash mapping according to an embodiment of the present invention; as can be seen in fig. 4, in a possible embodiment, the reinforcement learning module includes a feature transformation branch and a global semantic information coding branch.
The feature transformation branch processes the deep fusion feature as follows: a whitening module is appended after the 3 × 3 convolution with channel compression ratio r, and the deep fusion feature is passed through it to obtain a whitening feature FB.
Similarity-map calculation on the whitening feature FB yields bucket-distributed inter-point similarity information Fbs.
The similarity information Fbs and the whitening feature FB jointly undergo the in-bucket self-similar feature correction operation to obtain the corrected bucket-distributed feature FBR.
According to the inverse mapping H^-1, the bucket-distributed feature FBR is restored to a self-similarity-corrected feature FP of size C/2 × H × W.
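The bucket-wise self-similar correction above is what keeps this Non-local attention lightweight: similarity is computed only within each hash bucket rather than across all N pixels. A numpy sketch of that restriction follows, with a random bucket assignment standing in for the learnable hash mapping H (the learned mapping, whitening, and inverse mapping H^-1 are omitted, so this is an illustration of the cost reduction only):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def bucketed_self_attention(feat, bucket_id, n_buckets):
    """Self-similarity restricted to hash buckets: each pixel attends only
    to pixels mapped to the same bucket, cutting the O(N^2) Non-local cost.
    feat: (N, C) flattened pixels; bucket_id: (N,) bucket index per pixel."""
    out = np.zeros_like(feat)
    for b in range(n_buckets):
        idx = np.where(bucket_id == b)[0]
        if idx.size == 0:
            continue
        f = feat[idx]                      # (n_b, C) pixels of one bucket
        sim = softmax(f @ f.T, axis=-1)    # in-bucket similarity map
        out[idx] = sim @ f                 # self-similar feature correction
    return out

rng = np.random.default_rng(2)
feat = rng.standard_normal((16, 4))
buckets = rng.integers(0, 4, size=16)      # stand-in for a learned hash map
corrected = bucketed_self_attention(feat, buckets, 4)
print(corrected.shape)  # (16, 4)
```

With B roughly equal buckets the similarity cost drops from N² to about N²/B inner products, which is the design reason for hashing before the Non-local operation.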
The global semantic information coding branch processes the deep fusion feature as follows: the deep fusion feature is passed through a 1 × 1 convolution with channel compression ratio C to obtain pixel-wise saliency information Fi of size 1 × H × W.
The pixel-wise saliency information Fi is fed into the Softmax module for nonlinear mapping to obtain pixel-wise relative saliency information F'i distributed in the range 0-1.
Using the pixel-wise relative saliency information F'i as weights, a pixel-wise weighted feature summation is performed on the transformed features produced by the feature transformation branch, yielding global semantic information FG of size C/2 × 1 × 1.
Two 1 × 1 convolutions perform channel coding on the global semantic information FG to obtain channel-sensitive global semantic information F'G, which can model region boundaries well.
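A numpy sketch of this branch: the saliency projection, the softmax pooling, and the two 1 × 1 convolutions all reduce to vector operations on the flattened feature map. The tanh between the two channel-coding convolutions is an assumption, since the patent does not name the intermediate nonlinearity, and all weights here are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def global_semantic_encoding(feat, w_sal, w1, w2):
    """Encode channel-sensitive global semantics: a 1-channel saliency map
    (1x1 conv -> softmax over pixels) weights a per-pixel sum of features,
    then two 1x1 convolutions (plain matmuls on the pooled vector).
    feat: (C, H, W); w_sal: (C,); w1, w2: (C, C)."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    sal = softmax(w_sal @ flat)        # (H*W,) relative saliency in 0-1
    g = flat @ sal                     # (C,) saliency-weighted global pooling
    return w2 @ np.tanh(w1 @ g)        # channel coding (nonlinearity assumed)

rng = np.random.default_rng(3)
feat = rng.standard_normal((4, 5, 5))
g = global_semantic_encoding(feat, np.ones(4), np.eye(4), np.eye(4))
print(g.shape)  # (4,)
```

Because the softmax weights sum to 1, the pooled vector is a convex combination of pixel features, so the branch summarizes the whole map into one channel-length descriptor regardless of H and W.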
Step 205, simply adding the self-similarity-corrected pixel-wise feature FP and the channel-sensitive global semantic information F'G to obtain a fusion feature Ff of size C/2 × H × W, and restoring the fusion feature Ff to a feature Fo of size C × H × W with a 1 × 1 dimension-raising convolution.
Step 206, passing the feature Fo through the dual-branch multi-scale receptive-field fusion module and the second-order coordinate attention module to obtain a feature F4, and passing the feature F4 through the dual-branch multi-scale receptive-field fusion module and the second-order coordinate attention module to obtain a depth feature F5.
Step 207, using the adjacent feature fusion module to fuse the depth features Fo and F5, then obtaining the depth feature F6 through the dual-branch multi-scale receptive-field fusion module.
Step 3, building a residual connection between the shallow feature F0 and the deep feature F6 to obtain a residual feature Fr.
Step 4, converting the residual feature Fr into a feature Fu more suitable for upsampling with a 3 × 3 convolution, then upsampling the low-resolution feature Fu to a high-resolution image Ih with the Sub-pixel module.
It should be noted that, in the foregoing embodiments, the description of each embodiment has an emphasis, and reference may be made to the related description of other embodiments for a part that is not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (8)
1. An ultra-lightweight and efficient single-image super-resolution method, comprising the following steps:
Step 1, inputting a low-resolution image I_l to be processed, and performing shallow feature extraction on the low-resolution image I_l using a 3 × 3 convolution to obtain a shallow feature F_0 containing coarse-grained information;
Step 2, performing deep feature extraction on the shallow feature F_0 to obtain a deep feature F_6;
Step 3, constructing a residual connection between the shallow feature F_0 and the deep feature F_6 to obtain a residual feature F_r;
Step 4, converting the residual feature F_r into a feature F_u using a 3 × 3 convolution, and then upsampling the feature F_u through a Sub-pixel module to obtain a high-resolution image I_h.
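The Sub-pixel module in step 4 is, in standard practice, a pixel-shuffle rearrangement: a feature of size C·r² × H × W is rearranged into C × rH × rW by interleaving groups of r² channels into r × r spatial blocks. A minimal pure-Python sketch of that rearrangement, assuming this standard behavior (nested lists stand in for tensors; the function name is ours, not the patent's):

```python
def pixel_shuffle(x, r):
    # x: list of C*r*r channel grids, each an H x W list of lists.
    # Returns C channel grids of size (H*r) x (W*r).
    c_out = len(x) // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = []
    for c in range(c_out):
        grid = [[0] * (w * r) for _ in range(h * r)]
        for dy in range(r):
            for dx in range(r):
                # Channel (c, dy, dx) fills the (dy, dx) offset of each r x r block.
                ch = x[c * r * r + dy * r + dx]
                for i in range(h):
                    for j in range(w):
                        grid[i * r + dy][j * r + dx] = ch[i][j]
        out.append(grid)
    return out
```

With four 1 × 1 channels and r = 2, the four values tile a single 2 × 2 output channel, which is exactly the C·r² × H × W → C × rH × rW mapping the upsampling step relies on.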
2. The single-image super-resolution method of claim 1, wherein the deep feature extraction in step 2 comprises: inputting the shallow feature F_0 into a dual-branch multi-scale receptive field fusion module for extraction and fusion;
the dual-branch multi-scale receptive field fusion module comprises an information distillation branch and a feature extraction branch;
features are extracted in the feature extraction branch by 1 × 1, 3 × 3, 5 × 5 and 7 × 7 convolutions respectively, and fusion features are obtained after features are extracted in the information distillation branch by 1 × 1, 3 × 3, 5 × 5 and 7 × 7 depthwise convolutions.
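The depthwise convolutions in the information distillation branch are what keep the module lightweight: a depthwise k × k convolution needs only C·k² weights versus C_in·C_out·k² for a standard one. A small illustrative weight count under an assumed 64-channel feature map (the helper name and channel count are ours, for illustration only):

```python
def conv_weight_count(c_in, c_out, k, depthwise=False):
    # Depthwise: one k x k kernel per input channel (output channels == input channels).
    # Standard: one k x k kernel per (input channel, output channel) pair.
    return c_in * k * k if depthwise else c_in * c_out * k * k

# Summed over the branch's four kernel sizes for a 64-channel feature map:
standard = sum(conv_weight_count(64, 64, k) for k in (1, 3, 5, 7))
depthwise = sum(conv_weight_count(64, 64, k, depthwise=True) for k in (1, 3, 5, 7))
```

Here `standard` is 344,064 weights while `depthwise` is 5,376, a 64× reduction, which is the motivation for using depthwise kernels in the distillation branch of an "ultra-lightweight" design.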
3. The single-image super-resolution method of claim 2, wherein the deep feature extraction in step 2 comprises: inputting the fusion features into a second-order coordinate attention module based on global information supervision;
the second-order coordinate attention module comprises a global mean information encoding branch and a global variance information encoding branch;
inputting the fusion features into the global mean information encoding branch for learnable weight encoding to obtain weight-adjusted global mean information;
inputting the fusion features into the global variance information encoding branch for learnable weight encoding to obtain weight-adjusted global variance information;
adding the weight-adjusted global mean information and global variance information, and performing nonlinear mapping on the sum using a Sigmoid function to obtain comprehensively encoded global information.
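Per channel, the two branches of claim 3 reduce to a first-order statistic (mean) and a second-order statistic (variance) that are weighted, summed, and squashed by a Sigmoid. A toy per-channel sketch under that reading (the scalar weights `w_mean` and `w_var` stand in for the learnable encodings; the actual module encodes along coordinate directions, which this sketch does not attempt):

```python
import math

def second_order_gate(values, w_mean=1.0, w_var=1.0):
    # values: one channel's feature values, flattened to a list.
    n = len(values)
    mean = sum(values) / n                           # first-order statistic
    var = sum((v - mean) ** 2 for v in values) / n   # second-order statistic
    # Weighted sum of the two statistics, then Sigmoid nonlinear mapping.
    return 1.0 / (1.0 + math.exp(-(w_mean * mean + w_var * var)))
```

A zero-valued channel yields the neutral gate 0.5, while channels with larger mean or variance are pushed toward 1, i.e. the attention responds to both brightness and contrast statistics.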
4. The single-image super-resolution method of claim 3, wherein the process of deep feature extraction in step 2 comprises: passing the shallow feature through the dual-branch multi-scale receptive field fusion module and the second-order coordinate attention module three times in succession to obtain deep features {F_1, F_2, F_3};
inputting the deep features F_1 and F_3 into an adjacent feature fusion module for feature fusion to obtain a deep fusion feature, wherein the adjacent feature fusion module comprises a 1 × 1 dimensionality-reduction convolution and an activation function.
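A 1 × 1 dimensionality-reduction convolution, as used by the adjacent feature fusion module, is simply a per-pixel linear map across the concatenated channels of F_1 and F_3. A pure-Python sketch of that fusion (ReLU is our illustrative choice of activation; the claim does not name one, and the function and argument names are ours):

```python
def fuse_1x1(channels, weights):
    # channels: C_in feature maps, each an H x W grid
    #           (the channel-wise concatenation of F_1 and F_3).
    # weights:  C_out rows of C_in coefficients (the 1x1 kernels);
    #           C_out < C_in gives the dimensionality reduction.
    h, w = len(channels[0]), len(channels[0][0])
    out = []
    for row in weights:
        # Per-pixel weighted sum over input channels, then ReLU activation.
        out.append([[max(0.0, sum(c[i][j] * k for c, k in zip(channels, row)))
                     for j in range(w)] for i in range(h)])
    return out
```

Two input channels fused by one weight row collapse to a single output channel, illustrating how the module mixes adjacent deep features while shrinking the channel count.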
5. The single-image super-resolution method of claim 4, wherein the process of deep feature extraction in step 2 further comprises: inputting the deep fusion feature into a reinforcement learning module for reinforcement learning, wherein the reinforcement learning module uses a decoupled, enhanced, lightweight Non-local attention mechanism based on learnable HashMaps to reinforce the deep fusion feature.
6. The single-image super-resolution method according to claim 5, wherein the reinforcement learning module comprises a feature transformation branch and a global semantic information encoding branch;
the process of the feature transformation branch for processing the deep fusion features comprises the following steps: adding a whitening module to the right side of the 3 x 3 convolution with the channel compression ratio of r, and inputting the deep fusion features into the whitening module to obtain whitening features F B ;
For the whitening characteristic F B Obtaining the similarity information F between the whitening points in barrel-shaped distribution through similar map calculation bs ;
The similarity information F between the whitening points bs And the whitening feature F B The self-similar characteristic correction operation in the barrel is executed together to obtain the corrected barrel-shaped distribution characteristic F BR ;
According to the inverse mapping H -1 The barrel-shaped distribution feature F is formed BR Reduced to a self-similar corrected feature F of size C/2 XH W P ;
the process by which the global semantic information encoding branch processes the deep fusion feature comprises: inputting the deep fusion feature into a 1 × 1 convolution with channel compression ratio C to obtain pixel-wise saliency information F_i of size 1 × H × W;
inputting the pixel-wise saliency information F_i into a Softmax module for nonlinear mapping to obtain pixel-wise relative saliency information F'_i distributed in the range 0–1;
performing a pixel-wise feature weighted summation over the transformed features obtained by the feature transformation branch, using the pixel-wise relative saliency information F'_i as weights, to obtain global semantic information F_G of size C/2 × 1 × 1;
performing channel encoding on the global semantic information F_G using two 1 × 1 convolutions to obtain channel-sensitive global semantic information F'_G.
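The global semantic information encoding branch amounts to saliency-weighted global pooling: the 1 × H × W saliency map is Softmax-normalized into per-pixel weights, and the transformed features are summed under those weights to yield one value per channel. A minimal sketch of that weighted pooling (feature maps are flattened to lists of N pixels; the function names are ours):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def global_semantic_pool(features, saliency):
    # features: C channels, each flattened to N pixel values.
    # saliency: N raw saliency scores (the 1 x H x W map, flattened).
    weights = softmax(saliency)  # pixel-wise relative saliency in (0, 1)
    # Weighted sum over pixels gives one value per channel: a C x 1 x 1 vector.
    return [sum(f[i] * w for i, w in enumerate(weights)) for f in features]
```

With a uniform saliency map this reduces to global average pooling; a strongly peaked map instead pools almost entirely from the most salient pixel, which is the behavior the Softmax weighting is meant to provide.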
7. The single-image super-resolution method according to claim 6, wherein the process of deep feature extraction in step 2 further comprises: adding the self-similarity-corrected feature F_P in pixel-point form and the channel-sensitive global semantic information F'_G to obtain a fusion feature F_f of size C/2 × H × W, and restoring the fusion feature F_f to a feature F_o of size C × H × W through a 1 × 1 dimension-raising convolution.
8. The single-image super-resolution method of claim 7, wherein the process of deep feature extraction in step 2 further comprises:
the feature F o Obtaining a characteristic F after passing through the double-branch multi-scale receptive field fusion module and the second-order coordinate attention module 4 The feature F is 4 Obtaining a depth feature F through the dual-branch multi-scale receptive field fusion module and the second-order coordinate attention module 5 ;
Using the adjacent feature fusion module to pair depth features F o And the depth feature F 5 After feature fusion, the depth feature F is obtained through the dual-branch multi-scale receptive field fusion module 6 。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210861170.3A CN115293968A (en) | 2022-07-19 | 2022-07-19 | Super-light-weight high-efficiency single-image super-resolution method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115293968A true CN115293968A (en) | 2022-11-04 |
Family
ID=83823913
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210861170.3A Pending CN115293968A (en) | 2022-07-19 | 2022-07-19 | Super-light-weight high-efficiency single-image super-resolution method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115293968A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115994857A (en) * | 2023-01-09 | 2023-04-21 | 深圳大学 | Video super-resolution method, device, equipment and storage medium |
CN115994857B (en) * | 2023-01-09 | 2023-10-13 | 深圳大学 | Video super-resolution method, device, equipment and storage medium |
CN116934598A (en) * | 2023-09-19 | 2023-10-24 | 湖南大学 | Multi-scale feature fusion light-weight remote sensing image superdivision method and system |
CN116934598B (en) * | 2023-09-19 | 2023-12-01 | 湖南大学 | Multi-scale feature fusion light-weight remote sensing image superdivision method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113658051B (en) | Image defogging method and system based on cyclic generation countermeasure network | |
CN109087273B (en) | Image restoration method, storage medium and system based on enhanced neural network | |
CN112801901A (en) | Image deblurring algorithm based on block multi-scale convolution neural network | |
CN115293968A (en) | Super-light-weight high-efficiency single-image super-resolution method | |
CN110363068B (en) | High-resolution pedestrian image generation method based on multiscale circulation generation type countermeasure network | |
CN112184585B (en) | Image completion method and system based on semantic edge fusion | |
CN114511798B (en) | Driver distraction detection method and device based on transformer | |
CN112419191B (en) | Image motion blur removing method based on convolution neural network | |
CN113076957A (en) | RGB-D image saliency target detection method based on cross-modal feature fusion | |
CN111696110A (en) | Scene segmentation method and system | |
Rivadeneira et al. | Thermal image super-resolution challenge-pbvs 2021 | |
Zheng et al. | T-net: Deep stacked scale-iteration network for image dehazing | |
Zhong et al. | Deep attentional guided image filtering | |
CN116468605A (en) | Video super-resolution reconstruction method based on time-space layered mask attention fusion | |
CN115511708A (en) | Depth map super-resolution method and system based on uncertainty perception feature transmission | |
CN113962878B (en) | Low-visibility image defogging model method | |
Liu et al. | Facial image inpainting using multi-level generative network | |
CN117237623B (en) | Semantic segmentation method and system for remote sensing image of unmanned aerial vehicle | |
CN117952883A (en) | Backlight image enhancement method based on bilateral grid and significance guidance | |
CN117333398A (en) | Multi-scale image denoising method and device based on self-supervision | |
CN113096032A (en) | Non-uniform blur removing method based on image area division | |
CN113033616B (en) | High-quality video reconstruction method, device, equipment and storage medium | |
CN114943655A (en) | Image restoration system for generating confrontation network structure based on cyclic depth convolution | |
CN114331894A (en) | Face image restoration method based on potential feature reconstruction and mask perception | |
Wei et al. | Diffusion models for spatio-temporal-spectral fusion of homogeneous Gaofen-1 satellite platforms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province Applicant after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd. Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone) Applicant before: Wuhan Tuke Intelligent Technology Co.,Ltd. |