CN115358961A - Multi-focus image fusion method based on deep learning - Google Patents

Multi-focus image fusion method based on deep learning

Info

Publication number
CN115358961A
Authority
CN
China
Prior art keywords
image
fused
fusion
channel
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211110378.8A
Other languages
Chinese (zh)
Inventor
陈滨
熊峰
邵艳利
魏丹
王兴起
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202211110378.8A priority Critical patent/CN115358961A/en
Publication of CN115358961A publication Critical patent/CN115358961A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-focus image fusion method based on deep learning. Edge information of the multi-focus image pair is obtained with the Laplacian operator, and a fused-image label is obtained with a maximum-selection strategy. The multi-focus image pair is input into the generator fusion network to obtain a fused image. In addition, the designed texture-enhancement module extracts the high-frequency texture information of the source images, and a corresponding loss function is designed to optimize the fused image. Finally, the fused-image label and the fused image are input into the discriminator to be judged real or fake, and the generator is optimized according to the judgment result. The invention addresses the loss of texture and edge information during multi-focus image fusion, and the fused image can be used for further image processing.

Description

Multi-focus image fusion method based on deep learning
Technical Field
The invention discloses a fusion method that generates a sharp image from multiple images acquired with different focus settings. The aim is that the fused image produced by this technique has a fully sharp visual appearance, selectively retains the detailed information contained in the several source images, and can be used for subsequent image-processing tasks.
Background
Because a lens is thick in the middle and thin at the edges, the image formed by the lens is distorted at locations far from the focal point, and the captured image becomes blurred. The demand for sharper pictures has made multi-focus image fusion an important topic in image processing. Owing to the limitations of optical lenses, only objects near the focal point and within the depth of field are imaged fully focused and sharp, while objects far from the focal point and outside the depth of field are imaged blurred. Multi-focus image fusion algorithms were proposed to address this problem: they synthesize a fully focused image by capturing the sharp appearance of multiple source images of the same scene taken with different focal points. The fused image can be applied to fields such as photographic visualization, object tracking, medical diagnosis, and remote-sensing monitoring.
To date, image fusion methods can be divided into two categories: traditional fusion methods and deep-learning-based fusion methods. Traditional image fusion algorithms focus on processing the transform domain or the spatial domain of the image. Transform-domain algorithms transform the images into different feature domains, fuse the features by weighting, and finally generate the fused image by the inverse transform of the fused features. Even images of different modalities share similar properties in a feature domain, so these algorithms are suitable for fusion across modalities, such as infrared-visible image fusion and CT-MR image fusion. Spatial-domain algorithms first divide an input image into many small blocks or regions, then measure the saliency of each block, and finally fuse the most salient regions into a new image. Spatial-domain algorithms are also suitable for images of the same modality, such as multi-focus image fusion. However, traditional image fusion algorithms have unavoidable shortcomings such as poor generality, low efficiency, and blurred edges. Deep-learning algorithms, with their strong feature-representation capability, have been widely applied to image fusion. Researchers complete the fusion task by designing a suitable network structure and a corresponding loss function. Deep-learning-based methods first use a convolutional neural network to learn the features of each image block automatically, learning from labels that partition focused and defocused regions, and the network parameters are continuously optimized according to the designed loss function. By adjusting the structure of the deep network, researchers handle different application scenarios and improve the quality and efficiency of image fusion, for example with pixel-level fusion CNNs, encoder-decoder fusion networks, residual fusion networks, and end-to-end multi-focus image fusion algorithms. Although the fusion effect is good, shortcomings such as a large amount of computation and complex networks remain.
Typical algorithms still have shortcomings in three respects: (1) as the number of network layers grows, the expansion or shrinkage of the gradient keeps accumulating, so the model converges with difficulty and the final fusion effect suffers; (2) during feature fusion, some algorithms lack adequate image-reconstruction capability, so redundant information remains in the fused image; (3) a large number of parameters and loss functions must be computed in a deep-learning model, so the image fusion algorithm is complex and the computation time is costly. To address these problems, this patent proposes a multi-focus image fusion algorithm based on texture enhancement. The algorithm is built on a generative adversarial network: first, a texture-enhancement module extracts the high-frequency information of the source images and a reasonable loss function is designed; second, a generator with a dual-channel mechanism extracts the features of the images to be fused; the obtained features are then fused by concatenation (concat); finally, the generated fused image and the real image label are input into the discriminator for adversarial learning.
Disclosure of Invention
The invention provides an improved algorithm for deep-learning-based multi-focus image fusion: source images with different focuses are fused into an image with a fully focused area and rich texture, and the fused image can be used for image-processing tasks such as image segmentation and target recognition.
The method specifically comprises the following steps:
Step 1: Convert the training sample images and the test sample images from the RGB space to the YCbCr space, and keep the Y-channel images as the training data set and the test data set.
Step 2: Use the Laplacian operator to obtain the edge information of each image in the training data set, and obtain the fused-image label I_r according to a maximum-selection strategy (Maximum). The concrete formula is as follows:

I_r = Maximum(Laplacian(I_1), Laplacian(I_2))

where I_1 and I_2 are a pair of multi-focus images in the training data set.
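A short sketch of steps 1 and 2 with OpenCV and NumPy follows; the function names and the pixel-wise reading of the maximum-selection strategy (compare the absolute Laplacian responses of the two sources and take each pixel from the sharper one) are assumptions for illustration, not taken verbatim from the patent.

import cv2
import numpy as np

def to_y_channel(rgb_image: np.ndarray) -> np.ndarray:
    """Step 1: convert an RGB image to YCbCr and keep only the luminance (Y) channel."""
    ycrcb = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2YCrCb)  # OpenCV stores the channels as Y, Cr, Cb
    return ycrcb[:, :, 0]

def fused_label(i1: np.ndarray, i2: np.ndarray) -> np.ndarray:
    """Step 2: build the fused-image label I_r by maximum selection on Laplacian activity."""
    act1 = np.abs(cv2.Laplacian(i1.astype(np.float64), cv2.CV_64F))  # focus activity of source 1
    act2 = np.abs(cv2.Laplacian(i2.astype(np.float64), cv2.CV_64F))  # focus activity of source 2
    return np.where(act1 >= act2, i1, i2)  # take each pixel from the sharper source image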
And step 3: constructing a generation countermeasure network model, wherein the generation countermeasure network model comprises a generator network and a discriminator network;
Step 4: Construct the texture-enhancement module (ITEB), and input the multi-focus image pair I_1 and I_2 of the training data set into the texture-enhancement module to extract high-frequency information.
The texture-enhancement module extracts the deep features of the image and obtains its high-frequency texture information.
The specific steps are as follows: (1) construct the texture-enhancement model; (2) obtain the shallow feature information of the input image by convolution; (3) add a channel-attention mechanism that rescales the extracted feature channels, assigning a different weight to each channel; (4) add a ReLU function so that part of the network neurons output 0, which reduces the interdependence of parameters and mitigates overfitting; (5) add a residual connection, taking the extracted shallow information as input and connecting it to the output of the next extraction stage; (6) stack these five steps repeatedly to obtain the deep feature information of the source image.
Step 5: Texture information and edge information are introduced, and a reasonable loss function is designed according to the image structure. The loss function is divided into a generator loss function and a discriminator loss function, where the generator loss comprises a content loss
L_content, an adversarial loss L_adv, and an SSIM loss L_SSIM. The content loss is used to extract and reconstruct information, the adversarial loss is used to enhance texture detail, and the SSIM loss constrains the generator to produce images consistent with the structure of the real image. The generator loss function is the weighted combination of these three terms; the hyperparameters α and β balance the three to the same level and are set to 100 and 0.01, respectively.
The content loss of the generator, L_content, is the mean square error between the pixels of the fused image and those of the input images; it constrains the fused image and the in-focus regions of the source images to have the same intensity distribution and texture details. In its definition, G(z) is the fused image, z represents the pixel distribution of the fused image, X_1 and X_2 are the input source images, i and j index the i-th row and j-th column of the gradient map or source image, and W and H are the width and height of the image, while ∇X_1^ITEB and ∇X_2^ITEB are the gradient images of the source images after enhancement by the texture-enhancement module and have the same size as the fused image.
The adversarial loss of the generator, L_adv, further enhances the texture detail of the fused image. In its definition, N is the number of fused images during training, a is the probability label that the generator expects the discriminator to assign to fused images (set here to 1), and ∇ denotes the operation of producing an image gradient map with the Laplacian operator. This adversarial game yields a fused image with finer texture.
Since the mean square error is very sensitive to large errors, using it alone may produce an overly smooth image and does not consider the overall structure of the image. To keep the structure of the fused image consistent with that of the source images, the SSIM loss L_SSIM is added to the loss function. In it, I_f denotes the fused image and I_1 and I_2 denote the source images; the SSIM function measures the differences in luminance, contrast, and structure between the fused image and a source image, and is computed as

SSIM(x, y) = ((2·μ_x·μ_y + c_1)·(2·σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)·(σ_x² + σ_y² + c_2))

where μ and σ denote the mean and standard deviation of the image pixel matrices, σ_xy their covariance, and c_1 and c_2 are two small constants that prevent the denominator from being zero. The larger the SSIM value, the higher the structural similarity between the two images.
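The patent does not spell out how the SSIM values enter L_SSIM; a common form is 1 - (SSIM(I_f, I_1) + SSIM(I_f, I_2)) / 2, and the sketch below assumes that form and, for brevity, computes SSIM from global image statistics rather than a sliding window. The constants c1 and c2 are illustrative.

import torch

def ssim_global(x: torch.Tensor, y: torch.Tensor, c1: float = 1e-4, c2: float = 9e-4) -> torch.Tensor:
    """SSIM from global statistics (the standard formulation uses a local sliding window)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(unbiased=False), y.var(unbiased=False)
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def ssim_loss(fused: torch.Tensor, src1: torch.Tensor, src2: torch.Tensor) -> torch.Tensor:
    """One plausible L_SSIM: penalize low average structural similarity to both sources."""
    return 1 - 0.5 * (ssim_global(fused, src1) + ssim_global(fused, src2))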
The loss function of the discriminator enables it to distinguish real data from fake data accurately. The inputs of the discriminator are the fused image produced by the generator and the fused-image label obtained according to the maximum-selection rule and reconstruction, denoted I_fake_fused and I_real_fused, respectively. In the discriminator loss, N is the number of fused images during training, AVE is the pixel-averaging function, b is the probability with which the discriminator is expected to recognize real data (b = 1), and c is the probability with which it is expected to recognize fake data (c = 0). Under these constraints, the discriminator keeps improving its ability to tell real data from fake data and thereby drives the generator to produce fused images with strong texture.
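The adversarial and discriminator losses are described above only through their variables (N, a, b, c, AVE); a least-squares reading consistent with a = b = 1 and c = 0 is sketched below in PyTorch. This is one plausible formulation, not necessarily the exact one used by the patent.

import torch
import torch.nn.functional as F

def generator_adv_loss(d_on_fused: torch.Tensor, a: float = 1.0) -> torch.Tensor:
    """Push the discriminator's output on generated fused images toward the label a."""
    return F.mse_loss(d_on_fused, torch.full_like(d_on_fused, a))

def discriminator_loss(d_on_real: torch.Tensor, d_on_fake: torch.Tensor,
                       b: float = 1.0, c: float = 0.0) -> torch.Tensor:
    """Push outputs on the fused-image label toward b and on generator outputs toward c."""
    real_term = F.mse_loss(d_on_real, torch.full_like(d_on_real, b))
    fake_term = F.mse_loss(d_on_fake, torch.full_like(d_on_fake, c))
    return real_term + fake_term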
Step 6: Input the multi-focus image pair I_1 and I_2 of the training data set into the generator network to obtain the feature sets F_1 = (x_1, x_2, …, x_n) and F_2 = (x_1, x_2, …, x_n) of I_1 and I_2, and then obtain the generator's fused image I_f through concat fusion and reconstruction. The concrete formula is as follows:
I_f = reconstruct(concat(F_1, F_2))
Step 7: Input the generator's fused image I_f and the fused-image label I_r into the discriminator of the generative adversarial network for discrimination, and establish an adversarial rule between the generator and the discriminator to optimize the fused image. The specific optimization steps comprise:
7-1: the discriminator judges whether the fused image is a real image;
7-2: if not, the difference between the fused image and the fused-image label is minimized through the loss function, the judgment result is fed back to the generator, the generator adjusts its fusion rule according to the result, and the fusion result is optimized;
7-3: if so, the generator's fused image is the optimal fused image.
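In practice, the optimization of step 7 is run as alternating updates of the discriminator and the generator rather than a literal real/fake branch; the sketch below shows one such training step under that reading. G, D, the optimizers, and the loss helpers (content_loss_fn, adv_loss_fn, d_loss_fn) are placeholders assumed to be defined elsewhere, for example as in the loss sketches above.

import torch

def train_step(G, D, opt_g, opt_d, img1, img2, label_ir,
               content_loss_fn, adv_loss_fn, d_loss_fn):
    """One adversarial optimization step of the generator/discriminator pair (schematic)."""
    # Discriminator update: learn to tell the label I_r from the current fused output.
    with torch.no_grad():
        fused = G(img1, img2)
    opt_d.zero_grad()
    loss_d = d_loss_fn(D(label_ir), D(fused))
    loss_d.backward()
    opt_d.step()

    # Generator update: match content/structure and fool the discriminator.
    opt_g.zero_grad()
    fused = G(img1, img2)
    loss_g = content_loss_fn(fused, img1, img2) + adv_loss_fn(D(fused))
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()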
Step 8: Perform color-space conversion on the fused grayscale image to obtain the final fused image.
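Step 8 can be implemented by attaching the fused Y channel to the chrominance channels of one of the color source images and converting back to RGB; taking Cb/Cr from the first source image is an assumption, as the patent does not specify which chrominance is reused.

import cv2
import numpy as np

def restore_color(fused_y: np.ndarray, rgb_source: np.ndarray) -> np.ndarray:
    """Merge the fused luminance with a source image's chrominance and convert back to RGB."""
    ycrcb = cv2.cvtColor(rgb_source, cv2.COLOR_RGB2YCrCb)
    ycrcb[:, :, 0] = np.clip(fused_y, 0, 255).astype(ycrcb.dtype)  # replace Y with the fused result
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2RGB)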
The invention has the following beneficial effects:
The deep-learning-based multi-focus image fusion method disclosed by the invention uses the texture-enhancement module to extract the high-frequency texture information of the source images and designs a loss function that promotes fast network convergence and adaptively adjusts the fusion rule of the generator, which effectively reduces the loss of important information during feature extraction. In addition, an adversarial rule and the designed loss function are established between the generator and the discriminator to optimize the fusion result.
Drawings
FIG. 1 is a schematic flow chart of the deep-learning-based multi-focus image fusion method according to the present invention;
FIG. 2 is a schematic diagram of the structure of the generator network in the practice of the present invention;
FIG. 3 is a schematic diagram of the structure of the discriminator network in the practice of the present invention;
FIG. 4 is a block diagram of the texture-enhancement module in an embodiment of the present invention.
Detailed Description
The present invention is further explained below with reference to the attached drawings, so that its purpose, technical solution, and key points are described clearly and precisely. Referring to FIG. 1, the overall process of the invention comprises the following steps:
Step 1: Convert the training sample set and the test sample set from the RGB space to the YCbCr space (Y: luminance component, Cb: blue chrominance component, Cr: red chrominance component), and keep the Y-channel images as the training data set Train_set and the test data set Test_set.
Step 2: Use the Laplacian operator to obtain the edge information of the multi-focus image pairs in the training set, and obtain the fused-image label I_r according to the maximum-selection strategy.
Step 3: Input the multi-focus images of the training set into the texture-enhancement module to obtain the high-frequency texture information of the images, and design the loss function used to optimize the generator.
Step 4: Input the training-set multi-focus image pairs into the generator network to obtain the fused image I_f. The detailed structure of the generator network is shown in FIG. 2.
Step 5: Input the fused-image label I_r and the fused image I_f into the discriminator, obtain a probability label, and judge from it whether the input image is a real image. The specific discrimination process is as follows: if the discriminator judges that the input image is not a real image, the difference between the input image and the real image is reduced with the loss function of step 3, the judgment result is fed back to the generator, and the generator adaptively adjusts its fusion rule according to the feedback to optimize the fusion result; if the discriminator judges that the input image is a real image, the fused image is the optimal result. The detailed structure of the discriminator is shown in FIG. 3.
Further, in step 3, the computation of the texture-enhancement module is divided into three parts: image feature extraction, reconstruction, and loss-function design.
3-1. Extract the deep features of the image to obtain its high-frequency texture information.
3-1-1. Acquire the image edge information. First, the Laplacian operator extracts the edge information of the in-focus regions of the image:

image_p = Laplacian(image)

where image is a source image in the training data set and image_p is the edge-information map after the Laplacian transform.
3-1-2. Extract the deep features of the image. A 3×3 convolution kernel first converts image_p from a single channel to 32 channels, and another 3×3 convolution kernel converts the 32 channels to 128 channels to obtain the feature set Origin; this convolution process is called the conv operation.
3-1-3. Channel attention assigns weights. After the channels are extracted, the feature map of each channel is compressed into a single real number; the set of real numbers is written {x_1, x_2, …, x_128}. Each feature channel is then assigned a weight through a learned parameter W, which explicitly models the correlation among the feature channels, and the assigned weights are applied to the original feature channels to obtain the channel-attention set {w_1, w_2, …, w_128}. In combination with deep learning, the importance of the different channels is learned.
3-1-4. Residual skip connection. After each feature channel has been assigned its weight, the feature set Origin is added to the channel-attention set Channel_set to obtain the output of each ITEB layer of the texture-enhancement module:

ITEB = Origin + Channel_set

where ITEB is the result produced by each layer of the texture-enhancement module.
3-1-5. Extract the deeper feature maps of the image. Iterating the ITEB process of each layer of the texture-enhancement block yields the extraction result F:

F = ITEB(ITEB(…ITEB(image_p)))
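A minimal PyTorch sketch of one ITEB layer following 3-1-2 to 3-1-4: a convolution for features, a squeeze-and-excitation-style channel attention over the 128 channels, a ReLU, and the residual addition ITEB = Origin + Channel_set. The attention reduction ratio and the exact attention form are assumptions for illustration.

import torch
import torch.nn as nn

class ITEB(nn.Module):
    """One texture-enhancement layer: conv -> channel attention -> ReLU -> residual add."""
    def __init__(self, channels: int = 128, reduction: int = 8):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Channel attention: squeeze each feature map to one number {x_1..x_128},
        # then learn per-channel weights {w_1..w_128}.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, origin: torch.Tensor) -> torch.Tensor:
        feat = self.conv(origin)
        channel_set = self.relu(feat * self.attention(feat))  # rescaled feature channels
        return origin + channel_set                           # ITEB = Origin + Channel_set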
and 3-2, reconstructing the image to restore the original size of the image. The 128-channel feature map is restored to a single-channel image by using a 1 × 1 convolution kernel, and the specific formula is as follows:
F_ITEB = constructor(F)

where F_ITEB is the output of the texture-enhancement module.
3-3. Design the loss function. The evaluation information of the texture-enhancement module is introduced, and a loss function that promotes fast convergence of the network is designed. In the formula of the loss function, W and H are the width and height of the image, I_1 and I_2 are a pair of multi-focus images in the training data set, I_f is the fused image output by the generator, and ∇I_1^ITEB and ∇I_2^ITEB are the gradient images of the source images after enhancement by the texture-enhancement module, with the same size as the fused image. The detailed structure of the texture-enhancement module is shown in FIG. 4.
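A sketch of the gradient-based loss of section 3-3, under the assumption that it is the mean-squared difference between the Laplacian gradient map of the fused image I_f and the element-wise stronger of the two texture-enhanced gradient maps ∇I_1^ITEB and ∇I_2^ITEB; the exact combination used in the patent's formula is not reproduced here.

import torch
import torch.nn.functional as F

# Fixed 3x3 Laplacian kernel used to produce gradient maps of (N, 1, H, W) image batches.
LAPLACIAN_KERNEL = torch.tensor([[0., 1., 0.],
                                 [1., -4., 1.],
                                 [0., 1., 0.]]).view(1, 1, 3, 3)

def laplacian(x: torch.Tensor) -> torch.Tensor:
    return F.conv2d(x, LAPLACIAN_KERNEL.to(x.device, x.dtype), padding=1)

def texture_loss(fused: torch.Tensor, grad1_iteb: torch.Tensor, grad2_iteb: torch.Tensor) -> torch.Tensor:
    """Mean-squared error between the fused image's gradient map and the stronger
    texture-enhanced source gradient at each pixel (an assumed combination)."""
    target = torch.maximum(grad1_iteb, grad2_iteb)
    return F.mse_loss(laplacian(fused), target)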
Further, in step 4, the generator model consists of two channels, and its computation is divided into two parts: image feature extraction and image restoration.
4-1. Extract the features of the image.
4-1-1. First-layer feature extraction. A 1×1 convolution kernel converts the source image from a single channel to 16 feature channels; the output is denoted layer1.
4-1-2. Second-layer feature extraction. A 3×3 convolution kernel converts the 16 channels of the first layer to 32 feature channels; the output is denoted layer2.
4-1-3. Concatenation layer. The outputs of the first and second layers are concatenated as the input of the third layer:

layer2_3 = concat(layer1, layer2)

where layer2_3 denotes the output of the concatenation layer and has 48 feature channels.
4-1-4. Third-layer feature extraction. A 3×3 convolution kernel converts the concatenated output from 48 channels to 16 channels; the result is denoted layer3.
4-1-5. Concatenation layer. The outputs of the first three layers are concatenated as the input of the fourth layer:

layer3_4 = concat(layer1, layer2, layer3)

where layer3_4 is the output of the concatenation layer and has 64 feature channels.
4-1-6. Fourth-layer feature extraction. A 3×3 convolution kernel converts the concatenated output from 64 channels to 16 channels; the result is denoted layer4.
4-1-7. Concatenation layer. The outputs of the first four layers are concatenated as the input of the fusion layer:

layer4_5 = concat(layer1, layer2, layer3, layer4)

where layer4_5 denotes the output of the concatenation layer and has 128 feature channels.
4-2. Restore the image. A 1×1 convolution kernel converts the image from 128 channels to a single channel, denoted layer5, and the hyperbolic tangent function tanh is applied to the final single-channel image:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

The output of the generator is expressed as:

I_f = tanh(layer5)
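A PyTorch sketch of the generator of sections 4-1 and 4-2: each of the two channels is a densely concatenated stack of the convolutions listed above, and a final 1×1 convolution plus tanh reconstructs the single-channel fused image. The ReLU activations and the way the two branches are joined before reconstruction are assumptions; the patent only fixes the kernel sizes and per-layer channel counts.

import torch
import torch.nn as nn

class GeneratorBranch(nn.Module):
    """One feature-extraction channel: densely concatenated convolution layers."""
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Conv2d(1, 16, kernel_size=1)               # 1 -> 16
        self.layer2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)   # 16 -> 32
        self.layer3 = nn.Conv2d(48, 16, kernel_size=3, padding=1)   # concat(16+32) -> 16
        self.layer4 = nn.Conv2d(64, 16, kernel_size=3, padding=1)   # concat(16+32+16) -> 16
        self.act = nn.ReLU(inplace=True)                            # activation choice assumed

    def forward(self, x):
        l1 = self.act(self.layer1(x))
        l2 = self.act(self.layer2(l1))
        l3 = self.act(self.layer3(torch.cat([l1, l2], dim=1)))
        l4 = self.act(self.layer4(torch.cat([l1, l2, l3], dim=1)))
        return torch.cat([l1, l2, l3, l4], dim=1)                   # dense feature stack

class Generator(nn.Module):
    """Two-channel generator: extract features from each source, concat, reconstruct."""
    def __init__(self):
        super().__init__()
        self.branch1 = GeneratorBranch()
        self.branch2 = GeneratorBranch()
        stacked = 2 * (16 + 32 + 16 + 16)                           # per-branch width from the layers above
        self.reconstruct = nn.Conv2d(stacked, 1, kernel_size=1)     # back to a single channel
        self.tanh = nn.Tanh()

    def forward(self, i1, i2):
        fused_features = torch.cat([self.branch1(i1), self.branch2(i2)], dim=1)
        return self.tanh(self.reconstruct(fused_features))          # I_f = tanh(layer5)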
further, in step 5, the discriminator is composed of four convolutional layers and one linear layer, and the specific steps are as follows:
and 5-1, extracting image characteristics for identification.
5-1-1. First, the input image is converted from a single channel to 16 channels using a 3 x 3 convolution kernel;
5-1-2. Converting the input features from 16 channels to 32 channels using a convolution kernel of 3 x 3;
5-1-3. Converting the input features from 32 channels to 64 channels using a convolution kernel of 3 x 3;
5-1-4. Converting the input features from 64 channels to 128 channels using a convolution kernel of 3 x 3;
5-1-5, converting the input feature from 128 channels to 256 channels by using a convolution kernel of 3 x 3, and acquiring the height H of the output feature;
and 5-2, acquiring a probability label of the input image by using a linear layer.
5-2-1, firstly, the dimension of the output characteristic channel is readjusted, the channel number is 32, the size of each channel is H multiplied by 256, and the output can be expressed as reshape _ var;
5-2-2. Set the normalization matrix, receiver, which is expressed as:
recover=[H×H×256,1]
5-2-3, multiplying the readjusted characteristic channel by a normalization matrix to obtain a probability distribution label:
Probability=reshape_var×recover
where Proavailability represents the output of the discriminator.
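A PyTorch sketch of the discriminator of sections 5-1 and 5-2: five 3×3 convolutions widening the channels from 1 to 256, followed by a linear head that maps the features to a single probability score. The stride-2 downsampling, the LeakyReLU activations, the sigmoid, and the flatten (used here in place of the reshape to 32 channels of size H × 256) are assumptions.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Five conv layers (1->16->32->64->128->256 channels) plus a linear probability head."""
    def __init__(self, image_size: int = 64):
        super().__init__()
        chans = [1, 16, 32, 64, 128, 256]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            # stride-2 downsampling is an assumption; the patent only fixes 3x3 kernels
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
        self.features = nn.Sequential(*layers)
        h = image_size // 2 ** 5                       # spatial size after five stride-2 convs
        self.linear = nn.Linear(256 * h * h, 1)        # maps the features to one probability label

    def forward(self, x):
        feat = self.features(x)
        return torch.sigmoid(self.linear(feat.flatten(start_dim=1)))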
Analysis of Experimental results
To show the advantages of the invention more intuitively, the algorithm used in the invention is compared with five state-of-the-art fusion algorithms: BF, GRW, Quadtree, CNN, and MFF-GAN. BF, GRW, and Quadtree are traditional fusion algorithms, while CNN and MFF-GAN are deep-learning-based methods.
To evaluate the image fusion algorithm used in the invention objectively, six popular statistical metrics are selected as objective indexes of the fusion result: the average gradient (Q_AG), information entropy (Q_EN), spatial frequency (Q_SF), gradient-based fusion performance (Q_AB|F), visual information fidelity (Q_VIF), and the sum of the correlations of differences (Q_SCD). The results of the objective comparison are shown in the following table:
Comparison of the present invention with other algorithms (table of Q_AG, Q_EN, Q_SF, Q_AB|F, Q_VIF, and Q_SCD scores for BF, GRW, Quadtree, CNN, MFF-GAN, and the proposed method)
The indexes in the table are defined as follows:
Q_AG is the average value of the image gradient and measures the sharpness of the fused image: the larger Q_AG is, the sharper the image. In its definition, M and N are the height and width of the image, I_f is the fused image, and i and j index the i-th row and j-th column of the image.
Q_EN measures the amount of information in the image: the larger Q_EN is, the more information the fused image contains. Q_EN is defined as:

Q_EN = -Σ_i p_i · log2(p_i)

where p_i is the normalized probability of gray value i.
Q_SF measures the texture of the fused image: the larger Q_SF is, the richer the edges and texture of the image. Q_SF is defined as:

Q_SF = sqrt(RF² + CF²)

where RF and CF are the row and column frequencies, respectively:

RF = sqrt( (1 / (M·N)) · Σ_i Σ_j (I_f(i, j) - I_f(i, j-1))² )
CF = sqrt( (1 / (M·N)) · Σ_i Σ_j (I_f(i, j) - I_f(i-1, j))² )

Q_AB|F measures the degree to which edge information from the source images is retained in the fused image: the larger Q_AB|F is, the more edge information is retained. Its definition combines the edge-strength and edge-orientation preservation values at each image location (i, j), weighted by w_A and w_B, the weights of the two source images with respect to the fused image.
Q_VIF measures the information fidelity of the fused image in a way that resembles the human visual system. First, the source images and the fused image are filtered and divided into blocks; the visual information and the distortion of each block are then evaluated, the visual fidelity of each block is computed, and finally the overall visual fidelity is calculated.
Q_SCD measures the degree of correlation between the information of the source images and that of the fused image and can assess the spurious information contained in the fused image: the larger Q_SCD is, the better the fusion performance and the less spurious information. Q_SCD is defined as:

Q_SCD = r(I_1, I_f) + r(I_2, I_f)

where I_1 and I_2 are the source images, I_f is the fused image, and r(·) is the correlation-coefficient function:

r(A, B) = Σ_i Σ_j (A(i, j) - mean(A))·(B(i, j) - mean(B)) / sqrt( Σ_i Σ_j (A(i, j) - mean(A))² · Σ_i Σ_j (B(i, j) - mean(B))² )
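The first three metrics can be computed directly from the definitions above; a NumPy sketch follows (Q_AB|F, Q_VIF, and Q_SCD need the source images and more involved models, so they are omitted). The 8-bit histogram range and the averaging conventions are common choices and may differ slightly from those used in the patent's evaluation.

import numpy as np

def q_ag(img: np.ndarray) -> float:
    """Average gradient: mean magnitude of horizontal/vertical differences (sharpness)."""
    gx = np.diff(img.astype(np.float64), axis=1)[:-1, :]
    gy = np.diff(img.astype(np.float64), axis=0)[:, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2)))

def q_en(img: np.ndarray) -> float:
    """Information entropy of the gray-level histogram (assumes an 8-bit image)."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def q_sf(img: np.ndarray) -> float:
    """Spatial frequency: sqrt(RF^2 + CF^2) from row and column differences."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))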

Claims (6)

1. A multi-focus image fusion method based on deep learning, characterized by comprising the following steps:
Step 1: preprocess the training sample images and the test sample images and convert their color space to obtain a training data set and a test data set;
Step 2: use the Laplacian algorithm to obtain the edge-information image image_p of each multi-focus image pair in the training set, and obtain the fused-image label I_r according to the maximum-selection strategy;
Step 3: construct a deep convolutional neural network model and a texture-enhancement module; extract the deep features of the image through the texture-enhancement module and obtain the high-frequency texture information of the image;
the specific steps are: (1) construct the texture-enhancement model; (2) obtain the shallow feature information of the input image by convolution; (3) add a channel-attention mechanism that rescales the extracted feature channels, assigning a different weight to each channel; (4) add a ReLU function so that part of the network neurons output 0, which reduces the interdependence of parameters and mitigates overfitting; (5) add a residual connection, taking the extracted shallow information as input and connecting it to the output of the next extraction stage; (6) stack these five steps repeatedly to obtain the deep feature information of the source image;
Step 4: reconstruct the image to restore its original size; obtain the high-frequency information set of the test sample with the texture-enhancement module of step 2, and design a corresponding loss function from the high-frequency information set; in the formula of the loss function, W and H are the width and height of the image, I_1 and I_2 are a pair of multi-focus images in the training data set, I_f is the fused image output by the generator, and ∇I_1^ITEB and ∇I_2^ITEB are the gradient images of the source images after enhancement by the texture-enhancement module, with the same size as the fused image;
Step 5: train the deep convolutional neural network model of step 2 with the training data set to obtain a trained deep convolutional neural network model;
Step 6: obtain the deep feature set of the training data set with the trained deep convolutional neural network model, and perform concat fusion on the deep feature set to obtain a fused grayscale map;
Step 7: perform color-space conversion on the fused grayscale map to obtain the final fused image.
2. The method for multi-focus image fusion based on deep learning of claim 1, wherein: in step 1, the training sample set and the test sample set are converted from RGB color space to YCbCr color space, wherein Y-channel images serve as the training data set and the test data set.
3. The method for multi-focus image fusion based on deep learning of claim 1, wherein: the deep convolutional neural network adopts a generative adversarial network architecture comprising a generator and a discriminator; the generator extracts features through two channels, takes the multi-focus image pair I_1 and I_2 of the training data set as input, and outputs the fused image I_f; the input of the discriminator is the fused image I_f and the fused-image label I_r, and the output is a probability label; an adversarial rule is established between the generator and the discriminator to optimize the fused image.
4. The method for multi-focus image fusion based on deep learning of claim 1, wherein: the image is reconstructed to restore it to its original size, specifically: a 1×1 convolution kernel restores the 128-channel feature map to a single-channel image, with the formula

F_ITEB = constructor(F)

where F_ITEB is the output of the texture-enhancement module.
5. The method of claim 3, wherein: the fused image I_f and the fused-image label I_r are input into the discriminator for discrimination, an adversarial rule is established between the discriminator and the generator, and the fused image is optimized; the steps comprise: (1) if the discriminator judges the fused image I_f to be a fake image, the difference between the fused image I_f and the fused-image label I_r is minimized through the loss function of step 4, the judgment result is fed back to the generator, the fusion rule of the generator is adaptively adjusted, and the fused image is optimized; (2) if the discriminator judges the fused image I_f to be a real image, the fused image I_f is the optimal fused image.
6. The method for multi-focus image fusion based on deep learning of claim 1, wherein in step 3 the deep features of the image are extracted through the texture-enhancement module to obtain its high-frequency texture information, specifically:
3-1. extract the deep features of the image: a 3×3 convolution kernel first converts image_p from a single channel to 32 channels, and another 3×3 convolution kernel converts the 32 channels to 128 channels to obtain the feature set Origin; this convolution process is called the conv operation;
3-2. attention assigns weights: after the channels are extracted, the feature map of each channel is compressed into a single real number, and the set of real numbers is written {x_1, x_2, …, x_128}; each feature channel is then assigned a weight through a learned parameter W, which explicitly models the correlation among the feature channels, and the assigned weights are applied to the original feature channels to obtain the channel-attention set {w_1, w_2, …, w_128}; in combination with deep learning, the importance of the different channels is learned;
3-3. residual skip connection: after each feature channel has been assigned its weight, the feature set Origin is added to the channel-attention set Channel_set to obtain the output of each ITEB layer of the texture-enhancement module:

ITEB = Origin + Channel_set

where ITEB is the result produced by each layer of the texture-enhancement module;
3-4. extract the deeper feature maps of the image: iterating the ITEB process of each layer of the texture-enhancement block yields the extraction result F, expressed as:

F = ITEB(ITEB(…ITEB(image_p))).
CN202211110378.8A 2022-09-13 2022-09-13 Multi-focus image fusion method based on deep learning Pending CN115358961A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211110378.8A CN115358961A (en) 2022-09-13 2022-09-13 Multi-focus image fusion method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211110378.8A CN115358961A (en) 2022-09-13 2022-09-13 Multi-focus image fusion method based on deep learning

Publications (1)

Publication Number Publication Date
CN115358961A true CN115358961A (en) 2022-11-18

Family

ID=84007069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211110378.8A Pending CN115358961A (en) 2022-09-13 2022-09-13 Multi-focus image fusion method based on deep learning

Country Status (1)

Country Link
CN (1) CN115358961A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116109539A (en) * 2023-03-21 2023-05-12 智洋创新科技股份有限公司 Infrared image texture information enhancement method and system based on generation of countermeasure network


Similar Documents

Publication Publication Date Title
CN107977932B (en) Face image super-resolution reconstruction method based on discriminable attribute constraint generation countermeasure network
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN114998210B (en) Retinopathy of prematurity detecting system based on deep learning target detection
CN110363068B (en) High-resolution pedestrian image generation method based on multiscale circulation generation type countermeasure network
CN116958825B (en) Mobile remote sensing image acquisition method and highway maintenance monitoring method
CN114283158A (en) Retinal blood vessel image segmentation method and device and computer equipment
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN110070574A (en) A kind of binocular vision Stereo Matching Algorithm based on improvement PSMNet
CN113610732B (en) Full-focus image generation method based on interactive countermeasure learning
CN114511502A (en) Gastrointestinal endoscope image polyp detection system based on artificial intelligence, terminal and storage medium
CN113344933A (en) Glandular cell segmentation method based on multi-level feature fusion network
CN115358961A (en) Multi-focus image fusion method based on deep learning
CN112926667B (en) Method and device for detecting saliency target of depth fusion edge and high-level feature
CN117095471B (en) Face counterfeiting tracing method based on multi-scale characteristics
Zheng et al. Overwater image dehazing via cycle-consistent generative adversarial network
CN113538363A (en) Lung medical image segmentation method and device based on improved U-Net
CN116091793A (en) Light field significance detection method based on optical flow fusion
CN117315735A (en) Face super-resolution reconstruction method based on priori information and attention mechanism
CN115424337A (en) Iris image restoration system based on priori guidance
CN114067187A (en) Infrared polarization visible light face translation method based on countermeasure generation network
CN113901916A (en) Visual optical flow feature-based facial fraud action identification method
CN113553895A (en) Multi-pose face recognition method based on face orthogonalization
CN112634239A (en) Cerebral hemorrhage detecting system based on deep learning
CN111882495A (en) Image highlight processing method based on user-defined fuzzy logic and GAN
Joshi et al. Enhancing Two dimensional magnetic resonance image using generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination