CN110322402B - Medical image super-resolution reconstruction method based on dense mixed attention network - Google Patents


Info

Publication number: CN110322402B (application CN201910361678.5A; earlier publication CN110322402A)
Authority: CN (China)
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Prior art keywords: resolution, super, network, image, medical image
Inventors: 刘可文, 马圆, 熊红霞, 刘朝阳, 房攀攀, 李小军, 陈亚雷
Current and original assignee (the listed assignee may be inaccurate): Wuhan University of Technology (WUT)
Application filed by Wuhan University of Technology (WUT); priority to CN201910361678.5A
Publication of CN110322402A; application granted and published as CN110322402B


Classifications

    • G06T 3/4023 — Scaling of whole images or parts thereof based on decimating pixels or lines of pixels, or on inserting pixels or lines of pixels (G — Physics; G06 — Computing; G06T — Image data processing or generation)
    • G06T 3/4053 — Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4076 — Super-resolution using the original low-resolution images to iteratively correct the high-resolution images
    • G06T 7/0012 — Image analysis; inspection of images; biomedical image inspection
    • G06T 2207/10081 — Image acquisition modality: computed x-ray tomography [CT]
    • G06T 2207/20081 — Special algorithmic details: training; learning
    • G06T 2207/20084 — Special algorithmic details: artificial neural networks [ANN]
    • G06T 2207/30061 — Subject of image: lung
    • G06T 2207/30096 — Subject of image: tumor; lesion
    • G06T 2210/22 — Image generation indexing: cropping
    • Y02T 10/40 — Engine management systems (Y02T — climate change mitigation technologies related to transportation)


Abstract

The invention discloses a medical image super-resolution reconstruction method based on a dense mixed attention network. A mixed attention mechanism is introduced on top of a dense neural network so that the network focuses on the channels and regions containing abundant high-frequency information; this accelerates network convergence and further improves super-resolution accuracy. The method mainly comprises the following steps: designing and building a network based on a dense neural network and a mixed attention mechanism; preprocessing the data set, applying data enhancement, and constructing training samples; training the network model with the L2 loss until it converges; and, in the super-resolution reconstruction stage, inputting a low-resolution medical image and reconstructing the final high-resolution image with the trained network model. Compared with mainstream super-resolution methods, the disclosed method achieves higher accuracy and is an effective medical image super-resolution reconstruction method.

Description

Medical image super-resolution reconstruction method based on dense mixed attention network
Technical Field
The invention relates to a medical image super-resolution reconstruction technology, in particular to a medical image super-resolution reconstruction method based on a dense mixed attention network, and belongs to the field of digital image processing.
Background
Medical images are widely used in clinical diagnosis and treatment, but during acquisition, hardware limitations and environmental influences leave them with low resolution and with blurring caused by missing high-frequency information. Improving this on the hardware side is constrained by manufacturing processes and cost; on the software side, applying image super-resolution methods to low-resolution medical images can efficiently produce the corresponding high-resolution images.
Current image super-resolution methods fall into three main classes: interpolation-based, modeling-based, and learning-based; learning-based methods can further be divided into sparse-representation methods and convolutional-neural-network methods. Interpolation-based methods are computationally efficient but easily lose high-frequency texture detail. Modeling-based methods use prior information to constrain the solution space and thus improve on interpolation to some extent, but when the input image is small, little prior information can be effectively exploited and the super-resolution effect is poor. Learning-based methods achieve super-resolution by learning the internal relation between low-resolution and high-resolution images, and in recent years convolutional-neural-network methods have reached higher accuracy. However, a convolution kernel treats every channel and region of the feature map equally, which weakens the expression of the channels and regions that carry rich high-frequency information. In addition, a conventional convolutional neural network loses memory information during forward propagation; introducing the idea of a dense neural network, with a large number of skip connections that reuse features, can further improve network performance. In summary, the performance of learning-based image super-resolution methods still has room for improvement.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: during medical image acquisition, hardware limitations and environmental influences cause the medical image to have low resolution and to be blurred by the lack of high-frequency information.
The invention solves the technical problems by adopting the following technical scheme:
the invention provides a medical image super-resolution reconstruction method based on a dense mixed attention network: a mixed attention mechanism is introduced on the basis of a dense neural network, and the added mixed attention unit makes the network focus on channels and regions containing rich high-frequency information, which improves the network's feature expression capability, accelerates convergence, and further improves super-resolution accuracy.
The medical image super-resolution reconstruction method comprises the following steps:
step one: designing and building a network based on a dense neural network and a mixed attention mechanism;
step two: preprocessing an input image, enhancing data, and constructing a training sample;
step three: training the network model using the L2 loss until the network model converges;
step four: and in the super-resolution reconstruction stage, inputting a medical image with low resolution, and performing super-resolution reconstruction to obtain a final high-resolution image by using the trained network model.
The mixed attention mechanism gives the network the capability to enhance its feature representation and to focus on channels and regions rich in high-frequency information; each mixed attention unit contains 2 cascaded convolution layers and activation layers.
The dense neural network comprises N (N ≥ 8) basic units; each basic unit contains n (n ≥ 8) cascaded convolution layers and activation layers, with a mixed attention unit cascaded at its end. A large number of dense skip connections are added between the basic units to extract deeper feature representations.
In the above method, the overall network may be divided into five stages: feature extraction, feature nonlinear mapping, feature dimension reduction, deconvolution up-sampling, and a final convolution that produces the output. The feature extraction stage uses cascaded convolution and activation layers; the nonlinear mapping stage uses the dense neural network described above; the dimension-reduction stage uses a bottleneck layer; up-sampling is performed by deconvolution; and a final convolution produces the output.
In the above method, the preprocessing and data enhancement of step two crop the input image, downsample the cropped sub-images to obtain the corresponding low-resolution images, and use data enhancement to obtain more training samples.
In step three of the method, a loss L2 based on the L2 norm quantifies the similarity between the high-resolution image obtained by super-resolution and the true high-resolution image; the training process uses mini-batch learning. The loss function used is:

L_2 = \frac{1}{nHWC}\sum_{v=1}^{n}\sum_{k=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(I^{HR}_{v,i,j,k}-I^{SR}_{v,i,j,k}\right)^2

wherein: I^{HR} is the true high-resolution image; I^{SR} is the high-resolution image obtained by super-resolution; H, W, C are the size (height, width) and channel number of the input image; n is the mini-batch size; v indexes the v-th feature map in the mini-batch; k is the k-th channel of the v-th feature map; (i, j) is a coordinate position in the feature map; and I_{v,i,j,k} is the pixel value at position (i, j) of the k-th channel of the v-th feature map.
In step four of the method, the medical image obtained by super-resolution reconstruction is equivalent to the low-resolution image magnified by the network's upscaling factor.
The method and technical process provided by the invention also cover software systems implemented by extending the method.
The method can further improve super-resolution accuracy by using more basic units, or more convolution and activation layers within the basic units. Improving super-resolution accuracy in this way, on the basis of the network proposed here, does not conflict with the invention.
Experimental results (see the specific experimental data in the detailed description) show that, compared with mainstream image super-resolution methods, the proposed method improves Peak Signal-to-Noise Ratio (PSNR) by 0.146 dB to 5.874 dB and Structural Similarity (SSIM) by 0.1% to 7.66%.
Compared with the prior art, the invention has the following main advantages:

a mixed attention mechanism is introduced and a new mixed attention unit is proposed; adding this unit to the network structure accelerates network convergence, enhances feature representation capability, and further improves network performance, and the structural design of the mixed attention unit is original;

on the basis of a dense neural network, a large number of dense skip connections are added, fully exploiting features of different stages and different scales and passing gradient information directly to the front layers of the network; this alleviates the memory-information loss, gradient vanishing, and network degradation from which conventional neural networks suffer during forward propagation, further improving network performance;

the invention builds a medical image super-resolution method on the mixed attention mechanism and the dense neural network; compared with mainstream methods it improves network performance, and the fusion of the mixed attention mechanism with the dense neural network is original.
Drawings
Fig. 1 is a block diagram of a hybrid attention mechanism unit proposed by the present invention.
Fig. 2 is a structure diagram of the basic unit of the dense mixed attention network proposed by the present invention.
Fig. 3 is a network structure diagram of the medical image super-resolution reconstruction method based on the dense mixed attention network of the present invention.
Fig. 4 compares the 2× super-resolution reconstruction results of the invention and three other methods on medical images.
Detailed Description
The invention discloses a medical image super-resolution reconstruction method based on a dense mixed attention network, which introduces a mixed attention mechanism on the basis of a dense neural network so that the network focuses on channels and regions containing rich high-frequency information, accelerating network convergence and further improving super-resolution accuracy. The method mainly comprises the following steps: designing and building a network based on a dense neural network and a mixed attention mechanism; preprocessing the data set, applying data enhancement, and constructing training samples; training the network model with the L2 loss until it converges; and, in the super-resolution reconstruction stage, inputting a low-resolution medical image and reconstructing the final high-resolution image with the trained network model. Compared with mainstream super-resolution methods, the disclosed method achieves higher accuracy and is an effective medical image super-resolution reconstruction method.
The invention will be further described with reference to examples and figures, but the invention is not limited thereto.
The medical image super-resolution reconstruction method based on the dense mixed attention network is characterized in that a mixed attention mechanism is introduced on the basis of the dense neural network, and the added mixed attention mechanism unit enables the neural network to pay more attention to channels and areas containing rich high-frequency information, so that the characteristic expression capability of the network is improved, the network convergence is accelerated, and the super-resolution precision is further improved.
The medical image super-resolution reconstruction method of the dense mixed attention network comprises the following steps of:
step one: designing and building a network based on a dense neural network and a mixed attention mechanism;
step two: preprocessing an input image, enhancing data, and constructing a training sample;
step three: training the network model using the L2 loss until the network model converges;
step four: and in the super-resolution reconstruction stage, inputting a medical image with low resolution, and performing super-resolution reconstruction to obtain a final high-resolution image by using the trained network model.
As shown in fig. 1, the mixed attention unit of step one has 2 cascaded convolution layers and activation layers. An image of dimension H×W×C is input to the attention unit, where H and W are the height and width of the image and C is the channel number; two cascaded convolutions and activations yield a descriptor of dimension H×W×C:

τ = f(W_2 δ(W_1 x)), τ ∈ R^{H×W×C}

wherein: x is the input; W_1 holds the parameters of the first convolution, which reduces the channel number of the feature map by a factor of 16, giving a feature map of dimension H×W×C/16; δ(·) is the ReLU activation; W_2 holds the parameters of the second convolution, which increases the channel number back by a factor of 16; and f(·) is the sigmoid activation. The two convolutions and activations thus perform channel reduction and channel expansion along the channel dimension and learn C description matrices τ_i corresponding to the different channels, where i = 0, 1, 2, …, C. Adaptively assigning a sparser description matrix to channels containing a large amount of redundant low-frequency information makes the neural network focus on channels containing rich high-frequency information. Each description matrix τ_i has dimension H×W, with each element corresponding to the i-th channel of the input image. After the two convolutions and activations, regions of the original input rich in high-frequency information are preserved and regions containing large amounts of redundant low-frequency information are suppressed; taking the Hadamard product of the resulting descriptor τ_i with the i-th channel of the original input makes the network focus on the high-frequency regions within that channel. The Hadamard product of the descriptors with the input image yields the feature map output by the mixed attention unit.
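As an illustration, the unit above can be sketched with 1×1 convolutions, which act per pixel as matrix products along the channel axis. The reduction factor of 16 and the ReLU/sigmoid pairing follow the description; the function name and weight layout are our own minimal assumptions, not the patent's implementation:

```python
import numpy as np

def hybrid_attention(x, w1, w2):
    """Sketch of the mixed attention unit.

    x  : input feature map, shape (H, W, C)
    w1 : first 1x1 convolution, shape (C, C // 16)  -- channel reduction
    w2 : second 1x1 convolution, shape (C // 16, C) -- channel restoration
    """
    relu = lambda t: np.maximum(t, 0.0)
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    tau = sigmoid(relu(x @ w1) @ w2)   # descriptor, shape (H, W, C)
    return x * tau                     # Hadamard product with the input
```

With all-zero weights the sigmoid outputs 0.5 everywhere, so the unit simply halves the input; trained weights instead emphasize the channels and regions rich in high-frequency information.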
The convolution layer is a basic unit of a convolutional neural network used to extract different features of the input image. The first convolution layer can only extract low-level features such as edges, lines, and corners; deeper layers iteratively extract more complex features from these low-level features.
The activation layer is a basic unit of a convolutional neural network used to strengthen the decision function and the nonlinear characteristics of the whole network, while the convolution layers themselves are unchanged. Common activation functions include the Sigmoid function and the ReLU linear rectification function.
In fig. 3, the network based on a dense neural network and a mixed attention mechanism, designed and built in step one, is divided into five stages: feature extraction, feature nonlinear mapping, feature dimension reduction, deconvolution up-sampling, and a final convolution that produces the output:
stage one: the feature extraction stage is to extract data from the input image, construct non-redundant feature information for facilitating subsequent learning and generalization, and the extracted features can be better than human hand-extracted features in some cases. As shown in fig. 3, the present invention uses a concatenated convolutional layer and RELU active layer to accomplish feature extraction after low resolution image input, the convolutional layer parameters are set to 128 x 3, i.e. 128 convolution kernels of size 3 channels number 3.
Stage two: the feature nonlinear mapping stage uses operators that do not satisfy the linearity condition to map between vector spaces (including abstract vector spaces made up of functions). As shown in fig. 3, the invention uses a dense neural network in the middle of the network to perform the nonlinear mapping. The dense network contains n basic units; each basic unit has n cascaded convolution and activation layers, with a mixed attention block cascaded at its end. The convolution layer parameters within the basic units are 16×3×3, i.e. 16 convolution kernels of size 3×3; the activation functions are ReLU; the convolution stride is 1, with zero padding at the edges so the feature-map size stays constant. Because many dense skip connections aggregate feature maps of different stages and scales, the input channel number of the convolution kernels grows linearly with network depth: the kernels of the first convolution layer in each basic unit have 16 channels, and each subsequent layer's kernel channel count increases by 16 over the previous layer.
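The linear channel growth described above can be made concrete with a small helper (the growth rate of 16 follows the description; the function name is our own):

```python
def dense_in_channels(num_layers, growth=16):
    """Input-channel count for each convolution in a basic unit: the first
    layer's kernels have `growth` channels, and each later layer adds
    `growth` more, because dense skip connections concatenate all earlier
    feature maps before the convolution."""
    return [growth * (i + 1) for i in range(num_layers)]
```

For a basic unit with 8 layers this gives 16, 32, …, 128 input channels.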
Stage three: the feature dimension-reduction stage finds the main feature information within feature information containing many redundant or irrelevant features, in order to reduce computational complexity. As shown in fig. 3, the invention uses a bottleneck layer for dimension reduction: the bottleneck layer is a single convolution layer that maps a high-dimensional space to a low-dimensional one. Its kernel parameters are set to 256×1152×1×1, i.e. 256 convolution kernels of size 1×1 with 1152 channels, and the convolution stride is 1.
Stage four: the deconvolution up-sampling kernel parameters are set to 256×2×2, the stride of the convolution operation is 2, and zero padding is used at the edges.
Stage five: a final convolution produces the output; the kernel parameters are set to 3×256×3×3, i.e. 3 convolution kernels of size 3×3 with 256 channels, the stride of the convolution operation is 1, and zero padding is used at the edges.
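Putting the five stages together, the feature-map shapes for a 2× network can be traced as below. The channel counts are read off the kernel specifications above (128 after extraction, 1152 aggregated dense channels into the bottleneck, 256 out of it); the function itself is only an illustrative shape check, not part of the patent:

```python
def forward_shapes(h, w, scale=2):
    """Trace feature-map shapes through the five stages for an h x w
    low-resolution RGB input; stage two preserves spatial size via
    zero padding."""
    return [
        ("input",              (h, w, 3)),
        ("feature extraction", (h, w, 128)),    # 128 kernels, 3x3
        ("dense mapping",      (h, w, 1152)),   # aggregated dense features
        ("bottleneck 1x1",     (h, w, 256)),    # 256 kernels, 1x1
        ("deconv upsample",    (scale * h, scale * w, 256)),
        ("output conv",        (scale * h, scale * w, 3)),
    ]
```

For a 48×48 input this ends at a 96×96×3 high-resolution output, matching the training-pair sizes used below.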
The preprocessing and data enhancement of step two are implemented as follows: the input image is cropped into sub-images of size 96×96; the cropped sub-images are downsampled with bicubic interpolation using Matlab's imresize function, giving corresponding low-resolution images of size 48×48; and more training samples are obtained through data enhancement such as rotation and mirroring.
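The cropping, downsampling, and augmentation steps can be sketched as follows. The patent uses Matlab's bicubic imresize; to stay dependency-free, a 2× average pool stands in for the interpolation here, so the sizes (96×96 patches, 48×48 low-resolution counterparts) match the description but the interpolation kernel does not:

```python
import numpy as np

def make_training_pairs(image, patch=96, scale=2):
    """Crop non-overlapping patch x patch sub-images, build a low-resolution
    copy of each, and augment with the 4 rotations and their mirrors
    (8 variants per sub-image)."""
    pairs = []
    h, w = image.shape[:2]
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            hr = image[y:y + patch, x:x + patch]
            for k in range(4):  # 0/90/180/270-degree rotations
                for aug in (np.rot90(hr, k), np.fliplr(np.rot90(hr, k))):
                    # stand-in for bicubic downsampling: 2x average pool
                    lr = aug.reshape(patch // scale, scale,
                                     patch // scale, scale).mean(axis=(1, 3))
                    pairs.append((lr, aug))
    return pairs
```

A 192×192 image yields 4 sub-images and, after augmentation, 32 (low-resolution, high-resolution) training pairs.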
Training the network model with the L2 loss described in step three until it converges is implemented as follows: a loss L2 based on the L2 norm quantifies the similarity between the high-resolution image obtained by super-resolution and the true high-resolution image, and the training process uses mini-batch learning with the mini-batch size set to 16. The loss function used is:

L_2 = \frac{1}{nHWC}\sum_{v=1}^{n}\sum_{k=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(I^{HR}_{v,i,j,k}-I^{SR}_{v,i,j,k}\right)^2

wherein: I^{HR} is the true high-resolution image; I^{SR} is the high-resolution image obtained by super-resolution; H, W, C are the size (height, width) and channel number of the input image; n is the mini-batch size; v indexes the v-th image in the mini-batch; k is the k-th channel of the v-th feature map; (i, j) is a coordinate position in the feature map; and I_{v,i,j,k} is the pixel value at position (i, j) of the k-th channel of the v-th image.
Step four is implemented by inputting the low-resolution medical image into the network to obtain the output high-resolution image; the medical image obtained by super-resolution reconstruction is equivalent to the low-resolution image magnified by the network's upscaling factor (2× in the experiments).
Experiments show that the dense mixed attention network can fully reuse features of different stages and different scales, further improving network performance; the mixed attention mechanism makes the neural network focus on channels and regions containing abundant high-frequency information while suppressing those containing large amounts of redundant information, accelerating network convergence and further improving network performance.
To demonstrate the effectiveness of the invention, 400 sharp, richly detailed CT images of size 512×512 are selected from the public data sets of the national lung cancer center as the training set, and 100 images as the test set. In step two, the input images are cropped into 96×96 sub-images, which are downsampled with bicubic interpolation using Matlab's imresize function to obtain corresponding 48×48 low-resolution images; more training samples are obtained through data enhancement such as rotation and mirroring.
In the experiments, bicubic interpolation and two representative convolutional-neural-network-based methods are selected for comparison. To ensure a fair comparison, every method is tested in the same hardware environment.
Two representative convolutional neural network-based methods are selected:
method 1: the method proposed by Kim et al, reference is: kim J, kwon Lee J, mu Lee K.Accidet image super-resolution using very deep convolutional networks [ C ]// Proceedings of the IEEE conference on computer vision and pattern recognment.2016:1646-1654.
Method 2: the method proposed by Tong et al.; reference: Tong Tong, Gao Qinquan. An image super-resolution method based on dense connection networks[P]. Fujian: CN106991646A, 2017-07-28.
Setting experimental hardware environment parameters:
TABLE 1
Selection of evaluation indexes:
the objective indexes widely used for evaluating the super-resolution effect of the image include Peak Signal-to-Noise Ratio (PSNR) and structural similarity (Structural similarity, SSIM), and the PSNR and the SSIM are used as the indexes for objective evaluation. In addition, the time required for completing the super-resolution of the single image is also used as one of the objective evaluation indexes of the reference.
Objective evaluation comparison of the super-resolution methods:
TABLE 2
From the experimental data in Table 2, the peak signal-to-noise ratio (PSNR) and Structural Similarity (SSIM) of the present invention improve by 5.874 dB and 7.66% over bicubic interpolation, by 0.259 dB and 0.37% over method 1, and by 0.146 dB and 0.1% over method 2, respectively. In single-frame super-resolution time, bicubic interpolation is fastest; methods 1 and 2 and the proposed method take longer than bicubic, but all complete within 0.5 s, giving good real-time performance.
Fig. 4 shows the effect of each super-resolution method on four groups of images with relatively rich texture details: CT images of the aorta, the lung apex, the lung, and the lung lobes. The high-resolution images obtained by each method are compared with the real high-resolution image (GT), with the corresponding objective evaluation index values marked below each image. Compared with the contrast methods, the super-resolved image of the invention has better sharpness, the best visual effect, clear detail, and uniform brightness, and is closest to the real image. Combining the objective and subjective evaluation results, the invention provides an effective medical image super-resolution reconstruction method.
Medical images to which the present invention is applicable include, but are not limited to, CT images, magnetic resonance imaging (MRI) images, X-ray images, and positron emission tomography (PET) images.
The invention uses the advantages of the dense neural network to fully reuse features of different stages and different scales, further improving network performance, reducing the parameter count, and alleviating gradient vanishing and network degradation. It uses the advantages of the mixed attention mechanism and proposes a new mixed attention unit; adding this unit to the network structure makes the network focus on channels and regions containing rich high-frequency information, accelerates network convergence, and further improves network performance. Combining the dense neural network with the mixed attention mechanism yields an effective medical image super-resolution reconstruction method.
What is not described in detail in this specification is prior art known to those skilled in the art.

Claims (8)

1. A medical image super-resolution reconstruction method, characterized in that a mixed attention mechanism is introduced on the basis of a dense neural network; the added mixed attention mechanism unit makes the neural network pay more attention to channels and regions containing rich high-frequency information, improves the feature expression capability of the network, accelerates network convergence, and further improves super-resolution accuracy;
the mixed attention mechanism means that the network has the capability of enhancing feature representation and of attending to channels and regions rich in high-frequency information, and 2 cascaded convolution layers and activation layers are arranged in the mixed attention mechanism unit;
the dense neural network is divided into five stages, namely feature extraction, non-linear feature mapping, feature dimension reduction, deconvolution up-sampling, and a final convolution producing the output, wherein: the feature extraction stage uses cascaded convolution and activation layers, the feature dimension reduction stage uses a bottleneck layer, deconvolution performs the up-sampling, and a convolution layer produces the final output.
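To make the five stages concrete, the sketch below traces feature-tensor shapes (channels, height, width) through the pipeline using the standard convolution and transposed-convolution size formulas. The layer widths, kernel sizes, and the x2 up-scaling factor are illustrative assumptions, not values fixed by the claims.

```python
def conv_shape(c_in, h, w, c_out, k=3, stride=1, pad=1):
    """Output shape of a convolution: floor((n + 2p - k)/s) + 1 per axis."""
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    return (c_out, h_out, w_out)

def deconv_shape(c_in, h, w, c_out, k=4, stride=2, pad=1):
    """Output shape of a transposed convolution (deconvolution):
    (n - 1)*s - 2p + k per axis; with k=4, s=2, p=1 this doubles the size."""
    h_out = (h - 1) * stride - 2 * pad + k
    w_out = (w - 1) * stride - 2 * pad + k
    return (c_out, h_out, w_out)

def pipeline_shapes(h, w):
    """Trace the five stages on a single-channel low-resolution input."""
    shapes = {}
    s = conv_shape(1, h, w, 64)                # 1) feature extraction
    shapes["extract"] = s
    s = conv_shape(*s, c_out=64)               # 2) non-linear mapping (repeated in practice)
    shapes["mapping"] = s
    s = conv_shape(*s, c_out=32, k=1, pad=0)   # 3) bottleneck (1x1) dimension reduction
    shapes["bottleneck"] = s
    s = deconv_shape(*s, c_out=32)             # 4) deconvolution up-sampling (x2)
    shapes["upsample"] = s
    s = conv_shape(*s, c_out=1)                # 5) final convolution to the output image
    shapes["output"] = s
    return shapes
```

Note that only the deconvolution changes the spatial size: the 3x3 convolutions with padding 1 and the 1x1 bottleneck preserve it, so the network operates on low-resolution features until the up-sampling stage.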
2. The medical image super-resolution reconstruction method according to claim 1, characterized by comprising the steps of:
step one: designing and building a network based on a dense neural network and a mixed attention mechanism;
step two: preprocessing an input image, enhancing data, and constructing a training sample;
step three: training the network model using the L2 loss until the network model converges;
step four: and in the super-resolution reconstruction stage, inputting a medical image with low resolution, and performing super-resolution reconstruction to obtain a final high-resolution image by using the trained network model.
3. The medical image super-resolution reconstruction method according to claim 2, wherein: the dense neural network comprises N basic units, each basic unit internally containing N cascaded convolution layers and activation layers, with N ≥ 8, and a mixed attention mechanism unit cascaded at the end of each basic unit; a large number of dense skip-connection structures are added between the basic units to extract deeper feature representations.
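The dense connectivity pattern of claim 3, in which each basic unit receives the concatenated outputs of all preceding units, can be sketched as follows. Each basic unit (its convolutions and attention gate included) is abstracted into an arbitrary callable; the channel-concatenation bookkeeping is what this sketch illustrates.

```python
def dense_forward(x, units):
    """Dense connectivity: the input to unit i is the channel-wise
    concatenation of the original input and the outputs of all previous
    units. Feature tensors are modelled as flat lists of channels; each
    unit is a callable taking and returning such a list."""
    collected = list(x)        # running concatenation of every feature so far
    for unit in units:
        out = unit(collected)  # each unit sees all earlier feature maps
        collected = collected + out
    return collected
```

Because later units receive every earlier feature map directly, features from all stages are reused, and gradients have short paths back to early layers, which is the dense-network property the description credits with mitigating vanishing gradients.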
4. The medical image super-resolution reconstruction method according to claim 2, wherein the preprocessing and data enhancement in step two comprise cutting the input image into sub-images, performing a down-sampling operation on each sub-image to obtain the corresponding low-resolution image, and applying data enhancement to obtain more training samples.
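A minimal sketch of the preprocessing in claim 4: crop the input image into sub-images, down-sample each crop to form its low-resolution counterpart, and multiply the sample count with flips and rotations. The patch size, stride, scale, augmentation set, and the use of simple average pooling in place of a bicubic kernel are all illustrative assumptions of this sketch.

```python
def crop_patches(img, size, stride):
    """Slide a window over a 2-D image (list of rows) and collect crops."""
    h, w = len(img), len(img[0])
    patches = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            patches.append([row[left:left + size] for row in img[top:top + size]])
    return patches

def downsample(patch, scale):
    """Average-pool by `scale` as a stand-in for bicubic down-sampling."""
    size = len(patch) // scale
    lr = []
    for i in range(size):
        row = []
        for j in range(size):
            block = [patch[i * scale + di][j * scale + dj]
                     for di in range(scale) for dj in range(scale)]
            row.append(sum(block) / len(block))
        lr.append(row)
    return lr

def augment(patch):
    """Data enhancement: original, horizontal flip, and 180-degree rotation."""
    hflip = [row[::-1] for row in patch]
    rot180 = [row[::-1] for row in patch[::-1]]
    return [patch, hflip, rot180]
```

Each (low-resolution crop, high-resolution crop) pair then forms one training sample, so a single scan yields many samples per image and the augmentations triple that count.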
5. The medical image super-resolution reconstruction method according to claim 2, wherein in step three a loss function L2 based on the L2 norm is used to quantify the degree of similarity between the high-resolution image obtained by super-resolution and the real high-resolution image;
the training process adopts mini-batch learning, and the loss function is expressed as:

L2 = 1/(n·H·W·C) · Σ_{v=1}^{n} Σ_{k=1}^{C} Σ_{i=1}^{H} Σ_{j=1}^{W} (I^{HR}_{v,i,j,k} − I^{SR}_{v,i,j,k})²

wherein: I^{HR} is the real high-resolution image; I^{SR} is the high-resolution image obtained by super-resolution; H, W, and C are the height, width, and number of channels of the input image; n is the mini-batch size; v indexes the v-th feature map in the mini-batch; k is the k-th channel of the v-th feature map; (i, j) is the coordinate position in the feature map; and I_{v,i,j,k} is the pixel value at position (i, j) of the k-th channel of the v-th feature map.
6. The medical image super-resolution reconstruction method according to claim 2, wherein in step four: the medical image obtained by super-resolution reconstruction is the low-resolution image magnified by the chosen up-scaling factor.
7. A software system implementing the medical image super-resolution reconstruction method and technical process according to any one of claims 1 to 6, including extensions of the method.
8. The medical image super-resolution reconstruction method according to any one of claims 1 to 6, wherein super-resolution accuracy is further improved by providing more basic units, or more convolution and activation layers within each basic unit.
CN201910361678.5A 2019-04-30 2019-04-30 Medical image super-resolution reconstruction method based on dense mixed attention network Active CN110322402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910361678.5A CN110322402B (en) 2019-04-30 2019-04-30 Medical image super-resolution reconstruction method based on dense mixed attention network


Publications (2)

Publication Number Publication Date
CN110322402A CN110322402A (en) 2019-10-11
CN110322402B true CN110322402B (en) 2023-07-25

Family

ID=68113374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910361678.5A Active CN110322402B (en) 2019-04-30 2019-04-30 Medical image super-resolution reconstruction method based on dense mixed attention network

Country Status (1)

Country Link
CN (1) CN110322402B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111445390B (en) * 2020-02-28 2022-03-25 Tianjin University Wide residual attention-based three-dimensional medical image super-resolution reconstruction method
CN111461987B (en) * 2020-04-01 2023-11-24 Aerospace Information Research Institute, Chinese Academy of Sciences Network construction method, image super-resolution reconstruction method and system
CN111861880B (en) * 2020-06-05 2022-08-30 Kunming University of Science and Technology Image super-resolution fusion method based on regional information enhancement and block self-attention
CN112529825B (en) * 2020-12-11 2022-05-31 Ping An Technology (Shenzhen) Co., Ltd. Face image resolution reconstruction method, device, equipment and storage medium
CN112801881B (en) * 2021-04-13 2021-06-22 Hunan University High-resolution hyperspectral computational imaging method, system and medium
CN113222823B (en) * 2021-06-02 2022-04-15 State Grid Hunan Electric Power Co., Ltd. Hyperspectral image super-resolution method based on mixed attention network fusion
CN116071239B (en) * 2023-03-06 2023-07-11 Zhejiang Lab CT image super-resolution method and device based on mixed attention model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710830A (en) * 2018-04-20 2018-10-26 Zhejiang Gongshang University Human body 3D pose estimation method combining densely connected attention pyramid residual networks with equidistant constraints
CN109035233A (en) * 2018-07-24 2018-12-18 Xi'an University of Posts and Telecommunications Visual attention network and surface flaw detection method
CN109584161A (en) * 2018-11-29 2019-04-05 Sichuan University Remote sensing image super-resolution reconstruction method based on channel-attention convolutional neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765291A (en) * 2018-05-29 2018-11-06 Tianjin University Super-resolution reconstruction method based on dense neural networks and a two-parameter loss function
CN108765290A (en) * 2018-05-29 2018-11-06 Tianjin University Super-resolution reconstruction method based on improved dense convolutional neural networks
CN109636721B (en) * 2018-11-29 2023-06-23 Wuhan University Video super-resolution method based on adversarial learning and an attention mechanism


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application of channel-attention-based convolutional neural networks in image super-resolution reconstruction; Wang Dongfei; Radio & TV Broadcast Engineering (No. 06); full text *

Also Published As

Publication number Publication date
CN110322402A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
CN110322402B (en) Medical image super-resolution reconstruction method based on dense mixed attention network
CN109118432B (en) Image super-resolution reconstruction method based on rapid cyclic convolution network
CN111028146B (en) Image super-resolution method for generating countermeasure network based on double discriminators
CN108805814B (en) Image super-resolution reconstruction method based on multi-band deep convolutional neural network
CN111598778B (en) Super-resolution reconstruction method for insulator image
CN112215755B (en) Image super-resolution reconstruction method based on back projection attention network
Tang et al. Deep inception-residual Laplacian pyramid networks for accurate single-image super-resolution
CN111986092B (en) Dual-network-based image super-resolution reconstruction method and system
CN110533591B (en) Super-resolution image reconstruction method based on codec structure
CN111325695B (en) Low-dose image enhancement method and system based on multi-dose grade and storage medium
CN112508794B (en) Medical image super-resolution reconstruction method and system
CN109242771A (en) Super-resolution image reconstruction method and device, computer-readable storage medium and computer equipment
CN108460723B (en) Bilateral total variation image super-resolution reconstruction method based on neighborhood similarity
CN111667407A (en) Image super-resolution method guided by depth information
CN114494018A (en) Magnetic resonance image super-resolution reconstruction method, system and device
CN114140442A (en) Deep learning sparse angle CT reconstruction method based on frequency domain and image domain degradation perception
CN109146792A (en) Chip image super resolution ratio reconstruction method based on deep learning
CN116468605A (en) Video super-resolution reconstruction method based on time-space layered mask attention fusion
Liu et al. Medical image super-resolution method based on dense blended attention network
CN114187181A (en) Double-path lung CT image super-resolution method based on residual information refining
CN117726602A (en) Polyp segmentation method and system based on banded pooling
CN117036162B (en) Residual feature attention fusion method for super-resolution of lightweight chest CT image
Zou et al. LMSN: a lightweight multi-scale network for single image super-resolution
CN116630152A (en) Image resolution reconstruction method and device, storage medium and electronic equipment
CN112070676B (en) Picture super-resolution reconstruction method of double-channel multi-perception convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant