CN114219719A - CNN medical CT image denoising method based on dual attention and multi-scale features - Google Patents
- Publication number
- CN114219719A
- Authority
- CN
- China
- Prior art keywords: noise, network, layer, image, channels
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/20—ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10081—Computed x-ray tomography [CT]
Abstract
The CNN medical CT image denoising method based on dual attention and multi-scale features comprises the following steps: step 1) creating a medical CT image noise model; step 2) constructing a noise prior information extraction network; step 3) constructing a denoising network; step 4) preprocessing the data set; step 5) training the denoising network and updating its parameters. The invention has the following advantages: 1) a dual attention mechanism is fused into the noise prior network used to denoise the medical CT image, which enhances the noise extraction effect; 2) a multi-scale context module is used in the denoising network, which enhances the network's ability to preserve the detail textures of the image.
Description
Technical Field
The invention relates to the field of medical image denoising, and in particular to a CNN-based medical CT image denoising method.
Background Art
In the medical field, medical image processing techniques are increasingly used in therapy planning and disease diagnosis. The medical imaging techniques most widely used in clinical and diagnostic applications are Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and the like. Although medical imaging and image acquisition equipment are very advanced, they are affected by objective factors, so noise is inevitably introduced into medical images; this noise directly degrades image quality and can thereby affect the judgment of medical staff. It is therefore necessary to analyze and study denoising techniques for medical image preprocessing. Denoising is the basis and precondition of image processing and an indispensable step in image preprocessing. Denoising techniques for natural image preprocessing have by now achieved notable results, but denoising research for medical image preprocessing still faces several problems.
Many early model-based methods derived prior information for images and then applied optimization algorithms to solve the resulting models iteratively, which is time-consuming and inefficient. With the rise of deep learning, CNNs have been widely applied to denoising and achieve good results. Various methods now denoise images corrupted by additive white Gaussian noise (AWGN) well, but denoising real-world images remains difficult: the noise sources in a camera system are varied (dark current noise, shot noise, thermal noise, etc.), and the signal further passes through an ISP pipeline (demosaicing, gamma correction, compression, etc.), so real-world noise is far more complicated than Gaussian noise. In addition, the noise level of a real-world noisy image is unknown; directly applying a non-blind AWGN denoising method easily removes details or over-smooths the image, giving poor generalization.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a CNN medical CT image denoising method based on dual attention and multi-scale features.
The invention aims to improve the denoising effect on medical CT images. Traditional noise-prior-based medical image denoising methods often directly mix images of a single noise level, or of several noise levels, as training data and use a conventional CNN to extract noise information; the resulting denoising model can only process noisy images within a limited range, the model cannot generalize to images with a wider range of noise levels, and the network cannot fully exploit the image information during training. To improve the denoising capability of the network, the invention adds a dual attention mechanism, i.e. a spatial attention mechanism and a channel attention mechanism connected in parallel, to the noise extraction network, realizing a noise prior network with stronger noise extraction capability. The extracted noise information is concatenated with the noisy image as the input of the denoising network, and this input is used to optimize the parameters of the neural network.
The innovation and advantages of the invention are as follows: the invention integrates a new dual attention unit into the original noise prior network and achieves better results in prior noise extraction, enhancing the generalization capability for denoising real-world noisy images.
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solution of the present invention is described in detail below. The CNN medical CT image denoising method based on dual attention and multi-scale features comprises the following specific steps:
step 1) creating a medical CT image model:
adopting a Gaussian noise model, wherein the mathematical expression is as follows:
Y=X+V (1)
where X is the clean noise-free image, Y is the observed noisy image, and V is the noise; V follows a Gaussian distribution, Gaussian noise being the class of noise whose probability density function obeys the Gaussian (i.e. normal) distribution. For a Gaussian random variable z the probability density function is:

p(z) = 1/(√(2π)·σ) · exp(−(z−μ)²/(2σ²)) (2)

where μ is the mathematical expectation (mean) and σ is the standard deviation;
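The noise model of Eq. (1) can be sketched in a few lines of NumPy. This is an illustration only, not code from the patent: the 64 × 64 "CT slice" is synthetic, and the noise level σ = 25 (on a [0, 255] scale) is one of the levels used later in the data preprocessing step.

```python
import numpy as np

# Minimal sketch of the noise model Y = X + V (Eq. 1), where V ~ N(mu, sigma^2).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(64, 64))      # clean image X (synthetic stand-in)
mu, sigma = 0.0, 25.0 / 255.0                 # noise level sigma = 25 on a [0, 255] scale
V = rng.normal(mu, sigma, size=X.shape)       # Gaussian noise V
Y = X + V                                     # observed noisy image Y

# The empirical statistics of V = Y - X should match (mu, sigma).
est_mu = float((Y - X).mean())
est_sigma = float((Y - X).std())
print(round(est_mu, 3), round(est_sigma, 3))
```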
step 2) constructing a noise prior information extraction network:
this stage focuses on extracting features from noisy images, using a simple five-layer full convolution sub-network, without posing and batch normalization, followed by ReLU activation after each Conv layer. At each convolutional layer, the eigenchannel is set to 32 (the last layer is set to 1), and the convolutional kernel size is 3 × 3.
Before the last layer, a Dual Attention module is inserted to effectively capture the global dependencies of the features. The Dual Attention module consists of a spatial attention module and a channel attention module connected in parallel.
The spatial attention module aims to mutually enhance the expression of features by exploiting the association between any two features. Specifically, a correlation strength matrix between any two point features is first computed: the original feature A is reduced in dimension by convolution to obtain features B and C, B is reshaped to ((H×W)×C′) and C to (C′×(H×W)), and their matrix product gives the ((H×W)×(H×W)) correlation strength matrix between any two point features. This matrix is then normalized by a softmax operation to obtain the attention map S of each position with respect to all other positions, where more similar point features yield larger response values. The response values in the attention map are then used as weights to perform a weighted fusion of feature D, so that for each position, similar features across the whole spatial extent are fused through the attention map.
The channel attention module aims to enhance the semantic response of specific channels by modeling the associations between channels. The process is similar to that of the spatial attention module, except that to obtain the channel attention map X, dimension transformation and matrix multiplication are applied to any two channel features to obtain the correlation strength between any two channels, and the inter-channel attention map is then obtained through a softmax operation. Finally, weighted fusion with this inter-channel attention map lets global associations form between the channels, yielding features with stronger semantic response.
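The spatial attention computation above can be sketched in plain NumPy: the reshaped features B ((H·W)×C′) and C (C′×(H·W)) are multiplied to form the (H·W)×(H·W) correlation matrix, softmax-normalized into the attention map S, which then weight-fuses feature D. All shapes and values here are illustrative assumptions, not the patent's actual tensor sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
H, W, Cp, C = 4, 4, 8, 16
N = H * W
B = rng.normal(size=(N, Cp))          # feature B, reshaped to (H*W) x C'
Cf = rng.normal(size=(Cp, N))         # feature C, reshaped to C' x (H*W)
D = rng.normal(size=(N, C))           # feature D to be weight-fused

energy = B @ Cf                       # pairwise correlation strengths, (H*W) x (H*W)
energy -= energy.max(axis=1, keepdims=True)   # numerically stable softmax
S = np.exp(energy)
S /= S.sum(axis=1, keepdims=True)     # attention map S: each row sums to 1

fused = S @ D                         # each position aggregates similar features globally
print(fused.shape)
```

The channel attention branch follows the same pattern with the roles of the spatial and channel axes swapped, producing a C×C attention map instead.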
Step 3) constructing a denoising network:
The main body of the denoising network is a UNet structure with skip connections, divided into a left encoder part and a right decoder part; the two parts are connected by a context block module containing 4 dilated convolutions with different dilation rates:
encoder part: the left half part is a down-sampling layer and consists of 4 convolution layers with the same structure. Each convolution layer consists of two conv + ReLU layers, the convolution kernel is set to be 3 multiplied by 3, the step length is 1, and padding is 1; the number of convolution channels of the first layer is 128, the number of convolution channels of the second layer is 256, the number of convolution channels of the third layer is 512, and the number of convolution channels of the fourth layer is 1024; the 2 x 2-sized maxpool layer connections are used for each downsampling, and the number of channels is doubled for each downsampling.
The context block module removes the BN layer and uses only dilation rates set to 1, 2, 3 and 4. To further simplify computation and shorten running time, the feature channels are first compressed using a 1 × 1 Conv with the compression ratio set to 4; a 1 × 1 convolution is used in the fusion part, and a local skip connection is used between the input and output features to prevent information blocking.
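The four dilation rates give the context block progressively larger receptive fields from the same 3 × 3 kernels. The standard identity for the effective extent of a dilated kernel (k + (k − 1)(d − 1)) makes this concrete; the helper below is an illustration, not code from the patent:

```python
# Effective kernel size of the four dilated 3x3 convolutions in the context
# block (dilation rates 1-4). For kernel size k and dilation d the effective
# spatial extent is k + (k - 1) * (d - 1).
def effective_kernel(k: int, d: int) -> int:
    """Effective spatial extent of a kxk convolution with dilation d."""
    return k + (k - 1) * (d - 1)

extents = [effective_kernel(3, d) for d in (1, 2, 3, 4)]
print(extents)  # -> [3, 5, 7, 9]
```

Fusing the four branches thus mixes context gathered at four scales, which is what makes the block "multi-scale".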
Decoder part: the right half part is an up-sampling layer which is symmetrical to the encoder part and consists of 4 up-sampling layers with the same structure. Each up-sampling convolution layer is the same as the left half part and consists of two conv + ReLU layers, the convolution kernel is set to be 3 multiplied by 3, the step length is 1, and the padding is 1; and performing characteristic up-sampling on each layer through 2 multiplied by 2 sized up-conv layers, wherein the number of channels is halved when each up-sampling is performed.
Skip connections are used between the encoder and decoder layers of the network: the feature map output by each encoder layer is added to the up-sampling result of the corresponding decoder layer, which strengthens the network's recovery of image detail texture and helps ensure the convergence of network training.
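The shape bookkeeping of this UNet body can be checked with a short script: each encoder stage doubles the channels (128 → 256 → 512 → 1024) while the 2 × 2 maxpool halves the spatial size, and each decoder stage does the reverse, so every skip connection adds feature maps of identical shape. The 64 × 64 input resolution is an assumption for illustration; the channel counts follow the description.

```python
# Verify that encoder and decoder feature-map shapes line up for the additive
# skip connections described above (channels double / spatial size halves
# going down, and the reverse going up).
h = w = 64
channels = [128, 256, 512, 1024]
enc_shapes = []
for c in channels:
    enc_shapes.append((c, h, w))      # shape after this encoder stage's convs
    h, w = h // 2, w // 2             # 2x2 maxpool halves the spatial size

dec = enc_shapes[-1]                  # bottleneck feature shape
for skip in reversed(enc_shapes[:-1]):
    c, h, w = dec
    dec = (c // 2, h * 2, w * 2)      # up-conv: double size, halve channels
    assert dec == skip                # skip connection adds maps of equal shape
print(enc_shapes[0], dec)
```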
Step 4) data set preprocessing:
In the preprocessing stage, the data set is divided into a training set, a verification set and a test set. The pictures of the training set and verification set are cropped to a uniform size; the noise in the medical CT images is modelled as white Gaussian noise, and white Gaussian noise with mean 0 and standard deviation 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 is added to the original clean images to obtain the denoising network training data.
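The pair-generation part of this step can be sketched as follows: from each cropped clean image, build one noisy/clean training pair per noise level in the list above. The synthetic image and the 64 × 64 crop size are assumptions for illustration.

```python
import numpy as np

# Build noisy/clean training pairs at the ten AWGN levels used for training.
rng = np.random.default_rng(2)
sigmas = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
clean = rng.uniform(0, 255, size=(64, 64))    # one cropped clean image (synthetic)

pairs = []
for sigma in sigmas:
    noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
    pairs.append((noisy, clean, sigma))       # keep sigma as the true noise level

print(len(pairs), pairs[0][0].shape)
```

Keeping the true σ alongside each pair matters because the loss in step 5) takes the real noise level as one of its inputs.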
Step 5), training a denoising network and updating parameters:
the noise prior network firstly initializes the weight randomly, then carries out feature extraction on input noise image data x to obtain noise estimation data noise _ est, and finally splices the original input x and the noise _ est to be used as the input of the denoising network. The denoising network initializes the weight immediately and obtains output according to the input training network of the noise prior network. In the loss function, an original picture, a noisy picture, noise estimation and a real noise level are used as input, the mean variance is used for calculating the total loss, and finally an Adam optimizer is used for updating the network weight parameters.
Further, in step 2), in order to further obtain features of the global dependency relationship, the output results of the two attention modules are added and fused to obtain the final features used to classify the pixel points.
The network model thus performs blind denoising of noisy pictures without being trained for a specific noise level.
The invention has the following advantages:
1. A dual attention mechanism is fused into the noise prior network used to denoise the medical CT image, enhancing the noise extraction effect.
2. A multi-scale context module is used in the denoising network, enhancing the network's ability to preserve the detail textures of the image.
Drawings
FIG. 1 is a schematic illustration of a medical CT image containing Gaussian noise;
FIG. 2 is a schematic diagram of the overall network architecture of the present invention;
FIG. 3 is a schematic diagram of the Dual Attention module of the present invention;
FIG. 4 is a diagram of the Context Block module of the present invention;
FIG. 5 is a schematic diagram of denoising a medical CT image according to the present invention.
The specific implementation mode is as follows:
The invention is explained in detail below with reference to the drawings. The specific implementation carries out steps 1) to 5) exactly as set forth in the disclosure above: the medical CT image noise model (FIG. 1) is created, the noise prior information extraction network with the Dual Attention module (FIG. 3) is constructed, the denoising network (FIG. 2) with the Context Block module (FIG. 4) is constructed, the data set is preprocessed, and the denoising network is trained with parameter updates, yielding the denoised medical CT image of FIG. 5.
The embodiments described in this specification merely illustrate the inventive concept; the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments, but also covers equivalents that may occur to those skilled in the art upon consideration of the inventive concept.
Claims (2)
1. A CNN medical CT image denoising method based on dual attention and multi-scale features, comprising the following specific steps:
step 1) creating a medical CT image model:
adopting a Gaussian noise model, wherein the mathematical expression is as follows:
Y=X+V (1)
wherein X is the clean noise-free image, Y is the observed noisy image, and V is the noise; V follows a Gaussian distribution, Gaussian noise being the class of noise whose probability density function obeys the Gaussian (i.e. normal) distribution; for a Gaussian random variable z the probability density function is:
p(z) = 1/(√(2π)·σ) · exp(−(z−μ)²/(2σ²)) (2)
where μ is the mathematical expectation (mean) and σ is the standard deviation;
step 2) constructing a noise prior information extraction network:
this stage focuses on extracting noise features from the noisy image, using a simple five-layer fully convolutional sub-network without pooling or batch normalization, with ReLU activation after each Conv layer; at each convolutional layer the number of feature channels is set to 32 (1 for the last layer), and the convolution kernel size is 3 × 3;
before the last layer, a Dual Attention module is inserted to effectively capture the global dependencies of the features; the Dual Attention module consists of a spatial attention module and a channel attention module connected in parallel;
the spatial attention module aims to mutually enhance the expression of features by exploiting the association between any two features; specifically, a correlation strength matrix between any two point features is first computed: the original feature A is reduced in dimension by convolution to obtain features B and C, B is reshaped to ((H×W)×C′) and C to (C′×(H×W)), and their matrix product gives the ((H×W)×(H×W)) correlation strength matrix between any two point features; this matrix is then normalized by a softmax operation to obtain the attention map S of each position with respect to all other positions, where more similar point features yield larger response values; the response values in the attention map are then used as weights to perform a weighted fusion of feature D, so that for each position, similar features across the whole spatial extent are fused through the attention map;
the channel attention module aims to enhance the semantic response of specific channels by modeling the associations between channels; the process is similar to that of the spatial attention module, except that to obtain the channel attention map X, dimension transformation and matrix multiplication are applied to any two channel features to obtain the correlation strength between any two channels, and the inter-channel attention map is then obtained through a softmax operation; finally, weighted fusion with this inter-channel attention map lets global associations form between the channels, yielding features with stronger semantic response;
step 3) constructing a denoising network:
the main body of the denoising network is a UNet structure with skip connections, divided into a left encoder part and a right decoder part; the two parts are connected by a context block module containing 4 dilated convolutions with different dilation rates:
encoder part: the left half is the down-sampling path, consisting of 4 convolution layers with the same structure; each convolution layer consists of two conv + ReLU layers with a 3 × 3 kernel, stride 1 and padding 1; the numbers of convolution channels of the four layers are 128, 256, 512 and 1024 respectively; a 2 × 2 maxpool layer is used for each downsampling, and the number of channels is doubled after each downsampling;
the context block module removes the BN layer and uses only dilation rates set to 1, 2, 3 and 4; to further simplify computation and shorten running time, the feature channels are first compressed using a 1 × 1 Conv with the compression ratio set to 4; a 1 × 1 convolution is used in the fusion part, and a local skip connection is used between the input and output features to prevent information blocking;
decoder part: the right half is the up-sampling path, symmetric to the encoder part and consisting of 4 up-sampling layers with the same structure; each up-sampling convolution layer, like its counterpart in the left half, consists of two conv + ReLU layers with a 3 × 3 kernel, stride 1 and padding 1; feature up-sampling is performed at each layer through a 2 × 2 up-conv layer, and the number of channels is halved after each up-sampling;
skip connections are used between the encoder and decoder layers of the network: the feature map output by each encoder layer is added to the up-sampling result of the corresponding decoder layer, which strengthens the network's recovery of image detail texture and helps ensure the convergence of network training;
step 4) data set preprocessing:
in the data set preprocessing stage, the data set is divided into a training set, a verification set and a test set; the pictures of the training set and the verification set are cropped to a uniform size; the noise in the medical CT images is modeled as Gaussian white noise, and Gaussian white noise with mean 0 and standard deviation 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50 is added to the original clean images to obtain the denoising network training data;
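The noise-synthesis step above can be sketched in a few lines of NumPy; the helper name and toy patch are illustrative:

```python
import numpy as np

def add_awgn(clean, sigma, seed=None):
    """Add Gaussian white noise with mean 0 and standard deviation sigma
    to a clean image, as in the data-preprocessing step."""
    rng = np.random.default_rng(seed)
    return clean + rng.normal(0.0, sigma, size=clean.shape)

clean = np.full((256, 256), 100.0)  # toy stand-in for a clean CT patch
noisy_set = {sigma: add_awgn(clean, sigma, seed=0)
             for sigma in (5, 10, 15, 20, 25, 30, 35, 40, 45, 50)}
```

The empirical standard deviation of `noisy_set[sigma] - clean` is close to the target `sigma`, which is what the denoising network is trained against.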
step 5), training a denoising network and updating parameters:
the noise prior network first randomly initializes its weights, then performs feature extraction on the input noisy image data x to obtain the noise estimate noise_est, and finally concatenates the original input x with noise_est as the input of the denoising network; the denoising network likewise randomly initializes its weights and produces its output by training on the input provided by the noise prior network; the loss function takes the original picture, the noisy picture, the noise estimate and the true noise level as inputs, computes the total loss with the mean squared error, and finally an Adam optimizer updates the network weight parameters.
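A minimal sketch of one training step as described: estimate the noise, concatenate it with the input, denoise, compute an MSE-based total loss, and update with Adam. The single-convolution "networks" here are hypothetical stand-ins for the actual prior and denoising architectures described above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins; the real models follow the architecture above.
prior_net = nn.Conv2d(1, 1, 3, padding=1)    # produces the noise estimate
denoise_net = nn.Conv2d(2, 1, 3, padding=1)  # takes [x, noise_est] as input
opt = torch.optim.Adam(
    list(prior_net.parameters()) + list(denoise_net.parameters()), lr=1e-4)

clean = torch.rand(4, 1, 64, 64)
noisy = clean + 0.1 * torch.randn_like(clean)
true_level = torch.full_like(noisy, 0.1)     # ground-truth noise level

noise_est = prior_net(noisy)                              # noise prior step
denoised = denoise_net(torch.cat([noisy, noise_est], 1))  # concatenated input
# total loss: MSE on the denoised image plus MSE on the noise estimate
loss = F.mse_loss(denoised, clean) + F.mse_loss(noise_est, true_level)
opt.zero_grad()
loss.backward()
opt.step()
```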
2. The CNN medical CT image denoising method based on dual attention and multi-scale features of claim 1, wherein: in step (2), to further capture global dependency features, the output results of the two attention modules are added and fused to obtain the final features used to classify the pixel points.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111254289.6A CN114219719A (en) | 2021-10-27 | 2021-10-27 | CNN medical CT image denoising method based on dual attention and multi-scale features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111254289.6A CN114219719A (en) | 2021-10-27 | 2021-10-27 | CNN medical CT image denoising method based on dual attention and multi-scale features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114219719A true CN114219719A (en) | 2022-03-22 |
Family
ID=80696247
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111254289.6A Pending CN114219719A (en) | 2021-10-27 | 2021-10-27 | CNN medical CT image denoising method based on dual attention and multi-scale features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114219719A (en) |
2021-10-27: application CN202111254289.6A filed; published as CN114219719A; legal status: active, Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114781445A (en) * | 2022-04-11 | 2022-07-22 | 山东省人工智能研究院 | Deep neural network electrocardiosignal noise reduction method based on interpretability |
CN115409733A (en) * | 2022-09-02 | 2022-11-29 | 山东财经大学 | Low-dose CT image noise reduction method based on image enhancement and diffusion model |
CN116051408A (en) * | 2023-01-06 | 2023-05-02 | 郑州轻工业大学 | Image depth denoising method based on residual error self-coding |
CN116051408B (en) * | 2023-01-06 | 2023-10-27 | 郑州轻工业大学 | Image depth denoising method based on residual error self-coding |
CN116757966A (en) * | 2023-08-17 | 2023-09-15 | 中科方寸知微(南京)科技有限公司 | Image enhancement method and system based on multi-level curvature supervision |
CN117095074A (en) * | 2023-08-23 | 2023-11-21 | 浙江大学 | Medical image double denoising method and device based on deep learning |
CN117095074B (en) * | 2023-08-23 | 2024-04-09 | 浙江大学 | Medical image double denoising method and device based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110570353B (en) | Super-resolution reconstruction method for generating single image of countermeasure network by dense connection | |
CN114219719A (en) | CNN medical CT image denoising method based on dual attention and multi-scale features | |
CN110889852B (en) | Liver segmentation method based on residual error-attention deep neural network | |
CN114092330B (en) | Light-weight multi-scale infrared image super-resolution reconstruction method | |
Fang et al. | Blind visual quality assessment for image super-resolution by convolutional neural network | |
CN112819910B (en) | Hyperspectral image reconstruction method based on double-ghost attention machine mechanism network | |
CN115409733A (en) | Low-dose CT image noise reduction method based on image enhancement and diffusion model | |
CN111968058A (en) | Low-dose CT image noise reduction method | |
CN109214989A (en) | Single image super resolution ratio reconstruction method based on Orientation Features prediction priori | |
CN116309648A (en) | Medical image segmentation model construction method based on multi-attention fusion | |
CN112836602B (en) | Behavior recognition method, device, equipment and medium based on space-time feature fusion | |
CN115170915A (en) | Infrared and visible light image fusion method based on end-to-end attention network | |
Li et al. | Hdrnet: Single-image-based hdr reconstruction using channel attention cnn | |
CN116664397B (en) | TransSR-Net structured image super-resolution reconstruction method | |
CN112561799A (en) | Infrared image super-resolution reconstruction method | |
CN116757986A (en) | Infrared and visible light image fusion method and device | |
Krishnan et al. | SwiftSRGAN-Rethinking super-resolution for efficient and real-time inference | |
CN116012255A (en) | Low-light image enhancement method for generating countermeasure network based on cyclic consistency | |
CN117036162B (en) | Residual feature attention fusion method for super-resolution of lightweight chest CT image | |
Qiu et al. | End-to-end residual attention mechanism for cataractous retinal image dehazing | |
CN117217997A (en) | Remote sensing image super-resolution method based on context perception edge enhancement | |
CN116152060A (en) | Double-feature fusion guided depth image super-resolution reconstruction method | |
CN115346091A (en) | Method and device for generating Mura defect image data set | |
Tian et al. | Retinal fundus image superresolution generated by optical coherence tomography based on a realistic mixed attention GAN | |
Zhang et al. | Deep residual network based medical image reconstruction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||