CN116958537A - Lung nodule segmentation method based on U-Net model - Google Patents

Lung nodule segmentation method based on U-Net model

Info

Publication number
CN116958537A
CN116958537A (application CN202310508247.3A)
Authority
CN
China
Prior art keywords
convolution
improved
lung
module
net
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310508247.3A
Other languages
Chinese (zh)
Inventor
黎建军
李帅
黄子杰
杨礼源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Jiliang University
Original Assignee
China Jiliang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Jiliang University
Priority to CN202310508247.3A
Publication of CN116958537A
Status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung
    • G06T2207/30064Lung nodule

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a lung nodule segmentation method based on a U-Net model, which can effectively segment the lung nodule region of a lung CT image through a constructed improved model. The improved lung nodule segmentation model is based on the U-Net model and comprises an encoder, a decoder, residual blocks and an improved ASPP module. A BN layer, a Dropout layer and residual blocks are added to the improved lung nodule segmentation model, which effectively solves the problems of overfitting, gradient explosion and gradient vanishing. The ASPP module is improved by adding convolution layers, introducing a CBAM convolutional attention module, and increasing the number of parallel atrous convolution layers.

Description

Lung nodule segmentation method based on U-Net model
Technical Field
The invention relates to the technical field of medical image processing, and in particular to an improved U-Net lung nodule segmentation method.
Background
In recent years, with the continuous development of deep learning, remarkable results have been achieved in medical image processing. Traditional segmentation methods can no longer cope with increasingly large and complex medical image segmentation tasks. Deep learning can be applied to medical image segmentation and recognition and, compared with the empirical judgment of doctors, achieves higher recognition accuracy, making it a research hotspot in the medical field.
Lung cancer is currently among the cancers with the highest morbidity and mortality, and its incidence has continued to grow in recent years. Because early lung cancer shows no obvious symptoms, it is easily missed by patients; by the time obvious symptoms appear, the cancer cells have already spread and the disease has reached an advanced stage. Early lung cancer mainly presents in the form of lung nodules, so screening for lung nodules is the most effective way to prevent and treat lung cancer and can greatly reduce lung cancer mortality and treatment cost.
Computed Tomography (CT) is a high-precision means for screening and analyzing lung nodules. Compared with the whole lung CT image, a lung nodule occupies only a small region and varies in form, so accurate segmentation of lung nodules is difficult. In addition, when radiologists segment lung nodules they are often influenced by subjective judgment and experience, and visual misjudgment is particularly likely under visual fatigue. Using segmentation techniques from image processing to assist doctors in diagnosing lung nodules can reduce their workload, improve segmentation efficiency, provide a basis for early screening of lung cancer, and improve public health.
Medical image segmentation is the process of dividing a medical image into several mutually disjoint connected regions according to some similarity feature of the image (such as brightness, color, texture, area, shape, position, local statistical feature or spectral feature). The relevant features are consistent or similar within the same region and obviously different between regions; that is, the pixels exhibit some discontinuity at region boundaries.
U-Net was originally designed to solve problems in biomedical image analysis and has since been widely used in many semantic segmentation tasks. U-Net is an excellent semantic segmentation model that operates much like other semantic segmentation models. It differs from a conventional convolutional neural network (CNN) in that a CNN performs image-level classification, whereas U-Net performs pixel-level classification and outputs a class for each pixel.
However, during segmentation the U-Net model suffers from vanishing gradients, easy loss of spatial information, and low feature utilization, and as the network deepens it becomes prone to overfitting. In addition, the U-Net model does not fully exploit the feature information between different stages, so features tend to be used incompletely. These problems affect the segmentation accuracy of the U-Net model. Therefore, improving the accuracy of the U-Net model for lung nodule segmentation is of great importance.
To solve the above problems, the invention provides a lung nodule segmentation method based on a U-Net model that improves the accuracy of lung nodule segmentation.
Disclosure of Invention
The invention aims to provide a lung nodule segmentation method based on a U-Net network, so as to solve the problems of the U-Net network described in the background: gradient vanishing, gradient explosion, easy loss of spatial information, low feature utilization, and overfitting.
According to the lung nodule segmentation method based on the U-Net network, an improved U-Net network model is constructed that can effectively segment the lung nodule region of a lung CT image; the improved lung nodule segmentation model is based on the U-Net model and comprises an encoder, a decoder, residual modules and an improved ASPP module.
The invention provides a lung nodule segmentation method based on a U-Net network, which comprises the following steps:
S1: reading the lung CT image and preprocessing the CT image;
S2: cutting out lung nodule pictures and generating corresponding masks;
S3: dividing a dataset according to the acquired lung nodule pictures and the corresponding masks to obtain a training set and a test set;
S4: building an improved U-Net network structure;
S5: inputting the obtained training set into the improved U-Net network for training;
S6: inputting the CT image of the lung nodule to be segmented into the trained improved U-Net network to obtain the lung nodule segmentation result.
Preferably, in step S1, lung CT images in the specified format are read from the dataset, and the original lung CT images are preprocessed by binarization, image edge padding, histogram correction, gray-level transformation, image smoothing, Gaussian filtering, cropping, image translation, thresholding, morphological operations, data enhancement, and the like.
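A minimal preprocessing sketch in Python is given below. The concrete HU window, filter sigma and output range are illustrative assumptions; the patent names only the operation families (thresholding, Gaussian filtering, gray-level transformation, smoothing).

```python
# Hypothetical preprocessing sketch (NumPy/SciPy); the HU window, sigma and
# [0, 1] output range are illustrative assumptions, not values from the patent.
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess_ct_slice(hu_slice: np.ndarray,
                        hu_min: float = -1000.0,
                        hu_max: float = 400.0,
                        sigma: float = 1.0) -> np.ndarray:
    """Window a CT slice in Hounsfield units, smooth it, and scale to [0, 1]."""
    clipped = np.clip(hu_slice, hu_min, hu_max)           # thresholding / windowing
    smoothed = gaussian_filter(clipped, sigma=sigma)      # Gaussian smoothing
    normalized = (smoothed - hu_min) / (hu_max - hu_min)  # gray-level transformation
    return normalized.astype(np.float32)

# Example: a random 512x512 array stands in for a real CT slice.
slice_01 = preprocess_ct_slice(np.random.uniform(-1200, 600, (512, 512)))
print(slice_01.shape, slice_01.min(), slice_01.max())
```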
Preferably, in step S2, the portion of the lung CT image containing the lung nodule is cropped, and a mask image corresponding to the lung nodule is generated.
Preferably, in step S3, the dataset is divided according to the obtained lung nodule images and the corresponding masks to obtain a training set and a test set, where the training set accounts for 80% and the test set for 20%.
Preferably, in step S4, in order to facilitate model design and the fusion of shallow and deep features, the invention replaces the "padding=valid" convolutions of the original U-Net network with "padding=same" convolutions; with this setting, the feature map obtained after each convolution keeps the same spatial dimensions as its input. In addition, in order to accelerate model convergence and avoid gradient vanishing and gradient explosion, the invention connects a BN (Batch Normalization) layer after each convolution operation. Finally, in order to prevent overfitting and improve network performance, the invention designs a Dropout layer with a deactivation rate of 30% and improves the 3x3 convolution layer and ReLU layer of the original U-Net network: each 3x3 convolution is followed by the BN layer, then the Dropout layer, and finally the ReLU layer. The residual structure transmits the input information directly to the output, which reduces information loss and alleviates gradient vanishing, so the constructed network model converges faster; the BN layer accelerates the training and convergence of the network, controls gradient explosion, prevents gradient vanishing and counteracts overfitting. The principle of Dropout is to randomly deactivate part of the feature detection units with a certain probability during model training, which is equivalent to training multiple models, so the finally obtained model is more generalizable and robust. Through these improvements, the problems of overfitting, gradient explosion and gradient vanishing can be effectively avoided.
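A minimal PyTorch sketch of this 3x3 convolution combination (Conv with "same" padding, then BN, then 30% Dropout, then ReLU) is shown below; the channel counts are placeholders, and spatial Dropout2d is one reasonable reading of "Dropout layer" for convolutional features.

```python
# Hypothetical sketch of the 3x3 "convolution combination" described above:
# Conv(padding=same) -> BatchNorm -> Dropout(30%) -> ReLU. PyTorch assumed.
import torch
import torch.nn as nn

def conv_combo(in_ch: int, out_ch: int, p_drop: float = 0.3) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # "same" padding
        nn.BatchNorm2d(out_ch),                              # BN layer
        nn.Dropout2d(p_drop),                                # 30% deactivation rate
        nn.ReLU(inplace=True),                               # ReLU layer
    )

x = torch.randn(1, 1, 512, 512)            # one single-channel CT slice
print(conv_combo(1, 64)(x).shape)          # torch.Size([1, 64, 512, 512])
```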
Preferably, in step S4, a BN layer, a Dropout layer and residual blocks are added to the original U-Net network, and the improved U-Net network structure is constructed by improving the ASPP module, specifically:
input image → first residual double convolution module → first 2x2 max pooling layer → second residual double convolution module → second 2x2 max pooling layer → third residual double convolution module → third 2x2 max pooling layer → fourth residual double convolution module → fourth 2x2 max pooling layer → improved atrous convolution pyramid module (improved ASPP module) → 3x3 convolution combination → first 2x2 up-sampling layer → first residual double convolution splicing module → second 2x2 up-sampling layer → second residual double convolution splicing module → third 2x2 up-sampling layer → third residual double convolution splicing module → fourth 2x2 up-sampling layer → fourth residual double convolution splicing module → 1x1 convolution → segmentation map.
Preferably, in step S4, the residual double convolution module is specifically configured as follows:
first, a feature map is obtained through one 3x3 convolution combination; the obtained feature map then passes through a second 3x3 convolution combination to produce a new feature map, and a residual block is combined across the two 3x3 convolution combinations as a whole. Each 3x3 convolution combination consists of a 3x3 convolution, a BN layer, a Dropout layer and a ReLU layer connected in sequence.
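A hedged PyTorch sketch of this residual double convolution module follows; the 1x1 projection on the shortcut is an assumption for matching channel counts when they differ, since the patent does not spell out how the shortcut handles channel changes.

```python
# Hypothetical residual double convolution module: two 3x3 convolution
# combinations (Conv -> BN -> Dropout -> ReLU) wrapped by a residual shortcut.
# The 1x1 shortcut projection for mismatched channel counts is an assumption.
import torch
import torch.nn as nn

def conv_combo(in_ch: int, out_ch: int, p_drop: float = 0.3) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),  # "same" padding
        nn.BatchNorm2d(out_ch),
        nn.Dropout2d(p_drop),                    # 30% deactivation rate
        nn.ReLU(inplace=True))

class ResidualDoubleConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.double_conv = nn.Sequential(conv_combo(in_ch, out_ch),
                                         conv_combo(out_ch, out_ch))
        # Identity shortcut when channels match, else a 1x1 projection.
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.double_conv(x) + self.shortcut(x)  # residual addition

print(ResidualDoubleConv(64, 128)(torch.randn(1, 64, 256, 256)).shape)
```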
Preferably, in the step S4, the residual double convolution splicing module is specifically configured as follows:
first, the left encoding path and the right decoding path are merged through a skip connection by a concatenation operation; a feature map is then obtained through one 3x3 convolution combination, and this feature map passes through a second 3x3 convolution combination to produce a new feature map, with a residual block combined across the two 3x3 convolution combinations as a whole. Each 3x3 convolution combination consists of a 3x3 convolution, a BN layer, a Dropout layer and a ReLU layer connected in sequence, and the 3x3 convolution combination is identical in the residual double convolution module and the residual double convolution splicing module.
The skip connection part fuses the left encoding path and the right decoding path together through a concatenation operation; that is, the low-level feature information obtained by the encoding path is fused with the high-level feature information obtained by the decoding path. The feature maps at the two ends of the skip connection have the same number of channels and the same resolution, so the fused feature map carries shallow detail information and deep semantic information at the same time, which helps achieve a more accurate segmentation.
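The decoder-side splicing step could look like the following sketch. The 2x2 transposed convolution that halves the channel count while doubling the resolution is an assumption consistent with the channel arithmetic stated in Blocks 6-8 below (e.g. 512 + 512 = 1024, which indicates concatenation rather than element-wise addition); a plain double convolution stands in for the residual double convolution sketched above so this block stays self-contained.

```python
# Hypothetical decoder "residual double convolution splicing" step: a 2x2
# transposed convolution (channels halved, resolution doubled, an assumption),
# concatenation with the encoder skip feature, then a double convolution
# standing in for the residual double convolution module sketched earlier.
import torch
import torch.nn as nn

class SpliceBlock(nn.Module):
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, 2, stride=2)
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch // 2 + skip_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(x)                    # 2x2 up-sampling, channels halved
        x = torch.cat([x, skip], dim=1)   # skip connection: concatenation
        return self.fuse(x)

# Block6 shapes: 32x32x1024 bottleneck feature + 64x64x512 encoder feature.
dec = SpliceBlock(in_ch=1024, skip_ch=512, out_ch=512)
print(dec(torch.randn(1, 1024, 32, 32), torch.randn(1, 512, 64, 64)).shape)
```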
Preferably, in step S4, the improved ASPP module is specifically configured as follows:
the number of parallel atrous convolution layers is increased from 3 to 6. The advantage is that different receptive fields can be obtained through the convolutions and convolution blocks, so each layer effectively operates on homologous but different feature blocks, which benefits subsequent semantic fusion and supplementation. A 1x1 convolution and a 1x1 global pooling branch are additionally provided, giving 8 branches in total, and convolution blocks are added before 7 of the branches (all except the 1x1 global pooling layer); after the 1x1 global pooling layer, the information obtained through a 1x1 convolution and an up-sampling operation is fused with the information of the remaining 7 branches. In addition, a CBAM (Convolutional Block Attention Module) is introduced to raise the weights of important network channels, strengthen the learning of important features, and enhance feature fusion. After the features of the 8 branches are extracted and fused, a 1x1 convolution adjusts the number of channels and enriches the semantic information.
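A hedged PyTorch sketch of such an 8-branch module follows. The dilation rates (1, 2, 4, 6, 12, 18), the channel width, and the placement of the attention after the branch fusion are illustrative assumptions; the patent fixes only the branch structure (six parallel atrous convolutions, a 1x1 convolution, a global pooling branch, and a final 1x1 channel adjustment). `nn.Identity()` stands in for the CBAM module, which is sketched separately below in the discussion of FIG. 5.

```python
# Hypothetical improved-ASPP sketch: 6 parallel atrous convolutions plus a 1x1
# convolution branch and a global-pooling branch (8 branches total), fused and
# then channel-adjusted by a 1x1 convolution. Dilation rates and the attention
# placement are assumptions; nn.Identity() is a placeholder for CBAM.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImprovedASPP(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4, 6, 12, 18)):
        super().__init__()
        self.atrous = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r),
                          nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates])                               # 6 atrous branches
        self.point = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1),
                                   nn.BatchNorm2d(out_ch),
                                   nn.ReLU(inplace=True))  # 1x1 conv branch
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1))  # global pooling
        self.attention = nn.Identity()                     # CBAM placeholder
        self.project = nn.Conv2d(out_ch * 8, out_ch, 1)    # channel adjustment

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.atrous] + [self.point(x)]
        pooled = F.interpolate(self.pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        fused = torch.cat(feats + [pooled], dim=1)  # fuse all 8 branches
        return self.project(self.attention(fused))

print(ImprovedASPP(512, 512)(torch.randn(1, 512, 32, 32)).shape)
```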
Preferably, in step S5, during network training, Dice Loss is used as the loss function, stochastic gradient descent (SGD) is used as the optimization function, and the intersection over union (IoU) is used as the evaluation index of the model.
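A minimal training-criterion sketch is given below; the smoothing constant, binarization threshold and learning rate are illustrative assumptions, as the patent names only Dice Loss, SGD and the IoU metric.

```python
# Hypothetical Dice loss and IoU metric with an SGD optimizer (PyTorch).
# The smoothing term, threshold and learning rate are assumptions.
import torch
import torch.nn as nn

def dice_loss(pred: torch.Tensor, target: torch.Tensor,
              smooth: float = 1.0) -> torch.Tensor:
    """pred: probabilities in [0, 1]; target: binary mask of the same shape."""
    p, t = pred.flatten(1), target.flatten(1)
    inter = (p * t).sum(dim=1)
    return 1 - ((2 * inter + smooth) /
                (p.sum(dim=1) + t.sum(dim=1) + smooth)).mean()

@torch.no_grad()
def iou(pred: torch.Tensor, target: torch.Tensor,
        thr: float = 0.5, eps: float = 1e-7) -> torch.Tensor:
    """Intersection over union of the thresholded prediction and the mask."""
    p = (pred > thr).float().flatten(1)
    t = target.flatten(1)
    inter = (p * t).sum(dim=1)
    union = p.sum(dim=1) + t.sum(dim=1) - inter
    return ((inter + eps) / (union + eps)).mean()

model = nn.Conv2d(1, 1, 3, padding=1)  # stand-in for the segmentation network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
x = torch.rand(2, 1, 64, 64)
y = (torch.rand(2, 1, 64, 64) > 0.5).float()
loss = dice_loss(torch.sigmoid(model(x)), y)
loss.backward()
optimizer.step()
print(float(loss), float(iou(torch.sigmoid(model(x)), y)))
```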
Compared with the prior art, the invention has the following beneficial effects:
1. The invention adds a BN layer, a Dropout layer and a residual block to each block of the original U-Net network. The residual structure transmits the input information directly to the output, which reduces information loss and alleviates gradient vanishing, so the constructed network model converges faster. The BN layer accelerates the training and convergence of the network, controls gradient explosion, prevents gradient vanishing and counteracts overfitting. The Dropout layer randomly deactivates part of the feature detection units with a certain probability, so the finally obtained model is more generalizable and robust and overfitting can be effectively avoided. Through these improvements, the overfitting, gradient explosion and gradient vanishing problems of the original U-Net network model are effectively solved.
2. Traditional down-sampling enlarges the receptive field but reduces the spatial resolution. By adding the improved ASPP module to the original U-Net network, the receptive field is enlarged while the resolution is preserved, and multi-scale context information can be captured effectively. The introduced CBAM convolutional attention module raises the weights of important network channels, so the model attends better to the information of important channels, strengthens important features, improves the information utilization rate, fully fuses the extracted information, and improves the segmentation accuracy.
Drawings
FIG. 1 is a schematic flow chart of the method of the invention;
FIG. 2 is a general framework diagram of an improvement over the U-Net network in accordance with the present invention;
FIG. 3 is a schematic view of an improved ASPP module according to the present invention;
FIG. 4 is a block diagram of a residual unit used in the present invention;
FIG. 5 is a block diagram of a CBAM convolution attention module for use in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and specifically below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them, and they do not limit the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without inventive effort fall within the protection scope of the present invention.
As shown in FIG. 1, the invention provides an improved U-Net lung nodule segmentation method, which comprises the following steps:
step one: acquiring lung CT datasets with annotation information from professional hospitals and doctors;
step two: reading the lung CT dataset and the doctors' annotation files, cutting out the part of each CT image containing a lung nodule according to the annotation information, and generating a mask picture corresponding to the lung nodule from the edge segmentation information;
step three: dividing the pictures cropped in step two into datasets, wherein the training set accounts for 80% and the test set for 20%;
step four: building an improved U-Net network model;
step five: inputting the training set divided in the third step into the network constructed in the fourth step for model training;
step six: inputting the test set into the trained network in the fifth step to obtain a lung nodule segmentation result graph.
In step one, the dataset contains 800 patients; each case has about 200-400 CT images annotated by professional doctors, and each CT image is 512x512 in size.
In step two, according to the position information annotated in the dataset, the part containing the lung nodule is cropped into a 64x64 picture centered on the nodule, and a corresponding mask picture is then generated from the contour information of the nodule.
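A sketch of this cropping step is shown below; the (row, col) array convention and the use of a polygon contour rasterized with scikit-image are assumptions for illustration.

```python
# Hypothetical 64x64 nodule crop and mask generation (NumPy + scikit-image).
# The (row, col) center convention and polygon-contour input are assumptions.
import numpy as np
from skimage.draw import polygon

def crop_nodule(ct_slice: np.ndarray, center: tuple[int, int],
                contour: np.ndarray, size: int = 64):
    """Crop a size x size patch around the nodule center and rasterize the
    annotated contour (N x 2 array of (row, col) points) into a binary mask."""
    half = size // 2
    r0 = int(np.clip(center[0] - half, 0, ct_slice.shape[0] - size))
    c0 = int(np.clip(center[1] - half, 0, ct_slice.shape[1] - size))
    patch = ct_slice[r0:r0 + size, c0:c0 + size]
    mask = np.zeros((size, size), dtype=np.uint8)
    rr, cc = polygon(contour[:, 0] - r0, contour[:, 1] - c0, shape=mask.shape)
    mask[rr, cc] = 1
    return patch, mask

slice_ = np.random.rand(512, 512)
contour = np.array([[250, 250], [250, 270], [270, 270], [270, 250]])
patch, mask = crop_nodule(slice_, center=(260, 260), contour=contour)
print(patch.shape, mask.sum())
```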
In the third step, the data set is divided according to the obtained lung nodule pictures and the corresponding masks, so as to obtain a training set and a testing set, wherein the training set accounts for 80% and the testing set accounts for 20%.
In step four, a BN layer, a Dropout layer and residual blocks are added to the original U-Net network, and the improved U-Net network structure is constructed by improving the ASPP module, as shown in FIG. 2:
block1: inputting a 512x512 lung CT image to be detected, obtaining a 64-channel feature map through a 3x3 convolution combination, then obtaining a new 512x512x64 (width, height and channel number) feature map through a 3x3 convolution combination, and then, introducing a residual block into a second 3x3 convolution combination through a first 2x2 max pooling layer, wherein the residual block starts from the second 3x3 convolution combination and ends at the first 2x2 max pooling layer, and the 3x3 convolution combination comprises the following components: the 3x3 convolution, a BN layer, a Dropout layer and a ReLU layer are sequentially connected, and the following 3x3 convolution combination is the same as the above 3x3 convolution combination, and will not be described again.
Block2: a 256x256x64 feature map is obtained through the first 2x2 max pooling layer, and a 256x256x128 feature map is obtained through two 3x3 convolution combinations; the new 128-channel feature map then passes through the second 2x2 max pooling layer. In addition, a residual block is introduced that starts at the first 3x3 convolution combination and ends at the second 2x2 max pooling layer.
Block3: a 128x128x128 feature map is obtained through the second 2x2 max pooling layer, and a 128x128x256 feature map is obtained through two 3x3 convolution combinations; the new 256-channel feature map then passes through the third 2x2 max pooling layer. In addition, a residual block is introduced that starts at the first 3x3 convolution combination and ends at the third 2x2 max pooling layer.
Block4: a 64x64x256 feature map is obtained through the third 2x2 max pooling layer, and a 64x64x512 feature map is obtained through two 3x3 convolution combinations; the new 512-channel feature map then passes through the fourth 2x2 max pooling layer. In addition, a residual block is introduced that starts at the first 3x3 convolution combination and ends at the fourth 2x2 max pooling layer.
Block5: after the fourth 2x2 max pooling layer, the improved ASPP module is applied, followed by a 3x3 convolution combination that yields a 32x32x1024 feature map, which then enters the first 2x2 up-sampling layer. The original ASPP has 5 branches: a 1x1 convolution, 3x3 convolutions with dilation rates 6, 12 and 18, and a 1x1 global pooling. The improved ASPP module increases the number of parallel atrous convolution layers from 3 to 6; the advantage is that different receptive fields can be obtained, and each layer effectively operates on homologous but different feature blocks, which benefits subsequent semantic fusion and supplementation. A 1x1 convolution and a 1x1 global pooling are additionally provided, giving 8 branches in total, and convolution blocks are added before 7 of the branches (all except the 1x1 global pooling layer); after the 1x1 global pooling layer, the information obtained through a 1x1 convolution and an up-sampling operation is fused with the information of the remaining 7 branches. A CBAM convolutional attention module is also introduced to raise the weights of important channels and strengthen important features; after the 8 branch features are extracted and fused, a 1x1 convolution adjusts the number of channels and enriches the semantic information.
Block6: after the first 2x2 up-sampling layer, the obtained feature map is joined by a skip connection with the feature map produced by the second 3x3 convolution combination in Block4 to obtain a 64x64x1024 feature map; a 64x64x512 feature map is then obtained through two 3x3 convolution combinations, and the new 512-channel feature map passes through the second 2x2 up-sampling layer. In addition, a residual block is introduced that starts at the first 3x3 convolution combination and ends at the second 2x2 up-sampling layer.
Block7: after the second 2x2 up-sampling layer, the obtained feature map is joined by a skip connection with the feature map produced by the second 3x3 convolution combination in Block3 to obtain a 128x128x512 feature map; a 128x128x256 feature map is then obtained through two 3x3 convolution combinations, and the new 256-channel feature map passes through the third 2x2 up-sampling layer. In addition, a residual block is introduced that starts at the first 3x3 convolution combination and ends at the third 2x2 up-sampling layer.
Block8: after the third 2x2 up-sampling layer, the obtained feature map is joined by a skip connection with the feature map produced by the second 3x3 convolution combination in Block2 to obtain a 256x256x256 feature map; a 256x256x128 feature map is then obtained through two 3x3 convolution combinations, and the new 128-channel feature map passes through the fourth 2x2 up-sampling layer. In addition, a residual block is introduced that starts at the first 3x3 convolution combination and ends at the fourth 2x2 up-sampling layer.
Block9: after the fourth 2x2 up-sampling layer, a 512x512x64 feature map is obtained through two 3x3 convolution combinations, and a 512x512x1 feature map, the segmentation map, is then obtained through a 1x1 convolution.
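Pulling Blocks 1-9 together, a compact wiring sketch of the whole model is given below. Plain double convolutions stand in for the residual double convolution modules and a single dilated convolution stands in for the improved ASPP module, so the block stays short and self-contained; the channel progression (64, 128, 256, 512, 1024) follows the shapes stated in Blocks 1-9, and the last decoder stage also concatenates the Block1 feature, following the fourth splicing module in the structure chain above.

```python
# Hypothetical end-to-end wiring of the improved U-Net (Blocks 1-9). Plain
# double convs stand in for the residual modules and one dilated conv stands
# in for the improved ASPP so the sketch stays short and self-contained.
import torch
import torch.nn as nn

def double_conv(ci, co):  # stand-in for the residual double convolution
    return nn.Sequential(nn.Conv2d(ci, co, 3, padding=1), nn.ReLU(True),
                         nn.Conv2d(co, co, 3, padding=1), nn.ReLU(True))

class ImprovedUNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        chs = [64, 128, 256, 512]
        self.enc = nn.ModuleList(double_conv(ci, co) for ci, co in
                                 zip([1] + chs[:-1], chs))        # Blocks 1-4
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(                          # Block 5
            nn.Conv2d(512, 512, 3, padding=6, dilation=6),        # ASPP stand-in
            nn.ReLU(True), nn.Conv2d(512, 1024, 3, padding=1), nn.ReLU(True))
        self.up = nn.ModuleList(nn.ConvTranspose2d(c, c // 2, 2, 2)
                                for c in [1024, 512, 256, 128])
        self.dec = nn.ModuleList(double_conv(c, c // 2) for c in
                                 [1024, 512, 256, 128])           # Blocks 6-9
        self.head = nn.Conv2d(64, 1, 1)                           # 1x1 conv

    def forward(self, x):
        skips = []
        for enc in self.enc:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))  # skip splice
        return self.head(x)

print(ImprovedUNetSketch()(torch.randn(1, 1, 512, 512)).shape)  # [1,1,512,512]
```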
In step five, during network training, Dice Loss is used as the loss function, stochastic gradient descent (SGD) is used as the optimization function, and the intersection over union (IoU) is used as the evaluation index of the model.
In step six, the test set is input into the trained network model, and the segmentation results are checked.
For ease of understanding, the residual block and CBAM are briefly introduced here. As shown in FIG. 4, a residual block is divided into two parts: a direct mapping part and a residual part; "Weight" in FIG. 4 refers to the convolution operation in the convolutional network. CBAM (Convolutional Block Attention Module) is a lightweight convolutional attention module that combines a channel attention mechanism and a spatial attention mechanism. As can be seen from FIG. 5, CBAM comprises a CAM (Channel Attention Module) sub-module and a SAM (Spatial Attention Module) sub-module, which perform channel attention and spatial attention respectively; this saves parameters and computing power and ensures that the module can be integrated into existing network architectures as a plug-and-play module.
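A hedged sketch of CBAM in this spirit follows; the reduction ratio 16 and the 7x7 spatial kernel are common choices in the CBAM literature, assumed here rather than specified by the patent.

```python
# Hypothetical CBAM sketch: channel attention (CAM) followed by spatial
# attention (SAM). Reduction ratio 16 and the 7x7 kernel are assumptions.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared MLP of the CAM
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)  # SAM conv

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: MLP over global average- and max-pooled vectors.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: 7x7 conv over channelwise mean and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

print(CBAM(512)(torch.randn(1, 512, 32, 32)).shape)
```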
After training, the traditional U-Net network model and the improved U-Net network model of the invention were each evaluated. The IoU of the traditional U-Net network model is 0.88, while that of the improved U-Net network model of the invention is 0.95; compared with the traditional U-Net network model, the segmentation accuracy of the improved model is therefore improved by about 8%.
The foregoing describes only specific embodiments of the invention, and the protection scope of the invention is not limited thereto; any variation or substitution readily conceivable by those skilled in the art within the technical scope disclosed by the invention shall fall within the protection scope of the invention, which is therefore defined by the appended claims.

Claims (6)

1. A lung nodule segmentation method based on an improved U-Net, characterized by comprising the following steps:
step one: the lung CT image is read, and the CT image is preprocessed;
step two: cutting out a lung nodule picture and generating a corresponding mask;
step three: dividing a data set according to the acquired lung nodule pictures and the corresponding masks to obtain a training set and a testing set;
step four: building an improved U-Net network structure;
step five: inputting the obtained training set into an improved U-Net network for training;
step six: inputting the CT image of the lung nodule to be segmented into the trained improved U-Net network to obtain the lung nodule segmentation result.
2. The improved U-Net lung nodule segmentation method according to claim 1, characterized in that in step one, lung CT images in a specified format are read from the dataset, and the original lung CT images are preprocessed by binarization, image edge padding, histogram correction, gray-level transformation, thresholding, morphological operations, data enhancement, and the like.
3. The improved U-Net lung nodule segmentation method according to claim 1, characterized in that in step two, the portion of the lung CT image containing the lung nodule is cropped, and a mask image corresponding to the lung nodule is generated.
4. The improved U-Net lung nodule segmentation method according to claim 1, characterized in that in step three, the dataset is divided according to the obtained lung nodule images and the corresponding masks to obtain a training set and a test set, wherein the training set accounts for 80% and the test set for 20%.
5. The improved U-Net lung nodule segmentation method according to claim 1, characterized in that in step four, the constructed improved U-Net network model structure is: input image, first residual double convolution module, first 2x2 max pooling layer, second residual double convolution module, second 2x2 max pooling layer, third residual double convolution module, third 2x2 max pooling layer, fourth residual double convolution module, fourth 2x2 max pooling layer, improved atrous convolution pyramid module, 3x3 convolution combination, first 2x2 up-sampling layer, first residual double convolution splicing module, second 2x2 up-sampling layer, second residual double convolution splicing module, third 2x2 up-sampling layer, third residual double convolution splicing module, fourth 2x2 up-sampling layer, fourth residual double convolution splicing module, 1x1 convolution, and segmentation map.
6. The improved U-Net lung nodule segmentation method according to claim 1, characterized in that in step five, during network training, Dice Loss is used as the loss function, stochastic gradient descent (SGD) is used as the optimization function, and the intersection over union (IoU) is used as the evaluation index of the model.
CN202310508247.3A 2023-05-08 2023-05-08 Lung nodule segmentation method based on U-Net model Pending CN116958537A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310508247.3A CN116958537A (en) 2023-05-08 2023-05-08 Lung nodule segmentation method based on U-Net model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310508247.3A CN116958537A (en) 2023-05-08 2023-05-08 Lung nodule segmentation method based on U-Net model

Publications (1)

Publication Number Publication Date
CN116958537A true CN116958537A (en) 2023-10-27

Family

ID=88459218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310508247.3A Pending CN116958537A (en) 2023-05-08 2023-05-08 Lung nodule segmentation method based on U-Net model

Country Status (1)

Country Link
CN (1) CN116958537A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination