CN112215850A - Method for segmenting brain tumors with a cascaded dilated convolution network with an attention mechanism - Google Patents

Method for segmenting brain tumors with a cascaded dilated convolution network with an attention mechanism

Info

Publication number
CN112215850A
CN112215850A
Authority
CN
China
Prior art keywords
layer
convolution
decoder
segmentation
cascade
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010848879.0A
Other languages
Chinese (zh)
Inventor
褚晶辉 (Chu Jinghui)
黄凯隆 (Huang Kailong)
吕卫 (Lü Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202010848879.0A
Publication of CN112215850A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a method for segmenting brain tumors with a cascaded dilated (atrous) convolution network with an attention mechanism, comprising the following steps: data preprocessing; building the network structure: a cascaded dilated convolution network with an attention mechanism is established, and a three-level cascade framework is adopted to simplify the multi-class segmentation task into three binary segmentation tasks; the three segmentation networks are W-Net, T-Net, and E-Net, used to segment the whole tumor (WT), tumor core (TC), and enhancing tumor (ET) regions, respectively. Each stage segments along the axial, sagittal, and coronal directions, and the segmentation results in the three directions are then averaged to obtain a more accurate result. Each level of the three-level cascade framework is an encoder-decoder fully convolutional network, divided into four parts: encoder, decoder, skip connections, and multi-layer feature map fusion.

Description

Method for segmenting brain tumors with a cascaded dilated convolution network with an attention mechanism
Technical Field
The invention relates to the field of image processing, in particular to a method for segmenting a three-dimensional medical brain tumor image.
Background
A brain tumor is an intracranial tumor with high lethality. According to histological heterogeneity and tumor aggressiveness, brain tumors are divided into high-grade gliomas (HGG) and low-grade gliomas (LGG), and the tumor itself is further divided into an edema region, a tumor core region, an enhancing tumor core region, a non-enhancing tumor core region, and a necrotic region. The four modalities of brain magnetic resonance (MR) imaging, T1, T1ce, T2, and FLAIR, highlight different tumor regions of interest and provide complementary information. Brain tumor segmentation delineates the different tumor regions in a brain image and is important for assessing the patient's disease, formulating a treatment plan, and follow-up observation and study. However, manual segmentation by doctors is time-consuming and labor-intensive, errors easily occur after long periods of manual labeling, and segmentation results differ between doctors with different levels of experience, so an automatic, high-accuracy brain tumor segmentation method is needed.
With the development of deep learning, deep-learning-based brain tumor segmentation has become the most accurate approach; the most common networks are FCN [1], U-Net [2], and V-Net [3]. The FCN removes the fully connected layers of a convolutional neural network to obtain a segmentation map of the same size as the input image. U-Net improves on the FCN with a symmetric encoder-decoder structure. V-Net changes the convolution, pooling, and upsampling layers of U-Net into their 3D variants and adds residual connections to alleviate network degradation. DeepLab [4] adds dilated (atrous) convolution to the fully convolutional network to enlarge the receptive field of the convolution kernel. Oktay et al. [5] proposed adding spatial attention to the 3D U-Net structure for image segmentation, applying attention derived from the decoder feature maps to the encoder feature-map path.
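As a concrete illustration of how dilation enlarges the receptive field without adding parameters (the principle DeepLab-style dilated convolution relies on), the following sketch, which is illustrative and not taken from the patent, computes a 1-D dilated convolution and the effective receptive field of a single layer:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation, no kernel flip)."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # effective extent of the dilated kernel
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * dilation] for j in range(k))
    return out

def receptive_field(kernel_size, dilation):
    """Effective receptive field of one dilated convolution layer."""
    return (kernel_size - 1) * dilation + 1
```

A 3-tap kernel with dilation 3 covers 7 input samples with the same 3 weights, which is why the deeper residual blocks described below can widen their receptive field without extra down-sampling.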
Reference to the literature
[1] Shen H, Zhang J, Zheng W. Efficient symmetry-driven fully convolutional network for multimodal brain tumor segmentation. 2017 IEEE International Conference on Image Processing (ICIP), IEEE, 2017: 3864-3868.
[2] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. 2015, 9351: 234-241.
[3] Milletari F, Navab N, Ahmadi S A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. 2016 Fourth International Conference on 3D Vision (3DV), IEEE, 2016: 565-571.
[4] Chen L C, Papandreou G, Kokkinos I, Murphy K, Yuille A L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40: 834-848.
[5] Oktay O, Schlemper J, Folgoc L L, et al. Attention U-Net: Learning where to look for the pancreas. 2018.
[6] http://braintumorsegmentation.org/
[7] Isensee F, Kickingereder P, Wick W, Bendszus M, Maier-Hein K H. No new-net. International MICCAI Brainlesion Workshop, Springer, 2018: 234-244.
[8] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[9] Xu B, Wang N, Chen T, et al. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853, 2015.
[10] Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. Proceedings of the IEEE International Conference on Computer Vision, 2015: 1520-1528.
Disclosure of Invention
The invention aims to provide a brain tumor segmentation method that improves segmentation precision. A three-level cascaded fully convolutional network segments the brain tumor magnetic resonance image into the whole tumor (WT), tumor core (TC), and enhancing tumor (ET) regions. The networks at each level are similar: starting from a fully convolutional network, an encoder-decoder structure is adopted, each 3D convolution kernel is factored into an intra-slice and an inter-slice convolution, dilated convolutions with different dilation rates are added, an attention mechanism is added, and multi-layer feature map fusion is added to the decoder, thereby improving segmentation precision. The technical scheme is as follows:
A method for segmenting brain tumors with a cascaded dilated convolution network with an attention mechanism comprises the following steps:
(1) data preprocessing:
Select 3D MR images, construct a training set and a validation set containing images of different brain tumor types, and preprocess them.
(2) The method for building the network structure comprises the following steps:
A cascaded dilated convolution network with an attention mechanism is established. A three-level cascade framework simplifies the multi-class segmentation task into three binary segmentation tasks, reducing the segmentation difficulty and the number of network parameters. The three segmentation networks are W-Net, T-Net, and E-Net, used to segment the whole tumor (WT), tumor core (TC), and enhancing tumor (ET) regions, respectively. Each stage segments along the axial, sagittal, and coronal directions, and the segmentation results in the three directions are then averaged to obtain a more accurate result;
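The nesting of the three binary tasks, where each later network only operates inside the region the previous one marked as tumor, can be sketched as follows (a hypothetical illustration; the label values here are illustrative and are not the BraTS label encoding):

```python
import numpy as np

def cascade_labels(wt, tc, et):
    """Combine three nested binary masks (WT contains TC contains ET) into one
    label map, mirroring the cascade: T-Net only sees voxels W-Net marked as
    tumor, and E-Net only sees voxels T-Net marked as core.
    Illustrative labels: 0 background, 1 whole tumor, 2 tumor core, 3 enhancing tumor."""
    tc = tc & wt                    # enforce nesting: core lies inside whole tumor
    et = et & tc                    # enhancing region lies inside tumor core
    labels = np.zeros(wt.shape, dtype=np.int32)
    labels[wt] = 1
    labels[tc] = 2
    labels[et] = 3
    return labels
```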
The network at each level of the three-level cascade framework is an encoder-decoder fully convolutional network, divided into four parts: encoder, decoder, skip connections, and multi-layer feature map fusion.
The encoder has a four-layer structure. The first layer contains four intra-slice convolution layers with kernel size 3×3×1; each intra-slice convolution layer is followed by a batch normalization (BN) layer and a PReLU layer for nonlinearity, forming an intra-slice convolution block, and every two intra-slice convolution blocks are joined by a residual connection to form a residual block. The second layer contains two residual blocks, a down-sampling layer, and an inter-slice convolution layer with kernel size 1×1×3, which together with its following BN layer and PReLU layer forms an inter-slice convolution block. The third layer has three residual blocks, an inter-slice convolution block, and a down-sampling layer; dilated convolutions with dilation rates 2 and 3 are added to the second and third residual blocks, respectively. The fourth layer contains a down-sampling layer, three residual blocks with dilation rate 3, and a deconvolution layer.
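The saving from factoring a full 3-D kernel into an intra-slice (3×3×1) plus an inter-slice (1×1×3) convolution can be quantified with a small parameter count; the channel width of 32 here is an assumption for illustration:

```python
def conv_params(c_in, c_out, kernel):
    """Number of weights in one convolution layer, ignoring bias terms."""
    kx, ky, kz = kernel
    return c_in * c_out * kx * ky * kz

c = 32                                                  # assumed channel width
full_3d = conv_params(c, c, (3, 3, 3))                  # one ordinary 3-D convolution
factored = conv_params(c, c, (3, 3, 1)) + conv_params(c, c, (1, 1, 3))
# The intra-slice + inter-slice pair needs 12/27 of the weights of the full 3-D kernel.
```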
The decoding path has a three-layer structure. The first layer has three residual blocks, an inter-slice convolution block, and an up-sampling layer; dilated convolutions with dilation rates 3 and 2 are added to the first and second residual blocks, respectively. The second layer has two residual blocks, an inter-slice convolution block, and a deconvolution layer. The third layer contains two residual blocks.
There are three skip connections. In the first, the feature map output by the first layer of the encoder, after four intra-slice convolution block operations, is given attention derived from the feature map output by the second layer of the decoder, and is concatenated with that decoder feature map as the input of the third layer of the decoder. In the second, the feature map output by the second layer of the encoder, after two intra-slice convolution block operations, is given attention derived from the feature map output by the first layer of the decoder, and is concatenated with that decoder feature map as the input of the second layer of the decoder. In the third, the feature map output by the third layer of the encoder is given attention derived from the feature map output by the fourth layer of the encoder, and is concatenated with that fourth-layer feature map as the input of the first layer of the decoder.
In the multi-layer feature map fusion, the fourth-layer output feature map of the encoder undergoes two inter-slice convolutions and two deconvolution operations, and the first-layer output feature map of the decoder undergoes one inter-slice convolution and one deconvolution operation; these are concatenated with the second-layer and third-layer output feature maps of the decoder, and the final output is obtained through a binary-segmentation convolution.
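The fusion of decoder feature maps at different resolutions can be sketched as below; the shapes are illustrative, and nearest-neighbour upsampling stands in for the deconvolution layers of the patent:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map,
    a stand-in for one deconvolution in the fusion path."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse_multilayer(f_low, f_mid, f_high):
    """Bring feature maps of three resolutions to the highest resolution
    and concatenate along channels (shapes are illustrative)."""
    f_low = upsample2x(upsample2x(f_low))   # two upsampling steps, like the two deconvolutions
    f_mid = upsample2x(f_mid)               # one step
    return np.concatenate([f_low, f_mid, f_high], axis=-1)
```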
(3) Model training and optimization.
The invention has the following beneficial effects:
1: the three-level cascade framework of the cascade cavity convolution network with the attention mechanism reduces a plurality of classes of segmentation tasks into three two classes of segmentation tasks to divide the subareas of the brain tumors, thereby reducing the segmentation difficulty and the complexity of each class of network, limiting the segmentation range of the next class of network on the segmentation result output by the previous class of network by the cascade structure, reducing the problems of misjudgment and unbalanced number among classes, and improving the segmentation precision; the segmentation is respectively carried out on three dimensions and the average is calculated, so that the segmentation result is more reliable;
2: the use of the intra-frame convolution and the inter-frame convolution in the cascade void convolution network with the attention mechanism fully utilizes the space information of three dimensions of the slice, and reduces the consumption of network parameters and video memory; the cavity convolution in the intra-frame convolution can increase the receptive field of a convolution kernel on the basis of not damaging the image resolution, can reduce the number of down-sampling and is beneficial to protecting the image resolution;
3: the cascade void convolutional network with the attention mechanism is added with a spatial attention module in a layer jump structure, so that in the process that the network fuses a feature map containing context information of an encoder to a feature map containing detail information of a decoder, the weight of the feature map containing the detail information of the decoder is multiplied to the feature map containing the context information of the encoder, the part containing the detail information is endowed with a larger weight, the attention is concentrated on the fine part of the feature map, and the precision of a segmentation result is improved;
4: the cascade hole convolution network with attention mechanism adds multilayer characteristic diagram fusion in the decoder to further integrate the context information and the detail information, wherein the characteristic diagram of a lower layer in the decoder contains more context information, the characteristic diagram of a higher layer contains more detail information, the multilayer characteristic diagram fusion is connected with the characteristic diagrams of different layers in the decoder, and the context characteristic and the detail characteristic are integrated together, so that the accuracy of the segmentation result is further improved.
Drawings
FIG. 1 is a schematic diagram of the cascaded dilated convolution network architecture with attention mechanism
FIG. 2 shows the structure of the spatial attention module
Detailed Description
First, the technical scheme for brain tumor segmentation with the cascaded dilated convolution network with attention mechanism is introduced; the steps are as follows:
(1) data preprocessing:
the invention uses the published BraTS 2018[6]The data set comprises 285 training sets and 66 verification sets, wherein the training setsThe method comprises the steps of including HGG 210 cases and LGG 75 cases, wherein each case has 3D MR images of T1, T1ce, T2 and FLAIR four modalities, and each size is 240 x 155; there were 66 validation sets, with no differentiation between tumor types. The slices are divided into 144 × 144 × 19 slices in the axial, sagittal, and coronal directions, respectively, as raw inputs.
(2) The method for building the network structure comprises the following steps:
The cascaded dilated convolution network with attention mechanism adopts a three-level cascade framework, simplifying the multi-class segmentation task into three binary segmentation tasks and reducing the segmentation difficulty and the number of network parameters. The three segmentation networks are W-Net, T-Net, and E-Net, used to segment the whole tumor (WT), tumor core (TC), and enhancing tumor (ET) regions, respectively. Each stage segments along the axial, sagittal, and coronal directions, and the segmentation results in the three directions are then averaged to obtain a more accurate result;
The network structures at each level of the three-level cascade framework are similar: each is an encoder-decoder fully convolutional network, divided into four parts: encoder, decoder, skip connections, and multi-layer feature map fusion.
the encoder has a four-layer structure, the first layer comprises four convolution kernels, each convolution kernel is a 3 × 3 × 1 intra convolution layer, and each intra convolution layer is followed by a batch normalization layer (BN)[8]And a PReLU for non-linearization[9]The layer, form the intra-frame and roll up the block, every two intra-frame roll up the block to connect and form the residual block through the residual; the second layer contains two residual blocks (four intra-frame convolution blocks), a down-sampling layer and a convolution kernel which are inter-frame convolution layers with the size of 1 multiplied by 3, and a BN layer and a PReLU layer which are arranged behind the inter-frame convolution layers form the inter-frame convolution blocks; the third layer has three residual blocks (six intra-frame convolution blocks), an inter-frame convolution block and a down-sampling layer; respectively adding the second residual block and the third residual block into a cavity convolution with expansion rates of 2 and 3; the fourth layer comprises a down-sampling layer, three residual blocks with expansion rate of 3 (six intra-frame convolution blocks) and a deconvolution layer[10]
The decoding path has a three-layer structure. The first layer has three residual blocks (six intra-slice convolution blocks), an inter-slice convolution block, and an up-sampling layer; dilated convolutions with dilation rates 3 and 2 are added to the first and second residual blocks, respectively. The second layer has two residual blocks (four intra-slice convolution blocks), an inter-slice convolution block, and a deconvolution layer. The third layer contains two residual blocks (four intra-slice convolution blocks).
There are three skip connections. In the first, the feature map output by the first layer of the encoder, after four intra-slice convolution block operations, is given attention (multiplied by weights) derived from the feature map output by the second layer of the decoder, and is concatenated with that decoder feature map as the input of the third layer of the decoder. In the second, the feature map output by the second layer of the encoder, after two intra-slice convolution block operations, is given attention (multiplied by weights) derived from the feature map output by the first layer of the decoder, and is concatenated with that decoder feature map as the input of the second layer of the decoder. In the third, the feature map output by the third layer of the encoder is given attention (multiplied by weights) derived from the feature map output by the fourth layer of the encoder, and is concatenated with that fourth-layer feature map as the input of the first layer of the decoder.
In the multi-layer feature map fusion, the fourth-layer output feature map of the encoder undergoes two inter-slice convolutions and two deconvolution operations, and the first-layer output feature map of the decoder undergoes one inter-slice convolution and one deconvolution operation; these are concatenated with the second-layer and third-layer output feature maps of the decoder, and the final output is obtained through a binary-segmentation convolution.
(3) Model training and optimization:
The 285 cases (each with four modalities) of 3D MR images of the BraTS 2018 training set are cut into blocks of size 144×144×19 and fed to the network. The initial learning rate is 10⁻⁴ and ADAM optimization is used; the weights are updated by continuous back-propagation, and the trained model is saved.
Selection of the learning rate: too small a learning rate leads to long training without convergence and wastes resources; too large a learning rate can cause the optimization to become trapped in a local minimum. Therefore a learning rate of 10⁻⁴ was chosen for this experiment.
Model optimization: ADAM optimization continuously back-propagates according to the loss function and updates the weights.
The embodiments will be described in further detail below with reference to the accompanying drawings:
first, a data set is prepared:
the invention uses the publication Brain Tumor Segmentation 2018(BraTS 2018)[6]) The data set is divided into a training set and a verification set, wherein the training set comprises 285 cases, including 210 cases of high glial tumors (HGG) and 75 cases of low glial tumors (LGG), the verification set comprises 66 cases, and tumor types are not distinguished; each 3D nuclear magnetic resonance image containing four modes of T1, T1ce, T2 and FLAIR has the size of 240 x 155, the 285 images of the training set are cut, the images are cut into pieces with the size of 144 x 19, the pieces are sent to a network for training a model, and the 66 images of the verification set are used as a test set test model;
and secondly, constructing a cascade cavity convolution network with an attention mechanism by using a deep learning frame Tensorflow, wherein the whole network frame is in three-stage cascade, the network is trained in three directions of an axial direction, a sagittal direction and a coronal direction in each stage, and the network structure of each stage is similar. As shown in FIG. 1, FIG. 1 shows a network (W-Net) structure for dividing the whole brain tumor;
(1) The axial blocks are fed into the network. The two residual blocks of the first layer of the encoder (four intra-slice convolution blocks, each containing an intra-slice convolution layer with kernel size 3×3×1, a BN layer, and a PReLU layer) produce a feature map with 32 channels and size 144×144×19. In the second layer of the encoder, two residual blocks (four intra-slice convolution blocks), an inter-slice convolution block (containing an inter-slice convolution layer with kernel size 1×1×3, a BN layer, and a PReLU layer), and a down-sampling layer produce a feature map with 32 channels and size 72×72×17. The third layer of the encoder, with three residual blocks (six intra-slice convolution blocks; the dilation rates of the three residual blocks are 1, 2, and 3, respectively), an inter-slice convolution block, and a down-sampling layer, produces a feature map with 32 channels and size 36×36×15. The fourth layer of the encoder, with a down-sampling layer, three residual blocks with dilation rate 3 (six intra-slice convolution blocks), and an up-sampling layer, produces a feature map with 32 channels and size 36×36×15. The skip connection gives the output of the third layer of the encoder attention (multiplied by weights) derived from the output feature map of the fourth layer of the encoder, concatenates it with that fourth-layer feature map, and passes the result into the first layer of the decoder. The first layer of the decoder, through three residual blocks (six intra-slice convolution blocks; the dilation rates of the three residual blocks are 3, 2, and 1, respectively), an inter-slice convolution block, and a deconvolution layer, produces a feature map with 32 channels and size 72×72×13. The skip connection gives the output of the second layer of the encoder, after two intra-slice convolution block operations, attention (multiplied by weights) derived from the output feature map of the first layer of the decoder, concatenates it with that decoder feature map, and passes the result into the second layer of the decoder. The second layer of the decoder, through two residual blocks (four intra-slice convolution blocks), an inter-slice convolution block, and a deconvolution layer, produces a feature map with 32 channels and size 144×144×11. The skip connection gives the output of the first layer of the encoder, after four intra-slice convolution block operations, attention (multiplied by weights) derived from the output feature map of the second layer of the decoder, concatenates it with that decoder feature map, and passes the result into the third layer of the decoder. The third layer of the decoder, with two residual blocks (four intra-slice convolution blocks), produces a feature map with 32 channels and size 144×144×11. The output feature map of the fourth layer of the encoder undergoes two inter-slice convolution and two deconvolution operations, the output feature map of the first layer of the decoder undergoes one inter-slice convolution and one deconvolution operation, these are concatenated with the output feature maps of the second and third layers of the decoder, and a binary-segmentation convolution yields a 2-channel output of size 144×144×11: the whole-tumor segmentation result in the axial direction;
The attention module is shown in FIG. 2. Feature maps A and B are each convolved with a 1×1×1 kernel with C output channels and added; the sum passes through a ReLU nonlinearity and another 1×1×1 convolution with one output channel, and the result, after a Sigmoid function, is multiplied with feature map B, giving feature map B its weights (attention);
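The attention module of FIG. 2 can be sketched with per-voxel matrix products standing in for the 1×1×1 convolutions (weight matrices `w_a`, `w_b`, `w_psi` and all shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(feat_a, feat_b, w_a, w_b, w_psi):
    """Additive spatial attention as in FIG. 2: project A and B to C channels
    with 1x1x1 convolutions (here per-voxel matmuls), add, apply ReLU, project
    to 1 channel, apply Sigmoid, and scale feature map B by the weight map.
    feat_a, feat_b: (H, W, C_in); w_a, w_b: (C_in, C); w_psi: (C, 1)."""
    s = np.maximum(feat_a @ w_a + feat_b @ w_b, 0.0)   # add, then ReLU
    alpha = sigmoid(s @ w_psi)                          # (H, W, 1) attention map
    return feat_b * alpha                               # weighted feature map B
```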
(2) The output obtained in (1) is cropped: the whole tumor segmented in (1) is cropped and used as the input of the second level of the cascade network. The structure of the second-level network is similar to that in (1), and the procedure is the same as (1). Compared with the network in (1), one down-sampling operation is removed from the second layer of the encoder, one deconvolution layer is removed from the second layer of the decoder, and one deconvolution operation each is removed from the fourth layer of the encoder and the first layer of the decoder in the multi-layer feature map fusion. This yields the tumor core segmentation result in the axial direction;
(3) The output obtained in (2) is cropped: the tumor core segmented in (2) is cropped and used as the input of the third level of the cascade network. The structure of the third-level network is similar to the network in FIG. 1, and the procedure is the same as (1). Compared with the network in FIG. 1, one down-sampling operation is removed from each of the first and second layers of the encoder, one deconvolution layer is removed from each of the second and third layers of the decoder, and, in the multi-layer feature map fusion, two deconvolution operations are removed from the fourth layer of the encoder and one from the first layer of the decoder. This yields the enhancing-tumor segmentation result in the axial direction;
(4) The sagittal blocks are fed into the network, and the whole-tumor, tumor-core, and enhancing-tumor segmentation results in the sagittal direction are obtained through the three-level network segmentation; the procedure is the same as (1)-(3);
(5) The coronal blocks are fed into the network, and the whole-tumor, tumor-core, and enhancing-tumor segmentation results in the coronal direction are obtained through the three-level network segmentation; the procedure is the same as (1)-(3);
(6) The segmentation results in the axial, sagittal, and coronal directions are averaged to obtain the final segmentation result.
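The three-view averaging of step (6) can be sketched as follows; the thresholding step and its value are assumptions, since the text only states that the three results are averaged:

```python
import numpy as np

def fuse_views(p_axial, p_sagittal, p_coronal, threshold=0.5):
    """Average per-voxel tumor probabilities predicted from the three
    anatomical views, then binarise (the 0.5 threshold is an assumed choice)."""
    p = (p_axial + p_sagittal + p_coronal) / 3.0
    return (p > threshold).astype(np.uint8)
```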
Third, the network is trained: the 285 cases (each with four modalities) of 3D MR images of the BraTS 2018 training set are cut into blocks of size 144×144×19 and fed into the network; the initial learning rate is 10⁻⁴, ADAM optimization is used, the weights are updated by continuous back-propagation, and the trained model is saved. The loss functions selected in this embodiment are the Dice loss function and the cross-entropy loss function adopted in reference [7].
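A minimal sketch of the soft Dice loss named above (the smoothing constant `eps` is an assumed detail; the exact formulation used in the embodiment follows reference [7]):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between a predicted probability map and a binary target.
    eps is a smoothing constant (assumed) that avoids division by zero."""
    pred, target = pred.ravel(), target.ravel()
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

A perfect prediction drives the loss to 0; fully disjoint prediction and target drive it toward 1.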
Fourthly, testing the model: the 66 cases (each containing four modalities) of the BraTS 2018 validation set are used as the test set, and the trained model segments it to obtain Dice scores and Hausdorff distances. Dice scores: WT 0.90462, TC 0.81727, ET 0.80091; Hausdorff distances: WT 4.81871, TC 8.75708, ET 2.98508.
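The two reported metrics can be computed as below (a brute-force NumPy sketch for small masks; note that BraTS in practice reports the 95th-percentile Hausdorff distance, which replaces the maximum with a percentile to reduce outlier sensitivity):

```python
import numpy as np

def dice_score(a, b):
    """Dice overlap between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    if not (a.any() or b.any()):
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty binary masks."""
    pa, pb = np.argwhere(a), np.argwhere(b)
    d = np.linalg.norm(pa[:, None, :].astype(float) - pb[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

a = np.zeros((5, 5), dtype=int); a[1, 1] = 1
b = np.zeros((5, 5), dtype=int); b[1, 1] = 1; b[1, 2] = 1
```

For these toy masks the Dice score is 2·1/(1+2) = 2/3, and the farthest unmatched point of `b` lies one voxel away, giving a Hausdorff distance of 1.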
The invention has the following substantive characteristics and beneficial effects:
(1) A three-level cascade framework segments the three sub-regions of the brain tumor image separately, reducing the difficulty of the segmentation task at each stage. The encoder extracts contextual feature information through convolution and enlarges the receptive field of the convolution kernels through down-sampling layers and dilated (atrous, also rendered "cavity") convolutions; the decoder extracts detail information through convolution and restores the image resolution through deconvolution. Skip connections join feature maps of the same level in the encoder and decoder, fusing their feature information.
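The claim that down-sampling and dilated convolutions enlarge the receptive field follows from the standard 1-D recurrence: a dilated kernel acts like an effective kernel of d·(k−1)+1 taps, and each layer grows the receptive field by (k_eff − 1) times the cumulative stride. A generic sketch (layer counts are illustrative, not the patent's exact architecture):

```python
def receptive_field(layers):
    """1-D receptive field of a stack of (kernel, stride, dilation) layers."""
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1        # effective kernel of a dilated conv
        rf += (k_eff - 1) * jump       # growth scaled by cumulative stride
        jump *= s
    return rf

plain   = receptive_field([(3, 1, 1)] * 3)                    # three plain 3-tap convs
dilated = receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 3)])  # dilation rates 1, 2, 3
pooled  = receptive_field([(3, 1, 1), (2, 2, 1), (3, 1, 1)])  # conv, 2x pool, conv
```

Three plain 3-tap convolutions see 7 samples, while the same three layers with dilation rates 1, 2, 3 see 13 samples at no extra parameter cost; interposing a stride-2 pooling layer likewise widens the field of the layers above it.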
(2) Each 3D convolution kernel is factorized into an intra-slice kernel and an inter-slice kernel, which extract feature information within slices and between slices respectively, while reducing the parameter count and GPU-memory consumption of the network model; dilated convolutions with different dilation rates are added to the intra-slice convolutions to obtain kernels with different receptive fields.
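The parameter saving from the factorization is easy to quantify. Assuming, purely for illustration, 64 input and output channels, a 3 × 3 × 1 intra-slice kernel followed by a 1 × 1 × 3 inter-slice kernel, and ignoring biases:

```python
c = 64  # channel count, an illustrative choice

full_3d    = c * c * 3 * 3 * 3                       # one full 3x3x3 kernel
factorized = c * c * 3 * 3 * 1 + c * c * 1 * 1 * 3   # intra-slice + inter-slice
saving     = 1.0 - factorized / full_3d              # fraction of weights removed
```

Under these assumptions the full kernel needs 110,592 weights and the factorized pair 49,152, a saving of about 56% per convolution, which is what makes the per-stage networks affordable in video memory.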
(3) A spatial attention module is added to the skip connections: when encoder and decoder feature maps of the same stage are concatenated, a weight map derived from the decoder feature map multiplies the encoder feature map, focusing the convolutional network's attention on detail features.
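Per voxel, this gating reduces to multiplying the encoder feature map by a weight in (0, 1) derived from the decoder feature map. A toy NumPy sketch (the scalar projection `w`, `b` stands in for the patent's learned attention module):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(enc_feat, dec_feat, w=1.0, b=0.0):
    """Gate encoder features with per-voxel weights from decoder features."""
    weights = sigmoid(w * dec_feat + b)   # weight map in (0, 1)
    return enc_feat * weights

enc = np.ones((2, 2))                          # toy encoder feature map
dec = np.array([[10.0, -10.0], [0.0, 0.0]])    # toy decoder "relevance" logits
gated = spatial_attention(enc, dec)
```

Locations the decoder marks as relevant pass through almost unchanged, locations it marks as irrelevant are suppressed toward zero, and neutral locations are halved, so the subsequent concatenation emphasises the detail features the decoder cares about.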
(4) Multi-level feature-map fusion is added to the decoder: the feature map of each decoder layer is restored to the original resolution by deconvolution and then concatenated, fusing global and detail features and improving segmentation accuracy.
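The fusion step can be sketched by restoring each decoder output to full resolution and stacking the results channel-wise; here nearest-neighbour upsampling stands in for the learned deconvolution:

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour 2-D upsampling, a stand-in for deconvolution."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def fuse_levels(feats):
    """Restore each (map, factor) pair to full resolution and stack them."""
    return np.stack([upsample_nn(f, s) for f, s in feats], axis=0)

deep    = np.full((2, 2), 0.5)   # coarse map carrying global context
shallow = np.ones((4, 4))        # full-resolution map carrying detail
fused = fuse_levels([(deep, 2), (shallow, 1)])
```

After fusion the final convolution sees both the upsampled global context and the full-resolution detail at every voxel, which is the stated source of the accuracy gain.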

Claims (1)

1. A method for segmenting brain tumors with a cascaded dilated ("cavity") convolution network with an attention mechanism, comprising the following steps:
(1) data preprocessing:
selecting 3D MR images, constructing a training set and a validation set containing images of different brain tumor types, and preprocessing them;
(2) building the network structure:
establishing a cascaded dilated convolution network with an attention mechanism: a three-level cascade framework is adopted, simplifying the multi-class segmentation task into three binary segmentation tasks, which reduces both the segmentation difficulty and the number of network parameters; the three levels of segmentation networks are W-Net, T-Net and E-Net, used respectively for segmenting the whole tumor (WT) region, the tumor core (TC) region and the enhancing tumor (ET) region; at each level, segmentation is performed from the axial, sagittal and coronal directions, and the results in the three directions are averaged to obtain a more accurate segmentation result;
each level of the three-level cascaded dilated convolution framework with an attention mechanism uses an encoder-decoder fully convolutional network structure, divided into four parts: an encoder, a decoder, skip connections, and multi-level feature-map fusion:
the encoder has a four-layer structure: the first layer contains four intra-slice convolution layers with 3 × 3 × 1 kernels; each intra-slice convolution layer is followed by a batch normalization (BN) layer and a PReLU layer for non-linearity, forming an intra-slice convolution block, and every two intra-slice convolution blocks are joined by a residual connection to form a residual block; the second layer contains two residual blocks, a down-sampling layer and an inter-slice convolution layer with a 1 × 1 × 3 kernel, the inter-slice convolution layer followed by a BN layer and a PReLU layer forming an inter-slice convolution block; the third layer has three residual blocks, an inter-slice convolution block and a down-sampling layer, with dilated convolutions of dilation rates 2 and 3 added to the second and third residual blocks respectively; the fourth layer comprises a down-sampling layer, three residual blocks with dilation rate 3, and a deconvolution layer;
the decoding path has a three-layer structure: the first layer has three residual blocks, an inter-slice convolution block and an up-sampling layer, with dilated convolutions of dilation rates 3 and 2 added to the first and second residual blocks respectively; the second layer has two residual blocks, an inter-slice convolution block and a deconvolution layer; the third layer contains two residual blocks;
there are three skip connections: in the first, the feature map output by the first encoder layer, after four intra-slice convolution block operations, receives attention from the feature map output by the second decoder layer, and is concatenated with that decoder feature map as the input of the third decoder layer; in the second, the feature map output by the second encoder layer, after two intra-slice convolution block operations, receives attention from the feature map output by the first decoder layer, and is concatenated with it as the input of the second decoder layer; in the third, the output feature map of the third encoder layer receives attention from the feature map output by the fourth encoder layer, and is concatenated with the fourth-layer output as the input of the first decoder layer;
in the multi-level feature-map fusion, the output feature map of the fourth encoder layer undergoes two inter-slice convolution and two deconvolution operations, and the output feature map of the first decoder layer undergoes one inter-slice convolution and one deconvolution operation; these are concatenated with the output feature maps of the second and third decoder layers, and the final output is then obtained through a final convolution that produces the binary segmentation;
(3) training and optimizing the model.
CN202010848879.0A 2020-08-21 2020-08-21 Method for segmenting brain tumor by using cascade void convolution network with attention mechanism Pending CN112215850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010848879.0A CN112215850A (en) 2020-08-21 2020-08-21 Method for segmenting brain tumor by using cascade void convolution network with attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010848879.0A CN112215850A (en) 2020-08-21 2020-08-21 Method for segmenting brain tumor by using cascade void convolution network with attention mechanism

Publications (1)

Publication Number Publication Date
CN112215850A true CN112215850A (en) 2021-01-12

Family

ID=74058700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010848879.0A Pending CN112215850A (en) 2020-08-21 2020-08-21 Method for segmenting brain tumor by using cascade void convolution network with attention mechanism

Country Status (1)

Country Link
CN (1) CN112215850A (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754404A (en) * 2019-01-02 2019-05-14 清华大学深圳研究生院 A kind of lesion segmentation approach end to end based on more attention mechanism
CN109872306A (en) * 2019-01-28 2019-06-11 腾讯科技(深圳)有限公司 Medical image cutting method, device and storage medium
CN110059717A (en) * 2019-03-13 2019-07-26 山东大学 Convolutional neural networks automatic division method and system for breast molybdenum target data set
CN110189342A (en) * 2019-06-27 2019-08-30 中国科学技术大学 Glioma region automatic division method
CN110675379A (en) * 2019-09-23 2020-01-10 河南工业大学 U-shaped brain tumor segmentation network fusing cavity convolution
CN110992414A (en) * 2019-11-05 2020-04-10 天津大学 Indoor monocular scene depth estimation method based on convolutional neural network
CN111028242A (en) * 2019-11-27 2020-04-17 中国科学院深圳先进技术研究院 Automatic tumor segmentation system and method and electronic equipment
CN111046921A (en) * 2019-11-25 2020-04-21 天津大学 Brain tumor segmentation method based on U-Net network and multi-view fusion
CN111259906A (en) * 2020-01-17 2020-06-09 陕西师范大学 Method for generating and resisting remote sensing image target segmentation under condition containing multilevel channel attention
CN111340828A (en) * 2020-01-10 2020-06-26 南京航空航天大学 Brain glioma segmentation based on cascaded convolutional neural networks
CN111401480A (en) * 2020-04-27 2020-07-10 上海市同济医院 Novel breast MRI (magnetic resonance imaging) automatic auxiliary diagnosis method based on fusion attention mechanism


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI DAXIANG et al.: "Retinal vessel image segmentation algorithm based on improved U-Net", 《光学学报》 (Acta Optica Sinica) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767417A (en) * 2021-01-20 2021-05-07 合肥工业大学 Multi-modal image segmentation method based on cascaded U-Net network
CN112767417B (en) * 2021-01-20 2022-09-13 合肥工业大学 Multi-modal image segmentation method based on cascaded U-Net network
CN113658142A (en) * 2021-08-19 2021-11-16 江苏金马扬名信息技术股份有限公司 Hip joint femur near-end segmentation method based on improved U-Net neural network
CN113658142B (en) * 2021-08-19 2024-03-12 江苏金马扬名信息技术股份有限公司 Hip joint femur near-end segmentation method based on improved U-Net neural network
CN113888555A (en) * 2021-09-02 2022-01-04 山东师范大学 Multi-modal brain tumor image segmentation system based on attention mechanism
CN114187296A (en) * 2021-11-09 2022-03-15 元化智能科技(深圳)有限公司 Capsule endoscope image focus segmentation method, server and system
CN114187296B (en) * 2021-11-09 2022-12-13 元化智能科技(深圳)有限公司 Capsule endoscope image focus segmentation method, server and system
CN114170244A (en) * 2021-11-24 2022-03-11 北京航空航天大学 Brain glioma segmentation method based on cascade neural network structure
CN114170244B (en) * 2021-11-24 2024-05-28 北京航空航天大学 Brain glioma segmentation method based on cascade neural network structure
CN114565628A (en) * 2022-03-23 2022-05-31 中南大学 Image segmentation method and system based on boundary perception attention
CN114565628B (en) * 2022-03-23 2022-09-13 中南大学 Image segmentation method and system based on boundary perception attention

Similar Documents

Publication Publication Date Title
CN112215850A (en) Method for segmenting brain tumor by using cascade void convolution network with attention mechanism
CN110782462B (en) Semantic segmentation method based on double-flow feature fusion
CN111046921B (en) Brain tumor segmentation method based on U-Net network and multi-view fusion
He et al. H2Former: An efficient hierarchical hybrid transformer for medical image segmentation
CN108846473B (en) Light field depth estimation method based on direction and scale self-adaptive convolutional neural network
CN109584161A (en) The Remote sensed image super-resolution reconstruction method of convolutional neural networks based on channel attention
Zhang et al. Progressive hard-mining network for monocular depth estimation
CN112258526A (en) CT (computed tomography) kidney region cascade segmentation method based on dual attention mechanism
CN112785593B (en) Brain image segmentation method based on deep learning
CN112329780B (en) Depth image semantic segmentation method based on deep learning
CN113592026A (en) Binocular vision stereo matching method based on void volume and cascade cost volume
CN114266939B (en) Brain extraction method based on ResTLU-Net model
CN110472634A (en) Change detecting method based on multiple dimensioned depth characteristic difference converged network
CN112215291A (en) Method for extracting and classifying medical image features under cascade neural network
CN114387161B (en) Video super-resolution reconstruction method
KR20220139541A (en) A method and apparatus for image segmentation using global attention
CN117036380A (en) Brain tumor segmentation method based on cascade transducer
CN114821050A (en) Named image segmentation method based on transformer
CN113744284B (en) Brain tumor image region segmentation method and device, neural network and electronic equipment
CN113284079B (en) Multi-modal medical image fusion method
CN114332047A (en) Construction method and application of surface defect detection model
CN115995002B (en) Network construction method and urban scene real-time semantic segmentation method
CN111210416A (en) Anatomical structure prior-guided brain region-of-interest rapid segmentation method and system
CN116403212B (en) Method for identifying small particles in pixels of metallographic image based on improved U-net network
CN116579988A (en) Cerebral apoplexy focus segmentation method based on progressive fusion network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210112