CN117649385A - Lung CT image segmentation method based on global and local attention mechanisms - Google Patents

Info

Publication number
CN117649385A
CN117649385A (application number CN202311629043.1A)
Authority
CN
China
Prior art keywords
lung
global
module
local
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311629043.1A
Other languages
Chinese (zh)
Inventor
刘培松
王丹
潘丹
曾安
杨宝瑶
刘鑫
杨洋
刘军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority claimed from application CN202311629043.1A
Publication of CN117649385A

Landscapes

  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The invention provides a lung CT image segmentation method based on global and local attention mechanisms, comprising the steps of: acquiring a lung CT image data set and preprocessing it; establishing a lung tumor segmentation model based on a global-local attention mechanism, wherein the model is a U-shaped architecture network with a global-local Transformer module arranged in its encoder; inputting the preprocessed lung CT image data set into the lung tumor segmentation model for iterative training; and finally, acquiring a lung CT image to be segmented and inputting it into the trained lung tumor segmentation model for tumor segmentation. By combining local feature transmission, global semantic information capture and multi-scale feature expression, the invention achieves a deep understanding of the overall semantics and context information of the image; in addition, by combining the global-local Transformer with a hierarchical attention mechanism, the invention comprehensively improves the accuracy and performance of image segmentation.

Description

Lung CT image segmentation method based on global and local attention mechanisms
Technical Field
The invention relates to the technical field of deep learning and CT image recognition, in particular to a lung CT image segmentation method based on global and local attention mechanisms.
Background
Lung cancer is a highly malignant and widespread cancer with rapidly increasing morbidity and mortality, and it poses a serious hazard to human health. Computed Tomography (CT) scanning, as a noninvasive imaging modality, is widely used for the early detection, diagnosis and evaluation of lung tumors. Clinically, high-quality lung tumor segmentation based on lung CT images is critical to the early diagnosis of lung cancer. At present, however, lung tumors are mostly delineated manually by clinicians, which is time-consuming and labor-intensive, so a method capable of automatically identifying and segmenting lung cancer is urgently needed to improve clinicians' diagnostic efficiency.
Currently, segmentation of lung CT images is generally based on two types of neural network models:
1) 2D convolutional neural networks based on the 2D U-Net model, such as MSDS-UNet and MEAU-Net. MSDS-UNet is a multi-scale, multi-level deeply supervised U-Net that combines the existing U-Net model with a dual-path deep supervision mechanism to achieve better segmentation performance; however, it mainly focuses on local information transmission and feature fusion and pays less attention to global semantic information, making the network insensitive to the contrast between lesions and surrounding tissue. MEAU-Net combines U-Net with multi-encoder mixed attention to segment lung tumors, but it does not fully consider the expressive capability of features at different scales, which affects the accuracy of the segmentation result and can cause lesion regions to be mis-segmented.
2) Transformer-based segmentation models, such as TransUnet, SwinUnet and TransFuse. TransUnet is a breakthrough model in the field of medical image segmentation based on the Transformer architecture; it introduces the Transformer's self-attention mechanism to capture global and local information in the input image, and this self-attention allows the model to attend dynamically to different input positions, which helps capture long-range dependencies in the image. SwinUnet is a pure-Transformer model similar to U-Net in which both the encoder and the decoder adopt the Swin Transformer; the Swin Transformer captures long-range dependencies by splitting the image into small blocks (local windows) and introducing cross-window position coding, an architecture that captures global features in the image well. TransFuse combines a Transformer with a convolutional neural network (CNN), running a shallow CNN-based encoder and a Transformer-based segmentation network in parallel, and then fuses the features of the two branches with a BiFusion module to make joint predictions. However, the self-attention mechanism of the Transformer structure focuses on the interaction of global information, while information transfer between nearby pixels is limited, resulting in a loss of fine-grained detail and spatial context in local information modeling.
The prior art discloses a lung CT image segmentation method with global-local feature association fusion, implemented as follows: a first encoder consisting of Transformer layers is created; a second encoder consisting of a convolutional neural network is established; a local feature association fusion module consisting of Transformer layers is established; a decoder consisting of a convolutional neural network and symmetric to the second encoder is established; the first and second encoders are connected in parallel and then cascaded with the local feature association fusion module and the decoder to form a segmentation network model; training set data are input into the two encoders in parallel, and the segmentation network model is iteratively trained by back propagation; a test image is then input into the trained segmentation network to segment the infection region of a lung CT image. Although this prior art improves the segmentation accuracy of lung CT images by extracting and fusing global-local features, the global and local features are not extracted by one integrated structure but by parallel branches, so local information and global context information cannot be attended to simultaneously, and segmentation accuracy still needs improvement.
Disclosure of Invention
The invention provides a lung CT image segmentation method based on global and local attention mechanisms, which combines local feature transmission, global semantic information capture and multi-scale feature expression to achieve a deep understanding of the overall semantics and context information of an image; in addition, the invention provides an HA-TransUnet (Hierarchical Attention Transformer-Unet) neural network combined with a hierarchical attention mechanism, which uses skip connections at different hierarchies and applies multiple types of attention mechanisms, so that the model can attend more accurately to features at different levels, realizing the fusion and utilization of multi-scale features and further improving segmentation precision.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a lung CT image segmentation method based on global and local attention mechanisms, comprising the steps of:
S1: acquiring a lung CT image data set and preprocessing it;
S2: establishing a lung tumor segmentation model based on a global-local attention mechanism;
the lung tumor segmentation model is a U-shaped architecture network comprising an encoder and a decoder connected in sequence, with skip connections further arranged between the encoder and the decoder;
the encoder comprises, connected in sequence, a convolution module and a global-local Transformer module;
S3: inputting the preprocessed lung CT image data set into the established lung tumor segmentation model for iterative training to obtain a trained lung tumor segmentation model;
S4: acquiring a lung CT image to be segmented and inputting it into the trained lung tumor segmentation model for tumor segmentation, completing the segmentation of the lung CT image.
Preferably, in the step S1, the specific method for acquiring and preprocessing the lung CT image dataset is as follows:
acquiring a 3D lung CT image data set, and sequentially performing lung region cutting, image resampling and image normalization on the acquired lung CT image data set to acquire a normalized lung CT image data set;
And 2D slicing each lung CT image in the normalized lung CT image data set to obtain a preprocessed lung CT image data set, and finishing preprocessing.
Preferably, in the preprocessing of step S1, the image resampling operation is performed by nearest neighbor interpolation, and the image normalization operation is performed by Z-Score standardization.
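As a minimal sketch of the two preferred preprocessing operations — nearest-neighbor resampling and Z-Score standardization — the following Python snippet uses NumPy and SciPy (`scipy.ndimage.zoom` with `order=0` performs nearest-neighbor interpolation). The function names and the default target spacing are illustrative, not taken from the patent:

```python
import numpy as np
from scipy.ndimage import zoom

def resample_nearest(volume, spacing, target_spacing=(1.0, 1.0, 1.0)):
    """Resample a 3D CT volume to the target voxel spacing using
    nearest-neighbor interpolation (order=0)."""
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    return zoom(volume, factors, order=0)

def z_score_normalize(volume):
    """Z-Score standardization: subtract the mean, divide by the standard
    deviation (the small epsilon guards against a constant image)."""
    return (volume - volume.mean()) / (volume.std() + 1e-8)
```

A volume acquired at 2 mm spacing, for example, is upsampled by a factor of 2 along each axis before normalization.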
Preferably, the lung tumor segmentation model established in the step S2 specifically includes:
the convolution module in the encoder comprises a first downsampling submodule, a second downsampling submodule and a linear layer which are sequentially connected; the first downsampling submodule and the second downsampling submodule have the same structure and each comprise a plurality of convolution blocks and a maximum pooling layer which are sequentially connected;
each convolution block has the same structure and comprises a first convolution layer, a first group of normalization layers and an activation function layer which are sequentially connected;
the global-local Transformer module in the encoder comprises a plurality of global-local Transformer sub-modules connected in sequence;
each global-local Transformer sub-module has the same structure and comprises a second group normalization layer, a global-local attention network, a third group normalization layer and a multi-layer perceptron connected in sequence; the input of the second group normalization layer is added to the output of the global-local attention network as a residual sum, and the input of the third group normalization layer is added to the output of the multi-layer perceptron as a residual sum;
the global-local attention network comprises a global branch and a local branch arranged in parallel;
the global branch comprises a position coding addition point and a multi-head self-attention layer connected in sequence, the output of the multi-head self-attention layer serving as the output of the global branch;
the local branch comprises a first convolution branch and a second convolution branch arranged in parallel; the two convolution branches have the same structure, each comprising a second convolution layer and a fourth group normalization layer connected in sequence; the outputs of the first and second convolution branches are summed to form the output of the local branch;
the output of the global branch and the output of the local branch are summed to form the output of the global-local attention network;
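The structure described above can be sketched in PyTorch roughly as follows. This is an illustrative reconstruction, not the patent's reference implementation: LayerNorm is used for the token-path normalization layers, the head count, MLP ratio and group counts are assumptions, and the module names are invented for clarity:

```python
import torch
import torch.nn as nn

class LocalBranch(nn.Module):
    """Local branch: parallel 1x1 and 3x3 convolution branches, each
    followed by group normalization; their outputs are summed."""
    def __init__(self, dim, groups=8):
        super().__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=1), nn.GroupNorm(groups, dim))
        self.branch2 = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1), nn.GroupNorm(groups, dim))

    def forward(self, x):  # x: (B, C, H, W)
        return self.branch1(x) + self.branch2(x)

class GlobalLocalAttention(nn.Module):
    """Global branch (position-encoding addition point followed by
    multi-head self-attention) summed with the local convolution branch."""
    def __init__(self, dim, num_tokens, heads=8):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, num_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local = LocalBranch(dim)

    def forward(self, x):  # x: (B, N, C); N must be a square number
        b, n, c = x.shape
        g = x + self.pos                # position-encoding addition point
        g, _ = self.attn(g, g, g)       # output of the global branch
        side = int(n ** 0.5)
        loc = self.local(x.transpose(1, 2).reshape(b, c, side, side))
        loc = loc.reshape(b, c, n).transpose(1, 2)
        return g + loc                  # sum of global and local branches

class GLTB(nn.Module):
    """Global-local Transformer sub-module: pre-norm attention and MLP,
    each wrapped in a residual sum."""
    def __init__(self, dim, num_tokens, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = GlobalLocalAttention(dim, num_tokens)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim))

    def forward(self, x):
        x = x + self.attn(self.norm1(x))  # residual sum around the attention
        x = x + self.mlp(self.norm2(x))   # residual sum around the MLP
        return x
```

Reading the 16×196×768 feature dimensions given in the embodiment as 196 tokens of width 768, the sub-module would be instantiated as `GLTB(dim=768, num_tokens=196)`.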
the decoder comprises a first upsampling module, a first fusing module, a second upsampling module, a second fusing module, a third upsampling module, a third fusing module, a fourth upsampling module and an output module which are connected in sequence;
each up-sampling module has the same structure and comprises a plurality of convolution blocks and a bilinear interpolation layer which are connected in sequence;
the input of the first downsampling sub-module in the encoder is skip-connected to the input of the third fusion module in the decoder; the input of the second downsampling sub-module in the encoder is skip-connected to the input of the second fusion module in the decoder; the output of the second downsampling sub-module in the encoder is skip-connected to the input of the first fusion module in the decoder.
Preferably, in the lung tumor segmentation model of step S2, the skip connections are specifically:
the input of the first downsampling sub-module in the encoder is skip-connected to the input of the third fusion module in the decoder through a channel attention (CA) module; the input of the second downsampling sub-module in the encoder is skip-connected to the input of the second fusion module in the decoder through a convolutional block attention (CBAM) module; the output of the second downsampling sub-module in the encoder is skip-connected to the input of the first fusion module in the decoder through a spatial attention (SA) module.
Preferably, in the convolution module of the encoder, the first downsampling submodule and the second downsampling submodule each comprise a convolution block 1, a convolution block 2, a convolution block 3 and a maximum pooling layer which are connected in sequence;
the convolution kernel size of the first convolution layer in the convolution block 2 is 3×3; the convolution kernel size of the first convolution layer in the convolution block 1 and the convolution block 3 is 1×1;
the activation function of the activation function layer in each convolution block is specifically the ReLU activation function;
the convolution kernel size of the maximum pooling layer is 3×3.
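The downsampling sub-module described above (three conv blocks followed by 3×3 max pooling) can be sketched in PyTorch as follows. The pool's stride and padding, and the group count of the group normalization, are assumptions — the patent states only the kernel sizes:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, k):
    """Convolution block: convolution -> group normalization -> ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2),
        nn.GroupNorm(8, out_ch),           # 8 groups is an assumption
        nn.ReLU(inplace=True))

class DownsamplingSubmodule(nn.Module):
    """Convolution block 1 (1x1) -> block 2 (3x3) -> block 3 (1x1)
    -> 3x3 max pooling; stride 2 and padding 1 of the pool are assumed."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            conv_block(in_ch, out_ch, 1),
            conv_block(out_ch, out_ch, 3),
            conv_block(out_ch, out_ch, 1),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

    def forward(self, x):
        return self.body(x)
```

With these assumptions each sub-module halves the spatial resolution, e.g. a 224×224 input becomes 112×112.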
Preferably, the global-local Transformer module in the encoder comprises 12 global-local Transformer sub-modules connected in sequence;
in the local branch of each global-local Transformer sub-module, the convolution kernel size of the second convolution layer is 1×1 in the first convolution branch and 3×3 in the second convolution branch.
Preferably, in the step S3, the preprocessed lung CT image dataset is input into the established lung tumor segmentation model for iterative training, and the specific method for obtaining the trained lung tumor segmentation model is as follows:
S3.1: the preprocessed lung CT image data set is divided into a training set and a test set at a ratio of 4:1 following a five-fold cross-validation strategy;
S3.2: inputting the training set into the established lung tumor segmentation model, setting training parameters, and performing iterative training with an Adam optimizer;
S3.3: calculating the loss value of the trained lung tumor segmentation model with a preset loss function;
S3.4: repeating steps S3.2-S3.3, saving the trained lung tumor segmentation model with the minimum loss value as the trained lung tumor segmentation model, and optimizing and adjusting the parameters of the trained lung tumor segmentation model with the test set.
Preferably, the loss function in step S3.3 is specifically a Dice loss function.
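A soft Dice loss of the kind named here can be written in a few lines of NumPy; the smoothing constant `eps` is a common numerical-stability addition, not specified by the patent:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|), where `pred` holds
    foreground probabilities in [0, 1] and `target` is a binary mask."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

The loss approaches 0 for a perfect overlap and 1 when prediction and mask are disjoint, which makes it well suited to the strong foreground/background imbalance of tumor segmentation.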
The invention also provides a lung CT image segmentation system based on global and local attention mechanisms, applied to the above lung CT image segmentation method, comprising:
a data acquisition and preprocessing unit: used for acquiring a lung CT image data set and preprocessing it;
a model building unit: used for establishing a lung tumor segmentation model based on a global-local attention mechanism;
the lung tumor segmentation model is a U-shaped architecture network comprising an encoder and a decoder connected in sequence, with skip connections further arranged between the encoder and the decoder;
the encoder comprises, connected in sequence, a convolution module and a global-local Transformer module;
a model training unit: used for inputting the preprocessed lung CT image data set into the established lung tumor segmentation model for iterative training to obtain a trained lung tumor segmentation model;
a lung CT image segmentation unit: used for acquiring a lung CT image to be segmented, inputting it into the trained lung tumor segmentation model for tumor segmentation, and completing the segmentation of the lung CT image.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
The invention provides a lung CT image segmentation method based on global and local attention mechanisms. A lung CT image data set is first acquired and preprocessed; a lung tumor segmentation model based on a global-local attention mechanism is then established, the model being a U-shaped architecture network comprising an encoder and a decoder connected in sequence, with skip connections further arranged between them, and the encoder comprising, connected in sequence, a convolution module and a global-local Transformer module; the preprocessed lung CT image data set is input into the established model for iterative training to obtain a trained lung tumor segmentation model; finally, a lung CT image to be segmented is acquired and input into the trained model for tumor segmentation, completing the segmentation of the lung CT image.
By combining local feature transmission, global semantic information capture and multi-scale feature expression, the method achieves a deep understanding of the overall semantics and context information of the image. The invention further provides the HA-TransUnet (Hierarchical Attention Transformer-Unet) neural network combined with a hierarchical attention mechanism, which uses skip connections at different hierarchies and applies multiple types of attention mechanisms, so that the model can attend more accurately to features at different levels, realizing the fusion and utilization of multi-scale features and further improving segmentation precision. In addition, combining the global-local Transformer with the hierarchical attention mechanism comprehensively improves the accuracy and performance of image segmentation; this network structure enables the model to understand medical images more accurately and to cope with more complex medical image segmentation tasks.
Drawings
Fig. 1 is a flowchart of the lung CT image segmentation method based on global and local attention mechanisms provided in embodiment 1.
Fig. 2 is a structural diagram of the lung CT image segmentation method based on global and local attention mechanisms provided in embodiment 2.
Fig. 3 is a schematic view of the lung tumor segmentation model provided in embodiment 2.
Fig. 4 is a structural diagram of the channel attention CA module provided in embodiment 2.
Fig. 5 is a structural diagram of the spatial attention SA module provided in embodiment 2.
Fig. 6 is a structural diagram of the convolutional block attention CBAM module provided in embodiment 2.
Fig. 7 is a structural diagram of the global-local Transformer sub-module (GLTB) provided in embodiment 2.
Fig. 8 is a structural diagram of the global-local attention network in the global-local Transformer sub-module provided in embodiment 2.
Fig. 9 is a comparison of segmentation results under different methods provided in embodiment 2.
Fig. 10 is a diagram of the lung CT image segmentation system based on global and local attention mechanisms provided in embodiment 3.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
It will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, the present embodiment provides a lung CT image segmentation method based on global and local attention mechanisms, including the following steps:
S1: acquiring a lung CT image data set and preprocessing it;
S2: establishing a lung tumor segmentation model based on a global-local attention mechanism;
the lung tumor segmentation model is a U-shaped architecture network comprising an encoder and a decoder connected in sequence, with skip connections further arranged between the encoder and the decoder;
the encoder comprises, connected in sequence, a convolution module and a global-local Transformer module;
S3: inputting the preprocessed lung CT image data set into the established lung tumor segmentation model for iterative training to obtain a trained lung tumor segmentation model;
S4: acquiring a lung CT image to be segmented and inputting it into the trained lung tumor segmentation model for tumor segmentation, completing the segmentation of the lung CT image.
In the specific implementation process, a lung CT image data set is first acquired and preprocessed; a lung tumor segmentation model based on a global-local attention mechanism is established, the model being a U-shaped architecture network comprising an encoder and a decoder connected in sequence, with skip connections further arranged between the encoder and the decoder, and the encoder comprising, connected in sequence, a convolution module and a global-local Transformer module; the preprocessed lung CT image data set is input into the established lung tumor segmentation model for iterative training to obtain a trained lung tumor segmentation model; finally, a lung CT image to be segmented is acquired and input into the trained lung tumor segmentation model for tumor segmentation, completing the segmentation of the lung CT image;
in this lung CT image segmentation method based on global and local attention mechanisms, combining local feature transmission, global semantic information capture and multi-scale feature expression achieves a deep understanding of the overall semantics and context information of the image and can effectively improve segmentation precision.
Example 2
As shown in fig. 2, the present embodiment provides a lung CT image segmentation method based on global and local attention mechanisms, which includes the following steps:
S1: acquiring a lung CT image data set and preprocessing it;
S2: establishing a lung tumor segmentation model based on a global-local attention mechanism;
the lung tumor segmentation model is a U-shaped architecture network comprising an encoder and a decoder connected in sequence, with skip connections further arranged between the encoder and the decoder;
the encoder comprises, connected in sequence, a convolution module and a global-local Transformer module;
S3: inputting the preprocessed lung CT image data set into the established lung tumor segmentation model for iterative training to obtain a trained lung tumor segmentation model;
S4: acquiring a lung CT image to be segmented and inputting it into the trained lung tumor segmentation model for tumor segmentation, completing the segmentation of the lung CT image;
in the step S1, the specific method for acquiring and preprocessing the lung CT image dataset is as follows:
a 3D lung CT image data set is acquired, and lung region cropping, image resampling by nearest neighbor interpolation and image normalization by Z-Score standardization are performed on it in sequence to obtain a normalized lung CT image data set;
2D slicing is performed on each lung CT image in the normalized lung CT image data set to obtain the preprocessed lung CT image data set, completing the preprocessing;
the lung tumor segmentation model established in the step S2 specifically comprises the following steps:
the convolution module in the encoder comprises a first downsampling submodule, a second downsampling submodule and a linear layer which are sequentially connected; the first downsampling submodule and the second downsampling submodule have the same structure and respectively comprise a convolution block 1, a convolution block 2, a convolution block 3 and a maximum pooling layer which are sequentially connected; each convolution block has the same structure and comprises a first convolution layer, a first group of normalization layers and an activation function layer which are sequentially connected;
the convolution kernel size of the first convolution layer in the convolution block 2 is 3×3; the convolution kernel size of the first convolution layer in the convolution block 1 and the convolution block 3 is 1×1;
the activation function of the activation function layer in each convolution block is specifically the ReLU activation function;
the convolution kernel size of the maximum pooling layer is 3×3;
the global-local Transformer module in the encoder comprises 12 global-local Transformer sub-modules connected in sequence;
each global-local Transformer sub-module has the same structure and comprises a second group normalization layer, a global-local attention network, a third group normalization layer and a multi-layer perceptron connected in sequence; the input of the second group normalization layer is added to the output of the global-local attention network as a residual sum, and the input of the third group normalization layer is added to the output of the multi-layer perceptron as a residual sum;
the global-local attention network comprises a global branch and a local branch arranged in parallel;
the global branch comprises a position coding addition point and a multi-head self-attention layer connected in sequence, the output of the multi-head self-attention layer serving as the output of the global branch;
the local branch comprises a first convolution branch and a second convolution branch arranged in parallel; the two convolution branches have the same structure, each comprising a second convolution layer and a fourth group normalization layer connected in sequence; the convolution kernel size of the second convolution layer is 1×1 in the first convolution branch and 3×3 in the second convolution branch; the outputs of the two convolution branches are summed to form the output of the local branch;
the output of the global branch and the output of the local branch are summed to form the output of the global-local attention network;
the decoder comprises a first upsampling module, a first fusing module, a second upsampling module, a second fusing module, a third upsampling module, a third fusing module, a fourth upsampling module and an output module which are connected in sequence;
each up-sampling module has the same structure and comprises 2 convolution blocks and a bilinear interpolation layer which are connected in sequence;
the input of the first downsampling sub-module in the encoder is skip-connected to the input of the third fusion module in the decoder through the channel attention CA module; the input of the second downsampling sub-module in the encoder is skip-connected to the input of the second fusion module in the decoder through the convolutional block attention CBAM module; the output of the second downsampling sub-module in the encoder is skip-connected to the input of the first fusion module in the decoder through the spatial attention SA module;
in the step S3, the preprocessed lung CT image dataset is input into the established lung tumor segmentation model for iterative training, and the specific method for obtaining the trained lung tumor segmentation model is as follows:
S3.1: the preprocessed lung CT image data set is divided into a training set and a test set at a ratio of 4:1 following a five-fold cross-validation strategy;
S3.2: inputting the training set into the established lung tumor segmentation model, setting training parameters, and performing iterative training with an Adam optimizer;
S3.3: calculating the loss value of the trained lung tumor segmentation model with the preset Dice loss function;
S3.4: repeating steps S3.2-S3.3, saving the trained lung tumor segmentation model with the minimum loss value as the trained lung tumor segmentation model, and optimizing and adjusting the parameters of the trained lung tumor segmentation model with the test set.
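The 4:1 split per fold of the five-fold cross-validation in step S3.1 can be sketched with NumPy index arrays; the shuffling seed is illustrative:

```python
import numpy as np

def five_fold_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) index pairs for five-fold
    cross-validation, i.e. a 4:1 train/test split per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), 5)
    for k in range(5):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train, folds[k]
```

Each fold's training indices would then feed the Adam-based iterative training of step S3.2, with the held-out fold used for evaluation.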
In a specific implementation process, a lung CT image dataset is first acquired and preprocessed. The segmentation performance and generalization of the model are verified on two different datasets: a public dataset and a private hospital dataset. The public dataset is lung tumor data published in The Cancer Imaging Archive (TCIA), which was created and is managed by the U.S. National Cancer Institute and collects medical image data from research institutions and medical centers around the world; the public dataset used in this embodiment is named Lung1 and contains images from 422 non-small-cell lung cancer patients. The private dataset, GDPH, is internal data from a hospital and has been authorized for use; GDPH contains 824 3D lung CT images, in which the lung tumor regions were manually delineated by two experienced radiologists, after which the label annotations were cross-checked by multiple radiologists and a final consensus was reached by discussion. The size, resolution, layer thickness and other information of the images in the resulting lung CT image datasets are shown in Table 1:
TABLE 1. Specific information of the lung CT image datasets

                          GDPH               Lung1
Total amount of data      824                422
Image size                512×512×(78~196)   512×512×(70~190)
Layer thickness (mm)      1.25               3
Plane resolution (mm)     0.72×0.72          0.97×0.97
The data preprocessing mainly comprises three parts: cropping, resampling and normalization. The data are first cropped, since much irrelevant information exists outside the lung region containing the tumor and would affect the segmentation result of the model. The data are then resampled: because different centers use different acquisition devices, heterogeneity can arise among their data, so nearest neighbor interpolation is used to obtain an isotropic spatial resolution in the axial plane, making the spacing between voxels 1×1. Finally the data are normalized: the window width for the lung tumor is set to 150 Hounsfield Units (HU) and the window level to -1000 HU to enhance image contrast and highlight the lung structure, and the images are normalized using Z-Score. The three-dimensional lung tumor data are then cut into 2D slices along the transverse plane so that the data can be used for training, completing the preprocessing;
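The windowing and slicing steps above can be sketched as follows; the clipping rule is the standard CT windowing formula (not spelled out in the patent), while the width/level defaults follow the values stated in this embodiment:

```python
import numpy as np

def apply_ct_window(hu, width=150.0, level=-1000.0):
    """Clip a CT volume (Hounsfield units) to the window
    [level - width/2, level + width/2] before Z-Score normalization."""
    lo, hi = level - width / 2.0, level + width / 2.0
    return np.clip(hu, lo, hi)

def to_axial_slices(volume):
    """Cut a 3D volume of shape (D, H, W) into 2D transverse slices."""
    return [volume[i] for i in range(volume.shape[0])]
```

With the stated values the intensities are clipped to [-1075, -925] HU; each resulting 2D slice then becomes one training sample.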
establishing a lung tumor segmentation model based on a global-local attention mechanism;
as shown in fig. 3, the lung tumor segmentation model in this embodiment is a U-shaped architecture network comprising an encoder and a decoder connected in sequence, with skip connections further arranged between the encoder and the decoder;
The encoder includes 2 downsampling sub-modules and 12 Global-Local Transformer Blocks (GLTB); each downsampling sub-module comprises three convolution blocks and a max pooling operation; one of the convolution blocks comprises a 3×3 convolution, group normalization and a ReLU activation function, the convolution kernels of the convolution layers of the other two convolution blocks are 1×1, and the kernel size of the max pooling layer is 3×3; the feature maps generated by the two downsampling sub-modules are input into the GLTB after a linear mapping, and after passing through the GLTB the dimensions of the feature maps are 16×196×768; the decoder then decodes the features generated by the encoder; the decoder comprises 4 upsampling modules and an output layer, each upsampling module consisting of 2 convolution blocks and 1 bilinear interpolation layer, the bilinear interpolation layer being used to adjust the input to the required size while preserving the content and details of the image; in the gradual decoding process, the decoder fuses the skip-connected feature information of different scales and gradually restores the resolution of the image to generate a fine segmentation result;
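One downsampling sub-module can be sketched in PyTorch as below; the channel counts, the group-normalization group count and the stride-2 pooling are assumptions not fixed by the text:

```python
import torch
import torch.nn as nn

def conv_block(cin: int, cout: int, k: int) -> nn.Sequential:
    """Conv -> GroupNorm -> ReLU, matching one convolution block above."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, padding=k // 2),
        nn.GroupNorm(8, cout),  # group count is an assumption
        nn.ReLU(inplace=True),
    )

class DownsampleSubmodule(nn.Module):
    """One encoder downsampling sub-module: a 1x1, a 3x3 and a 1x1 conv block
    followed by 3x3 max pooling (stride 2 assumed to perform downsampling)."""
    def __init__(self, cin: int, cout: int):
        super().__init__()
        self.blocks = nn.Sequential(
            conv_block(cin, cout, 1),
            conv_block(cout, cout, 3),
            conv_block(cout, cout, 1),
        )
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.blocks(x))

y = DownsampleSubmodule(1, 64)(torch.randn(1, 1, 224, 224))
print(y.shape)  # torch.Size([1, 64, 112, 112])
```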
the hierarchical attention mechanisms include a channel attention mechanism (Channel Attention, CA), a spatial attention mechanism (Spatial Attention, SA) and a convolutional block attention mechanism (Convolutional Block Attention Module, CBAM); as shown in fig. 4, the channel attention CA module in this embodiment mainly comprises an average pooling layer, a max pooling layer, a multi-layer perceptron and an activation function; after the data pass through the first convolution module of the CNN layer, they pass through the channel attention mechanism CA in the skip connection to transfer the features into the decoder; the shallow features generated by the first convolution of the CNN module retain rich lung shape details but lack deep semantic content, so the channel attention mechanism is used to select and weight the channels of the input shallow features, thereby weakening attention to irrelevant or redundant feature channels and improving the quality and effectiveness of the shallow feature expression; the data are first input into the average pooling layer and the max pooling layer and processed by the multi-layer perceptron respectively, then added and passed through a sigmoid activation function layer; the resulting weights are multiplied with the input of the CA module, and finally the product is added to the input of the CA module as the output of the CA module;
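A minimal PyTorch sketch of the channel attention CA module as described, including the final residual addition; the reduction ratio of the multi-layer perceptron is an assumption:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CA: pooled descriptors -> shared MLP -> sigmoid weights -> reweight + residual."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # shared multi-layer perceptron applied to both pooled descriptors
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
        # weight the input channels, then add the module input back
        return x * w + x

x = torch.randn(2, 64, 32, 32)
out = ChannelAttention(64)(x)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```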
As shown in fig. 5, the spatial attention SA module in this embodiment mainly comprises four parts: an average pooling layer, a max pooling layer, a convolution and an activation function; the spatial attention mechanism SA can not only suppress background noise and redundant information in the image, but also compute attention weights at different scales, enabling the network to focus on important image regions at different levels and resolutions and thereby improving the quality and effectiveness of the lung tumor image features; the input feature map is first compressed along the channel dimension using average pooling and max pooling operations, preserving its spatial information; the two compressed feature maps are concatenated along the channel dimension to obtain a feature map with 2 channels, which is processed by a convolution layer with a kernel size of 3×3; the weights obtained through a sigmoid activation function are then multiplied with the original input feature map;
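A corresponding PyTorch sketch of the spatial attention SA module, using the 3×3 kernel stated above:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """SA: channel-wise avg/max pooling -> concat -> 3x3 conv -> sigmoid -> reweight."""
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)   # channel-wise average pooling
        mx, _ = x.max(dim=1, keepdim=True)  # channel-wise max pooling
        w = self.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w                        # weight the original input map

x = torch.randn(2, 64, 32, 32)
out = SpatialAttention()(x)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```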
as shown in fig. 6, in this embodiment the convolutional block attention CBAM module connects the above channel attention CA module and spatial attention SA module in sequence; in addition, within the CBAM module, the input and output of the channel attention CA module and the input and output of the spatial attention SA module each form a weighted multiplicative connection;
The network structure of the GLTB in this embodiment is shown in fig. 7 and mainly consists of four parts: two group normalization layers, a Global-Local Attention (GLA) network and a Multi-Layer Perceptron (MLP);
before the features output by the CNN layer are input into the GLTB, the input X needs to be vectorized, i.e. reconstructed into a sequence of flattened 2D patches X_p ∈ R^{N×(P²·C)}, where N denotes the total number of patches, P×P the size of each patch and C the number of channels of the image; the patch vectors X_p then need to be mapped into a latent D-dimensional embedding space, and position information is added to preserve the positions of the patches and encode the patch spatial information; this process is represented by the following formula:

Z_0 = [X_p^1 E; X_p^2 E; …; X_p^N E] + E_pos

where E denotes the embedding mapping of the patches, E_pos denotes the embedded position information, and Z_0 denotes the input of the GLTB;
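The patch vectorization and position embedding can be sketched as follows; the image size, patch size and embedding dimension are illustrative choices that happen to yield the 196×768 token shape mentioned earlier:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Flatten an image into N = HW/P^2 patches, project to D dims, add positions."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        n_patches = (img_size // patch_size) ** 2
        # a PxP stride-P convolution is equivalent to flattening each patch
        # and applying the linear embedding map E
        self.proj = nn.Conv2d(in_chans, embed_dim, patch_size, stride=patch_size)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, embed_dim))  # E_pos

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.proj(x).flatten(2).transpose(1, 2)  # (B, N, D)
        return z + self.pos                          # Z_0, the GLTB input

z0 = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(z0.shape)  # torch.Size([1, 196, 768])
```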
the GLTB encoder comprises 12 layers of GLA and multi-layer perceptrons, so the output of the l-th layer can be expressed by the following equations:

Z′_l = GLA(GN(Z_{l−1})) + Z_{l−1}
Z_l = MLP(GN(Z′_l)) + Z′_l

where Z_L is the final output of the global-local Transformer module after the last layer;
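The two residual equations above map onto a block like the following sketch; the multi-head self-attention is only a stand-in for the GLA network, and the MLP width is an assumption:

```python
import torch
import torch.nn as nn

class GLTB(nn.Module):
    """One global-local Transformer block: pre-GroupNorm GLA and MLP residuals."""
    def __init__(self, dim=768, hidden=3072, attn=None):
        super().__init__()
        self.gn1 = nn.GroupNorm(1, dim)
        # placeholder for the GLA network: plain multi-head self-attention
        self.gla = attn or nn.MultiheadAttention(dim, 8, batch_first=True)
        self.gn2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def _norm(self, gn: nn.GroupNorm, z: torch.Tensor) -> torch.Tensor:
        # apply GroupNorm over the embedding dimension of (B, N, D) tokens
        return gn(z.transpose(1, 2)).transpose(1, 2)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        h = self._norm(self.gn1, z)
        if isinstance(self.gla, nn.MultiheadAttention):
            h, _ = self.gla(h, h, h)
        else:
            h = self.gla(h)
        z = h + z                                     # Z'_l = GLA(GN(Z_{l-1})) + Z_{l-1}
        return self.mlp(self._norm(self.gn2, z)) + z  # Z_l = MLP(GN(Z'_l)) + Z'_l

out = GLTB(dim=64, hidden=256)(torch.randn(2, 196, 64))
print(out.shape)  # torch.Size([2, 196, 64])
```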
in the multi-head self-attention mechanism of the standard Transformer, although global context information can be captured, local-scale features cannot be attended to, which reduces segmentation accuracy; to address this problem, the present method introduces a global-local attention mechanism into the Transformer, with the specific architecture shown in fig. 8; unlike the multi-head self-attention mechanism of the standard Transformer, the introduced GLA network exploits an efficient global-local attention mechanism that includes a global branch and a convolutional local branch to capture visually perceived global and local context;
In the global branch, a multi-head self-attention mechanism is still adopted; by reshaping the image, the self-attention mechanism can shorten the distance between long-range dependent features; in the local branch, the local context is extracted using two parallel convolution layers with group normalization, capturing the local dependency relationship, i.e. the association between the current position and its surrounding positions that the global branch tends to overlook; the convolution kernel sizes of the two convolution layers are 3×3 and 1×1 respectively, and finally the features extracted by the two branches are summed;
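A hedged sketch of the GLA network as described: a multi-head self-attention global branch plus two parallel convolutional local branches (3×3 and 1×1) with group normalization, summed at the end; the head and group counts are assumptions:

```python
import math
import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    """GLA sketch: global MHSA branch + parallel 3x3/1x1 conv local branch, summed."""
    def __init__(self, dim=768, heads=8, groups=32):
        super().__init__()
        self.mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local3 = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1),
                                    nn.GroupNorm(groups, dim))
        self.local1 = nn.Sequential(nn.Conv2d(dim, dim, 1),
                                    nn.GroupNorm(groups, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        b, n, d = z.shape                          # z: (B, N, D), N a perfect square
        g, _ = self.mhsa(z, z, z)                  # global branch
        s = int(math.isqrt(n))
        x = z.transpose(1, 2).reshape(b, d, s, s)  # tokens -> 2D feature map
        l = self.local3(x) + self.local1(x)        # parallel local convolutions
        l = l.flatten(2).transpose(1, 2)           # feature map -> tokens
        return g + l                               # sum the two branches

out = GlobalLocalAttention(dim=64, heads=4, groups=8)(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```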
inputting the preprocessed lung CT image data set into the established lung tumor segmentation model for iterative training, and obtaining a trained lung tumor segmentation model;
in the training process, a five-fold cross-validation strategy is adopted; experiments are carried out on an RTX 3090 GPU platform with Python 3.7, CUDA 12.0 and PyTorch 1.7.1; the Adam optimizer is used, the number of training epochs is 150, and the learning rate is set to 0.0001 and decayed by 10⁻⁵ every 10 epochs; to prevent network over-fitting, a batch size of 16 is adopted and a dropout operation with probability 0.5 is introduced into the network layers; the method also introduces a Dice loss function to represent the training effect and provide a measurement of model performance, measuring the fitting degree or prediction accuracy of the model on the training data by calculating the difference between the model prediction and the ground truth; a smaller loss value indicates that the prediction of the model is closer to the ground truth and hence that the model performs better;
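The Dice loss mentioned above is commonly implemented in its soft form; this sketch assumes the model outputs per-pixel probabilities in [0, 1]:

```python
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss: 1 - 2|P∩G| / (|P| + |G|), averaged over the batch."""
    p, g = pred.flatten(1), target.flatten(1)
    inter = (p * g).sum(dim=1)
    dice = (2 * inter + eps) / (p.sum(dim=1) + g.sum(dim=1) + eps)
    return 1 - dice.mean()

t = torch.ones(1, 1, 8, 8)
print(float(dice_loss(t, t)))  # ≈ 0.0 for a perfect prediction
```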
Finally, acquiring a lung CT image to be segmented, inputting the lung CT image to be segmented into a trained lung tumor segmentation model for tumor segmentation, and completing the segmentation of the lung CT image;
in this embodiment, a combination of a hierarchical attention mechanism and a global-local attention Transformer, HA-TransUnet (Hierarchical Attention Transformer-Unet), is provided; the model performs its main experiments on the GDPH and LUNG1 datasets, and HA-TransUnet (the method of this embodiment) is compared with three other medical image segmentation methods: 1) U-Net; 2) Trans-Unet; 3) Swin-Unet; U-Net is a symmetrical network structure in which the decoder and encoder both comprise four convolution modules, with a final output layer producing the segmentation result; in Trans-Unet the encoder is composed of a ResNet and a Transformer in cascade, where the output of the ResNet is used as the input of the Transformer, and the decoder is consistent with that of U-Net; the encoder and decoder of Swin-Unet are composed of Swin Transformer blocks, each containing a windowed self-attention mechanism, while skip connections are adopted to add multi-scale features, achieving good results in the field of medical image segmentation; the segmentation performance of the different methods on the two datasets is shown in Table 2:
Table 2 comparison of different methods on two data sets
As can be seen from Table 2, on the GDPH dataset the segmentation performance of the HA-TransUnet method is the best: its Dice index is 89.96%, an improvement of 6.13%, 1.60% and 4.13% over U-Net, Trans-Unet and Swin-Unet respectively, and its HD is 59.52, a reduction of 13.28%, 18.40% and 13.90% relative to the three comparison methods respectively; on the LUNG1 dataset the method also obtains good segmentation results, with a Dice of 87.18%, improvements of 5.34%, 5.73% and 1.65% over U-Net, Trans-Unet and Swin-Unet respectively, and reductions in HD of 9.89%, 9.44% and 5.64% respectively; compared with the other three models, the method of this embodiment obtains the best result for the segmentation of lung lesion regions, showing that HA-TransUnet has better segmentation precision than the other networks;
as shown in fig. 9, fig. 9 is a comparison graph of segmentation results of the selected four data under different methods, fig. 9 (a) is an original image, and fig. 9 (b) is a real label; FIGS. 9 (c) to 9 (f) are the segmentation results of U-Net, trans-Unet, swin-Unet, and HA-TransUnet, respectively;
for the data in the first row, the comparison methods generally exhibit over-segmentation, with some boundary tissue mistakenly divided into the tumor region, caused by the blurred boundary between the lung tumor and the surrounding tissue and the low contrast; for the data in the second and fourth rows, where the tumor adheres to surrounding tissue, U-Net under-segments while Trans-Unet and Swin-Unet over-segment, and the comparison methods cannot distinguish tumor from tissue well; for the data in the third row, the comparison methods produce rough segmentation results under heavy background noise; the HA-TransUnet provided by this embodiment, on the one hand, further refines multi-level scale features in the skip connections and, on the other hand, embeds a global-local attention mechanism in the encoder that makes the network pay more attention to the shape and position of the tumor, so the segmentation result is closer to the label and the segmentation accuracy is improved to a certain extent;
This embodiment provides an HA-TransUnet (Hierarchical Attention Transformer-Unet) neural network combined with a hierarchical attention mechanism, which utilizes skip connections at different levels and applies multiple types of attention mechanisms so that the model can attend more accurately to features at different levels, realizing the fusion and utilization of multi-scale features and further improving segmentation accuracy; in addition, the method combines the global-local Transformer with the hierarchical attention mechanism to comprehensively improve the accuracy and performance of image segmentation, and this network structure enables the model to understand medical images more accurately and cope with more complex medical image segmentation tasks.
Example 3
As shown in fig. 10, the present embodiment provides a lung CT image segmentation system based on global and local attention mechanisms, applying the lung CT image segmentation method based on global and local attention mechanisms described in embodiment 1 or 2; the system comprises:
data acquisition and preprocessing unit 301: the method comprises the steps of acquiring a lung CT image data set and preprocessing;
model creation unit 302: for establishing a lung tumor segmentation model based on a global-local attention mechanism;
the lung tumor segmentation model is a U-shaped architecture network and comprises: an encoder and a decoder connected in sequence, skip connections being further arranged between the encoder and the decoder;
The encoder comprises the following components connected in sequence: a convolution module and a global-local Transformer module;
model training unit 303: the method comprises the steps of inputting a preprocessed lung CT image data set into an established lung tumor segmentation model for iterative training, and obtaining a trained lung tumor segmentation model;
lung CT image segmentation unit 304: the method is used for acquiring lung CT images to be segmented, inputting the lung CT images to be segmented into a trained lung tumor segmentation model for tumor segmentation, and completing the segmentation of the lung CT images.
In the specific implementation process, the data acquisition and preprocessing unit 301 first acquires and preprocesses a lung CT image dataset; the model establishing unit 302 establishes a lung tumor segmentation model based on a global-local attention mechanism, the lung tumor segmentation model being a U-shaped architecture network comprising an encoder and a decoder connected in sequence, with skip connections further arranged between the encoder and the decoder, the encoder comprising, connected in sequence, a convolution module and a global-local Transformer module; the model training unit 303 inputs the preprocessed lung CT image dataset into the established lung tumor segmentation model for iterative training to obtain a trained lung tumor segmentation model; finally, the lung CT image segmentation unit 304 acquires a lung CT image to be segmented and inputs it into the trained lung tumor segmentation model for tumor segmentation, completing the segmentation of the lung CT image;
The system in the embodiment realizes deep understanding of the whole semantics and the context information of the image by combining local feature transmission, global semantic information capture and multi-scale feature expression, and can effectively improve the segmentation precision.
The same or similar reference numerals correspond to the same or similar components;
the terms describing the positional relationship in the drawings are merely illustrative, and are not to be construed as limiting the present patent;
it is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (10)

1. A lung CT image segmentation method based on global and local attention mechanisms, comprising the steps of:
s1: acquiring a lung CT image data set and preprocessing;
S2: establishing a lung tumor segmentation model based on a global-local attention mechanism;
the lung tumor segmentation model is a U-shaped architecture network and comprises: an encoder and a decoder connected in sequence, skip connections being further arranged between the encoder and the decoder;
the encoder comprises the following components connected in sequence: a convolution module and a global-local Transformer module;
s3: inputting the preprocessed lung CT image data set into the established lung tumor segmentation model for iterative training, and obtaining a trained lung tumor segmentation model;
s4: and acquiring a lung CT image to be segmented, inputting the lung CT image to be segmented into a trained lung tumor segmentation model for tumor segmentation, and completing the segmentation of the lung CT image.
2. The method for segmenting a lung CT image based on global and local attention mechanisms according to claim 1, wherein in step S1, the acquisition and preprocessing of the lung CT image dataset is specifically:
acquiring a 3D lung CT image data set, and sequentially performing lung region cutting, image resampling and image normalization on the acquired lung CT image data set to acquire a normalized lung CT image data set;
And 2D slicing each lung CT image in the normalized lung CT image data set to obtain a preprocessed lung CT image data set, and finishing preprocessing.
3. The method for segmenting a lung CT image based on global and local attention mechanisms according to claim 2, wherein in the preprocessing in step S1, the image resampling operation is performed by using a nearest-neighbor interpolation method, and the image normalization operation is performed using Z-Score normalization.
4. A method for segmenting a lung CT image based on global and local attention mechanisms according to any one of claims 1 to 3, wherein the lung tumor segmentation model established in step S2 is specifically:
the convolution module in the encoder comprises a first downsampling submodule, a second downsampling submodule and a linear layer which are sequentially connected; the first downsampling submodule and the second downsampling submodule have the same structure and each comprise a plurality of convolution blocks and a maximum pooling layer which are sequentially connected;
each convolution block has the same structure and comprises a first convolution layer, a first group of normalization layers and an activation function layer which are sequentially connected;
the global-local Transformer module in the encoder comprises a plurality of global-local Transformer sub-modules which are connected in sequence;
Each global-local Transformer sub-module has the same structure and comprises a second group normalization layer, a global-local attention network, a third group normalization layer and a multi-layer perceptron which are sequentially connected; the input of the second group normalization layer is connected with the output of the global-local attention network to form a residual summation, and the input of the third group normalization layer is connected with the output of the multi-layer perceptron to form a residual summation;
the global-local attention network comprises a global branch and a local branch which are arranged in parallel;
the global branch comprises a position coding addition point and a multi-head self-attention layer which are connected in sequence; taking the output of the multi-head self-attention layer as the output of a global branch;
the local branches comprise a first convolution branch and a second convolution branch which are arranged in parallel; the first convolution branch and the second convolution branch have the same structure and comprise a second convolution layer and a fourth group of normalization layers which are sequentially connected; adding and connecting the outputs of the first convolution branch and the second convolution branch to be used as the output of a local branch;
adding and connecting the output of the global branch and the output of the local branch to be used as the output of a global-local attention network;
The decoder comprises a first upsampling module, a first fusing module, a second upsampling module, a second fusing module, a third upsampling module, a third fusing module, a fourth upsampling module and an output module which are connected in sequence;
each up-sampling module has the same structure and comprises a plurality of convolution blocks and a bilinear interpolation layer which are connected in sequence;
the input of the first downsampling sub-module in the encoder is in skip connection with the input of the third fusion module in the decoder; the input of the second downsampling sub-module in the encoder is in skip connection with the input of the second fusion module in the decoder; the output of the second downsampling sub-module in the encoder is in skip connection with the input of the first fusion module in the decoder.
5. The method for segmenting lung CT images based on global and local attention mechanisms according to claim 4, wherein in the lung tumor segmentation model of step S2, the skip connections are specifically:
the input of the first downsampling sub-module in the encoder is in skip connection with the input of the third fusion module in the decoder through the channel attention CA module; the input of the second downsampling sub-module in the encoder is in skip connection with the input of the second fusion module in the decoder through the convolutional block attention CBAM module; the output of the second downsampling sub-module in the encoder is in skip connection with the input of the first fusion module in the decoder through the spatial attention SA module.
6. The pulmonary CT image segmentation method based on global and local attention mechanisms according to claim 5, wherein the convolution modules of the encoder include a convolution block 1, a convolution block 2, a convolution block 3, and a maximum pooling layer connected in sequence;
the convolution kernel size of the first convolution layer in the convolution block 2 is 3×3; the convolution kernel size of the first convolution layer in the convolution block 1 and the convolution block 3 is 1×1;
the activation function of the activation function layer in each convolution block is specifically a ReLU activation function;
the convolution kernel size of the maximum pooling layer is 3×3.
7. The method of claim 6, wherein the global-local Transformer module in the encoder comprises 12 global-local Transformer sub-modules connected in sequence;
in the local branches of each global-local Transformer sub-module, the convolution kernel size of the second convolution layer in the first convolution branch is 1×1, and the convolution kernel size of the second convolution layer in the second convolution branch is 3×3.
8. The method for segmenting lung CT images based on global and local attention mechanisms according to claim 7, wherein in step S3, the step of inputting the preprocessed lung CT image dataset into the established lung tumor segmentation model for iterative training, and obtaining the trained lung tumor segmentation model, comprises the following steps:
S3.1: the preprocessed lung CT image dataset is divided, according to the five-fold cross-validation strategy, into a training set and a test set in a ratio of 4:1;
s3.2: inputting the training set into the established lung tumor segmentation model, setting training parameters, and performing iterative training by using an Adam optimizer;
s3.3: calculating a loss value of the trained lung tumor segmentation model by using a preset loss function;
s3.4: repeating the steps S3.2-S3.3, storing the trained lung tumor segmentation model with the minimum loss value as a trained lung tumor segmentation model, and optimizing and adjusting parameters of the trained lung tumor segmentation model by using a test set.
9. The method of claim 8, wherein the loss function in step S3.3 is a Dice loss function.
10. A lung CT image segmentation system based on global and local attention mechanisms, applying a lung CT image segmentation method based on global and local attention mechanisms as claimed in any one of claims 1 to 9, comprising:
data acquisition and preprocessing unit: the method comprises the steps of acquiring a lung CT image data set and preprocessing;
Model building unit: for establishing a lung tumor segmentation model based on a global-local attention mechanism;
the lung tumor segmentation model is a U-shaped architecture network and comprises: an encoder and a decoder connected in sequence, skip connections being further arranged between the encoder and the decoder;
the encoder comprises the following components connected in sequence: a convolution module and a global-local Transformer module;
model training unit: the method comprises the steps of inputting a preprocessed lung CT image data set into an established lung tumor segmentation model for iterative training, and obtaining a trained lung tumor segmentation model;
lung CT image segmentation unit: the method is used for acquiring lung CT images to be segmented, inputting the lung CT images to be segmented into a trained lung tumor segmentation model for tumor segmentation, and completing the segmentation of the lung CT images.
CN202311629043.1A 2023-11-30 2023-11-30 Lung CT image segmentation method based on global and local attention mechanisms Pending CN117649385A (en)

Priority application: CN202311629043.1A, filed 2023-11-30
Publication: CN117649385A, published 2024-03-05 (pending)
Cited by: CN117952976A — High-locking bolt detection system and method thereof (published 2024-04-30)

