CN117522881B - Cardiac image segmentation method based on attention mechanism and multi-level feature fusion - Google Patents
Cardiac image segmentation method based on attention mechanism and multi-level feature fusion
- Publication number
- CN117522881B (application CN202311461592.2A)
- Authority
- CN
- China
- Prior art keywords
- layer
- convolution
- feature map
- convolution layer
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10088—Magnetic resonance imaging [MRI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30048—Heart; Cardiac
Abstract
A cardiac image segmentation method based on an attention mechanism and multi-level feature fusion relates to the technical field of image segmentation and adopts a symmetrical encoder-decoder structure. A dense cascade module is designed in which dilated (atrous) convolution layers are cascaded in a dense manner and the original input of the module is combined with the output after feature extraction, strengthening the propagation of image feature information. A position self-attention module is introduced to replace the bottleneck structure of the model, so that global information of the input can be fused, effectively enhancing the robustness of the features and the local connections between them. A channel attention module is added in the skip connections to weight the feature maps across channels and select useful ones. A multi-level gated fusion module is added in the decoder, which automatically adjusts the contribution of feature maps from different levels, making full use of multi-level information to achieve better prediction.
Description
Technical Field
The invention relates to the technical field of image segmentation, and in particular to a cardiac image segmentation method based on an attention mechanism and multi-level feature fusion.
Background
In recent years, deep learning methods have had a profound effect on the field of cardiac image segmentation, making it more accurate, efficient and adaptive while reducing manual workload; cardiac MRI provides images of high resolution, high contrast and high signal-to-noise ratio in arbitrary orientations. From the segmentation result, indexes such as myocardial mass, myocardial thickness, ejection fraction and ventricular volume can be effectively derived, so accurate segmentation is particularly important. However, accurate segmentation remains a challenge: non-uniformity of the magnetic field strength easily produces artifacts during imaging that blur boundaries, and the anatomy of the heart is complex.
With the rise of deep learning and the advent of convolutional neural networks, speed, high precision and high reliability have become the criteria for image segmentation. Among these methods, U-Net and its variants have been used by many researchers for cardiac MRI segmentation; the appearance of U-Net was the most significant development and has become the basis of medical image segmentation. Further improvement is still needed: for example, U-Net cannot integrate global information, and downsampling may lose spatial detail. This is particularly disadvantageous for medical images, whose segmentation typically requires extensive contextual detail.
Disclosure of Invention
In order to overcome the shortcomings of the above technology, the invention provides a cardiac image segmentation method based on an attention mechanism and multi-level feature fusion, which fuses global information of the input and can effectively enhance the robustness of features and the local connections between them.
The technical solution adopted to overcome the above technical problems is as follows:
A cardiac image segmentation method based on an attention mechanism and multi-level feature fusion comprises the following steps:
a) Acquiring a cardiac MRI image dataset X = {X1, X2, …, Xi, …, XN}, wherein Xi is the i-th cardiac MRI image, i ∈ {1, …, N}, and N is the number of cardiac MRI images;
b) Preprocessing the cardiac MRI image data set X to obtain a preprocessed data set X';
c) Dividing the preprocessed data set X' into a training set, a verification set and a test set;
d) Slicing each preprocessed cardiac MRI image in the training set along the Z axis to obtain M slice images, wherein the i-th slice image is Fi, i ∈ {1, …, M};
e) Establishing a segmentation network model formed by an encoder and a decoder;
f) Inputting the i-th slice image Fi into the encoder of the segmentation network model and outputting the feature map A5-i;
g) Inputting the feature map A5-i into the decoder of the segmentation network model and outputting the segmentation result image Pi;
h) Training a segmentation network model to obtain an optimized segmentation network model;
i) Slicing each preprocessed cardiac MRI image in the test set along the Z axis to obtain Q slice images, wherein the i-th slice image is Fi′, i ∈ {1, …, Q};
j) Inputting the i-th slice image Fi′ into the optimized segmentation network model and outputting the segmentation result image Pi′.
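The slicing of steps d) and i) amounts to splitting each 3D volume along its Z axis. A minimal numpy sketch; the helper name and toy volume shape are illustrative assumptions, not part of the patent:

```python
import numpy as np

def slice_along_z(volume):
    """Split a 3D volume of shape (Z, H, W) into a list of 2D slices,
    as in steps d) and i). Helper name and toy shape are illustrative."""
    return [volume[z] for z in range(volume.shape[0])]

# A toy 4-slice volume stands in for a preprocessed cardiac MRI image.
vol = np.arange(4 * 6 * 6, dtype=np.float32).reshape(4, 6, 6)
slices = slice_along_z(vol)
```

Each 2D slice is then fed to the network independently, which is why the slice counts M and Q are simply the total number of Z-planes across the respective sets.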
Preferably, in step a) the N cardiac MRI images are acquired from the public Automated Cardiac Diagnosis Challenge (ACDC) dataset.
Further, step b) comprises the steps of:
b-1) Converting the i-th cardiac MRI image Xi into a Numpy array using the GetArrayFromImage() function of the SimpleITK library, and cutting the converted array into V 2D slices along the Z-axis direction;
b-2) Resampling each 2D slice to obtain V new 2D images with a pixel spacing of (1.5, 1.5), center-cropping each new 2D image to obtain V cropped 2D images of size 384×384, stacking the cropped 2D images to restore a 3D Numpy array, and converting the 3D Numpy array back into a cardiac MRI image using the GetImageFromArray() function of the SimpleITK library;
b-3) Rotating the cardiac MRI image by 90° clockwise or counter-clockwise, or flipping it along the horizontal or vertical axis, each with probability 0.5, to obtain a rotated image, and performing a normalization operation on the rotated image to obtain the preprocessed i-th cardiac MRI image X′i;
b-4) The N preprocessed cardiac MRI images constitute the preprocessed dataset X′ = {X′1, X′2, …, X′i, …, X′N}.
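The center crop and normalization of steps b-2) and b-3) can be sketched in numpy. The zero-mean, unit-variance form of the normalization and the toy 10×10 slice are assumptions; the patent only states that a "normalization operation" is performed:

```python
import numpy as np

def center_crop(img, size):
    """Center-crop a 2D slice to size x size, cf. the 384x384 crop of step b-2)."""
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def normalize(img):
    """Zero-mean, unit-variance normalization, one common reading of step b-3)."""
    return (img - img.mean()) / (img.std() + 1e-8)

rng = np.random.default_rng(0)
slice2d = rng.normal(size=(10, 10))   # toy stand-in for a resampled slice
out = normalize(center_crop(slice2d, 6))
```

In the actual pipeline the crop size would be 384 and the input a resampled MRI slice; the logic is unchanged.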
Preferably, in step c) the preprocessed dataset X′ is divided into a training set, a validation set and a test set in a ratio of 7:1:2.
Preferably, M in step d) takes a value of 1312.
Further, step f) comprises the steps of:
f-1) The encoder of the segmentation network model comprises a first dense cascade module, a first maximum pooling layer, a second dense cascade module, a second maximum pooling layer, a third dense cascade module, a third maximum pooling layer, a fourth dense cascade module, a fourth maximum pooling layer and a position self-attention module;
f-2) The first dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the i-th slice image Fi is input into the first dense cascade module, which outputs the feature map A1-i;
f-3) The feature map A1-i is input into the first maximum pooling layer of the encoder, which outputs the feature map A′1-i;
f-4) The second dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A′1-i is input into the second dense cascade module, which outputs the feature map A2-i;
f-5) The feature map A2-i is input into the second maximum pooling layer of the encoder, which outputs the feature map A′2-i;
f-6) The third dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A′2-i is input into the third dense cascade module, which outputs the feature map A3-i;
f-7) The feature map A3-i is input into the third maximum pooling layer of the encoder, which outputs the feature map A′3-i;
f-8) The fourth dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A′3-i is input into the fourth dense cascade module, which outputs the feature map A4-i;
f-9) The feature map A4-i is input into the fourth maximum pooling layer of the encoder, which outputs the feature map A′4-i;
f-10) The position self-attention module of the encoder consists of a first, second, third and fourth convolution layer, a first, second, third and fourth bilinear interpolation layer and a softmax layer. The feature map A′4-i is input sequentially into the first convolution layer and the first bilinear interpolation layer to obtain the feature map Q; A′4-i is input sequentially into the second convolution layer and the second bilinear interpolation layer to obtain the feature map K; A′4-i is input sequentially into the third convolution layer and the third bilinear interpolation layer to obtain the feature map V. The product of Q and K is input into the softmax layer, which outputs the attention map QK; QK is multiplied with V to obtain the feature map Att; Att is input into the fourth bilinear interpolation layer for bilinear interpolation and then into the fourth convolution layer, which outputs the feature map A5-i.
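The core of step f-10) is a standard Q/K/V position attention over spatial locations. A minimal numpy sketch, with the 1×1 convolutions replaced by matrix products on a flattened map and the bilinear interpolation layers omitted; the 1/√C scaling is a common convention added here for numerical stability, not stated in the patent:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_self_attention(feat, wq, wk, wv):
    """Simplified position self-attention over a flattened feature map.

    feat is an (N, C) array of N = H*W positions with C channels; wq, wk, wv
    are (C, C) matrices standing in for the module's 1x1 convolutions."""
    q, k, v = feat @ wq, feat @ wk, feat @ wv
    attn = softmax(q @ k.T / np.sqrt(feat.shape[1]))  # (N, N) position affinities
    return attn @ v                                   # aggregate global context

rng = np.random.default_rng(1)
f = rng.normal(size=(16, 8))                  # a flattened 4x4 map, 8 channels
ws = [0.1 * rng.normal(size=(8, 8)) for _ in range(3)]
out = position_self_attention(f, *ws)
```

Because every output position attends to every input position, this is what lets the bottleneck fuse global information that plain convolutions cannot reach.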
Preferably, in steps f-2), f-4), f-6) and f-8) each dense cascade module has the same configuration: the first convolution layer has a kernel size of 3×3 and a dilation rate of 1, the second a kernel size of 3×3 and a dilation rate of 3, the third a kernel size of 3×3 and a dilation rate of 5, and the fourth a kernel size of 3×3 and a dilation rate of 1. In steps f-3), f-5), f-7) and f-9) each maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2. The kernel sizes of the first, second, third and fourth convolution layers of the position self-attention module in step f-10) are all 1×1.
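The benefit of the 1-3-5 dilation pattern can be checked with a little receptive-field arithmetic; the sketch below assumes stride-1 convolutions within a module and is an illustration, not text from the patent:

```python
def effective_kernel(k, d):
    """Effective kernel size of a dilated convolution: (k - 1) * d + 1."""
    return (k - 1) * d + 1

def stacked_receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions,
    given as (kernel_size, dilation_rate) pairs."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

# The four 3x3 layers of one dense cascade module: dilation rates 1, 3, 5, 1.
rf = stacked_receptive_field([(3, 1), (3, 3), (3, 5), (3, 1)])  # -> 21
```

Four plain 3×3 layers would cover only a 9×9 window; the dilated cascade reaches 21×21 with the same parameter count, which is what the description means by "enlarging the receptive field".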
Further, step g) comprises the steps of:
g-1) The decoder of the segmentation network model is composed of a first, second, third and fourth double-convolution module, a first, second, third and fourth dual-channel attention module, a first, second, third, fourth and fifth upsampling layer, and a multi-level gated fusion module;
g-2) The feature map A5-i is input into the fifth upsampling layer of the decoder, which outputs the feature map C5-i;
g-3) The fourth dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer. The feature map A′4-i is input sequentially into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer, which output the feature map Ga; A′4-i is input sequentially into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer, which output the feature map Gm. Ga and Gm are added and multiplied element-wise with A′4-i to obtain the feature map A″4-i. The feature map C5-i is upsampled to obtain the feature map C′5-i, and A″4-i and C′5-i are concatenated to obtain the feature map D4-i;
g-4) The fourth double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D4-i is input into the fourth double-convolution module, which outputs the feature map C4-i;
g-5) The third dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer. The feature map A′3-i is input sequentially into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer, which output the feature map Ga′; A′3-i is input sequentially into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer, which output the feature map Gm′. Ga′ and Gm′ are added and multiplied element-wise with A′3-i to obtain the feature map A″3-i. The feature map C4-i is input into the fourth upsampling layer of the decoder, which outputs the feature map C′4-i, and A″3-i and C′4-i are concatenated to obtain the feature map D3-i;
g-6) The third double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D3-i is input into the third double-convolution module, which outputs the feature map C3-i;
g-7) The second dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer. The feature map A′2-i is input sequentially into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer, which output the feature map Ga″; A′2-i is input sequentially into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer, which output the feature map Gm″. Ga″ and Gm″ are added and multiplied element-wise with A′2-i to obtain the feature map A″2-i. The feature map C3-i is input into the third upsampling layer of the decoder, which outputs the feature map C′3-i, and A″2-i and C′3-i are concatenated to obtain the feature map D2-i;
g-8) The second double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D2-i is input into the second double-convolution module, which outputs the feature map C2-i;
g-9) The first dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer. The feature map A′1-i is input sequentially into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer, which output the feature map Ga‴; A′1-i is input sequentially into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer, which output the feature map Gm‴. Ga‴ and Gm‴ are added and multiplied element-wise with A′1-i to obtain the feature map A″1-i. The feature map C2-i is input into the second upsampling layer of the decoder, which outputs the feature map C′2-i, and A″1-i and C′2-i are concatenated to obtain the feature map D1-i;
g-10) The first double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D1-i is input into the first double-convolution module, which outputs the feature map C1-i;
g-11) The multi-level gated fusion module of the decoder consists of a first, second and third upsampling layer, a first convolution layer, a Sigmoid layer and a second convolution layer. The feature map C2-i is input into the first upsampling layer to obtain the feature map C′2-i; C3-i is input into the second upsampling layer to obtain C′3-i; C4-i is input into the third upsampling layer to obtain C′4-i. The feature maps C′2-i, C′3-i and C′4-i are concatenated and input sequentially into the first convolution layer and the Sigmoid layer, which output the weight matrix G. Singular value decomposition of G yields the left singular vector matrix W1, the diagonal matrix W2 and the right singular vector matrix W3. C2-i is multiplied with W1 to obtain the feature map W1′; C3-i is multiplied with W2 to obtain W2′; C4-i is multiplied with W3 to obtain W3′. W1′, W2′ and W3′ are concatenated and input into the second convolution layer, which outputs the feature map Z;
g-12) The feature map C1-i is input into the first upsampling layer, which outputs the feature map C′1-i; C′1-i is added to the feature map Z, input into a convolution layer with a kernel size of 1×1, and the segmentation result image Pi is output.
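The dual-channel attention of steps g-3), g-5), g-7) and g-9) reduces to: pool globally in two ways, gate each pooled vector with a sigmoid, add the gates, and re-weight the channels. A minimal numpy sketch, folding each convolution/ReLU pair into a single matrix for brevity (an assumption, not the patent's exact layer layout):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_channel_attention(feat, w_avg, w_max):
    """Toy dual-path channel attention. feat is (C, H, W); w_avg and w_max
    are (C, C) matrices standing in for the two convolution branches."""
    g_a = sigmoid(feat.mean(axis=(1, 2)) @ w_avg)  # global average pooling path
    g_m = sigmoid(feat.max(axis=(1, 2)) @ w_max)   # global maximum pooling path
    gate = g_a + g_m                               # add the two channel gates
    return feat * gate[:, None, None]              # element-wise re-weighting

rng = np.random.default_rng(2)
x = rng.normal(size=(8, 4, 4))                     # toy 8-channel feature map
w1, w2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
y = dual_channel_attention(x, w1, w2)
```

Each gate lies in (0, 1), so their sum scales every channel by a factor in (0, 2): informative channels are amplified and uninformative ones suppressed before the skip-connection concatenation.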
Preferably, the convolution kernels of the first, second, third and fourth convolution layers of each dual-channel attention module in steps g-3), g-5), g-7) and g-9) are all 1×1, with stride 1 and padding 1; the kernel sizes of the first and second convolution layers of each double-convolution module in steps g-4), g-6), g-8) and g-10) are 3×3, with stride 1 and padding 1.
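Step g-11)'s use of singular value decomposition on the learned weight matrix can be sketched with numpy; the square toy shapes and the omission of the surrounding convolutions and upsampling are simplifying assumptions:

```python
import numpy as np

def gated_fusion(c2, c3, c4, g):
    """Toy multi-level gated fusion: decompose the learned gate matrix g
    by SVD and use its factors to re-weight the three level feature maps.
    All arrays are (N, N) here; convolutions and upsampling are omitted."""
    u, s, vt = np.linalg.svd(g)
    w1, w2, w3 = u, np.diag(s), vt           # left, diagonal, right factors
    parts = [c2 @ w1, c3 @ w2, c4 @ w3]      # per-level re-weighting
    return np.concatenate(parts, axis=-1)    # splice before the final conv

rng = np.random.default_rng(3)
gate = 1.0 / (1.0 + np.exp(-rng.normal(size=(4, 4))))  # sigmoid-like weights
levels = [rng.normal(size=(4, 4)) for _ in range(3)]
z = gated_fusion(*levels, gate)
```

Because the gate matrix is produced from the concatenated multi-level features themselves, its SVD factors change per input, which is how the module dynamically controls the proportion contributed by each level.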
Further, step h) comprises the steps of:
h-1) The total loss Ltotal is calculated by the formula Ltotal = αLCrossEntropy + (1-α)LDice, wherein LCrossEntropy is the cross-entropy loss function, LDice is the Dice loss function, and α is a weight;
h-2) training the segmentation network model with the total loss L total using the Adam optimizer to obtain the optimized segmentation network model, wherein during training the batch size is set to 10, the maximum number of epochs is set to 200, the learning rate lr is 0.01, and α is set to 0.05.
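For illustration only, the combined loss of step h-1) may be sketched in NumPy as follows; the function names are ours, and the inputs are assumed to be softmax probabilities and one-hot labels of shape (classes, height, width).

```python
import numpy as np

def cross_entropy_loss(probs, target, eps=1e-7):
    # probs, target: (C, H, W); target is one-hot, probs are softmax outputs
    return -np.mean(np.sum(target * np.log(probs + eps), axis=0))

def dice_loss(probs, target, eps=1e-7):
    # soft Dice loss averaged over the C classes
    inter = np.sum(probs * target, axis=(1, 2))
    union = np.sum(probs, axis=(1, 2)) + np.sum(target, axis=(1, 2))
    return 1.0 - np.mean((2.0 * inter + eps) / (union + eps))

def total_loss(probs, target, alpha=0.05):
    # L_total = alpha * L_CrossEntropy + (1 - alpha) * L_Dice, with alpha = 0.05
    return alpha * cross_entropy_loss(probs, target) + (1.0 - alpha) * dice_loss(probs, target)
```

A perfect prediction drives both terms, and hence the total loss, towards zero.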
The beneficial effects of the invention are as follows:
(1) A symmetrical encoder-decoder structure is employed. The convolution module in the network model is improved and a dense cascade module is proposed. A position self-attention mechanism is introduced to replace the bottom structure of the model, and a dual-path channel attention module is added in the skip connections. A multi-level gated fusion module is added in the decoder to fuse features from different stages.
(2) The encoder consists of four dense cascade modules, in which dilated convolution layers with dilation rates of 1, 3 and 5 are cascaded in a dense manner and the original input of the module is combined with the output after feature extraction, which enlarges the receptive field and enhances the transmission of image feature information.
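As a rough single-channel NumPy sketch of this idea (not the patented implementation: channel concatenation is simplified to summation, and the kernels are placeholders), the dense cascade of dilated convolutions with rates 1, 3 and 5 might look like:

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    # single-channel 2D convolution with dilation `rate` and zero 'same' padding
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            di, dj = i * rate, j * rate
            out += kernel[i, j] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def dense_cascade(x, kernels):
    # kernels: four 3x3 kernels; the first three use dilation rates 1, 3, 5
    feats = [x]
    for kernel, rate in zip(kernels[:3], (1, 3, 5)):
        inp = np.sum(feats, axis=0)  # dense connection (concat simplified to sum)
        feats.append(np.maximum(dilated_conv2d(inp, kernel, rate), 0.0))  # conv + ReLU
    # final 3x3 convolution over the input combined with all extracted features
    return dilated_conv2d(np.sum(feats, axis=0), kernels[3], 1)
```

Because the padding grows with the dilation rate, every stage keeps the spatial size of its input while widening the receptive field.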
(3) The channel attention module weights the feature maps between channels and selects useful feature maps. The position self-attention module in the bottom structure of the model fuses global information of the input, which effectively enhances the robustness of the features and the local connections between them, enabling better predictions.
(4) The multi-level gated fusion module automatically learns and adjusts the contribution of the feature map of each level; the weight map obtained through dynamic learning controls the proportion of feature information from each level, and the features of all levels are then fused.
Drawings
FIG. 1 is a block diagram of a split network of the present invention;
FIG. 2 is a block diagram of a dense cascading module of the present invention;
FIG. 3 is a block diagram of a dual channel attention module of the present invention;
FIG. 4 is a block diagram of a position self-attention module of the present invention;
Fig. 5 is a block diagram of a multi-level gated fusion module of the present invention.
Detailed Description
The invention is further described with reference to fig. 1 to 5.
A heart image segmentation method based on an attention mechanism and multi-level feature fusion comprises the following steps:
a) A cardiac MRI image dataset X, x= { X 1,X2,...,Xi,...,XN }, where X i is the i-th cardiac MRI image, i e { 1..once., N }, N is the number of cardiac MRI images.
B) Preprocessing the cardiac MRI image data set X to obtain a preprocessed data set X'.
C) The preprocessed data set X' is divided into a training set, a verification set and a test set.
D) Slicing each preprocessed cardiac MRI image in the training set along the Z axis to obtain M slice images, where the i-th slice image is F i, i ∈ {1, ..., M}.
E) A partitioning network model is built consisting of an encoder and a decoder.
F) The i-th slice image F i is input to the encoder of the segmentation network model, and the feature map a 5-i is output.
G) The feature map a 5-i is input to a decoder of the segmentation network model, and the segmentation result image P i is output.
H) Training the segmentation network model to obtain an optimized segmentation network model.
I) Slicing each preprocessed cardiac MRI image in the test set along the Z axis to obtain Q slice images, where the i-th slice image is F i′, i ∈ {1, ..., Q}.
J) The i-th slice image F i 'is input to the optimized segmentation network model, and the segmentation result image P i' is output.
A symmetrical encoder-decoder structure is employed. The convolution module in the network model is improved and a dense cascade module is provided, in which dilated convolution layers are cascaded in a dense manner and the original input of the module is combined with the output after feature extraction, enhancing the transmission of image feature information. A position self-attention module is introduced to replace the bottom structure of the model; it fuses global information of the input and effectively enhances the robustness of the features and the local connections between them. A channel attention module is added in the skip connections to weight the feature maps between channels and select useful feature maps. A multi-level gated fusion module is added in the decoder, which automatically adjusts the contributions of the feature maps of different levels and makes full use of multi-level information.
In one embodiment of the invention, the N cardiac MRI images in step a) are acquired from the Automated Cardiac Diagnosis Challenge (ACDC) public dataset.
In one embodiment of the invention, step b) comprises the steps of:
b-1) converting the i-th cardiac MRI image X i into a Numpy array using the GetArrayFromImage() function of the SimpleITK library, and cutting the converted image into V 2D slices along the Z axis.
B-2) resampling each 2D slice to obtain V new 2D images with a pixel spacing of (1.5, 1.5), center-cropping each new 2D image to obtain V cropped 2D images of size 384×384, stacking the cropped 2D images to restore a 3D Numpy array, and converting the 3D Numpy array back into a cardiac MRI image using the GetImageFromArray() function of the SimpleITK library.
B-3) rotating the cardiac MRI image 90 degrees clockwise or counterclockwise, or flipping it along the horizontal or vertical axis, each with probability 0.5, to obtain a rotated image, and performing a normalization operation on the rotated image to obtain the preprocessed i-th cardiac MRI image X′ i.
B-4) the N preprocessed cardiac MRI images form the preprocessed dataset X′, X′ = {X′ 1, X′ 2, ..., X′ i, ..., X′ N}.
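Steps b-1) to b-3) can be approximated with the following NumPy sketch; nearest-neighbour resampling stands in for the resampling operation, and all function names are illustrative rather than those of the original implementation.

```python
import numpy as np

def resample_nn(img, old_spacing, new_spacing=(1.5, 1.5)):
    # nearest-neighbour resampling of a 2D slice to the target pixel spacing
    h = int(round(img.shape[0] * old_spacing[0] / new_spacing[0]))
    w = int(round(img.shape[1] * old_spacing[1] / new_spacing[1]))
    rows = np.minimum((np.arange(h) * new_spacing[0] / old_spacing[0]).astype(int), img.shape[0] - 1)
    cols = np.minimum((np.arange(w) * new_spacing[1] / old_spacing[1]).astype(int), img.shape[1] - 1)
    return img[np.ix_(rows, cols)]

def center_crop(img, size=384):
    # zero-pad first so slices smaller than `size` remain croppable
    ph = max(0, size - img.shape[0]); pw = max(0, size - img.shape[1])
    img = np.pad(img, ((ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2)))
    h0 = (img.shape[0] - size) // 2
    w0 = (img.shape[1] - size) // 2
    return img[h0:h0 + size, w0:w0 + size]

def augment_and_normalize(img, rng):
    # flip along an axis or rotate 90 degrees, each with probability 0.5, then z-score
    if rng.random() < 0.5:
        img = np.flip(img, axis=rng.integers(2))
    if rng.random() < 0.5:
        img = np.rot90(img, k=1 if rng.random() < 0.5 else -1)
    return (img - img.mean()) / (img.std() + 1e-8)
```

In practice the resampling would use bilinear or spline interpolation (e.g. via SimpleITK); the nearest-neighbour variant keeps the sketch dependency-free.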
In one embodiment of the present invention, the preprocessed data set X' is divided into a training set, a validation set and a test set in a ratio of 7:1:2 in step c).
In one embodiment of the invention, M in step d) takes on a value of 1312.
In one embodiment of the invention, step f) comprises the steps of:
f-1) the encoder of the segmentation network model is composed of a first dense cascade module, a first maximum pooling layer, a second dense cascade module, a second maximum pooling layer, a third dense cascade module, a third maximum pooling layer, a fourth dense cascade module, a fourth maximum pooling layer and a position self-attention module.
f-2) the first dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the i-th slice image F i is input into the first dense cascade module, which outputs the feature map A 1-i.
f-3) inputting the feature map A 1-i into the first maximum pooling layer of the encoder, and outputting the feature map A 1′-i. f-4) the second dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A 1′-i is input into the second dense cascade module, which outputs the feature map A 2-i.
f-5) inputting the feature map A 2-i into the second maximum pooling layer of the encoder, and outputting the feature map A 2′-i. f-6) the third dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A 2′-i is input into the third dense cascade module, which outputs the feature map A 3-i.
f-7) inputting the feature map A 3-i into the third maximum pooling layer of the encoder, and outputting the feature map A 3′-i. f-8) the fourth dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A 3′-i is input into the fourth dense cascade module, which outputs the feature map A 4-i.
f-9) inputting the feature map A 4-i into the fourth maximum pooling layer of the encoder, and outputting the feature map A 4′-i. f-10) the position self-attention module of the encoder consists of a first convolution layer, a second convolution layer, a third convolution layer, a first bilinear interpolation layer, a second bilinear interpolation layer, a third bilinear interpolation layer, a softmax layer, a fourth bilinear interpolation layer and a fourth convolution layer; the feature map A 4′-i is input into the first convolution layer, and the output is input into the first bilinear interpolation layer for bilinear interpolation to obtain the feature map Q; the feature map A 4′-i is input into the second convolution layer, and the output is input into the second bilinear interpolation layer for bilinear interpolation to obtain the feature map K; the feature map A 4′-i is input into the third convolution layer, and the output is input into the third bilinear interpolation layer for bilinear interpolation to obtain the feature map V; the feature map Q is multiplied by the feature map K and the product is input into the softmax layer, which outputs the feature map QK; the feature map QK is multiplied by the feature map V to obtain the feature map Att; the feature map Att is input into the fourth bilinear interpolation layer for bilinear interpolation and then into the fourth convolution layer to obtain the feature map A 5-i.
In this embodiment, in step f-2), the first convolution layer of the first dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-3), the first maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-4), the convolution layers of the second dense cascade module have the same kernel sizes and dilation rates of 1, 3, 5 and 1 as in step f-2); in step f-5), the second maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-6), the convolution layers of the third dense cascade module have the same kernel sizes and dilation rates of 1, 3, 5 and 1; in step f-7), the third maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-8), the convolution layers of the fourth dense cascade module have the same kernel sizes and dilation rates of 1, 3, 5 and 1; in step f-9), the fourth maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-10), the convolution kernel sizes of the first, second, third and fourth convolution layers of the position self-attention module are all 1×1.
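For illustration, the position self-attention computation of step f-10) may be sketched in NumPy. This is a simplified single-sample sketch, not the claimed implementation: the 1×1 convolutions are reduced to channel-mixing matrices (Wq, Wk, Wv and Wo are our names), and the bilinear interpolation layers are approximated by strided subsampling and nearest-neighbour upsampling.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_self_attention(x, Wq, Wk, Wv, Wo, scale=2):
    # x: (C, H, W); Wq, Wk, Wv, Wo: (C, C) matrices standing in for 1x1 convolutions
    C, H, W = x.shape
    small = x[:, ::scale, ::scale]          # stand-in for interpolation to lower resolution
    h, w = small.shape[1], small.shape[2]
    n = h * w
    q = Wq @ small.reshape(C, n)            # query projection
    k = Wk @ small.reshape(C, n)            # key projection
    v = Wv @ small.reshape(C, n)            # value projection
    att = softmax(q.T @ k, axis=-1)         # (n, n) position affinity map QK
    out = v @ att.T                         # aggregate values by attention weights
    out = out.reshape(C, h, w).repeat(scale, axis=1).repeat(scale, axis=2)[:, :H, :W]
    return (Wo @ out.reshape(C, H * W)).reshape(C, H, W)  # final 1x1 projection
```

Each output position is thus a weighted combination of all positions, which is how the module injects global context into the bottleneck features.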
In one embodiment of the invention, step g) comprises the steps of:
g-1) the decoder of the segmentation network model is composed of a first double-convolution module, a second double-convolution module, a third double-convolution module, a fourth double-convolution module, a first dual-channel attention module, a second dual-channel attention module, a third dual-channel attention module, a fourth dual-channel attention module, a first upsampling layer, a second upsampling layer, a third upsampling layer, a fourth upsampling layer, a fifth upsampling layer and a multi-level gated fusion module.
g-2) the feature map A 5-i is input into the fifth upsampling layer of the decoder, which outputs the feature map C 5-i. g-3) the fourth dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 4′-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the fourth dual-channel attention module, which output the feature map G a; the feature map A 4′-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the fourth dual-channel attention module, which output the feature map G m; the feature map G a and the feature map G m are added and the sum is multiplied element-wise by the feature map A 4′-i to obtain the feature map A 4″-i; the feature map C 5-i is input into the fifth upsampling layer of the decoder, which outputs the feature map C 5′-i, and the feature map A 4″-i and the feature map C 5′-i are spliced to obtain the feature map D 4-i. g-4) the fourth double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D 4-i is input into the fourth double-convolution module, which outputs the feature map C 4-i.
g-5) the third dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 3′-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the third dual-channel attention module, which output the feature map G a′; the feature map A 3′-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the third dual-channel attention module, which output the feature map G m′; the feature map G a′ and the feature map G m′ are added and the sum is multiplied element-wise by the feature map A 3′-i to obtain the feature map A 3″-i; the feature map C 4-i is input into the fourth upsampling layer of the decoder, which outputs the feature map C 4′-i, and the feature map A 3″-i and the feature map C 4′-i are spliced to obtain the feature map D 3-i.
The third double-convolution module of the g-6) decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 3-i is input into the third double-convolution module and output to obtain a feature map C 3-i.
g-7) the second dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 2′-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the second dual-channel attention module, which output the feature map G a″; the feature map A 2′-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the second dual-channel attention module, which output the feature map G m″; the feature map G a″ and the feature map G m″ are added and the sum is multiplied element-wise by the feature map A 2′-i to obtain the feature map A 2″-i; the feature map C 3-i is input into the third upsampling layer of the decoder, which outputs the feature map C 3′-i, and the feature map A 2″-i and the feature map C 3′-i are spliced to obtain the feature map D 2-i.
G-8) the second double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 2-i is input into the second double-convolution module and output to obtain a feature map C 2-i.
g-9) the first dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 1′-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the first dual-channel attention module, which output the feature map G a‴; the feature map A 1′-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the first dual-channel attention module, which output the feature map G m‴; the feature map G a‴ and the feature map G m‴ are added and the sum is multiplied element-wise by the feature map A 1′-i to obtain the feature map A 1″-i; the feature map C 2-i is input into the second upsampling layer of the decoder, which outputs the feature map C 2′-i, and the feature map A 1″-i and the feature map C 2′-i are spliced to obtain the feature map D 1-i.
G-10) the first double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 1-i is input into the first double-convolution module and output to obtain the feature map C 1-i.
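The dual-channel attention used in steps g-3), g-5), g-7) and g-9) may be sketched as follows; this single-sample NumPy sketch reduces the four 1×1 convolutions to channel-mixing matrices W1 to W4 (our names) and omits biases, so it illustrates the structure rather than the claimed implementation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dual_channel_attention(x, W1, W2, W3, W4):
    # x: (C, H, W); W1..W4 stand in for the four 1x1 convolution layers
    avg = x.mean(axis=(1, 2))                     # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))                       # global maximum pooling -> (C,)
    g_a = sigmoid(W2 @ np.maximum(W1 @ avg, 0.0)) # avg branch: conv-ReLU-conv-Sigmoid
    g_m = sigmoid(W4 @ np.maximum(W3 @ mx, 0.0))  # max branch: conv-ReLU-conv-Sigmoid
    gate = (g_a + g_m)[:, None, None]             # per-channel weights, broadcast over H, W
    return x * gate                               # element-wise reweighting of the input
```

Since each sigmoid output lies in (0, 1), the summed gate scales every channel by a factor in (0, 2), suppressing uninformative channels and emphasising useful ones.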
g-11) the multi-level gated fusion module of the decoder consists of a first upsampling layer, a second upsampling layer, a third upsampling layer, a first convolution layer, a Sigmoid layer and a second convolution layer; the feature map C 2-i is input into the first upsampling layer, which outputs the feature map C 2-i′; the feature map C 3-i is input into the second upsampling layer, which outputs the feature map C 3-i′; the feature map C 4-i is input into the third upsampling layer, which outputs the feature map C 4-i′; the feature maps C 2-i′, C 3-i′ and C 4-i′ are spliced and the result is sequentially input into the first convolution layer and the Sigmoid layer, which output the weight matrix G; singular value decomposition (SVD) is performed on the weight matrix G to obtain a left singular vector matrix W1, a diagonal matrix W2 and a right singular vector matrix W3; the feature map C 2-i is multiplied by the left singular vector matrix W1 to obtain the feature map W1′, the feature map C 3-i is multiplied by the diagonal matrix W2 to obtain the feature map W2′, and the feature map C 4-i is multiplied by the right singular vector matrix W3 to obtain the feature map W3′; the feature maps W1′, W2′ and W3′ are spliced and input into the second convolution layer, which outputs the feature map Z.
g-12) inputting the feature map C 1-i into the first upsampling layer, which outputs the feature map C 1-i′; the feature map C 1-i′ is added to the feature map Z, the sum is input into a convolution layer with a convolution kernel size of 1×1, and the segmentation result image P i is output.
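A minimal sketch of the multi-level gated fusion of step g-11), assuming square single-channel feature maps already upsampled to a common size so that the SVD factors of the weight matrix G can multiply them directly; all names are illustrative and the 1×1 convolutions are reduced to channel-mixing matrices.

```python
import numpy as np

def gated_fusion(c2, c3, c4, conv_w, out_w):
    # c2, c3, c4: (S, S) single-channel maps; conv_w, out_w: (1, 3) channel mixers
    stacked = np.stack([c2, c3, c4])                     # splice along channels -> (3, S, S)
    S = c2.shape[0]
    # 1x1 convolution followed by Sigmoid -> dynamically learned weight matrix G
    g = 1.0 / (1.0 + np.exp(-(conv_w @ stacked.reshape(3, S * S)).reshape(S, S)))
    U, s, Vt = np.linalg.svd(g)                          # G -> W1, diag(W2), W3
    w1p = U @ c2                                         # weight each level by one factor
    w2p = np.diag(s) @ c3
    w3p = Vt @ c4
    fused = np.stack([w1p, w2p, w3p]).reshape(3, S * S)
    return (out_w @ fused).reshape(S, S)                 # second 1x1 convolution -> Z
```

The gate G is learned from all three levels jointly, so each level's contribution to Z adapts to the input rather than being fixed.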
In this embodiment, in step g-3), the convolution kernels of the first, second, third and fourth convolution layers of the fourth dual-channel attention module are all 1×1, with stride 1 and padding 1; in step g-4), the convolution kernels of the first and second convolution layers of the fourth double-convolution module are all 3×3, with stride 1 and padding 1; in step g-5), the convolution kernels of the first, second, third and fourth convolution layers of the third dual-channel attention module are all 1×1, with stride 1 and padding 1; in step g-6), the convolution kernels of the first and second convolution layers of the third double-convolution module are all 3×3, with stride 1 and padding 1; in step g-7), the convolution kernels of the first, second, third and fourth convolution layers of the second dual-channel attention module are all 1×1, with stride 1 and padding 1; in step g-8), the convolution kernels of the first and second convolution layers of the second double-convolution module are all 3×3, with stride 1 and padding 1; in step g-9), the convolution kernels of the first, second, third and fourth convolution layers of the first dual-channel attention module are all 1×1, with stride 1 and padding 1; in step g-10), the convolution kernels of the first and second convolution layers of the first double-convolution module are all 3×3, with stride 1 and padding 1.
Step h) comprises the steps of:
h-1) calculating the total loss L total by the formula L total = αL CrossEntropy + (1-α)L Dice, wherein L CrossEntropy is the cross-entropy loss function, L Dice is the Dice loss function, and α is a weight;
h-2) training the segmentation network model with the total loss L total using the Adam optimizer to obtain the optimized segmentation network model, wherein during training the batch size is set to 10, the maximum number of epochs is set to 200, the learning rate lr is 0.01, and α is set to 0.05.
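The Adam update used in step h-2) (learning rate lr = 0.01) can be written out explicitly; the sketch below applies it to a toy quadratic loss rather than the segmentation network, and the function name is ours.

```python
import numpy as np

def adam_step(p, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # one Adam parameter update with bias-corrected moment estimates
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad ** 2
    m_hat = m / (1.0 - b1 ** t)
    v_hat = v / (1.0 - b2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v

# minimize the toy loss f(p) = ||p||^2 (gradient 2p) to check the update converges
p = np.array([1.0, -2.0])
m = np.zeros_like(p)
v = np.zeros_like(p)
for t in range(1, 2001):
    p, m, v = adam_step(p, 2.0 * p, m, v, t)
```

After the loop the parameters are driven close to the minimum at the origin, which is the behaviour the optimizer provides when training the segmentation network.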
Finally, it should be noted that the foregoing description covers only preferred embodiments of the present invention, and the invention is not limited thereto; although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may modify the technical solutions described therein or substitute equivalents for some of the technical features. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (8)
1. A heart image segmentation method based on attention mechanism and multi-level feature fusion is characterized by comprising the following steps:
a) Acquiring a cardiac MRI image dataset X, X = {X 1, X 2, ..., X i, ..., X N}, wherein X i is the i-th cardiac MRI image, i ∈ {1, ..., N}, and N is the number of cardiac MRI images;
b) Preprocessing the cardiac MRI image data set X to obtain a preprocessed data set X';
c) Dividing the preprocessed data set X' into a training set, a verification set and a test set;
d) Slicing each preprocessed cardiac MRI image in the training set along the Z axis to obtain M slice images, wherein the i-th slice image is F i, i ∈ {1, ..., M};
e) Establishing a segmentation network model formed by an encoder and a decoder;
f) Inputting the ith slice image F i into an encoder of the segmentation network model, and outputting to obtain a feature map A 5-i;
g) Inputting the feature map A 5-i into a decoder of the segmentation network model, and outputting a segmentation result image P i;
h) Training a segmentation network model to obtain an optimized segmentation network model;
i) Slicing each preprocessed cardiac MRI image in the test set along the Z axis to obtain Q slice images, wherein the i-th slice image is F i′, i ∈ {1, ..., Q};
j) Inputting the ith slice image F i 'into the optimized segmentation network model, and outputting a segmentation result image P i';
step f) comprises the steps of:
f-1) the encoder of the segmentation network model comprises a first dense cascade module, a first maximum pooling layer, a second dense cascade module, a second maximum pooling layer, a third dense cascade module, a third maximum pooling layer, a fourth dense cascade module, a fourth maximum pooling layer and a position self-attention module;
F-2) the first dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, and the ith slice image F i is input into the first dense cascade module and output to obtain a feature map A 1-i;
f-3) inputting the characteristic diagram A 1-i into a first maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 1-i;
f-4) the second dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, the feature map A' 1-i is input into the second dense cascade module, and the feature map A 2-i is obtained through output;
f-5) inputting the characteristic diagram A 2-i into a second maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 2-i;
f-6) the third dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, the characteristic diagram A' 2-i is input into the third dense cascade module, and the characteristic diagram A 3-i is obtained through output;
f-7) inputting the characteristic diagram A 3-i into a third maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 3-i;
f-8) a fourth dense cascade module of the encoder sequentially comprises a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, wherein a characteristic diagram A' 3-i is input into the fourth dense cascade module, and a characteristic diagram A 4-i is obtained through output;
f-9) inputting the characteristic diagram A 4-i into a fourth maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 4-i;
f-10) the position self-attention module of the encoder consists of a first convolution layer, a second convolution layer, a third convolution layer, a first bilinear interpolation layer, a second bilinear interpolation layer, a third bilinear interpolation layer, a softmax layer, a fourth bilinear interpolation layer and a fourth convolution layer; the feature map A' 4-i is input into the first convolution layer, and the output is input into the first bilinear interpolation layer for bilinear interpolation to obtain the feature map Q; the feature map A' 4-i is input into the second convolution layer, and the output is input into the second bilinear interpolation layer for bilinear interpolation to obtain the feature map K; the feature map A' 4-i is input into the third convolution layer, and the output is input into the third bilinear interpolation layer for bilinear interpolation to obtain the feature map V; the feature map Q is multiplied by the feature map K and the product is input into the softmax layer, which outputs the feature map QK; the feature map QK is multiplied by the feature map V to obtain the feature map Att; the feature map Att is input into the fourth bilinear interpolation layer for bilinear interpolation and then input into the fourth convolution layer to obtain the feature map A 5-i;
Step g) comprises the steps of:
g-1) the decoder of the segmentation network model is composed of a first double-convolution module, a second double-convolution module, a third double-convolution module, a fourth double-convolution module, a first dual-channel attention module, a second dual-channel attention module, a third dual-channel attention module, a fourth dual-channel attention module, a first upsampling layer, a second upsampling layer, a third upsampling layer, a fourth upsampling layer, a fifth upsampling layer and a multi-level gated fusion module;
g-2) inputting the feature map A 5-i into a fifth upsampling layer of the decoder, and outputting to obtain a feature map C 5-i;
g-3) the fourth dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A' 4-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the fourth dual-channel attention module, which output the feature map G a; the feature map A' 4-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the fourth dual-channel attention module, which output the feature map G m; the feature map G a and the feature map G m are added and the sum is multiplied element-wise by the feature map A' 4-i to obtain the feature map A'' 4-i; the feature map C 5-i is input into the fifth upsampling layer of the decoder, which outputs the feature map C' 5-i, and the feature map A'' 4-i and the feature map C' 5-i are spliced to obtain the feature map D 4-i;
The fourth double-convolution module of the g-4) decoder is composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer in sequence, the feature map D 4-i is input into the fourth double-convolution module, and the feature map C 4-i is output and obtained;
g-5) the third dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A′3-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the third dual-channel attention module, and the feature map Ga′ is output; the feature map A′3-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the third dual-channel attention module, and the feature map Gm′ is output; the feature maps Ga′ and Gm′ are added, and the sum is multiplied element by element with the feature map A′3-i to obtain a feature map A″3-i; the feature map C4-i is input into the fourth upsampling layer of the decoder, and the feature map C′4-i is output; the feature map A″3-i is spliced with the feature map C′4-i to obtain a feature map D3-i;
g-6) the third double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D3-i is input into the third double-convolution module, and the feature map C3-i is output;
g-7) the second dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A′2-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the second dual-channel attention module, and the feature map Ga″ is output; the feature map A′2-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the second dual-channel attention module, and the feature map Gm″ is output; the feature maps Ga″ and Gm″ are added, and the sum is multiplied element by element with the feature map A′2-i to obtain a feature map A″2-i; the feature map C3-i is input into the third upsampling layer of the decoder, and the feature map C′3-i is output; the feature map A″2-i is spliced with the feature map C′3-i to obtain a feature map D2-i;
g-8) the second double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D2-i is input into the second double-convolution module, and the feature map C2-i is output;
g-9) the first dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A′1-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the first dual-channel attention module, and the feature map Ga‴ is output; the feature map A′1-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the first dual-channel attention module, and the feature map Gm‴ is output; the feature maps Ga‴ and Gm‴ are added, and the sum is multiplied element by element with the feature map A′1-i to obtain a feature map A″1-i; the feature map C2-i is input into the second upsampling layer of the decoder, and the feature map C′2-i is output; the feature map A″1-i is spliced with the feature map C′2-i to obtain a feature map D1-i;
g-10) the first double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D1-i is input into the first double-convolution module, and the feature map C1-i is output;
g-11) the multi-level gating fusion module of the decoder is composed of a first upsampling layer, a second upsampling layer, a third upsampling layer, a first convolution layer, a Sigmoid layer and a second convolution layer; the feature map C2-i is input into the first upsampling layer, and the feature map C2-i′ is output; the feature map C3-i is input into the second upsampling layer, and the feature map C3-i′ is output; the feature map C4-i is input into the third upsampling layer, and the feature map C4-i′ is output; the feature maps C2-i′, C3-i′ and C4-i′ are spliced and sequentially input into the first convolution layer and the Sigmoid layer, and a weight matrix G is output; singular value decomposition is performed on the weight matrix G to obtain a left singular vector matrix W1, a diagonal matrix W2 and a right singular vector matrix W3; the feature map C2-i is multiplied with the left singular vector matrix W1 to obtain a feature map W1′; the feature map C3-i is multiplied with the diagonal matrix W2 to obtain a feature map W2′; the feature map C4-i is multiplied with the right singular vector matrix W3 to obtain a feature map W3′; the feature maps W1′, W2′ and W3′ are spliced and input into the second convolution layer, and the feature map Z is output;
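The gating-and-SVD fusion of step g-11 can be illustrated on single-channel n×n maps. This is a sketch under stated simplifications: the maps are assumed already upsampled to a common size, and the two learned 1×1 convolutions are replaced by simple averaging so the example stays self-contained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multilevel_gated_fusion(c2, c3, c4):
    """Sketch of the multi-level gating fusion on single-channel n x n maps.

    The mean over the three maps stands in for the first 1x1 convolution,
    and the mean over the three branches stands in for the second
    (hypothetical simplifications of the learned layers).
    """
    # gate: fuse the three (already upsampled) maps and squash to (0, 1)
    g = sigmoid((c2 + c3 + c4) / 3.0)   # weight matrix G
    # G = W1 @ diag(W2) @ W3 via singular value decomposition
    u, s, vh = np.linalg.svd(g)
    w1p = c2 @ u                        # reweight C2 by the left singular vectors
    w2p = c3 @ np.diag(s)               # reweight C3 by the singular values
    w3p = c4 @ vh                       # reweight C4 by the right singular vectors
    # final convolution sketched as an average of the three branches
    return (w1p + w2p + w3p) / 3.0
```

The decomposition distributes orthogonal direction information (W1, W3) and scale information (W2) across the three decoder levels before they are recombined.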
g-12) the feature map C1-i is input into the first upsampling layer, and the feature map C′1-i is output; the feature map C′1-i is added to the feature map Z, the sum is input into a convolution layer with a convolution kernel size of 1×1, and the segmentation result image Pi is output.
2. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in step a), the N cardiac MRI images are acquired from the public Automated Cardiac Diagnosis Challenge (ACDC) dataset.
3. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: step b) comprises the steps of:
b-1) the ith cardiac MRI image Xi is converted into a Numpy array by using the GetArrayFromImage() function of the SimpleITK library, and the ith cardiac MRI image Xi converted into a Numpy array is cut into V 2D slices along the Z-axis direction;
b-2) each 2D slice is resampled to obtain V new 2D images with a pixel pitch of (1.5, 1.5), each new 2D image is center-cropped to obtain V cropped 2D images of size 384×384, and the cropped 2D images are stacked to restore a 3D Numpy array, which is converted back into a cardiac MRI image by using the GetImageFromArray() function of the SimpleITK library;
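The 384×384 center crop of step b-2 can be sketched as follows. Zero-padding slices smaller than the target is an assumption for completeness; the claim itself only states the crop.

```python
import numpy as np

def center_crop(slice2d, size=384):
    """Center-crop (or zero-pad) one 2D slice to size x size.

    Padding for slices smaller than the target is an assumption; the
    claim only specifies the 384 x 384 crop.
    """
    h, w = slice2d.shape
    out = np.zeros((size, size), dtype=slice2d.dtype)
    # source window (when the slice is larger than the target)
    top, left = max((h - size) // 2, 0), max((w - size) // 2, 0)
    # destination window (when the slice is smaller than the target)
    dt, dl = max((size - h) // 2, 0), max((size - w) // 2, 0)
    ch, cw = min(h, size), min(w, size)
    out[dt:dt + ch, dl:dl + cw] = slice2d[top:top + ch, left:left + cw]
    return out
```

A 400×500 slice is cropped symmetrically around its center, while a 100×100 slice is placed in the middle of a zero canvas.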
b-3) the cardiac MRI image obtained in step b-2) is, with a probability of 0.5, rotated by 90 degrees clockwise or counterclockwise, or flipped along the horizontal axis or the vertical axis, to obtain a rotated image, and the rotated image is normalized to obtain the preprocessed ith cardiac MRI image X′i;
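The step b-3 augmentation can be sketched as below. The choice of min-max normalization is an assumption for illustration; the claim does not specify which normalization is used.

```python
import numpy as np

def augment(img, rng):
    """Sketch of the augmentation: with probability 0.5 the slice is rotated
    90 degrees clockwise or counterclockwise, or flipped along the horizontal
    or vertical axis; it is then min-max normalized (assumed scheme).
    """
    if rng.random() < 0.5:
        op = rng.integers(4)            # pick one of the four transforms
        if op == 0:
            img = np.rot90(img, k=-1)   # 90 degrees clockwise
        elif op == 1:
            img = np.rot90(img, k=1)    # 90 degrees counterclockwise
        elif op == 2:
            img = np.flip(img, axis=0)  # flip along the horizontal axis
        else:
            img = np.flip(img, axis=1)  # flip along the vertical axis
    # min-max normalization to [0, 1] (hypothetical choice)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)
```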
b-4) the N preprocessed cardiac MRI images form a preprocessed dataset X′, X′ = {X′1, X′2, ..., X′i, ..., X′N}.
4. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in the step c), the preprocessed data set X' is divided into a training set, a verification set and a test set according to the proportion of 7:1:2.
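The 7:1:2 split of claim 4 can be sketched as follows; the deterministic, unshuffled ordering is an assumption made only to keep the example reproducible.

```python
def split_dataset(items, ratios=(0.7, 0.1, 0.2)):
    """Split a dataset into train/validation/test sets in a 7:1:2 ratio.

    The unshuffled ordering is an assumption; the claim only fixes
    the proportions.
    """
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```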
5. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in step d), M takes on a value of 1312.
6. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in step f-2), the first convolution layer of the first dense cascade module has a convolution kernel size of 3×3 and an expansion rate of 1, the second convolution layer has a convolution kernel size of 3×3 and an expansion rate of 3, the third convolution layer has a convolution kernel size of 3×3 and an expansion rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and an expansion rate of 1; in step f-3), the first maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-4), the first convolution layer of the second dense cascade module has a convolution kernel size of 3×3 and an expansion rate of 1, the second convolution layer has a convolution kernel size of 3×3 and an expansion rate of 3, the third convolution layer has a convolution kernel size of 3×3 and an expansion rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and an expansion rate of 1; in step f-5), the second maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-6), the first convolution layer of the third dense cascade module has a convolution kernel size of 3×3 and an expansion rate of 1, the second convolution layer has a convolution kernel size of 3×3 and an expansion rate of 3, the third convolution layer has a convolution kernel size of 3×3 and an expansion rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and an expansion rate of 1; in step f-7), the third maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-8), the first convolution layer of the fourth dense cascade module has a convolution kernel size of 3×3 and an expansion rate of 1, the second convolution layer has a convolution kernel size of 3×3 and an expansion rate of 3, the third convolution layer has a convolution kernel size of 3×3 and an expansion rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and an expansion rate of 1; in step f-9), the fourth maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-10), the convolution kernel sizes of the first, second, third and fourth convolution layers of the self-attention module are all 1×1.
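The 1-3-5-1 expansion (dilation) rates of claim 6 widen each dense cascade module's receptive field without extra parameters. The effective width of a k×k convolution with dilation d is k + (k−1)(d−1), a standard identity, which the following sketch evaluates for the claimed rates:

```python
def effective_kernel(k, d):
    """Effective receptive width of a k x k convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

# dilation rates of one dense cascade module, per claim 6
rates = [1, 3, 5, 1]
widths = [effective_kernel(3, d) for d in rates]  # 3x3 kernels -> [3, 7, 11, 3]
```

So the third layer of each module already covers an 11×11 neighborhood while still using only 3×3 kernels.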
7. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in step g-3), the convolution kernel sizes of the first, second, third and fourth convolution layers of the fourth dual-channel attention module are all 1×1, with a stride of 1 and a padding of 1; in step g-4), the convolution kernel sizes of the first and second convolution layers of the fourth double convolution module are 3×3, with a stride of 1 and a padding of 1; in step g-5), the convolution kernel sizes of the first, second, third and fourth convolution layers of the third dual-channel attention module are all 1×1, with a stride of 1 and a padding of 1; in step g-6), the convolution kernel sizes of the first and second convolution layers of the third double convolution module are 3×3, with a stride of 1 and a padding of 1; in step g-7), the convolution kernel sizes of the first, second, third and fourth convolution layers of the second dual-channel attention module are all 1×1, with a stride of 1 and a padding of 1; in step g-8), the convolution kernel sizes of the first and second convolution layers of the second double convolution module are 3×3, with a stride of 1 and a padding of 1; in step g-9), the convolution kernel sizes of the first, second, third and fourth convolution layers of the first dual-channel attention module are all 1×1, with a stride of 1 and a padding of 1; in step g-10), the convolution kernel sizes of the first and second convolution layers of the first double convolution module are 3×3, with a stride of 1 and a padding of 1.
8. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein step h) comprises the steps of:
h-1) the total loss Ltotal is calculated by the formula Ltotal = αLCrossEntropy + (1-α)LDice, wherein LCrossEntropy is the cross-entropy loss function, LDice is the Dice loss function, and α is a weight;
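The combined loss of step h-1 can be sketched for a binary mask as follows; this is an illustrative pixel-wise formulation (the claim does not spell out the per-class form), with α = 0.05 taken from step h-2.

```python
import numpy as np

def cross_entropy(p, y, eps=1e-8):
    """Pixel-wise binary cross-entropy; p: predicted probabilities, y: {0,1} mask."""
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def dice_loss(p, y, eps=1e-8):
    """Soft Dice loss: 1 - 2|P * Y| / (|P| + |Y|)."""
    inter = float((p * y).sum())
    return 1.0 - (2.0 * inter + eps) / (float(p.sum() + y.sum()) + eps)

def total_loss(p, y, alpha=0.05):
    """L_total = alpha * L_CrossEntropy + (1 - alpha) * L_Dice."""
    return alpha * cross_entropy(p, y) + (1 - alpha) * dice_loss(p, y)
```

With α = 0.05 the Dice term dominates, which counteracts the foreground/background class imbalance typical of cardiac segmentation masks.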
h-2) the segmentation network model is trained with the total loss Ltotal using the Adam optimizer to obtain the optimized segmentation network model, wherein during training the batch size is set to 10, the maximum number of epochs is set to 200, the learning rate lr is 0.01, and α is set to 0.05.
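A single Adam update with the claimed learning rate lr = 0.01 can be sketched as below; the β1, β2 and ε values are the optimizer's common defaults, which the claim does not state, and the quadratic toy objective exists only to exercise the update.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with the claim's learning rate lr = 0.01.

    b1, b2 and eps are assumed defaults; t is the 1-based step count
    used for bias correction.
    """
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Applied to a toy objective f(x) = x², repeated steps drive the parameter toward the minimum at roughly lr per step.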
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311461592.2A CN117522881B (en) | 2023-11-06 | 2023-11-06 | Cardiac image segmentation method based on attention mechanism and multi-level feature fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117522881A CN117522881A (en) | 2024-02-06 |
CN117522881B true CN117522881B (en) | 2024-06-18 |
Family
ID=89759877
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116612131A (en) * | 2023-05-22 | 2023-08-18 | 山东省人工智能研究院 | Cardiac MRI structure segmentation method based on ADC-UNet model |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10922816B2 (en) * | 2018-08-27 | 2021-02-16 | Siemens Healthcare Gmbh | Medical image segmentation from raw data using a deep attention neural network |
CN109389078B (en) * | 2018-09-30 | 2022-06-21 | 京东方科技集团股份有限公司 | Image segmentation method, corresponding device and electronic equipment |
US11270447B2 (en) * | 2020-02-10 | 2022-03-08 | Hong Kong Applied Science And Technology Institute Company Limited | Method for image segmentation using CNN |
AU2020103905A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning |
CN115375711A (en) * | 2022-09-19 | 2022-11-22 | 安徽大学 | Image segmentation method of global context attention network based on multi-scale fusion |
CN116843696B (en) * | 2023-04-27 | 2024-04-09 | 山东省人工智能研究院 | Cardiac MRI (magnetic resonance imaging) segmentation method based on feature similarity and super-parameter convolution attention |
CN116740076A (en) * | 2023-05-15 | 2023-09-12 | 苏州大学 | Network model and method for pigment segmentation in retinal pigment degeneration fundus image |
CN116563265B (en) * | 2023-05-23 | 2024-03-01 | 山东省人工智能研究院 | Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023231329A1 (en) | Medical image semantic segmentation method and apparatus | |
CN109584161A (en) | The Remote sensed image super-resolution reconstruction method of convolutional neural networks based on channel attention | |
CN110675321A (en) | Super-resolution image reconstruction method based on progressive depth residual error network | |
CN113012172A (en) | AS-UNet-based medical image segmentation method and system | |
CN109214989A (en) | Single image super resolution ratio reconstruction method based on Orientation Features prediction priori | |
CN111932461A (en) | Convolutional neural network-based self-learning image super-resolution reconstruction method and system | |
CN111583285A (en) | Liver image semantic segmentation method based on edge attention strategy | |
CN107341776A (en) | Single frames super resolution ratio reconstruction method based on sparse coding and combinatorial mapping | |
CN111667407B (en) | Image super-resolution method guided by depth information | |
CN113298717A (en) | Medical image super-resolution reconstruction method based on multi-attention residual error feature fusion | |
CN112561799A (en) | Infrared image super-resolution reconstruction method | |
CN112365422A (en) | Irregular missing image restoration method and system based on deep aggregation network | |
CN115239674B (en) | Computer angiography imaging synthesis method based on multi-scale discrimination | |
CN115565056A (en) | Underwater image enhancement method and system based on condition generation countermeasure network | |
CN110853048A (en) | MRI image segmentation method, device and storage medium based on rough training and fine training | |
CN115578427A (en) | Unsupervised single-mode medical image registration method based on deep learning | |
CN115375711A (en) | Image segmentation method of global context attention network based on multi-scale fusion | |
CN116739899A (en) | Image super-resolution reconstruction method based on SAUGAN network | |
CN114998458A (en) | Undersampled magnetic resonance image reconstruction method based on reference image and data correction | |
CN115222592A (en) | Underwater image enhancement method based on super-resolution network and U-Net network and training method of network model | |
CN113379606B (en) | Face super-resolution method based on pre-training generation model | |
CN117522881B (en) | Cardiac image segmentation method based on attention mechanism and multi-level feature fusion | |
CN117036162B (en) | Residual feature attention fusion method for super-resolution of lightweight chest CT image | |
CN116051609B (en) | Unsupervised medical image registration method based on band-limited deformation Fourier network | |
CN115859606A (en) | SAR-optical image translation method and system based on image evaluation and feature selection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||