CN117522881B - Cardiac image segmentation method based on attention mechanism and multi-level feature fusion - Google Patents

Cardiac image segmentation method based on attention mechanism and multi-level feature fusion

Info

Publication number
CN117522881B
CN117522881B (granted from application CN202311461592.2A)
Authority
CN
China
Prior art keywords
layer
convolution
feature map
convolution layer
module
Prior art date
Legal status
Active
Application number
CN202311461592.2A
Other languages
Chinese (zh)
Other versions
CN117522881A (en)
Inventor
陈长芳
翟纯琳
舒明雷
周书旺
孔祥龙
朱喆
Current Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Original Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology and Shandong Institute of Artificial Intelligence
Priority to CN202311461592.2A
Publication of CN117522881A
Application granted
Publication of CN117522881B
Legal status: Active
Anticipated expiration


Classifications

    • G06T 7/10 Image analysis; Segmentation; Edge detection
    • G06N 3/0442 Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06V 10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06T 2207/10072 Tomographic images
    • G06T 2207/10088 Magnetic resonance imaging [MRI]
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30048 Heart; Cardiac

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

A cardiac image segmentation method based on an attention mechanism and multi-level feature fusion relates to the technical field of image segmentation and adopts a symmetrical encoder-decoder structure. A dense cascade module is designed in which the dilated convolution layers are densely cascaded and the original input of the module is combined with the output after feature extraction, strengthening the propagation of image feature information. A position self-attention module is introduced to replace the bottom structure of the model; it fuses the global information of the input and effectively enhances both the robustness of the features and the local connections between them. A channel attention module is added in the skip connections to weight the feature maps across channels and select useful feature maps. A multi-level gating fusion module is added in the decoder part; it automatically adjusts the contributions of feature maps from different levels, makes full use of multi-level information and achieves better predictions.

Description

Cardiac image segmentation method based on attention mechanism and multi-level feature fusion
Technical Field
The invention relates to the technical field of image segmentation, in particular to a heart image segmentation method based on an attention mechanism and multi-level feature fusion.
Background
In recent years, the development of deep learning methods has had a profound effect on the field of cardiac image segmentation, making segmentation more accurate, efficient and adaptive while reducing manual workload, and cardiac MRI provides images with high resolution, high contrast and high signal-to-noise ratio in arbitrary orientations. From the segmentation result, indices such as myocardial mass, myocardial thickness, ejection fraction and ventricular volume can be obtained effectively, so accurate segmentation is particularly important. However, accurate segmentation remains challenging: non-uniform magnetic field strength easily produces artifacts during imaging and blurs boundaries, and the anatomy of the heart is complex.
With the rise of deep learning and convolutional neural networks, speed, high precision and high reliability have become the criteria for image segmentation. Among these methods, U-Net and its variants have been used by many researchers for cardiac MRI segmentation; the introduction of U-Net was the most influential and has become the basis of image segmentation. Further improvements are nevertheless still needed: such networks cannot integrate global information, and downsampling may lose spatial information. This is particularly disadvantageous for medical image segmentation, which typically requires extensive contextual detail.
Disclosure of Invention
In order to overcome the shortcomings of the prior art, the invention provides a cardiac image segmentation method based on an attention mechanism and multi-level feature fusion, which fuses the global information of the input and effectively enhances both the robustness of the features and the local connections between them.
The technical solution adopted to overcome the above technical problems is as follows:
a heart image segmentation method based on an attention mechanism and multi-level feature fusion comprises the following steps:
a) Acquiring a cardiac MRI image dataset X = {X1, X2, ..., Xi, ..., XN}, wherein Xi is the i-th cardiac MRI image, i ∈ {1, ..., N}, and N is the number of cardiac MRI images;
b) Preprocessing the cardiac MRI image data set X to obtain a preprocessed data set X';
c) Dividing the preprocessed data set X' into a training set, a verification set and a test set;
d) Slicing each preprocessed cardiac MRI image in the training set along a Z axis to obtain M slice images, wherein the i-th slice image is Fi, i ∈ {1, ..., M};
e) Establishing a segmentation network model formed by an encoder and a decoder;
f) Inputting the ith slice image F i into an encoder of the segmentation network model, and outputting to obtain a feature map A 5-i;
g) Inputting the feature map A 5-i into a decoder of the segmentation network model, and outputting a segmentation result image P i;
h) Training a segmentation network model to obtain an optimized segmentation network model;
i) Slicing each preprocessed cardiac MRI image in the test set along a Z axis to obtain Q slice images, wherein the i-th slice image is Fi', i ∈ {1, ..., Q};
j) The i-th slice image F i 'is input to the optimized segmentation network model, and the segmentation result image P i' is output.
Preferably, in step a) the N cardiac MRI images are acquired from the public Automated Cardiac Diagnosis Challenge (ACDC) dataset.
Further, step b) comprises the steps of:
b-1) converting the i-th cardiac MRI image Xi into a Numpy array using the SimpleITK GetArrayFromImage() function, and cutting the converted array into V 2D slices along the Z-axis direction;
b-2) resampling each 2D slice to obtain V new 2D images with a pixel pitch of (1.5, 1.5), center-cropping each new 2D image to obtain V cropped 2D images of size 384×384, stacking the cropped 2D images to restore a 3D Numpy array, and converting the 3D Numpy array back into a cardiac MRI image using the SimpleITK GetImageFromArray() function;
b-3) flipping this cardiac MRI image along a horizontal or vertical axis, or rotating it 90 degrees clockwise or counterclockwise, each with probability 0.5, to obtain an augmented image, and normalizing the augmented image to obtain the preprocessed i-th cardiac MRI image X'i;
b-4) the N preprocessed cardiac MRI images constitute the preprocessed dataset X' = {X'1, X'2, ..., X'i, ..., X'N}; a hedged code sketch of this preprocessing pipeline is given below.
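To make steps b-1) through b-4) concrete, the following Python sketch mirrors the described pipeline. It is not the patented implementation: the helper name preprocess_volume, the use of scipy.ndimage.zoom for resampling and the z-score normalization are assumptions; the (1.5, 1.5) pixel pitch, the 384×384 center crop, the probability-0.5 flip/rotation and the SimpleITK array conversions come from the text.

```python
# Hedged preprocessing sketch for step b); values follow the text, helpers are illustrative.
import numpy as np
import SimpleITK as sitk
from scipy.ndimage import zoom

def preprocess_volume(image: sitk.Image, target_spacing=(1.5, 1.5), crop=384):
    vol = sitk.GetArrayFromImage(image)                # b-1) image -> Numpy array, shape (Z, H, W)
    _, sy, sx = image.GetSpacing()[::-1]               # SimpleITK spacing is (x, y, z); reverse it
    slices = []
    for s in vol:                                      # treat each Z slice as a 2D image
        # b-2) resample in-plane so the pixel pitch becomes (1.5, 1.5)
        s = zoom(s, (sy / target_spacing[0], sx / target_spacing[1]), order=1)
        # b-2) center-crop (pad first if the slice is smaller) to 384x384
        ph, pw = max(crop - s.shape[0], 0), max(crop - s.shape[1], 0)
        s = np.pad(s, ((ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2)))
        top, left = (s.shape[0] - crop) // 2, (s.shape[1] - crop) // 2
        slices.append(s[top:top + crop, left:left + crop])
    vol = np.stack(slices, axis=0)                     # restack to a 3D Numpy array
    # b-3) random 90-degree rotation (clockwise or counterclockwise) with probability 0.5,
    # followed by normalization (z-score normalization assumed here)
    if np.random.rand() < 0.5:
        vol = np.rot90(vol, k=int(np.random.choice([1, -1])), axes=(1, 2)).copy()
    vol = (vol - vol.mean()) / (vol.std() + 1e-8)
    return sitk.GetImageFromArray(vol)                 # back to a SimpleITK image
```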
Preferably, in step c), the preprocessed data set X' is divided into a training set, a verification set and a test set according to a ratio of 7:1:2.
Preferably, M in step d) takes a value of 1312.
Further, step f) comprises the steps of:
f-1) the encoder of the segmentation network model comprises a first dense cascade module, a first maximum pooling layer, a second dense cascade module, a second maximum pooling layer, a third dense cascade module, a third maximum pooling layer, a fourth dense cascade module, a fourth maximum pooling layer and a position self-attention module;
F-2) the first dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, and the ith slice image F i is input into the first dense cascade module and output to obtain a feature map A 1-i;
f-3) inputting the characteristic diagram A 1-i into a first maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 1-i;
f-4) the second dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, the feature map A' 1-i is input into the second dense cascade module, and the feature map A 2-i is obtained through output;
f-5) inputting the characteristic diagram A 2-i into a second maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 2-i;
f-6) the third dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, the characteristic diagram A' 2-i is input into the third dense cascade module, and the characteristic diagram A 3-i is obtained through output;
f-7) inputting the characteristic diagram A 3-i into a third maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 3-i;
f-8) a fourth dense cascade module of the encoder sequentially comprises a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer, wherein a characteristic diagram A' 3-i is input into the fourth dense cascade module, and a characteristic diagram A 4-i is obtained through output;
f-9) inputting the characteristic diagram A 4-i into a fourth maximum pooling layer of the encoder, and outputting to obtain a characteristic diagram A' 4-i;
f-10) the position self-attention module of the encoder consists of a first convolution layer, a second convolution layer, a third convolution layer, a first bilinear interpolation layer, a second bilinear interpolation layer, a third bilinear interpolation layer, a softmax layer, a fourth bilinear interpolation layer and a fourth convolution layer; the feature map A' 4-i is input into the first convolution layer, and its output is passed through the first bilinear interpolation layer to obtain the feature map Q; the feature map A' 4-i is input into the second convolution layer, and its output is passed through the second bilinear interpolation layer to obtain the feature map K; the feature map A' 4-i is input into the third convolution layer, and its output is passed through the third bilinear interpolation layer to obtain the feature map V; the feature map Q is multiplied by the feature map K and the product is input into the softmax layer to output the feature map QK; the feature map QK is multiplied by the feature map V to obtain the feature map Att; the feature map Att is input into the fourth bilinear interpolation layer for bilinear interpolation and then into the fourth convolution layer to obtain the feature map A 5-i.
Preferably, in step f-2) the first convolution layer of the first dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-3) the first maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-4) the first convolution layer of the second dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-5) the second maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-6) the first convolution layer of the third dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-7) the third maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-8) the first convolution layer of the fourth dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-9) the fourth maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-10) the convolution kernel sizes of the first, second, third and fourth convolution layers of the position self-attention module are all 1×1. A hedged code sketch of one dense cascade module with these parameters is given below.
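As a concrete illustration, below is a minimal PyTorch sketch of one dense cascade module with the kernel sizes and dilation rates listed above (3×3 kernels, dilation rates 1, 3, 5, 1). The text states only that the dilated convolution layers are densely cascaded and that the original input of the module is combined with the extracted features; the exact dense wiring, the channel counts and the BatchNorm/ReLU after each convolution are assumptions.

```python
# Hedged sketch of a dense cascade module; wiring and normalization are assumptions.
import torch
import torch.nn as nn

class DenseCascadeBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        def conv(cin, cout, d):
            # 3x3 convolution with dilation d; padding = d keeps the spatial size unchanged
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=d, dilation=d),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.c1 = conv(in_ch, out_ch, 1)               # first convolution layer, dilation 1
        self.c2 = conv(in_ch + out_ch, out_ch, 3)      # second layer, dilation 3, sees input + c1
        self.c3 = conv(in_ch + 2 * out_ch, out_ch, 5)  # third layer, dilation 5, sees input + c1 + c2
        self.c4 = conv(in_ch + 3 * out_ch, out_ch, 1)  # fourth layer fuses the input and all features

    def forward(self, x):
        f1 = self.c1(x)
        f2 = self.c2(torch.cat([x, f1], dim=1))        # dense connection
        f3 = self.c3(torch.cat([x, f1, f2], dim=1))    # dense connection
        return self.c4(torch.cat([x, f1, f2, f3], dim=1))
```

Stacking four such modules, each followed by a 2×2 maximum pooling layer with stride 2, reproduces the encoder path of steps f-1) through f-9).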
Further, step g) comprises the steps of:
g-1) the decoder of the segmentation network model is composed of a first double-convolution module, a second double-convolution module, a third double-convolution module, a fourth double-convolution module, a first dual-channel attention module, a second dual-channel attention module, a third dual-channel attention module, a fourth dual-channel attention module, a first upsampling layer, a second upsampling layer, a third upsampling layer, a fourth upsampling layer, a fifth upsampling layer and a multi-level gating fusion module; g-2) the feature map A 5-i is input into the fifth upsampling layer of the decoder, and the feature map C 5-i is output; g-3) the fourth dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 4-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the fourth dual-channel attention module to output the feature map G a; the feature map A 4-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the fourth dual-channel attention module to output the feature map G m; the feature map G a and the feature map G m are added, and the sum is multiplied element-wise with the feature map A 4-i to obtain the feature map A'' 4-i; the feature map A'' 4-i and the feature map C 5-i are concatenated to obtain the feature map D 4-i; g-4) the fourth double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D 4-i is input into the fourth double-convolution module and the feature map C 4-i is output;
g-5) the third dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 3-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the third dual-channel attention module to output the feature map G a'; the feature map A 3-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the third dual-channel attention module to output the feature map G m'; the feature map G a' and the feature map G m' are added, and the sum is multiplied element-wise with the feature map A 3-i to obtain the feature map A'' 3-i; the feature map C 4-i is input into the fourth upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 3-i to obtain the feature map D 3-i;
g-6) the third double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 3-i is input into the third double-convolution module and output to obtain a feature map C 3-i;
G-7) the second dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 2-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the second dual-channel attention module to output the feature map G a''; the feature map A 2-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the second dual-channel attention module to output the feature map G m''; the feature map G a'' and the feature map G m'' are added, and the sum is multiplied element-wise with the feature map A 2-i to obtain the feature map A'' 2-i; the feature map C 3-i is input into the third upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 2-i to obtain the feature map D 2-i;
g-8) the second double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 2-i is input into the second double-convolution module and output to obtain a feature map C 2-i;
G-9) the first dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 1-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the first dual-channel attention module to output the feature map G a'''; the feature map A 1-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the first dual-channel attention module to output the feature map G m'''; the feature map G a''' and the feature map G m''' are added, and the sum is multiplied element-wise with the feature map A 1-i to obtain the feature map A'' 1-i; the feature map C 2-i is input into the second upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 1-i to obtain the feature map D 1-i;
g-10) the first double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, wherein a feature map D 1-i is input into the first double-convolution module, and a feature map C 1-i is output and obtained;
g-11) the multi-level gating fusion module of the decoder consists of a first upsampling layer, a second upsampling layer, a third upsampling layer, a first convolution layer, a Sigmoid layer and a second convolution layer; the feature map C 2-i is input into the first upsampling layer to output the feature map C 2-i'; the feature map C 3-i is input into the second upsampling layer to output the feature map C 3-i'; the feature map C 4-i is input into the third upsampling layer to output the feature map C 4-i'; the feature maps C 2-i', C 3-i' and C 4-i' are concatenated and then sequentially input into the first convolution layer and the Sigmoid layer to output the weight matrix G; singular value decomposition is performed on the weight matrix G to obtain the left singular vector matrix W1, the diagonal matrix W2 and the right singular vector matrix W3; the feature map C 2-i is multiplied by the left singular vector matrix W1 to obtain the feature map W1'; the feature map C 3-i is multiplied by the diagonal matrix W2 to obtain the feature map W2'; the feature map C 4-i is multiplied by the right singular vector matrix W3 to obtain the feature map W3'; the feature maps W1', W2' and W3' are concatenated and input into the second convolution layer to output the feature map Z;
g-12) the feature map C 1-i is input into the first upsampling layer to output the feature map C 1-i'; the feature map C 1-i' is added to the feature map Z, the sum is input into a convolution layer with a convolution kernel size of 1×1, and the segmentation result image P i is output.
Preferably, in step g-3) the convolution kernel sizes of the first, second, third and fourth convolution layers of the fourth dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-4) the convolution kernel sizes of the first and second convolution layers of the fourth double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-5) the convolution kernel sizes of the first, second, third and fourth convolution layers of the third dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-6) the convolution kernel sizes of the first and second convolution layers of the third double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-7) the convolution kernel sizes of the first, second, third and fourth convolution layers of the second dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-8) the convolution kernel sizes of the first and second convolution layers of the second double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-9) the convolution kernel sizes of the first, second, third and fourth convolution layers of the first dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-10) the convolution kernel sizes of the first and second convolution layers of the first double-convolution module are 3×3, the strides are 1, and the padding is 1. A hedged code sketch of the dual-channel attention module is given below.
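For illustration, a hedged PyTorch sketch of the dual-channel attention module used in steps g-3), g-5), g-7) and g-9) follows: a global average pooling branch and a global maximum pooling branch each pass through two 1×1 convolutions with a ReLU in between and a Sigmoid at the end, the branch outputs G a and G m are summed, and the sum re-weights the input feature map element-wise. Keeping the channel count unchanged through the 1×1 convolutions and omitting padding on the 1×1 pooled maps are assumptions.

```python
# Hedged sketch of the dual-path channel attention used on the skip connections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualChannelAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        def branch():
            # two 1x1 convolutions with ReLU in between and Sigmoid at the end
            return nn.Sequential(nn.Conv2d(channels, channels, 1), nn.ReLU(inplace=True),
                                 nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.avg_branch = branch()   # global average pooling path -> G_a
        self.max_branch = branch()   # global maximum pooling path -> G_m

    def forward(self, a):
        g_a = self.avg_branch(F.adaptive_avg_pool2d(a, 1))   # B x C x 1 x 1
        g_m = self.max_branch(F.adaptive_max_pool2d(a, 1))   # B x C x 1 x 1
        return a * (g_a + g_m)       # (G_a + G_m) broadcast and multiplied element-wise with A
```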
Further, step h) comprises the steps of:
h-1) calculating the total loss L total by the formula L total = αL CrossEntropy + (1-α)L Dice, wherein L CrossEntropy is the cross-entropy loss function, L Dice is the Dice loss function, and α is a weight;
h-2) training the segmentation network model with the Adam optimizer using the total loss L total to obtain the optimized segmentation network model, wherein during training the batch size is set to 10, the maximum number of epochs is set to 200, the learning rate lr is 0.01, and α is set to 0.05. A hedged sketch of this training objective is given below.
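The training objective of step h-1) and the optimizer settings of step h-2) can be sketched as follows; the soft-Dice formulation over softmax probabilities and the one-hot conversion are assumptions, while α = 0.05, the Adam optimizer and the learning rate 0.01 come from the text.

```python
# Hedged sketch of the combined loss L_total = alpha*L_CrossEntropy + (1-alpha)*L_Dice.
import torch
import torch.nn.functional as F

def dice_loss(probs, one_hot, eps=1e-6):
    # soft Dice over per-class probability maps
    inter = (probs * one_hot).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + one_hot.sum(dim=(2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def total_loss(logits, target, num_classes, alpha=0.05):
    # logits: B x C x H x W, target: B x H x W with integer class indices
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    return alpha * ce + (1 - alpha) * dice_loss(probs, one_hot)

# Training settings from step h-2): Adam optimizer, lr = 0.01, batch size 10, 200 epochs
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
```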
The beneficial effects of the invention are as follows:
(1) A symmetrical encoder-decoder structure is employed. The convolution module in the network model is improved and a dense cascade module is proposed. A position self-attention mechanism is introduced to replace the bottom structure of the model, and a dual-path channel attention module is added in the skip connections. A multi-level gating fusion module is added in the decoder part to fuse features from different stages.
(2) The encoder consists of four dense cascade modules in which dilated convolution layers with dilation rates of 1, 3 and 5 are densely cascaded and the original input of the module is combined with the output after feature extraction, enlarging the receptive field and strengthening the propagation of image feature information.
(3) The channel attention module weights the feature maps across channels and selects useful feature maps. The position self-attention module in the bottom structure of the model fuses the global information of the input and effectively enhances the robustness of the features and the local connections between them, enabling better predictions.
(4) The multi-level gating fusion module automatically learns and adjusts the contribution of the feature map of each level; the dynamically learned weighting map controls the proportion of feature information from each level, and the features of all levels are then fused.
Drawings
FIG. 1 is a block diagram of the segmentation network of the present invention;
FIG. 2 is a block diagram of a dense cascading module of the present invention;
FIG. 3 is a block diagram of a dual channel attention module of the present invention;
FIG. 4 is a block diagram of a position self-attention module of the present invention;
Fig. 5 is a block diagram of a multi-level gated fusion module of the present invention.
Detailed Description
The invention is further described with reference to fig. 1 to 5.
A heart image segmentation method based on an attention mechanism and multi-level feature fusion comprises the following steps:
a) Acquiring a cardiac MRI image dataset X = {X1, X2, ..., Xi, ..., XN}, where Xi is the i-th cardiac MRI image, i ∈ {1, ..., N}, and N is the number of cardiac MRI images.
B) Preprocessing the cardiac MRI image data set X to obtain a preprocessed data set X'.
C) The preprocessed data set X' is divided into a training set, a verification set and a test set.
D) Slicing each preprocessed cardiac MRI image in the training set along the Z-axis to obtain M slice images, the i-th slice image being Fi, i ∈ {1, ..., M}.
E) A segmentation network model consisting of an encoder and a decoder is built.
F) The i-th slice image F i is input to the encoder of the segmentation network model, and the feature map a 5-i is output.
G) The feature map a 5-i is input to a decoder of the segmentation network model, and the segmentation result image P i is output.
H) Training the segmentation network model to obtain an optimized segmentation network model.
I) Slicing each preprocessed cardiac MRI image in the test set along the Z axis to obtain Q slice images, the i-th slice image being Fi', i ∈ {1, ..., Q}.
J) The i-th slice image F i 'is input to the optimized segmentation network model, and the segmentation result image P i' is output.
A symmetrical encoder-decoder structure is employed. The convolution module in the network model is improved and a dense cascade module is proposed; the dilated convolution layers in the dense cascade module are densely cascaded, and the original input of the module is combined with the output after feature extraction, strengthening the propagation of image feature information. A position self-attention module is introduced to replace the bottom structure of the model; it fuses the global information of the input and effectively enhances the robustness of the features and the local connections between them. A channel attention module is added in the skip connections to weight the feature maps across channels and select useful feature maps. A multi-level gating fusion module is added in the decoder part; it automatically adjusts the contributions of feature maps from different levels and makes full use of multi-level information.
In one embodiment of the invention, in step a) the N cardiac MRI images are acquired from the public Automated Cardiac Diagnosis Challenge (ACDC) dataset.
In one embodiment of the invention, step b) comprises the steps of:
b-1) converting the i-th cardiac MRI image Xi into a Numpy array using the SimpleITK GetArrayFromImage() function, and cutting the converted array into V 2D slices along the Z-axis direction.
B-2) resampling each 2D slice to obtain V new 2D images with a pixel pitch of (1.5, 1.5), center-cropping each new 2D image to obtain V cropped 2D images of size 384×384, stacking the cropped 2D images to restore a 3D Numpy array, and converting the 3D Numpy array back into a cardiac MRI image using the SimpleITK GetImageFromArray() function.
B-3) flipping this cardiac MRI image along a horizontal or vertical axis, or rotating it 90 degrees clockwise or counterclockwise, each with probability 0.5, to obtain an augmented image, and normalizing the augmented image to obtain the preprocessed i-th cardiac MRI image X'i.
B-4) the N preprocessed cardiac MRI images form the preprocessed dataset X' = {X'1, X'2, ..., X'i, ..., X'N}.
In one embodiment of the present invention, the preprocessed data set X' is divided into a training set, a validation set and a test set in a ratio of 7:1:2 in step c).
In one embodiment of the invention, M in step d) takes on a value of 1312.
In one embodiment of the invention, step f) comprises the steps of:
f-1) The encoder of the segmentation network model is composed of a first dense cascade module, a first maximum pooling layer, a second dense cascade module, a second maximum pooling layer, a third dense cascade module, a third maximum pooling layer, a fourth dense cascade module, a fourth maximum pooling layer and a position self-attention module.
f-2) The first dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the i-th slice image F i is input into the first dense cascade module and the feature map A 1-i is output.
F-3) The feature map A 1-i is input into the first maximum pooling layer of the encoder and the feature map A' 1-i is output. f-4) The second dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A' 1-i is input into the second dense cascade module and the feature map A 2-i is output.
F-5) The feature map A 2-i is input into the second maximum pooling layer of the encoder and the feature map A' 2-i is output. f-6) The third dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A' 2-i is input into the third dense cascade module and the feature map A 3-i is output.
F-7) The feature map A 3-i is input into the third maximum pooling layer of the encoder and the feature map A' 3-i is output. f-8) The fourth dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A' 3-i is input into the fourth dense cascade module and the feature map A 4-i is output.
F-9) The feature map A 4-i is input into the fourth maximum pooling layer of the encoder and the feature map A' 4-i is output. f-10) The position self-attention module of the encoder consists of a first convolution layer, a second convolution layer, a third convolution layer, a first bilinear interpolation layer, a second bilinear interpolation layer, a third bilinear interpolation layer, a softmax layer, a fourth bilinear interpolation layer and a fourth convolution layer; the feature map A' 4-i is input into the first convolution layer, and its output is passed through the first bilinear interpolation layer to obtain the feature map Q; the feature map A' 4-i is input into the second convolution layer, and its output is passed through the second bilinear interpolation layer to obtain the feature map K; the feature map A' 4-i is input into the third convolution layer, and its output is passed through the third bilinear interpolation layer to obtain the feature map V; the feature map Q is multiplied by the feature map K and the product is input into the softmax layer to output the feature map QK; the feature map QK is multiplied by the feature map V to obtain the feature map Att; the feature map Att is input into the fourth bilinear interpolation layer for bilinear interpolation and then into the fourth convolution layer to obtain the feature map A 5-i. A hedged code sketch of this module is given after the parameter details below.
In this embodiment, in step f-2) the first convolution layer of the first dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-3) the first maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-4) the first convolution layer of the second dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-5) the second maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-6) the first convolution layer of the third dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-7) the third maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-8) the first convolution layer of the fourth dense cascade module has a convolution kernel size of 3×3 and a dilation rate of 1, the second convolution layer has a convolution kernel size of 3×3 and a dilation rate of 3, the third convolution layer has a convolution kernel size of 3×3 and a dilation rate of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and a dilation rate of 1; in step f-9) the fourth maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; in step f-10) the convolution kernel sizes of the first, second, third and fourth convolution layers of the position self-attention module are all 1×1.
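The following PyTorch sketch illustrates one plausible reading of the position self-attention module in step f-10): 1×1 convolutions followed by bilinear interpolation produce Q, K and V, a softmax over the product of Q and K gives the attention map QK, QK weights V to give Att, and a fourth bilinear interpolation plus a 1×1 convolution restore the output feature map. The downsampling factor of the first three interpolation layers and the reduced channel width of Q and K are assumptions not stated in the text.

```python
# Hedged sketch of the position self-attention module at the bottom of the network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionSelfAttention(nn.Module):
    def __init__(self, channels, scale=0.5):
        super().__init__()
        self.scale = scale                                     # assumed downsampling factor
        self.q_conv = nn.Conv2d(channels, channels // 8, 1)    # first 1x1 convolution
        self.k_conv = nn.Conv2d(channels, channels // 8, 1)    # second 1x1 convolution
        self.v_conv = nn.Conv2d(channels, channels, 1)         # third 1x1 convolution
        self.out_conv = nn.Conv2d(channels, channels, 1)       # fourth 1x1 convolution

    def forward(self, x):
        b, c, h, w = x.shape
        def down(t):  # bilinear interpolation layers 1-3
            return F.interpolate(t, scale_factor=self.scale, mode='bilinear', align_corners=False)
        q, k, v = down(self.q_conv(x)), down(self.k_conv(x)), down(self.v_conv(x))
        attn = torch.softmax(torch.bmm(q.flatten(2).transpose(1, 2),   # Q: B x N x C/8
                                       k.flatten(2)), dim=-1)          # K: B x C/8 x N -> QK: B x N x N
        att = torch.bmm(v.flatten(2), attn.transpose(1, 2))            # Att = QK applied to V: B x C x N
        att = att.view(b, c, v.shape[2], v.shape[3])
        att = F.interpolate(att, size=(h, w), mode='bilinear', align_corners=False)  # fourth interpolation
        return self.out_conv(att)                                      # output feature map A5
```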
In one embodiment of the invention, step g) comprises the steps of:
g-1) The decoder of the segmentation network model is composed of a first double-convolution module, a second double-convolution module, a third double-convolution module, a fourth double-convolution module, a first dual-channel attention module, a second dual-channel attention module, a third dual-channel attention module, a fourth dual-channel attention module, a first upsampling layer, a second upsampling layer, a third upsampling layer, a fourth upsampling layer, a fifth upsampling layer and a multi-level gating fusion module.
G-2) The feature map A 5-i is input into the fifth upsampling layer of the decoder and the feature map C 5-i is output. G-3) The fourth dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 4-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the fourth dual-channel attention module to output the feature map G a; the feature map A 4-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the fourth dual-channel attention module to output the feature map G m; the feature map G a and the feature map G m are added, and the sum is multiplied element-wise with the feature map A 4-i to obtain the feature map A'' 4-i; the feature map A'' 4-i and the feature map C 5-i are concatenated to obtain the feature map D 4-i. g-4) The fourth double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D 4-i is input into the fourth double-convolution module and the feature map C 4-i is output.
G-5) The third dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 3-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the third dual-channel attention module to output the feature map G a'; the feature map A 3-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the third dual-channel attention module to output the feature map G m'; the feature map G a' and the feature map G m' are added, and the sum is multiplied element-wise with the feature map A 3-i to obtain the feature map A'' 3-i; the feature map C 4-i is input into the fourth upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 3-i to obtain the feature map D 3-i.
g-6) The third double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer; the feature map D 3-i is input into the third double-convolution module and the feature map C 3-i is output.
G-7) The second dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 2-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the second dual-channel attention module to output the feature map G a''; the feature map A 2-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the second dual-channel attention module to output the feature map G m''; the feature map G a'' and the feature map G m'' are added, and the sum is multiplied element-wise with the feature map A 2-i to obtain the feature map A'' 2-i; the feature map C 3-i is input into the third upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 2-i to obtain the feature map D 2-i.
G-8) the second double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 2-i is input into the second double-convolution module and output to obtain a feature map C 2-i.
G-9) The first dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A 1-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the first dual-channel attention module to output the feature map G a'''; the feature map A 1-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the first dual-channel attention module to output the feature map G m'''; the feature map G a''' and the feature map G m''' are added, and the sum is multiplied element-wise with the feature map A 1-i to obtain the feature map A'' 1-i; the feature map C 2-i is input into the second upsampling layer of the decoder, and the upsampled output is concatenated with the feature map A'' 1-i to obtain the feature map D 1-i.
G-10) the first double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 1-i is input into the first double-convolution module and output to obtain the feature map C 1-i.
G-11) The multi-level gating fusion module of the decoder consists of a first upsampling layer, a second upsampling layer, a third upsampling layer, a first convolution layer, a Sigmoid layer and a second convolution layer; the feature map C 2-i is input into the first upsampling layer to output the feature map C 2-i'; the feature map C 3-i is input into the second upsampling layer to output the feature map C 3-i'; the feature map C 4-i is input into the third upsampling layer to output the feature map C 4-i'; the feature maps C 2-i', C 3-i' and C 4-i' are concatenated and then sequentially input into the first convolution layer and the Sigmoid layer to output the weight matrix G; singular value decomposition (SVD) is performed on the weight matrix G to obtain the left singular vector matrix W1, the diagonal matrix W2 and the right singular vector matrix W3; the feature map C 2-i is multiplied by the left singular vector matrix W1 to obtain the feature map W1'; the feature map C 3-i is multiplied by the diagonal matrix W2 to obtain the feature map W2'; the feature map C 4-i is multiplied by the right singular vector matrix W3 to obtain the feature map W3'; the feature maps W1', W2' and W3' are concatenated and input into the second convolution layer to output the feature map Z. A hedged code sketch of this module is given after the parameter details below.
G-12) The feature map C 1-i is input into the first upsampling layer to output the feature map C 1-i'; the feature map C 1-i' is added to the feature map Z, the sum is input into a convolution layer with a convolution kernel size of 1×1, and the segmentation result image P i is output.
In this embodiment, in step g-3) the convolution kernel sizes of the first, second, third and fourth convolution layers of the fourth dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-4) the convolution kernel sizes of the first and second convolution layers of the fourth double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-5) the convolution kernel sizes of the first, second, third and fourth convolution layers of the third dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-6) the convolution kernel sizes of the first and second convolution layers of the third double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-7) the convolution kernel sizes of the first, second, third and fourth convolution layers of the second dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-8) the convolution kernel sizes of the first and second convolution layers of the second double-convolution module are 3×3, the strides are 1, and the padding is 1; in step g-9) the convolution kernel sizes of the first, second, third and fourth convolution layers of the first dual-channel attention module are all 1×1, the strides are all 1, and the padding is 1; in step g-10) the convolution kernel sizes of the first and second convolution layers of the first double-convolution module are 3×3, the strides are 1, and the padding is 1.
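Finally, a hedged sketch of the multi-level gating fusion module of step g-11) is given below. The upsampling of the three decoder levels, the 1×1 convolution and Sigmoid that produce the gate G, the singular value decomposition of G into W1, W2 and W3, and the final fusion convolution follow the text; how the SVD factors are multiplied onto multi-channel feature maps is not specified, so the per-sample matrix products on square spatial grids used here (applied to the upsampled maps so that shapes match) are assumptions.

```python
# Hedged sketch of the multi-level gating fusion module; the SVD broadcasting is an assumption.
# Assumes square spatial grids (H == W) so the matrix products preserve the map shapes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelGatedFusion(nn.Module):
    def __init__(self, ch2, ch3, ch4, num_classes):
        super().__init__()
        self.gate_conv = nn.Conv2d(ch2 + ch3 + ch4, 1, 1)             # first convolution layer
        self.fuse_conv = nn.Conv2d(ch2 + ch3 + ch4, num_classes, 1)   # second convolution layer

    def forward(self, c2, c3, c4, size):
        up = lambda t: F.interpolate(t, size=size, mode='bilinear', align_corners=False)
        c2u, c3u, c4u = up(c2), up(c3), up(c4)                        # C2', C3', C4'
        g = torch.sigmoid(self.gate_conv(torch.cat([c2u, c3u, c4u], dim=1)))  # weight matrix G
        u, s, vh = torch.linalg.svd(g.squeeze(1), full_matrices=False)        # G = W1 * W2 * W3
        w2 = torch.diag_embed(s)                                      # diagonal matrix of singular values
        f2 = torch.einsum('bij,bcjw->bciw', u, c2u)                   # level-2 features weighted by W1
        f3 = torch.einsum('bij,bcjw->bciw', w2, c3u)                  # level-3 features weighted by W2
        f4 = torch.einsum('bchw,bwk->bchk', c4u, vh)                  # level-4 features weighted by W3
        return self.fuse_conv(torch.cat([f2, f3, f4], dim=1))         # fused feature map Z
```

The output Z is then added to the upsampled top-level feature map C 1-i' and passed through a final 1×1 convolution, as in step g-12).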
Step h) comprises the steps of:
h-1) calculating the total loss L total by the formula L total = αL CrossEntropy + (1-α)L Dice, wherein L CrossEntropy is the cross-entropy loss function, L Dice is the Dice loss function, and α is a weight;
h-2) training the segmentation network model with the Adam optimizer using the total loss L total to obtain the optimized segmentation network model, wherein during training the batch size is set to 10, the maximum number of epochs is set to 200, the learning rate lr is 0.01, and α is set to 0.05.
Finally, it should be noted that the foregoing description covers only preferred embodiments of the present invention and the invention is not limited thereto. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or replace some of the technical features with equivalents. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A cardiac image segmentation method based on an attention mechanism and multi-level feature fusion, characterized by comprising the following steps:
a) Acquiring a cardiac MRI image dataset X = {X1, X2, ..., Xi, ..., XN}, wherein Xi is the i-th cardiac MRI image, i ∈ {1, ..., N}, and N is the number of cardiac MRI images;
b) Preprocessing the cardiac MRI image data set X to obtain a preprocessed data set X';
c) Dividing the preprocessed data set X' into a training set, a verification set and a test set;
d) Slicing each preprocessed cardiac MRI image in the training set along a Z axis to obtain M slice images, wherein the i-th slice image is Fi, i ∈ {1, ..., M};
e) Establishing a segmentation network model formed by an encoder and a decoder;
f) Inputting the ith slice image F i into an encoder of the segmentation network model, and outputting to obtain a feature map A 5-i;
g) Inputting the feature map A 5-i into a decoder of the segmentation network model, and outputting a segmentation result image P i;
h) Training a segmentation network model to obtain an optimized segmentation network model;
i) Slicing each preprocessed cardiac MRI image in the test set along a Z axis to obtain Q slice images, wherein the i-th slice image is Fi', i ∈ {1, ..., Q};
j) Inputting the ith slice image F i 'into the optimized segmentation network model, and outputting a segmentation result image P i';
step f) comprises the steps of:
f-1) the encoder of the segmentation network model comprises a first dense cascade module, a first maximum pooling layer, a second dense cascade module, a second maximum pooling layer, a third dense cascade module, a third maximum pooling layer, a fourth dense cascade module, a fourth maximum pooling layer and a position self-attention module;
f-2) the first dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the i-th slice image F_i is input into the first dense cascade module, and a feature map A_1-i is obtained through output;
f-3) inputting the feature map A_1-i into the first maximum pooling layer of the encoder, and outputting to obtain a feature map A′_1-i;
f-4) the second dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A′_1-i is input into the second dense cascade module, and a feature map A_2-i is obtained through output;
f-5) inputting the feature map A_2-i into the second maximum pooling layer of the encoder, and outputting to obtain a feature map A′_2-i;
f-6) the third dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A′_2-i is input into the third dense cascade module, and a feature map A_3-i is obtained through output;
f-7) inputting the feature map A_3-i into the third maximum pooling layer of the encoder, and outputting to obtain a feature map A′_3-i;
f-8) the fourth dense cascade module of the encoder is sequentially composed of a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer; the feature map A′_3-i is input into the fourth dense cascade module, and a feature map A_4-i is obtained through output;
f-9) inputting the feature map A_4-i into the fourth maximum pooling layer of the encoder, and outputting to obtain a feature map A′_4-i;
f-10) the position self-attention module of the encoder consists of a first convolution layer, a second convolution layer, a third convolution layer, a first bilinear interpolation layer, a second bilinear interpolation layer, a third bilinear interpolation layer, a softmax layer, a fourth bilinear interpolation layer and a fourth convolution layer; the feature map A′_4-i is input into the first convolution layer, and the resulting feature map is input into the first bilinear interpolation layer for bilinear interpolation to obtain a feature map Q; the feature map A′_4-i is input into the second convolution layer, and the resulting feature map is input into the second bilinear interpolation layer for bilinear interpolation to obtain a feature map K; the feature map A′_4-i is input into the third convolution layer, and the resulting feature map is input into the third bilinear interpolation layer for bilinear interpolation to obtain a feature map V; the feature map Q is multiplied by the feature map K and the product is input into the softmax layer to output a feature map QK; the feature map QK is multiplied by the feature map V to obtain a feature map Att; the feature map Att is input into the fourth bilinear interpolation layer for bilinear interpolation and then into the fourth convolution layer to obtain the feature map A_5-i (an illustrative sketch of this module is given after claim 1 below);
Step g) comprises the steps of:
g-1) the decoder for dividing the network model is composed of a first double convolution module, a second double convolution module, a third double convolution module, a fourth double convolution module, a first double-channel attention module, a second double-channel attention module, a third double-channel attention module, a fourth double-channel attention module, a first upsampling layer, a second upsampling layer, a third upsampling layer, a fourth upsampling layer, a fifth upsampling layer and a multi-level gating fusion module;
g-2) inputting the feature map A 5-i into a fifth upsampling layer of the decoder, and outputting to obtain a feature map C 5-i;
g-3) the fourth dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A′_4-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the fourth dual-channel attention module to output a feature map G_a; the feature map A′_4-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the fourth dual-channel attention module to output a feature map G_m; the feature map G_a and the feature map G_m are added, and the sum is multiplied element by element with the feature map A′_4-i to obtain a feature map A″_4-i; the feature map A″_4-i is spliced with the feature map C_5-i output by the fifth upsampling layer of the decoder to obtain a feature map D_4-i;
The fourth double-convolution module of the g-4) decoder is composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer in sequence, the feature map D 4-i is input into the fourth double-convolution module, and the feature map C 4-i is output and obtained;
g-5) the third dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A′_3-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the third dual-channel attention module to output a feature map G_a′; the feature map A′_3-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the third dual-channel attention module to output a feature map G_m′; the feature map G_a′ and the feature map G_m′ are added, and the sum is multiplied element by element with the feature map A′_3-i to obtain a feature map A″_3-i; the feature map C_4-i is input into the fourth upsampling layer of the decoder to output a feature map C′_3-i, and the feature map A″_3-i is spliced with the feature map C′_3-i to obtain a feature map D_3-i;
g-6) the third double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 3-i is input into the third double-convolution module and output to obtain a feature map C 3-i;
g-7) the second dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A′_2-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the second dual-channel attention module to output a feature map G_a″; the feature map A′_2-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the second dual-channel attention module to output a feature map G_m″; the feature map G_a″ and the feature map G_m″ are added, and the sum is multiplied element by element with the feature map A′_2-i to obtain a feature map A″_2-i; the feature map C_3-i is input into the third upsampling layer of the decoder to output a feature map C′_2-i, and the feature map A″_2-i is spliced with the feature map C′_2-i to obtain a feature map D_2-i;
g-8) the second double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, and the feature map D 2-i is input into the second double-convolution module and output to obtain a feature map C 2-i;
g-9) the first dual-channel attention module of the decoder is composed of a global average pooling layer, a first convolution layer, a first ReLU layer, a second convolution layer, a first Sigmoid layer, a global maximum pooling layer, a third convolution layer, a second ReLU layer, a fourth convolution layer and a second Sigmoid layer; the feature map A′_1-i is sequentially input into the global average pooling layer, the first convolution layer, the first ReLU layer, the second convolution layer and the first Sigmoid layer of the first dual-channel attention module to output a feature map G_a‴; the feature map A′_1-i is sequentially input into the global maximum pooling layer, the third convolution layer, the second ReLU layer, the fourth convolution layer and the second Sigmoid layer of the first dual-channel attention module to output a feature map G_m‴; the feature map G_a‴ and the feature map G_m‴ are added, and the sum is multiplied element by element with the feature map A′_1-i to obtain a feature map A″_1-i; the feature map C_2-i is input into the second upsampling layer of the decoder to output a feature map C′_1-i, and the feature map A″_1-i is spliced with the feature map C′_1-i to obtain a feature map D_1-i;
g-10) the first double-convolution module of the decoder is sequentially composed of a first convolution layer, a first BN layer, a first ReLU layer, a second convolution layer, a second BN layer and a second ReLU layer, wherein a feature map D 1-i is input into the first double-convolution module, and a feature map C 1-i is output and obtained;
g-11) the multi-level gating fusion module of the decoder consists of a first upsampling layer, a second upsampling layer, a third upsampling layer, a first convolution layer, a Sigmoid layer and a second convolution layer; the feature map C_2-i is input into the first upsampling layer to output a feature map C_2-i′, the feature map C_3-i is input into the second upsampling layer to output a feature map C_3-i′, and the feature map C_4-i is input into the third upsampling layer to output a feature map C_4-i′; the feature maps C_2-i′, C_3-i′ and C_4-i′ are spliced and sequentially input into the first convolution layer and the Sigmoid layer to output a weight matrix G; singular value decomposition is performed on the weight matrix G to obtain a left singular vector matrix W1, a diagonal matrix W2 and a right singular vector matrix W3; the feature map C_2-i is multiplied by the left singular vector matrix W1 to obtain a feature map W1′, the feature map C_3-i is multiplied by the diagonal matrix W2 to obtain a feature map W2′, and the feature map C_4-i is multiplied by the right singular vector matrix W3 to obtain a feature map W3′; the feature maps W1′, W2′ and W3′ are spliced and input into the second convolution layer to output a feature map Z;
g-12) the feature map C_1-i is input into the first upsampling layer to output a feature map C_1-i′; the feature map C_1-i′ is added to the feature map Z, and the sum is input into a convolution layer with a convolution kernel size of 1×1 to output the segmentation result image P_i.
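Step f-10) of claim 1 describes the position self-attention module at the bottleneck of the encoder. The following is a minimal PyTorch-style sketch of one way to realize it; the reduced attention resolution (attn_size) and the exact reshaping of the interpolated Q, K and V maps into matrices for the Q·K product are assumptions made to keep the sketch runnable, not details taken from the claim.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionSelfAttention(nn.Module):
    """Illustrative position self-attention: 1x1 convs, bilinear resizing, softmax(Q*K)*V."""
    def __init__(self, channels: int, attn_size: int = 12):
        super().__init__()
        self.attn_size = attn_size
        self.q_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.k_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.v_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.out_conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s = self.attn_size
        # 1x1 convolutions followed by bilinear interpolation to the attention resolution
        q = F.interpolate(self.q_conv(x), size=(s, s), mode="bilinear", align_corners=False)
        k = F.interpolate(self.k_conv(x), size=(s, s), mode="bilinear", align_corners=False)
        v = F.interpolate(self.v_conv(x), size=(s, s), mode="bilinear", align_corners=False)
        q = q.flatten(2).transpose(1, 2)        # (b, s*s, c)
        k = k.flatten(2)                        # (b, c, s*s)
        v = v.flatten(2).transpose(1, 2)        # (b, s*s, c)
        attn = torch.softmax(q @ k, dim=-1)     # softmax over the Q*K similarity map
        att = (attn @ v).transpose(1, 2).reshape(b, c, s, s)
        # interpolate back to the input resolution and apply the final 1x1 convolution
        att = F.interpolate(att, size=(h, w), mode="bilinear", align_corners=False)
        return self.out_conv(att)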
2. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in step a), the N cardiac MRI images are acquired from the public Automated Cardiac Diagnosis Challenge dataset.
3. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: step b) comprises the steps of:
b-1) converting the i-th cardiac MRI image X_i into a Numpy array by using the GetArrayFromImage() function of the SimpleITK library, and cutting the i-th cardiac MRI image X_i converted into a Numpy array into V 2D slices along the Z-axis direction;
b-2) resampling each 2D slice to obtain V new 2D images with a pixel pitch of (1.5, 1.5), center-cropping each new 2D image to obtain V cropped 2D images of size 384×384, stacking the cropped 2D images to restore a 3D Numpy array, and converting the 3D Numpy array back into a cardiac MRI image by using the GetImageFromArray() function of the SimpleITK library;
b-3) rotating the cardiac MRI image obtained in step b-2) by 90 degrees clockwise or counterclockwise along the horizontal axis or the vertical axis with a probability of 0.5 to obtain a rotated image, and performing a normalization operation on the rotated image to obtain the preprocessed i-th cardiac MRI image X′_i;
b-4) the N preprocessed cardiac MRI images form the preprocessed dataset X′, X′ = {X′_1, X′_2, ..., X′_i, ..., X′_N}.
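For illustration, the preprocessing of claim 3 can be sketched as follows. SimpleITK's GetArrayFromImage/GetImageFromArray usage, the zoom-based per-slice resampling and the simplified 90° rotation of the whole stack shown here are one plausible realization under stated assumptions, not the patent's exact procedure; the 1.5 mm spacing, 384×384 crop, 0.5 flip probability and normalization come from the claim.

import numpy as np
import SimpleITK as sitk
from scipy.ndimage import zoom

def preprocess(image: sitk.Image, target_spacing=(1.5, 1.5), size=384) -> np.ndarray:
    vol = sitk.GetArrayFromImage(image)                 # (slices, H, W) along the Z axis
    sy, sx = image.GetSpacing()[1], image.GetSpacing()[0]
    slices = []
    for sl in vol:                                      # resample every 2D slice to 1.5 x 1.5 mm
        sl = zoom(sl, (sy / target_spacing[0], sx / target_spacing[1]), order=1)
        h, w = sl.shape                                 # pad if needed, then center-crop to 384 x 384
        pad_h, pad_w = max(size - h, 0), max(size - w, 0)
        sl = np.pad(sl, ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)))
        h, w = sl.shape
        top, left = (h - size) // 2, (w - size) // 2
        slices.append(sl[top:top + size, left:left + size])
    vol = np.stack(slices)                              # restore the 3D stack
    if np.random.rand() < 0.5:                          # random +/-90 degree rotation with probability 0.5
        vol = np.rot90(vol, k=np.random.choice([1, -1]), axes=(1, 2)).copy()
    return (vol - vol.mean()) / (vol.std() + 1e-8)      # normalization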
4. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in the step c), the preprocessed data set X' is divided into a training set, a verification set and a test set according to the proportion of 7:1:2.
5. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in step d), M takes on a value of 1312.
6. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in the step f-2), the convolution kernel size of the first convolution layer of the first dense cascade module is 3×3, the expansion rate is 1, the convolution kernel size of the second convolution layer is 3×3, the expansion rate is 3, the convolution kernel size of the third convolution layer is 3×3, the expansion rate is 5, and the convolution kernel size of the fourth convolution layer is 3×3, the expansion rate is 1; step f-3) wherein the step size of the first largest pooling layer is 2 and the pooling kernel size is 2 x 2; in the step f-4), the convolution kernel size of the first convolution layer of the second dense cascade module is 3×3, the expansion rate is 1, the convolution kernel size of the second convolution layer is 3×3, the expansion rate is 3, the convolution kernel size of the third convolution layer is 3×3, the expansion rate is 5, and the convolution kernel size of the fourth convolution layer is 3×3, the expansion rate is 1; step f-5) wherein the step size of the second largest pooling layer is 2 and the pooling kernel size is 2 x 2; in the step f-6), the convolution kernel size of the first convolution layer of the third dense cascade module is 3×3, the expansion rate is 1, the convolution kernel size of the second convolution layer is 3×3, the expansion rate is 3, the convolution kernel size of the third convolution layer is 3×3, the expansion rate is 5, and the convolution kernel size of the fourth convolution layer is 3×3, the expansion rate is 1; step f-7) wherein the step size of the third largest pooling layer is 2 and the pooling kernel size is 2 x 2; the first convolution layer of the fourth dense cascade module in step f-8) has a convolution kernel size of 3×3 and an expansion ratio of 1, the second convolution layer has a convolution kernel size of 3×3 and an expansion ratio of 3, the third convolution layer has a convolution kernel size of 3×3 and an expansion ratio of 5, and the fourth convolution layer has a convolution kernel size of 3×3 and an expansion ratio of 1; step f-9) wherein the fourth maximum pooling layer has a stride of 2 and a pooling kernel size of 2×2; the convolution kernel sizes of the first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer of the self-attention module in the step f-10) are all 1×1.
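For illustration, a dense cascade module consistent with the kernel sizes and dilation rates of claim 6 can be sketched as follows; the ReLU activations, the padding values (set equal to the dilation rate to preserve spatial size) and the dense concatenation before the final convolution are assumptions suggested by the module's name, since the claim only fixes the 3×3 kernels and the dilation rates 1, 3, 5, 1.

import torch
import torch.nn as nn

class DenseCascadeModule(nn.Module):
    """Illustrative encoder block: four 3x3 convolutions with dilation rates 1, 3, 5, 1."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1, dilation=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=3, dilation=3)
        self.conv3 = nn.Conv2d(out_ch, out_ch, 3, padding=5, dilation=5)
        # the fourth convolution fuses the concatenated intermediate feature maps
        self.conv4 = nn.Conv2d(3 * out_ch, out_ch, 3, padding=1, dilation=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.act(self.conv1(x))
        f2 = self.act(self.conv2(f1))
        f3 = self.act(self.conv3(f2))
        return self.act(self.conv4(torch.cat([f1, f2, f3], dim=1)))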
7. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein: in step g-3), the convolution kernels of the first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer of the fourth dual-channel attention module are all 1×1, the strides are all 1, and the paddings are all 1; in step g-4), the convolution kernels of the first convolution layer and the second convolution layer of the fourth double convolution module are 3×3, the strides are 1, and the paddings are 1; in step g-5), the convolution kernels of the first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer of the third dual-channel attention module are all 1×1, the strides are all 1, and the paddings are all 1; in step g-6), the convolution kernels of the first convolution layer and the second convolution layer of the third double convolution module are 3×3, the strides are 1, and the paddings are 1; in step g-7), the convolution kernels of the first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer of the second dual-channel attention module are all 1×1, the strides are all 1, and the paddings are all 1; in step g-8), the convolution kernels of the first convolution layer and the second convolution layer of the second double convolution module are 3×3, the strides are 1, and the paddings are 1; in step g-9), the convolution kernels of the first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer of the first dual-channel attention module are all 1×1, the strides are all 1, and the paddings are all 1; in step g-10), the convolution kernels of the first convolution layer and the second convolution layer of the first double convolution module are 3×3, the strides are 1, and the paddings are 1.
8. The cardiac image segmentation method based on attention mechanism and multi-level feature fusion of claim 1, wherein step h) comprises the steps of:
h-1) calculating the total loss L_total through the formula L_total = αL_CrossEntropy + (1-α)L_Dice, wherein L_CrossEntropy is the cross-entropy loss function, L_Dice is the Dice loss function, and α is a weight;
h-2) training the segmentation network model with the Adam optimizer using the total loss L_total to obtain the optimized segmentation network model, wherein during training the batch size is set to 10, the maximum number of epochs is set to 200, the learning rate lr is 0.01, and α is set to 0.05.
CN202311461592.2A 2023-11-06 2023-11-06 Cardiac image segmentation method based on attention mechanism and multi-level feature fusion Active CN117522881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311461592.2A CN117522881B (en) 2023-11-06 2023-11-06 Cardiac image segmentation method based on attention mechanism and multi-level feature fusion


Publications (2)

Publication Number Publication Date
CN117522881A CN117522881A (en) 2024-02-06
CN117522881B true CN117522881B (en) 2024-06-18

Family

ID=89759877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311461592.2A Active CN117522881B (en) 2023-11-06 2023-11-06 Cardiac image segmentation method based on attention mechanism and multi-level feature fusion

Country Status (1)

Country Link
CN (1) CN117522881B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612131A (en) * 2023-05-22 2023-08-18 山东省人工智能研究院 Cardiac MRI structure segmentation method based on ADC-UNet model

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10922816B2 (en) * 2018-08-27 2021-02-16 Siemens Healthcare Gmbh Medical image segmentation from raw data using a deep attention neural network
CN109389078B (en) * 2018-09-30 2022-06-21 京东方科技集团股份有限公司 Image segmentation method, corresponding device and electronic equipment
US11270447B2 (en) * 2020-02-10 2022-03-08 Hong Kong Applied Science And Technology Institute Company Limited Method for image segmentation using CNN
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN115375711A (en) * 2022-09-19 2022-11-22 安徽大学 Image segmentation method of global context attention network based on multi-scale fusion
CN116843696B (en) * 2023-04-27 2024-04-09 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on feature similarity and super-parameter convolution attention
CN116740076A (en) * 2023-05-15 2023-09-12 苏州大学 Network model and method for pigment segmentation in retinal pigment degeneration fundus image
CN116563265B (en) * 2023-05-23 2024-03-01 山东省人工智能研究院 Cardiac MRI (magnetic resonance imaging) segmentation method based on multi-scale attention and self-adaptive feature fusion


Also Published As

Publication number Publication date
CN117522881A (en) 2024-02-06

Similar Documents

Publication Publication Date Title
WO2023231329A1 (en) Medical image semantic segmentation method and apparatus
CN109584161A (en) The Remote sensed image super-resolution reconstruction method of convolutional neural networks based on channel attention
CN110675321A (en) Super-resolution image reconstruction method based on progressive depth residual error network
CN113012172A (en) AS-UNet-based medical image segmentation method and system
CN109214989A (en) Single image super resolution ratio reconstruction method based on Orientation Features prediction priori
CN111932461A (en) Convolutional neural network-based self-learning image super-resolution reconstruction method and system
CN111583285A (en) Liver image semantic segmentation method based on edge attention strategy
CN107341776A (en) Single frames super resolution ratio reconstruction method based on sparse coding and combinatorial mapping
CN111667407B (en) Image super-resolution method guided by depth information
CN113298717A (en) Medical image super-resolution reconstruction method based on multi-attention residual error feature fusion
CN112561799A (en) Infrared image super-resolution reconstruction method
CN112365422A (en) Irregular missing image restoration method and system based on deep aggregation network
CN115239674B (en) Computer angiography imaging synthesis method based on multi-scale discrimination
CN115565056A (en) Underwater image enhancement method and system based on condition generation countermeasure network
CN110853048A (en) MRI image segmentation method, device and storage medium based on rough training and fine training
CN115578427A (en) Unsupervised single-mode medical image registration method based on deep learning
CN115375711A (en) Image segmentation method of global context attention network based on multi-scale fusion
CN116739899A (en) Image super-resolution reconstruction method based on SAUGAN network
CN114998458A (en) Undersampled magnetic resonance image reconstruction method based on reference image and data correction
CN115222592A (en) Underwater image enhancement method based on super-resolution network and U-Net network and training method of network model
CN113379606B (en) Face super-resolution method based on pre-training generation model
CN117522881B (en) Cardiac image segmentation method based on attention mechanism and multi-level feature fusion
CN117036162B (en) Residual feature attention fusion method for super-resolution of lightweight chest CT image
CN116051609B (en) Unsupervised medical image registration method based on band-limited deformation Fourier network
CN115859606A (en) SAR-optical image translation method and system based on image evaluation and feature selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant